An Introduction to the markdown Package

Yihui Xie

2023-12-05

The markdown package is built on top of commonmark. It renders Markdown to output formats supported by commonmark, and the primary output formats are HTML and LaTeX.

Historically, it used a C library named sundown, which was removed and replaced by commonmark in v1.3 (2022-10-30). The main advantage of the latter is that it follows a clear and widely used spec, i.e., GFM (GitHub Flavored Markdown), which can be seen as a subset of Pandoc’s Markdown. Therefore the markdown package can be viewed as a small subset of the rmarkdown package (the latter is based on Pandoc, whereas the former doesn’t depend on external tools). It aims to be simple, lightweight, and fast, at the cost of giving up a lot of features. This package is intended for minimalists. Most users may want to use tools based on Pandoc instead, such as rmarkdown or Quarto.

1. Markdown Syntax

1.1 Basic syntax

For the full list of supported document elements, please read the GFM spec. Below is a quick summary:

1.2 Add-on features

In addition to GFM features, the markdown package also supports the following features.

1.2.1 Raw LaTeX/HTML blocks

Raw LaTeX and HTML blocks can be written as fenced code blocks with language names =latex (or =tex) and =html, e.g.,

```{=tex}
This only appears in \LaTeX{} output.
```

Raw LaTeX blocks will only appear in LaTeX output, and will be ignored in other output formats. Similarly, raw HTML blocks will only appear in HTML output. One exception is raw LaTeX blocks that are LaTeX math environments, which also work for HTML output (see the next section).

1.2.2 LaTeX math

You can write both $inline$ and $$display$$ LaTeX math, e.g., \(\sin^{2}(\theta)+\cos^{2}(\theta) = 1\)

$$\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$$

$$|x| = \begin{cases} x &\text{if } x \geq 0 \\ -x &\text{if } x < 0 \end{cases}$$

LaTeX math environments are also supported, e.g., below are an align environment and an equation environment:

\begin{align}
a^{2}+b^{2} & = c^{2}\\
\sin^{2}(\theta)+\cos^{2}(\theta) & = 1
\end{align}

\begin{equation}
\begin{split}
(a+b)^2 &=(a+b)(a+b)\\
&=a^2+2ab+b^2
\end{split}
\end{equation}

These math environments can be written in raw LaTeX blocks, and they work for both LaTeX and HTML output, e.g.,

```{=latex}
\begin{align}
a^{2}+b^{2} & =  c^{2}\\
\sin^{2}(\theta)+\cos^{2}(\theta) & =  1
\end{align}
```

For HTML output, it is up to the JavaScript library (MathJax or KaTeX) whether a math environment can be rendered.

1.2.3 Superscripts and subscripts

Write superscripts in ^text^ and subscripts in ~text~ (same syntax as Pandoc’s Markdown), e.g., 2^10^ and H~2~O. Currently only alphanumeric characters, *, (, and ) are allowed in the scripts. For example, a^b c^ will not be recognized as a superscript (because the space is not allowed). Note that GFM supports striking out text via ~text~, but this feature has been disabled and replaced by the feature of subscripts in markdown. To strike out text, you must use a pair of double tildes, i.e., ~~text~~.
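The character restriction above can be approximated with regular expressions. The base R sketch below only illustrates the rule and is not the package's actual implementation; the helper name to_scripts is hypothetical.

```r
# Hypothetical helper: approximate the superscript/subscript rule with
# regular expressions (an illustration, not the package's implementation).
to_scripts = function(x) {
  # characters allowed inside the scripts: alphanumerics, *, (, and )
  allowed = "([[:alnum:]*()]+)"
  # ^text^ becomes a superscript
  x = gsub(paste0("\\^", allowed, "\\^"), "<sup>\\1</sup>", x, perl = TRUE)
  # ~text~ becomes a subscript; the lookarounds skip double tildes,
  # which are reserved for strikethrough
  x = gsub(paste0("(?<!~)~", allowed, "~(?!~)"), "<sub>\\1</sub>", x, perl = TRUE)
  x
}

to_scripts("2^10^ and H~2~O")  # "2<sup>10</sup> and H<sub>2</sub>O"
to_scripts("a^b c^")           # unchanged: the space is not allowed
to_scripts("~~deleted~~")      # unchanged: left alone for strikethrough
```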

1.2.4 Footnotes

Insert footnotes via [^n], where n is a footnote number (a unique identifier).[3] The footnote content should be defined in a separate block starting with [^n]:. For example:

Insert a footnote here.[^1]

[^1]: This is the footnote.

The support is limited for LaTeX output at the moment,[2] and there are two caveats if the document is intended to be converted to LaTeX:

The two limitations do not apply to HTML output, e.g., you can write arbitrary elements in footnotes and not necessarily one paragraph.

1.2.5 Attributes

Attributes on images, fenced code blocks, and section headings can be written in {}. For example, ![text](path){.foo #bar width="50%"} will generate an <img> tag with attributes in HTML output:

<img src="path" alt="text" id="bar" class="foo" width="50%" />

and ## Heading {#baz} will generate:

<h2 id="baz">Heading</h2>

For fenced code blocks, a special rule is that the first class name will be treated as the language name for a block, and the class attribute of the resulting <code> tag will have a language- prefix. For example, the following code block

```{.foo .bar #my-code style="color: red;"}
```

will generate the HTML output below:

<pre>
  <code class="language-foo bar" id="my-code" style="color: red;">
  </code>
</pre>

Most attributes in {} are ignored for LaTeX output except for:

1.2.6 Appendices

When a top-level heading has the attribute .appendix, the rest of the document will be treated as the appendix. If section numbering is enabled, the appendix section headings will be numbered differently.

1.2.7 Fenced Divs

A fenced Div can be written in ::: fences. Note that the opening fence must have at least one attribute, such as the class name. For example:

::: foo
This is a fenced Div.
:::

::: {.foo}
The syntax `::: foo` is equivalent to `::: {.foo}`.
:::

::: {.foo #bar style="color: red;"}
This div has more attributes.

It will be red in HTML output.
:::

A fenced Div will be converted to <div> with attributes in HTML output, e.g.,

<div class="foo" id="bar" style="color: red;">
</div>

For LaTeX output, it can be converted to a LaTeX environment if both the class name and an attribute data-latex are present. For example,

::: {.tiny data-latex=""}
This is _tiny_ text.
:::

will be converted to:

\begin{tiny}
This is \emph{tiny} text.
\end{tiny}

The data-latex attribute can be used to specify arguments to the environment (which can be an empty string if the environment doesn’t need an argument). For example,

::: {.minipage data-latex="{.5\linewidth}"}

will be converted to:

\begin{minipage}{.5\linewidth}

If a fenced Div doesn’t have the data-latex attribute, the fence will be ignored, and its content will be written out normally without a surrounding environment. If a fenced Div has multiple class names (e.g., {.a .b .c}), only the first class name will be used as the LaTeX environment name. However, all class names will be used if the output format is HTML (e.g., <div class="a b c">).
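As a hedged illustration, the fenced Div example above could be rendered to both output formats as follows. The text and format arguments of mark() are not described in this document; they are assumed here to take the Markdown text and the output format name, respectively.

```r
# A fenced Div with a class name and an (empty) data-latex attribute
div = c(
  '::: {.tiny data-latex=""}',
  "This is _tiny_ text.",
  ":::"
)

# Assumption: mark() accepts `text` and `format` arguments
tex  = markdown::mark(text = div, format = "latex")
html = markdown::mark(text = div, format = "html")

cat(tex)   # per the description above, a tiny environment is expected
cat(html)  # per the description above, a <div> with class "tiny" is expected
```

Removing data-latex="" from the opening fence and rendering again is a quick way to check that the LaTeX environment disappears while the content is kept.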

1.2.8 Smart HTML entities

“Smart” HTML entities can be represented by ASCII characters, e.g., you can write fractions in the form n/m. Below are some example entities:

1/2  1/3  2/3  7/8  1/7  1/9  1/10  (c)  (r)  (tm)
½    ⅓    ⅔    ⅞    ⅐    ⅑    ⅒     ©    ®    ™

1.3 Comparison to Pandoc

As mentioned earlier, a lot of features in Pandoc’s Markdown are not supported in the markdown package. Any feature that you find missing in previous sections is likely to be unavailable, such as citations and figure/table captions. In addition, a lot of R Markdown and Quarto (both are based on Pandoc) features are not supported, either. Some HTML features may be implemented via JavaScript, but currently this is not straightforward; it may be improved in the future.

Pandoc can convert Markdown to many output formats, such as Word, PowerPoint, LaTeX beamer, and EPUB. The markdown package is unlikely to support output formats beyond HTML and LaTeX.

2. Markdown Rendering

The main function to convert Markdown to other formats is markdown::mark(); mark_html() and mark_latex() are simple wrapper functions for HTML and LaTeX output, respectively. The function mark() generates a document fragment by default, and the wrapper functions mark_*() generate full documents.
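A minimal sketch of the difference between a fragment and a full document is shown below. It assumes that these functions accept Markdown text via a text argument (not described in this document) and return the rendered output as a character value when no output file is given.

```r
md = c("# Hello", "", "This is *Markdown* text.")

# a document fragment (the default for mark())
frag = markdown::mark(text = md)

# full documents via the wrapper functions, using the default templates
html = markdown::mark_html(text = md)
tex  = markdown::mark_latex(text = md)

# the full HTML document should be longer than the bare fragment
sum(nchar(html)) > sum(nchar(frag))
```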

You can either call markdown::mark() to render a Markdown document programmatically, or click the Knit button in RStudio to render a (Markdown or R Markdown) document interactively. The latter requires you to specify the output format in the output field in YAML metadata (see the section “YAML metadata”), e.g.,

---
output:
  markdown::html_format:
    options:
      js_math:
        package: "katex"
        version: "0.16.4"
      number_sections: true
      embed_resources: ["local", "https"]
    meta:
      css: "custom.css"
---

2.1 Markdown options

The options argument of mark() can be used to enable/disable/set options to control Markdown rendering. This argument can take either a list, e.g., list(toc = TRUE, smart = FALSE), or a character vector, e.g., c("+toc", "-smart"), or equivalently, a single string "+toc-smart", where + means to enable an option, and - means to disable an option. The options can also be set in YAML metadata (recommended). Available options are listed below.
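Before going through the individual options, the three forms of the options argument can be compared side by side. This is a sketch that assumes mark() accepts Markdown text via a text argument, which is not described in this document.

```r
md = c("# Title", "", "Some text.")

# the same options, expressed in three ways
out1 = markdown::mark(text = md, options = list(toc = TRUE, smart = FALSE))
out2 = markdown::mark(text = md, options = c("+toc", "-smart"))
out3 = markdown::mark(text = md, options = "+toc-smart")

# check whether the three calls produce the same output
identical(out1, out2) && identical(out2, out3)
```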

2.1.1 auto_identifiers

Add automatic IDs to headings, e.g.,

# Hello world!

will be converted to

<h1 id="hello-world">Hello world!</h1>

You can override the automatic ID by providing an ID manually via the ID attribute, e.g.,

# Hello world! {#hello}

An automatic ID is generated by substituting non-alphanumeric characters in the heading text with hyphens. If the result is empty, the ID will be section. If any ID is duplicated, a numeric suffix will be added to the ID, e.g., example_1 and example_2.
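The rules above can be sketched in base R. This is only an approximation and not the package's implementation: the helper name auto_id is hypothetical, the lowercasing and trimming of hyphens are inferred from the hello-world example, and suffixing every member of a duplicated group is an assumption based on the example_1 and example_2 example.

```r
# Hypothetical helper sketching the ID rules described above
auto_id = function(headings) {
  id = tolower(headings)
  id = gsub("[^[:alnum:]]+", "-", id)  # non-alphanumeric runs become a hyphen
  id = gsub("^-+|-+$", "", id)         # drop leading and trailing hyphens
  id[id == ""] = "section"             # fallback for empty IDs
  # add a numeric suffix to every member of a duplicated group (assumption)
  for (d in unique(id[duplicated(id)])) {
    k = id == d
    id[k] = paste0(id[k], "_", seq_len(sum(k)))
  }
  id
}

auto_id(c("Hello world!", "Example", "Example", "!!!"))
# "hello-world" "example_1" "example_2" "section"
```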

2.1.2 embed_resources

Embed resources (images, CSS, and JS) in the HTML output using their base64-encoded data (images) or raw content (CSS/JS). Possible values are:

The default is "local", i.e., local resources are embedded, whereas https resources are not. This means the output document may not work offline. If you have to view the output offline, you need to use the option value "https" (or "all") and render the document at least once before you go offline.

2.1.3 js_highlight

Specify the JavaScript library to syntax highlight code blocks. Possible values are highlight (highlight.js) and prism (Prism.js). The default is prism. This option can also take a list of the form list(package, version, style, languages), which specifies the package name (highlight or prism), version, CSS style/theme name, and names of languages to be highlighted.

By default, languages are automatically detected and the required JS files are automatically loaded. Normally you need to specify the languages array only if the automatic detection fails.

Technically this option is a shorthand for setting the metadata variables css and js. If you want full control, you may disable this option (set it to false or null) and use metadata variables directly, which requires more familiarity with the JS libraries and the jsdelivr CDN.

2.1.4 js_math

Specify the JavaScript library for rendering math expressions in HTML output. Possible values are "mathjax" and "katex" (the default). Like the js_highlight option, this option is also essentially a shorthand for setting the metadata variables css and js.

If you want finer control, you can provide a list of the form list(package, version, css, js). This will allow you to specify the package name, version, and css/js files. For example, if you want to use MathJax’s tex-chtml.js instead, you may set:

js_math:
  package: mathjax
  version: 3
  js: es5/tex-chtml.js

When the package is mathjax, version 3 is used by default. If you want to use the older v2, you may set:

js_math:
  package: mathjax
  version: 2
  js: MathJax.js?config=TeX-AMS-MML_CHTML

Please visit the MathJax CDN to know which versions and JS files are available.

For KaTeX, the version is not specified by default, which means the latest version from the CDN is used. Below is an example of specifying the version 0.16.4 and using the mhchem extension:

js_math:
  package: katex
  version: 0.16.4
  js: [dist/katex.min.js, dist/contrib/mhchem.min.js]

Note that if you want the HTML output to be self-contained via the embed_resources option, KaTeX can be embedded and used offline, but MathJax cannot be fully embedded due to its complexity. MathJax v3 can be partially embedded and used offline, but currently only its fonts can be embedded, and extensions cannot. If you must view HTML output offline, we recommend using KaTeX, but please also note that KaTeX and MathJax do not fully cover each other’s features.

2.1.5 latex_math

Whether to identify LaTeX math expressions in pairs of single ($ $) or double dollar signs ($$ $$), and transform them so that they can be correctly rendered by the JavaScript math library (HTML output) or LaTeX.

2.1.6 number_sections

Whether to number section headings. To skip numbering a specific heading, add an attribute {.unnumbered} to it.

2.1.7 smartypants

Whether to translate certain ASCII strings into smart typographic characters (see ?markdown::smartypants).

2.1.8 superscript

Whether to translate strings between two carets into superscripts, e.g., text^foo^ to text<sup>foo</sup>.

2.1.9 subscript

Whether to translate strings between two tildes into subscripts, e.g., text~foo~ to text<sub>foo</sub>.

2.1.10 toc

Whether to generate a table of contents (TOC) from section headings. If a heading has an id attribute, the corresponding TOC item will be a link to this heading. You can also set a sub-option:

2.1.11 top_level

The desired type of the top-level headings in LaTeX output. Possible values are 'chapter' and 'part'. For example, if top_level = 'chapter', # heading will be rendered to \chapter{heading} instead of the default \section{heading}.
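A small sketch of this option follows. It assumes that mark() accepts the text and format arguments (neither is described in this document) and that top_level can be passed in the options list like the other options.

```r
md = "# heading"

# default: per the description above, a \section{} is expected
tex1 = markdown::mark(text = md, format = "latex")

# with top_level = 'chapter', a \chapter{} is expected instead
tex2 = markdown::mark(
  text = md, format = "latex", options = list(top_level = "chapter")
)

cat(tex1)
cat(tex2)
```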

Options not described above can be found on the help pages of commonmark, e.g., the hardbreaks option is for the hardbreaks argument of commonmark::markdown_*() functions, and the table option is for the table extension in commonmark’s extensions.

markdown::markdown_options()
#>  [1] "+auto_identifiers" "+autolink"         "+embed_resources" 
#>  [4] "+js_highlight"     "+js_math"          "+latex_math"      
#>  [7] "+smart"            "+smartypants"      "+strikethrough"   
#> [10] "+subscript"        "+superscript"      "+table"           
#> [13] "+tasklist"         "-hardbreaks"       "-number_sections" 
#> [16] "-tagfilter"        "-toc"
# commonmark's arguments
opts = formals(commonmark::markdown_html)
opts = opts[setdiff(names(opts), c('text', 'extensions'))]
unlist(opts)
#> hardbreaks      smart  normalize  sourcepos  footnotes 
#>      FALSE      FALSE      FALSE      FALSE      FALSE
# commonmark's extensions
commonmark::list_extensions()
#> [1] "table"         "strikethrough" "autolink"      "tagfilter"    
#> [5] "tasklist"

2.2 Templates

By default, mark() generates a document fragment (i.e., the body). To generate a full document, you need a template. Below is a simple HTML template example:

<html>
  <head>
    <title>$title$</title>
  </head>

  <body>
  $body$
  </body>
</html>

It contains two variables, $title$ and $body$. All variables will be substituted by metadata values, except for $body$, which takes the value from mark().

The markdown package provides default templates for HTML and LaTeX output. To use them, call mark(..., template = TRUE), or the wrapper functions mark_html() / mark_latex(). To pass metadata to templates, use the meta argument, e.g.,

markdown::mark(..., meta = list(title = "My Title"), template = TRUE)

You can provide your own template file to the template argument, too.
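To make the substitution idea concrete, here is a base R sketch that fills the simple template above. It only illustrates how $name$ variables could be replaced by values; it is not the package's implementation, and the helper name fill_template is hypothetical.

```r
template = c(
  "<html>",
  "  <head>",
  "    <title>$title$</title>",
  "  </head>",
  "  <body>",
  "  $body$",
  "  </body>",
  "</html>"
)

# Hypothetical helper: replace each $name$ with the value of the same name
fill_template = function(template, vars) {
  for (name in names(vars)) {
    # fixed = TRUE treats both the variable and the value literally
    template = gsub(paste0("$", name, "$"), vars[[name]], template, fixed = TRUE)
  }
  template
}

out = fill_template(template, list(title = "My Title", body = "<p>Hello</p>"))
cat(out, sep = "\n")
```

In the package itself, the value of $body$ comes from mark() and the other values come from metadata, as described above.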

2.3 YAML metadata

Alternatively, the meta argument can read YAML metadata in the Markdown document. The following variables can be set in the top-level fields in YAML:

For example:

---
title: "My Title"
author: "[Frida Gomam](https://example.com)"
date: "2023-01-09"
---

Note that you can use Markdown syntax in them.

Other variables need to be specified under output -> markdown::*_format -> meta, where * can be html or latex, e.g.,

---
title: "My Title"
output:
  markdown::html_format:
    meta:
      css: "style.css"
      js: "script.js"
  markdown::latex_format:
    meta:
      documentclass: "book"
      header_includes: "\\usepackage{microtype}"
---

The following metadata variables are supported for both HTML and LaTeX templates:

Variables specific to the HTML template:

Variables specific to the LaTeX template:

Note that you can use either underscores or hyphens in the variable names. Underscores will be normalized to hyphens internally, e.g., header_includes will be converted to header-includes. This means if you use a custom template, you must use hyphens instead of underscores as separators in variable names in the template.
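The normalization can be pictured with a short base R sketch. It is an illustration of the renaming only, not the package's internal code, and the meta list here is a made-up example.

```r
# metadata as it might be written in YAML, with underscores
meta = list(header_includes = "\\usepackage{microtype}", documentclass = "book")

# underscores in variable names are normalized to hyphens
names(meta) = gsub("_", "-", names(meta), fixed = TRUE)

names(meta)  # "header-includes" "documentclass"
```

A custom template would therefore refer to this value as $header-includes$ rather than $header_includes$.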

The above are variables supported in the default templates. If you use a custom template, you can use arbitrary variable names consisting of alphanumeric characters and hyphens, except for $body$ (which is a reserved name), and your metadata values will be passed to these variables in your template.

Besides metadata variables, the aforementioned Markdown options can also be set in YAML under output -> markdown::*_format -> options, e.g.,

output:
  markdown::html_format:
    options:
      toc: true
      js_highlight:
        package: highlight
        theme: github
        languages: [diff, latex]

See the help page ?markdown::html_format for possible fields in addition to meta and options that can be specified under the format name, e.g.,

output:
  markdown::latex_format:
    latex_engine: xelatex
    keep_md: true
    template: custom-template.tex

3. Applications

The markdown package aims to be lightweight, with a minimal set of features. You can build lightweight applications on top of it. In this section, we introduce some example applications.

3.1 HTML slides

With an extra CSS file and a JS file, you can create lightweight HTML slides:

---
output:
  markdown::html_format:
    meta:
      css: [default, slides]
      js: [slides]
---

You can learn more in vignette('slides', package = 'markdown').

3.2 HTML articles

Similarly, you can write an HTML article with extra CSS and JS. Learn more about it in vignette('article', package = 'markdown').

3.3 External JS/CSS

You can load arbitrary external JS and CSS files via the js and css variables. There are numerous JS libraries and CSS frameworks on the web. Here we will only use the JS/CSS from the repo https://github.com/yihui/misc.js to show a few brief examples.

3.3.1 Tabbed sections

You can load the script tabsets.js and CSS tabsets.css to create tabsets from sections (see the documentation in the repo above).

css: ["@npm/@xiee/utils/css/tabsets.min.css"]
js: ["@npm/@xiee/utils/js/tabsets.min.js"]

3.3.2 Code folding

Code folding can be supported by fold-details.js (see the documentation in the repo above).

js: ["@npm/@xiee/utils/js/fold-details.min.js"]

3.3.3 Right-align a quote footer

You can use the script right-quote.js to right-align a blockquote footer if it starts with an em-dash (---).

js: ["@npm/@xiee/utils/js/right-quote.min.js"]

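3.3.4 Add anchor links to headings

The script heading-anchor.js adds anchor links to section headings. The CSS is necessary only if you want to hide the anchors by default and reveal them on hover.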

css: ["default", "@npm/@xiee/utils/css/heading-anchor.min.css"]
js: ["@npm/@xiee/utils/js/heading-anchor.min.js"]

3.3.5 Style keyboard shortcuts

The script key-buttons.js identifies keys and the CSS styles them, which can be useful for showing keyboard shortcuts.

css: ["default", "@npm/@xiee/utils/css/key-buttons.min.css"]
js: ["@npm/@xiee/utils/js/key-buttons.min.js"]

Of course, you can combine any number of JS scripts and CSS files if you want multiple features.

4. Using RStudio

If you use the RStudio IDE, the Knit button can render your Markdown or R Markdown document to the output format specified in YAML (e.g., markdown::html_format or markdown::latex_format). This requires the *r*markdown package (>= v2.18), although the markdown package itself doesn’t really require rmarkdown.

If you only need to render a document to the HTML format, you can bypass RStudio’s requirement for rmarkdown:

Then restart R, and you will be able to use the Knit button to render your (R) Markdown document with the markdown package alone.

Since the Markdown syntax of markdown can be viewed as a small and strict subset of Pandoc’s Markdown, you can use RStudio’s visual Markdown editor to author documents. Please bear in mind that most common, but not all, Markdown features are supported.

Appendix

A. For rmarkdown Users

The markdown package also provides two internal output formats for compatibility with rmarkdown: markdown:::html_document and markdown:::pdf_document.[4] The purpose is to make it a little easier to switch from rmarkdown to markdown by mapping some rmarkdown output format options to markdown.

For example, for an R Markdown document with the following output format:

output:
  rmarkdown::html_document:
    toc: true
    number_sections: true
    anchor_sections: true
    self_contained: false

You can switch to markdown simply by changing the output format name from rmarkdown::html_document to markdown:::html_document. Internally, the above output format is transformed to:

output:
  markdown::html_format:
    options:
      toc: true
      number_sections: true
      embed_resources: false
    meta:
      css: ["default", "@npm/@xiee/utils/css/heading-anchor.min.css"]
      js: ["@npm/@xiee/utils/js/heading-anchor.min.js"]

Note that not all rmarkdown options are supported, and not even all supported options have exactly the same effects in markdown. The supported options include: toc, toc_depth, number_sections, anchor_sections, code_folding, self_contained, math_method, css, and includes.

B. Technical Notes

B.1 Embedding resources

When https resources need to be embedded (via the embed_resources option), only these elements are considered:

<img src="..." />
<link rel="stylesheet" href="...">
<script src="..."></script>

Background images set in the attribute style="background-image: url(...)" are also considered. If an external CSS file contains url() resources, these resources will also be downloaded and embedded.

  1. Please note that for links and images, their URLs should not contain spaces. If they do, the URLs must be enclosed in <>, e.g., ![alt](<some dir/a subdir/foo.png>).

  2. If you know C, I’ll truly appreciate it if you could help with the LaTeX implementation in GFM: https://github.com/github/cmark-gfm/issues/314

  3. The specific number doesn’t matter, as long as it’s a unique footnote number in the document. For example, the first footnote can be [^100] and the second can be [^64]. Eventually they will appear as [1] and [2]. If you use the RStudio visual editor to edit Markdown documents, the footnote numbers will be automatically generated and updated when new footnotes are inserted before existing footnotes.

  4. The triple-colon ::: means these functions are not exported, which is to avoid name conflicts between the two packages.