Sharing common code across analyses

workflowr version 1.4.0

Tim Trice, John Blischak

2019-06-08

During the course of a project, you may want to repeat a similar analysis across multiple R Markdown files. To avoid duplicated code across your files (which is difficult to update), there are multiple strategies you can use to share common code:

  1. To share R code like function definitions, you can put this code in an R script and import it in each file with the function source()

  2. To share common R Markdown text and code chunks, you can use child documents

  3. To share common templates, you can use the function knitr::knit_expand()

Each of these strategies is detailed below, with a special emphasis on how to use them within the workflowr framework. In order to source scripts or use child documents, it is suggested you use the here package, which helps to locate the root directory of your project regardless of the directory your script or analysis file is, making sourcing documents cleaner.

Overview of directories

First, a quick overview of the directories in a workflowr project. This is critical for importing these shared files.

In a standard R Markdown file, the code is executed in the directory where the R Markdown file is saved. Thus any paths to files in the R Markdown file should be relative to this directory. However, the directory where the code is executed, referred to as the “knit directory” in the workflowr documentation, can be configured. The default for a new workflowr project is to run the code in the root of the workflowr project (this is defined in the file _workflowr.yml; see ?wflow_html for configuration details). Thus any filepaths should be relative to the root of the project. As an example, if you have shared R functions defined in the file ~/Desktop/myproject/code/common.R, the relative filepath from the root of the project directory would be "code/common.R".

Share R code with source()

If you have R code you want to re-use across multiple R Markdown files, the most straightforward option is to save this code in an R script, e.g. code/functions.R.

Then in each R Markdown file that needs to use the code defined in that file, you can use source() to load it. If the code in your workflowr project is executed in the root of the project directory (which is the default behavior for new workflowr projects), then you would add the following chunk:

```{r shared-code}
source("code/functions.R")
```

On the other hand, if you have changed the value of knit_root_dir in the file _workflowr.yml, you need to ensure that the filepath to the R script is relative to this directory. For example, if you set knit_root_dir: "analysis", you would use this code chunk:

```{r shared-code}
source("../code/functions.R")
```

To avoid having to figure out the correct relative path (or having to update it in the future if you were to change knit_root_dir), you can use here::here() as it is always based off the project root. Additionally, it will help readability when using child documents as discussed below.

```{r shared-code}
source(here::here("code/functions.R"))
```

Share child documents with chunk option

To share text and code chunks across R Markdown files, you can use child documents, a feature of the knitr package.

Here is a example of a simple R Markdown file that you can use to test this feature. Note that it contains an H2 header, some regular text, and a code chunk.

## Header in child document

Text in child document.

```{r child-code-chunk}
str(mtcars)
```

You can save this child document anywhere in the workflowr project with one critical exception: it cannot be saved in the R Markdown directory (analysis/ by default) with the file extension .Rmd or .rmd. This is because workflowr expects every R Markdown file in this directory to be a standalone analysis that has a 1:1 correspondence with an HTML file in the website directory (docs/ by default). We recommend saving child documents in a subdirectory of the R Markdown directory, e.g. analysis/child/ex-child.Rmd.

To include the content of the child document, you can reference it using here::here() in your chunk options.

```{r parent, child = here::here("analysis/child/ex-child.Rmd")}
```

However, this fails if you wish to include plots in the code chunks of the child documents. It will not generate an error, but the plot will be missing 1. In a situation like this, you would want to generate the plot within the parent R Markdown file or use knitr::knit_expand() as described in the next section.

Share templates with knit_expand()

If you need to pass parameters to the code in your child document, then you can use knitr::knit_expand(). Also, this strategy has the added benefit that it can handle plots in the child document. However, this requires setting knit_root_dir: "analysis" in the file _workflowr.yml for plots to work properly.

Below is an example child document with one variable to be expanded: {{title}} refers to a species in the iris data set. The value assigned will be used to filter the iris data set and label the section, chunk, and plot. We will refer to this file as analysis/child/iris.Rmd.


## {{title}}

```{r plot_{{title}}}
iris %>%
  filter(Species == "{{title}}") %>%
  ggplot() +
  aes(x = Sepal.Length, y = Sepal.Width) +
  geom_point() +
  labs(title = "{{title}}")
```

To generate a plot using the species "setosa", you can expand the child document in a hidden code chunk:

```{r, include = FALSE}
src <- knitr::knit_expand(file = here::here("analysis/child/iris.Rmd"),
                          title = "setosa")
```

and then later knit it using an inline code expression2:

`r knitr::knit(text = unlist(src))`

The convenience of using knitr::knit_expand() gives you the flexibility to generate multiple plots along with custom headers, figure labels, and more. For example, if you want to generate a scatter plot for each Species in the iris datasets, you can call knitr::knit_expand() within a lapply() or purrr::map() call:

```{r, include = FALSE}
src <- lapply(
  sort(unique(iris$Species)), 
  FUN = function(x) {
    knitr::knit_expand(
      file = here::here("analysis/child/iris.Rmd"), 
      title = x
    )
  }
)
```

This example code loops through each unique iris$Species and sends it to the template as the variable title. title is inserted into the header, the chunk label, the dplyr::filter(), and the title of the plot. This generates three plots with custom plot titles and labels while keeping your analysis flow clean and simple.

Remember to insert knitr::knit(text = unlist(src)) in an inline R expression as noted above to knit the code in the desired location of your main document.

Read the knitr::knit_expand() vignette for more information.

vignette("knit_expand", package = "knitr")

  1. The reason for this is very technical and requires more understanding of how workflowr is implemented than is necessary to use it effectively in the majority of cases. Whenever workflowr builds an R Markdown file, it first copies it to a temporary directory so that it can inject extra code chunks that implement some of its reproducibility features. The figures in the child documents end up being saved there and then lost.

  2. Before calling knitr::knit(), you’ll need to load the dplyr and ggplot2 packages to run the code in this example child document.