10 Reproducibility

openstatsware Workshop: Good Software Engineering Practice for R Packages

April 18, 2024

Acknowledgments

This section is based on and adapted from great (Quarto) slides by Louisa Smith, see her course website R Bootcamp (EPI590R) from Northeastern University.

Thanks a lot Louisa!

Overview

Literate programming: Quarto
Package environment: renv
Exercise on how to use them together

Literate Programming: Quarto

What is Quarto?

Format of a book or pamphlet produced from full sheets printed with eight pages of text, four to a side, then folded twice to produce four leaves (Wikipedia)

An open-source scientific and technical publishing system (Quarto.org)
- Integrates text, code, and output
- Can create multiple different types of products (documents, slides, websites, books)

Why not R Markdown?

Only because Quarto is newer and more featured!

Almost everything you already know how to do in R Markdown you can do in Quarto, and more!
- See a comparison here
All of these slides, website, etc. are all made in Quarto.
If you know and love R Markdown, you can keep using it
- There are no plans for deprecation of R Markdown

Quarto workflow

Create a Quarto document
Write code (in R, Python, Julia, or Observable JavaScript)
Write text (with markdown syntax)
Repeat 2-3 in whatever order you want
Render

How does the rendering work?

knitr processes the code chunks, executes the R code, and inserts the code outputs (e.g., plots, tables) back into the markdown document
pandoc transforms the markdown document into various output formats

Text and code…

# My header

Some text

Some *italic text*

Some **bold text**

- Eggs
- Milk

```{r}
x <- 3
x
```

… becomes …

My header

Some text

Some italic text

Some bold text

Eggs
Milk

[1] 3

R chunks

Everything within the R chunks has to be valid R.

```{r}
x <- 3
```

```{r}
x + 4
```

[1] 7

Chunks run in order, continuously, like a single script.

YAML

At the top of your Quarto document, a header written in yaml describes options for the document:

---
title: "My document"
author: Louisa Smith
format: html
---

There are a ton of possible options (more below), but importantly, this determines the document output.

Output

Chunk options

For example, to suppress the code printing:

```{r}
#| echo: false
2 * 2
```

#| echo: false tells knitr to exclude the source code from the output.

Chunk options

Additional chunk options which are often used:

#| eval: false: Don’t evaluate this chunk! Just print the code.
#| error: true: Render this even if the chunk causes an error.
#| warning: false: Don’t print warnings.
#| include: false: Suppresses all output from the code block.
#| cache: true: Use knitr caching mechanism for this chunk.

Document options

You can tell the entire document not to evaluate or print code (so just include the text!) at the top:

---
title: "My document"
author: Louisa Smith
format: html
execute:
  eval: false
  echo: false
---

Careful! YAML is really picky about spacing.

Document options

There are lots of different options for the document.

For example, you can choose a theme:

---
format:
  html:
    theme: yeti
---

Remember the pickiness: when you have a format option, html: moves to a new line and the options are indented 2 spaces

Chunks can produce figures and tables

```{r}
#| label: tbl-one
#| tbl-cap: "This is a great table"
knitr::kable(mtcars[1:3,])
```

Table 1: This is a great table
	mpg	cyl	disp	hp	drat	wt	qsec	vs	am	gear	carb
Mazda RX4	21.0	6	160	110	3.90	2.620	16.46	0	1	4	4
Mazda RX4 Wag	21.0	6	160	110	3.90	2.875	17.02	0	1	4	4
Datsun 710	22.8	4	108	93	3.85	2.320	18.61	1	1	4	1

Chunks can produce figures or tables

```{r}
#| label: fig-hist
#| fig-cap: "This is a histogram"
hist(rnorm(100))
```

Figure 1: This is a histogram

Cross-referencing

You can then refer to those with @tbl-one and @fig-hist and the Table and Figure ordering will be correct (and linked)

@fig-hist contains a histogram and @tbl-one a table.

gets printed as:

Figure 1 contains a histogram and Table 1 a table.

Inline R

Along with just regular text, you can also run R code within the text:

There were `r 3 + 4` participants

becomes:

There were 7 participants

Inline stats

You might want to create list of stats that you want to report in your manuscript:

I can then print these numbers in the text with:

There were `r stats$n` participants with a mean age of `r stats$mean_age`.

which turns into:

There were 1123 participants with a mean age of 43.5.

Package Environment: `{renv}`

What is `{renv}`?

{renv} is an R package for managing project dependencies and creating reproducible package environments.

Benefits of using `{renv}`

Isolation: Installing a new or updated package for one project won’t break your other projects, and vice versa. That’s because {renv} gives each project its own, private, library of R packages.
Reproducibility: {renv} records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.
Portability: Easily transport your projects from one computer to another, even across different platforms. {renv} makes it easy to install the packages your project depends on.

Getting Started with `{renv}`

Install {renv} (only once):
```
install.packages("renv")
```
Initialize a project (only once):
```
renv::init()
```

Install packages:

install.packages("other_package")
# Below only works because we use `renv` here:
install.packages("github_user/github_package")

Track dependencies via a “lockfile”:
```
renv::snapshot()
```

Behind the scenes

Your project .Rprofile is updated to include:
```
source("renv/activate.R")
```
This is run every time R starts, and does some management of the library paths to make sure when you call install.packges("package") or library(package) it uses the private library
An renv.lock file (really just a text file) is created to store the names and versions of the packages. This is the “lockfile” mentioned above.

`renv.lock`

{
  "R": {
    "Version": "4.3.0",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://cran.rstudio.com"
      }
    ]
  },
  "Packages": {
    "R6": {
      "Package": "R6",
      "Version": "2.5.1",
      "Source": "Repository",
      "Repository": "CRAN",
      "Requirements": [
        "R"
      ],
      "Hash": "470851b6d5d0ac559e9d01bb352b4021"
    },
    base64enc": {
      "Package": "base64enc",
      "Version": "0.1-3",
      "Source": "Repository",
      "Repository": "CRAN",
      "Requirements": [
        "R"
      ],
      "Hash": "543776ae6848fde2f48ff3816d0628bc"
    },

Using `{renv}` later

Restore an environment:
```
renv::restore()
```
Install new packages:
```
install.packages("other_package")
```
Update the lockfile:
```
renv::snapshot()
```

Collaboration with `{renv}`

Share the project’s renv.lock file with collaborators to ensure consistent environments
- With git, you’ll need to commit renv.lock, .Rprofile, renv/settings.json and renv/activate.R. This is particularly simple because {renv} will create a .gitignore for you, and you can just commit all suggested files.
When they run renv::restore(), the correct versions of the packages will be installed on their computer
```
renv::restore()
```

Other helpful functions

Remove packages that are no longer used:
```
renv::clean()
```
Check the status of the project library with respect to the lockfile:
```
renv::status()
```
This will tell you to renv::snapshot() to add packages you’ve installed but haven’t snapshotted, or renv::restore() if you’re missing packages you need but which aren’t installed.
Update packages which are out-of-date (only checked from their original source):
```
renv::update()
```

Package Development with `{renv}`

Install all of your package’s dependencies as per DESCRIPTION file:
```
renv::install()
```
If you need to test your package with other development versions, use Remotes field and a project specific library:
```
Remotes:
  r-lib/ggplot2
```
In order to avoid R CMD build performance hit, by default, {renv} will create a package project specific library outside of the directory.
- Exact location may be controlled with environment variable RENV_PATHS_LIBRARY_ROOT

Package Development with `{renv}` (cont’d)

Continuous integration (CI) is well supported
- Basic idea is to renv::restore() the package environment on the CI machine, and use provided cache as best as possible
- Example: GitHub Actions (details are given here)
```
steps:
- uses: actions/checkout@v3
- uses: r-lib/actions/setup-r@v2
- uses: r-lib/actions/setup-renv@v2
```
Ignore the {renv} lockfile and the package folder when building the tarball for CRAN submission
- {renv} should automatically edit .Rbuildignore accordingly, just good to double check

Conclusion

{renv} benefits are isolation, reproducibility, and portability.

Getting started with {renv}:

Initialize a project using renv::init().
Install packages and then save with renv::snapshot().
Restore later or elsewhere with renv::restore().

Exercise

Exercise 1: Create a Quarto Project with `{renv}`

Create a new Quarto project via File > New Project > New Directory > Quarto Project in RStudio.
Enable the use of {renv}
Create Project
Write a short introduction and perform a simple analysis
Render the document

Exercise 2: Use `{renv}` for dependencies

Add any additional R packages your analysis needs:

install.packages("RQuantLib") # Example non-standard package

Check renv::status() and the lockfile - did anything change?
Now use the additional R package in your Quarto document:
```
library("RQuantLib")
```
Check the status again and record the state with renv::snapshot()
Close the project and confirm that the package is not available anymore
Open the project and confirm that the package is available

License Information

Creators (initial authors): Louisa Smith , see her course website R Bootcamp (EPI590R) from Northeastern University
In the current version, changes were done by (later author): Daniel Sabanes Bove
This work is licensed under the Creative Commons Attribution-Noncommercial 4.0 International License.
The source files are hosted at github.com/RCONIS/workshop-r-swe-zrh, which is forked from the original version at github.com/openpharma/workshop-r-swe-mtl.
Important: To use this work you must provide the name of the creators (initial authors), a link to the material, a link to the license, and indicate if changes were made

10 Reproducibility

Acknowledgments

Overview

Literate Programming: Quarto

What is Quarto?

Why not R Markdown?

Quarto workflow

How does the rendering work?

Text and code…

… becomes …

My header

R chunks

YAML

Output

Chunk options

Chunk options

Document options

Document options

Chunks can produce figures and tables

Chunks can produce figures or tables

Cross-referencing

Inline R

Inline stats

Package Environment: {renv}

What is {renv}?

Benefits of using {renv}

Getting Started with {renv}

Behind the scenes

renv.lock

Using {renv} later

Collaboration with {renv}

Other helpful functions

Package Development with {renv}

Package Development with {renv} (cont’d)

Conclusion

Exercise

Exercise 1: Create a Quarto Project with {renv}

Exercise 2: Use {renv} for dependencies

License Information

Package Environment: `{renv}`

What is `{renv}`?

Benefits of using `{renv}`

Getting Started with `{renv}`

`renv.lock`

Using `{renv}` later

Collaboration with `{renv}`

Package Development with `{renv}`

Package Development with `{renv}` (cont’d)

Exercise 1: Create a Quarto Project with `{renv}`

Exercise 2: Use `{renv}` for dependencies