openstatsware
Workshop: Good Software Engineering Practice for R Packages
April 18, 2024
This section is based on and adapted from great (Quarto) slides by Louisa Smith, see her course website R Bootcamp (EPI590R) from Northeastern University.
Thanks a lot Louisa!
renv
Only because Quarto is newer and more featured!
knitr processes the code chunks, executes the R code, and inserts the code outputs (e.g., plots, tables) back into the markdown document.
pandoc transforms the markdown document into various output formats.
Some text
Some italic text
Some bold text
[1] 3
Everything within the R chunks has to be valid R.
Chunks run in order, continuously, like a single script.
At the top of your Quarto document, a header written in YAML describes options for the document.
There are a ton of possible options (more below), but importantly, this determines the document output.
For example, to suppress the code printing:
`#| echo: false` tells knitr to exclude the source code from the output.
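A minimal sketch of what such a chunk could look like. The computation is only an illustration and is not taken from the slides. In a `.qmd` file the fence is opened with `{r}` instead of `r`, and the `#|` option lines come first inside the chunk.

```r
#| echo: false
# Chunk options start with `#|` and sit at the very top of the chunk.
# With `echo: false`, only the result appears in the rendered document,
# not these source lines.
x <- c(1, 2)      # illustrative values
total <- sum(x)   # add them up
total             # printed in the output
```

Switching the option to `#| echo: true` (the default in most formats) would show the source lines again, without changing the result.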
Additional chunk options which are often used:
`#| eval: false`: Don't evaluate this chunk! Just print the code.
`#| error: true`: Render this even if the chunk causes an error.
`#| warning: false`: Don't print warnings.
`#| include: false`: Run the chunk, but show neither the code nor its output.
`#| cache: true`: Use the knitr caching mechanism for this chunk.
You can also tell the entire document not to evaluate or print code (so just include the text!) by setting the corresponding execution options in the YAML header at the top.
Careful! YAML is really picky about spacing.
There are lots of different options for the document.
`html:` moves to a new line and the options are indented 2 spaces.

| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
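Cross-references work through chunk labels. The following is a sketch under assumptions: only the labels `tbl-one` and `fig-hist` appear in the slides, while the captions, the use of `head(mtcars, 3)` for the table, and `mpg` as the plotted variable are guesses made for illustration. In a `.qmd` file each part would be its own `{r}` chunk with the `#|` lines at the top.

```r
# --- chunk 1 (separator comment only, not part of the chunk) ---
#| label: tbl-one
#| tbl-cap: "First rows of mtcars"
first_rows <- head(mtcars, 3)   # the three cars shown in the table above
knitr::kable(first_rows)        # render as a table

# --- chunk 2 (separator comment only, not part of the chunk) ---
#| label: fig-hist
#| fig-cap: "Histogram of miles per gallon"
hist(mtcars$mpg)                # assumed variable for the histogram
```

The `tbl-` and `fig-` prefixes matter: Quarto uses them to decide whether a label is numbered as a table or as a figure.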
You can then refer to those with `@tbl-one` and `@fig-hist`, and the Table and Figure ordering will be correct (and linked).
@fig-hist contains a histogram and @tbl-one a table.
gets printed as:
Figure 1 contains a histogram and Table 1 a table.
Along with just regular text, you can also run R code within the text:
There were `r 3 + 4` participants
becomes:
There were 7 participants
You might want to create a list of stats that you want to report in your manuscript:
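A minimal sketch of such a list. The two values are copied from the rendered sentence shown further down. In a real analysis they would be computed from the data, for example with `nrow()` and `mean()` on a hypothetical data frame.

```r
# Values taken from the rendered example sentence; normally these would be
# computed, e.g. n = nrow(data) and mean_age = mean(data$age), where `data`
# is a hypothetical data frame that is not part of the slides.
stats <- list(
  n = 1123L,
  mean_age = 43.5
)
```

Keeping all reported numbers in one list means the text never has to be edited by hand when the data change.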
I can then print these numbers in the text with:
There were `r stats$n` participants with a mean age of `r stats$mean_age`.
which turns into:
There were 1123 participants with a mean age of 43.5.
{renv}
What is {renv}?
{renv} is an R package for managing project dependencies and creating reproducible package environments.
{renv} gives each project its own, private library of R packages.
{renv} records the exact package versions you depend on, and ensures those exact versions are the ones that get installed wherever you go.
{renv} makes it easy to install the packages your project depends on.
Install {renv} (only once):
Initialize a project (only once):
Install packages:
Track dependencies via a “lockfile”:
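Put together, the four steps above correspond roughly to the following calls, run from the R console inside the project directory. This is a sketch and not necessarily the slides' exact code. The package name `dplyr` is only a placeholder.

```r
# 1. Install renv itself (only once per machine)
install.packages("renv")

# 2. Initialize the project (only once per project): sets up the private
#    library, the lockfile and the .Rprofile entry
renv::init()

# 3. Install packages into the private library ("dplyr" is a placeholder)
renv::install("dplyr")

# 4. Record the installed package versions in renv.lock
renv::snapshot()
```

Steps 3 and 4 are the ones that get repeated during normal work on the project.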
Your project `.Rprofile` is updated to include:
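As far as I know, the added entry is a single line that sources the activation script {renv} places in the project. The exact content may differ between {renv} versions, so treat this as a sketch.

```r
# Line added to the project's .Rprofile by renv::init();
# it activates the project's private library when R starts.
source("renv/activate.R")
```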
This is run every time R starts, and does some management of the library paths to make sure that when you call `install.packages("package")` or `library(package)`, it uses the private library.
An `renv.lock` file (really just a text file) is created to store the names and versions of the packages. This is the "lockfile" mentioned above.
renv.lock
{
  "R": {
    "Version": "4.3.0",
    "Repositories": [
      {
        "Name": "CRAN",
        "URL": "https://cran.rstudio.com"
      }
    ]
  },
  "Packages": {
    "R6": {
      "Package": "R6",
      "Version": "2.5.1",
      "Source": "Repository",
      "Repository": "CRAN",
      "Requirements": [
        "R"
      ],
      "Hash": "470851b6d5d0ac559e9d01bb352b4021"
    },
    "base64enc": {
      "Package": "base64enc",
      "Version": "0.1-3",
      "Source": "Repository",
      "Repository": "CRAN",
      "Requirements": [
        "R"
      ],
      "Hash": "543776ae6848fde2f48ff3816d0628bc"
    },
{renv} later
Restore an environment:
Install new packages:
Update the lockfile:
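The three recurring steps above map onto the following calls. This is a sketch, and the package name `ggplot2` is only a placeholder.

```r
# Restore an environment: reinstall the versions recorded in renv.lock
renv::restore()

# Install new packages into the private library ("ggplot2" is a placeholder)
renv::install("ggplot2")

# Update the lockfile so that it matches the current state of the project
renv::snapshot()
```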
{renv}
Share the project's `renv.lock` file with collaborators to ensure consistent environments.
If you use git, you'll need to commit `renv.lock`, `.Rprofile`, `renv/settings.json` and `renv/activate.R`. This is particularly simple because {renv} will create a `.gitignore` for you, and you can just commit all suggested files.
When they run `renv::restore()`, the correct versions of the packages will be installed on their computer.
Remove packages that are no longer used:
Check the status of the project library with respect to the lockfile:
This will tell you to `renv::snapshot()` to add packages you've installed but haven't snapshotted, or `renv::restore()` if you're missing packages you need but which aren't installed.
Update packages which are out-of-date (only checked from their original source):
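The three maintenance tasks above can be sketched as follows. Passing `actions = "unused.packages"` to `renv::clean()` is my assumption about how to target unused packages specifically; the slides may simply have called `renv::clean()`.

```r
# Remove packages from the project library that are no longer used
# (assumption: restrict cleaning to the "unused.packages" action)
renv::clean(actions = "unused.packages")

# Check the project library against the lockfile
renv::status()

# Update out-of-date packages, each from the source it was installed from
renv::update()
```

After `renv::update()`, a call to `renv::snapshot()` records the new versions in the lockfile.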
{renv}
Install all of your package's dependencies as per the `DESCRIPTION` file:
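A sketch of how this could look, assuming a recent {renv} version in which `renv::install()` without arguments installs the dependencies it finds for the project, including those listed in `DESCRIPTION`. That behaviour is my understanding of the current documentation, not something stated on the slide.

```r
# Run from the package's root directory (where DESCRIPTION lives).
# Called without arguments, renv::install() is assumed to pick up the
# dependencies declared in DESCRIPTION.
renv::install()
```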
If you need to test your package with other development versions, use the `Remotes` field and a project-specific library:
To avoid an `R CMD build` performance hit, {renv} will by default create the project-specific library for a package project outside of the package directory.
The location can be customized with the `RENV_PATHS_LIBRARY_ROOT` environment variable.
{renv} (cont'd)
The basic idea is to `renv::restore()` the package environment on the CI machine, and to use the provided cache as best as possible.
Example: GitHub Actions (details are given here)
Exclude the {renv} lockfile and the package folder when building the tarball for CRAN submission.
{renv} should automatically edit `.Rbuildignore` accordingly, but it is good to double check.
The benefits of {renv} are isolation, reproducibility, and portability.
Getting started with {renv}:
Initialize a project with `renv::init()`.
Track dependencies with `renv::snapshot()`.
Restore an environment with `renv::restore()`.
{renv} for dependencies
Add any additional R packages your analysis needs:
Check `renv::status()` and the lockfile - did anything change?
Now use the additional R package in your Quarto document:
Check the status again and record the state with `renv::snapshot()`.
Close the project and confirm that the package is not available anymore.
Open the project and confirm that the package is available.