0 Preview

Short Course: Good Software Engineering Practice for R Packages

Daniel Sabanés Bové and Jack Talboys

July 1, 2025

Disclaimer

Any opinions expressed in this presentation and on the following slides are solely those of the presenter and not necessarily those of their employers.

Today

What are we going to do today?

Give you a preview of the course “Good Software Engineering Practice for R Packages” at the SnB 2025 conference in Paris, France!

But no worries if you can’t make it to the conference, we will share all materials online …

Plus you will learn some awesome things right here, today!

Where are we from? openstatsware

  • openstatsware.org
  • Since: 19 August 2022 - already 3 years now!
  • Where: American Statistical Association (ASA) Biopharmaceutical Section (BIOP), European Federation of Statisticians in the Pharmaceutical Industry (EFSPI)
  • Who: Currently more than 60 statisticians from more than 30 organizations
  • What: Engineer packages and spread best practices

Daniel

  • Ph.D. in Statistics from University of Zurich, Bayesian Model Selection
  • Biostatistician at Roche for 5 years, Data Scientist at Google for 2 years, Statistical Software Engineer at Roche for the last 4 years
  • Co-founder of RCONIS mid 2024 - it has already been 1 year now!
  • Multiple R packages on CRAN and Bioconductor, co-wrote book on Likelihood and Bayesian Inference, chair of openstatsware
  • Feel free to connect

Jack

  • BSc in Statistics from the University of Bath
  • Data Scientist with Mango Solutions/Ascent, a Data consultancy, for 5 years
  • Joined Novartis as a Software Developer in April 2024, part of the open-source enablement team
  • Day-to-day work is helping study teams use open-source software, through direct support or by building tools!

What do we mean by GSWEP4R*?

  • Applying the concept of “Good XYZ Practice” to software engineering (SWE) with R
  • Improve quality and longevity of R code/packages
  • Not a universal standard; we share our perspectives
  • Collection of best practices
  • Do not reinvent the wheel: learn from the community

Why care about GSWEP4R?

  • R is one of the most successful statistical programming languages
  • R is a powerful yet complex ecosystem
    • Core component: R packages
    • Mature analysts: users & contributors
    • Deep understanding crucial, even to just assess quality
  • Analyses increasingly require complex scripts/programs
  • The concepts are applicable to other languages, too (Python, Julia, etc.)

What you will learn in the short course

  • Understand the basic structure of an R package
  • Create your own R package
  • Learn about & apply a professional development workflow
  • Learn & apply fundamentals of quality control for R packages
  • Learn how to make an R package available to others

Start small - from script to package

  1. Encapsulate behavior (functions)
  2. Avoid global state/variables
  3. Adopt consistent coding style
  4. Document well
  5. Add test cases
  6. Refactor and optimize code
  7. Version your code
  8. Share as an R package
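
A minimal sketch of steps 1, 2 and 5, using only base R. The helper name `mean_ci` and the example data are hypothetical and chosen only for illustration; they are not part of the course material.

```r
# Script style (for contrast): the result depends on a global variable.
# conf_level <- 0.95
# ci <- t.test(x, conf.level = conf_level)$conf.int

# Function style: all inputs are explicit arguments, no global state.
# `mean_ci` is a hypothetical helper name, not taken from an existing package.
mean_ci <- function(x, conf_level = 0.95) {
  stopifnot(is.numeric(x), length(x) >= 2, !anyNA(x))
  stopifnot(is.numeric(conf_level), conf_level > 0, conf_level < 1)
  n <- length(x)
  quantile <- stats::qt(1 - (1 - conf_level) / 2, df = n - 1)
  half_width <- quantile * stats::sd(x) / sqrt(n)
  c(
    lower = mean(x) - half_width,
    estimate = mean(x),
    upper = mean(x) + half_width
  )
}

# A first test case: the interval must contain the estimate
# and must widen when the confidence level increases.
x <- c(4.1, 5.3, 4.8, 5.9, 5.0)
ci_95 <- mean_ci(x)
ci_99 <- mean_ci(x, conf_level = 0.99)
stopifnot(
  ci_95[["lower"]] < ci_95[["estimate"]],
  ci_95[["estimate"]] < ci_95[["upper"]],
  ci_99[["upper"]] - ci_99[["lower"]] > ci_95[["upper"]] - ci_95[["lower"]]
)
```

Because the confidence level is an argument rather than a global variable, the same call always gives the same result, which is what makes the function easy to test and later move into a package.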

Structure of an R package

  • The source of an R package is a directory with a specific structure:
    • DESCRIPTION file with metadata
    • NAMESPACE file to define API
    • R/ directory with R scripts - the “meat” of the package!
    • man/ directory with documentation
    • tests/ directory with test scripts (optional but important!)
    • other optional directories (data/, vignettes/, inst/, etc.)
  • The package can be built into a single file (tarball) for CRAN submission, email etc.
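
To make the directory structure concrete, here is a sketch that writes a bare-bones skeleton by hand into a temporary directory, using base R only. The package name `mypkg`, the function `hello` and all field values are hypothetical. The `man/` and `tests/` directories are left out to keep the sketch short, and a package intended for CRAN needs further DESCRIPTION fields such as author information.

```r
# Build a minimal package skeleton by hand in a temporary directory.
# The package name "mypkg" and all field values are hypothetical.
pkg_dir <- file.path(tempdir(), "mypkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION: metadata in "Field: value" (DCF) format.
writeLines(
  c(
    "Package: mypkg",
    "Title: What the Package Does",
    "Version: 0.0.1",
    "Description: A longer description of what the package does.",
    "License: GPL-3"
  ),
  file.path(pkg_dir, "DESCRIPTION")
)

# NAMESPACE: the exported functions form the public API.
writeLines("export(hello)", file.path(pkg_dir, "NAMESPACE"))

# R/: the actual code.
writeLines(
  'hello <- function(name = "world") paste0("Hello, ", name, "!")',
  file.path(pkg_dir, "R", "hello.R")
)

# Inspect the result.
pkg_files <- list.files(pkg_dir, recursive = TRUE)
pkg_meta <- read.dcf(file.path(pkg_dir, "DESCRIPTION"), fields = c("Package", "Version"))
print(pkg_files)
print(pkg_meta)
```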

How to start creating an R package?

  • Use the usethis package to create a new package
    • usethis::create_package("path/to/package")
  • Use RStudio to create a new package project
    • File > New Project > New Directory > R Package
  • Creates the required directory structure and files for you!
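
As a sketch of the usethis route, the call below creates a package in a temporary directory. The name `demopkg` and the location are hypothetical; the `requireNamespace()` guard is only there so that the snippet does nothing if usethis is not installed.

```r
# Hypothetical location and name; a temporary directory keeps the example tidy.
pkg_path <- file.path(tempdir(), "demopkg")

if (requireNamespace("usethis", quietly = TRUE)) {
  # open = FALSE avoids switching to the new project in an interactive session.
  usethis::create_package(pkg_path, open = FALSE)
  # Show the generated files (DESCRIPTION, NAMESPACE, R/, ...).
  print(list.files(pkg_path, all.files = TRUE, no.. = TRUE))
}
```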

Which steps should I take?

  • We recommend starting with a “design doc”
    • What is the purpose of the package? (objectives)
    • What is the user interface? (conceptual design)
    • What are the main functions? (prototype code)
  • Align with your team and stakeholders
  • Then start with the actual package implementation
    • Build up function by function
    • Always include documentation and tests for each function
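
A sketch of what one documented function might look like, assuming the documentation is written as roxygen2 comments (the `#'` lines), from which the files in `man/` are generated. The function `coef_var` is a hypothetical example and not part of any package mentioned here.

```r
#' Compute the coefficient of variation
#'
#' @param x A numeric vector with at least two values and a non-zero mean.
#' @param na_rm Whether to remove missing values before computing.
#'
#' @return A single numeric value: the standard deviation divided by the mean.
#' @export
#'
#' @examples
#' coef_var(c(2, 4, 6))
coef_var <- function(x, na_rm = FALSE) {
  stopifnot(is.numeric(x), is.logical(na_rm), length(na_rm) == 1)
  if (na_rm) {
    x <- x[!is.na(x)]
  }
  stopifnot(length(x) >= 2)
  m <- mean(x)
  if (isTRUE(m == 0)) {
    stop("`x` must have a non-zero mean.")
  }
  stats::sd(x) / m
}
```

Writing the documentation next to the code, and validating inputs at the top of the function, keeps each function reviewable on its own as the package grows.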

How can I ensure a high quality of my R package?

  • Use a consistent coding style
    • e.g. snake_case or camelCase for function names
  • Use a consistent format
    • Use air to format the code automatically
  • Use lintr package to check for common issues
  • Test all of your functions, typically with the testthat package
  • Include user-facing, long-form documentation (vignettes) to explain how to use the package
  • Follow the openstatsguide checklist
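
A minimal sketch of unit tests with testthat. The function `scale_to_unit` is hypothetical and defined inline so that the snippet is self-contained; in a package it would live in `R/`, and the tests in a file under `tests/testthat/`. The `requireNamespace()` guard only skips the tests if testthat is not installed.

```r
# Hypothetical function under test (snake_case name, as one consistent style).
scale_to_unit <- function(x) {
  stopifnot(is.numeric(x), length(x) >= 1, !anyNA(x))
  rng <- range(x)
  if (rng[1] == rng[2]) {
    stop("`x` must contain at least two distinct values.")
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# In a package these tests would go into tests/testthat/test-scale_to_unit.R.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("scale_to_unit maps values onto [0, 1]", {
    result <- scale_to_unit(c(10, 15, 20))
    testthat::expect_equal(result, c(0, 0.5, 1))
  })

  testthat::test_that("scale_to_unit rejects constant input", {
    testthat::expect_error(scale_to_unit(c(3, 3, 3)), "distinct")
  })
}
```

Testing both the expected result and the expected failure documents the intended behavior and protects it during later refactoring.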

How to publish my R package?

  • Start with a GitHub repository
  • Add a pkgdown website to showcase the package
  • Use devtools::check() to check the package for common issues
  • Use rhub::rhub_check() (rhub version 2; formerly rhub::check()) to check the package on different platforms
  • Once ready, submit to CRAN (make sure to read the CRAN policies)
  • In addition or alternatively, create your own R-universe to host the package

Before and after the course

What difference will the course make?

Before

  • Unstructured scripts
  • Hard to reproduce results
  • Difficult to share or reuse code
  • No automated tests
  • Inconsistent style and documentation
  • Challenging to maintain or extend

After

  • Organized R packages
  • Reproducible workflows
  • Easy to share and collaborate
  • Automated testing and quality checks
  • Consistent style and clear documentation
  • Easier maintenance and future development

The conference

  • 10th Statistics & Biopharmacy Conference (SnB 2025)
  • 8 to 10 October 2025, Paris, France
    • Venue is “Les salons de l’Aveyron”
  • Invited speakers give keynote presentations, see the list
  • Contributed talks and posters, see the program
  • Poster and wine session
  • Conference dinner

How to register for the short course?

  • Register for the conference and/or this short course here
  • Short course fee is 150 EUR (early bird until 15 July 2025, later 200 EUR)

There is a second short course!

  • The second course is “Applied Modelling in Drug Development - Flexible regression modelling in Stan via brms” by Sebastian Weber and Lukas Widmer (Novartis) on 8 October 2025, also highly recommended!
  • You get a discount if you book two short courses! (2 for 250 EUR)
  • Limited number of places available, so register early!

Questions, Comments?

License information