1 Introduction

Short Course: Good Software Engineering Practice for R Packages

Daniel Sabanés Bové and Jack Talboys

October 10, 2025

Disclaimer




Any opinions expressed in this presentation and on the following slides are solely those of the presenters and not necessarily those of their employers.

Daniel

  • Ph.D. in Statistics from University of Zurich, Bayesian Model Selection
  • Biostatistician at Roche for 5 years, Data Scientist at Google for 2 years, Statistical Software Engineer at Roche for the last 4 years
  • Co-founder of RCONIS
  • Multiple R packages on CRAN and Bioconductor, co-wrote book on Likelihood and Bayesian Inference, chair of openstatsware
  • Feel free to connect

Jack

  • BSc in Statistics from the University of Bath
  • Data Scientist with Mango Solutions/Ascent, a data consultancy, for 5 years
  • Joined Novartis as a Software Developer in April 2024, part of the open-source enablement team
  • Day-to-day: helping study teams use open-source software, through direct support or by building tools

openstatsware

  • openstatsware.org
  • Since: 19 August 2022 - over 3 years now!
  • Where: American Statistical Association (ASA) Biopharmaceutical Section (BIOP), European Federation of Statisticians in the Pharmaceutical Industry (EFSPI)
  • Who: Currently more than 50 statisticians from more than 30 organizations
  • What: Engineer packages and spread best practices

What you will learn here

  • Understand the basic structure of an R package
  • Create your own R package
  • Learn & apply fundamentals of quality control for R packages
  • Learn how to make an R package available to others

Program outline

Time                 Topic
14:00 - 14:30 CEST   Introduction and outline
14:30 - 15:30 CEST   R packages, what are they? + practical
15:30 - 16:30 CEST   Package quality + exercise
16:30 - 17:15 CEST   Publication + practical
17:15 - 17:30 CEST   Conclusion

House-keeping

What you will need

  • Local R development environment with Rtools/R/RStudio IDE
  • Install additional R packages using the installation script
  • Curiosity 🦝
  • Positive attitude 😄

Speed intros and what would you like to learn?

  • Name? 🫡
  • Organization? 🏢
  • Motivation for this workshop / what you would like to learn 🧠

What do we mean by GSWEP4R (Good Software Engineering Practice for R)?

  • Applying the concept of “Good XYZ Practice” to software engineering (SWE) with R
  • Improve quality and longevity of R code/packages
  • Not a universal standard; we share our perspectives
  • Collection of best practices
  • Do not reinvent the wheel: learn from the community

Why care about GSWEP4R?

  • R is one of the most successful statistical programming languages
  • R is a powerful yet complex ecosystem
    • Core component: R packages
    • Mature analysts: users & contributors
    • Deep understanding crucial, even to just assess quality
  • Analyses increasingly require complex scripts/programs
  • The concepts are applicable to other languages, too (Python, Julia, etc.)

Start small - from script to package

  1. Encapsulate behavior (functions)
  2. Avoid global state/variables
  3. Adopt consistent coding style
  4. Document well
  5. Add test cases
  6. Refactor and optimize code
  7. Version your code
  8. Share as ‘bundle’

⇝ R package
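As a minimal sketch of steps 1, 2, 4 and 5 above, the snippet below turns a one-off calculation into a documented function with a test case, using base R only. The function name `mean_ci` and its arguments are hypothetical and chosen purely for illustration; they do not come from the course material.

```r
# Script style (before): the result silently depends on a global variable.
# conf_level <- 0.95
# half_width <- qnorm(1 - (1 - conf_level) / 2) * sd(x) / sqrt(length(x))

# Package style (after): behavior is encapsulated in a function (step 1),
# all inputs are explicit arguments (step 2), and the roxygen-style
# comments document the interface (step 4).

#' Normal-approximation confidence interval for a mean
#'
#' @param x numeric vector with at least two observations.
#' @param conf_level confidence level, a single number in (0, 1).
#' @return Named numeric vector with elements `lower` and `upper`.
mean_ci <- function(x, conf_level = 0.95) {
  stopifnot(
    is.numeric(x), length(x) >= 2,
    conf_level > 0, conf_level < 1
  )
  half_width <- qnorm(1 - (1 - conf_level) / 2) * sd(x) / sqrt(length(x))
  c(lower = mean(x) - half_width, upper = mean(x) + half_width)
}

# Step 5: a minimal test case. The interval must be centered on the
# sample mean and must contain it.
ci <- mean_ci(c(1, 2, 3, 4, 5))
stopifnot(
  isTRUE(all.equal(mean(ci), 3)),
  ci[["lower"]] < 3,
  ci[["upper"]] > 3
)
```

Inside a package, the function would typically live in a file under `R/` and the test under `tests/`, so that both travel together with the code.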

The R package ecosystem - huge success

Pharma perspective: GxP + R =

  • Core infrastructure packages only through industry
  • Quality, burden sharing: open-source pharmaverse and openstatsware
  • Open methodological packages can de-risk innovative methods
  • R packages make (statistical/methodological) code
    • testable (with documented evidence thereof, 21 CFR Part 11)
    • reusable
    • shareable
    • easier to document
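To make "reusable" and "shareable" a little more concrete: a package is, at its core, a directory with a conventional layout and machine-readable metadata. The sketch below builds such a directory by hand in a temporary folder, using base R only. The package name `demopkg`, the function `add_one` and all metadata values are hypothetical placeholders, and in practice helper tools would usually generate these files.

```r
# Create a throwaway package directory with an R/ subfolder for the code.
pkg_dir <- file.path(tempdir(), "demopkg")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION holds the package metadata (name, version, license, ...).
writeLines(
  c(
    "Package: demopkg",
    "Title: Hypothetical Demo Package",
    "Version: 0.0.1",
    "Description: Illustrates the minimal layout of an R package.",
    "License: GPL-3"
  ),
  file.path(pkg_dir, "DESCRIPTION")
)

# NAMESPACE declares which functions are visible to users of the package.
writeLines("export(add_one)", file.path(pkg_dir, "NAMESPACE"))

# The actual code lives in files under R/.
writeLines("add_one <- function(x) x + 1", file.path(pkg_dir, "R", "add_one.R"))

# Inspect the resulting layout and read the metadata back.
pkg_files <- list.files(pkg_dir, recursive = TRUE)
desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
print(pkg_files)
print(desc[1, "Version"])
```

Because the layout and metadata are standardized, tools for checking, testing, documenting and installing can operate on any package in the same way.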

Questions, Comments?

License information