1 Introduction

Tutorial: Good Software Engineering Practice for R Packages

Daniel Sabanés Bové and Friedrich Pahlke

July 8, 2024

Disclaimer

Any opinions expressed in this presentation and on the following slides are solely those of the presenter and not necessarily those of their employers.

Daniel

  • Ph.D. in Statistics from University of Zurich, Bayesian Model Selection
  • Biostatistician at Roche for 5 years, Data Scientist at Google for 2 years, Statistical Software Engineer at Roche for the last 4 years
  • Co-founder of RCONIS
  • Multiple R packages on CRAN and Bioconductor, co-wrote book on Likelihood and Bayesian Inference, chair of openstatsware
  • Feel free to connect

Friedrich

  • Self-employed consultant for computer science, data science, and biostatistics since 2008
  • Co-founder and CEO of RPACT, a company developing the formally validated R package rpact with 28 releases on CRAN since 2018
  • Co-founder of RCONIS
  • Trained software architect; R programmer since 2004; R Shiny developer since 2019
  • Feel free to connect on LinkedIn or GitHub

openstatsware

  • openstatsware.org
  • Since: 19 August 2022 - almost 2 years now!
  • Where: American Statistical Association (ASA) Biopharmaceutical Section (BIOP), European Federation of Statisticians in the Pharmaceutical Industry (EFSPI)
  • Who: Currently more than 50 statisticians from more than 30 organizations
  • What: Engineer packages and spread best practices

What you will learn here

  • Understand the basic structure of an R package
  • Create your own R package
  • Learn about & apply a professional development workflow
  • Learn & apply the fundamentals of quality control for R packages
  • Learn how to make an R package available to others

Program outline

Time                 Topic
14:00 - 14:30 CEST   Introduction and outline
14:30 - 15:15 CEST   R packages, what are they? + practical
15:15 - 15:45 CEST   Workflow for creating R packages + practical
15:45 - 16:30 CEST   Package quality + exercise
16:30 - 17:15 CEST   Publication + practical
17:15 - 17:30 CEST   Conclusion

House-keeping

What you will need

  • Local R development environment with
    • git
    • Rtools/R/RStudio IDE
  • Install additional R packages using the installation script
  • Curiosity 🦝
  • Positive attitude 😄
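
A quick way to check the local setup before the practicals is sketched below. It uses base R only. The package names in `needed` are an assumption for illustration; the installation script remains the authoritative list.

```r
# Report the R version in use.
r_version <- R.version.string

# Sys.which() returns "" when an executable is not found on the PATH.
git_path <- unname(Sys.which("git"))
has_git <- nzchar(git_path)

# Hypothetical package list (assumption); see the installation script
# for the packages actually required.
needed <- c("devtools", "usethis", "roxygen2", "testthat")
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)

message(r_version)
message("git found: ", has_git)
if (any(!installed)) {
  message("Missing packages: ", paste(needed[!installed], collapse = ", "))
} else {
  message("All listed packages are installed.")
}
```

If git is reported as missing, it may simply not be on the `PATH` of the R session, so checking from a terminal as well is worthwhile.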

Speed intros and what would you like to learn?

  • Name? 😀
  • Organization? 🏢
  • Motivation for this workshop / what would you like to learn? 🧠

What do we mean by GSWEP4R (Good Software Engineering Practice for R)?

  • Applying the concept of “Good XYZ Practice” to software engineering (SWE) with R
  • Improve quality and longevity of R code/packages
  • Not a universal standard; we share our perspectives
  • Collection of best practices
  • Do not reinvent the wheel: learn from the community

Why care about GSWEP4R?

  • R is one of the most successful statistical programming languages
  • R is a powerful yet complex ecosystem
    • Core component: R packages
    • Mature analysts: users & contributors
    • Deep understanding crucial, even to just assess quality
  • Analyses increasingly require complex scripts/programs
  • The concepts are applicable to other languages, too (Python, Julia, etc.)

Start small - from script to package

  1. Encapsulate behavior (functions)
  2. Avoid global state/variables
  3. Adopt consistent coding style
  4. Document well
  5. Add test cases
  6. Refactor and optimize code
  7. Version your code
  8. Share as ‘bundle’

\(\leadsto\) R package
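
The first steps of this list can be sketched in a few lines of R. The function `mean_ci()` and its data are hypothetical and chosen only for illustration: the script-style computation is wrapped in a function (step 1), takes all inputs as arguments instead of reading globals (step 2), carries roxygen2-style documentation comments (step 4), and is checked against a reference result from `stats::t.test()` (step 5).

```r
# Script style (for comparison): the result silently depends on globals.
# conf_level <- 0.95
# x <- c(4.1, 5.3, 6.2, 5.8)
# ci <- mean(x) + c(-1, 1) * qt(1 - (1 - conf_level) / 2, length(x) - 1) *
#   sd(x) / sqrt(length(x))

#' Confidence interval for a mean (hypothetical example function)
#'
#' @param x numeric vector of at least two observations without missing values.
#' @param conf_level confidence level strictly between 0 and 1.
#' @return Named numeric vector with elements `lower` and `upper`.
mean_ci <- function(x, conf_level = 0.95) {
  # Validate inputs early so that errors are informative.
  stopifnot(is.numeric(x), length(x) >= 2, !anyNA(x))
  stopifnot(
    is.numeric(conf_level), length(conf_level) == 1,
    conf_level > 0, conf_level < 1
  )
  n <- length(x)
  se <- stats::sd(x) / sqrt(n)
  half_width <- stats::qt(1 - (1 - conf_level) / 2, df = n - 1) * se
  c(lower = mean(x) - half_width, upper = mean(x) + half_width)
}

# Test case: compare with the interval reported by stats::t.test().
x <- c(4.1, 5.3, 6.2, 5.8)
ci <- mean_ci(x)
reference <- as.numeric(stats::t.test(x)$conf.int)
stopifnot(isTRUE(all.equal(unname(ci), reference)))
```

Inside a package, the function would typically live in a file under `R/` and the check would move into a test file, but the code itself would stay essentially the same.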

The R package ecosystem - huge success

Pharma perspective: GxP + R =

  • Core infrastructure packages only through industry
  • Quality, burden sharing: open-source pharmaverse and openstatsware
  • Open methodological packages can de-risk innovative methods
  • R packages make (statistical/methodological) code
    • testable (with documented evidence thereof, CFR Part 11)
    • reusable
    • shareable
    • easier to document

Questions, comments?

License information