1 Introduction

openstatsware Workshop: Good Software Engineering Practice for R Packages

April 18, 2024

Disclaimer

Any opinions expressed in this presentation and on the following slides are solely those of the presenter and not necessarily those of their employers.

Daniel

  • Ph.D. in Statistics from University of Zurich, Bayesian Model Selection
  • Biostatistician at Roche for 5 years, Data Scientist at Google for 2 years, Statistical Software Engineer at Roche for the last 4 years
  • Co-founder of inferential.bio and RCONIS
  • Multiple R packages on CRAN and Bioconductor, co-wrote a book on Likelihood and Bayesian Inference, chair of openstatsware
  • Feel free to connect

openstatsware

  • openstatsware.org
  • Since: 19 August 2022 - almost 2 years now!
  • Where: American Statistical Association (ASA) Biopharmaceutical Section (BIOP), European Federation of Statisticians in the Pharmaceutical Industry (EFSPI)
  • Who: Currently more than 50 statisticians from more than 30 organizations
  • What: Engineer packages and spread best practices

What you will learn here

  • Understand the basic structure of an R package
  • Create your own R package
  • Learn about & apply a professional development workflow
  • Learn & apply the fundamentals of quality control for R packages
  • Get a crash course in version control and modern collaboration techniques on GitHub.com
  • Learn how to make an R package available to others
  • Optimize your R code for correctness and speed
  • Learn how to approach the design and modularity of Shiny apps and how to test them

Program outline: Day 1

9.00 - 9.45 Introduction and outline
9.45 - 10.00 Coffee break
10.00 - 10.45 R Package Syntax
10.45 - 11.30 Exercise
11.30 - 12.15 Software Engineering Workflow
12.15 - 13.15 Lunch break
13.15 - 14.00 Exercise
14.00 - 14.45 Package Quality
14.45 - 15.30 Exercise
15.30 - 15.45 Coffee break
15.45 - 16.30 Collaboration via GitHub
16.30 - 17.00 Exercise

Program outline: Day 2

9.00 - 9.45 Publication of R Packages
9.45 - 10.00 Coffee break
10.00 - 10.45 Exercise
10.45 - 11.30 Code Optimization
11.30 - 12.15 Exercise
12.15 - 13.15 Lunch break
13.15 - 14.00 Shiny Design and Modules
14.00 - 14.45 Exercise
14.45 - 15.15 Shiny Tests
15.15 - 15.45 Exercise
15.45 - 16.00 Coffee break
16.00 - 16.30 Summary and Q&A

House-keeping

What you will need

  • GitHub.com (free) account
  • Local R development environment with
    • git
    • R, Rtools (Windows only), and the RStudio IDE
  • Install additional R packages using the installation script
  • Curiosity 🦝
  • Positive attitude 😄

Speed intros and what would you like to learn?

  • Name? 🤠
  • Organization? 🏢
  • Motivation for this workshop / what would you like to learn? 🧠
  • Favorite food? 🥗
  • Favorite music? 🎵

What do we mean by GSWEP4R (Good Software Engineering Practice for R)?

  • Applying the concept of “Good XYZ Practice” to software engineering (SWE) with R
  • Improve quality and longevity of R code/packages
  • Not a universal standard; we share our perspectives
  • Collection of best practices
  • Do not reinvent the wheel: learn from the community

Why care about GSWEP4R?

  • R is one of the most successful statistical programming languages
  • R is a powerful yet complex ecosystem
    • Core component: R packages
    • Mature analysts: users & contributors
    • Deep understanding crucial, even to just assess quality
  • Analyses increasingly require complex scripts/programs
  • The concepts are applicable to other languages, too (Python, Julia, etc.)

Start small - from script to package

  1. Encapsulate behavior (functions)
  2. Avoid global state/variables
  3. Adopt consistent coding style
  4. Document well
  5. Add test cases
  6. Refactor and optimize code
  7. Version your code
  8. Share as ‘bundle’

⇝ R package
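The first steps of this path can be sketched in R. This is a minimal illustration under stated assumptions, not workshop material: the function `arm_mean()`, the file name `trial.csv`, the column names `arm` and `response`, and the toy data are all hypothetical. It wraps a script-style computation in a function that reads only its arguments (steps 1 and 2), documents it with roxygen2-style comments (step 4), and checks it with small test cases (step 5).

```r
# Before (script style, hypothetical): relies on a global variable `dat`
# dat <- read.csv("trial.csv")
# mean_resp <- mean(dat$response[dat$arm == "A"])

#' Mean response in one treatment arm (hypothetical example function)
#'
#' @param data A data frame with columns `arm` and `response`.
#' @param arm_label The arm to summarize, e.g. "A".
#' @return The mean response in that arm, ignoring missing values.
#' @export
arm_mean <- function(data, arm_label) {
  # Fail early if the input does not have the expected structure
  stopifnot(is.data.frame(data), all(c("arm", "response") %in% names(data)))
  # Uses only the function arguments; no global state is read or written
  mean(data$response[data$arm == arm_label], na.rm = TRUE)
}

# Small test cases on toy data
toy <- data.frame(arm = c("A", "A", "B"), response = c(1, 3, 10))
stopifnot(arm_mean(toy, "A") == 2)
stopifnot(arm_mean(toy, "B") == 10)
```

In a package, the roxygen2 comments would generate the help page, and the checks would typically be written with testthat in files under `tests/testthat/`.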

The R package ecosystem - huge success

Pharma perspective: GxP + R =

  • Core infrastructure packages only through industry
  • Quality, burden sharing: open-source pharmaverse and openstatsware
  • Open methodological packages can de-risk innovative methods
  • R packages make (statistical/methodological) code
    • testable (with documented evidence thereof, 21 CFR Part 11)
    • reusable
    • shareable
    • easier to document

Questions, comments?

License information