templateR template
For users who are creating a new R package from scratch, we have provided a CRAN/Bioc-compatible template (templateR). To get started, one simply forks the template by navigating to the GitHub repository (see Code availability section), clicking “Use this template”, and cloning a copy of the new R package to begin editing it (Fig. 1a). The user need only replace key metadata fields (e.g. Package, Title, Description, URL) in the DESCRIPTION file (a required file for all R packages). What makes this template unique is that all other components of the package (README, vignettes, unit test setup scripts) are programmatically autofilled based on the DESCRIPTION file. This strategy greatly minimises redundant and error-prone aspects of R package documentation.
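As a minimal illustration (the package name and URL below are placeholders), the handful of DESCRIPTION fields a user would typically replace might look like the following:

```
Package: MyPackage
Title: One-Line Summary Of What The Package Does
Description: A longer, multi-sentence description of what the package does.
URL: https://github.com/<owner>/MyPackage
```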
Alternatively, users can start with any pre-existing R package and skip directly to the next step: using the rworkflows R package. In either case, we have created, and continue to maintain, a companion Wiki page to guide users who are unfamiliar with Bioc standards and to offer a variety of tips and tricks that make this process easier (see Code availability section).
rworkflows R package
The rworkflows R package is available on both CRAN and GitHub (see Code availability section). Workflow scripts (written in YAML format) placed within a specific subdirectory of the GitHub repository (.github/workflows/*.yml) dictate which actions are triggered and under which conditions. For those not familiar with creating GHA workflows, learning the GHA-specific expressions and idiosyncrasies can be a time-consuming and iterative process. Instead, we have abstracted this step away by autogenerating workflow scripts from a single command in the dedicated R package: rworkflows::use_workflow(). This creates a fully functional workflow file in the correct subdirectory even with no arguments supplied, and it only needs to be run once per R package (Fig. 1b). For greater flexibility, users can supply the function with their preferred arguments to generate (or regenerate) a customised workflow script. By default, the workflow will trigger the rworkflows action (see rworkflows action section below) upon pushes or pull requests to the remote GitHub repository. For minor pushes (e.g. fixing a typo in the README text), one can avoid triggering the action by simply adding the string “[skip ci]” to the commit message. Triggers can also be restricted to specific GitHub branches (e.g. “main”, “master”) or even to wildcard patterns (e.g. “RELEASE_**”), which is particularly helpful when developing Bioc packages with regular release updates, as the workflow script does not have to be modified for each release. Finally, rworkflows::use_workflow() allows users to control exactly which release of the rworkflows action is triggered (via the tag argument). For a full description of all arguments of the rworkflows::use_workflow() function, please refer to Table S1.
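As a minimal sketch (assuming installation from CRAN; the tag value shown is purely illustrative), the entire setup reduces to:

```r
## Install rworkflows from CRAN, then generate the workflow file once per package.
## With no arguments, this writes a default workflow script into .github/workflows/.
install.packages("rworkflows")
rworkflows::use_workflow()

## Optionally, pin the workflow to a specific release of the rworkflows action
## via the `tag` argument (see Table S1 for the full argument list).
# rworkflows::use_workflow(tag = "v1")
```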
In addition, the rworkflows R package contains other useful functions for developers, including rworkflows::use_badges(), which dynamically generates badges indicating various aspects of the software package’s status for inclusion in the documentation pages (e.g. the README file). It also provides the function rworkflows::use_dockerfile(), which writes a Docker recipe file (i.e. a Dockerfile) for creating a Docker image with the user’s R package (and all of its dependencies) pre-installed. Note that this same function is called automatically in step 8 of the rworkflows action; however, if a pre-existing Dockerfile is detected in the current working directory, that step is skipped and the pre-existing Dockerfile is used instead. Thus, if preferred, users can retain more customised control over how their Docker container is configured. Finally, rworkflows::use_readme(), rworkflows::use_vignette_docker() and rworkflows::use_vignette_getstarted() generate autofilled templates for each of these R package documentation components, respectively.
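A brief sketch of these helper functions, each called here with its default arguments (assumed to be sufficient for a typical package; see the package documentation for the full argument lists):

```r
## Generate status badges for insertion into the README.
rworkflows::use_badges()
## Write a Dockerfile recipe (skipped by step 8 of the action if one already exists).
rworkflows::use_dockerfile()
## Autofill the remaining documentation components.
rworkflows::use_readme()
rworkflows::use_vignette_docker()
rworkflows::use_vignette_getstarted()
```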
rworkflows action
Once triggered by a workflow, the rworkflows action launches three virtual machines (VMs) in parallel to test the R package across multiple operating systems (OS): Linux, Mac, and Windows. Within each VM, the following steps are performed (Fig. 1d):
1. Install system: Installs all OS-specific system dependencies, which account for the variety of functionalities that R users may require.
2. Install R: Installs all R dependencies of the R package being tested. Three rounds of dependency installation are attempted, each using a slightly different method, to ensure the robustness of this procedure without requiring the user to manually troubleshoot this step.
3. CRAN checks: Runs CRAN checks via rcmdcheck::rcmdcheck(). When run_rcmdcheck=TRUE, all checks must pass in order for the GHA to succeed. This step uses CRAN standards by default, but rcmdcheck can be run without CRAN standards by setting the argument as_cran=FALSE (see the local sketch following this list).
4. Bioc checks: Runs Bioc checks via BiocCheck::BiocCheck(). When run_bioc=TRUE, all checks must pass in order for the action to succeed.
5. Unit tests: Runs unit tests implemented via the testthat30 and/or RUnit31 R packages and generates a downloadable report of the results.
6. Code coverage: Runs code coverage tests and uploads the results to Codecov.
7. Build website: (Re)builds the documentation website from README files, in-line roxygen notes, and vignettes using pkgdown16. It then deploys the website via GitHub Pages to a new branch named “gh-pages” in the same repository. Deploying the website from a separate branch is advantageous because it avoids accidentally adding large HTML/CSS/JavaScript source files and libraries to the R package itself (which can slow down its installation and performance in some situations).
8. Push container: Pushes a container to Docker Hub with the user’s R package, all of its dependencies, and an interactive RStudio interface pre-installed. Included in templateR is an autofilled vignette describing how to create a local Docker or Singularity container. This step requires a valid Docker Hub authentication token, which can be stored as a GitHub Secrets variable; this ensures that only users with appropriate push permissions to a given Docker Hub account can update the container there.
Steps 6-8 are only run on the Linux VM to avoid redundancy and to prevent conflicts caused by simultaneous pushes to their respective repositories (i.e. Codecov, GitHub, Docker Hub).
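For reference, steps 3, 4 and 7 can also be reproduced locally with the same underlying R packages; the following is a minimal sketch (run from the package’s root directory; arguments beyond those named above are illustrative):

```r
## Step 3: CRAN checks; the "--as-cran" flag applies CRAN standards.
rcmdcheck::rcmdcheck(args = "--as-cran")
## Step 4: Bioc checks on the package source.
BiocCheck::BiocCheck(".")
## Step 7: (re)build the documentation website locally
## (deployment to the "gh-pages" branch is handled by the action itself).
pkgdown::build_site()
```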
Container usage
Containerisation is especially useful when distributing R packages to many users across a wide variety of OS platforms, including high-performance computing (HPC) clusters, which may have software installation restrictions for non-root users. Once the rworkflows action has successfully completed at least once on the Linux VM, both developers and users can create Docker and/or Singularity containers from the image hosted on Docker Hub. If templateR was used as a template, a vignette detailing a step-by-step reproducible example is autogenerated. A rendered version of this vignette can be accessed via the dedicated GitHub Pages site, and a link to it is automatically rendered within the templateR README file (see Code availability section) under the “Documentation → Docker/Singularity” subheader.
rworkflows adoption
Metadata was gathered from the GitHub application programming interface (API) for each repository using the R package echodeps32. This was used both to identify which packages currently use the rworkflows action (i.e. dependents) and to gather relevant metadata on each of these repositories. Of particular interest were the following metrics: stars (the number of users who bookmarked the GitHub repository with a star), unique clones (the number of unique instances in which the GitHub repository was downloaded from GitHub), and unique views (the number of unique instances in which the GitHub repository was viewed in a web browser). Here, “unique” refers to the number of distinct internet protocol (IP) addresses. Sums of each of these metrics across all dependents were computed to represent the total downstream impact of rworkflows. All dependents were visualised as nodes in a directed graph, each connecting to an additional node representing the rworkflows action (Fig. 2).
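To illustrate the metrics involved, the following minimal sketch queries them directly from the GitHub REST API using the gh R package (shown here as a stand-in for echodeps, which automates this process across all dependents; the owner/repository names are placeholders, and the traffic endpoints require a token with push access):

```r
library(gh)  # lightweight GitHub REST API client
owner <- "some-owner"; repo <- "some-repo"  # placeholders

## Stars: bookmark count from the repository metadata.
meta <- gh("GET /repos/{owner}/{repo}", owner = owner, repo = repo)

## Unique clones and views: the traffic endpoints report counts of distinct IP addresses.
clones <- gh("GET /repos/{owner}/{repo}/traffic/clones", owner = owner, repo = repo)
views  <- gh("GET /repos/{owner}/{repo}/traffic/views",  owner = owner, repo = repo)

c(stars = meta$stargazers_count,
  unique_clones = clones$uniques,
  unique_views = views$uniques)
```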
To identify the R packages with the highest potential for downstream impact on other packages, we collected data on the number of downloads for every package in CRAN and Bioc using echogithub (Schilder et al. 2021). We then selected the packages with the greatest number of downloads and prioritised them when making pull requests to their respective GitHub repositories to implement rworkflows.
An R Markdown script to fully reproduce these analyses, as well as an interactive version of the graph with additional metadata, is available as a vignette on the official rworkflows GitHub Pages documentation website (see the Code availability section for the link).
GitHub as a package distributor
To comprehensively assess the repositories through which R packages are distributed, we collected metadata on all known R packages from base R, CRAN, Bioc, rOpenSci, R-Forge, and GitHub using the R package echogithub32. The total number of packages in, and the intersections between, each of these repositories were then computed and visualised using the R package UpSetR33 (Fig. 3).
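The intersection analysis itself can be sketched as follows (the package name vectors are placeholders; in the actual analysis they are retrieved with echogithub):

```r
library(UpSetR)
## Placeholder character vectors of package names per repository.
pkg_lists <- list(
  CRAN   = c("pkgA", "pkgB", "pkgC"),
  Bioc   = c("pkgB", "pkgD"),
  GitHub = c("pkgA", "pkgD", "pkgE")
)
## Compute the totals and intersections and draw the UpSet plot (cf. Fig. 3).
upset(fromList(pkg_lists), order.by = "freq")
```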
It should be noted that the data on GitHub-hosted R packages comes from a static snapshot previously collected in February 2018 via the echogithub dependency githubinstall34, whereas all of the CRAN/Bioc/rOpenSci/R-Forge data is fully up-to-date. This means that our estimate of the proportion of R packages that are distributed exclusively through GitHub is almost certainly an underestimate. An R Markdown script to fully reproduce these analyses is available as a vignette on the rworkflows documentation website (see the Code availability section).