5 R package development
In this session you should learn:
5.1 R packages
An R package is the fundamental unit of shareable code¹.
So why should you write a package?
- to share your code with others
- to share data with others (e.g. {spData})
- to automate your workflows (e.g. {loreabad6::terrain})
- to document your work (using vignettes, e.g. Loiseau et al. (2020) with {ecorar})
- to become a better programmer!
Package forms
The set of files you get on your computer when you install a package differs from the set you work with when you develop its source code. When you build your package, it is compressed into a single file called a bundle, with the extension .tar.gz. CRAN typically distributes packages to Windows and macOS users in their binary form. This is also a single file but, unlike a bundle, it is platform specific.
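To make the bundle form concrete, here is a minimal sketch that builds a throwaway package into a .tar.gz file using only base R tools. The package contents, the `hello()` function and the GPL-3 license are assumptions made up for illustration; in day-to-day development you would more likely call `devtools::build()`, which wraps the same `R CMD build` step.

```r
# Create a throwaway source package in a temporary directory.
# All names and field values here are illustrative assumptions.
pkg_dir <- file.path(tempdir(), "mypackage")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

writeLines(c(
  "Package: mypackage",
  "Title: What the Package Does",
  "Version: 0.0.0.9000",
  "Authors@R: person(\"First\", \"Last\", email = \"first.last@example.com\", role = c(\"aut\", \"cre\"))",
  "Description: A throwaway package used to illustrate the bundle form.",
  "License: GPL-3",
  "Encoding: UTF-8"
), file.path(pkg_dir, "DESCRIPTION"))

writeLines("export(hello)", file.path(pkg_dir, "NAMESPACE"))
writeLines(
  "hello <- function() \"Hello from mypackage!\"",
  file.path(pkg_dir, "R", "hello.R")
)

# R CMD build writes the bundle into the current working directory,
# so switch to the temporary directory while building.
old_wd <- setwd(tempdir())
tools::Rcmd(c("build", "mypackage"))
setwd(old_wd)

# The bundle is named <package>_<version>.tar.gz
bundle <- file.path(tempdir(), "mypackage_0.0.0.9000.tar.gz")
file.exists(bundle)

# List the files stored inside the bundle without unpacking it
untar(bundle, list = TRUE)
```

The bundle is platform independent. A binary package, by contrast, is built for one operating system and R version.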
5.2 Basic R package structure
5.3 Requirements and good practice
We will go through some important elements of an R package and some advice and good practice for package development. This is by no means a comprehensive list. For a better overview, and to go much deeper into package development, see the “R Packages” book by Hadley Wickham (2023).
5.3.1 Metadata
Information on your package is stored in the package metadata. The DESCRIPTION file is the place where you will find the name of the package, its description, maintainers, version, dependencies, license, etc.
A DESCRIPTION file in the wild.
Package: mypackage
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R:
    person("First", "Last", , "first.last@example.com",
           role = c("aut", "cre"))
Description: What the package does (one paragraph).
License: `use_mit_license()`, `use_gpl3_license()` or
    friends to pick a license
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.3.2
DESCRIPTION file structure. Source: Hadley Wickham (2023).
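The metadata of any installed package can also be inspected from R. The sketch below reads the DESCRIPTION of the {stats} package, chosen only because it ships with every R installation; any installed package name would work the same way.

```r
# Read the metadata of an installed package as a list-like object
desc <- utils::packageDescription("stats")
desc$Package
desc$Version
desc$License

# The same information can be read straight from the DESCRIPTION file,
# which is stored in DCF format ("Field: value" pairs)
desc_path <- system.file("DESCRIPTION", package = "stats")
fields <- read.dcf(desc_path, fields = c("Package", "Version", "License"))
fields
```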
From Lucas van der Meer workshop on R package creation:
Every released open-source project should have a license! Without a license, the default copyright laws apply, meaning that you retain all rights to your source code and no one may reproduce, distribute, or create derivative works from your work. Also, a license specifies that the software is released without any kind of warranty. You are basically telling people: “Do with it what you want, but don’t sue me if anything goes wrong!”. Many licenses will also require people to attribute you once they copy parts of your code.
This page has a good overview of Licenses to help you choose one for your project.
5.3.2 Tests
Unit testing is a way to validate the expected behavior of isolated pieces of source code. Integration testing is a way to test multiple parts of a software system as a group. In whatever form, testing is vital for package development, as it ensures your code does what you want it to do.
Testing gives you the following benefits:
- fewer bugs
- better code structure
- robust code
The package {testthat} is your companion for R test writing.
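As a minimal sketch of what unit tests look like with {testthat}, the example below tests a small temperature conversion function. The function `celsius_to_fahrenheit()` is a hypothetical stand-in for whatever your package exports. In a real package, the tests would live in a file under tests/testthat/, which `usethis::use_testthat()` can set up for you.

```r
library(testthat)

# Hypothetical function under test (in a package this would live in R/)
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}

# Each test_that() block groups expectations about one behavior
test_that("celsius_to_fahrenheit() converts known reference values", {
  expect_equal(celsius_to_fahrenheit(0), 32)
  expect_equal(celsius_to_fahrenheit(100), 212)
})

test_that("celsius_to_fahrenheit() fails on non-numeric input", {
  # Arithmetic on a character string raises an error in R
  expect_error(celsius_to_fahrenheit("warm"))
})
```

If one of the expectations is not met, {testthat} reports which test failed and why, so a change that breaks the function is caught early.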
5.3.3 Documentation
You have learned that when you run ?functionname you can access the help page of a specific function in a package.
A documented function in the wild.
?sfnetworks::st_network_travel
No documentation for 'st_network_travel' in specified packages and libraries:
you could try '??st_network_travel'
When you develop a package, writing this documentation is your job! 🎉
As you just saw, your code and your documentation are located in the same R file. That way, if you make any changes to your function, you will (hopefully) remember to update the documentation too!
The package {roxygen2} will help you write documentation.
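A minimal sketch of a {roxygen2}-documented function is shown below. The function `celsius_to_fahrenheit()` is a hypothetical example. The special `#'` comments sit directly above the code, and running `devtools::document()` (or `roxygen2::roxygenise()`) from the package root turns them into the help page you reach with ?celsius_to_fahrenheit.

```r
#' Convert Celsius to Fahrenheit
#'
#' Converts temperatures from degrees Celsius to degrees Fahrenheit.
#'
#' @param celsius A numeric vector of temperatures in degrees Celsius.
#'
#' @return A numeric vector of the same length as `celsius`, with the
#'   temperatures in degrees Fahrenheit.
#'
#' @examples
#' celsius_to_fahrenheit(c(0, 100))
#'
#' @export
celsius_to_fahrenheit <- function(celsius) {
  celsius * 9 / 5 + 32
}
```

The `@param`, `@return` and `@examples` tags fill the corresponding sections of the help page, while `@export` makes the function available to users of the package.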
Package website
Writing documentation for your functions is going to be a very good basis for you to build a website for your package.
A documented function part of the package website… in the wild.
Your website can also host your package vignettes (which are also part of the package itself). Vignettes are useful to show the basic functionality of your package, especially if certain functions are used together in a specific workflow. They can be written as a Quarto article (which you are familiar with already!). They are shown as articles on the website of the package.
The package {pkgdown} will help you build your package website.
5.3.4 Distribution
Should my package go to CRAN?
CRAN, as you know, is the main repository for R packages. Here, your package will be reviewed by other people before its first release. This can be a fairly long process, since some minimum requirements must be met. These include that your package passes checks on different R versions and operating systems, and that your examples, vignettes, and tests run without errors.
Once accepted, you will have certain responsibilities as package maintainer. CRAN runs automatic checks of your package and, if for some reason your package does not pass these checks, you will get a notice to fix the errors, otherwise your package will be removed from CRAN. The usual deadline is two weeks.
Usually, packages that go to CRAN are those that add something new to the ecosystem of available packages. If you have created a package to reproduce a very specific workflow for your project, or to accompany a research paper, this would typically not be submitted to CRAN. Those packages are usually hosted in other types of repositories, like GitHub.
Version control for R packages
You are already aware, through the various hints and practicals in the previous sessions, that GitHub and other version control platforms are commonly used to host R packages. Version control is commonly used in software development to keep track of the changes one makes to the code, and to easily revert to previous versions when something breaks.
Most R packages have a GitHub or GitLab repository where the development version of the work is hosted, while also being submitted to CRAN. Having your package on GitHub or any other platform makes it easier for other people to browse the code, and is also a useful tool to collect issues or open up discussions around the package. In addition, tools like GitHub Pages allow you to host the documentation of your package fairly easily.
5.4 Your first R package!
In this section, you will be creating your first (maybe?) R package. We will call this package yournameR, so for me it will be lorenaR.
We will be using the function you wrote last week as the R code to package, so make sure to have it on hand.
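One possible way to create the package skeleton is sketched below with the {usethis} package. This is an assumption about tooling, not necessarily the route followed in the session, and the temporary directory is used only so the example does not touch your own files. Replace lorenaR with your own package name.

```r
# Create a package skeleton in a temporary location
# (in practice, choose a permanent folder on your computer)
pkg_path <- file.path(tempdir(), "lorenaR")

# open = FALSE avoids launching a new RStudio session for the project
usethis::create_package(pkg_path, open = FALSE)

# Inspect what was created: DESCRIPTION, NAMESPACE and an R/ folder
list.files(pkg_path, all.files = TRUE, no.. = TRUE)
```

From there, the function you wrote last week would go into a script inside the R/ folder, which `usethis::use_r()` can create for you once the package is your active project.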
5.5 Further reading:
- The Whole Game chapter in Hadley Wickham (2023) for a quick intro to a simple package. Read the whole book to dive deeper into R package development.
- Package structure and state chapter in Hadley Wickham (2023).
- Licensing chapter in Hadley Wickham (2023).
- “Sharing and organizing research products as R packages” by Vuorre & Crump (2020)
- “Understanding the Basics of Package Writing in R” by Meyer (2022)
- “Why your research deserves to be an R package” by Vreede (2023)
- Package development cheatsheet
- Create your first R package workshop at the Geoinformatics PhD seminar by Lucas van der Meer
¹ Hadley Wickham, R Packages (2nd edition)