Notes from reading through R Packages by Hadley Wickham. This is meant to review, not replace, a thorough readthrough. I mainly wrote this as a personal review, since writing summaries and attempting to teach others are some of the best ways to learn things.
Packages are used to organize code together so that it can be used repeatedly and shared with others.
A lot of work with packages is done via the devtools package.
Creating the Package
To create a package, use
Naming things is hard. The only R requirements for your package name are that (1) it may only contain numbers, letters, and periods; (2) it must start with a letter; and (3) it can't end in a period. This means package names cannot have hyphens or underscores, unfortunately. Such is the R way.
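As a rough sketch, the naming rules above can be written as a regular expression. The helper name is_valid_pkg_name is made up for this example. The pattern also requires at least two characters, which R asks for as well, although it isn't in the list above.

```r
# Hypothetical helper (not part of base R or devtools): checks a candidate
# package name against the rules listed above.
is_valid_pkg_name <- function(name) {
  # Starts with a letter, contains only letters, numbers, and periods,
  # and ends with a letter or number (so never with a period).
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

candidates <- c("devtools", "data.table", "my-package", "my_package", "2fast", "pkg.")
is_valid_pkg_name(candidates)
```

Only the first two candidates should pass; the rest break one of the rules (hyphen, underscore, leading digit, trailing period).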
Every package must have a file called
DESCRIPTION which has the package metadata, a file called
NAMESPACE with the exported functions, and a directory called
R/, which has the files that construct the package.
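To make the three required pieces concrete, here is a minimal sketch that builds a throwaway skeleton by hand in a temporary directory. The package name examplepkg, the function say_hello, and all the field values are invented for illustration. In practice devtools sets this up for you.

```r
# Build a throwaway package skeleton in a temporary directory.
pkg_dir <- file.path(tempdir(), "examplepkg")  # hypothetical package name
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION: the package metadata.
writeLines(c(
  "Package: examplepkg",
  "Title: A Minimal Example Package",
  "Version: 0.0.9000",
  "Description: Shows the smallest set of files a package needs.",
  "License: GPL-3"
), file.path(pkg_dir, "DESCRIPTION"))

# NAMESPACE: the exported functions.
writeLines("export(say_hello)", file.path(pkg_dir, "NAMESPACE"))

# R/: the files that make up the package's code.
writeLines(
  'say_hello <- function(name = "world") paste0("Hello, ", name, "!")',
  file.path(pkg_dir, "R", "say_hello.R")
)

# Show what was created.
list.files(pkg_dir, recursive = TRUE)
```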
Here's an example package so you can see what I mean. It's easier to learn from examples.
Unlike normal R code, you should not use
source within packages to load things, since this can break stuff. Instead, put dependencies within the
DESCRIPTION, which we will talk about later.
You can achieve side effects with
You should call functions within other packages using
:: syntax (e.g.,
devtools::install instead of just
The DESCRIPTION file is largely straightforward and best learned by example. The one challenging thing is Imports vs. Suggests.
Imports vs. Suggests
It Depends on a specific R version (3.0.2 or higher). This means the package cannot be installed on R version 3.0.1 or older.
It Imports several packages (httr, curl, etc.). These packages will be installed automatically when the package is installed, and the package will not work without them.
It Suggests other packages (testthat, Rcpp, etc.) These packages will not be installed automatically and the package supposedly works fine without them, but installing them is recommended.
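Here is a small sketch of what those three fields look like in a DESCRIPTION file and how to read them back in R. The package name and version are made up; only the dependency names and the R version come from the notes above. It assumes nothing beyond base R.

```r
# A made-up DESCRIPTION; the package name and version are illustrative only.
description_text <- c(
  "Package: examplepkg",
  "Version: 0.0.9000",
  "Depends: R (>= 3.0.2)",
  "Imports: httr, curl",
  "Suggests: testthat, Rcpp"
)

# DESCRIPTION files use the Debian control file format, which read.dcf() parses.
con <- textConnection(description_text)
desc <- read.dcf(con)
close(con)

# Split a comma-separated field into individual package names.
split_field <- function(field) {
  strsplit(desc[1, field], ",[[:space:]]*")[[1]]
}

desc[1, "Depends"]
split_field("Imports")
split_field("Suggests")
```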
When to use Imports, when to use Suggests
I'd recommend putting something in Imports if the package would break without it, and Suggests otherwise. Suggests is good for (a) things that are only run in tests (since the user doesn't have to run the tests to make the package work) or (b) things that are only used in a one-off unimportant function (since the user can still use the rest of the package just fine).
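One common way to use a Suggests package safely is to check for it with requireNamespace() and fall back if it's missing. This is only a sketch: pretty_summary is a made-up function, and "notARealPackage" is a placeholder name chosen so the fallback branch runs.

```r
# Hypothetical function that relies on a package listed under Suggests.
# "notARealPackage" is a placeholder, so by default the fallback path runs.
pretty_summary <- function(x, pkg = "notARealPackage") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Suggested package is missing: degrade gracefully instead of failing.
    return(paste("mean:", mean(x)))
  }
  paste0("mean (with ", pkg, " available): ", mean(x))
}

pretty_summary(c(1, 2, 3))                 # fallback branch
pretty_summary(c(1, 2, 3), pkg = "stats")  # stats ships with R, so this branch runs
```

The rest of the package keeps working either way, which is the whole point of Suggests.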
Documenting Code with Roxygen
Documenting your code in your package is technically optional, but highly recommended.
Most people these days use Roxygen. This is a four-step process:
1.) Add roxygen comments to your .R files.
2.) Run devtools::document() to convert roxygen comments to .Rd files.
3.) Preview documentation with
functionname is the name of the function you documented.
4.) Rinse and repeat until the documentation looks the way you want.
The most important part of Roxygen is
#' @export, which exports your function from your package and makes it available to others.
@param documents how parameters work. These are also used to express non-binding type preconditions. You can use
@inheritParams to get parameter documentation from another function (more DRY).
@return describes what the function should return.
@examples gives examples of how the code should be used.
There are some other cool things you can do with Roxygen, so it's worth skimming through http://r-pkgs.had.co.nz/man.html.
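Putting the tags above together, a documented function might look like the sketch below. The function add is made up for illustration; the roxygen comments are ordinary R comments, so the file runs as-is.

```r
#' Add two numbers
#'
#' @param x A numeric vector.
#' @param y A numeric vector the same length as x.
#' @return The element-wise sum of x and y.
#' @examples
#' add(1, 2)
#' @export
add <- function(x, y) {
  x + y
}

add(1, 2)
```

Running devtools::document() on a package containing this file should generate the matching .Rd file and add the export to NAMESPACE.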
To document S4 classes, write the Roxygen documentation right before
setClass and use
@slot instead of
Learning Roxygen by Example
Documenting with Vignettes
Vignettes are long-form documentation. A vignette is more like a short paper about a package and how to use it than a description of its functions. Usually vignettes describe the problem the package was meant to solve and give examples of how to solve it.
You can see vignettes for a specific package with
"packagename" is the name of the package. You can read a specific vignette with
"vignettename" is the name of the vignette.
How to Write a Vignette
- Install R Markdown (
- Install Knitr (
- Install Pandoc
- Run devtools::use_vignette("my-vignette") where "my-vignette" is the desired name of your new vignette.
- Learn Markdown (if you haven't already).
- Learn how to use Knitr (if you haven't already).
- Edit vignettes/my-vignette.Rmd in your favorite editor.
- Run knitr::knit("vignettes/my-vignette.Rmd") in R to create the vignette markdown file. (Or just Ctrl/Cmd + Shift + K if you use RStudio.) You'll have to manually move it into the vignettes directory, I think.
- When you're ready, run
devtools::build_vignettes() to build all the vignettes. Note that this is slow. Also note that it is usually better to just run
devtools::build() to create a package bundle with the vignettes included.
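For reference, here is a sketch of roughly what a vignette source file contains, written out by hand into a temporary directory. The title and body text are placeholders. In newer devtools versions the use_vignette() helper lives in the usethis package, so treat the exact call above as dependent on your installed version.

```r
# Write a minimal vignette source file into a temporary vignettes/ directory.
vignette_dir <- file.path(tempdir(), "vignettes")
dir.create(vignette_dir, showWarnings = FALSE)
vignette_path <- file.path(vignette_dir, "my-vignette.Rmd")

# Three backticks open and close an R Markdown code chunk.
fence <- strrep("`", 3)

writeLines(c(
  "---",
  'title: "My Vignette"',                    # placeholder title
  "output: rmarkdown::html_vignette",
  "vignette: >",
  "  %\\VignetteIndexEntry{My Vignette}",
  "  %\\VignetteEngine{knitr::rmarkdown}",
  "  %\\VignetteEncoding{UTF-8}",
  "---",
  "",
  "Describe the problem the package solves, then show how to solve it.",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
), vignette_path)

# Print the file to check the result.
cat(readLines(vignette_path), sep = "\n")
```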
Most R testing is done just by playing around in the console to see if things work, but this is a bad strategy long-term. Instead, you should run tests to make sure everything works as you'd expect.
To start testing, run
devtools::use_testthat(). This will use Hadley's testthat package, which I recommend you learn pretty well.
This would actually be a great time to just read the chapter in detail and not expect a summary.
You can also learn testthat pretty well by example.
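A tiny sketch of what a testthat test looks like, assuming testthat is installed (the block skips the test otherwise). The function add is a made-up stand-in for something in your package; in a real package this would live in a file under tests/testthat/.

```r
# Hypothetical function under test.
add <- function(x, y) x + y

# Only run the test if testthat is installed.
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("add() sums its arguments", {
    testthat::expect_equal(add(1, 2), 3)
    testthat::expect_equal(add(-1, 1), 0)
    # Adding a string to a number is an error in R.
    testthat::expect_error(add("a", 1))
  })
}
```

Note the mix of "does the right thing" and "fails the right way" expectations; both are worth testing.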
Bottom line: Automated testing is important for R and people don't do it enough. I'm particularly proud of the test coverage for Batchman. It takes a certain mindset to think through all the things your program is supposed to do and all the ways in which it could error to generate the correct tests.
While we learned about
R/ (R Code),
man/ (Roxygen documentation), and
tests/ (testthat tests), there are other directories you can have in your R project:
There's a directory for R packages called
inst/. Generally this is just used to hold miscellaneous things.
When the package is installed, the contents of inst/ are copied into the package's top-level directory. This means it's important to avoid making something like
inst/R, since that will conflict with the
It's possible to include C++ code within R via Rcpp. The C++ source files are loaded from a
src/ directory. I've not yet done this, so I'm not including it in this review. Instead I recommend you dive into the R Packages chapter directly and I'll revisit this once I know C++ better.
There are yet more possible folders, but you'll never need them. The only interesting one is
exec/ for executable scripts. Unlike files placed in other directories, files in
exec/ are automatically executable. Could be cool.
Git and GitHub
Using GitHub is a must for making R packages. If you don't know what GitHub is, read this. If you don't know how to use GitHub, do this tutorial. Hadley's chapter on GitHub is extensive too; definitely read it if you don't know GitHub. However, covering GitHub is outside the scope of this review.
When you're working from GitHub, you should release updates to your package periodically in the form of "releases" where you increment a version number. This allows people to upgrade their package to more up-to-date versions as you come up with them, as well as go back to previous versions if something in a future version is not working for them.
Versions are updated in two places: (1) within the
DESCRIPTION file and (2) within the GitHub tag.
Versioning from the Command Line
To update the GitHub tag from the command line, make sure you have merged your change into master and have pulled the most up-to-date master on the CLI. Then do
git tag X.X.X (where X.X.X represents the correct version number) and
git push origin X.X.X.
Thus the proper Git-friendly workflow for updating a package is to (1) make a change; (2) make a pull request; (3) within that PR, increment the version in the
DESCRIPTION file; (4) merge the PR into the master branch; (5) check out and pull the master branch; (6) create a git tag with the correct version number; (7) push that tag.
The Correct Version Number
Usually versions start at 1.0.0 and follow Semantic Versioning. However, a convention in the R world is to have your development package start at 0.0.9000, move up to 0.0.9001 with a patch, 0.1.9000 with a minor release, and 1.0.9000 with a major release. You only switch to something like 1.0.0 (no 9000) for stable releases that are well-tested and not in development, usually when you send something to CRAN. (You'd release 1.0.0 to CRAN, and then get started again on 1.0.9000 for development.)
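If you want to check how R orders these version numbers, base R can compare them directly. This is just a sketch using the example versions from the paragraph above.

```r
# Versions taken from the convention described above.
dev_version     <- package_version("0.0.9000")
patched_version <- package_version("0.0.9001")
stable_version  <- package_version("1.0.0")
next_dev        <- package_version("1.0.9000")

# Each comparison checks that the later version sorts after the earlier one.
dev_version < patched_version
patched_version < stable_version
stable_version < next_dev

# compareVersion() returns -1, 0, or 1 for lower, equal, or higher.
utils::compareVersion("0.1.9000", "1.0.0")
```

All three comparisons should come out TRUE, which is why the 9000 suffix works: a development version always sorts after the stable release it builds on.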
R CMD CHECK
R CMD CHECK automatically checks the R code in your package for problems. It is the bane of R developers, because it has very aggressive requirements for passing, and passing it is a requirement for getting code onto CRAN, the public repository of R packages. Code that does not pass
R CMD CHECK is doomed to be installed from GitHub forever.
You can run
R CMD check . (the actual command uses a lowercase check) from within the repository of a package, but it's better to just run
devtools::check() from the R console.
R CMD CHECK has over fifty checks. Read the section in Hadley's chapter for the rundown of what each check does and how to fix a failure.
I think it's ok if
R CMD CHECK fails for you, as long as you don't plan on submitting to CRAN. I don't think I have any packages personally that pass
R CMD CHECK. But it's definitely virtuous for you to pass
R CMD CHECK, just like it's virtuous to have 100% test coverage.
Travis and Covr
Travis lets you run your test suite automatically upon every push to a pull request, so you make sure that you only merge in code that passes all of the tests.
Covr gives you a loose estimation of how much of your package is covered by tests and can give suggestions on where to add more tests.
To make it easy, my friend Robertzk has set up a package template repo with Travis and Covr already set up. To make it even easier, my friend Kirillseva turned that into an automatic package generator via Yeoman.