Notes from reading through R Packages by Hadley Wickham. This is meant to review, not replace, a thorough readthrough. I mainly wrote this as a personal review, since writing summaries and attempting to teach others are some of the best ways to learn things.
Packages are used to organize code together so that it can be used repeatedly and shared with others.
A lot of work with packages is done via the devtools package.
To create a package, use devtools::create("desired.name.of.package").
Naming things is hard. R's only requirements for a package name are that (1) it may only contain numbers, letters, and periods; (2) it must start with a letter; and (3) it can't end in a period. This means package names cannot have hyphens or underscores, unfortunately. Such is the R way.
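The three naming rules fit in one regular expression. Here is a minimal sketch; is_valid_pkg_name is a hypothetical helper of my own, not something from devtools. As a side effect, the pattern also requires at least two characters, which R requires of package names anyway.

```r
# Hypothetical helper: TRUE when `name` satisfies the three rules above.
is_valid_pkg_name <- function(name) {
  # ^[A-Za-z]        must start with a letter
  # [A-Za-z0-9.]*    then any mix of letters, digits, and periods
  # [A-Za-z0-9]$     must not end in a period
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name)
}

# Only the first two names pass; the rest break one of the rules.
is_valid_pkg_name(c("devtools", "my.pkg", "my_pkg", "my-pkg", "2fast", "pkg."))
```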
Every package must have a file called DESCRIPTION, which has the package metadata; a file called NAMESPACE, which lists the exported functions; and a directory called R/, which has the files that construct the package.
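To make the layout concrete, here is a rough sketch that builds those three pieces by hand in a temporary directory, using base R only. The package name examplepkg and the hello function are made up for illustration; in practice devtools::create does this for you.

```r
# Build a throwaway package skeleton in a temporary directory.
pkg_dir <- file.path(tempdir(), "examplepkg")  # hypothetical package name
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION: package metadata, written as "Field: value" lines.
writeLines(c(
  "Package: examplepkg",
  "Title: A Minimal Example Package",
  "Version: 0.0.9000"
), file.path(pkg_dir, "DESCRIPTION"))

# NAMESPACE: lists what the package exports.
writeLines("export(hello)", file.path(pkg_dir, "NAMESPACE"))

# R/: the code itself.
writeLines('hello <- function() "Hello from examplepkg"',
           file.path(pkg_dir, "R", "hello.R"))

# Show the resulting files.
list.files(pkg_dir, recursive = TRUE)
```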
Here's an example package so you can see what I mean. It's easier to learn from examples.
Unlike in normal R code, you should not use library, require, or source within packages to load things, since they change the user's session (for example, the search path) and can break other code. Instead, declare dependencies in the DESCRIPTION file, which we will talk about later.
If you need side effects when the package is loaded or attached, put them in .onLoad and .onAttach.
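Here is a small sketch of what those two hooks can look like. The package name examplepkg and the option examplepkg.verbose are assumptions for illustration only.

```r
# In a file such as R/zzz.R (a common but optional naming convention).

# Runs when the package namespace is loaded, e.g. via examplepkg::hello().
.onLoad <- function(libname, pkgname) {
  # Set a package option only if the user has not already set it.
  if (is.null(getOption("examplepkg.verbose"))) {
    options(examplepkg.verbose = FALSE)
  }
  invisible()
}

# Runs when the package is attached with library(examplepkg).
.onAttach <- function(libname, pkgname) {
  # Startup messages should go through packageStartupMessage() so that
  # users can silence them with suppressPackageStartupMessages().
  packageStartupMessage("Welcome to examplepkg")
}
```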
You should call functions from other packages using the :: syntax (e.g., devtools::install instead of just install).
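As a tiny illustration, here is a made-up package function (middle_value is hypothetical) that uses the stats package without ever calling library(stats):

```r
# Hypothetical package function: note there is no library(stats) call.
middle_value <- function(x) {
  # Qualify the call with its package so the origin is explicit.
  stats::median(x, na.rm = TRUE)
}

middle_value(c(1, 2, NA, 10))
```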
The DESCRIPTION file is largely straightforward and best learned by example. The one challenging thing is Imports vs. Suggests.
Look at this DESCRIPTION file from devtools:
It Depends on a specific R version (3.0.2 or higher). This means the package cannot be installed on R 3.0.1 or older.
It Imports several packages (httr, curl, etc.). These packages are installed automatically when the package itself is installed, and the package will not work without them.
It Suggests other packages (testthat, Rcpp, etc.) These packages will not be installed automatically and the package supposedly works fine without them, but installing them is recommended.
I'd recommend putting something in Imports if the package would break without it, and Suggests otherwise. Suggests is good for (a) things that are only run in tests (since the user doesn't have to run the tests to make the package work) or (b) things that are only used in a one-off unimportant function (since the user can still use the rest of the package just fine).
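Since a package in Suggests may not be installed, code that uses one should check for it first. Here is a sketch of that pattern, assuming stringr is listed under Suggests (stringr is just a stand-in, and count_words is a hypothetical function):

```r
# Hypothetical function that uses a suggested package only if it is available.
count_words <- function(text) {
  if (requireNamespace("stringr", quietly = TRUE)) {
    # Preferred path: the suggested package is installed.
    stringr::str_count(text, "\\S+")
  } else {
    # Fallback path: base R only, so the package still works without stringr.
    lengths(strsplit(trimws(text), "\\s+"))
  }
}

count_words("one two three")
```

If there is no sensible fallback, the other common choice is to stop() with a message asking the user to install the missing package.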
Documenting your code in your package is technically optional, but highly recommended.
Most people these days use Roxygen. This is a four step process:
1.) Add roxygen comments to your .R files.
2.) Run devtools::document() to convert roxygen comments to .Rd files.
3.) Preview documentation with ?functionname, where functionname is the name of the function you documented.
4.) Rinse and repeat until the documentation looks the way you want.
The most important part of Roxygen is #' @export, which exports your function from your package and makes it available to others.
@param documents how parameters work. These are also used to express non-binding type preconditions. You can use @inheritParams to get parameter documentation from another function (more DRY).
@return describes what the function should return.
@examples gives examples of how the code should be used.
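Put together, a documented function looks roughly like this. The function add is a toy example of my own, not taken from any real package.

```r
#' Add two numbers
#'
#' @param x A numeric vector.
#' @param y A numeric vector, the same length as `x` or of length one.
#' @return A numeric vector containing `x + y`.
#' @examples
#' add(1, 2)
#' add(1:3, 10)
#' @export
add <- function(x, y) {
  x + y
}
```

Running devtools::document() on a package containing this file should produce man/add.Rd and add an export line for add to NAMESPACE.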
There are some other cool things you can do with Roxygen, so it's worth skimming through http://r-pkgs.had.co.nz/man.html.
To document S4 classes, write the Roxygen documentation right before setClass and use @slot instead of @param.
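A minimal sketch of a documented S4 class; the Person class and its slots are invented for illustration.

```r
#' A person
#'
#' @slot name A length-one character vector.
#' @slot age A length-one numeric vector, in years.
#' @export
Person <- methods::setClass(
  "Person",
  slots = c(name = "character", age = "numeric")
)

# setClass() returns a generator function for creating instances.
alice <- Person(name = "Alice", age = 30)
alice@name
```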
Roxygen documentation is also best learned by example. Look at how I document batchman::batch. Hadley is the master at documentation -- look at how he documents dplyr::distinct.
Vignettes are long-form documentation -- more like a short paper about a package and how to use it than a description of its functions. Usually a vignette describes the problem the package was meant to solve and gives examples of how to solve it.
You can see vignettes for a specific package with browseVignettes("packagename"), where "packagename" is the name of the package. You can read a specific vignette with vignette("vignettename"), where "vignettename" is the name of the vignette.
- Install R Markdown (install.packages("rmarkdown"))
- Install Knitr (install.packages("knitr"))
- Install Pandoc
- Run devtools::use_vignette("my-vignette"), where "my-vignette" is the desired name of your new vignette. (In recent devtools versions this helper lives in the usethis package as usethis::use_vignette().)
- Learn Markdown (if you haven't already).
- Learn how to use Knitr (if you haven't already).
- Edit vignettes/my-vignette.Rmd in your favorite editor.
- Run knitr::knit("vignettes/my-vignette.Rmd") in R to create the vignette markdown file. (Or just Ctrl/Cmd + Shift + K if you use RStudio.) You'll have to manually move it into the vignettes directory, I think.
- When you're ready, run devtools::build_vignettes() to build all the vignettes. Note that this is slow. Also note that it is usually better to just run devtools::build() to create a package bundle with the vignettes included.
Most R testing is done just by playing around in the console to see if things work, but this is a bad strategy long-term. Instead, you should run tests to make sure everything works as you'd expect.
To start testing, run devtools::use_testthat() (in recent devtools versions this helper lives in the usethis package as usethis::use_testthat()). This will use Hadley's testthat package, which I recommend you learn pretty well.
This would actually be a great time to just read the chapter in detail and not expect a summary.
You can also learn testthat pretty well by example.
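Here is a small sketch of a testthat test for a toy add function. The function and the file names in the comments are assumptions. The requireNamespace guard is only there so the snippet runs standalone; a real test file under tests/testthat/ would not need it.

```r
# Function under test (would normally live in R/).
add <- function(x, y) x + y

# Test (would normally live in a file such as tests/testthat/test-add.R).
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("add() sums its arguments", {
    # Expected behavior on ordinary input.
    testthat::expect_equal(add(1, 2), 3)
    testthat::expect_equal(add(1:3, 10), c(11, 12, 13))
    # One of the ways it should fail: non-numeric input.
    testthat::expect_error(add(1, "a"))
  })
}
```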
Bottom line: Automated testing is important for R and people don't do it enough. I'm particularly proud of the test coverage for Batchman. It takes a certain mindset to think through all the things your program is supposed to do and all the ways in which it could error to generate the correct tests.
While we learned about R/ (R code), vignettes/ (vignettes), man/ (Roxygen documentation), and tests/ (testthat tests), there are other directories you can have in your R package:
There's a directory for R packages called inst/. Generally this is just used to hold miscellaneous things.
Anything in inst/ is copied into the top-level directory of the package when it is installed. This means it's important to avoid making something like inst/R, since that will conflict with the R/ folder.
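Files that end up in the installed package's top-level directory can be located with base R's system.file(). The sketch below uses the stats package, which ships with R, so that it runs anywhere. For your own package, a file at inst/extdata/data.csv (a hypothetical path) would be found with system.file("extdata", "data.csv", package = "yourpackage").

```r
# Top-level directory of an installed package.
system.file(package = "stats")

# A file inside that top-level directory.
system.file("DESCRIPTION", package = "stats")

# system.file() returns an empty string when the file does not exist.
system.file("no-such-file.txt", package = "stats")
```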
It's possible to include C++ code within an R package via Rcpp. This code lives in a src/ directory. I've not yet done this, so I'm not including it in this review. Instead I recommend you dive into the R Packages chapter directly and I'll revisit this once I know C++ better.
There are yet more possible folders, but you'll never need them. The only interesting one is exec/ for executable scripts. Unlike files placed in other directories, files in exec/ are automatically executable. Could be cool.
Using GitHub is a must for making R packages. If you don't know what GitHub is, read this. If you don't know how to use GitHub, do this tutorial. Hadley's chapter on GitHub is extensive too; definitely read it if you don't know GitHub. However, covering GitHub is outside the scope of this review.
When you're working from GitHub, you should release updates to your package periodically in the form of "releases" where you increment a version number. This allows people to upgrade to more up-to-date versions as you publish them, as well as go back to a previous version if something in a later version is not working for them.
Versions are updated in two places: (1) within the DESCRIPTION file and (2) within the Git tag on GitHub.
To update the Git tag from the command line, make sure you have merged your change into master and have pulled the most up-to-date master locally. Then run git tag X.X.X (where X.X.X represents the correct version number) and git push origin X.X.X.
Thus the proper Git-friendly workflow for updating a package is to (1) make a change; (2) make a pull request; (3) within that PR, increment the version in the DESCRIPTION file; (4) merge the PR to the master branch; (5) check out and pull the master branch; (6) create a Git tag with the correct version number; (7) push that tag.
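Because the version lives in two places, one way to keep them from drifting is to derive the tag from DESCRIPTION. The sketch below only prints the two git commands rather than running them. It writes a stand-in DESCRIPTION to a temporary file so it runs anywhere, and the package name and version are made up.

```r
# Stand-in DESCRIPTION so the sketch runs anywhere; in a real package you
# would read the DESCRIPTION file at the repository root instead.
desc_path <- tempfile()
writeLines(c("Package: examplepkg", "Version: 0.1.9000"), desc_path)

# Read the version once, then build the tag commands from it.
version <- read.dcf(desc_path, fields = "Version")[1, 1]
tag_commands <- c(
  sprintf("git tag %s", version),
  sprintf("git push origin %s", version)
)

# Print the commands to run from the shell after pulling master.
cat(tag_commands, sep = "\n")
```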
Usually versions start at 1.0.0 and follow Semantic Versioning. However, a convention in the R world is to have your development package start at 0.0.9000, move up to 0.0.9001 with a patch, 0.1.9000 with a minor release, and 1.0.9000 with a major release. You only switch to something like 1.0.0 (no 9000) for stable releases that are well-tested and not in development, usually when you send something to CRAN. (You'd release 1.0.0 to CRAN, and then get started again on 1.0.9000 for development.)
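The development numbering described above is mechanical enough to sketch in a few lines. bump_dev_version is a hypothetical helper of my own, and it assumes the three-part scheme exactly as described in these notes.

```r
# Hypothetical helper implementing the three-part development scheme above.
bump_dev_version <- function(version, part = c("patch", "minor", "major")) {
  part <- match.arg(part)
  # Split "0.0.9000" into its integer components c(0, 0, 9000).
  v <- as.integer(strsplit(version, ".", fixed = TRUE)[[1]])
  stopifnot(length(v) == 3, !anyNA(v))
  if (part == "patch") {
    v[3] <- v[3] + 1L          # 0.0.9000 -> 0.0.9001
  } else if (part == "minor") {
    v[2] <- v[2] + 1L          # 0.0.9001 -> 0.1.9000
    v[3] <- 9000L
  } else {
    v[1] <- v[1] + 1L          # 0.1.9000 -> 1.0.9000
    v[2] <- 0L
    v[3] <- 9000L
  }
  paste(v, collapse = ".")
}

bump_dev_version("0.0.9000", "patch")  # "0.0.9001"
bump_dev_version("0.0.9001", "minor")  # "0.1.9000"
bump_dev_version("0.1.9000", "major")  # "1.0.9000"

# package_version() compares component by component, not as text,
# so the stable 1.0.0 release sorts before the 1.0.9000 development version.
package_version("0.0.9001") > package_version("0.0.9000")
package_version("1.0.0") < package_version("1.0.9000")
```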
R CMD check automatically checks the R code in your package for problems. It is the bane of R developers, because it has very aggressive requirements for passing, and passing it is a requirement for getting code onto CRAN, the public repository of R packages. Code that does not pass R CMD check is doomed to be installed from GitHub forever.
You can run R CMD check . from within the repository of a package (the check subcommand is lowercase), but it's better to just run devtools::check() from the R console.
R CMD check has over fifty checks. Read the section in Hadley's chapter for the rundown of what each check does and how to fix a failure.
I think it's ok if R CMD check fails for you, as long as you don't plan on submitting to CRAN. I don't think I personally have any packages that pass R CMD check. But it's definitely virtuous for you to pass R CMD check, just like it's virtuous to have 100% test coverage.
Lastly, if you're doing development on GitHub with PRs, you can benefit greatly from two tools -- Travis and Covr for R -- that improve testing for R packages.
Travis lets you run your test suite automatically upon every push to a pull request, so you make sure that you only merge in code that passes all of the tests.
Covr gives you a loose estimation of how much of your package is covered by tests and can give suggestions on where to add more tests.
To make it easy, my friend Robertzk has set up a package template repo with Travis and Covr already set up. To make it even easier, my friend Kirillseva turned that into an automatic package generator via Yeoman.