peterhurford/r-pkgs.md

Notes from reading through R Packages by Hadley Wickham. This is meant to review, not replace, a thorough read-through. I mainly wrote this as a personal review, since writing summaries and attempting to teach others are some of the best ways to learn things.
Introduction


Packages are used to organize code together so that it can be used repeatedly and shared with others.


A lot of work with packages is done via the devtools package.


Creating the Package

To create a package, use devtools::create("desired.name.of.package").
Naming

Naming things is hard. R's requirements for a package name are that (1) it may only contain (ASCII) letters, numbers, and periods; (2) it must start with a letter; (3) it can't end in a period; and (4) it must be at least two characters long. This means package names cannot have hyphens or underscores, unfortunately. Such is the R way.
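You can check these rules with a regular expression. This is a small sketch: `is_valid_pkg_name()` is a hypothetical helper, not part of devtools or base R. It also enforces the at-least-two-characters rule from the Writing R Extensions manual.

```r
# Hypothetical helper: TRUE for names R would accept as a package name.
# Rules: ASCII letters, digits, and "." only; starts with a letter;
# does not end with "."; at least two characters long.
is_valid_pkg_name <- function(name) {
  grepl("^[A-Za-z][A-Za-z0-9.]*[A-Za-z0-9]$", name, perl = TRUE)
}

is_valid_pkg_name(c("devtools", "data.table", "my_pkg", "2fast", "pkg.", "x"))
#> [1]  TRUE  TRUE FALSE FALSE FALSE FALSE
```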
Structure

Every package must have a file called DESCRIPTION, which holds the package metadata; a file called NAMESPACE, which lists the functions the package exports; and a directory called R/, which holds the package's R code.
Here's an example package so you can see what I mean.  It's easier to learn from examples.
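To see the required pieces without installing devtools, base R's `utils::package.skeleton()` will generate a bare-bones package. This substitutes a base-R function for the `devtools::create()` call above. The package name `mypkg` and the function `hello()` are placeholders.

```r
# Placeholder function used to seed the package with one R file.
hello <- function() "Hello from mypkg"

pkg_root <- tempdir()
utils::package.skeleton(
  name = "mypkg",               # placeholder package name
  list = "hello",               # names of objects to write into R/
  environment = environment(),  # where to look up those objects
  path = pkg_root,
  force = TRUE                  # overwrite an existing skeleton
)

pkg_dir <- file.path(pkg_root, "mypkg")
# The three required pieces: DESCRIPTION, NAMESPACE, and R/
file.exists(file.path(pkg_dir, c("DESCRIPTION", "NAMESPACE", "R")))
list.files(file.path(pkg_dir, "R"))
```

The skeleton also creates man/ and a Read-and-delete-me file. Expect to edit the generated DESCRIPTION by hand.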
Coding

Loading Files

Unlike in normal R scripts, you should not use library(), require(), or source() inside a package to load things. These calls change the user's search path or environment and can break things. Instead, declare dependencies in the DESCRIPTION file, which we will talk about later.
If you need side effects when the package loads (for example, setting default options), put them in .onLoad() or .onAttach().
Call functions from other packages using :: syntax (e.g., devtools::install instead of just install).
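Here is a sketch of what a file in R/ might look like under these rules, for a hypothetical package called `mypkg`. It has no library() calls, and it calls other packages' functions with `::`. Its `.onLoad()` hook, adapted from the options pattern in the book, sets a default option only if the user hasn't already set it. The option name `mypkg.verbose` and the function `robust_center()` are made up for illustration.

```r
# R/utils.R in a hypothetical package "mypkg"

# Use stats::median instead of calling library(stats).
robust_center <- function(x) {
  if (getOption("mypkg.verbose", FALSE)) message("Computing median")
  stats::median(x, na.rm = TRUE)
}

# Runs when the namespace is loaded. Sets default options without
# overwriting values the user has already set.
.onLoad <- function(libname, pkgname) {
  defaults <- list(mypkg.verbose = FALSE)
  unset <- !(names(defaults) %in% names(options()))
  if (any(unset)) options(defaults[unset])
  invisible()
}
```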
Style

Good style is important. Pick a consistent style (the book recommends Hadley's own style guide) and stick to it.
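Here is a small before-and-after. It assumes the conventions in Hadley's style guide: snake_case names, `<-` for assignment, and spaces around operators and after commas. The function itself is a made-up example. Both versions compute the same thing; only readability changes.

```r
# Harder to read: camelCase, = for assignment, no spacing, one line.
meanCenter=function(X){m=mean(X,na.rm=TRUE);X-m}

# The same function in a consistent style.
mean_center <- function(x) {
  center <- mean(x, na.rm = TRUE)
  x - center
}
```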
Metadata

The DESCRIPTION file is largely straightforward and best learned by example. The one challenging thing is Imports vs. Suggests.
Imports vs. Suggests

Look at this DESCRIPTION file from devtools:
It Depends on a specific R version (3.0.2 or higher). This means the package can't be installed on R 3.0.1 or earlier.
It Imports several packages (httr, curl, etc.). These packages are installed automatically when the package is installed, and the package will not work without them.
It Suggests other packages (testthat, Rcpp, etc.)  These packages will not be installed automatically and the package supposedly works fine without them, but installing them is recommended.
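The devtools file itself isn't reproduced here, so this sketch uses a made-up, trimmed-down DESCRIPTION for a hypothetical package `mypkg` with the same three fields. DESCRIPTION files use R's DCF format, so base R's `read.dcf()` can parse one. That gives a quick way to list hard versus optional dependencies.

```r
# A made-up, trimmed DESCRIPTION for a hypothetical package "mypkg".
# (Field lines must not be indented; indented lines are continuations.)
description <- "Package: mypkg
Title: What the Package Does
Version: 1.0.0
Depends: R (>= 3.0.2)
Imports: httr, curl
Suggests: testthat, Rcpp
"

con <- textConnection(description)
fields <- read.dcf(con)  # one-row character matrix, one column per field
close(con)

# Split a comma-separated dependency field into package names.
deps <- function(field) trimws(strsplit(fields[, field], ",")[[1]])

fields[, "Depends"]  # minimum R version
deps("Imports")      # installed along with mypkg; required to run it
deps("Suggests")     # optional; not installed automatically
```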
When to use Imports, when to use Suggests

I'd recommend putting something in Imports if the package would break without it, and Suggests otherwise.  Suggests is good for (a) things that are only run in tests (since the user doesn't have to run the tests to make the package work) or (b) things that are only used in a one-off unimportant function (since the user can still use the rest of the package just fine).
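For the Suggests case, the usual pattern is to check for the package at run time with `requireNamespace()` and fail with a helpful message if it's missing. In this sketch, `check_suggested()` and `to_json()` are hypothetical functions, and jsonlite stands in for any suggested package.

```r
# Hypothetical helper: stop with a clear message if an optional
# (Suggests) package isn't installed.
check_suggested <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is needed for this function. ",
         "Please install it with install.packages(\"", pkg, "\").",
         call. = FALSE)
  }
  invisible(TRUE)
}

# A one-off function that only works if the suggested package is present.
# The rest of the package never touches jsonlite.
to_json <- function(x) {
  check_suggested("jsonlite")
  jsonlite::toJSON(x)
}
```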
Documenting Code with Roxygen

Documenting your code in your package is technically optional, but highly recommended.
Most people these days use Roxygen. This is a four-step process:
1.) Add roxygen comments to your .R files.
2.) Run devtools::document() to convert roxygen comments to .Rd files.
3.) Preview documentation with ?functionname, where functionname is the name of the function you documented.
4.) Rinse and repeat until the documentation looks the way you want.
Roxygen Basics

The most important part of Roxygen is #' @export, which exports your function from your package and makes it available to others.
@param documents how parameters work. These are also used to express non-binding type preconditions. You can use @inheritParams to reuse parameter documentation from another function (more DRY).
@return describes what the function should return.
@examples gives examples of how the code should be used.
There are some other cool things you can do with Roxygen, so it's worth skimming through http://r-pkgs.had.co.nz/man.html.
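Here is a minimal roxygen sketch for two hypothetical functions, `add()` and `subtract()`, showing the tags above. Running `devtools::document()` on it would generate man/add.Rd and man/subtract.Rd and add both functions to NAMESPACE as exports. The second function reuses `add()`'s parameter docs via @inheritParams.

```r
#' Add two numbers
#'
#' Adds \code{x} and \code{y} element-wise.
#'
#' @param x A numeric vector.
#' @param y A numeric vector, recycled to the length of \code{x}.
#' @return A numeric vector.
#' @examples
#' add(1, 2)
#' add(1:3, 10)
#' @export
add <- function(x, y) {
  x + y
}

#' Subtract two numbers
#'
#' @inheritParams add
#' @return A numeric vector.
#' @export
subtract <- function(x, y) {
  x - y
}
```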
Documenting S4

To document S4 classes, write the Roxygen documentation right before setClass and use @slot instead of @param.
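Here is a sketch of an S4 class documented this way. The `Person` class and its slots are made up for illustration. `methods::` is used explicitly, in keeping with the advice against library() calls.

```r
#' A person
#'
#' A minimal S4 class used to illustrate documenting slots.
#'
#' @slot name A length-one character vector.
#' @slot age A length-one numeric vector.
#' @export
methods::setClass("Person", slots = c(name = "character", age = "numeric"))

# Usage: create an instance and read its slots.
ada <- methods::new("Person", name = "Ada", age = 36)
ada@name
```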
Learning Roxygen by Example

Roxygen documentation is also best learned by example.  Look at how I document batchman::batch.  Hadley is the master at documentation -- look at how he documents dplyr::distinct.
Documenting with Vignettes

Vignettes are long-form documentation: more like a quick paper about a package and how to use it than a description of its functions. Usually vignettes describe the problem the package was meant to solve and give examples of how to solve it.
You can see vignettes for a specific package with browseVignettes("packagename"), where "packagename" is the name of the package.  You can read a specific vignette with vignette("vignettename") where "vignettename" is the name of the vignette.
How to Write a Vignette


Install R Markdown (install.packages("rmarkdown"))
Install Knitr (install.packages("knitr"))
Install Pandoc
Run devtools::use_vignette("my-vignette") where "my-vignette" is the desired name of your new vignette.
Learn Markdown (if you haven't already).
Learn how to use Knitr (if you haven't already).
Edit vignettes/my-vignette.Rmd in your favorite editor.
Run knitr::knit("vignettes/my-vignette.Rmd") in R to create the vignette markdown file. (Or just Ctrl/Cmd + Shift + K if you use RStudio.)  You'll have to manually move it into the vignettes directory, I think.
When you're ready, run devtools::build_vignettes() to build all the vignettes. Note that this is slow. Also note that it is usually better to just run devtools::build() to create a package bundle with the vignettes included.

Automated Testing

Most R testing is done by playing around in the console to see if things work, but this is a bad strategy long-term. Instead, you should write automated tests that check everything works as you'd expect.
To start testing, run devtools::use_testthat(). This will use Hadley's testthat package, which I recommend you learn pretty well.
This would actually be a great time to just read the chapter in detail and not expect a summary.
You can also learn testthat pretty well by example.
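Here is a sketch of a test file in the tests/testthat/ layout that devtools::use_testthat() sets up. The function under test, `add()`, is a hypothetical stand-in defined inline so the snippet runs on its own. In a real package it would live in R/, and the test runner would load it. The testthat functions are called with `::` so this also runs outside the test runner. Inside tests/testthat/ you would normally write plain `test_that()` and `expect_equal()`.

```r
# tests/testthat/test-add.R (hypothetical)
add <- function(x, y) x + y  # stand-in for a function defined in R/

testthat::test_that("add() sums numbers", {
  testthat::expect_equal(add(1, 2), 3)
  testthat::expect_equal(add(1:3, 10), c(11, 12, 13))
})

testthat::test_that("add() rejects non-numeric input", {
  testthat::expect_error(add(1, "a"))
})
```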
Bottom line: Automated testing is important for R and people don't do it enough.  I'm particularly proud of the test coverage for Batchman.  It takes a certain mindset to think through all the things your program is supposed to do and all the ways in which it could error to generate the correct tests.
Other Directories

While we learned about R/ (R code), vignettes/ (vignettes), man/ (documentation generated by Roxygen), and tests/ (testthat tests), there are other directories you can have in your R package:
inst/

There's a directory for R packages called inst/.  Generally this is just used to hold miscellaneous things.
Anything in inst/ is copied into the top-level directory of the installed package. This means it's important to avoid making something like inst/R, since that will conflict with the R/ folder.
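Because inst/ is copied into the installed package's top level, code should locate those files with `system.file()` rather than a hard-coded path. The file `extdata/example.csv` and package `mypkg` below are hypothetical. The second call uses the base stats package so the snippet runs anywhere.

```r
# In a package, inst/extdata/example.csv is installed as
# <library>/mypkg/extdata/example.csv and located like this.
# system.file() returns "" if the file or package doesn't exist.
path <- system.file("extdata", "example.csv", package = "mypkg")

# Runnable check against an installed base package: every installed
# package has a DESCRIPTION file at its top level.
stats_desc <- system.file("DESCRIPTION", package = "stats")
file.exists(stats_desc)
```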
src/

It's possible to include C++ code within R via Rcpp. The C++ source files go in a src/ directory and are compiled when the package is installed. I've not yet done this, so I'm not including it in this review. Instead I recommend you dive into the R Packages chapter directly, and I'll revisit this once I know C++ better.
Other Others

There are yet more possible folders, but you'll never need them.  The only interesting one is exec/ for executable scripts.  Unlike files placed in other directories, files in exec/ are automatically executable.  Could be cool.
Git and GitHub

Using GitHub is a must for making R packages.  If you don't know what GitHub is, read this.  If you don't know how to use GitHub, do this tutorial.  Hadley's chapter on GitHub is extensive too; definitely read it if you don't know GitHub.  However, covering GitHub is outside the scope of this review.
Releases

When you're working from GitHub, you should periodically publish updates to your package as "releases", each with an incremented version number. This lets people upgrade to newer versions as you produce them, and go back to a previous version if something in a newer one isn't working for them.
Versions are updated in two places: (1) within the DESCRIPTION file and (2) within the GitHub tag.
Versioning from the Command Line

To update the GitHub tag from the command line, make sure you have merged your change into master and have pulled the most up-to-date master locally. Then run git tag X.X.X (where X.X.X is the correct version number) and git push origin X.X.X.
Thus the proper Git-friendly workflow for updating a package is to (1) make a change; (2) make a pull request; (3) within that PR, increment the version in the DESCRIPTION file; (4) merge the PR into the master branch; (5) check out and pull the master branch; (6) create a Git tag with the correct version number; (7) push the tag to GitHub.
The Correct Version Number

Released versions usually follow Semantic Versioning (major.minor.patch, e.g., 1.0.0). On top of that, a convention in the R world is to mark in-development versions with a fourth component that starts at 9000. The first development version of a package is 0.0.0.9000, and you can bump it (0.0.0.9001, 0.0.0.9002, ...) as development continues. You only switch to a three-component version like 1.0.0 (no 9000) for stable releases that are well-tested and not in development, usually when you send something to CRAN. (You'd release 1.0.0 to CRAN, and then get started again on 1.0.0.9000 for development.)
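R has a built-in version class, so you can check that development versions sort where you'd expect. This example uses the book's four-component form (release 1.0.0, development 1.0.0.9000). `next_dev_version()` is a hypothetical helper: given a release, it appends .9000; given a development version, it bumps the fourth component by one.

```r
# A development version sorts after the release it builds on
# and before the next release.
package_version("1.0.0.9000") > package_version("1.0.0")  # TRUE
package_version("1.0.0.9000") < package_version("1.0.1")  # TRUE
utils::compareVersion("1.0.0.9000", "1.0.0")              # 1

# Hypothetical helper following the .9000 convention.
next_dev_version <- function(v) {
  parts <- unlist(unclass(package_version(v)))  # e.g. c(1L, 0L, 0L)
  if (length(parts) == 3L) {
    parts <- c(parts, 9000L)        # release -> first development version
  } else if (length(parts) == 4L) {
    parts[4L] <- parts[4L] + 1L     # development -> next development version
  } else {
    stop("Expected a version with three or four components.")
  }
  package_version(paste(parts, collapse = "."))
}

next_dev_version("1.0.0")
next_dev_version("1.0.0.9000")
```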
R CMD check

R CMD check automatically checks your package for problems. It is the bane of R developers because its requirements are very aggressive, and passing it is required to get code onto CRAN, the public repository of R packages. Code that does not pass R CMD check is doomed to be installed from GitHub forever.
You can run R CMD check . from within the package's directory, but it's better to just run devtools::check() from the R console.
R CMD check runs over fifty checks. Read Hadley's chapter for a rundown of what each check does and how to fix a failure.
I think it's OK if R CMD check fails for you, as long as you don't plan on submitting to CRAN. I don't think I personally have any packages that pass R CMD check. But it's definitely virtuous to pass R CMD check, just like it's virtuous to have 100% test coverage.
Travis and Covr

Lastly, if you're doing development on GitHub with PRs, you can benefit greatly from two tools -- Travis and Covr for R -- that improve testing for R packages.
Travis lets you run your test suite automatically upon every push to a pull request, so you make sure that you only merge in code that passes all of the tests.
Covr gives you a rough estimate of how much of your package is covered by tests. It also shows which lines no test runs, which tells you where to add more tests.
To make it easy, my friend Robertzk has set up a package template repo with Travis and Covr already set up.  To make it even easier, my friend Kirillseva turned that into an automatic package generator via Yeoman.