
@stevenpollack
Last active February 28, 2024 00:56
A simple R package development best-practices-example

R package development "best practices"

The core of this tutorial gist lies in bestPracticesWalkThrough.R. Running it assumes you have the following packages installed at (or above) the versions specified:

library('devtools') # 1.9.1
library('testthat') # 0.11.0
library('stringr')  # 1.0.0
library('git2r')    # 0.12.1
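If you'd like to confirm those minimum versions up front, a check along these lines should do it. It is a sketch rather than part of the gist: meetsMinVersion() is a hypothetical helper that only inspects installed packages (via system.file() and utils::packageVersion()) without attaching them.

```r
# Hypothetical helper (not part of the gist): TRUE for each package that is
# installed at (or above) the given minimum version, FALSE otherwise.
meetsMinVersion <- function(required) {
  vapply(names(required), function(pkg) {
    # system.file() returns "" when a package is not installed
    if (!nzchar(system.file(package = pkg))) return(FALSE)
    utils::packageVersion(pkg) >= required[[pkg]]
  }, logical(1))
}

required <- c(devtools = "1.9.1", testthat = "0.11.0",
              stringr = "1.0.0", git2r = "0.12.1")
print(meetsMinVersion(required))
```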

If you want to run the tutorial in one shot, run this from the R interpreter:

devtools::source_gist('https://gist.github.com/stevenpollack/141b14437c6c4b071fff')

Although I've provided a few references below, reading the testthat docs and copying the roxygen code in extractPhoneNumbers.R as a template should be enough to get you up and running.

References

It doesn't matter what you read in a blog or elsewhere: the letter of the law for R packages (and CRAN packages in particular) is Writing R Extensions, the CRAN manual on packages. From cryptic errors thrown by R CMD check to the protocol for adding data to your package, you're gonna want to consult that resource. It's not an easy read at first, but you eventually get used to it and develop a muscle memory for where and how to look things up.

That being said, writing packages to the letter of the law is a particularly onerous task, which is why tools like roxygen2, devtools, and testthat were developed. The devtools-endorsed check(), build(), install() workflow is pretty well documented. While roxygen2 has lots of documentation scattered throughout various sources (e.g. R Packages and the roxygen2 vignettes), there is one particular feature that I find shabbily documented: templates.

Templates

Roxygen templates are immensely powerful. The most basic way of using a template would be to have static content, maybe something like:

#' @param x a character or numeric vector
#' @param y a character or numeric vector

and you'd save this chunk of text in the man-roxygen/ directory of your package, under the filename xyParam.R. Then, if you had a function,

plot <- function(x, y, ...) {
    # put real code here
}

you could skimp on the roxumentation by using the following roxygen block:

#' Plot Y vs. X
#'
#' @description Create a basic scatter plot that puts \code{y} on the
#' y-axis, and \code{x} on the x-axis. Categorical and numeric variables
#' are both valid input.
#'
#' @template xyParam
#' @param ... extra parameters to be passed onto \code{\link{someOtherFunction}}
#'
#' @return plot of y vs. x that's sent to the X11 (or XQuartz or whatever) server.
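Because roxygen runs every template through brew before reading it, a static template like xyParam.R simply passes through unchanged. The sketch below writes the template into a throwaway directory under tempdir() (an assumption for the demo) and brews it by hand with brew::brew() to show that round trip; it is not how roxygen itself is invoked.

```r
library(brew)

# Throwaway package directory for this sketch
pkgDir <- file.path(tempdir(), "xyParamDemo")
dir.create(file.path(pkgDir, "man-roxygen"), recursive = TRUE, showWarnings = FALSE)

# The static template: plain roxygen lines, no brew chunks
templateLines <- c("#' @param x a character or numeric vector",
                   "#' @param y a character or numeric vector")
templatePath <- file.path(pkgDir, "man-roxygen", "xyParam.R")
writeLines(templateLines, templatePath)

# Brew the file and capture what it writes to standard output
rendered <- capture.output(brew(templatePath))
print(rendered)
```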

And you might content yourself with working like this for a while, but then you may find yourself writing a function,

threeWayDistance <- function(x, y, z) {
    # more code
}

whose roxygen block looks like

#' Calculate the three-way distance between variables
#'
#' @description Some sort of mumbo-jumbo
#'
#' @template xyParam
#' @param z a character or numeric vector
#'
#' @return a numeric value indicating the distance between
#' \code{x}, \code{y}, and \code{z}.

Now we've hit a weird violation of the DRY principle: the documentation for z differs from that of x and y only in the parameter name. If only there were a way to abstract the documentation so we didn't have to go through all of this silliness!?!

The answer lies in an innocent paragraph in the Roxygen templates section of the intro vignette -- I'll quote the whole section, it's deceptively shallow:

Roxygen templates are R files containing only roxygen comments that live in the man-roxygen directory. Use @template file-name (without extension) to insert the contents of a template into the current documentation.

You can make templates more flexible by using template variables defined with @templateVar name value. Template files are run with brew, so you can retrieve values (or execute any other arbitrary R code) with <%= name %>.

Note that templates are parsed a little differently to regular blocks, so you’ll need to explicitly set the title, description and details with @title, @description and @details.

You'll notice the second paragraph gives a very quick intro to brew. Understanding brew (a templating language not unlike knitr) is paramount to successfully wielding template variables. The help for brew::brew() contains syntax explanations and examples in its details section; I won't repeat them here, but I will go through a practical example of template variables that necessarily relies on brew's capabilities.
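As a minimal illustration of the <%= name %> syntax, the sketch below brews a one-line template against an environment holding a variable called name. It only approximates what roxygen does with @templateVar name x (roxygen builds that environment for you); the environment and the variable are assumptions for the demo.

```r
library(brew)

# Stand-in for the environment roxygen builds from `@templateVar name x`
templateVars <- new.env()
assign("name", "x", envir = templateVars)

template <- "#' @param <%= name %> a character or numeric vector"

# <%= ... %> evaluates `name` in templateVars and splices the result into the text
rendered <- capture.output(brew(text = template, envir = templateVars))
print(rendered)
```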

We identified above that there is some repetition in our @param tags. What if we could make the name of the parameter a variable, so we could use a roxumentation block like:

#' @paramName x
#' @someTemplate

and the processed help file would look identical to what would've happened if we had written

#' @param x a character or numeric vector

We cannot achieve exactly that, BUT we can come close: if we build a template like

<%
# helper function to evaluate stringified R-code
evalString <- function(str) {
  eval.parent(parse(text=str))
}

# helper constants
roxygenBlock <- c("#' @param", "a numeric or character vector")

# assign the parsed value of paramNames (passed in as a string) to params
evalString(paste0("params <- ", paramNames))

# for each parameter, pump out the appropriate roxygen block;
# be sure to use `cat` since we are in a <\% and not a <\%= brew-chunk
sapply(params,
       function(param) {
         cat(paste(roxygenBlock[1], param, roxygenBlock[2]), fill = TRUE)
       })
%>

and save this as man-roxygen/vectorParams.R, then we could modify threeWayDistance's (and similarly plot's) roxygen block to look like:

#' Calculate the three-way distance between variables
#'
#' @description Some sort of mumbo-jumbo
#'
#' @templateVar paramNames c('x', 'y', 'z')
#' @template vectorParams
#'
#' @return a numeric value indicating the distance between
#' \code{x}, \code{y}, and \code{z}.
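To check what such a template will hand to roxygen, you can brew it yourself with the template variable set the way roxygen would set it: paramNames arrives as the string "c('x', 'y', 'z')", which is why the template has to parse it. The sketch below inlines a trimmed-down copy of the template as a string and swaps evalString() for a direct eval(parse(...)); both are simplifications for this demo.

```r
library(brew)

# Trimmed-down copy of the vectorParams template, held in a string for the demo
template <- paste(
  "<%",
  "params <- eval(parse(text = paramNames))",
  "for (param in params) {",
  "  cat(\"#' @param\", param, \"a numeric or character vector\", fill = TRUE)",
  "}",
  "%>",
  sep = "\n")

# Stand-in for what roxygen creates from `@templateVar paramNames c('x', 'y', 'z')`
templateVars <- new.env()
assign("paramNames", "c('x', 'y', 'z')", envir = templateVars)

rendered <- capture.output(brew(text = template, envir = templateVars))
rendered <- rendered[nzchar(rendered)]  # drop any trailing blank line
print(rendered)
```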

To see this in action, with an extra layer of abstraction, I've included vectorParams.R and threeWayDistance.R in this gist, and in the package built by bestPracticesWalkThrough.R. Templates (in particular, brew) can feel a bit awkward in the beginning -- brew is a bit fussy when you try to write a brew chunk (e.g. <% ... %> or <%= ... %>) inside roxygen's #'. When your chunks are sitting in open air (i.e. not preceded by any #'s), you can write R code like normal. When they're sitting inside roxygen blocks, you need to be sure to wrap every line. E.g.,

#' <%
#' stringFun <- function(str) {
#'    paste0(str, collapse="8")
#' }
#' %>

will NOT initialize stringFun... brew will behave as if you never wrote it. Instead, you'd have to write

#' <% stringFun <- function(str) { %>
#' <%   paste0(str, collapse="8")  %>
#' <% } %>

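... Don't think too hard about it, it's just a consequence of the #'s: in the first version, every line inside the <% ... %> chunk starts with #', so R parses the whole chunk as comments.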
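You can see the effect by brewing both forms and asking whether stringFun exists afterwards. In this sketch the function body is collapsed onto one line, and a <%= exists('stringFun') %> probe chunk is appended to each template; both are assumptions to keep the demo short, and it exercises brew only, not roxygen. The first template should report FALSE and the second TRUE.

```r
library(brew)

# Version 1: one chunk spread over several #' lines
multiLine <- paste("#' <%",
                   "#' stringFun <- function(str) paste0(str, collapse = '8')",
                   "#' %>",
                   sep = "\n")

# Version 2: the whole definition sits inside a single one-line chunk
singleLine <- "#' <% stringFun <- function(str) paste0(str, collapse = '8') %>"

# Probe appended to each template: prints whether stringFun is visible
# where the template's code runs
probe <- "<%= exists('stringFun') %>"

out1 <- capture.output(brew(text = paste(multiLine, probe, sep = "\n"), envir = new.env()))
out2 <- capture.output(brew(text = paste(singleLine, probe, sep = "\n"), envir = new.env()))
print(out1)
print(out2)
```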

A good production use of templates can be found in the R-Shopify interface; see, for example, Shop.R and man-roxygen/api.r.

.Rbuildignore

# ignore roxygen templates:
man-roxygen
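Each .Rbuildignore line is a Perl-style regular expression that R CMD build matches case-insensitively against file and directory paths relative to the package root, so the unanchored man-roxygen entry above also excludes any path that merely contains that text (dropping a directory drops its contents too). The sketch below approximates that matching with grepl(); the file list is made up, and anchoring the pattern as ^man-roxygen$ is a suggested tweak, not what the gist ships.

```r
# Made-up package paths, relative to the package root
paths <- c("man-roxygen", "man-roxygen/vectorParams.R", "man",
           "R/man-roxygen-helpers.R")

# Approximates how R CMD build applies one .Rbuildignore pattern
ignored <- function(pattern, paths) {
  grepl(pattern, paths, perl = TRUE, ignore.case = TRUE)
}

print(ignored("man-roxygen", paths))    # also catches R/man-roxygen-helpers.R
print(ignored("^man-roxygen$", paths))  # only the top-level directory itself
```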
bestPracticesWalkThrough.R

library('devtools') # 1.9.1
library('testthat') # 0.11.0
library('stringr')  # 1.0.0
library('git2r')    # 0.12.1

pkgName <- "demoPkg"

# set the author and license of the package:
options(devtools.desc.author = '"Guy Fawkes <guy@fawkes.net> [aut,cre]"',
        devtools.desc.license = "GPL-3")

# set the other fields of the DESCRIPTION file
pkgDescription <- list("Title" = "R development examples",
                       "Version" = "0.1.0",
                       "Description" = "This is a barebones package meant to demonstrate \
best practices with devtools, testthat, and roxygen.")

devtools::create(path = pkgName,
                 description = pkgDescription,
                 rstudio = FALSE,
                 check = TRUE) # setting check = TRUE to validate your DESCRIPTION parameters
# add testthat infrastructure to the package
devtools::use_testthat(pkg = pkgName)
# initialize a git repo -- this can be done with certain convenience functions
# from devtools, but for the sake of demonstration, we'll be a bit more bare metal
pkgRepo <- git2r::init(path = pkgName)
# configure your repo:
# only necessary if you haven't configured global user names and emails;
# ignore the error since git2r::git2r_signature_default is anticipated to fail
git2r::config(pkgRepo,
              global = FALSE,
              user.name = "Guy Fawkes",
              user.email = "guy@fawkes.net")
# make your first commit -- add everything
git2r::add(pkgRepo, "*")
git2r::commit(pkgRepo, message = "Initial commit")
# code your first function, maybe it looks like
# https://gist.github.com/stevenpollack/141b14437c6c4b071fff
# pull that in and then save it in the R/ directory as "extractPhoneNumbers.R"...
git2r::clone(url = 'https://gist.github.com/stevenpollack/141b14437c6c4b071fff',
             local_path = 'tmp_gist_dir')
file.rename(from = file.path("tmp_gist_dir", "extractPhoneNumbers.R"),
            to = file.path(pkgName, "R", "extractPhoneNumbers.R"))
# While this adds a new function to our package's NAMESPACE
# (thanks to @export), we still need to update our DESCRIPTION:
# add stringr to your package dependencies via the DESCRIPTION.
# Note: stringr v0.9.3 has a different API than stringr v1.0.0+,
# so we'll want to be explicit that we need a version >= 1.0.0...
devtools::use_package(package = 'stringr',
                      type = 'Imports',
                      pkg = pkgName)
# Now, you'll want to make unit-tests for extractPhoneNumbers to make
# later refactoring less scary... In order for testthat to run them,
# they'll have to sit in tests/testthat
unitTests <- "test-extractPhoneNumbers.R"
testthatDir <- file.path(pkgName, "tests", "testthat")
file.rename(from = file.path("tmp_gist_dir", unitTests),
            to = file.path(testthatDir, unitTests))
# let's run the test to check it works
devtools::test(pkg = pkgName)
# wonderful!
# at this point we'll want to check the package
# to make sure everything builds okay. I.e.,
# the roxumentation is properly formed, and the
# unit tests don't fail.
devtools::check(pkg = pkgName)
# you'll always want to make sure the roxumentation
# LOOKS good, so:
# let's install our package and then look at the help...
devtools::dev_mode() # if you don't know what this is, read the docs
devtools::load_all(pkg = pkgName)
devtools::dev_help(topic = "extractPhoneNumbers", type = 'html')
# Now, let's pull in a function that
# demonstrates template (and templateVar) usage:
file.rename(from = file.path("tmp_gist_dir", "threeWayDistance.R"),
            to = file.path(pkgName, "R", "threeWayDistance.R"))
# however, this won't build properly unless we
# bring in the appropriate template...
# first, we need to make the man-roxygen/ directory, otherwise
# roxygen won't be able to find the template
dir.create(file.path(pkgName, "man-roxygen"))
file.rename(from = file.path("tmp_gist_dir", "vectorParams.R"),
            to = file.path(pkgName, "man-roxygen", "vectorParams.R"))
# But now that we've brought a "foreign" folder into our
# package, we'll need to tell R to ignore it during builds;
# hence, we'll include the directory in our .Rbuildignore:
file.rename(from = file.path("tmp_gist_dir", ".Rbuildignore"),
            to = file.path(pkgName, ".Rbuildignore"))
# again, we'll want to load and check that things look good:
devtools::load_all(pkg = pkgName)
devtools::check(pkg = pkgName)
devtools::dev_help(topic = "threeWayDistance", type = 'html') #this is a bug
#devtools::dev_help(topic = "threeWayDistance", type = 'text') #this is a bug
# finally, if everything looks good you'll want to stage your
# file-changes and make meaningful commit messages...
#
# I generally believe in writing commit messages like emails:
# you should have a proper subject, and some informative body.
# In abstract, you should format (whitespace in all), something like
#
# SUBJECT
#
# * something
# * something else
#
# I'll do a partial demonstration:
git2r::add(pkgRepo, file.path("DESCRIPTION")) # note that all paths are relative to repo root
git2r::commit(pkgRepo,
              message = "updated Imports in DESCRIPTION: \n \
* added stringr -- still need to explicitly label the minimum package version.")
git2r::add(pkgRepo, file.path("R", "threeWayDistance.R"))
git2r::add(pkgRepo, file.path("man-roxygen", "vectorParams.R"))
git2r::commit(pkgRepo,
              message = "Created and exported threeWayDistance.R \n \
* used to demonstrate templates and template variables.
* corresponding template (man-roxygen/vectorParams.R) is also being committed")
git2r::add(pkgRepo, file.path("R", "extractPhoneNumbers.R"))
git2r::add(pkgRepo, file.path("tests", "testthat", unitTests))
git2r::add(pkgRepo, file.path("NAMESPACE"))
git2r::add(pkgRepo, file.path("man", "*")) # add the help .Rd files
git2r::commit(pkgRepo,
              message = "Created and exported extractPhoneNumbers.R \n \
* imports from stringr \
* does no input type coercion -- if the input isn't right, it fails. \
* unit test coverage is hella incomplete.")
# Now, let's clean up after ourselves
unlink(x = "tmp_gist_dir", recursive = TRUE)
# if you want to delete this package, and move out of dev_mode uncomment below:
# unlink(pkgName, recursive = TRUE)
# devtools::dev_mode()
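The walkthrough builds its commit messages with backslash-continued strings, which makes the whitespace easy to get wrong. One alternative, sketched here with a hypothetical commitMessage() helper that is not part of the gist, is to assemble the SUBJECT / blank line / bullet layout described in the walkthrough from character vectors and pass the result as the message argument of git2r::commit().

```r
# Hypothetical helper: subject line, blank line, then one "* " bullet per item
commitMessage <- function(subject, bullets) {
  paste(c(subject, "", paste("*", bullets)), collapse = "\n")
}

msg <- commitMessage("Created and exported extractPhoneNumbers.R",
                     c("imports from stringr",
                       "does no input type coercion -- if the input isn't right, it fails.",
                       "unit test coverage is hella incomplete."))
cat(msg, "\n")
# then, e.g.: git2r::commit(pkgRepo, message = msg)
```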
extractPhoneNumbers.R

#' @title Extract Phone Numbers
#'
#' @description Search for and extract all 10-digit phone numbers in a string,
#' provided the phone numbers are visually delimited with the typical characters
#' (i.e. '(', ')', '.', '+', '-', '_', or ' ').
#'
#' @param inputStr a character vector (supposedly) containing phone numbers
#' to be extracted.
#'
#' @return a list with one element per entry of \code{inputStr}; each element
#' is a character vector of the phone numbers extracted from that entry
#' (empty if none were found).
#'
#' @importFrom stringr %>% str_replace_all str_extract_all
#' @export
#' @examples
#' testStrings <- c("1234567890",
#' "123 456 7890",
#' "123.456.7890",
#' "(123) 456 7890",
#' "(123) 456 78 90",
#' "123.456.78.90",
#' "12 34 56 78 90",
#' "12.34.56.78.90",
#' "call me at 1234567890 OR 1234567890")
#'
#' extractPhoneNumbers(testStrings)
extractPhoneNumbers <- function(inputStr) {
  # check input:
  if (!is.character(inputStr)) {
    stop("'inputStr' must be a (vector of) string(s)!")
  }

  # imports
  `%>%` <- stringr::`%>%`
  str_replace_all <- stringr::str_replace_all
  str_extract_all <- stringr::str_extract_all

  # intermediary regex's
  visualDelimitersRegex <- "[()+\\-_. ]"
  phoneNumberRegex <- "[:digit:]{10}"

  inputStr %>%
    str_replace_all(pattern = visualDelimitersRegex, replacement = "") %>%
    str_extract_all(pattern = phoneNumberRegex)
}
test-extractPhoneNumbers.R

context("extractPhoneNumbers")

test_that("expected cases are properly extracted", {
  testStrings <- c("1234567890",
                   "123 456 7890",
                   "123.456.7890",
                   "(123) 456 7890",
                   "(123) 456 78 90",
                   "123.456.78.90",
                   "12 34 56 78 90",
                   "12.34.56.78.90",
                   "call me at 1234567890 OR 1234567890")
  expectedOutput <- list("1234567890",
                         "1234567890",
                         "1234567890",
                         "1234567890",
                         "1234567890",
                         "1234567890",
                         "1234567890",
                         "1234567890",
                         c("1234567890", "1234567890"))
  expect_equal(extractPhoneNumbers(testStrings), expectedOutput)
})
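Since the last commit message admits the unit-test coverage is incomplete, here is one way to extend it: a sketch of two more test_that() blocks, covering the input check and a string with no phone number. They are not part of the gist. To keep the block self-contained, it repeats a compact copy of extractPhoneNumbers() (same regexes, with the %>% chain written as nested calls); inside the package you would drop that copy and put the tests in tests/testthat/.

```r
library(testthat)
library(stringr)

# Compact copy of extractPhoneNumbers() so this sketch runs on its own
extractPhoneNumbers <- function(inputStr) {
  if (!is.character(inputStr)) {
    stop("'inputStr' must be a (vector of) string(s)!")
  }
  str_extract_all(str_replace_all(inputStr, "[()+\\-_. ]", ""), "[:digit:]{10}")
}

test_that("non-character input is rejected", {
  expect_error(extractPhoneNumbers(1234567890), "must be a")
})

test_that("strings without a 10-digit number yield empty matches", {
  expect_identical(extractPhoneNumbers("no number here"), list(character(0)))
})
```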
threeWayDistance.R

#' Calculate the three-way distance between variables
#'
#' @description Some sort of mumbo-jumbo
#'
#' @templateVar paramNames c('x', 'y', 'z')
#' @templateVar paramTypes c('numeric', 'character')
#' @template vectorParams
#'
#' @return a numeric value indicating the distance between
#' \code{x}, \code{y}, and \code{z}.
#' @export
threeWayDistance <- function(x, y, z) {
  return(NULL)
}
vectorParams.R

# we want to build a roxygen block that looks like
#
# @param param1 a type1, type2, ... or typeN vector
# ...
# @param paramM a type1, type2, ... or typeN vector
#
# where we are fed c('param1', ..., 'paramM') as
# templateVar 'paramNames', and c('type1', ..., 'typeN')
# as templateVar 'paramTypes'.
#
# Hard-coding 'vector' as the parameters' class keeps
# this code from being completely unreadable; we could
# easily generalize it.
<%
# helper function to evaluate stringified R-code
evalString <- function(str) {
  eval.parent(parse(text=str))
}

# assign the parsed values of paramNames and paramTypes
# (both passed in as strings) to params and types
evalString(paste0("params <- ", paramNames))
evalString(paste0("types <- ", paramTypes))

# build the model roxygen block backwards:
# be sure to account for the situation where multiple
# types are passed in.
roxygenBlock <- tail(types, n=1)
numTypes <- length(types)
if (numTypes > 1) {
  roxygenBlock <-
    paste(paste(types[1:(numTypes-1)], collapse=", "),
          "or",
          roxygenBlock)
}
roxygenBlock <- c("#' @param",
                  paste("a", roxygenBlock, "vector"))

# for each parameter, pump out the appropriate roxygen block;
# be sure to use `cat` since we are in a <\% and not a <\%= brew-chunk
sapply(params,
       function(param) {
         cat(paste(roxygenBlock[1], param, roxygenBlock[2]), fill = TRUE)
       })
%>
# sometimes you need to end a template with NULL
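Because code inside a brew chunk is awkward to debug, it can help to prototype the phrase-building part of vectorParams.R as an ordinary function first. The hypothetical typePhrase() below mirrors the template's logic (last type joined with "or", earlier types comma-separated) but takes the already-parsed type vector rather than the paramTypes string.

```r
# Hypothetical helper mirroring vectorParams.R: builds "a <types> vector"
typePhrase <- function(types) {
  phrase <- tail(types, n = 1)
  numTypes <- length(types)
  if (numTypes > 1) {
    phrase <- paste(paste(types[1:(numTypes - 1)], collapse = ", "), "or", phrase)
  }
  paste("a", phrase, "vector")
}

typePhrase("numeric")                           # "a numeric vector"
typePhrase(c("numeric", "character"))           # "a numeric or character vector"
typePhrase(c("numeric", "character", "logical")) # "a numeric, character or logical vector"
```

Like the template, this always uses the article "a", so a leading type such as "integer" would read awkwardly.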