@infotroph
Created June 28, 2018 13:52
Investigating dependencies for a large suite of tightly-coupled, non-CRAN R packages

What software do I need to have installed for a working copy of PEcAn?

Great question. Let's find out, with two big caveats.

  1. This approach will find components formally required by one or more of the PEcAn R packages. It will not tell us what dependencies are missing from the package descriptions, nor about any of PEcAn's non-R dependencies -- notably, the list it produces will not contain Postgres or any of the components of Bety. But we will get a list of the system libraries needed by each R package (e.g. RCurl depends on your OS's libcurl), at least to the extent that the packages declare them.

  2. Ironically, it only works on a system that already has all of PEcAn installed. If your machine is already in dependency hell, this probably won't help because R won't know how to find and recursively check the dependencies it doesn't yet have. But with some refinements, this approach could probably autogenerate a list of dependencies so that we can, say, mention new ones in the changelog.

For the most accurate results, run make clean; make install; make check in your local PEcAn repository immediately before this, to make sure you've found and installed any recently added dependencies.

First, get a list of all the PEcAn packages on our system. The consistent package naming scheme is very helpful here!

library(tidyverse)
pecan_pkgs <- grep("PEcAn", installed.packages()[,"Package"], value = TRUE)

Now list all Depends and Imports to find the minimum set of packages required to install PEcAn. Note that we check recursively to account for indirect dependencies: each dependency package will refuse to install until its own dependencies are met.

install_deps <- tools::package_dependencies(
	packages = pecan_pkgs,
	db = installed.packages(),
	which = c("Depends", "Imports"),
	recursive = TRUE)
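To see what recursive = TRUE buys us, here's a minimal sketch on a made-up package database rather than a real PEcAn install. The package names pkgA through pkgD are hypothetical, and the matrix only mimics the Package, Depends, and Imports columns of installed.packages().

```r
# Hypothetical database shaped like installed.packages() output:
# pkgA imports pkgB, which depends on pkgC and imports pkgD.
db <- rbind(
  c(Package = "pkgA", Depends = "R (>= 3.4)", Imports = "pkgB"),
  c(Package = "pkgB", Depends = "pkgC",       Imports = "pkgD (>= 1.0)"),
  c(Package = "pkgC", Depends = NA,           Imports = NA),
  c(Package = "pkgD", Depends = NA,           Imports = NA)
)
rownames(db) <- db[, "Package"]

# Direct dependencies only: what pkgA's DESCRIPTION itself asks for.
# The "R (>= 3.4)" entry is not a package, so it is dropped.
direct <- tools::package_dependencies("pkgA", db = db,
  which = c("Depends", "Imports"), recursive = FALSE)

# Full closure: everything that must be installed before pkgA can be.
closure <- tools::package_dependencies("pkgA", db = db,
  which = c("Depends", "Imports"), recursive = TRUE)

print(direct$pkgA)          # [1] "pkgB"
print(sort(closure$pkgA))   # [1] "pkgB" "pkgC" "pkgD"
```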

(Side note: If you were to call package_dependencies with db = available.packages() instead of db = installed.packages(), you'd see the dependencies on packages that are in CRAN whether they're installed or not, while ignoring all non-CRAN dependencies. But since PEcAn depends heavily on a number of non-CRAN packages, this isn't very informative for today's question.)
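That behavior follows from how package_dependencies() treats packages it can't find in db: they get a NULL entry instead of a vector of dependencies. A small sketch, with a made-up two-package database standing in for available.packages() and a hypothetical non-CRAN package called pkgX:

```r
# Hypothetical "CRAN-like" database that knows pkgA and pkgB but not pkgX.
db <- rbind(
  c(Package = "pkgA", Depends = NA, Imports = "pkgB"),
  c(Package = "pkgB", Depends = NA, Imports = NA)
)
rownames(db) <- db[, "Package"]

deps <- tools::package_dependencies(c("pkgA", "pkgX"), db = db,
  which = c("Depends", "Imports"))

# NULL marks a package missing from db; character(0) would instead mean
# a known package that has no dependencies.
unknown <- names(deps)[vapply(deps, is.null, logical(1))]
print(unknown)  # [1] "pkgX"
```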

If you plan to hack on PEcAn, you'll want to routinely run package checks. R CMD check demands that you also install every package listed in the Suggests field of the package you're checking, so we need a list of those too. This is true no matter how you run the checks: both devtools::check and make check run R CMD check under the hood.

check_deps <- tools::package_dependencies(
	packages = pecan_pkgs,
	db = installed.packages(),
	which = "Suggests",
	recursive=FALSE)

...But the suggested packages might have dependencies of their own (adding recursive = TRUE above would get more levels of Suggests, not of Depends and Imports), so we need to find those as well.

Caution: The assumption that you have all the needed packages on your own machine becomes very important here. If you haven't run make check on every package in PEcAn (which takes a while), it would be pretty easy to miss a dependency of a suggested package, and therefore not have it show up in this list.

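add_indirect <- function(x){
	# x holds the packages suggested by one PEcAn package.
	# Find everything each of them needs to install (Depends and Imports, recursively)...
	xi <- tools::package_dependencies(
		packages = x,
		db = installed.packages(),
		which = c("Depends", "Imports"),
		recursive = TRUE)
	# ...and return the suggested packages followed by all of those requirements.
	append(x, reduce(xi, union))
}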
check_deps <- (
	check_deps
	%>% compact()
	%>% map(add_indirect))
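The two-step logic is easier to see on a toy example. In this sketch (hypothetical packages, base R only instead of purrr), pkgA suggests pkgS, which imports pkgT and suggests pkgU. Following Suggests recursively reaches pkgU but never pkgT; taking the Depends/Imports closure of each suggested package, as add_indirect does, picks up pkgT.

```r
# Hypothetical database: pkgA suggests pkgS; pkgS imports pkgT and suggests pkgU.
db <- rbind(
  c(Package = "pkgA", Depends = NA, Imports = NA,     Suggests = "pkgS"),
  c(Package = "pkgS", Depends = NA, Imports = "pkgT", Suggests = "pkgU"),
  c(Package = "pkgT", Depends = NA, Imports = NA,     Suggests = NA),
  c(Package = "pkgU", Depends = NA, Imports = NA,     Suggests = NA)
)
rownames(db) <- db[, "Package"]

# Recursing on Suggests only follows more Suggests edges (pkgS -> pkgU),
# so pkgT, which pkgS needs in order to install, is missed.
suggests_only <- tools::package_dependencies("pkgA", db = db,
  which = "Suggests", recursive = TRUE)

# Two-step version: direct Suggests first, then the Depends/Imports
# closure of each suggested package.
direct_suggests <- tools::package_dependencies("pkgA", db = db,
  which = "Suggests", recursive = FALSE)$pkgA
indirect <- tools::package_dependencies(direct_suggests, db = db,
  which = c("Depends", "Imports"), recursive = TRUE)
two_step <- unique(c(direct_suggests, unlist(indirect)))

print(sort(suggests_only$pkgA))  # pkgS and pkgU, but no pkgT
print(sort(two_step))            # pkgS and pkgT
```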

We now have two lists of character vectors, each giving all the dependencies needed to install or check a particular package. Since many packages are required by multiple PEcAn packages, these lists have a lot of redundancy in them: for example, lubridate is needed by 32 PEcAn packages (that's the value of length(keep(install_deps, ~"lubridate" %in% .))). For easier reading, let's invert these into tables of unique dependencies, each listing the PEcAn packages that need it either directly or indirectly.

(Side note: This is telling us which of the PEcAn packages need a particular dependency. To see immediate reverse dependencies (i.e. the packages that specifically asked for this one) use tools::package_dependencies(pkg, db=installed.packages(), reverse=TRUE).)
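To make the distinction concrete, here's a sketch on a made-up three-package chain (all names hypothetical): pkgA imports pkgB, and pkgB imports pkgC. Both pkgA and pkgB need pkgC, but only pkgB asked for it.

```r
db <- rbind(
  c(Package = "pkgA", Depends = NA, Imports = "pkgB"),
  c(Package = "pkgB", Depends = NA, Imports = "pkgC"),
  c(Package = "pkgC", Depends = NA, Imports = NA)
)
rownames(db) <- db[, "Package"]

# Immediate reverse dependencies: packages whose own DESCRIPTION lists pkgC.
immediate <- tools::package_dependencies("pkgC", db = db,
  which = c("Depends", "Imports"), reverse = TRUE)$pkgC

# Everything that needs pkgC, directly or indirectly.
everyone <- tools::package_dependencies("pkgC", db = db,
  which = c("Depends", "Imports"), reverse = TRUE, recursive = TRUE)$pkgC

print(immediate)       # should contain only pkgB
print(sort(everyone))  # should contain pkgA and pkgB
```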

install_parents <- (
	install_deps
	%>% imap_dfr(
		~tibble(requesting_package = .y, dependency = .x))
	%>% group_by(dependency)
	%>% arrange(requesting_package)
	%>% summarize(
		installed_by = paste(unique(requesting_package),collapse=", "))
)
check_parents <- (
	check_deps
	%>% imap_dfr(
		~tibble(requesting_package = .y, dependency = .x))
	# drop dependencies that are already required for installation
	%>% anti_join(install_parents, by = "dependency")
	%>% group_by(dependency)
	%>% arrange(requesting_package)
	%>% summarize(
		suggested_by = paste(unique(requesting_package), collapse=", "))
)
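If you'd rather not depend on tidyverse for this step, the same inversion can be sketched in base R. The deps list below is made up for illustration (real input would be install_deps or check_deps), and invert_deps is a hypothetical helper name.

```r
# Hypothetical stand-in for install_deps: names are requesting packages,
# values are the dependencies each one needs.
deps <- list(
  PEcAn.A = c("lubridate", "dplyr"),
  PEcAn.B = c("dplyr"),
  PEcAn.C = c("lubridate", "ncdf4")
)

# Turn a package -> dependencies list into one row per unique dependency,
# with the requesting packages collapsed into a comma-separated string.
invert_deps <- function(deps) {
  requesting <- rep(names(deps), lengths(deps))
  dependency <- unlist(deps, use.names = FALSE)
  by_dep <- split(requesting, dependency)
  data.frame(
    dependency = names(by_dep),
    needed_by = unname(vapply(by_dep,
      function(p) paste(sort(unique(p)), collapse = ", "), character(1))),
    stringsAsFactors = FALSE
  )
}

parents <- invert_deps(deps)
print(parents)

# Counting how many packages need one dependency, as in the lubridate example:
n_lubridate <- sum(vapply(deps, function(d) "lubridate" %in% d, logical(1)))
print(n_lubridate)  # [1] 2
```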

I won't print the whole lists in this example, but here are the first few rows of each.

install_parents
## A tibble: 229 x 2
#   dependency    installed_by                           
#   <chr>         <chr>                                  
# 1 abind         PEcAn.all, PEcAn.assim.batch, PEcAn.as…
# 2 animation     PEcAn.data.mining                      
# 3 ape           PEcAn.all, PEcAn.assim.batch, PEcAn.BI…
# 4 assertthat    PEcAn.all, PEcAn.assim.batch, PEcAn.be…
# 5 backports     PEcAn.all, PEcAn.BIOCRO, PEcAn.data.la…
# 6 base64enc     PEcAn.all, PEcAn.BIOCRO, PEcAn.data.la…
# 7 BayesianTools PEcAn.all, PEcAn.assim.batch, PEcAnRTM 
# 8 bindr         PEcAn.all, PEcAn.assim.batch, PEcAn.be…
# 9 bindrcpp      PEcAn.all, PEcAn.assim.batch, PEcAn.be…
#10 bit           PEcAn.ED2                              
## ... with 219 more rows

check_parents
## A tibble: 46 x 2
#   dependency suggested_by                    
#   <chr>      <chr>                           
# 1 bibtex     PEcAn.DB                        
# 2 BioCro     PEcAn.BIOCRO                    
# 3 blob       PEcAn.DB                        
# 4 broom      PEcAn.DB                        
# 5 callr      PEcAn.DB                        
# 6 debugme    PEcAn.DB                        
# 7 dotCall64  PEcAn.data.land                 
# 8 evaluate   PEcAn.DB, PEcAn.MAESPA, PEcAnRTM
# 9 fields     PEcAn.data.land                 
#10 geometry   PEcAn.MAESPA                    
## ... with 36 more rows

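So to install PEcAn, we need a total of 229 R packages! And to include all the suggested packages, we'll need an additional 46 on top of that. Note that these counts include PEcAn packages that other PEcAn packages depend on, plus any base R packages (such as stats or utils) that the PEcAn packages declare.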

On reflection, it's a little misleading to describe the list of suggested packages as being "for checking". For example, the BioCro model package is "suggested" for PEcAn.BIOCRO, but there is very little that PEcAn.BIOCRO can do without BioCro; the only reason it's suggested rather than required is so that users of other models can install PEcAn without it. Most of the model packages work this way.

Now let's go one step further and think about non-R dependencies. We don't have a comprehensive list, but R package descriptions do have a SystemRequirements field that (is supposed to) describe any components the package needs beyond R itself. I'm not sure how universally used these are, but let's see what the packages in our dependency tree declare. tools::package_dependencies doesn't check this field for us, so we have to read the description files directly.
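For reference, a DESCRIPTION file is in Debian Control File (DCF) format, so base R's read.dcf() can read a single field out of one. This sketch writes a made-up DESCRIPTION to a temporary file (the package name and requirement text are invented); for an installed package you could pass system.file("DESCRIPTION", package = "somepkg") instead.

```r
# A made-up DESCRIPTION file, written to a temporary path for illustration.
desc_path <- tempfile(fileext = ".dcf")
writeLines(c(
  "Package: fakepkg",
  "Version: 0.1.0",
  "Imports: utils",
  "SystemRequirements: libhdf5 (>= 1.8.13), GNU make"
), desc_path)

# read.dcf() returns a one-row character matrix; requested fields that are
# absent from the file come back as NA rather than raising an error.
fields <- read.dcf(desc_path, fields = c("SystemRequirements", "URL"))
sysreq <- fields[1, "SystemRequirements"]
url <- fields[1, "URL"]

print(sysreq)      # [1] "libhdf5 (>= 1.8.13), GNU make"
print(is.na(url))  # [1] TRUE
```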

get_requirement <- function(x){
	xx <- tryCatch(
		packageDescription(x)$SystemRequirements,
		error = function(e){})
	if(is.null(xx)) return(NA_character_)
	xx
}
install_externals <- (
	install_parents
	%>% select(dependency)
	%>% mutate(
		requirement_string = map_chr(dependency, get_requirement))
	%>% na.omit()
)
check_externals <- (
	check_parents
	%>% select(dependency)
	%>% mutate(
		requirement_string = map_chr(dependency, get_requirement))
	%>% na.omit()
)
# Warning messages:
# 1: In packageDescription(x) : no package 'PEcAn.ed' was found
# 2: In packageDescription(x) : no package 'PEcAn.photosythesis' was found

The warnings suggest that we have some typos in Suggests fields! Where are they?

(check_parents
	%>% filter(dependency %in% c("PEcAn.ed", "PEcAn.photosythesis"))
	%>% pull(suggested_by))
# [1] "PEcAn.all" "PEcAn.all"

... Yep, sure enough. Patch to come soon.
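A check along these lines could catch typos like these before they reach a release: compare every name listed in Suggests against the packages known to exist, and report who asked for the unknown ones. This is a minimal sketch; the suggests list, the known vector, and the helper name find_unknown_suggests are all made up, and in practice known might be built from rownames(installed.packages()) and rownames(available.packages()).

```r
# Hypothetical inputs: what each PEcAn package suggests, and the names of
# every package we know exists (installed or on CRAN).
suggests <- list(
  PEcAn.all = c("PEcAn.ED2", "PEcAn.ed", "testthat"),
  PEcAn.DB  = c("testthat", "RPostgres")
)
known <- c("PEcAn.ED2", "testthat", "RPostgres")

# One row per (requesting package, unknown dependency) pair.
find_unknown_suggests <- function(suggests, known) {
  pairs <- data.frame(
    suggested_by = rep(names(suggests), lengths(suggests)),
    dependency = unlist(suggests, use.names = FALSE),
    stringsAsFactors = FALSE
  )
  pairs[!pairs$dependency %in% known, , drop = FALSE]
}

unknown <- find_unknown_suggests(suggests, known)
print(unknown)  # flags PEcAn.ed, suggested by PEcAn.all
```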

Meanwhile, let's look at the reported system requirements:

install_externals
## A tibble: 26 x 2
#   dependency requirement_string                        
#   <chr>      <chr>                                     
# 1 animation  "ImageMagick (http://imagemagick.org) or\…
# 2 clipr      "xclip (https://github.com/astrand/xclip)…
# 3 curl       "libcurl: libcurl-devel (rpm) or\nlibcurl…
# 4 haven      GNU make                                  
# 5 hdf5r      libhdf5 (>= 1.8.13)                       
# 6 httpuv     GNU make                                  
# 7 igraph     gmp (optional), libxml2 (optional), glpk …
# 8 jpeg       libjpeg                                   
# 9 lubridate  "A system with zoneinfo data (e.g.\n/usr/…
#10 MCMCpack   gcc (>= 4.0)                              
#11 minqa      GNU make                                  
#12 ncdf4      netcdf library version 4.1 or later       
#13 nimble     GNU make                                  
#14 openssl    OpenSSL >= 1.0.1                          
#15 PEcAn.MA   JAGS                                      
#16 png        libpng                                    
#17 raster     C++11                                     
#18 RCurl      GNU make, libcurl                         
#19 redland    "Mac OSX: redland (>= 1.0.14) ; Linux: li…
#20 rgdal      "for building from source: GDAL >= 1.11.4…
#21 rjags      JAGS 4.x.y                                
#22 shinyjs    pandoc with https support                 
#23 stringi    ICU4C (>= 52, optional)                   
#24 udunits2   udunits-2                                 
#25 XML        libxml2 (>= 2.6.3)                        
#26 xml2       libxml2: libxml2-dev (deb), libxml2-devel…

check_externals
## A tibble: 8 x 2
#  dependency     requirement_string                     
#  <chr>          <chr>                                  
#1 knitr          "Package vignettes based on R Markdown…
#2 PEcAn.DALEC    dalec                                  
#3 PEcAn.LINKAGES LINKAGES                               
#4 PEcAn.sipnet   SIPNET ecosystem model                 
#5 reprex         pandoc (>= 1.12.3) - http://pandoc.org 
#6 rgl            "OpenGL, GLU Library, XQuartz (on OSX)…
#7 rmarkdown      pandoc (>= 1.12.3) - http://pandoc.org 
#8 RMySQL         "libmariadb-client-dev | libmariadb-cl…

It appears many package authors treat the SystemRequirements field as free text, not a parseable version specification. It's unlikely that we'd be able to rely on this for automatic dependency resolution, but it's nice to have them collected in one place for user reference. We should consider highlighting any substantial changes to this requirement list in the PEcAn release notes.
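To see why, here's a deliberately naive parser that only understands the "(>= version)" form used in Depends and Imports, applied to a few strings copied from the output above. The regex is an assumption for illustration, not any standard.

```r
reqs <- c(
  hdf5r   = "libhdf5 (>= 1.8.13)",
  openssl = "OpenSSL >= 1.0.1",
  rjags   = "JAGS 4.x.y",
  ncdf4   = "netcdf library version 4.1 or later",
  haven   = "GNU make"
)

# Heuristic: recognise only "(>= x.y.z)", the form used in Depends/Imports.
pattern <- "\\(>=[[:space:]]*([0-9.]+)\\)"
has_version <- grepl(pattern, reqs)
min_version <- ifelse(has_version,
                      sub(paste0(".*", pattern, ".*"), "\\1", reqs),
                      NA_character_)

print(data.frame(package = names(reqs), min_version))
```

Only the hdf5r entry yields a version: OpenSSL's requirement omits the parentheses, and the others state their requirements in prose.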

@robkooper

Wow, this is exactly what I was looking for. My plan for the Docker containers is to first install all the dependencies, and then install and compile PEcAn. This will speed up the build process drastically.

@infotroph
Author

infotroph commented Jun 28, 2018

For installation, my first (untested) guess would be:

all_deps = c(install_parents$dependency, check_parents$dependency)
install.packages(all_deps[all_deps %in% available.packages()[,"Package"]])

And then let Make and devtools::install_deps handle the non-CRAN dependencies during PEcAn installation?
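Here's a sketch of that split, kept apart from the actual install.packages() call so the list can be inspected first. The dependency names and the cran vector are made up; with real data, cran would be rownames(available.packages()).

```r
# Hypothetical inputs: everything PEcAn needs, and what the CRAN mirror offers.
all_deps <- c("lubridate", "ncdf4", "someGitHubPkg", "PEcAn.example", "stats", "dplyr")
cran <- c("dplyr", "lubridate", "ncdf4")

# Packages install.packages() can fetch from CRAN...
from_cran <- sort(intersect(all_deps, cran))
# ...and everything else, left for Make / devtools::install_deps
# (or, for base packages such as stats, already shipped with R).
not_on_cran <- sort(setdiff(all_deps, cran))

print(from_cran)
print(not_on_cran)
# With real data, the install step would then be:
# install.packages(from_cran)
```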

@robkooper

Ended up with the following to get a list of all dependencies needed to compile PEcAn:

library(tidyverse)

pecan_pkgs <- grep("PEcAn", installed.packages()[,"Package"], value = TRUE)

install_deps <- tools::package_dependencies(
    packages = pecan_pkgs,
    db = installed.packages(),
    which = c("Depends", "Imports"),
    recursive = FALSE)

install_parents <- (
    install_deps
    %>% imap_dfr(
        ~tibble(requesting_package = .y, dependency = .x))
    %>% group_by(dependency)
    %>% arrange(requesting_package)
    %>% summarize(
        installed_by = paste(unique(requesting_package),collapse=", "))
)

print(paste(grep("PEcAn.*", install_parents$dependency, value = TRUE, invert = TRUE), collapse=" "))
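Using recursive = FALSE should be enough here, because install.packages() also installs each CRAN package's own Depends, Imports, and LinkingTo by default (dependencies = NA). Here's a sketch of turning such a list into a one-line R command, say for a Dockerfile RUN step. The deps vector is made up, and base_pkgs is a hand-written assumption; installed.packages(priority = "base") would give the real list.

```r
# Hypothetical direct dependencies, as printed by the code above.
deps <- c("abind", "PEcAn.logger", "methods", "lubridate", "PEcAn.utils", "stats")
# Assumed subset of base packages; these ship with R and can't be installed.
base_pkgs <- c("methods", "stats", "utils", "graphics", "grDevices", "tools")

# Keep only external, installable packages, then build the command string.
external <- deps[!grepl("^PEcAn", deps) & !deps %in% base_pkgs]
cmd <- sprintf("install.packages(c(%s))",
               paste(sprintf('"%s"', external), collapse = ", "))
cat(cmd, "\n", sep = "")
# install.packages(c("abind", "lubridate"))
```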
