Great question. Let's find out, with two big caveats.
- This approach will find components formally required by one or more of the PEcAn R packages. It will not tell us what dependencies are missing from the package descriptions, nor about any of PEcAn's non-R dependencies -- notably, the list it produces will not contain Postgres or any of the components of Bety. But we will get a list of the system libraries needed by each R package (e.g. RCurl depends on your OS's `libcurl`), at least to the extent that the packages declare them.
- Ironically, it only works on a system that already has all of PEcAn installed. If your machine is already in dependency hell, this probably won't help because R won't know how to find and recursively check the dependencies it doesn't yet have. But with some refinements, this approach could probably autogenerate a list of dependencies so that we can, say, mention new ones in the changelog.
For the most accurate results, run `make clean; make install; make check` on your local PEcAn repository immediately before this, to make sure you've found and installed any recently-added dependencies.
First, get a list of all the PEcAn packages on our system. The consistent package naming scheme is very helpful here!
library(tidyverse)
pecan_pkgs <- grep("PEcAn", installed.packages()[,"Package"], value = TRUE)
Now list all Depends and Imports to find the minimum set of packages required to install PEcAn. Note that we check recursively to account for indirect dependencies -- each dependency package will refuse to install until its own dependencies are met.
install_deps <- tools::package_dependencies(
  packages = pecan_pkgs,
  db = installed.packages(),
  which = c("Depends", "Imports"),
  recursive = TRUE)
(Side note: If you were to call `package_dependencies` with `db = available.packages()` instead of `db = installed.packages()`, you'd see the dependencies on packages that are in CRAN whether they're installed or not, while ignoring all non-CRAN dependencies. But since PEcAn depends heavily on a number of non-CRAN packages, this isn't very informative for today's question.)
If you plan to hack on PEcAn, you'll want to routinely run package checks. `R CMD check` demands that you also install all packages `Suggests`-ed by the package you're checking, so we need a list of those too. This is true no matter how you run the checks -- both `devtools::check` and `make check` run `R CMD check` under the hood.
check_deps <- tools::package_dependencies(
  packages = pecan_pkgs,
  db = installed.packages(),
  which = "Suggests",
  recursive = FALSE)
...But the suggested packages might have dependencies of their own (adding `recursive = TRUE` above would get more levels of Suggests, not of Depends and Imports), so we need to find those as well.
Caution: The assumption that you have all the needed packages on your own machine becomes very important here. If you haven't run `make check` on every package in PEcAn (which takes a while), it would be pretty easy to miss a dependency of a suggested package, and therefore not have it show up in this list.
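# x: character vector of suggested packages.
# Returns x plus everything those packages need (Depends/Imports, recursively).
add_indirect <- function(x){
  xi <- tools::package_dependencies(
    packages = x,
    db = installed.packages(),
    which = c("Depends", "Imports"),
    recursive = TRUE)
  append(x, reduce(xi, union))
}
# Drop PEcAn packages with no Suggests, then expand the rest
check_deps <- (
  check_deps
  %>% compact()
  %>% map(add_indirect))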
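To see why the second lookup is needed, here is a minimal sketch on a made-up package database. Packages A through F and their relationships are invented for illustration, and the sketch assumes `tools::package_dependencies` accepts any character matrix with the usual DESCRIPTION columns as `db`.

```r
# Toy package database: every name and relationship below is hypothetical.
# A imports B and suggests C; B imports D; C imports E and suggests F.
toy_db <- cbind(
  Package   = c("A", "B", "C", "D", "E", "F"),
  Depends   = NA_character_,
  Imports   = c("B", "D", "E", NA, NA, NA),
  LinkingTo = NA_character_,
  Suggests  = c("C", NA, "F", NA, NA, NA),
  Enhances  = NA_character_)

# Needed to install A: hard dependencies, followed recursively
install_toy <- tools::package_dependencies(
  "A", db = toy_db, which = c("Depends", "Imports"), recursive = TRUE)

# Suggested by A, one level only
suggested_toy <- tools::package_dependencies(
  "A", db = toy_db, which = "Suggests", recursive = FALSE)

# Recursing on Suggests follows Suggests-of-Suggests (C, then F),
# and never sees that C needs E in order to install
suggests_recursive <- tools::package_dependencies(
  "A", db = toy_db, which = "Suggests", recursive = TRUE)

# So look up the hard dependencies of the suggested packages separately
suggested_hard <- tools::package_dependencies(
  suggested_toy$A, db = toy_db,
  which = c("Depends", "Imports"), recursive = TRUE)
check_toy <- union(suggested_toy$A, unlist(suggested_hard, use.names = FALSE))
```

In this toy case `check_toy` holds C and E, the packages needed to check A beyond what installing it already requires.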
We now have two lists of character vectors, each giving all the dependencies needed to install or check a particular package. Since many packages are required by multiple PEcAn packages, these lists have a lot of redundancy in them: For example, `lubridate` is needed by `length(keep(install_deps, ~"lubridate" %in% .))` = 32 packages. For easier reading, let's invert these into tables of unique dependencies, each specifying the PEcAn packages that need them either directly or indirectly.
(Side note: This is telling us which of the PEcAn packages need a particular dependency. To see immediate reverse dependencies (i.e. the packages that specifically asked for this one) use `tools::package_dependencies(pkg, db=installed.packages(), reverse=TRUE)`.)
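As a rough illustration of the difference between immediate and indirect reverse dependencies, here is a sketch on a three-package toy database. The packages A, B, and C are hypothetical; only `tools::package_dependencies` and its `reverse` and `recursive` arguments are real.

```r
# Hypothetical chain: A imports B, and B imports C.
toy_db <- cbind(
  Package   = c("A", "B", "C"),
  Depends   = NA_character_,
  Imports   = c("B", "C", NA),
  LinkingTo = NA_character_)

# Immediate reverse dependencies: who asked for C directly?
direct_parents <- tools::package_dependencies(
  "C", db = toy_db, which = c("Depends", "Imports"), reverse = TRUE)

# Recursive reverse dependencies: who needs C directly or indirectly?
all_parents <- tools::package_dependencies(
  "C", db = toy_db, which = c("Depends", "Imports"),
  reverse = TRUE, recursive = TRUE)
```

Here `direct_parents$C` contains only B, while `all_parents$C` also picks up A.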
# One row per dependency, listing every PEcAn package that needs it to install
install_parents <- (
  install_deps
  %>% imap_dfr(
    ~tibble(requesting_package = .y, dependency = .x))
  %>% group_by(dependency)
  %>% arrange(requesting_package)
  %>% summarize(
    installed_by = paste(unique(requesting_package), collapse = ", "))
)
# Same for suggested packages, skipping anything already needed for installation
check_parents <- (
  check_deps
  %>% imap_dfr(
    ~tibble(requesting_package = .y, dependency = .x))
  %>% anti_join(install_parents, by = "dependency")
  %>% group_by(dependency)
  %>% arrange(requesting_package)
  %>% summarize(
    suggested_by = paste(unique(requesting_package), collapse = ", "))
)
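For anyone who wants to follow the inversion step without the tidyverse, here is a minimal base-R sketch of the same idea on toy data. PkgA, PkgB, and PkgC are hypothetical requesting packages, and the dependency assignments are made up.

```r
# Toy input shaped like install_deps: a named list of character vectors.
toy_deps <- list(
  PkgA = c("lubridate", "abind"),
  PkgB = c("lubridate"),
  PkgC = c("ape", "lubridate"))

# Long format: one row per (dependency, requesting package) pair.
# stack() names the columns `values` (dependency) and `ind` (list name).
long <- stack(toy_deps)

# Group requesting packages by dependency and collapse each group to one string
parents <- vapply(
  split(as.character(long$ind), long$values),
  function(pkgs) paste(sort(unique(pkgs)), collapse = ", "),
  character(1))

toy_parents <- data.frame(
  dependency = names(parents),
  needed_by = unname(parents),
  stringsAsFactors = FALSE)
```

Each row of `toy_parents` is one unique dependency with the packages that request it, the same shape as `install_parents`.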
I won't print the whole lists in this example, but here are the first few rows of each.
install_parents
## A tibble: 229 x 2
# dependency installed_by
# <chr> <chr>
# 1 abind PEcAn.all, PEcAn.assim.batch, PEcAn.as…
# 2 animation PEcAn.data.mining
# 3 ape PEcAn.all, PEcAn.assim.batch, PEcAn.BI…
# 4 assertthat PEcAn.all, PEcAn.assim.batch, PEcAn.be…
# 5 backports PEcAn.all, PEcAn.BIOCRO, PEcAn.data.la…
# 6 base64enc PEcAn.all, PEcAn.BIOCRO, PEcAn.data.la…
# 7 BayesianTools PEcAn.all, PEcAn.assim.batch, PEcAnRTM
# 8 bindr PEcAn.all, PEcAn.assim.batch, PEcAn.be…
# 9 bindrcpp PEcAn.all, PEcAn.assim.batch, PEcAn.be…
#10 bit PEcAn.ED2
## ... with 219 more rows
check_parents
## A tibble: 46 x 2
# dependency suggested_by
# <chr> <chr>
# 1 bibtex PEcAn.DB
# 2 BioCro PEcAn.BIOCRO
# 3 blob PEcAn.DB
# 4 broom PEcAn.DB
# 5 callr PEcAn.DB
# 6 debugme PEcAn.DB
# 7 dotCall64 PEcAn.data.land
# 8 evaluate PEcAn.DB, PEcAn.MAESPA, PEcAnRTM
# 9 fields PEcAn.data.land
#10 geometry PEcAn.MAESPA
## ... with 36 more rows
So to install PEcAn, we need a total of 229 R packages! And to include all the suggested packages, we'll need an additional 46 on top of that.
On consideration, it's a little misleading to refer to the list of suggested packages as being "for checking". For example, the BioCro model package is "suggested" for PEcAn.BIOCRO, but there is very little that PEcAn.BIOCRO can do without BioCro -- the only reason it's suggested rather than required is so that users of other models can install PEcAn without it. Most of the model packages work this way.
Now let's go one step further and think about non-R dependencies. We don't have a comprehensive list, but R package descriptions do have a `SystemRequirements` field that (is supposed to) describe any components the package needs beyond R itself. I'm not sure how universally used these are, but let's see what the packages in our dependency tree declare. `tools::package_dependencies` doesn't check this field for us, so we have to read the description files directly.
# Return the SystemRequirements string for package x,
# or NA if the field (or the package itself) is missing.
get_requirement <- function(x){
  xx <- tryCatch(
    packageDescription(x)$SystemRequirements,
    error = function(e) NULL)
  if(is.null(xx)) return(NA_character_)
  xx
}
install_externals <- (
  install_parents
  %>% select(dependency)
  %>% mutate(
    requirement_string = map_chr(dependency, get_requirement))
  %>% na.omit()
)
check_externals <- (
  check_parents
  %>% select(dependency)
  %>% mutate(
    requirement_string = map_chr(dependency, get_requirement))
  %>% na.omit()
)
# Warning messages:
# 1: In packageDescription(x) : no package 'PEcAn.ed' was found
# 2: In packageDescription(x) : no package 'PEcAn.photosythesis' was found
The warnings suggest that we have some typos in Suggests fields! Where are they?
(check_parents
%>% filter(dependency %in% c("PEcAn.ed", "PEcAn.photosythesis"))
%>% pull(suggested_by))
# [1] "PEcAn.all" "PEcAn.all"
... Yep, sure enough. Patch to come soon.
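The missing-package warnings were useful here, but once the typos are fixed a quieter lookup may be preferable. The following is a sketch of one possible variant, not the function used above: `get_requirement_quiet` is a hypothetical name, and it relies on `utils::packageDescription` returning `NA` both when the package is missing and when the requested field is absent.

```r
# Hypothetical quieter variant: ask only for the one field we care about.
get_requirement_quiet <- function(pkg) {
  req <- suppressWarnings(
    utils::packageDescription(pkg, fields = "SystemRequirements"))
  # Coerce so that a missing package or field comes back as NA_character_
  as.character(req)
}

# "stats" ships with R and declares no system requirements
base_req <- get_requirement_quiet("stats")

# A package name that (presumably) is not installed anywhere
missing_req <- get_requirement_quiet("noSuchPackage.xyz")
```

Both calls return `NA` as a character value, so the function can be dropped into `map_chr` the same way as `get_requirement`.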
Meanwhile, let's look at the reported system requirements:
install_externals
## A tibble: 26 x 2
# dependency requirement_string
# <chr> <chr>
# 1 animation "ImageMagick (http://imagemagick.org) or\…
# 2 clipr "xclip (https://github.com/astrand/xclip)…
# 3 curl "libcurl: libcurl-devel (rpm) or\nlibcurl…
# 4 haven GNU make
# 5 hdf5r libhdf5 (>= 1.8.13)
# 6 httpuv GNU make
# 7 igraph gmp (optional), libxml2 (optional), glpk …
# 8 jpeg libjpeg
# 9 lubridate "A system with zoneinfo data (e.g.\n/usr/…
#10 MCMCpack gcc (>= 4.0)
#11 minqa GNU make
#12 ncdf4 netcdf library version 4.1 or later
#13 nimble GNU make
#14 openssl OpenSSL >= 1.0.1
#15 PEcAn.MA JAGS
#16 png libpng
#17 raster C++11
#18 RCurl GNU make, libcurl
#19 redland "Mac OSX: redland (>= 1.0.14) ; Linux: li…
#20 rgdal "for building from source: GDAL >= 1.11.4…
#21 rjags JAGS 4.x.y
#22 shinyjs pandoc with https support
#23 stringi ICU4C (>= 52, optional)
#24 udunits2 udunits-2
#25 XML libxml2 (>= 2.6.3)
#26 xml2 libxml2: libxml2-dev (deb), libxml2-devel…
check_externals
## A tibble: 8 x 2
# dependency requirement_string
# <chr> <chr>
#1 knitr "Package vignettes based on R Markdown…
#2 PEcAn.DALEC dalec
#3 PEcAn.LINKAGES LINKAGES
#4 PEcAn.sipnet SIPNET ecosystem model
#5 reprex pandoc (>= 1.12.3) - http://pandoc.org
#6 rgl "OpenGL, GLU Library, XQuartz (on OSX)…
#7 rmarkdown pandoc (>= 1.12.3) - http://pandoc.org
#8 RMySQL "libmariadb-client-dev | libmariadb-cl…
It appears many package authors treat the `SystemRequirements` field as free text, not a parseable version specification. It's unlikely that we'd be able to rely on this for automatic dependency resolution, but it's nice to have them collected in one place for user reference. We should consider highlighting any substantial changes to this requirement list in the PEcAn release notes.
Wow, this is exactly what I was looking for. My plan is for the Docker containers to first install all the dependencies, and then install and compile PEcAn. This will speed up the build process drastically.
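One way the dependency list could feed that first Docker step is sketched below. This is only a rough idea under stated assumptions: `deps_to_preinstall` is a hypothetical helper, the toy list stands in for the real `install_deps`, and its contents are made up. How each package then gets installed (CRAN or otherwise) is left open, since many PEcAn dependencies are not on CRAN.

```r
# Hypothetical helper: flatten a dependency list into the set of third-party
# packages to install before PEcAn itself.
deps_to_preinstall <- function(dep_list, own_pkgs) {
  # Base packages ship with R and never need installing
  base_pkgs <- rownames(utils::installed.packages(priority = "base"))
  all_deps <- unique(unlist(dep_list, use.names = FALSE))
  sort(setdiff(all_deps, c(own_pkgs, base_pkgs)))
}

# Toy stand-in for install_deps; these dependency assignments are invented.
toy_deps <- list(
  PEcAn.utils  = c("stats", "lubridate", "PEcAn.logger"),
  PEcAn.logger = c("utils"),
  PEcAn.DB     = c("lubridate", "DBI", "PEcAn.utils"))

preinstall <- deps_to_preinstall(toy_deps, own_pkgs = names(toy_deps))
```

With the toy input, `preinstall` contains DBI and lubridate: everything that is neither a PEcAn package nor part of base R.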