@sckott
Last active March 21, 2016 17:03
example parallel workflow with an rOpenSci package

Example parallel workflow

In this example, we use the rgbif package to search for occurrence (latitude/longitude) data for 1000 bird species.

Install and load rgbif

install.packages("rgbif")
library("rgbif")

Get the GBIF key for Aves (birds), which is passed as higherTaxonKey in the next step

name_backbone("Aves")

Get 1000 bird species names

out <- name_lookup(rank = "Species", higherTaxonKey = 212, limit = 1000)
head(out$data)

Get a vector of keys, named by species (the names are for your own reference)

keys <- out$data$key
names(keys) <- out$data$canonicalName
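
To make the naming step concrete without calling the GBIF API, here is a small sketch on a made-up data frame. The object `toy` and the keys and species names in it are hypothetical stand-ins for `out$data`, not real GBIF values.

```r
# Hypothetical stand-in for out$data; keys and names are invented for illustration
toy <- data.frame(
  key = c(101L, 102L, 103L),
  canonicalName = c("Species one", "Species two", "Species three"),
  stringsAsFactors = FALSE
)

# Same pattern as above: a vector of keys, named by species
toy_keys <- toy$key
names(toy_keys) <- toy$canonicalName

# Look a key up by species name, or recover the name from a position
toy_keys[["Species two"]]
names(toy_keys)[3]
```

Because subsetting a vector keeps its names, each chunk of keys created later still records which species each key belongs to.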

Search for occurrences.

Note 1: occ_search accepts many keys at once, so you could pass in all 1000, but splitting them into chunks (of 250, say) and running the chunks in parallel lets them all be retrieved faster. The code below uses only the first 50 keys, in chunks of 10.

Note 2: the default record limit in occ_search is 20 records per species. The code below sets limit = 5, so we get at most 5 records back per species, but that can be changed of course.

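library("plyr")   # provides llply
library("doMC")   # multicore backend used by .parallel = TRUE
registerDoMC(cores = 4)
# Split the first 50 keys into chunks of 10
chunksof10 <- split(keys[1:50], ceiling(seq_along(keys[1:50])/10))
# Run one occ_search call per chunk, in parallel
res <- llply(chunksof10, occ_search, hasCoordinate = TRUE, limit = 5, return = 'data', .parallel = TRUE)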
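
The chunking idiom is easier to follow on a toy vector. The sketch below uses base R only. `toy_keys` is a hypothetical stand-in for `keys[1:50]`.

```r
# Hypothetical keys standing in for keys[1:50]
toy_keys <- seq_len(50)

# seq_along() gives 1..50; dividing by 10 and rounding up yields a
# group id of 1 for positions 1-10, 2 for positions 11-20, and so on
group_id <- ceiling(seq_along(toy_keys) / 10)

# split() returns a list with one element per group id
chunks <- split(toy_keys, group_id)

length(chunks)    # number of chunks
lengths(chunks)   # size of each chunk
```

Replacing 10 with 250 would give the chunks of 250 mentioned in Note 1. If the number of keys is not a multiple of the chunk size, the last chunk is simply shorter.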

Remove species with no data

res2 <- lapply(res, function(x) Filter(is.data.frame, x))
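
A minimal sketch of what this filtering step does, on invented data. It assumes that a species without records comes back as something other than a data.frame. The character string used here is an assumed placeholder, and `toy_res` and its contents are hypothetical.

```r
# Hypothetical result: two chunks, each a list with one element per species
toy_res <- list(
  `1` = list(
    sp_a = data.frame(name = "sp_a", decimalLatitude = 10.5),
    sp_b = "no data found"   # assumed placeholder for a species without records
  ),
  `2` = list(
    sp_c = data.frame(name = "sp_c", decimalLatitude = -3.2)
  )
)

# Within each chunk, keep only the elements that are data.frames
toy_res2 <- lapply(toy_res, function(x) Filter(is.data.frame, x))

names(toy_res2[["1"]])   # sp_b has been dropped
```

The chunk structure is preserved. Only the non-data.frame elements inside each chunk are removed.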

Collapse to a single data.frame and inspect

library("data.table")
df <- setDF(rbindlist(unlist(res2, FALSE), fill = TRUE))
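
The one-liner above does three things: it flattens the list of chunks, binds the tables, and converts the result. A sketch on invented data, assuming data.table is installed (`toy_res2` and its columns are hypothetical):

```r
library("data.table")

# Hypothetical filtered result: a list of chunks, each a list of data.frames
toy_res2 <- list(
  `1` = list(sp_a = data.frame(name = "sp_a", decimalLatitude = 10.5)),
  `2` = list(sp_c = data.frame(name = "sp_c", decimalLatitude = -3.2, elevation = 120))
)

# recursive = FALSE removes only the chunk level,
# leaving a flat list of data.frames
flat <- unlist(toy_res2, recursive = FALSE)

# fill = TRUE pads columns missing from some tables with NA;
# setDF() converts the resulting data.table to a plain data.frame
toy_df <- setDF(rbindlist(flat, fill = TRUE))

dim(toy_df)
```

The `fill = TRUE` argument matters here because different species can return different sets of columns.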

Map a few of the species (mapping 1000 species on a single map would be too much)

library("ggplot2")
gbifmap(df[1:40, ])

