@sckott
Last active August 29, 2015 14:26
occ_search and occ_download

How to get similar results from the GBIF search and download APIs

Load rgbif

library("rgbif")

occ_search() method

occ_search(taxonKey=6351, fields = "all", hasCoordinate=TRUE, hasGeospatialIssue = FALSE)
#> Records found [6522] 
#> Records returned [500] 
#> No. unique hierarchies [2] 
#> No. media records [1] 
#> Args [taxonKey=6351, hasCoordinate=TRUE, hasGeospatialIssue=FALSE, limit=500, offset=0, fields=all] 
#> First 10 rows of data
#> 
#>                    name       key decimalLatitude decimalLongitude                       issues
#> 1  Raphoneis amphiceros 113516533        53.74579          1.67182 cdround,cudc,gass84,txmatfuz
#> 2           Diatomaceae 113425616        53.94419          1.23671  cdround,cudc,gass84,txmathi
#> 3  Raphoneis amphiceros 113524813        53.94419          1.23671 cdround,cudc,gass84,txmatfuz
#> 4           Diatomaceae 113422759        53.64906          1.10778  cdround,cudc,gass84,txmathi
#> 5  Raphoneis amphiceros 113516272        52.72840         -5.72135 cdround,cudc,gass84,txmatfuz
#> 6           Diatomaceae 113422760        53.80887          2.19924  cdround,cudc,gass84,txmathi
#> 7           Diatomaceae 113425650        54.07139          1.52203  cdround,cudc,gass84,txmathi
#> 8  Raphoneis amphiceros 113516576        53.64906          1.10778 cdround,cudc,gass84,txmatfuz
#> 9  Raphoneis amphiceros 113517123        53.33904          4.61214 cdround,cudc,gass84,txmatfuz
#> 10          Diatomaceae 113424196        44.22176        -60.25779  cdround,cudc,gass84,txmathi
#> ..                  ...       ...             ...              ...                          ...
#> Variables not shown: datasetKey (chr), publishingOrgKey (chr), publishingCountry (chr), protocol
#>      (chr), lastCrawled (chr), extensions (chr), basisOfRecord (chr), taxonKey (int), kingdomKey
#>      (int), phylumKey (int), classKey (int), familyKey (int), genusKey (int), speciesKey (int),
#>      scientificName (chr), kingdom (chr), phylum (chr), family (chr), genus (chr), species (chr),
#>      genericName (chr), specificEpithet (chr), taxonRank (chr), depth (dbl), depthAccuracy (dbl),
#>      year (int), month (int), day (int), eventDate (chr), lastInterpreted (chr), identifiers (chr),
#>      facts (chr), relations (chr), geodeticDatum (chr), class (chr), countryCode (chr), country
#>      (chr), catalogNumber (chr), institutionCode (chr), collectionCode (chr), gbifID (chr),
#>      lastParsed (chr), elevation (dbl), elevationAccuracy (dbl), stateProvince (chr), recordedBy
#>      (chr), county (chr), locality (chr), identifiedBy (chr)

occ_download_*() method

Some of the download API functions require your GBIF website username, email, and password.
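One way to avoid typing credentials into every call is to store them as R options once per session. This is only a sketch: it assumes your installed rgbif version uses the options `gbif_user`, `gbif_pwd`, and `gbif_email` as defaults for the `user`, `pwd`, and `email` arguments (check `?occ_download` to confirm), and the values shown are placeholders.

```r
# Placeholder credentials: replace with your own GBIF account details.
# Assumption: occ_download() falls back to these options when user, pwd,
# and email are not passed explicitly (see ?occ_download).
options(
  gbif_user  = "your_username",
  gbif_pwd   = "your_password",
  gbif_email = "you@example.org"
)

# Confirm that each option is set, without printing the password itself.
creds_set <- !vapply(
  c("gbif_user", "gbif_pwd", "gbif_email"),
  function(opt) is.null(getOption(opt)),
  logical(1)
)
creds_set
```

Putting the `options()` call in your `.Rprofile` keeps the password out of scripts you share.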

Start a download

(dload <- occ_download('taxonKey = 6351', 
                      'hasCoordinate = TRUE', 
                      'hasGeospatialIssue = FALSE'))
#> <<gbif download>>
#>   Username: xxx
#>   E-mail: myrmecocystus@gmail.com
#>   Download key: 0003358-150721130643939

Then wait for the GBIF servers to prepare the download file. In the meantime, you can list all of your downloads:

occ_download_list()
#> $meta
#>   offset limit endofrecords count
#> 1      0     3        FALSE    38
#> 
#> $results
#>                       key                    doi request.predicate.type
#> 1 0003358-150721130643939 doi:10.15468/dl.ll3wue                    and
#> 2 0003357-150721130643939 doi:10.15468/dl.bjzstf                 equals
#> 3 0007658-150615163101818 doi:10.15468/dl.jh2wda                 equals
#>                                                                 request.predicate.predicates
#> 1 equals, equals, equals, TAXON_KEY, HAS_COORDINATE, HAS_GEOSPATIAL_ISSUE, 6351, TRUE, FALSE
#> 2                                                                                       NULL
#> 3                                                                                       NULL
#>   request.predicate.key request.predicate.value request.creator request.format
#> 1                  <NA>                    <NA>          sckott           DWCA
#> 2             TAXON_KEY                    6351          sckott           DWCA
#> 3             TAXON_KEY                 2433433          sckott           DWCA
#>   request.notificationAddresses request.sendNotification                      created
#> 1       myrmecocystus@gmail.com                    FALSE 2015-08-04T15:58:11.902+0000
#> 2       myrmecocystus@gmail.com                    FALSE 2015-08-04T15:56:23.864+0000
#> 3       myrmecocystus@gmail.com                    FALSE 2015-07-09T14:48:40.705+0000
#>                       modified    status
#> 1 2015-08-04T15:59:00.874+0000 SUCCEEDED
#> 2 2015-08-04T15:57:19.062+0000 SUCCEEDED
#> 3 2015-07-09T14:50:28.628+0000 SUCCEEDED
#>                                                                     downloadLink size totalRecords
#> 1 http://api.gbif.org/v1/occurrence/download/request/0003358-150721130643939.zip 0.41         6522
#> 2 http://api.gbif.org/v1/occurrence/download/request/0003357-150721130643939.zip 0.50         6943
#> 3 http://api.gbif.org/v1/occurrence/download/request/0007658-150615163101818.zip 0.00        16613
#>   numberDatasets
#> 1              7
#> 2             11
#> 3            174

You can also check on a specific download:

occ_download_meta(dload)
#> <<gbif download metadata>>
#>   Status: SUCCEEDED
#>   Download key: 0003358-150721130643939
#>   Created: 2015-08-04T15:58:11.902+0000
#>   Modified: 2015-08-04T15:59:00.874+0000
#>   Download link: http://api.gbif.org/v1/occurrence/download/request/0003358-150721130643939.zip
#>   Total records: 6522
#>   Request: 
#>     type:  and
#>     predicates: 
#>       - type: equals, key: TAXON_KEY, value: 6351
#>       - type: equals, key: HAS_COORDINATE, value: TRUE
#>       - type: equals, key: HAS_GEOSPATIAL_ISSUE, value: FALSE

Once the download status is SUCCEEDED, you can download the file and import it into R:

occ_download_get(dload) %>% occ_download_import()
#>       gbifID abstract accessRights accrualMethod accrualPeriodicity accrualPolicy alternative
#> 1  197225455       NA           NA            NA                 NA            NA          NA
#> 2  197226215       NA           NA            NA                 NA            NA          NA
#> 3  113421846       NA           NA            NA                 NA            NA          NA
#> 4  113421847       NA           NA            NA                 NA            NA          NA
#> 5  113421851       NA           NA            NA                 NA            NA          NA
#> 6  113421860       NA           NA            NA                 NA            NA          NA
#> 7  113421982       NA           NA            NA                 NA            NA          NA
#> 8  113421988       NA           NA            NA                 NA            NA          NA
#> 9  113421994       NA           NA            NA                 NA            NA          NA
#> 10 113422101       NA           NA            NA                 NA            NA          NA
#> ..       ...      ...          ...           ...                ...           ...         ...
#> Variables not shown: audience (lgl), available (lgl), bibliographicCitation (lgl), conformsTo (lgl),
#>      contributor (lgl), coverage (lgl), created (lgl), creator (lgl), date (lgl), dateAccepted
#>      (lgl), dateCopyrighted (lgl), dateSubmitted (lgl), description (lgl), educationLevel (lgl),
#>      extent (lgl), format (lgl), hasFormat (lgl), hasPart (lgl), hasVersion (lgl), identifier (lgl),
#>
#> .... cutoff for brevity

You may get some file-reading warnings from the occ_download_import() call, but they shouldn't be a problem. You can also read the file in however you like; the output of occ_download_get() contains the path to the downloaded file.
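If you prefer to read the file yourself, base R is enough. The following is a sketch under stated assumptions: `read_occurrence_zip()` is a hypothetical helper, `zip_path` is the location of the .zip written by occ_download_get(), and the archive (a Darwin Core Archive, per the DWCA format listed above) is assumed to contain a tab-delimited file named `occurrence.txt`.

```r
# Read one tab-delimited file out of a downloaded zip archive with base R.
# Assumptions: `zip_path` points at the .zip written by occ_download_get(),
# and the archive holds a tab-delimited file named "occurrence.txt".
read_occurrence_zip <- function(zip_path, member = "occurrence.txt") {
  out_dir <- tempfile("gbif_")
  dir.create(out_dir)
  # Extract only the requested member into a fresh temporary directory.
  extracted <- utils::unzip(zip_path, files = member, exdir = out_dir)
  if (length(extracted) == 0) stop("'", member, "' not found in ", zip_path)
  utils::read.delim(
    extracted[1],
    quote = "",               # treat quote characters in text fields literally
    na.strings = "",          # empty fields become NA
    encoding = "UTF-8",
    stringsAsFactors = FALSE
  )
}

# Usage (placeholder path; substitute the path reported by occ_download_get()):
# dat <- read_occurrence_zip("path/to/download.zip")
```

Disabling quote handling is a deliberate choice here: free-text fields such as locality can contain unbalanced quote characters, which would otherwise cause rows to be merged.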
