@sckott
Last active August 29, 2015 14:01
NOAA ERDDAP data in R

NOAA oceanographic data (ERDDAP) in R

The NOAA ERDDAP server gives access to what appears to be 802 datasets.

Install and load rnoaa

install.packages("devtools")  # provides install_github()
devtools::install_github("ropensci/rnoaa")
library("rnoaa")

A workflow for interacting with ERDDAP data

Search for data using erddap_search(). Grab a datasetid that you want more information on.

(out <- erddap_search(query='fish size'))
## 7 results, showing first 20 
##                                         title          dataset_id
## 1                          CalCOFI Fish Sizes    erdCalCOFIfshsiz
## 2                        CalCOFI Larvae Sizes    erdCalCOFIlrvsiz
## 3                                CalCOFI Tows      erdCalCOFItows
## 4     GLOBEC NEP MOCNESS Plankton (MOC1) Data       erdGlobecMoc1
## 5 GLOBEC NEP Vertical Plankton Tow (VPT) Data        erdGlobecVpt
## 6         CalCOFI Larvae Counts Positive Tows erdCalCOFIlrvcntpos
## 7  OBIS - ARGOS Satellite Tracking of Animals           aadcArgos
id <- out$info$dataset_id[1]

Pass a datasetid to erddap_info() to get details on that dataset. A list of length 2 is returned. The variables slot has all column names and their data types (float, String, etc.). The alldata slot has the comprehensive list of information on each column; it is given as a list because descriptions can be quite long and would make for a hard-to-use data frame.

erddap_info(datasetid=id)$variables
##     row_type        variable_name data_type
## 31  variable               cruise    String
## 35  variable                 ship    String
## 38  variable            ship_code    String
## 42  variable       order_occupied       int
## 46  variable             tow_type    String
## 49  variable             net_type    String
## 53  variable           tow_number       int
## 58  variable         net_location    String
## 62  variable standard_haul_factor     float
## 66  variable       volume_sampled     float
## 71  variable       percent_sorted     float
## 76  variable       sample_quality     float
## 80  variable             latitude     float
## 88  variable            longitude     float
## 96  variable                 line     float
## 101 variable              station     float
## 106 variable                 time    double
## 116 variable      scientific_name    String
## 119 variable          common_name    String
## 122 variable             itis_tsn       int
## 126 variable calcofi_species_code       int
## 130 variable            fish_size     float
## 134 variable           fish_count     float
## 137 variable          fish_1000m3     float

Get data from the dataset, with column names gleaned from the last step

head(erddap_data(datasetid = id, fields = c('latitude','longitude','scientific_name')))
##    latitude  longitude       scientific_name
## 2 35.038334 -120.88333 Microstomus pacificus
## 3  34.97167 -121.02333    Cyclothone signata
## 4  34.97167 -121.02333    Cyclothone signata
## 5  34.97167 -121.02333    Cyclothone signata
## 6  34.97167 -121.02333    Cyclothone signata
## 7  34.97167 -121.02333    Cyclothone signata

Integrate with taxize

Some datasets have ITIS taxonomic identifiers - we can use taxize to get more information on each species using these identifiers.

Get data on the California Cooperative Oceanic Fisheries Investigations fish sizes dataset (erdCalCOFIfshsiz)

out <- erddap_data(datasetid = 'erdCalCOFIfshsiz', fields = c('latitude','longitude','scientific_name','itis_tsn'))
tsns <- unique(out$itis_tsn[1:100])

Load taxize and get classifications for each taxon, then combine to a single data frame

install.packages("taxize")
library("taxize")
classif <- classification(tsns, db = "itis")
## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=172887
## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=162168
## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=623625
## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=162172
## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=162301
## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=162685
## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=162664
## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=162221
## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=164792
## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=162167
## http://www.itis.gov/ITISWebService/services/ITISService/getFullHierarchyFromTSN?tsn=162092
alldata <- rbind(classif)
nrow(alldata)
## [1] 166
head(alldata)
##   source taxonid          name         rank
## 1   itis  172887      Animalia      Kingdom
## 2   itis  172887     Bilateria   Subkingdom
## 3   itis  172887 Deuterostomia Infrakingdom
## 4   itis  172887      Chordata       Phylum
## 5   itis  172887    Vertebrata    Subphylum
## 6   itis  172887 Gnathostomata  Infraphylum
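
Once the classifications sit in one data frame, ordinary base R subsetting is enough to pull out a single rank per taxon. The snippet below is a minimal sketch: it rebuilds only the six rows shown in the head() output above as a toy stand-in for alldata, so it runs without calling ITIS.

```r
# Toy stand-in for `alldata`, copied from the head() output above.
# The real object has 166 rows; only these six are known here.
alldata <- data.frame(
  source  = "itis",
  taxonid = 172887,
  name    = c("Animalia", "Bilateria", "Deuterostomia",
              "Chordata", "Vertebrata", "Gnathostomata"),
  rank    = c("Kingdom", "Subkingdom", "Infrakingdom",
              "Phylum", "Subphylum", "Infraphylum"),
  stringsAsFactors = FALSE
)

# Keep a single rank, which leaves one row per taxon id
phyla <- alldata[alldata$rank == "Phylum", c("taxonid", "name")]
print(phyla)
```

With the full data frame, a table like this could presumably be joined back to the ERDDAP records on itis_tsn, for example with merge().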

Constraining the search

ERDDAP is a great service, but it doesn't have a unified API against which to specify constraints on what data gets returned: each dataset has different possible parameters. So you do need to use erddap_info() to find out what columns there are, then you can pass constraints to erddap_data() based on those column names. For example, from above we see that there is a column called fish_count in the erdCalCOFIfshsiz dataset. We can use that column to get back only data that meet a criterion.
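
To make the idea of per-column constraints concrete, here is a rough sketch of how a tabledap request can be put together by hand: the requested columns are comma-separated, and each constraint is appended after an &. The helper erddap_url() is hypothetical and not part of rnoaa, and the default host is an assumption (the CoastWatch ERDDAP server); how rnoaa builds its requests internally may differ.

```r
# Sketch: assemble a tabledap CSV request URL by hand.
# erddap_url() is a hypothetical helper, not part of rnoaa.
# The default `base` is an assumption; other ERDDAP servers use other hosts.
erddap_url <- function(datasetid, fields, ...,
                       base = "https://coastwatch.pfeg.noaa.gov/erddap/tabledap") {
  constraints <- c(...)
  # Requested columns are comma-separated
  query <- paste(fields, collapse = ",")
  if (length(constraints) > 0) {
    # Percent-encode characters such as < and > that are not URL-safe
    encoded <- vapply(constraints, utils::URLencode, character(1),
                      USE.NAMES = FALSE)
    # Each constraint is appended with a leading &
    query <- paste(c(query, encoded), collapse = "&")
  }
  sprintf("%s/%s.csv?%s", base, datasetid, query)
}

url <- erddap_url("erdCalCOFIfshsiz",
                  fields = c("latitude", "longitude", "scientific_name", "fish_count"),
                  "fish_count>=6")
print(url)
```

Because the constraint names come straight from the dataset's columns, a constraint on a column that the dataset lacks should be expected to fail on the server side.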

Records with a count greater than or equal to 6

erddap_data(datasetid = id, fields = c('latitude','longitude','scientific_name','fish_count'), 'fish_count>=6')
##     latitude   longitude           scientific_name fish_count
## 2     35.595 -121.736664        Cyclothone signata          6
## 3  35.336666    -122.555 Stenobrachius leucopsarus          8
## 4  35.336666 -122.861664        Cyclothone signata          6
## 5  35.336666 -122.861664        Cyclothone signata          6
## 6  35.336666 -122.861664 Stenobrachius leucopsarus          6
## 7  35.356667  -123.12666        Cyclothone signata          6
## 8  35.356667  -123.12666 Stenobrachius leucopsarus          7
## 9  35.333332 -123.473335        Cyclothone signata          6
## 10 35.333332 -123.473335        Cyclothone signata          6
## 11 35.333332 -123.473335        Cyclothone signata          6
## 12 35.333332 -123.473335        Cyclothone signata          7
## 13 35.333332 -123.473335        Cyclothone signata          7
## 14 35.333332 -123.473335        Cyclothone signata          6
## 15 35.336666     -123.77        Cyclothone signata          8
## 16 35.336666     -123.77        Cyclothone signata          7
## 17 35.331665    -124.375        Cyclothone signata          7
## 18 35.338333  -125.28667        Cyclothone signata          6
## 19 35.338333  -125.28667        Cyclothone signata          6
## 20 36.836666 -124.878334        Cyclothone signata          6
## 21 35.981667     -124.88        Cyclothone signata          6
## 22 36.333332 -124.878334        Cyclothone signata          6
## 23 36.333332 -124.878334        Cyclothone signata          6
## 24 36.583332    -124.875        Cyclothone signata          6
## 25    35.335  -123.26667 Stenobrachius leucopsarus          6
## 26 37.001667 -124.901665        Cyclothone signata          6
## 27      37.5  -126.70167        Cyclothone signata          6
## 28      38.0      -128.5        Cyclothone signata          6
## 29      38.0      -128.5        Cyclothone signata         11
## 30      38.5      -125.5        Cyclothone signata          7

Geographic constraints

You can use latitude and longitude constraints to restrict the search geographically.

out <- erddap_data(datasetid = 'erdCalCOFIfshsiz', fields = c('latitude','longitude','scientific_name'), 'latitude>=34.8', 'latitude<=35', 'longitude>=-125', 'longitude<=-124.4')
head(out)
##    latitude longitude        scientific_name
## 2 34.881668   -124.48     Cyclothone atraria
## 3 34.881668   -124.48   Lipolagus ochotensis
## 4 34.881668   -124.48  Bathylagoides wesethi
## 5 34.881668   -124.48 Cyclothone acclinidens
## 6 34.881668   -124.48 Cyclothone acclinidens
## 7 34.881668   -124.48     Cyclothone signata
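
Since there seems to be no single bounding-box parameter, the four inequalities can be generated from the corners of the box. The following is a minimal sketch; bbox_constraints() is a hypothetical helper, not an rnoaa function, and it assumes the dataset names its coordinate columns latitude and longitude, as erdCalCOFIfshsiz does.

```r
# Hypothetical helper: turn a bounding box into ERDDAP-style constraint strings.
# Assumes the dataset's coordinate columns are called latitude and longitude.
bbox_constraints <- function(lat_min, lat_max, lon_min, lon_max) {
  stopifnot(lat_min <= lat_max, lon_min <= lon_max)
  c(
    paste0("latitude>=", lat_min),
    paste0("latitude<=", lat_max),
    paste0("longitude>=", lon_min),
    paste0("longitude<=", lon_max)
  )
}

box <- bbox_constraints(34.8, 35, -125, -124.4)
print(box)

# The strings could then be handed to erddap_data() as in the call above,
# assuming it accepts them as unnamed arguments:
# do.call(erddap_data, c(list(datasetid = 'erdCalCOFIfshsiz',
#                             fields = c('latitude','longitude','scientific_name')),
#                        as.list(box)))
```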

dill commented May 15, 2014

Looks good! A couple of comments:

  • 'fish_count>=' = 6 is a bit icky. Any chance of having the whole thing in the quote and parse the string?
  • (this is a lazy question, but) can you specify upper and lower bound inequalities? e.g. latitudes between 35.5 and 37 and longitudes between -125 and -124, i.e. I just want a square of ocean

sckott commented May 15, 2014

hey @dill -

  • I agree with you that 'fish_count>=' = 6 is weird. I'll try that change.
  • I just started exploring this data source a few days ago, but I think you can. You'd have to do something like latitude<40 & latitude>30, and the same for longitude. That is, I don't think there's any global parameter for a bounding box or for passing in a WKT string
