RNRFA: an R package to interact with the UK National River Flow Archive
18 November 2015
*** Updated on the 01.01.2016 to work with rnrfa version 0.4.0 ***
The UK National River Flow Archive (NRFA) serves daily streamflow data, spatial rainfall averages and information on elevation, geology, land cover and FEH-related catchment descriptors. An API currently under development should in future provide access to four services: a metadata catalogue; catalogue filters based on a geographical bounding box; catalogue filters based on metadata entries; and gauged daily data for about 400 stations in WaterML2 format, the OGC standard used to describe hydrological time series. The first three services return JSON, while the last returns an XML variant. The RNRFA package aims to make access to these data simpler and more efficient by providing wrapper functions that send HTTP requests and interpret the XML/JSON responses.
install.packages( c("devtools", "parallel", "ggplot2", "DT", "leaflet", "dygraphs") )
Install the package
The stable version (preferred option) of rnrfa is available from CRAN using
install.packages("rnrfa"), while the development version is available on GitHub via devtools:
library(devtools)
install_github("cvitolo/r_rnrfa", subdir = "rnrfa")
List monitoring stations
The R function that retrieves the list of monitoring stations from the NRFA catalogue is called catalogue(). Used with no inputs, it requests the full list of gauging stations with associated metadata. The output is a data frame containing one record per station and one column per available metadata entry. The list can also be filtered on a metadata entry through the metadataColumn and entryValue arguments.
library(rnrfa)
# Retrieve information for all the stations operated by Natural Resources Wales
someStations <- catalogue(metadataColumn = "operator", entryValue = "Natural Resources Wales")
The only geospatial information contained in the list of stations in the catalogue is the OS grid reference (column "gridRef"). The RNRFA package allows convenient conversion to more standard coordinate systems. By default, the function OSGparse() converts the string to easting and northing in the British/Irish National Grid coordinate system (EPSG codes 27700 and 29902, respectively). To get latitude and longitude instead (WGS84 coordinate system, EPSG code 4326), use the parameter CoordSystem = "WGS84".
# Convert OS grid reference to BNG
OSGparse("SN853872")
# Convert OS grid reference to WGS84
OSGparse("SN853872", CoordSystem = "WGS84")
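To make the conversion less of a black box, here is a minimal base R sketch of how a British grid reference maps to easting and northing. The helper osg_to_bng() is hypothetical: it is not part of rnrfa, it is not how OSGparse() is necessarily implemented, and it only handles British (two-letter) references, not Irish ones.

```r
# Hypothetical helper (not part of rnrfa): convert a British National Grid
# reference such as "SN853872" to easting/northing in metres (EPSG:27700).
osg_to_bng <- function(grid_ref) {
  grid_ref <- toupper(gsub(" ", "", grid_ref, fixed = TRUE))

  # Grid letters run from A to Z without "I", laid out on 5 x 5 patterns.
  grid_letters <- setdiff(LETTERS, "I")
  l1 <- match(substr(grid_ref, 1, 1), grid_letters) - 1
  l2 <- match(substr(grid_ref, 2, 2), grid_letters) - 1
  if (is.na(l1) || is.na(l2)) {
    stop("grid reference must start with two grid letters")
  }

  # Position of the 100 km square, counted from the false origin.
  e100km <- ((l1 - 2) %% 5) * 5 + (l2 %% 5)
  n100km <- (19 - (l1 %/% 5) * 5) - (l2 %/% 5)

  # The digits split evenly into an easting part and a northing part.
  digits <- substr(grid_ref, 3, nchar(grid_ref))
  if (!grepl("^[0-9]*$", digits) || nchar(digits) %% 2 != 0) {
    stop("grid reference must end with an even number of digits")
  }
  half <- nchar(digits) / 2
  offsets <- c(0, 0)
  if (half > 0) {
    parts <- c(substr(digits, 1, half), substr(digits, half + 1, 2 * half))
    # Scale to metres: three digits per axis means 100 m resolution.
    offsets <- as.numeric(parts) * 10^(5 - half)
  }

  c(easting = e100km * 1e5 + offsets[1], northing = n100km * 1e5 + offsets[2])
}

osg_to_bng("SN853872")
# easting 285300, northing 287200
```

The two letters select a 100 km square ("SN" is the square whose south-west corner is at 200000, 200000) and the digits give the offset within that square.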
Get time series data
The first column of the table "someStations" contains the station id number. This can be passed to the GDF() function to retrieve the streamflow time series, converting the WaterML2 file to a time series object. Retrieving 129 time series is time consuming, so here I use the parallel package to speed up the process.
library(parallel)
detectCores()
# mclapply() is from the parallel package
system.time( s <- mclapply(someStations$id, GDF) )
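As a side note, the number of workers can be set explicitly. The sketch below shows the same mclapply() pattern on a made-up task: slow_task() is a hypothetical stand-in for a per-station download, and the cap of four workers is my own assumption, not an NRFA or rnrfa requirement. Forking is not available on Windows, where mclapply() only works with mc.cores = 1.

```r
library(parallel)

# Hypothetical stand-in for a slow per-station request.
slow_task <- function(id) {
  Sys.sleep(0.05)
  id^2
}

# detectCores() can return NA, and forking is unavailable on Windows.
n_cores <- detectCores()
if (is.na(n_cores) || .Platform$OS.type == "windows") {
  n_cores <- 1L
}
# Leave one core free and cap the number of workers (assumed cap of 4).
n_cores <- max(1L, min(n_cores - 1L, 4L))

# Results come back as a list, in the same order as the input.
results <- mclapply(1:8, slow_task, mc.cores = n_cores)
```

Because the output keeps the input order, it can be matched back to the station ids by position.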
Use the result for a simple analysis
someStations$meanGDF <- unlist( lapply(s, mean) )
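If some series contain missing values, mean() returns NA for them. A minimal sketch of a more defensive variant, assuming each element of the list behaves like a numeric vector; the list flows and its values are made up for illustration and are not NRFA data:

```r
# Toy stand-in for the list of series (made-up values).
flows <- list(c(1.2, NA, 0.8), c(3, 5), c(NA_real_, NA_real_))

# vapply() guarantees one number per series; na.rm = TRUE skips gaps.
mean_flow <- vapply(flows, function(x) mean(x, na.rm = TRUE), numeric(1))
```

A series with no valid observations at all yields NaN rather than a number, so it is still easy to spot afterwards.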
# Linear model
library(ggplot2)
ggplot(someStations, aes(x = as.numeric(catchmentArea), y = meanGDF)) +
  geom_point() +
  stat_smooth(method = "lm", col = "red") +
  xlab(expression(paste("Catchment area [Km^2]", sep = ""))) +
  ylab(expression(paste("Mean flow [m^3/s]", sep = "")))
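The red line drawn by stat_smooth(method = "lm") is an ordinary least-squares fit, so the same kind of model can be fitted directly with lm() to read off the coefficients. A minimal sketch on a toy data frame; the values are made up for illustration and are not NRFA data:

```r
# Made-up values, chosen to lie exactly on a straight line.
toy <- data.frame(catchmentArea = c("10", "20", "30", "40"),
                  meanGDF = c(0.3, 0.5, 0.7, 0.9),
                  stringsAsFactors = FALSE)

# The area is stored as text here, hence the conversion to numeric.
toy$area <- as.numeric(toy$catchmentArea)

fit <- lm(meanGDF ~ area, data = toy)
coef(fit)  # intercept and slope of the fitted line
```

With the real table, the same call would use meanGDF and as.numeric(catchmentArea) from someStations.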
Upgrade your data.frame to a data.table:
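A minimal sketch of this step, assuming the DT package installed earlier and its datatable() function are what is meant here. The toy data frame toy_stations is hypothetical and stands in for someStations:

```r
library(DT)

# Hypothetical stand-in for the station table (made-up entries).
toy_stations <- data.frame(id = c(1001, 1002),
                           name = c("Station A", "Station B"),
                           stringsAsFactors = FALSE)

# datatable() wraps the data frame in a searchable, sortable HTML widget.
widget <- datatable(toy_stations)
# Typing `widget` at the console renders the table in the viewer or browser.
```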
Create interactive maps using leaflet:
library(leaflet)
leaflet(data = someStations) %>%
  addTiles() %>%
  addMarkers(~lon, ~lat, popup = ~as.character(paste(id, name)))
Generate interactive plots using dygraphs:
library(dygraphs)
# dygraph() plots one series at a time; the index 1 here is illustrative
dygraph(s[[1]]) %>% dyRangeSelector()