Last active
January 1, 2016 19:18
Dynamic Report - Demo for the talk on "Improving access to geospatial Big Data in the hydrology domain" - Royal Statistical Society 18.11.2015
---
title: "RNRFA: an R package to interact with the UK National River Flow Archive"
author: "Claudia Vitolo"
date: "18 November 2015"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(cache=TRUE)
```
***Updated on 01.01.2016 to work with rnrfa version 0.4.0***

The UK National River Flow Archive (NRFA) serves daily streamflow data, spatial rainfall averages and information regarding elevation, geology, land cover and FEH-related catchment descriptors. An API currently under development should in future provide access to the following services: a metadata catalogue, catalogue filters based on a geographical bounding box, catalogue filters based on metadata entries, and gauged daily data for about 400 stations in WaterML2 format, the OGC standard used to describe hydrological time series. The first three services return JSON, while the last returns WaterML2, an XML variant. The RNRFA package aims to make access to these data simpler and more efficient by providing wrapper functions that send HTTP requests and interpret the XML/JSON responses.

# Install dependencies
```{r, eval=TRUE, include=TRUE, echo=TRUE}
# "parallel" ships with base R, so it does not need to be installed from CRAN
install.packages( c("devtools", "ggplot2", "DT", "leaflet", "dygraphs") )
```

# Install the package
The stable version of rnrfa (the preferred option) is available from CRAN using `install.packages("rnrfa")`, while the development version is available on GitHub via devtools:
```{r, eval=TRUE, include=TRUE, echo=TRUE}
library(devtools)
install_github("cvitolo/r_rnrfa", subdir = "rnrfa")
```

# List monitoring stations
The R function that retrieves the list of monitoring stations from the NRFA catalogue is called `catalogue()`. Used with no inputs, it requests the full list of gauging stations with associated metadata. The output is a data frame containing one record per station and one column per metadata entry. The list can also be filtered on a metadata entry, for example the operator:
```{r}
library(rnrfa)
# Retrieve information for all the stations operated by Natural Resources Wales
someStations <- catalogue(metadataColumn="operator", entryValue="Natural Resources Wales")
```
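
To make the idea of a metadata filter concrete, here is a minimal sketch in base R that works offline. The table `toyCatalogue` and the helper `filterCatalogue()` are hypothetical, the values are made up, and the exact-match rule is an assumption, so this is not necessarily how rnrfa filters the real catalogue.

```{r}
# Made-up catalogue with a few metadata columns (illustrative values only)
toyCatalogue <- data.frame(
  id = c(1001, 1002, 1003),
  name = c("Station A", "Station B", "Station C"),
  operator = c("Natural Resources Wales", "Environment Agency",
               "Natural Resources Wales"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows whose metadata column equals the value
filterCatalogue <- function(x, metadataColumn, entryValue) {
  x[x[[metadataColumn]] == entryValue, , drop = FALSE]
}

walesStations <- filterCatalogue(toyCatalogue, "operator", "Natural Resources Wales")
walesStations$id
```

The result keeps all the metadata columns, so it can be used in the same way as the full table.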

# Convert coordinates
The only geospatial information contained in the list of stations in the catalogue is the OS grid reference (column "gridRef"). The RNRFA package allows convenient conversion to more standard coordinate systems. By default, the function `OSGparse()` converts the string to easting and northing in the British/Irish National Grid coordinate system (EPSG code: 27700/29902). To get coordinates in latitude and longitude (WGS84 coordinate system, EPSG code: 4326), use the parameter `CoordSystem = "WGS84"`.
```{r}
# Convert OS grid reference to BNG easting and northing
OSGparse("SN853872")
# Convert OS grid reference to WGS84 latitude and longitude
OSGparse("SN853872", CoordSystem = "WGS84")
```
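
For readers curious about what happens to the string, the sketch below decodes a British National Grid reference by hand in base R. The helper `parseGridRef()` is hypothetical and is not the rnrfa implementation. It assumes a well-formed British reference (two letters followed by an even number of digits) and does not handle Irish grid references or the WGS84 conversion.

```{r}
# Hypothetical helper: decode a British National Grid reference into metres
parseGridRef <- function(gridRef) {
  gridRef <- toupper(gsub(" ", "", gridRef))
  gridLetters <- LETTERS[LETTERS != "I"]   # the grid alphabet skips "I"
  idx <- match(strsplit(substr(gridRef, 1, 2), "")[[1]], gridLetters) - 1
  # Letters fill a 5 x 5 square row by row, starting from the north-west corner
  col <- idx %% 5
  row <- idx %/% 5
  # First letter: 500 km squares, with the south-west corner of "S" at the origin
  # Second letter: 100 km squares inside the 500 km square
  e100 <- (col[1] - 2) * 5 + col[2]
  n100 <- (3 - row[1]) * 5 + (4 - row[2])
  # Digits: first half is the easting, second half the northing
  digits <- substr(gridRef, 3, nchar(gridRef))
  half <- nchar(digits) / 2
  resolution <- 10^(5 - half)              # e.g. 3 digits each -> 100 m
  easting <- e100 * 1e5 + as.numeric(substr(digits, 1, half)) * resolution
  northing <- n100 * 1e5 + as.numeric(substr(digits, half + 1, 2 * half)) * resolution
  c(easting = easting, northing = northing)
}

parseGridRef("SN853872")   # easting 285300, northing 287200
```

The two letters locate a 100 km square and the digits give the offset within it, which is why longer references are more precise.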

# Get time series data
The first column of the table "someStations" contains the station id number. This can be used to retrieve the streamflow time series, converting the WaterML2 file to a time series object. Retrieving 129 time series is time consuming, so here I use the parallel package to speed up the process.
```{r}
library(parallel)
detectCores()
system.time( s <- mclapply(someStations$id, GDF) ) # from the parallel package
```
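
The pattern can be tried offline with a stand-in for the download. In this sketch `slowFetch()` is a hypothetical function that simply sleeps, and the number of workers is passed explicitly through `mc.cores`. Since `mclapply()` relies on forking, the sketch falls back to a single core on Windows, where more than one core is not supported.

```{r}
library(parallel)

# Hypothetical stand-in for a slow request such as downloading one time series
slowFetch <- function(id) {
  Sys.sleep(0.1)
  id * 2
}

ids <- 1:8

# detectCores() can return NA, and forking is not available on Windows
nCores <- detectCores()
if (is.na(nCores) || .Platform$OS.type == "windows") nCores <- 1L

system.time( serial <- lapply(ids, slowFetch) )
system.time( forked <- mclapply(ids, slowFetch, mc.cores = nCores) )

# Check that both approaches return the same list, in the same order
identical(serial, forked)
```

How much time is saved depends on the number of cores and, for real downloads, on how quickly the server responds.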

Use the result for a simple analysis, for example the mean flow at each station:
```{r, message=FALSE, warning=FALSE}
someStations$meanGDF <- unlist( lapply(s, mean) )
```
```{r}
# Linear model
library(ggplot2)
ggplot(someStations, aes(x = as.numeric(catchmentArea), y = meanGDF)) +
  geom_point() +
  stat_smooth(method = "lm", col = "red") +
  # Unquoted km^2 and m^3/s are rendered with superscripts by plotmath
  xlab(expression(paste("Catchment area [", km^2, "]"))) +
  ylab(expression(paste("Mean flow [", m^3/s, "]")))
```
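
The red line drawn by `stat_smooth(method = "lm")` is an ordinary least-squares fit, which can also be computed directly with `lm()`. The sketch below uses a made-up data frame `toy` in place of `someStations`, with values invented purely for illustration.

```{r}
# Made-up values standing in for someStations (not real NRFA data)
toy <- data.frame(
  catchmentArea = c("10", "50", "100", "200"),   # stored as text, hence as.numeric()
  meanGDF = c(0.3, 1.1, 2.1, 4.1),
  stringsAsFactors = FALSE
)
toy$catchmentArea <- as.numeric(toy$catchmentArea)

# Fit mean flow as a linear function of catchment area
fit <- lm(meanGDF ~ catchmentArea, data = toy)
coef(fit)
```

With the real table, the same call with `data = someStations` gives the intercept and slope of the plotted line, provided `catchmentArea` is converted to numeric first.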

# Interoperability
Display your data.frame as an interactive table using DT:
```{r, cache=FALSE}
library(DT)
datatable(someStations[,c(1:4,7,9,10,12:14,17)])
```

Create interactive maps using leaflet:
```{r, cache=FALSE}
library(leaflet)
leaflet(data = someStations) %>% addTiles() %>%
  addMarkers(~lon, ~lat, popup = ~as.character(paste(id,name)))
```

Generate interactive plots using dygraphs:
```{r, cache=FALSE}
library(dygraphs)
dygraph(s[[1]]) %>% dyRangeSelector()
```