The RMarkdown file I was using during the AWS Agriculture in the Cloud event at Ohio State
---
title: "Precipitation predictions for Putnam County, OH"
author: "Tom Faulhaber"
date: "September 17, 2016"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
# We're only doing this here because the docker container we're using
# doesn't have them installed by default
install.packages(c("ggmap", "colorRamps"))
library(ggplot2)
library(dplyr)
library(lubridate)
library(scales)
```

```{r def.avgprecip}
cvt.to.cm.per.month <- function(kg.per.m2.per.sec) {
  # 1 kg of water = 1000 cm^3, so 1 kg/m^2 = 1 mm of accumulation.
  # Simplifying assumption: all months are the same length
  # (one twelfth of a 365-day year).
  # seconds per month * 0.1 cm per mm * rate = cm per month
  (365*86400/12)*0.1*kg.per.m2.per.sec
}

# 2.54 cm per inch
cvt.to.in.per.month <- function(kg.per.m2.per.sec) cvt.to.cm.per.month(kg.per.m2.per.sec)/2.54
```

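As a quick sanity check on the conversion above, the arithmetic can be worked through step by step for a single rate. The input value here is an assumed, illustrative number, not one taken from the dataset, and the block is written as plain R so it runs on its own.

```r
# Illustrative input only: an assumed rate, not a value from the dataset.
rate <- 3e-5                                # kg / (m^2 * s)

seconds.per.month <- 365 * 86400 / 12       # equal-length months, as assumed above
mm.per.month <- rate * seconds.per.month    # 1 kg/m^2 of water is 1 mm deep
cm.per.month <- mm.per.month * 0.1          # 10 mm per cm
in.per.month <- cm.per.month / 2.54         # 2.54 cm per inch

print(c(mm = mm.per.month, cm = cm.per.month, inches = in.per.month))
```

With this input the rate works out to roughly 79 mm, or about 3.1 inches, per month.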

## Looking at a year of rainfall for Putnam County, Ohio

In this vignette, we read data for Putnam County, Ohio, for 2017 and look at the predicted precipitation. You can see the selection page here: [http://opennex.planetos.com/dcp30/Liymn](http://opennex.planetos.com/dcp30/Liymn).

#### Load the data

First let's define a function for reading the data:

```{r def.load.data}
load.data <- function(unique.id) {
  options(timeout=600)
  print(system.time(
    temps <- read.csv(url(sprintf("http://opennex/dataset/%s/data.csv", unique.id)),
                      colClasses=c(Date="Date")) %>%
      mutate(Precipitation=cvt.to.in.per.month(Value))))
  temps
}
```

The function reads data using the standard `read.csv` function in R. We perform two operations while loading the data:

1. Convert the `Date` column to R's internal date format. `read.csv` can do this directly.
2. Convert the units of precipitation from $\frac{kg}{m^{2}s}$ to $\frac{inches}{month}$, because that's more common usage.

Here `opennex` is the name of the link that we used in `docker-compose.yml` to connect to the OpenNEX data access container.

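The two loading steps can be tried without the data container by reading a few rows from an inline string. This is only a sketch: the rows below are made up, the column names are the ones the rest of this vignette relies on (`Date`, `Longitude`, `Latitude`, `Value`), and base R assignment stands in for the `mutate` call so the block needs no packages.

```r
# Hypothetical rows in the shape this vignette relies on; the numbers are made up.
csv.text <- "Date,Longitude,Latitude,Value
2017-01-01,-84.10,41.00,2.0e-05
2017-02-01,-84.10,41.00,3.0e-05"

# Step 1: colClasses parses the Date column while the file is read.
rows <- read.csv(text = csv.text, colClasses = c(Date = "Date"))

# Step 2: convert kg/(m^2 s) to inches per month
# (seconds per equal-length month * 0.1 cm per mm / 2.54 cm per inch).
rows$Precipitation <- rows$Value * (365 * 86400 / 12) * 0.1 / 2.54

str(rows)
```

The `str` output should show `Date` with class `Date` rather than character or factor, which is what lets the later date comparisons and month formatting work.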
Now let's use that function to load the Putnam County data:

```{r load.data}
data <- load.data("Liymn")
```

#### Examine the data

Let's take a look at the data. How many data points are there?

```{r nrow}
nrow(data)
```

What are the fields and values like?

```{r summary}
summary(data)
```

```{r strs}
str(lapply(data, unique))
```

#### View the data on maps

We'll use R's `ggmap` package to look at this precipitation on a map.

We need to define a couple of functions to get the maps to look just the way we want:

```{r}
library(ggmap)
library(colorRamps)

# Bounding box around the data points, padded by `frame` degrees on each side
bbox <- function(df, frame=0.25) {
  c(left=min(df$Longitude)-frame,
    right=max(df$Longitude)+frame,
    top=max(df$Latitude)+frame,
    bottom=min(df$Latitude)-frame)
}

# Precipitation raster drawn over a base map
map.df <- function(df, zoom=10) {
  df.bb <- bbox(df, frame=0.05)
  map <- get_stamenmap(df.bb, zoom = zoom, maptype = "toner-lite")
  ggmap(map) +
    geom_raster(aes(x=Longitude, y=Latitude, fill = Precipitation),
                data=df, alpha=0.8, interpolate = TRUE) +
    coord_cartesian() +
    scale_fill_gradientn(colors=matlab.like2(40)) +
    theme(axis.title=element_blank())
}

# One panel per month, with levels in order of appearance in the data
map.df.byMonth <- function(df, ...) {
  months <- format(df$Date, "%B %Y")
  levs <- format(unique(df$Date), "%B %Y")
  df$Month <- factor(months, levels=levs)
  map.df(df, ...) +
    facet_wrap(~ Month) +
    ggtitle("Monthly Precipitation")
}
```

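Two details of these helpers are easy to check in isolation: how `bbox` pads the extent of the points, and why the month factor is given explicit levels. The sketch below repeats `bbox` so it runs on its own and uses a made-up data frame; no map tiles are fetched.

```r
# Copy of the bbox helper so this sketch runs on its own.
bbox <- function(df, frame = 0.25) {
  c(left   = min(df$Longitude) - frame,
    right  = max(df$Longitude) + frame,
    top    = max(df$Latitude) + frame,
    bottom = min(df$Latitude) - frame)
}

# Made-up grid points and dates, for illustration only.
pts <- data.frame(
  Longitude = c(-84.25, -84.00, -84.25),
  Latitude  = c(41.00, 41.00, 41.125),
  Date      = as.Date(c("2017-01-01", "2017-02-01", "2017-04-01"))
)

# The box is the extent of the points plus 0.05 degrees on every side.
print(bbox(pts, frame = 0.05))

# Levels taken from the dates in order of appearance keep the panels in
# data order; a plain factor() sorts the month labels as strings instead.
months <- format(pts$Date, "%B %Y")
chronological <- factor(months, levels = format(unique(pts$Date), "%B %Y"))
by.label      <- factor(months)

print(levels(chronological))
print(levels(by.label))
```

In an English locale the second `print` lists the months by name rather than by calendar position, which is the panel order `facet_wrap` would otherwise use. Because the explicit levels follow order of first appearance, this approach relies on the rows arriving sorted by date.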
And finally, let's render the map for this data:

```{r}
map.df.byMonth(data)
```

We see significant month-to-month variation but not much geographic variation. We can look directly at a single month (May 2017) to see what geographic variation there is:

```{r}
map.df.byMonth(subset(data, Date==as.Date("2017-05-01")))
```

This shows that there _is_ some variation across the county, but not very much.
This is not too surprising, given Ohio's geography. To see a different situation, look at the work I did on the Tuolumne watershed in California's Sierra Nevada mountains at [http://www.infolace.com/blog/2016/08/31/tuolumne-report/](http://www.infolace.com/blog/2016/08/31/tuolumne-report/).
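
One rough way to put numbers on the comparison between month-to-month and geographic variation made above is to compare the spread of the monthly means with the typical spread across grid cells within a month. The data frame below is entirely made up, and the `Cell` column is a hypothetical stand-in for a longitude/latitude pair; this is a sketch of the idea, not a result from the Putnam County data.

```r
# Hypothetical data: 3 grid cells x 3 months, in inches per month (made up).
toy <- data.frame(
  Date = rep(as.Date(c("2017-04-01", "2017-05-01", "2017-06-01")), each = 3),
  Cell = rep(c("a", "b", "c"), times = 3),
  Precipitation = c(3.0, 3.1, 3.2,
                    4.0, 4.1, 4.2,
                    2.0, 2.1, 2.2)
)

# Month-to-month variation: spread of the area-wide monthly means.
monthly.mean <- tapply(toy$Precipitation, toy$Date, mean)
between.months <- sd(monthly.mean)

# Geographic variation: typical spread across cells within a single month.
within.month <- mean(tapply(toy$Precipitation, toy$Date, sd))

print(c(between.months = between.months, within.month = within.month))
```

For the toy numbers the between-month spread is ten times the within-month spread. Applying the same two `tapply` calls to the loaded `data` would give the corresponding figures for the county.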