tomfaulhaber/putnam-co.Rmd

## putnam-co.Rmd
---
title: "Precipitation predictions for Putnam County, OH"
author: "Tom Faulhaber"
date: "September 17, 2016"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)

# We're only doing this here because the docker container we're using
# doesn't have them installed by default
install.packages(c("ggmap", "colorRamps"))

library(ggplot2)
library(dplyr)
library(lubridate)
library(scales)
```

```{r def.avgprecip}
cvt.to.cm.per.month <- function(kg.per.m2.per.sec) {
  # 1 kg of water = 1000 cm^3, so 1 kg/m^2 = 1 mm (0.1 cm) of accumulation.
  # Simplifying assumption: all months are the same length (1/12 of a 365-day year).
  # Because of the division by 12, the result is cm per month, not per year.
  (365*86400/12)*0.1*kg.per.m2.per.sec
}

cvt.to.in.per.month <- function(kg.per.m2.per.sec) cvt.to.cm.per.month(kg.per.m2.per.sec)/2.54
```
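
As a quick sanity check on the conversion above, here is the same arithmetic written out step by step for a single value. The flux of `1e-5` is a made-up number chosen for easy arithmetic, not a value from the dataset, and the variable names are only for illustration. It should come out to about 2.6 cm, or roughly 1.03 inches, per month.

```r
# Seconds in an "average" month: a 365-day year split into 12 equal months
secs.per.month <- 365 * 86400 / 12

# Hypothetical precipitation flux in kg/(m^2 s); not taken from the real data
flux <- 1e-5

# 1 kg/m^2 of water is a 1 mm (0.1 cm) layer, so flux * seconds * 0.1 gives cm
cm.per.month <- flux * secs.per.month * 0.1

# 2.54 cm per inch
in.per.month <- cm.per.month / 2.54

print(c(cm = cm.per.month, inches = in.per.month))
```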

## Looking at a year of rainfall for Putnam County, Ohio

In this vignette, we read the 2017 data for Putnam County, Ohio and look at the precipitation predictions. You can see the selection page here [http://opennex.planetos.com/dcp30/Liymn](http://opennex.planetos.com/dcp30/Liymn).

#### Load the data

First let's define a function for reading the data:
```{r def.load.data}
load.data <- function(unique.id) {
  # Allow up to 10 minutes for the download
  options(timeout=600)
  # Time the download and unit conversion, and print the timing
  print(system.time(
    precip <- read.csv(url(sprintf("http://opennex/dataset/%s/data.csv", unique.id)),
                       colClasses=c(Date="Date")) %>%
      mutate(Precipitation=cvt.to.in.per.month(Value))))
  precip
}
```

The function reads data using the standard `read.csv` function in R. We perform two operations while loading the data:

1. Convert the `Date` column to R's internal date format. `read.csv` can do this directly.
2. Convert the units of precipitation from $\frac{kg}{m^{2}s}$ to $\frac{inches}{month}$, because inches per month is the more familiar unit.

Here `opennex` is the name of the link that we used in `docker-compose.yml` to connect to the OpenNEX data access container.
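
To illustrate the `colClasses` conversion described above without needing the data container, here is a minimal sketch that reads a tiny inline CSV. The two rows and the exact column layout are made up for the example (only the column names `Date`, `Longitude`, `Latitude` and `Value` are taken from the code in this vignette), so treat it as a stand-in for the real download.

```r
# Hypothetical two-row CSV standing in for the real download
csv.text <- "Date,Longitude,Latitude,Value
2017-01-01,-84.1,41.0,0.00002
2017-02-01,-84.1,41.0,0.00003"

# A named colClasses vector converts just the Date column while reading;
# the other columns are left for read.csv to guess
demo.df <- read.csv(text = csv.text, colClasses = c(Date = "Date"))

str(demo.df)
```

Because `Date` is a real date column rather than a string, comparisons such as `Date == as.Date("2017-05-01")` and calls to `format()` work on it directly.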

Now let's use that function to load the Putnam County data:

```{r load.data}
data <- load.data("Liymn")
```

#### Examine the data
Let's take a look at the data. How many data points are there?
```{r nrow}
nrow(data)
```

What are the fields and values like?
```{r summary}
summary(data)
```

```{r strs}
str(lapply(data, unique))
```

#### View the data on maps

We'll use R's `ggmap` package to look at this precipitation on a map.

We need to define a couple of functions to get the maps to look just the way we want:

```{r}
library(ggmap)
library(colorRamps)

bbox <- function(df, frame=0.25) {
  c(left=min(df$Longitude)-frame,
    right=max(df$Longitude)+frame,
    top=max(df$Latitude)+frame,
    bottom=min(df$Latitude)-frame)
}

map.df <- function(df, zoom=10) {
  df.bb <- bbox(df, frame=0.05)
  map <- get_stamenmap(df.bb, zoom = zoom, maptype = "toner-lite")
  ggmap(map) +
    geom_raster(aes(x=Longitude, y=Latitude, fill = Precipitation),
                data=df, alpha=0.8, interpolate = TRUE) +
    coord_cartesian() +
    scale_fill_gradientn(colors=matlab.like2(40)) +
    theme(axis.title=element_blank())
}

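map.df.byMonth <- function(df, ...) {
  # Label each row with its month, e.g. "May 2017"
  months <- format(df$Date, "%B %Y")
  # Order the factor levels chronologically so the facets appear in calendar
  # order rather than alphabetically or in row order
  levs <- format(sort(unique(df$Date)), "%B %Y")
  df$Month <- factor(months, levels=levs)
  map.df(df, ...) +
    facet_wrap(~ Month) +
    ggtitle("Monthly Precipitation")
}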
```
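
The reason `map.df.byMonth` builds the `Month` factor with explicit levels is worth a small illustration. By default, R sorts factor levels alphabetically, which would put the facets in the wrong order. The sketch below uses four made-up dates and plain base R. The month names depend on your locale, so no output is shown.

```r
# Four hypothetical monthly dates
dates <- as.Date(c("2017-01-01", "2017-02-01", "2017-03-01", "2017-04-01"))
labels <- format(dates, "%B %Y")

# Default factor levels are sorted as strings (alphabetical in an English locale)
alphabetical <- factor(labels)

# Explicit levels derived from the sorted dates keep calendar order
chronological <- factor(labels, levels = format(sort(unique(dates)), "%B %Y"))

print(levels(alphabetical))
print(levels(chronological))
```

`facet_wrap` lays out panels in level order, so the second form is the one that gives a January-to-December grid.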

And finally, let's render the map for this data:

```{r}
map.df.byMonth(data)
```

We see significant month-to-month variation but not much geographic variation. We can look directly at a single month (May 2017) to see what geographic variation there is:

```{r}
map.df.byMonth(subset(data, Date==as.Date("2017-05-01")))
```

This shows that there _is_ some variation across the county, but not very much.
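
One way to put numbers on "month-to-month versus geographic variation" is to compare the spread across grid cells within each month against the spread of the monthly means. The sketch below does this in base R on a toy data frame. The values are invented purely to show the calculation and are not OpenNEX output, and the names `toy`, `within.month` and `between.months` are only for illustration.

```r
# Made-up values: two grid cells for each of three months (not real data)
toy <- data.frame(
  Date = as.Date(rep(c("2017-01-01", "2017-05-01", "2017-09-01"), each = 2)),
  Latitude = rep(c(41.0, 41.1), times = 3),
  Precipitation = c(2.0, 2.1, 4.0, 4.2, 3.0, 3.1)
)
month <- format(toy$Date, "%Y-%m")

# Geographic variation: max minus min across grid cells, per month
within.month <- tapply(toy$Precipitation, month, function(p) diff(range(p)))

# Month-to-month variation: max minus min of the monthly means
monthly.mean <- tapply(toy$Precipitation, month, mean)
between.months <- diff(range(monthly.mean))

print(within.month)
print(between.months)
```

The same two `tapply` calls could be applied to the `data` frame loaded earlier, assuming it has the `Date` and `Precipitation` columns used throughout this vignette.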

This is not too surprising, given Ohio's geography. To see a different situation, look at the work I did on the Tuolumne watershed in California's Sierra Nevada mountains at [http://www.infolace.com/blog/2016/08/31/tuolumne-report/](http://www.infolace.com/blog/2016/08/31/tuolumne-report/).