Skip to content

Instantly share code, notes, and snippets.

@mbjoseph
Created September 14, 2018 19:25
smapr Ropensci blog post
---
title: 'Mapping the 2018 East Africa floods from space with smapr'
author: "Max Joseph"
date: "2018-09-14"
output:
html_document:
keep_md: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE)
```
Hundreds of thousands of people in east Africa have been displaced and hundreds have died as a result of torrential rains which ended a drought but saturated soils and engorged rivers, resulting in extreme flooding in 2018.
This post will explore these events using the R package smapr, which provides access to global satellite-derived soil moisture data collected by the NASA Soil Moisture Active-Passive (SMAP) mission and abstracts away some of the complexity associated with finding, acquiring, and working with the HDF5 files that contain the observations (shout out to [Laura DeCicco](https://github.com/ldecicco-USGS) and [Marco Sciaini](https://github.com/marcosci) for reviewing smapr, and [Noam Ross](https://github.com/noamross) for editing in the Ropensci onboarding process).
We will focus on Somalia and Kenya, two of the hardest hit countries.
We'll also lean on another Ropensci package, rnoaa, to link precipitation to soil moisture.
First, let's get spatial boundaries for the study area:
```{r map-aoi, message = FALSE}
library(raster)
library(tidyverse)
library(smapr)
library(rworldmap)
library(rnoaa)
library(plotly)
library(rasterVis)
library(animation)
library(patchwork)
library(sf)
worldmap <- getMap()
study_area <- subset(worldmap, NAME_SORT %in% c('Somalia', 'Kenya'))
study_area_sf <- as(study_area, 'sf')
plot(worldmap)
plot(study_area, add = TRUE, col = 'dodgerblue')
```
## Finding soil moisture data
Next, we can use the `find_smap` function to find global soil moisture data for any day.
The SMAP satellite was launched in 2015 with two sensors: an active microwave sensor and a passive radiometer.
The active sensor has since failed, so we will use the radiometer data, specifically version five of the 'SPL3SMP' data product (for a full list of data products see https://smap.jpl.nasa.gov/data/).
```{r find-smap}
find_smap('SPL3SMP', dates = '2018-03-01', version = 5)
```
This returns a data frame with one row per file - we can see here that there is one file available for that date.
If we wanted to search over a range of dates, we could provide a date sequence:
```{r find-smap-date-seq}
date_seq <- seq(as.Date('2018-03-01'), as.Date('2018-03-06'), by = 1)
files <- find_smap('SPL3SMP', dates = date_seq, version = 5)
files
```
## Downloading and extracting soil moisture data
The `download_smap` function takes the results from `find_smap` and downloads files locally:
```{r download-smap}
downloads <- download_smap(files, overwrite = FALSE)
downloads
```
Next, we can list the contents of the files we downloaded.
These are HDF5 files, so each file contains multiple datasets.
For this data product, a soil moisture data set is named `Soil_Moisture_Retrieval_Data_AM/soil_moisture`, and we can use the `extract_smap` function to generate rasters from this dataset (see `list_smap` for a list of all datasets contained in any file):
```{r list-smap}
sm_raster <- extract_smap(downloads, name = "Soil_Moisture_Retrieval_Data_AM/soil_moisture")
sm_raster
```
Let's plot the RasterBrick to see what the soil moisture data look like:
```{r plot-smap}
levelplot(sm_raster)
```
The striping occurs because of the orbit of the SMAP satellite.
If we were interested in some weekly measure of soil moisture, we could simply average across days to get a more continuous picture of soil moisture:
```{r get-weekly-mean}
weekly_sm <- mean(sm_raster, na.rm = TRUE)
plot(weekly_sm)
```
We'll want to do that set of operations a bunch of times for this case study, so we'll write a little function that takes a date range and returns a raster averaging over dates:
```{r summarize-sm}
average_smap <- function(date_range) {
mean_sm <- find_smap('SPL3SMP', dates = date_range, version = 5) %>%
download_smap(overwrite = FALSE) %>%
extract_smap(name = "Soil_Moisture_Retrieval_Data_AM/soil_moisture") %>%
mean(na.rm = TRUE)
mean_sm
}
```
Notice that spatial coverage is still not 100%.
Notably, places where the satellite missed and frozen regions in the arctic have `NA` values.
## Getting global precipitation data with rnoaa
We can better interpret soil moisture if we know about precipitation, and the NOAA Climate Prediction Center (CPC) has global precipitation data that are a cinch to get with the `cpc_prcp` function.
```{r get-one-day-pr}
cpc_prcp('2018-03-01')
```
Of course, we don't want to get data just for one date.
Instead, just like we used a date range to get soil moisture data, we can get precipitation data for each date within a range using the `map` function from the purrr package.
We want to map the `get_prcp` function to each date in our vector `date_seq`:
```{r map-pr-vec}
date_seq %>%
map(cpc_prcp) %>%
str
```
Amazing!
We now have a list where each element contains a data frame with the amount of precipitation over a consistent spatial grid that covers the whole globe, over a one week interval.
To merge these data frames together into one, we can use `bind_rows`, and we'll also filter out `NA` values, which are represented as negative numbers:
```{r bind-row-pr}
date_seq %>%
map(cpc_prcp) %>%
bind_rows %>%
filter(precip >= 0)
```
Now we have one data frame, and we can compute a mean over all dates for each grid cell using a `group_by`, `summarize` operation.
```{r group_by-summ-pr}
date_seq %>%
map(cpc_prcp) %>%
bind_rows %>%
filter(precip >= 0) %>%
group_by(lon, lat) %>%
summarize(precip = mean(precip, na.rm = TRUE))
```
One last little detail: the longitude values range from 0 to 360, but it's going to be easier later if they range from -180 to 180, so we'll use `mutate` to get longitude defined over (-180, 180).
```{r fix-longitude}
weekly_pr <- date_seq %>%
map(cpc_prcp) %>%
bind_rows %>%
filter(precip >= 0) %>%
group_by(lon, lat) %>%
summarize(precip = mean(precip, na.rm = TRUE)) %>%
ungroup %>%
mutate(lon = ifelse(lon > 180, lon - 360, lon))
```
We can generate a raster object by way of a SpatialGridDataFrame, which will be useful later to ensure that the soil moisture and precipitation data are on the same spatial grid:
```{r make-pr-raster, warning=FALSE}
# little helper function
make_precip_raster <- function(prcp_df) {
coordinates(prcp_df) <- ~lon+lat
gridded(prcp_df) <- TRUE
prcp_df <- as(prcp_df, "SpatialGridDataFrame") # to full grid
proj4string(prcp_df) <- '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs'
raster(prcp_df)
}
# create a precip raster
pr_raster <- make_precip_raster(weekly_pr)
# plot it
plot(pr_raster)
```
Just like we did for the soil moisture data, we'll bundle up all of these steps into a helper function:
```{r pr-helper}
average_precip <- function(date_range) {
date_seq %>%
map(cpc_prcp) %>%
bind_rows %>%
filter(precip >= 0) %>%
group_by(lon, lat) %>%
summarize(precip = mean(precip, na.rm = TRUE)) %>%
ungroup %>%
mutate(lon = ifelse(lon > 180, lon - 360, lon)) %>%
make_precip_raster
}
```
## Fetching global soil moisture and precipitation from 2015 to present
Now we have two functions `average_smap` and `average_precip` that we can use to get global soil moisture and precipitation data for any date range.
Next, we'll use these functions to get data at weekly intervals from the beginning of the SMAP data archive in 2015 through the end of August 2018.
```{r get-all-the-data}
start_dates <- seq(as.Date("2015-04-01"), as.Date("2018-09-01"), by = 7)
end_dates <- start_dates + 6
weekly_smap <- vector(mode = 'list', length = length(start_dates))
weekly_precip <- vector(mode = 'list', length = length(start_dates))
for (i in seq_along(start_dates)) {
date_seq <- seq(start_dates[i], end_dates[i], by = 1)
geotiff_name <- paste0('sm-', i, '.tif')
if (!file.exists(geotiff_name)) {
average_smap(date_seq) %>%
writeRaster(geotiff_name)
}
weekly_smap[[i]] <- raster(geotiff_name)
weekly_precip[[i]] <- average_precip(date_seq)
}
```
Now we have two lists, `weekly_smap` and `weekly_precip`, where each element is a raster.
We'd like to convert these to RasterStack objects, and get them on the same spatial grid, in the same projection as our study area polygon.
```{r grid-fernaggling}
weekly_smap <- stack(weekly_smap)
names(weekly_smap) <- start_dates
weekly_smap <- projectRaster(weekly_smap, crs = projection(study_area))
weekly_precip <- stack(weekly_precip)
names(weekly_precip) <- start_dates
weekly_precip <- resample(weekly_precip, weekly_smap)
```
Now that we have RasterStack objects, we will create tidy data frames that we can use in ggplot2.
Because the code to do this is the same for both objects, I'll write a little helper function.
```{r make-sm-pr-dfs}
make_study_area_df <- function(raster) {
raster %>%
mask(study_area) %>%
trim %>%
as('SpatialPixelsDataFrame') %>%
as.data.frame() %>%
gather(date, value, -x, -y) %>%
as_tibble %>%
mutate(date = start_dates[as.numeric(as.factor(date))])
}
soil_moisture <- make_study_area_df(weekly_smap) %>%
rename(sm = value)
precip <- make_study_area_df(weekly_precip) %>%
rename(pr = value)
```
Let's check these out!
```{r print-soil_moisture}
soil_moisture
```
So we have a data frame where each row is a pixel with a date corresponding to the first day of the week, and then the mean soil moisture for that week in the `sm` column.
What do these data look like?
```{r sm-timeseries, fig.width = 8, fig.height = 4.5}
my_theme <- theme_minimal() +
theme(panel.grid.minor = element_blank())
soil_moisture %>%
ggplot(aes(date, sm)) +
geom_point(alpha = .02) +
my_theme +
xlab('') +
ylab('Mean soil moisture (m^3 water per m^3 soil)')
```
What about the precipitation data?
```{r pr-timeseries, fig.width = 8, fig.height = 4.5}
precip %>%
ggplot(aes(date, pr)) +
geom_point(alpha = .02) +
my_theme +
xlab('') +
ylab('Mean precipitation (mm)')
```
What does the relationship between precipitation and soil moisture look like?
```{r pr-vs-sm}
soil_moisture %>%
left_join(precip) %>%
ggplot(aes(pr, sm)) +
geom_point(alpha = .02) +
my_theme +
ylab('Mean soil moisture (m^3 water per m^3 soil)') +
xlab('Mean precipitation (mm)') +
scale_y_log10() +
scale_x_log10()
```
Notice the band of zero-precipitation points that sit on the y-axis.
Among nonzero precipitation values, it seems like there is a nonlinear relationship between precipitation and soil moisture.
So, now let's take a look at the spring and summer of 2018, when the flooding was worst in Somalia and Kenya.
```{r soil-moisture-raster-ts, fig.width = 8, fig.height = 8}
soil_moisture %>%
filter(date > as.Date('2018-01-01')) %>%
ggplot(aes(x=x, y=y, fill=sm)) +
geom_raster() +
scale_fill_viridis_c(direction = -1,
'Soil moisture') +
facet_wrap(~date, nrow = 7) +
theme_minimal() +
theme(axis.text = element_blank()) +
geom_sf(data = study_area_sf, inherit.aes = FALSE, fill = NA, size = .2) +
xlab('') +
ylab('')
```
```{r precip-raster-ts, fig.width = 8, fig.height = 8}
precip %>%
filter(date > as.Date('2018-01-01')) %>%
ggplot(aes(x=x, y=y, fill=pr)) +
geom_raster() +
scale_fill_gradient(low = 'grey95',
high = 'dodgerblue',
'Precipitation') +
facet_wrap(~date, nrow = 7) +
theme_minimal() +
theme(axis.text = element_blank()) +
geom_sf(data = study_area_sf, inherit.aes = FALSE, fill = NA, size = .2) +
xlab('') +
ylab('')
```
To get a sense for how these values compare to the overall distribution of values in space, we can plot the same data, but rather than using the soil moisture and precipitation values to color the map, we can color the map using the empirical cumulative distribution function (CDF).
This will give us values between 0 and 1, which tell us the fraction of values below a particular value.
So for instance if the empirical CDF at a particular value gives us 0.9, then 90% of the observations were less than that value.
One quick thing to notice is that the distribution of values is different for every pixel:
```{r soil-cdfs}
sm_ecdf <- soil_moisture %>%
group_by(x, y) %>%
mutate(ecdf = ecdf(sm)(sm)) %>%
ungroup
sm_ecdf %>%
ggplot(aes(sm, ecdf, group = interaction(x, y))) +
geom_line(alpha = .1) +
my_theme +
xlab('Soil moisture (m^3 water per m^3 soil)') +
ylab('Empirical cumulative distribution function')
```
```{r precip-cdfs}
pr_ecdf <- precip %>%
group_by(x, y) %>%
mutate(ecdf = ecdf(pr)(pr)) %>%
ungroup
pr_ecdf %>%
ggplot(aes(pr, ecdf, group = interaction(x, y))) +
geom_line(alpha = .1) +
my_theme +
xlab('Precipitation (mm)') +
ylab('Empirical cumulative distribution function')
```
A few things to notice about these empirical cumulative distribution functions:
- The distributions of soil moisture and precipitation are quite different (e.g., the distribution of precipitation has a lot of zero values, and a long tail).
- The empirical CDFs provide a mapping from the range of data to the interval from zero to one, so that we can visualize how any particular amount of soil moisture or precipitation compares to the full distribution of values for each pixel. Essentially, when the empirical CDF gives a value close to 0, that's a low value (a small fraction of values are less than or equal to it). If the empirical CDF gives a value close to one, that's a high value (a large fraction of values are less than or equal to it).
Let's see what the empirical CDF values look like on a map:
```{r soil-moisture-raster-cdf, fig.width = 8, fig.height = 8}
sm_ecdf %>%
filter(date > as.Date('2018-01-01')) %>%
ggplot(aes(x=x, y=y, fill=ecdf)) +
geom_raster() +
scale_fill_viridis_c('Soil moisture\nempirical CDF',
option = 'B', direction = -1) +
facet_wrap(~date, nrow = 7) +
theme_minimal() +
theme(axis.text = element_blank()) +
geom_sf(data = study_area_sf, inherit.aes = FALSE, fill = NA, size = .2) +
xlab('') +
ylab('')
```
```{r precip-raster-cdf, fig.width = 8, fig.height = 8}
precip %>%
group_by(x, y) %>%
mutate(ecdf = ecdf(pr)(pr)) %>%
filter(date > as.Date('2018-01-01')) %>%
ggplot(aes(x=x, y=y, fill=ecdf)) +
geom_raster() +
scale_fill_viridis_c('Precipitation\nempirical CDF',
option = 'B',
direction = -1) +
facet_wrap(~date, nrow = 7) +
theme_minimal() +
theme(axis.text = element_blank()) +
geom_sf(data = study_area_sf, inherit.aes = FALSE, fill = NA, size = .2) +
xlab('') +
ylab('')
```
Across much of Somalia and Kenya, soils were exceptionally dry prior to March 2018 (light colors), and with the prolonged levels of high rain beginning in March, soils were much more saturated than usual through mid to late May (dark colors).
The rain throughout April fell on wet soils, which are less able to absorb water, leading to overland flow and flooding.
We can put this all together in a visualization that shows the evolution of soil moisture and precipitation:
```{r make-gif, echo = FALSE, results='hide', message=FALSE}
sm_summary <- soil_moisture %>%
group_by(date) %>%
na.omit %>%
summarize(med = median(sm),
lo = quantile(sm, .1),
hi = quantile(sm, .9),
q25 = quantile(sm, .25),
q75 = quantile(sm, .75)) %>%
ungroup
pr_summary <- precip %>%
group_by(date) %>%
na.omit %>%
summarize(med = median(pr),
lo = quantile(pr, .1),
hi = quantile(pr, .9),
q25 = quantile(pr, .25),
q75 = quantile(pr, .75)) %>%
ungroup
saveGIF({
for (i in seq_along(start_dates)) {
sm_plot <- soil_moisture %>%
filter(date == start_dates[i]) %>%
ggplot(aes(x=x, y=y, fill=sm)) +
geom_raster() +
scale_fill_viridis_c(direction = -1,
'Soil moisture',
limits = c(min(soil_moisture$sm, na.rm = TRUE),
max(soil_moisture$sm, na.rm = TRUE))) +
theme_minimal() +
theme(axis.text = element_blank(),
legend.position = 'left') +
geom_sf(data = study_area_sf, inherit.aes = FALSE, fill = NA, size = .2) +
xlab('') +
ylab('') +
ggtitle(start_dates[i])
pr_plot <- precip %>%
filter(date == start_dates[i]) %>%
ggplot(aes(x=x, y=y, fill=pr)) +
geom_raster() +
scale_fill_gradient(low = 'grey95',
high = 'dodgerblue',
'Precipitation',
limits = c(min(precip$pr), max(precip$pr))) +
theme_minimal() +
theme(axis.text = element_blank()) +
geom_sf(data = study_area_sf, inherit.aes = FALSE, fill = NA, size = .2) +
xlab('') +
ylab('')
sm_ts <- sm_summary %>%
mutate(past = date < start_dates[i]) %>%
ggplot(aes(date, med, group = past, alpha = as.numeric(past))) +
geom_ribbon(aes(ymin = lo, ymax = hi), fill = 'grey90') +
geom_ribbon(aes(ymin = q25, ymax = q75), fill = 'grey80') +
geom_line() +
my_theme +
xlab('') +
ylab('Soil moisture') +
scale_alpha(range = c(.1, 1)) +
theme(legend.position = 'none')
pr_ts <- pr_summary %>%
mutate(past = date < start_dates[i]) %>%
ggplot(aes(date, med, group = past, alpha = as.numeric(past))) +
geom_ribbon(aes(ymin = lo, ymax = hi), fill = 'grey90') +
geom_ribbon(aes(ymin = q25, ymax = q75), fill = 'grey80') +
geom_line() +
my_theme +
xlab('') +
ylab('Precipitation') +
scale_alpha(range = c(.1, 1)) +
theme(legend.position = 'none')
print((sm_plot + pr_plot) / sm_ts / pr_ts + plot_layout(ncol = 1, heights = c(2, 1, 1)))
}
},
movie.name = 'joint-timeseries.gif',
ani.width = 800,
ani.height = 700,
interval = .3)
```
![](joint-timeseries.gif)
Here the soil moisture and precipitation values are mapped over time, with line plots below to show the median value (black line), interquartile range (dark grey ribbon), 10% and 90% quantiles (light grey ribbon).
The exceptional nature of the flooding in 2018 shows up as a high and broad peak in soil moisture and precipitation, and the drought that preceded the flooding in 2017 is visible as a period of low rains and dry soil.
### Want to contribute?
If you're psyched on global soil moisture and want to contribute to smapr, there is room to develop support for more level 2 products (currently smapr supports the more processed level 3 science grade and level 4 enhanced value products primarily).
For a starting point, check out: https://github.com/ropensci/smapr/issues/35
### Acknowledgements
The smapr package was developed in Earth Lab with help from [Matt Oakley](https://github.com/matt-oak) who worked with the Earth Lab Analytics Hub.
The idea for the package emerged from a NOAA Data Partnership event in 2016, where we began working with the NASA SMAP data and realized that we had all the makings for an R package.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment