---
title: "The Costliest and Most Harmful Extreme Weather Events of the Past 65 Years"
author: "Adam J Heller (aj@drfloob.com; [drfloob.com](http://drfloob.com))"
date: "May 18th, 2016"
output:
  html_document:
    toc: true
    toc_float: true
    theme: readable
---

Synopsis

To support preparation for severe weather events, this report explores the NOAA Storm Database to determine which types of severe weather events cause the most harm to people and which have the greatest economic consequences. The database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. To work with the data, we must first normalize damage costs by translating symbolic multipliers into numbers (for example, "H" becomes 100). The data are then grouped by event type, and each group is summarized to determine which events are the most harmful and the most costly. We can then answer the questions:

  1. Across the United States, which types of events are most harmful with respect to population health? Tornadoes, by far.

  2. Across the United States, which types of events have the greatest economic consequences? Floods, hurricanes/typhoons, tornadoes, storm surges, and hail.

Setup

We first set up the environment: load the R packages used below and instruct knitr to cache chunk results, which speeds up repeated knitting during development.

library(dplyr)
library(data.table)
library(knitr)
library(ggplot2)
library(tidyr)

opts_chunk$set(cache = TRUE)

Data Processing

The data must first be downloaded (if not already present), decompressed, and parsed from CSV. Along the way, intermediate variables are deleted to reduce the script's memory use. On a Chromebook with 1GB of RAM, a 500MB in-memory dataset is otherwise pretty tough to work with.

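sdf <- "StormData.bz2"
# Download the compressed CSV only if no local copy exists yet.
if (!file.exists(sdf)) {
    download.file(url="https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  destfile = sdf,
                  method = "libcurl")
}
sd <- read.csv(bzfile(sdf), stringsAsFactors = FALSE)
# setDT() converts the data frame to a data.table in place, so the full
# dataset is not copied a second time (data.table(sd) would make a copy).
sdt <- setDT(sd)
rm("sd")
invisible(gc())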
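If memory is still the bottleneck, one further option (a sketch, not what this report does) is to have read.csv() skip every column the analysis never uses, by passing "NULL" for those columns in colClasses. To keep the example self-contained it runs on a tiny, made-up CSV string; the STATE and REMARKS columns and all values are hypothetical, and only the seven kept column names come from this report.

```r
# Made-up miniature CSV standing in for StormData (hypothetical values).
csv_text <- paste(
  "STATE,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP,REMARKS",
  "AL,TORNADO,0,15,25,K,0,,long narrative text",
  "TX,HAIL,0,0,1.5,M,2,k,more narrative",
  sep = "\n"
)

# Columns the analysis actually needs.
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Read only the header row to learn the column names.
header <- names(read.csv(text = csv_text, nrows = 1))

# NA lets read.csv() guess the type; "NULL" tells it to skip the column.
classes <- ifelse(header %in% keep, NA, "NULL")

sd_small <- read.csv(text = csv_text, colClasses = classes,
                     stringsAsFactors = FALSE)
str(sd_small)
```

On the real file, passing `bzfile(sdf)` as the first argument instead of `text = csv_text` in both read.csv() calls should apply the same idea; how much memory it saves depends on how many columns are dropped.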

Next, the cost multiplier codes are normalized to upper case, so that, for example, "k" and "K" are treated as the same multiplier. (Events are grouped by their reported weather event type later, in the Results section.)

sdt <- sdt %>% mutate(PROPDMGEXP = toupper(PROPDMGEXP), 
                      CROPDMGEXP = toupper(CROPDMGEXP))

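To compare damage costs directly, the symbolic cost multipliers such as "H" are translated into numeric values such as 100, then multiplied by the reported damage amounts to produce dollar costs for each weather event. Any code other than H, K, M, or B (including a blank) is treated as a multiplier of 1.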

# Numeric value of each symbolic multiplier; any other code maps to 1.
convMult <- list("H"=10^2, "K"=10^3, "M"=10^6, "B"=10^9)
conv <- function(exp) {
    if (is.null(convMult[[exp]])) 1 else convMult[[exp]]
}
sdt <- sdt %>% mutate(
    PROPMULT = sapply(PROPDMGEXP, conv, simplify=TRUE),
    CROPMULT = sapply(CROPDMGEXP, conv, simplify=TRUE),
    PROPDMG_N = PROPDMG * PROPMULT,
    CROPDMG_N = CROPDMG * CROPMULT
)
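Because sapply() calls conv() once per row, this step can be slow on a dataset of this size. A vectorized alternative is sketched below; `mult_table` and `conv_vec` are hypothetical names. It looks codes up in a named numeric vector, where unknown or blank codes come back as NA and are then mapped to 1, which matches conv() applied after the upper-casing step.

```r
# Same multiplier values as convMult, stored as a named numeric vector.
mult_table <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Vectorized lookup: codes are upper-cased here, and any code not in the
# table ("" or anything else) yields NA, which is replaced by 1.
conv_vec <- function(exp) {
  m <- unname(mult_table[toupper(exp)])
  ifelse(is.na(m), 1, m)
}

# Hypothetical example values.
exps <- c("K", "m", "", "B", "?")
dmg  <- c(25, 1.5, 3, 2, 10)
dmg_n <- dmg * conv_vec(exps)
print(dmg_n)
```

In the mutate() call above, `PROPMULT = conv_vec(PROPDMGEXP)` should work as a drop-in replacement for the sapply() line, and likewise for CROPMULT.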

Results

The Most Harmful Weather

The database contains two measures of population health consequences: fatalities and injuries. It's difficult to say which of these carries the higher health cost: treating the injured has an ongoing financial cost that the deceased do not, but the value of a life is impossible to measure. Lacking a better summary statistic, we approximate the health cost of a severe weather event by summing its fatalities and injuries.
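In symbols (notation introduced here only for illustration), the health cost of an event type $e$ is the sum, over the recorded events $i$ of that type, of fatalities $F_i$ and injuries $I_i$:

$$H_e = \sum_{i \in e} \left( F_i + I_i \right) = \sum_{i \in e} F_i + \sum_{i \in e} I_i$$

The second form is why a stacked bar made of one fatalities segment and one injuries segment has a total height of exactly $H_e$.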

The data are grouped by event type, summary statistics are calculated for each group, and the groups are ordered by descending population health cost. Each event type's fatality and injury totals are also gathered into separate rows (one per type of harm) to support the stacked bar chart below.

hlth <- sdt %>% 
    group_by(EVTYPE) %>%
    summarise(totalHealthEffect = sum(FATALITIES + INJURIES), 
              FATALITIES=sum(FATALITIES), 
              INJURIES=sum(INJURIES)) %>%
    # Long format: one row per event type and harm type.
    gather(TYPE, VALUE, FATALITIES, INJURIES) %>%
    arrange(desc(totalHealthEffect))
# Each event type now appears twice, so the factor levels must be
# de-duplicated (R rejects duplicated factor levels).
ggplot(data=head(hlth, 10), 
       aes(x=factor(EVTYPE, levels=unique(EVTYPE)), 
           y=VALUE, 
           fill=TYPE)) + 
    geom_bar(stat="identity") +
    labs(x="Extreme Weather Event", 
         y="Total Combined Fatalities & Injuries",
         title="Stacked Barchart of Population Health Cost by Event Type")
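The same group, summarise, reshape, and sort sequence can be sketched in base R on a toy data set (the values below are made up, not NOAA data). The sketch also shows a property of the long format: each event type occupies two rows, so a factor built from that column needs `unique()` levels, because R rejects duplicated factor levels with an error.

```r
# Toy stand-in for the storm data (made-up values).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 3, 1, 0, 1),
  INJURIES   = c(10, 4, 5, 1, 0),
  stringsAsFactors = FALSE
)

# Per-event-type sums (like group_by + summarise), sorted by total harm.
sums <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
sums$total <- sums$FATALITIES + sums$INJURIES
sums <- sums[order(-sums$total), ]

# Long format (like gather): one row per event type and harm type.
long <- data.frame(
  EVTYPE = rep(sums$EVTYPE, times = 2),
  TYPE   = rep(c("FATALITIES", "INJURIES"), each = nrow(sums)),
  VALUE  = c(sums$FATALITIES, sums$INJURIES),
  total  = rep(sums$total, times = 2),
  stringsAsFactors = FALSE
)
long <- long[order(-long$total), ]

# Each event type appears twice, so the levels are de-duplicated.
long$EVTYPE <- factor(long$EVTYPE, levels = unique(long$EVTYPE))
print(long)
```

Because every event type contributes two rows after the reshape, `head(hlth, 10)` in the plotting code covers the five most harmful event types, not ten.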

It's clear that tornadoes cause by far the most harm to people: `r sprintf("%0.1f", hlth[1,]$totalHealthEffect / hlth[3,]$totalHealthEffect)` times more than the next most harmful weather event, excessive heat. (Row 3 of hlth is used because each event type occupies two rows after gathering.) Let's go one step further and compare tornado-induced population harm against all other weather-induced harm combined.

cs <- select(hlth, EVTYPE, totalHealthEffect) %>% distinct() %>% slice(-1)
print(totalNonTornadoHarm <- sum(cs$totalHealthEffect))
print(tornadoHarm <- hlth[1,]$totalHealthEffect)

Tornado-related harm exceeds the harm from all other event types combined, so eliminating it would remove nearly 2/3 of the total harm caused to the U.S. population by all extreme weather events. Preparing for tornado catastrophes alone could significantly improve the health of the population.
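The "nearly 2/3" figure is the tornado share of all harm, that is, tornadoHarm / (tornadoHarm + totalNonTornadoHarm) in terms of the two values printed above. A small helper makes that arithmetic explicit; `harm_share` is a hypothetical name, and the totals below are made up for illustration, not taken from the NOAA data.

```r
# Fraction of the combined total attributable to one event type.
# `totals` is a named numeric vector of per-event-type totals.
harm_share <- function(totals, event) {
  unname(totals[event] / sum(totals))
}

# Made-up totals for illustration only.
toy_totals <- c(TORNADO = 60, HEAT = 25, FLOOD = 15)
harm_share(toy_totals, "TORNADO")
```

On the made-up totals this gives 0.6; a share above 0.5 means the single event type outweighs all others combined.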

The Costliest Weather

The database separates weather-induced costs into property damage and crop damage. To establish a total cost per event, we'll simply add the two costs together. However, before acting on these results, it may be worth investigating whether crop damage has cascading economic consequences: losses to farmers and farm workers, truckers, grocery sellers, and so on.
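Combining the multiplier conversion with this sum, the total cost of an event type $e$ can be written as follows (notation introduced here only for illustration). $P_i$ and $R_i$ are PROPDMG and CROPDMG for event $i$, and $p_i$ and $r_i$ are the upper-cased PROPDMGEXP and CROPDMGEXP codes:

$$C_e = \sum_{i \in e} \left( P_i \, m(p_i) + R_i \, m(r_i) \right), \qquad m(x) = \begin{cases} 10^2 & x = \text{H} \\ 10^3 & x = \text{K} \\ 10^6 & x = \text{M} \\ 10^9 & x = \text{B} \\ 1 & \text{otherwise} \end{cases}$$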

Much as before, the data are grouped by event type, summary statistics are calculated for each group, individual values are gathered to support the stacked bar chart below, and the results are arranged by descending total cost. Costs are plotted in billions of dollars to make the chart more readable.

cst <- sdt %>% group_by(EVTYPE) %>% 
    summarise(totalCost=sum(PROPDMG_N + CROPDMG_N),
              PROPERTY_DAMAGE = sum(PROPDMG_N),
              CROP_DAMAGE = sum(CROPDMG_N)) %>% 
    # Long format: one row per event type and damage type.
    gather(TYPE, COST, PROPERTY_DAMAGE, CROP_DAMAGE) %>%
    arrange(desc(totalCost))
# As above, each event type appears twice, so the factor levels are de-duplicated.
ggplot(data=head(cst, 10), 
       aes(x=factor(EVTYPE, levels=unique(EVTYPE)), 
           y=COST / 1e9,
           fill=TYPE)) + 
    geom_bar(stat="identity") + 
    labs(x="Extreme Weather Event", 
         y="Total Combined Property and Crop Damage Cost ($ Billions)",
         title="Stacked Barchart of Total Damage Costs by Event Type")

The costs of extreme weather are much less dominated by a single event type than the population health consequences are. Floods cause about twice as much damage as hurricanes and typhoons, at approximately $150 billion and $70 billion, respectively.
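One way to make "less dominated by a single event type" measurable is to compare how much of the total the top one, two, or three event types account for under each measure. The helper below is a hypothetical sketch, demonstrated on made-up totals rather than the NOAA figures.

```r
# Cumulative share of the overall total covered by the k largest groups,
# for k = 1, 2, ...; names are kept so each step shows which group it adds.
cumulative_share <- function(totals) {
  sorted <- sort(totals, decreasing = TRUE)
  cumsum(sorted) / sum(sorted)
}

# Made-up totals for illustration only.
toy_costs <- c(C = 15, A = 50, D = 10, B = 25)
cumulative_share(toy_costs)
```

Applied to the distinct per-event totals in cst (totalCost) and hlth (totalHealthEffect), a smaller first entry for cost than for health would express the contrast described above.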
