## rwanda_genocide.Rmd
---
title: Understanding the Rwandan genocide of 1994 through data from the Uppsala Conflict
  Data Program
author: "Johan Dahlberg"
date: "October 27, 2015"
output: html_document
licence: http://creativecommons.org/licenses/by-sa/3.0/
---

```{r, echo=FALSE, message=FALSE}
library(ggplot2)
library(dplyr)

load("ged20-rdata/ged20.Rdata")

# Rename the dataset to data since I'm lazy.
data <- data.frame(ged20.rg)
rm(ged20.rg)

```

In this post I attempt to understand one of the most horrific events in recent history through the use of data: the [Rwandan genocide](https://en.wikipedia.org/wiki/Rwandan_Genocide) of 1994. The [Uppsala Conflict Data Program](http://www.pcr.uu.se/research/ucdp/program_overview/) provides data on armed conflicts around the world. For this post I've specifically used the [UCDP GED 2.0 dataset](http://ucdp.uu.se/ged/data.php). It contains 89033 data points, each detailing an event of organized violence in Africa or Asia between 1989 and 2014. It has data on the actors in the conflict, the high, low, and best estimates of the number of deaths, the location of events, and much more. It's important to note that the UCDP is, as far as I understand, quite conservative in what it includes in the dataset, so it will not necessarily include all known events of organized violence.

First I need to add the caveat that I have no expertise whatsoever in conflict studies. I simply found the dataset and thought it would be interesting to explore it a bit. What struck me when I started to poke around, and what inspired me to write this blog post, was the extreme extent of the Rwandan genocide. While I knew about it, the exact scale had not really become clear to me until I looked at this data.

I began my analysis by looking at the distribution of people killed per event. This is illustrated in the histogram below. Note the log10 scale of the x-axis. This tells us that the vast majority of events have relatively few casualties. However, it also shows that there is a wide range, with a single event in fact having 316744 casualties. The dataset dryly describes this event as "Government of Rwanda - Civilians".

```{r, echo=FALSE, message=FALSE}

ggplot(data = data, aes(x = best_est)) +
  geom_histogram() +
  scale_x_log10() +
  labs(title = "Histogram of number of deaths", x = "Number of deaths (log10 scale)")

killed.in.1994 <-
  data %>%
    select(country, year, best_est) %>%
    filter(country == "Rwanda", year == 1994) %>%
    summarise(total = sum(best_est))

```
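
The single largest event mentioned above can be located by sorting the events by their best estimate. The following is only a minimal sketch: `toy` is a hypothetical stand-in with made-up rows (not real figures) that borrows the `country`, `year` and `best_est` column names used in the analysis, and `largest.event` and `share.of.total` are names I introduce for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the GED data: made-up rows, not real figures.
toy <- data.frame(
  country  = c("A", "A", "B", "B", "C"),
  year     = c(1990, 1994, 1994, 2001, 2010),
  best_est = c(3, 1200, 25, 7, 1)
)

# Sort events by the best estimate of deaths, largest first, and keep the top row.
largest.event <- toy %>%
  arrange(desc(best_est)) %>%
  head(1)

# Share of all recorded deaths accounted for by that single event.
share.of.total <- largest.event$best_est / sum(toy$best_est)
```

Applied to the real dataset, the same pattern would use `data` in place of `toy`. Looking at the share of the total, rather than only the raw count, is one way to express how much a single event dominates the distribution.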

Moving on from there, I looked at the number of deaths per year. While there is a downward trend (which is, however, not statistically significant), the plot once again highlights the events in Rwanda in 1994 as an extreme outlier. According to this data, `r as.integer(killed.in.1994$total)` people were killed in Rwanda in 1994, which is consistent with the estimates given on Wikipedia of 500000 to 1000000 deaths.

```{r, echo=FALSE}

by.year <- group_by(data, year)
summary.per.year <- summarise(by.year,
                              deaths = sum(best_est))

ggplot(data = summary.per.year, aes(x = year, y = deaths)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE, color = "black") +
  labs(title = "Deaths per year", x = "Year", y = "Number of deaths")


```
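
The text notes that the downward trend is not statistically significant. One way to check a claim like that is to fit the same linear model that `geom_smooth(method = "lm")` draws and read off the p-value of the slope. The sketch below assumes made-up yearly totals with one extreme outlier year; `toy.per.year` and the other names are hypothetical, and the numbers are not taken from the UCDP data.

```r
# Hypothetical yearly totals: made-up numbers with one extreme outlier year.
toy.per.year <- data.frame(
  year   = 2000:2009,
  deaths = c(120, 95, 110, 5000, 90, 85, 100, 70, 80, 60)
)

# Ordinary least squares fit of deaths on year, the same kind of model
# that geom_smooth(method = "lm") draws.
fit <- lm(deaths ~ year, data = toy.per.year)

# The coefficient table holds the slope estimate and its p-value.
coefs   <- summary(fit)$coefficients
slope   <- coefs["year", "Estimate"]
p.value <- coefs["year", "Pr(>|t|)"]

# A p-value of 0.05 or more means the trend is not significant at the 5% level.
is.significant <- p.value < 0.05
```

With the real data, `summary.per.year` would take the place of `toy.per.year`. A single extreme year inflates the residual variance, which is one reason a visible slope can still fail to reach significance.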

While this post adds nothing new to the understanding of this subject, I hope it shows that even simple data analysis can be a very powerful tool for understanding something. For me personally, the figures provided here tell a nauseating story of extreme human tragedy. I wish I could say that I was sure that this was the last time the world looked on while something like this happened. Sadly I'm not so sure.

An R Markdown version of this post, including the code used in the analysis, is available here: https://gist.github.com/johandahlberg/41ad32bc02279bc06b6d

Finally, thanks to Henrik Persson and Sara Engström for fact-checking and proof-reading.