@geoffwoollard
Last active December 23, 2015 17:09
# Homework 3
Install dependencies
```{r}
#install.packages("plyr", dependencies = TRUE)
library(plyr)
#install.packages("xtable", dependencies = TRUE)
library(xtable)
```
Load the [data](http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt)
```{r}
gdURL <- "http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt"
gDat <- read.table(gdURL, header = TRUE, sep = '\t', quote = "\"")
```
Check that the data is clean and ready to roll
```{r}
str(gDat)
tail(gDat)
```
## Rich vs. Poor
Let's look quantitatively at the wealth of nations.
We simply split the data set by continent and compute the maximum and minimum GDP per capita, along with their ratio.
```{r results='asis'}
gdpByContinent <- ddply(gDat, ~continent, summarize,
minGdpPercap=min(gdpPercap),
maxGdpPercap=max(gdpPercap),
richVsPoor = max(gdpPercap) / min(gdpPercap)
) # values are left unrounded here; xtable() rounds them for display below
gdpByContinent <- arrange(gdpByContinent,richVsPoor)
gdpByContinent <- xtable(gdpByContinent, digits=0) # digits=0 rounds the displayed values to whole numbers
print(gdpByContinent, type = "html", include.rownames = FALSE)
```
Here I take "wealth distribution" to be the fold difference between the max and min gdpPercap within a continent, taken over all countries and years <br>
Yes, it is true, your eyes don't deceive you: the "richest" country is *that much richer* than the "poorest" country <br>
Asia has the largest "wealth distribution", that is, the widest gap between rich and poor
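
As a minimal sketch of the fold-difference calculation, here is the same idea in base R on a made-up data frame. The name `toy`, the continents `X` and `Y`, and all the numbers are invented for illustration and are not Gapminder values.

```r
# Hypothetical toy data; the values are invented, not taken from Gapminder
toy <- data.frame(
  continent = c("X", "X", "X", "Y", "Y", "Y"),
  gdpPercap = c(500, 2000, 10000, 1000, 3000, 4000)
)

# Fold difference between the richest and poorest observation per continent
richVsPoor <- sapply(split(toy$gdpPercap, toy$continent),
                     function(g) max(g) / min(g))
richVsPoor
```

Because the max and min are taken over every row in a continent, the two extremes can come from different countries and from different years.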
## Life Expectancy Spread
Let's look at the spread of life expectancy within each continent.
There are several ways to measure spread:
* standard deviation
* [median absolute deviation](http://en.wikipedia.org/wiki/Median_absolute_deviation)
* [interquartile range aka middle fifty](http://en.wikipedia.org/wiki/Interquartile_range)
```{r}
roundDec <- 1
lifeExpByCont <- ddply(gDat, ~continent, summarize,
sdLifeExp = round(sd(lifeExp),roundDec),
madLifeExp = round(mad(lifeExp),roundDec),
IQRLifeExp = round(IQR(lifeExp),roundDec)
)
arrange(lifeExpByCont,sdLifeExp)
arrange(lifeExpByCont,madLifeExp)
arrange(lifeExpByCont,IQRLifeExp)
```
As you can see, the results depend on the metric used <br>
Asia always has the highest spread, but the lowest spread is sometimes Europe and sometimes Oceania <br>
Take-home lesson: *always mention* what you mean by "spread" <br>
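
To see why the ranking can flip between metrics, here is a small sketch on two invented vectors (`a` and `b` are hypothetical, not Gapminder data). It only illustrates the general behaviour of `sd()`, `mad()` and `IQR()` and does not explain the specific continent ordering above.

```r
# Hypothetical toy vectors, invented for illustration
a <- c(1, 2, 3, 4, 100)    # tight cluster plus one extreme value
b <- c(0, 10, 20, 30, 40)  # evenly spread, no extreme values

spread <- data.frame(
  group = c("a", "b"),
  sd    = c(sd(a),  sd(b)),
  mad   = c(mad(a), mad(b)),
  IQR   = c(IQR(a), IQR(b))
)
spread
```

By standard deviation `a` is the more spread-out group, but by MAD and IQR it is `b`, because those two metrics largely ignore the single extreme value.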
## Are people living longer and longer?
We compute the average life expectancy for each year over the whole data set <br>
Because the mean is sensitive to outliers, we use a trimmed mean that drops the highest 5% and the lowest 5% of values in each year
```{r results='asis'}
trimFrac <- 0.05 # this is about 7 maxs and 7 mins lopped off
lifeExpByYear <- ddply(gDat,~year,summarize,
avLifeExp = mean(lifeExp, trim=trimFrac)
)
lifeExpByYear <- xtable(lifeExpByYear, digits=1) # spell out `digits` rather than relying on partial argument matching
print(lifeExpByYear, type = "html", include.rownames = FALSE)
```
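
A minimal sketch of what the `trim` argument of `mean()` does, on an invented vector: a fraction `trim` of the sorted values is dropped from each end before averaging. The vector `x` and the names `k`, `byHand` and `builtIn` are hypothetical. The last line repeats the arithmetic for a 5% trim, assuming 142 countries per year.

```r
# Hypothetical toy vector: 1..18 plus one wild value at each end
x <- c(-1000, 1:18, 1000)
n <- length(x)   # 20 values
trim <- 0.1

k <- floor(n * trim)                       # values dropped from each end
byHand  <- mean(sort(x)[(k + 1):(n - k)])  # average what is left
builtIn <- mean(x, trim = trim)

c(dropped = k, byHand = byHand, builtIn = builtIn)

# Same arithmetic for a 5% trim, assuming 142 values per year
floor(142 * 0.05)
```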
## Middle Age
Imagine not making it to "middle age" (taken to be 40 years) <br>
How many countries in each continent have a life expectancy of less than 40? <br>
We feed in a subset of the data with our middle-age cutoff applied right at the start <br>
Let's keep the table ordered by continent and year
```{r results='asis'}
middleAge <- 40
middleAgeCount <- ddply(subset(gDat,subset = lifeExp < middleAge),
~continent + year,summarize,
countryCount=length(unique(country))
)
middleAgeCount <- xtable(middleAgeCount)
print(middleAgeCount, type = "html", include.rownames = FALSE)
```
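
The filter-then-count pattern can be sketched in base R on invented data. The data frame `toy`, its countries and continents, and the name `cutoff` are hypothetical and only stand in for the real columns.

```r
# Hypothetical toy data; countries, continents and values are invented
toy <- data.frame(
  country   = c("A", "B", "C", "A", "B", "C"),
  continent = c("X", "X", "Y", "X", "X", "Y"),
  year      = c(1952, 1952, 1952, 2007, 2007, 2007),
  lifeExp   = c(35, 38, 45, 42, 39, 60),
  stringsAsFactors = FALSE
)

cutoff <- 40
young <- subset(toy, lifeExp < cutoff)  # filter first, then count

# Count distinct countries per continent and year among the filtered rows
counts <- aggregate(
  young$country,
  by  = list(continent = young$continent, year = young$year),
  FUN = function(x) length(unique(x))
)
names(counts)[3] <- "countryCount"
counts
```

Continent `Y` never appears because none of its rows survive the filter. In the same way, a continent and year with no country below the cutoff is simply absent from the table rather than shown with a count of zero.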
## Life Expectancy Extrema
Now let's look at who has the most extreme life expectancy in a given year <br>
We first write a function that answers the question "what country has this life expectancy?"
```{r results='asis'}
# Look up countries by life expectancy *within a given year*; matching on lifeExp alone
# could return a country from a different year that happens to share the same value
getCountryWithLE <- function(lifeExpVal, yearVal) {
  as.character(gDat$country[gDat$lifeExp == lifeExpVal & gDat$year == yearVal])
}
lifeExpByYear <- ddply(gDat, ~year, summarize,
minLifeExp = min(lifeExp),
minCountry = getCountryWithLE(minLifeExp, year[1])[1],
maxLifeExp = max(lifeExp),
maxCountry = getCountryWithLE(maxLifeExp, year[1])[1]
) # several countries can tie for an extreme, so we keep only the first match; other tied countries are not shown
lifeExpByYear <- xtable(lifeExpByYear, digits=0)
print(lifeExpByYear, type = "html", include.rownames = FALSE)
```
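
An alternative sketch, assuming that reporting only the first tied country is acceptable: `which.min()` and `which.max()` give the position of the first extreme within each yearly piece, so the country can be read off the same rows without searching the whole data set. The data frame `toy` and its values are invented for illustration.

```r
# Hypothetical toy data; countries and values are invented
toy <- data.frame(
  country = c("A", "B", "C", "A", "B", "C"),
  year    = c(1952, 1952, 1952, 2007, 2007, 2007),
  lifeExp = c(35, 38, 45, 42, 39, 60),
  stringsAsFactors = FALSE
)

# For each year, pick the country at the position of the min and max
extrema <- do.call(rbind, lapply(split(toy, toy$year), function(piece) {
  data.frame(
    year       = piece$year[1],
    minLifeExp = min(piece$lifeExp),
    minCountry = piece$country[which.min(piece$lifeExp)],
    maxLifeExp = max(piece$lifeExp),
    maxCountry = piece$country[which.max(piece$lifeExp)],
    stringsAsFactors = FALSE
  )
}))
extrema
```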