@folias
Created October 7, 2013 15:30
stat545a-2013-hw05_lee-woo
========================================================
> Presets:
```{r preset}
Sys.setenv(LANG = "en")  # request English messages; the variable name is upper-case
library("plyr")
library("lattice")
library("ggplot2")
```
I use the Gapminder data here, with Oceania dropped:
```{r}
dat <- read.delim("http://www.stat.ubc.ca/~jenny/notOcto/STAT545A/examples/gapminder/data/gapminderDataFiveYear.txt")
dat = droplevels(subset(dat, continent != "Oceania"))
str(dat)
```
A violin plot (a variation of stripplot) in `ggplot2`
-------------------------
First, I draw a violin plot of life expectancy by year, treating `year` as a categorical variable:
```{r}
ggplot(data=dat) + geom_violin(aes(x=factor(year), y=lifeExp))
```
This figure clearly shows how the distribution of life expectancy across countries changes over time. In the earlier years, the "center of gravity" sits at the low end of the "violin". As time goes by, it gradually moves upward, and after 1987 it is clearly at the high end.
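The upward shift described above can also be checked numerically. The chunk below is a minimal sketch, not part of the original analysis: `median_by_year` is a hypothetical helper name, and the median is only a rough stand-in for the "center of gravity" of each violin.

```{r}
# Hypothetical helper: median life expectancy for each year.
# Expects a data frame with numeric columns `year` and `lifeExp`.
median_by_year <- function(d) {
  aggregate(lifeExp ~ year, data = d, FUN = median)
}
```

Calling `median_by_year(dat)` returns one row per year; if the reading of the violins is right, the medians should rise over time.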
An enhanced scatterplot: `geom_path` in `ggplot2`
-------------------------
Now I sample 6 countries randomly from the dataset:
```{r}
set.seed(100)
dat.string = sample(unique(dat$country),6)
dat.sample = droplevels(subset(dat, country %in% dat.string))
str(dat.sample)
levels(dat.sample$country)
```
First, let's try a basic scatterplot:
```{r}
ggplot(data=dat.sample) + geom_point(aes(x=gdpPercap,y=lifeExp,color=country))
```
Now, let's try an "enhanced" scatterplot. The following depicts the trajectory of $(gdpPercap, ~ lifeExp)$ over time in the two-dimensional plane, with one path per country and color encoding the year:
```{r}
ggplot(data=dat.sample) + geom_path(aes(x=gdpPercap,y=lifeExp,group=country,color=year), size=2)
```
As we can see, there is an overall pattern in the trajectories of $(gdpPercap, ~ lifeExp)$ in the $gdpPercap \times lifeExp$ plane. While GDP per capita is below 5000, life expectancy grows rapidly as GDP per capita increases. Once GDP per capita reaches 5000, life expectancy grows more slowly as GDP per capita rises. This pattern persists if we draw more trajectories.
With `lattice`, I would try the following:
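One rough way to put numbers on the two regimes is to compare the slope of life expectancy on GDP per capita on each side of the 5000 mark. This is only a sketch under assumptions: `slope_by_regime` is a hypothetical helper, the straight-line fit on each side is a simplification, and the cutoff of 5000 is simply the value read off the plot.

```{r}
# Hypothetical helper: least-squares slope of lifeExp on gdpPercap
# below and above a cutoff (5000 is the value read off the figure).
slope_by_regime <- function(d, cutoff = 5000) {
  low  <- d[d$gdpPercap <  cutoff, ]
  high <- d[d$gdpPercap >= cutoff, ]
  c(below = unname(coef(lm(lifeExp ~ gdpPercap, data = low))[2]),
    above = unname(coef(lm(lifeExp ~ gdpPercap, data = high))[2]))
}
```

For example, `slope_by_regime(dat.sample)` gives two slopes in years of life expectancy per unit of GDP per capita; a much larger `below` than `above` would be consistent with the pattern described.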
```{r}
xyplot(lifeExp ~ gdpPercap, data=dat.sample, group = country, col.line=dat.sample$year, type="l", auto.key=list(columns = nlevels(dat.sample$country)/2),lwd=5)
```
As in `ggplot2`, I supplied different variables for grouping and for color. However, looking at the figure above, `lattice` does not draw what I wanted. Also, the `col.line` option behaves in a strange way. Compare the plot above with the plot below:
```{r}
xyplot(lifeExp ~ gdpPercap, data=dat.sample, group = country, type="l", auto.key=list(columns = nlevels(dat.sample$country)/2),lwd=5)
```
In the second plot, I removed the `col.line` option. Now we can see that, in the first plot, the colors of the lines do not match those in the legend, whereas in the second plot they do. I don't know how `col.line` worked in the first plot, but it did not produce what I wanted to draw anyway. On the other hand, `ggplot2` works quite intuitively and produces exactly what I meant when I assign different variables to `group` and `color`.
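A possible explanation for the mismatch, offered as an assumption that the analysis above does not verify: with grouped data, `lattice` appears to recycle `col.line` to one value per group level, so only the first six entries of `dat.sample$year` would be used, each interpreted as a numeric color index into the base palette. The legend built by `auto.key` would still use the default theme colors. The sketch below only illustrates that index arithmetic; `first_years` is a hypothetical name holding the first six years in the data.

```{r}
# Assumption: one col.line value per group level, taken from the start of the vector.
first_years <- c(1952, 1957, 1962, 1967, 1972, 1977)  # first six years in the data
pal <- palette()                                       # current base color palette
# Numeric colors index the palette and wrap around its length
line_cols <- pal[(first_years - 1) %% length(pal) + 1]
line_cols
```

If this reading is right, the line colors in the first plot reflect neither country nor year, which would explain why they disagree with the legend.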