@JoFrhwld
Created September 27, 2012 19:03
Code for my UseR_Sept2012 talk
#### Values created by statistics
Statistical layers added to plots actually create new pieces of data, like the y-coordinates of the smoother. Some statistical layers create a few different values, and you can choose which one you want to plot. For example, here is a density plot, where the kernel density estimate is represented by a colored line.
```{r tidy = F, fig.width = 8/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(Dur_msec, color = Word)) +
  geom_density()
```
You have to understand the densities represented in this plot as being conditional on selecting a specific word. That is, given that we have decided to think about the lexical item "I've", what is the probability it will be found in a specific range of durations?
However, it's also possible to plot densities that are not conditional on lexical item. That is, given a range of durations, what is the mixture of lexical items we'll find there? We can do this by plotting a value called `..count..` along the y-axis. `..count..` is created by `stat_density()`, which is why it begins and ends with `..`. All values that are created by a statistical layer begin and end with `..`. Here's the result of plotting `..count..` along the y-axis.
```{r tidy = F, fig.width = 8/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(Dur_msec, color = Word)) +
  geom_density(aes(y = ..count..))
```
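If it helps to see where `..count..` comes from, here is a rough sketch in base R. As far as I understand it, `..count..` is just the density estimate multiplied by the number of observations in the group. The data here are simulated stand-ins (`dur_I` and `dur_Ive` are made-up names, not the `I_jean` data), so treat the numbers as illustrative only.

```r
set.seed(1)

# Hypothetical durations: one very frequent word and one rare word
dur_I   <- rnorm(1000, mean = 150, sd = 40)
dur_Ive <- rnorm(50,   mean = 200, sd = 40)

# Kernel density estimates on a shared grid of durations
d_I   <- density(dur_I,   from = 0, to = 400, n = 512)
d_Ive <- density(dur_Ive, from = 0, to = 400, n = 512)

# Scale each density by its group size to get a count-like curve
count_I   <- d_I$y   * length(dur_I)
count_Ive <- d_Ive$y * length(dur_Ive)

# Approximate the area under each curve with a Riemann sum
step           <- d_I$x[2] - d_I$x[1]
area_density_I <- sum(d_I$y) * step    # roughly 1
area_count_I   <- sum(count_I) * step  # roughly the number of tokens
```

The density curve integrates to about 1 no matter how frequent the word is, while the count curve integrates to about the number of tokens, which is why "I" towers over everything else once you switch to `..count..`.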
The density distribution for "I" is suddenly huge, and that's because it's so frequent. Note that I used `y = ..count..` for stacking and filling density distributions above, because that's actually the only thing that makes any sense. Compare the following two filled density plots.
```{r tidy = F, fig.width = 8/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(Dur_msec, fill = Word)) +
  geom_density(aes(y = ..count..), position = "fill")
```
```{r tidy = F, fig.width = 8/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(Dur_msec, fill = Word)) +
  geom_density(aes(y = ..density..), position = "fill")
```
We _know_ that "I" is super frequent, so the plot with `y = ..count..` is the accurate one. What the plot with `y = ..density..` is actually displaying is a little complicated to worry about, and it would almost certainly confuse any readers of your papers.
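Here is a rough sketch of the arithmetic behind the two filled plots, again with simulated stand-in data rather than `I_jean`. It assumes that `position = "fill"` rescales the stacked heights so they sum to 1 at each duration, and that `..count..` is the density times the group size.

```r
set.seed(1)

# Hypothetical token counts and durations for a frequent and a rare word
n_I     <- 1000
n_Ive   <- 50
dur_I   <- rnorm(n_I,   mean = 150, sd = 40)
dur_Ive <- rnorm(n_Ive, mean = 200, sd = 40)

# Density estimates on a shared grid
d_I   <- density(dur_I,   from = 0, to = 400, n = 512)
d_Ive <- density(dur_Ive, from = 0, to = 400, n = 512)

# Share of the filled band belonging to "I" at each duration
fill_count_I   <- (d_I$y * n_I) / (d_I$y * n_I + d_Ive$y * n_Ive)
fill_density_I <- d_I$y / (d_I$y + d_Ive$y)

# Compare the two versions at the grid point closest to 200 msec
i <- which.min(abs(d_I$x - 200))
c(count = fill_count_I[i], density = fill_density_I[i])
```

With counts, the band for "I" tracks the proportion of tokens at that duration that are "I". With densities, the word frequencies drop out, so a rare word can appear to dominate a region where it is actually outnumbered.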
Another cool use of values generated by a statistic is with two-dimensional density estimation. Here's an F2 by F1 plot, illustrating the default behavior of `stat_density2d()`.
```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n)) +
  stat_density2d()
```
The default behavior of `stat_density2d()` is to bin up the two-dimensional density estimates into discrete levels for plotting as topographic contours. The value corresponding to this discretized density estimate is called `..level..`, and we can use it to replace the contour lines with filled-in polygons.
```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n)) +
  stat_density2d(geom = "polygon", aes(fill = ..level..))
```
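To get a feel for what `..level..` holds, here is a minimal sketch using `MASS::kde2d()` and `contourLines()`. I believe `stat_density2d()` does something along these lines internally, but treat that as an assumption. The formant values are simulated (`f1` and `f2` are made-up stand-ins for `F1.n` and `F2.n`).

```r
library(MASS)  # for kde2d()

set.seed(1)

# Hypothetical normalized formant values
f2 <- rnorm(500)
f1 <- rnorm(500)

# Two-dimensional kernel density estimate on a 50 x 50 grid
dens <- kde2d(f2, f1, n = 50)

# Cut the continuous density surface into a handful of contour lines
contours <- contourLines(dens$x, dens$y, dens$z)

# Each contour carries the density value it was drawn at
levels <- sort(unique(sapply(contours, function(cl) cl$level)))
levels
```

The surface has 2500 distinct density values, but only a few levels survive the contouring step, and those are the values that end up mapped to `fill`.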
Or, we can turn off the discretization entirely by setting `contour = F`, and then access the density estimate itself, `..density..`, and map it to a variety of different aesthetics.
```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n))+
  stat_density2d(geom = "point", contour = F,
                 aes(size = ..density..), alpha = 0.3)
```
```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n)) +
  stat_density2d(geom = "tile", contour = F, aes(alpha = ..density..))
```
```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n)) +
  stat_density2d(geom = "tile", contour = F, aes(fill = ..density..))
```
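With `contour = F`, what the geoms receive is, roughly speaking, one row per grid cell with its density estimate. Here is a sketch of that shape built by hand with `MASS::kde2d()`, on the assumption that this is close to what the stat computes. The data are simulated, and `grid` is just an illustrative name.

```r
library(MASS)  # for kde2d()

set.seed(1)

# Hypothetical normalized formant values
f2 <- rnorm(500)
f1 <- rnorm(500)

# Density estimate on a 25 x 25 grid
dens <- kde2d(f2, f1, n = 25)

# Flatten the grid into long format: one row per (x, y) cell.
# expand.grid() varies x fastest, which matches the column-major
# order that as.vector() uses on the z matrix.
grid <- expand.grid(x = dens$x, y = dens$y)
grid$density <- as.vector(dens$z)

head(grid)
```

Each row can then be drawn as a point or a tile, with `density` mapped to size, alpha, or fill.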