Created September 27, 2012 19:03
Code for my UseR_Sept2012 talk
#### Values created by statistics
Statistical layers added to plots actually create new pieces of data, like the y-coordinates of the smoother. Some statistical layers create a few different values, and you can choose which one you want to plot. For example, here is a density plot, where the kernel density estimate is represented by a colored line.
```{r tidy = F, fig.width = 8/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(Dur_msec, color = Word))+
  geom_density()
```
You have to understand the densities represented in this plot as being conditional on selecting a specific word. That is, given that we have decided to think about the lexical item "I've", what is the probability it will be found in a specific range of durations?
However, it's also possible to plot densities that are not conditional on lexical item. That is, given a range of durations, what is the mixture of lexical items we'll find there? We can do this by plotting a value called `..count..` along the y-axis. `..count..` is created by `stat_density()`, which is why it begins and ends with `..`. All values that are created by a statistical layer begin and end with `..`. (Since ggplot2 3.4.0 this notation is deprecated in favor of `after_stat(count)`; the `..` form still runs, with a warning.) Here's the result of plotting `..count..` along the y-axis.
```{r tidy = F, fig.width = 8/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(Dur_msec, color = Word))+
  geom_density(aes(y = ..count..))
```
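To make the difference between the two values concrete, here is a minimal sketch in base R rather than ggplot2. It assumes that `..count..` is the density estimate multiplied by the number of observations in the group, which is how the `stat_density()` documentation describes it. The `toy` data frame and the helper `area()` are made up for illustration; `toy` stands in for `I_jean` with one frequent and one rare word.

```r
# Hypothetical stand-in for I_jean: one frequent word and one rare word.
set.seed(1)
toy <- data.frame(
  Word = rep(c("I", "I've"), times = c(900, 100)),
  Dur_msec = c(rnorm(900, mean = 150, sd = 40),
               rnorm(100, mean = 220, sd = 50))
)

# Evaluate every word's density on one shared, padded grid of durations.
grid_from <- min(toy$Dur_msec) - 100
grid_to <- max(toy$Dur_msec) + 100

per_word <- lapply(split(toy$Dur_msec, toy$Word), function(x) {
  d <- density(x, from = grid_from, to = grid_to, n = 512)
  # count is assumed to be density scaled by the number of tokens
  data.frame(x = d$x, density = d$y, count = d$y * length(x))
})

# Trapezoid rule for the area under a curve.
area <- function(x, y) sum(diff(x) * (y[-1] + y[-length(y)]) / 2)

# Rows: density and count; columns: one per word.
areas <- sapply(per_word, function(d) {
  c(density = area(d$x, d$density), count = area(d$x, d$count))
})
areas
```

Each density curve integrates to roughly 1 no matter how frequent the word is, while each count curve integrates to roughly the number of tokens for that word. That scaling is what lets a frequent word dominate once counts are plotted.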
The density distribution for "I" is suddenly huge, and that's because it's so frequent. Note that I used `y = ..count..` for stacking and filling density distributions above, because that's actually the only thing that makes any sense. Compare the following two filled density plots.
```{r tidy = F, fig.width = 8/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(Dur_msec, fill = Word))+
  geom_density(aes(y = ..count..), position = "fill")
```
```{r tidy = F, fig.width = 8/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(Dur_msec, fill = Word))+
  geom_density(aes(y = ..density..), position = "fill")
```
We _know_ that "I" is super frequent, so the plot with `y = ..count..` is the accurate one. What the plot with `y = ..density..` is actually displaying is a little complicated to worry about, and it would almost certainly confuse any readers of your papers.
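For the curious, the contrast between the two filled plots can be sketched in base R. This is only an illustration, under the assumption that `position = "fill"` rescales the stacked curves so they sum to 1 at every x. The durations are simulated, not taken from `I_jean`, and all variable names are hypothetical.

```r
# Simulated durations for a frequent word ("I") and a rare word ("I've").
set.seed(1)
dur_I <- rnorm(900, mean = 150, sd = 40)
dur_Ive <- rnorm(100, mean = 220, sd = 50)

# Evaluate both kernel density estimates on the same grid.
lo <- min(dur_I, dur_Ive)
hi <- max(dur_I, dur_Ive)
d_I <- density(dur_I, from = lo, to = hi, n = 512)
d_Ive <- density(dur_Ive, from = lo, to = hi, n = 512)

# Share of "I" at each duration when counts are filled.
count_I <- d_I$y * length(dur_I)
count_Ive <- d_Ive$y * length(dur_Ive)
share_count <- count_I / (count_I + count_Ive)

# Share of "I" at each duration when raw densities are filled.
share_density <- d_I$y / (d_I$y + d_Ive$y)

# Average share of "I" across the grid under each scaling.
c(count = mean(share_count), density = mean(share_density))
```

With counts, the share at a given duration estimates what proportion of tokens with that duration are "I". With densities, the share is what that proportion would be if both words were equally frequent, which is rarely the question you mean to ask.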
Another cool use of values generated by a statistic is with two-dimensional density estimation. Here's an F2 by F1 plot, illustrating the default behavior of `stat_density2d()`.
```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n))+
  stat_density2d()
```
The default behavior of `stat_density2d()` is to bin up the two-dimensional density estimates into discrete levels for plotting as topographic contours. The value corresponding to this discretized density estimate is called `..level..`, and we can use it to replace the contour lines with filled-in polygons.
```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n))+
  stat_density2d(geom = "polygon", aes(fill = ..level..))
```
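A rough picture of where `..level..` comes from can be had outside of ggplot2. The sketch below assumes, as the ggplot2 documentation states, that `stat_density2d()` builds on `MASS::kde2d()`. The contouring step uses base R's `contourLines()`, which is not necessarily what ggplot2 does internally. The vectors `f2` and `f1` are simulated stand-ins for `F2.n` and `F1.n`.

```r
library(MASS)  # provides kde2d(); a recommended package bundled with most R installs

set.seed(1)
# Made-up stand-ins for the normalized formant columns F2.n and F1.n.
f2 <- rnorm(500)
f1 <- rnorm(500)

# Two-dimensional kernel density estimate on a 100 x 100 grid.
est <- kde2d(-f2, -f1, n = 100)

# Slice the density surface at a handful of evenly spaced heights.
contours <- contourLines(est$x, est$y, est$z, nlevels = 8)

# Each contour line records the height it was cut at: the analogue of ..level..
lvls <- sort(unique(vapply(contours, function(cl) cl$level, numeric(1))))
lvls
```

Every contour line belongs to exactly one of these heights, so mapping `fill` to the level colors each polygon by how high up the density surface it sits.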
Or, we can turn off the discretization entirely by saying `contour = F`, and then access the density estimate itself, `..density..`, and map that to a variety of different aesthetics.
```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n))+
  stat_density2d(geom = "point", contour = F,
                 aes(size = ..density..), alpha = 0.3)
```
```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n))+
  stat_density2d(geom = "tile", contour = F, aes(alpha = ..density..))
```
```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n))+
  stat_density2d(geom = "tile", contour = F, aes(fill = ..density..))
```
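With `contour = F`, what reaches the geom is roughly one row per grid cell, each carrying its own density value. Here is a minimal sketch of that shape of data, again assuming `MASS::kde2d()` as the underlying estimator and using simulated values in place of `I_jean`. The data frame name `tiles` is hypothetical.

```r
library(MASS)  # provides kde2d()

set.seed(1)
# Made-up stand-ins for the normalized formant columns F2.n and F1.n.
f2 <- rnorm(500)
f1 <- rnorm(500)

# Density estimate on a 50 x 50 grid; est$z[i, j] belongs to est$x[i], est$y[j].
est <- kde2d(-f2, -f1, n = 50)

# Long format: one row per grid cell, x varying fastest to match est$z.
tiles <- expand.grid(x = est$x, y = est$y)
tiles$density <- as.vector(est$z)

# The grid cell where the estimated density peaks.
tiles[which.max(tiles$density), ]
```

Seen this way, the three plots above use the same grid of rows and differ only in whether `density` drives point size, tile transparency, or tile fill color.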