JoFrhwld/Rmd_example.rmd

## Rmd_example.rmd
#### Values created by statistics
Statistical layers added to plots actually create new pieces of data, like the y-coordinates of the smoother. Some statistical layers create a few different values, and you can choose which one you want to plot. For example, here is a density plot, where the kernel density estimate is represented by a colored line.

```{r tidy = F, fig.width = 8/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(Dur_msec, color = Word))+
  geom_density()
```

You have to understand the densities represented in this plot as being conditional on selecting a specific word. That is, given that we have decided to think about the lexical item "I've", what is the probability it will be found in a specific range of durations?

However, it's also possible to plot densities that are not conditional on lexical item. That is, given a range of durations, what is the mixture of lexical items we'll find there? We can do this by plotting a value called `..count..` along the y-axis. `..count..` is created by `stat_density()`, which is why it begins and ends with `..`. All values that are created by a statistical layer begin and end with `..`. Here's the result of plotting `..count..` along the y-axis.

```{r tidy = F, fig.width = 8/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(Dur_msec, color = Word))+
  geom_density(aes(y = ..count..))
```

The density distribution for "I" is suddenly huge, and that's because it's so frequent. Note that I used `y = ..count..` for stacking and filling density distributions above, because that's actually the only thing that makes any sense. Compare the following two filled density plots.

```{r tidy = F, fig.width = 8/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(Dur_msec, fill = Word))+
  geom_density(aes(y = ..count..), position = "fill")
```

```{r tidy = F, fig.width = 8/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(Dur_msec, fill = Word))+
  geom_density(aes(y = ..density..), position = "fill")
```
We _know_ that "I" is super frequent, so the plot with `y = ..count..` is the accurate one. What the plot with `y = ..density..` actually displays is harder to interpret: each word's density is scaled to integrate to 1, so the fill shows the proportions you would get if every word were equally frequent. That would almost certainly confuse any readers of your papers.
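
To make the difference concrete, here is a minimal sketch using base R's `density()` on simulated data rather than `I_jean`. The sample sizes, duration distributions, and variable names below are made up for illustration. It relies on two facts: `stat_density()` computes `..count..` as the density multiplied by the number of observations in the group, and `position = "fill"` rescales the stacked values at each x so they sum to 1.

```{r tidy = F}
set.seed(1)

# Hypothetical durations (msec) for a frequent and an infrequent word;
# these numbers are invented, not taken from I_jean
dur_frequent <- rnorm(900, mean = 150, sd = 40)
dur_rare     <- rnorm(100, mean = 250, sd = 40)

# Evaluate both kernel density estimates on the same grid of durations
dens_frequent <- density(dur_frequent, from = 0, to = 400, n = 512)
dens_rare     <- density(dur_rare,     from = 0, to = 400, n = 512)

# ..density.. integrates to about 1 for each word, whatever its frequency;
# ..count.. is that density scaled by the number of observations
count_frequent <- dens_frequent$y * length(dur_frequent)
count_rare     <- dens_rare$y     * length(dur_rare)

# What position = "fill" displays: each word's share of the total at each x
share_rare_density <- dens_rare$y / (dens_rare$y + dens_frequent$y)
share_rare_count   <- count_rare  / (count_rare  + count_frequent)

# Compare the rare word's share under each version at roughly 200 msec
i <- which.min(abs(dens_rare$x - 200))
c(density = share_rare_density[i], count = share_rare_count[i])
```

The count-based share of the rare word is always the smaller of the two here, because it accounts for the rare word making up only 10% of the simulated tokens. The density-based share ignores that and treats both words as equally frequent.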

Another cool use of values generated by a statistic is with two-dimensional density estimation. Here's an F2 by F1 plot, illustrating the default behavior of `stat_density2d()`.

```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n))+
  stat_density2d()
```

The default behavior of `stat_density2d()` is to bin up the two-dimensional density estimates into discrete levels for plotting as topographic contours. The value corresponding to this discretized density estimate is called `..level..`, and we can use it to replace the contour lines with filled-in polygons.


```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n))+
  stat_density2d(geom = "polygon", aes(fill = ..level..))
```

Or, we can turn off the discretization entirely by saying `contour = F`, and then access the density estimate itself, `..density..`, and map that to a variety of different aesthetics.


```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n))+
  stat_density2d(geom = "point",contour = F,
                 aes(size = ..density..), alpha = 0.3)
```


```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n))+
  stat_density2d(geom = "tile", contour = F, aes(alpha = ..density..))
```

```{r tidy = F,fig.width = 6/1.2, fig.height=5/1.2}
ggplot(I_jean, aes(-F2.n, -F1.n))+
  stat_density2d(geom = "tile", contour = F, aes(fill = ..density..))
```
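
For readers curious where these values come from, the following is a rough sketch of the underlying computation using `MASS::kde2d()`, which is the estimator `stat_density2d()` relies on, together with base R's `contourLines()`. The simulated formant values, the 50 by 50 grid, and the variable names are assumptions for illustration; ggplot2's own contouring code and defaults may differ in detail.

```{r tidy = F}
library(MASS)  # provides kde2d()

set.seed(1)

# Simulated normalized formant values; invented, not taken from I_jean
F2 <- rnorm(500, mean = 0, sd = 1)
F1 <- rnorm(500, mean = 0, sd = 1)

# Two-dimensional kernel density estimate on a 50 x 50 grid
kde <- kde2d(-F2, -F1, n = 50)

# Without contouring, every grid point keeps its own estimate,
# which is the kind of value ..density.. refers to
grid_density <- expand.grid(x = kde$x, y = kde$y)
grid_density$density <- as.vector(kde$z)

# With contouring, the estimate is cut at a handful of discrete values,
# which is the kind of value ..level.. refers to
contours <- contourLines(kde$x, kde$y, kde$z)
levels_used <- sort(unique(sapply(contours, function(cl) cl$level)))

head(grid_density)
levels_used
```

The grid has one row per point, which is why `contour = F` works with geoms like `"point"` and `"tile"`, while the contoured version only has a few distinct levels to map to an aesthetic.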