Introduction to ggplot2
This is a short demo on how to convert an R Markdown Notebook into an IPython Notebook using knitr and notedown.
Adding a Python Chunk
return x + 2
This is an introduction to ggplot2. You can view the source as an R Markdown document, if you are using an IDE like RStudio, or as an IPython notebook, thanks to notedown.
We need to first make sure that we have
ggplot2 and its dependencies installed, using the
Now that we have it installed, we can get started by loading it into our workspace
We are now fully set to try and create some amazing plots.
We will use the ubiqutous iris dataset.
Let us create a simple scatterplot of
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_point()
The basic idea in
ggplot2 is to map different plot aesthetics to variables in the dataset. In this plot, we map the x-axis to the variable
Sepal.Length and the y-axis to the variable
Now suppose, we want to color the points based on the
ggplot2 makes it really easy, since all you need to do is map the aesthetic
color to the variable
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_point(aes(color = Species))
Note that I could have included the color mapping right inside the
ggplot line, in which case this mapping would have been applicable globally through all layers. If that doesn't make any sense to you right now, don't worry, as we will get there by the end of this tutorial.
We are interested in the relationship between
Sepal.Length. So, let us fit a regression line through the scatterplot. Now, before you start thinking you need to run a
lm command and gather the predictions using
predict, I will ask you to stop right there and read the next line of code.
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length)) + geom_point() + geom_smooth(method = "lm",
se = F)
If you are like me when the first time I ran this, you might be thinking this is voodoo! I thought so too, but apparently it is not. It is the beauty of
ggplot2 and the underlying notion of grammar of graphics.
You can extend this idea further and have a regression line plotted for each
ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) + geom_point() +
geom_smooth(method = "lm", se = F)