ggplot2: aesthetics and geoms exploration
In this post I have a few goals:
1. Become (re-)familiar with available geoms
2. Become (re-)familiar with aesthetic mappings in geoms (stroke, who knew?)
3. Answer these questions:
<ul>
<li>How often do various geoms appear and how often do they have required aesthetics?</li>
<li>How often do various aesthetics appear and how often are they required?</li>
<li>What geoms are most similar based on mappings?</li>
</ul>
<h2>The Back Story</h2>
Two weeks ago I made whipped cream for the first time. In a tweet I lamented not having tried making it earlier in my life:
<img class=" wp-image-2275 aligncenter" src="https://trinkerrstuff.files.wordpress.com/2018/03/capture6.png" alt="Capture" width="356" height="162" />
This was an error that resulted in missing out because of a lack of confidence. Now the opposite tale: missing out because of false confidence. I know ggplot2...there's nothing new to learn. Recently I realized it's been too long since I've re-read the documentation. I'm betting my time making this blog post that there are a few more like me.
I'm teaching an upcoming analysis course. To prepare I'm reading and rereading many important texts including <a href="http://r4ds.had.co.nz/">R for Data Science</a>. In my close read I noticed that some ggplot2 functions have a stroke aesthetic.
<img class=" aligncenter" src="http://www.relatably.com/m/img/puzzled-memes/b805e4fd98e530149b466b9ec501f39d.jpg" alt="Image result for puzzled meme" />
Didn't know that...I figured I needed to spend a bit more time with the documentation and really get to know geoms and aesthetics that I may have overlooked.
<h2>What's an Aesthetic Anyways?</h2>
ggplot2's author, Hadley, states:
<blockquote>In brief, the grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars)....aesthetics...are the properties that can be perceived on the graphic. Each aesthetic can be mapped to a variable, or set to a constant value. (pp. 4; 176)
</blockquote>
So without aesthetics, geoms can't represent data.
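To make the mapped-versus-set distinction in the quote concrete, here is a minimal sketch. The <code>mpg</code> data (which ships with ggplot2) and the particular aesthetics are chosen purely for illustration; they are not from the quote. Inside <code>aes()</code> an aesthetic is mapped to a variable; outside it, the aesthetic is set to a constant.

[sourcecode class="r"]
library(ggplot2)

## colour is *mapped*: each vehicle class gets its own colour via a scale
p_mapped <- ggplot(mpg, aes(displ, hwy)) +
    geom_point(aes(colour = class))

## colour and stroke are *set*: every point gets the same constant value
p_set <- ggplot(mpg, aes(displ, hwy)) +
    geom_point(colour = 'steelblue', shape = 21, stroke = 2)
[/sourcecode]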
<h2>Getting the Geoms and Their Aesthetics</h2>
Before getting started note that wordpress.com destroys code. To get the undestroyed version of code from this blog use this gist: <a href="https://gist.github.com/trinker/a03c24a7fe9603d3091f183ffe4781ab">https://gist.github.com/trinker/a03c24a7fe9603d3091f183ffe4781ab</a>
I wasn't sure how I was going to accomplish this task, but a little digging showed that it was pretty easy. First I used <a href="https://github.com/Dasonk">Dason Kurkiewicz</a>'s and my <a href="https://github.com/trinker/pacman">pacman</a> package to get all the geoms from ggplot2. This is where I first hit a road block: how do I get the aesthetics for a geom? I recalled that the documentation tells you the aesthetics for each geom. I noticed in <a href="https://github.com/tidyverse/ggplot2/blob/master/R/geom-bar.r">the roxygen2 markup</a> the following line, which creates the aesthetics reference in the documentation:
[sourcecode class="r"]@eval rd_aesthetics("geom", "bar")[/sourcecode]
That allowed me to follow the bread crumbs to make the following code to grab the aesthetics per geom and a flag for when an aesthetic is required:
[sourcecode class="r"]
if (!require("pacman")) install.packages("pacman")
pacman::p_load(pacman, ggplot2, dplyr, textshape, numform, tidyr, lsa, viridis)

## get the geoms
geoms <- gsub('^geom_', '', grep('^geom_', p_funs(ggplot2), value = TRUE))

## function to grab the aesthetics
## (uses ggplot2 internals via `:::`, which may change between versions)
get_aesthetics <- function(name, type){
    obj <- switch(type,
        geom = ggplot2:::find_subclass("Geom", name, globalenv()),
        stat = ggplot2:::find_subclass("Stat", name, globalenv())
    )
    req <- obj$required_aes
    tibble(
        aesthetic = union(req, sort(obj$aesthetics())),
        required = aesthetic %in% req
    )
}

## loop through and grab the aesthetics per geom
aesthetics_list <- lapply(geoms, function(name){
    ## these geoms reuse another geom's ggproto object
    name <- switch(name,
        jitter = 'point',
        freqpoly = 'line',
        histogram = 'bar',
        name
    )
    ## try the name as a Geom first, then fall back to a Stat
    out <- try(get_aesthetics(name, 'geom'), silent = TRUE)
    if (inherits(out, 'try-error')) out <- try(get_aesthetics(name, 'stat'), silent = TRUE)
    out
}) %>%
    setNames(geoms)

## convert the list of data.frames to one tidy data.frame
aesthetics <- aesthetics_list %>%
    tidy_list('geom') %>%
    as_tibble()

aesthetics
[/sourcecode]
[sourcecode class="r"]
geom aesthetic required
<chr> <chr> <lgl>
1 abline slope TRUE
2 abline intercept TRUE
3 abline alpha FALSE
4 abline colour FALSE
5 abline group FALSE
6 abline linetype FALSE
7 abline size FALSE
8 area x TRUE
9 area y TRUE
10 area alpha FALSE
# ... with 341 more rows
[/sourcecode]
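For a single geom, a rougher version of the same lookup seems possible without the <code>:::</code> internals, by reading the exported ggproto object directly. This is only a sketch: it assumes a ggplot2 version in which <code>GeomPoint</code> is exported and has an <code>aesthetics()</code> method, and the variable names are made up for illustration.

[sourcecode class="r"]
library(ggplot2)

## required aesthetics are stored directly on the ggproto object
point_required <- GeomPoint$required_aes

## aesthetics() lists every aesthetic the geom understands
point_all <- GeomPoint$aesthetics()

## whatever is left after removing the required ones is optional
point_optional <- setdiff(point_all, point_required)

point_required
point_optional
[/sourcecode]

The optional set is where <code>stroke</code> turns up for points.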
<h2>Geoms: Getting Acquainted</h2>
First I wanted to get to know geoms again. There are currently 44 of them.
[sourcecode class="r"]
length(unique(aesthetics$geom))
## 44
[/sourcecode]
This bar plot details the geoms and how many aesthetics are optional/required.
[sourcecode class="r"]
geom_ord <- aesthetics %>%
    count(geom) %>%
    arrange(n) %>%
    pull(geom)

aesthetics %>%
    mutate(geom = factor(geom, levels = geom_ord)) %>%
    ggplot(aes(geom, fill = required)) +
        geom_bar() +
        coord_flip() +
        scale_y_continuous(expand = c(0, 0), limits = c(0, 15)) +
        scale_fill_manual(name = 'Required', values = c('gray88', '#1C86EE'), labels = f_response) +
        theme_minimal() +
        theme(
            panel.grid.major.y = element_blank(),
            axis.text = element_text(color = 'grey55', size = 10),
            legend.key.size = unit(.35, 'cm'),
            legend.title = element_text(color = 'grey30', size = 10),
            legend.position = c(.76, .1),
            axis.title = element_text(color = 'gray55')
        ) +
        labs(
            x = 'Geom',
            y = 'Count',
            title = 'Count of Geom Aesthetics',
            subtitle = 'Distribution of geom aesthetic count filled by requirement flag.'
        )
[/sourcecode]
<img src="https://trinkerrstuff.files.wordpress.com/2018/03/geom_count.png" alt="geom_count" width="80%" class="alignnone size-full wp-image-2264"/>
Some interesting things come out. Most geoms have 2ish required aesthetics. The boxplot geom has the most aesthetics, both required and optional. Sensibly, a blank geom requires no aesthetics. I wanted to see what all of these aesthetics were for the boxplot. Some quick dplyr-ing has us there in no time.
[sourcecode class="r"]
aesthetics %>%
    filter(geom == 'boxplot')
[/sourcecode]
[sourcecode class="r"]
geom aesthetic required
<chr> <chr> <lgl>
1 boxplot x TRUE
2 boxplot lower TRUE
3 boxplot upper TRUE
4 boxplot middle TRUE
5 boxplot ymin TRUE
6 boxplot ymax TRUE
7 boxplot alpha FALSE
8 boxplot colour FALSE
9 boxplot fill FALSE
10 boxplot group FALSE
11 boxplot linetype FALSE
12 boxplot shape FALSE
13 boxplot size FALSE
14 boxplot weight FALSE
[/sourcecode]
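Seems that the required flags here are a bit misleading: of these six, <code>x</code> is the only one you typically map yourself. The other five "required" ones are computed by the default <code>stat_boxplot</code> from the variable you map to <code>y</code>.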
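To see where those computed values come from, here is a small sketch (the <code>mpg</code> data is chosen for illustration) that pulls out the data <code>geom_boxplot</code> actually draws. Only <code>x</code> and <code>y</code> are mapped; the five summary columns are filled in by the default <code>stat_boxplot</code>.

[sourcecode class="r"]
library(ggplot2)

p <- ggplot(mpg, aes(class, hwy)) +
    geom_boxplot()

## layer_data() returns the computed data for a layer, one row per box
box_data <- layer_data(p)
box_data[, c('x', 'ymin', 'lower', 'middle', 'upper', 'ymax')]
[/sourcecode]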
<h2>Aesthetics: May I join You?</h2>
Now time to get to know aesthetics. Are there others like <code>stroke</code> that I've overlooked? There are 36 aesthetics. I see right away there is a <code>weight</code> aesthetic I've never seen before.
[sourcecode class="r"]
length(unique(aesthetics$aesthetic))
## 36
[/sourcecode]
[sourcecode class="r"]
aes_ord <- aesthetics %>%
    count(aesthetic) %>%
    arrange(n) %>%
    pull(aesthetic)

aesthetics %>%
    mutate(aesthetic = factor(aesthetic, levels = aes_ord)) %>%
    ggplot(aes(aesthetic, fill = required)) +
        geom_bar() +
        scale_y_continuous(expand = c(0, 0), limits = c(0, 45), breaks = seq(0, 45, by = 5)) +
        scale_fill_manual(name = 'Required', values = c('gray88', '#1C86EE'), labels = f_response) +
        theme_minimal() +
        theme(
            panel.grid.major.y = element_blank(),
            panel.grid.minor = element_blank(),
            axis.text = element_text(color = 'grey55', size = 10),
            legend.key.size = unit(.35, 'cm'),
            legend.title = element_text(color = 'grey30', size = 10),
            legend.position = c(.83, .1),
            axis.title = element_text(color = 'gray55')
        ) +
        labs(
            x = 'Aesthetic',
            y = 'Count',
            title = 'Count of Aesthetics',
            subtitle = 'Distribution of aesthetics filled by requirement flag.'
        ) +
        coord_flip()
[/sourcecode]
<img class="alignnone size-full wp-image-2265" src="https://trinkerrstuff.files.wordpress.com/2018/03/aes_count.png" alt="aes_count" width="90%"/>
It seems from this plot that almost every geom requires an x/y position aesthetic. That makes sense. What doesn't make sense are cases besides <code>geom_blank</code> that don't require x/y.
[sourcecode class="r"]
aesthetics %>%
    filter(aesthetic %in% c('x', 'y') & !required)
[/sourcecode]
[sourcecode class="r"]
geom aesthetic required
<chr> <chr> <lgl>
1 count y FALSE
2 qq x FALSE
3 qq y FALSE
4 rug x FALSE
5 rug y FALSE
[/sourcecode]
This dplyr filter shows where x/y are possible aesthetics but not required. Sensible. But are there some geoms where x/y aren't even in the mapping?
[sourcecode class="r"]
aesthetics %>%
    group_by(geom) %>%
    summarize(
        has_x = 'x' %in% aesthetic,
        has_y = 'y' %in% aesthetic
    ) %>%
    filter(!has_x | !has_y) %>%
    arrange(has_x)
[/sourcecode]
[sourcecode class="r"]
geom has_x has_y
<chr> <lgl> <lgl>
1 abline FALSE FALSE
2 blank FALSE FALSE
3 hline FALSE FALSE
4 map FALSE FALSE
5 rect FALSE FALSE
6 vline FALSE FALSE
7 boxplot TRUE FALSE
8 errorbar TRUE FALSE
9 linerange TRUE FALSE
10 ribbon TRUE FALSE
[/sourcecode]
Dig into the documentation if you want to make sense of why these geoms don't require x/y positions.
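As a head start on that digging, the required aesthetics stored on a few of these geoms hint at the answer: they are positioned by intercepts or by min/max bounds rather than by x/y. A quick sketch, assuming the exported ggproto objects below exist in your ggplot2 version (the list name is made up):

[sourcecode class="r"]
library(ggplot2)

## a few geoms from the table above that have neither x nor y
no_xy <- list(
    abline = GeomAbline,
    hline  = GeomHline,
    vline  = GeomVline,
    rect   = GeomRect
)

## look up what each geom needs in place of x/y
required_by_geom <- lapply(no_xy, function(g) g$required_aes)
required_by_geom
[/sourcecode]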
<h2>Geoms & Aesthetics Intersect</h2>
OK so we've explored variability in geoms and aesthetics a bit...let's see how they covary. The heatmap below provides an understanding of what geoms utilize what aesthetics and if they are required. <code>NA</code> means that the aesthetic is not a part of the geom's mapping.
[sourcecode class="r"]
boolean <- aesthetics %>%
    count(geom, aesthetic) %>%
    spread(aesthetic, n, fill = 0)

boolean %>%
    gather(aesthetic, value, -geom) %>%
    left_join(aesthetics, by = c('geom', 'aesthetic')) %>%
    mutate(
        aesthetic = factor(aesthetic, levels = rev(aes_ord)),
        geom = factor(geom, levels = rev(geom_ord))
    ) %>%
    ggplot(aes(y = aesthetic, x = geom, fill = required)) +
        geom_tile(color = 'grey95') +
        scale_fill_manual(name = 'Required', values = c('gray88', '#1C86EE'), labels = f_response) +
        theme_minimal() +
        theme(
            axis.ticks = element_blank(),
            axis.text.y = element_text(size = 8, margin = margin(r = -3)),
            axis.text.x = element_text(size = 8, hjust = 1, vjust = 1,
                angle = 45, margin = margin(t = -3)),
            legend.key.size = unit(.35, 'cm'),
            legend.title = element_text(color = 'grey30', size = 10),
            panel.grid = element_blank(),
            axis.title = element_text(color = 'gray55')
        ) +
        labs(
            title = 'Geom & Aesthetic Co-occurrence',
            subtitle = NULL,
            y = 'Aesthetic',
            x = 'Geom'
        )
[/sourcecode]
Here we can see that <code>weight</code> aesthetic again. Hmmm, boxplot has it, as do other, mostly univariate, geoms. The documentation is a bit sparse on it. Hadley says: <a href="https://github.com/tidyverse/ggplot2/issues/1893">https://github.com/tidyverse/ggplot2/issues/1893</a>
<img src="https://trinkerrstuff.files.wordpress.com/2018/03/aes_geom_count.png" alt="aes_geom_count" width="100%" height="90%" class="alignnone size-full wp-image-2268"/>
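For a feel of what <code>weight</code> does, here is a sketch adapted from the <code>geom_bar</code> documentation examples: instead of counting rows, each bar sums the weight variable. The comparison against <code>tapply()</code> is an added sanity check, not part of the documentation.

[sourcecode class="r"]
library(ggplot2)

## bar heights are the total engine displacement per class, not row counts
p <- ggplot(mpg, aes(class)) +
    geom_bar(aes(weight = displ))

## the computed bar heights should match a plain grouped sum
bar_heights <- layer_data(p)$count
grouped_sum <- as.numeric(tapply(mpg$displ, mpg$class, sum))
all.equal(bar_heights, grouped_sum)
[/sourcecode]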
Also there are two geoms that are just now catching my eye...<code>geom_count</code> & <code>geom_spoke</code>. I'll come back to them later.
There are also a few aesthetics, like <code>height</code> and <code>intercept</code>, that are used by only one geom. Meanwhile, the bottom 10 aesthetics are used, by far, the most frequently. For beginners, these are the ones that will really pay to learn quickly.
<h2>Geom Similarity</h2>
I thought it might be fun to use the geoms' aesthetics to see if we could cluster aesthetically similar geoms closer together. The heatmap below uses cosine similarity and hierarchical clustering to reorder the matrix so that like geoms are found closer to one another (note that today I learned from "R for Data Science" about the seriation package [https://cran.r-project.org/web/packages/seriation/index.html] that may make this matrix reordering task much easier).
[sourcecode class="r"]
boolean %>%
    column_to_rownames() %>%
    as.matrix() %>%
    t() %>%
    cosine() %>%
    cluster_matrix() %>%
    tidy_matrix('geom', 'geom2') %>%
    mutate(
        geom = factor(geom, levels = unique(geom)),
        geom2 = factor(geom2, levels = unique(geom2))
    ) %>%
    group_by(geom) %>%
    ggplot(aes(geom, geom2, fill = value)) +
        geom_tile() +
        scale_fill_viridis(name = bquote(cos(theta))) +
        theme(
            axis.text.y = element_text(size = 8),
            axis.text.x = element_text(size = 8, hjust = 1, vjust = 1, angle = 45),
            legend.position = 'bottom',
            legend.key.height = grid::unit(.2, 'cm'),
            legend.key.width = grid::unit(.7, 'cm'),
            axis.title = element_text(color = 'gray55')
        ) +
        labs(
            title = "ggplot2 Geom Cosine Similarity",
            subtitle = 'Geoms with similar aesthetics are clustered together along the diagonal.',
            x = 'Geom',
            y = 'Geom'
        )
[/sourcecode]
<img src="https://trinkerrstuff.files.wordpress.com/2018/03/geom_sim.png" alt="geom_sim" width="100%" height="100%" class="alignnone size-full wp-image-2269"/>
Looking at the bright square clusters along the diagonal is a good starting place for understanding which geoms tend to aesthetically cluster together. Generally, this ordering seems pretty reasonable.
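If cosine similarity is unfamiliar: each geom is treated as a 0/1 vector over the aesthetics, and the similarity between two geoms is their dot product divided by the product of their vector lengths. A toy sketch in base R, with two made-up geom vectors (hypothetical values, not rows from the real <code>boolean</code> table) and a hand-rolled helper rather than <code>lsa::cosine()</code>:

[sourcecode class="r"]
## hypothetical 0/1 aesthetic indicators for two geoms
aes_names <- c('x', 'y', 'colour', 'shape', 'linetype')
geom_a <- setNames(c(1, 1, 1, 1, 0), aes_names)
geom_b <- setNames(c(1, 1, 1, 0, 1), aes_names)

## dot product over the product of the two vector lengths
cosine_sim <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

cosine_sim(geom_a, geom_b)
## 0.75
[/sourcecode]

The two vectors share three of their four aesthetics each, so the similarity is 3 / (2 * 2) = 0.75; identical aesthetic sets would give 1 and disjoint sets 0.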
<h2>Aesthetic Similarity</h2>
I performed the same analysis of aesthetics, asking which ones tended to be used within the same geoms.
[sourcecode class="r"]
boolean %>%
    column_to_rownames() %>%
    as.matrix() %>%
    cosine() %>%
    {x <- .; diag(x) <- NA; x} %>%
    cluster_matrix() %>%
    tidy_matrix('aesthetic', 'aesthetic2') %>%
    mutate(
        aesthetic = factor(aesthetic, levels = unique(aesthetic)),
        aesthetic2 = factor(aesthetic2, levels = unique(aesthetic2))
    ) %>%
    group_by(aesthetic) %>%
    ggplot(aes(aesthetic, aesthetic2, fill = value)) +
        geom_tile() +
        scale_fill_viridis(name = bquote(cos(theta))) +
        theme(
            axis.text.y = element_text(size = 8),
            axis.text.x = element_text(size = 8, hjust = 1, vjust = 1, angle = 45),
            legend.position = 'bottom',
            legend.key.height = grid::unit(.2, 'cm'),
            legend.key.width = grid::unit(.7, 'cm'),
            axis.title = element_text(color = 'gray55')
        ) +
        labs(
            title = "ggplot2 Aesthetics Cosine Similarity",
            subtitle = f_wrap(c(
                'Aesthetics that tend to be used together within the same geoms are clustered together along the diagonal.'
                ), width = 95, collapse = TRUE
            ),
            x = 'Aesthetic',
            y = 'Aesthetic'
        )
[/sourcecode]
<img src="https://trinkerrstuff.files.wordpress.com/2018/03/aes_sim.png" alt="aes_sim" width="100%" height="100%" class="alignnone size-full wp-image-2270"/>
The result is pretty sensible. The lower left corner has the largest cluster, which seems to be related to text-based geoms. The next cluster, up and to the right, has group, size, x, y, etc. This seems to be the most common set of typically geometric aesthetics. The upper, lower, middle cluster is specific to the boxplot summary stat. Stroke and shape as a cluster are related to geoms that are point based.
<h2>Geom Similarity: Required Aesthetics</h2>
The last clustering activity I wanted to try was to reduce the aesthetics to just the required ones (as we might assume these are the truest attributes of a geom) and see which geoms cluster from that analysis.
[sourcecode class="r"]
boolean2 <- aesthetics %>%
    filter(required) %>%
    count(geom, aesthetic) %>%
    spread(aesthetic, n, fill = 0)

boolean2 %>%
    column_to_rownames() %>%
    as.matrix() %>%
    t() %>%
    cosine() %>%
    cluster_matrix() %>%
    tidy_matrix('geom', 'geom2') %>%
    mutate(
        geom = factor(geom, levels = unique(geom)),
        geom2 = factor(geom2, levels = unique(geom2))
    ) %>%
    group_by(geom) %>%
    ggplot(aes(geom, geom2, fill = value)) +
        geom_tile() +
        scale_fill_viridis(name = bquote(cos(theta))) +
        theme(
            axis.text.y = element_text(size = 8),
            axis.text.x = element_text(size = 8, hjust = 1, vjust = 1, angle = 45),
            legend.position = 'bottom',
            legend.key.height = grid::unit(.2, 'cm'),
            legend.key.width = grid::unit(.7, 'cm'),
            axis.title = element_text(color = 'gray55')
        ) +
        labs(
            title = "ggplot2 Geom Cosine Similarity",
            subtitle = 'Geoms with similar aesthetics are clustered together along the diagonal.',
            x = 'Geom',
            y = 'Geom'
        )
[/sourcecode]
<img src="https://trinkerrstuff.files.wordpress.com/2018/03/geom_sim2.png" alt="geom_sim2" width="100%" height="100%" class="alignnone size-full wp-image-2271"/>
This seemed less interesting. I didn't really have a plausible explanation for what patterns did show up and for the most part, clusters became really large or really small. I'm open to others' interpretations.
<h2>A Few Un-/Re-discovered ggplot2 Features</h2>
I did want to learn more about <code>geom_count</code> & <code>geom_spoke</code> in the remaining exploration.
[sourcecode class="r"]
?geom_count
ggplot(diamonds, aes(x = cut, y = clarity)) +
    geom_count(aes(size = ..prop.., group = 1)) +
    scale_size_area(max_size = 10)
[/sourcecode]
<img src="https://trinkerrstuff.files.wordpress.com/2018/03/geom_count1.png" alt="geom_count" width="70%" height="70%" class="alignnone size-full wp-image-2272"/>
Oh yeah! Now I remember. A shortcut to make the <code>geom_point</code> bubble plot for investigating categorical variable covariance as an alternative to the heatmap.
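For comparison, this is roughly the long way that <code>geom_count</code> saves you from: count the combinations yourself, then map the count to point size. It is a sketch that assumes dplyr is available, and it sizes points by raw count rather than by the proportion used in the example above.

[sourcecode class="r"]
library(ggplot2)
library(dplyr)

## one row per cut/clarity combination with its frequency in column n
counts <- diamonds %>%
    count(cut, clarity)

ggplot(counts, aes(x = cut, y = clarity, size = n)) +
    geom_point() +
    scale_size_area(max_size = 10)
[/sourcecode]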
[sourcecode class="r"]
?geom_spoke
df <- expand.grid(x = 1:10, y = 1:10)
df$angle <- runif(100, 0, 2 * pi)
df$speed <- runif(100, 0, sqrt(0.1 * df$x))

ggplot(df, aes(x, y)) +
    geom_point() +
    geom_spoke(aes(angle = angle), radius = 0.5)
[/sourcecode]
<img src="https://trinkerrstuff.files.wordpress.com/2018/03/geom_spoke.png" alt="geom_spoke" width="70%" height="70%" class="alignnone size-full wp-image-2273"/>
Interesting. The documentation says:
<blockquote>...useful when you have variables that describe direction and distance.</blockquote>
Not sure if I have a use case for my own work. But I'll store it in the vault (and hopefully remember it better than I remembered <code>geom_count</code>).
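One detail worth noting in the <code>geom_spoke</code> snippet: it builds a <code>speed</code> column but draws every spoke with a fixed radius. A variant that maps <code>speed</code> to <code>radius</code>, so that spoke length carries the distance information, might look like this (the <code>set.seed()</code> call is an addition for reproducibility):

[sourcecode class="r"]
library(ggplot2)

set.seed(1)
df <- expand.grid(x = 1:10, y = 1:10)
df$angle <- runif(100, 0, 2 * pi)
df$speed <- runif(100, 0, sqrt(0.1 * df$x))

## angle gives the direction, radius (here speed) gives the length
p_spoke <- ggplot(df, aes(x, y)) +
    geom_point() +
    geom_spoke(aes(angle = angle, radius = speed))
p_spoke
[/sourcecode]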
<h2>Learning More About Geoms & Aesthetics</h2>
I wanted to leave people with a quick reference guide, kindly provided by RStudio, that covers geoms and aesthetics and when to use them.
<a href="https://www.rstudio.com/wp-content/uploads/2016/11/ggplot2-cheatsheet-2.1.pdf">
<img style="width: 50%;" src="http://www.dartistics.com/images/ggplot2_geoms.png" alt="Image result for all ggplot geoms" />
</a>