Last active October 25, 2021 18:08
  • Save agoldst/d92ffa634aa6ddbf1c72b2401b21cf11 to your computer and use it in GitHub Desktop.
---
title: "Nobel genre tallies"
self_contained: false
---
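```{r setup, include=F}
# packages used by the chunks below:
# rvest for read_html/html_node/html_table, tidyverse for everything else
library(tidyverse)
library(rvest)
```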
```{r constants}
# JSON data available via
# <> and
# <>
# but Wikipedia's table is easier to scrape
url <- ""
data_file <- "nobel.tsv"
```
```{r prize-retrieval}
# scrape the table once and cache it locally as TSV
if (!file.exists(data_file)) {
    list_page <- read_html(url)
    # first table is the big list
    list_page %>% html_node("table") %>%
        html_table(fill=T) %>%
        select(-Picture) %>%
        write_tsv(data_file)
}
```
```{r load-prize}
laureates <- read_tsv(data_file) %>%
    rename(year=Year, genre=`Genre(s)`)
```
```{r genre-tally}
genres <- laureates %>%
    # one row per (year, genre) pair
    mutate(genre=str_split(genre, ", ")) %>%
    select(year, genre) %>%
    unnest(genre) %>%
    filter(genre != "Not awarded") %>%
    # recode genre
    mutate(genre=case_when(
        genre %in% c("novel", "short story") ~ "fiction",
        genre %in% c("memoirs", "biography", "autobiography",
                     "philosophy", "literary criticism", "philology",
                     "history", "law", "essay") ~ "non-fiction",
        genre %in% c("music", "songwriting") ~ "music",
        TRUE ~ genre)) %>%
    # deduplicate
    distinct() %>%
    mutate(count=1) %>%
    # one row per year, one column per genre, holding that year's count
    pivot_wider(names_from="genre", values_from="count",
                values_fill = 0, values_fn=sum)
```
```{r genre-cum-plot}
# running total of laureates per genre, reshaped to long form for plotting
genres_cum <- genres %>%
    mutate(across(!year, cumsum)) %>%
    pivot_longer(-year, names_to="genre", values_to="count") %>%
    mutate(genre=fct_reorder2(genre, year, count))

genres_cum %>%
    group_by(year) %>%
    # convert each year's cumulative counts to shares of that year's total
    mutate(count = count/sum(count)) %>%
    ggplot(aes(year, count, fill=genre)) +
        geom_area(position="stack") +
        scale_x_continuous(breaks=seq(1910, 2020, by=10)) +
        scale_y_continuous(labels=function (x) str_c(x * 100, "%")) +
        scale_fill_viridis_d() +
        coord_cartesian(expand=F) +
        labs(y="cumulative proportion of laureates' genres",
             title="Genres' cumulative share of the literature Nobel, 1901–2021",
             caption=str_wrap(
"Data from Wikipedia. The height of each strip represents the cumulative proportion of laureates up to that date who worked in that genre. N.B. when a writer worked in multiple genres, that writer's prize is counted multiple times."))
```
```{r genre-cum-facet}
genres_cum %>%
    filter(genre %in% c("fiction", "poetry", "non-fiction", "drama")) %>%
    ggplot(aes(year, count, fill=genre)) +
        geom_area() +
        facet_wrap(~ genre) +
        scale_fill_viridis_d(guide="none") +
        coord_cartesian(expand=F) +
        labs(y="cumulative count of laureates' genres",
             title="Cumulative tally of literature Nobel laureates' genres",
             caption="Data as above, for the four most numerous genres only.")
```