@giocomai
Created November 15, 2022 09:02
Are movies that win the Oscars really longer than in the past? Nah... (slow on first run, but data are cached locally)
# https://en.wikipedia.org/wiki/List_of_Academy_Award-winning_films
library("dplyr", warn.conflicts = FALSE)
library("tidywikidatar")
library("ggplot2")
# https://github.com/ivelasq/severance
library(severance)
tw_set_language(language = "en")
tw_enable_cache()
# set the cache folder before creating it, so that the custom path is the one created
tw_set_cache_folder(path = fs::path(fs::path_home_r(),
                                    "R",
                                    "tw_data_oscar"))
tw_create_cache_folder()
# collect the Wikidata items linked from the Wikipedia list page
linked_df <- tw_get_wikipedia_page_links(url = "https://en.wikipedia.org/wiki/List_of_Academy_Award-winning_films")

# for each item, retrieve label, publication date (P577) and duration (P2047)
duration_df <- linked_df %>%
  dplyr::select("qid") %>%
  dplyr::distinct(qid, .keep_all = TRUE) %>%
  dplyr::mutate(title = tw_get_label(id = qid),
                date = tw_get_p1(id = qid, p = "P577"),
                duration = tw_get_p1(id = qid, p = "P2047")) %>%
  dplyr::filter(!is.na(date)) %>%
  # keep the first run of digits: the year for dates, the value for durations
  dplyr::mutate(date = stringr::str_extract(string = date, pattern = "[[:digit:]]+") %>% as.numeric(),
                duration = stringr::str_extract(string = duration, pattern = "[[:digit:]]+") %>% as.numeric()) %>%
  dplyr::filter(duration > 60)
duration_df %>%
  #dplyr::filter(date>1945) %>%
  ggplot(mapping = aes(x = date, y = duration)) +
  geom_point() +
  geom_smooth(method = "lm")
duration_df %>%
  dplyr::mutate(decade = stringr::str_extract(string = date, pattern = "[[:digit:]][[:digit:]][[:digit:]]")) %>%
  dplyr::group_by(decade) %>%
  dplyr::summarise(median_by_decade = median(duration)) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(decade = stringr::str_c(decade, "0s")) %>%
  dplyr::arrange(dplyr::desc(median_by_decade)) %>%
  #dplyr::filter(date>1945) %>%
  ggplot(mapping = aes(x = median_by_decade,
                       y = decade,
                       label = scales::number(median_by_decade, accuracy = 1),
                       fill = decade)) +
  geom_col() +
  geom_text(hjust = 1.2, colour = "white", family = "Roboto Condensed") +
  scale_y_discrete("") +
  scale_x_continuous(name = "") +
  scale_fill_manual(values = rep(x = severance_palette("Jazz02"), 2)) +
  #scale_colour_identity() +
  theme_minimal(base_family = "Roboto Condensed",
                base_size = 16) +
  theme(legend.position = "none") +
  labs(title = "Median running time of Academy Award-winning films",
       subtitle = "Only films longer than 60 minutes included.\nDuration expressed in minutes. Colour of bars by decade.",
       caption = "Source: https://en.wikipedia.org/wiki/List_of_Academy_Award-winning_films")

ggsave(filename = "duration_of_films_barchart.png",
       width = 7, height = 7, bg = "white")
duration_df %>%
  dplyr::mutate(decade = stringr::str_extract(string = date, pattern = "[[:digit:]][[:digit:]][[:digit:]]")) %>%
  dplyr::group_by(decade) %>%
  dplyr::mutate(median_by_decade = median(duration)) %>%
  dplyr::ungroup() %>%
  dplyr::mutate(decade = stringr::str_c(decade, "0s")) %>%
  dplyr::arrange(dplyr::desc(median_by_decade)) %>%
  dplyr::filter(duration < 300) %>%
  #dplyr::filter(date>1945) %>%
  ggplot(mapping = aes(x = duration,
                       y = date,
                       label = scales::number(median_by_decade, accuracy = 1),
                       colour = decade)) +
  # one dot per film, so the title below refers to running time, not to the median
  geom_point() +
  #geom_text(hjust = 1.2, colour = "white", family = "Roboto Condensed") +
  scale_y_continuous("", n.breaks = 10) +
  scale_x_continuous(name = "") +
  scale_colour_manual(values = rep(x = severance_palette("Jazz02"), 2)) +
  #scale_colour_identity() +
  theme_minimal(base_family = "Roboto Condensed",
                base_size = 16) +
  theme(legend.position = "none") +
  labs(title = "Running time of Academy Award-winning films",
       subtitle = "Only films longer than 60 and shorter than 300 minutes included.\nDuration expressed in minutes. Colour of dots by decade.",
       caption = "Source: https://en.wikipedia.org/wiki/List_of_Academy_Award-winning_films")

ggsave(filename = "duration_of_films_scatter.png",
       width = 7, height = 7, bg = "white")
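
Both charts build their decade labels by taking the first three digits of the year and appending "0s". A minimal sketch of an arithmetic alternative, assuming numeric years; `decade_label()` is a hypothetical helper written for illustration, not part of the gist or of any package:

```r
# Hypothetical helper (not in the original gist): label a numeric year with its decade.
decade_label <- function(year) {
  # integer division by 10 drops the last digit of the year (1994 becomes 199);
  # missing years stay missing instead of turning into the string "NA0s"
  ifelse(is.na(year), NA_character_, paste0(year %/% 10, "0s"))
}

decade_label(c(1929, 1994, 2010))
```

Working on the number rather than on its string form avoids the implicit numeric-to-character conversion that the regular expression relies on; whether that is worth an extra helper in a short script is a matter of taste.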
@giocomai

Resulting charts:

duration_of_films_barchart
duration_of_films_scatter
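
The first plot in the script draws a linear fit with `geom_smooth(method = "lm")`, which shows the trend only visually. A rough sketch of how the same fit could be read as a number, assuming a data frame with numeric `date` and `duration` columns like `duration_df`; `trend_per_decade()` and `toy_df` are hypothetical names, and the toy values are made up purely to show the call:

```r
# Hypothetical helper (not in the original gist): slope of a linear fit of
# duration on year, expressed as minutes of change per decade.
trend_per_decade <- function(df) {
  fit <- lm(duration ~ date, data = df)
  # the "date" coefficient is minutes per year; multiply by 10 for a decade
  unname(coef(fit)["date"]) * 10
}

# With the gist's data the call would be: trend_per_decade(duration_df)
# Toy input, invented only to illustrate usage (not real film data):
toy_df <- data.frame(date = c(1950, 1960, 1970, 1980),
                     duration = c(100, 105, 110, 115))
trend_per_decade(toy_df)
```

A slope close to zero on the real data would be consistent with the "Nah..." in the description, but the sketch makes no claim about what the actual value is.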