@stephlocke
Created July 8, 2018 11:41
Scrape and consolidate the year superheroes made their debut in comics
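library(tidyverse)  # dplyr, purrr, stringr and the %>% pipe
library(rvest)      # read_html(), html_nodes(), html_table()

"https://en.wikipedia.org/wiki/List_of_superhero_debuts" %>%
  read_html() %>%
  # One node per table in the article body
  html_nodes(xpath = "//*[@id='mw-content-text']/div/table") %>%
  map(html_table, fill = TRUE) %>%
  # Coerce the year column to character so the tables can be row-bound
  map_df(~ mutate(., `Year Debuted` = as.character(`Year Debuted`))) %>%
  # The name column has a different header depending on the table
  mutate(Char_Team = coalesce(`Character / Team / Series`, `Character / Team`)) %>%
  select(Char_Team, Year_Debut = `Year Debuted`) %>%
  # Keep the first four-digit year between 1800 and 2099
  mutate(Year = str_extract(Year_Debut, "18[0-9]{2}|19[0-9]{2}|20[0-9]{2}")) %>%
  filter(!is.na(Year)) %>%
  View()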
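
The year-extraction step can be tried on its own without hitting Wikipedia. The sketch below assumes a handful of made-up `Year Debuted` values (they are hypothetical and not taken from the article's tables) and applies the same pattern with `str_extract()`, which returns the first match per string or `NA` when nothing matches.

```r
library(stringr)

# Hypothetical sample values, not taken from the Wikipedia tables
year_debut <- c("1938 (June)", "April 1940", "c. 1896", "Unknown", "2004-2005")

# Same pattern as the pipeline: a four-digit year starting with 18, 19 or 20
year_pattern <- "18[0-9]{2}|19[0-9]{2}|20[0-9]{2}"

# str_extract() keeps only the first match in each string
years <- str_extract(year_debut, year_pattern)
years
```

Entries with no recognisable year come back as `NA`, which is what the `filter(!is.na(Year))` step then drops.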