@jemus42
Created June 9, 2021 10:34
Quick example of scraping tabular data from Wikipedia
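library(rvest)
library(dplyr)
library(stringr)

# Game of Thrones ----
got_wiki <- read_html("https://en.wikipedia.org/wiki/List_of_Game_of_Thrones_episodes") %>%
  # Parse every <table> on the page into a list of data frames.
  # `fill` is deprecated in rvest >= 1.0.0, where missing cells are always filled.
  html_table(fill = TRUE) %>%
  # Keep list elements 2 to 9 (the per-season episode tables when this was written)
  magrittr::extract(2:9) %>%
  bind_rows() %>%
  setNames(c(
    "episode_abs", "episode", "title", "director",
    "writer", "firstaired", "viewers"
  )) %>%
  select(-firstaired) %>%
  mutate(
    # Strip footnote markers such as "[20]" before converting to numeric
    viewers = str_replace_all(viewers, "\\[\\d+\\]", ""),
    viewers = as.numeric(viewers)
  ) %>%
  select(-episode, -title) %>%
  as_tibble()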
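
# A minimal sketch of the viewer-count cleanup step on its own, so the
# footnote regex can be checked offline. The sample values below are made up
# for illustration (they are not scraped data), and the names viewers_raw,
# viewers_clean and viewers_num are hypothetical.
library(stringr)

viewers_raw <- c("2.22[20]", "10.50[45]", "N/A")

# Remove bracketed numeric footnote markers, e.g. "[20]"
viewers_clean <- str_replace_all(viewers_raw, "\\[\\d+\\]", "")

# Entries that are not numbers (such as "N/A") become NA; as.numeric() would
# normally emit a coercion warning for them, which is silenced here.
viewers_num <- suppressWarnings(as.numeric(viewers_clean))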