@erikgahner
Created February 22, 2023 22:18
Getting 9/11 pager data into R
# tidyverse attaches dplyr, purrr, stringr and tibble; rvest parses the HTML
library("tidyverse")
library("rvest")
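# Collect the links to the .txt files listed on the WikiLeaks index page
wikileaks_html <- read_html("https://911.wikileaks.org/files/index.html")

hrefs <- wikileaks_html |>
  html_elements("a") |>
  html_attr("href")

# fixed() matches ".txt" literally; a bare ".txt" is a regex in which "."
# matches any character
wikileaks_urls <- tibble(value = hrefs) |>
  filter(str_detect(value, fixed(".txt"))) |>
  transmute(link = paste0("https://911.wikileaks.org/", value))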
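# Download every file; map() returns one character vector per file,
# and list_c() combines them into a single vector of lines
pager_list <- map(wikileaks_urls$link, readLines)
pager_text <- pager_list |>
  list_c()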
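# Save the raw lines locally; on later runs, start from the readLines() call
# below instead of downloading everything again
pager_text |>
  writeLines("911wikileaks.txt")
pager_text <- readLines("911wikileaks.txt")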
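# Split each line into date, time and message text. word() splits on single
# spaces; end = -1 keeps everything from the ninth field through the last one
pager_df <- tibble(pager_text) |>
  mutate(
    day = word(pager_text, 1),
    time = word(pager_text, 2),
    text = word(pager_text, 9, -1)
  ) |>
  mutate(ymdhms = lubridate::ymd_hms(paste(day, time)))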
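The link filter in the scraping step is easy to get subtly wrong, because `str_detect()` treats its pattern as a regular expression. A minimal sketch on made-up hrefs (the values in `hrefs` are hypothetical, not taken from the WikiLeaks index) shows the difference between a bare `".txt"` and `fixed(".txt")`, and what the URL-building step then produces:

```r
library("stringr")
library("tibble")
library("dplyr")

# Hypothetical hrefs standing in for html_attr("href") output;
# NA mimics an <a> tag without an href attribute
hrefs <- c("messages/a.txt", "index.html", NA, "notes_txt.html", "messages/b.txt")

# As a regex, "." matches any character, so "_txt" in "notes_txt.html" matches too
regex_hits <- str_detect(hrefs, ".txt")
# fixed() matches the four characters ".txt" literally
literal_hits <- str_detect(hrefs, fixed(".txt"))

# Same filter-and-build pipeline as the script, applied to the toy hrefs
txt_links <- tibble(value = hrefs) |>
  filter(str_detect(value, fixed(".txt"))) |>
  transmute(link = paste0("https://911.wikileaks.org/", value))

print(txt_links$link)
```

Anchors without an `href` come back as `NA` from `html_attr()`, and `filter()` drops rows whose condition is `NA`, so they never reach `paste0()`.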
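The script writes the downloaded lines to `911wikileaks.txt` and reads them straight back, which only saves time if later runs skip the download. One way to make that explicit is a helper that downloads only when the cache file is missing. This is a sketch: `fetch_pager_lines()` is a hypothetical name, and `purrr::possibly()` is swapped in so that one failed download yields an empty vector instead of stopping the whole `map()`. Local temporary files stand in for the WikiLeaks URLs, since `readLines()` accepts both file paths and URLs.

```r
library("purrr")

# Hypothetical helper: read every URL (or file path) once, store the combined
# lines in cache_path, and on later calls read the cache instead
fetch_pager_lines <- function(urls, cache_path) {
  if (file.exists(cache_path)) {
    return(readLines(cache_path))
  }
  # possibly() turns an error from readLines() into character(0)
  safe_read <- possibly(readLines, otherwise = character(0))
  lines <- as.character(unlist(map(urls, safe_read)))
  writeLines(lines, cache_path)
  lines
}

# Local temporary files stand in for the WikiLeaks .txt URLs
tmp_dir <- tempfile("pager")
dir.create(tmp_dir)
file_a <- file.path(tmp_dir, "a.txt")
file_b <- file.path(tmp_dir, "b.txt")
writeLines(c("line 1", "line 2"), file_a)
writeLines("line 3", file_b)
cache <- file.path(tmp_dir, "cache.txt")

first_run <- fetch_pager_lines(c(file_a, file_b), cache)
unlink(c(file_a, file_b))  # sources are gone; the cache still answers
second_run <- fetch_pager_lines(c(file_a, file_b), cache)
```

Deleting the cache file forces a fresh download on the next call.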
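The parsing step depends on how `stringr::word()` counts words: it splits on every single space, so runs of spaces produce empty words, while `str_count(x, "\\S+")` counts only non-blank tokens. The sketch below uses two hypothetical lines (the `f3` to `f8` fields are placeholders, not the real pager layout) to compare an end index of `str_count(x, "\\S+") + 2` with `end = -1`, which `word()` interprets as the last word.

```r
library("stringr")
library("tibble")
library("dplyr")

# Hypothetical pager lines: date, time, six placeholder fields (f3 to f8),
# then the message; the second message contains doubled spaces
line_a <- "2001-09-11 08:46:00 f3 f4 f5 f6 f7 f8 call the office"
line_b <- "2001-09-11 08:46:00 f3 f4 f5 f6 f7 f8 call  the  office  now"

# End index based on the number of non-blank tokens plus two
count_based_a <- word(line_a, 9, str_count(line_a, "\\S+") + 2)
count_based_b <- word(line_b, 9, str_count(line_b, "\\S+") + 2)

# end = -1 counts back from the last word, so it always reaches the end
last_word_a <- word(line_a, 9, -1)
last_word_b <- word(line_b, 9, -1)

# The same parsing pipeline as the script, run on the toy lines
parsed <- tibble(pager_text = c(line_a, line_b)) |>
  mutate(
    day = word(pager_text, 1),
    time = word(pager_text, 2),
    text = word(pager_text, 9, -1),
    ymdhms = lubridate::ymd_hms(paste(day, time))
  )
print(parsed)
```

On the first line the count-based end index points past the last word, so `word()` returns `NA`; on the second it stops before the final word and drops `now`. Whether the real pager lines hit either case depends on their spacing, which this sketch does not reproduce.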