@jroberayalas
Last active March 18, 2019 11:06
# Load packages
library(rvest)
library(xml2)  # xml_contents() lives in xml2, which newer rvest releases no longer attach
library(stringr)
library(dplyr)
library(lubridate)
library(readr)
# Read web page
webpage <- read_html("https://www.nytimes.com/interactive/2017/06/23/opinion/trumps-lies.html")
# Extract records info
results <- webpage %>% html_nodes(".short-desc")
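# Build the dataset: one single-row tibble per record
records <- vector("list", length = length(results))
for (i in seq_along(results)) {
  # The date sits in the <strong> tag without a year; this script assumes 2017
  date <- str_c(results[i] %>%
                  html_nodes("strong") %>%
                  html_text(trim = TRUE), ", 2017")
  # The statement is the second child node; str_sub() drops its first and last character
  lie <- str_sub(xml_contents(results[i])[2] %>% html_text(trim = TRUE), 2, -2)
  # The rebuttal is in .short-truth; again drop the first and last character
  explanation <- str_sub(results[i] %>%
                           html_nodes(".short-truth") %>%
                           html_text(trim = TRUE), 2, -2)
  # Link to the supporting source
  url <- results[i] %>% html_nodes("a") %>% html_attr("href")
  # tibble() replaces the deprecated data_frame()
  records[[i]] <- tibble(date = date, lie = lie, explanation = explanation, url = url)
}
# Combine the per-record tibbles into one data frame
df <- bind_rows(records)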
# Parse the "Mon. DD, YYYY" strings into Date objects
df$date <- mdy(df$date)
# Export to csv
write_csv(df, "trump_lies.csv")