Matt Rosinski (machinatoonist)
machinatoonist / scrape_pm_media.R
Last active Jul 24, 2021
How to Create a Large Structured Text Dataset Using R
# How to analyse 2.2 million words from 786 speeches and interviews given
# by the Prime Minister of Australia between January 2020 and July 2021.
# After hearing one of the PM's speeches, I began to wonder whether
# transcripts of all his speeches and interviews were publicly available.
# It turns out they are! This code extract shows how I scraped the full
# text of all 786 speeches and interviews using the rvest package, which is
# installed with the tidyverse (load it separately with library(rvest)).
# For the analysis itself, I recommend Julia Silge's tidytext tools.
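As a minimal sketch of the workflow these comments describe (collect transcript links, parse each page with rvest, then count words with tidytext), the block below runs offline on made-up HTML snippets. The markup, CSS selectors, link paths and the helper parse_transcript() are all hypothetical and are not the real layout of the PM's transcript site; against live pages you would pass real URLs to read_html() instead.

```r
# Sketch only: the page layout, CSS selectors and link paths below are
# hypothetical stand-ins, not the real structure of the PM's transcript site.
library(rvest)     # read_html(), html_element(s)(), html_text2(), html_attr()
library(dplyr)     # tibble(), bind_rows(), anti_join(), count(), %>%
library(tidytext)  # unnest_tokens(), stop_words

# Step 1: collect transcript links from an index page (hypothetical markup).
# Relative links would need xml2::url_absolute() before fetching them.
index_html <- '<html><body><ul>
  <li><a href="/media/press-conference">Press conference</a></li>
  <li><a href="/media/interview">Interview</a></li>
</ul></body></html>'
links <- read_html(index_html) %>% html_elements("a") %>% html_attr("href")

# Step 2 (hypothetical helper): title, date and body text from one page.
parse_transcript <- function(page) {
  tibble(
    title = page %>% html_element("h1") %>% html_text2(),
    date  = page %>% html_element(".date") %>% html_text2(),
    text  = page %>% html_elements("div.transcript p") %>% html_text2() %>%
      paste(collapse = "\n")
  )
}

# Offline stand-ins for the pages behind `links`. Against a live site you
# would call read_html(url) for each link and pause between requests,
# e.g. with Sys.sleep(1).
pages <- c(
  '<html><body><h1>Press conference</h1><p class="date">1 July 2021</p>
   <div class="transcript"><p>Good morning everyone.</p>
   <p>Thanks for coming this morning.</p></div></body></html>',
  '<html><body><h1>Interview</h1><p class="date">2 July 2021</p>
   <div class="transcript"><p>Good to be with you this morning.</p></div>
   </body></html>'
)
transcripts <- bind_rows(lapply(pages, function(html) parse_transcript(read_html(html))))

# Step 3: one row per word, stop words removed, most frequent words first.
word_counts <- transcripts %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)
print(word_counts)
```

The result counts words across all transcripts; using count(title, word, sort = TRUE) instead would give per-speech frequencies.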