4/18/20 Biden Campaign Ad Transcript
#!/usr/bin/env python
# coding: utf-8
import matplotlib as mpl
import nltk
import pandas as pd
import re
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
porter = PorterStemmer()
# Load transcript of ad: https://www.youtube.com/watch?v=PmieUrXwKCc
df = pd.read_csv('20200418_biden_ad_transcript.csv')
# Join each entry of transcript text to one string
full_text = ' '.join(x.lower() for x in list(df['Text']))
# Split the text into word and punctuation tokens for NLTK processing
tokens = nltk.word_tokenize(full_text)
# Built-in stemmers didn't recognize "Chinese" and "China" as having a common stem.
tokens = [re.sub('chinese', 'china', token) for token in tokens]
# Remove stop words (e.g. "the" or "from")
stop_words = set(stopwords.words('english'))
tokens_filtered = [word for word in tokens if word not in stop_words]
# Remove punctuation
words = [token for token in tokens_filtered if token.isalpha()]
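# There's a better way of doing this, but I'm manually replacing each "joe" or "biden" token with "joe biden" so both count as a single entity.
# Note: each token is substituted separately, so a full "joe biden" mention yields two "joe biden" entries.
words = [re.sub('joe|biden', 'joe biden', x) for x in words]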
# Produce a frequency distribution of words in text
fd = nltk.FreqDist(words)
# Plot the 20 most common words
fd.plot(20, title='Word Frequency in 4/18/20 Biden Ad')
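The comment in the script notes that there is a better way to treat "joe" and "biden" as one entity. A minimal sketch of one option follows, in plain Python rather than NLTK: adjacent "joe" "biden" tokens are merged into one entry, and a lone "joe" or "biden" is mapped to the same entry, so a full-name mention is counted once. The `merge_name` helper and the `sample` token list are hypothetical and not part of the original script.

```python
from collections import Counter


# Hypothetical helper, not part of the original script.
def merge_name(tokens, first="joe", last="biden"):
    """Collapse 'joe biden', a lone 'joe', or a lone 'biden' into one token."""
    full = f"{first} {last}"
    merged = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == first and i + 1 < len(tokens) and tokens[i + 1] == last:
            # Full name: consume both tokens, emit a single entry
            merged.append(full)
            i += 2
        elif token in (first, last):
            # Partial mention: normalize to the same entry
            merged.append(full)
            i += 1
        else:
            merged.append(token)
            i += 1
    return merged


# Illustrative tokens only (assumed to be lowercased and already filtered)
sample = ["joe", "biden", "warned", "nation", "biden", "told", "trump"]
counts = Counter(merge_name(sample))
print(counts["joe biden"])
```

In the script, this could stand in for the `re.sub` step, with the resulting list passed to `nltk.FreqDist` as before.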
Start Stop Text Speaker
00:00:00 00:00:21 He failed to act. So now, Trump and his allies are launching negative attacks against Joe Biden to hide the truth. Here are the facts. Joe Biden warned the nation in January that Trump had left us unprepared for a pandemic. Then Biden told Trump he should insist on having American health experts on the ground in China. Narrator
00:00:21 00:00:30 I would be on the phone with China and making it clear we are going to need to be in your country. You have to be open. You have to be clear. We have to know what's going on. Joe Biden
00:00:31 00:00:34 But Trump rolled over for the Chinese. He took their word for it. Narrator
00:00:34 00:00:42 The president tweeted. China has been working very hard to contain the coronavirus. The United States greatly appreciates their efforts and transparency. Newscast
00:00:42 00:00:47 China I spoke with President Xi and they're working very, very hard. And I think it's going to all work out fine. Donald Trump
00:00:47 00:00:55 Trump praised the Chinese 15 times in January and February as the Coronavirus spread across the world. Narrator
00:00:55 00:00:57 It's a tough situation. I think they're doing a very good job. Donald Trump
00:01:01 00:01:03 I think that China will do a very good job. Donald Trump
00:01:03 00:01:33 Trump never got a CDC team on the ground in China and the travel ban he brags about? Trump let in 40000 travelers from China into America after he signed it. Not exactly airtight. Look around: 22 million Americans are out of work. And we have more officially reported cases and deaths than any other country. Donald Trump left this country unprepared and unprotected for the worst public health and economic crisis in our lifetime. Narrator
00:01:33 00:01:39 And now we are paying the price. All the negative ads in the world can't change the truth. Narrator