- Download the file `ExtractedTweets.csv` programmatically from this website: https://www.kaggle.com/kapastor/democratvsrepublicantweets#ExtractedTweets.csv
- Find the word distribution for each party using `CountVectorizer`
- Make a histogram of the top 10 most used words for each party
- Find the total word distribution using `CountVectorizer`
- Plot a histogram of the top 10 most used words in total
- Plot the number of tweets over time, so that time is on the x-axis and number of tweets is on the y-axis.
- Find the biggest peak in tweets and find out what they were tweeting about: was there a big event that prompted everyone to tweet? Hand in a description of what happened and a link to a major news site (BBC/CNN/Times/etc.)
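
As a starting point for the word-distribution tasks, the following is a minimal sketch of counting words with `CountVectorizer` and picking the most frequent ones. It runs on a tiny made-up list of (party, tweet) pairs rather than the real file; the helper name `top_words`, the toy rows, and the use of English stop-word filtering are all assumptions for illustration, not part of the assignment.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy data standing in for the rows of the real CSV.
rows = [
    ("Democrat", "health care for all"),
    ("Democrat", "protect health care now"),
    ("Republican", "tax cuts for families"),
    ("Republican", "tax reform and tax cuts"),
]


def top_words(texts, n=10):
    """Return the n most frequent words in texts as (word, count) pairs."""
    # Stop-word removal is an optional choice, not required by the task.
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(texts)         # sparse matrix: documents x vocabulary
    totals = np.asarray(counts.sum(axis=0)).ravel()  # total occurrences of each word
    # vocabulary_ maps each word to its column index; sort words by that index.
    vocab = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
    order = np.argsort(-totals, kind="stable")[:n]   # indices of the largest totals
    return [(vocab[i], int(totals[i])) for i in order]


# Word distribution per party.
dem_top = top_words([text for party, text in rows if party == "Democrat"])
rep_top = top_words([text for party, text in rows if party == "Republican"])

# Word distribution over all tweets.
all_top = top_words([text for _, text in rows])
```

Each result is a list of (word, count) pairs, so the words and counts can be passed to a bar plot (for example matplotlib's `plt.bar`) with the axes labelled.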
Is the data correctly and automatically downloaded?
Is the `CountVectorizer` used correctly?
Are the histograms correctly made and do they have labelled axes?
Are the tweets correctly counted over time? Does the plot correctly show the tweet count over time and does it include axis labels?
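
For the tweets-over-time check, one possible way to count tweets per day and locate the peak is sketched below with pandas. The data is made up, and the column name `created_at` is hypothetical; whether and under what name the real data carries a timestamp needs to be checked first.

```python
import pandas as pd

# Hypothetical toy data; "created_at" is an assumed column name.
tweets = pd.DataFrame({
    "created_at": [
        "2018-05-01 09:00",
        "2018-05-01 17:30",
        "2018-05-02 08:15",
        "2018-05-04 12:00",
    ],
    "text": ["a", "b", "c", "d"],
})
tweets["created_at"] = pd.to_datetime(tweets["created_at"])

# Count tweets per calendar day; days without tweets show up as 0.
per_day = tweets.set_index("created_at").resample("D").size()

# The day with the most tweets is the peak to investigate.
peak_day = per_day.idxmax()
```

Here `per_day` is a Series indexed by date, so plotting it puts time on the x-axis and the number of tweets on the y-axis.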