@Jegp
Forked from NikolajX4000/Question.md
Last active April 4, 2019 06:57
Prepared question for assignment 8

Assignment 8

Questions

Part 1

  1. Download the file ExtractedTweets.csv programmatically from this website: https://www.kaggle.com/kapastor/democratvsrepublicantweets#ExtractedTweets.csv
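
One possible way to approach the download is sketched below. It assumes the official `kaggle` Python package is installed and that API credentials are available (for example in `~/.kaggle/kaggle.json`), since Kaggle does not serve dataset files to anonymous requests. The helper names `parse_kaggle_url` and `download_tweets` and the `target_dir` parameter are made up for this illustration and are not part of the assignment.

```python
from pathlib import Path
from urllib.parse import urlparse

DATASET_URL = (
    "https://www.kaggle.com/kapastor/democratvsrepublicantweets#ExtractedTweets.csv"
)


def parse_kaggle_url(url):
    """Split a Kaggle dataset URL into ("owner/dataset", "file name")."""
    parts = urlparse(url)
    # The path holds "owner/dataset"; the fragment after '#' names the file.
    return parts.path.strip("/"), parts.fragment


def download_tweets(url=DATASET_URL, target_dir="data"):
    """Download and unzip the dataset; return the expected path of the CSV."""
    # Imported lazily (assumption: the kaggle package may try to read
    # credentials as soon as it is imported).
    from kaggle.api.kaggle_api_extended import KaggleApi

    slug, file_name = parse_kaggle_url(url)
    api = KaggleApi()
    api.authenticate()  # uses ~/.kaggle/kaggle.json or KAGGLE_USERNAME/KAGGLE_KEY
    api.dataset_download_files(slug, path=target_dir, unzip=True)
    return Path(target_dir) / file_name
```

Calling `download_tweets()` should leave the file under `data/ExtractedTweets.csv`, which can then be read with `pandas.read_csv`.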

Part 2

  1. Find the word distribution for each party using CountVectorizer
     • Make a histogram of the top 10 most used words for each party
  2. Find the total word distribution using CountVectorizer
     • Plot a histogram of the top 10 most used words in total

Part 3

  1. Plot the number of tweets over time, so that time is on the x-axis and number of tweets is on the y-axis.
  2. Find the biggest peak in the number of tweets and find out what people were tweeting about at that time: was there a big event that made everyone tweet? Hand in a description of what happened and a link to a major news site (BBC/CNN/Times/etc.)
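
A rough sketch of the counting and peak-finding steps is given below. It assumes every tweet carries a timestamp in a column called `created_at`; that column name is hypothetical, so check which columns the CSV actually provides. Counting per day is also just one choice of resolution, and the function names are invented for this illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd


def tweets_per_day(df, time_col="created_at"):
    """Count tweets per calendar day; days without tweets count as 0."""
    # time_col is a hypothetical column name holding each tweet's timestamp.
    times = pd.DatetimeIndex(pd.to_datetime(df[time_col]))
    return pd.Series(1, index=times).resample("D").sum()


def busiest_day(counts):
    """Return the day with the most tweets and the number of tweets on it."""
    day = counts.idxmax()
    return day, int(counts.loc[day])


def plot_tweet_counts(counts):
    """Plot tweet counts with time on the x-axis and counts on the y-axis."""
    fig, ax = plt.subplots()
    ax.plot(counts.index, counts.values)
    ax.set_xlabel("Date")
    ax.set_ylabel("Number of tweets")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return ax
```

Once the busiest day is known, filtering the DataFrame to that date and reusing a word count on those tweets is one way to see what they were about.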

Review Questions

Part 1

Is the data correctly and automatically downloaded?

Part 2

Is the CountVectorizer used correctly? Are the histograms made correctly and do they have labelled axes?

Part 3

Are the tweets correctly counted over time? Does the plot correctly show the tweet count over time and does it include axis labels?
