asaini/twitter2

## twitter2
{
 "metadata": {
  "name": "twitter2"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "<h1><center>Tweet Sentiment Analysis</center></h1>"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "<center><img src=\"https://blog.twitter.com/sites/all/themes/gazebo/img/twitter-bird-white-on-blue.png\"></img></center>"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "<b>Sentiment analysis</b> aims to determine the attitude of a speaker or a writer with respect to some topic or the overall <i>contextual polarity</i> of a document. The attitude may be his or her judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader). <sup><a href=\"http://en.wikipedia.org/wiki/Sentiment_analysis\">1</a></sup>  \n\nIn this tutorial, we will attempt to assign sentiment scores to tweets. Our goal in this exercise is to devise a way of assigning a score to a tweet such that a score > 0 implies positive sentiment and score < 0 implies negative sentiment. "
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "Open a Python shell in the directory where you have saved <em>search.py</em>  \n  \n    $ python"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "<code>import</code> the search module"
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "import search",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 10
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "In the previous exercise, we had created the function, <code>printTweets()</code> which takes an input a query string and returns a list of tweets as well as prints them out. Test your function on a few inputs. "
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "tweets = search.printTweets('harlem nyc')",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "Harlem might be the coolest place to party in nyc.\nRT @TwinsProd_fr1: #RT #Video\n@robcashkar \nProduced by TwinsProd\nhttp://t.co/7QH6G7kQyi\n#KAR #Nyc #Harlem #Paris #France\n#Cashdeposit #KAR_\u2026\nNewly-planted mums, Spanish Harlem NYC http://t.co/5k4AVJR6lk\nHere are a couple recommendations for uptown coffee places I recommend #WaHi #Harlem #NYC #NationalCoffeeDay http://t.co/FotqOC8znN\nLittle altars everywhere, Spanish Harlem NYC http://t.co/Vtws9JZfuk\n#harlem #nyc @camrae52 @verifiedhoney \u270c\ufe0f\u2764\ufe0f#baggageclaim #amc14 http://t.co/5CFRhDYydF\n\"Puerto Rico,\" Spanish Harlem NYC http://t.co/x1elguoowE\nMe &amp; @camrae52 I appreciate you love thank you \u2764\ufe0f\u2764\ufe0f\u2764\ufe0f#nyc #baggageclaim #harlem #trey #tan http://t.co/xisiB0uJcs\nRT @apadillafilm6: These guys outside my window have played drums badly every sunny Sunday since I was born #Spanish Harlem #NYC\nThese guys outside my window have played drums badly every sunny Sunday since I was born #Spanish Harlem #NYC\nRalina Cardona to run on GOP line &amp; take on @MMViverito in November RPT: Sybile Phn #East Harlem #South Bronx #NYC http://t.co/YlmCuae3Kw\nCommented on WingitApp http://t.co/IEXVruKVC8 - RT @Kingibb First time @HarlemTavern #picstitch #harlem #nyc #food #gumbo #wings #burge...\nIt's so #fun being #twins! twopointoh and I get each other. #grateful #harlem #nyc #spelhouse Sunday\u2026 http://t.co/2FsMGeRLYi\nFurthest Thing From Perfect. #NoFilter #NoEdit #Nikon #NikonD3000 #ImjustDifferent #Ls #NYC #Harlem http://t.co/SLkx9zOWHM\nPhoto: #nyc #food #nom #yum (at Harlem Public) http://t.co/Q2dY3gsSx6\n"
      }
     ],
     "prompt_number": 21
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "By default, Twitter returns 15 Tweets for every call made to the search API. We can change this default value by adding a count parameter as shown in the Search API documentation, https://dev.twitter.com/docs/api/1.1/get/search/tweets  \nLet us make the change in our <code>printTweets()</code> function, and add a count parameter."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "def printTweets(query):\n  params =  { \n\t\t\t'q'     :query,\n\t\t\t'lang'  : 'en',\n            'count' : 100, \n            }\n    \n  # A list to store the Tweet text string\n  text = []\n\n  response = fetchsamples(params)\n  data = response.read()\n  j = json.loads(data)\n  tweets = j['statuses']\n  for tweet in tweets:\n\ttext.append(tweet['text'])\n\tprint tweet['text']\n\t\n  return text",
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "<h2>Sentiment Scores</h2>"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "We are interested in assigning sentiment scores to each tweet. Consider the following tweet:  \n\n<em>\"Happy World Peace Day! We can't think of a better song to listen to than John Lennon's \"Imagine\"</em>\n\nWe can see that this looks like a positive tweet because it contains words like \"happy\", \"peace\", \"better\". \n\nIn contrast, the following tweet  \n\n<em>\"War mutilates the soldiers' psyches, and the media turns a blind eye so as to not remind the masses what war is like.\"</em>\n\ncontains words like \"war\" and \"mutilate\" which do not convey a positive sentiment. \n\n<strong>What if we can assign scores to each word in a tweet? </strong>  \n\nAs it happens, we can do that. <a href=\"http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010\">AFINN</a> is a dictionary compiled by researchers at Informatics and Mathematical Modelling Institute at Technical University of Denmark, which assign sentiment scores to commonly occuring words in the english lexicon. Listed below are the first few words and their corresponding scores from that dictionary..."
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "<pre>\nabandon         -2\nabandoned       -2\nabandons        -2\nabducted        -2\nabduction       -2\nabductions      -2\nabhor           -3\nabhorred        -3\nabhorrent       -3\nabhors          -3\nabilities        2\nability          2\naboard           1\nabsentee        -1\nabsentees       -1\nabsolve          2\nabsolved         2\nabsolves         2\nabsolving        2\nabsorbed         1</pre>"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "Let us read this sentiment file and store its contents in a dictinary. We will create a function called <code>loadSentiments()</code> which reads in the sentiment file and stores its contents. Each line in the sentiment file contains a word, followed by a <em>tab space</em>, followed by the sentiment score. To obtain the word and the corresponding score from each line, we can use the <code>split()</code> function and split on the basis of tab space."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "def loadSentiments():\n    sentiments = {}\n    f = open('AFINN-111.txt')\n    for line in f.readlines():\n        word, score = line.split('\\t')\n        score = int(score)\n        sentiments[word] = score\n    return sentiments        ",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 7
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "<h3>Scoring Method</h3>"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "The sentiment score of a tweet will be the sum of sentiment scores assigned to the words present in the tweet.  \nFor each word, we will lookup the word in our dictionary. If we find that the word is not present in our dictionary, we will assign it a score of zero. We had previously fetched tweets using our <code>printTweets()</code> function and stored the result in a list called <code>tweets</code>"
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "tweets",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 26,
       "text": "['Harlem might be the coolest place to party in nyc.',\n u'RT @TwinsProd_fr1: #RT #Video\\n@robcashkar \\nProduced by TwinsProd\\nhttp://t.co/7QH6G7kQyi\\n#KAR #Nyc #Harlem #Paris #France\\n#Cashdeposit #KAR_\\u2026',\n 'Newly-planted mums, Spanish Harlem NYC http://t.co/5k4AVJR6lk',\n 'Here are a couple recommendations for uptown coffee places I recommend #WaHi #Harlem #NYC #NationalCoffeeDay http://t.co/FotqOC8znN',\n 'Little altars everywhere, Spanish Harlem NYC http://t.co/Vtws9JZfuk',\n u'#harlem #nyc @camrae52 @verifiedhoney \\u270c\\ufe0f\\u2764\\ufe0f#baggageclaim #amc14 http://t.co/5CFRhDYydF',\n '\"Puerto Rico,\" Spanish Harlem NYC http://t.co/x1elguoowE',\n u'Me &amp; @camrae52 I appreciate you love thank you \\u2764\\ufe0f\\u2764\\ufe0f\\u2764\\ufe0f#nyc #baggageclaim #harlem #trey #tan http://t.co/xisiB0uJcs',\n 'RT @apadillafilm6: These guys outside my window have played drums badly every sunny Sunday since I was born #Spanish Harlem #NYC',\n 'These guys outside my window have played drums badly every sunny Sunday since I was born #Spanish Harlem #NYC',\n 'Ralina Cardona to run on GOP line &amp; take on @MMViverito in November RPT: Sybile Phn #East Harlem #South Bronx #NYC http://t.co/YlmCuae3Kw',\n 'Commented on WingitApp http://t.co/IEXVruKVC8 - RT @Kingibb First time @HarlemTavern #picstitch #harlem #nyc #food #gumbo #wings #burge...',\n u\"It's so #fun being #twins! twopointoh and I get each other. #grateful #harlem #nyc #spelhouse Sunday\\u2026 http://t.co/2FsMGeRLYi\",\n 'Furthest Thing From Perfect. #NoFilter #NoEdit #Nikon #NikonD3000 #ImjustDifferent #Ls #NYC #Harlem http://t.co/SLkx9zOWHM',\n 'Photo: #nyc #food #nom #yum (at Harlem Public) http://t.co/Q2dY3gsSx6']"
      }
     ],
     "prompt_number": 26
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "The words in the tweet are a combination of both lowecase and uppercased letters. Since our dictionary only contains lowercased words, we need to lowercase the text present in the tweets. Since tweets are stored as strings we can call the string's <a href=\"http://docs.python.org/2/library/stdtypes.html#str.lower\">lower()</a> method to lowercase the tweet. "
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "tweet = tweets[0]\nprint tweet\nprint tweet.lower()",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "Harlem might be the coolest place to party in nyc.\nharlem might be the coolest place to party in nyc.\n"
      }
     ],
     "prompt_number": 23
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "In the <em>Read and Write</em> exercise, we had used the <code>split()</code> function to break a sentence into words. We will use it here to break up our tweet into words."
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "words = tweet.lower().split()",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 24
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "words",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 25,
       "text": "['harlem',\n 'might',\n 'be',\n 'the',\n 'coolest',\n 'place',\n 'to',\n 'party',\n 'in',\n 'nyc.']"
      }
     ],
     "prompt_number": 25
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "sentiments = loadSentiments()",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 8
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "totalScore = 0\nfor word in words:\n    totalScore += sentiments.get(word, 0)\nprint totalScore",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "0\n"
      }
     ],
     "prompt_number": 9
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "Let us put all what we wrote into a function called <code>printScores()</code>, which takes as input the following:  \n\n1. List of Tweets\n2. Sentiment Dictionary\n\nAnd prints the tweet and the sentiment score for each tweet"
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "def printScores(tweets, sent):\n\tfor tweet in tweets:\n\t\tscore = 0\n\t\ttweet_l = tweet.lower()\n\t\twords = tweet_l.split(' ')\n\t\tfor word in words:\n\t\t\tscore += sent.get(word, 0)\n\t\tprint tweet + '\\t' + str(score)",
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 5
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": "printScores(tweets, sentiments)",
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": "Harlem might be the coolest place to party in nyc.\t0\nRT @TwinsProd_fr1: #RT #Video\n@robcashkar \nProduced by TwinsProd\nhttp://t.co/7QH6G7kQyi\n#KAR #Nyc #Harlem #Paris #France\n#Cashdeposit #KAR_\u2026\t0\nNewly-planted mums, Spanish Harlem NYC http://t.co/5k4AVJR6lk\t0\nHere are a couple recommendations for uptown coffee places I recommend #WaHi #Harlem #NYC #NationalCoffeeDay http://t.co/FotqOC8znN\t2\nLittle altars everywhere, Spanish Harlem NYC http://t.co/Vtws9JZfuk\t0\n#harlem #nyc @camrae52 @verifiedhoney \u270c\ufe0f\u2764\ufe0f#baggageclaim #amc14 http://t.co/5CFRhDYydF\t0\n\"Puerto Rico,\" Spanish Harlem NYC http://t.co/x1elguoowE\t0\nMe &amp; @camrae52 I appreciate you love thank you \u2764\ufe0f\u2764\ufe0f\u2764\ufe0f#nyc #baggageclaim #harlem #trey #tan http://t.co/xisiB0uJcs\t7\nRT @apadillafilm6: These guys outside my window have played drums badly every sunny Sunday since I was born #Spanish Harlem #NYC\t-3\nThese guys outside my window have played drums badly every sunny Sunday since I was born #Spanish Harlem #NYC\t-3\nRalina Cardona to run on GOP line &amp; take on @MMViverito in November RPT: Sybile Phn #East Harlem #South Bronx #NYC http://t.co/YlmCuae3Kw\t0\nCommented on WingitApp http://t.co/IEXVruKVC8 - RT @Kingibb First time @HarlemTavern #picstitch #harlem #nyc #food #gumbo #wings #burge...\t0\nIt's so #fun being #twins! twopointoh and I get each other. #grateful #harlem #nyc #spelhouse Sunday\u2026 http://t.co/2FsMGeRLYi\t0\nFurthest Thing From Perfect. #NoFilter #NoEdit #Nikon #NikonD3000 #ImjustDifferent #Ls #NYC #Harlem http://t.co/SLkx9zOWHM\t0\nPhoto: #nyc #food #nom #yum (at Harlem Public) http://t.co/Q2dY3gsSx6\t0\n"
      }
     ],
     "prompt_number": 22
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "Lets take a look at some of the scores generated  \n\n<em>Me &amp; @camrae52 I appreciate you love thank you</em> ---> high positive score  \n<em>These guys outside my window have played drums badly every sunny Sunday since I was born ...</em> ---> negative score  \n<em>Harlem might be the coolest place to party in nyc.</em> ---> neutral score, even though it sounds like it should have had a positive score  \n\nOur method doesn't do so well for all tweets since our <em>sentiment score dictionary</em> has a limited number of entries, while the tweets we are trying to score can have words which don't fall in our dictionary. However, it seems to work well for tweets which have words which express strong sentiments.\n"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "<h2>Additional Exercises</h2>"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "<strong>Getting more than a 100 tweets</strong>"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "Using the current <code>printTweets()</code> function, we can only get a maximum of 100 tweets. Extend the functionality of <code>printTweets()</code> such that it returns <em>x</em> tweets, where <em>x</em> is a parameter specifying the number of tweets to be returned.  \n\nConsult the Twitter Documentation at https://dev.twitter.com/docs/working-with-timelines, where they explain how to use the  <code>max_id</code> parameter."
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "<strong>CHALLENGE! Calculating Sentiment Scores across a topic</strong>"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "Can we sum up the sentiment scores for all the tweets returned for a topic and arrive at a sentiment score on a topic as a whole?"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "<strong>What do you need to submit?</strong>"
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": "Submit your <em>search.py</em> script which contains the above enhancements, along with a 1 page writeup which explains the results you obtained."
    }
   ],
   "metadata": {}
  }
 ]
}
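
The scoring method described in the notebook (load a tab-separated word list into a dictionary, lowercase and split each tweet, sum the scores of the words found) can be sketched as a standalone script. This is only a minimal sketch, assuming Python 3, whereas the notebook's cells use Python 2 `print` statements. The names `load_sentiments`, `score_tweet`, and `SAMPLE_AFINN` and the sample tweets are hypothetical; the four dictionary entries are copied from the AFINN excerpt shown in the notebook.

```python
import io

# A few tab-separated entries copied from the AFINN excerpt in the notebook.
SAMPLE_AFINN = "abandon\t-2\nabhor\t-3\nability\t2\naboard\t1\n"


def load_sentiments(lines):
    """Build a {word: score} dict from 'word<TAB>score' lines."""
    sentiments = {}
    for line in lines:
        word, score = line.rstrip("\n").split("\t")
        sentiments[word] = int(score)
    return sentiments


def score_tweet(text, sentiments):
    """Sum the scores of the lowercased, whitespace-separated words in text."""
    return sum(sentiments.get(word, 0) for word in text.lower().split())


# io.StringIO stands in for the open AFINN file so the sketch is self-contained.
sentiments = load_sentiments(io.StringIO(SAMPLE_AFINN))

# Hypothetical tweets, made up for illustration.
print(score_tweet("I abhor delays", sentiments))                   # -3
print(score_tweet("Her ability got us aboard early", sentiments))  # 3
print(score_tweet("All ABOARD!", sentiments))                      # 0
```

The last call scores 0 because the token `aboard!` keeps its exclamation mark and so is not a dictionary key, which is the same effect that leaves `nyc.` unmatched in the notebook's example.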