Skip to content

Instantly share code, notes, and snippets.

@bdupreez
Created September 8, 2013 07:04
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bdupreez/80b51b024d90d5989189 to your computer and use it in GitHub Desktop.
Save bdupreez/80b51b024d90d5989189 to your computer and use it in GitHub Desktop.
Twitter search stream, IPython Notebook
Display the source blob
Display the rendered blob
Raw
{
"metadata": {
"name": "TwitterSearchStream"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": "Twitter search stream"
},
{
"cell_type": "raw",
"metadata": {},
"source": "Using: Python Twitter Tools\nhttp://mike.verdone.ca/twitter/"
},
{
"cell_type": "raw",
"metadata": {},
"source": "Required imports"
},
{
"cell_type": "code",
"collapsed": false,
"input": "from time import time\nfrom twitter import Twitter, TwitterStream, OAuth\nimport re",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "raw",
"metadata": {},
"source": "Some regular expressions to remove urls, users and hashtags (can exclude if wanted)"
},
{
"cell_type": "code",
"collapsed": false,
"input": "http_pattern = re.compile('(https?://\\S+)', re.IGNORECASE)\nuser_pattern = re.compile('(@\\S+)', re.IGNORECASE)\nhash_pattern = re.compile('(#\\S+)', re.IGNORECASE)",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "raw",
"metadata": {},
"source": "The tokens are necessary for user authentication, you will need to create an application on https://dev.twitter.com/apps\nOnce you have the tokens... replace XXXXXX below.\n\nFunction returning an OAuth object"
},
{
"cell_type": "code",
"collapsed": false,
"input": "def twitter_auth():\n # these tokens are necessary for user authentication\n # (created within the twitter developer API pages)\n consumer_key = \"XXXXXX\"\n consumer_secret = \"XXXXXXXXXXXX\"\n access_key = \"XXXXXXXXXXXXXXXXXX\"\n access_secret = \"XXXXXXXXXXXXXXXXXX\"\n\n # create twitter API object\n auth = OAuth(access_key, access_secret, consumer_key, consumer_secret)\n return auth",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "raw",
"metadata": {},
"source": "Clean the text part of the tweet json string"
},
{
"cell_type": "code",
"collapsed": false,
"input": "def clean_tweet(status):\n tweet = status['text']\n tweet = re.sub(http_pattern, '', tweet)\n tweet = re.sub(user_pattern, '', tweet)\n tweet = re.sub(hash_pattern, '', tweet) \n #replace new lines, tabs, spaces with a space\n tweet = tweet.replace('\\n', ' ')\n tweet = tweet.replace('\\t', ' ')\n tweet = tweet.replace('\\\\s+', ' ')\n #can contain some weird char\n tweet = tweet.encode('ascii', 'ignore')\n return tweet",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "raw",
"metadata": {},
"source": "Create a twitter stream, search the statuses for you search terms (can use upto 400, before having to chat to twitter)"
},
{
"cell_type": "code",
"collapsed": false,
"input": "def download_stream(search_terms, filename,search_time=30):\n stream = TwitterStream(auth=twitter_auth(), secure=True)\n tweet_iter = stream.statuses.filter(track=search_terms)\n end_time = time() + search_time\n \n #open for append, so it can be run over and over\n with open(filename, \"a\") as tfile:\n for itweet in tweet_iter:\n #I only wanted english tweets\n if itweet['lang'] == 'en':\n tweet = clean_tweet(itweet)\n if len(tweet) == 0:\n continue\n tfile.write(tweet + '\\n')\n print tweet\n if time() > end_time:\n print \"... done ... ran for %i seconds ...\" % search_time\n break",
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": "search_str = (\"python,java,artificial intelligence,machine learning,data mining,programming,software,\"\n \"software development,software design,natural language processing,linear regression,\"\n \"deep learning,k-means,open source,source code,api,web service,matplotlib,scikit-learn\"\n \"ML,ensemble learning,clustering,classification,SVM,nearest neighbors,random forest,PCA,\"\n \"decision tree,image recognition\")\ndownload_stream(search_str, 'test.txt', search_time=10)",
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": " I love it.. it has the blackberry software so it can be controlled through active directory.. BYOD is killing IT people\nRT The NSA has used covert relationships with tech companies to insert vulnerabilities into consumer security software http:/"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\ncan use it to manage your installed andor running software program"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n[GET] Quick Deals Generator Software Lets You Create Deal Pages In Less Than 5 Minutes! DOWNLOAD "
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n [GET] Quick Deals Generator Software Lets You Create Deal Pages In Less Than 5 Minutes! DOWNLOAD"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\nRT Unitrends is looking for: Software Developer (User Interface) \nDJ Facebook Chatrooms Chat Software"
},
{
"output_type": "stream",
"stream": "stdout",
"text": "\n... done ... ran for 10 seconds ...\n"
}
],
"prompt_number": 6
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment