@ptwobrussell
ptwobrussell / gist:668166
Created November 8, 2010 19:52
Visualizing Twitter Search Results
Visualizing Twitter Search Results with Protovis and/or Graphviz is this easy:
$ easy_install twitter # See https://github.com/sixohsix/twitter and http://pypi.python.org/pypi/setuptools
$ git clone https://github.com/ptwobrussell/Mining-the-Social-Web.git
$ cd Mining-the-Social-Web/python_code
$ python introduction__retweet_visualization.py TeaParty # or whatever you want to search for
Your browser should pop open and display the results as a force-directed graph; also check your console for some useful output.
You can create an image file from the DOT language output with a command like the following:
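The command itself was truncated from the gist; a typical Graphviz invocation looks like the following (the file name is illustrative and assumes the script wrote its DOT output to a file such as twitter_retweet_graph.dot):

```shell
# Render the DOT output to a PNG with Graphviz (assumes the 'dot' binary is
# installed and the input file name matches what the script actually wrote)
dot -Tpng twitter_retweet_graph.dot -o twitter_retweet_graph.png
```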
@ptwobrussell
ptwobrussell / gist:1592859
Created January 11, 2012 03:33
Mining the Social Web, Example 1-3 (works as of Jan 10, 2012 and last tested on 4 April 2012)
# Twitter's Trends API has been in flux since February 2011, when Mining the Social Web was published,
# and unfortunately, this is causing some confusion in the earliest examples.
# See also https://dev.twitter.com/docs/api/1/get/trends
# Note that the twitter package that's being imported is from https://github.com/sixohsix/twitter
# If you have first done an "easy_install pip" to get pip, you could easily install the latest
# version directly from GitHub as follows:
# $ pip install -e git+http://github.com/sixohsix/twitter.git#egg=github-pip-install
@ptwobrussell
ptwobrussell / gist:1877506
Last active February 10, 2019 11:48
Some analysis of capturing/redirecting UTF-8 output with Python 2
# -*- coding: utf-8 -*-
# Studying this script might be helpful in understanding why UnicodeDecode errors
# sometimes happen when trying to capture utf-8 output to files with Python 2 even
# though the output prints to your (utf-8 capable) terminal.
# Note that the first line of this file is an encoding declaration (per PEP 263), not a
# Byte Order Mark (BOM): it is a directive that tells Python to treat this file as utf-8
# (i.e. comments and string values may be utf-8)
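A minimal sketch (not part of the gist itself) of the failure mode being studied: under Python 2, when stdout is redirected to a file or pipe, sys.stdout.encoding is None, so printing a unicode value falls back to an implicit ASCII encode, which is roughly equivalent to this:

```python
# -*- coding: utf-8 -*-
# Illustrative only: the implicit ASCII encode that print performs under
# Python 2 when output is captured rather than sent to a utf-8 terminal
try:
    u"caf\u00e9".encode("ascii")
except UnicodeEncodeError as e:
    print("The error seen when capturing output: %s" % e)

# The usual remedy is to encode explicitly before writing:
encoded = u"caf\u00e9".encode("utf-8")  # bytes that are safe for any file/pipe
```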
@ptwobrussell
ptwobrussell / Syria.geojson
Created September 4, 2013 15:02
Tweets about #Syria from Sept 1, 2013 through Sept 3, 2013. See also http://miningthesocialweb.com/2013/09/04/what-are-people-saying-about-syria-in-your-neck-of-the-woods/ for the back story on this data.
@ptwobrussell
ptwobrussell / SyriaExport-09072013.geojson
Created September 8, 2013 03:42
Of ~1.1M tweets about #Syria collected from between 1 Sept 2013 and 7 Sept 2013, ~0.5% (~6,000) of them included geocoordinates. Click on the visualization below to zoom in on particular areas of the world and see what people are tweeting about. Note that in some areas (such as in Berkeley, CA) single users account for disproportionate numbers o…
@ptwobrussell
ptwobrussell / Helper Code for Saving LinkedIn Contacts to a Remote VM
Created October 28, 2013 03:47
If you are running on a hosted AWS VM provided for the Strata workshop, you'll need to use the following approach to follow along with Example 6 and on, because you aren't using Vagrant and won't have another way to copy your data onto the remote machine. All you'll need to do is create a new cell in the "Chapter 3 (Mining LinkedIn)" IPython Not…
# On a remote AWS VM, you'll need to create and save your
# CSV connections to the remote VM before executing Example 6,
# since you're not using Vagrant (and since we won't be using SSH
# as part of the workshop.)
# Copy/paste your connections (or a large subset of them) into a string value
# that's bounded by triple quotes like the following example (which defines only
# a single contact for brevity.)
csv_as_string = \
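For instance, a hypothetical version of that cell might look like the following (the column names and the single contact are made up for illustration; paste your own exported connections between the triple quotes):

```python
import csv
import io

# Illustrative sketch only: replace the contents of the triple-quoted string
# with your own exported LinkedIn connections
csv_as_string = """First Name,Last Name,Company,Position
Ada,Lovelace,Analytical Engines Ltd,Mathematician
"""

# Parse the pasted string just as if it had been read from a file
contacts = list(csv.DictReader(io.StringIO(csv_as_string)))
```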
@ptwobrussell
ptwobrussell / summarize.py
Created December 1, 2013 01:25
An example of how to use yhat's cloud server to "predict" summaries of news articles.
########################################################################
#
# An example of how to deploy a custom predictive model to yhat
# and "predict" the summary for a news article.
#
# Input: URL for a web page containing a news article
#
# Output: Summary of the "story" in the web page for the URL
#
# Example usage: $ python summarize.py <username> <apikey> <url>
@ptwobrussell
ptwobrussell / gist:8243923
Created January 3, 2014 18:49
A little script for converting tweets with geocoords to geojson format. (The script assumes that you've flattened the tweets down to a CSV format with the field format described in the tuple of the "for" loop, which is easy enough to do.)
import geojson  # pip install geojson
import sys

# Each tab-separated input line: x (longitude), y (latitude), tweet id,
# tweet text, screen name, utc timestamp
lines = [line.strip().split("\t") for line in open(sys.argv[1])]
features = []
for (x, y, _id, text, screen_name, utc) in lines:
    props = dict(text=text, screen_name=screen_name, utc=utc)
    # GeoJSON expects (longitude, latitude) as numbers, not strings
    point = geojson.Point(coordinates=(float(x), float(y)))
    features.append(geojson.Feature(id=_id, geometry=point, properties=props))

print(geojson.dumps(geojson.FeatureCollection(features)))
@ptwobrussell
ptwobrussell / MTSW2E Example 6-3 Improvements
Last active December 26, 2016 23:08
A modification of MTSW2E Example 6-3 (http://bit.ly/1aWYgAv) with improvements toward getting the code to work seamlessly on mailboxes exported from Google Takeout.
"""
A modification of MTSW2E Example 6-3 (http://bit.ly/1aWYgAv) with the following improvements:
* Extra debugging information is written to sys.stderr to help isolate any problematic content
that may be encountered.
* A (hopeful) fix to a blasted UnicodeEncodeError in cleanContent() that may be triggered from
quopri.decodestring attempting to decode an already decoded Unicode value.
* The JSONification in jsonifyMessage now ignores any content that's not text. MIME-encoded content
  such as images, PDFs, and other non-text data that isn't useful for textual analysis without
  significant additional work is no longer carried forward into the JSON for import into MongoDB.
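The quopri guard described above can be sketched like this (the function name and details are illustrative, not the book's exact cleanContent):

```python
import quopri

def clean_content(raw):
    # Only raw bytes should be handed to quopri.decodestring; a value that has
    # already been decoded to text is returned unchanged, which avoids the
    # UnicodeEncodeError triggered by decoding an already-decoded value
    if isinstance(raw, bytes):
        return quopri.decodestring(raw).decode("utf-8", errors="replace")
    return raw
```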
@ptwobrussell
ptwobrussell / MTSW2E Example 6-13 Improvements
Created February 8, 2014 01:51
Improvements to Example 6-13 that use regular expressions to enable searching by an email address as opposed to an exact string match on the From: field of a JSONified mbox
import json
import pymongo # pip install pymongo
from bson import json_util # Comes with pymongo
import re
# The basis of our query
FROM = "noreply@coursera.org" # As opposed to a value like "Coursera <noreply@coursera.org>"
client = pymongo.MongoClient()
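Where the gist breaks off, the regex-based query might continue along these lines (a sketch; the database and collection names follow the chapter's conventions and are assumptions):

```python
import re

# The basis of our query
FROM = "noreply@coursera.org"

# Compile a case-insensitive pattern that matches the address anywhere in the
# From: field; re.escape keeps the dots from acting as regex wildcards
from_pattern = re.compile(re.escape(FROM), re.IGNORECASE)

# With pymongo, a compiled pattern can be passed straight into find(), e.g.:
#   db = client.enron                               # db name is an assumption
#   msgs = list(db.mbox.find({'From': from_pattern}))
```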