@ptwobrussell
ptwobrussell / MTSW2E Example 6-13 Improvements
Created February 8, 2014 01:51
Improvements to Example 6-13 that use regular expressions to enable searching by an email address as opposed to an exact string match on the From: field of a JSONified mbox
import json
import pymongo # pip install pymongo
from bson import json_util # Comes with pymongo
import re
# The basis of our query
FROM = "noreply@coursera.org" # As opposed to a value like "Coursera <noreply@coursera.org>"
client = pymongo.MongoClient()
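A minimal sketch of the regular-expression idea described above, assuming Python 3 and a field named `From` as in the description. The `query` dictionary and the `matches_from` helper are illustrative names, not part of the original gist; no database is contacted here, so the matching is shown on plain strings.

```python
import re

FROM = "noreply@coursera.org"

# re.escape keeps the "." characters in the address from acting as wildcards
from_pattern = re.compile(re.escape(FROM), re.IGNORECASE)

# Hypothetical query document: pymongo accepts a compiled pattern as a query value,
# so this could be passed to a collection's find() method
query = {"From": from_pattern}


def matches_from(header_value):
    """Return True if the From: header value contains the address."""
    return from_pattern.search(header_value) is not None


print(matches_from("Coursera <noreply@coursera.org>"))
print(matches_from("noreply@courseraXorg"))
```

Escaping the address matters because an unescaped `.` would also match any other character in that position.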
ptwobrussell / nuztap-requirements.txt
Last active August 29, 2015 14:00
Nuztap Python requirements. Install with "pip install -r nuztap-requirements.txt"
BeautifulSoup==3.2.1
GnuPGInterface==0.3.2
JPype1==0.5.5.2
Landscape-Client==13.07.3
PAM==0.4.2
PyYAML==3.11
Twisted-Core==11.1.0
apt-xapian-index==0.44
argparse==1.2.1
boilerpipe==1.2.0.0
" Place this file in ~/.vimrc
" Get it with:
" wget --no-check-certificate https://gist.githubusercontent.com/ptwobrussell/05600f8d955be3423ba1/raw/374b960cea9c3cf28cb0757db6136ba4fce3b196/gistfile1.txt -O .vimrc
" When started as "evim", evim.vim will already have done these settings.
if v:progname =~? "evim"
  finish
endif
" Use Vim settings, rather than Vi settings (much better!).
ptwobrussell / gist:668166
Created November 8, 2010 19:52
Visualizing Twitter Search Results:
Visualizing Twitter Search Results with Protovis and/or Graphviz is this easy:
$ easy_install twitter # See https://github.com/sixohsix/twitter and http://pypi.python.org/pypi/setuptools
$ git clone https://github.com/ptwobrussell/Mining-the-Social-Web.git
$ cd Mining-the-Social-Web/python_code
$ python introduction__retweet_visualization.py TeaParty # or whatever you want to search for
Your browser should pop open and display the results as a force-directed graph, but also check your console for some useful output.
You can create an image file from the DOT language output with a command like the following:
ptwobrussell / gist:1592859
Created January 11, 2012 03:33
Mining the Social Web, Example 1-3 (works as of Jan 10, 2012 and last tested on 4 April 2012)
# Twitter's Trends API has been in flux since February 2011 when Mining the Social Web was published
# and unfortunately, this is causing some confusion in the earliest examples.
# See also https://dev.twitter.com/docs/api/1/get/trends
# Note that the twitter package that's being imported is from https://github.com/sixohsix/twitter
# If you have first done an "easy_install pip" to get pip, you could easily install the latest
# version directly from GitHub as follows:
# $ pip install -e git+http://github.com/sixohsix/twitter.git#egg=github-pip-install
ptwobrussell / Syria.geojson
Created September 4, 2013 15:02
Tweets about #Syria from Sept 1, 2013 through Sept 3, 2013. See also http://miningthesocialweb.com/2013/09/04/what-are-people-saying-about-syria-in-your-neck-of-the-woods/ for the back story on this data.
ptwobrussell / SyriaExport-09072013.geojson
Created September 8, 2013 03:42
Of ~1.1M tweets about #Syria collected from between 1 Sept 2013 and 7 Sept 2013, ~0.5% (~6,000) of them included geocoordinates. Click on the visualization below to zoom in on particular areas of the world and see what people are tweeting about. Note that in some areas (such as in Berkeley, CA) single users account for disproportionate numbers o…
ptwobrussell / Helper Code for Saving LinkedIn Contacts to a Remote VM
Created October 28, 2013 03:47
If you are running on a hosted AWS VM provided for the Strata workshop, you'll need to use the following approach to follow along with Example 6 and on, because you aren't using Vagrant and won't have another way to copy your data onto the remote machine. All you'll need to do is create a new cell in the "Chapter 3 (Mining LinkedIn)" IPython Not…
# On a remote AWS VM, you'll need to create and save your
# CSV connections to the remote VM before executing Example 6
# since you're not using Vagrant (and since we won't be using SSH
# as part of the workshop).
# Copy/paste your connections (or a large subset of them) into a string value
# that's bounded by triple quotes like the following example (which defines only
# a single contact for brevity).
csv_as_string = \
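The listing stops at the assignment. A minimal sketch of the overall idea, assuming Python 3: paste the CSV text into a triple-quoted string, write it to a file on the VM, and read it back as a check. The header row, the sample contact, and the file name `my_connections.csv` are all hypothetical placeholders, not values from the original helper.

```python
import csv

# Hypothetical sample data; the real header row and contacts come from your own export
csv_as_string = """First Name,Last Name,Company,Job Title
Jane,Doe,Example Corp,Engineer
"""

# Hypothetical output path on the remote VM
CSV_FILE = "my_connections.csv"

# Write the pasted text to disk so later cells can read it like a normal file
with open(CSV_FILE, "w", newline="") as f:
    f.write(csv_as_string.lstrip())

# Read it back to confirm that it parses as CSV
with open(CSV_FILE, newline="") as f:
    rows = list(csv.DictReader(f))

print(len(rows), "contact(s) saved")
```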
ptwobrussell / summarize.py
Created December 1, 2013 01:25
An example of how to use yhat's cloud server to "predict" summaries of news articles.
########################################################################
#
# An example of how to deploy a custom predictive model to yhat
# and "predict" the summary for a news article.
#
# Input: URL for a web page containing a news article
#
# Output: Summary of the "story" in the web page for the URL
#
# Example usage: $ python summarize.py <username> <apikey> <url>
ptwobrussell / gist:8243923
Created January 3, 2014 18:49
A little script for converting tweets with geocoords to geojson format. (The script assumes that you've flattened the tweets down to a CSV format with the field format described in the tuple of the "for" loop, which is easy enough to do.)
import geojson
import sys
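# Read tab-separated rows; each row holds the six fields unpacked in the loop below
with open(sys.argv[1]) as f:
    lines = [line.strip().split("\t") for line in f]
features = []
for (x, y, _id, text, screen_name, utc) in lines:
    props = dict(text=text, screen_name=screen_name, utc=utc)
    # GeoJSON coordinates must be numbers, so convert the string fields
    features.append( geojson.Feature(id=_id, geometry=geojson.Point(coordinates=(float(x), float(y))), properties=props) )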
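For readers without the `geojson` package, the same conversion can be sketched with plain dictionaries and the standard `json` module. This swaps in hand-built dictionaries for the `geojson.Feature` and `geojson.Point` helpers used above. The function name `rows_to_feature_collection` and the sample row are hypothetical, and the sketch assumes `x` is the longitude and `y` the latitude, which is the order GeoJSON expects.

```python
import json


def rows_to_feature_collection(rows):
    """Build a GeoJSON FeatureCollection (as a plain dict) from tweet rows.

    Each row is (x, y, _id, text, screen_name, utc); x is assumed to be the
    longitude and y the latitude.
    """
    features = []
    for (x, y, _id, text, screen_name, utc) in rows:
        features.append({
            "type": "Feature",
            "id": _id,
            "geometry": {"type": "Point", "coordinates": [float(x), float(y)]},
            "properties": {"text": text, "screen_name": screen_name, "utc": utc},
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical tab-separated sample row in the field order used by the loop
sample = "-122.27\t37.87\t1\tHello from Berkeley\texample_user\t2013-09-01T00:00:00Z"
collection = rows_to_feature_collection([sample.split("\t")])
print(json.dumps(collection, indent=2))
```

Wrapping the features in a `FeatureCollection` gives a single JSON document that GeoJSON viewers can load directly.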