Matthew A. Russell ptwobrussell

View gist:05600f8d955be3423ba1
" Place this file in ~/.vimrc
" Get it with:
" wget --no-check-certificate https://gist.githubusercontent.com/ptwobrussell/05600f8d955be3423ba1/raw/374b960cea9c3cf28cb0757db6136ba4fce3b196/gistfile1.txt -O .vimrc
" When started as "evim", evim.vim will already have done these settings.
if v:progname =~? "evim"
  finish
endif
" Use Vim settings, rather than Vi settings (much better!).
@ptwobrussell
ptwobrussell / nuztap-requirements.txt
Last active Aug 29, 2015
Nuztap Python requirements. Install with "pip install -r nuztap-requirements.txt"
View nuztap-requirements.txt
BeautifulSoup==3.2.1
GnuPGInterface==0.3.2
JPype1==0.5.5.2
Landscape-Client==13.07.3
PAM==0.4.2
PyYAML==3.11
Twisted-Core==11.1.0
apt-xapian-index==0.44
argparse==1.2.1
boilerpipe==1.2.0.0
@ptwobrussell
ptwobrussell / MTSW2E Example 6-13 Improvements
Created Feb 8, 2014
Improvements to Example 6-13 that use regular expressions to enable searching by an email address as opposed to an exact string match on the From: field of a JSONified mbox
View MTSW2E Example 6-13 Improvements
import json
import pymongo # pip install pymongo
from bson import json_util # Comes with pymongo
import re
# The basis of our query
FROM = "noreply@coursera.org" # As opposed to a value like "Coursera <noreply@coursera.org>"
client = pymongo.MongoClient()
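The regular-expression idea described above can be sketched without a running MongoDB instance. This is a minimal illustration, not the gist's actual code: it assumes the field is named "From" (as in the description), and the sample strings are made up. pymongo accepts a compiled pattern as a query value, so the same pattern object could be passed to a find() call.

```python
import re

FROM = "noreply@coursera.org"

# Escape the address so "." matches literally, and ignore case
pattern = re.compile(re.escape(FROM), re.IGNORECASE)

# A query document of the shape that could be handed to a collection's find();
# the field name "From" is an assumption taken from the description
query = {"From": pattern}

# Made-up sample values for the From: field
samples = [
    "Coursera <noreply@coursera.org>",
    "noreply@coursera.org",
    "Someone Else <someone@example.com>",
]

# Keep only the values that contain the address anywhere in the string
matches = [s for s in samples if pattern.search(s)]
print(matches)
```

Escaping the address matters because an unescaped "." would match any character, which could let unrelated addresses slip through.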
@ptwobrussell
ptwobrussell / MTSW2E Example 6-3 Improvements
Last active Dec 26, 2016
A modification of MTSW2E Example 6-3 (http://bit.ly/1aWYgAv) with improvements toward getting the code to work seamlessly on mailboxes exported from Google Takeout.
View MTSW2E Example 6-3 Improvements
"""
A modification of MTSW2E Example 6-3 (http://bit.ly/1aWYgAv) with the following modifications:
* Extra debugging information is written to sys.stderr to help isolate any problematic content
  that may be encountered.
* A (hopeful) fix to a blasted UnicodeEncodeError in cleanContent() that may be triggered from
  quopri.decodestring attempting to decode an already decoded Unicode value.
* The JSONification in jsonifyMessage now ignores any content that's not text. MIME-encoded content
  such as images, PDFs, and other non-text data that is not useful for textual analysis without
  significant additional work is now no longer carried forward into the JSON for import into MongoDB.
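The third modification, skipping non-text MIME parts, might look roughly like the following. This is a sketch using the Python 3 standard library, not the gist's jsonifyMessage; the function name text_parts and the dictionary keys are hypothetical, and the sample message is constructed purely for illustration.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def text_parts(message):
    """Return the decoded text/* parts of a message, skipping everything else."""
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # containers hold no content of their own
        if part.get_content_maintype() != "text":
            continue  # images, PDFs, and other non-text attachments
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"
        parts.append({
            "contentType": part.get_content_type(),
            "content": payload.decode(charset, errors="replace"),
        })
    return parts


# A made-up message with one text part and one PDF-like attachment
msg = MIMEMultipart()
msg["From"] = "Coursera <noreply@coursera.org>"
msg.attach(MIMEText("Hello from the course staff.", "plain", "utf-8"))
msg.attach(MIMEApplication(b"%PDF-1.4 fake bytes", _subtype="pdf"))

result = text_parts(msg)
print(result)
```

Filtering on the main content type keeps both text/plain and text/html parts while dropping attachments before they ever reach the JSON.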
@ptwobrussell
ptwobrussell / gist:8243923
Created Jan 3, 2014
A little script for converting tweets with geocoords to geojson format. (The script assumes that you've flattened the tweets down to a tab-delimited format with the field order described in the tuple of the "for" loop, which is easy enough to do.)
View gist:8243923
import geojson
import sys
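# Each input line holds six tab-separated fields: x, y, id, text, screen_name, utc
with open(sys.argv[1]) as f:
    lines = [line.strip().split("\t") for line in f]
features = []
for (x, y, _id, text, screen_name, utc) in lines:
    props = dict(text=text, screen_name=screen_name, utc=utc)
    # Coordinates are read as strings, so convert them to numbers
    features.append( geojson.Feature(id=_id, geometry=geojson.Point(coordinates=(float(x), float(y))), properties=props) )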
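For readers without the geojson package installed, the same conversion can be sketched with plain dictionaries and the standard json module. This swaps in hand-built GeoJSON objects for the library calls above. The function name and the sample row are hypothetical, and the sketch assumes x is longitude and y is latitude, since GeoJSON positions are ordered longitude first.

```python
import json


def rows_to_feature_collection(rows):
    """Build a GeoJSON FeatureCollection from (x, y, id, text, screen_name, utc) rows."""
    features = []
    for x, y, _id, text, screen_name, utc in rows:
        features.append({
            "type": "Feature",
            "id": _id,
            # Assumption: x is longitude and y is latitude
            "geometry": {"type": "Point", "coordinates": [float(x), float(y)]},
            "properties": {"text": text, "screen_name": screen_name, "utc": utc},
        })
    return {"type": "FeatureCollection", "features": features}


# A made-up tab-separated line in the field order the loop expects
line = "-122.27\t37.87\t1\thello\tsomeuser\t2013-09-04T00:00:00Z"
rows = [line.strip().split("\t")]

collection = rows_to_feature_collection(rows)
print(json.dumps(collection, indent=2))
```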
@ptwobrussell
ptwobrussell / summarize.py
Created Dec 1, 2013
An example of how to use yhat's cloud server to "predict" summaries of news articles.
View summarize.py
########################################################################
#
# An example of how to deploy a custom predictive model to yhat
# and "predict" the summary for a news article.
#
# Input: URL for a web page containing a news article
#
# Output: Summary of the "story" in the web page for the URL
#
# Example usage: $ python summarize.py <username> <apikey> <url>
@ptwobrussell
ptwobrussell / Helper Code for Saving LinkedIn Contacts to a Remote VM
Created Oct 28, 2013
If you are running on a hosted AWS VM provided for the Strata workshop, you'll need to use the following approach to follow along with Example 6 and on, because you aren't using Vagrant and won't have another way to copy your data onto the remote machine. All you'll need to do is create a new cell in the "Chapter 3 (Mining LinkedIn)" IPython Not…
View Helper Code for Saving LinkedIn Contacts to a Remote VM
# On a remote AWS VM, you'll need to create and save your
# CSV connections to the remote VM before executing Example 6
# since you're not using Vagrant (and since we won't be using SSH
# as part of the workshop.)
# Copy/paste your connections (or a large subset of them) into a string value
# that's bounded by triple quotes like the following example (which defines only
# a single contact for brevity.)
csv_as_string = \
@ptwobrussell
ptwobrussell / SyriaExport-09072013.geojson
Created Sep 8, 2013
Of ~1.1M tweets about #Syria collected from between 1 Sept 2013 and 7 Sept 2013, ~0.5% (~6,000) of them included geocoordinates. Click on the visualization below to zoom in on particular areas of the world and see what people are tweeting about. Note that in some areas (such as in Berkeley, CA) single users account for disproportionate numbers o…
View SyriaExport-09072013.geojson
@ptwobrussell
ptwobrussell / Syria.geojson
Created Sep 4, 2013
Tweets about #Syria from Sept 1, 2013 through Sept 3, 2013. See also http://miningthesocialweb.com/2013/09/04/what-are-people-saying-about-syria-in-your-neck-of-the-woods/ for the back story on this data.
View Syria.geojson
@ptwobrussell
ptwobrussell / gist:1877506
Last active Feb 10, 2019
Some analysis of capturing/redirecting UTF-8 output with Python 2
View gist:1877506
# -*- coding: utf-8 -*-
# Studying this script might be helpful in understanding why UnicodeDecode errors
# sometimes happen when trying to capture utf-8 output to files with Python 2 even
# though the output prints to your (utf-8 capable) terminal.
# Note that the first line of this file is an encoding declaration, not a Byte Order
# Marker (BOM). It is a directive to tell Python that it should treat this file as utf-8
# (i.e. comments and string values may be utf-8)
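The failure mode described above can be reproduced in a few lines. This is a sketch written for Python 3 rather than the gist's Python 2 code: in Python 2, redirected output has no encoding attached and falls back to the ASCII codec, so the sketch forces that codec explicitly and then shows one common remedy, wrapping the byte stream in a UTF-8 writer. The sample string and variable names are made up for illustration.

```python
import codecs
import io

text = "caf\u00e9"  # contains one non-ASCII character

# Encoding with the ASCII codec fails, which is what happens implicitly
# in Python 2 when stdout is redirected to a file or pipe
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Wrapping a byte stream in a UTF-8 writer makes the encoding explicit;
# io.BytesIO stands in here for a redirected stdout
raw = io.BytesIO()
writer = codecs.getwriter("utf-8")(raw)
writer.write(text)

print(ascii_failed)
print(raw.getvalue())
```

The terminal case works because the terminal advertises an encoding that Python can pick up; a file or pipe advertises nothing, so the encoding has to be chosen explicitly.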