vfive fivejjs

🏠
Working from home
  • Data scientist and engineer
  • Sydney Australia
#!/usr/bin/env python
# -*- coding: utf-8 -*-
#
# pegasos.py
#
# Copyright 2013 nipun batra <nipunb@iiitd.ac.in>
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
@fivejjs
fivejjs / gdoc.py
Last active August 29, 2015 14:10 — forked from mushfiq/gdoc.py
import os
import sys
from getpass import getpass
import gdata.docs.service
import gdata.spreadsheet.service
'''
get user information from the command line argument and
#!/bin/bash
# GTK+ and Firefox for Amazon Linux
# Written by Joseph Lawson 2012-06-03
# http://joekiller.com
# http://joekiller.com/2012/06/03/install-firefox-on-amazon-linux-x86_64-compiling-gtk/
# chmod 755 ./gtk-firefox.sh
# sudo ./gtk-firefox.sh
from TwitterSearch import *
import csv
def get_tweets(query, max = 2000):
    # takes a search term (query) and a max number of tweets to find
    # gets content from twitter and writes it to a csv bearing the name of your query
    i = 0
    search = query
@fivejjs
fivejjs / jython_classPathHacker_teradata.py
Last active August 29, 2015 14:27
Jython 2.7.0 access to Teradata with the modified classPathHacker
### author: Jinjun Sun
### email: jsunster@gmail.com
### refer to https://gist.github.com/linkerlin/4654376#file-classpathhacker-py-L20
### refer to http://www.jython.org/jythonbook/en/1.0/appendixB.html#working-with-classpath
import java.sql as sql
import java.lang as lang
class classPathHacker :
@fivejjs
fivejjs / dispatch.py
Last active August 29, 2015 14:27 — forked from aortbals/dispatch.py
Synchronize two folders using python.
#! /usr/bin/python
# Dispatch - synchronize two folders
import os
import filecmp
import shutil
from stat import *
class Dispatch:
    ''' This class represents a synchronization object '''
@fivejjs
fivejjs / gist:df009c4680eb15151f90
Last active September 10, 2015 04:36 — forked from debasishg/gist:8172796
A collection of links for streaming algorithms and data structures
  1. General Background and Overview
@fivejjs
fivejjs / hmm.py
Created October 5, 2015 12:16 — forked from fonnesbeck/hmm.py
Hidden Markov model in PyMC
import numpy as np
import pymc
import pdb
def unconditionalProbability(Ptrans):
    """Compute the unconditional probability for the states of a
    Markov chain."""
    m = Ptrans.shape[0]
@fivejjs
fivejjs / crbm.py
Created November 2, 2015 06:15 — forked from gwtaylor/crbm.py
Theano CRBM demonstration
""" Theano CRBM implementation.
For details, see:
http://www.uoguelph.ca/~gwtaylor/publications/nips2006mhmublv
Sample data:
http://www.uoguelph.ca/~gwtaylor/publications/nips2006mhmublv/motion.mat
@author Graham Taylor"""
import numpy

Expand The Edinburgh Twitter FSD Corpus

The Python scripts attached here handle the following tedious tasks, so you can start working with the corpus quickly:

  • Respect the Twitter API rate limits and throttle API hits.
  • Skip tweet IDs that have already been expanded, so you can resume tweet expansion after stopping midway.
  • Parse the API response and dump it into the correct column in the sqlite3 database.
  • Gracefully handle exceptions while acquiring tweets from the API.
  • Wrap version 1.1 of the Twitter API.
  • Start from a specified tweet ID, assuming the input file is sorted in increasing order of tweet ID.
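
The behaviour listed above can be illustrated with a minimal sketch. This is not the attached scripts' actual code: the `tweets` table layout, the `expand_tweets` name, and the caller-supplied `fetch` function (a stand-in for the real Twitter API call) are all assumptions made for illustration. Throttling is reduced to a fixed pause between requests, and failed lookups are recorded rather than retried.

```python
import sqlite3
import time


def expand_tweets(conn, tweet_ids, fetch, start_id=0, min_interval=0.0,
                  sleep=time.sleep):
    """Expand tweet IDs into rows of a hypothetical `tweets` table.

    `fetch` is a caller-supplied function (tweet_id -> dict with a "text"
    key) standing in for the real Twitter API call.
    Returns the number of IDs processed in this run.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets ("
        "id INTEGER PRIMARY KEY, text TEXT, error TEXT)"
    )
    # IDs already stored are skipped, which makes an interrupted run resumable.
    done = {row[0] for row in conn.execute("SELECT id FROM tweets")}

    processed = 0
    for tweet_id in tweet_ids:
        # Ignore everything before the requested starting ID.
        if tweet_id < start_id or tweet_id in done:
            continue
        try:
            payload = fetch(tweet_id)
            row = (tweet_id, payload["text"], None)
        except Exception as exc:
            # Record the failure (assumed policy) and carry on with the rest.
            row = (tweet_id, None, str(exc))
        conn.execute("INSERT INTO tweets VALUES (?, ?, ?)", row)
        conn.commit()  # commit per tweet so progress survives a crash
        done.add(tweet_id)
        processed += 1
        sleep(min_interval)  # crude throttle: fixed pause after each request
    return processed
```

Calling `expand_tweets` a second time on the same database only fetches IDs that are not yet stored. In practice `min_interval` would be derived from the API's published rate limit, which this sketch does not assume.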