vfive (fivejjs)
🏠 Working from home
  • Data scientist and engineer
  • Sydney, Australia

Expand The Edinburgh Twitter FSD Corpus

The Python scripts attached here take care of the following tedious work and should help one get started quickly with real work on the corpus (a minimal sketch of the resulting loop follows the list):

  • Respect the Twitter API rate limits and throttle API hits.
  • Don't hit the API for already expanded tweet IDs, so you can resume tweet expansion after stopping midway.
  • Parse the API response and dump it into the correct column in the sqlite3 database.
  • Gracefully handle exceptions while acquiring tweets from the API.
  • Wrap version 1.1 of the Twitter API.
  • Start from a specified tweet ID, assuming the input file is sorted in increasing order of tweet ID.
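A minimal sketch of the expansion loop implied by the points above; the fetch_tweet callable, the tweets table layout, and the sleep interval are illustrative assumptions, not the scripts' actual wrapper around the v1.1 API:

import sqlite3, time

def expand(db_path, tweet_ids, fetch_tweet, interval=1.0):
    """Resume-safe expansion: skip IDs already filled in, throttle between API hits."""
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    for tid in tweet_ids:  # assumes the input is sorted in increasing order of tweet ID
        row = cur.execute("SELECT text FROM tweets WHERE id = ?", (tid,)).fetchone()
        if row and row[0]:           # already expanded -> don't hit the API again
            continue
        try:
            text = fetch_tweet(tid)  # hypothetical callable wrapping the v1.1 statuses/show endpoint
        except Exception as exc:     # deleted tweets, protected accounts, rate-limit errors, ...
            print("skipping %s: %s" % (tid, exc))
        else:
            cur.execute("INSERT OR REPLACE INTO tweets (id, text) VALUES (?, ?)", (tid, text))
            conn.commit()
        time.sleep(interval)         # crude throttle to respect the rate limits
    conn.close()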
import nltk

text = """The Buddha, the Godhead, resides quite as comfortably in the circuits of a digital
computer or the gears of a cycle transmission as he does at the top of a mountain
or in the petals of a flower. To think otherwise is to demean the Buddha...which is
to demean oneself."""

# Used when tokenizing words
sentence_re = r'''(?x)          # set flag to allow verbose regexps
    (?:[A-Z]\.)+                # abbreviations, e.g. U.S.A.
  | \w+(?:-\w+)*                # words with optional internal hyphens
'''
tokens = nltk.regexp_tokenize(text, sentence_re)
@fivejjs
fivejjs / gist:6babb65de8505af5e2ca
Created January 13, 2016 14:00 — forked from karpathy/gist:587454dc0146a6ae21fc
An efficient, batched LSTM.
"""
This is a batched LSTM forward and backward pass
"""
import numpy as np
import code
class LSTM:
  @staticmethod
  def init(input_size, hidden_size, fancy_forget_bias_init = 3):
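The listing truncates the preview at the initializer. A minimal sketch of what such an init typically does — Xavier-scaled weights for all four gates stacked in a single matrix, with an optional positive forget-gate bias; this is an illustration, not necessarily the gist's exact code:

import numpy as np

def init_lstm(input_size, hidden_size, fancy_forget_bias_init=3):
    # one (input+hidden+1) x (4*hidden) matrix: row 0 holds the biases, the rest the gate weights
    WLSTM = np.random.randn(input_size + hidden_size + 1, 4 * hidden_size) / np.sqrt(input_size + hidden_size)
    WLSTM[0, :] = 0  # start all biases at zero
    if fancy_forget_bias_init != 0:
        # a positive forget-gate bias keeps the forget gates open early in training
        WLSTM[0, hidden_size:2 * hidden_size] = fancy_forget_bias_init
    return WLSTM

WLSTM = init_lstm(10, 4)  # e.g. 10 inputs, 4 hidden units -> a (15, 16) parameter matrix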
@fivejjs
fivejjs / readme.md
Created May 7, 2016 03:49 — forked from baraldilorenzo/readme.md
VGG-16 pre-trained model for Keras

## VGG16 model for Keras

This is the Keras model of the 16-layer network used by the VGG team in the ILSVRC-2014 competition.

It has been obtained by directly converting the Caffe model provided by the authors.

Details about the network architecture can be found in the following arXiv paper:

Very Deep Convolutional Networks for Large-Scale Image Recognition

K. Simonyan, A. Zisserman
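For reference, an equivalent pre-trained VGG-16 now ships with Keras itself; a minimal usage sketch (the image path is a placeholder, and this uses keras.applications rather than the converted Caffe weights described above):

import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = VGG16(weights="imagenet")  # downloads the pre-trained ImageNet weights

img = image.load_img("cat.jpg", target_size=(224, 224))  # VGG-16 expects 224x224 RGB input
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])  # top-3 ImageNet classes with scores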

@fivejjs
fivejjs / 0_reuse_code.js
Created August 24, 2016 02:40
Here are some things you can do with Gists in GistBox.
// Use Gists to store code you would like to remember later on
console.log(window); // log the "window" object to the console
- word2vec https://arxiv.org/abs/1310.4546
- sentence2vec, paragraph2vec, doc2vec https://cs.stanford.edu/~quocle/paragraph_vector.pdf
- tweet2vec http://arxiv.org/abs/1605.03481
- tweet2vec http://socialmachines.media.mit.edu/wp-content/uploads/sites/27/2016/05/tweet2vec_vvr.pdf
- author2vec http://dl.acm.org/citation.cfm?id=2889382
- item2vec http://arxiv.org/abs/1603.04259
- lda2vec https://arxiv.org/abs/1605.02019
- illustration2vec http://dl.acm.org/citation.cfm?id=2820907
- tag2vec http://ktsaurabh.weebly.com/uploads/3/1/7/8/31783965/distributed_representations_for_content-based_and_personalized_tag_recommendation.pdf
- category2vec http://www.anlp.jp/proceedings/annual_meeting/2015/pdf_dir/C4-3.pdf
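As a concrete anchor for the methods listed above, a minimal word2vec training run with gensim (gensim >= 4 API; the toy corpus and hyperparameters are illustrative placeholders):

from gensim.models import Word2Vec

# toy corpus: one tokenized sentence per list entry
sentences = [
    ["deep", "learning", "learns", "distributed", "representations"],
    ["word2vec", "maps", "words", "to", "dense", "vectors"],
]

# skip-gram (sg=1) with small dimensions, purely for illustration
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)
print(model.wv.most_similar("word2vec", topn=3))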
@fivejjs
fivejjs / stuns
Created October 22, 2016 03:39 — forked from zziuni/stuns
STUN server list
# source : http://code.google.com/p/natvpn/source/browse/trunk/stun_server_list
# A list of available STUN servers.
stun.l.google.com:19302
stun1.l.google.com:19302
stun2.l.google.com:19302
stun3.l.google.com:19302
stun4.l.google.com:19302
stun01.sipphone.com
stun.ekiga.net
@fivejjs
fivejjs / howto.md
Created February 2, 2017 09:07 — forked from persiyanov/howto.md
How to get an Amazon EC2 instance and do machine learning on it, with a Jupyter 4.0.6 server and Python 2.7.

Goal

We want to move computation to a machine with more power. We will set up Anaconda 4.0.0 and XGBoost 0.4 (which is tricky to install).

Preliminaries

Let's start

AWS Console and launching EC2 Instance.
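The howto drives this step through the AWS Console; for completeness, the same launch can be scripted with boto3 (the region, AMI ID, key pair, and instance type below are placeholders):

import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")  # placeholder region
instances = ec2.create_instances(
    ImageId="ami-xxxxxxxx",    # placeholder AMI, e.g. an Ubuntu image
    InstanceType="p2.xlarge",  # placeholder GPU instance type for ML workloads
    KeyName="my-key-pair",     # placeholder key pair for SSH access
    MinCount=1,
    MaxCount=1,
)
print(instances[0].id)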

@fivejjs
fivejjs / min-char-rnn.py
Created October 18, 2017 13:47 — forked from karpathy/min-char-rnn.py
Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy
"""
Minimal character-level Vanilla RNN model. Written by Andrej Karpathy (@karpathy)
BSD License
"""
import numpy as np
# data I/O
data = open('input.txt', 'r').read() # should be simple plain text file
chars = list(set(data))
data_size, vocab_size = len(data), len(chars)