Matthew Honnibal honnibal

honnibal / prodigy_srs.py
Created Apr 22, 2019
Script to jury-rig a little spaced-repetition system out of the Prodigy annotation tool
View prodigy_srs.py
"""See https://twitter.com/honnibal/status/1120020992636661767 """
import time
import srsly
from prodigy import recipe
from prodigy.components.db import connect
from prodigy.util import INPUT_HASH_ATTR, set_hashes
from prodigy.components.filters import filter_duplicates
def get_rank_priority(data):
honnibal / install-cuda.sh
Created Dec 9, 2018
Provision GPU for Ubuntu 18.04
View install-cuda.sh
#!/usr/bin/env bash
# First download cudnn to a directory /tmp/binaries.
# The filename should be cudnn-9.2-linux-x64-v7.1.tgz
set -e
# Install driver
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update
honnibal / dynamic_params.py
Last active Jun 29, 2017
Cycle hyper parameter
View dynamic_params.py
def cycle_hyper_param(low, high):
    '''Dynamically oscillate a hyper-parameter between two values.
    Uses the loss momentum to adjust the rate of change. The idea is
    that the value should move through regions where the loss is flat
    faster, and linger in values where the loss improves.
    '''
    inc = 0.0001
    trend = 0.
    prev = 0.
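The preview above is cut off after the first three assignments, so the rest of the gist is not shown here. As a rough illustration of the idea in the docstring (not the gist's actual code), the sketch below bounces a value between two bounds and shrinks the step while the loss is improving. The function name `oscillate` and the parameters `base_step`, `decay`, and `scale` are assumptions made up for this example.

```python
def oscillate(low, high, base_step=0.01, decay=0.9, scale=10.0):
    """Yield values that bounce between low and high (hypothetical sketch).

    Report the latest loss with gen.send(loss). A moving average of recent
    loss improvement shrinks the step, so the value lingers where the loss
    is improving and moves at base_step where the loss is flat.
    """
    value = low
    direction = 1.0
    trend = 0.0        # moving average of recent loss improvement
    prev_loss = None
    while True:
        loss = yield value
        if loss is not None and prev_loss is not None:
            improvement = max(prev_loss - loss, 0.0)
            trend = decay * trend + (1.0 - decay) * improvement
        if loss is not None:
            prev_loss = loss
        # Larger recent improvement means a smaller step.
        step = base_step / (1.0 + scale * trend)
        value += direction * step
        # Clamp to the bounds and reverse direction when one is reached.
        if value >= high:
            value, direction = high, -1.0
        elif value <= low:
            value, direction = low, 1.0
```

A training loop would call `next(gen)` once to get the starting value, then `gen.send(loss)` after each step to get the next one.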
View bootstrap_python_env.sh
#!/usr/bin/env bash
HERE=`pwd`
cd /tmp
wget http://www.python.org/ftp/python/2.7.5/Python-2.7.5.tgz
tar -zxvf Python-2.7.5.tgz
cd Python-2.7.5
mkdir $HERE/.python
./configure --prefix=$HERE/.python
View pydata url
http://s000.tinyupload.com/index.php?file_id=07575878755298799648
View dummy_sentiment.py
# Simple sentiment analysis with lots and lots of problems. For answer to Quora thread:
# https://www.quora.com/Would-it-be-possible-for-an-undergraduate-like-me-to-create-a-sentiment-analysis-program
import sys
from collections import Counter
with open(sys.argv[1]) as file_:
    positive_text = file_.read()
with open(sys.argv[2]) as file_:
    negative_text = file_.read()
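The preview stops once the two files are read, so how the gist scores text is not visible here. One deliberately naive way to continue, in the spirit of the "lots and lots of problems" caveat, is to count words in each corpus and vote per word. The helper names `train` and `score` are hypothetical and are not taken from the gist.

```python
from collections import Counter


def train(positive_text, negative_text):
    """Count lower-cased, whitespace-separated words in each corpus."""
    pos_counts = Counter(positive_text.lower().split())
    neg_counts = Counter(negative_text.lower().split())
    return pos_counts, neg_counts


def score(text, pos_counts, neg_counts):
    """Return a vote total: +1 per word seen more often in the positive
    corpus, -1 per word seen more often in the negative corpus."""
    total = 0
    for word in text.lower().split():
        if pos_counts[word] > neg_counts[word]:
            total += 1
        elif pos_counts[word] < neg_counts[word]:
            total -= 1
    return total
```

A positive total suggests positive sentiment and a negative total the opposite. Negation, punctuation, and unseen words are all ignored, which is exactly the kind of problem the header comment warns about.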
honnibal / mc.pyx
Last active Jan 28, 2016
Monte Carlo simulation, re /r/python thread on numba, cython, etc
View mc.pyx
# cython: infer_types=True
# cython: boundscheck=False
# cython: cdivision=True
# distutils: compile_options = ["-O2", "-fopenmp", "-march=native"]
# distutils: link_options = ["-fopenmp"]
cimport cython
from numpy import random as rng
import numpy as np
import numpy.random
honnibal / sort_like_color.py
Last active Sep 27, 2015
Find words that might be colors, using word vectors.
View sort_like_color.py
from __future__ import unicode_literals
from __future__ import print_function
import plac
import spacy.en
def main(vectors_loc=None):
    nlp = spacy.en.English()
honnibal / simple_bigrams.py
Created Sep 14, 2015
Simple but not so accurate bigram language model
View simple_bigrams.py
from preshed.counter import PreshCounter
from spacy.en import English
from spacy.attrs import ORTH, IS_OOV
import plac
from os import path
import os
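Only the imports of this gist are visible in the preview. For readers unfamiliar with the technique named in the description, here is a minimal bigram language model with add-one smoothing. It uses only the standard library rather than `PreshCounter` or spaCy, and the class name `BigramModel` and its methods are assumptions for illustration, not the gist's code.

```python
import math
from collections import Counter


class BigramModel:
    """Add-one smoothed bigram model over pre-tokenized sentences."""

    def __init__(self, sentences):
        self.contexts = Counter()   # how often each word appears as a context
        self.bigrams = Counter()    # counts of (previous word, word) pairs
        vocab = set()
        for words in sentences:
            tokens = ["<s>"] + list(words)   # "<s>" marks the sentence start
            vocab.update(words)
            self.contexts.update(tokens[:-1])
            self.bigrams.update(zip(tokens, tokens[1:]))
        self.vocab_size = len(vocab)

    def prob(self, prev, word):
        """P(word | prev) with add-one (Laplace) smoothing."""
        numerator = self.bigrams[(prev, word)] + 1
        return numerator / (self.contexts[prev] + self.vocab_size)

    def log_prob(self, words):
        """Log-probability of a whole sentence."""
        tokens = ["<s>"] + list(words)
        return sum(math.log(self.prob(p, w)) for p, w in zip(tokens, tokens[1:]))
```

Add-one smoothing keeps unseen bigrams from getting zero probability, at the cost of giving them too much mass on small corpora, which is one reason such a model is "not so accurate".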
honnibal / gist:30499850449a46c167a8
Created Jul 16, 2015
Syntax-specific search with spaCy
View gist:30499850449a46c167a8
"""
Example use of the spaCy NLP tools for data exploration.
Here we will look for reddit comments that describe Google doing something,
i.e. discuss the company's actions. This is difficult, because other senses of
"Google" now dominate usage of the word in conversation, particularly references to
using Google products.
The heuristics here are quick and dirty --- about 5 minutes' work. A better approach
is to use the word vector of the verb. But, the demo here is just to show what's