Matthew Honnibal (honnibal)

honnibal / simple_bigrams.py
Created September 14, 2015 06:35
Simple but not so accurate bigram language model
from preshed.counter import PreshCounter
from spacy.en import English
from spacy.attrs import ORTH, IS_OOV
import plac
from os import path
import os
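The preview stops at the imports, so the gist's counting logic is not shown. As a hedged illustration of what a "simple but not so accurate" bigram model can look like, the sketch below counts adjacent token pairs and applies add-k smoothing (Laplace smoothing when k=1). It is not the gist's code. It uses `collections.Counter` in place of preshed's `PreshCounter` and plain `str.split()` in place of the spaCy tokenizer. The class name `BigramLM` and the `<s>`/`</s>` sentence markers are assumptions made for this example. Also note that `spacy.en` is the 2015-era import path; spaCy v2 and later expose the class as `spacy.lang.en.English`.

```python
import math
from collections import Counter


class BigramLM:
    """Add-k smoothed bigram language model over whitespace tokens (illustrative)."""

    def __init__(self, k=1.0):
        self.k = k
        self.history_counts = Counter()  # c(prev): times a token is the history of a bigram
        self.bigram_counts = Counter()   # c(prev, word)
        self.vocab = set()

    def train(self, sentences):
        for sentence in sentences:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.vocab.update(tokens)
            # Every token except the last is the history of exactly one bigram.
            self.history_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens, tokens[1:]))

    def prob(self, prev, word):
        # P(word | prev) = (c(prev, word) + k) / (c(prev) + k * |V|)
        v = len(self.vocab)
        return (self.bigram_counts[(prev, word)] + self.k) / (
            self.history_counts[prev] + self.k * v
        )

    def log_prob(self, sentence):
        # Sum of log bigram probabilities, including the sentence boundaries.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log(self.prob(a, b)) for a, b in zip(tokens, tokens[1:]))


lm = BigramLM(k=1.0)
lm.train(["the cat sat", "the dog sat"])
print(lm.prob("<s>", "the"))  # (2 + 1) / (2 + 1 * 6) = 0.375
```

Add-one smoothing gives every unseen pair the same pseudo-count no matter how plausible it is. That is one reason a model like this is simple but not very accurate. Interpolated or Kneser-Ney smoothing usually does better.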
honnibal / gist:30499850449a46c167a8
Created July 16, 2015 17:01
Syntax-specific search with spaCy
"""
Example use of the spaCy NLP tools for data exploration.
Here we will look for Reddit comments that describe Google doing something,
i.e. discuss the company's actions. This is difficult, because other senses of
"Google" now dominate usage of the word in conversation, particularly references to
using Google products.
The heuristics here are quick and dirty (about five minutes' work). A better approach
is to use the word vector of the verb. But the demo here is just to show what's
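The docstring is cut off before the gist's code, so its exact heuristics are not shown. One plausible quick heuristic for "Google doing something" is to keep sentences where "Google" is the nominal subject (`nsubj`) of a verb. The sketch below assumes an already-parsed sentence. Each token is represented by a hypothetical `Token` dataclass, and hand-written dependency labels stand in for parser output, so the example runs without a spaCy model. With spaCy itself, the same test would read `tok.lower_ == "google" and tok.dep_ == "nsubj" and tok.head.pos_ == "VERB"`. This is an illustration, not the gist's own method.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """Minimal stand-in for one token of a dependency parse (hypothetical)."""
    text: str
    pos: str   # coarse part-of-speech tag, e.g. "VERB", "PROPN"
    dep: str   # dependency label linking the token to its head, e.g. "nsubj"
    head: int  # index of the syntactic head within the sentence


def google_as_agent(sentence):
    """Return the verbs whose nominal subject is the word 'Google'."""
    verbs = []
    for tok in sentence:
        if tok.text.lower() == "google" and tok.dep == "nsubj":
            head = sentence[tok.head]
            if head.pos == "VERB":
                verbs.append(head.text)
    return verbs


# Hand-annotated parses (assumed for the example, not real parser output).
acts = [  # "Google bought the startup"
    Token("Google", "PROPN", "nsubj", 1),
    Token("bought", "VERB", "ROOT", 1),
    Token("the", "DET", "det", 3),
    Token("startup", "NOUN", "dobj", 1),
]
uses = [  # "I searched it on Google"
    Token("I", "PRON", "nsubj", 1),
    Token("searched", "VERB", "ROOT", 1),
    Token("it", "PRON", "dobj", 1),
    Token("on", "ADP", "prep", 1),
    Token("Google", "PROPN", "pobj", 3),
]
print(google_as_agent(acts))  # ['bought']
print(google_as_agent(uses))  # []
```

The second sentence uses "Google" for a product, which is the sense the docstring wants to filter out. A subject-only rule still misses passives such as "was sued by Google", so it remains a quick and dirty filter.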
import sys
def shift(words, stack, c):
    return words, stack + c

def reduce(words, stack, c):
    return words + (stack,), c
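These two functions read like the transitions of a shift-reduce word segmenter. `shift` appends character `c` to the word being built on `stack`. `reduce` pushes the finished word onto the `words` tuple and starts a new word with `c`. This reading is an assumption, because the rest of the gist is not shown. The sketch below drives the transitions with a set of word-start offsets that stands in for a learned classifier. `segment` and `oracle_boundaries` are hypothetical helpers, and `oracle_boundaries` derives the gold offsets from a known segmentation.

```python
def shift(words, stack, c):
    # Extend the word currently being built with character c.
    return words, stack + c


def reduce(words, stack, c):
    # Close the current word and start a new one with character c.
    return words + (stack,), c


def oracle_boundaries(gold_words):
    """Character offsets where each gold word after the first begins."""
    offsets, pos = set(), 0
    for word in gold_words[:-1]:
        pos += len(word)
        offsets.add(pos)
    return offsets


def segment(text, boundaries):
    """Greedily apply shift or reduce at each character after the first."""
    if not text:
        return ()
    words, stack = (), text[0]
    for i, c in enumerate(text[1:], start=1):
        if i in boundaries:
            words, stack = reduce(words, stack, c)
        else:
            words, stack = shift(words, stack, c)
    # Flush the final word left on the stack.
    return words + (stack,)


print(segment("thecatsat", oracle_boundaries(("the", "cat", "sat"))))
# ('the', 'cat', 'sat')
```

A model that predicts shift or reduce at each character could replace the gold offsets. That would turn this loop into a greedy learned segmenter.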