from preshed.counter import PreshCounter
from spacy.en import English
from spacy.attrs import ORTH, IS_OOV
import plac
from os import path
import os
"""
Example use of the spaCy NLP tools for data exploration.
Here we will look for reddit comments that describe Google doing something,
i.e. discuss the company's actions. This is difficult, because other senses of
"Google" now dominate usage of the word in conversation, particularly references to
using Google products.
The heuristics here are quick and dirty --- about 5 minutes' work. A better approach
is to use the word vector of the verb. But, the demo here is just to show what's
import sys


def shift(words, stack, c):
    return words, stack + c


def reduce(words, stack, c):
    return words + (stack,), c
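
The snippet above stops after the two transition functions, so how they are driven is not shown. The following is a minimal sketch of one possible driver, assuming `words` is a tuple of finished strings, `stack` is the string being built, and `c` is a single character. The `segment` function and its `is_boundary` callback are hypothetical names introduced here for illustration; they are not part of the original code.

```python
def shift(words, stack, c):
    # Append the character to the word currently being built.
    return words, stack + c


def reduce(words, stack, c):
    # Close the current word and start a new one with this character.
    return words + (stack,), c


def segment(text, is_boundary):
    """Split text into words using shift/reduce transitions.

    is_boundary(i) is a hypothetical callback: it returns True when a new
    word should start at character index i.
    """
    words, stack = (), ""
    for i, c in enumerate(text):
        if stack and is_boundary(i):
            words, stack = reduce(words, stack, c)
        else:
            words, stack = shift(words, stack, c)
    if stack:
        # Flush the last word, which no reduce has closed yet.
        words = words + (stack,)
    return words


if __name__ == "__main__":
    print(segment("thecat", lambda i: i == 3))
```

In this sketch the boundary decision is supplied by the caller; a scoring model or dictionary lookup could stand in for the lambda.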