@theaspect
Created November 22, 2012 18:47
NLTK book chapter 07 task 13
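# Develop an NP chunker that converts POS-tagged text into a list of tuples, where each tuple
# consists of a verb followed by a sequence of noun phrases and prepositions, e.g. "the little cat
# sat on the mat" becomes ('sat', 'on', 'NP')...
import nltk

# Tagged corpus (needs nltk.download('brown') on first use)
brown = nltk.corpus.brown

# Grammar from chapter 7
# Note: <VB> matches only the base-form verb tag; <VB.*> would also cover inflected forms such as VBD.
grammar = r"""
  NOUNP: {<DT>?<JJ.*>*<NN.*>+}  # Noun phrase: optional determiner, adjectives, one or more nouns
  CLAUSE: {<VB><IN><NOUNP>}     # Verb, preposition, noun phrase
"""
cp = nltk.RegexpParser(grammar)
tuples = set()

# Find required clauses
for sent in brown.tagged_sents():
    tree = cp.parse(sent)
    for subtree in tree.subtrees():
        # NLTK 3 exposes the node name via Tree.label(); the .node attribute is NLTK 2 only
        if subtree.label() == 'CLAUSE':
            # Children 0 and 1 are the (word, tag) leaves for the verb and the preposition
            tuples.add((subtree[0][0], subtree[1][0], "NP"))

# Output
for t in sorted(tuples):
    print(t)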
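
To see what the CLAUSE pattern is meant to capture without downloading the Brown corpus, the matching can be approximated in plain Python on a hand-tagged sentence. This is only a sketch, not NLTK's RegexpParser: the function name find_clauses and the sample tags are hypothetical, and it assumes any VB* tag counts as a verb (looser than the <VB> rule in the script) so that the task's own example matches.

```python
# Hand-rolled approximation of the CLAUSE pattern; this is not NLTK's RegexpParser.
def find_clauses(tagged_sent):
    """Return (verb, preposition, 'NP') tuples found in one tagged sentence."""
    results = []
    n = len(tagged_sent)
    for i in range(n - 2):
        (verb, verb_tag), (prep, prep_tag) = tagged_sent[i], tagged_sent[i + 1]
        # Assumption: any VB* tag is a verb, unlike the stricter <VB> in the script.
        if not (verb_tag.startswith("VB") and prep_tag == "IN"):
            continue
        j = i + 2
        if j < n and tagged_sent[j][1] == "DT":              # optional determiner
            j += 1
        while j < n and tagged_sent[j][1].startswith("JJ"):  # any adjectives
            j += 1
        k = j
        while k < n and tagged_sent[k][1].startswith("NN"):  # one or more nouns
            k += 1
        if k > j:  # at least one noun followed, so a noun phrase is present
            results.append((verb, prep, "NP"))
    return results


# Hand-tagged version of the example sentence from the task statement.
sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"),
            ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
print(find_clauses(sentence))  # [('sat', 'on', 'NP')]
```

The noun phrase is collapsed to the literal string "NP", as in the script, so only the verb and preposition vary across results.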