@rajeshmr
Last active February 17, 2016 06:55
Structuring text with a Conditional Random Field (CRF): tagging recipe ingredient phrases.
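# Emit one "token<TAB>POS<TAB>XXX" line per token, with a blank line after
# each ingredient phrase. "XXX" is a placeholder in the label column.
# Input: one JSON object per line on stdin, each with an "ingredients" list.
import json
import sys

import nltk

for line in sys.stdin:
    data = json.loads(line)
    for ingredient in data['ingredients']:
        tokens = nltk.word_tokenize(ingredient.strip())
        tagged_tokens = nltk.pos_tag(tokens)
        for token, pos in tagged_tokens:
            try:
                print("%s\t%s\tXXX" % (token, pos))
            except Exception as e:
                # Report on stderr so the tab-separated output stays clean.
                print(e, file=sys.stderr)
                print("Error writing token:", repr(token), file=sys.stderr)
        # A blank line separates one ingredient phrase from the next.
        print()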
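
The script writes one token per line in three tab-separated columns and ends each ingredient phrase with a blank line. A minimal sketch of reading that layout back into per-phrase sequences might look like the following. It assumes exactly three columns per non-blank line; `read_sequences` is a hypothetical helper name that is not part of the gist, and the sample rows (including their POS tags) are made up for illustration.

```python
import io


def read_sequences(stream):
    """Yield one list of (token, pos, label) tuples per blank-line-separated block."""
    current = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: the current ingredient phrase is complete.
            if current:
                yield current
                current = []
            continue
        # Assumes exactly three tab-separated columns; raises ValueError otherwise.
        token, pos, label = line.split("\t")
        current.append((token, pos, label))
    if current:
        # Final phrase when the input does not end with a blank line.
        yield current


# Illustrative sample only; these rows are not real output from the gist.
sample = "2\tCD\tXXX\ncups\tNNS\tXXX\nflour\tNN\tXXX\n\n1\tCD\tXXX\negg\tNN\tXXX\n"
sequences = list(read_sequences(io.StringIO(sample)))
print(len(sequences))
```

Grouping by blank lines keeps each ingredient phrase as its own sequence, which is the unit a sequence labeller such as a CRF is usually trained on.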