@kowey
Created March 29, 2013 19:20
import nltk.data

text = "Hello, I am a bit of corpus. Why don't you segment me?"
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')

# span_tokenize yields (start, end) character offsets for each sentence,
# rather than the sentence strings themselves
for start, end in tokenizer.span_tokenize(text):
    print("%d\t%d\t%s" % (start, end, text[start:end]))

# 0	28	Hello, I am a bit of corpus.
# 29	54	Why don't you segment me?
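For readers without NLTK at hand, the idea behind span_tokenize — returning (start, end) character offsets into the original text rather than copies of the sentences — can be sketched with the standard library alone. This is a rough regex approximation (split after '.', '!' or '?' followed by whitespace or end of string), not Punkt's trained, abbreviation-aware model:

```python
import re

def naive_span_tokenize(text):
    """Yield (start, end) offsets of naively detected sentences.

    A crude stand-in for PunktSentenceTokenizer.span_tokenize:
    it splits after sentence-final punctuation and will be fooled
    by abbreviations like "Dr." that Punkt handles.
    """
    start = 0
    for m in re.finditer(r'[.!?](\s+|$)', text):
        end = m.start() + 1  # include the punctuation mark in the span
        yield (start, end)
        start = m.end()      # next sentence begins after the whitespace

text = "Hello, I am a bit of corpus. Why don't you segment me?"
print(list(naive_span_tokenize(text)))
```

On the gist's example sentence this yields the same offsets as the Punkt tokenizer, (0, 28) and (29, 54), since the text contains no abbreviations.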