@amnrzv
Last active November 1, 2017 13:04
A little example of NLTK's word and sentence tokenization. Output here: https://gist.github.com/amnrzv/2cbaad89e016acc0db410ec79a5ff40f
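import nltk
from nltk.tokenize import word_tokenize, sent_tokenize
# Both tokenizers rely on the pre-trained Punkt models: older NLTK releases
# load "punkt", recent ones load "punkt_tab". Fetch both quietly.
for resource in ("punkt", "punkt_tab"):
    nltk.download(resource, quiet=True)

text = "Hello, Mr. Jacobs. Nice to meet you!"
sentences = sent_tokenize(text)  # split the text into sentences
words = word_tokenize(text)      # split the text into word and punctuation tokens
print(sentences)
print(words)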
Output:
['Hello, Mr. Jacobs.', 'Nice to meet you!']
['Hello', ',', 'Mr.', 'Jacobs', '.', 'Nice', 'to', 'meet', 'you', '!']
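The interesting part of the output is that "Mr." neither ends a sentence nor loses its period, while the "." after "Jacobs" is split off as its own token. The sketch below is a deliberately naive, pure-Python illustration of that behaviour, not how NLTK works internally: NLTK's `sent_tokenize` uses the Punkt model, which learns abbreviations from data, and `word_tokenize` uses a Treebank-style tokenizer. The `ABBREVIATIONS` set and both function names are hypothetical, chosen only for this example.

```python
import re
from typing import List

# Hypothetical, hand-written abbreviation list. NLTK's Punkt model learns
# abbreviations from data rather than relying on a fixed list like this.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "st."}


def naive_sent_tokenize(text: str) -> List[str]:
    """Split after '.', '!' or '?' followed by whitespace, skipping known abbreviations."""
    sentences = []
    start = 0
    for match in re.finditer(r"[.!?](?=\s|$)", text):
        end = match.end()
        # Last whitespace-delimited token that ends at this punctuation mark.
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # e.g. "Mr." does not end a sentence
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def naive_word_tokenize(text: str) -> List[str]:
    """Split on whitespace, then peel trailing punctuation off each chunk."""
    tokens = []
    for chunk in text.split():
        if chunk.lower() in ABBREVIATIONS:
            tokens.append(chunk)  # keep the abbreviation's period attached
            continue
        word, punct = re.fullmatch(r"(.*?)([.,!?;:]*)", chunk).groups()
        if word:
            tokens.append(word)
        tokens.extend(punct)  # each punctuation mark becomes its own token
    return tokens


text = "Hello, Mr. Jacobs. Nice to meet you!"
print(naive_sent_tokenize(text))
print(naive_word_tokenize(text))
```

For this particular input both functions print the same lists as the NLTK output above. Removing the abbreviation check would split after "Mr." and produce three sentences instead of two. The sketch does not handle quotes, contractions (the Treebank-style tokenizer splits "don't" into "do" and "n't"), or abbreviations missing from the list, which is why a trained model like Punkt is preferable in practice.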