@MaryamZi
Last active December 7, 2015 10:11
Blog - NLP - Word Tokenization Example
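# Note: word_tokenize needs NLTK's Punkt tokenizer data, installed once via
# nltk.download('punkt') (the resource name may differ in newer NLTK releases).
from nltk.tokenize import word_tokenize
from nltk.tokenize import wordpunct_tokenize

test_sentence = "Hi Mr. Sam, today's a good day to learn NLP. It's a well-known field of study."

# Method 1 - Without using NLTK - Splitting at whitespace
# Punctuation stays attached to the neighbouring word (e.g. "Sam,", "NLP.").
words = test_sentence.split()
print(words)

# Method 2 - Using word_tokenize of NLTK
words = word_tokenize(test_sentence)
print(words)

# Method 3 - Using wordpunct_tokenize of NLTK
# Splits into runs of word characters and runs of punctuation.
words = wordpunct_tokenize(test_sentence)
print(words)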
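
To see why the third method splits "today's" and "well-known" into pieces, here is a minimal sketch that approximates wordpunct_tokenize with the standard library alone. It assumes the pattern `\w+|[^\w\s]+`, which is what NLTK's documentation gives for its WordPunctTokenizer; check this against your installed version. The names `WORDPUNCT_PATTERN` and `regex_wordpunct_tokenize` are hypothetical and not part of NLTK.

```python
import re

# Same sentence as in the gist.
test_sentence = "Hi Mr. Sam, today's a good day to learn NLP. It's a well-known field of study."

# Assumed pattern: a run of word characters, OR a run of characters that are
# neither word characters nor whitespace (i.e. punctuation).
WORDPUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")

def regex_wordpunct_tokenize(text):
    """Hypothetical helper: return all non-overlapping pattern matches, in order."""
    return WORDPUNCT_PATTERN.findall(text)

tokens = regex_wordpunct_tokenize(test_sentence)
print(tokens)
```

Because apostrophes and hyphens are not word characters, they come out as separate tokens, so "today's" is returned as three tokens and "well-known" as three tokens. Whitespace splitting, by contrast, keeps each of them as a single token.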