Last active
December 7, 2015 10:11
Blog - NLP - Word Tokenization Example
from nltk.tokenize import word_tokenize
from nltk.tokenize import wordpunct_tokenize

test_sentence = "Hi Mr. Sam, today's a good day to learn NLP. It's a well-known field of study."

# Method 1 - Without using NLTK - splitting at whitespace
# Punctuation stays attached to the neighbouring word (e.g. "Sam,").
words = test_sentence.split()
print(words)

# Method 2 - Using word_tokenize of NLTK
# (may require the Punkt tokenizer data, installable with nltk.download)
words = word_tokenize(test_sentence)
print(words)

# Method 3 - Using wordpunct_tokenize of NLTK
# Splits into runs of word characters and runs of punctuation.
words = wordpunct_tokenize(test_sentence)
print(words)
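
To see why splitting on whitespace and punctuation-aware tokenization give different results, here is a minimal sketch that approximates punctuation-aware splitting with only the standard library. The helper name `regex_wordpunct` is hypothetical and is not part of NLTK. The pattern is an assumption about how a tokenizer of this kind can be written, not a replacement for NLTK's own tokenizers.

```python
import re

# Hypothetical helper (not an NLTK function): split text into runs of
# word characters and runs of non-space punctuation.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]+")


def regex_wordpunct(text):
    """Return alternating word and punctuation tokens found in text."""
    return TOKEN_PATTERN.findall(text)


sample = "It's a well-known field."

# Whitespace splitting keeps punctuation glued to the words.
print(sample.split())

# The regex separates apostrophes, hyphens and full stops into their own tokens.
print(regex_wordpunct(sample))
```

With this sketch, a contraction such as "It's" becomes three tokens and "well-known" is split at the hyphen, whereas `split()` leaves both intact.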