@sanikamal
Created January 28, 2021 08:42
Text Tokenization using NLTK
# Text Tokenization using NLTK
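# sent_tokenize and word_tokenize rely on NLTK's Punkt models. If they are
# missing, NLTK raises a LookupError naming the resource to fetch with
# nltk.download() ('punkt', or 'punkt_tab' on newer NLTK releases).
from nltk.tokenize import WordPunctTokenizer, sent_tokenize, word_tokenize

in_text = (
    'Use this option to select your font. '
    'The Show only monospaced fonts option if selected '
    'shortens the list of available fonts.'
)

# Sentence tokenization: split the text into a list of sentences
print(sent_tokenize(in_text))

# Word tokenization: split the text into words and punctuation tokens
print(word_tokenize(in_text))

# Word-punct tokenization: split into runs of word characters and runs of punctuation
print(WordPunctTokenizer().tokenize(in_text))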
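
To make the last step less opaque, here is a minimal sketch of what word-punct tokenization does, using only the standard library. The regular expression is the one NLTK documents for WordPunctTokenizer; the helper name `word_punct_tokenize` and the sample sentence are illustrative assumptions, not part of the original snippet or of NLTK.

```python
import re

# Pattern documented for NLTK's WordPunctTokenizer: a run of word
# characters, or a run of characters that are neither word nor whitespace.
WORD_PUNCT_PATTERN = re.compile(r"\w+|[^\w\s]+")


def word_punct_tokenize(text):
    """Hypothetical helper: split text into alphanumeric and punctuation runs."""
    return WORD_PUNCT_PATTERN.findall(text)


sample = "Don't split e-mail addresses, please."
print(word_punct_tokenize(sample))
# ['Don', "'", 't', 'split', 'e', '-', 'mail', 'addresses', ',', 'please', '.']
```

Because every punctuation run becomes its own token, contractions and hyphenated words are broken apart, so this style is usually a poor fit when "Don't" or "e-mail" should stay recognisable. It needs no downloaded models, unlike the Punkt-based sentence tokenizer.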