@glickmac
Last active December 17, 2019 20:06
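from nltk.tokenize import RegexpTokenizer

# `text` is assumed to already hold the full document as a single string.
# Keep runs of word characters, which drops punctuation and whitespace.
tokenizer = RegexpTokenizer(r'\w+')
tokens = tokenizer.tokenize(text)
# Lowercase so that "Word" and "word" are treated as the same token.
tokens = [token.lower() for token in tokens]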
## Longest word and its length (on ties, max() returns the first one found)
longest = max(tokens, key=len)
print("Longest word in text: " + longest + " is " + str(len(longest)) + " characters long")
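## Longest real word: "cutterigsloop" is not a real word, so drop it before searching again
tokens = [token for token in tokens if token != "cutterigsloop"]
longest_real = max(tokens, key=len)
print("Longest real word in text: " + longest_real + " is " + str(len(longest_real)) + " characters long")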
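The snippet above hard-codes a single excluded token and reports only the first longest word. A minimal sketch of a more general version follows, assuming the text is available as a plain string. The helper name `longest_words`, its `exclude` parameter, and the sample sentence are hypothetical and not part of the original gist. It uses `re.findall(r'\w+', ...)` from the standard library, which should yield the same tokens as `RegexpTokenizer(r'\w+')` for this pattern.

```python
import re


def longest_words(text, exclude=frozenset()):
    """Return every longest token in `text` (ties included), skipping `exclude`.

    Hypothetical helper, not from the original gist.
    """
    # Extract runs of word characters and lowercase them, as the gist does.
    tokens = [t.lower() for t in re.findall(r'\w+', text)]
    # Drop any tokens the caller considers "not real words".
    tokens = [t for t in tokens if t not in exclude]
    if not tokens:
        return []
    max_len = max(len(t) for t in tokens)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(t for t in tokens if len(t) == max_len))


# Made-up sample sentence for illustration only.
sample = "The cutterigsloop sailed past a lighthouse and a schoolroom."
print(longest_words(sample))
print(longest_words(sample, exclude={"cutterigsloop"}))
```

Returning a list rather than a single string makes ties visible, which `max(tokens, key=len)` hides by returning only the first longest token it encounters.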