@glickmac
Last active December 20, 2019 16:20
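from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import RegexpTokenizer

# Assumption: `tokenizer` is not defined in this gist; a word-character RegexpTokenizer stands in for it
tokenizer = RegexpTokenizer(r"\w+")

def text_processing(input_text):
    tokens = tokenizer.tokenize(input_text)
    lemmatizer = WordNetLemmatizer()
    tokens = [lemmatizer.lemmatize(i) for i in tokens]
    # Drop English stop words (requires the NLTK "stopwords" and "wordnet" data)
    stops = set(stopwords.words('english'))
    values = [i for i in tokens if i not in stops]
    # Drop leftovers that are likely lemmatizer artifacts ("was" -> "wa", "us" -> "u")
    weird = ["wa", "u"]
    values = [i for i in values if i not in weird]
    return values

# `text` is assumed to be a string defined earlier (not shown in this gist)
values = text_processing(text)
print("The number of unique words is: " + str(len(set(values))))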
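As a possible follow-up to the unique-word count, the sketch below also reports the most frequent tokens. It is not part of the original gist: `summarize_tokens`, `top_n`, and `sample_values` are hypothetical names, and the sample list only stands in for whatever `text_processing` would return.

```python
from collections import Counter

def summarize_tokens(values, top_n=3):
    """Return the unique-word count and the top_n most frequent tokens."""
    counts = Counter(values)
    return len(counts), counts.most_common(top_n)

# Hypothetical sample standing in for the output of text_processing
sample_values = ["cat", "dog", "cat", "bird", "dog", "cat"]
unique_count, top_words = summarize_tokens(sample_values, top_n=2)
print("The number of unique words is: " + str(unique_count))
print(top_words)
```

Using `Counter` keeps the unique count consistent with `len(set(values))` while making the frequencies available in the same pass.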