

@becahp
Created July 2, 2020 00:01
A small snippet that builds a word-count dictionary from a text, ignoring stop words and punctuation.
from nltk.tokenize import word_tokenize  # needs NLTK's Punkt tokenizer data downloaded


def conta_palavras(text, stop_words, punctuations):
    # split the text into tokens
    tokens = word_tokenize(text)
    # build a dictionary mapping each word to its count
    wordcount = {}
    for word in tokens:
        word = word.lower()  # normalize everything to lowercase
        # skip stop words and punctuation; checking after lowercasing
        # means "The" is filtered the same way as "the"
        if word in stop_words or word in punctuations:
            continue
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
    return wordcount
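For comparison, here is a minimal sketch of the same idea built on the standard library's `collections.Counter`. It is not the author's code: the hypothetical `count_words` function swaps NLTK's `word_tokenize` for a simple regular-expression tokenizer (an assumption, so it runs without NLTK, and it will split contractions and hyphenated words differently), and it lowercases each token before filtering.

```python
import re
import string
from collections import Counter


def count_words(text, stop_words, punctuations):
    """Count lowercase words in `text`, skipping stop words and punctuation.

    Hypothetical variant of conta_palavras: a regex tokenizer stands in
    for NLTK's word_tokenize, so tokenization is only approximate.
    """
    # tokens are runs of word characters, or single punctuation characters
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # lowercase each token once, then drop stop words and punctuation
    words = (token.lower() for token in tokens)
    return Counter(w for w in words if w not in stop_words and w not in punctuations)


# usage example with a toy stop-word list
stop_words = {"the", "a", "is"}
punctuations = set(string.punctuation)
counts = count_words("The cat is on the mat. The mat is red!", stop_words, punctuations)
print(dict(counts))  # {'cat': 1, 'on': 1, 'mat': 2, 'red': 1}
print(counts.most_common(1))  # [('mat', 2)]
```

Because `Counter` is a `dict` subclass, its result can be used wherever the original dictionary was, and `most_common(n)` returns the top words without a separate sort.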