@brenoimatos
Created January 7, 2021 20:49
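import nltk
import unidecode

# Download the NLTK stopword corpus (only needed once per environment)
nltk.download('stopwords')
stopwords_nltk = nltk.corpus.stopwords.words('portuguese')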
update = ['tá', 'pra', 'tô', 'cê', 'pro', 'então', 'meu', 'em',
          'você', 'de', 'ao', 'os', 'vou', 'vai', 'vem', 'mim',
          'uns', 'sei', 'quero', 'ser', 'ver', 'aqui', 'faz']
# Concatenate the two lists
stopwords_raw = [*stopwords_nltk, *update]
# Remove the accents; a set drops duplicates and makes membership checks fast
stopwords = {unidecode.unidecode(word) for word in stopwords_raw}
# Join the words that are not stopwords into a single space-separated string
def remove_stopwords(word_list):
    return ' '.join(word for word in word_list if word not in stopwords)

# df is assumed to be an existing pandas DataFrame whose 'tokens' column holds lists of words
df['wordcloud'] = df['tokens'].apply(remove_stopwords)
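
Because the stopwords above have their accents removed, the filter only matches tokens that were normalized the same way. The sketch below illustrates that point on a toy input. It is not the gist's code: it swaps `unidecode` for the standard library's `unicodedata` so it runs without extra packages, and `strip_accents`, `sample_stopwords`, and the token list are hypothetical names and data made up for the illustration.

```python
import unicodedata


def strip_accents(text):
    """Remove diacritics via Unicode decomposition (stdlib stand-in for unidecode)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical mini stopword list; the snippet above builds the real one from NLTK.
sample_stopwords = {strip_accents(w) for w in ["você", "tá", "pra", "de", "então"]}


def remove_stopwords(word_list, stopwords=sample_stopwords):
    # Compare the accent-free form of each token against the accent-free stopwords,
    # but keep the original token text in the output.
    return " ".join(w for w in word_list if strip_accents(w) not in stopwords)


tokens = ["você", "tá", "pronto", "pra", "festa"]
result = remove_stopwords(tokens)
print(result)  # pronto festa
```

If the tokens in `df['tokens']` still carry accents, words such as "você" would pass through the original filter unchanged, so normalizing both sides (as this sketch does) is one way to avoid that.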