Last active
September 6, 2022 12:05
Word count. Please download http://www.gutenberg.org/cache/epub/11/pg11.txt and save it as alice.txt, the filename the script opens.
from string import punctuation

# Read the whole book and normalize it to lowercase.
texto = open('alice.txt').read().lower()
# Replace every punctuation character with a space so words split cleanly.
for c in punctuation:
    texto = texto.replace(c, ' ')
texto = texto.split()
# Count how many times each word occurs.
dic = {}
for p in texto:
    if p not in dic:
        dic[p] = 1
    else:
        dic[p] += 1
print(f'{dic["alice"]} times')
Fabiovilela - Here on Linux I checked the encoding of the input file alice.txt and it reported utf-8. If you download the text and paste it into Notepad, for example, it will be saved as iso-8859-1 by default, which may be what is causing the error.
One more thing: how could I build this with a dict comprehension instead of a loop?
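On the dict comprehension question, here is one possible sketch, not the gist author's own answer. The function names count_words and count_words_fast and the sample sentence are made up for illustration. A dict comprehension works, but calling list.count for every distinct word rescans the whole list each time, so collections.Counter from the standard library is usually the better fit for a full book.

```python
from collections import Counter
from string import punctuation


def split_words(text):
    """Lowercase the text, blank out punctuation, and split into words."""
    text = text.lower()
    for c in punctuation:
        text = text.replace(c, ' ')
    return text.split()


def count_words(text):
    """Word counts via a dict comprehension (hypothetical helper)."""
    words = split_words(text)
    # One entry per distinct word; words.count() rescans the list each time,
    # so this is quadratic in the worst case.
    return {w: words.count(w) for w in set(words)}


def count_words_fast(text):
    """Same result in a single pass, using collections.Counter."""
    return dict(Counter(split_words(text)))


# Made-up sample text, not taken from alice.txt.
sample = "Alice was beginning. Alice! said the Cat; alice?"
print(f'{count_words(sample)["alice"]} times')
```

Both functions return the same dictionary; only the amount of work differs.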
Fabiovilela, try adding the "encoding" argument to open:
file = open(filename, encoding="utf8")
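To illustrate why the explicit encoding matters, here is a small self-contained sketch. The helper name read_text, the temporary file, and the sample string are assumptions for the demo and are not part of the gist. It writes a UTF-8 file containing one non-ASCII character and reads it back twice: once as UTF-8 and once as ISO-8859-1.

```python
import os
import tempfile


def read_text(filename, encoding="utf-8"):
    """Read a whole text file, decoding it with an explicit encoding."""
    with open(filename, encoding=encoding) as f:
        return f.read()


# Sample text with a typographic apostrophe (U+2019), which is not ASCII.
original = "Alice\u2019s Adventures in Wonderland"

# Write the sample to a temporary file as UTF-8, then read it back twice.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
try:
    with open(path, "w", encoding="utf-8") as f:
        f.write(original)
    as_utf8 = read_text(path)                           # matching encoding
    as_latin1 = read_text(path, encoding="iso-8859-1")  # mismatched encoding
finally:
    os.remove(path)

print(as_utf8 == original)
print(as_latin1 == original)
```

Reading with the wrong encoding does not always raise an error: ISO-8859-1 accepts any byte, so the text just comes back garbled. Going the other way, decoding ISO-8859-1 bytes as UTF-8, is the case that typically fails with UnicodeDecodeError.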
I have a lightly modified version here: https://github.com/voyeg3r/dotfiles/blob/master/bin/countwords.py
It opens the file with a context manager (with open(file) as f) and prints the result using the f-string formatting introduced in Python 3.6.