@hungneox
Created July 9, 2023 09:17
word count
import json
import re

filename = './swedish.txt'
print('Start reading')

# Maps each word to the number of times it occurs in the file.
dictionary = {}
with open(filename, encoding="utf-8") as file:
    for line in file:
        for word in line.split():
            # Trim surrounding punctuation in one pass, regardless of its order.
            word = word.strip('.-\\,?')
            # Skip empty tokens, tokens starting with a digit, and list markers such as "a)".
            if word == "" or word[0].isdigit() or re.match(r'^[a-zA-Z]+\)', word):
                continue
            print('.')  # progress indicator, one dot per counted word
            dictionary[word] = dictionary.get(word, 0) + 1

# Sort alphabetically by word before writing the counts out.
sorted_dictionary = dict(sorted(dictionary.items()))
with open('result.json', 'w', encoding="utf-8") as fp:
    json.dump(sorted_dictionary, fp, ensure_ascii=False)
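
The counting step can also be pulled into a function so it can be tried on a few lines without touching `swedish.txt` or `result.json`. The following is only a sketch of one possible variant, not the gist author's code: `count_words`, `PUNCTUATION`, and `LIST_MARKER` are hypothetical names introduced here, and `collections.Counter` is swapped in for the hand-written dictionary update. It assumes the same filtering rules as the script above.

```python
import re
from collections import Counter

# Hypothetical names for illustration; they do not appear in the original script.
PUNCTUATION = '.-\\,?'                      # characters trimmed from both ends of a token
LIST_MARKER = re.compile(r'^[a-zA-Z]+\)')   # tokens such as "a)" or "ii)"


def count_words(lines):
    """Count words across an iterable of text lines and return them sorted by word."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            word = token.strip(PUNCTUATION)
            # Same skip rules as the script: empty, leading digit, or list marker.
            if not word or word[0].isdigit() or LIST_MARKER.match(word):
                continue
            counts[word] += 1
    return dict(sorted(counts.items()))


print(count_words(["a) Hej hej, världen.", "2023 Hej?"]))
# → {'Hej': 2, 'hej': 1, 'världen': 1}
```

Counting is case-sensitive here, as in the script, so `Hej` and `hej` stay separate entries; lowercasing each word before counting would merge them if that is preferred.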