@starenka
Created October 29, 2020 09:00
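# coding=utf-8
# pip install pandas openpyxl  (xlrd >= 2.0 no longer reads .xlsx files)
import collections

import pandas as pd

# Municipality database dated 1 January 2014, going by the file name in the URL
df = pd.read_excel('https://www.mvcr.cz/odk2/soubor/databaze-obci-1-1-2014-xlsx.aspx')

cleaned = []
for one in df['Název obce'].dropna():  # 'Název obce' = municipality name; skip empty cells, if any
    parts = one.split()
    if len(parts) >= 3:  # e.g. Dětřichov nad Bystřicí -> Dětřichov
        name = parts[0]
    elif len(parts) == 2:  # e.g. Dlouhá Brtnice -> Brtnice
        # keep the first word when the second is only a Roman-numeral suffix
        name = parts[0] if parts[1] in ('I', 'II', 'III') else parts[1]
    else:
        name = parts[0]
    # use a 2-letter ending for names in -ov / -ín, otherwise the last 3 letters
    cleaned.append(name[-2:] if name.endswith(('ov', 'ín')) else name[-3:])

# print the 20 most common name endings with their counts
print(collections.Counter(cleaned).most_common(20))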