@garretraziel
Created October 26, 2010 16:58
Program for removing Czech diacritics from UTF-8 files.
#! /usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import print_function  # print() then behaves the same on Python 2 and 3
import sys, codecs

# Accented Czech letters and their unaccented replacements, matched by position.
dictableSmall = list(u"ěščřžýáíéúůóťďň")
nedictableSmall = list(u"escrzyaieuuotdn")
dictableBig = list(u"ĚŠČŘŽÝÁÍÉÚŮÓŤĎŇ")
nedictableBig = list(u"ESCRZYAIEUUOTDN")

def main():
    # Both the input file and the output file are required.
    if len(sys.argv) != 3:
        print("Usage: nohus inputFile outputFile")
        print("Removes Czech diacritics from a UTF-8 file.")
        sys.exit(1)
    soubor = codecs.open(sys.argv[1], "r", "utf-8")
    znaky = soubor.read()
    soubor.close()
    vyslednyZnaky = u""
    for znak in znaky:
        if znak in dictableSmall:
            vyslednyZnaky += nedictableSmall[dictableSmall.index(znak)]
        elif znak in dictableBig:
            vyslednyZnaky += nedictableBig[dictableBig.index(znak)]
        else:
            vyslednyZnaky += znak
    # Write UTF-8 so that any remaining non-ASCII character cannot raise an encoding error.
    soubor = codecs.open(sys.argv[2], "w", "utf-8")
    soubor.write(vyslednyZnaky)
    soubor.close()

if __name__ == "__main__":
    main()
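
The script above maps each accented letter through hand-written lookup lists. As a possible alternative, and not part of the original gist, the same effect can be sketched with the standard `unicodedata` module: decompose each character into a base letter plus combining marks, then drop the marks. The function name `strip_diacritics` is hypothetical, and the sketch assumes Python 3. It removes accents from any script rather than only from Czech letters, which may or may not be what you want.

```python
import unicodedata

def strip_diacritics(text):
    # NFD splits an accented letter into its base letter followed by
    # one or more combining marks (for example "č" becomes "c" + U+030C).
    decomposed = unicodedata.normalize("NFD", text)
    # Keep every character that is not a combining mark.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

if __name__ == "__main__":
    print(strip_diacritics("Příliš žluťoučký kůň úpěl ďábelské ódy"))
```

Unlike the lookup lists, this approach needs no maintenance when a letter is missing from the table, but it only handles characters that Unicode defines a decomposition for (letters such as "ł" or "ø" are left unchanged).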