@Sanqui
Created March 30, 2013 16:20
Converts a file from the Kamenický encoding (KEYBCS2, CP895), a legacy Czechoslovak encoding, to Unicode and prints it in the system's encoding. Not even iconv supports this!
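#!/bin/python3
from sys import argv

# Characters for the high half of the table; highchars[0] is byte 129 (0x81).
highchars = 'üéďäĎŤčěĚĹÍľĺÄÁÉžŽôöÓůÚýÖÜŠĽÝŘťáíóúňŇŮÔšřŕŔ¼§«»░▒▓│┤╡╢╖╕╣║╗╝╜╛┐└┴┬├─┼╞╟╚╔╩╦╠═╬╧╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀αßΓπΣσµτΦΘΩδ∞φε∩≡±≥≤⌠⌡÷≈°∙·√ⁿ²■'

out = ""
with open(argv[1], 'rb') as f:
    # Iterating over a bytes object yields ints in Python 3.
    for byte in f.read():
        if byte < 128:
            # The ASCII range maps straight through.
            out += chr(byte)
        else:
            out += highchars[byte-129]
print(out)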
cz-fish commented Dec 15, 2013

Nice, but I think you're missing the character 'Č' at position 128, which will therefore be translated to highchars[-1].
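To make the off-by-one concrete: byte 128 gives index 128 - 129 = -1, which Python resolves to the last character of the string ('■') instead of 'Č'. A minimal sketch of one possible fix follows, assuming the table should start at 0x80 with 'Č'. The names `HIGHCHARS` and `decode_kamenicky` are made up for illustration, and the no-break space for byte 0xFF is an assumption borrowed from CP437, whose upper rows the table mirrors; the gist's string has no entry for that byte.

```python
# Sketch only: a full 128-entry table covering bytes 0x80..0xFF.
# 'Č' at 0x80 comes from the comment above; '\u00a0' at 0xFF is an
# assumption (CP437 uses a no-break space there).
HIGHCHARS = (
    'Č'
    'üéďäĎŤčěĚĹÍľĺÄÁ'
    'ÉžŽôöÓůÚýÖÜŠĽÝŘť'
    'áíóúňŇŮÔšřŕŔ¼§«»'
    '░▒▓│┤╡╢╖╕╣║╗╝╜╛┐'
    '└┴┬├─┼╞╟╚╔╩╦╠═╬╧'
    '╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀'
    'αßΓπΣσµτΦΘΩδ∞φε∩'
    '≡±≥≤⌠⌡÷≈°∙·√ⁿ²■'
    '\u00a0'
)


def decode_kamenicky(data: bytes) -> str:
    """Decode bytes to str: ASCII passes through, high bytes use the table."""
    # Indexing from 0x80 (128) rather than 129 lines 'Č' up with byte 128.
    return ''.join(
        chr(b) if b < 0x80 else HIGHCHARS[b - 0x80]
        for b in data
    )
```

With this table, `decode_kamenicky(b'\x80esk\x98')` yields `'Český'`. Joining the pieces once also avoids growing a string with `+=` inside the loop, though for small files the difference is unlikely to matter.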
