@regebro
Created April 12, 2017 16:11
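A script to fix spurious encoding errors in a GEDCOM file. It reads the ANSEL-encoded input and writes it out as Latin-1. The input and output paths are its two command-line arguments.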
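import sys

import smc.bibencodings  # imported for its side effect: registers the 'ansel' codec

with open(sys.argv[1], 'rb') as infile, open(sys.argv[2], 'wb') as outfile:
    # GEDCOM is line-based, so reading line by line (instead of fixed-size
    # chunks) never separates an ANSEL combining diacritic from its letter.
    for raw in infile:
        # Drop the spurious 0xCF bytes before decoding.
        raw = raw.replace(b'\xcf', b'')
        text = raw.decode('ansel')
        try:
            out = text.encode('latin-1')
        except UnicodeEncodeError as e:
            # Report the problem, then substitute '?' for characters
            # that have no Latin-1 equivalent.
            print(e)
            out = text.encode('latin-1', 'replace')
        outfile.write(out)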
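To show what the fallback branch does to names outside Latin-1, here is a minimal sketch of the per-line transformation factored into a hypothetical helper, `clean`. The helper is not part of the original script. It takes the source encoding as a parameter so it can be tried with the built-in `'utf-8'` codec as a stand-in, because the `'ansel'` codec is only available once `smc.bibencodings` is installed and imported.

```python
def clean(raw, encoding='ansel'):
    """Hypothetical helper mirroring the script's loop body.

    Drops 0xCF bytes, decodes with `encoding`, and re-encodes as Latin-1.
    Returns (output_bytes, error), where error is the UnicodeEncodeError
    from the strict attempt, or None if every character fit in Latin-1.
    """
    text = raw.replace(b'\xcf', b'').decode(encoding)
    try:
        return text.encode('latin-1'), None
    except UnicodeEncodeError as e:
        # Same fallback as the script: unencodable characters become '?'.
        return text.encode('latin-1', 'replace'), e


# 'utf-8' stands in for 'ansel' here so no third-party package is needed.
out, err = clean('Łódź'.encode('utf-8'), 'utf-8')
print(out, type(err).__name__)
```

With the `'replace'` handler, `ó` survives as the Latin-1 byte `0xF3`, but `Ł` and `ź` have no Latin-1 form and each becomes `?`. The conversion keeps going, but those letters are lost in the output file.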