@gwpl
Created May 25, 2017 14:08
Line-by-line encoding repair using ftfy to fix mojibake and other text corruption (ftfy: https://github.com/LuminosoInsight/python-ftfy)
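#!/usr/bin/env python3
# Usage: pass the input file as the first argument; repaired text goes to stdout.
import sys
import ftfy

# errors='replace' turns undecodable bytes into U+FFFD instead of raising.
with open(sys.argv[1], mode='rt', encoding='utf8', errors='replace') as f:
    for line in f:
        # Write bytes directly so the output is UTF-8 regardless of the locale.
        sys.stdout.buffer.write(ftfy.fix_text(line).encode('utf8', 'replace'))
        #print(ftfy.fix_text(line).rstrip())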
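
For readers unfamiliar with the term, the following stdlib-only sketch shows how one common kind of mojibake arises and what the `errors='replace'` option used above does. The helper `undo_latin1_mojibake` is a hypothetical name introduced here for illustration. It reverses only one specific mix-up (UTF-8 bytes read as Latin-1) and is not how ftfy works internally; ftfy detects and repairs many more cases.

```python
# Mojibake: UTF-8 bytes that were wrongly decoded as Latin-1.
original = "café"
garbled = original.encode("utf-8").decode("latin-1")  # 'cafÃ©'


def undo_latin1_mojibake(text):
    """Hypothetical helper: reverse a UTF-8-read-as-Latin-1 mix-up."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text does not fit this pattern, so leave it unchanged.
        return text


repaired = undo_latin1_mojibake(garbled)  # 'café'

# errors='replace' substitutes U+FFFD for bytes that are not valid UTF-8,
# so reading never fails, but the original byte is lost.
lossy = b"caf\xe9\n".decode("utf-8", errors="replace")  # 'caf\ufffd\n'
```

Note that bytes replaced with U+FFFD at read time cannot be recovered afterwards, so the script above is best suited to files that are mostly valid UTF-8.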