Created November 4, 2014 18:15
Parse irc logs with multiple encodings
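def decode(line, encodings):
    """Decode a bytes line using the first encoding that succeeds."""
    for encoding in encodings:
        try:
            return line.decode(encoding)
        except UnicodeDecodeError:
            pass
    # Fallback: drop undecodable bytes. Unreachable while 'latin1' is listed,
    # because latin1 maps every byte value to a character.
    return line.decode('utf-8', 'ignore')


if __name__ == '__main__':
    # Order matters: strict encoding first, permissive one last.
    encodings = ['utf-8', 'latin1']
    # Write UTF-8 explicitly rather than the platform default (Python 3).
    with open('error.log', 'rb') as infile, \
            open('error.out', 'a', encoding='utf-8') as outfile:
        outfile.writelines(decode(line, encodings) for line in infile)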
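
The order of the encodings list matters, since latin1 accepts any byte sequence and would mask valid UTF-8 if tried first. The following is a minimal sketch of that behavior, assuming Python 3; `decode_line` and the sample strings are illustrative names and data, not part of the original script.

```python
def decode_line(raw, encodings=("utf-8", "latin1")):
    """Return raw bytes decoded with the first encoding that accepts them."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if no listed encoding can decode every byte.
    return raw.decode("utf-8", "ignore")


# Hypothetical sample lines: the same word stored in two encodings.
samples = [
    "caf\u00e9".encode("utf-8"),   # valid UTF-8
    "caf\u00e9".encode("latin1"),  # b'caf\xe9', invalid as UTF-8
]

# UTF-8 first: both lines come out as the intended text.
decoded = [decode_line(raw) for raw in samples]

# latin1 first: the UTF-8 line is accepted by latin1 and turns into mojibake.
mojibake = decode_line(samples[0], ("latin1", "utf-8"))

print(decoded)
print(mojibake)
```

Putting the strictest encoding first lets a failed decode act as the detection step; a permissive single-byte encoding such as latin1 belongs at the end as the catch-all.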