Last active
August 29, 2015 14:04
Save peterkappus/f49c98f7465eac89fa43 to your computer and use it in GitHub Desktop.
Fix those annoying "invalid byte sequence in UTF-8" errors when parsing CSVs, etc. Just strips out undef/invalid chars. Use with caution.
# Fix those annoying "invalid byte sequence in UTF-8" errors when parsing CSVs, etc.
# Prints to STDOUT, so you can redirect the output to a new file.
# NOTE: the input is treated as binary and every byte with no UTF-8 mapping
# (in practice, every non-ASCII byte) is stripped, which may be a Very Bad Thing™ - YMMV
# Suggestions welcome...
# Also fixes the annoying newline chars created by Excel: \r and \r\n become \n.
input_file = ARGV[0] or raise "No input file specified"
puts File.read(input_file).encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '').gsub(/\r\n?/, "\n")
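
Because the script above transcodes from 'binary', it drops every non-ASCII byte, including characters that were already valid UTF-8 (accented letters, for example). A gentler variant, sketched below, reinterprets the bytes as UTF-8 and removes only the sequences that are actually invalid. The name clean_utf8 is hypothetical and not part of the original gist, and the sketch assumes Ruby 2.1 or later for String#scrub.

```ruby
# Hypothetical helper (not from the original gist): keep valid UTF-8
# characters and drop only the bytes that do not form valid UTF-8.
def clean_utf8(bytes)
  # Reinterpret the raw bytes as UTF-8 without transcoding them.
  text = bytes.dup.force_encoding(Encoding::UTF_8)
  # String#scrub (Ruby 2.1+) replaces each invalid sequence, here with nothing.
  text = text.scrub('')
  # Normalize CR and CRLF line endings to LF.
  text.gsub(/\r\n?/, "\n")
end

# Usage sketch: ruby clean_utf8.rb input.csv > cleaned.csv
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  puts clean_utf8(File.binread(ARGV[0]))
end
```

This only helps when the file is mostly UTF-8 with a few bad bytes. If the file is really in another encoding (Windows-1252 exports from Excel are common), transcoding from that encoding is likely the better fix.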