@peterkappus
Last active August 29, 2015 14:04
Fix those annoying "invalid byte sequence in UTF-8" errors when parsing CSVs, etc. Simply strips out all non-ASCII bytes (invalid and undefined characters alike). Use with caution.
# Fix those annoying "invalid byte sequence in UTF-8" errors when parsing CSVs, etc.
# Prints to STDOUT, so you can redirect the result to a new file:
#   ruby <script> input.csv > output.csv
# NOTE: the input is read as raw bytes ('binary'), so EVERY non-ASCII byte counts as
# undefined and is stripped, including valid UTF-8 characters such as "é".
# This may be a Very Bad Thing™ - YMMV. Suggestions welcome...
# Also normalizes the annoying line endings Excel produces (\r\n or a bare \r) to \n.
input_file = ARGV[0] or abort "Usage: ruby #{$PROGRAM_NAME} INPUT_FILE"
cleaned = File.binread(input_file)
              .encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
              .gsub(/\r\n?/, "\n")
puts cleaned
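A gentler alternative, in response to "suggestions welcome": if the file is mostly valid UTF-8 with a few bad bytes, Ruby 2.1+'s `String#scrub` drops only the invalid sequences and keeps legitimate multibyte characters. This is a different technique from the binary re-encode above, sketched under that assumption; the helper name `clean_utf8` is hypothetical.

```ruby
# Hypothetical helper (not part of the original gist): remove only *invalid*
# UTF-8 byte sequences and normalize Excel-style line endings, while keeping
# valid non-ASCII characters. Requires Ruby 2.1+ for String#scrub.
def clean_utf8(text)
  text
    .dup                             # don't mutate (or trip over a frozen) caller string
    .force_encoding(Encoding::UTF_8) # interpret the raw bytes as UTF-8
    .scrub('')                       # delete byte sequences that aren't valid UTF-8
    .gsub(/\r\n?/, "\n")             # CRLF or bare CR -> LF
end
```

Usage: `puts clean_utf8(File.binread(ARGV[0]))`. The trade-off: the original one-liner turns "café" into "caf", whereas `scrub` keeps "café" and removes only the stray bytes. If the file is really in another encoding (e.g. Windows-1252), neither approach is right; transcoding from that encoding would be the better fix.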