@sj26
Last active August 29, 2015 14:06
Ruby CSV parser supporting both RFC 4180 doubled-quote escaping ("") and Unix-style backslash escaping (\")
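require "strscan"

# Lenient CSV reader: inside a quoted field, a double quote may be escaped
# either the RFC 4180 way ("") or Unix-style with a backslash (\").
class LenientCSV
  include Enumerable

  def initialize(source)
    @scanner = StringScanner.new(source)
  end

  # Yields each row as an array of strings; returns an Enumerator without a block.
  def each
    return enum_for(:each) unless block_given?

    until @scanner.eos?
      yield scan_row
    end
  end

  private

  # Reads comma-separated fields until a line ending or the end of input.
  def scan_row
    [].tap do |row|
      loop do
        if (value = scan_field)
          row << value
        end

        if @scanner.scan(/,/)
          next
        elsif @scanner.scan(/\r?\n/) || @scanner.eos?
          return row
        else
          raise "Malformed row at #{@scanner.inspect}"
        end
      end
    end
  end

  def scan_field
    scan_quoted_field ||
      scan_unquoted_field ||
      scan_empty_field
  end

  def scan_quoted_field
    return unless @scanner.scan(/"/)

    value = +""
    until @scanner.eos?
      if @scanner.scan(/[^\\"]+/)
        value << @scanner.matched
      # Unix-style quoting: a backslash passes the next character through
      # verbatim (no translation, so "\t" becomes "t", not a tab).
      elsif @scanner.scan(/\\/)
        char = @scanner.getch
        break if char.nil? # backslash at EOF: reported as unterminated below
        value << char
      # CSV RFC 4180-style quoting: "" inside a quoted field is one quote.
      elsif @scanner.scan(/""/)
        value << '"'
      # A lone quote closes the field.
      elsif @scanner.scan(/"/)
        return value
      end
    end

    # Previously unreachable: an unterminated quote was silently dropped.
    raise "Unexpected EOF inside quoted value #{@scanner.inspect}"
  end

  def scan_unquoted_field
    @scanner.scan(/[^,\r\n]+/)
  end

  # An empty field is followed directly by a separator, a line ending, or the
  # end of input (so "a,b," yields ["a", "b", ""]).
  def scan_empty_field
    "" if @scanner.eos? || @scanner.check(/[,\r\n]/)
  end
end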
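To see the escaping rule from scan_quoted_field in isolation, here is a minimal standalone sketch. `decode_quoted_field` is a hypothetical helper, not part of the gist: it reads one quoted field from the start of a string, accepts both doubled quotes and backslash escapes, and returns the decoded value together with the unread remainder of the input.

```ruby
require "strscan"

# Hypothetical helper (not part of the gist). Decodes one quoted field at the
# start of `text`, accepting RFC 4180 doubled quotes ("") and Unix-style
# backslash escapes (\x => x). Returns [value, rest_of_input].
def decode_quoted_field(text)
  scanner = StringScanner.new(text)
  raise ArgumentError, "field must start with a quote" unless scanner.scan(/"/)

  value = +""
  until scanner.eos?
    if scanner.scan(/[^\\"]+/)
      value << scanner.matched # plain run of characters
    elsif scanner.scan(/\\/)
      char = scanner.getch # the escaped character, taken literally
      raise ArgumentError, "dangling backslash" if char.nil?
      value << char
    elsif scanner.scan(/""/)
      value << '"' # doubled quote => one literal quote
    else
      scanner.scan(/"/) # only a lone closing quote can remain here
      return [value, scanner.rest]
    end
  end
  raise ArgumentError, "unterminated quoted field"
end

# Both escaping styles decode to the same value.
p decode_quoted_field('"say ""hi""",next') # => ["say \"hi\"", ",next"]
p decode_quoted_field('"say \"hi\"",next') # => ["say \"hi\"", ",next"]
# Caveat of accepting backslashes: a strict RFC 4180 file containing a
# Windows path loses its backslashes.
p decode_quoted_field('"C:\dir",x')        # => ["C:dir", ",x"]
```

As in the gist, `""` is tried before a lone `"`; the other order would let the first quote of a doubled pair close the field early.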