@Bajena
Created January 29, 2020 06:46
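require "open-uri" # adds #open to URI::FTP and URI::HTTP
require "zlib"

class Loader
  # Returns an Enumerator that yields one preprocessed row at a time.
  def load
    Enumerator.new { |main_enum| stream(main_enum) }
  end

  private

  def stream(main_enum)
    reader = nil
    file_uri.open do |file|
      reader = Zlib::GzipReader.new(file)
      # drop(1) skips the header line
      reader.each_line.lazy.drop(1).each do |line|
        main_enum << preprocess_row(line)
      end
    end
  ensure
    reader&.close
  end

  def file_uri
    URI.parse("ftp://user:password@host.com/file.csv.gz")
  end

  # Naive split: strips every double quote, then splits on commas.
  def preprocess_row(row)
    row.chomp.gsub('"', "").split(",")
  end
end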
@SampsonCrowley

When building a proper streaming CSV parser, you would actually open an IO object, pass it to CSV.foreach, and then feed each line into the IO.
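
A minimal sketch of that idea, under some assumptions: it hands the `Zlib::GzipReader` directly to `CSV.new` (rather than `CSV.foreach`) so the CSV library reads from the IO as rows are requested. The helper name `csv_rows` and the in-memory sample data are hypothetical and stand in for the FTP file used in the gist.

```ruby
require "csv"
require "stringio"
require "zlib"

# Hypothetical helper: lazily yields parsed rows from an IO holding gzipped CSV.
def csv_rows(gzipped_io)
  Enumerator.new do |rows|
    reader = Zlib::GzipReader.new(gzipped_io)
    begin
      # headers: true consumes the first line as the header row,
      # which replaces the manual drop(1) in the gist.
      CSV.new(reader, headers: true).each { |row| rows << row.fields }
    ensure
      reader.close
    end
  end
end

# Sample data (an assumption for illustration): one field with an embedded
# comma and one with an embedded newline.
data = %(id,name\n1,"Doe, Jane"\n2,"multi\nline"\n)
io = StringIO.new(Zlib.gzip(data))

csv_rows(io).each { |fields| p fields }
```

Because the CSV library does the record splitting itself, quoted commas and quoted newlines stay inside a single field.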

@SampsonCrowley

What about CSVs containing quoted newlines, nested quotes, etc.?
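
To make the concern concrete, here is a small sketch comparing the gist's `chomp.gsub.split` approach with Ruby's standard `CSV` library. The sample strings are made up for illustration.

```ruby
require "csv"

# A line with a quoted comma and an escaped (doubled) quote.
line = %("Doe, Jane","said ""hi""",42)

# The gist's approach: strip all quotes, then split on commas.
naive = line.chomp.gsub('"', "").split(",")
# => ["Doe", " Jane", "said hi", "42"]

# The CSV library honors quoting rules.
parsed = CSV.parse_line(line)
# => ["Doe, Jane", "said \"hi\"", "42"]

# A single record whose first field contains a quoted newline.
record = %("multi\nline",1\n)

line_count = record.each_line.count     # => 2, the record is cut in half
row_count  = CSV.parse(record).length   # => 1, the record stays whole
```

The naive version yields four fields instead of three and loses the inner quotes, and line-based reading splits the multi-line record before any parsing can happen.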
