@pigoz
Created May 31, 2011 13:04
Ruby SQL to CSV exporter
require "rubygems"
begin
require 'sequel'
rescue LoadError
puts "Please run gem install sequel"
exit!
end
DB = Sequel.connect(:adapter => 'mysql2',
:host => '127.0.0.1',
:database => '???',
:user => '???',
:password => '???')
Dir['export/*.csv'].each { |f| File.delete(f); puts "deleted: #{f}" }
class CVSSQLExporter
def initialize(filename, db, sql)
@filename = File.join('export', filename)
@db = db
@sql = sql
end
def export()
require 'csv'
CSV.open(@filename, 'wb') do |csv|
first = true
@db[@sql].all.each do |row|
csv << row.inject([]) { |r, n| r << n[0] } and first = false if first
csv << row.inject([]) { |r, n| r << n[1] }
end
end
puts "exported: #{@filename}"
end #/def export
end
foo_sql = <<-SQL
select *
from bar as foo
SQL
CVSSQLExporter.new('foo.csv', DB, foo_sql).export
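
The two inject calls above build the header row from the hash keys and the data row from the hash values of each record Sequel yields. A minimal sketch of the equivalent, using a made-up row hash like the ones the query returns:

# Sequel yields each record as a hash of column => value,
# so the header/value extraction can also be written as:
row = { id: 1, name: 'foo' }   # example row with hypothetical columns
header = row.keys              # => [:id, :name]
values = row.values            # => [1, "foo"]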
@dukejones

Very useful, thank you.

@ttilberg

Thank you for leaving this behind; it was a useful template for me today. Using and first = false to write the header only once is a neat trick. One question I have:

You use @db[@sql].all.each, which loads the entire result set into memory first (.all) and then iterates over it. Is there a specific reason you chose that over iterating the records individually, a la @db[@sql].each? The latter worked fine for me today against SQL Server, though it may not behave the same on every database or adapter, or there may be something specific to my scenario that allowed it. For me it kept the memory footprint low (500 MB vs 3 GB), so if anyone stumbles across this in the future and runs into memory issues, it may be worth a try.
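
For anyone who hits the same memory issue, a minimal sketch of that streaming style, reusing the DB connection and foo_sql query defined in the gist above (the output path here is made up):

require 'csv'

# Iterate the dataset directly instead of calling .all first, so rows are
# handled one at a time rather than materialized in one big array.
# Whether the adapter truly streams from the server varies by database;
# some adapters need an explicit streaming option.
CSV.open('export/foo_streamed.csv', 'wb') do |csv|   # hypothetical output path
  first = true
  DB[foo_sql].each do |row|
    csv << row.keys if first
    first = false
    csv << row.values
  end
end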

Additionally, I found it helpful to accept a Sequel dataset argument instead of a raw SQL string. That let me explore my data in "Sequel style" and still pass the traditional DB[sql] when I wished.

If anyone finds it helpful, here is my solution based on this gist.

Problem: take a 5-million-row table and export it into smaller CSVs that Excel can open (its row limit is roughly 1 million). I chose to export the table partitioned by a specific set of keys.

class SqlCsvExporter
  def initialize(filename, dataset)
    # Collapse any repeated ".csv" suffixes so the name ends in exactly one
    @filename = File.join('export', "#{filename}.csv").gsub(/(\.csv)+/, '.csv')
    @dataset = dataset
  end

  def export
    require 'csv'
    CSV.open(@filename, 'wb') do |csv|
      first = true
      @dataset.each do |row|
        # header row from the column names, written once
        csv << row.inject([]) { |r, n| r << n[0] } and first = false if first
        csv << row.inject([]) { |r, n| r << n[1] }
      end
    end
    puts "exported: #{@filename}"
  end
end

# ... Set DB

table = DB[:relatively_large_table]
# one hash per distinct (website, zip_code) pair; each hash doubles as a filter
partitions = table.distinct.select(:website, :zip_code).all

partitions.each do |filter|
  puts filename = filter.values.join.gsub(/[^a-z0-9]+/i, '_')
  SqlCsvExporter.new(
    filename,
    table.where(filter).order(:year_week, :brand, :size, :unique_product_number)
  ).export
end

Unrelated, but I thought it was cool that each row of the select distinct comes back as a hash and that the .where method accepts a hash as an equality filter. It made it really easy to dynamically partition my results across iterations.
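
To make that concrete, a small sketch using the table dataset from the snippet above, with hypothetical column values:

# Each element of partitions is a hash of the selected columns, and the very
# same hash can be passed back to .where as an equality filter.
filter = { website: 'example.com', zip_code: '90210' }   # one distinct row, made-up values
table.where(filter).sql
# => roughly: SELECT * FROM relatively_large_table
#             WHERE ((website = 'example.com') AND (zip_code = '90210'))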

Thanks again pigoz!

@hgundi

hgundi commented Oct 4, 2020

Thanks a ton. This is really useful.
