Skip to content

Instantly share code, notes, and snippets.

@sparkertime
Created July 13, 2012 23:10
Show Gist options
  • Save sparkertime/3108152 to your computer and use it in GitHub Desktop.
Save sparkertime/3108152 to your computer and use it in GitHub Desktop.
Thresher - Separates the Wheat from the Chaff
require "rake"
require "csv"
require_relative '../../lib/thresher_sketch'
namespace :chicago_new do
namespace :contracts do
desc "Load city contracts"
task :load => :environment do
file_name = path_to(ENV["contracts_file"] || "Contracts.csv")
ContractsThresher.thresh file_name, :rejections_to => "rejected_contracts.csv"
end
end
end
require_relative 'thresher'
require_relative '../app/models/vendor'
require_relative '../app/models/contract'
class ContractsThresher # named after the source rather than a model, but they're conflated in this case
extend Thresher
create_or_find Vendor, :by => :external_id do
fields :external_id => "Vendor ID",
:name => "Vendor Name",
:address1 => "Address 1",
:address2 => "Address 2",
:city => "City",
:state => "State",
:zipcode => "Zip"
reject_if_blank :all
end
upsert Contract, :by => [:purchase_order, :revision] do
fields :purchase_order => "Purchase Order (Contract) Number",
:vendor => {:association => Vendor, :by => {:external_id => "Vendor ID"}}, # a little repetitive, but anything else was too much magic for me
:revision => "Revision Number",
:description => "Purchase Order Description",
:specification => "Specification Number",
:award_amount => {:column => "Award Amount", :formatting => format_award_amount}, # possible by some context/method_missing black magic. Would appreciate your thoughts on this - feels a little too evil, but the alternatives (wrapping with procs, accepting a symbol and calling #send behind the scenes) feel less obvious to use. Also this method enforces a standalone method to do the formatting, which I like.
:start_at => {:column => "Start Date", :formatting => format_date},
:end_at => {:column => "End Date", :formatting => format_date},
:approval_at => {:column => "End Date", :formatting => format_date},
:contract_type => "Contract Type",
:department => "Department",
:procurement_type => "Procurement Type"
reject_if_blank :all
end
def format_award_amount(raw_award)
raw_award.strip(0)
end
def format_date(raw_date)
Date.strptime('%m/%d/%Y')
end
end
@chadwpry
Copy link

Hey Scott,
I do not see lib/thresher_sketch.rb. Could you provide this as part of a gist as well? It would be helpful to better understand the class with its context. Comment added below in the mean time.

@chadwpry
Copy link

I like where this is headed. This will be immensely helpful in generically applying new structured data for the scope of an endpoint. Will validation rules be applied in some way when a record exists for an upsert? Maybe a method hook contract such as "thresher_record_exists?" and "thresher_handle_existing_record" for when duplication is found via the underlying thresher lib?

Great start!

@sparkertime
Copy link
Author

Chad:

Good idea on the hooks. I'll make sure and add that. As for the other files, you can see this at https://github.com/citizenparker/chicago-finances/commit/139cded2d83ba2b04ce337d9c00fbd7e11c89ad4. As always, feedback welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment