@boie0025
Last active May 17, 2017 20:20
scraper-psuedocode-exmaple
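require "nokogiri"
require "open-uri"

module Producers
  class DataModelA
    # ... some Ruby magic to return the correct subclass for a state

    # Fetches and parses the subclass's URL once, then memoizes the document.
    def page
      @page ||= Nokogiri::HTML(URI.open(self.class::URL))
    end
  end

  # Scraper for one specific state. Every producer exposes the same scraped_* methods.
  class SpecificState < DataModelA
    URL = "http://www.example.com/foo"

    def scraped_data_point_a
      page.xpath('some xpath where our data is')
    end

    def scraped_data_point_b
      page.xpath('some other xpath')
    end
  end
end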
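module Consumers
  # Copies scraped values from any producer onto a persistence object.
  class GenericDataModelA
    attr_accessor :scrape_obj, :persistence_object

    def initialize(scrape_obj, persistence_object)
      self.scrape_obj = scrape_obj
      self.persistence_object = persistence_object
    end

    # Assigns each scraped_* value to the matching setter on the persistence object.
    def persist!
      %i(data_point_a data_point_b).each do |meth|
        persistence_object.public_send("#{meth}=", scrape_obj.public_send("scraped_#{meth}"))
      end
    end
  end
end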
require "ostruct"

class ScraperJob
  def perform
    # Could iterate over the specific producers, passing each one into the generic consumer. Since all of the
    # scrape/producer classes have the same data methods, you're free to define them for arbitrary pages.
    scraper = Producers::SpecificState.new
    persistence_obj = OpenStruct.new # add methods, or use AR, or something else.
    Consumers::GenericDataModelA.new(scraper, persistence_obj).persist!
  end
end
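
The comment in `ScraperJob#perform` mentions iterating over several producers with one generic consumer. Here is a minimal sketch of that idea, not the gist's actual implementation. It assumes a plain Hash (`FAKE_PAGES`) as a stand-in for the parsed Nokogiri page so it runs offline, and a `Struct` (`Record`) as the persistence object. The names `SketchProducers`, `StateOne`, `StateTwo`, `PAGE_KEY`, `SketchConsumer`, `FAKE_PAGES`, and `Record` are all hypothetical.

```ruby
# Hypothetical stand-ins for parsed pages, so the sketch needs no network or Nokogiri.
FAKE_PAGES = {
  state_one: { "heading" => "alpha one", "total" => "10" },
  state_two: { "title" => "alpha two", "count" => "20" }
}.freeze

module SketchProducers
  class Base
    # Assumption: a real producer would fetch and parse self.class::URL here.
    def page
      @page ||= FAKE_PAGES.fetch(self.class::PAGE_KEY)
    end
  end

  # Each producer knows where its own page keeps the data,
  # but exposes the same scraped_* interface.
  class StateOne < Base
    PAGE_KEY = :state_one

    def scraped_data_point_a
      page.fetch("heading")
    end

    def scraped_data_point_b
      page.fetch("total")
    end
  end

  class StateTwo < Base
    PAGE_KEY = :state_two

    def scraped_data_point_a
      page.fetch("title")
    end

    def scraped_data_point_b
      page.fetch("count")
    end
  end
end

# Hypothetical persistence object; an ActiveRecord model could take its place.
Record = Struct.new(:data_point_a, :data_point_b)

class SketchConsumer
  FIELDS = %i[data_point_a data_point_b].freeze

  def initialize(scrape_obj, persistence_object)
    @scrape_obj = scrape_obj
    @persistence_object = persistence_object
  end

  # Copies every scraped_* value onto the matching setter, then returns the record.
  def persist!
    FIELDS.each do |field|
      @persistence_object.public_send("#{field}=", @scrape_obj.public_send("scraped_#{field}"))
    end
    @persistence_object
  end
end

# One generic consumer handles every producer class in turn.
records = [SketchProducers::StateOne, SketchProducers::StateTwo].map do |producer_class|
  SketchConsumer.new(producer_class.new, Record.new).persist!
end
```

Adding another page then only means adding another producer class with the same two `scraped_*` methods; the consumer and the loop stay unchanged.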
@boie0025

This scraping pattern was built in collaboration with Ryan Long (https://github.com/rtlong) and JD Guzman (https://github.com/jdguzman)
