@hakanensari
Created August 27, 2012 23:49
How to build a testable scraper

Separate concerns. You should probably encapsulate the following responsibilities in separate classes:

  1. Get a URL.
  2. Parse the response body.
  3. Build a Hash or PORO that represents the payload.

You're not hitting an API, so be prepared to fail. Use an HTTP library like Excon that handles retries. Don't stub HTTP requests when testing: your tests should not hide changes in the page behind the external URL.
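
As a rough illustration of the "get a URL" responsibility, here is a minimal sketch of a fetcher that retries on failure. It deliberately uses the standard library's Net::HTTP with a hand-rolled retry loop rather than Excon, so it runs without extra gems; Excon offers similar behaviour through its `idempotent` and `retry_limit` request options. The `Fetcher` class, its `retries:` and `transport:` parameters, and the `Fetcher::Error` class are all hypothetical names, not part of the original note.

```ruby
require 'net/http'
require 'uri'

# Hypothetical fetcher: its only job is to get a URL and return the body.
class Fetcher
  class Error < StandardError; end

  # transport: any object responding to #call(uri) and returning a body String.
  # It defaults to Net::HTTP and is injectable only so the retry loop can be
  # demonstrated without a network connection.
  def initialize(retries: 3, transport: nil)
    @retries = retries
    @transport = transport || method(:http_get)
  end

  def get(url)
    attempts = 0
    begin
      attempts += 1
      @transport.call(URI(url))
    rescue Error, SocketError, SystemCallError, Timeout::Error => e
      # Try again until the attempt budget is used up, then give up loudly.
      retry if attempts < @retries
      raise Error, "giving up on #{url} after #{attempts} attempts: #{e.message}"
    end
  end

  private

  # Treats any non-2xx response as a failure worth retrying.
  def http_get(uri)
    response = Net::HTTP.get_response(uri)
    raise Error, "unexpected status #{response.code}" unless response.is_a?(Net::HTTPSuccess)
    response.body
  end
end
```

With this sketch, `Fetcher.new.get(url)` would return the response body as a String, which is all the parser needs. In line with the advice above, real tests would still exercise the live URL rather than a substitute transport.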

Keep business logic out of your parsers. When testing them, simply assert that the parsing methods return something.

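# Wraps a parsed HTML node (for example, one returned by Nokogiri) and
# extracts raw text only. No business logic lives here.
class OfferParser
  def initialize(node)
    @node = node
  end

  # Raw price text, or nil when the element is missing.
  def price
    @node.at('.price')&.text
  end

  # Raw discount text, or nil when the element is missing.
  def discount
    @node.at('.discount')&.text
  end
end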

class Offer
  attr_accessor :price
  attr_accessor :discount
end

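# Turns raw Strings (or nil) from the parser into an Offer.
# Money is assumed to come from the money gem (require 'money').
class OfferBuilder
  attr_reader :offer

  def initialize
    @offer = Offer.new
  end

  # Leaves the price unset when val is nil.
  def add_price(val)
    @offer.price = Money.new(val.to_i) if val
  end

  # Keeps only the digits of val; nil becomes 0.
  def add_discount(val)
    @offer.discount = val.to_s.gsub(/\D/, '').to_i
  end
end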

When testing builders, you should not need any XML node. Values passed into methods should probably all be either Strings or nil.
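
A builder test along these lines might look like the following sketch, written with Minitest. To keep it runnable on its own, it repeats `Offer` and `OfferBuilder` and swaps in a tiny `Money` Struct as a stand-in for the money gem; that stand-in and the test names are assumptions made for illustration.

```ruby
require 'minitest/autorun'

# Stand-in for the money gem's Money class (an assumption, so the sketch
# runs without the gem).
Money = Struct.new(:cents)

# Offer and OfferBuilder repeated here so the sketch is self-contained.
class Offer
  attr_accessor :price, :discount
end

class OfferBuilder
  attr_reader :offer

  def initialize
    @offer = Offer.new
  end

  def add_price(val)
    @offer.price = Money.new(val.to_i) if val
  end

  def add_discount(val)
    @offer.discount = val.to_s.gsub(/\D/, '').to_i
  end
end

# Only Strings and nil are passed in: no XML node, no HTTP.
class OfferBuilderTest < Minitest::Test
  def setup
    @builder = OfferBuilder.new
  end

  def test_price_from_string
    @builder.add_price('1299')
    assert_equal Money.new(1299), @builder.offer.price
  end

  def test_price_from_nil
    @builder.add_price(nil)
    assert_nil @builder.offer.price
  end

  def test_discount_from_string
    @builder.add_discount('20% off')
    assert_equal 20, @builder.offer.discount
  end

  def test_discount_from_nil
    @builder.add_discount(nil)
    assert_equal 0, @builder.offer.discount
  end
end
```

Because the builder only ever sees Strings or nil, these tests stay fast and keep passing when the scraped markup changes; only the parser tests need to follow the page.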
