@bry4n
Created November 1, 2010 16:42
Simple DSL web scraper using Nibbler + Nokogiri + Faraday (fewer than 35 LOC)
require 'webfetch'

rails = WebFetch.parse("http://rubygems.org/gems/rails") do
  element :title
  element '#markup p' => :description
end

puts rails.title
#=> "rails | RubyGems.org | your community gem host"
puts rails.description
#=> "Ruby on Rails is a full-stack web framework optimized for programmer happiness and sustainable productivity. It encourages beautiful code by favoring convention over configuration."
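
The `element` calls in the block above run as class-level calls on an anonymous parser subclass. The following is a minimal sketch of that mechanism using only the standard library. `MiniParser` and `build_parser` are hypothetical stand-ins invented for illustration (they are not Nibbler's API), and the "document" is a plain Hash rather than parsed HTML.

```ruby
# Hypothetical stand-in for a Nibbler-style base class (not Nibbler itself).
class MiniParser
  # Rules are stored per subclass in a class-level instance variable.
  def self.rules
    @rules ||= {}
  end

  # Accepts `element :title` or `element 'selector' => :name`.
  def self.element(spec)
    selector, name = spec.is_a?(Hash) ? spec.first : [spec.to_s, spec]
    rules[name] = selector
    attr_reader name
  end

  def self.parse(doc)
    new(doc)
  end

  # Assumption: `doc` is a Hash keyed by selector, standing in for real HTML.
  def initialize(doc)
    self.class.rules.each do |name, selector|
      instance_variable_set("@#{name}", doc[selector])
    end
  end
end

# Inside Class.new's block, self is the new class, so instance_eval runs
# the DSL block with `element` resolving to the class method above.
def build_parser(&block)
  Class.new(MiniParser) { instance_eval(&block) }
end

parser = build_parser do
  element :title
  element '#markup p' => :description
end

page = parser.parse('title' => 'rails', '#markup p' => 'A web framework')
puts page.title
#=> "rails"
puts page.description
#=> "A web framework"
```

Because each call to `build_parser` creates a fresh subclass, rules declared in one block do not leak into another parser.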

# Gemfile
source 'https://rubygems.org' # the `source :rubygems` shorthand is deprecated in Bundler

gem 'nokogiri'
gem 'faraday'
gem 'nibbler'
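
# webfetch.rb
require 'rubygems'
require 'bundler/setup'
require 'nibbler'
require 'nokogiri'
require 'faraday'

class WebFetch
  MAX_REDIRECTS = 5

  attr_accessor :uri, :document, :klass

  def self.parse(uri, &block)
    new(uri, &block)
  end

  def initialize(uri, &block)
    @document, @uri = fetch uri
    # Build an anonymous Nibbler subclass from the DSL block, then parse the body with it.
    @klass = Class.new(Nibbler) { instance_eval(&block) }.parse(@document.body)
  end

  # Delegate unknown methods (e.g. #title) to the parsed Nibbler object.
  def method_missing(name, *args, &block)
    return super unless @klass.respond_to?(name)
    @klass.send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @klass.respond_to?(name, include_private) || super
  end

  private

  # Returns [response, final_uri], following at most MAX_REDIRECTS redirects.
  def fetch(uri, limit = MAX_REDIRECTS)
    response = Faraday.get uri
    if [301, 302].include?(response.status) && response.headers['location']
      raise "Too many redirects: #{uri}" if limit.zero?
      # Location may be relative, so resolve it against the current URI.
      return fetch(URI.join(uri, response.headers['location']).to_s, limit - 1)
    end
    [response, uri]
  end
end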
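
The redirect handling in `fetch` can be exercised without network access by injecting the HTTP call. The sketch below is one possible shape, not part of the gist: `follow_redirects`, `FakeResponse`, `REDIRECT_CODES`, and the stubbed `PAGES` table are all hypothetical names, and treating 303, 307, and 308 as redirects is an assumption that goes beyond the 301/302 check in the original.

```ruby
require 'uri'

# Hypothetical minimal response object: status code, headers hash, body.
FakeResponse = Struct.new(:status, :headers, :body)

# Assumption: wider set of redirect codes than the gist's [301, 302].
REDIRECT_CODES = [301, 302, 303, 307, 308].freeze

# `http_get` is any callable that takes a URL string and returns an object
# responding to #status and #headers. Returns [response, final_url].
def follow_redirects(url, http_get, limit = 5)
  response = http_get.call(url)
  location = response.headers['location']
  return [response, url] unless REDIRECT_CODES.include?(response.status) && location
  raise ArgumentError, "too many redirects at #{url}" if limit.zero?

  # URI.join resolves relative Location values against the current URL.
  follow_redirects(URI.join(url, location).to_s, http_get, limit - 1)
end

# Stubbed responses so the sketch runs offline.
PAGES = {
  'http://example.com/old' => FakeResponse.new(301, { 'location' => '/new' }, ''),
  'http://example.com/new' => FakeResponse.new(200, {}, '<title>ok</title>')
}.freeze
stub_get = ->(url) { PAGES.fetch(url) }

response, final_url = follow_redirects('http://example.com/old', stub_get)
puts final_url
#=> "http://example.com/new"
puts response.status
#=> 200
```

With Faraday, a lambda such as `->(u) { Faraday.get(u) }` could be passed as `http_get`. The limit guards against redirect loops, which would otherwise recurse until the stack overflows.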