@robertomiranda
Created February 2, 2012 20:24
Scraping with Nokogiri
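require 'net/http'
require 'nokogiri'
require 'open-uri'

module Scrapper
  # Returns the src URLs of candidate <img> tags on the page at +url+.
  def get_images_from(url)
    images = []
    # URI.open comes from open-uri; bare Kernel#open does not fetch URLs on Ruby 3.0+.
    doc = Nokogiri::HTML(URI.open(url))
    # Keep absolute, non-GIF src values that do not look like ads or carry a
    # query string. The trailing [1] keeps the first match under each parent
    # element, not only the first match in the document.
    doc.xpath("/html/body//img[@src[not(contains(., '.gif'))
                                    and contains(., '://')
                                    and not(contains(., 'ads.')
                                            or contains(., 'ad.')
                                            or contains(., '?'))]][1]").each do |img|
      image = img['src']
      # Content-Length in bytes; fetched here but not yet used for filtering.
      size = image_size_from(image)
      images << image
    end
    images
  end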
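  # Returns the Content-Length (in bytes) reported by a HEAD request for +asset+.
  def image_size_from(asset)
    url = URI.parse(asset)
    # Enable TLS when the asset is served over https.
    Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') do |http|
      # request_head expects a path, so pass the request URI rather than the full URL.
      response = http.request_head(url.request_uri)
      # A missing header yields nil, which to_i turns into 0.
      response['content-length'].to_i
    end
  end
end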
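
The XPath predicate in get_images_from packs several substring checks into one expression. As a reading aid, the same checks can be sketched in plain Ruby. The method name `candidate_image?` and the sample URLs are hypothetical and not part of the gist. The sketch mirrors only the substring tests, not the `[1]` position filter.

```ruby
# Hypothetical helper (not part of the gist): mirrors the substring checks
# that the XPath predicate applies to each img src value.
def candidate_image?(src)
  return false if src.include?('.gif')     # skip GIFs
  return false unless src.include?('://')  # require an absolute URL
  # Reject anything that looks like an ad host or carries a query string.
  ['ads.', 'ad.', '?'].none? { |needle| src.include?(needle) }
end

# Made-up sample values, one per rule.
srcs = [
  'http://example.com/photo.jpg',
  'http://ads.example.com/banner.jpg',
  '/relative/logo.png',
  'http://example.com/anim.gif',
  'http://example.com/pic.png?w=100'
]
kept = srcs.select { |src| candidate_image?(src) }
```

Because these are plain substring tests, 'ad.' also rejects unrelated hosts such as download.example.com, so the filter can drop legitimate images.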