@aantix
Created November 7, 2011 01:16
Download all images for a given web page
require 'nokogiri'
require 'open-uri'
require 'uri'
require 'pathname'
pages = ['http://images.google.com/search?tbm=isch&hl=en&source=hp&q=dogs',
         'http://images.google.com/search?tbm=isch&hl=en&source=hp&q=cats',
         'http://images.google.com/search?tbm=isch&hl=en&source=hp&q=hats']
# Create the output directory up front; the original assumed it already existed.
out_dir = 'wedding_images'
Dir.mkdir(out_dir) unless Dir.exist?(out_dir)

pages.each do |page|
  # Use URI.open: since Ruby 3.0, Kernel#open no longer fetches URLs.
  doc = Nokogiri::HTML(URI.open(page))
  # Keep unique, non-empty links to .jpg/.png files, skipping proofs and thumbnails.
  hrefs = doc.css('a').map { |link| link['href'].to_s }.uniq.sort
  hrefs.reject! { |href| href.empty? || href =~ /PROOF|THUMB/ || href !~ /\.(jpg|png)/i }

  hrefs.each do |href|
    # Resolve both relative and absolute hrefs against the page URL.
    remote_file = URI.join(page, href)
    # Save under the file's base name so no nested directories are needed.
    out_file = File.join(out_dir, Pathname.new(remote_file.path).basename.to_s)
    puts "  Downloading #{remote_file} to #{out_file}"
    File.open(out_file, 'wb') do |file|
      file << remote_file.open.read
    end
  end
end
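
The link filtering and URL resolution steps of the script can be tried offline, without fetching any page. The sketch below is not part of the original gist: `image_urls` is a hypothetical helper name, the `example.com` URLs are placeholders, and it uses `URI.join` to resolve each link against the page it was found on, which is one reasonable way to handle relative and absolute hrefs alike.

```ruby
require 'uri'

# Hypothetical helper (not in the original gist): keep only .jpg/.png links,
# drop proofs and thumbnails, and resolve each link against the page URL.
def image_urls(page_url, hrefs)
  hrefs
    .reject { |href| href.empty? || href =~ /PROOF|THUMB/ || href !~ /\.(jpg|png)/i }
    .uniq
    .map { |href| URI.join(page_url, href).to_s }
end

# Placeholder hrefs, as they might appear in a page's <a> tags.
hrefs = ['/photos/a.jpg', 'b.PNG', 'THUMB_c.jpg', 'about.html', '']
puts image_urls('http://example.com/gallery/index.html', hrefs)
# prints:
#   http://example.com/photos/a.jpg
#   http://example.com/gallery/b.PNG
```

A root-relative href such as `/photos/a.jpg` replaces the page's path, while a bare file name such as `b.PNG` is resolved relative to the page's directory.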