@nebirhos
Created November 5, 2012 13:12
Grabs exhibition images from www.palazzostrozzi.org
require "rubygems"
require "mechanize"

# %u is replaced with the numeric exhibition id (idMostra).
url = "http://www.palazzostrozzi.org/AnagraficaOpere.jsp?idForm=13&idMostra=%u"
browser = Mechanize.new { |agent| agent.user_agent_alias = 'Mac Safari' }

(1..20).each do |id_mostra|
  begin
    page = browser.get(url % id_mostra)
    # page.at returns nil when the heading is missing, so guard before calling #text.
    title = page.at('h1.title')&.text.to_s.strip
    raise "No exhibition found at #{id_mostra}!" if title.empty?

    folder = "#{id_mostra}_#{title.gsub(/\W+/, '-').upcase}"
    links = page.search('table[summary="Lista Opere"] a[title="Scarica l\'immagine ad alta risoluzione"]')
    puts "#{title}: found #{links.length} images"
    Dir.mkdir(folder) unless File.directory?(folder)

    links.each do |link|
      href = link.attr(:href)
      filename = File.basename(href)
      # Pass the arguments as a list so no shell interprets the URL or the path.
      system("curl", "-o", File.join(folder, filename), href)
    end
  rescue StandardError => ex
    # StandardError instead of Exception, so Ctrl-C (Interrupt) still stops the script.
    puts "Error! #{ex}"
  end
end
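
The script assumes every `href` it finds is an absolute URL that curl can fetch directly. The sketch below shows how the page URL and folder name are built, and one possible way to resolve a relative `href` against the page URL with the standard `uri` library. The helpers `folder_name` and `absolute_url` and the sample title and image path are hypothetical names introduced for illustration; they are not part of the original gist or of the site.

```ruby
require "uri"

# Hypothetical helper: builds the folder name the same way the script does,
# collapsing each run of non-word characters into a single dash.
def folder_name(id_mostra, title)
  "#{id_mostra}_#{title.gsub(/\W+/, '-').upcase}"
end

# Hypothetical helper: resolves a possibly relative href against the page URL.
def absolute_url(page_url, href)
  URI.join(page_url, href).to_s
end

url = "http://www.palazzostrozzi.org/AnagraficaOpere.jsp?idForm=13&idMostra=%u"
page_url = url % 3

puts page_url
puts folder_name(3, "Sample title")                 # made-up title
puts absolute_url(page_url, "/images/sample.jpg")   # made-up image path
```

If relative links do turn up, passing the result of `absolute_url` to curl in place of the raw `href` would be one way to handle them.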