@nickcampbell18
Last active February 3, 2016 11:19
Desktopography image scraper (Ruby)

Installation

  1. Install Ruby and bundler (using your preferred mechanism)
  2. Run bundler: $ bundle install
  3. Run the script: $ bundle exec ruby desktopography.rb

Arguments

Arguments are specified as environment variables, although you could just as easily edit the constants defined in the first 10 lines of the script.

# The site URL is unlikely to change, but you can override it if you like
SITE="http://desktopography.net" ruby desktopography.rb

# Specify which year you would like to download (defaults to last year)
YEAR=2014 ruby desktopography.rb

# What dimensions would you like? The default is the largest widescreen (16:9) size, 2560x1440
# Supported dimensions are listed in the "download" dropdown on each image page.
FORMAT=1600x1200 ruby desktopography.rb

# The extension given to saved files defaults to jpg
# (this only affects the file name, not what is downloaded)
EXT=jpg ruby desktopography.rb

# The default directory is ./desktopography_{specified_year}
# Please use a full path if specifying another
DIR=/path/to/images/desktopography/2014 ruby desktopography.rb
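
As a rough illustration of how these variables are resolved, the sketch below mirrors the script's `ENV.fetch` pattern with a block supplying each default. The helper names `resolve_settings` and `build_download_url` are hypothetical and do not appear in the script, and the sketch takes a plain hash instead of the real `ENV` so it can be tried without touching your environment. The image name `some-image` is a made-up placeholder.

```ruby
# Hypothetical helper (not in desktopography.rb): resolves the settings from
# any hash-like object, using the same defaults as the script.
def resolve_settings(env, now: Time.now)
  year = env.fetch('YEAR') { (now.year - 1).to_s }
  {
    site:   env.fetch('SITE')   { 'http://desktopography.net' },
    year:   year,
    format: env.fetch('FORMAT') { '2560x1440' },
    ext:    env.fetch('EXT')    { 'jpg' },
    dir:    env.fetch('DIR')    { "desktopography_#{year}" }
  }
end

# Hypothetical helper: same URL layout as the script's download_url.
def build_download_url(settings, image_name)
  [settings[:site], 'exhibition', settings[:year], image_name,
   settings[:format], 'download'].join('/')
end

settings = resolve_settings({ 'YEAR' => '2014', 'FORMAT' => '1600x1200' })
puts settings[:dir]
# prints "desktopography_2014"
puts build_download_url(settings, 'some-image')
# prints "http://desktopography.net/exhibition/2014/some-image/1600x1200/download"
```

Because `fetch` only evaluates its block when the key is missing, the default directory follows whichever year was actually chosen.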
# desktopography.rb
%w(fileutils nokogiri typhoeus).each { |name| require name }

# Configuration comes from environment variables, with defaults.
SITE   = ENV.fetch('SITE')   { 'http://desktopography.net' }
YEAR   = ENV.fetch('YEAR')   { (Time.now.year - 1).to_s }
FORMAT = ENV.fetch('FORMAT') { '2560x1440' }
EXT    = ENV.fetch('EXT')    { 'jpg' }
DIR    = ENV.fetch('DIR')    { "desktopography_#{YEAR}" }

# mkdir_p also creates missing parent directories for nested DIR paths.
FileUtils.mkdir_p(DIR)

URL = [SITE, 'exhibition', YEAR].join('/')

# Build the download URL for one image at the requested dimensions.
def download_url(image_name)
  [URL, image_name, FORMAT, 'download'].join('/')
end

puts "Querying url: #{URL}"
gallery = Nokogiri::HTML(Typhoeus.get(URL).body)
downloader = Typhoeus::Hydra.new

gallery.css('a.thumbnail').each do |link|
  individual_img_url = link['href']
  image_name = individual_img_url.to_s.split('/').last
  next if image_name.nil? # link without a usable href
  puts "* Found image: #{image_name}..."
  url = download_url(image_name)

  # Strip any path components, then replace unsafe characters.
  safe_name = image_name.gsub(/^.*(\\|\/)/, '').gsub(/[^0-9A-Za-z.\-]/, '_')
  fname = "#{DIR}/#{safe_name}.#{EXT}"

  if File.exist?(fname)
    puts " -> File exists, skipping: #{fname}"
    next
  end

  puts " -> Starting download"
  file = File.open(fname, 'wb') # binary mode, since the body is image data

  request = Typhoeus::Request.new(url, followlocation: true)
  # Stream each chunk to disk rather than buffering the whole image.
  request.on_body { |chunk| file.write(chunk) }
  request.on_complete do |_res|
    puts "* Finished image: #{file.path}"
    file.close
  end
  downloader.queue(request)
end

# Run all queued requests.
downloader.run

# Gemfile
source 'https://rubygems.org'

gem 'nokogiri'
gem 'typhoeus'
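
The file-naming step in desktopography.rb can be tried in isolation. The sketch below applies the same two substitutions the script uses before writing an image to disk. The helper name `safe_file_name` is hypothetical, and the sample inputs are made up for illustration.

```ruby
# Hypothetical helper (not in the script): applies the script's two gsub
# calls and appends the extension.
def safe_file_name(image_name, ext = 'jpg')
  # Drop everything up to and including the last slash or backslash.
  base = image_name.gsub(/^.*(\\|\/)/, '')
  # Replace anything other than digits, ASCII letters, '.' and '-'.
  base = base.gsub(/[^0-9A-Za-z.\-]/, '_')
  "#{base}.#{ext}"
end

puts safe_file_name('../../etc/passwd')
# prints "passwd.jpg"
puts safe_file_name('night sky (v2)')
# prints "night_sky__v2_.jpg"
```

Stripping the path first means a link whose last segment contains directory separators cannot write outside the target directory.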