@jasonshen
Forked from jamiew/tumblr-photo-ripper.rb
Last active May 13, 2016 17:36
# Usage:
#   [sudo] gem install mechanize
#   ruby tumblr-photo-ripper.rb
require 'rubygems'
require 'fileutils'
require 'mechanize'

# Your Tumblr subdomain, e.g. "jamiew" for "jamiew.tumblr.com"
site = "thoughtjoy"
FileUtils.mkdir_p(site)

concurrency = 8 # photos downloaded in parallel per batch
num = 50        # posts requested per API page
start = 0       # offset of the first post in the current page

loop do
  puts "start=#{start}"
  url = "http://#{site}.tumblr.com/api/read?type=photo&num=#{num}&start=#{start}"
  page = Mechanize.new.get(url)
  doc = Nokogiri::XML.parse(page.body)

  # Keep only the 1280px variant of each photo
  images = (doc / 'post photo-url').select { |x| x['max-width'].to_i == 1280 }
  image_urls = images.map(&:content)

  # Download in batches of `concurrency` threads, waiting for each batch to finish
  image_urls.each_slice(concurrency) do |group|
    threads = group.map do |image_url|
      Thread.new do
        puts "Saving photo #{image_url}"
        begin
          file = Mechanize.new.get(image_url)
          filename = File.basename(file.uri.to_s.split('?')[0])
          file.save("#{site}/#{filename}")
        rescue Mechanize::ResponseCodeError => e
          puts "Error getting file, #{e}"
        end
      end
    end
    threads.each(&:join)
  end

  # `num` limits posts, not photos, so count posts to detect the last page
  posts = doc.search('post').count
  puts "#{images.count} images found in #{posts} posts (num=#{num})"
  if posts < num
    puts "our work here is done"
    break
  else
    start += num
  end
end
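The paging and batching logic of the script can be exercised without network access. The following is a minimal sketch, not the author's code: `rip_all`, `fetch_page` and `download` are hypothetical names, and the two lambdas stand in for the Tumblr API request and the Mechanize download. It assumes, as the script does, that a page shorter than `num` marks the end of the archive.

```ruby
# Hypothetical sketch: the ripper's paging and batching logic with the
# network calls replaced by injectable lambdas.
#   fetch_page.call(start, num) -> array of URLs for one page (assumed)
#   download.call(url)          -> whatever a single download returns (assumed)
def rip_all(fetch_page:, download:, num: 50, concurrency: 8)
  start = 0
  saved = []
  mutex = Mutex.new # guards `saved`, which worker threads append to

  loop do
    urls = fetch_page.call(start, num)

    # Start at most `concurrency` threads, then wait for the whole batch.
    urls.each_slice(concurrency) do |group|
      group.map do |url|
        Thread.new do
          result = download.call(url)
          mutex.synchronize { saved << result }
        end
      end.each(&:join)
    end

    # A page with fewer than `num` items means the archive is exhausted.
    break if urls.size < num
    start += num
  end

  saved
end
```

With a fake 7-item archive and `num: 3`, `rip_all` requests offsets 0, 3 and 6 and stops after the one-item final page. As in the script, an archive whose size is an exact multiple of `num` costs one extra request that returns an empty page. Joining each batch before starting the next caps simultaneous downloads at `concurrency`, at the cost of waiting for the slowest download in every batch; a fixed pool of worker threads pulling from a `Queue` would avoid that stall.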
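The script derives each filename by cutting the URL string at the first `?`. A slightly more robust variant, sketched here as a hypothetical `filename_for` helper that is not part of the original, parses the URL with Ruby's standard `URI` library and takes the last segment of its path, which also drops any `#fragment`. Note that `URI.parse` raises `URI::InvalidURIError` for malformed URLs, such as ones containing unescaped spaces.

```ruby
require 'uri'

# Hypothetical helper (not in the original script): return the last path
# segment of an image URL, ignoring any query string or fragment.
def filename_for(url)
  File.basename(URI.parse(url).path)
end

puts filename_for("http://example.com/photos/tumblr_abc_1280.jpg?foo=1")
# prints "tumblr_abc_1280.jpg"
```

Inside the thread block, the filename line would then become something like `filename = filename_for(file.uri.to_s)`.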
@jasonshen
Author

Mechanize 2.5.1 removed the "save_as" method, leaving only "save", so I've updated the script to call "save". See sparklemotion/mechanize#246.
