@davidx
Created August 11, 2009 17:26
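require 'rubygems'
require 'fileutils'
require 'mechanize'

# Directory where downloaded images are stored.
ARCHIVE_PATH = '/tmp/images'
FileUtils.mkdir_p ARCHIVE_PATH

# Mechanize agent that identifies itself as Safari on a Mac.
# (Older Mechanize releases exposed this class as WWW::Mechanize.)
a = Mechanize.new { |agent| agent.user_agent_alias = 'Mac Safari' }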
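# Walk the board index (imgboard.html) and pages 1 through 10.
(0..10).each do |page_number|
  page_name = page_number == 0 ? 'imgboard' : page_number
  url = "http://img.4chan.org/b/#{page_name}.html"
  puts "getting #{url}"
  a.get(url) do |page|
    page.links.each do |link|
      # Assumes image links carry the file name as their link text.
      file_name = link.text.to_s.strip.downcase
      # Anchor the pattern so only plain "<digits>.<ext>" names are accepted.
      next unless file_name =~ /\A[0-9]+\.(jpg|png|gif)\z/
      file_path = File.join(ARCHIVE_PATH, file_name)
      if File.exist?(file_path)
        puts "exists, skipping #{file_path}"
        next
      end
      file = link.click
      puts "creating #{file_path}"
      # Write in binary mode so the image bytes are stored unmodified.
      File.open(file_path, 'wb') do |f|
        f.write(file.body)
      end
    end
  end
end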