@pedrovanzella
Created March 24, 2011 22:19
Looks for the existence of dirs on a numbered list. Done in order to learn net/http and threads.
#!/usr/bin/env ruby
#
###################################################
# crawler.rb #
# Pedro Vanzella - pedro@pedrovanzella.com #
# #
# Looks for the existence of directories on a #
# numbered, but sparse, list #
###################################################
require "net/http"
require "uri"
# Request every numbered path in a..b and report each one that is not a 404.
def crawl(a, b)
  (a..b).each do |n|
    uri = URI.parse("http://twitpic.com/#{n}/")
    # I know twitpic has alphanumeric URLs, but this was not the original URL anyway ;)
    http = Net::HTTP.new(uri.host, uri.port)
    request = Net::HTTP::Get.new(uri.request_uri)
    begin
      response = http.request(request)
      puts "[#{n}] => [#{response.code}]" unless response.code == '404'
    rescue Timeout::Error
      puts "Timeout on [#{n}]. Perhaps we're being throttled?"
    end
  end
end
# All three arguments are required.
if ARGV.length < 3
  puts "USAGE: ruby crawler.rb FIRST LAST THREADS"
  exit
end
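x = ARGV[0].to_i
y = ARGV[1].to_i
threads = ARGV[2].to_i
# Guard against division by zero below.
abort "THREADS must be a positive integer" if threads < 1
t = [] # Threads array
total = y - x + 1       # number of IDs in the inclusive range x..y
chunk = total / threads # IDs per thread, rounded down
threads.times do |n|
  first = x + n * chunk
  # The last thread also takes the remainder of an uneven division.
  last = n == threads - 1 ? y : first + chunk - 1
  puts "[#{n}] (#{first}..#{last})"
  t[n] = Thread.new { crawl(first, last) }
end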
# Join our threads
t.each do |thread|
  thread.join
end
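
The script divides FIRST..LAST among the threads by integer division, which is where uneven ranges go wrong. As a rough sketch of one way to split an inclusive range evenly, the helper below spreads the remainder over the first few chunks. The name `split_range` and its parameters are hypothetical and not part of the original script, and this is only one possible approach.

```ruby
# Hypothetical helper (not in the original script): split the inclusive
# range first..last into at most `parts` contiguous chunks whose sizes
# differ by no more than one.
def split_range(first, last, parts)
  raise ArgumentError, "parts must be positive" unless parts.positive?

  total = last - first + 1
  return [] if total <= 0

  parts = [parts, total].min        # never create empty chunks
  base, extra = total.divmod(parts) # the first `extra` chunks get one more ID

  start = first
  Array.new(parts) do |i|
    size = base + (i < extra ? 1 : 0)
    chunk = start..(start + size - 1)
    start += size
    chunk
  end
end

# Print the chunks in the same format the script uses for its thread ranges.
split_range(1, 10, 3).each_with_index do |range, i|
  puts "[#{i}] (#{range.first}..#{range.last})"
end
```

Each returned range could then be handed to its own thread, for example `Thread.new { crawl(range.first, range.last) }`. Capping `parts` at the number of IDs means no thread is started for an empty range.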