@joerussbowman
Created May 3, 2011 19:42
My first ruby script, checks vanity url redirects
require 'net/http'
require 'uri'
if ARGV.length < 2
  puts "Syntax: urltest.rb [redirects file] [host] (optional 1 for external redirects)"
  puts "example for local redirects: urltest.rb vanity_urls.txt www.example.com"
  exit
end
# File.exist? works on old and new Rubies; File.exists? was removed in Ruby 3.2.
if File.exist?(ARGV[0])
  redirects_file = ARGV[0]
else
  puts "#{ARGV[0]} does not exist."
  exit
end
$max_redirects = 10
host = ARGV[1]
max_threads = 5 # 4+1 for the main thread
threads = []
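# Request url and follow redirects; report any final response that is not a 2xx.
def fetch(url, info, limit = $max_redirects)
  # Stop once the redirect budget is used up so a redirect loop cannot recurse forever.
  if limit <= 0
    puts "\nError: #{info}: too many redirects\n"
    return
  end
  Net::HTTP.get_response(URI.parse(url)) do |response|
    case response
    when Net::HTTPSuccess # just do nothing
    when Net::HTTPRedirection
      # Location may be relative, so resolve it against the current URL.
      fetch(URI.join(url, response['location']).to_s, info, limit - 1)
    else
      File.open("output.log", "a") do |errors|
        errors.puts "\nError: REDIRECT INFO: #{info} REAL DESTINATION: #{url} [ #{response.code} ]\n"
      end
      puts "\nError: #{info}: #{response.code}\n"
    end
  end
end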
File.foreach(redirects_file) { |line|
  # format of redirect info is: /prettyurl /redirects/to (or external.domain/redirects/to)
  # as we're following redirects, it's not necessary to parse the final destination.
  # Each line keeps its trailing newline, so test for blank lines after stripping.
  next if line.strip.empty?
  if Thread.list.length < max_threads
    url = "http://#{host}#{line.split(/\s/)[0]}"
    x = Thread.new {
      puts "Threads: #{Thread.list.length} - #{url}"
      fetch(url, line)
    }
    threads << x
  else
    # All workers are busy: pause briefly, then retry the same line.
    sleep 0.05
    redo
  end
}
# Wait for every worker thread to finish before exiting.
threads.each { |t| t.join }
@freeformz

What does your redirects_file look like?

@joerussbowman
Author

It's sent to me as a two-column spreadsheet with no header. I simply open it in OpenOffice and cut and paste it into an empty file opened in vi. It ends up as a tab-delimited file like:

/aboutus/annualreport /aboutus/ouraccountability/annualreport/index.htm
/aboutus/annualreport/fy09/art30061.html /aboutus/ouraccountability/annualreport/annual-report-2009.xml

I actually have some more tweaks I want to do to it, just haven't had time to get to it. I'm pretty slammed at work lately.
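
The two-column format described above could be parsed along these lines. This is only a sketch: `parse_redirect_line` is a hypothetical helper name that does not appear in the script, and it assumes that neither column contains whitespace.

```ruby
# Split one line of the redirects file into [source_path, destination].
# Returns nil for blank lines. parse_redirect_line is a hypothetical helper,
# not part of the original script.
def parse_redirect_line(line)
  # Columns are separated by a tab (or any run of whitespace).
  fields = line.strip.split(/\s+/, 2)
  return nil if fields.empty?
  source, destination = fields
  [source, destination]
end

sample = "/aboutus/annualreport\t/aboutus/ouraccountability/annualreport/index.htm\n"
source, destination = parse_redirect_line(sample)
puts "#{source} -> #{destination}"
```

The script itself only needs the first column, because it follows the redirects rather than comparing against the expected destination.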

@joerussbowman
Author

Updated version that takes command line arguments and cleans up the output so the business user can more quickly find the redirects that are 404ing. The only other thing I plan on doing is changing host to be more of a default host setting, so that the option is ignored if redirects go to external sites.
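
One way the default-host idea might look, as a rough sketch only. `target_url` is a hypothetical name, and the rule it applies is an assumption based on the format comment in the script: entries that start with `/` are local, anything else is an external `domain/path`.

```ruby
# Build the URL to request for one entry from the redirects file.
# target_url is a hypothetical helper; the rule below is an assumption:
# entries starting with "/" are local and get default_host, everything
# else is treated as external and default_host is ignored.
def target_url(entry, default_host)
  if entry =~ %r{\Ahttps?://}
    entry                            # already a full URL
  elsif entry[0, 1] == "/"
    "http://#{default_host}#{entry}" # local path: apply the default host
  else
    "http://#{entry}"                # external.domain/path: ignore the default
  end
end

puts target_url("/aboutus/annualreport", "www.example.com")
puts target_url("external.domain/redirects/to", "www.example.com")
```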

@joerussbowman
Author

OK, this is the last edit; it's as far as I'm going to get in my environment. I'm stuck on Ruby 1.8.5, and there are policy reasons not to compile 1.9.2. The catch is that I need to set a User-Agent header, because I believe some of the external redirect targets are blocking the default one. get_response and the other convenience methods in 1.8.5 don't appear to accept more than one argument, so I can't pass a headers hash to them; 1.9.2 appears to support it. I'm going to switch to Python, get the script done, and move on to other tasks.
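
For readers staying on Ruby, a possible way around the headers limitation is to build the request object by hand instead of calling `get_response`. The sketch below assumes a Ruby where `Net::HTTP::Get.new(path, headers)` is available (it is in current releases; this has not been checked against 1.8.5). `build_request`, `get_with_agent`, and the `urltest/1.0` agent string are hypothetical names chosen for illustration. It handles plain `http` only, like the script.

```ruby
require 'net/http'
require 'uri'

# Build a GET request for url that carries a custom User-Agent header.
# build_request is a hypothetical helper, not part of the original script.
def build_request(url, user_agent)
  uri = URI.parse(url)
  path = uri.path.empty? ? "/" : uri.path
  path += "?#{uri.query}" if uri.query
  Net::HTTP::Get.new(path, { "User-Agent" => user_agent })
end

# Send the request and return the response. This performs network I/O.
def get_with_agent(url, user_agent)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(build_request(url, user_agent))
  end
end

# Inspect the request without touching the network.
req = build_request("http://www.example.com/aboutus/annualreport", "urltest/1.0")
puts req['User-Agent']
```

Inside `fetch`, a call such as `get_with_agent(url, "urltest/1.0")` would stand in for `Net::HTTP.get_response(URI.parse(url))`, with the `case response` logic applied to its return value.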
