@chris-roerig · Last active August 29, 2015 14:10
Iterates over a list of URLs read from a CSV, requests each page, and writes each URL to a separate CSV based on the response status (success, redirect, missing, or failure).
require 'curb'
require 'csv'

file = 'crawlreport.csv'
started_at = Time.new

# Buckets for URLs, keyed by the curb callback that fires for them.
codes = {
  missing:  [],
  success:  [],
  redirect: [],
  failure:  []
}

CSV.foreach(file) do |row|
  next if row.empty?

  check = row[0]
  begin
    # Curl::Easy.perform yields the handle before the request runs,
    # so the callbacks are registered in time.
    Curl::Easy.perform(check) do |curl|
      curl.on_success  { |url| codes[:success].push(url.url) }
      curl.on_redirect { |url| codes[:redirect].push(url.url) }
      curl.on_missing  { |url| codes[:missing].push(url.url) }
      curl.on_failure  { |url| codes[:failure].push(url.url) }
    end
  rescue => e
    puts "There were errors during #{check}: #{e.message}"
  end
end

# Write one CSV per status bucket (success.csv, redirect.csv, etc.).
codes.each do |code, list|
  CSV.open("#{code}.csv", "w") do |csv|
    list.each { |url| csv << [url] }
  end
end

completed_at = Time.new
script_time = completed_at - started_at
puts "Script took #{script_time} seconds to complete"