@purezen
Last active August 29, 2015 14:02
Naive crawler
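require 'page_rankr' # the PageRankr gem is loaded through its lib file, page_rankr

@c = 0
# On Ctrl-C, report how many URLs were checked before exiting.
trap('SIGINT') { puts "\nCrawled #{@c} times."; exit 1 }

# The block form closes the dump file even if a lookup raises.
File.open('naive_crawler.dump.txt', 'w') do |output|
  File.foreach('aug_url_list.txt') do |line|
    # The URL is taken from the second '**'-separated field of each line.
    url = line.split('**')[1]
    next if url.nil? || url.strip.empty? # skip lines without a URL field
    url = url.strip # drop the trailing newline
    puts "Checking PR for #{url} .. attempt: #{@c}"
    rank = PageRankr.rank(url, :google)
    puts rank
    output.puts rank
    # sleep (1..9).map { |i| i.to_f / 10 }.sample # optional random 0.1-0.9 s pause
    @c += 1
  end
end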
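
The line parsing and the commented-out delay in the script can be tried without any network access. The following is only a minimal sketch: it assumes each line of aug_url_list.txt looks like `<label>**<url>` (the gist does not document the format), and `extract_url` and `jittered_delay` are hypothetical helper names that are not part of PageRankr.

```ruby
# Hypothetical helpers; the names and the "<label>**<url>" line format
# are assumptions made for illustration.

# Returns the URL field of a line, or nil when the field is missing or blank.
def extract_url(line)
  field = line.split('**')[1]
  return nil if field.nil?
  url = field.strip
  url.empty? ? nil : url
end

# Returns a random delay in seconds between 0.1 and 0.9, mirroring the
# commented-out sleep expression in the script.
def jittered_delay
  (1..9).map { |i| i / 10.0 }.sample
end

lines = ["1**example.com\n", "malformed line\n", "2**  \n"]
urls = lines.map { |l| extract_url(l) }.compact
puts urls.inspect # prints ["example.com"]
```

Calling `sleep jittered_delay` between lookups would space out requests, which is presumably what the commented-out line was meant to do.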

purezen commented Jun 12, 2014

It's incomplete. Am still working on it.
