@priyaaank
Created January 3, 2012 06:31
Crawling SO users
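require 'mongoid'
require 'mechanize'

# Load connection settings from mongoid.yml in the current directory.
Mongoid.load!("./mongoid.yml")

# Mongoid 2.x style: point the default connection at the "so_users" database.
Mongoid.configure do |config|
  config.master = Mongo::Connection.new.db("so_users")
end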

# One document per Stack Overflow user; every field is stored as scraped text.
class SOUser
  include Mongoid::Document
  field :name,       :type => String
  field :url,        :type => String
  field :reputation, :type => String
  field :location,   :type => String
end

# Mechanize agent that sends a browser-like User-Agent header.
agent = Mechanize.new { |m| m.user_agent_alias = 'Mac Safari' }

# Walk the paginated user listing; the page count is hard-coded.
(1..18441).each do |page_num|
  page = agent.get("http://stackoverflow.com/users?page=#{page_num}&tab=reputation&filter=all")
  page.search(".user-details").each do |details|
    # at_css returns nil when an element is missing, so guard each lookup.
    link       = details.at_css("a")
    location   = details.at_css(".user-location")
    reputation = details.at_css(".reputation-score")
    SOUser.new(:name       => link && link.text,
               :url        => link && link["href"],
               :location   => location && location.text,
               :reputation => reputation && reputation.text).save!
  end
end
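
The script stores reputation as a String, exactly as it appears on the page. If numeric sorting or filtering is wanted later, the text could be normalized first. The helper below is a minimal sketch, not part of the original gist: `parse_reputation` is a hypothetical name, and the two accepted formats (comma-grouped digits such as "1,234" and abbreviated values such as "12.3k") are assumptions about how the listing renders scores.

```ruby
# Hypothetical helper (not in the original gist): convert a scraped
# reputation string into an Integer. The accepted formats are assumptions.
def parse_reputation(text)
  return nil if text.nil?

  # Drop surrounding whitespace and thousands separators.
  cleaned = text.strip.delete(",")

  case cleaned
  when /\A(\d+(?:\.\d+)?)k\z/i
    # Abbreviated form: scale by one thousand and round to a whole number.
    (Regexp.last_match(1).to_f * 1_000).round
  when /\A\d+\z/
    cleaned.to_i
  end
  # Any other input falls through the case and yields nil.
end
```

With this in place, `parse_reputation("1,234")` returns 1234, and unrecognized or missing input returns nil instead of raising, so a malformed entry does not abort the crawl.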