blog-scrubyt
GitHub gist cawel/78a2d567de66b844bc78, created November 21, 2014 00:02
```ruby
require 'rubygems'
require 'scrubyt'

HASH_TERM = "leadscon"

data = Scrubyt::Extractor.define do
  fetch 'http://www.hashtweeps.com/'
  fill_textfield "term", HASH_TERM
  submit

  # Build the XML structure
  item "//li[@class='result']" do
    msg "//div[@class='msg']"
    link "//div[@class='info']/a[1]/@href"
    user "//div[@class='info']/a[1]", :generalize => false do
      # page_detail follows the link matched by `user` because its name ends in "_detail"
      page_detail do
        profile_info '//ul[@class="about vcard entry-author"]' do
          full_name "//li//span[@class='fn']"
          location "//li//span[@class='adr']"
          website "//li//a[@class='url']/@href"
          bio "//li//span[@class='bio']"
        end
      end
    end
  end
end

# Dump the XML to a file; the block form of File.open closes (and flushes) it when done
File.open("output.xml", "w") { |file| file.puts data.to_xml }
```
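To see what the item-level patterns (`msg`, `link`, `user`) pick out without running scRUBYt or hitting the live site, here is a minimal sketch that evaluates the same XPaths with Ruby's REXML library instead. This is a swapped-in technique, not scRUBYt's API. The `SAMPLE_RESULTS` markup and the `extract_items` / `items_to_xml` helpers are hypothetical, assuming the results page is a list of `li.result` elements shaped the way the XPaths above suggest. One difference from the script: in plain XPath as REXML evaluates it, a leading `//` restarts at the document root, so the sketch uses `.//` to stay inside each result.

```ruby
require 'rexml/document'

# Hypothetical, well-formed stand-in for the search results markup (not real site data).
SAMPLE_RESULTS = <<~HTML
  <ul>
    <li class="result">
      <div class="msg">Heading to #leadscon tomorrow</div>
      <div class="info"><a href="http://twitter.com/alice">alice</a> <a href="#">permalink</a></div>
    </li>
    <li class="result">
      <div class="msg">Great panel at #leadscon</div>
      <div class="info"><a href="http://twitter.com/bob">bob</a></div>
    </li>
  </ul>
HTML

# Returns one hash per li.result, with the same fields as the msg/link/user patterns.
def extract_items(html)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, "//li[@class='result']").map do |li|
    # ".//" keeps the search inside this <li>; a bare "//" would search the whole document.
    author = REXML::XPath.first(li, ".//div[@class='info']/a[1]")
    {
      msg:  REXML::XPath.first(li, ".//div[@class='msg']").text,
      link: author.attributes["href"],
      user: author.text
    }
  end
end

# Serializes the hashes as <root><item><msg/><link/><user/></item>...</root>.
def items_to_xml(items)
  doc = REXML::Document.new
  root = doc.add_element("root")
  items.each do |item|
    item_el = root.add_element("item")
    item.each { |name, value| item_el.add_element(name.to_s).text = value }
  end
  xml = String.new
  doc.write(xml)
  xml
end

puts items_to_xml(extract_items(SAMPLE_RESULTS))
```

Running it prints a single `<root>` element with one `<item>` per sample result. REXML only accepts well-formed XML, so real-world HTML pages may need a more forgiving parser.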
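The `page_detail` step is the part scRUBYt handles for you: for each `user` link it loads the linked profile page and runs the `profile_info` patterns there. A minimal sketch of that step in plain Ruby, again using REXML rather than scRUBYt, might look like the following. It assumes a hypothetical `fetch_page` lambda backed by an in-memory `PROFILE_PAGES` hash instead of real HTTP requests; the profile markup and the `extract_profile` / `follow_details` helpers are likewise illustrative only.

```ruby
require 'rexml/document'

# Hypothetical profile markup keyed by URL; stands in for fetching each user link over HTTP.
PROFILE_PAGES = {
  "http://twitter.com/alice" => <<~HTML
    <div>
      <ul class="about vcard entry-author">
        <li><span class="fn">Alice Example</span></li>
        <li><span class="adr">Example City</span></li>
        <li><a class="url" href="http://alice.example.com">alice.example.com</a></li>
        <li><span class="bio">Sample bio text</span></li>
      </ul>
    </div>
  HTML
}.freeze

# Extracts the profile_info fields from one profile page, or nil if the block is missing.
def extract_profile(html)
  doc = REXML::Document.new(html)
  profile = REXML::XPath.first(doc, "//ul[@class='about vcard entry-author']")
  return nil unless profile

  # Relative ".//" paths keep each lookup inside the profile <ul>.
  text_at = ->(xpath) { REXML::XPath.first(profile, xpath)&.text }
  site = REXML::XPath.first(profile, ".//li//a[@class='url']")
  {
    full_name: text_at.call(".//li//span[@class='fn']"),
    location:  text_at.call(".//li//span[@class='adr']"),
    website:   site && site.attributes["href"],
    bio:       text_at.call(".//li//span[@class='bio']")
  }
end

# The "detail" step: follow each user link and map URL => profile hash (nil if nothing fetched).
def follow_details(user_links, fetch_page)
  user_links.each_with_object({}) do |url, profiles|
    html = fetch_page.call(url)
    profiles[url] = html && extract_profile(html)
  end
end

fetch_page = ->(url) { PROFILE_PAGES[url] }
p follow_details(["http://twitter.com/alice", "http://twitter.com/bob"], fetch_page)
```

Passing the fetcher in as a lambda keeps the link-following logic separate from the transport; a real version could swap in something like `Net::HTTP.get(URI(url))`, though live profile pages may not be well-formed XML, which REXML requires.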