@lazywei
Last active December 20, 2015 09:49
Yahoo! buy parser
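require 'nokogiri'
require 'typhoeus'

url = "http://tw.buy.yahoo.com/gdsale/gdsale.asp?gdid=3791541"

# Fetch the product page and stop early if the request failed
response = Typhoeus.get(url)
abort "Request failed with HTTP #{response.code}" unless response.success?
doc = Nokogiri::HTML(response.body)

# Spec table: the first cell of each row is the spec name, the second
# holds the available options as <label class="item"> elements.
# The last four rows of the table are skipped.
specs = {}
doc.at_css("table.spec-table").css("tr").slice(0..-5).each do |tr|
  cells = tr.css("td")
  specs[cells[0].content] = cells[1].css("label.item").map(&:content)
end

# Detail sections: skip the first div.cl-gddesc block and collect a
# title/content pair from each remaining one. drop(1) returns an empty
# array (not nil) when no blocks match, so this never raises on nil.
details = doc.css("div#cl-gdintro div.cl-gddesc").drop(1).map do |desc|
  {
    title:   desc.at_css("div.itemtitle").content,
    content: desc.at_css("div.content").content
  }
end

data = {
  title:    doc.at_css("div.title").content,
  price:    doc.at_css("div.priceinfo span.price").content,
  # Not every product has an active promotion
  actpromo: doc.at_css("div.actpromo")&.content || "no active promo",
  # Join the text of the rating's child nodes, leaving out the last one
  rate:     doc.at_css("div.rate .ratemax").children.slice(0..-2).map(&:content).join,
  descs:    doc.css("ul.desc-list li.desc").map(&:content),
  specs:    specs,
  gd_id:    doc.at_css("div.gdid .number").content,
  main_img: doc.at_css("img.main-image")["src"],
  # Image src values are assumed to be root-relative paths
  imgs:     doc.css("div#cl-gdintro img").map { |img| "http://tw.buy.yahoo.com#{img['src']}" },
  details:  details
}

puts data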
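Skipping the first detail block with a range slice such as `slice(1..-1)` has an edge case: Ruby's range slicing returns `nil` rather than an empty collection when the start index is past the end, so chaining `.each` onto it raises `NoMethodError` when the selector matches nothing. `drop(1)` always returns an array. The comparison below uses plain arrays; that Nokogiri's `NodeSet` slices the same way is an assumption here, not something the script checks.

```ruby
# Two ways of skipping the first element of a list.
# Plain arrays stand in for Nokogiri node sets.

# Range slicing: returns nil when the start index is past the end
def skip_first_with_slice(items)
  items.slice(1..-1)
end

# Enumerable#drop: always returns an array, possibly empty
def skip_first_with_drop(items)
  items.drop(1)
end

[[], ["intro"], ["intro", "detail 1", "detail 2"]].each do |items|
  puts "#{items.inspect}: slice -> #{skip_first_with_slice(items).inspect}, " \
       "drop -> #{skip_first_with_drop(items).inspect}"
end
```

Only the empty case differs: `slice` gives `nil`, `drop` gives `[]`.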
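The script hard-codes one product URL and builds image URLs by prefixing `http://tw.buy.yahoo.com` to each `src`, which only works when the `src` is a root-relative path. A sketch of both steps using only the standard `uri` library: `product_url` and `gdid_from` are hypothetical helpers that build a page URL for any `gdid` and read it back, assuming the page still lives at `/gdsale/gdsale.asp`; `absolute_image_url` (also hypothetical) resolves a `src` against the page URL the way a browser does, so absolute URLs and relative paths both come out right.

```ruby
require 'uri'

# Hypothetical helper: build a product page URL for a given gdid,
# following the pattern of the hard-coded URL in the script.
def product_url(gdid)
  URI::HTTP.build(
    host:  "tw.buy.yahoo.com",
    path:  "/gdsale/gdsale.asp",
    query: URI.encode_www_form(gdid: gdid)
  ).to_s
end

# Hypothetical helper: read the gdid back out of a product URL's query string.
def gdid_from(url)
  URI.decode_www_form(URI(url).query.to_s).to_h["gdid"]
end

# Hypothetical helper: resolve an <img src> against the page it came from
# (RFC 3986 reference resolution) instead of assuming it starts with "/".
def absolute_image_url(page_url, src)
  URI.join(page_url, src).to_s
end

page = product_url(3791541)
puts page
puts gdid_from(page)
puts absolute_image_url(page, "/images/photo.jpg")
puts absolute_image_url(page, "http://img.example.com/photo.jpg")
```

In the script, the `imgs` entry could then map each node to `absolute_image_url(url, img["src"])`. Note that `URI.join` raises `URI::InvalidURIError` on a `src` containing unescaped characters such as spaces, which the string-prefix version silently accepts.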
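`puts data` prints Ruby's `inspect` form of the hash, which is convenient to eyeball but awkward for other tools to read. If the result is meant to be consumed elsewhere, the standard `json` library can serialize it; by default `JSON.pretty_generate` keeps non-ASCII text such as Chinese titles as UTF-8 instead of escaping it. The hash below is made-up placeholder data shaped like the script's `data`, not real scraped output.

```ruby
require 'json'

# Placeholder values only, shaped like the script's `data` hash
data = {
  title:   "範例商品",
  price:   "0",
  specs:   { "顏色" => ["黑", "白"] },
  details: [{ title: "說明", content: "placeholder" }]
}

json = JSON.pretty_generate(data)
puts json

# Round trip: symbol keys come back as string keys
parsed = JSON.parse(json)
puts parsed["specs"]["顏色"].inspect
```

Keys that were symbols in Ruby (`:title`, `:specs`) come back as strings after `JSON.parse`, so code reading the output should index with `"title"` rather than `:title`.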