@wppurking
Created November 4, 2013 11:57
Scrapes the ASINs of the Top 100 bestsellers in the Amazon DE "Akkus" (rechargeable batteries) category.
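The script takes each ASIN from the product link's `href` as the 10 characters after `/dp/`. If a link has no `/dp/` segment, that expression calls `[]` on `nil` and raises `NoMethodError`, which ends the whole page's thread. A more defensive variant is sketched here with a hypothetical `extract_asin` helper. It assumes ASINs are 10 uppercase letters or digits and returns `nil` instead of raising:

```ruby
# Hypothetical helper (not part of the original script): return the
# 10-character ASIN that follows "/dp/" in a product link, or nil.
# Assumes ASINs consist of uppercase letters and digits only.
def extract_asin(href)
  return nil if href.nil?

  # The negative lookahead rejects longer alphanumeric runs after /dp/.
  href[%r{/dp/([A-Z0-9]{10})(?![A-Z0-9])}, 1]
end

# The example hrefs are made up for illustration.
p extract_asin("http://www.amazon.de/Beispiel-Akku/dp/B00EXAMPLE/ref=zg_bs_364919031_1")
# => "B00EXAMPLE"
p extract_asin("/gp/help/customer/display.html")
# => nil
```

Returning `nil` lets the caller drop unusable rows (for example with `compact`) rather than losing every result on that page.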
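require "httparty"
require "csv"
require "nokogiri"

# One ranked product from the bestseller list.
class Asin
  attr_accessor :rank, :asin

  def initialize(params)
    @rank = params[:rank]
    @asin = params[:asin]
  end

  def to_s
    "#{@asin}, #{@rank}"
  end
end

USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.36 " \
             "(KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36"

# The Top 100 list spans 5 pages; fetch them concurrently.
# Each thread returns its own array, so no shared array is mutated
# from several threads at once.
threads = 1.upto(5).map do |i|
  Thread.new do
    url = "http://www.amazon.de/gp/bestsellers/ce-de/364919031/ref=zg_bs_364919031_pg_2?pg=#{i}"
    puts url
    # Net::HTTP requests gzip and decompresses it on its own; setting
    # Accept-Encoding by hand turns that off and can leave a compressed body.
    resp = HTTParty.get(url, headers: { "User-Agent" => USER_AGENT })
    doc = Nokogiri::HTML.parse(resp.body)
    doc.css("#zg_centerListWrapper .zg_itemImmersion").map do |div|
      href = div.at_css(".zg_title a")["href"]
      Asin.new(rank: div.at_css(".zg_rankNumber").text.to_i, # to_i keeps the leading digits of the rank label
               asin: href.split("/dp/")[1][0...10])           # ASIN = the 10 characters after /dp/
    end
  end
end

# Thread#value joins each thread and returns its block's result;
# pages can finish in any order, so sort by rank before writing.
asins = threads.flat_map(&:value).sort_by(&:rank)

CSV.open("./de_rank.csv", "w") do |csv|
  asins.each { |a| csv << [a.asin, a.rank] }
end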
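Because the five pages are fetched on separate threads, they can finish in any order, and appending to one shared array from several threads is not guaranteed to be safe. A common alternative is to have each thread return its own rows, merge them with `Thread#value`, and then sort by rank. The minimal offline sketch below uses `fake_fetch` and `Row` as hypothetical stand-ins for the HTTP request and the `Asin` class. It assumes 20 rows per page (100 ranks over 5 pages):

```ruby
# Hypothetical stand-in for "download page N and parse its rows";
# it fabricates 20 rows per page so the pattern runs without a network.
Row = Struct.new(:asin, :rank)

def fake_fetch(page)
  sleep(rand * 0.01) # simulate uneven network latency
  first = (page - 1) * 20 + 1
  (first...(first + 20)).map { |rank| Row.new(format("ASIN%06d", rank), rank) }
end

# Each thread returns its own array; nothing shared is mutated.
threads = (1..5).map { |page| Thread.new { fake_fetch(page) } }

# Thread#value joins the thread and returns the block's result.
rows = threads.flat_map(&:value).sort_by(&:rank)

puts rows.size                # => 100
p rows.first.to_a             # => ["ASIN000001", 1]
```

Collecting return values avoids a `Mutex` entirely. Sorting afterwards makes the CSV order independent of which request happened to finish first.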