@SabretWoW
Last active July 15, 2018 04:29
A small Ruby script that demonstrates how to use Mechanize to scrape product details (title, SKU, main image, and alternate images) from an array of Zappos.com product URLs and write them to a CSV file.
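# CSS selector reference: http://nokogiri.org/Nokogiri/XML/Node.html#method-i-css
require 'mechanize'
require 'csv'

puts "Product Scraper!!!"
puts ' '

urls = [
  "http://www.zappos.com/seavees-teva-universal-sandal-concrete",
  "http://www.zappos.com/teva-bomber-sandal-dark-olive",
  "http://www.zappos.com/teva-jetter-cigar"
]

file = "product_data.csv"
header = %w[title sku image alt_images]

# One agent is enough for all requests; there is no need to create one per URL.
agent = Mechanize.new

# CSV.open quotes and separates fields correctly. Appending an array to a plain
# File would write Array#to_s (e.g. ["a", "b"]) rather than a CSV row.
CSV.open(file, "w") do |csv|
  csv << header

  urls.each do |url|
    puts url
    page = agent.get(url)

    # Keep only the part of the page title before the first " - ".
    title = page.title.to_s.split(' - ').first.to_s.rstrip

    # Drop the first four characters, which the script treats as a label prefix.
    sku = page.search("#sku").inner_text
    sku = sku[4..-1].to_s

    prod_image = page.search("#detailImage img").first
    alt_images = page.search("#productImages ul li a img")

    # Collected but not written to the CSV (the header has no brand column).
    brand_text = page.search("#brandText").inner_text

    # Several alternate image URLs share one CSV field, separated by "|".
    alt_images = alt_images.map { |img| img[:src] }.join("|")

    # Guard against pages where the main image selector matches nothing.
    image_src = prod_image ? prod_image[:src] : nil

    csv << [title, sku, image_src, alt_images]
  end

  2.times { puts "" }
  puts "Done!"
end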
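
The string cleanup and CSV writing in the script can be tried without any network access. The following is a minimal sketch, not part of the original gist: the helper names `clean_title`, `strip_sku_prefix`, and `product_row` are hypothetical, and the sample values are made up rather than taken from Zappos. It assumes the page title uses " - " as a separator and that the SKU text starts with a four-character label, as the script does.

```ruby
require 'csv'

# Hypothetical helper: keep the text before the first " - " in a page title.
def clean_title(raw_title)
  raw_title.to_s.split(' - ').first.to_s.strip
end

# Hypothetical helper: drop a fixed-length label prefix
# (four characters, matching the script above).
def strip_sku_prefix(raw_sku, prefix_length = 4)
  raw_sku.to_s[prefix_length..-1].to_s.strip
end

# Hypothetical helper: build one CSV row; alternate image URLs are joined with "|".
def product_row(title, sku, image_url, alt_image_urls)
  [clean_title(title), strip_sku_prefix(sku), image_url, alt_image_urls.join('|')]
end

# Made-up sample values, not real Zappos data.
row = product_row(
  'Example Sandal, Olive - Example Store',
  'SKU 12345',
  'http://example.com/main.jpg',
  ['http://example.com/a.jpg', 'http://example.com/b.jpg']
)

# CSV.generate_line quotes the title because it contains a comma.
line = CSV.generate_line(row)
puts line

# Reading the line back recovers the four fields, and the alternate
# images can be split on "|" again.
parsed = CSV.parse_line(line)
alt_images = parsed[3].split('|')
puts alt_images.length
```

Keeping the parsing in small functions like these makes it possible to check the title and SKU handling on sample strings before pointing the scraper at live pages. The sample title contains a comma on purpose: it shows why the rows should go through the CSV library rather than being written as plain strings.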