@matthewbarram
Last active October 9, 2015 13:22
#!/usr/bin/env ruby
# Scrapes every product on the containers listing page and writes the
# description, price, image URLs and link of each one to a timestamped CSV.
require 'csv'
require 'mechanize'
require 'pry' # only needed for interactive debugging with binding.pry

puts 'Starting...'

base_url = 'https://www.cachingsupplies.com.au'
list_url = "#{base_url}/containers"

agent = Mechanize.new
list_page = agent.get(list_url)

# Collect each product link once; skip anchors without an href.
product_urls = list_page.search('div#productList a')
                        .map { |link| link['href'] }
                        .compact.uniq
                        .map { |href| "#{base_url}#{href}" }

results = product_urls.each_with_index.map do |product_url, index|
  print "Fetching product #{index + 1} of #{product_urls.count}\r"
  product_page = agent.get(product_url)

  # Distinct image URLs from the product gallery (read from data-src).
  image_urls = product_page.search('#productGallery img')
                           .map { |image| image['data-src'] }
                           .compact.uniq

  {
    # `at` returns nil when nothing matches, so a missing element yields
    # nil instead of raising NoMethodError.
    description: product_page.at('.product-excerpt')&.text&.strip,
    price: product_page.at('.sqs-money-native')&.text&.strip,
    image_url: image_urls.join(' - '),
    link: product_url
  }
end

file_name = "#{Time.now.strftime('%Y_%m_%d_%H_%M_%S')}.csv"
puts "\nGenerating CSV #{file_name}..."

# CSV quotes fields that contain commas, quotes or newlines, so the
# description no longer needs its commas stripped. Fixed headers also
# keep the export from crashing when no products were found.
headers = %i[description price image_url link]
CSV.open(file_name, 'w') do |csv|
  csv << headers
  results.each { |result| csv << result.values_at(*headers) }
end

puts 'Done!'
# Gemfile: dependencies for the script above
source 'https://rubygems.org'
gem 'mechanize'
gem 'pry'
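
Product descriptions and prices can contain commas (a price such as "$1,299.00", for example), which matters for the CSV export step. The following is a minimal sketch, not part of the original gist, of how Ruby's standard csv library quotes such fields so they survive a round trip. The sample row, the example.com link and the HEADERS constant are made up for illustration; only the four column names come from the script.

```ruby
require 'csv'

# Hypothetical column order and sample row, shaped like the scraper's results.
HEADERS = %i[description price image_url link].freeze
rows = [
  { description: 'Small, waterproof container', price: '$1,299.00',
    image_url: 'a.jpg - b.jpg', link: 'https://example.com/containers/small' }
]

# CSV.generate builds the file contents as a string; fields containing
# commas, double quotes or newlines are wrapped in double quotes.
csv_text = CSV.generate do |csv|
  csv << HEADERS
  rows.each { |row| csv << row.values_at(*HEADERS) }
end

puts csv_text

# Parsing the text back recovers the original field values, commas included.
parsed = CSV.parse(csv_text, headers: true)
puts parsed[0]['description']
```

Joining the same values with a plain "," would instead split the description and the price across extra columns when the file is opened in a spreadsheet.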