@GusGA
Last active January 3, 2016 15:49
Scraper script that collects the ID, new price, and refurbished price of every product listed at http://www.widespreadsales.com/Products/Circuit-Breakers. The task is taken from a job posting on freelancer.com.
#!/usr/bin/ruby
require 'nokogiri'
require 'open-uri'
require 'progressbar'
@pages = 50 #7516
@page = "http://www.widespreadsales.com/Products/Search&category=cb&r_num=50&p="
@file = File.open(Dir.pwd + "/data", 'w')
@content = []
@articles_and_links = []
# Collects [article_number, article_pathname] pairs from one page of search results.
def scrap_list(page)
  # URI.open is needed on Ruby 3.0+, where Kernel#open no longer accepts URLs.
  doc = Nokogiri::HTML(URI.open(@page + page.to_s))
  doc.xpath("//div/h2/a").each do |link|
    @articles_and_links << [link.content, link["href"]]
  end
end
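# Visits each product page and records its price headings (text of the form "label:value").
def extract_pages
  pbar = ProgressBar.new("Saving Content", @articles_and_links.count)
  @articles_and_links.each do |number, path|
    prices_array = []
    doc = Nokogiri::HTML(URI.open("http://www.widespreadsales.com" + path))
    doc.xpath('//div[1]/div[2]/h3').each do |inner_link|
      # Split on the first colon only, so each heading yields at most one pair.
      price = inner_link.content.gsub("\r\n", "").split(":", 2)
      # Skip headings without a colon; an odd element count would make Hash[] raise.
      prices_array << price if price.size == 2
    end
    @content << { part_number: number,
                  prices: prices_array.count > 0 ? Hash[*prices_array.flatten] : "Request prices" }
    pbar.inc
  end
  pbar.finish
end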
# Writes one record per line to the output file, then prints a short summary.
def write_content
  pbar = ProgressBar.new("Writing File", @content.count)
  @content.each do |line|
    @file.write("\n")
    @file.write(line)
    pbar.inc
  end
  @file.close
  pbar.finish
  size = File.size(Dir.pwd + "/data")
  puts "***************"
  puts "Wrote #{@content.count} lines"
  puts "The file size is #{'%.2f' % [size / 1024.0]} KB"
  puts "***************"
end
# Entry point: scrape the listing pages, then the product pages, then write the file.
def init
  pbar = ProgressBar.new("Scraping Web", @pages)
  (1..@pages).each do |page|
    scrap_list(page)
    pbar.inc
  end
  pbar.finish
  extract_pages
  write_content
  exit
end
init
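
The price-parsing step in extract_pages can be tried without touching the network. The following is a minimal sketch, not part of the original script: parse_prices and record_line are hypothetical helper names, the sample heading texts are made up, and writing each record as JSON (instead of the Hash#to_s output the script produces) is an assumption about what a consumer of the data file might prefer. It uses only the Ruby standard library and needs Ruby 2.7 or later for filter_map.

```ruby
require 'json'

# Hypothetical helper: turns heading texts such as "New: $120.00" into a
# { label => value } hash. Headings without a colon are ignored. When nothing
# usable is found, it returns the same "Request prices" placeholder the
# script uses.
def parse_prices(headings)
  pairs = headings.filter_map do |text|
    # Drop line breaks, split on the first colon only, and trim whitespace.
    label, value = text.gsub(/\r?\n/, "").split(":", 2).map(&:strip)
    next if label.nil? || label.empty? || value.nil? || value.empty?
    [label, value]
  end
  pairs.empty? ? "Request prices" : pairs.to_h
end

# Hypothetical helper: serializes one product as a single JSON line, which is
# easier to read back later than the output of Hash#to_s.
def record_line(part_number, headings)
  JSON.generate({ part_number: part_number, prices: parse_prices(headings) })
end
```

For example, record_line("A1", ["New: $120.00\r\n", "Refurbished: $80.00"]) returns a JSON string with the part number and both prices, and each such line can be read back with JSON.parse.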