@jesusangelm
Created May 27, 2015 14:40
Parser that extracts the apartment and house rental classifieds from the website of the newspaper El Sol de Margarita.
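require 'nokogiri'
require 'open-uri'

# Scrapes rental classifieds (apartments and houses) from El Sol de Margarita.
class ParserClasificados
  # Returns the listing URL for the given classified type.
  def urls(type)
    case type
    when :apartment_rent
      "http://www.elsoldemargarita.com.ve/clasificados/index/fsection:1"
    when :house_rent
      "http://www.elsoldemargarita.com.ve/clasificados/index/fsection:4"
    else
      raise ArgumentError, "unknown classified type: #{type.inspect}"
    end
  end

  # Downloads and parses the first listing page for the given type.
  # URI.open is used because Kernel#open no longer fetches URLs in Ruby 3.0+.
  def source_data(type)
    Nokogiri::HTML(URI.open(urls(type)))
  end

  # Text of the paragraph that follows the <hr>, which holds the pager info.
  def classifieds_info(data)
    data.css("hr+ p").text
  end

  # Text of every .classified node, split into one entry per CRLF-separated line.
  def classified_lists(data)
    data.css(".classified").text.split("\r\n")
  end

  # Number of pages: the fourth whitespace-separated token of the pager info.
  def page_ammounts_inspector(type)
    info = classifieds_info(source_data(type))
    info.split(" ")[3].to_i
  end

  # Builds the URL of every page from 1 up to ammount.
  def url_generator(type, ammount)
    url_base = urls(type)
    (1..ammount).map { |n| "#{url_base}/page:#{n}" }
  end

  # Fetches every page for the given type and returns all classifieds in one flat array.
  def parse_url_generated(type)
    ammount = page_ammounts_inspector(type)
    url_generator(type, ammount).flat_map do |url|
      classified_lists(Nokogiri::HTML(URI.open(url)))
    end
  end
end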
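
The pagination logic above can be tried without touching the network. The following is a minimal sketch, not part of the original gist: `page_count` and `page_urls` are hypothetical standalone helpers that mirror `page_ammounts_inspector` and `url_generator`, and the sample pager text "Page 1 of 3" is an assumption, since the actual wording on the site is not shown here.

```ruby
# Hypothetical pager text; the real wording on the site is an assumption.
SAMPLE_INFO = "Page 1 of 3"

# Same rule as page_ammounts_inspector: the fourth token is the page count.
def page_count(info)
  info.split(" ")[3].to_i
end

# Same rule as url_generator: append "/page:N" for each page from 1 to count.
def page_urls(base, count)
  (1..count).map { |n| "#{base}/page:#{n}" }
end

base = "http://www.elsoldemargarita.com.ve/clasificados/index/fsection:1"
puts page_urls(base, page_count(SAMPLE_INFO))
```

With the class itself, `ParserClasificados.new.parse_url_generated(:apartment_rent)` runs the same steps against the live site and returns one flat array of strings, so it needs network access and depends on the page keeping its current markup.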