@cgimenes
Last active February 14, 2019 02:01
Simple Ruby web scraper
require 'httparty'
require 'nokogiri'
require 'cgi'
require 'uri'

all_objects = []

('a'..'z').each do |letter|
  page = 1
  total_pages = 1 # raised once the pagination links have been read

  while page <= total_pages
    puts "Letter: #{letter} - Page: #{page}"
    response = HTTParty.get("https://pudim.com.br/#{letter}?page=#{page}")
    parsed_page = Nokogiri::HTML(response.body)

    # Collect the text of every child node of .objects, skipping whitespace-only nodes.
    objects = parsed_page.css('.objects').children.map { |node| node.text.strip }.reject(&:empty?)
    all_objects.concat(objects)
    page += 1

    # The "last page" icon sits inside a link whose href carries the final page number.
    last_page_elem = parsed_page.css('.pagination .fa-angle-double-right').first
    next if last_page_elem.nil? || last_page_elem.parent.name != 'a'

    query = URI.parse(last_page_elem.parent.attr('href')).query
    total_pages = CGI.parse(query)['page'].first.to_i
  end
end

puts 'Results: '
puts all_objects
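
The scraper reads the total page count from the `page` query parameter of the "last page" pagination link. A minimal sketch of that step as a standalone function follows. The name `last_page_from_href` is hypothetical and not part of the original script. It uses `URI.decode_www_form` in place of `CGI.parse`, and returns `nil` rather than raising when the link has no query string or the value is not numeric.

```ruby
require 'uri'

# Hypothetical helper (not in the original script): returns the `page` query
# parameter of +href+ as an Integer, or nil when it is missing or not numeric.
def last_page_from_href(href)
  return nil if href.nil?

  query = URI.parse(href).query
  return nil if query.nil?

  # decode_www_form yields [[key, value], ...]; assoc finds the first "page" pair.
  pair = URI.decode_www_form(query).assoc('page')
  return nil if pair.nil?

  Integer(pair.last, 10)
rescue URI::InvalidURIError, ArgumentError
  # Malformed URL or non-numeric page value.
  nil
end
```

For example, `last_page_from_href('/a?page=7')` returns `7`, and `last_page_from_href('/a')` returns `nil`, so a caller can leave `total_pages` unchanged when no page number is found.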