@jmscholen
Created July 28, 2016 15:17
Scraping Info from HTML page
# Example of scraping listing information from a saved search-results page,
# where each match of the search criteria is rendered in its own container.
# Pass the path to the saved HTML file as the first command-line argument.
require 'nokogiri'

website = "www.example.com"
scraped_info = []

doc = File.open(ARGV[0]) { |f| Nokogiri::HTML(f) }

doc.css('.result-container').each do |x|
  # These selectors are placeholders for the target page's markup. A bare name
  # such as 'leaf' matches a <leaf> element; to match a class, prefix it with a dot.
  age_range = x.css('.row[2] div').text.split(/-/)
  address   = x.css('map-marker result-category-content').text.split(/,/)
  link      = x.at_css('[href]') # first element in the container that has an href

  # Push each result into a hash so that we can query it later.
  scraped_info << {
    "name" => x.css('strong').text,
    "details" => {
      "introduction" => "Type: #{x.css('leaf').text}| Schedule: #{x.css('clock-o').text}",
      # to_s keeps a missing "-" in the age text from raising NoMethodError on nil
      "start_age" => age_range[0].to_s.split(" "),
      "end_age" => age_range[1].to_s.split(" "),
      "address_street_name" => address[0],
      "address_city_name" => address[1],
      "website_url" => "#{website}#{link && link['href']}"
    }
  }
end
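
The script collects results into an array of hashes "so that we can query it later" but stops before any query. Below is a minimal sketch of one such query, assuming the same hash shape as above. The `listings` data and the `in_city` helper are hypothetical and exist only for illustration; they are not part of the original gist.

```ruby
# Hypothetical sample data in the same shape the scraper builds.
listings = [
  { "name" => "Sunny Day Care",
    "details" => { "start_age" => ["2", "years"], "end_age" => ["5", "years"],
                   "address_city_name" => " Springfield" } },
  { "name" => "Little Sprouts",
    "details" => { "start_age" => ["6", "months"], "end_age" => ["3", "years"],
                   "address_city_name" => " Shelbyville" } }
]

# Return the listings located in the given city. The stored city is stripped
# before comparing because splitting an address on "," leaves a leading space.
def in_city(listings, city)
  listings.select { |l| l.dig("details", "address_city_name").to_s.strip == city }
end

names = in_city(listings, "Springfield").map { |l| l["name"] }
puts names.inspect # prints ["Sunny Day Care"]
```

Using `Hash#dig` with `to_s` means an entry with a missing city is simply skipped rather than raising an error.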