@radar
Forked from smartynko/Nokogiri scraper
Last active January 27, 2017 01:16
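require 'nokogiri'
require 'open-uri'

# open-uri fetches URLs via URI.open; on Ruby 3.0+ plain Kernel#open no longer accepts a URL.
page = Nokogiri::HTML(URI.open("http://v-tac.eu/led-lights/lights1/led-spotlights/led-spotlight-7w-gu10-smd-white-plastic-white-detail.html"))
puts page.class # => Nokogiri::HTML::Document (printed as Nokogiri::HTML4::Document on Nokogiri 1.12+)

# Each .product-field-type-S element holds one title/value pair.
fields = page.css(".product-field-type-S")
fields.each do |field|
  # .strip trims the newlines and indentation that .text picks up from the markup.
  key = field.css(".product-fields-title").text.strip
  value = field.css(".product-field-display").text.strip
  puts "#{key}: #{value}"
end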

smartynko commented Jan 27, 2017

I tried it out, but it seems to return the same mess. I don't know why, but I'm beginning to suspect it has something to do with special characters such as "<" and "/" that appear in some .product-field-display fields. If you don't mind, please have a look at my output.
test_4_radar_scr

If it works for you, there might be something wrong with my setup, and I would like to find out what it is.
