Created
May 20, 2013 12:47
require 'nokogiri'

good_html = "<div> <span>first</span> <span>second</span> </div>"
bad_html = "<div> <span><a>first</a></span> <span>second</span> </div>"

# This worked under libxml2 2.8.0 (default in Ubuntu 12.10), but doesn't work anymore with libxml2 2.9.0 (under Ubuntu 13.04).
puts Nokogiri::HTML(good_html).search('div span:nth-child(2)').first # => <span>second</span>
puts Nokogiri::HTML(bad_html).search('div span:nth-child(2)').first # => nil

# I found two workarounds. One is to use ">" for more precise selection:
puts Nokogiri::HTML(good_html).search('div > span:nth-child(2)').first # => <span>second</span>
puts Nokogiri::HTML(bad_html).search('div > span:nth-child(2)').first # => <span>second</span>

# Another workaround is to use nth-of-type instead of nth-child:
puts Nokogiri::HTML(good_html).search('div span:nth-of-type(2)').first # => <span>second</span>
puts Nokogiri::HTML(bad_html).search('div span:nth-of-type(2)').first # => <span>second</span>