@mkwiatkowski
Created May 20, 2013 12:47
require 'nokogiri'
good_html = "<div> <span>first</span> <span>second</span> </div>"
bad_html = "<div> <span><a>first</a></span> <span>second</span> </div>"
# Selecting the second <span> with nth-child worked under libxml2 2.8.0 (the default in Ubuntu 12.10),
# but no longer works for nested markup with libxml2 2.9.0 (Ubuntu 13.04).
puts Nokogiri::HTML(good_html).search('div span:nth-child(2)').first # => <span>second</span>
puts Nokogiri::HTML(bad_html).search('div span:nth-child(2)').first # => nil (expected <span>second</span>)
# I found two workarounds. One is to use the child combinator ">" so that only direct children of the div are matched:
puts Nokogiri::HTML(good_html).search('div > span:nth-child(2)').first # => <span>second</span>
puts Nokogiri::HTML(bad_html).search('div > span:nth-child(2)').first # => <span>second</span>
# Another workaround is to use nth-of-type instead of nth-child:
puts Nokogiri::HTML(good_html).search('div span:nth-of-type(2)').first # => <span>second</span>
puts Nokogiri::HTML(bad_html).search('div span:nth-of-type(2)').first # => <span>second</span>
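For reference, here is a small pure-Ruby sketch of what the two pseudo-classes are supposed to mean. It is a toy model, not Nokogiri's or libxml2's implementation: `Node`, `el`, `each_descendant`, `nth_child` and `nth_of_type` are hypothetical helpers invented for this illustration. Under these semantics both selectors should pick the second span in both documents, which suggests the nil above is a regression rather than correct behaviour.

```ruby
# Toy element tree: tag name, optional text label, child elements.
# Text nodes are ignored, as they are when CSS counts children.
Node = Struct.new(:tag, :label, :children)

# Hypothetical helper for building nodes.
def el(tag, label = nil, *children)
  Node.new(tag, label, children)
end

# Yields every element below `root` together with its parent
# (this models the descendant combinator in "div span").
def each_descendant(root, &block)
  root.children.each do |child|
    block.call(child, root)
    each_descendant(child, &block)
  end
end

# tag:nth-child(n): the element has the given tag AND is the
# n-th element among ALL of its parent's children.
def nth_child(root, tag, n)
  found = []
  each_descendant(root) do |node, parent|
    position = parent.children.index { |c| c.equal?(node) }
    found << node if node.tag == tag && position == n - 1
  end
  found
end

# tag:nth-of-type(n): the element is the n-th among its
# siblings that share the same tag.
def nth_of_type(root, tag, n)
  found = []
  each_descendant(root) do |node, parent|
    same_type = parent.children.select { |c| c.tag == tag }
    position = same_type.index { |c| c.equal?(node) }
    found << node if node.tag == tag && position == n - 1
  end
  found
end

# Same shapes as good_html and bad_html above.
good_tree = el('div', nil, el('span', 'first'), el('span', 'second'))
bad_tree  = el('div', nil, el('span', nil, el('a', 'first')), el('span', 'second'))

[good_tree, bad_tree].each do |tree|
  puts nth_child(tree, 'span', 2).map(&:label).inspect
  puts nth_of_type(tree, 'span', 2).map(&:label).inspect
end
```

The nodes are compared with `equal?` (object identity) rather than `==`, because two Struct instances with the same members would otherwise count as the same element.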