@timcharper
Created February 3, 2009 21:01
Nokogiri Issue
html = <<-EOF
<html>
<body>
<table>
<tr>
<td>
One
<table><tr><td>Nested Cell #1</td></tr></table>
</td>
</tr>
<tr>
<td>
Two
</td>
</tr>
</table>
</body>
</html>
EOF
require "nokogiri"
require "hpricot"
# ---------
# -HPRICOT-
# ---------
root_table = (Hpricot(html) / "body > table")
# this works as expected
puts (root_table / "tr").length # => 3
# this also works as expected
puts (root_table / "> tr").length # => 2
# ----------
# -NOKOGIRI-
# ----------
noko_root_table = (Nokogiri::HTML.parse(html) / "body > table")
# this also works as expected
puts (noko_root_table / "tr").length # => 3
# this BREAKS
# puts (noko_root_table / "> tr").length
# I would expect this to select child rows only, but it recurses and matches the nested table's row as well
puts (noko_root_table / "//tr").length # => 3