@pfrenssen
Last active April 17, 2018 09:55
Why does XPath return an empty set?

Update: this was caused by a bug in libxml2, which has in the meantime been reported and fixed. https://bugzilla.gnome.org/show_bug.cgi?id=795299

I have an HTML page with two tables. The first row of the first table mixes a header cell and a data cell; the first row of the second table contains only header cells:

<html>
  <head>
    <title>Table test page</title>
  </head>
  <body>
    <table>
      <tr>
        <th>Cell 1</th>
        <td>Cell 2</td>
      </tr>
    </table>
    <table>
      <tr>
        <th>Cell 1</th>
        <th>Cell 2</th>
      </tr>
    </table>
  </body>
</html>

I am using the following XPath expression to select the table cells from the first row of the first table, by combining the results for <th> and <td>. This works fine: I get both cells back. Note that the first table combines <th> and <td> in a single row.

$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)' test.html
<th>Cell 1</th><td>Cell 2</td>

When I try to get the first cell by appending [1] to the expression I get the second cell back instead of the first. Why? The expected result of this expression is <th>Cell 1</th>:

$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)[1]' test.html
<td>Cell 2</td>

Appending [2] yields the second cell successfully. But why not the first?

$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[1]//tr)[1]/th)[2]' test.html
<td>Cell 2</td>

Similarly, for the second table I can get all cells with this expression. The second table only contains <th> cells in the first row.

$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)' test.html
<th>Cell 1</th><th>Cell 2</th>

However, when I append [1] to get only the first cell back, I get an empty result. The expected result for this expression is <th>Cell 1</th>.

$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)[1]' test.html
XPath set is empty

The second cell can be retrieved successfully by appending [2]. But why not the first?

$ xmllint --xpath '(((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/td|((descendant-or-self::html/descendant-or-self::table)[2]//tr)[1]/th)[2]' test.html
<th>Cell 2</th>
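
For reference, XPath 1.0 defines a union as a node-set, and a positional predicate applied to a parenthesized expression counts nodes in document order, so [1] should select the first cell of the row regardless of whether it is a <th> or a <td>. The following is a minimal Python sketch that models that expected behaviour using only the standard library. It does not evaluate the XPath expressions themselves and does not involve libxml2; the helper names `first_row_cells` and `serialize` are made up for illustration, and the cell texts are taken from the xmllint output shown above.

```python
import xml.etree.ElementTree as ET

# Test page; cell texts follow the xmllint output above.
HTML = """<html>
  <head>
    <title>Table test page</title>
  </head>
  <body>
    <table>
      <tr>
        <th>Cell 1</th>
        <td>Cell 2</td>
      </tr>
    </table>
    <table>
      <tr>
        <th>Cell 1</th>
        <th>Cell 2</th>
      </tr>
    </table>
  </body>
</html>"""


def first_row_cells(root, table_index):
    """Return the td/th children of the first row of the given table
    (1-based index), in document order, as the union is expected to."""
    tables = list(root.iter("table"))   # iter() walks in document order
    table = tables[table_index - 1]
    row = next(table.iter("tr"))        # first tr inside that table
    return [cell for cell in row if cell.tag in ("td", "th")]


def serialize(cell):
    """Serialize one cell, dropping the whitespace that follows it."""
    return ET.tostring(cell, encoding="unicode").strip()


root = ET.fromstring(HTML)
for table_index in (1, 2):
    cells = first_row_cells(root, table_index)
    print(f"table {table_index}, all cells:", [serialize(c) for c in cells])
    print(f"table {table_index}, [1]:", serialize(cells[0]))
    print(f"table {table_index}, [2]:", serialize(cells[1]))
```

Under this model, [1] gives <th>Cell 1</th> for both tables, which is the result I expected from xmllint.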

Note that when the leading descendant-or-self::html step is replaced with //html in both branches of the union, the expression seems to work as expected in all cases, for both the first and the second table:

$ xmllint --xpath '(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)' test.html
<th>Cell 1</th><td>Cell 2</td>

$ xmllint --xpath '(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)[1]' test.html
<th>Cell 1</th>

$ xmllint --xpath '(((//html/descendant-or-self::table)[1]//tr)[1]/td|((//html/descendant-or-self::table)[1]//tr)[1]/th)[2]' test.html
<td>Cell 2</td>

$ xmllint --xpath '(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)' test.html
<th>Cell 1</th><th>Cell 2</th>

$ xmllint --xpath '(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)[1]' test.html
<th>Cell 1</th>

$ xmllint --xpath '(((//html/descendant-or-self::table)[2]//tr)[1]/td|((//html/descendant-or-self::table)[2]//tr)[1]/th)[2]' test.html
<th>Cell 2</th>
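
All of the expressions above share one shape: the same "first row of table N" path is repeated on both sides of the union, once ending in /td and once in /th, with an optional positional predicate at the end. A small Python sketch that builds them makes the structure easier to see and makes it easier to regenerate the commands when retesting against another libxml2 version. The function name `first_row_cells_xpath` and its parameters are hypothetical and not part of any tool.

```python
def first_row_cells_xpath(table_index, position=None,
                          root="descendant-or-self::html"):
    """Build the union expression used in this test case.

    table_index: 1-based index of the table.
    position:    optional 1-based predicate appended to the whole union.
    root:        leading step; "descendant-or-self::html" is the failing
                 variant, "//html" the one that behaves as expected.
    """
    # Path to the first row of the selected table.
    row = f"(({root}/descendant-or-self::table)[{table_index}]//tr)[1]"
    # Union of the data cells and the header cells of that row.
    expr = f"({row}/td|{row}/th)"
    if position is not None:
        expr += f"[{position}]"
    return expr


# Print ready-to-run xmllint commands for both variants.
for root in ("descendant-or-self::html", "//html"):
    for table_index in (1, 2):
        for position in (None, 1, 2):
            expr = first_row_cells_xpath(table_index, position, root)
            print(f"xmllint --xpath '{expr}' test.html")
```

The script only prints the commands; it does not run xmllint, so it makes no claim about what any particular libxml2 build returns.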

Here is the same test case implemented in PHP, which also uses libxml2: https://3v4l.org/WuPSt
