Twitter quiz solution

Yesterday I posted a little quiz on Twitter about HTML parsing.

The question was: what element is going to be the parent of the final <s> in the following snippet of HTML:

<div><table><svg><foreignObject><select><table><s>

The final answers are:

  • table: 29.2%
  • select: 13.7%
  • div: 23.7%
  • <s> won't be in the dom: 33.5%

The correct answer is... <div>! Before going into a short explanation of why, let's discuss the other options and why they might seem correct to people not yet initiated into the madness of HTML parsing.

Theory #1: table seems correct, since <s> is directly after <table>!

Many tags in HTML behave just like containers for other tags. For instance, we can nest <div>s as many times as we like. The following snippet:

<div id=1><div id=2><div id=3>

is parsed into the following DOM fragment:

div id="1"
└─ div id="2"
   └─ div id="3"

However, the same is not true for a significant number of HTML elements. <table> is one such example.

Simplifying things a little, the only tags you can put directly in <table> are table-specific tags, like <tbody>, <tr>, <td> and so on (the parser automatically adds some missing tags in those cases, but let's not go into that much detail).

If we try to put almost anything else in a table, then these elements are put before the table (the process is called foster parenting). So the following snippet:

<table><s>

is parsed into:

├─ s
└─ table

So an <s> cannot be a direct child of a <table>.
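
A rough way to picture foster parenting is the toy Python sketch below. It is not the spec's algorithm: the `Node` class, the `insert_while_table_open` helper and the `TABLE_CONTENT` set are hypothetical names made up for illustration, and the sketch ignores the tags the parser would add automatically (such as <tbody>).

```python
class Node:
    """Minimal stand-in for a DOM element (hypothetical, for illustration only)."""
    def __init__(self, name):
        self.name = name
        self.children = []

# Assumed, simplified set of tags that are allowed to stay inside a table.
TABLE_CONTENT = {"caption", "colgroup", "col", "tbody", "thead", "tfoot", "tr", "td", "th"}

def insert_while_table_open(parent, table, name):
    """Insert a new element while `table` (a child of `parent`) is the open element."""
    node = Node(name)
    if name in TABLE_CONTENT:
        table.children.append(node)  # table-specific tags stay inside the table
    else:
        # Foster parenting: the node lands in the table's parent, just before the table.
        parent.children.insert(parent.children.index(table), node)
    return node

body = Node("body")
table = Node("table")
body.children.append(table)
insert_while_table_open(body, table, "s")
print([child.name for child in body.children])  # ['s', 'table']
```

The printed list matches the tree above: <s> ends up as a sibling placed before <table>, not as its child.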

Theory #2: I know about foster parenting hence <select> is a correct answer because it is a parent of <table>!

<select> is another special element in the HTML spec. In a nutshell, for most elements <select> works like PHP's old strip_tags function: the tags are simply dropped. The parser keeps only the following elements as children of <select>:

  • option
  • optgroup
  • script
  • template

So the following snippet of HTML:

<select><script></script><a><b><template></template><s>

is parsed into:

select
├─ script
└─ template

Notice that only <script> and <template> have been created.

The special parsing rules of <select> are described in the "in select" insertion mode in the HTML spec.
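
As a minimal sketch of that strip_tags-like behaviour, assuming we only look at start tags and ignore text and end tags, the filtering can be modelled in a few lines of Python. `ALLOWED_IN_SELECT` and `surviving_children` are hypothetical names, not anything from the spec or a library.

```python
# The four element names listed above; every other start tag is dropped.
ALLOWED_IN_SELECT = {"option", "optgroup", "script", "template"}

def surviving_children(start_tags):
    """Return the start tags that would become children of <select> in this toy model."""
    return [tag for tag in start_tags if tag in ALLOWED_IN_SELECT]

# Start tags from: <select><script></script><a><b><template></template><s>
print(surviving_children(["script", "a", "b", "template", "s"]))  # ['script', 'template']
```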

Theory #3: so the correct answer must be "<s> won’t be in the dom" because of the special parsing rules for <select>!

This was the most popular answer. I believe that's because people were aware of the special way <select> is parsed and assumed it would apply here.

The problem is that there is another insertion mode in the HTML spec, called "in select in table". So there are special rules for parsing a <select> that is inside a <table>.

The relevant difference is that whenever a table-specific tag is encountered, <select> is implicitly closed, and parsing then continues as it would in a table.

So let's go back to tables for a short while. We already know about foster parenting. The second question is: what happens if we try to embed a <table> inside <table>? The answer is: the first table is implicitly closed.

Thus the following snippet:

<table id=1><s><table id=2><td>

is parsed into:

├─ s
├─ table id="1"
└─ table id="2"
   └─ tbody
      └─ tr
         └─ td

The <s> is foster parented, that's why it's before the table. But then the second table implicitly closes the first one. And <td> is then a child of the second table.

So let's have a look at the snippet from the quiz again:

<div><table><svg><foreignObject><select><table><s>

Up until <select> we have the following DOM tree:

└─ #document
   └─ html
      ├─ head
      └─ body
         └─ div
            ├─ svg
            │  └─ foreignobject
            │     └─ select
            └─ table

So far so good: <svg> is foster parented, so it and its descendants are placed before the table. At this point we are in the "in select in table" insertion mode. Because the next token is another <table>, we implicitly close <select> and switch to the "in table" insertion mode. Because the previous <table> is still open, the new table implicitly closes it and opens another one.

So after the second <table> the DOM tree is:

└─ #document
   └─ html
      ├─ head
      └─ body
         └─ div
            ├─ svg
            │  └─ foreignobject
            │     └─ select
            ├─ table
            └─ table

So finally, when we encounter <s>, it is foster parented, being put just before the second <table> and the final DOM tree is:

└─ #document
   └─ html
      ├─ head
      └─ body
         └─ div
            ├─ svg
            │  └─ foreignobject
            │     └─ select
            ├─ table
            ├─ s
            └─ table

So <s> is indeed a child of <div>.
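
The whole walkthrough can be replayed with a toy tree builder in Python. This is only a sketch under heavy assumptions: it accepts start tags only and does not model end tags, table cells, implied <tbody>/<tr>, namespaces or real insertion modes. `Node`, `pop_through` and `build` are hypothetical names, and the three rules in the code are simplified versions of the behaviours described above, not the spec's algorithm.

```python
class Node:
    """Minimal stand-in for a DOM element (hypothetical, for illustration only)."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []

ALLOWED_IN_SELECT = {"option", "optgroup", "script", "template"}
TABLE_CONTENT = {"caption", "colgroup", "col", "tbody", "thead", "tfoot", "tr", "td", "th"}

def pop_through(stack, name):
    """Pop open elements until one called `name` has been popped."""
    while stack.pop().name != name:
        pass

def build(start_tags):
    """Toy tree builder: returns the root and the list of nodes it created."""
    root = Node("body")
    stack, created = [root], []
    for name in start_tags:
        table_open = any(node.name == "table" for node in stack)
        if stack[-1].name == "select":
            if name == "table" and table_open:
                pop_through(stack, "select")  # "in select in table": close <select>
            elif name not in ALLOWED_IN_SELECT:
                continue                      # "in select": the tag is ignored
        if name == "table" and table_open:
            pop_through(stack, "table")       # a nested <table> closes the open one
        current = stack[-1]
        if current.name == "table" and name not in TABLE_CONTENT:
            # Foster parenting: insert just before the table, in the table's parent.
            parent = current.parent
            node = Node(name, parent)
            parent.children.insert(parent.children.index(current), node)
        else:
            node = Node(name, current)
            current.children.append(node)
        stack.append(node)
        created.append(node)
    return root, created

quiz = ["div", "table", "svg", "foreignObject", "select", "table", "s"]
root, created = build(quiz)
final_s = created[-1]
print(final_s.name, "->", final_s.parent.name)             # s -> div
print([child.name for child in final_s.parent.children])   # ['svg', 'table', 's', 'table']
```

Within the limits of this model, the children of <div> come out in the same order as in the final tree above.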

Appendix: Firefox appears to have a parser bug and parses it differently:

└─ #document
   └─ html
      ├─ head
      └─ body
         └─ div
            ├─ svg
            │  └─ foreignobject
            │     ├─ select
            │     ├─ s
            │     └─ table
            └─ table

I'm going to report the bug to Mozilla very soon.
