@chrisjsewell
Last active March 26, 2024 00:33

Why you can't have nested titles (a.k.a. headings) in Sphinx

Over the course of my maintenance of sphinx and sphinx extensions (including myst-parser, myst-nb and sphinx-needs), the question of using headings within directives, or more generally nested inside other nodes of the docutils document tree, has come up multiple times.

To not have to repeat myself anymore, I decided to write this short doc, to explain why this is not possible (or at least generally a bad idea).

How headings are processed

Let's take the following reStructuredText document as an example:

Section 1
=========

Section 1.1
-----------

Section 1.1.1
.............

Section 1.2
-----------

Section 1.2.1
.............

more ...

If we convert this to a document tree (the first stage of processing undertaken by docutils/sphinx), we get the following (note --no-transforms is not currently available in docutils):

rst2pseudoxml --no-transforms test.rst
<document source="test2.rst">
    <section ids="section-1" names="section\ 1">
        <title>
            Section 1
        <section ids="section-1-1" names="section\ 1.1">
            <title>
                Section 1.1
            <section ids="section-1-1-1" names="section\ 1.1.1">
                <title>
                    Section 1.1.1
        <section ids="section-1-2" names="section\ 1.2">
            <title>
                Section 1.2
            <section ids="section-1-2-1" names="section\ 1.2.1">
                <title>
                    Section 1.2.1
                <paragraph>
                    more ...

If we run this through to a full HTML5 conversion, you get the following:

rst2html5 test.rst
<body>
  <main id="section-1">
    <h1 class="title">Section 1</h1>

    <section id="section-1-1">
      <h2>Section 1.1</h2>
      <section id="section-1-1-1">
        <h3>Section 1.1.1</h3>
      </section>
    </section>
    <section id="section-1-2">
      <h2>Section 1.2</h2>
      <section id="section-1-2-1">
        <h3>Section 1.2.1</h3>
        <p>more ...</p>
      </section>
    </section>
  </main>
</body>

This is all good 👍
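
The nesting above follows from a reStructuredText rule: a title's level is set by the order in which its underline style first appears, not by the character itself. Below is a minimal sketch of that rule in plain Python. It is a simplified model, not the docutils implementation, and the name `assign_levels` is made up for illustration.

```python
def assign_levels(titles):
    """Assign a section level to each (text, underline_char) pair.

    Simplified model: a style's level is its position in the list of
    styles seen so far, and a title may be at most one level deeper
    than the title before it.
    """
    styles = []   # underline characters, in order of first appearance
    current = 0   # level of the previous title (0 = document root)
    levels = []
    for text, char in titles:
        if char in styles:
            level = styles.index(char) + 1
        elif current == len(styles):
            # A new style is only accepted directly below the deepest known level.
            styles.append(char)
            level = len(styles)
        else:
            raise ValueError(f"inconsistent title style for {text!r}")
        if level > current + 1:
            raise ValueError(f"title {text!r} skips a level")
        levels.append((text, level))
        current = level
    return levels


DOC = [
    ("Section 1", "="),
    ("Section 1.1", "-"),
    ("Section 1.1.1", "."),
    ("Section 1.2", "-"),
    ("Section 1.2.1", "."),
]
print(assign_levels(DOC))
# [('Section 1', 1), ('Section 1.1', 2), ('Section 1.1.1', 3),
#  ('Section 1.2', 2), ('Section 1.2.1', 3)]
```

The key point is that the list of known styles is shared state for the whole document, which is what makes headings inside a nested parse awkward.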

What happens if you write headings in a directive

Let's now "nest" some of these headings inside a directive, for example the note admonition:

Section 1
=========

Section 1.1
-----------

.. note::

    Section 1.1.1
    .............

    Section 1.2
    -----------

Section 1.2.1
.............

more ...

In regular docutils / sphinx processing, this will raise an error:

rst2pseudoxml --no-transforms test1.rst
test1.rst:10: (SEVERE/4) Unexpected section title.

Section 1.1.1
.............
Exiting due to level-4 (SEVERE) system message.

What happens if you try to bypass docutils restrictions

In sphinx there is a function, nested_parse_with_titles (see https://github.com/sphinx-doc/sphinx/blob/04bd0df100809de350be89b64bb85c3524867132/sphinx/util/nodes.py#L327), which "allows" you to do this (see below for why I think this should not be part of the public API).

So let's hack into docutils and replace the note directive with a custom directive that uses nested_parse_with_titles:

from docutils import nodes
from docutils.parsers.rst import Directive, directives
from docutils.parsers.rst.roles import set_classes
from sphinx.util.nodes import nested_parse_with_titles

class NoteWithTitles(Directive):
    # Mirrors the docutils admonition directive, except for the
    # nested_parse_with_titles call near the end of run().

    final_argument_whitespace = True
    option_spec = {'class': directives.class_option,
                   'name': directives.unchanged}
    has_content = True

    node_class = nodes.note

    def run(self):
        set_classes(self.options)
        self.assert_has_content()
        text = '\n'.join(self.content)
        admonition_node = self.node_class(text, **self.options)
        self.add_name(admonition_node)
        # Only generic admonitions take a title argument; not reached for notes.
        if self.node_class is nodes.admonition:
            title_text = self.arguments[0]
            textnodes, messages = self.state.inline_text(title_text,
                                                         self.lineno)
            title = nodes.title(title_text, '', *textnodes)
            title.source, title.line = (
                    self.state_machine.get_source_and_line(self.lineno))
            admonition_node += title
            admonition_node += messages
            if 'classes' not in self.options:
                admonition_node['classes'] += ['admonition-'
                                               + nodes.make_id(title_text)]

        # Parse the directive content with section titles allowed.
        nested_parse_with_titles(self.state, self.content, admonition_node, self.content_offset)

        return [admonition_node]

Now let's run our modified docutils again:

rst2pseudoxml --no-transforms test1.rst
<document source="test2.rst">
    <section ids="section-1" names="section\ 1">
        <title>
            Section 1
        <section ids="section-1-1" names="section\ 1.1">
            <title>
                Section 1.1
            <note>
                <section ids="section-1-1-1" names="section\ 1.1.1">
                    <title>
                        Section 1.1.1
                    <section ids="section-1-2" names="section\ 1.2">
                        <title>
                            Section 1.2
            <section ids="section-1-2-1" names="section\ 1.2.1">
                <title>
                    Section 1.2.1
                <paragraph>
                    more ...

Maybe you can already see what the problem is, but let's convert this to HTML5, where the problem becomes more obvious:

rst2html5 test1.rst
<body>
  <main id="section-1">
    <h1 class="title">Section 1</h1>
    <p class="subtitle" id="section-1-1">Section 1.1</p>

    <aside class="admonition note">
      <p class="admonition-title">Note</p>
      <section id="section-1-1-1">
        <h2>Section 1.1.1</h2>
        <section id="section-1-2">
          <h3>Section 1.2</h3>
        </section>
      </section>
    </aside>
    <section id="section-1-2-1">
      <h2>Section 1.2.1</h2>
      <p>more ...</p>
    </section>
  </main>
</body>

As you can see, Section 1.2 is now incorrectly an h3 element, and Section 1.2.1 is now incorrectly an h2 element.
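
Why Section 1.2 ends up below Section 1.1.1 can be modelled in a few lines. This is a simplified sketch, not the sphinx implementation: it assumes the nested parse starts with no memory of the title styles seen in the surrounding document, so the first style met inside the directive becomes the shallowest level there. The function name `nested_levels` and its arguments are made up for illustration.

```python
def nested_levels(outer_depth, nested_titles):
    """Model a nested parse that starts with an empty list of title styles.

    outer_depth is the section depth at which the directive sits;
    nested_titles is a list of (text, underline_char) pairs.
    No consistency checks are made, to keep the model short.
    """
    styles = []  # styles seen so far, inside the directive only
    levels = []
    for text, char in nested_titles:
        if char not in styles:
            styles.append(char)
        levels.append((text, outer_depth + styles.index(char) + 1))
    return levels


# The note sits inside Section 1.1, which is at depth 2.
print(nested_levels(2, [("Section 1.1.1", "."), ("Section 1.2", "-")]))
# [('Section 1.1.1', 3), ('Section 1.2', 4)]
```

Under this assumption the "-" underline, which meant level 2 in the outer document, is treated as a new and deeper style inside the note, so Section 1.2 becomes a child of Section 1.1.1.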

In short, allowing nested headings seriously messes up the document structure, in a way that may not be immediately apparent (since no warnings or errors are raised). This is even more of a problem for any document tree post-processing (a.k.a. transforms) that expects the correct document structure, and can cause such transforms to fail in unexpected ways (for example, the processing of toctrees).
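
To make the structural damage concrete, here is a small sketch that derives each heading's depth purely from how deeply its section is nested, which is how the HTML heading tag is chosen. The `Node` class and `section_depths` function are stand-ins written for this example, not docutils classes. The depths are those of the untransformed tree, so they are offset from the h-levels in the HTML above, where Section 1 and Section 1.1 were promoted to title and subtitle.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    tag: str
    title: str = ""
    children: list = field(default_factory=list)


def section_depths(node, depth=0):
    """Return (title, depth) for every section, counting only section ancestors."""
    found = []
    if node.tag == "section":
        depth += 1
        found.append((node.title, depth))
    for child in node.children:
        found.extend(section_depths(child, depth))
    return found


# The tree produced by the directive that allows nested titles.
broken = Node("document", children=[
    Node("section", "Section 1", [
        Node("section", "Section 1.1", [
            Node("note", children=[
                Node("section", "Section 1.1.1", [
                    Node("section", "Section 1.2"),
                ]),
            ]),
            Node("section", "Section 1.2.1"),
        ]),
    ]),
])

print(section_depths(broken))
# [('Section 1', 1), ('Section 1.1', 2), ('Section 1.1.1', 3),
#  ('Section 1.2', 4), ('Section 1.2.1', 3)]
```

The author intended Section 1.2 at depth 2, but it comes out at depth 4, and anything that walks the tree by depth inherits that mistake.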

This is an intrinsic limitation of the way docutils/sphinx works. I would emphasise that I don't believe it is a bug in docutils/sphinx though, because it is not a new problem when it comes to markup parsing (see for example jgm/djot#213).

So what is nested_parse_with_titles actually used for?

In my opinion, nested_parse_with_titles should probably be a private function in sphinx, as it is not really intended to be used by extensions or users unless they really know what they are doing. Plus, ideally this functionality would be up-streamed into docutils itself, so that it is less of a hack of the underlying docutils parser.

The core use cases for nested_parse_with_titles are:

  1. If you need to "include" a document fragment inside another document, but want the heading level of the document fragment adjusted to the current level. This is utilised, for example, when including docstrings from Python code into sphinx documentation.

  2. If your directive is only used at the "top-level" of a document, and the generated nodes are not wrapped in any kind of "container" node, or you are also doing some form of "post-processing" on the generated nodes, to fix the problems that arise from nested headings. This is utilised, for example, in the ifconfig directive.

These are both very specific and advanced use-cases, and should be used with great caution (and testing).
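
For the testing part, one possible check is to walk the generated tree and flag any section whose parent is neither the document nor another section. The sketch below is an assumption about how such a check could look, not an existing sphinx or docutils helper. The name `misplaced_sections` is made up, and the function only reads the `tagname` and `children` attributes, so it is demonstrated here on simple stand-in objects rather than a real doctree.

```python
from types import SimpleNamespace


def misplaced_sections(node, parent_tag=None):
    """Collect section nodes whose parent is neither a document nor a section.

    Only `tagname` and `children` are read; nodes lacking either
    attribute are treated as leaves.
    """
    tag = getattr(node, "tagname", None)
    found = []
    if tag == "section" and parent_tag not in (None, "document", "section"):
        found.append(node)
    for child in getattr(node, "children", ()):
        found.extend(misplaced_sections(child, tag))
    return found


# Stand-in tree mirroring the broken example: a section inside a note.
nested = SimpleNamespace(tagname="section", children=[])
note = SimpleNamespace(tagname="note", children=[nested])
top = SimpleNamespace(tagname="section", children=[note])
document = SimpleNamespace(tagname="document", children=[top])

assert misplaced_sections(document) == [nested]
```

A check like this could run in an extension's test suite against the trees its directives produce, so that a nested section is caught before it reaches any transform.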
