Skip to content

Instantly share code, notes, and snippets.

@agoose77
Last active September 24, 2022 17:29
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 1 You must be signed in to fork a gist
  • Save agoose77/e8f0f8f7d7133e73483ca5c2dd7b907f to your computer and use it in GitHub Desktop.
Save agoose77/e8f0f8f7d7133e73483ca5c2dd7b907f to your computer and use it in GitHub Desktop.
from sphinx.transforms import SphinxTransform
import sphinx.environment.collectors.toctree as toctree_collector
from sphinx import addnodes
from docutils import nodes
from typing import Any, Dict, List, Set, Tuple, Type, TypeVar, cast
from docutils import nodes
from docutils.nodes import Element, Node
from sphinx import addnodes
from sphinx.application import Sphinx
from sphinx.environment import BuildEnvironment
from sphinx.environment.adapters.toctree import TocTree
from sphinx.environment.collectors import EnvironmentCollector
from sphinx.locale import __
from sphinx.transforms import SphinxContentsFilter
from sphinx.util import logging, url_re
N = TypeVar("N")
logger = logging.getLogger(__name__)
class SignatureContentsFilter(SphinxContentsFilter):
# This logic is obtuse, but what's happening here is that we set up the
# .parent array to hold a _container_ around a text node.
# The `SkipNode` exception: https://github.com/docutils/docutils/blob/ae4d18314a821e61a24dc0e4f29a691b7c3b656e/docutils/docutils/nodes.py#L2159-L2164
# ensures that the `default_depart` method: https://github.com/docutils/docutils/blob/ae4d18314a821e61a24dc0e4f29a691b7c3b656e/docutils/docutils/nodes.py#L2127-L2130
# isn't called.
# It also ensures the container's children aren't visited - we want to stop here
# `fullname` is the qualified name of the documented object.
def visit_desc_signature(self, node):
# Replace this element with a simple element wrapping text
self.parent.append(nodes.Element("", nodes.Text(node.attributes["fullname"])))
raise nodes.SkipNode
# Most of this is copied from Sphinx
class BetterTocTreeCollector(toctree_collector.TocTreeCollector):
def process_doc(self, app: Sphinx, doctree: nodes.document) -> None:
"""Build a TOC from the doctree and store it in the inventory."""
docname = app.env.docname
numentries = [0] # nonlocal again...
# This is changed to a generator, and the class condition removed
def traverse_in_section(node: Element) -> List[N]:
"""Like traverse(), but stay within the same section."""
yield node
for child in node.children:
if isinstance(child, nodes.section):
continue
elif isinstance(child, nodes.Element):
yield from traverse_in_section(child)
def build_toc(node: Element, depth: int = 1) -> nodes.bullet_list:
# The logic here is a bit confusing.
# It looks like section nodes are expected to be top-level within a section.
entries: List[Element] = []
for sectionnode in node:
# find all toctree nodes in this section and add them
# to the toc (just copying the toctree node which is then
# resolved in self.get_and_resolve_doctree)
if isinstance(sectionnode, nodes.section):
title = sectionnode[0]
# copy the contents of the section title, but without references
# and unnecessary stuff
visitor = SphinxContentsFilter(doctree)
title.walkabout(visitor)
nodetext = visitor.get_entry_text()
# if nodetext and nodetext[0] == "ak.ArrayBuilder":
# print(node)
# break
if not numentries[0]:
# for the very first toc entry, don't add an anchor
# as it is the file's title anyway
anchorname = ""
else:
anchorname = "#" + sectionnode["ids"][0]
numentries[0] += 1
# make these nodes:
# list_item -> compact_paragraph -> reference
reference = nodes.reference(
"",
"",
internal=True,
refuri=docname,
anchorname=anchorname,
*nodetext
)
para = addnodes.compact_paragraph("", "", reference)
item: Element = nodes.list_item("", para)
sub_item = build_toc(sectionnode, depth + 1)
if sub_item:
item += sub_item
entries.append(item)
elif isinstance(sectionnode, addnodes.only):
onlynode = addnodes.only(expr=sectionnode["expr"])
blist = build_toc(sectionnode, depth)
if blist:
onlynode += blist.children
entries.append(onlynode)
# Otherwise, for a generic element we allow recursion into the section
elif isinstance(sectionnode, nodes.Element):
for node in traverse_in_section(sectionnode):
if isinstance(node, addnodes.toctree):
item = node.copy()
entries.append(item)
# important: do the inventory stuff
TocTree(app.env).note(docname, node)
# For signatures within some section, we add them to the ToC
elif isinstance(node, addnodes.desc_signature):
title = node
# Copy the signature, but without the detail e.g. parens
visitor = SignatureContentsFilter(doctree)
title.walkabout(visitor)
nodetext = visitor.get_entry_text()
if not numentries[0]:
# for the very first toc entry, don't add an anchor
# as it is the file's title anyway
anchorname = ""
else:
anchorname = "#" + node["ids"][0]
numentries[0] += 1
# make these nodes:
# list_item -> compact_paragraph -> reference
reference = nodes.reference(
"",
"",
internal=True,
refuri=docname,
anchorname=anchorname,
*nodetext
)
para = addnodes.compact_paragraph("", "", reference)
item: Element = nodes.list_item("", para)
entries.append(item)
if entries:
return nodes.bullet_list("", *entries)
return None
toc = build_toc(doctree)
assert docname in app.env.tocs
if toc:
app.env.tocs[docname] = toc
else:
app.env.tocs[docname] = nodes.bullet_list("")
app.env.toc_num_entries[docname] = numentries[0]
def setup(app):
app.add_env_collector(BetterTocTreeCollector)
@asmeurer
Copy link

I tried this and got this error

Extension error (sphinx_toctree_signature):
Handler <bound method BetterTocTreeCollector.process_doc of <sphinx_toctree_signature.BetterTocTreeCollector object at 0x115feb9d0>> for event 'doctree-read' threw an exception (exception: list index out of range)

Any idea how to get the full traceback?

@agoose77
Copy link
Author

Ooh yikes. In theory you can wrap the entire process_doc in a

try: 
    ...
except Exception as exc:
    traceback.print_exc(exc)

@asmeurer
Copy link

Here's the traceback.

  File "/Users/aaronmeurer/Documents/gists/e8f0f8f7d7133e73483ca5c2dd7b907f/sphinx_toctree_signature.py", line 135, in process_doc
    toc = build_toc(doctree)
  File "/Users/aaronmeurer/Documents/gists/e8f0f8f7d7133e73483ca5c2dd7b907f/sphinx_toctree_signature.py", line 84, in build_toc
    sub_item = build_toc(sectionnode, depth + 1)
  File "/Users/aaronmeurer/Documents/gists/e8f0f8f7d7133e73483ca5c2dd7b907f/sphinx_toctree_signature.py", line 84, in build_toc
    sub_item = build_toc(sectionnode, depth + 1)
  File "/Users/aaronmeurer/Documents/gists/e8f0f8f7d7133e73483ca5c2dd7b907f/sphinx_toctree_signature.py", line 84, in build_toc
    sub_item = build_toc(sectionnode, depth + 1)
  File "/Users/aaronmeurer/Documents/gists/e8f0f8f7d7133e73483ca5c2dd7b907f/sphinx_toctree_signature.py", line 115, in build_toc
    anchorname = "#" + node["ids"][0]
IndexError: list index out of range

@asmeurer
Copy link

I think it's coming from an autodoc entry that has :noindex:.

@asmeurer
Copy link

I was able to fix it with

diff --git a/sphinx-toctree-signature.py b/sphinx-toctree-signature.py
index 21f21b8..e0040a9 100644
--- a/sphinx-toctree-signature.py
+++ b/sphinx-toctree-signature.py
@@ -111,6 +111,8 @@ class BetterTocTreeCollector(toctree_collector.TocTreeCollector):
                                 # for the very first toc entry, don't add an anchor
                                 # as it is the file's title anyway
                                 anchorname = ""
+                            elif not node["ids"]:
+                                continue
                             else:
                                 anchorname = "#" + node["ids"][0]
                             numentries[0] += 1

@asmeurer
Copy link

Other than that, it seems to work great. I used it on the SymPy documentation, which has many large pages with many autodoc functions. In some cases we use the one heading per function style and in some cases we don't (typically because of automodule + :members:), but this handles everything great. There was a concern in the issue about getting the toc in the right order. I didn't test it extensively, but it appears to be in the correct order. Maybe there's an automated way to check it?

@agoose77
Copy link
Author

This should respect the order of the statements - it does a depth-first search, so the ToC should reflect the document structure.

It would be sensible for something like SymPy to somehow disable the existing ToC for performance reasons. You could even monkey-patch the existing ToC builder by setting TocTreeCollector.process_doc

@asmeurer
Copy link

Now I'm trying to figure out if this can be extended for glossary terms. I think we just need to do similar logic for term nodes.

@asmeurer
Copy link

I think I figured the glossary entries out. See https://gist.github.com/asmeurer/5009f8845f864bd671769d10e07d1184.

Do you have any plans to make this into a real Sphinx extension? Or are you planning to try to upstream it into Sphinx? Either way, this seems to work quite well and I'd like to be able to use it.

@agoose77
Copy link
Author

I haven't any plans just yet. It probably needs to be upstreamed into Sphinx, but at the same time, this logic could perhaps be more extensible to begin with (e.g. your case where you added support for a glossary). I'm not sure that I can get to that right at this moment, but maybe worth a discussion.

@asmeurer
Copy link

Just having this its current state would be very helpful. I wouldn't worry too much about proper extensibility for the moment. I'm not personally aware of any other things that I would like to be added to the ToC. It can be added as a v2 feature.

Most things you can just add an actual header for. And based on the existing code for glossary entries and function signatures, I don't expect proper extensibility is very straightforward. Each requires logic for that specific node type to extract the proper ToC heading name and anchor.

My impression from the issue discussion is that upstreaming this would not be very easy, although I could be wrong about that. That's obviously the most ideal, but I rather see this be in a usable extension for now than languish in a Sphinx PR or this gist. That was also @pradyunsg's suggestion here sympy/sympy#23332 (comment).

At worst, if you're not interested in doing anything further with this, do I have permission to take this an use it in SymPy (under the SymPy license)?

@agoose77
Copy link
Author

@asmeurer oh, I'd thought about injecting entries into the ToC, but I wondered whether it would be more work than overriding the ToC builder. Seeing that example (I hadn't realised autoapi does this), I think that looks like it has the potential to be slightly cleaner.

I also think the autoapi example would fit better with an extension. I will build an extension and make it publicly available. However, you might want to just vendor it into Sympy given how little code it will be, that's up to you.

As for this code, yes you do. It's obviously adapted from the Sphinx core, so if you'd be able to check the license and restore any license information that would be handy. I didn't get round to it yet.

@agoose77
Copy link
Author

To add, this is terrible code - I was hacking away at it yesterday to build an understanding of how the ToC works. That's why there are a number of unused imports 😅

The SphinxContentsFilter, for example, isn't needed - just pull the text from the desc node directly.

@pradyunsg
Copy link

FWIW, I do think this is a useful improvement to make to Sphinx directly as well. :)

sphinx-doc/sphinx#6316 is the relevant issue, but I think folks here may know about that already.

@asmeurer
Copy link

I've made an additional change in my fork, which is to make it so that methods are sub-items of their respective classes. This makes the toctree much easier to read for pages with :autoclass: + :members:. I also removed the SphinxContentsFilter and removed the unused imports. The code could maybe be cleaned up a little more, but I'm not too concerned about it. https://gist.github.com/asmeurer/5009f8845f864bd671769d10e07d1184

I've realized that an unfortunate consequence of this is that the entries get added to :toctree: in addition to the theme sidebar (well, duh), so you basically have to use :maxdepth: 1 everywhere or else you get unreadable tables of contents (at least for SymPy). Although tbh I'm not sure if we even need nonhidden toctrees anywhere given the theme navigation.

I plan on using my forked version of this in SymPy. Please let me know if you make any further progress yourself on it, either with this version or for upstreaming it. I would love for this to just be a flag in Sphinx, but I'm not personally up to the task of upstreaming it.

@pradyunsg
Copy link

You can use :titlesonly: in the toctree, perhaps?

@agoose77
Copy link
Author

I'll let you know if/when I tackle this @asmeurer. Right now it's lower on my priority list, but I will try and get to it eventually :)

@asmeurer
Copy link

You can use :titlesonly: in the toctree, perhaps?

Yes, this looks like exactly what we want. We really should be using this everywhere instead of :maxdepth:. Is there a similar option for the Furo left sidebar btw? For now, we can just require one H1 header per page, but it's annoying when a page doesn't follow that and its subheadings get put in the left sidebar.

@pradyunsg
Copy link

@pradyunsg
Copy link

Oh wait, is that a single page with multiple h1s? If so, all the h1s will end up in the left sidebar and there's not much that Furo can do about that -- Sphinx's toctree context variable adds every h1 to the toctree HTML in that manner, with no real metadata to allow Furo to trim things. See https://pradyunsg.me/furo/kitchen-sink/structure/#structural-elements-2 for example.

Besides, restricting pages to have a single h1 is the right thing to do anyway -- https://developer.mozilla.org/en-US/docs/Web/HTML/Element/Heading_Elements#multiple_h1_elements_on_one_page

@tony
Copy link

tony commented Aug 28, 2022

@agoose77 Is this fine to re(-use) under the same license as Sphinx-doc (BSD)?

@agoose77
Copy link
Author

@tony this code extract is taken from Sphinx core (apidoc), so whilst I won't add or modify the license in any way, you'd need to follow the Sphinx licensing rules.

@tony
Copy link

tony commented Aug 29, 2022

@agoose77 Thank you! It looks like the same license as Sphinx, then. That works well! (They are a permissive license so fit the case of people sharing / re-using the code well)

@asmeurer
Copy link

Oh wait, is that a single page with multiple h1s? If so, all the h1s will end up in the left sidebar and there's not much that Furo can do about that -- Sphinx's toctree context variable adds every h1 to the toctree HTML in that manner, with no real metadata to allow Furo to trim things. See pradyunsg.me/furo/kitchen-sink/structure/#structural-elements-2 for example.

Besides, restricting pages to have a single h1 is the right thing to do anyway -- developer.mozilla.org/en-US/docs/Web/HTML/Element/Heading_Elements#multiple_h1_elements_on_one_page

Yes. It's a little confusing that the left sidebar is H1 and the right sidebar is H(n > 1). I would expect it to be left sidebar = pages, right sidebar = headings.

I guess it's expecting H1 to equal "page", and that makes sense since that's the only way a page gets its name. It would be nice if there were tooling to enforce one H1 per page, though. Part of the problem is the way headings work in RST. I've noticed other instances of incorrect headings which were never noticed. Furo's right sidebar makes it much easier to see what is going on with the heading levels.

@pradyunsg
Copy link

There's reasonable Markdown linting tooling though, which combined with MyST-Parser would make it relatively straightforward to enforce. :)

AFAIK, the only rough edge with that is API docs, and even those aren't that painful.

@asmeurer
Copy link

I found another interesting thing here. When using automodule with :members:, if the module docstring has headers in it, then the members of that module are placed under the last header. This is done in the docutils node structure. I don't know if it is intentional or not. It's rather unfortunate for this extension since typically the module docstrings are written to be independent docstrings from the function docstrings, not as a heading for them.

I'm unsure if I should try to work around this in this extension. It's possible this is an autodoc bug because no one ever noticed the structure of the nodes, in which case it ought to be fixed upstream.

@asmeurer
Copy link

asmeurer commented Sep 6, 2022

I figured out what is going on with automodule, if anyone is interested. It's a bug in the way Sphinx handles module docstrings, and as far as I can tell it's impossible to work around it in this extension. sphinx-doc/sphinx#10804

@pradyunsg
Copy link

Adding a cross-reference to sphinx-doc/sphinx#10807 which is a PR to Sphinx, that... I'll go with "drew inspiration from" this Gist. :)

@agoose77
Copy link
Author

agoose77 commented Sep 9, 2022

TBF this gist ... drew inspiration from the source in Sphinx ;) I just repackaged it.

@tony
Copy link

tony commented Sep 24, 2022

This is baked into sphinx 5.2

Settings options:

  • add_function_parentheses = False (default: True)
  • toc_object_entries_show_parents can be (default: 'domain'):
    • toc_object_entries_show_parents = 'domain'
    • toc_object_entries_show_parents = 'hide'
    • toc_object_entries_show_parents = 'all'
Example w/ Furo theme (screenshot)

My setting:

toc_object_entries_show_parents = 'hide'

URL: https://libvcs.git-pull.com/sync/git.html (may break in future, sorry!)

image

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment