@QuietMisdreavus
Last active February 7, 2019 08:53
things rustdoc wants from the compiler for 2019 goals

Below is a listing of rustdoc features and their current stumbling blocks that require coordination from outside the Rustdoc Team itself. Usually this involves needing more data from the compiler, but there are a couple of items outside of that as well.

intra-doc links

Intra-doc links are the rustdoc feature where you can write the path to a type to link to it, so that you don't have to guess at where the HTML file is going to be relative to the item you're documenting:

pub struct SomeStruct;

/// This function does stuff with a [`SomeStruct`].
pub fn some_fn(asdf: SomeStruct) {}

/// Take [`asdf`] and do things with it.
///
/// [`asdf`]: Vec
pub fn some_other_fn(asdf: Vec<String>) {}

"cross-crate resolver scopes"

The short version: Rustdoc wants to use the name resolver with scopes that come from other crates.

The problem

Rustdoc's "intra-doc links" feature leans heavily on the compiler's name resolver to determine what item a given name should point to. This works great when we're looking at an item that is declared in the crate we're documenting. However, users can re-export items from other crates and make it appear as if their documentation is a part of the downstream crate. (The standard library does this for everything in libcore, and other crates that are written as a collection do this to create a unified facade library.)

In this situation, rustdoc only has the item's DefId, and can recall its documentation, but if the item has intra-doc links on it, we can't recall the resolver information for the scope it was declared in. That information is discarded during the compilation of the other crate. As a fallback, rustdoc uses the scope of the pub use statement, but this can create confusing situations where links get broken because the re-export scope doesn't have all of the items in scope that the original scope did.
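As a rough single-file sketch of the shape of the problem: the modules below stand in for separate crates, and every name is hypothetical. Within one crate rustdoc still has the original scope, so the link resolves. The breakage only shows up when `upstream` is a separate crate and its scopes are gone by the time the facade is documented.

```rust
/// Stand-in for an upstream crate. In the real problem this is a separate
/// crate whose resolver scopes were discarded after it was compiled.
pub mod upstream {
    pub struct Circle {
        pub radius: f64,
    }

    /// Computes the area of a [`Circle`].
    ///
    /// The link above resolves here because `Circle` is in scope in this
    /// module.
    pub fn area(circle: &Circle) -> f64 {
        std::f64::consts::PI * circle.radius * circle.radius
    }
}

/// Stand-in for the downstream facade crate. Only `area` is re-exported, so
/// a resolver that starts from this scope has no `Circle` to find when it
/// re-resolves the links in the inlined documentation.
pub mod facade {
    pub use super::upstream::area;
}
```

The re-export itself works fine as code. It is only the documentation link that depends on which scope is used for lookup.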

The proposed solution

The ideal solution would be to introduce a way for the compiler to save the resolver scope information of a crate, rather than discarding it, so that rustdoc can use this information to resolve intra-doc links using the correct scope. eddyb proposed something like this in the original PR for intra-doc links: rust-lang/rust#47046 (comment)

Another option would be to save just the intra-doc link information as part of the compile process. This would involve putting a Markdown parser into the compiler, and loading the documentation of every item as it runs through name resolution, so that we can detect intra-doc links and resolve them.

Loading metadata from downstream crates

The short version: Rustdoc would like to load metadata from downstream crates, so that intra-doc links can point to more than just a crate's dependencies.

The problem

Several libraries have broken up their code into multiple crates. This can create benefits for library authors and downstream users, but documentation can suffer in this situation, since it's difficult to treat the collection as a whole. Intra-doc links are hit especially hard, since there's no way for the compiler or rustdoc to know anything about crates that aren't dependencies of the one where an item is declared. If you want to link from the "main" crate to a "bonus" crate, or from a proc-macro crate to the one that declares the items it derives, you're forced to go back to direct links to pages. This forces your crate to accept a "canonical" location for your docs, which may not be docs.rs (in the case of Rocket, for example).

The reason for this is that rustdoc just uses the resolver that comes out of the regular compilation process, meaning that it can only access information about the crate's dependencies, not any "supplemental" crates, and especially not any downstream crates. If there were a way to take the resolver and load in extra information that's not needed for regular compilation, without creating a dependency cycle situation, we could use this information to enhance intra-doc links.
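To make the "direct links to pages" fallback concrete, here is a small sketch. The crate names, the `Extra` type, and the URL are all made up for illustration.

```rust
/// Entry point of a hypothetical "main" crate.
///
/// The companion type lives in a separate "bonus" crate that this crate does
/// not depend on, so an intra-doc link such as `[bonus::Extra]` has nothing
/// to resolve against. The fallback is a hard-coded URL, which pins the docs
/// to one hosting location:
///
/// See [`bonus::Extra`](https://example.com/docs/bonus/struct.Extra.html).
pub fn run() -> &'static str {
    "main crate"
}
```

If the docs are later hosted somewhere else, every link written this way has to be updated by hand.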

Some discussion about this problem can be found in the tracking issue for intra-doc links: rust-lang/rust#43466 (comment)

The proposed solution

This may already exist today, but ideally rustdoc would like to take the resolver information after compilation and add additional information to it - in this case, more extern crates that we can shoehorn into the extern prelude. (There's a related feature i want to add involving adding the equivalent use statements after the fact, which may dovetail into this concept.) With this, we could support linking to more crates than just a crate's own dependencies.

#[doc(cfg)]

This is the feature that adds the "This is supported by (Windows/Unix/etc) only" banners seen in std::os and on other platform-specific items. It's rustdoc's current method of addressing docs for crates that present different interfaces to different platforms. Since it goes hand-in-hand with using conditional compilation to present items from every platform when rustdoc is running, rustdoc also sets cfg(rustdoc) when the feature gate is active. This creates some interesting situations, depending on the crate being documented.

aside: everybody_loops

One secret weapon that rustdoc uses to sidestep problems with referencing items from a "foreign" platform is the pretty-print mode everybody_loops. This mode replaces all expressions in function bodies with loop {}. (It used to replace entire function bodies, but got extended to keep items, as trait impls inside functions can be used outside that function.) You can see the results of this by passing -Z unpretty=everybody_loops to rustc. By removing most references to items that may not be available on the host's platform, rustdoc avoids several potential resolver issues.
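A hand-written illustration of the effect follows. This is not actual compiler output, and the `platform` module and function names are hypothetical.

```rust
/// Hypothetical platform layer; imagine it only exists on one target.
mod platform {
    pub fn query_page_size() -> usize {
        4096
    }
}

/// Original function: the body names `platform::query_page_size`, which may
/// not resolve when documenting on a different host.
pub fn page_size() -> usize {
    platform::query_page_size()
}

/// Roughly what is left after the `everybody_loops` pass: the signature is
/// intact, but the body no longer names any platform-specific item.
/// `loop {}` has type `!`, so it type-checks against any return type.
pub fn page_size_looped() -> usize {
    loop {}
}
```

Since rustdoc only needs the signature and the doc comments, nothing is lost for documentation purposes. Note that the return type is untouched, which is exactly where the next problem comes from.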

problem: what everybody_loops misses

Consider the following code:

#[cfg(any(rustdoc, windows))]
#[doc(cfg(windows))]
pub fn my_handle() -> winapi::shared::ntdef::HANDLE { ... }

To document this item, we need to be able to resolve the HANDLE type from winapi. However, the winapi crate is entirely behind a #![cfg(windows)], so if we try to run documentation on not-Windows, this type isn't available, and causes a compilation error when we try to document it. Modifying the signature of this function isn't possible, because then we wouldn't be presenting the right documentation.

I don't have a good solution to this. I don't know if it's advisable to silence certain resolution errors during compilation, and make rustdoc take this "broken reference" and just not link that type?

pipe dream: overhauling conditional compilation

A few months ago i schemed up an idea to overhaul the first few phases of compilation, and make it so that the #[doc(cfg)] attribute itself would become unnecessary. This is probably a more problematic idea than the others in this document, but it would create the "holy grail" situation many people desire in rustdoc.

The basic idea is that rustdoc would request a compilation mode where cfg attributes are not used to strip items out of the crate, and the name resolver instead takes them into account during name resolution. Rustdoc would then take the "exploded" version of the crate and generate documentation for all possible configurations, not just the ones from the active platform or that were marked with cfg(rustdoc).

doctest refactoring

As documentation tests are likely to be a focus this year for the Rustdoc Team, we're eyeing current papercuts and seeing what would be necessary to solve them.

Custom doctest parsing mode

The short version: The way that rustdoc currently parses and modifies doctests is flaky, and we'd like to be able to use the real libsyntax parser for steps where we currently use a plain-text search. However, this is harder than it sounds, because doctests don't behave like normal crates.

The problem

Rustdoc allows doctests to be written as a sequence of statements/expressions, and will automatically wrap them with a main function to compile it as a standalone binary. This is extremely convenient for documentation writers - you don't need to write the fn main boilerplate yourself, and can treat doctests like unit tests - but can create a lot of edge cases where the abstraction leaks and the modification can fail.

  • What if your doctest requires some nightly feature?
    • Okay, rustdoc will scan ahead and leave crate-level #! attributes outside the new fn main.
    • However, this scan is line-based, and doesn't check to see whether an attribute is stretched over multiple lines!
  • What if your doctest is particularly complicated, and defines its own fn main?
    • Okay, rustdoc will scan ahead to see whether your doctest defines its own fn main before creating a new one. (This used to be text-based, but was updated last year to use libsyntax, after attempting to splice out enough doctest code that the parser would actually attempt to parse items instead of choking on crate-level attributes.)
    • However, this scan will completely skip over main functions that are generated by macros! (This is the impetus for this spot in the meeting - see rust-lang/rust#57415 and the PR it links to for details.)
  • What if your doctest needs to import things from your crate, and you're running in Rust 2015?
    • Okay, rustdoc will insert an extern crate my_crate; statement into the code before compiling it.
    • (There is also detection to make sure that the doctest doesn't already declare this - it happens at the same time as fn main detection.)
    • However, the import it does add doesn't import macros! You need to add the import yourself so you can decorate it with #[macro_use].
    • However, rustdoc does some line-by-line text parsing before the libsyntax-based loop, and it breaks if the #[macro_use] attribute and the extern crate statement are on separate lines! This can cause it to appear inside the generated fn main.

...and so on. These are all issues caused by the fact that rustdoc needs to modify the code in a doctest before it can compile, but it also needs to read through the doctest to ensure it doesn't perform the wrong modifications.
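The first failure mode in the list above can be reproduced with a toy rewriter. This is not rustdoc's real code: it only mimics the strategy of hoisting lines that start with `#![` and wrapping everything else in `fn main`. The function name is made up.

```rust
/// Toy line-based doctest rewrite: hoist crate-level attribute lines, then
/// wrap the remaining lines in `fn main`. Assumes (wrongly, on purpose) that
/// every attribute fits on a single line.
pub fn wrap_line_based(doctest: &str) -> String {
    let mut crate_attrs = Vec::new();
    let mut body = Vec::new();

    for line in doctest.lines() {
        if line.trim_start().starts_with("#![") {
            crate_attrs.push(line);
        } else {
            body.push(line);
        }
    }

    let mut out = String::new();
    for attr in &crate_attrs {
        out.push_str(attr);
        out.push('\n');
    }
    out.push_str("fn main() {\n");
    for line in &body {
        out.push_str(line);
        out.push('\n');
    }
    out.push_str("}\n");
    out
}
```

With a single-line `#![feature(some_feature)]` the attribute stays above `fn main` as intended. If the same attribute is split across three lines, only the first line is hoisted, and the rest of it ends up inside the generated function, which no longer compiles.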

The proposed solution

We would ideally like to run macro expansion on a doctest before checking it for fn main or extern crate my_crate statements - this allows crates to wrap their main function in a macro call, or generate it from a macro in the first place. However, the compiler driver API assumes a regular AST crate, which doesn't allow bare expressions in module scope. That prevents us from sending nearly all regular doctests through the compile process without some modification.

The idea that follows is a kind of special "doctest crate" that we can send through macro expansion, before transforming it into a proper AST crate and sending it on through the rest of the compilation process. The major issue is the way doctests are set up: Since they can contain regular items as well as loose statements, we need to be able to attempt to parse an item, then roll back the parser state if there's no item at the position, so we can switch modes to parse a statement instead. However, from what i can tell, the parser isn't set to look ahead in the stream for more than a few tokens, and the code to parse an arbitrary item is quite involved.
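The "try an item, roll back, then parse a statement" loop can be sketched with a toy cursor. This is not libsyntax and shares none of its API. The grammar is deliberately tiny (an item is `struct <Name> ;`, and anything else up to `;` is a statement) and all names are hypothetical.

```rust
/// What a toy "doctest crate" can contain: items or loose statements.
#[derive(Debug, PartialEq)]
pub enum Node {
    Item(String),
    Stmt(String),
}

/// Minimal token cursor whose position can be saved and restored.
pub struct Cursor<'a> {
    tokens: Vec<&'a str>,
    pos: usize,
}

impl<'a> Cursor<'a> {
    pub fn new(src: &'a str) -> Self {
        Cursor { tokens: src.split_whitespace().collect(), pos: 0 }
    }

    /// Returns the current token and advances past it.
    fn bump(&mut self) -> Option<&'a str> {
        let tok = self.tokens.get(self.pos).copied()?;
        self.pos += 1;
        Some(tok)
    }

    /// Item mode: `struct <Name> ;`. May consume tokens before failing.
    fn item_inner(&mut self) -> Option<Node> {
        if self.bump()? != "struct" {
            return None;
        }
        let name = self.bump()?;
        if self.bump()? != ";" {
            return None;
        }
        Some(Node::Item(name.to_string()))
    }

    /// Attempts item mode and restores the saved position on failure, so
    /// the caller can retry the same tokens in statement mode.
    fn try_item(&mut self) -> Option<Node> {
        let checkpoint = self.pos;
        let parsed = self.item_inner();
        if parsed.is_none() {
            self.pos = checkpoint;
        }
        parsed
    }

    /// Statement mode: everything up to the next `;` is one statement.
    fn stmt(&mut self) -> Node {
        let mut words = Vec::new();
        while let Some(tok) = self.bump() {
            if tok == ";" {
                break;
            }
            words.push(tok);
        }
        Node::Stmt(words.join(" "))
    }

    pub fn parse_all(&mut self) -> Vec<Node> {
        let mut nodes = Vec::new();
        while self.pos < self.tokens.len() {
            let node = match self.try_item() {
                Some(item) => item,
                None => self.stmt(),
            };
            nodes.push(node);
        }
        nodes
    }
}
```

Rolling back is trivial here because the whole token list is in memory and the only state is an index. The difficulty described above is that the real parser carries much more state than a position, so a checkpoint of this kind is not readily available.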

Rustdoc JSON support

One possible feature the Rustdoc Team could focus on this year is the addition of a machine-readable JSON output for docs, instead of only having the current HTML output. The most likely avenue for this is extending the save-analysis output to include the information that rustdoc emits but save-analysis currently does not. There's not a specific request right now, but we would need to coordinate with whoever knows save-analysis the most (+ RLS people?) to see the best way to make the new information work in the existing structure.

A few months ago, i tried to start this discussion, and it led me to the conclusion of extending save-analysis instead of rolling our own: https://internals.rust-lang.org/t/design-discussion-json-output-for-rustdoc/8271

static-filez

This isn't a compiler thing, but more of a cargo/rustup thing. Another possible focus for the Rustdoc Team this year is extending our existing HTML output with a new compressed archive output option that works with a utility like killercup's static-filez, to reduce the number of files rustdoc emits and potentially speed up the documentation process. This portion is probably best served by a small conversation between the team, killercup, and a few cargo/rustup people.

For more information about this plan, see this comment on the 2019 rustdoc roadmap thread: rust-dev-tools/dev-tools-team#41 (comment)
