@sharwell
Last active May 11, 2024 15:22
Documentation comments revised

Overview

Markdown documentation comments are a backwards-compatible replacement for XML documentation comments.

  • If the first non-whitespace character of the comment is <, it is treated as an XML documentation comment
  • Otherwise, the comment is treated as a Markdown documentation comment

Unlike XML documentation, Markdown comments are allowed anywhere a line or block comment is allowed in the language.
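
As a rough illustration of the detection rule above, the following sketch places an XML comment and two Markdown comments side by side. The class and member names are made up for this example, and the comment inside the method body only has documentation meaning under the proposal; current compilers would at most warn about its placement.

```csharp
public static class Samples
{
    /// <summary>Begins with an angle bracket, so it is treated as an XML documentation comment.</summary>
    public static void XmlDocumented() { }

    /// Begins with a letter, so under the proposal it would be treated as a
    /// Markdown documentation comment. Names such as `MarkdownDocumented` use backticks.
    public static void MarkdownDocumented() { }

    public static void Body()
    {
        /// A Markdown documentation comment placed where an ordinary line comment is
        /// allowed; the proposal permits this, XML documentation rules do not.
        System.Console.WriteLine("done");
    }
}
```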

🔗 dotnet/csharplang#891

Language and Compiler

Language changes

While XML documentation files will remain the standard for shipping documentation with assemblies, the language will relax its rules surrounding the form for these comments in code.

  1. The behavior of a documentation comment whose first non-whitespace character is not < is implementation-defined.
  2. The behavior of a documentation comment not placed on a type or member is implementation-defined.
  3. Documentation comments are allowed to contain arbitrary valid XML. In addition to the elements defined in earlier versions of the C# language, documentation rendering tools are encouraged to support the following elements:
    • <em>
    • <strong>
    • <inheritdoc>
    • <a href="">
    • <see href="">
    • <see langword="keyword">

Compiler changes

The compiler translates documentation comments for exposed types and members to XML during the build. For new documentation comments, the compiler delegates the translation to a documentation analyzer, which is responsible for:

  1. Translation of documentation comments to XML form for inclusion in compiler outputs
  2. Analysis of documentation comments for any diagnostics

The compiler provides a default documentation analyzer which handles XML documentation comments. It may provide a minimal documentation analyzer for non-XML documentation comments based on a minimal CommonMark behavior, which is used if no other documentation analyzer is provided.
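
To make the translation step more concrete, here is a minimal sketch of what a very small Markdown-to-XML translation could look like. It is not the proposed analyzer API: `MarkdownToXmlSketch`, `Translate`, and the member ID are hypothetical, and the sketch only applies the implicit-break rule from the Sections part of this document (first paragraph becomes the summary, later paragraphs become remarks) with no real CommonMark parsing.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class MarkdownToXmlSketch
{
    // Hypothetical helper: splits the comment text into paragraphs at blank lines,
    // maps the first paragraph to <summary> and any remaining ones to <remarks>.
    public static XElement Translate(string memberId, string markdown)
    {
        string[] paragraphs = markdown
            .Split(new[] { "\n\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .ToArray();

        var member = new XElement("member", new XAttribute("name", memberId));
        if (paragraphs.Length > 0)
        {
            member.Add(new XElement("summary", paragraphs[0]));
        }
        if (paragraphs.Length > 1)
        {
            // Each remaining paragraph becomes a <para> inside <remarks>.
            member.Add(new XElement("remarks",
                paragraphs.Skip(1).Select(p => new XElement("para", p))));
        }
        return member;
    }

    public static void Main()
    {
        // XElement escapes the '<' in the second paragraph when the XML is written out.
        XElement xml = Translate("M:Demo.Clamp(System.Int32)",
            "Clamps a value.\n\nValues < 0 become 0.");
        Console.WriteLine(xml);
    }
}
```

Building the output with `XElement` rather than string concatenation leaves entity escaping to the XML library, which is one way to keep the generated documentation file well-formed.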

IDE extensibility

Documentation analyzers interact at a low level with the compiler. The documentation analyzer specifies a content type for the documentation contents, which IDEs may use to provide a default editing experience. A separate documentation presenter can be provided which interacts with higher-level IDE features. It is responsible for:

  • Classification
  • Find references
  • Get symbol info (determine the symbol(s) referenced by a specific location within the comment)
  • Complexification and simplification
  • Rename

Sections

Sections are treated as extensions to the thematic breaks behavior of CommonMark.

@summary

The first section of a comment is the summary. This section may optionally start with a @summary thematic break. The @summary element typically does not need to be specified explicitly. However, a user may want to include it for one of the following reasons:

  • The content of the summary section starts with a < character, which would otherwise cause the compiler to treat the comment as an XML documentation comment.
  • The content of the summary section includes more than one paragraph, and the comment does not include a remarks section.

@remarks

The remarks section starts following the @remarks thematic break.

Other sections

🚧 Other sections may be supported by this design. Possible approaches include:

  1. Restrict the sections to @summary and @remarks
  2. Allow additional sections, but restrict the set to an allowlist
  3. Allow any section in the form ^\s*@\w+\s*$, or perhaps a more restricted form focusing on identifiers
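
The pattern in option 3 can be tried directly. The sketch below assumes each comment line has already had its `///` prefix removed; `SectionBreakDemo` and `IsSectionBreak` are illustrative names, not part of the proposal.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SectionBreakDemo
{
    // Pattern quoted from option 3 above.
    private static readonly Regex SectionBreak = new Regex(@"^\s*@\w+\s*$");

    // Assumes the "///" prefix has already been stripped from the line.
    public static bool IsSectionBreak(string commentLine) => SectionBreak.IsMatch(commentLine);

    public static void Main()
    {
        string[] lines = { "@remarks", "  @exceptions  ", "@param name", "See @remarks", "name:" };
        foreach (string line in lines)
        {
            Console.WriteLine($"{line,-18} -> {IsSectionBreak(line)}");
        }
    }
}
```

Only the first two lines match. `@param name` does not, so a parameter delimiter of that shape would not be mistaken for a section break under this pattern.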

Implicit breaks

If the @summary and @remarks thematic breaks are omitted, a @remarks thematic break is implicitly added immediately following the first paragraph of the summary section.
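
A sketch of the explicit and implicit forms, assuming the thematic breaks sit on their own comment lines (the `ICache` interface is made up for the example):

```csharp
public interface ICache
{
    /// @summary
    /// Removes every entry from the cache.
    ///
    /// Entries that are currently in use are removed as well.
    /// @remarks
    /// This call blocks until pending writes complete.
    void Clear();

    /// Removes one entry from the cache.
    ///
    /// No breaks are written here, so this second paragraph would fall
    /// into the implicitly added remarks section.
    void Remove(string key);
}
```

In `Clear`, the explicit `@summary` break is what keeps the second paragraph in the summary; without it, the implicit rule would move that paragraph into the remarks.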

Parameters

Parameters are defined using an extension to the list syntax.

🚧 The delimiter syntax is not finalized for this, but may look like one of the following:

  • name:
  • @param name
  • @name

The documentation for a parameter follows the list delimiter under the same rules as bulleted or numbered lists.

Type parameters

Type parameters would be documented in a manner similar to parameters.

🚧 The name portion of the delimiter syntax is not finalized, but could be either T or <T> for a type parameter T.

Return values

Return values would be documented in the same list as parameters and/or type parameters.

🚧 The delimiter syntax is not finalized, but could be one of the following:

  • return:
  • returns:
  • @return
  • @returns
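
Since none of these delimiters is finalized, the following only sketches how two of the candidates might read on real members: `name:` with `return:` on the first method, `@param name` with `@returns` on the second. The `Conversions` class is hypothetical, and writing type parameters as bare `TSource` is just one of the two options mentioned above.

```csharp
using System;
using System.Collections.Generic;

public static class Conversions
{
    /// Converts each element of a sequence.
    ///
    /// TSource: The element type of `source`
    /// TResult: The element type of the returned sequence
    /// source: The sequence to convert
    /// selector: The conversion applied to each element
    /// return: The converted elements, in source order
    public static IEnumerable<TResult> ConvertAll<TSource, TResult>(
        IEnumerable<TSource> source, Func<TSource, TResult> selector)
    {
        foreach (TSource item in source)
        {
            yield return selector(item);
        }
    }

    /// Clamps a value to a range.
    ///
    /// @param value The value to clamp
    /// @param min The smallest allowed result
    /// @param max The largest allowed result
    /// @returns `value` limited to the range from `min` to `max`
    public static int Clamp(int value, int min, int max) => Math.Max(min, Math.Min(max, value));
}
```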

Tuple elements

Tuple elements may be documented in the same manner as parameters, appearing as a nested list under the item whose type is a tuple.

/// point: The point to scale
///     x: The x-coordinate of the point to scale
///     y: The y-coordinate of the point to scale
/// scale: The amount by which to scale the point
/// return: The scaled point
///     x: The x-coordinate of the scaled point
///     y: The y-coordinate of the scaled point
(double x, double y) Scale((double x, double y) point, double scale);

Code and References

By default, code within a comment is validated. In their simplest forms, inline code and code blocks are treated as code in the same language as the containing document.

  • Inline code may be treated as "plain" code by using one more set of backticks than is necessary for escaping purposes.

    • `semantic`
    • ``"Semantic string with backtick (`)"``
    • ``plain``
    • ```plain backtick (`)```
  • Fenced code may be treated as "plain" code in the current language by including plain in the info string.

    ```
    // In a C# source file, this is treated as C# code and semantically validated
    void Method() { }
    ```
    ```csharp
    // This is semantically validated
    void Method() { }
    ```
    ```csharp plain
    // This is highlighted as C# code but not semantically validated
    void Method() { }
    ```

Resolving references

  • For comments not placed in a code block, resolve the comments from a pseudo-context "inside" the element (i.e. parameters resolve, then element name, then containers...)
  • For comments preceding a statement which can have child statements, resolve the comments from the beginning of the first child statement
  • For comments preceding a standalone statement, resolve the comments from the end of the statement
  • For comments at the end of a code block, resolve from the current location
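
One reading of these four rules, sketched on a small made-up class. The `//` lines are annotations for this example only; the resolution behavior they describe is an interpretation of the rules above, not something the proposal spells out in this detail.

```csharp
public class Counter
{
    private int total;

    /// Adds `amount` to `total`.
    // Not in a code block: resolved from "inside" Add, so `amount` binds to the
    // parameter first, then the search moves outward to members such as `total`.
    public void Add(int amount)
    {
        /// Runs `amount` times; `i` counts the iterations.
        // Precedes a statement with a child statement: resolved from the start of
        // the loop body, where the loop variable `i` is in scope.
        for (int i = 0; i < amount; i++)
        {
            total++;
        }

        /// `snapshot` holds `total` after the loop.
        // Precedes a standalone statement: resolved from the end of the declaration,
        // so the newly declared local `snapshot` is visible.
        int snapshot = total;
        System.Console.WriteLine(snapshot);

        /// Both `total` and `snapshot` are final at this point.
        // End of the code block: resolved from the current location.
    }
}
```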
@daveaglick

I like this proposal a lot, particularly the use of thematic breaks.

I'm not in love with "plain" though - maybe "unvalidated" (a little longer but clearer, IMO)?

Another suggestion I have is to consider the use of MediaWiki [[...]] syntax for references (i.e., to handle <see>, <paramref>, etc.). That makes references more explicitly different from other code blocks. It follows that if simple references are specified using MediaWiki syntax, then perhaps inline code with backticks should be considered "plain"/"unvalidated" by default (with the option to turn on validation using the extra backtick). I would then continue to validate code fence blocks by default. This follows the pattern I've anecdotally noticed in my own code: <c> elements usually contain somewhat arbitrary and incomplete code while <code>, <example>, etc. blocks often contain semantically valid code.

@xoofx

xoofx commented Jan 8, 2019

@sharwell hey, I'm a huge proponent of this idea! Could you illustrate this proposal with a single but fairly complete example (comparing with XML) so that it can help us to get a better view on the final experience?

@sharwell
Author

sharwell commented Jan 8, 2019

@xoofx The following is a possible translation of https://github.com/tunnelvisionlabs/dotnet-threading/blob/3e99a9d13476a1e8224d81f282f3cedad143c1bc/Rackspace.Threading/TaskBlocks.cs#L16-L68:

/// Provides support for resource cleanup in asynchronous code where the `async`/`await`
/// keywords are not available.
///
/// This code implements support for the following construct without requiring the use of
/// `async`/`await`.
///
/// ```
/// using (IDisposable disposable = await resource().ConfigureAwait(false))
/// {
///     return await body(disposable).ConfigureAwait(false);
/// }
/// ```
///
/// This method expands on the `using` statement provided by C# by implementing support for
/// `IAsyncDisposable` as described in [IAsyncDisposable, using statements, and async/await](https://github.com/dotnet/roslyn/issues/114).
///
/// If the `resource` function throws an exception, or if it returns `null`,
/// or if the `Task<TResult>` it returns does not complete successfully, the resource will not be
/// acquired by this method. In either of these situations the caller is responsible for ensuring the
/// `resource` function cleans up any resources it creates.
///
/// The following example asynchronously acquires a resource by calling the user method ``AcquireResourceAsync``.
/// The resource will be disposed after the body executes, prior to returning the result of the body.
///
/// ```
/// public Task<string> UsingWithResult()
/// {
///     return TaskBlocks.Using(
///         () => AcquireResourceAsync(),
///         task => task.Result.ReadToEndAsync());
/// }
///
/// private Task<StringReader> AcquireResourceAsync()
/// {
///     // this would generally contain an asynchronous call
///     return CompletedTask.FromResult(new StringReader("Text to read"));
/// }
/// ```
///
/// For reference, the following example demonstrates a (nearly) equivalent implementation of this behavior using
/// the `async`/`await` operators.
///
/// ```
/// public async Task<string> UsingWithResultAsyncAwait()
/// {
///     using (StringReader resource = await AcquireResourceAsyncAwait())
///     {
///         return await resource.ReadToEndAsync();
///     }
/// }
///
/// private async Task<StringReader> AcquireResourceAsyncAwait()
/// {
///     // this would generally contain an asynchronous call
///     return new StringReader("Text to read");
/// }
/// ```
///
/// TResource: The type of resource used within the task and disposed of afterwards.
/// TResult: The type of the result produced by the continuation `Task<TResult>`.
///
/// resource: A function which acquires the resource used during the execution of the task.
/// body: The continuation function which provides the `Task<TResult>` which acts as the body of the `using` block.
///
/// return: A `Task` representing the asynchronous operation. When the task completes successfully,
///     the `Task<TResult>.Result` property will contain the result provided by the
///     `Task<TResult>.Result` property of the task returned from `body`.
///
/// @exceptions
///
/// ArgumentNullException:
/// * If `resource` is `null`.
/// * If `body` is `null`.
/// InvalidOperationException:
/// * If `resource` returns `null`.

📝 Notes:

  • There is no equivalent for <note type="caller"> in this proposal. I inlined the paragraph without the annotation.
  • The proposal does not describe exception documentation, so I translated it according to the following rules:
    • I treated @exceptions as a thematic break
    • I indicated the exception type in the same way type parameters and parameters are marked
    • For exception conditions separated by -or-, I used a bulleted list

@xoofx

xoofx commented Jan 8, 2019

Thanks for the sample!

We should strive to try to use -abuse- standard Markdown constructions so that we can leverage the existing infrastructure a lot more efficiently. For example, for the exceptions, I would try more something like this:

# exceptions

- ArgumentNullException:
   - If `resource` is `null`
   - If `body` is `null`
- InvalidOperationException: if `resource` is `null`

The advantage is that:

  1. you leverage the existing markdown syntax. Paste this into a normal document and it will highlight immediately without a special parser
  2. parsing-wise, it is very easy to recover the bits without even implementing dedicated parser extensions

Same could apply to the example section, or a remarks section

@xoofx

xoofx commented Jan 8, 2019

For parameters, same:

# params

- TResource: The type of resource used within the task and disposed of afterwards.
- TResult: The type of the result produced by the continuation `Task<TResult>`.

- resource: A function which acquires the resource used during the execution of the task.
- body: The continuation function which provides the `Task<TResult>` which acts as the body of the `using` block.
- return: A `Task` representing the asynchronous operation. When the task completes successfully,
    the `Task<TResult>.Result` property will contain the result provided by the
    `Task<TResult>.Result` property of the task returned from `body`.

It requires a bit of explicit list items, but reusing this same comment in or from a markdown document would be super straightforward. I could type my comments in a markdown editor, and copy them to a comment block.

@xoofx

xoofx commented Jan 8, 2019

There is no equivalent for <note type="caller"> in this proposal. I inlined the paragraph without the annotation.

you should also check what docfx has been doing in that matter on some extensions that you might be able to reuse (Example: warning, note sections https://dotnet.github.io/docfx/spec/docfx_flavored_markdown.html?tabs=tabid-1%2Ctabid-a#note-warningtipimportant ). It could be good to leverage some of their syntax to make sure their parser will be able to work with our markdown code comments

I'm not necessarily a fan of their syntax, but as they might end-up processing these comments directly, we should probably make sure that it will integrate well with it.

@mstum

mstum commented Jan 8, 2019

I like this! I'm wondering if it should be specified that <GenerateDocumentationFile> still creates an XML File for compatibility reasons with existing document-html-generators? During compilation, the markdown could be translated to XML (e.g., entity replacement < => &lt;), replacing code fences with <code> or <pre> elements, etc. etc. etc.

I would also propose an @example section, or rather the ability to add multiple ones. Like .EXAMPLE in PowerShell. That would also allow e.g. Visual Studio to fold them by default to reduce clutter and only show the summary and/or remarks by default.

Oh, and while I'm scope creeping: I wonder if we can officially have documentation support on Namespaces and for the Assembly as a whole? Like how the official docs have a summary at the top and a remarks section at the bottom. That's a suggestion for whatever project generates the XML file - would that be Roslyn?

@sharwell
Author

sharwell commented Jan 8, 2019

Another suggestion I have is to consider the use of MediaWiki [[...]] syntax for references (i.e., to handle <see>, <paramref>, etc.). That makes references more explicitly different from other code blocks.

@daveaglick Note that one of the main features of this proposal is that the simplest way to reference things is intended to be the most correct. In other words, I want to make sure that `CodeElement` produces a reference to CodeElement that works with Find All References, Rename, and other IDE features. I want users to need to take extra steps to produce a span of code which isn't treated as linked.

@sharwell
Author

sharwell commented Jan 9, 2019

We should strive to try to use -abuse- standard Markdown constructions so that we can leverage the existing infrastructure a lot more efficiently. For example, for the exceptions, I would try more something like this:

@xoofx My main concern here is the top-level bulleted list is not especially obvious. Also, and even though my example doesn't address the problem, I'd like to avoid the need to use headings for cases where users stick to the most common subset of information:

  1. Summary
  2. Remarks (second and additional paragraphs), including simple example code blocks appearing in the remarks
  3. Parameters and return value
  4. Exceptions

you should also check what docfx has been doing in that matter on some extensions that you might be able to reuse (Example: warning, note sections https://dotnet.github.io/docfx/spec/docfx_flavored_markdown.html?tabs=tabid-1%2Ctabid-a#note-warningtipimportant ).

The note/warning syntax could likely be informally adopted without actually including it in the specification for the comments themselves. It's pretty clean for the subset of users who expect that to work.

During compilation, the markdown could be translated to XML

@mstum Yes, I was thinking this would occur for documentation on exposed code elements.

I would also propose an @example section, or rather the ability to add multiple ones.

I've written quite a few documentation examples, and I'm still not convinced we need a specific section for this. For example, it would be easy to provide this by simply adding a # Examples heading within the remarks section. For now I'm focusing on other areas that aren't as easily resolved.

Oh, and while I'm scope creeping: I wonder if we can officially have documentation support on Namespaces and for the Assembly as a whole?

This proposal is much more relaxed than the current language regarding comment placement. It should be straightforward to address this.

@mosra

mosra commented Jan 10, 2019

Background: I'm the author of the m.css content authoring/documentation framework, which, among other things, contains a Doxygen-based tool for documenting C++ projects (example). I'm now creating a similar tool for C#, based on the XMLDoc output.

Why I'm commenting: I have some experience with both writing and parsing JavaDoc/Doxygen syntax and I think some of my insights could be useful to you. Mainly to avoid repeating the same mistakes Doxygen did :) Please note that the below experience was made when documenting C++ code, but it equally applies to C# as well as it's mainly about the doc block syntax.

The proposal above reminds me a lot of what Markdown-enabled Doxygen looks like, which immediately raises the question of making the syntax compatible with Doxygen. An argument for doing that would be to make it easier for users (no new syntax to learn); however, there are a lot of counter-arguments:

  • The syntax has a few really nasty corner cases. One of them is the @ref command, which is used to reference symbols. The argument to @ref is "anything in the following text as long as it looks like a symbol", which leads to a very complicated parser implementation. In your proposal above, you wrap the reference in backticks, which makes much more sense.

  • Doxygen has various notions of specially-styled blocks -- @note, @warning etc., which for example put the next paragraph in a yellow box to make it more visible. This is a very useful feature to have (I don't see it in the above proposal yet), but the problem is that it's limited to a single paragraph. Often it's desired to have more paragraphs in a @note and, in order to support that, Doxygen had to implement a complex handling on the parser side that merges adjacent @notes into a single block.

  • Similarly to @notes, general nesting of block-level elements is problematic. Markdown, as I see it, was not designed for complex layout capabilities, and in order to do more complex things users often have to resort to writing plain HTML inside (which in turn means most Markdown parsers have to implement HTML parsing at some point as well). The usual cases I'm hitting very often are:

    • a code block or a @note in a list (have to use HTML <ul> to achieve correct nesting)
    • code blocks, lists or generally anything more complex than plain text inside tables (again I have to use HTML <table> to make my way around the limitations)

To give an example, a real-world case of a more complex documentation layout could be this: Magnum::Animation::Easing -- it combines a responsive table-like layout containing embedded SVG images, together with math rendering and custom styled elements. While not absolutely essential, having such options at hand when writing docs is what makes the difference.

So, knowing the limitations of the Markdown/Doxygen syntax, an alternative idea I am now toying with is to use the same syntax as Sphinx has -- reStructuredText. The main syntactical difference for your above examples would be that it's :foo: instead of foo: / @foo, but the rest would stay mostly the same -- references to symbols with backticks, inline code with double backticks. The main advantages of this syntax are:

  • It's already used by Sphinx for Python projects (and reStructuredText alone is used by many for authoring web content in tools like Pelican), so the users can again reuse something well known (and it can open the possibilities for C# support in Sphinx or https://readthedocs.org)
  • paragraph nesting and other advanced layouts are not a problem, code-block in a note in a list in a table is a completely valid use case parsable in a completely unambiguous way
  • the syntax is well-documented and made to be easily extensible for new inline and block elements without resorting to hacks (so you could introduce a builtin .. exceptions:: directive, for example)
  • since the syntax is standard and there are many parsers for it, it'll be easy to add support for this to 3rd party tools without forcing them to implement their own modified parsers

To visualize how this could look, here's an example taken from this tweet (sorry, it's an image, don't have the original code anymore) --- again the particular code here is C++, using /** */-style comments, but the doc block syntax is language-agnostic. Resulting rendered docs, for comparison, are here.

[Image: sphinx]

I hope this rather lengthy comment is of some use to this discussion :)

@CyrusNajmabadi

@sharwell Definitely curious about how you would like symbol-references to be encoded. Do you have thoughts on that?

@CyrusNajmabadi

Definitely curious on more details about things like:

```csharp
// This is semantically validated
void Method() { }
```

Even syntactic validation is tricky given the desire to write potentially arbitrary code, without clear indications about what scope that code would be contained in. For example, i could easily see someone writing code that would only be valid at the namespace/type level, or only valid inside a type, or only valid as a statement, or only valid as an expression. Both syntactic and semantic validation are def tricky here.

If there is to be validation, i would suggest something like csharp=statement. If the scope wasn't provided, perhaps the IDE would make some reasonable efforts to try to figure out what was going on, but i wouldn't then validate.

Thoughts?

--

Note: i really like this proposal :) Not trying to poke holes, just trying to start good discussions on thorny problems!

@nanoant

nanoant commented Nov 20, 2019

@sharwell Hi Sam, is there a chance to push this forward? E.g. having this spec RFC moved to the C# Lang issue list so it can be referenced by dotnet/csharplang#2394 and tracked? I am afraid that if there's not enough pressure we're going to end up without any viable alternative to XML comments.
And to be honest the example you're showing https://github.com/tunnelvisionlabs/dotnet-threading/blob/3e99a9d13476a1e8224d81f282f3cedad143c1bc/Rackspace.Threading/TaskBlocks.cs#L16-L68 clearly demonstrates what an unreadable eyesore XML comments are.


@CyrusNajmabadi

A separate documentation presenter can be provided which interacts with higher-level IDE features. It is responsible for:

Note: this sounds very similar to the IEmbeddedLanguage system i built for embedded json/regex literals. When you get to this part, i would both like to be part of the discussion, and i think it would be good if we could evaluate how we might be able to build a system where we have one single concept here instead of multiple similar concepts.

Note: a recent thought i've been having here is that all of these areas should simply be represented as (Contained)?Documents. Documents already have a defined and understandable way to get at structure and to expose services. And we already have the concept of embedded documents in the ASP/razor system. The only real difference we need is:

  1. arbitrary nesting levels. We would expect to potentially have a 'markdown' doc embedded in a C# doc. Then we would expect to potentially have 'semantically inert c#' docs embedded in the 'markdown doc'.
  2. an appropriate registration/discovery system for the language processors here.
  3. a system to load/embed processing of the different sections to the different processors. Note that this would have to be collaborative. i.e. the markdown-provider would be the one that would have to figure out which sections of itself would then have to load and be processed by a different language processor.

--

The system i built was effectively this. Though i didn't reuse the Document abstraction as i was worried too much about the potential size impact on the rest of the system. I intentionally tried to keep things explicitly separated for simplicity. However, i would want to not do that in the future if this is a first-class concept in the workspace and presentation models.
