@mhelsley
Created April 5, 2023 05:17

RFC: Enable Grouping of Code Blocks in CommonMark

Abstract

It may be useful to group multiple code blocks while interspersing non-code sections (e.g. step-by-step explanations) between them.

Background and Motivation

Some tools process code blocks to test them and ensure they work.

With larger code blocks, it is sometimes helpful to intersperse explanatory documentation within the code.

In plain documentation this is possible simply by ending the code block, writing the explanation, and starting a new code block.

However, this loses potentially useful information: that these adjacent code blocks, with explanations interspersed between them, form a coherent group.

This information could be useful to documentation tools and to the testing tools that use CommonMark (or minor dialects of CommonMark) to specify 'doctests' (e.g. Rust's rustdoc).

Alternatives Considered

Not Adopting this RFC

A tool could implement a dialect of CommonMark. This is likely the best alternative to adopting the RFC. However, if the community agrees on its utility, adopting the RFC may enable more interoperability between tools than fragmenting into dialects would.

Solutions Involving Parsing Deeper Into Code Blocks

Since the code's language syntax would have to be compatible with any proposed way of reaching inside the code block, this approach would require an extensive survey of the possible languages and their many combinations of corner cases. Such a survey would be complicated and highly error-prone, and would likely result in a complex array of restrictions on which future languages could be supported and on what code could appear in code blocks. Those restrictions would be a substantial burden on folks writing in CommonMark.

Some existing dialects, for example rustdoc's, augment code block processing so that certain lines are compiled and tested but hidden from the rendered documentation (in rustdoc, lines beginning with "# ").

However, using this to intersperse documentation results in a large amount of textual duplication. The recommended approach is to "comment out" the preceding and following code blocks, so if there are N interspersed explanations, every block of code is essentially repeated N+1 times. This creates N opportunities to introduce subtle differences that let the documentation example appear functional even though the code will not work as expected if the reader simply copies and pastes it.

So it would be better to stick as close to the existing code block delimiting lexing/grammar as possible.

Semi-unbalanced Delimiters Denoting Groups of Code Blocks

Alternatively, CommonMark could adopt a method that is useful in every context where CommonMark content is useful, since grouping code blocks could also help readers, documentation processors, etc.

To delineate code block groupings the idea is simple:

  1. All code block delimiters have at least three backticks
  2. To mark the first code block in a group, extra backticks (e.g. 2, for a total of five) are added to the delimiter that opens that block.
  3. Each code block other than the last in a group is ended with a three-backtick delimiter.
  4. To end the last code block in a group, the terminating delimiter must have the same number of backticks as the delimiter that opened the first code block in the group (in this example, 5: the three required backticks plus the 2 extra).
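These four rules are compact enough to sketch as a small state machine. The Rust below is a hypothetical illustration, not part of the proposal: the names fence, State, and code_block_groups are invented, and it simplifies by recognizing only backtick fences at column 0, requiring exactly three backticks to close an ordinary block, and ignoring nesting. Each returned inner Vec holds the code of one group in order; a plain block is a group of one.

```rust
/// If `line` starts with at least three backticks, return the backtick
/// count and the remainder of the line (the info string, if any).
fn fence(line: &str) -> Option<(usize, &str)> {
    let n = line.chars().take_while(|&c| c == '`').count();
    if n >= 3 {
        Some((n, &line[n..])) // backticks are ASCII, so `n` is a valid byte index
    } else {
        None
    }
}

#[derive(Clone, Copy)]
enum State {
    /// Not inside any code block or group.
    Outside,
    /// Inside a code block; `group_len` is the backtick count of the group's
    /// opening delimiter, or `None` for a plain block.
    InBlock { group_len: Option<usize> },
    /// Between two code blocks of the same group (explanatory text).
    BetweenBlocks { group_len: usize },
}

/// Hypothetical helper: split `doc` into groups of code blocks following the
/// proposed delimiter rules (no nesting, backtick fences at column 0 only).
fn code_block_groups(doc: &str) -> Vec<Vec<String>> {
    let mut groups = Vec::new();
    let mut group: Vec<String> = Vec::new();
    let mut block = String::new();
    let mut state = State::Outside;

    for line in doc.lines() {
        match (state, fence(line)) {
            // Three backticks open a plain block; more open a group.
            (State::Outside, Some((n, _))) => {
                state = State::InBlock { group_len: if n > 3 { Some(n) } else { None } };
            }
            (State::Outside, None) => {} // ordinary prose
            // A closer matching the group's opening length ends the last
            // block and the whole group.
            (State::InBlock { group_len: Some(g) }, Some((n, ""))) if n == g => {
                group.push(std::mem::take(&mut block));
                groups.push(std::mem::take(&mut group));
                state = State::Outside;
            }
            // Three backticks end the current block only.
            (State::InBlock { group_len }, Some((3, ""))) => {
                group.push(std::mem::take(&mut block));
                state = match group_len {
                    Some(g) => State::BetweenBlocks { group_len: g },
                    None => {
                        groups.push(std::mem::take(&mut group));
                        State::Outside
                    }
                };
            }
            // Any other line inside a block is code.
            (State::InBlock { .. }, _) => {
                block.push_str(line);
                block.push('\n');
            }
            // Three backticks open the next block of the current group.
            (State::BetweenBlocks { group_len }, Some((3, _))) => {
                state = State::InBlock { group_len: Some(group_len) };
            }
            (State::BetweenBlocks { .. }, _) => {} // explanatory text
        }
    }
    groups
}
```

Concatenating the strings of one inner Vec gives a testing tool the complete program for a grouped example, while the explanatory text between the blocks is skipped.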

Further Enhancement: Nested Groups of Code Blocks

  1. Groups of code blocks may be nested.
  2. To avoid ambiguity, groups must nest properly: a child group's delimiters must have strictly more backticks than those of its parent group.
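The RFC does not spell out how a parser tells a nested group's opening delimiter from a closing one, so the following check rests on an assumed interpretation: a delimiter with the same backtick count as the innermost open group closes it, a longer one opens a child group, and a shorter one is a nesting error. It is a hypothetical sketch (groups_nest_properly is an invented name) that works on the backtick counts of group delimiters already extracted in document order.

```rust
/// Hypothetical check for the nesting rule. `delims` lists the backtick
/// counts of group delimiters (more than three backticks) in document order.
/// Assumed interpretation: equal to the innermost open group closes it,
/// longer opens a child group, shorter is an error.
fn groups_nest_properly(delims: &[usize]) -> Result<(), String> {
    let mut open: Vec<usize> = Vec::new(); // stack of open group lengths
    for (i, &n) in delims.iter().enumerate() {
        if n <= 3 {
            return Err(format!("delimiter {i} has {n} backticks; groups need more than 3"));
        }
        match open.last().copied() {
            Some(top) if n == top => {
                open.pop(); // closes the innermost group
            }
            Some(top) if n < top => {
                return Err(format!(
                    "delimiter {i} ({n} backticks) is shorter than the innermost open group ({top} backticks)"
                ));
            }
            // No open group, or strictly longer than the parent: open a child.
            _ => open.push(n),
        }
    }
    match open.last() {
        Some(top) => Err(format!("group opened with {top} backticks is never closed")),
        None => Ok(()),
    }
}
```

For example, the sequence 5, 7, 7, 5 (a 7-backtick group inside a 5-backtick group) passes, while 5, 7, 5 fails because the outer group would close while the child is still open.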

Limiting Impact

Aiding Parser Implementations

Depending on whether it makes parsing easier, it may be useful to add the following rule to this RFC:

  1. The total number of backticks in a code block group delimiter must not be a multiple of 3.

This could make it easier for parsers to tell the difference between plain code blocks and groups of code blocks.
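With that rule, a parser could classify an opening fence from its backtick count alone. A minimal sketch, assuming (the RFC does not say this explicitly) that fences whose length is a multiple of 3 keep their ordinary CommonMark meaning as plain code blocks; FenceKind and classify_fence are hypothetical names.

```rust
/// How an opening backtick fence would be interpreted under the optional
/// "not a multiple of 3" rule (hypothetical names).
#[derive(Debug, PartialEq)]
enum FenceKind {
    /// Fewer than three backticks: not a code fence at all.
    NotAFence,
    /// A multiple of three (3, 6, 9, ...): an ordinary code block.
    Plain,
    /// Any other length of at least four (4, 5, 7, 8, ...): a group delimiter.
    Group,
}

fn classify_fence(backticks: usize) -> FenceKind {
    if backticks < 3 {
        FenceKind::NotAFence
    } else if backticks % 3 == 0 {
        FenceKind::Plain
    } else {
        FenceKind::Group
    }
}
```

Under this sketch the five-backtick delimiter used in the example below is a group delimiter, while a six-backtick fence stays a plain code block.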

Future Compatibility

It may also guard against unforeseen compatibility issues with existing CommonMark tooling and CommonMark documents. In the future "must" could change to "should", or the requirement may be dropped if, after a sufficient period of time, no compatibility issues arise when some folks (inevitably) forget or ignore this rule.

An Illustrative Example

NOTE: See raw text of this RFC rather than the rendered result.
TODO: Use CommonMark to make the example render properly without needing to view the raw text.

~~~
`````
fn foo() -> bar {}
```

explanatory text interspersed between adjacent related code
blocks

```
fn baz() {}
`````
~~~

This can be especially useful for tools that extract and run extended examples that are most clear when interspersed with step-by-step explanations.
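The rendering problem mentioned in the NOTE above follows from an existing CommonMark rule: a closing code fence must be at least as long as the opening fence. A parser unaware of this RFC therefore renders a whole group as a single code block whose content includes the inner three-backtick lines and the explanatory text. The sketch below models only that existing rule, under simplifying assumptions (backtick fences at column 0, no indentation, no tilde fences); legacy_code_blocks is a hypothetical name.

```rust
/// Simplified model of how an existing CommonMark parser finds fenced code
/// blocks: backtick fences at column 0 only (no indentation, no tildes).
/// A closing fence needs at least as many backticks as the opening fence and
/// may be followed only by spaces or tabs.
fn legacy_code_blocks(doc: &str) -> Vec<String> {
    let mut blocks = Vec::new();
    let mut open: Option<usize> = None; // length of the fence that opened the current block
    let mut body = String::new();

    for line in doc.lines() {
        let n = line.chars().take_while(|&c| c == '`').count();
        let rest = &line[n..];
        match open {
            None if n >= 3 => open = Some(n),
            None => {} // prose outside any block
            Some(len) if n >= len && rest.chars().all(|c| c == ' ' || c == '\t') => {
                blocks.push(std::mem::take(&mut body));
                open = None;
            }
            // Shorter fences, such as the inner three-backtick lines of a
            // group, are treated as ordinary content.
            Some(_) => {
                body.push_str(line);
                body.push('\n');
            }
        }
    }
    // An unclosed block runs to the end of the document.
    if open.is_some() {
        blocks.push(body);
    }
    blocks
}
```

Applied to the example above, this yields one block containing both functions, the inner delimiters, and the explanatory text, which matches how the example renders today.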
