@srawlins
Last active September 27, 2023 22:28
CommonMark-like spec text for tables

Motivation

Some very popular Markdown implementations include support for a table syntax (pandoc, PHP Markdown Extra, and GitHub-Flavored Markdown, to name a few). In essence, they support the following syntax:

Head cell | Another
----------|--------
Cell text | Another cell
More cells| below...
Inlines   | __allowed__

which renders as:

Head cell     Another
Cell text     Another cell
More cells    below...
Inlines       allowed

I am unaware of any great syntax definition for the above table syntax, and while writing support for tables in the Dart "markdown" package, I thought I'd write up a syntax definition, in the spirit of the CommonMark spec.

This spec is more useful to an author of a Markdown parser than it is to someone trying to just write some Markdown (in the same way that the CommonMark spec is more useful to Markdown parser authors). For a more terse explanation of how to write these Markdown tables, GitHub's text is very approachable.

Todo

The spec below is not 100% complete. In particular:

  • Need to spell out how whitespace around | characters is trimmed.
  • Need to spell out how tables must be separated from other block elements by blank lines.
  • Need to see where exactly the rules below are different from the rules of GFM, PHP Markdown Extra, and pandoc.

Tables

A table consists of a table row, followed by a table head divider, followed by zero or more table rows. The result is a table, with a table head consisting of the parsed results of the first table row, and a table body consisting of the parsed results of all of the table rows that follow the table head divider. Cell alignment can be declared in the table head divider.

A table row consists of a line of text, containing at least one non-whitespace character, with no more than 3 spaces of indentation. The line of text must be one that, were it not followed by the table head divider, would be interpreted as part of a paragraph: it cannot be interpretable as a code fence, ATX header, block quote, horizontal rule, list item, or HTML block.

The contents of a table row are the results of a three-step process:

  1. A leading | character at the beginning of the line and a trailing | character at the end of the line, if present, are removed. They are allowed for source readability.
  2. The line is then parsed as Markdown inline content.
  3. Any textual content is then scanned for | characters. Every | character results in a table cell boundary.
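The three steps above can be sketched roughly as follows. This is a deliberately simplified illustration, not a full implementation: instead of running a real inline parser, it recognizes only backtick code spans as non-textual content, so a | inside a link destination would still be split incorrectly. The function name `split_row` and the trimming of whitespace around each cell are assumptions made for the example.

```python
import re

# Tokens: a run of backticks, a single pipe, or a run of anything else.
_TOKEN = re.compile(r"`+|\||[^`|]+")


def split_row(line):
    """Split one table row into cell source strings (simplified sketch)."""
    text = line.strip()
    # Step 1: drop one optional leading and one optional trailing pipe.
    if text.startswith("|"):
        text = text[1:]
    if text.endswith("|"):
        text = text[:-1]

    tokens = _TOKEN.findall(text)
    cells, current, i = [], [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("`"):
            # Steps 2-3 (simplified): a code span ends at the next backtick
            # run of the same length; pipes inside it are not boundaries.
            close = next(
                (j for j in range(i + 1, len(tokens)) if tokens[j] == tok),
                None,
            )
            if close is not None:
                current.extend(tokens[i:close + 1])
                i = close + 1
                continue
            # No closing run: the backticks are literal text.
        if tok == "|":
            # A pipe in textual content is a cell boundary.
            cells.append("".join(current).strip())
            current = []
        else:
            current.append(tok)
        i += 1
    cells.append("".join(current).strip())
    return cells
```

Each returned string is still Markdown source; a real parser would keep the parsed inline nodes rather than re-joining text.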

A table head divider consists of an optional opening | character, followed by a sequence of one or more table column markers, each separated from the next by a | character, followed by an optional closing | character, with no more than 3 spaces of indentation and any number of trailing spaces. The divider must contain at least one | character. The width and number of table column markers have no effect on the contents of the table (except in the declarations of alignment).

A table column marker consists of an optional opening : character, followed by a sequence of - characters, followed by an optional closing : character. The alignment of the contents of a column is defined by the presence of the opening and closing : characters:

  • No opening or closing : characters indicates no declared alignment.
  • An opening : character without a closing : character indicates a declared "left" alignment.
  • A closing : character without an opening : character indicates a declared "right" alignment.
  • Both an opening and a closing : character indicate a declared "center" alignment.
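The four cases above map directly onto a small function. This is a minimal sketch; the name `column_alignment` and the use of `None` for "no declared alignment" are choices made for the example.

```python
import re

# Optional opening ':', one or more '-', optional closing ':'.
_COLUMN_MARKER = re.compile(r"(:?)-+(:?)")


def column_alignment(marker):
    """Return 'left', 'right', 'center', or None for one column marker."""
    match = _COLUMN_MARKER.fullmatch(marker.strip())
    if match is None:
        raise ValueError(f"not a table column marker: {marker!r}")
    opening = bool(match.group(1))
    closing = bool(match.group(2))
    if opening and closing:
        return "center"
    if opening:
        return "left"
    if closing:
        return "right"
    return None  # no declared alignment
```

Applied to the markers `:---`, `:---:`, `---:`, and `-----`, it yields left, center, right, and no declared alignment respectively.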

Here is a simple example:

foo  | bar                              <table>
-----|----                              <thead>
some | text                             <tr><th>foo</th><th>bar</th></tr>
in   | cells                            </thead>
                                        <tbody>
                                        <tr><td>some</td><td>text</td></tr>
                                        <tr><td>in</td><td>cells</td></tr>
                                        </tbody>
                                        </table>
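
To illustrate how the pieces fit together, here is a small sketch that turns a head row, a divider, and any body rows into HTML in the same shape as the example above. The function name `render_table` is hypothetical, and the sketch makes simplifying assumptions: cells are split naively on every | (no inline parsing), cell text is not HTML-escaped, the divider is assumed to have been validated already, and alignment is ignored.

```python
def render_table(lines):
    """Render [head_row, divider, *body_rows] as HTML, one element per line."""

    def cells(line):
        # Naive split: ignores inline constructs such as code spans.
        text = line.strip()
        if text.startswith("|"):
            text = text[1:]
        if text.endswith("|"):
            text = text[:-1]
        return [cell.strip() for cell in text.split("|")]

    def row(line, tag):
        inner = "".join(f"<{tag}>{cell}</{tag}>" for cell in cells(line))
        return f"<tr>{inner}</tr>"

    # The divider only separates head from body in this sketch.
    head, _divider, *body = lines
    out = ["<table>", "<thead>", row(head, "th"), "</thead>", "<tbody>"]
    out += [row(line, "td") for line in body]
    out += ["</tbody>", "</table>"]
    return "\n".join(out)
```

Because each row is split on its own, rows with different numbers of cells simply produce different numbers of `<th>` or `<td>` elements.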

Wrapping | characters do not change the results:

| foo  | bar   |                        <table>
|------|-------|                        <thead>
| some | text  |                        <tr><th>foo</th><th>bar</th></tr>
                                        </thead>
                                        <tbody>
                                        <tr><td>some</td><td>text</td></tr>
                                        </tbody>
                                        </table>

The number of cells in each row can be variable:

| foo  |                                <table>
|------|------|                         <thead>
| some | text |                         <tr><th>foo</th></tr>
| in   | many | cells |                 </thead>
                                        <tbody>
                                        <tr><td>some</td><td>text</td></tr>
                                        <tr><td>in</td><td>many</td><td>cells</td></tr>
                                        </tbody>
                                        </table>

Each row is parsed as inline Markdown, and cell divisions can occur only in textual content:

| `foo` |                               <table>
|-------------|-------------------|     <thead>
| `foo | bar` | [link](weird|url) |     <tr><th><code>foo</code></th></tr>
                                        </thead>
                                        <tbody>
                                        <tr><td><code>foo | bar</code></td><td><a href="weird|url">link</a></td></tr>
                                        </tbody>
                                        </table>

Table bodies are not required:

| foo  | bar   |                        <table>
|------|-------|                        <thead>
                                        <tr><th>foo</th><th>bar</th></tr>
                                        </thead>
                                        <tbody>
                                        </tbody>
                                        </table>

Single-column tables are allowed, but the table head divider must still contain at least one | character, to distinguish it from a setext header:

foo                                     <table>
|-----|                                 <thead>
                                        <tr><th>foo</th></tr>
                                        </thead>
                                        <tbody>
                                        </tbody>
                                        </table>

foo                                     <table>
-|-                                     <thead>
                                        <tr><th>foo</th></tr>
                                        </thead>
                                        <tbody>
                                        </tbody>
                                        </table>

foo                                     <h2>foo</h2>
---

Alignment can be specified in the table head divider:

foo | bar | baz| quux                   <table>
:---|:---:|---:|-----                   <thead>
left | center | right | unspecified     <tr><th style="text-align: left;">foo</th><th style="text-align: center;">bar</th><th style="text-align: right;">baz</th><th>quux</th></tr>
                                        </thead>
                                        <tbody>
                                        <tr><td style="text-align: left;">left</td><td style="text-align: center;">center</td><td style="text-align: right;">right</td><td>unspecified</td></tr>
                                        </tbody>
                                        </table>

ell1e commented Sep 27, 2023

Has anyone ever considered an extension to allow multiline cells? I find myself always yearning for those whenever I use markdown tables. Something like this:

| my | table                |
|----|----------------------|
| abc| cell 1 row 1\        |
|    | and it continues!    |
| def| cell 2               |

which would then produce something like this:

<table>
<thead><tr><th>my</th><th>table</th></tr></thead>
<tbody>
<tr><td>abc</td><td>cell 1 row 1 and it continues!</td></tr>
<tr><td>def</td><td>cell 2</td></tr>
</tbody>
</table>
