@miekg
Created September 19, 2015 14:50
R. Gieben
Atoom
December 10, 2014
Using mmark to create I-Ds and RFCs
TechDoc 0x1
Abstract
This document describes a markdown variant called mmark [mmark] that
can be used to create RFC documents. The aim of mmark is to make
writing documents as natural as possible, while providing a lot of
power on how to structure and lay out the document.
The source of this document [1] provides a good example.
Table of Contents
1. Introduction
2. Terminology
3. Mmark Syntax
4. TOML header
5. Citations
6. Internal References
7. Document divisions
8. Abstract
9. Captions
  9.1. Tables
  9.2. Figures
  9.3. Quotes
10. Tables
11. Inline Attribute Lists
12. Lists
  12.1. Ordered Lists
  12.2. Unordered Lists
  12.3. Definition Lists
  12.4. Example Lists
13. Figures and Images
  13.1. Details
  13.2. Images in v2
14. Miscellaneous Features
  14.1. HTML Comment
  14.2. Including Files
  14.3. Including Code Fragments
15. XML2RFC V3 features
  15.1. Asides
  15.2. Notes
Gieben Internal [Page 1]
TechDoc 0x1 mmark2rfc December 2014
  15.3. RFC 2119 Keywords
  15.4. Super- and Subscripts
16. Converting from RFC 7328 Syntax
17. Acknowledgements
18. References
  18.1. Normative References
  18.2. Informative References
  18.3. URIs
Appendix A. Tips and Tricks
Appendix B. Bugs
Author's Address
1. Introduction
Mmark [mmark] is a markdown processor. It supports the markdown
syntax and has been extended with (syntax) features found in other
markdown implementations like kramdown [2], PHP markdown extra [3],
[pandoc], Scholarly markdown [4], leanpub [5] and even asciidoc [6].
This allows mmark to be used to write larger, structured documents
such as RFCs and I-Ds or even books, while not deviating too far from
markdown.
Mmark is a fork of blackfriday [blackfriday], is written in Go and is
very fast. Input to mmark must be UTF-8, the output is also UTF-8.
Mmark converts tabs to 4 spaces.
The goals of mmark are:
(I) Self-contained: a single file can be converted to XML2RFC v2
(or v3) or HTML5.
(II) Make the markdown "source code" look as natural as possible.
(III) Provide a seamless upgrade path to XML2RFC v3.
(IV) Consistent interface: aim to minimize the number of weird
corner cases you need to remember while typing.
Using Figure 1 from [RFC7328], mmark can be positioned as follows:
    +-------------------+   pandoc   +---------+
    | ALMOST PLAIN TEXT |   ------>  | DOCBOOK |  <1>
    +-------------------+            +---------+
                  |     \                 |
     non-existent |      \_________       | xsltproc
       faster way |    <2> *mmark* \      |
                  v                 v     v
            +------------+  xml2rfc  +---------+
            | PLAIN TEXT | <-------- |   XML   |  <3>
            +------------+           +---------+
Figure 1: Mmark (2) skips the conversion to DOCBOOK (1) and directly
outputs XML2RFC XML (3) (or HTML5).
Note that kramdown-2629 [7] fills the same niche as mmark.
2. Terminology
The following terms are used in this document:
v2: Refers to XML2RFC version 2 [RFC2926] output created by mmark.
v3: Refers to XML2RFC version 3 [I-D.hoffman-xml2rfc] output created
by mmark.
3. Mmark Syntax
In the following sections we go over some of the differences, and the
extra syntax features of mmark.
Note that there are no wrong markdown documents, but a document may
become invalid once converted to XML. Case in point: having a table
in a list and converting to v2.
4. TOML header
Mmark uses a TOML [toml] document header to specify the document's
metadata. Each line of this header must start with a "%". The
document header is also different in v3; for instance, the "docName"
is not used anymore.
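As an illustration of the "%" convention, the sketch below separates such a header from the rest of a document. It is not mmark's own code: the function name "split_header" is hypothetical, the "title" key in the sample is an assumption, and the sketch assumes the header is the unbroken run of "%" lines at the very top of the file.

```python
def split_header(source):
    """Split a document into (TOML header text, remaining body).

    Assumption: the header is the run of lines at the top of the file
    that start with "%"; the first other line ends the header.
    """
    lines = source.splitlines()
    header = []
    body_start = len(lines)
    for index, line in enumerate(lines):
        if not line.startswith("%"):
            body_start = index
            break
        text = line[1:]
        # Drop a single space after the marker, if present.
        if text.startswith(" "):
            text = text[1:]
        header.append(text)
    return "\n".join(header), "\n".join(lines[body_start:])


# "title" is an assumed key; "docName" is the key named in the text.
SAMPLE = '% title = "Using mmark"\n% docName = "draft-example-00"\n\n# Introduction\n'

header, body = split_header(SAMPLE)
print(header)
```

The returned header text is plain TOML and could be handed to any TOML parser.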
5. Citations
A citation can be entered using the syntax from [pandoc]:
"[@reference]". Such a reference is "informative" by default. Making
a reference informative or normative can be done with a "?" and "!"
respectively: "[@!reference]" is a normative reference.
For RFCs and I-Ds the references are generated automatically, meaning
you don't need to include an XML reference element in the source of
the document.
For I-Ds you might need to include a draft version in the reference:
"[@?I-D.blah#06]" creates an informative reference to the seventh
version (-06) of draft-blah.
Once a citation has been defined the brackets can be omitted, so once
"[@pandoc]" is used, you can just use "@pandoc".
If the need arises (usually when citing a document that is not in the
XML2RFC database) an XML reference fragment should be included. Note
that this needs to happen _before_ the back matter is started,
because that is the point where the references are output (right
now the implementation does not scan the entire file for citations,
also see Appendix B).
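The bracketed citation forms described in this section can be recognized with a small pattern. This is only a sketch, not the parser mmark uses: the regular expression and the name "parse_citation" are assumptions made for illustration, and the bare "@pandoc" shortcut is deliberately not handled.

```python
import re

# "[@", an optional "?" or "!", the reference name, an optional
# "#<draft version>", and the closing "]".
CITATION = re.compile(
    r"\[@(?P<kind>[?!]?)(?P<ref>[^\]#\s]+)(?:#(?P<version>\d+))?\]"
)


def parse_citation(text):
    """Return the parts of a bracketed citation, or None if it is not one."""
    match = CITATION.fullmatch(text)
    if match is None:
        return None
    # Without a marker, or with "?", the reference is informative.
    kind = "normative" if match.group("kind") == "!" else "informative"
    return {
        "reference": match.group("ref"),
        "kind": kind,
        "draft_version": match.group("version"),
    }


print(parse_citation("[@?I-D.blah#06]"))
```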
6. Internal References
The cross reference syntax is "[](#id)", which allows for an optional
title between the brackets. Usually this is left empty; for this use
case mmark allows the shortcut form "(#id)", which omits the brackets
entirely.
The external reference syntax is "[](url)".
7. Document divisions
Using "{mainmatter}" on a line by itself starts the main matter
(middle) of the document, "{backmatter}" starts the appendix. There
is also a "{frontmatter}" that starts the front matter (front) of the
document, but is normally not needed because the TOML header
(Section 4) starts that by default.
8. Abstract
An abstract is defined by using the special header syntax ".#". The
name of the section, when lowercased, must be "abstract". In the
future mmark might also support Preface and Colophon (special)
sections.
9. Captions
Whenever a blockquote, fenced code block or image has caption text,
the entire block is wrapped in a "<figure>" and the caption text is
put in a "<name>" tag for v3.
In mmark you can put a caption under a table, an indented code block,
a fenced code block or a block quote. Referencing these elements
(and thus creating a document "id" for them) is done with an IAL
(Section 11):
{#identifier}
Name | Age
--------|-----:
Bob | 27
Alice | 23
An empty line between the IAL and the table or indented code block is
allowed.
9.1. Tables
A table caption is signalled by using "Table:" directly after the
table.
9.2. Figures
Any text directly after the code block/fenced code block starting
with "Figure:" is used as the caption.
9.3. Quotes
After a quote (a paragraph prefixed with ">") you can add a caption:
Quote: URI for attribution -- Name
In v3 this is used in the block quote attributes, for v2 it is
discarded. If you need the string "Quote:" after a quote, escape
the colon: "Quote\:".
10. Tables
Tables can be created by drawing them in the input using a simple
syntax:
Name | Age
--------|-----:
Bob | 27
Alice | 23
Tables can also have a footer: use equal signs instead of dashes for
the separator, to start a table footer. If there are multiple footer
lines, the first one is used as a starting point for the table
footer.
Name | Age
--------|-----:
Bob | 27
Alice | 23
======= | ====
Charlie | 4
If a table is started with a _block table header_, which starts with
a pipe or plus sign and a minimum of three dashes, it is a *Block
Table*. A block table may include block level elements in each (body)
cell. If we want to start a new cell use the block table header
syntax. In the example below we include a list in one of the cells.
|-----------------+------------+-----------------|
| Default aligned |Left aligned| Center aligned |
|-----------------|:-----------|:---------------:|
| Bob |Second cell | Third cell |
| Alice |foo | **strong** |
| Charlie |quux | baz |
|-----------------+------------+-----------------|
| Bob | foot | 1. Item2 |
| Alice | quuz | 2. Item2 |
|=================+============+=================|
| Footer row | more footer| and more |
|-----------------+------------+-----------------|
Note that the header and footer can't contain block level elements.
The table syntax used is that of Markdown Extra [8].
11. Inline Attribute Lists
This borrows from kramdown [9], with the difference that the colon is
dropped and each IAL must be typeset _before_ the block element (see
Appendix B). Adding an anchor to a blockquote can be done like so:
{#quote:ref1}
> A block quote
You can specify classes with ".class" (although these are not used
when converting to XML2RFC), and arbitrary key/value pairs where each
key/value becomes an attribute. Different elements in the IAL must
be separated using spaces: "{#id lang=go}".
For the following elements an IAL is processed:
o Table
o Code Block
o Fenced Code Block
o List (any type)
o Section Header
o Image
o Quote
o ...
For all other elements they are ignored, but not discarded. This
means they will be applied to the next element that does use IALs.
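To make the three kinds of IAL elements concrete (an "#id", a ".class" and key/value pairs), the following sketch splits an IAL into its parts. The function "parse_ial" and the shape of the returned dictionary are hypothetical; mmark's own parsing may differ, for instance in how it treats quoting.

```python
import shlex


def parse_ial(ial):
    """Split an IAL such as '{#id .class key=value}' into its elements."""
    inner = ial.strip()
    if not (inner.startswith("{") and inner.endswith("}")):
        raise ValueError("an IAL must be wrapped in braces")
    attrs = {"id": None, "classes": [], "pairs": {}}
    # shlex.split separates on spaces and strips quotes around values;
    # with its default settings "#" is not treated as a comment.
    for token in shlex.split(inner[1:-1]):
        if token.startswith("#"):
            attrs["id"] = token[1:]
        elif token.startswith("."):
            attrs["classes"].append(token[1:])
        elif "=" in token:
            key, value = token.split("=", 1)
            attrs["pairs"][key] = value
    return attrs


print(parse_ial('{#fig-id type="ascii-art"}'))
```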
12. Lists
12.1. Ordered Lists
There are several ways to start an ordered list. You can use numbers,
roman numerals, letters and uppercase letters. When using roman
numerals and letters you MUST use two spaces after the dot or the
brace (the underscore signals a space here):
a)__
A)__
Note that mmark (just as [pandoc]) pays attention to the starting
number of a list (when using decimal numbers), thus a list started
with:
4) Item4
5) Item5
will use "4" as the starting number.
12.2. Unordered Lists
Unordered lists can be started with "*", "+" or "-" and follow the
normal markdown syntax rules.
12.3. Definition Lists
Mmark supports the definition list syntax from PHP Markdown Extra
[10], meaning there cannot be an empty line between the term and the
definition. Note that the multiple terms and definitions syntax is
_not_ supported.
12.4. Example Lists
This is the example list syntax from pandoc [11]. References to
example lists work as well. Note that an example list always needs
to have an identifier, "(@good)" works, "(@)" does not. You start an
example list when the identifier is the first word on a line.
Example:
(@good) This is a good example.
As (@good) illustrates, ...
(@good) Another example.
Outputs:
(1) This is a good example.
As (1) illustrates, ...
(2) Another example.
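The numbering shown above can be reproduced with two passes over the input: one to number the list items, one to resolve the references. This is a sketch under assumptions, not a description of mmark's implementation: the name "number_examples" is hypothetical, identifiers are assumed to consist of word characters, and a reference to a reused identifier is assumed to resolve to its first item.

```python
import re

LABEL = re.compile(r"\(@(\w+)\)")


def number_examples(lines):
    """Number example list items and resolve references to them."""
    numbers = {}
    count = 0
    # First pass: a label at the start of a line starts a list item.
    for line in lines:
        match = LABEL.match(line)
        if match:
            count += 1
            # Assumption: a reused label keeps its first number.
            numbers.setdefault(match.group(1), count)

    def resolve(match):
        label = match.group(1)
        return "(%d)" % numbers[label] if label in numbers else match.group(0)

    # Second pass: renumber the items and replace inline references.
    rendered = []
    count = 0
    for line in lines:
        match = LABEL.match(line)
        if match:
            count += 1
            rest = LABEL.sub(resolve, line[match.end():])
            rendered.append("(%d)" % count + rest)
        else:
            rendered.append(LABEL.sub(resolve, line))
    return rendered


EXAMPLE = [
    "(@good) This is a good example.",
    "As (@good) illustrates, ...",
    "(@good) Another example.",
]
print("\n".join(number_examples(EXAMPLE)))
```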
13. Figures and Images
When a figure has a caption it will be wrapped in "<figure>" tags.
A figure can wrap source code (v3) or artwork (v2/v3).
An image is wrapped in a figure when the optional title syntax is
used. But images are only useful when outputting v3. For v2 the
actual image cannot be shown, see Section 13.2 for this.
Multiple artworks/sources can be put in one figure. This is done by
prefixing the section containing the figures with a figure quote:
"F>".
13.1. Details
o A Fenced Code Block becomes source code in v3 and artwork in v2.
We can use the language to signal the type.
``` c
printf("%s\n", "hello");
```
o An Indented Code Block becomes artwork in v3 and artwork in v2.
The only way to indicate the type is by using an IAL. So one has
to use:
{type="ascii-art"}
+-----+
| ART |
+-----+
v3 allows the usage of a "src" attribute to link to external files
with images. We use the image syntax for that.
o An image "![Alt text](/path/to/img.jpg "Optional title")", will be
converted to an artwork with a "src" attribute in v3. Again the
type needs to be specified as an IAL. If the "Optional title" is
specified the generated artwork will be wrapped in a figure with
name set to "Optional title". Creating an artwork with an anchor
and type will become:
{#fig-id type="ascii-art"}
![](/path/to/art.txt "Optional title")
For v2 this presents difficulties as there is no way to display
any of this, see Section 13.2 for a treatment on how to deal with
that.
o To group artworks and code blocks into figures, we need an extra
syntax element. Scholarly markdown [12] has a neat syntax for
this. It uses a special section syntax and all images in that
section become subfigures of a larger figure. The disadvantage of
this syntax is that it cannot be used in lists. Hence we use a
quote-like solution, just like asides and notes, but for figures:
we prefix the entire paragraph with "F>". Basic usage:
F> {type="ascii-art"}
F> +-----+
F> | ART |
F> +-----+
F> Figure: This caption is ignored in v3, but used in v2.
F>
F> ``` c
F> printf("%s\n", "hello");
F> ```
F>
Figure: Caption for both figures in v3 (in v2 this is ignored).
In v2 this is not supported, so the above will result in one
figure. Yes, one, because the fenced code block does not have a
caption, so it will not be wrapped in a figure. To summarize: in
v2 the inner captions _are_ used and the outer one is discarded;
for v3 it is the other way around. The figure from above will be
rendered as:
+-----+
| ART |
+-----+
This caption is ignored in v3, but used in v2.
printf("%s\n", "hello");
13.2. Images in v2
Images (real images, not ascii-art) are non-existent in v2, but are
allowed in v3. To allow writers to use images _and_ output v2 and v3
formats, the following hack is used in v2 output. Any image will be
converted to a figure with a title attribute set to the "Optional
title", and the URL in the image will be typeset as a link in the
postamble. So "![](misc/image.xml "Optional title")" will be
converted to:
<figure title="Optional title">
<artwork>
</artwork>
<postamble>
<eref target="misc/image.xml"/>
</postamble>
</figure>
If an image does not have a title, the "figure" is dropped and only
the link remains. The default is to center the entire element. Note
that if you don't give the image an anchor, "xml2rfc" won't typeset
it with a "Figure X", so for an optional "image" rendering, you
should use the following:
{#fig-id}
![](misc/image.xml "Optional title")
Which when rendered becomes:
misc/image.xml
<misc/image.xml>
Figure 2: Optional title
Note that ideas to improve/change on this are welcome.
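The v2 fallback described in this section can be sketched as a small function that builds the XML shown above. The name "image_to_v2" is hypothetical, and emitting the IAL identifier as an "anchor" attribute on the figure is an assumption; the text only says that an anchor is needed for "Figure X" to appear.

```python
from xml.sax.saxutils import quoteattr


def image_to_v2(url, title=None, anchor=None):
    """Render an image as the v2 fallback: a figure holding only a link."""
    link = "<eref target=%s/>" % quoteattr(url)
    if not title:
        # Without a title the figure is dropped and only the link remains.
        return link
    attrs = ""
    if anchor:
        # Assumption: the IAL "#id" ends up as the figure's anchor.
        attrs += " anchor=%s" % quoteattr(anchor)
    attrs += " title=%s" % quoteattr(title)
    return "\n".join([
        "<figure%s>" % attrs,
        "<artwork>",
        "</artwork>",
        "<postamble>",
        link,
        "</postamble>",
        "</figure>",
    ])


print(image_to_v2("misc/image.xml", "Optional title", anchor="fig-id"))
```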
14. Miscellaneous Features
14.1. HTML Comment
If an HTML comment contains "--", it will be rendered as a "cref"
comment in the resulting XML file. Typically: "<!-- Miek Gieben --
you want to include the next paragraph? -->".
14.2. Including Files
Files can be included using "{{filename}}", "filename" is relative to
the current working directory if it is not absolute.
14.3. Including Code Fragments
This borrows from the Go present tool, which got its inspiration from
the Sam editor. The syntax was gleaned from leanpub. But the syntax
presented here is more powerful than the one used by leanpub. Use
the syntax: "<{{file}}[address]" to include a code snippet. The
"address" identifier specifies what lines of code are to be included
in the fragment.
Any line in the program that ends with the four characters "OMIT" is
deleted from the source before inclusion, making it easy to write
things like
<{{test.go}}[/START OMIT/,/END OMIT/]
So you can include snippets like this:
tedious_code = boring_function()
// START OMIT
interesting_code = fascinating_function()
// END OMIT
To aid in including HTML or XML fragments, where the "OMIT" keyword
is probably embedded in comments, lines that end in "OMIT -->" are
excluded as well. Note that the default is to output an artwork, but
if the extension of the included file matches a computer language,
"<sourcecode>" will be emitted for v3.
Note that the attribute "prefix" (which you can specify with an IAL)
can be used to prefix all lines of the included code with the value
of the attribute, so
{prefix="C:"}
<{{test.go}}[/START OMIT/,/END OMIT/]
will prefix all lines included from test.go with "C:".
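The behaviour described in this section (select a range by address, drop the "OMIT" lines, apply an optional prefix) can be sketched as follows. This only handles an address of the form "/start/,/end/" and is not the code mmark uses; "include_fragment" and its parameters are hypothetical names.

```python
import re


def include_fragment(source, start, end, prefix=""):
    """Select the lines from the first match of `start` up to the next
    match of `end`, drop the OMIT lines and prefix what remains."""
    selected = []
    inside = False
    for line in source.splitlines():
        if not inside:
            if re.search(start, line) is None:
                continue
            inside = True
            selected.append(line)
        else:
            selected.append(line)
            if re.search(end, line):
                break
    # Lines ending in "OMIT" (or "OMIT -->" for HTML/XML) are deleted.
    kept = [line for line in selected
            if not line.endswith(("OMIT", "OMIT -->"))]
    return "\n".join(prefix + line for line in kept)


TEST_GO = """tedious_code = boring_function()
// START OMIT
interesting_code = fascinating_function()
// END OMIT
"""

print(include_fragment(TEST_GO, r"START OMIT", r"END OMIT", prefix="C:"))
```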
15. XML2RFC V3 features
The v3 syntax adds some new features and those can already be used in
mmark (even for documents targeting v2 -- but there they will be
faked with the limited constructs of the v2 syntax).
15.1. Asides
An aside is any paragraph prefixed with "A>". For v2 this becomes an
indented paragraph.
15.2. Notes
A note is any paragraph prefixed with "N>". For v2 this becomes an
indented paragraph.
15.3. RFC 2119 Keywords
Any [RFC2119] keyword used with strong emphasis _and_ in uppercase
will be typeset within "bcp14" tags, that is "**MUST**" becomes
"<bcp14>MUST</bcp14>", but "**must**" will not. For v2 they are
stripped of the emphasis and output as-is.
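The keyword rule can be illustrated with a single substitution. The sketch below is an assumption about how such a rule might be written, not mmark's renderer; "mark_keywords" is a hypothetical name, and other strong emphasis (such as "**must**") is left untouched for a later step.

```python
import re

# The key words defined by RFC 2119.
KEYWORDS = (
    "MUST NOT", "MUST", "REQUIRED", "SHALL NOT", "SHALL",
    "SHOULD NOT", "SHOULD", "RECOMMENDED", "MAY", "OPTIONAL",
)
# Only uppercase keywords wrapped in "**" qualify.
PATTERN = re.compile(r"\*\*(%s)\*\*" % "|".join(KEYWORDS))


def mark_keywords(text, v3=True):
    """Wrap emphasized keywords in bcp14 tags (v3) or strip the emphasis (v2)."""
    template = r"<bcp14>\1</bcp14>" if v3 else r"\1"
    return PATTERN.sub(template, text)


print(mark_keywords("You **MUST** do this, but you **must** not shout."))
```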
15.4. Super- and Subscripts
Use tildes for subscripts and carets for superscripts: "H~2~O" and
"2^10^ is 1024". In v2 these are output as-is.
16. Converting from RFC 7328 Syntax
Converting from an RFC 7328 ([RFC7328]) document can be done using
the quick and dirty Perl script [13], which uses pandoc to output
markdown PHP extra and converts that into proper mmark (mmark is
more like markdown PHP extra than like pandoc):
    for i in middle.mkd back.mkd; do
        pandoc --atx-headers -t markdown_phpextra < "$i" |
            ./parts.pl
    done
Note this:
o Does not convert the abstract to a prefixed paragraph;
o Makes all RFC references normative;
o Handles all figure and table captions and adds references (if
appropriate);
o Probably has other bugs, so a manual review is in order.
There is also titleblock.pl [14] which can be given an [RFC7328]
"template.xml" file and will output a TOML titleblock, that can be
used as a starting point.
Yes, this uses pandoc and Perl. Why? Because if mmark could
parse the file by itself, there would not be much of a problem. Two
things are holding this back: mmark cannot parse definition lists
with empty spaces and there is no renderer that can output
markdown syntax.
For now the mmark parser will not get any features that make it
backwards compatible with pandoc2rfc.
17. Acknowledgements
18. References
18.1. Normative References
[I-D.hoffman-xml2rfc]
Hoffman, P., "The 'XML2RFC' version 3 Vocabulary",
draft-hoffman-xml2rfc-21 (work in progress), July 2015.
[RFC2926] Kempf, J., Moats, R., and P. St. Pierre, "Conversion of
LDAP Schemas to and from SLP Templates", RFC 2926, DOI
10.17487/RFC2926, September 2000,
<http://www.rfc-editor.org/info/rfc2926>.
[RFC7328] Gieben, R., "Writing I-Ds and RFCs Using Pandoc and a Bit
of XML", RFC 7328, DOI 10.17487/RFC7328, August 2014,
<http://www.rfc-editor.org/info/rfc7328>.
[toml] Preston-Werner, T., "TOML git repository", March 2013,
<https://github.com/toml-lang/toml>.
18.2. Informative References
[blackfriday]
"Blackfriday git repository", November 2011,
<http://github.com/russross/blackfriday>.
[mmark] Gieben, R., "Mmark git repository", December 2014,
<http://github.com/miekg/mmark>.
[pandoc] MacFarlane, J., "Pandoc, a universal document converter",
2006, <http://johnmacfarlane.net/pandoc/>.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997,
<http://www.rfc-editor.org/info/rfc2119>.
18.3. URIs
[2] http://kramdown.gettalong.org/
[3] http://michelf.com/projects/php-markdown/extra/
[4] http://scholarlymarkdown.com/Scholarly-Markdown-Guide.html
[5] https://leanpub.com/help/manual
[6] http://www.methods.co.nz/asciidoc/
[7] https://github.com/cabo/kramdown-rfc2629
[8] https://michelf.ca/projects/php-markdown/extra/#table
[9] http://kramdown.gettalong.org/syntax.html#block-ials
[10] https://michelf.ca/projects/php-markdown/extra/#def-list
[11] http://johnmacfarlane.net/pandoc/README.html#extension-example_lists
[12] http://scholarlymarkdown.com/Scholarly-Markdown-Guide.html
[13] https://raw.githubusercontent.com/miekg/mmark/master/convert/parts.pl
[14] https://raw.githubusercontent.com/miekg/mmark/master/convert/titleblock.pl
[15] http://commonmark.org/
Appendix A. Tips and Tricks
How do I typeset:
Multiple paragraphs in a list: Indent the list with four spaces.
Text indented with three spaces will be seen as a new paragraph
that breaks the list.
Appendix B. Bugs
o Citations must be included in the text before the "{backmatter}"
starts; otherwise they are not available in the appendix.
o Inline Attribute Lists must be given _before_ the block element.
o Mmark cannot correctly parse [RFC7328] markdown.
o Multiple terms and definitions are not supported in definition
lists.
o Mmark uses two scans when converting a document and does not build
an internal AST of the document. This means it cannot adhere 100%
to the CommonMark [15] specification; however, the CommonMark test
suite is used when developing mmark. Currently mmark passes ~60%
of the tests.
Author's Address
R. (Miek) Gieben
Atoom
Email: miek@miek.nl