rdela/penguin-flavoured-markdown.md

## penguin-flavoured-markdown.md

      
    From: pngwn/MDsveX#293
In #259 I alluded to other changes in v1, besides the configuration API. The biggest one is something that has been one of my biggest frustrations in some regards: markdown itself. Now, let's get this out of the way: markdown is excellent and easily the best document markup format we have ever had. Kudos to Mr Gruber and all who have pushed markdown forward. Nothing in this document should be considered a derogation of markdown; I have nothing but respect and admiration for everything that has brought us to this point. This is a criticism, in many ways, but hopefully a constructive one. It is also just the opinion of a single penguin, and a penguin of questionable character, at that.

Penguin-flavoured markdown


Context
Principles
The differences

Indentation
Link references
Headings
Lists
Emphasis
Line breaks
Superscript
Subscript
Strikethrough
Tables
Generic directives


Context

This document is an early suggestion of what markdown in mdsvex might look like (it will also exist as a standalone markdown flavour—yes, another one), but it takes a slightly different approach, removing features, as well as adding them. I still feel as though it can be considered 'markdown' in spirit (a 'suberset' that shares most of the syntax and semantics), although technically that probably isn't true.
Over the past few months I have been doing a lot of reading and research around markdown, the various flavours, and the history. I have read the CommonMark spec more times than I wish to think about, and I have thousands of words of notes. I have also looked very deeply into various markdown flavours (GitHub Flavoured Markdown, Markdown Extra, R Markdown, MultiMarkdown, and others), as well as other document markup formats (reStructuredText, AsciiDoc(tor), Textile, txt2tags, Org mode, Setext). Some of these formats are almost three decades old, and at this stage I have learned more than I ever wished to know about human-readable document markup formats and their history. Honestly, though, learning about the history and evolution of this ecosystem has been very rewarding and I bow down to all of those who have played a part; it is thanks to them that we have such a rich set of tools and such a vibrant ecosystem.
This document uses CommonMark as the baseline (not Gruber's original 'spec'); all markdown variants are considered just that. Many of the older formats would be too much of a break from what folk are used to today, but learning about them was interesting and provided useful context.
This document will detail a list of differences from markdown. The aim is to remove ambiguity (by removing, restricting or modifying syntax) and add some essential features out of the box (via a few syntax additions).
On reducing ambiguity, I have to give all of the credit to John MacFarlane: he is smarter than I am and I have stolen his ideas, mainly from here.
Principles

Some overarching principles: things that I believe to be true. Some will disagree, but these are the key principles that guide what I'm about to suggest. There are always exceptions but I generally find these things to hold true.

Markdown is a learned language. It is not 'intuitive' or 'natural'; this is a myth. Tweaking it will not compromise its integrity.
Ambiguity is confusing for users and complex for machines.
Invisible syntax is confusing (I don't consider line breaks invisible).
Having multiple ways to achieve the same thing makes for confusing design.
Markdown should solve the 80% case without extension.
Markdown is not a 'compile to HTML' language. It is a way of adding simple document semantics to a text file; the compilation target varies. However, HTML is the most common use case, and must be considered.
Where possible, things should render well enough when fed into a standard parser. An odd constraint for such a proposal but a practical one. I do not care about tooling much but I do care about GitHub/GitLab rendering a bit.

Indentation

Indentation in markdown creates code blocks or literal/verbatim text. This is naturally problematic for mdsvex (when mixing HTML and markdown you may want to indent for clarity) but it also introduces confusing rules around when certain markdown syntax will render a code block and when it will do nothing. Indented code blocks arguably violate the invisible syntax principle but definitely violate the 'more than one way to do things' principle because fenced code blocks exist.
Fenced code blocks exist and are generally more useful. They allow you to include a variety of useful metadata.
PFM will remove the significance from indentation.
Link references

In markdown you can do this, it is quite explicit and very useful:
[link text][link_ref]

[link_ref]: google.com
But you can also do this:
[link_text_and_ref]

[link_text_and_ref]: google.com
The problem is, we don't know if the second one is a shorthand link until we discover the link reference definition. This is complex for parsers but I also think it is confusing for humans: there is a degree of context switching required when jumping around a document to search for the possible link reference definition. If no definition is found then this is actually just a normal paragraph: <p>[link_text_and_ref]</p>.
PFM will require shorthand link references to be explicit; the shape of the syntax alone will dictate its nature:
[link_text_and_ref][]

[link_text_and_ref]: google.com

Should this be an error if no reference is found? This would be a dramatic break from markdown, which does not have parser errors.
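
As a rough illustration of 'the shape of the syntax alone will dictate its nature', here is a minimal sketch of a classifier that never needs to look up a definition. The function name `classifyBracketSpan` and the returned labels are made up for this example, and escapes and nested brackets are ignored.

```javascript
// Hypothetical helper: classify a bracketed span by shape alone,
// without looking up any link reference definition.
// Escapes and nested brackets are deliberately ignored in this sketch.
function classifyBracketSpan(src) {
  let m;
  // [text][ref] -> full reference link
  if ((m = /^\[([^\[\]]+)\]\[([^\[\]]+)\]$/.exec(src))) {
    return { kind: "reference", text: m[1], ref: m[2] };
  }
  // [text][] -> explicit shorthand: the text doubles as the reference
  if ((m = /^\[([^\[\]]+)\]\[\]$/.exec(src))) {
    return { kind: "reference", text: m[1], ref: m[1] };
  }
  // [text] on its own is plain text, whether or not a definition exists
  return { kind: "text", text: src };
}
```

Whether an unresolved `ref` should then be an error is a separate decision, as the question above notes; the classifier itself stays context-free.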

Headings

These are all valid headings:
hello one
===

hello two
---

### hello three ###

#### hello four
The first two are setext style headings and are confusing, especially when they are near horizontal rules; the others are atx style. These styles are named after the languages they were borrowed from. All but one of these will be removed.
Leading octothorpes will distinguish headings and nothing else will be supported; trailing octothorpes will be considered part of the heading text:
# one

## two
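
A minimal sketch of what that rule could look like in a parser. The name `parseHeading` is hypothetical, and the limit of six levels and the required space after the octothorpes are assumptions carried over from CommonMark rather than something decided here.

```javascript
// Hypothetical helper: parse an ATX-only heading line.
// Returns null for anything that is not a heading, including
// setext underlines such as "===" or "---".
function parseHeading(line) {
  // 1 to 6 leading octothorpes, at least one space, then the heading text.
  const m = /^(#{1,6})[ \t]+(.*)$/.exec(line);
  if (!m) return null;
  // Trailing octothorpes are kept: they are part of the heading text.
  return { level: m[1].length, text: m[2].trimEnd() };
}
```

With this, `### hello three ###` is a level 3 heading whose text is `hello three ###`, while `hello one` followed by `===` is just two lines of ordinary text.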

Lists

Lists with strange numbers will be supported because it seems sensible; the auto-numbering functionality of markdown is very confusing:
11. eleven
27. twenty-seven

Markdown has a concept of loose and tight lists as well as strange rules around nested lists; I haven't quite got my head around these yet. They are useful features but I'm still confused. I'll update when I have thought it through properly.

Emphasis

There is no single feature in markdown that is as confusing as emphasis. The markdown spec has 17 rules for emphasis, each more confusing than the last. That is seventeen. This ambiguity is mainly due to the fact that there are many ways to emphasise and strongly emphasise text. You can read more here.
What is this:
**hello* world**
or this:
hello***a*friendsss
(I'm not sure GitHub is rendering these in accordance with the CommonMark spec).
As you can see, it gets confusing fast. Intraword emphasis is even more confusing.
PFM will have distinct syntax for emphasis and strong emphasis. This is a hard break from the spec but a necessary one. Intraword emphasis is very rare but needs to be supported. Distinct syntax will be used for that case.
_emphasis_

*strong emphasis*

*_emphasis inside strong emphasis_*

_*strong emphasis inside emphasis*_

intraword emphasis is fan~_tas_~tic
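
To show how much simpler distinct delimiters make things, here is a rough sketch that renders the five examples above with three regular expressions. Everything about it is an assumption, since the actual rules are not written yet: `renderEmphasis` is a made-up name, `*` and `_` only count when they are not directly next to a letter or digit, there are no escapes, and the input is assumed to be HTML-safe already.

```javascript
// Sketch of the three PFM emphasis forms described above.
// Assumptions (not settled by the proposal): plain delimiters only count
// at word edges, no escapes, no multi-line spans, input is HTML-safe.
function renderEmphasis(text) {
  return text
    // ~_x_~ : intraword emphasis, allowed anywhere
    .replace(/~_(.+?)_~/g, "<em>$1</em>")
    // *x* : strong emphasis
    .replace(/(?<![A-Za-z0-9])\*(?=\S)(.+?)(?<=\S)\*(?![A-Za-z0-9])/g, "<strong>$1</strong>")
    // _x_ : emphasis
    .replace(/(?<![A-Za-z0-9])_(?=\S)(.+?)(?<=\S)_(?![A-Za-z0-9])/g, "<em>$1</em>");
}
```

Because each delimiter means exactly one thing, nesting falls out of running the passes in order, and something like `snake_case_name` is left alone without any special rule.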
Line breaks

Line breaks come in a few forms, hard and soft.
Let's do a small experiment: should these two paragraphs render the same?
hello
friends

hello  
friends
They don't:
<p>hello
friends</p>

<p>hello<br>
friends</p>
The second paragraph actually had two space characters after the word "hello", which made it a hard line break. The first is a 'soft' line break (it basically renders in HTML as if there is no line break); the second is a 'hard' line break and will render a br, for times when line breaks add semantic meaning (poems, addresses).
I personally find invisible syntax quite hard to see. Markdown has an explicit version too:
hello
friends

hello\
friends
PFM will support both hard and soft breaks, hard breaks will only be supported via the explicit backslash syntax.
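
A small sketch of that rule, assuming that trailing spaces simply lose their meaning and are trimmed. The name `renderParagraph` is hypothetical, and a backslash on the very last line is left as literal text, which is how CommonMark treats it.

```javascript
// Sketch: render one paragraph, treating a trailing backslash as the only
// hard line break. Trailing spaces are trimmed and carry no meaning.
function renderParagraph(source) {
  const lines = source.split("\n").map((line) => line.trimEnd());
  let html = "<p>";
  lines.forEach((line, i) => {
    const isLast = i === lines.length - 1;
    if (line.endsWith("\\") && !isLast) {
      // explicit hard break: drop the backslash, emit <br>
      html += line.slice(0, -1) + "<br>\n";
    } else {
      // soft break: keep the newline, emit no markup
      html += line + (isLast ? "" : "\n");
    }
  });
  return html + "</p>";
}
```

Under this sketch the two-space version of the experiment above renders exactly like the plain one, and only the backslash version produces a br.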
Superscript

Superscript is not part of the markdown spec, PFM will support it via a pair of ^ characters (rules TBD):
Coming Soon ^TM^
Will render:
<p>Coming Soon <sup>TM</sup></p>

Coming Soon ^TM
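
Since the rules are still TBD, the following is only one possible reading: a sketch in which the text between the carets may not contain whitespace or another caret. The name `renderSuperscript` is made up for the example.

```javascript
// Sketch: ^text^ becomes <sup>text</sup>.
// Assumption: the span may not contain whitespace or another caret,
// so a lone caret (as in 2^10) stays literal.
function renderSuperscript(text) {
  return text.replace(/\^([^\s^]+)\^/g, "<sup>$1</sup>");
}
```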

Subscript

Subscript is not part of the markdown spec, PFM will support it but I'm not sure what the syntax will be:
To be changed:
x~1~ ... x~n~
Will render:
<p>x<sub>1</sub> ... x<sub>n</sub></p>

x₁ ... xₙ

Strikethrough

Strikethrough is not supported by markdown but I'd like to support it in PFM. It will match the common syntax for strikethrough, ~~word~~. Undecided on intraword strikethrough but it is simpler than emphasis.
Some options:
~~strikethrough~~

This is very much TBD, would be nice but very much a nice to have.

Tables

Here we go. I think tables are a critical feature and I haven't really worked this one out yet. It is very complex and there are a million edge cases. The basis should be GFM pipe tables but I'd like the ability to have headings on the left, in addition to or instead of across the top, as well as the ability to merge cells (at least horizontally but probably both). I was thinking something like this:
| Header  | Header | Header           |
| --------|--------|------------------|
| Header || Cell   | Cell             |       
| Header || Cell spanning two columns |
This looks weird and will almost definitely not work but it is a start. It may be better to have distinct syntax for simple pipe tables (GFM style) and more complex tables with more explicit syntax, although this does violate a principle. Needs more thought.
Generic directives

This one is a bit out of left field, but I think it is very important. Markdown needs a way for users to 'plug in' with a generic syntax that covers most use cases. This essentially brings WordPress/Hugo style 'shortcodes' into the spec, except it is more flexible and more powerful. This feature has been co-opted from here.
It is powerful and flexible due to its different forms.
Inline:
:name[content]

This is a very great article on :wikipedia[penguins]
Would be accompanied by a function. API TBD; realistically this would be a custom AST compile handler (because this would have its own AST node type with the necessary info in it) rather than the below, but this is illustrative:
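const config = {
  directives: {
    inline: {
      // The key matches the directive name, so this handles :wikipedia[content].
      wikipedia(content) {
        // look_up_wikipedia stands in for whatever lookup you provide.
        const { url, title } = look_up_wikipedia(content);
        return `<a href="${url}">${title}</a>`;
      }
    }
  }
};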
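
To make the dispatch concrete, here is a runnable sketch of how a config of that shape might be applied with a plain string replace (a real implementation would go through the AST, as noted above). `expandInlineDirectives` is a hypothetical name, and the handler builds the URL directly instead of doing a real lookup so that the example is self-contained.

```javascript
// Hypothetical config in the shape sketched above. The handler builds the
// URL directly so the example runs without any network lookup.
const config = {
  directives: {
    inline: {
      wikipedia(content) {
        const url = "https://en.wikipedia.org/wiki/" + encodeURIComponent(content);
        return `<a href="${url}">${content}</a>`;
      },
    },
  },
};

// Expand :name[content] using the handlers in config.directives.inline.
// Unknown names and block forms (::name, :::name) are left untouched.
function expandInlineDirectives(text, config) {
  const handlers = config.directives.inline;
  return text.replace(/(?<!:):([A-Za-z][\w-]*)\[([^\]]*)\]/g, (match, name, content) => {
    const known = Object.prototype.hasOwnProperty.call(handlers, name);
    return known ? handlers[name](content) : match;
  });
}
```

Leaving unknown directives as literal text keeps the 'markdown does not have parser errors' behaviour; whether that is the right default is open.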
Leaf block (block, cannot be inside inline content; leaf, cannot have children):
::name[content]

::youtube[cool video]
Similar function as above to render a youtube video.
Container block (block, cannot be inside inline content; container, can have children):
:::name[content]
<!-- child content -->
:::


:::warning[bad shit be happening]
here is some text that will go inside this block
:::
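
The two block forms can be told apart by their colon count alone. Below is a minimal line-based sketch of that. `parseBlockDirectives` and the node shapes are invented for illustration, and it assumes containers do not nest and that a line holding only `:::` closes the open container.

```javascript
// Sketch: split source lines into leaf directives, container directives,
// and plain text. Assumptions: containers do not nest, and a line holding
// only ":::" closes the open container.
function parseBlockDirectives(source) {
  const nodes = [];
  let open = null; // the container currently collecting children, if any
  for (const line of source.split("\n")) {
    const m = /^(:{2,3})([A-Za-z][\w-]*)\[([^\]]*)\]\s*$/.exec(line);
    if (open) {
      if (line.trim() === ":::") {
        nodes.push(open);
        open = null;
      } else {
        open.children.push(line);
      }
    } else if (m && m[1] === ":::") {
      open = { type: "container", name: m[2], content: m[3], children: [] };
    } else if (m) {
      nodes.push({ type: "leaf", name: m[2], content: m[3] });
    } else if (line.trim() !== "") {
      nodes.push({ type: "text", value: line });
    }
  }
  // An unclosed container is kept rather than dropped.
  if (open) nodes.push(open);
  return nodes;
}
```

The children are kept as raw lines here; in practice they would be fed back into the markdown parser, which is what makes the container form more than a fancy fence.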
The linked proposal also suggests {key=val} for 'attributes' but { and } are sacred in mdsvex and not viable for markdown syntax. I think the capability to pass props would be nice (::youtube[title]{start=180} for example) but I'm not sure what that syntax should look like.
What is really great about the generic directive proposal is that it can actually replace a lot of other GitHub extensions (it could actually be used for addition, delete, etc. from above), as well as allowing easy customisation. It is very, very powerful, in my opinion. PFM will support it by default.

This is where I am up to. Please feed me back!

Edit: Replaced 'Additions and deletions' with 'Strikethrough'. Updated syntax to use ~~word~~.