There are three pieces we need to think about in migrating MDN to Markdown:
- how we should represent MDN content using Markdown
- how we should convert MDN's HTML content into that Markdown representation
- updating/adapting Yari tooling to work with Markdown as the authoring format
For the first of these, we need to identify the places in MDN where we currently use HTML features that can't be represented in our chosen Markdown flavor (very likely GFM). For each of these features, we can choose between roughly four options:
- stop doing the thing (for example, removing `<div class="hidden">`)
- add custom extensions to GFM to support the thing (for example, supporting a custom syntax for notes)
- drop into raw HTML (for example, using HTML for complex tables)
- generate the thing using a macro (for example, generating spec tables using a macro)
The output of this is a specification describing how people should author content in Markdown in MDN, and how Yari should render that authored content as HTML. We've made a start on that at https://developer.mozilla.org/en-US/docs/MDN/Contribute/Markdown_in_MDN and will continue to flesh it out as we work through the issues.
But the "MDN Markdown" spec is only part of the story: it assumes that our content is already in Markdown. We also need to decide how to convert our content from HTML to Markdown, so we need a second spec describing what the converter should do when it encounters various problematic constructs.
As with the "MDN Markdown" specification, the content of this conversion spec should come out of the issues I've been filing.
This spec could list all HTML elements and, for each one, describe the converter's behaviour, which would be more or less one of the following:
- convert it to the Markdown form: this is the easy category, for elements that have a Markdown representation, like `<li>`
- throw an error: this is for elements that we need to have hand-cleaned out of the source before conversion
- keep the element as raw HTML in the Markdown output
- discard the element
- something special (most likely, some combination of the above depending on some other factors)
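To make the per-element part of the conversion spec concrete, here is a minimal TypeScript sketch of how it could be written down as data that a converter consults. Everything in it is hypothetical: the `ElementInfo` shape, the `elementRules` table, treating `<div class="hidden">` as an error that needs hand-cleaning, using `rowspan`/`colspan` as the test for a "complex" table, and returning "error" for unlisted elements are all illustrative assumptions, not Yari's actual design. The "something special" category is modelled as a function that inspects the element and picks one of the plain actions.

```typescript
// Hypothetical sketch of a per-element conversion policy.
// Element names and decisions are illustrative assumptions.

interface ElementInfo {
  tagName: string;                    // lowercase tag name, e.g. "li"
  attributes: Record<string, string>; // attribute name -> value
  children: ElementInfo[];            // child elements (text nodes omitted)
}

type Action = "convert" | "error" | "raw" | "discard";

// A rule is either a fixed action or a function that chooses one
// from the element's context ("something special").
type Rule = Action | ((el: ElementInfo) => Action);

// Assumed heuristic: a table is "complex" if any descendant cell
// spans rows or columns, which GFM tables can't express.
function hasSpanningCells(el: ElementInfo): boolean {
  return el.children.some(
    (child) =>
      "rowspan" in child.attributes ||
      "colspan" in child.attributes ||
      hasSpanningCells(child),
  );
}

function hasClass(el: ElementInfo, name: string): boolean {
  return (el.attributes["class"] ?? "").split(/\s+/).includes(name);
}

const elementRules: Record<string, Rule> = {
  li: "convert",
  ul: "convert",
  ol: "convert",
  // Assumed: hidden divs must be removed by hand; other divs stay as raw HTML.
  div: (el) => (hasClass(el, "hidden") ? "error" : "raw"),
  // Assumed: simple tables become GFM tables, complex ones stay as raw HTML.
  table: (el) => (hasSpanningCells(el) ? "raw" : "convert"),
};

function elementAction(el: ElementInfo): Action {
  const rule: Rule | undefined = elementRules[el.tagName];
  // Assumed default: elements nobody has decided on yet are flagged.
  if (rule === undefined) return "error";
  return typeof rule === "function" ? rule(el) : rule;
}
```

Keeping the rules in a lookup table would make the spec easy to review element by element, and falling back to "error" means anything not yet decided gets flagged rather than silently converted.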
Similarly, the spec could list all HTML attributes and describe the converter's behaviour for each, which would be more or less one of the following:
- throw an error: this is for attributes that we need to have hand-cleaned out of the source before conversion. For example, I'd expect `style` to be in this category
- discard the attribute
- something special (for example, an `id` attribute could throw an error except when it's attached to a heading, and then it could be discarded)
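The attribute rules can be sketched the same way. This minimal TypeScript sketch encodes the two decisions mentioned here (`style` is an error; `id` is discarded on headings and an error elsewhere). The function names, the `AttrAction` type, and the conservative choice to flag every attribute that hasn't been reviewed yet are hypothetical assumptions for illustration only.

```typescript
// Hypothetical sketch of a per-attribute conversion policy.
// Only the `style` and `id` decisions come from the discussion above.

type AttrAction = "error" | "discard";

const HEADING_TAGS = new Set(["h1", "h2", "h3", "h4", "h5", "h6"]);

// Decide what to do with one attribute, given the element it is on.
function attributeAction(tagName: string, attrName: string): AttrAction {
  switch (attrName) {
    case "style":
      // Inline styles must be hand-cleaned before conversion.
      return "error";
    case "id":
      // Special case: discarded on headings, an error anywhere else.
      return HEADING_TAGS.has(tagName) ? "discard" : "error";
    default:
      // Assumed default: attributes nobody has reviewed yet are flagged.
      return "error";
  }
}

// Hypothetical helper: the attributes on one element that need hand-cleaning.
function attributesNeedingCleanup(
  tagName: string,
  attributes: Record<string, string>,
): string[] {
  return Object.keys(attributes).filter(
    (name) => attributeAction(tagName, name) === "error",
  );
}
```

For example, `attributesNeedingCleanup("h2", { id: "Syntax", style: "color: red" })` would report only `style`, since the heading's `id` is simply discarded.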
As for the third piece, updating Yari tooling to work with Markdown, this is something I've not thought about at all, and the dev team definitely has more insight than I do. For example, the flaws system currently assumes it's dealing with HTML.