First, the Markdown source goes through the tokenizer.
The tokenizer splits the input into paragraphs, which gives this kind of structure:
(("text of paragraph 1") ("text of paragraph 2"))
Then the tokenizer splits the text of each paragraph at the special characters, which gives this kind of structure:
(("text of " "*paragraph 1*" " yes man") ("[text of]" "[0]" " wesh"))
This kind of structure is passed to the parser.
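The two tokenizer passes described above can be sketched as follows. This is only an illustration in Python, with nested lists standing in for the nested structure shown above; the names tokenize and SPECIAL are hypothetical, and the set of special spans (*emphasis*, bracketed groups, and a reference label such as "[0]:") is an assumption based on the examples in this document.

```python
import re

# Assumed special spans: *emphasis*, "[...]" groups, and "[id]:" labels.
SPECIAL = re.compile(r"(\*[^*\n]+\*|\[[^\]\n]*\]:?)")


def tokenize(markdown):
    """Split Markdown into paragraphs, then each paragraph into pieces."""
    # Pass 1: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", markdown.strip()) if p]
    # Pass 2: split around each special span. The capturing group makes
    # re.split keep the spans; empty pieces are dropped.
    return [[piece for piece in SPECIAL.split(p) if piece] for p in paragraphs]
```

Calling tokenize on a string with two blank-line-separated paragraphs returns one inner list per paragraph, where any piece that starts with a special character is a span the parser can recognise later.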
The parser goes through each paragraph.
- If it is a two-line paragraph whose second line is === or ---, it returns this kind of structure:
(h1 . "text of h1")
- If every line starts with #, a number (in the correct order), - or *, it returns this kind of structure:
(ul . ((li . "text 1") (li . "text 2")))
- Otherwise, it returns this kind of structure:
(p . "text of the paragraph")
- Finally, it goes through each string of the paragraph. A string that starts with a special character was split out by the tokenizer, so the parser replaces it accordingly (for example, "*paragraph 1*" becomes (em . "paragraph 1")).
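The parser rules above can be sketched as follows. This is a rough Python illustration under several assumptions: tuples such as ("h1", "text") stand in for the dotted pairs, every list becomes a ul, the "correct order" check on numbers is left out, and link references such as "[0]:" are collected in a first pass. All function and constant names are hypothetical.

```python
import re

LIST_MARKER = re.compile(r"^(?:#|\d+\.|-|\*)\s+")  # "#", "1.", "-" or "*"
UNDERLINE = re.compile(r"^(?:=+|-+)$")              # "===" or "---"
REF_DEF = re.compile(r"^\[([^\]]+)\]:$")            # "[0]:"


def split_lines(tokens):
    """Regroup the tokens of one paragraph line by line."""
    lines = [[]]
    for tok in tokens:
        first, *rest = tok.split("\n")
        if first:
            lines[-1].append(first)
        for part in rest:
            lines.append([part] if part else [])
    return lines


def inline(tokens, refs):
    """Replace tokens that start with a special character by nodes."""
    nodes, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if len(tok) > 2 and tok[0] == "*" and tok[-1] == "*":
            nodes.append(("em", tok[1:-1]))
        elif tok[:1] == "[" and nxt[:1] == "[" and nxt[1:-1] in refs:
            nodes.append(("a", {"text": tok[1:-1], "href": refs[nxt[1:-1]]}))
            i += 1  # the "[id]" token is consumed as well
        else:
            nodes.append(tok)
        i += 1
    return nodes


def parse_block(tokens, refs):
    lines = split_lines(tokens)
    texts = ["".join(line) for line in lines]
    if len(lines) == 2 and UNDERLINE.match(texts[1]):
        return ("h1", texts[0])
    if all(LIST_MARKER.match(text) for text in texts):
        items = []
        for line in lines:
            # Strip the marker from the first token, then parse the rest.
            line = [LIST_MARKER.sub("", line[0])] + line[1:]
            items.append(("li", inline([t for t in line if t], refs)))
        return ("ul", items)
    return ("p", inline([t.replace("\n", " ") for t in tokens], refs))


def parse(paragraphs):
    """Collect reference definitions, then parse the remaining paragraphs."""
    refs, blocks = {}, []
    for tokens in paragraphs:
        match = REF_DEF.match(tokens[0])
        if match:
            refs[match.group(1)] = "".join(tokens[1:]).strip()
        else:
            blocks.append(tokens)
    return [parse_block(tokens, refs) for tokens in blocks]
```

The order of the checks mirrors the list above: heading first, then list, then plain paragraph. Edge cases such as a list marker that is split across two tokens are not handled in this sketch.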
The last step, AST to HTML, reads the structure returned by the parser and emits the corresponding HTML.
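A minimal sketch of this step, assuming the parser output is modelled in Python as a list of (tag, content) tuples, where content is either a string or a list of strings and nested (tag, value) nodes, and where a link node carries a dictionary with "text" and "href" keys. The names render and render_inline are hypothetical, and the output is more compact than hand-formatted HTML.

```python
from html import escape


def render_inline(content):
    """Render a string, or a list of strings and (tag, value) nodes."""
    if isinstance(content, str):
        return escape(content)
    out = []
    for node in content:
        if isinstance(node, str):
            out.append(escape(node))
        elif node[0] == "a":
            attrs = node[1]
            out.append('<a href="%s">%s</a>'
                       % (escape(attrs["href"], quote=True), escape(attrs["text"])))
        else:
            out.append("<%s>%s</%s>" % (node[0], render_inline(node[1]), node[0]))
    return "".join(out)


def render(ast):
    """Turn the list of block nodes into an HTML string, one block per line."""
    html = []
    for tag, content in ast:
        if tag == "ul":
            items = "".join("<li>%s</li>" % render_inline(c) for _, c in content)
            html.append("<ul>%s</ul>" % items)
        else:
            html.append("<%s>%s</%s>" % (tag, render_inline(content), tag))
    return "\n".join(html)
```

Text is passed through html.escape so that characters such as < or & in the source do not break the generated markup; whether the original implementation escapes text is not stated here.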
Example input:
some text
===

and another
text
in a *paragraph 1* but yeah

- you [got it][0]

[0]: http://google.fr
Tokenizer:
(
("some text\n===")
("and another\ntext\nin a " "*paragraph 1*" " but yeah")
("- you " "[got it]" "[0]")
("[0]:" " http://google.fr")
)
Parser:
(
(h1 . "some text")
(p . (("and another text in a ") (em . "paragraph 1") (" but yeah")))
(ul . ((li . ("you " (a . (:text "got it" :href "http://google.fr"))))))
)
AST to HTML:
<h1>some text</h1>
<p>
and another text in a <em>paragraph 1</em> but yeah
</p>
<ul>
<li>you <a href="http://google.fr">got it</a></li>
</ul>