ralt/gist:5484237

## gistfile1.md

      
### The tokenizer

First, the Markdown source goes through the tokenizer.
The tokenizer splits the text into paragraphs, which gives this kind of structure:

    (("text of paragraph 1") ("text of paragraph 2"))

Then, the tokenizer finds the special characters and splits each paragraph around the spans they delimit, which gives this kind of structure:

    (("text of " "*paragraph 1*" " yes man") ("[text of]" "[0]" " wesh"))

This kind of structure is passed to the parser.
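
The two passes described above can be sketched in Python, with nested lists standing in for the Lisp lists. This is only a minimal sketch: the name `tokenize` and the regular expression are assumptions, and only `*emphasis*` and `[bracketed]` spans are treated as special.

```python
import re

# Assumption: only *emphasis* and [bracketed] spans count as "special".
SPECIAL = re.compile(r"(\*[^*\n]+\*|\[[^\]\n]+\])")

def tokenize(markdown):
    """Split markdown into paragraphs, then each paragraph into tokens."""
    # Pass 1: paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", markdown.strip()) if p]
    # Pass 2: re.split keeps the captured special spans as their own tokens;
    # the empty strings it produces between adjacent spans are dropped.
    return [[tok for tok in SPECIAL.split(p) if tok] for p in paragraphs]
```

Calling `tokenize("text of *paragraph 1* yes man\n\n[text of][0] wesh")` yields the second structure shown above as a list of lists. The sketch does not treat link definitions such as `[0]: url` specially; it simply splits off the `[0]` part.
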

### The parser

The parser goes through each paragraph.


If the paragraph has two lines and the second line is `===` or `---`, it returns this kind of structure (with `h2` instead of `h1` for `---`, as in standard Markdown):

    (h1 . "text of h1")


If each line starts with `#`, a number (with the numbers in the correct order), `-` or `*`, it returns this kind of structure:

    (ul . ((li . "text 1") (li . "text 2")))


Otherwise, it returns this kind of structure:

    (p . "text of the paragraph")
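
The three cases above can be sketched as one function. This is a rough illustration, not the actual parser: `parse_block` is a hypothetical name, it works on the raw text of a paragraph (inline tokens are ignored here), Python tuples stand in for the dotted pairs, and only the `-`, `*` and numbered markers are handled. Mapping `---` to `h2` and numbered items to `ol` are assumptions borrowed from standard Markdown.

```python
import re

def parse_block(paragraph):
    """Classify one paragraph as a heading, a list, or a plain paragraph."""
    lines = paragraph.split("\n")

    # Case 1: two lines, the second made only of '=' or only of '-'.
    if len(lines) == 2 and re.fullmatch(r"=+|-+", lines[1].strip()):
        # Assumption: '=' gives h1 and '-' gives h2.
        tag = "h1" if lines[1].strip()[0] == "=" else "h2"
        return (tag, lines[0])

    # Case 2a: every line starts with '-' or '*' followed by a space.
    if all(re.match(r"[-*] ", line) for line in lines):
        return ("ul", [("li", line[2:]) for line in lines])

    # Case 2b: every line starts with "N. " and the numbers run 1, 2, 3, ...
    numbered = [re.match(r"(\d+)\. (.*)", line) for line in lines]
    if all(numbered):
        numbers = [int(m.group(1)) for m in numbered]
        if numbers == list(range(1, len(lines) + 1)):
            # Assumption: numbered items become an ordered list.
            return ("ol", [("li", m.group(2)) for m in numbered])

    # Case 3: anything else is a paragraph; line breaks become spaces.
    return ("p", " ".join(lines))
```

For instance, `parse_block("- text 1\n- text 2")` returns `("ul", [("li", "text 1"), ("li", "text 2")])`, while a numbered paragraph whose numbers are out of order falls through to the `p` case.
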


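Within each paragraph, the parser then goes through the tokens. A token that starts with a special character was split out by the tokenizer, so the parser replaces it with the matching node: for example, `"*paragraph 1*"` becomes `(em . "paragraph 1")`.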
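
The replacement step can be sketched as follows. The name `parse_inline` and the `refs` dictionary are assumptions: the sketch supposes that link definitions such as `[0]: url` have already been collected into `refs`, and it uses a Python dict where the document uses a `(:text ... :href ...)` list.

```python
def parse_inline(tokens, refs):
    """Turn tokenizer output into inline nodes; plain strings pass through."""
    nodes = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if len(tok) > 2 and tok.startswith("*") and tok.endswith("*"):
            # "*text*" was split out by the tokenizer: emphasis.
            nodes.append(("em", tok[1:-1]))
        elif (tok.startswith("[") and i + 1 < len(tokens)
              and tokens[i + 1].startswith("[")):
            # "[text]" followed by "[id]": a reference-style link.
            ref_id = tokens[i + 1][1:-1]
            # Assumption: an unknown id yields an empty href.
            href = refs.get(ref_id, "")
            nodes.append(("a", {"text": tok[1:-1], "href": href}))
            i += 1  # the "[id]" token is consumed as well
        else:
            nodes.append(tok)
        i += 1
    return nodes
```

With `refs = {"0": "http://google.fr"}`, the tokens `["you ", "[got it]", "[0]"]` become `["you ", ("a", {"text": "got it", "href": "http://google.fr"})]`.
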


### The AST to HTML transformer

It reads the structure returned by the parser and emits the corresponding HTML.
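
A minimal sketch of such a transformer, assuming the tree is represented as Python `(tag, content)` tuples, lists of nodes, and plain strings (the name `to_html` and this representation are assumptions, not the actual implementation):

```python
from html import escape

def to_html(node):
    """Render a (tag, content) node, a list of nodes, or a plain string."""
    if isinstance(node, str):
        # Plain text: escape &, < and > so it cannot break the markup.
        return escape(node, quote=False)
    if isinstance(node, list):
        # A sequence of sibling nodes is rendered in order.
        return "".join(to_html(child) for child in node)
    tag, content = node
    if tag == "a":
        # Links carry a dict with "text" and "href" instead of child nodes.
        return '<a href="{}">{}</a>'.format(
            escape(content["href"]), escape(content["text"], quote=False))
    # Every other tag simply wraps its rendered content.
    return "<{0}>{1}</{0}>".format(tag, to_html(content))
```

The sketch emits no whitespace between elements, so its output is more compact than hand-formatted HTML.
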

### Example

Markdown input:

    some text
    ===

    and another
    text
    in a *paragraph 1* but yeah

    - you [got it][0]

    [0]: http://google.fr

Tokenizer:

    (
        ("some text\n===")
        ("and another\ntext\nin a " "*paragraph 1*" " but yeah")
        ("- you " "[got it]" "[0]")
        ("[0]:" " http://google.fr")
    )

Parser:

    (
        (h1 . "some text")
        (p . ("and another text in a " (em . "paragraph 1") " but yeah"))
        (ul . ((li . ("you " (a . (:text "got it" :href "http://google.fr"))))))
    )

AST to HTML:

    <h1>some text</h1>

    <p>
    and another text in a <em>paragraph 1</em> but yeah
    </p>

    <ul>
        <li>you <a href="http://google.fr">got it</a></li>
    </ul>