@alex-s168
Created January 23, 2024 11:56
Compiler Pipeline

What happens in the background when we enter tcc main.c -o out? To find out, we need to explore the stages that a compiler goes through.

1. (optional) preprocessor

Takes in code, parses it, includes files, parses those, includes more files, expands macros, ... This can also happen after (or during) the main lexing stage, which is a much smarter approach.
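
A minimal sketch of the macro-expansion part, assuming only object-like `#define NAME value` macros (no includes, no function-like macros). The function name `expand_macros` is made up for this example.

```python
import re

def expand_macros(source):
    """Expand object-like `#define NAME value` macros, line by line."""
    macros = {}
    output = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#define"):
            # "#define NAME replacement text" -> remember it, emit nothing.
            parts = stripped.split(None, 2)
            macros[parts[1]] = parts[2] if len(parts) > 2 else ""
            continue

        # Replace whole identifiers only, so MAX never matches inside MAXIMUM.
        def substitute(match):
            word = match.group(0)
            return macros.get(word, word)

        output.append(re.sub(r"[A-Za-z_]\w*", substitute, line))
    return "\n".join(output)

print(expand_macros("#define N 10\nint a[N];"))
```

Because this works on raw text, it would also replace a macro name that appears inside a string literal. That is probably one reason why expanding macros on tokens instead of text is the smarter option.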

2. lexer (aka tokenizer)

Tokenizes your code. Example: printf("Hello, %s\n", getName()); -> printf ( "Hello, %s\n" , getName ( ) ) ;
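
A tokenizer for this example could be sketched with one regular expression. The token kinds (`string`, `ident`, `number`, `punct`) are assumptions for this sketch; a real C lexer knows many more.

```python
import re

TOKEN_PATTERN = re.compile(r'''
    (?P<string>"(?:\\.|[^"\\])*")   # string literal, escapes allowed
  | (?P<ident>[A-Za-z_]\w*)         # identifiers such as printf
  | (?P<number>\d+)                 # integer literals
  | (?P<punct>[(),;])               # the punctuation used in the example
  | (?P<space>\s+)                  # whitespace, skipped below
''', re.VERBOSE)

def tokenize(code):
    """Return a list of (kind, text) pairs for the given source text."""
    tokens = []
    pos = 0
    while pos < len(code):
        match = TOKEN_PATTERN.match(code, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {code[pos]!r} at {pos}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

for kind, text in tokenize(r'printf("Hello, %s\n", getName());'):
    print(kind, text)
```

Note that the comma and space inside the string literal stay part of one `string` token, since the string rule is tried before the punctuation rule.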

3. parser

Takes these tokens and generates an AST

(side note) AST

= Abstract Syntax Tree. A tree that contains all the information needed to process the input code. The AST for this example could be:

  function-call
   |         |
 symbol      |
"printf"     |
            / \
          /     \     
       string     \
    "Hello, %s\n"   \
               function-call
                    |
                 symbol
                "getName"
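
Building that tree from the token list can be sketched as a tiny recursive-descent parser. This assumes the tokens arrive as plain strings and that the AST is stored as nested dicts; both are choices made for this sketch, and there is no error handling for malformed input.

```python
def parse_expression(tokens, pos=0):
    """Parse one expression starting at tokens[pos]; return (node, next_pos)."""
    token = tokens[pos]
    if token.startswith('"'):
        # String literal: drop the surrounding quotes.
        return {"kind": "string", "value": token[1:-1]}, pos + 1

    node = {"kind": "symbol", "name": token}
    pos += 1
    # A symbol directly followed by "(" is a function call.
    if pos < len(tokens) and tokens[pos] == "(":
        pos += 1
        args = []
        while tokens[pos] != ")":
            arg, pos = parse_expression(tokens, pos)
            args.append(arg)
            if tokens[pos] == ",":
                pos += 1
        node = {"kind": "function-call", "callee": node, "args": args}
        pos += 1  # skip the closing ")"
    return node, pos

tokens = ['printf', '(', '"Hello, %s\\n"', ',', 'getName', '(', ')', ')', ';']
ast, next_pos = parse_expression(tokens)
print(ast)
```

The nested call to `getName` falls out of the recursion: each argument is just another expression.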

4. (type) checker

Takes that AST and adds additional info (like types) to it. The new AST with types could look like this:

   (void)
   function-call
   |          |
 symbol       |
"printf"      |
(object: cba)/ \
           /    \     
       (string)   \
      string       \
    "Hello, %s\n"   \
                 (string) 
               function-call
                    |
                 symbol
                "getName"
              (object: abc)

(All these objects are themselves parts of the AST.) The checker also resolves all symbol references, checks that calls pass the right arguments, ...
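
A rough sketch of such a checker, assuming the AST is stored as nested dicts and that function signatures live in a hand-written table. The `SIGNATURES` table and its type names are hypothetical and only mirror the diagram above; a real checker would build this table from declarations.

```python
# Hypothetical signature table; "..." marks a variadic parameter list.
SIGNATURES = {
    "printf": {"params": ["string", "..."], "returns": "void"},
    "getName": {"params": [], "returns": "string"},
}

def check(node):
    """Annotate node (and its children) with a "type" entry and return it."""
    if node["kind"] == "string":
        node["type"] = "string"
    elif node["kind"] == "function-call":
        name = node["callee"]["name"]
        if name not in SIGNATURES:
            raise NameError(f"unknown function {name!r}")
        signature = SIGNATURES[name]
        node["callee"]["object"] = signature  # resolve the symbol reference
        arg_types = [check(arg) for arg in node["args"]]
        params = signature["params"]
        variadic = params[-1:] == ["..."]
        fixed = params[:-1] if variadic else params
        too_few = len(arg_types) < len(fixed)
        too_many = not variadic and len(arg_types) > len(fixed)
        if too_few or too_many:
            raise TypeError(f"wrong number of arguments for {name!r}")
        for expected, actual in zip(fixed, arg_types):
            if expected != actual:
                raise TypeError(f"{name!r} expects {expected}, got {actual}")
        node["type"] = signature["returns"]
    else:
        raise TypeError(f"cannot type a bare {node['kind']} node")
    return node["type"]

ast = {
    "kind": "function-call",
    "callee": {"kind": "symbol", "name": "printf"},
    "args": [
        {"kind": "string", "value": "Hello, %s\\n"},
        {"kind": "function-call",
         "callee": {"kind": "symbol", "name": "getName"}, "args": []},
    ],
}
print(check(ast), ast["args"][1]["type"])
```

The checker mutates the tree in place here. Producing a new, typed AST instead is just as valid and keeps the untyped tree around for error messages.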

5. (optional) "optimizer"

Takes the AST and removes unused things, optimizes loops, ... (This can also happen in the type checker, but I strongly advise against doing it there because of the added code complexity.)
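
"Removing unused things" could, for example, mean dropping functions that are never reached from the entry point. A minimal sketch, assuming the program has already been reduced to a call graph (a dict from function name to the names it calls); this representation and the function names are made up for the example.

```python
def remove_unused_functions(functions, entry="main"):
    """Keep only the functions reachable from `entry`.

    `functions` maps a function name to the list of names it calls.
    Callees without a definition here (e.g. library functions) are ignored.
    """
    reachable = set()
    worklist = [entry]
    while worklist:
        name = worklist.pop()
        if name in reachable or name not in functions:
            continue
        reachable.add(name)
        worklist.extend(functions[name])
    return {name: calls for name, calls in functions.items() if name in reachable}

program = {
    "main": ["printf", "getName"],
    "getName": [],
    "unusedHelper": ["getName"],
}
print(list(remove_unused_functions(program)))
```

Keeping this as its own pass over the tree, separate from type checking, means each stage only has one job.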

6. Code Gen

Generates one of these (and also does optimizations):

  • assembly or machine code (for example tcc)
  • vm bytecode (for example javac)
  • IR code (for example clang)
  • code of other programming languages (for example tsc)
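
As a sketch of the VM bytecode option: walking the AST and emitting instructions for a made-up stack machine. The `PUSH_STR` and `CALL` instructions are invented for this example and do not belong to any real VM; the AST is assumed to be stored as nested dicts.

```python
def generate(node, out=None):
    """Append stack-machine instructions for `node` to `out` and return it."""
    out = [] if out is None else out
    if node["kind"] == "string":
        out.append(("PUSH_STR", node["value"]))
    elif node["kind"] == "function-call":
        # Arguments are evaluated left to right and left on the stack,
        # then CALL pops them and pushes the result.
        for arg in node["args"]:
            generate(arg, out)
        out.append(("CALL", node["callee"]["name"], len(node["args"])))
    else:
        raise ValueError(f"cannot generate code for {node['kind']!r}")
    return out

ast = {
    "kind": "function-call",
    "callee": {"kind": "symbol", "name": "printf"},
    "args": [
        {"kind": "string", "value": "Hello, %s\\n"},
        {"kind": "function-call",
         "callee": {"kind": "symbol", "name": "getName"}, "args": []},
    ],
}
for instruction in generate(ast):
    print(instruction)
```

The inner call to `getName` is emitted before the call to `printf`, because its result has to be on the stack when `printf` runs.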