What happens in the background when we enter tcc main.c -o out? To find out, we need to explore the stages that a compiler goes through.
The preprocessor takes in code, parses it, includes files, parses them, includes more files, expands macros, ... This can also happen after (or during) the main lexing pass (which is way smarter).
The lexer tokenizes your code.
example: printf("Hello, %s\n", getName());
-> printf
(
"Hello, %s\n"
,
getName
(
)
)
;
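To make the tokenizing step concrete, here is a minimal sketch of a lexer that produces the token list above. It only knows identifiers, string literals, and single-character punctuation. The names `tokenize` and `MAX_TOKEN_LEN` are made up for this example, and a real lexer (tcc's included) also handles numbers, comments, multi-character operators and much more.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_TOKEN_LEN 64

/* Splits `src` into tokens: identifiers, string literals, and
 * single-character punctuation. Writes at most `max_tokens` tokens
 * into `out` and returns how many were written. */
size_t tokenize(const char *src, char out[][MAX_TOKEN_LEN], size_t max_tokens)
{
    size_t count = 0;
    const char *p = src;

    while (*p != '\0' && count < max_tokens) {
        const char *start = p;

        if (isspace((unsigned char)*p)) {
            p++;                      /* whitespace only separates tokens */
            continue;
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            /* identifier: letter or '_' followed by letters, digits, '_' */
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
        } else if (*p == '"') {
            /* string literal: runs to the closing quote, honoring \" */
            p++;
            while (*p != '\0' && *p != '"') {
                if (*p == '\\' && p[1] != '\0')
                    p++;              /* skip the escaped character */
                p++;
            }
            if (*p == '"')
                p++;                  /* include the closing quote */
        } else {
            p++;                      /* anything else: one-character token */
        }

        size_t len = (size_t)(p - start);
        if (len >= MAX_TOKEN_LEN)
            len = MAX_TOKEN_LEN - 1;  /* truncate overly long tokens */
        memcpy(out[count], start, len);
        out[count][len] = '\0';
        count++;
    }
    return count;
}
```

Called on the example line, this sketch yields nine tokens in the same order as listed above. Storing tokens as plain strings keeps the example short; a real lexer would usually also record a token kind and a source position.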
The parser takes these tokens and generates an AST (Abstract Syntax Tree): a tree that contains all the information needed to process the input code. The AST for this example could be:
       function-call
        |         |
      symbol      |
     "printf"     |
                 / \
                /   \
           string    \
    "Hello, %s\n"     \
                  function-call
                       |
                     symbol
                    "getName"
The type checker takes that AST and adds additional information (like types) to it. The new AST with types could look like this:
           (void)
       function-call
        |         |
      symbol      |
     "printf"     |
  (object: cba)  / \
                /   \
          (string)   \
           string     \
    "Hello, %s\n"      \
                     (string)
                  function-call
                       |
                     symbol
                    "getName"
                  (object: abc)
(All these objects are themselves parts of the AST.) It also resolves all symbol references, checks arguments, ...
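The annotation step could be sketched roughly like this. The code assumes a tiny hard-coded symbol table in which `printf` yields void and `getName` yields a string, mirroring the tree above. All names (`Type`, `FunctionDecl`, `Expr`, `resolve_symbol`, `annotate`) are hypothetical, and argument checking is left out to keep the sketch small.

```c
#include <stddef.h>
#include <string.h>

typedef enum { TYPE_UNKNOWN, TYPE_VOID, TYPE_STRING } Type;

typedef struct {
    const char *name;
    Type return_type;
} FunctionDecl;

/* Hypothetical symbol table; a real compiler fills this from declarations. */
static const FunctionDecl symbol_table[] = {
    { "printf",  TYPE_VOID },
    { "getName", TYPE_STRING },
};

/* Looks a function up by name. Returns NULL if it is not declared. */
const FunctionDecl *resolve_symbol(const char *name)
{
    size_t n = sizeof symbol_table / sizeof symbol_table[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(symbol_table[i].name, name) == 0)
            return &symbol_table[i];
    return NULL;
}

typedef enum { EXPR_STRING, EXPR_CALL } ExprKind;

typedef struct Expr {
    ExprKind kind;
    const char *text;          /* string contents or callee name */
    struct Expr **args;        /* EXPR_CALL only */
    size_t arg_count;
    Type type;                 /* filled in by annotate() */
    const FunctionDecl *decl;  /* EXPR_CALL only: resolved callee */
} Expr;

/* Resolves callee symbols and fills in the type of every node.
 * Returns 0 on success, -1 if a symbol cannot be resolved. */
int annotate(Expr *e)
{
    if (e->kind == EXPR_STRING) {
        e->type = TYPE_STRING;
        return 0;
    }
    e->decl = resolve_symbol(e->text);
    if (e->decl == NULL)
        return -1;
    for (size_t i = 0; i < e->arg_count; i++)
        if (annotate(e->args[i]) != 0)
            return -1;
    e->type = e->decl->return_type;
    return 0;
}
```

After `annotate` succeeds on the example expression, the outer call carries the void type, the inner call and the literal carry the string type, and each call points at its resolved declaration, which corresponds to the (object: ...) annotations in the tree.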
The optimizer takes the AST and removes unused things, optimizes loops, ... (This can also happen in the type checker, but I strongly advise against doing it there because of the added code complexity.)
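A very small sketch of "removing unused things": given a list of variable declarations with a use count, drop the ones that are never read. It assumes the use counts were collected by an earlier pass and that the declarations have no side effects. `VarDecl` and `remove_unused` are names invented for this example.

```c
#include <stddef.h>

typedef struct {
    const char *name;
    int use_count;   /* how many times later code reads the variable */
} VarDecl;

/* Removes declarations that are never read, compacting the array in
 * place and preserving order. Returns the number of declarations kept. */
size_t remove_unused(VarDecl *decls, size_t count)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        if (decls[i].use_count > 0)
            decls[kept++] = decls[i];
    }
    return kept;
}
```

Real optimizers work on much richer data, but the shape is the same: walk the program, decide what is provably unnecessary, and emit a smaller program that behaves identically.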
The code generator generates one of these (and also does optimizations):
- assembly or machine code (for example tcc)
- VM bytecode (for example javac)
- IR code (for example clang)
- code in another programming language (for example tsc)
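To give a feel for the last stage, the sketch below walks the example call and emits text for an imaginary stack machine. The instructions `PUSH_STR` and `CALL` are invented for this example and do not correspond to the output of tcc, javac, clang or tsc. The `Call` struct is a deliberately simplified stand-in for an AST node.

```c
#include <stdio.h>
#include <string.h>

typedef struct Call {
    const char *callee;
    const char *string_arg;       /* NULL if there is no string argument */
    const struct Call *call_arg;  /* NULL if there is no nested call */
} Call;

/* Appends "<op> <operand>\n" to the NUL-terminated text in `buf`,
 * never writing past `cap` bytes. */
static void emit(char *buf, size_t cap, const char *op, const char *operand)
{
    size_t used = strlen(buf);
    if (used < cap)
        snprintf(buf + used, cap - used, "%s %s\n", op, operand);
}

/* Emits code for the arguments first (left to right), then the call.
 * `buf` must hold a NUL-terminated string on entry, e.g. "". */
void gen_call(const Call *c, char *buf, size_t cap)
{
    if (c->string_arg != NULL)
        emit(buf, cap, "PUSH_STR", c->string_arg);
    if (c->call_arg != NULL)
        gen_call(c->call_arg, buf, cap);  /* its result stays on the stack */
    emit(buf, cap, "CALL", c->callee);
}
```

For the running example this produces three lines: a `PUSH_STR` for the format string, a `CALL getName`, and finally a `CALL printf`. Arguments are emitted before the call that consumes them, which is the usual order for a stack-based target.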