Skip to content

Instantly share code, notes, and snippets.

@srcreigh
Created January 14, 2016 20:49
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save srcreigh/522bfd4ea971612cf62e to your computer and use it in GitHub Desktop.
Save srcreigh/522bfd4ea971612cf62e to your computer and use it in GitHub Desktop.
notes from my compiler construction course
January 14, 2016
Thursday
COMPILER CONSTRUCTION
1. TOKENS AND KINDS
===================
|------|-------------|
| kind | RE |
|======|=============|
| IF | if |
| ID | [a-z][a-z]* |
|------|-------------|
figure 1.1: examples of kinds and the regular expressions for the kinds
2. CONTEXT-FREE GRAMMARS
========================
2.1 DEFINITIONS
---------------
- a context-free grammar is a 4-tuple G = <N,T,R,S> where
- set of terminals T
- set of non-terminals N
- start symbol S ε N
- production rules R ⊆ N x V*
- symbols are V = T ∪ N
- sequences of symbols V*
- sequences of terminals T*
2.2 CONVENTIONS
---------------
- a, b, c ε T
- lowercase letters at the start of the alphabet
- A, B, C, S ε N
- uppercase letters at the start of the alphabet (and S, the start symbol)
- A → α ε R
- rules have arrows →
- X, Y, Z ε V
- uppercase letters at the end of the alphabet are symbols (either terminal or
non-terminal)
- α, β, γ ε V*
- lowercase greek letters are sequences of symbols (terminal or non-terminal)
- x, y, z ε T*
- lowercase letters at the end of the alphabet are sequences of terminal
symbols
2.3 DERIVING A LANGUAGE FROM A GRAMMAR
--------------------------------------
- βAγ ⇒ βαγ if A → α ε R
- "⇒" is pronounced "directly derives"
- α ⇒* β if α = β or (a ⇒ γ and γ ⇒* β)
- "⇒*" is pronounced "derives"
- L(G) = { x ε T* | S ⇒* x }
! the set of sequences of non-terminals that can be derived from the start
symbol
2.4 PARSING
-----------
- want to know for a given x whether x ε L(G)
- a "proof" is a derivation of x
- or a parse tree (which is just a graphical representation of a derivation)
- in a right derivation, each step is of the form βAx ⇒ βαx
- expands the rightmost non-terminal
- similarly, left derivations are such that each step is of the form xAγ ⇒ xαγ
- parse trees, left derivations, and right derivations are all isomorphic
- this is by design
- we want programs to have unambiguous meanings
? we *could* derive rules from the middle, but then ...? the parse trees
wouldn't be ambiguous?
- a grammar is ambiguous if some word has more than one parse tree (or
left-derivation or right-derivation)
E → F | F - F
F → ID | NUM
figure 2.4.1: rules
F - F
/ \
F F
| |
NUM - NUM
figure 2.4.2: parse tree for "NUM - NUM"
E E
⇒ F - F ⇒ F - F
⇒ NUM - F ⇒ F - NUM
⇒ NUM - NUM ⇒ NUM - NUM
figure 2.4.3: left and right (respectively) derivations for "NUM - NUM"
E → F | E - E
F → ID | NUM
figure 2.4.4: updated rules (notice the E → E - E rule)
|------------------------------------|
| E |
| /|\ -----E |
| / | \ / / \ |
| / | E E | \ |
| / / /|\ /|\ | | |
| / / / | \ / | \ | | |
| E | E | E E | E | E |
| | | | | | | | | | | |
| F | F | F F | F | F |
| | | | | | | | | | | |
| NUM - NUM - NUM NUM - NUM - NUM |
|------------------------------------|
figure 2.4.5: two separate parse trees for expression "NUM - NUM - NUM"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment