Skip to content

Instantly share code, notes, and snippets.

@clholgat
Created May 1, 2012 01:22
Show Gist options
  • Save clholgat/2564163 to your computer and use it in GitHub Desktop.
Save clholgat/2564163 to your computer and use it in GitHub Desktop.
ice9 Spec
EBNF for ice9
All input is enclosed in quotes. This is meant to be taken literally.
Anything outside of quotes has a meta-meaning. For example,
{ exp }
means zero or more exp's. Whereas,
'{' exp '}'
means LBRACE followed by exp followed by RBRACE.
The start symbol is program.
########### Begin EBNF ############
program -> {var|type|forward|proc} { stm } # empty file is valid program
# 3 complex tokens (these are usually handled in the lexer)
id -> [A-Za-z][A-Za-z0-9_]*
int -> [0-9]+ # decimal integer literal
string -> "[^"\n]*" # double-quoted string: any char but " and \n
-> '[^'\n]*' # single-quoted string: any char but ' and \n
stms -> stm { stm }
stm -> if | do | fa | 'break' ';' | 'exit' ';'
-> 'return' ';'
-> lvalue ':=' exp ';' # assignment statement
-> 'write' exp ';' | 'writes' exp ';'
-> exp ';' # any exp is valid
-> ';' # the "empty" statement
if -> 'if' exp '->' stms { '[]' exp '->' stms } 'fi'
-> 'if' exp '->' stms { '[]' exp '->' stms } '[]' 'else' '->' stms 'fi'
do -> 'do' exp '->' { stm } 'od'
fa -> 'fa' id ':=' exp 'to' exp '->' { stm } 'af'
proc -> 'proc' id '(' declist ')'
{type|var} {stm} 'end'
-> 'proc' id '(' declist ')' ':' typeid
{type|var} {stm} 'end'
idlist -> id { ',' id}
var -> 'var' varlist ';'
varlist -> idlist ':' typeid { '[' int ']' } { ',' varlist}
forward -> 'forward' id '(' declist ')' ';'
-> 'forward' id '(' declist ')' ':' typeid ';'
type -> 'type' id '=' typeid { '[' int ']' } ';'
declist -> declistx
-> # empty
declistx -> idlist ':' typeid { ',' declistx }
typeid -> id # This for semantics; see semantic note #3 below
lvalue -> id
-> lvalue '[' exp ']'
exp -> lvalue
-> int # integer literal
-> 'true' # boolean literal
-> 'false' # boolean literal
-> string
-> 'read'
-> '-' exp
-> '?' exp
-> id '(' ')' # procedure call
-> id '(' exp { ',' exp } ')' # procedure call
-> exp '+' exp
-> exp '-' exp
-> exp '*' exp
-> exp '/' exp
-> exp '%' exp
-> exp '=' exp
-> exp '!=' exp
-> exp '>' exp
-> exp '<' exp
-> exp '>=' exp
-> exp '<=' exp
-> '(' exp ')'
########### End EBNF ############
############
# Comments #
############
A comment extends from the first sharp (#) on a line to the end of the
line (determined by the newline character or end of file).
#############
# Semantics #
#############
1. The expressions in 'if' and 'do' statements must be bool values.
For example:
var x: int;
if x -> ... fi
is not semantically correct. Instead use
if x != 0 -> ... fi
2. The 'fa' statement
fa loop := lo to hi -> body af
executes the body once for every value of loop between lo and hi,
inclusive. The expressions in 'fa' must be int. The expressions
are executed exactly once and the body is executed at most hi - lo
+ 1 times. The loop variable is an int and it is not assignable in
the body of the loop. If hi = lo, the loop executes once. If hi <
lo, the statements in the loop are not executed. The scope of the
loop variable is the fa statement body: from the '->' to 'af'.
It's declaration hides any previously declared variable with the
same name.
3. The rule
typeid -> id
is only denoting semantics. The terminals id and typeid are
syntactically indistinguishable. However, semantically typeid
represents an id referring to a type (found in a different symbol
table from the other ids).
4. The read expression returns an integer. Thus
var a: int;
a := read;
is correct.
5. The write statement outputs an expression AND a newline. The
writes statement omits the newline.
6. The write and writes statements are over-loaded. If type of exp is
an integer, then each evaluates the integer expression and outputs a
decimal representation of it. For example:
writes 1+5;
outputs the decimal '6'. On the other hand, if exp is a string,
then the string is output. For example:
writes "string"
outputs the strings 'string'.
7. The exit statement terminates the program.
8. Statements in the outermost scope are executed in order at startup.
(Think of the outermost scope as being the C main procedure.)
9. Boolean expressions are short-circuit evaluated. For example, in the
following expression the function f is not evaluated because the answer
to the expression is known after the first clause.
true + f()
10. Functions (procedures that return a value) are declared by
annotating a proc declaration with a return type. For example:
proc f() : int ... end
This defines an implicit return variable with the same name as the
function. In the example above there is a local variable named
'f' of type int. The value returned by the function is the value
in f when it returns. Functions may only return basic types.
11. The break statement is valid only in loops (fa or do). It causes
execution of the loop to stop. Control jumps immediately to the
first statement following the loop.
12. The return statement in a proc definition forces a return from the
procedure. A return outside of a proc ... end (at the level of
the file) is equivalent to an exit.
13. The following function definition is allowed but the result is
undefined.
proc f() : int end
Consider the following statement:
a := f()
The return value was not set in the function body and there is no
default value. Therefore, the program is not deterministic. It is
not an error for a function to not set the return value. (Why?)
#################
# Symbol Tables #
#################
There are three symbol tables in ice9. One each for types, variables,
and procedures. Initially, the types symbol table consists of three
entries: 'int', 'bool', and 'string' which resolve to the three basic
types. These entries are in the default scope. They can be masked by
a like definition in an inner scope. The procedure and variables
tables are initially empty.
#################
# Scoping rules #
#################
Variables, parameters, types, and procedures are visible beginning
with their declaration. If declared in a procedure, the visibility
stops at the end of the defining procedure. Otherwise visibility ends
at the end of the file. Visibility starts with the declaration
itself, allowing recursive definitions. A declaration in a nested
scope overrides a previous declaration. However, two declarations in
the same scope is a name clash.
FORWARD: A forward statement makes a name visible (it declares the
variable) before its definition. This is useful for making mutually
recursive definition. Forward declarations are only defined for
procedures.
#############
# Operators #
#############
- : T -> T, T={int,bool} // unary minus or boolean not
? : bool -> int // conversion from boolean to integer
- : int x int -> int // integer subtraction
+ : T x T -> T, T={int,bool} // integer addition or boolean or
* : T x T -> T, T={int,bool} // integer multiplication or boolean and
/ : int x int -> int // integer division
% : int x int -> int // integer modulo
= : T x T -> bool, T={int,bool} // comparison, equal
!= : T x T -> bool, T={int,bool} // comparison, not equal
> : int x int -> bool // comparison, greater than
< : int x int -> bool // comparison, less than
>= : int x int -> bool // comparison, greater than or equal to
<= : int x int -> bool // comparison, less than or equal to
:= : T x T -> nil, T={int,bool,string} // assignment
################################
# Precedence and associativity #
################################
Precedence | Associativity
(highest to lowest) |
========================|===============
- (Unary minus) ? | right
------------------------|---------------
* / % | left
------------------------|---------------
+ - | left
------------------------|---------------
= != > < >= <= | none
----------------------------------------
The ? operator does not associate semantically because of a type
conflict.
##############
# Conversion #
##############
The ? operator converts a bool value to an int. It converts true to 1
and false to 0.
i := ? true; # i gets 1
i := ? (1 != 1); # i gets 0
There is no inverse operator to ?. However, an int is easily converted
a bool:
b := x != 0;
##########
# Arrays #
##########
Arrays are indexed from 0 to size-1, where size > 0. Bounds are
checked. An index that is < 0 or > size-1 generates a runtime error.
An index that is not an integer generates a compile-time error.
#########
# Notes #
#########
1. Type checking on user-defined types is structure based. That is
two types with the same structure are equivalent even if different
names are used. As an example,
type a = int [10][5]
type b = int [5]
type q = int [5]
type r = q [10]
The type q is structurally equivalent to b, while r is structurally
equivalent to a.
2. Basic types are passed by value. Array parameters are passed by
reference, similar to the C calling convention.
3. In forward declarations, the name of the parameters are not
important. Only the names of the parameters in the actual proc
definition are meaningful.
4. All integers are signed integers.
5. The maximum integer value must be at least 2^31 - 1
(2147483647). The negative maximum integer must be at least - 2^31
(-2147483648). This corresponds to 32-bit, two's-complement signed
integers.
6. The result of the modulo takes the sign of its "dividend." For
example, 5%-2 -> 1 and -5%-2 -> -1. This is the same convention as C.
7. Whitespace: The only valid whitespace characters are space, tab, and
newline, which in ASCII are decimal 32, 9, and 10 respectively. If you work on
DOS/Windows machines you can add carriage return (decimal 13) for your
convenience.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment