-
-
Save clholgat/2564163 to your computer and use it in GitHub Desktop.
ice9 Spec
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
EBNF for ice9 | |
All input is enclosed in quotes. This is meant to be taken literally. | |
Anything outside of quotes has a meta-meaning. For example, | |
{ exp } | |
means zero or more exp's. Whereas, | |
'{' exp '}' | |
means LBRACE followed by exp followed by RBRACE. | |
The start symbol is program. | |
########### Begin EBNF ############ | |
program -> {var|type|forward|proc} { stm } # empty file is valid program | |
# 3 complex tokens (these are usually handled in the lexer) | |
id -> [A-Za-z][A-Za-z0-9_]* | |
int -> [0-9]+ # decimal integer literal | |
string -> "[^"\n]*" # double-quoted string: any char but " and \n | |
-> '[^'\n]*' # single-quoted string: any char but ' and \n | |
stms -> stm { stm } | |
stm -> if | do | fa | 'break' ';' | 'exit' ';' | |
-> 'return' ';' | |
-> lvalue ':=' exp ';' # assignment statement | |
-> 'write' exp ';' | 'writes' exp ';' | |
-> exp ';' # any exp is valid | |
-> ';' # the "empty" statement | |
if -> 'if' exp '->' stms { '[]' exp '->' stms } 'fi' | |
-> 'if' exp '->' stms { '[]' exp '->' stms } '[]' 'else' '->' stms 'fi' | |
do -> 'do' exp '->' { stm } 'od' | |
fa -> 'fa' id ':=' exp 'to' exp '->' { stm } 'af' | |
proc -> 'proc' id '(' declist ')' | |
{type|var} {stm} 'end' | |
-> 'proc' id '(' declist ')' ':' typeid | |
{type|var} {stm} 'end' | |
idlist -> id { ',' id} | |
var -> 'var' varlist ';' | |
varlist -> idlist ':' typeid { '[' int ']' } { ',' varlist} | |
forward -> 'forward' id '(' declist ')' ';' | |
-> 'forward' id '(' declist ')' ':' typeid ';' | |
type -> 'type' id '=' typeid { '[' int ']' } ';' | |
declist -> declistx | |
-> # empty | |
declistx -> idlist ':' typeid { ',' declistx } | |
typeid -> id # This for semantics; see semantic note #3 below | |
lvalue -> id | |
-> lvalue '[' exp ']' | |
exp -> lvalue | |
-> int # integer literal | |
-> 'true' # boolean literal | |
-> 'false' # boolean literal | |
-> string | |
-> 'read' | |
-> '-' exp | |
-> '?' exp | |
-> id '(' ')' # procedure call | |
-> id '(' exp { ',' exp } ')' # procedure call | |
-> exp '+' exp | |
-> exp '-' exp | |
-> exp '*' exp | |
-> exp '/' exp | |
-> exp '%' exp | |
-> exp '=' exp | |
-> exp '!=' exp | |
-> exp '>' exp | |
-> exp '<' exp | |
-> exp '>=' exp | |
-> exp '<=' exp | |
-> '(' exp ')' | |
########### End EBNF ############ | |
############ | |
# Comments # | |
############ | |
A comment extends from the first sharp (#) on a line to the end of the | |
line (determined by the newline character or end of file). | |
############# | |
# Semantics # | |
############# | |
1. The expressions in 'if' and 'do' statements must be bool values. | |
For example: | |
var x: int; | |
if x -> ... fi | |
is not semantically correct. Instead use | |
if x != 0 -> ... fi | |
2. The 'fa' statement | |
fa loop := lo to hi -> body af | |
executes the body once for every value of loop between lo and hi, | |
inclusive. The expressions in 'fa' must be int. The expressions | |
are executed exactly once and the body is executed at most hi - lo | |
+ 1 times. The loop variable is an int and it is not assignable in | |
the body of the loop. If hi = lo, the loop executes once. If hi < | |
lo, the statements in the loop are not executed. The scope of the | |
loop variable is the fa statement body: from the '->' to 'af'. | |
It's declaration hides any previously declared variable with the | |
same name. | |
3. The rule | |
typeid -> id | |
is only denoting semantics. The terminals id and typeid are | |
syntactically indistinguishable. However, semantically typeid | |
represents an id referring to a type (found in a different symbol | |
table from the other ids). | |
4. The read expression returns an integer. Thus | |
var a: int; | |
a := read; | |
is correct. | |
5. The write statement outputs an expression AND a newline. The | |
writes statement omits the newline. | |
6. The write and writes statements are over-loaded. If type of exp is | |
an integer, then each evaluates the integer expression and outputs a | |
decimal representation of it. For example: | |
writes 1+5; | |
outputs the decimal '6'. On the other hand, if exp is a string, | |
then the string is output. For example: | |
writes "string" | |
outputs the strings 'string'. | |
7. The exit statement terminates the program. | |
8. Statements in the outermost scope are executed in order at startup. | |
(Think of the outermost scope as being the C main procedure.) | |
9. Boolean expressions are short-circuit evaluated. For example, in the | |
following expression the function f is not evaluated because the answer | |
to the expression is known after the first clause. | |
true + f() | |
10. Functions (procedures that return a value) are declared by | |
annotating a proc declaration with a return type. For example: | |
proc f() : int ... end | |
This defines an implicit return variable with the same name as the | |
function. In the example above there is a local variable named | |
'f' of type int. The value returned by the function is the value | |
in f when it returns. Functions may only return basic types. | |
11. The break statement is valid only in loops (fa or do). It causes | |
execution of the loop to stop. Control jumps immediately to the | |
first statement following the loop. | |
12. The return statement in a proc definition forces a return from the | |
procedure. A return outside of a proc ... end (at the level of | |
the file) is equivalent to an exit. | |
13. The following function definition is allowed but the result is | |
undefined. | |
proc f() : int end | |
Consider the following statement: | |
a := f() | |
The return value was not set in the function body and there is no | |
default value. Therefore, the program is not deterministic. It is | |
not an error for a function to not set the return value. (Why?) | |
################# | |
# Symbol Tables # | |
################# | |
There are three symbol tables in ice9. One each for types, variables, | |
and procedures. Initially, the types symbol table consists of three | |
entries: 'int', 'bool', and 'string' which resolve to the three basic | |
types. These entries are in the default scope. They can be masked by | |
a like definition in an inner scope. The procedure and variables | |
tables are initially empty. | |
################# | |
# Scoping rules # | |
################# | |
Variables, parameters, types, and procedures are visible beginning | |
with their declaration. If declared in a procedure, the visibility | |
stops at the end of the defining procedure. Otherwise visibility ends | |
at the end of the file. Visibility starts with the declaration | |
itself, allowing recursive definitions. A declaration in a nested | |
scope overrides a previous declaration. However, two declarations in | |
the same scope is a name clash. | |
FORWARD: A forward statement makes a name visible (it declares the | |
variable) before its definition. This is useful for making mutually | |
recursive definition. Forward declarations are only defined for | |
procedures. | |
############# | |
# Operators # | |
############# | |
- : T -> T, T={int,bool} // unary minus or boolean not | |
? : bool -> int // conversion from boolean to integer | |
- : int x int -> int // integer subtraction | |
+ : T x T -> T, T={int,bool} // integer addition or boolean or | |
* : T x T -> T, T={int,bool} // integer multiplication or boolean and | |
/ : int x int -> int // integer division | |
% : int x int -> int // integer modulo | |
= : T x T -> bool, T={int,bool} // comparison, equal | |
!= : T x T -> bool, T={int,bool} // comparison, not equal | |
> : int x int -> bool // comparison, greater than | |
< : int x int -> bool // comparison, less than | |
>= : int x int -> bool // comparison, greater than or equal to | |
<= : int x int -> bool // comparison, less than or equal to | |
:= : T x T -> nil, T={int,bool,string} // assignment | |
################################ | |
# Precedence and associativity # | |
################################ | |
Precedence | Associativity | |
(highest to lowest) | | |
========================|=============== | |
- (Unary minus) ? | right | |
------------------------|--------------- | |
* / % | left | |
------------------------|--------------- | |
+ - | left | |
------------------------|--------------- | |
= != > < >= <= | none | |
---------------------------------------- | |
The ? operator does not associate semantically because of a type | |
conflict. | |
############## | |
# Conversion # | |
############## | |
The ? operator converts a bool value to an int. It converts true to 1 | |
and false to 0. | |
i := ? true; # i gets 1 | |
i := ? (1 != 1); # i gets 0 | |
There is no inverse operator to ?. However, an int is easily converted | |
a bool: | |
b := x != 0; | |
########## | |
# Arrays # | |
########## | |
Arrays are indexed from 0 to size-1, where size > 0. Bounds are | |
checked. An index that is < 0 or > size-1 generates a runtime error. | |
An index that is not an integer generates a compile-time error. | |
######### | |
# Notes # | |
######### | |
1. Type checking on user-defined types is structure based. That is | |
two types with the same structure are equivalent even if different | |
names are used. As an example, | |
type a = int [10][5] | |
type b = int [5] | |
type q = int [5] | |
type r = q [10] | |
The type q is structurally equivalent to b, while r is structurally | |
equivalent to a. | |
2. Basic types are passed by value. Array parameters are passed by | |
reference, similar to the C calling convention. | |
3. In forward declarations, the name of the parameters are not | |
important. Only the names of the parameters in the actual proc | |
definition are meaningful. | |
4. All integers are signed integers. | |
5. The maximum integer value must be at least 2^31 - 1 | |
(2147483647). The negative maximum integer must be at least - 2^31 | |
(-2147483648). This corresponds to 32-bit, two's-complement signed | |
integers. | |
6. The result of the modulo takes the sign of its "dividend." For | |
example, 5%-2 -> 1 and -5%-2 -> -1. This is the same convention as C. | |
7. Whitespace: The only valid whitespace characters are space, tab, and | |
newline, which in ASCII are decimal 32, 9, and 10 respectively. If you work on | |
DOS/Windows machines you can add carriage return (decimal 13) for your | |
convenience. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment