@Chubek
Last active March 10, 2024 13:04
EBNF Grammar for AWK

Note: if you wish to understand these notations, please read this: https://gist.github.com/Chubek/52884d1fa766fa16ae8d8f226ba105ad

So, again, why did I write the EBNF grammar for AWK?

Basically, I have two ongoing projects where AWK is involved. The first is Squawk, an implementation of AWK, and the second is AWK2c, which obviously translates AWK to C.

Plus, I am thinking of making a GitHub page called 'The Internet Grammar Database' where I would post EBNF, Yacc, PEG, and Lex definitions of languages. However, I don't have much experience in web development, so if you can help me, let me know (chubakbidpaa [at] riseup [dot] net).

So anyways, awk.ebnf contains the EBNF grammar for AWK. Some considerations:

  • The grammar is based on POSIX specs.
  • It may not be complete.
  • It may not be the best representation of AWK as EBNF.
  • The file is best viewed verbatim in Neovim with my ebnf.vim, which is linked in the Gist above.
  • Some people asked whether this grammar can be fed to a parser generator. Not directly: as far as I know, no parser generator accepts this EBNF dialect as-is. There's BNFC, which uses LBNF, but that's a whole other beast. That was the whole story until last year. Today, with ChatGPT, you can generate almost anything you want from a well-specified EBNF grammar!
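
To make the last point concrete, translating a rule into code by hand is mostly mechanical: `{ x }` becomes a loop, `[ x ]` becomes an optional step, and `|` becomes a series of alternatives. Below is a minimal sketch in Python for three of the numeric rules. The function names (`dec_integer`, `float_const`, `hex_integer`, `numeric_const`) are made up for this example and do not come from Squawk or AWK2c.

```python
# Hand-translation of three rules from the grammar (all names are hypothetical):
#   dec-integer ::= int-digit { int-digit }
#   float-const ::= [ dec-integer ] '.' dec-integer
#   hex-integer ::= ( "0x" | "0X" ) hex-digit { hex-digit }
# Each rule function takes the input text and a position, and returns the
# position just past the match, or None when the rule does not match.

INT_DIGITS = "0123456789"
HEX_DIGITS = INT_DIGITS + "abcdefABCDEF"


def dec_integer(text, pos):
    # int-digit { int-digit }: one required digit, then a loop for the braces.
    if pos >= len(text) or text[pos] not in INT_DIGITS:
        return None
    pos += 1
    while pos < len(text) and text[pos] in INT_DIGITS:
        pos += 1
    return pos


def float_const(text, pos):
    # [ dec-integer ]: optional, so a failed match simply leaves pos unchanged.
    after_int = dec_integer(text, pos)
    if after_int is not None:
        pos = after_int
    # '.': a required terminal.
    if pos >= len(text) or text[pos] != ".":
        return None
    # dec-integer: the fractional digits are required.
    return dec_integer(text, pos + 1)


def hex_integer(text, pos):
    # ( "0x" | "0X" ): alternatives become a membership check.
    if text[pos:pos + 2] not in ("0x", "0X"):
        return None
    pos += 2
    if pos >= len(text) or text[pos] not in HEX_DIGITS:
        return None
    pos += 1
    while pos < len(text) and text[pos] in HEX_DIGITS:
        pos += 1
    return pos


def numeric_const(text):
    # A constant is accepted only if one rule consumes the whole text.
    for name, rule in (("hex-integer", hex_integer),
                       ("float-const", float_const),
                       ("dec-integer", dec_integer)):
        if rule(text, 0) == len(text):
            return name
    return None
```

With this sketch, `numeric_const("3.14")` returns `"float-const"`, `numeric_const("0x1F")` returns `"hex-integer"`, and `numeric_const("1.")` returns `None`, because the grammar requires digits after the dot. Whitespace and the remaining literal forms are left out to keep the example short.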

Thanks; ~ Chubak

awk-program ::= { top-level-element | comment }
top-level-element ::= function-definition
| pattern-action-pair
function-definition ::= "function" identifier '(' [ identifier-list ] ')' compound-statements
pattern-action-pair ::= [ pattern ] action
action ::= compound-statements
pattern ::= expression-range
| "BEGIN"
| "END"
expression-range ::= expression [ ',' expression ]
compound-labeled ::= '{' labeled-statement-list '}'
compound-statements ::= '{' statement-list '}'
labeled-statement-list ::= labeled-statement { terminate labeled-statement }
labeled-statement ::= label ':' statement-list
label ::= "case" identifier
| "default"
statement-list ::= statement { terminate statement }
statement ::= expression-statement
| print-statement
| printf-statement
| for-statement
| do-while-statement
| while-statement
| if-statement
| switch-statement
| controlflow-statement
| memory-statement
memory-statement ::= "delete" variable
controlflow-statement ::= "break"
| "continue"
| "next"
| "return" [ expression ]
switch-statement ::= "switch" '(' expression ')' compound-labeled
if-statement ::= "if" '(' expression ')' compound-statements [ "else" ( if-statement | compound-statements ) ]
while-statement ::= "while" '(' expression ')' compound-statements
do-while-statement ::= "do" compound-statements "while" '(' expression ')'
for-statement ::= "for" '(' [ expr-list [ ';' expr-list ';' expr-list ] ] ')' compound-statements
printf-statement ::= "printf" format-string ',' expr-list [ output-redirection ] terminate
print-statement ::= "print" expr-list [ output-redirection ] terminate
expression-statement ::= expression terminate
output-redirection ::= '>' string-const
| ">>" string-const
| '|' string-const
expr-list ::= expression { ',' expression }
expression ::= primary
| unary-expr
| binary-expr
| ternary-expr
| assign-expr
assign-expr ::= variable "^=" expression
| variable "%=" expression
| variable "*=" expression
| variable "/=" expression
| variable "+=" expression
| variable "-=" expression
| variable '=' expression
ternary-expr ::= binary-expr '?' binary-expr ':' binary-expr
binary-expr ::= logical-or-expr
logical-or-expr ::= logical-and-expr "||" logical-or-expr
| logical-and-expr
logical-and-expr ::= in-expr "&&" logical-and-expr
| in-expr
in-expr ::= match-expr "in" variable
| match-expr
match-expr ::= relop-expr '~' match-expr
| relop-expr "!~" match-expr
| relop-expr
relop-expr ::= concat-expr '<' relop-expr
| concat-expr "<=" relop-expr
| concat-expr "!=" relop-expr
| concat-expr "==" relop-expr
| concat-expr '>' relop-expr
| concat-expr ">=" relop-expr
| concat-expr
concat-expr ::= additive-expr concat-expr
| additive-expr
additive-expr ::= multiplicative-expr '+' additive-expr
| multiplicative-expr '-' additive-expr
| multiplicative-expr
multiplicative-expr ::= exponential-expr '*' multiplicative-expr
| exponential-expr '/' multiplicative-expr
| exponential-expr '%' multiplicative-expr
| exponential-expr
exponential-expr ::= unary-expr '^' exponential-expr
| unary-expr
unary-expr ::= '+' primary
| '-' primary
| '!' primary
| '~' primary
| '$' primary
| "++" primary
| "--" primary
| postfix-expr
postfix-expr ::= primary "++"
| primary "--"
| primary
primary ::= variable
| const-literal
| getline
| identifier '(' [ expr-list ] ')'
| '(' expression ')'
getline ::= [ string-const '|' ] "getline" [ variable ] [ '<' string-const ]
variable ::= identifier [ index ]
index ::= '[' { alphanumeric } ']'
identifier-list ::= identifier { ',' identifier }
identifier ::= ( alphabetic | '_' ) { alphanumeric | '_' }
alphanumeric ::= alphabetic | int-digit
alphabetic ::= upper-case | lower-case
upper-case ::= 'A' | 'B' | 'C' | ... | 'X' | 'Y' | 'Z'
lower-case ::= 'a' | 'b' | 'c' | ... | 'x' | 'y' | 'z'
format-string ::= '"' { format-glyph } '"'
| "'" { format-glyph } "'"
const-literal ::= regex-const
| string-const
| float-const
| integer-const
regex-const ::= '/' { vis-glyph } '/'
string-const ::= '"' { vis-glyph } '"'
| "'" { vis-glyph } "'"
float-const ::= [ dec-integer ] '.' dec-integer
integer-const ::= dec-integer
| hex-integer
| oct-integer
| bin-integer
dec-integer ::= int-digit { int-digit }
hex-integer ::= ( "0x" | "0X" ) hex-digit { hex-digit }
oct-integer ::= ( "0o" | "0O" ) oct-digit { oct-digit }
bin-integer ::= ( "0b" | "0B" ) bin-digit { bin-digit }
hex-digit ::= int-digit | 'a' | 'b' | 'c' | 'd' | 'e' | 'f'
| 'A' | 'B' | 'C' | 'D' | 'E' | 'F'
int-digit ::= oct-digit | '8' | '9'
oct-digit ::= bin-digit | '2' | '3' | '4' | '5' | '6' | '7'
bin-digit ::= '0' | '1'
format-glyph ::= vis-glyph | format-tag
format-tag ::= '%' no-percentage
vis-glyph ::= printable | c-escape
printable ::= ' ' | '!' | ... | '}' | '~'
c-escape ::= '\' ( "'" | '"' | 'n' | 't' | 'r' | 'b' | 'f' | 'v' | 'a' | '?' | '0' | '\' )
| '\' oct-digit oct-digit oct-digit
| "\x" hex-digit hex-digit
| "\U" hex-digit hex-digit hex-digit hex-digit
comment ::= '#' { no-newline } sys-newline
no-percentage ::= ? any character except percentage sign ?
no-newline ::= ? any character except sys-newline ?
terminate ::= [ ';' ] sys-newline
whitespace ::= ? space and horizontal tab ?
sys-newline ::= ? system newline character ?
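
A grammar of this length can easily drift out of sync with itself, for example when a rule is renamed in one place but not another. The following is a rough Python sketch of a consistency check that lists every nonterminal that is used but never defined. It assumes the layout used in this file: a rule starts with `name ::=`, continuation lines start with `|`, terminals are quoted, and special sequences are wrapped in `? ... ?`. The function name `undefined_nonterminals` and the two-rule sample grammar are hypothetical.

```python
import re

# Quoted terminals ('x' or "x") and special sequences (? ... ?) are blanked
# out before names are collected, so their contents are not mistaken for
# rule names.
TERMINAL = re.compile(r"""'[^']*'|"[^"]*"|\?[^?]*\?""")
NAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def undefined_nonterminals(grammar):
    """Return the sorted names that appear in rule bodies but have no rule."""
    defined, used = set(), set()
    for line in grammar.splitlines():
        head, sep, body = line.partition("::=")
        if sep:
            defined.add(head.strip())
        else:
            body = line  # continuation line such as "| alternative"
        used.update(NAME.findall(TERMINAL.sub(" ", body)))
    return sorted(used - defined)


# A tiny made-up grammar: "value" is referenced but has no rule of its own.
sample = """
pair ::= key ':' value
key ::= "k" | 'q'
"""
print(undefined_nonterminals(sample))  # prints ['value']
```

Running the same function over the contents of awk.ebnf would flag any rule name that is misspelled or missing. It does not check whether the grammar is unambiguous or complete.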