Skip to content

Instantly share code, notes, and snippets.

@jtpaasch
Created March 3, 2024 18:37
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jtpaasch/8852ec57fe7c16f0c12b273afa901167 to your computer and use it in GitHub Desktop.
Save jtpaasch/8852ec57fe7c16f0c12b273afa901167 to your computer and use it in GitHub Desktop.
C++ BNF Grammar quickstart

C++ BNF Grammar

The C++ grammar is convoluted. These notes offer a quickstart/bird's eye view into it.

BNF grammars

Here are some versions of the grammar (or C-variants):

Helpful simplifications:

Translation units

A translation unit is a sequence of declarations.

translation-unit ::= declaration-seq

In Clang's AST, this is the TranslationUnitDecl.

A declaration sequence is either a single declaration, or a declaration sequence terminating in a declaration:

declaration-seq ::= declaration
                  | declaration-seq declaration

It's a bit like the functional-programming way to do lists: you have nil and cons, and you push an element onto the head of a list to make it longer. To parse it, you take the head off, and then recursively step into the tail. Here, it's just reverse-cons. Or you can think of the snake pointing in the other direction: the head points to the left and the tail is everything to the left of that last element.

What sorts of things can be declared at the top level? Lots of things. Functions, templates, namespaces, etc. It's C++.

For simplicity, let's just think of the case where you have global variables and functions:

declaration ::= block-declaration | function-definition | ...

We'll cover function definitions in a moment. For the moment, think about top-level block-declarations.

Top-level block-declarations can take many forms. You can declare an alias, namespaces, using-declarations, etc. The easy case is just a simple-declaration:

block-declaration ::= simple-declaration | ...

A simple-declaration allows you to specify an optional list of initial declarations (which are optionally qualified by attribute and declaration specifiers):

simple-declaration ::= [attribute-secifier-seq] [decl-specifier-seq] [init-declarator-list];

Note that you can just have an empty simple-declaration here: just a semicolon!

For simplicity, we'll think about the simple case where you declare some variables and some types. The types are the decl-specifier-seq and the variables are the init-declarator-list.

For the decl-specifier-seq,this is a reverse-consed sequence of decl-specifiers:

decl-specifier-seq ::= decl-specifier | decl-specifier-seq decl-specifier

A decl-specifier can specify the storage class, type, etc:

decl-specifier ::= storage-class-specifier | type-specifier | ...

The type can be specified as a trailing-type-specifier, etc:

type-specifier ::= trailing-type-specifier | ...

And that in turn can be a simple-type, among other things:

trailing-type-speifier ::= simple-type-specifier | ...

Simple types include thinsg like int, char, etc.

If you want a list of initial declarations, beyod the types, it should be in the form of a reverse-consed list of init-declarators:

init-declarator-list ::= init-declarator | init-declarator-list init-declarator 

What is an init-declarator? It consists of a declarator, and an optional initializer:

init-declarator ::= declarator [initializer]

A declarator effectively names a variable or a type or a namespace or whatever it is you're trying to declare. The simple case is to think of it simply as declarating a variable.

However, you can declare a pointer variable, or a non-pointer variable qualified by parameters and a trailing return type.

For simplicity, we can ignore the qualified non-pointer type, and just think about the ptr-declarators:

declarator ::= ptr-declarator | ...

What does this include? It can be a non-pointer declaration, or a non-pointer declaration pre-pended by a pointer operator:

ptr-declarator ::= noptr-declarator
                 | ptr-operator ptr-declarator

So, for instance, you can do x, or *x, or **x, and so on.

A non-pointer declarator, in turn can just be the declarator's ID, optionally qualified with a sequence of attribute specifiers:

noptr-declarator ::= declarator-id [attribute-specifier-seq] | ...

For simplicity, we can just ignore the attribute specifiers, and think of a noptr-declarator as simply being a declarator-id:

noptr-declarator ::= declarator-id | ...

The declarator-id can be an id-expression, or a reference to a classname. For our purposes, we can just focus on the id-expression:

declarator-id ::= id-expression | ...

An id-expression can be a qualified or unqualified ID:

id-expression ::= unqualified-id | qualified-id

In C++ jargon, the difference here is whether you have some "Foo::" in front of an identifier "x". The "Foo::" qaulifies the name "x" so that "x" is part of the "Foo" namespace. Without that, "x" is just an unqualified name.

For simplicity, we'll focus only on the qualified case. This is just an identifier:

unqualified-id ::= identifier | ...

Example. Suppose you have a global declaration in your file:

int foo;

Here's the production/expansion:

declaration ::=> block-declaration
block-declaration ::=> simple-declaration
simple-declaration ::=> <a>decl-specifier-seq <b>init-declarator-list;
<a>decl-specifier-seq ::=> decl-specifier
   decl-specifier ::=> type-specifier
   type-specifier ::=> trailing-type-specifier
   trailing-type-specifier ::=> simple-type-specifier
   simple-type-specifier ::=> "int"
<b>init-declarator-list ::=> init-declarator
   init-declarator ::=> declarator [initializer]
   declarator ::=> ptr-declarator
   ptr-declarator ::=> noptr-declarator
   noptr-declarator ::=> declarator-id
   declarator-id ::=> id-expression
   id-expression ::=> unqualified-id
   unqualified-id ::=> identifier
   identifier ::=> "foo"

You can also provide an optional initializer for a top level declaration like this. For instance, adding 3 + 2 as an initializer of foo:

int foo = 3 + 2;

An initializer can be a brace-or-equal-initializer, or a list of expressions.

initializer ::= brace-or-equal-initializer
              | ( expression-list )

For simplicity, we'll just focus on the brace-or-equal-initializer. This can either be the braced initial value, or an equals sign followed by an initializer clause:

brace-or-equal-initializer ::= = initializer-clause
                             | braced-init-list

The initializer-clause is, in turn, an assignment-expression:

initializer-clause ::= assigment-expression | ...

There are examples of assignment-expressions below.

In Clang's AST, these top-level init-declarators will have the form of SomethingDecl children of the parent TranslationUnitDecl.

Function declarations

In Clang's AST, a top-level function-definition is a top-level FunctionDecl, i.e., a FunctionDecl that is a child of the TranslationUnitDecl.

A function definition can take optional attribute specifiers and decl specifiers, along with a declarator to name the function and a function body.

function-definition ::= [attribute-specifier-seq] [decl-specifier-seq] declarator function-body

The decl-specifiers involve stipulating the return type of the function, among other things, and we saw an example of it in the last section. The declarotor is also just an identifier, and we saw an example of that in the last section too. The function body of a function involves an optional ctor-initializer, and then a compound-statement for the body of the function:

function-body ::= [ctor-initializer] compound-statement

We can ignore the ctor-initializer for simplicity, and instead just think of a function as having a function body.

Function bodies/Compound statements

The body of a function is a compound-statement. This is a sequence of zero or more statements between curly braces:

compound-statement ::= { [statement-seq] }

In Clang's AST, the function body is a CompoundStmtDecl, and its children are the list of statements.

A statement sequence is of course a reverse-consed sequence of statements:

statement-seq ::= statement | statement-seq statement

Statements

A statement can be an if-statement (which C++ calls a selection-statement), a while-loop or for-loop (which C++ calls iteration statements), a jump-statement, another compound statement, an expression statement, a manually labeled statement (a target of jumps/gotos), a try block, and more declarations:

statement ::= declaration-statement
            | labeled-statement
            | selection-statement
            | iteration-statement
            | jump-statement
            | compound-statement
            | expression-statement
            | try-block

Note that C++ does not have an assignment statement. Instead, it just has expression statements, whose contents are just expressions (and even the expression can be optional):

expression-statement ::= [expression]

Since an expression can be optional, a "skip statement" is really just an empty expression statement in C++. In Clang's AST, it is a NullStmt.

Expressions

Expressions can take many forms. One version of an expression is an assignment. Other examples involve arithmetic expressions, boolean expressions, or even just a character or string literal.

Here are some helpful grammar rules from the Annotated Manual's appendix:

name ::= identifier
literal ::= integer-constant

primary-expression ::= literal | name

postfix-expression ::= primary-expression | postfix-expression++

unary-operator ::= * | & | + | - | ! | ~

cast-expression ::= unary-expression
                  | ( type-name ) cast-expression

unary-expression ::= postfix-expression
                   | ++unary-expression
                   | sizeof unary-expression
                   | sizeof ( type-name )
                   | unary-operator cast-expression

pm-expression ::= cast-expression | ...

multiplicative-expression ::= pm-expression
                            | multiplicative-expression * pm-expression

additive-expression ::= multiplicative-expression
                      | additive-expression + multiplicative expression

shift-expression ::= additive-expression
                   | shift-expression << additive-expression

relational-expression ::= shift-expression
                        | relational-expression < shift-expression

equality-expression ::= relational-expression
                      | equality-expression == relational-expression

and-expression ::= equality-expression | ...

exclusive-or-expression ::= and-expression | ...

inclusive-or-expression ::= exclusive-or-expression | ...

logical-and-expression ::= inclusive-or-expression | ...

logical-or-expression ::= logical-and-expression | ...

conditional-expression ::= logical-or-expression | ...

assignment-operator ::= = | *= | +=

assignment-expression
   ::= conditional-expression
     | unary-expression assignment-operator assignment-expression

expression ::= assignment-expression
             | expression , assignment-expression

For an example, suppose I have a statement like this:

x = 3;

Here's the production:

    expression ::=> assignment-expression
    assignment-expression ::=>
      <a>unary-expression <b>assignment-operator <c>assignment-expression
    <a>unary-expression ::=> postfix-expression
       postfix-expression ::=> primary-expression
       primary-expression ::=> name
       name ::=> identifier
       identifier ::=> "x"
    <b>assignment-operator ::=> "="
    <c>assignment-expression ::=> conditional-expression
       conditional-expression ::=> logical-or-expression
       logical-or-expression ::=> logical-and-expression
       logical-and-expression ::=> inclusive-or-expression
       inclusive-or-expression ::=> exclusive-or-expression
       exclusive-or-expression ::=> and-expression
       and-expression ::=> equality-expression
       equality-expression ::=> relational-expression
       shift-expression ::=> additive expression
       additive-expression ::=> multiplicative-expression
       multiplicative-expression ::=> pm-expression
       pm-expression ::=> cast-expression
       cast-expression ::=> unary-expression
       unary-expression ::=> postfix-expression
       postfix-expression ::=> primary-expression
       primary-expression ::=> literal
       literal ::=> integer-literal
       integer-literal ::= "3"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment