Make Better Irony with Sarcasm

Overview

When implementing a complex grammar with Irony, debugging can be painful.

  • The write-build-run-debug loop is too long
  • Debug information is sometimes hard to understand
  • Errors and conflicts cannot be traced back to source code

Sarcasm's goals:

  • Provide a DSL that allows developers to write grammars in an EBNF-like syntax
  • Generate a C# class and a nicely formatted grammar specification document in Markdown
  • Offer compile-time error checking and grammar validation
  • Trace errors and conflicts back to source code
  • Produce more readable and usable grammar conflict debugging information
  • Integrate easily with Visual Studio

Workflow

  1. Input .sarc file
  2. Parse and check for syntax errors
  3. Generate .cs parser class (and .md document)
  4. Continue with the other build stages
  5. On error, process ErrorItems and determine whether they are Sarcasm related. If so, emit a new error that maps to the Sarcasm source file. (Ref)
  6. On success, load the compiled assembly and do some IronyExplorer stuff. Get, translate and emit all grammar errors.
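
Step 5 depends on mapping a location in the generated .cs file back to the .sarc line that produced it. The following is a minimal sketch of one way to do that, assuming the generator records the first generated line of each emitted chunk together with its source line. The names `LineMap`, `record` and `to_source` are hypothetical and not part of Sarcasm or Irony.

```python
import bisect


class LineMap:
    """Maps line numbers in a generated file back to source lines."""

    def __init__(self):
        self._generated_starts = []  # first generated line of each chunk, ascending
        self._source_lines = []      # source line that produced that chunk

    def record(self, generated_line, source_line):
        # Chunks must be recorded in the order they are emitted.
        if self._generated_starts and generated_line <= self._generated_starts[-1]:
            raise ValueError("generated lines must be strictly increasing")
        self._generated_starts.append(generated_line)
        self._source_lines.append(source_line)

    def to_source(self, generated_line):
        # Find the last chunk that starts at or before the reported line.
        index = bisect.bisect_right(self._generated_starts, generated_line) - 1
        if index < 0:
            return None  # the error is in code that no source line produced
        return self._source_lines[index]


# Hypothetical mapping: .sarc line 3 produced .cs lines 10-13, line 7 produced 14 onwards.
line_map = LineMap()
line_map.record(generated_line=10, source_line=3)
line_map.record(generated_line=14, source_line=7)
print(line_map.to_source(12))
print(line_map.to_source(5))
```

A compiler error reported in the generated file can then be re-emitted against the source line returned by `to_source`, or left untouched when the result is `None`.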

Sarcasm Grammar Specification

Introduction

This documentation specifies grammar rules for Sarcasm.

@namespace Sarcasm . Parser

@class SarcasmGrammar

@name "Sarcasm"

@version "0.1"

@description "An EBNF-like DSL that generates Irony"

Overall Structure

Body is the root node:

Root := Body ;

The body consists of multiple lines of statements:

BodyNode Body := Stmt + ;

Valid statements are headers, comments, directives and production rules.

Stmt := Header | Comment | Rule | Directive ;

Grammar Documentation

Sarcasm is designed to generate both a parser class and a nicely formatted specification document of the grammar in Markdown. Markdown header syntax is supported directly, because it can also be used to organize code into sections, while other Markdown syntax should only be used in comments. All literal content is copied directly into the final Markdown document. Directives, terminal declarations and rules are formatted in a readable manner. Some syntax (such as grammar hints) is filtered out. Named constants and keywords are displayed as literals instead of identifiers.

Six levels of headers are supported, all starting with #s:

H1 = new CommentTerminal ( "H1" "#" "\n" "\r" ) ;

H2 = new CommentTerminal ( "H2" "##" "\n" "\r" ) ;

H3 = new CommentTerminal ( "H3" "###" "\n" "\r" ) ;

H4 = new CommentTerminal ( "H4" "####" "\n" "\r" ) ;

H5 = new CommentTerminal ( "H5" "#####" "\n" "\r" ) ;

H6 = new CommentTerminal ( "H6" "######" "\n" "\r" ) ;

Header := H1 | H2 | H3 | H4 | H5 | H6 ;
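
The six header terminals differ only in the number of leading # characters. A minimal Python sketch of that classification follows; it is independent of Irony's CommentTerminal, and the function name `header_level` is hypothetical.

```python
def header_level(line):
    """Return the header level (1-6) of a line, or None if it is not a header."""
    stripped = line.lstrip()
    # Count the '#' characters at the start of the line.
    hashes = len(stripped) - len(stripped.lstrip("#"))
    if 1 <= hashes <= 6:
        return hashes
    return None


print(header_level("## Overall Structure"))
print(header_level("Root := Body;"))
```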

Two types of comments can be used: single-line comments starting with // and block comments surrounded by /* and */.

SINGLE_COMMENT = new CommentTerminal ( "SINGLE_COMMENT" "//" "\n" "\r" ) ;

BLOCK_COMMENT = new CommentTerminal ( "BLOCK_COMMENT" "/*" "*/" ) ;

Comment := SINGLE_COMMENT | BLOCK_COMMENT ;

Directives

Directives are used to configure the behaviour of the Sarcasm compiler. For example:

@namespace Me.CatX

The above directive tells the compiler to put the generated parser class in the Me.CatX namespace.

Valid directives are:

DIRECTIVES = [ "class" "namespace" "transient" "punctuation" "operator" ] ;

A directive starts with @, followed by a valid directive name and a variable number of arguments:

Directive := "@" DIRECTIVES Arguments ;

String literals and identifiers are simple values that can be used in rule productions:

ID = new IdentifierTerminal ( "ID" ) ;

STRING = new StringLiteral ( "STRING" "\"" StringOptions . AllowsAllEscapes ) ;

SimpleValue := STRING | ID ;

Since some syntax, like directives, is mapped directly to C# method calls, simple values may not be enough.

We will need numbers:

NUMBER = new NumberLiteral ( "NUMBER" ) ;

Syntax like Foo.Bar.Zzz is dotted syntax, formed by multiple identifiers separated by .:

Dotted := ID "." ID | Dotted "." ID ;

That's all we need so far:

Expr := SimpleValue | NUMBER | Dotted ;

Arguments consist of 1 or more expressions separated by ,:

Arguments := Expr { "," } ;
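
To make the directive rules concrete, here is a minimal Python sketch that splits a directive line into its name and comma-separated arguments, following the Directive, Expr and Arguments rules above. It is a regex approximation written for illustration, not the Irony-based parser Sarcasm generates, and `parse_directive` is a hypothetical name.

```python
import re

DIRECTIVES = {"class", "namespace", "transient", "punctuation", "operator"}

# One argument: a double-quoted string, a number, or a (possibly dotted) identifier.
ARGUMENT = re.compile(
    r'"(?:[^"\\]|\\.)*"'                    # STRING
    r'|\d+(?:\.\d+)?'                        # NUMBER
    r'|[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*'      # ID or Dotted
)
SEPARATOR = re.compile(r"\s*,\s*")


def parse_directive(line):
    """Split '@name arg, arg, ...' into (name, [args]); raise ValueError if malformed."""
    match = re.match(r"@([A-Za-z_]\w*)\s*(.*)$", line.strip())
    if match is None:
        raise ValueError("a directive must start with '@' and a name")
    name, rest = match.groups()
    if name not in DIRECTIVES:
        raise ValueError("unknown directive: " + name)
    arguments = []
    position = 0
    while True:
        argument = ARGUMENT.match(rest, position)
        if argument is None:
            raise ValueError("expected an argument at column %d" % position)
        arguments.append(argument.group())
        position = argument.end()
        # Arguments := Expr {","}: keep going only while a comma follows.
        separator = SEPARATOR.match(rest, position)
        if separator is None:
            break
        position = separator.end()
    if rest[position:].strip():
        raise ValueError("unexpected text after arguments: " + rest[position:])
    return name, arguments


print(parse_directive("@namespace Me.CatX"))
print(parse_directive('@operator 0, Left, "|"'))
```

String arguments keep their quotes in this sketch, so a caller can still tell the literal "|" apart from an identifier.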

Types

Before continuing to declarations and rules, we define some basic types.

The left-hand sides of declarations and rules are called assignables.

An assignable is either an untyped identifier or a typed one. In the typed form, the first identifier is the type of the AST node.

Assignable := ID | ID ID ;

We combine simple values with commas to make a list:

SimpleValueList := SimpleValue { "," } ;

Then we have an array:

SimpleArray := "[" SimpleValueList "]" ;

Declarations of Terminals

We declare a terminal with the syntax:

Declaration := Assignable "=" TerminalInit ";" ;

A terminal can be initialized by:

  • String literal (like keyword)
  • Simple Array (same effect as connecting terminals with | operator)
  • Call to a C# constructor or method that returns an instance of a terminal

TerminalInit := STRING | SimpleArray | CSharpCall ;

A C# call is actually a code snippet in C#. Note that Sarcasm will not create the classes or methods for you.

Make sure the classes are accessible from the parser's namespace and that you have created the methods in the parser class:

CSharpFn := ID "(" Arguments ")" | ID "(" ")" ;

CSharpCall := "new" CSharpFn | CSharpFn ;

Rules of Production

Basic form

Rules of production are plain and simple EBNF. Each rule ends with a ;:

Rule := Assignable ":=" Production ";" ;

The right-hand side is a production. There are only two operations:

Production := And | Or ;

Nodes

Terminals, non-terminals and sub-productions are called nodes. Terminals and non-terminals can always be identified with SimpleValue.

Node := shift SimpleValue | Parenthetical ;

Sub-productions can be put between parentheses to control associativity:

Parenthetical := "(" Production ")" ;

Repeating

Four trailing operators are defined to describe repetition of any node:

  • ? Question Rule: 0 or 1 times
  • + Plus Rule: 1 or more times
  • * Star Rule: 0 or more times
  • {} List Rule: 1 or more times

REPEAT = [ "?" "+" "*" "{}" ] ;

The +, * and {} operators can be augmented with a delimiter:

Delimiter := shift "(" SimpleValue ")" ;

RepeatWithDelimiter := "+" Delimiter | "*" Delimiter | "{" SimpleValue "}" ;

Together these make up the repeat operations:

Repeat := REPEAT | RepeatWithDelimiter ;
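
The repetition counts listed above can be sketched as a small checker. This is illustrative Python under the assumption that a delimited repetition alternates item, delimiter, item with no trailing delimiter; the names `REPEAT_BOUNDS`, `count_items` and `accepts` are hypothetical and do not reflect how Irony matches lists internally.

```python
# Minimum and maximum repetitions for each operator (None means unbounded).
REPEAT_BOUNDS = {
    "?": (0, 1),
    "+": (1, None),
    "*": (0, None),
    "{}": (1, None),
}


def count_items(tokens, item, delimiter=None):
    """Count repetitions of `item` in `tokens`, or return None if the shape is wrong."""
    if delimiter is None:
        return len(tokens) if all(t == item for t in tokens) else None
    if not tokens:
        return 0
    # With a delimiter the tokens must alternate: item, delimiter, item, ...
    if len(tokens) % 2 == 0:
        return None
    for index, token in enumerate(tokens):
        expected = item if index % 2 == 0 else delimiter
        if token != expected:
            return None
    return (len(tokens) + 1) // 2


def accepts(operator, tokens, item, delimiter=None):
    """Check whether `tokens` is a valid repetition of `item` under `operator`."""
    low, high = REPEAT_BOUNDS[operator]
    count = count_items(tokens, item, delimiter)
    if count is None or count < low:
        return False
    return high is None or count <= high


print(accepts("{}", ["Expr", ",", "Expr"], "Expr", ","))
print(accepts("+", [], "Stmt"))
```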

Grammar Hints

When a grammar becomes more and more complex, Irony sometimes gets confused and produces conflicts. Grammar hints are provided to resolve those conflicts.

Some hints are plain and simple:

SIMPLE_HINTS = [ "shift" "reduce" ] ;

Others are conditional:

CONDITIONAL_HINTS = [ "shiftIf" "reduceIf" ] ;

That's it:

Hint := SIMPLE_HINTS | CONDITIONAL_HINTS "(" SimpleValueList ")" ;

Adding Things Up

We augment nodes with repeats and hints:

NodeExt := Node | Node Repeat reduce | Hint ;

Writing nodes sequentially from left to right performs an AND operation:

And := NodeExt {} ;

Alternatives are described with the or (|) operator:

Or := Production "|" Production ;
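
As an illustration of how And and Or combine, the Python sketch below groups a flat token list into alternatives, each a sequence of nodes. It assumes that | binds more loosely than sequencing, as the precedence 0 registered for | under Wrapping Up suggests, and that parentheses protect nested alternatives. `split_alternatives` is a hypothetical helper, not part of Sarcasm.

```python
def split_alternatives(tokens):
    """Group a flat token list into alternatives, each a sequence of nodes."""
    alternatives = [[]]
    depth = 0
    for token in tokens:
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
        if token == "|" and depth == 0:
            alternatives.append([])  # a top-level '|' starts the next alternative
        else:
            alternatives[-1].append(token)
    if depth != 0:
        raise ValueError("unbalanced '('")
    return alternatives


# Right-hand side of the Dotted rule: two alternatives of three nodes each.
print(split_alternatives(["ID", '"."', "ID", "|", "Dotted", '"."', "ID"]))
```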

Wrapping Up

Mark transient non-terminals:

@transient Stmt Expr Header Comment SimpleValue Production Node

Mark punctuations:

@punctuation "(" ")" "[" "]" "{" "}"

Register operators:

@operator 0 Left "|"

@operator 1 Neutral "shift" "reduce" "shiftIf" "reduceIf"

@operator 2 Left REPEAT

# Sarcasm Grammar Specification
## Introduction
/*
This documentation specifies grammar rules for Sarcasm.
*/
@namespace Sarcasm.Parser
@class SarcasmGrammar
@name "Sarcasm"
@version "0.1"
@description "An EBNF-like DSL that generates Irony"
## Overall Structure
// Body is the root node:
Root := Body;
// The body consists of multiple lines of statements:
BodyNode Body := Stmt+;
// Valid statements are headers, comments, directives and production rules.
Stmt := Header | Comment | Rule | Directive;
## Grammar Documentation
/*
Sarcasm is designed to generate both a parser class and a nicely formatted specification document of the grammar in Markdown.
Markdown header syntax is supported directly, because it can also be used to organize code into sections,
while other Markdown syntax should only be used in comments.
All literal content is copied directly into the final Markdown document.
Directives, terminal declarations and rules are formatted in a readable manner.
Some syntax (such as grammar hints) is filtered out. Named constants and keywords are displayed as literals instead of identifiers.
*/
// Six levels of headers are supported, all starting with ```#```s:
H1 = new CommentTerminal("H1", "#" , "\n", "\r");
H2 = new CommentTerminal("H2", "##" , "\n", "\r");
H3 = new CommentTerminal("H3", "###" , "\n", "\r");
H4 = new CommentTerminal("H4", "####" , "\n", "\r");
H5 = new CommentTerminal("H5", "#####" , "\n", "\r");
H6 = new CommentTerminal("H6", "######" , "\n", "\r");
Header := H1 | H2 | H3 | H4 | H5 | H6;
// Two types of comments can be used: single-line comments starting with ```//``` and block comments surrounded by ```/*``` and ```*/```.
SINGLE_COMMENT = new CommentTerminal("SINGLE_COMMENT" , "//", "\n", "\r");
BLOCK_COMMENT = new CommentTerminal("BLOCK_COMMENT" , "/*", "*/");
Comment := SINGLE_COMMENT | BLOCK_COMMENT;
## Directives
/*
Directives are used to configure the behaviour of the Sarcasm compiler. For example:
```
@namespace Me.CatX
```
The above directive tells the compiler to put the generated parser class in the ```Me.CatX``` namespace.
*/
// Valid directives are:
DIRECTIVES = ["class", "namespace", "transient", "punctuation", "operator"];
// A directive starts with ```@```, followed by a valid directive name and a variable number of arguments:
Directive := "@" DIRECTIVES Arguments;
// String literals and identifiers are simple values that can be used in rule productions:
ID = new IdentifierTerminal("ID");
STRING = new StringLiteral("STRING", "\"", StringOptions.AllowsAllEscapes);
SimpleValue := STRING | ID;
// Since some syntax, like directives, is mapped directly to C# method calls, simple values may not be enough.
// We will need numbers:
NUMBER = new NumberLiteral("NUMBER");
// Syntax like ```Foo.Bar.Zzz``` is dotted syntax, formed by multiple identifiers separated by ```.```:
Dotted := ID "." ID
| Dotted "." ID;
// That's all we need so far:
Expr := SimpleValue | NUMBER | Dotted;
// Arguments consist of 1 or more expressions separated by ```,```:
Arguments := Expr{","};
## Types
/*
Before continuing to declarations and rules, we define some basic types.
*/
// The left-hand sides of declarations and rules are called assignables.
// An assignable is either an untyped identifier or a typed one. In the typed form, the first identifier is the type of the AST node.
Assignable := ID | ID ID;
// We combine simple values with commas to make a list:
SimpleValueList := SimpleValue{","};
// Then we have an array:
SimpleArray := "[" SimpleValueList "]";
## Declarations of Terminals
// We declare a terminal with the syntax:
Declaration := Assignable "=" TerminalInit ";";
/* A terminal can be initialized by:
* String literal (like keyword)
* Simple Array (same effect as connecting terminals with ```|``` operator)
* Call to a C# constructor or method that returns an instance of a terminal
*/
TerminalInit := STRING | SimpleArray | CSharpCall;
// A C# call is actually a code snippet in C#. Note that Sarcasm will not create the classes or methods for you.
// Make sure the classes are accessible from the parser's namespace and that you have created the methods in the parser class:
CSharpFn := ID "(" Arguments ")" | ID "(" ")";
CSharpCall := "new" CSharpFn | CSharpFn;
## Rules of Production
### Basic form
// Rules of production are plain and simple EBNF. Each rule ends with a ```;```:
Rule := Assignable ":=" Production ";";
// The right-hand side is a production. There are only two operations:
Production := And | Or;
### Nodes
/*
Terminals, non-terminals and sub-productions are called nodes.
Terminals and non-terminals can always be identified with SimpleValue.
*/
Node := shift SimpleValue | Parenthetical;
/*
Sub-productions can be put between parentheses to control associativity:
*/
Parenthetical := "(" Production ")";
### Repeating
/*
Four trailing operators are defined to describe repetition of any node:
* ```?``` Question Rule: 0 or 1 times
* ```+``` Plus Rule: 1 or more times
* ```*``` Star Rule: 0 or more times
* ```{}``` List Rule: 1 or more times
*/
REPEAT = ["?", "+", "*", "{}"];
// The ```+```, ```*``` and ```{}``` operators can be augmented with a delimiter:
Delimiter := shift "(" SimpleValue ")";
RepeatWithDelimiter := "+" Delimiter
| "*" Delimiter
| "{" SimpleValue "}";
// Together these make up the repeat operations:
Repeat := REPEAT | RepeatWithDelimiter;
### Grammar Hints
/*
When a grammar becomes more and more complex, Irony sometimes gets confused and produces conflicts.
Grammar hints are provided to resolve those conflicts.
*/
// Some hints are plain and simple:
SIMPLE_HINTS = ["shift", "reduce"];
// Others are conditional:
CONDITIONAL_HINTS = ["shiftIf", "reduceIf"];
// That's it:
Hint := SIMPLE_HINTS | CONDITIONAL_HINTS "(" SimpleValueList ")";
### Adding Things Up
// We augment nodes with repeats and hints:
NodeExt := Node
| Node Repeat reduce
| Hint;
/*
Writing nodes sequentially from left to right performs an AND operation:
*/
And := NodeExt{};
/*
Alternatives are described with the or (```|```) operator:
*/
Or := Production "|" Production;
## Wrapping Up
// Mark transient non-terminals:
@transient Stmt, Expr, Header, Comment, SimpleValue, Production, Node
// Mark punctuations:
@punctuation "(", ")", "[", "]", "{", "}"
// Register operators:
@operator 0, Left, "|"
@operator 1, Neutral, "shift", "reduce", "shiftIf", "reduceIf"
@operator 2, Left, REPEAT