@eeskildsen
Last active January 31, 2024 12:43
Notes on using ANTLR4 to parse PL/SQL in C#.

Setting Up ANTLR

  1. Download and set up the ANTLR4 tool so that the antlr4 command used below is available
  2. Download PlSqlLexer.g4 and PlSqlParser.g4 from https://github.com/antlr/grammars-v4/tree/master/sql/plsql

Building the Grammar

  1. Build the lexer (https://stackoverflow.com/a/55379369/1958726):

    antlr4 PlSqlLexer.g4 -Dlanguage=CSharp
    
  2. Build the parser:

    antlr4 PlSqlParser.g4 -Dlanguage=CSharp
    

Creating a Project

  1. PowerShell:

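    dotnet new console -n MyPlSqlProject
    cd .\MyPlSqlProject
    dotnet add package Antlr4.Runtime.Standard  # ANTLR's C# runtime; keep its version in line with the antlr4 tool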
    iwr https://github.com/antlr/grammars-v4/raw/master/sql/plsql/CSharp/PlSqlLexerBase.cs -OutFile PlSqlLexerBase.cs
    iwr https://github.com/antlr/grammars-v4/raw/master/sql/plsql/CSharp/PlSqlParserBase.cs -OutFile PlSqlParserBase.cs
    iwr https://raw.githubusercontent.com/antlr/antlr4/master/doc/resources/CaseChangingCharStream.cs -OutFile CaseChangingCharStream.cs
    
  2. Copy the .cs files that ANTLR generated in the previous section into the project folder.

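  3. Fix namespaces: PlSqlLexerBase.cs and PlSqlParserBase.cs are in the PlSqlParseTree namespace, but the generated PlSqlLexer and PlSqlParser that subclass them are not. Either remove the namespace from the two base files, or add using PlSqlParseTree; to the generated files. Another option: the antlr4 tool's -package option sets the namespace of the generated code, so generating the lexer and parser with -package PlSqlParseTree should put everything in one namespace (Program.cs then needs using PlSqlParseTree; as well).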

  4. Add the missing constructor overload to PlSqlLexerBase.cs (the generated PlSqlLexer passes input, output, and errorOutput to its base constructor):

    public PlSqlLexerBase(ICharStream input, TextWriter output, TextWriter errorOutput) : base(input, output, errorOutput) { }
    
  5. Add the missing constructor overload to PlSqlParserBase.cs (the generated PlSqlParser calls it the same way):

    protected PlSqlParserBase(ITokenStream input, TextWriter output, TextWriter errorOutput) : base(input, output, errorOutput) { }
    
  6. In PlSqlLexerBase.cs, change int la = _input.La(pos); to int la = InputStream.LA(pos); if it hasn't been fixed in the repo yet. The Antlr4.Runtime.Standard runtime names the lookahead method LA, not La.

  7. Insert into Program.cs:

    using Antlr4.Runtime;
    
    var charStream = CharStreams.fromString("My PL/SQL"); // or CharStreams.fromPath(path), etc.
    var caseChangingCharStream = new CaseChangingCharStream(charStream, true); // Required for PL/SQL; see https://github.com/antlr/grammars-v4/blob/master/sql/plsql/README.md
    
    var lexer = new PlSqlLexer(caseChangingCharStream);
    var commonTokenStream = new CommonTokenStream(lexer);
    var parser = new PlSqlParser(commonTokenStream);
    var tree = parser.sql_script(); // parse, starting from the grammar's start rule
    
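  8. For faster compilation, move the ANTLR classes (the generated parser is very large) into their own class library project, and in Visual Studio uncheck Build for that project under Build > Configuration Manager so it isn't recompiled on every build. Rebuild it by hand when the grammar changes.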
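The CaseChangingCharStream wrapper from step 7 is needed because, per the grammars-v4 README linked there, the PL/SQL lexer expects keywords in uppercase. The wrapper uppercases the characters the lexer sees through LA(), while GetText() still reads the untouched input. Here is a minimal sketch of that idea over a plain string; UpperLookaheadStream and KeywordDemo are hypothetical names for illustration, not part of ANTLR or of the real CaseChangingCharStream (which implements ANTLR's ICharStream).

```csharp
using System;

// Hypothetical, simplified illustration of the CaseChangingCharStream idea.
// The real class wraps an ANTLR ICharStream; this one wraps a plain string.
public sealed class UpperLookaheadStream
{
    public const int EOF = -1; // same value ANTLR uses for end of input

    private readonly string _text;

    public UpperLookaheadStream(string text)
    {
        _text = text;
    }

    // Position of the next character to be consumed.
    public int Index { get; private set; }

    // 1-based lookahead (LA(1) is the next character), uppercased.
    // Only positive offsets are supported in this sketch.
    public int LA(int i)
    {
        if (i < 1)
            throw new ArgumentOutOfRangeException(nameof(i), "only positive offsets are supported in this sketch");
        int pos = Index + i - 1;
        if (pos >= _text.Length) return EOF;
        return char.ToUpperInvariant(_text[pos]);
    }

    // Advances one character, stopping at the end of the input.
    public void Consume()
    {
        if (Index < _text.Length) Index++;
    }

    // Reads from the original string, so the input's casing is preserved.
    // start and stop are inclusive, like ANTLR's Interval.
    public string GetText(int start, int stop)
    {
        return _text.Substring(start, stop - start + 1);
    }
}

public static class KeywordDemo
{
    // Compares lookahead against an uppercase keyword, the way a lexer rule
    // such as SELECT: 'SELECT'; would.
    public static bool MatchesKeyword(UpperLookaheadStream input, string keyword)
    {
        for (int i = 0; i < keyword.Length; i++)
        {
            if (input.LA(i + 1) != keyword[i]) return false;
        }
        return true;
    }
}
```

So select, Select, and SELECT all match the same uppercase keyword, while text read back out of the stream keeps the casing the user typed.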

Using XPath

If you just need to grab specific things, like all table/view names in the FROM clause of a CREATE VIEW statement, you can use ANTLR's XPath class. Example:

    // tree is the result of parser.sql_script(); parser is the PlSqlParser instance
    var tableviews = XPath.FindAll(tree, "//create_view//from_clause//tableview_name", parser);
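Each // step in that path means "at any depth below the nodes matched so far", so the query returns tableview_name nodes anywhere inside a from_clause that is itself anywhere inside a create_view, not only direct children. The toy below mimics that behavior for paths made only of //name steps. Node and ToyXPath are hypothetical stand-ins for illustration; ANTLR's actual XPath class works on IParseTree and also handles / steps, wildcards, and token names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical toy parse-tree node; ANTLR's real nodes implement IParseTree.
public sealed class Node
{
    public Node(string name, params Node[] children)
    {
        Name = name;
        Children = new List<Node>(children);
    }

    public string Name { get; }       // rule name, e.g. "from_clause"
    public string Text { get; set; }  // optional label, handy for leaves
    public List<Node> Children { get; }

    // This node followed by every node below it, depth-first.
    public IEnumerable<Node> SelfAndDescendants()
    {
        yield return this;
        foreach (var child in Children)
            foreach (var node in child.SelfAndDescendants())
                yield return node;
    }
}

public static class ToyXPath
{
    // Handles only paths built from "//name" steps, e.g. "//a//b//c".
    public static List<Node> FindAll(Node root, string path)
    {
        string[] steps = path.Split(new[] { "//" }, StringSplitOptions.RemoveEmptyEntries);
        var matches = new List<Node> { root };
        foreach (string name in steps)
        {
            // Keep every node with this name at or below any current match,
            // skipping duplicates (reference equality) and keeping first-seen order.
            var seen = new HashSet<Node>();
            var next = new List<Node>();
            foreach (var node in matches.SelectMany(n => n.SelfAndDescendants()))
            {
                if (node.Name == name && seen.Add(node))
                    next.Add(node);
            }
            matches = next;
        }
        return matches;
    }
}
```

In a tree where a second from_clause sits under a standalone select_statement, its tableview_name is not returned, because it isn't inside a create_view.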

Getting a Context's Text, Including Whitespace

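context.GetText() just concatenates the text of the context's child nodes. The PL/SQL lexer sends whitespace to the hidden channel, so it never becomes a parse-tree node and isn't included.

To get the full, original text, including whitespace, read the character range from the context's first token to its last token out of the char stream (Interval is in Antlr4.Runtime.Misc):

    caseChangingCharStream.GetText(new Interval(context.Start.StartIndex, context.Stop.StopIndex));

CaseChangingCharStream.GetText reads from the wrapped stream, so the result keeps its original casing.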
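The difference between the two approaches can be sketched without ANTLR: joining the visible tokens loses the whitespace between them, while slicing the original input from the first token's StartIndex to the last token's StopIndex keeps everything. Both indices are inclusive (as with ANTLR's Interval), which is why the length is stop - start + 1. ToyToken and TextRecovery are hypothetical simplifications, with whitespace runs standing in for hidden-channel tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal token. Like ANTLR's IToken, StartIndex and StopIndex
// are inclusive character offsets into the original input.
public sealed class ToyToken
{
    public string Text { get; set; }
    public int StartIndex { get; set; }
    public int StopIndex { get; set; }
    public bool Hidden { get; set; } // whitespace, standing in for a HIDDEN-channel token
}

public static class TextRecovery
{
    // Splits input into runs of whitespace (hidden) and non-whitespace (visible).
    public static List<ToyToken> Tokenize(string input)
    {
        var tokens = new List<ToyToken>();
        int i = 0;
        while (i < input.Length)
        {
            int start = i;
            bool isSpace = char.IsWhiteSpace(input[i]);
            while (i < input.Length && char.IsWhiteSpace(input[i]) == isSpace) i++;
            tokens.Add(new ToyToken
            {
                Text = input.Substring(start, i - start),
                StartIndex = start,
                StopIndex = i - 1,
                Hidden = isSpace
            });
        }
        return tokens;
    }

    // Roughly what context.GetText() gives you: visible token text, joined.
    public static string JoinVisible(IEnumerable<ToyToken> tokens)
    {
        return string.Concat(tokens.Where(t => !t.Hidden).Select(t => t.Text));
    }

    // What the Interval approach gives you: the original characters from the
    // first token's StartIndex through the last token's StopIndex.
    public static string Slice(string input, ToyToken start, ToyToken stop)
    {
        return input.Substring(start.StartIndex, stop.StopIndex - start.StartIndex + 1);
    }
}
```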
@taozuhong

Thanks awfully.

@eeskildsen

Sure, glad it helped!

@mbp

mbp commented Nov 17, 2023

Thanks, it helped me.