@corneliusroemer
Created December 19, 2023 01:59
ANTLR4 grammar for parsing Newick trees
// A Newick grammar based on the specification by Gary Olsen, available at
// https://phylipweb.github.io/phylip/newick_doc.html
// The grammar does _not_ include the optional comments, as I have not figured out how to include
// them without adding a lot of `comment?` rules to the grammar, which makes it very slow.
grammar Newick;
tree: descendantList rootLabel? branchLength? SEMI;
descendantList: LPAREN subtree (COMMA subtree)* RPAREN;
subtree: (descendantList internalNodeLabel? | leafLabel) branchLength?;
rootLabel: label;
internalNodeLabel: label;
leafLabel: label;
label: (unquotedLabel | quotedLabel);
unquotedLabel: STRING;
quotedLabel:
SQUOTE (
STRING
| LPAREN
| RPAREN
| COMMA
| SEMI
| WS
| COLON
| doubleSquote
)* SQUOTE;
doubleSquote: SQUOTE SQUOTE;
branchLength: COLON (SIGNED_NUMBER | UNSIGNED_NUMBER);
// Lexer Rules
LPAREN: '(';
RPAREN: ')';
COMMA: ',';
SEMI: ';';
COLON: ':';
SQUOTE: '\'';
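// The sign is mandatory here: with an optional '-', SIGNED_NUMBER would also match every
// unsigned number and, being listed first, would shadow UNSIGNED_NUMBER entirely.
SIGNED_NUMBER: '-' UNSIGNED_NUMBER;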
UNSIGNED_NUMBER: [0-9]+ ('.' [0-9]+)?;
STRING: ~[ \t\r\n()[\],;:']+;
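// Whitespace is discarded by the lexer, so the parser never receives WS tokens. As a result the
// `WS` alternative in quotedLabel cannot match, and whitespace inside quoted labels is dropped.
WS: [ \t\r\n]+ -> skip;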
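// Sketch, not part of the grammar above and untested: one possible way to support Newick's
// bracketed comments without threading `comment?` through the parser rules might be to discard
// them in the lexer, the same way WS is skipped. This assumes comments are never nested and that
// their contents are not needed downstream (for example, no NHX-style annotations). It would
// also silently drop bracketed text inside quoted labels. The rule name COMMENT is hypothetical.
// Uncomment to try it:
// COMMENT: '[' ~[\]]* ']' -> skip;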