@mdakin
Created February 13, 2013 15:50
Simple ANTLR 4 lexer for tokenizing text, including Turkish characters
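/*
 * TestLexer: splits text into numbers, words (including Turkish letters),
 * words with an apostrophe suffix, alphanumerical tokens, and punctuation.
 * Whitespace is skipped; any other character becomes an Unknown token.
 */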
lexer grammar TestLexer;
@header {
package lexer;
}
Whitespace
: [ \t\n\r]+ -> skip;
fragment Digit: [0-9];
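// Lowercase letters: ASCII a-z plus the Turkish letters ç ğ ı ö ş ü and the
// circumflexed vowels â î û. Each fragment matches a single character.
fragment Letters
: [a-z\u00e7\u011f\u0131\u00f6\u015f\u00fc\u00e2\u00ee\u00fb];
// ASCII-only alternative: [a-z];
// Uppercase counterparts: A-Z plus Ç Ğ İ Ö Ş Ü Â Î Û.
fragment LettersCapital
: [A-Z\u00c7\u011e\u0130\u00d6\u015e\u00dc\u00c2\u00ce\u00db];
// ASCII-only alternative: [A-Z];
// Digits, lowercase and uppercase letters (including Turkish), and the hyphen.
fragment AllAlphanumerical
: [0-9a-zA-Z\u00e7\u011f\u0131\u00f6\u015f\u00fc\u00e2\u00ee\u00fb\u00c7\u011e\u0130\u00d6\u015e\u00dc\u00c2\u00ce\u00db\-];
// ASCII-only alternative: [0-9a-zA-Z];
// An apostrophe followed by a lowercase suffix, e.g. the 'dan in Ankara'dan.
fragment AposAndSuffix: '\'' Letters+;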
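// Rule order matters: when two rules match input of the same length, ANTLR
// picks the one defined first. So "123" is a Number rather than an
// Alphanumerical, and "kitap" is a Word rather than an Alphanumerical.
Number
: Digit+ ;
// A lowercase word with an optional leading capital, e.g. kitap, Ankara.
Word
: LettersCapital? Letters+;
// A Word followed by an apostrophe suffix, e.g. Ankara'dan.
WordWithApos
: LettersCapital? Letters+ AposAndSuffix;
// Other runs of letters, digits, and hyphens (ANKARA, B2B, x-ray),
// optionally followed by an apostrophe suffix.
Alphanumerical
: AllAlphanumerical+ AposAndSuffix?;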
Punctuation
: [.,?];
// Fallback: any single character not matched by the rules above.
Unknown : . ;
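An ANTLR lexer picks the longest match among all rules and, on a tie, the rule defined first. To see how that plays out with this grammar without running the ANTLR tool, here is a hypothetical hand-written Python approximation of TestLexer's rules using the standard `re` module. It is not the ANTLR-generated lexer; the `RULES` table and the `tokenize` function are illustrative assumptions, and the character classes are copied from the fragments above.

```python
import re

# Character classes copied from the grammar's fragments.
LOWER = "a-z\u00e7\u011f\u0131\u00f6\u015f\u00fc\u00e2\u00ee\u00fb"  # Letters
UPPER = "A-Z\u00c7\u011e\u0130\u00d6\u015e\u00dc\u00c2\u00ce\u00db"  # LettersCapital
APOS_SUFFIX = f"'[{LOWER}]+"                                         # AposAndSuffix

# Rules in grammar order; the order decides ties between equal-length matches.
RULES = [
    ("Whitespace", re.compile(r"[ \t\n\r]+")),
    ("Number", re.compile(r"[0-9]+")),
    ("Word", re.compile(f"[{UPPER}]?[{LOWER}]+")),
    ("WordWithApos", re.compile(f"[{UPPER}]?[{LOWER}]+{APOS_SUFFIX}")),
    ("Alphanumerical", re.compile(f"[0-9{LOWER}{UPPER}\\-]+(?:{APOS_SUFFIX})?")),
    ("Punctuation", re.compile(r"[.,?]")),
    ("Unknown", re.compile(r".", re.DOTALL)),  # any single character
]


def tokenize(text):
    """Return (rule_name, lexeme) pairs using longest match, then first rule."""
    pos, tokens = 0, []
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best,
            # so on a tie the earlier rule wins.
            if m and m.end() - pos > best_len:
                best_name, best_len = name, m.end() - pos
        if best_name != "Whitespace":  # mirrors `-> skip`
            tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


for token in tokenize("Ankara'dan 3 kitap aldım, değil mi? x-ray"):
    print(token)
```

In this sketch `Ankara'dan` comes out as a single WordWithApos token: it ties with Alphanumerical at 10 characters, and WordWithApos is defined first. `x-ray` becomes an Alphanumerical because the hyphen appears only in AllAlphanumerical, which gives that rule the longer match.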