TODO: Add introduction paragraph.
- 1.1 Character: The smallest matchable unit, defined as a Unicode Scalar Value.
- 1.2 Property: A binary character property as defined by Unicode. Its value is the set of all characters that have this property set to true. The available properties are extracted from the UCD (DerivedCoreProperties.txt and PropList.txt).
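The UCD files named above list each binary property as data lines of the form `START..END ; Property_Name # comment`, or with a single code point instead of a range. The sketch below shows one way to load them into per-property range lists. `parse_prop_list` and `has_property` are hypothetical helpers, not part of USN. Lines with an extra value field (non-binary properties) are skipped.

```python
from collections import defaultdict


def parse_prop_list(lines):
    """Collect closed code point ranges per binary property from UCD-style lines."""
    props = defaultdict(list)
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue  # blank or comment-only line
        fields = [f.strip() for f in data.split(";")]
        if len(fields) != 2:
            continue  # assumption: extra fields mean a non-binary property
        cps, name = fields
        start, _, end = cps.partition("..")
        lo = int(start, 16)
        hi = int(end, 16) if end else lo  # a single code point is a one-element range
        props[name].append((lo, hi))
    return dict(props)


def has_property(props, name, ch):
    """True if the character ch lies in any range recorded for property `name`."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in props.get(name, ()))
```

In practice the lines would come from reading PropList.txt or DerivedCoreProperties.txt line by line.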
The main part of a USN grammar is its rule definitions. A rule assigns a name to an expression so that the expression can be referenced by that name within expressions. The notation for rules is the following:
rule_name = expression ;
An expression is built from atomic expressions, compound expressions, and operators, which are described in the following sections.
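The rule notation `rule_name = expression ;` can be split into name/expression pairs by scanning for each terminating `;` while skipping over quoted literals. The following is a sketch, not a reference implementation. It assumes identifier-like rule names, comment-free input, and literals without escape sequences; none of these are fixed by the text above. `split_rules` is a hypothetical name.

```python
import re

# Assumed rule-name syntax: an identifier followed by '='.
RULE_RE = re.compile(r"\s*([A-Za-z_]\w*)\s*=")


def split_rules(src: str) -> dict:
    """Split comment-free USN source into {rule_name: expression_text}."""
    rules, pos = {}, 0
    while src[pos:].strip():
        m = RULE_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"expected 'name =' at offset {pos}")
        # Scan to the terminating ';', ignoring any ';' inside '…' or "…".
        i, quote = m.end(), None
        while i < len(src):
            c = src[i]
            if quote:
                if c == quote:
                    quote = None  # closing quote
            elif c in "'\"":
                quote = c  # opening quote
            elif c == ";":
                break
            i += 1
        else:
            raise SyntaxError(f"rule {m.group(1)!r} is missing ';'")
        rules[m.group(1)] = src[m.end():i].strip()
        pos = i + 1
    return rules
```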
Comments may be used anywhere in USN to leave notes or information for readers. They begin with two forward slashes (`//`) and span to the end of the line.
// comment at beginning of line
rule = A | B ; // comment trailing a rule
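A tool reading USN has to drop these comments before parsing. It should keep a `//` that appears inside a quoted literal, because the text does not say comments are recognized there. The sketch below assumes literals do not span lines and have no escape sequences. `strip_comments` is a hypothetical name.

```python
def strip_comments(src: str) -> str:
    """Remove // comments, keeping any // that appears inside '…' or "…"."""
    out_lines = []
    for line in src.splitlines():
        quote, cut = None, len(line)
        for i, c in enumerate(line):
            if quote:
                if c == quote:
                    quote = None  # literal closed
            elif c in "'\"":
                quote = c  # literal opened
            elif line.startswith("//", i):
                cut = i  # comment starts here; drop the rest of the line
                break
        out_lines.append(line[:cut].rstrip())
    return "\n".join(out_lines)
```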
Atomic expressions match a single character and make up the core of USN.
| Expression | Match Description | Example |
|---|---|---|
| `#N` | The character with the hexadecimal value `N`. | `#200E` |
| `#[N-M]` | Any character within the closed range from `N` to `M`. | `#[30-39]` |
| `§Prop` | Any character with the property `Prop` set to true. | `§ID_Start` |
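Each atomic form can be read as a predicate on a single character. The sketch below is a minimal, hypothetical parser (`parse_atom`) for the three forms. Its property table is a hard-coded stand-in for data loaded from the UCD. It uses ASCII_Hex_Digit instead of ID_Start only because that property's ranges are short enough to write out.

```python
import re

# Hypothetical stand-in for a property table loaded from the UCD.
PROPERTIES = {"ASCII_Hex_Digit": [(0x30, 0x39), (0x41, 0x46), (0x61, 0x66)]}

ATOM_RE = re.compile(
    r"#\[([0-9A-Fa-f]+)-([0-9A-Fa-f]+)\]"  # #[N-M]
    r"|#([0-9A-Fa-f]+)"                    # #N
    r"|§(\w+)"                             # §Prop
)


def parse_atom(text):
    """Turn '#N', '#[N-M]' or '§Prop' into a predicate on one character."""
    m = ATOM_RE.fullmatch(text)
    if m is None:
        raise SyntaxError(f"not an atomic expression: {text!r}")
    lo_hex, hi_hex, single, prop = m.groups()
    if single is not None:
        cp = int(single, 16)
        return lambda ch: ord(ch) == cp
    if lo_hex is not None:
        lo, hi = int(lo_hex, 16), int(hi_hex, 16)
        return lambda ch: lo <= ord(ch) <= hi  # closed range: both ends included
    ranges = PROPERTIES[prop]  # raises KeyError for an unknown property
    return lambda ch: any(lo <= ord(ch) <= hi for lo, hi in ranges)
```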
The compound expressions match a sequence of multiple characters.
| Expression | Match Description | Example |
|---|---|---|
| `'…'` or `"…"` | The sequence of characters within the quotes. | `"struct"` |
| `Rule` | The sequence of characters matched by the referenced rule `Rule`. | `literal` |
| `(expr)` | The expression `expr`. Use parentheses to resolve ambiguity. | `(A \| B)` |
The operators combine atomic and compound expressions to form the lexical structure of the described syntax.
| Expression | Match Description | Summary |
|---|---|---|
| `A?` | The expression `A` or nothing. | Optional `A` |
| `A+` | One or more occurrences of the expression `A`. | 1‥n times `A` |
| `A*` | Zero or more occurrences of the expression `A`. | 0‥n times `A` |
| `A B` | The expression `A` followed by the expression `B`. | Concatenation |
| `A \| B` | Either the expression `A` or the expression `B`, exclusively. | Alternation |
| `A - B` | The character sequence that matches the expression `A` but not the expression `B`. | Exception |
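One way to make these operators concrete is to model every expression as a function from (text, start position) to the set of end positions it can reach. The technique is a set-of-end-positions matcher, not backtracking, and nothing in USN prescribes it. Under this model `A - B` keeps exactly the spans matched by `A` that `B` does not also match. `A | B` is treated as the union of both alternatives. That is an assumption, so the word "exclusively" in the table is not modeled. All function names are hypothetical.

```python
from typing import Callable, FrozenSet

# A matcher maps (text, start) to the set of positions where a match can end.
Matcher = Callable[[str, int], FrozenSet[int]]


def char_range(lo: int, hi: int) -> Matcher:
    """#[N-M]: one character whose code point lies in the closed range."""
    return lambda s, i: frozenset({i + 1}) if i < len(s) and lo <= ord(s[i]) <= hi else frozenset()


def lit(text: str) -> Matcher:
    """'…': the exact character sequence."""
    return lambda s, i: frozenset({i + len(text)}) if s.startswith(text, i) else frozenset()


def opt(a: Matcher) -> Matcher:
    """A?: A, or nothing (the empty match ends where it starts)."""
    return lambda s, i: a(s, i) | {i}


def plus(a: Matcher) -> Matcher:
    """A+: one or more occurrences, computed as a fixed point over end positions."""
    def m(s, i):
        result, frontier = set(), set(a(s, i))
        while frontier:
            result |= frontier
            nxt = set()
            for j in frontier:
                nxt |= a(s, j)
            frontier = nxt - result  # only explore positions not seen yet
        return frozenset(result)
    return m


def star(a: Matcher) -> Matcher:
    """A*: zero or more occurrences, i.e. (A+)?."""
    return opt(plus(a))


def seq(a: Matcher, b: Matcher) -> Matcher:
    """A B: B starting wherever A ended."""
    return lambda s, i: frozenset(k for j in a(s, i) for k in b(s, j))


def alt(a: Matcher, b: Matcher) -> Matcher:
    """A | B: modeled here as the union of both alternatives."""
    return lambda s, i: a(s, i) | b(s, i)


def exc(a: Matcher, b: Matcher) -> Matcher:
    """A - B: spans matched by A that B does not match exactly."""
    return lambda s, i: a(s, i) - b(s, i)


def fullmatch(m: Matcher, s: str) -> bool:
    """True if the matcher can consume the whole string."""
    return len(s) in m(s, 0)
```

For example, `exc(plus(char_range(0x61, 0x7A)), lit("struct"))` accepts `"structs"` but rejects `"struct"`. That behaves like a keyword excluded from an identifier rule.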
The order in which expressions are evaluated is determined by their precedence, listed here from highest to lowest. Operators with equal precedence are evaluated from left to right.
- Atomic and compound expressions
- Unary operators: `?`, `+`, `*`
- Binary operators: `|`, `-`
- Concatenation
Examples:
- `A | B?` = `A | (B?)`
- `A C - B` = `A (C - B)`
- `A?*` = `(A?)*`
- `A | B - C` = `(A | B) - C`
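The precedence list maps onto a recursive-descent parser with one function per level. Unary postfix operators bind tightest, then the left-associative binary operators `|` and `-`, and concatenation binds loosest. The sketch below prints the implied grouping with explicit parentheses. It is illustrative only. Operands are restricted to plain identifiers and parenthesized groups, and literals and the `#`/`§` atomic forms are omitted. `Parser` and `group` are hypothetical names.

```python
import re

# Tokens: identifiers (standing in for any operand) and single-character operators.
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z_]\w*)|([?+*|\-()]))")
OPERATORS = {"?", "+", "*", "|", "-", ")"}  # tokens that cannot start an operand


def tokenize(src: str) -> list:
    src, tokens, pos = src.strip(), [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at {pos}: {src[pos]!r}")
        tokens.append(m.group(1) or m.group(2))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, src: str):
        self.tokens, self.i = tokenize(src), 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.i += 1
        return tok

    def parse(self) -> str:
        tree = self.concat()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def concat(self) -> str:
        # Lowest precedence: juxtaposed binary expressions.
        node = self.binary()
        while self.peek() is not None and self.peek() not in OPERATORS:
            node = f"({node} {self.binary()})"
        return node

    def binary(self) -> str:
        # '|' and '-' share one level and group left to right.
        node = self.unary()
        while self.peek() in ("|", "-"):
            op = self.take()
            node = f"({node} {op} {self.unary()})"
        return node

    def unary(self) -> str:
        # Postfix '?', '+', '*' bind tightest and may be stacked.
        node = self.primary()
        while self.peek() in ("?", "+", "*"):
            node = f"({node}{self.take()})"
        return node

    def primary(self) -> str:
        tok = self.take()
        if tok == "(":
            node = self.concat()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is None or tok in OPERATORS:
            raise SyntaxError(f"expected an operand, got {tok!r}")
        return tok


def group(src: str) -> str:
    return Parser(src).parse()
```

For instance, `group("A | B - C")` yields `((A | B) - C)`, and `group("A?*")` yields `((A?)*)`.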
Author: Jesse Stricker
Version: 0.5.0