@jessestricker
Last active April 17, 2018 11:07
Unicode Syntax Notification — Specification

TODO: Add introduction paragraph.

1 Term Definitions

  • 1.1 Character: The smallest possible matchable unit. Defined as a Unicode Scalar Value.
  • 1.2 Property: A binary character property as defined by Unicode. Its value is the set of all characters that have this property set to true. The available properties are extracted from the UCD (DerivedCoreProperties.txt and PropList.txt).
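
Both definitions can be illustrated with a short Python sketch. It assumes the usual layout of the UCD data files (a code point or a `XXXX..YYYY` range, a semicolon, the property name, then an optional `#` comment). The function names and the sample lines are illustrative only and are not part of USN.

```python
def is_scalar_value(code_point):
    """True for U+0000..U+10FFFF, excluding the surrogates U+D800..U+DFFF."""
    return 0 <= code_point <= 0x10FFFF and not (0xD800 <= code_point <= 0xDFFF)


def parse_property_lines(lines):
    """Collect inclusive (low, high) code point ranges per property name.

    Assumption: each data line looks like 'XXXX..YYYY ; Name # comment'.
    """
    ranges = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()           # drop the trailing comment
        fields = [field.strip() for field in data.split(";")]
        if len(fields) != 2 or not fields[0]:
            continue                                    # blank, comment-only, or non-binary entry
        low, _, high = fields[0].partition("..")
        entry = (int(low, 16), int(high or low, 16))    # a single code point is a one-element range
        ranges.setdefault(fields[1], []).append(entry)
    return ranges


def has_property(ranges, name, ch):
    """True if the character ch falls inside any range recorded for the property."""
    return any(low <= ord(ch) <= high for low, high in ranges.get(name, []))


# Sample lines written in the layout of DerivedCoreProperties.txt.
SAMPLE = [
    "# comment lines are skipped",
    "0041..005A    ; Alphabetic # L&  [26] LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
    "00AA          ; Alphabetic # Lo       FEMININE ORDINAL INDICATOR",
]
```

A real implementation would read the complete files for the targeted Unicode version rather than a hand-picked sample.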

2 Notation

Rule definitions form the main part of USN. A rule assigns a name to an expression, so that the expression can be referenced from within other expressions. The notation for a rule is the following:

rule_name = expression ;

An expression can be formed from a set of atomic values, compounds and operators, which are elaborated in the next sections.

Comments may be used anywhere in USN to leave notes or information for readers. A comment begins with two forward slashes (//) and extends to the end of the line.

// comment at beginning of line
rule = A | B ; // comment trailing a rule
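
As a rough illustration of how a tool might read this notation, the Python sketch below strips a trailing comment and splits a rule into its name and its expression text. It rests on two assumptions that this specification does not state: each rule fits on one line, and quoted literals contain no escape sequences. The names `strip_comment` and `parse_rule` are hypothetical.

```python
def strip_comment(line):
    """Remove a trailing // comment, ignoring slashes inside quoted literals."""
    quote = None
    for index, ch in enumerate(line):
        if quote:
            if ch == quote:            # closing quote of the current literal
                quote = None
        elif ch in "'\"":
            quote = ch                 # opening quote of a literal
        elif line.startswith("//", index):
            return line[:index].rstrip()
    return line.rstrip()


def parse_rule(line):
    """Return (rule_name, expression_text), or None for blank or comment-only lines."""
    body = strip_comment(line).strip()
    if not body:
        return None
    if not body.endswith(";"):
        raise ValueError("a rule must end with ';'")
    name, separator, expression = body[:-1].partition("=")
    if not separator or not name.strip():
        raise ValueError("expected 'rule_name = expression ;'")
    return name.strip(), expression.strip()
```

For the two example lines above, `parse_rule` returns `None` for the comment-only line and `("rule", "A | B")` for the rule.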

2.1 Atomic Expressions

These expressions match a single character and make up the core of USN.

Expression  Match Description                                  Example
#N          The character with the hexadecimal value of N.     #200E
#[N-M]      Any character within the closed range of N to M.   #[30-39]
§Prop       Any character with the property Prop set to true.  §ID_Start
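
The first two forms can be sketched in Python as follows. The sketch assumes that the bounds of a range are hexadecimal, like the single-character form (the example #[30-39] then covers the ASCII digits). The §Prop form is left out because it needs property data from the UCD. The helper names are illustrative, not part of USN.

```python
import re

_SINGLE = re.compile(r"#([0-9A-Fa-f]+)")
_RANGE = re.compile(r"#\[([0-9A-Fa-f]+)-([0-9A-Fa-f]+)\]")


def parse_atomic(expression):
    """Return the inclusive (low, high) code point range of a #N or #[N-M] expression."""
    single = _SINGLE.fullmatch(expression)
    if single:
        value = int(single.group(1), 16)
        return (value, value)
    span = _RANGE.fullmatch(expression)
    if span:
        low, high = int(span.group(1), 16), int(span.group(2), 16)
        if low > high:
            raise ValueError("range bounds are reversed")
        return (low, high)
    raise ValueError("not a #N or #[N-M] expression")


def matches(expression, ch):
    """True if the single character ch is matched by the atomic expression."""
    low, high = parse_atomic(expression)
    return low <= ord(ch) <= high
```

With these helpers, `matches("#[30-39]", "7")` is true and `matches("#[30-39]", "a")` is false.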

2.2 Compound Expressions

The compound expressions match a sequence of multiple characters.

Expression  Match Description                                                Example
'…' or "…"  The sequence of characters within the quotes.                    "struct"
Rule        The sequence of characters matched by the referenced rule Rule.  literal
(expr)      The expression expr. Use parentheses to clear ambiguity.         (A | B)

2.3 Operator Expressions

The operators combine atomic and compound expressions to form the lexical structure of the described syntax.

Expression  Match Description                                                             Short
A?          The expression A or nothing.                                                  Optional A
A+          One or more occurrences of the expression A.                                  1‥n times A
A*          Zero or more occurrences of the expression A.                                 0‥n times A
A B         The expression A followed by the expression B.                                Concatenation
A | B       Either the expression A or the expression B, exclusively.                     Alternation
A - B       A character sequence that matches the expression A but not the expression B.  Exception
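
One possible reading of these operators is sketched below as Python matcher combinators. Each matcher takes a text and a start position and returns the set of positions where a match may end. All names are hypothetical. Two interpretations are assumptions of this sketch rather than statements of the specification: alternation is treated as a plain choice between A and B, and the exception A - B keeps every span matched by A unless B matches exactly the same span.

```python
def char_range(low, high):
    """Matcher for one character in the inclusive code point range low..high."""
    def match(text, pos):
        if pos < len(text) and low <= ord(text[pos]) <= high:
            return {pos + 1}
        return set()
    return match


def literal(chars):
    """Matcher for a quoted character sequence."""
    return lambda text, pos: {pos + len(chars)} if text.startswith(chars, pos) else set()


def optional(a):                      # A?
    return lambda text, pos: {pos} | a(text, pos)


def concat(a, b):                     # A B
    return lambda text, pos: {end for mid in a(text, pos) for end in b(text, mid)}


def alternation(a, b):                # A | B
    return lambda text, pos: a(text, pos) | b(text, pos)


def exception(a, b):                  # A - B (assumed: same span must not match B)
    return lambda text, pos: a(text, pos) - b(text, pos)


def star(a):                          # A*
    def match(text, pos):
        seen, frontier = {pos}, {pos}
        while frontier:               # expand until no new end positions appear
            frontier = {end for p in frontier for end in a(text, p)} - seen
            seen |= frontier
        return seen
    return match


def plus(a):                          # A+
    return concat(a, star(a))


def fullmatch(matcher, text):
    """True if the matcher can consume the whole text."""
    return len(text) in matcher(text, 0)
```

For example, `exception(plus(char_range(0x61, 0x7A)), literal("if"))` accepts "iff" but rejects "if", which is the typical use of the exception operator for separating identifiers from keywords.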

2.4 Operator Precedence

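The order in which expressions are evaluated is defined by their individual precedence. The levels below are listed from highest precedence (binds most tightly) to lowest, so concatenation binds less tightly than | and -. Operators with equal precedence are evaluated from left to right.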

  1. Atomics and compound expressions
  2. Unary operators: ?, +, *
  3. Binary operators: |, -
  4. Concatenation

Examples:
A | B? = A | (B?)
A C - B = A (C - B)
A?* = (A?)*
A | B - C = (A | B) - C
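
The precedence levels can be checked mechanically. The Python sketch below is a small recursive-descent parser with one function per level. To stay short it only accepts rule names, parentheses, and the operators of section 2.3 (no literals or atomic expressions); this restriction and all names are choices of the sketch. Two inputs that group the same way produce equal trees.

```python
import re

_TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[()?+*|-])")


def tokenize(text):
    """Split an expression into rule names, parentheses, and operator symbols."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        found = _TOKEN.match(text, pos)
        if found is None:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(found.group(1))
        pos = found.end()
    return tokens


def parse(text):
    """Parse an expression into nested tuples that make the grouping explicit."""
    tokens = tokenize(text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def atom():                       # level 1: rule references and (expr)
        token = peek()
        if token == "(":
            take()
            node = concatenation()
            if peek() != ")":
                raise ValueError("expected ')'")
            take()
            return node
        if token is not None and (token[0].isalpha() or token[0] == "_"):
            return ("ref", take())
        raise ValueError(f"unexpected token {token!r}")

    def unary():                      # level 2: postfix ?, +, *
        node = atom()
        while peek() in ("?", "+", "*"):
            node = (take(), node)
        return node

    def binary():                     # level 3: | and -, left to right
        node = unary()
        while peek() in ("|", "-"):
            operator = take()
            node = (operator, node, unary())
        return node

    def concatenation():              # level 4: juxtaposition binds last
        items = [binary()]
        while peek() not in (None, ")"):
            items.append(binary())
        return items[0] if len(items) == 1 else ("cat", *items)

    tree = concatenation()
    if peek() is not None:
        raise ValueError(f"unexpected token {peek()!r}")
    return tree
```

For instance, `parse("A | B - C")` and `parse("(A | B) - C")` return the same tree, while `parse("A | (B - C)")` returns a different one.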

3 About

Author: Jesse Stricker
Version: 0.5.0
