SEML is a YAML variant intended to spare you from reading the absurdly large YAML spec. Every SEML document is also valid YAML, but not every YAML document is valid SEML.
Each file corresponds to a JSON value. To avoid the Norway problem (YAML 1.1 reading the unquoted country code `no` as the boolean false) and similar issues, all scalar values are parsed as strings.
This is version 1.0 of the SEML spec.
ALNUM := Unicode alpha, Unicode digit, underscore, dash
CHAR := Everything except '\n'
WHITE := ' '*
KEY := ALNUM+
VALUE := CHAR+
INDENT := WHITE*
OBJECT_ENTRY := KEY ':' VALUE?
ARRAY_ENTRY := '-' VALUE?
EMPTY_LINE :=
COMMENT := '#' VALUE?
LINE := INDENT (OBJECT_ENTRY | ARRAY_ENTRY | EMPTY_LINE | COMMENT) '\n'
SEML := LINE*
Split the file into lines. Discard empty lines and comments.
Each line has an indentation, which is the number of spaces in front. The first line must have an indentation of 0.
Array entries are considered indented by one greater than the number of spaces.
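The preprocessing described above can be sketched in Python as follows. This is a minimal sketch, not the reference parser: the names `Entry` and `tokenize` are hypothetical, `\w` is used as an approximation of "Unicode alpha, Unicode digit, underscore", and two points the grammar leaves open are decided by assumption (a line such as `-foo: bar` is read as an object entry, and a value consisting only of whitespace is treated as absent).

```python
import re
from typing import NamedTuple, Optional

# KEY ':' VALUE?  (\w approximates Unicode letters, digits, and underscore)
_OBJECT_RE = re.compile(r"([\w-]+):(.*)")
# '-' VALUE?
_ARRAY_RE = re.compile(r"-(.*)")


class Entry(NamedTuple):
    indent: int           # effective indentation (array entries get +1)
    key: Optional[str]    # None for array entries
    value: Optional[str]  # None when the entry has no value


def tokenize(text: str) -> list[Entry]:
    """Split text into lines, drop empty lines and comments, classify the rest."""
    entries = []
    for number, line in enumerate(text.split("\n"), start=1):
        body = line.lstrip(" ")
        spaces = len(line) - len(body)
        if body == "" or body.startswith("#"):
            continue  # empty lines and comments are discarded
        # Assumption: the object form is tried first, so "-foo: bar" is a key.
        match = _OBJECT_RE.fullmatch(body)
        if match:
            key, raw, bump = match.group(1), match.group(2), 0
        else:
            match = _ARRAY_RE.fullmatch(body)
            if match is None:
                raise ValueError(f"line {number}: not a valid SEML line")
            key, raw, bump = None, match.group(1), 1
        # Assumption: a whitespace-only value counts as "no value".
        value = raw.strip() or None
        entries.append(Entry(spaces + bump, key, value))
    return entries
```

Calling `tokenize("  - 1")` yields a single array entry whose effective indentation is 3: two spaces plus one for being an array entry.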
The parsing operation may recurse, so it can be parameterized with a minimum indentation. It consumes lines from the input text and returns a list of at least one entry. Either all entries are object entries or all are array entries; otherwise it is an error.
The first line's indentation is the expected indentation. It must be greater than the minimum indentation, if one is passed.
Any line that has indentation greater than expected is an error.
Repeat while the current line's indentation is equal to the expected indentation:
If the current element has a value, remove whitespace from the front and back of the value and add the element to the list to be returned.
If the current element is an array element without a value, recurse with the expected indentation as the minimum indentation. Build an object or an array from the returned entries, depending on their kind, and add it to the returned list as an array element.
If the current element is an object element without a value, recurse in the same way. Add an object element with the current element's key and the resulting object or array as its value to the returned list.
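The recursive step can be sketched as follows. This is an illustration under stated assumptions, not the reference parser: the input is assumed to be already reduced to hypothetical `(indent, key, value)` tuples, where `indent` is the effective indentation, `key` is `None` for array entries, and `value` is `None` when the entry has no value. The function names `parse_block` and `parse` are likewise made up for this sketch.

```python
from typing import Optional, Union

# Hypothetical pre-tokenized line: (effective indent, key or None, value or None).
Line = tuple[int, Optional[str], Optional[str]]
Value = Union[str, list, dict]


def parse_block(lines: list[Line], pos: int,
                minimum: Optional[int]) -> tuple[Value, int]:
    """Parse one block starting at lines[pos]; return (value, next position)."""
    if pos >= len(lines):
        raise ValueError("expected a nested block, found end of input")
    expected = lines[pos][0]
    if minimum is not None and expected <= minimum:
        raise ValueError("nested block must be indented deeper than its parent")
    is_array = lines[pos][1] is None
    result: Value = [] if is_array else {}
    while pos < len(lines) and lines[pos][0] == expected:
        _, key, value = lines[pos]
        if (key is None) != is_array:
            raise ValueError("object and array entries mixed in one block")
        pos += 1
        if value is None:
            # No inline value: the nested block becomes this entry's value.
            value, pos = parse_block(lines, pos, expected)
        if is_array:
            result.append(value)
        else:
            # Assumption: on duplicate keys the last one wins (spec is silent).
            result[key] = value
    if pos < len(lines) and lines[pos][0] > expected:
        raise ValueError("line indented deeper than its block")
    return result, pos


def parse(lines: list[Line]) -> Value:
    """Parse a whole document; every line must be consumed."""
    value, pos = parse_block(lines, 0, None)
    if pos != len(lines):
        raise ValueError("unexpected dedent below the first line's indentation")
    return value
```

Building the object or array directly, rather than returning a flat list of entries and converting afterwards, is a simplification chosen for this sketch; the "all object entries or all array entries" rule is enforced while the block is being read.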
foo:
  bar: baz
  whee:
    - 1
    # 1 was too small.
    - 2
    -
      key: value
This corresponds to the JSON object:
{
  "foo": {
    "bar": "baz",
    "whee": [
      "1",
      "2",
      {
        "key": "value"
      }
    ]
  }
}
Python parser: