
@FeepingCreature
Last active March 2, 2023 13:17
SEML - Somewhat Easier YAML 1.0

SEML - Somewhat Easier YAML

This is a YAML variant intended to avoid having to read the absurdly large YAML spec. Every SEML document is also valid YAML, but not the reverse.

Each file corresponds to a JSON value. To avoid the Norway problem and similar issues, all values are parsed as strings.
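The Norway problem comes from YAML 1.1 resolving certain unquoted words, such as the country code NO, to booleans. A minimal sketch of the contrast follows. The function yaml11_like_scalar is a hypothetical stand-in for a YAML 1.1 resolver that models only its boolean spellings, while seml_scalar applies SEML's rule.

```python
# Hypothetical stand-in for a YAML 1.1 resolver: only the boolean spellings
# of YAML 1.1's bool type are modelled; every other plain scalar stays text.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}


def yaml11_like_scalar(text):
    """Resolve a plain scalar roughly the way a YAML 1.1 loader would."""
    if text in YAML11_TRUE:
        return True
    if text in YAML11_FALSE:
        return False
    return text


def seml_scalar(text):
    """SEML: every value is a string, with surrounding whitespace removed."""
    return text.strip()


countries = ["SE", "NO", "DK"]
print([yaml11_like_scalar(c) for c in countries])  # ['SE', False, 'DK']
print([seml_scalar(c) for c in countries])         # ['SE', 'NO', 'DK']
```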

This is version 1.0 of the SEML spec.

Grammar

ALNUM   := Unicode alpha, Unicode digit, underscore, dash
CHAR    := Everything except '\n'
WHITE   := ' '
KEY     := ALNUM+
VALUE   := CHAR+
INDENT  := WHITE*
OBJECT_ENTRY := KEY ':' VALUE?
ARRAY_ENTRY  := '-' VALUE?
EMPTY_LINE   :=
COMMENT      := '#' VALUE?
LINE    := INDENT (OBJECT_ENTRY | ARRAY_ENTRY | EMPTY_LINE | COMMENT) '\n'
SEML    := LINE*
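Because the grammar is line-oriented, each line can be classified on its own. The following is a minimal sketch using Python's re module; the name classify_line and the tuple it returns are assumptions for illustration. In Python 3, \w in a str pattern matches Unicode letters, digits and the underscore, so [\w-] approximates ALNUM.

```python
import re

# INDENT followed by exactly one LINE alternative, tried in this order.
LINE_RE = re.compile(
    r"(?P<indent> *)(?:"
    r"(?P<key>[\w-]+):(?P<obj_value>.+)?"  # OBJECT_ENTRY := KEY ':' VALUE?
    r"|(?P<dash>-)(?P<arr_value>.+)?"      # ARRAY_ENTRY  := '-' VALUE?
    r"|(?P<hash>#).*"                      # COMMENT      := '#' VALUE?
    r"|"                                   # EMPTY_LINE   :=
    r")"
)


def classify_line(line):
    """Return (kind, spaces, key, value) for one line without its newline.

    kind is 'object', 'array', 'comment' or 'empty'; key and value are None
    when absent. Raises ValueError if the line matches no production.
    """
    m = LINE_RE.fullmatch(line)
    if m is None:
        raise ValueError(f"line failed to parse: {line!r}")
    spaces = len(m.group("indent"))
    if m.group("key") is not None:
        return ("object", spaces, m.group("key"), m.group("obj_value"))
    if m.group("dash") is not None:
        return ("array", spaces, None, m.group("arr_value"))
    if m.group("hash") is not None:
        return ("comment", spaces, None, None)
    return ("empty", spaces, None, None)


for line in ["foo:", "  bar: baz", "  - 1", "  # 1 was too small.", ""]:
    print(classify_line(line))
```

Values are returned raw; trimming them is left to the parsing step.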

Parsing

Split the file into lines. Discard empty lines and comments.

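Each line has an indentation, which is its number of leading spaces. The first line (after discarding empty lines and comments) must have an indentation of 0.

Array entries count as indented one more than their number of leading spaces. This lets the entries of an array start at the same column as the key that owns them, as with whee in the example below.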
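The line-level rules above (discard empty lines and comments, count leading spaces, add one for array entries) can be sketched in a few lines. This assumes input that already satisfies the grammar; the helper name effective_lines is hypothetical. A small regex is needed only because a key may itself start with a dash.

```python
import re

# KEY ':' VALUE?, used only to tell "-key: value" (an object entry whose key
# starts with a dash) apart from a genuine array entry.
OBJECT_ENTRY = re.compile(r"[\w-]+:.*")


def effective_lines(text):
    """Return (indentation, line body) pairs, skipping empty lines and comments."""
    result = []
    for line in text.split("\n"):
        body = line.lstrip(" ")
        if body == "" or body.startswith("#"):
            continue  # empty lines and comments are discarded
        indent = len(line) - len(body)  # number of leading spaces
        if body.startswith("-") and not OBJECT_ENTRY.fullmatch(body):
            indent += 1  # array entries count one deeper than their spaces
        result.append((indent, body))
    return result


example = "foo:\n  bar: baz\n  whee:\n  - 1\n  # 1 was too small.\n  - 2\n  -\n    key: value\n"
for indent, body in effective_lines(example):
    print(indent, body)
```

For the example document, "- 1" is reported at indentation 3 while "whee:" is at 2, even though both lines start with two spaces.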

The parsing operation

The parsing operation is recursive, so it takes a minimum indentation as an optional parameter. It consumes lines from the input text and returns a list of at least one entry. Either all entries are object entries or all are array entries; anything else is an error.

The indentation of the first line it consumes is the expected indentation. If a minimum indentation was passed, the expected indentation must be greater than it.

Any line that has indentation greater than expected is an error.

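Repeat while the current line's indentation is equal to the expected indentation, consuming one line per iteration. A line with a smaller indentation ends the loop, and the operation returns its list to the caller:

If the current entry has a value, remove whitespace from the front and back of the value and add the entry to the list to be returned.

If the current entry is an array entry without a value, recurse with the expected indentation as the minimum indentation. Add a single array entry built from the returned entries to the returned list: an object if they are object entries, an array if they are array entries.

If the current entry is an object entry without a value, recurse with the expected indentation as the minimum indentation. Add an object entry with the current entry's key to the returned list; its value is built from the returned entries in the same way (an object for object entries, an array for array entries).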
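The parsing operation can also be written without global state, by walking a list of pre-computed (indent, kind, key, value) tuples with an explicit position. The sketch below is one possible reading of the rules above, not a reference implementation; the names tokenize, parse_block, to_json and parse_seml, and the use of ValueError for errors, are assumptions. As described, parse_block returns a list of entries, and to_json turns a finished list into an object or an array.

```python
import json
import re

# Leading spaces, then KEY ':' VALUE? | '-' VALUE? | comment | nothing.
LINE_RE = re.compile(r"( *)(?:([\w-]+):(.+)?|(-)(.+)?|#.*|)")


def tokenize(text):
    """Return (indent, kind, key, raw_value) for each non-empty, non-comment line."""
    tokens = []
    for line in text.split("\n"):
        m = LINE_RE.fullmatch(line)
        if m is None:
            raise ValueError(f"line failed to parse: {line!r}")
        spaces, key, obj_value, dash, arr_value = m.groups()
        if key is not None:
            tokens.append((len(spaces), "object", key, obj_value))
        elif dash is not None:
            # Array entries count as indented one more than their spaces.
            tokens.append((len(spaces) + 1, "array", None, arr_value))
        # Otherwise the line was empty or a comment and is discarded.
    return tokens


def parse_block(tokens, pos, min_indent):
    """Parse entries at one indentation level; return (entries, new_pos)."""
    if pos >= len(tokens):
        raise ValueError("expected at least one entry")
    expected = tokens[pos][0]
    if expected <= min_indent:
        raise ValueError("block must be indented more than its parent")
    entries = []  # (kind, key, value) triples
    while pos < len(tokens):
        indent, kind, key, value = tokens[pos]
        if indent < expected:
            break  # dedent: hand control back to the caller
        if indent > expected:
            raise ValueError(f"indentation {indent}, expected {expected}")
        if entries and entries[0][0] != kind:
            raise ValueError("object and array entries cannot be mixed")
        pos += 1
        if value is not None:
            entries.append((kind, key, value.strip()))
        else:
            children, pos = parse_block(tokens, pos, expected)
            entries.append((kind, key, to_json(children)))
    return entries, pos


def to_json(entries):
    """Turn a non-empty, homogeneous list of entries into a dict or a list."""
    if entries[0][0] == "object":
        return {key: value for _, key, value in entries}
    return [value for _, _, value in entries]


def parse_seml(text):
    tokens = tokenize(text)
    entries, pos = parse_block(tokens, 0, -1)
    if pos != len(tokens):
        raise ValueError("unexpected dedent at top level")
    return to_json(entries)


example = "foo:\n  bar: baz\n  whee:\n  - 1\n  # 1 was too small.\n  - 2\n  -\n    key: value\n"
print(json.dumps(parse_seml(example), indent=2))
```

Applied to the example that follows, parse_seml returns the same structure as the JSON shown there. Keeping entries as (kind, key, value) triples until a block is finished makes the "all object or all array" rule a comparison against the first entry.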

Example:

foo:
  bar: baz
  whee:
  - 1
  # 1 was too small.
  - 2
  -
    key: value

This corresponds to the JSON object:

{
  "foo": {
    "bar": "baz",
    "whee": [
      "1",
      "2",
      {
        "key": "value"
      }
    ]
  }
}
@FeepingCreature
Author

Python parser:

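import json
import re
import sys

# Mirrors the grammar: leading spaces, then an object entry, an array entry,
# a comment, or nothing at all (an empty line). \w already covers the
# underscore and, for str patterns, Unicode letters and digits. Raw strings
# avoid the invalid "\w" escape warning.
p = re.compile(r'(?P<indent> *)(' +
               r'(?P<obj_key>[\w-]+):(?P<obj_value>.+)?' +
               r'|(?P<array_entry>-(?P<array_value>.+)?)' +
               r'|(?P<comment>#.*)' +
               r'|(?P<empty_line>)' +
               r')')

lines = [line.rstrip('\n') for line in sys.stdin]

def process(min_indent = -1):
    """Parse one block indented deeper than min_indent into a dict or list."""
    global lines
    expected_indent = None
    result = None
    while lines:
        line = lines[0]
        m = p.fullmatch(line)
        assert m, "line failed to parse: '" + line + "'"

        # Empty lines and comments are discarded.
        if m.group('comment') is not None or m.group('empty_line') is not None:
            lines = lines[1:]
            continue

        # Array entries count as indented one more than their leading spaces.
        indent = len(m.group('indent'))
        if m.group('array_entry'):
            indent += 1

        # Compare with None: an expected indentation of 0 is valid, and a plain
        # truthiness test would let a wrongly indented top-level line through.
        if expected_indent is not None:
            if indent < expected_indent:
                break  # dedent: this block is finished
            assert indent == expected_indent, "wrong indent in '" + line + "'"
        else:
            # The first line of a block sets the expected indentation.
            assert indent > min_indent, "unexpected dedent in '" + line + "'"
            expected_indent = indent
        lines = lines[1:]

        if m.group('array_entry'):
            if result is None:
                result = []
            assert isinstance(result, list), "mixed entry types at '" + line + "'"
            if m.group('array_value'):
                result.append(m.group('array_value').strip())
            else:
                result.append(process(expected_indent))
        elif m.group('obj_key'):
            if result is None:
                result = {}
            assert isinstance(result, dict), "mixed entry types at '" + line + "'"
            if m.group('obj_value'):
                result[m.group('obj_key')] = m.group('obj_value').strip()
            else:
                result[m.group('obj_key')] = process(expected_indent)
    # Every block, including one opened by a key or dash without a value,
    # must contain at least one entry.
    assert result is not None, "expected at least one entry"
    return result

document = process()
# A dedent at the top level would otherwise leave trailing lines unparsed.
assert not lines, "unexpected dedent in '" + lines[0] + "'"
print(json.dumps(document, indent=2))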
