@nsf
Last active November 22, 2015 00:44
Sxy

Sxy file format

The format is based on the modern S-expression notation in wide use today, for example in the Lisp family of programming languages. However, the format is redefined from the ground up and is not compatible with any existing format (unless unintentionally so).

Text encoding requirement

The format is defined in terms of ASCII byte values, so any ASCII-compatible encoding will work. The input is a stream of values, not bytes, hence encodings like UTF-32 may work as well. The preferred encoding is UTF-8, but it's not required.

Lexical elements

Space character

Space character is defined as one of: \r, \n, \t, <space>.

Non-scalar character

Non-scalar character is defined as one of: \r, \n, \t, <space>, ", (, ), ;, <backquote>. It is possible to escape the grave accent mark in Markdown, but I don't do that and use <backquote> instead.
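The two character classes above can be written down as byte sets. This is only a sketch: the names `SPACE_CHARS`, `NON_SCALAR_CHARS`, `is_space` and `is_non_scalar` are made up for illustration, and it assumes that <space> counts as a non-scalar character, which is what the list examples further down rely on.

```python
# Byte values of the space characters: \r, \n, \t, <space>.
SPACE_CHARS = frozenset(b"\r\n\t ")

# Byte values of the non-scalar characters.
# Assumption: <space> belongs here too, so that it can separate scalars.
NON_SCALAR_CHARS = frozenset(b"\r\n\t \"();`")


def is_space(byte: int) -> bool:
    """Return True if the byte value is a space character."""
    return byte in SPACE_CHARS


def is_non_scalar(byte: int) -> bool:
    """Return True if the byte value cannot be part of a scalar."""
    return byte in NON_SCALAR_CHARS
```

Every space character is also a non-scalar character in this sketch, so a scanner only needs the second set to find where a scalar ends.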

String literal

String literals are defined with a minimal set of escape sequences. Some escape sequences are there simply for readability purposes.

A string literal starts with a quotation mark " and ends with a quotation mark ". Between them it may contain one or more of the following: a valid escape sequence, or any other byte except \n and ".

Valid escape sequences are:

  • \r - is converted to 0x0D byte
  • \n - is converted to 0x0A byte
  • \t - is converted to 0x09 byte
  • \\ - is converted to 0x5C byte
  • \xHH - is converted to 0xHH byte, H is a valid hex digit, upper-case or lower-case

An invalid escape sequence is an error and should be rejected.
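A decoder for these rules might look roughly like the sketch below. The function name `decode_string_literal` and its return shape are my own choices, not part of the format. It works on bytes because the escapes are defined as byte values, and it also accepts the empty literal, which is more lenient than the "one or more" wording above.

```python
# Single-letter escapes and the byte each one stands for.
SIMPLE_ESCAPES = {ord("r"): 0x0D, ord("n"): 0x0A, ord("t"): 0x09, ord("\\"): 0x5C}
HEX_DIGITS = b"0123456789abcdefABCDEF"


def decode_string_literal(src: bytes, start: int = 0) -> tuple[bytes, int]:
    """Decode the literal whose opening quote is at src[start].

    Returns the decoded bytes and the index just past the closing quote.
    """
    if src[start:start + 1] != b'"':
        raise ValueError("string literal must start with a quotation mark")
    out = bytearray()
    i = start + 1
    while i < len(src):
        b = src[i]
        if b == ord('"'):
            return bytes(out), i + 1
        if b == ord("\n"):
            raise ValueError("raw newline inside string literal")
        if b != ord("\\"):
            out.append(b)
            i += 1
            continue
        # Escape sequence: inspect the byte after the backslash.
        kind = src[i + 1] if i + 1 < len(src) else None
        digits = src[i + 2:i + 4]
        if kind in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[kind])
            i += 2
        elif kind == ord("x") and len(digits) == 2 and all(h in HEX_DIGITS for h in digits):
            out.append(int(digits, 16))
            i += 4
        else:
            raise ValueError(f"invalid escape sequence at byte {i}")
    raise ValueError("unterminated string literal")
```

For instance, `decode_string_literal(b'"a\\tb"')` gives back the three bytes `a`, tab, `b`, while an unknown escape such as `\q` raises `ValueError`.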

Uninterpreted string literals

An uninterpreted string literal starts with a <backquote> and ends with a <backquote>. You can use any byte in between, except \n and <backquote>. There are no escape sequences. Uninterpreted strings are useful for representing regular expressions and file paths on some operating systems.

Example: `C:\Program Files\ABC\Data`
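Reading such a literal needs no decoding at all, only a search for the closing <backquote>. A minimal sketch, with `read_raw_string` being a hypothetical name; telling a single <backquote> apart from the triple one that opens a multi-line literal is left out here.

```python
BACKQUOTE = 0x60
NEWLINE = 0x0A


def read_raw_string(src: bytes, start: int = 0) -> tuple[bytes, int]:
    """Read the uninterpreted literal whose opening backquote is at src[start].

    Returns the bytes between the backquotes, untouched, and the index
    just past the closing backquote.
    """
    if src[start:start + 1] != b"`":
        raise ValueError("uninterpreted string literal must start with a backquote")
    end = start + 1
    # Advance until the closing backquote or a forbidden newline.
    while end < len(src) and src[end] not in (BACKQUOTE, NEWLINE):
        end += 1
    if end == len(src) or src[end] != BACKQUOTE:
        raise ValueError("unterminated uninterpreted string literal")
    return src[start + 1:end], end + 1
```

Backslashes pass through as they are, so the Windows path from the example above comes out byte for byte.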

Multi-line string literals

A multi-line string literal is a special lexical element which contains a set of raw strings. It starts with a triple <backquote> and ends with a triple <backquote>. However, the lines defined by it can also contain a triple <backquote> if necessary. How does it work? Within a multi-line string literal, a line starts with a | character followed by an optional <space> and lasts until the first \n. The | and the optional <space> are not included. So this string literal:

```
| Greetings, {{name}}.
|
| Welcome to this wonderful place called ```home```
```

Yields:

Greetings, {{name}}.

Welcome to this wonderful place called ```home```

As you can see, this scheme allows absolutely any character inside a multi-line string. You can even have a multi-line string inside a multi-line string, because it's a simple convention: you take a line, strip everything up to and including the first | and the optional <space>, and what remains is your new line. Nothing new, in fact; it is inspired by the comment syntax in many languages, which allows anything inside a comment line.
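The stripping rule is short enough to sketch in a few lines. The name `decode_multiline` is hypothetical, and several details are assumptions the text does not settle: the opening and closing triple <backquote> each sit on their own line, the closing one is recognised as a line with nothing else on it, and the resulting lines are joined with \n.

```python
# Triple <backquote>, built by repetition so it does not clash with this code fence.
FENCE = b"`" * 3


def decode_multiline(literal: bytes) -> bytes:
    """Decode a whole multi-line literal, fences included, into its content."""
    lines = literal.split(b"\n")
    if lines[0].strip() != FENCE:
        raise ValueError("literal must open with a triple backquote")
    body = []
    for line in lines[1:]:
        # Assumption: a line holding only the fence closes the literal.
        if line.strip() == FENCE:
            return b"\n".join(body)
        bar = line.find(b"|")
        if bar == -1:
            raise ValueError("line inside a multi-line literal lacks '|'")
        content = line[bar + 1:]
        # Drop the single optional <space> after the bar, nothing more.
        if content.startswith(b" "):
            content = content[1:]
        body.append(content)
    raise ValueError("unterminated multi-line string literal")
```

A content line may itself contain a triple <backquote>, because only the part after the first | is looked at.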

Scalar

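A scalar starts with the first character that is not a non-scalar character and ends with the last such character before the next non-scalar character (or the end of input). In other words, a scalar is a maximal run of characters none of which is a non-scalar character.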
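Scanning a scalar then amounts to walking forward until a non-scalar character shows up. A small sketch, where `read_scalar` is a made-up name and <space> is assumed to be in the non-scalar set:

```python
# Non-scalar byte values; <space> is included as an assumption.
NON_SCALAR = frozenset(b"\r\n\t \"();`")


def read_scalar(src: bytes, start: int = 0) -> tuple[bytes, int]:
    """Read the scalar beginning at src[start].

    Returns the scalar bytes and the index of the first byte after it.
    """
    end = start
    while end < len(src) and src[end] not in NON_SCALAR:
        end += 1
    if end == start:
        raise ValueError("no scalar at this position")
    return src[start:end], end
```

The scalar is returned as raw bytes; the format as described here gives it no further interpretation (number, symbol and so on), so the sketch adds none.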

List

A list may contain scalars, strings or other lists. A list starts with an opening parenthesis ( and ends with a closing parenthesis ). You can use space characters as separators for list elements, but they are not required in some cases. For example:

hello(iam"John")world

is a valid sequence of a scalar hello, a list with two elements, iam (scalar) and John (string), followed by a scalar world. While this form is allowed by definition, it's not recommended. Please use at least a single space character to separate list elements from each other. The preferred way to write the example above is:

hello (iam "John") world
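To check that both spellings really mean the same thing, here is a rough parser sketch. Everything about its shape is an assumption for illustration: the name `parse`, scalars coming back as bytes, strings as `("str", body)` tuples and lists as Python lists. It handles only scalars, quoted strings and lists, leaves the string body undecoded without validating escapes, and rejects comments and <backquote> literals instead of parsing them.

```python
SPACE = frozenset(b"\r\n\t ")
# Assumption: <space> is a non-scalar character.
NON_SCALAR = frozenset(b"\r\n\t \"();`")


def parse(src: bytes) -> list:
    """Parse a sequence of top-level elements into nested Python values."""
    root: list = []
    stack = [root]  # innermost open list is stack[-1]
    i = 0
    while i < len(src):
        b = src[i]
        if b in SPACE:
            i += 1
        elif b == ord("("):
            child: list = []
            stack[-1].append(child)
            stack.append(child)
            i += 1
        elif b == ord(")"):
            if len(stack) == 1:
                raise ValueError("unbalanced closing parenthesis")
            stack.pop()
            i += 1
        elif b == ord('"'):
            # No escape contains a quotation mark, so the next one closes the literal.
            end = src.find(b'"', i + 1)
            if end == -1 or b"\n" in src[i + 1:end]:
                raise ValueError("unterminated string literal")
            stack[-1].append(("str", src[i + 1:end]))
            i = end + 1
        elif b in NON_SCALAR:
            raise ValueError("comments and backquote literals are not handled in this sketch")
        else:
            end = i
            while end < len(src) and src[end] not in NON_SCALAR:
                end += 1
            stack[-1].append(src[i:end])
            i = end
    if len(stack) != 1:
        raise ValueError("unclosed list")
    return root
```

With this sketch, `parse(b'hello(iam"John")world')` and `parse(b'hello (iam "John") world')` produce the same nested value, which is the point of the two examples above.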

Comment

Comment starts with a semicolon ; and ends with a newline byte \n. Anything in-between is allowed.
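Skipping a comment is a single search for the newline byte. A tiny sketch with a hypothetical `skip_comment` helper; it assumes that a comment running into the end of input without a final \n is acceptable, which the definition above does not say either way.

```python
def skip_comment(src: bytes, start: int) -> int:
    """Return the index just past the comment that begins at src[start]."""
    if src[start:start + 1] != b";":
        raise ValueError("comment must start with a semicolon")
    end = src.find(b"\n", start)
    # Assumption: a comment may also be ended by the end of input.
    return len(src) if end == -1 else end + 1
```

A tokenizer would call this only when it meets a semicolon outside of any string literal, since a semicolon inside a string is an ordinary byte.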
