@sunfishcode
Last active January 24, 2024 15:18
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save sunfishcode/0247c4168a940337111c4fe9ef947aae to your computer and use it in GitHub Desktop.
Save sunfishcode/0247c4168a940337111c4fe9ef947aae to your computer and use it in GitHub Desktop.
BigWave

👋

BigWave is a hypothetical data serialization language for configuration files, using the datatypes of the Wasm Component Model. It extends the Wasm Value Encoding (WAVE) with syntax for quoteless strings, multi-line strings, and comments.

Example

Here's a translation of the KDL example into BigWave:

package({
  name: "my-pkg",
  version: "1.2.3",

  dependencies: {
    // A field containing a record with fields.
    lodash: { version: "^3.2.1", optional: true, alias: "underscore" }
  },

  scripts: {
    // Multi-line strings with embedded quotes are supported.
    build:
      > echo "foo"
      > node -c "console.log('hello, world!');"
      > echo "foo" > some-file.txt
  },

  // Values can span lines without doing anything special.
  the-matrix: [
     1, 2, 3,
     4, 5, 6,
     7, 8, 9,
  ],

  // "Slashdash" comments operate at the subtree level,
  // with just `/-`.
  /- this-is-commented: {
    this: "entire",
    node: "is gone"
  }
})

WAVE

BigWave is a superset of WAVE:

Type      Example Values
Bools     true, false
Integers  123, -9
Floats    3.14, 6.022e+23, nan, -inf
Chars     'x', '☃︎', '\'', '\u{0}'
Strings   "abc\t123"
Tuples    ("abc", 123)
Lists     [1, 2, 3]
Records   {field-a: 1, field-b: "two"}
Variants  days(30), forever
Enums     south, west
Options   "flat some", some("explicit some"), none
Results   "flat ok", ok("explicit ok"), err("oops")
Flags     {read, write}, {}

BigWave adds additional syntax for strings and comments.

Quoteless strings

A quoteless string is written as > followed by string contents up to the next newline. For example, a record containing a cmd field containing a string might look like:

{
    cmd: > cargo build --target=riscv64gc-unknown-linux-gnu
}

When used as a record value, the comma that would follow it is omitted:

{
   prep: > cargo clean
   cmd: > cargo build --target=aarch64-unknown-linux-gnu
}
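To make the rule concrete, here's a minimal Python sketch of reading one quoteless field value. The helper name read_quoteless is hypothetical, and it assumes a few details the description above leaves open: whitespace before > is skipped, a single space right after > is dropped, and the newline itself is not part of the string.

```python
from typing import Tuple


def read_quoteless(text: str, pos: int) -> Tuple[str, int]:
    """Read a quoteless string whose '>' is at or after text[pos].

    Assumptions, not part of the BigWave description: spaces or tabs before
    '>' are skipped, one space right after '>' is dropped, and the string
    runs up to, but not including, the next newline.
    Returns the string and the offset just past that newline.
    """
    # Skip horizontal whitespace between the field's ':' and '>'.
    while pos < len(text) and text[pos] in " \t":
        pos += 1
    if pos >= len(text) or text[pos] != ">":
        raise ValueError(f"expected '>' at offset {pos}")
    pos += 1
    # Drop a single separating space, if present (assumed convention).
    if pos < len(text) and text[pos] == " ":
        pos += 1
    end = text.find("\n", pos)
    if end == -1:
        # No newline: the string runs to the end of the input.
        return text[pos:], len(text)
    # The newline is what terminates the value, so no comma is needed.
    return text[pos:end], end + 1


record = "   prep: > cargo clean\n   cmd: > cargo build --target=aarch64-unknown-linux-gnu\n"
prep, pos = read_quoteless(record, record.index(":") + 1)
cmd, pos = read_quoteless(record, record.index(":", pos) + 1)
print(prep)  # cargo clean
print(cmd)   # cargo build --target=aarch64-unknown-linux-gnu
```

Because the value ends at the newline, the parser lands directly on the next field, which is why the comma can be omitted.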

Multi-line strings

A multi-line string is written as a sequence of lines, each consisting of optional whitespace, then >, then the string contents. For example:

{
    cmd:
        > cargo build --target=riscv64gc-unknown-linux-gnu
        > cargo build --target=wasm32-wasi
        > echo "we're all done here"
}

BigWave parses this as a single string; splitting it into individual commands and their arguments can be done by the command interpreter of the system that runs them.

Multi-line strings are syntactically similar to quoteless strings, but are a distinct syntax and always start on their own line. The lines of a multi-line string must be contiguous, with no blank lines between them.
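As an illustration, a parser might collect the contiguous > lines like this minimal Python sketch. The helper read_multiline is hypothetical, and how the lines combine into one string is an assumption here: each line contributes what follows its > (minus one optional space), and the lines are joined with newlines, with no trailing newline.

```python
from typing import List, Tuple


def read_multiline(lines: List[str], start: int) -> Tuple[str, int]:
    """Collect the multi-line string that begins at lines[start].

    Assumptions, not part of the BigWave description: each line contributes
    whatever follows its '>' (minus one optional space), the lines are
    joined with newlines, and no trailing newline is added.
    Returns the string and the index of the first line after it.
    """
    parts = []
    i = start
    while i < len(lines):
        stripped = lines[i].lstrip(" \t")
        # A blank line or any line not starting with '>' ends the string,
        # because the lines of a multi-line string must be contiguous.
        if not stripped.startswith(">"):
            break
        content = stripped[1:]
        if content.startswith(" "):
            content = content[1:]
        parts.append(content)
        i += 1
    if not parts:
        raise ValueError(f"line {start} does not start a multi-line string")
    return "\n".join(parts), i


source = """{
    cmd:
        > cargo build --target=riscv64gc-unknown-linux-gnu
        > cargo build --target=wasm32-wasi
        > echo "we're all done here"
}"""
lines = source.split("\n")
cmd, next_line = read_multiline(lines, 2)  # line 2 is the first '>' line
print(cmd)
```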

When used as a record value, the comma that would follow a multi-line string is omitted:

{
   prep:
      > cargo clean
      > rm Cargo.lock
   cmd:
      > cargo build --target=aarch64-unknown-linux-gnu
}

Comments

Everything from // to the end of the line is a comment.
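A minimal Python sketch of stripping a // comment from a single line follows. The helper name strip_line_comment is hypothetical, and two behaviors are assumptions rather than stated rules: // inside a quoted string or char is not a comment, and once > starts a quoteless or multi-line string, the rest of the line is treated as string content.

```python
def strip_line_comment(line: str) -> str:
    """Remove a trailing // comment from one line of BigWave text.

    Assumptions: '//' inside a "..." string or '...' char is not a comment
    (backslash escapes are honored), and once a '>' starts a quoteless or
    multi-line string, the rest of the line is string content.
    """
    quote = None  # the quote character we're currently inside, if any
    i = 0
    while i < len(line):
        c = line[i]
        if quote:
            if c == "\\":
                i += 2  # skip the backslash and the character it escapes
                continue
            if c == quote:
                quote = None
        elif c in "\"'":
            quote = c
        elif c == ">":
            return line  # the rest of the line is string content
        elif line.startswith("//", i):
            return line[:i].rstrip()
        i += 1
    return line


print(strip_line_comment('field: "a // b", // trailing note'))  # field: "a // b",
```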

Also, /- comments out the top-level value, record field, tuple field, flag value, or list item that follows it, even when that item extends over multiple lines.

For example:

[
   this-is-here,

   // commented-out,

   /- also(
      "commented out!"
   ),
]
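One way a tool could honor /- without a full parser is to skip the following item by tracking bracket nesting, as in this Python sketch. The helpers remove_slashdash and _end_of_item are hypothetical, and the sketch makes simplifying assumptions: the item ends at the first comma at nesting depth zero (which is consumed) or at the enclosing value's closing bracket (which is kept), // comments have already been removed, and quoteless or multi-line strings inside a commented item are not handled.

```python
def _end_of_item(text: str, j: int) -> int:
    """Return the offset where the item starting at text[j] ends.

    The item ends just after the first ',' at nesting depth zero, or just
    before an unmatched closing bracket (which belongs to the enclosing value).
    """
    depth = 0
    quote = None
    while j < len(text):
        c = text[j]
        if quote:
            if c == "\\":
                j += 2  # skip an escaped character inside a string or char
                continue
            if c == quote:
                quote = None
        elif c in "\"'":
            quote = c
        elif c in "([{":
            depth += 1
        elif c in ")]}":
            if depth == 0:
                return j  # keep the enclosing value's closing bracket
            depth -= 1
        elif c == "," and depth == 0:
            return j + 1  # consume the separating comma
        j += 1
    return j


def remove_slashdash(text: str) -> str:
    """Drop each /- marker together with the item it comments out."""
    out = []
    quote = None
    i = 0
    while i < len(text):
        c = text[i]
        if quote is None and text.startswith("/-", i):
            i = _end_of_item(text, i + 2)
            continue
        if quote:
            if c == "\\":
                out.append(text[i:i + 2])
                i += 2
                continue
            if c == quote:
                quote = None
        elif c in "\"'":
            quote = c
        out.append(c)
        i += 1
    return "".join(out)


print(remove_slashdash('{a: 1, /- b: {c: "}"}, d: 2}'))  # {a: 1,  d: 2}
```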

Guarding against file truncation

To protect against misinterpreting truncated data, BigWave requires that a data stream that doesn't end with }, ], ), ", or ' end with a newline (U+000A), and that its last line not contain a multi-line string. A blank line may be appended to satisfy this requirement. This is expected to be uncommon, and parsers can issue diagnostics telling users what to do.

For example:

> This is a multi-line string at the top level of the file, so
> it is followed by a blank line.
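The end-of-stream check could look something like this minimal Python sketch (passes_truncation_guard is a hypothetical name). It assumes that "last line" means the final newline-terminated line, and that a line whose first non-whitespace character is > belongs to a multi-line string. The multi-line restriction presumably exists because a multi-line string cut off at a line boundary would otherwise still parse as a shorter, valid string.

```python
CLOSERS = ("}", "]", ")", '"', "'")


def passes_truncation_guard(text: str) -> bool:
    """Check the end-of-stream rule for a complete BigWave document.

    Assumptions: the "last line" is the final newline-terminated line, and a
    line whose first non-whitespace character is '>' belongs to a
    multi-line string.
    """
    if text.endswith(CLOSERS):
        return True
    if not text.endswith("\n"):
        return False
    last_line = text[:-1].rsplit("\n", 1)[-1]
    return not last_line.lstrip(" \t").startswith(">")


ok = "> This is a multi-line string at the top level of the file, so\n> it is followed by a blank line.\n\n"
print(passes_truncation_guard(ok))       # True
print(passes_truncation_guard(ok[:-1]))  # False: the last line is part of the string
```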

Missing

WAVE's records follow Component Model records in not permitting multiple keys with the same name. YAML and KDL both support this, and it's commonly used in CI scripts, e.g. with multiple run lines.

Maybe we could use suffixes, e.g. run0, run1, etc. However, there's also the awkwardness that the keys wouldn't necessarily be kept in the order they appear in the syntax; in some contexts, they might get sorted by the order in which the keys appear in the type declaration.

Maybe we'd use a list of string tuples instead of records for this, but that's less pretty:

[
    ("run",
        > echo "hello"
        > node -c "console.log('hello, world!');"
    ),
    ("run",
        > echo "more"
    )
]
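On the consuming side, the list-of-tuples form keeps both repeated keys and source order. This Python sketch assumes the list above has already been parsed into Python tuples, with the multi-line string joined using newlines; group_by_key is a hypothetical helper that recovers per-key access similar to what a record would give.

```python
from typing import Dict, List, Tuple


def group_by_key(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect the values for each key, keeping them in source order.

    Hypothetical helper: assumes the list of tuples has already been parsed.
    """
    grouped: Dict[str, List[str]] = {}
    for key, value in pairs:
        grouped.setdefault(key, []).append(value)
    return grouped


# The example above, assuming multi-line strings are joined with newlines.
steps = [
    ("run", "echo \"hello\"\nnode -c \"console.log('hello, world!');\""),
    ("run", "echo \"more\""),
]
for key, script in steps:  # list order is exactly the order in the source
    print(f"{key}: {script!r}")
print(group_by_key(steps)["run"][1])  # echo "more"
```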

Inspirations

In addition to WAVE, the Wasm Component Model, WIT, and WAC, BigWave takes ideas from KDL, CommonMark, and TOML, as well as SDLang, JSON, and YAML.

I like a lot of the ideas in KDL, but chose to design a new language to more closely align the data model with the Wasm Component Model type system and the syntax with WIT and WAVE. Also, in contrast to KDL's "CLI command" feel, BigWave's syntax emphasizes its nature as declarative data, which helps distinguish it from Claw, whose focus is programmability.
