@passcod
Last active October 6, 2021 17:11
HUG language

HUG

A function call syntax for KDL.

HUG is several things:

  • A strict KDL superset: all KDL is valid HUG.
  • A syntax and spec for calling functions in KDL documents.
  • A standard library of HUG functions.

Name

HUG Is Not An Acronym

  • it's pronounced like "hug"
  • the file extension is .hug
  • the plural is HUGs

Syntax

Format: \ NAME [ [ INPUT ] ] (the outer brackets are literal; the inner brackets mark INPUT as optional), where:

  • NAME is a KDL bare identifier
  • INPUT is optional
  • the [] square brackets are required regardless
  • whitespace may surround the INPUT

Function calls may be preceded by a type annotation.

The INPUT is either:

  • a document
  • a heterogeneous list of values or function calls, separated by commas

A function call can be present in the position of:

  • a node
  • a value

A function call cannot be:

  • the name of a node
  • the name of a property
  • a type annotation

The standard library provides functions for the use cases that would otherwise need these.

Because bare identifiers can contain neither \ nor [], there is no KDL syntax conflict.
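
As an illustration of the syntax rules above, here is a minimal Python sketch of a scanner that finds the outermost function calls in a piece of text. It is not part of HUG: find_calls, _skip_string, _matching_bracket and the _NAME pattern are hypothetical names, the pattern only approximates KDL's bare-identifier rule, and comments and raw strings are ignored.

```python
import re

# Rough stand-in for a KDL bare identifier (assumption: the real grammar is stricter).
_NAME = re.compile(r'[^\s\\/(){}<>;\[\]=,"]+')


def _skip_string(text, i):
    """Given text[i] == '"', return the index just past the closing quote."""
    i += 1
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the escaped character
        elif text[i] == '"':
            return i + 1
        else:
            i += 1
    raise ValueError("unterminated string")


def _matching_bracket(text, open_index):
    """Given text[open_index] == '[', return the index of its matching ']'."""
    depth, i = 0, open_index
    while i < len(text):
        ch = text[i]
        if ch == '"':
            i = _skip_string(text, i)
            continue
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    raise ValueError("unclosed [ in function call")


def find_calls(text):
    """Yield (name, raw_input) for each outermost function call in text."""
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == '"':
            i = _skip_string(text, i)  # backslashes inside strings are escapes
            continue
        if ch == "\\":
            match = _NAME.match(text, i + 1)
            # The square brackets are required even when INPUT is empty.
            if match and text[match.end():match.end() + 1] == "[":
                end = _matching_bracket(text, match.end())
                yield match.group(), text[match.end() + 1:end].strip()
                i = end + 1
                continue
        i += 1
```

Nested calls stay inside the raw input of the call that contains them, so a caller would run find_calls again on that input to reach them.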

Evaluation

There are two modes:

  • partial application evaluates a HUG document as far as possible, and outputs either:

    • a KDL document (when fully resolved), or
    • a HUG document (when partially resolved), with any unresolved functions left in.
  • erroring application does the same as partial application, but produces errors for any leftover/unresolved functions

Arguments

Function calls are passed these arguments:

  • The input document or list, as defined above.
  • The type annotation on the function call, if present.
  • The output context, as one of:
    • Node (1)
    • Value (2)
    • Input (3)

Application

A function is only ever called with plain values: every function call within its input must be fully resolved first.

example attribute {
    (lucky)\sum[
        1, 2, 3,
        \double[2]
    ]
    
    \unknown[]
}

In the example above:

  1. double is called with arguments: input: [Value(2)], type: None, context: Input
  2. sum is called with arguments: input: [Value(1), Value(2), Value(3), Value(4)], type: Some(lucky), context: Node
  3. unknown is not a known function, and is left in place
  4. The document resolves to:
    example attribute {
        (lucky)"10"
        \unknown[]
    }
    
  5. In erroring mode, the following error is emitted: "unresolved function \unknown[] at line 7:4".
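
The resolution order walked through above can be sketched in Python. This is a minimal model under stated assumptions, not an implementation of HUG: the Call class, the FUNCTIONS registry and resolve are hypothetical, only list inputs and plain-value outputs are handled, and source positions are not tracked, so the error message carries no line number.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Call:
    """A function call: a name, its input list, and an optional type annotation."""
    name: str
    args: list = field(default_factory=list)
    type: Optional[str] = None


# Hypothetical registry; each function receives (input, type, context).
FUNCTIONS = {
    "double": lambda values, type_, context: values[0] * 2,
    "sum": lambda values, type_, context: sum(values),
}


def resolve(item, context, erroring=False):
    """Resolve item as far as possible; plain values pass through unchanged."""
    if not isinstance(item, Call):
        return item
    # Calls inside the input are resolved first, with context Input.
    args = [resolve(arg, "Input", erroring) for arg in item.args]
    func = FUNCTIONS.get(item.name)
    if func is None or any(isinstance(arg, Call) for arg in args):
        if erroring:
            raise LookupError(f"unresolved function \\{item.name}[]")
        # Partial application: the call is left in place.
        return Call(item.name, args, item.type)
    return func(args, item.type, context)
```

With this model, resolve(Call("sum", [1, 2, 3, Call("double", [2])], "lucky"), "Node") calls double first and then sum on [1, 2, 3, 4], returning 10, while resolve(Call("unknown"), "Node") returns the call unchanged unless erroring=True is passed.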

A function can either:

  • succeed, and output as below;
  • not apply, and remain as an unapplied function call;
  • error.

Output

A function can output:

  • a document
  • a function call
  • a single value
  • a heterogeneous list of values or function calls

However, some output types are restricted by the context argument:

  • with context: Node, only documents and function calls can be produced;
  • with context: Value, only single values and function calls can be produced.
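
One way to read these restrictions is as a lookup table from context to permitted output kinds. The Python sketch below is illustrative only: ALLOWED_OUTPUTS and check_output are hypothetical names, the kind labels are made up for the example, and treating the Input context as unrestricted is an assumption based on it not being listed above.

```python
# Output kinds permitted for each context argument.
ALLOWED_OUTPUTS = {
    "Node": {"document", "call"},
    "Value": {"value", "call"},
    # Assumption: Input is not restricted, since the text lists no limits for it.
    "Input": {"document", "call", "value", "list"},
}


def check_output(context, kind):
    """Return kind if it may be produced in context, else raise TypeError."""
    if kind not in ALLOWED_OUTPUTS[context]:
        raise TypeError(f"a {kind} output is not allowed with context: {context}")
    return kind
```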

Outputs are merged into their surroundings in different ways. Generally, the output of a function call replaces the call as if it had been written in its place, with two exceptions:

  • with context: Node and a document output: all top-level nodes in the document are inserted in the place of the function call.
  • with context: Input and a list of values output: the list is unpacked (1, 2, \id[3, 4], 5 resolves to 1, 2, 3, 4, 5).
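
Both exceptions amount to splicing a sequence into the surrounding sequence instead of nesting it. A minimal Python sketch, assuming siblings and list outputs are modelled as Python lists (splice is a hypothetical helper, not a HUG function):

```python
def splice(siblings, index, output):
    """Replace siblings[index] with output, unpacking output when it is a list."""
    replacement = output if isinstance(output, list) else [output]
    return siblings[:index] + replacement + siblings[index + 1:]
```

For the \id example above, splice([1, 2, "<call>", 5], 2, [3, 4]) gives [1, 2, 3, 4, 5], where "<call>" stands in for the unresolved \id[3, 4].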

Standard library

TODO

\base64_decode[string] -> string

Takes a string of Base64 data. The string may span multiple lines; any whitespace inside it is ignored. The Base64 alphabet is the one defined in IETF RFC 3548.

quote text=\base64_decode["
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24
gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2
YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZ
WRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=
"]

// resolves to:
quote text=\base64_decode["TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4="]

// or (implementation dependent):
quote text="Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure."
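
The decoding behaviour described above could be sketched in Python with the standard base64 module. This is an illustration rather than a reference implementation; in particular, treating the decoded bytes as UTF-8 text is an assumption that the description does not state.

```python
import base64


def base64_decode(data: str) -> str:
    """Decode Base64 text, ignoring any whitespace inside it."""
    compact = "".join(data.split())  # drop newlines, spaces and tabs
    # validate=True rejects characters outside the standard alphabet.
    return base64.b64decode(compact, validate=True).decode("utf-8")
```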

\base64_encode[string] -> string

Takes a string and returns its Base64 representation. The Base64 alphabet is the one defined in IETF RFC 3548. The output contains no newlines or other whitespace.

quote text=\base64_encode["Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure."]

// resolves to:
quote text="TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4="
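
The encoding side could be sketched the same way in Python. As before, this is only an illustration, and encoding the input string as UTF-8 before conversion is an assumption.

```python
import base64


def base64_encode(text: str) -> str:
    """Return the Base64 form of text as a single line with no whitespace."""
    # b64encode never inserts line breaks, so no post-processing is needed.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")
```
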
@agathazeren

agathazeren commented Aug 10, 2021

Type expressions are either:

  • unions: TYPE | UNION (string|bool means "either string or bool")
  • negations: ~ UNION (~bool|string means "any type except bool and string")

Currently this syntax does not allow you to negate a single type, only a union.

Also, I'd like to question the rationale for why the type syntax is needed. What is the use case?

@passcod
Author

passcod commented Aug 10, 2021

Type expressions are either:

  • unions: TYPE | UNION (string|bool means "either string or bool")
  • negations: ~ UNION (~bool|string means "any type except bool and string")

Currently this syntax does not allow you to negate a single type, only a union.

Oh yeah, good point. I'll fix that.

Also, I'd like to push for a rationale for why the type syntax is needed. What is the use case?

  • Firstly, dry-runnable correctness. I want it to be relatively easy to assert that a configuration is valid without passing it to the actual program, or at least without dry-running it in the production environment (or a facsimile). Think nginx: you can run nginx -t, but because the validity of the config is dependent on the environment, you're often going to get an error unless you install the config in staging, test it, then carefully alter it for prod, dry run it, and finally apply it. With a HUG config, you should be able to determine most of the validity in dev, test it in staging, and only modify variables for prod (or more likely, have the config do \=variables.hug).
  • Inputs: similar to HCL's variable {} blocks. The idea for why a type is needed there is that person A might write the config, and person B will use or apply it, so there's little to no shared cognition of the requirements. HCL also has descriptions here, but HUG doesn't (yet? not sure).
  • Preludes: An application could include a prelude with predefined structures, or use the user-provided config as a prelude, or something else similar. E.g.
// user provides:
\name="Agatha"

// application appends:
person {
  name \name:string
}

// resolves to:
person {
  name "Agatha"
}

or

// application prepends:
\person= person {
  name \name:string
}

// user provides:
team {
  member {
    \name="Félix"
    \person
  }

  member {
    \name="Agatha"
    \person
  }
}

// resolves to:
team {
  member { person { name "Félix" } }
  member { person { name "Agatha" } }
}

which is a bit awkward to use, now that I write it out.

With an apply function that takes a block with the last element being a node, and returns that node:

// application prepends:
\person= person {
  name \name:string
}

// user provides:
team {
  \apply[{
    \name="Félix"
    \person
  }]

  \apply[{
    \name="Agatha"
    \person
  }]
}

// resolves to:
team {
  person { name "Félix" }
  person { name "Agatha" }
}
  • Similarly with imports as "modules", one can imagine third-party configuration blocks which would benefit from the correctness of typing.

I also made a deliberate decision not to use <> for anything yet, so it could possibly be used for composite types in the future, though I didn't have a good design in mind. But I still wanted Options, so type|null was a decent compromise on that.


That said, instead of types, a more powerful system could be a stdlib assert function of some kind, which fails the config given a condition, plus a set of condition functions such as is_bool, is_string, gt, any, all, starts_with, etc. That could be nicer. I'll have a think.

@passcod
Author

passcod commented Oct 6, 2021

Revised/simplified on 7 October 2021, given changes in KDL 1.0.0:

  • no more additional primitives (use KDL type annotations)
  • no more type system
  • no more placeholders and variables
  • only functions
