Skip to content

Instantly share code, notes, and snippets.

@evincarofautumn
Last active April 26, 2024 08:08
Show Gist options
  • Save evincarofautumn/a4290c5fddecd22b03c6be0d106b3988 to your computer and use it in GitHub Desktop.
Save evincarofautumn/a4290c5fddecd22b03c6be0d106b3988 to your computer and use it in GitHub Desktop.

Grammar of Kitten

This is an annotated EBNF grammar of the latest rewrite of Kitten. Items marked with a dagger (†) are explained in the surrounding prose, not in EBNF.

Source Locations

The source column of a token is the offset to its first character within the line in which it appears. The indentation of a line is the source column of the first token in the line. Tab characters are illegal.

Layout

Blocks may begin with a left curly brace ({) and end with a right curly brace (}), or begin with a colon (:) and end at the first token whose source column is less than the indentation of the line containing the colon token.

<block-begin> = "{" | ":" …†
<block-end> = "}" | …†

Programs

A program consists of zero or more vocabularies containing program elements.

<program> = <vocabulary>*

<vocabulary>
  = "vocab" <qualified-name> ";" <program-element>*
  | "vocab" <qualified-name> <block-begin> <program-element>* <block-end>

<program-element>
  = <declaration-element>
  | <definition-element>
  | <metadata-element>
  | <synonym-element>
  | <import-element>
  | <export-element>
  | <term>

Notably, term elements are composed as if they all appeared concatenated together, so local variables introduced before an intervening non-term program element are in scope after that element:

1 -> x;

define f (->) {}

x say

Declarations

A declaration may introduce either a trait or an intrinsic. A trait gives the signature of an overloaded word, which may have multiple instance definitions. An intrinsic gives the signature of a word whose definition is provided by the compiler.

<declaration-element> = <trait-declaration> | <intrinsic-declaration>

<trait-declaration> = "trait" <unqualified-name> <signature>
<intrinsic-declaration> = "intrinsic" <unqualified-name> <signature>

Examples

trait + <T> (T, T -> T)

intrinsic add_int32 (Int32, Int32 -> Int32)

Definitions

A definition may introduce the definition of a word, instance, or permission. A word definition gives a name to a function. An instance definition gives a particular implementation of a trait. A permission definition gives a name to a function that grants permission to do something.

<definition-element> = <word-definition> | <instance-definition> | <permission-definition>

<word-definition> = "define" <unqualified-name> <signature> <block>
<instance-definition> = "instance" <unqualified-name> <signature> <block>
<permission-definition> = "permission" <unqualified-name> <signature> <block>

Examples

define add3 (Int32, Int32, Int32 -> Int32):
  (+) (+)

instance + (Option<Int32>, Option<Int32> -> Option<Int32>):
  -> mx, my;
  match (mx) case some:
    match (my) case some:
      (+) some
    else:
      drop none
  else:
    none

permission Locked<R..., S...> (R..., (R... -> S... +Locked) -> S... +IO):
  take_lock
  with (+Locked)
  release_lock

Metadata

Metadata is structured data attached to a program identifier. It consists of a key-value mapping from identifiers to arbitrary untyped Kitten terms. Some metadata is used by the compiler, such as the operator field for infix operator declarations, or the inline field for controlling inlining optimisation. However, metadata is generally intended for consumption by external tools. For example, metadata avoids ad-hoc commenting conventions for documentation generators—the generator can simply use the compiler API to read documentation text from the docs field of a definition’s metadata.

<metadata-element> = "about" <unqualified-name> <block-begin> <metadata-field> <block-end>

<metadata-field> = <unqualified-name> <block-begin> <term>* <block-end>

Examples

about +:
  docs: """
    Adds two values of the same type, producing a result of that type.
    
    The operation of an additive group with `zero` as the identity.
    """
  operator:
    left 6

Synonyms and Imports

A synonym introduces a symbolic link to an identifier. Synonyms may refer to words, types, or vocabularies.

<synonym-element> = <word-synonym> | <type-synonym> | <vocabulary-synonym>

<word-synonym> = "synonym" <unqualified-name> <qualified-name>
<type-synonym> = "type" "synonym" <unqualified-name> <qualified-name>
<vocabulary-synonym> = "vocab" "synonym" <unqualified-name> <qualified-name>

Synonyms are differentiated by type to add a bit of redundancy, improving legibility and the compiler’s ability to report errors.

An import is syntactic sugar for a series of synonyms:

<import-element> = "import" <block-begin> <import-name>* <block-end>

<import-name>
  = <qualified-name>
  | "type" <qualified-name>
  | "vocab" <qualified-name>

Notably, there is no way to import the entire contents of a vocabulary. This makes it possible to determine the origin of any identifier by grepping, and prevents code from breaking when dependencies add new exports.

Examples

import:
  foo::bar
  type foo::baz
  vocab foo::quux

synonym bar foo::bar
type synonym baz foo::baz
vocab synonym quux foo::quux

Terms

A term denotes an operation on the stack, local variables, or program counter.

<term>
  = <literal-term>
  | <word-term>
  | <section-term>
  | <group-term>
  | <vector-term>
  | <lambda-term>
  | <match-term>
  | <if-term>
  | <do-term>
  | <block-term>
  | <with-term>
  | "call" | "jump" | "return" | "loop"

A literal term denotes a literal value that is pushed to the stack.

<literal-term>
  = <character-term>
  | <float-term>
  | <integer-term>
  | <text-term>

A word term denotes an invocation of a word.

<word-term> = <qualified-name>

A section term denotes the partial or postfix application of an infix operator.

<section-term>
  = "(" <operator-name> <term>+ ")"
  | "(" (<term> - <operator-name>)+ <operator-name> ")"
  | "(" <operator-name> ")"

A right section (where the right operand is bound) desugars to a postfix call:

(/ 5)

5 (/)

A left section (where the left operand is bound) desugars to a swap followed by a postfix call:

(5 /)

5 swap (/)

And of course, fully applied infix operators also desugar to postfix calls, as postfix is the fundamental form:

(a + b)

a b (+)

A group term is a term wrapped in parentheses.

<group-term> = "(" <term>+ ")"

This can be used to override operator precedence, or simply to add redundancy, making the arities of words more explicit.

A vector term denotes a literal array. Elements are separated by commas (,), and a trailing comma is permitted but not required. The redundancy of commas can improve error reporting, and the admission of a trailing comma can improve textual diffs.

<vector-term> = "[" (<term>+ ",")* <term>* "]"

A lambda term denotes moving one or more values from the stack into local variables. This is in fact identical to a lambda form in lambda calculus, but it’s fashioned to resemble the introduction of an ordinary lexically scoped variable. A lambda term may use an underscore (_) as the name of a variable to denote discarding a value. Note that this affects destruction semantics: if x is of type T and T has an implementation of Destroy, then -> x; f calls destroy after the call to f, but -> _; f calls destroy before f.

<lambda-term> = "->" <lambda-variable> ("," <lambda-variable>)* ";" <term>*

<lambda-variable> = <unqualified-name> | "_"

A match term denotes the inverse of a data type constructor: whereas a constructor takes field values from the stack and produces a value of some data type, a match expression deconstructs a value, placing its fields on the stack. A match may have any number of case branches and at most one else branch—if no else branch is present, it’s equivalent to else { abort }.

<match-term> = "match" ("(" <term>* ")")? <match-case>* <match-else>?

<match-case> = "case" <unqualified-name> <block>
<match-else> = "else" <block>

The matched value may be given as part of the match term, or supplied implicitly from the stack:

match (x)
case foo { … }
case bar { … }

x
match
case foo { … }
case bar { … }

An if term denotes matching on a Boolean expression.

<if-term> = "if" ("(" <term>* ")")? <block> ("elif" "(" <term>* ")" <block>)* ("else" <block>)?

If the else branch is omitted, it’s equivalent to else {}. As for match, the condition may be given as part of the if term, or supplied implicitly from the stack.

if (A) { B } elif (C) { D } else { E }
match (A) case true { B } case false { match (C) case true { D } case false { E } }

if (A) { B } else { C }
if (A) { B }
if { A }

A do term is syntactic sugar for using higher-order functions as prefix control flow operators like match and if.

<do-term> = "do" "(" <term>+ ")" <block>

do (A) { B } is equivalent to simply { B } A.

[1, 2, 3]
do (map) -> x:
  x say
  (x + 1)

A block term denotes a quotation.

<block-term> = <block> | "->" <lambda-variable> ("," <lambda-variable>)* <block>

A lambda coupled with a block is sugar for a lambda within a block. This is particularly useful for control flow operators.

-> x, y { f x y g }
{ -> x, y; f x y g }

match (optional_x)
case some -> x:
  …
case none:
  …

xs do (each) -> x:
  x say

xs \say each

A with term denotes calling a function with altered permissions.

<with-term> = "with" "(" <with-permit>* ")"

<with-permit> = ("+" | "-") <qualified-name>

with can be used to (safely) revoke permissions or (unsafely) grant them.

define foo (Int32, (Int32 -> Int32) -> Int32 +IO):
  -> x, f;
  "begin" say

  // Using a +IO function within this block is a type error.
  do (with (-IO)) {
    x f call
  }

  "end" say

// The only safe implementation of this function is "drop"...
define trace (Text ->):
  // ...but here we unsafely grant the IO permission.
  \say with (+IO)

A call term denotes applying the function atop the stack to the rest of the stack. Its type is <R..., S..., +P> (R..., (R... -> S... +P) -> S... +P) and it’s equivalent to with ().

A jump term denotes a tail-call to the function atop the stack.

A return term denotes a jump to the end of the current function.

A loop term denotes a jump to the start of the current function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment