Skip to content

Instantly share code, notes, and snippets.

@evincarofautumn
Last active July 29, 2020 19:48
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save evincarofautumn/bd40a6ecb9f0e89aa766b9870e2afa9e to your computer and use it in GitHub Desktop.
Save evincarofautumn/bd40a6ecb9f0e89aa766b9870e2afa9e to your computer and use it in GitHub Desktop.
Programming Language Idea Shelter

Programming Language Idea Shelter

This is a list of notations and semantics I’ve wanted to use or have used in toy programming languages that I think are particularly nice but rare, and that I’d like to see in more languages. If you’re a PL dev and you fall in love with one, adopt it and let me know!

Unary Relational Range Operators

Most languages use the standard relational operators for comparisons (>, <, />=, /<=/=<, =/==/===, /!=//=/<>), but these are almost always presented as binary operators for comparing two values. Some languages (of which Python is the most notable) include “chained” comparisons like a <= b < c, meaning a <= b and b < c (but with b only evaluated for its side effects once, that is, (lambda x: a <= x and x < c)(b)). Some languages also allow unary versions of these operators, but they rarely denote anything related to comparisons. Some languages such as Haskell allow arguments to be omitted from operators via operator sections, denoting partially applied comparisons, such as (< 3) :: (Num a, Ord a) => a -> Bool

The idea of unary relational range operators is to introduce unary versions of these operators denoting ranges of values, which can be combined with union, intersection, and difference, or used to represent interval arithmetic. They also provide a more explicit way to talk about whether a range is inclusive or exclusive of its bounds.

Examples

Below I use C-like notations, but the specific lexical operator spellings aren’t important, just the grammatical syntax and semantics.

  • Construction

    • <x: open interval (−∞, x)
    • >x: open interval (x, +∞)
    • <=x: closed interval (−∞, x]
    • >=x: closed interval [x, +∞)
    • ==x: point interval [x, x]
    • !=x: inverse point interval (−∞, x) ∪ (x, +∞)
    • empty: empty interval ∅
  • Inversion, intersection, union, and difference

    • !<x = >=x
    • !>x = <=x
    • !<=x = >x
    • !>=x = <x
    • !==x = !=x
    • !!=x = ==x
    • >=x && <y: half-open interval [x, y), empty if x ≥ y
    • >x && <x: empty interval ∅
    • !(>=x && <y) = <x || >=y
    • >x || >y = >min(x, y)
    • >x \ >y: difference; (x, y] if x < y, empty if y ≤ x
  • Tests

    • (<3) u = u < 3: overload function application on ranges so they can be used like operator sections

    • timing_window = >=earliest && <=latest; if (timing_window) { … } = if (timing_window != empty): an interval is truthy if it’s nonempty

    • is_teenager = age && (>=13 && <=19) = is_teenager = (==age && >=13 && <=19) != empty: inclusion can be tested with intersection, automatically promoting the value x to the range [x, x] (==x)

    • bool is_alpha_lower(char x) { return x && (>='a' && <='z'); } = return (x >= 'a' && x <= 'z'): ranges may be used on any type with a defined ordering

    • switch (value) {
      case ==-1:
        use_default();
        break;
      case >=0 && <=3:
        valid();
        break;
      default:
        error();
        break;
      }
      

      range inclusion may be tested implicitly in case conditions

  • Arithmetic

    • (>x && <y) + z = >(x + z) && <(y + z): addition and subtraction translate a range
    • (<x || >y) * z = <(x * z) || >(y * z): multiplication and division scale a range
  • Comparisons

    • (>=x && <y) > z: true if all values in [x, y) are greater than z
    • <x < <y: true if all values in (−∞, x) are less than all values in (−∞, y), i.e., x < y
    • >x == >y: true if for all z in (x, +∞), z is in (y, +∞) and vice versa, i.e., the ranges include exactly the same values
    • <=x != <=y: true if there exists some z in (−∞, x] not in (−∞, y] or vice versa, i.e., the ranges do not include exactly the same values

Iteration Adverbs

In most languages, iteration over a data structure is either explicit by way of loops, mapping functions, or operators; some languages like the APL family make iteration implicit in operations. Iteration adverbs provide a middle ground, where there is no need to explicitly write out an indexed loop or use a combinator to lift a function over a structure, but the form of iteration is still made explicit.

This is much closer to how natural languages tend to express iteration: rather than phrase “increment the score of every player by 100” as foreach (player : players) { player.score += 100; }, it can be denoted as (each players).score += 100;. There are many natural terms to use for keywords here: each, every, all, some, none, which, where, count, &c.

There is a question of the scope of iteration—is f(g(each xs)); intended to mean map(\x -> f(g(x)), xs);, with the scope as large as possible, or f(map(g, xs));, as small as possible? I think the better approach is to make the scope as large as possible (for example, a whole statement) unless constrained by a delimiter. So f(g(each xs)); means map(\x -> f(g(x)), xs);, but f(iter g(each xs)); means f(map(g, xs));, with the scope of each constrained by iter. This is related to the compose/apply distinction in Functional Message Passing. Note that this structure isn’t capable of representing all iteration patterns, it’s just intended to subsume the most common patterns. For complex iteration with multiple overlapping but distinct iteration scopes, it’s better to write an explicit loop or comprehension, or factor out subexpressions into separate statements.

Examples

  • Zipping
    • f(each xs) = map f xs
    • f(each xs, each ys) = zipWith2 f xs ys
    • f(each xs, ys, each zs) = zipWith2 (\ x z -> f(x, ys, z)) xs zs
    • f(each xs, each ys, each zs) = zipWith3 f xs ys zs
  • Cartesian product
    • f(every xs) = map f xs
    • multiplication_table = every [1..9] * every [1..9] = (*) <$> [1..9] <*> [1..9]
  • Folds/reductions and filters
    • if ((all employees).age >= 21) { serve_alcohol(); } = all (\ e -> e.age >= 21) employees
    • new_users = today - (which users).account_creation_date <= one_week = filter (\ u -> today - u.account_creation_date <= one_week) users
    • score = 10 * (count enemies).dead = sum (map (\ e -> 10 * (if e.dead then 1 else 0)) enemies)
    • (where xs) < 0 = [i | i <- [0..] | x <- xs, x < 0]: selecting the indices in a sequence where a condition is true
    • (where products).price >= $10 = Map.keys (Map.filter (\ v -> v >= $10) products): selecting the keys in a map where a condition is true

Markdown-compatible Syntax

This is more of a whole-language or literate programming feature: a language whose syntax constitutes valid Markdown, or another lightweight markup language like ReStructuredText, allowing programs to be rendered as nicely typeset documents. This could be as shallow as treating the whole file as a comment and entering code in code blocks, or as deep as making use of Markdown headings and formatting as syntactic program elements.

Examples

A heading could denote a namespace or module, keywords could be stropped in bold, blockquotes could denote nesting, and so on:

module Main

define main():

print “Hello, world!”

Using a more capable document format like RTF would introduce more complications (especially compatibility between editors), but would also open up countless possibilities. Imagine embedding diagrams directly in comments, or data files directly in code with OLE or similar techniques.

ASCII Math Notation

Programming languages conventionally stick to ASCII for ease of entry. There are plenty of exceptions in languages that allow but don’t mandate Unicode, such as Haskell, or languages that do mandate it, such as APL or Fortress. We therefore use ASCII digraphs, trigraphs, &c. to represent mathematical symbols. Some of these spellings that are “obvious” to me are either unused or very rare.

Examples

  • @ = : for all, since they’re both funny-looking As

  • # = , #! = ∃!: there exists (exactly one) as a squarish number sign (pound sign, hash, octothorpe)

  • [] = : necessarily modality

  • <> = : possibly modality

  • |- = , -| = , |= = , =| = : right and left tacks/turnstiles (judgement, entailment, modelling)

  • _|_ = , ^|^ = : up and down tacks for bottom (minimum, unit of union) and top (maximum, unit of intersection)

  • -o = : multimap/lollipop arrow for linear functions

  • /\ = / / : using the backslash for its original purpose: logical AND (intersection, meet, greatest lower bound)

  • \/ = / / : likewise, for logical OR (union, join, least upper bound)

  • |-> = : rightwards arrow from bar for “maps to” (function, substitution)

  • ~> = : wavy rightwards arrow for “leads to” (reachability)

  • ==> = / : long rightwards arrow for implication

  • < = , > = , <= = , >= = : (strict) round subset and superset

  • [ = , ] = , [= = , ]= = : (strict) square subset and superset

  • <-> = / , <=> = / : material equivalence

  • === = : identical

  • . = ·: multiplication, dot product, logical AND

  • || / // = : parallel, logical OR

  • (+) = , _\/_ = : exclusive OR, disjoint union, additive disjunction

  • (*) = : tensor product, multiplicative conjunction

  • := = , ::= = : definition, grammar production

  • :. / .: / .'. = , '.' = : therefore, because

  • +- = ±, -+ = : plus or minus, minus or plus

  • x = ×: multiplication, cross product

  • <><> = : infinity

  • <>< = : proportional to

  • <| = , |> = : triangles, markers

  • =/= = : not equal

  • ~~ = : almost equal

  • << = , <<< = , >> = , >>> = : (very) much less/greater than

  • ~< = , >~ = : precedes or succedes

  • |_ _| = : floor, minimum

  • |^ ^| = : ceiling, maximum

  • [x, y], [x, y), (x, y], (x, y): closed, half-open, and open intervals/ranges

  • \ = : set difference

  • |B = 𝔹: double-struck B, Booleans

  • |C = : double-struck C, complex numbers

  • |Z = : double-struck Z, integers

  • <- = , -> = : element of, contains

  • </- = , -/> = : not an element of, does not contain

  • () / . = : composition, product

  • |Q = : double-struck Q, rationals

  • A_i = Aᵢ: subscript

  • A^i = Aⁱ: superscript

  • |>< = , ><| = : semidirect product, semijoin

  • |><| / >< = : bowtie, natural join

  •          ________
            / 2
    -b +- \/ b  - 4ac
    -----------------
            2a
    
    rotate(theta, x, y) = {
      let rotated = [ (+cos theta) (-sin theta) ] [ x ]
                    [ (+sin theta) (+cos theta) ] [ y ];
      return (rotated[0][0], rotated[0][1]);
    }
    

    2D notation (see Toody for examples of how to parse this)

Anonymous Values

Typically, in a typed language with judicious use of strong type aliases (cf. Haskell newtype rather than type synonyms), especially in monomorphic functions implementing business logic, there is only one value of a given type in scope. Therefore these values needn’t be given names; they can be referred to by their type, and this will refer to the nearest bound value by that type in scope. This can be introduced with explicit syntax (e.g. the T) or left implicit (just T), but in the latter case, some disambiguation may be required when you want to refer to a type itself (type T).

In a concatenative language, the(T) could be used to refer to the topmost element on the stack of type T, giving a record-like semantics to the stack, indexed by type.

Examples

Explicit

// These function parameters introduce two bindings,
// of type ‘Function(Pixel, Pixel)’ and
// ‘Pixel[][]’ (a 2D array).
procedure transform(Function(Pixel, Pixel), Pixel[][]) {

  // This loop introduces a binding of type ‘Pixel[]’
  // for each row.
  foreach (the Pixel[][]) {

    // And this one, a binding of type ‘Pixel’ for
    // each column in that row.
    foreach (the Pixel[]) {

      // Assuming reference semantics, this mutates the
      // pixel using the function. Note that the type
      // parameters of ‘Function’ can be inferred, since
      // there’s no ambiguity as to the nearest binding.
      Pixel = the Function(the Pixel);

    }

  }

}

Implicit

procedure transform(Function(Pixel, Pixel), Pixel[][]) {
  foreach (Pixel[][]) {
    foreach (Pixel[]) {
      Pixel = Function(Pixel)
    }
  }
}

Ad-hoc Types

In order to add lightweight disambiguation to types without introducing a separate strong type alias with e.g. newtype or single-field struct, types of parameters could be defined inline, and referred to as if they were namespaced to the function (or implicitly available in calls to that function).

Examples

// Just for illustration; in a file handle API, these types
// might want to be shared. But they could also be shared
// implicitly by structural typing!
function open(
  path: struct FilePath { path_string: String },
  mode: enum OpenMode { Read, Write, ReadWrite, Append }
): Handle {
  …
}

// Implicitly ‘open::FilePath { path_string: "foo.txt" }’
// and ‘open::Read’, since we’re in the scope of ‘open’.
var input = open("in.txt", Read);

var path: open::FilePath
  = open::FilePath { path_string: "out.txt" };
var mode: open::FileMode = open::Write;
var output = open(path, mode);

Operator Precedence Grammar

This one is more of a technique for syntax design than a specific syntax. Perhaps surprisingly, you can design an entire language so that it can be parsed with an operator precedence table like a Pratt parser, and still have it look fairly conventional. This of course works especially well in expression-oriented languages, but it can be made to work with statements as well. I’ll focus on expressions here.

Examples

ifelse… syntax can be implemented by making if a binary prefix operator accepting two expressions: the condition and the “then” branch, and else an infix operator of lower precedence; if returns true or some “success” value if the condition was true, or false (resp. “failure”) otherwise, then else evaluates its right operand if its left operand failed. I make a point of including this concept of success and failure as distinct from Booleans because it makes else considerably more useful: f() else x can be implemented in such a way that if f() raises an error (by throwing an exception or otherwise), then else returns x, that is, this is a nice way to capture the common “try or return default” pattern:

// Strawman syntax for a sum type (cf. Rust’s ‘enum’).
type Optional<T> {
  case None;
  case Only(T);
}

// If ‘expr’ returns successfully, its result is wrapped in
// ‘Only’; if it fails, ‘catch’ returns ‘None’ instead.
function catch_optional<T>(lazy expr: () -> T): Optional<T> {
  return Only(expr()) else None;
  // or: return catch_default(Only(expr()), None);
}

print(catch_optional(4 / 2));  // Only(2)
print(catch_optional(1 / 0));  // None

function catch_default(lazy expr: () -> T, lazy default: () -> T): T {
  return expr() else default();
}

print(catch_default(4 / 2, -1));  // 2
print(catch_default(1 / 0, -1));  // -1

With careful arrangement of precedences, fixities, and associativities, this can even gracefully support multiple uses of the same symbol, such as infix if & unless (x if y = if y x), or infix x - y for subtraction vs. prefix -x for negation.

This also makes it very straightforward to add user-defined operators and even general syntax: just modify the precedence table accordingly.

Preprocessor Features for Literate Programming

Exclusion

  • Main.hs

    import Item
    
    main = do
      putStrLn "Hello, what's your name?"
    #exclude "Item.hs"
      module Item where
      type Item = String
    #endexclude
      let
        test :: Item
        test = "beans"
      putStrLn test
    

  • Main.hs

    import Item
    
    main = do
      putStrLn "Hello, what's your name?"
      let
        test :: Item
        test = "beans"
      putStrLn test
    
  • Item.hs

    module Item where
    type Item = String
    

Alternate Syntax Modes

Languages are often used in conjunction with tools like CLI applications and LSP servers that use different notation than the language itself, but want to communicate about the same things. For example, command-line flags can be considered an alternate UI for a compiler API that happens to be more convenient and familiar to type at a terminal than the language itself. A language can offer alternate syntax modes to provide the same semantics in a different context.

Concerns include escaping and making the syntax fit naturally into the host environment. You can always do compiler 'compiler::add_path("foo.ext"); compiler::compile();' but that’s not convenient or what users expect in terms of UI idioms.

CLI Mode

Convert command-line flags into function calls within a particular default namespace, referencing the compiler API.

compiler main.ext --input lib.ext -i util.ext -O2 -Wall,error --profiling -o program -vvv

// If ‘--input’ is set as the default handler for a string argument.
compiler::input("main.ext");

// ‘--input’ may of course be specified explicitly.
compiler::input("lib.ext");

// If ‘-i’ is an abbreviation for ‘--input’.
compiler::input("util.ext");

// If ‘-O’ = ‘--optimize’.
// ‘2’ is parsed as a number per the type of ‘optimize’.
compiler::optimize(2);

// If ‘-W’ = ‘--warn’.
// Parameters are parsed as an array of strings per the type of ‘warn’.
compiler::warn(["all", "error"]);

compiler::profiling();

// If ‘-o’ = ‘--output’.
compiler::output("program");

// If ‘-v’ = ‘--verbose’ and clustering is enabled.
compiler::verbose();
compiler::verbose();
compiler::verbose();

This allows user-defined extensions either inline or in a separate extensions script.

  • cli.ext

    // ‘compiler hlep’
    compiler::cli::add_command("hlep", compiler::cli::flags::help);
    
    function dump_symbols() {
      var symbols = compiler::api::list_symbols_recursive(compiler::api::root_namespace);
      for (var symbol : symbols) {
        var found_definition = compiler::api::get_definition(symbol);
        match (found_definition) {
          case None => continue;
          case Only(definition) => print(definition.source_location, ": ", symbol);
        }
      }
    }
    
    // ‘compiler main.ext -o test --dump-symbols’
    // After compilation, dump all symbols.
    compiler::cli::add_flag("dump-symbols", dump_symbols, compiler::api::POST_COMPILE);
    

URL Mode

A compiler that offers an LSP interface has all the components needed to offer an HTTP interface as well: the compiler can be spun up in a server mode and the user can open a browser pointing to localhost on a particular port to browse definitions and perhaps even do basic editing with a built-in IDE.

In this mode, a URL scheme that allows embedding expressions for simple queries to the compiler API is beneficial. This makes the most sense for “flat” or syntactically simple languages like Forths and Lisps so the mapping is more obvious, but any language could offer a syntax like this.

GET http://localhost:8421/eval/mul/add/2/3/div/8/4
=>
"""
10
"""
GET http://localhost:8421/definition/main/listing
=>
"""
int main() {
  print("hello, world!");
  return 0;
}
"""

Others

JSON, XML, s-expressions, &c.

These aren’t particularly original, but they haven’t been done well imo. (Maybe they can’t be! I doubt that.)

Spreadsheet Embedding

Spreadsheets are already powerful tools and convenient UIs. Instead of trying to turn spreadsheets into more general-purpose programming languages, why not coopt them into UIs for editing a programming language? Export the file to CSV and have the compiler read that; use layout to illustrate dataflow. Spreadsheet formulas become a metaprogramming tool rather than a programming tool.

Functional Message Passing

This is a concept I explored in an object-oriented language called Even. It’s driven by semantics, but affects syntax as a result. The idea is that the primitive operation is message passing, as in Smalltalk and related languages (Objective-C, Pharo, Factor), but arranged in such a way that it naturally supports higher-order programming.

Examples

When sending a verb to another verb, by default the two are composed, not applied, so they can be chained; to pass a verb as an argument, it must be “escaped” into a noun (here, with a curly bracket-delimited block {}).

histogram = dict  map {head; length}  groupBy {==}  sort.

This is evaluated like so:

  • The verb dict awaits a list of pairs: dict(_)

  • The verb map is composed with dict, and awaits a gerund and a list: dict(map(_, _))

  • The gerund {head; length} is filled in as the first argument to map: dict(map({head; length}, _)); the ; operator composes two verbs in a “fanout”, so head; length is equivalent to a block {|x| head(x) & length(x)}, where & is the pair operator

  • Because the argument slots of head and length are inside the gerund expression, they’re not filled in by subsequent messages, so groupBy is placed in the next slot of map, awaiting a gerund and a list: dict(map({head; length}, groupBy(_, _)))

  • The gerund {==} goes in the first argument slot of groupBy: dict(map({head; length}, groupBy({==}, _)))

  • The verb sort goes in the second slot of groupBy, awaiting a list: dict(map({head; length}, groupBy({==}, sort(_))))

  • The period . terminates the definition.

Now, when histogram is applied to a list, its one remaining argument slot is filled, and the whole chain of verbs evaluates to the histogram dictionary:

histogram("CACBCB")
// =
dict(map({head; length}, groupBy({==}, sort("CACBCB"))))
// =
dict(map({head; length}, groupBy({==}, "ABBCCC")))
// =
dict(map({head; length}, ["A", "BB", "CCC"])))
// =
dict([{head; length} "A", {head; length} "BB", {head; length} "CCC"])
// =
dict([{|x| head(x) & length(x)} "A", {|x| head(x) & length(x)} "BB", {|x| head(x) & length(x)} "CCC"])
// =
dict([head("A") & length("A"), head("BB") & length("BB"), head("CCC") & length("CCC")])
// =
dict(['A' & 1, 'B' & 2, 'C' & 3])
// =
['A' = 1, 'B' = 2, 'C' = 3]

This is similar to the treatment of forks and hooks in J, where specific patterns of verbs are composed in different ways depending on their arity, except that argument slots are always filled in from left to right by default. To override this default, we can include a notion of adverbs that modify verbs directly, rather than taking a gerund. To fill in multiple slots with the same value, you can then use an adverb to modify the verb. For example, here’s the definition of the “less than or equal to” operator:

<= = < :|| ==.

An expression like 1 <= 2 is evaluated like so:

  • Desugar infix syntax: (<=) 1 2

  • Substitute the definition of <=: (< :|| ==) 1 2

  • Desugar infix syntax: ((: (||)) (<) (==)) 1 2

  • The “both” adverb : (mnemonic: two dots, two slots) converts the verb (||) into one that duplicates its arguments; it’s variadic, so it will accept any number: {|...args| (||) (_ ...args) (_ ...args)} (<) (==) 1 2

  • The verb < is composed in the first slot: {|...args| (||) ((<) ...args) (_ ...args)} (==) 1 2

  • The verb == is composed in the second slot: {|...args| (||) ((<) ...args) ((==) ...args)} 1 2

  • The block receives its first nominal argument: {|...args| (||) ((<) 1 ...args) ((==) 1 ...args)} 2

  • And its second: {|...args| (||) ((<) 1 2 ...args) ((==) 1 2 ...args)}

  • There are no more arguments, so the variadic parameter is dropped: (||) ((<) 1 2) ((==) 1 2)

  • Evaluation proceeds: (||) ((<) 1 2) ((==) 1 2)(||) true falsetrue

The forks and hooks of J all have simple analogues here, but require phrasing expressions differently:

  • Monadic fork (V0 V1 V2) Ny = (V0 Ny) V1 (V2 Ny)—use “both” (:): :V1 V0 V2 Ny

  • Dyadic fork Nx (V0 V1 V2) Ny = (Nx V0 Ny) V1 (Nx V2 Ny)—use “both” (:): :V1 V0 V2 Nx Ny

  • Monadic noun fork (N0 V1 V2) Ny = N0 V1 (V2 Ny)—use composition: V1 N0 V2 Ny

  • Dyadic noun fork Nx (N0 V1 V2) Ny = N0 V1 (Nx V2 Ny)—use composition: V1 N0 V2 Nx Ny

  • Monadic hook (V0 V1) Ny = Ny V0 (V1 Ny)—use “both” (:) and the identity function {}: :V0 {} V1 Ny

  • Dyadic hook Nx (V0 V1) Ny = Nx V0 (V1 Ny)—use composition: V0 Nx V1 Ny

If we had written <= = < || ==., it would have produced a 4-argument (tetradic) verb:

  • (< || ==) 1 2 3 4

  • (||) ((<) _ _) ((==) _ _) 1 2 3 4

  • (||) ((<) 1 _) ((==) _ _) 2 3 4

  • (||) ((<) 1 2) ((==) _ _) 3 4

  • (||) ((<) 1 2) ((==) 3 _) 4

  • (||) ((<) 1 2) ((==) 3 4)

  • (||) true false

  • true

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment