In this document I will try to design a simple scripting language in informal way and describe why did I choose this or that feature to be present.
The language to be designed must be
- simple;
- ortogonal in features (no 2 features should do the same thing);
- intuitive;
- contain as least special cases as required for comfortable use;
- doesn't break its own premises.
The scope is static; each name used must be declared. You cannot manipulate the scope in the runtime (we're not a LISP).
If a name declared shadows another name, it must be prefixed with a shadows
keyword. If you add a shadows
prefix when the variable isn't shadowing anything, it should give a warning.
Lets look on one of the worst cases: javascript.
function f(x) {
f_body;
return f_result
}
function g(y, z) {
g_body;
return g_result
}
var x = g(4, "foo")
f(5)
Honestly, we don't need the return
keyword. The result of the function is its body (we will deal with side-effect later in a clever way, I promise).
The other thing we don't need is {}
. We might replace {
with =
, and the }
with, well, nothing? Let's try that.
fun f(x) = f_body;
fun g(y, z) = g_body;
var x = g(4, "foo");
f(5)
Wow, so short, much nice. You know what? We have different declarations for functions and so called "vars" (as if we're ever going to reassign them!). We have them for no damn reason.
I quite like the approach of ML-family languages, where you define everything with let
. However, they require
let
rec f x = ...
and g (y, z) = ...
x = g (4, "foo")
in
f (...)
constructs to designate that f
and g
can call each other. That's a nuisance. Lets define things like that:
let
f x = ...
g (y, z) = ...
x = g (4, "foo")
f (5)
or like that
let f x = ...
let g (y, z) = ...
let x = g (4, "foo")
f(5)
In both cases, f
and g
are considered to be inside one let
-block and therefore able to call each other.
Also, notice we've dropped the ()
around f
's argument. Why? Because ()
-thing is used to group. There's nothing to group in one argument.
You might ask, what if I have this?
function vector_norm(x, y) {
function sqr(x) {
return x * x
}
return sqrt(sqr(x) + sqr(y))
}
And I will answer - you do that:
let vector_norm (x, y) =
let sqr x = x * x
sqrt (sqr x + sqr y)
Nice and clean.
We have this function in javascript:
function f (x, y) {
return function (z) {
return x * (y + z)
}
}
Yeah, it returns another function. Why? Because you want to do
[1,2,3].map(foo(4,5))
The map
method expects a function as an argument, and you don't want to write
[1,2,3].map(function (z) {return foo(4,5,z) })
because that's messy as hell.
Let's write it like that:
let f (x, y) = fun z => x * (y + z)
Hmm, no, still messy. How about...
let f (x, y) z = x * (y + z)
Yeah! Much bettah!
Also, let's declare as much functions curried as possible.
Instead of
let add (x, y) = x + y
lets write
let add x y = x + y
Then application of this functions from this:
add(1, 2 + 3)
becomes this:
add 1 (2 + 3)
You remember we dropeed ()
in definitons? There is another change we have to make after that.
Look onto this expression:
let a = sin x + cos y
it actually means this:
let a = (sin(x)) + (cos(y))
So, the priority of funcall is maximal, and every operator has lower priority than that.
So, let
essentially works like this:
let
<definitions>
<more-definitions>
...
let
<even-more-definitions>
...
...
<body>
The indentation of first token after let
keyword will decide the level at which all definitions of the block should start. (So, we're not nailed down to 2 space indent). All sequental let
-blocks share scope. After all the let
-blocks there is a single expression. Function body can only be a naked expression, or a let-block. The let
-block does not share indentation with other let
-blocks, they may have different indentation level.
So, the following code
let
sqr x = x * x
field_density g m r =
let inv x = 1 / x
g * m * inv (sqr r)
field_density g_const 1000 42
means
let {
sqr x = x * x;
field_density g m r =
let {
inv x = 1 / x;
}
g * m * inv (sqr r);
}
field_density g_const 1000 42;
Javascript objects are quite nice, actually.
let qux = "hello"
{
foo: (x, y) => ...,
bar: x => ...,
x: 1,
qux, // <- we _captured_ that thing
method(z) {
return z + 1
},
willBeInNextJSVersion(w) => w - 1,
}
... there is still a lot of things that are bad.
- The field declarations are comma-separated. Why not semicolon? Actually, how about that: both
,
and;
are allowed separators for object field. - The bodies of declarations cannot refer to each other, even methods.
- Inconsistency. We have fields in form of
x: 1
and methods. And lambda-fields. I prefer all declarations to be in form oflet x = ...
and captures to be simplyx
.
Let's fix that:
let qux = "hello"
{
let foo (x, y) = bar(x) + y; (* we are referring to `bar` below here *)
let bar y = y + x;
let x = 1;
let qux;
let method z = z + 1;
let thatThing w = w - 1;
}
Hmm, too much let
. It doesn't do anything here. We're in one scope - the object itself.
Second try:
let qux = "hello"
{
foo (x, y) = ...;
bar x = ...;
x = 1;
qux;
method z = z + 1;
thatThing w = w - 1;
}
Much better.
There are two ways of doing it.
First is "you ain't gonna need it":
let
sqr x = x * x
add (x, y) = x + y
{
foo (x, y) = add (sqr x, sqr y)
}
Here, we "export" the foo
field and sqr
remains hidden, yet accessible in object.
Second way is
{
private
sqr x = x * x
add (x, y) = x + y
foo (x, y) = sqr x + sqr y
}
where private
makes a block, like let
does.
I think we can allow both. Second way will be translated into first during compilation, anyway.
Sometimes, you have 2-3 finctions that depend on a private one; and you want rest of your object to be independent of that.
{
{
private
sqr x = x * x
foo x = sqr x + 1
bar x = sqr x - 1
}
qux x = x + 1
}
Sometimes, you want to extend your object.
extend thing {
foo z = z - 1
}
You know what? I'll make extend
a function. Yeah. You can't access private fields of thing
anyway.
Somethimes, you want to access public fields though, and not via .
-syntax. Well, I have not-that-good news for ya. I decide to have static scope. It means, if you use a name, somewhere in the file this name is declared.
Therefore, to get fields in unqualified manner, you should pattern-match the object:
let {foo, quz} = thing
Let's go back to object updates. We can change extend
a bit: now the last parameter is not an "update" object, but an update function!
extend thing fun {foo, quz} => {
bar x = foo x + qux 1
}
Do you see what I did there? The function pattern-matches its argument to request fields foo
and quz
out of it.
You know what? This fun
keyword for functions makes it harder to read. Let's replace it with... \
?
extend thing \{foo, quz} => {
bar x = foo x + qux 1
}
I think, it didn't become worse, at the very least.
We will definitely need to add structure to our program.
For simplicity, the name of the current module will be derived from its path.
Sometimes, you want to prepare your code to splitting onto submodules, yet still want it to reside in one file for a bit. For that we present a module
keyword.
let bar x = x + 1
let parser = module {
foo = bar x (* will not compile, both `bar` and `x` are outside of the `module` keyword *)
}
let x = 1
What is does? It prohibits for enclosed expression the access to the outside scope.
We need some things to get in that submodule scope, though. And there is an import
keyword for that.
(in foo/bar.scr
)
let
import control.functor {map, fill}
{
stream = module {
...
}
parser = module
let
import foo.bar.stream as stream
{
...
}
}
That seems nice.
The import of control.functor
is unqualified: map
and fill
are accessible directly.
The import of foo.bar.stream
is qualified: you have to use stream.makeStream
to access its contents.
If you drop that as stream
you will have to do it like foo.bar.stream.makeStream
, which is probably not what you want. There's no way to dump contents of the module into current scope. All imports are either qualified or you have to manually enumerate the names to be imported.
That's even simpler. We don't need any keywords for that.
The module body is expression. If it happens to be an object, we can say that it exports all the public fields of that object.
The recursive imports are allowed. Since we have static scope, we can find a name which is an import loop and throw a "linker" error.
Congrats, we've reached the scary part.
What is an "algebraic datatype"? Its a "sum of products", where sum is +
and product is *
.
What does it mean?
Well, if we simplify things beyond the point mathematicians start hitting us with books, product is a tuple. Or an object.
This code declares a constructor of a "product" object:
let point x y z = {x, y, z}
See? Products are easy.
Basically, each object is a "product" of its fields.
Why is it called that? Sorry, won't explain - this ain't a category theory lecture.
Okay, what's a "sum"? We will take a look on a particular case called "designated sum".
What the hell does it mean, you ask? That means, each component of the sum is given a name.
Each component of a sum is a product with a name. To put is simply - component of a sum is a variant of object of whole type. We will call them constructors later. We call them that, because they are functions that return a constructed object.
They are also called case class
es in Scala and data class
es in Kotlin (so the programmers will not run away in fear).
Let's look onto an example.
let list = {
Push {head, tail}
Empty
let headOrElse l def =
match l with
| Push {head} -> head
| Empty -> def
let l123 = Push(1, Push(2, Push(3, Empty)))
}
This code declares a "list type". We have 2 variants:
Push {tail, tail}
- this is a variant of the list object that has fieldshead
andtail
. It represents aPush
of somehead
element into some othertail
list. It doesn't do anything, it just, yo know, represents - I mean "holds that data".Empty
- it represents an empty list, therefore no fields.
Then it matches over some list, checks which constructor was used to build the object and retrieves a field.
If you still don't get it, please have this messy js chunk, which does the exact same thing.
class List {
class Push extends List {
constructor (head, tail) {
Object.assign(this, {head, tail})
}
}
class Empty extends List {
constructor () {}
}
static headOrElse(l, def) {
switch (l.constructor) {
case Push: return l.head
case Empty: return def
}
}
static l123 = new Push(1, new Push(2, new Push(3, new Empty())))
}
The match
is basically nanotech-level switch
, which can not only select a branch by an int or a string, but deconstruct the object to a arbitrary depth. You can't fallthrough from a branch to another branch. You also have no way to deconstruct a function - for obvious reasons.
Assuming you get what constructor is, let's remember that it has a name. The usual perks of "being named" apply: it can be imported, exported, you can assign it to a name.
I would also like to have ?:
-style analog of match
. The hell, I will steal this operator.
Lo and behold!
let safeTail l = l is Push {tail} ? tail : Empty
It is actually ... "is" ...[ "{"..."}" ] "?" ... ":" ...
-operator, and this is what match
-expression will be translated into.
We can say "you ain't gonna need this", because
let if bool selectors =
match bool with
| True -> selectors.then
| False -> selectors.else
let and a b = if (a) {
then = if (b) {
then = True
else = False
}
else = False
}
(oh, the language is lazy, btw, so that's actually fine)
... but I like to have multi-way if. Like that:
let
fib x = if
| x == 0 -> 1
| x == 1 -> 1
| else -> fib (x - 1) + fib (x - 2)
I also like to have it embedded into match
as well.
match l with
| Push {head} if
| isOdd? head -> head
| Empty -> 0
Here if none of the if
-branches apply, it fallsthrough to the next branch. If no branch apply at all, it's an irrecoverable error.
Typical strings. With interpolation. Like that:
let x = 1
let y = "foo"
let str = "x = {% x %}, y = {% y %}"
The syntax was selected, because it makes it easy to distinguish interpolators on lexer phase.
These things:
let foo = (1, 2)
let foo_1 = foo._1
let foo_2 = ._2 foo
Yes, ._2
is a function that retrieves field _2
from its argument. It is more useful than you can imagine. Also, .foo
retrieves field foo
. And .+
retrieves operator +
, but it becomes a function.
Tuples make things very simple. You remember those arguments?
let foo (x, y) = x + y
foo (1, 2)
They are tuples now! And, of course, tuples can be pattern-matched in match
blocks as well.
I'm thinking about adding fields ._3
, ._4
and so on, to the tuples with appropriate length.
Also, tuples are kinda-lists, so maybe we should add a .rest
field? It will cut first argument off a tuple. in case of (a, b)
(which silently is (a, (b, ())
), it will return (b, ())
. So, our tuples are proper-LISP lists!
And the .length
field is mandatory, ofc.
The single concern is - we have to make a tuple constructor(s?). Should we call them ()
, (,())
, (,)
, (,,)
, (,,,)
... or Tuple0
, Tuple1
, Tuple2
...? I like first variant more. Oh, about variants - the tuple is a "canoncal" "product" from a "sum of products" story.
We actually need only 2 constructors - (,)
(2-tuple) and ()
(0-tuple).
There are 2 ways to match
a tuple:
match x with
(* way 1*)
| (42, a, b) -> a + b (* Notice we require 1st elem to be 42 here.
* This will match only tuple with length 3.
*)
| (a, x) -> a - x (* this will only match tiple with a length of 2 *)
| (a, ...xs) -> (a + 2, ...xs) (* this wil match any tuple with length >= 1
* We also have _spliced_ xs here.
*)
We need to establish something. There are 4 kinds of names:
- values -
a
,b
,rest
- constructors -
A
,B
,Rest
- operators -
+
,*
,>>=
- operators-constructors -
::
,:>
,:@?
Operators (and operators-constructors!) are just names that happen to be used as operators. If you wrap it with ()
, you make it just a name.
(* Three entirely equal functions *)
let add = (+)
let add2 x = (+) x
let add3 x y = (+) x y
(* Three entirely equal functions, as well *)
let push = (::)
let push2 x = (::) x
let push3 x y = (::) x y
Values (and constructors!) are just names that are happen to be used normally. If you wrap them with `
,
they can be made into operators.
let add = (+)
let sum = 1 `add` 2 `add` 3
This is just a syntax. This is the single difference between names and operators. We will turn the AST into S-expression along the interpretation process, and names and operators will become one thing.
We have all the machinery now.
let l = [1,2,3,4]
let r = [
1
2
3
4
]
let g = (1) :: (2) :: (3) :: (4) :: Nil (* this is what l and r will actually be *)
match g with
| [a, b, 4, d] -> a (* match a list of given length *)
| [a, b, ...rest] -> a + b (* match a list with given prefix *)
| a :: b :: rest -> a - b (* the same as previous, in "desugared" mode *)
| [] -> 0 (* the empty list *)
| Nil -> 0 (* also the empty list *)
The list will be desugared into g
-version. We will have Nil
and ::
ctors in some pervasive
-module.
Notice the parenthes! They are here to disambiguate operator priorities!
This theme is very slippery. But I want to implement the best scheme I know.
-
Sometimes, we will have qualified operators. Even if its qialified, it's an op.
let foo = { let (+?) a b = a - b } let bar = 1 foo.+? 42
It looks outright bad. However, in case of name-turned-into-op its not that bad.
let parser = { let or l r = ... } let intOrString = int `parser.or` string
-
We will not have integer priorities, like
infix left 5 @>->--, --<-<@
.Instead, we will have relative priorities.
arith = trasitive operator group infix ^ tighter than *, / infix left *, / tighter than +, - infix left +, - all looser than bitwise all tighter than comparison
This means:
- We declare operator group "arith";
bitwise
andcomparison
are imported;^
has more priority than*
,/
,+
and-
(because "transitive");*
and\
have more priority than+
and-
;*
,/
,+
and-
have left associativity;^
has no associativity (requires parens if chained);- all operators have less priority that any of the "bitwise" group;
- all operators have more priority that any of the "comparison" group;
arith
,bitwise
andcomparison
objects will exist in runtime (for simplicity of implementation), but any usage aside from import/export and declaring a group is prohibited; you can't pattern-match them either.
Also, if the group is not transitive, to use op1 and op2 in the same expr, they have to be related directly.
If operators are not related directly, are not related transitive inside a group or their groups aren't related directly, you need parens to diambiguate them. Parser (actually, another block) will enforce that.
How do we do our IO?
Like this:
main = do (io)
putStrLn "Hello"
name <- getLine
putStrLn "Hello, {% name %}!"
return name
This is exactly the same as
main = do (io)
() <- putStrLn "Hello"
name <- getLine
() <- putStrLn "Hello, {% name %}!"
return name
Which will be preprocessed into
main =
putStrLn "Hello" io.>>= \() ->
getLine io.>>= \name ->
putStrLn "Hello, {% name %}!" io.>>= \() ->
io.return (name)
Names like return
(IT'S A NAME!), throw
(YES, THIS IS TOO), catch
(NAME, YES) and some more will be taken from provided object (in our case, io
).
The >>=
operator functions like Promise#then
from js.
main =
putStrLn("Hello").then(_ =>
getLine.then(name =>
putStrLn(`Hello, {name}!`).then(_ =>
Promise.resolve(name)
)
)
)