joemfb/hoon-143-syntax.txt

## hoon-143-syntax.txt
# Syntax reform

There's a very basic problem in Hoon: sometimes a Hoon expression
represents a *value*, and sometimes it represents a *pattern*.

Consider the expression `[%foo %bar]`.  Sometimes we want to
produce the literal value `[%foo %bar]`, the same noun as the
value `[0x6f.6f66 0x72.6162]`.
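
As a quick check of that equivalence, here is a minimal sketch in Python (standing in for Hoon; the helper names `cord_to_atom` and `dotted_hex` are made up for illustration). It assumes a symbol's atom is its ASCII bytes read least-significant byte first, and it prints the atom with a dot every four hex digits, the way the atoms above are written.

```python
def cord_to_atom(text: str) -> int:
    """Read the ASCII bytes of `text` as a little-endian integer."""
    return int.from_bytes(text.encode("ascii"), "little")


def dotted_hex(atom: int) -> str:
    """Format an atom as hex with a dot every four digits, counted from the right."""
    digits = format(atom, "x")
    groups = []
    while digits:
        groups.append(digits[-4:])
        digits = digits[:-4]
    return "0x" + ".".join(reversed(groups))


foo = cord_to_atom("foo")
bar = cord_to_atom("bar")
print(dotted_hex(foo), dotted_hex(bar))
```

Under those assumptions the two printed atoms should match the pair of hex literals quoted above.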

But sometimes we're using a pattern-matching rune, like `?-`, to
see if some fragment of the subject matches this pattern.

We might also be using a pattern, then creating a default value
of that pattern -- "bunting" it -- as in the argument of a `|=`
function, a default value that is replaced by the caller's data.

Or we might be using a pattern because we actually want to make a
model -- a function whose argument is an arbitrary noun, and
whose product matches the pattern.  For example, we might want to
normalize and type raw untrusted data from the network.
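
To make "model" and "bunt" concrete, here is a rough Python sketch, not Hoon. Nouns are modeled as Python ints (atoms) or 2-tuples (cells). The names `constant`, `any_atom`, `pair` and `bunt` are hypothetical, and the rule that a mismatched input falls back to the default value is an assumption made for the sketch.

```python
# Nouns are modeled as Python ints (atoms) or 2-tuples (cells).
FOO = int.from_bytes(b"foo", "little")
BAR = int.from_bytes(b"bar", "little")


def constant(atom):
    """Model for a constant pattern: ignore the input, always produce `atom`."""
    return lambda noun=None: atom


def any_atom(noun=None):
    """Model for "any atom": keep atoms, replace anything else with 0."""
    return noun if isinstance(noun, int) else 0


def pair(head_model, tail_model):
    """Model for a pair pattern: normalize each half with its own model."""
    def model(noun=None):
        if isinstance(noun, tuple) and len(noun) == 2:
            return (head_model(noun[0]), tail_model(noun[1]))
        # Not a cell (assumed behavior): fall back to the default of each half.
        return (head_model(), tail_model())
    return model


def bunt(model):
    """Default value of a pattern: what the model makes with nothing to go on."""
    return model()


foobar = pair(constant(FOO), constant(BAR))   # the pattern [%foo %bar]
tagged = pair(constant(FOO), any_atom)        # %foo followed by any atom

print(bunt(foobar) == (FOO, BAR))
print(tagged((1, 42)) == (FOO, 42))
```

The point of the sketch is that one pattern yields both things we need: a normalizer for untrusted nouns, and a default value obtained by running that normalizer on nothing.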

Making sense of all this complexity has turned out to be the
hardest language-design problem in Hoon.  It can't be solved
perfectly, but 143 seems about as good as we can get.

### Two modes or two syntaxes

When parsing Hoon source, how do we distinguish between pattern
expressions and value expressions?

The easiest way to distinguish pattern expressions from value
expressions is just to make patterns and values orthogonal.  If
the syntax of pattern expressions doesn't conflict with the
syntax of value expressions, there is no problem, and no need for
a context-specific syntax mode.

Also, if an expression means the same thing whether it is a value
or a pattern, there is no problem.  Any pattern expression can be
used as a value expression; the value will be the model
(normalizing function) for the pattern.  Any value can be used as
a pattern; the value is presumed to build a model.

In all the regular rune forms, one of these two cases applies.
But in irregular forms, they conflict.  A pair of models is not a
model of a pair -- it's not even a function at all.
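
The conflict is easy to see in a small Python sketch (again a stand-in for Hoon, with made-up names). A pair of models is a cell whose halves happen to be functions; a model of a pair is a single function whose product is a cell.

```python
FOO = int.from_bytes(b"foo", "little")
BAR = int.from_bytes(b"bar", "little")


def foo_model(noun=None):
    """Plays the role of the constant model for %foo."""
    return FOO


def bar_model(noun=None):
    """Plays the role of the constant model for %bar."""
    return BAR


# A pair of models: just a cell with a function in each half.
pair_of_models = (foo_model, bar_model)


def model_of_pair(noun=None):
    """A model of a pair: one function that produces a cell."""
    return (foo_model(), bar_model())


print(callable(pair_of_models))   # a tuple cannot be applied to a noun
print(callable(model_of_pair))
```

Only the second object can be applied to an input, which is why the irregular pair syntax cannot mean the same thing in both roles.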

We can solve the pattern/value problem in two ways: two syntaxes
and one mode (as in Hoon 151), or two modes and one syntax (as in
164 and 143).

### 151: two syntaxes and one mode

When we choose *two syntaxes and one mode*, every Hoon expression
is parsed in the same way, without context.  The pattern and
value cases must be either identical or orthogonal.

So for a pattern which is a symbol, we say `$foo` rather than
`%foo`.  `%foo` is the atom `0x6f.6f66`, whereas `$foo` is a
function which ignores its arbitrary input and always produces
`%foo`:

    |=  *
    %foo

For a pattern which is a pair, we use curly rather than square
braces -- `{$foo $bar}`, a function which ignores its input and
always produces `[%foo %bar]`:

    |=  *
    [%foo %bar]

A (rather pointless) function whose argument must be the cell
`[%foo %bar]`, and whose product is the same cell, looks like:

    |=  {$foo $bar}
    [%foo %bar]

This is the Hoon 151 style.  The modeless syntax fits the Urbit
aesthetic of mechanical simplicity.  There is no "pattern mode"
or "value mode" -- just the expression `{$foo $bar}`, which is
very different from `[%foo %bar]`.  `{$foo $bar}` is a function
which always produces `[%foo %bar]`.  So it's a different syntax
for a different noun -- very straightforward.

Suppose we love this pair so much, we want to give it a name:

    ++  foobar  {$foo $bar}

    |=  foobar
    [%foo %bar]

There is nothing terribly objectionable about this design.  But
it seemed possible to improve on it.  It has problems.

Here is one small complication: errors.  If the programmer puts a
pattern where a value is expected, that's not an error at all.
We just produce a value which is the model for the pattern (the
normalizing function).

But a value where a pattern is expected should be a syntax error.
Suppose we are pattern-matching with `?-`, and we accidentally
write the value `%foo`, not the pattern `$foo`.  We must assume
that this value expression produces a normalizing function.  This
is a totally legitimate way to state a pattern.

But `%foo` is an atom, not a function at all.  So the programmer,
who should see a clear syntax error at a logical position, gets
an error like `find-limb: $.+.2`.  This is confusing.  And it's
just annoying, because it's intuitively clear what the programmer
meant.  A language shouldn't have this kind of syntactic slack.

This unnecessary source of errors is not a major footgun.  You
learn fairly quickly not to trigger the footgun; it is very hard
for it to produce anything but a compiler error; and writing
`%foo` instead of `$foo` becomes about as common as dropping your
semicolons in C.

But it's just not right.  It does not make Hoon feel like a
quality product.  It is especially annoying to newbies, for whom
we must have the utmost sympathy.

### Two modes and one syntax

But in Hoon 143, we're going back to an improved version of the
old way we did it in Hoon 164: two modes and one syntax.  So,
the same function as above:

    |=  [%foo %bar]
    [%foo %bar]

In this design, the language for patterns and the language for
values are logically separate.  At every node in the tree, we know
what mode we're in.

We know, for instance, that `|=` is followed by two expressions.
The first is a pattern, the second a value.  When we parse `[%foo
%bar]` in pattern mode in Hoon 143, it produces the same
expression that `{$foo $bar}` does in 151.

Hoon 143 also has a clear, though ugly, syntax for toggling
modes: the irregular prefix `,`, or the regular rune `$;`.
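
A toy parser shows what "two modes and one syntax" means mechanically. This is a Python sketch under stated assumptions, not the Hoon parser: it handles only `%symbol` atoms, `[...]` cells and the `,` prefix. The tree tags (`value-atom`, `pattern-cell`, and so on) are invented for illustration, and `,` is assumed to flip the mode in either direction.

```python
import re

TOKEN = re.compile(r"\s*([\[\],]|%[a-z][a-z0-9-]*)")


def tokenize(text):
    """Split source into '[', ']', ',' and %symbol tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected text at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def cell(items, mode):
    """Nest items to the right: [a b c] means [a [b c]]."""
    if len(items) == 1:
        return items[0]
    return (mode + "-cell", items[0], cell(items[1:], mode))


def parse_expr(tokens, mode):
    """Parse one expression in `mode`; return (tree, remaining tokens)."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head == ",":
        # The toggle: parse the next expression in the other mode.
        other = "pattern" if mode == "value" else "value"
        return parse_expr(rest, other)
    if head == "[":
        items = []
        while rest and rest[0] != "]":
            item, rest = parse_expr(rest, mode)
            items.append(item)
        if not rest:
            raise SyntaxError("missing ']'")
        if len(items) < 2:
            raise SyntaxError("a cell needs at least two items")
        return cell(items, mode), rest[1:]
    if head.startswith("%"):
        return (mode + "-atom", head[1:]), rest
    raise SyntaxError(f"unexpected token {head!r}")


def parse(text, mode="value"):
    """Parse a whole source string, starting in the given mode."""
    tree, rest = parse_expr(tokenize(text), mode)
    if rest:
        raise SyntaxError("trailing tokens")
    return tree


print(parse("[%foo %bar]", mode="value"))
print(parse("[%foo %bar]", mode="pattern"))
print(parse(",[%foo %bar]", mode="value"))
```

The same text produces a different tree depending on the mode the parser is already in, and the `,` prefix is the only way to change that mode from inside an expression.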

### Why we're switching back

Because "syntactic sugar" exists for a reason.  Humans are Hoon's
core target market.  Humans ship with great syntax hardware.  We
need to use this hardware, not ignore it.

Linguistic complexity does not even correlate with abstract
intelligence.  Finnish is insanely complicated, while Spanish is
quite simple; learning Spanish is easier than learning Finnish;
Finns do not think harder than Spaniards when they talk.  Once
you learn a complex syntax, you don't need to strain your
forebrain to *use* it.  Syntax seems to get compiled into some
kind of weird wetware FPGA with little cognitive load.  This is
not true for, say, category theory.

On first exposure, people often say Hoon looks like Perl.  Perl
is also into squeezing every last drop out of ASCII.  Perl and
Hoon both have intricate and complex syntaxes, although Hoon's
complexity consists of irregular variations on a highly regular
core.  But Perl also has very complex language semantics (in
either Perl 5 or 6) -- whereas 2500 lines of code compile Hoon to
Nock, with its one-page spec.

Humans are also an aesthetic species.  "Syntactic sugar" means
adding complexity to your syntax, in exchange for making your
programs easier and more pleasant to read and/or write.  Because
humans have great syntax engines, this is often a good trade.
Beauty matters.  Beauty is a user interface.

We want code to look good.  Nobody thinks Perl looks good.  Hoon,
once you get used to the rune syntax, looks quite regular.  But
it looks more regular if there is just one way to write a pair.

For those still attached to the 151 experiment, consider one
example.  In C, some expressions are "lvalues," whereas others
are "rvalues."  Semantically, these are different things -- an
lvalue (an expression that can be assigned to) has to resolve to
a pointer, not to a value.

So logically, instead of

    a = b;

our lvalue should produce an address, so we should write

    &a = b;

There is a subtle righteousness, no doubt, to this design.  But
it is not the right user experience, for C or Hoon.

### What 143 fixed

The goal of 143 is that, unless we're doing something wacky, we
never have to use the pattern-value toggling syntax.

Toggling is ugly for a reason.  It is good for wacky things to be
ugly.  Anyone reading the code should be alerted that it's doing
interesting and abnormal things -- either because it has to, or
because it's not good code.

In Hoon 164, there were two places we found ourselves toggling.
One, when defining a pattern as an arm, we had to write

    ++  foobar  ,[%foo %bar]

Two, when building models functionally, Hoon has no idea that a
value is actually a model being passed to a model builder:

    ++  foobar-list  (list ,[%foo %bar])

It was the sheer hideousness of `,[]` that seemed to make it
clear that two modes, one syntax, could not be the path of
righteousness.

But actually, the right path was just to fix these particular
corner cases.  The first problem is solved by a variant of `++`
that starts in pattern mode, `+=`:

    +=  foobar  [%foo %bar]

For the second problem, we observe that whenever we define a
pattern as a value that builds the model, we are either just
dereferencing a name or calling a function.  When we are calling
a function, its argument is almost always a tuple of models.

So we just copy the irregular `()` call syntax, and the regular
`%-` (unary), `%+` (binary), `%^` (ternary), and `%~` (n-ary)
forms, into the pattern syntax.  When we write

    ++  foobar-list  (list [%foo %bar])
    ++  foobar-map   (map %moo (list [%foo %bar]))

`list` and `map` are parsed as values;  `%moo` and `[%foo %bar]`
as patterns.
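
Functionally, a model builder like `list` is just a function from models to models. Here is a hedged Python sketch of that idea. The names `list_of` and `foobar` are hypothetical, nouns are ints or 2-tuples, and lists are assumed to be right-nested cells ending in the atom 0.

```python
FOO = int.from_bytes(b"foo", "little")
BAR = int.from_bytes(b"bar", "little")


def foobar(noun=None):
    """Model for the pattern [%foo %bar]: always produces that cell."""
    return (FOO, BAR)


def list_of(item_model):
    """Model builder: given a model for items, make a model for lists of them."""
    def model(noun=None):
        items = []
        # Walk the right-nested cells until we reach something that is not a cell.
        while isinstance(noun, tuple) and len(noun) == 2:
            items.append(item_model(noun[0]))
            noun = noun[1]
        result = 0  # the empty list; any stray terminator is normalized away
        for item in reversed(items):
            result = (item, result)
        return result
    return model


# `list_of` is an ordinary value (a function); `foobar` is the model for its argument.
foobar_list = list_of(foobar)

print(foobar_list((1, (2, 0))) == ((FOO, BAR), ((FOO, BAR), 0)))
print(foobar_list(5) == 0)
```

This mirrors the parsing rule: the head of the call is an ordinary value, while each argument is read as a pattern and turned into its model before the builder sees it.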

The toggles are still there if you need them.  For instance, in
151 you would write

    |=  a/*
    ({$foo $bar} a)

and the pattern is obviously a function.  In 143, you write:

    |=  a=*
    (,[%foo %bar] a)

Which looks weird.  It is also a weird thing to do, though.
So it should look weird.