# Syntax reform

There's a very basic problem in Hoon: sometimes a Hoon expression
represents a *value*, and sometimes it represents a *pattern*.

Consider the expression `[%foo %bar]`. Sometimes we want to
produce the literal value `[%foo %bar]`, the same noun as the
value `[0x6f.6f66 0x72.6162]`.
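A symbol is just an atom that holds its ASCII text, with the first character in the least significant byte. The sketch below is a quick sanity check of the two atoms above. Python serves only as a neutral calculator here, and `text_atom` is a hypothetical helper name, not a Hoon or library function.

```python
def text_atom(text: str) -> int:
    """Pack ASCII text into one integer, first character in the lowest byte."""
    return int.from_bytes(text.encode("ascii"), "little")

foo = text_atom("foo")
bar = text_atom("bar")
print(hex(foo), hex(bar))  # 0x6f6f66 0x726162

# Read as a value, [%foo %bar] is just the pair of these two atoms.
foobar = (foo, bar)
```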
But sometimes we're using a pattern-matching rune, like `?-`, to
see if some fragment of the subject matches this pattern.

We might also be using a pattern, then creating a default value
of that pattern -- "bunting" it -- as in the argument of a `|=`
function: a default value that is replaced by the caller's data.

Or we might be using a pattern because we actually want to make a
model -- a function whose argument is an arbitrary noun, and
whose product matches the pattern. For example, we might want to
normalize and type raw untrusted data from the network.
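Here is a rough Python analogy (not Hoon) that makes "model" concrete. It assumes nouns are encoded as non-negative `int`s (atoms) and 2-tuples (cells). The names `atom_model` and `bunt` are hypothetical. Defining the bunt as the model applied to the atom 0 is an assumption made only for illustration.

```python
from typing import Union

Noun = Union[int, tuple]  # assumed encoding: atom = non-negative int, cell = 2-tuple

def is_atom(noun: Noun) -> bool:
    return isinstance(noun, int) and not isinstance(noun, bool) and noun >= 0

def atom_model(noun: Noun) -> int:
    """Model for 'any atom': atoms pass through unchanged, anything else becomes 0."""
    return noun if is_atom(noun) else 0

def bunt(model):
    """Default value of a pattern (assumed here: its model applied to the atom 0)."""
    return model(0)

# Untrusted input, e.g. decoded from the network, is normalized rather than trusted.
untrusted = [42, (1, 2), -5]
print([atom_model(n) for n in untrusted])  # [42, 0, 0]
print(bunt(atom_model))                    # 0
```

Whatever goes in, the product is always an atom. That guarantee is what lets a model type data it has never seen before.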
Making sense of all this complexity has turned out to be the
hardest language-design problem in Hoon. It can't be solved
perfectly, but Hoon 143 seems about as good as we can get.
### Two modes or two syntaxes

When parsing Hoon source, how do we distinguish between pattern
expressions and value expressions?

The easiest way to distinguish pattern expressions from value
expressions is just to make patterns and values orthogonal. If
the syntax of pattern expressions doesn't conflict with the
syntax of value expressions, there is no problem, and no need for
a context-specific syntax mode.

Also, if an expression means the same thing whether it is a value
or a pattern, there is no problem. Any pattern expression can be
used as a value expression; the value will be the model
(normalizing function) for the pattern. Any value can be used as
a pattern; the value is presumed to build a model.

In all the regular rune forms, one of these two cases applies.
But in irregular forms, they conflict. A pair of models is not a
model of a pair -- it's not even a function at all.
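The same Python analogy shows the conflict (atoms as `int`s, cells as 2-tuples, models as functions; all names are hypothetical). Read as a value, `[m m]` is a cell holding two functions, and a cell cannot be called. Read as a pattern, it has to become a single function that normalizes a cell, which is what the hypothetical combinator `cell_model` builds. Its fallback for non-cell input is an assumption of this sketch.

```python
def is_atom(noun) -> bool:
    return isinstance(noun, int) and not isinstance(noun, bool) and noun >= 0

def atom_model(noun):
    """Normalize any noun to an atom (non-atoms become 0)."""
    return noun if is_atom(noun) else 0

def cell_model(head, tail):
    """Model of a pair, built from one model per side (hypothetical combinator)."""
    def model(noun):
        if isinstance(noun, tuple) and len(noun) == 2:
            return (head(noun[0]), tail(noun[1]))
        return (head(0), tail(0))  # assumed fallback: a pair of defaults
    return model

pair_of_models = (atom_model, atom_model)           # `[m m]` read as a value
model_of_pair = cell_model(atom_model, atom_model)  # `[m m]` read as a pattern

print(callable(pair_of_models), callable(model_of_pair))  # False True
print(model_of_pair((7, (1, 2))))                         # (7, 0)
```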
We can solve the pattern/value problem in two ways: two syntaxes
and one mode (as in Hoon 151), or two modes and one syntax (as in
164 and 143).
### 151: two syntaxes and one mode

When we choose *two syntaxes and one mode*, every Hoon expression
is parsed in the same way, without context. The pattern and
value cases must be either identical or orthogonal.

So for a pattern which is a symbol, we say `$foo` rather than
`%foo`. `%foo` is the atom `0x6f.6f66`, whereas `$foo` is a
function which ignores its arbitrary input and always produces
`%foo`:

    |=  *
    %foo

For a pattern which is a pair, we use curly braces rather than
square brackets -- `{$foo $bar}`, a function which ignores its
input and always produces `[%foo %bar]`:

    |=  *
    [%foo %bar]

A (rather pointless) function whose argument must be the cell
`[%foo %bar]`, and whose product is the same cell, looks like:

    |=  {$foo $bar}
    [%foo %bar]
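Under the same kind of Python analogy, the 151 reading is easy to spell out. `$foo` and `{$foo $bar}` are constant functions, and `|=` runs its pattern over whatever the caller passes in. `const_model` and `gate` are hypothetical helpers invented for this sketch. Atoms are Python `int`s built from the symbol's bytes, least significant byte first.

```python
FOO = int.from_bytes(b"foo", "little")  # the atom %foo, 0x6f.6f66
BAR = int.from_bytes(b"bar", "little")  # the atom %bar, 0x72.6162

def const_model(value):
    """A model that ignores its (arbitrary) input and always produces `value`."""
    return lambda _noun: value

def gate(sample_model, body):
    """`|= pattern body`: normalize the caller's argument, then run the body on it."""
    return lambda argument: body(sample_model(argument))

foo_pattern = const_model(FOO)            # 151: $foo
foobar_pattern = const_model((FOO, BAR))  # 151: {$foo $bar}

# The rather pointless gate from the text: argument and product are [%foo %bar].
pointless = gate(foobar_pattern, lambda _sample: (FOO, BAR))

print(foo_pattern("anything") == FOO)       # True
print(foobar_pattern(123) == (FOO, BAR))    # True
print(pointless((FOO, BAR)) == (FOO, BAR))  # True
```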
This is the Hoon 151 style. The modeless syntax fits the Urbit
aesthetic of mechanical simplicity. There is no "pattern mode"
or "value mode" -- just the expression `{$foo $bar}`, which is
very different from `[%foo %bar]`. `{$foo $bar}` is a function
which always produces `[%foo %bar]`. So it's a different syntax
for a different noun -- very straightforward.

Suppose we love this pair so much, we want to give it a name:

    ++  foobar  {$foo $bar}
    |=  foobar
    [%foo %bar]

There is nothing terribly objectionable about this design. But
it seemed possible to improve on it. It has problems.

Here is one small complication: errors. If the programmer puts a
pattern where a value is expected, that's not an error at all.
We just produce a value which is the model for the pattern (the
normalizing function).

But a value where a pattern is expected should be a syntax error.
Suppose we are pattern-matching with `?-`, and we accidentally
write the value `%foo`, not the pattern `$foo`. We must assume
that this value expression produces a normalizing function. This
is a totally legitimate way to state a pattern.

But `%foo` is an atom, not a function at all. So the programmer,
who should see a clear syntax error at a logical position, gets
an error like `find-limb: $.+.2`. This is confusing. And it's
just annoying, because it's intuitively clear what the programmer
meant. A language shouldn't have this kind of syntactic slack.
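The shape of this failure can be mimicked in Python. In the toy `?-` below, the `switch` function and its matching rule are assumptions of the sketch. The rule is that a noun fits a pattern when the pattern's model leaves it unchanged. If you pass the atom itself where a model belongs, nothing complains at that point. The call only fails later, inside the matcher, with a `TypeError` about calling an `int`. That is the Python cousin of `find-limb: $.+.2`.

```python
FOO = int.from_bytes(b"foo", "little")  # the atom %foo
BAR = int.from_bytes(b"bar", "little")  # the atom %bar

def const_model(value):
    """151-style `$foo`: ignore the input, always produce `value`."""
    return lambda _noun: value

def switch(subject, cases):
    """Toy `?-`: return the result of the first case whose pattern fits `subject`.

    Assumed matching rule: a noun fits a pattern when the pattern's model
    leaves it unchanged.
    """
    for pattern, result in cases:
        if pattern(subject) == subject:
            return result
    raise ValueError("no case matched")

cases = [(const_model(FOO), "got foo"), (const_model(BAR), "got bar")]
print(switch(BAR, cases))  # got bar

# The slip: the value %foo written where the pattern $foo belongs.
try:
    switch(FOO, [(FOO, "got foo")])
except TypeError as err:
    # Reported from inside switch(), far from the case that was mistyped.
    print("TypeError:", err)
```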
This unnecessary source of errors is not a major footgun. You
learn fairly quickly not to trigger the footgun; it is very hard
for it to produce anything but a compiler error; and writing
`%foo` instead of `$foo` becomes about as common as dropping your
semicolons in C.

But it's just not right. It does not make Hoon feel like a
quality product. It is especially annoying to newbies, for whom
we must have the utmost sympathy.
### Two modes and one syntax

But in Hoon 143, we're going back to an improved version of the
old way we did it in Hoon 164: two modes and one syntax. So,
the same function as above:

    |=  [%foo %bar]
    [%foo %bar]

In this design, the language for patterns and the language for
values are logically separate. At every node in the tree, we know
what mode we're in.

We know, for instance, that `|=` is followed by two expressions.
The first is a pattern, the second a value. When we parse
`[%foo %bar]` in pattern mode in Hoon 143, it produces the same
expression that `{$foo $bar}` does in 151.

Hoon 143 also has a clear, though ugly, syntax for toggling
modes: the irregular prefix `,`, or the regular rune `$;`.
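Mechanically, "two modes, one syntax" means the parser passes a mode down the tree. In this hypothetical Python sketch, the same tokens `[%foo %bar]` produce a value cell or a cell pattern depending on the mode the parent passes down. `|=` parses its first child as a pattern and its second as a value, and `,` switches the mode of the expression after it. The sketch has several limits. It models only symbols, brackets, and `,`, and the node names are invented. `$;` is left out. Treating `,` as flipping whichever mode is current is an assumption.

```python
import re

TOKEN = re.compile(r"\s*(\[|\]|,|%[a-z]+)")

def tokenize(src: str) -> list:
    src, pos, tokens = src.strip(), 0, []
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at {pos}: {src[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens: list, mode: str):
    """Parse one expression from `tokens`; `mode` is "value" or "pattern"."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == ",":
        # Assumed toggle semantics: `,` flips the mode for the next expression.
        return parse(tokens, "pattern" if mode == "value" else "value")
    if tok.startswith("%"):
        # Same text, different meaning: an atom in value mode, a constant pattern otherwise.
        return ("atom" if mode == "value" else "const", tok[1:])
    if tok == "[":
        items = []
        while tokens and tokens[0] != "]":
            items.append(parse(tokens, mode))  # children inherit the mode
        if not tokens:
            raise SyntaxError("missing ]")
        tokens.pop(0)
        return ("cell" if mode == "value" else "cell-pattern", *items)
    raise SyntaxError(f"unexpected token {tok!r}")

def parse_gate(src: str):
    """`|=` has two children: a pattern (the sample), then a value (the body)."""
    tokens = tokenize(src)
    sample = parse(tokens, "pattern")
    body = parse(tokens, "value")
    if tokens:
        raise SyntaxError(f"trailing tokens: {tokens}")
    return ("gate", sample, body)

print(parse_gate("[%foo %bar] [%foo %bar]"))
print(parse(tokenize(",[%foo %bar]"), "value"))
```

The mode comes from the parent, so the `[` branch never has to guess what a cell means. The only visible mode change in the source is the explicit `,`.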
### Why we're switching back

Because "syntactic sugar" exists for a reason. Humans are Hoon's
core target market. Humans ship with great syntax hardware. We
need to use this hardware, not ignore it.

Linguistic complexity does not even correlate with abstract
intelligence. Finnish is insanely complicated, while Spanish is
quite simple; learning Spanish is easier than learning Finnish;
Finns do not think harder than Spaniards when they talk. Once
you learn a complex syntax, you don't need to strain your
forebrain to *use* it. Syntax seems to get compiled into some
kind of weird wetware FPGA with little cognitive load. This is
not true for, say, category theory.

On first exposure, people often say Hoon looks like Perl. Perl
is also into squeezing every last drop out of ASCII. Perl and
Hoon both have intricate and complex syntaxes, although Hoon's
complexity consists of irregular variations on a highly regular
core. But Perl also has very complex language semantics (in
either Perl 5 or 6) -- whereas 2500 lines of code compile Hoon to
Nock, with its one-page spec.

Humans are also an aesthetic species. "Syntactic sugar" means
adding complexity to your syntax, in exchange for making your
programs easier and more pleasant to read and/or write. Because
humans have great syntax engines, this is often a good trade.
Beauty matters. Beauty is a user interface.

We want code to look good. Nobody thinks Perl looks good. Hoon,
once you get used to the rune syntax, looks quite regular. But
it looks more regular if there is just one way to write a pair.

For those still attached to the 151 experiment, consider one
example. In C, some expressions are "lvalues," whereas others
are "rvalues." Semantically, these are different things -- an
lvalue (an expression that can be assigned to) has to resolve to
an address, not to a value.

So logically, instead of

    a = b;

our lvalue should produce an address, so we should write

    &a = b;

There is a subtle righteousness, no doubt, to this design. But
it is not the right user experience, for C or Hoon.
### What 143 fixed

The goal of 143 is that, unless we're doing something wacky, we
never have to use the pattern-value toggling syntax.

Toggling is ugly for a reason. It is good for wacky things to be
ugly. Anyone reading the code should be alerted that it's doing
interesting and abnormal things -- either because it has to, or
because it's not good code.

In Hoon 164, there were two places where we found ourselves
toggling. One, when defining a pattern as an arm, we had to write

    ++  foobar  ,[%foo %bar]

Two, when building models functionally, Hoon has no idea that a
value is actually a model being passed to a model builder:

    ++  foobar-list  (list ,[%foo %bar])

It was the sheer hideousness of `,[]` that seemed to make it
clear that two modes, one syntax, could not be the path of
righteousness.

But actually, the right path was just to fix these particular
corner cases. The first problem is solved by a variant of `++`
that starts in pattern mode, `+=`:

    +=  foobar  [%foo %bar]

For the second problem, we observe that whenever we define a
pattern as a value that builds the model, we are either just
dereferencing a name or calling a function. When we are calling
a function, its argument is almost always a tuple of models.
So we just copy the irregular `()` call syntax, and the regular
`%-` (unary), `%+` (binary), `%^` (ternary), and `%~` (n-ary)
forms, into the pattern syntax. When we write

    +=  foobar-list  (list [%foo %bar])
    +=  foobar-map  (map %moo (list [%foo %bar]))

`list` and `map` are parsed as values; `%moo` and `[%foo %bar]`
as patterns.
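Both fixes fit into the same style of toy parser. In this hypothetical Python sketch, an arm written with `++` parses its body in value mode, and one written with `+=` parses it in pattern mode. Inside pattern mode, a `(...)` call parses its head as a value (a name to dereference) and its arguments as patterns. Only the irregular `()` form is modeled, not `%-`, `%+`, `%^`, or `%~`. The node names are invented for illustration.

```python
import re

# Tokens: the two arm runes, brackets, parens, %symbols, and names.
TOKEN = re.compile(r"\s*(\+\+|\+=|\[|\]|\(|\)|%[a-z]+|[a-z][a-z0-9-]*)")

def tokenize(src: str) -> list:
    src, pos, tokens = src.strip(), 0, []
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at {pos}: {src[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse_until(tokens: list, close: str, mode: str) -> list:
    """Parse expressions in `mode` until the closing token, then consume it."""
    items = []
    while tokens and tokens[0] != close:
        items.append(parse(tokens, mode))
    if not tokens:
        raise SyntaxError(f"missing {close}")
    tokens.pop(0)
    return items

def parse(tokens: list, mode: str):
    """Parse one expression; `mode` is "value" or "pattern"."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    tok = tokens.pop(0)
    if tok.startswith("%"):
        return ("atom" if mode == "value" else "const", tok[1:])
    if tok == "[":
        return ("cell" if mode == "value" else "cell-pattern",
                *parse_until(tokens, "]", mode))
    if tok == "(":
        head = parse(tokens, "value")          # the function is always a value
        args = parse_until(tokens, ")", mode)  # arguments stay in the current mode
        return ("call", head, *args)
    if tok not in ("++", "+=", "]", ")"):
        return ("name", tok)                   # dereference a name
    raise SyntaxError(f"unexpected token {tok!r}")

def parse_arm(src: str):
    """`++ name body` parses the body as a value; `+= name body` as a pattern."""
    tokens = tokenize(src)
    if len(tokens) < 3 or tokens[0] not in ("++", "+="):
        raise SyntaxError("expected `++ name body` or `+= name body`")
    rune, name = tokens.pop(0), tokens.pop(0)
    body = parse(tokens, "value" if rune == "++" else "pattern")
    if tokens:
        raise SyntaxError(f"trailing tokens: {tokens}")
    return ("arm", name, body)

print(parse_arm("+= foobar [%foo %bar]"))
print(parse_arm("+= foobar-map (map %moo (list [%foo %bar]))"))
```

In the second example, `map` and `list` come out as `name` nodes, while `%moo` and `[%foo %bar]` come out as patterns, with no `,` needed anywhere.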
The toggles are still there if you need them. For instance, in
151 you would write

    |=  a/*
    ({$foo $bar} a)

and the pattern is obviously a function. In 143, you write:

    |=  a=*
    (,[%foo %bar] a)

Which looks weird. It is also a weird thing to do, though.
So it should look weird.