@lunacookies
Last active May 7, 2020 01:37
Some Syntax Ideas

What is this?

I’m working on a new language (Fjord) for a shell (fj). Although I have some ideas of my own for syntax, I’m not sure if they’re a really bad idea or if they’re fine, so I’ve decided to conduct a ‘sanity check’ of sorts by writing some preliminary ideas down here. Please respond down in the comments with any thoughts you have!

General philosophy

Note: throughout this document I’ll refer to functions, which are what I’m calling commands.

As this is a language for a shell,

  • brevity is extremely highly valued (the more common something is, the easier it should be to type)
  • function calls are more common than anything else
  • string interpolation is also pretty common

Grouping

Expressions can be grouped so that they are evaluated first by wrapping them in parentheses:

(1 + 2) * 5

Function calls

Inspired by languages like Haskell and ML, I think that function calls should use simple juxtaposition:

add 1 2

This greatly reduces typing, which is important, as function calls are the most common operation in a shell. Imagine instead of typing ls /path/to/dir1 /path/to/dir2 you had to use the traditional syntax to call a function, and had to type ls(/path/to/dir1, /path/to/dir2)!

Variable usages

Those same languages from before, Haskell and ML, don’t differentiate syntactically between a variable and a function call without parameters. I would like to avoid this for a number of reasons:

  • it makes implementation a bit more complex
  • syntax highlighting that differentiates between functions and variables becomes extremely complex, if not impossible, to do accurately
  • using a variable has a much lower potential cost than calling a function – it is easier to see parts of the program that might be slow if they are different

Since functions are more often used than variables in the context of a shell, I decided to add syntax beyond just writing their name to variables, rather than functions.

The obvious choice is to prefix variable names with $, as this is used by all kinds of languages for one purpose or another: PHP, Perl, every shell I’ve ever seen, Swift (state properties), Rust (macro_rules!), the list goes on. However, after seeing that the Rust crate quote uses # to interpolate variables, I realised:

  • # is probably a better choice than $ since it’s easier to type
  • I can use any syntax I want – the choice isn’t so obvious

After some experimentation, I think that prefixing variable names with . is the ‘best’ choice, without looking too out of place. What do you think?
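To make the trade-off concrete, here is a minimal sketch (in Python, since Fjord does not exist yet) of how a lexer could tell variables from function names when variables carry a `.` prefix. The function names and the three token kinds are hypothetical, chosen only for illustration.

```python
def classify(token):
    """Classify one whitespace-separated token (hypothetical rules)."""
    # A dot followed by an identifier is a variable usage, e.g. .name
    if token.startswith(".") and len(token) > 1 and token[1:].isidentifier():
        return ("variable", token[1:])
    # A bare identifier is a function (command) name, e.g. ls
    if token.isidentifier():
        return ("function", token)
    # Everything else is treated as a literal in this sketch
    return ("literal", token)


def classify_line(line):
    """Classify every token on a line, splitting on whitespace only."""
    return [classify(token) for token in line.split()]
```

Because the decision depends only on the first character, a syntax highlighter could colour variables and functions differently without knowing what is defined in the current scope.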

Case conventions

How to separate words in different case conventions:

  • snake_case: hold shift, press the hyphen/underscore key, let go of shift before you start typing the next word
  • kebab-case: press the hyphen/underscore key
  • camelCase: hold shift, type the first character of the next word, let go before you type the next character

The final shell will hopefully have tab-completion that supports case-insensitivity, so here is what that list would look like if case isn’t a consideration:

  • snake_case: hold shift, press the hyphen/underscore key, let go of shift before you start typing the next word
  • kebab-case: press the hyphen/underscore key
  • camelCase: nothing

Of course, the tab completion could also intelligently convert hyphens to underscores and vice-versa:

  • snake_case: press the hyphen/underscore key
  • kebab-case: press the hyphen/underscore key
  • camelCase: nothing

Camel case is still the easiest to type choice, so that’s what I’ve decided on. What’s your opinion?
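A rough sketch of the kind of tab-completion matching described above, written in Python for illustration. It assumes a slightly stronger rule than converting hyphens to underscores: separators are dropped entirely and case is folded, so all three conventions compare equal. `normalise` and `complete` are hypothetical names.

```python
def normalise(name):
    """Fold case and drop word separators so all conventions compare equal."""
    return name.replace("-", "").replace("_", "").lower()


def complete(typed, candidates):
    """Return the candidates whose normalised form starts with the typed text."""
    key = normalise(typed)
    return [c for c in candidates if normalise(c).startswith(key)]
```

With this rule, typing `get-user`, `get_user` or `getuser` would all offer `getUserName` as a completion.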

Variable definitions

This one is pretty obvious to me:

let name = value

But I guess the equals sign is implied …

let name value

Do you even need let?

name value

Now this looks exactly like a function call. Maybe it’s better with just the equals sign?

name = value

This is the most concise choice, and is also familiar to users of Haskell, Python and Ruby (and probably others). Or is there a better option I haven’t considered?

Function definitions

Most languages have a separate syntax for defining functions and variables:

// JavaScript

function name(param) {
    body
}

var variable = value; // ‘var’ could also be ‘let’, or ‘const’ to declare a constant

// Rust

fn name(param: SomeType) -> AnotherType {
    body
}

let variable = value;

Lots of modern languages have support for lambdas, closures, function literals, anonymous functions, whatever you call them. This leads to a duplication of the ways to define a function:

// Rust

// using normal function declaration syntax
fn name(param: SomeType) -> AnotherType {
    body
}

// using a closure
let name = |param| {
    body
};

// JavaScript

function name(param) {
    body
}

// using an arrow function
let name = (param) => {
    body
};

# Python

def name(param):
    body

# using a lambda
name = lambda param : body

Although the ‘anonymous function form’ of all these examples is usually limited in some way compared to normal functions that would stop you from using them for everything, I still found this a little annoying.

Haskell uses a very similar syntax for function and variable definitions:

varName = value
functionName param = body

Haskell, however, has the same duplication from before, with a lambda syntax of its own:

name param = body

-- using an anonymous function
name = (\param -> body)

I really like the look and brevity of Haskell’s normal function definitions, but want to avoid the ‘duplication’ with anonymous functions. Initially something like this springs to mind:

name = param {
    body
}

But I plan on adding support for block expressions, which would mean that the syntax above would be confused with a function call with the name param and a parameter whose value is whatever body evaluates to.

So maybe something like this?

name = |param| {
    body
}

It is easier to type almost any character other than |, though:

name = fn param {
    body
}

Maybe it shouldn’t be required that function definitions use a block expression though?

name = fn param body

Some kind of separation would be nice:

name = fn param -> body

Although the arrow looks very nice, it would be easier to type if : is used instead:

name = fn param : body

Now we don’t need the fn any more:

name = param : body

Here’s what this syntax looks like with multiple parameters:

name = param1 param2 : body

Those block expressions can of course be used for the body of a function:

name = param1 param2 : {
    do
    lots
    of
    stuff
}

What do you think function definitions should look like? Do you think that the duplication of syntaxes for normal and anonymous functions is needed for some reason?
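One way to see what the single `name = params : body` form buys is to sketch a parser for it. The Python below is only an illustration under a hypothetical grammar: it splits on the first `=` and the first `:`, so it would misread a plain value that itself contains a colon, and it leaves the body as unparsed text.

```python
def parse_definition(source):
    """Split 'name = params : body' into its parts (hypothetical grammar)."""
    name, _, rest = source.partition("=")
    name = name.strip()
    if not name.isidentifier():
        raise ValueError(f"invalid name: {name!r}")
    params, colon, body = rest.partition(":")
    if not colon:
        # No colon at all: a plain variable definition, 'name = value'.
        return {"name": name, "params": [], "body": rest.strip()}
    # With a colon: everything before it is the parameter list.
    return {"name": name, "params": params.split(), "body": body.strip()}
```

Variables and functions go through the same code path here; a variable is simply a definition with an empty parameter list.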

Strings

Simple enough: use double quotes. I’ve decided against single quotes because too many strings contain single quotes themselves, which would all require escaping.

Personally, I don’t like how some languages give you the choice of quotes, because it leads to inconsistency.

String interpolation

I think string interpolation deserves a special syntax for the string itself, like Python does in its f-strings:

"Hello, Sarah!"   # literal
f"Hello, {name}!" # interpolation

I like how it only takes one extra character to create an f-string, so this is something I hope to copy for Fjord. Besides, string interpolation is one of the most common tasks in a shell. This is in contrast to Ruby, where strings that have interpolations don’t get any differentiation from literals:

"Hello, Sarah!"   # literal
"Hello, #{name}!" # interpolation

Swift has the same ‘problem’:

"Hello, Sarah!"   // literal
"Hello, \(name)!" // interpolation

Here is what string interpolations could look like for Fjord, using the variable usage syntax from before:

f"Hello, .name!"

But this doesn’t allow for arbitrary expressions like all the syntaxes above do, only variables, so some kind of delimiter around the interpolation is needed. After playing around with it for a bit, I came to the conclusion that the curly-brace syntax that Python uses is my favourite.

f"Hello, {.name}!"

This is two extra characters compared to a traditional shell, such as bash in this example:

"Hello, $name!"

This is a little misleading, however, as that syntax can’t be used for interpolating any arbitrary expression like Fjord’s can:

f"Hello, {getUserName}!"

The syntax to do the same thing in bash takes the same number of characters:

"Hello, $(getusername)!" # I wrote it in lowercase because camel case command names look wrong

Do you think that strings containing interpolations should get a different syntax to literals? Do you think that you should be able to interpolate any expression, or is being able to interpolate just variables enough? Are you a fan of Python’s f-string syntax that I nicked for Fjord?
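To show what the curly-brace form asks of an implementation, here is a small Python sketch that splits the contents of an interpolated string into literal and expression parts. It assumes braces cannot be nested or escaped, and the `render` helper only handles `.name` variable lookups; both functions are hypothetical.

```python
def split_interpolation(text):
    """Split f-string contents into ('lit', s) and ('expr', s) parts."""
    parts = []
    pos = 0
    while pos < len(text):
        start = text.find("{", pos)
        if start == -1:
            # No more interpolations: the rest is literal text.
            parts.append(("lit", text[pos:]))
            break
        end = text.find("}", start)
        if end == -1:
            raise ValueError("unterminated interpolation")
        if start > pos:
            parts.append(("lit", text[pos:start]))
        parts.append(("expr", text[start + 1:end]))
        pos = end + 1
    return parts


def render(text, variables):
    """Evaluate the parts, supporting only '.name' variable lookups."""
    out = []
    for kind, value in split_interpolation(text):
        if kind == "lit":
            out.append(value)
        elif value.startswith("."):
            out.append(str(variables[value[1:]]))
        else:
            raise NotImplementedError("only variable lookups in this sketch")
    return "".join(out)
```

Supporting arbitrary expressions would mean handing each `expr` part to the full expression parser instead of the single `.name` case handled here.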

Options and named function parameters

A convention has arisen over the decades for passing options to commands:

$ command -o # short option name
$ command --option # long option name
$ command --speed=25 # option with value
$ command --speed 25 # most commands support using a space instead of =
$ command --flag --speed 25 "positional arguments follow options"

This isn’t set in stone, so sometimes I’m caught off guard by a command that doesn’t completely follow the convention:

$ find . -name '*foo*' # for some reason find uses a single dash for option names
$ command --speed=25 --speed 25 # some commands accept only one of these forms
$ ls /path/to/directory --long # the GNU utilities support putting options after
                               # positional arguments, while most commands don’t

I’ve realised that the whole ‘options’ convention that has appeared over time bears a striking resemblance to the handling of function parameters in some languages. Apart from being able to pass an option without a value (this can be viewed as equivalent to setting it to true), options are exactly like named parameters which can have default values.

As my main programming language is Rust (which doesn’t have named or default parameters), I’m not really familiar with these concepts. Python’s approach to named parameters and default parameter values seems very reasonable, so maybe Fjord could imitate it:

downloadUrl = url timeout=5 httpVersion=1.1 : doTheThing

These are all equivalent:

downloadUrl "https://google.com"
downloadUrl url="https://google.com" 5
downloadUrl timeout=5 url="https://google.com"

I’m not so sure about that = without any space around it – it kind of irks me. Here’s what it looks like with Swift-/Ruby-style colons:

downloadUrl = url timeout: 5 httpVersion: 1.1 : doTheThing
downloadUrl "https://google.com"
downloadUrl url: "https://google.com" 5
downloadUrl timeout: 5 url: "https://google.com"

It looks a little strange to me without commas separating the arguments, so I think I prefer Python’s style for now.

This still doesn’t take into account how the command option convention has short option names and how if you don’t give an option a value it’s equivalent to setting its value to true. If we ignore those two features, here’s what a call to ls that uses a few different parameters could look like:

ls all=true long=true color=never /path/to/dir1 /path/to/dir2

Here’s what that looks like in a traditional shell:

$ ls -al --color=never /path/to/dir1 /path/to/dir2

# With long option names
$ ls --all --long --color=never /path/to/dir1 /path/to/dir2

Much, much cleaner. I’m not really sure how to integrate short named parameters and if-you-pass-a-named-parameter-without-a-value-it’s-a-boolean-true, so if you have any ideas, I’d appreciate it.
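The binding rule implied by the downloadUrl examples can be sketched as follows. This is Python standing in for the shell’s internals, and the rule is an assumption: named arguments are bound first, then positional values fill the remaining parameters in declaration order (which, unlike Python itself, allows a positional argument after a named one). `REQUIRED` and `bind` are hypothetical names, and named arguments are modelled as `(name, value)` tuples.

```python
REQUIRED = object()  # sentinel: the parameter has no default value


def bind(params, args):
    """Bind call arguments to parameters.

    params: dict of parameter name -> default (or REQUIRED), in declaration order.
    args: list where a (name, value) tuple is a named argument and
          anything else is a positional value.
    """
    bound = {}
    positional = []
    for arg in args:
        if isinstance(arg, tuple):
            name, value = arg
            if name not in params:
                raise TypeError(f"unknown parameter: {name}")
            if name in bound:
                raise TypeError(f"parameter given twice: {name}")
            bound[name] = value
        else:
            positional.append(arg)
    # Positional values fill the still-unbound parameters in declaration order.
    free = [name for name in params if name not in bound]
    if len(positional) > len(free):
        raise TypeError("too many positional arguments")
    bound.update(zip(free, positional))
    # Anything left over falls back to its default, if it has one.
    for name, default in params.items():
        if name not in bound:
            if default is REQUIRED:
                raise TypeError(f"missing argument: {name}")
            bound[name] = default
    return bound
```

Under this rule the three downloadUrl calls shown earlier all produce the same bindings.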

@liljencrantz

That’s an interesting design idea, I’ve never heard about that before. What would happen if you went to ‘execute’ (1 2 3 4)? Would it ‘return’ 1? Or would it throw an error that 1 doesn’t take any parameters?

If you do that in Lisp, Lisp will be cross with you. In crush, I've done things differently.

First of all, I added a command, val, that simply outputs its argument, so val 3 puts 3 in the stream. Secondly, if a list has exactly one element, and that element is not executable, then it gets implicitly converted into a call to val with that element. That means that in Crush, you can write my_variable := (find /); my_variable | head 1. The first command will create a long-running thread that outputs all files in your system, and it will stick the table_stream containing that output into the variable my_variable. The second command will send that stream as input to the head command, which will output the first line. Note that you can execute the head command like that many times, and each time you do, it will output the next line of the stream. Also note that the find thread will output a few lines until the channel becomes full and then block until somebody starts consuming the channel.

And what if you are just inspecting a variable? I assume that, if it is a REPL, you could use the following:

crush> var_name
value

Yup, that works because of the implicit val thing I mentioned above.

Or is every variable a ‘function’ that takes no parameters? My mind is kind of broken by this :)

I am old. I have stopped caring about which case convention people use, I just really wish they picked a consistent one and stuck with it.

I definitely agree that consistency is more important than anything else – I just thought that if I needed to pick one, I might as well pick one suited to the use case. What do you think of making the use of a case convention other than ‘the chosen one’ a warning?

Makes sense. I wish compilers did that in general.

To catch bugs from typos fast. Exactly like in Rust, where you have to declare a variable using let before you're allowed to reassign it using =.

How does having a separate syntax for declaration and assignment prevent typos? Is it that it catches a situation in which you meant to declare a variable earlier, but forgot to, so try to assign without declaration?

It protects you against things like

crush> fooo := X
crush> foo = Y # Typo, I meant fooo here!

But even more importantly, it protects you against

crush> fooo := X
# 10000 lines of unrelated code
crush> fooo := Z # This now becomes a new variable, so we will not accidentally clobber the other fooo. Thanks, explicit declarations!

It makes sense to me in Rust, though, because the whole mutability/immutability thing, as well as the type, has to be decided at the binding’s entry point.

If that was the only reason, we'd be able to use foo = bar in situations where you don't need mutability, and we could have the syntax of mut foo = bar in the much rarer cases where mutability is needed.

I think long/short options and prefixing options with -/-- are to be considered workarounds, rather than good designs. I could imagine adding prefix matching on option names, like in GNU getopt_long, so if you only have one argument that begins with the letter 'a', it would be enough to say a=fnurple instead of add-humongous-cow=fnurple, but in general, I think that emulating getopt-style argument passing is a bad idea.

I think that whole ‘unambiguous prefix’ thing is a recipe for unmaintainable code, so I’ll be avoiding that!

getopt-long has had that feature since forever, and people seem to not be abusing it too badly. But maybe that's because they also have the option of using the short options, hard to say.
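For reference, the prefix-matching rule under discussion can be sketched in a few lines of Python. This is written in the spirit of getopt_long’s abbreviation handling rather than copied from it, and `resolve_option` is a hypothetical name.

```python
def resolve_option(prefix, option_names):
    """Resolve an abbreviated option name if exactly one option starts with it."""
    if prefix in option_names:
        return prefix  # an exact match always wins
    matches = [name for name in option_names if name.startswith(prefix)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise KeyError(f"unknown option: {prefix}")
    raise KeyError(f"ambiguous option {prefix!r}: could be {', '.join(sorted(matches))}")
```

The maintainability worry shows up directly in this sketch: a script that passes `a=fnurple` keeps working only until the command gains a second option beginning with `a`, at which point the same call becomes an ambiguity error.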

But maybe with tab completion a lack of short option names won’t be such a hindrance. What about options without values? Do you think it’s worth adding a shorthand for =true, simply because of how common it is?

What would you think of such a shorthand:

ls all=true long=true
ls all= long=

+1, I have been thinking of making --foo be an alias for foo=true which is exactly the same idea. I think your suggestion looks nicer, but I don't see a way to make it parseable. Because of the lack of separators between expressions, it's impossible to use the same operator both as an infix and postfix operator. That's also why Crush needs to use neg instead of - for negating a number: - is already used as an infix subtraction operator.

If you're planning on using . as a sigil for variables, how are you planning on talking about file names? Like, if I want to touch the file 'foo.lock' what do I write? How about wildcards? How do you do the equivalent of cat *.txt?

I plan on adding some kind of a special syntax for filenames, possibly using single quotes:

Interesting. Right now, crush allows you to do exactly that. 'foo' is a file named foo.

But also a bit inconvenient, no? Super-common shell operations turn into a chore, e.g. cd .. becomes cd '..', cat foo.txt becomes cat 'foo.txt'. Not a deal breaker, but definitely feels annoying enough to avoid for me.

Crush mostly solves this by also importing the content of the current directory into your namespace, so if you have a file named foo.txt in your current directory, that means that there is a variable named foo.txt in your namespace. Files in Crush support a / operator that works like the . operator in e.g. Rust, but does file lookup instead of regular member lookup.

@liljencrantz

liljencrantz commented Mar 31, 2020

@arzg
A while ago I was thinking about how I could possibly add that feature that all REPLs seem to have, where they go into a special mode when they are waiting for you to close something, e.g. a fi in bash or the end of an indentation level in Python. I realised that an easy way to add something similar to this is to create a keybind that adds a newline in the input itself, allowing the user to split their input into multiple lines. I was thinking that maybe ctrl-enter might be a nice option for this.

Try out fish. It detects if you have an unterminated block command and its editor goes into multiline mode. You can still move the cursor between lines and edit the whole command.

@lunacookies
Author

@liljencrantz

That’s an interesting design idea, I’ve never heard about that before. What would happen if you went to ‘execute’ (1 2 3 4)? Would it ‘return’ 1? Or would it throw an error that 1 doesn’t take any parameters?

If you do that in Lisp, Lisp will be cross with you. In crush, I've done things differently.

First of all, I added a command, val, that simply outputs its argument, so val 3 puts 3 in the stream. Secondly, if a list has exactly one element, and that element is not executable, then it gets implicitly converted into a call to val with that element. That means that in Crush, you can write my_variable := (find /); my_variable | head 1. The first command will create a long-running thread that outputs all files in your system, and it will stick the table_stream containing that output into the variable my_variable. The second command will send that stream as input to the head command, which will output the first line. Note that you can execute the head command like that many times, and each time you do, it will output the next line of the stream. Also note that the find thread will output a few lines until the channel becomes full and then block until somebody starts consuming the channel.

And what if you are just inspecting a variable? I assume that, if it is a REPL, you could use the following:

crush> var_name
value

Yup, that works because of the implicit val thing I mentioned above.

Do you think that this might be overcomplicating things? Because it sure seems very complex to me :)

I am old. I have stopped caring about which case convention people use, I just really wish they picked a consistent one and stuck with it.

I definitely agree that consistency is more important than anything else – I just thought that if I needed to pick one, I might as well pick one suited to the use case. What do you think of making the use of a case convention other than ‘the chosen one’ a warning?

Makes sense. I wish compilers did that in general.

At the moment Fjord doesn’t have a warnings system, so you have to use CAMEL CASE THE ONLY CASE CONVENTION or your program will error out :)

To catch bugs from typos fast. Exactly like in Rust, where you have to declare a variable using let before you're allowed to reassign it using =.

How does having a separate syntax for declaration and assignment prevent typos? Is it that it catches a situation in which you meant to declare a variable earlier, but forgot to, so try to assign without declaration?

It protects you against things like

crush> fooo := X
crush> foo = Y # Typo, I meant fooo here!

But even more importantly, it protects you against

crush> fooo := X
# 10000 lines of unrelated code
crush> fooo := Z # This now becomes a new variable, so we will not accidentally clobber the other fooo. Thanks, explicit declarations!

Is something like this an example?

let mut counter = 0;
for i in 0..100 {
    counter += 1;
    let counter = 0; // This doesn’t have any effect because it’s a new variable that disappears at the end of the scope
}

I think long/short options and prefixing options with -/-- are to be considered workarounds, rather than good designs. I could imagine adding prefix matching on option names, like in GNU getopt_long, so if you only have one argument that begins with the letter 'a', it would be enough to say a=fnurple instead of add-humongous-cow=fnurple, but in general, I think that emulating getopt-style argument passing is a bad idea.

I think that whole ‘unambiguous prefix’ thing is a recipe for unmaintainable code, so I’ll be avoiding that!

getopt-long has had that feature since forever, and people seem to not be abusing it too badly. But maybe that's because they also have the option of using the short options, hard to say.

But maybe with tab completion a lack of short option names won’t be such a hindrance. What about options without values? Do you think it’s worth adding a shorthand for =true, simply because of how common it is?

What would you think of such a shorthand:

ls all=true long=true
ls all= long=

+1, I have been thinking of making --foo be an alias for foo=true which is exactly the same idea.

I was thinking of making it so that when you tab-complete a named parameter, it automatically inserts the = ready for you to give it a value. This would make it easier to default to true.

I think your suggestion looks nicer, but I don't see a way to make it parseable.

I’m not really sure I understand why this isn’t parseable though. If no whitespace is allowed around the =, if it sees a space after = then it knows to default to true.

Because of the lack of separators between expressions, it's impossible to use the same operator both as an infix and postfix operator. That's also why Crush needs to use neg instead of - for negating a number: - is already used as an infix subtraction operator.

Don’t lots of languages use - for negation and minus though?

If you're planning on using . as a sigil for variables, how are you planning on talking about file names? Like, if I want to touch the file 'foo.lock' what do I write? How about wildcards? How do you do the equivalent of cat *.txt?

I plan on adding some kind of a special syntax for filenames, possibly using single quotes:

Interesting. Right now, crush allows you to do exactly that. 'foo' is a file named foo.

But also a bit inconvenient, no? Super-common shell operations turn into a chore, e.g. cd .. becomes cd '..', cat foo.txt becomes cat 'foo.txt'. Not a deal breaker, but definitely feels annoying enough to avoid for me.

That definitely seems quite annoying. I would like to avoid having to escape common filename characters like ( and ) because it seems so … wrong to have to do that. The only way I see that would allow avoiding this is some kind of delimiter before and after a filename, which takes up at least two extra characters.

Well, the closing quote could be auto-inserted (like the pair matching in most editors), and maybe the opening one could be part of the tab-completion suggestion?

Crush mostly solves this by also importing the content of the current directory into your namespace, so if you have a file named foo.txt in your current directory, that means that there is a variable named foo.txt in your namespace. Files in Crush support a / operator that works like the . operator in e.g. Rust, but does file lookup instead of regular member lookup.

That does seem like a cool idea, but I fear that it might be quite fragile – imagine not being able to define a variable if there happens to be a file with that name in the current directory.

@lunacookies
Author

@liljencrantz

A while ago I was thinking about how I could possibly add that feature that all REPLs seem to have, where they go into a special mode when they are waiting for you to close something, e.g. a fi in bash or the end of an indentation level in Python. I realised that an easy way to add something similar to this is to create a keybind that adds a newline in the input itself, allowing the user to split their input into multiple lines. I was thinking that maybe ctrl-enter might be a nice option for this.

Try out fish. It detects if you have an unterminated block command and its editor goes into multiline mode. You can still move the cursor between lines and edit the whole command.

The problem I have is I’m not sure how to detect an unterminated block command. I’m using the Nom library for parsing, which has a bunch of parsers in the category of ‘streaming’. I believe this is made for networking, where all the data may not have arrived yet, so the parser can say that more data is needed. Perhaps this could be used to model an incomplete user input?
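Independently of any particular parser library, one simple way to model incomplete input is to check whether the text ends inside an open string, parenthesis or block. The Python sketch below does only that; it is not how fish or Nom work, and it assumes double-quoted strings with backslash escapes, which is a guess about Fjord’s string syntax. `needs_more_input` is a hypothetical name.

```python
PAIRS = {"(": ")", "{": "}"}


def needs_more_input(source):
    """Return True if the input ends inside an open string, paren or block."""
    stack = []          # closing delimiters we are still waiting for
    in_string = False
    escaped = False
    for ch in source:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in PAIRS:
            stack.append(PAIRS[ch])
        elif ch in PAIRS.values():
            # A closer that does not match the innermost opener is a real error,
            # not a request for more input.
            if not stack or stack.pop() != ch:
                raise SyntaxError(f"unexpected {ch!r}")
    return in_string or bool(stack)
```

A REPL could call this when Enter is pressed: if it returns True, insert a newline and keep editing; otherwise hand the text to the real parser.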

@liljencrantz

Do you think that this might be overcomplicating things? Because it sure seems very complex to me :)

Maybe. I feel the syntax is really convenient and almost always does what you want, though.

Is something like this an example?

let mut counter = 0;
for i in 0..100 {
    counter += 1;
    let counter = 0; // This doesn’t have any effect because it’s a new variable that disappears at the end of the scope
}

I guess. If you're dealing with multiple scopes, I believe it's really important to be able to differentiate between reassigning to a variable in an outer scope vs declaring a new variable in an inner scope that has the same name as some other variable.
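The distinction can be sketched with a small scope chain. This is Python for illustration with assumed semantics, not Crush’s actual implementation: `declare` plays the role of `:=` and always creates a binding in the current scope, while `assign` plays the role of `=` and must find an existing binding.

```python
class Scope:
    """A chain of scopes with separate declare (':=') and assign ('=') operations."""

    def __init__(self, parent=None):
        self.parent = parent
        self.values = {}

    def declare(self, name, value):
        """':=' always creates (or shadows) the name in the current scope."""
        self.values[name] = value

    def assign(self, name, value):
        """'=' updates the nearest existing binding, or fails if there is none."""
        scope = self
        while scope is not None:
            if name in scope.values:
                scope.values[name] = value
                return
            scope = scope.parent
        raise NameError(f"assignment to undeclared variable {name!r}")

    def lookup(self, name):
        """Find the nearest binding, walking outwards through parent scopes."""
        scope = self
        while scope is not None:
            if name in scope.values:
                return scope.values[name]
            scope = scope.parent
        raise NameError(f"undefined variable {name!r}")
```

With this split, the `foo = Y` typo fails immediately instead of silently creating a second variable, and a declaration in an inner scope leaves the outer binding untouched.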

I’m not really sure I understand why this isn’t parseable though. If no whitespace is allowed around the =, if it sees a space after = then it knows to default to true.

But then you've suddenly made your language whitespace sensitive in a way that I believe will trip up a lot of people. If foo bar = baz is different from foo bar=baz, there will be endless errors as a result.

Because of the lack of separators between expressions, it's impossible to use the same operator both as an infix and postfix operator. That's also why Crush needs to use neg instead of - for negating a number: - is already used as an infix subtraction operator.

Don’t lots of languages use - for negation and minus though?

Those languages separate function call arguments with something, usually commas. This is unambiguous:

foo 1, 2, -3

This is ambiguous unless you make your language white space sensitive:

foo 1 2 - 3

I really don't want to make the language more whitespace sensitive than absolutely needed, because that tends to come with huge cans of worms.
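The two readings of foo 1 2 - 3 can be made explicit with a toy evaluator. The Python below is only an illustration: it handles integer arguments and a single `-` token, and both function names are hypothetical.

```python
def read_infix(tokens):
    """Treat every '-' as binary subtraction between its neighbours."""
    args = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "-":
            left = args.pop()
            args.append(left - int(tokens[i + 1]))
            i += 2
        else:
            args.append(int(tokens[i]))
            i += 1
    return args


def read_prefix(tokens):
    """Treat every '-' as negation of the value that follows it."""
    args = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "-":
            args.append(-int(tokens[i + 1]))
            i += 2
        else:
            args.append(int(tokens[i]))
            i += 1
    return args
```

For the tokens `1 2 - 3`, the infix reading passes two arguments (1 and -1) while the prefix reading passes three (1, 2 and -3). With the same token stream producing different argument counts, only whitespace or a separator could tell the parser which one was meant.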

That definitely seems quite annoying. I would like to avoid having to escape common filename characters like ( and ) because it seems so … wrong to have to do that. The only way I see that would allow avoiding this is some kind of delimiter before and after a filename, which takes up at least two extra characters.

Well, the closing quote could be auto-inserted (like the pair matching in most editors), and maybe the opening one could be part of the tab-completion suggestion?

Crush mostly solves this by also importing the content of the current directory into your namespace, so if you have a file named foo.txt in your current directory, that means that there is a variable named foo.txt in your namespace. Files in Crush support a / operator that works like the . operator in e.g. Rust, but does file lookup instead of regular member lookup.

That does seem like a cool idea, but I fear that it might be quite fragile – imagine not being able to define a variable if there happens to be a file with that name in the current directory.

You can. Filenames live in an outermost scope, any variable will shadow them. So potentially, you have the opposite problem, that you meant to reference a file and instead you had a variable lying around with the same name. To protect yourself against that, you can simply prefix your filename with ./, and there is no ambiguity.

That said, I definitely consider this file naming solution to be an experimental feature. I'm trying it out to see how well it works in practice.
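The lookup order described here can be sketched as a single function. This is a Python illustration of the stated behaviour (variables shadow files, `./` forces a file), not Crush’s code; `resolve` and its arguments are hypothetical.

```python
def resolve(name, variables, files):
    """Resolve a bare name, letting variables shadow files in the current directory.

    variables: dict of variable name -> value.
    files: set of file names in the current directory.
    """
    if name.startswith("./"):
        # An explicit ./ prefix skips variables entirely.
        path = name[2:]
        if path in files:
            return ("file", path)
        raise NameError(f"no such file: {path}")
    if name in variables:
        return ("variable", variables[name])
    if name in files:
        return ("file", name)
    raise NameError(f"undefined name: {name}")
```

The shadowing problem is visible in the second branch: once a variable called `foo.txt` exists, the bare name no longer reaches the file, and only `./foo.txt` does.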
