@lunacookies
Last active May 7, 2020 01:37
Some Syntax Ideas

What is this?

I’m working on a new language (Fjord) for a shell (fj). Although I have some ideas of my own for syntax, I’m not sure if they’re a really bad idea or if they’re fine, so I’ve decided to conduct a ‘sanity check’ of sorts by writing some preliminary ideas down here. Please respond down in the comments with any thoughts you have!

General philosophy

Note: throughout this document I’ll refer to functions, which are what I’m calling commands.

As this is a language for a shell,

  • brevity is extremely highly valued (the more common something is, the easier it should be to type)
  • function calls are more common than anything else
  • string interpolation is also pretty common

Grouping

Expressions can be grouped so that they are evaluated first by wrapping them in parentheses:

(1 + 2) * 5

Function calls

Inspired by languages like Haskell and ML, I think that function calls should use simple juxtaposition:

add 1 2

This greatly reduces typing, which is important, as function calls are the most common operation in a shell. Imagine instead of typing ls /path/to/dir1 /path/to/dir2 you had to use the traditional syntax to call a function, and had to type ls(/path/to/dir1, /path/to/dir2)!
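
To make the idea concrete, here is a rough Python sketch of how a line written with juxtaposition could be split into a function name and its arguments. The helper name `parse_call` is made up for illustration, and leaning on `shlex` for the splitting is an assumption of the sketch, not how fj works:

```python
import shlex


def parse_call(line):
    """Split a line into (function name, list of arguments)."""
    # shlex.split keeps quoted text together, so "my dir" stays one argument.
    name, *args = shlex.split(line)
    return name, args


print(parse_call("add 1 2"))
print(parse_call("ls /path/to/dir1 /path/to/dir2"))
```

No parentheses or commas are needed: the first word is the function and everything after it is an argument.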

Variable usages

Those same languages from before, Haskell and ML, don’t differentiate syntactically between a variable and a function call without parameters. I would like to avoid this for a number of reasons:

  • it makes implementation a bit more complex
  • syntax highlighting that differentiates between functions and variables becomes extremely complex, if not impossible, to do accurately
  • using a variable has a much lower potential cost than calling a function – it is easier to spot the parts of a program that might be slow if the two look different

Since functions are used more often than variables in the context of a shell, I decided that variables, rather than functions, should be the ones to get extra syntax beyond just writing their name.

The obvious choice is to prefix variable names with $, as this is used by all kinds of languages for one purpose or another: PHP, Perl, every shell I’ve ever seen, Swift (state properties), Rust (macro_rules!), the list goes on. However, after seeing that the Rust crate quote uses # to interpolate variables, I realised:

  • # is probably a better choice than $ since it’s easier to type
  • I can use any syntax I want – the choice isn’t so obvious

After some experimentation, I think that prefixing variable names with . is the ‘best’ choice, without looking too out of place. What do you think?
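
As a small sketch of what the . prefix buys, the Python below labels each word on a line as either a variable usage or a bare word (a function name or a plain argument). The names `classify` and `classify_line` are hypothetical, and the sketch deliberately ignores paths such as ./file that also begin with a dot:

```python
def classify(token):
    """Label a token by its first character (assumed rule: a leading '.' marks a variable)."""
    if token.startswith(".") and len(token) > 1:
        return ("variable", token[1:])
    return ("bare", token)


def classify_line(line):
    """Classify every whitespace-separated token on a line."""
    return [classify(token) for token in line.split()]


print(classify_line("echo .greeting .name"))
# [('bare', 'echo'), ('variable', 'greeting'), ('variable', 'name')]
```

Because the decision only needs the first character, a syntax highlighter could make it without knowing what is defined in the current scope.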

Case conventions

How to separate words in different case conventions:

  • snake_case: hold shift, press the hyphen/underscore key, let go of shift before you start typing the next word
  • kebab-case: press the hyphen/underscore key
  • camelCase: hold shift, type the first character of the next word, let go before you type the next character

The final shell will hopefully have tab-completion that supports case-insensitivity, so here is what that list would look like if case isn’t a consideration:

  • snake_case: hold shift, press the hyphen/underscore key, let go of shift before you start typing the next word
  • kebab-case: press the hyphen/underscore key
  • camelCase: nothing

Of course, the tab completion could also intelligently convert hyphens to underscores and vice-versa:

  • snake_case: press the hyphen/underscore key
  • kebab-case: press the hyphen/underscore key
  • camelCase: nothing

Camel case is still the easiest choice to type, so that’s what I’ve decided on. What’s your opinion?
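
Here is a minimal Python sketch of the kind of forgiving tab-completion described above. It assumes the matcher simply ignores case, hyphens and underscores when comparing; `normalize` and `complete` are illustrative names, not part of fj:

```python
def normalize(name):
    """Fold case and drop word separators so all three conventions compare equal."""
    return name.replace("-", "").replace("_", "").lower()


def complete(prefix, candidates):
    """Return the candidates whose normalized form starts with the normalized prefix."""
    key = normalize(prefix)
    return [c for c in candidates if normalize(c).startswith(key)]


names = ["getUserName", "get_user_id", "list-files"]
print(complete("getuser", names))     # ['getUserName', 'get_user_id']
print(complete("get-user-n", names))  # ['getUserName']
```

With a matcher like this, a camelCase name can be typed entirely in lowercase, which is the ‘nothing’ entry in the lists above.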

Variable definitions

This one is pretty obvious to me:

let name = value

But I guess the equals sign is implied …

let name value

Do you even need let?

name value

Now this looks exactly like a function call. Maybe it’s better with just the equals sign?

name = value

This is the most concise choice, and is also familiar to users of Haskell, Python and Ruby (and probably others). Or is there a better option I haven’t considered?

Function definitions

Most languages have a separate syntax for defining functions and variables:

// JavaScript

function name(param) {
    body
}

var variable = value; // ‘var’ could also be ‘let’; ‘const’ declares a constant

// Rust

fn name(param: SomeType) -> AnotherType {
    body
}

let variable = value;

Lots of modern languages have support for lambdas, closures, function literals, anonymous functions, whatever you call them. This leads to a duplication of the ways to define a function:

// Rust

// using normal function declaration syntax
fn name(param: SomeType) -> AnotherType {
    body
}

// using a closure
let name = |param| {
    body
};

// JavaScript

function name(param) {
    body
}

// using an arrow function
let name = (param) => {
    body
};

# Python

def name(param):
    body

# using a lambda
name = lambda param : body

The ‘anonymous function form’ in each of these examples is usually limited in some way compared to normal functions, which stops you from using it for everything. Even so, I find the duplication a little annoying.

Haskell uses a very similar syntax for function and variable definitions:

varName = value
functionName param = body

Haskell, however, has the same duplication from before, with a lambda syntax of its own:

name param = body

-- using an anonymous function
name = (\param -> body)

I really like the look and brevity of Haskell’s normal function definitions, but want to avoid the ‘duplication’ with anonymous functions. Initially something like this springs to mind:

name = param {
    body
}

But I plan on adding support for block expressions, which would mean that the syntax above would be confused with a function call with the name param and a parameter whose value is whatever body evaluates to.

So maybe something like this?

name = |param| {
    body
}

It is easier to type almost any character other than |, though:

name = fn param {
    body
}

Maybe it shouldn’t be required that function definitions use a block expression though?

name = fn param body

Some kind of separation would be nice:

name = fn param -> body

Although the arrow looks very nice, it would be easier to type if : is used instead:

name = fn param : body

Now we don’t need the fn any more:

name = param : body

Here’s what this syntax looks like with multiple parameters:

name = param1 param2 : body

Those block expressions can of course be used for the body of a function:

name = param1 param2 : {
    do
    lots
    of
    stuff
}

What do you think function definitions should look like? Do you think that the duplication of syntaxes for normal and anonymous functions is needed for some reason?
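
To check that a single syntax really can cover both variables and functions, here is a rough Python sketch that splits a one-line definition of the form name = params : body into its parts. `parse_definition` is a made-up helper, and the sketch assumes there are no = or : characters inside string literals, which a real parser would have to handle:

```python
def parse_definition(line):
    """Split 'name = params : body' into its parts; with no colon it is a plain variable."""
    name, _, rest = line.partition("=")
    params, colon, body = rest.partition(":")
    if not colon:
        # No parameter list, so the right-hand side is just a value.
        return {"name": name.strip(), "params": [], "body": rest.strip()}
    return {"name": name.strip(), "params": params.split(), "body": body.strip()}


print(parse_definition("name = param1 param2 : body"))
# {'name': 'name', 'params': ['param1', 'param2'], 'body': 'body'}
print(parse_definition("name = value"))
# {'name': 'name', 'params': [], 'body': 'value'}
```

Under this reading a variable is simply a definition with an empty parameter list.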

Strings

Simple enough: use double quotes. I’ve decided against single quotes because too many strings contain single quotes themselves, which would all require escaping.

Personally, I don’t like how some languages give you the choice of quotes, because it leads to inconsistency.

String interpolation

I think string interpolation deserves a special syntax for the string itself, like Python does in its f-strings:

"Hello, Sarah!"   # literal
f"Hello, {name}!" # interpolation

I like how it only takes one extra character to create an f-string, so this is something I hope to copy for Fjord. Besides, string interpolation is one of the most common tasks in a shell. This is in contrast to Ruby, where strings that have interpolations don’t get any differentiation from literals:

"Hello, Sarah!"   # literal
"Hello, #{name}!" # interpolation

Swift has the same ‘problem’:

"Hello, Sarah!"   // literal
"Hello, \(name)!" // interpolation

Here is what string interpolations could look like for Fjord, using the variable usage syntax from before:

f"Hello, .name!"

But this doesn’t allow for arbitrary expressions like all the syntaxes above do, only variables, so some kind of delimiter around the interpolation is needed. After playing around with it for a bit, I came to the conclusion that the curly-brace syntax that Python uses is my favourite.

f"Hello, {.name}!"

This is two extra characters compared to a traditional shell, such as bash in this example:

"Hello, $name!"

This is a little misleading, however, as that syntax can’t be used for interpolating any arbitrary expression like Fjord’s can:

f"Hello, {getUserName}!"

The syntax to do the same thing in bash takes the same number of characters:

"Hello, $(getusername)!" # I wrote it in lowercase because camel case command names look wrong

Do you think that strings containing interpolations should get a different syntax to literals? Do you think that you should be able to interpolate any expression, or is being able to interpolate just variables enough? Are you a fan of Python’s f-string syntax that I nicked for Fjord?
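
For reference, here is a minimal Python sketch of the variables-only case of the proposed syntax, expanding {.name} placeholders from a dictionary. The function name `interpolate` and the rule for what counts as a variable name are assumptions of the sketch; arbitrary expressions such as {getUserName} would need a real evaluator and are left out:

```python
import re

# Matches {.name}, where name is assumed to be letters and digits starting with a letter.
PLACEHOLDER = re.compile(r"\{\.([A-Za-z][A-Za-z0-9]*)\}")


def interpolate(template, variables):
    """Replace each {.name} placeholder with the value of that variable."""
    def lookup(match):
        return str(variables[match.group(1)])

    return PLACEHOLDER.sub(lookup, template)


print(interpolate("Hello, {.name}!", {"name": "Sarah"}))  # Hello, Sarah!
```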

Options and named function parameters

A convention has arisen over the decades for passing options to commands:

$ command -o # short option name
$ command --option # long option name
$ command --speed=25 # option with value
$ command --speed 25 # most commands support using a space instead of =
$ command --flag --speed 25 "positional arguments follow options"

This isn’t set in stone, so sometimes I’m caught off guard by a command that doesn’t completely follow the convention:

$ find . -name '*foo*' # for some reason find uses a single dash for option names
$ command --speed=25 --speed 25 # some commands accept only one of these forms
$ ls /path/to/directory --long # the GNU utilities support putting options after
                               # positional arguments, while most commands don’t

I’ve realised that the whole ‘options’ convention that has appeared over time bears a striking resemblance to the handling of function parameters in some languages. Apart from being able to pass an option without a value (this can be viewed as equivalent to setting it to true), options are exactly like named parameters which can have default values.

As my main programming language is Rust (which doesn’t have named or default parameters), I’m not really familiar with these concepts. Python’s approach to named parameters and default parameter values seems very reasonable, so maybe Fjord could imitate it:

downloadUrl = url timeout=5 httpVersion=1.1 : doTheThing

These are all equivalent:

downloadUrl "https://google.com"
downloadUrl url="https://google.com" 5
downloadUrl timeout=5 url="https://google.com"

I’m not so sure about that = without any space around it – it kind of irks me. Here’s what it looks like with Swift-/Ruby-style colons:

downloadUrl = url timeout: 5 httpVersion: 1.1 : doTheThing
downloadUrl "https://google.com"
downloadUrl url: "https://google.com" 5
downloadUrl timeout: 5 url: "https://google.com"

It looks a little strange to me without commas separating the arguments, so I think I prefer Python’s style for now.
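
For comparison, this is how Python itself treats the equivalent calls listed earlier. The function `download_url` is a stand-in written for this sketch and just returns its arguments so they can be compared:

```python
def download_url(url, timeout=5, http_version="1.1"):
    """Stand-in that returns its arguments so the calls can be compared."""
    return (url, timeout, http_version)


a = download_url("https://google.com")
b = download_url(timeout=5, url="https://google.com")
c = download_url("https://google.com", 5)
print(a == b == c)  # True

# Note: download_url(url="https://google.com", 5) is a SyntaxError in Python,
# because a positional argument may not follow a keyword argument.
```

So if Fjord wants to allow a positional argument after a named one, it would be going beyond what Python permits.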

This still doesn’t take into account how the command option convention has short option names and how if you don’t give an option a value it’s equivalent to setting its value to true. If we ignore those two features, here’s what a call to ls that uses a few different parameters could look like:

ls all=true long=true color=never /path/to/dir1 /path/to/dir2

Here’s what that looks like in a traditional shell:

$ ls -al --color=never /path/to/dir1 /path/to/dir2

# With long option names
$ ls --all --long --color=never /path/to/dir1 /path/to/dir2

The traditional form is much, much cleaner. I’m not really sure how to integrate short named parameters and if-you-pass-a-named-parameter-without-a-value-it’s-a-boolean-true, so if you have any ideas, I’d appreciate hearing them.
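
One possible reading of the boolean-true convention, sketched in Python: long options are translated into name/value pairs, and an option with no value becomes true. `options_to_named` is a hypothetical helper; short options and the space-separated `--speed 25` form are deliberately not handled:

```python
def options_to_named(argv):
    """Translate conventional long options into named parameters plus positionals."""
    named, positional = {}, []
    for arg in argv:
        if arg.startswith("--"):
            name, equals, value = arg[2:].partition("=")
            # An option without '=value' is treated as boolean true.
            named[name] = value if equals else True
        else:
            positional.append(arg)
    return named, positional


print(options_to_named(["--all", "--long", "--color=never", "/path/to/dir1"]))
# ({'all': True, 'long': True, 'color': 'never'}, ['/path/to/dir1'])
```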

@liljencrantz

Do you think that this might be overcomplicating things? Because it sure seems very complex to me :)

Maybe. I feel the syntax is really convenient and almost always does what you want, though.

Is something like this an example?

let mut counter = 0;
for i in 0..100 {
    counter += 1;
    let counter = 0; // This doesn’t have any effect because it’s a new variable that disappears at the end of the scope
}

I guess. If you're dealing with multiple scopes, I believe it's really important to be able to differentiate between reassigning to a variable in an outer scope vs declaring a new variable in an inner scope that has the same name as some other variable.

I’m not really sure I understand why this isn’t parseable though. If no whitespace is allowed around the =, if it sees a space after = then it knows to default to true.

But then you've suddenly made your language whitespace sensitive in a way that I believe will trip up a lot of people. If foo bar = baz is different from foo bar=baz, there will be endless errors as a result.

Because of the lack of separators between expressions, it's impossible to use the same operator both as an infix and as a prefix operator. That's also why Crush needs to use neg instead of - for negating a number: - is already used as an infix subtraction operator.

Don’t lots of languages use - for negation and minus though?

Those languages separate function call arguments with something, usually commas. This is unambiguous:

foo 1, 2, -3

This is ambiguous unless you make your language white space sensitive:

foo 1 2 - 3

I really don't want to make the language more whitespace sensitive than absolutely needed, because that tends to come with huge cans of worms.
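
To illustrate the two readings of foo 1 2 - 3, here is a small Python sketch. Both functions are toy readers over integer tokens written for this example; neither is how Crush or fj parses anything:

```python
def read_infix(tokens):
    """Treat '-' as subtraction between the arguments on either side of it."""
    args, i = [], 0
    while i < len(tokens):
        if tokens[i] == "-" and args and i + 1 < len(tokens):
            args[-1] = args[-1] - int(tokens[i + 1])
            i += 2
        else:
            args.append(int(tokens[i]))
            i += 1
    return args


def read_prefix(tokens):
    """Treat '-' as negation of the token that follows it."""
    args, i = [], 0
    while i < len(tokens):
        if tokens[i] == "-" and i + 1 < len(tokens):
            args.append(-int(tokens[i + 1]))
            i += 2
        else:
            args.append(int(tokens[i]))
            i += 1
    return args


tokens = "1 2 - 3".split()
print(read_infix(tokens))   # [1, -1]
print(read_prefix(tokens))  # [1, 2, -3]
```

The same tokens give two different argument lists, so without separators or whitespace rules the parser has no way to choose between them.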

That definitely seems quite annoying. I would like to avoid having to escape common filename characters like ( and ) because it seems so … wrong to have to do that. The only way I can see to avoid this is some kind of delimiter before and after a filename, which takes up at least two extra characters.

Well, the closing quote could be auto-inserted (like the pair matching in most editors), and maybe the opening one could be part of the tab-completion suggestion?

Crush mostly solves this by also importing the content of the current directory into your namespace, so if you have a file named foo.txt in your current directory, that means that there is a variable named foo.txt in your namespace. Files in Crush support a / operator that works like the . operator in e.g. Rust, but does file lookup instead of regular member lookup.

That does seem like a cool idea, but I fear that it might be quite fragile – imagine not being able to define a variable if there happens to be a file with that name in the current directory.

You can. Filenames live in an outermost scope, so any variable will shadow them. Potentially, then, you have the opposite problem: you meant to reference a file and instead you had a variable lying around with the same name. To protect yourself against that, you can simply prefix your filename with ./, and there is no ambiguity.

That said, I definitely consider this file naming solution to be an experimental feature. I'm trying it out to see how well it works in practice.
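
The shadowing rule described above can be modelled in a few lines of Python. This is only a sketch of the lookup order as described in this thread, not Crush's implementation; `resolve` and the example paths are made up:

```python
from collections import ChainMap


def resolve(name, variables, files):
    """Look a name up with variables shadowing files; a './' prefix forces a file lookup."""
    if name.startswith("./"):
        return ("file", files[name[2:]])
    scope = ChainMap(variables, files)  # earlier maps win, so variables shadow files
    kind = "variable" if name in variables else "file"
    return (kind, scope[name])


files = {"foo.txt": "/home/me/foo.txt"}
variables = {"foo.txt": 42}
print(resolve("foo.txt", variables, files))    # ('variable', 42)
print(resolve("./foo.txt", variables, files))  # ('file', '/home/me/foo.txt')
```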
