I’m working on a new language (Fjord) for a shell (fj). Although I have some ideas of my own for syntax, I’m not sure if they’re a really bad idea or if they’re fine, so I’ve decided to conduct a ‘sanity check’ of sorts by writing some preliminary ideas down here. Please respond down in the comments with any thoughts you have!
Note: throughout this document I’ll refer to functions, which are what I’m calling commands.
As this is a language for a shell,
- brevity is extremely highly valued (the more common something is, the easier it should be to type)
- function calls are more common than anything else
- string interpolation is also pretty common
Expressions can be grouped so that they are evaluated first by wrapping them in parentheses:
(1 + 2) * 5
Inspired by languages like Haskell and ML, I think that function calls should use simple juxtaposition:
add 1 2
This greatly reduces typing, which is important, as function calls are the most common operation in a shell. Imagine if, instead of typing ls /path/to/dir1 /path/to/dir2, you had to use the traditional function-call syntax and type ls(/path/to/dir1, /path/to/dir2)!
Those same languages from before, Haskell and ML, don’t differentiate syntactically between a variable and a function call without parameters. I would like to avoid this for a number of reasons:
- it makes implementation a bit more complex
- syntax highlighting that differentiates between functions and variables becomes extremely complex, if not impossible, to do accurately
- using a variable has a much lower potential cost than calling a function, so it is easier to spot parts of the program that might be slow if the two look different
Since functions are used more often than variables in a shell, I decided that variables, rather than functions, should carry extra syntax beyond just their name.
The obvious choice is to prefix variable names with $, as this is used by all kinds of languages for one purpose or another: PHP, Perl, every shell I’ve ever seen, Swift (state properties), Rust (macro_rules!), the list goes on. However, after seeing that the Rust crate quote uses # to interpolate variables, I realised:
- # is probably a better choice than $ since it’s easier to type
- I can use any syntax I want – the choice isn’t so obvious
After some experimentation, I think that prefixing variable names with . is the ‘best’ choice, without looking too out of place. What do you think?
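To make the syntax-highlighting point above concrete, here is a minimal Python sketch. Fjord has no implementation yet, so the token rules and the classify/highlight helpers are assumptions made purely for illustration. With a . prefix, a highlighter can tell a variable from a function name by looking at one character:

```python
def classify(token):
    """Classify one whitespace-separated token of a hypothetical Fjord line."""
    # Assumption: a variable is a '.' followed by an identifier.
    if token.startswith(".") and token[1:].isidentifier():
        return "variable"
    # Assumption: any bare identifier is a function (command) name.
    if token.isidentifier():
        return "function"
    # Paths, numbers, operators and so on are not distinguished here.
    return "other"


def highlight(line):
    """Pair every token on the line with its class."""
    return [(token, classify(token)) for token in line.split()]


print(highlight("echo .name /tmp"))
```

No symbol table or type information is needed, which is the property that the Haskell/ML approach gives up.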
How to separate words in different case conventions:
- snake_case: hold shift, press the hyphen/underscore key, let go of shift before you start typing the next word
- kebab-case: press the hyphen/underscore key
- camelCase: hold shift, type the first character of the next word, let go before you type the next character
The final shell will hopefully have tab-completion that supports case-insensitivity, so here is what that list would look like if case isn’t a consideration:
- snake_case: hold shift, press the hyphen/underscore key, let go of shift before you start typing the next word
- kebab-case: press the hyphen/underscore key
- camelCase: nothing
Of course, the tab completion could also intelligently convert hyphens to underscores and vice-versa:
- snake_case: press the hyphen/underscore key
- kebab-case: press the hyphen/underscore key
- camelCase: nothing
Camel case is still the easiest choice to type, so that’s what I’ve decided on. What’s your opinion?
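As a rough sketch of the kind of tab completion described above, the following Python compares names after folding case and dropping separators. The normalize and complete helpers and the candidate names are hypothetical. Dropping hyphens and underscores altogether is a slightly broader rule than converting one into the other, and is an assumption of this sketch:

```python
def normalize(name):
    """Fold case and drop word separators so all three conventions compare equal."""
    return name.replace("-", "").replace("_", "").lower()


def complete(typed, candidates):
    """Return the candidates whose normalized form starts with the typed prefix."""
    key = normalize(typed)
    return [c for c in candidates if normalize(c).startswith(key)]


names = ["getUserName", "get_user_id", "get-url"]
print(complete("getuser", names))
```

With matching like this, a camelCase name can be completed by typing only lowercase letters, which is the ‘nothing’ entry in the lists above.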
This one is pretty obvious to me:
let name = value
But I guess the equals sign is implied …
let name value
Do you even need let?
name value
Now this looks exactly like a function call. Maybe it’s better with just the equals sign?
name = value
This is the most concise choice, and is also familiar to users of Haskell, Python and Ruby (and probably others). Or is there a better option I haven’t considered?
Most languages have a separate syntax for defining functions and variables:
// JavaScript
function name(param) {
body
}
var variable = value; // ‘var’ could also be ‘let’, or ‘const’ to declare a constant
// Rust
fn name(param: SomeType) -> AnotherType {
body
}
let variable = value;
Lots of modern languages have support for lambdas, closures, function literals, anonymous functions, whatever you call them. This leads to a duplication of the ways to define a function:
// Rust
// using normal function declaration syntax
fn name(param: SomeType) -> AnotherType {
body
}
// using a closure
let name = |param| {
body
};
// JavaScript
function name(param) {
body
}
// using an arrow function
let name = (param) => {
body
};
# Python
def name(param):
body
# using a lambda
name = lambda param : body
Although the ‘anonymous function form’ in each of these examples is usually limited in some way compared to a normal function, which stops you from using it for everything, I still find this duplication a little annoying.
Haskell uses a very similar syntax for function and variable definitions:
varName = value
functionName param = body
Haskell, however, has the same duplication as before, with a lambda syntax of its own:
name param = body
-- using an anonymous function
name = (\param -> body)
I really like the look and brevity of Haskell’s normal function definitions, but want to avoid the ‘duplication’ with anonymous functions. Initially something like this springs to mind:
name = param {
body
}
But I plan on adding support for block expressions, which would mean that the syntax above would be confused with a call to a function named param with a single argument whose value is whatever body evaluates to.
So maybe something like this?
name = |param| {
body
}
It is easier to type almost any character other than |, though:
name = fn param {
body
}
Maybe it shouldn’t be required that function definitions use a block expression though?
name = fn param body
Some kind of separation would be nice:
name = fn param -> body
Although the arrow looks very nice, it would be easier to type if : were used instead:
name = fn param : body
Now we don’t need the fn any more:
name = param : body
Here’s what this syntax looks like with multiple parameters:
name = param1 param2 : body
Those block expressions can of course be used for the body of a function:
name = param1 param2 : {
do
lots
of
stuff
}
What do you think function definitions should look like? Do you think that the duplication of syntaxes for normal and anonymous functions is needed for some reason?
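One consequence of the name = param1 param2 : body form is that a variable definition is simply a definition with no parameter list. The Python sketch below shows how a parser might treat both with one rule. parse_definition is a hypothetical helper, and it deliberately ignores strings and block bodies, so a value that itself contains a colon would be misread:

```python
def parse_definition(line):
    """Split a hypothetical Fjord definition into its parts."""
    # Everything before the first '=' is the name being defined.
    name, _, rhs = line.partition("=")
    name = name.strip()
    # A ':' on the right-hand side separates the parameters from the body.
    params, colon, body = rhs.partition(":")
    if not colon:
        return {"kind": "variable", "name": name, "value": rhs.strip()}
    return {
        "kind": "function",
        "name": name,
        "params": params.split(),
        "body": body.strip(),
    }


print(parse_definition("add = a b : a + b"))
print(parse_definition("answer = 42"))
```

Because only the first = is used to find the name, parameters with defaults such as timeout=5 stay in the parameter list.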
Simple enough: use double quotes. I’ve decided against single quotes because too many strings contain single quotes themselves, which would all require escaping.
Personally, I don’t like how some languages give you the choice of quotes, because it leads to inconsistency.
I think string interpolation deserves a special syntax for the string itself, like Python does in its f-strings:
"Hello, Sarah!" # literal
f"Hello, {name}!" # interpolation
I like how it only takes one extra character to create an f-string, so this is something I hope to copy for Fjord. Besides, string interpolation is one of the most common tasks in a shell. This is in contrast to Ruby, where strings that have interpolations don’t get any differentiation from literals:
"Hello, Sarah!" # literal
"Hello, #{name}!" # interpolation
Swift has the same ‘problem’:
"Hello, Sarah!" // literal
"Hello, \(name)!" // interpolation
Here is what string interpolations could look like for Fjord, using the variable usage syntax from before:
f"Hello, .name!"
But this doesn’t allow for arbitrary expressions like all the syntaxes above do, only variables, so some kind of delimiter around the interpolation is needed. After playing around with it for a bit, I came to the conclusion that the curly-brace syntax that Python uses is my favourite.
f"Hello, {.name}!"
This is three extra characters (the f prefix and the two braces) compared to a traditional shell, such as bash in this example:
"Hello, $name!"
This is a little misleading, however, as that syntax can’t be used for interpolating any arbitrary expression like Fjord’s can:
f"Hello, {getUserName}!"
The syntax to do the same thing in bash takes the same number of characters:
"Hello, $(getusername)!" # I wrote it in lowercase because camel case command names look wrong
Do you think that strings containing interpolations should get a different syntax to literals? Do you think that you should be able to interpolate any expression, or is being able to interpolate just variables enough? Are you a fan of Python’s f-string syntax that I nicked for Fjord?
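For anyone who wants to play with the idea, here is a small Python sketch of how the {…} interpolation could be expanded. It is only an illustration under stated assumptions: interpolate is a hypothetical helper, variables and functions are plain dictionaries, and the only expressions handled are .name variables and calls to functions that take no arguments:

```python
import re


def interpolate(template, variables, functions):
    """Expand {.var} and {func} placeholders in a hypothetical Fjord f-string body."""

    def expand(match):
        expr = match.group(1).strip()
        if expr.startswith("."):
            # '.name' is a variable lookup.
            return str(variables[expr[1:]])
        # Anything else is treated as a call to a function with no arguments.
        return str(functions[expr]())

    return re.sub(r"\{([^{}]*)\}", expand, template)


print(interpolate("Hello, {.name}!", {"name": "Sarah"}, {}))
```

A real implementation would hand the text between the braces to the expression parser instead of special-casing these two forms.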
A convention has arisen over the decades for passing options to commands:
$ command -o # short option name
$ command --option # long option name
$ command --speed=25 # option with value
$ command --speed 25 # most commands support using a space instead of =
$ command --flag --speed 25 "positional arguments follow options"
This isn’t set in stone, so sometimes I’m caught off guard by a command that doesn’t completely follow the convention:
$ find . -name '*foo*' # for some reason find uses a single dash for option names
$ command --speed=25 --speed 25 # some commands accept only one of these forms
$ ls /path/to/directory --long # the GNU utilities support putting options after
# positional arguments, while most commands don’t
I’ve realised that the whole ‘options’ convention that has appeared over time bears a striking resemblance to the handling of function parameters in some languages. Apart from being able to pass an option without a value (this can be viewed as equivalent to setting it to true), options are exactly like named parameters which can have default values.
As my main programming language is Rust (which doesn’t have named or default parameters), I’m not really familiar with these concepts. Python’s approach to named parameters and default parameter values seems very reasonable, so maybe Fjord could imitate it:
downloadUrl = url timeout=5 httpVersion=1.1 : doTheThing
These are all equivalent:
downloadUrl "https://google.com"
downloadUrl url="https://google.com" 5
downloadUrl timeout=5 url="https://google.com"
I’m not so sure about that = without any space around it – it kind of irks me. Here’s what it looks like with Swift-/Ruby-style colons:
downloadUrl = url timeout: 5 httpVersion: 1.1 : doTheThing
downloadUrl "https://google.com"
downloadUrl url: "https://google.com" 5
downloadUrl timeout: 5 url: "https://google.com"
It looks a little strange to me without commas separating the arguments, so I think I prefer Python’s style for now.
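For comparison, this is roughly how the Python behaviour being imitated works. download_url is a stand-in written for this sketch (it only echoes its arguments), and the snake_case names are Python convention rather than anything Fjord would use:

```python
def download_url(url, timeout=5, http_version="1.1"):
    """Stand-in for the post's downloadUrl; it just echoes its arguments."""
    return (url, timeout, http_version)


# Positional only, relying on the defaults.
a = download_url("https://google.com")
# Two positional arguments.
b = download_url("https://google.com", 5)
# All named, in any order.
c = download_url(timeout=5, url="https://google.com")

print(a == b == c)
```

One difference worth noting: Python rejects a positional argument after a named one, so download_url(url="https://google.com", 5) is a SyntaxError. Allowing that form, as in the second Fjord call above, would need a rule of Fjord’s own.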
This still doesn’t take into account how the command option convention has short option names and how if you don’t give an option a value it’s equivalent to setting its value to true. If we ignore those two features, here’s what a call to ls that uses a few different parameters could look like:
ls all=true long=true color=never /path/to/dir1 /path/to/dir2
Here’s what that looks like in a traditional shell:
$ ls -al --color=never /path/to/dir1 /path/to/dir2
# With long option names
$ ls --all --long --color=never /path/to/dir1 /path/to/dir2
The traditional version is much, much cleaner. I’m not really sure how to integrate short named parameters and if-you-pass-a-named-parameter-without-a-value-it’s-a-boolean-true, so if you have any ideas, I’d appreciate them.
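To show how closely named parameters map onto the option convention, here is a minimal Python sketch that turns a set of named arguments into a conventional argument list. to_argv is a hypothetical helper, and the option names are taken from the examples in this post rather than checked against any real ls:

```python
def to_argv(command, named, positional):
    """Render named and positional arguments as a conventional argv list."""
    argv = [command]
    for key, value in named.items():
        if value is True:
            # A boolean true becomes a bare flag.
            argv.append(f"--{key}")
        elif value is False:
            # A boolean false is simply left out.
            continue
        else:
            argv.append(f"--{key}={value}")
    return argv + list(positional)


print(to_argv(
    "ls",
    {"all": True, "long": True, "color": "never"},
    ["/path/to/dir1", "/path/to/dir2"],
))
```

The translation from =true to a bare flag is mechanical, which suggests a shorthand for it would be cheap to support. Short option names are the part this sketch leaves out.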
@liljencrantz
That’s an interesting design idea, I’ve never heard about that before. What would happen if you went to ‘execute’ (1 2 3 4)? Would it ‘return’ 1? Or would it throw an error that 1 doesn’t take any parameters?
And what if you are just inspecting a variable? I assume that, if it is a REPL, you could use the following:
Or is every variable a ‘function’ that takes no parameters? My mind is kind of broken by this :)
I definitely agree that consistency is more important than anything else – I just thought that if I needed to pick one, I might as well pick one suited to the use case. What do you think of making the use of a case convention other than ‘the chosen one’ a warning?
How does having a separate syntax for declaration and assignment prevent typos? Is it that it catches a situation in which you meant to declare a variable earlier, but forgot to, so try to assign without declaration?
It makes sense to me in Rust, though, because the whole mutability/immutability thing, as well as the type, has to be decided at the binding’s entry point.
Interesting design decision, I might consider that.
I think that whole ‘unambiguous prefix’ thing is a recipe for unmaintainable code, so I’ll be avoiding that! But maybe with tab completion a lack of short option names won’t be such a hindrance. What about options without values? Do you think it’s worth adding a shorthand for =true, simply because of how common it is?
What would you think of such a shorthand:
I plan on adding some kind of a special syntax for filenames, possibly using single quotes: