@nicklockwood
Last active January 3, 2016 23:49
The Perfect Language

It's Impossible

I have reluctantly come to accept that it is impossible to create a "perfect" programming language. Every language is domain-specific in some sense, and many of the criteria that make a language good for one purpose are fundamentally in opposition to qualities that are good for another.

A classic example would be "scripting" languages versus "embedded" languages.

Good qualities in a scripting language are:

  • Dynamic typing (no need to specify types, or cast between them)
  • Interpreted (no need for lengthy pre-compilation)
  • High-level (abstracts away any hardware concerns)
  • Garbage collected (no need to worry about cleaning up memory)

On the other hand, good qualities in an embedded language are:

  • Strict typing (this allows the compiler to use optimal storage and instructions for each type)
  • Compiled (this makes the code compact and efficient)
  • Low-level (allows developer to take full advantage of limited hardware resources)
  • Manual memory management (no unpredictable thread locking or non-deterministic collection cycles)

So I think we've established that the perfect language is impossible, right? All a language can hope to be is a compromise between these two extremes. It's possible to design a language that strikes a good balance for a given problem domain, but it would always be possible to make any such language either faster or easier to use (but not both).

So it's hopeless. A fool's errand.

With that in mind, let's try to design the perfect language!

Given that we've already established that this is impossible, I don't expect it to be easy. But I believe that since there is no upper limit to how good a language can be, if we start with something reasonable and keep making it better, we can get arbitrarily close to perfection. Think of the perfect language as being like the speed of light; we've proven that it's unattainable, but there is no limit to how close to it you can get, and trying to go faster is still worthwhile.

Requirements Gathering

So what are the criteria that make a good language? We could all argue for hours about whether garbage collection is better than reference counting, so let's avoid specific solutions and focus on requirements instead. We can all agree that a language should be fast at runtime, right? OK, so we'll start there.

  • Speed. At runtime, the ideal language should be as fast as possible.
  • Safe. It should be difficult (or impossible) to write code that crashes, leaks or otherwise behaves incorrectly.
  • Concise. The ideal language should not require you to write code that does not relate to solving your problem.
  • Obvious. The syntax of an ideal language should be memorable and non-obscure.
  • DRY. Implied by the concise requirement: the language should not make you repeat the same information.
  • Unambiguous. Very important; the behaviour of any piece of code should be absolutely defined by the language spec.
  • Shallow Learning Curve. Simple things should be simple; complex things should be possible.

Those seem to be fairly uncontroversial requirements. Now for the less clear-cut stuff:

  • Normalized. Is it better if a language has multiple ways to achieve something, or only one way?
  • Functional. FRP-type concepts reduce state, simplify event handling and make concurrency and parallel processing easy.
  • Object-Oriented. The alternatives are too horrible to contemplate. Aren't they?
  • Introspective. It's really useful when code can infer stuff about itself at runtime. Or is it?
  • Metaprogramming. Are macros a good thing? Do we like compile-time code generation?
  • Polymorphic. Do we like inheritance and method overloading? Are there alternatives?
  • Whitespace Agnostic. Should our language ignore whitespace (like C), or embrace it (like Python)?
  • Symbolic. Is it better to minimise namespace pollution by using symbols instead of names? Does that make code harder to read?
  • Namespaces. These are a basic necessity, right? Prefixing classes (and worse, methods) in Objective-C is not fun.
  • Operator Overloading. This makes vector code in C++ really elegant, but is it worth the cognitive load?

Great Artists Steal

I don't believe any existing language qualifies as "perfect", but there are certainly languages that have some good (and bad) qualities that we can learn from. Here is a thoroughly biased assessment of some of the best and worst features of some popular languages.

Objective-C

Good:

  • Categories. The implementation of these is scary dangerous in Objective-C due to the way the runtime works, but the principle is great. The ability to add new methods to existing classes, or compose classes from several distinct files seems like a real win for code elegance. In our ideal language, we wouldn't necessarily want to be able to override existing methods with categories though. That seems like a bad idea.

  • Runtime introspection and reflection. The Objective-C implementations of these features are a bit clunky and have annoying limitations (e.g. incomplete information about method argument types), but even with these limitations, amazing things are possible.

  • Performance. With a few exceptions, Objective-C has great performance. This is partly because ARC has minimal overhead compared to most garbage collected memory management solutions, and partly because most of the logic is plain old C.

  • Nil messaging. I love love love the way that Objective-C handles nil messages. So much boilerplate error code is avoided by cascading a nil result up the chain until the object that actually cares if something went wrong can deal with it. It still needs work though. It's a little too easy to end up with hard-to-diagnose errors because of a nil showing up somewhere unexpected and then causing problems much further down the line.

  • Exceptions. Exceptions should be exceptional, and that's how Objective-C treats them. The principle of only throwing exceptions for programmer errors makes code much cleaner. It would be nice if they didn't have to corrupt the stack and take down the whole app though. In an ideal language I'd prefer non-fatal exceptions to be the norm.

Bad:

  • Prefixes. Oh the horror! Prefixing classes is bad enough, but having to prefix private and category method names to avoid accidental collisions makes my skin crawl. And it's not even guaranteed to work! Apple could easily use your TLA for their next framework.

  • Message passing. "What, why?" How can message passing be a bad thing? The problem with message passing is that it is fundamentally unoptimisable. Objective-C methods can always be overridden at runtime, which means that they can never be optimised away, even in trivial cases such as getters/setters. They are also unsafe due to the way that identical selectors with different argument or return types can lead to ambiguous method IMP resolution. They put an upper limit on the speed of the language, making it fundamentally unsuitable for performance-critical applications such as graphics. The perfect language won't use message passing - or at least it won't be the de facto way to implement methods.

  • Syntax. There are many nice qualities to the Objective-C method syntax - the way it encodes argument names/types in the method name itself makes code very readable. But Objective-C methods are looooong. They're too long. The way that they wrap is absurd, and makes things even worse if a pre-wrapped method gets wrapped to a narrower width. Compare this:

      Vector *vector = [Vector vectorWithX:100 y:200 z:300];
    

    With some hypothetical alternatives:

      Vector *vector = new Vector(100, 200, 300);
      Vector *vector = Vector.new(100, 200, 300);
      Vector *vector = Vector(x:100, y:200, z:300);
    

    It seems to me that each of the latter options is much nicer, even if they don't all convey as much information.

  • Safety. Objective-C is very crashy. Even with ARC it's pretty easy to crash or leak if you are using threads or blocks. An ideal language wouldn't be this easy to use wrong.

Java

Good:

  • Safe. It's pretty hard to do anything really wrong with Java. It might have bad performance or use too much memory, but it won't leak or crash (we won't count throwing exceptions as crashing in Java, since they can be handled gracefully).

  • Syntax. Java syntax is a pretty nice extension of C. Dot syntax seems like an elegant way to access member methods. I much prefer this to the :: and -> of C++. Not sure about the getThis() and setThat() though. Seems like it could use some @property-esque syntax sugar for accessors.

  • Packages (aka Namespaces). This one's a no-brainer (or so you'd think, Apple).

  • Generics. I was in two minds about putting this under "Good". It's good that Java has them, but they're a blunt solution to the type-safety problem that requires a lot of extra typing. An ideal language would have the same concept, but with less (or, preferably, no) supporting syntax.

  • Enums. Java enums are way more powerful than the simple list of related integer constants in C. They allow you to effectively create constant instances of any class, with arbitrary sub-properties. Still, they're pretty heavyweight to set up, and (AFAIK) there's no way to create sequential integer enums or bitmasks in Java, like you can in C.

Bad:

  • Slow. A really good Java compiler can do a pretty good job with Java performance, but it's still not in the same league as C for high-performance graphics work. And then there's that garbage collector, always waiting in the wings to gum up your CPU with another asynchronous collection cycle.

  • Objects. Java isn't just object-oriented, it's object-obsessed. Data? Put it in an object. Functions? Wrap them in objects. Enums? Objects as well. Java just can't get its head around the idea that if you want a function pointer, maybe you just want a function pointer instead of yet another object! It wouldn't be so bad if functions were objects in Java (as they are in the similarly-named-but-entirely-different JavaScript), but Java's issue is not so much that it treats everything as an object, but rather that it can't do anything useful with anything that isn't.

  • Exceptions. Oh God, the exceptions. So many exceptions. Exceptions for everything, and all of them must be handled of course. No ignoring those exceptions! If something happens all the time, is it really an "exception"?

  • Boilerplate. Do I really need a "public static virtual AbstractHammerFactoryFactory"? All I wanted was a hammer!

Go

Good:

  • Duck-Typed Interfaces. I really like that Go treats any type as conforming to an interface if it implements the required methods. This reduces coupling between files in terms of needing to import shared declarations, etc.
  • Composition-Based Inheritance. This is so clever: If you embed a struct of type Foo inside a struct Bar then you can call Foo's methods on instances of Bar and it just does the right thing. This allows all of the benefits of inheritance but without most of the problems, and it allows for multiple inheritance, and it still lets you precisely control data layout within the struct.

Bad:

  • Syntactically Ugly. This is highly subjective of course, but I just don't like the way Go looks. It has most of the flaws of C, and the things it changes (e.g. putting types after variable names and methods), while logical, look alien to me.

JavaScript

Good:

  • Object / Array Literal Syntax. Though many other languages have similar syntax, JavaScript's approach for defining objects/hashmaps, arrays and regular expression literals is very nice.
  • Flexibility. JavaScript's closure and prototypical inheritance system is actually incredibly flexible, and although it can be confusing to novices, it is interesting how almost any programming paradigm (traditional single and multiple inheritance, mixins, interfaces, aspect-oriented programming, functional programming, etc.) can be easily grafted onto JavaScript as a metasyntax.

Bad:

  • Too Flexible. It's possible to replace any property or method of any object at runtime with JavaScript, and whilst this is very flexible, it makes it almost impossible to do any pre-validation of JavaScript code, or to guarantee that a piece of code does not contain trivial type errors that would be caught at compile time in other languages.
  • Slow. JavaScript's dynamic nature makes it incredibly hard to optimise. True ahead-of-time compilation is impossible due to the ability to generate code and reorganise classes at runtime, so the best that can be achieved is tracing JIT compilation, where code paths are compiled as needed. This, combined with garbage collection and weak typing, makes the language much slower than C, even though most implementations have been dramatically optimised in recent years.

Memory Management

There are a number of different approaches to memory management, each with strengths and weaknesses. Here are the most common. Which of these solutions (if any) should our ideal language use?

Manual Memory Management

This is the approach used by C/C++. In MMM, the programmer is responsible for releasing every block of memory when they are finished with it.

This is the most efficient approach from a runtime point of view, but places a great deal of burden on the developer to keep track of allocations and ownership. Determining which part of a program is responsible for releasing a given piece of memory can be a complex exercise, leading to the development of patterns such as...

Reference Counting

With reference counting, each object maintains a count of the number of pointers/references to it. Each time a reference is added, the count (known as the "retain" count) is incremented. Each time a reference is removed, it is decremented. When the retain count goes to zero, the object's memory is released.

This retain count can either be maintained manually by the programmer, or can be automated, either as part of the language runtime (e.g. Visual Basic) or by static analysis at compile time (e.g. Objective-C's ARC).

Reference counting involves extra work at runtime (incrementing and decrementing the retain count each time a reference is made), and is consequently a bit slower than MMM. The advantage over other automated systems like Garbage Collection is that the process is deterministic - a developer can predict when a piece of memory will be released, and it will be the same each time the program is run.

A disadvantage of Reference Counting over Garbage Collection is that it does not protect against retain cycles, where one or more objects that are otherwise isolated from the system cannot be released because they retain themselves or each other.

Garbage Collection

There are various forms of Garbage Collector, but they basically all follow the same pattern: The GC keeps track of all allocated memory blocks, and repeatedly scans the program for pointers to these blocks. If the scan reveals that there are no longer any pointers to a given block, it will be released.

Garbage Collection is the "safest" memory management system because it avoids retain cycles, but it is non-deterministic: there is no way for the programmer to control when a block of memory will be released, which can lead to excessive memory consumption (objects being retained too long) or performance problems (large blocks of memory being released at an inconvenient time, during a performance-critical task).

The End?

No, just the end of the beginning. I don't have answers to most of the questions above yet. That's where you come in. I'd like to see arguments for or against the "controversial" features listed above. Which of these are necessary for a good language? Are any of these actually bad? Are there superior alternatives?

Comments welcome.

@perlfly

perlfly commented Jan 21, 2014

If I'm not getting it wrong, Objective-C lets you switch from the object-oriented paradigm to the functional paradigm in the same implementation file (thanks to its standard C base).

Why couldn't something similar be done to switch between scripting needs and embedded needs?
Just thinking ... I'm not a programming language theorist

@agis

agis commented Jan 21, 2014

Given that we've already established that this is impossible, I don't expect it to be easy.

How can impossible be easy?

@perlfly

perlfly commented Jan 21, 2014

Reflecting on why most of us pray for a "perfect" programming language at least once a day ...
Are we searching for the lost keys under the lamppost, just because that's where the light is?

The first time I see a new method, I appreciate it has a long descriptive name. After using it a lot, I start asking for a shorter name :-|

Maybe what we really need is some kind of "Cognitive" IDE?

Among the "controversial" features, I think that it should not be Normalized mainly for two reasons:

  • There's no way that there is really only one way to achieve something, so trying to enforce it could lead to a lot of frustrating and weird artefacts.
  • Suppose the language could enforce Normalization. If someone then found a better way to achieve something (optimisations and the like), it couldn't be expressed ... so the language would not be perfect.

@nicklockwood
Author

@perlfly by "Normalized" I just mean that the language should try not to have multiple different syntaxes for the same operation, as this increases the learning surface area and makes the user have to wonder things like "is this way more efficient, or that way?", etc.

Examples:

In Objective-C you can loop over an array using any of the following:

for (int i = 0; i < [array count]; i++)
{
    id element = array[i];
}

for (id element in array)
{
    ...
}

[array enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
    ...
}];

And there's not really any indication which is the most efficient way (spoiler: it's the 3rd, according to bbum, who probably knows what he's talking about: http://stackoverflow.com/a/4487012/422133). Obviously this is because ObjC is an evolved language and the latter two methods were added later, but when designing a language from scratch it's better to try to avoid such duplication if possible.

I like that Go has only one loop type for example (for) so you don't have to decide if a for or while loop will be better.

@nicklockwood
Author

@Agis- that's the point.

@nicklockwood
Author

@perlfly switching between two different paradigms in one file is a solution to the performance vs simplicity thing, but it's not really an optimal solution. It places the burden on the developer to decide which to use at any given point in time, and they still have to effectively learn two languages.

It feels like this is something that the compiler should be able to do for you. Both functional and object-oriented code has to be converted to procedural code at assembly time anyway, so in principle the compiler could convert between paradigms as appropriate.

PS, C is "procedural", rather than "functional" in the sense of FRP. Not the same thing.
