A Strongly Named Language

The debate about strongly typed vs weak/dynamically typed languages is ancient. Here's a (hopefully) new take :P

Yesterday before I fell asleep I had an idea for a new way of handling function calls that ends up at a funny compromise between the two: it's stricter than, say, C++ in some ways, and less strict in many others. I'm calling this idea a "strongly named" language.

First a summary of some basics of where this idea comes from:

Traditional strongly typed languages (C/C++/Java/C#/etc) catch the errors where you're trying to use a variable as an integer but it's actually a string. But if you pass an integer that is a measure of meters into a function that expects an integer of feet, this is silently accepted and maybe ends up resulting in a failed Mars mission. Not perfect. (And no, Boost.Units or F# units of measure or similar don't generally help; they only handle this specific case of mismatched units.)
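To make the unit problem concrete, here is a minimal Python sketch. The function `burn_seconds` and its 100-feet-per-second figure are invented for illustration; the only point is that the type annotations are satisfied while the units are wrong.

```python
def burn_seconds(distance_feet: float) -> float:
    """Hypothetical helper: assumes the craft covers 100 feet per second."""
    return distance_feet / 100.0


distance_meters = 3000.0

# A type checker is happy: a float was expected and a float was passed.
# The value is in meters, though, so the result is silently wrong.
wrong = burn_seconds(distance_meters)

# The caller has to remember the conversion; nothing enforces it.
right = burn_seconds(distance_meters * 3.28084)
```

Both calls type-check identically, which is exactly the gap the rest of this post is about.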

Traditional weakly typed languages (PHP) automatically try to figure out what you actually wanted, turning 10 + "1" into 11. This is just a disaster waiting to happen so let's just learn nothing from this :P

Python is strongly typed but dynamic, which means it doesn't check anything until the very last moment. This can result in crashes deep inside your code in production because some code wanted a string and it got an int or None. This sounds really really bad but is surprisingly ok to work with. But it does require that you have pretty good test coverage or other tooling just to catch simple spelling errors.
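A small sketch of what "strong but late" looks like in practice. The names `label` and `try_label` are made up for this example; the behaviour shown is standard Python.

```python
def label(count):
    # Fine for strings, fails for ints, but only when this line actually runs.
    return "items: " + count


def try_label(value):
    """Call label() and report a type mismatch instead of crashing."""
    try:
        return label(value)
    except TypeError as exc:
        return f"TypeError: {exc}"


ok = try_label("3")   # the str case works
bad = try_label(3)    # no coercion: Python refuses to add str and int
```

Unlike PHP, Python does not guess. It just doesn't tell you until the offending line is executed.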

Another inspiration I have is Objective-C. Many are horrified at the syntax of calling a method in Objective-C, but the thing Objective-C gets right is that it strongly favors readability/maintainability by naming all the parameters:

[@"foo" drawAtX:10 y:4 width:6 height:8 font:@"helvetica"]

The equivalent code in C/Java/Python is much less readable because you need to look up the parameters in documentation or source:

drawString("foo", 10, 4, 6, 8, "helvetica")

Since C/C++/Java are statically typed languages you can change the signature and get errors if the types don't match up, but in the above example it's obvious that you can't, for example, flip x & y with width & height because that'd silently pass the wrong parameters in the wrong place. Objective-C however handles this case correctly because the name of the method changes from drawAtX:y:width:height:font: to drawWithWidth:height:atX:y:font:.
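Today's Python can approximate the Objective-C behaviour with keyword-only parameters (the bare `*` in the signature). This is only a sketch: `draw_string` is a stand-in that returns its arguments instead of drawing anything.

```python
def draw_string(text, *, x, y, width, height, font):
    """Stand-in for a drawing call; everything after * must be passed by name."""
    return {"text": text, "x": x, "y": y,
            "width": width, "height": height, "font": font}


# Argument order no longer matters, because every value is named.
a = draw_string("foo", x=10, y=4, width=6, height=8, font="helvetica")
b = draw_string("foo", width=6, height=8, x=10, y=4, font="helvetica")

# The unnamed C-style call is rejected, although only at run time.
try:
    draw_string("foo", 10, 4, 6, 8, "helvetica")
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

The catch is that the rejection happens when the call executes, not when the code is parsed, which is the part the strongly named idea wants to move earlier.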

Now I think it's time to note that even though Objective-C is statically typed like C/C++ it is actually NOT the static type check that saves our bacon here. Even if there was no check on the types at all we would still have been saved by the compiler.

This got me thinking: what if we had a language with python-style late strong type checks but with compile-time (or parse-time) NAME checking?

A problem with Objective-C is that a lot of code ends up with stuff like [@"foo" drawAtX:x y:y width:width height:height font:font]. Saying "foo:foo" a lot of times quickly feels silly. Wouldn't it be better if you only had to put in the name of the parameter if it differed from your variable name? So our example above would be closer to [@"foo" drawAt x y width height font]. But if we had another string we'd like to draw at y2 we'd have to do: [@"foo" drawAt x y:y2 width height font]. Now if the compiler checks these names we can safely change our signature and trust that we'll get errors in all the places we need to change, and we still get all the readability of the best case of Python AND the best case of Objective-C at the same time!
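A parse-time name check like this can be roughly prototyped on top of ordinary Python with the standard `ast` module. The sketch below is my own approximation, not a real tool: `check_names` is a hypothetical helper that only looks at plain function calls whose positional arguments are bare variable names, and it treats a keyword argument such as `y=y2` as the explicit rename.

```python
import ast


def check_names(source: str) -> list[str]:
    """Report positional arguments whose variable name differs from the parameter name."""
    tree = ast.parse(source)  # parses only; the checked code is never executed
    # Parameter names of every function defined in the source.
    signatures = {
        node.name: [arg.arg for arg in node.args.args]
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }
    errors = []
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        params = signatures.get(node.func.id)
        if params is None:
            continue  # unknown function: nothing to compare against
        for param, arg in zip(params, node.args):
            if isinstance(arg, ast.Name) and arg.id != param:
                errors.append(
                    f"line {node.lineno}: {node.func.id}() expects "
                    f"'{param}' but got '{arg.id}'"
                )
    return errors


example = (
    "def draw(x, y, width, height):\n"
    "    pass\n"
    "\n"
    "draw(x, y, width, height)\n"   # names match: accepted
    "draw(width, height, x, y)\n"   # flipped: four complaints
)
errors = check_names(example)
```

Here `errors` holds one message per mismatched argument on line 5 of the example, while line 4 passes cleanly.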

What if we did this with return values too?

Let's look at a practical example namely parsing some ISO8601 dates. Here's a bit of python code to parse the timezone info from a string:

1. def parse_timezone(s):
2.     assert s[0] in {'+', '-'}
3.     sign, timezone = s[0], s[1:]
4.     timezone = parse_time(timezone)
5.     minutes = timezone.hour*60 + timezone.minute
6.     if sign == '-':
7.         minutes = -minutes
8.     return TimeZone(timedelta(minutes=minutes))

Trying to run this in our hypothetical strongly named language would produce a bunch of errors:

  • line 1: no name defined for what is returned
  • line 4: the variable timezone we assign to doesn't match the name that parse_time returns (for future reference, it returns time)

We also see that on line 8 we're writing minutes=minutes, which is a bit redundant. Here's a first stab at the same thing in a strongly named pythonic language:

 1. def timezone:parse_timezone(_:s):
 2.     assert s[0] in {'+', '-'}
 3.     sign, timezone = s[0], s[1:]
 4.     time = parse_time(timezone)
 5.     minutes = hours_to_minutes(time.hours) + time.minutes
 6.     if sign == '-':
 7.         minutes = -minutes
 8.     return TimeZone(timedelta(minutes))
 9.
10. def minutes:hours_to_minutes(hours):
11.     minutes<-hours = hours * 60
12.     return minutes

  • line 1: we define our returned name as timezone. _:s here means "I don't care what the variable that comes in here is called, but inside parse_timezone it will be called s".
  • line 4: we've renamed our variable to time to make the compiler happy. We could have instead told the compiler to accept the rename by writing timezone<-time =... but this change is easier, shorter and increases readability.
  • line 5: we're calling hours_to_minutes and the members of time are renamed to their plural form because in this language you have to be careful about stuff like that :P
  • line 8: timedelta() now gets just minutes passed. The signature of timedelta() is timedelta(hours=None, minutes=None, seconds=None). Since the name of our variable minutes matches the second argument it's passed in there and not to hours.
  • lines 10-11: define the function hours_to_minutes. Note the <- syntax, which is a way to say that the variable with the name hours should be put into a variable with the name minutes.
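For comparison with line 8: the real `datetime.timedelta` in today's Python takes `days` as its first positional parameter, so passing `minutes` positionally silently means days. A short sketch of the mistake that name matching is meant to rule out:

```python
from datetime import timedelta

minutes = 90

# Positional: the first parameter of the real timedelta is days,
# so this is 90 days no matter what the variable is called.
as_days = timedelta(minutes)

# Today's explicit form, the one the post calls redundant.
as_minutes = timedelta(minutes=minutes)
```

In the hypothetical language the short spelling would mean the second thing, because the variable name picks the parameter.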

The changes to this code aren't very big. We've had to write a little bit more, delete some text and rename a variable. But what checks is the compiler performing on this code?

  • line 4: parse_time() returns time, so it must be assigned to a variable named time. If parse_time() is changed to return something else, this line won't compile anymore
  • lines 5 and 7: minutes is assigned to minutes, so everything is fine
  • line 8: timedelta takes minutes as its second argument, as mentioned before, but there are two more checks on this line: TimeZone takes an argument named timedelta, and it returns timezone, which matches the return name of the function.
  • line 11: we explicitly say here that we know why we're going from one name to another
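The return-name check from line 4 can be sketched the same way with the `ast` module. Python has no `def time:parse_time(...)` syntax, so this sketch assumes the return name is written as a string annotation (`-> "time"`); that convention and the helper `check_return_names` are my own stand-ins, not part of any real tool.

```python
import ast


def check_return_names(source: str) -> list[str]:
    """Report assignments whose target differs from the callee's declared return name."""
    tree = ast.parse(source)  # parses only; the checked code is never executed
    # Assumed convention: the return name is a string annotation, e.g. -> "time".
    return_names = {
        node.name: node.returns.value
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
        and isinstance(node.returns, ast.Constant)
        and isinstance(node.returns.value, str)
    }
    errors = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Assign):
            continue
        call = node.value
        if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)):
            continue
        expected = return_names.get(call.func.id)
        if expected is None:
            continue  # callee declares no return name
        for target in node.targets:
            if isinstance(target, ast.Name) and target.id != expected:
                errors.append(
                    f"line {node.lineno}: {call.func.id}() returns "
                    f"'{expected}' but is assigned to '{target.id}'"
                )
    return errors


example = (
    'def parse_time(s) -> "time":\n'
    "    return s\n"
    "\n"
    "timezone = parse_time(text)\n"  # rejected: wrong name
    "time = parse_time(text)\n"      # accepted
)
errors = check_return_names(example)
```

Only the assignment on line 4 of the example is reported. Changing the annotation on `parse_time` would flip which line fails, which is the "safe signature change" property described above.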

This code is longer and obviously still a bit more cumbersome. Let's give it another shot. Since the return name of the function is explicit we can also use that to match which function to call. Let's use that to simplify the code:

 1. def timezone:parse(_:s):
 2.     assert s[0] in {'+', '-'}
 3.     sign, timezone = s[0], s[1:]
 4.     time = parse(timezone)
 5.     minutes = convert(time.hours) + time.minutes
 6.     if sign == '-':
 7.         minutes = -minutes
 8.     return TimeZone(timedelta(minutes))
 9.
10. def minutes:convert(hours):
11.     minutes<-hours = hours * 60
12.     return minutes

  • line 1: renamed the function to parse or really timezone:parse(_) since the names of inputs and returns are a part of the name of the function for lookup purposes
  • line 4: since we know the return name we know that this must be a call to time:parse(_)
  • line 5: call the conversion function minutes:convert(hours)
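Lookup by return name plus parameter names can be imitated at run time with a small registry. Everything here is a hypothetical sketch: the `returns` decorator and the `call` helper are invented for illustration, and a real strongly named language would resolve this at parse time instead.

```python
import inspect

# Keyed the way the post describes lookup: return name, function name
# and parameter names together identify a function.
_registry = {}


def returns(return_name):
    """Hypothetical decorator: register func under its full 'strong name'."""
    def decorator(func):
        params = tuple(inspect.signature(func).parameters)
        _registry[(return_name, func.__name__, params)] = func
        return func
    return decorator


def call(return_name, func_name, **named_args):
    """Find the function by return name, plain name and argument names, then call it."""
    key = (return_name, func_name, tuple(named_args))
    if key not in _registry:
        raise LookupError(f"no function {return_name}:{func_name}{tuple(named_args)}")
    return _registry[key](**named_args)


@returns("minutes")
def convert(hours):
    return hours * 60


@returns("hours")
def convert(minutes):  # same plain name, different strong name
    return minutes / 60


minutes = call("minutes", "convert", hours=2)
hours = call("hours", "convert", minutes=90)
```

Both `convert` functions coexist in the registry even though the second `def` shadows the first in the module namespace, which mirrors how minutes:convert(hours) and hours:convert(minutes) would be distinct functions.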

This is a little bit longer, but for such a language a lot of standard conversion functions like minutes:convert(hours) should be available as convenience functions in the standard library. If you discount that, the strongly named example is actually 16 characters shorter, even though it's arguably even more strictly checked than most C++ code.

This is obviously a toy example, but I believe and hope that this method would be even more useful for big code bases where readability and maintainability are even more important. Comments and suggestions are welcome!

It is an interesting idea, in the sense that it looks like you can reduce the amount of typing, provided that people follow a consistent naming model. It also might be considered an intuitive way of encoding information in variable names and forcing some level of consistency on them, since variables are currently often poorly named.

The main issues I see are 1/ whether this becomes a maintenance problem (e.g. refactoring - changing the name of a variable, is slightly more complex if it has been used in a function invocation), 2/ whether localization issues make this unworkable, e.g. code written by non-English coders and 3/ whether the benefit of the encoding mechanisms outweighs the difficulty of breaking tradition.

I think this would be an interesting idea to try. I feel like you could make a prototype of it in JavaScript if you wanted to.

I was going to claim that you confused strong/weak with static/dynamic, but then you correctly state that Python is strong and dynamic. However, the languages you list at the opening (C, C++, and Java – I’m not familiar enough with C#) are weakly typed (because of casting) but static.

That said, I like your idea. My own language is a bit like Objective-C in the named-parameter sense (except the parameters are a map, so you can pass them in any order rather than having to remember the order the API designer used). I’ll have to try adding some macros to see if I can support omitting the label when it matches the variable name. Handling returned names wouldn’t really work with my semantics, since it can’t determine the name from the argument position.

C can be extremely strongly typed. Just enforce strong typing as part of your coding standard and you can't flip arguments to a function (like width and height), i.e. typedef int width_t; typedef int height_t;

Markdown combines adjacent lines with no space between them into one paragraph. This feature is messing up your line-by-line error lists and making them really hard to read. You probably want to use bulleted lists, the closest equivalent. Write them like * list item.

For other readers, until this Gist is updated, here’s a version with fixed formatting. It also enables syntax highlighting of code, which I think improves readability.

@roryokane thanks!

@Hezix my point is that it really isn't in practice and I think that's a consequence of the language design at least partially.

Interesting idea. What would you do when calling a function twice? Say you had three points of a triangle and you wanted to compute the distances of two sides. If you had a function, say 'def meters:compute_distance(a, b)', and tried to call it twice you would have to use the same variable name 'meters' twice, correct?
