@boxed
Last active December 18, 2015 13:09
A Strongly Named Language


The debate about strongly typed vs weak/dynamically typed languages is ancient. Here's a (hopefully) new take :P

Yesterday before I fell asleep I had an idea for a new way of handling function calls that ends up at a funny compromise between the two: it's stricter than, say, C++ in some ways, and less strict in many others. I'm calling this idea a "strongly named" language.

First a summary of some basics of where this idea comes from:

Traditional strongly typed languages (C/C++/Java/C#/etc.) catch the errors where you're trying to use a variable as an integer but it's actually a string. But if you pass an integer that is a measure of meters into a function that expects an integer of feet, this is silently accepted and maybe ends up resulting in a failed Mars mission. Not perfect. (And no, Boost.Units or F# units or similar don't generally help; they just handle this specific case of mismatched units.)

Traditional weakly typed languages (PHP) automatically try to figure out what you actually wanted, turning 10 + "1" into 11. This is just a disaster waiting to happen, so let's just learn nothing from this :P

Python is strongly typed but dynamic, which means it doesn't check anything until the very last moment. This can result in crashes deep inside your code in production because some code wanted a string and got an int or None. This sounds really, really bad but is surprisingly OK to work with. It does, however, require pretty good test coverage or other tooling just to catch simple spelling errors.
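To make "the very last moment" concrete, here is a small sketch in plain Python. The function name describe is made up for illustration. The broken branch only fails when it is actually executed:

```python
def describe(count, verbose=False):
    # The verbose branch concatenates str + int, which Python refuses to do.
    if verbose:
        return "count: " + count  # TypeError, but only when this line runs
    return "count: %d" % count


# The common path works, so the bug can sit unnoticed for a long time.
ok = describe(3)

# The rarely used path blows up at run time instead of at parse time.
try:
    describe(3, verbose=True)
    failed = False
except TypeError:
    failed = True
```

Nothing complains until a caller actually passes verbose=True, which is why test coverage matters so much here.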

Another inspiration is Objective-C. Many are horrified at the syntax of calling a method in Objective-C, but the thing Objective-C gets right is that it strongly favors readability/maintainability by naming all the parameters:

[@"foo" drawAtX:10 y:4 width:6 height:8 font:@"helvetica"]

The equivalent code in C/Java/Python is much less readable because you need to look up the parameters in documentation or source:

drawString("foo", 10, 4, 6, 8, "helvetica")

Since C/C++/Java are statically typed you can change the signature and get errors if the types don't match up. But in the above example you can't safely swap x & y with width & height: all four are integers, so the old calls would still compile and silently pass the wrong values in the wrong places. Objective-C, however, handles this case correctly because the name of the method changes from drawAtX:y:width:height:font: to drawWithWidth:height:atX:y:font:.

Now I think it's time to note that even though Objective-C is statically typed like C/C++ it is actually NOT the static type check that saves our bacon here. Even if there was no check on the types at all we would still have been saved by the compiler.
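Python's keyword-only parameters give a rough feel for name-based checking without any static types, with the caveat that the check happens at run time rather than in a compiler. A minimal sketch, where draw_string and its parameters are hypothetical and simply mirror the example above:

```python
def draw_string(text, *, x, y, width, height, font):
    # Everything after the bare * must be passed by name.
    return {"text": text, "x": x, "y": y,
            "width": width, "height": height, "font": font}


# The call site reads much like the Objective-C version, and the order of
# the named arguments no longer matters.
a = draw_string("foo", x=10, y=4, width=6, height=8, font="helvetica")
b = draw_string("foo", width=6, height=8, x=10, y=4, font="helvetica")

# Purely positional use is rejected, but only when the call is executed.
try:
    draw_string("foo", 10, 4, 6, 8, "helvetica")
    rejected = False
except TypeError:
    rejected = True
```

Reordering the parameters in the signature cannot silently swap values here, because every value travels with its name.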

This got me thinking: what if we had a language with Python-style late strong type checks but with compile-time (or parse-time) NAME checking?

A problem with Objective-C is that a lot of code ends up with stuff like [@"foo" drawAtX:x y:y width:width height:height font:font]. Saying "foo:foo" over and over quickly feels silly. Wouldn't it be better if you only had to put in the name of the parameter when it differs from your variable name? Then our example above would be closer to [@"foo" drawAt x y width height font]. But if we had another string we'd like to draw at y2, we'd have to write: [@"foo" drawAt x y:y2 width height font]. Now if the compiler checks these names we can safely change our signature and trust that we'll get errors in all the places we need to change, and we still get all the readability of the best case of Python AND the best case of Objective-C at the same time!
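As a rough illustration of what parse-time name checking could look like, here is a sketch that uses Python's ast module to inspect source code without running it. The function check_call_names and the rule it applies are my own assumptions: a bare variable passed positionally must have the same name as the parameter it lands in, and a keyword argument counts as an explicit rename. It only knows about functions defined in the same source text.

```python
import ast


def check_call_names(source):
    """Return (line, message) pairs for positional arguments whose variable
    name differs from the parameter name at that position."""
    tree = ast.parse(source)
    # Collect the parameter names of every function defined in the source.
    signatures = {
        node.name: [arg.arg for arg in node.args.args]
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }
    errors = []
    for node in ast.walk(tree):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        params = signatures.get(node.func.id)
        if params is None:
            continue  # unknown function: nothing to check against
        for param, arg in zip(params, node.args):
            # Only bare variable names are checked; literals and other
            # expressions are let through in this sketch.
            if isinstance(arg, ast.Name) and arg.id != param:
                errors.append((node.lineno, "%s passed as %s" % (arg.id, param)))
    return errors


SOURCE = '''
def draw(text, x, y, width, height, font):
    pass

def caller(text, x, y, y2, width, height, font):
    draw(text, x, y, width, height, font)
    draw(text, x, y2, width, height, font)
    draw(text, x, y=y2, width=width, height=height, font=font)
'''

errors = check_call_names(SOURCE)
```

The second call to draw is flagged because y2 lands in the y slot without saying so. The third call is accepted because y=y2 spells out the rename.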

What if we did this with return values too?

Let's look at a practical example, namely parsing some ISO8601 dates. Here's a bit of Python code to parse the timezone info from a string:

1. def parse_timezone(s):
2.     assert s[0] in {'+', '-'}
3.     sign, timezone = s[0], s[1:]
4.     timezone = parse_time(timezone)
5.     minutes = timezone.hour*60 + timezone.minute
6.     if sign == '-':
7.         minutes = -minutes
8.     return TimeZone(timedelta(minutes=minutes))

Trying to run this in our hypothetical strongly named language would produce a bunch of errors:

  • line 1: no name defined for what is returned
  • line 4: the variable timezone we assign to doesn't match the name that parse_time returns (it returns time, for future reference)

We also see that on line 8 we're writing minutes=minutes, which is a bit redundant. Here's a first stab at the same thing in a strongly named pythonic language:

 1. def timezone:parse_timezone(_:s):
 2.     assert s[0] in {'+', '-'}
 3.     sign, timezone = s[0], s[1:]
 4.     time = parse_time(timezone)
 5.     minutes = hours_to_minutes(time.hours) + time.minutes
 6.     if sign == '-':
 7.         minutes = -minutes
 8.     return TimeZone(timedelta(minutes))
 9.
10. def minutes:hours_to_minutes(hours):
11.     minutes<-hours = hours * 60
12.     return minutes
  • line 1: we define our returned name as timezone. _:s here means "I don't care what the variable that comes in here is called, but inside parse_timezone it will be called s".
  • line 4: we've renamed our variable to time to make the compiler happy. We could have instead told the compiler to accept the rename by writing timezone<-time =... but this change is easier, shorter and increases readability.
  • line 5: we're calling hours_to_minutes and the members of time are renamed to their plural form because in this language you have to be careful about stuff like that :P
  • line 8: timedelta() now gets just minutes passed. The signature of timedelta() is timedelta(hours=None, minutes=None, seconds=None). Since the name of our variable minutes matches the second argument, it's passed in there and not to hours.
  • lines 10-11: define the function hours_to_minutes. Note the <- syntax, which is a way to say that the variable with the name hours should be put into a variable with the name minutes.

The changes to this code aren't very big. We've had to write a little bit more, delete some text and rename a variable. But what checks is the compiler performing on this code?

  • line 4: parse_time() returns time, so it must be assigned to a variable named time. If parse_time() is changed to return something else, this line won't compile anymore
  • lines 5 and 7: minutes are assigned to minutes, everything is fine
  • line 8: timedelta takes minutes as its second argument, as mentioned before, but there are two more checks on this line: TimeZone takes an argument named timedelta and returns timezone, which matches the return name of the function.
  • line 11: we explicitly say here that we know why we're going from one name to another

This code is longer and obviously still a bit more cumbersome. Let's give it another shot. Since the return name of the function is explicit we can also use that to match which function to call. Let's use that to simplify the code:

 1. def timezone:parse(_:s):
 2.     assert s[0] in {'+', '-'}
 3.     sign, timezone = s[0], s[1:]
 4.     time = parse(timezone)
 5.     minutes = convert(time.hours) + time.minutes
 6.     if sign == '-':
 7.         minutes = -minutes
 8.     return TimeZone(timedelta(minutes))
 9.
10. def minutes:convert(hours):
11.     minutes<-hours = hours * 60
12.     return minutes
  • line 1: renamed the function to parse or really timezone:parse(_) since the names of inputs and returns are a part of the name of the function for lookup purposes
  • line 4: since we know the return name we know that this must be a call to time:parse(_)
  • line 5: call the conversion function minutes:convert(hours)
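Lookup by return name can be imitated in plain Python with a registry keyed on (return name, function name). This is a sketch with made-up helpers (returns, call), not how the hypothetical language would implement it. The return name has to be passed in by hand here instead of being inferred from the assignment target, parameter names are left out of the key, and the timezone is returned as a plain minute offset to keep things short.

```python
_REGISTRY = {}


def returns(name):
    """Register the decorated function under (return name, function name)."""
    def decorator(func):
        _REGISTRY[(name, func.__name__)] = func
        return func
    return decorator


def call(return_name, func_name, *args):
    # Pick the overload by the name of the value we want back.
    return _REGISTRY[(return_name, func_name)](*args)


@returns("time")
def parse(s):
    hours, minutes = s.split(":")
    return int(hours), int(minutes)


@returns("minutes")
def convert(hours):
    return hours * 60


@returns("timezone")
def parse(s):  # same function name, different return name
    sign, rest = s[0], s[1:]
    hours, minutes = call("time", "parse", rest)
    total = call("minutes", "convert", hours) + minutes
    return -total if sign == "-" else total


offset = call("timezone", "parse", "-05:30")
```

Both functions named parse coexist in the registry because the return name is part of the key, which is the same trick as timezone:parse(_) versus time:parse(_) above.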

This is a little bit longer, but in such a language a lot of standard conversion functions like minutes:convert(hours) should be available as convenience functions in the standard library. If you discount that, the strongly named example is actually 16 characters shorter, even though it's arguably more strictly checked than most C++ code.

This is obviously a toy example, but I believe and hope that this method would be even more useful for big code bases where readability and maintainability are even more important. Comments and suggestions are welcome!

@boxed
Author

boxed commented Jun 19, 2013

@Hezix my point is that it really isn't in practice and I think that's a consequence of the language design at least partially.

@TrevorSundberg

Interesting idea. What would you do when calling a function twice? Say you had three points of a triangle and you wanted to compute the distances of two sides. If you had a function, say 'def meters:compute_distance(a, b)', and tried to call it twice you would have to use the same variable name 'meters' twice, correct?
