The debate about strongly typed vs weak/dynamically typed languages is ancient. Here's a (hopefully) new take :P
Yesterday, before I fell asleep, I had an idea for a new way of handling function calls that ends up as a funny compromise between the two: it's stricter than, say, C++ in some ways and less strict in many others. I'm calling this idea a "strongly named" language.
First, some background on where this idea comes from:
Traditional strongly typed languages (C/C++/Java/C#/etc.) catch the errors where you're trying to use a variable as an integer but it's actually a string. But if you pass an integer that measures meters into a function that expects an integer measuring feet, this is silently accepted and may end up causing a failed Mars mission. Not perfect. (And no, Boost.Units or F# units of measure or similar don't help in general; they only handle this specific case of mismatched units.)
Traditional weakly typed languages (PHP) automatically try to figure out what you actually wanted, turning `10 + "1"` into `11`. This is just a disaster waiting to happen, so let's just learn nothing from this :P
Python is strongly typed but dynamic, which means it doesn't check anything until the very last moment. This can result in crashes deep inside your code in production because some code wanted a string and got an int or `None`. This sounds really, really bad but is surprisingly OK to work with. It does, however, require pretty good test coverage or other tooling just to catch simple spelling errors.
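A minimal Python illustration of both halves of that claim: the language refuses to coerce `10 + "1"` the way PHP does, but a function containing an obvious type bug is accepted when it is defined and only fails when the offending line actually runs. The names `describe` and `raises_type_error` are made up for this sketch.

```python
def describe(count):
    # Nothing in this body is checked when `def` runs; Python only discovers
    # the str + int mismatch when this line actually executes.
    return "count: " + count


def raises_type_error(fn, *args):
    """Return True if calling fn(*args) raises TypeError."""
    try:
        fn(*args)
    except TypeError:
        return True
    return False


# Strong typing: no PHP-style coercion of 10 + "1" into 11.
print(raises_type_error(lambda: 10 + "1"))  # True
# Dynamic typing: defining `describe` succeeded; the error only shows up on the call.
print(raises_type_error(describe, 3))       # True
print(describe("3"))                        # count: 3
```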
Another inspiration is Objective-C. Many are horrified at the syntax for calling a method in Objective-C, but the thing Objective-C does right is that it strongly favors readability/maintainability by naming all the parameters:
```objc
[@"foo" drawAtX:10 y:4 width:6 height:8 font:@"helvetica"]
```
The equivalent code in C/Java/Python is much less readable because you need to look up the parameters in the documentation or source:
```
drawString("foo", 10, 4, 6, 8, "helvetica")
```
Since C/C++/Java are statically typed languages, you can change a function's signature and get errors wherever the types no longer match up. But in the example above you can't safely reorder the parameters, for example by moving width & height in front of x & y, because they're all integers and the compiler would silently accept the wrong values in the wrong places. Objective-C, however, handles this case correctly because the name of the method changes from `drawAtX:y:width:height:font:` to `drawWithWidth:height:atX:y:font:`.
Now I think it's time to note that even though Objective-C is statically typed like C/C++, it is actually NOT the static type check that saves our bacon here. Even if there were no type checks at all, the compiler would still have saved us, because every call that uses the old method name would refer to a method that no longer exists.
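Python can already force call sites to name their arguments with keyword-only parameters (everything after a bare `*`), which gives Objective-C-style labels with no type checks involved. The catch is that Python only reports a wrong or missing name when the call executes, not when the file is parsed. `draw_string` is a hypothetical stand-in that just returns its arguments.

```python
def draw_string(text, *, x, y, width, height, font):
    # Parameters after `*` can only be passed by name, so every call site
    # spells out which value is the x coordinate, which is the width, etc.
    return (text, x, y, width, height, font)


# Named arguments can appear in any order; the names decide where values go.
a = draw_string("foo", x=10, y=4, width=6, height=8, font="helvetica")
b = draw_string("foo", width=6, height=8, x=10, y=4, font="helvetica")
assert a == b

# The positional C-style call is rejected, but only when this line runs.
try:
    draw_string("foo", 10, 4, 6, 8, "helvetica")
except TypeError as err:
    print("rejected:", err)

# A misspelled or renamed parameter is also only caught at call time.
try:
    draw_string("foo", x=10, y=4, w=6, height=8, font="helvetica")
except TypeError as err:
    print("rejected:", err)
```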
This got me thinking: what if we had a language with python-style late strong type checks but with compile-time (or parse-time) NAME checking?
A problem with Objective-C is that a lot of code ends up looking like `[@"foo" drawAtX:x y:y width:width height:height font:font]`. Writing "foo:foo" over and over quickly feels silly. Wouldn't it be better if you only had to write the name of the parameter when it differed from your variable name? Our example above would then be closer to `[@"foo" drawAt x y width height font]`. But if we had another string we'd like to draw at `y2`, we'd have to write `[@"foo" drawAt x y:y2 width height font]`. Now if the compiler checks these names, we can safely change our signature and trust that we'll get errors in all the places we need to change, and we still get all the readability of the best case of Python AND the best case of Objective-C at the same time!
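To make the shorthand concrete, here is a rough sketch of a parse-time name check using Python's standard `ast` module. It assumes a hand-written signature table (`SIGNATURES`, hypothetical) and treats a bare variable passed as an argument as being named by that variable, while `name=expr` gives an explicit name. Python's grammar doesn't allow a positional argument after a keyword argument, so the explicit `y=y2` has to go last; since arguments are bound by name rather than position, that doesn't matter here. The source is only parsed, never run, so the variables don't need to exist.

```python
import ast

# Hypothetical signature table; a real tool would collect this from the
# function definitions instead of having it written by hand.
SIGNATURES = {"draw_at": {"x", "y", "width", "height", "font"}}


def check_names(source: str) -> list[str]:
    """Report argument-name mismatches without executing `source`."""
    errors = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        params = SIGNATURES.get(node.func.id)
        if params is None:
            continue  # unknown function: nothing to check against
        where = f"line {node.lineno}: {node.func.id}"
        given = []
        for arg in node.args:
            if isinstance(arg, ast.Name):
                given.append(arg.id)  # shorthand: a bare variable is named by itself
            else:
                errors.append(f"{where}: argument needs an explicit name")
        for kw in node.keywords:
            if kw.arg is None:
                errors.append(f"{where}: **-unpacking hides the argument names")
            else:
                given.append(kw.arg)  # explicit name, e.g. y=y2
        for name in sorted({n for n in given if given.count(n) > 1}):
            errors.append(f"{where}: {name!r} passed twice")
        for name in sorted(set(given) - params):
            errors.append(f"{where}: no parameter named {name!r}")
        for name in sorted(params - set(given)):
            errors.append(f"{where}: missing argument {name!r}")
    return errors


print(check_names("draw_at(x, y, width, height, font)"))     # []
print(check_names("draw_at(x, width, height, font, y=y2)"))  # []
print(check_names("draw_at(x, y2, width, height, font)"))
```

The last call reports both that `draw_at` has no parameter `y2` and that `y` is missing. Renaming a parameter in `SIGNATURES` turns every stale call site into an error in the same way, which is the guarantee the paragraph above is after.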
What if we did this with return values too?
Let's look at a practical example, namely parsing ISO 8601 dates. Here's a bit of Python code to parse the timezone info from a string:
```python
1. def parse_timezone(s):
2.     assert s[0] in {'+', '-'}
3.     sign, timezone = s[0], s[1:]
4.     timezone = parse_time(timezone)
5.     minutes = timezone.hour*60 + timezone.minute
6.     if sign == '-':
7.         minutes = -minutes
8.     return TimeZone(timedelta(minutes=minutes))
```
Trying to run this in our hypothetical strongly named language would produce a bunch of errors:

- line 1: no name defined for what is returned
- line 4: the variable `timezone` we assign to doesn't match the name that `parse_time` returns (it returns `time`, for future reference)
We also see that on line 8 we're writing `minutes=minutes`, which is a bit redundant. Here's a first stab at the same thing in a strongly named pythonic language:
```
 1. def timezone:parse_timezone(_:s):
 2.     assert s[0] in {'+', '-'}
 3.     sign, timezone = s[0], s[1:]
 4.     time = parse_time(timezone)
 5.     minutes = hours_to_minutes(time.hours) + time.minutes
 6.     if sign == '-':
 7.         minutes = -minutes
 8.     return TimeZone(timedelta(minutes))
 9.
10. def minutes:hours_to_minutes(hours):
11.     minutes<-hours = hours * 60
12.     return minutes
```
- line 1: we define our returned name as `timezone`. `_:s` here means "I don't care what the variable that comes in here is called, but inside `parse_timezone` it will be called `s`".
- line 4: we've renamed our variable to `time` to make the compiler happy. We could instead have told the compiler to accept the rename by writing `timezone<-time = ...`, but this change is easier, shorter and more readable.
- line 5: we're calling `hours_to_minutes`, and the members of `time` are renamed to their plural forms because in this language you have to be careful about stuff like that :P
- line 8: `timedelta()` now gets just `minutes` passed. For this example, assume a simplified signature `timedelta(hours=None, minutes=None, seconds=None)` (Python's real `datetime.timedelta` starts with `days` and has more parameters). Since the name of our variable `minutes` matches the second parameter, it's passed in there and not to `hours`.
- lines 10-11: define the function `hours_to_minutes`. Note the `<-` syntax, which says that the value in the variable named `hours` should be put into a variable named `minutes`.
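For comparison, this is what a bare `timedelta(minutes)` does in real Python today. The standard library's `datetime.timedelta` binds positional arguments by position, and its first parameter is `days`, so the variable's name is simply ignored:

```python
from datetime import timedelta

minutes = 90
# Positional binding: the first parameter of timedelta is `days`,
# so this silently means 90 days, not 90 minutes.
by_position = timedelta(minutes)
# Passing the name explicitly is the only way to reach the `minutes` parameter.
by_name = timedelta(minutes=minutes)

print(by_position)  # 90 days, 0:00:00
print(by_name)      # 1:30:00
```

This is the kind of silent mix-up that binding by variable name is meant to turn into either the right call or a compile error.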
The changes from the Python version aren't very big. We've had to write a little more, delete some text and rename a variable. But what checks is the compiler performing on this code?
- line 4: `parse_time()` returns `time`, so it must be assigned to a variable named `time`. If `parse_time()` is changed to return something else, this line won't compile anymore.
- lines 5 and 7: `minutes` is assigned to `minutes`; everything is fine.
- line 8: `timedelta` takes an argument `minutes` as its second argument, as mentioned before, but there are two more checks on line 8: `TimeZone` takes an argument named `timedelta` and returns `timezone`, which matches the return name of the function.
- line 11: we explicitly say here that we know why we're going from one name to another.
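The two return-name checks (an assignment target must match the name the called function returns, and a `return` must match the function's declared name) can be roughly approximated on ordinary Python syntax with the `ast` module. In this sketch a string return annotation such as `-> "timezone"` stands in for the `timezone:` prefix, and `KNOWN` is a hypothetical table of return names for functions defined elsewhere. Only assignments whose right-hand side is a call are checked, and `<-` renames are not modelled.

```python
import ast

# Hypothetical return names for functions defined elsewhere.
KNOWN = {"parse_time": "time", "hours_to_minutes": "minutes", "TimeZone": "timezone"}


def name_of(expr, returns):
    """The name an expression carries: a bare variable's own name, or the
    declared return name of the function it calls (None if unknown)."""
    if isinstance(expr, ast.Name):
        return expr.id
    if isinstance(expr, ast.Call) and isinstance(expr.func, ast.Name):
        return returns.get(expr.func.id)
    return None


def check_returns(source: str) -> list[str]:
    funcs = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.FunctionDef)]
    returns = dict(KNOWN)
    for fn in funcs:  # `-> "name"` plays the role of the `name:` prefix
        if isinstance(fn.returns, ast.Constant) and isinstance(fn.returns.value, str):
            returns[fn.name] = fn.returns.value
    errors = []
    for fn in funcs:
        declared = returns.get(fn.name)
        if declared is None:
            errors.append(f"line {fn.lineno}: no name defined for what {fn.name} returns")
        for node in ast.walk(fn):
            if (isinstance(node, ast.Assign) and isinstance(node.value, ast.Call)
                    and len(node.targets) == 1 and isinstance(node.targets[0], ast.Name)):
                target, got = node.targets[0].id, name_of(node.value, returns)
                if got is not None and got != target:
                    errors.append(f"line {node.lineno}: {target!r} is assigned a value named {got!r}")
            elif isinstance(node, ast.Return) and node.value is not None and declared:
                got = name_of(node.value, returns)
                if got is not None and got != declared:
                    errors.append(f"line {node.lineno}: returns {got!r}, expected {declared!r}")
    return errors


BEFORE = """\
def parse_timezone(s):
    assert s[0] in {'+', '-'}
    sign, timezone = s[0], s[1:]
    timezone = parse_time(timezone)
    minutes = timezone.hour*60 + timezone.minute
    if sign == '-':
        minutes = -minutes
    return TimeZone(timedelta(minutes=minutes))
"""

AFTER = """\
def parse_timezone(s) -> "timezone":
    assert s[0] in {'+', '-'}
    sign, timezone = s[0], s[1:]
    time = parse_time(timezone)
    minutes = hours_to_minutes(time.hours) + time.minutes
    if sign == '-':
        minutes = -minutes
    return TimeZone(timedelta(minutes=minutes))
"""

print(check_returns(BEFORE))  # reports line 1 and line 4
print(check_returns(AFTER))   # []
```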
The strongly named version is longer and obviously still a bit more cumbersome. Let's give it another shot. Since the return name of a function is explicit, we can also use it to pick which function to call. Let's use that to simplify the code:
```
 1. def timezone:parse(_:s):
 2.     assert s[0] in {'+', '-'}
 3.     sign, timezone = s[0], s[1:]
 4.     time = parse(timezone)
 5.     minutes = convert(time.hours) + time.minutes
 6.     if sign == '-':
 7.         minutes = -minutes
 8.     return TimeZone(timedelta(minutes))
 9.
10. def minutes:convert(hours):
11.     minutes<-hours = hours * 60
12.     return minutes
```
- line 1: renamed the function to `parse`, or really `timezone:parse(_)`, since the names of inputs and returns are part of the function's name for lookup purposes.
- line 4: since we know the return name, we know this must be a call to `time:parse(_)`.
- line 5: call the conversion function `minutes:convert(hours)`.
The result is a little bit longer than the Python original, but in such a language a lot of standard conversion functions like `minutes:convert(hours)` should be available as convenience functions in the standard library. If you discount that, the strongly named example is actually 16 characters shorter, even though it's arguably more strictly checked than most C++ code.
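Lookup by return name can also be emulated at run time in today's Python. This sketch uses a hypothetical `returns` decorator that registers each function under a (return name, function name) key, and a hypothetical `call` helper that picks the overload from the name the caller expects back; the two `parse` functions coexist in the registry even though the second `def` rebinds the module-level name. It assumes `datetime.time.fromisoformat` (Python 3.7+) in place of `parse_time` and the standard `datetime.timezone` in place of `TimeZone`. Unlike the proposed language, a wrong name is only caught when `call` runs.

```python
import datetime as dt
from typing import Any, Callable

# (return name, function name) -> implementation
_REGISTRY: dict[tuple[str, str], Callable[..., Any]] = {}


def returns(name: str):
    """Hypothetical decorator: register a function under its return name."""
    def register(fn):
        key = (name, fn.__name__)
        if key in _REGISTRY:
            raise TypeError(f"{name}:{fn.__name__} is already defined")
        _REGISTRY[key] = fn
        return fn
    return register


def call(return_name: str, func_name: str, *args):
    """Pick the overload by the name the caller expects to get back."""
    fn = _REGISTRY.get((return_name, func_name))
    if fn is None:
        raise NameError(f"no function {return_name}:{func_name}")
    return fn(*args)


@returns("time")
def parse(s):
    return dt.time.fromisoformat(s)  # e.g. "05:30" -> datetime.time(5, 30)


@returns("minutes")
def convert(hours):
    return hours * 60


@returns("timezone")
def parse(s):  # noqa: F811 - coexists with time:parse in the registry
    assert s[0] in {"+", "-"}
    sign, timezone = s[0], s[1:]
    time = call("time", "parse", timezone)
    minutes = call("minutes", "convert", time.hour) + time.minute
    if sign == "-":
        minutes = -minutes
    return dt.timezone(dt.timedelta(minutes=minutes))


print(call("timezone", "parse", "+05:30"))  # UTC+05:30
```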
This is obviously a toy example, but I believe and hope that this approach would be even more useful for big code bases, where readability and maintainability are even more important. Comments and suggestions are welcome!