@non
Last active January 9, 2024 22:06
answer @nuttycom

What is the appeal of dynamically-typed languages?

Kris Nuttycombe asks:

I genuinely wish I understood the appeal of unityped languages better. Can someone who really knows both well-typed and unityped explain?

I think the terms well-typed and unityped are a bit of question-begging here (you might as well say good-typed versus bad-typed), so instead I will say statically-typed and dynamically-typed.

I'm going to approach this article using Scala to stand in for static typing and Python for dynamic typing. I feel I am credibly proficient in both languages: I don't currently write a lot of Python, but I still have affection for the language, and have probably written hundreds of thousands of lines of Python code over the years.

Losing static guarantees

Obviously the biggest problem with writing Python compared to Scala is that you have many fewer static guarantees about what the program does. I'm not going to sugarcoat this -- it's a big disadvantage.

Most of the other advantages have to be understood in terms of this. If you value compile-time guarantees you may be tempted not to acknowledge the advantages. I think this is a mistake. If you really want to understand what makes writing Python appealing (or even fun), you have to be willing to suspend disbelief.

Less rope

At least when I was writing Python, the right way to think about Python's types was structural: types are strong (a string can't become a number for example), but best understood as a collection of capabilities. Rather than asserting that x is a Collator, and then calling .collate() once that fact is established, we just call .collate() directly.
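This capability-oriented ("duck typing") style can be sketched in a few lines. The `Collator` idea and `.collate()` method come from the paragraph above; `FrenchCollator` and `sort_for_display` are invented for illustration:

```python
# Duck typing: any object exposing .collate() works; no Collator
# interface needs to be declared or checked up front.
class FrenchCollator:
    def collate(self, words):
        return sorted(words)  # stand-in for locale-aware ordering

def sort_for_display(collator, words):
    # We call .collate() directly rather than first asserting
    # isinstance(collator, Collator).
    return collator.collate(words)

print(sort_for_display(FrenchCollator(), ["pear", "apple"]))
# → ['apple', 'pear']
```

Any other object with a compatible `.collate()` method could be passed in without declaring a shared interface.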

Because types are not enforced, you would not tend to use them to guide API decisions. Compared to languages like Scala or Java, Python strongly encourages APIs that don't require exotic types, many parameters (unless sensible defaults can be provided for almost all of them), or deeply-nested structures.

In an interesting way, it keeps you from going all-in on object-oriented programming. You only want to create a class when a casual API user will understand how it works and what they would use it for. Otherwise, you tend to prefer using static methods (or similar) that can act on simpler data types. Similarly, there is strong pressure to use the standard collection types (lists, sets, dictionaries) in almost all cases.

This has a number of consequences:

  • You rarely have to wade through huge class hierarchies with poor documentation
  • APIs tend to be strongly-focused on what you want to do
  • You get much more mileage out of learning the default collections' APIs
  • Custom collections feel stronger pressure to conform to default APIs
  • You can assume most data has a useful string representation
  • You rarely have to worry about baked-in limitations of your types

To address the last point: you rarely have to worry about someone baking in the wrong collection type, or numeric type. If you have a type that behaves similarly, you can use that instead.

(A corollary here is that someone who comes to Python from e.g. Java may tend to produce APIs that are very hard to use.)

Distinctions without a difference

In Scala or Java, you often end up with tons of classes that are essentially data containers. Case classes do a great job of minimizing the boilerplate of these. But in Python, all of these classes are just tuples. You don't have to try to come up with names for them, or anything else. It is really liberating to be able to build modules that are much smaller, by virtue of not worrying about having to give things names, or which class to use.
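As a sketch of this: where Scala would introduce a case class, Python code often just passes plain tuples (the vendor/price shape here is invented for illustration):

```python
# Instead of defining a PriceQuote class, just pass (vendor, price) pairs.
def best_quote(quotes):
    # quotes is a list of (vendor, price) tuples
    return min(quotes, key=lambda q: q[1])

quotes = [("acme", 9.99), ("globex", 7.50), ("initech", 8.25)]
print(best_quote(quotes))  # → ('globex', 7.5)
```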

Abstraction as pattern recognition

Abstracting over things like variance becomes much simpler. It's interesting that many Python programmers see the major difference between lists and tuples as immutability (tuples are immutable), but it makes sense when you consider that both can be iterated over, indexed by number, and have no limitations on their size. Compare this to the difficulties of correctly expressing and abstracting over product types in Scala. Even with Shapeless' help, it is a lot of work.

More generally, finding abstractions in Python feels much more like pattern recognition. If you see two stanzas of code that are essentially the same, it is trivial to abstract over their differences and write a common method. This is true even when the differences are down to:

  • Field or method names used
  • Arity of functions or tuples
  • Classes instantiated
  • Class or package names
  • Imports needed
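For instance, two stanzas that differ only in which method they call can be unified with `getattr`; a minimal sketch (`apply_all` is an invented helper):

```python
# Two near-identical stanzas, differing only in the method called,
# collapse into one helper parameterized by the method's name and arity.
def apply_all(items, method_name, *args):
    return [getattr(item, method_name)(*args) for item in items]

words = ["static", "dynamic"]
print(apply_all(words, "upper"))       # → ['STATIC', 'DYNAMIC']
print(apply_all(words, "count", "a"))  # → [1, 1]
```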

In static typing, these sorts of abstractions involve figuring out how to relate the types of ASTs as well as the AST shape itself. It doesn't feel as much like a pure abstraction or compression problem as it does in a dynamic language.

Speculative programming

John De Goes made a point about fixing dynamic programs one-error-at-a-time, versus hundreds of compiler errors at once. I think he's right about that, but I don't think it does justice to why the approach sometimes feels better.

One of the first things we have to learn as programmers is how to emulate what the "machine" (a computer, interpreter, VM, whatever) is going to do. We learn to trace through programs imagining counters incrementing, data being allocated, functions being called, etc. We can use print to view intermediate values, we can raise exceptions to halt the process at an intermediate point, etc.

There are arguments that this is the wrong way to do programming. I find some of them convincing. But even for most people using static types, this is how they determine what their program will do, how they assemble it out of other programs, how they debug it, etc. Even those of us who wish programming was more like writing a proof do this.

One advantage Python has is that this same faculty that you are using to create a program is also used to test it and debug it. When you hit a confusing error, you are learning how the runtime is executing your code based on its state, which feels broadly useful (after all, you were trying to imagine what it would do when you wrote the code).

By contrast, writing Scala, you have to have a grasp on how two different systems work. You still have a runtime (the JVM) which is allocating memory, calling methods, doing I/O, and possibly throwing exceptions, just like Python. But you also have the compiler, which is creating (and inferring) types, checking your invariants, and doing a whole host of other things. There's no good way to peek inside that process and see what it is doing. Most people probably never develop great intuitions around how typing works, how complex types are encoded and used by the compiler, etc. (Although in Scala we are fortunate to have a lot of folks like Stephen Compall, Miles Sabin, and Jason Zaugg who do and are happy to talk about it.)

Not having to learn (or think about) this whole parallel system of constraints and proofs is really nice. I think it's easy for those of us who have learned both systems to ignore the intellectual cost to someone who is getting started.

An obvious question is why we have to mentally emulate a machine at all? In the long run I'm not sure we do. But with the current offering of statically-typed languages most folks are likely to use, I think we still do.

Where's the fire?

People are often confused that many scientists seem to love Python. But I think it makes sense.

Static typing is most useful in large, shared codebases where many of the main risks are misusing someone else's API, failing to refactor something correctly, or dealing with long-lived codebases full of deeply-nested interacting structures.

By contrast, a scientist's main concerns are probably mathematical errors (most of which the type system won't catch), methodological problems (even less likely to be caught) and overall code complexity. They are also unlikely to maintain code for very long periods of time or share codebases. This is someone for whom an empirical (and dynamic) runtime debugging process probably seems more pleasant than trying to understand what the type system and compiler are complaining about. (Even after their program compiles they will probably need to do the runtime testing anyway.)

Conclusion

I don't plan to stop writing code in Scala, Haskell, or Rust (or even C). And when I write Python these days, I do find that I miss the static guarantees and type-driven development. But I don't hate writing Python, and when I'm writing Scala I still find things to envy.

@non

non commented May 13, 2015

@chris-martin For the purposes of this essay, I just meant a type a reader is unlikely to have encountered before, which is specific to the API in question and/or not obviously useful. I feel like I see this in Java APIs a lot. For example, think of all the classes that live in java.io._ and now consider an author who just wants to iterate over the lines in a file. I would argue that many of the types involved in the "correct" Java/Scala solution feel exotic, at least compared to:

for line in open(path, 'r'):
    ...

Obviously you can write a Scala library that makes this particular example easier (I wrote one myself) but the argument there is that statically-typed languages may be more prone to this issue.

@EntilZha

That said, of course the data science ecosystem on the JVM is much more limited - yes, you can use JavaCV, NLP libraries like Stanford CoreNLP, etc., but there's no real scikit-learn equivalent. But, as an ML engineer, I'd be tasked with building that stuff, which is different from consuming it the way data scientists do.

@mkolod your thoughts exactly echo my own. I use Python mostly, but for at least a good portion of the time I would prefer to be writing in Scala. However, the libraries for Scala/JVM data science aren't as well oiled as the Python ones. A good example is the seamless integration between pandas, sklearn, matplotlib, and seaborn.

@mdedetrich

@barendventer

This reads more like a comparison between structural and nominal typing than between static and dynamic typing.

From a purely rational PoV sure, but then you get languages like Go (which is a statically compiled structural language) that get a huge amount of flack for similar reasons

@shalmuyakubov

@nuttycom, I wonder why nobody notices the elephant in the room: refactoring code in dynamic languages. Maybe people don't know what refactoring is and why it is important? I am sure you do know, but for those who don't, let me give a simple example.

Say you have a project of 100 classes. One of the classes has a property "id". You realize that it should be called "position", not "id", and you want to rename it. Safely. However, there are 4 other classes with a property "id", and in all those classes the name "id" is correct. That means you cannot just search/replace "id" across the entire project. Also important: all five of those classes are used everywhere, in all 100 others. Now then: how do you rename it safely in a dynamic language? You can try, then run the program, and if you're lucky you'll see the errors. But what if there is one piece of code that uses this "id" somewhere in a menu you rarely access? I know how to do it in a typed language: just rename the property, hit "compile", see all the errors, fix them one by one, and you're done.
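The hazard described above can be sketched in a few lines of Python (the `Marker` class and `label` helper are invented): a rename that misses one dynamically-spelled access fails only when that code path actually runs.

```python
class Marker:
    def __init__(self, position):
        self.position = position  # renamed from `id`

def label(marker):
    # This call site was missed by the rename: it still spells
    # the attribute dynamically, so search/replace never saw it.
    return "marker #%s" % getattr(marker, "id")

m = Marker(3)
try:
    label(m)  # only fails when this rarely-used path executes
except AttributeError as e:
    print("latent bug:", e)
```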

And it's only the simplest refactoring case. There is also "extract class", "inline class", "duplicated code removal", etcetera, etcetera.

So, maybe there is some magic tool to do this safely in a dynamic language that I am not familiar with? Please share.

Or maybe, people think "refactoring is not that important"? I beg to differ, because when you deal with big projects, refactoring is inevitable at some point.

@nuttycom, do you have an answer for it?

@perlun

perlun commented Mar 30, 2017

@shalmuyakubov, great points. I also happen to believe that refactoring is much harder in dynamic languages than in the static world. The IDEs and tools have by nature a harder time supporting such scenarios in the dynamic world.

I think the traditional answer to this in the dynamic camp is "unit tests", but I'm not sure I'd say it fully solves this. Static languages let me refactor fearlessly. Dynamic languages make it, unfortunately, much harder, because of their nature.


(I wish all language authors reading this would realize that this is a major reason why TypeScript is gaining in popularity. It is a fact that refactoring is much, much easier with TypeScript code than plain JavaScript. I have seen it with my own eyes.)

@yoelblum

yoelblum commented May 16, 2018

@perlun how do you refactor "fearlessly" in a static language? Maybe something in the front end is gonna break? Maybe the compilation passed but your change will cause some logic to fail somewhere? (I'm not talking only about renaming, but about changing method implementation, behavior, etc.) You're making it sound as if you can change whole methods and classes and be done with it just because your IDE isn't showing any errors; that's far from true. I'm sure Java programmers write tests and do manual testing in the browser, like the rest of us dynamic programmers, to make sure their sh** works.

You may be able to refactor with "less fear", but honestly I think people give way too much credit to "compilation correctness".

@yoelblum

The trend I'm seeing is that languages like C# or Java are trying to become more dynamic and concise (var, etc.), while languages like Python/Ruby are getting better and better IDEs and tooling meant to solve some of the pain you are talking about. So it seems to me the gap between the two worlds is narrowing.

@DrMetallius

DrMetallius commented Jul 21, 2018

This writeup focuses too much on why dynamic typing makes code easier to write, but completely leaves out the fact that it also makes code more difficult to read. And reading code constitutes a far larger portion of the time a programmer spends.

Most consequences of dynamic typing in the section "Less rope" mainly amount to everything being nameless tuples and collections. That's great if you are writing the code and know what is supposed to be at a given place, but how are you supposed to know that in code you didn't write? Having to "wade through huge class hierarchies with poor documentation" is bad, yes, but why is it supposed to be better when you reduce this to just a bunch of collections and tuples with poor documentation? Now you don't even have type information to rely on; all you have is trial and error and source-code diving.

The author even writes,

But in Python, all of these classes are just tuples. You don't have to try to come up with names for them, or anything else.

Yeah, it's great when you are hacking something really fast. But how am I supposed to know what a (String, String) is? Is that first name and last name? Is that login and password? How am I supposed to understand that? Right, named tuples fix it. And guess what? You've just invented a class. For example, in Kotlin I would most of the time write data class Person(val firstName: String, val lastName: String). Isn't this the same minus the type declarations?
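The Python counterpart of that Kotlin data class can be written with `typing.NamedTuple`; a minimal sketch (the `Person` fields mirror the Kotlin example above):

```python
from typing import NamedTuple

# A named, typed record that is still a tuple underneath.
class Person(NamedTuple):
    first_name: str
    last_name: str

p = Person("Ada", "Lovelace")
print(p.first_name)              # → Ada
print(p == ("Ada", "Lovelace"))  # tuple compatibility → True
```

So the "invented class" the comment describes already exists in Python's standard library, while retaining tuple behavior.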

The "Speculative programming" section is probably the only one I agree with somewhat, but even here you have to remember that "learning how the runtime is executing your code" may come to bite you later if you've made wrong assumptions about the types. Let's say I've got one of those nameless collections which is supposed to represent an HTTP response. I see a response code, I see a body, I use them. But how do I know what happens when a 500 error happens? Will I get a special entry for the error message? How will it be named? Until this error actually happens, no idea. Good if I can read this in the documentation. If not - lengthy source code reading. And if I had a type for the response, it would've been trivial.

@yoelblum That's because you are thinking about refactoring in terms of dynamic languages. You don't just replace id with position with find-replace and pray it compiles, you let the tools do that for you. Since they have all available type information, they are guaranteed to do the correct thing. Of course, there are more complex cases of refactoring where you do need unit tests, but all the trivial cases are handled effortlessly.

As to your second post, there's nothing dynamic about var in Java, there still is a very concrete type behind it. You just don't have to write it out explicitly. Type inference is pretty common among statically typed languages, and it has been there for ages, just not in Java.

@FrankHB

FrankHB commented Aug 11, 2018

Here is a missed point in the area of language design and its evolution: deriving several statically typed languages from a single untyped language is far easier and cheaper than the approach based on a statically typed language. Concerning portability and the learning curve, static typing often has a poor performance-price ratio as a general-purpose language design strategy adapting to various tasks, where ideally both dynamic typing and static typing are useful, and a single hard-coded choice would sooner or later breed resentment due to lack of generality. (What the heck, I wasted time learning such a limited type system; why can't I re-spec the type system; and so on.)

Most people ignore the fundamental things just because they have no chance to make new languages (or even bare-metal type system implementations) by themselves. If they do invent, say, Typed Racket based on Racket, they should have recognized the importance of such facts: the order of specification and the experience of implementation make significant differences, at least in the amount of work. (Personally I use my homebrew language, similar to the Kernel language, to overcome these problems.)

Some other specialists, like Robert Harper, would not agree with this, as they may insist that static typing is in the nature of any non-trivial language. However, philosophical preference should not rule here. Perhaps some of them would prefer to neglect that the untyped lambda calculus has more expressiveness than typed ones, and that set theories still beat category theory in many cases... But as long as no language spec is written, who should care?

BTW, a similar point is also valid for code/data separation. Separation is trivial, but deduplicating the design of language features to erase the gap is not. Reflection (if bloat in the spec is not a concern) is relatively easy to add to a language spec, but homoiconicity is not. There are enough lessons here.

@FrankHB

FrankHB commented Aug 11, 2018

@DrMetallius Difficulties in reading should be expected in any undisciplined use of a seriously designed language. In short, if a language leads to such difficulties, it is the fault of either a) the reader, b) the writer, or c) the designer of the language; not of dynamic typing. A properly designed dynamic and dynamically typed language can ultimately pretend to be a static and statically typed language by using some uniform in-object-language extensions (e.g. hygienic macros), even without any re-implementation of the language itself. (The performance of a naive implementation would be worse, but that is a QoI issue anyway.)

@tiesvandeven

Nice article, but the way I see it, you always have types... in any language. The types are just made explicit in typed languages.
In both typed and untyped languages, a function might require a (String, String, int) argument, for example. So you still need to look at the function's documentation, or even its implementation, to see what type you need to supply, or else the code will just fail at runtime. In a statically typed environment, you assign a name to this type, and the compiler checks it for you.

So both statically and dynamically typed languages have types :P. Basically the whole discussion is about doing a bit of extra work implementing API objects and getting compile-time help and checking, versus not doing that extra work and relying on tests to do this for you.
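The point about naming the implicit (String, String, int) contract can be sketched with Python's optional annotations, which a checker like mypy enforces while the runtime ignores them (the function and its names are invented for illustration):

```python
def make_badge(first: str, last: str, year: int) -> str:
    # The (str, str, int) contract is now explicit in the signature;
    # a static checker can enforce it, but CPython itself will not.
    return f"{first} {last} ({year})"

print(make_badge("Grace", "Hopper", 1906))  # → Grace Hopper (1906)
# make_badge(1906, "Grace", "Hopper") would still run here without
# error, but mypy would reject the call at check time.
```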

@aoeu256

aoeu256 commented Aug 6, 2019

For refactoring a piece of Python code that has 100 classes: I don't think you need classes as much in most dynamic languages, because you have closures, hashtables, instance attributes, etc. Unless you're doing GUI/simulation programming, don't use classes; use closures when possible, since classes are more brittle and verbose. To refactor dynamic code you'll need to do several hard things: use a type-inference tool that pretends Python is a statically typed language like Haskell (I believe PyCharm/mypy does this). Every time you set or get the id field you can log the call stack and line number using Python's inspect with currentframe().f_back, or sys.settrace, or you can use class decorators / metaclasses, installing a metaclass on every object in your program that checks whether id is being written and checks the object type. But you still need to hit every code path :(. In Lisp this is less of a problem than in Python, because you can redefine methods as macros to get static checking, and its simple parse tree lets you build a walker (like that one-page eval, but with some lines changed) that does type inference or even adds your own typing system (look at clojure.spec, clojure.typed, SBCL's type inference, Typed Racket, etc.).
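One of the tracing tricks mentioned above, intercepting writes to the id field, can be sketched with `__setattr__` (the `Tracked` class is invented; the metaclass approach in the comment would generalize this to existing classes):

```python
class Tracked:
    def __setattr__(self, name, value):
        if name == "id":
            # Log every write to `id` so call sites can be located
            # before attempting the rename.
            print(f"write to id: {value!r}")
        object.__setattr__(self, name, value)

t = Tracked()
t.id = 42        # prints: write to id: 42
t.other = "ok"   # no log line
```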

@FrankHB

FrankHB commented Nov 9, 2019

is that you always have types... in any language

Correct and wrong. In the sense of the meta language, this is more or less true, since you always follow rules with normalized properties that can be modeled as a type system with several nontrivial nominal types. It is false because an object language can easily have the ability to build a type system from scratch without relying on any typing rules given by the meta language, only on some deduction rules. In the latter case, types need never be in the language (provided by language rules determined by the language designers); they can instead be outside the language, modeled by users. Anything you call a "type" is initially unknown before being expressed by some untyped terms in the object language.
