Skip to content

Instantly share code, notes, and snippets.

@RocketPuppy
Last active November 3, 2017 18:07
Show Gist options
  • Star 5 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save RocketPuppy/5be05115250dea34393bf91cdf5e7802 to your computer and use it in GitHub Desktop.
Save RocketPuppy/5be05115250dea34393bf91cdf5e7802 to your computer and use it in GitHub Desktop.

Uncle Bob, you recently wrote a post titled Types And Tests. It generated quite a lot of buzz. It was also, in my opinion, a bit naive regarding the current state of affairs in type systems. This made it difficult for me to understand exactly what your points were and I would like to understand those better. To help clarify those I'm going to provide counter examples to your claims and propose other questions that both might help me better understand what you mean. Hopefully someone else finds this useful as well. I'll be using Ruby and Haskell for most code snippets as they are representative of the two sides and I'm most familiar with them right now.


No, types are not tests. Type systems are not tests. Type checking is not testing. Here’s why.

This quote wasn't really a comprehensive point, but I want to take it as an opportunity to point out the difference between types and tests to my understanding. After all it's important to make sure that we're arguing from the same base, otherwise we'll just talk past each other.

I agree that types are not tests. They both are tools used to provide guarantees and place constraints on how code can change. The primary difference is in how they do this. A test takes a single, representative example, initializes the system or performs an operation with it, and then checks that the final state conforms to a set of assertions. Here's a trivial test:

describe 'add' do
  it 'returns a number when given two numbers' do
    expect((1 + 2).class).to be Number
  end
end

The test says that for the numbers 1 and 2, when I add them together I get a number back. This is sometimes a reasonable test to write, especially in the presence of languages like Javascript which will do implicit coercion from numeric values to string values. Notice that this doesn't say anything about what happens if I pass two different numbers to +, or if I pass a number and a string.

Where a test speaks about specific, representative examples, a type speaks about sets of values with certain properties. A type can speak about all numbers, or all strings, whereas a test cannot because those sets of values are infinite. A type can also speak about all Foo, where Foo is a domain object in our domain. This immediately lets us make much broader constraints that make our code safer to change. Here's an example type:

add :: Num a => a -> a -> a

Here this says that the add function must take two Numeric values (these can be integers, floats, etc. as long as they are both the same type) and return a Numeric value of the same type. That is, this will add floats, integers, rationals, etc. together but won't add two values of different types. This is different from the Ruby version, which will do some implicit casting. It is primarily just a design choice to not do any implicit casting of numeric values, and not necessarily a shortcoming of the type system (see this article on the Haskell wiki).


Types do not specify behavior. Types are constraints placed, by the programmer, upon the textual elements of the program. Those constraints reduce the number of ways that different parts of the program text can refer to each other.

This is the paragraph that I, and probably many others, found most objectionable. In the second part I think you're getting at the idea that type checkers are static and all they can operate on is the text of the program at rest. This is true. However they do not operate on the raw textual elements of the program at all, but on the syntactical elements. The phrasing as written seems to relegate type checkers to nothing more than glorified spell checkers, which is far from their true nature.

The first portion of this paragraph I also found objectionable. It says "Types do not specify behavior." I've heard this word "behavior" before in the context of "Objects have behavior", but I haven't heard a consistent definition of it yet. I'm curious what your definition is. I want to give an example of what I find is a reasonable test to write and its equivalent type.

describe 'Order#purchase' do
  it "raises an error when the order isn't confirmed" do
    order = Order.new(price: 3.00, status: :unconfirmed)

    expect { order.purchase }.to raise UnconfirmedOrder
  end

  it "changes the status of the order to purchased" do
    order = Order.new(price: 3.00, status: :confirmed)

    expect(order.purchased?).to be false

    order.purchase

    expect(order.purchased?).to be true
  end
end

This is a test that specifies the behavior of some method purchase on some object Order. It specifies two of what I consider behaviors of the method: that only confirmed orders can be purchased, and purchasing a confirmed order changes its status to be purchased.

Here's the corresponding type definitions in Haskell.

data Order status = Order { price :: Integer }
data PurchasedStatus
data ConfirmedStatus

purchase :: Order ConfirmedStatus -> Order PurchasedStatus

I've defined three datatypes here. The first is what an Order looks like. It just has a price which is an Integer. It also is parameterized on a status type. The second two are empty types. That means you can't construct a value of those types, but you can use it in a type signature as a tag. These are what I use to parameterize Order in the signature of purchase. Read in the language of types, the signature of purchase says two things. First it says that I cannot call purchase with anything but an Order that has been confirmed. This is equivalent to the first test above. The second thing the signature says is that I can treat the Order that is returned as if it has been purchased. This is equivalent to the second test above. Note again that the tests are only guaranteed to apply a constraint for those specific orders that were created in the test, whereas the types apply the constraint for all possible order values. This technique used in the Haskell example, if you're interested, is called "phantom types".

I'll give a fair shake to the Ruby type system in that it can also achieve some of the same guarantees as we do with the Haskell type system. See:

class ConfirmedOrder
  def purchase
    # do magic
    return PurchasedOrder.new
  end
end

class PurchasedOrder
end

In the same manner that I said purchase can only be called on a confirmed order in the Haskell code, now I can only call purchase on a confirmed order in the Ruby code.

Note that I did not specify in either tests or types what exactly happens when an order is purchased. It might be that a credit card is charged or an email is sent. Those are both things that we would test regardless of the type system.

My question to you, Uncle Bob, is if this is an example of behavior that you would test? If it is, I think you'll be forced to agree that types can specify behavior, and types can replace some tests. If it is not I would like to know why, and what behavior you would then test.


You might therefore say that the type system is a kind of “test” that fails for all inappropriate invocations of f. I might concede that point except for one thing – the way f is called has nothing to do with the required behavior of the system. Rather it is a test of an arbitrary constraint imposed by the programmer. A constraint that was likely over specified from the point of view of the system requirements.

I'd like to ask what an inappropriate invocation of f is. For example, if f is Order#purchase as above then is this an inappropriate invocation?

unconfirmed_order = Order.unconfirmed.find(params[:id])
unconfirmed_order.purchase

I think this is inappropriate, because I already established above that Order#purchase should not be called on unconfirmed orders. If it is not inappropriate, why is that? Let's assume it is inappropriate. Doesn't that invocation impede the required behavior of the system? As another example, is this inappropriate:

credit_card = CreditCard.new(number: nil, expiration: nil, name: nil, cvv: nil)
credit_card.charge(5.00)

Assume that the number, expiration, name and nil parameters are required to make a charge. This seems totally inappropriate to me, and also contrary to the desired behavior of the system.

Finally I'd like to propose that the constraints specified by tests and the constraints specified by types overlap. So what makes the constraint specified by a type any more arbitrary and over specified than the constraint specified by a test?


Now this is no small thing. Writing program text that is internally consistent is pretty important. Inconsistencies and ambiguities in program text can lead to misbehaviors of all kinds. So I don’t want to downplay this at all.

On the other hand, internal self-consistency does not mean the program exhibits the correct behavior. Behavior and self-consistency are orthogonal concepts. Well behaved programs can be, and have been, written in languages with high ambiguity and low internal consistency. Badly behaved programs have been written in languages that are deeply self-consistent and tolerate few ambiguities.

There's a contradiction in these two paragraphs. First you state that "inconsistencies and ambiguities in program text can lead to misbehaviors of all kinds." Then you state that "Behavior and self-consistency are orthogonal concepts." You can't have both. If they were truly orthogonal then inconsistencies could not lead to any behavioral changes, let alone incorrect ones! I suspect that you are actually speaking about two different perspectives of behavior here: that of behavior specified by the programmer; and that of behavior specified by the user. I want to be clear that these are two different sets of behavior.

The programmer's job then is to translate the behavior specified by the user into a program that has the same behavior. That translation cannot be helped by types or tests because it happens before that step. However when it comes time to write the program the programmer must specify the behavior they want to the computer. This is where types and tests can help. A self-consistent specification of behavior is one that will not misbehave. That is to say, it will do exactly as specified. It's up to the programmer to specify it correctly. Finally as to the last part of the second paragraph, a language can help, hinder, or ignore the programmer in writing a self-consistent program. If the programmer has a consistent mental model of the program then it will itself be consistent regardless of the language.


Clearly, every language chooses the latter option. No language forces the program text to be absolutely unambiguous and self consistent. Indeed, such a language would likely be impossible to create. And even if it weren’t, it would likely be impossible to use. Absolute precision and consistency has never been, nor should it ever be, the goal.

I'm unsure what you're trying to say here. It seems to me that one of the most desirable properties in a programming language is to be unambiguous and self-consistent. A language that does not have those properties would be difficult to use because it would be difficult to predict. You can see these properties being violated in undefined, implementation defined, or confusing and unpredictable behavior in languages. It's why Javascript and PHP are so often the target of jokes and frustration.


The problem with increasing the level of precision and internal self consistency, is that it implies an increase in the number of constraints. But constraints need to be specified; and specification requires notation. Therefore as the number of constraints grows, so does the complexity of the notation. The syntax and the semantics of a language grows as a function of the internal self-consistency and specificity of the language.

This paragraph made me think you were arguing against specific syntax for specifying types, not against static typing itself. Which one is it? There are strong, statically typed languages with minimal syntax for types.


Do languages like this exist? No; but the more type-safe a language is, the more internally consistent and specific it forces the programmers to be, the more it approaches that ultimate TDP condition.

What do you mean by "internally consistent and specific" here? If you mean that every type must be specified precisely as a primitive or user defined type then I'd have to say this paragraph is incorrect. Polymorphism can be worked into a strongly type-safe language and allows programmers to be generic to the point of ridiculousness. Haskell is living proof of this.

There's a relevant digression here in that when you make a function or method more generic you must also make its implementation more specific. More generic means fewer assumptions and with fewer assumptions to rely on there are less ways to implement a solution. This can be counter-intuitive, but an example might help. Consider the following functions:

id :: a -> a

not :: Bool -> Bool

foo :: Int -> Int

The id function must be able to operate on any value, and return any value. That is to say it cannot make any assumptions about its inputs or outputs. Therefore it cannot do anything but return its input, unchanged. It has one possible implementation:

id x = x

The not function takes a boolean, which I know must be one of two values. It also returns a boolean. I can easily enumerate all the possible implementations of not:

-- Implementation 1
not True = True
not False = False

-- Implementation 2
not True = False
not False = True

-- Implementation 3
not True = True
not False = True

-- Implementation 4
not True = False
not False = False

The foo function operates on integers which I know are infinitely many. Therefore there are infinitely many possible implementations of foo.

To bring it back to the point at hand, a type-safe language does not mean a specific language. Polymorphism is vitally important, and perhaps more useful in a strong type-safe language. It allows for the observation of assumptions (or the lack thereof) which allows for fewer possible implementations, less ways for functions to go wrong, and thus less required test cases.


Every step down that path increases the difficulty of using and maintaining the language. Every step down that path forces users of the language to get their type models “right” up front; because changing them later is too expensive. Every step down that path forces us back into the regime of Big Design Up Front.

In what way are you saying languages get more expensive? If you mean that a single change requires more subsequent changes to remain consistent then I'll agree with you. However every place where those changes need to be made is pointed out to you by the compiler. So the programmer has to spend very little time finding those places. Self-consistency must be maintained in general, so those changes would need to made whether or not the language forced you to make them. I know I would rather free up that mental capacity for more important work.


This was quite a long rebuttal, and I've made quite a few points along the way. I'll summarize them here:

  • Types can specify behavior
  • Types can provide stronger guarantees than tests
  • Strong, statically-typed languages can exist with minimal syntax for types
  • Self-consistency is important for programmer specified behavior to be correct, but it is not enough to determine if user-specified behavior is correct.
  • Types are not a replacement for all tests, and both can be used together to create something more than the sum of their parts.
  • A static type-checker makes a cost trade-off, trading off mental energy to maintain self-consistency for manual work to maintain self-consistency
  • A program text is desired to be unambiguous and consistent in order to allow the programmer to predict its behavior.
  • Finally, types are not a replacement for tests (yet), but they can be used to dramatically reduce the number of tests required to have confidence in a system.
@unclebob
Copy link

Daniel,

Thank you for the thoughtful rebuttal. I appreciate the time and effort you put into this. I also appreciate your tone of collegial debate. I shall address your points in order.

  1. Your initial example using the add function in Ruby and Haskell shows something important. The Ruby program was testing a particular behavior; whereas the Haskell example was precluding that behavior. The behavior, of course, is the implicit casting. The Ruby example was incomplete and should have included many other tests such as the following:
describe 'add' do
  it 'returns a number when given two numbers' do
    expect((1 + 2).class).to be Number
  end
  it 'returns a string when given two strings' do
    expect(("a" + "b").class).to be String
  end
end

The Haskell example was complete. No extra behavior was implied. So the two situations are hardly equivalent; and the Ruby situation encompasses a large swathe of behavior whereas the Haskell situation precludes all that behavior.

  1. Your second example follows suit by creating a certain behavior in Ruby which is precluded by the Haskell example. That behavior is the throwing of an exception. The Haskell example cannot throw that exception. The Ruby example will.

The point you were making was that the Haskell code was defining the behavior that the purchase function converts a ConfirmedOrder to a PurchasedOrder. I shall concede the point that this is a behavior -- but it is a behavior of a very particular kind that I shall call static behavior. In other words, it is a runtime behavior that cannot be dynamically altered at runtime. It is a more of a behavioral constraint than a specification of behavior.

At this point the argument could reduce down to the semantics of the word behavior which I doubt either of us wants. Suffice it to say that the behavior specified by the type system is of a special, static, kind.

Something else happened in this example. You replaced a runtime flag with a compile time type. In effect you translated a runtime value with a compile time type. Indeed, this is what types systems always do.

Now, let's go back to the purchase function. We need to get our ConfirmedOrders from some source. Let's say that they are delivered to us through a socket from a micro-service. Our code must parse the incoming orders and determine whether they are confirmed or not. Those that are confirmed have to be loaded into a ConfirmedOrder. Thus, the if statement that you had written in the Ruby version of purchase must still exist somewhere. It hasn't been eliminated. It's just been moved. Moreover, the exception that you had your Ruby program throw, still has to be thrown somewhere. Again, moved but not eliminated.

Now I will happily agree that separating the code that checks and throws, from the code that processes valid purchases is a good thing; and that the type system, in this instance, encourages that separation. That is one of the powers of a good type system. However, I note that the behavior that you suggested was specified by the type system is really still encoded into if and throw statements elsewhere in the code. So, although the type system did specify the static behavior of the code, the dynamic behavior still needed to be written. The type system could not specify that dynamic behavior.

  1. The tactic of replacing runtime values with compile time types can be extended to your example of Booleans. Rather than allow a Boolean variable we could specify False and True to be types. Having done that, we could write the not function as two overloaded functions:
False not(True) {return new False();}
True not(False) {return new True();}

Just as the ConfirmedOrder and PurchasedOrder created static behavior, so too does this new definition of False and True.

Indeed, this is just what the Nullable types in Swift and Kotlin do. They replace the dynamic value of null with a static type.

The more we replace dynamic values with static types the more we opt for static behavior over dynamic behavior. There is a balance point at which this becomes detrimental. Taken to the extreme we might eliminate all dynamic values and simply specify every possible integer and every possible floating point number, and every possible character as an individual type. The add function would then simply be coded as 2^128 different overloaded functions. Clearly this would be absurd. Going beyond that balance point is what I call The Dark Path.

So, to your conclusion:

  • Types can specify behavior

True, but only a very particular kind of behavior that is static.

  • Types can provide stronger guarantees than tests

True, but only guarantees of static behavior. And with each new substitution of a dynamic value for a static type, we take a step down The Dark Path.

  • Strong, statically-typed languages can exist with minimal syntax for types

Yes, and No. The notational load either has to be in the language syntax or in the type system itself. Type system are systems of constraints that need specification. What's more, as we saw with the micro-service situation above, static specification does not eliminate the need for dynamic specification. So static type systems increase, sometimes drastically, the overall notational load of a system.

  • Self-consistency is important for programmer specified behavior to be correct,
    but it is not enough to determine if user-specified behavior is correct.

Granted.

  • Types are not a replacement for all tests, and both can be used together to
    create something more than the sum of their parts.

Again, the micro-service example comes into play. The code that parses the incoming orders has to be tested. The type system simply moved those tests; it did not eliminate them. So type systems are not efficient at eliminating tests.

I happily concede the synergy argument. A good type system can be very profitably used.

  • A static type-checker makes a cost trade-off, trading off mental energy to
    maintain self-consistency for manual work to maintain self-consistency

Here I disagree. The mental effort must be expended in either case. Type systems act more as static reminders of previously specified constraints. Tests do the same.

  • A program text is desired to be unambiguous and consistent in order to allow
    the programmer to predict its behavior.

Of course. But in order to be usable, a program text must also allow for unspecified values (e.g. variables). The value of those variables is not specified at compile time. Thus, the program text is not entirely unambiguous.

  • Finally, types are not a replacement for tests (yet), but they can be used to
    dramatically reduce the number of tests required to have confidence in a
    system.

Again, I disagree. The micro-service argument suggests that the tests must still be there -- just in a different place.

I strongly disagree with the "yet" qualifier. It might (might) be possible to go all the way down The Dark Path until every dynamic value was replaced by a type; but I don't think any of us want to go too far in that direction.

@RocketPuppy
Copy link
Author

Uncle Bob,

Thanks for the thoughtful reply. I'm enjoying the discussion, and I'm certainly learning!

You made a couple of points that I want to reiterate, to make sure I understood correctly.

  1. Types preclude behavior.
  2. There is a difference between static behavior and runtime (dynamic?) behavior.
  3. A good type system encourages separation of concerns.

Allow me to add some comments and questions on these. I'll start with 3 since it's the most straightforward.

You say a good type system encourages separation of concerns. I agree. That's one of the most valuable things I find when working in Haskell. As this relates to testing, allow me to pose the following question. Without a good separation of concerns, is it not possible - likely even - that a test suite duplicates specifications of behavior? Consider the case of parsing an incoming order and then purchasing it if it is confirmed. With the Haskell example the type system encourages the separation of the parsing code from the purchasing code. Because the purchase function can assume the order is confirmed it makes no sense to test it with examples of unconfirmed orders. Now, this property is reliant on the proper separation of concerns which is not dependent on having a good type system. It does seem like a good tool for encouraging that practice though. From personal experience I know I write much better factored Ruby code because of my experience with Haskell.

Regarding points 1 and 2. They both deal with behavior. You mention that there are two kinds of behaviors: runtime and static. There is a difference between them, but I am not clear on what that difference is. Before continuing, allow me to digress on what I mean when I speak of behavior. I have yet to see a formal definition, so I'll propose one here.

A behavior is simply a state transition from one state to another, taking into account all relevant pieces of state in a system. It does not necessarily mean the old state is lost. This implies that printing to the screen is a behavior because it changes the state of the screen. It also implies that sorting an immutable list is a behavior. From the perspective of the caller, a list is passed in unsorted, and a list is returned sorted. Hopefully this makes my stance more clear.

Addressing point 1 specifically. I would say that a properly specified type system precludes only the behaviors that are undesirable. We may disagree about what behaviors are desirable, but I think that's an entirely different discussion.

To address point 2, I'd like to propose some further definitions. A static behavior is one that can be predicted without running the system. A dynamic behavior is one that cannot be predicted without running the system. Put another way, with a statically specified behavior I can predict with 100% confidence what state a system will end up in when it is run without previously having run it. With a dynamically specified behavior I cannot predict what state a system will end up in when it's run without first running it and observing that state. From this perspective I must ask, are dynamic behaviors desired? What if the behavior under test is the firing of a missile or some other expensive or dangerous operation? I would certainly not want to be the one to push the "Run" button without any confidence that it would hit the right target! I think we can both agree that the more confidence we have in a system before it is run the better we feel about deploying it. Tests try to ameliorate this problem by running a system in a safe, controlled environment. This helps to increase confidence, but it can never provide the same confidence that a static behavior does.

Now a common rebuttal to this is that not all behaviors are expensive or risky, so the cost of proving them statically outweighs the benefit. I'll grant this as true. This forces us then to the question, "Where is that break even point?" I think this might be what you were trying to say with The Dark Path; that in your opinion we have reached that break even point and are progressing past it. I disagree. That break even point depends on parameters that are not just tied to languages and technologies. It also depends on the individual skill of a programmer, or a team of programmers. So, as programmers get more skilled at statically specifying behavior and technologies make this easier, specifying static behaviors becomes cheaper. Thus the break even point begins to favor more static behavior and less dynamic behavior.

This, I think, is the ultimate goal: I want to be able to write a program and have confidence that it will behave correctly before I run it. I don't want to write tests and set up a safe, controlled environment. It's possible today! I encounter this often in my Haskell programs. Now, it involves more than just a static type system. It also involves good practices and some discipline. However a static type system is, in my opinion, an indispensable tool for making this possible.

@unclebob
Copy link

Daniel,

Point 1. Yes! Experience with a strong type system (like Haskell) will make you a much better programmer -- even in languages that don't have a strong static type system.

Point 2. Think carefully about this. If the compiler can predict every desired state transition, then you will never need to run the program itself. The type system will have become Turing-complete; and your system will run in the compiler.

Now there's a funny thing about compilers with type systems that have become Turing-complete. While the types themselves are statically enforced, there are meta-types that are "dynamic" at compile time. And since the system is now running in the compiler, it is, in effect, dynamically typed.

In reality strongly typed systems do not predict all the state transitions. For example, given int add(int, int), the type system will predict that two ints are converted to one int by the add function. However, that type system will not predict that add(3,4) produces the number 7. That is the dynamic behavior that makes the program valuable to the user; and that the type system has no control over.

@RocketPuppy
Copy link
Author

That's a very good point. Turing-complete systems make things messy and unpredictable.

I think I'm going to have to rephrase my previous statements. I still believe the goal is to have 100% confidence that our programs will behave correctly before we run them. I think static types are a very good tool to build that confidence in some portions of our programs. How much is determined by how familiar the programmer is with the type system and the type system's expressive power. Other tools help us gain confidence in the other portions. Things such as good decomposition of problems so that our solutions are "obviously correct", high-level abstractions, and tests help us get the rest of the way there.

I think our primary disagreement is on what portions of the program should be covered by types, and that's fine! I think the ideal portion is going to vary for everyone because it depends partly on familiarity and skill with a specific type system. We have other tools to help fill in the rest of the gaps.


Postscript:

While I acknowledge the usefulness of other tools in providing confidence in our programs, I'm still fascinated by the idea of subsuming them with a static type checker. I don't think we've hit the limits yet on that. You've given me a lot of things to think about, and more importantly forced me to clarify my own thoughts around static and dynamic behavior checking. I have some more to say about that but it's not crystallized in my mind yet and it's not really relevant here.

Thanks for the stimulating discussion!

@RocketPuppy
Copy link
Author

Post post-script:

Do you mind if I submit this to /r/programming?

@unclebob
Copy link

unclebob commented Feb 3, 2017

Not at all.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment