@uucidl
Last active September 1, 2017 12:49
Maintaining APIs: Resist adding that boolean parameter!

Maintaining Interfaces: Adding a boolean parameter

"The Boolean Trap" "Boolean parameters are wrong"

TODO(uucidl): add examples for every definition.

boolean type: (bit) minimum amount of information: { 0, 1 }, usually represented in text as { false, true }.

Consider an interface: T1 proc(T0)

Usage code:

// Context S0
T0 a;
T1 b0 = proc(a); // with effect: { e0 }

effect: observable change not tracked by manifest variables. Like changing a global value, sending data to the network, touching a file on disk, displaying graphics.

Let's add a boolean parameter to it: T1 proc(T0,bool)

// In same context S0
T0 a;
bool c;
T1 b1 = proc(a, c); // with effect on S0: e1

Let's assume that c == false is the fixed point, where the API is stable, and c == true is the new behavior:

// In same context S0
T0 a;
bool c; // c == false
T1 b1 = proc(a, c); // with effect on S0: e1
// stable case: e1 == e0 &&  b1 == b0

First, the effect of this change is to break the usage code.

Unless the host language supports default values or keyword arguments, the call to proc has to be changed. And even where the call itself does not change, the usage code may still have to be recompiled or re-linked.

If c == true is the new behavior, the boolean can change the return value of proc, its effect, or both. Let's review the cases:

// In same context S0
T0 a;
bool c; // c == true
T1 b1 = proc(a, c); // with effect on S0: e1
// 3 different cases:
// 1. e0 == e1 && b1 != b0
// 2. e0 != e1 && b1 == b0
// 3. e0 != e1 && b1 != b0

This highlights the legibility impact of a boolean parameter. The call site says only true or false, without giving more details about how that value affects the return value or side-effects or both.

Consider if we went from:

step_count = move_forward(40);

To:

step_count = move_forward(40, true); // this could mean anything
step_count = move_forward(40, false);

What would you as a reader understand now?

do(false); // inverse?
do_this_and_that(false); // which one is affected?

Let's see what we could do instead of adding the boolean parameter.

  1. (easy case) Add a new entry point that takes b0 and turns it into b1:
// state S0
T0 a;
T1 b;
b = proc(a); // b == b0, effect e0
b = to_b1(b); // b == b1
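
As a concrete sketch of this easy case, assume proc is the move_forward entry point from the earlier example and that the new usage wants a distance instead of a step count. The names steps_to_cm, total_steps_walked and STEP_LENGTH_CM are hypothetical, made up for illustration. The conversion is a pure function, so the existing entry point and its effect stay untouched:

```c
/* Hypothetical stand-in for the outside world touched by effect e0. */
int total_steps_walked = 0;

enum { STEP_LENGTH_CM = 20 }; /* assumed step length, for illustration only */

/* Existing entry point, left untouched: returns b0 (a step count). */
int move_forward(int distance_cm)
{
    int step_count = distance_cm / STEP_LENGTH_CM;
    total_steps_walked += step_count; /* effect e0 */
    return step_count;
}

/* New entry point (to_b1): a pure conversion from b0 (steps) to b1 (cm). */
int steps_to_cm(int step_count)
{
    return step_count * STEP_LENGTH_CM;
}
```

Callers that only need the step count keep writing move_forward(40); only the new usage adds the second call.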

2 + 3. (harder cases) For 2 & 3, it depends whether e0 is allowed to be observed or not.

If e0 is allowed to be observed then a new entry point may produce e1 from the state left by e0(S0) and usage code becomes:

// in state S0
T0 a;
T1 b;
b = proc(a); // b = b0, with effect on S0: e0
b = proc_e1(a); // effect e1, b = b1
// state S1

If however e0 should not occur at all, which is the usual case where the boolean is added, we have some options.

If you can split proc into an effectful part and a non-effectful part, then creating finer grained entry points can allow users to implement the desired behavior:

// equivalent to b = proc(a);
b = proc_v(a);
proc_e0(a);
// new behavior:
b = proc_v(a);
proc_e1(a);
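
A minimal sketch of that split could look like the following. The effect is simulated by appending to an in-memory log (effect_log and log_effect are assumptions for illustration; a real effect would be a file, a socket, a screen), and the doubling in proc_v is an arbitrary placeholder transform:

```c
#include <stdio.h>
#include <string.h>

/* Stand-in for the outside world: each effect appends an entry here. */
char effect_log[64] = "";

static void log_effect(const char *name, int a)
{
    char entry[32];
    snprintf(entry, sizeof entry, "%s(%d);", name, a);
    strncat(effect_log, entry, sizeof effect_log - strlen(effect_log) - 1);
}

/* Value part: computes the result and touches nothing else. */
int proc_v(int a) { return a * 2; }

/* Effect parts: e0 is the original effect, e1 the newly requested one. */
void proc_e0(int a) { log_effect("e0", a); }
void proc_e1(int a) { log_effect("e1", a); }

/* The original entry point stays available as a composition of the parts. */
int proc(int a)
{
    int b = proc_v(a);
    proc_e0(a);
    return b;
}
```

Existing callers of proc see no change, while the new usage composes proc_v with proc_e1 and never triggers e0.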

Alternatively, if the effects are more complicated to decompose, and if the boolean starts to creep into more effectful parts, a state for the API could be added instead:

// in state S0
want_effect_e1(true);
b = proc(a1);
c = proc(a2);
d = proc(a3);	
want_effect_e1(false);

This reduces legibility of the interface, making it less context-free due to the existence of implicit dependencies between calls. It does preserve the stability of the interface at the expense of more state management pushed onto the new case. Which is maybe fine if the new case is specific or rare!
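
To make the trade-off concrete, here is a small sketch of such an API state. The counters e0_count and e1_count stand in for real effects and are hypothetical, as is the placeholder transform a + 1. The default value of the hidden flag preserves the old behavior:

```c
#include <stdbool.h>

/* Stand-ins for the two observable effects. */
int e0_count = 0;
int e1_count = 0;

/* Hidden API state. The default keeps the stable, original behavior. */
static bool use_effect_e1 = false;

void want_effect_e1(bool enabled)
{
    use_effect_e1 = enabled;
}

/* Same signature as before; which effect occurs depends on hidden state. */
int proc(int a)
{
    if (use_effect_e1) {
        e1_count++; /* effect e1 */
    } else {
        e0_count++; /* effect e0 */
    }
    return a + 1;
}
```

The implicit dependency is visible here: a caller that forgets the closing want_effect_e1(false) silently changes the behavior of every later call to proc.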

Common usages?

The entry point is being changed because a new usage has been discovered.

The question now is: what is the most common usage? If the boolean is added late, the default case (the fixed point) is likely still the most common one, and the boolean represents a rare divergence from it. Or do we have two equally common cases? If so, what makes us think that there are only two cases?

We're trying to keep the number of entry points stable, but by changing the existing entry point we make the existing, most common usages suffer. What are we trying to save? Entry points and documentation.

Why would you want to avoid creating more entry points? To preserve the accessibility of the interface by keeping the API small.

So what do we know about this new case? That it is a rare case. It should be documented as such. It should be put into a special, separate part of the documentation, to keep the common interface clear.

Then there is the case where our idea of the most common case was mistaken, and the old and new usages are actually equally frequent. In this case it would seem justified to deprecate the old entry point and add a new one.

Additionally, booleans themselves only encode one bit of information. Are we sure we won't need a third case later? Shouldn't we immediately go for a bitset flag argument, able to represent a larger set of values? The entry point count can be managed, and kept low, by introducing parameters with types that are more open to later additions.

Going for a boolean case now would introduce yet another breakage later on as the new cases get discovered.
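
A bitset flag argument could be sketched as below. The flag names, the negation used as an "inverse" transform and the effect_count counter are all hypothetical. Unused bits leave room for cases discovered later, and the call site names what it asks for instead of saying true or false:

```c
#include <stdint.h>

/* Hypothetical flags; a value of 0 means "behave like the original proc". */
enum proc_flags {
    PROC_DEFAULT     = 0,
    PROC_INVERSE     = 1 << 0, /* changes the transform */
    PROC_SKIP_EFFECT = 1 << 1  /* suppresses the effect */
};

/* Stand-in for the observable effect. */
int effect_count = 0;

int proc_with_flags(int a, uint32_t flags)
{
    if (!(flags & PROC_SKIP_EFFECT)) {
        effect_count++;
    }
    return (flags & PROC_INVERSE) ? -a : a;
}
```

A call such as proc_with_flags(5, PROC_INVERSE | PROC_SKIP_EFFECT) reads far better than proc(5, true, true), and a third flag can be added without touching existing calls.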

Resist adding that boolean parameter!

Also known as: “The Boolean Trap” “Boolean parameters are wrong”

Consider a trivial software interface with one entry point proc:

// Transforms a value (type T0) into another value (type T1) with effect E0
T1 proc(T0);
Effect
Change not directly observable from values in the program fragment.
  • changing a global value
  • uploading data to the network
  • changing a file on disk
  • displaying graphics
  • playing or recording sounds
  • continuation somewhere else in the program (exception)

In usage we write:

// Context s0
T0 a;
T1 b0 = proc(a); // produces context: { s1 = E0(s0) }

Bifurcation

So far so good. However one (exactly 1) new type of usage has been discovered and requested by users of the interface.

Let’s enumerate what this new usage (named tentatively proc_1) can represent:

          T0 → T1   Effect
proc      X         E0
proc_1    X         E1
proc_1    X1        E0
proc_1    X1        E1

Examples of how proc_1 may differ from proc:

  • changing the transform (example: inverse transform)
  • controlling how it behaves when faced with errors or degenerate inputs (examples: throwing exceptions, returning a sentinel value…)
  • adding or removing an effect (logging)

Some questions a maintainer could ask at that stage:

  • What is the most common usage? How frequent is each usage?
  • Are we adding an effect that is shared by many other transformations, an orthogonal concern to all of them? (examples: memory management, logging)
  • Are we adding an effect that can be seen as a composition E0 → E’?
  • Can the X1 (new transformation) be seen as a composition X → Y?
  • How stable should the interface stay?

Interface stability.

The only change that’s easy to do on an interface in common use is to add to it:

  • adding a new entry point
  • adding interpretation for some yet unused input value.

Any other change requires changing existing uses, inflicting maintenance cost onto users of the interface.
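
As a sketch of the first kind of addition, proc can keep its exact signature while proc_1 is published next to it. The helper proc_impl, the counters and the squaring transform are assumptions made for illustration. Note that a boolean may still exist, but only privately, where no call site has to read it:

```c
#include <stdbool.h>

/* Stand-ins for the two observable effects. */
int effect_e0_count = 0;
int effect_e1_count = 0;

/* Internal helper, not part of the published interface. */
static int proc_impl(int a, bool use_new_effect)
{
    if (use_new_effect) {
        effect_e1_count++;
    } else {
        effect_e0_count++;
    }
    return a * a;
}

/* Existing entry point: signature and behavior unchanged. */
int proc(int a) { return proc_impl(a, false); }

/* Added entry point for the newly discovered usage. */
int proc_1(int a) { return proc_impl(a, true); }
```

Existing usage code neither changes nor behaves differently; only users who want the new behavior have anything to learn.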

Usage Frequency.

At the extremes, either proc_1 and proc are going to be equally common in use, or proc_1 will be a rare occurrence, for a special usage.

If we anticipate proc_1 and proc to be equally useful, then it might suggest n other cases might be equally useful in the future.

If the new usage is discovered late, then it is likely that proc is still the most common case, while proc_1 represents an uncommon divergence to that common case.

However, what if we have two equally common cases? Are we also confident at this point that only two cases exist?

Often maintainers try to keep the number of entry points stable by adding a boolean flag to the existing one. This makes the existing, most common usages suffer, because the existing entry point changes. What are we trying to save? Entry points and documentation.

Why would you want to avoid creating more entry points? To preserve the accessibility of the interface by keeping the API small.

So what do we know about this new case? That it is a rare case. It should be documented as such. It should be put into a special, separate part of the documentation, to keep the common interface clear.

Then there is the case where our idea of the most common case was mistaken, and the old and new usages are actually equally frequent. In this case it would seem justified to have two equally important entry points published.

If the equally common cases are connected somehow with the same usage site, then we can predict that users will often have to write a conditional like so:

bool cond;
T0 a;
T1 b;
if(cond)
    b = proc(a);
else
    b = proc_1(a);

At that point it becomes attractive to move the conditional into the API, like so:

bool cond;
T0 a;
T1 b = proc_2(a, cond);

If however we predict that many more variations are going to be created, then it becomes attractive to keep one single entry point, and use a data type for cond that can represent a wider range of values from the get go.
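
One way to sketch such a wider data type is an enumeration of mutually exclusive variants. Everything here is hypothetical: the variant names, and the toy transforms a + 1 and a - 1 standing in for proc and proc_1:

```c
/* Hypothetical variants; new members can be appended later. */
enum proc_variant {
    PROC_VARIANT_0 = 0, /* the original proc */
    PROC_VARIANT_1 = 1  /* the newly discovered proc_1 */
};

int proc(int a)   { return a + 1; }
int proc_1(int a) { return a - 1; }

/* Single entry point selecting among the variants. */
int proc_2(int a, enum proc_variant variant)
{
    switch (variant) {
    case PROC_VARIANT_1:
        return proc_1(a);
    case PROC_VARIANT_0:
    default:
        return proc(a);
    }
}
```

Compared with a bool, the call site proc_2(a, PROC_VARIANT_1) says which behavior is requested, and a third variant becomes a new enumerator instead of a signature change.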

The entry point is being changed because a new usage has been discovered.

Additionally, booleans themselves only encode one bit of information. Are we sure we won’t need a third case later? Shouldn’t we immediately go for a bitset flag argument, able to represent a larger set of values? The entry point count can be managed, and kept low, by introducing parameters with types that are more open to later additions.

Going for a boolean case now would introduce yet another breakage later on as the new cases get discovered.

Let’s add a boolean parameter to it: T1 proc(T0,bool)

boolean type: (bit) minimum amount of information: { 0, 1 }, usually represented in text as { false, true }.

// In same context s0
T0 a;
bool c;
T1 b1 = proc(a, c); // with effect on s0: e1

Let’s assume that c == false is the fixed point, where the API is stable, and c == true is the new behavior:

// In same context s0
T0 a;
bool c; // c == false
T1 b1 = proc(a, c); // with effect on s0: e1
// stable case: e1 == e0 &&  b1 == b0

First, the effect of this change is to break the usage code.

Unless the host language supports default values or keyword arguments, the call to proc has to be changed. And even where the call itself does not change, the usage code may still have to be recompiled or re-linked.

If c == true is the new behavior, the boolean can change the return value of proc, its effect, or both. Let’s review the cases:

// In same context s0
T0 a;
bool c; // c == true
T1 b1 = proc(a, c); // with effect on s0: e1
// 3 different cases:
// 1. e0 == e1 && b1 != b0
// 2. e0 != e1 && b1 == b0
// 3. e0 != e1 && b1 != b0

This highlights the legibility impact of a boolean parameter. The call site says only true or false, without giving more details about how that value affects the return value or side-effects or both.

Consider if we went from:

step_count = move_forward(40);

To:

step_count = move_forward(40, true); // this could mean anything
step_count = move_forward(40, false);

What would you as a reader understand now?

do(false); // inverse?
do_this_and_that(false); // which one is affected?

Let’s see what we could do instead of adding the boolean parameter.

  1. (easy case) Add a new entry point that takes b0 and turns it into b1:

// state s0
T0 a;
T1 b;
b = proc(a); // b == b0, effect e0
b = to_b1(b); // b == b1

2 + 3. (harder cases) For 2 & 3, it depends whether e0 is allowed to be observed or not.

If e0 is allowed to be observed then a new entry point may produce e1 from the state left by e0(s0) and usage code becomes:

// in state s0
T0 a;
T1 b;
b = proc(a); // b = b0, with effect on s0: e0
b = proc_e1(a); // effect e1, b = b1
// state S1

If however e0 should not occur at all, which is the usual case where the boolean is added, we have some options.

If you can split proc into an effectful part and a non-effectful part, then creating finer grained entry points can allow users to implement the desired behavior:

// equivalent to b = proc(a);
b = proc_v(a);
proc_e0(a);
// new behavior:
b = proc_v(a);
proc_e1(a);

Alternatively, if the effects are more complicated to decompose, and if the boolean starts to creep into more effectful parts, a state for the API could be added instead:

// in state s0
want_effect_e1(true);
b = proc(a1);
c = proc(a2);
d = proc(a3);
want_effect_e1(false);

This reduces legibility of the interface, making it less context-free due to the existence of implicit dependencies between calls. It does preserve the stability of the interface at the expense of more state management pushed onto the new case. Which is maybe fine if the new case is specific or rare!

Common usages?
