Skip to content

Instantly share code, notes, and snippets.

@bigeasy
Last active January 1, 2016 16:49
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save bigeasy/8173012 to your computer and use it in GitHub Desktop.
Save bigeasy/8173012 to your computer and use it in GitHub Desktop.
Thoughts on designing interfaces to algorithms.

TL;DR

An interface to an algorithm is not an abstraction of a problem domain. It is an implementation. Understanding the algorithm is a requirement for applying it. It should expose the meaningful properties of the algorithm. If you black box it at too low a level, you frustrate the developer who understands the algorithm, forcing them to write their own.

  • Leak all the abstractions. Arrays, sets, maps and graphs are a valid level of abstraction.
  • Make decisions and turn them into rules. Fear not commitment.

The Problem In Need of Interface

I don't like to write about software, that is the first thought. The second thought is that I needed to track the visited versions in my MVCC library, in Amalgamate, the merge utility, so that I would know which versions had been merged into the primary tree. When they are merged in the primary tree we no longer have to account for them in our set of outstanding but unmerged versions.

However, Amalgamate does not visit the pages of the b-tree itself. It uses the Designate library to merge the iteration of multiple b-trees, each containing records with different versions, into the iteration of the records with the latest versions.

But, Designate does not iterate the b-trees either. It uses Skip, which skips over the different versions of the same record within a b-tree to return the latest version within the b-tree.

If I want to keep track of the versions that have been visited in Amalgamate, I need to keep track of them in Designate. If I need to keep track of them in Designate I need to keep track of them in Skip.

Interface

What is the API for this? This is why I am writing this essay. It is an example of a decision that I tend to make that comes from an attitude I've taken to my software of late that is shared by few others.

The implementation seemed obvious but I was shut down for the night as it became complicated.

I wanted to return the versions as return value from the Amalgamate function. They are only interesting to someone who is merging trees.

I added a versions property to the Designate iterator. Upon doing this I realized that I needed to add it to Skip as well, because skipped versions are still versions, we would still want to know about them for the sake of removing the version from the housekeeping.

Now I had a problem. I was building however many number of maps in the Skip iterator and I would need to combine them in the Designate iterator for them to be presented to the user by Amalgamate.

The Designate iterator unlock function could merge the maps and return them, except I don't want unlock to have a return value, that was never the intent. It could be a property would have to be constantly updated during iteration. It could be a property that only exsits after unlock is called, but that is nothing I've ever seen before.

Given that this is today's JavaScript and for Node.js, it really ought to be a property that has a getter function that does the gathering. The Skip iterators maintain their individual maps, the Designate iterator has the getter, and Amalgamate returns the value from Desingates getter.

But, wouldn't it be easier if the object used to collect them as a set was passed in into Amalgamate, Designate or Skip? Then Amalgamate could pass a the set into Designate, and Designate into however many instances of Skip. The instances of Skip would each update the same object.

A level of indirection has been added with a little less code than would be needed to copy things around.

Algorithms Not Abstractions

However, this is not the JavaScript way. The JavaScript way is much like the Java way. Your TPSReport object thas a function named getCoverSheet and then maybe TPSReportCoverSheet has a margin property and we do this because most of the developers in Initech are going adjust the margins. Make these things properties means that they can be ignored and discovered.

But we're not going to ignore and discover the gathered versions. Designate can be used without Amalgamate for aspects of an application, but eventually the same developer will come to know Amalgamate.

Thus, there is one bit of information that is available to people that is not obvious as they are working their way through this library. Someone is going to say that this is a leaking abstraction, except that it is not. Again, algorithm.

If you don't understand the purpose of the version number nor accept that it is present and requires your consideration then you can't use this library. It is not a TPSReport, a model of something readily understood. It is not a noun. It is not an object. It is an algorithm.

I need to make these decisions faster, even though I made this decision rather quickly, I'm writing this out so that I'll remember my concerns and decisions. I'll never make this same decision again, but so long as record the decision making process, I might be better able to short circuit the second guessing that I employ upon myself to fend off the inevitable second guessing of the practitioners of the current fashion.

One thought was that by using an object to collect the set, that I was not preserving the type of the version, which was a number. But, the version number is really a token. It doesn't matter and for the rest the MVCC implementation let's allow the use of the version number as object properties -- map keys -- and accept that we're building a valid number set, which is most easily represented as plain-old JavaScript object.

But, yes, I was using Array.indexOf for a little bit last night, to preserve the type, because part of me was trying to keep things generic, future proof, as if someone was going to be using StationWagons as a version key and I need to accommodate them or else my supervisor is going to chide me in the next scrum.

Refactoring Your Singletons

One concern I have is the Windows APIs and how they had these functions that exposed everything, with structures you were supposed to allocate so you could set all this many properties to pass to the Windows API function that would then do something mundane like create a file. Reasonable defaults would be nice.

But, in order to knock out these complicated algorithms, it's important that you do not refactor your Singletons, that you do not apply the Law of Demeter, and that a good way to manage the complexity is through a level of indirection that is passed in from the top. It's how the module system works anyway, and the Java chuckle heads create a problem for themselves with their perfect black boxes that they then need dependency injection to solve.

Lessons Learned

What is the short rule? Dunno. I've got a good sense and I follow it. But, let's try to give ourselves something to follow.

When you are implementing algorithms, leak all the abstractions. That is, you are not creating an abstraction, you are creating an algorithm.

Stop future proofing and instead make a rule. Instead of looking at something like the version set and say, "oh, what if someone want to use the state capitols as version keys, what do I do then?" make a rule that it is a number that converts to a string to be used in a object that acts as a set.

Then that's the rule so follow the rule. Don't spend a lot of time explaining why it's not future proofed, and why you can't use the state capitols.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment