To Design or Not To Design? A Third Good Question
Following my earlier discussions of testing and defect fixing, I’ll complete the trilogy by discussing the role of design early in projects. Recall the context: projects that have validated a genuine need but haven’t validated an economic proposition. Because of the uncertainties of such a situation, iteration is inevitable. Piloting a project on the runway, you will likely have to try many experiments to find value customers will buy. Capital efficiency while iterating extends the lifespan of the project and increases its chances of success. The more experiments you can perform per dollar, the higher your expected return.
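To make the arithmetic behind "experiments per dollar" concrete, here is a minimal sketch. It assumes, purely for illustration, that experiments are independent and that each has the same fixed chance of finding value; the function name, the budget, the costs, and the 5% hit rate are all hypothetical, not figures from any real project.

```python
def chance_of_at_least_one_hit(budget, cost_per_experiment, hit_probability):
    """Probability that at least one affordable experiment succeeds.

    Assumes independent experiments with an identical hit probability,
    which is a simplification made for this illustration.
    """
    experiments = int(budget // cost_per_experiment)
    return 1 - (1 - hit_probability) ** experiments


# Hypothetical figures: same budget, same 5% hit rate, different cost per experiment.
slow = chance_of_at_least_one_hit(100_000, 20_000, 0.05)  # 5 experiments
fast = chance_of_at_least_one_hit(100_000, 5_000, 0.05)   # 20 experiments
print(f"{slow:.2f} vs {fast:.2f}")  # prints "0.23 vs 0.64"
```

Under these assumptions, cutting the cost of an experiment to a quarter nearly triples the chance that something pays off before the money runs out.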
In the sibling essays to this one I argued that the Extreme Programming principles of Test Until Bored and Defects Zero were perfectly appropriate to the cruise phase of a product, where the business driver is increasing profit by lowering cost. These principles also serve to prepare the software to act as a platform from which to launch new ventures. However, because they sacrifice latency in favor of throughput, they are not suited to the takeoff phase.
Only some testing and defect fixing serve to reduce latency and increase the frequency and value of experimentation over the short term. Design belongs on the list of activities that need to be responsibly performed in moderation during the takeoff phase. If it’s Thursday and you only have enough money to last until Friday at 5, the responsible thing to do is perform another market experiment, not automate a difficult test, fix a random defect, or refactor away duplication.
During the takeoff phase, the team is constantly trying to add value by increasing the chance of survival. During the cruise phase, reducing costs adds the most value. A different mix of activities goes into achieving these different goals.
Design for Latency
Consider two design styles: connected and modular. In a connected system, elements are highly available to each other (via global state, for example). Adding the first feature to a connected system is cheap. All the resources you need are available. However, the cost of all those connections is that subsequent features are very likely to interact with previous features, driving up the cost of development over time.
A modular design has connections deliberately kept to a minimum. The cost for the first feature is likely to be higher than in the connected system, because you need to find the necessary resources and bring them together, possibly re-modularizing in the process. Features are much less likely to interact in a modular system, though, leading to a steady stream of features at relatively constant cost.
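The contrast between the two styles can be sketched in a few lines. This is only an illustration under my own assumptions: the names APP_STATE, UserStore, connected_signup, and modular_signup are hypothetical, and a real system would involve far more than a signup counter.

```python
# Connected style: every feature reaches straight into shared global state.
APP_STATE = {"users": [], "signup_count": 0}


def connected_signup(email):
    # Cheap to write: everything needed is already within reach.
    APP_STATE["users"].append(email)
    APP_STATE["signup_count"] += 1  # any later feature may also read or change this


# Modular style: a feature sees only what it is explicitly handed.
class UserStore:
    def __init__(self):
        self._emails = []

    def add(self, email):
        self._emails.append(email)

    def count(self):
        return len(self._emails)


def modular_signup(store, email):
    # More setup up front (someone has to build and pass in the store),
    # but other features cannot interfere except through UserStore's methods.
    store.add(email)
    return store.count()
```

The first feature is quicker to write in the connected version. The price shows up later, when a second or third feature also edits APP_STATE and the features begin to interact.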
Here is a conceptual graph of the cost of adding features in the two design styles (I say “conceptual” because the graph displays a way to think about the differences; it doesn’t display data):
Figure: Connected and Modular Design
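For readers who prefer numbers to pictures, the two curves can be approximated with a toy model. Like the graph, it is conceptual rather than data: the compounding shape of the connected curve, the flat modular curve, and every parameter value (first, growth, flat) are assumptions chosen only to show where the curves cross.

```python
def connected_cost(n, first=1.0, growth=1.3):
    """Cost of the n-th feature (n starts at 1) when interactions compound."""
    return first * growth ** (n - 1)


def modular_cost(n, flat=4.0):
    """Cost of the n-th feature when interactions are kept to a minimum."""
    return flat


def crossover(max_features=100):
    """First feature number at which connected costs more than modular."""
    for n in range(1, max_features + 1):
        if connected_cost(n) > modular_cost(n):
            return n
    return None


print(crossover())  # prints 7 with the assumed parameters above
```

With these made-up parameters the connected style is cheaper for the first six features and more expensive from the seventh on. Different platforms and teams would move that crossover point, which is the whole question.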
The strategy I use is to stay on the connected curve until the project has reached the climb phase, then switch to the modular curve. In so doing I am betting that I will learn enough before climbing the curve to get the project off the ground. If I don’t, I invest just enough time to drive the project back down the cost curve, by testing, designing, and fixing defects. Sometimes this feels like cleaning up just enough counter space in the kitchen to cook, as opposed to having a really clean kitchen, but it accelerates the overall process of experimenting.
Contrast this switching strategy to the pure connected and the pure modular design strategies. In the pure connected strategy you drive happily up the cost curve, trusting to exponentially increasing revenue to cover the exponentially increasing costs. Sometimes the revenues do cover the costs, for a while, but customers run out of money to spend sooner than programmers run out of ways to add complexity (just ask Microsoft). The pure modular design strategy stays in the flat part of the cost curve, but gives up speed of experimentation, reducing the overall chances of project success.
There are a couple of tricky moments in the switching strategy. One is when you run up the cost curve before you run up the revenue curve. Do you refactor just enough to drive yourself back down, even at the cost of slower experimentation? Do you abandon the code and start over? Do you keep going and hope that one of your dwindling supply of experiments will pay off big?
Another tricky moment in the switching strategy is the transition from connected to modular design. This is a transition not only of design techniques (like the four techniques for introducing change: leap, parallel, stepping stone, and simplification), but also of how time is prioritized, of values (sustainability instead of raw speed), and perhaps even of platform and tools. The whole organization needs to switch from feeling good about how fast they can turn around experiments to feeling good about sustainability while continuing to learn about customers and markets. Some people enjoy both styles of development; others are only effective in one or the other.
Your platform has a big effect on the shape and relationship of the two curves. Bare metal Perl in expert hands can achieve incredible experiment velocity but easily creates a vertical cost curve over time. Rails or Seaside make modular design less expensive relative to connected design. Evolution-oriented data storage like Gemstone or BigTable can extend the zone of experimentation.
A final challenge is designing for latency of experiments while at the same time not precluding the ability of the system to scale during the climb phase. I have seen too many cases of elaborate architectures ready to scale for hordes of customers who never arrived to put any faith in that approach. Use design to maximize the chance of making it off the runway, then trust in your ability to see the transition from takeoff to climb in time to shift strategy and begin taking safe steps to improve scalability.
Some readers reacted to the earlier essays as if I were proposing a style of development teetering on the brink of disaster through lack of testing, mountains of defects, and deteriorating design. Utter chaos is not the only alternative to dogma. The timing and intensity of testing, defect fixing, and design need to change to emphasize latency over throughput. However, they are all still useful activities, even on the runway. Watch for win-win situations, where the right test, the right defect fix, or the right design improvement enables improved feedback from real customers.
Every startup feels overwhelming from the inside, but every dawn brings exactly one day in which to work. The cellist Pablo Casals was once asked how he had the stamina to play a long passage of blisteringly fast sixteenth notes. “I rest between the notes,” was his reply. In a startup you can “rest” between experiments. If an A/B test takes a day to gather significant results, that’s a day you can spend investing in the future without jeopardizing the present. Spend time between crises wisely and you’ll have both a system and a business you can be proud of.
August 12th, 2009 in Responsible Development, Startups