janv/bug_tracking_triage_apenwarr.md

## bug_tracking_triage_apenwarr.md

      
Project management, bug tracking and triage

This is a summary of https://apenwarr.ca/log/20171213, a very long and interesting article. I'm trying to present here only the ideas that are immediately relevant to our situation.


Much of the article deals with scaling production operations and efficiency. It examines the rate of bugs, proposes theories about how to manage projects and bugs, and runs some statistical experiments to illustrate those theories.


Efficiency comes from smart project management.


Arbitrary, output-oriented goals tied to a deadline don't work: they create no urgency until the last minute and then fail to be effective, leading to a vicious cycle. Managers need goals, targets and estimates to plan ahead internally, but estimates and goals should not be communicated to dev teams as such.


Do not get into a situation where engineers negotiate schedules with management. Motivation, project estimation and feature prioritization are all psychological games. Understanding the rules helps us turn these exercises into something productive.


Student syndrome:

No matter how far you extend a deadline, a student will always start work at the latest possible moment before it.


Agile: The good parts


He picks apart Agile, identifying some of its parts as useful psychological games that let you work smarter.


Physical index cards: Make features feel like tangible things, for people who need that. Otherwise useless.


Story points: These are useful


Pair programming: Helps some people, but generally irrelevant


Daily standup meetings: Overhead


Strict Prioritization:

This is a huge one that we'll get to next - and so is flexible prioritization. Since everyone always knows what your priorities are [...], then people are more likely to work on tasks in the "right" order, which gets features done sooner than you can change your mind. Which means when you do change your mind, it'll be less expensive. That's one of the main effects of Agile. Basically, if you can manage to get everyone to prioritize effectively, you don't need Agile at all. It just turns out to be really hard.


Tedious Progress Tracking: Not needed. Done right, the progress reports write themselves


Burndown charts: A fundamental unit of progress measurement


Series of Sprints: Sprints are goals and goals don't work


Strict prioritization


Each change in a project sets back its progress


Lots of early changes are better than a single change that comes late


even if your decisions aren't optimal, sticking to them, unless you're completely wrong, usually works better than changing them.


This section contains the most important quote:

If you only take one thing away from reading this talk, it should be that. Make decisions and stick to them.

Everything else is just a method that helps with doing that.


One of the great product management diseases is that we change our minds when the facts don't. There are so many opinions, and so much debate, and everyone feels that they can question everyone else (note: which is usually healthy), that we sometimes don't stick to decisions even when there is no new information. And this is absolutely paralyzing.


If you want to know what Tesla does right and most of us do wrong, it's this: they ship something small, as fast as they can. Then they listen. Then they make a decision. Then they stick to it. And repeat.
They don't make decisions any better than we do. That's key. It's not the quality of the decisions that matters. Well, I mean, all else being equal, higher quality decisions are better.  But even if your decisions aren't optimal, sticking to them, unless you're completely wrong, usually works better than changing them.


Kanban: The good parts


Kanban comes from the Japanese car industry of the 1950s. They used index cards, but differently than we do in software.


Kanban also uses stories and story points


Kanban, like Agile, gives the huge benefit of strict prioritization. The difference:


Agile makes you do things in a certain order


Kanban makes you do fewer things at a time.
The idea here is that inventory is expensive, so you build things just in time.

unreleased software is inventory. It's very expensive, it slowly rusts in the warehouse, and worst of all, it means you produced work in the wrong order.
[...]
Your buildup of low-priority inventory is slowing down the people working on the high-priority things, and that's unacceptable.


He gives a few examples of this.


We're engineers, so we're pretty smart. If we want, we could just, you know, dispense with the psychological games, and decide we're going to strictly prioritize and strictly limit multitasking. It takes some willpower, but it can be done. I happen to be terrible at it.
[...]
It's actually really, really hard. It's one of the hardest things in all of engineering. Most people are very bad at it, and need constant reminders to stop multitasking and to work on the most important things. That's why these psychological games, like sprints (artificial deadlines) and index cards and kanban boards were invented. But if we want to become the best engineers we can be, we have to move beyond tricking ourselves and instead understand the underlying factors that make our processes work or not work.


Restricted multitasking


Another simulation looks at the effects of multitasking by plotting work against value, demonstrating how multitasking delays the delivery of value to the customer.


One important aspect of this simulation:

Every feature produces work (tasks/bugs), whether launched or not, to reflect changing market conditions or other influences that affect even unlaunched features.
Launched features aren't done: they keep generating bugs and tasks, but at the same time they deliver value.
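
The delay caused by multitasking can be reproduced with a toy model. This is not the article's simulation, only a minimal sketch under assumptions of my own: the team completes one unit of work per day, each launched feature yields one unit of value per day, and features generate no follow-up work (which the real simulation does model). All function names are made up for illustration.

```python
def sequential_finish_days(sizes):
    """Finish day of each feature when worked one at a time, in order."""
    day = 0
    finish_days = []
    for size in sizes:
        day += size  # the whole team stays on this feature until it ships
        finish_days.append(day)
    return finish_days


def round_robin_finish_days(sizes):
    """Finish day of each feature when each day goes to the next unfinished one."""
    remaining = list(sizes)
    finish_days = [0] * len(sizes)
    day = 0
    while any(left > 0 for left in remaining):
        for i, left in enumerate(remaining):
            if left > 0:
                day += 1  # one day of work on feature i
                remaining[i] = left - 1
                if remaining[i] == 0:
                    finish_days[i] = day
    return finish_days


def cumulative_value(finish_days, horizon):
    """Value delivered by `horizon`, at 1 unit per day per launched feature."""
    return sum(max(0, horizon - day) for day in finish_days)


sizes = [10, 10, 10]  # three features of 10 days of work each
focused = sequential_finish_days(sizes)    # [10, 20, 30]
juggled = round_robin_finish_days(sizes)   # [28, 29, 30]
print(cumulative_value(focused, horizon=60))  # 120
print(cumulative_value(juggled, horizon=60))  # 93
```

Both teams finish all the work on day 30, but the focused team ships its first feature on day 10 instead of day 28, so it has delivered more value by any later date.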


Over time, all the features you have released together generate a constant workload, which shows up as a flat burndown chart:

At that point, you either need to fundamentally change something - can you make your code somehow require less maintenance? - or get a bigger team. That goes back to our headcount scalability slides from the beginning. Is your code paying for itself yet? That is, is more value being accrued than the cost of maintenance? If so, you can afford to invest more SWEs. If not, you have to cancel the project or figure out how to do it cheaper. Scaling up a money loser is the wrong choice.


Work rates are non-negotiable


You cannot negotiate how long something takes or how fast people will work


You can have a conversation about which features go into a release and which don't


You still want to have that conversation as early as possible


Knowing the work rate and the cost of features, managers can negotiate with sales, marketing or biz-dev about feature requirements and deadlines.


And then, crucially, they will not tell those dates to the engineering team. The engineering team doesn't need to know. That would be setting a goal, and goals are bad. The engineers just do the work in priority order, and don't multitask too much, and let statistics handle things.
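
The arithmetic a manager might do with a known work rate looks roughly like this. It is a sketch under my own assumptions, not something taken from the article: velocity is measured in story points per week, stories are done strictly in sequence, and the function names and numbers are hypothetical.

```python
import math
from itertools import accumulate


def observed_velocity(points_done_per_week):
    """Average story points completed per week, from past weeks."""
    return sum(points_done_per_week) / len(points_done_per_week)


def forecast_weeks(sequenced_points, velocity):
    """Week in which each story is projected to finish, in sequence order."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    return [math.ceil(total / velocity) for total in accumulate(sequenced_points)]


def stories_that_fit(sequenced_points, velocity, weeks_available):
    """How many stories, from the top of the sequence, fit before a date."""
    weeks = forecast_weeks(sequenced_points, velocity)
    return sum(1 for week in weeks if week <= weeks_available)


velocity = observed_velocity([5, 7, 6])  # 6.0 points per week
backlog = [5, 8, 3, 13]                  # story points, in priority order
print(forecast_weeks(backlog, velocity))                        # [1, 3, 3, 5]
print(stories_that_fit(backlog, velocity, weeks_available=4))   # 3
```

The only negotiable input is the backlog: with a date four weeks out, the conversation is about dropping or reordering the fourth story, not about raising the velocity.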


Putting things into practice

Stories


What is a story?

A small bit of useful functionality delivered to a customer
The customer must actually be impacted
(they might not notice, like when reducing downtime)
It doesn't have to be written like a story, but the point is: you can't tell a story without the main character (the customer)
(A bug is not a story)


Personas are fine for UX design. For engineers, something more succinct is helpful: "User will be able to search for emails by keyword, and the results will be returned in no more than 2000ms, and results will be ranked by relevance"
The main point about stories is that they involve the customer
Devs might have to do 10 things that do not directly affect the customer in order to deliver value to them. Then you have 10 bugs or tasks making up one story/feature

Story points


These make estimation and burndown charts possible


People are good at relative estimates, not at absolute estimates


Absolute estimates are goals we're setting ourselves


Don't let engineers know the due date


Points bypass that:

Nobody sets a "goal" that the project will take 5 points to complete. What does that even mean? It's a five point story, it will always take 5 points to complete, no matter how long that turns out to be. It is 5 points, by definition.


Get to point estimates by doing planning poker

Great disparity: discuss and revote
Alternative: always pick the higher number. Consistent biases don't matter; over time they will just factor into the velocity. Beware of inconsistencies


Interesting: nobody has a vested interest in a particular story having a particular number of points.

Instead of fighting to be right about the exact size, people can instead focus on why two people have such a widely varying (at least two fibonacci slots) difference of opinion. When that happens, usually it's because there's missing information about the scope of the story, and that kind of missing information is what really screws up your estimates if you don't resolve it. The ensuing discussion often uncovers some very important misunderstanding (or unstated assumptions) in the story itself, which you can fix before voting again.
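
The voting rule described above can be sketched in a few lines. The deck values and the function names are assumptions of mine (the article only speaks of "fibonacci slots"). The rule implemented is: a spread of two or more slots means discuss and revote, otherwise take the higher number.

```python
# Hypothetical planning poker deck; the article does not list the exact cards.
FIBONACCI_DECK = (1, 2, 3, 5, 8, 13, 21)


def slot_spread(votes):
    """Distance in deck slots between the lowest and the highest vote."""
    slots = [FIBONACCI_DECK.index(vote) for vote in votes]
    return max(slots) - min(slots)


def resolve_round(votes):
    """Return the estimate for this round, or None if the team should
    discuss the story and vote again."""
    if slot_spread(votes) >= 2:
        return None  # wide disagreement usually means missing information
    return max(votes)  # small disagreement: always pick the higher number


print(resolve_round([3, 5, 5]))  # 5
print(resolve_round([2, 8, 3]))  # None
```

A vote that is not in the deck raises `ValueError`, which seems reasonable for a sketch since such a vote would be a data entry mistake.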


Tracking the sequence of stories


Just use a spreadsheet. Stories should be big, so there shouldn't be so many of them that you need a special tool


You want estimation to be so quick and easy that you end up estimating a lot of tasks that never get scheduled - because when the PM realizes how much work they are, they realize there's something more effective to be working on.


The spreadsheet is just a bunch of rows that list the stories and their estimates, but most importantly, the sequence you're planning to do them in. I suggest working on no more than one at a time, if at all possible. Since each one, when implemented, is made up of a bunch of individual tasks (bugs), it is probably possible to share the work across several engineers. That's how you limit multitasking, like kanban says to do. If your team is really big or your stories are small, you'll have to work on several stories at a time. But try not to.


Stories are not bugs


| Stories | Bugs |
| --- | --- |
| Slow | Fast |
| Infrequent | Numerous |
| Controversial (PM, execs) | Boring |
| Can be tracked on index cards | Need automated tracking |


Break planning into two layers of abstraction
In this model, stories are a bit bigger than usual
Bugs/Tasks are small and Stories are made up of several of them
Division of labour

Stories

PMs come up with stories
Engineers estimate stories
PMs sequence stories
Engineers work on them in order


Bugs

PMs don't care
Engineers can work on them in whatever order they want, with whatever level of multitasking they think is appropriate


Do not estimate bugs
This makes bug fixing a first-class activity. Most other Agile methods treat bugs as overhead
Every bug is on average the same size

The article goes into a lot of detail on this one, with real-world examples in addition to the simulation
Bug creation and resolution rates are always essentially constant. But you need to make sure that the resolution rate is at least as high as the creation rate, otherwise you'll always be playing catch-up. There's no way out of this in the long term; bug bankruptcies are not a solution.
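
With constant rates, the backlog is plain arithmetic. The sketch below uses made-up numbers and a hypothetical function name to show why closing everything does not help while the creation rate stays above the resolution rate.

```python
def open_bugs_after(weeks, open_now, created_per_week, resolved_per_week):
    """Project the open bug count, assuming both rates stay constant."""
    backlog = open_now
    for _ in range(weeks):
        # The backlog cannot drop below zero, however fast the team resolves.
        backlog = max(0, backlog + created_per_week - resolved_per_week)
    return backlog


# Resolution slower than creation: the backlog grows by 5 bugs every week.
print(open_bugs_after(52, open_now=200, created_per_week=30, resolved_per_week=25))  # 460

# Bug bankruptcy (close everything) with unchanged rates: it grows right back.
print(open_bugs_after(52, open_now=0, created_per_week=30, resolved_per_week=25))    # 260

# Resolution slightly faster than creation: the backlog shrinks over time.
print(open_bugs_after(52, open_now=200, created_per_week=30, resolved_per_week=32))  # 96
```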


Triage

A lot of the advice in here doesn't really apply to us, as we don't seem to be drowning in bugs. But I still find it to contain useful ideas.

Dealing with the inevitability that you can't fix bugs as fast as they get found
In that situation, deal only with the bugs that really matter. How do you decide which ones matter?
Priorities

Highest prio is for pager incidents
Next is for urgent problems
Then there's a prio for "bugs we should probably fix"
Finally, one or two levels for "bugs we should fix but obviously never will"


Customer psychology:

"Won't fix"/"Obsolete" & closing a bug makes people angry
Leaving them open on low priority is fine


Fixing vs. Triaging


| Fixing bugs | Triaging bugs |
| --- | --- |
| Slow | 100x faster |
| Requires expertise | Easy to parallelize |
| You can't fix them all | You can triage them all |
| Fix it right the first time | Expect occasional re-triage |
| You need a system to handle the inevitable backlog | No big deal if you do a little every day |


Trying to assign every bug to a person makes the assignment field lose its meaning. Don't do this initially
Have a system for sorting bugs into components. These are only relevant for triage, not for actually working through the lists
Don't create too many components, only one per triage team
Use labels instead to track:

Triage status
"Needs Discussion" status
Release/Sprint/Milestone sequence
Feature backlogs: One per major feature area


Don't feel pressured to assign every bug to someone
Real advice

Never look at the project-wide tracker. Learn how to query intelligently
Components are almost useless. They are only good for helping end users point bugs at the right triage team (This is not really applicable to us).
The triage team queries for bugs that haven't been triaged and assigns them to a milestone, story or some other hotlist, where devs can pick them up


Re-triage

Outlines a process for periodically re-triaging old bugs instead of closing them all
Again, this is only relevant to projects with thousands of bugs


Needs discussion

For when the triage team doesn't have enough information to triage and needs more info to reproduce the bug
This is separate from "Needs Triage", which prevents bugs that are waiting for more information from popping up in the "Needs Triage" list
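
The two queries could look like this in a label-based tracker. This is only a sketch: the `Bug` class, the label names and the function names are invented for illustration and do not correspond to any particular bug tracker's API.

```python
from dataclasses import dataclass, field

# Hypothetical label names; use whatever your tracker calls them.
TRIAGED = "triaged"
NEEDS_DISCUSSION = "needs-discussion"


@dataclass
class Bug:
    id: int
    title: str
    labels: set = field(default_factory=set)


def needs_triage(bugs):
    """Bugs the triage team still has to look at. Bugs that are waiting
    for more information are left out so they don't clutter the list."""
    return [
        bug for bug in bugs
        if TRIAGED not in bug.labels and NEEDS_DISCUSSION not in bug.labels
    ]


def needs_discussion(bugs):
    """Bugs parked until someone supplies more information."""
    return [bug for bug in bugs if NEEDS_DISCUSSION in bug.labels]


bugs = [
    Bug(1, "Crash on login"),
    Bug(2, "Search is slow", {TRIAGED, "milestone-2"}),
    Bug(3, "Cannot reproduce export error", {NEEDS_DISCUSSION}),
]
print([bug.id for bug in needs_triage(bugs)])      # [1]
print([bug.id for bug in needs_discussion(bugs)])  # [3]
```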


TL;DR


Goals and deadlines for engineers lead to student syndrome: nothing is urgent until shortly before the deadline
Work rates are mostly constant
Estimates are good for giving managers an idea of when a project will be done, so they can do internal scheduling
Late changes are more expensive than early ones. It's better to stick with a sub-optimal decision than to introduce changes late.
Have clear priorities
Avoid multitasking

| Stories | Bugs |
| --- | --- |
| Slow | Fast |
| Infrequent | Numerous |
| Controversial (PM, execs) | Boring |
| Can be tracked on ~~index cards~~ | Need automated tracking |

| Fixing bugs | Triaging bugs |
| --- | --- |
| Slow | 100x faster |
| Requires expertise | Easy to parallelize |
| You can't fix them all | You can triage them all |
| Fix it right the first time | Expect occasional re-triage |
| You need a system to handle the inevitable backlog | No big deal if you do a little every day |