
@rjz
Created October 15, 2019 01:58
New Relic FutureTalks 2019-10-14

Performance Antipatterns - Ben Evans (@kittylyst), New Relic

First, what's a pattern?

a model or design used as a guide in needlework and other crafts

-- OED

A model or design: patterns are more abstract than the substrate you're applying them to. Software's pretty abstract already, but design patterns are more abstract still.

And a software pattern?

a general, reusable solution to a commonly occurring problem within a given context.

-- Wikipedia

Or Ben's definition that, "it's a small-scale best practice."

Antipatterns must be the opposite

a common response to a recurring problem that is usually ineffective and risks being highly counterproductive

-- Wikipedia

a pattern that tells how to go from a problem to a bad solution

-- C2Wiki

Why do they exist?

  • Software is complex - far more so than even the most complicated mechanical systems - which leaves lots of places for complex behavior to hide

  • Lots of tradeoffs, so easy to optimize for the wrong tradeoff or misunderstand alternatives

  • Software changes over time. Developers are out to manage the risks associated with change, yet we forget that software changes from day to day and month to month--in fact, change is the constant. It's hard enough to map out the state of the system at a single point in time; what about many points in time?

  • Teams change over time, too. Software starts in someone's mind. Then they have to explain it to others (a lossy process), passing it through the physical world (via test cases, documentation, etc.) and up into the minds of other humans. "Good luck with that."

And Performance Antipatterns?

They're another beast again.

  • Software seems quantitative. It feels like you should be able to measure it and do "real science" with it.

  • Complexity breeds subjectivity. "The system is too slow, make it faster!" is an inherently subjective ask from the user. Yet the users' view is the only one that matters.

  • Both are true.

Compounding factor? Bad technology choices.

"Why do developers make bad software decisions?". Come down to one of five reasons: boredom, resume padding, peer pressure, lack of understanding, or misunderstanding.

Meet our (Anti-)Patterns

  1. UAT (User Acceptance Testing) is my desktop. "We'd love to do UAT, but it's too expensive. We don't have the hardware." So what do you do? You cobble an environment together from whatever's lying around. Oh yeah?

    • Track the cost of outages and incidents (in lost opportunity, productivity, etc.). They're almost always more expensive than the resources needed for UAT.
    • The argument that an unrepresentative UAT is better than nothing? Not really true. A JVM running on a desktop with a different number of cores will behave differently (see the environment-fingerprint sketch after this list).
  2. Distracted by shiny. A pure development antipattern: the new stuff that everyone wants to play with gets exercised first.

  3. Distracted by simple. The easy stuff gets tested first, but it also tends to be the stuff that's already well understood, when you should probably be looking at the less-familiar parts.

  4. Production-like data is hard. Dev and production are apples and oranges. Prod is bigger, gnarlier, and totally not represented by dev. Don't underestimate the shape-of-the-data problem.

    • E.g., a betting engine in the UK (~100K bets on a Saturday) seized up when the company expanded to Turkey, where many bets take the form of a 'Goliath': many (~800) separate bets originating from a single request. That's a ~30X hit on the database, because the model wasn't designed for that load.

    • May not be possible to make test data like prod data (think PII). You could try scrambling it, but there's still value (risk) in the shape of obfuscated data. This remains an open problem.

  5. Fiddle with switches. JVM-specific. The team starts changing flags (worth knowing: there are more flags for the JVM GC than flags at the UN, and they get strange fast). Developers get obsessed with the level of control and start trying to change things.

    • You can...
    • ...measure in production
    • ...measure in UAT
    • ...change one switch in UAT
    • ...measure it
    • ...have someone else double-check your reasoning
    • ...change it in prod
    • ...measure again in prod
    • ...and if it doesn't match the results from UAT, roll it back.

    One of the hard things in performance testing is figuring out what it's worth in comparison to all the other things a developer needs to do.

  6. Tuning by folklore. Perf tests are boring. They're not about brave knights slaying performance dragons. They're about measurement and statistics. Null hypotheses, T-Tests, all that.

    • "I found these great tips on SO" don't necessarily lead to best practices.
    • Performance tips are workarounds. Tuning addresses problems that already exist, which makes tips a solution in search of a problem.
    • If someone finds the problem and fixes it, a "tip" winds up somewhere between useless and harmful.
    • Tips tend to exist without context. Performance happens in a specific context. E.g. admin manuals contain general advice meant to keep a company from getting sued. Take it as you will.
    • Finally, once a tip's on the Internet, it's there forever. Ask the Python crew how much fun it is tracking down answers for Python 2 vs. Python 3.

    Performance tuning is not:

    • tips and tricks
    • secret sauce
    • ...or particularly interesting
  7. The Blame Donkey (or "The Scapegoat"). What gets the blame? It's JMS, Hibernate, etc.--whatever the seniors or management hate. But usually no one's done the investigation; they've just heard of the problem and jumped on the bandwagon.

    • E.g. quants in financial services, who can program Just Enough to insist on a specific piece of technology--which never ends well.
  8. Micro-analysis. The belief that you can focus on a tiny piece of the system and understand its impact on the overall system. This is even worse with a managed runtime (JVM, V8, Mono, etc.).

    An analogy: take a single molecule of water and try to explain from it how a bucket of water behaves: surface tension, specific heat, and all. The reality is that only end-user perception matters, and end-user perception is far, far removed from any tiny piece of the system.
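
One way to ground antipatterns 1 and 5 - a sketch of my own, not something from the talk (the JvmFingerprint class name is arbitrary) - is to print the details that make one JVM environment behave differently from another. Compare the output from a desktop "UAT" box against production, or diff it before and after a single-switch change.

```java
// Hypothetical sketch: fingerprint the JVM environment this code is running in.
import java.lang.management.ManagementFactory;
import java.util.List;

public class JvmFingerprint {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();

        // Core count drives JVM ergonomics: GC thread counts, default pool sizes, etc.
        System.out.println("Available processors: " + rt.availableProcessors());

        // Heap limits picked by ergonomics depend on the machine's RAM.
        System.out.println("Max heap (MB): " + rt.maxMemory() / (1024 * 1024));

        // The exact flags this JVM was started with -- the "switches" being fiddled.
        List<String> flags = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM flags: " + flags);

        System.out.println("Runtime: " + System.getProperty("java.vm.name")
                + " " + System.getProperty("java.version"));
    }
}
```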

What to do?

Treat applications as experiments. You can measure them ("measure, don't guess"). You can analyze the data you collect. You can assess systematic error (accuracy) and random error (precision). We're good at seeing patterns where they don't exist, and the only way to overcome cognitive bias is through data.
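
As one concrete way to treat an application as an experiment (again a sketch of my own; the CompareRuns class name and the sample latencies are made up), compare a baseline run with a candidate run using summary statistics and a Welch's t statistic rather than a hunch:

```java
// Minimal sketch of "measure, don't guess": means, standard deviations, and a
// Welch's t statistic for two sets of latency samples. Sample values are illustrative.
public class CompareRuns {
    static double mean(double[] xs) {
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Unbiased sample variance.
    static double variance(double[] xs, double m) {
        double sumSq = 0;
        for (double x : xs) sumSq += (x - m) * (x - m);
        return sumSq / (xs.length - 1);
    }

    public static void main(String[] args) {
        double[] baseline  = {102, 98, 105, 110, 99, 101, 97, 104}; // ms, made up
        double[] candidate = { 95, 97,  93,  99, 96,  94, 98,  92}; // ms, made up

        double m1 = mean(baseline),  v1 = variance(baseline, m1);
        double m2 = mean(candidate), v2 = variance(candidate, m2);

        // Welch's t: difference of means scaled by the combined standard error.
        double t = (m1 - m2) / Math.sqrt(v1 / baseline.length + v2 / candidate.length);

        System.out.printf("baseline:  mean=%.1f ms, sd=%.1f%n", m1, Math.sqrt(v1));
        System.out.printf("candidate: mean=%.1f ms, sd=%.1f%n", m2, Math.sqrt(v2));
        System.out.printf("Welch's t = %.2f (check it against a t-distribution, not a hunch)%n", t);
    }
}
```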

Consider:

  • Confirmation bias - we see what we're looking for
  • Reductionist bias - understanding the pieces doesn't mean you'll understand the reassembled system
  • Action bias - "doing something is better than doing nothing, and since this is something, we should do it" -- scary in an outage
  • Clustering illusion - in any random sample you'll see clumps; they don't necessarily indicate a signal
  • Texas Sharpshooter Fallacy - shoot at random, then draw your targets around the clusters the next morning. Don't form your hypothesis after the data has been collected!
  • Disregarding regression to the mean - lots of things get better by themselves.

Why measure? Because humans are bad at guessing and riddled with cognitive biases. We're easily overwhelmed by data and can't easily spot patterns by eye.
