Performance testing java applications - Martin Thompson
Fast is generic; it breaks down into: -- throughput -- bandwidth -- latency -- response time
Throughput vs latency -- what is the latency for a given throughput
Build a histogram of latency to see where we have clusters.
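A minimal sketch of what bucketing latencies looks like, to spot clusters like a fast path vs a slow path. The power-of-two buckets here are illustrative; HdrHistogram (mentioned later in the talk) does this properly.

```java
// Illustrative power-of-two bucketed latency histogram (HdrHistogram does this properly).
public class LatencyHistogram {
    // buckets[i] counts samples in [2^i, 2^(i+1)) microseconds
    private final long[] buckets = new long[32];

    public void record(long micros) {
        int bucket = 63 - Long.numberOfLeadingZeros(Math.max(1, micros));
        buckets[Math.min(bucket, buckets.length - 1)]++;
    }

    public long countAt(int bucket) { return buckets[bucket]; }

    public void print() {
        for (int i = 0; i < buckets.length; i++) {
            if (buckets[i] > 0) {
                System.out.printf("[%d us - %d us): %d%n", 1L << i, 1L << (i + 1), buckets[i]);
            }
        }
    }

    public static void main(String[] args) {
        LatencyHistogram h = new LatencyHistogram();
        h.record(3);    // fast-path cluster
        h.record(3);
        h.record(500);  // slow-path cluster, e.g. a pause or cache miss
        h.print();
    }
}
```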
Amortise the expensive interactions
Define your performance characteristics. -- does performance drive the business? -- Amazon found people buy more when the site is fast
Decompose the transaction budget. -- take the total allowed time and break down into the parts that need to be executed. -- flow book had budget management stuff for this.
- Throughput
- Latency
- Stress
- Concurrent / contention
- Endurance / soak (long-term testing: memory leaks, data structures that grow, GC, OS background tasks, etc.)
- Capacity -> the same total load spread across a few users vs many users will behave differently.
Are unbounded queries allowed? e.g. "list all the orders I've ever made". Always design APIs to be bounded.
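A hedged sketch of what a bounded API could look like; the names and the page-size cap are illustrative, not from the talk:

```java
import java.util.List;

// Illustrative: force callers to page rather than fetch an unbounded result set.
public interface OrderStore {
    int MAX_PAGE_SIZE = 100; // assumed cap, pick per transaction budget

    // Deliberately no listAllOrders(customer): every query takes a bound.
    List<String> ordersFor(String customerId, int offset, int limit);

    // Clamp whatever the caller asks for to the server-side maximum.
    static int clampLimit(int requested) {
        if (requested <= 0) throw new IllegalArgumentException("limit must be positive");
        return Math.min(requested, MAX_PAGE_SIZE);
    }
}
```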
Test things as separate concerns -- if you can't do this your design might be wrong. -- then grow this out, add layers
Stress test until breaking point -- does it degrade gracefully? -- do things crash? -- reveals the order (complexity) of the algorithms
"Premature optimization is the root of all evil"
but...
Knowing how your system performs doesn't mean doing anything to it. It means understanding.
What does optimization mean: -- specialization vs flexibility -- different from knowing your system -- test / profile early
Test early and often; monitor production systems. Can we map performance to commits? Tracking build / test performance would be nice too.
Performance "test first"
Red, Green, Debug, Profile, Refactor -- a deeper understanding makes you faster
Like live pairing stations -- develop on a similar server to that which you deploy on
Poor performance should break the build
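One way to make poor performance break the build: a throughput gate that throws after a warm-up pass. The operation and the floor below are placeholders; a real gate would pick reps and threshold from the transaction budget.

```java
// Sketch: a throughput gate that throws (failing the build) when ops/sec
// drops below a floor. Operation, reps, and floor are placeholders.
public class ThroughputGate {

    static long opsPerSecond(Runnable op, int reps) {
        long start = System.nanoTime();
        for (int i = 0; i < reps; i++) {
            op.run();
        }
        long elapsed = Math.max(1, System.nanoTime() - start);
        return reps * 1_000_000_000L / elapsed;
    }

    static void assertThroughput(Runnable op, int reps, long minOpsPerSec) {
        opsPerSecond(op, reps);                 // warm-up pass, result discarded
        long measured = opsPerSecond(op, reps); // measured pass
        if (measured < minOpsPerSec) {
            throw new AssertionError(
                "throughput " + measured + " ops/s below floor " + minOpsPerSec);
        }
    }

    public static void main(String[] args) {
        // Trivially cheap op with a very low floor so the sketch passes anywhere.
        assertThroughput(() -> System.nanoTime(), 100_000, 1);
    }
}
```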
Runtime compiler -- JIT and on-stack replacement (OSR) -- needs to warm up: with the server compiler a method must run ~10,000 times before it gets compiled -- OSR can swap in a compiled version part-way through, e.g. inside a long-running loop that calls a method many times -- polymorphism and inlining: call sites with one or two implementations can be inlined, beyond that they stay polymorphic -- dead-code elimination: benchmarks need realistic examples to avoid it -- race conditions in optimisation
Garbage collection -- which collector? dev vs prod can differ -- results skewed by pauses -- card marking: can't search the whole heap for reachable objects, too expensive. When a field of an old-gen (tenured) object is updated to reference another object, even a new-gen one, its card is marked to say this root has changed and must be scanned when finding live objects for GC. -- independent threads can have collisions due to card marking (likely false sharing on the card table)?! -- didn't follow this!
Class loading
Microbenchmarking can be misleading due to some of the above
Microbenchmarking
Good benchmarking will keep calling until it gets consistent results - i.e. it is warmed up
int i = reps + 1;
while (--i != 0)
{
    MyClass.operation();
}
Faster than the equivalent for loop because CPUs optimise the compare-against-zero: 150,000,000 vs 250,000,000 ops per second just from how we write the test harness!
Microbenchmarking -- code can get dead-code-eliminated and thus seem to run faster than would be physically possible -- add side effects to get around this!? rude!
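The usual side-effect trick: make the benchmarked call produce a value the JIT cannot prove is unused (JMH's Blackhole exists for exactly this; the sketch below is hand-rolled and the operation is a stand-in).

```java
// Sketch: accumulate results and publish them so the JIT cannot eliminate the work.
public class DceGuard {
    // Volatile sink: writing here is a side effect the optimiser must preserve.
    public static volatile long sink;

    static long work(long seed) {
        return seed * 31 + 17; // stand-in for the operation under test
    }

    public static long run(int reps) {
        long acc = 0;
        for (int i = 0; i < reps; i++) {
            acc += work(i); // each result feeds the accumulator, so calls survive DCE
        }
        sink = acc; // publish the accumulator, keeping the whole loop alive
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(run(10)); // prints 1565
    }
}
```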
Straight performance -- measure ops / sec throughput - averages vs intervals -- measure latency
Sanity check all assumptions at the end - does it all add up together
Network probes to observe traffic? An independent observer is best; Graphite works for parts of this, though our real requests already have to write to Graphite anyway.
The Disruptor's Histogram and HdrHistogram are good for latency histograms.
Java apps -- JVisualVM, YourKit, Solaris Studio -- what is the GC doing? -- look at bytecode-level profiling -- a hot field access / write could be a cache miss
OS -- htop -- iostat -- vmstat -- pidstat -- netstat
Theory of constraints -- always work on the thing that is taking up most of your time, even if it's the hardest.
Look at his Mechanical Sympathy blog.
Careful with System.currentTimeMillis(): NTP can correct the clock, so it can jump.
System.nanoTime() is monotonic but can vary across CPU sockets.
Reading the time can be very expensive on virtual machines.
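The standard pattern that follows from the three points above: use the monotonic clock for intervals, never a difference of two wall-clock readings. A minimal sketch:

```java
import java.util.concurrent.TimeUnit;

// Sketch: measure elapsed time with the monotonic clock, not wall-clock time.
public class Timing {
    public static long elapsedMicros(Runnable op) {
        long start = System.nanoTime(); // monotonic: immune to NTP corrections
        op.run();
        long elapsedNanos = System.nanoTime() - start;
        return TimeUnit.NANOSECONDS.toMicros(elapsedNanos);
    }

    public static void main(String[] args) {
        long micros = elapsedMicros(() -> {
            try { Thread.sleep(10); } catch (InterruptedException ignored) { }
        });
        System.out.println("slept for ~" + micros + " us");
    }
}
```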
Slow startup code matters -- care about not just mean time to failure (MTTF) but also mean time to recovery (MTTR).
@mjpt777