Profiling and data analysis resources

Goals

Looking at software from a different angle, as we do during profiling and data analysis, has numerous benefits. It requires and increases our understanding of the problems being solved by the software. It exposes us to unexpected discoveries and insights. It allows us to suggest improvements.

Since a great deal of skill is involved, and this skill is rarely taught in Computer Science degrees, practice plays an important role in acquiring it.

Exercises

Ex1: sequential processing

Reading sequentially and processing each element from:

  • Case 1: an array `Foo arr[N]`
  • Case 2: an array `Foo *arr[N];` (pointers to “random” locations)
  • Case 3: linked list of Foos.
  1. Order the cases
  2. Explain the ordering
  3. Measure

Ex1.1: measure again while another Core continuously exercises Case 1 in a loop.
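
A minimal C sketch of the three cases, assuming a hypothetical `Foo` with an integer payload; allocation, initialization, and timing are left out. The point is the memory access pattern each traversal produces.

```c
#include <stddef.h>

typedef struct Foo { int value; struct Foo *next; } Foo;

// Case 1: contiguous array of Foo -- sequential loads through memory.
long sum_array(const Foo *arr, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) sum += arr[i].value;
    return sum;
}

// Case 2: array of pointers to Foo at "random" locations -- the pointer
// array streams nicely, but each element is a dependent, scattered load.
long sum_pointer_array(Foo *const *arr, size_t n) {
    long sum = 0;
    for (size_t i = 0; i < n; i++) sum += arr[i]->value;
    return sum;
}

// Case 3: linked list of Foos -- every load depends on the previous one.
long sum_list(const Foo *head) {
    long sum = 0;
    for (const Foo *p = head; p != NULL; p = p->next) sum += p->value;
    return sum;
}
```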

Suggested reading list

Essential

Other

People

Tips

Reasons

Rob Pike Rules

From http://users.ece.utexas.edu/~adnan/pike.html:

  • Rule 1. You can’t tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don’t try to second guess and put in a speed hack until you’ve proven that’s where the bottleneck is.
  • Rule 2. Measure. Don’t tune for speed until you’ve measured, and even then don’t unless one part of the code overwhelms the rest.
  • Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don’t get fancy. (Even if n does get big, use Rule 2 first.)
  • Rule 4. Fancy algorithms are buggier than simple ones, and they’re much harder to implement. Use simple algorithms as well as simple data structures.
  • Rule 5. Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

From Reality to Model, rather than Model to Reality

Software creation is often presented as the creation of a model that will run “in reality.”

This activity of profiling and analyzing data + code is about reversing that arrow:

From http://realworldrisk.com/:

When and if we model, we go from reality to models, not from models to reality

Tools

Tricks

Adding some overhead to the thing under measurement (especially if its cost is diffuse) is a nice way to see its impact, for example by adding a repeat loop around a section.
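
A minimal sketch of this trick, assuming a hypothetical `suspect_section()` standing in for the code under investigation:

```c
enum { REPEAT = 100 };

void suspect_section(void);  // hypothetical: the code whose impact we want to see

void amplify_suspect(void) {
    // Repeat the suspect section so its (possibly diffuse) cost is amplified
    // and becomes visible in a profile or a wall-clock measurement.
    for (int i = 0; i < REPEAT; i++) {
        suspect_section();
    }
    // If total runtime grows roughly in proportion to REPEAT, the section's
    // cost is real and its share can be estimated; if it barely moves, the
    // cost lies elsewhere.
}
```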

Rico Mariani: “Performance Culture Best Practices”

@url: https://www.facebook.com/notes/rico-marianis-performance-tidbits/performance-culture-best-practices/473526169665226/

Quoting:

Two Rules You Must Follow

Rule #1: Measure

Just thinking about what to measure will help you do a good job. If you don't measure, you can be sure it will be slow, big, or whatever else you don't want. If you haven't measured, your job's not finished.

Rule #2: Do your homework

Good engineering requires you to understand your raw materials. What are the key properties of your Framework? Your processor? Your target system?

Three Steps To Success

Budget

An exercise to assess the value of a new feature and the cost your customer would be willing to pay, not a technical assessment of what is possible

Plan

Design and validate against the budget; this is a design plan and a risk assessment

Verify

Measure the final results, discard failures without remorse or penalty, don’t make your customers live with them

Fabian Giesen @rygorous

Top 3 benchmarking mistakes/gotchas:

  1. Are you using a relevant test set (are you testing the right thing)?

Example: If you're interested in some string processing thing and most strings you care about are short, test on 100 000 short strings, not one 1MB string. Very different workloads!

  2. Does your test give consistent results if run repeatedly?

If you don't verify this and try to iterate on something, you're just looking at noise.

  3. Have you verified that you're measuring what you think you're measuring?

a) In microbenchmarks especially, look at what the compiler does with it. If your test consists of repeated executions of a pure function with the same argument, and you never do anything with the result, many compilers will get rid of the loop entirely (see the sketch after this list).

b) don't run microbenchmarks to gather data, period. They're OK for hypothesis testing when trying to confirm an expected result but if you have no idea what the result should be you also have no way of detecting the many, many ways to screw microbenchmarks up.

  4. If you're using a recursive Fibonacci "benchmark" for any purpose other than as the compiler writer's equivalent of "Hello, world" to make sure your function calls work, you're in a state of sin.

Using a Fib benchmark result in any debate results in instant ban. It's the law.
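
As a concrete illustration of point 3a, a minimal C sketch, assuming a hypothetical `pure_function()` under test; varying the argument and writing the result to a volatile sink keeps the optimizer from discarding the loop:

```c
#include <stdint.h>

int pure_function(int x);          // hypothetical: the pure function being benchmarked
volatile int64_t benchmark_sink;   // observable side effect the optimizer must preserve

void microbench(int iterations) {
    int64_t acc = 0;
    for (int i = 0; i < iterations; i++) {
        acc += pure_function(i);   // vary the input instead of reusing one constant
    }
    benchmark_sink = acc;          // the result escapes, so the loop cannot be removed
}
```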

“How NOT to Measure Latency” by Gil Tene

@url: https://www.youtube.com/watch?v=lJ8ydIuPFeU “How NOT to Measure Latency” by Gil Tene.

Presenter

Gil Tene, CTO and co-founder of Azul Systems (builders of high-performance JVMs). Previously: Nortel Networks, Shasta Networks, Check Point Software Technologies.

@url: http://stuff-gil-says.blogspot.de/
@url: http://latencytipoftheday.blogspot.de/
@url: https://twitter.com/giltene

Summary

This talk has been given in multiple forms. It usually gets an “Oh shit!” reaction from its audience. Gil Tene presents himself as an experienced system builder.

The topic is latency, or response time: the time it takes for one operation to happen.

More accurately, as we’re dealing with not just one operation but many, the topic is how to study the latency behavior of a system: How does latency in general behave?

Pretty, misleading charts

If there’s one thing you should and can measure and plot, it’s the max response time. This is a genuine signal of something that happened.

Often, however, the question we ask ourselves is what the most common response time was. This can lead to misleading visualizations, unfortunately present in many monitoring and measuring systems.

see @url: http://latencytipoftheday.blogspot.de/2014/06/latencytipoftheday-q-whats-wrong-with_21.html

Percentile
the value below which a given percentage of observations falls; the 95th percentile of requests is the response time that 95 out of every 100 requests stay under.
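
A small C sketch (not from the talk) of extracting a percentile from collected latency samples, using a simple rounded-rank method:

```c
#include <stdlib.h>
#include <string.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

// Returns the p-th percentile (0 < p <= 100) of n latency samples.
// (Error handling omitted for brevity.)
double percentile(const double *samples, size_t n, double p) {
    double *sorted = malloc(n * sizeof *sorted);
    memcpy(sorted, samples, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, cmp_double);
    size_t rank = (size_t)(p / 100.0 * (double)n + 0.5); // rounded rank
    if (rank == 0) rank = 1;
    if (rank > n) rank = n;
    double result = sorted[rank - 1];
    free(sorted);
    return result;
}
```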

A typical chart contains:

  • response time of a system over time (2h)
  • latency as one series per percentile

Beware of graphs that plot only what’s convenient. Gil Tene shows a graph that displays response times only up to the 95th percentile of requests. What happens to the 5% of requests that were left out?

The presenter then runs a thought experiment: what is the user experience like?

He takes the example of users visiting a website. A typical session is about 5 page visits, each of which makes about 40 requests. Every request matters, and a single slow one can lead to a bad user experience.

Let’s figure out how many users only ever see requests at or below the 95th percentile. Just 0.95^200 ≈ 0.0035% of your users, which is basically nobody; nearly everyone hits the slower 5% at least once.

So why should we look at the 95th percentile and below? This chart was really misleading in that it gave the impression that we were looking at a common case!

Let’s do the experiment in reverse. What percentile of requests should you look at to have an idea of what 99% of the users experience?

We want y such that y^200 = 0.99, i.e. log(0.99) = 200·log(y), so y = exp(log(0.99)/200) ≈ 99.9950%.
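
A small C check of both directions of this arithmetic, assuming the 5 pages × 40 requests ≈ 200 requests per session from the thought experiment:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double requests_per_session = 200.0;
    // Fraction of sessions that never hit anything above the 95th percentile:
    printf("sessions fully inside the 95th percentile: %.4f%%\n",
           100.0 * pow(0.95, requests_per_session));        // ~0.0035%
    // Per-request percentile needed so that 99% of sessions stay inside it:
    printf("percentile needed to cover 99%% of sessions: %.4f%%\n",
           100.0 * exp(log(0.99) / requests_per_session));   // ~99.9950%
    return 0;
}
```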

Gil Tene suggests collecting and plotting the response time against percentile (x axis: percentile, y axis: response time) to get an overview of the response time at various percentiles.

Sales pitch: HdrHistogram. @url: https://github.com/HdrHistogram/HdrHistogram

The coordinated omission problem.

A problem caused by measurement methodology that induces bad conclusions about performance.

Under a serial load generator that emits operations one after the other at a fixed rate, an unresponsive system that freezes for a long time will give misleading read-outs, underestimating the extent of the freeze:

As the system becomes unresponsive (completely) for a long time, we get 1 bad response time, and n very good ones once it has become responsive again.

A naive load generator does that if it waits and does not emit requests until the previous one has responded.

In code measurement the same occurs: as we collect traces of operations, we won’t get new traces of what was wrong because we simply were unresponsive. This results in only keeping the “good” numbers.
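
A hedged C sketch (not from the talk) of such a naive serial load generator; `now_ns()`, `sleep_until()`, and `send_request_blocking()` are hypothetical helpers:

```c
#include <stddef.h>
#include <stdint.h>

uint64_t now_ns(void);               // hypothetical monotonic clock
void sleep_until(uint64_t t_ns);     // hypothetical pacing helper
void send_request_blocking(void);    // hypothetical: returns when the response arrives

void naive_load_generator(size_t n, uint64_t interval_ns, uint64_t *latency_ns) {
    uint64_t next = now_ns();
    for (size_t i = 0; i < n; i++) {
        sleep_until(next);                 // pace at the intended rate...
        uint64_t start = now_ns();
        send_request_blocking();           // ...but block on each response
        latency_ns[i] = now_ns() - start;
        next += interval_ns;
        // If the system freezes for 10 s here, we record ONE 10 s sample and
        // then catch up with a burst of fast ones. The requests we *meant* to
        // send during the freeze were never issued, so their terrible
        // would-be latencies never enter the data set.
    }
}
```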

Service Time vs Response Time.

The serial load generator/measurement shown in the previous part was bad in that it only measured service time and did not include waiting time.

However latency is really composed of the total waiting + service times:

+---------------------+
|    Response Time    |
+------+--------------+
|      | Service Time |
+------+--------------+
| Wait | Work         |
+------+--------------+
  • Wait is pure waste. (Queuing to get service?)
  • Work is productive. (Getting serviced)

Tip to verify whether your load generator/monitoring system measures actual response time:

  • Push it hard. After a while the response time should grow linearly.
  • Linear response is when the system reaches fragility.

Sustainable Throughput

Latency relating to throughput and load.

Figuring out how to avoid “saturation” is better than trying to study what happens at or after saturation. Figure out how fast we can go while still being safe.

Comparing response time or latency behaviors

Differential comparison at various workloads (e.g. 40K/85K ok, 90K bad). Looking at requirements can help compare two setups/systems and assess which ones are more economical than others for the requirement.

Plot the max

Is a good visual summary for comparing two systems.

@title: Scylla’s Approach to Improve Performance for CPU-bound workloads

@url: https://www.scylladb.com/2017/07/06/scyllas-approach-improve-performance-cpu-bound-workloads/

Funny. Today almost everything is CPU-bound (if you don't draw a sharp distinction between CPU and memory).

I mostly skimmed through the article because I was familiar with some of it. Things that stood out:

knowing how much processor time was spent in particular functions may not be enough. When the CPU is busy it may be so because it is actually doing some work or because it is stalled waiting for memory

@see: https://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html

completed instructions per cycle: IPC

If it is low, less than 1, it is very likely that the application is, in fact, memory bound.
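
A hedged sketch of this rule of thumb; `read_retired_instructions()` and `read_cpu_cycles()` are hypothetical stand-ins for whatever PMU interface is available (toplev and V-Tune, mentioned below, read these counters for you):

```c
#include <stdint.h>
#include <stdio.h>

uint64_t read_retired_instructions(void); // hypothetical PMU counter read
uint64_t read_cpu_cycles(void);           // hypothetical PMU counter read
void workload(void);                      // hypothetical: the code under study

void report_ipc(void) {
    uint64_t i0 = read_retired_instructions(), c0 = read_cpu_cycles();
    workload();
    uint64_t i1 = read_retired_instructions(), c1 = read_cpu_cycles();
    double ipc = (double)(i1 - i0) / (double)(c1 - c0);
    // Rule of thumb from the article: IPC well below 1 suggests the CPU is
    // mostly stalled (often waiting on memory) rather than doing useful work.
    printf("IPC = %.2f\n", ipc);
}
```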

modern processors have a Performance Monitoring Unit (PMU)

@see: https://github.com/andikleen/pmu-tools/wiki/toplev-manual
@see: https://sites.google.com/site/analysismethods/yasin-pubs
@see: https://drive.google.com/file/d/0B_SDNxjh2Wbcc0lWemFNSGMzLTA/view?resourcekey=0-eHHOKhPrwPN8uq-t_L9yLg (Top-Down analysis paper)

toplev is a tool for Linux, so it is not directly usable here on Windows. It appears that Intel has implemented the Top-Down analysis paper in V-Tune: @see: https://software.intel.com/content/www/us/en/develop/documentation/vtune-cookbook/top/methodologies/top-down-microarchitecture-analysis-method.html

@see: https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/vtune-profiler/download.html

The V-Tune profiler appears to be available to all without fee these days, which is good news! I've heard of it for so long and never got to try it out.

I guess the interesting idea in this article, beyond sharing information about CPU architecture, is the idea behind the Top-Down paper: arranging bottlenecks into hierarchies, which can help with the analysis.

Another thing I like is that it highlighted how ICache misses are quite often an issue. Can be alleviated by batching and creating
