frail/00_ProfilingAndDataAnalysisInSoftware.org

## 00_ProfilingAndDataAnalysisInSoftware.org

      
    Raw
  

              00_ProfilingAndDataAnalysisInSoftware.org
            
          
    Goals

Looking at software from a different angle as done during profiling and data analysis has numerous benefits. It requires and increases our understanding of the problems being solved by the software. It exposes us to unexpected discoveries and insights. It allows us to suggest improvements.
Since there is a great element of skills involved, and it is rarely taught in Computer Science degrees, the role of practice is very important to gain the necessary skills.
Suggested reading list

Essential


  https://www.icloud.com/keynote/05iUiTiZFlOBN7qZny3Vr8bfQ#Engine_programmers_should_know Engine programmers should know… (insights on practice, problem solving process and pitfalls)
  http://www.gdcvault.com/play/1021866/Code-Clinic-2015-How-to How to Write Code the Compiler Can Actually Optimize by Mike Acton (shows process of analyzing bottlenecks, i.e. finding the resource that is most in demand, verifying hypothesis. Value of analyzing one value and getting value distributions, the value of practice, and an example of that practice)

Other


  https://www.facebook.com/perftidbits/photos/pcb.504731853211324/504731296544713/?type=3&theater Performance by Design by Rico Mariani
  https://docs.google.com/presentation/d/1ST3mZgxmxqlpCFkdDhtgw116MQdCr2Fax2yjd8Az6zM/edit#slide=id.g238f34177c_0_82 OOP Pitfalls talk 2017 by Tony Albrecht
  https://shipilev.net/blog/2014/java-scala-divided-we-fail/ show that our formal and informal training as software engineers often leaves out holes in certain areas such as how to properly design and interpret an experiment. shows how “known inputs” can skew a benchmark in a certain direction and mislead.
  http://gpuopen.com/video-memory-profiling-wpa/
  https://bling.kapsi.fi/blog/x86-memory-access-visualization.html
  https://twitter.com/Reedbeta/status/828402842184486912 nice example of unicode usage visualisation
  http://stackoverflow.com/questions/29551516/glclear-takes-too-long-android-opengl-es-2/29553399#29553399 asynchronous APIs can be hard to profile
  https://fgiesen.wordpress.com/2013/02/17/optimizing-sw-occlusion-culling-index/ Fabian Giesen records his profiling+optimization work of an Intel library for occlusion culling
  https://macton.smugmug.com/Other/2008-07-30-by-Eye-Fi/n-qN9F8/i-Z2vc8sC On the importance of focusing on the data + final transform + actual contraints
  https://hero.handmade.network/episode/code/day112 A Mental Model of CPU Performance by Casey Muratori. (how to think about optimization/performance, CPU instruction level, exposure to vocabulary around lower-level optimizations, cycles / sec)
  https://gist.github.com/ocornut/cb980ea183e848685a36 Memory, Cache, CPU optimization resources (compiled by Omar Cornut + others)
  http://www.codercorner.com/blog/?page_id=1632 The Box Pruning Series
    
      https://gist.github.com/rygorous/fdd41f45b24472649aaeb5b55bbe6e26?ts=4
    
  
  https://gist.github.com/ocornut/cb980ea183e848685a36 Memory, Cache, CPU optimization resources (compiled by Omar Cornut + others
  https://youtu.be/gNsF8Q1Nh2Q?list=PLHxtyCq_WDLXFAEA-lYoRNQIezL_vaSX-&t=1538 Stepanov, Programming Conversations
  http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html CPU Utilization Is Wrong
  http://aras-p.info/blog/2017/08/08/Unreasonable-Effectiveness-of-Profilers/ An example of studying build times with a tracing profiler
  https://www.snellman.net/blog/archive/2017-08-19-slow-ps4-downloads/ Gives a good outlook of what a performance experiment design looks like, from the establishment of a mental model and hypothesis.
  https://www.youtube.com/watch?v=lJ8ydIuPFeU How NOT to measure Latency by Gil Tene
  https://www.facebook.com/notes/rico-marianis-performance-tidbits/a-brief-word-about-some-common-metrics/489778214706688/ Mach-O Metrics (Macos, IOS) and their usefulness
  https://www.facebook.com/perftidbits/photos/ms.c.eJw9zdsNwEAIA8GOIl4Hpv~;GIp3jfI5goWBlmDhd2TlPfcZO75jckW5pv~;POe~_VD~;~;3wXshLq4dz32X~_D~;VgH~_pRd~;9Abrpl0CUvnZ~;X7z3HC~;fpMSM~-.bps.a.480408715643638.1073741829.414843438866833/480408725643637/?type=3&theater Performance Troubleshooting Techniques: Keep big picture, Look for correlated resource comsumption (for ex Memory usage for a CPU usage increase) to drive analysis around a period of time

People


  Paul Khuong https://twitter.com/pkhuong http://pvk.ca
  Mike Acton https://twitter.com/mike_acton http://www.macton.ninja/
  Fabian Giesen https://fgiesen.wordpress.com
  Brendan Gregg http://www.brendangregg.com
  Rico Mariani https://www.facebook.com/notes/rico-marianis-performance-tidbits/

Tips


  http://uucidl.github.io/use-frame-time-not-frame-rate-when-working-on-performance.html

Reasons

Rob Pike Rules

From http://users.ece.utexas.edu/~adnan/pike.html:

  
    Rule 1. You can’t tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don’t try to second guess and put in a speed hack until you’ve proven that’s where the bottleneck is.
    Rule 2. Measure. Don’t tune for speed until you’ve measured, and even then don’t unless one part of the code overwhelms the rest.
    Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don’t get fancy. (Even if n does get big, use Rule 2 first.)
    Rule 4. Fancy algorithms are buggier than simple ones, and they’re much harder to implement. Use simple algorithms as well as simple data structures.
    Rule 5. Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
  

From Reality to Model, rather than Model to Reality

Software creation is often presented as the creation of a model that will run “in reality.”
This activity of profiling and analyzing data + code is about reversing that arrow:
From http://realworldrisk.com/:

  When and if we model, we go from reality to models not from models to reality


## 01_Tools.org

      
    Raw
  

              01_Tools.org
            
          
    Tools


  Microsoft PIX d3d12, x64, windows
  Very Sleepy windows. Free, lightweight, easy to use. Slow.
  ETW/Windows Performance Toolkit/XPerf/WPA. Windows. Really worth it, yet barely useable before Bruce Dawson documented it and made a front-end for it (UIForETW)
    
      https://randomascii.wordpress.com/2012/05/08/the-lost-xperf-documentationcpu-sampling/
      https://randomascii.wordpress.com/2012/11/21/the-lost-xperf-documentationdisk-usage/
      https://randomascii.wordpress.com/2012/05/11/the-lost-xperf-documentationcpu-scheduling/
    
  
  Visual Studio’s Performance Profiler (Debug > Performance Profiler). Windows
  RenderDoc OpenGL, Vulkan, D3D11, D3D12 capture/debugging, windows/linux
  https://catapult.gsrc.io/ Google’s Catapult suite of tools


## 02_AutomatedSystems.org

      
    Raw
  

              02_AutomatedSystems.org
            
          
    Google Chrome

Google Chrome has a suite of performance tests that are continuously checked.
https://chromium.googlesource.com/chromium/src/+/master/docs/speed/addressing_performance_regressions.md
Reports can be found at: https://chromeperf.appspot.com/
Rico Mariani on Quantized Distributions (Problems in Benchmarking)

https://www.facebook.com/notes/rico-marianis-performance-tidbits/quantized-distributions-illustrated/475243556160154/

  
## 03_PerformanceCulture.org

      
    Raw
  

              03_PerformanceCulture.org
            
          
    Rico Mariani: “Performance Culture Best Practices”

@url: https://www.facebook.com/notes/rico-marianis-performance-tidbits/performance-culture-best-practices/473526169665226/
Quoting:
Two Rules You Must Follow

Rule #1: Measure

Just thinking about what to measure will help you do a good job
  If you don’t measure you can be sure it will be slow, big, or whatever else you don’t want
  If you haven’t measured, your job’s not finished
Rule #2: Do your homework

Good engineering requires you to understand your raw materials
  What are the key properties of your Framework? Your processor? Your target system?
  Three Steps To Success
Budget

An exercise to assess the value of a new feature and the cost your customer would be willing to pay, not a technical assessment of what is possible
Plan

Design and validate against the budget, this is a design plan and a risk assessment
Verify

Measure the final results, discard failures without remorse or penalty, don’t make your customers live with them

  
## RR_HowNotToMeasureLatency_GilTene.org

      
    Raw
  

              RR_HowNotToMeasureLatency_GilTene.org
            
          
    “How NOT to Measure Latency” by Gil Tene

@url{https://www.youtube.com/watch?v=lJ8ydIuPFeU} How NOT to Meaure
  Latency by Gil Tene.
Presenter

Gil Tene, Azul
  CTO and Cofounder of Azul System. (Builder of high performance JVMs)
  Ex: Nortel Networks, Shasta Networks, Check Point Software Technologies.
@url: http://stuff-gil-says.blogspot.de/
  @url: http://latencytipoftheday.blogspot.de/
  @url: https://twitter.com/giltene
Summary

This talk has been given in multiple forms. It usually gets a Oh Shit!
  response from its audience. Gil Tene presents himself as an
  experienced System Builder.
The topic is latency, or response-time. The time it takes for one
  operation to happen.
More accurately, as we’re dealing with not just one operation but
  many, the topic is how to study the latency behavior of a system: How
  does latency in general behave?
Pretty, misleading charts

If there’s one thing you should and can measure and plot, it’s the max
  response time. This is a genuine signal of something that happened.
However often the question we ask ourselves is what the most common
  response time was. This can however lead to misleading visualizations,
  unfortunately present in many monitoring and measuring systems.
see @url:
  http://latencytipoftheday.blogspot.de/2014/06/latencytipoftheday-q-whats-
  wrong-with_21.html

  Percentile
a percentage corresponding to a certain amount of
    observations from a system. The 95% percentile of
    requests is: if you take 100 requests, looking at 95
    of them.

A typical chart contains:

  response time of a system over time (2h)
  latency as one series per percentile

Beware of graphs that only plot only what’s convenient. Gil Tene here
  shows a graph that shows response times at most for 95% of
  requests. What happens for the 5% of requests that were left out?
Then the presenter makes a thought experiment. What is the user
  experience like?
He takes the example of users visiting a website. These users have a
  typical interaction of about 5 pages visit each of which make about 40
  requests. Each request is essential, and may lead to a bad user
  experience.
Let’s figure out how many users encounter the requests that were in
  our 95% percentile? Just 0.95^200 = 0.0035% of your users, which is
  basically nobody.
So why should we look at the 95% and below percentile? This chart was
  really misleading in that it gave us the impression that we were
  looking at a common case!
Let’s do the experiment in reverse. What percentile of requests should
  you look at to have an idea of what 99% of the users experience?
99% = y^200
  log(99%) = 200*log(y)
  y = exp(log(0.99)/200) = 99.9950 %
Gil Tene suggests collecting and printing the response time with x
  axis: percentile and y axis: response time, to get an overview of the
  response time at various percentiles.
Sale pitch: HdrHistogram. @url: https://github.com/HdrHistogram/HdrHistogram
The coordinated omission problem.

A problem due to measurement methodology. That induces bad conclusion about
  performance.
Under a serial load generator, emitting operations one after the other
  at a fixed rate, an unresponsive system that freezes for a long time,
  will give misleading read-outs, underestimating the extent of the
  freeze:
As the system becomes unresponsive (completely) for a long time, we
  get 1 bad response time, and n very good ones once it has become
  responsive again.
A naive load generator does that if it waits and does not emit
  requests until the previous one has responded.
In code measurement the same occurs: as we collect traces of
  operations, we won’t get new traces of what was wrong because we
  simply were unresponsive. This results in only keeping the “good”
  numbers.
Service Time vs Response Time.

The serial load generator/measurement shown in the part before was bad
  in that it only measured service time and did not include waiting
  time.
However latency is really composed of the total waiting + service
  times:
+---------------------+
| Response Time       | 
+------+--------------+
|      | Service Time |
+---------------------+
| Wait | Work         |
+------+--------------+


  Wait is pure waste. (Queuing to get service?)
  Work is productive. (Getting serviced)

Tip to verify whether your load generator/monitoring system measures actual response time:

  Push it hard. After a while the response time should grow linearly.
  Linear response is when the system reaches fragility.

Sustainable Throughput

Latency relating to throughput and load.
Figuring out how to avoid “saturation” is better than trying to study
  what happens after or at saturation. Figure out how fast are we while
  still being safe.
Comparing response time or latency behaviors

Differential comparison at various workloads. 40K/85K/ {90K bad}
  Looking at requirements can help comparing two setups/systems and
  assess whether which ones are more economical than others for the
  requirement.
Plot the max

Is a good visual summary for comparing two systems.

  
## ZZ_ReadingList.org

      
    Raw
  

              ZZ_ReadingList.org
            
          
  https://chadaustin.me/2017/05/writing-a-really-really-fast-json-parser/
  https://www.youtube.com/watch?v=lJ8ydIuPFeU

Test:

  http://telescopp.com/howto Orbit Profiler