@daverigby
Last active March 2, 2023 11:48

Step 1 - Gather profile

  1. Install debuginfo package (in addition to main RPM/deb)
  2. Install linux perf tools:
    • RHEL / CentOS / Amazon Linux: yum install perf.
    • Note: CentOS7 ships a fairly old version of perf, which has some issues decoding backtraces via DWARF. Prefer a newer version if possible. I have a local build of 5.11 for CentOS7 at ~/Documents/linux-perf-5.11-with-separate_debuginfo_unwind_fix
  3. Start workload to be measured
  4. Record profile
    1. x86-64:
      perf record -F 199 -g --call-graph dwarf -p $(pgrep memcached) -- sleep 30
      
    2. AArch64:
      perf record -F 999 -g -p $(pgrep memcached) -- sleep 30
      
  5. Convert the binary profile to readable text. Note: this must be done with the same memcached binary as used above, i.e. don't archive perf.data and convert it later; do this straight after profiling.
perf script -i perf.data > profile.linux-perf.txt
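
Steps 4 and 5 can be tied together in a small script. The following is a minimal sketch, not a tested tool: the function names `perf_record_opts` and `profile_memcached` are hypothetical, and it assumes a single memcached process, with `uname -m` reporting `x86_64` or `aarch64`/`arm64`. The options themselves are the ones listed in step 4.

```shell
#!/bin/sh
# Hypothetical helper: print the perf record options suggested in step 4
# for a given architecture string (as reported by `uname -m`).
perf_record_opts() {
    case "$1" in
        x86_64)
            # DWARF unwinding is expensive, so use the lower frequency.
            echo "-F 199 -g --call-graph dwarf" ;;
        aarch64|arm64)
            # Frame-pointer unwinding is cheap, so a higher frequency is fine.
            echo "-F 999 -g" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1 ;;
    esac
}

# Hypothetical wrapper: record the running memcached for N seconds
# (default 30), then convert to text straight away (step 5).
profile_memcached() {
    duration="${1:-30}"
    opts=$(perf_record_opts "$(uname -m)") || return 1
    # $opts is deliberately unquoted so it splits into separate arguments.
    perf record $opts -p "$(pgrep memcached)" -- sleep "$duration" &&
        perf script -i perf.data > profile.linux-perf.txt
}
```

With the workload already running, `profile_memcached 60` would record for 60 seconds and leave `profile.linux-perf.txt` in the current directory.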

Step 2 - Analyse

Good choices are:

Notes

Explanation of options:

  • -F : Sampling frequency (Hz). A non-round number reduces the chance of accidentally sampling in lockstep with periodic tasks in the process. A higher frequency can be better when running for a short period, but DWARF call-graph mode is expensive, so 199Hz is generally reasonable for DWARF. 999 is a good default when using frame pointers (the default on AArch64).
  • -g : Call-graph mode
  • --call-graph dwarf (x86-64): Use DWARF debugging information to determine call-graphs. While DWARF call-graph mode is slower and requires more space (hence the lower sampling frequency), it works with our existing binaries and, crucially, with libc / libpthread, which omit the frame pointer on x86-64 - i.e. even rebuilding kv_engine with -fno-omit-frame-pointer won't give complete backtraces.
  • -p : Process to sample.
  • -- sleep 30 : Duration to sample. 30s is normally long enough to get a useful amount of data without massive files (assuming memcached is busy). If memcached is idle or only slightly loaded, a longer duration (60s, 300s) might be needed.
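
To get a feel for the frequency / duration trade-off described above, the arithmetic can be sketched in shell. This is a rough estimate under stated assumptions, not a measurement: the helper names are hypothetical, the sample count is an upper bound for one thread that stays on-CPU for the whole run, and the size figure assumes perf's default user-stack dump of 8192 bytes per sample in DWARF mode (it ignores all other per-sample data).

```shell
#!/bin/sh
# Hypothetical helper: upper bound on samples for one continuously busy
# thread, given frequency (Hz) and duration (seconds).
estimate_samples() {
    echo $(( $1 * $2 ))
}

# Hypothetical helper: approximate MiB of stack dumps in DWARF mode,
# assuming the default 8192-byte dump per sample.
estimate_dwarf_stack_mib() {
    samples=$(estimate_samples "$1" "$2")
    echo $(( samples * 8192 / 1024 / 1024 ))
}

estimate_samples 199 30          # 5970
estimate_samples 999 30          # 29970
estimate_dwarf_stack_mib 199 30  # 46
```

Multiply by the number of busy threads for a whole-process figure; an idle memcached will produce far fewer samples, which is why a longer duration helps in that case.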

Kernel symbols

These are not normally needed, but can be useful if looking into syscall cost etc.

From https://hadibrais.wordpress.com/2017/03/13/installing-ubuntu-kernel-debugging-symbols/ :

codename=$(lsb_release -c | awk  '{print $2}')
sudo tee /etc/apt/sources.list.d/ddebs.list << EOF
deb http://ddebs.ubuntu.com/ ${codename}      main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-security main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-updates  main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-proposed main restricted universe multiverse
EOF
sudo apt-get update
sudo apt-get install linux-image-$(uname -r)-dbgsym