- Install the debuginfo package (in addition to the main RPM/deb).
- Install the Linux perf tools:
  - RHEL / CentOS / Amazon Linux:
    yum install perf
  - Note: CentOS 7 ships a fairly old version of perf, which has some issues decoding backtraces via DWARF. Prefer a newer version if possible. I have a local build of 5.11 for CentOS 7 at ~/Documents/linux-perf-5.11-with-separate_debuginfo_unwind_fix

- Start the workload to be measured.
- Record a profile:
  - x86-64:
    perf record -F 199 -g --call-graph dwarf -p $(pgrep memcached) -- sleep 30
  - AArch64:
    perf record -F 999 -g -p $(pgrep memcached) -- sleep 30
- Convert the binary profile to a readable text format. Note: this must be done with the same memcached binary as was profiled above, i.e. don't archive the perf.data and convert it later; do the conversion straight after profiling.
  perf script -i perf.data > profile.linux-perf.txt
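The record and convert steps above can be wrapped in one script, which also makes it harder to accidentally convert a perf.data against a different binary later. What follows is a minimal bash sketch, not an existing tool: the function names perf_major_version, perf_record_args and profile_memcached are hypothetical, and the "perf older than 5.x" warning threshold is an assumption based only on the 5.11 build mentioned above. It uses pgrep -d, so that multiple memcached PIDs reach perf record -p as the comma-separated list that option accepts.

```shell
#!/usr/bin/env bash
# Sketch: record a memcached profile and convert it immediately, so that
# perf script resolves symbols against the same binaries that were profiled.
# Function names and the "perf < 5" warning threshold are illustrative assumptions.

# Print the major version of the installed perf, e.g. "3" for
# "perf version 3.10.0-1160.el7.x86_64".
perf_major_version() {
    perf --version | awk '{ split($3, v, "."); print v[1] }'
}

# Print the perf record options suggested above for an architecture
# name as reported by uname -m.
perf_record_args() {
    case "$1" in
        x86_64)  echo "-F 199 -g --call-graph dwarf" ;;
        aarch64) echo "-F 999 -g" ;;
        *)       echo "no recommended options for: $1" >&2; return 1 ;;
    esac
}

# Record for the given number of seconds (default 30), then convert straight away.
profile_memcached() {
    local duration=${1:-30} pids args
    # -d, joins multiple PIDs with commas, the list format perf record -p accepts.
    pids=$(pgrep -d, memcached)
    if [ -z "$pids" ]; then
        echo "memcached is not running" >&2
        return 1
    fi
    args=$(perf_record_args "$(uname -m)") || return 1
    if [ "$(perf_major_version)" -lt 5 ] 2>/dev/null; then
        echo "warning: old perf; DWARF backtraces may be incomplete" >&2
    fi
    # $args is deliberately unquoted so that it splits into separate options.
    perf record $args -p "$pids" -- sleep "$duration" || return 1
    perf script -i perf.data > profile.linux-perf.txt
}
```

Usage, assuming the functions are saved to a file of your choosing and sourced: run profile_memcached 60 for a 60s profile of a lightly loaded node. Output files land in the current directory, as with the manual commands.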
Load profile.linux-perf.txt into a profile viewer. Good choices are:
- http://speedscope.app - Good for single-thread viewing (colour-coded, different views)
- https://profiler.firefox.com - Good for multi-threaded viewing (shows multiple threads on a Gantt chart, supports filtering, and can save profiles and share links to them).
perf record options used above:
- -F : Sampling frequency (Hz). A non-round number reduces the chance of accidentally sampling in lock-step with periodic tasks in the process. A higher frequency can be better when recording for a short period, but DWARF call-graph mode is expensive, so 199 Hz is generally reasonable for DWARF. 999 Hz is a good default when using frame pointers (the default on AArch64).
- -g : Record call graphs (backtraces) for each sample.
- --call-graph dwarf (x86-64): Use DWARF debugging information to determine call graphs. DWARF call-graph mode is slower and requires more space (hence the lower sampling frequency), but it works with our existing binaries and, crucially, with libc / libpthread, which omit the frame pointer on x86-64; i.e. even rebuilding kv_engine with -fno-omit-frame-pointer won't give complete backtraces.
- -p : Process to sample.
- -- sleep 30 : Duration to sample. 30s is normally long enough to collect a useful amount of data without producing massive files (assuming memcached is busy). If memcached is idle or only lightly loaded, a longer duration (60s, 300s) might be needed.
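A rough size estimate shows why DWARF mode pushes the frequency down: in DWARF mode each sample carries a copy of the user stack (8192 bytes by default, adjustable with --call-graph dwarf,<size>), whereas a frame-pointer sample only records return addresses. The helper below is a hypothetical back-of-envelope calculation, not a perf feature. It assumes every busy thread is on-CPU for the whole recording (so it is an upper bound) and ignores sample headers and register dumps; the thread count in the example is made up.

```shell
# Back-of-envelope upper bound for the size of a DWARF-mode perf.data file.
# Assumes each busy thread is sampled at the full frequency for the whole
# duration and that each sample carries a full stack dump (perf's default
# dump size is 8192 bytes). Headers and register dumps are ignored.
estimate_dwarf_bytes() {
    local busy_threads=$1 freq_hz=$2 duration_s=$3 stack_bytes=${4:-8192}
    echo $(( busy_threads * freq_hz * duration_s * stack_bytes ))
}

# Hypothetical example: 4 busy worker threads, 30 s recording.
for freq in 199 999; do
    bytes=$(estimate_dwarf_bytes 4 "$freq" 30)
    echo "${freq} Hz: ~$(( bytes / 1024 / 1024 )) MiB"
done
# prints:
# 199 Hz: ~186 MiB
# 999 Hz: ~936 MiB
```

The roughly 5x difference is why 199 Hz is the suggested rate for DWARF, while 999 Hz stays affordable with frame pointers.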
Kernel debug symbols are not normally needed, but can be useful if looking into syscall cost etc. To install them on Ubuntu:
From https://hadibrais.wordpress.com/2017/03/13/installing-ubuntu-kernel-debugging-symbols/ :
codename=$(lsb_release -c | awk '{print $2}')
sudo tee /etc/apt/sources.list.d/ddebs.list << EOF
deb http://ddebs.ubuntu.com/ ${codename} main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-security main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-proposed main restricted universe multiverse
EOF
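# Signing key for ddebs.ubuntu.com (package available on Ubuntu 18.04 and newer)
sudo apt-get install ubuntu-dbgsym-keyring
sudo apt-get update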
sudo apt-get install linux-image-$(uname -r)-dbgsym
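Once the dbgsym package is installed, perf should pick up the kernel symbols without extra flags, because its default vmlinux search path includes /usr/lib/debug/boot/vmlinux-<release>, which is where Ubuntu's linux-image-*-dbgsym packages place the symbol-bearing kernel image. The following is a minimal check assuming that layout. have_kernel_dbgsym is a hypothetical helper, and its optional root argument exists only so it can be pointed at a scratch directory.

```shell
# Check whether Ubuntu kernel debug symbols are present for a kernel release
# (defaults to the running kernel). Assumes the dbgsym package installs
# vmlinux-<release> under /usr/lib/debug/boot/; the optional second
# argument prefixes an alternative root directory.
have_kernel_dbgsym() {
    local release=${1:-$(uname -r)} root=${2:-}
    [ -f "${root}/usr/lib/debug/boot/vmlinux-${release}" ]
}

if have_kernel_dbgsym; then
    echo "kernel debug symbols found for $(uname -r)"
else
    echo "no kernel debug symbols for $(uname -r)" >&2
fi
```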