Last active July 1, 2024 20:54

Linux Perf Tools

Materials used to write this gist.


uptime
dmesg | tail
vmstat 1
mpstat -P ALL 1
pidstat 1
iostat -xz 1
free -m
sar -n DEV 1
sar -n TCP,ETCP 1


uptime for high-level load average.

Load average is the number of processes wanting to run, averaged over the last 1, 5, and 15 minutes. On Linux it also includes processes blocked in uninterruptible IO.
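A rough way to read the load average is to compare it with the number of CPUs. The sketch below assumes a "load greater than CPU count" rule of thumb; the function name `load_check` is made up for illustration. Because the load average also counts processes blocked in uninterruptible IO, a "saturated" result does not by itself prove that the CPUs are the bottleneck.

```shell
# load_check LOAD NCPU
# Prints "saturated" when the load average exceeds the CPU count, else "ok".
# The threshold is an assumed rule of thumb, not a hard limit.
load_check() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN {
        if (load + 0 > ncpu + 0) print "saturated"; else print "ok"
    }'
}

# Usage on a live Linux system (1-minute load from /proc/loadavg, CPU count from nproc):
#   load_check "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```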

dmesg system messages/errors.

Look for errors that can cause performance problems. "Out of memory", "TCP: ... dropping request", etc.

dmesg | grep oom-killer
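The messages quoted above can be searched for in one pass. This is a minimal sketch: the function name `dmesg_perf_errors` and the exact pattern list are assumptions, and the list should be extended for the workload at hand.

```shell
# dmesg_perf_errors: filter kernel log lines (read from stdin) that commonly
# accompany performance problems. The patterns are an assumed starting set.
dmesg_perf_errors() {
    grep -iE 'out of memory|oom-killer|dropping request'
}

# Usage on a live system:
#   dmesg | dmesg_perf_errors
```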

vmstat for virtual memory statistics.

  • r: Number of processes running or waiting. Doesn't include IO. IF r > num_cpu THEN saturation.
  • b: Number of processes blocked by IO.
  • free: Free memory in KiB.
  • buff: buffer cache, used for block device I/O.
  • cache: page cache, used by file systems.
  • si, so: Swap-ins and swap-outs. IF si,so != 0 THEN out_of_memory.
  • us, sy, id, wa, st: CPU time average
    • us: user time.
    • sy: system time. IF sy > 20% THEN the kernel may be processing IO inefficiently.
    • id: idle.
    • wa: IO wait.
    • st: stolen time, time taken by the hypervisor for other VMs.
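The IF/THEN rules above can be applied mechanically to vmstat output. The following is a sketch, not a replacement for reading the numbers: the function name `vmstat_flags` is hypothetical, and the thresholds are the ones from the list. Columns are located by header name, so extra columns in newer procps versions should not break it.

```shell
# vmstat_flags NCPU: read `vmstat` output on stdin and print one line per
# rule from the list above that fires on a data row.
vmstat_flags() {
    awk -v ncpu="$1" '
        # Header row: remember the position of each column by name.
        $1 == "r" { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows start with a number; the dashed banner row is skipped.
        $1 ~ /^[0-9]+$/ && ("r" in col) {
            r = $(col["r"]) + 0; si = $(col["si"]) + 0
            so = $(col["so"]) + 0; sy = $(col["sy"]) + 0
            if (r > ncpu + 0)     print "cpu_saturation r=" r
            if (si > 0 || so > 0) print "swapping si=" si " so=" so
            if (sy > 20)          print "high_system_time sy=" sy
        }'
}

# Usage on a live system:
#   vmstat 1 5 | vmstat_flags "$(nproc)"
```

Note that the first data row printed by vmstat shows averages since boot, so flags raised on that row describe history rather than the current second.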

mpstat -P ALL 1 for CPU time breakdown per CPU.

Look out for a single hot CPU, which can indicate a single-threaded application.
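Spotting a hot CPU by eye gets tedious on machines with many cores. The sketch below lists CPUs whose busy time (100 minus %idle) reaches a limit. The function name `hot_cpus` and the default limit of 90% are assumptions chosen for illustration.

```shell
# hot_cpus [LIMIT]: read `mpstat -P ALL` output on stdin and print every CPU
# whose busy time (100 - %idle) is at least LIMIT percent (default 90, assumed).
hot_cpus() {
    awk -v limit="${1:-90}" '
        # Header row: locate the CPU and %idle columns by name. This is redone
        # for every header, since the "Average:" block has a different offset.
        /%idle/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "CPU")   c = i
                if ($i == "%idle") d = i
            }
            next
        }
        # Per-CPU rows have a numeric CPU id; the "all" row is skipped.
        c && d && $c ~ /^[0-9]+$/ {
            busy = 100 - $d
            if (busy >= limit + 0) printf "cpu %s busy %.1f%%\n", $c, busy
        }'
}

# Usage on a live system:
#   mpstat -P ALL 1 5 | hot_cpus 90
```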

pidstat 1 for a rolling per-process summary.

Like top but prints a rolling summary instead of clearing the screen.

iostat -xz 1 disk usage analysis.

  • r/s: delivered reads per second
  • rkB/s: kB read per second
  • w/s: delivered writes per second
  • wkB/s: kB written per second
  • await: the average time for the IO in milliseconds (split into r_await and w_await in newer sysstat versions). Includes both time queued and time being serviced. IF high THEN device_saturation | device_problems
  • avgqu-sz: the average number of requests issued to the device (named aqu-sz in newer sysstat versions). IF avgqu-sz > 1 THEN could_be saturation. Still, multiple back-end disk devices can operate on requests in parallel.
  • %util: device utilization (busy %). IF %util > 60% THEN poor_performance (double check with await). IF %util ~= 100% THEN saturation
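The %util rules above can be checked automatically. This sketch only looks at %util, so await still has to be read by hand as the list recommends. The function name `iostat_flags` is hypothetical, and treating 99% as "approximately 100%" is an assumption. Columns are found by header name because the column set differs between sysstat versions.

```shell
# iostat_flags: read `iostat -xz` output on stdin and flag devices by %util.
# More than 60% is reported as busy, 99% or more as saturated (assumed cutoff).
iostat_flags() {
    awk '
        # Header row: index every column by name.
        $1 ~ /^Device/ { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Device rows have at least as many fields as the header position of
        # %util; the shorter avg-cpu rows and blank lines are skipped.
        ("%util" in col) && NF >= col["%util"] {
            u = $(col["%util"]) + 0
            if (u >= 99)     printf "%s saturated util=%.1f\n", $1, u
            else if (u > 60) printf "%s busy util=%.1f\n", $1, u
        }'
}

# Usage on a live system:
#   iostat -xz 1 5 | iostat_flags
```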

free -m -s 1 for memory usage (-m displays MiB, -h displays human-readable units, -s 1 refreshes every second).

  • buffers: buffer cache, used for block device I/O.
  • cached: page cache, used by file systems.
  • buff/cache: sum of buffers and cached.
  • available: estimate of memory that can be handed to applications without swapping, including cache that can be reclaimed quickly. IF buffers or cached ~= 0 THEN higher disk IO.
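The same figures can be pulled straight from /proc/meminfo, which is where free reads them. A minimal sketch follows; the function name `mem_summary` is made up, and it reports the raw Cached field, which is a simplification of what free shows in its cache column.

```shell
# mem_summary: read /proc/meminfo-style text on stdin and print available,
# buffer, and page cache memory in MiB (input values are in KiB).
mem_summary() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "Buffers:"      { buffers = $2 }
        $1 == "Cached:"       { cached = $2 }
        END {
            if (total + 0 > 0)
                printf "available=%dMiB (%.0f%%) buffers=%dMiB cached=%dMiB\n",
                       avail / 1024, 100 * avail / total,
                       buffers / 1024, cached / 1024
        }'
}

# Usage on a live Linux system:
#   mem_summary < /proc/meminfo
```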

sar -n DEV 1 for network interface throughput.

  • rxpck/s: number of packets received per second.
  • txpck/s: number of packets transmitted per second.
  • rxkB/s: number of kilobytes received per second.
  • txkB/s: number of kilobytes transmitted per second.
  • %ifutil: utilization percentage of the network interface. Could be unreliable.

sar -n TCP,ETCP 1 for summarized view of some key TCP metrics.

  • active/s: number of locally-initiated TCP connections per second (e.g., via connect()). (~ outbound)
  • passive/s: number of remotely-initiated TCP connections per second (e.g., via accept()). (~ inbound)
  • retrans/s: number of TCP retransmits per second. Sign of network or server issue.
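A retransmit count is easier to judge relative to the traffic sent. The sketch below computes retransmitted segments as a percentage of sent segments from the kernel's TCP counters in /proc/net/snmp. Unlike the per-second rates from sar, these counters are cumulative since boot, so this is a coarse long-term view. The function name `tcp_retrans_pct` is hypothetical.

```shell
# tcp_retrans_pct: read /proc/net/snmp-style text on stdin and print
# RetransSegs as a percentage of OutSegs (both cumulative since boot).
tcp_retrans_pct() {
    awk '
        # The first "Tcp:" line holds the field names, the second the values.
        $1 == "Tcp:" && !seen {
            for (i = 2; i <= NF; i++) col[$i] = i
            seen = 1
            next
        }
        $1 == "Tcp:" && ("OutSegs" in col) && ("RetransSegs" in col) {
            sent = $(col["OutSegs"]) + 0
            re   = $(col["RetransSegs"]) + 0
            if (sent > 0) printf "%.2f\n", 100 * re / sent
        }'
}

# Usage on a live Linux system:
#   tcp_retrans_pct < /proc/net/snmp
```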

perf sampling profiler.

You'll need a call graph:

  • --call-graph lbr - Last Branch Record: uses special hardware registers that store a limited call graph of the most recent branch instructions (expect around 32 entries). Very fast, but requires modern hardware (Intel Haswell or newer, ARMv9.2-A or newer).
  • --call-graph fp - uses the frame pointer to determine the call graph. Use it if your binary is built with frame pointers (-fno-omit-frame-pointer).
  • --call-graph dwarf - saves 8k of call stack per sample to be analyzed later together with debug info. Produces large records, which are extremely slow to process in perf report. Impractical at high sampling rates, so limit the sampling rate to 99 Hz with -F99.
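The trade-offs in the list above can be captured in a small helper that prints a perf record command line for a chosen mode without running it. This is a sketch under assumptions: the function name `perf_record_cmd` is made up, 99 Hz for dwarf follows the advice above, and 1000 Hz for lbr and fp is an assumed default that should be tuned for the workload.

```shell
# perf_record_cmd MODE PID [SECONDS]
# Prints (does not run) a perf record command for the given call-graph mode.
# Sampling rates: 99 Hz for dwarf, 1000 Hz for lbr and fp (assumed defaults).
perf_record_cmd() {
    mode=$1
    pid=$2
    secs=${3:-10}
    case $mode in
        lbr|fp) freq=1000 ;;
        dwarf)  freq=99 ;;
        *) echo "unknown call-graph mode: $mode" >&2; return 1 ;;
    esac
    echo "perf record -p $pid --call-graph $mode -F$freq -- sleep $secs"
}

# Usage: print the command, inspect it, then run it by hand.
#   perf_record_cmd dwarf 1234
```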

Example commands:

Attach to a running process and sample it for 10 seconds at a 1000 Hz sample rate with an LBR call graph. Creates a perf.data record.

perf record -p <pid> --call-graph lbr -F1000 -- sleep 10

Sample the whole system for 10 seconds with DWARF debug info, limiting the sampling rate to 99 Hz.

perf record -a --call-graph dwarf -F99 -- sleep 10

If run on a remote system, pack all the information needed to analyze the record later on a host system. This creates a .tar archive. Copy it together with perf.data to the host system.

perf archive

On the host system, unpack the .tar archive. This extracts it to ~/.debug.

perf archive --unpack

You can then generate a report from the perf.data recorded on the remote system.

perf report

perf-archive for Ubuntu

perf-archive is missing from all the Ubuntu perf packages. Get one from Linux source:

mkdir -p /usr/libexec/perf-core/
wget -O /usr/libexec/perf-core/perf-archive
chmod +x /usr/libexec/perf-core/perf-archive