Debug

Methodologies

Workload characterization

  1. Who is causing the load (e.g., PID, process name, UID, IP address)?
  2. Why is the load called (code path, stack trace, flame graph)?
  3. What is the load (IOPS, throughput, type)?
  4. How is the load changing over time (per-interval summaries)?

Drill-down

  1. Start examining the highest level.
  2. Examine next-level details.
  3. Pick the most interesting breakdown or clue.
  4. If the problem is unsolved, go back to step 2.

USE Method

For every resource, check:

  • Utilization
  • Saturation
  • Errors

Hardware targets for USE method analysis.
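
As a minimal sketch, the USE questions for CPUs map onto standard tools (the dmesg check for errors is an assumption here, covering machine-check events in the kernel log):

mpstat -P ALL 1   # Utilization: per-CPU busy time
vmstat 1          # Saturation: run-queue length ("r") versus CPU count
dmesg | tail      # Errors: e.g., machine-check exceptions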

Checklist

60s analysis

  1. uptime

  2. dmesg | tail

  3. vmstat 1

    r: The number of processes running on CPU and waiting for a turn. This provides a better signal than load averages for determining CPU saturation, as it does not include I/O. To interpret: an "r" value greater than the CPU count indicates saturation.

    free: Free memory, in Kbytes. If there are too many digits to count, you probably have enough free memory. The free -m command (step 7 below) better explains the state of free memory.

    si and so: Swap-ins and swap-outs. If these are non-zero, you’re out of memory. These are only in use if swap devices are configured.

    us, sy, id, wa, and st: These are breakdowns of CPU time, on average, across all CPUs. They are user time, system time (kernel), idle, wait I/O, and stolen time (by other guests, or, with Xen, the guest’s own isolated driver domain).

  4. mpstat -P ALL 1

    This command prints per-CPU time broken down into states. A single hot CPU (e.g., one CPU at 100% user time while the others are idle) is evidence of a single-threaded bottleneck.

    Also look out for high %iowait time, which can be explored with disk I/O tools, and high %sys time, which can be explored with syscall and kernel tracing, as well as CPU profiling.

  5. pidstat 1

  6. iostat -xz 1

  7. free -m

  8. sar -n DEV 1

  9. sar -n TCP,ETCP 1

  10. top

To run these tools inside a Kubernetes pod, attach an ephemeral debug container that targets the container's process namespace:

kubectl debug -n ${NAMESPACE} -it ${POD} --image=debian:10.9-slim --target=${CONTAINER}
apt update && apt install -y htop sysstat procps

BCC tool checklist

  1. execsnoop
  2. opensnoop
  3. ext4slower (or btrfs*, xfs*, zfs*)
  4. biolatency
  5. biosnoop
  6. cachestat
  7. tcpconnect
  8. tcpaccept
  9. tcpretrans
  10. runqlat
  11. profile

CPUs

perf

We can generate flame graphs directly with the flamegraph tool from flamegraph-rs.

E.g.:

flamegraph -o /tmp/timer_flamegraph.svg timer 10s

execsnoop

execsnoop(8) is a BCC and bpftrace tool that traces new process execution system-wide. It can find issues of short-lived processes that consume CPU resources and can also be used to debug software execution, including application start scripts.
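
A sketch of typical invocations (BCC version; flags vary between the BCC and bpftrace versions):

execsnoop        # trace all new processes system-wide; Ctrl-C to stop
execsnoop -t     # include a timestamp column for correlating with other output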

exitsnoop

exitsnoop(8) is a BCC tool that traces when processes exit, showing their age and exit reason. The age is the time from process creation to termination, and includes time both on and off CPU. Like execsnoop(8), exitsnoop(8) can help debug issues of short-lived processes, providing different information to help understand this type of workload.
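
Typical usage (BCC; the -x flag is per recent BCC versions):

exitsnoop        # print PID, age, and exit reason for every process exit
exitsnoop -x     # only show abnormal exits (non-zero exit codes and fatal signals)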

runqlat

runqlat(8) is a BCC and bpftrace tool for measuring CPU scheduler latency, often called run queue latency (even when no longer implemented using run queues). It is useful for identifying and quantifying issues of CPU saturation, where there is more demand for CPU resources than they can service. The metric measured by runqlat(8) is the time each thread (task) spends waiting for its turn on CPU.
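
For example, as a BCC invocation (interval and count arguments as in other BCC histogram tools):

runqlat 10 1     # one 10-second histogram of run queue latency, in microseconds
runqlat -m 5     # millisecond histograms, printed every 5 seconds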

runqlen

runqlen(8) is a BCC and bpftrace tool for sampling the length of the CPU run queues, counting how many tasks are waiting their turn, and presenting this as a linear histogram. This can be used to further characterize issues of run queue latency or as a cheaper approximation.
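
Example usage (BCC; the -C flag is per recent BCC versions):

runqlen 10 1     # one 10-second linear histogram of run queue length
runqlen -C 1     # per-CPU histograms, printed every second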

runqslower

runqslower(8) is a BCC tool that lists instances of run queue latency exceeding a configurable threshold and shows the process that suffered the latency and its duration.
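
Example usage (BCC; the default threshold is 10000 microseconds):

runqslower       # report threads that waited longer than 10 ms for a CPU
runqslower 1000  # lower the threshold to 1000 microseconds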

cpudist

cpudist(8) is a BCC tool for showing the distribution of on-CPU time for each thread wakeup. This can be used to help characterize CPU workloads, providing details for later tuning and design decisions.
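
Example usage (BCC):

cpudist 10 1        # on-CPU time per wakeup as a histogram, one 10-second summary
cpudist -O -m 10 1  # off-CPU time instead, in milliseconds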

cpufreq

cpufreq(8) samples the CPU frequency and shows it as a system-wide histogram, with per-process name histograms. This only works for CPU scaling governors that change the frequency, such as powersave, and can be used to determine the clock speed at which your applications are running.
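
A usage sketch, assuming the bpftrace version from Brendan Gregg's bpf-perf-tools-book repository:

./cpufreq.bt     # sample CPU frequency; Ctrl-C prints the histograms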

profile

Flame graphs are visualizations of stack traces that can help you quickly understand profile(8) output.

To support flame graphs, profile(8) can produce output in folded format using -f: Stack traces are printed on one line, with functions separated by semicolons. For example, writing a 30-second profile to an out.stacks01 file and including kernel annotations (-a):

cd /tmp
git clone https://github.com/brendangregg/FlameGraph
cd FlameGraph

profile -af 30 > out.stacks01
./flamegraph.pl --color=java < out.stacks01 > out.svg

offcputime

offcputime(8) is a BCC and bpftrace tool to summarize time spent by threads blocked and off CPU, showing stack traces to explain why. For CPU analysis, this tool explains why threads are not running on a CPU. It’s a counterpart to profile(8); between them, they show the entire time spent by threads on the system: on-CPU time with profile(8) and off-CPU time with offcputime(8).

cd /tmp
git clone https://github.com/brendangregg/FlameGraph
cd FlameGraph

offcputime -fKu 5 > out.offcputime01.txt
./flamegraph.pl --hash --bgcolors=blue --title="Off-CPU Time Flame Graph" \
    < out.offcputime01.txt > out.offcputime01.svg

syscount

syscount(8) is a BCC and bpftrace tool for counting system calls system-wide. It can be a starting point for investigating cases of high system CPU time.
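
Example usage (BCC; -L is per recent BCC versions):

syscount         # count syscalls system-wide; Ctrl-C prints the totals
syscount -P      # count by process instead of by syscall
syscount -L      # also sum the time spent in each syscall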

argdist and trace

argdist(8) and trace(8) are BCC tools that can examine events in custom ways. As a follow-on from syscount(8), if a syscall is found to be called frequently, you can use these tools to examine it in more detail.

For example, if the read(2) syscall is frequent in the syscount(8) output, you can use argdist(8) to summarize its arguments and return value by instrumenting either the syscall tracepoint or its kernel functions. For the tracepoint, you need to find the argument names, which the BCC tool tplist(8) prints out with the -v option:

# tplist -v syscalls:sys_enter_read
syscalls:sys_enter_read
    int __syscall_nr;
    unsigned int fd;
    char * buf;
    size_t count;
argdist -H 't:syscalls:sys_enter_read():int:args->count'

funccount

funccount(8) is a BCC tool that can frequency-count functions and other events. It can be used to provide more context for software CPU usage, showing which functions are called and how frequently. profile(8) may be able to show that a function is hot on CPU, but it can’t explain why: whether the function is slow, or whether it was simply called millions of times per second.
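
For example, counting kernel functions by wildcard (patterns as supported by BCC funccount(8)):

funccount 'vfs_*'           # count VFS calls until Ctrl-C
funccount -i 1 'tcp_send*'  # per-second counts of matching TCP functions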

softirqs

softirqs(8) is a BCC tool that shows the time spent servicing soft IRQs (soft interrupts). The system-wide time in soft interrupts is readily available from different tools. For example, mpstat(1) shows it as %soft. There is also /proc/softirqs to show counts of soft IRQ events. The BCC softirqs(8) tool differs in that it can show time per soft IRQ rather than event count.
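
Example usage (BCC):

softirqs 10 1    # total soft IRQ time over 10 seconds, by soft IRQ type
softirqs -d 10 1 # event durations as histograms instead of totals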

hardirqs

hardirqs(8) is a BCC tool that shows time spent servicing hard IRQs (hard interrupts). The system-wide time in hard interrupts is readily available from different tools. For example, mpstat(1) shows it as %irq. There is also /proc/interrupts to show counts of hard IRQ events. The BCC hardirqs(8) tool differs in that it can show time per hard IRQ rather than event count.
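
Example usage (BCC):

hardirqs 10 1    # total hard IRQ time over 10 seconds, by interrupt
hardirqs -C 10 1 # count events instead of summing their time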

smpcalls

smpcalls(8) is a bpftrace tool to trace and summarize time in the SMP call functions (also known as cross calls). These are a way for one CPU to run functions on other CPUs, including all other CPUs, which can become an expensive activity on large multi-processor systems.
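
A usage sketch, assuming the version from Brendan Gregg's bpf-perf-tools-book repository:

./smpcalls.bt    # summarize time in SMP call functions; Ctrl-C prints the histograms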

llcstat

llcstat(8) is a BCC tool that uses PMCs (performance monitoring counters) to show last-level cache (LLC) miss rates and hit ratios by process.
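
Example usage (BCC; the optional duration argument is in seconds):

llcstat 20       # sample for 20 seconds, then print per-PID hit ratios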

Memory

Traditional tools

Tool   | Type                                                    | Description
dmesg  | Kernel log                                              | OOM killer event details
swapon | Kernel statistics                                       | Swap device usage
free   | Kernel statistics                                       | System-wide memory usage
ps     | Kernel statistics                                       | Process statistics, including memory usage*
pmap   | Kernel statistics                                       | Process memory usage by segment
vmstat | Kernel statistics                                       | Various statistics, including memory
sar    | Kernel statistics                                       | Can show page fault and page scanner rates*
perf   | Software events, hardware statistics, hardware sampling | Memory-related PMC statistics and event sampling

*ps output:

  • %MEM: The percentage of the system’s physical memory in use by this process
  • VSZ: Virtual memory size
  • RSS: Resident set size: the total physical memory in use by this process

*sar: use sar -B 1 to report the paging statistics.

Profiler (TODO)

heaptrack: memory profiler for Linux
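
A usage sketch (myapp is a placeholder; the recorded file name printed by heaptrack will differ):

heaptrack ./myapp                          # run the target under heaptrack
heaptrack_print heaptrack.myapp.12345.gz   # text summary of the recorded profile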

Page faults

bpftrace -e 'software:major-faults:1 { @[comm, pid] = count(); }'   # count major faults by process

bpftrace -e 'software:major-faults:1 { printf("%s %d\n", comm, pid); }'   # print each major fault as it occurs

Filesystem

Traditional tools (TODO)

Tool    | Description
df      | Filesystem usage and capacity
mount   | Mounted filesystems and their options
strace  | Syscall tracing, including filesystem syscalls
perf    | Tracepoint and PMC analysis
fatrace | A specialized tracer that uses the Linux fanotify API (file access notify)
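
A sketch of fatrace(1) usage (run as root):

fatrace          # print comm, PID, event type, and path for file accesses
fatrace -c       # restrict to the mount point of the current directory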

Opensnoop

opensnoop(8) is a BCC and bpftrace tool that traces file opens. It is useful for discovering the location of data files, log files, and configuration files. It can also discover performance problems caused by frequent opens, or help troubleshoot issues caused by missing files.
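
Example usage (BCC; the PID is a placeholder):

opensnoop          # trace open(2)-family calls system-wide; Ctrl-C to stop
opensnoop -x       # only show failed opens, useful for missing-file issues
opensnoop -p 1234  # restrict to a single process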

Disks

biolatency

biolatency(8) is a BCC and bpftrace tool to show block I/O device latency as a histogram. The term device latency refers to the time from issuing a request to the device, to when it completes, including time spent queued in the operating system.
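
Example usage (BCC):

biolatency 10 1     # one 10-second histogram of block I/O latency, in microseconds
biolatency -D 10 1  # separate histograms per disk device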

biosnoop

biosnoop(8) is a BCC and bpftrace tool that prints a one-line summary for each disk I/O.
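
Example usage (BCC; -Q is per recent BCC versions):

biosnoop         # one line per I/O: time, comm, PID, disk, type, sector, bytes, latency
biosnoop -Q      # also show OS queued time as a separate column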
