Brendan Gregg (brendangregg)

brendangregg / dockerpsns.sh
Last active August 23, 2023 10:11
docker ps --namespaces
#!/bin/bash
#
# dockerpsns - proof of concept for a "docker ps --namespaces".
#
# USAGE: ./dockerpsns.sh
#
# This lists containers, their init PIDs, and namespace IDs. If container
# namespaces equal the host namespace, they are colored red (this can be
# disabled by setting color=0 below).
#
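The comparison dockerpsns describes can be sketched in Python. This is a minimal sketch, assuming the Linux convention that each /proc/PID/ns/<type> symlink reads as "type:[inode]" and that two processes share a namespace when those inode numbers match. The helper names (parse_ns_link, read_namespaces, shared_with_host) and the list of namespace types are hypothetical choices for illustration, not taken from the script.

```python
import os
import re

# Namespace types to check (hypothetical selection; newer kernels add more).
NS_TYPES = ("cgroup", "ipc", "mnt", "net", "pid", "user", "uts")

def parse_ns_link(link):
    """Parse a namespace symlink target such as 'net:[4026531992]'."""
    m = re.fullmatch(r"(\w+):\[(\d+)\]", link)
    if m is None:
        raise ValueError(f"unexpected namespace link: {link!r}")
    return m.group(1), int(m.group(2))

def read_namespaces(pid):
    """Return {ns_type: inode} for a PID by reading /proc/PID/ns/*."""
    ids = {}
    for ns in NS_TYPES:
        try:
            target = os.readlink(f"/proc/{pid}/ns/{ns}")
        except OSError:
            continue  # type missing on this kernel, no such PID, or no permission
        ids[ns] = parse_ns_link(target)[1]
    return ids

def shared_with_host(container_ns, host_ns):
    """Namespace types whose IDs equal the host's (the ones to color red)."""
    return sorted(ns for ns, ino in container_ns.items() if host_ns.get(ns) == ino)
```

Usage: compare read_namespaces(1) (the host's init) with read_namespaces(pid) for a container's init PID, which `docker inspect --format '{{.State.Pid}}' <container>` reports. Reading another process's ns links generally needs root.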
brendangregg / chaintest.py
Last active May 26, 2023 09:55
chaintest
#!/usr/bin/python
#
# chaintest Summarize off-CPU time by kernel stack + 2 waker stacks
# WORK IN PROGRESS. For Linux, uses BCC, eBPF.
#
# USAGE: chaintest [-h] [-u] [-p PID] [-i INTERVAL] [-T] [duration]
#
# PLEASE DO NOT RUN THIS IN PRODUCTION! This is a work in progress, intended to
# explore chain graphs on Linux, using eBPF capabilities from a particular
# kernel version (4.3ish). This tool will eventually get much better.
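The off-CPU accounting behind chaintest can be illustrated without BCC. This is a simplified sketch over a hypothetical, pre-recorded event list: it records a timestamp and the blocked stack when a thread switches off-CPU, remembers the waker's stack at wakeup, and adds the elapsed time to a (blocked stack, waker stack) key when the thread switches back on. It keeps only one waker stack, whereas the tool summarizes two; the event tuple format is an assumption.

```python
from collections import defaultdict

def offcpu_by_stack(events):
    """Sum off-CPU time per (blocked stack, waker stack) pair.

    events: iterable of tuples in timestamp order (hypothetical format):
      ("off", ts_ns, tid, stack)         tid blocks with this kernel stack
      ("wake", ts_ns, tid, waker_stack)  another thread wakes tid
      ("on", ts_ns, tid)                 tid is switched back onto a CPU
    """
    start = {}   # tid -> (off-CPU timestamp, blocked stack)
    waker = {}   # tid -> waker stack recorded at wakeup
    totals = defaultdict(int)
    for ev in events:
        kind, ts, tid = ev[0], ev[1], ev[2]
        if kind == "off":
            start[tid] = (ts, ev[3])
            waker.pop(tid, None)          # discard any stale wakeup
        elif kind == "wake":
            waker[tid] = ev[3]
        elif kind == "on" and tid in start:
            off_ts, stack = start.pop(tid)
            key = (stack, waker.pop(tid, None))
            totals[key] += ts - off_ts    # time spent blocked
    return dict(totals)
```

Stacks are treated as opaque hashable values, so tuples of frame names work as well as strings.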
brendangregg / gist:eebe3455fd8e528bb14d193a93d43b59
Created August 16, 2016 23:45
tcp dport fetching with ftrace on linux 3.13 x86_64
Using my perf-tools just to wrap ftrace:
# ./perf-tools/bin/kprobe 'p:tcp_v4_connect skc_dport=+2(%si):u16'
Tracing kprobe tcp_v4_connect. Ctrl-C to end.
telnet-9723 [001] d... 62326244.175951: tcp_v4_connect: (tcp_v4_connect+0x0/0x480) skc_dport=1600
telnet-9725 [001] d... 62326246.502760: tcp_v4_connect: (tcp_v4_connect+0x0/0x480) skc_dport=1700
telnet-9726 [001] d... 62326247.861937: tcp_v4_connect: (tcp_v4_connect+0x0/0x480) skc_dport=100
telnet-9727 [001] d... 62326249.220740: tcp_v4_connect: (tcp_v4_connect+0x0/0x480) skc_dport=e803
Now a crappy ntohs() to process the dport string:
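A minimal Python equivalent of that byte swap, assuming the skc_dport values above are 16-bit values printed in hex that hold the port in network (big-endian) byte order, read on little-endian x86_64. The swap is written out explicitly rather than calling socket.ntohs(), so the result does not depend on the byte order of the machine running the script.

```python
def swap16(value):
    """Swap the two bytes of a 16-bit value (what ntohs() does on x86)."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)

def dport(hexstr):
    """Convert a hex-printed kprobe u16 into a host-order port number."""
    return swap16(int(hexstr, 16))

for raw in ("1600", "1700", "100", "e803"):
    print(raw, "->", dport(raw))
```

For the four values in the output above, this gives ports 22, 23, 1, and 1000.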
brendangregg / sysbench-analysis.txt
Created April 16, 2022 11:47
sysbench cpu x86 arm analysis
Some rough notes from an analysis of sysbench cpu on x86 vs ARM, which showed it was 2.6x faster on ARM only
because of a faster div instruction, which did not translate to a production win. The benchmark was misleading.
I also talked about this topic in my IntelON 2021 talk.
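Division speed matters here because sysbench's cpu test is, as far as I know, a trial-division prime search, so its inner loop is dominated by integer modulo operations. The sketch below is loosely modeled on that loop shape. It is not sysbench's source, and the function name is hypothetical. It counts primes in [3, max_prime] and also counts the modulo operations performed, which shows how the work scales with --cpu-max-prime.

```python
import math

def sysbench_like_primes(max_prime):
    """Count primes in [3, max_prime] by trial division.

    Returns (prime_count, modulo_ops); every candidate divisor costs one
    integer modulo, which compiles to a div instruction in C.
    """
    primes = 0
    mod_ops = 0
    for c in range(3, max_prime + 1):
        limit = math.isqrt(c)
        for divisor in range(2, limit + 1):
            mod_ops += 1
            if c % divisor == 0:
                break          # composite
        else:
            primes += 1        # no divisor up to sqrt(c): prime
    return primes, mod_ops
```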
m6g.4xl
=======
test bgregg-focal-arm-v000 us-east-1 i-0d6f5a0062ee66c4e
(root) ~ # sysbench --max-requests=10000000 --max-time=60 --test=cpu --cpu-max-prime=100000 run
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
WARNING: --max-requests is deprecated, use --events instead
brendangregg / decade.md
Last active April 15, 2022 04:08
decade quick benchmarks

These are some quick benchmarks for the "Decade of Wasted Cores" patches on Linux 4.1. I had to add "extern int sched_max_numa_distance;" to arch/x86/kernel/smpboot.c for Linux 4.1 to compile. During the benchmarks I did a brief analysis, using time(1) and mpstat(1) to check runtimes, usr/sys time, and per-CPU balance, and iostat(1) to check for disk bottlenecks.
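A per-CPU balance check like the one done with mpstat(1) can be reduced to a few numbers. This is a minimal sketch, assuming you have already extracted one busy percentage per CPU (for example, 100 minus %idle from `mpstat -P ALL`). The function name and the choice of summary statistics are illustrative, not the method used in these notes.

```python
import statistics

def cpu_balance(per_cpu_busy):
    """Summarize per-CPU busy percentages; a large spread suggests imbalance."""
    lo, hi = min(per_cpu_busy), max(per_cpu_busy)
    return {
        "min": lo,
        "max": hi,
        "mean": statistics.fmean(per_cpu_busy),
        "spread": hi - lo,
    }
```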

Summary: no significant difference seen in these tests.

c3.8xlarge (32 CPU) PV 1-node NUMA

The patch shouldn't make a difference to this 1-node system, but I felt it worth checking, especially since most of our systems are 1-node.

brendangregg / gist:f8ed5345cfc903599a60
Created August 5, 2014 01:08
dynamic tracing of ZFS on Linux, on Linux

So I just found ZFS on my test Linux Ubuntu system, and gave my perf-tools (https://github.com/brendangregg/perf-tools) a spin.

Per-second zfs* calls:

# ./funccount -Ti 1 'zfs*'
Tracing "zfs*"... Ctrl-C to end.

Tue Aug  5 00:51:41 UTC 2014
FUNC                              COUNT
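The FUNC/COUNT summary above comes from ftrace. As I understand it, perf-tools funccount uses the ftrace function profiler, which keeps per-CPU hit counts in trace_stat/function0, function1, and so on, under the tracing directory. The sketch below sums those per-CPU files into one count per function. The file layout it parses (a "Function Hit Time ..." header, a dashed separator, then rows starting with the function name and hit count) is an assumption; check it against your kernel. Reading the files from /sys is left out so the logic stays testable.

```python
from collections import Counter

def parse_function_stat(text):
    """Parse one trace_stat/functionN file into Counter({function: hits})."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # header, dashed separator, or blank line
        counts[fields[0]] += int(fields[1])
    return counts

def sum_per_cpu(stat_texts):
    """Add up hit counts across the per-CPU functionN files."""
    total = Counter()
    for text in stat_texts:
        total.update(parse_function_stat(text))  # Counter.update adds counts
    return total

def report(total):
    """Print a FUNC/COUNT table sorted by ascending count."""
    print(f"{'FUNC':<34}{'COUNT':>8}")
    for func, count in sorted(total.items(), key=lambda kv: kv[1]):
        print(f"{func:<34}{count:>8}")
```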
brendangregg / fsmicrobench.md
Last active February 16, 2022 08:25
some FS micro-benchmarks

F1. FS 128k streaming writes

Benchmark: fio write

Command: fio --name=seqwrite --rw=write --bs=128k --size=4g --end_fsync=1 --loops=4 # aggrb tput

Rationale: Measure the performance of a single-threaded streaming write of a reasonably large file. The aim is to measure how well the file system and platform can sustain a write workload, which will depend on how well it can group and dispatch writes. It's difficult to benchmark buffered file system writes both briefly and repeatably, as performance greatly depends on if and when the page cache begins to flush dirty data. As a workaround, fsync() is called at the end of the benchmark to ensure that flushing always occurs, and the benchmark repeats four times. While this provides a much more reliable measurement, it is somewhat worst-case (applications don't always fsync), so the result is closer to the minimum rate you should expect than to the maximum rate.
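To make the workload concrete, here is a rough Python approximation of the fio job: single-threaded sequential 128 KB writes over one file, with an fsync() at the end of each pass, reporting aggregate bytes per second. This is a sketch, not a replacement for fio. It fsyncs after every pass, which is this sketch's choice rather than a statement about how fio combines end_fsync with loops. It writes random bytes so that a compressing file system can't shrink the work.

```python
import os
import time

def streaming_write(path, block_size=128 * 1024, size=4 * 1024**3, loops=4):
    """Sequentially write `size` bytes to `path` `loops` times, fsyncing at
    the end of each pass. Returns aggregate throughput in bytes/second."""
    block = os.urandom(block_size)  # incompressible payload
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(loops):
            f.seek(0)  # overwrite the same file on each pass
            remaining = size
            while remaining > 0:
                chunk = block if remaining >= block_size else block[:remaining]
                f.write(chunk)
                remaining -= len(chunk)
            f.flush()             # hand Python's buffer to the page cache
            os.fsync(f.fileno())  # force dirty pages out to storage
    elapsed = max(time.perf_counter() - start, 1e-9)
    return size * loops / elapsed
```

Run it against a path on the file system under test. Defaults mirror --bs=128k --size=4g --loops=4.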

F2. FS cached 4k random reads

brendangregg / usdt
Created January 29, 2016 17:24
usdt (ftrace)
#!/bin/bash
#
# usdt - trace user statically defined tracepoints. User-level dynamic tracing.
# Written using Linux ftrace. Experimental.
#
# WARNING: This is a proof of concept for USDT tracing from Linux ftrace, and
# is not safe to use in production environments. In particular, the -i option
# sets memory semaphores by piping the output of printf through dd and then
# to process memory via /proc/PID/mem. Yes, this program pipes the output of
# the shell directly over top of live process memory. If you don't understand
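The semaphore addresses that the -i option writes to come from the probe metadata in the binary. The sketch below decodes one .note.stapsdt ELF note descriptor. It assumes the 64-bit little-endian layout used by SystemTap's sys/sdt.h: probe PC, link-time address of the .stapsdt.base section, semaphore address (0 if the probe has none), then three NUL-terminated strings for provider, probe name, and argument format. The semaphore is a 16-bit counter that the probe site checks, and a tracer enables the probe by incrementing it in the target's memory. Locating the note in the ELF file, and converting link-time addresses to runtime addresses for PIE or shared objects, are not shown. The function name is hypothetical.

```python
import struct

def parse_stapsdt_desc(desc):
    """Decode one .note.stapsdt descriptor (64-bit, little-endian)."""
    pc, base, semaphore = struct.unpack_from("<QQQ", desc, 0)
    provider, name, args = desc[24:].split(b"\0")[:3]
    return {
        "pc": pc,                # address of the probe site (a nop)
        "base": base,            # link-time .stapsdt.base, for prelink fixups
        "semaphore": semaphore,  # address of the 16-bit enable counter
        "provider": provider.decode(),
        "name": name.decode(),
        "args": args.decode(),   # e.g. "-4@%edi": size@location per argument
    }
```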
brendangregg / cpuunclaimed.py
Last active February 14, 2019 01:46
cpuunclaimed.py draft
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# cpuunclaimed Sample CPU run queues and calculate unclaimed idle CPU.
# For Linux, uses BCC, eBPF.
#
# This samples the length of the run queues and determines when there are idle
# CPUs, yet queued threads waiting their turn. It reports the amount of idle
# (yet unclaimed by waiting threads) CPU as a system-wide percentage.
#
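The calculation described above can be sketched over already-collected samples. This is a sketch under stated assumptions, not necessarily the tool's exact logic. Each sample is a list of run-queue lengths, one per CPU, where a length counts the running thread plus any queued ones. Idle CPUs are counted as "unclaimed" only up to the number of threads waiting elsewhere in that same sample.

```python
def unclaimed_idle_pct(samples):
    """samples: list of per-sample lists of run-queue lengths (one per CPU).

    Returns unclaimed idle CPU as a percentage of all CPU-samples.
    """
    unclaimed = 0
    total = 0
    for rq_lens in samples:
        running = sum(1 for n in rq_lens if n > 0)      # CPUs doing work
        queued = sum(n - 1 for n in rq_lens if n > 1)   # threads waiting
        idle = len(rq_lens) - running
        unclaimed += min(idle, queued)  # idle CPUs a waiter could have used
        total += len(rq_lens)
    return 100.0 * unclaimed / total if total else 0.0
```

For example, one sample [0, 3, 1, 0] has two idle CPUs and two waiting threads, so it contributes 2 unclaimed out of 4 CPU-samples.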
brendangregg / biosnoop.py
Created August 22, 2016 18:35
biosnoop with 4.7 fix
#!/usr/bin/python
# @lint-avoid-python-3-compatibility-imports
#
# biosnoop Trace block device I/O and print details including issuing PID.
# For Linux, uses BCC, eBPF.
#
# This uses in-kernel eBPF maps to cache process details (PID and comm) by I/O
# request, as well as a starting timestamp for calculating I/O latency.
#
# Copyright (c) 2015 Brendan Gregg.
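The bookkeeping that biosnoop's comment describes (cache a start timestamp and the issuing PID/comm per request, then compute latency when the request completes) can be shown with plain Python dicts standing in for the in-kernel eBPF hash maps. This is a minimal sketch: the class and method names are hypothetical, and the request key stands in for whatever identifies a request in the kernel (a pointer to the request struct, for example).

```python
class IOTracker:
    """Per-request state keyed by a request identifier, like biosnoop's maps."""

    def __init__(self):
        self.start_ts = {}  # request key -> start timestamp (ns)
        self.who = {}       # request key -> (pid, comm) of the issuer

    def issue(self, req, ts_ns, pid, comm):
        self.start_ts[req] = ts_ns
        self.who[req] = (pid, comm)

    def complete(self, req, ts_ns):
        start = self.start_ts.pop(req, None)
        if start is None:
            return None  # start was missed, e.g. tracing began mid-I/O
        pid, comm = self.who.pop(req, (0, "?"))
        return {"pid": pid, "comm": comm, "lat_ms": (ts_ns - start) / 1e6}
```

Deleting entries on completion keeps the maps from growing without bound, which matters in-kernel, where map sizes are fixed.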