@navarrothiago
Created September 6, 2021 17:03
bpftool features from Quentin Monnet

bpftool feature probe kernel

  1. "bpftool prog show" is used to list all BPF programs currently loaded on the system (loaded != attached)

  2. load a BPF program from ELF file “foo.o” to the system and pin it under the BPF virtual file system as “bar”:

# bpftool prog load foo.o /sys/fs/bpf/bar

pinning the program makes it persistent (and offers a handle for later management, e.g. to attach that program to a hook).

  1. dump bytecode for a program loaded on the system, as “translated” instructions:

# bpftool prog dump xlated id 40

“translated” means after kernel rewrites (as opposed to “llvm-objdump -d my_program_objfile.o”).

  1. dump JIT-compiled instructions for a BPF program (here from its pinned handle):

# bpftool prog dump jited pinned /sys/fs/bpf/foo

Obviously, works only for programs loaded when JIT is on.

  1. bpftool is not just about BPF programs, you can also manage BPF maps. Here is how to list the maps on the system:

# bpftool map show

As for programs, the alias “bpftool map list” does the same. “bpftool map show id 7” shows info just for the map with the given id.

  1. let's inspect BPF maps by retrieving one entry, here the second entry of an array map with:

# bpftool map lookup id 182 key 0x01 0x00 0x00 0x00

Or dump all entries of a given map:

# bpftool map dump id 182

note the use of host endianness for passing the key
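To make the endianness note concrete, here is a minimal sketch of a helper that spells out a 32-bit array index as the four bytes bpftool expects after “key”. The function name u32_to_key_le is hypothetical, and the sketch assumes a little-endian host (e.g. x86-64); on a big-endian host the byte order would be reversed.

```shell
#!/bin/sh
# Hypothetical helper (not part of bpftool): print a 32-bit unsigned
# integer as four bytes, least significant byte first.
# Assumption: the host is little-endian.
u32_to_key_le() {
    n=$1
    printf '0x%02x 0x%02x 0x%02x 0x%02x\n' \
        $(( n & 0xff )) \
        $(( (n >> 8) & 0xff )) \
        $(( (n >> 16) & 0xff )) \
        $(( (n >> 24) & 0xff ))
}

u32_to_key_le 1      # index 1, i.e. the second entry of an array map
u32_to_key_le 258    # 258 = 0x0102, so the low byte 0x02 comes first
```

The result can be passed unquoted so that the shell splits it into four arguments, e.g. bpftool map lookup id 182 key $(u32_to_key_le 1).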

  1. bpftool can print its output formatted as JSON. Use the “-j” (or “--json”) switch when typing commands to get a one-line JSON dump, or use “-p” (long option name: “--pretty”) to produce human-readable JSON with indent and line breaks.

  2. it is possible to use bpftool to create a map:

# bpftool map create /sys/fs/bpf/stats_map type array key 4 value 32 entries 8 name stats_map

map is pinned under the BPF virtual file system (or it would be lost when bpftool exits, as no BPF program uses it yet).

  1. update an entry of a compatible map type:

# bpftool map update id 7 key 3 0 0 0 value 1 1 168 192

"bpftool map update" is also used to create new entries, and "bpftool map delete" to remove them. Hash maps support it, but fixed-length arrays can only be updated.

  1. bpftool has a “hex” keyword to indicate that the numbers in a command's key/value are hexadecimal. All syntaxes below are equivalent:
# bpftool map lookup id 7 ...
    ... key 3 15 32 64
    ... key 0x3 0xf 0x20 0x40
    ... key 0x03 0x0f 0x20 0x40
    ... key hex 03 0f 20 40
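As a quick sanity check of that equivalence, the sketch below normalises each spelling to decimal bytes in plain shell. The helper key_to_decimal is hypothetical and only mimics the parsing rule described above (a leading “hex” keyword means base 16 for every following token); it does not call bpftool.

```shell
#!/bin/sh
# Hypothetical helper (not part of bpftool): normalise a bpftool-style
# byte list to space-separated decimal values.
key_to_decimal() {
    prefix=""
    if [ "$1" = "hex" ]; then
        prefix="0x"     # "hex" keyword: treat every token as base 16
        shift
    fi
    out=""
    for tok in "$@"; do
        # printf '%d' understands the 0x prefix, so it does the conversion
        out="$out $(printf '%d' "$prefix$tok")"
    done
    echo "${out# }"     # drop the leading space
}

# The four spellings from the list above all denote the same four bytes.
key_to_decimal 3 15 32 64
key_to_decimal 0x3 0xf 0x20 0x40
key_to_decimal 0x03 0x0f 0x20 0x40
key_to_decimal hex 03 0f 20 40
```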
  1. let's pin a BPF program to the BPF virtual file system, e.g. to keep it loaded once detached:

# bpftool prog pin id 27 /sys/fs/bpf/foo_prog

remove with “rm /sys/fs/bpf/foo_prog”. Also works for maps.

  1. once loaded, BPF programs of certain types can be attached with bpftool. This is the case of programs attached to sockets with:

# bpftool prog attach <program> <attach type> <target map>

or to cgroups, with:

# bpftool cgroup attach <cgroup> <attach type> <program> [flags]

  1. bpftool can show the programs attached to a given cgroup:

# bpftool cgroup show <cgroup>

it can iterate over cgroups and show all programs:

# bpftool cgroup tree [cgroup-root]

with no argument it defaults to the cgroup v2 mountpoint
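To see which directory that default would be on a given machine, one option is to look up the cgroup v2 mount in the mounts table. This is only a sketch: cgroup2_mountpoint is a hypothetical helper, and reading /proc/mounts this way approximates the lookup rather than reproducing bpftool's own code.

```shell
#!/bin/sh
# Hypothetical helper (not part of bpftool): print the first cgroup v2
# mountpoint found in a mounts table (default: /proc/mounts).
# In that table, field 2 is the mountpoint and field 3 the filesystem type.
cgroup2_mountpoint() {
    awk '$3 == "cgroup2" { print $2; exit }' "${1:-/proc/mounts}" 2>/dev/null
}

root=$(cgroup2_mountpoint)
echo "cgroup v2 root: ${root:-not mounted}"
```

The printed path is what one would pass explicitly as [cgroup-root] to get the same result as the no-argument form.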

  1. all tracing BPF programs currently attached on the system (to tracepoints, raw_tracepoints, k[ret]probes, u[ret]probes):

# bpftool perf show

“bpftool perf list” or simply “bpftool perf” produce the same output

  1. bpftool can be used to iterate over BPF map elements (this is especially useful with hash maps, with no predictable array indices):

# bpftool map getnext id 27 key 1 0 0 10

Returns the key of the “next” entry.

if no key is provided, it returns the “first” key from the map.

  1. Linux 5.1 introduces stats for attached BPF programs: total run time and run count. bpftool prints them with classic info dump:

# bpftool prog show

gathering stats impacts performance (~10 to 30 nsecs/run), so it defaults to off. Activate with:

# sysctl -w kernel.bpf_stats_enabled=1
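The two counters are most useful when combined into an average cost per invocation. A minimal sketch of that arithmetic follows; avg_ns_per_run is a hypothetical helper, the input figures are made up for illustration, and the field names mentioned in the comments (run_time_ns, run_cnt) are an assumption about how bpftool labels the counters.

```shell
#!/bin/sh
# Hypothetical helper (not part of bpftool): average run time per
# invocation in nanoseconds, given total run time and run count
# (assumed to be shown by bpftool as run_time_ns and run_cnt).
avg_ns_per_run() {
    total_ns=$1
    runs=$2
    if [ "$runs" -eq 0 ]; then
        echo 0          # program never ran: avoid dividing by zero
    else
        echo $(( total_ns / runs ))   # integer division, rounds down
    fi
}

# Made-up figures: 1500 ns spent over 100 runs.
avg_ns_per_run 1500 100
```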

  1. similarly to bpftool cgroup tree or bpftool perf show, bpftool has a mode to dump programs related to network processing:

# bpftool net show

This lists programs attached to TC or XDP hooks.

It is possible to filter on a given interface:

# bpftool net show dev <iface>

  1. load a program and reuse existing maps, here two of them (instead of automatically creating new ones):
# bpftool prog load foo.o /sys/fs/bpf/foo_prog \
        map idx 0 id 27 \
        map name stats pinned /sys/fs/bpf/stats_map

(“idx 0”: index of the map in the ELF program file)

  1. for object files with more than one BPF program, bpftool can load all of them at once:

# bpftool prog loadall bpf_flow.o /sys/fs/bpf/flow type flow_dissector

This is especially useful when working with BPF tail calls. Maps can be pinned by adding “pinmaps ”.

  1. there is a batch mode in bpftool for running several commands at once:

# bpftool batch file <file>

it can read commands from standard input if file is a dash (printf is used here because not every shell's echo expands “\n”):

# printf 'prog show\nmap show\nnet show\n' | bpftool batch file -
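For anything longer than a one-liner, keeping the commands in a file is usually easier to maintain. The sketch below writes the same three subcommands, one per line, to a temporary file and feeds it to batch mode. The variable name batch_file and the root/availability guard are choices made for this example, not something bpftool requires.

```shell
#!/bin/sh
# Write three bpftool subcommands, one per line, to a temporary batch file.
batch_file=$(mktemp)
cat > "$batch_file" <<'EOF'
prog show
map show
net show
EOF

# Only run the batch when bpftool is installed and we are root,
# since these commands normally need elevated privileges.
if command -v bpftool >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    bpftool batch file "$batch_file"
else
    echo "bpftool unavailable or not root; batch file kept at $batch_file"
fi
```

The temporary file is left in place so it can be inspected or rerun; remove it with rm when done.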

  1. bpftool can update “prog_array” maps (holding references to BPF programs, for BPF tail calls).
# bpftool map update pinned /sys/fs/bpf/my_prog_array_map \
        key 0 0 0 0 value pinned /sys/fs/bpf/my_prog

map MUST be pinned for this to work.

  1. bpftool can dump the C source code of a program in addition to BPF/jited insns.
# bpftool prog load xxx.o /sys/fs/bpf/xxx type classifier pinmaps /sys/fs/bpf/xxx_maps

# bpftool prog dump xlated pinned /sys/fs/bpf/xxx

The program must be compiled with a recent clang and the -g flag.

BTF also provides info on the structure of map entries, printable with:

# bpftool map dump pinned /sys/fs/bpf/xxx_maps/<map_name>
  1. bpftool can dump the trace pipe, used by BPF helper bpf_trace_printk() to print debug output.

# bpftool prog tracelog

shorter than “cat /sys/kernel/debug/tracing/trace_pipe”.

  1. perf events are used to stream data to user space and bpftool can dump this data:

# bpftool map event_pipe <MAP> [cpu <N> index <M>]

  1. there are also stack and queue maps in BPF and we can use bpftool to manipulate them. Because such maps don't rely on keys (only values), it differs somewhat from “bpftool map lookup/update”:
# bpftool map pop/dequeue/peek <map>
# bpftool map push/enqueue <map> value <val>
  1. bpftool also works for BPF hardware offload. You can list, load, dump, etc. programs and maps offloaded to a SmartNIC. You can also probe BPF features supported by the hardware:

# bpftool feature probe dev <ifname>

  1. bpftool has a “--bpffs” option (short name: “-f”) to print the path(s), if any, where those objects are pinned in the virtual file system:
# bpftool prog show --bpffs
# bpftool -f map
  1. bpftool just got support for dumping BTF information for BPF programs or maps, for a loaded BTF object, or from an object file containing one.

# bpftool btf dump <btf_source>

  1. bpftool can list all BTF objects loaded in the system:

# bpftool btf [show|list]

in addition to seeing BTF object attached to a given program or map

  1. bpftool can be used to “freeze” maps (make them read-only from user space, permissions unchanged from BPF program side):

# bpftool map freeze id 1337

  1. like “ip link”, bpftool can attach programs to the XDP hook (and later detach them):
# bpftool net attach xdp id 42 dev eth0
# bpftool net detach xdp dev eth0

program must be loaded already.

xdpgeneric/xdpdrv/xdpoffload variants also supported.

  1. bpftool can generate a “skeleton” header file from a BPF program for inclusion in user space apps managing this BPF prog:

$ bpftool gen skeleton bpf_prog.o > user_prog.h

then include "user_prog.h".

details in “bpftool-gen” man page.

  1. bpftool can attach progs (fentry/fexit) to entry/exit of BPF programs and use perf events to collect stats.

# bpftool prog profile <prog> <metrics>

  1. bpftool can list/dump/register/unregister BPF-implemented “struct ops”, used to replace kernel operations:

# bpftool struct_ops ...

example: “struct tcp_congestion_ops” for custom TCP congestion algos.

see also bpftool-struct_ops man page.

  1. “bpf_link” abstraction is used to represent and manage links between BPF programs and hooks. bpftool can show or pin (to bpffs) such links:
# bpftool link show
# bpftool link pin id 27 /sys/fs/bpf/my_link
  1. BPF “iterators” use “seq_ops” to help iterate on kernel data (think /proc-like info created with BPF). To work with such iterators, bpftool got an “iter” subcommand:
# bpftool iter pin <objfile.o> <bpffs_path>
# cat <bpffs_path>
  1. bpftool now supports “map iterators” to apply in-kernel filtering, aggregation, etc. to eBPF map entries before dumping them:
# bpftool iter pin <objfile.o> <bpffs_path> map <map_handle>
# cat <bpffs_path>
  1. BPF programs support custom metadata:

https://t.co/EGLqnhBMh6

  1. static linking: progs in multiple ELF object files can be linked into a single one with bpftool:

$ bpftool gen object output.o input1.o input2.o ...

able to link functions, subprograms, ..., defined in independent .o files.

eBPF libraries coming soon?
