@grahamking
Last active November 7, 2023 21:17
Bash script to run a benchmark under decent conditions.
#!/bin/bash
#
# Usage: runperf ./my-benchmark-binary
#
# Script to run a benchmark / performance test in decent conditions. Based on:
# - https://www.llvm.org/docs/Benchmarking.html
# - "Performance Analysis and Tuning on Modern CPUs" by Denis Bakhvalov, Appendix A.
# - https://github.com/andikleen/pmu-tools
#
# Note that this doesn't do any benchmarking itself; your binary must do that on its own.
# Instead, it sets up the machine so that your benchmark results are more repeatable.
#
# Usage with Rust and cargo-criterion (https://github.com/bheisler/cargo-criterion):
# Build the binary: `cargo criterion --bench my-bench --no-run`
# Run it: `runperf ./target/release/deps/my-bench-<hex-string> --bench`
#
# Setup
#
# (optional) mount input / output files in ramdisk to eliminate disk access variability
# mount -t tmpfs -o size=<XX>g none dir_to_mount
echo "Disable address space randomization"
echo 0 | sudo tee /proc/sys/kernel/randomize_va_space > /dev/null
echo "Disable kernel NMI watchdog"
sudo sysctl -q -w kernel.nmi_watchdog=0
echo "Allow access to perf events"
# Sometimes we run as ./runperf perf stat <the-binary>
echo -1 | sudo tee /proc/sys/kernel/perf_event_paranoid > /dev/null
echo "Set scaling governor to performance"
# If you don't have `cpupower`:
# for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# do
#     echo performance | sudo tee "$i" > /dev/null
# done
sudo cpupower frequency-set -g performance > /dev/null
echo "Disable turbo mode (short-term higher freq)"
# This is the biggest source of variability. For short runs the CPU can go much faster but it heats
# up and cannot sustain the boost, so later runs will be slower.
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo > /dev/null
echo "Move system processes to second two cores (and their hyper-threaded pairs)"
# Adjust this if your topology is different. I have four cores / eight threads.
#
# This only moves system stuff, not user stuff, so your terminal/browser/GNOME are all still using the shared CPUs.
# Ideally we would also do the same for `user.slice`, but then `taskset` later can't use the reserved CPUs.
# To mitigate we use `nice` to make our job higher priority.
sudo systemctl set-property --runtime system.slice AllowedCPUs=2,3,6,7
echo "Disable the hyper-threading pair of the reserved CPUs"
# Find the pair: cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list
echo 0 | sudo tee /sys/devices/system/cpu/cpu4/online > /dev/null
echo 0 | sudo tee /sys/devices/system/cpu/cpu5/online > /dev/null
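#
# A minimal sketch, not part of the original script, for adapting the two lines above to a
# different topology: print the sibling threads of each CPU you plan to reserve.
# `print_thread_siblings` and `SYSFS_CPU_DIR` are hypothetical names; the override exists only
# so the helper can be pointed at a different directory. Query this before taking any CPU
# offline, e.g. `print_thread_siblings 0 1`.
print_thread_siblings() {
    local cpu file
    for cpu in "$@"; do
        file="${SYSFS_CPU_DIR:-/sys/devices/system/cpu}/cpu${cpu}/topology/thread_siblings_list"
        if [ -r "$file" ]; then
            echo "cpu${cpu}: $(cat "$file")"
        else
            echo "cpu${cpu}: (no topology info)"
        fi
    done
}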
#
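# Run the benchmark
# . on our reserved CPUs so it doesn't migrate
# . re-niced so it is less likely to be context switched
# . as the invoking user ($USER) rather than root, with arguments quoted so spaces survive
#
taskset -c 0,1 sudo nice -n -5 runuser -u "$USER" -- "$@"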
# Monitor cpu-migrations and context-switches. They should both be 0.
# perf must come before nice so that nice raises the priority of the benchmark.
#taskset -c 0,1 perf stat -e context-switches,cpu-migrations sudo nice -n -5 "$@"
#
# Restore
#
echo "Restoring to non-perf settings"
echo 1 | sudo tee /sys/devices/system/cpu/cpu4/online > /dev/null
echo 1 | sudo tee /sys/devices/system/cpu/cpu5/online > /dev/null
sudo systemctl set-property --runtime system.slice AllowedCPUs=0-7
echo 0 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo > /dev/null
sudo cpupower frequency-set -g powersave > /dev/null
echo 0 | sudo tee /proc/sys/kernel/perf_event_paranoid > /dev/null
sudo sysctl -q -w kernel.nmi_watchdog=1
echo 2 | sudo tee /proc/sys/kernel/randomize_va_space > /dev/null