@pelletier
Last active June 3, 2021 13:03
Benchmarking Go on AMD

While benchmarking https://github.com/pelletier/go-toml/tree/v2, I decided to play with CPU frequency scaling, to eliminate some noise in the benchmarks.

Running on the following:

goos: linux
goarch: amd64
cpu: AMD Ryzen 9 5950X 16-Core Processor
kernel: 5.12.8-300.fc34.x86_64

Setting the CPU scaling governor to performance and disabling boost seems to provide a steady 3.4 GHz on all cores.

As the results show, this significantly speeds up the program (31% to 43% less time per operation), so earlier benchmarks need to be re-run under the same settings.

Set CPU scaling to max

# echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

Disable boosting

https://www.kernel.org/doc/Documentation/cpu-freq/boost.txt

# echo 0 > /sys/devices/system/cpu/cpufreq/boost
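To confirm that both writes took effect, something like the following could be used. It is only a sketch: `cpu_state` is a made-up helper name, and `CPU_SYSFS` is a hypothetical override (not a kernel or system variable) that exists so the function can be pointed at a test directory; by default it reads the same sysfs paths as the commands above.

```shell
# Print the boost flag and how many CPUs use each scaling governor.
# CPU_SYSFS is a hypothetical override for testing; it defaults to the
# real sysfs location used by the commands above.
cpu_state() {
    root="${CPU_SYSFS:-/sys/devices/system/cpu}"
    if [ -r "$root/cpufreq/boost" ]; then
        echo "boost: $(cat "$root/cpufreq/boost")"
    else
        # Not every cpufreq driver exposes this file.
        echo "boost: unavailable"
    fi
    # Group the per-CPU governor files by value and count them.
    cat "$root"/cpu*/cpufreq/scaling_governor 2>/dev/null | sort | uniq -c |
        while read -r count gov; do
            echo "governor: $gov ($count cpus)"
        done
}
```

Calling `cpu_state` before changing anything also records the values to restore afterwards: write the previous governor name back to `scaling_governor` and `1` back to `boost`.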

Checking CPU

$ grep "cpu MHz" /proc/cpuinfo 
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
cpu MHz		: 3400.000
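With 32 logical CPUs the listing gets long, so a one-line summary can be easier to eyeball. A minimal sketch, assuming the `cpu MHz` line format shown above; `mhz_summary` is a made-up helper name, not an existing tool.

```shell
# Summarize the "cpu MHz" lines of cpuinfo-formatted text read from stdin:
# number of entries, lowest and highest reported frequency (rounded to MHz).
mhz_summary() {
    awk -F': *' '
    /^cpu MHz/ {
        v = $2 + 0
        if (n == 0 || v < min) min = v
        if (n == 0 || v > max) max = v
        n++
    }
    END {
        if (n == 0) { print "no cpu MHz lines found"; exit 1 }
        printf "cpus=%d min=%.0f max=%.0f\n", n, min, max
    }'
}
```

Run it as `mhz_summary < /proc/cpuinfo`. When the frequency is pinned, `min` and `max` should match; a gap between them suggests scaling or boost is still active.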

Results

name                                old time/op    new time/op     delta
UnmarshalDataset/config-32            35.5ms ± 5%     20.4ms ± 1%  -42.68%  (p=0.000 n=10+9)
UnmarshalDataset/canada-32             111ms ± 6%       72ms ± 1%  -35.40%  (p=0.000 n=10+10)
UnmarshalDataset/citm_catalog-32      42.5ms ± 2%     25.1ms ± 1%  -40.79%  (p=0.000 n=9+10)
UnmarshalDataset/twitter-32           15.7ms ± 2%      9.4ms ± 1%  -39.97%  (p=0.000 n=10+9)
UnmarshalDataset/code-32               129ms ± 4%       89ms ± 1%  -31.26%  (p=0.000 n=10+10)
UnmarshalDataset/example-32            279µs ± 1%      163µs ± 1%  -41.69%  (p=0.000 n=9+10)
Unmarshal/SimpleDocument/struct-32     950ns ± 5%      580ns ± 1%  -38.95%  (p=0.000 n=10+10)
Unmarshal/SimpleDocument/map-32       1.37µs ± 8%     0.83µs ± 1%  -39.29%  (p=0.000 n=10+10)
Unmarshal/ReferenceFile/struct-32     60.5µs ± 7%     36.0µs ± 1%  -40.51%  (p=0.000 n=10+10)
Unmarshal/ReferenceFile/map-32         105µs ± 3%       59µs ± 1%  -43.40%  (p=0.000 n=9+10)
Unmarshal/HugoFrontMatter-32          20.8µs ± 9%     12.4µs ± 0%  -40.27%  (p=0.000 n=10+9)

name                                old speed      new speed       delta
UnmarshalDataset/config-32          29.5MB/s ± 5%   51.5MB/s ± 1%  +74.35%  (p=0.000 n=10+9)
UnmarshalDataset/canada-32          19.8MB/s ± 6%   30.7MB/s ± 1%  +54.69%  (p=0.000 n=10+10)
UnmarshalDataset/citm_catalog-32    13.1MB/s ± 2%   22.2MB/s ± 1%  +68.87%  (p=0.000 n=9+10)
UnmarshalDataset/twitter-32         28.2MB/s ± 2%   47.0MB/s ± 1%  +66.57%  (p=0.000 n=10+9)
UnmarshalDataset/code-32            20.8MB/s ± 4%   30.3MB/s ± 1%  +45.42%  (p=0.000 n=10+10)
UnmarshalDataset/example-32         29.1MB/s ± 1%   49.8MB/s ± 1%  +71.48%  (p=0.000 n=9+10)
Unmarshal/SimpleDocument/struct-32  11.6MB/s ± 5%   19.0MB/s ± 1%  +63.70%  (p=0.000 n=10+10)
Unmarshal/SimpleDocument/map-32     8.04MB/s ± 9%  13.22MB/s ± 1%  +64.38%  (p=0.000 n=10+10)
Unmarshal/ReferenceFile/struct-32   86.8MB/s ± 7%  145.7MB/s ± 1%  +67.76%  (p=0.000 n=10+10)
Unmarshal/ReferenceFile/map-32      50.0MB/s ± 3%   88.4MB/s ± 1%  +76.66%  (p=0.000 n=9+10)
Unmarshal/HugoFrontMatter-32        26.3MB/s ± 9%   43.9MB/s ± 0%  +67.06%  (p=0.000 n=10+9)

name                                old alloc/op   new alloc/op    delta
UnmarshalDataset/config-32            5.91MB ± 0%     5.91MB ± 0%     ~     (p=0.504 n=9+9)
UnmarshalDataset/canada-32            84.4MB ± 0%     84.4MB ± 0%     ~     (p=0.381 n=10+10)
UnmarshalDataset/citm_catalog-32      35.6MB ± 0%     35.6MB ± 0%     ~     (p=0.123 n=10+10)
UnmarshalDataset/twitter-32           13.5MB ± 0%     13.5MB ± 0%     ~     (p=0.051 n=10+8)
UnmarshalDataset/code-32              22.2MB ± 0%     22.2MB ± 0%     ~     (p=0.306 n=10+10)
UnmarshalDataset/example-32            193kB ± 0%      193kB ± 0%     ~     (p=0.323 n=10+9)
Unmarshal/SimpleDocument/struct-32      597B ± 0%       597B ± 0%     ~     (all equal)
Unmarshal/SimpleDocument/map-32         973B ± 0%       973B ± 0%     ~     (all equal)
Unmarshal/ReferenceFile/struct-32     11.6kB ± 0%     11.6kB ± 0%     ~     (all equal)
Unmarshal/ReferenceFile/map-32        28.9kB ± 0%     28.9kB ± 0%     ~     (all equal)
Unmarshal/HugoFrontMatter-32          7.39kB ± 0%     7.39kB ± 0%     ~     (all equal)

name                                old allocs/op  new allocs/op   delta
UnmarshalDataset/config-32              233k ± 0%       233k ± 0%     ~     (all equal)
UnmarshalDataset/canada-32              782k ± 0%       782k ± 0%     ~     (all equal)
UnmarshalDataset/citm_catalog-32        192k ± 0%       192k ± 0%     ~     (p=0.137 n=8+10)
UnmarshalDataset/twitter-32            56.9k ± 0%      56.9k ± 0%     ~     (p=0.086 n=9+9)
UnmarshalDataset/code-32               1.06M ± 0%      1.06M ± 0%     ~     (all equal)
UnmarshalDataset/example-32            1.36k ± 0%      1.36k ± 0%     ~     (all equal)
Unmarshal/SimpleDocument/struct-32      7.00 ± 0%       7.00 ± 0%     ~     (all equal)
Unmarshal/SimpleDocument/map-32         12.0 ± 0%       12.0 ± 0%     ~     (all equal)
Unmarshal/ReferenceFile/struct-32        182 ± 0%        182 ± 0%     ~     (all equal)
Unmarshal/ReferenceFile/map-32           649 ± 0%        649 ± 0%     ~     (all equal)
Unmarshal/HugoFrontMatter-32             143 ± 0%        143 ± 0%     ~     (all equal)
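The time and speed deltas look very different (about -40% versus about +65%), but they describe the same change. A quick sanity check, using my own notation (the δ symbols are not part of the tool output) and the rounded figures from the tables:

```latex
\[
  \text{speed} = \frac{\text{bytes}}{\text{time}}
  \quad\Longrightarrow\quad
  1 + \delta_{\text{speed}} = \frac{1}{1 + \delta_{\text{time}}}
\]
\[
  \text{config: } \frac{1}{1 - 0.4268} \approx 1.745 \;(+74.5\%),
  \qquad
  \text{code: } \frac{1}{1 - 0.3126} \approx 1.455 \;(+45.5\%)
\]
```

These land close to the reported +74.35% and +45.42%. The small gap is presumably because each column is averaged over its own samples, so the two deltas are not exact reciprocals of each other.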

The old runs had boost (turbo) enabled and used the schedutil governor; the new runs use the performance governor with boost disabled, as set up above.

Nothing new under the sun, just a reminder for myself to set that up when benchmarking.

CPU pin / nice

Experimentally, this has the least variance on my machine for this project:

export GOMAXPROCS=1; nice -n -19 taskset -c 4 go test -run=Nothing ./benchmark -bench=Unmarshal -count 10
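For repeated runs, the command above could be wrapped in a small function. This is a sketch, not part of the original setup: `run_bench` and the `DRY_RUN` variable are hypothetical names, and the CPU number and count simply default to the values used above.

```shell
# Run the pinned benchmark: run_bench [cpu] [count]
# DRY_RUN is a hypothetical switch: when set, print the command instead of
# running it, which is handy for checking the arguments.
run_bench() {
    cpu="${1:-4}"
    count="${2:-10}"
    # Rebuild the positional parameters as the command line to execute.
    set -- nice -n -19 taskset -c "$cpu" \
        go test -run=Nothing ./benchmark -bench=Unmarshal -count "$count"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "GOMAXPROCS=1 $*"
    else
        GOMAXPROCS=1 "$@"
    fi
}
```

For example, `run_bench 4 10 | tee new.txt` keeps the output for a later comparison; assuming the tables above were produced with benchstat, `benchstat old.txt new.txt` would give the same kind of report. Note that a negative niceness such as `-19` normally requires root.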