- Run cyclictest, hackbench, and count primes 1e12, concurrently.
- Run algorithm3 and speedometer 2.1 benchmarks, concurrently.
- Run hackbench -pTl 40000
- Run hackbench -l 10000
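The first two steps run several programs at the same time. A minimal sketch of one way to do that from a single shell follows; the helper name `run_concurrently` is hypothetical and not part of any of the repos mentioned here.

```shell
#!/bin/bash
# Sketch: start every command in the background, then wait for all of them.
# Each argument is one complete command line.
run_concurrently() {
    local pids=() cmd pid status=0
    for cmd in "$@"; do
        bash -c "$cmd" &
        pids+=("$!")
    done
    # Collect each exit status; report failure if any command failed.
    for pid in "${pids[@]}"; do
        wait "$pid" || status=1
    done
    return "$status"
}
```

Assuming the per-step scripts were saved under the hypothetical names `cyclictest.sh` and `hackbench.sh`, step one could then be started as `run_concurrently ./cyclictest.sh ./hackbench.sh "./algorithm3.pl 1e12"`.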
Hackbench runs fastest with HZ=750 under load. However, counting primes is noticeably slower at that setting. HZ=800 turns out to work well and improves smoothness.
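Before comparing runs, it helps to confirm which tick rate the running kernel was built with. The sketch below reads `CONFIG_HZ` from a kernel config file. The default path `/boot/config-$(uname -r)` is an assumption that does not hold on every distribution, so the function also accepts a path; the name `kernel_hz` is hypothetical.

```shell
#!/bin/bash
# Sketch: print the numeric CONFIG_HZ value from a kernel config file.
# Assumption: the config is a plain-text file; the default location below
# varies between distributions, so pass the real path as the first argument.
kernel_hz() {
    local config=${1:-/boot/config-$(uname -r)}
    # Match only the "CONFIG_HZ=<number>" line, not CONFIG_HZ_800=y and friends.
    sed -n 's/^CONFIG_HZ=\([0-9][0-9]*\)$/\1/p' "$config"
}
```

Calling `kernel_hz /path/to/config` prints just the number, for example the value that follows `CONFIG_HZ=` in that file.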
On rare occasions, my NVIDIA RTX 3070 fails to initialize one of the displays. The workaround has been to reboot the machine. Perhaps enabling HZ_800 will help resolve the issue; I will know over time.
hackbench time 29.611 29.825 28.726 29.668 30.371 28.545 29.487 29.665
count primes 20.339 20.309 20.425 20.425 20.280 20.193 20.576 20.235
cyclictest
max latency 1772 1947 2729 2671 2177 1913 2701 1493
avg latency 455 422 543 663 567 547 734 466
speedometer 2.1 179 179 173 176
algorithm3 4e12 69.734 69.973 70.217 70.207
taskset -c 0-7 hackbench -pTl 40000 9.526 11.143 9.561
taskset -c 0-7 hackbench -l 10000 5.707 5.729 5.711
hackbench time 28.017 28.458 28.521 30.447 28.149 29.753 28.221 29.578
count primes 20.341 20.346 20.449 20.329 20.289 20.192 20.656 20.323
cyclictest
max latency 1770 1997 2191 1649 1764 2312 1372 1877
avg latency 528 412 519 434 562 627 283 502
speedometer 2.1 180 179 180 178
algorithm3 4e12 69.876 70.046 69.968 70.185
taskset -c 0-7 hackbench -pTl 40000 9.357 9.234 9.280
taskset -c 0-7 hackbench -l 10000 5.679 5.722 5.667
Note: A future ClearMod update replaces HZ_750 with HZ_720, which resolves the count primes regression.
hackbench time 25.127 25.348 25.204 25.188 25.705 26.204 26.756 26.612
count primes 23.495 22.940 22.614 22.364 22.120 22.002 21.954 21.646
cyclictest
max latency 1940 2218 1995 1667 1898 1727 2845 1878
avg latency 382 557 492 428 488 437 620 377
speedometer 2.1 177 179 180 178
algorithm3 4e12 69.275 69.690 69.744 69.911
taskset -c 0-7 hackbench -pTl 40000 9.715 11.155 9.425
taskset -c 0-7 hackbench -l 10000 5.774 5.694 5.735
Note: A future ClearMod update replaces HZ_600 with HZ_625, which gives better hackbench results.
hackbench time 28.849 28.798 29.715 29.788 29.538 27.608 29.760 29.752
count primes 20.284 20.384 20.395 20.425 20.318 20.255 20.672 20.346
cyclictest
max latency 1877 1114 1691 1595 2178 3325 2002 1974
avg latency 419 354 577 446 555 634 505 501
speedometer 2.1 174 174 175 171
algorithm3 4e12 70.392 70.161 70.174 70.190
taskset -c 0-7 hackbench -pTl 40000 9.465 10.639 9.624
taskset -c 0-7 hackbench -l 10000 5.698 5.691 5.694
hackbench time 27.686 27.716 28.441 29.355 27.844 29.766 28.547 29.673
count primes 20.292 20.264 20.413 20.339 20.374 20.158 20.582 20.335
cyclictest
max latency 1802 2536 2217 1901 2121 1991 2760 2119
avg latency 440 589 544 486 441 466 758 511
speedometer 2.1 182 178 176 176
algorithm3 4e12 69.755 70.086 70.075 70.164
taskset -c 0-7 hackbench -pTl 40000 10.635 9.471 10.532
taskset -c 0-7 hackbench -l 10000 5.683 5.681 5.656
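Each row above lists several runs, so reducing a row to its mean makes the settings easier to compare. A minimal sketch using awk follows; the function name `mean` is hypothetical, and the example values are the three `hackbench -l 10000` timings from the first results block.

```shell
#!/bin/bash
# Sketch: arithmetic mean of the numbers given as arguments, three decimals.
mean() {
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.3f\n", sum / NR }'
}

# The three "hackbench -l 10000" timings from the first results block.
mean 5.707 5.729 5.711
```

A mean hides outliers such as the occasional 11-second `hackbench -pTl 40000` run, so it is worth reading the raw rows alongside it.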
The CPU is an AMD Ryzen Threadripper 3970X with 3600 MHz DDR4 memory.
A patch will be included in my clearmod repo for adding HZ_800, HZ_750, and HZ_600 to the XanMod kernel.
The algorithm3 program is located in my mce-sandbox repo. It exercises CPU, memory, and socket pairs for IPC.
Running cyclictest for step one (requires sudo or root):
#!/bin/bash
max_latencies=$(
    taskset -c 0-61 cyclictest -m -Sp99 -i200 -h400 -D30 -q 2>&1 |\
    grep "^# Max Latencies:" | cut -f2 -d:
)
./tally.sh $max_latencies
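To see what the `grep` and `cut` stages pass on to tally.sh, the same extraction can be applied to a sample line. The line below is made up for illustration and is not measured output. It only follows the shape the script expects: a `# Max Latencies:` prefix followed by zero-padded values.

```shell
#!/bin/bash
# Sketch: the extraction stage of the script above, applied to a made-up line.
# The numbers are illustrative only, not real measurements.
sample='# Max Latencies: 00012 00034 00007'

extract_max_latencies() {
    # Keep the summary line, then drop everything up to the first colon.
    grep "^# Max Latencies:" | cut -f2 -d:
}

printf '%s\n' "$sample" | extract_max_latencies
```

The result keeps its leading space and the zero padding, which is why tally.sh squeezes spaces with `tr -s` and strips leading zeros with `sed`.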
Running hackbench for step one:
#!/bin/bash
exec taskset -c 0-31 ./hackbench -s 512 -l 51200 -P
Counting prime numbers for step one:
cd /path/to/mce-sandbox/bin
./algorithm3.pl 1e12
The tally.sh script:
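#!/bin/bash
# Print the maximum and the average of the numbers given as arguments.
line=$*
if [[ -z "$line" ]]; then
    echo "usage: $0 num1 num2 ..." >&2
    exit 1
fi
# One value per line, leading zeros stripped, largest value first.
max=$(
    echo "$line" | tr -s ' ' '\n' | sed -e 's/^0\+//mg' | sort -rn |\
    awk '{ print $1; exit }'
)
# Same preparation, then sum the values and divide by their count.
avg=$(
    echo "$line" | tr -s ' ' '\n' | sed -e 's/^0\+//mg' |\
    awk '{ t = t + $1; c += 1 } END { print t / c }'
)
echo "max latency: $max"
echo "avg latency: $avg"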
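The two pipelines in tally.sh can also be folded into one awk pass. The following is a sketch of that alternative, not the script used for the results above; the function name `tally` is hypothetical. It relies on awk reading a zero-padded string such as `00034` as the decimal number 34 when it is used in arithmetic.

```shell
#!/bin/bash
# Sketch: single-pass variant that computes max and average together.
tally() {
    if [ "$#" -eq 0 ]; then
        echo "usage: tally num1 num2 ..." >&2
        return 1
    fi
    printf '%s\n' "$@" | awk '
        # Adding 0 forces numeric conversion, so leading zeros are ignored.
        { v = $1 + 0; sum += v; if (NR == 1 || v > max) max = v }
        END { print "max latency: " max; print "avg latency: " sum / NR }'
}
```

It takes the same arguments as tally.sh, for example `tally $max_latencies`, and avoids the separate `tr`, `sed`, and `sort` stages.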