Last active Aug 29, 2015
Writing benchmarks with nonius

Writing benchmarks is not easy. Nonius simplifies some of the work, but you still need to take care with several details. Understanding a few things about the way nonius runs your code will be very helpful when writing your benchmarks.

First off, let's go over some terminology that will be used throughout.

  • User code: user code is the code that the user provides to be measured.
  • Run: one run is one execution of the user code.
  • Sample: one sample is one data point obtained through measurement of the time it takes to perform a certain number of runs. One sample can consist of more than one run if the clock available does not have enough resolution to accurately measure a single run. All samples for a given benchmark execution are obtained with the same number of runs.

Execution procedure

Now I can explain how a benchmark is executed in nonius. There are three main steps, though the first does not need to be repeated for every benchmark.

  1. Environmental probe: before any benchmarks can be executed, the clock's resolution is estimated. A few other environmental artifacts are also estimated at this point, like the cost of calling the clock function, but they almost never have any impact on the results.

  2. Estimation: the user code is executed a few times to obtain an estimate of the number of runs that should be in each sample. This also has the potential effect of bringing relevant code and data into the caches before the actual measurement starts.

  3. Measurement: all the samples are collected sequentially by performing the number of runs estimated in the previous step for each sample.

This already leads to one important rule for writing benchmarks for nonius: the benchmarks must be repeatable. The user code will be executed several times, and the number of times it will be executed during the estimation step cannot be known beforehand since it depends on the time it takes to execute the code. User code that cannot be executed repeatedly will lead to bogus results or crashes.
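To make the estimation and measurement steps concrete, here is a simplified, self-contained sketch of the idea. This is only an illustration, not nonius's actual implementation; the 100× threshold and the doubling strategy are arbitrary choices for the example.

```cpp
#include <chrono>
#include <ratio>
#include <vector>

// Simplified sketch of steps 2 and 3: estimate how many runs one sample
// needs so that its duration comfortably exceeds the clock's resolution,
// then collect every sample with that same number of runs.
template <typename Fn>
std::vector<double> collect_samples(Fn user_code, int n_samples,
                                    std::chrono::nanoseconds resolution) {
    using clock = std::chrono::steady_clock;

    // Estimation: double the run count until one sample is long enough
    // to be measured accurately. Note that user_code is executed an
    // unpredictable number of times here, hence the repeatability rule.
    int runs = 1;
    for (;;) {
        auto start = clock::now();
        for (int i = 0; i < runs; ++i) user_code();
        if (clock::now() - start > 100 * resolution) break;
        runs *= 2;
    }

    // Measurement: each sample is the mean time per run over `runs` runs.
    std::vector<double> samples;
    for (int s = 0; s < n_samples; ++s) {
        auto start = clock::now();
        for (int i = 0; i < runs; ++i) user_code();
        std::chrono::duration<double, std::nano> elapsed = clock::now() - start;
        samples.push_back(elapsed.count() / runs);
    }
    return samples;
}
```

Note how the unpredictable number of executions is confined to the estimation loop, while every sample in the measurement loop uses the same run count.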

The optimizer

Sometimes the optimizer will optimize away the very code that you want to measure. There are several ways to use results so that the optimizer cannot remove them. You can use the volatile keyword, or you can output the value to standard output or to a file, either of which requires the program to actually generate the value somehow.

Nonius adds a third option: the values returned by any function provided as user code are guaranteed to be evaluated and not optimized away. This means that if your user code consists of computing a certain value, you don't need to bother with volatile or forced output; just return it from the function. This helps keep the code natural.

Here's an example:

// may measure nothing at all by skipping the long calculation since its
// result is not used
NONIUS_BENCHMARK("no return", [] { long_calculation(); })

// the result of long_calculation() is guaranteed to be computed somehow
NONIUS_BENCHMARK("with return", [] { return long_calculation(); })

However, there's no other form of control over the optimizer whatsoever. It is up to you to write a benchmark that actually measures what you want and doesn't just measure the time to do a whole bunch of nothing.

To sum up, there are two simple rules: whatever you would do in handwritten code to control optimization still works in nonius; and nonius makes return values from user code into observable effects that can't be optimized away.
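As a concrete illustration of the handwritten option, here is a volatile sink; long_calculation is a hypothetical stand-in for whatever you actually want to measure.

```cpp
#include <cstdint>

// Hypothetical stand-in for the code under measurement: computes 20!.
std::uint64_t long_calculation() {
    std::uint64_t acc = 1;
    for (int i = 1; i <= 20; ++i) acc *= i;
    return acc;
}

// Without nonius's return-value guarantee, writing the result to a
// volatile object is an observable effect the optimizer must preserve,
// so the computation cannot be removed.
void measured_body() {
    volatile std::uint64_t sink = long_calculation();
    (void)sink;  // silence unused-variable warnings
}
```

With nonius, `return long_calculation();` achieves the same without the sink.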


The recommended way to use nonius is with the single header form. You can just #include <nonius.h++> and everything is available.

There are two distinct parts of the nonius interface: specifying benchmarks, and running benchmarks.

Specifying benchmarks
Nonius includes an imperative interface to specify benchmarks for execution, but the declarative interface is much simpler. As of this writing the imperative interface is still subject to change, so it won't be documented.

The declarative interface consists of the NONIUS_BENCHMARK macro. This macro expands to some machinery that registers the benchmark in a global registry that can be accessed by the standard runner.

NONIUS_BENCHMARK takes two parameters: a string literal with a unique name to identify the benchmark, and a callable object with the actual code. This callable object is usually provided as a lambda expression.

There are two types of callable objects that can be provided. The simplest ones take no arguments and just run the user code that needs to be measured. However, if the callable can be called with a nonius::chronometer argument, some advanced features are available. The simple callables are invoked once per run, while the advanced callables are invoked twice (once during the estimation phase, and once during the measurement phase).

NONIUS_BENCHMARK("simple", [] { return long_computation(); });

NONIUS_BENCHMARK("advanced", [](nonius::chronometer meter) {
    meter.measure([] { return long_computation(); });
})

These advanced callables no longer consist entirely of user code to be measured. In these cases, the code to be measured is provided via the nonius::chronometer::measure member function. This allows you to set up any kind of state that might be required for the benchmark but is not to be included in the measurements, like making a vector of random integers to feed to a sorting algorithm.

A single call to nonius::chronometer::measure performs the actual measurements by invoking the callable object passed in as many times as necessary. Anything that needs to be done outside the measurement can be done outside the call to measure.

The callable object passed in to measure can optionally accept an int parameter.

meter.measure([](int i) { return long_computation(i); });

If it accepts an int parameter, the sequence number of each run will be passed in, starting with 0. This is useful if you want to measure some mutating code, for example. The number of runs can be known beforehand by calling nonius::chronometer::runs; with this one can set up a different instance to be mutated by each run.

std::vector<std::string> v(meter.runs());
std::fill(v.begin(), v.end(), test_string());
meter.measure([&v](int i) { in_place_escape(v[i]); });

Note that you cannot simply reuse the same instance across runs and reset it between them, since the resetting code would pollute the measurements.
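The pattern generalizes to anything the measured code consumes or mutates: prepare one instance per run up front. Here is a standalone sketch of the same shape, with a plain loop standing in for measure and in_place_reverse as a hypothetical stand-in for in_place_escape.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical destructive operation: mutates its argument in place.
void in_place_reverse(std::string& s) { std::reverse(s.begin(), s.end()); }

std::vector<std::string> run_all(int runs) {
    // Setup: one fresh instance per run, done outside the measured region.
    std::vector<std::string> v(runs, "hello");
    // What measure() would iterate: run i mutates v[i], so no resetting
    // code is ever timed together with the mutation.
    for (int i = 0; i < runs; ++i) in_place_reverse(v[i]);
    return v;
}
```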

Running benchmarks
Nonius includes an implementation of main() that provides a command-line runner. This means you can just make your benchmarks into an executable and you're good to go. If you want that default implementation of main, just #define NONIUS_RUNNER before #including the nonius header.

You can also write your own main if you need something fancy, but for now that API is subject to change and not documented.

Invoking the standard runner with the --help flag provides information about the options available. Here are some examples of common choices:

Run all benchmarks and provide a simple textual report

$ runner

Run all benchmarks and provide extra details

$ runner -v

Run all benchmarks collecting 500 samples instead of the default 100, and report extra details

$ runner -v -s 500

Run all benchmarks and output all samples to a CSV file named results.csv

$ runner -r csv -o results.csv

Run all benchmarks and output a JUnit compatible report named results.xml

$ runner -r junit -o results.xml

Run all benchmarks and output an HTML report named results.html with the title "Some benchmarks", using 250 samples per benchmark

$ runner -r html -o results.html -t "Some benchmarks" -s 250