How many CPU ticks does it cost to allocate a piece of memory on the heap via malloc()?
TL;DR: Around 200 ticks.
Here is the code:
#include <iostream>
#include <chrono>

using namespace std::chrono;

class Foo {
  int x;
};

// Times `total` iterations of ten increments and returns microseconds.
// The body is unrolled ten times so that loop overhead is a smaller
// share of the measurement.
int sum(int total) {
  auto start = high_resolution_clock::now();
  int sum = 0;
  for (int i = 0; i < total; i++) {
    sum += 1;
    sum += 1;
    sum += 1;
    sum += 1;
    sum += 1;
    sum += 1;
    sum += 1;
    sum += 1;
    sum += 1;
    sum += 1;
  }
  auto stop = high_resolution_clock::now();
  auto duration = duration_cast<microseconds>(stop - start);
  return duration.count();
}

// Times `total` iterations of ten heap allocations and returns microseconds.
// The objects are deliberately leaked, so `delete` never enters the timing.
int heap(int total) {
  auto start = high_resolution_clock::now();
  for (int i = 0; i < total; i++) {
    new Foo();
    new Foo();
    new Foo();
    new Foo();
    new Foo();
    new Foo();
    new Foo();
    new Foo();
    new Foo();
    new Foo();
  }
  auto stop = high_resolution_clock::now();
  auto duration = duration_cast<microseconds>(stop - start);
  return duration.count();
}

int main() {
  int total = 100000000;
  int m1 = sum(total);
  int m2 = heap(total);
  // Integer ratio: how many times slower one `new Foo()` is than one `sum += 1`.
  std::cout << (m2 / m1) << std::endl;
  return 0;
}
Compile it like this and run:
$ clang++ -O0 a.cpp && ./a.out
I'm getting numbers between 60 and 80. Since each loop iteration does ten allocations against ten increments, the ratio compares one new Foo() to one sum += 1. Thus, a single malloc costs about 180-240 CPU ticks, because sum += 1 is exactly three assembly instructions (assuming, roughly, one instruction per tick). You can check it by compiling the code above into assembly:
$ clang++ -S -mllvm --x86-asm-syntax=intel -O0 a.cpp
The file a.s will contain ARM64 assembly (on Apple Silicon the --x86-asm-syntax flag has no effect), where sum += 1 is represented by the following three instructions:
ldur w8, [x29, #-20]
add w8, w8, #1
stur w8, [x29, #-20]
By the way, this is a MacBook Pro M2 running macOS Ventura 13.3.1:
$ clang -v
Homebrew clang version 15.0.7
Target: arm64-apple-darwin22.4.0
Thread model: posix
InstalledDir: /opt/homebrew/opt/llvm/bin