@coder-mike
Last active October 21, 2022 20:40
Closure size measurements


Summary

The sizes in the following table include the cost of the reference to the closure. These were all measured on my 64-bit Windows desktop machine. The "Min size" measures a closure with 1 variable, which is assumed to be a small integer.

+-------------+--------------------------+----------+------------------------------------+
| Env         | Methodology              | Min size | Asymptotic size for n integer vars |
+-------------+--------------------------+----------+------------------------------------+
| Microvium   | Counting based on theory | 8 B      | 6 + 2n                             |
| C++         | VC++ Profiler            | 40 B     | 80 + 4n                            |
| Moddable XS | Counting in a debugger   | 160 B    | 128 + 32n                          |
| Node.js     | process.memoryUsage      | 114 B    | Inconclusive                       |
+-------------+--------------------------+----------+------------------------------------+
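As a sanity check, the asymptotic columns can be written as small formulas and evaluated directly (the formulas are taken from the table above; the Node.js row is omitted because its results were inconclusive):

```javascript
// Predicted closure size in bytes for n captured integer variables,
// using the asymptotic formulas from the table above.
const asymptoticSize = {
  microvium: n => 6 + 2 * n,    // Microvium (counting based on theory)
  cpp:       n => 80 + 4 * n,   // C++ std::function (VC++ profiler)
  xs:        n => 128 + 32 * n, // Moddable XS (counting in a debugger)
};

for (const [env, size] of Object.entries(asymptoticSize)) {
  console.log(`${env}: ${size(1)} B for 1 var, ${size(100)} B for 100 vars`);
}
```

Note that the Microvium and XS formulas reproduce the "Min size" column at n = 1 (8 B and 160 B respectively), while the C++ min size of 40 B is smaller than its asymptotic formula predicts because of the small-closure optimization discussed later.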
// node --expose-gc script.js

function testSize(varCount) {
  const vars = [];
  for (let i = 0; i < varCount; i++) {
    vars.push(`v${i}`);
  }
  return eval(`
    (function() {
      const count = 100 * 1024;
      global.gc(true);
      const before = process.memoryUsage().heapUsed;
      const arr = [];
      for (let i = 0; i < count; i++) {
        ${vars.map(v => `const ${v} = i;`).join('')}
        arr.push(() => ${vars.join('+')});
      }
      global.gc(true);
      const after = process.memoryUsage().heapUsed;
      const sizePerClosure = (after - before) / count;
      console.log(\` Used \${Math.round(sizePerClosure)} bytes per closure\`);
      return sizePerClosure;
    })
  `)();
}

for (let n = 0; n < 7; n++) {
  const varCount = 4 ** n;
  console.log(`Considering ${varCount} variables`);
  const sizePerClosure = testSize(varCount);
  const sizePerVariable = sizePerClosure / varCount;
  console.log(` Size per variable ${sizePerVariable}`);
}
/*
The results I got from this are wildly inconsistent, and minor changes to the code seem to change its memory performance in unexpected ways. I have no idea what the optimizer is doing for this.
Considering 1 variables
Used 118 bytes per closure
Size per variable 118.1871875
Considering 4 variables
Used 97 bytes per closure
Size per variable 24.174296875
Considering 16 variables
Used 13 bytes per closure
Size per variable 0.78943359375
Considering 64 variables
Used 633 bytes per closure
Size per variable 9.883857421875
Considering 256 variables
Used 1308 bytes per closure
Size per variable 5.109556884765625
Considering 1024 variables
Used 6150 bytes per closure
Size per variable 6.006165771484375
Considering 4096 variables
Used 24585 bytes per closure
Size per variable 6.002287578582764
*/
// node --expose-gc script.js

const count = 10 * 1024 * 1024;
global.gc(true);
const before = process.memoryUsage().heapUsed;
const arr = [];
for (let i = 0; i < count; i++) {
  const i2 = i;
  arr.push(() => i2);
}
global.gc(true);
const after = process.memoryUsage().heapUsed;

// On my machine: "Use 114 bytes per closure"
console.log(`Use ${Math.round((after - before) / count)} bytes per closure`);

Closure size measurements in Moddable XS

I've been putting this off, but at some point I need to get back to running JavaScript code on XS in Visual Studio, so that I can step through different scenarios and see how they work in XS.

My primary purpose for the moment is to measure the size of various constructs. For example, I know that a minimal 1-variable closure in Microvium takes 12 bytes, and I have a proposal to reduce it to 6 bytes, but I'm wondering how much it takes in XS. To do that, I need an example closure in XS that I can actually step through to see how much space it uses.

Methodology

I duplicated the examples/helloworld example code and compiled it using mcconfig -d -m -p win. Among other things, this produces:

  • mc.xs.c -- a C file with the compiled bytecode embedded as constants

I created a Visual C++ project with a main that starts up the engine and runs the main JS script. I had to resolve missing files and project configuration issues one by one, but eventually got it working.

#include "xsPlatform.h"
#include "xs.h"
#include "mc.xs.h"

int main(int argc, char* argv[]) {
	xsMachine* the = fxPrepareMachine(NULL, xsPreparation(), "", NULL, NULL);
	xsBeginHost(the);
	xsAwaitImport("main", 0);
	xsEndHost(the);
	xsDeleteMachine(the);
	return 0;
}

Then in the JS file, I added some test code:

function makeCounter() {
  let x = 0;
  function incCounter() {
    return ++x;
  }
  return incCounter;
}

debugger;
const myCounter = makeCounter();

trace(myCounter() + "\n"); // 1
trace(myCounter() + "\n"); // 2
trace(myCounter() + "\n"); // 3

debugger;

I added a breakpoint at the XS_CODE_DEBUGGER instruction and a debug watch on the the->currentHeapCount variable. I took the difference between its values at the two debugger statements to be the increase in heap size used by the experimental code.

I tried different permutations of this, such as adding more variables or creating 2 counters instead of 1, to make sure I'm not seeing the effects of lazy initialization or something.
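For example, the two-counter permutation looked something like this (a hypothetical sketch; the exact test code isn't reproduced here):

```javascript
// Sketch of the two-counter permutation: two closures, so the
// heap cost of a closure should show up twice between the two
// measurement points.
function makeCounter() {
  let x = 0;
  function incCounter() {
    return ++x;
  }
  return incCounter;
}

debugger; // first measurement point for the->currentHeapCount
const counterA = makeCounter();
const counterB = makeCounter();
debugger; // second measurement point; the difference covers both closures

// Each closure captures its own `x`:
counterA(); // 1
counterA(); // 2
counterB(); // 1, independent of counterA
```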

Results

Closures

A closure declared using a nested function is 144B + N*32B where N is the number of variables.

A closure declared using a nested => is 112B + N*32B where N is the number of variables.
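For reference, the arrow-function variant corresponds to a counter like the following (a sketch; only the closure syntax differs from the earlier example):

```javascript
// Arrow-function form of the counter. In XS this variant measured
// 112 B + 32 B per captured variable, versus 144 B + 32 B per
// variable for the nested `function` declaration.
function makeCounter() {
  let x = 0;
  const incCounter = () => ++x;
  return incCounter;
}

const myCounter = makeCounter();
myCounter(); // 1
myCounter(); // 2
```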

#include <functional>
#include <vector>

using Counter = std::function<int()>;

const int NUM_VARIABLES = 1;

Counter makeCounter() {
	int x[NUM_VARIABLES] = { 0 };
	return [=]() mutable {
		return ++x[0];
	};
}

int main() {
	std::vector<Counter> myCounters;
	for (int i = 0; i < 1024; i++)
		myCounters.push_back(makeCounter());
	myCounters.shrink_to_fit();
	return 0;
}
/*
I compiled this with Visual C++ targeting x86 (32-bit), and measured the size
using the VC++ memory profiler at the beginning and end of `main`. I didn't test
with GCC because I struggled to get memory stats from it, whereas Visual Studio
comes with an easy profiler out of the box.
Results:
Overhead per closure = (Size / 1024 closures) - NUM_VARIABLES * sizeof(int)
+---------------+-----------+------------------------------+
| NUM_VARIABLES | Size (kB) | Overhead (bytes per closure) |
+---------------+-----------+------------------------------+
| 1             | 40.12     | 36.12                        |
| 4             | 40.12     | 24.12                        |
| 5             | 40.12     | 20.12                        |
| 8             | 40.12     | 8.12                         |
| 9             | 116.12    | 80.12                        |
| 10            | 120.12    | 80.12                        |
| 100           | 480.12    | 80.12                        |
| 1000          | 4080.12   | 80.12                        |
+---------------+-----------+------------------------------+
Note that `std::function<int()>` is polymorphic so it does a heap allocation,
but it also has an optimization that if the closure is small enough then it's
inlined directly into the `std::function<int()>`. This is why the first few have
a fixed size even though the variable count is increasing.
In the limit, it looks like there is about 80 bytes of overhead per closure.
I think all this overhead is in `std::function<int()>`, not the lambda itself.
If I do `sizeof` on the lambda inside `makeCounter`, it's exactly the size of
the captured variables, with no overhead. Within that context (within the scope
of makeCounter), the compiler statically knows how to invoke the lambda, and so
no dynamic information is needed to store a function pointer for dynamic
dispatch.
Based on other tests, not shown here, it looks like a large part of this
overhead is heap allocation. If I just do `new int[n]`, this seems to consume
`40 + 4*n` bytes of memory. That seems crazy to me. So I think the first 40
bytes is the size of `std::function<int()>` in the vector, and the second 40
bytes for `NUM_VARIABLES >= 9` is the allocation overhead of the lambda.
I felt it important to use `std::function<int()>` for this experiment because
it's the standard way to decouple the consumer of the closure from the static
type of the closure used by the producer. This makes it a realistic scenario to
compare with closures in an environment like the Microvium JavaScript engine
where the producer and consumer are only coupled by the call signature.
I should also say that the semantics of `std::function<int()>` are different to
what one would expect in JavaScript because it's a copy-by-value type. Since the
closure is mutable, this may result in unexpected behavior if you're comparing
with closures in Microvium. If you wanted a copy-by-reference type then you'd
need to add another layer such as `unique_ptr` which I expect would result in
another 40 bytes of allocation overhead.
*/