The sizes in the following table include the cost of the reference to the closure. These were all measured on my 64-bit Windows desktop machine. The "Min size" column measures a closure with 1 variable, which is assumed to be a small integer.
Env | Methodology | Min size | Asymptotic size for n integer vars
---|---|---|---
Microvium | Counting based on theory | 8 B | 6 + 2n B
C++ | VC++ Profiler | 40 B | 80 + 4n B
Moddable XS | Counting in a debugger | 160 B | 128 + 32n B
Node.js | `process.memoryUsage` | 114 B | Inconclusive
```js
// node --expose-gc script.js
function testSize(varCount) {
  const vars = [];
  for (let i = 0; i < varCount; i++) {
    vars.push(`v${i}`)
  }
  return eval(`
    (function() {
      const count = 100 * 1024;
      global.gc(true);
      const before = process.memoryUsage().heapUsed;
      const arr = [];
      for (let i = 0; i < count; i++) {
        ${vars.map(v => `const ${v} = i;`).join('')}
        arr.push(() => ${vars.join('+')});
      }
      global.gc(true);
      const after = process.memoryUsage().heapUsed;
      const sizePerClosure = (after - before) / count;
      console.log(\`  Used \${Math.round(sizePerClosure)} bytes per closure\`)
      return sizePerClosure;
    })
  `)()
}

for (let n = 0; n < 7; n++) {
  const varCount = 4 ** n;
  console.log(`Considering ${varCount} variables`)
  const sizePerClosure = testSize(varCount)
  const sizePerVariable = sizePerClosure / varCount;
  console.log(`  Size per variable ${sizePerVariable}`)
}

/*
The results I got from this are wildly inconsistent, and minor changes to the
code seem to change its memory performance in unexpected ways. I have no idea
what the optimizer is doing for this.

Considering 1 variables
  Used 118 bytes per closure
  Size per variable 118.1871875
Considering 4 variables
  Used 97 bytes per closure
  Size per variable 24.174296875
Considering 16 variables
  Used 13 bytes per closure
  Size per variable 0.78943359375
Considering 64 variables
  Used 633 bytes per closure
  Size per variable 9.883857421875
Considering 256 variables
  Used 1308 bytes per closure
  Size per variable 5.109556884765625
Considering 1024 variables
  Used 6150 bytes per closure
  Size per variable 6.006165771484375
Considering 4096 variables
  Used 24585 bytes per closure
  Size per variable 6.002287578582764
*/
```
```js
// node --expose-gc script.js
const count = 10 * 1024 * 1024;
global.gc(true);
const before = process.memoryUsage().heapUsed;
const arr = [];
for (let i = 0; i < count; i++) {
  const i2 = i;
  arr.push(() => i2);
}
global.gc(true);
const after = process.memoryUsage().heapUsed;
// On my machine: "Use 114 bytes per closure"
console.log(`Use ${Math.round((after - before) / count)} bytes per closure`)
```
I've been putting this off, but at some point I need to get back to being able to run JavaScript code on XS in Visual Studio, so I can step through different scenarios and see how they work in XS.
My primary purpose for the moment is to measure the size of various constructs. For example, I know that a minimal 1-variable closure in Microvium takes 12 bytes, and I have a proposal to reduce it to 6 bytes, but I'm wondering how much it takes in XS. To do this, I feel like I've got to have an example closure in XS and actually step through it to see how much space it uses.
I duplicated the `examples/helloworld` example code and compiled it using `mcconfig -d -m -p win`. Among other things, this produces `mc.xs.c` -- a C file with the compiled bytecode embedded as constants.

I created a Visual C++ project with a `main` that starts up the engine and runs the main JS script. I had to resolve the missing files and project configurations one by one, but eventually got it working.
```c
#include "xsPlatform.h"
#include "xs.h"
#include "mc.xs.h"

int main(int argc, char* argv[]) {
  xsMachine* the = fxPrepareMachine(NULL, xsPreparation(), "", NULL, NULL);
  xsBeginHost(the);
  xsAwaitImport("main", 0);
  xsEndHost(the);
  xsDeleteMachine(the);
  return 0;
}
```
Then in the JS file, I added some test code:
```js
function makeCounter() {
  let x = 0;
  function incCounter() {
    return ++x;
  }
  return incCounter;
}

debugger;
const myCounter = makeCounter();
trace(myCounter() + "\n"); // 1
trace(myCounter() + "\n"); // 2
trace(myCounter() + "\n"); // 3
debugger;
```
I added a breakpoint at the `XS_CODE_DEBUGGER` instruction and a debug watch on the `the->currentHeapCount` variable. I measured the difference between the value of this variable at the two `debugger` statements, and consider that to be the increase in heap size used by the experimental code.
I tried different permutations of this, such as adding more variables or creating 2 counters instead of 1, to make sure I wasn't just seeing the effects of lazy initialization or something.
- A closure declared using a nested `function` is 144 B + N*32 B, where N is the number of variables.
- A closure declared using a nested `=>` is 112 B + N*32 B, where N is the number of variables.
```cpp
#include <functional>
#include <vector>

using Counter = std::function<int()>;

const int NUM_VARIABLES = 1;

Counter makeCounter() {
  int x[NUM_VARIABLES] = { 0 };
  return [=]() mutable {
    return ++x[0];
  };
}

int main() {
  std::vector<Counter> myCounters;
  for (int i = 0; i < 1024; i++)
    myCounters.push_back(makeCounter());
  myCounters.shrink_to_fit();
  return 0;
}
```

I compiled this using Visual C++ with an x86 (32-bit) target, and I measured the size using the VC++ memory profiler at the beginning and end of `main`. I didn't test in GCC because I struggled to get memory stats with it, whereas Visual Studio comes with an easy profiler out of the box.

Results (Overhead = Size − NUM_VARIABLES × sizeof(int)):

NUM_VARIABLES | Size (kB) | Overhead (bytes per closure)
---|---|---
1 | 40.12 | 36.12
4 | 40.12 | 24.12
5 | 40.12 | 20.12
8 | 40.12 | 8.12
9 | 116.12 | 80.12
10 | 120.12 | 80.12
100 | 480.12 | 80.12
1000 | 4080.12 | 80.12

Note that `std::function<int()>` is polymorphic, so it does a heap allocation, but it also has an optimization: if the closure is small enough, it's inlined directly into the `std::function<int()>`. This is why the first few rows have a fixed size even though the variable count is increasing.

In the limit, it looks like there is about 80 bytes of overhead per closure. I think all of this overhead is in `std::function<int()>`, not the lambda itself. If I do `sizeof` on the lambda inside `makeCounter`, it's exactly the size of the captured variables, with no overhead. Within that context (within the scope of `makeCounter`), the compiler statically knows how to invoke the lambda, so no function pointer needs to be stored for dynamic dispatch.

Based on other tests, not shown here, it looks like a large part of this overhead is heap-allocation overhead. If I just do `new int[n]`, this seems to consume `40 + 4*n` bytes of memory, which seems crazy to me. So I think the first 40 bytes is the size of `std::function<int()>` in the vector, and the second 40 bytes (for `NUM_VARIABLES >= 9`) is the allocation overhead of the lambda.

I felt it important to use `std::function<int()>` for this experiment because it's the standard way to decouple the consumer of the closure from the static type of the closure used by the producer. This makes it a realistic scenario to compare with closures in an environment like the Microvium JavaScript engine, where the producer and consumer are coupled only by the call signature.

I should also say that the semantics of `std::function<int()>` are different from what one would expect in JavaScript, because it's a copy-by-value type. Since the closure is mutable, this may result in unexpected behavior if you're comparing with closures in Microvium. If you wanted a copy-by-reference type, you'd need to add another layer such as `unique_ptr`, which I expect would result in another 40 bytes of allocation overhead.