The sizes in the following table include the cost of the reference to the closure. These were all measured on my 64-bit Windows desktop machine. The "Min size" column measures a closure with 1 variable, which is assumed to be a small integer.
Env | Methodology | Min size | Asymptotic size for n integer vars
---|---|---|---
Microvium | Counting based on theory | 8 B | 6 + 2n B
C++ | VC++ Profiler | 40 B | 80 + 4n B
Moddable XS | Counting in a debugger | 160 B | 128 + 32n B
Node.js | `process.memoryUsage` | 114 B | Inconclusive
```js
// node --expose-gc script.js
function testSize(varCount) {
  const vars = [];
  for (let i = 0; i < varCount; i++) {
    vars.push(`v${i}`)
  }
  return eval(`
    (function() {
      const count = 100 * 1024;
      global.gc(true);
      const before = process.memoryUsage().heapUsed;
      const arr = [];
      for (let i = 0; i < count; i++) {
        ${vars.map(v => `const ${v} = i;`).join('')}
        arr.push(() => ${vars.join('+')});
      }
      global.gc(true);
      const after = process.memoryUsage().heapUsed;
      const sizePerClosure = (after - before) / count;
      console.log(\`  Used \${Math.round(sizePerClosure)} bytes per closure\`)
      return sizePerClosure;
    })
  `)()
}

for (let n = 0; n < 7; n++) {
  const varCount = 4 ** n;
  console.log(`Considering ${varCount} variables`)
  const sizePerClosure = testSize(varCount)
  const sizePerVariable = sizePerClosure / varCount;
  console.log(`  Size per variable ${sizePerVariable}`)
}

/*
The results I got from this are wildly inconsistent, and minor changes to the
code seem to change its memory performance in unexpected ways. I have no idea
what the optimizer is doing for this.

Considering 1 variables
  Used 118 bytes per closure
  Size per variable 118.1871875
Considering 4 variables
  Used 97 bytes per closure
  Size per variable 24.174296875
Considering 16 variables
  Used 13 bytes per closure
  Size per variable 0.78943359375
Considering 64 variables
  Used 633 bytes per closure
  Size per variable 9.883857421875
Considering 256 variables
  Used 1308 bytes per closure
  Size per variable 5.109556884765625
Considering 1024 variables
  Used 6150 bytes per closure
  Size per variable 6.006165771484375
Considering 4096 variables
  Used 24585 bytes per closure
  Size per variable 6.002287578582764
*/
```
```js
// node --expose-gc script.js
const count = 10 * 1024 * 1024;
global.gc(true);
const before = process.memoryUsage().heapUsed;
const arr = [];
for (let i = 0; i < count; i++) {
  const i2 = i;
  arr.push(() => i2);
}
global.gc(true);
const after = process.memoryUsage().heapUsed;
// On my machine: "Use 114 bytes per closure"
console.log(`Use ${Math.round((after - before) / count)} bytes per closure`)
```
I've been putting this off, but at some point I need to get back to being able to run JavaScript code on XS in Visual Studio, so I can step through different scenarios and see how they work in XS.
My primary purpose for the moment is to measure the size of various constructs. For example, I know that a minimal 1-variable closure in Microvium takes 12 bytes, and I have a proposal to reduce it to 6 bytes, but I'm wondering how much it takes in XS. To do this, I feel like I've got to have an example closure in XS and actually step through it to see how much space it uses.
I duplicated the `examples/helloworld` example code and compiled it using `mcconfig -d -m -p win`. Among other things, this produces `mc.xs.c` -- a C file with the compiled bytecode embedded as constants.

I created a Visual C++ project with a `main` that starts up the engine and runs the main JS script. I had to resolve the missing files and project configurations one by one, but eventually got it working.
```c
#include "xsPlatform.h"
#include "xs.h"
#include "mc.xs.h"

int main(int argc, char* argv[]) {
  xsMachine* the = fxPrepareMachine(NULL, xsPreparation(), "", NULL, NULL);
  xsBeginHost(the);
  xsAwaitImport("main", 0);
  xsEndHost(the);
  xsDeleteMachine(the);
  return 0;
}
```
Then in the JS file, I added some test code:
```js
function makeCounter() {
  let x = 0;
  function incCounter() {
    return ++x;
  }
  return incCounter;
}

debugger;
const myCounter = makeCounter();
trace(myCounter() + "\n"); // 1
trace(myCounter() + "\n"); // 2
trace(myCounter() + "\n"); // 3
debugger;
```
I added a breakpoint at the `XS_CODE_DEBUGGER` instruction and a debug watch on the `the->currentHeapCount` variable. I measured the difference between the value of this variable at the two `debugger` statements, and consider that to be the increase in heap size used by the experimental code.
I tried different permutations of this, such as adding more variables or creating 2 counters instead of 1, to make sure I wasn't just seeing the effects of lazy initialization or something.
- A closure declared using a nested `function` is 144 B + N*32 B, where N is the number of variables.
- A closure declared using a nested `=>` is 112 B + N*32 B, where N is the number of variables.
```cpp
#include <functional>
#include <vector>

using Counter = std::function<int()>;

const int NUM_VARIABLES = 1;

Counter makeCounter() {
  int x[NUM_VARIABLES] = { 0 };
  return [=]() mutable {
    return ++x[0];
  };
}

int main() {
  std::vector<Counter> myCounters;
  for (int i = 0; i < 1024; i++)
    myCounters.push_back(makeCounter());
  myCounters.shrink_to_fit();
  return 0;
}
```

I compiled this using Visual C++ with an x86 (32-bit) target, and I measured the size using the VC++ memory profiler at the beginning and end of `main`. I didn't test in GCC because I struggled to get memory stats with it, whereas Visual Studio comes with an easy profiler out of the box.

Results (Overhead = Size − NUM_VARIABLES × sizeof(int)):

NUM_VARIABLES | Size (kB) | Overhead (bytes per closure)
---|---|---
1 | 40.12 | 36.12
4 | 40.12 | 24.12
5 | 40.12 | 20.12
8 | 40.12 | 8.12
9 | 116.12 | 80.12
10 | 120.12 | 80.12
100 | 480.12 | 80.12
1000 | 4080.12 | 80.12

Note that `std::function<int()>` is polymorphic, so it does a heap allocation, but it also has an optimization: if the closure is small enough, it's inlined directly into the `std::function<int()>`. This is why the first few rows have a fixed size even though the variable count is increasing.

In the limit, it looks like there is about 80 bytes of overhead per closure. I think all of this overhead is in `std::function<int()>`, not the lambda itself. If I do `sizeof` on the lambda inside `makeCounter`, it's exactly the size of the captured variables, with no overhead. Within that context (within the scope of `makeCounter`), the compiler statically knows how to invoke the lambda, so no function pointer needs to be stored for dynamic dispatch.

Based on other tests, not shown here, it looks like a large part of this overhead is heap-allocation overhead. If I just do `new int[n]`, this seems to consume `40 + 4*n` bytes of memory, which seems crazy to me. So I think the first 40 bytes is the size of `std::function<int()>` in the vector, and the second 40 bytes (for `NUM_VARIABLES >= 9`) is the allocation overhead of the lambda.

I felt it important to use `std::function<int()>` for this experiment because it's the standard way to decouple the consumer of the closure from the static type of the closure used by the producer. This makes it a realistic scenario to compare with closures in an environment like the Microvium JavaScript engine, where the producer and consumer are coupled only by the call signature.

I should also say that the semantics of `std::function<int()>` are different from what one would expect in JavaScript, because it's a copy-by-value type. Since the closure is mutable, this may result in unexpected behavior if you're comparing with closures in Microvium. If you wanted a copy-by-reference type, you'd need to add another layer such as `unique_ptr`, which I expect would result in another 40 bytes of allocation overhead.