@ableasdale
Forked from kevincennis/v8.md
Last active August 26, 2021 01:02
V8 Installation and d8 shell usage

Installing V8 on a Mac

My Troubleshooting Notes

I had issues that I initially believed were related to Python 3 but which may have been Xcode issues. I got it to work by doing all of the following, but I'm not sure what was actually necessary, so I'll put in all the steps I can recall.

Ultimately, I was seeing python stack traces when running tools/dev/v8gen.py x64.optdebug; the resulting stack trace looked like this:

tools/dev/v8gen.py x64.optdebug

Hint: You can raise verbosity (-vv) to see the output of failed commands.

Traceback (most recent call last):
  File "tools/dev/v8gen.py", line 307, in <module>
    sys.exit(gen.main())
  File "tools/dev/v8gen.py", line 301, in main
    return self._options.func()
  File "tools/dev/v8gen.py", line 169, in cmd_gen
    gn_outdir,
  File "tools/dev/v8gen.py", line 211, in _call_cmd
    stderr=subprocess.STDOUT,
  File "/usr/local/Cellar/python@2/2.7.17_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 223, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['/usr/local/opt/python@2/bin/python2.7', '-u', 'tools/mb/mb.py', 'gen', '-f', 'infra/mb/mb_config.pyl', '-m', 'developer_default', '-b', 'x64.optdebug', 'out.gn/x64.optdebug']' returned non-zero exit status 1

So I had to change a few things to get past this:

  1. I have Anaconda set up, which creates a .bash_profile for me (and which means I'm using Python 3 when I open a bash shell). I created a separate profile (let's call it .newprofile) that contained the following lines:
export GCLIENT_PY3=0
export PATH=/Users/ableasdale/depot_tools:"$PATH"
export D8_PATH="/Users/ableasdale/js/v8/out.gn/x64.optdebug"
alias d8=/Users/ableasdale/js/v8/out.gn/x64.optdebug/d8
alias tick-processor=/Users/ableasdale/js/v8/tools/mac-tick-processor

From there, I ran source .newprofile to load these settings.

  2. I also had some issues with Xcode (based on what I could tell from the stack trace).

I ran:

npm -g install node-gyp@latest

And I also ran:

sudo xcode-select -s /Applications/Xcode.app/Contents/Developer

After running this, I tested the failing command by running:

python -u tools/mb/mb.py gen -f infra/mb/mb_config.pyl -m developer_default -b x64.release out.gn/x64.release

(This had previously failed, apparently citing an Xcode error; it now succeeded.)

After that, I ran:

tools/dev/v8gen.py x64.optdebug

This now worked, so I could run ninja -C out.gn/x64.optdebug and now my fan is going!


After everything had compiled, I confirmed that d8 worked as expected:

$ d8
V8 version 8.4.0 (candidate)
d8>

The goal

I wanted to work through the optimisation scenario in the Google I/O 2012 - Breaking the JavaScript Speed Limit with V8 presentation; the original video can be seen here:

https://www.youtube.com/watch?v=UJPdhx5zTaw

primes.js

This is the original version (prior to optimising) of the code from the presentation:

function Primes() {
    this.prime_count = 0;
    this.primes = new Array(25000);
    this.getPrimeCount = function() { return this.prime_count; }
    this.getPrime = function(i) { return this.primes[i]; }
    this.addPrime = function(i) {
        this.primes[this.prime_count++] = i;
    }
    this.isPrimeDivisible = function(candidate) {
        for (var i = 1; i <= this.prime_count; ++i) {
            if ((candidate % this.primes[i]) == 0) return true;
        }
        return false;
    }
};

function main() {
    p = new Primes();
    var c = 1;
    while (p.getPrimeCount() < 25000) {
        if (!p.isPrimeDivisible(c)) {
            p.addPrime(c);
        }
        c++;
    }
    print(p.getPrime(p.getPrimeCount()-1))
}

main();

To test this, we can run:

$ time d8 primes.js
287107

real	0m2.387s
user	0m2.367s
sys	0m0.019s

Now let's log what gets optimised:

$ d8 --trace-opt primes.js
[marking 0x21c908250865 <JSFunction Primes.isPrimeDivisible (sfi = 0x21c908250681)> for optimized recompilation, reason: small function]
[compiling method 0x21c908250865 <JSFunction Primes.isPrimeDivisible (sfi = 0x21c908250681)> using TurboFan]
[optimizing 0x21c908250865 <JSFunction Primes.isPrimeDivisible (sfi = 0x21c908250681)> - took 1.489, 2.146, 0.086 ms]
[completed optimizing 0x21c908250865 <JSFunction Primes.isPrimeDivisible (sfi = 0x21c908250681)>]
[marking 0x21c908250309 <JSFunction main (sfi = 0x21c9082501e5)> for optimized recompilation, reason: hot and stable]
[compiling method 0x21c908250309 <JSFunction main (sfi = 0x21c9082501e5)> using TurboFan OSR]
[optimizing 0x21c908250309 <JSFunction main (sfi = 0x21c9082501e5)> - took 1.302, 2.791, 0.098 ms]
287107

Those flags in full:

d8 --trace-opt --trace-deopt --trace-bailout primes.js

Note that --trace-bailout now appears to be an unknown flag. I believe this is because TurboFan handles the cases where the compiler would previously have bailed out.

Profiling primes.js

d8 primes.js --prof

The output is written to v8.log (it's a large file!) - we can use the tick processor (mac-tick-processor) against this file:

$ tick-processor
Statistical profiling result from v8.log, (1873 ticks, 0 unaccounted, 0 excluded).

 [Shared libraries]:
   ticks  total  nonlib   name
      2    0.1%          /usr/lib/system/libsystem_kernel.dylib
      1    0.1%          /usr/lib/system/libsystem_pthread.dylib

 [JavaScript]:
   ticks  total  nonlib   name
   1818   97.1%   97.2%  LazyCompile: *main primes.js:17:14
      5    0.3%    0.3%  LazyCompile: *Primes.isPrimeDivisible primes.js:9:37

Primes2.js

In this example, the out-of-bounds array access (on line 10, where the loop condition was i <= this.prime_count) is fixed by using < instead; the code now looks like this:

function Primes() {
    this.prime_count = 0;
    this.primes = new Array(25000);
    this.getPrimeCount = function() { return this.prime_count; }
    this.getPrime = function(i) { return this.primes[i]; }
    this.addPrime = function(i) {
        this.primes[this.prime_count++] = i;
    }
    this.isPrimeDivisible = function(candidate) {
        for (var i = 1; i < this.prime_count; ++i) {
            if ((candidate % this.primes[i]) == 0) return true;
        }
        return false;
    }
};

function main() {
    p = new Primes();
    var c = 1;
    while (p.getPrimeCount() < 25000) {
        if (!p.isPrimeDivisible(c)) {
            p.addPrime(c);
        }
        c++;
    }
    print(p.getPrime(p.getPrimeCount()-1))
}

main();

Running that shows some speedup:

$ time d8 primes2.js 
287107

real	0m1.530s
user	0m1.512s
sys	0m0.017s

And fewer ticks than before:

$ d8 primes2.js --prof
287107
$ tick-processor
Statistical profiling result from v8.log, (1178 ticks, 2 unaccounted, 0 excluded).

 [Shared libraries]:
   ticks  total  nonlib   name
      2    0.2%          /usr/lib/system/libsystem_kernel.dylib

 [JavaScript]:
   ticks  total  nonlib   name
   1126   95.6%   95.7%  LazyCompile: *main primes2.js:17:14
      2    0.2%    0.2%  LazyCompile: *Primes.isPrimeDivisible primes2.js:9:37

Primes3.js

The final optimisation was to only test divisors up to the square root of the candidate:

function Primes() {
    this.prime_count = 0;
    this.primes = new Array(25000);
    this.getPrimeCount = function() { return this.prime_count; }
    this.getPrime = function(i) { return this.primes[i]; }
    this.addPrime = function(i) {
        this.primes[this.prime_count++] = i;
    }
    this.isPrimeDivisible = function(candidate) {
        for (var i = 1; i < this.prime_count; ++i) {
            var current_prime = this.primes[i];
            if (current_prime * current_prime > candidate) {
                return false;
            }
            if ((candidate % this.primes[i]) == 0) return true;
        }
        return false;
    }
};

function main() {
    p = new Primes();
    var c = 1;
    while (p.getPrimeCount() < 25000) {
        if (!p.isPrimeDivisible(c)) {
            p.addPrime(c);
        }
        c++;
    }
    print(p.getPrime(p.getPrimeCount()-1))
}

main();

And this shows further performance improvement over the second version of the code:

$ time d8 primes3.js 
287107

real	0m0.194s
user	0m0.137s
sys	0m0.020s

Original gist below...

Prerequisites

  • Install Xcode (Available on the Mac App Store)
  • Install Xcode Command Line Tools (Preferences > Downloads)
  • Install depot_tools
    • git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git
    • sudo nano ~/.bash_profile
    • Add export PATH=/path/to/depot_tools:"$PATH" (it's important that depot_tools comes first here)
    • source ~/.bash_profile
    • From the directory you want to install V8 into, run gclient

Build V8

  • fetch v8
  • cd v8
  • gclient sync
  • tools/dev/v8gen.py x64.optdebug
  • ninja -C out.gn/x64.optdebug (prepare for lots of fan noise)

I'd also recommend adding these to your .bash_profile:

  • sudo nano ~/.bash_profile
  • Add alias d8=/path/to/v8/repo/out.gn/x64.optdebug/d8
  • Add alias tick-processor=/path/to/v8/repo/tools/mac-tick-processor
  • Add export D8_PATH="/path/to/v8/repo/out.gn/x64.optdebug"
  • source ~/.bash_profile

d8 shell examples

Print optimization stats

Create test.js with the following code:

function test( obj ) {
  return obj.prop + obj.prop;
}

var a = { prop: 'a' }, i = 0;

while ( i++ < 10000 ) {
  test( a );
}

Run d8 --trace-opt-verbose test.js

You should see that the test function was optimized by V8, along with an explanation of why. "ICs" stands for inline caches, which are one of the ways that V8 performs optimizations. Generally speaking, the more "ICs with typeinfo" the better.

Now modify test.js to include the following code:

function test( obj ) {
  return obj.prop + obj.prop;
}

var a = { prop: 'a' }, b = { prop: [] }, i = 0;

while ( i++ < 10000 ) {
  test( Math.random() > 0.5 ? a : b );
}

Run d8 --trace-opt-verbose test.js

So, you'll see that this time, the test function was never actually optimized. The reason is that it's being passed objects with different hidden classes. Try changing the value of prop in a to an integer and run it again. You should see that the function was able to be optimized.
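As a concrete sketch of that suggestion (changing the value of prop in a to an integer; the values here are just illustrative), test.js would become:

```javascript
// test.js -- variant of the example above with a.prop as an integer
function test( obj ) {
  return obj.prop + obj.prop;
}

var a = { prop: 1 }, b = { prop: [] }, i = 0;

while ( i++ < 10000 ) {
  test( Math.random() > 0.5 ? a : b );
}
```

Run d8 --trace-opt-verbose test.js again and compare with the previous run.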

Print deoptimization stats

Modify the contents of test.js:

function test( obj ) {
  return obj.prop + obj.prop;
}

var a = { prop: 'a' }, b = { prop: [] }, i = 0;

while ( i++ < 10000 ) {
  test( i !== 8000 ? a : b );
}

Run d8 --trace-opt --trace-deopt test.js

You should see that the optimized code for the test function was thrown out. What happened here was that V8 kept seeing test being passed an object that looked like {prop: <String>}. But on the 8000th round of the while loop, we gave it something different. So V8 had to throw away the optimized code, because its initial assumptions were wrong.

Profiling

Modify test.js:

function factorial( n ) {
  return n === 1 ? n : n * factorial( --n );
}

var i = 0;

while ( i++ < 1e7 ) {
  factorial( 10 );
}

Run time d8 --prof test.js (Generates v8.log)

Run tick-processor (Reads v8.log and cats the parsed output)

This'll show you where the program was spending most of its time, by function. Most of it should be under LazyCompile: *factorial test.js:1:19. The asterisk before the function name means that it was optimized.

Make a note of the execution time that was logged to the terminal. Now try modifying the code to this dumb, contrived example:

function factorial( n ) {
  return equal( n, 1 ) ? n : multiply( n, factorial( --n ) );
}

function multiply( x, y ) {
  return x * y;
}

function equal( a, b ) {
  return a === b;
}

var i = 0;

while ( i++ < 1e7 ) {
  factorial( 10 );
}

Run time d8 --prof test.js

Run tick-processor

Roughly the same execution time as the previous version, even though this one looks like it should be slower. You'll also notice that the multiply and equal functions are nowhere on the list. Weird, right?

Run d8 --trace-inlining test.js

Okay. So, we can see that the optimizing compiler was smart here and completely eliminated the overhead of calling both of those functions by inlining them into the optimized code for factorial.

The optimized code for both versions ends up being basically identical (which you can check, if you know how to read assembly, by running d8 --print-opt-code test.js).

Tracing Garbage Collection

Modify test.js

function strToArray( str ) {
  var i = 0,
    len = str.length,
    arr = new Uint16Array( str.length );
  for ( ; i < len; ++i ) {
    arr[ i ] = str.charCodeAt( i );
  }
  return arr;
}

var i = 0, str = 'V8 is the coolest';

while ( i++ < 1e5 ) {
  strToArray( str );
}

Run d8 --trace-gc test.js

You'll see a bunch of Scavenge... [allocation failure].

Basically, V8's GC heap has different "spaces". Most objects are allocated in the "new space". It's super cheap to allocate here, but it's also pretty small (usually somewhere between 1 and 8 MB). Once that space gets filled up, the GC does a "scavenge".

Scavenging is the fast part of V8 garbage collection. Usually somewhere between 1 and 5ms from what I've seen -- so it might not necessarily cause a noticeable GC pause.

Scavenges can only be kicked off by allocations. If the "new space" never gets filled up, the GC never needs to reclaim space by scavenging.

Modify test.js:

function strToArray( str, bufferView ) {
  var i = 0,
    len = str.length;
  for ( ; i < len; ++i ) {
    bufferView[ i ] = str.charCodeAt( i );
  }
  return bufferView;
}

var i = 0,
  str = 'V8 is the coolest',
  buffer = new ArrayBuffer( str.length * 2 ),
  bufferView = new Uint16Array( buffer );

while ( i++ < 1e5 ) {
  strToArray( str, bufferView );
}

Here, we use a preallocated ArrayBuffer and an associated ArrayBufferView (in this case a Uint16Array) in order to avoid reallocating a new object every time we run strToArray(). The result is that we're hardly allocating anything.

Run d8 --trace-gc test.js

Nothing. We never filled up the "new space", so we never had to scavenge.

One more thing to try in test.js:

function strToArray( str ) {
  var i = 0,
    len = str.length,
    arr = new Uint16Array( str.length );
  for ( ; i < len; ++i ) {
    arr[ i ] = str.charCodeAt( i );
  }
  return arr;
}

var i = 0, str = 'V8 is the coolest', arr = [];

while ( i++ < 1e6 ) {
  strToArray( str );
  if ( i % 100000 === 0 ) {
    // save a long-term reference to a random, huge object
    arr.push( new Uint16Array( 100000000 ) );
    // release references about 5% of the time
    Math.random() > 0.95 && ( arr.length = 0 );
  }
}

Run d8 --trace-gc test.js

Lots of scavenges, which is expected since we're no longer using a preallocated buffer. But there should also be a bunch of Mark-sweep lines.

Mark-sweep is the "full" GC. It gets run when the "old space" heap reaches a certain size, and it tends to take a lot longer than a scavenge. If you look at the logs, you'll probably see Scavenge at around ~1.5ms and Mark-sweep closer to 25 or 30ms.

Since the frame budget in a web app is about 16ms, you're pretty much guaranteed to drop at least 1 frame every time Mark-sweep runs.
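If you want to see where those pauses land from inside a script, one rough heuristic (a sketch, not a V8 API; a long gap could also be caused by other work) is to time each iteration and record unusually long ones:

```javascript
// Sketch: flag iterations whose wall-clock time exceeds a threshold.
// Long gaps are *candidate* GC pauses -- this is only a heuristic.
function detectPauses( work, iterations, thresholdMs ) {
  var pauses = [];
  var last = Date.now();
  for ( var i = 0; i < iterations; i++ ) {
    work( i );
    var now = Date.now();
    if ( now - last > thresholdMs ) {
      pauses.push( { iteration: i, ms: now - last } );
    }
    last = now;
  }
  return pauses;
}

// Allocation-heavy workload, similar to the loop above.
var garbage = [];
var suspected = detectPauses( function ( i ) {
  garbage.push( new Array( 1000 ) );
  if ( garbage.length > 1000 ) garbage.length = 0;
}, 10000, 50 );
```

Cross-reference any flagged iterations against the --trace-gc output to see whether they line up with Mark-sweep runs.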

Random stuff

d8 --help logs all available d8 flags

There's a ton there, but you can usually find what you're looking for with something like d8 --help | grep memory or whatever.

d8 --allow-natives-syntax file.js

This actually lets you call V8 internal methods from within your JS file, like this:

function factorial( n ) {
  return n === 1 ? n : n * factorial( --n );
}

var i = 0;

while ( i++ < 1e8 ) {
  factorial( 10 );
  // run a full Mark-sweep pass every 10MM iterations
  i % 1e7 === 0 && %CollectGarbage( null );
}

...and run d8 --allow-natives-syntax --trace-gc test.js

Native functions are prefixed with the % symbol. A (somewhat incomplete) list of native functions is available here.

Logging

d8 doesn't have a console object (or a window object, for that matter). But you can log to the terminal using print().
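Per the note above, a script meant to run under both d8 and Node can use a tiny shim (a sketch; it just picks whichever global logger exists):

```javascript
// Use d8's print() when it exists, otherwise fall back to console.log.
var log = typeof print === 'function'
  ? print
  : console.log.bind( console );

log( 'hello from the shell' );
```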

Comparing Hidden Classes

This is probably my favorite one. I actually just found it.

So in V8, there's this concept of "hidden classes" (Good explanation a couple paragraphs in). You should read that article – but basically, hidden classes are how V8 (SpiderMonkey and JavaScriptCore use similar techniques, too) determines whether or not two objects have the same "shape".

All things considered, you always want to pass objects of the same hidden class as arguments to functions.
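In practice that usually means initialising all of an object's properties in its constructor, in the same order, so every instance starts from the same hidden class. A sketch (the Point/length names are just for illustration):

```javascript
// Objects built like this share a hidden class: same properties,
// assigned in the same order, all in the constructor.
function Point( x, y ) {
  this.x = x;
  this.y = y;
}

function length( p ) {
  return Math.sqrt( p.x * p.x + p.y * p.y );
}

var a = new Point( 3, 4 );
var b = new Point( 6, 8 ); // same shape as a

// Adding properties after construction (e.g. b.z = 0) would give b a
// different hidden class and make call sites like length() polymorphic.
```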

Anyway, you can actually compare the hidden classes of two objects:

function Class( val ) {
  this.prop = val;
}

var a = new Class('foo');
var b = new Class('bar');

print( %HaveSameMap( a, b ) );

b.prop2 = 'baz';

print( %HaveSameMap( a, b ) );

Run d8 --allow-natives-syntax test.js

You should see true, then false. By adding b.prop2 = 'baz', we modified its structure and created a new hidden class.

Node.js

A lot of these flags (but not all of them) work with Node, too. --trace-opt, --prof, and --allow-natives-syntax are all supported.

That can be helpful if you want to test something that relies on another library, since you can use Node's require().

A list of supported V8 flags can be accessed with node --v8-options.
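For example (hypothetical filename hot.js), the same kind of experiment works in Node:

```javascript
// hot.js -- run with: node --trace-opt hot.js
// The hot add() function should appear in the optimization log.
function add( a, b ) {
  return a + b;
}

var sum = 0;
for ( var i = 0; i < 1e6; i++ ) {
  sum = add( sum, 1 );
}

console.log( sum ); // 1000000
```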

Links

Performance Tips for JavaScript in V8 (Good basic intro to Hidden Classes)

Use forensics and detective work to solve JavaScript performance mysteries

Breaking the JavaScript Speed Limit with V8

V8 - A Tale of Two Compilers (Good explanation of Inline Caches)


Anyway, this is all still pretty new to me, and there's a lot I haven't figured out yet. But the stuff I've found so far is pretty cool, so I wanted to write something up and share it.

Oh, and I'm sure there's stuff in here that I'm wrong about, because I'm honestly a little out of my depth here. Feedback is appreciated.
