shipilev/gist:cab0bdf8c82ddbf7532f

## gistfile1.txt
Benchmark code:
 https://github.com/shipilev/article-exception-benchmarks

Prepare:
 $ mvn clean install

Run:
 $ java -jar target/microbenchmarks.jar ".*ThrowingStackless.*" -f 1 -wi 5 -i 5 -t ${THREADS}

On 2x12x2 Xeon (2 sockets x 12 cores x 2 hardware threads), JDK 8 GA x86_64:

1 thread:
Benchmark                                             Mode   Samples        Score  Score error    Units
n.s.p.e.ThrowingStacklessBench.exception_inline       avgt         5       22.937        0.369    ns/op
n.s.p.e.ThrowingStacklessBench.exception_noInline     avgt         5      156.771        3.405    ns/op
n.s.p.e.ThrowingStacklessBench.plain_inline           avgt         5        0.840        0.004    ns/op
n.s.p.e.ThrowingStacklessBench.plain_noInline         avgt         5       14.013        1.284    ns/op

12 threads: // no more cores per socket
Benchmark                                             Mode   Samples        Score  Score error    Units
n.s.p.e.ThrowingStacklessBench.exception_inline       avgt        60       26.669        0.807    ns/op
n.s.p.e.ThrowingStacklessBench.exception_noInline     avgt        60      175.740       13.113    ns/op
n.s.p.e.ThrowingStacklessBench.plain_inline           avgt        60        0.970        0.136    ns/op
n.s.p.e.ThrowingStacklessBench.plain_noInline         avgt        60       15.057        1.336    ns/op

24 threads: // no more real cores
Benchmark                                             Mode   Samples        Score  Score error    Units
n.s.p.e.ThrowingStacklessBench.exception_inline       avgt       120       44.157        1.256    ns/op
n.s.p.e.ThrowingStacklessBench.exception_noInline     avgt       120      256.479       27.080    ns/op
n.s.p.e.ThrowingStacklessBench.plain_inline           avgt       120        1.362        0.134    ns/op
n.s.p.e.ThrowingStacklessBench.plain_noInline         avgt       120       20.606        1.281    ns/op

48 threads: // no more hardware threads
Benchmark                                             Mode   Samples        Score  Score error    Units
n.s.p.e.ThrowingStacklessBench.exception_inline       avgt       240      101.671        0.867    ns/op
n.s.p.e.ThrowingStacklessBench.exception_noInline     avgt       240      385.515        9.752    ns/op
n.s.p.e.ThrowingStacklessBench.plain_inline           avgt       240        1.746        0.028    ns/op
n.s.p.e.ThrowingStacklessBench.plain_noInline         avgt       240       38.263        0.404    ns/op

NOTE: Scores are the *average* time per operation across all threads, so a score that stays the same
as threads are added means perfect scalability.
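
One way to read the note above: with avgt scores, aggregate throughput is roughly threads / score, so
a flat score means throughput grows linearly with the thread count. A minimal sketch of that
arithmetic follows. The class and method names are hypothetical and not part of the benchmark
project, and the inputs are the exception_noInline scores from the tables above.

```java
public class AvgTimeScaling {
    /** Aggregate operations per microsecond, assuming every thread runs at the average score. */
    static double opsPerMicrosecond(int threads, double avgNsPerOp) {
        return threads * 1000.0 / avgNsPerOp;
    }

    /** Fraction of perfect scaling retained; 1.0 means the score did not change. */
    static double efficiency(double singleThreadNsPerOp, double loadedNsPerOp) {
        return singleThreadNsPerOp / loadedNsPerOp;
    }

    public static void main(String[] args) {
        // Thread counts and exception_noInline scores (ns/op) copied from the tables above.
        int[] threads = {1, 12, 24, 48};
        double[] scores = {156.771, 175.740, 256.479, 385.515};

        for (int i = 0; i < threads.length; i++) {
            System.out.printf("%2d threads: %8.1f ops/us, efficiency %.2f%n",
                    threads[i],
                    opsPerMicrosecond(threads[i], scores[i]),
                    efficiency(scores[0], scores[i]));
        }
    }
}
```

An efficiency below 1.0 is the inverse of the "hit" factors discussed in the interpretation.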

Results interpretation: for exception_noInline, there is only a 1.64x hit when going 1->24 threads, and
only a 2.46x hit when going 1->48 threads (expected, because hyper-threads contend for the execution
units of a shared core). For the baseline plain_noInline, which allocates as well, it is a 1.47x hit on
1->24 threads and a 2.73x hit on 1->48 threads, both presumably because of allocation pressure.
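
The hit factors quoted above are each score divided by the single-thread score of the same benchmark.
A small sketch that recomputes them from the tables, in case you want to check the arithmetic or plug
in your own runs. The class and method names are hypothetical, and the numbers are copied from the
results above.

```java
public class ScalingHit {
    /** Slowdown factor of a loaded run relative to the single-thread run (both in ns/op). */
    static double hit(double singleThreadNsPerOp, double loadedNsPerOp) {
        return loadedNsPerOp / singleThreadNsPerOp;
    }

    public static void main(String[] args) {
        // Scores (ns/op) at 1, 24 and 48 threads, copied from the tables above.
        double[] exceptionNoInline = {156.771, 256.479, 385.515};
        double[] plainNoInline = {14.013, 20.606, 38.263};

        System.out.printf("exception_noInline: 1->24 %.2fx, 1->48 %.2fx%n",
                hit(exceptionNoInline[0], exceptionNoInline[1]),
                hit(exceptionNoInline[0], exceptionNoInline[2]));
        System.out.printf("plain_noInline:     1->24 %.2fx, 1->48 %.2fx%n",
                hit(plainNoInline[0], plainNoInline[1]),
                hit(plainNoInline[0], plainNoInline[2]));
    }
}
```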