skids/topic_optimize

## topic_optimize

So there I was asking myself why we have so much nqp code cropping up in
the rakudo core... what prevents rakudo from producing opcodes that are just
as efficient from Perl6 source for even basic things like ifs and whiles?  Just how
much performance are we gaining, anyway?  Let's take the case of while
versus nqp::while.  Typical results for two comparable busy loops on
my machine are:


```
$ time perl6 -e 'use nqp; my $i = 1; nqp::while(nqp::islt_i($i,100000000), $i := nqp::add_i($i,1)); exit(0);'

real    0m5.729s
user    0m5.748s
sys     0m0.024s
```

```
$ time perl6 -e 'use nqp; my $i = 1; while nqp::islt_i($i,100000000) { $i := nqp::add_i($i,1) }; exit(0)'

real    0m5.951s
user    0m5.972s
sys     0m0.024s
```

...somehow the Perl6 code picks up 2% to 4% overhead costs.

I decided to take a look at the AST.  You do this by running rakudo with the --target flag.
In this case, I want to see the AST after the Perl6 high level optimizer
has run, so the flag is --target=optimize.

Removing the parts that are identical, the Perl6 while loop body has
this extra stuff around incrementing $i:

```
                - QAST::Stmts(:resultchild(1)))
                  - QAST::Op(bind)
                    - QAST::Var(local pres_topic__1)
                    - QAST::Var(lexical $_)
         ...
                    - QAST::WVal(Bool)
                  - QAST::Op(bind)
                    - QAST::Var(lexical $_)
                    - QAST::Var(local pres_topic__1)

```

Reading the code in src/Perl6/Optimizer.nqp, it is pretty easy to figure out
that the bind operations are the result of inlining the body of the while loop:
since the optimizer cannot prove that the code in the loop body will not
alter $_, it saves $_ before the inlined code and restores it afterwards.

On every loop iteration.

So just to see if removing those bind calls sped up the code, I added a little
code in Perl6::Optimizer.visit_op, which is called on every QAST::Op node,
and so on while loops too.

```
--- a/src/Perl6/Optimizer.nqp
+++ b/src/Perl6/Optimizer.nqp
@@ -1253,6 +1253,16 @@ class Perl6::Optimizer {
         # Visit the children.
         self.visit_op_children($op);

+        if $!level >= 90 {
+           if $optype eq 'while' {
+                my $qast := QAST::Stmts.new: $op[1][0], $op, $op[1][2];
+               $op[1].shift;
+               $op[1].pop;
+               $op[1].resultchild(0);
+               return $qast;
+            }
+        }
+

```

...Let's look at that code for a bit.  First, you would not necessarily want to
apply this optimization to every while loop that got inlined, because some loop
bodies or conditionals or phasers or whatnot may actually alter or use $_ in some
tricky way.  So we prevent this from happening when we build rakudo (and during
normal usage) by only performing the optimization when the --optimize flag has
been given a silly value.

This trick makes it really easy to test out optimizer changes.

Since this is just a test, and we already know what our AST is going to look like,
we just shuffle things around to move the bind operations outside the while loop,
so they only run once.  We leave the QAST::Stmts node there even though we could
replace it with a QAST::Stmt.  Since we have moved the loop body from position
1 to position 0 inside the QAST::Stmts, we have to adjust the resultchild attribute
to tell the compiler to use the 0th element of the QAST::Stmts as the result
which that QAST node is producing.
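
To make the shuffle concrete, here is a toy model of the rewrite in Python.  It is
only a sketch: the `Node` class and the `hoist_topic_binds` function are hypothetical
stand-ins, not rakudo's QAST API.  Children live in a plain list so the indexing
lines up with `$op[1][0]` and friends in the patch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a QAST node: a kind, a label, and children."""
    kind: str
    label: str = ""
    children: List["Node"] = field(default_factory=list)
    resultchild: Optional[int] = None


def hoist_topic_binds(while_op: Node) -> Node:
    """Mimic the patch: move the save/restore binds out of the loop body.

    Assumes, as the patch does, that while_op.children[1] is a Stmts node
    shaped exactly [save_bind, body, restore_bind].
    """
    stmts = while_op.children[1]
    save, restore = stmts.children[0], stmts.children[2]
    # Wrap the loop: save once, run the whole loop, restore once.
    wrapper = Node("Stmts", children=[save, while_op, restore])
    stmts.children.pop(0)   # plays the role of $op[1].shift
    stmts.children.pop()    # plays the role of $op[1].pop
    stmts.resultchild = 0   # the body moved from index 1 to index 0
    return wrapper


cond = Node("Op", "islt_i")
save = Node("Op", "bind pres_topic <- $_")
body = Node("Op", "bind $i <- add_i")
restore = Node("Op", "bind $_ <- pres_topic")
inner = Node("Stmts", children=[save, body, restore], resultchild=1)
loop = Node("Op", "while", [cond, inner])

rewritten = hoist_topic_binds(loop)
print([child.label for child in rewritten.children])
print([child.label for child in loop.children[1].children])
```

After the call, the two binds sit on either side of the while node and the loop body
holds only the increment, which is the shape the patch produces for this one-liner.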

Now let's see if that sped things up.  Typical results were:

```
$ time perl6 --optimize=99 -e 'use nqp; my $i = 1; while nqp::islt_i($i,100000000) { $i := nqp::add_i($i,1) }; exit(0)'

real    0m5.732s
user    0m5.752s
sys     0m0.028s

```

...there is still a tiny hair of a performance loss, but most of it is gone.  So, if the
optimizer can be taught to figure out when it is safe to make this transformation,
we could speed up tight loops in general for a rakudo performance improvement when
running Perl6 code, and probably also stop using nqp::while in several places and enjoy
prettier syntax in the core code.
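
For anyone who wants to check the arithmetic, here is a small Python sketch that turns
the three "real" timings quoted in this post into overhead percentages.  The variable
and function names are mine, and these are single typical runs rather than a proper
benchmark, so treat the figures as rough.

```python
# Wall-clock ("real") seconds copied from the three timing runs in this post.
nqp_while = 5.729        # nqp::while
perl6_while = 5.951      # plain while, default optimization
hoisted_while = 5.732    # plain while, --optimize=99 with the patch


def overhead_pct(measured: float, baseline: float) -> float:
    """Extra time relative to the baseline, as a percentage."""
    return (measured - baseline) / baseline * 100


before = overhead_pct(perl6_while, nqp_while)
after = overhead_pct(hoisted_while, nqp_while)
# Share of the original gap that the patch closes.
recovered = (perl6_while - hoisted_while) / (perl6_while - nqp_while) * 100

print(f"overhead before patch: {before:.2f}%")
print(f"overhead after patch:  {after:.2f}%")
print(f"share of overhead recovered: {recovered:.1f}%")
```

For these particular runs the overhead comes out just under 4% before the patch and
well under 0.1% after it.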

If those bind operations even really belong there in the first place...
I'm a bit behind the curve when it comes to fully understanding Perl6 scoping, or the
finer points of scoping in general for that matter, so I'm a ways off from fully
understanding all the things that could make such optimizations unsafe... it's something
I'm going to have to poke around to try to figure out.

During this endeavor I had at one point put some test nqp::says in delete_unused_magicals
in the Optimizer.  During the rakudo build, I don't think I saw even one instance
of this block actually deleting anything (things do scroll by pretty fast, though).
This is probably because that code will not remove an unused $_ from a block if it
contains any calls at all... and almost any Perl6 operator involves a call (at least,
at this stage of optimization it does).  The only way to fix that would be if the called
function/method (or list of possible methods) could be determined, and if they bore an
annotation saying that they were not going to do something like CALLERS::<$_> = 42, when that
could be determined.  Such an annotation would have to be embedded in the bytecode of
the module from which the function is being included...

However, determining whether a loop body needs to ensconce $_ on each iteration, rather
than around the loop as a whole, might be easier, given that conditionals are often
pretty simple code fragments.
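
As a rough idea of what such a check might look like, here is a deliberately
conservative sketch in Python.  Everything in it is hypothetical: the `Ast` class,
the `may_touch_topic` function, and especially the `KNOWN_SAFE_OPS` set, which
simply assumes those ops cannot reach $_.  A real check inside the Optimizer would
need each op vetted, and would have to deal with calls, phasers, and the rest.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Ast:
    """Hypothetical toy AST node, not rakudo's QAST."""
    kind: str            # e.g. "Op", "Var", "IVal"
    name: str = ""       # op name or variable name
    children: List["Ast"] = field(default_factory=list)


# Ops assumed, for this sketch only, to be unable to read or alter $_.
KNOWN_SAFE_OPS = {"islt_i", "add_i", "bind", "while"}


def may_touch_topic(node: Ast) -> bool:
    """Conservatively report whether a subtree could read or alter $_."""
    if node.kind == "Var" and node.name == "$_":
        return True
    if node.kind == "Op" and node.name not in KNOWN_SAFE_OPS:
        return True   # unknown op or any call: assume the worst
    return any(may_touch_topic(child) for child in node.children)


# The busy-loop body from this post: $i := nqp::add_i($i, 1)
simple_body = Ast("Op", "bind", [
    Ast("Var", "$i"),
    Ast("Op", "add_i", [Ast("Var", "$i"), Ast("IVal", "1")]),
])

# A body containing a call, which might do anything to the caller's $_.
calling_body = Ast("Op", "call", [Ast("Var", "$i")])

print(may_touch_topic(simple_body))
print(may_touch_topic(calling_body))
```

The point of erring on the side of "might touch $_" is that a wrong "no" would
change program behavior, while a wrong "yes" only costs the two binds per iteration
we already pay today.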

Anyway, what I'd like people to take from this is that, while optimizing the core
by using inline nqp has managed to gain rakudo an amazing amount of performance over
the last couple years, there's another way to optimize... the Optimizer... and it
turns out it is pretty easy to tinker with.
