YARA Loop Optimization Details
Let's look at the bytecode without my optimizations. Before we do that let's set some terminology, because I find it easier to use names compared YARA VM memory locations. These are the names I've mostly borrowed from the comments in the grammar:
- memory 0: lower bound
- memory 1: boolean_expression accumulator
- memory 2: iteration counter
- memory 3: upper bound
We'll be using this rule for the first example:
"any" integer range case
for any i in (0..5): (true)
Unoptimized
These are the instructions generated by the master
branch:
1 INIT_RULE
2 PUSH 1 ; "any"
3 PUSH UNDEF ; "end of list"
4 PUSH 0 ; integer range lower bound
5 PUSH 5 ; integer range upper bound
6 CLEAR_M 1 ; clear boolean expression accumulator
7 CLEAR_M 2 ; clear iteration counter
8 .--POP_M 3 ; store upper bound
9 | POP_M 0 ; store lower bound
10 | PUSH 1 ; true
11 | ADD_M 1 ; add boolean_expression result to accumulator
12 | INCR_M 2 ; increment loop iteration counter
13 | INCR_M 0 ; increment lower bound (more like current bound)
14 | PUSH_M 0 ; lower (current) bound
15 | PUSH_M 3 ; upper bound
16 `--JLE ; jump to start of loop if we haven't iterated enough
17 POP ; if we didn't take the jump we have to clean up the stack
18 POP
19 POP ; pop end of list
20 SWAPUNDEF 2 ; at this point only our "any" is on the stack, this is effectively a NOP
21 PUSH_M 1 ; push the boolean_expression accumulator
22 INT_LE ; compare boolean_expression accumulator to primary_expression ("any")
23 MATCH_RULE
24 NOP
25 HALT
Optimized
My branch adds another memory location:
memory 4: primary_expression minimum
Here is the instructions generated by my optimization branch:
1 INIT_RULE
2 PUSH 1 ; "any"
3 CLEAR_M 4 ; store primary_expression in m4 while
4 ADD_M 4 ; keeping it on the stack by pushing
5 PUSH_M 4 ; it back on after ADD_M pop'ed it off.
6 PUSH UNDEF ; "end of list"
7 PUSH 0 ; integer range lower bound
8 PUSH 5 ; integer range upper bound
9 CLEAR_M 1 ; clear loop local variables
10 CLEAR_M 2 ; clear loop local variables
11 .--POP_M 3 ; store upper bound
12 | POP_M 0 ; store lower bound
13 | PUSH 1 ; true
14 | ADD_M 1 ; add boolean_expression result to accumulator
15 | INCR_M 2 ; increment loop iteration counter
16 | INCR_M 0 ; increment lower bound (more like current bound)
17 | PUSH_M 4 ; primary expression minimum
18 | PUSH_M 1 ; boolean_expression accumulator
19 .-+--JLE ; jump out of loop if (minimum <= accumulator)
20 | | POP ; clean up stack if we don't take jump
21 | | POP ; clean up stack if we don't take jump
22 | | PUSH_M 0 ; lower (current) bound
23 | | PUSH_M 3 ; upper bound
24 | `--JLE ; jump to start of loop if we haven't iterated enough
25 `----NOP ; jump target for early exit
26 POP ; if we took the early exit this will clean up those two pushes
27 POP ; if we didn't take the second jump this cleans up those two pushes
28 POP ; pop end of list
29 SWAPUNDEF 2 ; at this point only our "any" is on the stack, this is effectively a NOP
30 PUSH_M 1 ; push the boolean_expression accumulator
31 INT_LE ; compare boolean_expression accumulator to primary_expression ("any")
32 MATCH_RULE
33 NOP
34 HALT
Some things to note:
- Lines 3 to 5 are new. They exist because we want to store the primary expression in memory 4 while still keeping it on the stack. These could be reduced to a single SET_M instruction which restores the value on the stack after pop'ing it and setting it in memory.
- Lines 17 to 21 are new. We compare the primary_expression minimum in memory 4 ("any" or 1 in this example) to the boolean expression accumulator (1 the first time through the loop) and jump out of the loop if we have met the condition. If we haven't met the condition we pop the two numbers off the stack and do the regular loop iteration comparison.
- Line 25 is new and serves as a jump target for the early exit check. It's worth noting that the number of POP instructions after the loop are the same because we will either be cleaning up after the early exit check or the loop iteration check, but the size of the stack is the same in both cases.
"all" integer range case
Now let's look at the "all" case:
In this case I'm using memory 4 to store the last boolean expression result. This is stored differently from the accumulator because I want to check it at each iteration of the loop.
for all i in (0..5): (true)
Unoptimized
Here's the instructions generated by the master branch:
1 INIT_RULE
2 PUSH UNDEF ; "all"
3 PUSH UNDEF ; "end of list"
4 PUSH 0 ; integer range lower bound
5 PUSH 5 ; integer range upper bound
6 CLEAR_M 1 ; clear boolean expression accumulator
7 CLEAR_M 2 ; clear iteration counter
8 .--POP_M 3 ; store upper bound
9 | POP_M 0 ; store lower bound
10 | PUSH 1 ; true
11 | ADD_M 1 ; add boolean_expression result to accumulator
12 | INCR_M 2 ; increment loop iteration counter
13 | INCR_M 0 ; increment lower bound (more like current bound)
14 | PUSH_M 0 ; lower (current) bound
15 | PUSH_M 3 ; upper bound
16 `--JLE ; jump to start of loop if we haven't iterated enough
17 POP ; if we didn't take the jump we have to clean up the stack
18 POP
19 POP ; pop end of list
20 SWAPUNDEF 2 ; swap the UNDEF ("all") with loop iteration counter (memory 2)
21 PUSH_M 1 ; push the boolean_expression accumulator
22 INT_LE ; compare boolean_expression accumulator to loop iteration counter
23 MATCH_RULE
24 NOP
25 HALT
The difference between the "any" case is two things:
- The expression which is pushed on line 2 is UNDEF instead of 1.
- Line 20 is no longer a NOP because we pushed UNDEF on line 2.
Optimized
Here's the instructions for this condition as generated by my loop_optimization branch:
1 INIT_RULE
2 PUSH UNDEF ; "all"
3 PUSH UNDEF ; "end of list"
4 PUSH 0 ; integer range lower bound
5 PUSH 5 ; integer range upper bound
6 CLEAR_M 1 ; clear loop local variables
7 CLEAR_M 2 ; clear loop local variables
8 .--POP_M 3 ; store upper bound
9 | POP_M 0 ; store lower bound
10 | PUSH 1 ; true
11 | POP_M 4 ; store boolean expression result in memory 4
12 | PUSH_M 4 ; we still want it on the stack so we can add it to the accumulator
13 | ADD_M 1 ; add boolean_expression result to accumulator
14 | INCR_M 2 ; increment loop iteration counter
15 | INCR_M 0 ; increment lower bound (more like current bound)
16 | PUSH_M 4 ; boolean expression result
17 .-+--JFALSE ; jump out of loop if last result is false
18 | | POP ; clean up stack if we don't take jump
19 | | PUSH_M 0 ; lower (current) bound
20 | | PUSH_M 3 ; upper bound
21 | `--JLE ; jump to start of loop if we haven't iterated enough
22 | POP ; clean up the upper bound left on the stack
23 `----NOP ; jump target for early exit
24 POP ; if we took the early exit this will clean up boolean expression result or it will clean up line 19
25 POP ; pop end of list
26 SWAPUNDEF 2 ; swap the UNDEF ("all") with loop iteration counter (memory 2)
27 PUSH_M 1 ; push the boolean_expression accumulator
28 INT_LE ; compare boolean_expression accumulator to loop iteration counter
29 MATCH_RULE
30 NOP
31 HALT
This is similar to the "any" case but rather than a JLE we have a JFALSE and jump if the boolean expression result is ever false.
Integer Sets (TBD)
I haven't quite figured out integer sets or string sets (they are the same problem). Here's how the instructions look for this rule:
for any i in (0,1,2,3,4,5): (true)
1 INIT_RULE
2 PUSH 1 ; "any"
3 PUSH UNDEF ; "end of list"
4 PUSH 0
5 PUSH 1
6 PUSH 2
7 PUSH 3
8 PUSH 4
9 PUSH 5
10 CLEAR_M 1 ; clear boolean expression accumulator
11 CLEAR_M 2 ; clear iteration counter
12 .--POP_M 0 ; store lower bound
13 | PUSH 1 ; true
14 | ADD_M 1 ; add boolean_expression result to accumulator
15 | INCR_M 2 ; increment loop iteration counter
16 `--JNUNDEF ; Are we at the end of list?
17 POP ; pop end of list
18 SWAPUNDEF 2 ; at this point only our "any" is on the stack, this is effectively a NOP
19 PUSH_M 1 ; push the boolean_expression accumulator
20 INT_LE ; compare boolean_expression accumulator to primary_expression ("any")
21 MATCH_RULE
22 NOP
23 HALT
If I apply the same logic for early exits in the "any" and "all" case as I have explained above I won't know where to put the jump target so that the stack unwinds properly. To put it another way, I would need to add some number of POP instructions to clean up the stack but I won't know how many (and thus where to put the jump target), because the right value depends entirely upon how many times we have iterated through the loop (and thus consumed something off the stack). I'm still considering this case but for now I'm leaving that unoptimized.