I've been working on optimizing the YARA compiler to generate better bytecode for loops. The goal is to skip as many loop iterations as possible by not iterating further once the loop condition is met. Here's the rule I'm using. It's completely contrived and excessive, but it shows the performance improvement:
wxs@wxs-mbp yara % cat rules/test.yara
rule a {
condition:
for any i in (0..100000000): (i == 1)
}
wxs@wxs-mbp yara %
Take the compiler out of the measurement by pre-compiling the rules, then run them a few times:
wxs@wxs-mbp yara % ./yarac rules/test.yara rules/test.bin
wxs@wxs-mbp yara % for i in $(jot 5); do /usr/bin/time ./yara rules/test.bin /dev/null; done
a /dev/null
4.94 real 4.91 user 0.02 sys
a /dev/null
4.88 real 4.85 user 0.01 sys
a /dev/null
4.89 real 4.87 user 0.01 sys
a /dev/null
4.97 real 4.95 user 0.01 sys
a /dev/null
4.82 real 4.79 user 0.02 sys
wxs@wxs-mbp yara %
Somewhere just under 5 seconds to run that (horrible) rule.
Here is the same thing with my loop optimization branch. All this branch does is stop evaluating the expression inside the loop as soon as the condition is met. In our rule, that is as soon as the expression evaluates to true once. We have to recompile the rule, since my patch modifies the bytecode emitted by the compiler.
wxs@wxs-mbp yara % ./yarac rules/test.yara rules/test.bin
wxs@wxs-mbp yara % for i in $(jot 5); do /usr/bin/time ./yara rules/test.bin /dev/null; done
a /dev/null
0.02 real 0.00 user 0.01 sys
a /dev/null
0.02 real 0.01 user 0.01 sys
a /dev/null
0.02 real 0.01 user 0.01 sys
a /dev/null
0.02 real 0.00 user 0.01 sys
a /dev/null
0.02 real 0.01 user 0.01 sys
wxs@wxs-mbp yara %
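To make the difference concrete, here is a rough Python model of the two behaviours. It is only a sketch of the idea, not the bytecode the compiler emits: the function `for_n_of` and its `short_circuit` flag are made up for illustration, and the upper bound is scaled down from 100000000 so it runs quickly.

```python
def for_n_of(required, lower, upper, predicate, short_circuit):
    """Toy model of `for <required> i in (lower..upper): (predicate(i))`.

    Hypothetical helper, not part of YARA. Returns (matched, evaluations),
    where evaluations counts how often the inner expression was run.
    """
    hits = 0
    evaluations = 0
    for i in range(lower, upper + 1):  # the range includes both bounds
        evaluations += 1
        if predicate(i):
            hits += 1
            # Optimized behaviour: stop as soon as the quantifier is satisfied.
            if short_circuit and hits >= required:
                return True, evaluations
    # Unoptimized behaviour: every iteration ran before the result is decided.
    return hits >= required, evaluations


# Scaled down from the 100000000 in the rule to keep the demo fast.
UPPER = 100_000

# `for any` is modelled as "at least 1 iteration must match".
naive = for_n_of(1, 0, UPPER, lambda i: i == 1, short_circuit=False)
optimized = for_n_of(1, 0, UPPER, lambda i: i == 1, short_circuit=True)

print(naive)      # (True, 100001)
print(optimized)  # (True, 2)
```

Both calls agree that the rule matches. The only difference is how many times the inner expression is evaluated, which is where the time goes in the timings above.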
What impact does this have on real-world rules? I'm collecting some data right now, but if you have rules with loops that could run many iterations, I'd love to see them, along with a handful of samples that match so I can benchmark some more!
Impressive! This could have a great impact in our use case in VirusTotal.