I've been working on optimizing the YARA compiler to generate better bytecode for loops. The goal is to skip as much of a loop as possible by not iterating any further once the loop condition is met. Here's the rule I'm using. It's completely contrived and excessive, but it shows the performance improvement clearly:

wxs@wxs-mbp yara % cat rules/test.yara
rule a {
  condition:
    for any i in (0..100000000): (i == 1)
}
wxs@wxs-mbp yara %

Take compilation out of the measurement by pre-compiling the rule, then run it a few times:

wxs@wxs-mbp yara % ./yarac rules/test.yara rules/test.bin
wxs@wxs-mbp yara % for i in $(jot 5); do /usr/bin/time ./yara rules/test.bin /dev/null; done
a /dev/null
        4.94 real         4.91 user         0.02 sys
a /dev/null
        4.88 real         4.85 user         0.01 sys
a /dev/null
        4.89 real         4.87 user         0.01 sys
a /dev/null
        4.97 real         4.95 user         0.01 sys
a /dev/null
        4.82 real         4.79 user         0.02 sys
wxs@wxs-mbp yara %

Somewhere just under 5 seconds to run that (horrible) rule.
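
To put that number in perspective, here is a rough back-of-the-envelope calculation in Python using the five "real" timings above. It assumes the whole run time is spent in the loop and that the unoptimized build evaluates every value in the range (YARA ranges are inclusive), so treat the result as an estimate rather than a profile:

```python
# "real" column from the five baseline runs above, in seconds.
baseline_real = [4.94, 4.88, 4.89, 4.97, 4.82]

# (0..100000000) is an inclusive range, so it covers 100,000,001 values.
iterations = 100_000_000 + 1

mean_seconds = sum(baseline_real) / len(baseline_real)

# Assumption: startup and scanning /dev/null cost nothing, so all of the
# time is attributed to the loop. That slightly overstates the per-iteration cost.
ns_per_iteration = mean_seconds / iterations * 1e9

print(f"mean: {mean_seconds:.2f} s, about {ns_per_iteration:.0f} ns per iteration")
# prints "mean: 4.90 s, about 49 ns per iteration"
```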

Here is the same thing with my loop optimization branch. All this branch does is stop running the expression inside the loop as soon as the condition is met. In our rule, that is as soon as the expression evaluates to true once. We have to recompile the rule since my patch modifies the bytecode emitted by the compiler.

wxs@wxs-mbp yara % ./yarac rules/test.yara rules/test.bin
wxs@wxs-mbp yara % for i in $(jot 5); do /usr/bin/time ./yara rules/test.bin /dev/null; done
a /dev/null
        0.02 real         0.00 user         0.01 sys
a /dev/null
        0.02 real         0.01 user         0.01 sys
a /dev/null
        0.02 real         0.01 user         0.01 sys
a /dev/null
        0.02 real         0.00 user         0.01 sys
a /dev/null
        0.02 real         0.01 user         0.01 sys
wxs@wxs-mbp yara %
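
The difference between the two runs comes down to how many times the loop body gets evaluated. The sketch below models that in Python. It is not the YARA bytecode or the actual patch; `any_exhaustive` and `any_early_exit` are hypothetical names made up for illustration, and the exhaustive version is only my assumption of how a loop without the optimization behaves:

```python
from typing import Callable, Iterable, Tuple


def any_exhaustive(values: Iterable[int],
                   body: Callable[[int], bool]) -> Tuple[bool, int]:
    """Model of `for any i in values : body(i)` without early exit.

    Returns (result, number of times body was evaluated).
    """
    matched = False
    evaluated = 0
    for i in values:
        evaluated += 1
        if body(i):
            matched = True  # keep iterating even though the answer is known
    return matched, evaluated


def any_early_exit(values: Iterable[int],
                   body: Callable[[int], bool]) -> Tuple[bool, int]:
    """Same semantics, but stop as soon as the condition is met."""
    evaluated = 0
    for i in values:
        evaluated += 1
        if body(i):
            return True, evaluated  # "any" is satisfied, skip the rest
    return False, evaluated


# A smaller range than the rule uses, so the exhaustive version finishes quickly.
print(any_exhaustive(range(0, 1001), lambda i: i == 1))   # (True, 1001)
print(any_early_exit(range(0, 1001), lambda i: i == 1))   # (True, 2)
```

With the test rule, the body first evaluates to true at i == 1, so an early exit needs only two evaluations no matter how large the upper bound is. That is consistent with the optimized timings sitting at the floor of what `time` can measure.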

What impact does this have on real-world rules? I'm collecting some data right now, but if you have rules with loops that could run many times, I'd love to see them, along with a handful of samples that match, so I can benchmark some more!

@plusvic

commented Feb 22, 2019

Impressive! This could have a great impact in our use case in VirusTotal.
