@wxsBSD
Last active April 29, 2022 15:08

I've been working on optimizing the YARA compiler to generate better bytecode for loops. The goal is to skip as much of each loop as possible by stopping iteration once the loop's condition has been met. Here's the rule I'm using. It's completely contrived and excessive, but it shows the performance improvement:

wxs@wxs-mbp yara % cat rules/test.yara
rule a {
  condition:
    for any i in (0..100000000): (i == 1)
}
wxs@wxs-mbp yara %

Take compilation out of the measurement by pre-compiling the rule, then run it a few times:

wxs@wxs-mbp yara % ./yarac rules/test.yara rules/test.bin
wxs@wxs-mbp yara % for i in $(jot 5); do /usr/bin/time ./yara rules/test.bin /dev/null; done
a /dev/null
        4.94 real         4.91 user         0.02 sys
a /dev/null
        4.88 real         4.85 user         0.01 sys
a /dev/null
        4.89 real         4.87 user         0.01 sys
a /dev/null
        4.97 real         4.95 user         0.01 sys
a /dev/null
        4.82 real         4.79 user         0.02 sys
wxs@wxs-mbp yara %

Somewhere just under 5 seconds to run that (horrible) rule.

Here is the same thing with my loop optimization branch. All this branch does is stop evaluating the expression inside the loop as soon as the loop's condition is met. In our rule that is the first time the expression evaluates to true. We have to recompile the rule because my patch modifies the bytecode emitted by the compiler.

wxs@wxs-mbp yara % ./yarac rules/test.yara rules/test.bin
wxs@wxs-mbp yara % for i in $(jot 5); do /usr/bin/time ./yara rules/test.bin /dev/null; done
a /dev/null
        0.02 real         0.00 user         0.01 sys
a /dev/null
        0.02 real         0.01 user         0.01 sys
a /dev/null
        0.02 real         0.01 user         0.01 sys
a /dev/null
        0.02 real         0.00 user         0.01 sys
a /dev/null
        0.02 real         0.01 user         0.01 sys
wxs@wxs-mbp yara %
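
The early exit described above can be modelled in a few lines of Python. This is only a sketch of the idea, not YARA's actual bytecode or interpreter: the function name `at_least`, its parameters, and the scaled-down range are all assumptions made for illustration. It counts how many times the loop body is evaluated with and without the short circuit.

```python
from typing import Callable, Iterable, Tuple


def at_least(needed: int,
             items: Iterable[int],
             predicate: Callable[[int], bool],
             short_circuit: bool) -> Tuple[bool, int]:
    """Model of a `for <needed> i in (...) : (expr)` style loop.

    Returns (result, evaluations), where `evaluations` is how many
    times the loop body was evaluated. Hypothetical helper, not a
    YARA API.
    """
    matches = 0
    evaluations = 0
    for item in items:
        evaluations += 1
        if predicate(item):
            matches += 1
            # The optimization: once enough iterations have matched,
            # the outcome cannot change, so stop iterating.
            if short_circuit and matches >= needed:
                break
    return matches >= needed, evaluations


# YARA ranges are inclusive, so (0..N) maps to range(0, N + 1).
# N is scaled down from the rule's 100000000 to keep the demo quick.
N = 100_000
span = range(0, N + 1)

# `for any` behaves like "at least 1".
print(at_least(1, span, lambda i: i == 1, short_circuit=False))  # (True, 100001)
print(at_least(1, span, lambda i: i == 1, short_circuit=True))   # (True, 2)
```

With the short circuit the body runs twice (for i == 0 and i == 1) no matter how large the range is, which is consistent with the optimized timings above sitting at process start-up cost.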

What impact does this have on real-world rules? I'm collecting some data right now, but if you have rules with loops that could run many times, I'd love to see them, along with a handful of samples that match, so I can benchmark some more!

@plusvic
plusvic commented Feb 22, 2019

Impressive! This could have a great impact in our use case in VirusTotal.
