Test rules:
wxs@wxs-mbp yara % cat rules/test.yara
rule b {
strings:
$a = "LSCOLORS"
condition:
$a
}
I've shared this technique with some people privately, but might as well share it publicly now since I was asked about it. I've been using this for a while now with good success. It works well for parsing .NET droppers and other things.
If you don't know what the -D flag to YARA does, I suggest you import a module and run a file through YARA with that flag. It prints to stdout everything the module parsed that doesn't require you to call a function. This is a great way to get a quick idea of the structure of a file.
For example:
wxs@mbp yara % cat always_false.yara
/*
* fmtid + 24 == number of property identifiers and offsets
* fmtid + 28 == start of property identifier and offsets (4 bytes each)
*/
rule test {
strings:
//$fmtid = { 02 d5 cd d5 9c 2e 1b 10 93 97 08 00 2b 2c f9 ae }
$fmtid = { e0 85 9f f2 f9 4f 68 10 ab 91 08 00 2b 27 b3 d9 }
$redacted_author = "REDACTED AUTHOR"
condition:
I've been working on optimizing the YARA compiler to generate better bytecode for loops. The goal is to skip as many loop iterations as possible by not iterating further once the loop condition is met. Here's the rule I'm using. It's completely contrived and excessive, but it shows the performance improvement:
wxs@wxs-mbp yara % cat rules/test.yara
rule a {
condition:
for any i in (0..100000000): (i == 1)
}
wxs@wxs-mbp yara %
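To make the goal concrete, here is a minimal Python sketch of the idea, not the YARA bytecode itself. The function name `for_any` and the `short_circuit` flag are hypothetical, and the upper bound is smaller than in the rule so the unoptimized run finishes quickly. It counts how many iterations each strategy performs for `for any i in (lo..hi): (i == 1)`.

```python
def for_any(start, stop, predicate, short_circuit):
    """Model `for any i in (start..stop): predicate(i)`.

    Returns (result, iterations). The range is inclusive, like YARA's.
    """
    result = False
    iterations = 0
    for i in range(start, stop + 1):
        iterations += 1
        if predicate(i):
            result = True
            if short_circuit:
                # "any" is already satisfied, so stop iterating.
                break
    return result, iterations


# Assumption: a smaller bound than the rule's 100000000, for illustration only.
UPPER = 100_000

slow = for_any(0, UPPER, lambda i: i == 1, short_circuit=False)
fast = for_any(0, UPPER, lambda i: i == 1, short_circuit=True)

print(slow)  # (True, 100001)
print(fast)  # (True, 2)
```

Both strategies give the same answer; only the amount of work differs.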
Let's look at the bytecode without my optimizations. Before we do that, let's set some terminology, because I find it easier to use names than YARA VM memory locations. These are the names, which I've mostly borrowed from the comments in the grammar:
We'll be using this rule for the first example:
wxs@wxs-mbp yara % cat rules/test.yara
rule a {
strings:
// This program cannot VGhpcyBwcm9ncmFtIGNhbm5vdA==
// AThis program cannot QVRoaXMgcHJvZ3JhbSBjYW5ub3Q=
// AAThis program cannot QUFUaGlzIHByb2dyYW0gY2Fubm90
$a = "This program cannot" base64 ascii
// Custom alphabets are supported, but I have it commented out for now. ;)
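The three encodings in the comments above can be reproduced with a few lines of Python. This is only a sketch of where those strings come from, not of how YARA searches for them. The "A" filler byte is copied from the rule comments, and the helper name `base64_variants` is hypothetical. Base64 works on 3-byte groups, so a plaintext embedded in a larger encoded blob can start at one of three alignments.

```python
import base64


def base64_variants(plaintext: bytes) -> list:
    """Encode plaintext with 0, 1 and 2 filler bytes in front of it."""
    return [
        base64.b64encode(b"A" * shift + plaintext).decode("ascii")
        for shift in range(3)
    ]


variants = base64_variants(b"This program cannot")
for variant in variants:
    print(variant)
# VGhpcyBwcm9ncmFtIGNhbm5vdA==
# QVRoaXMgcHJvZ3JhbSBjYW5ub3Q=
# QUFUaGlzIHByb2dyYW0gY2Fubm90
```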
Counting number of times strings match in YARA with awk...
wxs@wxs-mbp yara % cat rules/test.yara
rule a { strings: $a = "FreeBSD" nocase $b = "usage: " condition: any of them }
wxs@wxs-mbp yara % ./yara -s rules/test.yara /bin/ls
a /bin/ls
0xb8e1:$a: FreeBSD
0xb9a1:$a: FreeBSD
0xb9f1:$a: FreeBSD
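For comparison, the same tally can be sketched in Python rather than awk. This assumes the `-s` output format shown above (`<offset>:<identifier>: <data>`) and output for a single file. The function name `count_string_matches` is hypothetical.

```python
from collections import Counter

# Sample copied from the yara -s output above.
SAMPLE_OUTPUT = """\
a /bin/ls
0xb8e1:$a: FreeBSD
0xb9a1:$a: FreeBSD
0xb9f1:$a: FreeBSD
"""


def count_string_matches(output: str) -> Counter:
    """Count how many times each string identifier appears in yara -s output."""
    counts = Counter()
    for line in output.splitlines():
        parts = line.split(":", 2)
        # Skip the "<rule> <file>" header lines; keep "<offset>:<id>: <data>".
        if len(parts) == 3 and parts[0].startswith("0x") and parts[1].startswith("$"):
            counts[parts[1]] += 1
    return counts


counts = count_string_matches(SAMPLE_OUTPUT)
print(dict(counts))  # {'$a': 3}
```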
wxs@wxs-mbp yara % cat rules/sets.yara
rule a0 { condition: false }
rule a1 { condition: true }
rule b { condition: 1 of (a*) }
rule c { condition: 2 of (a*) }
rule d { condition: 50% of (a*) }
rule e { condition: 1 of (a1) }
rule f { condition: all of (a1, e) }
wxs@wxs-mbp yara %
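As a rough model of how I would expect these rules to evaluate, here is a Python sketch. It is not YARA's implementation. It assumes that a rule set such as `(a*)` only sees rules defined earlier in the file, and that a percentage is satisfied when matches / total * 100 >= percent. All function names are hypothetical.

```python
def rule_set(results, patterns):
    """Expand patterns such as "a*" or "a1" against the rules evaluated so far."""
    selected = []
    for pattern in patterns:
        if pattern.endswith("*"):
            selected += [name for name in results if name.startswith(pattern[:-1])]
        elif pattern in results:
            selected.append(pattern)
    return selected


def n_of(results, n, patterns):
    return sum(results[name] for name in rule_set(results, patterns)) >= n


def percent_of(results, percent, patterns):
    names = rule_set(results, patterns)
    hits = sum(results[name] for name in names)
    return bool(names) and hits * 100 >= percent * len(names)


def all_of(results, patterns):
    names = rule_set(results, patterns)
    return bool(names) and all(results[name] for name in names)


# Evaluate in file order so each rule only sees the ones before it.
results = {}
results["a0"] = False
results["a1"] = True
results["b"] = n_of(results, 1, ["a*"])
results["c"] = n_of(results, 2, ["a*"])
results["d"] = percent_of(results, 50, ["a*"])
results["e"] = n_of(results, 1, ["a1"])
results["f"] = all_of(results, ["a1", "e"])

print([name for name, matched in results.items() if matched])
# ['a1', 'b', 'd', 'e', 'f']
```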
This started with a tweet from Steve Miller (https://twitter.com/stvemillertime/status/1508441489923313664) in which he asked what is better for performance: 1 rule with 10k strings or 10k rules with 1 string each? Based upon my understanding of YARA I guessed it wouldn't matter for search time and the difference in bytecode evaluation would be in the noise. Effectively, I guessed you would not be able to tell the difference between the two.
Costin was the first to provide actual results and he claimed a 35 second vs 31 second difference between the two (https://twitter.com/craiu/status/1508445059129163783). That didn't make much sense to me so I asked for his rules so I could test them. He provided me with two rules files (10k.yara and 10kv2.yara) and a text file with a bunch of strings in it.
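For anyone who wants to try the comparison themselves, the two layouts can be generated with a short script. This is a sketch under stated assumptions and does not reproduce Costin's files: the strings are placeholders, the conditions (`any of them` and `$s`) are my guess at a reasonable setup, and the rule and function names are hypothetical.

```python
def escape(text: str) -> str:
    """Escape backslashes and double quotes for a YARA text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def one_rule_many_strings(strings) -> str:
    """Build a single rule that holds every string."""
    lines = ["rule many_strings {", "  strings:"]
    lines += [f'    $s{i} = "{escape(s)}"' for i, s in enumerate(strings)]
    lines += ["  condition:", "    any of them", "}"]
    return "\n".join(lines) + "\n"


def many_rules_one_string(strings) -> str:
    """Build one rule per string."""
    rules = [
        f'rule r{i} {{ strings: $s = "{escape(s)}" condition: $s }}'
        for i, s in enumerate(strings)
    ]
    return "\n".join(rules) + "\n"


# Placeholder strings; substitute a real string list here.
strings = [f"placeholder_string_{i}" for i in range(10_000)]
single = one_rule_many_strings(strings)
split = many_rules_one_string(strings)

print(single.count("rule "), single.count("$s"))  # 1 10000
print(split.count("rule "))  # 10000
```

Writing `single` and `split` to two files gives a pair of rule sets that can be timed against the same target.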
This is my attempt to replicate his findings and also document why he was seeing that warning. Because I wanted the run to take a bit of time, I ended up not using his text file with all the strings (it
One way to find PE files that start at offset 0 and have a single byte xor key:
rule single_byte_xor_pe_and_mz {
meta:
author = "Wesley Shields <wxs@atarininja.org>"
description = "Look for single byte xor of a PE starting at offset 0"
strings:
$b = "PE\x00\x00" xor(0x01-0xff)
condition: