


Wesley Shields wxsBSD

View gist:3e9452c3699bf68ff2e83a5d6a521801

Test rules:

wxs@wxs-mbp yara % cat rules/test.yara
rule b {
  strings:
    $a = "LSCOLORS"

Using YARA python interface to parse files

I've shared this technique with some people privately, but might as well share it publicly now since I was asked about it. I've been using this for a while now with good success. It works well for parsing .NET droppers and other things.

If you don't know what YARA's -D flag does, I suggest importing a module and running a file through YARA with that flag. It prints to stdout everything the module parsed that doesn't require calling a function. This is a great way to get a quick idea of a file's structure.

For example:

wxs@mbp yara % cat always_false.yara
View gist:2936585412fd57f039fd7ecd7b24cd1b
* fmtid + 24 == number of property identifiers and offsets
* fmtid + 28 == start of property identifier and offsets (4 bytes each)
rule test {
  strings:
    //$fmtid = { 02 d5 cd d5 9c 2e 1b 10 93 97 08 00 2b 2c f9 ae }
    $fmtid = { e0 85 9f f2 f9 4f 68 10 ab 91 08 00 2b 27 b3 d9 }
    $redacted_author = "REDACTED AUTHOR"
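The two comments above the rule describe where the property identifiers live relative to the FMTID match (the active `$fmtid` bytes are the on-disk form of the SummaryInformation FMTID, F29F85E0-4FF9-1068-AB91-08002B27B3D9). Here is a minimal Python sketch that reads those pairs once the FMTID has been located, assuming little-endian 4-byte fields exactly as the comments describe. The names `SUMMARY_INFO_FMTID` and `parse_property_ids` are hypothetical, and the sketch is not taken from the gist.

```python
import struct

# On-disk bytes of the SummaryInformation FMTID (the $fmtid string in the rule).
SUMMARY_INFO_FMTID = bytes.fromhex("e0859ff2f94f6810ab9108002b27b3d9")


def parse_property_ids(data: bytes, fmtid_offset: int):
    """Return (property id, offset) pairs using the layout noted in the rule comments.

    Assumes little-endian fields; raises struct.error if the buffer is too short.
    """
    # fmtid + 24: number of property identifier/offset pairs.
    (count,) = struct.unpack_from("<I", data, fmtid_offset + 24)
    pairs = []
    for n in range(count):
        # fmtid + 28: array of pairs, a 4-byte id followed by a 4-byte offset.
        pid, off = struct.unpack_from("<II", data, fmtid_offset + 28 + 8 * n)
        pairs.append((pid, off))
    return pairs
```

In practice you would find the FMTID first, for example with `offset = data.find(SUMMARY_INFO_FMTID)`, and only call `parse_property_ids` when `offset != -1`. A hostile file can claim an enormous count, so bounds-checking `count` before looping is worth adding for real use.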

I've been working on optimizing the YARA compiler to generate better bytecode for loops. The goal is to skip as much of a loop as possible by stopping iteration once the loop condition is met. Here's the rule I'm using. It's completely contrived and excessive, but it shows the performance improvement:

wxs@wxs-mbp yara % cat rules/test.yara
rule a {
  condition:
    for any i in (0..100000000): (i == 1)
}
wxs@wxs-mbp yara %

YARA Loop Optimization Details

Let's look at the bytecode without my optimizations. Before we do that, let's set some terminology, because I find names easier to work with than YARA VM memory locations. I've mostly borrowed these names from the comments in the grammar:

  • memory 0: lower bound
  • memory 1: boolean_expression accumulator
  • memory 2: iteration counter
  • memory 3: upper bound
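To make the early exit concrete, here is a small Python model of `for any i in (0..100000000): (i == 1)` that keeps the four values above in named slots. It is a conceptual sketch only, not YARA's VM or its actual bytecode; the function name `eval_for_any` and the `short_circuit` switch are made up for illustration.

```python
def eval_for_any(lower, upper, predicate, short_circuit=True):
    """Model `for any i in (lower..upper): predicate(i)`.

    Returns (result, iterations). The dict keys mirror the memory slots
    described above; this is an illustration, not YARA's VM.
    """
    mem = {
        "lower_bound": lower,   # memory 0
        "accumulator": 0,       # memory 1: count of iterations where the expression was true
        "iterations": 0,        # memory 2
        "upper_bound": upper,   # memory 3
    }
    i = mem["lower_bound"]
    while i <= mem["upper_bound"]:
        mem["iterations"] += 1
        if predicate(i):
            mem["accumulator"] += 1
            if short_circuit:
                # For "any", one true iteration already decides the result.
                break
        i += 1
    return mem["accumulator"] >= 1, mem["iterations"]


print(eval_for_any(0, 100000000, lambda i: i == 1))  # (True, 2)
```

With `short_circuit=False` the same call would walk all 100,000,001 values of the range before returning the same answer, which is the cost the optimization is meant to avoid.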

We'll be using this rule for the first example:

base64 and ascii
wxs@wxs-mbp yara % cat rules/test.yara
rule a {
  strings:
    // This program cannot VGhpcyBwcm9ncmFtIGNhbm5vdA==
    // AThis program cannot QVRoaXMgcHJvZ3JhbSBjYW5ub3Q=
    // AAThis program cannot QUFUaGlzIHByb2dyYW0gY2Fubm90
    $a = "This program cannot" base64 ascii

 // Custom alphabets are supported, but I have it commented out for now. ;)
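The comments in the rule show the full encoding of the string when it starts at offsets 0, 1, and 2 of a 3-byte base64 group. As I understand YARA's documentation, the `base64` modifier searches for those three encodings with the characters that depend on surrounding bytes trimmed off. The Python sketch below derives both the full encodings and the trimmed fragments with the standard `base64` module; the helper name `base64_search_strings` is hypothetical and this is not YARA's implementation.

```python
import base64


def base64_search_strings(s: bytes):
    """Return (full encoding, trimmed fragment) for s at offsets 0, 1 and 2."""
    # Leading encoded characters that mix in bits from the p filler bytes.
    lead_drop = {0: 0, 1: 2, 2: 3}
    results = []
    for p in range(3):
        # "A" matches the rule comments; the filler only affects trimmed characters.
        data = b"A" * p + s
        full = base64.b64encode(data)
        encoded = full.rstrip(b"=")
        if len(data) % 3 != 0:
            # The last character mixes in bits from whatever byte follows s.
            encoded = encoded[:-1]
        results.append((full, encoded[lead_drop[p]:]))
    return results


for full, fragment in base64_search_strings(b"This program cannot"):
    print(full.decode(), "->", fragment.decode())
```

The first column reproduces the three encodings in the rule comments; the second column is what remains stable no matter which bytes precede or follow the encoded string.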
wxsBSD / gist:4ec929a0eb07d8e3feeccc49e0d9aa2a
Last active Apr 29, 2022
Counting string matches in YARA with awk

Counting the number of times each string matches in YARA with awk...

wxs@wxs-mbp yara % cat rules/test.yara
rule a { strings: $a = "FreeBSD" nocase  $b = "usage: " condition: any of them }
wxs@wxs-mbp yara % ./yara -s rules/test.yara /bin/ls
a /bin/ls
0xb8e1:$a: FreeBSD
0xb9a1:$a: FreeBSD
0xb9f1:$a: FreeBSD
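The awk one-liner itself is cut off in this preview, so the following is not it. It is a Python sketch of the same counting, assuming the `yara -s` output format shown above: a rule line such as `a /bin/ls` followed by string lines of the form `offset:$identifier: data`. The function name `count_string_matches` is hypothetical.

```python
from collections import Counter


def count_string_matches(yara_output: str) -> Counter:
    """Count matches per (rule, string identifier) in `yara -s` output."""
    counts = Counter()
    rule = None
    for line in yara_output.splitlines():
        if line.startswith("0x"):
            # String match lines look like "0xb8e1:$a: FreeBSD".
            _offset, identifier, _data = line.split(":", 2)
            counts[(rule, identifier)] += 1
        elif line.strip():
            # Rule match lines look like "a /bin/ls"; the first field is the rule name.
            rule = line.split()[0]
    return counts


sample = """a /bin/ls
0xb8e1:$a: FreeBSD
0xb9a1:$a: FreeBSD
0xb9f1:$a: FreeBSD"""
print(count_string_matches(sample))  # Counter({('a', '$a'): 3})
```

Keying on the rule name as well as the identifier keeps counts separate when several rules reuse the same identifier, such as `$a`.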
wxsBSD /
Created Dec 2, 2021
Example of using rule sets to write higher order logic
wxs@wxs-mbp yara % cat rules/sets.yara
rule a0 { condition: false }
rule a1 { condition: true }
rule b { condition: 1 of (a*) }
rule c { condition: 2 of (a*) }
rule d { condition: 50% of (a*) }
rule e { condition: 1 of (a1) }
rule f { condition: all of (a1, e) }
wxs@wxs-mbp yara %
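Working through these by hand: `a0` never matches and `a1` always does, so `a*` has one match out of two rules. That makes `b` true, `c` false, and `d` true (1 of 2 is exactly 50%), while `e` and `f` are both true. The Python model below checks that arithmetic. It is a simplified sketch, not how YARA evaluates rule sets: the helper names are hypothetical, patterns are matched with `fnmatch`, percentages are treated as "at least N percent", and a wildcard only sees rules evaluated earlier, on the assumption that a rule can only reference rules declared before it.

```python
import fnmatch


def expand(patterns, results):
    # Expand rule-set patterns like "a*" against rules evaluated so far.
    names = []
    for pat in patterns:
        for name in results:
            if fnmatch.fnmatchcase(name, pat) and name not in names:
                names.append(name)
    return names


def n_of(n, patterns, results):
    return sum(results[x] for x in expand(patterns, results)) >= n


def percent_of(pct, patterns, results):
    names = expand(patterns, results)
    return 100 * sum(results[x] for x in names) >= pct * len(names)


def all_of(patterns, results):
    return all(results[x] for x in expand(patterns, results))


results = {}  # rule name -> matched, in declaration order
results["a0"] = False
results["a1"] = True
results["b"] = n_of(1, ["a*"], results)
results["c"] = n_of(2, ["a*"], results)
results["d"] = percent_of(50, ["a*"], results)
results["e"] = n_of(1, ["a1"], results)
results["f"] = all_of(["a1", "e"], results)

print([name for name, matched in results.items() if matched])  # ['a1', 'b', 'd', 'e', 'f']
```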
View gist:76dc97427252f2dda8e7c9f4870ebb5a

This started with a tweet from Steve Miller in which he asked what is better for performance: one rule with 10k strings, or 10k rules with one string each? Based on my understanding of YARA, I guessed it wouldn't matter for search time and that any difference in bytecode evaluation would be in the noise. In other words, I guessed you would not be able to tell the difference between the two.
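The two layouts in the question can be generated mechanically, which makes it easy to benchmark them against each other. This is a minimal Python sketch, not the files from the experiment: the function names and the placeholder strings are hypothetical, and the escaping only handles backslashes and double quotes.

```python
def escape(s: str) -> str:
    # Escape backslashes and double quotes for a YARA text string.
    return s.replace("\\", "\\\\").replace('"', '\\"')


def one_rule_many_strings(strings):
    # A single rule containing every string.
    lines = ["rule many_strings {", "  strings:"]
    for i, s in enumerate(strings):
        lines.append(f'    $s{i} = "{escape(s)}"')
    lines += ["  condition:", "    any of them", "}"]
    return "\n".join(lines) + "\n"


def many_rules_one_string(strings):
    # One single-line rule per string.
    lines = []
    for i, s in enumerate(strings):
        lines.append(f'rule r{i} {{ strings: $a = "{escape(s)}" condition: $a }}')
    return "\n".join(lines) + "\n"


# Hypothetical placeholder strings, not the strings used in the original test.
words = [f"placeholder{i:05d}" for i in range(10000)]
```

Write each result to its own `.yara` file and time `yara` with each one against the same target to compare the two layouts.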

Costin was the first to provide actual results, and he reported a 35-second vs. 31-second difference between the two. That didn't make much sense to me, so I asked for his rules so I could test them. He provided two rules files (10k.yara and 10kv2.yara) and a text file containing a bunch of strings.

This is my attempt to replicate his findings and also to explain the warning he was seeing. Because I wanted the run to take a bit of time, I ended up not using his text file with all the strings (it

wxsBSD /
Last active Jan 12, 2022
xor PE rules

One way to find PE files that start at offset 0 and are encoded with a single-byte XOR key:

rule single_byte_xor_pe_and_mz {
  meta:
    author = "Wesley Shields <>"
    description = "Look for single byte xor of a PE starting at offset 0"
  strings:
    $b = "PE\x00\x00" xor(0x01-0xff)
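The rule's condition is cut off in this preview, so the following is not its logic. It is a Python sketch of the same idea under stated assumptions: if the file is a PE XOR-encoded with one byte from 0x01 to 0xff, the first byte XOR `M` yields the only candidate key, and the decoded `e_lfanew` field at offset 0x3c should point at a decoded `PE\0\0` signature. The function name `find_single_byte_xor_pe` is hypothetical.

```python
import struct


def find_single_byte_xor_pe(data: bytes):
    """Return the XOR key (1-255) if data looks like a single-byte-XORed PE at offset 0."""
    if len(data) < 0x40:
        return None
    # An encoded "MZ" reveals the only possible key in its first byte.
    key = data[0] ^ ord("M")
    if key == 0 or data[1] ^ key != ord("Z"):
        return None  # key 0 means a plain PE, which xor(0x01-0xff) also excludes
    # e_lfanew: 4-byte little-endian offset of the PE signature, stored at 0x3c.
    (e_lfanew,) = struct.unpack("<I", bytes(b ^ key for b in data[0x3C:0x40]))
    sig = data[e_lfanew:e_lfanew + 4]
    if len(sig) == 4 and bytes(b ^ key for b in sig) == b"PE\x00\x00":
        return key
    return None


# Build a tiny fake header and encode it with key 0x5a.
header = bytearray(0x80)
header[0:2] = b"MZ"
header[0x3C:0x40] = struct.pack("<I", 0x40)
header[0x40:0x44] = b"PE\x00\x00"
encoded = bytes(b ^ 0x5A for b in header)
print(hex(find_single_byte_xor_pe(encoded)))  # 0x5a
```

Deriving the key from the first byte avoids trying all 255 keys, which is roughly the work the `xor(0x01-0xff)` modifier does when it searches for every encoding of the signature.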