Skip to content

Instantly share code, notes, and snippets.

View wxsBSD's full-sized avatar

Wesley Shields wxsBSD

View GitHub Profile

Using YARA python interface to parse files

I've shared this technique with some people privately, but might as well share it publicly now since I was asked about it. I've been using this for a while now with good success. It works well for parsing .NET droppers and other things.

If you don't know what the -D flag to YARA does I suggest you import a module and run a file through using that flag. It will print, to stdout, everything the module parsed that doesn't involve you calling a function. This is a great way to get a quick idea for the structure of a file.

For example:

wxs@mbp yara % cat always_false.yara
@wxsBSD
wxsBSD / gist:3e9452c3699bf68ff2e83a5d6a521801
Created September 29, 2021 02:23
french yara hits, no sorting

Test rules:

wxs@wxs-mbp yara % cat rules/test.yara
rule b {
  strings:
    $a = "LSCOLORS"
  condition:
    $a
}
/*
* fmtid + 24 == number of property identifiers and offsets
* fmtid + 28 == start of property identifier and offsets (4 bytes each)
*/
rule test {
strings:
//$fmtid = { 02 d5 cd d5 9c 2e 1b 10 93 97 08 00 2b 2c f9 ae }
$fmtid = { e0 85 9f f2 f9 4f 68 10 ab 91 08 00 2b 27 b3 d9 }
$redacted_author = "REDACTED AUTHOR"
condition:

I've been working on optimizing the YARA compiler to generate better bytecode for loops. The goal is to skip as much of loops as possible by not iterating further once the loop condition is met. Here's the rule I'm using. Completely contrived and excessive, but it's to show the performance improvement:

wxs@wxs-mbp yara % cat rules/test.yara
rule a {
  condition:
    for any i in (0..100000000): (i == 1)
}
wxs@wxs-mbp yara %

YARA Loop Optimization Details

Let's look at the bytecode without my optimizations. Before we do that let's set some terminology, because I find it easier to use names compared YARA VM memory locations. These are the names I've mostly borrowed from the comments in the grammar:

  • memory 0: lower bound
  • memory 1: boolean_expression accumulator
  • memory 2: iteration counter
  • memory 3: upper bound

We'll be using this rule for the first example:

wxs@wxs-mbp yara % cat rules/test.yara
rule a {
  strings:
    // This program cannot VGhpcyBwcm9ncmFtIGNhbm5vdA==
    // AThis program cannot QVRoaXMgcHJvZ3JhbSBjYW5ub3Q=
    // AAThis program cannot QUFUaGlzIHByb2dyYW0gY2Fubm90
    $a = "This program cannot" base64 ascii

 // Custom alphabets are supported, but I have it commented out for now. ;)
@wxsBSD
wxsBSD / gist:4ec929a0eb07d8e3feeccc49e0d9aa2a
Last active April 29, 2022 15:06
Counting string matches in YARA with awk

Counting number of times strings match in YARA with awk...

wxs@wxs-mbp yara % cat rules/test.yara
rule a { strings: $a = "FreeBSD" nocase  $b = "usage: " condition: any of them }
wxs@wxs-mbp yara % ./yara -s rules/test.yara /bin/ls
a /bin/ls
0xb8e1:$a: FreeBSD
0xb9a1:$a: FreeBSD
0xb9f1:$a: FreeBSD
@wxsBSD
wxsBSD / sets.md
Created December 2, 2021 02:30
Example of using rule sets to write higher order logic
wxs@wxs-mbp yara % cat rules/sets.yara
rule a0 { condition: false }
rule a1 { condition: true }
rule b { condition: 1 of (a*) }
rule c { condition: 2 of (a*) }
rule d { condition: 50% of (a*) }
rule e { condition: 1 of (a1) }
rule f { condition: all of (a1, e) }
wxs@wxs-mbp yara %
@wxsBSD
wxsBSD / gist:76dc97427252f2dda8e7c9f4870ebb5a
Last active March 29, 2022 02:15
y10k - YARA 10k test

This started with a tweet from Steve Miller (https://twitter.com/stvemillertime/status/1508441489923313664) in which he asked what is better for performance: 1 rule with 10k strings or 10k rules with 1 string each? Based upon my understanding of YARA I guessed it wouldn't matter for search time and the difference in bytecode evaluation would be in the noise. Effectively, I guessed you would not be able to tell the difference between the two.

Costin was the first to provide actual results and he claimed a 35 second vs 31 second difference between the two (https://twitter.com/craiu/status/1508445059129163783). That didn't make much sense to me so I asked for his rules so I could test them. He provided me with two rules files (10k.yara and 10kv2.yara) and a text file with a bunch of strings in it.

This is my attempt to replicate his findings and also document why he was getting the warning he was getting. Because I wanted the run to take a bit of time I ended up not using his text file with all the strings (it

@wxsBSD
wxsBSD / rules.md
Last active January 12, 2022 19:51
xor PE rules

One way to find PE files that start at offset 0 and have a single byte xor key:

rule single_byte_xor_pe_and_mz {
  meta:
    author = "Wesley Shields <wxs@atarininja.org>"
    description = "Look for single byte xor of a PE starting at offset 0"
  strings:
    $b = "PE\x00\x00" xor(0x01-0xff)
 condition: