Last active February 5, 2023 20:18

Using YARA python interface to parse files

I've shared this technique with some people privately, but might as well share it publicly now since I was asked about it. I've been using this for a while now with good success. It works well for parsing .NET droppers and other things.

If you don't know what the -D flag to YARA does, I suggest you import a module and run a file through YARA with that flag. It prints to stdout everything the module parsed that doesn't require you to call a function. This is a great way to get a quick idea of the structure of a file.

For example:

wxs@mbp yara % cat always_false.yara
import "dotnet"

rule a { condition: false }
wxs@mbp yara % ./yara -D always_false.yara ~/malware/random-dotnet/VirusShare_1f747682038223b20c44e809095d0eb3
        number_of_constants = 0
        typelib = "d1b75267-6610-4555-aaf7-86eef51e2179"
        number_of_user_strings = 279
                [0] = "P\x00r\x00o\x00p\x00e\x00r\x00t\x00y\x00 \x00c\x00a\x00n\x00 \x00o\x00n\x00l\x00y\x00 \x00b\x00e\x00 \x00s\x00e\x00t\x00 \x00t\x00o\x00 \x00N\x00o\x00t\x00h\x00i\x00n\x00g\x00\x00"
                [1] = "W\x00i\x00n\x00F\x00o\x00r\x00m\x00s\x00_\x00R\x00e\x00c\x00u\x00r\x00s\x00i\x00v\x00e\x00F\x00o\x00r\x00m\x00C\x00r\x00e\x00a\x00t\x00e\x00\x00"
                [2] = "W\x00i\x00n\x00F\x00o\x00r\x00m\x00s\x00_\x00S\x00e\x00e\x00I\x00n\x00n\x00e\x00r\x00E\x00x\x00c\x00e\x00p\x00t\x00i\x00o\x00n\x00\x00"
                [3] = "c\x00s\x00j\x00w\x00C\x00\x00"
                [4] = "k\x00u\x00q\x00e\x00a\x00\x00"
                [SNIP A WHOLE BUNCH OF THESE]
        number_of_modulerefs = UNDEFINED
                culture = UNDEFINED
                name = "JdEiesaTQvcyNCukodamxDcmns"
                        revision_number = 0
                        build_number = 0
                        minor = 0
                        major = 1
        number_of_assembly_refs = 7
                                major = 2
                                minor = 0
                                build_number = 0
                                revision_number = 0
                        public_key_or_token = "\xb7z\V\x194\xe0\x89"
                        name = "mscorlib"
                                major = 8
                                minor = 0
                                build_number = 0
                                revision_number = 0
                        public_key_or_token = "\xb0?_\x7f\x11\xd5\x0a:"
                        name = "Microsoft.VisualBasic"
                                major = 2
                                minor = 0
                                build_number = 0
                                revision_number = 0
                        public_key_or_token = "\xb7z\V\x194\xe0\x89"
                        name = "System.Windows.Forms"
                                major = 2
                                minor = 0
                                build_number = 0
                                revision_number = 0
                        public_key_or_token = "\xb7z\V\x194\xe0\x89"
                        name = "System"
                                major = 1
                                minor = 0
                                build_number = 0
                                revision_number = 0
                        public_key_or_token = UNDEFINED
                        name = "GeniusLibFull"
                                major = 2
                                minor = 0
                                build_number = 0
                                revision_number = 0
                        public_key_or_token = "\xb0?_\x7f\x11\xd5\x0a:"
                        name = "System.Drawing"
                                major = 1
                                minor = 9
                                build_number = 1
                                revision_number = 5
                        public_key_or_token = "\xed\xbeQ\xad\x94*?\"
                        name = "Ionic.Zip.Reduced"
        number_of_resources = 8
                        offset = 10816
                        length = 334848
                        name = "JdEiesaTQvcyNCukodamxDcmns.GeniusLibFull.dll"
                        offset = 345668
                        length = 110032
                        name = "JdEiesaTQvcyNCukodamxDcmns.installerbg.jpg.gzc"
                        offset = 455704
                        length = 1456
                        name = "JdEiesaTQvcyNCukodamxDcmns.Config.xml.gzc"
                        offset = 457164
                        length = 199688
                        name = "JdEiesaTQvcyNCukodamxDcmns.Ionic.Zip.Reduced.dll.gc"
                        offset = 656856
                        length = 3000
                        name = "JdEiesaTQvcyNCukodamxDcmns.insticon.ico.gzc"
                        offset = 659860
                        length = 25061
                        name = "jDxKcaragBiRaaKnaBfu.trpJcfvaqNqjsaAwhzqfgan.resources"
                        offset = 684925
                        length = 180
                        name = "DedpagGbaLxrsaNfqGju.uDqJcdubrcrctaDkBxnfpOo.resources"
                        offset = 685109
                        length = 180
                        name = "bqAeacnbfsy.CbpdzgraqcdixaujmRRfdMa.resources"
        number_of_guids = 1
                [0] = "9338c924-cab2-4fdd-ba0c-892bc48d91d1"
        number_of_streams = 5
                        name = "#~"
                        offset = 685408
                        size = 7288
                        name = "#Strings"
                        offset = 692696
                        size = 8316
                        name = "#US"
                        offset = 701012
                        size = 4148
                        name = "#GUID"
                        offset = 705160
                        size = 16
                        name = "#Blob"
                        offset = 705176
                        size = 2664
        module_name = "asjmizkoAhkzsheqsbngaab"
        version = "v2.0.50727"
wxs@mbp yara %

You may notice each of the dotnet resources has an offset and a length, which means carving them is super easy. Now we just need a way to get access to this parsed module state from Python. Don't worry, there's an easy way to do that too. Just use the nifty modules_callback functionality in the YARA python module: the callback you pass is handed a dictionary of what the module parsed.

import sys
import yara

def modules_callback(data):
    # Called with a dict of everything the imported module parsed.
    for i, resource in enumerate(data.get('resources', [])):
        offset = resource['offset']
        length = resource['length']
        with open('resource_%i' % i, 'wb') as f:
            print("Writing %i to %s" % (length, f.name))
            f.write(file_data[offset:offset + length])

    return yara.CALLBACK_CONTINUE

# Read the sample as bytes so the module's offsets index straight into it.
with open(sys.argv[1], 'rb') as f:
    file_data = f.read()

rules = yara.compile(source='import "dotnet" rule a { condition: false }')
rules.match(data=file_data, modules_callback=modules_callback)

And here it is being run:

wxs@mbp yara % python ./ ~/malware/random-dotnet/VirusShare_1f747682038223b20c44e809095d0eb3
Writing 334848 to resource_0
Writing 110032 to resource_1
Writing 1456 to resource_2
Writing 199688 to resource_3
Writing 3000 to resource_4
Writing 25061 to resource_5
Writing 180 to resource_6
Writing 180 to resource_7
wxs@mbp yara % file resource_*
resource_0: PE32 executable (DLL) (console) Intel 80386 Mono/.Net assembly, for MS Windows
resource_1: data
resource_2: data
resource_3: data
resource_4: data
resource_5: data
resource_6: data
resource_7: data
wxs@mbp yara %
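
The carved files above are named only by index. Since the -D output also shows a name for each resource, one possible variation is to reuse that name and to skip entries whose offset and length don't fit inside the file. The sketch below is only that, a sketch: safe_name, carve_resources and out_dir are hypothetical names, and it assumes each resource dict carries a 'name' key (as the -D output suggests) that may arrive as bytes.

```python
import os
import re


def safe_name(name, index):
    """Turn a resource name (bytes or str, possibly missing) into a safe file name."""
    if isinstance(name, bytes):
        name = name.decode("utf-8", errors="replace")
    if not name:
        return "resource_%i" % index
    # Replace anything that is not clearly harmless, including path separators.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", name)
    return "%i_%s" % (index, cleaned)


def carve_resources(file_data, resources, out_dir="."):
    """Write each resource that fits inside file_data; return the paths written."""
    written = []
    for i, resource in enumerate(resources):
        offset = resource.get("offset")
        length = resource.get("length")
        # Skip entries with missing fields or ranges that run past the end of the file.
        if not isinstance(offset, int) or not isinstance(length, int):
            continue
        if offset < 0 or length <= 0 or offset + length > len(file_data):
            continue
        path = os.path.join(out_dir, safe_name(resource.get("name"), i))
        with open(path, "wb") as f:
            f.write(file_data[offset:offset + length])
        written.append(path)
    return written
```

Inside the callback this would be called as carve_resources(file_data, data.get('resources', [])). Prefixing the index keeps names unique even if two resources sanitize to the same string.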

It's a silly example but it illustrates the point. The basic technique is simple but the applications are wide. It's about time I shared this technique more widely. Feel free to ping me at @wxs on twitter if you have questions!
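
As one more illustration, the same callback can be used to look at what a module exposes before deciding what to carve, roughly what -D does on the command line. This is a minimal sketch, not the author's code: dump is a hypothetical helper, and the layout it prints is only loosely modelled on the -D output. It assumes the callback data is made of nested dicts and lists.

```python
def dump(value, indent=0):
    """Return lines describing a nested dict/list structure, one field per line."""
    pad = "    " * indent
    lines = []
    if isinstance(value, dict):
        for key in sorted(value):
            item = value[key]
            if isinstance(item, (dict, list)):
                lines.append("%s%s" % (pad, key))
                lines.extend(dump(item, indent + 1))
            else:
                lines.append("%s%s = %r" % (pad, key, item))
    elif isinstance(value, list):
        for i, item in enumerate(value):
            if isinstance(item, (dict, list)):
                lines.append("%s[%i]" % (pad, i))
                lines.extend(dump(item, indent + 1))
            else:
                lines.append("%s[%i] = %r" % (pad, i, item))
    else:
        lines.append("%s%r" % (pad, value))
    return lines


def modules_callback(data):
    import yara  # imported here so dump() stays usable without yara-python
    print("\n".join(dump(data)))
    return yara.CALLBACK_CONTINUE
```

Passing this modules_callback to rules.match() in place of the carving one should print every field the callback receives, which is a quick way to find the keys worth using for a given module.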

-- WXS

I needed to read a bit more about that modules_callback functionality here but this is really useful, especially for .NET parsing. Thanks!