@wxsBSD
Last active February 5, 2023 20:18

Using the YARA Python interface to parse files

I've shared this technique with some people privately, but might as well share it publicly now since I was asked about it. I've been using this for a while now with good success. It works well for parsing .NET droppers and other things.

If you don't know what YARA's -D flag does, I suggest importing a module and running a file through YARA with that flag. It prints to stdout everything the module parsed that doesn't require you to call a function. This is a great way to get a quick idea of the structure of a file.

For example:

wxs@mbp yara % cat always_false.yara
import "dotnet"

rule a { condition: false }
wxs@mbp yara % ./yara -D always_false.yara ~/malware/random-dotnet/VirusShare_1f747682038223b20c44e809095d0eb3
dotnet
        number_of_constants = 0
        constants
        typelib = "d1b75267-6610-4555-aaf7-86eef51e2179"
        number_of_user_strings = 279
        user_strings
                [0] = "P\x00r\x00o\x00p\x00e\x00r\x00t\x00y\x00 \x00c\x00a\x00n\x00 \x00o\x00n\x00l\x00y\x00 \x00b\x00e\x00 \x00s\x00e\x00t\x00 \x00t\x00o\x00 \x00N\x00o\x00t\x00h\x00i\x00n\x00g\x00\x00"
                [1] = "W\x00i\x00n\x00F\x00o\x00r\x00m\x00s\x00_\x00R\x00e\x00c\x00u\x00r\x00s\x00i\x00v\x00e\x00F\x00o\x00r\x00m\x00C\x00r\x00e\x00a\x00t\x00e\x00\x00"
                [2] = "W\x00i\x00n\x00F\x00o\x00r\x00m\x00s\x00_\x00S\x00e\x00e\x00I\x00n\x00n\x00e\x00r\x00E\x00x\x00c\x00e\x00p\x00t\x00i\x00o\x00n\x00\x00"
                [3] = "c\x00s\x00j\x00w\x00C\x00\x00"
                [4] = "k\x00u\x00q\x00e\x00a\x00\x00"
                [SNIP A WHOLE BUNCH OF THESE]
        number_of_modulerefs = UNDEFINED
        modulerefs
        assembly
                culture = UNDEFINED
                name = "JdEiesaTQvcyNCukodamxDcmns"
                version
                        revision_number = 0
                        build_number = 0
                        minor = 0
                        major = 1
        number_of_assembly_refs = 7
        assembly_refs
                [0]
                        version
                                major = 2
                                minor = 0
                                build_number = 0
                                revision_number = 0
                        public_key_or_token = "\xb7z\V\x194\xe0\x89"
                        name = "mscorlib"
                [1]
                        version
                                major = 8
                                minor = 0
                                build_number = 0
                                revision_number = 0
                        public_key_or_token = "\xb0?_\x7f\x11\xd5\x0a:"
                        name = "Microsoft.VisualBasic"
                [2]
                        version
                                major = 2
                                minor = 0
                                build_number = 0
                                revision_number = 0
                        public_key_or_token = "\xb7z\V\x194\xe0\x89"
                        name = "System.Windows.Forms"
                [3]
                        version
                                major = 2
                                minor = 0
                                build_number = 0
                                revision_number = 0
                        public_key_or_token = "\xb7z\V\x194\xe0\x89"
                        name = "System"
                [4]
                        version
                                major = 1
                                minor = 0
                                build_number = 0
                                revision_number = 0
                        public_key_or_token = UNDEFINED
                        name = "GeniusLibFull"
                [5]
                        version
                                major = 2
                                minor = 0
                                build_number = 0
                                revision_number = 0
                        public_key_or_token = "\xb0?_\x7f\x11\xd5\x0a:"
                        name = "System.Drawing"
                [6]
                        version
                                major = 1
                                minor = 9
                                build_number = 1
                                revision_number = 5
                        public_key_or_token = "\xed\xbeQ\xad\x94*?\"
                        name = "Ionic.Zip.Reduced"
        number_of_resources = 8
        resources
                [0]
                        offset = 10816
                        length = 334848
                        name = "JdEiesaTQvcyNCukodamxDcmns.GeniusLibFull.dll"
                [1]
                        offset = 345668
                        length = 110032
                        name = "JdEiesaTQvcyNCukodamxDcmns.installerbg.jpg.gzc"
                [2]
                        offset = 455704
                        length = 1456
                        name = "JdEiesaTQvcyNCukodamxDcmns.Config.xml.gzc"
                [3]
                        offset = 457164
                        length = 199688
                        name = "JdEiesaTQvcyNCukodamxDcmns.Ionic.Zip.Reduced.dll.gc"
                [4]
                        offset = 656856
                        length = 3000
                        name = "JdEiesaTQvcyNCukodamxDcmns.insticon.ico.gzc"
                [5]
                        offset = 659860
                        length = 25061
                        name = "jDxKcaragBiRaaKnaBfu.trpJcfvaqNqjsaAwhzqfgan.resources"
                [6]
                        offset = 684925
                        length = 180
                        name = "DedpagGbaLxrsaNfqGju.uDqJcdubrcrctaDkBxnfpOo.resources"
                [7]
                        offset = 685109
                        length = 180
                        name = "bqAeacnbfsy.CbpdzgraqcdixaujmRRfdMa.resources"
        number_of_guids = 1
        guids
                [0] = "9338c924-cab2-4fdd-ba0c-892bc48d91d1"
        number_of_streams = 5
        streams
                [0]
                        name = "#~"
                        offset = 685408
                        size = 7288
                [1]
                        name = "#Strings"
                        offset = 692696
                        size = 8316
                [2]
                        name = "#US"
                        offset = 701012
                        size = 4148
                [3]
                        name = "#GUID"
                        offset = 705160
                        size = 16
                [4]
                        name = "#Blob"
                        offset = 705176
                        size = 2664
        module_name = "asjmizkoAhkzsheqsbngaab"
        version = "v2.0.50727"
wxs@mbp yara %
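
The indented dump above is a tree: structures contain named fields, arrays contain numbered entries, and the leaves are plain values. As a rough sketch of that shape in Python terms, the snippet below walks nested dicts and lists and prints them in a similar layout. The dump helper and the sample dict are hypothetical and only for illustration (the values are copied from the output above); this is not YARA's own printer and makes no attempt to match its string escaping.

```python
def dump(value, name, indent=0):
    """Return lines describing a nested dict/list in a layout similar to -D."""
    pad = "        " * indent  # eight spaces per level, as in the output above
    lines = []
    if isinstance(value, dict):
        # A structure: print its name, then each field one level deeper.
        lines.append(pad + name)
        for key, child in value.items():
            lines.extend(dump(child, key, indent + 1))
    elif isinstance(value, (list, tuple)):
        # An array: print its name, then each entry as [index].
        lines.append(pad + name)
        for i, child in enumerate(value):
            lines.extend(dump(child, "[%d]" % i, indent + 1))
    else:
        # A leaf value (integer, string, ...).
        lines.append("%s%s = %r" % (pad, name, value))
    return lines


# Hand-written sample mirroring two of the resources listed above.
sample = {
    "number_of_resources": 2,
    "resources": [
        {"offset": 10816, "length": 334848},
        {"offset": 345668, "length": 110032},
    ],
}

for line in dump(sample, "dotnet"):
    print(line)
```

Thinking of the module state as nested dicts and lists makes it obvious how to pick out one field, such as every resource's offset and length.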

You may notice that each of the dotnet resources has an offset and a length, which means carving them is super easy. Now we just need a way to get at this parsed module state from Python. Don't worry, there's an easy way to do that too: use the nifty modules_callback functionality in the YARA Python module.

import sys
import yara

def modules_callback(data):
    # data is a dict holding whatever the imported module parsed.
    for i, resource in enumerate(data.get('resources', [])):
        offset = resource['offset']
        length = resource['length']
        with open('resource_%i' % i, 'wb') as f:
            print("Writing %i to %s" % (length, f.name))
            f.write(file_data[offset:offset + length])

    return yara.CALLBACK_CONTINUE

# Read the sample as bytes so the slices above match the offsets YARA reports
# (text mode would fail or corrupt the data on Python 3).
with open(sys.argv[1], 'rb') as f:
    file_data = f.read()

rules = yara.compile(source='import "dotnet" rule a { condition: false }')
rules.match(data=file_data, modules_callback=modules_callback)

And here it is being run:

wxs@mbp yara % python ./test.py ~/malware/random-dotnet/VirusShare_1f747682038223b20c44e809095d0eb3
Writing 334848 to resource_0
Writing 110032 to resource_1
Writing 1456 to resource_2
Writing 199688 to resource_3
Writing 3000 to resource_4
Writing 25061 to resource_5
Writing 180 to resource_6
Writing 180 to resource_7
wxs@mbp yara % file resource_*
resource_0: PE32 executable (DLL) (console) Intel 80386 Mono/.Net assembly, for MS Windows
resource_1: data
resource_2: data
resource_3: data
resource_4: data
resource_5: data
resource_6: data
resource_7: data
wxs@mbp yara %
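
The script above trusts every offset and length and throws away the resource names. A slightly more defensive sketch is below. It assumes the callback dict uses the same keys as the -D output ('resources', 'offset', 'length', 'name'), that fields shown as UNDEFINED may simply be missing from the dict, and that names may arrive as bytes; I haven't verified those last two against every yara-python version. The helper names carve_resources and carve_file are hypothetical.

```python
import os


def carve_resources(file_data, module_data):
    """Return (name, bytes) pairs for each resource that fits inside file_data."""
    carved = []
    for i, resource in enumerate(module_data.get('resources', [])):
        offset = resource.get('offset')
        length = resource.get('length')
        # Assumption: fields YARA could not parse are absent from the dict.
        if offset is None or length is None:
            continue
        # Skip entries that point outside the file instead of writing junk.
        if offset < 0 or length < 0 or offset + length > len(file_data):
            continue
        name = resource.get('name', 'resource_%i' % i)
        if isinstance(name, bytes):
            name = name.decode('utf-8', 'replace')
        carved.append((name, file_data[offset:offset + length]))
    return carved


def carve_file(path):
    """Run the dotnet module over path and write each carved resource to disk."""
    import yara  # imported here so carve_resources works without yara installed

    with open(path, 'rb') as f:
        file_data = f.read()

    def modules_callback(data):
        for i, (name, blob) in enumerate(carve_resources(file_data, data)):
            # Names come from the sample itself, so keep only a base name.
            out = '%i_%s' % (i, os.path.basename(name.replace('\\', '/')))
            with open(out, 'wb') as f:
                f.write(blob)
            print("Writing %i to %s" % (len(blob), out))
        return yara.CALLBACK_CONTINUE

    rules = yara.compile(source='import "dotnet" rule a { condition: false }')
    rules.match(data=file_data, modules_callback=modules_callback)
```

Calling carve_file(sys.argv[1]) would stand in for the last few lines of the original script. Keeping the slicing in its own function means it can be exercised with a hand-built dict, without a sample or yara on hand.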

It's a silly example, but it illustrates the point. The basic technique is simple but the applications are wide. It's about time I shared this technique more widely. Feel free to ping me at @wxs on Twitter or via wxs@atarininja.org if you have questions!

-- WXS

@cmatthewbrooks

I needed to read a bit more about that modules_callback functionality here but this is really useful, especially for .NET parsing. Thanks!
