@wingdeans
Last active September 11, 2022 19:01
GSOC 2022 Submission - Rizin

Work Summary

rz-bindgen

Python usage documentation

Python bindings documentation

Python examples

Background

Rizin is a reverse-engineering CLI tool and library written in C. It includes binary format parsers, disassemblers, an IL, and more (Cutter is the officially supported GUI frontend). When reverse-engineering a binary, it's sometimes necessary to write small scripts (e.g. for deobfuscation). Rizin's default shell has some functionality for this, such as loops and macros. Rizin also has the rz-pipe plugin for programmatically calling Rizin commands from a variety of languages. Finally, because it is written in C, Rizin allows developers to link against it and call its functions directly (this is how Cutter works).

Proposal

Rz-pipe can only work with functions exposed to the Rizin shell, and only in the way that such functions are exposed. In practice, this means lots of string marshalling and JSON conversion. Although rz-pipe can do everything the Rizin shell can, it cannot match the full Rizin C API in performance, feature-completeness, or strictness (e.g. type guarantees). For GSoC, I was tasked with exposing this C API to scripting languages using a binding generator.
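
As a rough illustration of the marshalling overhead described above, the sketch below uses a hypothetical stand-in for a pipe call. The function `fake_pipe_cmd`, the command string, and the JSON payload are all made up for this example and are not real rz-pipe code or real Rizin output. It only shows that every value crosses the boundary as text and that its type is discovered at runtime.

```python
import json


def fake_pipe_cmd(command: str) -> str:
    """Hypothetical stand-in for a pipe call: the result always arrives as text."""
    # Made-up payload for illustration only; the command string is ignored.
    return '[{"name": "main", "offset": 4198400, "size": "86"}]'


# Pipe style: parse the text, then trust that the keys and value types
# are what the script expects. Nothing checks this ahead of time.
functions = json.loads(fake_pipe_cmd("hypothetical-list-functions"))
size_from_pipe = functions[0]["size"]

# The size came back as a string, so the script has to convert it itself.
size = int(size_from_pipe)
print(type(size_from_pipe).__name__, size)
```

With bindings to the C API, the equivalent field would be read from a struct with a declared type, so this parse-and-convert step would not be needed.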

See this post on the Rizin blog for more details on the thought process behind my proposal and my implementation ideas from before I started the task.

Work

To generate bindings for the headers, the contents of the headers must first be parsed. I considered my primary options to be tree-sitter and libclang, but I also considered clang libtooling and even hand-writing a C preprocessor and parser based on tinycc. Even though I wrote about tree-sitter in the Rizin GSoC announcement blog post, the integrated preprocessor and semantic analysis pushed me towards the clang-based tools, and I ended up choosing libclang's Python bindings.

Once a header is parsed, it is up to the bindgen user to specify how the unstructured C functions and structs should translate to the target language. In this snippet from rz-bindgen, the RzAnalysis struct from the rz_analysis.h header is bound as an object-oriented class, and functions with the prefix rz_analysis_ become its methods.

rz_analysis = Class(
    analysis_h,
    typedef="RzAnalysis",
    ignore_fields={"leaddrs"},
    rename_fields={"type_links": "_type_links"},
)

rz_analysis.add_method("rz_analysis_reflines_get", rename="get_reflines")
rz_analysis.add_prefixed_methods("rz_analysis_")
rz_analysis.add_prefixed_funcs("rz_analysis_")
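
The prefix-based mapping in the snippet above can be sketched in a few lines. This is a minimal illustration of the idea, not rz-bindgen's implementation: `bind_prefixed_methods` is a hypothetical helper, and every function name other than `rz_analysis_reflines_get` is made up.

```python
from typing import Dict, Iterable, Optional


def bind_prefixed_methods(
    func_names: Iterable[str],
    prefix: str,
    renames: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Map method names to the C functions that would back them."""
    renames = renames or {}
    methods: Dict[str, str] = {}
    for c_name in func_names:
        if not c_name.startswith(prefix):
            continue  # belongs to some other class
        # An explicit rename wins; otherwise strip the shared prefix.
        method = renames.get(c_name, c_name[len(prefix):])
        methods[method] = c_name
    return methods


header_funcs = [
    "rz_analysis_reflines_get",     # appears in the snippet above
    "rz_analysis_hypothetical_op",  # made-up name for illustration
    "rz_core_hypothetical_cmd",     # made-up; different prefix, so skipped
]
methods = bind_prefixed_methods(
    header_funcs,
    prefix="rz_analysis_",
    renames={"rz_analysis_reflines_get": "get_reflines"},
)
print(methods)
```

Here the explicit rename takes priority over prefix stripping, which mirrors the order of the `add_method` and `add_prefixed_methods` calls above, although how rz-bindgen actually resolves that overlap is an assumption.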

One of the major challenges in translating the C headers was the existence of generic types. Rizin uses types like RzList and RzVector to represent a linked list and a dynamic array respectively and, being written in C, uses void* for the type of the data contained within. This makes these types difficult to use from Python, as their elements lack the type information needed to generate methods. Thankfully, Rizin developers saw the need to annotate the element types for developer ergonomics, using comments such as RzList /*<RzAnalysisBlock *>*/ *bbs. I added further annotations over the span of several PRs.
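
To show how such a comment can carry the element type, here is a small sketch that extracts it from a declaration string with a regular expression. It is an assumption-laden illustration: rz-bindgen works from libclang's parse of the headers, and the `element_type` helper and the regex are invented for this example.

```python
import re
from typing import Optional

# Container name, optional whitespace, then a /*<...>*/ comment
# holding the element type.
ANNOTATION = re.compile(r"(?P<container>\w+)\s*/\*<(?P<element>[^>]*)>\*/")


def element_type(declaration: str) -> Optional[str]:
    """Return the annotated element type, or None if there is no annotation."""
    match = ANNOTATION.search(declaration)
    if match is None:
        return None
    return match.group("element").strip()


annotated = element_type("RzList /*<RzAnalysisBlock *>*/ *bbs")
unannotated = element_type("RzList *bbs")
print(annotated, unannotated)
```

Without the annotation there is nothing to recover, which is why unannotated declarations had to be fixed in the headers themselves.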

To make generating bindings for a variety of languages feasible, I chose SWIG as the primary target for rz-bindgen. Of the languages supported by SWIG, I chose to start with Python, due to its ease of use and widespread adoption in both commercial and open-source reverse-engineering tools.

The Python bindings make it easier to access Rizin internals when writing scripts, as can be seen in the examples. One key feature is the ability to register a Rizin command backed by a Python function, like so:

def print_function_info(fn: rizin.RzAnalysisFunction):
    print("name:", fn.name)
    print("number of xrefs from:", len(fn.get_xrefs_from()))
    print("number of xrefs to:", len(fn.get_xrefs_to()))
    return True
core.register_command("uf", print_function_info)

See the Python usage documentation for additional information.

In order to improve the developer experience, I also implemented a Python documentation backend for the functions exposed by rz-bindgen (output accessible here). Finally, I implemented Rizin and Cutter plugins to better integrate the bindings for interactive use.

Reflection

Overall, I'm quite happy with what I've accomplished. That said, I wasn't able to add a backend for another programming language as I had proposed, the plugins need more care, the examples are too basic, and there is no testing framework. I would like to address these in the future, but this is a passable start.
