@chinmaydd
Created January 21, 2018 17:21
Static Single Assignment for Decompilation (A summary)
Introduction
* Machine code decompilers have the ability to provide the key to software evolution: source code.
* The idea is to transform the program from one form to another. (This stems from the similarity between compilers and decompilers.)
* Compilation removes comprehension aids (comments and meaningful names for procedures and variables) from the program
[+] Source code
* Source code is so important to software development that at times, it becomes worthwhile to derive it from the executable form of computer programs.
* Source code defines the set of precise steps required to achieve the functionality of the program.
* Source and machine code are equivalent in that both convey how to perform the program's function.
* The task of a decompiler is to find one variant of the original source code (possibly at one of several levels of abstraction) that is semantically equivalent to the machine code, and therefore to the original source code.
[+] Forward and Reverse Engineering
* Decompilation is a form of reverse engineering.
[+] Applications
* Browsing parts of a program, providing the foundation for an automated tool, ability to generate compilable output.
- Decompiler as a browser
* Not all of the output code of a decompiler needs to be read/generated. It should act more as a code "browser".
* Core principles: Interoperability, Learning algorithms, Code checking.
- Automated tools
* Finding bugs, vulnerabilities, malware, program verification, comparison
- Recompilable output
* Optimize for platform, cross compilation
[+] State of the art
[+] Reverse Engineering Tool Problems
- Separating code from data
* Facilitated by data flow guided recursive traversal and the analysis of indirect jumps and calls (discover all possible paths in code).
* Part of the problem is identifying the boundaries of procedures.
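* Sketch (not from the original summary): recursive traversal follows control flow from an entry point and marks only reached addresses as code; everything else is assumed to be data. The toy instruction set, the PROGRAM table and the function name below are hypothetical. Indirect jumps and calls are not handled here, which is exactly where the data flow analysis mentioned above would be needed.

```python
# Toy "binary": address -> (mnemonic, operand).
# The instruction set is hypothetical and exists only for illustration.
PROGRAM = {
    0: ("cmp", None),
    1: ("jz", 4),      # conditional jump: may fall through or go to 4
    2: ("call", 6),    # call: target is code, execution resumes at 3
    3: ("ret", None),
    4: ("jmp", 2),     # unconditional jump: no fall-through
    5: ("db", 0x41),   # data byte sitting between procedures
    6: ("add", None),
    7: ("ret", None),
    8: ("db", 0x42),   # trailing data
}


def recursive_traversal(program, entry):
    """Return the set of addresses reachable as code from `entry`.

    Written with an explicit worklist instead of recursion.
    """
    code = set()
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        if addr in code or addr not in program:
            continue
        code.add(addr)
        op, target = program[addr]
        if op == "jmp":
            worklist.append(target)              # only the jump target
        elif op in ("jz", "call"):
            worklist.extend([target, addr + 1])  # target and fall-through
        elif op == "ret":
            pass                                 # path ends here
        else:
            worklist.append(addr + 1)            # ordinary fall-through
    return code


code = recursive_traversal(PROGRAM, 0)
data = sorted(set(PROGRAM) - code)
print("code:", sorted(code))
print("data:", data)
```

* In this sketch, addresses 5 and 8 are never reached from the entry point, so they are classified as data even though they sit between and after instructions.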
- Separating pointers from constants
* Solved using type analysis
- Separating original from offset pointers
* Requires range analysis (example used is that of negative indices)
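* Sketch (an assumption about how the negative-index case plays out, not taken from the original summary): if an array is indexed from a negative lower bound, a compiler may fold the bound into the pointer constant, so the constant seen in machine code points into the middle of the array. Given the range of the index, the original base can be recovered. The function name, the addresses and the index range below are hypothetical.

```python
def recover_array_base(offset_ptr, index_range, elem_size):
    """Recover the original array base from an offset pointer.

    offset_ptr  : constant that the machine code adds index * elem_size to
    index_range : (lo, hi) inclusive bounds of the index, as a range
                  analysis might report them (assumed to be given)
    elem_size   : size of one element in bytes
    """
    lo, hi = index_range
    if lo > hi:
        raise ValueError("empty index range")
    base = offset_ptr + lo * elem_size   # lowest address actually accessed
    length = hi - lo + 1                 # number of elements
    return base, length


# Hypothetical case: a 4-byte-element array stored at 0x2000 and indexed
# from -3 to 2. The machine code accesses mem[0x200C + 4 * i].
base, length = recover_array_base(0x200C, (-3, 2), 4)
print(hex(base), length)
```

* Without the index range, 0x200C looks like an ordinary pointer to some object; with it, the accessed addresses are seen to start at 0x2000 and cover 6 elements.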
- Tools Comparison
* The problems faced increase as the abstraction distance from source code to input code increases.
- Theoretical limits and approximation
* Compilers and decompilers face theoretical limits which can be avoided with conservative approximation.
* A compiler can always avoid the worst outcome of its theoretical limitations (incorrect output) by choosing conservative behavior.
* Sound result >> Precise result (in case of decompilation)
* Issue a comprehensive list of warnings about decisions which might later prove to be incorrect
[+] Goals
* Correctly, and not excessively, propagating expressions
* Correctly identifying parameters and returns
* Inferring types
* Correctly identifying indirect jumps and calls.
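* Sketch for the first goal (not from the original summary): in SSA form every variable has exactly one definition, so a definition can be substituted into its use without worrying about redefinition. To avoid excessive propagation, this sketch only inlines definitions that are used exactly once; that single-use rule is an assumed heuristic, and the statement encoding and names are hypothetical. Phi functions and memory operations are ignored.

```python
from collections import Counter

# Each statement is (dest, expr). An expr is a variable name (str),
# a constant (int), or a tuple (op, left, right). Names are already in
# SSA form: every dest is assigned exactly once.
SSA = [
    ("a1", ("+", "x0", 1)),
    ("b1", ("*", "a1", 2)),     # a1 has one use: inline it
    ("c1", ("+", "y0", "y0")),
    ("d1", ("-", "c1", "b1")),
    ("e1", ("*", "c1", "d1")),  # c1 has two uses: keep it as a temporary
]


def uses(expr):
    """Yield every variable name mentioned in expr."""
    if isinstance(expr, str):
        yield expr
    elif isinstance(expr, tuple):
        for sub in expr[1:]:
            yield from uses(sub)


def substitute(expr, env):
    """Replace variables in expr by their propagated definitions."""
    if isinstance(expr, str):
        return env.get(expr, expr)
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(substitute(e, env) for e in expr[1:])
    return expr


def propagate(stmts):
    """Inline single-use definitions; keep all other statements."""
    counts = Counter(v for _, e in stmts for v in uses(e))
    env, out = {}, []
    for dest, expr in stmts:
        expr = substitute(expr, env)
        if counts[dest] == 1:
            env[dest] = expr          # folded into its only use
        else:
            out.append((dest, expr))  # unused or multiply used: keep
    return out


result = propagate(SSA)
for dest, expr in result:
    print(dest, "=", expr)
```

* Here a1, b1 and d1 disappear into the expression for e1, while c1 survives because inlining it would duplicate the computation y0 + y0.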