@wmealing
Last active December 28, 2015 02:09
Complications.
=== Overview
Confirming the stated origin of a compiled binary is a complicated problem. Proving that a binary has been changed, whether by malicious intent or by accident, is a complex undertaking given the questions involved. For this reason, this document assumes that the reader has a solid understanding of the compilation process and an excellent understanding of the underlying hardware components.
== Existing work done in this area
=== Proof carrying code
Proof carrying code (PCC) is a set of mechanisms that run on a host system to verify the properties of compiled code against a formal proof derived from the original source code. The host system checks the validity of the proof by observing operations and comparing the conclusions to the formal proof, to determine whether a function is behaving within specification. This is useful for determining whether the code behaves as per the original formal proof, but it does not show that the policy or proof has not been redefined or modified for the resulting binary.
=== Problem Description
Software vendors provide binary files to end users, and in the case of open source they are obliged to provide the source code for inspection. This gives clients running the software a level of assurance that there are no backdoors or malicious code that could be used to subvert the system for purposes other than those intended.
To confirm whether the source was used to generate a binary, a set of rules must be established to define success. The simplest solution would be to accept that if the end product of the compile process (the binary) can be replicated, this ensures that the binary was generated from the provided source.
=== Practical acceptance
In this document we assume that the hardware is in a known-trusted state and that it will never incorrectly modify instructions or their results. Although we make that assumption here, compromised hardware is a very plausible attack vector that hardware vendors have yet to address.
=== Reverse analysis
How it works:
Take the compiled binary, disassemble it with objdump, compare the disassembly to the known source code, and ensure that it does not make any kind of access or function call that it should not.
* Requires knowledge of C.
* Requires knowledge of the target hardware's assembly.
* Requires knowledge of the calling conventions of each compiler in use.
This requires looking at every line of compiled assembly and determining how the instructions relate to the original source code.
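One part of this review, checking for function calls that should not be there, can be partly automated. The following is a minimal sketch, assuming x86 output from `objdump -d` in which direct calls appear as `call <address> <symbol>`. The function name `unexpected_calls`, the allow list, and the sample disassembly are all hypothetical and made up for illustration. Indirect calls carry no symbol and would still need to be reviewed by hand.

```python
import re

# Matches direct calls in `objdump -d` x86 output, for example:
#   401136:  e8 f5 fe ff ff   call   401030 <puts@plt>
# Indirect calls (call *%rax) carry no symbol and are not matched.
CALL_RE = re.compile(r"\bcallq?\s+[0-9a-f]+\s+<([^>+]+)(?:\+0x[0-9a-f]+)?>")


def unexpected_calls(disassembly, allowed):
    """Return called symbols not in the `allowed` set, in first-seen order."""
    found = []
    for line in disassembly.splitlines():
        match = CALL_RE.search(line)
        if match:
            symbol = match.group(1)
            if symbol not in allowed and symbol not in found:
                found.append(symbol)
    return found


# Hypothetical disassembly excerpt, made up for illustration.
SAMPLE = (
    "  401136:\te8 f5 fe ff ff       \tcall   401030 <puts@plt>\n"
    "  40113b:\te8 00 ff ff ff       \tcall   401040 <system@plt>\n"
)

print(unexpected_calls(SAMPLE, {"puts@plt"}))  # prints ['system@plt']
```

A check like this only narrows the search. It does not replace reading the assembly.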
=== Forward analysis
Using the provided source, rebuild the binary and compare the produced binary with the provided binary.
* Requires intimate knowledge of the build environment.
* Requires knowledge of assembly.
* Requires an understanding of the compiler.
There is a smaller space to analyse, so changes should be easier to spot. Build environment information needs to be shared before attempting this, or it is going to be a big waste of time. The build environment needs to be "known safe" and must not contain
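To narrow down where a rebuilt binary differs from the provided one, the two can be compared byte by byte. This is a minimal sketch that works on in-memory byte strings. The function name `differing_offsets` and the `limit` parameter are hypothetical, chosen for illustration.

```python
def differing_offsets(built, published, limit=16):
    """Return up to `limit` byte offsets at which the two binaries differ.

    If one input is longer than the other, the offset where the shorter
    one ends is reported too, since everything after it is unmatched.
    """
    offsets = []
    for offset, (a, b) in enumerate(zip(built, published)):
        if a != b:
            offsets.append(offset)
            if len(offsets) == limit:
                return offsets
    if len(built) != len(published):
        offsets.append(min(len(built), len(published)))
    return offsets


# Two made-up five-byte "binaries" that differ only in the last byte.
print(differing_offsets(b"\x7fELF\x01", b"\x7fELF\x02"))  # prints [4]
```

The reported offsets can then be mapped back to sections and symbols with a disassembler to see which part of the build produced the difference.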
=== Deterministic output of the compile process
Deterministic output makes it possible, in theory, to build binary packages from source packages that are bit-for-bit identical to the published binary packages.
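A bit-for-bit comparison is usually done by comparing cryptographic digests. This is a minimal sketch using SHA-256 from Python's standard `hashlib`. The helper names `sha256_of` and `is_reproduced` are hypothetical, and the in-memory streams stand in for files opened in binary mode.

```python
import hashlib
import io


def sha256_of(stream, chunk_size=65536):
    """Hash a binary stream in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(rebuilt_stream, published_stream):
    """True when both streams have the same SHA-256 digest."""
    return sha256_of(rebuilt_stream) == sha256_of(published_stream)


# In-memory stand-ins for open(path, "rb") handles; the contents are made up.
rebuilt = io.BytesIO(b"\x7fELF example")
published = io.BytesIO(b"\x7fELF example")
print(is_reproduced(rebuilt, published))  # prints True
```

A matching digest only shows that the two files are identical. It says nothing about whether the build environment that produced them can be trusted.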
=== Influencing factors:
This section discusses the factors that influence the compiled output and why each is important.
==== Compiler used
The compiler is the key component that turns source code into machine code. The machine code is a set of instructions that the CPU can execute directly. There are different methods of translating source into individual instructions, depending on the decisions made when writing the compiler.
To reduce errors, programmers separate repeatedly used instructions into functions to promote reuse. There are different methods of preparing the CPU for a function call; depending on the compiler's design, they can dramatically change the algorithm's underlying structure and the registers used (see http://en.wikipedia.org/wiki/X86_calling_conventions for examples).
==== Compiler version
Compilers are constantly being improved to take advantage of advances in hardware. Newer compilers can generate machine code that may run sub-optimally on older platforms but gives a significant performance gain on modern hardware.
Newer revisions of compilers may also take advantage of additional instructions provided by current-generation hardware. These instructions would fail when run on hardware that does not support them.
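Whether a given machine supports the instructions a binary was built for can be checked before running it. This is a minimal sketch, assuming a Linux x86 system where `/proc/cpuinfo` contains a `flags` line listing the supported extensions. Other architectures use different field names. The function name `missing_cpu_features` and the sample text are hypothetical.

```python
def missing_cpu_features(cpuinfo_text, required):
    """Return the required feature flags absent from the first `flags` line."""
    for line in cpuinfo_text.splitlines():
        name, _, value = line.partition(":")
        if name.strip() == "flags":
            return sorted(set(required) - set(value.split()))
    # No flags line found: treat every required feature as missing.
    return sorted(required)


# Made-up excerpt in the style of /proc/cpuinfo on an older x86 machine.
SAMPLE_CPUINFO = "processor\t: 0\nflags\t\t: fpu sse sse2 avx\n"

print(missing_cpu_features(SAMPLE_CPUINFO, {"sse2", "avx2"}))  # prints ['avx2']
```

On a real system the text would come from reading `/proc/cpuinfo`, and the required set would depend on the compiler version and flags used for the build.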
==== Compiler flags
==== Compile-time includes.
Files that include build strings.
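A common source of build strings in C code is the standard `__DATE__` and `__TIME__` macros, which expand to the form `Mmm dd yyyy` and `hh:mm:ss` and so change on every build. This is a minimal sketch that scans a binary blob for strings of that shape. The function name `find_build_stamps` and the sample bytes are hypothetical, and a match is only a hint, since other data can look like a date or time.

```python
import re

# __DATE__ expands to "Mmm dd yyyy" (the day is space-padded below 10);
# __TIME__ expands to "hh:mm:ss".
STAMP_RE = re.compile(
    rb"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) [ 0-9][0-9] [0-9]{4}"
    rb"|[0-2][0-9]:[0-5][0-9]:[0-5][0-9]"
)


def find_build_stamps(blob):
    """Return date-like and time-like ASCII strings found in `blob`, in order."""
    return [match.group(0).decode("ascii") for match in STAMP_RE.finditer(blob)]


# Made-up bytes standing in for the contents of a compiled binary.
SAMPLE_BLOB = b"\x00built Nov 12 2013 at 10:42:07\x00"

print(find_build_stamps(SAMPLE_BLOB))  # prints ['Nov 12 2013', '10:42:07']
```

Any stamp found this way marks a place where two otherwise identical builds would differ.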
==== Target hardware