Skip to content

Instantly share code, notes, and snippets.

@ilebedev
Created March 20, 2017 16:08
Show Gist options
  • Save ilebedev/4f42d8358bc9b884947056fa9bc2f83d to your computer and use it in GitHub Desktop.
Save ilebedev/4f42d8358bc9b884947056fa9bc2f83d to your computer and use it in GitHub Desktop.
A quick explainer of merkel trees
in a little more detail: The enclave loads something, and expects an honest reply. a
Assume the enclave has guarantees of privacy and integrity, and is not denied service.
Assume the OS is dishonest, but will respond to the Enclave's requests (but can lie).
The *OS* stores an array of `N` data blocks `X = {x_i for i in Z_N} = x_0, x_1, ... x_{N-1}`.
Over this array, the *OS* has computed a merkel tree `T` with N leaves, where the leaves are hashes of the elements of `X`.
Let `T_0` be the merkel *root*. We will define it shortly.
Let `(T_{leaf}_i)` be a merkel *leaf* corresponding to `x_i`. `T_{leaf}_i) = H(x_i)`, where `H` is a cryptographic hash function, like SHA3.
By their definition, cryptographic hash functions are one-way, meaning that given some value `k`, a computational adversary is unable to select a value `j` such that `H(j) = k` (has no better way to do this than to try all possible `j`, which is impractical). The forward computation, `H(j)` is cheap. This is why hashes are also called "trap door functions": forward is easy, the other way is essentially impossible.
Let's pack the merkel tree `T` into an array so we can precisely reason about its elements.
Define some integer functions `left(i) := 2i+1`, and `right(i) := 2i+2`.
Create an array `A` to store `T`. Store `T_0` in `A[0]`.
For any element `T_i` of the tree, store the left child of `T_i` in `A[left(i)]`, and the right child in `A[right(i)]`.
(This is a neat way of packing binary trees into an array - try it with some paper! I think 009 will make you do this for some lab).
Recall that we have defined all `(T_{leaf}_i)` to be hashes of the corresponding elements of `X`.
Now let's fill in the definitions of all other nodes of the tree, including the root.
Define for all `i` where `T_i` is *not* a leaf:
`T_i = H( T_{left}_i, T_{left}_i )
Equivalently,
A[i] = H( A[2i+1]+A[2i+1] ), where A[i] is not a leaf.
You can actually compute the merkel tree this way, if you go through A in reverse order!
Okay! So the OS stores `T` and `X`.
The enclave must store something too. Assume that the enclave will *only perform loads* over `X`, and wishes to assertain that the OS does not lie about `X`. To do this, the enclave must store `T_0`, the merkel root.
The enclave requests a block at index `i` (`x_i`).
The OS responds with the requested data `x_i`, as well as a proof that this data is consistent with the merkel root stored by the enclave (inputs to a hash computation that, one of them being x_i, together produce T_0):
`x_i` is sufficient for the enclave to compute `T_{leaf}_i = H(i)`.
Let `j` denote the *parent* index of `T_{leaf}_i`.
To compute `T_j`, the enclave needs `T_{left}_j` and `T_{right}_j`.
One of these is `T_{leaf}_i`, and is already accounted for.
The other must be sent by the OS along with `x_i` as part of the proof (the hash of the sibling of `T_{leaf}_i`).
But we're not done yet. `T_j` is probably not a root, so the OS must also send the hash of the sibling of `T_j`. Rinse and repeat until the all hashes needed by the enclave to compute `T_0` from `x_i` and the OS-provided hashes are enumerated. These together are the proof that `x_i` is consistent with `T_0`.
Assume without loss of generality that `x_i` is `x_7` the last element of an 8-element array (the rightmost leaf in the tree):
`T_0 = H( T_1 + T_2 )
= H( T_1 + H( T_5 + T_6 ) )
= H( T_1 + H( T_5 + H( T_13 + T_14 ) ) )`
`T_14 is H( x_7 )`.
The OS must send `x_7` along with a proof of consistency: {T_1, T_5, T_13}.
With these, the enclave can compute H( T_1 + H( T_5 + H( T_13 + H(x_7) ) ) ), which *must* equal `T_0`, which the enclave stores.
Now suppose the OS wishes to lie about `x_i`, and returs some other data `y_i` instead.
To present a proof showing that `y_i` is consistent with `T_0`, the OS must find some input or inputs `~I` that hash to `T_0`:
`T_0 = (y_i + ~I)`. By the definition of a cryptographic hash, the OS has no better way of doing so than to try all possible `~I` - a computationally infeasible attack.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment