Created
March 20, 2017 16:08
-
-
Save ilebedev/4f42d8358bc9b884947056fa9bc2f83d to your computer and use it in GitHub Desktop.
A quick explainer of merkel trees
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
in a little more detail: The enclave loads something, and expects an honest reply. a | |
Assume the enclave has guarantees of privacy and integrity, and is not denied service. | |
Assume the OS is dishonest, but will respond to the Enclave's requests (but can lie). | |
The *OS* stores an array of `N` data blocks `X = {x_i for i in Z_N} = x_0, x_1, ... x_{N-1}`. | |
Over this array, the *OS* has computed a merkel tree `T` with N leaves, where the leaves are hashes of the elements of `X`. | |
Let `T_0` be the merkel *root*. We will define it shortly. | |
Let `(T_{leaf}_i)` be a merkel *leaf* corresponding to `x_i`. `T_{leaf}_i) = H(x_i)`, where `H` is a cryptographic hash function, like SHA3. | |
By their definition, cryptographic hash functions are one-way, meaning that given some value `k`, a computational adversary is unable to select a value `j` such that `H(j) = k` (has no better way to do this than to try all possible `j`, which is impractical). The forward computation, `H(j)` is cheap. This is why hashes are also called "trap door functions": forward is easy, the other way is essentially impossible. | |
Let's pack the merkel tree `T` into an array so we can precisely reason about its elements. | |
Define some integer functions `left(i) := 2i+1`, and `right(i) := 2i+2`. | |
Create an array `A` to store `T`. Store `T_0` in `A[0]`. | |
For any element `T_i` of the tree, store the left child of `T_i` in `A[left(i)]`, and the right child in `A[right(i)]`. | |
(This is a neat way of packing binary trees into an array - try it with some paper! I think 009 will make you do this for some lab). | |
Recall that we have defined all `(T_{leaf}_i)` to be hashes of the corresponding elements of `X`. | |
Now let's fill in the definitions of all other nodes of the tree, including the root. | |
Define for all `i` where `T_i` is *not* a leaf: | |
`T_i = H( T_{left}_i, T_{left}_i ) | |
Equivalently, | |
A[i] = H( A[2i+1]+A[2i+1] ), where A[i] is not a leaf. | |
You can actually compute the merkel tree this way, if you go through A in reverse order! | |
Okay! So the OS stores `T` and `X`. | |
The enclave must store something too. Assume that the enclave will *only perform loads* over `X`, and wishes to assertain that the OS does not lie about `X`. To do this, the enclave must store `T_0`, the merkel root. | |
The enclave requests a block at index `i` (`x_i`). | |
The OS responds with the requested data `x_i`, as well as a proof that this data is consistent with the merkel root stored by the enclave (inputs to a hash computation that, one of them being x_i, together produce T_0): | |
`x_i` is sufficient for the enclave to compute `T_{leaf}_i = H(i)`. | |
Let `j` denote the *parent* index of `T_{leaf}_i`. | |
To compute `T_j`, the enclave needs `T_{left}_j` and `T_{right}_j`. | |
One of these is `T_{leaf}_i`, and is already accounted for. | |
The other must be sent by the OS along with `x_i` as part of the proof (the hash of the sibling of `T_{leaf}_i`). | |
But we're not done yet. `T_j` is probably not a root, so the OS must also send the hash of the sibling of `T_j`. Rinse and repeat until the all hashes needed by the enclave to compute `T_0` from `x_i` and the OS-provided hashes are enumerated. These together are the proof that `x_i` is consistent with `T_0`. | |
Assume without loss of generality that `x_i` is `x_7` the last element of an 8-element array (the rightmost leaf in the tree): | |
`T_0 = H( T_1 + T_2 ) | |
= H( T_1 + H( T_5 + T_6 ) ) | |
= H( T_1 + H( T_5 + H( T_13 + T_14 ) ) )` | |
`T_14 is H( x_7 )`. | |
The OS must send `x_7` along with a proof of consistency: {T_1, T_5, T_13}. | |
With these, the enclave can compute H( T_1 + H( T_5 + H( T_13 + H(x_7) ) ) ), which *must* equal `T_0`, which the enclave stores. | |
Now suppose the OS wishes to lie about `x_i`, and returs some other data `y_i` instead. | |
To present a proof showing that `y_i` is consistent with `T_0`, the OS must find some input or inputs `~I` that hash to `T_0`: | |
`T_0 = (y_i + ~I)`. By the definition of a cryptographic hash, the OS has no better way of doing so than to try all possible `~I` - a computationally infeasible attack. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment