@stoichammer
Last active February 9, 2020 12:54

Introducing the Transpose Merkle Tree

A Transpose Merkle Tree (TMT) is a data structure in which the interim nodes are transposed such that the walk from a leaf node to the root node contains the Merkle branch (proof). The leaf and root nodes remain in the same positions as in a standard Merkle tree.

This post includes:

  • a highly efficient algorithm to compute the TMT with only O(log n) space complexity, roughly 3 log n nodes held in primary memory.
  • a compact, query-ready data structure that can be persisted in a graph database (Neo4j) and can serve Merkle branches/paths trivially, with the DB engine traversing only O(log n) nodes.

The algorithm needs to store only the node-id, left-child-id, and right-child-id for each height of the Merkle tree, hence the 3 log n. The recursive algorithm shown below simultaneously constructs the intermediate nodes and transposes them appropriately, up to the final root node, using this simple structure. The time complexity of TMT construction is comparable to that of a standard Merkle tree.
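
For intuition, a quick sketch of what that bound means in practice (the function below is purely illustrative and not part of the node's codebase):

-- Illustrative only: peak number of hashes held in primary memory while
-- streaming n leaves, i.e. one MerkleNode of three hashes per tree level.
peakHashesInMemory :: Int -> Int
peakHashesInMemory n = 3 * treeHeight
  where
    treeHeight = length (takeWhile (< n) (iterate (* 2) 1))

-- e.g. peakHashesInMemory (2 ^ 20) == 60: a million-transaction block needs
-- only sixty hashes of working state.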

Haskell is a purely functional** programming language and embraces recursive functions; note the purely functional data structures used below.

-- A tree node: its own hash plus its (possibly transposed) left/right child hashes.
data MerkleNode =
    MerkleNode
        { node :: Maybe Hash256
        , leftChild :: Maybe Hash256
        , rightChild :: Maybe Hash256
        }
    deriving (Show, Eq, Ord)

-- 'HashCompute' pairs the per-level working state (at most one MerkleNode per
-- tree level, keyed by height) with the list of finished, transposed nodes.
type HashCompute = (M.Map Int8 MerkleNode, [MerkleNode])

-- 'pushHash' pushes one hash into the accumulator: 'nhash' is the incoming hash,
-- 'left'/'right' are its children (Nothing for leaves), 'ht' is the tree height,
-- 'ind' is the current level, and 'final' marks the last leaf of the block.
pushHash :: HashCompute -> Hash256 -> Maybe Hash256 -> Maybe Hash256 -> Int8 -> Int8 -> Bool -> HashCompute
pushHash (stateMap, res) nhash left right ht ind final =
    case node prev of
        Just pv ->
            pushHash
                ( (M.insert ind emptyMerkleNode stateMap)
                , (insertSpecial
                       (Just pv)
                       (left)
                       (right)
                       (insertSpecial (Just nhash) (leftChild prev) (rightChild prev) res)))
                (hashPair pv nhash)
                (Just pv)
                (Just nhash)
                ht
                (ind + 1)
                final
        Nothing ->
            if ht == ind
                then (updateState, (insertSpecial (Just nhash) left right res))
                else if final
                         then pushHash
                                  (updateState, (insertSpecial (Just nhash) left right res))
                                  (hashPair nhash nhash)
                                  (Just nhash)
                                  (Just nhash)
                                  ht
                                  (ind + 1)
                                  final
                         else (updateState, res)
  where
    insertSpecial sib lft rht lst = L.insert (MerkleNode sib lft rht) lst
    updateState = M.insert ind (MerkleNode (Just nhash) left right) stateMap
    prev =
        case M.lookupIndex (fromIntegral ind) stateMap of
            Just i -> snd $ M.elemAt i stateMap
            Nothing -> emptyMerkleNode
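
For completeness, here is a rough sketch of how pushHash might be folded over a block's transaction hashes; the node's actual driver may differ, and Hash256, hashPair and emptyMerkleNode come from the surrounding codebase. Each leaf enters at level 0, and the final flag is set only for the last leaf so that odd nodes get paired with themselves up to the root.

-- Sketch only: stream the leaf hashes of a block through pushHash.
buildTMT :: [Hash256] -> HashCompute
buildTMT leaves = L.foldl' step (M.empty, []) (zip leaves flags)
  where
    flags = replicate (length leaves - 1) False ++ [True]
    treeHeight = fromIntegral (length (takeWhile (< length leaves) (iterate (* 2) 1)))
    step acc (h, isLast) = pushHash acc h Nothing Nothing treeHeight 0 isLast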
            

Although Raspberry Pis are unsavoury in the BitcoinSV world, with this (and other) innovations it is now theoretically possible to ingest (with partial validation) a terabyte block on a low-memory Pi. The node leverages the combined strength of a distributed database and a graph database for its secondary storage.

More importantly, a low-resource server can trivially serve hundreds of Merkle proofs per minute. For retrievals, the asymptotic time complexity is O(log n); however, the real-world performance^^ is much closer to constant time (a simple Neo4j graph walk). This implies the Nexa node can dish out Merkle proofs for a 1 TB block just as easily as for a 1 MB block.

The cost of graph-DB insertion is reduced with a few techniques aimed at cutting down the number of nodes to match/associate.

  • Avoid the MERGE operation and use only MATCH.
  • Push/pop nodes in sub-trees for batched commits to the DB; this prevents calling commit for each node (a sketch of the idea follows this list).
  • Adhere to strictly sequential graph-DB updates to ensure there are no MATCH misses, deadlocks, or retries. The subsequent stages of block/tx indexing (on the distributed DB) can be highly parallel.
  • With streaming IO, the TMT is constructed in parallel while the block is still being received over the network, keeping memory usage tightly bounded.
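
As a rough illustration of the batching point above, completed nodes can be buffered and written out one transaction per batch rather than one per node. commitBatch below is a hypothetical hook, not an API from this codebase or from any Neo4j driver.

-- Sketch only: commit finished MerkleNodes to the graph DB in fixed-size
-- batches; 'commitBatch' is a hypothetical stand-in for the actual driver call.
persistInBatches :: ([MerkleNode] -> IO ()) -> Int -> [MerkleNode] -> IO ()
persistInBatches commitBatch batchSize nodes =
    mapM_ commitBatch (chunksOf batchSize nodes)
  where
    chunksOf _ [] = []
    chunksOf k xs = take k xs : chunksOf k (drop k xs)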

The Neo4j/Cypher query to fetch the Merkle branch/proof would be:

MATCH p=shortestPath((start:mnode {v: "24f60d44c7d5fe14be1baced468f534bfa35234168281fadbb1f5a769b104a56"})-[rels:PARENT*]->(end:mnode {v: "53ea7690bcdc27617f49333f500af10d721d21584a8fb4f1100d569b0b8d4ed9"}))
RETURN [mnode in nodes(p)]
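
Once the branch is returned, a client can verify it in the usual way. The following is a generic verification sketch, not the node's code; it assumes the path has been flattened into (sibling hash, sibling-is-on-the-left) pairs and that hashPair is the same pairing function used during construction.

-- Sketch only: recompute the root from a leaf and its Merkle branch.
verifyBranch :: Hash256 -> [(Hash256, Bool)] -> Hash256 -> Bool
verifyBranch leaf branch root = L.foldl' step leaf branch == root
  where
    step acc (sibling, siblingIsLeft)
        | siblingIsLeft = hashPair sibling acc
        | otherwise = hashPair acc sibling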

Below is a snapshot of the TMT at block height 600,000.

[Image s2: zoomed-in sub-tree]

[Image s3: full (almost) tree]


** https://wiki.haskell.org/Pure
^^ https://lemire.me/blog/2013/07/11/big-o-notation-and-real-world-performance/
