Notes on foundational modules of Dat code:

Hypercore

Feeds

totally ordered sets of blocks encoded as merkle trees

static: pure merkle trees, created by merkle-tree-stream

node types are leaf, node & root
flat-tree numbers all nodes in tree for given tree size
tree root = hash of all roots (unless $\log_2 |\mathrm{blocks}| \in \mathbb{N}$, i.e. the block count is a power of two and the tree already has a single root)
stream is not written in flat-tree numeric order, but as an upwards walk of the flat tree, emitting each node as soon as it completes (0, 2, 1, 4, 6, 5, 3, …); see the sketch below
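
A quick sketch of these numbering rules using the flat-tree module (assuming its documented API; the printed values follow from the scheme above):

```js
// flat-tree numbering: leaves at even indexes, parents at odd.
var flat = require('flat-tree')

console.log(flat.parent(0))    // 1 -- parent of leaves 0 and 2
console.log(flat.sibling(0))   // 2
console.log(flat.children(3))  // [1, 5]

// fullRoots takes the index just past the last leaf (2 * blockCount).
// 3 blocks -> two roots, so the tree root is hash(node 1, node 4):
console.log(flat.fullRoots(6)) // [1, 4]
// 4 blocks (a power of two) -> a single root:
console.log(flat.fullRoots(8)) // [3]
```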

live: an ed25519 keypair that signs static roots
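
A minimal sketch of what the keypair buys you, using sodium-signatures (an ed25519 wrapper from the same module ecosystem; whether hypercore uses this exact module, and the byte layout it actually signs, are assumptions to check against the source):

```js
var signatures = require('sodium-signatures')

var keys = signatures.keyPair()  // publicKey doubles as the feed id

// Pretend this is the current tree root (hash of all merkle roots):
var treeRoot = require('crypto').createHash('sha256')
  .update('roots go here').digest()

var sig = signatures.sign(treeRoot, keys.secretKey)
console.log(signatures.verify(treeRoot, sig, keys.publicKey)) // true

// Holding secretKey is the write capability; anyone with just
// publicKey can verify that appends were signed by the owner.
```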

hypercore feed implementation

all feeds are polymorphic under reading (static, live, and live with secretKey)

unavailable blocks can be fetched implicitly, or reads can opt in to erroring instead

static & live feeds are polymorphic for appending (static as a purely functional data structure, live via a write capability (secretKey))
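
From memory of the leveldb-era hypercore API these notes describe (names approximate, not authoritative), the read/append polymorphism looks roughly like:

```js
var hypercore = require('hypercore')
var level = require('level')

var core = hypercore(level('./hypercore.db'))

// No key given: generates a keypair, i.e. a live writable feed.
var feed = core.createFeed()

feed.append(['hello', 'world'], function (err) {
  if (err) throw err
  // get() looks the same for static, live, and live + secretKey
  // feeds; a locally missing block would be fetched from peers.
  feed.get(0, function (err, block) {
    if (err) throw err
    console.log(block.toString()) // 'hello'
  })
})

// Someone else's feed: pass their public key, no secretKey, so it
// is readable but not appendable.
// var theirs = core.createFeed(theirPublicKey)
```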

tree-index & bitfield store block<->hash correspondence, indexing merkle tree and tracking actual available blocks in core (feed may be locally sparse)
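
The bitfield side of this is just one bit per block index; a self-contained sketch (not hypercore's actual implementation) of how sparse availability can be tracked:

```js
function Bitfield (blocks) {
  this.buffer = Buffer.alloc(Math.ceil(blocks / 8))
}

Bitfield.prototype.set = function (i) {
  this.buffer[i >> 3] |= 128 >> (i & 7)
}

Bitfield.prototype.get = function (i) {
  return !!(this.buffer[i >> 3] & (128 >> (i & 7)))
}

// A sparse feed: we hold block 3 but not block 4.
var have = new Bitfield(16)
have.set(3)
console.log(have.get(3), have.get(4)) // true false
```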

storage delegated to core._data

random per-feed namespace (_prefix) keyed by block number
feed blocks are not automatically deduplicated (within or between feeds)
does hypercore-archiver deduplicate based on data block hash?
is replication the only deduplicating operation?
core._nodes (= protobuf encoded merkle nodes) is trusted, core._data is not (can opt in to verify but doesn’t do a full merkle proof)
why are node indexes also saved in the value? aren’t they available from leveldb keys?
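
What the opt-in verification amounts to, as I read it: hash the stored data block and compare it against the trusted hash in core._nodes; this checks the block against its node, but is not a full merkle proof up to a signed root. (Sketch only; the real hash construction is hypercore's own and is an assumption here.)

```js
var crypto = require('crypto')

// trustedNode comes from core._nodes (protobuf decoded): it carries
// the hash the merkle tree committed to for this block.
function verifyBlock (dataBlock, trustedNode) {
  var hash = crypto.createHash('sha256').update(dataBlock).digest()
  return hash.equals(trustedNode.hash)
}
```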

Core

a collection of feeds

block selection is mutable, and may be scheduled in and out

are unscheduled blocks evicted?

cores are unique, core.id = random

but this ID is not really used in user-facing ways

one db (= levelup-ish) per core contains many feeds

separated into sublevels

nodes - protobuf merkle tree nodes - index, size & hash
data - actual data blocks
feeds - keys etc
signatures - live feeds
bitfields - block availability
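
A sketch of how one levelup db can be carved into those sublevels, using subleveldown (whether hypercore uses this exact module, and the key layout shown, are assumptions):

```js
var level = require('level')
var sub = require('subleveldown')

var db = level('./core.db')

var nodes = sub(db, 'nodes')           // protobuf merkle nodes
var data = sub(db, 'data')             // actual data blocks
var feeds = sub(db, 'feeds')           // keys etc
var signatures = sub(db, 'signatures') // live feed signatures
var bitfields = sub(db, 'bitfields')   // block availability

// Per-feed random prefix + block number, per the notes above
// (hypothetical key layout):
data.put('a1b2c3!0', Buffer.from('hello'), function (err) {
  if (err) throw err
})
```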

Hyperdrive

Archives

An archive is a hypercore feed

live or static

feed contains all data (kinda like a git tree; see hyperdrive-encoding)

data is written to the feed with rabin chunking (not yet available in the browser)
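
A rough sketch of writing into an archive with the hyperdrive API of this era (method names from memory; treat as approximate):

```js
var hyperdrive = require('hyperdrive')
var level = require('level')

var drive = hyperdrive(level('./drive.db'))
var archive = drive.createArchive() // a hypercore feed underneath

// File contents get chunked (rabin chunking per the note above)
// and appended to the feed as blocks.
var ws = archive.createFileWriteStream('hello.txt')
ws.end('hello world')

archive.list(function (err, entries) {
  if (err) throw err
  console.log(entries.map(function (e) { return e.name }))
})
```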

Drive

collection of archives

implemented as a hypercore feed containing hypercore feeds

hyperdrive-ln vs. hyperdrive-named-archives?

Replication

swarms & replicators

swarms allow replicators to establish p2p links

discovery-swarm, webrtc-swarm, etc

peers establish pairwise streams between them

replicators talk to each other over node streams

replication = idempotent block exchange (for appendable live hypercores - total order assumed)
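
Putting the pieces together, roughly (discovery-swarm API plus the era's replicate(); exact names are from memory):

```js
var discovery = require('discovery-swarm')
var swarm = discovery()

// discoveryKey is derived from the public key, so peers can find
// each other without announcing the key itself.
swarm.join(archive.discoveryKey)

swarm.on('connection', function (connection) {
  // connection is a duplex node stream to the peer; piping the
  // replication stream both ways makes the exchange symmetric
  // and idempotent.
  var stream = archive.replicate()
  connection.pipe(stream).pipe(connection)
})
```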

Do swarm joins deduplicate {simple-,}peers?

can you even replicate multiple things on the same peer connection? (not data channel)
