@raulk
Last active June 14, 2021 23:23

CARv2 + DAG store implementation notes

Overview

The purpose of the CARv2 + DAG store endeavour is to eliminate overhead from the deal-making processes, with the mission of unlocking scalability, performance, and resource frugality on both the miner and client sides of the Filecoin network.

The definition of done is the removal of the Badger blockstore from all client and miner codepaths involved in storage and retrieval deals, and the transition towards interfacing directly with low-level data repository files known as CARs (Content ARchives), on both the read and the write sides.

Note that Badger is generally a fitting database for write-intensive and iterator-heavy workloads. It just doesn't withstand the usage pattern we subject it to once data volumes grow past the hundreds of gigabytes.

The following doc lays out concrete requirements and ideas to guide the implementation of the architecture introduced in CAR-native DAG store.

CAR-native DAG store requirements

We envision the DAG store to be a common interplanetary building block across IPFS and Filecoin. However, at this point we are focusing on solving Filecoin's needs, while building out the foundations of something reusable in IPFS.

The DAG store comprises three layers:

  1. Storage layer (manages shards)
  2. Index repository (manages indices)
  3. DAG access layer (manages queries)

Storage layer

  1. The DAG store holds shards of data. In Filecoin, shard = deal. Each shard is a repository of data capable of acting as a blockstore.
  2. Shards are identified by an opaque byte string: the shard key. In the case of Filecoin, the shard key is the PieceCID.
  3. Shards are CAR files, and can be located anywhere (e.g. local filesystem, NFS mount, Ceph/GlusterFS distributed filesystem, HTTP, etc.)
    • Local paths are currently prioritised over other forms. It is presumed that networked FS will be mounted to local paths.
    • The fetching of the CAR resource is done via a Getter abstraction that returns an io.ReadCloser.
  4. The DAG store is capable of operating with indexed (CARv2) and unindexed CARs (CARv1 and CARv2):
    • For unindexed forms, the DAG store calculates an index upon shard registration.
    • For indexed forms, the DAG store detaches the index and tracks it in its index repo.
  5. Shards are like data cartridges that come and go. A shard can be:
    1. Active: the system knows about this shard and is fully capable of serving data from it instantaneously, because the CAR is immediately accessible (e.g. it doesn't need to be fetched from a remote location, nor has it been inactivated), and an index exists (either embedded in the CAR or as an external artefact).
    2. Inactive: the system knows about this shard, but is not capable of serving data from it, either because the shard is still being initialized (being fetched from its location, or having its index calculated if it's a CARv1 or an unindexed CARv2), or because it is gone (e.g. the CAR has been deleted). However, if the shard is reactivated (the CAR is brought back in, such as via unsealing), data can be served immediately.
    3. Destroyed: the shard is permanently unavailable.
  6. A shard initially created with an indexed CARv2 (e.g. a new deal transfer) can be deactivated (e.g. if the unsealed copy is deleted) and later reactivated with a CARv1 equivalent (unsealed from a sealed sector), without index recreation.
    • This is possible because the original index is stored in the index repo and can be joined with the CARv1 copy.
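
The shard lifecycle described in this list can be sketched as a small state machine. This is a minimal illustration only: the type and method names (ShardState, Shard, Activate, Deactivate, Destroy) are hypothetical and not taken from any existing implementation, and the index itself is reduced to a boolean flag. The point it shows is that an index, once tracked, survives deactivation, so reactivating with a plain CARv1 does not trigger a recomputation.

```go
package main

import (
	"errors"
	"fmt"
)

// ShardState mirrors the three states listed above (names are illustrative).
type ShardState int

const (
	Inactive  ShardState = iota // known, but the CAR is not currently servable
	Active                      // CAR is accessible and an index is available
	Destroyed                   // permanently gone
)

func (s ShardState) String() string {
	return [...]string{"inactive", "active", "destroyed"}[s]
}

// Shard is a hypothetical record tracked by the storage layer.
type Shard struct {
	Key      string // opaque shard key (the PieceCID in Filecoin)
	State    ShardState
	HasIndex bool // true once an index is tracked in the index repo
}

var ErrDestroyed = errors.New("shard has been destroyed")

// Activate marks the shard as servable. carHasIndex says whether the supplied
// CAR embeds an index (indexed CARv2). The returned flag reports whether an
// index would have to be computed: only when none is tracked yet and the CAR
// does not carry one.
func (s *Shard) Activate(carHasIndex bool) (computed bool, err error) {
	if s.State == Destroyed {
		return false, ErrDestroyed
	}
	if !s.HasIndex {
		computed = !carHasIndex // unindexed CAR: index must be calculated
		s.HasIndex = true       // detached from the CAR, or freshly computed
	}
	s.State = Active
	return computed, nil
}

// Deactivate records that the CAR is gone; the tracked index is kept.
func (s *Shard) Deactivate() error {
	if s.State == Destroyed {
		return ErrDestroyed
	}
	s.State = Inactive
	return nil
}

// Destroy removes the shard for good, including its index.
func (s *Shard) Destroy() {
	s.State = Destroyed
	s.HasIndex = false
}

func main() {
	s := &Shard{Key: "example-piece-cid"}
	computed, _ := s.Activate(true) // new deal transfer: indexed CARv2
	fmt.Println(s.State, "index computed:", computed)
	_ = s.Deactivate()              // unsealed copy deleted
	computed, _ = s.Activate(false) // reactivated with an unsealed CARv1
	fmt.Println(s.State, "index computed:", computed)
}
```

In this sketch a shard that is still being initialized and one whose CAR has been deleted are both represented by the single Inactive state, matching the description above.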

Interaction with sealed sectors

  1. The DAG store does not manage unsealing. When a piece has to be unsealed to serve a deal, the unsealer must reactivate the shard with the unsealed CARv1, which can be supplied as a slice (offset + length) of an unsealed sector. The DAG store will then recover the index from the index repo.
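
As a rough sketch of how an unsealer might hand over such a slice, the example below wraps a byte range of an unsealed sector in an io.ReadCloser, the return type mentioned for the Getter abstraction. The Getter interface shape, the SectorSlice type, and the helper functions are assumptions made for illustration; only io.NewSectionReader, io.NopCloser, and io.ReadAll are real standard library calls. An in-memory string stands in for the sector file.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Getter is a hypothetical shape for the abstraction that fetches a shard's CAR.
type Getter interface {
	Get() (io.ReadCloser, error)
}

// SectorSlice exposes the byte range [Offset, Offset+Length) of an unsealed
// sector as a CARv1 stream.
type SectorSlice struct {
	Sector io.ReaderAt
	Offset int64
	Length int64
}

// Get returns a reader limited to the piece's bytes within the sector.
func (s SectorSlice) Get() (io.ReadCloser, error) {
	if s.Offset < 0 || s.Length < 0 {
		return nil, fmt.Errorf("invalid slice: offset=%d length=%d", s.Offset, s.Length)
	}
	return io.NopCloser(io.NewSectionReader(s.Sector, s.Offset, s.Length)), nil
}

// ReadPiece drains a Getter; a real DAG store would stream rather than buffer.
func ReadPiece(g Getter) (string, error) {
	rc, err := g.Get()
	if err != nil {
		return "", err
	}
	defer rc.Close()
	b, err := io.ReadAll(rc)
	return string(b), err
}

// demoSector stands in for an unsealed sector file: 4 bytes of unrelated
// data, an 11-byte "piece", then more unrelated data.
func demoSector() io.ReaderAt {
	return strings.NewReader("....CARv1-bytes....")
}

func main() {
	g := SectorSlice{Sector: demoSector(), Offset: 4, Length: 11}
	piece, err := ReadPiece(g)
	fmt.Println(piece, err)
}
```

Because the slice is addressed relative to the sector, the index offsets stored in the index repo stay valid as long as they are relative to the start of the CARv1 payload.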

Index repository

  1. Stores different types of indices, always associated with a shard.
    • full shard indices: protected, not writable externally, managed entirely by the storage layer. { CID: offset } mappings.
    • semantic indices: supplied externally (by markets/indexing process). CID manifests.
    • other indices in the future?
  2. Ultimately serves network indexer requests. These requests come in through the markets/indexing process, and arrive here.
  3. In the future, will manage the cross-shard top-level index, and thus will serve as a shard routing mechanism.
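
One possible shape for the index repository is sketched below, assuming CIDs and shard keys are represented as plain strings and everything lives in memory. All names (IndexRepo, PutExternal, putFull, and so on) are hypothetical. The sketch shows the distinction drawn above: full shard indices can only be written through an internal path, while semantic indices are accepted from external callers.

```go
package main

import (
	"errors"
	"fmt"
)

// IndexKind distinguishes the index types listed above.
type IndexKind int

const (
	FullIndex     IndexKind = iota // { CID: offset } mappings, owned by the storage layer
	SemanticIndex                  // CID manifest supplied by markets/indexing
)

var ErrProtected = errors.New("full shard indices are not writable externally")

// IndexRepo is a hypothetical in-memory index repository keyed by shard key.
type IndexRepo struct {
	full     map[string]map[string]uint64 // shard key -> CID -> offset
	semantic map[string][]string          // shard key -> CID manifest
}

func NewIndexRepo() *IndexRepo {
	return &IndexRepo{
		full:     map[string]map[string]uint64{},
		semantic: map[string][]string{},
	}
}

// PutExternal is the write path for outside callers; it rejects full indices.
func (r *IndexRepo) PutExternal(shardKey string, kind IndexKind, manifest []string) error {
	if kind == FullIndex {
		return ErrProtected
	}
	r.semantic[shardKey] = append([]string(nil), manifest...)
	return nil
}

// putFull is unexported: only the storage layer would call it.
func (r *IndexRepo) putFull(shardKey string, offsets map[string]uint64) {
	cp := make(map[string]uint64, len(offsets))
	for cid, off := range offsets {
		cp[cid] = off
	}
	r.full[shardKey] = cp
}

// Offset looks up a block's offset within a shard's CARv1 payload.
func (r *IndexRepo) Offset(shardKey, cid string) (uint64, bool) {
	off, ok := r.full[shardKey][cid]
	return off, ok
}

// Manifest returns the semantic index registered for a shard, if any.
func (r *IndexRepo) Manifest(shardKey string) []string {
	return r.semantic[shardKey]
}

func main() {
	repo := NewIndexRepo()
	repo.putFull("piece-1", map[string]uint64{"cid-a": 0, "cid-b": 120})
	fmt.Println(repo.PutExternal("piece-1", FullIndex, nil))
	fmt.Println(repo.PutExternal("piece-1", SemanticIndex, []string{"cid-a"}))
	fmt.Println(repo.Offset("piece-1", "cid-b"))
}
```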

DAG access layer

  1. Responsible for serving DAGs or sub-DAGs from one or many shards. Initially, queries will require the shard key.
  2. In the future, once the cross-shard top-level index has been implemented, the DAG store will be capable of resolving the shard key for a given root CID.
  3. ...
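
The two query modes can be illustrated with a toy access layer. This is only a sketch under simplifying assumptions: shards are in-memory maps from CID strings to block bytes, a query returns a single block rather than a DAG or sub-DAG, and the names (AccessLayer, Get, GetByRoot) are invented for the example. Get corresponds to the initial mode where the caller supplies the shard key; GetByRoot hints at what a cross-shard top-level index would allow.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	ErrUnknownShard = errors.New("unknown or inactive shard")
	ErrNotFound     = errors.New("block not found in shard")
	ErrNoRoute      = errors.New("no shard known for root CID")
)

// Shard is a stand-in for an active shard acting as a blockstore.
type Shard map[string][]byte

// AccessLayer routes queries to shards.
type AccessLayer struct {
	shards map[string]Shard  // shard key -> shard
	roots  map[string]string // root CID -> shard key (future top-level index)
}

func NewAccessLayer() *AccessLayer {
	return &AccessLayer{shards: map[string]Shard{}, roots: map[string]string{}}
}

// AddShard registers a shard and records its root in the top-level index.
func (a *AccessLayer) AddShard(key, root string, s Shard) {
	a.shards[key] = s
	a.roots[root] = key
}

// Get serves a block when the caller already knows the shard key.
func (a *AccessLayer) Get(shardKey, cid string) ([]byte, error) {
	s, ok := a.shards[shardKey]
	if !ok {
		return nil, ErrUnknownShard
	}
	b, ok := s[cid]
	if !ok {
		return nil, ErrNotFound
	}
	return b, nil
}

// GetByRoot first resolves the shard key from the root CID, then serves it.
func (a *AccessLayer) GetByRoot(root string) ([]byte, error) {
	key, ok := a.roots[root]
	if !ok {
		return nil, ErrNoRoute
	}
	return a.Get(key, root)
}

func main() {
	al := NewAccessLayer()
	al.AddShard("piece-1", "root-1", Shard{"root-1": []byte("root block")})
	b, err := al.Get("piece-1", "root-1")
	fmt.Println(string(b), err)
	b, err = al.GetByRoot("root-1")
	fmt.Println(string(b), err)
}
```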

Requirements for CARv2

  • Index needs to be detachable.
  • Index offsets need to be relative to the CARv1, and not absolute in the physical CARv2 file.
  • Index needs to be iterable.
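
To make these three requirements concrete, here is a minimal sketch of an index that satisfies them. It is not the CARv2 index format: the Entry and Index types, string CIDs, and the data offset value used in the example are all assumptions for illustration. The index is a standalone value (detachable), stores offsets relative to the CARv1 payload (so one index serves both a CARv2 container and a bare CARv1), and exposes ForEach (iterable).

```go
package main

import (
	"fmt"
	"sort"
)

// Entry maps a CID to an offset relative to the start of the CARv1 payload.
type Entry struct {
	CID    string
	Offset uint64
}

// Index is a standalone value, so it can be stored apart from the CAR file.
type Index struct {
	entries []Entry // sorted by CID
}

// NewIndex copies and sorts the entries so lookups can use binary search.
func NewIndex(entries []Entry) *Index {
	cp := append([]Entry(nil), entries...)
	sort.Slice(cp, func(i, j int) bool { return cp[i].CID < cp[j].CID })
	return &Index{entries: cp}
}

// Lookup returns the CARv1-relative offset for a CID.
func (ix *Index) Lookup(cid string) (uint64, bool) {
	i := sort.Search(len(ix.entries), func(i int) bool { return ix.entries[i].CID >= cid })
	if i < len(ix.entries) && ix.entries[i].CID == cid {
		return ix.entries[i].Offset, true
	}
	return 0, false
}

// ForEach visits every entry in CID order, stopping at the first error.
func (ix *Index) ForEach(fn func(Entry) error) error {
	for _, e := range ix.entries {
		if err := fn(e); err != nil {
			return err
		}
	}
	return nil
}

// AbsoluteOffset converts a relative offset into a position in the physical
// file. dataOffset is where the CARv1 payload starts: 0 for a bare CARv1,
// non-zero inside a CARv2 container.
func AbsoluteOffset(rel, dataOffset uint64) uint64 {
	return dataOffset + rel
}

func main() {
	ix := NewIndex([]Entry{{CID: "cid-b", Offset: 40}, {CID: "cid-a", Offset: 0}})
	off, _ := ix.Lookup("cid-b")
	fmt.Println("in bare CARv1:", AbsoluteOffset(off, 0))
	fmt.Println("in CARv2 (illustrative data offset 100):", AbsoluteOffset(off, 100))
	_ = ix.ForEach(func(e Entry) error {
		fmt.Println(e.CID, e.Offset)
		return nil
	})
}
```

Keeping offsets relative is what lets a shard created from a CARv2 be reactivated later with an unsealed CARv1 without rebuilding the index.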