@rvagg
Created April 24, 2023 23:45

Lassie & IPLD

https://github.com/filecoin-project/lassie

Lassie what?

Lassie is:

  • A universal retrieval client for IPFS & Filecoin.

  • An IPFS implementation that doesn't store or publish data

  • IPLD native: deals only in IPLD blocks, delivered in CAR format

  • Written in Go and offers

    • a command-line tool (lassie fetch)
    • a minimal IPFS gateway-like HTTP API
    • a Go library interface
  • A work in progress

Lassie why?

  • Started as a Filecoin retrieval tool, then expanded protocol support to cover:

    • Graphsync (Filecoin)
    • Bitswap (IPFS, Filecoin, others, including Elastic-IPFS)
    • Verified HTTP CAR (work in progress)
  • Back-end support for the Saturn CDN network to fetch IPFS & Filecoin content

  • Very lightweight content retrieval

    • No config files
    • No local IPFS node
    • No persistent storage
    • Very fast startup time
    • Just-enough functionality to get IPLD data

Lassie how?

  • Integrates with the IPFS & Filecoin network indexer service (cid.contact)

  • A "fetch" of a CID queries the indexer and:

    • Finds Filecoin Storage Providers that have it
    • Queries the IPFS DHT to find nodes that have it
  • Begins Graphsync and/or Bitswap sessions to retrieve the data from the peer candidates (HTTP too, soon)

  • Collects a graph from the requested (root) CID, depending on the request:

    • Is there a path? Qmfoobar/path/to/thing
    • Fetch the entire DAG under the root / path?
    • Fetch just the single block under the root / path?
    • Fetch just the UnixFS "entity" under the root / path?
  • Returns content in verifiable CAR format

  • Perfect partner for github.com/ipld/go-car:

    • lassie fetch -o - Qmfoobar/cats.mp4 | car extract - | ffplay -

DAG Forms

  • CID + optional Path:

    • Qmfoobar/path/to/thing
    • Start at CID, walk the path to the thing according to IPLD pathing rules
    • BUT default to UnixFS pathing semantics where possible
  • Single block fetch: just give me the block at the terminus of Qmfoobar/path/to/thing

  • Entire DAG fetch: give me the entire DAG under Qmfoobar/path/to/thing

  • UnixFS entity fetch: give me the UnixFS entity under Qmfoobar/path/to/thing

    • Is it a sharded file? Give me all the blocks for the file
    • Is it a sharded directory? Give me all the blocks for the directory but not the leaves
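
The DAG forms above can be modelled as a small request type. This is a minimal sketch, not Lassie's actual API: the names Scope, Request and ParseRequest are hypothetical, and the root CID is kept as an opaque string rather than parsed.

```go
package main

import (
	"fmt"
	"strings"
)

// Scope is a hypothetical name for the three DAG forms described above.
type Scope int

const (
	ScopeAll    Scope = iota // entire DAG under the path terminus
	ScopeEntity              // the UnixFS entity (whole file, or shallow directory)
	ScopeBlock               // only the single block at the path terminus
)

// Request is an illustrative type, not Lassie's real request structure.
type Request struct {
	Root  string   // root CID, treated as an opaque string in this sketch
	Path  []string // path segments to walk from the root
	Scope Scope
}

// ParseRequest splits "CID/path/to/thing" into a root and its path segments.
func ParseRequest(spec string, scope Scope) (Request, error) {
	parts := strings.Split(strings.Trim(spec, "/"), "/")
	if parts[0] == "" {
		return Request{}, fmt.Errorf("missing root CID in %q", spec)
	}
	var path []string
	for _, p := range parts[1:] {
		if p != "" { // tolerate doubled slashes
			path = append(path, p)
		}
	}
	return Request{Root: parts[0], Path: path, Scope: scope}, nil
}

func main() {
	req, err := ParseRequest("Qmfoobar/path/to/thing", ScopeEntity)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Root, req.Path, req.Scope)
}
```

Keeping the scope separate from the path reflects the split in the list above: the path decides where the traversal ends up, the scope decides how much is collected from there.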

UnixFS by default

  • >= 95% (??) of content stored on IPFS is UnixFS, so let's assume you're fetching UnixFS

  • Not DAG-PB || can't interpret as UnixFS? Default to plain IPLD semantics

  • Pathing is UnixFS: Qmfoobar/path/to/thing as UnixFS vs Qmfoobar/Links/3/Hash/Links/2/Hash/Links/0/Hash as IPLD

  • UnixFS entity fetch: sharding makes fetching complicated

    • Files are often bigger than ~safe IPLD block size so are sharded across many blocks
    • Directories with hundreds of entries are sharded using a HAMT to create a complex DAG
    • A user "fetching" one of these generally doesn't just want the first block, they want the whole thing
    • For sharded directories, the whole thing could be very large, so we can do a shallow fetch
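
To make the difference between the two pathing styles concrete, here is a toy sketch that resolves a UnixFS-style path by link name and reports the equivalent index-based IPLD path. It assumes a simplified in-memory model: Node and Link only loosely mirror the DAG-PB shape, hashes are plain strings rather than CIDs, and HAMT-sharded directories are not handled. ToIPLDPath is a hypothetical helper, not part of go-unixfsnode.

```go
package main

import (
	"fmt"
	"strings"
)

// Link and Node loosely mirror the DAG-PB shape: an ordered list of named links.
type Link struct {
	Name string
	Hash string // stand-in for a CID
}

type Node struct {
	Links []Link
}

// ToIPLDPath walks unixfsPath by link name and returns the equivalent
// index-based IPLD path plus the key of the block at the terminus.
func ToIPLDPath(store map[string]Node, root, unixfsPath string) (string, string, error) {
	cur := root
	var segs []string
	for _, name := range strings.Split(strings.Trim(unixfsPath, "/"), "/") {
		if name == "" {
			continue
		}
		node, ok := store[cur]
		if !ok {
			return "", "", fmt.Errorf("block %q not found", cur)
		}
		found := false
		for i, l := range node.Links {
			if l.Name == name {
				segs = append(segs, fmt.Sprintf("Links/%d/Hash", i))
				cur = l.Hash
				found = true
				break
			}
		}
		if !found {
			return "", "", fmt.Errorf("no link named %q under %q", name, cur)
		}
	}
	return strings.Join(segs, "/"), cur, nil
}

func main() {
	store := map[string]Node{
		"root": {Links: []Link{{"a", "A"}, {"path", "P"}}},
		"P":    {Links: []Link{{"to", "T"}}},
		"T":    {Links: []Link{{"x", "X"}, {"thing", "TH"}}},
	}
	ipldPath, terminus, err := ToIPLDPath(store, "root", "path/to/thing")
	if err != nil {
		panic(err)
	}
	fmt.Println(ipldPath, "->", terminus)
}
```

A UnixFS ADL hides this name-to-index lookup (and, in the real implementation, the HAMT traversal) behind an ordinary map-like node, which is why the path can be written with names alone.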

How? ADLs to the rescue!

  • github.com/ipfs/go-unixfsnode implements UnixFS as an ADL

  • We can traverse with a combination of go-ipld-prime selectors and go-unixfsnode ADLs

  • Selectors: translate a path/to/thing path to a selector with go-unixfsnode that adds in the ADL: unixfs or unixfs-preload (for entity fetch), but with safe fall-back for non-UnixFS

Deterministic and Verifiable CAR

  • Using go-ipld-prime's traversal engine for deterministic DAG generation

  • go-unixfsnode provides deterministic UnixFS traversal

  • Verifiable

    • Consider https://ipfs.io/ipfs/Qmfoobar/path/to/thing - how can you verify the gateway gave you what you asked for? (You can't!)
    • Lassie's output CARs include the requested (root) CID and every block from that CID to the requested content
    • The user trusts the original CID (presumably), so they can verify each CID:Block match and the inclusion of each additional block in the requested path/DAG
    • i.e. root CID is the trust anchor, and the trust is transferable to the entire included DAG
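
The trust-anchor idea can be sketched with a toy model. Assumptions here: a "CID" is just the hex SHA-256 of the block bytes, and links are encoded as `link:<cid>` lines inside the block; real CIDs, codecs and the CAR container are all replaced by these stand-ins. VerifyStream is a hypothetical helper that checks two things per block: the bytes hash to the claimed CID, and the CID was linked from the root or from a block already verified.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Block pairs a claimed CID with the block bytes, as a CAR section does.
type Block struct {
	CID  string
	Data []byte
}

// toyCID stands in for a real CID: the hex SHA-256 of the block bytes.
func toyCID(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// toyLinks extracts child CIDs from "link:<cid>" lines (a toy codec).
func toyLinks(data []byte) []string {
	var links []string
	for _, line := range strings.Split(string(data), "\n") {
		if strings.HasPrefix(line, "link:") {
			links = append(links, strings.TrimPrefix(line, "link:"))
		}
	}
	return links
}

// VerifyStream extends trust from the root CID to each block in order.
func VerifyStream(root string, blocks []Block) error {
	if len(blocks) == 0 {
		return fmt.Errorf("no blocks for root %s", root)
	}
	trusted := map[string]bool{root: true}
	for i, b := range blocks {
		if toyCID(b.Data) != b.CID {
			return fmt.Errorf("block %d: data does not hash to its CID", i)
		}
		if !trusted[b.CID] {
			return fmt.Errorf("block %d: not linked from the root or an earlier block", i)
		}
		for _, l := range toyLinks(b.Data) {
			trusted[l] = true // trust transfers to everything this block links to
		}
	}
	return nil
}

func main() {
	leaf := []byte("hello")
	rootData := []byte("link:" + toyCID(leaf) + "\n")
	root := toyCID(rootData)
	err := VerifyStream(root, []Block{{root, rootData}, {toyCID(leaf), leaf}})
	fmt.Println("verified:", err == nil)
}
```

This only shows that every block received is reachable from the trusted root. Checking that the response is complete for the requested path or DAG would additionally require re-running the same traversal over the received blocks.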

Parallelism

  • Deterministic DAG traversal is difficult to properly parallelise

  • Graphsync has in-built parallelism, but it's single peer to single peer: both peers agree on the selector and "sync" blocks using the same traversal

  • Bitswap is multi-peer but has no graph awareness so is harder to parallelise:

    • You don't know what links a block contains until you have it
    • A deterministic DAG traversal wants blocks in a specific order
  • Pre-fetching to the rescue for Bitswap:

    • Run a double-pass of our selector traversal over each block as we load them
    • First pass is shallow and just queues all links it encounters for pre-fetching
    • Second pass follows links like a normal traversal
    • Pre-fetcher runs in parallel with the main traversal, optimistically loading blocks that will eventually be needed
  • Has some difficulties with traversal node & link "budgets"

  • Available for any traversal with go-ipld-prime, using master, see: https://pkg.go.dev/github.com/ipld/go-ipld-prime@master/traversal#Config (Config.Preloader)

Lassie & go-car

  • car verify to verify CAR format and content (just a simple block check)

  • car inspect to provide a summary of the content of the CAR, w/ --full to also verify.

  • car extract to extract UnixFS content:

    • Can receive from stdin, lassie can send to stdout with -o -
    • Can send output to stdout if it's just a single file
  • More coming as both tools are developed in tandem

transport-ipfs-gateway-http

(WIP)

  • New graph transport: Verifiable CAR over HTTP

  • Indexer will know if a peer can provide the requested content via HTTP

  • Peers will provide CARs in the same format that Lassie provides them

  • Lassie will verify the CAR as it's passed on
