@rvagg
Created September 10, 2020 05:11
High-level JS abstractions for navigating IPLD data (thought-bubble)

Continuing on from multiformats/js-multiformats#35 (comment), because I don't want to derail that but I'd like to seed more thinking about how we can move toward better high-level IPLD abstractions.

There are currently a few different visions about this topic, but mostly I think it's pretty grey because we need to experiment with it more to figure out what actually makes sense. I think some of our current core differences are around what to do with the block boundary. I'd like us to try and erase the block boundary more at the user-facing end (not entirely, it's an abstraction that has to leak to some degree because there are costs to pretending it doesn't exist). go-ipld-prime is pushing forward to a model that I think we can somewhat mirror in JavaScript, with some major differences - primarily in that we have the async boundary to deal with, and we can use plain JavaScript objects to represent complex things (they have to either pre-define their object shapes or use horrible and impractical maps of interface{} which is just gross).

But at the top level, one could imagine an abstraction that just exposes traversal methods, and you traverse by path, or key, or index, or maybe we get to roll those things all up into one (paths kind of do this, but indexes are a bit awkward). A traversal may lead you to a strict block boundary and you may end up returning an entire block as a plain JavaScript object (or whatever type it needs to be, maybe string or number), or it might be a subset of a block, maybe just a single node (value) within a block, or maybe a plain JavaScript object that spans multiple blocks but gets returned as a single, large object (there would have to be limitations to this power because it could easily be too costly - in terms of time to load and memory to instantiate).

// '@ipld/iq' is an imagined package, it doesn't exist (yet)
import iq from '@ipld/iq'

const cid = ...

// keep the import binding intact and chain off a separate variable
let node = iq.load(cid)
node = node.traverse('path/to/some/node')
const value = await node.asObject()

// you'd probably end up doing things like this:
const number = await iq.load(cid).traverse('foo/bar').asNumber()

// or maybe you want to consume a large number of bytes and `cid` gave you the start of a long
// multi-block byte stream (using the Flexible Byte Layout we've got specified in ipld/specs)
const stream = await iq.load(cid).asByteIterator()
for await (const chunk of stream) {
  // ...
}

^ that's all thought-bubble material, there's a bunch of things to overcome before we arrive at a concrete API. You'd probably want to have a "loader" in there somewhere that can provide blocks. The async boundary needs careful attention and making a nice API that combines async and chaining is going to get messy, but it's not impossible. But this is all to get at the ultimate dream so we can talk about how we might support that.
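One way the async-plus-chaining problem might be approached is to keep traverse() synchronous (it only records the path) and defer all block loading to the final as*() call. The following is a rough sketch under that assumption, not a proposed API: the `Query` class, the `loader` function, the string "CIDs" and the `{ '/': cid }` link shape (borrowed from how DAG-JSON writes links) are all made up for illustration.

```javascript
// Hypothetical sketch: `Query`, `loader` and the link shape are assumptions.
const isLink = (node) =>
  node !== null && typeof node === 'object' && typeof node['/'] === 'string'

class Query {
  constructor (loader, cid, path = []) {
    this.loader = loader // async (cid) => decoded block as a plain JS value
    this.cid = cid
    this.path = path
  }

  // Synchronous and chainable: records path segments, performs no I/O.
  traverse (path) {
    const segments = path.split('/').filter(Boolean)
    return new Query(this.loader, this.cid, [...this.path, ...segments])
  }

  // Cross block boundaries transparently by loading whatever a link points at.
  async follow (node) {
    while (isLink(node)) node = await this.loader(node['/'])
    return node
  }

  // The single async step: load the root block and walk the recorded path.
  async resolve () {
    let node = await this.loader(this.cid)
    for (const segment of this.path) {
      node = await this.follow(node)
      if (node === null || typeof node !== 'object' ||
          !Object.prototype.hasOwnProperty.call(node, segment)) {
        throw new Error(`cannot resolve path segment "${segment}"`)
      }
      node = node[segment]
    }
    return this.follow(node)
  }

  async asObject () {
    return this.resolve()
  }

  async asNumber () {
    const value = await this.resolve()
    if (typeof value !== 'number') throw new TypeError('node is not a number')
    return value
  }
}

// An in-memory stand-in for a block store, with made-up string "CIDs".
const blocks = new Map([
  ['cid-root', { foo: { '/': 'cid-child' } }],
  ['cid-child', { bar: 42, list: ['a', 'b'] }]
])
const loader = async (cid) => {
  if (!blocks.has(cid)) throw new Error(`block not found: ${cid}`)
  return blocks.get(cid)
}
const iq = { load: (cid) => new Query(loader, cid) }
```

With this shape, `await iq.load('cid-root').traverse('foo/bar').asNumber()` crosses from the root block into the child block without the caller seeing the boundary. Note that asObject() here does not expand links nested inside the returned value; whether and how far to do that is exactly the cost question raised above.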

To support ADLs (Advanced Data Layouts), we'd want to be able to plug something into the middle of this mechanism that can take over the traversal process. You need to insert logic somewhere so that traverse('key') will use an ADL to figure out that it needs to traverse down a HAMT instead of just looking at the raw data model. (ADLs are going to need a loading mechanism similar to the one codecs use; if it's not dependency injection then some kind of explicit configuration will be required.)
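To make the "plug in to the middle" idea a bit more concrete, here's a minimal sketch of a traversal loop that asks a list of explicitly configured ADLs whether any of them wants to handle the current node, and otherwise falls back to a plain data model lookup. Everything here is hypothetical: the `matches`/`get` interface, the `adl` marker field, and the toy sharded map, which buckets keys by their first character and is only a stand-in for a real HAMT.

```javascript
// Toy ADL: a map sharded into buckets by the first character of the key.
// This is NOT a real HAMT, just enough to show traversal being taken over.
const toyShardedMap = {
  matches: (node) =>
    node !== null && typeof node === 'object' && node.adl === 'toy-sharded-map',
  async get (node, key, load) {
    const link = node.buckets[key[0]]
    if (link === undefined) return undefined
    const bucket = await load(link) // the ADL decides which block to fetch
    return bucket[key]
  }
}

// Walk `path` from `root`. At each step a matching ADL takes over the lookup,
// otherwise we look at the raw data model (a plain property access).
async function traverse (root, path, load, adls = []) {
  let node = root
  for (const segment of path.split('/').filter(Boolean)) {
    const adl = adls.find((candidate) => candidate.matches(node))
    if (adl) {
      node = await adl.get(node, segment, load)
    } else {
      node = node == null ? undefined : node[segment]
    }
    if (node === undefined) throw new Error(`cannot resolve "${segment}"`)
  }
  return node
}

// In-memory stand-in for a block store, with made-up string "CIDs".
const store = new Map([
  ['bucket-a', { apple: 1, avocado: 2 }],
  ['bucket-b', { banana: 3 }]
])
const load = async (cid) => store.get(cid)

const root = {
  users: { adl: 'toy-sharded-map', buckets: { a: 'bucket-a', b: 'bucket-b' } }
}
```

With the ADL configured, `traverse(root, 'users/banana', load, [toyShardedMap])` resolves to 3, while the same call without it fails because the raw data model has no `banana` key under `users`. Passing the ADLs in as an argument is the "explicit configuration" flavour; a DI-style registry would change where the list comes from but not the shape of the loop.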

But at the other end, there are codecs where a common traversal is going to involve instantiation of only a sub-part of a block (DAG-PB can be like this) or instantiation of a graph of blocks as a single object (bitcoin is like this). And instantiation of a plain JavaScript object as a whole block is not necessarily a step toward that. Our current model says that Block = JavaScript object. But maybe a traversal can be smarter and more efficient than that? What if we have a use-case where we have very large blocks and usually only reach down to a small part of the block? If we have vertical integration with a codec, we might be able to partial-decode and only instantiate at the asObject() end of the chain. Maybe DAG-CBOR decodes into the raw CBOR tokens only and keeps that state in memory without forming a full JavaScript object to represent the final form; it uses those tokens to support traversal and only instantiates a full object at the point that you call asThing(), which may cover just a small section of the block.
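As a rough illustration of the token idea, the sketch below walks a flat token list to find the node at a path, skipping over sibling subtrees without building them, and only instantiates JavaScript values for the part that was asked for. The token shape (`{ type, length }` for containers, `{ type, value }` for scalars) is invented for this example and is not what any real CBOR decoder emits; a real codec would also want to tokenize lazily from bytes.

```javascript
// Skip one complete value starting at tokens[pos] without building anything.
// Returns the position of the token that follows it.
function skip (tokens, pos) {
  const token = tokens[pos++]
  if (token.type === 'map') {
    for (let i = 0; i < token.length; i++) {
      pos = skip(tokens, pos) // key
      pos = skip(tokens, pos) // value
    }
  } else if (token.type === 'array') {
    for (let i = 0; i < token.length; i++) pos = skip(tokens, pos)
  }
  return pos
}

// Build a plain JS value from the tokens starting at `pos`.
function instantiate (tokens, pos) {
  const token = tokens[pos++]
  if (token.type === 'map') {
    const value = {}
    for (let i = 0; i < token.length; i++) {
      const key = tokens[pos++].value
      const child = instantiate(tokens, pos)
      value[key] = child.value
      pos = child.next
    }
    return { value, next: pos }
  }
  if (token.type === 'array') {
    const value = []
    for (let i = 0; i < token.length; i++) {
      const child = instantiate(tokens, pos)
      value.push(child.value)
      pos = child.next
    }
    return { value, next: pos }
  }
  return { value: token.value, next: pos }
}

// Find the token position of the node at `path`, skipping everything else.
function seek (tokens, path) {
  let pos = 0
  for (const segment of path.split('/').filter(Boolean)) {
    const token = tokens[pos++]
    let found = false
    if (token.type === 'map') {
      for (let i = 0; i < token.length && !found; i++) {
        const key = tokens[pos++].value
        if (key === segment) found = true
        else pos = skip(tokens, pos)
      }
    } else if (token.type === 'array') {
      const index = Number(segment)
      if (Number.isInteger(index) && index >= 0 && index < token.length) {
        for (let i = 0; i < index; i++) pos = skip(tokens, pos)
        found = true
      }
    }
    if (!found) throw new Error(`cannot resolve "${segment}"`)
  }
  return pos
}

// Only the node at the end of the path ever becomes a JS value.
function asObject (tokens, path) {
  return instantiate(tokens, seek(tokens, path)).value
}

// Tokens for: { meta: { size: 3 }, items: ['a', { name: 'b' }] }
const tokens = [
  { type: 'map', length: 2 },
  { type: 'string', value: 'meta' },
  { type: 'map', length: 1 },
  { type: 'string', value: 'size' },
  { type: 'number', value: 3 },
  { type: 'string', value: 'items' },
  { type: 'array', length: 2 },
  { type: 'string', value: 'a' },
  { type: 'map', length: 1 },
  { type: 'string', value: 'name' },
  { type: 'string', value: 'b' }
]
```

Here `asObject(tokens, 'items/1')` builds only the small `{ name: 'b' }` map; the `meta` subtree and the first array element are stepped over as tokens and never turned into objects. Whether holding tokens in memory is actually cheaper than holding the decoded object is something that would need measuring.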


So the point of all of that is to say that I'd like us to be careful not to push ourselves into a corner where our models just don't support vertically integrated abstractions where the highest level blurs the block boundaries and makes "nodes" the primary way of addressing IPLD data (where a "node" could be all, or a part of a block, or more controversially a graph of blocks).
