Skip to content

Instantly share code, notes, and snippets.

@RangerMauve
Last active April 6, 2024 13:33
Show Gist options
  • Star 4 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save RangerMauve/ae271204054b62d9a649d70b7d218191 to your computer and use it in GitHub Desktop.
Save RangerMauve/ae271204054b62d9a649d70b7d218191 to your computer and use it in GitHub Desktop.
Hyperbee Indexed DB

Hyperbee Indexed DB

Purpose:

Hyperbee is super useful for storing data and searching for data using it's ordered key search. When you set up your keyspace just right, you can query a small subset of a large database and quickly load what you need without having to download the entire dataset or traverse it.

This module seeks to make it easier to set up these indexes for JSON data and to make it easy to query large datasets.

How it works

Hyperbee comes with the ability to use sub hyperbees which are like namespaces.

We use this to create the concept of collections and indexes

You can create a new collection of documents for your database.

Each collection has a name which will be used for it's namespace.

Within the collection there will be a second sub-namespace for the indexes on documents.

Whenever you insert a document into the collection, the indexer will check if it contains fields needed to be indexed, and add entries to the index namespace.

E.g. Given we have documents called people that look like {_id: 0, name: "alice"} {_id: 1, name: "bob"}

If we have an index on the name field, the DB will look something like this:

/people/doc/0 => {_id: 0, name: "alice"}
/people/doc/1 => {_id: 1, name: "bob"}
/people/idxs => [["name"]]
/people/idx/name/alice/0 => 0
/people/idx/name/bob/1 => 1

If you want to find all documents that have the name alice, you can open a search in the /people/idx/name/alice keyspace, and iterate through each one to get back the IDs.

What if in addition to name we also had an age field and our data looked like this: {_id: 0, name: "eve", age: 4} {_id: 1, name: "eve", age: 20}

We could set up an index that could let us quickly ask "Find all people named eve, who are older than 5".

Here's what the data looks like:

/people/doc/0 => {_id: 0, name: "alice"}
/people/doc/1 => {_id: 1, name: "bob"}
/people/doc/2 => {_id: 2, name: "eve", age: 4}
/people/doc/3 => {_id: 3, name: "eve", age: 20}
/people/idxs => [["name"], ["name", "age"]]
/people/idx/name/alice/0 => 0
/people/idx/name/bob/1 => 1
/people/idx/name/eve/2 => 2
/people/idx/name/eve/3 => 3
/people/idx/name-age/eve/4/2 => 2
/people/idx/name-age/eve/20/3 => 3

Note that the / character is representing the byte 0x00 which is used for sorting in B+ trees like Hyperbee or LevelDB.

We could seach on key starting with /people/idx/name-age/eve/5 which will avoid downloading the first eve and only load the ones that have a higher age field.

Note that if your index was instead age-name, your query would perform differently since it would iterate through all people older than 5 and then find just the ones who's name is eve. This goes to show that you should put your indexes together intelligently. A good rule of thumb is that values which you can specify exactly should come first, or values which can eliminate a large swath of data. e.g. if you expect yourself to search for posts that replyTo a specific ID, have that be your highest priority index, and then maybe index by timestamp

Although the actual data might look like JSON, we should support the BSON module in order to store data more efficiently and to work around bugs in storing integrers in the key ranges.

Another thing to note is that when you create indexes you're trading off the initial load times for more storage on the writer and full backups. In peer to peer systems this is likely a good tradeoff since individuals won't be creating huge amounts of data and peers will be saving more bandwidth by only needing a subset of the available data.

Dream API

Modelled after the MongoDB API

const db = new HyperbeeIndexedDB(hyperbee)

// Create an index for a given set of fields
// Optionally specify whether the index should be rebuilt even if it's not new
await db.people.createIndex({name: 1}, {rebuild: false})

await db.people.insert({name: 'chuck', age: 99})

// Get an array
const results = await db.people.find({name: 'chuck'})

// Sort/Limit/Skip
const results = await db.people
  .find({name: 'chuck'})
  .skip(4)
  .limit(32)
  .sort({age: -1})

// Get a single doc
const doc = await db.people.findOne({name: 'chuck'})

// Update the specified fields
// Use `insert` if you want to replace the doc completely
await db.people.update({_id: doc._id, age: 60})

// Iterate through the data
for await (const doc of await db.people.find({age: {$gt: 69}}) {
  console.log(doc)
}

Supported queries:

  • You can only query against indexed fields by default
  • limit/skip
  • sort Sort must be in order of indexed fields.
  • $eq or : - Match fields that are eqactly equal to this value
  • $gt, $gte, $lt, $lte - Match fields that are greater than or equal to a value.
  • Matching fields within arrays. (cannot be arrays of objects, only strings/numbers/Dates)
  • $size for array fields?

Not supported (yet):

  • sub.fields.like.this
  • Arrays of objects
  • Regex and string fanciness
  • $and and $or (hard to make indexes for)
  • Sort by several fields

Using Indexes with queries.

  • If there is an $eq field have it be highest priority
  • If there is a sort field, have it be second highest priority
  • Other fields should come after these two
  • Find the index that most closely matches the preferred ordering and use it
  • If there are $eq fields at the beginning of the index, use them as a prefix
  • If there are $gt/$lt fields at the beginning (after whatever eq fields are filled), add them to the prefix (depending on direction)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment