@pvh
Last active June 24, 2020 23:05
Cambria, Week 1

Welcome, one and all, to the Cambria project. Cambria is an ongoing research project here at Ink & Switch exploring how we can operate on evolving data formats in a decentralized system. We hope this work will be exciting for folks working on decentralized systems and also applicable more broadly to other distributed systems.

What's the problem?

Cambria was motivated by some problems we kept seeing in our work. The most common problem was that someone would add an array field to a document. New documents would work fine (the array would be added at document creation time), but existing documents, or documents shared by older clients, would cause JavaScript runtime errors when the code tried to call Array methods on the missing field.
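To make the failure concrete, here is a minimal sketch in TypeScript. The `IssueDoc` shape, the `tags` field, and both helper functions are hypothetical names invented for illustration; they are not taken from our codebase.

```typescript
// Hypothetical document shape: `tags` was added in a later version of the app.
interface IssueDoc {
  title: string;
  tags?: string[]; // absent on documents written by older clients
}

// Naive read: assumes every document already has the new array field.
function upperTagsNaive(doc: IssueDoc): string[] {
  // Throws a TypeError at runtime when `tags` is undefined.
  return doc.tags!.map((tag) => tag.toUpperCase());
}

// Defensive read: substitute an empty array when the field is missing.
function upperTagsSafe(doc: IssueDoc): string[] {
  return (doc.tags ?? []).map((tag) => tag.toUpperCase());
}

const oldDoc: IssueDoc = { title: "Written by an older client" };
const newDoc: IssueDoc = { title: "Written by a newer client", tags: ["bug"] };
```

Calling `upperTagsNaive(oldDoc)` fails, while `upperTagsSafe` works on both documents. Scattering defaults like this through the code is the kind of ad hoc patching we would like to replace with something more systematic.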

This is a very common problem in all kinds of distributed systems. In a centralized system, it is mostly managed by locking down how the system works. Database schemas prevent writing bad data and guarantee the shape of data that comes back. Network-friendly serialization formats like Protobuf or Avro provide strong guarantees about how data types evolve, but at the cost of severely constraining how those types can be changed. Finally, JSON responses from APIs are essentially the wild west. Cloud software everywhere crashes when APIs change, and many blog posts have been written about how to manage deprecation of old versions. (Stripe is a notable bright light here, with an inspiring system that supports change over time!)

Instead of trying to solve these problems by constraining what can be written, we're embracing the uncertainty of the world and attempting to improve how we read data. This is especially important in truly decentralized systems, where you may be collaborating with other nodes running older, newer, or simply different versions of the program you have locally.

We draw on a number of influences in our work here, but our main influence is the research on lenses from Benjamin Pierce's group at UPenn (in particular, we found the Boomerang paper quite exciting). Other notable influences include Elm decoders, for demonstrating how to bridge untyped and strongly typed systems, and Stripe's API versioning strategy.

Sounds... vague. What are you actually doing?

We wanted to get straight to work, so after considering a few other options, we've adopted a fork of the PushPin project's codebase. This gave us the ability to jump straight into building the part of the system we're interested in (data structure versioning) without having to figure out other pieces like networking, storage, or rendering.

In brief, we're building a React app that runs on your system (Electron) and communicates directly with copies installed on other users' computers (hyperswarm, hypercore). There are CRDTs in there, of course.

Our application of choice is an issue tracker, which we have named Arthropod. We chose it because

  • it's easy to implement a simple, usable example,
  • it's inherently collaborative and offline-friendly, and
  • we can dog-food it during the project to discover real-world problems.

Arthropod has already given us some great insights into managing real-world data migration challenges, and Geoffrey has gotten it far enough along that we can use it as our own issue tracker. Arthropod runs several versions of the actual issue tracking tool simultaneously in the same window so that we can be sure we're not breaking anything as we go.

Enough talk though, here's Geoffrey's demo!

In addition to Arthropod, we've also got a prototype migration library from Orion, named Chitin, which has helped us start figuring out problems like storage format and APIs.

What have you learned so far?

We've identified several interesting data migrations. Aside from the basic cases of adding and removing fields from version to version, we've spotted two that we believe can be supported in a bidirectional way:

  • moving from a boolean "complete" field to a "status" enumeration (todo, in progress, done)
  • adding "archive" support in a new version and wanting to filter out "archived" items from showing in the older version at all
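As a rough illustration of what these two migrations could look like, here is a sketch in TypeScript. Everything in it is an assumption made for the example: the `Lens` interface, the `IssueV1` and `IssueV2` shapes, and the function names are hypothetical, and this is a plain pair of conversion functions rather than the lens formalism from the Boomerang work or any DSL we may end up with.

```typescript
type Status = "todo" | "in progress" | "done";

// Hypothetical older schema: a boolean "complete" flag.
interface IssueV1 {
  title: string;
  complete: boolean;
}

// Hypothetical newer schema: a status enumeration plus archive support.
interface IssueV2 {
  title: string;
  status: Status;
  archived: boolean;
}

// Assumed shape for a bidirectional conversion between two versions.
interface Lens<A, B> {
  forward(a: A): B;
  backward(b: B): A;
}

const v1ToV2: Lens<IssueV1, IssueV2> = {
  // Reading old data in the new app: a boolean maps onto two of the statuses.
  forward: (a) => ({
    title: a.title,
    status: a.complete ? "done" : "todo",
    archived: false,
  }),
  // Reading new data in the old app: only "done" counts as complete.
  backward: (b) => ({
    title: b.title,
    complete: b.status === "done",
  }),
};

// What the old version should see: archived issues are filtered out entirely.
function viewForV1(issues: IssueV2[]): IssueV1[] {
  return issues.filter((issue) => !issue.archived).map(v1ToV2.backward);
}
```

Note that this naive version is lossy: an "in progress" issue becomes `complete: false` going backward, and would come back as "todo" going forward again. Preserving that kind of information across a round trip is one of the things we expect a real solution to have to handle.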

What's up next?

Last week we prototyped the problem. This week we prototype a solution. We hope to have a simple DSL that gives us insight into programmer-facing APIs and proves that some of these migrations can indeed be implemented as lenses.

One last thing. Cambria? Chitin? Arthropod? Are you paleontologists now?

The Cambrian period was when life began to develop skeletons. Prior to that, everything was just a big soup of cells. Arthropods are a particularly notable group of creatures found in the fossil record from that period. Chitin is the organic material that forms their exoskeletons.

That's it!

See you all next week. We'd love to hear from you about the project so feel free to send me a note.

EDIT: We're still at it, and Week Two is now posted for your enjoyment.
