@pvh
Last active June 22, 2020 22:16

Cambria, Week Two

A sequence of prototypes. Or, really, a drawing of a Cloudina

Last week we showed you a demo of a problem. We wanted to run two different versions of a program, both operating on different data types... but with the same underlying document.

This is a tricky problem! The old code doesn't natively understand data written by the new system. The new system relies on data the old one doesn't provide. Worst of all, even in our little toy example, there is one field where the data type changes completely, going from a Boolean to a string.

So, if last week was a prototype of a problem then this week is a prototype of a solution.

A prototype of a solution

Last week's demo showed data flowing back and forth between two systems. The implementation was a giant soup of spaghetti code under the hood. This week we wanted to attempt a principled approach to implementing that code automatically, and to also think a bit about developer experience.

What we've got is a small library that can output JSON Schemas (to validate data), TypeScript types, and a big JSON file that describes all the rules for converting data between types.

The result is that when you're programming you have nice TypeScript types to keep track of what you're doing, and at runtime the system can handle all the conversions for you.

Migrations? Sounds like database code.

The big difference between our system and a traditional migration is that all of the versions exist simultaneously, superimposed in the underlying document. At read time we pull out the bits of the underlying document that the running program needs and ignore the rest. When we write changes, we carefully map them back down into the document to ensure that old versions of the code will still work, even if their documents have been edited by newer code.
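To make the read and write halves of that a bit more concrete, here is a minimal sketch. It is not how Cambria actually stores documents; `StoredDoc`, `readOld`, `readNew`, and `writeStatus` are hypothetical names, and the rule that only 'done' counts as complete is a simplification.

```typescript
// Hypothetical sketch, not the real storage layout.
// The stored document keeps the fields of both schema versions side by side.
interface StoredDoc {
  name: string
  complete: boolean // field the older program understands
  status: string // field the newer program understands
}

interface OldView { name: string; complete: boolean }
interface NewView { name: string; status: string }

// Reading: each program pulls out only the fields it knows about.
function readOld(doc: StoredDoc): OldView {
  return { name: doc.name, complete: doc.complete }
}

function readNew(doc: StoredDoc): NewView {
  return { name: doc.name, status: doc.status }
}

// Writing from the newer program: update its own field, and also map the
// change down to the field the older program reads.
// Simplifying assumption: only 'done' counts as complete.
function writeStatus(doc: StoredDoc, status: string): StoredDoc {
  return { ...doc, status, complete: status === 'done' }
}

const stored: StoredDoc = { name: 'Fix login bug', complete: false, status: 'todo' }
const edited = writeStatus(stored, 'done')

console.log(readOld(edited)) // the old program sees complete: true
console.log(readNew(edited)) // the new program sees status: 'done'
```

The point of the sketch is only that a single write keeps both views consistent, so neither program needs to know the other exists.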

Here's how it looks to write one of those migrations. In this example we've decided that the status of an issue is no longer either complete or not, but can now be a string with several values, like "in progress".

import { runMigration } from '../chitin/migrationRunner'
import { convertField, valueMapping } from '../chitin/migrations'

runMigration((graph) => {
  graph.extendSchema('ProjectV3', 'ProjectV4', [
    convertField({
      from: { name: 'complete', type: 'boolean' },
      to: { name: 'status', type: 'string' },
      forwards: valueMapping({ false: 'todo', true: 'done' }),
      backwards: valueMapping({ todo: false, inProgress: false, done: true, default: true }),
    }),
  ])
})

You can see here that we're defining a new version of the Project data type (V4), and describing how to map one field (and its data) back and forth between the two versions. This creates a link between those types that the system can traverse.
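The two valueMapping tables in that migration can be read as plain lookup functions. The sketch below is a hypothetical stand-in (`valueMappingSketch` is our name for illustration, and the real helper may behave differently); it assumes a `default` key is used whenever a value has no entry of its own.

```typescript
// Hypothetical stand-in for valueMapping; the real helper may differ.
type Primitive = string | boolean

function valueMappingSketch(table: Record<string, Primitive>) {
  // A Map avoids accidental hits on inherited object properties.
  const entries = new Map<string, Primitive>(Object.entries(table))
  return (value: Primitive): Primitive => {
    const direct = entries.get(String(value))
    if (direct !== undefined) return direct
    // Assumption: 'default' is the fallback for values with no entry.
    const fallback = entries.get('default')
    if (fallback !== undefined) return fallback
    throw new Error(`no mapping for ${String(value)}`)
  }
}

const forwards = valueMappingSketch({ false: 'todo', true: 'done' })
const backwards = valueMappingSketch({
  todo: false,
  inProgress: false,
  done: true,
  default: true,
})

console.log(forwards(false)) // 'todo'
console.log(backwards('inProgress')) // false
// A round trip through the older type is lossy: 'inProgress' comes back as 'todo'.
console.log(forwards(backwards('inProgress'))) // 'todo'
```

Note that the backwards table has more entries than the forwards one. A Boolean can only carry two of the string's values, so some information is necessarily lost when an old program edits the field.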

What if you want to convert data between systems?

We've also thought about that! The example above is convenient for an incremental change to a data format: you take the old format, make some changes, and that's the new format.

In our Arthropod test program we have a second, much simpler display for our Project data, used in the title bar. That display just shows the title and the description, so we gave it a similarly simple data format (HasTitle). To render the Project document there, we need to define a conversion for that, too.

That looks like this:

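// Import paths assumed to match the previous example.
import { runMigration } from '../chitin/migrationRunner'
import { renameField, removeField } from '../chitin/migrations'

runMigration((graph) => {
  // Link two existing schemas instead of deriving a new one.
  graph.connectSchemas('ProjectV2', 'HasTitleV1', [
    renameField('name', 'title'),
    renameField('description', 'subtitle'),
    removeField('tasks'), // the title bar view doesn't show tasks
  ])
})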

In this case we're not defining a new schema (as we did with ProjectV4); we're just saying how to connect two existing ones.
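As a rough illustration of what a list of rename and remove operations does to a document, here is a small sketch. The `Op` shape and `applyOps` are hypothetical (the post doesn't show the library's internal representation), and the sketch only runs in the Project-to-HasTitle direction.

```typescript
// Hypothetical op shapes for illustration only.
type Op =
  | { kind: 'rename'; from: string; to: string }
  | { kind: 'remove'; name: string }

type Doc = Record<string, unknown>

// Apply each op in order to a shallow copy of the document.
function applyOps(doc: Doc, ops: Op[]): Doc {
  const out: Doc = { ...doc }
  for (const op of ops) {
    if (op.kind === 'rename') {
      if (Object.prototype.hasOwnProperty.call(out, op.from)) {
        out[op.to] = out[op.from]
        delete out[op.from]
      }
    } else {
      delete out[op.name]
    }
  }
  return out
}

const projectToTitle: Op[] = [
  { kind: 'rename', from: 'name', to: 'title' },
  { kind: 'rename', from: 'description', to: 'subtitle' },
  { kind: 'remove', name: 'tasks' },
]

const project = {
  name: 'Arthropod',
  description: 'A test program',
  tasks: ['write', 'ship'],
}

console.log(applyOps(project, projectToTitle))
// { title: 'Arthropod', subtitle: 'A test program' }
```

Going the other way would mean inverting each rename and deciding what to do about the removed field, which this sketch leaves out.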

Wait a second, you're connecting an older version to the Title thing!

Astute of you to observe that, but it still works! Because we can map from HasTitleV1 to ProjectV2, and then on up from there through ProjectV3 and ProjectV4, the system finds a path and moves the data back and forth seamlessly. As long as there is a path, the system will find it! Even an old ProjectV1 file could follow this path.
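Finding that path is an ordinary graph search. The sketch below uses a breadth-first search over a hypothetical list of links; `findPath` and the `links` array are our own names, and the ProjectV1 to ProjectV2 and ProjectV2 to ProjectV3 links are assumed from the description above rather than taken from real migration files.

```typescript
// Hypothetical sketch of path finding over schema links.
type SchemaName = string

// Each link can be traversed in both directions.
const links: Array<[SchemaName, SchemaName]> = [
  ['ProjectV1', 'ProjectV2'], // assumed
  ['ProjectV2', 'ProjectV3'], // assumed
  ['ProjectV3', 'ProjectV4'],
  ['ProjectV2', 'HasTitleV1'],
]

function findPath(
  from: SchemaName,
  to: SchemaName,
  edges: Array<[SchemaName, SchemaName]>
): SchemaName[] | null {
  // Build an undirected adjacency list.
  const neighbours = new Map<SchemaName, SchemaName[]>()
  for (const [a, b] of edges) {
    neighbours.set(a, [...(neighbours.get(a) ?? []), b])
    neighbours.set(b, [...(neighbours.get(b) ?? []), a])
  }

  // Breadth-first search, remembering how each schema was reached.
  const previous = new Map<SchemaName, SchemaName | null>([[from, null]])
  const queue: SchemaName[] = [from]
  while (queue.length > 0) {
    const current = queue.shift() as SchemaName
    if (current === to) {
      const path: SchemaName[] = []
      let step: SchemaName | null | undefined = current
      while (step != null) {
        path.unshift(step)
        step = previous.get(step)
      }
      return path
    }
    for (const next of neighbours.get(current) ?? []) {
      if (!previous.has(next)) {
        previous.set(next, current)
        queue.push(next)
      }
    }
  }
  return null // no chain of links connects the two schemas
}

console.log(findPath('HasTitleV1', 'ProjectV4', links))
// [ 'HasTitleV1', 'ProjectV2', 'ProjectV3', 'ProjectV4' ]
```

Once a path is known, converting a document is a matter of applying each link's rules in turn along it.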

What's next?

Well, our more principled implementation has better "bones", but it's still missing some important pieces. First, it doesn't handle nested data very well yet. If you want to rename a field and it's not connected to the root of the document? Too bad. Orion's working on that this week.

Next, our canonical representation has become increasingly difficult to reason about. We suspect that we can separate the conversion logic from the underlying storage layout, which will hopefully both simplify each piece of the system and also make them more independently useful. Geoffrey is working on exploring that.

Last, we've had a number of fascinating discussions about what the correct behavior should be for some of these cases. If you check and uncheck the complete Boolean, what should happen to the status string?

Along with making concrete progress on the capabilities of the system, we also want to document our expectations for how various cases will behave in a test suite. Peter's planning to focus on that to the extent that pandemic parenting allows.

No demo video this week. Maybe next week? We'll see.
