pvh/week-2.md

# Cambria, Week Two


Last week we showed you a demo of a problem. We wanted to run two different versions of a program, each expecting a different data type, but both operating on the same underlying document.
This is a tricky problem! The old code doesn't natively understand data written by the new system. The new system relies on data the old one doesn't provide. Worse still, even in our little toy example there is one field whose data type changes completely, going from a Boolean to a string.
So, if last week was a prototype of a problem, then this week is a prototype of a solution.
## A prototype of a solution

Last week's demo showed data flowing back and forth between two systems. The implementation was a
giant soup of spaghetti code under the hood. This week we wanted to attempt a principled approach
to implementing that code automatically, and to also think a bit about developer experience.
What we've got is a small library that can output JSON Schemas (to validate data), TypeScript types,
and a big JSON file that describes all the rules to convert data between types.
The result is that when you're programming you have nice TypeScript types to keep track of what
you're doing, and at runtime the system can handle all the conversions for you.
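To show how compile-time types and runtime validation divide the work, here is a hand-written sketch of the kind of output described above. The field lists for `ProjectV3` and `ProjectV4` are assumptions pieced together from the examples in this post. `isProjectV4` is a hypothetical stand-in for validating against a generated JSON Schema and is not part of the library.

```typescript
// Hypothetical stand-ins for generated output. The field lists are
// assumptions based on the examples in this post, not the real schemas.

// The kind of TypeScript types a generator might emit for two versions.
interface ProjectV3 {
  name: string
  description: string
  tasks: unknown[]
  complete: boolean
}

interface ProjectV4 {
  name: string
  description: string
  tasks: unknown[]
  status: string
}

// A hand-written runtime check doing the job a JSON Schema would do:
// confirming that untyped data really has the shape ProjectV4 promises.
function isProjectV4(value: unknown): value is ProjectV4 {
  if (typeof value !== 'object' || value === null) return false
  const v = value as Record<string, unknown>
  return (
    typeof v.name === 'string' &&
    typeof v.description === 'string' &&
    Array.isArray(v.tasks) &&
    typeof v.status === 'string'
  )
}

// Data typed as the older version does not pass the newer version's check.
const oldProject: ProjectV3 = { name: 'Cambria', description: 'Week two', tasks: [], complete: false }
```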
## Migrations? Sounds like database code.

The big difference between our system and a traditional migration is that all of the versions exist
simultaneously in a superimposed state in the underlying document. At read time we pull out the bits
of the underlying document that the running program needs and ignore the rest. When we write changes
we carefully map them down into the document again to ensure that old versions of the code will still
work, even if their documents have been edited by newer code.
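Here is a minimal sketch of the superimposed idea. It assumes the simplest possible layout: the underlying document stores both `complete` (read by ProjectV3) and `status` (read by ProjectV4) side by side. `SuperimposedDoc`, `readV3`, `writeV4` and the other names are hypothetical. The real canonical representation is more involved.

```typescript
// Hypothetical layout: every version's fields stored side by side.
interface SuperimposedDoc {
  name: string
  complete: boolean // the field ProjectV3 understands
  status: string // the field ProjectV4 understands
}

interface ProjectV3View { name: string; complete: boolean }
interface ProjectV4View { name: string; status: string }

// Reading: each program pulls out only the fields it knows about.
function readV3(doc: SuperimposedDoc): ProjectV3View {
  return { name: doc.name, complete: doc.complete }
}

function readV4(doc: SuperimposedDoc): ProjectV4View {
  return { name: doc.name, status: doc.status }
}

// Writing from the newer program: keep the boolean in sync so older code
// still sees a sensible value (statuses other than todo/inProgress count
// as complete).
function writeV4(doc: SuperimposedDoc, view: ProjectV4View): SuperimposedDoc {
  const notDone = view.status === 'todo' || view.status === 'inProgress'
  return { ...doc, name: view.name, status: view.status, complete: !notDone }
}

// Writing from the older program: only touch the status when the boolean
// actually changed, so a richer value like 'inProgress' survives other edits.
function writeV3(doc: SuperimposedDoc, view: ProjectV3View): SuperimposedDoc {
  const changed = view.complete !== doc.complete
  const status = changed ? (view.complete ? 'done' : 'todo') : doc.status
  return { ...doc, name: view.name, complete: view.complete, status }
}
```

Here, an older program can rename the project without wiping out a newer program's `'inProgress'` status.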
Here's how it looks to write one of those migrations. In this example we've decided that the status
of an issue is no longer either complete or not, but can now be a string with several values, like
"in progress".
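```typescript
import { runMigration } from '../chitin/migrationRunner'
import { convertField, valueMapping } from '../chitin/migrations'

runMigration((graph) => {
  // Define ProjectV4 as ProjectV3 plus the changes listed below.
  graph.extendSchema('ProjectV3', 'ProjectV4', [
    // Replace the boolean `complete` with the string `status`.
    convertField({
      from: { name: 'complete', type: 'boolean' },
      to: { name: 'status', type: 'string' },
      // ProjectV3 -> ProjectV4
      forwards: valueMapping({ false: 'todo', true: 'done' }),
      // ProjectV4 -> ProjectV3; any status not listed maps to true
      backwards: valueMapping({ todo: false, inProgress: false, done: true, default: true }),
    }),
  ])
})
```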
You can see here that we're defining a new version of the Project data type (V4), and describing how
to map one field (and its data) back and forth between the two versions. This creates a link between
those types that the system can traverse.
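If you're wondering what `valueMapping` does, here is a guess at its behaviour as a standalone function. This is a hypothetical re-implementation, not the chitin source. It assumes two things: lookups use the string form of the value, so `false` matches the key `"false"`, and a `default` entry catches anything without an explicit mapping.

```typescript
// Hypothetical re-implementation for illustration; chitin's real helper
// may differ.
type Scalar = string | number | boolean

function valueMapping(table: Record<string, Scalar>): (value: Scalar) => Scalar {
  return (value) => {
    // Look values up by their string form, so `false` finds the key "false".
    const key = String(value)
    if (Object.prototype.hasOwnProperty.call(table, key)) return table[key] as Scalar
    // Fall back to the `default` entry when one is provided.
    if (Object.prototype.hasOwnProperty.call(table, 'default')) return table['default'] as Scalar
    throw new Error(`no mapping for value ${key}`)
  }
}

const forwards = valueMapping({ false: 'todo', true: 'done' })
const backwards = valueMapping({ todo: false, inProgress: false, done: true, default: true })
```

Note that `backwards` is many-to-one. Both `'todo'` and `'inProgress'` map to `false`, so a status cannot always be recovered exactly from the boolean.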
## What if you want to convert data between systems?

We've also thought about that! For an incremental change to a data format, the approach above
is convenient: you take the old format, make some changes, and that's the new format.
In our Arthropod test program we have a second, much simpler display for our Project data used in
the title bar. That display just shows the title and the description, so we gave it a similar data format
(HasTitle). To render a Project document there, we need to define a conversion between the two formats, too.
That looks like this:
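```typescript
import { runMigration } from '../chitin/migrationRunner'
// Assumed to be exported alongside convertField and valueMapping.
import { removeField, renameField } from '../chitin/migrations'

runMigration((graph) => {
  // Link two existing schemas rather than deriving a new one.
  graph.connectSchemas('ProjectV2', 'HasTitleV1', [
    renameField('name', 'title'),
    renameField('description', 'subtitle'),
    // HasTitleV1 has no list of tasks.
    removeField('tasks'),
  ])
})
```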
In this case we're not defining a new schema (as we did with ProjectV4); we're just saying how to connect two
existing ones.
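Rename and remove steps like these can be read in both directions. The sketch below assumes each step is plain data, which would let a rule set be written out as a JSON file. It shows one way to run a list of steps forwards and then in reverse. The `Step` type, `runForwards`/`runBackwards`, and the `defaultValue` used to undo a removal are all assumptions of this sketch, not the chitin API.

```typescript
// Hypothetical step format: plain data, so a rule set could be stored as JSON.
type Doc = Record<string, unknown>

type Step =
  | { op: 'rename'; from: string; to: string }
  | { op: 'remove'; name: string; defaultValue: unknown }

function applyForwards(doc: Doc, step: Step): Doc {
  const out: Doc = { ...doc }
  if (step.op === 'rename') {
    out[step.to] = out[step.from]
    delete out[step.from]
  } else {
    delete out[step.name]
  }
  return out
}

// Reversing a rename swaps its ends. Reversing a removal cannot recover
// the lost data, so it reinstates a default value instead.
function applyBackwards(doc: Doc, step: Step): Doc {
  const out: Doc = { ...doc }
  if (step.op === 'rename') {
    out[step.from] = out[step.to]
    delete out[step.to]
  } else {
    out[step.name] = step.defaultValue
  }
  return out
}

// Forwards runs the steps in order; backwards undoes them in reverse order.
const runForwards = (doc: Doc, steps: Step[]): Doc =>
  steps.reduce<Doc>((acc, step) => applyForwards(acc, step), doc)
const runBackwards = (doc: Doc, steps: Step[]): Doc =>
  steps.reduceRight<Doc>((acc, step) => applyBackwards(acc, step), doc)

const projectToTitle: Step[] = [
  { op: 'rename', from: 'name', to: 'title' },
  { op: 'rename', from: 'description', to: 'subtitle' },
  { op: 'remove', name: 'tasks', defaultValue: [] },
]
```

The renames reverse cleanly, but the round trip comes back with an empty `tasks` list. A removed field can't be reconstructed from the smaller document alone, which is one reason to keep every version's data superimposed in the underlying document.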
## Wait a second, you're connecting an older version to the Title thing!

Astute of you to observe that, but it still works! Because we can map from HasTitleV1 to ProjectV2,
and then on up from there through ProjectV3 and ProjectV4, the system finds a path and moves the data
back and forth seamlessly. As long as a path exists, the system will find it! Even an old ProjectV1
file could follow this path.
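Finding that path is ordinary graph search. The sketch below treats each schema as a node and each migration as an undirected edge, since every migration runs both ways. It then uses breadth-first search to find the shortest chain. The edge list mirrors the migrations mentioned in this post, with a ProjectV1 to ProjectV2 edge assumed. Whether the real system picks the shortest path is also an assumption.

```typescript
// Hypothetical schema graph; each pair is a migration usable in both directions.
const edges: Array<[string, string]> = [
  ['ProjectV1', 'ProjectV2'],
  ['ProjectV2', 'ProjectV3'],
  ['ProjectV3', 'ProjectV4'],
  ['ProjectV2', 'HasTitleV1'],
]

// Breadth-first search: returns the shortest chain of schemas, or null.
function findPath(graph: Array<[string, string]>, from: string, to: string): string[] | null {
  const neighbours = new Map<string, string[]>()
  for (const [a, b] of graph) {
    neighbours.set(a, [...(neighbours.get(a) ?? []), b])
    neighbours.set(b, [...(neighbours.get(b) ?? []), a])
  }
  // previous records which schema we came from when first reaching each one.
  const previous = new Map<string, string | null>([[from, null]])
  const queue: string[] = [from]
  while (queue.length > 0) {
    const current = queue.shift() as string
    if (current === to) {
      // Walk the previous links back to the start to rebuild the path.
      const path: string[] = []
      let node: string | null | undefined = to
      while (node != null) {
        path.unshift(node)
        node = previous.get(node)
      }
      return path
    }
    for (const next of neighbours.get(current) ?? []) {
      if (!previous.has(next)) {
        previous.set(next, current)
        queue.push(next)
      }
    }
  }
  return null
}
```

Because the edges are undirected, the same search also finds the way back from ProjectV4 to HasTitleV1.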
## What's next?

Well, our more principled implementation has better "bones", but it's still missing some important pieces.
First, it doesn't handle nested data very well yet. Want to rename a field that isn't at the root
of the document? Too bad. Orion's working on that this week.
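One common way to reach a field below the root is to address it with a path of keys. This is a generic illustration of that idea, not the design Orion is working on. The hypothetical `renameAt` walks down the path and renames the field in the object it finds there. The document shape in the test is made up, and fields inside arrays (say, every task in a list) would need extra handling that this sketch leaves out.

```typescript
// Any JSON value.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key)

// Hypothetical helper: follow `path` from the root, then rename `from` to
// `to` in the object found there. Missing keys or non-objects leave the
// document unchanged. Arrays are not descended into.
function renameAt(doc: Json, path: string[], from: string, to: string): Json {
  if (typeof doc !== 'object' || doc === null || Array.isArray(doc)) return doc
  if (path.length === 0) {
    if (!hasOwn(doc, from)) return doc
    const { [from]: value, ...rest } = doc
    return { ...rest, [to]: value }
  }
  const [head, ...tail] = path
  if (!hasOwn(doc, head)) return doc
  // Rebuild only the objects along the path; siblings are shared as-is.
  return { ...doc, [head]: renameAt(doc[head], tail, from, to) }
}
```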
Next, our canonical representation has become increasingly difficult to reason about. We suspect that
we can separate the conversion logic from the underlying storage layout, which will hopefully both
simplify each piece of the system and also make them more independently useful. Geoffrey is working
on exploring that.
Last, we've had a number of fascinating discussions about what the correct behavior should be for
some of these cases. If you check and uncheck the complete Boolean, what should happen to the
status string?
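Here is that question as a worked example. Plain lookup tables stand in for `valueMapping`, using the same mappings as the migration above. The example follows one naive policy, recomputing `status` from the boolean on every write. That is only one of the possible answers.

```typescript
// Mappings copied from the ProjectV3 -> ProjectV4 migration; plain lookup
// tables stand in for chitin's valueMapping helper here.
const statusFor: Record<string, string> = { false: 'todo', true: 'done' }
const completeFor: Record<string, boolean> = { todo: false, inProgress: false, done: true }

function statusToComplete(status: string): boolean {
  // Statuses without an explicit entry fall back to true, like `default: true`.
  return Object.prototype.hasOwnProperty.call(completeFor, status) ? completeFor[status] : true
}

function completeToStatus(complete: boolean): string {
  // Naive policy: always recompute the status from the boolean.
  return statusFor[String(complete)]
}

let issueStatus = 'inProgress' // set by the newer program
let isComplete = statusToComplete(issueStatus) // the older program sees false
isComplete = true // the user checks the box...
issueStatus = completeToStatus(isComplete) // ...so the status becomes 'done'
isComplete = false // ...then unchecks it
issueStatus = completeToStatus(isComplete) // 'todo', not 'inProgress'
```

Under this policy the issue ends up as `'todo'` rather than back at `'inProgress'`. Whether that is acceptable is exactly the kind of expectation worth writing down as a test.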
Along with concrete progress in improving the capabilities of the system, we also want to document our
expectations for how various cases will behave in a test suite. Peter's planning to focus on that to
the extent that pandemic parenting allows.
No demo video this week. Maybe next week? We'll see.