atacratic/unison-codebase-editor.md

## unison-codebase-editor.md

      
    Raw
  

              unison-codebase-editor.md
            
          
    Based on Paul and Arya's design here, for a codebase editor for Unison.
Background

Unison functions are content addressed - a function's identity (Id) depends only on its content.
The 'content' here excludes cosmetic stuff that doesn't affect how the function runs, like names, comments and layout.
When one function has a call to another, it refers to it by Id¹.
When I want to talk about a function, I refer to it using a Name.  My name for the function might be different from yours though!
Unison functions are immutable (once they exist).  The bindings between Names and functions are mutable.
Here's an example showing a function with name f, which has a call to a function with name g:

Aim of the game

We want to explain how to 'change the code', in a world where functions are immutable!  And we'd like our answer to allow concurrent editing by different people.
Names vs Ids


Unison function Ids are like stars in the sky².  You can discover new ones, but the ones that are there don't change.
Names are what you write on your map of the sky.  The stars themselves don't know or care :)
At runtime, all we have are Ids (unless we're using the Codebase ability).  Names are only used at edit time, and when a person is selecting what function they want to run.
Ids represent implementation; Names represent intention - your name for a function represents what you think it's good for, and what it means to you.
Branches and transactions

A branch is a Map Name Id.  It's a map of the sky.  It's the thing you're actually changing, when you 'change the code'.
The process of making a change is managed using a Transaction, a piece of state which evolves as the user goes through the editing process.  A Transaction is where work-in-progress lives until it the user decides it is a coherent change to the code.  (It must also be free of conflicts at this point.  Conflicts are competing candidate changes, possibly made by different people, which leave Unison unsure which Id some Name should map to.)
Editing process

Say we have a branch B and a function which that branch calls f.  It might feel like you're changing f, when you open it for editing³, but actually Unison goes through the following process (a picture is below).

Looking up the Id which the Name f refers to in branch B.  Let's say the Id is the hash 2nwu84.
Fetching the function definition associated with that hash in the code store, and exposing it to the user so they can play with it - maybe by pretty-printing it to a file and opening that file in an editor.
Once the user has come back with their new function definition, slurping that definition into the code store and working out its new Id, say the hash s9b22k.
Creating a Transaction t, which contains the following proposed changes.

"Instead of binding the name f to the hash 2nwu84, bind it to s9b22k."
"Consider s9b22k a candidate replacement for 2nwu84.  ("Add s9b22k to the 'edited set' for 2nwu84.)


Work through the possible consequences of this candidate replacement as follows.

If other functions in the code store reference hash 2nwu84, ask the user whether they want to repeat this process on those functions.  This includes deciding whether Names that refer to those functions should be rebound, to point to the new versions.
A typical default action would be to propagate out the change, so that f's dependent functions (callers of f, for example g in the diagram) are swapped out of whatever name bindings they're in, and replaced with functions that call the new f.  And then continue propagating out the resulting changes transitively.


Once the user has decided how far they want to propagate the change, we have a Transaction which proposes a bunch of name rebindings.  We can treat it as a function t : Branch -> Branch, and commit the transaction by replacing our working Branch B with the new branch t B.  Note that the transaction also contained a bunch of accumulated information about candidate replacements (edited sets) - but we don't need that any more, and it doesn't directly affect the branch.  It was just working data to allow Unison to systematically propose propagation steps to the user during the editing process.


Concurrent editing

During the process, Unison is guided by the 'edited sets', i.e. by a Map Id (Set Id).  This map explains how each Id has possibly multiple candidate replacements.  This can happen if multiple users are editing a branch at once, as follows.

Let's say a function named f refers to functions named a and b, and Alice edits a while Bob edits b.  Alice declares her new a to be the 'canonical' version, and likewise Bob with b.
Alice will likely decide to add a version of f which uses the new a but the old b; and Bob will add a version which uses the old a but the new b.  They each declare their new candidate f canonical.
The transaction will then contain two new candidate bindings for f, and further they both claim to be canonical - this is a conflict.
Either Alice or Bob will need to work out how to fix the conflict - probably starting by adding a third function referring to the new a and new b, and a third candidate binding for f.
They then resolve the conflict in the transaction, by declaring the third f to be the only canonical one of the three in the edited set.
Either of them can now commit the transaction, updating the shared branch⁴ with the new bindings for f, a, and b.

This raises some interesting questions...

When is it decided that a transaction is actually committed?  It surely can't require a synchronization point between all the possible concurrent editors.
Does it ever make sense to work on two logically distinct changes in the same transaction?  Might it yield a better concurrent editing experience, by surfacing conflicts earlier?  Or can the system keep watch on the merge of a number of open transactions, in order to spot conflicts?
A transaction can yield a branch whenever it has no conflicts (although that branch might not yield a useful and consistent set of name mappings!)  Might it be useful to run tests over that branch before the transaction is committed?  Are the test functions referring to the functions under test in the normal way (by Id), or indirecting via Name (mediated maybe by the Codebase ability)?  Maybe they're more useful for assessing the health of work-in-progress if they do the latter?

Branches as namespaces

Unison does name resolution at edit-time - that is, when you're adding a reference to function g in your code of function f, the name g is resolved to a hash, and that hash is actually what gets used by f.  An important consequence of that is what happens to the business of 'imports' and 'namespaces'.  In other programming languages, there is normally a list of import statements at the top of each file, and those are treated as part of the code.  In Unison, the set of accessible names is defined by the branch, which is separate from the code.  This means different people can view a given function through their own personal 'goggles' - my view of the function uses the names I'm most familiar and comfortable with, which might be different to yours.
So we'll want to manipulate branches in similar ways to how we used imports before.  If I start working with Widgets, I might want to add a body of names relating to them to my branch, by injecting it at some prefix:
  Foo
  bar
  List.head
  Widget.start  -- injected
  Widget.stop   -- "

Or maybe by merging sets of names without a prefix, and resolving conflicts according to some hiding/shadowing rule.
Another interesting question: when I start working with someone else's function f, how do I learn their names for functions that it calls, which I don't have my own names for?  Do they need to send me a whole branch, for me to work with their code?  Maybe Unison needs to let them prune their branch down to f and its transitive dependencies before sending it to me?  Does Unison automatically inject the new names into my branch in some way?
Footnotes

1: Or maybe I mean Reference?  This is a bit strange: 'Id is the basic type used to refer to things in a Unison codebase'.  Why isn't Reference the name for the type used to refer to things?
2: I quite like this stars metaphor, but it breaks down slightly if you thought Unison was something you might run 'in the cloud' :-)
3: The word 'edit' rather encourages the idea that we are actually changing a function, even though it's a central concept that this never happens.  Is this a barrier to learning the concept?  How about 'adapt' instead? 'You are now creating a function by adapting an existing one.'
4: Is a Branch actually a Name referring to an Id of type Map Name Id?  If so I guess that map probably doesn't include the name of the branch itself.  It would seem a shame to keep sprouting new name/value association schemes, so might be a good idea to just have one and reuse it.