atacratic/unison-codebase-editor-design-comments.md

## unison-codebase-editor-design-comments.md

      
Comments on https://github.com/unisonweb/unison/blob/8ba5359c8638d5044cd224ef560fc329a1aa0e77/docs/codebase-editor-design.markdown

Core model


Consider the example of Alice moving the name f from hw3a9 to bww8a, and Bob moving the name f from hw3a9 to 28sja.  Am I right that this won't conflict in this scheme?  The two removals agree on hw3a9, and the two additions of the same name to different definitions won't conflict, because the MetadataEdits are scoped to a single definition.  That doesn't seem right: after merging, f names both bww8a and 28sja.
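To make the scenario concrete, here is a minimal Haskell sketch of what I think happens.  It assumes that name edits are stored per definition and merged by set union; NameEdit, Edits, mergeEdits and holders are names I've made up for illustration, not types from the design doc.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Hash = String   -- stand-in for a definition's hash
type Name = String

-- Hypothetical per-definition edit: names added to and removed from one definition.
data NameEdit = NameEdit
  { added   :: Set.Set Name
  , removed :: Set.Set Name
  } deriving (Eq, Show)

instance Semigroup NameEdit where
  NameEdit a1 r1 <> NameEdit a2 r2 = NameEdit (Set.union a1 a2) (Set.union r1 r2)

-- Edits are keyed by definition, so merging never compares two different definitions.
type Edits = Map.Map Hash NameEdit

mergeEdits :: Edits -> Edits -> Edits
mergeEdits = Map.unionWith (<>)

addName, removeName :: Name -> NameEdit
addName n    = NameEdit (Set.singleton n) Set.empty
removeName n = NameEdit Set.empty (Set.singleton n)

alice, bob :: Edits
alice = Map.fromList [("hw3a9", removeName "f"), ("bww8a", addName "f")]
bob   = Map.fromList [("hw3a9", removeName "f"), ("28sja", addName "f")]

-- Definitions that end up carrying a given name after the merge.
holders :: Name -> Edits -> [Hash]
holders n es = [h | (h, e) <- Map.toList es, n `Set.member` added e]
```

Under these assumptions, holders "f" (mergeEdits alice bob) lists both 28sja and bww8a, and nothing in the merge flags that as a conflict.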


data Codebase = Codebase { code : Set Code, metadata : Map Code (Set Metadata) } - you explained that the Set Metadata is there so that different people can have different metadata for the same definition (I think the write-up should say so explicitly).  I can't quite see how someone would select the Metadata they want from the set.  How do they know which Metadata is theirs, or is the one they're interested in?

I'm wondering what happens when two independently discovered versions of a function end up meeting in a codebase somewhere.  It might be useful to cover that in the write-up (mentioning, for example, that they will have different docs, and what the UX for that looks like).
As well as Codebase giving us a Set Metadata for each Code, Metadata gives us a Set Name.  Why multiple names per metadata?  Is it because a single user can also have multiple names for the same Code?  Makes sense, but again could do with clarifying in the doc.


Re links : Map Name Code - you explained that the Name here is something like author, not something like Acme.rad.  Are those two things really coming from the same namespace?  Maybe names like author should be of type LinkName or something.


When there are multiple names for the Code that a Ref refers to, how do we know which to pretty-print to the user?  (I keep coming back to my example of isomorphic data structures with different meanings - would love to get your take on that since I keep bringing it up!)


Why is it that multiple Code can have the same Name in the codebase?  How can it occur?  Is it just when we're reconciling changes from two people, and if so shouldn't the objective be to fix this as a conflict?  What good is a name if you can't tell what it refers to?
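Here is a small Haskell sketch of the lookup problem I have in mind.  It reuses the shape of the Codebase type quoted above, but models only the names field of Metadata; resolve and demo are hypothetical, and the hashes are just the ones from the Alice/Bob example.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

type Code = String   -- stand-in for a hash
type Name = String

-- Only the names field of Metadata is modelled here.
newtype Metadata = Metadata { names :: Set.Set Name }
  deriving (Eq, Ord, Show)

data Codebase = Codebase
  { code     :: Set.Set Code
  , metadata :: Map.Map Code (Set.Set Metadata)
  }

-- Every definition that some Metadata entry gives this name to.
resolve :: Name -> Codebase -> Set.Set Code
resolve n cb =
  Map.keysSet (Map.filter (any (Set.member n . names)) (metadata cb))

demo :: Codebase
demo = Codebase
  { code = Set.fromList ["bww8a", "28sja"]
  , metadata = Map.fromList
      [ ("bww8a", Set.singleton (Metadata (Set.fromList ["f"])))
      , ("28sja", Set.singleton (Metadata (Set.fromList ["f", "g"])))
      ]
  }
```

In this sketch resolve "g" demo gives a single definition, but resolve "f" demo gives two, so anything that turns a name into code (parsing, autocomplete) has to decide what to do with a non-singleton result.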


Changesets


linkAdds/linkRemoves - you're not trying to allow any merging on metadata.  Wouldn't it be handy if authors could be sets and merged with union?  And what about merging docs?  Or are you seeing that as something that can happen out-of-band with standard merge tools?


Conflict (Set Edit) - this won't let you recover things like the name of the person who proposed each edit.  That info would be important for a good UX for manual conflict resolution.
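One possible shape for this, as a Haskell sketch only: tag each edit with its proposer before putting it in the set.  The Edit type below is a simplified stand-in rather than the one in the design doc, and Author, Attributed and describe are hypothetical names.

```haskell
import qualified Data.Set as Set

type Author = String
type Hash   = String

-- Simplified stand-in for the design's Edit type.
data Edit = Replace Hash | Deprecate
  deriving (Eq, Ord, Show)

-- An edit tagged with who proposed it.
data Attributed = Attributed { proposer :: Author, edit :: Edit }
  deriving (Eq, Ord, Show)

newtype Conflict = Conflict (Set.Set Attributed)
  deriving (Eq, Show)

-- Who proposed what, for display in a conflict-resolution UI.
describe :: Conflict -> [String]
describe (Conflict es) =
  [proposer a ++ " proposed " ++ show (edit a) | a <- Set.toAscList es]
```

A side effect worth noting: with attribution in the set, two people proposing the identical edit show up as two entries, which may or may not be what you want.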


"A Changeset is complete ... only type-preserving edits."  It's complete from the point of view of leaving you with a well-typed codebase.  But it may well not be complete from the user's point of view, because the change may have semantic impacts, not captured in the types, on other definitions in the codebase.  E.g. the edited function now assumes a different format for a disk file, so that format change needs to be pushed out to the other definitions that touch the file.  I think this matters for the language we use towards the user: (a) so that we don't confuse them or give them unrealistic expectations, and (b) so that they don't get the impression that we're unrealistic about what types can do.  So, do we need some phrase other than "changeset is complete"?  "A changeset is committable if..."?


use and imports and branches


I'm a bit suspicious of the Carol and Acme prefixes in the name examples.  How many axes of information are we going to pack into these names?  Seems like we have 2 - functional hierarchy, and library provenance/ownership.  Having this info packed into dot-separated strings and enforced by convention doesn't feel entirely Unison-y to me.


I'm a bit suspicious of how, when we grab code from other codebases, we dump the names into our namespace (maybe with some prefixing).  Suppose we import 10 names under Acme (Acme.foo, Acme.bar etc).  It feels like we're losing some structure when we do that, because we've lost the semantic action we performed (a single grab of some code) and replaced it with 10 things.  OK, maybe there could be some metadata on those 10 things that helps us remember their source, but the higher-level operation has basically been forgotten.  (So presumably it is difficult to undo, to reify and inspect in a UI, etc.)

Compare with current barbaric practices: I assemble my app from my own code plus various libraries.  I have files which list the URLs and versions of those libraries.  When I want to change or remove one, I can do it by changing one line.  How does it work in Unison?


How does a library author decide what the 'public' names to export from their library are?  I think it will still be useful to show library users API functions more prominently than library-internal implementation-detail functions.


Imagine a world in which there are Unison codebases all over GitHub, and people have synced their working codebases to different subsets of those.  If different codebases use different prefixes for their various imports of each others' libraries, then are we going to end up with the same code existing in each codebase under more and more different names?  Your utils.leftpad is my Acme.import.3rdparty.leftpad, and when you and I share dependents of that code I end up with both names; and it keeps happening, with new names coming from new sources, and no way to prune or rationalise them.  What does that do to my coding UX?  (I'm thinking about the type-aware magic autocomplete in my semantic editor...)  Maybe it's OK because I set up my imports to present a pruned view of the mess of names in my codebase.  But that doesn't get rid of the confusion, it just moves it into import selection.

I'm basically trying to stress test the idea that there is one canonical namespace in which all the Names in the codebase live.  I have a suspicion that all namespace mechanisms end up supporting virtualisation (Linux namespaces, cabal sandboxes, VPNs...), and that it's morally right that they do, because the namespace is one of the objects in the user's domain model, rather than just a substrate for it.
So the alternative I'm trying to think through is that, as codebases grab stuff from each other, merge with each other and whatnot, they keep a structured description of the namespaces they've acquired and how those have been combined.  This would then be the same mechanism as use/import, so the user gets the same mental tools to think about which code they can see as they do to think about which code they give/receive/have.


It's the same idea I was grappling with when I pasted in a precursor of this proposal:
data Namespace = Empty
               | Bind Namespace (Map Name Code) Source_URL   -- overwrite some bindings
               | Inject Namespace Namespace Prefix           -- inject a branch at a prefix within another
               | Hide Namespace (Set Specifier)              -- hide some names
               | Restrict Namespace (Set Specifier)          -- hide all except some names
               | Rename Namespace (Map Name Name)            -- rename some names
               | RenamePrefix Namespace (Map Prefix Prefix)  -- rename some prefixes

data Specifier = NameSpec Name | PrefixSpec Prefix

namespace : Namespace -> Map Name Code -- fold the Namespace into an actual resultant name map

-- The Namespace type is used to represent:
-- * the set of names in the codebase
-- * the user's favourite use/import config for editing/viewing code
-- * the public API exposed by a library
-- * a set of names chosen to be shared with someone else (along with the associated Code)

-- The 'Source_URL' in Bind is just a sketch of how this type would track the origin of the 
-- imported names.  This allows the user to understand what's in the codebase, in a more
-- directly meaningful way than just sampling the Names and authors, and inferring what the prefixes are
-- meant to represent.
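
To check that the fold is at least definable, here is a runnable Haskell version of the sketch above.  The semantics are my assumptions, not something the type pins down: Bind lets the new bindings win, Inject puts the second namespace under the prefix inside the first, prefixes are plain string prefixes including the trailing dot, and when a rename makes two names collide one binding silently wins.  SourceURL, matches, renamePrefix and example are illustrative names, and the URL is a placeholder.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set
import Data.List (isPrefixOf, stripPrefix)

type Name      = String   -- dot-separated, e.g. "Acme.rad"
type Prefix    = String   -- assumed to include the trailing dot, e.g. "Acme."
type Code      = String   -- stand-in for a hash
type SourceURL = String

data Specifier = NameSpec Name | PrefixSpec Prefix
  deriving (Eq, Ord, Show)

data Namespace
  = Empty
  | Bind Namespace (Map.Map Name Code) SourceURL
  | Inject Namespace Namespace Prefix
  | Hide Namespace (Set.Set Specifier)
  | Restrict Namespace (Set.Set Specifier)
  | Rename Namespace (Map.Map Name Name)
  | RenamePrefix Namespace (Map.Map Prefix Prefix)

-- Does any specifier in the set select this name?
matches :: Set.Set Specifier -> Name -> Bool
matches specs n = any hit (Set.toList specs)
  where
    hit (NameSpec m)   = m == n
    hit (PrefixSpec p) = p `isPrefixOf` n

-- Rewrite a name using the first matching prefix mapping, if any.
renamePrefix :: Map.Map Prefix Prefix -> Name -> Name
renamePrefix ps n =
  case [new ++ rest | (old, new) <- Map.toList ps, Just rest <- [stripPrefix old n]] of
    (n' : _) -> n'
    []       -> n

-- Fold the structured description into a flat name map.
namespace :: Namespace -> Map.Map Name Code
namespace Empty                = Map.empty
namespace (Bind ns bs _)       = Map.union bs (namespace ns)  -- new bindings win
namespace (Inject ns sub p)    = Map.union (Map.mapKeys (p ++) (namespace sub)) (namespace ns)
namespace (Hide ns specs)      = Map.filterWithKey (\n _ -> not (matches specs n)) (namespace ns)
namespace (Restrict ns specs)  = Map.filterWithKey (\n _ -> matches specs n) (namespace ns)
namespace (Rename ns rs)       = Map.mapKeys (\n -> Map.findWithDefault n n rs) (namespace ns)
namespace (RenamePrefix ns ps) = Map.mapKeys (renamePrefix ps) (namespace ns)

-- A local binding, plus a library injected under "Acme." with its internals hidden.
example :: Namespace
example =
  Hide
    (Inject
      (Bind Empty (Map.fromList [("main", "h1")]) "local")
      (Bind Empty (Map.fromList [("rad", "h2"), ("internal.helper", "h3")]) "https://example.invalid/acme")
      "Acme.")
    (Set.fromList [PrefixSpec "Acme.internal."])
```

Here namespace example folds to just Acme.rad and main.  The point of keeping the structure is that the whole import is still a single Inject node, so dropping or re-pointing the library is one edit rather than a hunt for every Acme.* name.
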
Details


Re "Monoid Changeset" - what if c1 adds a definition, but c2 deletes it?  Is there meant to be a constraint that added and deprecated are disjoint?  Should this manifest as a conflict, or should the addition win?
I think Changeset.apply is out of date - it's missing a Just, and I can't see what substitutions and nameOf relate to.
"The rename happens instantly and is 100% accurate."  Does it also do the rename across the docs?
"She ignores the changeset, and perhaps decides to just rename the function to Carol.Imported.Acme.rad, effectively "forking" this single function, which she will maintain going forward."  This makes it sound like the act of renaming the function forks it somehow, preventing future changes to the definition from propagating to Carol.  But I don't think that's what you meant.
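
On the "Monoid Changeset" question above, a minimal Haskell sketch of the case I mean.  It models only two fields, assumes both are merged by plain set union, and contested is a hypothetical helper; the real Changeset has more in it.

```haskell
import qualified Data.Set as Set

type Hash = String

-- Hypothetical two-field slice of a changeset.
data Changeset = Changeset
  { added      :: Set.Set Hash
  , deprecated :: Set.Set Hash
  } deriving (Eq, Show)

instance Semigroup Changeset where
  Changeset a1 d1 <> Changeset a2 d2 = Changeset (Set.union a1 a2) (Set.union d1 d2)

instance Monoid Changeset where
  mempty = Changeset Set.empty Set.empty

-- Definitions that are both added and deprecated after a merge.
contested :: Changeset -> Set.Set Hash
contested c = Set.intersection (added c) (deprecated c)
```

If c1 adds a definition and c2 deprecates it, contested (c1 <> c2) is non-empty, so a union-based monoid on its own neither enforces disjointness nor says which side wins.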

Cosmetic


I think the 'dependency hell' bullet would ideally have a bit of extra explanation to make it more convincing.
s/names/metadata/ in Monoid Changeset