atacratic/type-decl-thoughts-3.md

## type-decl-thoughts-3.md

      
    Raw
  

              type-decl-thoughts-3.md
            
          
    In reply to Arya here...
structural -> nominal


I very much don't want the declared order of the constructors to matter

Let's just talk for now about how we treat code that uses some specific type (rather than about when different types are the same).  I think it would be OK for the Unison implementation to remember and use the order (or name) of the constructors to distinguish them, without those things being observable to Unison code.  So for example with
structural Decision = Yes | No
...the implementation could use the hash of the original name, or the order of declaration, to record which constructor a function has invoked or which pattern it's matching against.  That doesn't itself mean that names or orders are significant in the actual Unison semantics - Unison code has no way of observing it, and the Unison user can't observe it, even by running experiments involving changing names or orders.
I say this because I'm wondering if wanting to avoid doing this was one of your motivations for nominal types.
Now, all of the above wasn't saying anything about when two types are deemed the same.
structural Decision = Yes | No
structural Color = Black | White
We need to work out whether to display Decision or Color when rendering code that uses one of those, so either we store that as metadata (1.B from my post) or we make sure that the two types have different identity.  And that's then the motivation for introducing nominal types.
Is that your motivation too?
Name erasure

This section is me just trying to sort things out in my head, rather than comment or suggest anything.
We're keen to avoid names being too relevant to how things work - and too much name-relevance is a criticism of the nominal types idea.  But what exactly do we mean?
Should we talk in terms of 'name erasure'?
We've been thinking of the Unison world as having two layers - the core semantics of how things work, and the names layered on top of that, pointing down at the core stuff, but not affecting what code evaluates to.
We could express this as follows: 'in any Unison program, the choices of names are irrelevant - you can erase the names without affecting how the code typechecks or evaluates.'
There's already an interesting qualification around TDNR.  My understanding of that is that the resolution happens before the definition is added to the codebase, so in the ABT that gets saved, the name is fully resolved.  As long as I've got that right, then name erasure still holds - TDNR is an edit-time thing, and in the actual fully elaborated code, names are irrelevant.
I'd like names for these layers but am struggling to think of some.  For now I'll go with:
  'name layer'   |
---------------- | both together = 'edit-time Unison'
 'erased Unison' |

Identity: intrinsic, GUID, registry

Another getting-things-clear train of thought.
We haven't mentioned the third (most obvious!) way of giving something an identity.  The ways as I see them:

Determine it from intrinsic properties of the thing.
Create a GUID (really this is a magic trick, using the awesome power of huge random numbers)
Have an identity registry, i.e. a data store that contains a map from some kind of name or ID handle, to definitions.

Now, don't get me wrong, I don't think that having a registry of Unison types is a great idea.  I think we've been taking it as unspoken foundational assumption that the objective is for Unison to find a way to avoid having one.  But it would be good to unpack the reasons why.
Options for a registry

Have it as part of the codebase.  Codebases can pull types from each other, that ought to be the same, but aren't.  There needs to be a refactoring to replace one of the variants with the other wherever it occurs.
Have it as a global well-known service (stay with me here...)  People browse this service to find their type before creating a new one.  There needs to be a deprecation process for when duplicates are discovered.

Why don't we like this?
It goes against the basic Unison idea of identifying things without names, in order to make interactions between different nodes and different codebases easier.  As soon as you have namespaces to refer to things, you

have state (in a name table, which needs to exist in erased Unison not just at edit-time), so can't treat your codebase as an immutable data structure any more
have to qualify names by their version, and the scope of that version number - like if you were trying to uniquely identify a function in any library online today.

Nominal types via GUID

Using a GUID is a cunning way to avoid a name table and avoid state, but it still means a thing's identity can't just be calculated from its definition.  I'm unsettled by this but I can't put my finger on why.
Maybe it's just because it's a departure from the situation we have with functions.
Or maybe it's because it leaves it not quite clear whether identity is really determined just by stuff that exists in erased unison.  (And it's also not quite clear whether that property would be desirable.)
But I'm struggling to see a practical downside with it.  We need to make sure that the user has full control over the identities and isn't locked into accidents of initial choices.  I think that just means we need a refactoring to collapse two structurally identical types into one.
With 'nominal types option 2', you're using a hash of the initial name as the GUID.  That means that two people, working in different codebases at different times, end up with a type that is deemed the same just on the grounds of choosing the same name.  That's heading into 'duck typing' territory I reckon, and I think that's undesirable - people should only write code that resolves to some given entity (type/definition/whatever) if as they were editing it, the editor gave them every chance to check it was the one they really meant.  Being surprised by later type coincidences due to name collisions sounds dangerous.
And what (natural) language are these names even written in?  We want different people to be able to look at the same code in different languages right?  So just because they're spelt the same isn't to say they mean the same.
Is there a reason to prefer option 2 (hash initial name) over option 1 (GUID)?  Maybe just that we already have the initial name to hand in the current unison implementation?
Opaque type aliases

I think I'd need to hear more about how these work and the motivation.  Are they for data modularity?  Or are they an angle on type identity (throw the privileged set of code into the mix of what gets hashed to get the type identity).
If they're for modularity then how do they relate to other approaches to the problem?
If they're an angle on type identity, then again, can we put a finger on what's motivating us to look at them?  Is it the fact that the nominal/GUID solution is arguably making identity not just a function of stuff in the (name-erased) codebase?
I wonder whether there's a relationship between the applicative/generative distinction in ML-style modules, and what we're talking about here.  AIUI (and I won't try and pretend I'm not out of my depth), there is an issue around whether an ML functor, when applied to some module that contains an abstract type, creates a new abstract type in its resulting module 'each time' (generative) or the same one each time (applicative).  Maybe we could see your package of extra functions alongside a type as a functor being applied to the base type (the thing you're aliasing).  However I don't know what I'm talking about and don't know where this would lead anyway...  but it adds to my wish to understand this opaque types idea in the context of other approaches to modularity.