Strawman of a DOT calculus with higher-kinded types

DOT with higher-kinded types -- A sketch

Definitions

Syntax

Var         x, y, z
Term           s, t ::= x                 variable
                        λx:T.t            term abstraction
                        s t               term application
                        new(x: T)d        record creation
                        t.l               term member selection
Definition        d ::= { a = t }         term member
                        { A = T }         type member
                        d ∧ d             aggregate
Value*         v, w ::= λx:T.t
                        new(x: T)d
Path*          p, q ::= x
                        v
Type        S, T, U ::= ⊤                 top
                        ⊥                 bottom
                        S ∧ T             intersection
                        (x:S) → T         dependent function type
                        { a : T }         term member
                        { A :: K }        type member
                        μ(x: T)           recursive record type
                        p.A               type member selection
                        λx:S.T            type abstraction
                        S t               type application

Kind              K ::= (x:S) → K         dependent function kind
                        S..T              type interval kind

Context           Γ ::= ∅                 empty
                        Γ, x: T           extension

* = predicates on terms defining subsets of the terms (values and paths are not separate syntactic categories)

As is customary, we write S → T as a shorthand for non-dependent function types (and kinds), i.e. for (x: S) → T where x does not appear freely in T. We write * for the interval containing the entire type lattice, i.e. for ⊥..⊤.

Substitutions [x := p] of a variable x for a path p and their applications t[x := p] to terms and T[x := p] to types are defined in the usual way.

We make use of let-binders, which are desugared in the usual way,

⟦ let x: T = t₁ in t₂ ⟧ = (λx: T. ⟦ t₂ ⟧) ⟦ t₁ ⟧

omitting the explicit type annotation when it can be readily inferred.

As in DOT, parametric polymorphism is achieved through "boxing" and (path-)dependent types, e.g. universals are encoded as

∀X.T   =   (x: { A :: * }) → T[X := x.A]
ΛX.t   =   λx: { A :: * }. t[X := x.A]

Type operators are encoded similarly as

K₁ → K₂    =   { A :: K₁ } → K₂
λX: K. T   =   λx: { A :: K }. T[X := x.A]

Note that we can use dependent kinds to give very precise kinds to type operators:

λX: K. T  ::  (X: K) → T..T

As we'll see below, this allows us to admit β-reductions in types through a combination of kinding and sub-kinding.
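
The boxing encoding has a direct counterpart in Scala, using abstract type members and dependent method types. A minimal sketch (the names Box, polyId and IntBox are illustrative only, not part of the calculus):

trait Box { type A }

// ∀X. X → X, encoded as (x: { A :: * }) → x.A → x.A
def polyId(b: Box)(y: b.A): b.A = y

object IntBox extends Box { type A = Int }

val n: Int = polyId(IntBox)(42)  // instantiates "X" to Int via the box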

Reduction rules

Contractions:

(AppAbs)        (λx:T.t) v —→ t[x := v]

                d = … ∧ { a = t } ∧ …
(RecSel)        ------------------------------------
                (new(x: T)d).a —→ t[x := new(x: T)d]

Congruence:

                   t₁ —→ t₁'
(App₁)          ---------------
                t₁ t₂ —→ t₁' t₂

                    t —→ t'
(App₂)            -----------
                  v t —→ v t'

                    t —→ t'
(Sel)             -----------
                  t.a —→ t'.a
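
For example, (RecSel) substitutes the entire object for the self variable, so self-references in the selected member are resolved to the object itself. Taking T = { a : ⊤ → ⊤ } and d = { a = λz:⊤. x.a z }, a single contraction yields

(new(x: T)d).a   —→   λz:⊤. (new(x: T)d).a z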

Typing

Term typing

x : T ∈ Γ
--------- (T-Var)
Γ ⊢ x : T

Γ ⊢ S :: *    Γ, x : S ⊢ t : T
------------------------------ (T-→-I)
Γ ⊢ λx: S.t : (x: S) → T

Γ ⊢ t : (x: S) → T   Γ ⊢ p : S
------------------------------ (T-→-E₁)
Γ ⊢ t p : T[x := p]

Γ ⊢ t₁ : S → T   Γ ⊢ t₂ : S
--------------------------- (T-→-E₂)
Γ ⊢ t₁ t₂ : T

Γ, x: T ⊢ d : T
------------------------ (T-{}-I)
Γ ⊢ new(x: T)d : μ(x: T)

Γ ⊢ t : { a : T }
----------------- (T-{}-E)
Γ ⊢ t.a : T

Γ ⊢ p : T[x := p]
--------------------- (T-Fold)
Γ ⊢ p : μ(x: T)

Γ ⊢ p : μ(x: T)
--------------------- (T-Unfold)
Γ ⊢ p : T[x := p]

Γ ⊢ t: T    Γ ⊢ t: U
-------------------- (T-∧-I)
Γ ⊢ t: T ∧ U

Γ ⊢ t : S   Γ ⊢ S <: T
---------------------- (T-Sub)
Γ ⊢ t : T

Value/path typing is defined as the obvious restrictions of the above relation to values/paths.

The following rules for let-binders are admissible.

Γ ⊢ p : S   Γ, x : S ⊢ t : T
--------------------------------- (T-Let₁)
Γ ⊢ let x: S = p in t : T[x := p]

Γ ⊢ t₁ : S   Γ ⊢ t₂ : T
--------------------------- (T-Let₂)
Γ ⊢ let x: S = t₁ in t₂ : T
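
For instance, (T-Let₁) follows by desugaring the let-binder, assuming additionally that the annotation S is well-kinded:

Γ ⊢ S :: *    Γ, x: S ⊢ t : T
------------------------------ (T-→-I)
Γ ⊢ λx: S.t : (x: S) → T                  Γ ⊢ p : S
---------------------------------------------------- (T-→-E₁)
Γ ⊢ (λx: S.t) p : T[x := p]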

Definition typing

Γ ⊢ t : T
------------------------- (D-{:})
Γ ⊢ { a = t } : { a : T }

Γ ⊢ T :: K
-------------------------- (D-{::})
Γ ⊢ { A = T } : { A :: K }

Γ ⊢ d₁ : T₁     Γ ⊢ d₂ : T₂
--------------------------- (D-∧)
Γ ⊢ d₁ ∧ d₂ : T₁ ∧ T₂

Subtyping

---------- (T-Refl)
Γ ⊢ T <: T

Γ ⊢ S <: T    Γ ⊢ T <: U
------------------------ (T-Trans)
Γ ⊢ S <: U

Γ ⊢ T :: *
---------- (T-<:-⊤)
Γ ⊢ T <: ⊤

Γ ⊢ T :: *
---------- (T-⊥-<:)
Γ ⊢ ⊥ <: T

-------------- (T-∧-<:₁)
Γ ⊢ T ∧ U <: T

-------------- (T-∧-<:₂)
Γ ⊢ T ∧ U <: U

Γ ⊢ S <: T    Γ ⊢ S <: U
------------------------ (T-<:-∧)
Γ ⊢ S <: T ∧ U

Γ ⊢ S₂ <: S₁    Γ, x: S₂ ⊢ T₁ <: T₂
----------------------------------- (T-→-<:-→)
Γ ⊢ (x: S₁) → T₁ <: (x: S₂) → T₂

Γ ⊢ T₁ <: T₂
---------------------------- (T-{:}-<:-{:})
Γ ⊢ { a : T₁ } <: { a : T₂ }

Γ ⊢ K₁ <:: K₂
------------------------------ (T-{::}-<:-{::})
Γ ⊢ { A :: K₁ } <: { A :: K₂ }

Γ ⊢ S :: T₁..T₂
----------------- (T-..-<:)
Γ ⊢ T₁ <: S

Γ ⊢ S :: T₁..T₂
------------------ (T-<:-..)
Γ ⊢ S <: T₂

Kinding

---------- (K-⊤-F)
Γ ⊢ ⊤ :: *

---------- (K-⊥-F)
Γ ⊢ ⊥ :: *

Γ ⊢ T, U :: *
-------------- (K-∧-F)
Γ ⊢ T ∧ U :: *

Γ ⊢ S :: *    Γ, x: S ⊢ T :: *
------------------------------ (K-→-F)
Γ ⊢ (x: S) → T :: *

Γ ⊢ T :: *
------------------ (K-{:}-F)
Γ ⊢ { a : T } :: *

------------------- (K-{::}-F)
Γ ⊢ { A :: K } :: *

Γ, x: T ⊢ T :: *
---------------- (K-μ-F)
Γ ⊢ μ(x: T) :: *

Γ ⊢ T :: *
------------- (K-..-I)
Γ ⊢ T :: T..T

Γ ⊢ S :: *    Γ, x: S ⊢ T :: K
------------------------------ (K-→-I)
Γ ⊢ λx: S.T :: (x: S) → K

Γ ⊢ S :: (x: T) → K   Γ ⊢ p : T
------------------------------- (K-→-E₁)
Γ ⊢ S p :: K[x := p]

Γ ⊢ S :: T → K   Γ ⊢ t : T
-------------------------- (K-→-E₂)
Γ ⊢ S t :: K

Γ ⊢ p : { A :: K }
------------------ (K-Sel)
Γ ⊢ p.A :: K

Γ ⊢ T :: K₁    Γ ⊢ K₁ <:: K₂
---------------------------- (K-Sub)
Γ ⊢ T :: K₂

Sub-kinding

Γ ⊢ S₂ <: S₁    Γ ⊢ T₁ <: T₂
---------------------------- (K-..-<::-..)
Γ ⊢ S₁..T₁ <:: S₂..T₂

Γ ⊢ T₂ <: T₁    Γ, x: T₂ ⊢ K₁ <:: K₂
------------------------------------ (K-→-<::-→)
Γ ⊢ (x: T₁) → K₁ <:: (x: T₂) → K₂

A few points worth noting:

  • The selection subtyping rules from DOT are now admissible, e.g. we have

     Γ ⊢ p : { A :: S..T }
     --------------------- (K-Sel)
     Γ ⊢ p.A :: S..T
     --------------- (T-..-<:)
     Γ ⊢ S <: p.A
    
  • β-reduction in types is also admissible (as a subtyping rule), e.g. we have

     Γ ⊢ S :: *    Γ, x: S ⊢ T :: T..T
     --------------------------------- (K-→-I)
     Γ ⊢ λx: S. T :: (x: S) → T..T               Γ ⊢ p : S
     ----------------------------------------------------- (K-→-E₁)
     Γ ⊢ (λx: S. T) p :: (T..T)[x := p]
     ---------------------------------- (T-<:-..)
     Γ ⊢ (λx: S. T) p <: T[x := p]
    

    Note that this is not a new idea: Aspinall already noted in 1994 that β-reduction is admissible in systems with singleton types [1]. Type interval kinds can be understood as a generalization of singleton kinds [2]: clearly, every singleton {T}_* can be encoded as a singleton interval T..T. Most of the literature on the properties of systems with singleton types and kinds seems to use PER semantics [1, 2, 3], interpreting singleton types/kinds {T}_K as an equivalence class of types/kinds up to β(η)-equivalence.

  • Types in this system do not have normal forms in general. Consider e.g. the encoding of

     { type F[X] = F[F[X]] }
    

    which is expressible as

     new(z: { F :: (x: {A::*}) → z.F (z.F x) .. z.F (z.F x) })
         { F = λx:{A::*}. z.F (z.F x) }
    

    This is in contrast to the calculi with singleton kinds proposed by Stone and Harper [2], where type equivalence is shown to be decidable via comparison of WHNFs.
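
    Indeed, by (K-→-E₁) each application z.F q in the example above has the singleton kind (z.F (z.F q))..(z.F (z.F q)), so by (T-..-<:) and (T-<:-..) it is equivalent to z.F (z.F q), which is in turn equivalent to z.F (z.F (z.F q)), and so on: every unfolding exposes a further redex.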

  • The introduction rule for interval kinds (over proper types)

     Γ ⊢ T :: *
     ------------- (K-..-I)
     Γ ⊢ T :: T..T
    

    corresponds roughly to Aspinall's ({}-I) for singleton types [1],

     Γ ⊢ M : A
     ------------- ({}-I)
     Γ ⊢ M : {M}_A
    

    and Stone and Harper's introduction rule for singleton kinds [2]:

     Γ ⊢ A :: *
     -------------
     Γ ⊢ A :: S(A)
    
  • One might wonder why we don't give the "most precise" kinds to proper types in the first place, e.g. the kinding rule for the top type could be

     ------------- (K-⊤-F′)
     Γ ⊢ ⊤ :: ⊤..⊤
    

    At first glance, it seems like the original (K-⊤-F) rule could then be recovered via subsumption, since ⊤..⊤ <:: *. However, a derivation of this relationship already requires one for ⊤ :: * or ⊥ :: *, and so we end up with a circular dependency:

        ???
     ----------
     Γ ⊢ ⊤ :: *
     ---------- (T-⊥-<:)   ---------- (T-Refl)
     Γ ⊢ ⊥ <: ⊤            Γ ⊢ ⊤ <: ⊤
     -------------------------------- (K-..-<::-..)
     Γ ⊢ ⊤..⊤ <:: ⊥..⊤
    
  • It's questionable whether identifying * with ⊥..⊤ is a good idea. Alternatively, one could treat them separately and add sub-kinding rules to convert between them. At the very least one should have that ⊥..⊤ <:: *.

  • One could take the analogy to singleton types one step further and allow intervals A .._K B over kinds K other than *. E.g.

     (λx:A₁.B₁) .._(* → *) (λx:A₂.B₂)
    

    would be an interval over unary type operators. Note the kind annotation to make sure the bounds of the interval are of the same kind. This is closer in spirit to the way abstract, higher-kinded type members are handled in dotty, where one can write e.g.

     { type F <: List }
    

    (which the compiler η-expands to)

     { type F <: λX.List[X] }
    

    While it is unclear whether this would simplify or complicate the meta theory, it would likely make the existing theory from systems with singleton types more easily applicable in the DOT setting.

    Note that the more liberal, kind-annotated variant of intervals can be encoded in terms of the *-only variant. Again, this has already been noticed, for the case of singleton kinds, by Stone and Harper [2]:

     S .._*        T   =   S..T
     F .._(x: S)→K G   =   (x: S) → (F x) .._K (G x)
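
    As a worked instance, the dotty-style bound { type F <: λX.List[X] } from above can thus be expressed with *-only intervals, taking the lower bound to be the constant-⊥ operator (and up to an admissible β-step):

     ⊥ .._({A::*} → *) List   =   (x: {A::*}) → ⊥..(List x)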
    
  • This proposal originally contained the following two extra subtyping rules:

     Γ, x: S ⊢ T₁ <: T₂
     -------------------------- (T-λ-<:-λ)
     Γ ⊢ λx: S. T₁ <: λx: S. T₂
    
     Γ ⊢ S₁ <: S₂
     ---------------- (T-App-<:-App)
     Γ ⊢ S₁ t <: S₂ t
    

    However, as Martin pointed out in a comment to his gist (https://gist.github.com/odersky/36aee4b7fe6716d1016ed37051caae95), these rules might not add any expressivity in practice. Consider the following example taken from his comment:

     type C(x) <: Seq(x)
    

    is encoded as

     { C :: (x: {A::*}) → ⊥..(Seq x) }
    

    Then

     p.C y :: ⊥..(Seq y)
    

    Hence

     p.C y <: Seq y
    

    Although this idea does not seem to extend to the case of partial applications, e.g.

     type Map[K, V] <: GenMap[K, V]
     Map Int  <:  GenMap Int   ?
    

    such partial applications can only occur nested inside other subtyping derivations but never directly in (T-Sub), (T-<:-..) or (T-..-<:) since partially applied types necessarily have arrow-kinds.

Unsoundness

The following example illustrates that type safety does not hold for the proposed calculus (neither for the above version, nor for Martin's original proposal).

let x: { A :: { B :: x.A → ⊤..⊥ }..{ B :: x.A → ⊤..⊥ } } =
    new(x: { A :: { B :: x.A → ⊤..⊥ }..{ B :: x.A → ⊤..⊥ } })
           { A =  { B :: x.A → ⊤..⊥ } } in
let y: { B :: x.A → ⊤..⊥ } =
    new(y: { B :: (z: x.A) → (z.B z)..(z.B z) })
           { B =  λz: x.A. z.B z } in
let z: y.B y = () in
(z: ⊥)

To see that this example typechecks, let's first verify that the declared types of x, y and z match the types of their definitions.

  • For x, we have that x.A → ⊤..⊥ is a kind (syntactically), so we immediately get { B :: x.A → ⊤..⊥ } :: * by (K-{::}-F). From there, we get the declared type by applying the usual sequence of (K-..-I), (D-{::}), (T-{}-I) and (T-Unfold).

  • For y, we need to show that (z: x.A) → (z.B z)..(z.B z) <:: (z: x.A) → ⊤..⊥ (note the absurd bounds on the codomain). Let Γ = x: { A :: { B :: x.A → ⊤..⊥ }..{ B :: x.A → ⊤..⊥ } } and Γ′ = Γ, z: x.A, then by (T-Var) and (K-Sel)

     Γ′ ⊢ x.A :: { B :: x.A → ⊤..⊥ }..{ B :: x.A → ⊤..⊥ }
    

    by (T-Var), (T-<:-..) and (T-Sub)

     Γ′ ⊢ z : { B :: x.A → ⊤..⊥ }
    

    by (K-Sel)

     Γ′ ⊢ z.B :: x.A → ⊤..⊥
    

    by (T-Var) and (K-→-E₁)

     Γ′ ⊢ z.B z :: ⊤..⊥
    

    by (T-..-<:) and (T-<:-..)

     Γ′ ⊢ ⊤ <: z.B z <: ⊥
    

    by (K-..-<::-..) and (K-→-<::-→)

     Γ ⊢ (z: x.A) → (z.B z)..(z.B z) <:: (z: x.A) → ⊤..⊥
    
  • For z, it's sufficient to show that ⊤ <: y.B y. Let Γ′′ = Γ, y: { B :: x.A → ⊤..⊥ }, then by (T-Var) and (K-Sel)

     Γ′′ ⊢ y.B :: x.A → ⊤..⊥
    

    by (T-Var), (K-Sel) and (T-..-<:)

     Γ′′ ⊢ { B :: x.A → ⊤..⊥ } <: x.A
    

    by (T-Var) and (T-Sub)

     Γ′′ ⊢ y : x.A
    

    by (T-Var) and (K-→-E₁)

     Γ′′ ⊢ y.B y :: ⊤..⊥
    

    by (T-..-<:) and (T-<:-..)

     Γ′′ ⊢ ⊤ <: y.B y <: ⊥
    

It follows that y.B y has absurd bounds in a context that corresponds to a store. Equivalently, we can substitute x and y by their definitions to obtain a closed type with absurd bounds. In either case, the type lattice becomes trivial.

The example is a variant of the well-known technique for typing arbitrary lambda terms in a system with (iso-)recursive types (see e.g. TAPL, ch. 20, p. 273). Here, we lift this technique to the kind level to allow kinding arbitrary lambda expressions on the type level. Then we construct a divergent type expression and kind it with absurd bounds.

This unsoundness example makes crucial use of

  1. recursion in types (or rather in bounds of type members), and
  2. path-dependent type families.

The former is also present in Scala/dotty whereas the latter is not. The example therefore does not constitute a soundness issue in those languages. Since this soundness issue is specific to HK DOT, it's worth elaborating a bit on this point. The different sorts of arrows in HK DOT (and Scala/dotty) can be categorized according to the sort of their domain and codomain:

  1. (possibly path-dependent) function type (domain: type, codomain: type), inhabited by term-level lambdas/functions/methods,

    example in DOT

    f : (x: S) → T         where S ≠ { A :: K }
    f = λx: S. t
    

    examples in Scala/dotty

    def f(x: S): T = t
    val g: (x: S) => T = (x: S) => t
    
  2. universal quantifier/polymorphic type (domain: kind, codomain: type), inhabited by polymorphic terms/functions

    examples in DOT

    f : (x: { A :: K }) → T
    f = λx: { A :: K }. t
    

    examples in Scala/dotty

    def f[X]: T = t
    def f[X[_]](y: ...): T = t
    
  3. kind of path-indexed type families (domain: type, codomain: kind), inhabited by type families or path-indexed singleton kinds

    examples in HK DOT

    F :: (x: S) → K         where S ≠ { A :: K′ }
    F =  λx: S. T
    

    alternatively, using Martin's variant

    F <: all(x: S)K         where S ≠ { A : K′ }
    F =  all(x: S)J               J singleton, J <: K
    

    there is no counterpart of this in Scala/dotty

  4. kind of type operators, i.e. higher kind (domain: kind, codomain: kind), inhabited by type operators/functions/lambdas

    examples in HK DOT

    F :: (x: { A :: K′ }) → K
    F =  λx: { A :: K′ }. T
    

    in Martin's variant

    F <: all(x: { A : K′ })K
    F =  all(x: { A : K′ })J      J singleton, J <: K
    

    example in Scala/dotty

    F[X]       >: S <: T
    F[X, Y, Z] >: S <: T
    F[X[_]]    >: S <: T
    F[X]       = T
    

In other words, the kind language of both variants of HK DOT is strictly more expressive than that of Scala/dotty, which suggests that the soundness issue could be avoided by restricting kinds as follows:

K ::= (x: { A :: K }) → K′         type operator kind
      S..T                         type interval kind

This restriction is purely syntactic and remains "compatible" with Scala/dotty in that any (higher-kinded) abstract type definition expressible in Scala can still be encoded in this subset.
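
For example, the Scala definition type F[X] >: S <: T (with S and T proper types) is still expressible in the restricted grammar as

{ F :: (x: { A :: * }) → S′..T′ }

where S′ and T′ are the encodings of S and T with X replaced by x.A.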

References

  1. D. Aspinall, Subtyping with singleton types, CSL '94, LNCS 933, 1995. http://dx.doi.org/10.1007/BFb0022243
  2. C. A. Stone, R. Harper, Deciding Type Equivalence in a Language with Singleton Kinds, POPL '00. http://doi.acm.org/10.1145/325694.325724
  3. C. A. Stone, R. Harper, Extensional Equivalence and Singleton Types, ACM TOCL 7(4), 2006. http://doi.acm.org/10.1145/1183278.1183281
  4. N. Amin, S. Grütter, M. Odersky, T. Rompf, S. Stucki, The Essence of Dependent Object Types, WadlerFest, LNCS 9600, 2016. http://dx.doi.org/10.1007/978-3-319-30936-1_14
  5. N. Amin et al., The DOT Calculus. Additional material including a detailed type safety proof with an accompanying Coq formalization. http://wadlerfest.namin.net/
sstucki commented Dec 6, 2016

Ah, very interesting—(Refl) itself is "problematic" then. Do you model the fact that type selection, in Scala and other calculi, depends on object identity not object equality? It seems not.

No, this calculus does not distinguish between object identity and object equality. BUT, the answer to your question really depends on what you mean by "object equality" and "object identity". In particular, types in this calculus are still path-dependent (as opposed to fully dependent) and the distinction between applicative and generative functors does manifest itself. I will go into some detail below as to why I think this is the case. But first, I'd like to clarify the essential difference between this calculus (or rather the first-order fragment) and that described in the Wadlerfest paper (WF-DOT).

Unlike WF-DOT, this calculus follows the tradition of typed lambda calculi (and various extensions thereof) in that it is referentially transparent in the sense that any definition can be replaced by its value (via substitution) without changing the overall result of the program. In STLC, for example, the two programs (λx:T. f x x) λz:U.z and f (λz:U.z) (λz:U.z) yield the same result, and nobody would argue that x is a store location initialized by λz:U.z or that, in copying the expression λz:U.z twice, we have somehow created distinct values. Expressions (and values in particular) are considered identical if they are syntactically equal. Depending on the calculus in question, this might be further relaxed to identify β/η-equivalent expressions. Whether or not this is an appropriate "design decision" for a calculus that deals with "objects" is a different question, which I'm not going to address here (though I will argue that it has some practical advantages below). Be that as it may, a direct consequence of this "design decision" is that the path language (represented by the syntactic category p) has to be extended to also contain values (denoted by v). This is because paths can appear in types in the form of type member selections p.A: if x.A is a valid type expression, and we are serious about referential transparency, then x.A[x := v] = v.A must be a valid type expression as well.

In other words, the problem (if any) is not with the (Refl) rule but with the choice to include values in the path language.

Note that despite the added flexibility, important restrictions remain.

  1. Paths are still a subset of terms, in particular, (u v).A is not a path, nor is v.a.A. As a consequence, types remain path-dependent (albeit for a more relaxed notion of "paths").

  2. Two type selections v.A and u.A are comparable only if the values v and u are syntactically equal; no β/η-reductions are allowed: ({a = λx.(λy.y)x} ∧ {A = ⊤}).A and ({a = λx.x} ∧ {A = ⊤}).A are different.

Object identity can be modeled in small-step semantics without ANF, so why does this calculus not do that? That might be an acceptable simplifying assumption, maybe even a good one, but at least ML module people care a lot about the difference (and dealing with it properly tends to be pretty subtle), and I suspect Martin would care as well.

Indeed, he does. I have had many discussions with Martin about whether this is the right approach to formalizing Scala, and at present, we mostly agree that it essentially depends on how the calculus is to be used.

If you're interested in modeling Scala closely -- what happens in the compiler or at runtime -- then a WF-DOT, ANF-based, small-step formalization or an environment-based, OOPSLA'16-style, big-step formalization is probably what you want. In these calculi, variable names correspond to store/environment locations reflecting the familiar distinction between object identity and equality.

If you're interested in the more logical/type-theoretic aspects of the calculus, and about relating it (and its meta theory) to other typed lambda calculi, then a small-step formalization permitting substitutions in terms as well as in types is probably better suited. For example, developing a proof that an ANF-based WF-DOT simulates a substitution-based F-<: is a pain (in fact we don't have any such proof), whereas proving the simulation for a substitution-based variant of DOT is almost trivial (Nada mechanized it in Coq a while back).

In the end, I don't think either of the two approaches yields fundamental insights that cannot be gained from the other. On the contrary, I believe that switching between the two perspectives is more likely to reveal which features of the type system (and its meta theory) are accidental and which reflect deeper principles.

In ML, this shows up in the semantics of functors (generative vs applicative). And at least, you'd have to be pretty explicit on this change.

Functors remain generative in DOT, irrespective of the formalization. Remember that types are still path-dependent, so expressions of the form (f x).A are not allowed: f x needs to be let-bound, at which point different instances can no longer be identified. Consider

let f: μ(z: {A :: ⊥..⊤; a: z.A}) → μ(z: {B :: ⊥..⊤; b: z.B}) =
    λx: μ(z: {A :: ⊥..⊤; a: z.A}). new(z:_){B = x.A; b = x.a} in
let m: μ(z: {A :: ⊥..⊤; a: z.A}) = new(z:_){A = Int; a = 1} in
let x = f m in
let y = f m in
let z: x.B = x.b in   // Works.
let w: x.B = y.b      // Doesn't typecheck: x.B ≠ y.B
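
The same generativity is observable in plain Scala. A rough analogue (with illustrative names, not taken from the calculus):

trait Cell { type A; val a: A }

def mk(): Cell = new Cell { type A = Int; val a = 1 }

val x = mk()
val y = mk()
val ok: x.A = x.a      // Works: selection on the same path.
// val bad: x.A = y.a  // Doesn't typecheck: x.A and y.A are incomparable.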

In fact, this simpler example already fails in both calculi:

let m: μ(z: {A :: ⊥..⊤; a: z.A}) = new(z:_){A = Int; a = 1} in
let x = m in
let y: x.A = x.a in   // Works.
let z: x.A = m.a      // Doesn't typecheck: x.A ≠ m.A

In fact, even if we changed the last line to

let y: x.A = (new(z:_){A = Int; a = 1}).a  // Doesn't typecheck either.

it would still not type check. Only if we consistently replace x by its value do we get a typable program:

let y: (new(z:_){A = Int; a = 1}).A = (new(z:_){A = Int; a = 1}).a

I don't want to belabor this point unnecessarily, so I'm going to skip the remainder of your comments about object identity in Scala and other DOT versions vs. this one. But please let me know if any of the above remains unclear or dubious to you.

Regarding redundant rules, type C(x) <: Seq(x) and type Map[K, V] <: GenMap[K, V]; Map Int <: GenMap Int ?, could one extend (T-..-<:) to higher kinds? That's strictly a question about expressiveness, and it somehow fits with using singleton kinds to make beta-reduction admissible.

The extension of (T-..-<:) and (T-<:-..) to arrow-kinds is possible in principle but it requires the use of intervals over kinds other than * -- though these are readily encodable (as explained in the gist).

A more interesting question, IMO, is how you would ever use the fact that Map Int <: GenMap Int, other than to show that Map Int T <: GenMap Int T, which is already covered by the current (T-..-<:) and (T-<:-..) rules.

A potential concern, both there and in beta-reduction (or maybe only here), is that since arguments are not types but full objects, in general asking questions on partially applied types might be similar to reducing under binders. Beta-reduction is safer since it has an argument, but would it be sound to have (T-λ-<:-λ) or to make it admissible? Imagine S has bad bounds; that rule asks whether Γ, x: S ⊢ T₁ <: T₂, so T₁ <: T₂ might well be an absurd conclusion. Maybe that's sound because to exploit this absurdity we'd have to apply λx: S. T₁ to a non-existent argument for x? Would that break if we extend (T-..-<:) to higher kinds?

Yes, indeed, the same holds for the (K-→-<::-→) and (T-→-<:-→) rules, which, in combination with singleton types, already allow "reductions" under a binder. You are right that the resulting subtyping/-kinding relations can be absurd, but they do not constitute a soundness issue since they cannot escape the binder.

In fact, I've just seen, in Harper's PFPL, that usual systems can encode singletons for functions just by having singletons for kind *, which is pretty cool (though the relevant proofs are subtle). It's probably the same as what you do here too (and what Aspinall pointed out), but Harper's presentation is insightful and clear.

Yes, as mentioned in the gist, Stone and Harper already describe this in their papers on singleton types and kinds [2, 3], though I have not read the relevant chapter in PFPL. I should probably have a look at that.

Blaisorblade commented:

No, this calculus does not distinguish between object identity and object equality.

Agreed then. I assumed you'd know this but was surprised you didn't mention it—but Martin knew what you meant by "not using ANF".

In other words, the problem (if any) is not with the (Refl) rule but with the choice to include values in the path language.

Oh. I thought OOPSLA had values in the path language without full (Refl), but I was confused.

Let me try to focus my speculations on real problems then: Map Int <: GenMap Int and hereditary substitution.

A more interesting question, IMO, is how you would ever use the fact that Map Int <: GenMap Int, other than to show that Map Int T <: GenMap Int T, which is already covered by the current (T-..-<:) and (T-<:-..) rules.

Well, what if I try to encode def f[K <: GenMap Int] = ...; f[Map Int]? In fact, it seems I currently won't need Map Int <: GenMap Int, but Map Int <: λ x :: *. GenMap Int x, and inspecting your subtyping rules suggests that's unprovable. No rule is applicable except transitivity (which defers the problem) or (T-<:-..) and (T-..-<:) (which don't apply at kind * -> *).
The reason is that encoding K <: GenMap Int requires an interval of non-* kind, which must currently be encoded using eta-expansion. However, type definitions (in f's argument) need not be eta-expanded. Maybe (D-{::}) should be modified to force eta-expanded definitions, like (apparently) Dotty seems to do internally? Or maybe the rules should just be extended to handle this case somehow—right now, it seems I can't even prove F <: λ x :: *. F x or vice versa.

BTW, an algorithmic check for Map Int <: λ x. GenMap Int x might need kind-directed eta-expansion, which is sort of obvious but I've never heard of it. I have seen eta-expansion for equivalence checking (ATTAPL Ch. 6), but not for subtyping (though it can arise the same way). Of course google disagrees and points me to https://www.researchgate.net/publication/221557946_Syntactic_Metatheory_of_Higher-Order_Subtyping (which also uses HSubst, though on types, and mentions the barrier below).

==

Though, maybe surprisingly, not to normalize type expressions but to "normalize" subtyping derivations. [...]

Of course, all of this hinges on there actually being a suitable termination measure for this edifice of mutually recursive normalization/admissibility proofs, but that's where I believe thinking in terms of hereditary substitution may help.

I admit there are a lot of "if"s and "but"s and it's all still rather vague and abstract at this point. But at least it's a starting point...

Great, you have a research problem and an attack!

I wondered about step-indexing here: it would allow the mutual recursion to proceed, but it wouldn't ensure the whole process terminates, so it's clearly not enough by itself...

Hereditary substitution (HSubst) might be a cool inspiration. This suggests maybe the best idea of this comment: have you tried adding proof terms to the subtyping rules, and phrasing reduction rules for those? That's maybe just notation, but it might help all those that get beta-reduction better than cut-elimination. (In fact, we even get to look informally at the "untyped" reduction rules if we want, though we probably shouldn't formalize those). The result might be of interest beyond Scala!

To this end, though, you might need to internalize dependency into assumptions using a term-level lambda, so that you can exhibit (λ x: S. < proof term > : T <: U): (∀ x: S. T <: U). But that would probably complicate things...

Then, can you explain the "leftover" transitivity/subsumption in a way similar to variables (and spines) in neutral terms?

Back to hereditary substitution: It does prove (weak? strong?) normalization for STLC ("Hereditary Substitutions for Simple Types, Formalized" details how) without logical relations.

But HSubst has limited power, though it might be enough: since (IIUC) hereditary substitution can be formalized in Peano Arithmetic, and since it proves strong normalization (hence, with a few provisos, consistency) of the object language, hereditary substitution alone can't work for type systems that extend Peano Arithmetic, such as System T, or System F (which proves second-order Peano Arithmetic consistent). I have no clue where the system you have sits, but maybe it's too powerful? But subtyping proofs don't have lambdas and subtyping lacks internalized implication, and maybe it can't be encoded suitably and this is all irrelevant.

There you need reducibility candidates or logical relations, which go beyond Peano Arithmetic. This point is made in "Proofs and Types", in the intros of Chapters 6 and 15, though I don't fully get all those intros. I've just seen it again mentioned in http://www.twelf.org/slr/.
Proofs and Types also doesn't actually mention hereditary substitution (it hints at "proof theoretic techniques" and I guess that's what they mean) or logical relations (but it mentions Tait '67).

If the barrier applies, it would suggest considering logical relations for transitivity elimination, but I've never heard of those. Parroting termination of STLC/System F, they'd be like "this subtyping proof is itself normalizing, and even the result of applying it to an assumption normalizes". What's confusing is that the property, it seems, would not be defined by induction on some "proof type", since you don't have internal abstraction on assumptions (unless you add lambdas, as sketched above), but by induction on the context. That's not novel though: logical relations are defined on closed terms usually, but they can be extended to open ones, and PFPL's chapter on "Equational Reasoning for T" (either edition) does exactly that, by defining open logical equivalence.

sstucki commented Jan 24, 2017

TLDR:

  1. HK DOT is unsafe (though there might be a fix),
  2. even with the fix, HSubst is (probably) too weak to prove safety.

A slightly more comprehensive summary with links to the relevant sections:

  1. Subtyping partial type applications.
  2. Proof terms for subtyping derivations, why we should expect them to be normalizing, and how this all relates to inversion, canonical forms and type safety.
  3. The soundness/safety counter example revisited: an illustration of how non-terminating coercion terms break safety.
  4. Why impredicativity of HK DOT spells trouble for a HSubst-based proof.
  5. Some thoughts on a proof via logical relations.
  6. An outlook on future work.

But let's start from the beginning...

No, this calculus does not distinguish between object identity and object equality.

Agreed then. I assumed you'd know this but was surprised you didn't mention it—but Martin knew what you meant by "not using ANF".

Ah yes, that was a rather imprecise distinction indeed, good point. I'll make a mental note to be more clear about this point in future write-ups.

BTW: I recently re-read parts of the TAPL chapter on Featherweight Java (ch. 19, pp. 247), while preparing for TA-ing an exercise on FJ, and realized that it uses the same type of substitution-based semantics as I do above. In particular, there is also no distinction between object identity and object equality: the reduction rules for field selection and method invocation operate directly on objects.

1. Subtyping partial type applications

Let me try to focus my speculations on real problems then: Map Int <: GenMap Int and hereditary substitution.

A more interesting question, IMO, is how you would ever use the fact that Map Int <: GenMap Int, other than to show that Map Int T <: GenMap Int T, which is already covered by the current (T-..-<:) and (T-<:-..) rules.

Well, what if I try to encode def f[K <: GenMap Int] = ...; f[Map Int]?

Yes, initially I was worried as well that partially applied types might appear in bounds of polymorphic functions, and thus, via transitivity, anywhere in subtyping derivations. But once you de-sugar the above signature into DOT, the partial application disappears. In DOT, the signature of f would be

(k: {A :: (x: {A::*}) → ⊥..(p.GenMap int x)}) → T

where A is a dummy label, p is the module containing the abstract type GenMap, and int is a "boxed" variant of Int. Note the "η-expansion" of GenMap! This is necessary because

  1. type parameters have to be "boxed" and their upper/lower bounds have to be translated into upper/lower bounds of the corresponding abstract type (here A),
  2. the calculus only supports type intervals over proper types, so type intervals over type operators have to be de-sugared as described above (note that the definition of S .._K T performs an η-expansion so higher-order type bounds are effectively "η-long").

As a consequence, every occurrence of a type operator in a bound is necessarily fully applied in the kind corresponding to the "type box".

The reason is that encoding K <: GenMap Int requires an interval of non-* kind, which must be currently encoded using eta-expansion.

Exactly.

However, type definitions (in f's argument) need not be eta-expanded. Maybe (D-{::}) should be modified and force eta-expanded definitions, like (apparently) Dotty seems to do internally?

I think you're onto something there. We want all (type-)definitions to have singleton kinds, so the typing rules above should be adjusted (Martin's original proposal actually enforced this restriction). There are good meta-theoretic reasons for requiring definitions to have singleton kinds, and incidentally, this would force definitions to be η-expanded. Consider the following definition

App[F[_]] = F

which we can either interpret as the identity operation on unary type operators or as a "curried" variant of type application, i.e. App[F][X] = F[X]. This type definition is actually not valid in Scala (I don't know whether it is in dotty). Instead, one has to η-expand and un-curry manually, like so:

App[F[_], X] = F[X]

Now in HK DOT (as outlined above) one may encode the former as

App :: {F :: {A :: *} → *} → {A :: *} → *
App =  λx: {F :: {A :: *} → *}. x.F

Note that the kind of App is certainly not a singleton, so according to Martin's original proposal, this wouldn't be valid. On the other hand, the η-expanded version can be given a singleton kind:

App :: (x: {F :: {A :: *} → *}) → (y: {A :: *}) → (x.F y)..(x.F y)
App =  λx: {F :: {A :: *} → *}. λy: {A :: *}. x.F y

I believe this generalizes to arbitrary type operators.

To summarize, enforcing singleton kinds for all type definitions ensures all such definitions are η-long, which is also in accordance with Scala.

Or maybe the rules should just be extended to handle this case somehow—right now, it seems I can't even prove F <: λ x :: *. F x or viceversa.

No, η-conversion is not admissible. But η-conversion only makes sense for type operators, and following the same reasoning as above, you should never need to compare type operators that are not fully applied. Note that fully applied variants of your example operators are comparable.

Γ ⊢ λ x :: *. F x :: (x :: *) → (F x)..(F x)    ...
--------------------------------------------------- (K-→-E₁)
Γ ⊢ (λ x :: *. F x) p :: (F p)..(F p)
------------------------------------- (T-..-<:)
Γ ⊢ F p <: (λ x :: *. F x) p

BTW, an algorithmic check for Map Int <: λ x. GenMap Int x might need kind-directed eta-expansion, which is sort of obvious but I've never heard of it. I have seen eta-expansion for equivalence checking (ATTAPL Ch. 6), but not for subtyping (though it can arise the same way). Of course google disagrees and points me to https://www.researchgate.net/publication/221557946_Syntactic_Metatheory_of_Higher-Order_Subtyping (which also uses HSubst, though on types, and mentions the barrier below).

Indeed. That's a nice paper BTW (I guess most of Abel's papers are). His paper on polarized subtyping, which I mentioned earlier [6], also makes essential use of hereditary substitution (though he doesn't mention it explicitly, IIRC).

2. Proof terms for subtyping derivations and normalization

Hereditary substitution (HSubst) might be a cool inspiration. This suggests maybe the best idea of this comment: have you tried adding proof terms to the subtyping rules, and phrasing reduction rules for those?

I certainly have!

That's maybe just notation, but it might help all those that get beta-reduction better than cut-elimination. (In fact, we even get to look informally at the "untyped" reduction rules if we want, though we probably shouldn't formalize those). The result might be of interest beyond Scala!

I agree that this is of more general interest, and I think that we should formalize the untyped reduction rules! Firstly, they provide a sort of formal specification for the normalization proof we're looking for, and secondly, we can prove a variant of the type safety theorems for the new calculus. Namely

  1. that reductions of terms (and the coercions they contain) preserve types, and
  2. that closed, well-typed (coercion) terms can either be reduced or are values.

Intuitively, a coercion value is a coercion that is closed and normal/canonical, and that therefore corresponds to a subtyping derivation that can be inverted. Indeed, this latter point can be made formal by a handful of corresponding canonical forms lemmas (for coercions). I have sketched such a type safety proof for a variant of the system above (without recursion or records) and there don't seem to be any major difficulties. I believe that extending the proof to the full system would increase the technical complexity but wouldn't introduce any new conceptual difficulties.

These type-safety theorems do not, of course, guarantee that type safety holds for the original calculus, because coercion terms may be non-normalizing, in which case either a) the calculus is unsafe (see example below) or b) our reduction strategy is flawed. In either case, the formalization allows us to make the inherent "dynamics" of the normalization proof explicit, and — as you already mentioned — should give some insights about the proof-theoretic strength of the logic corresponding to the subtyping fragment.

Finally, it allows us to interpret the final type safety proof as a proof by (weak) simulation: the extended calculus (with explicit coercions) weakly simulates the original calculus by interleaving finite sequences of "silent" coercion reductions with the reductions of the "bare" terms in the original calculus. The simulation relation is simply the function that maps terms in the extended calculus to their bare counterparts by erasing all coercions.

To this end, though, you might need to internalize dependency into assumptions using a term-level lambda, so that you can exhibit (λ x: S. < proof term > : T <: U): (∀ x: S. T <: U). But that would probably complicate things...

Then, can you explain the "leftover" transitivity/subsumption in a way similar to variables (and spines) in neutral terms?

I don't understand your example. Could you elaborate a bit? How would you use a term with this particular signature?

It is definitely true though that the coercion language does need to internalize dependencies. E.g. the language I introduce below does have binders and reduction rules for eliminating them.

There is also plenty of literature where subtyping rules in some language are elaborated into explicit coercion terms in a subtyping-free fragment of the same language. But this isn't what we want here, because it would blur the clear separation between the coercion language, which we need to be normalizing, and the original term language, which might not be normalizing (it certainly isn't here). Again, I'm going to say a bit more about this below.

Back to hereditary substitution: It does prove (weak? strong?) normalization for STLC ("Hereditary Substitutions for Simple Types, Formalized" details how) without logical relations.

Another very nice paper (and a source of inspiration for my attempt to use HSubst here). BTW, are you aware of a similar paper that handles STLC with products?

But HSubst has limited power, (...), hereditary substitution alone can't work for type systems that extend Peano Arithmetic, such as System T, or System F (...). I have no clue where the system you have sits, but maybe it's too powerful?

Indeed, the expressiveness/proof-theoretic power of the HK DOT system above (and in particular the (sub-)typing fragment) is too powerful for HSubst, as exemplified by the above safety counter example — in fact, this is how I found the counter example in the first place. But there are even simpler examples of why HSubst doesn't work "out of the book" here, and these apply even to the "fixed" version of HK DOT where the safety counter example doesn't apply. I'll get back to this later — first, let's have a look at the counter example to get a feeling for what a non-terminating subtyping coercion looks like!

3. Non-termination of the soundness/safety counter example

Since path-indexed type families (plus recursion) are powerful enough to type arbitrary (untyped) lambda calculus expressions on the type level, types are clearly not normalizing. However, we're not trying to normalize types, but rather (sub-)typing derivations, and as you further note,

subtyping proofs don't have lambdas and subtyping lacks internalized implication, and maybe it can't be encoded suitably and this is all irrelevant.

Unfortunately, they can be encoded suitably.

In what follows, I'll use a new syntax for paths and types extended with explicit subtyping and subkinding coercions:

   p, q ::= ...
         |  unfold(p)       (T-Unfold)
         |  p ↑ α           (T-Sub)

S, T, U ::= ...
         |  s(T)            (K-..)
         |  T ⇑ Φ           (K-Sub)

α, β, γ ::= id_T            (T-Refl)
         |  α ; β           (T-Trans)
         |  {A::Φ}          (T-{::}-<:-{::})
         |  ⟨T|             (T-..-<:)
         |  |T⟩             (T-<:-..)
         |  ...

   Φ, Ψ ::= α..β            (K-..-<::-..)
         |  (x:T.Φ)^α       (K-→-<::-→)

where I have indicated the corresponding (sub-)typing and kinding rules. For lack of a better name, let's call the original calculus the bare language (BL) and the extended calculus with explicit coercions the extended language (EL). Suggestions for better terminology are welcome...

When presenting (sub-)typing and kinding rules/derivations, I will use the EL for "proof terms", but the BL for the associated types and kinds. There is an obvious erasure function ⌞-⌟ that, given a path p, type T or kind K in the EL, yields the underlying bare path ⌞p⌟, type ⌞T⌟ or kind ⌞K⌟, respectively, in the BL. E.g. the updated subtyping rule for (T-..-<:) is

Γ ⊢ S :: T₁..T₂
------------------- (T-..-<:)
Γ ⊢ ⟨S| : T₁ <: ⌞S⌟

Note that the conclusion of this rule is still a subtyping judgment, albeit one with an explicit proof term ⟨S|. I.e. subtyping is now a four-place relation Γ ⊢ α : S <: T, relating a coercion α to its (bare) upper and lower bounds S and T in a typing context Γ. Importantly, <: remains part of the relation, i.e. it is not a type operator here. This is a design choice to simplify erasure of coercions from subtyping derivations (it suffices to just drop the proof term) and contrasts with systems that elaborate (or translate) subtyping judgments into a term-language (e.g. through a relation Γ ⊢ S <: T ⇝ α) where coercions have proper types (e.g. Γ ⊢ α : S → T). Still, in designing the reduction rules, it is of course useful to think of coercions as maps α : S → T from terms of their lower bound (i.e. the domain) S to terms of their upper bound (i.e. the codomain) T.
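
To make the erasure concrete, here is a toy Scala sketch of the path-level fragment (constructor names are mine; values are omitted and only two of the coercion forms are shown):

sealed trait Coercion                                        // α, β, γ
case class Id(ty: String) extends Coercion                   // id_T   (T-Refl)
case class Comp(a: Coercion, b: Coercion) extends Coercion   // α ; β  (T-Trans)

sealed trait Path                                            // EL paths p, q
case class Var(name: String) extends Path
case class Unfold(p: Path) extends Path                      // unfold(p)  (T-Unfold)
case class Up(p: Path, c: Coercion) extends Path             // p ↑ α      (T-Sub)

// ⌞-⌟ on paths: strip all coercion markers, leaving a bare BL path.
def erase(p: Path): Path = p match {
  case Var(x)    => Var(x)
  case Unfold(q) => erase(q)  // (T-Unfold) leaves no trace in the BL
  case Up(q, _)  => erase(q)  // neither does subsumption
}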

Semantically, we treat elements of the BL as uninterpreted constants and reason only about the dynamics of the new extensions in the EL: coercions and the associated instances of subsumption. In this context, the sub-kinding rule (K-→-<::-→) takes the role of lambda abstraction,

Γ ⊢ α : T₂ <: T₁        Γ, x: T₂ ⊢ Φ :: K₁ <:: K₂
------------------------------------------------- (K-→-<::-→)
Γ ⊢ (x: T₁. Φ)^α :: (x: T₁) → K₁ <:: (x: T₂) → K₂

internalizing implication in its arrow kind bounds (x: T₁) → K₁ <:: (x: T₂) → K₂. The kinding rule (K-→-E₁) along with subsumption (K-Sub) on the first premise act as "application"/implication elimination, i.e.

Γ ⊢ Ψ : (x: T₁) → K₁ <:: (x: T₂) → K₂
Γ ⊢ S :: (x: T₁) → K₁
-------------------------- (K-Sub)
Γ ⊢ S ⇑ Ψ :: (x: T₂) → K₂                Γ ⊢ p : T₂
--------------------------------------------------- (K-→-E₁)
Γ ⊢ (S ⇑ Ψ) p :: K₂[x := ⌞p⌟]

We can "contract" this (sub-)kinding derivation by pushing the subsumption down the derivation tree through the "application". The corresponding reduction rule is

(T ⇑ (x: S. Φ)^α) p   —→   (T (p ↑ α)) ⇑ Φ[x := p]

Note the similarity to β-reduction! It's easy to verify that this reduction preserves both

  1. the erased type expression (which remains T p), and
  2. its kind (which remains K₂[x := ⌞p⌟]),

assuming the following substitution lemma holds:

Γ ⊢ p : T             Γ, x: T ⊢ Φ :: K₁ <:: K₂
----------------------------------------------
Γ ⊢ Φ[x := p] :: K₁[x := ⌞p⌟] <:: K₂[x := ⌞p⌟]

Returning to the above safety counter example, if we try and reduce the (sub-)typing derivation for z : ⊥, we encounter exactly this situation. As before, let

Γ = x: { A :: { B :: x.A → ⊤..⊥ }..{ B :: x.A → ⊤..⊥ } }
Γ′ = Γ, z: x.A
Γ′′ = Γ, y: { B :: x.A → ⊤..⊥ }

then

Γ′ ⊢ z ↑ |x.A⟩ : { B :: x.A → ⊤..⊥ }
Γ′ ⊢ (z ↑ |x.A⟩).B z :: ⊤..⊥

and letting p = (z ↑ |x.A⟩).B z

Γ′ ⊢ ⟨p| : ⊤ <: z.B z
Γ′ ⊢ ⟨p|..|p⟩ :: (z.B z)..(z.B z) <:: ⊤..⊥
Γ ⊢ (z: x.A. ⟨p|..|p⟩)^id :: (z: x.A) → (z.B z)..(z.B z) <:: (z: x.A) → ⊤..⊥

Now let

Φ = (z: x.A. ⟨p|..|p⟩)^id
v = unfold(new(x: { A :: { B :: x.A → ⊤..⊥ }..{ B :: x.A → ⊤..⊥ } })
                  { A =  { B :: x.A → ⊤..⊥ } })
w = unfold(new(y: { B :: (z: x.A) → (z.B z)..(z.B z) })
                  { B =  λz: x.A. z.B z })
q = w ↑ { B :: Φ }

i.e. v and q are exactly the "definitions" of x and y, respectively, and ⌞q⌟ = w. Furthermore, we have

Γ′′ ⊢ y.B :: x.A → ⊤..⊥
Γ′′ ⊢ ⟨x.A| : { B :: x.A → ⊤..⊥ } <: x.A
Γ′′ ⊢ y ↑ ⟨x.A| : x.A
Γ′′ ⊢ y.B (y ↑ ⟨x.A|) :: ⊤..⊥
Γ′′ ⊢ ⟨y.B (y ↑ ⟨x.A|)|;|y.B (y ↑ ⟨x.A|)⟩ : ⊤ <: ⊥

This last coercion is the one we'd like to normalize. To do so, we start by substituting the definitions v and q for x and y (otherwise the coercion is already in normal form). We get

Γ ⊢ ⟨q.B (q ↑ ⟨v.A|)|;|q.B (q ↑ ⟨v.A|)⟩ : ⊤ <: ⊥                   (1)

Expanding a bit and focusing on the left instance of (T-..-<:), we have

Γ ⊢ ⟨(w ↑ { B :: Φ }).B (q ↑ ⟨v.A|)| : ⊤ <: w.B w

and we can apply the following reduction rule

(p ↑ { A :: Φ }).A   —→   p.A ⇑ Φ

to push subsumption through type member selection.

Γ ⊢ ⟨(w.B ⇑ Φ) (q ↑ ⟨v.A|)| : ⊤ <: w.B w

Note that the underlying bare type is again preserved by the reduction rule. Now we are ready to apply the β-like reduction rule discussed earlier.

⟨(w.B ⇑ Φ) (q ↑ ⟨v.A|)|
    =   ⟨(w.B ⇑ (z: v.A. ⟨p|..|p⟩)^id) (q ↑ ⟨v.A|)|
    —→  ⟨(w.B ((q ↑ ⟨v.A|) ↑ id) ⇑ ⟨p|..|p⟩[z := q ↑ ⟨v.A|]|
    =   ⟨(w.B ((q ↑ ⟨v.A|) ↑ id) ⇑ ⟨p′|..|p′⟩|

where p′ = p[z := q ↑ ⟨v.A|]. Expanding p′, we get

p′  =   p[z := q ↑ ⟨v.A|]
    =   ((z ↑ |v.A⟩).B z)[z := q ↑ ⟨v.A|]
    =   ((q ↑ ⟨v.A|) ↑ |v.A⟩).B (q ↑ ⟨v.A|)

which looks suspiciously similar to the left half of (1) above. Indeed, if we reduce p′ further we get

p′  =   ((q ↑ ⟨v.A|) ↑ |v.A⟩).B (q ↑ ⟨v.A|)
    —→  (q ↑ ⟨v.A|;|v.A⟩).B (q ↑ ⟨v.A|)
    —→  (q ↑ id).B (q ↑ ⟨v.A|)
    —→  q.B (q ↑ ⟨v.A|)

which is exactly the subexpression of (1) we started from. It's easy to convince oneself that these last three steps correspond to sensible reductions too (there are some subtleties involved in formulating the rule for the second step in general, but the instance used in this example is straightforward).

In summary, we are left with a coercion expression that is strictly bigger than the one we started from and which contains the core of the original coercion as a sub-expression.

⟨q.B (q ↑ ⟨v.A|)|
    —→*  ⟨(w.B (q ↑ ⟨v.A|)) ⇑ ⟨q.B (q ↑ ⟨v.A|)|..|q.B (q ↑ ⟨v.A|)⟩|
    —→*  ...

So we are forced to conclude that this sequence of reductions diverges, and hence no (total) HSubst function can exist for this coercion calculus in general.

But, as stated above, I believe this situation can be remedied by restricting the kind language to forbid path-indexed families of types.

4. Impredicativity troubles for HSubst

So let's turn to the question of whether HSubst is powerful enough for proof normalization of the "fixed" coercion calculus. In order to use HSubst, we first need to define a language of "canonical" (or normal) path and type expressions. Such a definition is implicit in the reduction rules for coercions, but we need a direct, syntactic definition, possibly typed [A5] or untyped [A1, A3, A4]. This is cumbersome, but there are no real conceptual difficulties.

Next, one has to define HSubst on canonical paths and types as well as on canonical path and type coercions, and then show that the type of the substitute path in any "nested" use of HSubst decreases, for some well-founded order on types. Since kinds can appear inside types and vice versa, we actually need a combined order that covers both types and kinds. The essential difficulty here is "impredicativity". Just as it is difficult to define a useful well-founded order on types in an impredicative calculus such as System F, it is difficult to define one on types and kinds in HK DOT. In particular, for HSubst to work, we need the kind of a type application to be smaller than the kind of the corresponding type operator, i.e. given F :: (x: S) → K and p : S, we would like

K[x := p]   ≺   (x: S) → K

to hold, where _≺_ is our candidate order. If variables were not allowed in kind expressions (as e.g. in Fω<: [A1]), we could just take _≺_ to be the sub-expression order:

K[x := p]   =   K   ≺   (x: S) → K

but obviously this does not work for HK DOT. Since paths, types and kinds may all appear circularly inside each other, we can pick e.g. K = ⊥..x! and p = { { ⊥..(x: S) → K } }, and get

K[x := p]   =   ⊥..{ { ⊥..((x: S) → K) } }!   ≻   (x: S) → K

which is the opposite of what we want. Note, firstly, that the above example still applies to the "fixed" kind language, where only arrow kinds of the form (x: { A :: K₁ }) → K₂ are allowed and, secondly, that the inclusion of kinds in the path language is crucial here. The mere presence of variables in kinds (or types) is not necessarily a problem. For example, Harper and Licata's "Canonical LF" [A4] has dependent types, but since the system is stratified, i.e. kinds and types can never appear in terms, any terms appearing in types (or kinds) can simply be ignored by _≺_ (concretely, Harper and Licata first define a forgetful map from types to "simple types", forgetting any term occurrences, then define a well-founded order on simple types instead).
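
Before diving in, it may help to make the two erasures |-| and ⌊-⌋ used below concrete on a cut-down syntax. A minimal Scala sketch (datatypes and names are mine; values are kept opaque):

sealed trait Path
case class PVar(name: String) extends Path                  // x
case object Hole extends Path                               // "?", an erased path
case class Obj(defs: String) extends Path                   // values, kept opaque

sealed trait Ty
case class Sel(p: Path, a: String) extends Ty               // p.A
case class TApp(s: Ty, p: Path) extends Ty                  // S p
case class Box(a: String, k: Kd) extends Ty                 // { A :: K }

sealed trait Kd
case class Interval(lo: Ty, hi: Ty) extends Kd              // S..T
case class Arrow(x: String, dom: Ty, cod: Kd) extends Kd    // (x: S) → K

// Both erasures map a path transformer pointwise over types and kinds:
def eraseTy(f: Path => Path)(t: Ty): Ty = t match {
  case Sel(p, a)  => Sel(f(p), a)
  case TApp(s, p) => TApp(eraseTy(f)(s), f(p))
  case Box(a, k)  => Box(a, eraseKd(f)(k))
}
def eraseKd(f: Path => Path)(k: Kd): Kd = k match {
  case Interval(lo, hi) => Interval(eraseTy(f)(lo), eraseTy(f)(hi))
  case Arrow(x, d, c)   => Arrow(x, eraseTy(f)(d), eraseKd(f)(c))
}

val simple: Ty => Ty = eraseTy { case PVar(_) => Hole; case p => p }  // |-|: erase variables only
val floor:  Ty => Ty = eraseTy(_ => Hole)                             // ⌊-⌋: erase every path

Only ⌊-⌋ commutes with substitution, since every substituted path is erased to the same placeholder.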

Let's work through an example to see what exactly goes wrong when we try to extend this approach to HK DOT. Consider the following definition of the identity function on types:

tId : { I :: (x: {A::*}) → x.A..x.A }   <:  { I :: {A::*} → * }
tId = { I =  λx: {A::*}. x.A }

Now, assume we are given a variable z of type { I :: (x: {A::*}) → x.A..x.A }, i.e. the precise type of the identity function. We can apply the operator z.I to a pair of nested boxes containing some kind K, i.e. z.I { A = { B :: K } }, which is OK because { B :: K } :: * and hence { A = { B :: K } } : {A::*} for any K. Let C = { B :: K }, then we have

z.I { A = C }  ::  { A = C }.A..{ A = C }.A  <::  C..C

and hence C <: z.I { A = C } <: C, i.e. the identity applied to some type C is equivalent to C, as one would expect. For example, if we let K = (x: {A::*}) → x.A..x.A and B = I, we have

z : C  <:  z.I { A = C }

i.e. we can apply the identity to its own type. Let's make all the coercions in this statement explicit, starting with C:

C            :: *
s(C)         :: C..C
{ A = s(C) } :: { A :: C..C }

Let k = { A = s(C) }. Noting that ⌞k⌟ = { A = C }, we have

z.I                      :: (x: {A::*}) → x.A..x.A
z.I k                    :: { A = C }.A..{ A = C }.A
k.A                      :: C..C
⟨k.A|                    :  C <: { A = C }.A
⟨k.A|..|k.A⟩             :: { A = C }.A..{ A = C }.A <:: C..C
(z.I k) ⇑ (⟨k.A|..|k.A⟩) :: C..C

Let E = (z.I k) ⇑ (⟨k.A|..|k.A⟩). Noting that ⌞E⌟ = z.I ⌞k⌟ we have

z       : C
⟨E|     : C <: z.I { A = C }
z ↑ ⟨E| : z.I { A = C }

Note that the variable z, which is a path, appears inside the type z.I, which, in turn, appears as the head of a type application and eventually in the path coercion ⟨E|. Hence, interesting things should happen if we substitute some path p ↑ α for z and normalize the result. We expect the path coercion to be pushed, successively, through the selection of I, then the application to k and finally through the left projection ⟨-|. On the way, it should turn from a path coercion into a type coercion and back, and its type (and/or kind) should change accordingly.

To keep things simple, we will work with (an η-expanded version of) the identity coercion on the type of z:

Φ :: (x: {A::*}) → x.A..x.A <:: (x: {A::*}) → x.A..x.A
Φ =  (x: {A::*}. id_x.A .. id_x.A)^id_{A::*}

α : { I :: (x: {A::*}) → x.A..x.A } <: { I :: (x: {A::*}) → x.A..x.A }
α = { I :: Φ }

I'll leave it to the reader to verify that these definitions are well-typed.

In what follows, I'll annotate paths/types and coercions with both their type/kind as well as simplified versions thereof. In particular, I'll write statements of the form p : T / |T| / ⌊T⌋ where |T| and ⌊T⌋ are, respectively, the types obtained by erasing all variables and all paths from T. E.g. we have

z ↑ ⟨E| : z.I { A = C } / ?.I { A = |C| } / ?.I ?

with |C| = ⌊C⌋ = { B :: |K| } and |K| = ⌊K⌋ = {A::*} → ?.A..?.A.

Now let's compute the normal form of (z ↑ ⟨E|)[z := p ↑ α]. We have

  (z ↑ ⟨E|)[z := p ↑ α] :  (z.I { A = C })[z := ⌞p ↑ α⌟]  /
                          |(z.I { A = C })[z := ⌞p ↑ α⌟]| /
                          ⌊(z.I { A = C })[z := ⌞p ↑ α⌟]⌋
= (p ↑ α) ↑ ⟨E′|        : ⌞p⌟.I { A = C } / ⌞p⌟.I { A = C } / ?.I ?

with

E′ = E[z := p ↑ α]                   :: C..C / |C|..|C| / ⌊C⌋..⌊C⌋
   = ((p ↑ α).I k) ⇑ (⟨k.A|..|k.A⟩)  :: C..C / |C|..|C| / ⌊C⌋..⌊C⌋

We can already see that the erasure function |-| does not commute with substitution, i.e. |T| ≠ |T[x := ⌞p⌟]|, as exemplified by the type of z ↑ ⟨E| before and after the substitution of ⌞p ↑ α⌟ for z. This is a problem because it means that the simple type |T[x := ⌞p⌟]| of an application f p may be bigger than the codomain |T| of the corresponding simple arrow type |S| → |T| of the function f. The same is true for the simple kinds of type applications. Concretely, in the above example, we have

z.I   :: (x: {A::*}) → x.A..x.A / {A::*} → ?.A..?.A    / {A::*} → ?.A..?.A
z.I k :: (⌞k⌟.A)..(⌞k⌟.A)       / (|⌞k⌟|.A)..(|⌞k⌟|.A) / ?.A..?.A

where (|⌞k⌟|.A)..(|⌞k⌟|.A) is clearly not a subexpression of {A::*} → ?.A..?.A. Unfortunately, HSubst relies crucially on the fact that the type of an application (or any redex in general) is smaller than the type of the corresponding function (or, in general, the term being eliminated).

Conversely, the erasure function ⌊-⌋ does commute with substitution, i.e. we have ⌊T⌋ = ⌊T[x := ⌞p⌟]⌋. E.g. in the above example, ?.A..?.A is a subexpression of {A::*} → ?.A..?.A.
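To make the contrast concrete outside the calculus, here is a small executable sketch in Scala (the representation and all names are mine, and only a one-constructor fragment of the type grammar is modelled): |-| keeps values but erases variables to ?, ⌊-⌋ erases every path, and only the latter commutes with substitution.

sealed trait Path
case class PVar(name: String) extends Path   // a variable x
case class PVal(tag: String) extends Path    // stands in for a value / stable path
case object Hole extends Path                // the placeholder ?

case class Sel(p: Path, a: String)           // the type p.A

// |p.A|: erase variables only; values survive.
def eraseVars(t: Sel): Sel = t.p match {
  case PVar(_) => Sel(Hole, t.a)
  case _       => t
}

// ⌊p.A⌋: erase every path.
def erasePaths(t: Sel): Sel = Sel(Hole, t.a)

// Substitute the path v for the variable x.
def subst(t: Sel, x: String, v: Path): Sel = t.p match {
  case PVar(y) if y == x => Sel(v, t.a)
  case _                 => t
}

val zA = Sel(PVar("z"), "A")                 // the type z.A
val v  = PVal("k")                           // some value

assert(eraseVars(subst(zA, "z", v)) != eraseVars(zA))   // k.A vs ?.A — |-| does not commute
assert(erasePaths(subst(zA, "z", v)) == erasePaths(zA)) // ?.A vs ?.A — ⌊-⌋ commutes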

Continuing the above example, the first redex we encounter is (p ↑ α).I, deeply nested inside E′. The associated reduction is

(p ↑ { I :: Φ }).I  —→  p.I ⇑ Φ

The kind of the redex and the type of the associated eliminatee α are

(p ↑ α).I :: K / |K| / ⌊K⌋
α         :  C <: C / |C| <: |C| / ⌊C⌋ <: ⌊C⌋

Expanding C = { I :: K } and |C| = ⌊C⌋ = { I :: |K| }, and noting that |K| = ⌊K⌋, we see that K, |K| and ⌊K⌋ are subexpressions, respectively, of C, |C| and ⌊C⌋ — so far, all is well. The next redex we encounter is (p.I ⇑ Φ) k, with reduction

(p.I ⇑ Φ) k  —→  (p.I (k ↑ id_{A::*})) ⇑ (id_x.A .. id_x.A)[x := ⌞k⌟]

which is an instance of the β-like reduction rule we have seen earlier. The types of the redex and Φ are

(p.I ⇑ Φ) k :: (⌞k⌟.A)..(⌞k⌟.A) / (|⌞k⌟|.A)..(|⌞k⌟|.A) / ?.A..?.A
Φ           :: K <: K / |K| <: |K| / ⌊K⌋ <: ⌊K⌋

Expanding the definitions of K, |K| and ⌊K⌋ we discover that (⌞k⌟.A)..(⌞k⌟.A) is not a subexpression of K, nor is (|⌞k⌟|.A)..(|⌞k⌟|.A) a subexpression of |K|. However, ?.A..?.A is a subexpression of ⌊K⌋, as expected since ⌊-⌋ commutes with substitution. We can therefore dismiss the subexpression order on types as well as subexpressions modulo |-| as candidates for our well-founded induction order for HSubst and focus instead on subexpressions modulo ⌊-⌋.

To see why this will not work either, we first need to perform a few boring reductions on E′ (mostly pushing identity coercions around and eliminating them):

⟨E′|  —→*  ⟨(p.I k) ⇑ ⟨k.A|..|k.A⟩|

The resulting coercion is a redex for the reduction

⟨(p.I k) ⇑ ⟨k.A|..|k.A⟩|  —→  ⟨k.A|;⟨(p.I k)|

which is an instance of the contraction rule

⟨T ⇑ α..β|  —→  α;⟨T|

The associated type/kind are

⟨(p.I k) ⇑ (⟨k.A|..|k.A⟩)|  :   C <: p.I { A = C } / ⌊C⌋ <: ?.I ?
⟨k.A|..|k.A⟩                ::  { A = C }.A..{ A = C }.A <:: C..C /
                                ?.A..?.A <:: ⌊C⌋..⌊C⌋

Focusing on the lower bounds, we note that C is a subexpression of { A = C }.A..{ A = C }.A but, alas, ⌊C⌋ = { B :: |K| } is not a subexpression of ?.A..?.A. The erasure function ⌊-⌋ forgot too much.

In summary: just as paths, types and kinds can circularly contain each other, so can coercions on paths and kinds (i.e. subtyping and subkinding derivations). As a consequence, (hereditary) substitutions in types may trigger (hereditary) substitutions in paths, and vice-versa. Whatever well-founded order/induction measure we use, it has to be defined both on (well-typed) paths and on (well-kinded) types. Ignoring all paths in types and kinds results in a well-founded order that is too coarse to justify the deeply nested calls to HSubst, while ignoring only variables leads to an order that is not stable under substitution, i.e. it cannot relate the kind K[x := ⌞p⌟] of an application F p to the codomain K of the arrow kind (x: S) → K of the corresponding type operator F.

All hope is not lost though: there could still be a well-founded order on types that goes beyond simple subexpressions modulo erasure. That is, provided that the type language of HK DOT is weaker than System F. So is it? One thing we can say is that System F can at least not be directly encoded in the type language of HK DOT, simply because that would require us to encode kind-level variables through boxing as path-level variables, similar to the way we encode type variables in DOT. Recall that in DOT we have an injection { A = _ } from types to paths ("boxing") with inverse _.A ("unboxing"), which allows us to treat path variables x of type x : { A :: * } as type variables X = x.A and to encode universal quantifiers (over types) like ∀X.T as ⟦ ∀X.T ⟧ = (x: { A :: * }) → ⟦ T[X := x.A] ⟧. To encode universals ∀k.K over kinds, we first need an encoding for kind variables k, i.e. a similar invertible injection from kinds to paths. A candidate is { A = { B :: _ } }, but there is no syntactic form that could serve as its inverse.
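As an aside, the type-level half of this boxing/unboxing story can be replayed directly in Scala, whose core calculus DOT is. Here is a sketch (Box, polyId and strBox are my names, not notation used above); dependent method types play the role of the dependent function type:

trait Box { type A }                        // a term-level box, i.e. { A :: * }

// ⟦ ∀X. X → X ⟧ = (x: { A :: * }) → x.A → x.A
def polyId(x: Box)(a: x.A): x.A = a

val strBox = new Box { type A = String }    // boxing the type String
val s: String = polyId(strBox)("hello")     // unboxing via the selection strBox.A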

5. Some thoughts on a proof via logical relations

Proofs and Types also doesn't actually mention hereditary substitution (it hints at "proof theoretic techniques" and I guess that's what they mean) or logical relations (but it mentions Tait '67).

I think the term "hereditary substitution" had not been invented when Proofs and Types was written. It seems to have originated [A3,A4,A5] in a paper by Watkins et al. that was published in 2004 [A6]. However, the method used to prove the weak normalization theorem in chapter 4 of P&T also makes crucial use of the fact that, in STLC, the type of a redex is smaller than the type of the corresponding eliminatee, e.g. for applications f t: T, we have f: S → T, and for projections πᵢ t: Tᵢ, we have t: T₁ × T₂. Both techniques leverage this fact by using the types of eliminatees in redexes as part of their induction/termination measure.
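To make the shared measure explicit, here is a minimal sketch of hereditary substitution for STLC in Scala (all names are mine; capture avoidance and shadowing are elided, and inputs are assumed to be well-typed normal forms):

sealed trait Ty
case object Base extends Ty
case class Arr(dom: Ty, cod: Ty) extends Ty

sealed trait Tm
case class Vr(name: String) extends Tm
case class Lam(x: String, ty: Ty, body: Tm) extends Tm
case class App(fun: Tm, arg: Tm) extends Tm

// Is the head of an application spine the variable x?
def headIs(t: Tm, x: String): Boolean = t match {
  case Vr(y)     => y == x
  case App(f, _) => headIs(f, x)
  case _         => false
}

// hsubst(t, x, s, tyS): replace x by s (of type tyS) in t, contracting
// any redexes created along the way.
def hsubst(t: Tm, x: String, s: Tm, tyS: Ty): Tm = t match {
  case Vr(y)         => if (y == x) s else t
  case Lam(y, ty, b) => Lam(y, ty, hsubst(b, x, s, tyS))
  case App(f, a) =>
    val a2 = hsubst(a, x, s, tyS)
    hsubst(f, x, s, tyS) match {
      // A new redex can only appear if f was headed by x; its domain
      // dom is then a strict subexpression of tyS, so the nested call
      // happens at a smaller type. This is the termination measure.
      case Lam(y, dom, b) if headIs(f, x) => hsubst(b, y, a2, dom)
      case f2                             => App(f2, a2)
    }
}

The nested call hsubst(b, y, a2, dom) is the "hereditary" step, and its justification (dom strictly smaller than tyS) is exactly what fails for HK DOT above.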

The origins of the term "logical relations" seem to go back much further: according to a thread on the TYPES mailing list, at least as far as 1974. So I'm not sure why it is not mentioned in P&T.

If the barrier applies, it would suggest considering logical relations for transitivity elimination, but I've never heard of those.

Neither have I.

Parroting termination of STLC/System F, they'd be like "this subtyping proof is itself normalizing, and even the result of applying it to an assumption normalizes". What's confusing is that the property, it seems, would not be defined by induction on some "proof type", since you don't have internal abstraction on assumptions (unless you add lambdas, as sketched above), but by induction on the context. That's not novel though: logical relations are defined on closed terms usually, but they can be extended to open ones, and PFPL's chapter on "Equational Reasoning for T" (either edition) does exactly that, by defining open logical equivalence.

I don't understand this comment. What exactly do you mean by "applying it to an assumption"? And why would the relation not be defined by induction on types/kinds? The way I see it, there would be four mutually inductively defined relations/predicates (one each for terms, types and the corresponding coercions). Since coercions are typed/kinded by two types/kinds, the corresponding relations would likely be defined by induction on e.g. the upper bound and case-analysis on the lower, or vice-versa. E.g. definitions might look something like this:

  • p ∈ R⟦{ K }⟧ if p! ∈ R⟦K⟧.
  • α ∈ R⟦T <: { K }⟧ if, for all p, p ∈ R⟦T⟧ implies (p ↑ α)! ∈ R⟦K⟧.

Needless to say, this seems rather non-standard. For the arrow cases, predicativity/dependency becomes an issue, e.g. the following definition is not inductive in the upper bound:

  • Φ ∈ R⟦(x: S₁) → K₁ <: (x: S₂) → K₂⟧ if, for all T and p, T ∈ R⟦(x: S₁) → K₁⟧ and p ∈ R⟦S₂⟧ implies (T ⇑ Φ)! p ∈ R⟦K₂[x:=⌞p⌟]⟧.

and we instead need to parametrize the (reducibility) relation on K₂ over relations (i.e. reducibility candidates) on S₂. Normalization proofs for e.g. the calculus of constructions [A2] have to deal with similar dependency-related issues and might serve as a starting point here.
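For instance, one standard move (sketched here in my own notation; nothing like this is worked out above) is to index the relations by an environment ρ mapping bound variables to candidates, in the style of Girard's reducibility candidates:

  • Φ ∈ R⟦(x: S₁) → K₁ <: (x: S₂) → K₂⟧ρ if, for every reducibility candidate C for S₂ and for all T and p with T ∈ R⟦(x: S₁) → K₁⟧ρ and p ∈ C, we have (T ⇑ Φ)! p ∈ R⟦K₂⟧(ρ, x ↦ C).

Quantifying over arbitrary candidates C instead of over R⟦S₂⟧ is what restores inductiveness in the upper bound.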

For now, I have no idea how to deal with the order-related operations (reflexivity, transitivity) in this setting, nor how to represent recursive types/paths.

6. Conclusions and future work

I have more to say about

  1. pros and cons of encoding coercions as lambda terms,
  2. predicative variants of HK DOT,
  3. a simplified calculus where types and terms are stratified,

but this comment is already way too long, so that's for another time.

Additional references

  1. A. Abel and D. Rodriguez, Syntactic Metatheory of Higher-Order Subtyping, CSL '08. http://dx.doi.org/10.1007/978-3-540-87531-4_32
  2. T. Coquand, J. H. Gallier, A Proof of Strong Normalization for the Theory of Constructions Using a Kripke-Like Interpretation, CIS Tech. Report (MS-CIS-90-44), University of Pennsylvania, 1990. http://repository.upenn.edu/cis_reports/568/
  3. K. Crary, A Syntactic Account of Singleton Types via Hereditary Substitution, LFMTP '09. http://www.cs.cmu.edu/afs/cs/user/crary/www/papers/2009/synsing.pdf
  4. R. Harper, D. R. Licata, Mechanizing Metatheory in a Logical Framework, in JFP 17(4-5), 2007. http://dx.doi.org/10.1017/S0956796807006430
  5. C. Keller, T. Altenkirch, Hereditary Substitutions for Simple Types, MSFP '10. http://doi.acm.org/10.1145/1863597.1863601
  6. K. Watkins, I. Cervesato, F. Pfenning, D. Walker, A Concurrent Logical Framework: The Propositional Fragment, TYPES 2003. http://dx.doi.org/10.1007/978-3-540-24849-1_23

@Blaisorblade

Blaisorblade commented Nov 7, 2018

What I wrote doesn’t work, but was the first step in understanding what is happening. And I finally took the time to understand this remark:

The example is a variant of the well-known technique for typing arbitrary lambda terms in a system with (iso-)recursive types (see e.g. TAPL, ch. 20, pp. 273). Here, we lift this technique to the kind level to allow kinding arbitrary lambda expressions on the type level. Then we construct a divergent type expression and kind it with absurd bounds.

In both cases, non-termination is caused by negative recursion, which needs to be controlled; the restriction that Sandro proposes also happens to forbid such negative recursion, or at least direct occurrences of it.
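For concreteness, the term-level version of that technique is easy to replay in Scala (a sketch; D, delta and omega are my names):

// A negatively recursive type: D occurs to the left of an arrow in its
// own definition. This is the TAPL ch. 20 trick, one level down from
// the kind-level variant quoted above.
case class D(fun: D => D)

val delta: D = D(d => d.fun(d))    // the encoding of λx. x x, well-typed
def omega(): D = delta.fun(delta)  // (λx. x x)(λx. x x); calling this diverges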
