FSCD2019 author response
We thank the reviewers for their work on our submission and their
thoughtful comments. Our response is in two parts: discussion of
high-level questions, then more reviewer-specific comments. The
document is designed to have the most important points first, so that
you can stop reading at any point.
# High-level questions
First, we note why we think it would be nice to see this paper at
FSCD.
FSCD is a conference that is dear to us (not least for its continuous
Open Access policy, which is unfortunately a rarity in this area), and
we think that this work could nicely support the general direction of
reasonably broadening the original TLCA+RTA scope: our work justifies
a real-world improvement in a real-world programming language through
a simple but non-routine meta-theoretical development -- yet another
argument in favor of rigorous formalism in programming language
design.
Second, we think that there is an insight in this work that relates to
the discussion with reviewer 2 on "declarativeness". Existing work
(and earlier iterations of the present work) on expressing program
analyses as type systems tends to propose judgments of the form
Γ ⊢ e : τ ▹ ϕ
or
Γ ⊢ e : τ ▹ e' : τ', ϕ
where Γ, e, τ are inputs, ϕ describes the analysis results, and (e' :
τ') in the second formulation describes re-annotations of the inputs
e, τ refined with the analysis results. Instead, we present our static
system as
Γ ⊢ e : τ
and we use the freedom of deciding that Γ may be an output instead of
an input (as reviewer 2 remarks, our system is uni-typed; if we wanted
more types for better precision, Γ would contain both inputs and
outputs) to get a system that expresses the same algorithmic content
with a simpler, more declarative, presentation. Our meta-theoretic
evaluation (proving soundness with respect to the operational
semantics of [Nordlander, Carlsson and Gill, 2008]) validates the idea
that this presentation is reasonably easy to reason about (note in
particular how the subject-reduction proof is completely oblivious to
the input/output distinctions in the system), and our programming
experience validates the idea that it is still conducive to an
implementation (which was merged into the OCaml compiler).
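To make this information flow concrete, here is a minimal OCaml
sketch written for this response (the mode discipline below is a
deliberately crude stand-in for the rules of the paper; only the
direction of the information flow is the point): the term and the
mode at which it is used are inputs, and the context is the output.

```ocaml
module Ctx = Map.Make (String)

(* Declaration order gives Ignore < Delay < Guard < Return < Dereference
   under the structural comparison used by [max]. *)
type mode = Ignore | Delay | Guard | Return | Dereference

type term =
  | Var of string
  | Lam of string * term
  | App of term * term
  | Cons of term * term  (* a constructor, to illustrate guardedness *)

(* Contexts of sibling subterms are merged by keeping the stronger mode. *)
let join = Ctx.union (fun _ m1 m2 -> Some (max m1 m2))

(* Right-to-left reading: [infer t m] returns the context Γ such that
   Γ ⊢ t : m, instead of checking t against a given Γ. *)
let rec infer (t : term) (m : mode) : mode Ctx.t =
  match t with
  | Var x -> Ctx.singleton x m
  | Lam (x, body) ->
      (* crude: free variables of a function body are only needed once
         the function is applied, so their uses are delayed *)
      Ctx.remove x (infer body Delay)
  | App (f, arg) ->
      (* both sides of an application are Dereference contexts,
         as discussed below for (□ t) and (f □) *)
      join (infer f Dereference) (infer arg Dereference)
  | Cons (hd, tl) ->
      (* constructor arguments are merely stored: guarded *)
      join (infer hd Guard) (infer tl Guard)
```

For example, `infer (Cons (Var "x", Var "xs")) Return` maps both `x`
and `xs` to Guard, which is how a definition such as
`let rec xs = x :: xs` gets accepted.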
Let us now answer the salient comments and questions from our reviewers.
## Bugs avoided in the OCaml compiler (review 1)
Reviewer 1 asks about OCaml compiler bugs that this work eliminates.
We offer the following list:
- [#7231](https://github.com/ocaml/ocaml/issues/7231): let-rec
well-formedness check too permissive with nested recursive bindings
(this issue is directly solved by our more modular/composable
approach of a type system rather than a list of allowed/disallowed
scenarios)
- [#7215](https://github.com/ocaml/ocaml/issues/7215): Unsoundness
with GADTs and let rec
(This issue, discussed in §6.4, justifies analysing a typed
representation of the language, rather than an intermediate
representation that does not represent GADT type equalities.)
- [#4989](https://github.com/ocaml/ocaml/issues/4989): Compiler
rejects recursive definitions of values
`let rec f = let g = fun x -> f x in g`
The other bugs in this list allow invalid programs that
segfault. This bug is a case of lack of expressivity of the
previous check; it is trivially solved by our type-system approach.
- [#6939](https://github.com/ocaml/ocaml/issues/6939): Segfault
with improper use of let-rec and float arrays
Our §6.2 describes this issue.
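To give a concrete sense of the new check, here are illustrative
OCaml snippets (ours, not reproductions of the bug reports):

```ocaml
(* Accepted by the new check (and wrongly rejected before, see #4989):
   the recursive name only occurs under a function abstraction, so it
   is never inspected while being defined. *)
let rec f = let g = fun x -> f x in g

(* Accepted: the recursive name is guarded by a constructor. *)
let rec ones = 1 :: ones

(* Rejected: computing the right-hand side would have to inspect [x]
   before it is defined.
   let rec x = x + 1 *)
```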
Besides fixing (at least) these bugs, checking recursive definitions
at the typing phase rather than on an intermediate representation made
life much easier for tools that reuse the OCaml front-end, e.g.
* MetaOCaml (which previously disallowed value recursion within
quotations, as it could not check them)
* Merlin (an IDE tool providing edit-time type checking, which did
not report value recursion errors).
## Declarative vs. Algorithmic presentations (review 2)
We agree with Reviewer 2 that the presentation is neither "fully
declarative" nor "fully algorithmic", and that having separate
declarative and algorithmic versions would be nice. But we don't think
this is reasonable within the constrained FSCD space budget. Our
system takes one half-page figure; having two systems, plus
explanations to relate them, would take another page, and we don't see
any portion of the present article that is less important than having
those two presentations instead of one reasonable compromise.
Also, in the details we respectfully disagree with reviewer 2 on
the placement of the boundary between "declarative" and "algorithmic",
which is somewhat subjective. For example, we would defend the choice
of using a multiplicative style, explicitly merging the contexts of
separate subexpressions. We view this as being mindful about the
respective contributions of subterms, rather than a
redundancy-introducing algorithmic choice. In fact, reference [2] uses
this same style for the application rule, which supports our intuition
that it is a natural presentation choice for this family of type
systems.
The one rule that we agree is objectively algorithmic in its
presentation is the rule for mutual recursion:
```
(G_i, (x_j : m_i,j)^j |- t_i : Return)^i
(m_i,j <= Guard)^i,j
forall i, (G'_i = G_i + Σ_j m_i,j[G'_j])
-------------------------------------------
(x_i : G'_i)^i |- rec (x_i = t_i)^i
```
Here, the G' are defined from the G as fixpoints of a system of
equations, while it would be enough (less constrained) to specify,
through inequalities, a correctness condition on a single family of
contexts:
```
(G_i, (x_j : m_i,j)^j |- t_i : Return)^i
(m_i,j <= Guard)^i,j
forall i,j, G_i <= m_i,j[G_j]
----------------------------------
(x_i : G_i)^i |- rec (x_i = t_i)^i
```
This formulation is more declarative, but we found it less intuitive /
readable: it would have been the only rule where constraints on the
context in a sub-goal would flow from the rule around it, instead of
the other way around. So we made the conscious choice to be more
algorithmic there.
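For concreteness, here is a minimal OCaml sketch of this fixpoint
computation, written for this response. To keep it short, each
context G_i is collapsed to a single mode, and the composition m[m']
and the sum + of the rule are taken as parameters rather than
spelled out:

```ocaml
type mode = Ignore | Delay | Guard | Return | Dereference

(* Kleene iteration for  G'_i = G_i + Σ_j m_ij[G'_j].
   [join] stands for + and [compose] for the composition m[m'];
   the mode lattice is finite, so if both are monotone the loop
   terminates. *)
let fixpoint ~(join : mode -> mode -> mode)
             ~(compose : mode -> mode -> mode)
             (g : mode array) (m : mode array array) : mode array =
  let n = Array.length g in
  let g' = Array.copy g in
  let changed = ref true in
  while !changed do
    changed := false;
    for i = 0 to n - 1 do
      let acc = ref g.(i) in
      for j = 0 to n - 1 do
        acc := join !acc (compose m.(i).(j) g'.(j))
      done;
      if !acc <> g'.(i) then begin
        g'.(i) <- !acc;
        changed := true
      end
    done
  done;
  g'
```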
As researchers we value declarative presentations because they are (1)
easier for humans to understand and (2) (often) easier to reason about
in the meta-theory. This particular type system is currently never
shown to human users of OCaml (the check either passes or reports an
error for the offending variable), and its meta-theoretic analysis is
complete. Finally, our reference implementation serves as the ultimate
algorithmic presentation. This explains why exploring the
declarative/algorithmic design continuum was not a priority in our
presentation of the work.
## Related work on modal type systems (review 2)
We thank reviewer 2 for the extra references (we only knew of [1],
[3] and work related to [4] by the same authors) and will try to
extend the Related Work section as suggested.
> "apply [our technique to other systems - type parameter variance"
>
> Yes, and there is related literature, you may start looking at [1,2,5].
By "our technique" we mean "presenting a backward-analysis algorithm
as a right-to-left type system". [1], [2] and [3] demonstrate the
flexibility of modal type systems (in complement to the work on
Nakano-style modal calculi), but they do not propose this technique in
the context of a program analysis.
Relatedly, [4] structures the evidence of a backward analysis as
a modal type system, but it does not use a "right-to-left type
system" presentation; it returns the result of the analysis as
a "demand" on the right of the judgment. As mentioned
earlier, we propose to represent this syntactically as just the
context of a type judgment. (This is a presentation technique rather
than a deep difference in what is being computed, but we think that it
does make a nice difference, and in a sense that's what is really
"declarative" about our work.)
## Comparison with call-by-need (review 2)
> Shouldn't be there similarities with call-by-need? The concept of
> "needed redex" seems related to your redexes in forced evaluation
> contexts. There are differences as well, since you consider a term
> vicious not only when it is stuck...
>
> Anyway this is just a hunch; a comment on the relation to
> call-by-need in the related work section would be useful.
Don Syme proposed a translation of value-letrec into lazy thunks for
F#. Using a source-level call-by-need operational semantics was our
first idea to specify the dynamics to compare our type system against,
but it results in a more complex presentation than the one we adapted
from [Nordlander, Carlsson and Gill, 2008].
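For readers unfamiliar with that translation, here is a hedged OCaml
rendering of the idea (the names are ours): wrap each right-hand side
in a thunk and each occurrence of a recursive name in a force, which
trades the static check for a dynamic one.

```ocaml
(* Source:  let rec fact = fun n -> if n = 0 then 1 else n * fact (n - 1)
   Thunked: the recursion goes through a lazy cell, forced at each use. *)
let rec fact' =
  lazy (fun n -> if n = 0 then 1 else n * Lazy.force fact' (n - 1))
let fact : int -> int = Lazy.force fact'
(* fact 5 = 120: this recursion is benign. *)

(* Source:  let rec x = x + 1  (vicious).  The translation turns the
   static error into a dynamic one: *)
let rec x' = lazy (Lazy.force x' + 1)
(* Lazy.force x' raises Lazy.Undefined at run time. *)
```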
In call-by-need calculi, there is also a store (typically for
not-yet-evaluated terms), and indeed it has been presented as a global
thunk store (as implemented in programming languages). But the nicer
presentations for reasoning, starting with "The call-by-need
lambda-calculus" by Ariola, Maraist, Odersky, Felleisen, Wadler, use
local stores / explicit substitutions. We have local stores as well
("let rec"), not to delay computations (we are strictly
call-by-value), but to represent cycles through the memory. Note that
explicit substitutions are a common meta-theoretical technique in
lambda-calculi that is not specific to call-by-need (call-by-need,
being more complex to describe than call-by-{value,name}, tends to
require fancier techniques).
The notion of "forcing context" does sound related to notions that
occur in call-by-need, but it is also not uncommon in call-by-value
settings when you want to precisely study errors. For example, our
definition of Mismatch (using E[H] as a notion of forcing context)
was inspired by the following work, agnostic to the evaluation
strategy:
System F with Coercion Constraints
Julien Crétin and Didier Rémy, 2014
http://gallium.inria.fr/~remy/coercions/Cretin-Remy!fcc@lics2014.pdf
first paragraph of the right column of page 3
> What is the relation between "forcing contexts" and needed redexes?
They are quite different notions, because we are in a call-by-value
setting. A context is "forcing" when (1) it is in evaluation position
and (2) its evaluation requires the value of its hole. For example:
- `(□, u)` is neither needed in call-by-need nor forcing for us
  (by "needed" we mean that a redex in the hole of this context would
  be a needed redex)
- `□` is needed but not forcing (in terms of our type-system it is a Return
position, not a Dereference)
- `(1, □+2)` is forcing (with strict pairs) but not needed
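Since OCaml pairs are strict, the last example also has a direct
source-level reading (illustrative snippets, ours):

```ocaml
(* (1, □ + 2) is forcing: building the pair evaluates the hole, so the
   check rejects
     let rec p = (1, fst p + 2)                                       *)

(* Delaying the use of the recursive name makes it acceptable: *)
let rec q = (1, fun () -> fst q + 2)
```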
## Details on the operational semantics (review 3)
It was indeed our intention to avoid an "assoc" rule, so that the
example of review 3
```
let rec x = (let rec y = 1 :: x in 2 :: y) in 3 :: x
```
is precisely a normal form that describes a value containing cycles,
with each cycle defined as locally as possible. (This is just a
presentation choice and an "assoc" rule would work just as well; in
fact it would probably have been wiser to directly reuse this aspect
of the presentation of [Nordlander, Carlsson and Gill, 2008], which
uses a "merge" rule for that purpose. The rest of this response
describes how the calculus works without needing to explicitly merge
let-rec blocks.)
Note that our "lookup" rule may traverse the whole term from the
lookup point to the root, in search of the value binding that defines
this variable -- we don't need to merge value bindings to find the
value in outer/older bindings.
That said, there were two small mistakes in the rules described in our
submission, related to the lack of a grammar for contexts built only
from fully-evaluated `let rec` blocks:
```
L[□] ::= □ | let rec B in L[□]
```
(With this L[□] defined, our forcing contexts E_f[□] can be defined
from forcing frames F_f[□] in a simpler way: E_f ::= E[F_f[L]].)
Mistake 1: the grammar of values should allow L[v] as a strong value,
and L[w] as a weak value.
Mistake 2: the rules for head reductions should allow for L-contexts
in the middle of the redex, corresponding to a somewhat standard
"reduction at a distance" rule in explicit-substitution calculi:
```
L[λx.t] v -{hd}-> L[t[v/x]]
(match L[K (wᵢ)ⁱ] with h, K (xᵢ)ⁱ → t, h') -{hd}-> L[t[wᵢ/xᵢ]ⁱ]
```
(For a reference on "reduction at a distance", see for example the
Linear Substitution Calculus introduced in "An abstract factorization
theorem for explicit substitutions", Beniamino Accattoli, 2012.)
With these two fixes in place, let us describe a more interesting
example by defining
```
B₀ := rec hd = λli.
          match li with Nil -> None | Cons (x, xs) -> Some x
      and cycle = λx.
          let rec li = Cons (x, li) in li

t  := let rec B₀ in hd (cycle 1)
```
We have, as one possible reduction path for t:
```
let rec B₀ in hd (cycle 1)

  -{lookup hd}->

let rec B₀ in
  (λli. match li with Nil -> None | Cons (x, xs) -> Some x)
  (cycle 1)

  -{lookup cycle}->

let rec B₀ in
  (λli. match li with Nil -> None | Cons (x, xs) -> Some x)
  ((λx. let rec li = Cons (x, li) in li) 1)

  -{app}->

let rec B₀ in
  (λli. match li with Nil -> None | Cons (x, xs) -> Some x)
  (let rec li = Cons (1, li) in li)

  -{lookup li}->

let rec B₀ in
  (λli. match li with Nil -> None | Cons (x, xs) -> Some x)
  (let rec li = Cons (1, li) in Cons (1, li))

  -{app}->

let rec B₀ in
  (match (let rec li = Cons (1, li) in Cons (1, li)) with
   | Nil -> None
   | Cons (x, xs) -> Some x)

  -{match}->

let rec B₀ in
let rec li = Cons (1, li) in
Some 1
```
# Reviewer-specific questions
## Review 1
> You basically want to say that information flows from the
> expression to the variables. This is usually referred to as the
> "checking" direction in bidirectional typing, as opposed to the
> "inference"/"synthesis" direction. Thus, you could also say "A
> type checking system for...".
In the bidirectional type systems that we are familiar with, the
context is known, and the "checking" direction moves inwards inside
terms by following constructors, and stops before reaching neutrals
(in particular variables) which are inferred, not checked. It is thus
not clear that "checking" would carry the right intuition here.
> p7 Why is "(λx.□)t" a deref context and not a return context? Should
> it not be equal to "let x = t in □" ?
Reduction can lower mode constraints, so anti-reduction can strengthen
them. With our mode language, (□ t) is a Dereference context that is
indistinguishable, from the modes alone, from (f □); it is natural that
(f (λx.□)) be a Dereference context (f may call the function and inspect
the result), and ((λx.□) t) is handled in the exact same way.
If we wanted to refine the mode-checking of this particular example
(we do not), we could introduce a new mode "Call" that sits between
Return and Dereference, is only used on the left of function
applications, and satisfies Call[Delay] = Return.
> Maybe a semantics using a heap more explicitly would make the
> presentation clearer!? See, e.g., [4], or related machines for
> call-by-need, like the Sestoft machine.
We think that there is conceptual value in staying as close to the
source calculus as possible. Of course, "clearer" depends on reader
familiarity, and some readers are already more familiar with
abstract-machine presentations. In terms of economy of concepts, we
would argue that this presentation is better.
> Since you have only shallow matching, you could have instead
> required disjointness of patterns in the syntax of "match".
Thanks for the suggestion.
## Review 2
> (One approach to remedy the situation would be to prove that the
> paper's semantics is correct with respect to a more conventional
> semantics. But this would be beyond the scope of the paper.)
Yes, we asked ourselves the same question and have written down a
simple compilation scheme for our source language into a λ-calculus
with a global store, and a sketch of a bisimulation argument between a
source program and its compiled form. (It doesn't fit in the paper.)
For the record, our target calculus is defined as follows:
```
t ::= x, y, z
    | new (xᵢ)ⁱ; u
    | (xᵢ←tᵢ)ⁱ; u
    | λx.t
    | t u
    | K (tᵢ)ⁱ
    | match t with h

v ::= K (wᵢ)ⁱ | λx.t
w ::= x | v

[[let rec (xᵢ = tᵢ)ⁱ in u]] = new (xᵢ)ⁱ; (xᵢ ← tᵢ)ⁱ; u
```
In this calculus, store locations are just variables (we don't have
first-class reference cells that can be returned from a function, so
no need for a separate class of runtime locations), and (xᵢ←tᵢ)ⁱ
represents parallel, unordered writes: reducing any of the terms being
written and committing a fully-evaluated write (x←v) to memory may
happen in any order.
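A minimal OCaml sketch of this compilation scheme, written for this
response (constructors and match are elided for brevity):

```ocaml
type var = string

type src =
  | SVar of var
  | SLam of var * src
  | SApp of src * src
  | SLetRec of (var * src) list * src

type tgt =
  | TVar of var
  | TLam of var * tgt
  | TApp of tgt * tgt
  | TNew of var list * tgt            (* new (x_i)^i; u    *)
  | TWrite of (var * tgt) list * tgt  (* (x_i <- t_i)^i; u *)

(* [[let rec (x_i = t_i)^i in u]] = new (x_i)^i; (x_i <- t_i)^i; u *)
let rec compile : src -> tgt = function
  | SVar x -> TVar x
  | SLam (x, t) -> TLam (x, compile t)
  | SApp (t, u) -> TApp (compile t, compile u)
  | SLetRec (bindings, u) ->
      TNew (List.map fst bindings,
            TWrite (List.map (fun (x, t) -> (x, compile t)) bindings,
                    compile u))
```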