ESOP 2021 notes

My (partial and personal) notes from ESOP 2021.

Disclaimer: There may well be errors and omissions here, so please do read the original papers instead of relying on my notes! Apologies to the authors for any misrepresentation. Due to other pressures I am not able to take notes on all the talks, and some notes are partial.

Session 1 - Probabilistic Programming and Verification (Chaired by Sam Staton)

Automated Termination Analysis of Polynomial Probabilistic Programs - Marcel Moosbrugger et al.

Starting example: a probabilistic program

x := 0
y := 0
while (x^2 + y^2 <= 10^2)
  y := y+1 [0.5] y-1
  x := x+1 [0.5] x-1

Random walk. Does it terminate?

AST = terminates with probability 1 (Almost Sure Termination)
PAST = finite expected runtime (Positive Almost Sure Termination)
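
Aside (mine, not from the talk): a quick Monte Carlo simulation of the example, just to get a feel for it. Empirically the walk leaves the disc of radius 10 after a bit over 50 steps on average, which is at least consistent with it being PAST.

    -- simulate the random walk from the example and report the average number
    -- of loop iterations over many runs (a rough empirical check, not a proof)
    import System.Random (randomRIO)
    import Control.Monad (replicateM)

    step :: Int -> IO Int
    step v = do b <- randomRIO (0, 1 :: Int)          -- v+1 or v-1, each with prob 0.5
                return (if b == 0 then v + 1 else v - 1)

    run :: (Int, Int) -> Int -> IO Int
    run (x, y) n
      | x*x + y*y > 100 = return n                    -- loop guard x^2 + y^2 <= 10^2 fails
      | otherwise       = do y' <- step y
                             x' <- step x
                             run (x', y') (n + 1)

    main :: IO ()
    main = do
      ts <- replicateM 1000 (run (0, 0) 0)
      print (fromIntegral (sum ts) / 1000 :: Double)  -- average termination time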

Previous work had proof rules to classify programs that are PAST or AST (via some kind of constraint-based ideas). This work relaxes those rules so that their conditions only need to hold eventually, and restricts the syntax of the programs:

  • Acyclic variable dependencies in loop bodies
  • Polynomial updates allowed

The hard part is to compute bounds on program variables (written as monomials)

The classic technique for deterministic programs is to represent imperative loops as recurrence relations and use a recurrence relation solver to get a closed-form expression for how variables behave (asymptotic bounds). But in the probabilistic setting this needs generalising to stochastic recurrences, where there is a kind of 'cone' of bounds.

  • Treat all variables as a single variable to create a monomial
  • Solve the 'inhomogeneous' parts recursively (does inhomogeneous = heterogeneous?) (this terminates because there are no cycles in the variable dependencies of the loop body)
  • Gives us an asymptotic lower-bounding function and upper-bounding function
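
Aside (my own rough working, not from the talk), to see the recurrence idea on the starting example: treating the guard's monomial x^2 + y^2 as the quantity of interest,

     E[x_{n+1}^2] = 0.5*(x_n + 1)^2 + 0.5*(x_n - 1)^2 = x_n^2 + 1    (and similarly for y)

     so E[x_n^2 + y_n^2] = 2n

i.e., in expectation the guard's monomial grows linearly and passes 10^2 around n = 50, which is the sort of behaviour the lower- and upper-bounding functions are there to capture.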

This feels like it could be used on deterministic programs with conditionals to compute lower and upper bounds: I asked this and Marcel seemed positive about that idea (he pointed out that the probabilities are not that relevant any more in the solving step described above).

This was a very clear talk. I learnt quite a bit!

Correctness of Sequential Monte Carlo Inference for Probabilistic Programming Languages, Daniel Lundén et al.

Probabilistic models are popular in many fields in the form of inference problems (seen in things like machine learning, phylogenetics, topic modelling).

Workflow: user writes an inference problem as a program, compiler applies inference algorithm to this, e.g. Sequential Monte Carlo (SMC), Markov chain Monte Carlo (MCMC), and others.

Problem: when should resampling be performed during SMC? This seems to be done after the 'weight'/'score' procedure in most dominant approaches (in Probabilistic Programming Languages, PPLs), but with some slight variations. Essentially there is a tradeoff: runtime overhead vs. inference accuracy, and a question: how does it affect the correctness of the result?

Contributions:

  1. Extended Borgström et al. ICFP 2016 calculus with a 'resample' construct.
  2. Relate this (in what way?) to a generic SMC formulation
  3. Extends SMC correctness result to the calculus
  4. Oh no I missed this one.

Previous work focussed more on algorithm implementation and efficiency rather than correctness (although SMC correctness has been studied somewhat before). There was a nice related work summary here.

Example: model of aircraft localisation.

  • Initial position: X_0 ~ U(0, 100) (uniform distribution)
  • Transition model: X_n ~ N(X_{n-1} + 2, 1) (normal distribution capturing velocity +2)
  • Observation model: Y_n ~ N(elevation(X_n), 2) (normal distribution with noise)
  • Problem: estimate X_n | Y_{0:n}

How SMC inference is applied:

  1. Take a bunch of initial samples
  2. "Weigh" the samples using the observation model, then resample (sample from the samples), discarding (with high probability) the ones with low weight
  3. Propagate the samples forwards
  4. Eventually this helps us determine the correct position
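
To make step 2 concrete for the aircraft example, here is a minimal sketch (mine, not the paper's calculus) of the weigh-and-resample step in Haskell. The elevation function is a made-up terrain map, and the simple multinomial resampling is just a stand-in for whatever scheme a real SMC implementation uses.

    import System.Random (randomRIO)
    import Control.Monad (replicateM)

    elevation :: Double -> Double              -- hypothetical terrain map
    elevation x = 50 + 10 * sin (x / 10)

    normalPdf :: Double -> Double -> Double -> Double
    normalPdf mu sigma v = exp (-((v - mu)^2) / (2 * sigma^2)) / (sigma * sqrt (2*pi))

    -- weight each particle by how well it explains the observed elevation y
    weigh :: Double -> [Double] -> [(Double, Double)]
    weigh y xs = [ (x, normalPdf (elevation x) 2 y) | x <- xs ]

    -- multinomial resampling: draw particles with probability proportional to weight
    resample :: [(Double, Double)] -> IO [Double]
    resample wxs = replicateM (length wxs) pick
      where
        total = sum (map snd wxs)
        pick = do r <- randomRIO (0, total)
                  return (go r wxs)
        go r ((x, w) : rest) | r <= w    = x
                             | otherwise = go (r - w) rest
        go _ [] = error "empty particle set"

    main :: IO ()
    main = do
      xs  <- replicateM 100 (randomRIO (0, 100))   -- initial samples X0 ~ U(0,100)
      let y = 55.0                                 -- an observed elevation
      xs' <- resample (weigh y xs)
      print (sum xs' / fromIntegral (length xs'))  -- rough posterior mean estimate

Resampling concentrates the particle set on positions whose predicted elevation matches the observation, which is exactly the 'sample from the samples' idea above.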

The meat is in step 2. The setup is then based on an untyped lambda calculus + (real?) arithmetic + sampling (sample_N(0, 1) = draw a random sample from a normal distribution with mean 0 and variance 1) + weight (observe) + resample (I guess we can see weight/resample/sample as effectful operations).

SMC is approximating a target distribution, so adding resample into our program should help us converge to this. Theorem: if the number of calls to resample has an upper bound N, then we converge to the target distribution after N resampling steps. There were a couple of other theorems I didn't capture here.

Previous correctness results were extended to their calculus (nice) and the 'bootstrap particle filter' was formalised in this approach.

Untypedness is key so that recursion is directly encoded.

Bayesian strategies: probabilistic programs as generalised graphical models, Hugo Paquet

Graphical model = directed graph of random variables (with their types) where edges = dependencies between the variables, e.g., if there is no edge between two nodes then they are independent. P(A,B,C) = P(A) * P(B|A) * P(C|A) if we know that B depends on A and C depends on A only.

We can then associate distributions to these variables, e.g., with the above dependency structure we may have:

A = sample (Normal 0 1)
B = sample (Poisson A)
C = sample (Normal A 1)

(note the dependency here in the probability distribution functions). Dependency is a static property that can thus be represented in this graphical way.

But what about capturing data-flow and control-flow dependencies (these are two kinds of dependency): add another kind of edge to capture the control flow.

The observation here is that these graphs are (Bayesian) event structures:

  • A set of nodes (with measure space)
  • Control-flow relation
  • Data-flow relation
  • Probability distribution per node (I guess this is the Bayesian part)
  • Conflict relation (used to model branching where there are two copies of a node to represent a random variable occurring in two control-flow branches: draw a conflict edge between the two copies).

Semantics is given to programs using these Bayesian event structures as the model (actually Bayesian strategies, forming a denotational, categorical model) (Hugo marked the inputs and outputs in these models but I wondered whether that was really needed?). When multiple inputs are involved this was represented by having a bidirectional control-flow edge between them. I wondered why not represent this as a conflict edge.

Hugo proceeded through three examples which were really useful. Higher-order functions can also be represented.

A Bayesian strategy is a Bayesian event structure + input/output nodes + a partial map to the game representing the type of the program, which ignores all the details but just gives us the interface of inputs/outputs. Composition is then represented by the substitution/cut between two strategies.

Densities of almost-surely terminating probabilistic programs are differentiable almost everywhere - Carol Mak, Luke Ong, Hugo Paquet, Dominik Wagner

https://arxiv.org/pdf/2004.03924.pdf

Presented by Dominik Wagner

Probabilistic programming goal: make Bayesian machine learning more accessible via probabilistic programming languages. PPLs = regular languages + sample + observations. Then apply an inference algorithm to the result.

Open problem posed by Hongseok Yang at FSCD 2019: what class of statistical probabilistic programs have densities that are differentiable almost everywhere? Answered here in the negative! That is:

The value- and weight-functions are differentiable almost everywhere provided that: 1. the program terminates almost surely; 2. primitive operations are well-behaved (I didn't catch exactly what this meant).

Their approach leverages symbolic execution.

A nice operational semantics example was given showing the semantics of weight, which is accumulated in the operational model. A program's weight is given by this accumulation if it terminates; otherwise it is 0. The unary score primitive multiplies the accumulated weight by its argument.

Things can be tricky if we allow sampling over arbitrary functions: e.g., what if we take the characteristic function of the rationals (the Dirichlet function)? This is nowhere differentiable. So one of the assumptions is that all distribution functions are differentiable everywhere.

let x = sample in if f(x) < 0 then score(0) else score(1)

But this is not differentiable over the inverse image of f between -inf and 0. This leads to another assumption about functions having "a measure 0" in their image over this range (missed a detail here). Recursion is also tricky so there is the requirement for almost-sure termination.
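
My reading of the example above (so possibly off): as a function of the sampled value x, the weight is

     w(x) = 1   if f(x) >= 0
     w(x) = 0   if f(x) < 0

which is locally constant, hence differentiable, everywhere except on the boundary between {f < 0} and {f >= 0}; so the extra assumption seems to amount to requiring that this boundary set has measure zero.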

(I had to run to meet a student so I missed the end of this talk sadly but I was really enjoying it: the operational semantics model seemed neat (I was skimming in the paper at the same time)).

Session 2 - Session types and concurrency (Chaired by myself)

Types for Complexity of Parallel Computation in Pi-Calculus (Patrick Baillot and Alexis Ghyselen)

(talk given by Alexis)

Obtain time complexity properties/bounds via a type system. Typically, given a function t : Nat -> Nat, for an integer n we can extract a bound on the computation time of t n.

Complexity in a Calculus with Parallelism:

  • Work (total time complexity)
  • Span (time complexity with maximal parallelism)

Their type system for the pi-calculus gives a way to analyse work and span. The pi-calculus is a key fundamental calculus for concurrency, which can encode functional languages, and has dynamic creation of processes and channels.

They extended the syntax of the pi-calculus with a tick prefix, e.g., tick.P for a process P, to expose information about complexity, though it sounds like its semantics is slightly parametric. Some examples were given:

  • n-ticks in parallel gives Work = n and Span = 1

  • Two ticks with one guarded by a beta-redex gives W = 2 and S = 1

  • For a more complex program we see that we take the worst-case

           a().tick.P0 | tick.a().P1 | 'a<>   S = max(1 + C0, 1 + C1)
    
  • With recursion, !(a).tick.(if (n=0) then 0 else 'a | 'a) gives us W = O(2^n) but Span = O(n), i.e., linear (see the toy sketch below)
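
Aside (mine, ignoring all the actual pi-calculus machinery): the work/span intuition for the recursion example is just size vs. depth of the tree of ticks, e.g.:

    -- toy process syntax with only ticks and parallel composition
    data Proc = Nil | Tick Proc | Par Proc Proc

    work :: Proc -> Int                 -- total number of ticks
    work Nil       = 0
    work (Tick p)  = 1 + work p
    work (Par p q) = work p + work q

    span' :: Proc -> Int                -- ticks along the longest sequential path
    span' Nil       = 0
    span' (Tick p)  = 1 + span' p
    span' (Par p q) = max (span' p) (span' q)

    -- the recursive server, unfolded at n: one tick, then two parallel recursive calls
    unfold :: Int -> Proc
    unfold 0 = Tick Nil
    unfold n = Tick (Par (unfold (n - 1)) (unfold (n - 1)))

    main :: IO ()
    main = print [ (work (unfold n), span' (unfold n)) | n <- [0 .. 4] ]
    -- [(1,1),(3,2),(7,3),(15,4),(31,5)]: work = 2^(n+1) - 1, span = n + 1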

Normal reduction rules have 0 cost, but tick.P -->1 P (a cost-1 step). Work = maximum number of ticks.

New constructor, n : P means P with n ticks coming before (a bit like a time delay). This has some structural congruence rules.

They use a simply-typed pi-calculus which has basic primitive types and channel types parametrised by the type of message that can be communicated over that channel. Data types are enriched with sizes so that there is a way to give cost information that relates to the size of inputs (indexed natural number and list types).

Typing judgements have the form Phi; Gam |- P <| K, meaning the worst-case time complexity of process P is K.

A replicated receive gets a higher-order type parameterised by sizes, e.g., for the previous example the replicated channel a has type forall i . serv^{2^{i+1} - 1} T where i is free in T.

Subject reduction says:

      G |- P <| K and P -->1 Q then G |- Q <| K - 1

i.e., taking a step which has a tick reduces the complexity by 1.

Then another type system is given for span, which has slightly different rules which need to account for ticks across all channels/processes?

Part of subject reduction for the span type system is then that:

   G |- P <| K and P = n : P_1 | P_2 then K >= n

(there was a second part but I didn't get a chance to write it down).

How does it relate to the bound?

Session Coalgebras: A Coalgebraic View on Session Types and Communication Protocols (Alex Keizer, Henning Basold and Jorge A. Pérez.)

(there were some technical difficulties with the platform so I missed a bit whilst I was working with the backstage team to resolve this).

Example based on a client and server. The idea of coalgebras is used to represent the process, which has a next state (could be branching from an internal or external choice). This was demonstrated neatly from the server's perspective and then the dual client's perspective. (My observation: this ends up looking like a CFSM - communicating automata - but where the nodes are labelled with the kind of the operation (send/receive/branch/etc.) rather than the edges).

Now going higher-order (delegation: channels over channels), how do we represent the label on an edge which is itself an automaton? The solution uses two kinds of edges: data transitions (which were shown in blue) and continuation transitions (what happens next) (shown in black), with an additional arrow linking the data transition and the continuation transition. This captures the delegation quite neatly (with some thought perhaps!)

Then unrestricted channels were considered where the linearity constraints are relaxed and values can be shared/dropped. A new kind of marker state un is then introduced to the coalgebraic view.

Consider two processes:

 T = mu X . ?int.X
 U = ?int . mu X . ?int . X

These have different coalgebras but they are bisimilar, so the usual coinductive approach to bisimulation is used. Duality was given a coinductive definition, as was subtyping.
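
Aside (mine): a quick Haskell way to see the bisimilarity, unfolding both types lazily and comparing a bounded number of steps (obviously not a real coinductive proof):

    data Session = Recv String Session        -- just the ?type actions we need here

    t, u :: Session
    t = Recv "int" t                          -- T = mu X . ?int.X
    u = Recv "int" t                          -- U = ?int . mu X . ?int.X

    -- check the two unfold to the same actions for n steps
    bisimTo :: Int -> Session -> Session -> Bool
    bisimTo 0 _ _ = True
    bisimTo n (Recv a s) (Recv b s') = a == b && bisimTo (n - 1) s s'

    main :: IO ()
    main = print (bisimTo 1000 t u)           -- True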

Nested Session Types (Ankush Das, Henry DeYoung, Andreia Mordido and Frank Pfenning.)

(talk given by Henry and Andreia)

Study type equality for types that are: structural (not generative), nested (not regular), coinductive (not inductive), equirecursive (not isorecursive).

Consider two session types:

 type List(a) = select{ nil : 1, cons : a * List(a) }

 type F(a, k) = select{ nil : 1, cons: a * k }

Want to find out if List(a) = F(a, List(a)). This requires us to check the equality of nested types (which in a generative setting would be difficult).

Another example recalls ideas from pushdown automata:

type S = select{ a : T[1] }
type T(k) = select{a : T[select{b:k}], b : k}

This example relates to a pushdown automaton for a^{n+1} b^{n+1} (for n >= 0): the type parameter tracks the bs still needed (minus 1); we can see it like a stack where seeing an a pushes and then we pop off the same number of bs.
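
A small unfolding I did to convince myself (the chosen label is on the right): starting from S and choosing a, a, b, b reaches 1, tracing the word aabb:

     S                    -- choose a
     T[1]                 -- choose a
     T[select{b:1}]       -- choose b
     select{b:1}          -- choose b
     1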

An even more involved example was shown to serialize binary trees into a flat form!

So how is equality for these types defined? Two types A and B are equal if A and B belong to a type bisimulation, meaning that the types A and B permit the same behaviour. This is defined via a notion of type unfolding (I wasn't able to capture the definition here in time).

Nested Session Types are contractive, which allows them to be translated to the production rules of a first-order grammar (see Jancar 2010): this reduces type equality to trace equivalence for first-order grammars (which is decidable, but has doubly-exponential complexity!). So instead, this work defines a sound (coinductive) algorithm for approximating type equality.

The technical overview of this is that type definitions are expanded, equality of labels is checked (when considering branches/etc.), and each component and any arguments are compared, in a way which seemed a bit like a bisimulation proof to me. This was demonstrated by an example that succeeds and then one that fails. The algorithm combines structural equality (where it can be applied) and nominal equality (for type names), with also some expansion with an upper bound given so that termination can be ensured (this means that the approach is not complete in general: equality can be 'inconclusive'). The algorithm can be seeded (by a user) with auxiliary definitions, which helps to alleviate limitations due to the incompleteness.
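
Aside (mine, definitely not the paper's algorithm): a toy depth-bounded version of 'expand definitions and compare structurally', specialised to the List(a)/F(a,k) example above. Note the naive depth cut-off optimistically answers 'equal', which is exactly the kind of unsoundness the real algorithm avoids:

    data Ty = Base String | One | Pair Ty Ty | Sel [(String, Ty)]
            | ListT Ty | FT Ty Ty

    -- expand the two type definitions from above, one level
    expand :: Ty -> Ty
    expand (ListT a) = Sel [("nil", One), ("cons", Pair a (ListT a))]
    expand (FT a k)  = Sel [("nil", One), ("cons", Pair a k)]
    expand t         = t

    -- depth-bounded structural comparison after expansion
    eqTo :: Int -> Ty -> Ty -> Bool
    eqTo 0 _ _ = True                             -- give up: optimistically "equal"
    eqTo n s t = case (expand s, expand t) of
      (Base a, Base b)     -> a == b
      (One, One)           -> True
      (Pair a b, Pair c d) -> eqTo (n - 1) a c && eqTo (n - 1) b d
      (Sel xs, Sel ys)     -> map fst xs == map fst ys
                              && and (zipWith (eqTo (n - 1)) (map snd xs) (map snd ys))
      _                    -> False

    main :: IO ()
    main = print (eqTo 20 (ListT (Base "a")) (FT (Base "a") (ListT (Base "a"))))  -- True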

Interesting discussion about what it would be like if we had existential types (asked by Nobuko) and some discussion about applying the algorithm back to first-order grammars (seems plausible).

Checking Robustness Between Weak Transactional Consistency Models (Sidi Mohamed Beillahi, Ahmed Bouajjani and Constantin Enea)

(talk given by Sidi)

In databases, transactions are used to allow a group of operations to execute atomically without interference. Strong consistency says we want to be able to serialise the execution of multiple transactions.

Weak consistency says that we want to relax things so that we can get some parallelism but writing correct applications to work on top of weak consistency is difficult: trade off between performance and correctness.

This work:

  • gives a new technique for characterizing violations of robustness in the weak-consistency setting
  • gives an approximate method for proving robustness
  • gives an experimental evaluation

One approach to weak consistency is snapshot isolation (SI), where transactions are given a snapshot of the world to work on independently (in parallel); parallel transactions then commit only if they write to different variables.

Another approach is prefix consistency (PC), again based on independent snapshots, where writing to the same variables is allowed as long as the values are the same (my thought: reminds me a bit of CRDTs but with a partial join operation).

Robustness definition: a program is robust if it behaves the same if we replace the underlying model of weak consistency with a 'weaker' one, e.g., substituting SI with PC (which is weaker).

In the approach here, code instrumentation was used to track happens-before relations and to ensure that certain kinds of behaviour are not produced (e.g., no two successive RW dependencies).

They then consider robustness against substituting PC for Causal Consistency (CC). The notion of traces was employed, which had a flavour of axiomatic semantics which I am always a bit slow to understand, so I missed a few of the details here (had to think rather than type!)

The experimental evaluation took 8 applications from the weakly-consistent databases literature. Boogie was used to do the instrumentation. They found it is quite quick to find a counterexample to robustness. The experiments show that they can prove the programs robust or find counterexamples quickly on all the benchmarks.

A cool piece of further work mentioned was to use this reduction to reachability to guide insertion of synchronisation primitives between transactions running in parallel - that would be really neat.

Session 3 - Category Theory and Type Theory (Chaired by Tarmo Uustalu)

Graded Hoare Logic and its Categorical Semantics (Marco Gaboardi, Shin-Ya Katsumata, Dominic Orchard and Tetsuya Sato)

My paper

Graded Modal Dependent Type Theory (Benjamin Moon, Harley Eades III and Dominic Orchard)

My paper. Questions asked:

  • Identity types
  • Other uses for grading

Complete trace models of state and control (Guilhem Jaber and Andrzej Murawski)

(presented by Guilhem)

(I was a bit distracted after my talk so I didn't capture as much here).

Develop labelled transition system (LTS) models of ML languages (CBV) with effects (organised into a hierarchy based on having higher-order references or just ground-type references, and call/cc or no control operators). Their work is based on 'operational game semantics': interactions between the program and the context are traces generated by a bipartite LTS (it's bipartite because the program is a player and the context is the opponent). These interactions are described by exchanges of values.

Names are used to represent values in order to be less intensional (more extensional) (there is also no exchange of locations or continuations, which sounds like it reduces a bit the higher-order nature).

They use a language called HOSC which lets us configure the different choices of effects. This seems like quite a flexible model that could be used to do lots of nice reasoning about interesting effects. Things that spring to mind that I'd like to ask about are sequentially realizable functions and comparing general recursive behaviours, e.g., Landin's knot vs the Y combinator.

Temporal Refinements for Guarded Recursive Types (Guilhem Jaber and Colin Riba)

(Talk by Colin)

Guarded recursive definition of streams:

      Stream A = A * Later (Stream A)

The Later modality is used here to guard the tail of the stream. Typing of the guarded fixpoint:

      x : Later A |- M : A
     -----------------------
      fix (x) . M : A

This gives a simple productivity check.
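
Aside (mine): a rough Haskell approximation of this, where laziness plays the role of Later and gfix is the guarded fixpoint from the rule above (Haskell won't actually enforce guardedness, so this is only for intuition):

    newtype Later a = Later { force :: a }

    data Stream a = Cons a (Later (Stream a))

    -- guarded fixpoint: the recursive occurrence is only available 'later'
    gfix :: (Later a -> a) -> a
    gfix f = f (Later (gfix f))

    ones :: Stream Int
    ones = gfix (\later -> Cons 1 later)

    smap :: (a -> b) -> Stream a -> Stream b
    smap f = gfix (\rec (Cons x xs) -> Cons (f x) (Later (force rec (force xs))))

    takeS :: Int -> Stream a -> [a]
    takeS 0 _           = []
    takeS n (Cons x xs) = x : takeS (n - 1) (force xs)

    main :: IO ()
    main = print (takeS 5 (smap (+ 1) ones))   -- [2,2,2,2,2]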

Here they use refinement types of the form {A | phi} where A is a guarded recursive type and phi is a formula of the modal mu-calculus. The idea is to focus on logical formulae refining the types.

For example, they can capture that:

    map : ({A | phi} -> {B | psi}) -> {Stream A | Box Diamond phi} -> {Stream B | Box Diamond psi}

i.e., if it is always eventually the case that phi holds of the input stream, then it is always eventually the case that psi holds of the output stream.

The system has rules like this:

           x : {A | phi}          xs : Later (Stream A)
           ---------------------------------------------------
             (x :: xs) : {Stream A | Diamond phi}

This makes sense: we guard the formula phi by Diamond because we don't know that it always holds, only that it holds at the head.

Oooh they introduce a graded modality:

     Diamond^k phi = Or_{n <= k} Circ^n phi

to represent finite unfoldings...cool! Makes sense. Not integrated with indexed / sized types yet (I asked) but they think it could be adapted. The k could be linked to a size I think - but at the moment this has to be determined somewhat ad hoc, externally.
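
(Unfolding the definition for a small case: Diamond^2 phi = phi \/ Circ phi \/ Circ Circ phi, i.e., phi holds now, next, or in two steps.)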

One note: type checking is not decidable.

Session 4 - Programming Theories and Testing (chaired by Jorge Perez)

(I missed quite a bit of this session as I had overlapping meetings. I caught a few talks but had to deal with some things during them so didn't get to write many notes).

Sound and Complete Concolic Testing for Higher-Order Functions (Shu-Hung You, Robert Bruce Findler and Christos Dimoulas).

(Shu-Hung gave the talk)

Concolic testing = testing + symbolic execution. The challenge is to generate higher-order arguments for this approach. The concolic tester needs to generate inputs and to concretely test the program. At the same time a symbolic run is also computed (which annotates intermediate results with their symbolic forms). The concolic tester logs the results of branching and from this generates a logical formula which describes the data constraints of the execution. By negating this formula and using SMT we can find satisfying assignments that give new example inputs.
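
To fix the idea for myself, here is a toy first-order sketch (mine, nothing to do with the paper's higher-order machinery): run a hand-instrumented program concretely while recording the path condition, then negate the last branch and ask a 'solver' (brute-force search standing in for SMT) for an input that takes the other path:

    import Data.List (find)

    type Pred = Int -> Bool                 -- a constraint on the single symbolic input

    -- program under test, hand-instrumented to return its path condition too
    prog :: Int -> ([Pred], Bool)           -- (path condition, did we hit the bug?)
    prog x
      | x > 10    = if x * x == 144
                      then ([(> 10), \v -> v * v == 144], True)
                      else ([(> 10), \v -> v * v /= 144], False)
      | otherwise = ([(<= 10)], False)

    -- stand-in for the SMT solver: search a small domain for a model
    solve :: [Pred] -> Maybe Int
    solve ps = find (\v -> all ($ v) ps) [-100 .. 100]

    -- one concolic round: keep the path prefix, flip the last branch taken
    nextInput :: Int -> Maybe Int
    nextInput x = let (pc, _) = prog x
                  in solve (init pc ++ [not . last pc])

    main :: IO ()
    main = do
      let x0 = 0                            -- initial concrete input, takes x <= 10
      case nextInput x0 of
        Just x1 -> print (x1, nextInput x1) -- (11, Just 12); input 12 hits the bug
        Nothing -> putStrLn "no new path"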

Session 6 - Verification and Programming Language Theory (Chair: Marco Gaboardi)

Strong-Separation Logic Jens Pagel, Florian Zuleger

(Florian gave the talk)

Key to separation logic is the separating conjunction *, which connects separate memories. In this paper they give this a new meaning called 'strong conjunction' which only allows splitting the memory into parts pointed to by stack variables. (There were some nice diagrams capturing the idea of an allocated heap location which has an outgoing pointer that may be dangling, and parts which are named by variables on the stack.)

This gives rise to: a separation algebra and soundness of the frame rule (but normal separating conjunction does this right?); agreement with the standard semantics on symbolic heap fragments (ah, so it can be used for symbolic execution); decidability; support for abduction; and some other things I missed.

Has atomic predicates: empty, points to, list segment, eq, neq.

This particular logic was shown undecidable (Demri et al. 2018) but with the strong separating conjunction it becomes decidable (PSPACE-complete)! Ah cool!

(The symbolic heap fragment removes some parts of the logic, hm. It is used by most automated verifiers.)

Florian then showed a model based on 'positive' and 'negative' chunks and the idea of an Abstract Memory State for every pair of stack and heap, which contains information about aliasing constraints, allocation of variables within chunks, etc. (this got technical and I was digesting).

One question from the audience was whether they have tried to validate the improved automation support for symbolic execution by implementing it in tools like VST: defining predicates that obey the restriction and then checking that symbolic execution leaves only a few simple side conditions (or none). Sounds like not yet, but some possible things are in the works.

An Automated Deductive Verification Framework for Circuit-building Quantum Programs (Christophe Chareton, Sébastien Bardin, François Bobot, Valentin Perrelle and Benoit Valiron).

Automatic formal verification of quantum programs in Qbricks (recall: an n-qubit state has 2^n coefficients such that the sum of their absolute squares is 1). "Strange rules": no cloning (no duplication), destructive measurement (collapses superposition), operations restricted to unitary operations (preserving the norm of the vector), which can be represented in the standard way by a matrix. Challenges: (1) the double-layer programming paradigm; (2) non-standard theories (probabilities, Kronecker product, etc.); plus what complexity of specifications and automated theorem-proving support are needed. Interesting that they follow a deductive verification framework here.

(and had to run to a meeting).
