Some notes on literature on spec synthesis.

Specification Synthesis

To Read

Literature

Guttag, J. and J. Horning (1980). "Formal Specification as a Design Tool."

The authors argue that there are at least three concerns when developing software: (i) understanding the problem to be solved, (ii) designing a solution to that problem, and (iii) implementing that solution.

The authors believe (i) and (ii) are by far the most important, and should be done by a small group of experts. They also argue that step (i) is particularly important, because it is not obvious what questions should be asked. The biggest benefit of (i) is helping us understand how to formulate the questions, rather than the answers per se.

The authors note that questions about "correctness" are difficult to answer, because the answer must be given relative to some precise specification of intent, and such a specification is seldom available.

In the paper, the authors present a preliminary version of a specification language for expressing designs.

Guttag, J., J. Horning, and J. Wing (1982). ``Some Notes on Putting Formal Specifications to Productive Use.'' In \emph{Science of Computer Programming} 2: 53--68.

This paper is just a set of personal reflections. The authors begin by stating that in their experience, using formal specifications did not help them produce better software. The problems came down to:

(a) It's difficult to establish a useful correspondence between the elegant formal systems and the untidy realities of writing and maintaining real code.

(b) It's difficult to maintain a consistent attitude towards what counts as a `good' specification.

(c) One gets bogged down in the management of large specifications.

The authors work with the following definitions. There is a syntax and a semantics, and a satisfaction relation between them (the semantics satisfies the syntax).

If the semantics satisfies the syntax, the syntax is the specification, and the semantics is the specificand (i.e., it is what is specified by the syntax).

A specification (syntax) is satisfiable (or consistent) if there is at least one semantics that satisfies it.

We can think of a specification language as providing a notation (the syntax), the objects (the semantics), and a precise rule defining when the objects satisfy the notation.
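
To make these definitions concrete, here is a minimal sketch (my own illustration, not from the paper): the "syntax" is a predicate over observable input/output behavior, the "semantics" is a program, and satisfaction is checked (only approximately) by testing. The names sort_spec, my_sort, and satisfies are hypothetical.

```python
# A toy rendering of the syntax/semantics/satisfaction picture.
# The "syntax" is a predicate over (input, output) pairs; the
# "semantics" (specificand) is any program whose behavior satisfies it.

def sort_spec(inp, out):
    """Specification: the output is the sorted version of the input."""
    return out == sorted(inp)

def my_sort(xs):
    """A candidate specificand."""
    return sorted(xs)

def satisfies(program, spec, test_inputs):
    """Approximate the satisfaction relation by testing.

    A real satisfaction check quantifies over *all* inputs; testing
    a finite sample can refute satisfaction but never establish it.
    """
    return all(spec(inp, program(list(inp))) for inp in test_inputs)

print(satisfies(my_sort, sort_spec, [(3, 1, 2), (), (5, 5, 1)]))  # True
```

On this reading, sort_spec is satisfiable (consistent): my_sort is one specificand that satisfies it.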

The authors focus on specifications whose specificands are software/programs.

For each specification language, they claim that there is a ``semantic abstraction function'' that maps elements of a semantic domain into equivalence classes (this is a homomorphism).

The authors choose the particular semantic abstraction function so that it induces an ``abstract satisfies relation'' which is such that every time a program satisfies a spec, its equivalence class satisfies that spec.

The authors make some other distinctions. The semantic abstraction function essentially limits the kind of constraints that the specification can impose on the program. Some focus on preserving the program's \emph{behavior}, while others focus on preserving the program's \emph{structure}.

The authors note that sometimes, you find the right application-oriented abstractions, and you end up only needing a few of those to really specify what you want. If you can find these abstractions, that's a big step in converting complex specifications into simpler ones (p. 57).

The authors have a section where they ask a number of questions, noting that there seems to be a range of answers rather than any `right' set of answers.

  • The authors note that they often profit more from the process of constructing a spec than from the completed result. The process helps you find inconsistencies and understand everything more precisely.

  • Constructing whole-system specifications prompts questions that should be answered by the client, rather than the implementer. So, it's often a good idea to try to construct a system specification early in the process, when questions that arise are more likely to get answered by the client or system designer than by low-level engineers.

  • What are the benefits of a spec's existence? It serves as a tangible record of what was understood at the time it was written. A spec states an agreement between vendors and clients, and perhaps even serves as a legal contract.

  • When should specs be written? Generally, as soon as the decisions that the specs record have been made. A first version of a behavioral spec should probably be done before the system itself is designed, and structural specs before the implementation because a structural spec helps implementers break the system down into units that can be implemented in pieces.

  • Who should write specs? You need skills to do it, but basically, the spec should be written as a collaboration of programmer and application expert (the client).

  • Who should read specs? You need skills to read a spec, though fewer than are needed to write one. The client should certainly understand it, though they may need an interpreter. The developers and designers of the system should also read it.

  • What properties should specs be evaluated by? Specs need not be logically complete, but they should not conceal oversights. (This is very hard to ensure in practice.) In evaluating specs, specifiers unfortunately do not have many good definitions of interesting properties, so they usually have to fall back on something much more low-level and specific to the case at hand.

What about tools? At the time of writing, the authors noted that the available tools were a significant bottleneck in the application development life cycle. There are two kinds of tools: mental ones and `metal' ones. Mental tools include specification languages, methods, training, tactics, and most importantly, \emph{experience}. Still, a good set of software tools can help with the mental tools.

The authors note that a program might need to satisfy many different kinds of specifications (e.g., some dealing with their algorithmic complexity, others dealing with their parameters, others dealing with their running time, and so on). The authors suggest that it is unlikely that a single spec language can encode all of them at once (that wouldn't be a very readable spec). So, they suggest using many different specs in many languages, and then we have a program that satisfies a \emph{collection} of specs. The implication here is that we might end up with a ``full-fledged family of specification languages'' (p. 63).

A key issue here is the composability of specs. The meaning of combinations needs to be well-defined.

The authors themselves claimed to be developing a family of spec languages that includes one shared spec language and many interface languages. They say the shared spec language is a lot like common mathematics, and that common abstractions should be developed in that common language, while more specific stuff can be done in the interface languages.

What do the authors want to see in the future?

  • Syntax and type checker for spec languages, to catch mistakes and errors.

  • Semantic checkers, using mechanical theorem provers to automatically check that various semantic properties are present (or absent) and warn the user accordingly.

  • A library of properties, so that others can use them.

  • An editor, so that users can more easily write specs.

  • A viewer, to make specs easier to browse and read.

J. Wing (Date?). "Helping Specifiers Evaluate Their Specifications."

The author notes that one of the main problems that a specifier has is that when they are developing a spec incrementally, they get no feedback telling them whether their spec is ``on the right track.'' Such feedback would obviously be helpful.

One idea is to give the specifier a set of ``checkers.'' For instance, a syntax and type checker could help them find simple syntactic and type mistakes.

In the paper, the author discusses four different properties of specs that a ``checker'' might check.

The author follows their work in Wing 1983, which takes a two-tiered approach. The two-tiered approach distinguishes between describing specification abstractions, and describing more concrete state transformations. The specification has a ``shared language component'' (for describing abstractions) and an ``interface language component'' (for describing state transformations).

(FWIW, the spec language used is Larch, and it falls under the algebraic style of specification.)

The first property of a specification that Wing considers is consistency. There could be a ``checker'' that checks if the spec is consistent or not. If it's not, then no program could possibly satisfy it.

The second property is completeness. This is not logical completeness. The author expects specs to be incomplete. The real task is to identify when the user has ``said enough'' in their spec. So, Wing proposes that a ``checker'' could check for three things: full-coverage, determinism, and protection. (A toy sketch of the first two checks appears after the list below.)

  • Full-coverage checks that the spec covers all cases (all reachable states), warning the user about any cases it misses.

  • Determinism checks that a spec is deterministic (for each input, exactly one set of output values satisfies the post-condition).

  • Protection basically makes sure that the incompleteness of one part of a spec does not bleed through to another part (its parts are modular and insulated).
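
Here is a toy sketch of the first two checks, under a deliberately simplified model of my own (nothing like Wing's Larch machinery): a spec is a list of (guard, post-condition) pairs over a small finite domain, so full-coverage and determinism can be checked by brute-force enumeration.

```python
# Toy model: a spec over a small finite integer domain is a list of
# (guard, post) pairs; post relates an input to the outputs it permits.

DOMAIN = range(-5, 6)

spec = [
    (lambda x: x >= 0, lambda x, y: y == x),    # non-negative inputs: identity
    (lambda x: x < 0,  lambda x, y: y == -x),   # negative inputs: negate
]

def uncovered(spec):
    """Full-coverage check: inputs that satisfy no guard at all."""
    return [x for x in DOMAIN if not any(g(x) for g, _ in spec)]

def nondeterministic(spec):
    """Determinism check: inputs whose post-conditions admit != 1 output."""
    bad = []
    for x in DOMAIN:
        outs = {y for g, p in spec for y in DOMAIN if g(x) and p(x, y)}
        if len(outs) != 1:
            bad.append(x)
    return bad

print("uncovered inputs:", uncovered(spec))                # []
print("nondeterministic inputs:", nondeterministic(spec))  # []
```

Dropping the second guard would make `uncovered` report the negative inputs; adding an overlapping pair with a different post would make `nondeterministic` flag the overlap.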

Goguen, J. and J. Meseguer (1982). "Security Policies and Security Models."

I believe that this is the paper where the authors define non-interference (the property that high users' actions have no effect on what low users can observe when they use the system).

Sannella, D. and M. Wirsing (1999). "Specification Languages." In Astesiano, E. et al (Eds.), Algebraic Foundations of Systems Specification, pp. 243-272.

This is an overview of the design decisions that go into specification languages.

Clerici, S., R. Jimenez, and F. Orejas (??) "Semantic Constructions in the Specification Language Glider."

The authors note that the process of writing a specification is never linear. Spec writers are always dealing with an incomplete specification (they only know part of the program, namely the bits the customer has told them about). They often have to go back to the customer, and they learn more each time, and ever expand and revise their spec.

In past work, the authors studied the refinement relation in specifications, and concluded that we have an inheritance relation similar to that of OOP. Goguen 1991 "Types as Theories" has also noted this.

Glider is an algebraic specification language in the tradition of Clear.

Goguen, J. (1991) "Types as Theories." In Topology and Category Theory in Computer Science. Oxford: Clarendon Press.

Reger, G. (2011) "An Overview of Specification Inference." Report, University of Manchester. http://www.cs.man.ac.uk/~regerg/papers/siOverview.pdf

Ch. 1

Reger says there are the following motivations for specification inference:

  • Program understanding, so we can better understand what a program actually does
  • Checking program correctness, so we can make sure it does what it is supposed to do
  • Controlling programs, e.g., having a tool that sits next to the operating program and ensures that it is working according to spec

Reger covers more motivations (and gives literature references) in section 3.5.

In the software development process, Reger mentions the following places where spec inference can be useful:

  • checking whether a program is written well
  • replacing legacy systems (we need to know exactly what they do, in order to replace them with systems that do the same thing)
  • helping when making small changes, to make sure regressions haven't been introduced
  • serving as a feedback tool

Reger notes on p. 2 that:

  • the spec inference problem is NP-hard (for many good descriptions of the problem)
  • there are approaches that can weaken this hardness
  • grammar inference techniques for automata are very advanced, but techniques for inferring more expressive models than automata are underdeveloped
  • grammar inference techniques have been used a few times to infer specifications, and they have been successful
  • spec mining techniques are new (as of 2011) and are limited
  • a few approaches infer parametric specs (trace specs with free variables), but these are limited (as of 2011)

Ch. 2

Reger says that a formal program spec is a mathematical model of some (intended or forbidden) program behavior.

We can model the functional behavior, or the performance. We can model the observable behavior, or the internal behavior. We can use states to do all of this, for example.

We can say that the behavior of a program is the set of states it can pass through.

Then we can specify a predicate over those states. The predicate might be over the memory states, or the transitions. So, Reger says specs are either state or trace specs:

  • A state spec is a function P : State -> D, where D is a verdict domain. So, to each state, P assigns a verdict. These are often referred to as state invariants.

  • A trace spec is a function P : Trace -> D, where D is a verdict domain. So, to each trace, P assigns a verdict. These are often referred to as protocol specs, typestate properties, or interaction invariants.

Some specs are positive (they state valid behavior), others are negative (they state violations). Positive and negative specs are not necessarily complements of each other, since specs may be incomplete (they can leave some "don't care" behaviors unclassified). And the spec language itself might not be closed under complement (the complement of a spec may not be expressible in the spec language).
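
A minimal sketch of the two kinds of spec (my own toy encoding, with a boolean verdict domain D): states as dicts, traces as sequences of events.

```python
# Verdict domain D = {True, False} for simplicity.

# State spec P : State -> D, e.g. the state invariant "x is non-negative".
def state_spec(state):
    return state["x"] >= 0

# Trace spec P : Trace -> D, e.g. a protocol property over events:
# every "read" must happen while the resource is open.
def trace_spec(trace):
    opened = False
    for event in trace:
        if event == "open":
            opened = True
        elif event == "close":
            opened = False
        elif event == "read" and not opened:
            return False
    return True

print(state_spec({"x": 3}))                   # True
print(trace_spec(["open", "read", "close"]))  # True  (permitted behavior)
print(trace_spec(["read", "open"]))           # False (a protocol violation)
```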

Spec languages can come in a variety of flavors:

  • Model-based - an abstract model of program states and state changing operations, described axiomatically through pre- and post-conditions. VDM or Z are examples of such spec languages, and these are often part of a tool chain that starts with specs and then uses refinement to get an implementation.

  • Finite-state-based - states and transitions between states, e.g., a deterministic finite automaton.

  • Process-algebra-based - CSP and CCS are good examples; they are labelled transition systems. We have bisimulation (and trace equivalence) for comparing these models.

  • Hybrid - models that represent both discrete states and continuous behaviors, useful for modeling physical things.

  • Algebraic - these are algebraic specifications.

Ch. 3

Checking whether a program p is correct is an act of deduction - from the fact that the program exhibits behaviors X and the fact that the correct behavior is supposed to be Y, we can deduce whether or not p is correct.

Inferring a spec is an act of induction - from the fact that the program exhibits behaviors X, and from the fact (or perhaps assumption) that the program is correct, we need to infer what the spec is. It is a case of Inductive Inference, or more generally, a case of Generalization.

Usually, then, we can only infer an approximation. If we want to infer an exact spec, the program had better contain enough information to allow it.

Often, we may guess a number of candidate (approximated) solutions. Then we need some means to select the good ones from the bad ones. Reger mentions Ockham's razor, where we select the simplest model (we can use Kolmogorov complexity to describe the simplicity, or we can count the number of states required, or whatever). I might also add that we could perhaps ask a human to select the best candidate, or perhaps to remove bad candidates.

One can map the different techniques along the following axes:

  • Grammar Inference, or Specification Mining

Grammar Inference began with natural language processing in the 1950s and 60s. The goal is to ingest samples of word usage and try to infer a grammar or language model. The field is very diverse.

Reger takes Specification Mining to mean the attempt to infer or mine a specification from program artefacts. Artefacts might be source code or execution traces.

  • Given or Requested Data

An approach works with given data if it starts from data given to it, and has no control over what is in that data. An approach works with requested data if it can request data that it might want to see.

  • Approximate or exact

Whether the inferred spec is approximate or exact

  • Passive or active

An approach is passive if it cannot guide which data is provided to it, and it is active if it can. So, often an active approach also utilizes requested data, while a passive approach utilizes given data. However, a passive approach may also work with requested data.

There are different approaches to formalizing the problem. Let S be a sample comprised of the union of S+ and S-, where S+ are good traces and S- are bad traces. If there is a trace that is in both S+ and S-, it's called noise.

For Grammar Inference, the basic form of the problem is stated roughly like this: find a language L(M) such that all S+ belong to L(M) and all S- do not belong to L(M).

For Specification Mining, the basic form of the problem is stated roughly like this: given the set of all traces of interactions with an API or ADT, with C being the subset of correct ones, and given a sample T of traces, find an automaton that generates C.
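
A sketch of the simplest ingredient of this style of grammar inference (my own toy version): build a prefix-tree acceptor (PTA) from S+ and check that it rejects everything in S-. Real algorithms like RPNI then merge PTA states to generalize beyond the sample; this version does no merging, so it accepts exactly S+.

```python
# Build a prefix-tree acceptor from the positive sample S+, then check
# that it rejects every trace in the negative sample S-.

def build_pta(positives):
    trans, accepting, n_states = {}, set(), 1  # state 0 is the root
    for trace in positives:
        state = 0
        for sym in trace:
            if (state, sym) not in trans:
                trans[(state, sym)] = n_states
                n_states += 1
            state = trans[(state, sym)]
        accepting.add(state)
    return trans, accepting

def accepts(pta, trace):
    trans, accepting = pta
    state = 0
    for sym in trace:
        if (state, sym) not in trans:
            return False
        state = trans[(state, sym)]
    return state in accepting

S_pos = [("open", "read", "close"), ("open", "close")]
S_neg = [("read",), ("open", "read")]

pta = build_pta(S_pos)
print(all(accepts(pta, t) for t in S_pos))  # True: all of S+ accepted
print(any(accepts(pta, t) for t in S_neg))  # False: all of S- rejected
```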

Reger gives an informal statement of the problem: it's "finding a model which describes the program that the programmer intended to create" (p. 17).

Reger also notes that in order to evaluate approaches, it is useful to be able to measure the distance between models. In particular, we want to measure the distance between the hidden model and the inferred model, so we can tell "how far away" our inferred model is from the real (hidden) one.

Grammar inference systems often (at the time of writing) measured by how many strings are accepted from a test set. Mining techniques tend to look at precision and recall.

The author then goes on to describe the basic ideas behind how grammar inference works, via all the automaton stuff.

The author also describes spec mining techniques. One such approach is the "predicate mining" of Ramanathan et al. On this approach, data-flow predicates are inferred that capture values held by variables at call points, as well as control-flow predicates that capture ordering between procedure calls. A flow analysis is used to build these predicates and associate them with program statements. See:

  • Murali Krishna Ramanathan, Ananth Grama, and Suresh Jagannathan. Static specification inference using predicate mining. In PLDI ’07: Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation, pages 123–134

Shemer, R., A. Gurfinkel, S. Shoham, and Y. Vizel (2019). "Property Directed Self-Composition." In \emph{CAV 2019}, ed. I. Dillig and S. Tasiran, pp. 161-179.

The authors deal with verifying k-safety properties (hyperproperties), which relate k interacting executions of a program. They do this through self-composition (product programs).

These authors developed a technique where they actually infer the composition function as they proceed with the verification.

They note that the closest related work is the one using Cartesian Hoare Logic.

Notation.

T stands for a transition system (S, R, F), where S is a set of states, R is a transition relation (a subset of S x S), and F is a set of terminal states. Terminal states are states whose only outgoing transition loops back to themselves.

A trace (an execution) is a sequence pi = s0, s1, ... such that for every i >= 0, the transition (si, si+1) is in R. The trace is a terminating trace if at some point its tail has the form si, si, ... (i.e., it reaches a terminal state and the transition keeps looping back to that same terminal state).

Such transition systems are represented using logical formulas over sets of variables (that represent program variables). V is the set of variables. The set of terminal states is represented by one formula over V, and the transition relation is represented by a formula over the disjoint union of V + V', where V is the set of variables in the pre-state, and V' is the set of corresponding variables in the post-state.

A safety property is a pair (pre, post), where pre and post are formulas over V. They represent subsets of states, namely the subset that is the pre-condition and the subset that is the post-condition. We say that the transition system T satisfies the safety property (pre, post), which we denote as T |= (pre, post), if every terminating trace pi of T that starts in a state s0 with s0 |= pre ends in a state s with s |= post. That is, for every state s in S that is reachable in T from a state satisfying pre, we have that s |= F -> post (i.e., if s is terminal, it satisfies the post-condition).

An inductive invariant for a transition system T and a safety property (pre, post) is a formula Inv with these properties:

(1) pre => Inv (initiation)
(2) Inv and R => Inv' (consecution)
(3) Inv => (F -> post) (safety)

In these conditions, phi => psi represents the validity of phi -> psi, i.e., that every state s with s |= phi also has s |= psi. In condition (2), where the consequent is primed, this means that for every s with s |= Inv and every s' with (s, s') in R, we have s' |= Inv. Also, phi' denotes phi(V'), i.e., the formula we get by starting with phi and substituting every v in phi with the corresponding v' (the post-state variables).

Inductive invariants are used to verify safety properties like this: if there exists an inductive invariant for T and (pre, post), then T |= (pre, post).

(I suppose CBAT does something like this. The user states a property in terms of (pre, post), and then CBAT constructs an invariant formula Inv from that, and then it asks Z3 if there is any way to falsify that invariant. If Z3 can't falsify it, that means the invariant is valid, and hence the program (transition system T) satisfies (pre, post).)
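
Here is a concrete sketch of that recipe using Z3's Python bindings (my own toy example, not CBAT itself): a counter that steps from 0 to 10, with Inv := 0 <= x <= 10. Each of the three conditions is checked by asking Z3 to falsify it; unsat means the condition is valid.

```python
from z3 import Int, Implies, And, If, Not, Solver, unsat

x, xp = Int("x"), Int("xp")  # xp plays the role of x' (the post-state)

pre  = x == 0
R    = If(x < 10, xp == x + 1, xp == x)  # step, looping at the terminal state
F    = x == 10                           # terminal states
post = x == 10
inv  = And(0 <= x, x <= 10)
invp = And(0 <= xp, xp <= 10)            # Inv', i.e. Inv over primed variables

def valid(formula):
    """A formula is valid iff its negation is unsatisfiable."""
    s = Solver()
    s.add(Not(formula))
    return s.check() == unsat

print(valid(Implies(pre, inv)))                # (1) initiation
print(valid(Implies(And(inv, R), invp)))       # (2) consecution
print(valid(Implies(inv, Implies(F, post))))   # (3) safety
```

All three checks print True, so Inv witnesses T |= (pre, post) for this toy system.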

A k-safety property refers to k interacting traces of T. It is just like an ordinary property, in that it is defined as a pair (pre, post). However, it is defined over the disjoint union V1 + ... + Vk of k copies of the variables, where Vi denotes the ith copy of the program variables.


Srinivas, Y. and R. Jullig (1995). "Specware: Formal Support for Composing Software." In Mathematics of Program Construction, Third International Conference, MPC '95, ed. B. Moller. Springer, pp. 399-422.

The authors describe their tool Specware, which they developed at the Kestrel Institute. The tool is essentially a program (with a GUI) that lets you write and compose specifications, and then refine those specs to generate programs. At the time of writing, the tool can generate Lisp programs, but the authors see no reason why Specware couldn't be extended to generate programs in other languages.

The background theory that Specware implements is based mainly on Goguen and Burstall's work on specification composition using colimits, as well as Srinivas's work on sheaves, along with the notions of specification morphisms, interpretations and interpretation morphisms, and diagram refinements.

The sort of specifications at play here are algebraic specifications.

Note the tools mentioned in the bibliography:

  • Bauer, F. L., et al (1985, 1987). The Munich Project CIP (Vols I and II).
  • Blaine, L. and A. Goldberg (1991). "DTRE -- A Semi-automatic Transformation System." In Constructing Programs from Specifications, ed. B. Moller. North Holland, pp. 165-204.
  • Goguen, J. and R. Burstall (1980). "CAT, a System for the Structured Elaboration of Correct Programs from Structured Specifications." Tech Report.
  • Goguen, J. and T. Winkler (1988). "Introducing OBJ3." Tech Report.
  • Jones, C. (1986) Systematic Software Development Using VDM. Prentice Hall. 1986.
  • Smith, D. R. (1990) "KIDS -- A Semi-Automatic Program Development System." IEEE Transactions on Software Engineering, Special Issue on Formal Methods of Software Engineering 16, 9, pp. 1024-1043.
  • Clear

And papers in the bibliography:

  • Bird, R. (1987) ``A Calculus of Functions for Program Derivation.'' Tech Report.
  • Burstall, R. and J. Goguen (1977). "Putting Theories Together to Make Specifications." In Proceedings of the Fifth International Joint Conference on Artificial Intelligence, IJCAI 1977, pp. 1045-1058.
  • Gilham, L.-M., et al. (1989). "Toward Reliable Reactive Systems." In Proceedings of the 5th International Workshop on Software Specification and Design.
  • Jullig, R. (1993) "Applying Formal Software Synthesis." IEEE Software 10 3, pp. 11-22.
  • Sannella, D. and A. Tarlecki (1988). "Specifications in an Arbitrary Institution." In Inf. and Comput. 76, pp. 165-210.
  • Smith, D. R. (1993). "Constructing Specification Morphisms." Journal of Symbolic Computation, Special Issue on Automatic Programming 15, 5-6, pp. 571-606.
  • Smith, D. R. and M. Lowry (1990). "Algorithm Theories and Design Tactics." Science of Computer Programming 14, 2-3, pp. 305-321.
  • Srinivas, Y. (1993) "A Sheaf-Theoretic Approach to Pattern Matching and Related Problems." Theoretical Comput. Sci. 112, pp. 53-97.

Albarghouthi, A., I. Dillig, and A. Gurfinkel (2015). ``Maximal Specification Synthesis.'' ACM.

Often, when verifying a program, you have to infer specifications for unknown procedures (say, in libc or something). Suppose we have a program P that calls procedures F1, F2, and so on. We want to know: what are the minimal assumptions about those procedures that we need to make in order to still prove P correct? The weakest preconditions (assumptions) are the maximal (most permissive) specification. If we made the preconditions stronger, we'd needlessly restrict the set of possibilities for reaching the postcondition. So, the authors call this problem the ``maximal specification inference'' problem.

They point to a number of places that maximal specification inference shows up, under many guises.

  • Verifying open programs - i.e., figuring out specs for big proprietary libraries. Some researchers are attempting to automatically generate the specs for library functions, and the authors mention papers 7, 12, 48, 54.

    • 7 - R. Alur, P. Černý, P. Madhusudan, and W. Nam (2005). Synthesis of interface specifications for Java classes. In POPL, 2005.
    • 12 - O. Bastani, S. Anand, and A. Aiken (2015). Specification inference using context-free language reachability. In POPL, 2015.
    • 48 - M. K. Ramanathan, A. Grama, and S. Jagannathan (2007). Static specification inference using predicate mining. PLDI ’07, 2007.
    • 54 - H. Zhu, T. Dillig, and I. Dillig (2013). Automated inference of library specifications for source-sink property verification. In APLAS, 2013.

But see also:

  • 13 - N. E. Beckman and A. V. Nori. Probabilistic, modular and scalable inference of typestate specifications. In PLDI, 2011.

  • 20 - S. Blackshear and S. K. Lahiri. Almost-correct specifications: A modular semantic framework for assigning confidence to warnings. In PLDI, 2013.

  • 23 - A. Cimatti, A. Griggio, S. Mover, and S. Tonetta. Parameter synthesis with IC3. In FMCAD, 2013.

  • 27 - A. Das, S. K. Lahiri, A. Lal, and Y. Li. Angelic verification: Precise verification modulo unknowns. In CAV, 2015.

  • 30 - I. Dillig, T. Dillig, and A. Aiken. Automated error diagnosis using abductive inference. PLDI, 2012.

  • 36 - K. Hoder and N. Bjørner. Generalized property directed reachability. In SAT, 2012.

  • 41 - B. Livshits, A. V. Nori, S. K. Rajamani, and A. Banerjee. Merlin: Specification inference for explicit information flow problems. In PLDI, 2009.

  • 45 - J. W. Nimmer and M. D. Ernst. Automatic generation of program specifications. In ISSTA, 2002.

  • 49 - M. N. Seghir and D. Kroening. Counterexample-guided precondition inference. In ESOP, 2013.

  • 50 - S. Shoham, E. Yahav, S. Fink, and M. Pistoia. Static specification mining using automata-based abstractions. ISSTA, 2007.

  • 53 - M. Veanes, N. Bjørner, L. Nachmanson, and S. Bereg. Monadic decomposition. In CAV, 2014.

  • Compositional verification - start at main, infer maximal specs for the callees, then proceed recursively with those, and so on.

  • Modular program synthesis - you can treat a hole in a sketch as a call to an unknown function, and then you can infer the maximal spec for that function, and then use off-the-shelf synthesis tools to synthesize it.

  • Input-filter synthesis - Input-filter synthesizers try to synthesize a program that simply rejects bad inputs (instead of crashing). We can treat the filter as an unknown function at the start of the program that accepts or rejects an input; then we can infer the maximal spec for it, and then synthesize it.

  • State games and program repair - We can treat player moves as unknown functions and then infer them, and likewise for program repair.

The authors say that they will handle the maximal specification inference problem by using CEGIS and something called multi-abduction.

Traditional abduction takes a pair of formulas X, C and an unknown predicate R(x), and it infers a precondition Phi (an interpretation of R) such that (Phi and X) is consistent (i.e., Phi and X |/= false) and (Phi and X) |= C. In other words, it tries to find an explanatory hypothesis which, together with the known facts X, is sufficient to imply the conclusion C.

The authors use multi-abduction, which generalizes traditional abduction by handling multiple unknown predicates rather than just the one.

The authors combine this multi-abduction with CEGIS to synthesize the explanatory hypothesis, which is the weakest precondition (maximal spec).
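
A toy sketch in the spirit of this approach (my own example; the paper's multi-abduction and CEGIS machinery is far more general): with known facts X := (y == x + 1) and goal C := (y > 0), enumerate candidates from a template Phi := (x >= c), weakest first, and use Z3 to check consistency and sufficiency.

```python
from z3 import Int, Implies, And, Not, Solver, sat, unsat

x, y = Int("x"), Int("y")
X = (y == x + 1)   # known facts about the program
C = (y > 0)        # conclusion the precondition must guarantee

def valid(f):
    s = Solver(); s.add(Not(f)); return s.check() == unsat

def consistent(f):
    s = Solver(); s.add(f); return s.check() == sat

# Enumerate template instances Phi := (x >= c), weakest first, and keep
# the first one that is consistent with X and sufficient to imply C.
for c in range(-5, 6):
    phi = (x >= c)
    if consistent(And(phi, X)) and valid(Implies(And(phi, X), C)):
        print("weakest precondition in the template: x >=", c)  # x >= 0
        break
```

Because the candidates are tried weakest-first, the first success is the most permissive spec expressible in the template.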

Summers, A. and P. Mueller (2018). ``Automating Deductive Verification for Weak-Memory Programs (extended version).''

The authors encode a number of separation-style logics that handle weak memory models into Viper, in order to automate a fair amount of verification for weak-memory C11 programs.

The approach requires code authors to annotate methods and threads with pre- and post-conditions, and also to annotate access permissions at allocation calls. Viper can then take these inputs and verify them.

Bart Jacobs (2014) has a similar approach for verifying TSO programs with VeriFast. But Jacobs' approach requires many more annotations, so this approach seems better.

Heule, S., I. Kassios, P. Mueller, and A. Summers (2013). ``Verification Condition Generation for Permission Logics with Abstract Predicates and Abstract Functions.'' In G. Castagna, Ed., ECOOP 2013, LNCS 7920, pp. 451-476.

The authors note VCG (Verification Condition Generation) papers 4, 7, 12, 17, 18, and 20.

  • 4 is the Spec# paper.
  • 7 is the VCC (Verifying Concurrent C) paper
  • 12 is Why3
  • 17 is about verifying OOP software
  • 18 is the Boogie 2 paper
  • 20 is about verifying multi-threaded programs

But note also:

  • 21: Chalice paper
  • 26: The VeriCool tool

The authors claim that their main contribution is that they provide an encoding of abstract predicates and abstract functions that is sound, and amenable to SMT solvers.

This they do. The user still needs to write specs as assertions in the program though (the authors build their implementation on top of Chalice).

Steinhofel, D. (2020). "Abstract Execution: Automatically Proving Infinitely Many Programs." Ph.D. dissertation, Technische Universität Darmstadt.

Extends symbolic execution to abstract programs, and develops MTL (Modal Trace Logic), a logic with a trace semantics, that can be used to write specifications for abstract programs.

Similar to Specware, users can write Java-like programs at an abstract level, and they can add specification annotations that describe how refinement can work.

Steinhofel built a tool for Java called REFINITY, which shows two program fragments side by side and begins with a simple proof goal: result_1 == result_2. Then it makes a proof attempt, which usually fails. But it presents the failed proof obligations, and that can help the user further refine what they want.

Huisman, M., S. Blom, S. Darabi, and M. Safari (2018). "Program Correctness by Transformation." In T. Margaria and B. Steffen (Eds.): ISoLA 2018, LNCS 11244, pp. 365–380.

The authors argue that it is easier to verify high-level code than low-level. Their proposal to help verify lower-level code is this: add annotations to the higher-level, and transformations (say in the compiler) should be made annotation-aware. Transformations can then refine the code (and even the annotations). In this way, if the high level code can be verified, so can the lower level code.

Kamburjan, E., and N. Wasser (2022). "Deductive Verification of Programs with Underspecified Semantics by Model Extraction." At arxiv only right now.

Also:

Kamburjan, E. and N. Wasser (2022). "The Right Kind of Non-Determinism: Using Concurrency to Verify C Programs with Underspecified Semantics." Accessible at https://arxiv.org/pdf/2208.04630.pdf

The authors look at sequential C code and extract a distributed "Active Object" model. Along the way, the process uses the ANSI/ISO C Specification Language (ACSL) to automatically construct method contracts and object invariants. Then, they automatically verify the Active Object model.

Essentially, what they are doing is this. Wherever the C semantics is underspecified, the evaluation order could be anything (since it's not specified). So, they construct all possible evaluation orders, basically, and verify that under any of them, the C code still works as expected.
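
A toy illustration of the idea (mine, and far simpler than the authors' Active Object construction): given side-effecting subexpressions whose evaluation order is left unspecified (as in f() + g() in C), enumerate every order and check whether the property of interest holds under all of them.

```python
from itertools import permutations

# Two side-effecting "subexpressions" sharing state, standing in for
# something like f() + g() in C, whose evaluation order is unspecified.
def run(order):
    state = {"n": 0}
    results = {}
    def f():
        state["n"] += 1
        results["f"] = state["n"]
    def g():
        state["n"] *= 2
        results["g"] = state["n"]
    for name in order:
        {"f": f, "g": g}[name]()
    return results["f"] + results["g"]

outcomes = {run(order) for order in permutations(["f", "g"])}
print(outcomes)  # {1, 3}: the result depends on the evaluation order,
                 # so a property like "result == 3" holds under only one
                 # of the possible orders and must be rejected.
```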

Mansouri, N. (2001). "Automated Correctness Condition Generation for Formal Verification of Synthesized RTL Designs." Ph.D. dissertation, University of Cincinnati.

The author puts together a way to automatically verify synthesized high-level RTL (Register Transfer Level) designs.

The user provides an algorithmic behavioral specification, and then the synthesizer will generate an RTL design from it. The author then automatically generates correctness conditions and uses a theorem prover to prove that the synthesized RTL design conforms to the provided specification.

There is a good deal of automation here, and there is interesting information about synthesis at the RTL level that seems pretty relevant to VIBES. There also seems to be nice discussions of various ways to think about synthesis and verification conditions (e.g., carried-based register allocation vs value-based register allocation).

The author implements these ideas by building a correctness condition generator (CCG) and integrating it into the synthesis tool ASSERTA (see chapter 11). The behavioral spec language is VHDL.

Note that, even here, the user is still required to provide a behavioral specification up front. The specification itself is not automatically generated. Only the correctness conditions are (similar to how CBAT generates verification conditions given a spec in SMT-LIB).

Soares, G., R. Gheyi, and T. Massoni. (2013) "Automated Behavioral Testing of Refactoring Engines." IEEE Transactions on Software Engineering, Vol. 39, no. 2, February 2013.

In this paper, they tested Java refactoring engines, to see how well they did.

Gilles Barthe, Pedro R. D'Argenio, and Tamara Rezk (2004). "Secure Information Flow by Self-Composition." In 17th IEEE Computer Security Foundations Workshop (CSFW-17), 28-30 June 2004, Pacific Grove, CA, USA, pages 100-114.

This is the self-composition paper that first outlines the self-composition strategy commonly used by verifiers to verify hyperproperties.
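
A minimal sketch of self-composition for a 2-safety property (noninterference), using Z3 (my own toy example, not code from the paper): duplicate the program's transition formula with renamed variables, equate the low inputs, and ask whether the low outputs can differ.

```python
from z3 import Ints, Solver, sat

low1, high1, out1, low2, high2, out2 = Ints("low1 high1 out1 low2 high2 out2")

def leaks(semantics):
    """Self-composition: can equal low inputs produce different outputs?

    `semantics(low, high, out)` is a formula relating inputs to the output.
    """
    s = Solver()
    s.add(semantics(low1, high1, out1))  # copy 1 of the program
    s.add(semantics(low2, high2, out2))  # copy 2, with renamed variables
    s.add(low1 == low2, out1 != out2)    # low-equal inputs, differing outputs
    return s.check() == sat

secure   = lambda low, high, out: out == low * 2     # ignores the high input
insecure = lambda low, high, out: out == low + high  # depends on the high input

print(leaks(secure))    # False: noninterference holds
print(leaks(insecure))  # True: the high input interferes with the output
```

The trick is that a 2-safety property of one program becomes an ordinary safety property of the composed (product) program, so standard verifiers apply.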

To Read Someday

Daniel, B., D. Dig, K. Garcia, and D. Marinov (2007). "Automated Testing of Refactoring Engines." In ESEC/FSE'07, September 3-7, 2007.

Mosses, P. (2021) "Fundamental Constructs in Programming Languages." In Leveraging Applications of Formal Methods, Verification and Validation. 10th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2021, pp. 296-321.

Wadler, P. (2021) "GATE: Gradual Effect Types." In Leveraging Applications of Formal Methods, Verification and Validation. 10th International Symposium on Leveraging Applications of Formal Methods, ISoLA 2021, pp. 335-345.
