Formal verification of a realistic compiler
Summary (5 min read)
1. Introduction
- Compilers are generally assumed to be semantically transparent: the compiled code should behave as prescribed by the semantics of the source program.
- For low-assurance software, validated only by testing, the impact of compiler bugs is low: what is tested is the executable code produced by the compiler; rigorous testing should expose compiler-introduced errors along with errors already present in the source program.
- CompCert compiles a language commonly used for critical embedded software: neither Java nor ML nor assembly code, but a large subset of the C language.
- Section 3 describes the structure of the CompCert compiler, its performance, and how the Coq proof assistant was used not only to prove its correctness but also to program most of it.
2. Approaches to trusted compilation
2.1 Notions of semantic preservation
- The authors' aim is to prove that the semantics of S was preserved during compilation.
- In all cases, behaviors also include a trace of the input-output operations (system calls) performed during the execution of the program.
- If the source language is not deterministic, compilers are allowed to select one of the possible behaviors of the source program.
- Under these conditions, there exists exactly one behavior B such that S ⇓ B, and similarly for C.
- Having proved properties (2) or (3) provides the same guarantee without having to equip the target and intermediate languages with sound type systems and to prove type preservation for the compiler.
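The notions above can be sketched in a few lines of Python. The names (`Behavior`, `preserves`) are illustrative encodings, not CompCert's; the point is property (3): if the source program does not go wrong, the compiled code must exhibit the same behavior, including its I/O trace.

```python
# Toy model of CompCert-style behaviors and semantic preservation.
# (Hypothetical names; a sketch of property (3), not the Coq development.)
from dataclasses import dataclass

@dataclass(frozen=True)
class Behavior:
    trace: tuple          # I/O events (system calls) observed during execution
    outcome: str          # "terminates", "diverges", or "wrong"

def preserves(source_behavior, compiled_behavior):
    """Property (3): for any source behavior not in Wrong,
    the compiled code must exhibit that same (unique) behavior."""
    if source_behavior.outcome == "wrong":
        return True       # the compiler may do anything for wrong programs
    return compiled_behavior == source_behavior

ok = Behavior(trace=("print 42",), outcome="terminates")
assert preserves(ok, ok)
assert preserves(Behavior((), "wrong"), Behavior(("anything",), "terminates"))
```

Note how determinism matters: because S has exactly one behavior, "preserving the behavior" is a meaningful equality rather than a set inclusion.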
2.2 Verified, validated, certifying compilers
- The authors now discuss several approaches to establishing that a compiler preserves the semantics of the compiled programs, in the sense of section 2.1.
- The authors model the compiler as a total function Comp from source programs to either compiled code (written Comp(S) = OK(C)) or a compile-time error (written Comp(S) = Error).
- Notice that a compiler that always fails (Comp(S) = Error for all S) is indeed verified, although useless.
- Validation can be performed in several ways, ranging from symbolic interpretation and static analysis of S and C to the generation of verification conditions followed by model checking or automatic theorem proving.
- Translation validation generates additional confidence in the correctness of the compiled code, but by itself does not provide formal guarantees as strong as those provided by a verified compiler: the validator could itself be incorrect.
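The structure of translation validation can be sketched as follows: an untrusted transformation is paired with a validator, and the combined compiler reports a compile-time error whenever validation fails, so the formal guarantee rests only on the validator. This is a hypothetical illustration, not CompCert's actual code.

```python
# Sketch of a translation-validation wrapper (hypothetical names).
# transform() is untrusted; validate() is the (ideally verified) checker.
def comp_validated(source, transform, validate):
    compiled = transform(source)          # untrusted pass, may be buggy
    if validate(source, compiled):        # a-posteriori check of this run
        return ("OK", compiled)
    return ("Error", None)                # fail safely rather than miscompile

# Toy pass: "compile" a list of numbers by doubling each one.
transform = lambda xs: [2 * x for x in xs]
validate = lambda s, c: c == [2 * x for x in s]   # independently re-checks
assert comp_validated([1, 2, 3], transform, validate) == ("OK", [2, 4, 6])
```

If the validator itself is proved correct (as in CompCert's verified-validator passes), the scheme gives the same end-to-end guarantee as verifying the transformation directly.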
Proof-carrying code and certifying compilers
- The proof-carrying code (PCC) approach [19, 1] does not attempt to establish semantic preservation between a source program and some compiled code.
- Instead, PCC focuses on the generation of independently-checkable evidence that the compiled code C satisfies a behavioral specification Spec such as type and memory safety.
- The proof π, also called a certificate, can be checked independently by the code user; there is no need to trust the code producer, nor to formally verify the compiler itself.
- Symmetrically, a certifying compiler can be constructed, at least theoretically, from a verified compiler, provided that the verification was conducted in a logic that follows the "propositions as types, proofs as programs" paradigm.
- The construction is detailed in [11, section 2].
2.3 Composition of compilation passes
- Compilers are naturally decomposed into several passes that communicate through intermediate languages.
- Assume that the semantic preservation property ≈ is transitive.
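The pass-composition argument can be illustrated with a small sketch (hypothetical encoding): each pass either produces a result or fails, errors propagate, and if each pass preserves semantics and the preservation relation ≈ is transitive, the whole pipeline preserves semantics.

```python
# Sketch: composing compiler passes that may fail. Assuming each pass
# preserves semantics and ≈ is transitive, the composition does too.
def compose(*passes):
    def comp(program):
        for p in passes:
            status, program = p(program)
            if status == "Error":         # any failing pass fails the compiler
                return ("Error", None)
        return ("OK", program)
    return comp

# Two toy passes over strings; each returns ("OK", result) or ("Error", None).
strip = lambda s: ("OK", s.strip())
upper = lambda s: ("OK", s.upper())
pipeline = compose(strip, upper)
assert pipeline("  hello ") == ("OK", "HELLO")
```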
2.4 Summary
- The conclusions of this discussion are simple and define the methodology the authors have followed to verify the CompCert compiler back-end.
- All intermediate languages must be given appropriate formal semantics.
- Finally, for each pass, the authors have a choice between proving the code that implements this pass or performing the transformation via untrusted code, then verifying its results using a verified validator.
- The latter approach can reduce the amount of code that needs to be verified.
3. Overview of the CompCert compiler
3.1 The source language
- The source language of the CompCert compiler, called Clight [5], is a large subset of the C programming language, comparable to the subsets commonly recommended for writing critical embedded software.
- The semantics of Clight is formally defined in big-step operational style.
- The semantics is deterministic and makes precise a number of behaviors left unspecified or undefined in the ISO C standard, such as the sizes of data types, the results of signed arithmetic operations in case of overflow, and the evaluation order.
- Other undefined C behaviors, such as dereferencing the null pointer or accessing an array out of bounds, are consistently turned into "going wrong" behaviors.
- Memory is modeled as a collection of disjoint blocks, each block being accessed through byte offsets; pointer values are pairs of a block identifier and a byte offset.
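This block-based memory model can be sketched directly (hypothetical class and method names): memory is a collection of disjoint blocks, a pointer is a (block id, byte offset) pair, and there is no flat address space through which one block could be reached from another.

```python
# Sketch of a CompCert-style block memory model (illustrative names).
class Memory:
    def __init__(self):
        self.blocks = {}
        self.next_id = 0

    def alloc(self, size):
        b = self.next_id              # fresh block id: blocks are disjoint
        self.next_id += 1
        self.blocks[b] = bytearray(size)
        return b

    def store(self, ptr, value):
        block, ofs = ptr              # a pointer is (block id, byte offset)
        self.blocks[block][ofs] = value

    def load(self, ptr):
        block, ofs = ptr              # an out-of-range ofs raises here,
        return self.blocks[block][ofs]  # modeling a "going wrong" behavior

m = Memory()
b = m.alloc(4)
m.store((b, 0), 42)
assert m.load((b, 0)) == 42
```

Because offsets are confined to their block, pointer arithmetic cannot wander from one object into another, which is exactly what makes out-of-bounds accesses detectable as "going wrong".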
3.2 Compilation passes and intermediate languages
- The formally verified part of the CompCert compiler translates from Clight abstract syntax to PPC abstract syntax, PPC being a subset of PowerPC assembly language.
- The translation from C#minor to Cminor therefore recognizes scalar local variables whose addresses are never taken, assigning them to Cminor local variables and making them candidates for register allocation later; other local variables are stack-allocated in the activation record.
- Unlike the other two optimizations, lazy code motion is implemented following the verified validator approach [24].
- Finally, the "stacking" pass lays out the activation records of functions, assigning offsets within this record to abstract stack locations and to saved callee-save registers, and replacing references to abstract stack locations by explicit memory loads and stores relative to the stack pointer.
- The final compilation pass expands Mach instructions into canned sequences of PowerPC instructions, dealing with special registers such as the condition registers and with irregularities in the PowerPC instruction set.
3.3 Proving the compiler
- The added value of CompCert lies not in the compilation technology implemented, but in the fact that each of the source, intermediate and target languages has formally defined semantics, and that each of the transformation and optimization passes is proved to preserve semantics in the sense of section 2.4.
- Coq implements the Calculus of Inductive and Coinductive Constructions, a powerful constructive, higher-order logic which supports equally well three familiar styles of writing specifications: by functions and pattern-matching, by inductive or coinductive predicates representing inference rules, and by ordinary predicates in first-order logic.
- Internally, Coq builds proof terms that are later re-checked by a small kernel verifier, thus generating very high confidence in the validity of proofs.
- Of these 42000 lines, 14% define the compilation algorithms implemented in CompCert, and 10% specify the semantics of the languages involved.
- The remaining 76% correspond to the correctness proof itself.
3.4 Programming and running the compiler
- The authors use Coq not only as a prover to conduct semantic preservation proofs, but also as a programming language to write all verified parts of the CompCert compiler.
- With some ingenuity, this language suffices to write a compiler.
- The authors use persistent data structures based on balanced trees, which support efficient updates without modifying data in-place.
- The main advantage of this unconventional approach, compared with implementing the compiler in a conventional imperative language, is that the authors do not need a program logic (such as Hoare logic) to connect the compiler's code with its logical specifications.
- The Coq functions implementing the compiler are first-class citizens of Coq's logic and can be reasoned on directly by induction, simplifications and equational reasoning.
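The persistent-data-structure style mentioned above can be illustrated with a small sketch: insertion returns a new tree that shares most of its structure with the old one, never mutating in place. (Unbalanced here for brevity; CompCert's extracted code uses balanced trees, and the names below are illustrative.)

```python
# Sketch of a persistent tree map: updates build new nodes along the
# search path and share all other subtrees with the original tree.
def insert(tree, key, value):
    if tree is None:
        return (key, value, None, None)
    k, v, left, right = tree
    if key < k:
        return (k, v, insert(left, key, value), right)   # copy path only
    if key > k:
        return (k, v, left, insert(right, key, value))
    return (key, value, left, right)     # replace; old tree is untouched

def lookup(tree, key):
    while tree is not None:
        k, v, left, right = tree
        if key == k:
            return v
        tree = left if key < k else right
    return None

t1 = insert(None, "x", 1)
t2 = insert(t1, "x", 2)      # t1 still maps "x" to 1: no in-place update
assert lookup(t1, "x") == 1 and lookup(t2, "x") == 2
```

Because old versions are never destroyed, such functions are ordinary mathematical maps, which is what lets Coq reason about them by plain equational reasoning rather than a program logic.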
3.5 Performance
- Since standard benchmark suites use features of C not supported by CompCert, the authors had to roll their own small suite, which contains some computational kernels, cryptographic primitives, text compressors, a virtual machine interpreter and a ray tracer.
- As the timings in figure 2 show, CompCert generates code that is more than twice as fast as that generated by GCC without optimizations, and competitive with GCC at optimization levels 1 and 2.
- The test suite is too small to draw definitive conclusions, but these results strongly suggest that while CompCert is not going to win a prize in high performance computing, its performance is adequate for critical embedded code.
- Compilation times of CompCert are within a factor of 2 of those of gcc -O1, which is reasonable and shows that the overheads introduced to facilitate verification (many small passes, no imperative data structures, etc.) are acceptable.
4.1 The RTL intermediate language
- Register allocation is performed over the RTL intermediate representation, which represents functions as a control-flow graph (CFG) of abstract instructions, corresponding roughly to machine instructions but operating over pseudo-registers (also called "temporaries").
- Every function has an unlimited supply of pseudo-registers, and their values are preserved across function calls.
- In the following, r ranges over pseudo-registers and l over labels of CFG nodes.
Instructions:
- Instructions include arithmetic operations op (with an important special case op(move, r, r ′ , l) representing a register-to-register copy), memory loads and stores (of a quantity κ at the address obtained by applying addressing mode mode to registers r), conditional branches (with two successors), and function calls, tail-calls, and returns.
- Internal functions are defined within RTL by their CFG, entry point in the CFG, and parameter registers.
- Functions and call instructions carry signatures sig specifying the number and register classes (int or float) of their arguments and results.
- The register state R maps pseudo-registers to their current values (discriminated union of 32-bit integers, 64-bit floats, and pointers).
- Two slightly different forms of execution states, call states and return states, appear when modeling function calls and returns, but will not be described here.
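An RTL-style execution state and its transition relation can be sketched as a tiny interpreter (hypothetical encoding: instruction tuples and a dict CFG, not CompCert's Coq definitions). A state is a program point l in the CFG together with a register state R mapping pseudo-registers to values.

```python
# Sketch of RTL-style small-step execution over a CFG of instructions.
def step(cfg, l, R):
    instr = cfg[l]
    if instr[0] == "op":                          # ("op", fn, args, dest, next)
        _, fn, args, dest, nxt = instr
        R = {**R, dest: fn(*(R[a] for a in args))}  # update one pseudo-register
        return nxt, R
    if instr[0] == "cond":                        # ("cond", fn, args, l_true, l_false)
        _, fn, args, l_true, l_false = instr
        return (l_true if fn(*(R[a] for a in args)) else l_false), R

cfg = {
    0: ("op", lambda: 10, (), "r1", 1),             # r1 := 10
    1: ("op", lambda x: x + 1, ("r1",), "r2", 2),   # r2 := r1 + 1
    2: ("cond", lambda x: x > 5, ("r2",), 3, 4),    # if r2 > 5 goto 3 else 4
}
l, R = 0, {}
for _ in range(3):
    l, R = step(cfg, l, R)
assert (l, R["r2"]) == (3, 11)
```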
4.2 The register allocation algorithm
- The goal of the register allocation pass is to replace the pseudo-registers r that appear in unbounded quantity in the original RTL code by locations ℓ, which are either hardware registers (available in small, fixed quantity) or abstract stack slots in the activation record (available in unbounded quantity).
- The dataflow equations are solved iteratively using Kildall's worklist algorithm.
- The central step of register allocation consists in coloring the interference graph, assigning to each node r a "color" ϕ(r) that is either a hardware register or a stack slot, under the constraint that two nodes connected by an interference edge are assigned different colors.
- Since this heuristic is difficult to prove correct directly, the authors implement it as unverified Caml code, then validate its results a posteriori using a simple verifier written and proved correct in Coq.
- Additionally, coalescing and dead code elimination are performed.
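The two algorithmic ingredients above can be sketched together: liveness computed by a Kildall-style worklist over the CFG, then an a-posteriori check of a register assignment in the spirit of the verified validator. Everything here is a hypothetical, simplified encoding (the check below only inspects live-in sets, not the full Chaitin interference rules).

```python
# Sketch: backward liveness by worklist iteration, then validation of
# an untrusted register assignment phi (illustrative names throughout).
from collections import deque

# CFG: node -> (uses, defs, successors). Toy three-instruction program:
#   0: a := input      1: b := a + 1      2: return b
cfg = {
    0: (set(),  {"a"}, [1]),
    1: ({"a"},  {"b"}, [2]),
    2: ({"b"},  set(), []),
}

def liveness(cfg):
    live_in = {n: set() for n in cfg}
    work = deque(cfg)
    while work:
        n = work.popleft()
        uses, defs, succs = cfg[n]
        out = set().union(*(live_in[s] for s in succs)) if succs else set()
        new_in = uses | (out - defs)          # dataflow equation for liveness
        if new_in != live_in[n]:
            live_in[n] = new_in
            # re-examine predecessors, whose live-out just changed
            work.extend(p for p in cfg if n in cfg[p][2])
    return live_in

def check_assignment(cfg, live_in, phi):
    """Validator: pseudo-registers simultaneously live must be
    mapped to distinct locations."""
    for n in cfg:
        live = sorted(live_in[n])
        locs = [phi[v] for v in live]
        if len(set(locs)) != len(locs):
            return False
    return True

li = liveness(cfg)
assert li[1] == {"a"} and li[2] == {"b"}
# a and b are never live at the same time, so they may share a register:
assert check_assignment(cfg, li, {"a": "r3", "b": "r3"})
```

The validator is far simpler than the coloring heuristic it checks, which is exactly why proving the validator correct in Coq is cheaper than proving the heuristic.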
4.3 Proving semantic preservation
- In the case of register allocation, each original transition corresponds to exactly one transformed transition, resulting in a "lock-step" simulation diagram: every step of the source program is matched by one step of the transformed program, with related states before and after.
- This requirement is much too strong, as it essentially precludes any sharing of a location between two pseudo-registers whose live ranges are disjoint.
- Once the relation between states is set up, proving the simulation diagram above is a routine case inspection on the various transition rules of the RTL semantics.
- In doing so, one comes to the pleasant realization that the dataflow inequations defining liveness, as well as Chaitin's rules for constructing the interference graph, are the minimal sufficient conditions for the invariant between register states R, R ′ to be preserved in all cases.
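The weakened invariant suggested by the liveness discussion can be sketched as a small check (hypothetical encoding): the transformed register state R′ need only agree with the original state R on pseudo-registers that are live at the current point, which is what allows two disjoint live ranges to share one location.

```python
# Sketch of the register-state invariant used in the simulation proof:
# agreement is required only on currently live pseudo-registers.
def states_agree(R, R_prime, phi, live):
    return all(R_prime[phi[r]] == R[r] for r in live)

R = {"a": 1, "b": 2}                 # original pseudo-register state
R_prime = {"r3": 2}                  # after allocation: a and b share r3
phi = {"a": "r3", "b": "r3"}
assert states_agree(R, R_prime, phi, live={"b"})          # only b live: OK
assert not states_agree(R, R_prime, phi, live={"a", "b"})  # both live: clash
```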
5. Conclusions and perspectives
- The CompCert experiment described in this paper is still ongoing, and much work remains to be done: handle a larger subset of C (e.g. including goto); deploy and prove correct more optimizations; target other processors beyond PowerPC; extend the semantic preservation proofs to shared-memory concurrency; etc.
- The preliminary results obtained so far provide strong evidence that the initial goal of formally verifying a realistic compiler can be achieved, within the limitations of today's proof assistants, and using only elementary semantic and algorithmic approaches.
Frequently Asked Questions
Q2. What are the future works mentioned in the paper "Formal verification of a realistic compiler" ?
The CompCert experiment described in this paper is still ongoing, and much work remains to be done: handle a larger subset of C (e.g. including goto); deploy and prove correct more optimizations; target other processors beyond PowerPC; extend the semantic preservation proofs to shared-memory concurrency; etc. However, the preliminary results obtained so far provide strong evidence that the initial goal of formally verifying a realistic compiler can be achieved, within the limitations of today's proof assistants, and using only elementary semantic and algorithmic approaches. The techniques and tools the authors used are very far from perfect—more proof automation, higher-level semantics and more modern intermediate representations all have the potential to significantly reduce the proof effort—but good enough to achieve the goal. Composed with the CompCert back-end, these efforts could eventually result in a trusted execution path for programs written and verified in Coq, like CompCert itself, therefore increasing confidence further through a form of bootstrapping.
Q3. What are the behaviors the authors observe in CompCert?
The behaviors the authors observe in CompCert include termination, divergence, and “going wrong” (invoking an undefined operation that could crash, such as accessing an array out of bounds).
Q4. What is the strongest notion of semantic preservation in the CompCert experiment?
In the CompCert experiment and the remainder of this paper, the authors focus on source and target languages that are deterministic (programs change their behaviors only in response to different inputs but not because of internal choices) and on execution environments that are deterministic as well (the inputs given to the programs are uniquely determined by their previous outputs).
Q5. What is the impact of compiler bugs?
For low-assurance software, validated only by testing, the impact of compiler bugs is low: what is tested is the executable code produced by the compiler; rigorous testing should expose compiler-introduced errors along with errors already present in the source program.
Q6. What is the expected correctness property of the compiler?
The expected correctness property of the compiler is that it preserves the fact that the source code S satisfies the specification, a fact that has been established separately by formal verification of S: S |= Spec =⇒ C |= Spec (4). It is easy to show that property (2) implies property (4) for all specifications Spec.
Q7. How can a certifying compiler be constructed?
A certifying compiler can be constructed, at least theoretically, from a verified compiler, provided that the verification was conducted in a logic that follows the “propositions as types, proofs as programs” paradigm.
Q8. What is the correctness proof of the compiler?
Provided the target language of the compiler has deterministic semantics, an appropriate specification for the correctness proof of the compiler is the combination of definitions (3) and (6): ∀S, C, B ∉ Wrong, Comp(S) = OK(C) ∧ S ⇓ B =⇒ C ⇓ B.
Q9. What is the way to test a compiler?
Validation by testing reaches its limits and needs to be complemented or even replaced by the use of formal methods such as model checking, static analysis, and program proof.
Q10. What is the definition of a verified compiler?
By verified, the authors mean a compiler that is accompanied by a machine-checked proof of a semantic preservation property: the generated machine code behaves as prescribed by the semantics of the source program.
Q11. What is the way to test compilers?
Bugs in the compiler used to turn this formally verified source code into an executable can potentially invalidate all the guarantees so painfully obtained by the use of formal methods.