
Xavier Leroy. Formal verification of a realistic compiler. Communications of the ACM, 52(7):107-115, July 2009. DOI 10.1145/1538788.1538814. HAL: inria-00415861, https://hal.inria.fr/inria-00415861.

Formal verification of a realistic compiler
Xavier Leroy
INRIA Paris-Rocquencourt
Domaine de Voluceau, B.P. 105, 78153 Le Chesnay, France
xavier.leroy@inria.fr
Abstract
This paper reports on the development and formal verifica-
tion (proof of semantic preservation) of CompCert, a com-
piler from Clight (a large subset of the C programming lan-
guage) to PowerPC assembly code, using the Coq proof as-
sistant both for programming the compiler and for proving
its correctness. Such a verified compiler is useful in the con-
text of critical software and its formal verification: the veri-
fication of the compiler guarantees that the safety properties
proved on the source code hold for the executable compiled
code as well.
1. Introduction
Can you trust your compiler? Compilers are generally
assumed to be semantically transparent: the compiled
code should behave as prescribed by the semantics of the
source program. Yet, compilers—and especially optimizing
compilers—are complex programs that perform complicated
symbolic transformations. Despite intensive testing, bugs
in compilers do occur, causing the compilers to crash at
compile-time or—much worse—to silently generate an
incorrect executable for a correct source program.
For low-assurance software, validated only by testing,
the impact of compiler bugs is low: what is tested is the
executable code produced by the compiler; rigorous testing
should expose compiler-introduced errors along with errors
already present in the source program. Note, however,
that compiler-introduced bugs are notoriously difficult to
expose and track down. The picture changes dramatically
for safety-critical, high-assurance software. Here, validation
by testing reaches its limits and needs to be complemented
or even replaced by the use of formal methods such as
model checking, static analysis, and program proof. Almost
universally, these formal verification tools are applied to
the source code of a program. Bugs in the compiler used to
turn this formally verified source code into an executable
can potentially invalidate all the guarantees so painfully
obtained by the use of formal methods. In a future where
formal methods are routinely applied to source programs,
the compiler could appear as a weak link in the chain that
goes from specifications to executables. The safety-critical
software industry is aware of these issues and uses a variety
of techniques to alleviate them, such as conducting manual
code reviews of the generated assembly code after having
turned all compiler optimizations off. These techniques
do not fully address the issues, and are costly in terms of
development time and program performance.
An obviously better approach is to apply formal methods
to the compiler itself in order to gain assurance that it pre-
serves the semantics of the source programs. For the last
five years, we have been working on the development of a
realistic, verified compiler called CompCert. By verified, we
mean a compiler that is accompanied by a machine-checked
proof of a semantic preservation property: the generated
machine code behaves as prescribed by the semantics of the
source program. By realistic, we mean a compiler that could
realistically be used in the context of production of critical
software. Namely, it compiles a language commonly used
for critical embedded software: neither Java nor ML nor
assembly code, but a large subset of the C language. It
produces code for a processor commonly used in embedded
systems: we chose the PowerPC because it is popular in
avionics. Finally, the compiler must generate code that is
efficient enough and compact enough to fit the requirements
of critical embedded systems. This implies a multi-pass com-
piler that features good register allocation and some basic
optimizations.
Proving the correctness of a compiler is by no means a
new idea: the first such proof was published in 1967 [16]
(for the compilation of arithmetic expressions down to stack
machine code) and mechanically verified in 1972 [17]. Since
then, many other proofs have been conducted, ranging from
single-pass compilers for toy languages to sophisticated code
optimizations [8]. In the CompCert experiment, we carry
this line of work all the way to end-to-end verification of a
complete compilation chain from a structured imperative
language down to assembly code through 8 intermediate
languages. While conducting the verification of CompCert,
we found that many of the non-optimizing translations per-
formed, while often considered obvious in the compiler lit-
erature, are surprisingly tricky to formally prove correct.
This paper gives a high-level overview of the CompCert
compiler and its mechanized verification, which uses the Coq
proof assistant [7, 3]. This compiler, classically, consists of
two parts: a front-end translating the Clight subset of C to
a low-level, structured intermediate language called Cminor,
and a lightly-optimizing back-end generating PowerPC as-
sembly code from Cminor. A detailed description of Clight
can be found in [5]; of the compiler front-end in [4]; and of
the compiler back-end in [11, 13]. The complete source code

of the Coq development, extensively commented, is available
on the Web [12].
The remainder of this paper is organized as follows. Sec-
tion 2 compares and formalizes several approaches to estab-
lishing trust in the results of compilation. Section 3 de-
scribes the structure of the CompCert compiler, its perfor-
mance, and how the Coq proof assistant was used not only
to prove its correctness but also to program most of it. For
lack of space, we will not detail the formal verification of
every compilation pass. However, section 4 provides a tech-
nical overview of such a verification for one crucial pass of
the compiler: register allocation. Finally, section 5 presents
preliminary conclusions and directions for future work.
2. Approaches to trusted compilation
2.1 Notions of semantic preservation
Consider a source program S and a compiled program C
produced by a compiler. Our aim is to prove that the seman-
tics of S was preserved during compilation. To make this
notion of semantic preservation precise, we assume given se-
mantics for the source and target languages that associate
observable behaviors B to S and C. We write S ⇓ B to
mean that program S executes with observable behavior B.
The behaviors we observe in CompCert include termination,
divergence, and “going wrong” (invoking an undefined oper-
ation that could crash, such as accessing an array out of
bounds). In all cases, behaviors also include a trace of the
input-output operations (system calls) performed during the
execution of the program. Behaviors therefore reflect accu-
rately what the user of the program, or more generally the
outside world the program interacts with, can observe.
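
For concreteness, the following minimal Coq sketch shows one way
such a domain of behaviors can be set up. The names (event, trace,
behavior) and the finite-trace treatment of divergence are
simplifications made for this sketch, not CompCert's actual
definitions, which also record the exit status of terminating runs.

Parameter event : Type.            (* one input-output operation *)
Definition trace := list event.

(* A behavior records how the program ends, together with the I/O
   trace performed up to that point.  Diverging executions may in
   reality perform infinitely many I/O operations; the finite list
   here is an approximation made for this sketch. *)
Inductive behavior : Type :=
  | Terminates : trace -> behavior
  | Diverges   : trace -> behavior
  | Goes_wrong : trace -> behavior.

(* Membership in the set Wrong of "going wrong" behaviors. *)
Definition wrong (B : behavior) : Prop :=
  match B with Goes_wrong _ => True | _ => False end.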
The strongest notion of semantic preservation during com-
pilation is that the source program S and the compiled
code C have exactly the same observable behaviors:
∀B, S ⇓ B ⟺ C ⇓ B (1)
Notion (1) is too strong to be usable. If the source lan-
guage is not deterministic, compilers are allowed to select
one of the possible behaviors of the source program. (For
instance, C compilers choose one particular evaluation or-
der for expressions among the several orders allowed by the
C specifications.) In this case, C will have fewer behaviors
than S. Additionally, compiler optimizations can optimize
away “going wrong” behaviors. For example, if S can go
wrong on an integer division by zero but the compiler elim-
inated this computation because its result is unused, C will
not go wrong. To account for these degrees of freedom in
the compiler, we relax definition (1) as follows:
S safe ⟹ (∀B, C ⇓ B ⟹ S ⇓ B) (2)
(Here, S safe means that none of the possible behaviors of
S is a “going wrong” behavior.) In other words, if S does not
go wrong, then neither does C; moreover, all observable
behaviors of C are acceptable behaviors of S.
In the CompCert experiment and the remainder of this pa-
per, we focus on source and target languages that are deter-
ministic (programs change their behaviors only in response
to different inputs but not because of internal choices) and
on execution environments that are deterministic as well
(the inputs given to the programs are uniquely determined
by their previous outputs). Under these conditions, there
exists exactly one behavior B such that S ⇓ B, and simi-
larly for C. In this case, it is easy to prove that property (2)
is equivalent to:
∀B ∉ Wrong, S ⇓ B ⟹ C ⇓ B (3)
(Here, Wrong is the set of “going wrong” behaviors.) Prop-
erty (3) is generally much easier to prove than property (2),
since the proof can proceed by induction on the execution
of S. This is the approach that we take in this work.
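
Spelled out, the equivalence argument is brief. Write B_S and B_C
for the unique behaviors of S and C. For (2) ⟹ (3): if S ⇓ B with
B ∉ Wrong, then B = B_S by determinism, so S is safe; property (2)
then gives S ⇓ B_C, hence B_C = B_S = B, that is, C ⇓ B. For
(3) ⟹ (2): if S is safe and C ⇓ B, then B_S ∉ Wrong, so property
(3) gives C ⇓ B_S; by determinism of C, B = B_S, hence S ⇓ B.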
From a formal methods perspective, what we are really
interested in is whether the compiled code satisfies the func-
tional specifications of the application. Assume that these
specifications are given as a predicate Spec(B) of the observ-
able behavior. We say that C satisfies the specifications,
and write C |= Spec, if C cannot go wrong (C safe) and
all behaviors B of C satisfy Spec (∀B, C ⇓ B ⟹ Spec(B)).
The expected correctness property of the compiler is that it
preserves the fact that the source code S satisfies the specifi-
cation, a fact that has been established separately by formal
verification of S:
S |= Spec ⟹ C |= Spec (4)
It is easy to show that property (2) implies property (4) for
all specifications Spec. Therefore, establishing property (2)
once and for all spares us from establishing property (4) for
every specification of interest.
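
The argument is direct: assume property (2) and S |= Spec, i.e.,
S safe and every behavior of S satisfies Spec. If C ⇓ B then, by
property (2), S ⇓ B; since S is safe, B ∉ Wrong, so C is safe, and
Spec(B) holds because B is also a behavior of S. Hence C |= Spec.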
A special case of property (4), of considerable historical
importance, is the preservation of type and memory safety,
which we can summarize as “if S does not go wrong, neither
does C”:
S safe ⟹ C safe (5)
Combined with a separate check that S is well-typed in a
sound type system, property (5) implies that C executes
without memory violations. Type-preserving compilation
[18] obtains this guarantee by different means: under the
assumption that S is well typed, C is proved to be well-
typed in a sound type system, ensuring that it cannot go
wrong. Having proved properties (2) or (3) provides the
same guarantee without having to equip the target and in-
termediate languages with sound type systems and to prove
type preservation for the compiler.
2.2 Verified, validated, certifying compilers
We now discuss several approaches to establishing that a
compiler preserves semantics of the compiled programs, in
the sense of section 2.1. In the following, we write S ≈ C,
where S is a source program and C is compiled code, to
denote one of the semantic preservation properties (1) to
(5) of section 2.1.
Verified compilers. We model the compiler as a total
function Comp from source programs to either compiled
code (written Comp(S) = OK(C)) or a compile-time error
(written Comp(S) = Error). Compile-time errors corre-
spond to cases where the compiler is unable to produce code,
for instance if the source program is incorrect (syntax error,
type error, etc.), but also if it exceeds the capacities of the
compiler. A compiler Comp is said to be verified if it is
accompanied with a formal proof of the following property:
∀S, C, Comp(S) = OK(C) ⟹ S ≈ C (6)
In other words, a verified compiler either reports an error or
produces code that satisfies the desired correctness property.
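
This modeling can be written down directly in Coq. The following
sketch uses illustrative names (res, comp, preserves); in the actual
CompCert development, the error case additionally carries an error
message.

Inductive res (A : Type) : Type :=
  | OK : A -> res A
  | Error : res A.
Arguments OK {A} _.
Arguments Error {A}.

Parameter source target : Type.
Parameter preserves : source -> target -> Prop.  (* the relation S ≈ C *)
Parameter comp : source -> res target.           (* the compiler *)

(* Property (6): compilation success implies semantic preservation. *)
Definition verified_compiler : Prop :=
  forall S C, comp S = OK C -> preserves S C.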

Notice that a compiler that always fails (Comp(S) = Error
for all S) is indeed verified, although useless. Whether the
compiler succeeds to compile the source programs of interest
is not a correctness issue, but a quality of implementation
issue, which is addressed by non-formal methods such as
testing. The important feature, from a formal verification
standpoint, is that the compiler never silently produces in-
correct code.
Verifying a compiler in the sense of definition (6)
amounts to applying program proof technology to the
compiler sources, using one of the properties defined in
section 2.1 as the high-level specification of the compiler.
Translation validation with verified validators. In
the translation validation approach [22, 20] the compiler
does not need to be verified. Instead, the compiler is
complemented by a validator: a boolean-valued function
Validate(S, C) that verifies the property S ≈ C a posteriori.
If Comp(S) = OK(C) and Validate(S, C) = true, the
compiled code C is deemed trustworthy. Validation can
be performed in several ways, ranging from symbolic inter-
pretation and static analysis of S and C to the generation
of verification conditions followed by model checking or
automatic theorem proving. The property S ≈ C being
undecidable in general, validators are necessarily incomplete
and should reply false if they cannot establish S ≈ C.
Translation validation generates additional confidence in
the correctness of the compiled code, but by itself does not
provide formal guarantees as strong as those provided by
a verified compiler: the validator could itself be incorrect.
To rule out this possibility, we say that a validator Validate
is verified if it is accompanied with a formal proof of the
following property:
∀S, C, Validate(S, C) = true ⟹ S ≈ C (7)
The combination of a verified validator Validate with an
unverified compiler Comp does provide formal guarantees
as strong as those provided by a verified compiler. Indeed,
consider the following function:
Comp′(S) =
  match Comp(S) with
  | Error → Error
  | OK(C) → if Validate(S, C) then OK(C) else Error
This function is a verified compiler in the sense of defini-
tion (6). Verification of a translation validator is therefore
an attractive alternative to the verification of a compiler,
provided the validator is smaller and simpler than the com-
piler.
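
Continuing the illustrative Coq sketch started above, the wrapper
and its correctness proof take only a few lines; validate and
validate_correct are assumptions standing for the untrusted
validator and its verification property (7).

Parameter validate : source -> target -> bool.

(* Property (7), assumed: the validator is verified. *)
Axiom validate_correct :
  forall S C, validate S C = true -> preserves S C.

Definition comp' (S : source) : res target :=
  match comp S with
  | Error => Error
  | OK C => if validate S C then OK C else Error
  end.

(* comp' is a verified compiler in the sense of property (6). *)
Lemma comp'_correct :
  forall S C, comp' S = OK C -> preserves S C.
Proof.
  intros S C. unfold comp'.
  destruct (comp S) as [C0 | ]; intro H.
  - destruct (validate S C0) eqn:V; inversion H; subst.
    apply validate_correct; exact V.
  - discriminate H.
Qed.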
Proof-carrying code and certifying compilers. The
proof-carrying code (PCC) approach [19, 1] does not at-
tempt to establish semantic preservation between a source
program and some compiled code. Instead, PCC focuses
on the generation of independently-checkable evidence that
the compiled code C satisfies a behavioral specification
Spec such as type and memory safety. PCC makes use of a
certifying compiler, which is a function CComp that either
fails or returns both a compiled code C and a proof π of the
property C |= Spec. The proof π, also called a certificate,
can be checked independently by the code user; there is no
need to trust the code producer, nor to formally verify the
compiler itself. The only part of the infrastructure that
needs to be trusted is the client-side checker: the program
that checks whether π entails the property C |= Spec.
As in the case of translation validation, it suffices to
formally verify the client-side checker to obtain guarantees
as strong as those obtained from compiler verification of
property (4). Symmetrically, a certifying compiler can be
constructed, at least theoretically, from a verified compiler,
provided that the verification was conducted in a logic
that follows the “propositions as types, proofs as programs”
paradigm. The construction is detailed in [11, section 2].
2.3 Composition of compilation passes
Compilers are naturally decomposed into several passes
that communicate through intermediate languages. It is
fortunate that verified compilers can also be decomposed
in this manner. Consider two verified compilers Comp₁ and
Comp₂ from languages L₁ to L₂ and L₂ to L₃, respectively.
Assume that the semantic preservation property is tran-
sitive. (This is true for properties (1) to (5) of section 2.1.)
Consider the error-propagating composition of Comp₁ and
Comp₂:

Comp(S) =
  match Comp₁(S) with
  | Error → Error
  | OK(I) → Comp₂(I)

It is trivial to show that this function is a verified compiler
from L₁ to L₃.
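
In the illustrative Coq setting used above (reusing the res type
from the earlier sketch), the whole argument fits in a short
section; pres12, pres23 and pres13 stand for the preservation
relations at each level, and pres_trans for the transitivity
assumption.

Section Compose.
  Variables L1 L2 L3 : Type.
  Variable comp1 : L1 -> res L2.
  Variable comp2 : L2 -> res L3.
  Variables (pres12 : L1 -> L2 -> Prop)
            (pres23 : L2 -> L3 -> Prop)
            (pres13 : L1 -> L3 -> Prop).

  (* Transitivity of semantic preservation, assumed as in the text. *)
  Hypothesis pres_trans :
    forall a b c, pres12 a b -> pres23 b c -> pres13 a c.

  Hypothesis comp1_correct :
    forall S I, comp1 S = OK I -> pres12 S I.
  Hypothesis comp2_correct :
    forall I C, comp2 I = OK C -> pres23 I C.

  (* Error-propagating composition of the two passes. *)
  Definition comp12 (S : L1) : res L3 :=
    match comp1 S with
    | Error => Error
    | OK I => comp2 I
    end.

  Lemma comp12_correct :
    forall S C, comp12 S = OK C -> pres13 S C.
  Proof.
    intros S C. unfold comp12.
    destruct (comp1 S) as [I | ] eqn:E1; intro H.
    - exact (pres_trans S I C (comp1_correct S I E1)
                              (comp2_correct I C H)).
    - discriminate H.
  Qed.
End Compose.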
2.4 Summary
The conclusions of this discussion are simple and define
the methodology we have followed to verify the CompCert
compiler back-end. First, provided the target language of
the compiler has deterministic semantics, an appropriate
specification for the correctness proof of the compiler is the
combination of definitions (3) and (6):

∀S, C, B ∉ Wrong, Comp(S) = OK(C) ∧ S ⇓ B ⟹ C ⇓ B
Second, a verified compiler can be structured as a com-
position of compilation passes, following common practice.
However, all intermediate languages must be given appro-
priate formal semantics.
Finally, for each pass, we have a choice between prov-
ing the code that implements this pass or performing the
transformation via untrusted code, then verifying its results
using a verified validator. The latter approach can reduce
the amount of code that needs to be verified.
3. Overview of the CompCert compiler
3.1 The source language
The source language of the CompCert compiler, called
Clight [5], is a large subset of the C programming language,
comparable to the subsets commonly recommended for
writing critical embedded software. It supports almost all
C data types, including pointers, arrays, struct and union
types; all structured control (if/then, loops, break, con-
tinue, Java-style switch); and the full power of functions,
including recursive functions and function pointers. The
main omissions are extended-precision arithmetic (long
long and long double); the goto statement; non-structured
forms of switch such as Duff’s device; passing struct and
union parameters and results by value; and functions
with variable numbers of arguments. Other features of

[Figure 1: Compilation passes and intermediate languages. The
verified pipeline is: Clight → (simplifications, type elimination)
→ C#minor → (stack pre-allocation) → Cminor → (instruction
selection) → CminorSel → (CFG construction) → RTL → (register
allocation) → LTL → (code linearization) → LTLin → (spilling,
reloading, calling conventions) → Linear → (layout of stack frames)
→ Mach → (PowerPC code generation) → PPC. Optimization passes
shown: constant propagation, CSE and LCM (over RTL), branch
tunneling, and instruction scheduling. Upstream parsing and
elaboration, and downstream assembling and linking, are not
verified.]
C are missing from Clight but are supported through
code expansion (“de-sugaring”) during parsing: side effects
within expressions (Clight expressions are side-effect free)
and block-scoped variables (Clight has only global and
function-local variables).
The semantics of Clight is formally defined in big-step op-
erational style. The semantics is deterministic and makes
precise a number of behaviors left unspecified or undefined
in the ISO C standard, such as the sizes of data types, the re-
sults of signed arithmetic operations in case of overflow, and
the evaluation order. Other undefined C behaviors are con-
sistently turned into “going wrong” behaviors, such as deref-
erencing the null pointer or accessing arrays out of bounds.
Memory is modeled as a collection of disjoint blocks, each
block being accessed through byte offsets; pointer values are
pairs of a block identifier and a byte offset. This way, pointer
arithmetic is modeled accurately, even in the presence of
casts between incompatible pointer types.
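
The flavor of this memory model is easy to convey in Coq. The
following is an illustrative sketch with invented names (block,
val, ptr_add); CompCert's actual model uses bounded machine
integers and carries considerably more structure.

Require Import ZArith.
Open Scope Z_scope.

Definition block := positive.        (* abstract block identifiers *)

Inductive val : Type :=
  | Vundef : val
  | Vint   : Z -> val                (* idealized machine integer *)
  | Vptr   : block -> Z -> val.      (* pointer = (block, byte offset) *)

(* Pointer arithmetic adjusts the offset but never changes the
   block, which is what keeps it accurate even after casts between
   incompatible pointer types. *)
Definition ptr_add (v : val) (delta : Z) : val :=
  match v with
  | Vptr b ofs => Vptr b (ofs + delta)
  | _ => Vundef
  end.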
3.2 Compilation passes and intermediate languages
The formally verified part of the CompCert compiler
translates from Clight abstract syntax to PPC abstract
syntax, PPC being a subset of PowerPC assembly language.
As depicted in figure 1, the compiler is composed of
14 passes that go through 8 intermediate languages. Not
detailed in figure 1 are the parts of the compiler that are not
verified yet: upstream, a parser, type-checker and simplifier
that generates Clight abstract syntax from C source files
and is based on the CIL library [21]; downstream, a printer
for PPC abstract syntax trees in concrete assembly syntax,
followed by generation of executable binary using the
system’s assembler and linker.
The front-end of the compiler translates away C-specific
features in two passes, going through the C#minor and Cmi-
nor intermediate languages. C#minor is a simplified, type-
less variant of Clight where distinct arithmetic operators are
provided for integers, pointers and floats, and C loops are re-
placed by infinite loops plus blocks and multi-level exits from
enclosing blocks. The first pass translates C loops accord-
ingly and eliminates all type-dependent behaviors: operator
overloading is resolved; memory loads and stores, as well as
address computations, are made explicit. The next inter-
mediate language, Cminor, is similar to C#minor with the
omission of the & (address-of) operator. Cminor function-
local variables do not reside in memory, and their address
cannot be taken. However, Cminor supports explicit stack
allocation of data in the activation records of functions. The
translation from C#minor to Cminor therefore recognizes
scalar local variables whose addresses are never taken, as-
signing them to Cminor local variables and making them
candidates for register allocation later; other local variables
are stack-allocated in the activation record.
The compiler back-end starts with an instruction se-
lection pass, which recognizes opportunities for using
combined arithmetic instructions (add-immediate, not-and,
rotate-and-mask, etc.) and addressing modes provided by
the target processor. This pass proceeds by bottom-up
rewriting of Cminor expressions. The target language is
CminorSel, a processor-dependent variant of Cminor that
offers additional operators, addressing modes, and a class of
condition expressions (expressions evaluated for their truth
value only).
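
To illustrate bottom-up rewriting in the style of a "smart
constructor", here is a toy Coq sketch; the expression types and
the sel_add function are invented for the example and are far
simpler than Cminor and CminorSel. A real selector would also fold
additions of two constants.

Require Import ZArith.

Inductive expr : Type :=             (* source-level expressions *)
  | Econst : Z -> expr
  | Evar   : nat -> expr
  | Eadd   : expr -> expr -> expr.

Inductive sexpr : Type :=            (* selected expressions *)
  | Sconst  : Z -> sexpr
  | Svar    : nat -> sexpr
  | Sadd    : sexpr -> sexpr -> sexpr
  | Saddimm : Z -> sexpr -> sexpr.   (* combined add-immediate *)

(* Smart constructor: emit add-immediate when one argument is a
   known constant, a plain add otherwise. *)
Definition sel_add (a b : sexpr) : sexpr :=
  match a, b with
  | Sconst n, _ => Saddimm n b
  | _, Sconst n => Saddimm n a
  | _, _ => Sadd a b
  end.

(* Bottom-up rewriting: select subexpressions first, then combine. *)
Fixpoint sel (e : expr) : sexpr :=
  match e with
  | Econst n => Sconst n
  | Evar x => Svar x
  | Eadd a b => sel_add (sel a) (sel b)
  end.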
The next pass translates CminorSel to RTL, a classic
register transfer language where control is represented as a
control-flow graph (CFG). Each node of the graph carries
a machine-level instruction operating over temporaries
(pseudo-registers). RTL is a convenient representation to
conduct optimizations based on dataflow analyses. Two
such optimizations are currently implemented: constant
propagation and common subexpression elimination, the
latter being performed via value numbering over extended
basic blocks. A third optimization, lazy code motion,
was developed separately and will be integrated soon.
Unlike the other two optimizations, lazy code motion is
implemented following the verified validator approach [24].
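
The essence of RTL can be sketched in Coq as follows; the
instruction set here is invented and much reduced, and CompCert's
actual RTL represents the graph as an efficient finite map over
positive node labels rather than a function.

Require Import PArith.

Definition node := positive.   (* labels of CFG nodes *)
Definition reg  := positive.   (* temporaries (pseudo-registers) *)

(* Each instruction operates over temporaries and names its
   successor node(s) explicitly, which is what makes dataflow
   analyses convenient at this level. *)
Inductive instruction : Type :=
  | Inop    : node -> instruction                  (* skip; one successor *)
  | Iadd    : reg -> reg -> reg -> node -> instruction
                                 (* dst := src1 + src2; one successor *)
  | Icond   : reg -> node -> node -> instruction   (* branch; two successors *)
  | Ireturn : reg -> instruction.                  (* no successor *)

(* A function body: a partial map from node labels to instructions. *)
Definition code := node -> option instruction.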
After these optimizations, register allocation is performed
via coloring of an interference graph [6]. The output of this
pass is LTL, a language similar to RTL where temporaries
are replaced by hardware registers or abstract stack loca-
tions. The control-flow graph is then “linearized”, producing
a list of instructions with explicit labels, conditional and un-
conditional branches. Next, spills and reloads are inserted
around instructions that reference temporaries that were al-
located to stack locations, and moves are inserted around
function calls, prologues and epilogues to enforce calling con-
ventions. Finally, the “stacking” pass lays out the activation
records of functions, assigning offsets within this record to
stack-allocated data and spilled temporaries.

References
Yves Bertot and Pierre Castéran. Interactive Theorem Proving and Program Development: Coq'Art: The Calculus of Inductive Constructions. Springer, 2004.
Gregory J. Chaitin. Register allocation & spilling via graph coloring. In Proceedings of the SIGPLAN Symposium on Compiler Construction, 1982.
George C. Necula. Proof-carrying code. In Proceedings of the 24th ACM Symposium on Principles of Programming Languages (POPL), 1997.
George C. Necula, Scott McPeak, Shree P. Rahul, and Westley Weimer. CIL: Intermediate language and tools for analysis and transformation of C programs. In Proceedings of the International Conference on Compiler Construction (CC), 2002.