What are the contributions of "A Concrete Memory Model for CompCert"?

This paper presents the proof of an enhanced and more concrete memory model for the CompCert C compiler, which assigns a definite meaning to more C programs. The authors prove that the existing memory model is an abstraction of their more concrete model, thus formally validating the soundness of CompCert's abstract semantics of pointers. The authors also show how to adapt the front-end of CompCert, thus demonstrating that it should be feasible to port the whole compiler to their novel memory model.

What future work do the authors propose in "A Concrete Memory Model for CompCert"?

As future work, the authors plan to study how to adapt the back-end of CompCert. Notwithstanding the remaining difficulties, the authors believe that the full CompCert compiler can be ported to their novel memory model. This would further improve confidence in the generated code.

(Open Access) A Concrete Memory Model for CompCert (2015) | Frédéric Besson

HAL Id: hal-01194549

https://hal.inria.fr/hal-01194549

Submitted on 7 Sep 2015


A Concrete Memory Model for CompCert

Frédéric Besson, Sandrine Blazy, Pierre Wilke

To cite this version:

Frédéric Besson, Sandrine Blazy, Pierre Wilke. A Concrete Memory Model for CompCert. ITP 2015: 6th International Conference on Interactive Theorem Proving, Aug 2015, Nanjing, China. pp. 67-83, ⟨10.1007/978-3-319-22102-1_5⟩. ⟨hal-01194549⟩

A Concrete Memory Model for CompCert ∗

Frédéric Besson, Sandrine Blazy, and Pierre Wilke

Inria, Rennes, France
Université Rennes 1 - IRISA, Rennes, France

Abstract. Semantics-preserving compilation of low-level C programs is challenging because their semantics is implementation-defined according to the C standard. This paper presents the proof of an enhanced and more concrete memory model for the CompCert C compiler which assigns a definite meaning to more C programs. In our new formally verified memory model, pointers are still abstract but are nonetheless mapped to concrete 32-bit integers. Hence, the memory is finite and it is possible to reason about the binary encoding of pointers. We prove that the existing memory model is an abstraction of our more concrete model, thus formally validating the soundness of CompCert’s abstract semantics of pointers. We also show how to adapt the front-end of CompCert, thus demonstrating that it should be feasible to port the whole compiler to our novel memory model.

1 Introduction

Formal verification of programs is usually performed at source level. Yet, a theorem about the source code of safety-critical software is not sufficient. Ultimately, what we really value is a guarantee about the run-time behaviour of the compiled program running on a physical machine. The CompCert compiler [17] fills this verification gap: its semantics preservation theorem ensures that when the source program has a defined semantics, program invariants proved at source level still hold for the compiled code. For the C language, the rules governing so-called undefined behaviours are subtle and the absence of undefined behaviours is in general undecidable. As a corollary, for a given C program, it is undecidable whether the semantic preservation theorem applies or not.

To alleviate the problem, the semantics of CompCert C is executable and it is therefore possible to check that a given program execution has a defined semantics. Jourdan et al. [12] propose a more comprehensive and ambitious approach: they formalise and verify a precise C static analyser for CompCert capable of ruling out undefined behaviours for a wide range of programs. Yet, these approaches are, by nature, limited by the formal semantics of CompCert C: programs exhibiting undefined behaviours cannot benefit from any semantic preservation guarantee. This is unfortunate as real programs do have behaviours that are undefined according to the formal semantics of CompCert C (1). This can be a programming mistake but sometimes this is a design feature. In the past, serious security flaws have been introduced by optimising compilers aggressively exploiting the latitude provided by undefined behaviours [22,6]. The existing workaround is not satisfactory and consists in disabling optimisations known to be triggered by undefined behaviours.

∗ This work was partially supported by the French ANR-14-CE28-0014 AnaStaSec.

(1) The official C standard is in general even stricter.

In previous work [3], we proposed a more concrete and defined semantics for CompCert C able to give a semantics to low-level C idioms. This semantics relies on symbolic expressions stored in memory that are normalised into genuine values when needed by the semantics. It handles low-level C idioms that exploit the concrete encoding of pointers (e.g. alignment constraints) or access partially undefined data structures (e.g. bit-fields). Such properties cannot be reasoned about using the existing CompCert memory model [19,18].

The memory model of CompCert consists of two parts: standard operations on memory (e.g. alloc, store) that are used in the semantics of the languages of CompCert and their properties (that are required to prove the semantic preservation of the compiler), together with generic transformations operating over memory. Indeed, certain passes of the compiler perform non-trivial transformations on memory allocations and accesses: for instance, in the front-end, C local variables initially mapped to individually-allocated memory blocks are later on mapped to sub-blocks of a single stack-allocated activation record. Proving the semantic preservation of these transformations requires extensive reasoning over memory states, using memory invariants relating memory states during program execution, that are also defined in the memory model.

In this paper, we extend the memory model of CompCert with symbolic expressions [3] and tackle the challenge of porting memory transformations and CompCert’s proofs to our memory model with symbolic expressions. The complete Coq development is available online [1]. Among others, a difficulty is that we drop the implicit assumption of an infinite memory. This has the consequence that allocation can fail. Hence, the compiler has to ensure that the compiled program is using less memory than the source program.

This paper presents a milestone towards a CompCert compiler adapted with our semantics; it makes the following contributions.

– We present a formal verification of our memory model within CompCert.
– We prove that the existing memory model of CompCert is an abstraction of our model thus validating the soundness of the existing semantics.
– We extend the notion of memory injection, the main generic notion of memory transformation.
– We adapt the proof of CompCert’s front-end passes, from CompCert C until Cminor, thus demonstrating the feasibility of our endeavour.

The paper is organised as follows. Section 2 recalls the main features of the existing CompCert memory model and our proposed extension. Section 3 explains how to adapt the operations of the existing CompCert memory model to comply with the new requirements of our memory model. Section 4 shows that the existing memory model is, in a provable way, an abstraction of our new memory model. Section 5 presents our re-design of the notion of memory injection that is the cornerstone of compiler passes modifying the memory layout. Section 6 details the modifications for the proofs for the compiler front-end passes. Related work is presented in Section 7; Section 8 concludes.

2 A More Concrete Memory Model for CompCert

In previous work [3], we propose an enhanced memory model (with symbolic expressions) for CompCert. The model is implemented and evaluated over a representative set of C programs. We empirically verify, using the reference interpreter of CompCert, that our extension is sound with respect to the existing semantics and that it captures low-level C idioms out of reach of the existing memory model. This section first recalls the main features of the current CompCert memory model and then explains our extension to this memory model.

2.1 CompCert’s Memory Model

Leroy et al. [18] give a thorough presentation of the existing memory model of CompCert, that is shared by all the languages of the compiler. We give a brief overview of its design in order to highlight the differences with our own model.

Abstract values used in the semantics of the CompCert languages (see [19]) are the disjoint union of 32-bit integers (written as int(i)), 32-bit floating-point numbers (written as float(f)), locations (written as ptr(l)), and the special value undef representing an arbitrary bit pattern, such as the value of an uninitialised variable. The abstract memory is viewed as a collection of separated blocks. A location l is a pair (b, i) where b is a block identifier (i.e. an abstract address) and i is an integer offset within this block. Pointer arithmetic modifies the offset part of a location, keeping its block identifier part unchanged. A pointer ptr(b, i) is valid for a memory M (written valid_pointer(M, b, i)) if the offset i is within the two bounds of the block b.
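As an informal illustration (not taken from the Coq development), the structure of values and locations described above can be sketched in C. All type and function names are hypothetical, the half-open bounds check and the wrap-around of offsets are assumptions of this sketch, and memory contents are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A location is a pair (b, i): block identifier and offset in that block. */
typedef struct { uint32_t block; uint32_t offset; } location;

/* Disjoint union of int(i), float(f), ptr(l) and undef. */
typedef enum { V_UNDEF, V_INT, V_FLOAT, V_PTR } value_tag;
typedef struct {
    value_tag tag;
    union { int32_t i; float f; location l; } as;
} value;

/* Assumed representation of block bounds: offsets lo <= i < hi are valid. */
typedef struct { uint32_t lo, hi; } block_bounds;
typedef struct { const block_bounds *bounds; uint32_t nblocks; } memory;

/* Pointer arithmetic changes the offset only; the block is untouched. */
static location ptr_add(location l, int32_t delta) {
    l.offset += (uint32_t)delta;   /* unsigned arithmetic wraps modulo 2^32 */
    return l;
}

/* valid_pointer(M, b, i): the offset lies within the bounds of block b. */
static bool valid_pointer(const memory *m, uint32_t b, uint32_t i) {
    assert(m != NULL);
    if (b >= m->nblocks) return false;
    return m->bounds[b].lo <= i && i < m->bounds[b].hi;
}
```

In this sketch no amount of arithmetic on a pointer into one block can yield a pointer into another block, which is the separation property the abstract model relies on.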

Abstract values are loaded from (resp. stored into) memory using the load (resp. store) memory operation. Memory chunks appear in these operations, to describe concisely the size, type and signedness of the value being stored. These operations return option types: we write ∅ for failure and ⌊x⌋ for a successful return of a value x. The free operation may also fail (e.g. when the locations to be freed have been freed already). The memory operation alloc never fails, as the size of the memory is unbounded.
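The option-returning style of these operations can be sketched in C as follows. This is a simplified illustration under several assumptions: the names, the three chunks, the fixed number and size of blocks, and the little-endian layout are all hypothetical, only integers are stored, and fresh memory reads as zero rather than undef. The alloc operation is left out because a fixed-size C structure cannot model an unbounded memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 16u
#define NUM_BLOCKS 4u

/* A chunk fixes the size and signedness of the accessed value. */
typedef enum { CHUNK_INT8_SIGNED, CHUNK_INT8_UNSIGNED, CHUNK_INT32 } chunk;

/* Option type: some == false plays the role of the failure result. */
typedef struct { bool some; int32_t val; } option_int;

typedef struct {
    bool live[NUM_BLOCKS];                  /* false once a block is freed */
    uint8_t data[NUM_BLOCKS][BLOCK_SIZE];
} memory;

static uint32_t chunk_size(chunk c) { return c == CHUNK_INT32 ? 4u : 1u; }

static bool in_bounds(const memory *m, uint32_t b, uint32_t ofs, chunk c) {
    assert(m != NULL);
    return b < NUM_BLOCKS && m->live[b] && ofs <= BLOCK_SIZE - chunk_size(c);
}

/* store: returns false (failure) on an invalid access. */
static bool mem_store(memory *m, chunk c, uint32_t b, uint32_t ofs, int32_t v) {
    if (!in_bounds(m, b, ofs, c)) return false;
    uint32_t u = (uint32_t)v;
    for (uint32_t k = 0; k < chunk_size(c); k++)
        m->data[b][ofs + k] = (uint8_t)(u >> (8u * k));
    return true;
}

/* load: the chunk decides how the bytes read are turned into a value. */
static option_int mem_load(const memory *m, chunk c, uint32_t b, uint32_t ofs) {
    option_int none = { false, 0 };
    if (!in_bounds(m, b, ofs, c)) return none;
    uint32_t u = 0;
    for (uint32_t k = 0; k < chunk_size(c); k++)
        u |= (uint32_t)m->data[b][ofs + k] << (8u * k);
    if (c == CHUNK_INT8_SIGNED && (u & 0x80u)) u |= 0xFFFFFF00u;  /* sign-extend */
    option_int r = { true, (int32_t)u };  /* assumes two's complement conversion */
    return r;
}

/* free: fails when the block has already been freed. */
static bool mem_free(memory *m, uint32_t b) {
    assert(m != NULL);
    if (b >= NUM_BLOCKS || !m->live[b]) return false;
    m->live[b] = false;
    return true;
}
```

Loading the same byte through a signed and an unsigned 8-bit chunk gives different values, which is why the chunk is a parameter of both operations.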

In the memory model, the byte-level, in-memory representation of integers and floats is exposed, while pointers are kept abstract [18]. The concrete memory is modelled as a map associating to each location a concrete value cv that is a byte-sized quantity describing the current content of a memory cell. It can be either a concrete 8-bit integer (written as bytev(b)) representing a part of an integer or a float, ptrv(l, i) to represent the i-th byte (i ∈ {1, 2, 3, 4}) of the location l, or undefv to model uninitialised memory.
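The three kinds of byte-sized contents can be sketched in C as a tagged record. The names are hypothetical, the little-endian integer layout and the ordering of pointer fragments are assumptions, and the rule that a pointer is only recovered from four consistent fragments is our reading of "pointers are kept abstract" rather than a quotation of the Coq definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint32_t block; uint32_t offset; } location;

typedef enum { MV_UNDEF, MV_BYTE, MV_PTR } memval_tag;

typedef struct {
    memval_tag tag;
    uint8_t byte;     /* payload of bytev(b) */
    location loc;     /* payload of ptrv(l, i): the location l ... */
    unsigned index;   /* ... and the fragment index i in 1..4 */
} memval;

/* An integer is split into four concrete bytes. */
static void encode_int(uint32_t n, memval out[4]) {
    assert(out != NULL);
    for (unsigned k = 0; k < 4; k++) {
        out[k].tag = MV_BYTE;
        out[k].byte = (uint8_t)(n >> (8u * k));
        out[k].loc = (location){ 0, 0 };
        out[k].index = 0;
    }
}

/* A pointer is split into four symbolic fragments; no bits are exposed. */
static void encode_ptr(location l, memval out[4]) {
    assert(out != NULL);
    for (unsigned k = 0; k < 4; k++) {
        out[k].tag = MV_PTR;
        out[k].byte = 0;
        out[k].loc = l;
        out[k].index = k + 1;
    }
}

/* Succeeds only on four fragments of the same location, in order. */
static bool decode_ptr(const memval in[4], location *out) {
    assert(in != NULL && out != NULL);
    for (unsigned k = 0; k < 4; k++) {
        if (in[k].tag != MV_PTR || in[k].index != k + 1) return false;
        if (in[k].loc.block != in[0].loc.block ||
            in[k].loc.offset != in[0].loc.offset) return false;
    }
    *out = in[0].loc;
    return true;
}
```

Overwriting a single fragment of a stored pointer with a concrete byte makes the decoding fail, so in such a model there is no way to inspect or modify individual bits of a pointer.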

    struct {
      int a0 : 1; int a1 : 1;
    } bf;

    int main() {
      bf.a1 = 1; return bf.a1;
    }

(a) Bitfield in C

    struct { unsigned char bf1; } bf;

    int main() {
      bf.bf1 = (bf.bf1 & ~0x2U) |
               ((unsigned int) 1 << 1U & 0x2U);
      return (int) (bf.bf1 << 30) >> 31;
    }

(b) Bitfield in CompCert C

Fig. 1: Emulation of bitfields in CompCert

2.2 Motivation for an Enhanced Memory Model

Our memory model with symbolic expressions [3] gives a precise semantics to low-level C idioms which cannot be modelled by the existing memory model. The reason is that those idioms either exploit the binary representation of pointers as integers or reason about partially uninitialised data. For instance, it is common for system calls, e.g. mmap or sbrk, to return −1 (instead of a pointer) to indicate that there is no memory available. Intuitively, −1 refers to the last memory address 0xFFFFFFFF and this cannot be a valid address because mmap returns pointers that are aligned – their trailing bits are necessarily 0s. Other examples are robust implementations of malloc: for the sake of checking the integrity of pointers, their trailing bits store a checksum. This is possible because those pointers are also aligned and therefore the trailing bits are necessarily 0s.
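The alignment idiom can be illustrated with a short C sketch. The helper names are hypothetical, a 4-byte alignment (two free trailing bits) is assumed, and the code relies on the implementation-defined round trip between pointers and uintptr_t, which is exactly the kind of reasoning about the binary encoding of pointers that the existing memory model cannot express. The width of uintptr_t is that of the host, not necessarily 32 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_MASK ((uintptr_t)0x3)   /* assumes 4-byte aligned pointers */

static bool is_aligned4(const void *p) {
    return ((uintptr_t)p & TAG_MASK) == 0;
}

/* Store a small tag (e.g. a checksum) in the trailing bits of a pointer. */
static uintptr_t tag_pointer(void *p, unsigned tag) {
    assert(is_aligned4(p) && tag <= 3u);
    return (uintptr_t)p | (uintptr_t)tag;
}

/* Recover the original pointer by clearing the trailing bits. */
static void *untag_pointer(uintptr_t w) {
    return (void *)(w & ~TAG_MASK);
}

static unsigned pointer_tag(uintptr_t w) {
    return (unsigned)(w & TAG_MASK);
}

/* The failure value -1 has all bits set, so it is never an aligned address. */
static bool is_failure_value(uintptr_t w) {
    return w == UINTPTR_MAX;
}
```

A caller would tag a pointer before storing it and check the tag before dereferencing the untagged pointer; the all-ones failure value can never collide with an aligned pointer because its trailing bits are 1s.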

Another motivation is illustrated by the current handling of bitfields in CompCert: they are emulated in terms of bit-level operations by an elaboration pass preceding the formally verified front-end. Fig. 1 gives an example of such a transformation. The program defines a bitfield bf such that a0 and a1 are 1 bit long. The main function sets the field a1 of bf to 1 and then returns this value. The expected semantics is therefore that the program returns 1. The transformed code (Fig. 1b) is not very readable but the gist of it is that field accesses are encoded using bitwise and shift operators. The transformation is correct and the target code generated by CompCert correctly returns 1. However, using the existing memory model, the semantics is undefined. Indeed, the program starts by reading the field bf1 of the uninitialised structure bf. The value is therefore undef. Moreover, shift and bitwise operators are strict in undef and therefore return undef. As a result, the program returns undef. As we show in the next section, our semantics is able to model partially undefined values and therefore gives a semantics to bitfields. Even though this case could be easily solved by modifying the pre-processing step, C programmers might themselves write such low-level code with reads of undefined memory and expect it to behave correctly.
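The reason such code can be expected to behave correctly is that the result does not depend on the bits that were never written. The following C sketch checks this exhaustively for a 1-bit field stored at bit position 1 of a byte. The function names are hypothetical, and the read uses an unsigned extraction rather than the signed shifts of Fig. 1b, so it illustrates the argument rather than reproducing the elaborated code.

```c
#include <assert.h>
#include <stdint.h>

/* Write the 1-bit field at bit position 1, keeping all other bits of the cell. */
static uint8_t set_a1(uint8_t cell, unsigned v) {
    assert(v <= 1u);
    return (uint8_t)((cell & ~0x2u) | ((v << 1) & 0x2u));
}

/* Read the 1-bit field at bit position 1. */
static unsigned get_a1(uint8_t cell) {
    return (cell >> 1) & 1u;
}

/* Returns 1 if writing 1 and reading it back yields 1 for every possible
   initial content of the cell, i.e. whatever the "uninitialised" bits are. */
static int a1_independent_of_initial_bits(void) {
    for (unsigned init = 0; init < 256u; init++) {
        if (get_a1(set_a1((uint8_t)init, 1u)) != 1u) return 0;
    }
    return 1;
}
```

A semantics in which undef is absorbing for bitwise operators cannot express this independence, whereas a model of partially undefined values can track that bit 1 is defined after the write even though the other bits are not.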

2.3 A Memory Model with Symbolic Expressions

To give a semantics to the previous idioms, a direct approach is to have a fully concrete memory model where a pointer is a genuine integer and the memory is
