
A Concrete Memory Model for CompCert

Abstract
Semantics preserving compilation of low-level C programs is challenging because their semantics is implementation defined according to the C standard. This paper presents the proof of an enhanced and more concrete memory model for the CompCert C compiler which assigns a definite meaning to more C programs. In our new formally verified memory model, pointers are still abstract but are nonetheless mapped to concrete 32-bit integers. Hence, the memory is finite and it is possible to reason about the binary encoding of pointers. We prove that the existing memory model is an abstraction of our more concrete model thus validating formally the soundness of CompCert’s abstract semantics of pointers. We also show how to adapt the front-end of CompCert thus demonstrating that it should be feasible to port the whole compiler to our novel memory model.


HAL Id: hal-01194549
https://hal.inria.fr/hal-01194549
Submitted on 7 Sep 2015
A Concrete Memory Model for CompCert
Frédéric Besson, Sandrine Blazy, Pierre Wilke
To cite this version:
Frédéric Besson, Sandrine Blazy, Pierre Wilke. A Concrete Memory Model for CompCert. ITP 2015:
6th International Conference on Interactive Theorem Proving, Aug 2015, Nanjing, China. pp.67-83,
⟨10.1007/978-3-319-22102-1_5⟩. ⟨hal-01194549⟩

A Concrete Memory Model for CompCert
Frédéric Besson¹, Sandrine Blazy², and Pierre Wilke²
¹ Inria, Rennes, France
² Université Rennes 1 - IRISA, Rennes, France
1 Introduction
Formal verification of programs is usually performed at source level. Yet, a
theorem about the source code of safety-critical software is not sufficient.
Ultimately, what we really value is a guarantee about the run-time behaviour of
the compiled program running on a physical machine. The CompCert compiler [17]
fills this verification gap: its semantics preservation theorem ensures that when
the source program has a defined semantics, program invariants proved at source
level still hold for the compiled code. For the C language, the rules governing
so-called undefined behaviours are subtle and the absence of undefined behaviours
is in general undecidable. As a corollary, for a given C program, it is undecidable
whether the semantic preservation theorem applies or not.
To alleviate the problem, the semantics of CompCert C is executable and
it is therefore possible to check that a given program execution has a defined
semantics. Jourdan et al. [12] propose a more comprehensive and ambitious
approach: they formalise and verify a precise C static analyser for CompCert
capable of ruling out undefined behaviours for a wide range of programs. Yet,
these approaches are, by essence, limited by the formal semantics of CompCert
C: programs exhibiting undefined behaviours cannot benefit from any semantic
preservation guarantee. This is unfortunate as real programs do have behaviours
that are undefined according to the formal semantics of CompCert C³. This can
be a programming mistake but sometimes this is a design feature. In the past,
serious security flaws have been introduced by optimising compilers aggressively
exploiting the latitude provided by undefined behaviours [22,6]. The existing
workaround is not satisfactory and consists in disabling optimisations known to
be triggered by undefined behaviours.
³ The official C standard is in general even stricter.
This work was partially supported by the French ANR-14-CE28-0014 AnaStaSec.
In previous work [3], we proposed a more concrete and defined semantics
for CompCert C able to give a semantics to low-level C idioms. This semantics
relies on symbolic expressions stored in memory that are normalised into genuine
values when needed by the semantics. It handles low-level C idioms that exploit
the concrete encoding of pointers (e.g. alignment constraints) or access partially
undefined data structures (e.g. bit-fields). Such properties cannot be reasoned
about using the existing CompCert memory model [19,18].
The memory model of CompCert consists of two parts: standard operations
on memory (e.g. alloc, store) that are used in the semantics of the languages of
CompCert and their properties (that are required to prove the semantic
preservation of the compiler), together with generic transformations operating
over memory. Indeed, certain passes of the compiler perform non-trivial
transformations on memory allocations and accesses: for instance, in the
front-end, C local variables initially mapped to individually-allocated memory
blocks are later on mapped to sub-blocks of a single stack-allocated activation
record. Proving the semantic preservation of these transformations requires
extensive reasoning over memory states, using memory invariants relating memory
states during program execution, that are also defined in the memory model.
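As an illustration of the transformation just described, the following C sketch models the merging of local variables into one activation record as a table that maps each source block to a target block and a displacement. The names `location`, `injection_entry`, `inject_location` and `frame_layout`, as well as the three-variable layout, are assumptions made for this example; they are not taken from the Coq development.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A location is a (block, offset) pair. */
typedef struct { int block; long offset; } location;

/* Hypothetical table entry: source block b is placed inside block
   `target` at displacement `delta`; `mapped` is false when the source
   block has no counterpart in the target memory. */
typedef struct { bool mapped; int target; long delta; } injection_entry;

/* Translate a source location through the table `f` of length `n`.
   Returns false when the source block is out of range or unmapped. */
bool inject_location(const injection_entry *f, size_t n,
                     location src, location *dst)
{
    assert(f != NULL && dst != NULL);
    if (src.block < 0 || (size_t)src.block >= n || !f[src.block].mapped)
        return false;
    dst->block = f[src.block].target;
    dst->offset = src.offset + f[src.block].delta;
    return true;
}

/* Example layout (an assumption): three local variables, each in its own
   source block (0, 1, 2), become sub-blocks of a single activation record
   (target block 0) at byte offsets 0, 4 and 8. */
static const injection_entry frame_layout[3] = {
    { true, 0, 0 }, { true, 0, 4 }, { true, 0, 8 }
};
```

With this layout, byte 1 of the third variable, i.e. source location (2, 1), is translated to location (0, 9) of the activation record, while a location in an unknown block is rejected.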
In this paper, we extend the memory model of CompCert with symbolic
expressions [3] and tackle the challenge of porting memory transformations and
CompCert’s proofs to our memory model with symbolic expressions. The complete
Coq development is available online [1]. Among others, a difficulty is that
we drop the implicit assumption of an infinite memory. This has the consequence
that allocation can fail. Hence, the compiler has to ensure that the compiled
program is using less memory than the source program.
This paper presents a milestone towards a CompCert compiler adapted with
our semantics; it makes the following contributions.
- We present a formal verification of our memory model within CompCert.
- We prove that the existing memory model of CompCert is an abstraction of
  our model thus validating the soundness of the existing semantics.
- We extend the notion of memory injection, the main generic notion of
  memory transformation.
- We adapt the proof of CompCert’s front-end passes, from CompCert C until
  Cminor, thus demonstrating the feasibility of our endeavour.
The paper is organised as follows. Section 2 recalls the main features of
the existing CompCert memory model and our proposed extension. Section 3
explains how to adapt the operations of the existing CompCert memory model
to comply with the new requirements of our memory model. Section 4 shows
that the existing memory model is, in a provable way, an abstraction of our
new memory model. Section 5 presents our re-design of the notion of memory
injection that is the cornerstone of compiler passes modifying the memory layout.
Section 6 details the modifications for the proofs for the compiler front-end
passes. Related work is presented in Section 7; Section 8 concludes.
2 A More Concrete Memory Model for CompCert
In previous work [3], we propose an enhanced memory model (with symbolic
expressions) for CompCert. The model is implemented and evaluated over a
representative set of C programs. We empirically verify, using the reference
interpreter of CompCert, that our extension is sound with respect to the existing
semantics and that it captures low-level C idioms out of reach of the existing
memory model. This section first recalls the main features of the current
CompCert memory model and then explains our extension to this memory model.
2.1 CompCert’s Memory Model
Leroy et al. [18] give a thorough presentation of the existing memory model of
CompCert, that is shared by all the languages of the compiler. We give a brief
overview of its design in order to highlight the differences with our own model.
Abstract values used in the semantics of the CompCert languages (see [19])
are the disjoint union of 32-bit integers (written as int(i)), 32-bit
floating-point numbers (written as float(f)), locations (written as ptr(l)), and
the special value undef representing an arbitrary bit pattern, such as the value
of an uninitialised variable. The abstract memory is viewed as a collection of
separated blocks. A location l is a pair (b, i) where b is a block identifier
(i.e. an abstract address) and i is an integer offset within this block. Pointer
arithmetic modifies the offset part of a location, keeping its block identifier
part unchanged. A pointer ptr(b, i) is valid for a memory M (written
valid_pointer(M, b, i)) if the offset i is within the two bounds of the block b.
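The definitions above can be sketched in C as a tagged union together with a validity check. This is only an illustrative encoding: the type and function names (`value`, `memory`, `make_ptr`, `ptr_add`) and the representation of block bounds as two arrays are assumptions of this sketch, and only `valid_pointer` is a name used in the text. The sketch assumes that offsets wrap around modulo 2^32 and that a block b spans the half-open interval [lo[b], hi[b]).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { V_UNDEF, V_INT, V_FLOAT, V_PTR } value_tag;

/* An abstract value: undef, int(i), float(f) or ptr(b, i). */
typedef struct {
    value_tag tag;
    union {
        int32_t i;                               /* int(i)    */
        float f;                                 /* float(f)  */
        struct { int block; int32_t ofs; } ptr;  /* ptr(b, i) */
    } u;
} value;

/* Hypothetical memory description: block b has bounds [lo[b], hi[b]). */
typedef struct {
    size_t nblocks;
    const int32_t *lo;
    const int32_t *hi;
} memory;

value make_ptr(int block, int32_t ofs)
{
    value v = { .tag = V_PTR };
    v.u.ptr.block = block;
    v.u.ptr.ofs = ofs;
    return v;
}

/* Pointer arithmetic changes the offset only; the block is unchanged.
   Applied to a non-pointer, this sketch returns undef. */
value ptr_add(value p, int32_t n)
{
    if (p.tag != V_PTR)
        return (value){ .tag = V_UNDEF };
    /* Unsigned addition wraps modulo 2^32 (assumption of the sketch). */
    p.u.ptr.ofs = (int32_t)((uint32_t)p.u.ptr.ofs + (uint32_t)n);
    return p;
}

/* A pointer is valid when its offset lies within the bounds of its block. */
bool valid_pointer(const memory *m, int block, int32_t ofs)
{
    assert(m != NULL);
    if (block < 0 || (size_t)block >= m->nblocks)
        return false;
    return m->lo[block] <= ofs && ofs < m->hi[block];
}
```

For a block 1 with bounds [0, 4), the pointer obtained by adding 3 to ptr(1, 0) is valid, whereas adding 4 yields a pointer that `valid_pointer` rejects.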
Abstract values are loaded from (resp. stored into) memory using the load
(resp. store) memory operation. Memory chunks appear in these operations, to
describe concisely the size, type and signedness of the value being stored. These
operations return option types: we write ∅ for failure and ⌊x⌋ for a successful
return of a value x. The free operation may also fail (e.g. when the locations to
be freed have been freed already). The memory operation alloc never fails, as
the size of the memory is unbounded.
In the memory model, the byte-level, in-memory representation of integers
and floats is exposed, while pointers are kept abstract [18]. The concrete memory
is modelled as a map associating to each location a concrete value cv that is a
byte-sized quantity describing the current content of a memory cell. It can be
either a concrete 8-bit integer (written as bytev(b)) representing a part of an
integer or a float, ptrv(l, i) to represent the i-th byte (i ∈ {1, 2, 3, 4}) of the
location l, or undefv to model uninitialised memory.
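The three kinds of memory cells can be sketched as follows. The sketch shows how a 32-bit integer is split into four bytev cells, whereas a pointer is stored as four symbolic fragments ptrv(l, 1) to ptrv(l, 4) that can only be reassembled as a whole. The names `memval`, `encode_int`, `encode_ptr`, `decode_int` and `decode_ptr` are assumptions of this sketch, as is the least-significant-byte-first order; the boolean result stands in for the option type (false for failure).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { int block; int32_t ofs; } location;

typedef enum { MV_UNDEF, MV_BYTE, MV_PTR } memval_tag;

/* Content of one memory cell: undefv, bytev(b) or ptrv(l, i). */
typedef struct {
    memval_tag tag;
    uint8_t byte;   /* payload of bytev(b)              */
    location loc;   /* location l of ptrv(l, i)         */
    int index;      /* fragment index i in {1, 2, 3, 4} */
} memval;

/* Split an integer into four bytev cells (byte order is an assumption). */
void encode_int(uint32_t n, memval cells[4])
{
    for (int k = 0; k < 4; k++) {
        cells[k] = (memval){ .tag = MV_BYTE };
        cells[k].byte = (uint8_t)(n >> (8 * k));
    }
}

/* Store a pointer as its four abstract fragments ptrv(l, 1..4). */
void encode_ptr(location l, memval cells[4])
{
    for (int k = 0; k < 4; k++) {
        cells[k] = (memval){ .tag = MV_PTR };
        cells[k].loc = l;
        cells[k].index = k + 1;
    }
}

/* Succeeds only if all four cells are concrete bytes. */
bool decode_int(const memval cells[4], uint32_t *out)
{
    uint32_t n = 0;
    for (int k = 0; k < 4; k++) {
        if (cells[k].tag != MV_BYTE)
            return false;
        n |= (uint32_t)cells[k].byte << (8 * k);
    }
    *out = n;
    return true;
}

/* Succeeds only if the cells are fragments 1..4, in order, of one location. */
bool decode_ptr(const memval cells[4], location *out)
{
    for (int k = 0; k < 4; k++) {
        if (cells[k].tag != MV_PTR || cells[k].index != k + 1 ||
            cells[k].loc.block != cells[0].loc.block ||
            cells[k].loc.ofs != cells[0].loc.ofs)
            return false;
    }
    *out = cells[0].loc;
    return true;
}
```

In this encoding there is no way to obtain the numeric value of a single pointer byte, which is why properties such as alignment cannot be expressed on pointer fragments.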

struct {
  int a0 : 1; int a1 : 1;
} bf;
int main() {
  bf.a1 = 1; return bf.a1;
}
(a) Bitfield in C

struct { unsigned char bf1; } bf;

int main() {
  bf.bf1 = (bf.bf1 & ~0x2U) |
           ((unsigned int) 1 << 1U & 0x2U);
  return (int) (bf.bf1 << 30) >> 31;
}
(b) Bitfield in CompCert C

Fig. 1: Emulation of bitfields in CompCert
2.2 Motivation for an Enhanced Memory Model
Our memory model with symbolic expressions [3] gives a precise semantics to
low-level C idioms which cannot be modelled by the existing memory model. The
reason is that those idioms either exploit the binary representation of pointers as
integers or reason about partially uninitialised data. For instance, it is common
for system calls, e.g. mmap or sbrk, to return -1 (instead of a pointer) to indicate
that there is no memory available. Intuitively, -1 refers to the last memory
address 0xFFFFFFFF and this cannot be a valid address because mmap returns
pointers that are aligned: their trailing bits are necessarily 0s. Other examples
are robust implementations of malloc: for the sake of checking the integrity of
pointers, their trailing bits store a checksum. This is possible because those
pointers are also aligned and therefore the trailing bits are necessarily 0s.
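Both idioms rely on the integer encoding of addresses, as the following C sketch shows. It works directly on `uintptr_t` integers; the 4-byte alignment, the 2-bit mask and all function names are assumptions chosen for the example, not code from mmap or from any malloc implementation. Applying such operations to a real pointer requires a cast to an integer type, whose result is implementation defined in ISO C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* With 4-byte alignment (an assumption), the two trailing bits are free. */
#define TAG_MASK ((uintptr_t)0x3)

/* An aligned address has its trailing bits equal to 0. */
bool is_aligned4(uintptr_t addr)
{
    return (addr & TAG_MASK) == 0;
}

/* Hide a small tag (e.g. a checksum) in the trailing bits of an address. */
uintptr_t tag_address(uintptr_t addr, unsigned tag)
{
    assert(is_aligned4(addr) && tag <= TAG_MASK);
    return addr | (uintptr_t)tag;
}

/* Recover the original aligned address by clearing the trailing bits. */
uintptr_t untag_address(uintptr_t tagged)
{
    return tagged & ~TAG_MASK;
}

/* Read back the tag stored in the trailing bits. */
unsigned address_tag(uintptr_t tagged)
{
    return (unsigned)(tagged & TAG_MASK);
}

/* The error value -1 has all bits set, so it is never an aligned address. */
bool is_error_address(uintptr_t addr)
{
    return addr == (uintptr_t)-1;
}
```

Tagging the address 0x1000 with the value 3 gives 0x1003, from which both the address and the tag can be recovered. Since -1 converted to `uintptr_t` has its trailing bits set, it cannot collide with an aligned address.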
Another motivation is illustrated by the current handling of bitfields in
CompCert: they are emulated in terms of bit-level operations by an elaboration
pass preceding the formally verified front-end. Fig. 1 gives an example of
such a transformation. The program defines a bitfield bf such that a0 and a1 are
1 bit long. The main function sets the field a1 of bf to 1 and then returns this
value. The expected semantics is therefore that the program returns 1. The
transformed code (Fig. 1b) is not very readable but the gist of it is that field
accesses are encoded using bitwise and shift operators. The transformation is
correct and the target code generated by CompCert correctly returns 1. However,
using the existing memory model, the semantics is undefined. Indeed, the program
starts by reading the field bf1 of the uninitialised structure bf. The value is
therefore undef. Moreover, shift and bitwise operators are strict in undef and
therefore return undef. As a result, the program returns undef. As we show in the
next section, our semantics is able to model partially undefined values and
therefore gives a semantics to bitfields. Even though this case could be easily
solved by modifying the pre-processing step, C programmers might themselves write
such low-level code with reads of undefined memory and expect it to behave
correctly.
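The argument above can be replayed on a small model of values that are either defined integers or undef. In the sketch below, the type `ival` and the functions `def`, `v_and`, `v_or`, `v_shl` and `update_a1` are hypothetical names; `update_a1` mirrors the assignment of Fig. 1b. The last function is not the normalisation of [3]; it is only a brute-force check, under the assumption that the uninitialised field holds some concrete byte, that the bit written by the assignment does not depend on that byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value type: a defined 32-bit integer, or undef. */
typedef struct { bool defined; uint32_t bits; } ival;

static const ival UNDEF = { false, 0 };

ival def(uint32_t n) { return (ival){ true, n }; }

/* Operators that are strict in undef: any undef operand yields undef. */
ival v_and(ival a, ival b)
{
    return (a.defined && b.defined) ? def(a.bits & b.bits) : UNDEF;
}

ival v_or(ival a, ival b)
{
    return (a.defined && b.defined) ? def(a.bits | b.bits) : UNDEF;
}

ival v_shl(ival a, unsigned n)
{
    return (a.defined && n < 32) ? def(a.bits << n) : UNDEF;
}

/* The assignment of Fig. 1b: (old & ~0x2) | ((1 << 1) & 0x2). */
ival update_a1(ival old)
{
    ival set = v_and(v_shl(def(1), 1), def(0x2));
    return v_or(v_and(old, def(~(uint32_t)0x2)), set);
}

/* For every concrete initial byte, the result is defined and bit 1 is set. */
bool a1_bit_is_independent_of_initial_byte(void)
{
    for (uint32_t b = 0; b < 256; b++) {
        ival r = update_a1(def(b));
        if (!r.defined || ((r.bits >> 1) & 1u) != 1u)
            return false;
    }
    return true;
}
```

With strict operators, `update_a1(UNDEF)` is undef, which corresponds to the undefined semantics described above. The check over all 256 initial bytes succeeds, which gives an intuition for why a definite meaning can be assigned to the program.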
2.3 A Memory Model with Symbolic Expressions
To give a semantics to the previous idioms, a direct approach is to have a fully
concrete memory model where a pointer is a genuine integer and the memory is
