Book ChapterDOI

A Concrete Memory Model for CompCert

TL;DR: This paper presents the proof of an enhanced and more concrete memory model for the CompCert C compiler that assigns a definite meaning to more C programs, and formally proves the soundness of CompCert's abstract semantics of pointers.
Abstract: Semantics preserving compilation of low-level C programs is challenging because their semantics is implementation defined according to the C standard. This paper presents the proof of an enhanced and more concrete memory model for the CompCert C compiler which assigns a definite meaning to more C programs. In our new formally verified memory model, pointers are still abstract but are nonetheless mapped to concrete 32-bit integers. Hence, the memory is finite and it is possible to reason about the binary encoding of pointers. We prove that the existing memory model is an abstraction of our more concrete model thus validating formally the soundness of CompCert’s abstract semantics of pointers. We also show how to adapt the front-end of CompCert thus demonstrating that it should be feasible to port the whole compiler to our novel memory model.

Summary (4 min read)

1 Introduction

  • Yet, a theorem about the source code of safety-critical software is not sufficient.
  • The CompCert compiler [17] fills this verification gap: its semantics preservation theorem ensures that when the source program has a defined semantics, program invariants proved at source level still hold for the compiled code.
  • Yet, these approaches are, by essence, limited by the formal semantics of CompCert C: programs exhibiting undefined behaviours cannot benefit from any semantic preservation guarantee.
  • The authors prove that the existing memory model of CompCert is an abstraction of their model, thus validating the soundness of the existing semantics.
  • The authors adapt the proof of CompCert’s front-end passes, from CompCert C down to Cminor, thus demonstrating the feasibility of their endeavour.

2 A More Concrete Memory Model for CompCert

  • In previous work [3], the authors propose an enhanced memory model (with symbolic expressions) for CompCert.
  • The authors empirically verify, using the reference interpreter of CompCert, that their extension is sound with respect to the existing semantics and that it captures low-level C idioms out of reach of the existing memory model.
  • This section first recalls the main features of the current CompCert memory model and then explains their extension to this memory model.

2.1 CompCert’s Memory Model

  • Leroy et al. [18] give a thorough presentation of the existing memory model of CompCert, that is shared by all the languages of the compiler.
  • The authors give a brief overview of its design in order to highlight the differences with their own model.
  • Pointer arithmetic modifies the offset part of a location, keeping its block identifier part unchanged.
  • The free operation may also fail (e.g. when the locations to be freed have been freed already).
  • In the memory model, the byte-level, in-memory representation of integers and floats is exposed, while pointers are kept abstract [18].

2.2 Motivation for an Enhanced Memory Model

  • The authors’ memory model with symbolic expressions [3] gives a precise semantics to low-level C idioms which cannot be modelled by the existing memory model.
  • Other examples are robust implementations of malloc: for the sake of checking the integrity of pointers, their trailing bits store a checksum.
  • This is possible because those pointers are also aligned and therefore the trailing bits are necessarily 0s.
  • In the bitfield example (Fig. 1), the expected semantics is therefore that the program returns 1.
  • The transformation is correct and the target code generated by CompCert correctly returns 1.

2.3 A Memory Model with Symbolic Expressions

  • A fully concrete memory model lacks an essential property of CompCert’s semantics: determinism.
  • Determinism is instrumental for the simulation proofs of the compiler passes and its absence is a show stopper.
  • The authors define the evaluation of expressions as the function ⟦·⟧cm, parametrised by the concrete mapping cm.
  • Pointers are turned into their concrete value, as dictated by cm.
  • In the bitfield example, the value of the expression is 1 whatever the value of undef, and therefore the normalisation succeeds and returns, as expected, the value 1 (see the sketch below).
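A minimal C sketch of this situation, in the spirit of the bitfield emulation of Fig. 1b (the function and variable names are illustrative, not taken from the paper):

    /* The read of c yields undef, but the final expression evaluates to 1 for
       every possible value of undef, so the normalisation can return 1. */
    int bit1(void) {
      unsigned char c;                           /* uninitialised: content is undef */
      c = (unsigned char)((c & ~0x2U) | 0x2U);   /* set bit 1, keep the other bits  */
      return (c >> 1) & 1;                       /* depends only on bit 1: always 1 */
    }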

3 Proving the Operations of the Memory Model

  • CompCert’s memory model exports an interface summarising all the properties of the memory operations necessary to prove the compiler passes.
  • This section details how the properties and the proofs need to be adapted to accommodate symbolic expressions.
  • The authors also introduce an equivalence relation between symbolic expressions.

3.1 Precise Handling of Undefined Values

  • Symbolic expressions (as presented in Section 2.3) feature a unique undef token.
  • This is a shortcoming that the authors have identified during the proof.
  • With a single undef, the authors do not capture the fact that different occurrences of undef may represent the same unknown value, or different ones.
  • To overcome this problem, each byte of a newly allocated memory chunk is initialised with a fresh undef value.
  • Hence, x − x constructs the symbolic expression undef(b, o) − undef(b, o) for some b and o, which obviously normalises to 0, because undef(b, o) now represents a unique value rather than the set of all values (illustrated below).
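A short C illustration of this bullet (hypothetical example, not from the paper):

    int diff_self(void) {
      int x;          /* uninitialised: each byte is a fresh undef(b, o)      */
      return x - x;   /* both reads produce the same symbolic bytes, so the   */
                      /* expression normalises to 0 in the refined model      */
    }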

3.2 Memory Allocation

  • CompCert’s alloc operation always allocates a memory chunk of the requested size and returns a fresh block to the newly allocated memory (i.e. it models an infinite memory).
  • The first guarantee is that for every memory m there exists at least one concrete memory compatible with the abstract CompCert block-based memory.
  • To get this property, the alloc function runs a greedy algorithm constructing a compatible cm mapping.
  • Given a memory m, size_mem(m) returns the size of the constructed memory (i.e. the first fresh address as computed by the allocation).
  • The algorithm makes the pessimistic assumption that the allocated blocks are maximally aligned; for CompCert, this maximum is 3 bits (addresses are divisible by 2³ = 8), as sketched below.
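A hedged sketch of such a greedy construction (the names, the starting address and the C phrasing are illustrative; the actual Coq definition differs):

    /* Give each block the next 8-byte aligned concrete address (2^3 being the
       maximal alignment assumed); the final value of next_free plays the role
       of size_mem, i.e. the first fresh address. */
    static unsigned int next_free = 8;            /* address 0 is never handed out */

    unsigned int place_block(unsigned int size) {
      unsigned int addr = (next_free + 7u) & ~7u; /* round up to a multiple of 8 */
      next_free = addr + size;
      return addr;
    }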

3.3 Good Variable Properties

  • In CompCert, the so-called good variable properties axiomatise the behaviour of the memory operations.
  • The reverse operation is the concatenation of a symbolic expression sv1 with a symbolic expression sv2 representing a byte.
  • The authors have generalised and proved the axioms of the memory model using the same principle.
  • Moreover, although the structure of the proofs is similar, they are complicated by the fact that the authors reason modulo normalisation of expressions.
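For instance, one representative good variable property is the load-after-store rule; in the symbolic setting it only holds up to the equivalence relation mentioned above (notation approximate, for a chunk κ that matches the stored value v):

    load κ (store κ m b o v) b o = ⌊v′⌋   with v′ ≡ v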

4 Cross-validation of Memory Models

  • The semantics of the CompCert C language is part of the trusted computing base of the compiler.
  • If the resulting offset is outside the bounds, their normalisation returns undef.
  • After this easy fix, the authors found two interesting semantic discrepancies with the current semantics of CompCert C. When running the compiled program, the pointer is a mere integer; it eventually overflows, wraps around and becomes 0 (see the sketch below).
  • After adjusting both memory models, the authors are able to prove that both semantics agree whenever the existing CompCert C semantics is defined, thus cross-validating the semantics of operators.
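A hypothetical C fragment illustrating the discrepancy described above (not an example taken from the paper):

    /* In the abstract semantics a valid pointer never compares equal to the null
       pointer; in the compiled program the pointer is a mere 32-bit integer, so
       repeatedly incrementing it eventually wraps around to 0. */
    int reaches_null(char *p) {
      while (p != 0)
        p = p + 1;     /* leaves the block: undefined in the abstract semantics */
      return 0;        /* reached only because the concrete address wrapped     */
    }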

5 Redesign of Memory Injections

  • Memory injections are instrumental for proving the correctness of several compiler passes of CompCert.
  • A memory injection defines a mapping between memories; it is a versatile tool to explain how passes reorganise the memory (e.g. construct an activation record from local variables).
  • This section explains how to generalise this concept for symbolic expressions.
  • It requires a careful handling of undefined values undef(l) which are absent from the existing memory model.

5.1 Memory Injections in CompCert

  • The injection relation is defined over values (and called val_inject) and then lifted to memories (and called inject).
  • The val_inject relation distinguishes three cases: 1. for concrete values (i.e. integers or floating-point numbers), the relation is reflexive, e.g. int(i) is in relation with int(i); 2. ptr(b, i) is in relation with ptr(b′, i + δ) when f(b) = ⌊(b′, δ)⌋; 3. undef is in relation with any value (including undef).
  • The purpose of the injection is twofold: it establishes a relation between pointers using the function f but it can also specialise undef by a defined value.
  • In CompCert, so-called generic memory injections state that every valid location in memory m1 is mapped by function f into a valid location in memory m2; the corresponding location in m2 must be properly aligned with respect to the size of the block; and the values stored at corresponding locations must be in injection.
  • Among other conditions, if several blocks in m1 are mapped to the same block in m2, the mapping must ensure the absence of overlapping; a worked instance follows.
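A worked instance of cases 2 and 3 (the blocks b1, b2, the target block s and the offsets are hypothetical): merging two source blocks into a single target block with f(b1) = ⌊(s, 0)⌋ and f(b2) = ⌊(s, 8)⌋ maps ptr(b1, 4) to ptr(s, 4) and ptr(b2, 4) to ptr(s, 12), while an undef stored in b1 may be related to any value stored at the corresponding offset of s.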

5.2 Memory Injection with Symbolic Expressions

  • The function f is still present and serves the same purpose.
  • The authors’ injection expr_inject is therefore defined as the composition of the function apply_spe spe, which specialises undef(l) into concrete bytes, and the function apply_inj f, which injects locations.
  • This model makes the implicit assumption that memory blocks are always sufficiently aligned.
  • The existing formalisation of inject has a property mi_representable which states that the offset o + δ obtained after injection does not overflow.

5.3 Memory Injection and Normalisation

  • The authors’ normalisation is defined w.r.t. all the concrete memories compatible with the CompCert block-based memory (see Section 2.3).
  • Theorem norm_inject shows that under the condition that all blocks are injected, if e and e′ are in injection, then their normalisations are in injection too.
  • Thus, the normalisation can only get more defined after injection.
  • This is expected as the injection can merge blocks and therefore makes pointer arithmetic more defined.
  • A consequence of this theorem is that the compiler is not allowed to reduce the memory usage.
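Schematically (notation approximate, writing normalise(m, e) for the normalisation of e in memory m): if every block of m1 is injected by f into m2 and e1, e2 are related by expr_inject, then normalise(m1, e1) and normalise(m2, e2) are related by val_inject; in particular, a normalisation that is defined on the source side cannot become undef on the target side.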

6 Proving the Front-end of the CompCert Compiler

  • Later compiler passes are architecture dependent and are therefore part of the back-end.
  • This section explains how to adapt the semantics preservation proofs of the front-end to their memory model with symbolic expressions.

6.1 CompCert Front-end with Symbolic Expressions

  • The semantics of all intermediate languages need to be modified in order to account for symbolic expressions.
  • In reality, the transformation is more subtle because, for instance, certain intermediate semantic functions explicitly require locations represented as pairs (b, o).
  • The naive solution of inserting normalisations whenever a location is required proves wrong and breaks semantic preservation proofs, because the introduced normalisations may be absent in subsequent intermediate languages.
  • This pass does not transform the memory and therefore the existing proof can be reused.
  • The pass also performs type-directed transformations and removes redundant casts.

6.2 Allocation of Local Variables

  • This relation is too weak and fails to pass the induction step.
  • The problem is related to the preservation of the memory injection when allocating and de-allocating the variables in C#minor and the stack frame in Cminor.
  • Once again, the authors replace the two-step proof with a direct induction over the number of variables.
  • To carry out this proof and establish an injection, the authors have to reason about the relative sizes of the memories.
  • Here, the authors have to deal with the opposite situation, where the stack frame could use less memory than the variables (see the sketch below).
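A hypothetical illustration of why the sizes may differ: with two 4-byte locals, the C#minor memory holds two separate blocks, which the pessimistic maximally-aligned layout of Section 3.2 counts as 8 bytes each, whereas the single Cminor stack frame packs them into 8 bytes in total.

    int f(void) {
      int x = 1;   /* C#minor: its own block; Cminor: offset 0 of the stack frame */
      int y = 2;   /* C#minor: its own block; Cminor: offset 4 of the stack frame */
      return x + y;
    }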

8 Conclusion

  • This work is a milestone towards a CompCert compiler proved correct with respect to a more concrete memory model.
  • A by-product of their work is that the authors have uncovered and fixed a problem in the existing semantics of the comparison with the null pointer.
  • The authors are confident that program optimisations based on static analyses will not be problematic.
  • Notwithstanding the remaining difficulties, the authors believe that the full CompCert compiler can be ported to their novel memory model.
  • This would improve further the confidence in the generated code.


HAL Id: hal-01194549
https://hal.inria.fr/hal-01194549
Submitted on 7 Sep 2015
A Concrete Memory Model for CompCert
Frédéric Besson, Sandrine Blazy, Pierre Wilke
To cite this version:
Frédéric Besson, Sandrine Blazy, Pierre Wilke. A Concrete Memory Model for CompCert. ITP 2015:
6th International Conference on Interactive Theorem Proving, Aug 2015, Nanjing, China. pp. 67-83,
10.1007/978-3-319-22102-1_5. hal-01194549

A Concrete Memory Model for CompCert

Frédéric Besson¹, Sandrine Blazy², and Pierre Wilke²

¹ Inria, Rennes, France
² Université Rennes 1 - IRISA, Rennes, France

(This work was partially supported by the French ANR-14-CE28-0014 AnaStaSec.)
Abstract. Semantics preserving compilation of low-level C programs is
challenging because their semantics is implementation defined according
to the C standard. This paper presents the proof of an enhanced and
more concrete memory model for the CompCert C compiler which as-
signs a definite meaning to more C programs. In our new formally verified
memory model, pointers are still abstract but are nonetheless mapped
to concrete 32-bit integers. Hence, the memory is finite and it is possible
to reason about the binary encoding of pointers. We prove that the ex-
isting memory model is an abstraction of our more concrete model thus
validating formally the soundness of CompCert’s abstract semantics of
pointers. We also show how to adapt the front-end of CompCert thus
demonstrating that it should be feasible to port the whole compiler to
our novel memory model.
1 Introduction
Formal verification of programs is usually performed at source level. Yet, a theorem
about the source code of safety-critical software is not sufficient. Eventually,
what we really value is a guarantee about the run-time behaviour of the
compiled program running on a physical machine. The CompCert compiler [17]
fills this verification gap: its semantics preservation theorem ensures that when
the source program has a defined semantics, program invariants proved at source
level still hold for the compiled code. For the C language the rules governing so
called undefined behaviours are subtle and the absence of undefined behaviours
is in general undecidable. As a corollary, for a given C program, it is undecidable
whether the semantic preservation applies or not.
To alleviate the problem, the semantics of CompCert C is executable and
it is therefore possible to check that a given program execution has a defined
semantics. Jourdan et al. [12] propose a more comprehensive and ambitious
approach: they formalise and verify a precise C static analyser for CompCert
capable of ruling out undefined behaviours for a wide range of programs. Yet,
these approaches are, by essence, limited by the formal semantics of CompCert
C: programs exhibiting undefined behaviours cannot benefit from any semantic
preservation guarantee. This is unfortunate as real programs do have behaviours
that are undefined according to the formal semantics of CompCert C (the official
C standard is in general even stricter). This can be a programming mistake but
sometimes this is a design feature. In the past,
serious security flaws have been introduced by optimising compilers aggressively
exploiting the latitude provided by undefined behaviours [22,6]. The existing
workaround is not satisfactory and consists in disabling optimisations known to
be triggered by undefined behaviours.
In previous work [3], we proposed a more concrete and defined semantics
for CompCert C able to give a semantics to low-level C idioms. This semantics
relies on symbolic expressions stored in memory that are normalised into genuine
values when needed by the semantics. It handles low-level C idioms that exploit
the concrete encoding of pointers (e.g. alignment constraints) or access partially
undefined data structures (e.g. bit-fields). Such properties cannot be reasoned
about using the existing CompCert memory model [19,18].
The memory model of CompCert consists of two parts: standard operations
on memory (e.g. alloc, store) that are used in the semantics of the languages of
CompCert and their properties (that are required to prove the semantic preser-
vation of the compiler), together with generic transformations operating over
memory. Indeed, certain passes of the compiler perform non-trivial transforma-
tions on memory allocations and accesses: for instance, in the front-end, C local
variables initially mapped to individually-allocated memory blocks are later on
mapped to sub-blocks of a single stack-allocated activation record. Proving the
semantic preservation of these transformations requires extensive reasoning over
memory states, using memory invariants relating memory states during program
execution, that are also defined in the memory model.
In this paper, we extend the memory model of CompCert with symbolic ex-
pressions [3] and tackle the challenge of porting memory transformations and
CompCert’s proofs to our memory model with symbolic expressions. The com-
plete Coq development is available online [1]. Among others, a difficulty is that
we drop the implicit assumption of an infinite memory. This has the consequence
that allocation can fail. Hence, the compiler has to ensure that the compiled pro-
gram is using less memory than the source program.
This paper presents a milestone towards a CompCert compiler adapted with
our semantics; it makes the following contributions.
We present a formal verification of our memory model within CompCert.
We prove that the existing memory model of CompCert is an abstraction of
our model thus validating the soundness of the existing semantics.
We extend the notion of memory injection, the main generic notion of mem-
ory transformation.
We adapt the proof of CompCert’s front-end passes, from CompCert C until
Cminor, thus demonstrating the feasibility of our endeavour.
The paper is organised as follows. Section 2 recalls the main features of
the existing CompCert memory model and our proposed extension. Section 3
explains how to adapt the operations of the existing CompCert memory model
to comply with the new requirements of our memory model. Section 4 shows
that the existing memory model is, in a provable way, an abstraction of our
new memory model. Section 5 presents our re-design of the notion of memory

injection that is the cornerstone of compiler passes modifying the memory layout.
Section 6 details the modifications for the proofs for the compiler front-end
passes. Related work is presented in Section 7; Section 8 concludes.
2 A More Concrete Memory Model for CompCert
In previous work [3], we propose an enhanced memory model (with symbolic
expressions) for CompCert. The model is implemented and evaluated over a
representative set of C programs. We empirically verify, using the reference in-
terpreter of CompCert, that our extension is sound with respect to the existing
semantics and that it captures low-level C idioms out of reach of the existing
memory model. This section first recalls the main features of the current Comp-
Cert memory model and then explains our extension to this memory model.
2.1 CompCert’s Memory Model
Leroy et al. [18] give a thorough presentation of the existing memory model of
CompCert, that is shared by all the languages of the compiler. We give a brief
overview of its design in order to highlight the differences with our own model.
Abstract values used in the semantics of the CompCert languages (see [19])
are the disjoint union of 32-bit integers (written as int(i)), 32-bit floating-
point numbers (written as float(f)), locations (written as ptr(l)), and the
special value undef representing an arbitrary bit pattern, such as the value of an
uninitialised variable. The abstract memory is viewed as a collection of separated
blocks. A location l is a pair (b, i) where b is a block identifier (i.e. an abstract
address) and i is an integer offset within this block. Pointer arithmetic modifies
the offset part of a location, keeping its block identifier part unchanged. A pointer
ptr(b, i) is valid for a memory M (written valid_pointer(M, b, i)) if the offset i
is within the two bounds of the block b.
Abstract values are loaded from (resp. stored into) memory using the load
(resp. store) memory operation. Memory chunks appear in these operations, to
describe concisely the size, type and signedness of the value being stored. These
operations return option types: we write ∅ for failure and ⌊x⌋ for a successful
return of a value x. The free operation may also fail (e.g. when the locations to
be freed have been freed already). The memory operation alloc never fails, as
the size of the memory is unbounded.
In the memory model, the byte-level, in-memory representation of integers
and floats is exposed, while pointers are kept abstract [18]. The concrete memory
is modelled as a map associating to each location a concrete value cv that is a
byte-sized quantity describing the current content of a memory cell. It can be
either a concrete 8-bit integer (written as bytev(b)) representing a part of an
integer or a float, ptrv(l, i) to represent the i-th byte (i ∈ {1, 2, 3, 4}) of the
location l, or undefv to model uninitialised memory.
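For instance (assuming a little-endian target; the concrete values are purely illustrative), storing the 32-bit integer int(0x11223344) at offset 0 of a block b fills four cells with bytev(0x44), bytev(0x33), bytev(0x22), bytev(0x11) at offsets 0 to 3, whereas storing a pointer ptr(l) fills the four cells with ptrv(l, 1), ..., ptrv(l, 4), keeping the pointer abstract.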

struct {
  int a0 : 1;
  int a1 : 1;
} bf;

int main() {
  bf.a1 = 1;
  return bf.a1;
}

(a) Bitfield in C

struct { unsigned char __bf1; } bf;

int main() {
  bf.__bf1 = (bf.__bf1 & ~0x2U) | ((unsigned int) 1 << 1U & 0x2U);
  return (int) (bf.__bf1 << 30) >> 31;
}

(b) Bitfield in CompCert C

Fig. 1: Emulation of bitfields in CompCert
2.2 Motivation for an Enhanced Memory Model
Our memory model with symbolic expressions [3] gives a precise semantics to
low-level C idioms which cannot be modelled by the existing memory model. The
reason is that those idioms either exploit the binary representation of pointers as
integers or reason about partially uninitialised data. For instance, it is common
for system calls, e.g. mmap or sbrk, to return 1 (instead of a pointer) to indicate
that there is no memory available. Intuitively, 1 refers to the last memory
address 0xFFFFFFFF and this cannot be a valid address because mmap returns
pointers that are aligned their trailing bits are necessarily 0s. Other examples
are robust implementations of malloc: for the sake of checking the integrity of
pointers, their trailing bits store a checksum. This is possible because those
pointers are also aligned and therefore the trailing bits are necessarily 0s.
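A hypothetical sketch of this idiom (not taken from the paper); it is plain C, but its meaning relies on the concrete encoding of pointers:

    #include <stdint.h>

    /* 8-byte aligned pointers have three zero trailing bits, which a robust
       allocator can use to store a small integrity tag. */
    void *set_tag(void *p, uintptr_t tag) {       /* requires tag < 8 */
      return (void *)((uintptr_t)p | tag);
    }

    void *clear_tag(void *p) {
      return (void *)((uintptr_t)p & ~(uintptr_t)7);
    }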
Another motivation is illustrated by the current handling of bitfields in
CompCert: they are emulated in terms of bit-level operations by an elabora-
tion pass preceding the formally verified front-end. Fig. 1 gives an example of
such a transformation. The program defines a bitfield bf such that a0 and a1 are
1 bit long. The main function sets the field a1 of bf to 1 and then returns this
value. The expected semantics is therefore that the program returns 1. The trans-
formed code (Fig. 1b) is not very readable but the gist of it is that field accesses
are encoded using bitwise and shift operators. The transformation is correct and
the target code generated by CompCert correctly returns 1. However, using the
existing memory model, the semantics is undefined. Indeed, the program starts
by reading the field __bf1 of the uninitialised structure bf. The value is therefore
undef. Moreover, shift and bitwise operators are strict in undef and therefore
return undef. As a result, the program returns undef. As we show in the next
section, our semantics is able to model partially undefined values and therefore
gives a semantics to bitfields. Even though this case could be easily solved by
modifying the pre-processing step, C programmers might themselves write such
low-level code with reads of undefined memory and expect it to behave correctly.
2.3 A Memory Model with Symbolic Expressions
To give a semantics to the previous idioms, a direct approach is to have a fully
concrete memory model where a pointer is a genuine integer and the memory is

Citations
Book ChapterDOI
18 Jul 2021
TL;DR: This paper presents robust support for verifying memory optimizations of the LLVM compiler; automatic verification tools developed so far verify subsets of LLVM's optimizations but lack such support.
Abstract: Several automatic verification tools have been recently developed to verify subsets of LLVM’s optimizations. However, none of these tools has robust support to verify memory optimizations.

4 citations

Journal ArticleDOI
TL;DR: Transformations over assembly code are common in many compilers, but these transformations are also some of the most bug-dense compiler components.
Abstract: Transformations over assembly code are common in many compilers. These transformations are also some of the most bug-dense compiler components. Such bugs could be eliminated by formally verifying...

3 citations

Dissertation
09 Nov 2016
TL;DR: An extension of the CompCert compiler is presented that aims at providing formal guarantees about the compilation of more programs than CompCert does, and a memory model for CompCert is proposed that makes pointer arithmetic and uninitialised data manipulation defined and captures the behaviour of previously undefined C idioms.
Abstract: This thesis presents an extension of the CompCert compiler that provides formal guarantees of semantic preservation for programs to which CompCert gives none. CompCert is a compiler from the C language to several architectures which provides, in addition to a compiled executable, formal guarantees about the behaviour of the generated assembly program. In particular, every C program with a semantics defined according to the C standard is compiled into an equivalent assembly program, that is, one with the same semantics. However, this theorem gives no guarantee when the source program has no defined semantics: in C parlance, when it exhibits undefined behaviour. Yet C programs from real, widely used projects do contain undefined behaviours. This thesis first details a number of examples of C programs that trigger undefined behaviours. We argue that these programs should nevertheless benefit from CompCert's semantic preservation theorem, first because they appear in real projects and because their use of undefined behaviours seems legitimate. To this end, we first propose a memory model for CompCert that gives a meaning to arbitrary pointer arithmetic and to the manipulation of uninitialised data, by means of a formalism of symbolic values that capture the semantics of operations left undefined by the standard. We adapt the whole memory model of CompCert to these symbolic values, and then adapt the formal semantics of each of CompCert's intermediate languages. We show that these symbolic semantics are a refinement of the existing CompCert semantics, and we also show that they indeed capture the behaviour of the aforementioned programs. Finally, in order to obtain guarantees similar to those provided by CompCert, we must adapt the semantic preservation proofs to our new model. To do so, we generalise important proof techniques such as memory injections, which allows us to transport CompCert's proofs to our new semantics. We thus obtain a semantic preservation theorem that handles more C programs.

3 citations

Dissertation
09 Nov 2016
TL;DR: In this paper, the CompCert compiler is extended to provide formal guarantees about the compilation of more programs than CompCert does, by making pointer arithmetic and uninitialised data manipulation defined, introducing a notion of symbolic values that capture the meaning of otherwise undefined idioms.
Abstract: This thesis presents an extension of the CompCert compiler that aims at providing formal guarantees about the compilation of more programs than CompCert does. The CompCert compiler compiles C code into assembly code for various architectures and provides formal guarantees about the behaviour of the compiled assembly program. It states that whenever the C program has a defined semantics, the generated assembly program behaves similarly. However, the theorem does not provide any guarantee when the source program has undefined semantics, or, in C parlance, when it exhibits undefined behaviour, even though those behaviours actually happen in real-world code. This thesis exhibits a number of C idioms, that occur in real-life code and whose behaviour is undefined according to the C standard. Because they happen in real programs, our goal is to enhance the CompCert verified compiler so that it also provides formal guarantees for those programs. To that end, we propose a memory model for CompCert that makes pointer arithmetic and uninitialised data manipulation defined, introducing a notion of symbolic values that capture the meaning of otherwise undefined idioms. We adapt the whole memory model of CompCert with this new formalism and adapt the semantics of all the intermediate languages. We prove that our enhanced semantics subsumes that of CompCert. Moreover, we show that these symbolic semantics capture the behaviour of the previously undefined C idioms. The proof of semantic preservation from CompCert needs to be reworked to cope with our model. We therefore generalize important proof techniques such as memory injections, which enable us to port the whole proof of CompCert to our new memory model, therefore providing formal guarantees for more programs.
Dissertation
17 Dec 2015
TL;DR: This work paves the way for the design of a certified compiler for high-level parallel languages, such as algorithmic skeleton languages, targeting multi-core parallel architectures.
Abstract: Computer applications are increasingly present in our lives. For critical applications (medicine, transport, ...), the consequences of a software error have an unacceptable cost, whether human or financial. One method for avoiding errors in programs is deductive verification. It applies to programs written in high-level languages that are transformed, by compilers, into programs written in machine language. Compilers must be correct so as not to propagate errors into the machine language. Since 2005, multi-core processors have spread across all computer systems. These architectures require suitable compilers and correctness proofs. Our contribution is the modular extension of a verified compiler for a parallel language targeting multi-core parallel architectures. The specifications of the languages (and their operational semantics) present at the various levels of the compiler, as well as the proofs of the compiler's correctness, are parametrised by modules specifying parallelism features such as a weak memory model and notions of synchronisation and scheduling between threads. This work paves the way for the design of a certified compiler for high-level parallel languages such as algorithmic skeleton languages.

Cites background from "A Concrete Memory Model for CompCer..."

  • ...A concrete memory model for CompCert....

    [...]

  • ...Conclusion: In this chapter, we have used a method of defining operational semantics of languages via synchronised labelled transition systems to extend the CompCertTSO approach, by providing the means to...

    [...]

  • ...It should be noted that several works address CompCert's memory model [12, 5, 11]: these are essentially different ways of representing the organisation of memory, and not, as in the case of the memory models...

    [...]

  • ...CompCert is an optimising compiler....

    [...]

  • ...CompCert is organised into multiple passes and, for a given target machine language (it can generate assembly code for three different architectures: ARM, PowerPC and x86-32), it manipulates ten languages: a slight subset of the C language as defined by the standard; the languages Clight, C#minor, Cminor, CminorSel, RTL, LTL, Linear, Mach, and Asm (architecture-dependent), which are intermediate languages of the compiler; and finally the assembly language of the target architecture....

    [...]

References
Journal ArticleDOI
TL;DR: This paper reports on the development and formal verification of CompCert, a compiler from Clight (a large subset of the C programming language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness.
Abstract: This paper reports on the development and formal verification (proof of semantic preservation) of CompCert, a compiler from Clight (a large subset of the C programming language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness. Such a verified compiler is useful in the context of critical software and its formal verification: the verification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well.

1,124 citations


"A Concrete Memory Model for CompCer..." refers background or methods in this paper

  • ...The CompCert C semantics [5] provides the specification for the correctness of the CompCert compiler [17]....

    [...]

  • ...[9,15,17])....

    [...]

  • ...The CompCert compiler [17] fills this verification gap: its semantics preservation theorem ensures that when the source program has a defined semantics, program invariants proved at source level still hold for the compiled code....

    [...]

Journal ArticleDOI
04 Jun 2011
TL;DR: To improve the quality of C compilers, the authors created Csmith, a randomized test-case generation tool, spent three years using it to find compiler bugs, and present a collection of qualitative and quantitative results about the bugs it found.
Abstract: Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first contribution is to advance the state of the art in compiler testing. Unlike previous tools, Csmith generates programs that cover a large subset of C while avoiding the undefined and unspecified behaviors that would destroy its ability to automatically find wrong-code bugs. Our second contribution is a collection of qualitative and quantitative results about the bugs we have found in open-source C compilers.

799 citations


"A Concrete Memory Model for CompCer..." refers methods in this paper

  • ...With this respect, the CompCert C semantics successfully run hundreds of random test programs generated by CSmith [23]....

    [...]

Book ChapterDOI
20 Aug 2009
TL;DR: This paper motivates VCC, describes its verification methodology and architecture, and reports on the experience of using VCC to verify the Microsoft Hyper-V hypervisor.
Abstract: VCC is an industrial-strength verification environment for low-level concurrent system code written in C. VCC takes a program (annotated with function contracts, state assertions, and type invariants) and attempts to prove the correctness of these annotations. It includes tools for monitoring proof attempts and constructing partial counterexample executions for failed proofs. This paper motivates VCC, describes our verification methodology, describes the architecture of VCC, and reports on our experience using VCC to verify the Microsoft Hyper-V hypervisor.

584 citations


"A Concrete Memory Model for CompCer..." refers methods in this paper

  • ...VCC [7] generates verification conditions using an abstract typed memory model [8] where the memory is a mapping from typed pointers to structured C values....

    [...]

Journal ArticleDOI
25 Jan 2012
TL;DR: The semantics is shown capable of automatically finding program errors, both statically and at runtime, and it is also used to enumerate nondeterministic behavior.
Abstract: This paper describes an executable formal semantics of C. Being executable, the semantics has been thoroughly tested against the GCC torture test suite and successfully passes 99.2% of 776 test programs. It is the most complete and thoroughly tested formal definition of C to date. The semantics yields an interpreter, debugger, state space search tool, and model checker "for free". The semantics is shown capable of automatically finding program errors, both statically and at runtime. It is also used to enumerate nondeterministic behavior.

209 citations


Additional excerpts

  • ...[9,15,17])....

    [...]

17 Jul 2011
TL;DR: In this paper, the authors present an executable formal semantics of C. The semantics yields an interpreter, debugger, state space search tool, and model checker, which is shown capable of automatically finding program errors, both statically and at runtime.
Abstract: This paper describes an executable formal semantics of C. Being executable, the semantics has been thoroughly tested against the GCC torture test suite and successfully passes 770 of 776 test programs. It is the most complete and thoroughly tested formal definition of C to date. The semantics yields an interpreter, debugger, state space search tool, and model checker “for free”. The semantics is shown capable of automatically finding program errors, both statically and at runtime. It is also used to enumerate nondeterministic behavior.

188 citations

Frequently Asked Questions (2)
Q1. What are the contributions in "A Concrete Memory Model for CompCert"?

This paper presents the proof of an enhanced and more concrete memory model for the CompCert C compiler which assigns a definite meaning to more C programs. The authors prove that the existing memory model is an abstraction of their more concrete model, thus validating formally the soundness of CompCert's abstract semantics of pointers. The authors also show how to adapt the front-end of CompCert, thus demonstrating that it should be feasible to port the whole compiler to their novel memory model.

As future work, the authors shall study how to adapt the back-end of CompCert. Notwithstanding the remaining difficulties, the authors believe that the full CompCert compiler can be ported to their novel memory model. This would improve further the confidence in the generated code.