
Book ChapterDOI

A Concrete Memory Model for CompCert

24 Aug 2015 - Vol. 9236, pp 67-83

TL;DR: This paper presents the proof of an enhanced and more concrete memory model for the CompCert C compiler which assigns a definite meaning to more C programs and proves formally the soundness of CompCert’s abstract semantics of pointers.

Abstract: Semantics preserving compilation of low-level C programs is challenging because their semantics is implementation defined according to the C standard. This paper presents the proof of an enhanced and more concrete memory model for the CompCert C compiler which assigns a definite meaning to more C programs. In our new formally verified memory model, pointers are still abstract but are nonetheless mapped to concrete 32-bit integers. Hence, the memory is finite and it is possible to reason about the binary encoding of pointers. We prove that the existing memory model is an abstraction of our more concrete model thus validating formally the soundness of CompCert’s abstract semantics of pointers. We also show how to adapt the front-end of CompCert thus demonstrating that it should be feasible to port the whole compiler to our novel memory model.


Summary (4 min read)

1 Introduction

  • Yet, a theorem about the source code of safety-critical software is not sufficient.
  • The CompCert compiler [17] fills this verification gap: its semantics preservation theorem ensures that when the source program has a defined semantics, program invariants proved at source level still hold for the compiled code.
  • Yet, these approaches are, by essence, limited by the formal semantics of CompCert C: programs exhibiting undefined behaviours cannot benefit from any semantic preservation guarantee.
  • The authors prove that the existing memory model of CompCert is an abstraction of their model thus validating the soundness of the existing semantics.
  • The authors adapt the proof of CompCert’s front-end passes, from CompCert C until Cminor, thus demonstrating the feasibility of their endeavour.

2 A More Concrete Memory Model for CompCert

  • In previous work [3], the authors propose an enhanced memory model (with symbolic expressions) for CompCert.
  • The authors empirically verify, using the reference interpreter of CompCert, that their extension is sound with respect to the existing semantics and that it captures low-level C idioms out of reach of the existing memory model.
  • This section first recalls the main features of the current CompCert memory model and then explains their extension to this memory model.

2.1 CompCert’s Memory Model

  • Leroy et al. [18] give a thorough presentation of the existing memory model of CompCert, that is shared by all the languages of the compiler.
  • The authors give a brief overview of its design in order to highlight the differences with their own model.
  • Pointer arithmetic modifies the offset part of a location, keeping its block identifier part unchanged.
  • The free operation may also fail (e.g. when the locations to be freed have been freed already).
  • In the memory model, the byte-level, in-memory representation of integers and floats is exposed, while pointers are kept abstract [18].

2.2 Motivation for an Enhanced Memory Model

  • The authors' memory model with symbolic expressions [3] gives a precise semantics to low-level C idioms which cannot be modelled by the existing memory model.
  • Other examples are robust implementations of malloc: for the sake of checking the integrity of pointers, their trailing bits store a checksum.
  • This is possible because those pointers are also aligned and therefore the trailing bits are necessarily 0s.
  • The expected semantics is therefore that the program returns 1.
  • The transformation is correct and the target code generated by CompCert correctly returns 1.

2.3 A Memory Model with Symbolic Expressions

  • This model lacks an essential property of CompCert’s semantics: determinism.
  • Determinism is instrumental for the simulation proofs of the compiler passes and its absence is a show stopper.
  • The authors define the evaluation of expressions as the function ⟦·⟧_cm, parametrised by the concrete mapping cm.
  • Pointers are turned into their concrete value, as dictated by cm.
  • The value of the expression is 1 whatever the value of undef and therefore the normalisation succeeds and returns, as expected, the value 1.
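
As an intuition for this normalisation, here is a minimal sketch in C (ours; the paper's actual development is in Coq, and eval and the expression type are hypothetical names of this sketch). A symbolic expression is evaluated under every concrete choice for its unknown, and normalisation succeeds only when all choices agree on the result:

#include <stdint.h>
#include <stdio.h>

/* A symbolic expression: a constant, the unknown "undef", or a
   bitwise operation over sub-expressions. */
typedef enum { CONST, UNDEF, AND, OR } tag;

typedef struct expr {
    tag t;
    uint8_t c;                 /* payload for CONST */
    const struct expr *l, *r;  /* operands for AND / OR */
} expr;

/* [[e]]_cm: evaluate e once the unknown is given the concrete value cm. */
static uint8_t eval(const expr *e, uint8_t cm) {
    switch (e->t) {
    case CONST: return e->c;
    case UNDEF: return cm;
    case AND:   return eval(e->l, cm) & eval(e->r, cm);
    case OR:    return eval(e->l, cm) | eval(e->r, cm);
    }
    return 0;
}

int main(void) {
    /* (undef & ~0x2) | 0x2: set bit 1 of an uninitialised byte,
       as in the bitfield emulation of Fig. 1b. */
    expr u = {UNDEF, 0, NULL, NULL};
    expr m = {CONST, (uint8_t)~0x2u, NULL, NULL};
    expr b = {CONST, 0x2u, NULL, NULL};
    expr a = {AND, 0, &u, &m};
    expr e = {OR, 0, &a, &b};

    /* Extracting bit 1 yields 1 for every concrete choice of the
       unknown, so normalisation returns the defined value 1; a single
       strict undef token would instead poison the whole result. */
    for (int cm = 0; cm < 256; cm++)
        if (((eval(&e, (uint8_t)cm) >> 1) & 1) != 1) return 1;
    puts("normalises to 1");
    return 0;
}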

3 Proving the Operations of the Memory Model

  • CompCert’s memory model exports an interface summarising all the properties of the memory operations necessary to prove the compiler passes.
  • This section details how the properties and the proofs need to be adapted to accommodate symbolic expressions.
  • The authors also introduce an equivalence relation between symbolic expressions.

3.1 Precise Handling of Undefined Values

  • Symbolic expressions (as presented in Section 2.3) feature a unique undef token.
  • This is a shortcoming that the authors have identified during the proof.
  • With a single undef, the authors do not capture the fact that different occurrences of undef may represent the same unknown value, or different ones.
  • To overcome this problem, each byte of a newly allocated memory chunk is initialised with a fresh undef value.
  • Hence, x − x constructs the symbolic expression undef(b, o) − undef(b, o) for some b and o, which obviously normalises to 0, because undef(b, o) now represents a unique value rather than the set of all values.
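
A small C illustration (ours, with hypothetical names; not the paper's Coq code) of why attaching an identity (b, o) to each undefined byte matters: two occurrences of the same unknown cancel under subtraction, while distinct unknowns stay undefined.

#include <stdio.h>

/* A symbolic value: either defined with a concrete val, or the
   located unknown undef(block, off). */
typedef struct { int defined; unsigned block, off; unsigned val; } sval;

/* Subtraction on symbolic values: two occurrences of the *same*
   unknown cancel; distinct unknowns stay undefined. */
static sval sub(sval x, sval y) {
    sval r = {0, 0, 0, 0};
    if (x.defined && y.defined) { r.defined = 1; r.val = x.val - y.val; }
    else if (!x.defined && !y.defined &&
             x.block == y.block && x.off == y.off) {
        r.defined = 1; r.val = 0;      /* undef(b,o) - undef(b,o) = 0 */
    }
    return r;
}

int main(void) {
    sval x = {0, 1, 4, 0};             /* undef(b=1, o=4): uninitialised */
    sval d = sub(x, x);
    printf("x - x defined: %d, value: %u\n", d.defined, d.val);
    return 0;
}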

3.2 Memory Allocation

  • CompCert’s alloc operation always allocates a memory chunk of the requested size and returns a fresh block to the newly allocated memory (i.e. it models an infinite memory).
  • The first guarantee is that for every memory m there exists at least one concrete memory compatible with the abstract CompCert block-based memory.
  • To get this property, the alloc function runs a greedy algorithm constructing a compatible cm mapping.
  • Given a memory m, size_mem(m) returns the size of the constructed memory (i.e. the first fresh address as computed by the allocation).
  • The algorithm makes the pessimistic assumption that the allocated blocks are maximally aligned – for CompCert, this maximum is 3 bits (addresses are divisible by 2^3).
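
A minimal sketch of such a greedy construction, under our own simplifying assumptions (names like alloc_concrete are ours, not the paper's): a bump allocator placing each block at the next 8-byte-aligned (2^3) address and failing once the 32-bit address space is exhausted.

#include <stdint.h>
#include <stdio.h>

#define ALIGN 8u                   /* pessimistic maximal alignment: 2^3 */
#define MEM_END 0xFFFFFFFFull      /* finite 32-bit memory */

static uint64_t next_free = ALIGN; /* address 0 stays unmapped (null) */

/* Returns the concrete address of the new block, or 0 on failure. */
static uint64_t alloc_concrete(uint64_t size) {
    uint64_t base = (next_free + ALIGN - 1) & ~(uint64_t)(ALIGN - 1);
    if (size > MEM_END - base) return 0;  /* finite memory: alloc may fail */
    next_free = base + size;
    return base;
}

int main(void) {
    printf("block 1 at %#llx\n", (unsigned long long)alloc_concrete(13));
    printf("block 2 at %#llx\n", (unsigned long long)alloc_concrete(4));
    /* size_mem(m): the first fresh address after all allocations. */
    printf("size_mem = %#llx\n", (unsigned long long)next_free);
    return 0;
}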

3.3 Good Variable Properties

  • In CompCert, the so-called good variable properties axiomatise the behaviour of the memory operations.
  • The reverse operation is the concatenation of a symbolic expression sv1 with a symbolic expression sv2 representing a byte.
  • The authors have generalised and proved the axioms of the memory model using the same principle.
  • Moreover, while the structure of the proofs is similar, the proofs are complicated by the fact that the authors reason modulo normalisation of expressions.

4 Cross-validation of Memory Models

  • The semantics of the CompCert C language is part of the trusted computing base of the compiler.
  • If the resulting offset is outside the bounds, their normalisation returns undef.
  • After the easy fix, the authors found two interesting semantic discrepancies with the current semantics of CompCert C. For instance, when running the compiled program, the pointer is a mere integer which eventually overflows, wraps around and becomes 0.
  • After adjusting both memory models, the authors are able to prove that both semantics agree when the existing CompCert C semantics is defined thus cross-validating the semantics of operators.
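
The overflow discrepancy can be pictured with a small C program (our illustration, not the paper's test case): in the block-based model, a valid pointer plus an offset never equals the null pointer, but under a concrete 32-bit encoding the addition is plain modular arithmetic.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    char c;                                /* some allocated block b */
    uint32_t p = (uint32_t)(uintptr_t)&c;  /* concrete 32-bit encoding */
    uint32_t delta = 0u - p;               /* 2^32 - p, a plain integer */
    /* Abstractly, ptr(b, i) + delta is never null; concretely the
       integer wraps around and becomes 0. */
    printf("p + delta = %u\n", p + delta);
    return 0;
}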

5 Redesign of Memory Injections

  • Memory injections are instrumental for proving the correctness of several compiler passes of CompCert.
  • A memory injection defines a mapping between memories; it is a versatile tool to explain how passes reorganise the memory (e.g. construct an activation record from local variables).
  • This section explains how to generalise this concept for symbolic expressions.
  • It requires a careful handling of undefined values undef(l) which are absent from the existing memory model.

5.1 Memory Injections in CompCert

  • The injection relation is defined over values (and called val_inject) and then lifted to memories (and called inject).
  • The val_inject relation distinguishes three cases: 1. for concrete values (i.e. integers or floating-point numbers), the relation is reflexive, e.g. int(i) is in relation with int(i); 2. ptr(b, i) is in relation with ptr(b′, i + δ) when f(b) = ⌊(b′, δ)⌋; 3. undef is in relation with any value (including undef).
  • The purpose of the injection is twofold: it establishes a relation between pointers using the function f but it can also specialise undef by a defined value.
  • In CompCert, so-called generic memory injections state that every valid location in memory m1 is mapped by function f into a valid location in memory m2; the corresponding location in m2 must be properly aligned with respect to the size of the block; and the values stored at corresponding locations must be in injection.
  • Among other conditions, the authors have that if several blocks in m1 are mapped to the same block in m2, the mapping ensures that they do not overlap.
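
As a rough sketch (ours, in C rather than Coq; representing f as an array indexed by block identifier is an assumption of this sketch), the three cases of val_inject can be read as a predicate over a partial function f from blocks to (block, offset) pairs.

#include <stdbool.h>

typedef enum { VINT, VPTR, VUNDEF } vtag;
typedef struct { vtag t; int i; unsigned b; unsigned o; } val;
/* f(b) is either unmapped or ⌊(b2, delta)⌋. */
typedef struct { bool mapped; unsigned b2; unsigned delta; } finj;

/* val_inject f v1 v2 */
static bool val_inject(const finj *f, val v1, val v2) {
    switch (v1.t) {
    case VINT:    /* case 1: concrete values, the relation is reflexive */
        return v2.t == VINT && v1.i == v2.i;
    case VPTR:    /* case 2: f(b) = ⌊(b', delta)⌋ shifts the offset */
        return f[v1.b].mapped && v2.t == VPTR &&
               v2.b == f[v1.b].b2 && v2.o == v1.o + f[v1.b].delta;
    case VUNDEF:  /* case 3: undef is in relation with any value */
        return true;
    }
    return false;
}

int main(void) {
    finj f[2] = { {false, 0, 0}, {true, 7, 16} };  /* f(1) = (7, 16) */
    val p1 = {VPTR, 0, 1, 4};                      /* ptr(1, 4)  */
    val p2 = {VPTR, 0, 7, 20};                     /* ptr(7, 20) */
    return val_inject(f, p1, p2) ? 0 : 1;          /* in injection */
}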

5.2 Memory Injection with Symbolic Expressions

  • The function f is still present and serves the same purpose.
  • The authors' injection expr_inject is therefore defined as the composition of the function apply_spe spe, which specialises undef(l) into concrete bytes, and the function apply_inj f, which injects locations.
  • This model makes the implicit assumption that memory blocks are always sufficiently aligned.
  • The existing formalisation of inject has a property mi_representable which states that the offset o + δ obtained after injection does not overflow.

5.3 Memory Injection and Normalisation

  • The authors' normalisation is defined w.r.t. all the concrete memories compatible with the CompCert block-based memory (see Section 2.3).
  • Theorem norm_inject shows that under the condition that all blocks are injected, if e and e′ are in injection, then their normalisations are in injection too.
  • Thus, the normalisation can only get more defined after injection.
  • This is expected as the injection can merge blocks and therefore makes pointer arithmetic more defined.
  • A consequence of this theorem is that the compiler is not allowed to reduce the memory usage.
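
A concrete C illustration (ours) of why injections make normalisation more defined: in the source, x and y live in two distinct blocks, so the comparison below normalises to undef; once the front-end merges both variables into a single stack frame, the same comparison is ordinary intra-block pointer arithmetic and yields a defined result.

#include <stdio.h>

int main(void) {
    int x = 0, y = 0;  /* two locals: two separate blocks in the source */
    /* Comparing pointers into distinct blocks is undefined in the
       abstract semantics (and in ISO C); after both blocks are merged
       into one activation record, the comparison becomes defined. */
    printf("%d\n", (char *)&x < (char *)&y);
    return 0;
}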

6 Proving the Front-end of the CompCert Compiler

  • Later compiler passes are architecture dependent and are therefore part of the back-end.
  • This section explains how to adapt the semantics preservation proofs of the front-end to their memory model with symbolic expressions.

6.1 CompCert Front-end with Symbolic Expressions

  • The semantics of all intermediate languages need to be modified in order to account for symbolic expressions.
  • In reality, the transformation is more subtle because, for instance, certain intermediate semantic functions explicitly require locations represented as pairs (b, o).
  • This solution proves wrong and breaks semantics preservation proofs because introduced normalisations may be absent in subsequent intermediate languages.
  • This pass does not transform the memory and therefore the existing proof can be reused.
  • The pass also performs type-directed transformations and removes redundant casts.

Allocation of local variables

  • This relation is too weak and fails to pass the induction step.
  • The problem is related to the preservation of the memory injection when allocating and de-allocating the variables in C#minor and the stack frame in Cminor.
  • Once again, the authors adapt the two-step proof, using a direct induction over the number of variables.
  • To carry out this proof and establish an injection the authors have to reason about the relative sizes of the memories.
  • Here, the authors have to deal with the opposite situation where the stack frame could use less memory than the variables.

8 Conclusion

  • This work is a milestone towards a CompCert compiler proved correct with respect to a more concrete memory model.
  • A by-product of their work is that the authors have uncovered and fixed a problem in the existing semantics of comparisons with the null pointer.
  • The authors are confident that program optimisations based on static analyses will not be problematic.
  • Notwithstanding the remaining difficulties, the authors believe that the full CompCert compiler can be ported to their novel memory model.
  • This would improve further the confidence in the generated code.


HAL Id: hal-01194549
https://hal.inria.fr/hal-01194549
Submitted on 7 Sep 2015
A Concrete Memory Model for CompCert
Frédéric Besson, Sandrine Blazy, Pierre Wilke
To cite this version:
Frédéric Besson, Sandrine Blazy, Pierre Wilke. A Concrete Memory Model for CompCert. ITP 2015: 6th International Conference on Interactive Theorem Proving, Aug 2015, Nanjing, China. pp. 67-83. DOI: 10.1007/978-3-319-22102-1_5. hal-01194549

A Concrete Memory Model for CompCert
Frédéric Besson¹, Sandrine Blazy², and Pierre Wilke²

¹ Inria, Rennes, France
² Université Rennes 1 - IRISA, Rennes, France
Abstract. Semantics preserving compilation of low-level C programs is challenging because their semantics is implementation defined according to the C standard. This paper presents the proof of an enhanced and more concrete memory model for the CompCert C compiler which assigns a definite meaning to more C programs. In our new formally verified memory model, pointers are still abstract but are nonetheless mapped to concrete 32-bit integers. Hence, the memory is finite and it is possible to reason about the binary encoding of pointers. We prove that the existing memory model is an abstraction of our more concrete model thus validating formally the soundness of CompCert's abstract semantics of pointers. We also show how to adapt the front-end of CompCert thus demonstrating that it should be feasible to port the whole compiler to our novel memory model.
1 Introduction

Formal verification of programs is usually performed at source level. Yet, a theorem about the source code of safety-critical software is not sufficient. Eventually, what we really value is a guarantee about the run-time behaviour of the compiled program running on a physical machine. The CompCert compiler [17] fills this verification gap: its semantics preservation theorem ensures that when the source program has a defined semantics, program invariants proved at source level still hold for the compiled code. For the C language the rules governing so-called undefined behaviours are subtle and the absence of undefined behaviours is in general undecidable. As a corollary, for a given C program, it is undecidable whether the semantic preservation applies or not.

To alleviate the problem, the semantics of CompCert C is executable and it is therefore possible to check that a given program execution has a defined semantics. Jourdan et al. [12] propose a more comprehensive and ambitious approach: they formalise and verify a precise C static analyser for CompCert capable of ruling out undefined behaviours for a wide range of programs. Yet, these approaches are, by essence, limited by the formal semantics of CompCert C: programs exhibiting undefined behaviours cannot benefit from any semantic preservation guarantee. This is unfortunate as real programs do have behaviours that are undefined according to the formal semantics of CompCert C (the official C standard is in general even stricter). This can be a programming mistake but sometimes this is a design feature. In the past, serious security flaws have been introduced by optimising compilers aggressively exploiting the latitude provided by undefined behaviours [22,6]. The existing workaround is not satisfactory and consists in disabling optimisations known to be triggered by undefined behaviours.

(This work was partially supported by the French ANR-14-CE28-0014 AnaStaSec.)
In previous work [3], we proposed a more concrete and defined semantics for CompCert C able to give a semantics to low-level C idioms. This semantics relies on symbolic expressions stored in memory that are normalised into genuine values when needed by the semantics. It handles low-level C idioms that exploit the concrete encoding of pointers (e.g. alignment constraints) or access partially undefined data structures (e.g. bit-fields). Such properties cannot be reasoned about using the existing CompCert memory model [19,18].

The memory model of CompCert consists of two parts: standard operations on memory (e.g. alloc, store) that are used in the semantics of the languages of CompCert and their properties (that are required to prove the semantic preservation of the compiler), together with generic transformations operating over memory. Indeed, certain passes of the compiler perform non-trivial transformations on memory allocations and accesses: for instance, in the front-end, C local variables initially mapped to individually-allocated memory blocks are later on mapped to sub-blocks of a single stack-allocated activation record. Proving the semantic preservation of these transformations requires extensive reasoning over memory states, using memory invariants relating memory states during program execution, that are also defined in the memory model.

In this paper, we extend the memory model of CompCert with symbolic expressions [3] and tackle the challenge of porting memory transformations and CompCert's proofs to our memory model with symbolic expressions. The complete Coq development is available online [1]. Among others, a difficulty is that we drop the implicit assumption of an infinite memory. This has the consequence that allocation can fail. Hence, the compiler has to ensure that the compiled program is using less memory than the source program.
This paper presents a milestone towards a CompCert compiler adapted with our semantics; it makes the following contributions.

  • We present a formal verification of our memory model within CompCert.
  • We prove that the existing memory model of CompCert is an abstraction of our model, thus validating the soundness of the existing semantics.
  • We extend the notion of memory injection, the main generic notion of memory transformation.
  • We adapt the proof of CompCert's front-end passes, from CompCert C until Cminor, thus demonstrating the feasibility of our endeavour.
The paper is organised as follows. Section 2 recalls the main features of the existing CompCert memory model and our proposed extension. Section 3 explains how to adapt the operations of the existing CompCert memory model to comply with the new requirements of our memory model. Section 4 shows that the existing memory model is, in a provable way, an abstraction of our new memory model. Section 5 presents our re-design of the notion of memory injection that is the cornerstone of compiler passes modifying the memory layout. Section 6 details the modifications to the proofs of the compiler front-end passes. Related work is presented in Section 7; Section 8 concludes.
2 A More Concrete Memory Model for CompCert

In previous work [3], we propose an enhanced memory model (with symbolic expressions) for CompCert. The model is implemented and evaluated over a representative set of C programs. We empirically verify, using the reference interpreter of CompCert, that our extension is sound with respect to the existing semantics and that it captures low-level C idioms out of reach of the existing memory model. This section first recalls the main features of the current CompCert memory model and then explains our extension to this memory model.
2.1 CompCert's Memory Model

Leroy et al. [18] give a thorough presentation of the existing memory model of CompCert, that is shared by all the languages of the compiler. We give a brief overview of its design in order to highlight the differences with our own model.

Abstract values used in the semantics of the CompCert languages (see [19]) are the disjoint union of 32-bit integers (written as int(i)), 32-bit floating-point numbers (written as float(f)), locations (written as ptr(l)), and the special value undef representing an arbitrary bit pattern, such as the value of an uninitialised variable. The abstract memory is viewed as a collection of separated blocks. A location l is a pair (b, i) where b is a block identifier (i.e. an abstract address) and i is an integer offset within this block. Pointer arithmetic modifies the offset part of a location, keeping its block identifier part unchanged. A pointer ptr(b, i) is valid for a memory M (written valid_pointer(M, b, i)) if the offset i is within the two bounds of the block b.

Abstract values are loaded from (resp. stored into) memory using the load (resp. store) memory operation. Memory chunks appear in these operations, to describe concisely the size, type and signedness of the value being stored. These operations return option types: we write ∅ for failure and ⌊x⌋ for a successful return of a value x. The free operation may also fail (e.g. when the locations to be freed have been freed already). The memory operation alloc never fails, as the size of the memory is unbounded.

In the memory model, the byte-level, in-memory representation of integers and floats is exposed, while pointers are kept abstract [18]. The concrete memory is modelled as a map associating to each location a concrete value cv that is a byte-sized quantity describing the current content of a memory cell. It can be either a concrete 8-bit integer (written as bytev(b)) representing a part of an integer or a float, ptrv(l, i) to represent the i-th byte (i ∈ {1, 2, 3, 4}) of the location l, or undefv to model uninitialised memory.
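
For readers more comfortable with C than Coq, the cell contents described above can be pictured as a tagged union (a sketch with our own naming, not CompCert's Coq definition):

#include <stdint.h>

typedef struct { uint32_t block; int32_t off; } loc;

typedef enum { BYTEV, PTRV, UNDEFV } mtag;

typedef struct {
    mtag t;
    uint8_t byte;   /* BYTEV: one raw byte of an integer or a float */
    loc l;          /* PTRV: the abstract location being pointed to */
    int i;          /* PTRV: which byte of the pointer, i in {1,2,3,4} */
} memval;           /* UNDEFV carries no payload: uninitialised memory */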

struct {
  int a0 : 1; int a1 : 1;
} bf;

int main() {
  bf.a1 = 1; return bf.a1;
}

(a) Bitfield in C

struct { unsigned char bf1; } bf;

int main() {
  bf.bf1 = (bf.bf1 & ~0x2U) |
           ((unsigned int) 1 << 1U & 0x2U);
  return (int) (bf.bf1 << 30) >> 31;
}

(b) Bitfield in CompCert C

Fig. 1: Emulation of bitfields in CompCert
2.2 Motivation for an Enhanced Memory Model

Our memory model with symbolic expressions [3] gives a precise semantics to low-level C idioms which cannot be modelled by the existing memory model. The reason is that those idioms either exploit the binary representation of pointers as integers or reason about partially uninitialised data. For instance, it is common for system calls, e.g. mmap or sbrk, to return −1 (instead of a pointer) to indicate that there is no memory available. Intuitively, −1 refers to the last memory address 0xFFFFFFFF and this cannot be a valid address because mmap returns pointers that are aligned: their trailing bits are necessarily 0s. Other examples are robust implementations of malloc: for the sake of checking the integrity of pointers, their trailing bits store a checksum. This is possible because those pointers are also aligned and therefore the trailing bits are necessarily 0s.
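
As an illustration of this idiom (ours, not taken from the paper; it assumes the allocator returns at least 8-byte-aligned pointers, as common allocators do), the bits freed by alignment can carry a checksum that is masked off before any dereference; giving such code a semantics requires reasoning about the binary encoding of pointers:

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>

#define TAG_MASK 0x7u             /* 3 low bits are free by alignment */

static void *tag_ptr(void *p, unsigned chk) {
    /* Store a small checksum in the trailing bits of the pointer. */
    return (void *)((uintptr_t)p | (chk & TAG_MASK));
}
static void *untag_ptr(void *p) {
    /* Strip the checksum to recover the original aligned pointer. */
    return (void *)((uintptr_t)p & ~(uintptr_t)TAG_MASK);
}

int main(void) {
    int *p = malloc(sizeof *p);
    if (!p) return 1;
    int *q = untag_ptr(tag_ptr(p, 5));  /* store then strip a checksum */
    *q = 42;                             /* q == p, so this is safe */
    printf("%d\n", *q);
    free(q);
    return 0;
}
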
Another motivation is illustrated by the current handling of bitfields in CompCert: they are emulated in terms of bit-level operations by an elaboration pass preceding the formally verified front-end. Fig. 1 gives an example of such a transformation. The program defines a bitfield bf such that a0 and a1 are 1 bit long. The main function sets the field a1 of bf to 1 and then returns this value. The expected semantics is therefore that the program returns 1. The transformed code (Fig. 1b) is not very readable but the gist of it is that field accesses are encoded using bitwise and shift operators. The transformation is correct and the target code generated by CompCert correctly returns 1. However, using the existing memory model, the semantics is undefined. Indeed, the program starts by reading the field bf1 of the uninitialised structure bf. The value is therefore undef. Moreover, shift and bitwise operators are strict in undef and therefore return undef. As a result, the program returns undef. As we show in the next section, our semantics is able to model partially undefined values and therefore gives a semantics to bitfields. Even though this case could be easily solved by modifying the pre-processing step, C programmers might themselves write such low-level code with reads of undefined memory and expect it to behave correctly.
2.3 A Memory Model with Symbolic Expressions

To give a semantics to the previous idioms, a direct approach is to have a fully concrete memory model where a pointer is a genuine integer and the memory is
Citations

Proceedings ArticleDOI
02 Jun 2016
TL;DR: An in-depth analysis of the design space for the semantics of pointers and memory in C as it is used in practice is described, a step towards clear, consistent, and accepted semantics for the various use-cases of C.
Abstract: C remains central to our computing infrastructure. It is notionally defined by ISO standards, but in reality the properties of C assumed by systems code and those implemented by compilers have diverged, both from the ISO standards and from each other, and none of these are clearly understood. We make two contributions to help improve this error-prone situation. First, we describe an in-depth analysis of the design space for the semantics of pointers and memory in C as it is used in practice. We articulate many specific questions, build a suite of semantic test cases, gather experimental data from multiple implementations, and survey what C experts believe about the de facto standards. We identify questions where there is a consensus (either following ISO or differing) and where there are conflicts. We apply all this to an experimental C implemented above capability hardware. Second, we describe a formal model, Cerberus, for large parts of C. Cerberus is parameterised on its memory model; it is linkable either with a candidate de facto memory object model, under construction, or with an operational C11 concurrency model; it is defined by elaboration to a much simpler Core language for accessibility, and it is executable as a test oracle on small examples. This should provide a solid basis for discussion of what mainstream C is now: what programmers and analysis tools can assume and what compilers aim to implement. Ultimately we hope it will be a step towards clear, consistent, and accepted semantics for the various use-cases of C.

82 citations


Cites methods from "A Concrete Memory Model for CompCer..."

  • ...[6, 7], the model used for seL4 verification by Tuch et al....



Proceedings ArticleDOI
02 Jun 2016
TL;DR: Peek is presented, a framework for expressing, verifying, and running meaning-preserving assembly-level program transformations in CompCert; a set of local properties is proved sufficient to ensure global transformation correctness.
Abstract: Transformations over assembly code are common in many compilers. These transformations are also some of the most bug-dense compiler components. Such bugs could be eliminated by formally verifying the compiler, but state-of-the-art formally verified compilers like CompCert do not support assembly-level program transformations. This paper presents Peek, a framework for expressing, verifying, and running meaning-preserving assembly-level program transformations in CompCert. Peek contributes four new components: a lower level semantics for CompCert x86 syntax, a liveness analysis, a library for expressing and verifying peephole optimizations, and a verified peephole optimization pass built into CompCert. Each of these is accompanied by a correctness proof in Coq against realistic assumptions about the calling convention and the system memory allocator. Verifying peephole optimizations in Peek requires proving only a set of local properties, which we have proved are sufficient to ensure global transformation correctness. We have proven these local properties for 28 peephole transformations from the literature. We discuss the development of our new assembly semantics, liveness analysis, representation of program transformations, and execution engine; describe the verification challenges of each component; and detail techniques we applied to mitigate the proof burden.

31 citations


Cites background from "A Concrete Memory Model for CompCer..."

  • ...Recent work to develop a concrete memory allocator for CompCert [3] verifies translations against a simple, conservative memory allocator which lacks the ability to reuse memory....



Journal ArticleDOI
02 Jan 2019
TL;DR: This paper aims to reconcile the ISO C standard, mainstream compiler behaviour, and the semantics relied on by the corpus of existing C code, and presents two coherent proposals, tracking provenance via integers and not; both address many design questions.
Abstract: The semantics of pointers and memory objects in C has been a vexed question for many years. C values cannot be treated as either purely abstract or purely concrete entities: the language exposes their representations, but compiler optimisations rely on analyses that reason about provenance and initialisation status, not just runtime representations. The ISO WG14 standard leaves much of this unclear, and in some respects differs with de facto standard usage --- which itself is difficult to investigate. In this paper we explore the possible source-language semantics for memory objects and pointers, in ISO C and in C as it is used and implemented in practice, focussing especially on pointer provenance. We aim to, as far as possible, reconcile the ISO C standard, mainstream compiler behaviour, and the semantics relied on by the corpus of existing C code. We present two coherent proposals, tracking provenance via integers and not; both address many design questions. We highlight some pros and cons and open questions, and illustrate the discussion with a library of test cases. We make our semantics executable as a test oracle, integrating it with the Cerberus semantics for much of the rest of C, which we have made substantially more complete and robust, and equipped with a web-interface GUI. This allows us to experimentally assess our proposals on those test cases. To assess their viability with respect to larger bodies of C code, we analyse the changes required and the resulting behaviour for a port of FreeBSD to CHERI, a research architecture supporting hardware capabilities, which (roughly speaking) traps on the memory safety violations which our proposals deem undefined behaviour. We also develop a new runtime instrumentation tool to detect possible provenance violations in normal C code, and apply it to some of the SPEC benchmarks. We compare our proposal with a source-language variant of the twin-allocation LLVM semantics proposal of Lee et al. Finally, we describe ongoing interactions with WG14, exploring how our proposals could be incorporated into the ISO standard.

18 citations


Cites background from "A Concrete Memory Model for CompCer..."

  • ...Later work for CompCert adds support for some low-level idioms, but not the full gamut thereof in de facto C [Besson et al. 2014, 2015, 2017; Krebbers et al. 2014; Leroy et al. 2012]....



BookDOI
01 Jan 2017
TL;DR: The metaprogramming language currently in use in Lean, a new open source theorem prover that is designed to bridge the gap between interactive use and automation, is described and evidence is provided to show that the implementation is performant, and that it provides a convenient and flexible way of writing not only small-scale interactive tactics, but also more substantial kinds of automation.
Abstract: We describe the metaprogramming language currently in use in Lean, a new open source theorem prover that is designed to bridge the gap between interactive use and automation. Lean implements a version of the Calculus of Inductive Constructions. Its elaborator and unification algorithms are designed around the use of type classes, which support algebraic reasoning, programming abstractions, and other generally useful means of expression. Lean also has parallel compilation and checking of proofs, and provides a server mode that supports a continuous compilation and rich user interaction in editing environments such as Emacs, Vim, and Visual Studio Code. Lean currently has a conditional term rewriter, and several components commonly found in state-of-the-art Satisfiability Modulo Theories (SMT) solvers such as forward chaining, congruence closure, handling of associative and commutative operators, and E-matching. All these components are available in the metaprogramming framework, and can be combined and customized by users. In this talk, we provide a short introduction to the Lean theorem prover and its metaprogramming framework. We also describe how this framework extends Lean's object language with an API to many of Lean's internal structures and procedures, and provides ways of reflecting object-level expressions into the metalanguage. We provide evidence to show that our implementation is performant, and that it provides a convenient and flexible way of writing not only small-scale interactive tactics, but also more substantial kinds of automation. We view this as important progress towards our overarching goal of bridging the gap between interactive and automated reasoning. Users who develop libraries for interactive use can now more easily develop special-purpose automation to go with them, thereby encoding procedural heuristics and expertise alongside factual knowledge. At the same time, users who want to use Lean as a back end to assist in complex verification tasks now have flexible means of adapting Lean's libraries and automation to their specific needs. As a result, our metaprogramming language opens up new opportunities, allowing for more natural and intuitive forms of interactive reasoning, as well as for more flexible and reliable forms of automation. More information about Lean can be found at http://leanprover.github.io. The interactive book "Theorem Proving in Lean" (https://leanprover.github.io/theorem_proving_in_lean) is the standard reference for Lean. The book is available in PDF and HTML formats; in the HTML version, all examples and exercises can be executed in the reader's web browser.

15 citations


Book ChapterDOI
26 Sep 2017
TL;DR: A formally verified C compiler, CompCertS, is presented, which is essentially the CompCert compiler, albeit with a stronger formal guarantee: it gives a semantics to more programs and ensures that the memory consumption is preserved by the compiler.
Abstract: The CompCert C compiler provides the formal guarantee that the observable behaviour of the compiled code improves on the observable behaviour of the source code. In this paper, we present a formally verified C compiler, CompCertS, which is essentially the CompCert compiler, albeit with a stronger formal guarantee: it gives a semantics to more programs and ensures that the memory consumption is preserved by the compiler. CompCertS is based on an enhanced memory model where, unlike CompCert but like Gcc, the binary representation of pointers can be manipulated much like integers and where, unlike CompCert, allocation may fail if no memory is available.

10 citations


Cites background or methods from "A Concrete Memory Model for CompCer..."

  • ...Our allocation algorithm [4] entails that for every memory state m, there exists a concrete memory cm that we call the canonical concrete memory of m and write canon cm(m), that is built by allocating all the blocks of m at maximally-aligned, i....


  • ...Our previous works [3, 4] show how to extend the support for pointer arithmetic and adapt most of the front-end of CompCert to this extended semantics with the notable exception of the SimplLocals pass which requires a sophisticated proof argument detailed in the present paper....


  • ...It also summarises the main features and properties of our memory model [3,4]....


  • ...Two symbolic values are in injection (see [4]) if they have the same structure (the same operators are applied) and the values at the leaves of each symbolic value are in injection....


  • ...In previous work [3,4], we extended CompCert’s memory model and gave semantics to pointer operations by replacing the value domain val by a more expressive domain sval of symbolic values....



References

Journal ArticleDOI
TL;DR: This paper reports on the development and formal verification of CompCert, a compiler from Clight (a large subset of the C programming language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness.
Abstract: This paper reports on the development and formal verification (proof of semantic preservation) of CompCert, a compiler from Clight (a large subset of the C programming language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness. Such a verified compiler is useful in the context of critical software and its formal verification: the verification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well.

997 citations


"A Concrete Memory Model for CompCer..." refers background or methods in this paper

  • ...The CompCert C semantics [5] provides the specification for the correctness of the CompCert compiler [17]....


  • ...[9,15,17])....


  • ...The CompCert compiler [17] fills this verification gap: its semantics preservation theorem ensures that when the source program has a defined semantics, program invariants proved at source level still hold for the compiled code....



Journal ArticleDOI
04 Jun 2011
TL;DR: Csmith, a randomized test-case generation tool, is created and spent three years using it to find compiler bugs, and a collection of qualitative and quantitative results about the bugs it found are presented.
Abstract: Compilers should be correct. To improve the quality of C compilers, we created Csmith, a randomized test-case generation tool, and spent three years using it to find compiler bugs. During this period we reported more than 325 previously unknown bugs to compiler developers. Every compiler we tested was found to crash and also to silently generate wrong code when presented with valid input. In this paper we present our compiler-testing tool and the results of our bug-hunting study. Our first contribution is to advance the state of the art in compiler testing. Unlike previous tools, Csmith generates programs that cover a large subset of C while avoiding the undefined and unspecified behaviors that would destroy its ability to automatically find wrong-code bugs. Our second contribution is a collection of qualitative and quantitative results about the bugs we have found in open-source C compilers.

646 citations


"A Concrete Memory Model for CompCer..." refers methods in this paper

  • ...With this respect, the CompCert C semantics successfully run hundreds of random test programs generated by CSmith [23]....



Book ChapterDOI
20 Aug 2009
TL;DR: This paper motivates VCC, describes the verification methodology, the architecture of VCC is described, and the experience using VCC to verify the Microsoft Hyper-V hypervisor is reported on.
Abstract: VCC is an industrial-strength verification environment for low-level concurrent system code written in C. VCC takes a program (annotated with function contracts, state assertions, and type invariants) and attempts to prove the correctness of these annotations. It includes tools for monitoring proof attempts and constructing partial counterexample executions for failed proofs. This paper motivates VCC, describes our verification methodology, describes the architecture of VCC, and reports on our experience using VCC to verify the Microsoft Hyper-V hypervisor.

555 citations


"A Concrete Memory Model for CompCer..." refers methods in this paper

  • ...VCC [7] generates verification conditions using an abstract typed memory model [8] where the memory is a mapping from typed pointers to structured C values....



Journal ArticleDOI
25 Jan 2012
TL;DR: The semantics is shown capable of automatically finding program errors, both statically and at runtime, and it is also used to enumerate nondeterministic behavior.
Abstract: This paper describes an executable formal semantics of C. Being executable, the semantics has been thoroughly tested against the GCC torture test suite and successfully passes 99.2% of 776 test programs. It is the most complete and thoroughly tested formal definition of C to date. The semantics yields an interpreter, debugger, state space search tool, and model checker "for free". The semantics is shown capable of automatically finding program errors, both statically and at runtime. It is also used to enumerate nondeterministic behavior.

196 citations


Additional excerpts

  • ...[9,15,17])....



17 Jul 2011
Abstract: This paper describes an executable formal semantics of C. Being executable, the semantics has been thoroughly tested against the GCC torture test suite and successfully passes 770 of 776 test programs. It is the most complete and thoroughly tested formal definition of C to date. The semantics yields an interpreter, debugger, state space search tool, and model checker “for free”. The semantics is shown capable of automatically finding program errors, both statically and at runtime. It is also used to enumerate nondeterministic behavior.

188 citations


Frequently Asked Questions (2)
Q1. What are the contributions in "A Concrete Memory Model for CompCert"?

This paper presents the proof of an enhanced and more concrete memory model for the CompCert C compiler which assigns a definite meaning to more C programs. The authors prove that the existing memory model is an abstraction of their more concrete model, thus validating formally the soundness of CompCert's abstract semantics of pointers. The authors also show how to adapt the front-end of CompCert, thus demonstrating that it should be feasible to port the whole compiler to their novel memory model.

Q2. What future work do the authors mention?

As future work, the authors shall study how to adapt the back-end of CompCert. Notwithstanding the remaining difficulties, the authors believe that the full CompCert compiler can be ported to their novel memory model. This would improve further the confidence in the generated code.