scispace - formally typeset
Search or ask a question
Book

Formal Verification of Machine-Code Programs

14 Feb 2011-
TL;DR: The work presented in this dissertation aims to ease the effort required in proving properties of programs on top of detailed models of machine code, and presents a new approach based on translation which maps functions from logic, via proof, down to multiple carefully modelled commercial machine languages.
Abstract: Formal program verification provides mathematical means of increasing assurance for the correctness of software. Most approaches to program verification are either fully automatic and prove only weak properties, or alternatively are manual and labour intensive to apply; few target realistically modelled machine code. The work presented in this dissertation aims to ease the effort required in proving properties of programs on top of detailed models of machine code. The contributions are novel approaches for both verification of existing programs and methods for automatically constructing correct code. For program verification, this thesis presents a new approach based on translation: the problem of proving properties of programs is reduced, via fully-automatic deduction, to a problem of proving properties of recursive functions. The translation from programs to recursive functions is shown to be implementable in a theorem prover both for simple while-programs as well as real machine code. This verification-after-translation approach has several advantages over established approaches of verification condition generation. In particular, the new approach does not require annotating the program with assertions. More importantly, the proposed approach separates the verification proof from the underlying model so that specific resource names, some instruction orderings and certain control-flow structures become irrelevant. As a result, proof reuse is enabled to a greater extent than in currently used methods. The scalability of this new approach is illustrated through the verification of ARM, x86 and PowerPC implementations of a copying garbage collector. For construction of correct code, this thesis presents a new compiler which maps functions from logic, via proof, down to multiple carefully modelled commercial machine languages. Unlike previously published work on compilation from higher-order logic, this compiler allows input functions to be partially specified and supports a broad range of user-defined extensions. These features enabled the production of formally verified machine-code implementations of a LISP interpreter, as a case study. The automation and proofs have been implemented in the HOL4 theorem prover, using a new machine-code Hoare triple instantiated to detailed specifications of ARM, x86 and PowerPC instruction set architectures. “Making formal methods into normal methods.” — Peter Homeier

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
TL;DR: An in-depth coverage of the comprehensive machine-checked formal verification of seL4, a general-purpose operating system microkernel, and the experience in maintaining this evolving formally verified code base.
Abstract: We present an in-depth coverage of the comprehensive machine-checked formal verification of seL4, a general-purpose operating system microkernel.We discuss the kernel design we used to make its verification tractable. We then describe the functional correctness proof of the kernel's C implementation and we cover further steps that transform this result into a comprehensive formal verification of the kernel: a formally verified IPC fastpath, a proof that the binary code of the kernel correctly implements the C semantics, a proof of correct access-control enforcement, a proof of information-flow noninterference, a sound worst-case execution time analysis of the binary, and an automatic initialiser for user-level systems that connects kernel-level access-control enforcement with reasoning about system behaviour. We summarise these results and show how they integrate to form a coherent overall analysis, backed by machine-checked, end-to-end theorems.The seL4 microkernel is currently not just the only general-purpose operating system kernel that is fully formally verified to this degree. It is also the only example of formal proof of this scale that is kept current as the requirements, design and implementation of the system evolve over almost a decade. We report on our experience in maintaining this evolving formally verified code base.

314 citations


Cites methods from "Formal Verification of Machine-Code..."

  • ...We used Myreen s approach of decompilation into logic [Myreen 2008] to produce a semantic model of the kernel binary after compilation and linking in the HOL4 theorem prover....

    [...]

Proceedings ArticleDOI
08 Jan 2018
TL;DR: This paper extends an existing EVM formalisation in Isabelle/HOL by a sound program logic at the level of bytecode that structure bytecode sequences into blocks of straight-line code and create a program logic to reason about these.
Abstract: Blockchain technology has increasing attention in research and across many industries. The Ethereum blockchain offers smart contracts, which are small programs defined, executed, and recorded as transactions in the blockchain transaction history. These smart contracts run on the Ethereum Virtual Machine (EVM) and can be used to encode agreements, transfer assets, and enforce integrity conditions in relationships between parties. Smart contracts can carry financial value, and are increasingly used for safety-, security-, or mission-critical purposes. Errors in smart contracts have led and will lead to loss or harm. Formal verification can provide the highest level of confidence about the correct behaviour of smart contracts. In this paper we extend an existing EVM formalisation in Isabelle/HOL by a sound program logic at the level of bytecode. We structure bytecode sequences into blocks of straight-line code and create a program logic to reason about these. This abstraction is a step towards control of the cost and complexity of formal verification of EVM smart contracts.

214 citations


Cites methods from "Formal Verification of Machine-Code..."

  • ...In his PhD thesis [15], Myreen introduced a general method of formal verification of machine code with a particular application to ARM....

    [...]

Proceedings ArticleDOI
16 Jun 2013
TL;DR: An approach for proving refinement between the formal semantics of a program on the C source level and its formal semantics on the binary level, thus checking the validity of compilation, including some optimisations, and linking, and extending static properties proved of the source code to the executable is presented.
Abstract: We extend the existing formal verification of the seL4 operating system microkernel from 9500 lines of C source code to the binary level. We handle all functions that were part of the previous verification. Like the original verification, we currently omit the assembly routines and volatile accesses used to control system hardware.More generally, we present an approach for proving refinement between the formal semantics of a program on the C source level and its formal semantics on the binary level, thus checking the validity of compilation, including some optimisations, and linking, and extending static properties proved of the source code to the executable. We make use of recent improvements in SMT solvers to almost fully automate this process.We handle binaries generated by unmodified gcc 4.5.1 at optimisation level 1, and can handle most of seL4 even at optimisation level 2.

143 citations


Cites background or methods from "Formal Verification of Machine-Code..."

  • ...Precise details on their definition be found elsewhere [16]....

    [...]

  • ...This form of decompilation is described in detail elsewhere [16] and has recently been extended [18] to also allow arbitrary use of code pointers, which the original approach struggled to handle....

    [...]

  • ...Myreen [16] has shown that decompilation can be used for verification of sizeable case studies, e....

    [...]

  • ...Informally, read ∗ below as ‘and’, formally this is a separating conjunction from separation logic [25] but here used to separate between any machine code resources [16]....

    [...]

  • ...We can keep this datastructure separate from the rest of memory by proving pre/postconditions where the ‘heap’ memory is separate from the stack using the separating conjunction ∗ from separation logic [16, 25]....

    [...]

Journal ArticleDOI
19 Sep 2011
TL;DR: The generalization of characteristic formulae to an imperative programming language is described, which supports the verification of imperative Caml programs using the Coq proof assistant and has formally verified nontrivial imperative algorithms, as well as CPS functions, higher-order iterators, and programs involving higher- order stores.
Abstract: In previous work, we introduced an approach to program verification based on characteristic formulae. The approach consists of generating a higher-order logic formula from the source code of a program. This characteristic formula is constructed in such a way that it gives a sound and complete description of the semantics of that program. The formula can thus be exploited in an interactive proof assistant to formally verify that the program satisfies a particular specification.This previous work was, however, only concerned with purely-functional programs. In the present paper, we describe the generalization of characteristic formulae to an imperative programming language. In this setting, characteristic formulae involve specifications expressed in the style of Separation Logic. They also integrate the frame rule, which enables local reasoning. We have implemented a tool based on characteristic formulae. This tool, called CFML, supports the verification of imperative Caml programs using the Coq proof assistant. Using CFML, we have formally verified nontrivial imperative algorithms, as well as CPS functions, higher-order iterators, and programs involving higher-order stores.

93 citations


Cites background from "Formal Verification of Machine-Code..."

  • ...Compared with Myreen and Gordon’s work [29, 31], the main difference is that the low-level code is not decompiled automatically but instead decompiled by hand, and that this decompilation phase is then proved correct using semi-automated tactics....

    [...]

Dissertation
01 Jan 2015
TL;DR: Memory trees are a middle ground, and therefore suitable to describe both the low-level and high-level aspects of the C memory as discussed by the authors, and are used in the external interface of the memory model and throughout the operational semantics.
Abstract: values hide internal details of the memory such as permissions, padding and object representations. They are therefore used in the external interface of the memory model and throughout the operational semantics. Memory trees, abstract values and bits with permissions can be converted into each other. These conversions are used to define operations internal to the memory model. However, none of these conversions are bijective because different information is materialized in these three data types: Abstract values Memory trees Bits with permissions Permissions X X Padding always E X Variants of union X X Mathematical values Xvalues Memory trees Bits with permissions Permissions X X Padding always E X Variants of union X X Mathematical values X This table indicates that abstract values and sequences of bits are complementary. Memory trees are a middle ground, and therefore suitable to describe both the lowlevel and high-level aspects of the C memory.

69 citations

References
More filters
Proceedings ArticleDOI
01 Jan 1977
TL;DR: In this paper, the abstract interpretation of programs is used to describe computations in another universe of abstract objects, so that the results of abstract execution give some information on the actual computations.
Abstract: A program denotes computations in some universe of objects. Abstract interpretation of programs consists in using that denotation to describe computations in another universe of abstract objects, so that the results of abstract execution give some information on the actual computations. An intuitive example (which we borrow from Sintzoff [72]) is the rule of signs. The text -1515 * 17 may be understood to denote computations on the abstract universe {(+), (-), (±)} where the semantics of arithmetic operators is defined by the rule of signs. The abstract execution -1515 * 17 → -(+) * (+) → (-) * (+) → (-), proves that -1515 * 17 is a negative number. Abstract interpretation is concerned by a particular underlying structure of the usual universe of computations (the sign, in our example). It gives a summary of some facets of the actual executions of a program. In general this summary is simple to obtain but inaccurate (e.g. -1515 + 17 → -(+) + (+) → (-) + (+) → (±)). Despite its fundamentally incomplete results abstract interpretation allows the programmer or the compiler to answer questions which do not need full knowledge of program executions or which tolerate an imprecise answer, (e.g. partial correctness proofs of programs ignoring the termination problems, type checking, program optimizations which are not carried in the absence of certainty about their feasibility, …).

6,829 citations

Book
01 Jan 1976

4,719 citations

Book ChapterDOI
TL;DR: In this paper, the authors consider the problem of reasoning about whether a strategy will achieve a goal in a deterministic world and present a method to construct a sentence of first-order logic which will be true in all models of certain axioms if and only if a certain strategy can achieve a certain goal.
Abstract: A computer program capable of acting intelligently in the world must have a general representation of the world in terms of which its inputs are interpreted. Designing such a program requires commitments about what knowledge is and how it is obtained. Thus, some of the major traditional problems of philosophy arise in artificial intelligence. More specifically, we want a computer program that decides what to do by inferring in a formal language that a certain strategy will achieve its assigned goal. This requires formalizing concepts of causality, ability, and knowledge. Such formalisms are also considered in philosophical logic. The first part of the paper begins with a philosophical point of view that seems to arise naturally once we take seriously the idea of actually making an intelligent machine. We go on to the notions of metaphysically and epistemo-logically adequate representations of the world and then to an explanation of can, causes, and knows in terms of a representation of the world by a system of interacting automata. A proposed resolution of the problem of freewill in a deterministic universe and of counterfactual conditional sentences is presented. The second part is mainly concerned with formalisms within which it can be proved that a strategy will achieve a goal. Concepts of situation, fluent, future operator, action, strategy, result of a strategy and knowledge are formalized. A method is given of constructing a sentence of first-order logic which will be true in all models of certain axioms if and only if a certain strategy will achieve a certain goal. The formalism of this paper represents an advance over McCarthy (1963) and Green (1969) in that it permits proof of the correctness of strategies that contain loops and strategies that involve the acquisition of knowledge; and it is also somewhat more concise. The third part discusses open problems in extending the formalism of part 2. The fourth part is a review of work in philosophical logic in relation to problems of artificial intelligence and a discussion of previous efforts to program ‘general intelligence’ from the point of view of this paper.

3,588 citations

Book ChapterDOI
29 Mar 2004
TL;DR: This work introduces a temporal logic of calls and returns (CaRet) for specification and algorithmic verification of correctness requirements of structured programs and presents a tableau construction that reduces the model checking problem to the emptiness problem for a Buchi pushdown system.
Abstract: Model checking of linear temporal logic (LTL) specifications with respect to pushdown systems has been shown to be a useful tool for analysis of programs with potentially recursive procedures. LTL, however, can specify only regular properties, and properties such as correctness of procedures with respect to pre and post conditions, that require matching of calls and returns, are not regular. We introduce a temporal logic of calls and returns (CaRet) for specification and algorithmic verification of correctness requirements of structured programs. The formulas of CaRet are interpreted over sequences of propositional valuations tagged with special symbols call and ret. Besides the standard global temporal modalities, CaRet admits the abstract-next operator that allows a path to jump from a call to the matching return. This operator can be used to specify a variety of non-regular properties such as partial and total correctness of program blocks with respect to pre and post conditions. The abstract versions of the other temporal modalities can be used to specify regular properties of local paths within a procedure that skip over calls to other procedures. CaRet also admits the caller modality that jumps to the most recent pending call, and such caller modalities allow specification of a variety of security properties that involve inspection of the call-stack. Even though verifying context-free properties of pushdown systems is undecidable, we show that model checking CaRet formulas against a pushdown model is decidable. We present a tableau construction that reduces our model checking problem to the emptiness problem for a Buchi pushdown system. The complexity of model checking CaRet formulas is the same as that of checking LTL formulas, namely, polynomial in the model and singly exponential in the size of the specification.

3,516 citations