What is the general scheme for finding weakest-precondition formulas?

When Hoare logic is extended for a language with pointer variables, such as PL2, syntactic substitution is no longer adequate for finding weakest-precondition formulas.

How many lines of TSL are used to specify the PowerPC instruction set?

The specification of the Intel x86 instruction set is about 2700 lines of TSL; the specification of the PowerPC instruction set is about 1200 lines.

What is the standard interpretation of the operators used in the PL2 semantics?

The standard interpretation of the operators used in the PL2 semantics is as follows: BVal_std = BVal; Val_std = Int32; Loc_std = Int32; η ∈ Env_std = Id → Loc_std; ρ ∈ Store_std = Loc_std → Val_std; lookupState_std = λ(η, ρ).

What is the meaning of the functions that operate on Terms, Formulas, and FuncExprs?

ι ∈ LogicalStruct = (Id → Val) × (FuncId → (Val → Val)). The types of the functions that operate on Terms, Formulas, and FuncExprs are as follows:
const : CInt32 → Val
condL : BVal → Val → Val → Val
lookupId : LogicalStruct → Id → Val
binopL : BinOpL → (Val × Val → Val)
relopL : RelOpL → (Val × Val → BVal)
boolopL : BoolOpL → (BVal × BVal → BVal)
lookupFuncId : LogicalStruct → FuncId → (Val → Val)
access : (Val → Val) × Val) →

What is the weakest precondition of the program swap?

For the swap-code fragment shown in Fig. 1(a), repeated substitution and simplification shows that the weakest precondition of the program swap with respect to postcondition x = 2 is y = 2.

(Open Access) Symbolic Analysis via Semantic Reinterpretation (2009) | Junghee Lim

Q: What are the contributions mentioned in the paper "Symbolic analysis via semantic reinterpretation" ?

By “ symbolic program analysis ”, the authors mean logic-based techniques to analyze state changes along individual program paths. In this paper, the authors develop a method to create implementations of these primitives so that they can be made available easily for multiple programming languages—particularly for multiple machine-code instruction sets. In particular, the authors have created a system in which, for the cost of writing just one specification—of the semantics of the programming language of interest, in the form of an interpreter expressed in a functional language—one obtains automaticallygenerated implementations of all three symbolic-analysis functions. The authors show that this can be carried out even for programming languages with pointers, aliasing, dereferencing, and address arithmetic.

Q: How did the authors obtain implementations of the three symbolic-analysis functions?

Using TSL, the authors obtained automatically-generated implementations of all three symbolic-analysis functions from each of the specifications.

Q: Why have the authors delayed the discussion of generating extensions until §8?

For this reason, the authors have chosen to delay the discussion of generating extensions and partial-evaluation machinery until §8, and instead to base the discussion on the simpler principle of semantic reinterpretation.

Symbolic Analysis via Semantic Reinterpretation

Junghee Lim Akash Lal Thomas Reps

University of Wisconsin

{junghee, akash, reps}@cs.wisc.edu

Abstract

In recent years, the use of symbolic analysis in systems for testing and verifying programs has experienced a resurgence. By “symbolic program analysis”, we mean logic-based techniques to analyze state changes along individual program paths. The three basic primitives used in symbolic analysis are functions that perform forward symbolic evaluation, weakest precondition, and symbolic composition by manipulating formulas.

The conventional approach to implementing systems that use symbolic analysis is to write each of the three symbolic-analysis functions by hand for the programming language of interest. In this paper, we develop a method to create implementations of these primitives so that they can be made available easily for multiple programming languages, particularly for multiple machine-code instruction sets. In particular, we have created a system in which, for the cost of writing just one specification (of the semantics of the programming language of interest, in the form of an interpreter expressed in a functional language), one obtains automatically-generated implementations of all three symbolic-analysis functions. We show that this can be carried out even for programming languages with pointers, aliasing, dereferencing, and address arithmetic. The technique has been implemented, and used to automatically generate symbolic-analysis primitives for multiple machine-code instruction sets.

1. Introduction

This paper presents new ways to create implementations of the basic primitives used in certain kinds of verification and testing tools that are based on symbolic program analysis. By “symbolic program analysis”, we mean logic-based techniques to analyze state changes along individual program paths. The basic primitives used in symbolic analysis are functions that perform forward symbolic evaluation, weakest precondition, and symbolic composition by manipulating formulas.

Footnote (on “individual program paths”): This is in contrast to the situation addressed by many abstract-interpretation/dataflow-analysis techniques, which usually consider the problem of analyzing the effects of a collection of program paths, e.g., to identify program invariants.

The conventional approach to implementing systems that use symbolic analysis is to write each of the three symbolic-analysis functions by hand for the programming language of interest (which we call the subject language). Our goal is to develop a method to create implementations of symbolic-analysis primitives easily, so that they can be made available for different subject languages, particularly for different machine-code instruction sets. Such instruction sets typically have (i) several hundred instructions, (ii) a variety of architecture-specific features that are incompatible with other architectures, and (iii) the ability to perform address arithmetic and dereferencing of addresses, which means that memory states can have complicated aliasing patterns. Moreover, most instruction sets have evolved over time, so that each instruction-set family has a bewildering number of variants. Consequently, our goal is to generate implementations of such primitives automatically from a specification of the subject language's concrete semantics.

Semantic reinterpretation. Our approach is based on factoring the concrete semantics of a language into two parts: (i) a client specification, and (ii) a semantic core. The interface to the core consists of certain base types, function types, and operators (sometimes called a semantic algebra [27]), and the client is expressed in terms of this interface. This organization permits the core to be reinterpreted to produce an alternative semantics for the subject language.

Semantic reinterpretation for abstract interpretation. The idea of exploiting such a factoring comes from the field of abstract interpretation [7], where factoring-plus-reinterpretation has been proposed as a convenient tool for formulating abstract interpretations and proving them to be sound [23, 24, 21]. In particular, soundness of the entire abstract semantics can be established via purely local soundness arguments for each of the reinterpreted operators. (An example of semantic reinterpretation for abstract interpretation is presented in §2.)

Semantic reinterpretation for symbolic analysis. This paper presents a new application for semantic reinterpretation, namely, to create implementations of the basic primitives used in symbolic program analysis.

In recent years, the use of symbolic analysis in systems for testing and verifying programs has experienced a resurgence because of the power that it provides in exploring a program's state space.

Footnote (on the term “subject language”): Semantic reinterpretation is a program-generation technique, and thus we follow the terminology of the partial-evaluation literature [16], where the program on which the partial evaluator operates is called the subject program. (§3.2 and §8 discuss the connections between our approach and partial evaluation.)

Footnote (on the term “subject language”): In logic and linguistics, the programming language would be called the “object language”. We avoid that terminology because of possible confusion in §7, which discusses the application of semantic reinterpretation to machine-language programs. In the compiler literature, an object program is a machine-code program produced by a compiler.

Footnote (on instruction-set variants): See http://en.wikipedia.org/wiki/{X86,ARM_architecture,PowerPC}. For instance, the article about ARM lists 18 different architectural versions.

1 2008/7/22

• Model-checking tools, such as SLAM [1] and BLAST [14], as well as hybrid concrete/symbolic program-exploration tools, such as DART [10], CUTE [28], YOGI [13], SAGE [11], BITSCOPE [5], and DASH [2], use forward symbolic evaluation, weakest precondition, or both.

  Symbolic evaluation can be used to create path formulas. When it is possible that a path π being analyzed might not be executable, a call on an SMT solver to determine whether π's path formula is satisfiable can be used to decide whether π is executable, and if so, to generate inputs that drive the program down π. Weakest precondition can be used to create new predicates that split part of a program's state space [1, 13, 2].

• Bug-finding tools, such as ARCHER [32] and SATURN [31], as well as commercial bug-finding products, such as Coverity's PREVENT [8] and GrammaTech's CODESONAR [12], use symbolic composition.

  Formulas are used to summarize a portion of the behavior of a procedure. Suppose that procedure P calls Q at call-site c, and that r is the site in P to which control returns after the call at c. When c is encountered during the exploration of P, such tools perform the symbolic composition of the formula that expresses the behavior along the path [entry_P, ..., c] explored in P with the formula that captures the behavior of Q to obtain a formula that expresses the behavior along the path [entry_P, ..., r].

The aforementioned systems apply symbolic analysis to programs written in languages with pointers, aliasing, dereferencing, and address arithmetic. This paper demonstrates that the reinterpretation technique provides a way to create symbolic-analysis primitives for such languages.

As mentioned earlier, our motivation is to be able to create implementations of symbolic-analysis primitives for multiple machine-code instruction sets (including multiple variants of a given machine-code instruction set). However, our work is also useful for creating tools to analyze high-level-language programs, starting from source code. Moreover, most of the principles that we make use of can be explained using two variants of a simple high-level language: PL1, defined in §4, and PL2, defined in §6. For this reason, the paper is couched in terms of high-level languages up until §7, which discusses an idealized machine-code language, MC. This has the benefit of making the paper accessible to a wider range of readers, but might cause readers who are mainly familiar with analysis techniques for C, C++, C#, or Java to under-appreciate the benefits that one obtains from our approach when creating machine-code-analysis tools.

Three for the price of one! In §8, we describe how, using binding-time analysis [16] and a two-level intermediate language [25], the reinterpretation technique can be used to generate implementations of symbolic-analysis primitives automatically, using a meta-system that generates program-analysis components from a specification of the subject language's semantics. In particular, we have created a system in which, for the cost of writing just one specification (of the semantics of the programming language of interest, in the form of an interpreter expressed in a functional language), one obtains automatically-generated implementations of all three symbolic-analysis functions. We show that this can be carried out even for programming languages with pointers, aliasing, dereferencing, and address arithmetic.

This has been achieved using the TSL system [20], and the implementation has been used to generate symbolic-analysis primitives for multiple machine-code instruction sets. (TSL stands for “Transformer Specification Language”.) TSL consists of (i) a language for specifying the concrete semantics of a machine-code instruction set (i.e., a collection of concrete-state transformers), (ii) a mechanism to create implementations of different abstract interpretations easily by reinterpreting the TSL base types, function types, and operators, and (iii) a run-time system to support the (re-)interpretation and analysis of executables written in that instruction set.

Moreover, with TSL each reinterpretation is defined at the meta-level, by reinterpreting the collection of TSL base types, function types, and operators. When a reinterpretation is performed in this way, it is independent of any given subject language. Consequently, with our implementation, all three of the symbolic-analysis primitives can be generated automatically for every instruction set for which one has a TSL specification.

The contributions of the paper can be summarized as follows:

• From the conceptual standpoint, we present a new application for semantic reinterpretation. In particular, the paper shows how semantic reinterpretation can be applied to create analysis functions that compute formulas for forward symbolic evaluation, weakest precondition, and symbolic composition (§5.1, §5.2, and §5.3, respectively).

• From the systems-building perspective, we show that this observation has algorithmic content: the paper describes how we created a meta-system that, given an interpreter that specifies a subject language's concrete semantics, uses binding-time analysis, a two-level intermediate language, and semantic reinterpretation to automatically generate implementations of all three symbolic-analysis primitives, for every instruction set for which one has a TSL specification (§8).

• We demonstrate that semantic reinterpretation can handle languages with pointers, aliasing, dereferencing, and address arithmetic (§3, §6, and §7). In particular, in §3 and §6.4.1, we show how reinterpretation can automatically generate a weakest-precondition primitive that implements Morris's rule of substitution for a language with pointer variables [22].

• §6.4.2 shows how the semantic-reinterpretation approach can also generate a weakest-precondition primitive that implements the pure substitution-based approach of Cartwright and Oppen [6] (again for a language with pointer variables). This provides insight on how Morris's rule and Cartwright and Oppen's rule are related: both are based on substitution; the difference is merely the degree of algebraic simplification that is performed.

Organization. §2 presents the basic principles of semantic reinterpretation by means of an example in which reinterpretation is used to create abstract transformers for abstract interpretation. §3 provides an overview of our techniques and the results obtained with the symbolic-analysis primitives that are created by semantic reinterpretation. §4 defines the logic that we use, as well as the programming language PL1. §5 discusses how to use reinterpretation to obtain the primitives for forward symbolic evaluation, weakest precondition, and symbolic composition. §6 defines PL2, which includes pointer variables and dereferencing, and shows how the weakest-precondition operation that is obtained automatically via semantic reinterpretation implements Morris's rule of substitution. §7 introduces a simplified machine-code language, which includes address arithmetic and dereferencing, and shows that the reinterpretation technique applies at the machine-code level, as well. §8 describes how these ideas are implemented using the TSL system [20]. §9 discusses related work. (Proofs of two lemmas appear in App. A.)


(a)                      (b)
s1: x = x ⊕ y;           s1: ∗px = ∗px ⊕ ∗py;
s2: y = x ⊕ y;           s2: ∗py = ∗px ⊕ ∗py;
s3: x = x ⊕ y;           s3: ∗px = ∗px ⊕ ∗py;

Figure 1. (a) Code fragment that swaps two ints; (b) (buggy) code fragment that swaps two ints using pointers.

2. Semantic Reinterpretation for Abstract Interpretation

To illustrate factoring-plus-reinterpretation in the context of abstract interpretation, and as a warm-up exercise for the rest of the paper, this section presents the basic principle of semantic reinterpretation using a simple example in which both the concrete semantics for a language of assignment statements and an abstract sign-analysis semantics are defined via semantic reinterpretation.

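Example 2.1. [Adapted from [21].] Consider the following fragment of a denotational semantics, which defines the meaning of assignment statements over variables that hold signed 32-bit int values (where ⊕ denotes exclusive-or):

$$
\begin{aligned}
& I \in \mathit{Id} \qquad E \in \mathit{Expr} ::= I \mid E_1 \oplus E_2 \mid \ldots \\
& S \in \mathit{Stmt} ::= I = E; \qquad \sigma \in \mathit{State} = \mathit{Id} \to \mathit{Int32} \\
& \mathcal{E} : \mathit{Expr} \to \mathit{State} \to \mathit{Int32} \\
& \mathcal{E}[\![I]\!]\sigma = \sigma I \qquad \mathcal{E}[\![E_1 \oplus E_2]\!]\sigma = \mathcal{E}[\![E_1]\!]\sigma \oplus \mathcal{E}[\![E_2]\!]\sigma \\
& \mathcal{I} : \mathit{Stmt} \to \mathit{State} \to \mathit{State} \\
& \mathcal{I}[\![I = E;]\!]\sigma = \sigma[I \mapsto \mathcal{E}[\![E]\!]\sigma]
\end{aligned}
$$

This specification can be factored into client and core specifications by introducing a domain Val, as well as operators xor, lookup, and store. The client specification is defined by

$$
\begin{aligned}
& \mathit{xor} : \mathit{Val} \to \mathit{Val} \to \mathit{Val} \\
& \mathit{lookup} : \mathit{State} \to \mathit{Id} \to \mathit{Val} \\
& \mathit{store} : \mathit{State} \to \mathit{Id} \to \mathit{Val} \to \mathit{State} \\
& \mathcal{E} : \mathit{Expr} \to \mathit{State} \to \mathit{Val} \\
& \mathcal{E}[\![I]\!]\sigma = \mathit{lookup}\ \sigma\ I \qquad \mathcal{E}[\![E_1 \oplus E_2]\!]\sigma = \mathcal{E}[\![E_1]\!]\sigma\ \mathit{xor}\ \mathcal{E}[\![E_2]\!]\sigma \\
& \mathcal{I} : \mathit{Stmt} \to \mathit{State} \to \mathit{State} \\
& \mathcal{I}[\![I = E;]\!]\sigma = \mathit{store}\ \sigma\ I\ \mathcal{E}[\![E]\!]\sigma
\end{aligned}
$$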

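For the concrete (or “standard”) semantics, the semantic core is defined by

$$
\begin{aligned}
& v \in \mathit{Val}_{std} = \mathit{Int32} \\
& \mathit{State}_{std} = \mathit{Id} \to \mathit{Val}_{std} \\
& \mathit{xor}_{std} = \lambda v_1.\lambda v_2.\, v_1 \oplus v_2 \\
& \mathit{lookup}_{std} = \lambda\sigma.\lambda I.\, \sigma I \\
& \mathit{store}_{std} = \lambda\sigma.\lambda I.\lambda v.\, \sigma[I \mapsto v]
\end{aligned}
$$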

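Different abstract interpretations can be defined by using the same client semantics, but giving a different interpretation of the base types, function types, and operators of the core. For example, for sign analysis, the semantic core is reinterpreted as follows:

$$
\begin{aligned}
& v \in \mathit{Val}_{abs} = \{neg, zero, pos\}^{\top} \\
& \mathit{State}_{abs} = \mathit{Id} \to \mathit{Val}_{abs} \\
& \mathit{xor}_{abs} = \lambda v_1.\lambda v_2.\;
\begin{array}{c|cccc}
v_1 \backslash v_2 & neg & zero & pos & \top \\ \hline
neg  & \top & neg  & neg  & \top \\
zero & neg  & zero & pos  & \top \\
pos  & neg  & pos  & \top & \top \\
\top & \top & \top & \top & \top
\end{array} \\
& \mathit{lookup}_{abs} = \lambda\sigma.\lambda I.\, \sigma I \\
& \mathit{store}_{abs} = \lambda\sigma.\lambda I.\lambda v.\, \sigma[I \mapsto v]
\end{aligned}
$$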

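For instance, for the code fragment shown in Fig. 1(a), which swaps two ints, sign-analysis reinterpretation creates abstract transformers that, given the initial abstract state $\sigma_0 = \{x \mapsto neg, y \mapsto pos\}$, produce the following abstract states:

$$
\begin{aligned}
\sigma_0 &:= \{x \mapsto neg,\ y \mapsto pos\} \\
\sigma_1 &:= \mathcal{I}[\![s_1 : x = x \oplus y;]\!]\sigma_0 = \mathit{store}_{abs}\ \sigma_0\ x\ (neg\ \mathit{xor}_{abs}\ pos) = \{x \mapsto neg,\ y \mapsto pos\} \\
\sigma_2 &:= \mathcal{I}[\![s_2 : y = x \oplus y;]\!]\sigma_1 = \mathit{store}_{abs}\ \sigma_1\ y\ (neg\ \mathit{xor}_{abs}\ pos) = \{x \mapsto neg,\ y \mapsto neg\} \\
\sigma_3 &:= \mathcal{I}[\![s_3 : x = x \oplus y;]\!]\sigma_2 = \mathit{store}_{abs}\ \sigma_2\ x\ (neg\ \mathit{xor}_{abs}\ neg) = \{x \mapsto \top,\ y \mapsto neg\}
\end{aligned}
$$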
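The factoring of Example 2.1 can be illustrated with a minimal Python sketch. It is not the paper's TSL implementation: the names `Core`, `eval_expr`, `exec_stmt`, `STD`, `ABS`, and `run`, and the tuple encoding of expressions and statements, are assumptions made for illustration. The client code is written once against the core interface (xor, lookup, store); swapping the core switches between the standard semantics and sign analysis.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Core:
    """The semantic core: one interpretation of xor, lookup, and store."""
    xor: Callable
    lookup: Callable
    store: Callable

# Client specification: written once, in terms of the core interface only.
# An expression is an identifier (str) or a tuple ("xor", e1, e2).
def eval_expr(core, e, state):
    if isinstance(e, str):
        return core.lookup(state, e)
    _, e1, e2 = e
    return core.xor(eval_expr(core, e1, state), eval_expr(core, e2, state))

# A statement is a pair (identifier, expression), standing for "I = E;".
def exec_stmt(core, stmt, state):
    ident, rhs = stmt
    return core.store(state, ident, eval_expr(core, rhs, state))

def to_int32(v):
    """Wrap a Python int into the signed 32-bit range."""
    v &= 0xFFFFFFFF
    return v - (1 << 32) if v & 0x80000000 else v

# Standard core: Val = Int32, State = Id -> Val.
STD = Core(
    xor=lambda a, b: to_int32(a ^ b),
    lookup=lambda s, i: s[i],
    store=lambda s, i, v: {**s, i: v},
)

# Sign-analysis core: Val = {neg, zero, pos} plus a top element.
TOP = "top"
XOR_ABS = {
    ("neg", "neg"): TOP,    ("neg", "zero"): "neg",   ("neg", "pos"): "neg",
    ("zero", "neg"): "neg", ("zero", "zero"): "zero", ("zero", "pos"): "pos",
    ("pos", "neg"): "neg",  ("pos", "zero"): "pos",   ("pos", "pos"): TOP,
}
ABS = Core(
    xor=lambda a, b: XOR_ABS.get((a, b), TOP),  # any top operand gives top
    lookup=lambda s, i: s[i],
    store=lambda s, i, v: {**s, i: v},
)

# The swap fragment of Fig. 1(a).
SWAP = [("x", ("xor", "x", "y")),
        ("y", ("xor", "x", "y")),
        ("x", ("xor", "x", "y"))]

def run(core, stmts, state):
    for s in stmts:
        state = exec_stmt(core, s, state)
    return state

concrete = run(STD, SWAP, {"x": -3, "y": 7})
abstract = run(ABS, SWAP, {"x": "neg", "y": "pos"})
```

Under the standard core the values of x and y are exchanged; under the sign core the run passes through the same abstract states as the derivation above, ending with x mapped to the top element and y mapped to neg.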

3. Overview

This section presents intuition about some of the elements that are used in our work, and provides an overview of how it is possible to automatically generate the three symbolic-analysis primitives. §3.1 defines a stripped-down version of a logic L that is sufficient for the discussion in this section. (The full logic is defined in §4.1.) §3.2 presents examples of semantic reinterpretation applied to forward symbolic evaluation; §3.3 discusses issues relevant to weakest precondition and symbolic composition. We use the two swap-code fragments shown in Fig. 1 as a running example.

Because tools that check path feasibility (à la SLAM [1]) or perform path exploration (à la DART [10], CUTE [28], SAGE [11], and DASH [2]) only analyze traces, we can concentrate on non-branching statement sequences. For this reason, our programming-language definitions contain only assignment statements and statement sequences, and do not have either if-then-else statements or loop constructs.

3.1 A Simple Logic

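The syntax of L is defined as follows:

$$
\begin{aligned}
& I \in \mathit{Id},\quad T \in \mathit{Term},\quad \varphi \in \mathit{Formula} \\
& F \in \mathit{FuncId},\quad FE \in \mathit{FuncExpr},\quad U \in \mathit{FOUpdate} \\
& T ::= I \mid T_1 \mathbin{\boxed{\oplus}} T_2 \mid FE(T) \\
& \varphi ::= T_1 = T_2 \mid \varphi_1\ \&\&\ \varphi_2 \mid \ldots \\
& FE ::= F \mid FE_1[T_1 \mapsto T_2] \\
& U ::= (\{I_i \hookleftarrow T_i\}, \{F_j \hookleftarrow FE_j\})
\end{aligned}
$$

Names of the form F ∈ FuncId, possibly with subscripts and/or primes, are function symbols. We distinguish the xor constructor of L from the programming-language xor (§2) by putting the former in a box. A FuncExpr of the form $FE_1[T_1 \mapsto T_2]$ denotes a function-update expression.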

An expression of the form $(\{I_i \hookleftarrow T_i\}, \{F_j \hookleftarrow FE_j\})$ is called a structure-update expression. The subscripts i and j implicitly range over certain index sets, which will be omitted to reduce clutter. To emphasize that $I_i$ and $F_j$ refer to next-state quantities, we sometimes write structure-update expressions with primes: $(\{I'_i \hookleftarrow T_i\}, \{F'_j \hookleftarrow FE_j\})$. (Also, if a component has only a singleton set, we omit the set brackets.) $\{I_i \hookleftarrow T_i\}$ specifies the updates to the constants and $\{F_j \hookleftarrow FE_j\}$ specifies the updates to the functions. Thus, a structure-update expression $(\{I_i \hookleftarrow T_i\}, \{F_j \hookleftarrow FE_j\})$ can be thought of as a kind of restricted 2-vocabulary (i.e., 2-state) formula

$$
\bigwedge_i (I'_i = T_i) \;\wedge\; \bigwedge_j (F'_j = FE_j).
$$

We define $U_{id}$ to be $(\{I \hookleftarrow I \mid I \in \mathit{Id}\}, \{F \hookleftarrow F \mid F \in \mathit{FuncId}\})$.

Footnote (on $\mathit{xor}_{abs}$, §2): For numbers represented in two's complement notation, $pos\ \mathit{xor}_{abs}\ neg = neg\ \mathit{xor}_{abs}\ pos = neg$ because, for all combinations of values represented by pos and neg, the sign bit of the result is set, which means that the result is guaranteed to be negative. However, $pos\ \mathit{xor}_{abs}\ pos = neg\ \mathit{xor}_{abs}\ neg = \top$ because the concrete result could be either 0 or positive, and $zero \sqcup pos = \top$.

Example 3.1. In §5, we work with a simple high-level language, PL1, that only has int-valued variables. (PL1 is the language from §2, extended with some additional kinds of expressions.) In §6, we introduce PL2, which extends PL1 with pointers. Here we confine ourselves to sketching how the semantics of various kinds of assignment statements can be expressed in L[PL1] and L[PL2].

• In PL1, a state σ ∈ State is a map Id → Int32. This is modeled in L[PL1] by using a constant $c_x$ ∈ Id for each PL1 identifier x. (However, to reduce clutter, we will merely use x for such constants instead of $c_x$.)

• In PL2, a state σ is a pair (η, ρ), where environment η ∈ Env = Id → Loc maps identifiers to their associated locations and store ρ ∈ Store = Loc → Int32 maps each location to the value that it holds. (Loc stands for locations, e.g., memory addresses, and we identify Loc with the set Int32 of values.) This is modeled in L[PL2] by using a function symbol $F_\rho$ for store ρ, and a constant symbol $c_x$ ∈ Id for each PL2 identifier x. (Again, to reduce clutter, we will use x for such constants instead of $c_x$.) The constants and their values correspond to the environment η.

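The following table illustrates how the semantics of a few assignment statements are expressed as L[PL1] and L[PL2] structure-update expressions:

$$
\begin{array}{lll}
\text{Logic} & \text{Statement} & \text{Structure-update expression} \\ \hline
L[PL_1] & x = 17;  & (x' \hookleftarrow 17, \emptyset) \\
        & x = y;   & (x' \hookleftarrow y, \emptyset) \\ \hline
L[PL_2] & x = 17;  & (\emptyset, F'_\rho \hookleftarrow F_\rho[x \mapsto 17]) \\
        & x = y;   & (\emptyset, F'_\rho \hookleftarrow F_\rho[x \mapsto F_\rho(y)]) \\
        & x = {*q}; & (\emptyset, F'_\rho \hookleftarrow F_\rho[x \mapsto F_\rho(F_\rho(q))])
\end{array}
$$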

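The semantics of L is defined in terms of a logical structure, which gives meaning to the Id and FuncId symbols of the logic's vocabulary.

$$
\iota \in \mathit{LogicalStruct} = (\mathit{Id} \to \mathit{Int32}) \times (\mathit{FuncId} \to (\mathit{Int32} \to \mathit{Int32}))
$$

We use (ι↑1) and (ι↑2) to denote the first and second components of ι, respectively. (ι↑1) assigns meanings to constant symbols; (ι↑2) assigns meanings to function symbols.

$$
\begin{aligned}
& \mathcal{T} : \mathit{Term} \to \mathit{LogicalStruct} \to \mathit{Int32} \\
& \mathcal{T}[\![I]\!]\iota = (\iota{\uparrow}1)\ I \\
& \mathcal{T}[\![T_1 \mathbin{\boxed{\oplus}} T_2]\!]\iota = \mathcal{T}[\![T_1]\!]\iota \oplus \mathcal{T}[\![T_2]\!]\iota \\
& \mathcal{T}[\![FE(T_1)]\!]\iota = (\mathcal{FE}[\![FE]\!]\iota)(\mathcal{T}[\![T_1]\!]\iota) \\[4pt]
& \mathcal{F} : \mathit{Formula} \to \mathit{LogicalStruct} \to \mathit{Bool} \\
& \mathcal{F}[\![T_1 = T_2]\!]\iota = \big(\mathcal{T}[\![T_1]\!]\iota = \mathcal{T}[\![T_2]\!]\iota\big) \\
& \mathcal{F}[\![\varphi_1\ \&\&\ \varphi_2]\!]\iota = \mathcal{F}[\![\varphi_1]\!]\iota \wedge \mathcal{F}[\![\varphi_2]\!]\iota \\[4pt]
& \mathcal{FE} : \mathit{FuncExpr} \to \mathit{LogicalStruct} \to (\mathit{Int32} \to \mathit{Int32}) \\
& \mathcal{FE}[\![F]\!]\iota = (\iota{\uparrow}2)\ F \\
& \mathcal{FE}[\![FE_1[T_1 \mapsto T_2]]\!]\iota = (\mathcal{FE}[\![FE_1]\!]\iota)[(\mathcal{T}[\![T_1]\!]\iota) \mapsto (\mathcal{T}[\![T_2]\!]\iota)] \\[4pt]
& \mathcal{U} : \mathit{FOUpdate} \to \mathit{LogicalStruct} \to \mathit{LogicalStruct} \\
& \mathcal{U}[\![(\{I_i \hookleftarrow T_i\}, \{F_j \hookleftarrow FE_j\})]\!]\iota = ((\iota{\uparrow}1)[I_i \mapsto \mathcal{T}[\![T_i]\!]\iota],\ (\iota{\uparrow}2)[F_j \mapsto \mathcal{FE}[\![FE_j]\!]\iota])
\end{aligned}
$$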
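The meaning functions for terms, function expressions, and structure updates can be transcribed almost directly. The following Python sketch is one possible reading, under these assumptions: identifiers and function symbols are strings, compound terms are tagged tuples, integer literals are allowed as terms (as in the table of Example 3.1), and a logical structure is a pair of dicts. The names `eval_term`, `eval_fe`, and `eval_update` are hypothetical.

```python
# Term     ::= identifier (str) | integer literal (assumed) |
#              ("xor", T1, T2) | ("app", FE, T)
# FuncExpr ::= function symbol (str) | ("upd", FE1, T1, T2)
# A logical structure iota is a pair (constants, functions), where
# constants maps Id -> int and functions maps FuncId -> Python callable.

def eval_term(t, iota):
    consts, _funcs = iota
    if isinstance(t, int):
        return t
    if isinstance(t, str):
        return consts[t]
    if t[0] == "xor":
        # XOR of two in-range values stays in the signed 32-bit range.
        return eval_term(t[1], iota) ^ eval_term(t[2], iota)
    if t[0] == "app":
        return eval_fe(t[1], iota)(eval_term(t[2], iota))
    raise ValueError(f"unknown term: {t!r}")

def eval_fe(fe, iota):
    _consts, funcs = iota
    if isinstance(fe, str):
        return funcs[fe]
    _tag, fe1, t1, t2 = fe
    base = eval_fe(fe1, iota)
    key, val = eval_term(t1, iota), eval_term(t2, iota)
    # Function update: agree with base everywhere except at key.
    return lambda a: val if a == key else base(a)

def eval_update(u, iota):
    """Map a pre-state logical structure to a post-state one."""
    const_upd, func_upd = u
    consts, funcs = iota
    # Every right-hand side is evaluated in the pre-state iota.
    new_consts = {**consts,
                  **{i: eval_term(t, iota) for i, t in const_upd.items()}}
    new_funcs = {**funcs,
                 **{f: eval_fe(fe, iota) for f, fe in func_upd.items()}}
    return (new_consts, new_funcs)

# PL2 statement "x = *q;" as the update (∅, F_rho ←- F_rho[x ↦ F_rho(F_rho(q))]).
memory = {100: 1, 104: 200, 200: 42}          # illustrative store contents
pre = ({"x": 100, "q": 104}, {"F_rho": lambda a: memory.get(a, 0)})
x_gets_deref_q = ({}, {"F_rho": ("upd", "F_rho", "x",
                                 ("app", "F_rho", ("app", "F_rho", "q")))})
post = eval_update(x_gets_deref_q, pre)
```

In this run the location of x (100) holds 42 afterwards, the value found by following q (location 104) to location 200, while the constants that model the environment are unchanged.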

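$$
\begin{aligned}
U_{id} &:= (\{x \hookleftarrow x,\ y \hookleftarrow y\}, \emptyset) \\
\overline{\mathcal{I}}[\![x = x \oplus y;]\!]U_{id}
  &= (\{x \hookleftarrow (\overline{\mathcal{E}}[\![x]\!]U_{id} \mathbin{\boxed{\oplus}} \overline{\mathcal{E}}[\![y]\!]U_{id}),\ y \hookleftarrow y\}, \emptyset) \\
  &= (\{x \hookleftarrow (x \mathbin{\boxed{\oplus}} y),\ y \hookleftarrow y\}, \emptyset) = U_1 \\
\overline{\mathcal{I}}[\![y = x \oplus y;]\!]U_1
  &= (\{x \hookleftarrow (x \mathbin{\boxed{\oplus}} y),\ y \hookleftarrow (\overline{\mathcal{E}}[\![x]\!]U_1 \mathbin{\boxed{\oplus}} \overline{\mathcal{E}}[\![y]\!]U_1)\}, \emptyset) \\
  &= (\{x \hookleftarrow (x \mathbin{\boxed{\oplus}} y),\ y \hookleftarrow ((x \mathbin{\boxed{\oplus}} y) \mathbin{\boxed{\oplus}} y)\}, \emptyset) \\
  &= (\{x \hookleftarrow (x \mathbin{\boxed{\oplus}} y),\ y \hookleftarrow x\}, \emptyset) = U_2 \\
\overline{\mathcal{I}}[\![x = x \oplus y;]\!]U_2
  &= (\{x \hookleftarrow (\overline{\mathcal{E}}[\![x]\!]U_2 \mathbin{\boxed{\oplus}} \overline{\mathcal{E}}[\![y]\!]U_2),\ y \hookleftarrow x\}, \emptyset) \\
  &= (\{x \hookleftarrow ((x \mathbin{\boxed{\oplus}} y) \mathbin{\boxed{\oplus}} x),\ y \hookleftarrow x\}, \emptyset) \\
  &= (\{x \hookleftarrow y,\ y \hookleftarrow x\}, \emptyset) = U_3
\end{aligned}
$$

Figure 2. Symbolic execution of Fig. 1(a) via semantic reinterpretation, starting with the FOUpdate $U_{id} = (\{x \hookleftarrow x, y \hookleftarrow y\}, \emptyset)$.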

Note how the meaning of a structure-update expression is a function that maps a pre-state logical structure ι to a post-state logical structure: $\{I_i \hookleftarrow T_i\}$ specifies the updates to the constants and $\{F_j \hookleftarrow FE_j\}$ specifies the updates to the functions.

3.2 Symbolic Evaluation via Reinterpretation

A primitive for forward symbolic evaluation must solve the following problem:

    Given the semantic definition of a programming language, together with a specific programming-language statement (or instruction) s, create a logical formula that captures the semantics of s.

To apply semantic reinterpretation to this problem, we use formulas of logic L as a reinterpretation domain for the semantic core of PL1. The base types and the state type of the semantic core are reinterpreted as follows (our convention is to mark each reinterpreted base type, function type, and operator with an overbar):

$$
\overline{\mathit{Val}} = \mathit{Term} \qquad \overline{\mathit{BVal}} = \mathit{Formula} \qquad \overline{\mathit{State}} = \mathit{FOUpdate}
$$

The operators used in the factored versions of PL1's meaning functions $\mathcal{E}$, $\mathcal{B}$, and $\mathcal{I}$ are reinterpreted over these domains; in particular, operations that are used in the PL1 semantics (e.g., xor) are interpreted as syntactic constructors of L[PL1] expressions (e.g., $\boxed{\oplus}$). By extension, this produces reinterpreted meaning functions $\overline{\mathcal{E}}$, $\overline{\mathcal{B}}$, and $\overline{\mathcal{I}}$ with the types listed below:

$$
\begin{array}{ll}
\text{Standard} & \text{Reinterpreted} \\ \hline
\mathcal{E} : \mathit{Expr} \to \mathit{State} \to \mathit{Val} & \overline{\mathcal{E}} : \mathit{Expr} \to \mathit{FOUpdate} \to \mathit{Term} \\
\mathcal{B} : \mathit{BoolExpr} \to \mathit{State} \to \mathit{BVal} & \overline{\mathcal{B}} : \mathit{BoolExpr} \to \mathit{FOUpdate} \to \mathit{Formula} \\
\mathcal{I} : \mathit{Stmt} \to \mathit{State} \to \mathit{State} & \overline{\mathcal{I}} : \mathit{Stmt} \to \mathit{FOUpdate} \to \mathit{FOUpdate}
\end{array}
$$

The reinterpreted function $\overline{\mathcal{I}}$ translates a statement s of PL1 to a phrase in logic L[PL1].

Example 3.2. The steps of symbolic execution of Fig. 1(a) via semantic reinterpretation, starting with the FOUpdate $U_{id} = (\{x \hookleftarrow x, y \hookleftarrow y\}, \emptyset)$, are shown in Fig. 2. The final FOUpdate $U_3$ can be considered to be the 2-vocabulary formula $(x' = y) \wedge (y' = x)$. This expresses a state change in which the values of program variables x and y are swapped. □
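The symbolic run of Example 3.2 can be reproduced with a small Python sketch of the reinterpreted core. It makes one simplifying assumption that the paper does not: because the fragment uses only xor, a Term is kept in a normal form, the set of identifiers being xor-ed together, so that the laws a ⊕ a = 0, commutativity, and associativity are applied automatically by set symmetric difference. All function names are hypothetical, and an FOUpdate is modeled as a dict from identifiers to Terms with the (empty) function-update component omitted.

```python
# Reinterpreted core: Val = Term, State = FOUpdate.
# Assumption: a Term is a frozenset of identifiers, read as their xor;
# the empty frozenset stands for the constant 0.

def var(name):
    return frozenset([name])

def xor_sym(t1, t2):
    # Symmetric difference cancels repeated identifiers: (x ⊕ y) ⊕ y = x.
    return t1 ^ t2

def lookup_sym(update, ident):
    return update[ident]

def store_sym(update, ident, term):
    return {**update, ident: term}

# Client code, identical in shape to the concrete interpreter.
def eval_expr_sym(e, update):
    if isinstance(e, str):
        return lookup_sym(update, e)
    _, e1, e2 = e
    return xor_sym(eval_expr_sym(e1, update), eval_expr_sym(e2, update))

def exec_stmt_sym(stmt, update):
    ident, rhs = stmt
    return store_sym(update, ident, eval_expr_sym(rhs, update))

def show(term):
    return " ⊕ ".join(sorted(term)) or "0"

SWAP = [("x", ("xor", "x", "y")),
        ("y", ("xor", "x", "y")),
        ("x", ("xor", "x", "y"))]

U_id = {"x": var("x"), "y": var("y")}
U1 = exec_stmt_sym(SWAP[0], U_id)
U2 = exec_stmt_sym(SWAP[1], U1)
U3 = exec_stmt_sym(SWAP[2], U2)
summary = {ident: show(term) for ident, term in U3.items()}
```

The three intermediate updates correspond to U1, U2, and U3 of Fig. 2 after simplification; `summary` reads as the 2-vocabulary formula in which the next-state x equals y and the next-state y equals x.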

Algebraic simplification of the resulting terms and formulas also plays an important role. The simplification techniques that we use are similar to ones used by others, such as the preprocessing steps used in decision procedures (e.g., the ite-lifting and read-over-write transformations for operations on functions [29, 9, 17]).

We assume that the reinterpreted $\boxed{\oplus}$ performs bit-vector simplification according to the algebraic laws for xor. For example, when y is updated in $U_2$ by $y \hookleftarrow ((x \mathbin{\boxed{\oplus}} y) \mathbin{\boxed{\oplus}} y)$ (see Fig. 2), this is simplified to $y \hookleftarrow x$. We assume that the other bit-vector, relational, and Boolean constructors of the logic behave similarly.

Relationship to partial evaluation. In general, the semantic definition of an imperative programming language is a meaning function $\mathcal{I}$ with type $\mathcal{I} : \mathit{Stmt} \times \mathit{State} \to \mathit{State}$. Given our goal, namely,

    Given the semantic definition of a programming language, $\mathcal{I} : \mathit{Stmt} \times \mathit{State} \to \mathit{State}$, together with a specific programming-language statement (or instruction) s ∈ Stmt, create a logical formula that captures the semantics of s.

it is not surprising that partial-evaluation techniques come into play. In essence, we wish to partially evaluate $\mathcal{I}$ with respect to Stmt s, while at the same time translating to L. Semantic reinterpretation permits us to do this: Let $U_s$ be the FOUpdate $\overline{\mathcal{I}}[\![s]\!]U_{id}$. Then $U_s$ is the partial evaluation of $\mathcal{I}$ with respect to s, translated to logic.

We show in §5.1 that $U_s$ has the desired semantics. Note that to model PL1 programs in L[PL1], we do not require any function symbols. Thus, a PL1 state σ can be identified with the LogicalStruct (σ, ∅). In §5.1, we show that for all ι ∈ LogicalStruct, evaluating $U_s$ is equivalent to running $\mathcal{I}$ on s, i.e., $((\mathcal{U}[\![U_s]\!]\iota){\uparrow}1) = \mathcal{I}[\![s]\!](\iota{\uparrow}1)$ (see Cor. 5.3).

In our implementation, discussed in §8, the TSL system is supplied with a TSL program for the meaning function $\mathcal{I}$, and the way that it performs semantic reinterpretation is to create a kind of generating extension [16] $\mathcal{I}$-gen for $\mathcal{I}$. The full explanation is complicated by the number of language levels involved when the partial-evaluation machinery is included in the discussion. For this reason, we have chosen to delay the discussion of generating extensions and partial-evaluation machinery until §8, and instead to base the discussion on the simpler principle of semantic reinterpretation. This has benefits and drawbacks:

• The benefit is that the explanation is simpler, and could also be useful for direct hand implementation when a meta-system such as TSL is not available.

• The drawback is that in some of the sections before §8 it may appear that many steps perform rather trivial transliteration of expressions from programming language PL1 into expressions of the corresponding logic L[PL1]. In part, this is an artifact of trying to present the method in an easy-to-digest manner; in part, it mimics the behavior of a generating extension: copying (or transliterating) the appropriate residual expression is one of the principles of “writing a generating extension by hand” [3, 18].
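As a rough illustration of what a generating extension is, consider a two-input interpreter `interp(stmt, state)`. A generating extension for it takes only the statement and returns a residual one-input program that, applied to any state, agrees with the interpreter on that statement. The hand-written Python sketch below shows this property for xor assignments only; it involves no binding-time analysis and is not how TSL builds generating extensions. The names `interp` and `interp_gen`, and the statement encoding, are assumptions.

```python
# A statement (ident, (left, right)) stands for "ident = left ⊕ right;".

def interp(stmt, state):
    """Two-input interpreter: takes a statement and a state together."""
    ident, (left, right) = stmt
    return {**state, ident: state[left] ^ state[right]}

def interp_gen(stmt):
    """Hand-written generating extension for interp.

    The statement is decoded once, here; the returned residual
    function depends only on the state.
    """
    ident, (left, right) = stmt

    def residual(state):
        return {**state, ident: state[left] ^ state[right]}

    return residual

s1 = ("x", ("x", "y"))
sigma = {"x": 6, "y": 3}
specialized_s1 = interp_gen(s1)          # residual program for s1
```

For every statement s and state σ, `interp_gen(s)(σ)` equals `interp(s, σ)`; the residual function can be reused on many states without decoding the statement again.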

3.3 Other Symbolic-Analysis Operations

For weakest precondition and symbolic composition, we again use L[·] as a reinterpretation domain; however, there is a trick: in contrast with what is done to generate symbolic-evaluation primitives, we use the FOUpdate type of L[·] to reinterpret the meaning functions $\mathcal{U}$, $\mathcal{FE}$, $\mathcal{F}$, and $\mathcal{T}$ of L[·] itself! The general scheme is outlined in the following table:

$$
\begin{array}{llll}
\text{Meaning function(s)} & \text{Type reinterpreted} & \text{Replacement type} & \text{Function created} \\ \hline
\mathcal{I}, \mathcal{E}, \mathcal{B} & \mathit{State} & \mathit{FOUpdate} & \text{Symbolic evaluation} \\
\mathcal{F}, \mathcal{T} & \mathit{LogicalStruct} & \mathit{FOUpdate} & \text{Weakest precondition} \\
\mathcal{U}, \mathcal{FE}, \mathcal{F}, \mathcal{T} & \mathit{LogicalStruct} & \mathit{FOUpdate} & \text{Symbolic composition}
\end{array}
$$

To keep things simple in §3.2, we did not present the semantics of L[·] in factored form (see §4.1). Thus, the discussion in the rest of this section merely surveys a few of the results that are obtained by the techniques presented in later sections.

Footnote (on identifying states with logical structures, §3.2): Similarly, for PL2 a State σ = (η, ρ) can be identified with the LogicalStruct $(\eta, [F_\rho \mapsto \rho])$.

Footnote (on generating extensions, §3.2): If p is a two-input program, then p-gen is any program with the property that for every input pair a and b, $[\![p\text{-gen}]\!](a) = p_a$, where $[\![p_a]\!](b) = [\![p]\!](a, b)$. Thus, $\mathcal{I}$-gen is a program such that for every statement s and State σ, $[\![\mathcal{I}\text{-gen}]\!](s) = \mathcal{I}_s$, where $[\![\mathcal{I}_s]\!](\sigma) = [\![\mathcal{I}]\!](s, \sigma)$.

Weakest precondition. The weakest (liberal) precondition WLP(s, ϕ) characterizes the set of states σ such that the execution of s starting in σ either fails to terminate or results in a state σ′ such that ϕ(σ′) holds. For a language like PL1, which only has int-valued variables, the WLP of a postcondition (specified by formula ϕ) with respect to an assignment statement var = rhs; can be expressed as the formula obtained by substituting rhs for all (free) occurrences of var in ϕ: ϕ[var ← rhs].

For the swap-code fragment shown in Fig. 1(a), repeated substitution and simplification shows that the weakest precondition of the program swap with respect to postcondition x = 2 is y = 2. (This will be derived using semantic reinterpretation in §5.2.)
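The repeated substitution for the swap fragment can be checked with a short Python sketch. This is plain substitution, not the reinterpretation-based derivation of §5.2, and it rests on illustrative assumptions: a formula is a pair (term, constant) read as "term = constant", and a term is the set of identifiers being xor-ed together, so simplification by the xor laws is automatic. The names `subst`, `wlp_assign`, and `wlp_seq` are hypothetical.

```python
# Term: frozenset of identifiers, read as their xor.
# Formula: (term, constant), read as "term = constant".

def subst(term, ident, rhs):
    """term[ident <- rhs], simplified by the xor laws."""
    if ident not in term:
        return term
    return (term - {ident}) ^ rhs

def wlp_assign(stmt, post):
    """WLP of postcondition post with respect to 'ident = rhs;'."""
    ident, rhs = stmt
    lhs, const = post
    return (subst(lhs, ident, rhs), const)

def wlp_seq(stmts, post):
    """WLP of a statement sequence: substitute from last to first."""
    for stmt in reversed(stmts):
        post = wlp_assign(stmt, post)
    return post

X_XOR_Y = frozenset({"x", "y"})
SWAP = [("x", X_XOR_Y), ("y", X_XOR_Y), ("x", X_XOR_Y)]

post = (frozenset({"x"}), 2)             # x = 2
after_s3 = wlp_assign(SWAP[2], post)     # (x ⊕ y) = 2
after_s2 = wlp_assign(SWAP[1], after_s3) # y = 2, since (x ⊕ (x ⊕ y)) = y
pre = wlp_seq(SWAP, post)                # y = 2
```

Working backwards, the postcondition x = 2 becomes (x ⊕ y) = 2 across the last statement, then y = 2 across the middle statement, and is unchanged by the first statement, which does not assign to y.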

Complications from pointers. When Hoare logic is extended for a language with pointer variables, such as PL2, syntactic substitution is no longer adequate for finding weakest-precondition formulas. For instance, suppose that we are interested in finding a formula for the WLP of postcondition x = 5 with respect to ∗p = e;. This cannot be accomplished merely by performing the substitution (x = 5)[∗p ← e]: the substitution yields the formula x = 5, whereas the WLP depends on the execution context in which ∗p = e; is evaluated:

• If p points to x, then the WLP formula should be e = 5.

• If p does not point to x, then the WLP formula should be x = 5.

In this case, the WLP formula can be expressed informally as (p = &x) ? (e = 5) : (x = 5).

Example 3.3. In §5.2, such formulas are expressed as shown below on the right.

$$
\begin{array}{lll}
 & \text{Informal} & \text{Formal} \\ \hline
\text{Query} & \mathrm{WLP}({*p} = e,\ x = 5) & \mathrm{WLP}({*p} = e,\ F_\rho(x) = 5) \\
\text{Result} & (p = \&x)\ ?\ (e = 5) : (x = 5) & \mathit{ite}(F_\rho(p) = x,\ F_\rho(e) = 5,\ F_\rho(x) = 5)
\end{array}
$$

For a program fragment that involves multiple pointer variables, the WLP formula may have to take into account all possible aliasing combinations. One of the most important features of our approach is its ability to create correct implementations of Morris's rule of substitution [22] automatically, and basically for free.
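The ite formula of Example 3.3 can be sanity-checked against a concrete reading of ∗p = e; on a handful of small states. The Python sketch below is a brute-force check, not a derivation, and it makes several assumptions: e is a plain variable, an environment maps identifiers to integer locations, a store maps locations to values, and the specific locations and values are arbitrary. The function names are hypothetical.

```python
from itertools import product

def exec_store_through_p(env, store):
    """Concrete semantics of '*p = e;' with e a variable (assumption)."""
    new_store = dict(store)
    new_store[store[env["p"]]] = store[env["e"]]
    return new_store

def postcondition(env, store):
    """F_rho(x) = 5, evaluated in the post-state."""
    return store[env["x"]] == 5

def wlp_formula(env, store):
    """ite(F_rho(p) = x, F_rho(e) = 5, F_rho(x) = 5) in the pre-state."""
    F = lambda ident: store[env[ident]]
    return F("e") == 5 if F("p") == env["x"] else F("x") == 5

def naive_substitution(env, store):
    """(x = 5)[*p <- e] leaves the formula as x = 5."""
    return store[env["x"]] == 5

ENV = {"x": 100, "p": 104, "e": 108, "z": 112}   # illustrative locations

def all_states():
    # p points either to x or to an unrelated variable z.
    for x_val, e_val, target in product([5, 7], [5, 6], [100, 112]):
        yield {100: x_val, 104: target, 108: e_val, 112: 0}

wlp_agrees = all(
    wlp_formula(ENV, s) == postcondition(ENV, exec_store_through_p(ENV, s))
    for s in all_states()
)
naive_agrees = all(
    naive_substitution(ENV, s)
    == postcondition(ENV, exec_store_through_p(ENV, s))
    for s in all_states()
)
```

On these states the ite formula holds in the pre-state exactly when the postcondition holds after the assignment, whereas plain substitution gives the wrong answer in aliased states such as p pointing to x with x = 5 and e = 6.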

Symbolic analysis of machine code.

Example 3.4. Fig. 4(a) shows a source-code fragment; Fig. 4(b) shows the corresponding assembly code. To simplify the discussion, the source-level variables are used in the assembly code instead of having operations to access variable locations based on their frame-pointer-relative offsets in the activation record.
