
Symbolic Analysis via Semantic Reinterpretation

Abstract
The paper presents a novel technique to create implementations of the basic primitives used in symbolic program analysis: forward symbolic evaluation, weakest liberal precondition, and symbolic composition. We used the technique to create a system in which, for the cost of writing just one specification--an interpreter for the programming language of interest--one obtains automatically-generated, mutually-consistent implementations of all three symbolic-analysis primitives. This can be carried out even for languages with pointers and address arithmetic. Our implementation has been used to generate symbolic-analysis primitives for x86 and PowerPC.


Symbolic Analysis via Semantic Reinterpretation
Junghee Lim Akash Lal Thomas Reps
University of Wisconsin
{junghee, akash, reps}@cs.wisc.edu
Abstract
In recent years, the use of symbolic analysis in systems for testing
and verifying programs has experienced a resurgence. By “sym-
bolic program analysis”, we mean logic-based techniques to ana-
lyze state changes along individual program paths. The three ba-
sic primitives used in symbolic analysis are functions that perform
forward symbolic evaluation, weakest precondition, and symbolic
composition by manipulating formulas.
The conventional approach to implementing systems that use
symbolic analysis is to write each of the three symbolic-analysis
functions by hand for the programming language of interest. In
this paper, we develop a method to create implementations of these
primitives so that they can be made available easily for multiple
programming languages—particularly for multiple machine-code
instruction sets. In particular, we have created a system in which,
for the cost of writing just one specification—of the semantics of
the programming language of interest, in the form of an interpreter
expressed in a functional language—one obtains automatically-
generated implementations of all three symbolic-analysis func-
tions. We show that this can be carried out even for programming
languages with pointers, aliasing, dereferencing, and address arith-
metic. The technique has been implemented, and used to automat-
ically generate symbolic-analysis primitives for multiple machine-
code instruction sets.
1. Introduction
This paper presents new ways to create implementations of the ba-
sic primitives used in certain kinds of verification and testing tools
that are based on symbolic program analysis. By “symbolic pro-
gram analysis”, we mean logic-based techniques to analyze state
changes along individual program paths.¹ The basic primitives used
in symbolic analysis are functions that perform forward symbolic
evaluation, weakest precondition, and symbolic composition by
manipulating formulas.
The conventional approach to implementing systems that use
symbolic analysis is to write each of the three symbolic-analysis
functions by hand for the programming language of interest (which
we call the subject language).² Our goal is to develop a method to
create implementations of symbolic-analysis primitives easily, so
that they can be made available for different subject languages—
particularly for different machine-code instruction sets. Such in-
struction sets typically have (i) several hundred instructions, (ii) a
variety of architecture-specific features that are incompatible with
other architectures, and (iii) the ability to perform address arith-
metic and dereferencing of addresses, which means that memory
states can have complicated aliasing patterns. Moreover, most in-
struction sets have evolved over time, so that each instruction-set
family has a bewildering number of variants.³ Consequently, our
goal is to generate implementations of such primitives automat-
ically from a specification of the subject language’s concrete se-
mantics.
Semantic reinterpretation. Our approach is based on factoring
the concrete semantics of a language into two parts: (i) a client
specification, and (ii) a semantic core. The interface to the core
consists of certain base types, function types, and operators (some-
times called a semantic algebra [27]), and the client is expressed
in terms of this interface. This organization permits the core to be
reinterpreted to produce an alternative semantics for the subject
language.
Semantic reinterpretation for abstract interpretation. The idea
of exploiting such a factoring comes from the field of abstract in-
terpretation [7], where factoring-plus-reinterpretation has been pro-
posed as a convenient tool for formulating abstract interpretations
and proving them to be sound [23, 24, 21]. In particular, soundness
of the entire abstract semantics can be established via purely local
soundness arguments for each of the reinterpreted operators. (An
example of semantic reinterpretation for abstract interpretation is
presented in §2.)
Semantic reinterpretation for symbolic analysis. This paper
presents a new application for semantic reinterpretation, namely,
to create implementations of the basic primitives used in symbolic
program analysis.
In recent years, the use of symbolic analysis in systems for test-
ing and verifying programs has experienced a resurgence because
of the power that such analyses provide in exploring a program's state space.
¹ This is in contrast to the situation addressed by many abstract-
interpretation/dataflow-analysis techniques, which usually consider the
problem of analyzing the effects of a collection of program paths—e.g.,
to identify program invariants.
² Semantic reinterpretation is a program-generation technique, and thus
we follow the terminology of the partial-evaluation literature [16], where
the program on which the partial evaluator operates is called the subject
program. (§3.2 and §8 discuss the connections between our approach and
partial evaluation.) In logic and linguistics, the programming language
would be called the "object language". We avoid that terminology because
of possible confusion in §7, which discusses the application of semantic
reinterpretation to machine-language programs. In the compiler literature,
an object program is a machine-code program produced by a compiler.
³ See http://en.wikipedia.org/wiki/{X86,ARM architecture,PowerPC}. For
instance, the article about ARM lists 18 different architectural versions.

• Model-checking tools, such as SLAM [1] and BLAST [14], as
well as hybrid concrete/symbolic program-exploration tools,
such as DART [10], CUTE [28], YOGI [13], SAGE [11],
BITSCOPE [5], and DASH [2], use forward symbolic evalua-
tion, weakest precondition, or both.
Symbolic evaluation can be used to create path formulas.
When it is possible that a path π being analyzed might not
be executable, a call on an SMT solver to determine whether
π's path formula is satisfiable can be used to decide whether
π is executable, and if so, to generate inputs that drive the
program down π. Weakest precondition can be used to create
new predicates that split part of a program's state space [1, 13,
2].
• Bug-finding tools, such as ARCHER [32] and SATURN [31],
as well as commercial bug-finding products, such as Coverity's
PREVENT [8] and GrammaTech's CODESONAR [12], use sym-
bolic composition.
Formulas are used to summarize a portion of the behavior of a
procedure. Suppose that procedure P calls Q at call-site c, and
that r is the site in P to which control returns after the call at c.
When c is encountered during the exploration of P, such tools
perform the symbolic composition of the formula that expresses
the behavior along the path [entry_P, . . ., c] explored in P with
the formula that captures the behavior of Q to obtain a formula
that expresses the behavior along the path [entry_P, . . ., r].
The aforementioned systems apply symbolic analysis to programs
written in languages with pointers, aliasing, dereferencing, and ad-
dress arithmetic. This paper demonstrates that the reinterpretation
technique provides a way to create symbolic-analysis primitives for
such languages.
As mentioned earlier, our motivation is to be able to cre-
ate implementations of symbolic-analysis primitives for multiple
machine-code instruction sets (including multiple variants of a
given machine-code instruction set). However, our work is also
useful for creating tools to analyze high-level-language programs,
starting from source code. Moreover, most of the principles that
we make use of can be explained using two variants of a simple
high-level language: PL1, defined in §4, and PL2, defined in §6.
For this reason, the paper is couched in terms of high-level lan-
guages up until §7, which discusses an idealized machine-code
language, MC. This has the benefit of making the paper accessible
to a wider audience, but might cause readers who are
mainly familiar with analysis techniques for C, C++, C#, or Java
to under-appreciate the benefits that one obtains from our approach
when creating machine-code-analysis tools.
Three for the price of one! In §8, we describe how, using
binding-time analysis [16] and a two-level intermediate language
[25], the reinterpretation technique can be used to generate im-
plementations of symbolic-analysis primitives automatically, using
a meta-system that generates program-analysis components from
a specification of the subject language’s semantics. In particular,
we have created a system in which, for the cost of writing just
one specification—of the semantics of the programming language
of interest, in the form of an interpreter expressed in a functional
language—one obtains automatically-generated implementations
of all three symbolic-analysis functions. We show that this can be
carried out even for programming languages with pointers, alias-
ing, dereferencing, and address arithmetic.
This has been achieved using the TSL system [20], and the im-
plementation has been used to generate symbolic-analysis primi-
tives for multiple machine-code instruction sets. TSL⁴ consists of
(i) a language for specifying the concrete semantics of a machine-
code instruction set (i.e., a collection of concrete-state transform-
ers), (ii) a mechanism to create implementations of different ab-
stract interpretations easily by reinterpreting the TSL base types,
function types, and operators, and (iii) a run-time system to sup-
port the (re-)interpretation and analysis of executables written in
that instruction set.

⁴ TSL stands for "Transformer Specification Language".
Moreover, with TSL each reinterpretation is defined at the meta-
level, by reinterpreting the collection of TSL base types, function
types, and operators. When a reinterpretation is performed in this
way, it is independent of any given subject language. Consequently,
with our implementation, all three of the symbolic-analysis prim-
itives can be generated automatically for every instruction set for
which one has a TSL specification.
The contributions of the paper can be summarized as follows:
• From the conceptual standpoint, we present a new application
for semantic reinterpretation. In particular, the paper shows how
semantic reinterpretation can be applied to create analysis func-
tions that compute formulas for forward symbolic evaluation,
weakest precondition, and symbolic composition (§5.1, §5.2,
and §5.3, respectively).
• From the systems-building perspective, we show that this ob-
servation has algorithmic content: the paper describes how we
created a meta-system that, given an interpreter that specifies a
subject language's concrete semantics, uses binding-time analy-
sis, a two-level intermediate language, and semantic reinterpre-
tation to automatically generate implementations of all three
symbolic-analysis primitives, for every instruction set for which
one has a TSL specification (§8).
• We demonstrate that semantic reinterpretation can handle lan-
guages with pointers, aliasing, dereferencing, and address arith-
metic (§3, §6, and §7). In particular, in §3 and §6.4.1, we show
how reinterpretation can automatically generate a weakest-
precondition primitive that implements Morris's rule of sub-
stitution for a language with pointer variables [22].
• §6.4.2 shows how the semantic-reinterpretation approach can
also generate a weakest-precondition primitive that implements
the pure substitution-based approach of Cartwright and Oppen
[6] (again for a language with pointer variables). This provides
insight on how Morris's rule and Cartwright and Oppen's rule
are related: both are based on substitution; the difference is
merely the degree of algebraic simplification that is performed.
Organization. §2 presents the basic principles of semantic rein-
terpretation by means of an example in which reinterpretation is
used to create abstract transformers for abstract interpretation. §3
provides an overview of our techniques and the results obtained
with the symbolic-analysis primitives that are created by seman-
tic reinterpretation. §4 defines the logic that we use, as well as the
programming language PL1. §5 discusses how to use reinterpreta-
tion to obtain the primitives for forward symbolic evaluation, weak-
est precondition, and symbolic composition. §6 defines PL2, which
includes pointer variables and dereferencing, and shows how the
weakest-precondition operation that is obtained automatically via
semantic reinterpretation implements Morris’s rule of substitution.
§7 introduces a simplified machine-code language, which includes
address arithmetic and dereferencing, and shows that the reinter-
pretation technique applies at the machine-code level, as well. §8
describes how these ideas are implemented using the TSL system
[20]. §9 discusses related work. (Proofs of two lemmas appear in
App. A.)

    s1: x = x ⊕ y;          t1: *px = *px ⊕ *py;
    s2: y = x ⊕ y;          t2: *py = *px ⊕ *py;
    s3: x = x ⊕ y;          t3: *px = *px ⊕ *py;
            (a)                        (b)

Figure 1. (a) Code fragment that swaps two ints; (b) (buggy) code
fragment that swaps two ints using pointers.
2. Semantic Reinterpretation for Abstract
Interpretation
To illustrate factoring-plus-reinterpretation in the context of ab-
stract interpretation, and as a warm-up exercise for the rest of the
paper, this section presents the basic principle of semantic reinter-
pretation using a simple example in which both the concrete se-
mantics of a language of assignment statements and an abstract
sign-analysis semantics are defined via semantic reinterpretation.
Example 2.1. [Adapted from [21].] Consider the following frag-
ment of a denotational semantics, which defines the meaning of
assignment statements over variables that hold signed 32-bit int
values (where ⊕ denotes exclusive-or):

    I ∈ Id        E ∈ Expr ::= I | E1 ⊕ E2 | . . .
    S ∈ Stmt ::= I = E;        σ ∈ State = Id → Int32

    E : Expr → State → Int32
    E⟦I⟧σ = σ I
    E⟦E1 ⊕ E2⟧σ = E⟦E1⟧σ ⊕ E⟦E2⟧σ

    I : Stmt → State → State
    I⟦I = E;⟧σ = σ[I ↦ E⟦E⟧σ]
This specification can be factored into client and core specifications
by introducing a domain Val, as well as operators xor, lookup, and
store. The client specification is defined by
    xor : Val → Val → Val
    lookup : State → Id → Val
    store : State → Id → Val → State

    E : Expr → State → Val
    E⟦I⟧σ = lookup σ I
    E⟦E1 ⊕ E2⟧σ = E⟦E1⟧σ xor E⟦E2⟧σ

    I : Stmt → State → State
    I⟦I = E;⟧σ = store σ I (E⟦E⟧σ)
For the concrete (or “standard”) semantics, the semantic core is
defined by
    v ∈ Val_std = Int32        State_std = Id → Val_std
    xor_std = λv1.λv2.(v1 ⊕ v2)
    lookup_std = λσ.λI.σ I
    store_std = λσ.λI.λv.σ[I ↦ v]
Different abstract interpretations can be defined by using the same
client semantics, but giving a different interpretation of the base
types, function types, and operators of the core. For example, for
sign analysis, the semantic core is reinterpreted as follows:
    v ∈ Val_abs = {neg, zero, pos}_⊤        State_abs = Id → Val_abs

    xor_abs = λv1.λv2.
                            v2
                     neg    zero   pos    ⊤
              neg     ⊤     neg    neg    ⊤
        v1    zero   neg    zero   pos    ⊤
              pos    neg    pos     ⊤     ⊤
               ⊤      ⊤      ⊤      ⊤     ⊤

    lookup_abs = λσ.λI.σ I
    store_abs = λσ.λI.λv.σ[I ↦ v]
For instance, for the code fragment shown in Fig. 1, which
swaps two ints, sign-analysis reinterpretation creates abstract
transformers that, given the initial abstract state σ0 = {x ↦ neg,
y ↦ pos}, produce the following abstract states:⁵

    σ0 := {x ↦ neg, y ↦ pos}
    σ1 := I⟦s1: x = x ⊕ y;⟧σ0 = store_abs σ0 x (neg xor_abs pos) = {x ↦ neg, y ↦ pos}
    σ2 := I⟦s2: y = x ⊕ y;⟧σ1 = store_abs σ1 y (neg xor_abs pos) = {x ↦ neg, y ↦ neg}
    σ3 := I⟦s3: x = x ⊕ y;⟧σ2 = store_abs σ2 x (neg xor_abs neg) = {x ↦ ⊤, y ↦ neg}   □
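The factoring above is easy to mimic in an ordinary programming language.
The following Python sketch is ours, not code from the paper; the names
ConcreteCore and SignCore are invented for illustration. The client meaning
functions E and I are written once against the core's interface, and the core
is then instantiated twice, reproducing the abstract states σ0 through σ3
computed above.

# A minimal sketch of factoring the Example 2.1 interpreter into a client
# and a reinterpretable core (hypothetical code, not from the paper).

TOP = "T"   # the abstract value 'top'

class ConcreteCore:
    """Standard semantics: Val = Int32, State = dict from Id to Int32."""
    def xor(self, v1, v2):            return (v1 ^ v2) & 0xFFFFFFFF
    def lookup(self, sigma, ident):   return sigma[ident]
    def store(self, sigma, ident, v): return {**sigma, ident: v}

class SignCore:
    """Sign-analysis semantics: Val = {'neg', 'zero', 'pos', TOP}."""
    _XOR = {("neg", "neg"): TOP,    ("neg", "zero"): "neg",  ("neg", "pos"): "neg",
            ("zero", "neg"): "neg", ("zero", "zero"): "zero", ("zero", "pos"): "pos",
            ("pos", "neg"): "neg",  ("pos", "zero"): "pos",   ("pos", "pos"): TOP}
    def xor(self, v1, v2):            return self._XOR.get((v1, v2), TOP)
    def lookup(self, sigma, ident):   return sigma[ident]
    def store(self, sigma, ident, v): return {**sigma, ident: v}

# Client meaning functions, written once against the core interface.
# Expressions are ('var', x) or ('xor', e1, e2); statements are ('assign', x, e).
def E(expr, sigma, core):
    if expr[0] == "var":
        return core.lookup(sigma, expr[1])
    if expr[0] == "xor":
        return core.xor(E(expr[1], sigma, core), E(expr[2], sigma, core))
    raise ValueError(expr)

def I(stmt, sigma, core):
    _, ident, rhs = stmt
    return core.store(sigma, ident, E(rhs, sigma, core))

swap = [("assign", "x", ("xor", ("var", "x"), ("var", "y"))),
        ("assign", "y", ("xor", ("var", "x"), ("var", "y"))),
        ("assign", "x", ("xor", ("var", "x"), ("var", "y")))]

sigma = {"x": "neg", "y": "pos"}
for s in swap:
    sigma = I(s, sigma, SignCore())
print(sigma)   # {'x': 'T', 'y': 'neg'}, i.e., sigma_3 above

conc = {"x": -5, "y": 3}
for s in swap:
    conc = I(s, conc, ConcreteCore())
print(conc)    # {'x': 3, 'y': 4294967291}: swapped (y holds -5 as an unsigned 32-bit value)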
3. Overview
This section presents intuition about some of the elements that are
used in our work, and provides an overview of how it is possible
to automatically generate the three symbolic-analysis primitives.
§3.1 defines a stripped-down version of a logic L that is sufficient
for the discussion in this section. (The full logic is defined in
§4.1.) §3.2 presents examples of semantic reinterpretation applied
to forward symbolic evaluation; §3.3 discusses issues relevant to
weakest precondition and symbolic composition. We use the two
swap-code fragments shown in Fig. 1 as a running example.
Because tools that check path feasibility (à la SLAM [1]) or per-
form path exploration (à la DART [10], CUTE [28], SAGE [11],
and DASH [2]) only analyze traces, we can concentrate on non-
branching statement sequences. For this reason, our programming-
language definitions contain only assignment statements and state-
ment sequences, and do not have either if-then-else statements or
loop constructs.
3.1 A Simple Logic
The syntax of L is defined as follows:
    I ∈ Id,   T ∈ Term,   ϕ ∈ Formula
    F ∈ FuncId,   FE ∈ FuncExpr,   U ∈ FOUpdate

    T  ::= I | T1 [⊕] T2 | FE(T)
    ϕ  ::= T1 = T2 | ϕ1 && ϕ2 | . . .
    FE ::= F | FE1[T1 ↦ T2]
    U  ::= ({Ii ← Ti}, {Fj ← FEj})
Names of the form F ∈ FuncId, possibly with subscripts and/or
primes, are function symbols. We distinguish the xor constructor of
L from the programming-language xor (§2) by putting the former
in a box, written [⊕] here. A FuncExpr of the form FE1[T1 ↦ T2] denotes a
function-update expression.
An expression of the form ({Ii ← Ti}, {Fj ← FEj}) is called
a structure-update expression. The subscripts i and j implicitly
range over certain index sets, which will be omitted to reduce clut-
ter. To emphasize that Ii and Fj refer to next-state quantities, we
sometimes write structure-update expressions with primes:
({I′i ← Ti}, {F′j ← FEj}). (Also, if a component has only a
singleton set, we omit the set brackets.) {I′i ← Ti} specifies the
updates to the constants and {F′j ← FEj} specifies the updates to
the functions. Thus, a structure-update expression ({I′i ← Ti},
{F′j ← FEj})
can be thought of as a kind of restricted 2-vocabulary (i.e., 2-state)
formula ⋀i (I′i = Ti) ∧ ⋀j (F′j = FEj). We define U_id to be
({I ← I | I ∈ Id}, {F ← F | F ∈ FuncId}).

⁵ For numbers represented in two's complement notation,
pos xor_abs neg = neg xor_abs pos = neg
because, for all combinations of values represented by pos and neg, the
sign bit of the result is set, which means that the result is guaranteed to be
negative. However, pos xor_abs pos = neg xor_abs neg = ⊤
because the concrete result could be either 0 or positive, and zero ⊔ pos = ⊤.
Example 3.1. In §5, we work with a simple high-level language,
PL1, that only has int-valued variables. (PL1 is the language
from §2, extended with some additional kinds of expressions.) In
§6, we introduce PL2, which extends PL1 with pointers. Here we
confine ourselves to sketching how the semantics of various kinds
of assignment statements can be expressed in L[PL1] and L[PL2].
• In PL1, a state σ ∈ State is a map Id → Int32. This is modeled
in L[PL1] by using a constant c_x ∈ Id for each PL1 identifier
x. (However, to reduce clutter, we will merely use x for such
constants instead of c_x.)
• In PL2, a state σ is a pair (η, ρ), where environment η ∈ Env =
Id → Loc maps identifiers to their associated locations and
store ρ ∈ Store = Loc → Int32 maps each location to the
value that it holds. (Loc stands for locations—e.g., memory
addresses—and we identify Loc with the set Int32 of values.)
This is modeled in L[PL2] by using a function symbol F_ρ for
store ρ, and a constant symbol c_x ∈ Id for each PL2 identifier
x. (Again, to reduce clutter, we will use x for such constants
instead of c_x.) The constants and their values correspond to the
environment η.
The following table illustrates how the semantics of a few assign-
ment statements are expressed as L[PL1] and L[PL2] structure-
update expressions:

    PL1              L[PL1]
    x = 17;          (x′ ← 17, ∅)
    x = y;           (x′ ← y, ∅)

    PL2              L[PL2]
    x = 17;          (∅, F′_ρ ← F_ρ[x ↦ 17])
    x = y;           (∅, F′_ρ ← F_ρ[x ↦ F_ρ(y)])
    x = *q;          (∅, F′_ρ ← F_ρ[x ↦ F_ρ(F_ρ(q))])          □
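The structure-update expressions in the table can be represented directly as
data. The sketch below is ours (a hypothetical tagged-tuple encoding, not TSL
or the paper's implementation); it builds the L[PL2] update for the assignment
x = *q;.

# A hypothetical tagged-tuple encoding (ours) of L terms, function-update
# expressions, and structure-update expressions.

def var(name):              return ("var", name)         # a constant symbol, e.g., x
def fn(name):               return ("fn", name)          # a function symbol, e.g., F_rho
def apply_fn(fe, t):        return ("apply", fe, t)      # FE(T)
def update_fn(fe, t1, t2):  return ("upd", fe, t1, t2)   # FE[T1 |-> T2]

def fo_update(const_updates, fn_updates):
    """A structure-update expression ({I_i <- T_i}, {F_j <- FE_j})."""
    return (dict(const_updates), dict(fn_updates))

F_rho = fn("F_rho")

# L[PL2] encoding of `x = *q;`:  (emptyset, F_rho' <- F_rho[x |-> F_rho(F_rho(q))])
u = fo_update({}, {"F_rho": update_fn(F_rho, var("x"),
                                      apply_fn(F_rho, apply_fn(F_rho, var("q"))))})
print(u)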
The semantics of L is defined in terms of a logical structure,
which gives meaning to the Id and FuncId symbols of the logic’s
vocabulary.
    ι ∈ LogicalStruct = (Id → Int32) × (FuncId → (Int32 → Int32))

We use (ι↑1) and (ι↑2) to denote the first and second components
of ι, respectively. (ι↑1) assigns meanings to constant symbols;
(ι↑2) assigns meanings to function symbols.

    T : Term → LogicalStruct → Int32
    T⟦I⟧ι = (ι↑1) I
    T⟦T1 [⊕] T2⟧ι = T⟦T1⟧ι ⊕ T⟦T2⟧ι
    T⟦FE(T1)⟧ι = (FE⟦FE⟧ι)(T⟦T1⟧ι)

    F : Formula → LogicalStruct → Bool
    F⟦T1 = T2⟧ι = (T⟦T1⟧ι = T⟦T2⟧ι)
    F⟦ϕ1 && ϕ2⟧ι = F⟦ϕ1⟧ι ∧ F⟦ϕ2⟧ι

    FE : FuncExpr → LogicalStruct → (Int32 → Int32)
    FE⟦F⟧ι = (ι↑2) F
    FE⟦FE1[T1 ↦ T2]⟧ι = (FE⟦FE1⟧ι)[(T⟦T1⟧ι) ↦ (T⟦T2⟧ι)]

    U : FOUpdate → LogicalStruct → LogicalStruct
    U⟦({Ii ← Ti}, {Fj ← FEj})⟧ι
        = ((ι↑1)[Ii ↦ T⟦Ti⟧ι], (ι↑2)[Fj ↦ FE⟦FEj⟧ι])
    U_id := ({x ← x, y ← y}, ∅)

    Ī⟦x = x ⊕ y;⟧U_id = ({x ← (Ē⟦x⟧U_id [⊕] Ē⟦y⟧U_id), y ← y}, ∅)
                      = ({x ← (x [⊕] y), y ← y}, ∅)
                      = U1
    Ī⟦y = x ⊕ y;⟧U1  = ({x ← (x [⊕] y), y ← (Ē⟦x⟧U1 [⊕] Ē⟦y⟧U1)}, ∅)
                      = ({x ← (x [⊕] y), y ← ((x [⊕] y) [⊕] y)}, ∅)
                      = ({x ← (x [⊕] y), y ← x}, ∅)
                      = U2
    Ī⟦x = x ⊕ y;⟧U2  = ({x ← (Ē⟦x⟧U2 [⊕] Ē⟦y⟧U2), y ← x}, ∅)
                      = ({x ← ((x [⊕] y) [⊕] x), y ← x}, ∅)
                      = ({x ← y, y ← x}, ∅)
                      = U3

Figure 2. Symbolic execution of Fig. 1(a) via semantic reinterpre-
tation, starting with the FOUpdate U_id = ({x ← x, y ← y}, ∅).

Note how the meaning of a structure-update expression is a func-
tion that maps a pre-state logical structure ι to a post-state logical
structure: {Ii ← Ti} specifies the updates to the constants and
{Fj ← FEj} specifies the updates to the functions.
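The meaning functions T, FE, and U above are directly executable. The
following sketch is ours (it reuses the hypothetical tagged-tuple encoding from
the earlier sketch and adds an xor term constructor); it evaluates terms and
structure-update expressions against a logical structure ι = (constant
assignment, function assignment), and checks that the FOUpdate U3 of Fig. 2
swaps x and y on a concrete pre-state.

# A sketch (ours) of the meaning functions T, FE, and U of the logic L,
# evaluated against a logical structure iota = (const_map, fn_map).

def T(term, iota):
    consts, fns = iota
    tag = term[0]
    if tag == "var":                 # T[[I]]iota = (iota^1) I
        return consts[term[1]]
    if tag == "xor":                 # T[[T1 xor T2]]iota
        return (T(term[1], iota) ^ T(term[2], iota)) & 0xFFFFFFFF
    if tag == "apply":               # T[[FE(T1)]]iota = (FE[[FE]]iota)(T[[T1]]iota)
        return FE(term[1], iota)(T(term[2], iota))
    raise ValueError(term)

def FE(fe, iota):
    consts, fns = iota
    if fe[0] == "fn":                # FE[[F]]iota = (iota^2) F
        return fns[fe[1]]
    if fe[0] == "upd":               # FE[[FE1[T1 |-> T2]]]iota = point-wise update
        base, k, v = FE(fe[1], iota), T(fe[2], iota), T(fe[3], iota)
        return lambda a: v if a == k else base(a)
    raise ValueError(fe)

def U(update, iota):                 # U[[({Ii <- Ti}, {Fj <- FEj})]]iota
    consts, fns = iota
    const_updates, fn_updates = update
    return ({**consts, **{i: T(t, iota) for i, t in const_updates.items()}},
            {**fns, **{f: FE(e, iota) for f, e in fn_updates.items()}})

# U3 = ({x <- y, y <- x}, {}) from Fig. 2, applied to a concrete pre-state:
U3 = ({"x": ("var", "y"), "y": ("var", "x")}, {})
iota0 = ({"x": 7, "y": 42}, {})
print(U(U3, iota0)[0])   # {'x': 42, 'y': 7}: the values of x and y are swapped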
3.2 Symbolic Evaluation via Reinterpretation
A primitive for forward symbolic-evaluation must solve the follow-
ing problem:
    Given the semantic definition of a programming language,
    together with a specific programming-language statement
    (or instruction) s, create a logical formula that captures the
    semantics of s.
To apply semantic reinterpretation to this problem, we use formu-
las of logic L as a reinterpretation domain for the semantic core of
PL1. The base types and the state type of the semantic core are rein-
terpreted as follows (our convention is to mark each reinterpreted
base type, function type, and operator with an overbar):

    V̄al = Term        B̄Val = Formula        S̄tate = FOUpdate

The operators used in the factored versions of PL1's meaning func-
tions E, B, and I are reinterpreted over these domains; in particular,
operations that are used in the PL1 semantics—e.g., xor—are inter-
preted as syntactic constructors of L[PL1] expressions—e.g., [⊕].
By extension, this produces reinterpreted meaning functions Ē, B̄,
and Ī with the types listed below:

    Standard                          Reinterpreted
    E: Expr → State → Val             Ē: Expr → S̄tate → V̄al
                                         = Expr → FOUpdate → Term
    B: BoolExpr → State → BVal        B̄: BoolExpr → S̄tate → B̄Val
                                         = BoolExpr → FOUpdate → Formula
    I: Stmt → State → State           Ī: Stmt → S̄tate → S̄tate
                                         = Stmt → FOUpdate → FOUpdate

The reinterpreted function Ī translates a statement s of PL1 to a
phrase in logic L[PL1].
Example 3.2. The steps of symbolic execution of Fig. 1(a) via se-
mantic reinterpretation, starting with the FOUpdate U_id = ({x ←
x, y ← y}, ∅), are shown in Fig. 2. The final FOUpdate U3 can be
considered to be the 2-vocabulary formula

    (x′ = y) ∧ (y′ = x).

This expresses a state change in which the values of program
variables x and y are swapped. □
Algebraic simplification of the resulting terms and formulas
also plays an important role. The simplification techniques that we
use are similar to ones used by others, such as the preprocessing
steps used in decision procedures (e.g., the ite-lifting and read-over-
write transformations for operations on functions [29, 9, 17]).
We assume that the reinterpreted [⊕] performs bit-vector sim-
plification according to the algebraic laws for xor. For example,
when y is updated in U1 by y ← ((x [⊕] y) [⊕] y) (see Fig. 2),
this is simplified to y ← x. We assume that the other bit-vector,
relational, and Boolean constructors of the logic behave similarly.
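Concretely, the reinterpretation of §3.2 can be sketched as follows (our own
code, not the TSL implementation): the client functions E and I keep the same
shape as in §2, but the core is instantiated so that values are terms and
states are FOUpdates; lookup reads a binding out of the update, store records
a new binding, and xor builds an xor term while applying simple cancellation
laws, so that ((x xor y) xor y) collapses to x. Running it on Fig. 1(a) from
U_id reproduces U3.

# A sketch (ours) of symbolic evaluation via reinterpretation: the semantic
# core is instantiated so that values are terms of L and states are FOUpdates.

def simplify_xor(t1, t2):
    """Build the term t1 xor t2, applying simple algebraic laws for xor."""
    if t1 == t2:                          # t xor t = 0
        return ("const", 0)
    if t1 == ("const", 0):                # 0 xor t = t
        return t2
    if t2 == ("const", 0):                # t xor 0 = t
        return t1
    if t1[0] == "xor" and t1[2] == t2:    # (a xor b) xor b = a
        return t1[1]
    if t1[0] == "xor" and t1[1] == t2:    # (a xor b) xor a = b
        return t1[2]
    return ("xor", t1, t2)

class SymbolicCore:
    """Reinterpreted core: Val = Term, State = FOUpdate (constants part only, for PL1)."""
    def xor(self, t1, t2):        return simplify_xor(t1, t2)
    def lookup(self, U, ident):   return U[0].get(ident, ("var", ident))
    def store(self, U, ident, t): return ({**U[0], ident: t}, U[1])

# Client meaning functions: same shape as in Section 2.
def E(expr, U, core):
    if expr[0] == "var":
        return core.lookup(U, expr[1])
    if expr[0] == "xor":
        return core.xor(E(expr[1], U, core), E(expr[2], U, core))
    raise ValueError(expr)

def I(stmt, U, core):
    _, ident, rhs = stmt
    return core.store(U, ident, E(rhs, U, core))

swap = [("assign", "x", ("xor", ("var", "x"), ("var", "y"))),
        ("assign", "y", ("xor", ("var", "x"), ("var", "y"))),
        ("assign", "x", ("xor", ("var", "x"), ("var", "y")))]

U = ({"x": ("var", "x"), "y": ("var", "y")}, {})   # U_id
for s in swap:
    U = I(s, U, SymbolicCore())
print(U[0])   # {'x': ('var', 'y'), 'y': ('var', 'x')}, i.e., x' <- y and y' <- x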
Relationship to partial evaluation. In general, the semantic def-
inition of an imperative programming language is a meaning func-
tion I with type I : Stmt × State → State. Given our goal, namely,

    Given the semantic definition of a programming language,
    I : Stmt × State → State, together with a specific
    programming-language statement (or instruction) s ∈ Stmt,
    create a logical formula that captures the semantics of s.

it is not surprising that partial-evaluation techniques come into play.
In essence, we wish to partially evaluate I with respect to Stmt
s, while at the same time translating to L. Semantic reinterpretation
permits us to do this: Let U_s be the FOUpdate Ī⟦s⟧U_id. Then U_s is
the partial evaluation of I with respect to s, translated to logic.
We show in §5.1 that U_s has the desired semantics. Note
that to model PL1 programs in L[PL1], we do not require any
function symbols. Thus, a PL1 state σ can be identified with
the LogicalStruct (σ, ∅).⁶ In §5.1, we show that for all ι ∈
LogicalStruct, evaluating U_s is equivalent to running I on s—i.e.,
((U⟦U_s⟧ι)↑1) = I⟦s⟧(ι↑1) (see Cor. 5.3).
In our implementation, discussed in §8, the TSL system is sup-
plied with a TSL program for the meaning function I, and the
way that it performs semantic reinterpretation is to create a kind
of generating extension [16] I-gen for I.⁷ The full explanation is
complicated by the number of language levels involved when the
partial-evaluation machinery is included in the discussion. For this
reason, we have chosen to delay the discussion of generating exten-
sions and partial-evaluation machinery until §8, and instead to base
the discussion on the simpler principle of semantic reinterpretation.
This has benefits and drawbacks:
• The benefit is that the explanation is simpler, and could also be
useful for direct hand implementation when a meta-system such
as TSL is not available.
• The drawback is that in some of the sections before §8 it may
appear that many steps perform rather trivial transliteration of
expressions from programming language PLi into expressions
of the corresponding logic L[PLi]. In part, this is an artifact
of trying to present the method in an easy-to-digest manner; in
part, it mimics the behavior of a generating extension: copying
(or transliterating) the appropriate residual expression is one
of the principles of “writing a generating extension by hand”
[3, 18].
⁶ Similarly, for PL2 a State σ = (η, ρ) can be identified with the
LogicalStruct (η, [F_ρ ↦ ρ]).
⁷ If p is a two-input program, then p-gen is any program with the property
that for every input pair a and b, ⟦p-gen⟧(a) = p_a, where ⟦p_a⟧(b) = ⟦p⟧(a, b).
Thus, I-gen is a program such that for every statement s and State σ,
⟦I-gen⟧(s) = I_s, where ⟦I_s⟧(σ) = ⟦I⟧(s, σ).

3.3 Other Symbolic-Analysis Operations

For weakest precondition and symbolic composition, we again use
L[·] as a reinterpretation domain; however, there is a trick: in con-
trast with what is done to generate symbolic-evaluation primitives,
we use the FOUpdate type of L[·] to reinterpret the meaning func-
tions U, FE, F, and T of L[·] itself! The general scheme is outlined
in the following table:
    Meaning        Type             Replacement    Function created
    function(s)    reinterpreted    type
    I, E, B        State            FOUpdate       Symbolic evaluation
    F, T           LogicalStruct    FOUpdate       Weakest precondition
    U, FE, F, T    LogicalStruct    FOUpdate       Symbolic composition
To keep things simple in §3.2, we did not present the semantics
of L[·] in factored form (see §4.1). Thus, the discussion in the rest
of this section merely surveys a few of the results that are obtained
by the techniques presented in later sections.
Weakest precondition. The weakest (liberal) precondition
WLP(s, ϕ) characterizes the set of states σ such that the execu-
tion of s starting in σ either fails to terminate or results in a state
σ′ such that ϕ(σ′) holds. For a language like PL1, which only has
int-valued variables, the WLP of a postcondition (specified by
formula ϕ) with respect to an assignment statement var = rhs;
can be expressed as the formula obtained by substituting rhs for all
(free) occurrences of var in ϕ: ϕ[var ← rhs].
For the swap-code fragment shown in Fig. 1(a), repeated sub-
stitution and simplification shows that the weakest precondition of
the program swap with respect to postcondition x = 2 is y = 2.
(This will be derived using semantic reinterpretation in §5.2.)
Complications from pointers. When Hoare logic is extended for
a language with pointer variables, such as PL2, syntactic substi-
tution is no longer adequate for finding weakest-precondition for-
mulas. For instance, suppose that we are interested in finding a
formula for the WLP of postcondition x = 5 with respect to
*p = e;. This cannot be accomplished merely by performing the
substitution (x = 5)[*p ← e]: the substitution yields the formula
x = 5, whereas the WLP depends on the execution context in
which *p = e; is evaluated:
• If p points to x, then the WLP formula should be e = 5.
• If p does not point to x, then the WLP formula should be
x = 5.
In this case, the WLP formula can be expressed informally as
(p = &x) ? (e = 5) : (x = 5).
Example 3.3. In §5.2, such formulas are expressed as shown below
on the right.
              Informal                              Formal
    Query     WLP(*p = e, x = 5)                    WLP(*p = e, F_ρ(x) = 5)
    Result    (p = &x) ? (e = 5) : (x = 5)          ite(F_ρ(p) = x, F_ρ(e) = 5, F_ρ(x) = 5)

□
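Both situations can be sketched in a few lines (our own code and term
encoding, not the paper's implementation): for the int-only case, WLP of
var = rhs; is plain substitution; once the store is modeled by F_ρ, the WLP of
*p = e; substitutes the updated store F_ρ[F_ρ(p) ↦ F_ρ(e)] for F_ρ and then
applies read-over-write simplification, which yields a formula equivalent, up
to ite-lifting, to the one shown in Example 3.3.

# A sketch (ours) of WLP by substitution, and of how the ite formula of
# Example 3.3 arises once the store is modeled by the function symbol F_rho.
# Terms/formulas are tagged tuples: ("var", x), ("int", n), ("fn", f),
# ("apply", FE, T), ("upd", FE, T1, T2), ("eq", T1, T2), ("ite", c, t, e), ("xor", T1, T2).

def subst_var(phi, var_name, rhs):
    """Int-only WLP of `var = rhs;`: substitute rhs for var in phi."""
    if phi == ("var", var_name):
        return rhs
    if isinstance(phi, tuple):
        return (phi[0],) + tuple(subst_var(c, var_name, rhs) for c in phi[1:])
    return phi

def read_over_write(term):
    """Simplify (FE[T1 |-> T2])(T) to ite(T = T1, T2, FE(T))."""
    if term[0] == "apply" and term[1][0] == "upd":
        _, (_, fe, t1, t2), t = term
        return ("ite", ("eq", t, t1), t2, ("apply", fe, t))
    return term

def wlp_store(phi, loc, val, store=("fn", "F_rho")):
    """WLP of a store update (e.g., `*p = e;`): replace F_rho by F_rho[loc |-> val] in phi."""
    if phi == store:
        return ("upd", store, loc, val)
    if isinstance(phi, tuple):
        return read_over_write(tuple(wlp_store(c, loc, val, store) if isinstance(c, tuple)
                                     else c for c in phi))
    return phi

# Int-only case: WLP(x = x xor y;, x = 2)  =  (x xor y) = 2
print(subst_var(("eq", ("var", "x"), ("int", 2)), "x",
                ("xor", ("var", "x"), ("var", "y"))))

# Pointer case: WLP(*p = e;, F_rho(x) = 5); the updated location is F_rho(p),
# the stored value is F_rho(e).
F = ("fn", "F_rho")
post = ("eq", ("apply", F, ("var", "x")), ("int", 5))
print(wlp_store(post, ("apply", F, ("var", "p")), ("apply", F, ("var", "e"))))
# The result is ite(x = F_rho(p), F_rho(e), F_rho(x)) = 5, which ite-lifting
# turns into ite(F_rho(p) = x, F_rho(e) = 5, F_rho(x) = 5), the formula of Example 3.3.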
For a program fragment that involves multiple pointer variables,
the WLP formula may have to take into account all possible
aliasing combinations. One of the most important features of our
approach is its ability to create correct implementations of Morris’s
rule of substitution [22] automatically—and basically for free.
Symbolic analysis of machine code.
Example 3.4. Fig. 4(a) shows a source-code fragment; Fig. 4(b)
shows the corresponding assembly code. To simplify the discus-
sion, the source-level variables are used in the assembly code in-
stead of having operations to access variable locations based on
their frame-pointer-relative offsets in the activation record.

References

• L. de Moura and N. Bjørner. Z3: an efficient SMT solver.
• P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints.
• C. A. R. Hoare. An axiomatic basis for computer programming.
• P. Godefroid, N. Klarlund, and K. Sen. DART: directed automated random testing.