• Model-checking tools, such as SLAM [1] and BLAST [14], as
well as hybrid concrete/symbolic program-exploration tools,
such as DART [10], CUTE [28], YOGI [13], SAGE [11],
BITSCOPE [5], and DASH [2], use forward symbolic evaluation,
weakest precondition, or both.
Symbolic evaluation can be used to create path formulas.
When a path π being analyzed might not be executable, a call to
an SMT solver can determine whether π’s path formula is
satisfiable, and hence whether π is executable; if it is, the
solver’s model yields inputs that drive the program down π.
Weakest precondition can be used to create new predicates that
split part of a program’s state space [1, 13, 2].
• Bug-finding tools, such as ARCHER [32] and SATURN [31],
as well as commercial bug-finding products, such as Coverity’s
PREVENT [8] and GrammaTech’s CODESONAR [12], use
symbolic composition.
Formulas are used to summarize a portion of the behavior of a
procedure. Suppose that procedure P calls Q at call-site c, and
that r is the site in P to which control returns after the call at c.
When c is encountered during the exploration of P, such tools
perform the symbolic composition of the formula that expresses
the behavior along the path [entry_P, ..., c] explored in P with
the formula that captures the behavior of Q to obtain a formula
that expresses the behavior along the path [entry_P, ..., r].
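The two items above can be illustrated with a minimal Python sketch. It is not taken from any of the cited tools, and all names in it (`subst`, `path_formula`, `find_input`, `compose`) are hypothetical. It assumes a toy language with integer variables and no pointers, represents expressions as nested tuples, treats a variable that has not yet been assigned as standing for its value at entry, and uses a brute-force search over a small domain as a stand-in for an SMT solver. The composition step further assumes that P and Q operate on the same variables, ignoring parameter passing.

```python
import operator
from itertools import product

# Expressions are nested tuples: ("var", name), ("const", n), or (op, lhs, rhs).
BINOPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
          "<": operator.lt, "==": operator.eq}

def subst(expr, state):
    """Replace each variable in expr by its symbolic value in state."""
    tag = expr[0]
    if tag == "var":
        return state.get(expr[1], expr)   # unassigned: still the entry value
    if tag == "const":
        return expr
    return (tag, subst(expr[1], state), subst(expr[2], state))

def evaluate(expr, env):
    """Evaluate expr under a concrete assignment env (name -> int)."""
    tag = expr[0]
    if tag == "var":
        return env[expr[1]]
    if tag == "const":
        return expr[1]
    return BINOPS[tag](evaluate(expr[1], env), evaluate(expr[2], env))

def path_formula(path):
    """Forward symbolic evaluation along one path.

    Returns the final symbolic state and the list of path constraints,
    both expressed over the values of the variables at entry.
    """
    state, constraints = {}, []
    for stmt in path:
        if stmt[0] == "assign":            # ("assign", x, e)
            _, var, expr = stmt
            state[var] = subst(expr, state)
        else:                              # ("assume", cond)
            constraints.append(subst(stmt[1], state))
    return state, constraints

def find_input(constraints, names, domain=range(-4, 5)):
    """Brute-force stand-in for an SMT call (small domain only)."""
    for values in product(domain, repeat=len(names)):
        env = dict(zip(names, values))
        if all(evaluate(c, env) for c in constraints):
            return env
    return None

def compose(state_p, summary_q):
    """Symbolic composition: apply Q's summary after P's state at the call."""
    result = dict(state_p)
    result.update({v: subst(e, state_p) for v, e in summary_q.items()})
    return result

path = [
    ("assign", "x", ("+", ("var", "x"), ("const", 1))),   # x := x + 1
    ("assume", ("<", ("var", "x"), ("var", "y"))),         # assume x < y
    ("assign", "y", ("*", ("var", "y"), ("const", 2))),   # y := y * 2
    ("assume", ("==", ("var", "y"), ("var", "x"))),        # assume y == x
]
state, constraints = path_formula(path)
model = find_input(constraints, ["x", "y"])   # an input that follows the path
```

A result of `None` from `find_input` only shows that no input in the searched domain follows the path; a real tool would rely on the solver's answer instead.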
The aforementioned systems apply symbolic analysis to programs
written in languages with pointers, aliasing, dereferencing, and ad-
dress arithmetic. This paper demonstrates that the reinterpretation
technique provides a way to create symbolic-analysis primitives for
such languages.
As mentioned earlier, our motivation is to be able to cre-
ate implementations of symbolic-analysis primitives for multiple
machine-code instruction sets (including multiple variants of a
given machine-code instruction set). However, our work is also
useful for creating tools to analyze high-level-language programs,
starting from source code. Moreover, most of the principles that
we make use of can be explained using two variants of a simple
high-level language: PL_1, defined in §4, and PL_2, defined in §6.
For this reason, the paper is couched in terms of high-level lan-
guages up until §7, which discusses an idealized machine-code
language, MC. This has the benefit of making the paper accessible
to a wider audience, but might cause readers who are
mainly familiar with analysis techniques for C, C++, C#, or Java
to under-appreciate the benefits that one obtains from our approach
when creating machine-code-analysis tools.
Three for the price of one! In §8, we describe how, using
binding-time analysis [16] and a two-level intermediate language
[25], the reinterpretation technique can be used to generate im-
plementations of symbolic-analysis primitives automatically, using
a meta-system that generates program-analysis components from
a specification of the subject language’s semantics. In particular,
we have created a system in which, for the cost of writing just
one specification—of the semantics of the programming language
of interest, in the form of an interpreter expressed in a functional
language—one obtains automatically-generated implementations
of all three symbolic-analysis functions. We show that this can be
carried out even for programming languages with pointers, alias-
ing, dereferencing, and address arithmetic.
This has been achieved using the TSL system [20], and the implementation
has been used to generate symbolic-analysis primitives
for multiple machine-code instruction sets. TSL (which stands for
“Transformer Specification Language”) consists of
(i) a language for specifying the concrete semantics of a machine-
code instruction set (i.e., a collection of concrete-state transform-
ers), (ii) a mechanism to create implementations of different ab-
stract interpretations easily by reinterpreting the TSL base types,
function types, and operators, and (iii) a run-time system to sup-
port the (re-)interpretation and analysis of executables written in
that instruction set.
Moreover, with TSL each reinterpretation is defined at the meta-
level, by reinterpreting the collection of TSL base types, function
types, and operators. When a reinterpretation is performed in this
way, it is independent of any given subject language. Consequently,
with our implementation, all three of the symbolic-analysis prim-
itives can be generated automatically for every instruction set for
which one has a TSL specification.
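The idea of defining an analysis by reinterpreting base types and operators can be sketched in Python. This is only an analogue, not TSL itself: the classes `Concrete` and `Symbolic`, the functions `eval_expr` and `run`, and the toy assignment language are all assumptions introduced for illustration. The interpreter is written once against an abstract set of operators; swapping the operator set changes what it computes without touching the interpreter.

```python
class Concrete:
    """Standard interpretation: values are Python integers."""
    def const(self, n):
        return n
    def add(self, a, b):
        return a + b
    def mul(self, a, b):
        return a * b

class Symbolic:
    """Reinterpretation: values are terms (nested tuples) over entry values."""
    def const(self, n):
        return ("const", n)
    def add(self, a, b):
        return ("+", a, b)
    def mul(self, a, b):
        return ("*", a, b)

def eval_expr(ops, expr, state):
    """Evaluate an expression using whichever operator set is supplied."""
    tag = expr[0]
    if tag == "var":
        return state[expr[1]]
    if tag == "const":
        return ops.const(expr[1])
    lhs = eval_expr(ops, expr[1], state)
    rhs = eval_expr(ops, expr[2], state)
    return ops.add(lhs, rhs) if tag == "+" else ops.mul(lhs, rhs)

def run(ops, program, state):
    """The single interpreter: a program is a list of (variable, expression)."""
    state = dict(state)
    for var, expr in program:
        state[var] = eval_expr(ops, expr, state)
    return state

program = [
    ("x", ("+", ("var", "x"), ("const", 1))),    # x := x + 1
    ("y", ("*", ("var", "x"), ("var", "y"))),    # y := x * y
]

concrete_result = run(Concrete(), program, {"x": 2, "y": 5})
symbolic_result = run(Symbolic(), program,
                      {"x": ("var", "x"), "y": ("var", "y")})
```

Because `run` never mentions integers or terms directly, the same interpreter serves both purposes, which mirrors, in miniature, why a reinterpretation defined once can be reused across subject languages.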
The contributions of the paper can be summarized as follows:
• From the conceptual standpoint, we present a new application
for semantic reinterpretation. In particular, the paper shows how
semantic reinterpretation can be applied to create analysis func-
tions that compute formulas for forward symbolic evaluation,
weakest precondition, and symbolic composition (§5.1, §5.2,
and §5.3, respectively).
• From the systems-building perspective, we show that this
observation has algorithmic content: the paper describes how we
created a meta-system that, given an interpreter that specifies a
subject language’s concrete semantics, uses binding-time analysis,
a two-level intermediate language, and semantic reinterpretation
to automatically generate implementations of all three
symbolic-analysis primitives, for every instruction set for which
one has a TSL specification (§8).
• We demonstrate that semantic reinterpretation can handle lan-
guages with pointers, aliasing, dereferencing, and address arith-
metic (§3, §6, and §7). In particular, in §3 and §6.4.1, we show
how reinterpretation can automatically generate a weakest-
precondition primitive that implements Morris’s rule of sub-
stitution for a language with pointer variables [22].
• §6.4.2 shows how the semantic-reinterpretation approach can
also generate a weakest-precondition primitive that implements
the pure substitution-based approach of Cartwright and Oppen
[6] (again for a language with pointer variables). This provides
insight into how Morris’s rule and Cartwright and Oppen’s rule
are related: both are based on substitution; the difference is
merely the degree of algebraic simplification that is performed.
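As background for the substitution-based rules mentioned above, the following Python sketch shows weakest precondition by plain substitution, WP(x := e, φ) = φ[x ← e], for a toy language without pointers. It does not implement Morris's rule or the Cartwright and Oppen approach, both of which address the aliasing that this sketch deliberately excludes. The function names and the tuple encoding of formulas are assumptions made for illustration.

```python
import operator

# Formulas and expressions are nested tuples:
# ("var", name), ("const", n), or (op, lhs, rhs).
BINOPS = {"+": operator.add, "*": operator.mul,
          "<": operator.lt, "==": operator.eq}

def substitute(formula, var, expr):
    """Return formula with every occurrence of var replaced by expr."""
    tag = formula[0]
    if tag == "var":
        return expr if formula[1] == var else formula
    if tag == "const":
        return formula
    return (tag,
            substitute(formula[1], var, expr),
            substitute(formula[2], var, expr))

def wp_assign(var, expr, post):
    """WP of `var := expr` with respect to post (no aliasing assumed)."""
    return substitute(post, var, expr)

def wp_path(assignments, post):
    """WP of a straight-line sequence, processed from last statement to first."""
    for var, expr in reversed(assignments):
        post = wp_assign(var, expr, post)
    return post

def evaluate(formula, env):
    """Evaluate a formula under a concrete assignment env (name -> int)."""
    tag = formula[0]
    if tag == "var":
        return env[formula[1]]
    if tag == "const":
        return formula[1]
    return BINOPS[tag](evaluate(formula[1], env), evaluate(formula[2], env))

# x := x + 1; y := x * 2, with postcondition y < 10
assignments = [
    ("x", ("+", ("var", "x"), ("const", 1))),
    ("y", ("*", ("var", "x"), ("const", 2))),
]
post = ("<", ("var", "y"), ("const", 10))
pre = wp_path(assignments, post)   # represents ((x + 1) * 2) < 10
```

With pointer variables, an assignment through one name can change the value seen through another, so textual substitution on a single variable is no longer sufficient; that is the gap the rules cited above are designed to fill.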
Organization. §2 presents the basic principles of semantic rein-
terpretation by means of an example in which reinterpretation is
used to create abstract transformers for abstract interpretation. §3
provides an overview of our techniques and the results obtained
with the symbolic-analysis primitives that are created by semantic
reinterpretation. §4 defines the logic that we use, as well as the
programming language PL_1. §5 discusses how to use reinterpretation
to obtain the primitives for forward symbolic evaluation, weakest
precondition, and symbolic composition. §6 defines PL_2, which
includes pointer variables and dereferencing, and shows how the
weakest-precondition operation that is obtained automatically via
semantic reinterpretation implements Morris’s rule of substitution.
§7 introduces a simplified machine-code language, which includes
address arithmetic and dereferencing, and shows that the reinter-
pretation technique applies at the machine-code level, as well. §8
describes how these ideas are implemented using the TSL system
[20]. §9 discusses related work. (Proofs of two lemmas appear in
App. A.)