
Symbolic Program Analysis in Almost-Linear Time

John H. Reif and Robert E. Tarjan
01 Feb 1982, SIAM Journal on Computing, Vol. 11, Iss. 1, pp. 81-93
Abstract
This paper describes an algorithm to construct, for each expression in a given program text, a symbolic expression whose value is equal to the value of the text expression for all executions of the program. We call such a mapping from text expressions to symbolic expressions a cover. Covers are useful in such program optimization techniques as constant propagation and code motion. The particular cover constructed by our methods is in general weaker than the covers obtainable by the methods of [Ki], [FKU], [RL], [R2], but our method has the advantage of being very efficient. It requires $O(m\alpha(m,n) + l)$ operations if extended bit vector operations have unit cost, where n is the number of vertices in the control flow graph of the program, m is the number of edges, l is the length of the program text, and $\alpha$ is related to a functional inverse of Ackermann's function [T2]. Our method does not require that the program be well-structured nor that the flow graph be reducible.



82    J. H. REIF AND R. E. TARJAN
Let Θ be the set of function signs occurring in the program. For simplicity, we assume a domain D such that every k-ary function represented by a sign in Θ has the same domain D^k. Let C be a set of constant signs containing a unique sign for every element in D. Let EXP be the set of expressions built from entry variables, constant signs in C, and function signs in Θ. To each expression ℰ ∈ EXP corresponds a unique reduced expression R formed by repeatedly substituting the appropriate constant sign for each subexpression of ℰ consisting of a function sign applied to constant signs.
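The reduction step can be sketched as a small constant-folding pass. This is an illustrative sketch, not the paper's representation: the tuple encoding and the `funcs` interpretation table are assumptions, since the paper treats function signs abstractly.

```python
def reduce_expr(e, funcs):
    """Return the reduced expression for e: repeatedly fold every
    subexpression that is a function sign applied only to constant signs.

    e is ('const', c), ('var', x) for an entry variable, or
    ('app', f, args); funcs maps each function sign to the k-ary
    function it denotes on the domain D.
    """
    if e[0] != 'app':
        return e                      # constants and variables are already reduced
    args = tuple(reduce_expr(a, funcs) for a in e[2])
    if all(a[0] == 'const' for a in args):
        # a function sign applied to constant signs: substitute the
        # constant sign denoting the resulting domain element
        return ('const', funcs[e[1]](*(a[1] for a in args)))
    return ('app', e[1], args)

funcs = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}
e = ('app', '+', (('app', '*', (('const', 2), ('const', 3))), ('var', 'X')))
print(reduce_expr(e, funcs))  # ('app', '+', (('const', 6), ('var', 'X')))
```

The inner product of constants folds to a constant sign, while the subexpression containing the entry variable X is left symbolic, matching the definition above.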
For any expression ℰ ∈ EXP and any execution of the program π, the value of ℰ on exit from a vertex v is defined as follows: if ℰ contains an entry variable X^u such that control has never entered u, then the value of ℰ is undefined. Otherwise the value of ℰ is computed by substituting for each entry variable X^u the value of X when control last entered u, and evaluating the resulting expression.
For each vertex v ∈ V and program variable X ∈ Σ defined at v, the exit expression ℰ(X, v) ∈ EXP is formed as follows. Begin by letting the expression ℰ be X. Process each assignment statement of v, starting from the last assignment defining X and working backwards to the first assignment in v. To process an assignment Y := ℰ', replace each occurrence of Y in ℰ by ℰ'. After all assignments are processed, reduce ℰ and replace each occurrence of a variable Y by the corresponding entry variable Y^v. The resulting exit expression ℰ(X, v) represents the value of X on exit from v in terms of constants and values of variables on entry to v.
For example, ℰ(Z, v2) = Z^{v2} + (X^{v2} * Y^{v2}) represents the value of Z on exit from vertex v2 in the flow graph of Fig. 1.
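The backward-substitution construction of exit expressions can be sketched directly. The tuple encoding and the sample vertex are illustrative assumptions, and the final reduction and entry-variable relabeling steps are omitted for brevity.

```python
def substitute(e, y, rhs):
    """Replace each occurrence of variable y in expression e by rhs."""
    if e == ('var', y):
        return rhs
    if e[0] == 'app':
        return ('app', e[1], tuple(substitute(a, y, rhs) for a in e[2]))
    return e

def exit_expr(x, assignments):
    """Exit expression of variable x for a vertex whose assignment
    statements are given in program order as (target, expr) pairs.

    Starts from the last assignment defining x and works backwards,
    substituting right-hand sides for assigned variables; any remaining
    ('var', y) leaf then stands for the value of y on entry to the vertex.
    """
    defining = [i for i, (t, _) in enumerate(assignments) if t == x]
    if not defining:
        return ('var', x)              # x not assigned here: its entry value
    e = assignments[defining[-1]][1]
    for target, rhs in reversed(assignments[:defining[-1]]):
        e = substitute(e, target, rhs)
    return e

# a hypothetical vertex containing:  Y := X + Y;  Z := Z + (X * Y)
block = [('Y', ('app', '+', (('var', 'X'), ('var', 'Y')))),
         ('Z', ('app', '+', (('var', 'Z'),
                             ('app', '*', (('var', 'X'), ('var', 'Y'))))))]
print(exit_expr('Z', block))
# ('app', '+', (('var', 'Z'),
#               ('app', '*', (('var', 'X'),
#                             ('app', '+', (('var', 'X'), ('var', 'Y')))))))
```

The Y in Z's right-hand side is rewritten by the earlier assignment, yielding Z + (X * (X + Y)) over entry values, the same shape as the covering expressions of Fig. 2.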
A text expression is any subexpression of an exit expression ℰ(X, v) (including the expression itself); we say the text expression occurs at v. An expression ℰ ∈ EXP covers a text expression t occurring at v if for any execution of program π, ℰ and t have the same value on any exit from v. See Fig. 1.

    Y := X + Y
    Z := Z + (X * Y)
    X := Z

FIG. 1. A program flow graph.
This definition implies that if X^u appears in ℰ then u dominates v. Thus there is a unique vertex v which is minimal (i.e., closest to the start vertex) with respect to the dominator relation and such that, for all entry variables X^u in ℰ, u dominates v. We call such a vertex the origin of ℰ; it is the earliest point in the program at which ℰ can be computed.
A cover of π is a mapping ψ from all text expressions to reduced expressions in EXP such that, for each text expression t, ψ(t) covers t. We would like to construct a cover whose origins are minimal with respect to the dominator relation.
We can use such a cover for constant propagation: if a constant sign c covers a text expression t, we may substitute c in line in the program text for the computation associated with t.
We can also use a cover in code motion. If we define the birthpoint of a text expression t to be the minimal vertex to which the computation of t may be moved, then the birthpoint of t is precisely the origin of a minimal cover of t. For example, in Fig. 1 the birthpoint of the text expression t = X^{v2} * Y^{v2} is v1; X^{v1} * (X^{v1} + Y^{v1}) covers t.
    Text expression                           Covering expression
    X^{v2} * Y^{v2}                           X^{v1} * (X^{v1} + Y^{v1})
    ℰ(Z, v2) = Z^{v2} + (X^{v2} * Y^{v2})     Z^{v1} + (X^{v1} * (X^{v1} + Y^{v1}))

FIG. 2. Symbolic analysis of the program in Fig. 1.

Code motion requires approximations to birthpoints (i.e., vertices which are dominated by the true birthpoints) and other knowledge, including knowledge of the cycle structure of the flow graph of π. (We may not wish to move code as far as the birthpoint since the birthpoint may be contained in control cycles avoiding the original location of the code.)
[R1] presents efficient algorithms which utilize approximate birthpoints for code motion optimization. See [AU], [CA], [E], [G] for further discussion of code motion optimizations. Other practical uses of covers have been made by [FK] in their optimizing Pascal compiler.
Unfortunately, for programs which manipulate the natural numbers using ordinary arithmetic, the problem of computing a minimal cover is recursively unsolvable [R2]. The usual approach in program optimization is to trade accuracy for speed; [FKU], [Ki], [RL], [R2] present fast algorithms which compute reasonably good covers whose origins yield approximate birthpoints. The fastest of these [RL], [R2] has a time bound almost linear in m · |Σ| + l, where l is the length of the program text.
In this paper we describe a very fast algorithm for computing a rather weak cover. This simple cover can be used directly for code optimization, or it can serve as input to a more powerful method for symbolic evaluation presented in [RL], [R2]. From a data structure called a global value graph (which is related to the use-definition chains of [AU], [Sc] used to represent the flow of values through a program), the algorithm of [RL], [R2] constructs a cover which yields better approximate birthpoints than does the simple cover. This algorithm runs in time almost linear in the size of the input global value graph, which is very compact when constructed from the simple cover [RL], [R2].
In order to define the simple cover we need one more concept. A variable X is definition-free between distinct vertices u and v if no u-avoiding path from a successor of u to a predecessor of v contains a definition of X. By convention any program variable X is definition-free between v and v for any vertex v. For any entry variable X^v which is a text expression, the simple origin of X^v is the minimal vertex u (with respect to the dominator relation) such that X is definition-free between u and v. In the example of Fig. 1, X^{v2} has simple origin r, and Y^{v2} and Z^{v2} have simple origin v1. If X^v has simple origin u ≠ v, then on any execution of π the program variable X has the same value on entry to v as it did after the most recent execution of u; we take the simple origin as an approximation to the birthpoint of X^v.
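Simple origins can be computed naively straight from the definition-free condition, which is useful as a correctness reference even though it is far slower than the bit-vector method developed later in the paper. The graph encoding, helper names, and sample loop below are illustrative assumptions.

```python
def definition_free(succ, defs, x, u, v):
    """True if x is definition-free between u and v: no u-avoiding path
    from a successor of u to a predecessor of v contains a definition of x."""
    if u == v:
        return True                       # definition-free by convention
    preds = {}
    for p, ws in succ.items():
        for w in ws:
            preds.setdefault(w, []).append(p)

    def search(starts, edges):
        # all vertices reachable from starts along edges, never visiting u
        seen, stack = set(), [s for s in starts if s != u]
        while stack:
            y = stack.pop()
            if y not in seen:
                seen.add(y)
                stack.extend(z for z in edges.get(y, ()) if z != u)
        return seen

    fwd = search(succ.get(u, ()), succ)    # reachable from successors of u
    bwd = search(preds.get(v, ()), preds)  # can reach predecessors of v
    return not any(x in defs.get(y, ()) for y in fwd & bwd)

def simple_origin(succ, defs, idom, x, v):
    """Walk v's dominator chain and return the vertex closest to the
    start between which and v the variable x is definition-free."""
    best, u = v, v
    while u is not None:
        if definition_free(succ, defs, x, u, v):
            best = u
        u = idom.get(u)
    return best

# a loop r -> v1 -> v2 -> v1, with Y defined at v1 and Z defined at v2
succ = {'r': ['v1'], 'v1': ['v2'], 'v2': ['v1']}
defs = {'v1': {'Y'}, 'v2': {'Z'}}
idom = {'v1': 'r', 'v2': 'v1'}
print(simple_origin(succ, defs, idom, 'Y', 'v2'))  # v1
print(simple_origin(succ, defs, idom, 'X', 'v2'))  # r (X is never defined)
```

Y's origin stops at v1 because the cycle back edge lets the definition of Y at v1 reach v2 on an r-avoiding path, while X, never defined, climbs all the way to the start vertex.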
We recursively define the simple cover ψ using simple origins. If t contains no entry variables then ψ(t) = t. Otherwise we form ψ(t) from t by applying the following transformation. (i) Repeat the following step for all entry variables X^v occurring in t: let u be the simple origin of X^v. If u = v, do nothing. Otherwise replace X^v in t by ψ(ℰ(X, u)) if X is defined at u, or by X^u if X is not defined at u. (ii) Reduce the resulting expression.
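Step (i) of the transformation can be sketched recursively, assuming precomputed maps for simple origins, exit expressions (written over entry variables), and the variables defined at each vertex; step (ii), the reduction, is omitted here, and all encodings are illustrative assumptions.

```python
def simple_cover(t, origin, exit_e, defs):
    """Apply step (i) of the simple-cover transformation to text
    expression t.

    Entry variables are ('entry', X, v); origin maps (X, v) to the
    simple origin of X^v; exit_e maps (X, u) to the exit expression
    of X at u, itself written over entry variables at u.
    """
    if t[0] == 'entry':
        x, v = t[1], t[2]
        u = origin[(x, v)]
        if u == v:
            return t                                   # already at its origin
        if x in defs.get(u, ()):
            # X is defined at u: expand through its exit expression there
            return simple_cover(exit_e[(x, u)], origin, exit_e, defs)
        return ('entry', x, u)                         # same value as on entry to u
    if t[0] == 'app':
        return ('app', t[1],
                tuple(simple_cover(a, origin, exit_e, defs) for a in t[2]))
    return t                                           # constant sign

# Y^v2 has simple origin v1, where Y is defined with exit expression X^v1 + Y^v1
origin = {('Y', 'v2'): 'v1', ('X', 'v1'): 'v1', ('Y', 'v1'): 'v1'}
exit_e = {('Y', 'v1'): ('app', '+', (('entry', 'X', 'v1'), ('entry', 'Y', 'v1')))}
defs = {'v1': {'Y'}}
print(simple_cover(('entry', 'Y', 'v2'), origin, exit_e, defs))
# ('app', '+', (('entry', 'X', 'v1'), ('entry', 'Y', 'v1')))
```

The entry variable at v2 is rewritten into an expression over entry variables at its simple origin v1, mirroring how the covering expressions of Fig. 2 are phrased at v1.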
Our algorithm for computing the simple cover consists of three parts, described in §§2-4 of this paper. First, we determine for each vertex v the set of program variables defined between the immediate dominator of v and v itself. We call this set of variables idef(v). The idef computation can be regarded as a path problem of the kind studied in [GW], [T3], but another approach is more fruitful: a straightforward modification of the dominator-finding algorithm of [LT] computes idef in O(mα(m, n) + l) time, assuming that logical bit vector operations on vectors of length |Σ| have unit cost, where l is the length of the program text and α is related to an inverse of Ackermann's function [T2]. Second, we use idef to compute the simple origins of all entry variables appearing as text expressions. This computation requires a variable-length shift operation on bit vectors (shift left to the first nonzero bit) and requires O(n + l) time. Third, we construct a directed acyclic graph representing the simple cover (we save space by combining common subexpressions). This algorithm also requires O(n + l) time but uses no bit vector operations. The total running time of our algorithm is thus O(mα(m, n) + l) if extended bit vector operations require constant time.

2. An algorithm for computing idef based on finding dominators. In this section we shall describe an algorithm for computing idef(v) for all vertices v ∈ V in the flow graph G = (V, E, r) of a computer program. We obtain the algorithm by adding appropriate extra steps to the dominators algorithm of [LT], and we shall assume that the reader is familiar with [LT].
Our algorithm requires def(w) = {X | X is defined at w} for each vertex w ∈ V as input and uses set union as a basic operation. If each subset of Σ is represented as a bit vector of length |Σ|, then a set union is equivalent to an "or" operation on bit vectors; we shall assume each set union requires constant time. Construction of def(w) for all vertices w is easy and requires time proportional to the length of the program text.
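In Python the bit-vector representation can be sketched with integer bitmasks, so that a set union really is a single "or" as the cost model assumes; the variable indexing below is an illustrative assumption.

```python
sigma = ['X', 'Y', 'Z']                      # the program variables
bit = {x: 1 << i for i, x in enumerate(sigma)}

def as_mask(variables):
    """Encode a subset of sigma as a bit vector (one int)."""
    m = 0
    for x in variables:
        m |= bit[x]
    return m

def as_set(mask):
    """Decode a bit vector back to a set of variable names."""
    return {x for x in sigma if mask & bit[x]}

def_v1 = as_mask({'Y'})
def_v2 = as_mask({'Z'})
union = def_v1 | def_v2                      # set union = one "or" operation
print(sorted(as_set(union)))                 # ['Y', 'Z']
```

Since Python ints are arbitrary precision, one mask covers a Σ of any size, though the unit-cost assumption of course only holds while |Σ| fits a machine word.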
Properties of idef. For any vertex w ≠ r, let idom(w) be the immediate dominator of w in G. For w ≠ r, we define

    idef(w) = ∪ {def(v) | there is a nonempty path from v to w which avoids idom(w)}.

Note that def(w) is a term in the union defining idef(w) if and only if there is a cycle containing w but avoiding idom(w).
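idef can be computed naively straight from this definition, by a backward search from w that never visits idom(w); this quadratic reference implementation (the paper's point is precisely to avoid it) uses an illustrative graph encoding.

```python
def idef_naive(succ, defs, idom, w):
    """Union of def(v) over all v with a nonempty path from v to w
    that avoids idom(w)."""
    avoid = idom[w]
    preds = {}
    for p, ys in succ.items():
        for y in ys:
            preds.setdefault(y, []).append(p)
    # backward search from w through vertices distinct from idom(w)
    seen, stack = set(), [p for p in preds.get(w, ()) if p != avoid]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(p for p in preds.get(v, ()) if p != avoid)
    out = set()
    for v in seen:
        out |= set(defs.get(v, ()))
    return out

succ = {'r': ['v1'], 'v1': ['v2'], 'v2': ['v1']}   # a loop between v1 and v2
defs = {'v1': {'Y'}, 'v2': {'Z'}}
idom = {'v1': 'r', 'v2': 'v1'}
print(sorted(idef_naive(succ, defs, idom, 'v1')))  # ['Y', 'Z']: the cycle avoids r
print(sorted(idef_naive(succ, defs, idom, 'v2')))  # []: every path to v2 passes v1
```

Note that v1 itself lands in its own idef set here, illustrating the remark above: the cycle v1 -> v2 -> v1 contains v1 but avoids idom(v1) = r.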
To compute idom and idef, we first perform a depth-first search on G, starting from vertex r and numbering the vertices from 1 to n as they are reached during the search. The search generates a spanning tree T rooted at r, with vertices numbered in preorder [T1]. For convenience in stating our results, we shall assume in this subsection that all vertices are identified by number, and we shall use → and →* to denote ancestor-descendant relationships in T (see the appendix).

SYMBOLIC
PROGRAM
ANALYSIS
IN
ALMOST-LINEAR
TIME
85
FIG. 3. Depth-first search of the flow graph given in Fig. 1. Solid edges denote tree edges and dotted edges denote nontree edges. The depth-first search number is given to the right of each vertex.
    vertex   number   idom   sdom   def      idef     sdef
    v1       2        r      r      {Y}      {Y, Z}   {Y, Z}
    v2       3        v1     v1     {Z}      ∅        ∅
    v3       4                      {Y, Z}
    v4       5        v1            {X}      {Y, Z}
    v5       6

FIG. 4. Tabulation of information calculated for the program flow graph given in Fig. 1.
The following paths lemma is an important property of depth-first search and is crucial to the correctness of our algorithm.

LEMMA 2.1 [T1]. If v and w are vertices of G such that v ≤ w, then any path from v to w must contain a common ancestor of v and w in T.
As an intermediate step, the dominators algorithm computes a value for each vertex w ≠ r called its semi-dominator, denoted sdom(w) and defined by

(2)    sdom(w) = min {v | there is a path v = v0, v1, ..., vk = w such that vi > w for 1 ≤ i < k}.
We shall in addition compute a value sdef(w) for each vertex w ≠ r, defined by

    sdef(w) = ∪ {def(v) | there is a nonempty path v = v0, v1, ..., vk = w such that vi ≥ w for 0 ≤ i ≤ k}.
The following properties of semi-dominators and dominators justify the dominators algorithm.

LEMMA 2.2 [LT]. Let w ≠ r. Then idom(w) →* sdom(w) →* w.
THEOREM 2.1 [LT]. For any vertex w ≠ r,

(3)    sdom(w) = min({v | (v, w) ∈ E and v < w} ∪ {sdom(u) | u > w and there is an edge (v, w) such that u →* v}).

References

- A unified approach to global program optimization
- A fast algorithm for finding dominators in a flowgraph
- Variations on the Common Subexpression Problem
- Applications of Path Compression on Balanced Trees
- A Unified Approach to Path Problems