
Symbolic Program Analysis in Almost-Linear Time

John H. Reif and Robert E. Tarjan
01 Feb 1982, SIAM Journal on Computing, Vol. 11, Iss. 1, pp. 81-93
Abstract
This paper describes an algorithm to construct, for each expression in a given program text, a symbolic expression whose value is equal to the value of the text expression for all executions of the program. We call such a mapping from text expressions to symbolic expressions a cover. Covers are useful in such program optimization techniques as constant propagation and code motion. The particular cover constructed by our methods is in general weaker than the covers obtainable by the methods of [Ki], [FKU], [RL], [R2], but our method has the advantage of being very efficient. It requires $O(m\alpha(m,n) + l)$ operations if extended bit vector operations have unit cost, where n is the number of vertices in the control flow graph of the program, m is the number of edges, l is the length of the program text, and $\alpha$ is related to a functional inverse of Ackermann's function [T2]. Our method does not require that the program be well-structured nor that the flow graph be reducible.



82    J. H. REIF AND R. E. TARJAN
Let Θ be the set of function signs occurring in the program. For simplicity, we assume a domain D such that every k-ary function represented by a sign in Θ has the same domain D^k. Let C be a set of constant signs containing a unique sign for every element in D. Let EXP be the set of expressions built from entry variables, constant signs in C, and function signs in Θ. To each expression ℰ ∈ EXP corresponds a unique reduced expression R formed by repeatedly substituting the appropriate constant sign for each subexpression of ℰ consisting of a function sign applied to constant signs.
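The reduction step can be sketched as a small constant-folding pass. This is an illustrative sketch, not the paper's representation: the tuple encoding and the `funcs` interpretation table are assumptions, since the paper treats function signs abstractly.

```python
def reduce_expr(e, funcs):
    """Return the reduced expression for e: repeatedly fold every
    subexpression that is a function sign applied only to constant signs.

    e is ('const', c), ('var', x) for an entry variable, or
    ('app', f, args); funcs maps each function sign to the k-ary
    function it denotes on the domain D.
    """
    if e[0] != 'app':
        return e                      # constants and variables are already reduced
    args = tuple(reduce_expr(a, funcs) for a in e[2])
    if all(a[0] == 'const' for a in args):
        # a function sign applied to constant signs: substitute the
        # constant sign denoting the resulting domain element
        return ('const', funcs[e[1]](*(a[1] for a in args)))
    return ('app', e[1], args)

funcs = {'+': lambda a, b: a + b, '*': lambda a, b: a * b}
e = ('app', '+', (('app', '*', (('const', 2), ('const', 3))), ('var', 'X')))
print(reduce_expr(e, funcs))  # ('app', '+', (('const', 6), ('var', 'X')))
```

The inner product of constants folds to a constant sign, while the subexpression containing the entry variable X is left symbolic, matching the definition above.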
For any expression ℰ ∈ EXP and any execution of the program π, the value of ℰ on exit from a vertex v is defined as follows: if ℰ contains an entry variable X^u such that control has never entered u, then the value of ℰ is undefined. Otherwise the value of ℰ is computed by substituting for each entry variable X^u the value of X when control last entered u, and evaluating the resulting expression.
For each vertex v ∈ V and program variable X ∈ Σ defined at v, the exit expression ℰ(X, v) ∈ EXP is formed as follows. Begin by letting the expression ℰ be X. Process each assignment statement of v, starting from the last assignment defining X and working backwards to the first assignment in v. To process an assignment Y := ℰ', replace each occurrence of Y in ℰ by ℰ'. After all assignments are processed, reduce ℰ and replace each occurrence of a variable Y by the corresponding entry variable Y^v. The resulting exit expression ℰ(X, v) represents the value of X on exit from v in terms of constants and values of variables on entry to v.
For example, ℰ(Z, v2) = Z^{v2} + (X^{v2} * Y^{v2}) represents the value of Z on exit from vertex v2 in the flow graph of Fig. 1.
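The backward-substitution construction of exit expressions can be sketched directly. The tuple encoding and the sample vertex are illustrative assumptions, and the final reduction and entry-variable relabeling steps are omitted for brevity.

```python
def substitute(e, y, rhs):
    """Replace each occurrence of variable y in expression e by rhs."""
    if e == ('var', y):
        return rhs
    if e[0] == 'app':
        return ('app', e[1], tuple(substitute(a, y, rhs) for a in e[2]))
    return e

def exit_expr(x, assignments):
    """Exit expression of variable x for a vertex whose assignment
    statements are given in program order as (target, expr) pairs.

    Starts from the last assignment defining x and works backwards,
    substituting right-hand sides for assigned variables; any remaining
    ('var', y) leaf then stands for the value of y on entry to the vertex.
    """
    defining = [i for i, (t, _) in enumerate(assignments) if t == x]
    if not defining:
        return ('var', x)              # x not assigned here: its entry value
    e = assignments[defining[-1]][1]
    for target, rhs in reversed(assignments[:defining[-1]]):
        e = substitute(e, target, rhs)
    return e

# a hypothetical vertex containing:  Y := X + Y;  Z := Z + (X * Y)
block = [('Y', ('app', '+', (('var', 'X'), ('var', 'Y')))),
         ('Z', ('app', '+', (('var', 'Z'),
                             ('app', '*', (('var', 'X'), ('var', 'Y'))))))]
print(exit_expr('Z', block))
# ('app', '+', (('var', 'Z'),
#               ('app', '*', (('var', 'X'),
#                             ('app', '+', (('var', 'X'), ('var', 'Y')))))))
```

The Y in Z's right-hand side is rewritten by the earlier assignment, yielding Z + (X * (X + Y)) over entry values, the same shape as the covering expressions of Fig. 2.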
A text expression is any subexpression of an exit expression ℰ(X, v) (including the expression itself); we say the text expression occurs at v. An expression ℰ ∈ EXP covers a text expression t occurring at v if for any execution of program π, ℰ and t have the same value on any exit from v. See Fig. 1.

    Y := X + Y
    Z := Z + (X * Y)
    X := Z

FIG. 1. A program flow graph.
This definition implies that if X^u appears in ℰ then u dominates v. Thus there is a unique vertex v which is minimal (i.e., closest to the start vertex) with respect to the dominator relation and such that, for all entry variables X^u in ℰ, u dominates v. We call such a vertex the origin of ℰ; it is the earliest point in the program at which ℰ can be computed.
A cover of π is a mapping ψ from all text expressions to reduced expressions in EXP such that, for each text expression t, ψ(t) covers t. We would like to construct a cover whose origins are minimal with respect to the dominator relation.
We can use such a cover for constant propagation: if a constant sign c covers a text expression t, we may substitute c in line in the program text for the computation associated with t.
We can also use a cover in code motion. If we define the birthpoint of a text expression t to be the minimal vertex to which the computation of t may be moved, then the birthpoint of t is precisely the origin of a minimal cover of t. For example, in Fig. 1 the birthpoint of the text expression t = X^{v2} * Y^{v2} is v1; X^{v1} * (X^{v1} + Y^{v1}) covers t.
    Text expression                           Covering expression
    X^{v2} * Y^{v2}                           X^{v1} * (X^{v1} + Y^{v1})
    ℰ(Z, v2) = Z^{v2} + (X^{v2} * Y^{v2})     Z^{v1} + (X^{v1} * (X^{v1} + Y^{v1}))

FIG. 2. Symbolic analysis of the program in Fig. 1.

Code motion requires approximations to birthpoints (i.e., vertices which are dominated by the true birthpoints) and other knowledge, including knowledge of the cycle structure of the flow graph of π. (We may not wish to move code as far as the birthpoint since the birthpoint may be contained in control cycles avoiding the original location of the code.)
[R1] presents efficient algorithms which utilize approximate birthpoints for code motion optimization. See [AU], [CA], [E], [G] for further discussion of code motion optimizations. Other practical uses of covers have been made by [FK] in their optimizing Pascal compiler.
Unfortunately, for programs which manipulate the natural numbers using ordinary arithmetic, the problem of computing a minimal cover is recursively unsolvable [R2]. The usual approach in program optimization is to trade accuracy for speed; [FKU], [Ki], [RL], [R2] present fast algorithms which compute reasonably good covers whose origins yield approximate birthpoints. The fastest of these [RL], [R2] has a time bound almost linear in m · |Σ| + l, where l is the length of the program text.
In this paper we describe a very fast algorithm for computing a rather weak cover. This simple cover can be used directly for code optimization, or it can serve as input to a more powerful method for symbolic evaluation presented in [RL], [R2]. From a data structure called a global value graph (which is related to the use-definition chains of [AU], [Sc] used to represent the flow of values through a program), the algorithm of [RL], [R2] constructs a cover which yields better approximate birthpoints than does the simple cover. This algorithm runs in time almost linear in the size of the input global value graph, which is very compact when constructed from the simple cover [RL], [R2].
In order to define the simple cover we need one more concept. A variable X is definition-free between distinct vertices u and v if no u-avoiding path from a successor of u to a predecessor of v contains a definition of X. By convention any program variable X is definition-free between v and v for any vertex v. For any entry variable X^v which is a text expression, the simple origin of X^v is the minimal vertex u (with respect to the dominator relation) such that X is definition-free between u and v. In the example of Fig. 1, X^{v2} has simple origin r, and Y^{v2} and Z^{v2} have simple origin v1. If X^v has simple origin u ≠ v, then on any execution of π the program variable X has the same value on entry to v as it did after the most recent execution of u; we take the simple origin as an approximation to the birthpoint of X^v.
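Simple origins can be computed naively straight from the definition-free condition, which is useful as a correctness reference even though it is far slower than the bit-vector method developed later in the paper. The graph encoding, helper names, and sample loop below are illustrative assumptions.

```python
def definition_free(succ, defs, x, u, v):
    """True if x is definition-free between u and v: no u-avoiding path
    from a successor of u to a predecessor of v contains a definition of x."""
    if u == v:
        return True                       # definition-free by convention
    preds = {}
    for p, ws in succ.items():
        for w in ws:
            preds.setdefault(w, []).append(p)

    def search(starts, edges):
        # all vertices reachable from starts along edges, never visiting u
        seen, stack = set(), [s for s in starts if s != u]
        while stack:
            y = stack.pop()
            if y not in seen:
                seen.add(y)
                stack.extend(z for z in edges.get(y, ()) if z != u)
        return seen

    fwd = search(succ.get(u, ()), succ)    # reachable from successors of u
    bwd = search(preds.get(v, ()), preds)  # can reach predecessors of v
    return not any(x in defs.get(y, ()) for y in fwd & bwd)

def simple_origin(succ, defs, idom, x, v):
    """Walk v's dominator chain and return the vertex closest to the
    start between which and v the variable x is definition-free."""
    best, u = v, v
    while u is not None:
        if definition_free(succ, defs, x, u, v):
            best = u
        u = idom.get(u)
    return best

# a loop r -> v1 -> v2 -> v1, with Y defined at v1 and Z defined at v2
succ = {'r': ['v1'], 'v1': ['v2'], 'v2': ['v1']}
defs = {'v1': {'Y'}, 'v2': {'Z'}}
idom = {'v1': 'r', 'v2': 'v1'}
print(simple_origin(succ, defs, idom, 'Y', 'v2'))  # v1
print(simple_origin(succ, defs, idom, 'X', 'v2'))  # r (X is never defined)
```

Y's origin stops at v1 because the cycle back edge lets the definition of Y at v1 reach v2 on an r-avoiding path, while X, never defined, climbs all the way to the start vertex.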
We recursively define the simple cover ψ using simple origins. If t contains no entry variables then ψ(t) = t. Otherwise we form ψ(t) from t by applying the following transformation. (i) Repeat the following step for all entry variables X^v occurring in t: let u be the simple origin of X^v. If u = v, do nothing. Otherwise replace X^v in t by ψ(ℰ(X, u)) if X is defined at u, or by X^u if X is not defined at u. (ii) Reduce the resulting expression.
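Step (i) of the transformation can be sketched recursively, assuming precomputed maps for simple origins, exit expressions (written over entry variables), and the variables defined at each vertex; step (ii), the reduction, is omitted here, and all encodings are illustrative assumptions.

```python
def simple_cover(t, origin, exit_e, defs):
    """Apply step (i) of the simple-cover transformation to text
    expression t.

    Entry variables are ('entry', X, v); origin maps (X, v) to the
    simple origin of X^v; exit_e maps (X, u) to the exit expression
    of X at u, itself written over entry variables at u.
    """
    if t[0] == 'entry':
        x, v = t[1], t[2]
        u = origin[(x, v)]
        if u == v:
            return t                                   # already at its origin
        if x in defs.get(u, ()):
            # X is defined at u: expand through its exit expression there
            return simple_cover(exit_e[(x, u)], origin, exit_e, defs)
        return ('entry', x, u)                         # same value as on entry to u
    if t[0] == 'app':
        return ('app', t[1],
                tuple(simple_cover(a, origin, exit_e, defs) for a in t[2]))
    return t                                           # constant sign

# Y^v2 has simple origin v1, where Y is defined with exit expression X^v1 + Y^v1
origin = {('Y', 'v2'): 'v1', ('X', 'v1'): 'v1', ('Y', 'v1'): 'v1'}
exit_e = {('Y', 'v1'): ('app', '+', (('entry', 'X', 'v1'), ('entry', 'Y', 'v1')))}
defs = {'v1': {'Y'}}
print(simple_cover(('entry', 'Y', 'v2'), origin, exit_e, defs))
# ('app', '+', (('entry', 'X', 'v1'), ('entry', 'Y', 'v1')))
```

The entry variable at v2 is rewritten into an expression over entry variables at its simple origin v1, mirroring how the covering expressions of Fig. 2 are phrased at v1.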
Our algorithm for computing the simple cover consists of three parts, described in §§2-4 of this paper. First, we determine for each vertex v the set of program variables defined between the immediate dominator of v and v itself. We call this set of variables idef(v). The idef computation can be regarded as a path problem of the kind studied in [GW], [T3], but another approach is more fruitful: a straightforward modification of the dominator-finding algorithm of [LT] computes idef in O(mα(m, n) + l) time, assuming that logical bit vector operations on vectors of length |Σ| have unit cost, where l is the length of the program text and α is related to an inverse of Ackermann's function [T2]. Second, we use idef to compute the simple origins of all entry variables appearing as text expressions. This computation requires a variable-length shift operation on bit vectors (shift left to the first nonzero bit) and requires O(n + l) time. Third, we construct a directed acyclic graph representing the simple cover (we save space by combining common subexpressions). This algorithm also requires O(n + l) time but uses no bit vector operations. The total running time of our algorithm is thus O(mα(m, n) + l) if extended bit vector operations require constant time.

2. An algorithm for computing idef based on finding dominators. In this section we shall describe an algorithm for computing idef(v) for all vertices v ∈ V in the flow graph G = (V, E, r) of a computer program. We obtain the algorithm by adding appropriate extra steps to the dominators algorithm of [LT], and we shall assume that the reader is familiar with [LT].
Our algorithm requires def(w) = {X | X is defined at w} for each vertex w ∈ V as input and uses set union as a basic operation. If each subset of Σ is represented as a bit vector of length |Σ|, then a set union is equivalent to an "or" operation on bit vectors; we shall assume each set union requires constant time. Construction of def(w) for all vertices w is easy and requires time proportional to the length of the program text.
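In Python the bit-vector representation can be sketched with integer bitmasks, so that a set union really is a single "or" as the cost model assumes; the variable indexing below is an illustrative assumption.

```python
sigma = ['X', 'Y', 'Z']                      # the program variables
bit = {x: 1 << i for i, x in enumerate(sigma)}

def as_mask(variables):
    """Encode a subset of sigma as a bit vector (one int)."""
    m = 0
    for x in variables:
        m |= bit[x]
    return m

def as_set(mask):
    """Decode a bit vector back to a set of variable names."""
    return {x for x in sigma if mask & bit[x]}

def_v1 = as_mask({'Y'})
def_v2 = as_mask({'Z'})
union = def_v1 | def_v2                      # set union = one "or" operation
print(sorted(as_set(union)))                 # ['Y', 'Z']
```

Since Python ints are arbitrary precision, one mask covers a Σ of any size, though the unit-cost assumption of course only holds while |Σ| fits a machine word.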
Properties of idef. For any vertex w ≠ r, let idom(w) be the immediate dominator of w in G. For w ≠ r, we define

    idef(w) = ∪ {def(v) | there is a nonempty path from v to w which avoids idom(w)}.

Note that def(w) is a term in the union defining idef(w) if and only if there is a cycle containing w but avoiding idom(w).
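idef can be computed naively straight from this definition, by a backward search from w that never visits idom(w); this quadratic reference implementation (the paper's point is precisely to avoid it) uses an illustrative graph encoding.

```python
def idef_naive(succ, defs, idom, w):
    """Union of def(v) over all v with a nonempty path from v to w
    that avoids idom(w)."""
    avoid = idom[w]
    preds = {}
    for p, ys in succ.items():
        for y in ys:
            preds.setdefault(y, []).append(p)
    # backward search from w through vertices distinct from idom(w)
    seen, stack = set(), [p for p in preds.get(w, ()) if p != avoid]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(p for p in preds.get(v, ()) if p != avoid)
    out = set()
    for v in seen:
        out |= set(defs.get(v, ()))
    return out

succ = {'r': ['v1'], 'v1': ['v2'], 'v2': ['v1']}   # a loop between v1 and v2
defs = {'v1': {'Y'}, 'v2': {'Z'}}
idom = {'v1': 'r', 'v2': 'v1'}
print(sorted(idef_naive(succ, defs, idom, 'v1')))  # ['Y', 'Z']: the cycle avoids r
print(sorted(idef_naive(succ, defs, idom, 'v2')))  # []: every path to v2 passes v1
```

Note that v1 itself lands in its own idef set here, illustrating the remark above: the cycle v1 -> v2 -> v1 contains v1 but avoids idom(v1) = r.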
To compute idom and idef, we first perform a depth-first search on G, starting from vertex r and numbering the vertices from 1 to n as they are reached during the search. The search generates a spanning tree T rooted at r, with vertices numbered in preorder [T1]. For convenience in stating our results, we shall assume in this subsection that all vertices are identified by number, and we shall use → and →* to denote ancestor-descendant relationships in T (see the appendix).

SYMBOLIC
PROGRAM
ANALYSIS
IN
ALMOST-LINEAR
TIME
85
FIG. 3. Depth-first search of the flow graph given in Fig. 1. Solid edges denote tree edges and dotted edges denote nontree edges. The depth-first search number is given to the right of each vertex.
    vertex   number   idom   sdom   def      idef     sdef
    v1       2        r      r      {Y}      {Y, Z}   {Y, Z}
    v2       3        v1     v1     {Z}      ∅        ∅
    v3       4                      {Y, Z}
    v4       5        v1            {X}      {Y, Z}
    v5       6

FIG. 4. Tabulation of information calculated for the program flow graph given in Fig. 1.
The following paths lemma is an important property of depth-first search and is crucial to the correctness of our algorithm.

LEMMA 2.1 [T1]. If v and w are vertices of G such that v ≤ w, then any path from v to w must contain a common ancestor of v and w in T.
As an intermediate step, the dominators algorithm computes a value for each vertex w ≠ r called its semi-dominator, denoted sdom(w) and defined by

(2)    sdom(w) = min {v | there is a path v = v0, v1, ..., vk = w such that vi > w for 1 ≤ i < k}.
We shall in addition compute a value sdef(w) for each vertex w ≠ r, defined by

    sdef(w) = ∪ {def(v) | there is a nonempty path v = v0, v1, ..., vk = w such that vi ≥ w for 0 ≤ i ≤ k}.
The following properties of semi-dominators and dominators justify the dominators algorithm.

LEMMA 2.2 [LT]. Let w ≠ r. Then idom(w) →* sdom(w) →* w.
THEOREM 2.1 [LT]. For any vertex w ≠ r,

(3)    sdom(w) = min({v | (v, w) ∈ E and v < w} ∪ {sdom(u) | u > w and there is an edge (v, w) such that u →* v}).

References

- A unified approach to global program optimization
- A fast algorithm for finding dominators in a flowgraph
- Variations on the Common Subexpression Problem
- Applications of Path Compression on Balanced Trees
- A Unified Approach to Path Problems