Code Generation Using Tree Matching and

Dynamic Programming

ALFRED V. AHO

AT&T Bell Laboratories

MAHADEVAN GANAPATHI

Stanford University

and

STEVEN W. K. TJIANG

AT&T Bell Laboratories

Compiler-component generators, such as lexical analyzer generators and parser generators, have long

been used to facilitate the construction of compilers. A tree-manipulation language called twig has

been developed to help construct efficient code generators. Twig transforms a tree-translation scheme

into a code generator that combines a fast top-down tree-pattern matching algorithm with dynamic

programming. Twig has been used to specify and construct code generators for several experimental

compilers targeted for different machines.

Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors-code generation,

compilers, optimization compiler generators; F.2.2 [Analysis of Algorithms and Problem

Complexity]: Nonnumerical Algorithms and Problems-pattern matching; F.4.2 [Mathematical

Logic and Formal Languages]: Grammars and Other Rewriting Systems-parallel rewriting

systems

General Terms: Algorithms

Additional Key Words and Phrases: Code generation, code generator-generator, code optimization,

dynamic programming, pattern matching

1. INTRODUCTION

Research in code generation has yielded theoretical insights and practical

techniques [7, 21, 37]. On the theoretical front, efficient algorithms for generating

provably optimal code on broad classes of uniform-register machines have been

developed for expressions with no common subexpressions [3, 40]. However, once

common subexpressions are encountered or optimal code needs to be generated

for machines with irregular architectures, the problem of optimal code generation

Authors’ current addresses: A. V. Aho, AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill,

N.J. 07974; M. Ganapathi and S. W. K. Tjiang, Stanford University, Department of Computer

Science, Stanford, CA 94305.

Permission to copy without fee all or part of this material is granted provided that the copies are not

made or distributed for direct commercial advantage, the ACM copyright notice and the title of the

publication and its date appear, and notice is given that copying is by permission of the Association

for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific

permission.

© 1989 ACM 0164-0925/89/1000-0491$01.50

ACM Transactions on Programming Languages and Systems, Vol.

11, No. 4, October 1989, Pages 491-516.

has been proven to be combinatorially difficult [4, 10], and heuristic techniques

for generating good code have been proposed and theoretically analyzed [4, 5].

On the experimental front, several innovative approaches to retargetable code

generation have been pursued. These approaches have focused on the use of

table-driven techniques to separate the machine description from the code-generation

algorithm. Compilers based on some of these techniques have been

easily retargeted [11, 13, 17, 25, 32, 46].

This paper presents a new language called twig that encapsulates some of these

theoretical and experimental advances into a tree-based notation for describing

and implementing code generators. The language builds on the experience of

grammar-based descriptions of code generators. A compiler for twig has been

constructed that combines an efficient tree-pattern matching algorithm along

with a dynamic programming algorithm for optimal code selection. Twig has

been used by the authors to construct several code generators, including one for

the VAX that has been incorporated into the pcc2 compiler [32] and one for

the MIPS-X project [12]. Twig has also been used by A. W. Appel to construct

code generators for the VAX and the Motorola 68020 [9]. In addition to producing

traditional code generators for compilers, twig can be used as a tool for creating

tree-rewriting and tree-manipulation programs. In this vein, K. Keutzer and W.

Wolf have used twig to construct a standard-cell synthesizer for VLSI circuits

[33, 34].

2. CODE GENERATION BY TREE REWRITING

Simply speaking, a compiler consists of a front end that analyzes the source

program and transforms it into an intermediate representation (IR), and a back

end that transforms the IR into the target program [7]. Many factors are involved

in choosing an appropriate IR, but in most cases the IR is some encoding of a

graphical representation of the source program. In this paper, it is sufficient to

assume the IR is a sequence of trees at the semantic level of the target machine

as in [18, 23, 29].

Figure 1 shows an IR tree for an assignment statement a[i] := b in which a

and i are locals, stored on the stack, whose run-time addresses are given as

offsets, const_a and const_i, from a stack pointer stored in register SP. The

leaves in the tree are type attributes with subscripts; the subscript indicates the

value of the attribute.

The assignment to a[i] is an indirect assignment in which the contents of

the location for a[i] are set to the r-value of the global b. The address of the

first element of the array a is found by adding the value const_a to the contents

of register SP; the value of i is in the location obtained by adding the value

const_i to the contents of register SP.

In the tree, the ind operator makes its argument a memory address. As the

left child of an assignment operator, the ind node gives the location into which

the r-value on the right side of the assignment operator is to be stored. If an

argument of a + or ind operator is a memory location or a register, then the

contents of that memory location or register are taken as the value.

For code generation, the target-machine instructions can be represented by

tree-rewriting rules, consisting of a replacement node, a tree template, a cost,

                   :=
                  /  \
               ind    global_b
                |
                +
             /     \
          +          ind
         / \          |
  const_a   reg_SP    +
                     / \
              const_i   reg_SP

Fig. 1. Intermediate-code tree for a[i] := b.

and an action. The target code is generated by a process in which each IR tree is

reduced into a single node by repeatedly finding subtrees in the IR tree that

match templates and rewriting the matched subtrees by the corresponding

replacement nodes. The sequence of subtrees rewritten in this process is called a

cover of the IR tree. The target code is emitted by the actions associated with

the rules used in the cover, and the total cost is the sum of the costs of the

covering rules.

To be more precise, a tree-rewriting rule is a statement of the form

replacement ← template (cost) = {action}

where

(1) replacement is a single node,

(2) template is a tree,

(3) cost is a code fragment that computes the cost associated with this template,

and

(4) action is a code fragment.

A set of tree-rewriting rules is called a tree-translation scheme.
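
As an illustration only, the four parts of a rule can be modeled as a small record. The following Python sketch is not twig's notation: the names Rule, Template, add_rr, and scheme, as well as the binding keys ("Ri", "Rj", "cost.reg_i", "cost.reg_j"), are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple, Union

# A template is either a leaf label or a tuple (operator, child templates...).
Template = Union[str, Tuple]

@dataclass(frozen=True)
class Rule:
    replacement: str                  # label of the single replacement node
    template: Template                # tree to be matched against the IR
    cost: Callable[[dict], int]       # code fragment computing the cost
    action: Callable[[dict], str]     # code fragment run when the rule is used

# Hypothetical encoding of a register-to-register add.
add_rr = Rule(
    replacement="reg_i",
    template=("+", "reg_i", "reg_j"),
    cost=lambda b: 1 + b["cost.reg_i"] + b["cost.reg_j"],
    action=lambda b: f"ADD {b['Rj']}, {b['Ri']}",
)

# A tree-translation scheme is simply a collection of such rules.
scheme = [add_rr]
```

Here the cost and action are ordinary callables that receive the bindings established by a match, which mirrors the description of both as code fragments.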

A tree-translation scheme is a convenient way to represent the instruction-

selection phase of code generation. Each tree template represents a computation

performed by one or more target machine instructions. The leaves of a template

are attributes with subscripts, as in the IR tree. Often, certain restrictions apply

to the values of the subscripts in the templates. For example, a constant may be

required to fall in a certain range. These restrictions can be specified as semantic

predicates in the cost function or the action, and these predicates must be

satisfied before a template can match a subtree of the IR tree. Register allocation

is done by the user-specified actions.

As an example of a tree-rewriting rule, consider the rule for a register-to-register

add instruction ADD Rj, Ri:

reg_i ←    +
          / \
     reg_i   reg_j

If the IR tree contains a subtree that matches this tree template, that is, a subtree

whose root is labeled by the operator + and whose left and right children are

quantities in registers i and j, then we might replace that subtree by a single node

Table I. Tree-Rewriting Rules for Some Target-Machine Instructions

     Rewrite rule                   Cost                          Instruction

(1)  reg_i ← const_c                                              MOV #c, Ri

(2)  reg_i ← mem_a                                                MOV a, Ri

(3)  x ←      :=                    2 + cost.reg_i                MOV Ri, a
             /  \
        mem_a    reg_i

(4)  x ←      :=                    2 + cost.reg_i                MOV b, *Ri
             /  \
          ind    global_b
           |
         reg_i

(5)  reg_i ←   ind                  2 + cost.reg_j                MOV c(Rj), Ri
                |
                +
               / \
        const_c   reg_j

(6)  reg_i ←    +                   2 + cost.reg_i + cost.reg_j   ADD c(Rj), Ri
              /   \
         reg_i     ind
                    |
                    +
                   / \
            const_c   reg_j

(7)  reg_i ←    +                   1 + cost.reg_i + cost.reg_j   ADD Rj, Ri
               / \
          reg_i   reg_j

(8)  reg_i ←    +                   1 + cost.reg_i                INC Ri
               / \
          reg_i   const_1

labeled reg_i, simulating the execution of the instruction ADD Rj, Ri. If more than

one template can match a subtree or a portion thereof, then dynamic programming

is used to determine a minimum-cost cover.

Table I contains tree-rewriting rules for a few instructions for a VAX-like

target machine. Instead of showing the code for the actions, we have shown the

machine instruction that is generated by each rule. The first two rules correspond

to load instructions, the next two to store instructions, and the remainder to

indexed loads and additions. Note that rule (8) requires the value of the constant

to be 1. This condition can be enforced by a semantic predicate in the cost.
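
A semantic predicate of this kind can be pictured as a guard inside the cost computation. The Python sketch below is a hedged illustration, not twig code: the function name inc_cost is hypothetical, and returning an infinite cost is an assumption standing in for whatever mechanism the generator uses to reject a match.

```python
import math

def inc_cost(const_value: int, cost_reg_i: int) -> float:
    """Cost of rule (8), INC Ri; the rule applies only when the constant is 1."""
    if const_value != 1:
        # Predicate fails: make the rule unusable for this subtree.
        return math.inf
    # Predicate holds: 1 plus the cost of getting the operand into Ri.
    return 1 + cost_reg_i
```

With this convention a failed predicate never wins a cost comparison, so the more general add rule is chosen whenever the constant is not 1.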

A tree-translation scheme generates code from an IR tree in the following way.

All templates in the tree-rewriting rules are matched against the subtrees of the

IR tree during a depth-first traversal of the tree. At each node, the costs are used

to determine the best match, and the selected subtree is replaced in the IR tree

by the associated replacement node. Sometimes the replacement is delayed until

the cost of another larger including match can be evaluated. By this process a

minimum-cost cover for the IR tree is found.
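
The cost computation described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm twig implements: it uses a naive recursive matcher rather than a fast top-down automaton, it collapses all registers into one label "reg", and the names Node, Rule, RULES, match, and best are invented for the example. The fixed costs follow rules (5), (6), and (7) of Table I; the cost 2 for rule (1) is an assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    op: str            # "+", "ind", "const", or "reg"
    kids: tuple = ()

@dataclass(frozen=True)
class Rule:
    name: str          # instruction the rule stands for
    result: str        # label of the replacement node
    template: tuple    # (op, child, ...); a bare string child is a label
    cost: int          # cost of the rule itself, excluding its operands

RULES = [
    Rule("MOV #c,Ri", "reg", ("const",), 2),          # cost 2 is assumed
    Rule("MOV c(Rj),Ri", "reg", ("ind", ("+", ("const",), "reg")), 2),
    Rule("ADD c(Rj),Ri", "reg",
         ("+", "reg", ("ind", ("+", ("const",), "reg"))), 2),
    Rule("ADD Rj,Ri", "reg", ("+", "reg", "reg"), 1),
]

def match(template, node):
    """Return [(subtree, label), ...] for the template's label leaves, or None."""
    if isinstance(template, str):
        return [(node, template)]
    op, *kids = template
    if node.op != op or len(node.kids) != len(kids):
        return None
    bound = []
    for t, k in zip(kids, node.kids):
        sub = match(t, k)
        if sub is None:
            return None
        bound += sub
    return bound

def best(node, memo=None):
    """Map each label reachable at `node` to (minimum cost, rule name)."""
    memo = {} if memo is None else memo
    if id(node) in memo:
        return memo[id(node)]
    # A value that is already in a register costs nothing.
    table = {"reg": (0, None)} if node.op == "reg" else {}
    for rule in RULES:
        bound = match(rule.template, node)
        if bound is None:
            continue
        total = rule.cost
        for sub, label in bound:
            entry = best(sub, memo).get(label)
            if entry is None:
                break          # an operand cannot be reduced to the label
            total += entry[0]
        else:
            if rule.result not in table or total < table[rule.result][0]:
                table[rule.result] = (total, rule.name)
    memo[id(node)] = table
    return table

# Address expression of Figure 1: (const_a + reg_SP) + ind(const_i + reg_SP)
sp = Node("reg")
address = Node("+", (Node("+", (Node("const"), sp)),
                     Node("ind", (Node("+", (Node("const"), sp)),))))
print(best(address)["reg"])
```

Under these assumed costs the root is covered by the larger template of rule (6) at total cost 5, whereas covering it with rule (7) on top of rule (5) would cost 6.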

Then a second depth-first traversal of the original IR tree is made and the

actions associated with the rules used in the cover are executed. If an action

emits a sequence of target-machine instructions, the instructions become part of

the output. The sequence of machine instructions thus generated constitutes the

output of the tree-translation scheme.
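
The second traversal can be pictured as a depth-first walk over the chosen cover that runs each rule's action after the actions of its operands. The Python sketch below is illustrative only: the names Reduction and emit are hypothetical, and the cover is written out by hand from the instructions named in the Figure 1 walkthrough rather than computed.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Reduction:
    """One step of a cover: the instruction its action emits (None for an
    operand that needs no code) and the reductions of its operands."""
    instruction: Optional[str]
    operands: List["Reduction"] = field(default_factory=list)

def emit(step: Reduction, out: List[str]) -> List[str]:
    """Depth-first: operands first, then the instruction that consumes them."""
    for operand in step.operands:
        emit(operand, out)
    if step.instruction is not None:
        out.append(step.instruction)
    return out

sp = Reduction(None)                                  # SP is already a register
load_a = Reduction("MOV #a, R0")                      # rule (1)
add_sp = Reduction("ADD SP, R0", [load_a, sp])        # rule (7)
add_i = Reduction("ADD i(SP), R0", [add_sp, sp])      # rule (6)

code = emit(add_i, [])
```

Emitting in this order guarantees that each instruction appears after the instructions that compute its operands.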

To illustrate, let us use the tree-translation scheme in Table I to process the

IR tree in Figure 1. The template of the first rule

reg_0 ← const_a

matches the leftmost leaf of the IR tree with i = 0 and c = a. If we use this rule,

the label of the leftmost leaf is changed from const_a to reg_0, and during

the second traversal the instruction MOV #a, R0 will be generated to load the

constant a into register R0. The template of the seventh rule with i = 0 and

j = SP

reg_0 ←    +
          / \
     reg_0   reg_SP

now matches the leftmost subtree with root labeled +. Using this rule, we would

rewrite this subtree into a single node labeled reg_0 and later generate the

instruction ADD SP, R0. Now the tree looks like

                   :=
                  /  \
               ind    global_b
                |
                +
             /     \
        reg_0        ind
                      |
                      +
                     / \
              const_i   reg_SP

At this point, we could apply rule (5) to reduce the subtree

       ind
        |
        +
       / \
const_i   reg_SP

to a single node labeled reg_1. However, we can also use rule (6) to reduce the

larger subtree

         +
       /   \
  reg_0     ind
             |
             +
            / \
     const_i   reg_SP

into a single node labeled reg_0 and later generate the instruction ADD i(SP),

R0. Assuming it is more efficient to use a single instruction to compute the larger
