
Code generation using tree matching and dynamic programming

Susan L. Graham
- 01 Oct 1989 - 
- Vol. 11, Iss: 4, pp 491-516

Code Generation Using Tree Matching and
Dynamic Programming
ALFRED V. AHO
AT&T Bell Laboratories
MAHADEVAN GANAPATHI
Stanford University
and
STEVEN W. K. TJIANG
AT&T Bell Laboratories
Compiler-component generators, such as lexical analyzer generators and parser generators, have long
been used to facilitate the construction of compilers. A tree-manipulation language called twig has
been developed to help construct efficient code generators. Twig transforms a tree-translation scheme
into a code generator that combines a fast top-down tree-pattern matching algorithm with dynamic
programming. Twig has been used to specify and construct code generators for several experimental
compilers targeted for different machines.
Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors-code generation,
compilers, optimization, compiler generators; F.2.2 [Analysis of Algorithms and Problem
Complexity]: Nonnumerical Algorithms and Problems-pattern matching; F.4.2 [Mathematical
Logic and Formal Languages]: Grammars and Other Rewriting Systems-parallel rewriting
systems
General Terms: Algorithms
Additional Key Words and Phrases: Code generation, code generator-generator, code optimization,
dynamic programming, pattern matching
Authors' current addresses: A. V. Aho, AT&T Bell Laboratories, 600 Mountain Avenue, Murray Hill,
N.J. 07974; M. Ganapathi and S. W. K. Tjiang, Stanford University, Department of Computer
Science, Stanford, CA 94305.
© 1989 ACM 0164-0925/89/1000-0491 $01.50
ACM Transactions on Programming Languages and Systems, Vol. 11, No. 4, October 1989, Pages 491-516.
1. INTRODUCTION
Research in code generation has yielded theoretical insights and practical
techniques [7, 21, 37]. On the theoretical front, efficient algorithms for generating
provably optimal code on broad classes of uniform-register machines have been
developed for expressions with no common subexpressions [3, 40]. However, once
common subexpressions are encountered or optimal code needs to be generated
for machines with irregular architectures, the problem of optimal code generation
has been proven to be combinatorially difficult [4, 10], and heuristic techniques
for generating good code have been proposed and theoretically analyzed [4, 5].
On the experimental front, several innovative approaches to retargetable code
generation have been pursued. These approaches have focused on the use of
table-driven techniques to separate the machine description from the
code-generation algorithm. Compilers based on some of these techniques have been
easily retargeted [11, 13, 17, 25, 32, 46].
This paper presents a new language called twig that encapsulates some of these
theoretical and experimental advances into a tree-based notation for describing
and implementing code generators. The language builds on the experience of
grammar-based descriptions of code generators. A compiler for twig has been
constructed that combines an efficient tree-pattern matching algorithm with a
dynamic programming algorithm for optimal code selection. Twig has
been used by the authors to construct several code generators, including one for
the VAX that has been incorporated into the pcc2 compiler [32] and one for
the MIPS-X project [12]. Twig has also been used by A. W. Appel to construct
code generators for the VAX and the Motorola 68020 [9]. In addition to producing
traditional code generators for compilers, twig can be used as a tool for creating
tree-rewriting and tree-manipulation programs. In this vein, K. Keutzer and W.
Wolf have used twig to construct a standard-cell synthesizer for VLSI circuits
[33, 34].
2. CODE GENERATION BY TREE REWRITING
Simply speaking, a compiler consists of a front end that analyzes the source
program and transforms it into an intermediate representation (IR), and a back
end that transforms the IR into the target program [7]. Many factors are involved
in choosing an appropriate IR, but in most cases the IR is some encoding of a
graphical representation of the source program. In this paper, it is sufficient to
assume the IR is a sequence of trees at the semantic level of the target machine,
as in [18, 23, 29].
Figure 1 shows an IR tree for an assignment statement a[i] := b in which a
and i are locals, stored on the stack, whose run-time addresses are given as
offsets, const_a and const_i, from a stack pointer stored in register SP. The
leaves in the tree are type attributes with subscripts; the subscript indicates the
value of the attribute.
The assignment to a[i] is an indirect assignment in which the contents of
the location for a[i] are set to the r-value of the global b. The address of the
first element of the array a is found by adding the value const_a to the contents
of register SP; the value of i is in the location obtained by adding the value
const_i to the contents of register SP.
In the tree, the ind operator makes its argument a memory address. As the
left child of an assignment operator, the ind node gives the location into which
the r-value on the right side of the assignment operator is to be stored. If an
argument of a + or ind operator is a memory location or a register, then the
contents of that memory location or register are taken as the value.
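The IR tree of Figure 1 can be written down directly as a small data structure. A minimal sketch, assuming a Python encoding of our own rather than twig's notation (the names `Node`, `leaf`, and `fig1` are hypothetical): each node carries an operator or leaf type, and leaves also carry their subscript. Printing in prefix form makes the nesting of the two `+` nodes and the two `ind` nodes explicit.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """An IR tree node (hypothetical encoding). Leaves carry a subscript."""
    op: str                                         # ":=", "ind", "+", or a leaf type
    kids: List[Node] = field(default_factory=list)
    sub: Optional[str] = None                       # e.g. "a" in const_a

    def __str__(self) -> str:
        if not self.kids:
            return f"{self.op}_{self.sub}" if self.sub is not None else self.op
        return f"{self.op}(" + ", ".join(str(k) for k in self.kids) + ")"


def leaf(op: str, sub: str) -> Node:
    return Node(op, [], sub)


# Figure 1: a[i] := b, with a and i at offsets const_a and const_i from SP.
addr_a = Node("+", [leaf("const", "a"), leaf("reg", "sp")])               # address of a[0]
val_i = Node("ind", [Node("+", [leaf("const", "i"), leaf("reg", "sp")])])  # value of i
fig1 = Node(":=", [Node("ind", [Node("+", [addr_a, val_i])]), leaf("global", "b")])

print(fig1)  # :=(ind(+(+(const_a, reg_sp), ind(+(const_i, reg_sp)))), global_b)
```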
                   :=
                 /    \
               ind     global_b
                |
                +
              /   \
            +       ind
           / \       |
     const_a  reg_sp +
                    / \
             const_i   reg_sp
Fig. 1. Intermediate-code tree for a[i] := b.

For code generation, the target-machine instructions can be represented by
tree-rewriting rules, consisting of a replacement node, a tree template, a cost,
and an action. The target code is generated by a process in which each IR tree is
reduced into a single node by repeatedly finding subtrees in the IR tree that
match templates and rewriting the matched subtrees by the corresponding
replacement nodes. The sequence of subtrees rewritten in this process is called a
cover of the IR tree. The target code is emitted by the actions associated with
the rules used in the cover, and the total cost is the sum of the costs of the
covering rules.
To be more precise, a tree-rewriting rule is a statement of the form
replacement ← template {cost} = {action}
where
(1) replacement is a single node,
(2) template is a tree,
(3) cost is a code fragment that computes the cost associated with this template,
and
(4) action is a code fragment.
A set of tree-rewriting rules is called a tree-translation scheme.
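The four parts of a rule map naturally onto a record whose cost and action are small functions. A minimal sketch, assuming a Python encoding of our own (the class `Rule` and the helper `cover_cost` are hypothetical, not twig syntax). Here each rule's cost is its own local cost, so the cost of a cover is simply the sum over the rules used in it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """replacement <- template {cost} = {action}, as a plain record (hypothetical)."""
    replacement: str            # the single replacement node, e.g. "reg"
    template: tuple             # tree template in prefix form, e.g. ("+", "reg", "reg")
    cost: Callable[[], int]     # code fragment computing this rule's own cost
    action: Callable[[], str]   # code fragment run when the rule is used (returns asm text)


def cover_cost(cover: List[Rule]) -> int:
    """Total cost of a cover: the sum of the costs of the rules used in it."""
    return sum(rule.cost() for rule in cover)


# Two rules in the style of Table I, with local costs only (operand costs are
# not folded in, so summing over the cover does not count anything twice).
load_const = Rule("reg", ("const",), lambda: 2, lambda: "MOV #c, Ri")
add_regs = Rule("reg", ("+", "reg", "reg"), lambda: 1, lambda: "ADD Rj, Ri")

print(cover_cost([load_const, add_regs]))  # 3
```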
A tree-translation scheme is a convenient way to represent the instruction-
selection phase of code generation. Each tree template represents a computation
performed by one or more target machine instructions. The leaves of a template
are attributes with subscripts, as in the IR tree. Often, certain restrictions apply
to the values of the subscripts in the templates. For example, a constant may be
required to fall in a certain range. These restrictions can be specified as semantic
predicates in the cost function or the action, and these predicates must be
satisfied before a template can match a subtree of the IR tree. Register allocation
is done by the user-specified actions.
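One way to place a semantic predicate inside the cost is to make the cost infinite when the predicate fails, so a minimum-cost search can never choose that template for the subtree. A minimal sketch, assuming this infinite-cost convention (our choice, not necessarily twig's mechanism) and a hypothetical 0..63 range for a constant operand; `short_literal_cost` and `inc_cost` are hypothetical names, the latter mirroring the constant-equals-1 restriction on rule (8) of Table I.

```python
import math

# Hypothetical range for a constant operand (an assumption, not from Table I).
IMM_MIN, IMM_MAX = 0, 63


def short_literal_cost(c: int) -> float:
    """Cost of a hypothetical rule whose template requires IMM_MIN <= c <= IMM_MAX.

    The predicate lives in the cost: when it fails the cost is infinite,
    so a minimum-cost search never selects this template for the subtree.
    """
    if not IMM_MIN <= c <= IMM_MAX:
        return math.inf
    return 1


def inc_cost(c: int, reg_cost: float) -> float:
    """Rule (8) of Table I: cost 1 + cost.reg_i, but only when the constant is 1."""
    return 1 + reg_cost if c == 1 else math.inf


print(short_literal_cost(5), short_literal_cost(100))  # 1 inf
print(inc_cost(1, 2), inc_cost(2, 2))                   # 3 inf
```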
As an example of a tree-rewriting rule, consider the rule for a register-to-register
add instruction, ADD Rj, Ri:

    reg_i  ←    +
               / \
          reg_i   reg_j

If the IR tree contains a subtree that matches this tree template, that is, a subtree
whose root is labeled by the operator + and whose left and right children are
quantities in registers i and j, then we might replace that subtree by a single node
labeled reg_i, simulating the execution of the instruction ADD Rj, Ri. If more than
one template can match a subtree or a portion thereof, then dynamic programming
is used to determine a minimum-cost cover.

Table I. Tree-Rewriting Rules for Some Target-Machine Instructions

Rule  Rewrite rule                              Cost                         Instruction
(1)   reg_i ← const_c                           2                            MOV #c, Ri
(2)   reg_i ← mem_a                             2                            MOV a, Ri
(3)   x ← :=(mem_a, reg_i)                      2 + cost.reg_i               MOV Ri, a
(4)   x ← :=(ind(reg_i), global_b)              2 + cost.reg_i               MOV b, *Ri
(5)   reg_i ← ind(+(const_c, reg_j))            2 + cost.reg_j               MOV c(Rj), Ri
(6)   reg_i ← +(reg_i, ind(+(const_c, reg_j)))  2 + cost.reg_i + cost.reg_j  ADD c(Rj), Ri
(7)   reg_i ← +(reg_i, reg_j)                   1 + cost.reg_i + cost.reg_j  ADD Rj, Ri
(8)   reg_i ← +(reg_i, const_1)                 1 + cost.reg_i               INC Ri

Templates are written in prefix form: operator(left child, right child).
Table I contains tree-rewriting rules for a few instructions for a VAX-like
target machine. Instead of showing the code for the actions, we have shown the
machine instruction that is generated by each rule. The first two rules correspond
to load instructions, the next two to store instructions, and the remainder to
indexed loads and additions. Note that rule (8) requires the value of the constant
to be 1. This condition can be enforced by a semantic predicate in the cost.
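To make template matching concrete, the templates of Table I can be transcribed into nested tuples and matched against an IR subtree, binding each leaf's subscript along the way. A minimal sketch, assuming our own tuple encoding (`TABLE_I`, `PREDICATES`, `match`, and `matching_rules` are hypothetical names). It is a naive recursive matcher, not twig's fast top-down algorithm, and it treats register operands as already-reduced `reg` leaves.

```python
from typing import Dict, List, Tuple

Tree = tuple  # leaves: (type, subscript); interior nodes: (op, child, ...)


def is_leaf(t: Tree) -> bool:
    return len(t) == 2 and isinstance(t[1], str)


# Table I templates in prefix form; a leaf's second field names its subscript variable.
TABLE_I = [
    (1, ("const", "c")),
    (2, ("mem", "a")),
    (3, (":=", ("mem", "a"), ("reg", "i"))),
    (4, (":=", ("ind", ("reg", "i")), ("global", "b"))),
    (5, ("ind", ("+", ("const", "c"), ("reg", "j")))),
    (6, ("+", ("reg", "i"), ("ind", ("+", ("const", "c"), ("reg", "j"))))),
    (7, ("+", ("reg", "i"), ("reg", "j"))),
    (8, ("+", ("reg", "i"), ("const", "c"))),
]
PREDICATES = {8: lambda env: env["c"] == "1"}  # rule (8): the constant must be 1


def match(pat: Tree, tree: Tree, env: Dict[str, str]) -> bool:
    """Match template pat against tree, extending env with subscript bindings."""
    if is_leaf(pat):
        if not is_leaf(tree) or pat[0] != tree[0]:
            return False
        var, val = pat[1], tree[1]
        return env.setdefault(var, val) == val  # a repeated variable must agree
    if is_leaf(tree) or pat[0] != tree[0] or len(pat) != len(tree):
        return False
    return all(match(p, t, env) for p, t in zip(pat[1:], tree[1:]))


def matching_rules(tree: Tree) -> List[Tuple[int, Dict[str, str]]]:
    """All Table I rules whose template (and predicate, if any) matches tree."""
    found = []
    for num, template in TABLE_I:
        env: Dict[str, str] = {}
        if match(template, tree, env) and PREDICATES.get(num, lambda e: True)(env):
            found.append((num, env))
    return found


# The subtree left after rules (1) and (7) have produced reg_0:
sub = ("+", ("reg", "0"), ("ind", ("+", ("const", "i"), ("reg", "sp"))))
print(matching_rules(sub))  # [(6, {'i': '0', 'c': 'i', 'j': 'sp'})]
```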
A tree-translation scheme generates code from an IR tree in the following way.
All templates in the tree-rewriting rules are matched against the subtrees of the
IR tree during a depth-first traversal of the tree. At each node, the costs are used
to determine the best match, and the selected subtree is replaced in the IR tree
by the associated replacement node. Sometimes the replacement is delayed until
the cost of a larger match that includes this one can be evaluated. By this process a
minimum-cost cover for the IR tree is found.
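The cost-driven selection described above can be sketched as bottom-up dynamic programming: for every subtree, record the cheapest way to reduce it to each replacement node, reusing the costs already recorded for its subtrees. A minimal sketch, assuming our own encoding of Table I (`RULES`, `label`, `template_cost`, and `cover` are hypothetical names), local costs obtained by dropping the cost.reg terms from the Table I costs, and omitting rule (8) to avoid its predicate. Unlike twig, it tries every template at every node instead of using a fast top-down matcher.

```python
import math
from typing import Dict, List, Optional, Tuple

# (rule, replacement, template, local cost); a one-element tuple is a template leaf.
RULES = [
    (1, "reg", ("const",), 2),
    (2, "reg", ("mem",), 2),
    (3, "x", (":=", ("mem",), ("reg",)), 2),
    (4, "x", (":=", ("ind", ("reg",)), ("global",)), 2),
    (5, "reg", ("ind", ("+", ("const",), ("reg",))), 2),
    (6, "reg", ("+", ("reg",), ("ind", ("+", ("const",), ("reg",)))), 2),
    (7, "reg", ("+", ("reg",), ("reg",)), 1),
]
NONTERMINALS = {"reg", "x"}
Labels = Dict[str, Tuple[int, Optional[int]]]  # replacement -> (min cost, rule)


def is_ir_leaf(t: tuple) -> bool:
    return len(t) == 2 and isinstance(t[1], str)


def label(t: tuple, memo: Dict[tuple, Labels]) -> Labels:
    """Cheapest way to reduce subtree t to each replacement node."""
    if t in memo:
        return memo[t]
    best: Labels = {}
    if is_ir_leaf(t) and t[0] in NONTERMINALS:
        best[t[0]] = (0, None)  # e.g. reg_sp: SP is already in a register
    for num, repl, template, cost in RULES:
        sub = template_cost(template, t, memo)
        if sub is not None and cost + sub < best.get(repl, (math.inf, None))[0]:
            best[repl] = (cost + sub, num)
    memo[t] = best
    return best


def template_cost(p: tuple, t: tuple, memo: Dict[tuple, Labels]) -> Optional[int]:
    """Cost of the subtrees covered by p's nonterminal leaves, or None if no match."""
    if len(p) == 1:
        if is_ir_leaf(t) and t[0] == p[0]:
            return 0
        if p[0] in NONTERMINALS:
            entry = label(t, memo).get(p[0])
            return None if entry is None else entry[0]
        return None
    if is_ir_leaf(t) or t[0] != p[0] or len(t) != len(p):
        return None
    total = 0
    for pc, tc in zip(p[1:], t[1:]):
        c = template_cost(pc, tc, memo)
        if c is None:
            return None
        total += c
    return total


def cover(t: tuple, nt: str, memo: Dict[tuple, Labels], out: List[int]) -> List[int]:
    """Rules of the minimum-cost cover, operands before the rule that uses them."""
    num = label(t, memo)[nt][1]
    if num is not None:
        template = next(r[2] for r in RULES if r[0] == num)
        _walk(template, t, memo, out)
        out.append(num)
    return out


def _walk(p: tuple, t: tuple, memo: Dict[tuple, Labels], out: List[int]) -> None:
    if len(p) == 1:
        if p[0] in NONTERMINALS and not (is_ir_leaf(t) and t[0] == p[0]):
            cover(t, p[0], memo, out)
        return
    for pc, tc in zip(p[1:], t[1:]):
        _walk(pc, tc, memo, out)


fig1 = (":=", ("ind", ("+", ("+", ("const", "a"), ("reg", "sp")),
                            ("ind", ("+", ("const", "i"), ("reg", "sp"))))),
        ("global", "b"))
memo: Dict[tuple, Labels] = {}
print(label(fig1, memo)["x"])      # (7, 4)
print(cover(fig1, "x", memo, []))  # [1, 7, 6, 4]
```

On the tree of Figure 1 the sketch prefers rule (6) to the pair of rules (5) and (7) at the upper + node, giving a cover of total cost 7.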
Then a second depth-first traversal of the original IR tree is made and the
actions associated with the rules used in the cover are executed. If an action
emits a sequence of target-machine instructions, the instructions become part of
the output. The sequence of machine instructions thus generated constitutes the
output of the tree-translation scheme.
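The second traversal only has to run the chosen actions depth-first, emitting code for the operands before the instruction that consumes them. A minimal sketch, assuming the cover of Figure 1 has already been chosen and that register allocation, done by the actions, placed every intermediate value in R0; `Chosen`, `ACTIONS`, `emit`, and `fig1_cover` are hypothetical names, and each action simply formats the Table I instruction from the rule's bindings.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Instruction text produced by each rule's action, filled in from its bindings.
ACTIONS = {
    1: "MOV #{c}, {i}",
    4: "MOV {b}, *{i}",
    6: "ADD {c}({j}), {i}",
    7: "ADD {j}, {i}",
}


@dataclass
class Chosen:
    """One rule in the cover, its bindings, and the operand subtrees it covers."""
    rule: int
    env: Dict[str, str]
    kids: List["Chosen"] = field(default_factory=list)


def emit(node: Chosen, out: List[str]) -> List[str]:
    """Depth-first: emit the operands' code first, then run this rule's action."""
    for kid in node.kids:
        emit(kid, out)
    out.append(ACTIONS[node.rule].format(**node.env))
    return out


# Cover of Figure 1 using rules (1), (7), (6), (4), with everything in R0.
fig1_cover = Chosen(4, {"b": "b", "i": "R0"}, [
    Chosen(6, {"c": "i", "i": "R0", "j": "SP"}, [
        Chosen(7, {"i": "R0", "j": "SP"}, [
            Chosen(1, {"c": "a", "i": "R0"}),
        ]),
    ]),
])

print("\n".join(emit(fig1_cover, [])))
# MOV #a, R0
# ADD SP, R0
# ADD i(SP), R0
# MOV b, *R0
```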
To illustrate, let us use the tree-translation scheme in Table I to process the
IR tree in Figure 1. The template of the first rule

    reg_0 ← const_a

matches the leftmost leaf of the IR tree with i = 0 and c = a. If we use this rule,
the label of the leftmost leaf is changed from const_a to reg_0, and during
the second traversal the instruction MOV #a, R0 will be generated to load the
constant a into register R0. The template of the seventh rule with i = 0 and
j = SP

    reg_0  ←    +
               / \
          reg_0   reg_sp

now matches the leftmost subtree with root labeled +. Using this rule, we would
rewrite this subtree into a single node labeled reg_0 and later generate the
instruction ADD SP, R0. Now the tree looks like
                   :=
                 /    \
               ind     global_b
                |
                +
              /   \
         reg_0     ind
                    |
                    +
                   / \
            const_i   reg_sp
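The two rewriting steps above can be replayed on an immutable copy of the tree. A minimal sketch, assuming a nested-tuple encoding of our own and addressing subtrees by a path of 0-based child indices (`rewrite`, `show`, and the paths used are hypothetical); it performs only the replacements and leaves cost bookkeeping aside.

```python
from typing import Tuple

Tree = tuple  # leaves: (type, subscript); interior nodes: (op, child, ...)


def rewrite(tree: Tree, path: Tuple[int, ...], new: Tree) -> Tree:
    """Copy of tree with the subtree at path (0-based child indices) replaced by new."""
    if not path:
        return new
    kids = list(tree[1:])
    kids[path[0]] = rewrite(kids[path[0]], path[1:], new)
    return (tree[0], *kids)


def show(t: Tree) -> str:
    """Prefix form, writing leaves as type_subscript."""
    if len(t) == 2 and isinstance(t[1], str):
        return f"{t[0]}_{t[1]}"
    return f"{t[0]}(" + ", ".join(show(k) for k in t[1:]) + ")"


fig1 = (":=", ("ind", ("+", ("+", ("const", "a"), ("reg", "sp")),
                            ("ind", ("+", ("const", "i"), ("reg", "sp"))))),
        ("global", "b"))

# Rule (1), i = 0, c = a: the leftmost leaf const_a becomes reg_0.
step1 = rewrite(fig1, (0, 0, 0, 0), ("reg", "0"))
# Rule (7), i = 0, j = SP: the subtree +(reg_0, reg_sp) becomes reg_0.
step2 = rewrite(step1, (0, 0, 0), ("reg", "0"))
print(show(step2))  # :=(ind(+(reg_0, ind(+(const_i, reg_sp)))), global_b)
```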
At this point, we could apply rule (5) to reduce the subtree

         ind
          |
          +
         / \
  const_i   reg_sp

to a single node labeled reg_1. However, we can also use rule (6) to reduce the
larger subtree

          +
        /   \
   reg_0     ind
              |
              +
             / \
      const_i   reg_sp

into a single node labeled reg_0 and later generate the instruction ADD i(SP), R0.
Assuming it is more efficient to use a single instruction to compute the larger

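With the costs of Table I, the choice between rule (5) and rule (6) in the walkthrough can be checked by hand. A worked sketch, assuming cost.reg_sp = 0 because SP is already in a register, and writing c(reg_k) for cost.reg_k; reg_0 is produced by rule (1) followed by rule (7).

```latex
\begin{aligned}
c(\mathrm{reg}_0) &= 1 + \underbrace{2}_{\text{rule (1)}} + c(\mathrm{reg}_{sp}) = 3
  && \text{rule (7)} \\
c(\mathrm{reg}_1) &= 2 + c(\mathrm{reg}_{sp}) = 2
  && \text{rule (5)} \\
1 + c(\mathrm{reg}_0) + c(\mathrm{reg}_1) &= 1 + 3 + 2 = 6
  && \text{rules (5), then (7)} \\
2 + c(\mathrm{reg}_0) + c(\mathrm{reg}_{sp}) &= 2 + 3 + 0 = 5
  && \text{rule (6)}
\end{aligned}
```

Under these assumptions rule (6) is cheaper by one unit, matching the preference for the single instruction ADD i(SP), R0.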
References
Compilers: Principles, Techniques, and Tools
Efficient string matching: an aid to bibliographic search
DAGON: Technology Binding and Local Optimization by DAG Matching
Pattern Matching in Trees
The Generation of Optimal Code for Arithmetic Expressions