Engineering a Simple, Efficient
Code-Generator Generator
CHRISTOPHER W. FRASER
AT&T Bell Laboratories
DAVID R. HANSON
Princeton University
and
TODD A. PROEBSTING
The University of Arizona
Many code-generator generators use tree pattern matching and dynamic programming. This
paper describes a simple program that generates matchers that are fast, compact, and easy to
understand. It is simpler than common alternatives: 200–700 lines of Icon or 950 lines of C
versus 3000 lines of C for Twig and 5000 for burg. Its matchers run up to 25 times faster than
Twig’s. They are necessarily slower than burg’s BURS (bottom-up rewrite system) matchers, but
they are more flexible and still practical.
Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors—code generation; compilers; translator writing systems and compiler generators
General Terms: Languages
Additional Key Words and Phrases: Code generation, code-generator generator, dynamic programming, Icon programming language, tree pattern matching
1. INTRODUCTION
Many code-generator generators use tree pattern matching and dynamic
programming (DP) [2, 4, 8]. They accept tree patterns and associated costs,
and semantic actions that, for example, allocate registers and emit object
code. They produce tree matchers that make two passes over each subject
tree. The first pass is bottom up and finds a set of patterns that cover the tree
with minimum cost. The second pass executes the semantic actions associated
with minimum-cost patterns at the nodes they matched. Code-generator
generators based on this model include BEG [7], Twig [3], and burg [13].
Authors’ addresses: C. W. Fraser, AT&T Bell Laboratories, 600 Mountain Avenue 2C-464,
Murray Hill, NJ 07974-0636; D. R. Hanson, Department of Computer Science, Princeton
University, Princeton, NJ 08544; T. A. Proebsting, Department of Computer Science, The
University of Arizona, Tucson, AZ 85721.
Permission to copy without fee all or part of this material is granted provided that the copies are
not made or distributed for direct commercial advantage, the ACM copyright notice and the title
of the publication and its date appear, and notice is given that copying is by permission of the
Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or
specific permission.
© 1992 ACM 1057-4514/92/0900-0213 $01.50
ACM Letters on Programming Languages and Systems, Vol. 1, No. 3, September 1992, Pages 213-226.

BEG matchers are hard-coded and mirror the tree patterns in the same
way that recursive-descent parsers mirror their input grammars. They use
DP at compile time to identify a minimum-cost cover.
Twig matchers use a table-driven variant of string matching [1, 15] that, in
essence, identifies all possible matches at the same time. This algorithm is
asymptotically better than trying each possible match one at a time, but
overhead is higher. Like BEG matchers, Twig matchers use DP at compile
time to identify a minimum-cost cover.
burg uses BURS (bottom-up rewrite system) theory [5, 6, 17, 18] to move
the DP to compile-compile time. BURS table generation is more complicated,
but BURS matchers generate optimal code in constant time per node. The
main disadvantage of BURS is that costs must be constants; systems that
delay DP until compile time permit costs to involve arbitrary computations.
This paper describes a program called iburg that reads a burg specifica-
tion and writes a matcher that does DP at compile time. The matcher is
hard-coded, a technique that has proved effective with other types of code
generators [9, 12]. iburg was built to test early versions of what evolved into
burg’s specification language and interface, but it is useful in its own right
because it is simpler and thus easier for novices to understand, because it
allows dynamic cost computation, and because it admits a larger class of tree
grammars [16]. iburg has been used with good results in a first course on
compilers. burg and iburg have also been used to produce robust VAX,
MIPS, and SPARC code generators for lcc, a retargetable compiler for ANSI
C [11].
iburg and BEG produce similar matchers, but this paper describes them in
more detail than the standard BEG reference [7]. In particular, it describes
several optimizations that paid off and two that did not, and it quantifies the
strengths and weaknesses of such programs when compared with programs
like Twig and burg.
2. SPECIFICATIONS
Figure 1 shows an extended BNF grammar for burg and iburg specifica-
tions. Grammar symbols are displayed in italic type, and terminal symbols
are displayed in typewriter type. {X} denotes zero or more instances of X,
and [X] denotes an optional X. Specifications consist of declarations, a %%
separator, and rules. The declarations declare terminals—the operators in
subject trees—and associate a unique, positive external symbol number with
each one. Nonterminals are declared by their presence on the left side of
rules. The %start declaration optionally declares a nonterminal as the start
symbol. In Figure 1 term and nonterm denote identifiers that are terminals
and nonterminals, respectively.
Rules define tree patterns in a fully parenthesized prefix form. Every
nonterminal denotes a tree. Each operator has a fixed arity, which is inferred
from the rules in which it is used. A
chain rule is a rule whose pattern is
another nonterminal. If no start symbol is declared, the nonterminal defined
by the first rule is used.

grammar  →  { dcl } %% { rule }

dcl      →  %start nonterm
         |  %term { identifier = integer }

rule     →  nonterm : tree = integer [ cost ] ;

cost     →  ( integer )

tree     →  term ( tree , tree )
         |  term ( tree )
         |  term
         |  nonterm

Fig. 1. Extended BNF grammar for burg and iburg specifications.
 1.  %term ADDI=309 ADDRLP=295 ASGNI=53
 2.  %term CNSTI=21 CVCI=85 I0I=661 INDIRC=67
 3.  %%
 4.  stmt: ASGNI(disp,reg) = 4 (1);
 5.  stmt: reg = 5;
 6.  reg:  ADDI(reg,rc) = 6 (1);
 7.  reg:  CVCI(INDIRC(disp)) = 7 (1);
 8.  reg:  I0I = 8;
 9.  reg:  disp = 9 (1);
10.  disp: ADDI(reg,con) = 10;
11.  disp: ADDRLP = 11;
12.  rc:   con = 12;
13.  rc:   reg = 13;
14.  con:  CNSTI = 14;
15.  con:  I0I = 15;

Fig. 2. Sample burg specification.
Each rule has a unique, positive external rule number, which comes after
the pattern and is preceded by an equal sign. As described below, external
rule numbers are used to report the matching rule to a user-supplied
semantic action routine. Rules end with an optional nonnegative, integer
cost; omitted costs default to zero.
Figure 2 shows a fragment of a burg specification for the VAX. This
example uses uppercase for terminals and lowercase for nonterminals. Lines
1-2 declare the operators and their external symbol numbers, and lines 4-15
give the rules. The external rule numbers correspond to the line numbers to
simplify interpreting subsequent figures. In practice, these numbers are
usually generated by a preprocessor that accepts a richer form of specification
(e.g., including YACC-style semantic actions) and that emits a burg specifica-
tion [13]. Only the rules on lines 4, 6, 7, and 9 have nonzero costs. The rules
on lines 5, 9, 12, and 13 are chain rules.

The operators in Figure 2 are some of the operators in lcc’s intermediate
language [10]. The operator names are formed by concatenating a generic
operator name with a one-character type suffix like C, I, or P, which denote
character, integer, and pointer operations, respectively. The operators used in
Figure 2 denote integer addition (ADDI), forming the address of a local
variable (ADDRLP), integer assignment (ASGNI), an integer constant
(CNSTI), “widening” a character to an integer (CVCI), the integer 0 (I0I),
and fetching a character (INDIRC). The rules show that ADDI and ASGNI
are binary; CVCI and INDIRC are unary; and ADDRLP, CNSTI, and I0I
are leaves.
3. MATCHING
Both versions of burg generate functions that the client calls to label and
reduce subject trees. The labeling function, label(p), makes a bottom-up,
left-to-right pass over the subject tree p computing the rules that cover the
tree with the minimum cost, if there is such a cover. Each node is labeled
with (M, C) to indicate that “the pattern associated with external rule M
matches the node with cost C.”
Figure 3 shows the intermediate language tree for the assignment expres-
sion in the following C fragment:
{ int i; char c; i = c + 4; }
The left child of the ASGNI node computes the address of i. The right child
computes the address of c, fetches the character, widens it to an integer, and
adds 4 to the widened value, which the ASGNI assigns to i.
The other annotations in Figure 3 show the results of labeling. (M, C)
denote labels from matches, and [M, C] denote labels from chain rules. The
rule from Figure 2 denoted by each M is also shown. Each C sums the costs
of the nonterminals on the right-hand side and the cost of the relevant
pattern or chain rule. For example, the pattern in line 11 of Figure 2 matches
the node ADDRLP i with cost 0, so the node is labeled with (11, 0). Since this
pattern denotes a disp, the chain rule in line 9 applies with a cost of 0 for
matching a disp plus 1 for the chain rule itself. Likewise, the chain rules in
lines 5 and 13 apply because the chain rule in line 9 denotes a reg.
Patterns can specify subtrees beyond the immediate children. For example,
the pattern in line 7 of Figure 2 refers to the grandchild of the CVCI node. No
separate pattern matches the INDIRC node, but line 7’s pattern covers that
node. The cost is the cost of matching the ADDRLP i as a disp, which is rule
11, plus 1.
Nodes are annotated with (M, C) only if C is less than all previous
matches for the nonterminal on the left-hand side of rule M. For example, the
ADDI node matches the disp pattern in line 10 of Figure 2, which means
that it also matches all rules with disp alone on the right-hand side, namely,
line 9. By transitivity, it also matches the chain rules in lines 5 and 13. But
all three of these chain rules yield cost 2, which is not better than previous
matches for those nonterminals.
Once labeled, a subject tree is reduced by traversing it from the top down

ASGNI                        (4, 0+2+1=3)   stmt: ASGNI(disp, reg)
├─ ADDRLP i                  (11, 0)        disp: ADDRLP
│                            [9, 0+1=1]     reg:  disp
│                            [5, 1+0=1]     stmt: reg
│                            [13, 1+0=1]    rc:   reg
└─ ADDI                      (6, 1+0+1=2)   reg:  ADDI(reg, rc)
   │                         [5, 2+0=2]     stmt: reg
   │                         [13, 2+0=2]    rc:   reg
   │                         (10, 1+0+0=1)  disp: ADDI(reg, con)
   ├─ CVCI                   (7, 0+1=1)     reg:  CVCI(INDIRC(disp))
   │  │                      [5, 1+0=1]     stmt: reg
   │  │                      [13, 1+0=1]    rc:   reg
   │  └─ INDIRC              (no separate label; covered by rule 7)
   │     └─ ADDRLP c         (11, 0)        disp: ADDRLP
   │                         [9, 0+1=1]     reg:  disp
   │                         [5, 1+0=1]     stmt: reg
   │                         [13, 1+0=1]    rc:   reg
   └─ CNSTI 4                (14, 0)        con:  CNSTI
                             [12, 0+0=0]    rc:   con

Fig. 3. Intermediate language tree for i = c + 4.
and by performing appropriate semantic actions, such as generating and
emitting code. Reducers are supplied by clients, but burg generates functions
that assist in these traversals, for example, one function that returns M and
another that identifies subtrees for recursive visits. Reference [13] elaborates.
burg does all DP at compile-compile time and annotates each node with a
single, integral state number, which encodes all of the information concern-
ing matches and costs. iburg does the DP at compile time and annotates
nodes with data equivalent to (M, C). Its “state numbers” are really pointers
to records that hold these data.
Both versions of burg generate an implementation of label that accesses
node fields via client-supplied macros or functions and uses the nonrecursive
function state to identify matches:
int label(NODEPTR_TYPE p) {
    if (p) {
        int l = label(LEFT_CHILD(p));
        int r = label(RIGHT_CHILD(p));
        return STATE_LABEL(p) = state(OP_LABEL(p), l, r);
    } else
        return 0;
}
NODEPTR_TYPE is a typedef or macro that defines the data type of
nodes. OP_LABEL, LEFT_CHILD, and RIGHT_CHILD are macros or
functions that return, respectively, a node’s external symbol number, its left
child, and its right child. STATE_LABEL is a macro that accesses a node’s
state number field.
state accepts an external symbol number for a node and the state numbers
for the node’s left and right children. It returns the state number to assign to
that node. For unary operators and leaves, it ignores the last one or two
arguments, respectively.

References

Aho, A. V., Sethi, R., and Ullman, J. D. Compilers: Principles, Techniques, and Tools.
Aho, A. V., and Corasick, M. J. Efficient string matching: An aid to bibliographic search.
Hoffmann, C. M., and O’Donnell, M. J. Pattern matching in trees.
Aho, A. V., Ganapathi, M., and Tjiang, S. W. K. Code generation using tree matching and dynamic programming.
Griswold, R. E., and Griswold, M. T. The Icon Programming Language.