Journal ArticleDOI

Automatic generation of fast optimizing code generators

01 Jun 1988 - Vol. 23, Iss. 7, pp. 79-84
TL;DR: A system that accepts compact specifications of an intermediate code and target machine and produces program code for an integrated code generator and peephole optimizer; the two phases communicate the code to be generated by encoding it in the program counter, which obviates most inter-phase communication costs.
Abstract: This paper describes a system that accepts compact specifications of an intermediate code and target machine and produces program code for an integrated code generator and peephole optimizer. A compiler for most of C uses this package. It emits code comparable to PCC's, but it runs over five times faster on preliminary benchmarks. This compiler also runs over twice as fast as a version of pcc2 with a hand-coded, VAX-specific code generator. The code generators are produced as follows. A programmer describes a naive code generator by means of a non-procedural specification. The programmer also prepares a machine description for a retargetable peephole optimizer [2]. These two systems are used together to compile a testbed, and the compiler records each peephole optimization as it is made. This record and the specification of the naive code generator are compiled into a fast, integrated code generator and optimizer. This production code generator then takes the place of the slower "training" version. The production code generator and optimizer are integrated to the point that the code to be generated is communicated from one to the other by encoding it in the program counter, which obviates most inter-phase communication costs. Interpretive peephole optimizers have been driven by traces from retargetable peephole optimizers [3] and integrated with interpretive code generators [4], but the current work is distinguished by the production of a hard-coded, optimizing code generator. Historically, retargetable code generators (i.e., those not largely rewritten for each new machine) have applied a fixed, compile-time interpreter to tables that have been automatically generated from formal specifications [5]. The code generators described below interpret no tables, which helps them run fast.

Summary (2 min read)

Introduction

  • This paper describes a system that accepts compact specifications of an intermediate code and target machine and produces program code for an integrated code generator and peephole optimizer.
  • The code generators are produced as follows.
  • The programmer also prepares a machine description for a retargetable peephole optimizer [2].

Representation

  • Both the training and production code generators accept the same input - an "abstract syntax dag" built by the front end.
  • The front end has propagated types and folded them into the opcodes (e.g. the I prefix flags integer opcodes) so that the back end need not understand the front end's type system, which is typically more complex than the back end's.
  • On the VAX, for example, the subtree rooted at the ISUB above is ultimately replaced with the instruction subl3 _c,_r,r4, and the rest of the tree is replaced with clrl _up+4*7[r4].
  • The compiler has not yet accommodated full C, but the size of the table may be estimated.
  • The bindings for the pattern variables %0 and %1 are never stored in this node because they are available (after register assignment) in the children's vars fields.

Specifying the Code Generator

  • Here are a few lines from the specification that defines the intermediate code and the naive VAX code generator.
  • Opcodes GLOBAL and moval _%0,r%1 are leaves, and the remaining opcodes above are binary.
  • The presence of a second number indicates that a register must be allocated to hold the target instruction's result.
  • If the intermediate code uses a constant field - in the examples above, GLOBAL needs the name of a global variable and ILT needs a label number - the front end stores it in the appropriate pattern variable.
  • The automatically generated code generators do the rest.

The Training Code Generator

  • Initially, the code generator uses only those opcodes that appeared in the specification of the naive code generator, so the initial opcode list holds exactly the two columns from the specification.
  • This case analysis takes the form of an if-then-else chain that may edit the dag and jump off to the case that handles the new opcode.
  • The goto L37 above is really omitted.
  • This results in redundant assignments to the opcode field when rewrite re-encounters a multiply-referenced node that has been previously traversed and rewritten, but moving the assignment saves more than it sacrifices.
  • These arrays are needed only by the register allocator and output routine, which need to know where to store register names and how many children to traverse.

The Peephole Optimizer and Trace

  • The training routine combine is a retargetable peephole optimizer.
  • It then searches the machine description for an instruction with this combined effect.
  • If the value produced by an instruction is used several times, its cost is divided equally between its users.
  • A full review of this technique is beyond the scope of this paper, but Reference 2 elaborates.
  • The last line above reports that the result register of the new instruction is to be bound to %1.

The Production Code Generator

  • To produce the production system, the code generator generator accepts the trace above and the specification of the naive code generator.
  • It produces an optimizing code generator that is like the naive one presented above, except the opcode list is extended to include all the new instruction variants generated during training, optimizing case analysis is inserted at the head of each case that handles a target instruction, and the call on combine is omitted.
  • It uses b->vars[0] because %1 is the first pattern variable of b that requires local storage.
  • If no optimization applies, control falls off the chain of ifs into code that updates a->op and returns.
  • Case analysis like that above could be generated without training on a testbed.

Discussion

  • Two emerging compilers use the techniques above.
  • One uses a modified pcc as a front end and has largely complete back ends for the VAX and the MC68020.
  • The interface between its front end and generated code generators is somewhat less efficient than that shown above.
  • At present, this compiler runs in about 55% of the time taken by pcc.
  • In a typical run, rewrite currently takes less than 1% of the time taken by pcc.


Automatic Generation of Fast Optimizing Code Generators
Christopher W. Fraser
AT&T Bell Laboratories
Murray Hill, NJ 07974
Alan L. Wendt
Department of Computer Science
University of Arizona
Tucson, AZ 85721
Introduction
This paper describes a system that accepts compact specifications of an intermediate code and target machine and produces program code for an integrated code generator and peephole optimizer. A compiler for most of C uses this package. It emits code comparable to PCC's, but it runs over five times faster on preliminary benchmarks. This compiler also runs over twice as fast as a version of pcc2 with a hand-coded, VAX-specific code generator.
The code generators are produced as follows. A programmer describes a naive code generator by means of a non-procedural specification. The programmer also prepares a machine description for a retargetable peephole optimizer [2]. These two systems are used together to compile a testbed, and the compiler records each peephole optimization as it is made. This record and the specification of the naive code generator are compiled into a fast, integrated code generator and optimizer. This production code generator then takes the place of the slower "training" version. The production code generator and optimizer are integrated to the point that the code to be generated is communicated from one to the other by encoding it in the program counter, which obviates most inter-phase communication costs.
Interpretive peephole optimizers have been driven by traces from retargetable peephole optimizers [3] and integrated with interpretive code generators [4], but the current work is distinguished by the production of a hard-coded, optimizing code generator. Historically, retargetable code generators (i.e., those not largely rewritten for each new machine) have applied a fixed, compile-time interpreter to tables that have been automatically generated from formal specifications [5]. The code generators described below interpret no tables, which helps them run fast.
Representation
Both the training and production code generators accept the same input - an "abstract syntax dag" built by the front end. They use dags rather than trees to accommodate source language features that implicitly reuse values (like C's auto-increment and augmented and multiple assignment) as well as front ends that eliminate common subexpressions as they create nodes. Front ends may confine themselves to trees if the source language permits and if common subexpression elimination is not desired.

The front end compiles, for example, the C statement up[r-c+7]=0 into a tree annotated with intermediate code:
ISET ---- ICONST 0
  |
IADD ---- GLOBAL up
  |
IMUL ---- ICONST 4
  |
IADD ---- ICONST 7
  |
ISUB ---- IDEREF ---- GLOBAL c
  |
IDEREF
  |
GLOBAL r
The front end has propagated types and folded them into the opcodes (e.g. the I prefix flags integer opcodes) so that the back end need not understand the front end's type system, which is typically more complex than the back end's. The front end has also exposed the multiplication implicit in array indexing, so it needs the sizes and alignments of the basic datatypes, but these are easily isolated in a small table.
The code generators rewrite dag nodes in place, replacing the intermediate code with naive and then optimized assembly code. In the example above, each node is first rewritten with a single instruction and then combined with one or more of its descendants via peephole optimization. On the VAX, for example, the subtree rooted at the ISUB above is ultimately replaced with the instruction subl3 _c,_r,r4, and the rest of the tree is replaced with clrl _up+4*7[r4]. That is, the final tree is:

clrl _up+4*7[r4]
  |
subl3 _c,_r,r4

The clrl occupies the node originally occupied by the ISET, and the subl3 occupies the node originally occupied by the ISUB. The actual register assignment for temporaries (like r4 above) is not needed during code generation and optimization, so this task is postponed until these phases complete. Since the same nodes represent intermediate and assembly code, the code generator needs one representation for both.
Assembly code is text, so intermediate opcodes are also represented as text. To avoid the necessity of creating new strings at compile time, the system abstracts constants, identifiers, and register numbers out of the text. For example, the instruction subl3 r2,r3,r4 is represented with the "skeleton" subl3 r%1,r%0,r%2 plus bindings for the "pattern variables" %i. The system enumerates all useful skeletons during training and stores them in a table. Opcodes are thus represented as indices into this string table. The compiler has not yet accommodated full C, but the size of the table may be estimated. A production C compiler generated over 26,000 instructions for an 11,000-line testbed, but used fewer than 900 distinct instruction variants. Intermediate codes and target instructions that are always optimized out might increase this figure somewhat, but even so the table should not exceed 40kb even on the VAX, because the average skeleton takes less than 25 bytes, including four bytes for the pointer to each.
For nodes with n children, the first n pattern variables denote the result registers of the children, and bindings for the rest are stored locally. For example, the instruction subl3 r2,r3,r4 is represented as a node with the following fields:

op = 39        where opcode[39] = "subl3 r%1,r%0,r%2"
kids[0] = pointer to first child
kids[1] = pointer to second child
vars[0] = "4"

The bindings for the pattern variables %0 and %1 are never stored in this node because they are available (after register assignment) in the children's vars fields. Pattern variable %2 is stored in vars[0] because it is the first (and only) pattern variable that needs local storage; this cell is empty until registers are assigned.
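
Pulling these fields together, a minimal sketch of the node structure the text implies is given below; the exact types and array sizes are assumptions, and count is the reference count used later by the production optimizer:

struct node {
    int op;                /* index into the opcode[] skeleton table */
    struct node *kids[2];  /* children; the result register of kids[i] binds %i */
    char *vars[2];         /* locally stored pattern variables, e.g. vars[0] = "4" */
    int count;             /* reference count, maintained as the dag is edited */
};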
Specifying the Code Generator
Here are a few lines from the specification that defines the intermediate code and the naive VAX code generator:

%shape 0 1
GLOBAL    moval _%0,r%1
%shape 2 2
IADD      addl3 r%1,r%0,r%2
ISUB      subl3 r%1,r%0,r%2
%shape 2
ILT       cmpl r%0,r%1; jlss L%2
ISET      movl r%1,(r%0)
...
Except for the %shape directives, this specification forms two columns. The first lists the intermediate code's opcodes, and the second gives equivalent but naive assembly code. Thus the intermediate code IADD is to be replaced with the VAX skeleton addl3 r%1,r%0,r%2, and the intermediate code ILT (for "integer less-than") is to be replaced with the instructions cmpl r%0,r%1 and jlss L%2.

The %shape directives describe features shared by the opcodes that follow. Each lists one or two numbers. The first number specifies the number of children of subsequent opcodes. For example, opcodes GLOBAL and moval _%0,r%1 are leaves, and the remaining opcodes above are binary.

The presence of a second number indicates that a register must be allocated to hold the target instruction's result. The number specifies the pattern variable to which the index of the register must be bound. For example, moval _%0,r%1 needs a register allocated and bound to %1, opcodes addl3 r%1,r%0,r%2 and subl3 r%1,r%0,r%2 need a register allocated and bound to %2, and the remaining instructions above need no result register at all.
When building an abstract syntax dag, the front end sets the opcode fields using values from the first column. If the intermediate code uses a constant field - in the examples above, GLOBAL needs the name of a global variable and ILT needs a label number - the front end stores it in the appropriate pattern variable. The automatically generated code generators do the rest.
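
For illustration, the front end's work for the GLOBAL leaf of the earlier example might look as follows; newnode and intern are hypothetical helpers, and GLOBAL stands for that opcode's index from the first column:

struct node *n = newnode(GLOBAL);  /* opcode field set from the first column */
n->vars[0] = intern("up");         /* GLOBAL's constant field, the name, in %0 */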
The compiler using these code generators is not yet complete, but it appears that a naive code generator for, say, ANSI C will require about three pages of lines like those above. The register allocator is retargeted by changing a table if the machine uses general registers; as with most retargetable code generators, machines with asymmetric register sets may require some recoding.
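
As an illustration of such a table (the name and contents are assumptions, not the paper's), retargeting the allocator for a general-register machine might amount to editing a list like:

char *regname[] = { "r1", "r2", "r3", "r4", "r5" };  /* registers the allocator may use */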
The Training Code Generator
The specification above is automatically compiled into a training code generator, whose general outlines appear below:

char *opcode[MAXOPS] = {
    ...
    /* 36 */ "IADD",
    /* 37 */ "addl3 r%1,r%0,r%2",
    /* 38 */ "ISUB",
    /* 39 */ "subl3 r%1,r%0,r%2",
    ...
};
rewrite(a)
register struct node *a;
{
    switch (a->op) {
    ...
    case 36: L36: /* IADD */
        rewrite(a->kids[0]);
        rewrite(a->kids[1]);
        a->op = 37;
        goto L37;
    case 37: L37: /* addl3 r%1,r%0,r%2 */
        (optimizing case analysis to go here)
        break;
    ...
    }
    combine(a);    (only in training version)
}
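
The surrounding text implies a per-expression driver around rewrite roughly like the following sketch; genregs and emit are assumed names for the postponed register-assignment and output phases, not the paper's:

rewrite(dag);   /* rewrite intermediate code into optimized assembly, in place */
genregs(dag);   /* assumed: bind the registers of temporaries such as r4 */
emit(dag);      /* assumed: print each node's skeleton with its bindings */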
Initially, the code generator uses only those opcodes that appeared in the specification of the naive code generator, so the initial opcode list holds exactly the two columns from the specification.
The routine rewrite is the automatically generated, integrated code generator and optimizer. It accepts a pointer to a dag decorated with the simple intermediate code, and it rewrites the dag in place to represent optimized assembly code. The string opcodes are recoded as a range of contiguous integers primarily so that rewrite can decode them with an efficient switch statement. Each opcode has a distinct case that rewrites its particular opcode and jumps off to the case that handles the new opcode just introduced into the dag.

Cases for intermediate codes recursively rewrite any children, then change the node's opcode field to represent the specification's naive target instruction, and finally jump to the case for that target instruction. The training code generator has no compiled code to improve these instructions, so their cases break out of the switch and call combine, which is a retargetable peephole optimizer [2].
The production code generator replaces the call on combine with hard-coded case analysis in the cases for target instructions. This case analysis takes the form of an if-then-else chain that may edit the dag and jump off to the case that handles the new opcode. An example is presented in due course.

While the code is most easily introduced in the form above, it is actually optimized slightly. The code generator generator does not emit redundant branches, so some cases fall into their successor. (Recall that C cases exit only on an explicit break.) For example, the goto L37 above is really omitted.
Also, the pattern above would have the production code generator's case analysis overwrite a->op (sometimes more than once) before leaving the switch statement. rewrite reads this field only upon entry, so it can be safely out-of-date until the break. Thus the code generator slides the assignment to a->op down just before the break, which guarantees that each invocation of rewrite sets it exactly once. In a sense, the program counter encodes the proper value for the opcode field while control remains inside the switch statement. This results in redundant assignments to the opcode field when rewrite re-encounters a multiply-referenced node that has been previously traversed and rewritten, but moving the assignment saves more than it sacrifices.
Two arrays not shown parallel the opcode array. They record for each opcode the number of children and the number of the pattern variable that denotes any result register. rewrite does not need these arrays because their values are compiled into the code; for example, the IADD case has the proper number of recursive calls compiled in, so it need examine no table to learn how many children it has. These arrays are needed only by the register allocator and output routine, which need to know where to store register names and how many children to traverse. Flags in the nodes (namely, zeros in the first unused slots in kids and vars) were used initially but rejected because maintaining them cost almost as much as maintaining the useful data.
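
A sketch of those two parallel arrays, with assumed names; per the text, only the register allocator and the output routine consult them:

char nkids[MAXOPS];   /* nkids[op]: how many children to traverse */
char resvar[MAXOPS];  /* resvar[op]: which pattern variable %i names the result register */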
The Peephole Optimizer and Trace
The training routine combine is a retargetable peephole optimizer. A programmer captures the semantics of the target machine's instructions in a bi-directional grammar for translation between assembly language and register transfers. A machine-independent optimizer uses this machine description to translate pairs and triples of assembler skeletons to register transfer skeletons, which it symbolically simulates to learn their combined effect. It then searches the machine description for an instruction with this combined effect. If it finds one whose cost does not exceed the cost of the original instructions, it rewrites the dag to use the new instruction. If the value produced by an instruction is used several times, its cost is divided equally between its users. A full review of this technique is beyond the scope of this paper, but Reference 2 elaborates. The current implementation adds instruction costs, and the machine descriptions have been re-engineered so that, for example, the current, nearly complete VAX description takes only 59 lines.
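
For flavor only - the actual notation of the machine description in Reference 2 differs - an entry pairing an assembly skeleton with its register-transfer effect and a cost might read:

addl3 r%1,r%0,r%2        r[%2] = r[%0] + r[%1];        cost 1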
During training, the optimizer records every optimization. For example, when it replaces moval _%0,r%1 and movl (r%0),r%1 with movl _%0,r%1 (the moval is the first child of the movl, so the former's result register, r%1, is denoted by r%0 in the latter), the optimizer adds the following record to its growing optimization trace:

self==movl (r%0),r%1
kid0==moval _%0,r%1
new=movl _%0,r%1
refs<=1
a0=b0
a1=a1
result=1
The first three lines are self-explanatory. The fourth reports that, according to the cost metric in the machine description, the optimization pays off only if the child is referenced just once. The next two lines note that the new instruction's %0 is the old child's %0, and the new instruction's %1 is the old parent's %1. The last line above reports that the result register of the new instruction is to be bound to %1. The specification of the code generator names the pattern variable corresponding to the result register for each naive instruction, but the new instruction above has not been seen before, so the optimizer must infer and report the pattern variable corresponding to its result register.
The Production Code Generator
To produce the production system, the code generator generator accepts the trace above and the specification of the naive code generator. It produces an optimizing code generator that is like the naive one presented above, except the opcode list is extended to include all the new instruction variants generated during training, optimizing case analysis is inserted at the head of each case that handles a target instruction, and the call on combine is omitted. Here, for example, are the production versions of the cases presented above:
case 36: L36: /* IADD */
    rewrite(a->kids[0]);
    rewrite(a->kids[1]);
case 37: L37: /* addl3 r%1,r%0,r%2 */
    b = a->kids[0];
    if (b->op == 127 /* mull3 $%1,r%0,r%2 */
        && b->vars[0] == CON4) {
        a->kids[0] = b->kids[0];
        goto L93; /* moval (r%1)[r%0],r%2 */
    }
    if (...
    a->op = 37;
    break;
The conditional looks for a sequence that multiplies a register by four and adds it to another register. The expression b->vars[0] == CON4 compares the %1 from mull3 $%1,r%0,r%2 with the constant string "4". It uses b->vars[0] because %1 is the first pattern variable of b that requires local storage. Strings are stored uniquely in a constant table so that an address comparison can be substituted for what would otherwise be a character-by-character comparison. If the conditional succeeds, the dag is rewritten in place, so the "then" arm overwrites a's fields. In this case, the new values of %1 and %2 are the same as the old ones, so only the change to %0 requires code, which promotes a grandchild.
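
A minimal sketch of such unique storage, i.e. string interning, appears below; the table size, hash function, and names are assumptions. With every string routed through intern, equal strings share one address, so CON4 is simply intern("4") and the comparison above is a single pointer test:

#include <string.h>

char *intern(char *s)
{
    static char *tab[1024];            /* sketch only: no overflow handling */
    unsigned h = 0;
    char *p;
    for (p = s; *p; p++)
        h = h*31 + *p;                 /* simple string hash */
    for (h %= 1024; tab[h]; h = (h+1) % 1024)
        if (strcmp(tab[h], s) == 0)
            return tab[h];             /* seen before: return the stored copy */
    return tab[h] = strdup(s);         /* first occurrence: store and return */
}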
If the conditional fails, the code generator looks for another pattern, at the point of the ellipsis above. If no optimization applies, control falls off the chain of ifs into code that updates a->op and returns.
In the optimization above, the new instruction costs no more than the one originally pointed to by a, so the replacement pays off regardless of the number of uses of b. When the new instruction costs more than a, the replacement generally pays off when a + b/n >= c, where n is the number of uses of b, and a, b, and c denote the costs of a, b, and the new instruction, respectively. All but n are known when the compiler is generated, so the code generator generator computes the largest n for which the replacement pays off and inserts a clause like b->count <= 2 in the optimization's enabling condition (e.g. after the comparison with CON4 above). Different cost metrics (like space, expected time, worst-case time) yield different comparands.
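
To make the arithmetic concrete with assumed unit costs: if a costs 1, b costs 4, and the new instruction costs c = 2, then a + b/n >= c reduces to n <= 4, so the generated clause would be b->count <= 4.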
To support such comparisons, the code generator maintains reference counts as it edits the dag. Consider the example above. It edits the dag so that a references b->kids[0] instead of b. Thus it is necessary to decrement b->count. If the result is zero, then all reference counts are correct: node b is vanishing, but a inherits b's references to its children, so these children have the same number of references before and after the edit. But if --b->count exceeds zero, then b is referenced elsewhere. It still references its children, and now a will too, so the reference counts for b's children must be incremented. Thus the actual then-clause above is

if (--b->count)
    ++b->kids[0]->count;
a->kids[0] = b->kids[0];
goto L93;    /* moval (r%1)[r%0],r%2 */
In cases where b points to a leaf, the counts are maintained with just --b->count. And in cases where the optimization's enabling condition establishes that b->count was one, then even the --b->count is omitted.
Node storage is not reclaimed above because even the simplest implementation consumed almost as much time as the case analysis itself. The compiler thus allocates nodes from a fixed pool and then frees the entire pool at once at the end of the expression, block, or procedure. (All three of these compilation units have been used with this system.)
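
A minimal sketch of that allocation discipline, with assumed names and pool size:

static struct node pool[8192];   /* fixed pool; the size is an assumption */
static int nused;                /* nodes handed out so far */

struct node *newnode(int op)
{
    struct node *p = &pool[nused++];  /* no per-node free() bookkeeping */
    p->op = op;
    p->count = 0;
    return p;
}

void freeall(void)    /* called at the end of the expression, block, or procedure */
{
    nused = 0;        /* release the entire pool at once */
}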
The case analysis above is close to typical. An "average" one has two comparisons, two assignments, and a simple --b->count. A few perform no assignments at all, because all important fields are already in the right place. Of course, an assignment to a->op occurs just before control leaves the switch.
The code generator is fast. a and b are in registers, so each line above takes just one or two VAX instructions, and the entire fragment takes just 17. It has not yet been possible to compile a thorough testbed, but it appears that a complete rewrite should not require more than 60kb.
It is also possible to eliminate most of the jumps above. Rather than ending a change with goto Ln, the code generator generator could simply place case n and its code at the point of the goto. Since most labels are the target of exactly one goto, most of the branches would vanish. This optimization is performed by some existing compilers.
Case analysis like that above could be generated without training on a testbed. The trace encodes simple peephole optimization rules, and there exist mechanisms for enumerating such rules without training on a testbed [6, 7]. These mechanisms are immune to training failures, which can cause the production system to emit code that is sub-optimal (but never incorrect). Experiments have shown that training failures are rare [3], and training does have advantages. It allows the production system to test only rules known to have been useful, and it allows the code generator generator to sort if-then-else chains so that the most common patterns are tested first.
The compiler above gets all of its optimizations from a record of replacements made by a retargetable peephole optimizer, but it could easily accept rewriting rules from other sources as well. The system has already been adapted to accept hand-written optimization rules, and it is a natural client for rules discovered by exhaustive enumeration [8].
Discussion
Two emerging compilers use the techniques above. One uses a modified pcc as a front end and has largely complete back ends for the VAX and the MC68020. The interface between its front end and generated code generators is somewhat less efficient than that shown above. At present, this compiler runs in about 55% of the time taken by pcc. The other compiler uses a new front end and precisely
Citations
Journal ArticleDOI
TL;DR: This article presents a framework for combining constant propagation, value numbering, and unreachable-code elimination, and shows how to combine two such frameworks and how to reason about the properties of the resulting framework.
Abstract: Modern optimizing compilers use several passes over a program's intermediate representation to generate good code. Many of these optimizations exhibit a phase-ordering problem. Getting the best code may require iterating optimizations until a fixed point is reached. Combining these phases can lead to the discovery of more facts about the program, exposing more opportunities for optimization. This article presents a framework for describing optimizations. It shows how to combine two such frameworks and how to reason about the properties of the resulting framework. The structure of the framework provides insight into when a combination yields better results. To make the ideas more concrete, this article presents a framework for combining constant propagation, value numbering, and unreachable-code elimination. It is an open question as to what other frameworks can be combined in this way.

173 citations


Cites background from "Automatic generation of fast optimi..."

  • ...These are called peephole optimizations because the compiler looks through a “peephole”, a very small window, into the code [16, 42, 17, 23, 24]....


Journal ArticleDOI
TL;DR: A framework that enables the exploration, both analytically and experimentally, of properties of code-improving transformations and a tool that automatically produces a transformer that implements the transformations specified in Gospel is presented.
Abstract: Although code transformations are routinely applied to improve the performance of programs for both scalar and parallel machines, the properties of code-improving transformations are not well understood. In this article we present a framework that enables the exploration, both analytically and experimentally, of properties of code-improving transformations. The major component of the framework is a specification language, Gospel, for expressing the conditions needed to safely apply a transformation and the actions required to change the code to implement the transformation. The framework includes a technique that facilitates an analytical investigation of code-improving transformations using the Gospel specifications. It also contains a tool, Genesis, that automatically produces a transformer that implements the transformations specified in Gospel. We demonstrate the usefulness of the framework by exploring the enabling and disabling properties of transformations. We first present analytical results on the enabling and disabling properties of a set of code transformations, including both traditional and parallelizing transformations, and then describe experimental results showing the types of transformations and the enabling and disabling interactions actually found in a set of programs.

130 citations

Proceedings ArticleDOI
06 Apr 2008
TL;DR: NOLTIS is a near-optimal, linear time instruction selection algorithm for DAG expressions that is easy to implement, fast, and effective with a demonstrated average code size improvement of 5.1% compared to the traditional tree decomposition and tiling approach.
Abstract: Instruction selection is a key component of code generation. High quality instruction selection is of particular importance in the embedded space where complex instruction sets are common and code size is a prime concern. Although instruction selection on tree expressions is a well understood and easily solved problem, instruction selection on directed acyclic graphs is NP-complete. In this paper we present NOLTIS, a near-optimal, linear time instruction selection algorithm for DAG expressions. NOLTIS is easy to implement, fast, and effective with a demonstrated average code size improvement of 5.1% compared to the traditional tree decomposition and tiling approach.

30 citations

Journal ArticleDOI
TL;DR: A program that compiles BURS tables into a combination of hard code and data is described, which is not just faster but also significantly smaller than their predecessors.
Abstract: SUMMARY Code generators based on bottom-up rewrite systems (BURS) are automatically generated from machinedescription grammars. They produce locally optimal code for expression trees, but their tables are large and require compile-time interpretation. This paper describes a program that compiles BURS tables into a combination of hard code and data. Hard-coding exposed important opportunities for compression that were previously hidden in the tables, so the hard-coded code generators are not just faster but also significantly smaller than their predecessors. A VAX code generator takes 21.4Kbytes and identifies optimal assembly code in about 50 VAX instructions per node.

28 citations

Patent
12 Feb 2008
TL;DR: In this article, an efficient binary translator uses peephole translation rules to directly translate executable code from one instruction set to another, using superoptimization techniques that enable the translator to automatically learn translation rules for translating code from the source to target instruction set architecture.
Abstract: An efficient binary translator uses peephole translation rules to directly translate executable code from one instruction set to another. In a preferred embodiment, the translation rules are generated using superoptimization techniques that enable the translator to automatically learn translation rules for translating code from the source to target instruction set architecture.

25 citations

References
Proceedings ArticleDOI
01 Jun 1984
TL;DR: Some researchers rely heavily on the semantic components [GaFS~], while others, including ourselves, have emphasized the use of syntax.
Abstract: 3. Syntax and Semantics. A major issue in designing a Graham-Glanville style code generator is the degree of syntactic or semantic specification of the target machine. Broadly speaking, the "syntactic" component of the specification is the machine description grammar. The "semantic" components are the semantic attributes, semantic predicates, semantic actions, and evaluation order constraints influencing the parsing actions. Any machine description methodology is likely to use the same information, but the information is described and considered by the parser in different ways. Some researchers rely heavily on the semantic components [GaFS~], while others, including ourselves, have emphasized the use of syntax.

30 citations

Proceedings ArticleDOI
01 Jun 1984
TL;DR: This global flow analysis allows optimization across basic blocks of instructions, and the use of tables created at compiler-generation time minimizes the overhead of discovering optimizable instructions.
Abstract: Peep is an architectural description driven peephole optimizer, that is being adapted for use in the Portable Standard Lisp compiler. Tables of optimizable instructions are generated prior to the creation of the compiler from the architectural description of the target machine. Peep then performs global flow analysis on the target machine code and optimizes instructions as defined in the table. This global flow analysis allows optimization across basic blocks of instructions, and the use of tables created at compiler-generation time minimizes the overhead of discovering optimizable instructions.

28 citations

Proceedings ArticleDOI
01 Jul 1986
TL;DR: A compiler with a code generator and machine-directed peephole optimizer that are tightly integrated that helps make the compiler simple, fast, and retargetable.
Abstract: This paper describes a compiler with a code generator and machine-directed peephole optimizer that are tightly integrated. Both functions are performed by a single rule-based rewriting system that matches and replaces patterns. This organization helps make the compiler simple, fast, and retargetable. It also corrects certain phase-ordering problems.

25 citations


"Automatic generation of fast optimi..." refers methods in this paper

  • ...Interpretive peephole optimizers have been driven by traces from retargetable peephole optimizers [3] and integrated with interpretive code generators [4], but the current work is distinguished by the production of a hard-coded, optimizing code generator....


Journal ArticleDOI
TL;DR: This paper describes a system that automatically infers rules by tracking the behaviour of a description‐directed optimizer on a testbed, and it adapts a classical optimizer to interpret these rules efficiently.
Abstract: Peephole optimizers that are driven by machine descriptions are generally more thorough but less efficient than their classical rule-directed counterparts. This paper describes a system that addresses this shortcoming. It automatically infers rules by tracking the behaviour of a description-directed optimizer on a testbed, and it adapts a classical optimizer to interpret these rules efficiently. Experiments show that an easily constructed testbed can generate rules similar to those in a large hand-written rulebase. This software forms part of a compiler that simplifies retargeting by substituting peephole optimization for case analysis.

20 citations


"Automatic generation of fast optimi..." refers background or methods in this paper

  • ...Experiments have shown that training failures are rare [3], and training does have advantages....


  • ...Interpretive peephole optimizers have been driven by traces from retargetable peephole optimizers [3] and integrated with interpretive code generators [4], but the current work is distinguished by the production of a hard-coded, optimizing code generator....


Proceedings ArticleDOI
01 Jul 1986
TL;DR: A compiler construction tool that automates much of the case analysis necessary to exploit special purpose instructions on a target machine is designed and built, and a working prototype of the instruction set analyzer needed in the framework outlined by [Giegerich 83].
Abstract: I have designed and built a compiler construction tool that automates much of the case analysis necessary to exploit special purpose instructions on a target machine. Given a suitable description of the target machine, my analysis identifies instruction sequences that are equivalent to single instructions. During code generation, these equivalences can be used to avoid inefficient instruction sequences in favor of more efficient instructions.I present a working prototype of the instruction set analyzer needed in the framework outlined by [Giegerich 83]. In contrast to the work presented in [Davidson and Fraser 80, 84], I analyze machine descriptions during compiler construction, rather than analyzing instruction sequences that occur during code generation. [R Kessler 84] describes a system which analyzes machine descriptions during compiler construction, but which which is limited to discovering instructions that are equivalent to instruction sequences of length 2. The techniques presented here can identify instruction sequences of arbitrary length that are equivalent to single instructions.I have applied this analysis to the descriptions of two machines, and used the results to replace hand-written case analysis routines in an otherwise table-driven code generator [Henry 84].

18 citations


"Automatic generation of fast optimi..." refers methods in this paper

  • ...Pennello has described a technique for replacing an LR parsing table and its interpreter with equivalent optimized assembly code [9]....


Frequently Asked Questions (1)
Q1. What have the authors contributed in "Automatic generation of fast optimizing code generators" ?

This paper describes a system that accepts compact specifications of an intermediate code and target machine and produces program code for an integrated code generator and peephole optimizer. The code generators are produced as follows.