Superoptimizer -- A Look at the Smallest Program
Henry Massalin
Department of Computer Science
Columbia University
New York, NY 10027
Abstract
Given an instruction set, the superoptimizer finds the shortest program to compute a function. Startling programs have been generated, many of them engaging in convoluted bit-fiddling bearing little resemblance to the source programs which defined the functions. The key idea in the superoptimizer is a probabilistic test that makes exhaustive searches practical for programs of useful size. The search space is defined by the processor's instruction set, which may include the whole set, but it is typically restricted to a subset. By constraining the instructions and observing the effect on the output program, one can gain insight into the design of instruction sets. In addition, superoptimized programs may be used by peephole optimizers to improve the quality of generated code, or by assembly language programmers to improve manually written code.
1. Introduction
The search for the optimal algorithm to compute a function is one of the fundamental problems in computer science. In contrast to theoretical studies of optimal algorithms, practical applications motivated the design, implementation, and use of the superoptimizer. Instead of proving upper or lower bounds for abstract algorithms, the superoptimizer finds the shortest program in the program space defined by the instruction set of commercial machines, such as the Motorola 68000 or Intel 8086.
The functions to be optimized are specified with programs written using the target machine's instruction set. Therefore, the input to the superoptimizer is a machine language program. The output is another program, which may be shorter. Since both programs run on the same processor, with a well-defined environment, we can establish their equivalence.
A probabilistic test and a method for pruning the search tree make the superoptimizer a practical tool for programs of limited size (about 13 machine instructions).
In section 2, we describe an interesting example to illustrate the superoptimizer approach. The design and algorithms used in the superoptimizer are detailed in section 3. We discuss the applications and limitations of the superoptimizer in section 4. In section 5, we compare the superoptimizer with related work. The conclusion in section 6 is followed by a list of interesting minimal programs in appendix I.
2. An Interesting Example
We begin with an example to show what superoptimized code looks like. The instruction set used here, as in most of the paper, is Motorola's 68020 instruction set. Our example is the signum function, defined by the following program:
signum(x)
int x;
{
    if (x > 0) return 1;
    else if (x < 0) return -1;
    else return 0;
}
This function compiles to 9 instructions occupying 18 bytes of memory on the SUN-3 C compiler. Most programmers, when asked to write this function in assembly language, would use comparison instructions and conditional jumps to decide in what range the argument lies. Typically, this takes 8 68020 instructions, although clever programmers can do it in 6.
It turns out that by exploiting various properties of two's complement arithmetic one can write signum in four instructions! This is what superoptimizer found when fed the compiled machine code for the signum function as input:
(x in d0)
add.l   d0,d0       | add d0 to itself
subx.l  d1,d1       | subtract (d1 + Carry) from d1
negx.l  d0          | put (0 - d0 - Carry) into d0
addx.l  d1,d1       | add (d1 + Carry) to d1
(signum(x) in d1)   (4 instructions)
Like a typical superoptimized program, the logic is really convoluted. One of the first things that comes to mind is "where are the conditional jumps?". As we will see later, many functions that would normally be written with conditional jumps are optimized into short programs without them. This can result in significant speedups for certain pipelined machines that execute conditional jumps slowly.
Let us see how it works. The "add.l d0,d0" instruction doubles the contents of register d0, but more importantly, the sign bit is now in the carry flag. The "subx.l d1,d1" instruction computes "d1 - d1 - carry --> d1". Regardless of the initial value of d1, d1 - d1 - carry is -carry. Thus d1 is -1 if d0 was negative and 0 otherwise. Besides negating, "negx.l d0" will set the carry flag if and only if d0 was nonzero. Finally, "addx.l d1,d1" doubles d1 and adds the carry. Now if d0 was negative, d1 is -1 and carry is set, so d1+d1+carry is -1; if d0 was 0, d1 is 0 and carry is clear, so d1+d1+carry is 0; if d0 was positive, d1 is 0 and carry is set, so d1+d1+carry is 1.
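This reasoning can be checked mechanically. The following C fragment is a sketch (not part of the original listing) that simulates the four instructions, with the variable flag standing in for the 68020 carry/extend bit that subx, negx, and addx consume:

#include <assert.h>
#include <stdint.h>

static int32_t signum4(int32_t x)
{
    uint32_t d0 = (uint32_t)x;
    uint32_t d1 = 0xDEADBEEF;      /* initial value of d1 is irrelevant */
    uint32_t flag, old;

    flag = d0 >> 31;               /* add.l d0,d0: carry = old sign bit */
    d0 += d0;
    d1 = d1 - d1 - flag;           /* subx.l d1,d1: d1 = -carry */
    old = flag;
    flag = (d0 != 0) || old;       /* negx.l d0: borrow iff (d0 + carry) != 0 */
    d0 = 0u - d0 - old;
    d1 = d1 + d1 + flag;           /* addx.l d1,d1 */
    return (int32_t)d1;
}

int main(void)
{
    for (int32_t x = -1024; x <= 1024; x++)
        assert(signum4(x) == (x > 0) - (x < 0));
    return 0;
}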

3. Superoptimizer Internals
Superoptimizer takes a program written in machine language as the input source. It finds the shortest program that computes the same function as the source program by doing an exhaustive search over all possible programs. The search space is defined by choosing a subset of the machine's instruction set, and the op-codes of these instructions are stored in a table. Superoptimizer consults this table and generates all combinations of these instructions, first of length 1, then of length 2, and so on. Each of these generated programs is tested, and if found to match the function of the source program, superoptimizer prints the program and halts.
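The overall shape of this generate-and-test loop is sketched below in C (an illustration only; the table size and the test hook are placeholder assumptions, not the author's code):

enum { MAX_LEN = 13, NUM_OPS = 32 };          /* assumed opcode-table size */

/* Placeholder for the equivalence tests of sections 3.1 and 3.2. */
static int test_program(const int *ops, int len) { (void)ops; (void)len; return 0; }

static int search(int out[MAX_LEN])
{
    for (int len = 1; len <= MAX_LEN; len++) {
        int ops[MAX_LEN] = {0};
        for (;;) {
            if (test_program(ops, len)) {
                for (int i = 0; i < len; i++) out[i] = ops[i];
                return len;                    /* first hit is the shortest */
            }
            int i = len - 1;                   /* odometer-style increment */
            while (i >= 0 && ++ops[i] == NUM_OPS) ops[i--] = 0;
            if (i < 0) break;                  /* this length is exhausted */
        }
    }
    return 0;                                  /* nothing found */
}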
Two methods are used to reduce the search time. The first is a fast probabilistic test for determining the equivalence of two programs. The second is a method for pruning the search space while maintaining the guarantee of optimality. These two methods will now be discussed, but first a boolean-logic equivalence test will be explained, which was the first test procedure implemented, because it finds use in the tree pruning method.
3.1. Boolean Test
The most important part of superoptimizer is the routine that determines whether two pieces of code compute the same function. The first version of superoptimizer used what we call the boolean program verifier. The idea was to express the function output in terms of boolean-logic operations on the input argument. Once this is done, two programs are equivalent if their boolean expressions match minterm for minterm.
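As a toy illustration (ours, not the paper's representation): for a machine with a single 8-bit input register, the truth table over all 256 inputs lists every minterm explicitly, so the comparison is simply a table comparison:

#include <stdint.h>

typedef uint8_t (*prog_fn)(uint8_t);

/* Two programs compute the same function iff their truth tables
 * agree on every input, i.e. match minterm for minterm. */
static int bool_equivalent(prog_fn f, prog_fn g)
{
    for (int x = 0; x < 256; x++)
        if (f((uint8_t)x) != g((uint8_t)x))
            return 0;
    return 1;
}

With 32-bit registers the table for each output bit becomes astronomically larger, which is the difficulty described next.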
In practice, some instructions such as add and mul have boolean expressions with on the order of 2^31 minterms. Various methods were devised to reduce the memory requirements, but it took too long to compute the boolean expressions for every program generated. The initial version of superoptimizer tested about 40 programs per second, and this allowed programs of up to 3 instructions to be generated in reasonable time.
3.2. Probabilistic Test
The idea behind the probabilistic test is simple: run the machine code for the program being tested a few times with some set of inputs and check whether the outputs match those of the source program. The idea here is that most programs will fail this simple test, and a full program verification test will be done only for the few programs that this test fails to catch. Running through a few carefully chosen test vectors takes very little time. Currently, superoptimizer can test 50,000 programs per second and the exhaustive search approach becomes practical.
The test vectors are chosen (manually) to maximize the probability that a random program will fail on the first or second test. For example, the test vectors for the signum function included -1000, 0 and 456 as the first three vectors. This quickly eliminates programs that return the same answer regardless of argument, answers of the same sign, as well as programs that return their argument. Following these vectors, all the numbers from -1024 to 1024 were tested.
It was found in practice that a program has a very low probability of passing this execution test and failing the boolean verification test. This fact proves very useful since most programs of interest have boolean expressions that are too large to fit in memory. We can dispense with the boolean test and manually inspect the generated programs for correctness, without having to analyze a large number of wrong programs. This manual check is not difficult since the programs are small (about 4 to 13 instructions). Currently, superoptimizer runs without the boolean check, and the author has yet to find an incorrect program.
One problem introduced by the probabilistic execution test is machine dependency. The test works only if the instruction set being searched can be executed on the machine running the superoptimizer. In other words, if we wish to change the instruction set, we would have to port the superoptimizer to the new machine. This port is not too difficult since the current version of superoptimizer is rather short (about 300 lines of 68020 assembly code); however, it does require that one translate it into the target assembly code.
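A sketch of this screening in C (run_candidate and run_source are hypothetical hooks standing for execution of the generated code and the source program; they are not part of the paper):

#include <stdint.h>

extern int32_t run_candidate(const int *ops, int len, int32_t arg);
extern int32_t run_source(int32_t arg);

static int passes_screen(const int *ops, int len)
{
    static const int32_t quick[3] = { -1000, 0, 456 };   /* signum's vectors */
    for (int i = 0; i < 3; i++)
        if (run_candidate(ops, len, quick[i]) != run_source(quick[i]))
            return 0;                                    /* cheap early reject */
    for (int32_t x = -1024; x <= 1024; x++)              /* fuller sweep */
        if (run_candidate(ops, len, x) != run_source(x))
            return 0;
    return 1;    /* survivor: verify fully or inspect by hand */
}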
3.3. Pruning
In order to further reduce the search time, we filter out instruction sequences that are known not to occur in any optimal program. Any sequence of instructions that has the same effect on the machine state as a shorter sequence cannot be part of an optimal program, because if it were, you could get a shorter program by substituting the shorter sequence, and therefore the program was not optimal. Typical sequences include the obviously silly "move X,Y; move X,Y" and "move X,Y; move Y,X", "and X,Y; move Z,Y" in which the MOVE destroys the result of the AND, "and #0,X" which does the same thing as "clr X", and "and X,Y; <any> Z,W; and.l X,Y" where the second AND is superfluous.
This filtering is done with N-dimensional bit tables, where N is the length of the longest sequence we wish to filter. Each instruction in the sequence we wish to test indexes one dimension of the bit table, and a lookup value of '1' causes the program to be rejected as non-optimal (and also as incorrect, since it is the same as a shorter program, and superoptimizer has already checked all shorter programs).
There are two ways that these bit tables can be filled. A human can tell the bit table maker program to exclude all "move X,Y; move Y,X" sequences. The program then scans all instructions in all dimensions of the bit matrix and sets the values accordingly. One can also run superoptimizer with the boolean test, and have it find the equivalences on its own.
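For N = 2 the table reduces to one bit per ordered pair of opcodes. A sketch, assuming a small dense opcode numbering (the layout is ours, not the author's):

#include <stdint.h>

enum { NOPS = 256 };                          /* assumed opcode count */
static uint8_t bad_pair[NOPS][NOPS / 8];      /* 1 = pair is never optimal */

static void forbid(int op1, int op2)
{
    bad_pair[op1][op2 >> 3] |= (uint8_t)(1u << (op2 & 7));
}

static int is_pruned(int op1, int op2)
{
    return (bad_pair[op1][op2 >> 3] >> (op2 & 7)) & 1;
}

The search loop consults is_pruned on each adjacent pair of opcodes before emitting a candidate, so whole subtrees of the search are skipped at once.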
4. Applications and Limitations
4.1. Current Limitations
Even with the probabilistic test, the exhaustive search still grows exponentially with the number of instructions in the generated program. The current version of superoptimizer has generated programs 12 instructions long in several hours running time on a 16MHz 68020 computer. Therefore, the superoptimizer has limited usefulness as a code generator for a compiler.
Another difficulty concerns pointers. A pointer can point anywhere in memory, and so to model a pointer in terms of boolean expressions one needs to take all of memory into account. Even on a 256-byte machine, there are 2^(2^(256*8)) possible minterms, and these are just too many. We have explored the probabilistic test approach for pointers, but the results have been inconclusive.
Currently, we have only the 68020 version of the superoptimizer running the probabilistic test, so the instruction sets are restricted to subsets of the 68020 set. The machine-independent version of superoptimizer is limited to very short programs.
4.2. Applications
Because of the pointer problem, superoptimizer works best when the instruction set is constrained to register-register operations. Even so, it can be used to analyze instruction sets. Some of the programs in appendix I were tried on the Western Electric WE32000 microprocessor, and in every case the resulting program was longer than the 68020 programs. The reason for this was found to be the lack of an add-with-carry instruction and the fact that the flags are set according to the 32-bit result, even for byte-sized operands. The National Semiconductor NS32032 was also found to suffer from flag problems. Here the difficulty is that extra instructions are needed to test the outcome of an operation because few instructions set the flags.
Another use would be in the design of RISC architectures. One can
try various instruction sets simply by coding their function in terms
of boolean expressions and seeing what superoptimizer comes up
with. A particular instruction may be omitted if superoptimizer finds
a short equivalent sequence of other instructions.
The superoptimizer may be very useful in optimizing little tasks that
often confront a compiler. An example is finding the optimal
program that multiplies by a particular constant for use in accessing
arrays and such. Some examples of multiplication by constants can be found in appendix I.6.
Another useful feature of superoptimizer is the identity tables containing the equivalent program sequences found. These programs may be extracted and used to increase the power of a conventional peephole optimizer.
In practice, the best use of superoptimizer has been as an aid to the assembly language programmer. An experienced programmer can use superoptimizer to come up with nifty equivalent sequences for small sections of his code, while retaining the overall logical flow that makes a program maintainable. This method has been used by the author (along with another program that optimizes code emulating state machines) to write the C library function printf in only 500 bytes.
5. Comparison with Related Work
The most commonly used optimization techniques are those that attempt to improve the code that a compiler produces. Examples are peephole optimizers and data-flow analysis. Peephole optimizers [2] are table-driven pattern matchers that operate on the assembly language code produced by the compiler. Every time a sequence of instructions is matched by one of the tables, a smaller and faster replacement sequence is used.
Data-flow analysis [1] is a technique applied during the semantic and code generation phases of the compilation process. It improves code in several ways. First, it eliminates redundant computations (common sub-expression elimination). Second, it moves expressions within a loop whose values do not depend on the loop variable to outside the loop (loop invariance). Third, (also in a loop) it converts expressions of the form 'K * loop-index' into the equivalent arithmetic progression 'TMP = TMP + K' (strength reduction).
These methods are general. They work regardless of machine-specific details such as the representation of an integer. However, usually the result is not optimal in either space or speed. Superoptimizer depends on the instruction set; however, the code is guaranteed to be optimal in space and it does a very good job in speed as well.
Krumme and Ackley [4] have written a code generator for the DEC-10 computer that is based on exhaustive search. Their method translates each interior node of an expression tree into several viable instruction sequences. These sequences are then pieced together to form a set of translations for the entire expression. This set is then searched to find the cheapest alternative.
In their method, there is a one-to-one correspondence between the instructions in the translation and the original expression. For example, if there's an add in the expression, there will also be an add somewhere in the generated code. Superoptimizer has a more global view of the problem. It 'translates' one sequence of instructions into another completely different sequence. On the other hand, superoptimizer can't translate large programs.
The two approaches can be seen as complementing each other. Superoptimizer can be used to prepare the code generation tables used in Krumme and Ackley's method. Their method can also be incorporated into superoptimizer to increase the size of programs that can be handled. Superoptimizer can generate several short equivalent sequences for small fragments of the source program, and then Krumme and Ackley's method would be used to piece these together and find a short overall sequence.
Kessler [3] has written a code optimization tool, which translates sequences of instructions into one single instruction. The superoptimizer can be seen as a more general tool with broader applications, since it can transform programs of many instructions into another one of several instructions. However, Kessler's optimizer works regardless of program size, and therefore can be easily used to optimize compiled code. Another difference is that he uses template matching, while superoptimizer relies on exhaustive search.
6. Conclusion
We have taken a practical approach to the search for the optimal program. We have found that the shortest programs are surprising, often containing sequences of instructions that one would not expect to see side by side. The signum function is an example of this, and the min and max functions given in appendix I.3 contain a beautiful combination of the logical and and the arithmetic add.
Exhaustive search is justified by these results, and a probabilistic test allows programs of practical size to be produced. Although results are limited to a dozen instructions, those found are already useful. Many examples of these can be found in appendix I.
One of the most interesting results is not the programs themselves, but a better understanding of the interrelations between arithmetic and logical instructions. Similar ideas seem to come up consistently in the superoptimized programs. These include the sequence 'add.l d1,d1; subx.l d1,d1' that extracts the sign of a number in the signum and abs functions, and the sequence 'sub.l d1,d0; and.l d2,d0; add.l d1,d0' that selects one of two values depending on a third in the min and max functions.
In the future, we hope to explore these ideas further, and compile a
list of useful arithmetic-logical idioms that can be concatenated to
form optimal or near-optimal programs.
Appendix I. More Interesting Results
I.1. Signum Function
The signum function has been defined in section 2. Given the 68000 instruction set, four is the minimum number of instructions to compute signum. Interestingly, three suffice on the 8086.
(x in ax)
cwd             (sign extends register ax into dx)
neg     ax
adc     dx,dx
(signum(x) in dx)

I.2. Absolute Value
Find the absolute value of a number, excluding conditional jumps from the instruction set.
(x in d0)
move.l  d0,d1
add.l   d1,d1
subx.l  d1,d1
eor.l   d1,d0
sub.l   d1,d0
(abs(x) in d0)
Notice that although it is longer than the classical method (test;
jump-if-positive; negate), it has no jumps! This might actually be
faster than the classical method on some pipelined machines where
jumps are expensive.
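The same trick reads naturally in C (a sketch; right-shifting a negative int is assumed to be arithmetic, as it is on the 68020 and virtually all compilers):

#include <stdint.h>

static int32_t abs_nojump(int32_t x)
{
    int32_t mask = x >> 31;       /* -1 if x < 0, else 0 (like subx.l d1,d1) */
    return (x ^ mask) - mask;     /* complement, then add 1, when negative */
}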
I.3. Max and Min
This program finds the maximum of the unsigned numbers in d0 and d1 and returns the answer in d0. The comments on the right show what's in the various registers during execution and are similar to the boolean expression checker's method of analysis.
(d0 = X, d1 = Y)    |Flag,Reg  |If d1 > d0   |If d1 <= d0
sub.l   d1,d0       |(C,d0) =  |(1, X-Y)     |(0, X-Y)
subx.l  d2,d2       |(C,d2) =  |(1, 11...11) |(0, 00...00)
or.l    d2,d0       |(C,d0) =  |(1, 11...11) |(0, X-Y)
addx.l  d1,d0       |d0 =      |Y            |X
(d0 = max(X, Y))
This program finds the minimum of the unsigned numbers in d0 and d1 and returns the answer in d0.
(d0 = X, d1 = Y)    |Flag,Reg  |If d1 > d0   |If d1 <= d0
sub.l   d1,d0       |(C,d0) =  |(1, X-Y)     |(0, X-Y)
subx.l  d2,d2       |d2 =      |111...111    |000...000
and.l   d2,d0       |d0 =      |X-Y          |0
add.l   d1,d0       |d0 =      |X            |Y
(d0 = min(X, Y))
Simultaneous min and max.
(d0 = X, d1 = Y)    |Flag,Reg  |If d1 > d0   |If d1 <= d0
sub.l   d1,d0       |(C,d0) =  |(1, X-Y)     |(0, X-Y)
subx.l  d2,d2       |d2 =      |111...111    |000...000
and.l   d0,d2       |d2 =      |X-Y          |0
eor.l   d2,d0       |d0 =      |0            |X-Y
add.l   d1,d0       |d0 =      |Y            |X
add.l   d2,d1       |d1 =      |X            |Y
(d0 = max(X, Y), d1 = min(X, Y))
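The max sequence translates into C as follows (a sketch for checking the table; the borrow out of the unsigned subtract plays the role of the 68020 carry/extend flag):

#include <stdint.h>

static uint32_t max_u32(uint32_t x, uint32_t y)   /* d0 = X, d1 = Y */
{
    uint32_t diff = x - y;                /* sub.l  d1,d0 */
    uint32_t borrow = (y > x);            /* carry out of the subtract */
    uint32_t mask = 0u - borrow;          /* subx.l d2,d2: all 1s or all 0s */
    return (diff | mask) + y + borrow;    /* or.l d2,d0; addx.l d1,d0 */
}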
I.4. Logical Tests
Here are some logical tests that yield true/false answers. Sequences such as these have immediate application in a compiler to improve execution speed. Shown here are the tests for zero and non-zero.
Suitable for BASIC            Suitable for C, PASCAL

d0 =  0 if d0 == 0            d0 = 0 if d0 == 0
   = -1 if d0 != 0               = 1 if d0 != 0

neg.l   d0                    neg.l   d0
subx.l  d0,d0                 subx.l  d0,d0
                              neg.l   d0

d0 = -1 if d0 == 0            d0 = 1 if d0 == 0
   =  0 if d0 != 0               = 0 if d0 != 0

neg.l   d0                    neg.l   d0
subx.l  d0,d0                 subx.l  d0,d0
not.l   d0                    addq.l  #1,d0
By prepending 'move.l A,d0; sub.l B,d0' to the above one can construct tests for A == B and A != B.
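In C, the neg/subx pair amounts to materializing the carry of a negation (a sketch of the BASIC convention):

#include <stdint.h>

static uint32_t nonzero_mask(uint32_t d0)
{
    uint32_t carry = (d0 != 0);    /* neg.l d0 sets carry iff d0 != 0 */
    return 0u - carry;             /* subx.l d0,d0 leaves -carry */
}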
I.5. Decimal to Binary
This piece converts an 8 digit BCD number stored in d0, one digit to a nibble, to binary, with the result also in d0. It is the longest sequence ever generated by superoptimizer, and was actually done in three sequences to multiply by 10. At first I had superoptimizer compute the 2 digit BCD to binary conversion function '((d0 & 0xF0) >> 4) * 10 + (d0 & 0x0F)'. This came out surprisingly short:
(2 digit BCD number in d0)
move.b  d0,d1
and.b   #$F0,d1
lsr.b   #3,d1
sub.b   d1,d0
sub.b   d1,d0
sub.b   d1,d0
(binary equivalent in d0)
What is actually being computed is

    ans = d0 - 3 * ((d0 & 0xF0) / 8)

Representing the contents of d0 as (H:L), where H is the upper nibble and L is the lower nibble, we get

    d0 = 16*H + L,   d0 & 0xF0 = 16*H
    ans = (16*H + L) - 3 * (16*H / 8)
        = 16*H + L - 6*H
        = 10*H + L
which is the 2 digit BCD to binary function. Encouraged by this
result, superoptimizer was put to the task of computing first the 4
digit BCD to binary function and then the 8 digit BCD to binary
function. Here is the 8 digit converter:
(8 digit BCD number in d0)
move.l  d0,d1             *
and.l   #$F0F0F0F0,d1     *
lsr.l   #3,d1             *
sub.l   d1,d0             *
sub.l   d1,d0             *
sub.l   d1,d0             *
move.l  d0,d1             +
and.l   #$FF00FF00,d1     +
lsr.l   #1,d1             +
sub.l   d1,d0             +
lsr.l   #2,d1             +
sub.l   d1,d0             +
lsr.l   #3,d1             +
add.l   d1,d0             +
move.l  d0,d1
swap    d1
mulu    #$D8F0,d1
sub.l   d1,d0
(binary equivalent in d0)
What is most amazing is the first section (marked by * alongside the program). It looks exactly like the 2 digit BCD to binary function. This section computes 4 simultaneous 2 digit BCD to binary functions on adjacent pairs of nibbles and deposits the answer back into the byte occupied by those nibbles. The second part (marked by +) computes two simultaneous 2-byte base 100 to binary conversion functions. Finally, the third part computes the function 'high-word-of-d0 * 10000 + low-word-of-d0' to complete the conversion.
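A C transcription of the whole converter (a sketch used to check the listing, assuming valid packed BCD input; note 0xD8F0 = 65536 - 10000):

#include <assert.h>
#include <stdint.h>

static uint32_t bcd_to_bin(uint32_t d0)
{
    uint32_t d1;
    d1 = (d0 & 0xF0F0F0F0u) >> 3;   /* stage *: per byte, d1 = 2*H          */
    d0 -= d1; d0 -= d1; d0 -= d1;   /* per byte, 16H+L - 6H = 10H+L         */
    d1 = (d0 & 0xFF00FF00u) >> 1;   /* stage +: per word, d1 = 128a         */
    d0 -= d1;                       /* 256a+b - 128a = 128a+b               */
    d1 >>= 2;                       /* 32a                                  */
    d0 -= d1;                       /* 96a+b                                */
    d1 >>= 3;                       /* 4a                                   */
    d0 += d1;                       /* 100a+b                               */
    d1 = d0 >> 16;                  /* stage 3: high word H (swap d1)       */
    return d0 - d1 * 0xD8F0u;       /* H*65536+L - H*55536 = H*10000+L      */
}

int main(void)
{
    assert(bcd_to_bin(0x12345678u) == 12345678u);
    assert(bcd_to_bin(0x99999999u) == 99999999u);
    return 0;
}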
I.6. Multiplication by Constants
During a two week period, superoptimizer was used to find minimal programs that multiply by constants. A sampling of these programs is included in this section.
An interesting observation is that the average program size increases as the multiplication constant increases, but it increases very slowly. The average size of programs that multiply by small numbers (less than 40) is 5 instructions, most programs that multiply by numbers in the hundreds are 6 to 7 instructions long, and programs that multiply by thousands are between 7 and 8 instructions long.
d0 *= 29                  d0 *= 39
move.l  d0,d1             move.l  d0,d1
lsl.l   #4,d0             lsl.l   #2,d0
sub.l   d1,d0             add.l   d1,d0
add.l   d0,d0             lsl.l   #3,d0
sub.l   d1,d0             sub.l   d1,d0

d0 *= 156                 d0 *= 625
move.l  d0,d1             move.l  d0,d1
lsl.l   #2,d1             lsl.l   #2,d0
add.l   d1,d0             add.l   d1,d0
lsl.l   #5,d0             lsl.l   #3,d0
sub.l   d1,d0             sub.l   d1,d0
                          lsl.l   #4,d0
                          add.l   d1,d0
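As a check, the multiply-by-29 column transcribes to C as follows (a sketch, not part of the original listing):

#include <assert.h>
#include <stdint.h>

static uint32_t mul29(uint32_t x)
{
    uint32_t d0 = x, d1 = x;    /* move.l d0,d1       */
    d0 <<= 4;                   /* lsl.l #4,d0: 16x   */
    d0 -= d1;                   /* sub.l d1,d0: 15x   */
    d0 += d0;                   /* add.l d0,d0: 30x   */
    d0 -= d1;                   /* sub.l d1,d0: 29x   */
    return d0;
}

int main(void)
{
    for (uint32_t x = 0; x < 1000; x++)
        assert(mul29(x) == 29 * x);
    return 0;
}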
I.7. Division by Constants
Division turns out to be difficult to optimize. A general divide by constant that works for all 32-bit arguments is too long to realize any time gain over the divide instruction, and is certainly not shorter. Additionally, there do not seem to be any nifty arithmetic-logical operations that simplify the process. The generated programs just multiply by the reciprocal of the constant. Since we do an exhaustive search, this negative result can be seen as a confirmation of the inherent high cost of division for the instruction sets considered.
The following programs were generated in an attempt to gain insight
into binary to BCD algorithms, another area where superoptimizer
has had little success. Note that even with the restricted argument
range, these are much longer than the multiply programs.
d0 = trunc(d0/10) for d0 = 0..99
move.b  d0,d1
add.b   d0,d0   | d0 = 10 * x
lsr.b   #1,d1   | d1 = .1 * x
add.b   d1,d0   | d0 = 10.1 * x
lsr.b   #3,d0   | d0 = .0101 * x
add.b   d1,d0   | d0 = .1101 * x
lsr.b   #3,d0   | d0 = .0001101 * x
d0 = trunc(d0/100) for d0 = 0..9999
move.w  d0,d1
lsr.w   #1,d1   | d1 = .1 * x
add.w   d0,d0   | d0 = 10 * x
add.w   d0,d1   | d1 = 10.1 * x
lsr.w   #5,d0   | d0 = .0001 * x
add.w   d1,d0   | d0 = 10.1001 * x
lsr.w   #8,d1   | note: you can't lsr.w #10,d1
lsr.w   #2,d1   | d1 = .00000000101 * x
sub.w   d1,d0   | d0 = 10.10001111011 * x
lsr.w   #8,d0   | d0 = .0000001010001111011 * x
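Both dividers can be checked exhaustively over their stated ranges; here is the divide-by-10 sequence in C (a sketch; the binary-point comments mirror the listing):

#include <assert.h>
#include <stdint.h>

static uint8_t div10(uint8_t x)
{
    uint8_t d0 = x, d1 = x;        /* move.b d0,d1                 */
    d0 = (uint8_t)(d0 + d0);       /* add.b d0,d0:  10 * x         */
    d1 >>= 1;                      /* lsr.b #1,d1:  .1 * x         */
    d0 = (uint8_t)(d0 + d1);       /* add.b d1,d0:  10.1 * x       */
    d0 >>= 3;                      /* lsr.b #3,d0:  .0101 * x      */
    d0 = (uint8_t)(d0 + d1);       /* add.b d1,d0:  .1101 * x      */
    d0 >>= 3;                      /* lsr.b #3,d0:  .0001101 * x   */
    return d0;
}

int main(void)
{
    for (int x = 0; x <= 99; x++)
        assert(div10((uint8_t)x) == x / 10);
    return 0;
}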
References
[1] Aho, A.V., Sethi, R., and Ullman, J.D. Compilers: Principles, Techniques, and Tools. Addison-Wesley, 1986.
[2] Davidson, J.W. and Fraser, C.W. Automatic Generation of Peephole Optimizations. In Proceedings of the ACM SIGPLAN '84 Symposium on Compiler Construction, pages 111-116. ACM/SIGPLAN, June 1984.
[3] Kessler, P.B. Discovering Machine-Specific Code Improvements. In Proceedings of the ACM SIGPLAN '86 Symposium on Compiler Construction, pages 249-254. ACM/SIGPLAN, June 1986.
[4] Krumme, D.W. and Ackley, D.H. A Practical Method for Code Generation Based on Exhaustive Search. In Proceedings of the ACM SIGPLAN '82 Symposium on Compiler Construction, pages 185-196. ACM/SIGPLAN, June 1982.