COMPUTER SYSTEMS LABORATORY
STANFORD UNIVERSITY, STANFORD, CA 94305-3055
Code Generation and
Reorganization in the Presence
of Pipeline Constraints
John Hennessy and Thomas Gross
Technical Report No. 224
November 1981
The MIPS project has been supported by the Defense Advanced Research Projects Agency under
contract # MDA903-79-C-0680. Thomas Gross is supported by an IBM Graduate Fellowship.
Computer Systems Laboratory
Departments of Electrical Engineering and Computer Science
Stanford University
Stanford, California 94305
Abstract
Pipeline interlocks are used in a pipelined architecture to prevent the execution of a machine instruction
before its operands are available. An alternative to this complex piece of hardware is to rearrange the
instructions at compile-time to avoid pipeline interlocks. This problem, called code reorganization, is
studied. The basic problem of reorganization of machine level instructions at compile-time is shown to be
NP-complete. A heuristic algorithm is proposed and its properties and effectiveness are explored. The
impact of code reorganization techniques on the rest of a compiler system is discussed.
Key Words and Phrases: Code generation, pipelining, interlocks, instruction reordering, code optimization,
register allocation, microprogramming
A version of this report will appear in the Proc. of the Ninth ACM Conference on Principles of
Programming Languages, 1981.
1 Introduction
Recent research in computer architecture centers around two major trends: the development of
architectures that attempt to support high level language systems through more sophisticated instruction sets,
and the design of simpler architectures that are inherently faster but may rely on more powerful compiler
technology. The latter trend has several properties that make it an attractive host for high level languages
and their compilers and optimizers:
1. Because the instruction set is simpler, individual instructions execute faster.
2. A compiler is not faced with the task of attempting to utilize a very sophisticated instruction that
does not quite fit any particular high level construct. Besides slowing down other instructions,
using these instructions is sometimes slower than using a customized sequence of simpler
instructions [12].
3. Although these architectures may require more sophisticated compiler technology, the potential
performance improvements to be obtained from faster machines and better compilers are
substantial.
Recently, several articles have discussed the relationship between compilers, architectures and performance
[16, 5]. The concept of simplified instruction sets and its benefits, both for compilers and hardware
implementations, is presented in [11, 12].
The unique property of some of these experimental architectures is that they will not perform efficiently
without more sophisticated software technology. This paper investigates a major problem that arises when
generating code for a pipelined architecture that does not have hardware pipeline interlocks. Without
hardware interlocks, naively generated code sequences will not run correctly. These interlocks must be
provided in software by arranging the instructions and inserting no-ops (when necessary) to prevent
undefined execution sequences. There are currently several architectures that require software imposition of
certain types of interlocks [10, 6]. The absence of interlocks is also very common in micromachine
architectures, and the microprogrammer must often address this problem.
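The simplest way to provide an interlock in software is to pad with no-ops. The sketch below illustrates only this naive padding, not the reorganization heuristic proposed in this report. It rests on assumptions that do not come from the report: a hypothetical `Instr` record with one destination register, and a single uniform `delay` giving the number of following instruction slots in which a newly written register may not yet be read.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction record (an assumption, not a real ISA)."""
    op: str
    dest: str               # register written, "" if none
    srcs: Tuple[str, ...]   # registers read


NOP = Instr("nop", "", ())


def insert_noops(code: List[Instr], delay: int = 1) -> List[Instr]:
    """Pad `code` so that no instruction reads a register written by any of
    the `delay` instructions emitted immediately before it."""
    out: List[Instr] = []
    for ins in code:
        # Inspect the last `delay` emitted slots; while one of them produces
        # a register this instruction reads, push it further away with a no-op.
        while any(prev.dest and prev.dest in ins.srcs
                  for prev in out[max(0, len(out) - delay):]):
            out.append(NOP)
        out.append(ins)
    return out


example = [
    Instr("ld", "r1", ("r2",)),
    Instr("add", "r3", ("r1", "r4")),   # reads r1 right after the load
    Instr("sub", "r5", ("r6", "r7")),   # independent of both
]
padded = insert_noops(example, delay=1)
```

In `example` the independent `sub` could instead be moved between `ld` and `add`, which would remove the need for the no-op; padding alone never makes use of such opportunities.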
1.1 Background
A pipelined processor is one in which several sequential instructions are in simultaneous execution, usually
in different phases. One component of an instruction may refer to a component that is computed in an earlier
instruction. Because the earlier instruction may still be executing, the value of the component may not be
available. A hardware mechanism, called a pipeline interlock, prevents the latter instruction from continuing
until the needed value is available.
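The effect of a hardware interlock can be illustrated with a small timing model. This is a sketch under simplifying assumptions that are not taken from the report: instructions issue in order, at most one per cycle; every result becomes readable `latency` cycles after its instruction issues; and each instruction is given as a hypothetical `(dest, srcs)` pair of register names.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding: (destination register, source registers).
Instruction = Tuple[str, Tuple[str, ...]]


def issue_cycles(code: List[Instruction], latency: int = 2) -> List[int]:
    """Return the cycle in which each instruction issues when an interlock
    holds it back until all of its source registers are available."""
    ready: Dict[str, int] = {}   # register -> first cycle its value can be read
    cycles: List[int] = []
    next_cycle = 0               # earliest cycle the next instruction may issue
    for dest, srcs in code:
        # The interlock delays issue until every source value is ready.
        issue = max([next_cycle] + [ready.get(r, 0) for r in srcs])
        cycles.append(issue)
        if dest:
            ready[dest] = issue + latency
        next_cycle = issue + 1
    return cycles


program = [
    ("r1", ("r2",)),   # produces r1
    ("r3", ("r1",)),   # needs r1, so it may be held by the interlock
    ("r4", ("r5",)),   # independent, but issues in order behind the stall
]
stalled = issue_cycles(program, latency=2)
```

With `latency=2` the second instruction is held for one cycle and the third waits behind it; with `latency=1` no instruction is delayed.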
Figure 1 shows a typical pipeline configuration. This pipe has three stages: psa, psb, psc.
Three
instructions