Register Allocation and Binding for Low Power
Jui-Ming Chang and Massoud Pedram
Department of Electrical Engineering-Systems
University of Southern California
Los Angeles, CA 90089
Abstract
This paper describes a technique for calculating the switching activity of a set of registers shared by different data values. Based on the assumption that the joint pdf (probability density function) of the primary input random variables is known or that a sufficiently large number of input vectors has been given, the register assignment problem for minimum power consumption is formulated as a minimum cost clique covering of an appropriately defined compatibility graph (which is shown to be transitively orientable). The problem is then solved optimally (in polynomial time) using a max-cost flow algorithm. Experimental results confirm the viability and usefulness of the approach in minimizing power consumption during the register assignment phase of the behavioral synthesis process.
1 Introduction
One driving factor behind the push for low-power design is the growing class of personal computing devices, as well as wireless communications and imaging systems, that demand high-speed computation and complex functionality with low power consumption. Another driving factor is that excessive power consumption is becoming the limiting factor in integrating more transistors on a single chip or on a multiple-chip module. Unless power consumption is dramatically reduced, the resulting heat will limit the feasible packing density and performance of VLSI circuits and systems.
The behavioral synthesis process consists of three phases: allocation, assignment and scheduling. These phases determine how many instances of each resource are needed (allocation), on which resource each computational operation will be performed (assignment), and when it will be executed (scheduling). Traditionally, behavioral synthesis aims to minimize the number of resources required to perform a task in a given time, or to minimize the execution time for a given set of resources. It has now become necessary to develop behavioral synthesis techniques that also account for power dissipation in the circuit.
This research was supported in part by the NSF's Young Investigator Award under Contract No. MIP-9457392 and a grant from the Intel Corp.
This extends the two-dimensional optimization problem (resources and time) to a third dimension. The three phases of the behavioral synthesis process must thus be modified to produce low-power circuits. Unfortunately, power dissipation is a strong function of signal statistics and correlations, and hence is non-deterministic: it depends on the data applied to the circuit.
Automatic techniques must be developed that minimize the switching activity on globally shared buses and register files, select low-power macros that satisfy the timing constraints, schedule operations to minimize the switching activity from one cycle step to the next, and so on. This paper considers register assignment for low power.
Most high-level synthesis systems schedule the control and data flow graph (CDFG) before allocating registers and modules and synthesizing the interconnect [11][18][7], because this order provides timing information for the allocation and assignment tasks. Other systems perform resource allocation and binding before scheduling, so that more precise timing information is available during scheduling [9]. Each approach has its own advantages and shortcomings. The present work assumes that the CDFG has already been scheduled, and performs register allocation before the allocation of modules and interconnect.
During register allocation and assignment, data values (arcs in the data flow graph) can share the same physical register if their lifetimes do not overlap. In the past, researchers have proposed various techniques to reduce the total number of registers used. The existing approaches include rule-based [6], greedy or iterative [10], branch and bound [13], linear programming [1], and graph-theoretic methods, as in the Facet system [18], the HAL system [16] and the EASY system [17].
The power consumption of well-designed register sets depends mainly on the total switching activity of the registers. In many applications, the data streams input to the circuit have certain probability distributions. Different ways of sharing registers among data values therefore produce different switching activities in these registers. This work presents a novel way of calculating this switching activity, based on the assumption that the joint pdf (probability density function) of the primary input random variables is known or that a sufficiently large number of input vectors has been given. In the latter case, the joint pdf can be obtained by statistical methods. After obtaining the joint pdf of the primary input variables, the pdf of any internal arc (data value) in the data flow graph and the joint pdf of any pair of arcs (data values) are calculated by a method described in detail in the following sections. The switching activity on a pair of arcs is then formulated in terms of the joint pdf of these arcs or, alternatively, in terms of a function of the joint pdf of all primary input variables.

32nd ACM/IEEE Design Automation Conference
Permission to copy without fee all or part of this material is granted, provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. 1995 ACM 0-89791-756-1/95/0006 $3.50
The lifetime of each arc (data value) in a scheduled data flow graph is the time during which the data value is active (valid), and is defined by an interval [birth time, death time]. A compatibility graph G(V,A) for these arcs (data values) is then constructed, where vertices correspond to data values, and there is a directed arc (u,v) between two vertices if and only if their lifetimes do not overlap and u comes before v. We will show that the unoriented compatibility graph for the arcs (data values) in a scheduled data flow graph without cycles and branches is a comparability graph (or transitively orientable graph), which is a perfect graph [5]. This is a very useful property, as many graph problems (e.g., maximum clique and maximum weight k-clique covering) can be solved in polynomial time for perfect graphs, while they are NP-complete for general graphs.
Having calculated the switching activity between pairs of arcs that could potentially share the same register, and given the number of registers to be used, the register assignment problem for minimum power consumption is formulated as a minimum cost clique covering of the compatibility graph. The problem is then solved optimally (in polynomial time) using a max-cost flow algorithm.
The two problems, calculation of the cross-arc switching activities (which must be performed $O(|E|)$ times, where $|E|$ is the number of edges in the compatibility graph) and power minimization during register assignment, are independent. The cross-arc switching activities can be calculated by any means; we present one such technique below, but other techniques may be used instead. The power optimization is performed once the cross-arc switching activities are known.
The remainder of this paper is organized as follows. Section 2 shows how to calculate the switching activity between pairs of data values (arcs). Section 3 shows how to optimize the power consumption of registers in the register allocation phase of behavioral synthesis. Section 4 presents examples that demonstrate the methodology.
2 Switching Activity Calculation
2.1 Calculation of various pdfs
In many instances, the input data streams are partially known and can thus be described by probability distributions. (Our proposed method applies not only to well-known probability distributions, such as the joint Gaussian distribution, but also to arbitrary probability distributions.)
Given a sucientnumber of input vectors, it is possible to
nd the symbolic expressions for the p df 's and the joint
pdf of all inputs using methods in
statistics
.For example,
one way to do this is to calculate the frequency of the o c-
curence for eachvector among the set of input vectors, and
then perform the interpolation on the sets of discrete p oints
to obtain the symbolic expression of the joint p df. Alterna-
tively, one can work directly with the input vectors without
having to nd the symbolic expression of the joint p df, that
is, for a suciently large number of the input vectors, the
frequency of occurence
for each input vector can serveas
the value of the joint p df for that pattern.
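The frequency-count approach above can be sketched in a few lines of Python. This is a minimal illustration, assuming each input vector is available as a tuple of integers; the function name `empirical_joint_pdf` is hypothetical and not from the paper.

```python
from collections import Counter
from fractions import Fraction


def empirical_joint_pdf(vectors):
    """Estimate the joint pdf of the primary inputs from sample input vectors.

    Each distinct vector is mapped to its relative frequency of occurrence,
    which serves as the joint pdf value for that pattern.
    """
    counts = Counter(tuple(v) for v in vectors)
    total = sum(counts.values())
    # Fraction keeps the estimate exact, so the values sum to exactly 1.
    return {vec: Fraction(c, total) for vec, c in counts.items()}
```

For example, `empirical_joint_pdf([(1, 2), (1, 2), (3, 0), (1, 2)])` assigns `3/4` to `(1, 2)` and `1/4` to `(3, 0)`. Patterns that never occur in the sample implicitly get probability zero, which is why a sufficiently large number of vectors is needed.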
If we are given the joint pdf of the input random variables of a data flow graph, then the joint pdf of any pair of values (arcs in the data flow graph) can be calculated [15]. Suppose that the two arcs are $y_1 = u_1(x_1, x_2, \ldots, x_n)$ and $y_2 = u_2(x_1, x_2, \ldots, x_n)$. We can add another $(n-2)$ free functions $y_3, y_4, \ldots, y_n$, chosen so that the resulting system can be inverted, and form a system of $n$ equations in $n$ input variables. Denote the joint pdf of the $n$ input variables by $\phi(x_1, x_2, \ldots, x_n)$.
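If the inverse solution
$$x_1 = w_1(y_1, y_2, \ldots, y_n),\quad x_2 = w_2(y_1, y_2, \ldots, y_n),\quad \ldots,\quad x_n = w_n(y_1, y_2, \ldots, y_n)$$
can be obtained symbolically, then the joint pdf of $y_1, y_2, \ldots, y_n$, denoted by $\phi'(y_1, y_2, \ldots, y_n)$, is:
$$\phi'(y_1, y_2, \ldots, y_n) = |J^{-1}|\,\phi\big[w_1(y_1, \ldots, y_n),\, w_2(y_1, \ldots, y_n),\, \ldots,\, w_n(y_1, \ldots, y_n)\big]$$
where $J^{-1}$ is the $n \times n$ inverse Jacobian and $|J^{-1}|$ denotes the absolute value of its determinant:
$$J^{-1}(y_1, y_2, \ldots, y_n) = \begin{pmatrix}
\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_n} \\
\dfrac{\partial x_2}{\partial y_1} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_2}{\partial y_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial x_n}{\partial y_1} & \dfrac{\partial x_n}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_n}
\end{pmatrix}$$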
Once we have $\phi'(y_1, y_2, \ldots, y_n)$, we can calculate the pairwise pdf of $y_1$ and $y_2$, $f_{y_1 y_2}(y_1, y_2)$, as
$$f_{y_1 y_2}(y_1, y_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \phi'(y_1, y_2, \ldots, y_n)\, dy_3\, dy_4 \cdots dy_n.$$
The integration can be performed either symbolically or numerically. Numerical integration over the $(n-2)$ variables involves much more computation, but it is always possible, even when symbolic integration over the $(n-2)$ variables is not.
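The change-of-variables step and the numerical fallback can be illustrated on a small case. This Python sketch uses assumed inputs that are not from the paper: $x_1, x_2, x_3$ are i.i.d. standard normal, the two arcs are $y_1 = x_1 + x_2$ and $y_2 = x_2 + x_3$, and the free function is $y_3 = x_2$. The inverse map is $x_1 = y_1 - y_3$, $x_2 = y_3$, $x_3 = y_2 - y_3$, and $y_3$ is integrated out with the trapezoidal rule over an assumed finite range.

```python
import math


def std_normal_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def phi(x1: float, x2: float, x3: float) -> float:
    """Assumed joint pdf of the primary inputs: three i.i.d. N(0, 1)."""
    return std_normal_pdf(x1) * std_normal_pdf(x2) * std_normal_pdf(x3)


# Inverse Jacobian dx_i/dy_j of x1 = y1 - y3, x2 = y3, x3 = y2 - y3.
INV_JACOBIAN = [[1.0, 0.0, -1.0],
                [0.0, 0.0, 1.0],
                [0.0, 1.0, -1.0]]


def det3(m) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


ABS_DET = abs(det3(INV_JACOBIAN))  # |J^-1|, equal to 1 for this linear map


def phi_prime(y1: float, y2: float, y3: float) -> float:
    """Joint pdf of (y1, y2, y3) by the change-of-variables formula."""
    return ABS_DET * phi(y1 - y3, y3, y2 - y3)


def pairwise_pdf(y1: float, y2: float, lo=-10.0, hi=10.0, steps=4000) -> float:
    """f_{y1 y2}(y1, y2): integrate y3 out numerically (trapezoidal rule)."""
    h = (hi - lo) / steps
    total = 0.5 * (phi_prime(y1, y2, lo) + phi_prime(y1, y2, hi))
    for i in range(1, steps):
        total += phi_prime(y1, y2, lo + i * h)
    return total * h
```

Because this example is linear and Gaussian, the exact pairwise pdf is known (a bivariate normal with variances 2 and covariance 1), which makes it a convenient check on the numerical result.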
In addition to the pairwise joint pdfs, the pdf of every internal arc is needed to calculate the total switching activity of the set of registers. Suppose that $y = w(x_1, x_2, \ldots, x_n)$ is some arc (data value) in the data flow graph that depends on the $n$ input random variables $x_1, x_2, \ldots, x_n$. The cdf (cumulative distribution function) of the new random variable $y$ is defined as $G(y) = \mathrm{prob}(Y \le y)$, which equals $\mathrm{prob}(w(x_1, x_2, \ldots, x_n) \le y)$. This probability can be evaluated as
$$G(y) = \int\!\!\int\!\cdots\!\int_{A} \phi(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n$$
where $\phi(x_1, x_2, \ldots, x_n)$ is the joint pdf of the $n$ input random variables $x_1, x_2, \ldots, x_n$, and $A = \{(x_1, x_2, \ldots, x_n) \mid w(x_1, x_2, \ldots, x_n) \le y\}$. The pdf of $y$, $g(y)$, is then obtained as $g(y) = \dfrac{dG(y)}{dy}$.
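For integer-valued inputs and arcs, the integral over $A$ becomes a finite sum and the derivative becomes a difference, $g(y) = G(y) - G(y-1)$. A Python sketch of that discrete case, assuming $\phi$ is given as a dictionary over a finite input support; the names `arc_cdf` and `arc_pmf` and the arc function used in the test are hypothetical.

```python
from fractions import Fraction


def arc_cdf(phi, w, y):
    """G(y) = prob(w(a) <= y): add phi over the input set A = {a : w(a) <= y}."""
    return sum((p for a, p in phi.items() if w(*a) <= y), Fraction(0))


def arc_pmf(phi, w, y):
    """Discrete analogue of g(y) = dG/dy for an integer-valued arc."""
    return arc_cdf(phi, w, y) - arc_cdf(phi, w, y - 1)
```

With two independent fair input bits and the arc $y = a_1 + a_2$, this gives $G(0) = 1/4$, $G(1) = 3/4$ and a probability of $1/2$ for $y = 1$.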

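Figure 1: Our register sharing model (register $R_1$ with input multiplexer Mux and output demultiplexer DeMux, both under control; data values $i, j, k$ and $i', j', k'$; signals $x$ and $y$; capacitances $C_{out,Mux}$, $C_{in,R_1}$, $C_{out,R_1}$, $C_{in,DeMux}$)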
2.2 The power consumption model
Switched capacitance refers to the product of the load capacitance and the switching activity of the driver. The power consumption of a register is proportional to the switched capacitance on its input and output (see Fig. 1). Suppose register $R_1$ can be shared among three data values $i$, $j$ and $k$. We assume that an input multiplexer picks the value that is written into $R_1$, while an output demultiplexer dispatches the stored value to its proper destination. Denoting the register input by $x$ and its output by $y$,
$$P(R_1) \propto \mathrm{switching}(x)\,(C_{out,Mux} + C_{in,R_1}) + \mathrm{switching}(y)\,(C_{out,R_1} + C_{in,DeMux}).$$
Since $\mathrm{switching}(x) = \mathrm{switching}(y)$, we have $P(R_1) \propto \mathrm{switching}(y)\,C_{total}$, where $C_{total} = C_{out,Mux} + C_{in,R_1} + C_{out,R_1} + C_{in,DeMux}$. Note that $C_{total}$ is fixed for a given library. In any case, minimizing the switching activity at the output of the registers minimizes the power consumption regardless of the specific load seen at the register outputs. Here we ignore the power consumption internal to the registers and consider only the external power consumption.
In the register allocation phase, if several compatible arcs are assigned to the same register R, switching on R occurs whenever one stored data value is replaced by another. For example, suppose X, Y, Z and W are four compatible data values that share register R, and the arcs $(X,Y), (Y,Z), (Z,W) \in A$. Suppose that initially the register was reset to some unknown value, and assume that the switching activity from this unknown value to X is some constant. The chain of data transitions is then $X \to Y \to Z \to W$. If the input variable values are known, the exact switching activity is
$$\mathrm{constant} + H(X,Y) + H(Y,Z) + H(Z,W)$$
where $H(i,j)$ is the Hamming distance between two numbers $i$ and $j$. If, however, the circuit has even one random input variable, the whole system has to be described probabilistically, as described next.
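The deterministic case is easy to check in code. The Python sketch below assumes a fixed register width and two's-complement encoding (the text only says numbers are represented in binary); the names `hamming` and `chain_switching` are illustrative, and `initial` stands in for the unspecified constant charged for the first write.

```python
def hamming(i: int, j: int, width: int = 8) -> int:
    """Hamming distance between the width-bit two's-complement codes of i and j."""
    mask = (1 << width) - 1
    return bin((i & mask) ^ (j & mask)).count("1")


def chain_switching(values, width: int = 8, initial: float = 0.0) -> float:
    """Exact switching of a register written with values[0], values[1], ... in order.

    `initial` is the assumed constant for the transition from the unknown
    reset value to the first stored value.
    """
    return initial + sum(hamming(a, b, width) for a, b in zip(values, values[1:]))
```

With a 4-bit register and the hypothetical values X = 3, Y = 5, Z = -1, W = 0, the three transitions cost 2, 2 and 4 bit flips, so `chain_switching([3, 5, -1, 0], width=4, initial=1.0)` returns `9.0`.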
Assume that the $n$ primary input random variables are $a_1, a_2, \ldots, a_n$, and let $A = \{(a_1, a_2, \ldots, a_n)\}$ be the set containing all possible combinations of input tuples. Let $B = \{(x,y) \mid x = x(a_1, a_2, \ldots, a_n),\ y = y(a_1, a_2, \ldots, a_n),\ \forall (a_1, a_2, \ldots, a_n) \in A\}$. The switching activity between the two consecutive data values X and Y is then given by:
$$\mathrm{switching}(X,Y) = \sum_{(x,y) \in B} f_{xy}(x,y)\, H(x,y) \qquad (1)$$
where the summation is over all possible patterns $(x,y) \in B$, and $H(x,y)$ is the Hamming distance between the two numbers $x$ and $y$, represented in binary form in a given number system. Equation (1) requires that the discrete-type joint pdf of $x$ and $y$ be known.
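Equation (1) is the expected Hamming distance under the discrete joint pdf of $x$ and $y$. A minimal Python sketch, assuming $f_{xy}$ is supplied as a dictionary from $(x, y)$ pairs to probabilities and that values are stored as width-bit two's-complement words (both assumptions; the function names are hypothetical):

```python
def hamming(i: int, j: int, width: int = 8) -> int:
    """Hamming distance between the width-bit two's-complement codes of i and j."""
    mask = (1 << width) - 1
    return bin((i & mask) ^ (j & mask)).count("1")


def switching_eq1(f_xy, width: int = 8) -> float:
    """Eq. (1): sum over (x, y) in B of f_xy(x, y) * H(x, y)."""
    return sum(p * hamming(x, y, width) for (x, y), p in f_xy.items())
```

For the assumed pmf `{(0, 1): 0.5, (3, 0): 0.25, (-1, 1): 0.25}`, the result is `1.75` with a 4-bit register and `2.75` with an 8-bit register, because the code of -1 contains more ones at the larger width; the number representation therefore matters.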
The method for calculating the joint pdf of two random variables described in Section 2.1 is mainly suitable when the variables in the system are of continuous type. However, when the precision used to represent the discrete numbers is high enough or the variance of the underlying distribution is not too large, the continuous-type pdf $g_{xy}(x,y)$ can be used as a good approximation of the discrete-type pdf $f_{xy}(x,y)$ after being multiplied by the scaling factor $\big(\sum_{(x,y) \in B} g_{xy}(x,y)\big)^{-1}$.
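The rescaling step can be illustrated as follows. This Python sketch assumes an example continuous pdf (a zero-mean bivariate Gaussian with hypothetical parameters `sigma` and `rho`) and takes $B$ to be every pair of width-bit two's-complement integers; it evaluates $g_{xy}$ on those integer points and multiplies by the scaling factor so that the discrete values sum to one.

```python
import math


def bivariate_normal_pdf(x, y, sigma=4.0, rho=0.9):
    """Assumed continuous pdf g_xy: zero means, equal std dev sigma, correlation rho."""
    q = (x * x - 2.0 * rho * x * y + y * y) / (sigma * sigma * (1.0 - rho * rho))
    norm = 2.0 * math.pi * sigma * sigma * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * q) / norm


def discretize(g, width=6):
    """Approximate f_xy on B by g_xy times (sum over B of g_xy)^-1."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1  # two's-complement range
    grid = range(lo, hi + 1)
    raw = {(x, y): g(x, y) for x in grid for y in grid}
    scale = 1.0 / sum(raw.values())
    return {xy: scale * v for xy, v in raw.items()}
```

The resulting dictionary can be passed directly to an evaluation of Eq. (1).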
The symbolic computation method is, however, not very practical, because it involves finding the symbolic inverse solution of a system of nonlinear equations and performing symbolic or numerical integration of complicated expressions over a region defined by a combination of inequalities and/or equalities. Fortunately, the same switching activity for a pair of discrete random variables $x$ and $y$ can be obtained much more easily as follows:
$$\mathrm{switching}(X,Y) = \sum_{a_1}\sum_{a_2}\cdots\sum_{a_n} \phi(a_1, a_2, \ldots, a_n)\, H\big(x(a_1, a_2, \ldots, a_n),\, y(a_1, a_2, \ldots, a_n)\big) \qquad (2)$$
where $\phi(a_1, a_2, \ldots, a_n)$ is the joint pdf of the input variables $a_1, a_2, \ldots, a_n$.
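Equation (2) needs only the input pdf and the two arc functions, with no symbolic inversion. The Python sketch below (hypothetical names; $\phi$ given as a dictionary over a finite input support; two's-complement words of an assumed width) evaluates Eq. (2) and, as a cross-check, also pushes $\phi$ forward to $f_{xy}$ and evaluates Eq. (1) on it. The two results agree because they add the same terms, only grouped differently.

```python
from collections import defaultdict
from fractions import Fraction


def hamming(i, j, width=8):
    """Hamming distance between the width-bit two's-complement codes of i and j."""
    mask = (1 << width) - 1
    return bin((i & mask) ^ (j & mask)).count("1")


def switching_eq2(phi, x_of, y_of, width=8):
    """Eq. (2): sum over input tuples a of phi(a) * H(x(a), y(a))."""
    return sum((p * hamming(x_of(*a), y_of(*a), width) for a, p in phi.items()),
               Fraction(0))


def joint_pmf_of_arcs(phi, x_of, y_of):
    """Push phi forward to the discrete joint pdf f_xy used in Eq. (1)."""
    f_xy = defaultdict(Fraction)
    for a, p in phi.items():
        f_xy[(x_of(*a), y_of(*a))] += p
    return dict(f_xy)


def switching_eq1(f_xy, width=8):
    """Eq. (1): sum over (x, y) in B of f_xy(x, y) * H(x, y)."""
    return sum((p * hamming(x, y, width) for (x, y), p in f_xy.items()),
               Fraction(0))
```

With two independent fair input bits and the hypothetical arcs $x = a_1 + a_2$ and $y = a_1 a_2$, both `switching_eq2` and `switching_eq1` return 1.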
Both Equation (1) and Equation (2) start from the assumption that the joint pdf $\phi(a_1, a_2, \ldots, a_n)$ has been obtained or is known; this is a necessary condition for calculating the cross-arc switching activities precisely. Furthermore, Equation (2) can be used directly once the input vectors are given, without obtaining a symbolic expression for $\phi(a_1, a_2, \ldots, a_n)$. Here we assume that the bit width of a register is finite, so the total number of patterns that can be stored in a register is also finite. If we assume that all numbers in our system are integers (positive or negative), then the total number of different $(x,y)$ pairs involved in Equation (1) is at most $2^{2 \cdot \mathit{bitwidth}}$. In general, Equation (2) involves multidimensional nested summations over intervals of integer values. When the joint pdf of the primary input variables is effectively band-limited, i.e., has almost all of its mass in a bounded region (e.g., a Gaussian, which is negligible beyond a few standard deviations of its mean), we can narrow down the interval of summation in each dimension and thereby significantly speed up the computation.
Let $A = \{(a_1, a_2, \ldots, a_n)\}$ and define
$B = \{(x,y) \mid x = x(a_1, \ldots, a_n),\ y = y(a_1, \ldots, a_n),\ \forall (a_1, \ldots, a_n) \in A\}$,
$C = \{(y,z) \mid y = y(a_1, \ldots, a_n),\ z = z(a_1, \ldots, a_n),\ \forall (a_1, \ldots, a_n) \in A\}$, and
$D = \{(z,w) \mid z = z(a_1, \ldots, a_n),\ w = w(a_1, \ldots, a_n),\ \forall (a_1, \ldots, a_n) \in A\}$.
The total switching activity in the above example, with register R shared by four arcs (data values), is formulated as follows:
$$\begin{aligned}
&\mathrm{constant} + \sum_{(x,y) \in B} f_{xy}(x,y)\,H(x,y) + \sum_{(y,z) \in C} f_{yz}(y,z)\,H(y,z) + \sum_{(z,w) \in D} f_{zw}(z,w)\,H(z,w) \\
&\quad = \mathrm{constant} + \sum_{a_1}\sum_{a_2}\cdots\sum_{a_n} \phi(a_1, a_2, \ldots, a_n)\,\big(H(x,y) + H(y,z) + H(z,w)\big) \qquad (3)
\end{aligned}$$
The total switching activity of a register can be calculated once the set of variables that share that register has been found, since the sequence of data transitions is then known.
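Once the transition order of a register is known, Eq. (3) is a single sum over the input pdf. A Python sketch under the same assumptions as before (dictionary $\phi$, assumed word width, hypothetical names), with `sw0` playing the role of the constant for the first write:

```python
from fractions import Fraction


def hamming(i, j, width=8):
    """Hamming distance between the width-bit two's-complement codes of i and j."""
    mask = (1 << width) - 1
    return bin((i & mask) ^ (j & mask)).count("1")


def register_switching(phi, chain, sw0, width=8):
    """Eq. (3): sw0 plus the expected Hamming distances summed along the chain.

    phi maps input tuples to probabilities; chain lists the arc functions in
    the order their values are written into the register.
    """
    expected = Fraction(0)
    for a, p in phi.items():
        values = [arc(*a) for arc in chain]
        expected += p * sum(hamming(u, v, width) for u, v in zip(values, values[1:]))
    return sw0 + expected
```

For two fair input bits and the hypothetical chain $X = a_1$, $Y = a_2$, $Z = a_1 + a_2$, $W = 0$, the expected transition cost is 2, so with `sw0 = 1` the register's total is 3.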
3 Power Optimization
3.1 Max-cost ow formulation
Denition 3.1
A directedgraph
G
0
= (V,A) is cal led
the compatibility graph for register allocation problem if
the it is constructed by the fol lowing procedure. Each
arc
(data value)
i
in the data ow graph has an interval
(
birth time
i
; death time
i
)
associated with it. Each open
interv al i
corresponds to a
ver tex i
in
G
0
= (V,A). There
is a directedarc
(
u; v
)
2
A
if and only if
interv al
u
\
interv al
v
=
;
and
death time
u
< birth time
v
.
All proofs can be found in [2].
Theorem 3.1 Given a data flow graph without loops and branches, the compatibility graph $G' = G(V,A)$ for the register allocation problem is acyclic.
Denition 3.2
[5]An undirectedgraph G =(V,E) is a
comparability graph if there exists an orientation (V,F) of
G satisfying
F
\
F
1
=
;
; F
+
F
1
=
E; F
2
F
where
F
2
=
f
(
a; c
)
j
(
a; b
)
;
(
b; c
)
2
F
for some vertex b
g
. Comparability graphs are also known as transitively ori-
entable graphs and partially oderable graphs.
Denition 3.3
The unorientedcompatibility graph
G
0
0
=
(
V; E
)
is obtainedby removing the edge orientations of
G
0
=
(
V; A
)
.
Theorem 3.2 Given a data flow graph without loops and branches, the unoriented compatibility graph $G'' = (V,E)$ for the register allocation problem is a comparability graph.
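Definition 3.1 and the transitivity property behind Theorem 3.2 can be checked mechanically. The Python sketch below (hypothetical helper names; lifetimes given as `(birth, death)` c-step pairs with birth ≤ death, an assumption) builds the arc set of $G'$ and tests $F^2 \subseteq F$ and $F \cap F^{-1} = \emptyset$. An irreflexive, transitive relation cannot contain a cycle, since a cycle through $u$ would force the arc $(u,u)$; this is the intuition behind Theorem 3.1.

```python
def compatibility_graph(lifetimes):
    """Definition 3.1: arc (u, v) iff death_u < birth_v.

    lifetimes maps each data value to a (birth, death) pair of c-steps.
    """
    return {(u, v)
            for u, (_, death_u) in lifetimes.items()
            for v, (birth_v, _) in lifetimes.items()
            if u != v and death_u < birth_v}


def is_transitive(arcs):
    """F^2 is a subset of F: (a, b) and (b, c) in F imply (a, c) in F."""
    succ = {}
    for a, b in arcs:
        succ.setdefault(a, set()).add(b)
    return all((a, c) in arcs for a, b in arcs for c in succ.get(b, ()))


def is_antisymmetric(arcs):
    """F intersected with F^-1 is empty: never both (u, v) and (v, u)."""
    return all((v, u) not in arcs for u, v in arcs)
```

Note that with the strict inequality of Definition 3.1, a value that dies at c-step 3 and one born at c-step 3 are not compatible.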
To minimize the total power consumption of the registers, a network $N_G = (s, t, V_n, E_n, C, K)$ is constructed from the compatibility graph $G' = G(V,A)$. This construction is similar to the one used in [17] to solve the weighted module allocation problem, which simultaneously minimizes the number of modules and the amount of interconnect needed to connect them. Conceptually, $N_G$ is obtained from $G' = G(V,A)$ by adding two extra vertices, the source vertex $s$ and the sink vertex $t$. The additional arcs go from $s$ to every vertex in $V$, and from every vertex in $V$ to $t$. We use a max-cost flow algorithm on $N_G$ to find a maximum-cost set of cliques that covers $G' = G(V,A)$. The network on which the flow is computed has the cost function $C$ and the capacities $K$ defined on each arc in $E_n$. Assuming that each register holds an unknown value at time $t_0$, we use a constant $sw_0$ to represent $\mathrm{switching}(\mathit{Unknown}, v)$ for each vertex $v$. More formally, the network $N_G = (s, t, V_n, E_n, C, K)$ is defined as follows:
$$V_n = V \cup \{s, t\}$$
$$E_n = A \cup \{(s,v), (v,t) \mid v \in V\} \cup \{(t,s)\}$$
$$w(s,v) = L - \lfloor sw_0 \cdot M \rfloor \qquad (4)$$
$$w(u,v) = L - \Big\lfloor M \sum_{(u,v) \in B} f_{uv}(u,v)\, H(u,v) \Big\rfloor = L - \Big\lfloor M \sum_{a_1}\sum_{a_2}\cdots\sum_{a_n} \phi(a_1, \ldots, a_n)\, H\big(u(a_1, \ldots, a_n),\, v(a_1, \ldots, a_n)\big) \Big\rfloor \qquad (5)$$
$$w(v,t) = L,\ \forall v \in V; \qquad w(t,s) = L. \qquad (6)$$
where $A = \{(a_1, a_2, \ldots, a_n)\}$, $B = \{(u,v) \mid u = u(a_1, \ldots, a_n),\ v = v(a_1, \ldots, a_n),\ \forall (a_1, \ldots, a_n) \in A\}$, $L = \lfloor \max\{\mathrm{switching}(u,v)\} \cdot M \rfloor + 1$ over all possible $u, v \in V \cup \{s\}$, and $M$ is a large constant used to scale up the smallest switching activity value to an integer.
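Equations (4) to (6) turn fractional switching activities into non-negative integer costs. A Python sketch, assuming the pairwise switching values have already been computed (for example with Eq. (2)) and that the vertex names `"s"` and `"t"` are reserved for the source and sink; the function name and the numbers in the test are illustrative.

```python
import math


def network_costs(vertices, arcs, switching, sw0, M):
    """Integer arc costs of N_G following Eqs. (4)-(6).

    switching[(u, v)] is the expected switching when v overwrites u in a
    register; sw0 is the constant for the first write from an unknown value.
    """
    # L = floor(max switching * M) + 1, the max taken over u, v in V plus s.
    L = math.floor(max([sw0, *switching.values()]) * M) + 1
    cost = {("s", v): L - math.floor(sw0 * M) for v in vertices}      # Eq. (4)
    cost.update({(u, v): L - math.floor(switching[(u, v)] * M)
                 for (u, v) in arcs})                                    # Eq. (5)
    cost.update({(v, "t"): L for v in vertices})                        # Eq. (6)
    cost[("t", "s")] = L
    return L, cost
```

With `sw0 = 4.0`, `M = 100` and hypothetical activities 1.25, 3.5 and 0.75 on the arcs (a, b), (a, c) and (b, c), we get $L = 401$ and costs 276, 51 and 326: the lower the switching, the higher the cost, so a max-cost flow prefers low-switching transitions.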
For each arc $e \in E_n$, a cost function $C: E_n \to \mathbb{N}$ is defined, which assigns a non-negative integer cost to each arc. The cost function $C$ for the network $N_G$ is $c(u,v) = w(u,v)$ for all $(u,v) \in E_n$. The cost is defined to indicate the power savings on the arc.
For each arc $e \in E_n$, a capacity function $K: E_n \to \mathbb{N}$ is defined that assigns a non-negative number to each arc. The capacity of every arc is one, except for the return arc from $t$ to $s$, which has capacity $k$, where $k$ is the user-specified flow value:
$$K(u,v) = 1,\ \forall (u,v) \in E_n \setminus \{(t,s)\}; \qquad K(t,s) = k.$$
For each arc $e \in E_n$, a flow function $f: E_n \to \mathbb{N}$ is defined which assigns a non-negative number to each arc. The flow $f(e)$ on each arc $e \in E_n$ must obey $0 \le f(e) \le K(e)$, and the flow at each vertex $v \in V_n$ must satisfy the flow conservation rule.
Theorem 3.3 A flow $f: E_n \to \mathbb{N}$ with $|f| = 1$ in the network $N_G$ corresponds to a clique in the unoriented compatibility graph $G''$.
Theorem 3.4 A flow $f: E_n \to \mathbb{N}$ with $|f| = k$ in the network $N_G$ corresponds to a set of cliques $\kappa_1, \kappa_2, \ldots, \kappa_k$ in the unoriented compatibility graph $G''$.
The generated cliques may not be vertex-disjoint, because the $k$ paths in $N_G$ may not be vertex-disjoint. One way to ensure that the resulting cliques are vertex-disjoint is to employ a node-splitting technique. This technique duplicates every vertex $v \in V$ of $G' = G(V,A)$ into another node $v'$ and adds an arc from $v$ to $v'$ for each $v \in V$; since this arc has unit capacity, at most one unit of flow can pass through $v$. For each arc $(u,v) \in A$ in $G' = G(V,A)$, there is an arc $(u',v)$ in the new network $N'_G$. There is also an arc from the source vertex $s$ to every vertex $v \in V$, and from every duplicated vertex $v'$ to the sink vertex $t$.
More formally, the node-splitting technique generates the network $N'_G = (s, t, V'_n, E'_n, C', K')$, where:
$V'_n = V_n \cup V'$, where $V'$ contains a vertex $v' = f(v)$ for each vertex $v \in V$
$E'_n = A' \cup \{(s,v), (f(v),t) \mid v \in V\} \cup \{(t,s)\} \cup \{(v, f(v)) \mid v \in V\}$
$A' = \{(f(u), v) \mid (u,v) \in A\}$
$C'((t,s)) = C'((v, f(v))) = L,\ \forall v \in V$
$C'((u',v)) = C((u,v))$ for all $(u',v) \in A' \cup \{(s,v), (f(v),t) \mid v \in V\}$
$K'((t,s)) = k; \quad K'((u,v)) = 1$ for all $u \ne t$ and $v \ne s$.

Figure 2: From data flow graph to network $N'_G$ (panels: Data Flow Graph; Compatibility Graph; Network Before Vertex Splitting; Network After Vertex Splitting)
The transformations from the data ow graph to the nal
network
N
0
G
are shown in Fig. 2.
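The node-splitting construction can be written as a small function producing `(tail, head, capacity, cost)` tuples. This sketch assumes string vertex names, names each duplicate by appending a prime (`"a"` becomes `"a'"`), and takes the costs of the unsplit network $N_G$ as input; these conventions are illustrative, not from the paper.

```python
def split_network(vertices, arcs, cost, k):
    """Arc list of N'_G as (tail, head, capacity, cost) tuples.

    cost holds the arc costs of the unsplit network N_G, keyed by
    ("s", v), (u, v), (v, "t") and ("t", "s"); the duplicate of vertex v
    is named v + "'" (an illustrative convention).
    """
    L = cost[("t", "s")]
    net = [("t", "s", k, L)]                       # return arc, capacity k
    for v in vertices:
        vp = v + "'"
        net.append(("s", v, 1, cost[("s", v)]))
        net.append((v, vp, 1, L))                  # split arc: one unit through v
        net.append((vp, "t", 1, cost[(v, "t")]))
    for u, v in arcs:
        net.append((u + "'", v, 1, cost[(u, v)]))  # (u', v) replaces (u, v)
    return net
```

Every original vertex ends up with a single outgoing arc, its split arc, which is what forces the resulting paths to be vertex-disjoint.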
Theorem 3.5 A flow $f: E'_n \to \mathbb{N}$ with $|f| = k$ in the network $N'_G$ corresponds to a set of vertex-disjoint cliques $\kappa_1, \kappa_2, \ldots, \kappa_k$ in the unoriented compatibility graph $G''$.
Denition 3.4
[14]Let N = (s,t,V,E,C,K) be a ow net-
work with underlying directedgraph G=(V,E), a weighting
on the arcs
c
ij
2
R
+
for every arc (i,j)
2
E, a capacity
K(e) for every arce
2
E, and a ow value
v
0
2
R
+
. The
min-cost ow problem is to nd a feasible s-t ow of value
v
0
that has minimum cost. In the form of an LP:
min c
t
f
Af
=
v
0
d ever y node
f
b every arc
f
0
ever y arc
where A is the node-arc incidence matrix and
d
i
=
(
1
i=s
+1
i=t
0
otherwise
Denition 3.5
The maximum cost ow problem is that
given a network N=(s,t,V,E,C,K) and a xed ow value
v
0
, nd the ow that maximizes the total cost.
The easiest method to solve the max-cost ow problem
is to negate the cost of each arc in the network, and run
the
min-cost ow algorithm
on the new network [14].
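The negation trick can be sketched with a textbook successive-shortest-path min-cost flow, using Bellman-Ford because the negated costs are negative. This is an illustrative Python implementation on integer node ids, not the $O(km^2)$ algorithm of [4]. It sends the $k$ units directly from $s$ to $t$, so the return arc $(t,s)$ of $N'_G$ is omitted and its role is taken by the requested flow value.

```python
import math


def min_cost_flow(n, arcs, s, t, k):
    """Send k units from s to t at minimum cost by successive shortest paths.

    arcs: list of (tail, head, capacity, cost) on nodes 0..n-1. Bellman-Ford is
    used because costs may be negative. Returns (cost, flow per arc) or None.
    """
    adj = [[] for _ in range(n)]
    head, cap, cost = [], [], []
    for u, v, c, w in arcs:  # forward edge 2i, residual (reverse) edge 2i + 1
        for a, b, cc, ww in ((u, v, c, w), (v, u, 0, -w)):
            adj[a].append(len(head))
            head.append(b)
            cap.append(cc)
            cost.append(ww)
    total = 0
    for _ in range(k):                       # augment one unit at a time
        dist = [math.inf] * n
        prev = [-1] * n
        dist[s] = 0
        for _ in range(n - 1):
            changed = False
            for u in range(n):
                if dist[u] == math.inf:
                    continue
                for e in adj[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[head[e]]:
                        dist[head[e]] = dist[u] + cost[e]
                        prev[head[e]] = e
                        changed = True
            if not changed:
                break
        if dist[t] == math.inf:
            return None                      # flow value k is infeasible
        v = t
        while v != s:                        # push one unit along the path
            e = prev[v]
            cap[e] -= 1
            cap[e ^ 1] += 1
            v = head[e ^ 1]
        total += dist[t]
    return total, [cap[2 * i + 1] for i in range(len(arcs))]


def max_cost_flow(n, arcs, s, t, k):
    """Max-cost flow: negate every arc cost and run min-cost flow."""
    result = min_cost_flow(n, [(u, v, c, -w) for u, v, c, w in arcs], s, t, k)
    if result is None:
        return None
    neg_cost, flow = result
    return -neg_cost, flow
```

On the small assumed DAG in the test, one unit follows the most profitable path 0, 1, 2, 3 (cost 21); asking for two units forces the algorithm to cancel the flow on arc (1, 2) through the residual network, giving cost 18. Unit augmentations suffice here because every capacity in $N'_G$ other than the return arc is one.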
The network construction $N'_G$ above ensures that the resulting paths are vertex-disjoint cliques in $G'$ (or $G''$). When the max-cost flow algorithm is applied to this network, we obtain the cliques that maximize the total cost. Since the flow value on each path is one, the total cost of each path is the sum of the costs of its arcs, taken in their topological order in the graph $G' = G(V,A)$, where the cost of each arc is a linear function of the "saved power". For example, if $(s,b), (b,c), (c,d), (d,t)$ is a path from source $s$ to sink $t$, the total cost of this path is $\mathrm{cost}(s,b) + \mathrm{cost}(b,c) + \mathrm{cost}(c,d) + \mathrm{cost}(d,t)$. We can also conclude from this path that the variables $\{b, c, d\}$ will share the same register, in the order $b \to c \to d$.
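Reading the register assignment off an integral flow amounts to following the unit-flow arcs from $s$. A Python sketch, assuming the flow is given as the set of arcs carrying one unit and that duplicates are named by appending a prime (a hypothetical convention):

```python
def registers_from_flow(used_arcs):
    """Turn the unit-flow arcs of N'_G into register assignments.

    used_arcs: set of (tail, head) pairs carrying one unit of flow, with the
    duplicate of vertex v named v + "'". Each returned list is one register,
    with its data values in write order.
    """
    succ = {}
    for u, v in used_arcs:
        if (u, v) != ("t", "s"):          # ignore the return arc
            succ.setdefault(u, []).append(v)
    registers = []
    for node in succ.get("s", []):
        chain = []
        while True:
            chain.append(node)
            copy = succ[node][0]          # the split arc (v, v')
            nxt = succ[copy][0]           # (v', w) or (v', t)
            if nxt == "t":
                break
            node = nxt
        registers.append(chain)
    return sorted(registers)
```

For example, the flow paths `s, b, b', c, c', d, d', t` and `s, a, a', t` yield `[['a'], ['b', 'c', 'd']]`: one register holds $b \to c \to d$ in that order, and another holds only $a$.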
Theorem 3.6 The max-cost flow algorithm on the network $N'_G$ gives the minimum total power consumption of the registers in the circuit represented by the compatibility graph $G'$.

Proof: The total cost is $\sum_{e \in E_n} f(e)\,c(e)$, which is a linear function of the "total saved power", because
$$\sum_{e \in E_n} f(e)\,c(e) = \sum_{e \in E_n} f(e)\,\big[L - M \cdot \mathrm{switching}(e)\big] = L \sum_{e \in E_n} f(e) - M \sum_{e \in E_n} f(e)\,\mathrm{switching}(e).$$
In our specially constructed network, $f(e)$ on every arc $e$ except $(t,s)$ is either zero or one. The first term above, $\sum_{e \in E_n} f(e)$, is a constant ($= 2|V| + k$ for $G' = G(V,A)$) among all possible clique coverings that cover all of the vertices in the original graph $G'$. Therefore, when we maximize the total cost for a given flow value in $N'_G$, we are minimizing the total power consumption given that the number of registers equals this flow value.
Note that, the max-cost owon
N
0
G
always nds the clique
covering that covers all of the vertices in the original graph
G
0
whenever the owvalue
j
f
j
is larger than or equal to
k
min
.
k
min
can b e determined by the left edge algorithm
[11] or simply by nding the maximum number of arcs that
cross any c-step b oundary. In most cases, the
k
min
found
by the left edge algorithm is equal to the
k
min
for max-cost
ow. However, in some pathological cases, the twovalues
are not the same. In that case, a p ost-processing step is
needed [2].
2
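One simple way to compute $k_{min}$ is a sweep over the lifetime endpoints. Under Definition 3.1, two values can share a register only if one dies strictly before the other is born, so they conflict exactly when their closed intervals [birth, death] meet. The sketch below (hypothetical name, integer c-step times assumed) returns the largest number of lifetimes that share a common time point, i.e., the size of the largest set of mutually conflicting values, which no register assignment can go below.

```python
def k_min(lifetimes):
    """Largest number of lifetimes [birth, death] that share a common c-step.

    Under Definition 3.1 two values conflict unless death_u < birth_v or
    death_v < birth_u, i.e. exactly when their closed intervals meet.
    """
    events = []
    for birth, death in lifetimes.values():
        events.append((birth, 0))   # 0 sorts first: an opening at time T is
        events.append((death, 1))   # counted before a closing at the same T
    best = live = 0
    for _, kind in sorted(events):
        live += 1 if kind == 0 else -1
        best = max(best, live)
    return best
```

For lifetimes a = (0, 1), b = (0, 2), c = (2, 3), d = (3, 5), e = (1, 4) and f = (5, 6), the sweep finds three values alive at once (for example a, b and e at c-step 1), so at least three registers are needed.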
The time complexity for the max-cost ow algorithm is
O
(
km
2
), according to [4], where
m
=2
j
V
j
+2 for the
graph
G
0
=
G
(
V; A
) and
k
is the owvalus.
Conditional branches can be easily handled in our system by relaxing the conditional data flow graph into several
References
Probability, random variables and stochastic processes.
Probability, random variables, and stochastic processes.
Combinatorial optimization: algorithms and complexity.
Algorithmic graph theory and perfect graphs.
Synthesis and optimization of digital circuits.