Thermal-induced leakage power optimization by redundant resource allocation

Abstract
Traditionally, at early design stages, leakage power is associated with the number of transistors in a design. Hence, intuitively an implementation with minimum resource usage would be best for low leakage. Such an allocation would generally be followed by switching optimal resource binding to achieve a low power design. This treatment of leakage power is unaware of operating conditions such as temperature. In this paper, we propose a technique to reduce the total leakage power of a design by identifying the optimal number of resources during allocation and binding. We demonstrate that, contrary to the general tendency to minimize the number of resources, the best solution can actually be achieved if a certain degree of redundancy is allowed. This is due to the fact that leakage is strongly dependent on the on-chip temperature profile. Distributing activity over a higher number of resources can reduce power density, remove potential hotspots and subsequently minimize thermal induced leakage. On the other hand, using an arbitrarily high number of resources will not yield the best solution. In this paper, we show that there is a power density, hence, temperature, at which the total leakage power will reach its optimal value. Such an optimal resource number can be a better starting point for the subsequent switching-driven low power binding. We also present a high-level power density-aware leakage model. Based on the estimates by this model, we optimize the total leakage power by 53.8% on average compared to the minimum resource binding, and 35.7% on average compared to a temperature-aware resource binding technique.

Min Ni and Seda Ogrenci Memik
Electrical Engineering and Computer Science
Northwestern University, Evanston, IL
{mni166, seda}@ece.northwestern.edu
1. INTRODUCTION
Due to technology scaling, the share of leakage power in the total power budget is on the rise. Supply voltage levels are lowered with each technology generation, which in turn necessitates lowering the threshold voltage levels of devices in order to maintain low delay. Leakage increases exponentially with decreasing threshold voltage levels. As a result, leakage power becomes significant, sometimes even dominant, in total power budgets, and can account for up to 50% of the total power [5].
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
ICCAD 2006, November 5–9, 2006, San Jose, California, USA.
Copyright 2006 ACM 1-59593-389-1/06/0011 ...$5.00.

A plethora of techniques to reduce leakage power have been proposed in the literature. The majority of these techniques focus on gate-level or transistor-level optimizations. Assigning different threshold and/or supply voltages to transistors or gates, together with simultaneous gate sizing [6, 11, 14, 16], is one of the most popular techniques for both standby and operating mode leakage optimization. Other techniques, such as using sleep transistors to put the circuit into sleep mode whenever it idles for a certain period [5], are also used for reducing standby state leakage power. All these techniques are derived from the observation that the subthreshold leakage current, which is the most significant among the four main sources of leakage current [12], can be expressed by the following equation [15]:
$$I_{sub} = \frac{W}{L}\,\mu\, v_t^{2}\, C_{sth}\; e^{(V_{GS} - V_T + \eta V_{DS})/(\eta V_t)} \left(1 - e^{-V_{DS}/V_t}\right) \tag{1}$$
Therefore, subthreshold current is a function of device size, supply voltage, temperature, and other process parameters, such as the threshold voltage ($V_T$). Most of the above techniques trade leakage power against design complexity: they manipulate the threshold voltage and supply voltage by adding extra power control components.
Another aspect of leakage is related to dynamic conditions such as temperature. Leakage has a superlinear dependency on temperature. Fallah et al. reported that the share of leakage power can increase from 6% at the ambient temperature to as high as 56% of total power at 110°C [5]. Another study reported that the leakage power in an embedded processor can increase by about 30% due to thermal-induced leakage [8].
Temperature on a chip is itself a function of various parameters, where the foremost factors are the power density on the chip and the properties of the package. The power density, and hence temperature, will continue increasing in future technologies according to the α-power law [13]. The abovementioned techniques for leakage optimization generally do not address the power density on a chip. Oftentimes, they can in fact exacerbate the effects of power density while aiming to consolidate activity on fewer localized resources (for instance, in an effort to place parts of the chip in sleep mode and channel computation towards a selected subset of components).
In this work, we investigate a technique to consider the impact of resource selection on the overall power density and consequently on thermal-induced leakage in future technology nodes. Resource allocation and binding is a proper stage during high-level synthesis to consider the potential impact of area on power density. At that stage it is decided how many resources and which types of resources will be utilized in the design. More resources will result in larger area and most likely in lower power density. In this paper, we try to establish an effective tradeoff between the number of resources and the total leakage power. There exists an optimal point where the amount of resources used yields the most favorable power density, which in turn results in the least thermal-induced leakage power. Our study reveals that, in order to reach this point, the amount of resources often has to be higher than the amount that would be sufficient to satisfy the same performance constraint. A judicious introduction of redundant resources when there is a need to relieve power density will ultimately help reduce thermal-induced leakage and total leakage significantly.
The major difference between our work and other hotspot-moving resource allocation techniques is that almost all the hotspot-moving techniques [9, 10] assume a threshold temperature. Based on this given constraint, they try to make sure that there is no place on the chip where the static temperature exceeds that threshold value. Our work, in contrast, decides what this threshold temperature should be in order to optimize a design objective, which in our case is leakage power. Other low power resource binding techniques [2–4], which consider switching power, can be supported by our initial allocation. In this way, low power resource binding addresses two components in two stages. The first stage finds the optimal resource number, resulting in the best power density and temperature, such that the leakage power is minimized. The second stage optimizes the dynamic power and maintains control over thermal behavior by applying existing thermal-driven and switching-driven techniques to the results of the first stage.
One reason this separation is feasible is that, with different starting points (different numbers of resources and temperature constraints), the optimal dynamic power considering switching activity does not vary significantly [2]. Experimental results reported in past work [2] show that for a given design example the optimal dynamic power is 70.882 for five resources, 67.872 for six resources, and 65.514 for seven resources. Each added resource changes the result by less than 5%. Oftentimes, introducing redundancy to the resource set might in fact help reduce the impact of conflicts due to dependencies and scheduling compatibility, and create more opportunities for the switching-optimal binding to find a slightly lower switching assignment, which reduces the dynamic power. Therefore, we can safely conclude that the optimal dynamic power of functional units will not increase when we add resource redundancy to achieve the optimal leakage power. On the other hand, the leakage power is much more sensitive to the selection of the resource set than the dynamic power. Adding even one more resource may reduce the leakage power by more than 50%, because leakage power is strongly coupled with power density and in turn the chip temperature. Therefore, the two-stage optimization is meaningful and effective. We will mainly address the first stage, i.e., power density and the resulting thermal-induced leakage optimization during allocation.
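The "less than 5%" figure can be checked directly from the three values quoted from [2]. The short sketch below does only that arithmetic; the helper name `relative_change` is introduced here for illustration and is not from the paper.

```python
# Optimal dynamic power values quoted from [2] for 5, 6 and 7 resources.
dynamic_power = {5: 70.882, 6: 67.872, 7: 65.514}


def relative_change(before: float, after: float) -> float:
    """Fractional reduction when going from `before` to `after`."""
    return (before - after) / before


# Reduction obtained by each added resource.
steps = {
    (n, n + 1): relative_change(dynamic_power[n], dynamic_power[n + 1])
    for n in (5, 6)
}

for (n_from, n_to), change in steps.items():
    print(f"{n_from} -> {n_to} resources: {change:.2%} lower dynamic power")
```

Both steps come out at roughly 4.2% and 3.5%, which is consistent with the statement that dynamic power changes only mildly as resources are added.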
The rest of this paper is organized as follows. Section 2 describes the leakage power estimation model we will use in this paper. The main ideas of our low power resource binding technique are discussed in Section 3. Section 4 presents our experimental flow and results. Conclusions are given in Section 5.
2. LEAKAGE ESTIMATION MODEL
Before we search for the optimal number of resources for leakage power, it is necessary to first establish a simple model for leakage estimation. It is important to emphasize that the intention of this model is not to compute exact temperature levels. The model is intended to establish the prevailing trend linking power density and temperature, and the subsequent expected rate of increase in leakage. Once we establish this trend, it becomes a reasonable tool for searching for the best resource allocation. Most importantly, it helps us identify the point where the increase in leakage power due to the addition of redundant resources finally counterbalances the decrease in thermal-induced leakage due to the reduction of power density after the addition of each redundant resource. Up until that point, adding redundant resources and distributing operations onto them is expected to progressively improve power density and, hence, the total leakage.
We need to establish the following in order to achieve this goal. First, we need the means to compare the relative leakage of different modules at ambient temperature. For this purpose, we used transistor-level (HSpice) simulation of simple building blocks encountered within the resources in our library to obtain leakage power values for each resource. After simulating the leakage power of a simple structure, such as a transistor or a gate, we scale it to obtain the ambient leakage power of individual modules. Each module implementation in our library requires a customized scaling factor. The scaling factor depends not only on the number of transistors in the module, but also on the sizing of individual transistors and the actual threshold voltage used in the design. We used empirical data [12] to derive the leakage power scaling factor of each module type, under the basic idea that leakage power becomes a certain fraction of total power at a given temperature.
Next, we analytically establish the trends that represent the rate of increase in leakage in response to a change in temperature. Instead of using Equation (1) directly, we use Lagrange's interpolation formula to implement the curve fitting, as shown in Equation (2),

$$y = L_p(x) = \sum_{j=0}^{p} \frac{\prod_{i \neq j} (x - x_i)}{\prod_{i \neq j} (x_j - x_i)}\, y_j \tag{2}$$

where $(x_i, y_i)$ is a leakage point obtained from the HSpice simulation. Using an analytic leakage formula such as Equation (1) directly is also feasible. However, we prefer to let the simulation engine decide the physical details and then fit the experimental data exactly by Lagrange's interpolation.
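As an illustration of Equation (2), the following minimal sketch evaluates a Lagrange polynomial through a set of sample points. The sample temperatures and leakage values are hypothetical placeholders, not HSpice data from the paper, and the function name `lagrange_interpolate` is chosen here for convenience.

```python
from typing import Sequence


def lagrange_interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate the Lagrange polynomial through (xs[j], ys[j]) at x."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        # Basis polynomial: product over all i != j of (x - x_i) / (x_j - x_i).
        basis = 1.0
        for i, xi in enumerate(xs):
            if i != j:
                basis *= (x - xi) / (xj - xi)
        total += basis * yj
    return total


# Hypothetical (temperature in °C, leakage in W) samples, not simulation data.
temps = [50.0, 80.0, 110.0]
leak = [0.4e-3, 1.0e-3, 2.5e-3]

print(lagrange_interpolate(temps, leak, 95.0))
```

By construction the polynomial passes exactly through every sample point, which is the "fit the data exactly" property the text relies on.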
Having obtained the analytical form of the leakage power trend, we can use a numerical method to establish the relationship between power density and temperature. At this point, we turn our attention towards the two most important factors that affect the thermal behavior: the power density $P/A$ and the heat transfer coefficient.
Equation (3) [7] illustrates the relationship between power density, heat transfer coefficient (i.e., thermal properties of the packaging), and temperature,

$$T = T_a + h\,\frac{P}{A} \tag{3}$$

where $T_a$ is the ambient temperature, $P$ is the total power dissipation, $A$ is the area of the design, and $h$ is the heat transfer coefficient as used in heat transfer theory. The value of $h$ represents how well the chip package can dissipate heat. A large value of $h$ always implies a poor cooling package. An example value of $h$ is 4.75 cm²·°C/W, based on an operating chip temperature of 120°C for the 180nm technology [7]. We will show that for every power density level there is always a maximum package heat coefficient (and thus a poorest acceptable package). Using a package with an even larger heat coefficient than this is likely to cause thermal run-away.
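To make the role of area in Equation (3) concrete, here is a minimal sketch that evaluates the formula for one power level spread over two different areas. Only the value of h comes from the text; the ambient temperature, power, and areas are hypothetical numbers chosen for illustration.

```python
def steady_temperature(t_ambient: float, h: float, power: float, area: float) -> float:
    """Equation (3): T = T_a + h * P / A."""
    return t_ambient + h * power / area


H_EXAMPLE = 4.75  # cm^2*°C/W, the example value quoted from [7]

# Hypothetical operating point: 45 °C ambient, 10 W spread over 1 cm^2 or 2 cm^2.
t_small = steady_temperature(45.0, H_EXAMPLE, 10.0, 1.0)
t_large = steady_temperature(45.0, H_EXAMPLE, 10.0, 2.0)

print(f"1 cm^2: {t_small:.2f} °C, 2 cm^2: {t_large:.2f} °C")
```

Doubling the area halves the temperature rise above ambient, which is the mechanism that redundant resources exploit.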
Figure 1 illustrates the relationship between the average power density across a given chip, the heat coefficient of the package, and the expected steady state temperature. In this figure, the lines starting from the origin represent the heat transfer ability of the package, which is proportional to the chip temperature: a high temperature requires fast heat dissipation by the package. The other three curves represent different power density levels of the chip. The bending of these curves reflects the fact that leakage power has become a significant part of total power consumption and that leakage power has a superlinear dependency on temperature. When the heat generation equals the heat dissipation, the chip temperature becomes steady. Therefore, the intersection point of a power density curve and the package heat coefficient line represents the steady state point. It can be seen from Figure 1 that, for the same packaging configuration, a higher power density leads to a higher steady state temperature.
[Figure 1 plot: leakage power (W) versus temperature (°C), from 50°C to 120°C; curves for power density 1, power density 2, power density 3, and the package cooling level.]
Figure 1: Establishing the relationship between temperature and power density.
This relationship between power density and package heat coefficient is the basis for our leakage estimation model. The analytical formula for calculating the steady state temperature is

$$\frac{A}{h}\,(T_x - T_a) = \left( \sum_{j=0}^{p} \frac{\prod_{i \neq j} (T_x - x_i)}{\prod_{i \neq j} (x_j - x_i)}\, y_j \cdot f \cdot n + P_d \right) \tag{4}$$

where $A$ is the total area of the resources, $n$ is the number of resources, and $f$ is the leakage power scaling factor. In our experiments, $f$ is 250 for a 16-bit multiplier module and 80 for a 32-bit adder module. It is approximately proportional to the area of the module. $P_d$ represents the dynamic power. Our purpose is to solve this equation for the steady state temperature $T_x$. Before that, we first show that it is the superlinear relationship between leakage power and temperature that leads to our conclusion that there exists an optimal number of resources (corresponding to an optimal temperature).
LEMMA 1. The steady state temperature $T_x$ monotonically decreases with the increasing number of resources $n$ if the Lagrange formula is linear.

PROOF. After rearranging Equation (4), we have

$$T_x = T_a + h \left( \frac{P_d}{n a_0} + \frac{L_1(T_x)}{a_0 / f} \right) \tag{5}$$

where $P_d$ is the dynamic power, which is constant as we discussed above, $n$ is the number of resources, and $a_0$ is the area of one resource. Using linear Lagrange interpolation, we substitute $L_1(T_x) = aT_x + b$ into Equation (5) and solve for $T_x$,

$$T_x = \frac{\dfrac{a_0}{f} T_a + bh + \dfrac{h P_d}{n f}}{\dfrac{a_0}{f} - ah} \tag{6}$$

It can be seen that $T_x$ decreases monotonically when $n$ increases.
LEMMA 2. The leakage power in the form of $n L_1(T_x)$ monotonically increases with increasing number of resources.

THEOREM 1. The leakage power in the form of $n L_p(T_x)$, $p \neq 1$, is not a monotonic function. It obtains a minimal value at some resource number $n$.

PROOF. We only analyze the situation where $p = 2$ here. Higher order Lagrange interpolation can be analyzed numerically in a similar way. Suppose $L_2(T_x) = aT_x^2 + bT_x + c$ and substitute it into Equation (5),

$$T_x = \frac{\dfrac{a_0}{f} - bh + \sqrt{\left(\dfrac{a_0}{f} - bh\right)^2 - 4ah\left(hc + \dfrac{a_0}{f} T_a + \dfrac{P_d h}{n f}\right)}}{2ah} \tag{7}$$

Therefore the total leakage power in the form of $n L_2(T_x)$ becomes

$$P_l = n L_2(T_x) = \sqrt{s_1 n^2 + s_2 n} + t_1 n + t_2 \tag{8}$$

where $s_1$, $s_2$, $t_1$, $t_2$ are some coefficients. The optimal solution can be found by setting the derivative to zero. It is in the form of a quadratic equation.
We have shown theoretically that there exists an optimal number of resources which minimizes the total leakage power. In the next section we show how to reach the optimal solution by a numerical method.
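The existence of an interior minimum can be illustrated numerically. The sketch below sweeps the number of resources for a quadratic leakage model and solves Equation (5) in closed form. All parameter values other than h are hypothetical and were picked only to make the trend visible, f is set to 1 for simplicity, and the smaller of the two quadratic roots is used on the assumption that it is the stable intersection.

```python
import math

# Hypothetical parameters chosen only to make the trend visible.
T_AMBIENT = 45.0   # °C
H = 4.75           # cm^2*°C/W, example value quoted in Section 2
A0 = 0.01          # cm^2, area of one resource (assumed)
F_SCALE = 1.0      # leakage scaling factor f (assumed 1 for simplicity)
P_DYNAMIC = 0.05   # W, total dynamic power, independent of n
C0 = 1.0e-3        # W, leakage of one resource at ambient (assumed)
K = 2.0e-5         # W/°C^2, curvature of the quadratic leakage fit (assumed)


def unit_leakage(temp: float) -> float:
    """Quadratic leakage model L2(T) = C0 + K * (T - T_a)^2."""
    return C0 + K * (temp - T_AMBIENT) ** 2


def steady_temperature(n: int) -> float:
    """Lower root of Equation (5) for the quadratic model, with u = T - T_a."""
    # Equation (5) becomes alpha * u^2 - u + beta = 0.
    alpha = H * F_SCALE * K / A0
    beta = H * P_DYNAMIC / (n * A0) + H * F_SCALE * C0 / A0
    disc = 1.0 - 4.0 * alpha * beta
    if disc < 0.0:
        raise ValueError("no steady state: thermal run-away")
    return T_AMBIENT + (1.0 - math.sqrt(disc)) / (2.0 * alpha)


def total_leakage(n: int) -> float:
    """Total leakage n * L2(T_x) for n identical resources."""
    return n * unit_leakage(steady_temperature(n))


sweep = {n: total_leakage(n) for n in range(1, 9)}
best_n = min(sweep, key=sweep.get)

for n, p in sweep.items():
    print(f"n={n}: T_x={steady_temperature(n):6.2f} °C, leakage={p * 1e3:.3f} mW")
print("leakage-optimal number of resources:", best_n)
```

With these toy numbers the leakage falls steeply for the first few added resources, reaches its minimum at four resources, and then rises again as the per-resource leakage floor starts to dominate.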
3. REDUNDANT RESOURCE ALLOCATION FOR LEAKAGE OPTIMIZATION
Our main goal is to achieve low power density by introducing redundant resources, searching for the optimal point up to which the reduction in thermal-induced leakage still outweighs the additional leakage due to the redundant resources. However, deriving an analytic formula for the optimal number of resources is only possible for second-degree Lagrange interpolation. In reality, we will use at least a 10-degree Lagrange formula (therefore at least 10 experimental data points) in order to maintain good accuracy. Another way to solve this problem is to perform an incremental search in the solution space. This is feasible because the number of resources takes discrete values. The main algorithm is illustrated in Figure 2.
Algorithm Redundant Resource Allocation
Input: Resource library with power characterization, resource-scheduled DFG, minimum required leakage power reduction a%
Output: Number of resources after redundant allocation
For each resource type Do
    find_avg_dynamic_power();
    find_resnum_bounds();
    find_package_parameter();
    n = min_resource_number;
    While (ΔP_l > a%)
        add_resource_redundancy(n);
        steady_temperature = secant(n, F(T_x));
        ΔP_l = (P_l(T'_x) − P_l(T_x)) / P_l(T'_x);
        T'_x = T_x;
    End
    Return number of resources n in new allocation;
End
Figure 2: Pseudocode of the redundant resource allocation algorithm.
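The inner loop of Figure 2 can be sketched in Python as follows. This is one possible reading of the pseudocode, not the authors' implementation: a candidate resource is accepted only if it lowers the total leakage by more than the required fraction, and the leakage evaluation is abstracted behind a callable. The function `toy_leakage` is a hypothetical stand-in for the steady state temperature and leakage computation.

```python
from typing import Callable


def redundant_allocation(
    total_leakage: Callable[[int], float],
    n_min: int,
    n_max: int,
    min_reduction: float,
) -> int:
    """Add resources while each addition cuts leakage by more than min_reduction."""
    n = n_min
    previous = total_leakage(n)
    while n < n_max:
        candidate = total_leakage(n + 1)
        # Relative leakage reduction gained by the extra resource.
        reduction = (previous - candidate) / previous
        if reduction <= min_reduction:
            break  # benefit too small: keep the current allocation
        n += 1
        previous = candidate
    return n


def toy_leakage(n: int) -> float:
    """Hypothetical leakage curve with a minimum, not measured data."""
    return 120.0 / n + 5.0 * n


print(redundant_allocation(toy_leakage, n_min=2, n_max=10, min_reduction=0.20))
```

A stricter threshold stops the search earlier and saves area, while a threshold of zero keeps adding resources until the leakage stops improving, which corresponds to the full search for the leakage-optimal allocation.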
The basic idea of this algorithm is to increment the number of resources until the benefit of leakage power reduction becomes less than some expectation constraint. In each iteration, we use a numerical method to solve Equation (4). In this equation, $T_x$ is the variable. Before we can solve it, we have to know the dynamic power value $P_d$ and the package heat coefficient $h$. The leakage power scaling factor $f$ is derived empirically [12].
Therefore, based on the information given by the scheduled DFG, we first calculate the average dynamic power for each resource type. At such a high level, we have to ignore the thermal coupling between different resources because no physical position information is available. However, our methodology is still applicable if thermal coupling information is available. The new steady state temperature can be calculated by combining our results with the thermal coupling information. Moreover, ignoring coupling only underestimates the total leakage power reduction, because when the temperature of one resource drops due to resource redundancy, other resources can also reduce their temperature through thermal coupling. In other words, we can get at least as much leakage reduction as our results show. Higher benefits can be expected if thermal coupling is introduced into the leakage estimation model. The lower bound and upper bound for the number of resources can also be derived from these DFG files and incorporated into the search.
The next step is to decide the package heat coefficient according to different power density levels. Using a very low package heat coefficient $h$ is always good, because the chip temperature can be controlled effectively. However, such a low $h$ always implies high packaging cost. Therefore, we find the lowest cost (highest $h$) feasible package for each binding based on the relationship between power density and package heat coefficient. These packaging characteristics are used in our experiments.
We discuss estimating the average dynamic power in subsection 3.1. The algorithm for identifying the lowest cost package is presented in subsection 3.2. In subsection 3.3 we show how to use a numerical method to obtain the expected steady state temperature, and relate it to the leakage trends.
3.1 Average Resource Dynamic Power
We assume that each resource consumes a typical average dynamic power for executing one operation. In other words, the total dynamic power is represented by a constant once the scheduled DFG is given. The total power is decided by the total number of operations that are executed in a given number of control steps. This approximation helps us focus on the contribution of leakage power. This is a reasonable assumption, as we have discussed in Section 1. Also, at the high-level synthesis stage, input switching probabilities are highly unpredictable. Individual dynamic power consumptions of operations can be weighted with the respective input switching behavior if an appropriate statistical model is provided.
We first derive a typical dynamic power value $P_0$ of the module by some existing power estimation technique. We used the power estimates obtained after synthesizing the different modules with Synopsys Design Compiler. Assume the signal toggle rate is $TR$. It represents how many logic transitions there are per unit time when the dynamic power is $P_0$. Given a scheduled DFG that spans a total of $m$ control steps, with the clock cycle time of the design being $s$, we can calculate the dynamic power of each operation as:

$$P_{opt} = \frac{P_0}{TR \cdot m \cdot s} \tag{9}$$

The dynamic power consumption per operation corresponds to the power consumption when only one operation is scheduled on the resource within $m$ control steps. Using this metric, we can scale the dynamic power of any resource by the total number of operations assigned to it.
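A minimal sketch of Equation (9) and of the scaling step described above is given below. The function names and the numeric inputs are illustrative assumptions; the paper does not give the raw values of P_0, TR, m, or s.

```python
def dynamic_power_per_operation(
    p0: float, toggle_rate: float, steps: int, cycle_time: float
) -> float:
    """Equation (9): P_opt = P_0 / (TR * m * s)."""
    return p0 / (toggle_rate * steps * cycle_time)


def resource_dynamic_power(p_opt: float, operations_assigned: int) -> float:
    """Scale the per-operation power by the operations bound to one resource."""
    return p_opt * operations_assigned


# Hypothetical inputs, in arbitrary but consistent units.
p_opt = dynamic_power_per_operation(p0=2.0, toggle_rate=4.0, steps=5, cycle_time=0.1)
p_resource = resource_dynamic_power(p_opt, operations_assigned=3)
print(p_opt, p_resource)
```

Because the total number of operations in a scheduled DFG is fixed, summing this quantity over all resources of a type gives the same total no matter how many resources share the work, which is why the dynamic power is treated as a constant during the search.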
3.2 Estimating the Package Properties
The chip temperature, and hence the leakage power, is highly related to the cooling package. Using a package with an arbitrarily low $h$ will always guarantee a low temperature. However, it also means the package cost will increase. We show that for each power density level there is a maximum $h$ (minimum cost) package. If $h$ exceeds this maximum value, the package heat dissipation curve and the chip heat generation curve will not have any intersection, which means that heat dissipation is always slower than heat generation. Eventually, the chip temperature will increase to an uncontrolled high level. This phenomenon is called thermal run-away. Mathematically, we obtain the minimum cost $h$ value when Equation (4) has only one root.

We use a binary search algorithm to find the maximum package coefficient. The basic idea of this algorithm is to find a point on the power density curve such that its tangent line intersects the zero point of the x-axis. We can select any two points as our initial values as long as the tangent at one of them intersects the x-axis at a negative value and the tangent at the other intersects it at a positive value. The algorithm runs recursively, and stops when the intersection point is close enough to the zero point.

After getting the maximum package coefficient, we decrease it by some constant amount, e.g., 10%, in order to make sure that it is safely far away from the thermal run-away condition while still being very low cost. This may also be needed to identify the applicable safe and lowest cost coefficient among a discrete set of values. We use this package parameter in the process of estimating the steady state temperature level.
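The tangent search can be sketched as a bisection on the x-intercept of the tangent to the heat generation curve. In the sketch below the x-axis origin is taken to be the ambient temperature, the loop is iterative rather than recursive, and the heat generation curve is a hypothetical quadratic; these are assumptions of the example, not details given in the paper.

```python
from typing import Callable


def max_package_coefficient(
    heat: Callable[[float], float],
    heat_slope: Callable[[float], float],
    area: float,
    t_ambient: float,
    t_low: float,
    t_high: float,
    tol: float = 1e-9,
) -> float:
    """Bisection for the tangent to heat(T) that passes through (T_a, 0)."""

    def intercept(temp: float) -> float:
        # x-intercept of the tangent at temp, measured from T_a.
        return temp - heat(temp) / heat_slope(temp) - t_ambient

    if not (intercept(t_low) < 0.0 < intercept(t_high)):
        raise ValueError("initial points must bracket the tangent point")
    while t_high - t_low > tol:
        mid = 0.5 * (t_low + t_high)
        if intercept(mid) < 0.0:
            t_low = mid
        else:
            t_high = mid
    t_tangent = 0.5 * (t_low + t_high)
    # Package line: power = (A / h) * (T - T_a), so h = A / slope.
    return area / heat_slope(t_tangent)


# Hypothetical heat generation curve: P(T) = P_BASE + K * (T - T_A)^2.
T_A, P_BASE, K, AREA = 45.0, 0.051, 2.0e-5, 0.01
h_max = max_package_coefficient(
    lambda t: P_BASE + K * (t - T_A) ** 2,
    lambda t: 2.0 * K * (t - T_A),
    AREA, T_A, T_A + 1.0, T_A + 500.0,
)
h_safe = 0.9 * h_max  # 10% margin, as the text suggests
print(f"h_max = {h_max:.4f}, h_safe = {h_safe:.4f} cm^2*°C/W")
```

For this quadratic curve the tangent point has a closed form, so the bisection result can be cross-checked against `AREA / (2 * sqrt(K * P_BASE))`; for a high-degree Lagrange fit only the numerical search is practical.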
3.3 Steady State Temperature
The calculation of the steady state temperature amounts to finding the solution of a nonlinear equation. The Newton-Raphson method can be a good candidate. However, this method is only applicable when the order of the Lagrange interpolation is not too high.

Therefore, we use the secant method, which has the iteration expression shown below.

$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)} = x_i - f(x_i)\left[\frac{x_i - x_{i-1}}{f(x_i) - f(x_{i-1})}\right] \tag{10}$$
It substitutes the derivative value by a secant estimation. The convergence speed depends on how far the initial point is from the real solution. Therefore, finding a good starting point is critical in order to guarantee the running time of our algorithm.

One such good starting point can be obtained by finding the intersection of two lines. One is the package heat dissipation line; the other is the simplified heat generation line obtained by assuming that there is no leakage power.

$$T_x = T_a + h\,\frac{P_d}{A} \tag{11}$$

It can be seen analytically that this point is very near the solution. Starting from this initial point and searching in the positive direction, we can find the solution within a few iterations.
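A compact sketch of this procedure is shown below: a generic secant iteration following Equation (10), started from the no-leakage estimate of Equation (11) and a second point one degree higher. The quadratic leakage model and every numeric parameter except h are hypothetical, and f is taken as 1.

```python
from typing import Callable


def secant(func: Callable[[float], float], x0: float, x1: float,
           tol: float = 1e-10, max_iter: int = 50) -> float:
    """Secant iteration of Equation (10); returns an approximate root of func."""
    f0, f1 = func(x0), func(x1)
    for _ in range(max_iter):
        if f1 == f0:
            break  # flat secant: cannot continue
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        x0, f0, x1, f1 = x1, f1, x2, func(x2)
        if abs(x1 - x0) < tol:
            break
    return x1


# Hypothetical single-type design: N resources with quadratic leakage.
T_A, H, A0, N, P_D, C0, K = 45.0, 4.75, 0.01, 2, 0.05, 1.0e-3, 2.0e-5
AREA = N * A0


def balance(temp: float) -> float:
    """Heat removed by the package minus heat generated (Equation 4, f = 1)."""
    leakage = N * (C0 + K * (temp - T_A) ** 2)
    return AREA / H * (temp - T_A) - (leakage + P_D)


t_start = T_A + H * P_D / AREA  # Equation (11): no-leakage estimate
t_steady = secant(balance, t_start, t_start + 1.0)
print(f"start {t_start:.3f} °C -> steady state {t_steady:.3f} °C")
```

Because the no-leakage estimate sits just below the true steady state, stepping upward from it keeps the iteration on the lower intersection rather than the unstable upper one.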
Having obtained the steady state temperature by the secant method, we use

$$P_l(T_x) = n \sum_{j=0}^{p} \frac{\prod_{i \neq j} (T_x - x_i)}{\prod_{i \neq j} (x_j - x_i)}\, y_j$$

to calculate the total leakage power for a given resource allocation, that is, for a certain number of resources.
4. EXPERIMENTAL RESULTS
4.1 Experimental Flow
[Figure 3 plots: four bar charts over the benchmarks arf, ewf, fdct, fft, jct1, jdm1, jdm3, jdm4, mot2, mot3, noi, each comparing min-resource allocation, temperature-aware allocation, and optimal-leakage allocation. Vertical axes: (a) leakage power (µW); (b) total power (µW); (c) temperature (°C); (d) temperature (°C).]
Figure 3: (a) Leakage power of our redundant resource allocation technique compared with the thermal-aware resource allocation technique and minimum resource number allocation; (b) total power of our technique and the other resource allocation techniques; (c) average temperature of adders in the three resource allocation schemes; (d) average temperature of multipliers in the three resource allocation schemes.
We used two types of functional units (adders and multipliers) to bind operations in a set of scheduled DFGs. The minimum number of resources required is determined by the compatibility between operations as dictated by the schedule. The maximum number of operations of the same type that are scheduled in the same control step corresponds to the minimum number of resources required of that type.
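The minimum resource count described above can be computed with a simple tally. The sketch below assumes each operation occupies exactly one control step and represents the schedule as a list of (operation type, control step) pairs; both the representation and the toy schedule are illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def minimum_resources(schedule: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Max number of same-type operations sharing a control step, per type."""
    per_step = Counter(schedule)  # (op_type, control_step) -> count
    minimum: Dict[str, int] = {}
    for (op_type, _step), count in per_step.items():
        minimum[op_type] = max(minimum.get(op_type, 0), count)
    return minimum


# Hypothetical schedule: one (operation type, control step) pair per operation.
toy_schedule = [
    ("add", 1), ("add", 1), ("mul", 1),
    ("mul", 2), ("add", 2), ("mul", 2), ("mul", 2),
]
print(minimum_resources(toy_schedule))
```

In this toy schedule two additions share step 1 and three multiplications share step 2, so at least two adders and three multipliers are needed before any redundancy is added.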
The area and the average dynamic power consumption of each module type are obtained by synthesizing the modules using Synopsys Design Compiler with the TSMC 180nm library. After synthesis, we scale these values down to 70nm technology using the full-scaling methodology.
4.2 Results
The relevant information regarding our benchmarks is given in Table 1. Our benchmark DFGs are extracted from popular DSP and multimedia kernels [1]. Their names are listed in the first column. The second column is the total number of operations of each type in these DFGs. The third column presents the minimum number of resources required by the schedule of each DFG. The remaining columns present the average dynamic power consumption estimated per adder and multiplier module during the execution of these DFGs, using the method described in Section 3.
Table 1: Properties and Relevant Information on the Scheduled DFGs

Schedule    Num. of nodes   Minimum resources   Dyn. power     Dyn. power
name        [add, mul]      [add, mul]          per ADD (µW)   per MUL (µW)
arf         [12, 16]        [2, 2]              534.19         3446.26
ewf         [26, 8]         [3, 2]              659.89         4257.15
fdct        [26, 16]        [4, 4]              934.84         6030.96
fft         [26, 16]        [3, 3]              747.87         4824.77
jctrans1    [13, 2]         [3, 2]              801.29         5169.40
jdmerge1    [23, 4]         [3, 3]              659.89         4257.15
jdmerge3    [30, 4]         [3, 3]              487.74         3146.59
jdmerge4    [18, 12]        [3, 3]              509.91         3289.62
motion2     [26, 14]        [4, 3]              467.42         3015.48
motion3     [26, 14]        [5, 3]              467.42         3015.48
noise_est   [17, 9]         [3, 2]              659.89         4257.15
Figure 4 illustrates the trend in total leakage power of one resource type (the multiplier in this case) for different allocations of that resource in the same design. The most important observation is that there exists an optimal number of resources which achieves the least total leakage power. We have observed similar trends for all test cases. As we mentioned before, adding extra resources is not free. The total leakage power starts to increase after some point with a further increase in the number of resources. The sharpest leakage power reduction happens at high temperatures, i.e., when using few resources at high power densities. At that point, allocating one more resource has the largest impact on the power density and the thermal-induced leakage. As we introduce more and more redundancy, the return diminishes. This is expected, since the thermal-induced leakage power only becomes significant at high temperature levels.
When there is more than one resource type in a DFG, we first add redundancy for the module with the highest power density, because such a module is very likely to contain a hotspot leading to high thermal-induced leakage power.
In practice, we set a lower bound on the leakage power reduction
required to accept the addition of a new resource. We add an extra
resource only if doing so reduces the leakage power by a percentage
larger than a predefined level. In our experiments, we set this value
to 20% for every additional resource. This value reflects how
important power is relative to area. However, as seen from our
results, there is an optimal number of resources that achieves
minimum total leakage power. In a power-critical design, we can
perform a full search and use as many resources as that optimal
number indicates. Otherwise, if we choose to stop the search earlier,
we might not yet have reached that optimal number.
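The difference between the threshold-based stopping rule and a full search can be sketched as below. The leakage curve is a made-up example and not measured data, and the function names are hypothetical. Only the 20% threshold is taken from the text.

```python
# Hypothetical total leakage (µW) per resource count; illustrative numbers only.
LEAKAGE_UW = {3: 40000.0, 4: 26000.0, 5: 19000.0, 6: 16000.0,
              7: 15000.0, 8: 15500.0, 9: 16500.0}


def greedy_allocation(leakage, n_min, n_max, min_gain=0.20):
    """Add one resource at a time while the relative leakage reduction
    of the next addition is larger than min_gain."""
    n = n_min
    while n < n_max:
        gain = (leakage[n] - leakage[n + 1]) / leakage[n]
        if gain <= min_gain:
            break          # the next resource does not pay for its area
        n += 1
    return n


def full_search(leakage, n_min, n_max):
    """Evaluate every allocation and return the one with least leakage."""
    return min(range(n_min, n_max + 1), key=lambda n: leakage[n])


print("threshold rule stops at:", greedy_allocation(LEAKAGE_UW, 3, 9))
print("full search finds:", full_search(LEAKAGE_UW, 3, 9))
```

On this example curve the threshold rule accepts the fourth and fifth resources (35% and about 27% reduction) but rejects the sixth (about 16%), while the full search continues to the minimum at seven resources. A lower threshold moves the greedy result toward the optimum at the cost of area.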
[Plot: x-axis: resource number (4 to 15); y-axis: total leakage power (µW), x 10^4, 0 to 4]
Figure 4: Trends in leakage for different allocations of the multiplier
module for FFT design.
Figure 3 illustrates our results. We compared our results against
the thermal-aware resource binding techniques [9, 10]. These
techniques try to meet a temperature constraint while using the minimum
number of resources during binding. The temperature constraint is
100 °C, exactly the same as that used in these works. As
we can see from the results, we achieved at most 56.5%, on average
35.7%, leakage power reduction compared to the thermal-aware resource
binding technique.
