Noise-Direct: A Technique for Power Supply Noise Aware Floorplanning Using Microarchitecture Profiling

Fayez Mohamood, Michael B. Healy, Sung Kyu Lim, Hsien-Hsin S. Lee
School of Electrical and Computer Engineering
Georgia Institute of Technology
Atlanta, GA 30332
{fayez, mbhealy, limsk, leehs}@ece.gatech.edu
ABSTRACT
This paper proposes Noise-Direct, a design methodology for power
integrity aware floorplanning, using microarchitectural feedback
to guide module placement. Stringent power constraints have led
microprocessor designers to incorporate aggressive power saving
techniques such as clock-gating, that place a significant burden on
the power delivery network. While the application of extensive
clock-gating can effectively reduce power consumption, unfortu-
nately, it can also induce large inductive noise (di/dt), resulting
in signal integrity and reliability issues. To combat these prob-
lems, processors are usually designed for the worst-case current
consumption scenario using adequate supply voltage and decou-
pling capacitances.
To tackle high-frequency inductive noise and potential IR drops,
we propose a novel design methodology that integrates microarchitectural
profiling feedback into the floorplanning process. We
present two microarchitectural metrics to quantify the noise susceptibility
of a module: self weighting and correlation weighting.
By using these metrics in a force-directed floorplanning algorithm
to assign power pin affinity to modules, we can quickly converge
to a design for average-case current consumption. By designing
for the average-case and employing dynamic di/dt control for the
worst-case, we can ensure that a chip is noise-tolerant without ex-
ceeding decap budget constraints. Our observations showed that
certain functional modules in a processor exhibit consistent and
highly correlated switching activity, that can be used to guide mod-
ule placement distance from power pins. The experimental results
demonstrate that the force-directed floorplanning technique can ef-
fectively suppress supply noise experienced by modules, reduce the
total number of supply-noise margin violations, and achieve a floor-
plan with considerably lower IR drop, as compared to a wire-length
driven floorplan.
1. INTRODUCTION
Power efficiency is the first-order physical constraint in modern
day processor design. The excessive power demand has led to the
use of aggressive techniques such as dynamic voltage/frequency
scaling, clock or power gating, etc. Although techniques like clock-
gating can dramatically reduce dynamic power consumption for
idle modules, they also exacerbate inductive noise (di/dt) and IR
drops on the power delivery network. As a result, processor designers
have to account for worst-case inductive noise, typically using
an ultra-low impedance power supply network. In order to meet
the impedance target across a wide range of frequencies, multi-
stage decoupling capacitors are necessary. High-frequency noise
is handled by on-die decaps distributed across the die while low-frequency
noise is handled by package-level decap.¹ Alternatively,
designers also incorporate fine-grained clock-gating domains whereby
modules are clock-gated in an incremental fashion in order to minimize
abrupt current surges [6, 12]. Note that both techniques
are centered around the philosophy of designing the chip based on
worst-case switching activity.
¹ Note that this work focuses on the high-frequency di/dt issue.
[Figure 1: Noise-Direct Design Methodology Overview. Microarchitectural Profiling: representative benchmarks as input to the microarchitecture/current profiler; identification of high di/dt modules using the SimpleScalar performance simulator and Wattch power analyser; self-switching weight assignment for all modules; correlated-switching weight assignment for all module pairs. Noise-aware Floorplanning: force-directed floorplanning to guide high di/dt modules to appropriate power pin locations. Noise Analysis: power supply noise analysis using the current profile of the benchmarks for each module; design evaluation for noise violation frequency and decap budget.]
With low supply voltages and high power consumption in newer
generations of processors, the worst-case design strategy becomes
highly inefficient. Increasing amounts of decap will consume chip
area and lead to excessive leakage current. Static control for di/dt in
the form of fine-grained clock gating will cause performance degradation
since modules cannot be gated-on quickly. To overcome
these issues and avoid designing for the worst-case inductive noise,
we propose a design methodology, Noise-Direct, that integrates mi-
croarchitectural profiling feedback into the floorplanning process.
The basic idea involves the identification of correlated modules that
are highly likely to cause power supply noise violations and to use
such information to guide module placement. An overview of the
design flow is illustrated in Figure 1. There are three phases:
microarchitectural profiling, noise-aware floorplanning, and
power supply noise analysis. This paper makes the following contributions:
• We introduce two metrics called self switching weight and correlated
  switching weight for identifying modules that are highly
  likely to cause large di/dt.
• We present a force-directed floorplanning algorithm that incorporates
  microarchitectural feedback for module placement. It
  ensures a design for the average-case along with dynamic control
  at the microarchitectural level to account for the worst-case
  current scenario.
• To evaluate the effectiveness of our noise-aware floorplan, we
  apply a SPICE model of an on-chip power delivery network.
  Based on the model, we present the maximal voltage swing at
  each module and the overall noise tolerance of the chip.
Current design methodologies consider inductive noise issues in
the power supply network as an afterthought. In contrast, we ad-
dress this issue early in the architectural planning phase, thereby
1-4244-0630-7/07/$20.00 ©2007 IEEE.
8B-2
reducing decap requirements and design complexity. By floorplan-
ning for the average case using the techniques we propose with
dynamic di/dt control schemes [15] to account for the worst case,
we can ensure a design that is far more resistant to inductive noise
than a purely wirelength driven floorplan.
The rest of the paper is organized as follows. Section 2 out-
lines the motivation. Related works are discussed in Section 3, fol-
lowed by a design space analysis in Section 4. Section 5 describes
Noise-Direct. Section 6 presents the evaluation methodology and
Section 7 shows experimental results. Finally, Section 8 concludes.
2. PRELIMINARIES
Power delivery noise is a growing concern and presents a major
issue that thwarts processor designers. One reason is due to the
increasing amount of current consumption in newer chips. In addition,
as devices shrink, the supply voltage is also reduced to meet
gate-oxide reliability requirement. Although the lowered voltage
offsets the current consumption to some extent, it also results in a
lower noise margin. Increasing current consumption and switch-
ing activity, coupled with lower noise margins, means that design-
ers have to meet stringent noise constraints by accounting for the
worst-case current scenario. This is typically done by using differ-
ent types of decoupling capacitors and making an extremely low
impedance path from the power supply to the chip. This procedure,
undoubtedly, is not very effective, in terms of cost or complex-
ity [1].
To mitigate dynamic power, processor vendors employ (aggres-
sive) clock-gating on their chips. Clock-gating not only reduces dy-
namic power and heat dissipation, but also can save leakage power
due to the temperature drop. However, simultaneous gating of large
modules in the chip can lead to excessive inductive noise in the
power supply. Typically, this issue is dealt with through the deployment
of both off-chip and on-chip decaps [26, 18], which increases chip
area and can result in excessive leakage current. Alternatively, certain
commercial processors also employ fine-grained gating domains
to prevent large modules from being gated on or off too
quickly [6, 12]. Note that fine-grained gating domains increase
the design complexity and lead to performance loss due to the fact
that modules cannot be gated on immediately. The inefficacy of
this technique lies in the fact that this design is aimed at the infre-
quent worst-case current consumption scenarios. This technique
also requires complex modeling of the power delivery network for
ensuring that all supply noise constraints are met. To minimize the
effort of post-design optimization for power supply noise in future
processors that have higher functionality and a lower noise margin,
alternative design methodologies need to be sought.
3. RELATED WORK
Power supply noise aware floorplanning has been studied in the past [4,
5, 14, 26]. The central idea in this arena involves two concepts:
the first one involves creating a low impedance path to the chip,
and the second involves optimizing on-chip decap placement and
allocation to suppress inductive noise effects. At the microarchi-
tectural level, some techniques were proposed to address the worst
case inductive noise effects due to applying power saving tech-
niques [23, 21, 22, 19, 20, 10]. These techniques typically employ
certain types of dynamic control mechanisms that can estimate the
incoming current surges and subsequently throttle processor activ-
ity, thereby avoiding noise margin violation.
In contrast to the prior art, we are advocating a methodology that
takes inductive noise issues into account early in the architecture
planning phase of a design. By analyzing microarchitectural be-
havior of real workloads, we exploit module placement in the floor-
planning process to create a design that is inherently more tolerant
to inductive noise than a conventional wire-length driven floorplan.
4. DESIGN SPACE ANALYSIS
The main focus of this work is to target a floorplan for the average-
case current consumption scenario. Based on actual profiling re-
sults of dynamic module switching activity, our floorplan can be
inherently more noise tolerant. Nonetheless, every design still has
to guarantee the reliability by considering the worst-case scenario
even though it might be rather infrequent.
The option with respect to how the worst-case scenario is ad-
dressed depends on the designers. A traditional solution could
involve deployment of sufficient decoupling capacitance to mini-
mize inductive noise. In order to reduce the increasingly growing
size of decaps on a processor, Noise-Direct is aimed to reduce de-
cap requirements by analyzing the switching correlation between
microarchitecture modules and placing each module based on the
average case di/dt distribution. However, in cases when the cur-
rent threshold is exceeded, a dynamic di/dt control mechanism at
the microarchitecture level is still needed to handle the potential
noise emergency in addition to our noise-tolerant floorplan. It is
achieved by dynamically throttling a processor’s activity [10, 15,
23] at the potential cost of performance degradation. By coupling
noise-aware floorplanning with dynamic di/dt control, we can guarantee
that our floorplan for the average case is more noise tolerant.²
Compared to prior art, Noise-Direct can both reduce the total de-
cap requirement on an overly conservative chip design and avoid
performance throttling by only invoking dynamic di/dt control for
the unlikely worst case scenarios.
5. NOISE-DIRECT METHODOLOGY
Our Noise-Direct design methodology consists of two primary
phases: (1) microarchitectural profiling and (2) floorplanning. The
following sub-sections detail the entire procedure.
5.1 Microarchitectural Profiling
The dynamic power consumption of a processor is correlated to
the characteristics of running programs. To profile current con-
sumption and module activity, cycle-level architecture simulators
such as SimpleScalar can be used. Microarchitecture-level power
simulation [2], incorporated inside a cycle-level simulator, can be
easily extended to quantify current consumption for each module
on a per-cycle basis. This method provides a good understanding
of current demands (di) during each clock period (dt) and identifies
modules that are likely culprits of inducing high di/dt noise.
Recently, researchers [10, 15] advocated incorporating dynamic
di/dt control at the microarchitectural level to avoid excessive volt-
age ringing in the power supply. By including current calcula-
tion into microarchitectural simulations, these techniques analyzed
benchmark behavior and used it to guide the dynamic di/dt control.
Along a similar line, our methodology incorporates a fine-grained
current and switching activity profiling by the cycle-level simula-
tor to guide our noise-aware floorplanner. As described earlier, ex-
cessive and/or simultaneous gating of microarchitectural modules
can lead to reliability issues caused by inductive noise. Our mi-
croarchitectural profiling involves quantifying switching activity of
modules under ideal clock-gating. By gathering switching corre-
lation and characterizing dynamic current demands for target ap-
plications, we can provide essential metrics that can be used in the
floorplanning process to generate a noise-aware floorplan aimed for
average-case current consumption and switching activity.
To identify problematic (high switching activity) modules and
perform di/dt aware power pin assignments to them, we use two
metrics. The first metric involves measuring module activity over
the duration of a benchmark and assigning weights to modules that
are proportional to the relative number of switches and the intensity
of the switch. The second metric involves identifying the amount of
correlation between each module in the microprocessor, in terms of
simultaneous on/off gating. A detailed description of these metrics
follows.
² Note that we are not proposing any new type of dynamic di/dt
control, which is outside the scope of this work. Rather, we are advocating
a complementary design methodology that inherently tries
to achieve a static design that is noise-tolerant for the average case
current consumption. The core of Noise-Direct is in noise-tolerant
floorplanning. For dynamic di/dt control in processors, interested
readers can refer to [10, 15, 23].
5.1.1 Self Switching Weight Assignment
Self switching measurement is used to quantify the number of
gating occurrences in the processor for a benchmark during the
profiling period. Both gating on and off are considered as likely
events to cause di/dt fluctuation. The objective of this metric is to
isolate the microarchitectural modules with high switching activity.
For example, certain modules such as the I-Cache are likely to be
needed almost every cycle and hence will not be gated on/off very
often unless the instruction fetch is stalled due to misses. Such
modules will not be considered as major offenders of inductive
noise. This factor is captured and collected in the switching results
generated by our extended cycle-level simulator. In addition,
the intensity of the gating activity also depends on the current consumption
of each module. The normalized switching activity factor
and the current consumption per cycle, called the intensity of switch,
are combined into a single weight $\alpha$, represented by the following
equation. If $sw_i$ represents the raw number of switching events
for module $i$ and $I_i$ is the intensity of the switch, then the self-switching
factor $\alpha_i$ is denoted by:
\[ \alpha_i = sw_i \cdot I_i \tag{1} \]
In essence, modules with larger weights indicate higher suscep-
tibility to functional failure due to higher inductive noise. This
heuristic is applied to the force-directed floorplanning technique
discussed in Section 5.2.
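As an illustration of Equation 1, the following is a minimal sketch that counts gating transitions in a per-cycle on/off trace and scales them by a per-module intensity. The trace format, the function name `self_switching_weights`, and the example values are assumptions made for this sketch; they are not taken from our simulator, and the sketch uses the raw (not normalized) switch count.

```python
def self_switching_weights(gating_trace, intensity):
    """Compute alpha_i = sw_i * I_i for every module.

    gating_trace: dict mapping module name -> list of 0/1 states per cycle
                  (1 = clocked, 0 = gated off). Hypothetical trace format.
    intensity:    dict mapping module name -> intensity of a switch
                  (current change per gating event, arbitrary units).
    """
    weights = {}
    for module, states in gating_trace.items():
        # Both gating on (0 -> 1) and gating off (1 -> 0) count as a switch.
        sw = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
        weights[module] = sw * intensity[module]
    return weights


# Illustrative values only: the I-Cache stays on, the FP ALU toggles often.
trace = {"icache": [1, 1, 1, 1, 1, 1], "falu": [0, 1, 0, 0, 1, 0]}
intensity = {"icache": 2.0, "falu": 1.5}
alpha = self_switching_weights(trace, intensity)
```

In this toy trace the always-on module receives a weight of zero, which mirrors the I-Cache argument above.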
5.1.2 Correlated Switching Weight Assignment
In addition to the absolute self switching magnitude, di/dt issues
due to clock-gating also arise from simultaneous gating of neigh-
boring modules. If two modules that switch simultaneously have
their least impedance paths to the same power pin, this will cause
larger inductive noise effects at the modules. Our second metric,
the correlated weight, accounts for the degree of gating correlation
between microarchitectural modules. The basic idea is to use this
heuristic to either place highly correlated modules away from each
other, or at least assign them to different power pins. For instance,
the I-Cache and the I-TLB are two units that are likely to be highly
correlated since they are almost always accessed simultaneously.
Simultaneous gating of these modules in the same direction (both
on or both off) at the same power pin is likely to induce a high di/dt
in the supply voltage.
To measure correlation, we capture the inter-cycle gating direc-
tion of each module in the profiling process. Then each module is
paired with every other module in the processor, and checked for si-
multaneous gating in the same direction. The result is a correlation
matrix with each location representing the number of simultaneous
gating events encountered.
Since switching characteristics of modules vary from each other,
the correlation factors have to be determined in a manner that en-
sures fairness. For instance, if module A and B switched only twice
throughout the execution, and if they happened to switch simulta-
neously only for one single occasion, this would indicate a corre-
lation of 50%. In contrast, if modules C and D switched 10 times
throughout the execution and happened to contain only 3 simulta-
neous switches, this means that they are correlated only 30% of the
time. Clearly, the latter case would be more susceptible to higher
inductive noise. We need to consider such occurrences prudently
in the correlation factor computation.
To ensure fairness, we begin with a correlation matrix that con-
tains raw numbers of correlated switches. We then normalize each
row with respect to the single module that is assigned to the row. In
order to ensure fair switching weights, we then calculate
the average of the two weights normalized to each module of a pair (one from
each row). The result is a symmetric correlation matrix whose
weights capture the correlation relative to the switching of each module.
The calculation of these relative correlated switching weights
is illustrated in Figure 2. In the matrix, $X_{ij}$ is the number
of raw correlated switches that occurred over the profiling duration
and $sw_i$ is the number of self-switching events for module $i$.
The extent of correlation is proportional to the magnitude of the
above calculated weight, and the average intensity of the gating
event. Equation 2 represents the correlation weight $\gamma_{i,j}$ between
two modules $i$ and $j$.
\[ \gamma_{i,j} = \frac{1}{2}\left(\frac{X_{ij}}{sw_i} + \frac{X_{ji}}{sw_j}\right) \cdot \frac{1}{2}\left(I_i + I_j\right) \tag{2} \]
During power pin assignment, the modules with a high correlation
weight will be placed farther apart from each other for alleviating
the inductive noise caused by simultaneous gating. The correlation
weights are also factored into the noise-aware floorplanning
technique to be described next.
\[
\begin{bmatrix}
X & \frac{1}{2}\left(\frac{X_{12}}{sw_1} + \frac{X_{21}}{sw_2}\right) & \cdots & \frac{1}{2}\left(\frac{X_{1n}}{sw_1} + \frac{X_{n1}}{sw_n}\right) \\
0 & X & \cdots & \frac{1}{2}\left(\frac{X_{2n}}{sw_2} + \frac{X_{n2}}{sw_n}\right) \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & X
\end{bmatrix}
\]
Figure 2: Correlated Switching Matrix
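The correlation weight of Equation 2 can be sketched as follows. This is a simplified illustration, not our profiler: the trace format and the names `gating_directions` and `correlation_weights` are hypothetical, and because same-direction simultaneous gating is counted symmetrically here, the sketch uses $X_{ij} = X_{ji}$.

```python
def gating_directions(states):
    """Inter-cycle gating direction: +1 gated on, -1 gated off, 0 unchanged."""
    return [cur - prev for prev, cur in zip(states, states[1:])]


def correlation_weights(gating_trace, intensity):
    """Return {(a, b): gamma_ab} for every unordered module pair (Equation 2)."""
    names = sorted(gating_trace)
    dirs = {m: gating_directions(gating_trace[m]) for m in names}
    sw = {m: sum(1 for d in dirs[m] if d != 0) for m in names}
    gamma = {}
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            # X_ab: cycles where both modules gate in the same direction.
            x = sum(1 for da, db in zip(dirs[a], dirs[b]) if da != 0 and da == db)
            if sw[a] == 0 or sw[b] == 0:
                gamma[(a, b)] = 0.0  # assumption: a module that never switches has no correlation
                continue
            normalized = 0.5 * (x / sw[a] + x / sw[b])
            gamma[(a, b)] = normalized * 0.5 * (intensity[a] + intensity[b])
    return gamma


# Modules A and B each switch twice and coincide once (the 50% case in the text).
trace = {"A": [0, 1, 0, 0], "B": [0, 1, 1, 0]}
gamma = correlation_weights(trace, {"A": 1.0, "B": 1.0})
```

With unit intensities the result for the pair (A, B) is the normalized correlation itself, which makes the averaging step easy to check by hand.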
5.2 Floorplanning Algorithm
5.2.1 Overview of the Approach
Given a set of microarchitectural modules and a netlist that spec-
ifies the connectivity among these modules, our noise-aware mi-
croarchitectural floorplanner tries to determine the location of the
modules in a chip such that (i) there is no overlap among modules,
(ii) the sum of current demand for each power pin does not exceed
its capacity, and (iii) power supply noise experienced by each mod-
ule does not exceed the given bound. Our objective is to provide
a floorplan that minimizes the floorplan area and the total wirelength.
Microarchitectural floorplanning has recently drawn significant interest
from both the computer architecture and EDA communities
[13, 7, 3, 9, 17, 11]. These existing works mainly target
performance and thermal issues; power supply noise has
not been addressed.
Among several methods known for floorplan optimization, we
employ the force-directed floorplanning method [8]. Compared
with other methods such as simulated annealing [16], the slicing method
[24], and the analytical approach [25], the force-directed method does not
require tedious parameter tuning and converges quickly while ob-
taining high quality solutions [8]. We formulate the floorplanning
problem as finding a set of forces among and between fixed ob-
jects (such as I/O or power pins) and movable modules in order
to optimize the objective function. The problem of finding mod-
ule position then becomes one of finding forces. Our floorplanner
consists of the following four steps:
1. Initialization: To begin, all modules are randomly distributed
throughout the placement area, without regard to overlap.
2. Iteration: Our objective function is optimized in an iterative
manner, where we update a certain set of forces based on the
last iteration to guide the optimization process.
3. Stopping Criterion: The iterations are stopped when the utilization
of the floorplanning area is above a threshold. This
has the effect of an overlap constraint as the floorplan area is
related to the sum of the area of the blocks and the utilization
cannot go above a certain level without a corresponding drop
in the amount of overlap.
4. Legalization: The legalization step removes the overlap among
modules while maintaining the quality of the solution.
Our objective function contains the following types of forces (see
Figure 3 for reference):
(1) net force ($F_{net}$): all pins in the same net are pulled closer together to minimize the wirelength objective.
(2) center force ($F_{cen}$): all modules are pulled to the center of the chip to discourage the modules from escaping the chip boundary.
(3) correlation force ($F_{cor}$): modules with high switching activity repel each other so that the noise caused by the modules is reduced. The correlation factors $\gamma_{i,j}$ described in Section 5.1.2 are used to compute the magnitude.
(4) density force ($F_{den}$): modules located in a high density region of the chip are pushed apart to reduce the overlap.
(5) pin capacity force ($F_{pin}$): modules are pulled toward or pushed away from each power pin so that the total demand on each power pin is evenly distributed and its capacity is not violated.
The first three types are non-iterative, whereas the last two are iterative. We fix
all the non-iterative forces during the floorplan optimization process,
whereas the iterative forces are updated based on the previous
iterations.
[Figure 3: Illustration of various forces optimized in our floorplanner. Two panels (non-iterative forces, iterative forces) show modules 1 to 3 placed among power pin regions 1 to 3; the legend lists net force, center force, pin force, density force, and correlation force.]
In order to balance the impact of the five types of forces,
we optimize the following combined force:
\[ F_{tot} = \lambda \cdot F_{net} + \theta \cdot F_{cen} + \mu \cdot F_{cor} + K \cdot F_{den} + \rho \cdot F_{pin} \]
where $\lambda$, $\theta$, $\mu$, $K$, and $\rho$ are weighting constants.³
5.2.2 Force Equations
Let $n$ be the number of free modules in the floorplan and $(x_i, y_i)$
be the x- and y-coordinates of the center of module $i$.
A placement can be described by the $2n$-dimensional vector
$p = (x_1, \ldots, x_n, y_1, \ldots, y_n)^T$. The cost of a connection is then formulated
such that it is proportional to the squared Euclidean distance
between its endpoints. The objective function sums the cost of all
connections and therefore can be written in matrix notation as
\[ \frac{1}{2} p^T C p + d^T p + \text{const} \tag{3} \]
where the $2n \times 2n$ symmetric matrix $C$ and the vector $d$ are produced
from the module connections and their weights and the formula
for squared Euclidean distance. For example, the x-part of
the connection between two free modules $i$ and $j$ is
$(x_i - x_j)^2 = x_i^2 - 2 x_i x_j + x_j^2$. The first term adds to $C_{i,i}$, the second term to
$C_{i,j}$ and $C_{j,i}$, and the third term to $C_{j,j}$. Similarly, for a fixed connection
between free module $i$ and fixed location $f$,
$(x_i - x_f)^2 = x_i^2 - 2 x_i x_f + x_f^2$ adds the first term to $C_{i,i}$, the second term to $d_i$,
and the third term to the constant part of Equation (3). This cost
function is minimized by solving the linear equation system
\[ Cp + d = 0 \tag{4} \]
This formulation is equivalent to modeling connections as springs
and calculating the state of equilibrium.
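The spring analogy can be made concrete with a small one-dimensional sketch: two free modules connected in a chain between fixed pins at x = 0 and x = 10. The helper names `assemble` and `solve_linear` are hypothetical, and the sketch assumes each connection of weight w contributes w to the diagonal entries and -w to the off-diagonal entries of C (constant factors absorbed), so that solving Cp + d = 0 yields the equilibrium directly.

```python
def assemble(n, free_edges, fixed_edges):
    """Build C and d for n free modules (one coordinate axis).

    free_edges:  list of (i, j, w) springs between free modules i and j.
    fixed_edges: list of (i, x_fixed, w) springs from module i to a fixed point.
    """
    C = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for i, j, w in free_edges:
        C[i][i] += w
        C[j][j] += w
        C[i][j] -= w
        C[j][i] -= w
    for i, x_fixed, w in fixed_edges:
        C[i][i] += w
        d[i] -= w * x_fixed
    return C, d


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


# pin(0) -- module 0 -- module 1 -- pin(10), all springs with unit weight.
C, d = assemble(2, free_edges=[(0, 1, 1.0)], fixed_edges=[(0, 0.0, 1.0), (1, 10.0, 1.0)])
x = solve_linear(C, [-v for v in d])  # Cp + d = 0  =>  Cp = -d
```

The two modules settle at one third and two thirds of the span, i.e. evenly spaced, which is the expected equilibrium of three identical springs in series.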
Force-directed floorplanning and placement algorithms are well
known for their overlap problems. Spreading or repulsive forces are
required to make the final solution feasible, i.e. with zero overlap.
These additional forces extend Equation (4) with the force vector e
to model constant additional forces which are iteratively updated:
Cp +
d + e =0 (5)
The complexity of solving this equation is O(k · n
2
),wherek is
the number of iterations, and n is the number of modules. Our
experiments show that k ranges from 1 to 10 and n is around 20.
Thus, our algorithm generates optimized solutions quickly.
³ Our empirical choice of these values is to set them all equal. We
fix these weights constant during the entire floorplan optimization
process. One can tune the weights statically or dynamically to emphasize
desired objectives.
We compute the pin capacity force as follows. The “current
drawing region” of a pin is defined as a rectangle centered on that
pin with width and height equal to the distance between pins.
Let $c_i$ be the power
consumption of module $i$ located within the current drawing region
of power pin $j$, $I_j$ be the capacity of power pin $j$, $(x_i, y_i)$ be the
center of module $i$, $(x_j, y_j)$ be the location of pin $j$, and $d_{i,j}$ be the
squared Euclidean distance between module $i$ and pin $j$. Let $\alpha_i$ be
the self switching weight of module $i$ defined in Section 5.1.1. The
x direction force between free module $i$ and fixed pin $j$ is then
\[ F^{x}_{pin}(i,j) = \left[ \left( \frac{I_j}{\sum_i c_i} \right)^2 - 1 \right] \cdot \frac{|x_i - x_j|}{d_{i,j}} \cdot \alpha_i \tag{6} \]
A similar definition follows for the force along the y direction. This
force is proportional to the distance between the module and the
pin, negative if the sum of the current being drawn from the modules
in the current drawing region of the pin is greater than the capacity
of the pin and positive otherwise, and in the range $(-1, \infty)$.
Basically if the demand of block i is higher than the capacity of the
pin j, then the force pushes the modules away; otherwise, it pulls
the modules towards the pin.
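A direct transcription of Equation 6 is sketched below to show the sign behavior. The function name `pin_force_x`, its argument layout, and the zero-guard for an empty region or a module sitting exactly on the pin are assumptions of this sketch.

```python
def pin_force_x(pin_capacity, region_demands, module_xy, pin_xy, alpha):
    """x-direction pin capacity force on one module (Equation 6).

    pin_capacity:   I_j, capacity of power pin j.
    region_demands: c_i values of all modules in the pin's current drawing
                    region (including the module being evaluated).
    module_xy, pin_xy: (x, y) centers of the module and the pin.
    alpha:          self switching weight of the module.
    """
    total_demand = sum(region_demands)
    dx = abs(module_xy[0] - pin_xy[0])
    dist_sq = (module_xy[0] - pin_xy[0]) ** 2 + (module_xy[1] - pin_xy[1]) ** 2
    if total_demand == 0 or dist_sq == 0:
        return 0.0  # guard added for the sketch only; Equation 6 is undefined here
    load_term = (pin_capacity / total_demand) ** 2 - 1.0  # lies in (-1, inf)
    return load_term * dx / dist_sq * alpha


# Overloaded pin (demand 4 > capacity 2): negative force, module is pushed away.
f_over = pin_force_x(2.0, [1.0, 3.0], (3.0, 4.0), (0.0, 0.0), alpha=2.0)
# Underloaded pin (demand 4 < capacity 8): positive force, module is pulled in.
f_under = pin_force_x(8.0, [1.0, 3.0], (3.0, 4.0), (0.0, 0.0), alpha=2.0)
```

Because the result is scaled by alpha, a module with a larger self switching weight reacts more strongly to the same pin load.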
5.2.3 Updating Iterative Forces
As mentioned previously, we update two kinds of forces during
each iteration: $F_{den}$ and $F_{pin}$. Specifically, we first obtain the
location of the modules from the previous iteration and use them
to recompute the density of each region in the floorplan and the attractive
or repulsive forces among the modules within a vicinity of
each power pin. The main motivation for this force update is to
satisfy the non-overlap constraint (via updating $F_{den}$) and the pin capacity
constraint (via updating $F_{pin}$). In case these constraints are
not met in the current solution, we try to minimize the amount of
violation as much as possible by attempting another iteration. We
note that the pin constraint is easily satisfied, but not the overlap
constraint. Thus, our post-process explicitly removes the overlap
among the modules. Since $e$ consists of $F_{den}$ and $F_{pin}$, $e$ gets
updated and solved in each iteration.
5.2.4 Legalization
A simple heuristic is used to legalize the floorplan of the modules.
Vertical and horizontal constraint graphs similar to those used
for Sequence Pair [16] are created based on the floorplan solution. The basic
idea is to derive the relative positions among the modules based
on the force-directed floorplanning, and use Sequence Pair [16] to
encode them to remove overlap. For each pair of modules, the hor-
izontal and vertical distance between their centers is compared. If
the horizontal distance is smaller than the vertical distance then
the appropriate constraint is added to the vertical constraint graph.
Conversely, if the vertical distance is less, the appropriate constraint
is added to the horizontal constraint graph. If the modules overlap,
then these constraints will push the modules apart in the direction
that minimizes overall movement. Thus, the legalized modules re-
main close to their original locations. The constraint graphs ensure
that the final floorplan is non-slicing and non-overlapping.
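The pairwise rule for building the two constraint graphs can be sketched as below. This covers only the edge-selection step, not the full Sequence Pair encoding or the final compaction; the function name `pair_constraints`, the edge orientation, and the tie-breaking choice (ties go to the horizontal graph) are assumptions of the sketch.

```python
def pair_constraints(centers):
    """Derive constraint-graph edges from module centers.

    centers: dict mapping module name -> (x, y) center from the force-directed solution.
    Returns (horizontal, vertical): an edge (a, b) in horizontal means
    "a is left of b"; an edge (a, b) in vertical means "a is below b".
    """
    names = sorted(centers)
    horizontal, vertical = [], []
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            (ax, ay), (bx, by) = centers[a], centers[b]
            if abs(ax - bx) < abs(ay - by):
                # Modules are closer horizontally: separate them vertically.
                vertical.append((a, b) if ay <= by else (b, a))
            else:
                # Modules are closer vertically (or tied): separate them horizontally.
                horizontal.append((a, b) if ax <= bx else (b, a))
    return horizontal, vertical


centers = {"A": (0.0, 0.0), "B": (5.0, 1.0), "C": (1.0, 6.0)}
horizontal, vertical = pair_constraints(centers)
```

Separating each pair along the axis where the centers are already farther apart is what keeps the legalized modules close to their force-directed positions.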
6. POWER NETWORK ANALYSIS
To evaluate the effectiveness of the two heuristics that were used
to guide noise-aware floorplanning, we use a SPICE model of the
on-chip power delivery network. We evaluate the benefits of our
technique under the worst-case current consumption scenario. The
worst-case switching activity of an application is determined by
sampling microarchitectural activity of all modules over the duration
of the simulation. By comparing module activity during different
program phases, we can determine the period where the highest
module switching occurs. Once the worst-case phase is identified,
the current profile of each module is generated from the microarchi-
tectural simulator. This complex current waveform is used as input
in the SPICE module as piece-wise linear source (PWL) input. By
incorporating per-cycle current consumption profile obtained from
our microarchitecture simulation, we are able to observe induced
noise effects as a direct function of the application’s behavior.
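
As a rough illustration of this flow, the sketch below picks the window of samples with the highest total activity and formats a per-cycle current profile as a SPICE PWL current source. The function names, the node names, the use of a summed-activity window as the definition of the worst-case phase, and the one-point-per-cycle waveform are assumptions made for the example; the paper does not specify them.

```python
def worst_case_window(activity, window):
    """Start index of the `window`-sample span with the largest total activity."""
    if window <= 0 or window > len(activity):
        raise ValueError("window must be between 1 and len(activity)")
    current = sum(activity[:window])
    best_sum, best_start = current, 0
    for start in range(1, len(activity) - window + 1):
        # Slide the window by one sample: add the new one, drop the oldest.
        current += activity[start + window - 1] - activity[start - 1]
        if current > best_sum:
            best_sum, best_start = current, start
    return best_start


def pwl_current_source(name, node_pos, node_neg, amps_per_cycle, freq_hz=5e9):
    """Format one module's per-cycle current (in amperes) as a PWL source line."""
    period = 1.0 / freq_hz  # 200 ps at the 5 GHz clock used in the study
    points = " ".join(
        f"{cycle * period:.4e} {amps:.4e}"
        for cycle, amps in enumerate(amps_per_cycle)
    )
    return f"I{name} {node_pos} {node_neg} PWL({points})"
```

For example, `pwl_current_source("alu0", "vdd_alu0", "0", [0.1, 0.5])` yields a source line with one time/current pair per clock cycle, which could then be pasted into the netlist of the power delivery model.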
Based on the power supply noise of each module, we also calculate
the amount of decap required for the floorplan [26]. If $V_{noise}$
is the noise of a given module, $V_{limit}$ is the noise margin, and $Q$ is the
amount of charge drawn by the module, then the amount of decap
required ($C$) can be estimated according to the following:

$$\theta = \max\left(1, \frac{V_{noise}}{V_{limit}}\right) \tag{7}$$

$$C = \left(1 - \frac{1}{\theta}\right) \frac{Q}{V_{limit}} \tag{8}$$
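
Equations (7) and (8) translate directly into code. The sketch below is a minimal illustration; the function name and the numbers in the usage note are hypothetical and are not taken from the paper's experiments.

```python
def decap_required(v_noise, v_limit, charge):
    """Estimate decap (farads) from Eq. (7) and Eq. (8).

    v_noise: supply noise seen by the module (volts)
    v_limit: allowed noise margin (volts)
    charge:  charge drawn by the module (coulombs)
    """
    if v_limit <= 0:
        raise ValueError("v_limit must be positive")
    theta = max(1.0, v_noise / v_limit)            # Eq. (7)
    return (1.0 - 1.0 / theta) * charge / v_limit  # Eq. (8)
```

With an assumed noise of 0.2 V, a limit of 0.1 V, and 1 nC of charge, theta is 2 and the estimate is about 5 nF. A module whose noise is already within the limit gets theta = 1 and therefore needs no additional decap.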
7. QUANTITATIVE ANALYSIS
We used SimpleScalar 3.0 and Wattch for microarchitectural profiling
and simulation. We incorporated extensions to generate both
self and correlated module switching weights to be used in our
floorplanner. The power and current consumption figures were based on a 5
GHz processor in a 70nm process. Nine integer programs from
the SPEC2000 benchmark suite were used in this study. Each simulation
was fast-forwarded by 4 billion instructions and then simulated for
100 million instructions.
         LSQ   RUU   BTB   L2$   IRF  L1D$  ALU0  ALU1  ALU2  ALU3  ALU4  ALU5  L1I$ Bpred  DTLB  ITLB FALU0 FALU1  Freg
LSQ       28     0    20    13    20     2    10    10    10    10    10    10    11    20     0    11    10    10    12
RUU        0    26     8     4    13     2     0     0     0     0     0     0     5     8     2     5     0     0     5
BTB       20     8    18     7    29    17    13    13    13    13    13    13    37   100    17    37    13    13    13
L2$       13     4     7    16    14    28    12    12    12    12    12    12    21     7    26    21     4     4     7
IRF       20    13    29    14    10    17     7     7     7     7     7     7    23    29    17    23     8     8    24
L1D$       2     2    17    28    17     7     6     6     6     6     6     6    11    17    93    11     5     5     6
ALU0      10     0    13    12     7     6     3   100   100   100   100   100    15    13     6    15    66    66     4
ALU1      10     0    13    12     7     6   100     3   100   100   100   100    15    13     6    15    66    66     4
ALU2      10     0    13    12     7     6   100   100     3   100   100   100    15    13     6    15    66    66     4
ALU3      10     0    13    12     7     6   100   100   100     3   100   100    15    13     6    15    66    66     4
ALU4      10     0    13    12     7     6   100   100   100   100     3   100    15    13     6    15    66    66     4
ALU5      10     0    13    12     7     6   100   100   100   100   100     3    15    13     6    15    66    66     4
L1I$      11     5    37    21    23    11    15    15    15    15    15    15     3    37    12   100    11    11     5
Bpred     20     8   100     7    29    17    13    13    13    13    13    13    37     3    17    37    13    13    13
DTLB       0     2    17    26    17    93     6     6     6     6     6     6    12    17     2    12     5     5     6
ITLB      11     5    37    21    23    11    15    15    15    15    15    15   100    37    12     1    11    11     5
FALU0     10     0    13     4     8     5    66    66    66    66    66    66    11    13     5    11     1   100     5
FALU1     10     0    13     4     8     5    66    66    66    66    66    66    11    13     5    11   100     1     5
Freg      12     5    13     7    24     6     4     4     4     4     4     4     5    13     6     5     5     5     0
Figure 4: Self and Correlated Switching Weights of All Modules
7.1 Self and Correlated Switching Weights
Figure 4 shows both the average self-switching weight and the
correlated weight of all modules in a symmetric matrix. The
main diagonal of the matrix holds the self-switching weight
of each module, and all the remaining entries hold the correlated
switching weights (footnote 4). The rows and columns are sorted in
descending order of self-switching weight from left to right.
A higher self-switching weight indicates higher susceptibility to
di/dt problems. As shown, the load/store queue (LSQ) and register
update unit (RUU) carry more weight than the other
modules. On the other hand, modules that are
likely to be accessed every cycle (and are therefore mostly turned on), such as the L1
I-Cache and the I-TLB, have lower weights. Modules that are mostly dormant
and accessed only once in a long while, e.g. the FPU register file, also have
lower weights (footnote 5).
The correlation weights are used to place modules that switch
simultaneously away from each other in the floorplan. As expected,
the branch predictor and BTB, the I-Cache and I-TLB, and the D-Cache and
D-TLB are all highly correlated pairs. In addition, the first six ALU
modules are also highly correlated with one another,
because concurrency exists among integer instructions. These modules will
be directed away from each other to lessen the inductive noise by
removing clustering of modules.
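
To make the reading of Figure 4 concrete, the sketch below loads the six-module sub-matrix for the BTB, L1 caches, branch predictor, and TLBs (values copied from Figure 4), checks that it is symmetric, and lists the pairs whose correlated weight reaches a threshold. The threshold of 90 and the function names are assumptions chosen for the example; the paper does not define a cutoff for "highly correlated".

```python
# Sub-matrix of Figure 4; the diagonal holds the self-switching weights.
NAMES = ["BTB", "L1D$", "L1I$", "Bpred", "DTLB", "ITLB"]
WEIGHTS = [
    [18, 17, 37, 100, 17, 37],   # BTB
    [17, 7, 11, 17, 93, 11],     # L1D$
    [37, 11, 3, 37, 12, 100],    # L1I$
    [100, 17, 37, 3, 17, 37],    # Bpred
    [17, 93, 12, 17, 2, 12],     # DTLB
    [37, 11, 100, 37, 12, 1],    # ITLB
]


def is_symmetric(weights):
    """True when weights[i][j] == weights[j][i] for every pair."""
    n = len(weights)
    return all(weights[i][j] == weights[j][i]
               for i in range(n) for j in range(i + 1, n))


def correlated_pairs(names, weights, threshold):
    """Off-diagonal pairs with weight >= threshold, strongest first."""
    pairs = [
        (names[i], names[j], weights[i][j])
        for i in range(len(names))
        for j in range(i + 1, len(names))
        if weights[i][j] >= threshold
    ]
    return sorted(pairs, key=lambda pair: -pair[2])
```

Calling `correlated_pairs(NAMES, WEIGHTS, 90)` returns the three pairs named in the text: branch predictor and BTB, I-Cache and I-TLB, and D-Cache and D-TLB.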
7.2 Power Supply Noise Analysis
The noise-aware floorplanning algorithm used both microarchitectural
metrics to guide module placement. In order to demonstrate
the noise tolerance of the force-directed floorplan, we compare our
noise-aware floorplan to a baseline floorplan that minimizes total
wirelength. In our noise analysis, we assumed a $V_{dd}$ of 1 volt
(for 70nm) and a maximum allowed noise margin of 10%. To
illustrate the noise analysis in more detail, we depict the worst-case
noise for each module using gzip, a compute-bound program,
in Figure 5. Note that this graph is sorted from left
to right in decreasing order of module self-switching activity.
As shown, the noise-aware floorplan significantly suppresses the
noise experienced by modules with high switching activity as well
as high current consumption. Almost all the ALUs, which exhibit a fair
amount of switching activity and extremely high correlation with
each other, show significant voltage noise reductions. For the integer
register file (iregfile), the voltage noise was reduced by 81.7%
Footnote 4: Note that this matrix shows both self and correlated switches,
which is why the diagonal is non-zero.
Footnote 5: Although we profiled only SPECint2000, there are certain benchmarks
(e.g. data compression) that use floats and doubles.
[Figure 5: Power Supply Noise at Modules for gzip. Bar chart of voltage swing (V), on an axis from 0 to 0.5, for the wire-length and noise-aware floorplans. Modules appear in descending order of self-switching factor: lsq, ruu, btb, dcache2, iregfile, dcache, alu0 to alu5, icache, bpred, dtlb, itlb.]
[Figure 6: Noise Tolerance for gzip. (a) wirelength-driven floorplan; (b) noise-aware floorplan. A darker module has higher noise.]
in the noise-aware floorplan. We do observe that the L1 data cache
(dcache) and ALU0 have a higher noise violation in the noise-aware
floorplan than in the baseline, although the L1 data cache has a high
self-switching factor. This is because other units, especially
the remaining ALUs, have a higher priority when it comes
to being directed towards power pins because of their strong
correlation. The L1 D-Cache does not exhibit a high correlation with
other units and is hence less important than other modules that
have a higher potential for noise margin violations. Nonetheless,
it should also be noted that the increased violations in the noise-aware
floorplan are only slightly above the allowed 10% margin, making
the overall solution much more noise tolerant.
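
The margin check and the reduction percentages quoted above amount to simple arithmetic, sketched below. The function names are hypothetical, the defaults mirror the stated 1 V supply and 10% margin, and the swing values in the usage note are made up for illustration rather than read from Figure 5.

```python
def noise_limit(vdd=1.0, margin=0.10):
    """Allowed voltage swing in volts: 10% of a 1 V supply is 0.1 V."""
    return vdd * margin


def violates_margin(voltage_swing, vdd=1.0, margin=0.10):
    """True when a module's voltage swing exceeds the allowed noise margin."""
    return voltage_swing > noise_limit(vdd, margin)


def percent_reduction(baseline_swing, noise_aware_swing):
    """Relative noise reduction of the noise-aware floorplan, in percent."""
    if baseline_swing <= 0:
        raise ValueError("baseline_swing must be positive")
    return 100.0 * (baseline_swing - noise_aware_swing) / baseline_swing
```

For instance, a hypothetical module whose swing drops from 0.20 V to 0.05 V goes from violating the margin to meeting it, a 75% reduction.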
7.3 Floorplan and Decap Requirement
We now present the baseline wire-length driven floorplan and the
noise-aware one in Figure 6. The color code in each module represents
the degree of noise tolerance. The cross (+) in the figure represents
the location of the power pins. The area of the wire-length
driven floorplan is 69.35 mm² with a total wirelength of 804.86
[Figure 7: Noise Margin Violation. Bar chart of noise violation occurrences, on an axis from 0 to 0.3, for bzip, crafty, eon, gap, gzip, mcf, perl, twolf, and their average, comparing the wire-length and noise-aware floorplans.]
References

Wattch: a framework for architectural-level power analysis and optimizations (proceedings article)
TL;DR: Wattch is presented, a framework for analyzing and optimizing microprocessor power dissipation at the architecture level. It opens up the field of power-efficient computing to a wider range of researchers by providing a power evaluation methodology within the portable and familiar SimpleScalar framework.

Generic global placement and floorplanning (proceedings article)
TL;DR: The algorithm is capable of addressing the problems of global placement, floorplanning, timing minimization and interaction to logic synthesis, and its iterative nature assures that timing requirements are precisely met.

Optimal orientations of cells in slicing floorplan designs (journal article)
TL;DR: A methodology of VLSI layout described by several authors first determines the relative positions of indivisible pieces, called cells, on the chip; orientation optimization for more general layouts is shown to be NP-complete (in the strong sense).

Decoupling capacitance allocation and its application to power-supply noise-aware floorplanning (journal article)
TL;DR: Compared to the post-floorplan approach, the peak power-supply noise can be reduced by as much as 40% and the decap budget can be reduced by as much as 21% by using the noise-aware floorplanning methodology.

Design and implementation of the POWER5 microprocessor (proceedings article)
TL;DR: POWER5 offers significantly increased performance over previous POWER designs by incorporating simultaneous multithreading, an enhanced memory subsystem, and extensive RAS and power management support.