How can the authors ensure that the average case is more noise tolerant?

By coupling noise-aware floorplanning with dynamic di/dt control, the authors can guarantee that their floorplan for the average case is more noise tolerant.

What is the effect of di/dt on the processor?

In addition, with smaller devices and lower supply voltage, processors will become less tolerant to inductive noise induced by abrupt current fluctuation (di/dt).

What is the purpose of noise-tolerant floorplans?

By gathering switching correlation and characterizing dynamic current demands for target applications, the authors can provide essential metrics that can be used in the floorplanning process to generate a noise-aware floorplan aimed for average-case current consumption and switching activity.

What is the way to address the worst-case noise?

The choice to address the worst-case current consumption is dependent on the designer, whereby noise can be addressed by decap alone, or by the incorporation of dynamic di/dt control.

(Open Access) Noise-Direct: A Technique for Power Supply Noise Aware Floorplanning Using Microarchitecture Profiling (2007) | Fayez Mohamood

Q: What is the metric used to determine the amount of correlation between modules?

The first metric involves measuring module activity over the duration of a benchmark and assigning weights to modules that is proportional to the relative number of switches and the intensity of the switch.

Q: What is the correlation factor for the gating of modules?

Since switching characteristics of modules vary from each other, the correlation factors have to be determined in a manner that ensures fairness.

Q: What is the way to address the worst case inductive noise effects?

At the microarchitectural level, some techniques were proposed to address the worst case inductive noise effects due to applying power saving techniques [23, 21, 22, 19, 20, 10].

Noise-Direct: A Technique for Power Supply Noise Aware Floorplanning Using Microarchitecture Profiling

Fayez Mohamood, Michael B. Healy, Sung Kyu Lim, Hsien-Hsin S. Lee

School of Electrical and Computer Engineering

Georgia Institute of Technology

Atlanta, GA 30332

{fayez, mbhealy, limsk, leehs}@ece.gatech.edu

ABSTRACT

This paper proposes Noise-Direct, a design methodology for power

integrity aware ﬂoorplanning, using microarchitectural feedback

to guide module placement. Stringent power constraints have led

microprocessor designers to incorporate aggressive power saving

techniques such as clock-gating, that place a signiﬁcant burden on

the power delivery network. While the application of extensive

clock-gating can effectively reduce power consumption, unfortu-

nately, it can also induce large inductive noise (di/dt), resulting

in signal integrity and reliability issues. To combat these prob-

lems, processors are usually designed for the worst-case current

consumption scenario using adequate supply voltage and decou-

pling capacitances.

To tackle high-frequency inductive noise and potential IR drops,

we propose a novel design methodology that integrates microar-

chitectural proﬁling feedback into the ﬂoorplanning process. We

present two microarchitectural metrics to quantify the noise susceptibility of a module: self weighting and correlation weighting.

By using these metrics in a force-directed ﬂoorplanning algorithm

to assign power pin afﬁnity to modules, we can quickly converge

to a design for average-case current consumption. By designing

for the average-case and employing dynamic di/dt control for the

worst-case, we can ensure that a chip is noise-tolerant without ex-

ceeding decap budget constraints. Our observations showed that

certain functional modules in a processor exhibit consistent and

highly correlated switching activity, that can be used to guide mod-

ule placement distance from power pins. The experimental results

demonstrate that the force-directed ﬂoorplanning technique can ef-

fectively suppress supply noise experienced by modules, reduce the

total number of supply-noise margin violations, and achieve a ﬂoor-

plan with considerably lower IR drop, as compared to a wire-length

driven floorplan.

1. INTRODUCTION

Power efﬁciency is the ﬁrst-order physical constraint in modern

day processor design. The excessive power demand has led to the

use of aggressive techniques such as dynamic voltage/frequency

scaling, clock or power gating, etc. Although techniques like clock-

gating can dramatically reduce dynamic power consumption for

idle modules, they also exacerbate inductive noise (di/dt) and IR drops on the power delivery network. As a result, processor designers have to account for worst-case inductive noise, typically using

an ultra-low impedance power supply network. In order to meet

the impedance target across a wide range of frequencies, multi-

stage decoupling capacitors are necessary. High-frequency noise

is handled by on-die decaps distributed across the die while low-

frequency noise is handled by package level decap.

Alternatively,

designers also incorporate ﬁne-grained clock-gating domains whereby

modules are clock-gated in an incremental fashion in order to min-

imize abrupt current surges [6, 12]. Note that both techniques

are centered around the philosophy of designing the chip based on

worst-case switching activity.

Note that this work focuses on the high-frequency di/dt issue.

Figure 1: Noise-Direct Design Methodology Overview. The flow has three phases. Microarchitectural Profiling: representative benchmarks are input to the microarchitecture/current profiler; high di/dt modules are identified using the SimpleScalar performance simulator and the Wattch power analyser; self-switching weights are assigned to all modules and correlated-switching weights to all module pairs. Noise-aware Floorplanning: force-directed floorplanning guides high di/dt modules to appropriate power pin locations. Noise Analysis: power supply noise analysis uses the current profile of each module for the benchmarks, followed by design evaluation for noise violation frequency and decap budget.

With low supply voltages and high power consumption in newer

generations of processors, the worst-case design strategy becomes

highly inefﬁcient. Increasing amounts of decap will consume chip

area and lead to excessive leakage current. Static control for di/dt in

the form of fine-grained clock gating will cause performance degradation since modules cannot be gated on quickly. To overcome these issues and avoid designing for the worst-case inductive noise,

we propose a design methodology, Noise-Direct, that integrates mi-

croarchitectural proﬁling feedback into the ﬂoorplanning process.

The basic idea involves the identiﬁcation of correlated modules that

are highly likely to cause power supply noise violations and to use

such information to guide module placement. An overview of the

design flow is illustrated in Figure 1. There are three phases: microarchitectural profiling, noise-aware floorplanning, and power supply noise analysis. This paper makes the following contributions:

• We introduce two metrics called self switching weight and cor-

related switching weight for identifying modules that are highly

likely to cause large di/dt.

• We present a force-directed ﬂoorplanning algorithm that incor-

porates microarchitectural feedback for module placement. It

ensures a design for the average-case along with dynamic con-

trol at the microarchitectural level to account for the worst-case

current scenario.

• To evaluate the effectiveness of our noise-aware ﬂoorplan, we

apply a SPICE model of an on-chip power delivery network.

Based on the model, we present the maximal voltage swing at

each module and the overall noise tolerance of the chip.

Current design methodologies consider inductive noise issues in

the power supply network as an afterthought. In contrast, we ad-

dress this issue early in the architectural planning phase, thereby

8B-2

786

reducing decap requirements and design complexity. By ﬂoorplan-

ning for the average case using the techniques we propose with

dynamic di/dt control schemes [15] to account for the worst case,

we can ensure a design that is far more resistant to inductive noise

than a purely wirelength driven ﬂoorplan.

The rest of the paper is organized as follows. Section 2 out-

lines the motivation. Related works are discussed in Section 3, fol-

lowed by a design space analysis in Section 4. Section 5 describes

Noise-Direct. Section 6 presents the evaluation methodology and Section 7 shows experimental results. Finally, Section 8 concludes.

2. PRELIMINARIES

Power delivery noise is a growing concern and a major issue for processor designers. One reason is the increasing current consumption of newer chips. In addition, as devices shrink, the supply voltage is also reduced to meet

gate-oxide reliability requirement. Although the lowered voltage

offsets the current consumption to some extent, it also results in a

lower noise margin. Increasing current consumption and switch-

ing activity, coupled with lower noise margins, means that design-

ers have to meet stringent noise constraints by accounting for the

worst-case current scenario. This is typically done by using differ-

ent types of decoupling capacitors and making an extremely low

impedance path from the power supply to the chip. This procedure,

undoubtedly, is not very effective, in terms of cost or complex-

ity [1].

To mitigate dynamic power, processor vendors employ (aggres-

sive) clock-gating on their chips. Clock-gating not only reduces dy-

namic power and heat dissipation, but also can save leakage power

due to the temperature drop. However, simultaneous gating of large

modules in the chip can lead to excessive inductive noise in the

power supply. Typically, this issue is dealt with by deploying both off-chip and on-chip decaps [26, 18], which increases chip area and can result in excessive leakage current. Alternatively, certain commercial processors also employ fine-grained gating domains to prevent large modules from being gated on or off too

quickly [6, 12]. Note that ﬁne-grained gating domains increase

the design complexity and lead to performance loss due to the fact

that modules cannot be gated on immediately. The inefﬁcacy of

this technique lies in the fact that this design is aimed at the infre-

quent worst-case current consumption scenarios. This technique

also requires complex modeling of the power delivery network for

ensuring that all supply noise constraints are met. To minimize the

effort of post-design optimization for power supply noise in future

processors that have higher functionality and a lower noise margin,

alternative design methodologies need to be sought.

3. RELATED WORK

Power supply noise aware floorplanning has been studied in the past [4, 5, 14, 26]. The central idea in this arena involves two concepts:

the ﬁrst one involves creating a low impedance path to the chip,

and the second involves optimizing on-chip decap placement and

allocation to suppress inductive noise effects. At the microarchi-

tectural level, some techniques were proposed to address the worst

case inductive noise effects due to applying power saving tech-

niques [23, 21, 22, 19, 20, 10]. These techniques typically employ

certain types of dynamic control mechanisms that can estimate the

incoming current surges and subsequently throttle processor activ-

ity, thereby avoiding noise margin violation.

In contrast to the prior art, we are advocating a methodology that

takes inductive noise issues into account early in the architecture

planning phase of a design. By analyzing microarchitectural be-

havior of real workloads, we exploit module placement in the ﬂoor-

planning process to create a design that is inherently more tolerant

to inductive noise than a conventional wire-length driven floorplan.

4. DESIGN SPACE ANALYSIS

The main focus of this work is to target a ﬂoorplan for the average-

case current consumption scenario. Based on actual proﬁling re-

sults of dynamic module switching activity, our ﬂoorplan can be

inherently more noise tolerant. Nonetheless, every design still has

to guarantee the reliability by considering the worst-case scenario

even though it might be rather infrequent.

The option with respect to how the worst-case scenario is ad-

dressed depends on the designers. A traditional solution could

involve deployment of sufﬁcient decoupling capacitance to mini-

mize inductive noise. In order to reduce the increasingly growing

size of decaps on a processor, Noise-Direct is aimed to reduce de-

cap requirements by analyzing the switching correlation between

microarchitecture modules and placing each module based on the

average case di/dt distribution. However, in cases when the cur-

rent threshold is exceeded, a dynamic di/dt control mechanism at

the microarchitecture level is still needed to handle the potential

noise emergency in addition to our noise-tolerant floorplan. It is achieved by dynamically throttling a processor’s activity [10, 15, 23] at the potential cost of performance degradation. By coupling

noise-aware ﬂoorplanning with dynamic di/dt control, we can guar-

antee that our ﬂoorplan for the average case is more noise tolerant.

Compared to prior art, Noise-Direct can both reduce the total de-

cap requirement on an overly conservative chip design and avoid

performance throttling by only invoking dynamic di/dt control for

the unlikely worst case scenarios.

5. NOISE-DIRECT METHODOLOGY

Our Noise-Direct design methodology consists of two primary

phases: (1) microarchitectural proﬁling and (2) Floorplanning. The

following sub-sections detail the entire procedure.

5.1 Microarchitectural Proﬁling

The dynamic power consumption of a processor is correlated to

the characteristics of running programs. To proﬁle current con-

sumption and module activity, cycle-level architecture simulators

such as SimpleScalar can be used. Microarchitecture-level power simulation [2], incorporated inside a cycle-level simulator, can be

easily extended to quantify current consumption for each module

on a per-cycle basis. This method provides a good understanding

of current demands (di) during each clock period (dt) and identifies

modules that are likely culprits of inducing high di/dt noise.

Recently, researchers [10, 15] advocated incorporating dynamic

di/dt control at the microarchitectural level to avoid excessive volt-

age ringing in the power supply. By including current calcula-

tion into microarchitectural simulations, these techniques analyzed

benchmark behavior and used it to guide the dynamic di/dt control.

Along a similar line, our methodology incorporates a ﬁne-grained

current and switching activity proﬁling by the cycle-level simula-

tor to guide our noise-aware ﬂoorplanner. As described earlier, ex-

cessive and/or simultaneous gating of microarchitectural modules

can lead to reliability issues caused by inductive noise. Our mi-

croarchitectural proﬁling involves quantifying switching activity of

modules under ideal clock-gating. By gathering switching corre-

lation and characterizing dynamic current demands for target ap-

plications, we can provide essential metrics that can be used in the

ﬂoorplanning process to generate a noise-aware ﬂoorplan aimed for

average-case current consumption and switching activity.

To identify problematic (high switching activity) modules and

perform di/dt aware power pin assignments to them, we use two

metrics. The first metric involves measuring module activity over the duration of a benchmark and assigning weights to modules that are proportional to the relative number of switches and the intensity of the switch. The second metric involves identifying the amount of correlation between each pair of modules in the microprocessor, in terms of simultaneous on/off gating. A detailed description of these metrics follows.

5.1.1 Self Switching Weight Assignment

Self switching measurement is used to quantify the number of

gating occurrences in the processor for a benchmark during the

proﬁling period. Both gating on and off are considered as likely

events to cause di/dt fluctuation.

Note that we are not proposing any new type of dynamic di/dt control, which is outside the scope of this work. Rather, we are advocating a complementary design methodology that inherently tries to achieve a static design that is noise-tolerant for the average-case current consumption. The core of Noise-Direct is noise-tolerant floorplanning. For dynamic di/dt control in processors, interested readers can refer to [10, 15, 23].

The objective of this metric is to isolate the microarchitectural modules with high switching activity.

For example, certain modules such as the I-Cache are likely to be

needed almost every cycle and hence will not be gated on/off very

often unless the instruction fetch is stalled due to misses. Such

modules will not be considered as major offenders of inductive

noise. This factor is captured and collected in the switching results generated by our extended cycle-level simulator. In addition, the intensity of the gating activity also depends on the current consumption of each module. The normalized switching activity factor and the current consumption per cycle, called the intensity of the switch, are combined into a single weight α, represented by the following equation. If $sw_i$ represents the raw number of switching events for module $i$ and $I_i$ is the intensity of the switch, then the self-switching factor $\alpha_i$ is denoted by:

$$\alpha_i = sw_i \cdot I_i \qquad (1)$$

In essence, modules with larger weights indicate higher suscep-

tibility to functional failure due to higher inductive noise. This

heuristic is applied to the force directed ﬂoorplanning technique

discussed in Section 5.2.
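As a rough illustration of the self switching weight, the sketch below counts gating transitions in a per-cycle on/off trace and scales the count by a per-module intensity. The trace format, the function name `self_switching_weights`, and the sample numbers are assumptions made for illustration; they are not taken from the authors' extended simulator.

```python
def self_switching_weights(gating_trace, intensity):
    """Return a self switching weight per module (hypothetical helper).

    gating_trace: dict mapping module name -> list of 0/1 values, one per
                  cycle (1 = clocked, 0 = clock-gated). Assumed format.
    intensity:    dict mapping module name -> current drawn per cycle when
                  the module is on (the "intensity of the switch").
    """
    weights = {}
    for name, trace in gating_trace.items():
        # A switching event is any change of gating state between two
        # consecutive cycles; gating on and gating off both count.
        switches = sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)
        weights[name] = switches * intensity[name]
    return weights


if __name__ == "__main__":
    trace = {
        "icache": [1, 1, 1, 1, 0, 1],  # rarely gated
        "alu0": [0, 1, 0, 1, 0, 1],    # gated almost every cycle
    }
    current = {"icache": 2.0, "alu0": 1.5}
    print(self_switching_weights(trace, current))
```

In this toy trace the frequently gated ALU receives a larger weight than the I-Cache even though it draws less current, which matches the intent of isolating modules with high switching activity.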

5.1.2 Correlated Switching Weight Assignment

In addition to the absolute self switching magnitude, di/dt issues

due to clock-gating also arise from simultaneous gating of neigh-

boring modules. If two modules that switch simultaneously have

their least impedance paths to the same power pin, this will cause

larger inductive noise effects at the modules. Our second metric,

the correlated weight, accounts for the degree of gating correlation

between microarchitectural modules. The basic idea is to use this

heuristic to either place highly correlated modules away from each

other, or at least assign them to different power pins. For instance,

the I-Cache and the I-TLB are two units that are likely to be highly

correlated since they are almost always accessed simultaneously.

Simultaneous gating of these modules in the same direction (both

on or both off) at the same power pin is likely to induce a high di/dt

in the supply voltage.

To measure correlation, we capture the inter-cycle gating direc-

tion of each module in the proﬁling process. Then each module is

paired with every other module in the processor, and checked for si-

multaneous gating in the same direction. The result is a correlation

matrix with each location representing the number of simultaneous

gating events encountered.

Since switching characteristics of modules vary from each other,

the correlation factors have to be determined in a manner that en-

sures fairness. For instance, if module A and B switched only twice

throughout the execution, and if they happened to switch simulta-

neously only for one single occasion, this would indicate a corre-

lation of 50%. In contrast, if modules C and D switched 10 times

throughout the execution and happened to contain only 3 simulta-

neous switches, this means that they are correlated only 30% of the

time. Clearly, the latter case would be more susceptible to higher

inductive noise. We need to consider such occurrences prudently

in the correlation factor computation.

To ensure fairness, we begin with a correlation matrix that contains the raw numbers of correlated switches. We then normalize each row with respect to the single module assigned to that row, that is, by that module's number of self-switching events. For each pair of modules, we then average the two normalized weights, one from each module's row. The result is a symmetric correlation matrix whose weights capture the correlation relative to the switching of each module. The calculation of these relative correlated switching events is illustrated in Figure 2. In the matrix, $X_{i,j}$ is the number of raw correlated switches between modules $i$ and $j$ that occurred over the profiling duration and $sw_i$ is the number of self-switching events for module $i$.

The extent of correlation is proportional to the magnitude of the weight calculated above and to the average intensity of the gating event. Equation 2 represents the correlation weight $\gamma_{i,j}$ between two modules $i$ and $j$:

$$\gamma_{i,j} = \frac{1}{2}\left(\frac{X_{i,j}}{sw_i} + \frac{X_{i,j}}{sw_j}\right) \cdot \frac{I_i + I_j}{2} \qquad (2)$$

During power pin assignment, the modules with a high correlation weight will be placed farther apart from each other to alleviate the inductive noise caused by simultaneous gating. The correlation weights are also factored into the noise-aware floorplanning technique to be described next.

Figure 2: Correlated Switching Matrix
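One possible reading of the correlated switching weight is sketched below: same-direction gating events are counted for each module pair, normalized by each module's own switch count, averaged so that the weight is symmetric, and scaled by the mean intensity of the pair. The exact normalization, the trace format, and the names `gating_directions` and `correlation_weights` are assumptions based on the prose description, not the authors' code.

```python
def gating_directions(trace):
    """+1 when a module is gated on, -1 when gated off, 0 when unchanged."""
    return [cur - prev for prev, cur in zip(trace, trace[1:])]


def correlation_weights(traces, intensity):
    """Return {(a, b): weight} for every unordered module pair.

    traces:    dict module -> list of 0/1 gating states per cycle (assumed).
    intensity: dict module -> current drawn per cycle when on (assumed).
    """
    names = list(traces)
    dirs = {m: gating_directions(traces[m]) for m in names}
    sw = {m: sum(1 for d in dirs[m] if d != 0) for m in names}
    weights = {}
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            # Raw correlated switches: both modules gate in the same
            # direction during the same cycle boundary.
            x = sum(1 for da, db in zip(dirs[a], dirs[b])
                    if da != 0 and da == db)
            if sw[a] == 0 or sw[b] == 0:
                relative = 0.0  # a module that never switches cannot correlate
            else:
                # Normalize to each module, then average for symmetry.
                relative = 0.5 * (x / sw[a] + x / sw[b])
            mean_intensity = 0.5 * (intensity[a] + intensity[b])
            weights[(a, b)] = relative * mean_intensity
    return weights


if __name__ == "__main__":
    traces = {"A": [0, 1, 0, 0], "B": [0, 1, 1, 0]}
    print(correlation_weights(traces, {"A": 2.0, "B": 4.0}))
```

The toy input mirrors the two-switch example in the text: A and B each switch twice and coincide once, giving a relative correlation of 50% before the intensity scaling.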

5.2 Floorplanning Algorithm

5.2.1 Overview of the Approach

Given a set of microarchitectural modules and a netlist that spec-

iﬁes the connectivity among these modules, our noise-aware mi-

croarchitectural ﬂoorplanner tries to determine the location of the

modules in a chip such that (i) there is no overlap among modules,

(ii) the sum of current demand for each power pin does not exceed

its capacity, and (iii) power supply noise experienced by each mod-

ule does not exceed the given bound. Our objective is to provide

a ﬂoorplan that minimizes the area of the ﬂoorplan and total wire-

length. Microarchitectural ﬂoorplanning has drawn signiﬁcant in-

terests from both the computer architecture and EDA communities

recently [13, 7, 3, 9, 17, 11]. These existing works mainly target

performance and thermal issues, but power supply noise issue has

not been addressed.

Among several methods known for ﬂoorplan optimization, we

employ the force-directed ﬂoorplanning method [8]. Compared

with other methods such as Simulated Annealing [16], slicing method

[24], and analytical approach [25], force-directed method does not

require tedious parameter tuning and converges quickly while ob-

taining high quality solutions [8]. We formulate the ﬂoorplanning

problem as ﬁnding a set of forces among and between ﬁxed ob-

jects (such as I/O or power pins) and movable modules in order

to optimize the objective function. The problem of ﬁnding mod-

ule position then becomes one of ﬁnding forces. Our ﬂoorplanner

consists of the following four steps:

1. Initialization: To begin, all modules are randomly distributed

throughout the placement area, without regard to overlap.

2. Iteration: Our objective function is optimized in an iterative

manner, where we update a certain set of forces based on the

last iteration to guide the optimization process.

3. Stopping Criterion: The iterations are stopped when the uti-

lization of the ﬂoorplanning area is above a threshold. This

has the effect of an overlap constraint as the ﬂoorplan area is

related to the sum of the area of the blocks and the utilization

cannot go above a certain level without a corresponding drop

in the amount of overlap.

4. Legalization: The legalization step removes the overlap among

modules while maintaining the quality of the solution.

Our objective function contains the following types of forces (see Figure 3 for reference): (1) net force ($F_{net}$): all pins in the same net are pulled closer together to minimize the wirelength objective. (2) center force ($F_{cen}$): all modules are pulled to the center of the chip to discourage them from escaping the chip boundary. (3) correlation force ($F_{cor}$): modules with high switching activity repel each other so that the noise caused by the modules is reduced. The correlation factors $\gamma_{i,j}$ described in Section 5.1.2 are used to compute the magnitude. (4) density force ($F_{den}$): modules located in a high density region of the chip are pushed apart to reduce the overlap. (5) pin capacity force ($F_{pin}$): modules are pulled into or pushed out of each power pin so that the total demand on each power pin is evenly distributed and its capacity is not violated. The first three types are non-iterative, whereas the last two are iterative. We fix all the non-iterative forces during the floorplan optimization process, whereas the iterative forces are updated based on the previous iterations. In order to balance the impact of the five types of forces, we optimize the following combined force:

$$F_{tot} = \lambda \cdot F_{net} + \theta \cdot F_{cen} + \mu \cdot F_{cor} + K \cdot F_{den} + \rho \cdot F_{pin}$$

where $\lambda$, $\theta$, $\mu$, $K$, and $\rho$ are weighting constants.

Figure 3: Illustration of various forces optimized in our floorplanner. The figure shows modules 1 to 3 placed among power pin regions 1 to 3, with the net, center, and correlation forces marked as non-iterative and the density and pin forces marked as iterative.

5.2.2 Force Equations

Let $n$ be the number of free modules in the floorplan and $(x_i, y_i)$ be the x and y-coordinates of the center of module $i$. A placement can be described by the 2n-dimensional vector $p = (x_1, \ldots, x_n, y_1, \ldots, y_n)^T$. The cost of a connection is then formulated such that it is proportional to the squared Euclidean distance between its endpoints. The objective function sums the cost of all connections and therefore can be written in matrix notation as

$$\frac{1}{2} p^T C p + \vec{d}^{\,T} p + \text{const} \qquad (3)$$

where the $2n \times 2n$ symmetric matrix $C$ and the vector $\vec{d}$ are produced from the module connections and their weights and the formula for squared Euclidean distance. For example, the x-part of the connection between two free modules $i$ and $j$ is $(x_i - x_j)^2 = x_i^2 - 2 x_i x_j + x_j^2$. The first term adds to $C_{i,i}$, the second term to $C_{i,j}$ and $C_{j,i}$, and the third term to $C_{j,j}$. Similarly, for a fixed connection between free module $i$ and fixed location $f$, $(x_i - x_f)^2 = x_i^2 - 2 x_i x_f + x_f^2$ adds the first term to $C_{i,i}$, the second term to $\vec{d}$, and the third term to the constant part of Equation (3). This cost function is minimized by solving the linear equation system

$$C p + \vec{d} = 0 \qquad (4)$$

This formulation is equivalent to modeling connections as springs

and calculating the state of equilibrium.
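The spring analogy can be made concrete with a minimal one-dimensional sketch that assembles C and d from weighted connections and solves Cp + d = 0. It assumes the objective is scaled as ½·pᵀCp + dᵀp, so a connection of weight w contributes 2w to the matrix entries; the tuple layouts and the names `place_x` and `solve_linear` are hypothetical, and only the x coordinate is handled.

```python
def solve_linear(a, b):
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def place_x(n, free_nets, fixed_nets):
    """Return equilibrium x coordinates of n free modules.

    free_nets:  list of (i, j, w): connection between free modules i and j.
    fixed_nets: list of (i, x_f, w): connection from module i to a fixed
                location x_f (for example an I/O or power pin).
    """
    c = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for i, j, w in free_nets:
        # w*(x_i - x_j)^2 = w*x_i^2 - 2*w*x_i*x_j + w*x_j^2
        c[i][i] += 2 * w
        c[j][j] += 2 * w
        c[i][j] -= 2 * w
        c[j][i] -= 2 * w
    for i, x_f, w in fixed_nets:
        # w*(x_i - x_f)^2: quadratic term goes to C, linear term to d,
        # and w*x_f^2 is part of the constant.
        c[i][i] += 2 * w
        d[i] -= 2 * w * x_f
    return solve_linear(c, [-v for v in d])  # C p = -d


if __name__ == "__main__":
    # Chain: fixed pin at 0 -- module 0 -- module 1 -- fixed pin at 9.
    print(place_x(2, [(0, 1, 1.0)], [(0, 0.0, 1.0), (1, 9.0, 1.0)]))
```

With equal spring weights the two modules settle at one third and two thirds of the span between the fixed endpoints, which is the expected equilibrium.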

Force-directed ﬂoorplanning and placement algorithms are well

known for their overlap problems. Spreading or repulsive forces are

required to make the ﬁnal solution feasible, i.e. with zero overlap.

These additional forces extend Equation (4) with the force vector e

to model constant additional forces which are iteratively updated:

Cp +



d + e =0 (5)

The complexity of solving this equation is O(k · n

),wherek is

the number of iterations, and n is the number of modules. Our

experiments show that k ranges from 1 to 10 and n is around 20.

Thus, our algorithm generates optimized solutions quickly.

We compute the pin capacity force as follows. The “current drawing region” of a pin is defined as a rectangle centered on that pin with width and height equal to the distance between pins. Let $c_i$ be the power consumption of module $i$ located within the current drawing region of power pin $j$, $I_j$ be the capacity of power pin $j$, $(x_i, y_i)$ be the center of module $i$, $(x_j, y_j)$ be the location of pin $j$, and $d_{i,j}$ be the squared Euclidean distance between module $i$ and pin $j$. Let $\alpha_i$ be the self switching weight of module $i$ defined in Section 5.1.1. The x direction force between free module $i$ and fixed pin $j$ is then

$$F^{x}_{pin}(i,j) = \left(\frac{I_j}{\sum_k c_k} - 1\right) \cdot \frac{x_j - x_i}{d_{i,j}} \cdot \alpha_i \qquad (6)$$

where the sum runs over all modules $k$ in the current drawing region of pin $j$.

Note: Our empirical choice of the weighting constants $\lambda$, $\theta$, $\mu$, $K$, and $\rho$ is to set them all equal. We fix these weights constant during the entire floorplan optimization process. One can tune the weights statically or dynamically to emphasize desired objectives.

A similar deﬁnition follows for the force along the y direction. This

force is proportional to the distance between the module and the

pin, negative if the sum of the current being drawn from the mod-

ules in the current drawing region of the pin are greater than the ca-

pacity of the pin and positive otherwise, and in the range (−1, ∞).

Basically if the demand of block i is higher than the capacity of the

pin j, then the force pushes the modules away; otherwise, it pulls

the modules towards the pin.
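The sign behavior described above can be sketched with a small helper that sums the demand inside a pin's current drawing region and compares it with the pin capacity. The factor capacity/demand − 1 is an assumption chosen because it is negative when demand exceeds capacity, positive otherwise, and lies in (−1, ∞); the names `in_drawing_region` and `pin_load_factor` and the data layout are hypothetical.

```python
def in_drawing_region(module_xy, pin_xy, pin_pitch):
    """True if the module center lies in the square region around the pin.

    The region is centered on the pin, with width and height equal to the
    distance between neighboring pins (pin_pitch).
    """
    half = pin_pitch / 2.0
    return (abs(module_xy[0] - pin_xy[0]) <= half
            and abs(module_xy[1] - pin_xy[1]) <= half)


def pin_load_factor(pin_xy, pin_capacity, modules, pin_pitch):
    """Return capacity/demand - 1 for one power pin (assumed form).

    modules: list of ((x, y), demand) tuples for all modules on the chip.
    Negative result: the pin is overloaded, so modules are pushed away.
    Positive result: the pin has headroom, so modules are pulled in.
    """
    demand = sum(c for xy, c in modules
                 if in_drawing_region(xy, pin_xy, pin_pitch))
    if demand == 0:
        return float("inf")  # an unloaded pin attracts without bound
    return pin_capacity / demand - 1.0


if __name__ == "__main__":
    mods = [((1.0, 1.0), 4.0), ((4.0, -4.0), 4.0), ((20.0, 0.0), 9.0)]
    print(pin_load_factor((0.0, 0.0), 6.0, mods, 10.0))   # overloaded pin
    print(pin_load_factor((0.0, 0.0), 16.0, mods, 10.0))  # pin with headroom
```

A full force would additionally scale this factor by the module-to-pin geometry and the self switching weight of the module, which this sketch leaves out.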

5.2.3 Updating Iterative Forces

As mentioned previously, we update two kinds of forces during each iteration: $F_{den}$ and $F_{pin}$. Specifically, we first obtain the location of the modules from the previous iteration and use them to recompute the density of each region in the floorplan and the attractive or repulsive forces among the modules within the vicinity of each power pin. The main motivation for this force update is to satisfy the non-overlap constraint (via updating $F_{den}$) and the pin capacity constraint (via updating $F_{pin}$). If these constraints are not met in the current solution, we try to minimize the amount of violation as much as possible by attempting another iteration. We note that the pin constraint is easily satisfied, but not the overlap constraint. Thus, our post-process explicitly removes the overlap among the modules. Since $e$ consists of $F_{den}$ and $F_{pin}$, $e$ gets updated and solved in each iteration.

5.2.4 Legalization

A simple heuristic is used to legalize the ﬂoorplan of the mod-

ules. Vertical and horizontal constraint graphs similar to those used

for Sequence Pair [16] are created based on the floorplan solution. The basic

idea is to derive the relative positions among the modules based

on the force-directed ﬂoorplanning, and use Sequence Pair [16] to

encode them to remove overlap. For each pair of modules, the hor-

izontal and vertical distance between their centers is compared. If

the horizontal distance is smaller than the vertical distance then

the appropriate constraint is added to the vertical constraint graph.

Conversely, if the vertical distance is less, the appropriate constraint

is added to the horizontal constraint graph. If the modules overlap,

then these constraints will push the modules apart in the direction

that minimizes overall movement. Thus, the legalized modules re-

main close to their original locations. The constraint graphs ensure

that the final floorplan is non-slicing and non-overlapping.
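The pairwise rule for choosing a constraint graph can be sketched as follows. Each edge (a, b) means a must end up left of b in the horizontal graph or below b in the vertical graph. The edge orientation, the handling of ties (sent to the horizontal graph here), and the name `build_constraint_graphs` are assumptions; the sketch only builds the graphs and does not perform the subsequent compaction.

```python
def build_constraint_graphs(centers):
    """Build horizontal and vertical constraint edge lists.

    centers: dict module name -> (x, y) center from the force-directed
             solution (assumed input format).
    Returns (horizontal_edges, vertical_edges).
    """
    horizontal, vertical = [], []
    names = sorted(centers)
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            (ax, ay), (bx, by) = centers[a], centers[b]
            if abs(ax - bx) < abs(ay - by):
                # Horizontal distance is smaller: separate the pair
                # vertically, keeping their current bottom-to-top order.
                vertical.append((a, b) if ay <= by else (b, a))
            else:
                # Vertical distance is smaller (or equal): separate the
                # pair horizontally, keeping their left-to-right order.
                horizontal.append((a, b) if ax <= bx else (b, a))
    return horizontal, vertical


if __name__ == "__main__":
    centers = {"A": (0.0, 0.0), "B": (5.0, 1.0), "C": (1.0, 6.0)}
    print(build_constraint_graphs(centers))
```

Separating each pair along the axis where the centers are already farther apart is what keeps the legalized modules close to their original locations.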

6. POWER NETWORK ANALYSIS

To evaluate the effectiveness of the two heuristics that were used

to guide noise-aware ﬂoorplanning, we use a SPICE model of the

on-chip power delivery network. We evaluate the beneﬁts of our

technique under the worst-case current consumption scenario. The

worst-case switching activity of an application is determined by

sampling microarchitectural activity of all modules over the duration of the simulation. By comparing module activity during different program phases, we can determine the period where the highest

module switching occurs. Once the worst-case phase is identiﬁed,

the current profile of each module is generated from the microarchitectural simulator. This complex current waveform is fed to the SPICE model as a piece-wise linear (PWL) source. By

incorporating per-cycle current consumption proﬁle obtained from

our microarchitecture simulation, we are able to observe induced

noise effects as a direct function of the application’s behavior.
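The two steps described above, locating the phase with the highest switching activity and turning a per-cycle current profile into a PWL source, can be sketched as follows. This is an illustrative sketch only: the function names, the fixed-length sliding window used to define a "phase", the node names, and the sample numbers are all assumptions, and the output line follows the generic SPICE form `Iname n+ n- PWL(t1 i1 t2 i2 ...)`.

```python
def worst_case_window(activity, window):
    """Return the start index of the `window`-sample span with the
    largest total switching activity (sliding-window sum)."""
    best_start = 0
    current = best_sum = sum(activity[:window])
    for i in range(1, len(activity) - window + 1):
        # Slide the window by one sample: add the new one, drop the old one.
        current += activity[i + window - 1] - activity[i - 1]
        if current > best_sum:
            best_start, best_sum = i, current
    return best_start


def to_pwl(name, node_pos, node_neg, currents, cycle_time):
    """Format per-cycle currents (amperes) as a SPICE PWL current source."""
    points = " ".join(
        f"{i * cycle_time:.4g} {amps:.4g}" for i, amps in enumerate(currents)
    )
    return f"I{name} {node_pos} {node_neg} PWL({points})"


# Made-up activity samples and currents; 2e-10 s is one cycle at 5 GHz.
start = worst_case_window([1, 5, 2, 8, 9, 1], window=2)
line = to_pwl("alu0", "vdd_alu0", "0", [0.5, 1.25], cycle_time=2e-10)
```

Here `start` is 3 (the samples 8 and 9 form the busiest window) and `line` is `Ialu0 vdd_alu0 0 PWL(0 0.5 2e-10 1.25)`.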

Based on the power supply noise of each module, we also calculate the amount of decap required for the floorplan [26]. If $V_{noise}$ is the noise of a given module, $V_{limit}$ is the noise margin, and $Q$ is the amount of charge drawn by the module, then the amount of decap required ($C$) can be estimated according to the following:

$$\theta = \max\left(1, \frac{V_{noise}}{V_{limit}}\right) \tag{7}$$

$$C = \left(1 - \theta^{-1}\right) \frac{Q}{V_{limit}} \tag{8}$$
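As a worked example of the decap estimate, the sketch below evaluates Equations (7) and (8), read here as θ = max(1, V_noise / V_limit) and C = (1 − 1/θ) · Q / V_limit. The function name `required_decap` and the numeric inputs are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def required_decap(v_noise, v_limit, charge):
    """Estimate the decap (farads) for one module.

    v_noise: observed supply noise of the module (volts)
    v_limit: allowed noise margin (volts)
    charge:  charge drawn by the module (coulombs)
    """
    theta = max(1.0, v_noise / v_limit)            # Equation (7)
    return (1.0 - 1.0 / theta) * charge / v_limit  # Equation (8)


# Made-up inputs: 0.2 V of noise against a 0.1 V margin, 1 nC of charge.
c_violating = required_decap(v_noise=0.2, v_limit=0.1, charge=1e-9)
# A module already within its margin needs no extra decap.
c_within = required_decap(v_noise=0.05, v_limit=0.1, charge=1e-9)
```

With these inputs θ = 2, so `c_violating` is (1 − 1/2) · 1e-9 / 0.1 = 5e-9 F, while `c_within` is 0 because θ is clamped to 1.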

7. QUANTITATIVE ANALYSIS

We used SimpleScalar 3.0 and Wattch for microarchitectural profiling and simulation. We incorporated extensions to generate both self and correlated module switch weights to be used in our floorplanner. The power and current consumption figures were based on a 5 GHz processor in a 70 nm process. Nine integer programs from the SPEC2000 benchmark suite were used in this study. Each simulation was fast-forwarded by 4 billion instructions and then simulated for 100 million instructions.

         LSQ   RUU   BTB   L2$   IRF  L1D$  ALU0  ALU1  ALU2  ALU3  ALU4  ALU5  L1I$ Bpred  DTLB  ITLB FALU0 FALU1  Freg
LSQ       28     0    20    13    20     2    10    10    10    10    10    10    11    20     0    11    10    10    12
RUU        0    26     8     4    13     2     0     0     0     0     0     0     5     8     2     5     0     0     5
BTB       20     8    18     7    29    17    13    13    13    13    13    13    37   100    17    37    13    13    13
L2$       13     4     7    16    14    28    12    12    12    12    12    12    21     7    26    21     4     4     7
IRF       20    13    29    14    10    17     7     7     7     7     7     7    23    29    17    23     8     8    24
L1D$       2     2    17    28    17     7     6     6     6     6     6     6    11    17    93    11     5     5     6
ALU0      10     0    13    12     7     6     3   100   100   100   100   100    15    13     6    15    66    66     4
ALU1      10     0    13    12     7     6   100     3   100   100   100   100    15    13     6    15    66    66     4
ALU2      10     0    13    12     7     6   100   100     3   100   100   100    15    13     6    15    66    66     4
ALU3      10     0    13    12     7     6   100   100   100     3   100   100    15    13     6    15    66    66     4
ALU4      10     0    13    12     7     6   100   100   100   100     3   100    15    13     6    15    66    66     4
ALU5      10     0    13    12     7     6   100   100   100   100   100     3    15    13     6    15    66    66     4
L1I$      11     5    37    21    23    11    15    15    15    15    15    15     3    37    12   100    11    11     5
Bpred     20     8   100     7    29    17    13    13    13    13    13    13    37     3    17    37    13    13    13
DTLB       0     2    17    26    17    93     6     6     6     6     6     6    12    17     2    12     5     5     6
ITLB      11     5    37    21    23    11    15    15    15    15    15    15   100    37    12     1    11    11     5
FALU0     10     0    13     4     8     5    66    66    66    66    66    66    11    13     5    11     1   100     5
FALU1     10     0    13     4     8     5    66    66    66    66    66    66    11    13     5    11   100     1     5
Freg      12     5    13     7    24     6     4     4     4     4     4     4     5    13     6     5     5     5     0

Figure 4: Self and Correlated Switching Weights of All Modules

7.1 Self and Correlated Switching Weights

Figure 4 shows both the average self-switching weight and the correlated weight of all modules in a symmetric matrix. The forward diagonal of the matrix holds the self-switching weight of each module, and all remaining locations hold the correlated switching weights. The rows and columns are sorted in descending order of self-switching weight from left to right. A higher self-switching weight indicates higher susceptibility to di/dt problems. As shown, the load/store queue (LSQ) and register update unit (RUU) carry more weight than the other modules. On the other hand, the weights of modules that are likely to be accessed every cycle (mostly turned on), such as the L1 I-Cache and the I-TLB, are lower. Modules that are mostly dormant and accessed only once in a long while, e.g. the FPU register file, also have lower weights.

The correlation weights are used to place modules that switch simultaneously away from each other in the floorplan. As expected, the branch predictor and BTB, the I-Cache and I-TLB, and the D-Cache and D-TLB are all highly correlated pairs of modules. In addition, the first six ALU modules are also highly correlated, because concurrency exists among integer instructions. These modules will be directed away from each other to lessen the inductive noise by removing clustering of modules.
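To make the use of the correlation weights concrete, the sketch below lists the module pairs whose correlated weight exceeds a cutoff, using a six-module excerpt of the Figure 4 matrix. The function name `correlated_pairs` and the cutoff of 90 are assumptions made for illustration; the sketch only identifies the pairs and does not model how the floorplanner turns the weights into forces.

```python
def correlated_pairs(names, weights, threshold):
    """Return (module_a, module_b, weight) for every off-diagonal entry
    of the symmetric matrix `weights` at or above `threshold`,
    strongest first."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):  # upper triangle only
            if weights[i][j] >= threshold:
                pairs.append((names[i], names[j], weights[i][j]))
    return sorted(pairs, key=lambda p: -p[2])


# Excerpt of Figure 4 (rows and columns in the same order as `names`).
names = ["BTB", "L1D$", "L1I$", "Bpred", "DTLB", "ITLB"]
weights = [
    [18, 17, 37, 100, 17, 37],
    [17, 7, 11, 17, 93, 11],
    [37, 11, 3, 37, 12, 100],
    [100, 17, 37, 3, 17, 37],
    [17, 93, 12, 17, 2, 12],
    [37, 11, 100, 37, 12, 1],
]

pairs = correlated_pairs(names, weights, threshold=90)
```

For this excerpt the result contains exactly the three pairs named in the text: BTB with Bpred (100), L1I$ with ITLB (100), and L1D$ with DTLB (93).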

7.2 Power Supply Noise Analysis

The noise-aware floorplan algorithm used both microarchitectural metrics to guide module placement. To demonstrate the noise tolerance of the force-directed floorplan, we compare our noise-aware floorplan to a baseline floorplan that minimizes total wirelength. In our noise analysis, we assumed a supply voltage $V_{dd}$ of 1 volt (for 70 nm) and a maximum allowed noise margin of 10%. To illustrate the noise analysis in more detail, we depict the worst-case noise for each module using gzip, a compute-bound program, in Figure 5. Note that this graph is sorted from left to right in decreasing order of module self-switching activity.

Figure 5: Power Supply Noise at Modules for gzip (y-axis: Voltage Swing (V); x-axis: modules in descending order of Self Switching Factor; series: Wire-length, Noise-aware)

As shown, the noise-aware floorplan significantly suppresses the noise experienced by modules with high switching activity as well as high current consumption. Almost all the ALUs, which exhibit a fair amount of switching activity and extremely high correlation with each other, show significant voltage noise reductions. For the integer register file (iregfile), the voltage noise was reduced by 81.7% in the noise-aware floorplan. We do observe that the L1 Data Cache (dcache) and ALU0 have a higher noise violation in the noise-aware floorplan than in the baseline, although the L1 Data Cache has a high self-switching factor. This is because other units, especially the remaining ALUs, have a higher priority when it comes to being directed towards power pins because of their strong correlation. The L1 D-Cache does not exhibit a high correlation with other units and is hence less important than other modules that have a higher potential for noise margin violations. Nonetheless, it should also be noted that the increased violations in the noise-aware floorplan are only slightly above the allowed 10% margin, making the overall solution much more noise tolerant.

Figure 6: Noise Tolerance for gzip. (a) wirelength-driven floorplan (b) noise-aware floorplan. (Darker module has higher noise)

Note that the matrix in Figure 4 shows both self as well as correlated switches, which is why the diagonal is non-zero.

Although we profiled only SPECint2000, there are certain benchmarks (e.g. data compression) that use floats and doubles.

7.3 Floorplan and Decap Requirement

We now present the baseline wire-length driven floorplan and the noise-aware one in Figure 6. The color code in each module represents the degree of noise tolerance. The cross (+) in the figure represents the location of the power pins. The area of the wire-length driven floorplan is 69.35 mm² with a total wirelength of 804.86

Figure 7: Noise Margin Violation (y-axis: Noise Violation Occurrences; series: Wire-length, NoiseAware; benchmarks: bzip, crafty, eon, gap, gzip, mcf, perl, twolf, Average)


References

Wattch: a framework for architectural-level power analysis and optimizations

Generic global placement and floorplanning

Optimal orientations of cells in slicing floorplan designs

Decoupling capacitance allocation and its application to power-supply noise-aware floorplanning

Design and implementation of the POWER5 microprocessor
