What is the effect of smoothing potential levels on the placement area?

Smoothing potential levels reduces the heights of “mountains” (high-potential regions) so that movable blocks can spread smoothly over the whole placement area.

What is the white space of the root?

In the example discussed by the authors, the white space of the root is 3. In general, the root's white space must be greater than or equal to 0; otherwise, the blocks can never fit into the placement region.

How does the algorithm reduce the densities of overflowed bins?

The authors divide the placement region into uniform bins, and then their algorithm iteratively reduces the densities of overflowed bins by sliding the cells from denser bins to sparser ones while the cell order is preserved.
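A minimal 1-D sketch of order-preserving cell sliding follows. It assumes the cells of one row are described only by their widths (in left-to-right order) and a non-decreasing bin index, and that every bin has the same `capacity`; `slide_cells` and `bin_loads` are hypothetical names, and this is a simplification rather than the authors' cell-sliding implementation. Moving only the leftmost or rightmost cell of an overflowed bin into a neighboring bin keeps the bin indices non-decreasing, so the cell order is preserved.

```python
def bin_loads(widths, assign, num_bins):
    """Total cell width currently assigned to each bin."""
    loads = [0.0] * num_bins
    for w, b in zip(widths, assign):
        loads[b] += w
    return loads

def slide_cells(widths, assign, num_bins, capacity, max_iters=1000):
    """widths: cell widths in left-to-right order.
    assign: non-decreasing bin index of each cell (hypothetical representation).
    Moves boundary cells out of overflowed bins into sparser neighbours."""
    assign = list(assign)
    for _ in range(max_iters):
        loads = bin_loads(widths, assign, num_bins)
        moved = False
        for b in range(num_bins):
            if loads[b] <= capacity:
                continue
            members = [i for i, a in enumerate(assign) if a == b]
            # Candidate moves: (target bin, the cell at the boundary facing it).
            options = []
            if b + 1 < num_bins:
                options.append((b + 1, members[-1]))  # rightmost cell slides right
            if b > 0:
                options.append((b - 1, members[0]))   # leftmost cell slides left
            options.sort(key=lambda t: loads[t[0]])   # prefer the sparser neighbour
            for target, cell in options:
                if loads[target] + widths[cell] <= capacity:
                    assign[cell] = target
                    moved = True
                    break
            if moved:
                break  # recompute loads after every move
        if not moved:
            break
    return assign
```

Each accepted move strictly lowers the total overflow, because the target bin never exceeds its capacity, so the loop terminates; `max_iters` is only a safety bound.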

How does the WDP algorithm solve the bipartite matching problem?

The WDP algorithm finds a group of exchangeable cells inside a given window and formulates a bipartite matching problem by matching the cells to all empty slots in the window.
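The formulation can be sketched as follows, assuming pins sit at block centers and each net's cost is its HPWL. The helper names are hypothetical, and the assignment is solved here by exhaustive search over permutations, which is only practical for the handful of cells in a small window; a polynomial-time assignment method (e.g., the Hungarian algorithm) would replace it in practice. Each cost entry keeps all other blocks at their current positions.

```python
import itertools

def net_hpwl(pins):
    """Half-perimeter wirelength of one net given its pin coordinates."""
    xs = [p[0] for p in pins]
    ys = [p[1] for p in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def cost_matrix(cells, slots, nets, pos):
    """cost[i][j] = HPWL of the nets on cell i when it sits in slot j,
    with every other block left at its current position in `pos`."""
    cost = []
    for c in cells:
        row = []
        for s in slots:
            row.append(sum(net_hpwl([s if b == c else pos[b] for b in net])
                           for net in nets if c in net))
        cost.append(row)
    return cost

def best_assignment(cost):
    """Exhaustive minimum-cost assignment of cells (rows) to distinct slots
    (columns); only practical for the few cells of a small window."""
    n_cells, n_slots = len(cost), len(cost[0])
    best_perm, best_total = None, float("inf")
    for perm in itertools.permutations(range(n_slots), n_cells):
        total = sum(cost[i][perm[i]] for i in range(n_cells))
        if total < best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total
```

For instance, with cell `a` tied to a fixed pin at (0, 0), cell `b` tied to one at (10, 0), and empty slots at (1, 0) and (9, 0), the matching puts `a` in the left slot and `b` in the right one.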

How can the authors obtain a smoother base potential?

Applying convolution to the Gaussian function $G$ with the base potential $P$ as $P'(x, y) = G(x, y) * P(x, y)$ (15), we can obtain a smoother base potential $P'$.
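A minimal NumPy sketch of (15) on a bin grid is given here. It assumes the base potential is stored as a 2-D array of per-bin values, that σ is measured in bins, that the kernel is truncated at 3σ and renormalized, and that the grid is extended at its borders by edge replication; all of these are illustrative choices, not details stated by the authors, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """Sampled 2-D Gaussian of (14) on integer bin offsets, truncated at
    `radius` (3*sigma by default) and renormalised to sum to 1."""
    if radius is None:
        radius = int(np.ceil(3 * sigma))
    ax = np.arange(-radius, radius + 1)
    xx, yy = np.meshgrid(ax, ax, indexing="ij")
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2)) / (2 * np.pi * sigma**2)
    return k / k.sum()  # renormalising cancels the 1/(2*pi*sigma^2) prefactor

def smooth_base_potential(P, sigma):
    """Discrete version of (15), P' = G * P, with edge-replicated borders.
    The kernel is symmetric, so correlation and convolution coincide."""
    P = np.asarray(P, dtype=float)
    k = gaussian_kernel(sigma)
    r = k.shape[0] // 2
    padded = np.pad(P, r, mode="edge")
    out = np.zeros_like(P)
    rows, cols = P.shape
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            # add the shifted grid weighted by the kernel entry at (dx, dy)
            out += k[dx + r, dy + r] * padded[r + dx:r + dx + rows, r + dy:r + dy + cols]
    return out
```

An isolated spike of potential is spread over its neighbors with its total preserved, and a uniform potential is left unchanged, so smoothing removes rugged regions without adding or removing blocked area away from the borders.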

What is the way to solve the bipartite matching problem?

Although the bipartite matching problem can be solved optimally in polynomial time, the optimal assignment cannot guarantee the optimal HPWL result, because the HPWL cost of assigning a cell to each empty slot depends on the positions of the other cells connected to it.

How do the authors allocate white space to the two children?

If both children have white space greater than or equal to 0, the authors allocate the white space in proportion to the children's original white-space amounts.
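The proportional rule for this case fits in a short function. Only the case stated above (both children non-negative) is sketched; `split_white_space` is a hypothetical name, and the even split when both children have zero white space is an assumption.

```python
def split_white_space(parent_ws, child_ws):
    """Divide a node's white space between its two children in proportion
    to the children's original white space (case: both children >= 0)."""
    left, right = child_ws
    if left < 0 or right < 0:
        raise ValueError("only the case with both children >= 0 is sketched")
    total = left + right
    if total == 0:
        return parent_ws / 2.0, parent_ws / 2.0  # assumed tie-break: even split
    return parent_ws * left / total, parent_ws * right / total
```

For example, splitting 6 units between children whose original white spaces are 1 and 2 yields (2, 4).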

What is the average HPWL of the placer?

On average, their resulting HPWL is smaller than that of APlace 2.0 by 5% and similar to mPL6’s, and their placer is 10.32× and 2.56× faster than APlace 2.0 and mPL6, respectively.

NTUplace3: An Analytical Placer for Large-Scale Mixed-Size Designs With Preplaced Blocks and Density Constraints (2008) | Tung-Chieh Chen

Q: What are the contributions in "NTUplace3: An Analytical Placer for Large-Scale Mixed-Size Designs With Preplaced Blocks and Density Constraints"?

The authors propose a high-quality analytical placement algorithm considering wirelength, preplaced blocks, and density based on the log-sum-exp wirelength model proposed by Naylor et al. and the multilevel framework. To handle preplaced blocks, the authors use a two-stage smoothing technique, i.e., Gaussian smoothing followed by level smoothing, to facilitate block spreading during global placement (GP). The authors further use the conjugate gradient method with dynamic step-size control to speed up the GP and macro shifting to find better macro positions.

1228 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 27, NO. 7, JULY 2008

NTUplace3: An Analytical Placer for Large-Scale Mixed-Size Designs With Preplaced Blocks and Density Constraints

Tung-Chieh Chen, Student Member, IEEE, Zhe-Wei Jiang, Student Member, IEEE, Tien-Chang Hsu, Hsin-Chen Chen, Student Member, IEEE, and Yao-Wen Chang, Member, IEEE

Abstract—In addition to wirelength, modern placers need to consider various constraints such as preplaced blocks and density. We propose a high-quality analytical placement algorithm considering wirelength, preplaced blocks, and density based on the log-sum-exp wirelength model proposed by Naylor et al. and the multilevel framework. To handle preplaced blocks, we use a two-stage smoothing technique, i.e., Gaussian smoothing followed by level smoothing, to facilitate block spreading during global placement (GP). The density is controlled by white-space reallocation using partitioning and cut-line shifting during GP and cell sliding during detailed placement. We further use the conjugate gradient method with dynamic step-size control to speed up the GP and macro shifting to find better macro positions. Experimental results show that our placer obtains very high-quality results.

Index Terms—Legalization (LG), physical design, placement.

I. INTRODUCTION

As process technology advances, feature sizes keep shrinking. As a result, billions of transistors can be integrated in a single chip. Meanwhile, intellectual property modules and predesigned macro blocks (such as embedded memories, analog blocks, and predesigned datapaths) are often reused. Consequently, modern advanced IC designs often contain millions of standard cells and hundreds of macros of different sizes. Hence, modern placers need to handle large-scale instances with mixed-size macros and standard cells.

Manuscript received September 17, 2006; revised August 18, 2007. This work was supported in part by MediaTek Inc., National Science Council of Taiwan, R.O.C., under Grant NSC 94-2215-E-002-030 and Grant NSC 94-2752-E-002-008-PAE, and in part by RealTek Semiconductor Corporation. This paper was recommended by Associate Editor C. J. Alpert.

T.-C. Chen is with the Graduate Institute of Electronic Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. He is also with SpringSoft, Inc., Hsinchu 300, Taiwan, R.O.C. (e-mail: donnie@eda.ee.ntu.edu.tw).

Z.-W. Jiang is with the Graduate Institute of Electronic Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. (e-mail: crazying@eda.ee.ntu.edu.tw).

T.-C. Hsu was with the Graduate Institute of Electronic Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. He is now with Synopsys Taiwan Ltd., Taipei 110, Taiwan, R.O.C. (e-mail: tchsu@eda.ee.ntu.edu.tw).

H.-C. Chen was with the Department of Electrical Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. He is currently serving in the military in Taiwan, R.O.C. (e-mail: indark@eda.ee.ntu.edu.tw).

Y.-W. Chang is with the Department of Electrical Engineering and Graduate Institute of Electronics Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C., and also with Waseda University, Tokyo 169-8050, Japan (e-mail: ywchang@cc.ee.ntu.edu.tw).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TCAD.2008.923063

In addition, high-performance IC designs usually require significant white space for further performance optimization, such as buffer insertion and gate sizing. Therefore, density control and white-space allocation (WSA) have become very important. A wirelength-driven placer that ignores placement density tends to pack blocks together to minimize wirelength. However, an overcongested region may not have enough white space for buffer insertion, which degrades chip performance. Although some congestion-aware placement algorithms have been proposed [3], [4], they aim to minimize routing congestion, which differs from density control: the density can still be high in some regions even if no routing overflows occur there.

Further, modern chip designs often contain many preplaced blocks, such as analog blocks, memory blocks, and/or I/O buffers, which are fixed in the chip and cannot overlap with other blocks. These preplaced blocks impose additional constraints on the placement problem. A placement algorithm that ignores preplaced blocks may generate illegal placements or inferior solutions.

Most recently proposed placement algorithms can handle mixed-size constraints [5]–[11]. However, very few modern mixed-size placement algorithms can satisfactorily handle preplaced blocks and chip density. In this paper, we present a high-quality mixed-size analytical placement algorithm considering preplaced blocks and density constraints. Our placer is based on a three-stage technique: 1) global placement (GP); 2) legalization (LG); and 3) detailed placement (DP). It has the following distinguishing features.

1) Based on the log-sum-exp wirelength model proposed by Naylor et al. [2] and the multilevel framework, our placer consistently generates high-quality mixed-size placement results.

2) To solve the unconstrained minimization placement problem, we use the conjugate gradient (CG) method with dynamic step sizes. Experimental results show that the method leads to significant run-time speedups.

3) Our placer handles preplaced blocks by a two-stage smoothing technique. The preplaced block potential is first smoothed by a Gaussian function to remove the rugged potential regions, and then the potential levels are smoothed so that movable blocks can effectively spread to the whole placement region.

4) Density constraints are considered during both GP and DP. We reallocate the white space using partitioning and cut-line shifting to remove density overflows between different levels of GP. In DP, a cell-sliding technique is applied to reduce the density overflow.

5) A macro shifting technique is used between levels of GP to find better macro positions that are easier for LG.

6) A look-ahead LG scheme during GP is used to obtain a better legal placement result. The legalizer is called several times near the end of GP. This technique can reduce the gap between GP and LG.

The log-sum-exp wirelength model is a patented technology [2], and use requires a license from Synopsys.

CHEN et al.: NTUplace3: ANALYTICAL PLACER FOR LARGE-SCALE MIXED-SIZE DESIGNS 1229

TABLE I
COMPARISONS BETWEEN OUR PLACER AND APLACE AND mPL; ALL THE PLACERS ARE BASED ON THE ANALYTICAL TECHNIQUE AND THE LOG-SUM-EXP WIRELENGTH MODEL. UNKNOWN: NOT MENTIONED IN THE CORRESPONDING WORK

Table I summarizes the comparisons between our placer and two state-of-the-art analytical placers, i.e., APlace 2.0/3.0 [12], [13] and mPL5/6 [6], [14], which are also based on the log-sum-exp wirelength model. In the table, “Unknown” denotes that the corresponding method is not available in the literature.

The remainder of this paper is organized as follows. Section II gives the analytical model used in our placer. Our core placement techniques are explained in Section III. Section IV reports the experimental results. Finally, the conclusions are given in Section V.

II. ANALYTICAL PLACEMENT MODEL

The circuit placement problem can be formulated as a hypergraph $H = (V, E)$ placement problem. Let the vertices $V = \{v_1, v_2, \dots, v_n\}$ represent blocks and the hyperedges $E = \{e_1, e_2, \dots, e_m\}$ represent nets. Let $x_i$ and $y_i$ be the x and y coordinates of the center of block $v_i$, and let $a_i$ be the area of block $v_i$. The circuit may contain some preplaced blocks that have fixed x and y coordinates and cannot be moved. We intend to determine the optimal positions of movable blocks so that the total wirelength is minimized and there is no overlap among blocks. The placement problem is usually solved in three stages: 1) GP; 2) LG; and 3) DP. GP evenly distributes the blocks and finds the best position for each block to minimize the target cost (e.g., wirelength). Then, LG removes all overlaps. Finally, DP refines the solution.

Fig. 1 shows the notation used in this paper.

Fig. 1. Notation used in this paper.

To evenly distribute the blocks, we divide the placement region into uniform nonoverlapping bin grids. Then, the GP problem can be formulated as a constrained minimization problem as follows:

$$\min\ W(x, y) \quad \text{s.t.}\ \ D_b(x, y) \le M_b, \ \text{for each bin } b \tag{1}$$

where $W(x, y)$ is the wirelength function, $D_b(x, y)$ is the potential function that is the total area of movable blocks in bin $b$, and $M_b$ is the maximum allowable area of movable blocks in bin $b$. $M_b$ can be computed by

$$M_b = t_{\text{density}}\,(w_b h_b - P_b) \tag{2}$$

where $t_{\text{density}}$ is a user-specified target density value for each bin, $w_b$ ($h_b$) is the width (height) of bin $b$, and $P_b$ is the base potential that equals the preplaced block area in bin $b$. Note that $M_b$ is a fixed value as long as all preplaced block positions are given and the bin size is determined.
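Equation (2) and the constraint check in (1) vectorize naturally over the bin grid. In this sketch, bins are assumed uniform, `D` is assumed to hold the current movable area per bin, and the function names are hypothetical.

```python
import numpy as np

def max_movable_area(t_density, bin_w, bin_h, base_potential):
    """Eq. (2) for every bin: M_b = t_density * (w_b * h_b - P_b)."""
    P = np.asarray(base_potential, dtype=float)
    return t_density * (bin_w * bin_h - P)

def violated_bins(D, M):
    """Indices of bins that violate constraint (1), D_b <= M_b."""
    return np.argwhere(np.asarray(D, dtype=float) > np.asarray(M, dtype=float))
```

With 10 × 10 bins, a target density of 0.9, and preplaced areas of 0, 40, 100, and 0, the allowable movable areas are 90, 54, 0, and 90; a bin fully covered by a preplaced block admits no movable area at all.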

The wirelength $W(x, y)$ is defined as the total half-perimeter wirelength (HPWL)

$$W(x, y) = \sum_{\text{net } e} \Big( \max_{v_i, v_j \in e} |x_i - x_j| + \max_{v_i, v_j \in e} |y_i - y_j| \Big). \tag{3}$$
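A direct transcription of (3) follows, assuming (as in the text) that each block is represented by its center, so pin offsets are ignored. For every net, $\max |x_i - x_j|$ equals the x-span $\max x - \min x$, which avoids a pairwise loop.

```python
def hpwl(nets, xs, ys):
    """Total half-perimeter wirelength of (3).
    nets: lists of block indices; xs, ys: block-centre coordinates."""
    total = 0.0
    for net in nets:
        px = [xs[i] for i in net]
        py = [ys[i] for i in net]
        # max |x_i - x_j| over a net is simply its x-span (likewise for y)
        total += (max(px) - min(px)) + (max(py) - min(py))
    return total
```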

Since $W(x, y)$ is not smooth and nonconvex, it is hard to minimize directly. Thus, several smooth wirelength approximation functions have been proposed, such as quadratic wirelength [15], [16], $L_p$-norm wirelength [13], [14], and log-sum-exp wirelength [2], [5], [6]. The log-sum-exp wirelength model

$$\gamma \sum_{e \in E} \Big( \log \sum_{v_k \in e} \exp(x_k/\gamma) + \log \sum_{v_k \in e} \exp(-x_k/\gamma) + \log \sum_{v_k \in e} \exp(y_k/\gamma) + \log \sum_{v_k \in e} \exp(-y_k/\gamma) \Big) \tag{4}$$

proposed in [2] achieves the best result among these three models [14]. When $\gamma$ is small, the log-sum-exp wirelength is close to the HPWL [2]. However, due to limited computer precision, we can only choose a reasonably small $\gamma$, for example, 1% of the chip width, so that it does not cause arithmetic overflow.
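The sketch here evaluates (4) and compares it with HPWL. It uses the standard max-shift identity $\log\sum_k e^{z_k} = m + \log\sum_k e^{z_k - m}$ with $m = \max_k z_k$, a common numerical technique that avoids the overflow mentioned above; the text does not say whether NTUplace3 uses it, and the function names are hypothetical. For each net $e$, the approximation satisfies $\text{HPWL}_e \le (4)_e \le \text{HPWL}_e + 4\gamma \log |e|$, which is why a small $\gamma$ is needed for accuracy.

```python
import math

def logsumexp(vals):
    """log(sum(exp(v))) computed with the max-shift trick to avoid overflow."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def lse_wirelength(nets, xs, ys, gamma):
    """Log-sum-exp wirelength of (4); pins are taken at block centres."""
    total = 0.0
    for net in nets:
        px = [xs[k] / gamma for k in net]
        py = [ys[k] / gamma for k in net]
        total += (logsumexp(px) + logsumexp([-v for v in px])
                  + logsumexp(py) + logsumexp([-v for v in py]))
    return gamma * total
```

A naive evaluation of $\exp(10^6)$ would overflow a double, whereas the shifted form returns a finite value close to the HPWL of the same net.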

Since the density $D_b(x, y)$ is neither smooth nor differentiable, mPL [14] uses inverse Laplace transformation to smooth the density, whereas APlace [5] uses a bell-shaped function for each block to smooth the density. We express the function $D_b(x, y)$ as

$$D_b(x, y) = \sum_{v \in V} P_x(b, v)\, P_y(b, v) \tag{5}$$

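where $P_x$ and $P_y$ are the overlap functions of bin $b$ and block $v$ along the x- and y-directions. We adopt the bell-shaped potential function [5] $p_x$ to smooth $P_x$. $p_x$ is defined by

$$p_x(b, v) = \begin{cases} 1 - a d_x^2, & 0 \le d_x \le \frac{w_v}{2} + w_b \\ b \left(d_x - \frac{w_v}{2} - 2w_b\right)^2, & \frac{w_v}{2} + w_b \le d_x \le \frac{w_v}{2} + 2w_b \\ 0, & \frac{w_v}{2} + 2w_b \le d_x \end{cases} \tag{6}$$

where

$$a = \frac{4}{(w_v + 2w_b)(w_v + 4w_b)}, \qquad b = \frac{2}{w_b\,(w_v + 4w_b)} \tag{7}$$

$w_b$ is the bin width, $w_v$ is the block width, and $d_x$ is the center-to-center distance of the block $v$ and the bin $b$ in the x-direction.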

Fig. 2(a) and (b) shows the original and smoothed overlap functions, respectively. The range of the block's potential is $w_v + 4w_b$ in the x-direction. The smooth y-potential function $p_y(b, v)$ can be defined in a similar way, and the range of the block's potential is $h_v + 4h_b$ in the y-direction. By doing so, the nonsmooth function $D_b(x, y)$ can be replaced by a smooth one

$$\hat{D}_b(x, y) = \sum_{v \in V} c_v\, p_x(b, v)\, p_y(b, v) \tag{8}$$

where $c_v$ is a normalization factor so that the total potential of a block equals its area.
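Equations (6) through (8) can be sketched as follows. The block format `(x, y, w, h)` with `(x, y)` the block center, the bin-center convention, and computing $c_v$ by normalizing over the bins inside the grid are illustrative assumptions; the y-direction reuses the same bell function with heights in place of widths, as the text describes. The function names are hypothetical.

```python
import numpy as np

def bell(d, w_v, w_b):
    """Bell-shaped potential p_x of (6)-(7) along one axis.
    d: centre-to-centre distance, w_v: block width, w_b: bin width."""
    d = abs(d)
    a = 4.0 / ((w_v + 2 * w_b) * (w_v + 4 * w_b))
    b_coef = 2.0 / (w_b * (w_v + 4 * w_b))  # the coefficient "b" in (7)
    if d <= w_v / 2 + w_b:
        return 1.0 - a * d * d
    if d <= w_v / 2 + 2 * w_b:
        return b_coef * (d - w_v / 2 - 2 * w_b) ** 2
    return 0.0

def smoothed_density(blocks, nx, ny, w_b, h_b):
    """D_hat of (8) on an nx-by-ny grid of w_b-by-h_b bins.
    blocks: (x, y, w, h) tuples with (x, y) the block centre (hypothetical format);
    bin (i, j) is centred at ((i + 0.5) * w_b, (j + 0.5) * h_b)."""
    D = np.zeros((nx, ny))
    cx = (np.arange(nx) + 0.5) * w_b
    cy = (np.arange(ny) + 0.5) * h_b
    for x, y, w, h in blocks:
        px = np.array([bell(x - c, w, w_b) for c in cx])  # p_x for each bin index i
        py = np.array([bell(y - c, h, h_b) for c in cy])  # p_y for each bin index j
        raw = np.outer(px, py)
        total = raw.sum()
        # c_v rescales the block's potential so that it sums to the block area
        c_v = w * h / total if total > 0 else 0.0
        D += c_v * raw
    return D
```

With (7), both the value and the slope of $p_x$ match at $d_x = w_v/2 + w_b$, so the potential is continuously differentiable, which is what makes gradient-based minimization of (9) possible.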

The quadratic penalty method is used to solve (1); that is, we solve a sequence of unconstrained minimization problems of the form

$$\min\ W(x, y) + \lambda \sum_b \big(\hat{D}_b(x, y) - M_b\big)^2 \tag{9}$$

with increasing values of λ. The solution of each problem is used as the initial solution for the next one. We solve the unconstrained problem in (9) by the CG method. Further, we observe that CG with line search in [5] is not efficient since the line search takes most of the runtime of the minimization process. Therefore, we use CG with a dynamic step size to minimize (9). Numerical results show that our approach is much faster than that used in [5].

Fig. 2. (a) Overlap function $P_x(b, v)$. (b) Smoothed overlap function $p_x(b, v)$.

III. PROPOSED ALGORITHM

As mentioned earlier, our placement algorithm consists of three stages: 1) GP; 2) LG; and 3) DP. We detail each stage in the following sections.

A. GP

Our placement algorithm is based on the aforementioned analytical technique and the multilevel framework. The multilevel framework adopts a two-stage flow of clustering followed by declustering. At each level of declustering, GP is performed to find the best positions for macros and standard cells. For the analytical search, the CG search with dynamic step-size control is adopted to speed up the search for a desirable solution. To handle preplaced blocks, we resort to a two-stage smoothing technique of Gaussian smoothing followed by level smoothing to smooth the search space and thus facilitate cell spreading. To control the chip density, we apply white-space distribution to allocate more white space to areas with density overflows. To facilitate macro and cell LG, we further apply macro shifting and look-ahead LG (described in Section III-B2) at the GP stage. We detail the aforementioned techniques in the following sections.

1) Multilevel Framework: We use the multilevel framework for GP to improve the scalability. Our algorithm is summarized in Fig. 3. Lines 1–4 are the coarsening stage. The initial placement is generated in line 5. Lines 6–23 are uncoarsening stages. The details of each step are explained as follows.

During the coarsening stage, we cluster blocks to reduce the number of movable blocks. The hierarchy of clusters is built by the first-choice (FC) clustering algorithm [14]. To apply the FC clustering algorithm, we examine each block in the circuit one by one, find the block with the highest connectivity to it, and cluster these two blocks. We control the area of a clustered block so that it does not exceed 1.5 times the average area of clustered blocks. The clustering process continues until the number of blocks is reduced by a factor of five, which yields one level of clustered circuit. The FC clustering algorithm is applied repeatedly until the number of blocks in the resulting clustered circuit is less than a user-specified number $n_{\max}$ (6000 by default).

Fig. 3. Our multilevel GP algorithm.
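One clustering level can be sketched as follows. The connectivity score (number of shared nets), the single pass over blocks, and measuring the 1.5× area limit against the expected average cluster area at the end of the level are assumptions made for illustration, since the text does not define these details; `first_choice_level` is a hypothetical name.

```python
from collections import defaultdict

def first_choice_level(areas, nets, reduction=5.0, area_factor=1.5):
    """One level of simplified first-choice (FC) style clustering.

    areas: area of each block (index = block id); nets: lists of block ids.
    Returns the cluster id assigned to every block.
    """
    n = len(areas)
    # Connectivity between two blocks = number of nets they share (assumed metric).
    conn = defaultdict(lambda: defaultdict(int))
    for net in nets:
        for u in net:
            for v in net:
                if u != v:
                    conn[u][v] += 1
    cluster_of = list(range(n))                      # every block starts alone
    cluster_area = {i: float(a) for i, a in enumerate(areas)}
    target = max(1, int(n / reduction))              # stop after the reduction
    # Area cap: 1.5x the expected average cluster area at this level (assumption).
    area_cap = area_factor * sum(areas) / target
    for v in range(n):                               # examine blocks one by one
        if len(cluster_area) <= target:
            break
        cv = cluster_of[v]
        best, best_score = None, 0
        for u, score in conn[v].items():
            cu = cluster_of[u]
            if cu == cv or score <= best_score:
                continue
            if cluster_area[cu] + cluster_area[cv] <= area_cap:
                best, best_score = cu, score
        if best is None:
            continue
        for i in range(n):                           # merge cluster cv into best
            if cluster_of[i] == cv:
                cluster_of[i] = best
        cluster_area[best] += cluster_area.pop(cv)
    return cluster_of
```

Calling this function repeatedly on the clustered netlist, until fewer than $n_{\max}$ clusters remain, mirrors the level-by-level coarsening described above; the linear relabeling on each merge keeps the sketch short but is not meant to be efficient.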

After clustering, the initial placement for the coarsest level is generated by minimizing the quadratic wirelength using the CG method, the same method as in quadratic placement.

Then, we solve the placement problem from the coarsest level to the finest level. The placement for the current level provides the initial placement for the next level. The horizontal/vertical grid numbers are set to the square root of the number of clusters in the current level, i.e., $\text{grid\_num\_v} = \text{grid\_num\_h} = \sqrt{\text{BlockNumber}(H_{\text{level}})}$. Then, the base potential $P_b$ for each bin is computed, and the maximum area of movable blocks $M_b$ is updated accordingly. In addition, the value of λ is initialized according to the strength of the wirelength and density gradients as

$$\lambda = \frac{\sum |\partial W(x, y)|}{\sum \big|\partial \hat{D}_b(x, y)\big|} \tag{10}$$

and the value of λ is doubled at each iteration. A CG solver with dynamic step-size control is used to solve the constrained minimization problem in (1) (lines 10–17 of Fig. 3).
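The initialization in (10) and the doubling schedule reduce to a few lines. Here `grad_w` and `grad_d` are assumed to be flattened gradient vectors of the wirelength term and of the density term with respect to all block coordinates; the function names are hypothetical, and the fallback value for a zero density gradient is an arbitrary assumption.

```python
import numpy as np

def initial_lambda(grad_w, grad_d):
    """Eq. (10): sum of |wirelength-gradient| entries over sum of
    |density-gradient| entries (flattened over all block coordinates)."""
    num = np.abs(np.asarray(grad_w, dtype=float)).sum()
    den = np.abs(np.asarray(grad_d, dtype=float)).sum()
    return float(num / den) if den > 0 else 1.0  # fallback is an arbitrary choice

def lambda_schedule(lam0, num_iters):
    """Lambda doubles after every outer penalty iteration."""
    return [lam0 * 2.0 ** k for k in range(num_iters)]
```

Balancing the two gradient magnitudes at the start means neither wirelength nor density dominates the first CG solve; doubling λ afterwards shifts the emphasis steadily toward removing overflow.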

During uncoarsening, all blocks inside a cluster inherit the center position of the original cluster. Macro shifting for LG and WSA for density control are applied between uncoarsening levels. We explain them in Sections III-A4 and III-A5, respectively. Then, the blocks are declustered, providing the initial placement for the next level.

To measure the evenness of the block distribution, discrepancy is used in [5]. They define discrepancy as the maximum ratio of the actual total block area to the maximum allowable block area over all windows within the chip. Unlike their method, we use the overflow ratio to measure the evenness of block distribution. We define the overflow ratio as the total overflow area in all bins over the total area of movable blocks as follows:

$$\text{overflow\_ratio} = \frac{\sum_b \max\big(D_b(x, y) - M_b,\ 0\big)}{\text{total movable area}} \tag{11}$$

where $\text{overflow\_ratio} \ge 0$. The overflow ratio has a more global view since it considers all overflow areas in the placement region, while discrepancy only considers the maximum density of a window in the placement region. The GP stage stops when the overflow ratio is less than or equal to a user-specified target value, which is 0 by default.
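Equation (11) and the GP stopping test can be sketched as follows, assuming `D` and `M` are per-bin arrays of movable area and allowable area; the function names are hypothetical.

```python
import numpy as np

def overflow_ratio(D, M, total_movable_area):
    """Eq. (11): total per-bin overflow divided by the total movable area."""
    over = np.maximum(np.asarray(D, dtype=float) - np.asarray(M, dtype=float), 0.0)
    return float(over.sum()) / total_movable_area

def gp_should_stop(D, M, total_movable_area, target=0.0):
    """GP stops once the overflow ratio reaches the user-specified target."""
    return overflow_ratio(D, M, total_movable_area) <= target
```

For per-bin movable areas of 80, 60, 0, and 95 against limits of 90, 54, 0, and 90, the overflows are 6 and 5, so the ratio is 11/235; underfilled bins do not offset overflowed ones, which is what gives the measure its global view.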

Fig. 4 shows the block spreading process (lines 10–17 of the algorithm in Fig. 3). Each time we increase the value of λ, solve the resulting nonlinear minimization problem (9), and obtain a placement result with fewer overlaps. The block spreading process continues until the total overflow ratio is small enough. Then, the spreading process stops, and all blocks are declustered into the next level.

2) CG Search With Dynamic Step Sizes: We use the CG algorithm to minimize (9). APlace uses the golden section line search to find the optimal step size, which takes most of its runtime during the minimization process. Instead, our step size is computed by a more efficient method. After computing the CG direction $d_k$, the step size $\alpha_k$ is computed by

$$\alpha_k = \frac{s\, w_b}{\lVert d_k \rVert_2} \tag{12}$$

where $s$ is a user-specified scaling factor, and $w_b$ is the bin width. By doing so, we can limit the step size of block spreading since the total quadratic Euclidean movement is fixed as

$$\sum_{v_i \in V} \big(\Delta x_i^2 + \Delta y_i^2\big) = \alpha_k^2 \lVert d_k \rVert_2^2 = s^2 w_b^2 \tag{13}$$

where $\Delta x_i$ and $\Delta y_i$ are the amounts of movement along the x- and y-directions for the block $v_i$ in each iteration, respectively.

The value of $s$ affects the precision of objective minimization; smaller $s$ values lead to better results but longer runtime. In Fig. 5, the CPU times and HPWLs are plotted as functions of the step sizes. The CPU time decreases as the step size $s$ becomes larger. In contrast, the HPWL decreases as the step size $s$ gets smaller. The results show that the step size $s$ significantly affects the running time and the solution quality. In our implementation, we set $s$ between 0.2 and 0.3 to obtain a good tradeoff between runtime and quality.

Fig. 4. Block spreading process. As the overlap weight λ increases, the overlaps are gradually reduced. The process stops when the total overflow ratio is small enough.

Fig. 5. CPU times and HPWLs resulting from different step sizes based on the circuit adaptec1.

Fig. 6 shows our CG algorithm for minimizing the placement

objective during GP.
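A sketch of CG with the dynamic step size of (12) follows. Fig. 6 is not reproduced in this text, so the Polak-Ribiere coefficient (clipped at zero) and the restart to steepest descent whenever the new direction is not a descent direction are assumed, commonly used choices; `cg_dynamic` and `dynamic_step` are hypothetical names. Every step moves the blocks by exactly $s\,w_b$ in total Euclidean length, as (13) states.

```python
import numpy as np

def dynamic_step(d, s, bin_width):
    """Eq. (12): alpha_k = s * w_b / ||d_k||_2 (zero if d_k vanishes)."""
    norm = np.linalg.norm(d)
    return 0.0 if norm == 0.0 else s * bin_width / norm

def cg_dynamic(grad, x0, s=0.25, bin_width=1.0, iters=50):
    """Conjugate-gradient descent with the fixed-length steps of (12)-(13).
    grad: callable returning the gradient of the objective at x."""
    x = np.asarray(x0, dtype=float).copy()
    g = grad(x)
    d = -g
    for _ in range(iters):
        x = x + dynamic_step(d, s, bin_width) * d  # move exactly s * w_b in total
        g_new = grad(x)
        gg = float(g @ g)
        # Polak-Ribiere coefficient, clipped at zero (assumed choice)
        beta = max(0.0, float(g_new @ (g_new - g)) / gg) if gg > 0 else 0.0
        d = -g_new + beta * d
        if float(g_new @ d) >= 0.0:
            d = -g_new  # restart with steepest descent if d is not a descent direction
        g = g_new
    return x
```

Because the step length never shrinks, on a simple quadratic the iterate ends up oscillating within $s\,w_b$ of the minimum, which illustrates why a smaller $s$ improves precision at the cost of more iterations.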

3) Base Potential Smoothing: Preplaced blocks predefine the base potential, which significantly affects block spreading (Fig. 7). Since the base potential $P_b$ is not smooth, it forms mountains that prevent movable blocks from passing through these regions. Therefore, we smooth the base potential to facilitate block spreading. We first use the Gaussian function to smooth the base potential change, removing the rugged regions in the base potential, and then smooth the base potential level so that movable blocks can spread to the whole placement region.

Fig. 6. Our nonlinear placement objective solver. This algorithm is called in Line 11 of the multilevel GP in Fig. 3.

The base potential of each block can be calculated by the bell-shaped function. However, we observe that the potential generated by the bell-shaped function has “valleys” between the adjacent regions of blocks. Fig. 8(a) shows the base potential generated by the bell-shaped function. The z-coordinate is the value of $P_b/(w_b h_b)$. If a bin has $z > 1$, the potential in the bin is larger than the bin area. There are several valleys in the bottom-left regions, as shown in the figure; these regions do not have free space, but their potentials are so low that a large number of blocks may spread to these regions. To avoid this problem, we calculate the exact density as the base potential and then use the Gaussian function to smooth the base potential. The 2-D Gaussian has the form

$$G(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{14}$$

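where σ is the standard deviation of the distribution. Applying convolution to the Gaussian function $G$ with the base potential $P$ as

$$P'(x, y) = G(x, y) * P(x, y) \tag{15}$$

we can obtain a smoother base potential $P'$.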
