NTUplace3: An Analytical Placer for Large-Scale Mixed-Size Designs With Preplaced Blocks and Density Constraints

Abstract
In addition to wirelength, modern placers need to consider various constraints such as preplaced blocks and density. We propose a high-quality analytical placement algorithm considering wirelength, preplaced blocks, and density based on the log-sum-exp wirelength model proposed by Naylor et al. and the multilevel framework. To handle preplaced blocks, we use a two-stage smoothing technique, i.e., Gaussian smoothing followed by level smoothing, to facilitate block spreading during global placement (GP). The density is controlled by white-space reallocation using partitioning and cut-line shifting during GP and cell sliding during detailed placement. We further use the conjugate gradient method with dynamic step-size control to speed up the GP and macro shifting to find better macro positions. Experimental results show that our placer obtains very high-quality results.


1228 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 27, NO. 7, JULY 2008
Tung-Chieh Chen, Student Member, IEEE, Zhe-Wei Jiang, Student Member, IEEE, Tien-Chang Hsu,
Hsin-Chen Chen, Student Member, IEEE, and Yao-Wen Chang, Member, IEEE
Index Terms—Legalization (LG), physical design, placement.
I. INTRODUCTION
As process technology advances, the feature size is getting smaller and smaller. As a result, billions of transistors can be integrated in a single chip. Meanwhile, the intellectual property modules and predesigned macro blocks (such as embedded memories, analog blocks, predesigned datapaths, etc.) are often reused. As a result, modern advanced IC designs often contain millions of standard cells and hundreds of macros with different sizes. Hence, modern placers need to handle the instances with large-scale mixed-size macros and standard cells.
Manuscript received September 17, 2006; revised August 18, 2007. This
work was supported in part by MediaTek Inc., National Science Council of
Taiwan, R.O.C., under Grant NSC 94-2215-E-002-030 and Grant NSC 94-
2752-E-002-008-PAE, and in part by RealTek Semiconductor Corporation.
This paper was recommended by Associate Editor C. J. Alpert.
T.-C. Chen is with the Graduate Institute of Electronic Engineering, National
Taiwan University, Taipei 106, Taiwan, R.O.C. He is also with SpringSoft, Inc.,
Hsinchu 300, Taiwan, R.O.C. (e-mail: donnie@eda.ee.ntu.edu.tw).
Z.-W. Jiang is with the Graduate Institute of Electronic Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. (e-mail: crazying@eda.ee.ntu.edu.tw).
T.-C. Hsu was with the Graduate Institute of Electronic Engineering, National Taiwan University, Taipei 106, Taiwan, R.O.C. He is now with Synopsys Taiwan Ltd., Taipei 110, Taiwan, R.O.C. (e-mail: tchsu@eda.ee.ntu.edu.tw).
H.-C. Chen was with the Department of Electrical Engineering, National
Taiwan University, Taipei 106, Taiwan, R.O.C. He is currently serving in the
military in Taiwan, R.O.C. (e-mail: indark@eda.ee.ntu.edu.tw).
Y.-W. Chang is with the Department of Electrical Engineering and Graduate
Institute of Electronics Engineering, National Taiwan University, Taipei 106,
Taiwan, R.O.C., and also with Waseda University, Tokyo 169-8050, Japan
(e-mail: ywchang@cc.ee.ntu.edu.tw).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2008.923063
In addition, high-performance IC designs usually require sig-
nificant white space for further performance optimization, such
as buffer insertion and gate sizing. Therefore, density control
and white-space allocation (WSA) have become very impor-
tant. A wirelength-driven placer without considering placement
density tends to pack blocks together to minimize wirelength.
However, an overcongested region may not have enough white
space for buffer insertion and thus degrade the chip perfor-
mance. Although some congestion-aware placement algorithms
were proposed [3], [4], these algorithms intend to minimize the
routing congestion, which is different from the density control
since the density can still be high for some regions even if no
routing overflows occur in those regions.
Further, modern chip designs often consist of many pre-
placed blocks, such as analog blocks, memory blocks, and/or
I/O buffers, which are fixed in the chip and cannot overlap with
other blocks. These preplaced blocks impose more constraints
on the placement problem. A placement algorithm without
considering preplaced blocks may generate illegal placement
or inferior solutions.
Most of the recently proposed placement algorithms can
handle the mixed-size constraints [5]–[11]. However, very few
modern mixed-size placement algorithms can satisfactorily
handle preplaced blocks and the chip density. In this paper,
we present a high-quality mixed-size analytical placement algo-
rithm considering preplaced blocks and density constraints. Our
placer is based on a three-stage technique: 1) global placement
(GP); 2) legalization (LG); and 3) detailed placement (DP). It
has the following distinguished features.
1) Based on the log-sum-exp wirelength model¹ proposed by Naylor et al. [2] and the multilevel framework, our placer consistently generates high-quality mixed-size placement results.
2) To solve the unconstrained minimization placement problem, we use the conjugate gradient (CG) method with dynamic step sizes. Experimental results show that the method leads to significant run-time speedups.
3) Our placer handles preplaced blocks by a two-stage smoothing technique. The preplaced block potential is first smoothed by a Gaussian function to remove the rugged potential regions, and then the potential levels are smoothed so that movable blocks can effectively spread to the whole placement region.
4) Density constraints are considered during both GP and DP. We reallocate the white space using partitioning and cut-line shifting to remove density overflows between different levels of GP. In DP, a cell-sliding technique is applied to reduce the density overflow.
5) A macro shifting technique is used between levels of GP to find better macro positions that are easier for LG.
6) A look-ahead LG scheme during GP is used to obtain a better legal placement result. The legalizer is called several times near the end of GP. This technique can reduce the gap between GP and LG.
¹ The log-sum-exp wirelength model is a patented technology [2], and use requires a license from Synopsys.
0278-0070/$25.00 © 2008 IEEE
TABLE I. Comparisons between our placer and APlace and mPL. All the placers are based on the analytical technique and the log-sum-exp wirelength model. Unknown: not mentioned in the corresponding work.
Table I summarizes the comparisons between our placer and
two state-of-the-art analytical placers, i.e., APlace 2.0/3.0 [12],
[13] and mPL5/6 [6], [14], which are also based on the log-sum-
exp wirelength model. In the table, “Unknown” denotes that the
corresponding method is not available in the literature.
The remainder of this paper is organized as follows.
Section II gives the analytical model used in our placer.
Our core placement techniques are explained in Section III.
Section IV reports the experimental results. Finally, the con-
clusions are given in Section V.
II. ANALYTICAL PLACEMENT MODEL
The circuit placement problem can be formulated as a hypergraph $H = (V, E)$ placement problem. Let the vertices $V = \{v_1, v_2, \ldots, v_n\}$ represent blocks and the hyperedges $E = \{e_1, e_2, \ldots, e_m\}$ represent nets. Let $x_i$ and $y_i$ be the $x$ and $y$ coordinates of the center of block $v_i$, and let $a_i$ be the area of the block $v_i$. The circuit may contain some preplaced blocks that have fixed $x$ and $y$ coordinates and cannot be moved. We intend to determine the optimal positions of movable blocks so that the total wirelength is minimized and there is no overlap among blocks. The placement problem is usually solved in three stages: 1) GP; 2) LG; and 3) DP. GP evenly distributes the blocks and finds the best position for each block to minimize the target cost (e.g., wirelength). Then, LG removes all overlaps. Finally, DP refines the solution.
Fig. 1 shows the notation used in this paper.
Fig. 1. Notation used in this paper.
To evenly distribute the blocks, we divide the placement region into uniform nonoverlapping bin grids. Then, the GP problem can be formulated as a constrained minimization problem as follows:
$$\min\; W(x, y) \quad \text{s.t.} \quad D_b(x, y) \le M_b, \ \text{for each bin } b \tag{1}$$
where $W(x, y)$ is the wirelength function, $D_b(x, y)$ is the potential function that is the total area of movable blocks in bin $b$, and $M_b$ is the maximum allowable area of movable blocks in bin $b$. $M_b$ can be computed by
$$M_b = t_{\text{density}} (w_b h_b - P_b) \tag{2}$$
where $t_{\text{density}}$ is a user-specified target density value for each bin, $w_b$ ($h_b$) is the width (height) of bin $b$, and $P_b$ is the base potential that equals the preplaced block area in bin $b$. Note that $M_b$ is a fixed value as long as all preplaced block positions are given and the bin size is determined.
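As a small illustration of (2), the following Python sketch computes the maximum movable area of a single bin. The function name, the argument names, and the numeric values are illustrative assumptions and are not taken from the paper.

```python
def max_movable_area(t_density, bin_w, bin_h, preplaced_area):
    """Eq. (2): M_b = t_density * (w_b * h_b - P_b)."""
    free_area = bin_w * bin_h - preplaced_area
    if free_area < 0:
        # A bin cannot hold more preplaced area than its own area.
        raise ValueError("preplaced area exceeds the bin area")
    return t_density * free_area


# Hypothetical bin: 10 x 10, 40 units covered by preplaced blocks,
# target density 0.8, so M_b = 0.8 * (100 - 40).
m_b = max_movable_area(0.8, 10.0, 10.0, 40.0)
print(m_b)
```

Because the inputs are the bin size and the preplaced area only, the value can be computed once per level and reused in every iteration.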
The wirelength $W(x, y)$ is defined as the total half-perimeter wirelength (HPWL)
$$W(x, y) = \sum_{\text{net } e} \left( \max_{v_i, v_j \in e} |x_i - x_j| + \max_{v_i, v_j \in e} |y_i - y_j| \right). \tag{3}$$
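A minimal Python sketch of (3) follows. It uses the fact that the maximum pairwise distance within a net equals the maximum coordinate minus the minimum coordinate. The data layout (nets as lists of block indices) and the sample coordinates are assumptions made for illustration.

```python
def hpwl(nets, x, y):
    """Total half-perimeter wirelength of Eq. (3).

    nets: iterable of nets, each a list of block indices
    x, y: sequences of block-center coordinates
    """
    total = 0.0
    for net in nets:
        xs = [x[i] for i in net]
        ys = [y[i] for i in net]
        # max |x_i - x_j| over the net is simply max(xs) - min(xs).
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


# Hypothetical three-block example.
x = [0.0, 3.0, 1.0]
y = [0.0, 4.0, 2.0]
nets = [[0, 1], [0, 1, 2], [1, 2]]
total_hpwl = hpwl(nets, x, y)
print(total_hpwl)  # 7 + 7 + 4 = 18.0
```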
Since $W(x, y)$ is not smooth and nonconvex, it is hard to directly minimize it. Thus, several smooth wirelength approximation functions are proposed, such as quadratic wirelength [15], [16], $L_p$-norm wirelength [13], [14], and log-sum-exp wirelength [2], [5], [6]. The log-sum-exp wirelength model
$$\gamma \sum_{e \in E} \left( \log \sum_{v_k \in e} \exp(x_k/\gamma) + \log \sum_{v_k \in e} \exp(-x_k/\gamma) + \log \sum_{v_k \in e} \exp(y_k/\gamma) + \log \sum_{v_k \in e} \exp(-y_k/\gamma) \right) \tag{4}$$
proposed in [2] achieves the best result among these three models [14]. When $\gamma$ is small, the log-sum-exp wirelength is close to the HPWL [2]. However, because of limited floating-point precision, $\gamma$ cannot be made arbitrarily small, since the terms $\exp(\pm x_k/\gamma)$ would overflow. We can only choose a reasonably small $\gamma$, for example, 1% of the chip width, so that it will not cause any arithmetic overflow.
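The behavior of (4) for different values of γ can be checked with a short Python sketch. Note one deliberate difference from the text: instead of relying on a sufficiently large γ to avoid overflow, this sketch subtracts the per-net maximum before exponentiating, a standard numerical trick that is an assumption here and is not described in the paper. All names and sample data are illustrative.

```python
import math


def _lse(values, gamma):
    """gamma * log(sum(exp(v / gamma))), shifted by max(values) for stability."""
    m = max(values)
    return m + gamma * math.log(sum(math.exp((v - m) / gamma) for v in values))


def lse_wirelength(nets, x, y, gamma):
    """Log-sum-exp wirelength of Eq. (4)."""
    total = 0.0
    for net in nets:
        xs = [x[i] for i in net]
        ys = [y[i] for i in net]
        # The +x term approximates max(x); the -x term approximates -min(x).
        total += _lse(xs, gamma) + _lse([-v for v in xs], gamma)
        total += _lse(ys, gamma) + _lse([-v for v in ys], gamma)
    return total


x = [0.0, 3.0, 1.0]
y = [0.0, 4.0, 2.0]
nets = [[0, 1], [0, 1, 2], [1, 2]]  # the HPWL of this example is 18
wl_small_gamma = lse_wirelength(nets, x, y, gamma=0.01)
wl_large_gamma = lse_wirelength(nets, x, y, gamma=1.0)
print(wl_small_gamma, wl_large_gamma)
```

The log-sum-exp value never falls below the HPWL, and the gap shrinks as γ decreases, which matches the statement that the model is close to the HPWL for small γ.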
Since the density $D_b(x, y)$ is neither smooth nor differentiable, mPL [14] uses inverse Laplace transformation to smooth the density, whereas APlace [5] uses a bell-shaped function for each block to smooth the density. We express the function $D_b(x, y)$ as
$$D_b(x, y) = \sum_{v \in V} P_x(b, v) P_y(b, v) \tag{5}$$
where $P_x$ and $P_y$ are the overlap functions of bin $b$ and block $v$ along the $x$- and $y$-directions. We adopt the bell-shaped potential function [5] $p_x$ to smooth $P_x$. $p_x$ is defined by
$$p_x(b, v) = \begin{cases} 1 - a d_x^2, & 0 \le d_x \le \dfrac{w_v}{2} + w_b \\ b \left( d_x - \dfrac{w_v}{2} - 2 w_b \right)^2, & \dfrac{w_v}{2} + w_b \le d_x \le \dfrac{w_v}{2} + 2 w_b \\ 0, & \dfrac{w_v}{2} + 2 w_b \le d_x \end{cases} \tag{6}$$
where
$$a = \frac{4}{(w_v + 2 w_b)(w_v + 4 w_b)}, \qquad b = \frac{2}{w_b (w_v + 4 w_b)} \tag{7}$$
$w_b$ is the bin width, $w_v$ is the block width, and $d_x$ is the center-to-center distance of the block $v$ and the bin $b$ in the $x$-direction. Fig. 2(a) and (b) shows the original and smoothed overlap functions, respectively. The range of the block's potential is $w_v + 4 w_b$ in the $x$-direction. The smooth $y$-potential function $p_y(b, v)$ can be defined in a similar way, and the range of the block's potential is $h_v + 4 h_b$ in the $y$-direction. By doing so, the nonsmooth function $D_b(x, y)$ can be replaced by a smooth one
$$\hat{D}_b(x, y) = \sum_{v \in V} c_v\, p_x(b, v)\, p_y(b, v) \tag{8}$$
where $c_v$ is a normalization factor so that the total potential of a block equals its area.
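The bell-shaped function in (6) and (7) and the normalization in (8) can be sketched in Python as follows. The function names, the bin layout (a list of bin-center coordinates per axis), and the sample block are assumptions for illustration only.

```python
def bell_potential(d, w_v, w_b):
    """Smoothed 1-D overlap p_x(b, v) of Eq. (6); d is the center distance."""
    d = abs(d)
    a = 4.0 / ((w_v + 2.0 * w_b) * (w_v + 4.0 * w_b))
    b = 2.0 / (w_b * (w_v + 4.0 * w_b))
    if d <= w_v / 2.0 + w_b:
        return 1.0 - a * d * d
    if d <= w_v / 2.0 + 2.0 * w_b:
        return b * (d - w_v / 2.0 - 2.0 * w_b) ** 2
    return 0.0


def block_potential(centers_x, centers_y, bx, by, w_v, h_v, w_b, h_b):
    """Per-bin contribution c_v * p_x * p_y of one block, as in Eq. (8).

    c_v is chosen so that the contributions sum to the block area.
    """
    raw = [[bell_potential(cx - bx, w_v, w_b) * bell_potential(cy - by, h_v, h_b)
            for cx in centers_x] for cy in centers_y]
    total = sum(map(sum, raw))
    if total == 0.0:
        raise ValueError("block does not influence any bin")
    c_v = (w_v * h_v) / total
    return [[c_v * p for p in row] for row in raw]


# Hypothetical 10 x 10 grid of unit bins and a 2 x 3 block centered at (5, 5).
centers = [i + 0.5 for i in range(10)]
contrib = block_potential(centers, centers, 5.0, 5.0, 2.0, 3.0, 1.0, 1.0)
print(sum(map(sum, contrib)))  # close to the block area, 6.0
```

The two quadratic pieces meet with the same value at $d_x = w_v/2 + w_b$, which is what makes the function usable with a gradient-based solver.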
Fig. 2. (a) Overlap function $P_x(b, v)$. (b) Smoothed overlap function $p_x(b, v)$.
The quadratic penalty method is used to solve (1), implying that we solve a sequence of unconstrained minimization problems of the form
$$\min\; W(x, y) + \lambda \sum_b \left( \hat{D}_b(x, y) - M_b \right)^2 \tag{9}$$
with increasing values of $\lambda$. The solution of the previous problem is used as the initial solution for the next one. We solve the unconstrained problem in (9) by the CG method. Further, we observe that CG with line search in [5] is not efficient since the line search takes most of the runtime of the minimization process. Therefore, we use CG with a dynamic step size to minimize (9). Numerical results show that our approach is much faster than that used in [5].
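For reference, the gradient that a CG solver needs for the penalized objective in (9) follows from the chain rule, because $M_b$ is constant for a fixed bin structure. This is a standard derivation added for clarity and is not a formula quoted from the paper; $W$ stands for whichever smooth wirelength approximation is used.

```latex
\nabla \Big[ W(x, y) + \lambda \sum_b \big( \hat{D}_b(x, y) - M_b \big)^2 \Big]
  = \nabla W(x, y)
  + 2 \lambda \sum_b \big( \hat{D}_b(x, y) - M_b \big)\, \nabla \hat{D}_b(x, y)
```

A bin whose smoothed density already equals $M_b$ contributes nothing to the density term, and the contribution of every other bin grows linearly with $\lambda$.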
III. PROPOSED ALGORITHM
As mentioned earlier, our placement algorithm consists of
three stages: 1) GP; 2) LG; and 3) DP. We detail each stage in
the following sections.
A. GP
Our placement algorithm is based on the aforementioned analytical technique and the multilevel framework. The multilevel framework adopts a two-stage flow of clustering followed by declustering. At each level of declustering, GP is performed to find the best positions for macros and standard cells. For the analytical search, the CG search with dynamic step-size control is adopted to speed up the search for a desirable solution. To handle preplaced blocks, we resort to a two-stage smoothing technique of Gaussian smoothing followed by level smoothing to smooth the search space and thus facilitate cell spreading. To control the chip density, we apply white space distribution to allocate more white space to areas with density overflows. To facilitate macro and cell LG, we further apply macro shifting and look-ahead LG (described in Section III-B2) at the GP stage. We detail the aforementioned techniques in the following sections.
1) Multilevel Framework: We use the multilevel framework
for GP to improve the scalability. Our algorithm is summarized
in Fig. 3. Lines 1–4 are the coarsening stage. The initial
placement is generated in line 5. Lines 6–23 are uncoarsening
stages. The details of each step are explained as follows.
Fig. 3. Our multilevel GP algorithm.
During the coarsening stage, we cluster blocks to reduce the number of movable blocks. The hierarchy of clusters is built by the first-choice (FC) clustering algorithm [14]. To apply the FC clustering algorithm, we examine each block in the circuit one-by-one, find the block with the highest connectivity to it, and cluster these two blocks. We control the area of a clustered block so that it will not be 1.5 times larger than the average area of clustered blocks. The clustering process continues until the number of blocks is reduced by a factor of five, and then we obtain a level of clustered circuit. The FC clustering algorithm is applied several times until the block number in the resulting clustered circuit is less than a user-specified number $n_{\max}$, for example, 6000 by default.
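The stopping rule above determines how many levels the hierarchy has. The Python sketch below estimates the block count at each level under the simplifying assumption that every clustering pass achieves exactly the fivefold reduction; in practice the area limit may prevent some merges, so real counts can differ. The function name and the sample netlist size are illustrative.

```python
def coarsening_schedule(num_blocks, n_max=6000, ratio=5):
    """Estimated block counts per level, from the flat netlist to the coarsest level."""
    if ratio < 2 or n_max < 2:
        raise ValueError("ratio and n_max must both be at least 2")
    counts = [num_blocks]
    # Keep clustering until the block count drops below n_max.
    while counts[-1] >= n_max:
        counts.append(max(1, counts[-1] // ratio))
    return counts


# Hypothetical design with two million movable blocks.
schedule = coarsening_schedule(2_000_000)
print(schedule)  # [2000000, 400000, 80000, 16000, 3200]
```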
After clustering, the initial placement for the coarsest level
is generated by minimizing the quadratic wirelength using the
CG method, the same method as in quadratic placement.
Then, we solve the placement problem from the coarsest level to the finest level. The placement for the current level provides the initial placement for the next level. The horizontal/vertical grid numbers are set to the square root of the number of clusters in the current level, i.e., $\text{grid\_num\_v} = \text{grid\_num\_h} = \sqrt{\text{BlockNumber}(H_{\text{level}})}$. Then, the base potential $P_b$ for each bin is computed, and the maximum area of movable blocks $M_b$ is updated accordingly. In addition, the value of $\lambda$ is initialized according to the strength of wirelength and density gradients as
$$\lambda = \frac{\sum |\partial W(x, y)|}{\sum |\partial \hat{D}_b(x, y)|} \tag{10}$$
and the value of $\lambda$ is doubled at each iteration. A CG solver with dynamic step-size control is used to solve the constrained minimization problem in (1) (in lines 10–17).
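The per-level setup described above can be sketched in Python. Two points are assumptions rather than statements from the paper: the square root is rounded to the nearest integer, and (10) is read as the ratio of the summed absolute gradient components of the wirelength and the smoothed density. The function names and sample gradients are hypothetical.

```python
import math


def grid_number(num_blocks):
    """Bins per side: square root of the block count (rounding is an assumption)."""
    return max(1, round(math.sqrt(num_blocks)))


def initial_lambda(grad_w, grad_d):
    """Eq. (10), read as sum|dW| / sum|dD| over all gradient components."""
    num = sum(abs(g) for g in grad_w)
    den = sum(abs(g) for g in grad_d)
    if den == 0.0:
        raise ValueError("density gradient is zero; lambda is undefined")
    return num / den


def lambda_schedule(lam0, iterations):
    """Penalty weights for successive iterations; lambda doubles each time."""
    return [lam0 * 2 ** k for k in range(iterations)]


bins_per_side = grid_number(6000)
lam0 = initial_lambda([1.0, -2.0, 3.0], [0.5, -0.5, 2.0])
weights = lambda_schedule(lam0, 4)
print(bins_per_side, lam0, weights)
```

Initializing λ this way makes the wirelength and density terms contribute gradients of comparable magnitude at the start of each level.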
During uncoarsening, all blocks inside a cluster inherit the
center position of the original cluster. Macro shifting for LG
and WSA for density control are applied between uncoarsening
levels. We will explain them in Sections III-A4 and A5, respec-
tively. Then, the blocks are declustered, providing the initial
placement for the next level.
To measure the evenness of the block distribution, discrepancy is used in [5]. They define discrepancy as the maximum ratio of the actual total block area to the maximum allowable block area over all windows within the chip. Unlike their method, we use the overflow ratio to measure the evenness of block distribution. We define the overflow ratio as the total overflow area in all bins over the area of total movable blocks as follows:
$$\text{overflow\_ratio} = \frac{\sum_b \max\left( D_b(x, y) - M_b,\ 0 \right)}{\text{total movable area}} \tag{11}$$
where $\text{overflow\_ratio} \ge 0$. The overflow ratio has a more global view since it considers all overflow areas in the placement region while discrepancy only considers the maximum density of a window in the placement region. The GP stage stops when the overflow ratio is less than or equal to a user-specified target value, which is 0 by default.
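The overflow ratio in (11) is straightforward to compute once the per-bin densities are known. The following Python sketch uses flat lists of per-bin values; the function name and the numbers are illustrative assumptions.

```python
def overflow_ratio(density, max_area, total_movable_area):
    """Eq. (11): summed per-bin overflow divided by the total movable area."""
    if total_movable_area <= 0:
        raise ValueError("total movable area must be positive")
    overflow = sum(max(d - m, 0.0) for d, m in zip(density, max_area))
    return overflow / total_movable_area


# Hypothetical four-bin example: two bins exceed their limit by 15 and 5.
density = [30.0, 55.0, 10.0, 45.0]
max_area = [40.0, 40.0, 40.0, 40.0]
ratio = overflow_ratio(density, max_area, total_movable_area=140.0)
print(ratio)  # 20 / 140
```

Bins below their limit contribute nothing, so underfilled regions cannot mask overflow elsewhere.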
Fig. 4 shows the block spreading process (Lines 10–17 of the algorithm in Fig. 3). Each time, we increase the value of $\lambda$, solve the nonlinear equation, and obtain a placement result with fewer overlaps. The block spreading process continues until the total overflow ratio is small enough. Then, the spreading process stops, and all blocks are declustered into the next level.
2) CG Search With Dynamic Step Sizes: We use the CG algorithm to minimize (9). APlace uses the golden section line search to find the optimal step size, which takes most of its runtime during the minimization process. Instead, our step size is computed by a more efficient method. After computing the CG direction $d_k$, the step size $\alpha_k$ is computed by
$$\alpha_k = \frac{s\, w_b}{\| d_k \|_2} \tag{12}$$
where $s$ is a user-specified scaling factor, and $w_b$ is the bin width. By doing so, we can limit the step size of block spreading since the total quadratic Euclidean movement is fixed as
$$\sum_{v_i \in V} \left( \Delta x_i^2 + \Delta y_i^2 \right) = \| \alpha_k d_k \|_2^2 = s^2 w_b^2 \tag{13}$$
where $\Delta x_i$ and $\Delta y_i$ are the amounts of movement along the $x$- and $y$-directions for the block $v_i$ in each iteration, respectively.
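The relationship between (12) and (13) can be verified numerically. In the Python sketch below, the direction vector concatenates the x and y components of all blocks; this layout, the function name, and the sample values are assumptions for illustration.

```python
import math


def dynamic_step(direction, s, bin_width):
    """Eq. (12): alpha_k = s * w_b / ||d_k||_2."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        return 0.0  # zero direction: no movement is possible
    return s * bin_width / norm


d_k = [3.0, -4.0, 12.0, 0.0]          # hypothetical CG direction, norm 13
alpha = dynamic_step(d_k, s=0.25, bin_width=8.0)
moved_sq = sum((alpha * d) ** 2 for d in d_k)
print(alpha, moved_sq)                # moved_sq equals (s * w_b)^2 = 4
```

Whatever the magnitude of the direction vector, the total squared movement per iteration stays at $s^2 w_b^2$, so a single parameter $s$ controls how far blocks travel relative to the bin width.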
The value of $s$ affects the precision of objective minimization; smaller $s$ values lead to better results but longer runtime. In Fig. 5, the CPU times and HPWLs are plotted as functions of the step sizes. The CPU time decreases as the step size $s$ becomes larger. In contrast, the HPWL decreases as the step size $s$ gets smaller. The results show that the step size significantly affects the running time and the solution quality. In our implementation, we set $s$ between 0.2 and 0.3 to obtain a good tradeoff between runtime and quality.
Fig. 4. Block spreading process. As the overlap weight $\lambda$ increases, the overlaps are gradually reduced. The process stops when the total overflow ratio is small enough.
Fig. 5. CPU times and HPWLs resulting from different step sizes based on the circuit adaptec1.
Fig. 6 shows our CG algorithm for minimizing the placement
objective during GP.
3) Base Potential Smoothing: Preplaced blocks predefine
the base potential, which significantly affects block spreading
(Fig. 7). Since the base potential P
b
is not smooth, it forms
mountains that prevent movable blocks from passing through
these regions. Therefore, we shall smooth the base potential to
facilitate block spreading. We first use the Gaussian function to
smooth the base potential change, removing the rugged regions
in the base potential, and then smooth the base potential level so
that movable blocks can spread to the whole placement region.
Fig. 6. Our nonlinear placement objective solver. This algorithm is called in Line 11 of the multilevel GP in Fig. 3.
The base potential of each block can be calculated by the bell-shaped function. However, we observe that the potential generated by the bell-shaped function has “valleys” between the adjacent regions of blocks. Fig. 8(a) shows the base potential generated by the bell-shaped function. The $z$-coordinate is the value of $P_b/(w_b h_b)$. If a bin has $z > 1$, it means that the potential in the bin is larger than the bin area. There are several valleys in the bottom-left regions, as shown in the figure; these regions do not have free space, but their potentials are so low that a large number of blocks may spread to these regions. To avoid this problem, we calculate the exact density as the base potential and then use the Gaussian function to smooth the base potential. The 2-D Gaussian has the form
$$G(x, y) = \frac{1}{2 \pi \sigma^2}\, e^{-\frac{x^2 + y^2}{2 \sigma^2}} \tag{14}$$
where $\sigma$ is the standard deviation of the distribution. Applying convolution to the Gaussian function $G$ with the base potential $P$ as
$$P'(x, y) = G(x, y) * P(x, y) \tag{15}$$
we can obtain a smoother base potential $P'$.
References
Journal ArticleDOI

A shortest augmenting path algorithm for dense and sparse linear assignment problems

TL;DR: A shortest augmenting path algorithm for the linear assignment problem that contains new initialization routines and a special implementation of Dijkstra's shortest path method is developed.
Journal ArticleDOI

GORDIAN: VLSI placement by quadratic programming and slicing optimization

TL;DR: The authors present a placement method for cell-based layout styles that is composed of alternating and interacting global optimization and partitioning steps that are followed by an optimization of the area utilization.
Proceedings ArticleDOI

Generic global placement and floorplanning

TL;DR: The algorithm is capable of addressing the problems of global placement, floorplanning, timing minimization and interaction to logic synthesis, and its iterative nature assures that timing requirements are precisely met.
Proceedings ArticleDOI

Multilevel generalized force-directed method for circuit placement

TL;DR: A generalized force-directed algorithm embedded in mPL2's multilevel framework is presented, which produces the shortest wirelength among all published placers with very competitive runtime on the IBM circuits used in [29].
Frequently Asked Questions (10)
Q1. What are the contributions in "NTUplace3: An Analytical Placer for Large-Scale Mixed-Size Designs With Preplaced Blocks and Density Constraints"?

The authors propose a high-quality analytical placement algorithm considering wirelength, preplaced blocks, and density based on the log-sum-exp wirelength model proposed by Naylor et al. and the multilevel framework. To handle preplaced blocks, the authors use a two-stage smoothing technique, i.e., Gaussian smoothing followed by level smoothing, to facilitate block spreading during global placement (GP). The authors further use the conjugate gradient method with dynamic step-size control to speed up the GP and macro shifting to find better macro positions.

Smoothing potential levels reduces “mountain” (high-potential regions) heights so that movable blocks can smoothly spread to the whole placement area.

The white space of the root is 3, and it should always be greater than or equal to 0, or the blocks can never fit into the placement region. 

The authors divide the placement region into uniform bins, and then their algorithm iteratively reduces the densities of overflowed bins by sliding the cells from denser bins to sparser ones while the cell order is preserved. 

The WDP algorithm finds a group of exchangeable cells inside a given window and formulates a bipartite matching problem by matching the cells to all empty slots in the window. 

Applying convolution to the Gaussian function $G$ with the base potential $P$ as $P'(x, y) = G(x, y) * P(x, y)$ (15), we can obtain a smoother base potential $P'$.

Though the bipartite matching problem can optimally be solved in polynomial time, the optimal assignment cannot guarantee the optimal HPWL result because the HPWL cost of a cell connected to each empty slot depends on the positions of other connected cells. 

2) If the two children both have white spaces greater than or equal to 0, the authors allocate the white space proportional to their original white space amount. 

For the analytical search, the CG search with dynamic step-size control is adopted to speed up the search for a desirable solution. 

On average, their resulting HPWL is smaller than that of APlace 2.0 by 5% and similar to mPL6’s, and their placer is 10.32× and 2.56× faster than APlace 2.0 and mPL6, respectively.