
Zero skew clock routing with minimum wirelength

TLDR
In this article, a deferred-merge embedding (DME) algorithm is proposed to construct a clock tree with zero skew while minimizing the total wirelength, which can be applied to either the Elmore or linear delay model.
Abstract
The deferred-merge embedding (DME) algorithm, which embeds any given connection topology to create a clock tree with zero skew while minimizing total wirelength, is presented. The algorithm always yields exact zero skew trees with respect to the appropriate delay model. Experimental results show an 8% to 15% wirelength reduction over some previous constructions. The DME algorithm may be applied to either the Elmore or linear delay model, and yields optimal total wirelength for linear delay. DME is a very fast algorithm, running in time linear in the number of synchronizing elements. A unified BB+DME algorithm, which constructs a clock tree topology using a top-down balanced bipartition (BB) approach and then applies DME to that topology, is also presented. The experimental results indicate that both the topology generation and embedding components of the methodology are necessary for effective clock tree construction.


Zero Skew Clock Routing With Minimum Wirelength
Ting-Hai Chao†, Yu-Chin Hsu‡, Jan-Ming Ho¶, Kenneth D. Boese§ and Andrew B. Kahng§
Abstract
In the design of high performance VLSI systems, minimization of clock skew is an increasingly important objective. Additionally, wirelength of clock routing trees should be minimized in order to reduce system power requirements and deformation of the clock pulse at the synchronizing elements of the system. In this paper, we first present the Deferred-Merge Embedding (DME) algorithm, which embeds any given connection topology to create a clock tree with zero skew while minimizing total wirelength. The algorithm always yields exact zero skew trees with respect to the appropriate delay model. Experimental results show an 8% to 15% wirelength reduction over previous constructions in [17] [18]. The DME algorithm may be applied to either the Elmore or linear delay model, and yields optimal total wirelength for linear delay. DME is a very fast algorithm, running in time linear in the number of synchronizing elements. We also present a unified BB+DME algorithm, which constructs a clock tree topology using a top-down balanced bipartition (BB) approach, and then applies DME to that topology. Our experimental results indicate that both the topology generation and embedding components of our methodology are necessary for effective clock tree construction. The BB+DME method averages 15% wirelength savings over the previous method of [17], and also gives 10% average wirelength savings when compared to the method of [25]. The paper concludes with a number of extensions and directions for future research.
1 Introduction
In synchronous VLSI designs, circuit speed is increasingly limited by two factors: (i) delay on the longest path through combinational logic, and (ii) clock skew, which is the maximum difference in arrival times of the clocking signal at the synchronizing elements of the design. This is seen from the following well-known inequality governing the clock period of a clock signal net [2][17]:

$$\text{clock period} \geq t_d + t_{skew} + t_{su} + t_{ds}$$

where $t_d$ is the delay on the longest path through combinational logic, $t_{skew}$ is the clock skew, $t_{su}$ is the setup time of the synchronizing elements (assuming edge triggering), and $t_{ds}$ is the propagation delay within the synchronizing elements. The term $t_d$ can be further decomposed into $t_d = t_{d\_interconnect} + t_{d\_gates}$, where $t_{d\_interconnect}$ is the delay associated with the interconnect of the longest path through combinational logic, and $t_{d\_gates}$ is the delay through the combinational logic gates on this path. Increased switching speeds due to advances in VLSI fabrication technology will significantly decrease the terms $t_{su}$, $t_{ds}$, and $t_{d\_gates}$. Therefore, $t_{d\_interconnect}$ and $t_{skew}$ become the dominant factors in determining circuit performance: Bakoglu [2] has noted that $t_{skew}$ may account for over 10% of the system cycle time in high-performance systems. With this in mind, a number of researchers have recently studied the clock skew minimization problem.

The work of A. B. Kahng and K. D. Boese was supported in part by NSF MIP-9110696, ARO DAAK-70-92-K-0001, ARO DAAL-03-92-G-0050, and a GTE Graduate Fellowship. A. B. Kahng is also supported by an NSF Young Investigator Award.
Author affiliations: (§) Dept. of Computer Science, UCLA, Los Angeles, CA 90024-1596; (†) Computer and Communication Research Laboratories, ITRI, Hsin-Chu, Taiwan 31015 R.O.C.; (‡) Dept. of Computer Science, UC Riverside, Riverside, CA 92521; (¶) Institute of Information Science, Academia Sinica, Taipei, Taiwan 11529 R.O.C.
Several results address formulations with inherently small problem size. For building block design styles, Ramanathan and Shin [21] have proposed a clock distribution scheme which applies when the blocks are hierarchically organized. The number of blocks at each level of the hierarchy is assumed to be small, since the algorithm exhaustively enumerates all possible clock routings and clock buffer optimizations. Burkis [5] and Boon et al. [4] have also proposed hierarchical clock tree synthesis approaches involving geometric clustering and buffer optimization at each level. More powerful clock tree resynthesis or reassignment methods were used by Fishburn [13] and Edahiro [11] to minimize the clock period while avoiding hazards or race conditions; Fishburn employed a mathematical programming formulation, while Edahiro employed a clustering-based heuristic augmented by techniques from computational geometry. All of these methods are essentially limited to small problem sizes, either by their algorithmic complexity or by their reliance on strong hierarchical clustering. In contrast, we are interested in clock tree synthesis for "flat" problem instances with many sinks (synchronizing elements), as will arise in large standard-cell, sea-of-gates, and multichip module designs.
Clock tree construction for designs with many clock sinks was first attacked by the H-tree method, which was used in regular systolic arrays by Bakoglu and other authors [1][10][14][26]. The H-tree structure can significantly reduce clock skew [10][26], but is applicable only when all of the sinks have identical loading capacitances and are placed in a symmetric array. A more robust clock tree construction for cell-based layouts is due to Jackson, Srinivasan and Kuh [17]: their "method of means and medians" (MMM) algorithm generates a topology by recursively partitioning the set of sinks into two equal-sized subsets, then connecting the center of mass of the entire set to the centers of mass of the two subsets. While the MMM solution will have reasonable skew on average, Kahng et al. [18] gave small examples for which the source-sink pathlengths in the MMM solution may vary by as much as half of the chip diameter. In some sense, this reflects an inherent weakness in the top-down approach: it can commit to an unfortunate topology early on in the construction. Kahng et al. [18][9] have proposed a bottom-up matching approach to clock tree construction: in practice their method eliminates all source-sink pathlength skew, while using 5%-7% less total wirelength than the MMM algorithm. However, as the method of [18][9] focuses primarily on pathlength balancing, their method addresses clock skew minimization only in the sense of the linear delay model. Tsay [25] uses ideas similar to both [17] and [18], and achieves exact zero skew trees with respect to the Elmore delay model [12][22]. His algorithm was the first to produce trees with exact zero skew in all cases. In the same spirit as the method of [18], Tsay's method recursively combines pairs of zero skew trees at "tapping points", analogous to the "balance points" in [18], to yield larger zero skew trees.
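As a rough illustration of the top-down MMM idea described above, the following Python sketch splits the sinks at the median and wires each center of mass to the centers of mass of the two halves. The alternation between x and y cuts, the function names, and the sample sinks are assumptions made for this sketch; it is not the implementation of [17], and it makes no attempt to balance delays.

```python
def manhattan(p, q):
    """Manhattan (rectilinear) distance between two points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def center_of_mass(points):
    """Mean x and mean y of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def mmm_edges(sinks, axis=0):
    """Recursively bipartition sinks at the median and return tree edges.

    Each edge is a (parent_point, child_point) pair. The cut direction
    alternates between x (axis=0) and y (axis=1); that alternation is an
    assumption of this sketch.
    """
    if len(sinks) <= 1:
        return []
    ordered = sorted(sinks, key=lambda p: p[axis])
    half = len(ordered) // 2
    center = center_of_mass(ordered)
    edges = []
    for subset in (ordered[:half], ordered[half:]):
        edges.append((center, center_of_mass(subset)))
        edges.extend(mmm_edges(subset, 1 - axis))
    return edges


# Hypothetical 2 x 2 array of sinks (not taken from the paper).
sinks = [(0, 0), (0, 2), (2, 0), (2, 2)]
edges = mmm_edges(sinks)
wirelength = sum(manhattan(p, q) for p, q in edges)
print(len(edges), wirelength)
```

For this symmetric sink array the sketch produces a small H-tree with six unit-length edges, so every source-sink path has the same length. On irregular sink sets the path lengths generally differ, which is the weakness noted above.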

The primary motivation behind our work is to minimize the total wirelength of clock routing trees while maintaining exact zero skew with respect to the appropriate delay model. Total wirelength is a critical parameter of the clock routing solution since excess interconnect not only increases layout area but also results in greater tree capacitance, thus requiring more power for distribution of the clock signal. However, both the top-down method of [17] and the bottom-up methods of [18][9][25] concentrate on the problem of computing a clock tree topology, and only incompletely address the associated problem of finding a minimum-cost embedding of the topology. These previous methods are actually quite inflexible in that they permanently embed each internal node of the tree as soon as it becomes defined [18], or else choose the embedding with at most one level of lookahead in the tree construction [17][25].
In this paper, we first propose a new approach which achieves exact zero skew while significantly reducing the total wirelength of the clock tree. The basic idea of our Deferred-Merge Embedding (DME) algorithm is to defer the embedding of internal nodes in a given topology for as long as possible: (i) a bottom-up phase computes loci of feasible locations for the roots of recursively merged subtrees, and (ii) a top-down phase then resolves the exact embedding of these internal nodes of the clock tree. In practice, the DME algorithm begins with an initial clock tree computed by any previous method, then maintains exact zero clock skew while reducing the wirelength. In regimes where the linear delay model applies, our method produces the optimal (i.e., minimum wirelength) zero skew clock tree with respect to the prescribed topology, and this tree will also enjoy optimal source-sink delay. Experimental results in Section 4 below show that the DME approach is highly effective in both the Elmore and linear delay models. We achieve average savings in total clock tree wirelength of 15% over the MMM algorithm [17] and 8% over the method of Kahng et al. [18]. In all cases, our clock trees have exact zero skew according to the appropriate delay model, and our Elmore delay computations have been confirmed by SPICE simulations which show sub-picosecond skew on all benchmark examples.
Since the DME algorithm only optimizes a prescribed topology, it cannot achieve all possible improvement of the clock tree construction. Thus, to complement this successful embedding method, we also propose a new top-down heuristic for constructing an initial clock tree topology, based on the geometric concept of a balanced bipartition (BB). Applying our embedding to topologies generated in this way yields a unified BB+DME algorithm which gives very promising results: we achieve 15% reduction in tree cost as compared with the MMM algorithm [17], and we achieve 10% reduction in tree cost and a 22% reduction in Elmore delay as compared with the method of Tsay [25].¹ Again, all of our solutions have exact zero skew. Our methods are quite robust, and extend to prescribed skew formulations as well as more general optimizations of topologies for both clock routing and global routing. Furthermore, because our method implicitly maintains all possible minimum-cost embeddings of a topology, it may be used to reroute the clock net while preserving minimum wirelength, as may be necessary when channel density must be minimized.

¹ Note that SPICE simulations for BB+DME constructions on random sink sets (Table 4 below) indicate only a 3% improvement in delay compared to the MMM algorithm. This suggests that although the Elmore model is reasonably accurate for predicting skew, it is less accurate for predicting delay.
The remainder of this paper is organized as follows. In Section 2, we formalize the minimum-cost zero skew clock routing problem and also establish the linear and Elmore delay models that are used in the subsequent discussion. Section 3 presents our main results. These include: (i) the Deferred-Merge Embedding (DME) algorithm for efficiently embedding a given topology; (ii) application of the DME algorithm to both the linear and Elmore delay regimes; and (iii) our unified BB+DME algorithm, which uses a top-down balanced bipartitioning (BB) strategy to derive a good tree topology to which the DME algorithm may be applied. Section 4 gives experimental results and comparisons with previous work, and Section 5 concludes with directions for future research.
2 Problem Formulation
The placement phase of physical layout determines positions for the synchronizing elements of a circuit, which we call the sinks of the clock net. A finite set of sink locations, denoted by $S = \{s_1, s_2, \ldots, s_n\} \subset \Re^2$, specifies an instance of the clock routing problem. A connection topology is defined to be a rooted binary tree, $G$, which has $n$ leaves corresponding to the set of sinks $S$. A clock tree $T(S)$ is an embedding of the connection topology in the Manhattan plane.² The embedding associates a placement in $\Re^2$ with each node $v \in G$; we will use $pl(T, v)$ or $pl(v)$ to represent this location. (When no confusion arises, we may also denote $pl(T, v)$ simply by $v$.) The root of the clock tree is the clock source, denoted by $s_0$. We direct all edges of the clock tree away from the source; a directed edge from $v$ to $w$ may be uniquely identified with $w$ and written as $e_w$. We say that $v$ is the parent of $w$, and $w$ is a child of $v$; the set of all children of $v$ is denoted by $children(v)$. The wirelength, or cost, of the edge $e_w$ is denoted by $|e_w|$, and must be greater than or equal to the Manhattan distance between its endpoints $pl(w)$ and $pl(v)$.³ The cost of $T(S)$, denoted $cost(T(S))$, is the total wirelength of the edges in $T(S)$.
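The cost definition above can be sketched in Python as follows. The dictionary-based tree encoding, the function names, and the sample coordinates are assumptions chosen for illustration; the only facts taken from the text are that each edge $e_w$ is identified with its child node $w$, that $|e_w|$ may not be smaller than the Manhattan distance between its endpoints, and that the tree cost is the sum of the edge lengths.

```python
def manhattan(p, q):
    """Manhattan distance between placements p and q."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def tree_cost(parent, pl, edge_len=None):
    """Total wirelength cost(T(S)) of an embedded clock tree.

    parent   -- maps each non-root node w to its parent v (edge e_w)
    pl       -- maps each node to its placement (x, y)
    edge_len -- optional map w -> |e_w|; defaults to the Manhattan
                distance, and may exceed it when a wire is elongated
    """
    edge_len = edge_len or {}
    total = 0.0
    for w, v in parent.items():
        dist = manhattan(pl[w], pl[v])
        length = edge_len.get(w, dist)
        if length < dist:
            raise ValueError(f"|e_{w}| is shorter than the Manhattan distance")
        total += length
    return total


# Hypothetical two-sink tree: source s0, one internal node a.
parent = {"a": "s0", "s1": "a", "s2": "a"}
pl = {"s0": (2, 3), "a": (2, 1), "s1": (0, 1), "s2": (4, 1)}
print(tree_cost(parent, pl))                      # shortest-path wiring
print(tree_cost(parent, pl, edge_len={"s1": 3}))  # e_s1 elongated by one unit
```

Keeping the edge length separate from the placements reflects the distinction drawn in the text: a wire may be routed longer than the distance between its endpoints, and the cost counts the routed length.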
For a given clock tree $T(S)$, let $t_d(s_0, s_i)$ denote the signal propagation time, or delay, on the unique path from source $s_0$ to sink $s_i$; the collection of edges in this path is denoted by $path(s_0, s_i)$. The skew of $T(S)$ is the maximum value of $|t_d(s_0, s_i) - t_d(s_0, s_j)|$ over all sink pairs $s_i, s_j \in S$. If the skew of $T(S)$ is zero then it is called a zero skew clock tree (ZST). Given a set $S$ of sinks, the zero skew clock routing problem is to construct a ZST $T(S)$ of minimum cost. A variant of the zero skew clock routing problem asks for a minimum cost ZST with a prescribed connection topology:
Zero Skew Clock Routing Problem $(S, G)$: Given a set $S$ of sink locations, and given a connection topology $G$, construct a zero skew clock tree $T(S)$ with topology $G$ and having minimum cost.
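The skew definition above translates directly into a short check. In this Python sketch the source-to-sink delays are simply given as a dictionary; how they are computed depends on the delay model. The function names, the optional tolerance, and the sample delays are assumptions made for illustration.

```python
from itertools import combinations


def skew(delays):
    """Maximum |t_d(s0, si) - t_d(s0, sj)| over all sink pairs."""
    values = list(delays.values())
    return max((abs(a - b) for a, b in combinations(values, 2)), default=0.0)


def is_zero_skew(delays, tol=0.0):
    """True when the tree is a ZST, up to an optional tolerance."""
    return skew(delays) <= tol


# Hypothetical source-to-sink delays, in arbitrary units.
balanced = {"s1": 4.0, "s2": 4.0, "s3": 4.0}
unbalanced = {"s1": 4.0, "s2": 4.0, "s3": 5.5}
print(skew(balanced), is_zero_skew(balanced))
print(skew(unbalanced), is_zero_skew(unbalanced))
```

The pairwise maximum equals the largest delay minus the smallest, so `max(values) - min(values)` would give the same number in linear time; the pairwise form is used here only because it mirrors the definition.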
² Note that the binary tree representation suffices to capture arbitrary Steiner routing topologies. Also, because the meaning is clear, we use $T(S)$ instead of $T(S, G)$ to denote a clock tree; implicitly, the embedding is always with respect to a particular topology $G$.
³ To route a wire of greater length than the distance between its endpoints, the method of specified-length routing due to Hanafusa et al. [16] can be used.

The notion of a zero skew clock tree is well defined only in the context of a method for evaluating signal delays. The delay from the source to any sink depends on the wirelength of the source-sink path, the RC constants of the wire segments in the routing, and the underlying connection topology of the clock tree.⁴ Using equations such as those of Rubinstein et al. [22], one can achieve tight upper and lower bounds on delay in a distributed RC tree model of the clock net. However, in practice it is appropriate to apply one of two simpler RC delay approximations, either the linear model or the Elmore model, both of which are easier to compute and optimize during clock tree design.
2.1 Delay Models
2.1.1 Linear Delay
In the linear delay model, the delay along $path(s_0, s_i)$ is proportional to the length of the path and is independent of the rest of the connection topology. Normalized by an appropriate constant factor, the linear delay between any two nodes $u$ and $w$ in a source-sink path is

$$t_{LD}(u, w) = \sum_{e_v \in path(u, w)} |e_v| .$$

While less accurate than the distributed RC tree delay formulas of Rubinstein et al. [22], the linear delay model has been effectively used in clock tree synthesis [18] [21]. In general, use of the linear approximation is reasonable with older ASIC technologies, which have larger mask geometries and slower packages. Tsay [25] notes that the linear delay model is also proper for emerging optical and wave interconnect technologies. In addition, we observe that linear delay applies to hybrid packaging technologies, which have relatively large interconnect geometries [24].
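A minimal Python sketch of the linear delay sum above, together with one way it could be used when two zero skew subtrees are joined under this model: choose the two new edge lengths so that both sides see the same delay. The `balance` helper, its detour rule for the case where one subtree is much slower than the other, the tree encoding, and the sample numbers are assumptions made for illustration; this section of the paper does not state that procedure.

```python
def linear_delay(parent, edge_len, u, w):
    """t_LD(u, w): sum of |e_v| over the edges on the path from u down to w.

    parent maps each non-root node to its parent; edge_len maps node v to
    |e_v|. A KeyError is raised if u is not an ancestor of w.
    """
    total = 0.0
    node = w
    while node != u:
        total += edge_len[node]
        node = parent[node]
    return total


def balance(t_a, t_b, d):
    """Edge lengths (e_a, e_b) giving t_a + e_a == t_b + e_b.

    t_a, t_b are the root-to-sink linear delays of two zero skew subtrees
    whose roots are Manhattan distance d apart.
    """
    if abs(t_a - t_b) <= d:
        e_a = (d + t_b - t_a) / 2
        return e_a, d - e_a          # uses exactly d units of wire
    # The delay gap exceeds d: attach directly to the slower subtree and
    # elongate the wire to the faster one (total wire is then more than d).
    return (0.0, t_a - t_b) if t_a > t_b else (t_b - t_a, 0.0)


# Hypothetical tree and delays.
parent = {"a": "s0", "s1": "a", "s2": "a"}
edge_len = {"a": 2.0, "s1": 2.0, "s2": 2.0}
print(linear_delay(parent, edge_len, "s0", "s1"))
print(balance(3.0, 1.0, 4.0))
print(balance(10.0, 1.0, 4.0))
```

In the second `balance` call the required wire is longer than the distance between the roots, which is the situation where a wire must be routed at greater length than the distance between its endpoints.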
2.1.2 Elmore Delay
With smaller device dimensions and higher ASIC system speeds, a distributed RC tree model for signal delay in clock nets is often required to derive accurate timing information. Typically, we use the first-order moment of the impulse response, also known as the Elmore delay [6] [8] [25]. The Elmore delay model is developed as follows. Let $\alpha$ and $\beta$ respectively denote the resistance and capacitance per unit length of interconnect, so that the resistance $r_{e_v}$ and capacitance $c_{e_v}$ of edge $e_v$ are given by $\alpha |e_v|$ and $\beta |e_v|$, respectively. For each sink $s_i$ in the tree $T(S)$, there is a loading capacitance $c_{L_i}$ which is the input capacitance of the functional unit driven by $s_i$.
We let $T_v$ denote the subtree of $T(S)$ rooted at $v$, and let $c_v$ denote the node capacitance of $v$.⁵ The
⁴ The global routing phase of layout will typically consider the clock and power/ground nets for preferential assignment to (dedicated) routing layers. We assume that the interconnect delay parameters are the same on all metal routing layers, and we ignore via resistances. Thus, wirelength becomes a valid measure of the RC parameters of interconnections.
⁵ As noted earlier, we will assume that $c_v = 0$ for each internal node in all of our examples and benchmarks.
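The text breaks off before the Elmore delay formula is stated. As a hedged illustration only, the Python sketch below uses the standard Elmore expression for RC trees, in which each edge $e_v$ on the source-sink path contributes $r_{e_v}(c_{e_v}/2 + C_v)$, where $C_v$ is the total capacitance of the subtree $T_v$. The function name, the dictionary-based tree encoding, the parameter names `alpha` and `beta` for the per-unit-length resistance and capacitance, and the sample values are all assumptions of this sketch; internal node capacitances are taken to be zero, in line with the footnote above.

```python
def elmore_delays(children, edge_len, load_cap, root, alpha, beta):
    """Return {sink: Elmore delay from root} for an RC tree.

    children -- maps a node to the list of its children
    edge_len -- maps node v to |e_v|, the length of the edge entering v
    load_cap -- maps each sink to its loading capacitance
    alpha    -- resistance per unit length (assumed name)
    beta     -- capacitance per unit length (assumed name)
    """
    cap = {}

    def subtree_cap(v):
        # Capacitance of T_v: load at v plus wire and subtree caps below v.
        total = load_cap.get(v, 0.0)
        for w in children.get(v, []):
            total += beta * edge_len[w] + subtree_cap(w)
        cap[v] = total
        return total

    subtree_cap(root)
    delays = {}

    def walk(v, t):
        kids = children.get(v, [])
        if not kids:
            delays[v] = t            # v is a sink
        for w in kids:
            r = alpha * edge_len[w]
            c = beta * edge_len[w]
            walk(w, t + r * (c / 2 + cap[w]))

    walk(root, 0.0)
    return delays


# Hypothetical symmetric tree with unit RC constants and unit sink loads.
children = {"s0": ["a"], "a": ["s1", "s2"]}
load_cap = {"s1": 1.0, "s2": 1.0}
symmetric = elmore_delays(children, {"a": 1.0, "s1": 2.0, "s2": 2.0},
                          load_cap, "s0", alpha=1.0, beta=1.0)
lopsided = elmore_delays(children, {"a": 1.0, "s1": 2.0, "s2": 3.0},
                         load_cap, "s0", alpha=1.0, beta=1.0)
print(symmetric)
print(lopsided)
```

Unlike the linear model, lengthening one branch here also raises the delay to the other sink, because the shared edge above the branch point drives more capacitance. This is why the delay to a sink depends on the whole connection topology and not only on its own path length.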

References

The Transient Response of Damped Linear Networks with Particular Regard to Wideband Amplifiers
TL;DR: It is found possible to define delay time and rise time in such a way that these quantities can be computed very simply from the Laplace system function of the network.

The Rectilinear Steiner Tree Problem is NP-Complete
TL;DR: The problem of determining the minimum length of an optimum rectilinear Steiner tree for a set A of points in the plane is shown to be NP-complete and the emphasis of the literature on heuristics and special case algorithms is well justified.

Signal Delay in RC Tree Networks
TL;DR: Upper and lower bounds for delay that are computationally simple are presented in this paper and can be used to bound the delay, given the signal threshold, and to certify that a circuit is "fast enough," given both the maximum delay and the voltage threshold.

Clock skew optimization
TL;DR: Using a model to detect clocking hazards, two linear programs are investigated: minimizing the clock period, while avoiding clock hazards, and for a given period, maximizing the minimum safety margin against clock hazard.

Approximation of wiring delay in MOSFET LSI
TL;DR: Two approximation methods for wiring delay in MOS LSI are studied and the widely used L ladder circuit model is found to be a poor approximation, while π and T ladder circuit models give satisfactory results.