What are the contributions mentioned in the paper "Zero skew clock routing with minimum wirelength" ?

In this paper, the authors first present the Deferred-Merge Embedding (DME) algorithm, which embeds any given connection topology to create a clock tree with zero skew while minimizing total wirelength. The authors also present a unified BB+DME algorithm, which constructs a clock tree topology using a top-down balanced bipartition (BB) approach, and then applies DME to that topology. The paper concludes with a number of extensions and directions for future research.

Why is the tree capacitance of each node equal to the amount of wire in its subtree?

Because unit resistance and capacitance both equal one, and because loading capacitances at the leaves are zero, the tree capacitance of each node equals the amount of wire in its subtree.

What is the effect of minimization of total wire length?

Minimization of total wirelength leads to a reduction in wiring area, with the added effect of less blockage for subsequent routing phases of layout.

How can DME be used for a given problem?

The DME algorithm can also be used for problems with allowed skew [1] [13] [25], where the signal must arrive at each sink within a prescribed segment of time.

What is the definition of a Manhattan arc?

A Manhattan arc is defined to be a line segment, possibly of zero length, with slope +1 or -1; in other words, a Manhattan arc is a line segment tilted at 45 degrees from the wiring directions.
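As a rough illustration of this definition, the check below tests whether the segment between two points is a Manhattan arc. The function name `is_manhattan_arc` and the tuple representation of points are assumptions made for this sketch, not notation from the paper.

```python
from typing import Tuple

Point = Tuple[float, float]


def is_manhattan_arc(p: Point, q: Point) -> bool:
    """True if the segment from p to q has slope +1 or -1, or has zero length."""
    dx = q[0] - p[0]
    dy = q[1] - p[1]
    # |dx| == |dy| means the segment is tilted at 45 degrees;
    # dx == dy == 0 is the zero-length case.
    return abs(dx) == abs(dy)


print(is_manhattan_arc((0, 0), (3, 3)))   # slope +1
print(is_manhattan_arc((0, 0), (2, -2)))  # slope -1
print(is_manhattan_arc((1, 1), (1, 1)))   # zero length
print(is_manhattan_arc((0, 0), (3, 1)))   # neither slope +1 nor -1
```

The exact equality test suits integer grid coordinates; with floating-point coordinates a small tolerance would probably be needed.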

What is the cost of merging two trees?

The merging cost depends on the distance between the two roots of the ZSTs, the delay of each ZST, and the tree capacitance of each ZST.

What was the surprising result of their experiments?

The authors used the circuit simulator SPICE2G.6 [20] in their evaluation. A surprising outcome of their experiments was the strong performance of topologies generated by the KCR algorithm.

How long does it take to build a tree of segments?

By Lemma 1, procedure Build Tree of Segments requires constant time to compute each new merging segment, and time linear in the size of S to construct the entire tree of merging segments.

What is the topology of the merging segment?

The merging segment of a node depends on the merging segments of its two children, so the connection topology must be processed in a bottom-up order.

What is the linear delay between a source and a sink?

Normalized by an appropriate constant factor, the linear delay between any two nodes u and w in a source-sink path is $t_{LD}(u, w) = \sum_{e_v \in path(u, w)} |e_v|$. While less accurate than the distributed RC tree delay formulas of Rubinstein et al. [22], the linear delay model has been effectively used in clock tree synthesis [18] [21].

How do the authors minimize the merging cost?

To minimize the merging cost, the authors should therefore choose topologies such that merged subtrees have minimum distance between their roots, along with similar capacitances and delays, so as to avoid extra cost.
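To make the role of root distance and subtree delay concrete, here is a minimal sketch of how two zero skew subtrees might be balanced under the linear delay model. It is an assumption-laden illustration rather than the paper's procedure: the function name and arguments are hypothetical, and capacitance plays no part because linear delay depends only on path length.

```python
def merge_edge_lengths_linear(dist, t_a, t_b):
    """Choose edge lengths (e_a, e_b) so that t_a + e_a == t_b + e_b.

    dist     -- Manhattan distance between the two subtree roots
    t_a, t_b -- source-to-sink linear delay inside each zero skew subtree
    """
    if abs(t_a - t_b) <= dist:
        # The delays can be balanced using exactly `dist` units of wire.
        e_a = (dist + t_b - t_a) / 2.0
        return e_a, dist - e_a
    # Delays differ by more than the distance: the merge point sits on the
    # slower subtree's root and the other wire is lengthened (extra cost).
    if t_a > t_b:
        return 0.0, float(t_a - t_b)
    return float(t_b - t_a), 0.0


for dist, t_a, t_b in [(10, 4, 4), (10, 2, 6), (2, 10, 0)]:
    e_a, e_b = merge_edge_lengths_linear(dist, t_a, t_b)
    print(dist, t_a, t_b, "->", e_a, e_b, "merging cost", e_a + e_b)
```

In the last case the merging cost exceeds the root distance, which is the kind of extra cost that well-matched subtrees avoid.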

What is the average improvement of BB+DME over Tsay's algorithm?

It should be noted that DME alone resulted in an average improvement of only 2% over Tsay's algorithm, which can be attributed to the fact that Tsay's embedding algorithm allows deferral of the choice of placements for one level in the tree (the two endpoints of each merging segment are selected and carried to the next level, where the actual embedding is chosen to be the point which allows the minimum connection cost).

What is the NP-completeness of the linear delay problem?

[18] [9] showed that a closely related problem (in the linear delay model), the "bounded-skew pathlength-balanced tree problem", is trivially NP-complete since it reduces to the minimum rectilinear Steiner tree problem when the allowed pathlength skew is infinite.

How is the tree of segments constructed?

Once the tree of segments has been constructed, the exact embeddings of internal nodes in the ZST are chosen in a top-down manner.

What is the way to calculate the edge lengths needed to merge two trees of merging?

To calculate the edge lengths needed to merge two trees of merging segments TSa and TSb with minimum merging cost in the Elmore model, the authors use the analysis of Tsay [25].
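The sketch below shows one way the Elmore balance condition can be solved for the two edge lengths. It assumes the standard Elmore delay of a wire driving a lumped load, and all names (`alpha`, `beta`, `elmore_tapping_point`, `wire_delay`) are hypothetical choices for this illustration. The case where the balance point falls outside the connecting wire, which requires wire elongation, is deliberately not handled.

```python
def wire_delay(length, c_load, alpha, beta):
    """Elmore delay of a wire of the given length driving capacitance c_load.

    alpha and beta are the assumed resistance and capacitance per unit length.
    """
    return alpha * length * (beta * length / 2.0 + c_load)


def elmore_tapping_point(dist, t_a, t_b, c_a, c_b, alpha, beta):
    """Split a wire of length dist into (e_a, e_b) with equal Elmore delay.

    t_a, t_b -- delay inside each zero skew subtree
    c_a, c_b -- tree capacitance of each subtree
    """
    if dist <= 0:
        raise ValueError("dist must be positive in this sketch")
    x = (t_b - t_a + alpha * dist * (c_b + beta * dist / 2.0)) / (
        alpha * dist * (beta * dist + c_a + c_b)
    )
    if not 0.0 <= x <= 1.0:
        # Balancing would need a wire longer than dist; not covered here.
        raise ValueError("balance point lies outside the connecting wire")
    return x * dist, (1.0 - x) * dist


e_a, e_b = elmore_tapping_point(10.0, 2.0, 1.0, 2.0, 1.0, alpha=0.1, beta=0.2)
print(e_a, e_b)
print(2.0 + wire_delay(e_a, 2.0, 0.1, 0.2), 1.0 + wire_delay(e_b, 1.0, 0.1, 0.2))
```

The two printed delays should agree up to rounding, which is the zero skew condition at the merge point.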

(Open Access) Zero skew clock routing with minimum wirelength (1992) | Ting-Hai Chao

Q: What is the motivation behind the work?

The primary motivation behind their work is to minimize the total wirelength of clock routing trees while maintaining exact zero skew with respect to the appropriate delay model.

Q: What is the linear delay model for clock nets?

With smaller device dimensions and higher ASIC system speeds, a distributed RC tree model for signal delay in clock nets is often required to derive accurate timing information.

Q: What is the way to reduce clock skew?

The H-tree structure can significantly reduce clock skew [10] [26], but is applicable only when all of the sinks have identical loading capacitances and are placed in a symmetric array.

Zero Skew Clock Routing With Minimum Wirelength



Ting-Hai Chao, Yu-Chin Hsu, Jan-Ming Ho, Kenneth D. Boese and Andrew B. Kahng

Abstract

In the design of high performance VLSI systems, minimization of clock skew is an increasingly important objective. Additionally, wirelength of clock routing trees should be minimized in order to reduce system power requirements and deformation of the clock pulse at the synchronizing elements of the system. In this paper, we first present the Deferred-Merge Embedding (DME) algorithm, which embeds any given connection topology to create a clock tree with zero skew while minimizing total wirelength. The algorithm always yields exact zero skew trees with respect to the appropriate delay model. Experimental results show an 8% to 15% wirelength reduction over previous constructions in [17] [18]. The DME algorithm may be applied to either the Elmore or linear delay model, and yields optimal total wirelength for linear delay. DME is a very fast algorithm, running in time linear in the number of synchronizing elements. We also present a unified BB+DME algorithm, which constructs a clock tree topology using a top-down balanced bipartition (BB) approach, and then applies DME to that topology. Our experimental results indicate that both the topology generation and embedding components of our methodology are necessary for effective clock tree construction. The BB+DME method averages 15% wirelength savings over the previous method of [17], and also gives 10% average wirelength savings when compared to the method of [25]. The paper concludes with a number of extensions and directions for future research.

1 Introduction

In synchronous VLSI designs, circuit speed is increasingly limited by two factors: (i) delay on the longest path through combinational logic, and (ii) clock skew, which is the maximum difference in arrival times of the clocking signal at the synchronizing elements of the design. This is seen from the following well-known inequality governing the clock period of a clock signal net [2][17]:

$$\text{clock period} \geq t_d + t_{skew} + t_{su} + t_{ds}$$

where $t_d$ is the delay on the longest path through combinational logic, $t_{skew}$ is the clock skew, $t_{su}$ is the set up time of the synchronizing elements (assuming edge triggering), and $t_{ds}$ is the propagation delay within the synchronizing elements. The term $t_d$ can be further decomposed into $t_d = t_{d\_interconnect} + t_{d\_gates}$, where $t_{d\_interconnect}$ is the delay associated with the interconnect of the longest path through combinational logic, and $t_{d\_gates}$ is the delay through the combinational logic gates on this path. Increased switching speeds due to advances in VLSI fabrication technology will significantly decrease the terms $t_{su}$, $t_{ds}$, and $t_{d\_gates}$. Therefore, $t_{d\_interconnect}$ and $t_{skew}$ become the dominant factors in determining circuit performance: Bakoglu [2] has noted that $t_{skew}$ may account for over 10% of the system cycle time in high-performance systems. With this in mind, a number of researchers have recently studied the clock skew minimization problem.

The work of A. B. Kahng and K. D. Boese was supported in part by NSF MIP-9110696, ARO DAAK-70-92-K-0001, ARO DAAL-03-92-G-0050, and a GTE Graduate Fellowship. A. B. Kahng is also supported by an NSF Young Investigator Award. Author affiliations: Dept. of Computer Science, UCLA, Los Angeles, CA 90024-1596; Computer and Communication Research Laboratories, ITRI, Hsin-Chu, Taiwan 31015 R.O.C.; Dept. of Computer Science, UC Riverside, Riverside, CA 92521; Institute of Information Science, Academia Sinica, Taipei, Taiwan 11529 R.O.C.

Several results address formulations with inherently small problem size. For building block design styles, Ramanathan and Shin [21] have proposed a clock distribution scheme which applies when the blocks are hierarchically organized. The number of blocks at each level of the hierarchy is assumed to be small, since the algorithm exhaustively enumerates all possible clock routings and clock buffer optimizations. Burkis [5] and Boon et al. [4] have also proposed hierarchical clock tree synthesis approaches involving geometric clustering and buffer optimization at each level. More powerful clock tree resynthesis or reassignment methods were used by Fishburn [13] and Edahiro [11] to minimize the clock period while avoiding hazards or race conditions; Fishburn employed a mathematical programming formulation, while Edahiro employed a clustering-based heuristic augmented by techniques from computational geometry. All of these methods are essentially limited to small problem sizes, either by their algorithmic complexity or by their reliance on strong hierarchical clustering. In contrast, we are interested in clock tree synthesis for "flat" problem instances with many sinks (synchronizing elements), as will arise in large standard-cell, sea-of-gates, and multichip module designs.

Clock tree construction for designs with many clock sinks was first attacked by the H-tree method, which was used in regular systolic arrays by Bakoglu and other authors [1][10][14][26]. The H-tree structure can significantly reduce clock skew [10][26], but is applicable only when all of the sinks have identical loading capacitances and are placed in a symmetric array. A more robust clock tree construction for cell-based layouts is due to Jackson, Srinivasan and Kuh [17]: their "method of means and medians" (MMM) algorithm generates a topology by recursively partitioning the set of sinks into two equal-sized subsets, then connecting the center of mass of the entire set to the centers of mass of the two subsets. While the MMM solution will have reasonable skew on average, Kahng et al. [18] gave small examples for which the source-sink pathlengths in the MMM solution may vary by as much as half of the chip diameter. In some sense, this reflects an inherent weakness in the top-down approach: it can commit to an unfortunate topology early on in the construction. Kahng et al. [18][9] have proposed a bottom-up matching approach to clock tree construction: in practice their method eliminates all source-sink pathlength skew, while using 5%-7% less total wirelength than the MMM algorithm. However, as the method of [18][9] focuses primarily on pathlength balancing, their method addresses clock skew minimization only in the sense of the linear delay model. Tsay [25] uses ideas similar to both [17] and [18], and achieves exact zero skew trees with respect to the Elmore delay model [12][22]. His algorithm was the first to produce trees with exact zero skew in all cases. In the same spirit as the method of [18], Tsay's method recursively combines pairs of zero skew trees at "tapping points", analogous to the "balance points" in [18], to yield larger zero skew trees.
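The recursive partitioning idea behind the MMM algorithm can be sketched roughly as follows. This is only an illustration under stated assumptions: the median cut that alternates between the x and y directions, the nested-dictionary tree representation, and the names `centroid`, `mmm_topology` and `count_leaves` are choices made for this sketch and are not taken from [17].

```python
def centroid(points):
    """Center of mass of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def mmm_topology(sinks, split_x=True):
    """Recursively halve the sink set and record each subset's center of mass.

    Returns a nested dict {"center": (x, y), "children": [...]}; an edge is
    implied between each node's center and the centers of its children.
    """
    node = {"center": centroid(sinks), "children": []}
    if len(sinks) > 1:
        axis = 0 if split_x else 1            # assumed: alternate cut direction
        ordered = sorted(sinks, key=lambda p: p[axis])
        mid = len(ordered) // 2               # median cut: two near-equal halves
        node["children"] = [
            mmm_topology(ordered[:mid], not split_x),
            mmm_topology(ordered[mid:], not split_x),
        ]
    return node


def count_leaves(node):
    """Number of leaves (sinks) below a node."""
    if not node["children"]:
        return 1
    return sum(count_leaves(child) for child in node["children"])


tree = mmm_topology([(0, 0), (0, 2), (2, 0), (2, 2)])
print(tree["center"], [child["center"] for child in tree["children"]])
```

Because every cut is fixed before any subtree is examined, a sketch like this also makes the weakness of the top-down approach visible: an early cut cannot be revised later.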

The primary motivation behind our work is to minimize the total wirelength of clock routing trees while maintaining exact zero skew with respect to the appropriate delay model. Total wirelength is a critical parameter of the clock routing solution since excess interconnect not only increases layout area but also results in greater tree capacitance, thus requiring more power for distribution of the clock signal. However, both the top-down method of [17] and the bottom-up methods of [18][9][25] concentrate on the problem of computing a clock tree topology, and only incompletely address the associated problem of finding a minimum-cost embedding of the topology. These previous methods are actually quite inflexible in that they permanently embed each internal node of the tree as soon as it becomes defined [18], or else choose the embedding with at most one level of lookahead in the tree construction [17][25].

In this paper, we first propose a new approach which achieves exact zero skew while significantly reducing the total wirelength of the clock tree. The basic idea of our Deferred-Merge Embedding (DME) algorithm is to defer the embedding of internal nodes in a given topology for as long as possible: (i) a bottom-up phase computes loci of feasible locations for the roots of recursively merged subtrees, and (ii) a top-down phase then resolves the exact embedding of these internal nodes of the clock tree. In practice, the DME algorithm begins with an initial clock tree computed by any previous method, then maintains exact zero clock skew while reducing the wirelength. In regimes where the linear delay model applies, our method produces the optimal (i.e., minimum wirelength) zero skew clock tree with respect to the prescribed topology, and this tree will also enjoy optimal source-sink delay. Experimental results in Section 4 below show that the DME approach is highly effective in both the Elmore and linear delay models. We achieve average savings in total clock tree wirelength of 15% over the MMM algorithm [17] and 8% over the method of Kahng et al. [18]. In all cases, our clock trees have exact zero skew according to the appropriate delay model, and our Elmore delay computations have been confirmed by SPICE simulations which show sub-picosecond skew on all benchmark examples.

Since the DME algorithm only optimizes a prescribed topology, it cannot achieve all possible improvement of the clock tree construction. Thus, to complement this successful embedding method, we also propose a new top-down heuristic for constructing an initial clock tree topology, based on the geometric concept of a balanced bipartition (BB). Applying our embedding to topologies generated in this way yields a unified BB+DME algorithm which gives very promising results: we achieve a 15% reduction in tree cost as compared with the MMM algorithm [17], and we achieve a 10% reduction in tree cost and a 22% reduction in Elmore delay as compared with the method of Tsay [25]. Again, all of our solutions have exact zero skew. Our methods are quite robust, and extend to prescribed skew formulations as well as more general optimizations of topologies for both clock routing and global routing. Furthermore, because our method implicitly maintains all possible minimum-cost embeddings of a topology, it may be used to reroute the clock net while preserving minimum wirelength, as may be necessary when channel density must be minimized.

Note that SPICE simulations for BB+DME constructions on random sink sets (Table 4 below) indicate only a 3% improvement in delay compared to the MMM algorithm. This suggests that although the Elmore model is reasonably accurate for predicting skew, it is less accurate for predicting delay.

The remainder of this paper is organized as follows. In Section 2, we formalize the minimum-cost zero skew clock routing problem and also establish the linear and Elmore delay models that are used in the subsequent discussion. Section 3 presents our main results. These include: (i) the Deferred-Merge Embedding (DME) algorithm for efficiently embedding a given topology; (ii) application of the DME algorithm to both the linear and Elmore delay regimes; and (iii) our unified BB+DME algorithm, which uses a top-down balanced bipartitioning (BB) strategy to derive a good tree topology to which the DME algorithm may be applied. Section 4 gives experimental results and comparisons with previous work, and Section 5 concludes with directions for future research.

2 Problem Formulation

The placement phase of physical layout determines positions for the synchronizing elements of a circuit, which we call the sinks of the clock net. A finite set of sink locations, denoted by $S = \{s_1, \ldots, s_n\} \subset \Re^2$, specifies an instance of the clock routing problem. A connection topology is defined to be a rooted binary tree, $G$, which has $n$ leaves corresponding to the set of sinks $S$. A clock tree $T(S)$ is an embedding of the connection topology in the Manhattan plane. The embedding associates a placement with each node $v \in G$; we will use $pl(T, v)$ or $pl(v)$ to represent this location. (When no confusion arises, we may also denote $pl(T, v)$ simply by $v$.) The root of the clock tree is the clock source, denoted by $s_0$. We direct all edges of the clock tree away from the source; a directed edge from $v$ to $w$ may be uniquely identified with $w$ and written as $e_w$. We say that $v$ is the parent of $w$, and $w$ is a child of $v$; the set of all children of $v$ is denoted by $children(v)$. The wirelength, or cost, of the edge $e_w$ is denoted by $|e_w|$, and must be greater than or equal to the Manhattan distance between its endpoints $pl(v)$ and $pl(w)$. The cost of $T(S)$, denoted $cost(T(S))$, is the total wirelength of the edges in $T(S)$.

For a given clock tree $T(S)$, let $t_d(s_0, s_i)$ denote the signal propagation time, or delay, on the unique path from source $s_0$ to sink $s_i$; the collection of edges in this path is denoted by $path(s_0, s_i)$. The skew of $T(S)$ is the maximum value of $|t_d(s_0, s_i) - t_d(s_0, s_j)|$ over all sink pairs $s_i, s_j \in S$. If the skew of $T(S)$ is zero then it is called a zero skew clock tree (ZST). Given a set $S$ of sinks, the zero skew clock routing problem is to construct a ZST $T(S)$ of minimum cost. A variant of the zero skew clock routing problem asks for a minimum cost ZST with a prescribed connection topology:

Zero Skew Clock Routing Problem (S,G): Given a set $S$ of sink locations, and given a connection topology $G$, construct a zero skew clock tree $T(S)$ with topology $G$ and having minimum cost.
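The definitions of cost and skew can be illustrated with a small sketch. The dictionary-based representation (`parent`, `placement`, `edge_len`, `sink_delay`) and all function names are assumptions made for this example; the per-sink delays are supplied directly, so the sketch does not depend on any particular delay model.

```python
def manhattan(p, q):
    """Manhattan distance between two points in the plane."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def tree_cost(edge_len):
    """Total wirelength; edge_len maps each non-root node to its edge length."""
    return sum(edge_len.values())


def edges_feasible(parent, placement, edge_len):
    """Each edge must be at least as long as the distance it spans."""
    return all(
        edge_len[v] >= manhattan(placement[parent[v]], placement[v])
        for v in edge_len
    )


def skew(sink_delay):
    """Largest pairwise difference in delay, i.e. max delay minus min delay."""
    delays = list(sink_delay.values())
    return max(delays) - min(delays)


# Hypothetical instance: two sinks on a line, source placed midway between them.
parent = {"s1": "s0", "s2": "s0"}
placement = {"s0": (2, 0), "s1": (0, 0), "s2": (4, 0)}
edge_len = {"s1": 2, "s2": 2}
sink_delay = {"s1": 2, "s2": 2}   # equal path lengths, so equal linear delay

print(tree_cost(edge_len), edges_feasible(parent, placement, edge_len), skew(sink_delay))
```

Moving the source off the midpoint without lengthening either wire would make the two delays differ, so the resulting tree would no longer be a ZST.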

Note that the binary tree representation suffices to capture arbitrary Steiner routing topologies. Also, because the meaning is clear, we use $T(S)$ instead of $T(S, G)$ to denote a clock tree; implicitly, the embedding is always with respect to a particular topology $G$.

To route a wire of greater length than the distance between its endpoints, the method of specified-length routing due to Hanafusa et al. [16] can be used.

The notion of a zero skew clock tree is well defined only in the context of a method for evaluating signal delays. The delay from the source to any sink depends on the wirelength of the source-sink path, the RC constants of the wire segments in the routing, and the underlying connection topology of the clock tree. Using equations such as those of Rubinstein et al. [22], one can achieve tight upper and lower bounds on delay in a distributed RC tree model of the clock net. However, in practice it is appropriate to apply one of two simpler RC delay approximations, either the linear model or the Elmore model, both of which are easier to compute and optimize during clock tree design.

2.1 Delay Models

2.1.1 Linear Delay

In the linear delay model, the delay along $path(s_0, s_i)$ is proportional to the length of the path and is independent of the rest of the connection topology. Normalized by an appropriate constant factor, the linear delay between any two nodes $u$ and $w$ in a source-sink path is

$$t_{LD}(u, w) = \sum_{e_v \in path(u, w)} |e_v|$$

While less accurate than the distributed RC tree delay formulas of Rubinstein et al. [22], the linear delay model has been effectively used in clock tree synthesis [18] [21]. In general, use of the linear approximation is reasonable with older ASIC technologies, which have larger mask geometries and slower packages. Tsay [25] notes that the linear delay model is also proper for emerging optical and wave interconnect technologies. In addition, we observe that linear delay applies to hybrid packaging technologies, which have relatively large interconnect geometries [24].
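As a small illustration of the linear delay definition, the sketch below sums edge lengths while walking from a node up to one of its ancestors. The parent-pointer representation and the names `linear_delay`, `parent` and `edge_len` are assumptions made for this example.

```python
def linear_delay(parent, edge_len, u, w):
    """Sum of edge lengths on the path from ancestor u down to node w.

    parent   -- maps each non-root node to its parent
    edge_len -- maps each non-root node v to the length of the edge into v
    """
    total = 0.0
    v = w
    while v != u:
        if v not in parent:
            raise ValueError("u is not an ancestor of w")
        total += edge_len[v]
        v = parent[v]
    return total


# Hypothetical tree: source s0 feeds internal node a, which feeds sinks s1 and s2.
parent = {"a": "s0", "s1": "a", "s2": "a"}
edge_len = {"a": 3.0, "s1": 2.0, "s2": 2.0}

print(linear_delay(parent, edge_len, "s0", "s1"))
print(linear_delay(parent, edge_len, "s0", "s2"))
print(linear_delay(parent, edge_len, "a", "s2"))
```

Both source-sink paths here have the same length, so this small tree has zero skew under the linear model.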

2.1.2 Elmore Delay

With smaller device dimensions and higher ASIC system speeds, a distributed RC tree model for signal delay in clock nets is often required to derive accurate timing information. Typically, we use the first-order moment of the impulse response, also known as the Elmore delay [6] [8] [25]. The Elmore delay model is developed as follows. Let $\alpha$ and $\beta$ respectively denote the resistance and capacitance per unit length of interconnect, so that the resistance $r_{e_v}$ and capacitance $c_{e_v}$ of edge $e_v$ are given by $\alpha |e_v|$ and $\beta |e_v|$, respectively. For each sink $s_i$ in the tree $T(S)$, there is a loading capacitance $c_{L_i}$ which is the input capacitance of the functional unit driven by $s_i$. We let $T_v$ denote the subtree of $T(S)$ rooted at $v$, and let $c_v$ denote the node capacitance of $v$.

The

The global routing phase of layout will typically consider the clock and power/ground nets for preferential assignment to (dedicated) routing layers. We assume that the interconnect delay parameters are the same on all metal routing layers, and we ignore via resistances. Thus, wirelength becomes a valid measure of the RC parameters of interconnections.

As noted earlier, we will assume that the node capacitance $c_v = 0$ for each internal node in all of our examples and benchmarks.
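The quantities introduced in this subsection can be combined into a short sketch of an Elmore delay computation. It assumes the commonly used form in which each edge contributes its resistance times half its own capacitance plus the tree capacitance below it; this formula is not stated in the text above. The names `alpha`, `beta`, `children`, `edge_len` and `load` are hypothetical.

```python
def tree_capacitance(children, edge_len, load, beta, v):
    """Capacitance of the subtree rooted at v: node load plus all wire below."""
    total = load.get(v, 0.0)
    for w in children.get(v, []):
        total += beta * edge_len[w]
        total += tree_capacitance(children, edge_len, load, beta, w)
    return total


def elmore_delays(children, edge_len, load, alpha, beta, root):
    """Elmore delay from root to every leaf, under the assumed formula."""
    delays = {}

    def visit(v, t):
        kids = children.get(v, [])
        if not kids:
            delays[v] = t          # v is a sink
        for w in kids:
            r = alpha * edge_len[w]                      # edge resistance
            c = beta * edge_len[w]                       # edge capacitance
            c_below = tree_capacitance(children, edge_len, load, beta, w)
            visit(w, t + r * (c / 2.0 + c_below))

    visit(root, 0.0)
    return delays


# Hypothetical tree with unit resistance and capacitance and zero loads.
children = {"s0": ["a"], "a": ["s1", "s2"]}
edge_len = {"a": 3.0, "s1": 2.0, "s2": 2.0}

print(tree_capacitance(children, edge_len, {}, 1.0, "s0"))
print(elmore_delays(children, edge_len, {}, 1.0, 1.0, "s0"))
```

With unit parameters and zero loads, the capacitance at the root equals the total wire in the tree. Recomputing the subtree capacitance at every edge keeps the sketch short but is quadratic in the worst case; a single bottom-up pass would avoid that.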
