
Showing papers by "Hai Zhou published in 2006"


Proceedings Article•DOI•
Serkan Ozdemir1, Debjit Sinha1, Gokhan Memik1, Jonathan Adams1, Hai Zhou1 •
09 Dec 2006
TL;DR: Four yield-aware microarchitecture schemes for data caches are developed, including a variable-latency cache architecture that allows different load accesses to be completed with varying latencies; as a result, chips that would otherwise be tossed away due to parametric yield loss can be saved.
Abstract: One of the major issues faced by the semiconductor industry today is that of reducing chip yields. As the process technologies have scaled to smaller feature sizes, chip yields have dropped to around 50% or less. This figure is expected to decrease even further in future technologies. To attack this growing problem, we develop four yield-aware microarchitecture schemes for data caches. The first one is called Yield-Aware Power-Down (YAPD). YAPD turns off cache ways that cause delay violation and/or have excessive leakage. We also modify this approach to achieve better yields. This new method is called Horizontal YAPD (HYAPD), which turns off horizontal regions of the cache instead of ways. A third approach targets delay violation in data caches. Particularly, we develop a VAriable-latency Cache Architecture (VACA). VACA allows different load accesses to be completed with varying latencies. This is enabled by augmenting the functional units with special buffers that allow the dependants of a load operation to stall for a cycle if the load operation is delayed. As a result, if some accesses take longer than the predefined number of cycles, the execution can still be performed correctly, albeit with some performance degradation. A fourth scheme we devise is called the Hybrid mechanism, which combines the YAPD and the VACA. As a result of these schemes, chips that may be tossed away due to parametric yield loss can be saved. Experimental results demonstrate that the yield losses can be reduced by 68.1% and 72.4% with YAPD and HYAPD schemes and by 33.3% and 81.1% with VACA and Hybrid mechanisms, respectively, improving the overall yield to as much as 97.0%.

116 citations


Proceedings Article•DOI•
27 Mar 2006
TL;DR: This paper quantifies the approximation error in Clark's approach to computing the maximum (max) of Gaussian random variables, a fundamental operation in statistical timing, and shows that a finite look-up table can be used to store these errors.
Abstract: This paper quantifies the approximation error in Clark's approach, presented in C. E. Clark (1961), to computing the maximum (max) of Gaussian random variables, a fundamental operation in statistical timing. We show that a finite look-up table can be used to store these errors. Based on the error computations, approaches to different orderings for pair-wise max operations on a set of Gaussians are proposed. Experiments show accuracy improvements in the computation of the max of multiple Gaussians by up to 50% in comparison to the traditional approach. To the best of our knowledge, this is the first work addressing the mentioned issues.
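The pairwise max operation the paper analyzes can be sketched with Clark's (1961) moment-matching formulas; the sketch below is an illustrative Python implementation (function names are my own), not the paper's code, and it omits the paper's contributions (the error look-up table and the ordering schemes):

```python
import math

def phi(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def clark_max(mu1, sig1, mu2, sig2, rho=0.0):
    """Clark's moment-matching approximation of max(X, Y) for
    correlated Gaussians X ~ N(mu1, sig1^2), Y ~ N(mu2, sig2^2)
    with correlation rho; returns (mean, std) of the matched
    Gaussian (the true max is not Gaussian, hence the error the
    paper quantifies)."""
    a = math.sqrt(max(sig1 ** 2 + sig2 ** 2 - 2.0 * rho * sig1 * sig2, 1e-12))
    alpha = (mu1 - mu2) / a
    mean = mu1 * Phi(alpha) + mu2 * Phi(-alpha) + a * phi(alpha)
    second = ((mu1 ** 2 + sig1 ** 2) * Phi(alpha)
              + (mu2 ** 2 + sig2 ** 2) * Phi(-alpha)
              + (mu1 + mu2) * a * phi(alpha))
    var = max(second - mean ** 2, 0.0)
    return mean, math.sqrt(var)
```

For two independent standard normals, the matched mean is 1/sqrt(pi), the known expected value of their maximum.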

31 citations


Journal Article•DOI•
TL;DR: An insight into the statistical properties of gate delays for a commercial 0.13-µm technology library is presented, which intuitively provides one reason why statistical timing-driven optimization does better than deterministic timing-driven optimization.
Abstract: In this paper, we propose a statistical gate sizing approach to maximize the timing yield of a given circuit under area constraints. Our approach involves statistical gate delay modeling, statistical static timing analysis, and gate sizing. Experiments performed in an industrial framework on combinational International Symposium on Circuits and Systems (ISCAS'85) and Microelectronics Center of North Carolina (MCNC) benchmarks show absolute timing yield gains of 30% on average over deterministic timing optimization, for at most a 10% area penalty. It is further shown that circuits optimized using our metric have larger timing yields than the same circuits optimized using a worst-case metric, for iso-area solutions. Finally, we present an insight into the statistical properties of gate delays for a commercial 0.13-µm technology library, which intuitively provides one reason why statistical timing-driven optimization does better than deterministic timing-driven optimization.

30 citations


Proceedings Article•DOI•
05 Nov 2006
TL;DR: The problem is not convex and its optimal solution cannot be obtained by solving its Lagrangian dual problem; a modified convex formulation is therefore proposed and solved using a min-cost flow technique and a trust-region method.
Abstract: With the advent of the deep sub-micron (DSM) era, floorplanning has become increasingly important in the physical design process. In this paper we clarify a misunderstanding in using Lagrangian relaxation for the minimum area floorplanning problem. We show that the problem is not convex and its optimal solution cannot be obtained by solving its Lagrangian dual problem. We then propose a modified convex formulation and solve it using a min-cost flow technique and a trust-region method. Experimental results under a module aspect ratio bound of [0.5, 2.0] show that the running time of our floorplanner scales well with problem size on the MCNC benchmarks. Compared with the floorplanner in the work of Young et al. (2001), our floorplanner is 9.5× faster for the largest case, "ami49". It also generates floorplans with smaller deadspace for almost all test cases. In addition, since the generated floorplan has an aspect ratio closer to 1, it is friendlier to packaging. Our floorplanner is also amenable to including interconnect cost and other physical design metrics.

24 citations


Proceedings Article•DOI•
Prasad Narayana1, Ruiming Chen1, Yao Zhao1, Yan Chen1, Zhi Fu2, Hai Zhou1 •
12 Nov 2006
TL;DR: This paper proposes the use of TLA+ to automatically check the DoS vulnerability of network protocols with a completeness guarantee, and develops new schemes to avoid state-space explosion in property checking and to model attackers' capabilities for finding realistic attacks.
Abstract: Vulnerability analysis is an indispensable first step towards securing a network protocol, but it currently remains a mostly best-effort manual process with no completeness guarantee. Formal methods have been proposed for vulnerability analysis, and most existing work focuses on security properties such as perfect forward secrecy and correctness of authentication. However, it remains unclear how to apply these methods to analyze more subtle vulnerabilities such as denial-of-service (DoS) attacks. To address this challenge, in this paper we propose the use of TLA+ to automatically check the DoS vulnerability of network protocols with a completeness guarantee. In particular, we develop new schemes to avoid state-space explosion in property checking and to model attackers' capabilities for finding realistic attacks. As a case study, we successfully identify threats to the IEEE 802.16 air interface protocols.

22 citations


Journal Article•DOI•
TL;DR: Two algorithms are proposed for a statistical check of the structural conditions for correct clocking of high-performance integrated-circuit designs; the second can conservatively estimate timing yields.
Abstract: High-performance integrated-circuit designs need to verify their clock schedules, as they usually use level-sensitive latches for speed. With process variations, the verification needs to compute the probability of correct clocking. Because of complex statistical correlations and accumulated inaccuracy of statistical operations, traditional iterative approaches have difficulty getting accurate results. A statistical check of the structural conditions for correct clocking is proposed instead, where the central problem is to compute the probability of having a positive cycle in a graph with random edge weights. The authors propose two algorithms to handle this. The proposed algorithms traverse the graph only a few times to reduce the correlations among iterations, and they consider not only data-delay variations but also clock-skew variations. Although the first algorithm is a heuristic that may overestimate timing yields, experimental results show that it has an error of 0.16% on average in comparison with Monte Carlo (MC) simulation. Based on a cycle-breaking technique, the second heuristic algorithm can conservatively estimate timing yields. Both algorithms are much more efficient than MC simulation.
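The central computation, the probability of a positive cycle in a graph with random edge weights, can be grounded with the Monte Carlo baseline the authors compare against. This Python sketch (names illustrative, independent Gaussian edge weights assumed) detects a positive cycle by running Bellman-Ford on negated weights:

```python
import random

def has_positive_cycle(n, edges):
    """A positive cycle under weights w is a negative cycle under -w.
    Bellman-Ford from a virtual source (dist 0 to all n nodes):
    if an edge can still relax after n-1 passes, a cycle exists.
    edges: list of (u, v, w)."""
    dist = [0.0] * n
    for _ in range(n - 1):
        for u, v, w in edges:
            if dist[u] - w < dist[v]:
                dist[v] = dist[u] - w
    return any(dist[u] - w < dist[v] for u, v, w in edges)

def prob_positive_cycle(n, rand_edges, trials=2000, seed=1):
    """Monte Carlo estimate of P(positive cycle) when each edge
    weight is an independent Gaussian; rand_edges: (u, v, mu, sigma)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [(u, v, rng.gauss(mu, sigma))
                  for u, v, mu, sigma in rand_edges]
        hits += has_positive_cycle(n, sample)
    return hits / trials
```

The paper's algorithms avoid exactly this per-sample graph traversal; the sketch only fixes the quantity being estimated.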

21 citations


Proceedings Article•DOI•
Chuan Lin1, Hai Zhou1•
24 Jul 2006
TL;DR: A new efficient algorithm for retiming sequential circuits with edge-triggered registers under both setup and hold constraints is presented, which solves the same problem in O(|V|²|E|) time.
Abstract: In this paper, we present a new efficient algorithm for retiming sequential circuits with edge-triggered registers under both setup and hold constraints. Compared with the previous work (Papaefthymiou, 1998), which computed the minimum clock period in O(|V|³|E| lg |V|) time, our algorithm solves the same problem in O(|V|²|E|) time. Experimental results validate the efficiency of our algorithm.
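For context, the classic setup-only feasibility test that retiming algorithms build on can be sketched as below (a textbook Leiserson-Saxe-style check with assumed inputs; the paper's algorithm, which also handles hold constraints and achieves O(|V|²|E|), is more involved):

```python
def retiming_feasible(n, delay, edges, period):
    """Setup-constraint feasibility of clock period `period`.
    delay[v]: gate delay of node v; edges: (u, v, w) with w
    registers on wire u->v."""
    INF = float("inf")
    # W[u][v]: min registers on any u->v path; D[u][v]: max total
    # delay among those register-minimal paths (lexicographic
    # shortest paths under weight (w(e), -d(v))).
    W = [[INF] * n for _ in range(n)]
    D = [[-INF] * n for _ in range(n)]
    for v in range(n):
        W[v][v], D[v][v] = 0, delay[v]
    for u, v, w in edges:
        if (w, -(delay[u] + delay[v])) < (W[u][v], -D[u][v]):
            W[u][v], D[u][v] = w, delay[u] + delay[v]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if W[i][k] == INF or W[k][j] == INF:
                    continue
                w = W[i][k] + W[k][j]
                d = D[i][k] + D[k][j] - delay[k]  # k counted twice
                if (w, -d) < (W[i][j], -D[i][j]):
                    W[i][j], D[i][j] = w, d
    # Difference constraints r(u) - r(v) <= c become graph edges
    # (v, u, c); feasible iff no negative cycle (Bellman-Ford).
    cons = [(v, u, w) for u, v, w in edges]            # r(u)-r(v) <= w(e)
    for u in range(n):
        for v in range(n):
            if W[u][v] < INF and D[u][v] > period:
                cons.append((v, u, W[u][v] - 1))       # r(u)-r(v) <= W-1
    dist = [0] * n
    for _ in range(n - 1):
        for a, b, c in cons:
            if dist[a] + c < dist[b]:
                dist[b] = dist[a] + c
    return all(dist[a] + c >= dist[b] for a, b, c in cons)
```

For a two-gate ring (delay 3 each, one register per wire), period 3 is achievable but period 2 is not, since the cycle itself carries delay 3 per register.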

19 citations


Journal Article•DOI•
TL;DR: The authors establish a theoretical framework for statistical timing analysis with coupling and prove the convergence of their proposed iterative approach and discuss implementation issues under the assumption of a Gaussian distribution for the parameters of variation.
Abstract: As technology scales to smaller dimensions, increasing process variations and coupling-induced delay variations make timing verification extremely challenging. In this paper, the authors establish a theoretical framework for statistical timing analysis with coupling. They prove the convergence of their proposed iterative approach and discuss implementation issues under the assumption of a Gaussian distribution for the parameters of variation. A statistical timer based on their proposed approach is developed and experimental results are presented for the International Symposium on Circuits and Systems benchmarks. They compare their timer with a single-pass, noniterative statistical timer that does not consider the mutual dependence of coupling with timing, and another statistical timer that handles coupling deterministically. Monte Carlo simulations reveal a distinct gain (up to 24%) in accuracy by their approach in comparison to the others mentioned.

17 citations


Proceedings Article•DOI•
05 Nov 2006
TL;DR: In this article, the authors proposed a timing-dependent dynamic power estimation framework that considers the impact of coupling and glitches, and showed that relative switching activities and times of coupled nets significantly affect dynamic power consumption, and neither should be ignored during power estimation.
Abstract: In this paper, we propose a timing-dependent dynamic power estimation framework that considers the impact of coupling and glitches. We show that relative switching activities and times of coupled nets significantly affect dynamic power consumption, and neither should be ignored during power estimation. To capture the timing dependence, an approach to efficient representation and propagation of switching-window distributions through a circuit, considering coupling-induced delay variations, is developed. Based on the propagated switching-window distributions, power consumption in charging or discharging coupling capacitances is calculated, and accounted for in the total power. Experimental results for the ISCAS'85 benchmarks demonstrate that ignoring the impact of timing-dependent coupling on power can cause up to 59% error in coupling power estimation (up to 25% error in total power estimation).

15 citations


Proceedings Article•DOI•
Debasish Das1, Ahmed Shebaita1, Hai Zhou1, Yehea Ismail1, Kip Killpack2 •
01 Oct 2006
TL;DR: A novel and accurate coupling delay model is proposed, and techniques to increase the convergence rate of timing analysis when complex coupling models are employed are presented.
Abstract: This paper presents a framework for fast and accurate static timing analysis considering coupling. With technology scaling to smaller dimensions, the impact of coupling induced delay variations can no longer be ignored. Timing analysis considering coupling is iterative, and can have considerably larger run-times than a single pass approach. We propose a novel and accurate coupling delay model, and present techniques to increase the convergence rate of timing analysis when complex coupling models are employed. Experimental results obtained for the ISCAS benchmarks show promising accuracy improvements using our coupling model while an efficient iteration scheme shows significant speedup (up to 62.1%) in comparison to traditional approaches.

9 citations


Journal Article•DOI•
Chuan Lin1, Hai Zhou1•
TL;DR: A new algorithm is presented that solves the optimal wire retiming problem with polynomial-time worst-case complexity and is essentially incremental, which gives it the potential of being combined with other optimization techniques.
Abstract: The problem of retiming over a netlist of macroblocks to achieve minimal clock period, where block internal structures may not be changed and flip-flops may not be inserted on some wire segments, is called the optimal wire retiming problem. This paper presents a new algorithm that solves the optimal wire retiming problem with polynomial-time worst-case complexity. Since the new algorithm avoids binary search and is essentially incremental, it has the potential of being combined with other optimization techniques. Experimental results show that the new algorithm is very efficient in practice.

Journal Article•DOI•
TL;DR: A gate-sizing algorithm for coupling-noise reduction that optimizes the area or power consumption of a circuit while ensuring that its timing constraints are met; the algorithm combines the solutions of two subproblems of gate-size optimization under noise and timing constraints, respectively.
Abstract: This paper presents a gate-sizing algorithm for coupling-noise reduction, which optimizes the area or power consumption (represented as a weighted sum of gate sizes) of a circuit while ensuring that its timing constraints are met. A problem for gate-size optimization under coupling-noise and timing constraints is formulated, and is broken down into two subproblems of gate-size optimization under noise and timing constraints, respectively. The subproblem of gate-size optimization under noise constraints is solved as a fixpoint computation problem on a complete lattice. The proposed algorithm to solve this problem is guaranteed to yield the optimal solution, provided it exists. The subproblem for circuit optimization under timing constraints is treated as a geometric programming problem. The solutions to the two problems are finally combined to solve the original problem in a Lagrangian relaxation (LR) framework. Experimental results demonstrating the effectiveness of the algorithms are reported for the International Symposium on Circuits and Systems (ISCAS) benchmarks and larger circuits. The obtained results are compared to an approach where successive iterations of gate sizing are performed for timing and for noise reduction independently; this alternative design approach is driven by the algorithms used to solve the two subproblems, respectively.
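The fixpoint view of the noise subproblem can be illustrated with a generic Kleene iteration; the toy "noise rule" below is hypothetical (not the paper's model) and only shows why a monotone requirement function on a finite lattice of gate sizes has a least fixpoint reachable by iteration:

```python
def least_fixpoint(f, bottom, max_iter=1000):
    """Kleene iteration: for a monotone f on a finite-height
    lattice, iterating x <- f(x) from the bottom element reaches
    the least fixpoint, i.e. the minimal sizing meeting all
    constraints (if one exists below the lattice top)."""
    x = bottom
    for _ in range(max_iter):
        nx = f(x)
        if nx == x:
            return x
        x = nx
    raise RuntimeError("no convergence within max_iter")

def f(sizes):
    # Toy monotone noise rule (hypothetical): gate 0 needs size >= 5
    # to meet its noise constraint, and each gate must be within one
    # size step of its neighbor; sizes live on the lattice {1..8}^2.
    a, b = sizes
    return (min(8, max(a, 5, b - 1)), min(8, max(b, a - 1)))
```

Starting from the bottom (1, 1), the iteration settles at (5, 4), the smallest sizing satisfying both toy constraints.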

Journal Article•DOI•
TL;DR: The authors propose in this paper a more efficient data structure for the merge operations, with parameters that adjust adaptively, that works better than Shi's in all cases: unbalanced, balanced, and mixed sizes.
Abstract: Dynamic programming is a useful technique for handling slicing floorplan, technology mapping, and buffering problems, where many max-plus merge operations of solution lists are needed. Shi proposed an efficient O(n log n)-time algorithm to speed up the merge operation. Based on balanced binary search trees, his algorithm showed superb performance for the most unbalanced sizes of merging solution lists. The authors propose in this paper a more efficient data structure for the merge operations. With parameters that adjust adaptively, their algorithm works better than Shi's in all cases: unbalanced, balanced, and mixed sizes. Their data structure is also simpler.
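The semantics of the max-plus merge can be pinned down with a brute-force reference implementation (quadratic, with assumed (cost, height) solution pairs; Shi's balanced-BST algorithm and the authors' adaptive structure compute the same nondominated list far faster):

```python
def maxplus_merge(A, B):
    """Max-plus merge of two solution lists of (cost, height)
    pairs: combine every pair as (costA + costB, max(hA, hB))
    and keep only the nondominated solutions, sorted by cost.
    Brute force, for reference only."""
    cand = [(ac + bc, max(ah, bh)) for ac, ah in A for bc, bh in B]
    cand.sort()  # by cost, then height
    out = []
    best_h = float("inf")
    for c, h in cand:
        if h < best_h:        # strictly better height => nondominated
            out.append((c, h))
            best_h = h
    return out
```

This is the operation performed at every internal node of a slicing-floorplan or buffering DP; the papers' contribution is doing it without enumerating all pairs.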

Proceedings Article•DOI•
27 Mar 2006
TL;DR: The problem of floorplanning for processing rate optimization is formulated and solved; it is shown that the minimal ratio of the flip-flop number to the delay on any cycle is an upper bound on the processing rate.
Abstract: The performance of a sequential system is usually measured by its frequency. However, with the appearance of global interconnects that require multiple clock periods to communicate, throughput is usually traded off for higher frequency (for example, through wire pipelining or latency-insensitive design). Therefore, we propose to use the processing rate, defined as the amount of processed inputs per unit time, as the performance measure. We show that the minimal ratio of the flip-flop number to the delay on any cycle is an upper bound on the processing rate. Since the processing rate of a sequential system is mainly decided by its floorplan when interconnect delays are dominant, the problem of floorplanning for processing rate optimization is formulated and solved. We optimize the processing rate bound directly in a floorplanner by applying Howard's algorithm incrementally. Experimental results confirm the effectiveness of our approach.

Journal Article•DOI•
TL;DR: This paper formulates the problem of processing rate optimization as seeking an optimal clustering with the minimal maximum cycle ratio in a general graph, and presents an iterative algorithm to solve it.
Abstract: Clustering (or partitioning) is a crucial step between logic synthesis and physical design in the layout of a large scale design. A design verified at the logic synthesis level may have timing closure problems at post-layout stages due to the emergence of multiple-clock-period interconnects. Consequently, a tradeoff between clock frequency and throughput may be needed to meet the design requirements. In this paper, we find that the processing rate, defined as the product of frequency and throughput, of a sequential system is upper bounded by the reciprocal of its maximum cycle ratio, which is only dependent on the clustering. We formulate the problem of processing rate optimization as seeking an optimal clustering with the minimal maximum cycle ratio in a general graph, and present an iterative algorithm to solve it. Experimental results validate the efficiency of our algorithm.
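The maximum cycle ratio that bounds the processing rate can be computed, for small graphs, with Lawler's textbook binary-search method sketched below (the paper's iterative clustering algorithm, and Howard's algorithm used in the companion floorplanning work, are different; this only illustrates the quantity itself):

```python
def max_cycle_ratio(n, edges, tol=1e-6):
    """Lawler's binary search for max over cycles of
    sum(w) / sum(t); edges: (u, v, w, t) with t > 0."""
    def has_cycle_above(r):
        # A cycle with sum(w - r*t) > 0 is a strictly negative
        # cycle under weights r*t - w; detect via Bellman-Ford
        # from a virtual source (all distances start at 0).
        dist = [0.0] * n
        for _ in range(n):
            for u, v, w, t in edges:
                nv = dist[u] + r * t - w
                if nv < dist[v] - 1e-12:
                    dist[v] = nv
        return any(dist[u] + r * t - w < dist[v] - 1e-12
                   for u, v, w, t in edges)

    # Any cycle ratio is at most the largest single-edge w/t.
    lo, hi = 0.0, max(w / t for _, _, w, t in edges)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if has_cycle_above(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

On a two-node cycle of ratio 0.5 plus a self-loop of ratio 1.0, the maximum cycle ratio is 1.0; the paper's processing rate bound is its reciprocal.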

Proceedings Article•DOI•
06 Mar 2006
TL;DR: In this paper, the authors proposed an algorithm for optimal bit width precision for two variables and a greedy heuristic which works for any number of variables for low power in a SystemC design environment.
Abstract: The modern era of embedded system design is geared towards the design of low-power systems. One way to reduce power in an ASIC implementation is to reduce the bit-width precision of its computation units. This paper describes algorithms to optimize the bit-widths of fixed-point variables for low power in a SystemC design environment. We propose an algorithm for optimal bit-width precision for two variables and a greedy heuristic which works for any number of variables. The algorithms are used in the automation of converting floating-point SystemC programs into ASIC-synthesizable SystemC programs. Expected inputs are profiled to estimate errors in the finite precision conversions. Experimental results on the trade-offs between quantization error, power consumption, and hardware resources used are reported on a set of four SystemC benchmarks that are mapped onto a 0.18-micron ASIC cell library from Artisan Components. We demonstrate that it is possible to reduce the power consumption by 50% on average by allowing round-off errors to increase from 0.5% to 1%.
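A greedy heuristic of the kind described can be sketched as follows; the error and power models passed in below are toy stand-ins (the paper profiles expected inputs to estimate quantization error, and its exact cost models are not reproduced here):

```python
def greedy_bitwidths(names, error_of, power_of, err_budget,
                     min_bits=4, max_bits=32):
    """Greedy bit-width reduction: start every fixed-point variable
    at max_bits and repeatedly trim one bit from the variable whose
    reduction saves the most power while keeping total quantization
    error within err_budget; stop when no trim is feasible.
    error_of/power_of map a {name: bits} dict to a scalar."""
    widths = {v: max_bits for v in names}
    while True:
        best = None
        for v in names:
            if widths[v] <= min_bits:
                continue
            trial = dict(widths)
            trial[v] -= 1
            if error_of(trial) <= err_budget:
                saving = power_of(widths) - power_of(trial)
                if best is None or saving > best[0]:
                    best = (saving, v)
        if best is None:
            return widths
        widths[best[1]] -= 1
```

With a toy error model of 2^-bits per variable and power proportional to total bits, an error budget of 0.125 trims two variables from 8 bits each down to 4 bits each.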

Proceedings Article•DOI•
Jia Wang1, Hai Zhou1•
24 Jul 2006
TL;DR: An optimal algorithm for jumper insertion under the ratio upper-bound is presented, which handles Steiner trees with obstacles and works on free trees.
Abstract: The antenna effect may damage gate oxides during plasma-based fabrication processes. The antenna ratio of total exposed antenna area to total gate oxide area is directly related to the amount of damage. Jumper insertion is a common technique applied at the routing and post-layout stages to avoid and to fix the problems caused by the antenna effect. This paper presents an optimal algorithm for jumper insertion under a ratio upper bound. It handles Steiner trees with obstacles. The algorithm is based on dynamic programming and works on free trees. The time complexity is O(α|V|²) and the space complexity is O(|V|²), where |V| is the number of nodes in the routing tree and α is a factor depending on how to find a non-blocked position on a wire for a jumper.

Proceedings Article•
30 Apr 2006
TL;DR: The 16th edition of the Great Lakes Symposium on VLSI (GLSVLSI'06) was held in Philadelphia, United States.
Abstract: Welcome to the 16th edition of the Great Lakes Symposium on VLSI (GLSVLSI'06) and the city of Philadelphia. Since its first meeting in March 1991 at Kalamazoo, Michigan, GLSVLSI has traveled beyond the Great Lakes and become an international conference with submissions from all over the United States and the world. It has emerged as a premier conference for publishing innovations in VLSI. This year, 219 papers were submitted, of which 82 (a 20.1% acceptance rate for full papers and 37.4% overall) were accepted for presentation at the symposium and publication in the proceedings. The final technical program consists of 44 full papers in 12 sessions and 38 poster papers in 2 poster sessions. Congratulations to Garrett S. Rose, Adam C. Cabe, Nadine Gergel-Hackett, Nabanita Majumdar, Mircea R. Stan, John C. Bean, Lloyd R. Harriott, Yuxing Yao, and James M. Tour for winning the GLSVLSI 2006 Best Student Paper Award sponsored by Intel. Their paper "Design Approaches for Hybrid CMOS/Molecular Memory based on Experimental Device Data" will be the first presentation of the symposium. They will also receive the prize from Intel at Monday's dinner banquet. This year's tutorial, "DFM: Swimming Upstream", will be conducted by Dan Page, Jamil Kawa, and Charles Chiang of Synopsys. The tutorial is free to all attendees and local universities thanks to the generous donations of our corporate supporters. The keynote speaker at Monday's dinner banquet is Jeff Parkhurst, Intel's academic research programs manager. The talk title is: "From single core to multi-core to many core: Are we ready for a new exponential?"


01 Jan 2006
TL;DR: This research investigates the essential problems of timing verification, power estimation, and circuit (area or power) optimization under crosstalk and variability, and shows that a circuit optimization problem under constraints on the maximal induced noise on each wire is equivalent to a fixpoint computation problem in a complete lattice.
Abstract: With very large scale integrated (VLSI) circuit fabrication entering the deep sub-micron era, devices are scaled down to finer geometries, clocks are run at higher frequencies, and more functionality is integrated into one chip. All these bring a great promise of "system-on-a-chip", but also introduce challenging new issues in the design process. As a result of the increasing frequency and density, coupling effects, or crosstalk, between neighboring wires are increased. These effects can cause functionality and timing failures in a circuit. The dynamic power consumption in charging or discharging coupling capacitances is timing dependent, and contributes significantly to a circuit's power consumption. In addition, manufacturing process variations (e.g. VT, Le) and environmental variations (e.g. Vdd, temperature) contribute to uncertainties that deeply impact the timing characteristics of a circuit. This variability makes timing verification, and consequently timing-driven circuit optimization, extremely difficult. Although worst-case analyses for circuit optimization are simpler, they are not desirable since they severely over-constrain the optimization problem and result in designs that have excessive penalties in terms of area or power consumption. In this research, we investigate the essential problems of timing verification, power estimation, and circuit (area or power) optimization under crosstalk and variability. We show that a circuit optimization problem under constraints on the maximal induced noise on each wire is equivalent to a fixpoint computation problem in a complete lattice. An optimal algorithm for solving this problem is developed, and is extended to handle variations. Under explicit timing constraints, we solve this problem in a Lagrangian relaxation framework. We present a timing yield driven circuit optimization algorithm that considers variability and is based on statistical timing methodologies.
Approaches to fast and approximation-error-aware statistical timing analysis are developed that also consider effects due to coupling as well as variability. Multiple-input switching effects are considered for improved timing accuracy. We highlight the importance of the timing dependence of dynamic power consumption in coupling capacitances, and develop an algorithm for accurate and efficient power estimation. Experimental results validate our approaches and are promising.

Hai Zhou1, Chuan Lin1•
01 Jan 2006
TL;DR: It is shown that the trade-off between a level-sensitive latch and an edge-triggered flop can be leveraged in a sequential circuit design with crosstalk, so that the clock period is minimized by selecting a configuration of mixed latches and flops.
Abstract: With the advent of the deep sub-micron (DSM) era, the "system-on-chip" (SOC) has become mainstream in the IC industry. Semiconductor devices based on smaller feature sizes offer the promise of faster and more highly integrated designs, but also present a number of new challenges. In SOCs, a large amount of communication time is spent on global multi-clock-period interconnects, which present themselves as the main performance-limiting factor. How to handle global interconnects for performance optimization becomes an urgent issue. Another challenge is the increasing coupling effect (also known as crosstalk) between neighboring interconnects. Besides introducing noise on quiet interconnects, crosstalk can change interconnect delays and cause timing violations in the circuit. In this dissertation, we investigate and propose solutions to a few problems involving global interconnects and crosstalk. To handle global interconnects, we propose techniques at different stages of the design flow. At the physical layout stage, we propose to pipeline global interconnects by relocating flip-flops (an operation also known as retiming). Three efficient algorithms are designed to find an optimal retiming with the minimal clock period. We then solve the problem of retiming under both setup and hold constraints more efficiently than the best-known algorithm in the literature. We also consider clock skew scheduling for prescribed skew domains and give an optimal polynomial-time algorithm to minimize the clock period with possible delay padding. At the clustering stage, we propose an iterative algorithm that finds an optimal clustering with the minimal maximum cycle ratio. At the register transfer level (RTL), we use delay relaxation to do interconnect planning. For crosstalk, we propose a circular time representation under which coupling detection is easier and more efficient than state-of-the-art approaches.
Using the circular time representation, clock schedule verification with crosstalk is more efficient. We show that the trade-off between a level-sensitive latch and an edge-triggered flop can be leveraged in a sequential circuit design with crosstalk, so that the clock period is minimized by selecting a configuration of mixed latches and flops. We design an effective and efficient algorithm to solve this problem.