
Showing papers by Hai Zhou published in 2009


Proceedings Article•DOI•
26 Jul 2009
TL;DR: This work presents a statistical analysis framework that characterizes the lifetime reliability of nanometer-scale integrated circuits by jointly considering the impact of fabrication-induced process variation and run-time aging effects, and focuses on characterizing circuit threshold voltage lifetime variation.
Abstract: Circuit reliability is affected by various fabrication-time and run-time effects. Fabrication-induced process variation has significant impact on circuit performance and reliability. Various aging effects, such as negative bias temperature instability, cause continuous performance and reliability degradation during circuit run-time usage. In this work, we present a statistical analysis framework that characterizes the lifetime reliability of nanometer-scale integrated circuits by jointly considering the impact of fabrication-induced process variation and run-time aging effects. More specifically, our work focuses on characterizing circuit threshold voltage lifetime variation and its impact on circuit timing due to process variation and the negative bias temperature instability effect, a primary aging effect in nanometer-scale integrated circuits. The proposed work is capable of characterizing the overall circuit lifetime reliability, as well as efficiently quantifying the vulnerabilities of individual circuit elements. This analysis framework has been carefully validated and integrated into an iterative design flow for circuit lifetime reliability analysis and optimization.
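
The abstract does not give the underlying model, but the idea can be illustrated with a small Monte Carlo sketch: sample a fabrication-time threshold voltage from a Gaussian, then add a power-law NBTI shift dVth(t) = A*t^n, a form commonly used for NBTI. All constants and distributions below are hypothetical, not values from the paper.

```python
import random

# Illustrative only: joint effect of process-induced Vth spread and a
# power-law NBTI aging shift, dVth(t) = A * t**n. Constants are made up.
VTH_MEAN, VTH_SIGMA = 0.30, 0.02          # volts, fabrication-time spread
A, N_EXP = 0.001, 0.2                     # hypothetical aging coefficients
T_SECONDS = 10 * 365 * 24 * 3600          # ten years of operation

def aged_vth(t):
    vth0 = random.gauss(VTH_MEAN, VTH_SIGMA)   # process variation
    return vth0 + A * t ** N_EXP               # plus NBTI aging shift

samples = sorted(aged_vth(T_SECONDS) for _ in range(100_000))
mean = sum(samples) / len(samples)
p99 = samples[int(0.99 * len(samples))]
print(f"aged Vth: mean = {mean:.3f} V, 99th percentile = {p99:.3f} V")
```

The tail percentile, rather than the mean, is what a lifetime-reliability analysis must bound, since the slowest dies dominate timing yield.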

101 citations


Journal Article•DOI•
TL;DR: The generalized convex sizing (GCS) problem is formulated, unifying sizing problems and applying to sequential circuits with clock-skew optimization, and an algorithm based on the method of feasible directions and min-cost network flow is designed to solve proper GCS problems.
Abstract: In this paper, we formulate the generalized convex sizing (GCS) problem that unifies the sizing problems and applies to sequential circuits with clock-skew optimization. We revisit the approach to solve the sizing problem by Lagrangian relaxation, point out several misunderstandings in the previous paper, and extend the approach to handle general convex delay functions in the GCS problems. We identify a class of proper GCS problems whose objective functions in the simplified dual problem are differentiable and transform the simultaneous sizing and clock-skew optimization problem into a proper GCS problem. We design an algorithm based on the method of feasible directions and min-cost network flow to solve proper GCS problems. The algorithm provides evidence for infeasible GCS problems according to a condition we derive. Experimental results confirm the efficiency and the effectiveness of our algorithm when the Elmore delay model is used.
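
For reference, the Elmore delay model used in the experiments has a simple closed form on an RC tree: the delay to a sink is the sum, over all capacitors, of the resistance shared between the root-to-sink and root-to-capacitor paths times the capacitance. A minimal sketch on a toy tree (node names and R/C values are hypothetical):

```python
# Each node maps to (parent, resistance to parent in ohms, capacitance in F);
# the root has parent None. Values are made up for illustration.
tree = {
    "root": (None,   0.0, 10e-15),
    "a":    ("root", 100.0, 20e-15),
    "b":    ("a",    150.0,  5e-15),
    "c":    ("a",     80.0, 15e-15),
}

def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = tree[node][0]
    return path

def elmore_delay(sink):
    """Sum over every node k of C_k times the resistance common to the
    root->sink and root->k paths."""
    on_sink_path = set(path_to_root(sink))
    delay = 0.0
    for k, (_, _, cap) in tree.items():
        shared_r = sum(tree[n][1] for n in path_to_root(k) if n in on_sink_path)
        delay += shared_r * cap
    return delay

print(f"Elmore delay to node b: {elmore_delay('b') * 1e12:.2f} ps")
```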

31 citations


Proceedings Article•
Chunyang Feng, Hai Zhou, Changhao Yan, Jun Tao, Xuan Zeng •
26 Jul 2009
TL;DR: This paper develops a dummy fill algorithm that is both efficient and provably good, and proposes a new greedy iterative algorithm to achieve high-quality solutions more efficiently than previous Monte-Carlo-based heuristic methods.
Abstract: To reduce chip-scale topography variation in the Chemical Mechanical Polishing (CMP) process, dummy fill is widely used to improve layout density uniformity. Previous research formulated the dummy fill problem as a standard Linear Program (LP). However, solving the huge linear program formed by real-life designs is very expensive and has become the hurdle in deploying the technology. Even though there exist efficient heuristics, their performance cannot be guaranteed. In this paper, we develop a dummy fill algorithm that is both efficient and provably good. It is based on a fully polynomial time approximation scheme by Fleischer [4] for covering LP problems. Furthermore, based on the approximation algorithm, we also propose a new greedy iterative algorithm to achieve high-quality solutions more efficiently than previous Monte-Carlo-based heuristic methods. Experimental results demonstrate the effectiveness and efficiency of our algorithms.
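
The abstract does not reproduce the LP, but a covering-style formulation of dummy fill (our notation, shown only to make the connection to covering LPs concrete) constrains the post-fill density of every window w:

```latex
\min \sum_{t} x_t
\quad \text{s.t.} \quad
\sum_{t \in w} x_t + d_0(w) \;\ge\; L \;\; \forall\, w,
\qquad 0 \le x_t \le u_t \;\; \forall\, t,
```

where x_t is the fill area inserted into tile t, d_0(w) the existing feature density of window w, L the required density lower bound, and u_t the slack area available in tile t. Lower-bound constraints of this shape are exactly the covering form to which Fleischer's fully polynomial time approximation scheme applies.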

13 citations


Proceedings Article•DOI•
26 Jul 2009
TL;DR: This work proposes a methodology to explore concurrency via nondeterministic transactional algorithm design and to program such algorithms on multicore processors for CAD applications, and applies it to the min-cost flow problem.
Abstract: Computational complexity has been the primary challenge of many VLSI CAD applications. The emerging multicore and many-core microprocessors have the potential to offer scalable performance improvement. How to exploit multicore resources to speed up CAD applications is thus a natural question but also a huge challenge for CAD researchers. Indeed, decades of work on general-purpose compilation approaches that automatically extract parallelism from a sequential program have shown limited success. Past work has shown that programming models and algorithm design methods have a great influence on usable parallelism. In this paper, we propose a methodology to explore concurrency via nondeterministic transactional algorithm design, and to program such algorithms on multicore processors for CAD applications. We apply the proposed methodology to the min-cost flow problem, which has been identified as the key problem in many design optimizations, from wire-length optimization in detailed placement to timing-constrained voltage assignment. A concurrent algorithm and its implementation on multicore processors for min-cost flow have been developed based on the methodology. Experiments on voltage island generation in floorplanning demonstrate its efficiency and scalable speedup over different numbers of cores.
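
The abstract does not spell out the algorithm, so the sketch below only illustrates the pattern it names: many small nondeterministic operations, each applied as an atomic "transaction" by concurrent workers. Here the operation is the edge-relaxation kernel that underlies shortest-path-based min-cost flow; the locking scheme and convergence loop are our own illustration, not the paper's implementation.

```python
import threading

def concurrent_relax(n, edges, source, workers=4):
    """Relax edges concurrently until a fixpoint. Each relaxation locks
    the two touched nodes in a fixed global order, re-checks its
    precondition, and commits -- a toy stand-in for a transaction."""
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0.0
    locks = [threading.Lock() for _ in range(n)]
    changed = threading.Event()

    def relax(u, v, w):                    # assumes no self-loops (u != v)
        a, b = sorted((u, v))              # fixed lock order avoids deadlock
        with locks[a], locks[b]:
            if dist[u] + w < dist[v]:      # precondition re-checked inside
                dist[v] = dist[u] + w      # commit
                changed.set()

    def worker(chunk):
        for u, v, w in chunk:
            relax(u, v, w)

    while True:                            # sweep until no transaction commits
        changed.clear()
        threads = [threading.Thread(target=worker, args=(edges[i::workers],))
                   for i in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        if not changed.is_set():
            return dist

edges = [(0, 1, 2.0), (0, 2, 5.0), (1, 2, 1.0), (2, 3, 1.0)]
print(concurrent_relax(4, edges, source=0))   # [0.0, 2.0, 3.0, 4.0]
```

The nondeterminism is deliberate: the fixpoint is the same regardless of the order in which transactions commit, which is what makes such algorithms safe to parallelize.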

13 citations


Proceedings Article•DOI•
16 Mar 2009
TL;DR: Imodel is presented, a simple nonlinear logic cell model that can be derived from typical cell libraries such as NLDM, with accuracy much higher than NLDM-based cell delay models and a maximum runtime penalty of 19% compared to NLDM-based cell delay models on medium-sized industrial designs.
Abstract: Logic cell modeling is an important component in the analysis and design of CMOS integrated circuits, mostly due to the nonlinear behavior of CMOS cells with respect to the voltage signals at their input and output pins. A current-based model for CMOS logic cells is presented which can be used for effective crosstalk noise and delta delay analysis in CMOS VLSI circuits. Existing current source models are expensive and need a new set of SPICE-based characterization which is not compatible with typical EDA tools. In this paper we present Imodel, a simple nonlinear logic cell model that can be derived from typical cell libraries such as NLDM, with accuracy much higher than NLDM-based cell delay models. In fact, our experiments show an average error of 3% compared to SPICE. This level of accuracy comes with a maximum runtime penalty of 19% compared to NLDM-based cell delay models on medium-sized industrial designs.
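
An NLDM library is essentially a set of two-dimensional tables, delay or output slew indexed by input slew and output load, interpolated between grid points; a derived current model reuses the same machinery. A generic bilinear-interpolation sketch (table values are hypothetical, not from any real library):

```python
import bisect

# Hypothetical NLDM-style table: rows indexed by input slew (ns),
# columns by output load (pF), entries are cell delay (ns).
slews = [0.1, 0.5, 1.0]
loads = [0.01, 0.05, 0.20]
delay = [
    [0.05, 0.12, 0.40],
    [0.08, 0.16, 0.46],
    [0.12, 0.22, 0.55],
]

def lookup(slew, load):
    """Bilinear interpolation between four grid points (indices clamped
    to the table)."""
    i = max(0, min(bisect.bisect_right(slews, slew) - 1, len(slews) - 2))
    j = max(0, min(bisect.bisect_right(loads, load) - 1, len(loads) - 2))
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    return (delay[i][j] * (1 - ts) * (1 - tl)
            + delay[i][j + 1] * (1 - ts) * tl
            + delay[i + 1][j] * ts * (1 - tl)
            + delay[i + 1][j + 1] * ts * tl)

print(f"delay at slew=0.3 ns, load=0.10 pF: {lookup(0.3, 0.10):.3f} ns")
```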

10 citations


Proceedings Article•DOI•
Yao Zhao, Sagar Vemuri, Jiazhen Chen, Yan Chen, Hai Zhou, Zhi Fu •
29 Sep 2009
TL;DR: This paper identifies a practical way to launch DoS attacks on security protocols by triggering exceptions, and shows that even the latest strongly authenticated protocols such as PEAP, EAP-TLS and EAP-TTLS are vulnerable to these attacks.
Abstract: Security protocols are not as secure as we assumed. In this paper, we identify a practical way to launch DoS attacks on security protocols by triggering exceptions. Through experiments, we show that even the latest strongly authenticated protocols, such as PEAP, EAP-TLS and EAP-TTLS, are vulnerable to these attacks. Real attacks have been implemented and tested against TLS-based EAP protocols, the major family of security protocols for Wireless LAN, as well as the Return Routability of Mobile IPv6, an emerging lightweight security protocol in the new IPv6 infrastructure. DoS attacks on PEAP, one popular TLS-based EAP protocol, were performed and tested on a major university's wireless network, and the attacks were highly successful. We further tested the scalability of our attack through a series of ns-2 simulations. Countermeasures for detection of such attacks and improvements of the protocols to overcome these types of DoS attacks are also proposed and verified experimentally.
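
The abstract does not detail the proposed countermeasures; purely as an illustration of the direction it suggests, a handshake endpoint can refuse to tear down state on failure messages that arrive before the peer is authenticated, feeding them to a detector instead. Everything below (class, method names, threshold) is a hypothetical sketch, not the paper's design.

```python
# Hypothetical hardening sketch: do not abort a handshake on an
# unauthenticated failure/exception message; count it as suspicious.
class Handshake:
    SUSPICION_THRESHOLD = 10               # made-up detection threshold

    def __init__(self):
        self.peer_authenticated = False
        self.suspicious_failures = 0

    def on_failure_message(self, integrity_protected: bool):
        if integrity_protected and self.peer_authenticated:
            self.abort()                   # genuine, authenticated failure
        else:
            self.suspicious_failures += 1  # likely forged: keep the session
            if self.suspicious_failures > self.SUSPICION_THRESHOLD:
                self.alert_detector()

    def abort(self):
        print("aborting handshake")

    def alert_detector(self):
        print("possible exception-triggering DoS attack")
```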

10 citations


Proceedings Article•DOI•
Min Gong, Hai Zhou, Jun Tao, Xuan Zeng •
02 Nov 2009
TL;DR: The binning optimization problem, which decides bin boundaries and their testing order to maximize benefit (considering test cost) for a transparently latched circuit, is formulated and solved.
Abstract: With increasing process variation, binning has become an important technique to improve the values of fabricated chips, especially in high performance microprocessors where transparent latches are widely used. In this paper, we formulate and solve the binning optimization problem that decides the bin boundaries and their testing order to maximize the benefit (considering the test cost) for a transparently latched circuit. The problem is decomposed into three sub-problems which are solved sequentially. First, to compute the clock period distribution of the transparently latched circuit, a sample-based SSTA approach is developed which is based on the generalized stochastic collocation method (gSCM) with the Sparse Grid technique. The minimal clock period at each sample point is found by solving a minimal cycle ratio problem on the constraint graph. Second, a greedy algorithm is proposed to maximize the sales profit by iteratively assigning each boundary to its optimal position. Then, an optimal algorithm of O(n log n) runtime, based on alphabetic trees, is used to generate the optimal testing order of bin boundaries to minimize the test cost. Experiments on all the ISCAS'89 sequential benchmarks with 65-nm technology show a 6.69% profit improvement and a 14.00% cost reduction on average. The results also demonstrate that the proposed SSTA method achieves an average error of 0.70% and an average speedup of 110X compared with Monte Carlo simulation.
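
The per-sample kernel, finding a minimal clock period via a cycle ratio problem on the constraint graph, can be illustrated with the standard binary-search scheme for cycle ratios: a ratio bound is feasible exactly when a reweighted graph has no negative cycle. The toy graph below and the generic maximum-cycle-ratio setting are our own illustration; the paper's latch constraint graph construction is more involved.

```python
def has_negative_cycle(n, edges):
    """Bellman-Ford negative-cycle detection from a virtual source."""
    dist = [0.0] * n                       # virtual source reaches all nodes
    for _ in range(n):
        updated = False
        for u, v, c in edges:
            if dist[u] + c < dist[v] - 1e-12:
                dist[v] = dist[u] + c
                updated = True
        if not updated:
            return False                   # converged: no negative cycle
    return True                            # still updating after n rounds

def max_cycle_ratio(n, arcs, lo=0.0, hi=100.0, eps=1e-6):
    """Binary search for max over cycles of (sum delay)/(sum weight),
    weight > 0: lam is an upper bound on the ratio iff edge costs
    lam*w - d admit no negative cycle."""
    while hi - lo > eps:
        mid = (lo + hi) / 2
        costs = [(u, v, mid * w - d) for u, v, d, w in arcs]
        if has_negative_cycle(n, costs):
            lo = mid                       # mid is below the true ratio
        else:
            hi = mid
    return hi

# Arcs are (u, v, delay, weight). Cycle 0->1->0 has ratio (3+1)/2 = 2,
# cycle 1->2->1 has ratio (2+4)/2 = 3, so the answer is ~3.
arcs = [(0, 1, 3.0, 1), (1, 0, 1.0, 1), (1, 2, 2.0, 1), (2, 1, 4.0, 1)]
print(f"max cycle ratio ~ {max_cycle_ratio(3, arcs):.4f}")
```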

7 citations


Journal Article•DOI•
TL;DR: A timing-dependent dynamic power estimation framework that considers the impact of coupling in combinational circuits is proposed; based on propagated switching and timing distributions, power consumption in coupling capacitances is accurately calculated.
Abstract: In this paper, a timing-dependent dynamic power estimation framework that considers the impact of coupling in combinational circuits is proposed. Relative switching activities and delays of coupled interconnects significantly affect dynamic power dissipation in parasitic coupling capacitances (coupling power). To capture this switching and timing dependence, detailed switching distributions and timing information are essential for accurate estimation of dynamic power consumption. An approach to efficiently represent and propagate switching and timing distributions through circuits is developed. Based on the propagated switching and timing distributions, power consumption in coupling capacitances is accurately calculated. Experimental results using ISCAS'85 benchmarks demonstrate that ignoring the timing dependence of coupling power consumption can cause up to 25% error in dynamic power estimation (corresponding to 59% error in coupling power estimation).
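
The switching dependence the paper targets shows up already in textbook coupling-energy accounting: the charge drawn through a coupling capacitance C_c between two nets depends on how they switch relative to each other, often summarized by a Miller coupling factor. This is a generic sketch, not the paper's exact model:

```latex
P_c \;=\; \tfrac{1}{2}\,\alpha\,\mathrm{MCF}\cdot C_c\,V_{dd}^{2}\,f,
\qquad
\mathrm{MCF} =
\begin{cases}
0, & \text{both nets switch in the same direction,}\\
1, & \text{one net is quiet,}\\
2, & \text{the nets switch in opposite directions,}
\end{cases}
```

with alpha the switching activity and f the clock frequency. Which case applies depends on the relative timing of the transitions, which is why ignoring timing dependence produces the large coupling power errors the experiments report.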

6 citations


Journal Article•DOI•
TL;DR: This journal special section will cover recent progress on parallel CAD research, including algorithm foundations, programming models, parallel architecture-specific optimization, and verification, as well as other topics relevant to the design of parallel CAD algorithms and software tools.
Abstract: High-performance parallel computer architectures and systems have improved at a phenomenal rate. In the meantime, VLSI computer-aided design (CAD) software for multi-billion-transistor IC design has become increasingly complex and requires prohibitively high computational resources. Recent studies have shown that numerous CAD problems, with their high computational complexity, can greatly benefit from the fast-increasing parallel computation capabilities. However, parallel programming poses big challenges for CAD applications. Fully exploiting the computational power of emerging general-purpose and domain-specific multi-core/many-core processor systems calls for fundamental research and engineering practice across every stage of parallel CAD design, from algorithm exploration, programming models, and design-time and run-time environments, to CAD applications such as verification, optimization, and simulation. This journal special section will cover recent progress on parallel CAD research, including algorithm foundations, programming models, parallel architecture-specific optimization, and verification. More specifically, papers with in-depth and extensive coverage of the following topics will be considered, as well as other topics relevant to the design of parallel CAD algorithms and software tools.

4 citations


Patent•
Hai Zhou, Jia Wang •
29 Jan 2009
TL;DR: In this article, the authors propose a method for use in electronic design software that efficiently and optimally minimizes or reduces register/flip-flop area, or the number of registers/flip-flops, in a VLSI circuit design without changing circuit timing or functionality.
Abstract: A method for use in electronic design software efficiently and optimally minimizes or reduces register/flip-flop area, or the number of registers/flip-flops, in a VLSI circuit design without changing circuit timing or functionality. The method dynamically generates constraints, maintains the generated constraints as a regular tree, and incrementally relocates registers/flip-flops and/or reduces their number in the circuit design.
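
For background, methods of this kind build on the classical min-area retiming formulation of Leiserson and Saxe, in which an integer label r(v) counts the registers moved backward across each gate v (shown here as standard context, not as the patent's claims):

```latex
\min_{r}\;\; \sum_{(u,v)\in E}\bigl(w(u,v) + r(v) - r(u)\bigr)
\quad \text{s.t.} \quad
w(u,v) + r(v) - r(u)\;\ge\;0 \quad \forall\,(u,v)\in E,
```

where w(u,v) is the register count on edge (u,v) and the objective counts registers after retiming; preserving the clock period adds further difference constraints on r. Such a constraint set can be generated dynamically and maintained incrementally rather than built all at once, which is the direction the abstract describes.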

Proceedings Article•DOI•
19 Jan 2009
TL;DR: A new floorplanning approach called Constrained Adjacency Graph (CAG) is described that helps explore adjacency in floorplans and shows that better floorplans are found with much less running time for problems with 100 to 300 modules in comparison to a simulated annealing floorplanner based on sequence pairs.
Abstract: This paper describes a new floorplanning approach called Constrained Adjacency Graph (CAG) that helps explore adjacency in floorplans. CAG extends previous adjacency graph approaches by adding explicit adjacency constraints to the graph edges. After necessary and sufficient conditions for CAG are developed based on dissected floorplans, CAG is extended to handle general floorplans in order to improve area without changing the adjacency relations dramatically. These characteristics are currently utilized in a randomized greedy improvement heuristic for wire length optimization. The results show that better floorplans are found with much less running time for problems with 100 to 300 modules in comparison to a simulated annealing floorplanner based on sequence pairs.
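
The CAG structure itself is beyond the abstract, but the randomized greedy improvement loop it drives is easy to sketch: propose a random local move and keep it only if the half-perimeter wire length (HPWL) improves. The flat placement, nets, and swap move below are hypothetical stand-ins for the real floorplan moves.

```python
import random

# Hypothetical placement: module -> (x, y) center; nets connect modules.
pos = {"m1": (0, 0), "m2": (10, 5), "m3": (4, 8), "m4": (7, 2)}
nets = [("m1", "m2", "m3"), ("m2", "m4"), ("m1", "m4")]

def hpwl():
    """Total half-perimeter wire length over all nets."""
    total = 0.0
    for net in nets:
        xs = [pos[m][0] for m in net]
        ys = [pos[m][1] for m in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

random.seed(1)
best = hpwl()
for _ in range(1000):
    a, b = random.sample(list(pos), 2)       # random move: swap two modules
    pos[a], pos[b] = pos[b], pos[a]
    cost = hpwl()
    if cost < best:                          # greedy: keep only improvements
        best = cost
    else:
        pos[a], pos[b] = pos[b], pos[a]      # undo a worsening move
print(f"final HPWL: {best:.1f}")
```

Unlike simulated annealing, the loop never accepts a worsening move, which is where the large runtime advantage reported above comes from.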

Proceedings Article•DOI•
Hai Zhou1•
11 Dec 2009
TL;DR: This paper proves that the operations of retiming and resynthesis with sweep are complete, but with one caveat: at least one resynthesis operation needs to look through the register boundary into the logic of the previous cycle.
Abstract: There is a long history of investigations and debates on whether a sequence of retiming and resynthesis is complete for all sequential transformations (on steady states). It has been shown that the sweep operation, which adds or removes registers not used by any output, is necessary for some sequential transformations. However, it has been an open question whether retiming and resynthesis with sweep are complete. This paper proves that the operations are complete, but with one caveat: at least one resynthesis operation needs to look through the register boundary into the logic of the previous cycle. We show that this one-cycle reachability is required for retiming and resynthesis to be complete for re-encodings with different code lengths. This requirement comes from the fact that a Boolean circuit implements a discrete function, and thus its range needs to be computed by a traversal of the circuit. In theory, five operations in the order of sweep, resynthesis, retiming, resynthesis, and sweep are already complete. However, some practical limitations on resynthesis must be considered. The complexity of retiming and resynthesis verification is also discussed.

Proceedings Article•DOI•
19 Jan 2009
TL;DR: This paper formulates the risk-aversion min-period retiming problem under process variations based on a conventional two-stage stochastic program with fixed recourse and a risk-aversion objective on the clock period, and presents a heuristic incremental algorithm to solve the proposed problem.
Abstract: Recent advances in statistical static timing analysis (SSTA) have achieved great success in computing arrival times under variations by extending the sum and maximum operations to random variables. It remains a challenging problem to apply such results to address the variability in circuit optimizations. In this paper, we study the statistical retiming problem, where retiming is a powerful sequential transformation that relocates flip-flops in a circuit without changing its functionality. We formulate the risk-aversion min-period retiming problem under process variations based on a conventional two-stage stochastic program with fixed recourse and a risk-aversion objective on the clock period. We prove that the proposed problem is an integer convex program, show that the subgradient of the objective function can be derived from the combinational paths with the maximum path delay, and present a heuristic incremental algorithm to solve the proposed problem. Our approach can handle arbitrary gate delay models under process variations through sampling from a black box, and its effectiveness is confirmed by the experimental results. Furthermore, we point out how the current state-of-the-art SSTA techniques could be improved for future optimization algorithms when analytical models are available.
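
The abstract leaves the formulation abstract; as a sketch of its shape only (our notation, reusing the classical retiming feasibility constraints), with xi the process-variation sample and T(r, xi) the minimum clock period achievable after retiming r under that sample:

```latex
\min_{r \in \mathbb{Z}^{V}}\;\; \rho\bigl(T(r,\xi)\bigr)
\quad \text{s.t.} \quad
r(u) - r(v)\;\le\; w(u,v) \quad \forall\,(u,v)\in E,
```

where w(u,v) is the register count on edge (u,v) and rho is a risk-averse functional of the random period, for example the mean plus a penalty on upside deviation; the paper's exact two-stage recourse structure may differ. Convexity of the objective in r is what makes the subgradient approach described above sound.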

Proceedings Article•DOI•
19 Jan 2009
TL;DR: It is shown how the equivalence checking problem can be simplified if the circuits satisfy the Complete-k-D property, and it is proved that the method is complete for any number of retiming and resynthesis steps.
Abstract: Iterative retiming and resynthesis is a powerful way to optimize sequential circuits, but its wide adoption has been hampered by the hardness of verification. This paper tackles the problem of retiming and resynthesis equivalence checking on a pair of circuits. For this purpose we define the Complete-k-Distinguishability (C-k-D) property for any natural number k, generalizing C-1-D. We show how the equivalence checking problem can be simplified if the circuits satisfy this property and prove that the method is complete for any number of retiming and resynthesis steps. We also provide a way to enforce C-k-D on the circuits without restricting the optimization power of retiming and resynthesis or increasing their complexity. Experimental results demonstrate that enforcing the C-k-D property can speed up the verification process.
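
Paraphrasing the property from the abstract (our formalization; the paper's definition may differ in detail): a machine with state set S and output trace out(s, x_1...x_j) is Complete-k-Distinguishable when every pair of distinct states is told apart by some input sequence of length at most k,

```latex
\forall\, s \ne s' \in S\;\;\exists\, j \le k,\; x_1,\dots,x_j:\quad
\mathrm{out}(s, x_1 \dots x_j)\;\ne\;\mathrm{out}(s', x_1 \dots x_j),
```

with C-1-D as the k = 1 special case.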