
Showing papers by Hai Zhou published in 2009


Proceedings Article•DOI•
26 Jul 2009
TL;DR: This work presents a statistical analysis framework that characterizes the lifetime reliability of nanometer-scale integrated circuits by jointly considering the impact of fabrication-induced process variation and run-time aging effects, and focuses on characterizing circuit threshold voltage lifetime variation.
Abstract: Circuit reliability is affected by various fabrication-time and run-time effects. Fabrication-induced process variation has significant impact on circuit performance and reliability. Various aging effects, such as negative bias temperature instability, cause continuous performance and reliability degradation during circuit run-time usage. In this work, we present a statistical analysis framework that characterizes the lifetime reliability of nanometer-scale integrated circuits by jointly considering the impact of fabrication-induced process variation and run-time aging effects. More specifically, our work focuses on characterizing circuit threshold voltage lifetime variation and its impact on circuit timing due to process variation and the negative bias temperature instability effect, a primary aging effect in nanometer-scale integrated circuits. The proposed work is capable of characterizing the overall circuit lifetime reliability, as well as efficiently quantifying the vulnerabilities of individual circuit elements. This analysis framework has been carefully validated and integrated into an iterative design flow for circuit lifetime reliability analysis and optimization.
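
The abstract does not give the underlying model, but the idea can be illustrated with a small Monte Carlo sketch: sample a fabrication-time threshold voltage from a Gaussian, then add a power-law NBTI shift dVth(t) = A*t^n, a form commonly used for NBTI. All constants and distributions below are hypothetical, not values from the paper.

```python
import random

# Illustrative only: joint effect of process-induced Vth spread and a
# power-law NBTI aging shift, dVth(t) = A * t**n. Constants are made up.
VTH_MEAN, VTH_SIGMA = 0.30, 0.02          # volts, fabrication-time spread
A, N_EXP = 0.001, 0.2                     # hypothetical aging coefficients
T_SECONDS = 10 * 365 * 24 * 3600          # ten years of operation

def aged_vth(t):
    vth0 = random.gauss(VTH_MEAN, VTH_SIGMA)   # process variation
    return vth0 + A * t ** N_EXP               # plus NBTI aging shift

samples = sorted(aged_vth(T_SECONDS) for _ in range(100_000))
mean = sum(samples) / len(samples)
p99 = samples[int(0.99 * len(samples))]
print(f"aged Vth: mean = {mean:.3f} V, 99th percentile = {p99:.3f} V")
```

The tail percentile, rather than the mean, is what a lifetime-reliability analysis must bound, since the slowest dies dominate timing yield.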

101 citations


Journal Article•DOI•
TL;DR: The generalized convex sizing (GCS) problem is formulated, unifying sizing problems and applying to sequential circuits with clock-skew optimization, and an algorithm based on the method of feasible directions and min-cost network flow is designed to solve proper GCS problems.
Abstract: In this paper, we formulate the generalized convex sizing (GCS) problem that unifies the sizing problems and applies to sequential circuits with clock-skew optimization. We revisit the approach to solve the sizing problem by Lagrangian relaxation, point out several misunderstandings in the previous paper, and extend the approach to handle general convex delay functions in the GCS problems. We identify a class of proper GCS problems whose objective functions in the simplified dual problem are differentiable and transform the simultaneous sizing and clock-skew optimization problem into a proper GCS problem. We design an algorithm based on the method of feasible directions and min-cost network flow to solve proper GCS problems. The algorithm provides evidence for infeasible GCS problems according to a condition we derive. Experimental results confirm the efficiency and the effectiveness of our algorithm when the Elmore delay model is used.
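
For reference, the Elmore delay model used in the experiments has a simple closed form on an RC tree: the delay to a sink is the sum, over all capacitors, of the resistance shared between the root-to-sink and root-to-capacitor paths times the capacitance. A minimal sketch on a toy tree (node names and R/C values are hypothetical):

```python
# Each node maps to (parent, resistance to parent in ohms, capacitance in F);
# the root has parent None. Values are made up for illustration.
tree = {
    "root": (None,   0.0, 10e-15),
    "a":    ("root", 100.0, 20e-15),
    "b":    ("a",    150.0,  5e-15),
    "c":    ("a",     80.0, 15e-15),
}

def path_to_root(node):
    path = []
    while node is not None:
        path.append(node)
        node = tree[node][0]
    return path

def elmore_delay(sink):
    """Sum over every node k of C_k times the resistance common to the
    root->sink and root->k paths."""
    on_sink_path = set(path_to_root(sink))
    delay = 0.0
    for k, (_, _, cap) in tree.items():
        shared_r = sum(tree[n][1] for n in path_to_root(k) if n in on_sink_path)
        delay += shared_r * cap
    return delay

print(f"Elmore delay to node b: {elmore_delay('b') * 1e12:.2f} ps")
```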

31 citations


Proceedings Article•
Chunyang Feng, Hai Zhou, Changhao Yan, Jun Tao, Xuan Zeng •
26 Jul 2009
TL;DR: This paper develops a dummy fill algorithm that is both efficient and provably good, and proposes a new greedy iterative algorithm to achieve high-quality solutions more efficiently than previous Monte-Carlo-based heuristic methods.
Abstract: To reduce chip-scale topography variation in the Chemical Mechanical Polishing (CMP) process, dummy fill is widely used to improve layout density uniformity. Previous research formulated the dummy fill problem as a standard Linear Program (LP). However, solving the huge linear program formed by real-life designs is very expensive and has become the hurdle in deploying the technology. Even though there exist efficient heuristics, their performance cannot be guaranteed. In this paper, we develop a dummy fill algorithm that is both efficient and provably good. It is based on a fully polynomial time approximation scheme by Fleischer [4] for covering LP problems. Furthermore, based on the approximation algorithm, we also propose a new greedy iterative algorithm to achieve high-quality solutions more efficiently than previous Monte-Carlo-based heuristic methods. Experimental results demonstrate the effectiveness and efficiency of our algorithms.
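
The abstract does not reproduce the LP, but a covering-style formulation of dummy fill (our notation, shown only to make the connection to covering LPs concrete) constrains the post-fill density of every window w:

```latex
\min \sum_{t} x_t
\quad \text{s.t.} \quad
\sum_{t \in w} x_t + d_0(w) \;\ge\; L \;\; \forall\, w,
\qquad 0 \le x_t \le u_t \;\; \forall\, t,
```

where x_t is the fill area inserted into tile t, d_0(w) the existing feature density of window w, L the required density lower bound, and u_t the slack area available in tile t. Lower-bound constraints of this shape are exactly the covering form to which Fleischer's fully polynomial time approximation scheme applies.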

13 citations


Proceedings Article•DOI•
26 Jul 2009
TL;DR: This work proposes a methodology to explore concurrency via nondeterministic transactional algorithm design and to program such algorithms on multicore processors for CAD applications, and applies it to the min-cost flow problem.
Abstract: Computational complexity has been the primary challenge of many VLSI CAD applications. The emerging multicore and many-core microprocessors have the potential to offer scalable performance improvement. How to exploit multicore resources to speed up CAD applications is thus a natural question but also a huge challenge for CAD researchers. Indeed, decades of work on general-purpose compilation approaches that automatically extract parallelism from a sequential program have shown limited success. Past work has shown that programming models and algorithm design methods have a great influence on usable parallelism. In this paper, we propose a methodology to explore concurrency via nondeterministic transactional algorithm design, and to program such algorithms on multicore processors for CAD applications. We apply the proposed methodology to the min-cost flow problem, which has been identified as the key problem in many design optimizations, from wire-length optimization in detailed placement to timing-constrained voltage assignment. A concurrent algorithm and its implementation on multicore processors for min-cost flow have been developed based on the methodology. Experiments on voltage island generation in floorplanning demonstrate its efficiency and scalable speedup over different numbers of cores.
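
The abstract does not spell out the algorithm, so the sketch below only illustrates the pattern it names: many small nondeterministic operations, each applied as an atomic "transaction" by concurrent workers. Here the operation is the edge-relaxation kernel that underlies shortest-path-based min-cost flow; the locking scheme and convergence loop are our own illustration, not the paper's implementation.

```python
import threading

def concurrent_relax(n, edges, source, workers=4):
    """Relax edges concurrently until a fixpoint. Each relaxation locks
    the two touched nodes in a fixed global order, re-checks its
    precondition, and commits -- a toy stand-in for a transaction."""
    INF = float("inf")
    dist = [INF] * n
    dist[source] = 0.0
    locks = [threading.Lock() for _ in range(n)]
    changed = threading.Event()

    def relax(u, v, w):                    # assumes no self-loops (u != v)
        a, b = sorted((u, v))              # fixed lock order avoids deadlock
        with locks[a], locks[b]:
            if dist[u] + w < dist[v]:      # precondition re-checked inside
                dist[v] = dist[u] + w      # commit
                changed.set()

    def worker(chunk):
        for u, v, w in chunk:
            relax(u, v, w)

    while True:                            # sweep until no transaction commits
        changed.clear()
        threads = [threading.Thread(target=worker, args=(edges[i::workers],))
                   for i in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        if not changed.is_set():
            return dist

edges = [(0, 1, 2.0), (0, 2, 5.0), (1, 2, 1.0), (2, 3, 1.0)]
print(concurrent_relax(4, edges, source=0))   # [0.0, 2.0, 3.0, 4.0]
```

The nondeterminism is deliberate: the fixpoint is the same regardless of the order in which transactions commit, which is what makes such algorithms safe to parallelize.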

13 citations


Proceedings Article•DOI•
16 Mar 2009
TL;DR: Imodel is presented, a simple nonlinear logic cell model that can be derived from typical cell libraries such as NLDM, with accuracy much higher than NLDM-based cell delay models and a maximum runtime penalty of 19% compared to NLDM-based cell delay models on medium-sized industrial designs.
Abstract: Logic cell modeling is an important component in the analysis and design of CMOS integrated circuits, mostly due to the nonlinear behavior of CMOS cells with respect to the voltage signals at their input and output pins. A current-based model for CMOS logic cells is presented which can be used for effective crosstalk noise and delta delay analysis in CMOS VLSI circuits. Existing current source models are expensive and need a new set of SPICE-based characterization which is not compatible with typical EDA tools. In this paper we present Imodel, a simple nonlinear logic cell model that can be derived from typical cell libraries such as NLDM, with accuracy much higher than NLDM-based cell delay models. In fact, our experiments show an average error of 3% compared to SPICE. This level of accuracy comes with a maximum runtime penalty of 19% compared to NLDM-based cell delay models on medium-sized industrial designs.
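
An NLDM library is essentially a set of two-dimensional tables, delay or output slew indexed by input slew and output load, interpolated between grid points; a derived current model reuses the same machinery. A generic bilinear-interpolation sketch (table values are hypothetical, not from any real library):

```python
import bisect

# Hypothetical NLDM-style table: rows indexed by input slew (ns),
# columns by output load (pF), entries are cell delay (ns).
slews = [0.1, 0.5, 1.0]
loads = [0.01, 0.05, 0.20]
delay = [
    [0.05, 0.12, 0.40],
    [0.08, 0.16, 0.46],
    [0.12, 0.22, 0.55],
]

def lookup(slew, load):
    """Bilinear interpolation between four grid points (indices clamped
    to the table)."""
    i = max(0, min(bisect.bisect_right(slews, slew) - 1, len(slews) - 2))
    j = max(0, min(bisect.bisect_right(loads, load) - 1, len(loads) - 2))
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    return (delay[i][j] * (1 - ts) * (1 - tl)
            + delay[i][j + 1] * (1 - ts) * tl
            + delay[i + 1][j] * ts * (1 - tl)
            + delay[i + 1][j + 1] * ts * tl)

print(f"delay at slew=0.3 ns, load=0.10 pF: {lookup(0.3, 0.10):.3f} ns")
```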

10 citations


Proceedings Article•DOI•
Yao Zhao, Sagar Vemuri, Jiazhen Chen, Yan Chen, Hai Zhou, Zhi Fu •
29 Sep 2009
TL;DR: This paper identifies a practical way to launch DoS attacks on security protocols by triggering exceptions, and shows that even the latest strongly authenticated protocols such as PEAP, EAP-TLS and EAP-TTLS are vulnerable to these attacks.
Abstract: Security protocols are not as secure as we assumed. In this paper, we identify a practical way to launch DoS attacks on security protocols by triggering exceptions. Through experiments, we show that even the latest strongly authenticated protocols, such as PEAP, EAP-TLS and EAP-TTLS, are vulnerable to these attacks. Real attacks have been implemented and tested against TLS-based EAP protocols, the major family of security protocols for Wireless LAN, as well as the Return Routability of Mobile IPv6, an emerging lightweight security protocol in the new IPv6 infrastructure. DoS attacks on PEAP, one popular TLS-based EAP protocol, were performed and tested on a major university's wireless network, and the attacks were highly successful. We further tested the scalability of our attack through a series of ns-2 simulations. Countermeasures for detection of such attacks and improvements of the protocols to overcome these types of DoS attacks are also proposed and verified experimentally.
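
The abstract does not detail the proposed countermeasures; purely as an illustration of the direction it suggests, a handshake endpoint can refuse to tear down state on failure messages that arrive before the peer is authenticated, feeding them to a detector instead. Everything below (class, method names, threshold) is a hypothetical sketch, not the paper's design.

```python
# Hypothetical hardening sketch: do not abort a handshake on an
# unauthenticated failure/exception message; count it as suspicious.
class Handshake:
    SUSPICION_THRESHOLD = 10               # made-up detection threshold

    def __init__(self):
        self.peer_authenticated = False
        self.suspicious_failures = 0

    def on_failure_message(self, integrity_protected: bool):
        if integrity_protected and self.peer_authenticated:
            self.abort()                   # genuine, authenticated failure
        else:
            self.suspicious_failures += 1  # likely forged: keep the session
            if self.suspicious_failures > self.SUSPICION_THRESHOLD:
                self.alert_detector()

    def abort(self):
        print("aborting handshake")

    def alert_detector(self):
        print("possible exception-triggering DoS attack")
```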

10 citations


Proceedings Article•DOI•
Min Gong, Hai Zhou, Jun Tao, Xuan Zeng •
02 Nov 2009
TL;DR: The binning optimization problem, which decides bin boundaries and their testing order to maximize benefit (considering test cost) for a transparently latched circuit, is formulated and solved.
Abstract: With increasing process variation, binning has become an important technique to improve the values of fabricated chips, especially in high performance microprocessors where transparent latches are widely used. In this paper, we formulate and solve the binning optimization problem that decides the bin boundaries and their testing order to maximize the benefit (considering the test cost) for a transparently latched circuit. The problem is decomposed into three sub-problems which are solved sequentially. First, to compute the clock period distribution of the transparently latched circuit, a sample-based SSTA approach is developed which is based on the generalized stochastic collocation method (gSCM) with the Sparse Grid technique. The minimal clock period at each sample point is found by solving a minimal cycle ratio problem on the constraint graph. Second, a greedy algorithm is proposed to maximize the sales profit by iteratively assigning each boundary to its optimal position. Then, an optimal algorithm of O(n log n) runtime, based on alphabetic trees, is used to generate the optimal testing order of bin boundaries to minimize the test cost. Experiments on all the ISCAS'89 sequential benchmarks with 65-nm technology show a 6.69% profit improvement and a 14.00% cost reduction on average. The results also demonstrate that the proposed SSTA method achieves an average error of 0.70% and an average speedup of 110X compared with Monte Carlo simulation.
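
The per-sample kernel, finding a minimal clock period via a cycle ratio problem on the constraint graph, can be illustrated with the standard binary-search scheme for cycle ratios: a ratio bound is feasible exactly when a reweighted graph has no negative cycle. The toy graph below and the generic maximum-cycle-ratio setting are our own illustration; the paper's latch constraint graph construction is more involved.

```python
def has_negative_cycle(n, edges):
    """Bellman-Ford negative-cycle detection from a virtual source."""
    dist = [0.0] * n                       # virtual source reaches all nodes
    for _ in range(n):
        updated = False
        for u, v, c in edges:
            if dist[u] + c < dist[v] - 1e-12:
                dist[v] = dist[u] + c
                updated = True
        if not updated:
            return False                   # converged: no negative cycle
    return True                            # still updating after n rounds

def max_cycle_ratio(n, arcs, lo=0.0, hi=100.0, eps=1e-6):
    """Binary search for max over cycles of (sum delay)/(sum weight),
    weight > 0: lam is an upper bound on the ratio iff edge costs
    lam*w - d admit no negative cycle."""
    while hi - lo > eps:
        mid = (lo + hi) / 2
        costs = [(u, v, mid * w - d) for u, v, d, w in arcs]
        if has_negative_cycle(n, costs):
            lo = mid                       # mid is below the true ratio
        else:
            hi = mid
    return hi

# Arcs are (u, v, delay, weight). Cycle 0->1->0 has ratio (3+1)/2 = 2,
# cycle 1->2->1 has ratio (2+4)/2 = 3, so the answer is ~3.
arcs = [(0, 1, 3.0, 1), (1, 0, 1.0, 1), (1, 2, 2.0, 1), (2, 1, 4.0, 1)]
print(f"max cycle ratio ~ {max_cycle_ratio(3, arcs):.4f}")
```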

7 citations


Journal Article•DOI•
TL;DR: A timing-dependent dynamic power estimation framework that considers the impact of coupling in combinational circuits is proposed; based on propagated switching and timing distributions, power consumption in coupling capacitances is accurately calculated.
Abstract: In this paper, a timing-dependent dynamic power estimation framework that considers the impact of coupling in combinational circuits is proposed. Relative switching activities and delays of coupled interconnects significantly affect dynamic power dissipation in parasitic coupling capacitances (coupling power). To capture this switching and timing dependence, detailed switching distributions and timing information are essential for accurate estimation of dynamic power consumption. An approach to efficiently represent and propagate switching and timing distributions through circuits is developed. Based on the propagated switching and timing distributions, power consumption in coupling capacitances is accurately calculated. Experimental results using ISCAS'85 benchmarks demonstrate that ignoring the timing dependence of coupling power consumption can cause up to 25% error in dynamic power estimation (corresponding to 59% error in coupling power estimation).
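
The switching dependence the paper targets shows up already in textbook coupling-energy accounting: the charge drawn through a coupling capacitance C_c between two nets depends on how they switch relative to each other, often summarized by a Miller coupling factor. This is a generic sketch, not the paper's exact model:

```latex
P_c \;=\; \tfrac{1}{2}\,\alpha\,\mathrm{MCF}\cdot C_c\,V_{dd}^{2}\,f,
\qquad
\mathrm{MCF} =
\begin{cases}
0, & \text{both nets switch in the same direction,}\\
1, & \text{one net is quiet,}\\
2, & \text{the nets switch in opposite directions,}
\end{cases}
```

with alpha the switching activity and f the clock frequency. Which case applies depends on the relative timing of the transitions, which is why ignoring timing dependence produces the large coupling power errors the experiments report.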

6 citations


Journal Article•DOI•
TL;DR: This journal special section will cover recent progress on parallel CAD research, including algorithm foundations, programming models, parallel architecture-specific optimization, and verification, as well as other topics relevant to the design of parallel CAD algorithms and software tools.
Abstract: High-performance parallel computer architectures and systems have improved at a phenomenal rate. In the meantime, VLSI computer-aided design (CAD) software for multi-billion-transistor IC design has become increasingly complex and requires prohibitively high computational resources. Recent studies have shown that numerous CAD problems, with their high computational complexity, can greatly benefit from the fast-increasing parallel computation capabilities. However, parallel programming poses big challenges for CAD applications. Fully exploiting the computational power of emerging general-purpose and domain-specific multi-core/many-core processor systems calls for fundamental research and engineering practice across every stage of parallel CAD design, from algorithm exploration, programming models, and design-time and run-time environments, to CAD applications such as verification, optimization, and simulation. This journal special section will cover recent progress on parallel CAD research, including algorithm foundations, programming models, parallel architecture-specific optimization, and verification. More specifically, papers with in-depth and extensive coverage of the following topics will be considered, as well as other topics relevant to the design of parallel CAD algorithms and software tools.

4 citations


Patent•
Hai Zhou, Jia Wang •
29 Jan 2009
TL;DR: In this article, the authors propose a method for use in electronic design software that efficiently and optimally minimizes or reduces register/flip-flop area, or the number of registers/flip-flops, in a VLSI circuit design without changing circuit timing or functionality.
Abstract: A method for use in electronic design software efficiently and optimally minimizes or reduces register/flip-flop area, or the number of registers/flip-flops, in a VLSI circuit design without changing circuit timing or functionality. The method dynamically generates constraints, maintains the generated constraints as a regular tree, and incrementally relocates registers/flip-flops and/or reduces their number in the circuit design.
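
For background, methods of this kind build on the classical min-area retiming formulation of Leiserson and Saxe, in which an integer label r(v) counts the registers moved backward across each gate v (shown here as standard context, not as the patent's claims):

```latex
\min_{r}\;\; \sum_{(u,v)\in E}\bigl(w(u,v) + r(v) - r(u)\bigr)
\quad \text{s.t.} \quad
w(u,v) + r(v) - r(u)\;\ge\;0 \quad \forall\,(u,v)\in E,
```

where w(u,v) is the register count on edge (u,v) and the objective counts registers after retiming; preserving the clock period adds further difference constraints on r. Such a constraint set can be generated dynamically and maintained incrementally rather than built all at once, which is the direction the abstract describes.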

Proceedings Article•DOI•
19 Jan 2009
TL;DR: A new floorplanning approach called Constrained Adjacency Graph (CAG) is described that helps explore adjacency in floorplans and shows that better floorplans are found with much less running time for problems with 100 to 300 modules in comparison to a simulated annealing floorplanner based on sequence pairs.
Abstract: This paper describes a new floorplanning approach called Constrained Adjacency Graph (CAG) that helps explore adjacency in floorplans. CAG extends previous adjacency graph approaches by adding explicit adjacency constraints to the graph edges. After necessary and sufficient conditions for CAG are developed based on dissected floorplans, CAG is extended to handle general floorplans in order to improve area without changing the adjacency relations dramatically. These characteristics are currently utilized in a randomized greedy improvement heuristic for wire length optimization. The results show that better floorplans are found with much less running time for problems with 100 to 300 modules in comparison to a simulated annealing floorplanner based on sequence pairs.
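
The CAG structure itself is beyond the abstract, but the randomized greedy improvement loop it drives is easy to sketch: propose a random local move and keep it only if the half-perimeter wire length (HPWL) improves. The flat placement, nets, and swap move below are hypothetical stand-ins for the real floorplan moves.

```python
import random

# Hypothetical placement: module -> (x, y) center; nets connect modules.
pos = {"m1": (0, 0), "m2": (10, 5), "m3": (4, 8), "m4": (7, 2)}
nets = [("m1", "m2", "m3"), ("m2", "m4"), ("m1", "m4")]

def hpwl():
    """Total half-perimeter wire length over all nets."""
    total = 0.0
    for net in nets:
        xs = [pos[m][0] for m in net]
        ys = [pos[m][1] for m in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

random.seed(1)
best = hpwl()
for _ in range(1000):
    a, b = random.sample(list(pos), 2)       # random move: swap two modules
    pos[a], pos[b] = pos[b], pos[a]
    cost = hpwl()
    if cost < best:                          # greedy: keep only improvements
        best = cost
    else:
        pos[a], pos[b] = pos[b], pos[a]      # undo a worsening move
print(f"final HPWL: {best:.1f}")
```

Unlike simulated annealing, the loop never accepts a worsening move, which is where the large runtime advantage reported above comes from.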

Proceedings Article•DOI•
Hai Zhou1•
11 Dec 2009
TL;DR: This paper proves that the operations of retiming and resynthesis with sweep are complete, but with one caveat: at least one resynthesis operation needs to look through the register boundary into the logic of the previous cycle.
Abstract: There is a long history of investigations and debates on whether a sequence of retiming and resynthesis is complete for all sequential transformations (on steady states). It has been shown that the sweep operation, which adds or removes registers not used by any output, is necessary for some sequential transformations. However, it has been an open question whether retiming and resynthesis with sweep are complete. This paper proves that the operations are complete, but with one caveat: at least one resynthesis operation needs to look through the register boundary into the logic of the previous cycle. We show that this one-cycle reachability is required for retiming and resynthesis to be complete for re-encodings with different code lengths. This requirement comes from the fact that a Boolean circuit implements a discrete function, and thus its range needs to be computed by a traversal of the circuit. In theory, five operations in the order of sweep, resynthesis, retiming, resynthesis, and sweep are already complete. However, some practical limitations on resynthesis must be considered. The complexity of retiming and resynthesis verification is also discussed.

Proceedings Article•DOI•
19 Jan 2009
TL;DR: This paper formulates the risk-aversion min-period retiming problem under process variations based on a conventional two-stage stochastic program with fixed recourse and a risk-aversion objective on the clock period, and presents a heuristic incremental algorithm to solve the proposed problem.
Abstract: Recent advances in statistical static timing analysis (SSTA) have achieved great success in computing arrival times under variations by extending the sum and maximum operations to random variables. It remains a challenging problem to apply such results to address the variability in circuit optimizations. In this paper, we study the statistical retiming problem, where retiming is a powerful sequential transformation that relocates flip-flops in a circuit without changing its functionality. We formulate the risk-aversion min-period retiming problem under process variations based on a conventional two-stage stochastic program with fixed recourse and a risk-aversion objective on the clock period. We prove that the proposed problem is an integer convex program, show that the subgradient of the objective function can be derived from the combinational paths with the maximum path delay, and present a heuristic incremental algorithm to solve the proposed problem. Our approach can handle arbitrary gate delay models under process variations through sampling from a black box, and its effectiveness is confirmed by the experimental results. Furthermore, we point out how the current state-of-the-art SSTA techniques could be improved for future optimization algorithms when analytical models are available.
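
The abstract leaves the formulation abstract; as a sketch of its shape only (our notation, reusing the classical retiming feasibility constraints), with xi the process-variation sample and T(r, xi) the minimum clock period achievable after retiming r under that sample:

```latex
\min_{r \in \mathbb{Z}^{V}}\;\; \rho\bigl(T(r,\xi)\bigr)
\quad \text{s.t.} \quad
r(u) - r(v)\;\le\; w(u,v) \quad \forall\,(u,v)\in E,
```

where w(u,v) is the register count on edge (u,v) and rho is a risk-averse functional of the random period, for example the mean plus a penalty on upside deviation; the paper's exact two-stage recourse structure may differ. Convexity of the objective in r is what makes the subgradient approach described above sound.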

Proceedings Article•DOI•
19 Jan 2009
TL;DR: It is shown how the equivalence checking problem can be simplified if the circuits satisfy the Complete-k-D property, and it is proved that the method is complete for any number of retiming and resynthesis steps.
Abstract: Iterative retiming and resynthesis is a powerful way to optimize sequential circuits, but its wide adoption has been hampered by the hardness of verification. This paper tackles the problem of retiming and resynthesis equivalence checking on a pair of circuits. For this purpose we define the Complete-k-Distinguishability (C-k-D) property for any natural number k, generalizing C-1-D. We show how the equivalence checking problem can be simplified if the circuits satisfy this property and prove that the method is complete for any number of retiming and resynthesis steps. We also provide a way to enforce C-k-D on the circuits without restricting the optimization power of retiming and resynthesis or increasing their complexity. Experimental results demonstrate that enforcing the C-k-D property can speed up the verification process.
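
Paraphrasing the property from the abstract (our formalization; the paper's definition may differ in detail): a machine with state set S and output trace out(s, x_1...x_j) is Complete-k-Distinguishable when every pair of distinct states is told apart by some input sequence of length at most k,

```latex
\forall\, s \ne s' \in S\;\;\exists\, j \le k,\; x_1,\dots,x_j:\quad
\mathrm{out}(s, x_1 \dots x_j)\;\ne\;\mathrm{out}(s', x_1 \dots x_j),
```

with C-1-D as the k = 1 special case.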