
Showing papers in "IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems in 2003"


Journal ArticleDOI
TL;DR: This paper presents an approach to nonlinear model reduction based on representing a nonlinear system with a piecewise-linear system and then reducing each of the pieces with a Krylov projection, and shows that the resulting macromodels are significantly more accurate than models obtained with linear or recently developed quadratic reduction techniques.
Abstract: In this paper, we present an approach to nonlinear model reduction based on representing a nonlinear system with a piecewise-linear system and then reducing each of the pieces with a Krylov projection. However, rather than approximating the individual components as piecewise linear and then composing hundreds of components to make a system with exponentially many different linear regions, we instead generate a small set of linearizations about the state trajectory which is the response to a "training input." Computational results and performance data are presented for an example of a micromachined switch and selected nonlinear circuits. These examples demonstrate that the macromodels obtained with the proposed reduction algorithm are significantly more accurate than models obtained with linear or recently developed quadratic reduction techniques. Also, we propose a procedure for a posteriori estimation of the simulation error, which may be used to determine the accuracy of the extracted trajectory piecewise-linear reduced-order models. Finally, it is shown that the proposed model order reduction technique is computationally inexpensive, and that the models can be constructed "on the fly," to accelerate simulation of the system response.
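
For intuition, here is a minimal numerical sketch of the trajectory piecewise-linear recipe, assuming a generic ODE dx/dt = f(x) + b·u; the shared Krylov basis and the exponential distance weighting below are common choices for illustration, not necessarily the paper's exact ones:

    import numpy as np

    def arnoldi(A, b, q):
        """Orthonormal basis of the order-q Krylov subspace span{b, Ab, ...}."""
        V = np.zeros((len(b), q))
        V[:, 0] = b / np.linalg.norm(b)
        for j in range(1, q):
            w = A @ V[:, j - 1]
            w -= V[:, :j] @ (V[:, :j].T @ w)  # Gram-Schmidt orthogonalization
            V[:, j] = w / np.linalg.norm(w)
        return V

    def tpwl_reduce(f, jac, b, training_states, q=10):
        """Linearize about states sampled along the training trajectory and
        project every linear piece onto one shared Krylov basis."""
        V = arnoldi(jac(training_states[0]), b, q)
        pieces = [(V.T @ jac(x) @ V,            # reduced A_i
                   V.T @ (f(x) - jac(x) @ x),   # reduced constant term
                   x) for x in training_states]
        return V, V.T @ b, pieces

    def tpwl_rhs(z, u, V, Vb, pieces, beta=25.0):
        """Blend the reduced pieces, weighted by the distance of the current
        state to each linearization point, to evaluate dz/dt."""
        x = V @ z
        d = np.array([np.linalg.norm(x - xi) for _, _, xi in pieces])
        w = np.exp(-beta * d / (d.min() + 1e-12))
        w /= w.sum()
        return sum(wi * (Ar @ z + cr) for wi, (Ar, cr, _) in zip(w, pieces)) + Vb * u

Because each piece is only q-by-q, blending and time-stepping the reduced model is cheap, which is what makes "on the fly" construction during a simulation plausible.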

620 citations


Journal ArticleDOI
TL;DR: In an application important to quantum computing, oracle circuits for Grover's search algorithm are synthesized, and a significant improvement over a previously proposed synthesis algorithm is shown.
Abstract: Reversible or information-lossless circuits have applications in digital signal processing, communication, computer graphics, and cryptography. They are also a fundamental requirement in the emerging field of quantum computation. We investigate the synthesis of reversible circuits that employ a minimum number of gates and contain no redundant input-output line-pairs (temporary storage channels). We prove constructively that every even permutation can be implemented without temporary storage using NOT, CNOT, and TOFFOLI gates. We describe an algorithm for the synthesis of optimal circuits and study the reversible functions on three wires, reporting the distribution of circuit sizes. We also study canonical circuit decompositions where gates of the same kind are grouped together. Finally, in an application important to quantum computing, we synthesize oracle circuits for Grover's search algorithm, and show a significant improvement over a previously proposed synthesis algorithm.
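
As a concrete illustration of circuits-as-permutations (a sketch, not the paper's synthesis algorithm), the fragment below applies NOT/CNOT/TOFFOLI gates to all 2^n input patterns and checks the parity of the resulting permutation; gates are encoded as (target, controls) tuples:

    from functools import reduce

    def apply_gate(state, target, controls=()):
        """Flip bit `target` iff all control bits are 1: NOT has no controls,
        CNOT one, TOFFOLI two."""
        if all((state >> c) & 1 for c in controls):
            state ^= 1 << target
        return state

    def circuit_permutation(gates, n=3):
        """Image of every n-bit pattern under the gate cascade."""
        return [reduce(lambda s, g: apply_gate(s, *g), gates, x)
                for x in range(2 ** n)]

    def is_even(perm):
        """Parity via cycle decomposition: a cycle of length k contributes
        k - 1 transpositions."""
        seen, transpositions = set(), 0
        for i in range(len(perm)):
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
                if j not in seen:
                    transpositions += 1
        return transpositions % 2 == 0

    # Two TOFFOLIs: each is an odd permutation of the 8 patterns on 3 wires,
    # so their cascade is even, hence realizable without temporary storage.
    print(is_even(circuit_permutation([(2, (0, 1)), (0, (1, 2))])))  # True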

514 citations


Journal ArticleDOI
TL;DR: Results indicate that the proposed scheme can provide test data compression nearly equal to that of an optimum Huffman code with much less area overhead for the decoder.
Abstract: This paper presents a compression/decompression scheme based on selective Huffman coding for reducing the amount of test data that must be stored on a tester and transferred to each core in a system-on-a-chip (SOC) during manufacturing test. The test data bandwidth between the tester and the SOC is a bottleneck that can result in long test times when testing complex SOCs that contain many cores. In the proposed scheme, the test vectors for the SOC are stored in compressed form in the tester memory and transferred to the chip where they are decompressed and applied to the cores. A small amount of on-chip circuitry is used to decompress the test vectors. Given the set of test vectors for a core, a modified Huffman code is carefully selected so that it satisfies certain properties. These properties guarantee that the codewords can be decoded by a simple pipelined decoder (placed at the serial input of the core's scan chain) that requires very small area. Results indicate that the proposed scheme can provide test data compression nearly equal to that of an optimum Huffman code with much less area overhead for the decoder.
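
A toy sketch of the selective-coding idea, assuming fixed-size blocks and an escape codeword for patterns left uncoded; the block size and the number of coded patterns are illustrative, and the paper's code additionally satisfies properties that enable the simple pipelined decoder:

    import heapq
    from collections import Counter

    def huffman_codes(weights):
        """Symbol -> prefix-free codeword for the given symbol weights."""
        heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(weights.items())]
        heapq.heapify(heap)
        tick = len(heap)
        while len(heap) > 1:
            w0, _, c0 = heapq.heappop(heap)
            w1, _, c1 = heapq.heappop(heap)
            merged = {s: "0" + c for s, c in c0.items()}
            merged.update({s: "1" + c for s, c in c1.items()})
            heapq.heappush(heap, (w0 + w1, tick, merged))
            tick += 1
        return heap[0][2]

    def selective_huffman_encode(bits, block=4, n_coded=3):
        blocks = [bits[i:i + block] for i in range(0, len(bits), block)]
        counts = Counter(blocks)
        coded = dict(counts.most_common(n_coded))
        esc_weight = sum(c for p, c in counts.items() if p not in coded)
        codes = huffman_codes({**coded, "ESC": esc_weight})
        # Frequent blocks get short codewords; rare ones are sent verbatim
        # behind the escape codeword, keeping the decoder table small.
        return "".join(codes[b] if b in coded else codes["ESC"] + b
                       for b in blocks)

    print(selective_huffman_encode("0000" * 6 + "1111" * 3 + "0110"))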

281 citations


Journal ArticleDOI
TL;DR: This paper reports on experiences with extending model reduction techniques to nonlinear systems of differential-algebraic equations, specifically, systems representative of RF circuit components, relying generally on perturbational techniques to handle deviations from the linear time-invariant case.
Abstract: The problem of automated macromodel generation is interesting from the viewpoint of system-level design because if small, accurate reduced-order models of system component blocks can be extracted, then much larger portions of a design, or more complicated systems, can be simulated or verified than if the analysis were to have to proceed at a detailed level. The prospect of generating the reduced model from a detailed analysis of component blocks is attractive because then the influence of second-order device effects or parasitic components on the overall system performance can be assessed. In this way overly conservative design specifications can be avoided. This paper reports on experiences with extending model reduction techniques to nonlinear systems of differential-algebraic equations, specifically, systems representative of RF circuit components. The discussion proceeds from linear time-varying, to weakly nonlinear, to nonlinear time-varying analysis, relying generally on perturbational techniques to handle deviations from the linear time-invariant case. The main intent is to explore which perturbational techniques work, which do not, and to outline some problems that remain to be solved in developing robust, general nonlinear reduction methods.

255 citations


Journal ArticleDOI
TL;DR: This paper shows how to construct TBR-like methods that generate guaranteed passive reduced models and in addition are applicable to state-space systems with arbitrary internal structure.
Abstract: The major concerns in state-of-the-art model reduction algorithms are: achieving accurate models of sufficiently small size, numerically stable and efficient generation of the models, and preservation of system properties such as passivity. Algorithms, such as PRIMA, generate guaranteed-passive models for systems with special internal structure, using numerically stable and efficient Krylov-subspace iterations. Truncated balanced realization (TBR) algorithms, as used to date in the design automation community, can achieve smaller models with better error control, but do not necessarily preserve passivity. In this paper, we show how to construct TBR-like methods that generate guaranteed passive reduced models and in addition are applicable to state-space systems with arbitrary internal structure.
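
For reference, here is a compact square-root implementation of the classical TBR step the paper builds on (not its passivity-preserving variant), assuming a stable, minimal system dx/dt = Ax + Bu, y = Cx so that both Gramians are positive definite:

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov, cholesky, svd

    def balanced_truncation(A, B, C, q):
        """Keep the q largest Hankel singular values of (A, B, C)."""
        P = solve_continuous_lyapunov(A, -B @ B.T)     # controllability Gramian
        Q = solve_continuous_lyapunov(A.T, -C.T @ C)   # observability Gramian
        Zc = cholesky(P, lower=True)                   # P = Zc Zc^T
        Zo = cholesky(Q, lower=True)                   # Q = Zo Zo^T
        U, s, Vt = svd(Zo.T @ Zc)                      # s = Hankel singular values
        Tl = (U[:, :q] / np.sqrt(s[:q])).T @ Zo.T      # left projector, Tl @ Tr = I
        Tr = Zc @ Vt[:q].T / np.sqrt(s[:q])            # right projector
        return Tl @ A @ Tr, Tl @ B, C @ Tr, s

    # The truncated tail of s gives the classical H-infinity error bound,
    # 2 * s[q:].sum(), which is the "better error control" referred to above.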

254 citations


Journal ArticleDOI
TL;DR: This paper presents several heuristic techniques for efficient gate clustering in multithreshold CMOS circuits, modeling the problem via bin-packing (BP) and set-partitioning (SP) formulations, together with four hybrid clustering techniques that combine the BP and SP techniques to produce a more efficient solution.
Abstract: Reducing power dissipation is one of the most important issues in very large scale integration design today. Scaling causes subthreshold leakage currents to become a large component of total power dissipation. Multithreshold technology has emerged as a promising technique to reduce leakage power. This paper presents several heuristic techniques for efficient gate clustering in multithreshold CMOS circuits by modeling the problem via bin-packing (BP) and set-partitioning (SP) techniques. The SP technique takes the circuit's routing complexity into consideration, which is critical for deep submicron (DSM) implementations. Applying the techniques to six benchmarks to verify functionality, we obtain results indicating that the proposed techniques achieve average savings of 84% in leakage power and 12% in dynamic power. Furthermore, four hybrid clustering techniques that combine the BP and SP techniques to produce a more efficient solution are also devised. Ground bounce was also taken as a design parameter in the optimization problem. While accounting for noise, the proposed hybrid solution achieves on average 9% savings for dynamic power and 72% savings for leakage power dissipation at sufficient speeds and adequate noise margins.
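
A first-fit-decreasing sketch of the bin-packing view of the clustering step: gates, sized here by an assumed worst-case discharge current, are packed into sleep-transistor clusters of fixed current capacity (the numbers are illustrative, not the paper's data):

    def cluster_gates(currents, capacity):
        """First-fit decreasing: returns a list of clusters (gate index lists)."""
        bins = []  # each bin: [remaining capacity, [gate indices]]
        for g in sorted(range(len(currents)), key=lambda i: -currents[i]):
            for b in bins:
                if currents[g] <= b[0]:
                    b[0] -= currents[g]
                    b[1].append(g)
                    break
            else:
                bins.append([capacity - currents[g], [g]])
        return [members for _, members in bins]

    # Six gates with normalized peak currents; sleep-transistor budget of 1.0:
    print(cluster_gates([0.8, 0.5, 0.4, 0.4, 0.3, 0.2], capacity=1.0))

Fewer clusters mean fewer or narrower sleep transistors, which is where the leakage saving comes from; the set-partitioning variant would additionally weight assignments by routing cost.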

189 citations


Journal ArticleDOI
TL;DR: Two regular circuit structures based on the programmable logic array (PLA) are proposed, which provide alternatives to the widely used standard-cell structure and have better predictability and simpler design methodologies.
Abstract: Two regular circuit structures based on the programmable logic array (PLA) are proposed. They provide alternatives to the widely used standard-cell structure and have better predictability and simpler design methodologies. A Whirlpool PLA is a cyclic four-level structure, which has a compact layout. Doppio-ESPRESSO, a four-level logic minimization algorithm, is developed for the synthesis of Whirlpool PLAs. A River PLA is a stack of multiple-output PLAs, which uses river routing for the interconnections between adjacent PLAs. A synthesis algorithm for River PLAs uses multilevel logic synthesis, simulated annealing, and ESPRESSO, targeting a combination of minimal area and delay.

186 citations


Journal ArticleDOI
TL;DR: This paper presents a new compression method for embedded core-based system-on-a-chip test based on a new variable-length input Huffman coding scheme, which proves to be the key element that determines all the factors that influence the TDCE parameters.
Abstract: This paper presents a new compression method for embedded core-based system-on-a-chip test. In addition to the new compression method, this paper analyzes the three test data compression environment (TDCE) parameters: compression ratio, area overhead, and test application time, and explains the impact of the factors which influence these three parameters. The proposed method is based on a new variable-length input Huffman coding scheme, which proves to be the key element that determines all the factors that influence the TDCE parameters. Extensive experimental comparisons show that, when compared with three previous approaches, which reduce some TDCE parameters at the expense of the others, the proposed method is capable of improving all three TDCE parameters simultaneously.

178 citations


Journal ArticleDOI
TL;DR: A rigorous analysis is presented to show that the proposed TRP technique reduces testing time compared to a conventional scan-based scheme, and improves upon prior work on run-length coding by showing that test sets that minimize switching activity during scan shifting can be more efficiently compressed using alternating run-length codes.
Abstract: We present a test resource partitioning (TRP) technique that simultaneously reduces test data volume, test application time, and scan power. The proposed approach is based on the use of alternating run-length codes for test data compression. We present a formal analysis of the amount of data compression obtained using alternating run-length codes. We show that a careful mapping of the don't-cares in precomputed test sets to 1's and 0's leads to significant savings in peak and average power, without requiring either a slower scan clock or blocking logic in the scan cells. We present a rigorous analysis to show that the proposed TRP technique reduces testing time compared to a conventional scan-based scheme. We also improve upon prior work on run-length coding by showing that test sets that minimize switching activity during scan shifting can be more efficiently compressed using alternating run-length codes. Experimental results for the larger ISCAS89 benchmarks and an IBM production circuit show that reduced test data volume, test application time, and low-power scan testing can indeed be achieved in all cases.
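
A toy sketch of the mapping-plus-run-length idea: each don't-care is mapped to the value of the current run (minimizing transitions, hence scan power), and the filled vector is then described by run lengths of alternating polarity; the paper encodes these lengths with variable-length codewords, whereas plain integers are shown here:

    def fill_and_run_lengths(test_cube, start='0'):
        """Map X's to extend the current run, then emit alternating run lengths."""
        filled, cur = [], start
        for bit in test_cube:
            if bit == 'X':
                bit = cur            # extending the run adds no transition
            filled.append(bit)
            cur = bit
        runs, val, n = [], start, 0  # polarity alternates 0,1,0,1,... from `start`
        for bit in filled:
            if bit == val:
                n += 1
            else:
                runs.append(n)
                val, n = bit, 1
        runs.append(n)
        return ''.join(filled), runs

    print(fill_and_run_lengths("0XX0011XX1"))  # ('0000011111', [5, 5])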

177 citations


Journal ArticleDOI
TL;DR: An overview of methods to automatically generate posynomial response surface models for the performance characteristics of analog integrated circuits based on numerical simulation data, capable of generating posynomial performance expressions for both linear and nonlinear circuits and circuit characteristics, at SPICE-level accuracy.
Abstract: This paper presents an overview of methods to automatically generate posynomial response surface models for the performance characteristics of analog integrated circuits based on numerical simulation data. The methods are capable of generating posynomial performance expressions for both linear and nonlinear circuits and circuit characteristics, at SPICE-level accuracy. This approach allows for the automatic generation of an accurate sizing model for a circuit, from which a geometric program that fully describes the analog circuit sizing problem can be composed. The automatic generation avoids the time-consuming and approximate nature of handcrafted analytic model generation. The methods are based on techniques from design of experiments and response surface modeling. Attention is paid to estimating the relative "goodness-of-fit" of the generated models. Experimental results illustrate the capabilities and effectiveness of the presented methods.
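
A minimal sketch of one way such a model can be fitted, assuming a fixed set of monomial exponent vectors: nonnegative least squares keeps every coefficient c_k >= 0, so the fitted surface remains a valid posynomial and can be used directly inside a geometric program (the exponent set and sample data are placeholders):

    import numpy as np
    from scipy.optimize import nnls

    def fit_posynomial(X, y, exponents):
        """X: (m, d) strictly positive design points; y: (m,) responses;
        exponents: (K, d) monomial powers a_k.  Returns c with c_k >= 0 in
        f(x) = sum_k c_k * prod_i x_i ** a_ki."""
        M = np.exp(np.log(X) @ np.asarray(exponents, float).T)
        c, residual = nnls(M, y)
        return c, residual

    def eval_posynomial(x, c, exponents):
        return float(c @ np.exp(np.log(x) @ np.asarray(exponents, float).T))

    rng = np.random.default_rng(0)
    X = rng.uniform(0.5, 2.0, size=(200, 2))
    y = 3 * X[:, 0] ** 2 / X[:, 1] + 0.5 * X[:, 1]   # already a posynomial
    c, _ = fit_posynomial(X, y, [(2, -1), (0, 1), (1, 0)])
    print(c)   # close to [3, 0.5, 0] on this noiseless example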

171 citations


Journal ArticleDOI
TL;DR: This paper provides a formal definition of the statistical delay of a circuit, derives a statistical timing analysis method from this definition, and proposes a new method for computing statistical bounds that has linear run time complexity.
Abstract: The growing impact of within-die process variation has created the need for statistical timing analysis, where gate delays are modeled as random variables. Statistical timing analysis has traditionally suffered from exponential run time complexity with circuit size, due to arrival time dependencies created by reconverging paths in the circuit. In this paper, we propose a new approach to statistical timing analysis which uses statistical bounds and selective enumeration to refine these bounds. First, we provide a formal definition of the statistical delay of a circuit and derive a statistical timing analysis method from this definition. Since this method for finding the exact statistical delay has exponential run time complexity with circuit size, we also propose a new method for computing statistical bounds which has linear run time complexity. We prove the correctness of the proposed bounds. Since we provide both a lower and upper bound on the true statistical delay, we can determine the quality of the bounds. If the computed bounds are not sufficiently close to each other, we propose a heuristic to iteratively improve the bounds using selective enumeration of the sample space with additional run time. The proposed methods were implemented and tested on benchmark circuits. The results demonstrate that the proposed bounds have only a small error, which can be further reduced using selective enumeration with modest additional run time.
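
For intuition, here is a brute-force Monte Carlo reference for the quantity being bounded (not the paper's linear-time algorithm): reconvergent fanout makes arrival times correlated, which is exactly what defeats naive analytical propagation. The gate-delay parameters below are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_arrival_times(gates, fanin, n=100_000):
        """gates: topologically ordered {id: (mean, sigma)} of normal delays;
        fanin: id -> predecessor ids.  Arrival = max over fanin + own delay."""
        delay = {g: rng.normal(m, s, n) for g, (m, s) in gates.items()}
        arrival = {}
        for g in gates:
            preds = [arrival[p] for p in fanin.get(g, [])]
            arrival[g] = (np.maximum.reduce(preds) if preds else 0.0) + delay[g]
        return arrival

    # Reconvergence: a fans out to b and c, which reconverge at d, so the
    # arrival times at b and c share a's randomness and are correlated.
    gates = {"a": (1.0, 0.2), "b": (2.0, 0.3), "c": (2.0, 0.3), "d": (1.0, 0.1)}
    arr = sample_arrival_times(gates, {"b": ["a"], "c": ["a"], "d": ["b", "c"]})
    print(np.percentile(arr["d"], 99))   # empirical 99th-percentile delay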

Journal ArticleDOI
TL;DR: This paper presents an approach to the wordlength allocation and optimization problem for linear digital signal processing systems implemented as custom parallel processing units, proposing two techniques: one that guarantees an optimum set of wordlengths for each internal variable and one that is a heuristic.
Abstract: This paper presents an approach to the wordlength allocation and optimization problem for linear digital signal processing systems implemented as custom parallel processing units. Two techniques are proposed: one that guarantees an optimum set of wordlengths for each internal variable, and one that is a heuristic approach. Both techniques allow the user to trade off implementation area against arithmetic error at system outputs. Optimality (with respect to the area and error estimates) is guaranteed through modeling as a mixed integer linear program. It is demonstrated that the proposed heuristic leads to area improvements of 6% to 45% combined with speed increases compared to the optimum uniform wordlength design. In addition, the heuristic reaches within 0.7% of the optimum multiple wordlength area over a range of benchmark problems.

Journal ArticleDOI
TL;DR: Experimental results show that power grid noise can be significantly reduced after a judicious optimization of decap placement, with little change in the total chip area.
Abstract: With technology scaling, the trend for high-performance integrated circuits is toward ever higher operating frequency, lower power supply voltages, and higher power dissipation. This causes a dramatic increase in the currents being delivered through the on-chip power grid and is recognized in the 2001 International Technology Roadmap for Semiconductors as one of the difficult challenges. The addition of decoupling capacitances (decaps) is arguably the most powerful degree of freedom that a designer has for power-grid noise abatement and is becoming more important as technology scales. In this paper, we propose and demonstrate an algorithm for the automated placement and sizing of decaps in application specific integrated circuit (ASIC)-like circuits. The problem is formulated as one of nonlinear optimization and is solved using a sensitivity-based quadratic programming (QP) solver. The adjoint sensitivity method is applied to calculate the first-order sensitivities. We propose a fast convolution technique based on piecewise linear (PWL) compressions of the original and adjoint waveforms. Experimental results show that power grid noise can be significantly reduced after a judicious optimization of decap placement, with little change in the total chip area.

Journal ArticleDOI
TL;DR: In this article, the perturbation projection vector (PPV) is computed using a single linear solution of the oscillator's time- or frequency-domain steady-state Jacobian matrix.
Abstract: The main effort in oscillator phase noise calculation and macromodeling lies in computing a vector function called the perturbation projection vector (PPV). Current techniques for PPV calculation use time-domain numerics to generate the system's monodromy matrix, followed by full or partial eigenanalysis. We present superior methods that find the PPV using only a single linear solution of the oscillator's time- or frequency-domain steady-state Jacobian matrix. The new methods are better suited for implementation in existing tools with harmonic balance or shooting capabilities (especially those incorporating "fast" variants), and can also be more accurate than explicit eigenanalysis. A key advantage is that they dispense with the need to select the correct eigenfunction from amongst a potentially large set of choices, an issue that explicit eigencalculation-based methods have to face. We illustrate the new methods in detail using LC and ring oscillators.

Journal ArticleDOI
TL;DR: A new method is described that gives the designer access to the design space boundaries of a circuit topology, all with transistor-level accuracy: using multiobjective genetic optimization, the hypersurface of Pareto-optimal design points is calculated.
Abstract: A new method is described which gives the designer access to the design space boundaries of a circuit topology, all with transistor-level accuracy. Using multiobjective genetic optimization, the hypersurface of Pareto-optimal design points is calculated. Tradeoff analysis of competing performances at the design space boundaries is made possible by the application of multivariate regression techniques. This new methodology is illustrated with the presentation of the design space for two different types of circuits: a Miller-compensated operational transconductance amplifier and an LC-tank voltage-controlled oscillator.
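
The filtering kernel at the heart of any such multiobjective optimizer, sketched for minimization objectives (performances to be maximized, such as gain, would be negated first); the data points are illustrative:

    import numpy as np

    def pareto_mask(points):
        """points: (n, k) objective vectors, smaller is better.
        Returns a boolean mask of the nondominated (Pareto-optimal) rows."""
        pts = np.asarray(points, float)
        mask = np.ones(len(pts), dtype=bool)
        for i, p in enumerate(pts):
            if mask[i]:
                dominated = np.all(pts >= p, axis=1) & np.any(pts > p, axis=1)
                mask &= ~dominated   # drop every point that p dominates
        return mask

    # e.g. (power, phase noise) samples from a VCO sizing sweep:
    pts = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (2.5, 2.5)]
    print(pareto_mask(pts))   # [ True  True  True False]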

Journal ArticleDOI
TL;DR: A computer-aided design methodology for optimizing MOS transistor current and sizing is presented, where drain current ID, inversion level (represented by the inversion coefficient IC), and channel length L are selected as three independent degrees of design freedom, resulting in an optimized selection of channel width for layout.
Abstract: A computer-aided design (CAD) methodology for optimizing MOS transistor current and sizing is presented where drain current ID, inversion level (represented by inversion coefficient IC), and channel length L are selected as three independent degrees of design freedom resulting in an optimized selection of channel width for layout. At a given drain current ID in saturation, a selected MOS inversion coefficient IC and channel length L define a point on an operating plane illustrating dramatic tradeoffs in circuit performance. Operation in the region of low inversion coefficient IC and long channel length L results in optimal DC gain and matching compared to the region of high inversion coefficient IC and short channel length L where bandwidth is optimal. A design methodology is presented here to enable optimum design choices throughout the continuum of inversion level IC (weak, moderate, or strong inversion) and available channel length L. The methodology is implemented in a prototype CAD system where a graphical view permits the designer to explore optimum tradeoffs against preset goals for circuit transconductance gm, output conductance gds, drain-source saturation voltage, gain, bandwidth, white and flicker noise, and DC matching for a 0.5-μm CMOS process. The design methodology can be readily extended to deeper submicron MOS processes through linkage to the EKV or BSIM3 MOS models or custom model equations.
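
A hedged sketch of the sizing arithmetic underlying this flow: choosing ID, IC, and L fixes W through the technology current, and a textbook EKV-style interpolation then predicts gm across weak, moderate, and strong inversion. The constants and the interpolation formula below are generic EKV approximations with illustrative values, not the paper's calibrated 0.5-μm model:

    import math

    I0 = 0.64e-6    # technology current, I0 = 2*n*mu*Cox*UT^2 [A] (illustrative)
    n_slope = 1.4   # subthreshold slope factor (illustrative)
    UT = 0.0259     # thermal voltage at room temperature [V]

    def width_from_ic(ID, IC, L):
        """IC = ID / (I0 * W/L)  =>  W = L * ID / (I0 * IC)."""
        return L * ID / (I0 * IC)

    def gm_ekv(ID, IC):
        """EKV interpolation gm = ID / (n * UT * (0.5 + sqrt(0.25 + IC))):
        tends to ID/(n*UT) in weak inversion and falls as 1/sqrt(IC)
        in strong inversion."""
        return ID / (n_slope * UT * (0.5 + math.sqrt(0.25 + IC)))

    # Moderate inversion (IC = 1) at ID = 10 uA with L = 1 um:
    print(width_from_ic(10e-6, 1.0, 1e-6), gm_ekv(10e-6, 1.0))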

Journal ArticleDOI
TL;DR: This work presents a four-stage resource-allocation heuristic, called resource allocation for buffer and interconnect distribution, that includes a new, efficient technique for buffer insertion using a length-based constraint.
Abstract: As technology scales, interconnect-centric design flows become imperative for achieving timing closure. Preplanning buffers and wires in the layout is critical for such flows. Both buffers and wires must be considered simultaneously, since wire routes determine buffer requirements and buffer locations constrain the wire routes. In contrast to recently proposed buffer-block planning approaches, our novel design methodology distributes a set of buffer sites throughout the design. This allows one to use a tile graph to abstract the buffer planning problem and simultaneously address wire planning. We present a four-stage resource-allocation heuristic, called resource allocation for buffer and interconnect distribution, that includes a new, efficient technique for buffer insertion using a length-based constraint. Extensive experiments validate the effectiveness of this approach.

Journal ArticleDOI
TL;DR: Experimental studies demonstrate that neural network modeling is an effective, fast, and accurate methodology for performance estimation of CMOS operational amplifier topologies.
Abstract: Fast and accurate performance estimation methods are essential to automated synthesis of analog circuits. Development of analog performance models is difficult due to the highly nonlinear nature of various analog performance parameters. This paper presents a neural network-based methodology for creating fast and efficient models for estimating the performance parameters of CMOS operational amplifier topologies. Effective methods for generation and use of the training data are proposed to enhance the accuracy of the neural models. The efficiency and accuracy of the resulting performance models are demonstrated via their use in a genetic algorithm-based circuit synthesis system. The genetic synthesis tool optimizes a fitness function based on user-specified performance constraints. The performance parameters of the synthesized circuits are validated by SPICE simulations and compared with those predicted by the neural network models. Experimental studies demonstrate that neural network modeling is an effective, fast, and accurate methodology for performance estimation.
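
A minimal sketch of the surrogate-modeling flow using a generic scikit-learn network; the synthetic features and target below merely stand in for (sizing, bias) inputs and SPICE-simulated performances:

    import numpy as np
    from sklearn.neural_network import MLPRegressor
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(1)
    X = rng.uniform(1.0, 10.0, size=(500, 4))          # stand-in design variables
    y = X[:, 0] * np.log(X[:, 1]) + X[:, 2] / X[:, 3]  # stand-in "simulated" gain

    scaler = StandardScaler().fit(X)                   # scaling helps convergence
    model = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000,
                         random_state=0).fit(scaler.transform(X), y)

    # Inside a genetic loop, each fitness evaluation becomes a cheap predict()
    # instead of a full SPICE run:
    print(model.predict(scaler.transform(X[:3])), y[:3])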

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the proposed sequence-of-linear-program method is orders of magnitude faster than the best-known method based on conjugate gradients, with consistently better solution quality.
Abstract: This paper presents a new method of sizing the widths of the power and ground routes in integrated circuits so that the chip area required by the routes is minimized subject to electromigration and IR voltage drop constraints. The basic idea is to transform the underlying constrained nonlinear programming problem into a sequence of linear programs. Theoretically, we show that the sequence of linear programs always converges to the optimum solution of the relaxed convex optimization problem. Experimental results demonstrate that the proposed sequence-of-linear-program method is orders of magnitude faster than the best-known method based on conjugate gradients, with consistently better solution quality.

Journal ArticleDOI
TL;DR: In this article, the authors propose a scheme for fast "opportunistic" symmetry extraction and also show that considerations of symmetry may lead to more efficient reductions to SAT in the VLSI routing domain.
Abstract: Research in algorithms for Boolean satisfiability (SAT) and their implementations (Goldberg and Novikov, 2002), (Moskewicz et al., 2001), (Silva and Sakallah, 1999) has recently outpaced benchmarking efforts. Most of the classic DIMACS benchmarks (ftp:dimacs.rutgers.edu/pub/challenge/sat/benchmarks/cnf) can now be solved in seconds on commodity PCs. More recent benchmarks (Velev and Bryant, 2001) take longer to solve due to their large size, but are still solved in minutes. Yet, relatively small and difficult SAT instances must exist if P ≠ NP. To this end, our paper articulates SAT instances that are unusually difficult for their size, including satisfiable instances derived from very large scale integration (VLSI) routing problems. With an efficient implementation to solve the graph automorphism problem (McKay, 1990), (Soicher, 1993), (Spitznagel, 1994), we show that in structured SAT instances, difficulty may be associated with large numbers of symmetries. We point out that a previously published symmetry extraction mechanism (Crawford et al., 1996) based on a reduction to the graph automorphism problem often produces many spurious symmetries. Our paper contributes two new reductions to graph automorphism, which extract all correct symmetries found previously (Crawford et al., 1996) as well as phase-shift symmetries not found earlier. The correctness of our reductions is rigorously proven, and they are evaluated empirically. We also formulate an improved construction of symmetry-breaking clauses in terms of permutation cycles and propose to use only generators of symmetries in this process. These ideas are implemented in a fully automated flow that first extracts symmetries from a given SAT instance, preprocesses it by adding symmetry-breaking clauses, and then calls a state-of-the-art backtrack SAT solver. Significant speed-ups are shown on many benchmarks versus direct application of the solver. In an attempt to further improve the practicality of our approach, we propose a scheme for fast "opportunistic" symmetry extraction and also show that considerations of symmetry may lead to more efficient reductions to SAT in the VLSI routing domain.
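
As a hedged illustration of the cheapest end of this spectrum (not the paper's full cycle-based construction): for each symmetry generator one can add a single clause constraining the smallest variable the generator moves to be no larger than its image, which prunes symmetric assignments while preserving satisfiability. DIMACS convention: a positive integer is a variable, a negative one its negation:

    def partial_symmetry_breaking(generators):
        """generators: list of dicts mapping variable -> image under a
        symmetry of the CNF formula.  Emits one binary clause per generator
        encoding x_a <= x_{g(a)} for the smallest moved variable a."""
        clauses = []
        for g in generators:
            moved = [v for v, img in g.items() if img != v]
            if moved:
                a = min(moved)
                clauses.append([-a, g[a]])   # (not x_a) or x_{g(a)}
        return clauses

    # A swap symmetry between variables 1 and 2 (say, two interchangeable
    # routing choices) yields the single clause [-1, 2]:
    print(partial_symmetry_breaking([{1: 2, 2: 1}]))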

Journal ArticleDOI
TL;DR: An approach is presented for the high-level simulation and synthesis of discrete-time ΔΣ modulators based on a simulation-based optimization strategy that determines both the optimum modulator topology and the required building block specifications such that the system specifications are satisfied at the lowest possible power consumption.
Abstract: An approach is presented for the high-level simulation and synthesis of discrete-time ΔΣ modulators based on a simulation-based optimization strategy. The high-level synthesis approach determines both the optimum modulator topology and the required building block specifications, such that the system specifications (mainly accuracy, i.e., dynamic range, and signal bandwidth) are satisfied at the lowest possible power consumption. A genetic-based differential evolution algorithm is used in combination with a fast dedicated behavioral simulator to realistically analyze and optimize the modulator performance. The approach has been implemented in a tool called Daisy (Delta-Sigma Analysis and Synthesis). Experimental results are shown for both the analysis and synthesis capabilities, illustrating the effectiveness of the approach. The range of optimized ΔΣ modulator topologies selected as a function of the modulator specifications, over a wide range of values, indicates the capabilities of the tool and the performance range it covers.
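
The inner loop of such a behavioral simulator, reduced to its simplest form: an ideal first-order, 1-bit discrete-time ΔΣ modulator (a real behavioral model adds the integrator and quantizer nonidealities the optimizer trades against power):

    import numpy as np

    def first_order_dsm(u):
        """Ideal first-order delta-sigma loop: integrate the error between
        the input and the fed-back 1-bit DAC value, then quantize."""
        integ, v, out = 0.0, 0.0, []
        for x in u:
            integ += x - v
            v = 1.0 if integ >= 0.0 else -1.0
            out.append(v)
        return np.array(out)

    N, ratio = 1 << 14, 256
    u = 0.5 * np.sin(2 * np.pi * np.arange(N) / ratio)
    bits = first_order_dsm(u)
    # An FFT of `bits` shows the quantization noise shaped away from the
    # low-frequency signal band at 20 dB/decade.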

Journal ArticleDOI
TL;DR: This work presents two online I/O device schedulers for hard real-time systems, called LEDES and MUSCLES, that reduce the energy consumption of I/O devices while guaranteeing that real-time constraints are not violated.
Abstract: Energy consumption is an important design parameter for embedded and portable systems. Software-controlled (or dynamic) power management (DPM) has emerged as an attractive alternative to inflexible hardware solutions. However, DPM via I/O device scheduling for real-time systems has not been considered before. We present an online I/O device scheduler, which we call low-energy device scheduler (LEDES), for hard real-time systems that reduces the energy consumption of I/O devices. LEDES takes as inputs a predetermined task schedule and a device-usage list for each task and it generates a sequence of sleep/working states for each device such that the energy consumption of the device is minimized. It also guarantees that real-time constraints are not violated. We then present a more general I/O device scheduler, which we call multistate constrained low-energy scheduler (MUSCLES), for handling I/O devices with multiple power states. MUSCLES generates a sequence of power states for each I/O device while guaranteeing that real-time constraints are not violated. We present several realistic case studies to show that LEDES and MUSCLES reduce energy consumption significantly for hard real-time systems.
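
The basic timing test such a scheduler applies at every decision point, sketched under simple assumptions (a known next-use time from the precomputed task schedule, a single sleep state, and a device break-even time); the real LEDES/MUSCLES state machines cover multiple power states and more cases:

    def next_device_state(now, next_use, wakeup_latency, breakeven):
        """Sleep only if the idle gap both saves energy (>= break-even time)
        and leaves room to wake up before the next request (>= wakeup
        latency), so no real-time deadline can be missed."""
        gap = next_use - now
        return "sleep" if gap >= max(breakeven, wakeup_latency) else "working"

    # 2 ms wakeup, 5 ms break-even, next request 12 ms away -> safe to sleep:
    print(next_device_state(now=0.0, next_use=12.0,
                            wakeup_latency=2.0, breakeven=5.0))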

Journal ArticleDOI
TL;DR: This paper proposes the first complete and nonredundant topological representation for nonslicing structures, which allows an exact number of irreducible empty rooms to be inserted into a mosaic floorplan such that every nonslicing floorplan can be obtained uniquely from one and only one mosaic floorplan.
Abstract: The efficiency and effectiveness of many floorplanning methods depend very much on the representation of the geometrical relationship between the modules. A good representation can shorten the searching process so that more accurate estimations on area and interconnect costs can be performed. The nonslicing floorplan is the most general kind of floorplan that is commonly used. Unfortunately, there has not yet been any complete and nonredundant topological representation for nonslicing structures. In this paper, we propose the first representation of this kind. Like some previous work (Zhou et al. 2001), we have also made use of a mosaic floorplan as an intermediate step. However, instead of including a more than sufficient number of extra dummy blocks in the set of modules (which would increase the size of the solution space significantly), our representation allows us to insert an exact number of irreducible empty rooms into a mosaic floorplan such that every nonslicing floorplan can be obtained uniquely from one and only one mosaic floorplan. The size of the solution space is only O(n! 2^(3n) / n^(1.5)), which is the size without empty room insertion, but every nonslicing floorplan can be generated uniquely and efficiently in linear time without any redundant representation.

Journal ArticleDOI
TL;DR: The experimental results demonstrate the difference in estimated circuit performance when power supply noise effects are considered versus when they are ignored, and indicate the need for considering power supply noise effects on delays during path selection and dynamic timing analysis.
Abstract: The performance of deep submicron designs can be affected by various parametric variations, manufacturing defects, noise or modeling errors that are all statistical in nature. In this paper, we propose a methodology to capture the effects of these statistical variations on circuit performance. It incorporates statistical information into timing analysis to compute the performance sensitivity of internal signals subject to a given type of defect, noise or variation sources. Next, we propose a novel path and segment selection methodology for delay testing based on the results of statistical performance sensitivity analysis. The objective of path/segment selection is to identify a small set of paths and segments such that the delay tests for the selected paths/segments guarantee the detection of performance failure. We apply the proposed path selection technique for selection of a set of paths for dynamic timing analysis considering power supply noise effects. Our experimental results demonstrate the difference in estimated circuit performance for the case when power supply noise effects are considered versus when these effects are ignored. Thus, they indicate the need for considering power supply noise effects on delays during path selection and dynamic timing analysis.

Journal ArticleDOI
TL;DR: A fast but reliable way to detect routing criticalities in very large scale integration chips is presented, using the congestion estimator for dynamic avoidance of routability problems in a single run of the placement algorithm.
Abstract: We present a fast but reliable way to detect routing criticalities in very large scale integration chips. In addition, we show how this congestion estimation can be incorporated into a partitioning-based placement algorithm. In contrast to previous approaches, we do not rerun parts of the placement algorithm or apply a postplacement optimization; instead, we use our congestion estimator for dynamic avoidance of routability problems in a single run of the placement algorithm. Computational experiments on chips with up to 1,300,000 cells are presented. The framework reduces the usage of the most critical routing edges by 9.0% on average, while the running time of the placement increases by about 8.7%. However, due to the smaller congestion, the running time of routing tools can be decreased drastically, so the total time for placement and (global) routing is decreased by 47% on average.
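
A sketch of the bounding-box style of fast congestion estimation (the paper's estimator is more refined than this): every net spreads one unit of expected routing demand uniformly over the tiles of its pin bounding box, and tiles whose accumulated demand approaches capacity flag routing criticalities:

    import numpy as np

    def congestion_map(nets, grid=(8, 8)):
        """nets: list of pin lists, each pin an (x, y) tile coordinate."""
        demand = np.zeros(grid)
        for pins in nets:
            xs, ys = zip(*pins)
            x0, x1 = min(xs), max(xs)
            y0, y1 = min(ys), max(ys)
            area = (x1 - x0 + 1) * (y1 - y0 + 1)
            demand[x0:x1 + 1, y0:y1 + 1] += 1.0 / area  # expected track usage
        return demand

    dm = congestion_map([[(0, 0), (3, 2)], [(1, 1), (2, 6)], [(0, 5), (7, 5)]])
    print(np.argwhere(dm > 0.2))   # tiles above an illustrative capacity threshold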

Journal ArticleDOI
TL;DR: A white space allocation approach is presented that dynamically assigns white space according to the congestion distribution of the placement and, combined with a multilevel placement flow, significantly improves placement routability and layout quality.
Abstract: The use of white space in fixed-die standard-cell placement is an effective way to improve routability. In this paper, we present a white space allocation approach that dynamically assigns white space according to the congestion distribution of the placement. In the top-down placement flow, white space is assigned to congested regions using smooth allocating functions. A post-allocation optimization step is taken to further improve placement quality. Experimental results show that the proposed allocation approach, combined with a multilevel placement flow, significantly improves placement routability and layout quality. A set of approaches for white space allocation is presented and compared in this paper. All of them are based on routability-driven methods; however, they vary in the allocation function and allocation aggressiveness. All the placement results are evaluated by feeding them into a widely used industrial router (Warp Route from Cadence). Comparisons are made between: 1) placement with and without white space allocation; 2) different white space allocation approaches; and 3) our placement flow, an industrial placement tool, and another state-of-the-art academic placement tool.

Journal ArticleDOI
P. Pavan, L. Larcher, M. Cuozzo, P. Zuliani, A. Conte 
TL;DR: This work presents a complete compact model based on an original procedure to calculate the floating gate potential in DC conditions, without the need of any capacitive coupling coefficient; the model is designed as a modular structure to simplify program/erase and reliability simulations.
Abstract: E²PROM memory devices are widely used in embedded applications. For an efficient design flow, a correct modeling of these memory cells in every operating condition becomes more and more important, especially due to power consumption limitations. Although E²PROM cells have been used for a long time, very few compact models have been developed. Here, we present a complete compact model based on an original procedure to calculate the floating gate potential in DC conditions, without the need of any capacitive coupling coefficient. This model is designed as a modular structure, so as to simplify program/erase and reliability simulations. Program/erase and leakage currents are included by means of simple voltage-controlled current sources implementing their analytical expressions. The model can be used to simulate memory cells both during read operation (DC conditions) and during program and erase (transient conditions), always giving very accurate results. We also show that, provided good descriptions of the degradation mechanisms are available, the same model can be used for reliability simulations, predicting charge loss due to tunnel oxide degradation.

Journal ArticleDOI
TL;DR: The authors show that the charge transfer through wire segments of a net can be calculated directly by solving a system of linear equations, derived from the nodal formulation of the circuit, thereby eliminating the need for time-domain simulation.
Abstract: With the increase in current densities, electromigration has become a critical concern in high-performance designs. Typically, electromigration analysis has involved time-domain simulation of drivers and interconnect to obtain average, root mean square (r.m.s.), and peak current values for each wire segment. However, this approach cannot be applied to large problem sizes where hundreds of thousands of nets must be analyzed, each consisting of many thousands of RC elements. The authors propose a static electromigration analysis approach. They show that the charge transfer through wire segments of a net can be calculated directly by solving a system of linear equations, derived from the nodal formulation of the circuit, thereby eliminating the need for time-domain simulation. The authors account for the different possible switching scenarios that give rise to unidirectional or bidirectional current by separating the charge transfer from the rising and falling transitions and also propose approaches for modeling multiple simultaneous switching drivers. They implemented the proposed static analysis approach in an industrial electromigration analysis tool that was used on a number of industrial circuits, including a large microprocessor.
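
The core linear-algebra step, sketched on a toy resistive net: integrating the nodal equations over a switching event gives G·phi = Q, where Q collects the charge injected at each node and phi is the time integral of the node voltages, so each segment's charge transfer follows from one solve instead of a transient simulation (values are illustrative):

    import numpy as np

    # Nodal conductance matrix (Siemens) of a 3-node interconnect tree;
    # node 0 is driven and also has a 1 S path to ground.
    G = np.array([[ 2.0, -1.0,  0.0],
                  [-1.0,  2.5, -1.5],
                  [ 0.0, -1.5,  1.5]])
    Q = np.array([1e-12, 0.0, 0.0])   # charge injected at node 0 per transition

    phi = np.linalg.solve(G, Q)       # phi[k] = integral of v_k(t) dt
    q_01 = 1.0 * (phi[0] - phi[1])    # charge through the 1 S segment 0-1
    # Dividing by the clock period (and weighting rising and falling
    # transitions separately) yields the average current for the
    # electromigration limit check.
    print(q_01)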

Journal ArticleDOI
TL;DR: Efficient computational methods for scattered point and meshless analysis of electrostatic microelectromechanical systems (MEMS) are presented, which are flexible, efficient, and attractive alternatives to conventional finite element/boundary element methods for self-consistent electromechanical analysis.
Abstract: We present efficient computational methods for scattered point and meshless analysis of electrostatic microelectromechanical systems (MEMS). Electrostatic MEM devices are governed by coupled mechanical and electrostatic energy domains. A self-consistent analysis of electrostatic MEMS is implemented by combining a finite cloud method-based interior mechanical analysis with a boundary cloud method (BCM)-based exterior electrostatic analysis. Lagrangian descriptions are used for both mechanical and electrostatic analyses. Meshless finite cloud and BCMs, combined with fast algorithms and Lagrangian descriptions, are flexible, efficient, and attractive alternatives compared to conventional finite element/boundary element methods for self-consistent electromechanical analysis. Numerical results are presented for MEM switches, a micromirror device, a lateral comb drive microactuator, and an electrostatic comb drive device. Simulation results are compared with experimental and previously reported data for many of the examples discussed in this paper and a good agreement is observed.

Journal ArticleDOI
TL;DR: This paper introduces a tool, SymSyn, that optimizes and maps data flow descriptions into data paths using complex arithmetic components and demonstrates how substitution can be used for multiexpression component sharing and CPD optimization.
Abstract: The growing market of multimedia applications has required the development of complex application-specified integrated circuits with significant data-path portions. Unfortunately, most high-level synthesis tools and methods cannot automatically synthesize data paths such that complex arithmetic library blocks are intelligently used. Namely, most arithmetic-level optimizations are not supported and they are left to the designer's ingenuity. In this paper, we show how symbolic algebra can be used to construct arithmetic-level decomposition algorithms. We introduce our tool, SymSyn, that optimizes and maps data flow descriptions into data paths using complex arithmetic components. SymSyn uses two new algorithms to find either minimal component mapping or minimal critical path delay (CPD) mapping of the data flow. In this paper, we give an overview of the proposed algorithms. We also show how symbolic manipulations such as tree-height-reduction, factorization, expansion, and Horner transformation are incorporated in the preprocessing step. Such manipulations are used as guidelines in initial library element selection to accelerate the proposed algorithms. Furthermore, we demonstrate how substitution can be used for multiexpression component sharing and CPD optimization.
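
The named manipulations map directly onto a general-purpose computer-algebra system; below is a sketch with SymPy stand-ins for them (SymSyn itself is the paper's tool, not a public library):

    import sympy as sp

    x, a, b, c = sp.symbols('x a b c')

    # Horner transformation turns a polynomial into a chain of
    # multiply-accumulate operations, a good match for MAC blocks:
    print(sp.horner(a * x**3 + b * x**2 + c * x, x))  # nested form x*(c + x*(b + a*x))

    # Factorization can trade a wide adder tree for a shorter multiplier chain:
    print(sp.factor(x**2 - 1))        # (x - 1)*(x + 1)

    # Expansion exposes common subexpressions for multiexpression sharing:
    print(sp.expand((x + 1) ** 2))    # x**2 + 2*x + 1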