Showing papers in "IEEE Transactions on Very Large Scale Integration Systems in 1997"


Journal ArticleDOI
TL;DR: In this article, the problem of testing VLSI integrated circuits in minimum time without exceeding their power ratings during test is considered, and a resource graph formulation is used for the test problem.
Abstract: This paper considers the problem of testing VLSI integrated circuits in minimum time without exceeding their power ratings during test. We use a resource graph formulation for the test problem. The solution requires finding a power-constrained schedule of tests. Two formulations of this problem are given as follows: (1) scheduling equal length tests with power constraints and (2) scheduling unequal length tests with power constraints. Optimum solutions are obtained for both formulations. Algorithms consist of four basic steps. First, a test compatibility graph is constructed from the resource graph. Second, the test compatibility graph is used to identify a complete set of time compatible tests with power dissipation information associated with each test. Third, from the set of compatible tests, lists of power compatible tests are extracted. Finally, a minimum cover table approach is used to find an optimum schedule of power compatible tests.

306 citations
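
As a rough illustration of the equal-length formulation described in the abstract above, the Python sketch below enumerates sets of tests that are pairwise time-compatible and fit under a power limit, then picks the smallest collection of such sets that covers every test. The test names, power values, compatibility pairs, and the brute-force search are all invented for illustration; this is not the authors' algorithm or data, and it only scales to a handful of tests.

```python
from itertools import combinations

def power_compatible_sets(power, compatible, p_max):
    """All non-empty sets of pairwise-compatible tests with total power <= p_max."""
    tests = sorted(power)
    result = []
    for r in range(1, len(tests) + 1):
        for group in combinations(tests, r):
            pairwise_ok = all(frozenset(pair) in compatible
                              for pair in combinations(group, 2))
            if pairwise_ok and sum(power[t] for t in group) <= p_max:
                result.append(frozenset(group))
    return result

def minimum_schedule(power, compatible, p_max):
    """Smallest collection of power-compatible sets covering every test (brute force)."""
    candidates = power_compatible_sets(power, compatible, p_max)
    all_tests = set(power)
    for k in range(1, len(all_tests) + 1):
        for choice in combinations(candidates, k):
            if set().union(*choice) == all_tests:
                return [sorted(s) for s in choice]
    return None  # happens only if a single test already exceeds p_max

# Hypothetical data: power per test and the pairs that may run concurrently.
power = {"t1": 3, "t2": 2, "t3": 4, "t4": 1}
compatible = {frozenset(p) for p in
              [("t1", "t2"), ("t1", "t4"), ("t2", "t4"), ("t3", "t4")]}
schedule = minimum_schedule(power, compatible, p_max=6)
print(schedule)
```

With equal-length tests, each chosen set occupies one time slot, so the number of sets is the schedule length. A test may appear in more than one chosen set because the search is a cover, not a partition.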


Journal ArticleDOI
TL;DR: Experimental results show that using four supply voltage levels on a number of standard benchmarks, an average energy saving of 53% can be obtained compared to using one fixed supply voltage level.
Abstract: We present a dynamic programming technique for solving the multiple supply voltage scheduling problem in both nonpipelined and functionally pipelined data-paths. The scheduling problem refers to the assignment of a supply voltage level (selected from a fixed and known number of voltage levels) to each operation in a data flow graph so as to minimize the average energy consumption for given computation time or throughput constraints or both. The energy model is accurate and accounts for the input pattern dependencies, re-convergent fanout induced dependencies, and the energy cost of level shifters. Experimental results show that using three supply voltage levels on a number of standard benchmarks, an average energy saving of 40.19% (with a computation time constraint of 1.5 times the critical path delay) can be obtained compared to using a single supply voltage level.

300 citations


Journal ArticleDOI
TL;DR: In this article, an instruction-level power analysis model is developed for an embedded digital signal processor (DSP) based on physical current measurements, and a scheduling technique based on the new instruction level power model is proposed.
Abstract: Power is becoming a critical constraint for designing embedded applications. Current power analysis techniques based on circuit-level or architectural-level simulation are either impractical or inaccurate to estimate the power cost for a given piece of application software. In this paper, an instruction-level power analysis model is developed for an embedded digital signal processor (DSP) based on physical current measurements. Significant points of difference have been observed between the software power model for this custom DSP processor and the power models that have been developed earlier for some general purpose commercial microprocessors. In particular, the effect of circuit state on the power cost of an instruction stream is more marked in the case of this DSP processor. In addition, the processor has special architectural features that allow dual memory accesses and packing of instructions into pairs. The energy reduction possible through the use of these features is studied. The on-chip Booth multiplier on the processor is a major source of energy consumption for DSP programs. A microarchitectural power model for the multiplier is developed and analyzed for further power minimization. In order to exploit all of the above effects, a scheduling technique based on the new instruction-level power model is proposed. Several example programs are provided to illustrate the effectiveness of this approach. Energy reductions varying from 26% to 73% have been observed. These energy savings are real and have been verified through physical measurement. It should be noted that the energy reduction essentially comes for free. It is obtained through software modification, and thus, entails no hardware overhead. In addition, there is no loss of performance since the running times of the modified programs either improve or remain unchanged.

284 citations


Journal ArticleDOI
TL;DR: In this article, the use of dynamically adjustable power supplies as a method to lower power dissipation in DSPs is analyzed, and power can be reduced substantially without sacrificing performance in fixed-throughput applications by slowing the clock and lowering supply voltage instead of idling when computational workload varies.
Abstract: The use of dynamically adjustable power supplies as a method to lower power dissipation in DSP is analyzed. Power can be reduced substantially without sacrificing performance in fixed-throughput applications by slowing the clock and lowering supply voltage instead of idling when computational workload varies. This can yield a typical power savings of 30-50%. If latency can be tolerated, buffering data and averaging processing rate can yield power reductions of an order of magnitude in some applications. Continuous variation of the supply voltage can be approximated by very crude quantization and dithering: a four-level controller is sufficient to get within a few percent of the optimal power savings. Significant savings are possible only if the voltage can be changed on the same time scale as the variations in workload. A chip has been fabricated and tested to verify the closed-loop functionality of a variable voltage system. The controller takes only 0.4 mm² and draws a maximum of 1 mW at 2 V with a 40 MHz clock. The control framework developed is applicable to generic DSP applications.

264 citations
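
The claim in the abstract above, that slowing the clock and lowering the supply beats running at full speed and then idling, can be checked with first-order arithmetic. The Python sketch below assumes a textbook speed model in which clock rate is proportional to (V - Vt)^2 / V, dynamic energy per operation proportional to V^2, zero power while idle, and no converter losses. The 3.3 V supply, 0.7 V threshold, and 50% workload are illustrative values, not figures from the paper.

```python
def relative_speed(v, v_t):
    """First-order CMOS speed model: clock rate proportional to (V - Vt)^2 / V."""
    return (v - v_t) ** 2 / v

def scaled_voltage(workload, v_max, v_t, tol=1e-9):
    """Lowest supply voltage that still delivers workload * full speed (bisection)."""
    target = workload * relative_speed(v_max, v_t)
    lo, hi = v_t, v_max  # speed is 0 at v_t and rises monotonically above it
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if relative_speed(mid, v_t) >= target:
            hi = mid
        else:
            lo = mid
    return hi

def power_ratio(workload, v_max, v_t):
    """Average power with voltage scaling divided by power when run-then-idle."""
    v = scaled_voltage(workload, v_max, v_t)
    # Both strategies complete the same operations per second, so the ratio of
    # average powers equals the ratio of energies per operation, (V / Vmax)^2.
    return (v / v_max) ** 2

ratio = power_ratio(workload=0.5, v_max=3.3, v_t=0.7)
print(f"scaled supply uses {ratio:.0%} of the run-then-idle power")
```

Under these assumptions a half-loaded processor needs a little under half the power of the run-then-idle strategy. The saving shrinks as the workload approaches 1 and vanishes at full load.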


Journal ArticleDOI
TL;DR: These fluctuations are shown to pose severe barriers to the scaling of supply voltage and channel length and thus, to the minimization of power dissipation and switching delay in multibillion transistor chips of the future, and can be reduced to some degree by selecting optimal values of channel width.
Abstract: Intrinsic fluctuations in threshold voltage, subthreshold swing, saturation drain current and subthreshold leakage of ultrasmall-geometry MOSFETs due to random placement of dopant atoms in the channel are examined using novel physical models and a Monte Carlo simulator. These fluctuations are shown to pose severe barriers to the scaling of supply voltage and channel length and thus, to the minimization of power dissipation and switching delay in multibillion transistor chips of the future. In particular, using the device technology and the level of integration projections of the National Technology Roadmap for Semiconductors for the next 15 years, standard and maximum deviations of threshold voltage, drive current, subthreshold swing and subthreshold leakage are shown to escalate to 40 and 600 mV, 10 and 100%, 2 and 20 mV/dec, and 10 and 10⁸%, respectively, in the 0.07 μm, 0.9 V complementary metal-oxide-semiconductor (CMOS) technology generation with 1.3-64 billion transistors on a chip in 2010. While these deviations can be reduced to some degree by selecting optimal values of channel width, the associated penalties in dynamic and static power, and in packing density demand improved MOSFET structures aimed at minimizing parameter deviations.

240 citations


Journal ArticleDOI
TL;DR: This work proposes several approaches to address the problem of power dissipation in high performance CMOS VLSI, and proposes codes that can be used on a class of terminated off-chip board-level buses with level signaling, or on tristate on-chip buses with level or transition signaling.
Abstract: Technology trends and especially portable applications are adding a third dimension (power) to the previously two-dimensional (speed, area) VLSI design space. A large portion of power dissipation in high performance CMOS VLSI is due to the inherent difficulties in global communication at high rates and we propose several approaches to address the problem. These techniques can be generalized at different levels in the design process. Global communication typically involves driving large capacitive loads which inherently require significant power. However, by carefully choosing the data representation, or encoding, of these signals, the average and peak power dissipation can be minimized. Redundancy can be added in space (number of bus lines), time (number of cycles) and voltage (number of distinct amplitude levels). The proposed codes can be used on a class of terminated off-chip board-level buses with level signaling, or on tristate on-chip buses with level or transition signaling.

188 citations
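
To make the idea of redundancy "in space" from the abstract above concrete, the Python sketch below uses bus-invert coding, a widely known code that adds one extra bus line. The abstract does not name its codes, so choosing bus-invert here is an assumption, as are the 8-bit width, the all-zero initial bus state, and the sample words.

```python
def bus_invert_encode(words, width):
    """Return (value_on_bus, invert_bit) pairs; invert when over half the lines would toggle."""
    mask = (1 << width) - 1
    prev = 0  # assumed initial state of the data lines
    encoded = []
    for w in words:
        toggles = bin((w ^ prev) & mask).count("1")
        if toggles > width // 2:
            sent, invert = ~w & mask, 1  # send the complement and raise the extra line
        else:
            sent, invert = w & mask, 0
        encoded.append((sent, invert))
        prev = sent
    return encoded

def bus_invert_decode(encoded, width):
    """Recover the original words from (value_on_bus, invert_bit) pairs."""
    mask = (1 << width) - 1
    return [(~sent & mask) if invert else sent for sent, invert in encoded]

def count_transitions(values, width):
    """Total number of line toggles when values are driven in sequence from all zeros."""
    mask = (1 << width) - 1
    prev, total = 0, 0
    for v in values:
        total += bin((v ^ prev) & mask).count("1")
        prev = v
    return total

WIDTH = 8
words = [0x00, 0xFF, 0x0F, 0xF0]
encoded = bus_invert_encode(words, WIDTH)
# Fold the invert bit in as a ninth line so its own toggles are counted too.
on_wire = [sent | (invert << WIDTH) for sent, invert in encoded]
plain = count_transitions(words, WIDTH)
coded = count_transitions(on_wire, WIDTH + 1)
print(plain, coded)  # 20 7
```

The sample sequence is chosen to be favourable to the code. On random data the saving is smaller, and the extra line costs area and its own switching energy.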


Journal ArticleDOI
TL;DR: It is found that circuits with a large number of critical paths and with a low logic depth are most sensitive to uncorrelated gate delay variations, and scenarios for future technologies show the increased impact of uncorrelated delay variations on digital design.
Abstract: The yield of low voltage digital circuits is found to be sensitive to local gate delay variations due to uncorrelated intra-die parameter deviations. Caused by statistical deviations of the doping concentration, they lead to more pronounced delay variations for minimum transistor sizes. Their influence on path delays in digital circuits is verified using a carry select adder test circuit fabricated in 0.5 and 0.35 μm complementary metal-oxide-semiconductor (CMOS) technologies with two different threshold voltages. The increase of the path delay variations for smaller device dimensions and reduced supply voltages as well as the dependence on the path length is shown. It is found that circuits with a large number of critical paths and with a low logic depth are most sensitive to uncorrelated gate delay variations. Scenarios for future technologies show the increased impact of uncorrelated delay variations on digital design. A reduction of the maximal clock frequency of 10% is found, for example, for highly pipelined systems realized in a 0.18-μm CMOS technology.

177 citations


Journal ArticleDOI
Olivier Coudert
TL;DR: In this article, the gate sizing algorithm (GS) is proposed to minimize the power consumption and/or the area of a circuit under some user-defined delay constraints, or to obtain the fastest circuit within a given power budget.
Abstract: Gate sizing has a significant impact on the delay, power dissipation, and area of the final circuit. It consists of choosing for each node of a mapped circuit a gate implementation in the library so that a cost function is optimized under some constraints. For instance, one wants to minimize the power consumption and/or the area of a circuit under some user-defined delay constraints, or to obtain the fastest circuit within a given power budget. Although this technology-dependent optimization has been investigated for years, the proposed approaches sometimes rely on assumptions, cost models, or algorithms that make them unrealistic or impossible to apply on real-life large circuits. We discuss here a gate sizing algorithm (GS), and show how it is used to achieve constrained optimization. It can be applied on large circuits within a reasonable CPU time, e.g., minimizing the power of a 10000-gate circuit under some delay constraint in 2 h.

162 citations


Journal ArticleDOI
TL;DR: A new family of temperature sensors will be presented, developed by the authors especially for the purpose of thermal monitoring of VLSI chips, characterized by the very low silicon area and the low power consumption.
Abstract: The paper presents appropriate sensors for the realization of the design principle of design for thermal testability (DfTT). After a short overview of the available CMOS temperature sensors, a new family of temperature sensors will be presented, developed by the authors especially for the purpose of thermal monitoring of VLSI chips. These sensors are characterized by the very low silicon area of about 0.003-0.02 mm² and the low power consumption (200 μW). The accuracy is in the order of 1°C. Using the frequency-output versions an easy interfacing of digital test circuitry is assured. They can be very easily incorporated into the usual test circuitry, via the boundary-scan architecture. The paper presents measured results obtained by the experimental circuits. The facilities provided by the sensor connected to the boundary-scan test circuitry are also demonstrated experimentally.

122 citations


Journal ArticleDOI
Wayne Wolf
TL;DR: A new, heuristic algorithm which simultaneously synthesizes the hardware and software architectures of a distributed system to meet a performance goal and minimize cost is described.
Abstract: Many embedded computers are distributed systems, composed of several heterogeneous processors and communication links of varying speeds and topologies. This paper describes a new, heuristic algorithm which simultaneously synthesizes the hardware and software architectures of a distributed system to meet a performance goal and minimize cost. The hardware architecture of the synthesized system consists of a network of processors of multiple types and arbitrary communication topology; the software architecture consists of an allocation of processes to processors and a schedule for the processes. Most previous work in co-synthesis targets an architectural template, whereas this algorithm can synthesize a distributed system of arbitrary topology. The algorithm works from a technology database which describes the available processors, communication links, I/O devices, and implementations of processes on processors. Previous work had proposed solving this problem by integer linear programming (ILP); our algorithm is much faster than ILP and produces high-quality results.

113 citations


Journal ArticleDOI
TL;DR: The latest advances in the SISSI package (simulator for integrated structures by simultaneous iteration) are discussed, including electro-thermal ac and transient simulation and the consideration of the thermal voltage of Si-Al contacts.
Abstract: Due to severe thermal problems of today's VLSI integrated circuits, the need for reliable and quick thermal, electro-thermal and logi-thermal simulation tools is increasing. In this paper, we discuss the latest advances in the SISSI package (simulator for integrated structures by simultaneous iteration), which is a tool developed originally for analog VLSI design. The improvements include electro-thermal ac and transient simulation and the consideration of the thermal voltage of Si-Al contacts. Furthermore, we introduce a new module of SISSI, LOGITHERM, which is aimed at the self-consistent logic and thermal simulation of large digital VLSI designs. The features of our simulator package are highlighted by simulation examples that are compared in most cases with measurement results.

Journal ArticleDOI
TL;DR: In this paper, a methodology for simulating the static and dynamic performance of integrated circuits in the presence of electro-thermal interactions on the integrated circuit die is presented, which is based on the coupling of a finite element method (FEM) program with a circuit simulator.
Abstract: The paper presents a methodology for simulating the static and dynamic performance of integrated circuits in the presence of electro-thermal interactions on the integrated circuit die. The technique is based on the coupling of a finite element method (FEM) program with a circuit simulator. In contrast to other known simulator couplings, a time step algorithm is used. Its implementation in simulation tools is described. The thermal modeling of the die/package structure and the extended modeling of the electronic circuit are discussed. Simulation results which indicate the capabilities of the methodology for electro-thermal simulation are compared to experimental results.

Journal ArticleDOI
TL;DR: In this paper, the authors advocate the use of instruction buffering as a power-saving technique for processors for signal processing and multimedia applications, based on the runtime characteristics of signal processing applications.
Abstract: Power consumption analyses of embedded processors indicate that a significant amount of power is consumed in accessing memory and in the control path. Based on this, and on the runtime characteristics of signal processing applications, we advocate the use of instruction buffering as a power-saving technique for processors for signal processing and multimedia applications. Two approaches, a decoded instruction buffer (DIB) and a decoded instruction cache, are considered. Performance improvements are provided for representative applications in speech processing, such as vector sum excited linear prediction (VSELP) and linear prediction coding coefficient computation (LPC), and for the two-dimensional (2-D) 8×8 DCT used in image compression. The reduction in power obtained is between 25 and 30%.

Journal ArticleDOI
TL;DR: A new algorithm that performs binding/allocation of communication units is presented that makes use of a cost function to evaluate different allocation alternatives and illustrates through an example the usefulness of the algorithm for allocating automatically different protocols within the same application system.
Abstract: The aim of this paper is to present a communication synthesis approach stated as an allocation problem. In the proposed approach, communication synthesis transforms a system composed of processes that communicate via high-level primitives through abstract channels into a set of processes executed by interconnected processors that communicate via signals and share communication control. The proposed communication synthesis approach deals with both protocol selection and interface generation and is based on binding/allocation of communication units. This approach allows a wide design space exploration through automatic selection of communication protocols. We present a new algorithm that performs binding/allocation of communication units. This algorithm makes use of a cost function to evaluate different allocation alternatives. We illustrate through an example the usefulness of the algorithm for automatically allocating different protocols within the same application system.

Journal ArticleDOI
TL;DR: Fully coupled dynamic electro-thermal simulation on chip and circuit level is presented in this paper, where temperature dependent thermal conductivity of silicon is taken into account, thus solving the nonlinear heat diffusion equation.
Abstract: Fully coupled dynamic electro-thermal simulation on chip and circuit level is presented. Temperature dependent thermal conductivity of silicon is taken into account, thus solving the nonlinear heat diffusion equation. The numerical solution is carried out by using the industry-standard simulator SABER, therefore for electro-thermal simulations we are able to use the common electrical compact models by adding a heat source and thermal pins to them. The application of this technique and need for electro-thermal simulation is illustrated with the simulation of a current control circuit built into a multiwatt package.

Journal ArticleDOI
TL;DR: A quiet logic family, complementary metal-oxide-semiconductor (CMOS) current steering logic (CSL), has been developed for use in low-voltage mixed-signal integrated circuits as mentioned in this paper.
Abstract: A quiet logic family, complementary metal-oxide-semiconductor (CMOS) current steering logic (CSL), has been developed for use in low-voltage mixed-signal integrated circuits. Compared to a CMOS static logic gate with its output range of ΔV_logic ≈ V_dd, a CSL gate swings only ΔV_logic ≈ V_T + 0.25 V because the constant current supplied by the PMOS load device is steered to ground through either an NMOS diode-connected device or switching network. Owing to the constant current, digital switching noise is 100× smaller than in static logic. Another useful feature, which can be used to calibrate CSL speed against process, temperature, and voltage variations, is propagation delay that is approximately constant versus supply voltage and linear with bias current. Several CSL circuits have been fabricated using 0.8 and 1.2 μm high-V_T n-well CMOS processes. Two self-loaded 39-stage ring oscillators fabricated using the 1.2 μm process (1.2 V power supply) exhibited power-delay products of 12 and 70 fJ with average propagation delays of 0.4 and 0.7 ns, respectively. High-V_T and low-V_T CSL ALUs were operational at V_dd ≈ 0.70 V and V_dd ≈ 0.40 V, respectively.

Journal ArticleDOI
TL;DR: This paper introduces a novel approach to the design of memory systems, which is based on a variety of array grouping techniques and dimensional transformations, and the binding of array groups to memory components with different dimensions, access times, and number of ports.
Abstract: This paper discusses the mapping of arrays in a behavior to memories in an implementation. We introduce a novel approach to the design of memory systems, which is based on a variety of array grouping techniques and dimensional transformations, and the binding of array groups to memory components with different dimensions, access times, and number of ports. The results of design actions are computed in terms of memory cost, the number of wires necessary to connect the memory to the data path, and the limit of performance imposed by the memory design on the implementation. Three different procedures can be used to find a suitable memory design. All three procedures are directed by a weighted and constrained system cost function, which enables the expression of the user's design priorities. Compared to related research efforts, our approach improves performance by as much as 19%, reduces memory cost by as much as 40%, and decreases the number of wires required to connect the memory to the data path by up to 57%.

Journal ArticleDOI
TL;DR: Hierarchical interconnection structures for field programmable gate arrays are proposed, and experiments on benchmark circuits show that density and performance are significantly improved.
Abstract: Field programmable gate arrays (FPGA's) suffer from lower density and lower performance than conventional gate arrays. Hierarchical interconnection structures for field programmable gate arrays are proposed. They help overcome these problems. Logic blocks in a field programmable gate array are grouped into clusters. Clusters are then recursively grouped together. To obtain the optimal hierarchical structure with high performance and high density, various hierarchical structures with the same routability are discussed. The field programmable gate arrays with new architecture can be efficiently configured with existing computer aided design algorithms. The k-way min-cut algorithm is applicable to the placement step in the implementation. Global routing paths in a field programmable gate array can be obtained easily. The placement and global routing steps can be performed simultaneously. Experiments on benchmark circuits show that density and performance are significantly improved.

Journal ArticleDOI
TL;DR: This paper proposes a novel approach to exact local code compaction based on an integer programming (IP) model, which handles time constraints and applies to a large class of instruction formats.
Abstract: This paper addresses instruction-level parallelism in code generation for digital signal processors (DSPs). In the presence of potential parallelism, the task of code generation includes code compaction, which parallelizes primitive processor operations under given dependency and resource constraints. Furthermore, DSP algorithms in most cases are required to guarantee real-time response. Since the exact execution speed of a DSP program is only known after compaction, real-time constraints should be taken into account during the compaction phase. While previous DSP code generators rely on rigid heuristics for compaction, we propose a novel approach to exact local code compaction based on an integer programming (IP) model, which handles time constraints. Due to a general problem formulation, the IP model also captures encoding restrictions and handles instructions having alternative encodings and side effects and therefore applies to a large class of instruction formats. Capabilities and limitations of our approach are discussed for different DSPs.

Journal ArticleDOI
TL;DR: Algorithms for combining dataflow and control-flow techniques into a robust scheduling system which is capable of trading optimality for execution time on-the-fly are presented.
Abstract: As high-level synthesis techniques gain acceptance among designers, it is important to be able to provide a robust system which can handle large designs in short execution times, producing high-quality results. Scheduling is one of the most complex tasks in high-level synthesis, and although many algorithms exist for solving the scheduling problem, it remains a main source of inefficiency by either not producing high-quality results, not taking into account realistic design requirements, or requiring unacceptable execution times. One of the main problems in scheduling is the dichotomy between control and data. Many algorithms to date have been able to provide scheduling solutions by looking only at either the data part or the control part of the design. This has been done in order to simplify the problem; however, it has resulted in many algorithms unable to handle efficiently large designs with complex control and data functionality. This paper presents algorithms for combining dataflow and control-flow techniques into a robust scheduling system. The main characteristics of this system are as follows: 1) it uses path-based techniques for efficient handling of control and mutual exclusiveness (for resource sharing), 2) it allows operation reordering and parallelism extraction within the context of path-based scheduling, 3) it contains a control partitioning algorithm for design space exploration as well as for reducing the number of control paths, and 4) it combines the above algorithms into an adaptive scheduling system which is capable of trading optimality for execution time on-the-fly. Results involving billions of paths are presented and analyzed.

Journal ArticleDOI
TL;DR: A symbolic model of complementary metal-oxide-semiconductor (CMOS) gates is proposed to capture the dependence of power consumption and current flows on input patterns and fan-in/fan-out conditions.
Abstract: In this paper, we present a new gate-level approach to power and current simulation. We propose a symbolic model of complementary metal-oxide-semiconductor (CMOS) gates to capture the dependence of power consumption and current flows on input patterns and fan-in/fan-out conditions. Library elements are characterized and their models are used during event-driven logic simulation to provide power information and construct time-domain current waveforms. We provide both global and local pattern-dependent estimates of power consumption and current peaks (with accuracy of 6 and 10% from SPICE, respectively), while keeping performance comparable with traditional gate-level simulation with unit delay. We use VERILOG-XL as simulation engine to grant compatibility with design tools based on Verilog HDL. A Web-based user interface allows our simulator (PPP) to be accessed through the Internet using a standard web browser.

Journal ArticleDOI
Jun Ma, Han-Bin Liang, R. A. Pryor, Sunny Cheng, M. H. Kaneshiro, C. S. Kyono, K. Papworth
TL;DR: It is shown that, compared to conventional complementary metal-oxide-semiconductor (CMOS), the GCMOS device offers the advantage of significantly higher drive current, capable of lower threshold voltage with improved punchthrough resistance, lower body effect and lower series resistance, thus making it most suitable for applications that require both high performance and low power consumption.
Abstract: Graded-Channel MOS (GCMOS) VLSI technology has been developed to meet the growing demand for low power and high performance applications. In this paper, it will be shown that, compared to conventional complementary metal-oxide-semiconductor (CMOS), the GCMOS device offers the advantage of significantly higher drive current, capable of lower threshold voltage with improved punchthrough resistance, lower body effect and lower series resistance, thus making it most suitable for applications that require both high performance and low power consumption, such as digital signal processing (DSP). This is demonstrated, for the first time, by much improved low voltage circuit performance of a DSP logic circuit fabricated using a 0.5 μm GCMOS process. At 1.8 V, a 30% speed improvement over CMOS is achieved, and the power-delay product is reduced by 25%. In addition, similar speed improvement is achieved in SRAMs with consistent performance improvement over a wide range of temperatures between -50 and 150°C.

Journal ArticleDOI
TL;DR: Two novel iterative algorithms and their array structures for integer modular multiplication are presented; they are designed for Rivest-Shamir-Adleman (RSA) cryptography and are based on the familiar iterative Horner's rule, but use precalculated complements of the modulus.
Abstract: We present two novel iterative algorithms and their array structures for integer modular multiplication. The algorithms are designed for Rivest-Shamir-Adleman (RSA) cryptography and are based on the familiar iterative Horner's rule, but use precalculated complements of the modulus. The problem of deciding which multiples of the modulus to subtract in intermediate iteration stages has been simplified using simple look-up of precalculated complement numbers, thus allowing a finer-grain pipeline. Both algorithms use a carry save adder scheme with modular reduction performed on each intermediate partial product, which results in an output in carry-save format. Regularity and local connections make both algorithms suitable for high-performance array implementation in FPGAs or deep submicron VLSI. The processing nodes consist of just one or two full adders and a simple multiplexor. The stored complement numbers need to be precalculated only when the modulus is changed, thus not affecting the performance of the main computation. In both cases, there exists a bit-level systolic schedule, which means the array can be fully pipelined for high performance and can also easily be mapped to linear arrays for various space/time tradeoffs.
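
For readers unfamiliar with the Horner's-rule formulation that the abstract above builds on, the Python sketch below shows the plain textbook version: scan the multiplier from its most significant bit, double the running product, add the multiplicand when the bit is set, and reduce after every step. It deliberately omits the paper's contributions (look-up of precalculated complements of the modulus, carry-save arithmetic, and the array mapping) and uses an ordinary compare-and-subtract reduction. The function name and operand values are made up for the example.

```python
def horner_modmul(a, b, m, n_bits):
    """Compute (a * b) mod m by Horner's rule over the bits of a, MSB first."""
    if not (0 <= a < (1 << n_bits)):
        raise ValueError("a must fit in n_bits")
    if not (0 <= b < m):
        raise ValueError("b must already be reduced modulo m")
    p = 0
    for i in reversed(range(n_bits)):
        bit = (a >> i) & 1
        p = 2 * p + bit * b  # Horner step; p was < m, so now p < 3m
        # Intermediate reduction: at most two subtractions bring p back below m.
        while p >= m:
            p -= m
    return p

result = horner_modmul(123, 456, 789, n_bits=7)
print(result == (123 * 456) % 789)  # True
```

Reducing after every step keeps the running product below the modulus, which is what lets a hardware implementation use a fixed word length. Deciding how many multiples of the modulus to subtract at each step is the part the abstract says is replaced by a table look-up.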

Journal ArticleDOI
TL;DR: A system capable of modeling VLSI effects in a realistic and sufficiently accurate way that uses a reasonable amount of CPU resources is presented and an innovative solver is proposed.
Abstract: Needs for electro-thermal simulation of VLSI circuits, as opposed to both the system and device levels, are analyzed. A system capable of modeling these effects in a realistic and sufficiently accurate way that uses a reasonable amount of CPU resources is presented. An innovative solver is also proposed. The system is used to study the importance of some three dimensional (3-D) effects as well as metallic connections. A complete example is worked through to give insight into the type of results to be expected and the corresponding CPU costs.

Journal ArticleDOI
TL;DR: In this paper, a comprehensive analysis and estimate of simultaneous switching noise (SSN) including the velocity saturation effects seen in the submicron transistors during the switching of output drivers is presented.
Abstract: Complementary metal-oxide-semiconductor (CMOS) output buffers, comprised of a series of tapered inverters, are used to drive large off-chip capacitances. The ratio of the size of transistors between two consecutive stages is the buffer taper factor. With higher frequency of operation and simultaneous switching of the output drivers, the parasitic inductance present at the pin-pad-package interface results in significant switching noise on the power lines. A comprehensive analysis and estimate of simultaneous switching noise (SSN) including the velocity saturation effects seen in the submicron transistors during the switching of output drivers is presented. The effect of SSN on the overall buffer propagation delay and transition time is discussed. The presence of SSN results in an increase in the optimum taper factor between inverter stages for a given capacitive load. Beyond a critical value, the output transition time of a tapered buffer increases as the taper factor is reduced, due to SSN. SSN can be reduced by skewing the switching of output buffers. SPICE simulation results show that skewing buffer switching with additional inverter stages reduces SSN and increases buffer propagation delay.
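
For orientation, the textbook first-order estimate of switching noise across a shared parasitic inductance is V = N · L · di/dt. The sketch below uses that estimate only; it is not the paper's model, which additionally accounts for velocity saturation. The function name `ssn_peak`, the `skew_groups` parameter, and the assumption that skewed groups draw non-overlapping current ramps are all hypothetical simplifications.

```python
import math


def ssn_peak(n_drivers, inductance_h, di_dt_a_per_s, skew_groups=1):
    """First-order peak SSN in volts: V = N_simultaneous * L * di/dt.

    skew_groups > 1 models skewed switching by assuming the drivers are
    split into groups whose current ramps do not overlap in time.
    """
    n_simultaneous = math.ceil(n_drivers / skew_groups)
    return n_simultaneous * inductance_h * di_dt_a_per_s


# Illustrative numbers only: 8 drivers, 5 nH, 2e7 A/s per driver.
unskewed = ssn_peak(8, 5e-9, 2e7)
skewed = ssn_peak(8, 5e-9, 2e7, skew_groups=4)
print(unskewed, skewed)
```

Under these assumptions the peak noise falls in proportion to the number of skew groups, which matches the qualitative trend the abstract reports; the accompanying increase in propagation delay is not captured by this estimate.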

Journal ArticleDOI
TL;DR: A novel approach is proposed that distributes the clock with an H-tree, whose branches are composed of minimum-sized inverters rather than metal, to avoid the frequency limitation and obtain the highest clocking rate achievable with a given technology.
Abstract: This paper addresses the problem of clocking large high-speed digital systems, as well as deterministic skew modeling, a related problem. In order to provide a reliable skew model, and to avoid the frequency limitation, we propose a novel approach that distributes the clock with an H-tree, whose branches are composed of minimum-sized inverters rather than metal. With such a structure, we obtain the highest clocking rate achievable with a given technology. Indeed, clock rates around 1 GHz are possible with a 1.2 μm CMOS technology. From the skew modeling standpoint, we derive an analytic expression of the skew between two leaves of the H-tree, which we consider to be the difference in root-to-leaf delay pairs. The skew upper bound obtained has an order of complexity which, with respect to the H-tree size D, is the same as the one that may be derived from the Fisher and Kung model for both side-to-side and neighbor-to-neighbor communications, i.e., Ω(D²), whereas the Steiglitz and Kugelmass probabilistic model predicts Θ(D × √(log D)). In an H-tree implemented with metallic lines, the leaf-to-leaf skew is obviously bounded by the delay between the root and the leaves. However, with the logic-based H-tree proposed here, we arrive at a nonobvious result, which states that the leaf-to-leaf skew grows faster than the root-to-leaf delay in the presence of a uniform transistor time constant gradient. This paper also proposes generalizations of the skew model to (1) the case of chips in a wafer subject to a smooth, but nonuniform gradient and (2) the case of H-tree configurations mixing logic and interconnections.

Journal ArticleDOI
TL;DR: In this paper, an exact solution methodology for solving the scheduling problem in a three-dimensional (3-D) design space is described, where the usual two-dimensional design space (which trades off area and schedule length) plus a third dimension representing clock length are used.
Abstract: This paper describes an exact solution methodology, implemented in Rensselaer's Voyager design space exploration system, for solving the scheduling problem in a three-dimensional (3-D) design space: the usual two-dimensional (2-D) design space (which trades off area and schedule length), plus a third dimension representing clock length. Unlike design space exploration methodologies which rely on bounds or estimates, this methodology is guaranteed to find the globally optimal solution to a 3-D scheduling problem. Furthermore, this methodology efficiently prunes the search space, eliminating provably inferior design points through the following: 1) a careful selection of candidate clock lengths and 2) tight bounds on the number of functional units or on the schedule length. Both chaining and multicycle operations are supported.

Journal ArticleDOI
TL;DR: This paper presents a technique to correct multiple logic design errors in a gate-level netlist by repeatedly applying a single error search and correction algorithm for circuits with a low multiplicity of errors.
Abstract: This paper presents a technique to correct multiple logic design errors in a gate-level netlist. A number of methods have been proposed for correcting single logic design errors. However, the extension of these methods to more than one error is still very limited. We direct our attention to circuits with a low multiplicity of errors. By assuming different error dependency scenarios, multiple errors are corrected by repeatedly applying a single error search and correction algorithm. Experimental results on correcting double-design errors and triple-design errors on ISCAS and MCNC benchmark circuits are included.

Journal ArticleDOI
TL;DR: In this article, the authors propose a fault detection scheme based on the principle of node covering, in which the computation of each cell is checked by a "covering" cell.
Abstract: REMOD (REprocessing with MicrO Delays) is a new method for fault-tolerant design of logic circuits composed of arrays of identical functional cells. The fault detection scheme is based on the principle of node covering, in which the computation of each cell is checked by a "covering" cell. After a faulty cell is detected, the node covering principle also allows the circuit to easily be reconfigured to perform correctly for subsequent inputs. Furthermore, the design method is extendable to multiple fault tolerance with only small increments of hardware and time. We have laid out and simulated REMOD-based circuits for adders and multipliers and show that the time overheads are a small factor of the original computation time: 0 or Θ(1/n) to Θ(1/(log n)), for an n-cell circuit. For moderately complex cells, it is seen that area overhead is very reasonable as well.

Journal ArticleDOI
Qiuting Huang, P. Basedau
TL;DR: In this paper, a 78 MHz crystal oscillator is described, which forms part of a regulated system in a pager where the oscillation frequency is controlled digitally to sub-ppm accuracy.
Abstract: The current consumption of crystal oscillators is usually determined by the steady-state amplitude requirement, rather than the minimum transconductance for oscillation to exist. In a bipolar implementation transconductance is proportional to current, so that current consumption scales with frequency and load capacitance in the same way as transconductance. In a complementary metal-oxide-semiconductor (CMOS) implementation, current scales as the square of transconductance. It is therefore important to distinguish current from transconductance in power estimation for high-frequency oscillators. Analytical expressions relating current to steady-state amplitude are used in this paper to estimate the minimum power required for a crystal oscillator at a given frequency. A 78 MHz crystal oscillator is described, which forms part of a regulated system in a pager where the oscillation frequency is controlled digitally to sub-ppm accuracy. The oscillator can be pulled from ±65 ppm to the required frequency with 0.2 ppm accuracy, with a maximum current consumption of 197 μA. The circuit has been fabricated in a 1-μm CMOS technology. The measured phase noise is -113 dBc/Hz at 300 Hz offset.
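
The scaling contrast between bipolar and CMOS bias current can be made concrete with the standard small-signal device relations. This is a hedged sketch, not the paper's analytical expressions: it assumes g_m = I_C / V_T for a bipolar transistor and the strong-inversion square-law relation g_m = sqrt(2 · β · I_D) for a MOS transistor, with `beta` (the transconductance parameter μ·C_ox·W/L) and all numeric values chosen purely for illustration.

```python
THERMAL_VOLTAGE = 0.02585  # kT/q in volts near 300 K


def bipolar_current(gm, vt=THERMAL_VOLTAGE):
    """Collector current needed for transconductance gm: I_C = gm * V_T."""
    return gm * vt


def mos_current(gm, beta):
    """Square-law drain current for transconductance gm: I_D = gm**2 / (2*beta)."""
    return gm ** 2 / (2.0 * beta)


# Doubling the required transconductance (illustrative values, in siemens):
gm_low, gm_high = 1e-3, 2e-3
beta = 5e-3  # A/V^2, hypothetical device parameter

print(bipolar_current(gm_high) / bipolar_current(gm_low))    # linear scaling
print(mos_current(gm_high, beta) / mos_current(gm_low, beta))  # quadratic scaling
```

Under these assumptions, doubling the transconductance doubles the bipolar current but quadruples the MOS current, which is why the abstract separates current from transconductance when estimating power.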