Showing papers in "IEEE Transactions on Very Large Scale Integration Systems in 1997"

Journal Article•DOI•

Scheduling tests for VLSI systems under power constraints

R.M. Chou¹, Kewal K. Saluja¹, Vishwani D. Agrawal²•Institutions (2)

University of Wisconsin-Madison¹, Alcatel-Lucent²

01 Jun 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: In this article, the problem of testing VLSI integrated circuits in minimum time without exceeding their power ratings during test is considered, and a resource graph formulation is used for the test problem.

Abstract: This paper considers the problem of testing VLSI integrated circuits in minimum time without exceeding their power ratings during test. We use a resource graph formulation for the test problem. The solution requires finding a power-constrained schedule of tests. Two formulations of this problem are given as follows: (1) scheduling equal length tests with power constraints and (2) scheduling unequal length tests with power constraints. Optimum solutions are obtained for both formulations. Algorithms consist of four basic steps. First, a test compatibility graph is constructed from the resource graph. Second, the test compatibility graph is used to identify a complete set of time compatible tests with power dissipation information associated with each test. Third, from the set of compatible tests, lists of power compatible tests are extracted. Finally, a minimum cover table approach is used to find an optimum schedule of power compatible tests.
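
The equal-length formulation can be illustrated with a minimal sketch. It is not the authors' algorithm: it replaces the compatibility-graph and cover-table machinery with brute-force enumeration, and the names `power`, `conflicts`, and `p_max` are hypothetical. A session is any set of tests with no pairwise resource conflict whose summed power stays within the rating; the schedule is the smallest collection of sessions that covers every test.

```python
from itertools import combinations


def power_compatible_sets(power, conflicts, p_max):
    """Enumerate every set of tests that may share one test session."""
    tests = sorted(power)
    sets = []
    for size in range(1, len(tests) + 1):
        for group in combinations(tests, size):
            # Time compatible: no two tests in the group share a resource.
            if any(frozenset(pair) in conflicts
                   for pair in combinations(group, 2)):
                continue
            # Power compatible: summed dissipation stays within the rating.
            if sum(power[t] for t in group) <= p_max:
                sets.append(frozenset(group))
    return sets


def minimum_schedule(power, conflicts, p_max):
    """Fewest sessions covering all tests (exhaustive, exponential time)."""
    candidates = power_compatible_sets(power, conflicts, p_max)
    all_tests = frozenset(power)
    for n_sessions in range(1, len(power) + 1):
        for sessions in combinations(candidates, n_sessions):
            if frozenset().union(*sessions) == all_tests:
                return [sorted(s) for s in sessions]
    return None  # some single test already exceeds the power rating


# Hypothetical data: per-test power, one resource conflict, rating of 5.
power = {"t1": 3, "t2": 2, "t3": 4, "t4": 1}
conflicts = {frozenset({"t1", "t2"})}
schedule = minimum_schedule(power, conflicts, p_max=5)
print(len(schedule), schedule)
```

With equal-length tests, minimizing the number of sessions minimizes total test time. The exhaustive search only suits small examples.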

306 citations

Journal Article•DOI•

Energy minimization using multiple supply voltages

Jui-Ming Chang¹, Massoud Pedram¹•Institutions (1)

University of Southern California¹

01 Dec 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: Experimental results show that using four supply voltage levels on a number of standard benchmarks, an average energy saving of 53% can be obtained compared to using one fixed supply voltage level.

Abstract: We present a dynamic programming technique for solving the multiple supply voltage scheduling problem in both nonpipelined and functionally pipelined data-paths. The scheduling problem refers to the assignment of a supply voltage level (selected from a fixed and known number of voltage levels) to each operation in a data flow graph so as to minimize the average energy consumption for given computation time or throughput constraints or both. The energy model is accurate and accounts for the input pattern dependencies, re-convergent fanout induced dependencies, and the energy cost of level shifters. Experimental results show that using three supply voltage levels on a number of standard benchmarks, an average energy saving of 40.19% (with a computation time constraint of 1.5 times the critical path delay) can be obtained compared to using a single supply voltage level.
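
A heavily simplified sketch of the dynamic-programming idea follows. It assumes a linear chain of operations rather than a general data flow graph, integer delays, and no level-shifter cost; the function name and the delay and energy numbers are hypothetical and not taken from the paper. For each operation it keeps, per elapsed time, the cheapest voltage assignment seen so far.

```python
def assign_voltages(ops, deadline):
    """Pick one voltage per operation in a chain to minimize total energy.

    ops: list of dicts mapping voltage -> (delay, energy), one per operation.
    Returns (energy, [voltage, ...]) or None if the deadline cannot be met.
    """
    # best[t] = cheapest (energy, choices) finishing the ops so far at time t
    best = {0: (0.0, [])}
    for options in ops:
        nxt = {}
        for t, (energy, picks) in best.items():
            for volt, (delay, op_energy) in options.items():
                t2 = t + delay
                if t2 > deadline:
                    continue  # this choice would violate the time constraint
                cand = (energy + op_energy, picks + [volt])
                if t2 not in nxt or cand[0] < nxt[t2][0]:
                    nxt[t2] = cand
        best = nxt
    if not best:
        return None
    return min(best.values(), key=lambda entry: entry[0])


# Hypothetical library: lower voltage is slower but cheaper.
option = {5.0: (1, 25.0), 3.3: (2, 11.0), 2.4: (3, 6.0)}
energy, volts = assign_voltages([option, option, option], deadline=5)
print(energy, volts)
```

In this toy instance the all-5 V assignment costs 75 energy units and finishes in 3 time units; relaxing the deadline to 5 lets two operations drop to 3.3 V.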

300 citations

Journal Article•DOI•

Power analysis and minimization techniques for embedded DSP software

Mike Tien-Chien Lee¹, Vivek Tiwari², Sharad Malik², Masahiro Fujita¹•Institutions (2)

Fujitsu¹, Princeton University²

01 Mar 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: In this article, an instruction-level power analysis model is developed for an embedded digital signal processor (DSP) based on physical current measurements, and a scheduling technique based on the new instruction level power model is proposed.

Abstract: Power is becoming a critical constraint for designing embedded applications. Current power analysis techniques based on circuit-level or architectural-level simulation are either impractical or inaccurate to estimate the power cost for a given piece of application software. In this paper, an instruction-level power analysis model is developed for an embedded digital signal processor (DSP) based on physical current measurements. Significant points of difference have been observed between the software power model for this custom DSP processor and the power models that have been developed earlier for some general purpose commercial microprocessors. In particular, the effect of circuit state on the power cost of an instruction stream is more marked in the case of this DSP processor. In addition, the processor has special architectural features that allow dual memory accesses and packing of instructions into pairs. The energy reduction possible through the use of these features is studied. The on-chip Booth multiplier on the processor is a major source of energy consumption for DSP programs. A microarchitectural power model for the multiplier is developed and analyzed for further power minimization. In order to exploit all of the above effects, a scheduling technique based on the new instruction-level power model is proposed. Several example programs are provided to illustrate the effectiveness of this approach. Energy reductions varying from 26% to 73% have been observed. These energy savings are real and have been verified through physical measurement. It should be noted that the energy reduction essentially comes for free. It is obtained through software modification, and thus, entails no hardware overhead. In addition, there is no loss of performance since the running times of the modified programs either improve or remain unchanged.

284 citations

Journal Article•DOI•

Embedded power supply for low-power DSP

V. Gutnik¹, Anantha P. Chandrakasan¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Dec 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: In this article, the use of dynamically adjustable power supplies as a method to lower power dissipation in DSPs is analyzed, and power can be reduced substantially without sacrificing performance in fixed-throughput applications by slowing the clock and lowering supply voltage instead of idling when computational workload varies.

Abstract: The use of dynamically adjustable power supplies as a method to lower power dissipation in DSP is analyzed. Power can be reduced substantially without sacrificing performance in fixed-throughput applications by slowing the clock and lowering supply voltage instead of idling when computational workload varies. This can yield a typical power savings of 30-50%. If latency can be tolerated, buffering data and averaging processing rate can yield power reductions of an order of magnitude in some applications. Continuous variation of the supply voltage can be approximated by very crude quantization and dithering: a four-level controller is sufficient to get within a few percent of the optimal power savings. Significant savings are possible only if the voltage can be changed on the same time scale as the variations in workload. A chip has been fabricated and tested to verify the closed-loop functionality of a variable voltage system. The controller takes only 0.4 mm² and draws a maximum of 1 mW at 2 V with a 40 MHz clock. The control framework developed is applicable to generic DSP applications.
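
The quantization-and-dithering idea can be sketched under an idealized model that is an assumption here, not the paper's: if clock rate scales linearly with supply voltage and energy per operation scales with voltage squared, normalized power at normalized rate r is r³. Idling instead costs r (full power for a fraction r of the time). A controller with a few fixed levels approximates the cubic curve by time-averaging two adjacent levels. The level values below are hypothetical.

```python
def ideal_power(rate):
    """Normalized power with continuously variable voltage (assumed r**3)."""
    return rate ** 3


def idle_power(rate):
    """Normalized power when running flat out, then idling at zero power."""
    return rate


def dithered_power(rate, levels=(0.25, 0.5, 0.75, 1.0)):
    """Average power when dithering between the two levels around `rate`."""
    points = (0.0,) + tuple(levels)  # 0.0 stands for the idle state
    for lo, hi in zip(points, points[1:]):
        if lo <= rate <= hi:
            alpha = (rate - lo) / (hi - lo)  # fraction of time at `hi`
            return (1 - alpha) * ideal_power(lo) + alpha * ideal_power(hi)
    raise ValueError("rate must lie between 0 and 1")


for r in (0.3, 0.6, 0.9):
    print(r, idle_power(r), dithered_power(r), ideal_power(r))
```

Because r³ is convex, the dithered power always sits between the ideal curve and the idle line; the numbers printed come from the toy model, not from the fabricated chip.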

264 citations

Journal Article•DOI•

Intrinsic MOSFET parameter fluctuations due to random dopant placement

Xinghai Tang¹, Vivek De², James D. Meindl¹•Institutions (2)

Georgia Institute of Technology¹, Intel²

01 Dec 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: These fluctuations are shown to pose severe barriers to the scaling of supply voltage and channel length and thus, to the minimization of power dissipation and switching delay in multibillion transistor chips of the future, and can be reduced to some degree by selecting optimal values of channel width.

Abstract: Intrinsic fluctuations in threshold voltage, subthreshold swing, saturation drain current and subthreshold leakage of ultrasmall-geometry MOSFETs due to random placement of dopant atoms in the channel are examined using novel physical models and a Monte Carlo simulator. These fluctuations are shown to pose severe barriers to the scaling of supply voltage and channel length and thus, to the minimization of power dissipation and switching delay in multibillion transistor chips of the future. In particular, using the device technology and the level of integration projections of the National Technology Roadmap for Semiconductors for the next 15 years, standard and maximum deviations of threshold voltage, drive current, subthreshold swing and subthreshold leakage are shown to escalate to 40 and 600 mV, 10 and 100%, 2 and 20 mV/dec, and 10 and 10⁸%, respectively, in the 0.07 µm, 0.9 V complementary metal-oxide-semiconductor (CMOS) technology generation with 1.3-64 billion transistors on a chip in 2010. While these deviations can be reduced to some degree by selecting optimal values of channel width, the associated penalties in dynamic and static power, and in packing density demand improved MOSFET structures aimed at minimizing parameter deviations.

240 citations

Journal Article•DOI•

Low-power encodings for global communication in CMOS VLSI

Mircea R. Stan¹, Wayne Burleson²•Institutions (2)

University of Virginia¹, University of Massachusetts Amherst²

01 Dec 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This work proposes several approaches to address the problem of power dissipation in high performance CMOS VLSI, and proposes codes that can be used on a class of terminated off-chip board-level buses with level signaling, or on tristate on-chip buses with level or transition signaling.

Abstract: Technology trends and especially portable applications are adding a third dimension (power) to the previously two-dimensional (speed, area) VLSI design space. A large portion of power dissipation in high performance CMOS VLSI is due to the inherent difficulties in global communication at high rates and we propose several approaches to address the problem. These techniques can be generalized at different levels in the design process. Global communication typically involves driving large capacitive loads which inherently require significant power. However, by carefully choosing the data representation, or encoding, of these signals, the average and peak power dissipation can be minimized. Redundancy can be added in space (number of bus lines), time (number of cycles) and voltage (number of distinct amplitude levels). The proposed codes can be used on a class of terminated off-chip board-level buses with level signaling, or on tristate on-chip buses with level or transition signaling.
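
The abstract does not name a specific code. As an illustration of redundancy in space, the sketch below implements bus-invert coding, a well-known code of this kind that adds one extra "invert" line: when more than half of the data lines would toggle, the complemented word is sent instead. The function names and the assumption that the bus starts at all zeros are choices made for this example.

```python
def raw_transitions(words, width):
    """Count line toggles when words are sent unencoded."""
    mask = (1 << width) - 1
    prev, total = 0, 0  # bus assumed to start at all zeros
    for w in words:
        total += bin((w ^ prev) & mask).count("1")
        prev = w & mask
    return total


def bus_invert_encode(words, width):
    """Return ([(sent_word, invert_flag), ...], total line toggles)."""
    mask = (1 << width) - 1
    prev, prev_flag = 0, 0
    encoded, toggles = [], 0
    for w in words:
        flips = bin((w ^ prev) & mask).count("1")
        if flips > width / 2:
            sent, flag = (~w) & mask, 1  # send the complement instead
        else:
            sent, flag = w & mask, 0
        # Data-line toggles plus a possible toggle of the invert line.
        toggles += bin(sent ^ prev).count("1") + (flag ^ prev_flag)
        encoded.append((sent, flag))
        prev, prev_flag = sent, flag
    return encoded, toggles


def bus_invert_decode(encoded, width):
    """Recover the original words from (sent_word, invert_flag) pairs."""
    mask = (1 << width) - 1
    return [(sent ^ mask) if flag else sent for sent, flag in encoded]


words = [0x00, 0xFF, 0x00, 0xFF]
encoded, coded = bus_invert_encode(words, 8)
print(raw_transitions(words, 8), coded)
```

On this worst-case alternating pattern the data lines never toggle and only the invert line switches, which also limits the per-cycle peak.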

188 citations

Journal Article•DOI•

The impact of intra-die device parameter variations on path delays and on the design for yield of low voltage digital circuits

M. Eisele, J. Berthold¹, D. Schmitt-Landsiedel², R. Mahnkopf¹•Institutions (2)

Siemens¹, Technische Universität München²

01 Dec 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: It is found that circuits with a large number of critical paths and with a low logic depth are most sensitive to uncorrelated gate delay variations, and scenarios for future technologies show the increased impact of uncorrelated delay variations on digital design.

Abstract: The yield of low voltage digital circuits is found to be sensitive to local gate delay variations due to uncorrelated intra-die parameter deviations. Caused by statistical deviations of the doping concentration they lead to more pronounced delay variations for minimum transistor sizes. Their influence on path delays in digital circuits is verified using a carry select adder test circuit fabricated in 0.5 and 0.35 µm complementary metal-oxide-semiconductor (CMOS) technologies with two different threshold voltages. The increase of the path delay variations for smaller device dimensions and reduced supply voltages as well as the dependence on the path length is shown. It is found that circuits with a large number of critical paths and with a low logic depth are most sensitive to uncorrelated gate delay variations. Scenarios for future technologies show the increased impact of uncorrelated delay variations on digital design. A reduction of the maximal clock frequency of 10% is found, for example, for highly pipelined systems realized in a 0.18-µm CMOS technology.
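
The sensitivity of shallow logic can be seen from a short derivation. It assumes, purely for illustration, that a path consists of n gates whose delays are independent with the same mean and standard deviation; the symbols below are not from the abstract.

```latex
\[
\mu_{\text{path}} = n\,\mu_g, \qquad
\sigma_{\text{path}} = \sqrt{n}\,\sigma_g, \qquad
\frac{\sigma_{\text{path}}}{\mu_{\text{path}}}
  = \frac{1}{\sqrt{n}}\,\frac{\sigma_g}{\mu_g}
\]
```

Under this assumption the relative spread of the path delay shrinks as the logic depth n grows, so short paths average out less of the gate-level variation. Since the clock period is set by the slowest of all critical paths, a larger number of such paths makes an unusually slow one more likely.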

177 citations

Journal Article•DOI•

Gate sizing for constrained delay/power/area optimization

Olivier Coudert¹•Institutions (1)

Synopsys¹

01 Dec 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: In this article, the gate sizing algorithm (GS) is proposed to minimize the power consumption and/or the area of a circuit under some user-defined delay constraints, or to obtain the fastest circuit within a given power budget.

Abstract: Gate sizing has a significant impact on the delay, power dissipation, and area of the final circuit. It consists of choosing for each node of a mapped circuit a gate implementation in the library so that a cost function is optimized under some constraints. For instance, one wants to minimize the power consumption and/or the area of a circuit under some user-defined delay constraints, or to obtain the fastest circuit within a given power budget. Although this technology-dependent optimization has been investigated for years, the proposed approaches sometimes rely on assumptions, cost models, or algorithms that make them unrealistic or impossible to apply on real-life large circuits. We discuss here a gate sizing algorithm (GS), and show how it is used to achieve constrained optimization. It can be applied on large circuits within a reasonable CPU time, e.g., minimizing the power of a 10000-gate circuit under some delay constraint in 2 h.

162 citations

Journal Article•DOI•

CMOS sensors for on-line thermal monitoring of VLSI circuits

Vladimir Szekely, C. Marta¹, Zsolt Kohári¹, Marta Rencz¹•Institutions (1)

Budapest University of Technology and Economics¹

01 Sep 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A new family of temperature sensors will be presented, developed by the authors especially for the purpose of thermal monitoring of VLSI chips, characterized by the very low silicon area and the low power consumption.

Abstract: The paper presents appropriate sensors for the realization of the design principle of design for thermal testability (DfTT). After a short overview of the available CMOS temperature sensors, a new family of temperature sensors will be presented, developed by the authors especially for the purpose of thermal monitoring of VLSI chips. These sensors are characterized by the very low silicon area of about 0.003-0.02 mm² and the low power consumption (200 µW). The accuracy is in the order of 1°C. Using the frequency-output versions an easy interfacing of digital test circuitry is assured. They can be very easily incorporated into the usual test circuitry, via the boundary-scan architecture. The paper presents measured results obtained by the experimental circuits. The facilities provided by the sensor connected to the boundary-scan test circuitry are also demonstrated experimentally.

122 citations

Journal Article•DOI•

An architectural co-synthesis algorithm for distributed, embedded computing systems

Wayne Wolf¹•Institutions (1)

Princeton University¹

01 Jun 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A new, heuristic algorithm which simultaneously synthesizes the hardware and software architectures of a distributed system to meet a performance goal and minimize cost is described.

Abstract: Many embedded computers are distributed systems, composed of several heterogeneous processors and communication links of varying speeds and topologies. This paper describes a new, heuristic algorithm which simultaneously synthesizes the hardware and software architectures of a distributed system to meet a performance goal and minimize cost. The hardware architecture of the synthesized system consists of a network of processors of multiple types and arbitrary communication topology; the software architecture consists of an allocation of processes to processors and a schedule for the processes. Most previous work in co-synthesis targets an architectural template, whereas this algorithm can synthesize a distributed system of arbitrary topology. The algorithm works from a technology database which describes the available processors, communication links, I/O devices, and implementations of processes on processors. Previous work had proposed solving this problem by integer linear programming (ILP); our algorithm is much faster than ILP and produces high-quality results.

113 citations

Journal Article•DOI•

Electro-thermal and logi-thermal simulation of VLSI designs

Vladimir Szekely, Andras Poppe, A. Pahi, A. Csendes, G. Hajas, Marta Rencz

01 Sep 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The latest advances in the SISSI package (simulator for integrated structures by simultaneous iteration) are discussed, including electro-thermal ac and transient simulation and the consideration of the thermal voltage of Si-Al contacts.

Abstract: Due to severe thermal problems of today's VLSI integrated circuits the need for reliable and quick thermal, electro-thermal and logi-thermal simulation tools is increasing. In this paper, we discuss the latest advances in the SISSI package (simulator for integrated structures by simultaneous iteration) which is a tool developed originally for analog VLSI design. The improvements include electro-thermal ac and transient simulation and the consideration of the thermal voltage of Si-Al contacts. Furthermore, we introduce a new module of SISSI, LOGITHERM, which is aimed at the self-consistent logic and thermal simulation of large digital VLSI designs. The features of our simulator package are highlighted by simulation examples that are compared in most cases with measurement results.

Journal Article•DOI•

Electro-thermal circuit simulation using simulator coupling

S. Wunsche¹, C. Clauss, P. Schwarz, F. Winkler•Institutions (1)

Fraunhofer Society¹

01 Sep 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: In this paper, a methodology for simulating the static and dynamic performance of integrated circuits in the presence of electro-thermal interactions on the integrated circuit die is presented, which is based on the coupling of a finite element method (FEM) program with a circuit simulator.

Abstract: The paper presents a methodology for simulating the static and dynamic performance of integrated circuits in the presence of electro-thermal interactions on the integrated circuit die. The technique is based on the coupling of a finite element method (FEM) program with a circuit simulator. In contrast to other known simulator couplings a time step algorithm is used. Its implementation in simulation tools is described. The thermal modeling of the die/package structure and the extended modeling of the electronic circuit is discussed. Simulation results which indicate the capabilities of the methodology for electro-thermal simulation are compared to experimental results.

Journal Article•DOI•

Instruction buffering to reduce power in processors for signal processing

R.S. Bajwa¹, M. Hiraki, Hirotsugu Kojima¹, D. J. Gorny¹, K. Nitta¹, Avadhani Shridhar¹, K. Seki¹, Katsuro Sasaki¹•Institutions (1)

Hitachi¹

01 Dec 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: In this paper, the authors advocate the use of instruction buffering as a power-saving technique for processors for signal processing and multimedia applications, based on the runtime characteristics of signal processing applications.

Abstract: Power consumption analyses of embedded processors indicate that a significant amount of power is consumed in accessing memory and in the control path. Based on this, and on the runtime characteristics of signal processing applications, we advocate the use of instruction buffering as a power-saving technique for processors for signal processing and multimedia applications. Two approaches, a decoded instruction buffer (DIB) and a decoded instruction cache, are considered. Performance improvements in representative applications in speech processing, such as the vector sum excited linear prediction (VSELP) and linear prediction coding coefficient computation (LPC), and in the two-dimensional (2-D) 8×8 DCT, which is used in image compression, are provided. The reduction in power obtained is between 25 and 30%.
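
Why buffering suits loop-dominated signal processing code can be shown with a toy simulation. The buffer organization below (a small FIFO keyed by instruction address) is an assumption made for illustration and is not claimed to match the DIB in the paper; the trace and sizes are hypothetical. Every hit is a fetch and decode that never reaches instruction memory.

```python
from collections import OrderedDict


def buffer_hits(trace, capacity):
    """Count fetches served by a FIFO buffer of decoded instructions."""
    buf = OrderedDict()
    hits = 0
    for addr in trace:
        if addr in buf:
            hits += 1  # served from the buffer, no memory access or decode
        else:
            if len(buf) >= capacity:
                buf.popitem(last=False)  # evict the oldest entry
            buf[addr] = True
    return hits


# Hypothetical inner loop: 8 instructions executed 50 times.
loop_trace = list(range(100, 108)) * 50
print(buffer_hits(loop_trace, capacity=16), "of", len(loop_trace))
print(buffer_hits(loop_trace, capacity=4), "of", len(loop_trace))
```

In this model a buffer that holds the whole loop body misses only on the first iteration, while a FIFO buffer smaller than the loop never hits.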

Journal Article•DOI•

Protocol selection and interface generation for HW-SW codesign

J. M. Daveau, G. F. Marchioro, T. Ben-Ismail, Ahmed Amine Jerraya

01 Mar 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A new algorithm that performs binding/allocation of communication units is presented; it uses a cost function to evaluate different allocation alternatives, and an example illustrates its usefulness for automatically allocating different protocols within the same application system.

Abstract: The aim of this paper is to present a communication synthesis approach stated as an allocation problem. In the proposed approach, communication synthesis transforms a system composed of processes that communicate via high-level primitives through abstract channels into a set of processes executed by interconnected processors that communicate via signals and share communication control. The proposed communication synthesis approach deals with both protocol selection and interface generation and is based on binding/allocation of communication units. This approach allows a wide design space exploration through automatic selection of communication protocols. We present a new algorithm that performs binding/allocation of communication units. This algorithm makes use of a cost function to evaluate different allocation alternatives. We illustrate through an example the usefulness of the algorithm for automatically allocating different protocols within the same application system.

Journal Article•DOI•

Fully coupled dynamic electro-thermal simulation

G. Digele¹, S. Lindenkreuz, E. Kasper•Institutions (1)

University of Stuttgart¹

01 Sep 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: Fully coupled dynamic electro-thermal simulation on chip and circuit level is presented in this paper, where temperature dependent thermal conductivity of silicon is taken into account, thus solving the nonlinear heat diffusion equation.

Abstract: Fully coupled dynamic electro-thermal simulation on chip and circuit level is presented. Temperature dependent thermal conductivity of silicon is taken into account, thus solving the nonlinear heat diffusion equation. The numerical solution is carried out by using the industry-standard simulator SABER, therefore for electro-thermal simulations we are able to use the common electrical compact models by adding a heat source and thermal pins to them. The application of this technique and need for electro-thermal simulation is illustrated with the simulation of a current control circuit built into a multiwatt package.

Journal Article•DOI•

CMOS current steering logic for low-voltage mixed-signal integrated circuits

Hiok-Tiaq Ng¹, D.J. Allstot•Institutions (1)

Oregon State University¹

01 Sep 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A quiet logic family, complementary metal-oxide-semiconductor (CMOS) current steering logic (CSL), has been developed for use in low-voltage mixed-signal integrated circuits.

Abstract: A quiet logic family, complementary metal-oxide-semiconductor (CMOS) current steering logic (CSL), has been developed for use in low-voltage mixed-signal integrated circuits. Compared to a CMOS static logic gate with its output range of $\Delta V_{\text{logic}} \approx V_{dd}$, a CSL gate swings only $\Delta V_{\text{logic}} \approx V_T + 0.25$ V because the constant current supplied by the PMOS load device is steered to ground through either an NMOS diode-connected device or switching network. Owing to the constant current, digital switching noise is 100× smaller than in static logic. Another useful feature which can be used to calibrate CSL speed against process, temperature, and voltage variations is propagation delay that is approximately constant versus supply voltage and linear with bias current. Several CSL circuits have been fabricated using 0.8 and 1.2 µm high-$V_T$ n-well CMOS processes. Two self-loaded 39-stage ring oscillators fabricated using the 1.2 µm process (1.2 V power supply) exhibited power-delay products of 12 and 70 fJ with average propagation delays of 0.4 and 0.7 ns, respectively. High-$V_T$ and low-$V_T$ CSL ALUs were operational at $V_{dd} \approx 0.70$ V and $V_{dd} \approx 0.40$ V, respectively.
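
As a worked check on the ring-oscillator figures, the average power per gate follows from the quoted numbers if the power-delay product is taken as average gate power times average propagation delay. That definition is assumed here, and the symbols are introduced only for this example.

```latex
\[
P = \frac{\mathrm{PDP}}{t_d}: \qquad
P_1 = \frac{12\ \text{fJ}}{0.4\ \text{ns}} = 30\ \mu\text{W}, \qquad
P_2 = \frac{70\ \text{fJ}}{0.7\ \text{ns}} = 100\ \mu\text{W}
\]
```

At the stated 1.2 V supply these correspond to average gate currents of roughly 25 µA and 83 µA, again under the assumed definition.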

Journal Article•DOI•

Synthesis of application-specific memory designs

Herman Schmit¹, D. E. Thomas¹•Institutions (1)

Carnegie Mellon University¹

01 Mar 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This paper introduces a novel approach to the design of memory systems, which is based on a variety of array grouping techniques and dimensional transformations, and the binding of array groups to memory components with different dimensions, access times, and number of ports.

Abstract: This paper discusses the mapping of arrays in a behavior to memories in an implementation. We introduce a novel approach to the design of memory systems, which is based on a variety of array grouping techniques and dimensional transformations, and the binding of array groups to memory components with different dimensions, access times, and number of ports. The results of design actions are computed in terms of memory cost, the number of wires necessary to connect the memory to the data path, and the limit of performance imposed by the memory design on the implementation. Three different procedures can be used to find a suitable memory design. All three procedures are directed by a weighted and constrained system cost function, which enables the expression of the user's design priorities. Compared to related research efforts, our approach improves performance by as much as 19%, reduces memory cost by as much as 40%, and decreases the number of wires required to connect the memory to the data path by up to 57%.

Journal Article•DOI•

Hierarchical interconnection structures for field programmable gate arrays

Yen-Tai Lai¹, Ping-Tsung Wang¹•Institutions (1)

National Cheng Kung University¹

01 Jun 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: Hierarchical interconnection structures for field programmable gate arrays are proposed, and experiments on benchmark circuits show that density and performance are significantly improved.

Abstract: Field programmable gate arrays (FPGA's) suffer from lower density and lower performance than conventional gate arrays. Hierarchical interconnection structures for field programmable gate arrays are proposed. They help overcome these problems. Logic blocks in a field programmable gate array are grouped into clusters. Clusters are then recursively grouped together. To obtain the optimal hierarchical structure with high performance and high density, various hierarchical structures with the same routability are discussed. The field programmable gate arrays with new architecture can be efficiently configured with existing computer aided design algorithms. The k-way min-cut algorithm is applicable to the placement step in the implementation. Global routing paths in a field programmable gate array can be obtained easily. The placement and global routing steps can be performed simultaneously. Experiments on benchmark circuits show that density and performance are significantly improved.
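
The recursive clustering can be pictured with a small sketch. It assumes a uniform tree in which every cluster at every level groups the same number of children, and it numbers logic blocks so that integer division by the cluster size yields the parent cluster. Both the numbering and the function name are hypothetical. The result is how many hierarchy levels a connection must climb before both endpoints fall inside one cluster.

```python
def levels_to_climb(block_a, block_b, cluster_size):
    """Hierarchy levels a net climbs to connect two logic blocks."""
    level = 0
    while block_a != block_b:
        # Move both endpoints up to their enclosing clusters.
        block_a //= cluster_size
        block_b //= cluster_size
        level += 1
    return level


# 64 blocks, clusters of 4: blocks 0-3 share a level-1 cluster, and so on.
print(levels_to_climb(0, 3, 4))   # same level-1 cluster
print(levels_to_climb(0, 5, 4))   # neighbouring level-1 clusters
print(levels_to_climb(0, 63, 4))  # opposite corners of the hierarchy
```

In such a model, a placement step that keeps tightly connected blocks in the same low-level cluster keeps most connections at low levels of the hierarchy.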

Journal Article•DOI•

Time-constrained code compaction for DSPs

Rainer Leupers¹, Peter Marwedel•Institutions (1)

Technical University of Dortmund¹

01 Mar 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This paper proposes a novel approach to exact local code compaction based on an integer programming (IP) model, which handles time constraints and applies to a large class of instruction formats.

Abstract: This paper addresses instruction-level parallelism in code generation for digital signal processors (DSPs). In the presence of potential parallelism, the task of code generation includes code compaction, which parallelizes primitive processor operations under given dependency and resource constraints. Furthermore, DSP algorithms in most cases are required to guarantee real-time response. Since the exact execution speed of a DSP program is only known after compaction, real-time constraints should be taken into account during the compaction phase. While previous DSP code generators rely on rigid heuristics for compaction, we propose a novel approach to exact local code compaction based on an integer programming (IP) model, which handles time constraints. Due to a general problem formulation, the IP model also captures encoding restrictions and handles instructions having alternative encodings and side effects and therefore applies to a large class of instruction formats. Capabilities and limitations of our approach are discussed for different DSPs.

Journal Article•DOI•

Control-flow versus data-flow-based scheduling: combining both approaches in an adaptive scheduling system

Reinaldo A. Bergamaschi¹, S. Raje¹, Indira Nair¹, L. Trevillyan¹•Institutions (1)

IBM¹

01 Mar 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: Algorithms for combining dataflow and control-flow techniques into a robust scheduling system which is capable of trading optimality for execution time on-the-fly are presented.

Abstract: As high-level synthesis techniques gain acceptance among designers, it is important to be able to provide a robust system which can handle large designs in short execution times, producing high-quality results. Scheduling is one of the most complex tasks in high-level synthesis, and although many algorithms exist for solving the scheduling problem, it remains a main source of inefficiency by either not producing high-quality results, not taking into account realistic design requirements, or requiring unacceptable execution times. One of the main problems in scheduling is the dichotomy between control and data. Many algorithms to date have been able to provide scheduling solutions by looking only at either the data part or the control part of the design. This has been done in order to simplify the problem; however, it has resulted in many algorithms unable to handle efficiently large designs with complex control and data functionality. This paper presents algorithms for combining dataflow and control-flow techniques into a robust scheduling system. The main characteristics of this system are as follows: 1) it uses path-based techniques for efficient handling of control and mutual exclusiveness (for resource sharing), 2) it allows operation reordering and parallelism extraction within the context of path-based scheduling, 3) it contains a control partitioning algorithm for design space exploration as well as for reducing the number of control paths, and 4) it combines the above algorithms into an adaptive scheduling system which is capable of trading optimality for execution time on-the-fly. Results involving billions of paths are presented and analyzed.

Journal Article•DOI•

Gate-level power and current simulation of CMOS integrated circuits

A. Bogliolo¹, Luca Benini², G. De Micheli², Bruno Ricco¹•Institutions (2)

University of Bologna¹, Stanford University²

01 Dec 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A symbolic model of complementary metal-oxide-semiconductor (CMOS) gates is proposed to capture the dependence of power consumption and current flows on input patterns and fan-in/fan-out conditions.

Abstract: In this paper, we present a new gate-level approach to power and current simulation. We propose a symbolic model of complementary metal-oxide-semiconductor (CMOS) gates to capture the dependence of power consumption and current flows on input patterns and fan-in/fan-out conditions. Library elements are characterized and their models are used during event-driven logic simulation to provide power information and construct time-domain current waveforms. We provide both global and local pattern-dependent estimates of power consumption and current peaks (with accuracy within 6% and 10% of SPICE, respectively), while keeping performance comparable with traditional gate-level simulation with unit delay. We use VERILOG-XL as the simulation engine to ensure compatibility with design tools based on Verilog HDL. A Web-based user interface allows our simulator (PPP) to be accessed through the Internet using a standard web browser.

Journal Article•DOI•

Graded-channel MOSFET (GCMOSFET) for high performance, low voltage DSP applications

Jun Ma¹, Han-Bin Liang¹, R. A. Pryor¹, Sunny Cheng¹, M. H. Kaneshiro¹, C. S. Kyono¹, K. Papworth¹•Institutions (1)

Motorola¹

01 Dec 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: It is shown that, compared to conventional complementary metal-oxide-semiconductor (CMOS) devices, the GCMOS device offers significantly higher drive current, a lower achievable threshold voltage with improved punchthrough resistance, lower body effect, and lower series resistance, making it well suited to applications that require both high performance and low power consumption.

Abstract: Graded-Channel MOS (GCMOS) VLSI technology has been developed to meet the growing demand for low power and high performance applications. In this paper, it is shown that, compared to conventional complementary metal-oxide-semiconductor (CMOS) devices, the GCMOS device offers significantly higher drive current, a lower achievable threshold voltage with improved punchthrough resistance, lower body effect, and lower series resistance, making it well suited to applications that require both high performance and low power consumption, such as digital signal processing (DSP). This is demonstrated, for the first time, by much improved low voltage circuit performance of a DSP logic circuit fabricated using a 0.5 μm GCMOS process. At 1.8 V, a 30% speed improvement over CMOS is achieved, and the power-delay product is reduced by 25%. In addition, similar speed improvement is achieved in SRAMs, with consistent performance improvement over a wide range of temperatures between -50 and 150 °C.

Journal Article•DOI•

VLSI array algorithms and architectures for RSA modular multiplication

Yong-Jin Jeong¹, Wayne Burleson²•Institutions (2)

Samsung¹, University of Massachusetts Amherst²

01 Jun 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: Two novel iterative algorithms and their array structures are presented for integer modular multiplication; they are designed for Rivest-Shamir-Adleman (RSA) cryptography and are based on the familiar iterative Horner's rule, but use precalculated complements of the modulus.

Abstract: We present two novel iterative algorithms and their array structures for integer modular multiplication. The algorithms are designed for Rivest-Shamir-Adleman (RSA) cryptography and are based on the familiar iterative Horner's rule, but use precalculated complements of the modulus. The problem of deciding which multiples of the modulus to subtract in intermediate iteration stages has been simplified using simple look-up of precalculated complement numbers, thus allowing a finer-grain pipeline. Both algorithms use a carry-save adder scheme with modular reduction performed on each intermediate partial product, which results in an output in carry-save format. Regularity and local connections make both algorithms suitable for high-performance array implementation in FPGAs or deep submicron VLSI. The processing nodes consist of just one or two full adders and a simple multiplexor. The stored complement numbers need to be precalculated only when the modulus is changed, thus not affecting the performance of the main computation. In both cases, there exists a bit-level systolic schedule, which means the array can be fully pipelined for high performance and can also easily be mapped to linear arrays for various space/time tradeoffs.
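
The Horner's-rule idea behind this abstract can be illustrated with a minimal software sketch. It is not the paper's algorithm: it uses plain conditional subtraction of the modulus rather than the precalculated complements and carry-save arithmetic the abstract describes, and the function name `horner_modmul` is hypothetical.

```python
def horner_modmul(a: int, b: int, m: int) -> int:
    """Return (a * b) mod m by scanning the bits of a, most significant first.

    Illustrative only: reduction is done by repeated subtraction of m,
    not by the complement look-up scheme described in the abstract.
    """
    if m <= 0:
        raise ValueError("modulus must be positive")
    if a < 0:
        raise ValueError("multiplier must be non-negative")
    b %= m
    p = 0  # running partial product, kept in the range [0, m)
    for i in reversed(range(a.bit_length())):
        bit = (a >> i) & 1
        # Horner step: double the partial product and add b if the bit is set.
        p = 2 * p + bit * b
        # Since p < m and b < m before the step, p < 3m afterwards,
        # so at most two subtractions are needed to reduce it.
        while p >= m:
            p -= m
    return p
```

Reducing after every step keeps the partial product bounded by the modulus, which is the property that lets a hardware array use fixed-width processing nodes.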

Journal Article•DOI•

Realistic and efficient simulation of electro-thermal effects in VLSI circuits

Mohamed Nabil Sabry, A. Bontemps, V. Aubert, R. Vahrmann

01 Sep 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A system capable of modeling electro-thermal effects in VLSI circuits in a realistic and sufficiently accurate way, using a reasonable amount of CPU resources, is presented, and an innovative solver is proposed.

Abstract: The need for electro-thermal simulation at the VLSI circuit level, as opposed to the system and device levels, is analyzed. A system capable of modeling these effects in a realistic and sufficiently accurate way that uses a reasonable amount of CPU resources is presented. An innovative solver is also proposed. The system is used to study the importance of some three-dimensional (3-D) effects as well as metallic connections. A complete example is treated to give insight into the type of results to be expected and the corresponding CPU costs.

Journal Article•DOI•

Effects of simultaneous switching noise on the tapered buffer design

Srinivasa Vemuru¹•Institutions (1)

City University of New York¹

01 Sep 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: In this paper, a comprehensive analysis and estimate of simultaneous switching noise (SSN) including the velocity saturation effects seen in the submicron transistors during the switching of output drivers is presented.

Abstract: Complementary metal-oxide-semiconductor (CMOS) output buffers, composed of a series of tapered inverters, are used to drive large off-chip capacitances. The ratio of the size of transistors between two consecutive stages is the buffer taper factor. With higher frequency of operation and simultaneous switching of the output drivers, the parasitic inductance present at the pin-pad-package interface results in significant switching noise on the power lines. A comprehensive analysis and estimate of simultaneous switching noise (SSN), including the velocity saturation effects seen in submicron transistors during the switching of output drivers, is presented. The effect of SSN on the overall buffer propagation delay and transition time is discussed. The presence of SSN results in an increase in the optimum taper factor between inverter stages for a given capacitive load. Beyond a critical value, the output transition time of a tapered buffer increases with reducing taper factor due to SSN. SSN can be reduced by skewing the switching of output buffers. SPICE simulation results show that skewing buffer switching with additional inverter stages reduces SSN but increases buffer propagation delay.
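
For context on the taper factor, the sketch below uses the textbook first-order model of an ideal tapered buffer, which is an assumption here and not the paper's SSN-aware analysis. In that model each stage contributes a delay proportional to the taper factor, and the number of stages is ln(load ratio) / ln(taper). The function name `tapered_buffer_delay` and the parameter `load_ratio` (load capacitance divided by the first inverter's input capacitance) are hypothetical.

```python
import math


def tapered_buffer_delay(load_ratio: float, taper: float) -> float:
    """Relative delay of an ideal tapered buffer, in units of one unit-stage delay.

    First-order model only: it ignores self-loading, velocity saturation,
    and simultaneous switching noise, all of which shift the optimum.
    """
    if load_ratio <= 1.0 or taper <= 1.0:
        raise ValueError("load_ratio and taper must both exceed 1")
    stages = math.log(load_ratio) / math.log(taper)  # may be fractional
    return stages * taper


# Compare a few taper factors for a load 1000 times the input capacitance.
for f in (2.0, math.e, 4.0, 6.0):
    print(f"taper={f:.3f}  relative delay={tapered_buffer_delay(1000.0, f):.2f}")
```

In this idealized model the delay is minimized at a taper factor of e; the abstract reports that SSN pushes the optimum above the noise-free value.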

Journal Article•DOI•

Pipelined H-trees for high-speed clocking of large integrated systems in presence of process variations

M. Nekili¹, Guy Bois², Yvon Savaria²•Institutions (2)

École Normale Supérieure¹, École Polytechnique²

01 Jun 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A novel approach is proposed that distributes the clock with an H-tree, whose branches are composed of minimum-sized inverters rather than metal, to avoid the frequency limitation and obtain the highest clocking rate achievable with a given technology.

Abstract: This paper addresses the problem of clocking large high-speed digital systems, as well as deterministic skew modeling, a related problem. In order to provide a reliable skew model, and to avoid the frequency limitation, we propose a novel approach that distributes the clock with an H-tree, whose branches are composed of minimum-sized inverters rather than metal. With such a structure, we obtain the highest clocking rate achievable with a given technology. Indeed, clock rates around 1 GHz are possible with a 1.2 μm CMOS technology. From the skew modeling standpoint, we derive an analytic expression of the skew between two leaves of the H-tree, which we consider to be the difference in root-to-leaf delay pairs. The skew upper bound obtained has an order of complexity which, with respect to the H-tree size D, is the same as the one that may be derived from the Fisher and Kung model for both side-to-side and neighbor-to-neighbor communications, i.e., Ω(D²), whereas the Steiglitz and Kugelmass probabilistic model predicts Θ(D × √(log D)). In an H-tree implemented with metallic lines, the leaf-to-leaf skew is obviously bounded by the delay between the root and the leaves. However, with the logic-based H-tree proposed here, we arrive at a nonobvious result, which states that the leaf-to-leaf skew grows faster than the root-to-leaf delay in the presence of a uniform transistor time constant gradient. This paper also proposes generalizations of the skew model to (1) the case of chips in a wafer subject to a smooth, but nonuniform gradient and (2) the case of H-tree configurations mixing logic and interconnections; in this respect, this paper covers the H-tree configurations based on the combination of logic and interconnections.

Journal Article•DOI•

A solution methodology for exact design space exploration in a three-dimensional design space

S. Chaudhuri¹, S.A. Blythe¹, Robert A. Walker¹•Institutions (1)

Rensselaer Polytechnic Institute¹

01 Mar 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: In this paper, an exact solution methodology for solving the scheduling problem in a three-dimensional (3-D) design space is described; the space consists of the usual two-dimensional design space (which trades off area and schedule length) plus a third dimension representing clock length.

Abstract: This paper describes an exact solution methodology, implemented in Rensselaer's Voyager design space exploration system, for solving the scheduling problem in a three-dimensional (3-D) design space: the usual two-dimensional (2-D) design space (which trades off area and schedule length), plus a third dimension representing clock length. Unlike design space exploration methodologies which rely on bounds or estimates, this methodology is guaranteed to find the globally optimal solution to a 3-D scheduling problem. Furthermore, this methodology efficiently prunes the search space, eliminating provably inferior design points through the following: 1) a careful selection of candidate clock lengths and 2) tight bounds on the number of functional units or on the schedule length. Both chaining and multicycle operations are supported.

Journal Article•DOI•

Diagnosis and correction of multiple logic design errors in digital circuits

Pi-Yu Chung¹, Ibrahim N. Hajj²•Institutions (2)

Bell Labs¹, University of Illinois at Urbana–Champaign²

01 Jun 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This paper presents a technique to correct multiple logic design errors in a gate-level netlist by repeatedly applying a single error search and correction algorithm for circuits with a low multiplicity of errors.

Abstract: This paper presents a technique to correct multiple logic design errors in a gate-level netlist. A number of methods have been proposed for correcting single logic design errors. However, the extension of these methods to more than one error is still very limited. We direct our attention to circuits with a low multiplicity of errors. By assuming different error dependency scenarios, multiple errors are corrected by repeatedly applying a single error search and correction algorithm. Experimental results on correcting double-design errors and triple-design errors on ISCAS and MCNC benchmark circuits are included.

Journal Article•DOI•

REMOD: a new methodology for designing fault-tolerant arithmetic circuits

Shantanu Dutt¹, F. Hanchek¹•Institutions (1)

University of Minnesota¹

01 Mar 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: In this article, the authors propose a fault detection scheme based on the principle of node covering, in which the computation of each cell is checked by a "covering" cell.

Abstract: REMOD (REprocessing with MicrO Delays) is a new method for fault-tolerant design of logic circuits composed of arrays of identical functional cells. The fault detection scheme is based on the principle of node covering, in which the computation of each cell is checked by a "covering" cell. After a faulty cell is detected, the node covering principle also allows the circuit to easily be reconfigured to perform correctly for subsequent inputs. Furthermore, the design method is extendable to multiple fault tolerance with only small increments of hardware and time. We have laid out and simulated REMOD-based circuits for adders and multipliers and show that the time overheads are a small factor of the original computation time: 0 or Θ(1/n) to Θ(1/(log n)), for an n-cell circuit. For moderately complex cells, it is seen that area overhead is very reasonable as well.

Journal Article•DOI•

Design considerations for high-frequency crystal oscillators digitally trimmable to sub-ppm accuracy

Qiuting Huang¹, P. Basedau¹•Institutions (1)

ETH Zurich¹

01 Dec 1997-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: In this paper, a 78 MHz crystal oscillator is described, which forms part of a regulated system in a pager where the oscillation frequency is controlled digitally to sub-ppm accuracy.

Abstract: The current consumption of crystal oscillators is usually determined by the steady-state amplitude requirement, rather than the minimum transconductance for oscillation to exist. In a bipolar implementation, transconductance is proportional to current, so that current consumption scales with frequency and load capacitance in the same way as transconductance. In a complementary metal-oxide-semiconductor (CMOS) implementation, current scales as the square of transconductance. It is therefore important to distinguish current from transconductance in power estimation for high frequency oscillators. Analytical expressions relating current to steady-state amplitude are used in this paper to estimate the minimum power required for a crystal oscillator at a given frequency. A 78 MHz crystal oscillator is described, which forms part of a regulated system in a pager where the oscillation frequency is controlled digitally to sub-ppm accuracy. The oscillator can be pulled from ±65 ppm to the required frequency with 0.2 ppm accuracy, with a maximum current consumption of 197 μA. The circuit has been fabricated in a 1-μm CMOS technology. The measured phase noise is -113 dBc/Hz at 300 Hz offset.
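
The two scaling statements in this abstract follow from standard first-order device equations, sketched below. The symbols are assumptions not given in the abstract: V_T is the thermal voltage, μ the carrier mobility, C_ox the gate-oxide capacitance per unit area, and W/L the transistor aspect ratio (held fixed). The MOS expression assumes the square-law model in strong inversion, which may not be the model the paper uses.

```latex
% Bipolar transistor: transconductance is proportional to collector current.
g_m = \frac{I_C}{V_T}
\quad\Longrightarrow\quad
I_C = g_m V_T \;\propto\; g_m

% Square-law MOSFET in strong inversion (assumed model, fixed W/L):
g_m = \sqrt{2 \mu C_{ox} \frac{W}{L} I_D}
\quad\Longrightarrow\quad
I_D = \frac{g_m^{2}}{2 \mu C_{ox} (W/L)} \;\propto\; g_m^{2}
```

Under these assumptions, doubling the required transconductance doubles the bias current in the bipolar case but quadruples it in the CMOS case.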
