Showing papers on "Very-large-scale integration" published in 1999

Open Access

Proceedings Article•DOI•

Technology and design challenges for low power and high performance

Vivek De¹, Shekhar Borkar¹•Institutions (1)

17 Aug 1999

TL;DR: Key barriers to continued scaling of supply voltage and technology for microprocessors to achieve low-power and high-performance are discussed, with particular focus on short-channel effects, device parameter variations, excessive subthreshold and gate oxide leakage.

Abstract: We discuss key barriers to continued scaling of supply voltage and technology for microprocessors to achieve low-power and high-performance. In particular, we focus on short-channel effects, device parameter variations, excessive subthreshold and gate oxide leakage, as the main obstacles dictated by fundamental device physics. Functionality of special circuits in the presence of high leakage, SRAM cell stability, bit line delay scaling, and power consumption in clocks & interconnects, will be the primary design challenges in the future. Soft error rate control and power delivery pose additional challenges. All of these problems are further compounded by the rapidly escalating complexity of microprocessor designs. The excessive leakage problem is particularly severe for battery-operated, high-performance microprocessors.

342 citations

Book•

DSP integrated circuits

Lars Wanhammar¹•Institutions (1)

Linköping University¹

01 Jan 1999

TL;DR: DSP Integrated Circuits.

Abstract: DSP Integrated Circuits. VLSI Circuit Technologies. Digital Signal Processing. Digital Filters. Finite Word Length Effects. DSP Algorithms. DSP System Design. Architectures for DSP. Synthesis of DSP Architectures. Digital Systems. Processing Elements. Integrated Circuit Design. Subject Index.

301 citations

Proceedings Article•DOI•

Switched-capacitor DC-DC converters for low-power on-chip applications

Dragan Maksimovic¹, Sandeep Dhar¹•Institutions (1)

University of Colorado Boulder¹

01 Jul 1999

TL;DR: In this paper, the authors describe switched-capacitor DC-DC power converters (charge pumps) suitable for on-chip, low-power applications, based on connecting two identical but opposite-phase SC converters in parallel, thus eliminating the need for separate bootstrap gate drivers.

Abstract: The paper describes switched-capacitor DC-DC power converters (charge pumps) suitable for on-chip, low-power applications. The proposed configurations are based on connecting two identical but opposite-phase SC converters in parallel, thus eliminating the need for separate bootstrap gate drivers. The authors focus on emerging very low-power VLSI applications such as battery-powered or self-powered signal processors where high power conversion efficiency is important and where power levels are in the milliwatt range. Conduction and switching losses are considered to allow design optimization in terms of switching frequency and component sizes. Open-loop and closed-loop operation of an experimental, fully integrated, 10 MHz voltage doubler is described. The doubler has 2 V or 3 V input and generates 3.3 V or 5 V output at up to 5 mW load. The converter circuit fabricated in a standard 1.2 µm CMOS technology takes 0.7 mm² of the chip area.

183 citations

Journal Article•DOI•

Reconfigurable pipelined 2-D convolvers for fast digital signal processing

B. Bosi¹, Guy Bois², Yvon Savaria²•Institutions (2)

Nortel¹, École Polytechnique de Montréal²

01 Sep 1999-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: The exploration of the 2-D convolver's design space will provide guidelines for the development of a library of DSP-oriented hardware configurations intended to significantly speed up the performance of general DSP processors.

Abstract: In order to make software applications simpler to write and easier to maintain, a software digital signal-processing library that performs essential signal- and image-processing functions is an important part of every digital signal processor (DSP) developer's toolset. In general, such a library provides high-level interface and mechanisms; therefore, developers only need to know how to use algorithms, not the details of how they work. Complex signal transformations then become function calls, e.g., C-callable functions. Considering the two-dimensional (2-D) convolver function as an example of great significance for DSPs, this paper proposes to replace this software function by an emulation on a field-programmable gate array (FPGA) initially configured by software programming. Therefore, the exploration of the 2-D convolver's design space will provide guidelines for the development of a library of DSP-oriented hardware configurations intended to significantly speed up the performance of general DSP processors. Based on the specific convolver, and considering operators supported in the library as hardware accelerators, a series of tradeoffs for efficiently exploiting the bandwidth between the general-purpose DSP and accelerators are proposed. In terms of implementation, this paper explores the performance and architectural tradeoffs involved in the design of an FPGA-based 2-D convolution coprocessor for the TMS320C40 DSP microprocessor available from Texas Instruments Incorporated. However, the proposed concept is not limited to a particular processor.

168 citations

Journal Article•DOI•

Crosstalk in VLSI interconnections

Ashok Vittal, L.H. Chen, Malgorzata Marek-Sadowska, Kai-Ping Wang, S. Yang

01 Dec 1999-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: This paper provides easily computable expressions for crosstalk amplitude and pulse width in resistive, capacitively coupled lines and these expressions hold for nets with arbitrary number of pins and of arbitrary topology under any specified input excitation.

Abstract: We address the problem of crosstalk computation and reduction using circuit and layout techniques in this paper. We provide easily computable expressions for crosstalk amplitude and pulse width in resistive, capacitively coupled lines. The expressions hold for nets with arbitrary number of pins and of arbitrary topology under any specified input excitation. Experimental results show that the average error is about 10% and the maximum error is less than 20%. The expressions are used to motivate circuit techniques, such as transistor sizing, and layout techniques, such as wire ordering and wire width optimization to reduce crosstalk.

165 citations

Book•

Genetic Algorithms for VLSI Design, Layout & Test Automation

Pinaki Mazumder¹, Elizabeth M. Rudnick²•Institutions (2)

University of Michigan¹, University of Illinois at Urbana–Champaign²

01 Jan 1999

TL;DR: This book provides details about some of the EDA applications where GAs have been used, including partitioning, automatic placement and routing, technology mapping for FPGAs, automatic test generation, and power estimation.

Abstract: This book provides details about some of the EDA applications where GAs have been used. These applications include partitioning, automatic placement and routing, technology mapping for FPGAs, automatic test generation, and power estimation. One chapter is devoted to each of these topics. The objective is to provide examples where GAs have been successfully applied in the past so that the reader will be able to apply similar techniques in solving his/her own problems.

158 citations

Journal Article•DOI•

High-level area and power estimation for VLSI circuits

Mahadevamurty Nemani¹, Farid N. Najm²•Institutions (2)

Intel¹, University of Illinois at Urbana–Champaign²

01 Jun 1999-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: The proposed area model is empirical and is based on transforming the given multi-output Boolean function description into an equivalent single-output function; results demonstrating its feasibility and utility are presented.

Abstract: High-level power estimation, when given only a high-level design specification such as a functional or register-transfer level (RTL) description, requires high-level estimation of the circuit average activity and total capacitance. Considering that total capacitance is related to circuit area, this paper addresses the problem of computing the "area complexity" of multi-output combinational logic given only their functional description, i.e., Boolean equations, where area complexity refers to the number of gates required for an optimal multilevel implementation of the combinational logic. The proposed area model is based on transforming the multi-output Boolean function description into an equivalent single output function. The area model is empirical and results demonstrating its feasibility and utility are presented. Also, a methodology for converting the gate count estimates, obtained from the area model, into capacitance estimates is presented. High-level power estimates based on the total capacitance estimates and average activity estimates are also presented.

119 citations

Proceedings Article•DOI•

Reliability-constrained area optimization of VLSI power/ground networks via sequence of linear programmings

Xiang-Dong Tan¹, C.-J.R. Shi¹, D. Lungeanu¹, Jyh-Chwen Lee, Li-Pen Yuan•Institutions (1)

University of Washington¹

01 Jun 1999

TL;DR: Experimental results demonstrate that the sequence-of-linear-programming method is orders of magnitude faster than the best-known method based on conjugate gradients, with constantly better optimization solutions.

Abstract: This paper presents a new method for determining the widths of the power and ground routes in integrated circuits so that the area required by the routes is minimized subject to the reliability constraints. The basic idea is to transform the resulting constrained nonlinear programming problem into a sequence of linear programs. Theoretically, we show that the sequence of linear programs always converges to the optimum solution of the relaxed convex problem. Experimental results demonstrate that the sequence-of-linear-programming method is orders of magnitude faster than the best-known method based on conjugate gradients, with constantly better optimization solutions.

116 citations

Proceedings Article•DOI•

VLSI implementation issues of TURBO decoder design for wireless applications

Zhongfeng Wang¹, H. Suzuki¹, Keshab K. Parhi•Institutions (1)

University of Minnesota¹

01 Dec 1999

TL;DR: Novel power-down techniques are proposed, which can achieve very high power-down efficiency without performance or latency degradation at the expense of negligible hardware overhead.

Abstract: Finite precision effects on the performance of TURBO decoders have been analyzed and the optimal word lengths of variables have been determined considering tradeoffs between the performance and the hardware cost. It is shown that the performance degradation from the infinite precision is negligible if 4 bits are used for received bits and 6 bits for the extrinsic information. The state metrics normalization method suitable for TURBO decoders is also discussed. This method requires small amount of hardware and its speed does not depend on the number of states. Furthermore, we propose novel power-down techniques, which can achieve very high power-down efficiency without performance or latency degradation at the expense of negligible hardware overhead.

114 citations

Journal Article•DOI•

Design error diagnosis and correction via test vector simulation

Andreas Veneris¹, Ibrahim N. Hajj•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Dec 1999-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: A test vector simulation-based approach for multiple design error diagnosis and correction in digital VLSI circuits that is applicable to circuits with no global binary decision diagram representation.

Abstract: With the increase in the complexity of digital VLSI circuit design, logic design errors can occur during synthesis. In this paper, we present a test vector simulation-based approach for multiple design error diagnosis and correction. Diagnosis is performed through an implicit enumeration of the erroneous lines in an effort to avoid the exponential explosion of the error space as the number of errors increases. Resynthesis during correction is as little as possible so that most of the engineering effort invested in the design is preserved. Since both steps are based on test vector simulation, the proposed approach is applicable to circuits with no global binary decision diagram representation. Experiments on ISCAS'85 benchmark circuits exhibit the robustness and error resolution of the proposed methodology. Experiments also indicate that test vector simulation is indeed an attractive technique for multiple design error diagnosis and correction in digital VLSI circuits.

112 citations

Proceedings Article•

Multiple-Valued Logic in VLSI: Challenges and Opportunities

Elena Dubrova¹•Institutions (1)

Royal Institute of Technology¹

01 Jan 1999

TL;DR: In this paper, the authors give an overview of recent developments in multiple-valued logic circuit design, revealing both the opportunities they offer and the challenges they face, and present several potential opportunities for the improvement of present VLSI circuit designs.

Abstract: In recent years, there have been major advances in integrated circuit technology which have both made feasible and generated great interest in electronic circuits which employ more than two discrete levels of signal. Such circuits, called multiple-valued logic circuits, offer several potential opportunities for the improvement of present VLSI circuit designs. In this paper, we give an overview of recent developments in multiple-valued logic circuit design, revealing both the opportunities they offer and the challenges they

Journal Article•DOI•

AER image filtering architecture for vision-processing systems

Teresa Serrano-Gotarredona, Andreas G. Andreou¹, Bernabe Linares-Barranco²•Institutions (2)

Johns Hopkins University¹, Spanish National Research Council²

01 Sep 1999-IEEE Transactions on Circuits and Systems I-regular Papers

TL;DR: The present paper proposes the architecture, provides a circuit implementation using MOS transistors operated in weak inversion, and shows behavioral simulation results at the system level operation and some electrical simulations.

Abstract: A VLSI architecture is proposed for the realization of real-time two-dimensional (2-D) image filtering in an address-event-representation (AER) vision system. The architecture is capable of implementing any convolutional kernel F(x,y) as long as it is decomposable into x-axis and y-axis components, i.e., F(x,y)=H(x)V(y), for some rotated coordinate system {x,y} and if this product can be approximated safely by a signed minimum operation. The proposed architecture is intended to be used in a complete vision system, known as the boundary contour system and feature contour system (BCS-FCS) vision model, proposed by Grossberg and collaborators. The present paper proposes the architecture, provides a circuit implementation using MOS transistors operated in weak inversion, and shows behavioral simulation results at the system level operation and some electrical simulations.
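
As a rough software illustration of the separability condition F(x,y)=H(x)V(y) named in the abstract above, the following Python sketch checks that filtering with the full 2-D kernel gives the same result as a pass with H along x followed by a pass with V along y. This is not the paper's AER circuit: the function names and sample values are made up for illustration, and the signed-minimum approximation mentioned in the abstract is not modeled.

```python
def conv2d_valid(image, kernel):
    """Direct 2-D correlation over the 'valid' region (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def separable_filter(image, h_x, v_y):
    """Apply H along x (within rows), then V along y (down columns)."""
    row_pass = conv2d_valid(image, [h_x])              # 1 x len(h_x) kernel
    return conv2d_valid(row_pass, [[v] for v in v_y])  # len(v_y) x 1 kernel


def outer_kernel(h_x, v_y):
    """Build the full kernel F[y][x] = V[y] * H[x]."""
    return [[v * h for h in h_x] for v in v_y]


# Hypothetical sample data; integers keep the comparison exact.
H = [1, 2, 1]
V = [1, 0, -1]
IMAGE = [
    [3, 1, 4, 1, 5],
    [9, 2, 6, 5, 3],
    [5, 8, 9, 7, 9],
    [3, 2, 3, 8, 4],
]

full = conv2d_valid(IMAGE, outer_kernel(H, V))
split = separable_filter(IMAGE, H, V)
print(full == split)  # prints True
```

With a separable kernel, each output pixel needs len(H) + len(V) multiply-accumulates in the two-pass form instead of len(H) × len(V) in the direct form, which is one reason separability is attractive for hardware.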

Patent•

Method and apparatus for electrical and mechanical attachment, and electromagnetic interference and thermal management of high speed, high density VLSI modules

Cromwell S Daniel¹•Institutions (1)

Hewlett-Packard¹

20 Jan 1999

TL;DR: A method and apparatus are presented for packaging a high speed, high density VLSI module within a limited space and in a single assembly that attaches, aligns, and manages electromagnetic interference and heat dissipation of the module.

Abstract: A method and apparatus for assembling a high speed, high density VLSI module in a computer system that enables attachment, support, electromagnetic interference containment, and thermal management of the VLSI module. The present invention packages a high speed, high density VLSI module within a limited space and in a single assembly that attaches, aligns, and manages electromagnetic interference and heat dissipation of the VLSI module. The present invention aligns a land grid array of a circuit board and an interposer socket assembly, and the interposer socket assembly and a land grid array of the VLSI module; in the single VLSI module assembly. An even, controlled load is placed on the interposer socket interface thereby reducing the risk of damage to the interposer socket from overloaded connections between the land grid array of the VLSI module, the interposer socket assembly, and the land grid array of the circuit board. The present invention is easy-to-use in upgrading and handling of the VLSI module.

Book•

Low-voltage/low-power integrated circuits and systems : low-voltage mixed-signal circuits

Edgar Sanchez-Sinencio, Andreas G. Andreou

01 Jan 1999

TL;DR: This edited volume collects chapters on low-voltage, low-power mixed-signal circuits, including an information-theoretic framework for comparing the bit energy of signal representations at the circuit level and two new directions in low-power digital CMOS VLSI design.

Abstract: Foreword. Preface. Acknowledgments. Contributors. Introduction (E. Sanchez-Sinencio). A Current-Based MOSFET Model for Integrated Circuit Design (C. Montoro, et al.). A Review of the Performance of Available Integrated Circuit Components Under the Constraints of Low-Power Operation (D. Bowers). Exploiting Device Physics in Circuit Design for Efficient Computational Functions in Analog VLSI (A. Andreou). Low-Voltage Circuit Techniques Using Floating-Gate Transistors (Chong-Gun Yu and Randall Geiger). Low-Power CMOS Digital Circuits (S. Embabi). Low-Voltage Analog BiCMOS Circuit Building Blocks (J. Ramirez-Angulo). Low-Voltage CMOS Operational Amplifiers (R. Wassenaar, et al.). Low-Voltage/Low-Power Amplifiers with Optimized Dynamic Range and Bandwidth (J. Huijsing, et al.). Low-Voltage Analog CMOS Filter Design (M. Steyaert, et al.). Continuous-Time Low-Voltage Current-Mode Filters (E. Sanchez-Sinencio and S. Smith). High-Efficiency Low-Voltage DC-DC Conversion for Portable Applications (A. Stratakos, et al.). Two New Directions in Low-Power Digital CMOS VLSI Design (V. Kantabutra). Low-Power CMOS Data Conversion (M. Pelgrom). Low-Power Multiplierless YUV-to-RGB Converter Based on Human Vision Perception (T. Meng, et al.). Micropower Systems for Implantable Defibrillators and Pacemakers (M. Jabri and R. Coggins). An Information Theoretic Framework for Comparing the Bit Energy of Signal Representations at the Circuit Level (A. Andreou and P. Furth). A Synchronous Gated-Clock Strategy for Low-Power Design of Telecom ASICs (P. Vanoostende and G. Van Wauwe). Index. About the Editors.

Book•DOI•

Digital Signal Processing for Multimedia Systems

Keshab K. Parhi, Takao Nishitani¹•Institutions (1)

University of Minnesota¹

01 Apr 1999

TL;DR: Part 1, System applications: multimedia systems overview; video compression; audio compression; system synchronization approaches; digital versatile disk; VLSI signal processing for very high speed digital subscriber loops (VDSL); cable modems; wireless communication systems.

Abstract: Part 1, System applications: multimedia systems overview; video compression; audio compression; system synchronization approaches; digital versatile disk; VLSI signal processing for very high speed digital subscriber loops (VDSL); cable modems; wireless communication systems. Part 2, Programmable and custom architectures and algorithms: programmable DSPs; RISC, video and media DSPs; wireless DSPs; motion estimation system design; wavelet VLSI architectures; DCT architectures; lossless coders; Viterbi decoders - algorithms and high performance architectures; watermarking for multimedia; systolic RLS adaptive filtering; STAR-RLS filtering. Part 3, Advanced arithmetic architectures and design methodologies: division and square root; finite field arithmetic; CORDIC algorithms and architectures for fast and efficient vector-rotation implementation; advanced systolic design; low power design; power estimation approaches; system exploration for custom low power data storage and transfer; hardware description and synthesis of DSP systems.

Proceedings Article•DOI•

A fast, low-power logarithm approximation with CMOS VLSI implementation

S.L. SanGregory¹, D. Gallagher, R. Siferd•Institutions (1)

Air Force Institute of Technology¹

08 Aug 1999

TL;DR: A new technique and CMOS VLSI implementation for computing approximate logarithms (base 2 and 10) for binary integers is presented; the approximation is performed using only combinational logic and requires no multiplications.

Abstract: A new technique and CMOS VLSI implementation for computing approximate logarithms (base 2 and 10) for binary integers is presented. The approximation is performed using only combinational logic and requires no multiplications. Additionally, as implemented, a ROM of only N×log₂(N) bits is used to convert N-bit integers. The maximum error of the approximation is 1.5% when the input value is 3, and decays exponentially to less than 0.5% for input values greater than 25.
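
The abstract does not spell out the approximation itself, so the sketch below uses a different, classical shift-based scheme (Mitchell's approximation) purely to illustrate how a base-2 logarithm of a binary integer can be estimated without multiplications. It is not the authors' circuit and does not reproduce their error figures or their ROM correction; the function name and the fixed-point width `frac_bits` are assumptions made for this example.

```python
import math


def mitchell_log2(n: int, frac_bits: int = 8) -> float:
    """Approximate log2(n) for a positive integer using Mitchell's method.

    log2(n) ~= k + m, where k is the index of the leading one bit and
    m = (n - 2**k) / 2**k is read straight from the bits below it.
    Only bit scanning and shifts are used, no multiplications.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    k = n.bit_length() - 1              # position of the leading 1
    mantissa = n - (1 << k)             # bits below the leading 1
    # Align the mantissa to frac_bits fractional bits with shifts only.
    if k >= frac_bits:
        frac = mantissa >> (k - frac_bits)
    else:
        frac = mantissa << (frac_bits - k)
    fixed = (k << frac_bits) | frac     # fixed-point result: k.frac
    return fixed / (1 << frac_bits)     # float conversion for display only


for value in (1, 2, 3, 8, 25, 1000):
    approx = mitchell_log2(value)
    exact = math.log2(value)
    print(f"{value:5d}  approx={approx:.4f}  exact={exact:.4f}")
```

Plain Mitchell gives 1.5 for an input of 3 (exact value about 1.585), a noticeably larger error than the 1.5% quoted in the abstract, which suggests the paper's scheme adds a correction beyond this baseline.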

Proceedings Article•DOI•

A new image encryption algorithm and its VLSI architecture

Jui-Cheng Yen, Jiun-In Guo

01 Dec 1999

TL;DR: In this paper, a new image encryption algorithm and its VLSI architecture are proposed; the gray level of each pixel in the image is transformed based on a defined bit recirculation function and a binary sequence generated from a chaotic system.

Abstract: In this paper, a new image encryption algorithm and its VLSI architecture are proposed. Based on a defined bit recirculation function and a binary sequence generated from a chaotic system, the gray level of each pixel in the image is transformed. The features of the algorithm are as follows: 1) low computational complexity, 2) high security, and 3) no distortion. In order to implement the system, its VLSI architecture with low hardware complexity, high computing speed, and high feasibility for VLSI implementation is also designed. Finally, two encrypted images are simulated and the fractal dimensions of the original and encrypted images are computed to demonstrate the effectiveness of the proposed algorithm.

Proceedings Article•DOI•

A novel VLSI layout fabric for deep sub-micron applications

Sunil P. Khatri, Amit Mehrotra, Robert K. Brayton, Alberto Sangiovanni-Vincentelli, Ralph H. J. M. Otten

01 Jun 1999

TL;DR: A new VLSI layout methodology which addresses the main problems faced in deep sub-micron (DSM) integrated circuit design, and shows how the uniform parasitics of the fabric give rise to a reliable and predictable design.

Abstract: Proposes a new VLSI layout methodology which addresses the main problems faced in deep sub-micron (DSM) integrated circuit design. Our layout "fabric" scheme eliminates the conventional notion of power and ground routing on the integrated circuit die. Instead, power and ground are essentially "pre-routed" all over the die. By a clever arrangement of power/ground and signal pins, we almost completely eliminate the capacitive effects between signal wires. Additionally, we get a power and ground distribution network with a very low resistance at any point on the die. Another advantage of our scheme is that the arrangement of conductors ensures that on-chip inductances are uniformly negligible. Finally, characterization of the circuit delays, capacitances and resistances becomes extremely simple in our scheme, and needs to be done only once for a design. We show how the uniform parasitics of our fabric give rise to a reliable and predictable design. We have implemented our scheme using public domain layout software. Preliminary results show that it holds much promise as the layout methodology of choice in DSM integrated circuit design.

Proceedings Article•DOI•

Ultra-thin body SOI MOSFET for deep-sub-tenth micron era

Yang-Kyu Choi¹, K. Asano¹, N. Lindert¹, Vivek Subramanian¹, Tsu-Jae King¹, Jeffrey Bokor¹, Chenming Hu²•Institutions (2)

University of California, Berkeley¹, Hodges University²

01 Dec 1999

TL;DR: In this paper, a 40nm-gate-length ultra-thin body (UTB) nMOSFET is proposed to eliminate the punchthrough path between source and drain.

Abstract: A 40nm-gate-length ultra-thin body (UTB) nMOSFET is demonstrated. A self-aligned thin body SOI device has previously been proposed for suppressing the short channel effect. UTB structure can eliminate the punchthrough path between source and drain and provide a more evolutionary alternative to the double-gate MOSFET for deep-sub-tenth micron technology. The advantage of using UTB is illustrated through device simulation (with the aid of Silvaco ATLAS) using simple doping profiles for the body and S/D (simple Gaussian).

Journal Article•DOI•

Pausible clocking-based heterogeneous systems

K.Y. Yun¹, A.E. Dooply¹•Institutions (1)

University of California, San Diego¹

01 Dec 1999-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This paper describes a novel communication scheme, which is guaranteed to be free of synchronization failures, amongst multiple synchronous and asynchronous modules operating independently, through an asynchronous first-in first-out (FIFO) channel.

Abstract: This paper describes a novel communication scheme, which is guaranteed to be free of synchronization failures, amongst multiple synchronous and asynchronous modules operating independently. In this scheme, communication between every pair of modules is done through an asynchronous first-in first-out (FIFO) channel; communication between a module and the FIFO is done using a request/acknowledge handshaking. Synchronization of handshake signals to the local module clock is done in an unconventional way: the local clock built out of a ring oscillator is paused or stretched, if necessary, to ensure that the handshake signal satisfies setup and hold time constraints with respect to the local clock. In order to validate this scheme, we implemented a test chip in 0.5-µm CMOS. This chip is designed as a ring, composed of two synchronous modules, an asynchronous module, and two asynchronous FIFOs. Each module functions as a receiver to one module and a sender to another module. Test results show that the chip functions reliably up to 456 MHz.

Journal Article•DOI•

Neuromorphic analog VLSI sensor for visual tracking: circuits and application examples

Giacomo Indiveri

01 Nov 1999-IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing

TL;DR: In this paper, a one-dimensional visual sensor, implemented on a single VLSI chip using analog neuromorphic circuits, is proposed for selectively detecting and tracking the position of the feature with the highest spatial contrast present in the visual scene.

Abstract: This paper presents a one-dimensional visual sensor, implemented on a single VLSI chip using analog neuromorphic circuits, for selectively detecting and tracking the position of the feature with the highest spatial contrast present in the visual scene. The chip's photoreceptors adapt to stationary backgrounds and can be tuned to respond maximally to specific target velocities. The sensor drastically reduces the amount of data to be transmitted to further processing stages by encoding, in real time, the position of the target in the form of a single continuous-time analog variable. We describe the circuits implementing the sensor and show applications to three examples of tracking tasks: a stand-alone visual tracking system, an active fully analog tracking system, and a mobile platform line-following system.

Proceedings Article•DOI•

VLSI architecture: past, present, and future

William J. Dally¹, S. Lacy¹•Institutions (1)

Stanford University¹

21 Mar 1999

TL;DR: The architecture of a VLSI multicomputer constructed from c. 2009 processor-DRAM chips is sketched, and some of the challenges involved in building such a system are outlined.

Abstract: This paper examines the impact of VLSI technology on the evolution of computer architecture and projects the future of this evolution. We see that over the past 20 years, the increased density of VLSI chips was applied to close the gap between microprocessors and high-end CPUs. Today this gap is fully closed and adding devices to uniprocessors is well beyond the point of diminishing returns. To continue to convert the increasing density of VLSI to computer performance we see little alternative to building multicomputers. We sketch the architecture of a VLSI multicomputer constructed from c. 2009 processor-DRAM chips and outline some of the challenges involved in building such a system. We suggest that the software transition from sequential processors to such fine-grain multicomputers can be eased by using the multicomputer as the memory system of a conventional computer.

Journal Article•DOI•

Minimizing the required memory bandwidth in VLSI system realizations

Sven Wuytack¹, F. Catthoor¹, G. de Jong², H.J. De Man¹•Institutions (2)

Katholieke Universiteit Leuven¹, Alcatel-Lucent²

01 Dec 1999-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: This paper presents the problem of storage bandwidth optimization (SBO) in VLSI system realizations and shows that it is important to take into account which data is being accessed in parallel, instead of only considering the number of simultaneous memory accesses.

Abstract: In this paper, we present the problem of storage bandwidth optimization (SBO) in VLSI system realizations. Our goal is to minimize the required memory bandwidth within the given cycle budget by adding ordering constraints to the flow graph. This allows the subsequent memory allocation and assignment tasks to come up with a cheaper memory architecture with fewer memories and memory ports. The importance and the effect of SBO are shown on realistic examples both in the video and asynchronous transfer-mode (ATM) domains. We show that it is important to take into account which data is being accessed in parallel, instead of only considering the number of simultaneous memory accesses. Our problem formulation leads to the optimization of a conflict (hyper) graph. For the target domain of ATM, only flat graphs without loops have to be treated. For this subproblem, a prototype tool has been implemented to demonstrate the feasibility of automating this important system design step.

Book•

Low-voltage CMOS VLSI circuits

James B. Kuo, Jea-Hong Luo

01 Jan 1999

TL;DR: In this article, the authors present a detailed analysis of one of today's hottest and most compelling research techniques for VLSI systems, namely very large scale integration (VLSI).

Abstract: Low-voltage very large scale integration (VLSI) circuits represent the electronics of the future. All electronic products are striving to reduce power consumption to create more economical, efficient, and compact devices. Despite the inevitable trend towards low-voltage, few books address the technology needed. Geared to the needs of engineers and designers in the field, this comprehensive volume presents a remarkably detailed analysis of one of today's hottest and most compelling research techniques for VLSI systems.

Journal Article•DOI•

Multiplexer-based array multipliers

Kiamal Pekmestzi¹•Institutions (1)

National and Kapodistrian University of Athens¹

01 Jan 1999-IEEE Transactions on Computers

TL;DR: A new algorithm for multiplying two n-bit numbers, based on the synchronous computation of the partial sums of the two operands, permits an efficient realization of parallel multiplication using iterative arrays.

Abstract: A new algorithm for the multiplication of two n-bit numbers based on the synchronous computation of the partial sums of the two operands is presented. The proposed algorithm permits an efficient realization of the parallel multiplication using iterative arrays. At the same time, it permits high-speed operation. Multiplier arrays for positive numbers and numbers in two's complement form based on the proposed technique are implemented. Also, an efficient pipeline form of the proposed multiplication scheme is introduced. All multipliers obtained have low circuit complexity permitting high-speed operation and the interconnections of the cells are regular, well-suited for VLSI realization.

Journal Article•DOI•

MOSFET Scaling-The Driver of VLSI Technology

D.L. Critchlow¹•Institutions (1)

University of Vermont¹

01 Apr 1999

TL;DR: This introduction to the classic 1974 paper on MOSFET scaling by R. Dennard et al. traces the history of scaling and its application to very large scale integration (VLSI) MOSFET technology from 1970 to 1998.

Abstract: This is an introduction to the Classic Paper on MOSFET scaling by R. Dennard et al., 'Design of Ion-Implanted MOSFET's with Very Small Physical Dimensions,' published in the IEEE Journal of Solid-State Circuits in October 1974. The history of scaling and its application to very large scale integration (VLSI) MOSFET technology is traced from 1970 to 1998. The role of scaling in the profound improvements in power delay product over the last three decades is analyzed in basic terms.

Proceedings Article•DOI•

Integrated floorplanning and interconnect planning

Hung-Ming Chen¹, Hai Zhou¹, F.Y. Young², D. F. Wong¹, Hannah H. Yang³, Naveed A. Sherwani³•Institutions (3)

University of Texas at Austin¹, The Chinese University of Hong Kong², Intel³

07 Nov 1999

TL;DR: This work proposes a method to combine interconnect planning with floorplanning based on the Wong-Liu (1986) floorplanning algorithm, using a multi-stage simulated annealing approach in which different interconnect planning methods are used in different temperature ranges to reduce running time.

Abstract: The VLSI fabrication has entered the deep sub-micron era and communication between different components has significantly increased. Interconnect delay has become the dominant factor in total circuit delay. As a result, it is necessary to start interconnect planning as early as possible. In this paper, we propose a method to combine interconnect planning with floorplanning. Our approach is based on the Wong-Liu floorplanning algorithm. When the positions, orientations, and shapes of the cells are decided, the pin positions and routing of the interconnects are decided as well. We use a multi-stage simulated annealing approach in which different interconnect planning methods are used in different ranges of temperatures to reduce running time. A temperature adjustment scheme is designed to give smooth transitions between different stages of simulated annealing. Experimental results show that our approach performs well.

A Media-Enhanced Vector Architecture for Embedded Memory Systems

Christoforos Kozyrakis

27 Aug 1999

TL;DR: It is demonstrated that scaling the architecture leads to near linear application speedup, and the effect of scaling the capacity and parallelism of the on-chip memory system to die area and sustained performance is evaluated.

Abstract: Next generation portable devices will require processors with both low energy consumption and high performance for media functions. At the same time, modern CMOS technology creates the need for highly scalable VLSI architectures. Conventional processor architectures fail to meet these requirements. This paper presents the architecture of Vector IRAM (VIRAM), a processor that combines vector processing with embedded DRAM technology. Vector processing achieves high multimedia performance with simple hardware, while embedded DRAM provides high memory bandwidth at low energy consumption. VIRAM provides flexible support for media data types, short vectors, and DSP features. The vector pipeline is enhanced to hide DRAM latency without using caches. The peak performance is 3.2 GFLOPS (single precision) and maximum memory bandwidth is 25.6 GBytes/s. With a target power consumption of 2 Watts for the vector pipeline and the memory system, VIRAM supports 1.6 GFLOPS/Watt. For a set of representative media kernels, VIRAM sustains on average 88% of its peak performance, outperforming conventional SIMD media extensions and DSP processors by factors of 4.5 to 17. Using a clustered implementation approach, the modular design can be scaled without complicating control logic. We demonstrate that scaling the architecture leads to near linear application speedup. We also evaluate the effect of scaling the capacity and parallelism of the on-chip memory system to die area and sustained performance.
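
The headline figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the numbers stated above; the variable names are illustrative, and the bytes-per-FLOP ratio is a derived quantity that the abstract does not itself report.

```python
# Figures taken directly from the abstract above.
peak_gflops = 3.2          # peak performance, single precision
power_watts = 2.0          # target power: vector pipeline + memory system
mem_bandwidth_gbs = 25.6   # maximum memory bandwidth in GBytes/s

# Energy efficiency: should reproduce the quoted 1.6 GFLOPS/Watt.
gflops_per_watt = peak_gflops / power_watts

# Derived ratio (not stated in the abstract): peak memory bytes per peak FLOP.
bytes_per_flop = mem_bandwidth_gbs / peak_gflops

print(f"{gflops_per_watt:.1f} GFLOPS/Watt")
print(f"{bytes_per_flop:.1f} bytes per FLOP at peak")
```

The first value matches the efficiency quoted in the abstract; the second suggests how much memory bandwidth the embedded DRAM supplies relative to peak compute.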

Journal Article•DOI•

Decoding and Equalization with Analog Non-linear Networks

Joachim Hagenauer¹, E. Offer¹, Cyril Méasson¹, Matthias Mörz¹•Institutions (1)

Ludwig Maximilian University of Munich¹

01 Nov 1999-European Transactions on Telecommunications

TL;DR: Using analog, non-linear and highly parallel networks, this work attempts to perform decoding of block and convolutional codes, equalization of certain frequency-selective channels, decoding of multi-level coded modulation and reconstruction of coded PCM signals.

Abstract: Using analog, non-linear and highly parallel networks, we attempt to perform decoding of block and convolutional codes, equalization of certain frequency-selective channels, decoding of multi-level coded modulation and reconstruction of coded PCM signals. This is in contrast to common practice where these tasks are performed by sequentially operating processors. Our advantage is that we operate fully on soft values for input and output, similar to what is done in 'turbo' decoding. However, we do not have explicit iterations because the networks float freely in continuous time. The decoder has almost no latency in time because we are only restricted by the time constants from the parasitic RC values of integrated circuits. Simulation results for several simple examples are shown which, in some cases, achieve the performance of a conventional MAP detector. For more complicated codes we indicate promising solutions with more complex analog networks based on the simple ones. Furthermore, we discuss the principles of the analog VLSI implementation of these networks.

Journal Article•DOI•

Frequency-based multilayer neural network with on-chip learning and enhanced neuron characteristics

Hiroomi Hikawa¹•Institutions (1)

Oita University¹

01 May 1999-IEEE Transactions on Neural Networks

TL;DR: Simple and modular structure of the proposed MNN leads to a massive parallel and flexible network architecture, which is well suited for very large scale integration (VLSI) implementation.

Abstract: A new digital architecture of the frequency-based multilayer neural network (MNN) with on-chip learning is proposed. As the signal level is expressed by the frequency, the multiplier is replaced by a simple frequency converter, and the neuron unit uses the voting circuit as the nonlinear adder to improve the nonlinear characteristic. In addition, the pulse multiplier is employed to enhance the neuron characteristics. The backpropagation algorithm is modified for the on-chip learning. The proposed MNN architecture is implemented on field programmable gate arrays (FPGA) and the various experiments are conducted to test the performance of the system. The experimental results show that the proposed neuron has a very good nonlinear function owing to the voting circuit. The learning behavior of the MNN with on-chip learning is also tested by experiments, which show that the proposed MNN has good learning and generalization capabilities. Simple and modular structure of the proposed MNN leads to a massive parallel and flexible network architecture, which is well suited for VLSI implementation.
