
Showing papers on "Very-large-scale integration" published in 1999


Proceedings ArticleDOI
Vivek De, Shekhar Borkar
17 Aug 1999
TL;DR: Key barriers to continued scaling of supply voltage and technology for microprocessors to achieve low-power and high-performance are discussed, with particular focus on short-channel effects, device parameter variations, excessive subthreshold and gate oxide leakage.
Abstract: We discuss key barriers to continued scaling of supply voltage and technology for microprocessors to achieve low-power and high-performance. In particular, we focus on short-channel effects, device parameter variations, excessive subthreshold and gate oxide leakage, as the main obstacles dictated by fundamental device physics. Functionality of special circuits in the presence of high leakage, SRAM cell stability, bit line delay scaling, and power consumption in clocks & interconnects, will be the primary design challenges in the future. Soft error rate control and power delivery pose additional challenges. All of these problems are further compounded by the rapidly escalating complexity of microprocessor designs. The excessive leakage problem is particularly severe for battery-operated, high-performance microprocessors.

342 citations


Book
01 Jan 1999
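TL;DR: A book on DSP integrated circuits, covering VLSI circuit technologies, digital filters, finite word length effects, DSP algorithms and system design, DSP architectures and their synthesis, processing elements, and integrated circuit design.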
Abstract: DSP Integrated Circuits. VLSI Circuit Technologies. Digital Signal Processing. Digital Filters. Finite Word Length Effects. DSP Algorithms. DSP System Design. Architectures for DSP. Synthesis of DSP Architectures. Digital Systems. Processing Elements. Integrated Circuit Design. Subject Index.

301 citations


Proceedings ArticleDOI
01 Jul 1999
TL;DR: In this paper, the authors describe switched-capacitor DC-DC power converters (charge pumps) suitable for on-chip, low-power applications, based on connecting two identical but opposite-phase SC converters in parallel, thus eliminating the need for separate bootstrap gate drivers.
Abstract: The paper describes switched-capacitor DC-DC power converters (charge pumps) suitable for on-chip, low-power applications. The proposed configurations are based on connecting two identical but opposite-phase SC converters in parallel, thus eliminating the need for separate bootstrap gate drivers. The authors focus on emerging very low-power VLSI applications such as battery-powered or self-powered signal processors where high power conversion efficiency is important and where power levels are in the milliwatt range. Conduction and switching losses are considered to allow design optimization in terms of switching frequency and component sizes. Open-loop and closed-loop operation of an experimental, fully integrated, 10 MHz voltage doubler is described. The doubler has 2 V or 3 V input and generates 3.3 V or 5 V output at up to 5 mW load. The converter circuit, fabricated in a standard 1.2 µm CMOS technology, takes 0.7 mm² of the chip area.

183 citations


Journal ArticleDOI
TL;DR: The exploration of the 2-D convolver's design space will provide guidelines for the development of a library of DSP-oriented hardware configurations intended to significantly speed up the performance of general DSP processors.
Abstract: In order to make software applications simpler to write and easier to maintain, a software digital signal-processing library that performs essential signal- and image-processing functions is an important part of every digital signal processor (DSP) developer's toolset. In general, such a library provides high-level interface and mechanisms; therefore, developers only need to know how to use algorithms, not the details of how they work. Complex signal transformations then become function calls, e.g., C-callable functions. Considering the two-dimensional (2-D) convolver function as an example of great significance for DSP's, this paper proposes to replace this software function by an emulation on a field-programmable gate array (FPGA) initially configured by software programming. Therefore, the exploration of the 2-D convolver's design space will provide guidelines for the development of a library of DSP-oriented hardware configurations intended to significantly speed up the performance of general DSP processors. Based on the specific convolver, and considering operators supported in the library as hardware accelerators, a series of tradeoffs for efficiently exploiting the bandwidth between the general-purpose DSP and accelerators are proposed. In terms of implementation, this paper explores the performance and architectural tradeoffs involved in the design of an FPGA-based 2-D convolution coprocessor for the TMS320C40 DSP microprocessor available from Texas Instruments Incorporated. However, the proposed concept is not limited to a particular processor.

168 citations


Journal ArticleDOI
TL;DR: This paper provides easily computable expressions for crosstalk amplitude and pulse width in resistive, capacitively coupled lines and these expressions hold for nets with arbitrary number of pins and of arbitrary topology under any specified input excitation.
Abstract: We address the problem of crosstalk computation and reduction using circuit and layout techniques in this paper. We provide easily computable expressions for crosstalk amplitude and pulse width in resistive, capacitively coupled lines. The expressions hold for nets with arbitrary number of pins and of arbitrary topology under any specified input excitation. Experimental results show that the average error is about 10% and the maximum error is less than 20%. The expressions are used to motivate circuit techniques, such as transistor sizing, and layout techniques, such as wire ordering and wire width optimization to reduce crosstalk.

165 citations


Book
01 Jan 1999
TL;DR: This book provides details about some of the EDA applications where GAs have been used, including partitioning, automatic placement and routing, technology mapping for FPGAs, automatic test generation, and power estimation.
Abstract: This book provides details about some of the EDA applications where GAs have been used. These applications include partitioning, automatic placement and routing, technology mapping for FPGAs, automatic test generation, and power estimation. One chapter is devoted to each of these topics. The objective is to provide examples where GAs have been successfully applied in the past so that the reader will be able to apply similar techniques in solving his/her own problems.

158 citations


Journal ArticleDOI
TL;DR: The proposed area model is based on transforming the given, multi-output Boolean function description into an equivalent single-output function, and is empirical, and results demonstrating its feasibility and utility are presented.
Abstract: High-level power estimation, when given only a high-level design specification such as a functional or register-transfer level (RTL) description, requires high-level estimation of the circuit average activity and total capacitance. Considering that total capacitance is related to circuit area, this paper addresses the problem of computing the "area complexity" of multi-output combinational logic given only their functional description, i.e., Boolean equations, where area complexity refers to the number of gates required for an optimal multilevel implementation of the combinational logic. The proposed area model is based on transforming the multi-output Boolean function description into an equivalent single output function. The area model is empirical and results demonstrating its feasibility and utility are presented. Also, a methodology for converting the gate count estimates, obtained from the area model, into capacitance estimates is presented. High-level power estimates based on the total capacitance estimates and average activity estimates are also presented.

119 citations


Proceedings ArticleDOI
01 Jun 1999
TL;DR: Experimental results demonstrate that the sequence-of-linear-programming method is orders of magnitude faster than the best-known method based on conjugate gradients, with constantly better optimization solutions.
Abstract: This paper presents a new method for determining the widths of the power and ground routes in integrated circuits so that the area required by the routes is minimized subject to the reliability constraints. The basic idea is to transform the resulting constrained nonlinear programming problem into a sequence of linear programs. Theoretically, we show that the sequence of linear programs always converges to the optimum solution of the relaxed convex problem. Experimental results demonstrate that the sequence-of-linear-programming method is orders of magnitude faster than the best-known method based on conjugate gradients, with constantly better optimization solutions.

116 citations


Proceedings ArticleDOI
01 Dec 1999
TL;DR: Novel power-down techniques are proposed, which can achieve very high power-down efficiency without performance or latency degradation at the expense of negligible hardware overhead.
Abstract: Finite precision effects on the performance of TURBO decoders have been analyzed and the optimal word lengths of variables have been determined considering tradeoffs between the performance and the hardware cost. It is shown that the performance degradation from the infinite precision is negligible if 4 bits are used for received bits and 6 bits for the extrinsic information. The state metrics normalization method suitable for TURBO decoders is also discussed. This method requires a small amount of hardware and its speed does not depend on the number of states. Furthermore, we propose novel power-down techniques, which can achieve very high power-down efficiency without performance or latency degradation at the expense of negligible hardware overhead.

114 citations


Journal ArticleDOI
TL;DR: A test vector simulation-based approach for multiple design error diagnosis and correction in digital VLSI circuits that is applicable to circuits with no global binary decision diagram representation.
Abstract: With the increase in the complexity of digital VLSI circuit design, logic design errors can occur during synthesis. In this paper, we present a test vector simulation-based approach for multiple design error diagnosis and correction. Diagnosis is performed through an implicit enumeration of the erroneous lines in an effort to avoid the exponential explosion of the error space as the number of errors increases. Resynthesis during correction is as little as possible so that most of the engineering effort invested in the design is preserved. Since both steps are based on test vector simulation, the proposed approach is applicable to circuits with no global binary decision diagram representation. Experiments on ISCAS'85 benchmark circuits exhibit the robustness and error resolution of the proposed methodology. Experiments also indicate that test vector simulation is indeed an attractive technique for multiple design error diagnosis and correction in digital VLSI circuits.

112 citations


Proceedings Article
01 Jan 1999
TL;DR: In this paper, the authors give an overview of recent developments in multiple-valued logic circuit design, revealing both the opportunities they offer and the challenges they face, and present several potential opportunities for the improvement of present VLSI circuit designs.
Abstract: In recent years, there have been major advances in integrated circuit technology which have both made feasible and generated great interest in electronic circuits which employ more than two discrete levels of signal. Such circuits, called multiple-valued logic circuits, offer several potential opportunities for the improvement of present VLSI circuit designs. In this paper, we give an overview of recent developments in multiple-valued logic circuit design, revealing both the opportunities they offer and the challenges they face.

Journal ArticleDOI
TL;DR: The present paper proposes the architecture, provides a circuit implementation using MOS transistors operated in weak inversion, and shows behavioral simulation results at the system level operation and some electrical simulations.
Abstract: A VLSI architecture is proposed for the realization of real-time two-dimensional (2-D) image filtering in an address-event-representation (AER) vision system. The architecture is capable of implementing any convolutional kernel F(x,y) as long as it is decomposable into x-axis and y-axis components, i.e., F(x,y)=H(x)V(y), for some rotated coordinate system {x,y} and if this product can be approximated safely by a signed minimum operation. The proposed architecture is intended to be used in a complete vision system, known as the boundary contour system and feature contour system (BCS-FCS) vision model, proposed by Grossberg and collaborators. The present paper proposes the architecture, provides a circuit implementation using MOS transistors operated in weak inversion, and shows behavioral simulation results at the system level operation and some electrical simulations.
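
The separability condition F(x,y)=H(x)V(y) is what makes a 2-D filter cheap: the 2-D convolution can be computed as a row pass with H followed by a column pass with V. The following is a minimal software sketch of that identity only, not of the AER hardware; the function names, the example taps, and the test image are illustrative assumptions, and the signed-minimum approximation of the product described in the abstract is not modeled.

```python
def conv1d_valid(signal, taps):
    """1-D convolution, keeping only positions where the taps fully overlap."""
    n, k = len(signal), len(taps)
    return [sum(signal[i + b] * taps[k - 1 - b] for b in range(k))
            for i in range(n - k + 1)]


def conv2d_valid(image, kernel):
    """Direct 2-D convolution over the fully overlapping region."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(image), len(image[0])
    return [[sum(image[i + a][j + b] * kernel[kh - 1 - a][kw - 1 - b]
                 for a in range(kh) for b in range(kw))
             for j in range(cols - kw + 1)]
            for i in range(rows - kh + 1)]


def conv2d_separable(image, h_taps, v_taps):
    """Same result as conv2d_valid when kernel[a][b] == v_taps[a] * h_taps[b]."""
    # Pass 1: filter every row with H(x).
    row_pass = [conv1d_valid(row, h_taps) for row in image]
    # Pass 2: filter every column of the intermediate image with V(y).
    columns = [conv1d_valid(list(col), v_taps) for col in zip(*row_pass)]
    # Transpose back to row-major order.
    return [list(row) for row in zip(*columns)]


# Hypothetical example data: a smoothing H and a differencing V.
h_taps = [1, 2, 1]
v_taps = [1, 0, -1]
kernel = [[v * h for h in h_taps] for v in v_taps]  # F(x, y) = H(x) V(y)
image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(5)]

separable_result = conv2d_separable(image, h_taps, v_taps)
direct_result = conv2d_valid(image, kernel)
```

For a kernel of height kh and width kw, the direct form needs kh × kw multiplications per output pixel, while the two-pass form needs kh + kw.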

Patent
20 Jan 1999
TL;DR: A method and apparatus that package a high speed, high density VLSI module within a limited space and in a single assembly that attaches, aligns, and manages electromagnetic interference and heat dissipation of the module.
Abstract: A method and apparatus for assembling a high speed, high density VLSI module in a computer system that enables attachment, support, electromagnetic interference containment, and thermal management of the VLSI module. The present invention packages a high speed, high density VLSI module within a limited space and in a single assembly that attaches, aligns, and manages electromagnetic interference and heat dissipation of the VLSI module. The present invention aligns a land grid array of a circuit board and an interposer socket assembly, and the interposer socket assembly and a land grid array of the VLSI module; in the single VLSI module assembly. An even, controlled load is placed on the interposer socket interface thereby reducing the risk of damage to the interposer socket from overloaded connections between the land grid array of the VLSI module, the interposer socket assembly, and the land grid array of the circuit board. The present invention is easy-to-use in upgrading and handling of the VLSI module.

Book
01 Jan 1999
TL;DR: An edited volume on low-voltage and low-power integrated circuit design, with contributed chapters ranging from MOSFET modeling and low-voltage amplifiers and filters to DC-DC conversion, low-power digital CMOS design, and data conversion.
Abstract: Foreword. Preface. Acknowledgments. Contributors. Introduction (E. Sanchez-Sinencio). A Current-Based MOSFET Model for Integrated Circuit Design (C. Montoro, et al.). A Review of the Performance of Available Integrated Circuit Components Under the Constraints of Low-Power Operation (D. Bowers). Exploiting Device Physics in Circuit Design for Efficient Computational Functions in Analog VLSI (A. Andreou). Low-Voltage Circuit Techniques Using Floating-Gate Transistors (Chong-Gun Yu and Randall Geiger). Low-Power CMOS Digital Circuits (S. Embabi). Low-Voltage Analog BiCMOS Circuit Building Blocks (J. Ramirez-Angulo). Low-Voltage CMOS Operational Amplifiers (R. Wassenaar, et al.). Low-Voltage/Low-Power Amplifiers with Optimized Dynamic Range and Bandwidth (J. Huijsing, et al.). Low-Voltage Analog CMOS Filter Design (M. Steyaert, et al.). Continuous-Time Low-Voltage Current-Mode Filters (E. Sanchez-Sinencio and S. Smith). High-Efficiency Low-Voltage DC-DC Conversion for Portable Applications (A. Stratakos, et al.). Two New Directions in Low-Power Digital CMOS VLSI Design (V. Kantabutra). Low-Power CMOS Data Conversion (M. Pelgrom). Low-Power Multiplierless YUV-to-RGB Converter Based on Human Vision Perception (T. Meng, et al.). Micropower Systems for Implantable Defibrillators and Pacemakers (M. Jabri and R. Coggins). An Information Theoretic Framework for Comparing the Bit Energy of Signal Representations at the Circuit Level (A. Andreou and P. Furth). A Synchronous Gated-Clock Strategy for Low-Power Design of Telecom ASICs (P. Vanoostende and G. Van Wauwe). Index. About the Editors.

BookDOI
01 Apr 1999
TL;DR: Part 1 System applications: multimedia systems overview video compression audio compression system synchronization approaches digital versatile disk VLSI signal processing for very high speed digital subscriber loops (VDSL) cable modems wireless communication systems.
Abstract: Part 1 System applications: multimedia systems overview video compression audio compression system synchronization approaches digital versatile disk VLSI signal processing for very high speed digital subscriber loops (VDSL) cable modems wireless communication systems. Part 2 Programmable and custom architectures and algorithms: programmable DSPs RISC, video and media DSPs wireless DSPs motion estimation system design wavelet VLSI architectures DCT architectures lossless coders Viterbi decoders - algorithms and high performance architectures watermarking for multimedia systolic RLS adaptive filtering STAR-RLS filtering. Part 3 Advanced arithmetic architectures and design methodologies: division and square root finite field arithmetic cordic algorithms and architectures for fast and efficient vector-rotation implementation advanced systolic design low power design power estimation approaches system exploration for custom low power data storage and transfer hardware description and synthesis of DSP systems.

Proceedings ArticleDOI
08 Aug 1999
TL;DR: A new technique and CMOS VLSI implementation for computing approximate logarithms (base 2 and 10) for binary integers is presented, and the approximation is performed using only combinational logic and requires no multiplications.
Abstract: A new technique and CMOS VLSI implementation for computing approximate logarithms (base 2 and 10) for binary integers is presented. The approximation is performed using only combinational logic and requires no multiplications. Additionally, as implemented, a ROM of only N×log₂(N) bits is used to convert N bit integers. The maximum error of the approximation is 1.5% when the input value is 3, and decays exponentially to less than 0.5% for input values greater than 25.
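
The abstract does not spell out the approximation itself, so the sketch below substitutes a different, well-known multiplier-free scheme (Mitchell's piecewise-linear approximation) purely to illustrate how a base-2 logarithm of a binary integer can be formed from the leading-one position plus the remaining bits. It is not the paper's technique and has no ROM correction, so its error is larger than the figures quoted above. The function name is an assumption for illustration.

```python
import math


def mitchell_log2(n: int) -> float:
    """Approximate log2(n) as k + m, where n = 2**k * (1 + m) and 0 <= m < 1.

    Mitchell's approximation (a stand-in, not the paper's method) replaces
    log2(1 + m) by m, so only a leading-one detector and a shift are needed.
    """
    if n <= 0:
        raise ValueError("input must be a positive integer")
    k = n.bit_length() - 1          # position of the leading one
    fraction_bits = n - (1 << k)    # bits below the leading one
    return k + fraction_bits / (1 << k)


# Worst-case absolute error over a range of inputs, for comparison with math.log2.
worst_error = max(abs(mitchell_log2(n) - math.log2(n)) for n in range(1, 4097))
```

In hardware the division by 2**k is only a reinterpretation of where the binary point sits, which is why no multiplier or divider is required. A base-10 result can be obtained by scaling the base-2 result by the constant log10(2).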

Proceedings ArticleDOI
01 Dec 1999
TL;DR: A new image encryption algorithm and its VLSI architecture are proposed, in which the gray level of each pixel in the image is transformed using a defined bit recirculation function and a binary sequence generated from a chaotic system.
Abstract: In this paper, a new image encryption algorithm and its VLSI architecture are proposed. Based on a defined bit recirculation function and a binary sequence generated from a chaotic system, the gray level of each pixel in the image is transformed. The features of the algorithm are as follows: 1) low computational complexity, 2) high security, and 3) no distortion. In order to implement the system, its VLSI architecture with low hardware complexity, high computing speed, and high feasibility for VLSI implementation is also designed. Finally, two encrypted images are simulated and the fractal dimensions of the original and encrypted images are computed to demonstrate the effectiveness of the proposed algorithm.
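
The abstract names the ingredients (a chaotic binary sequence and a bit recirculation of each pixel's gray level) but not their definitions. The sketch below is therefore a toy stand-in under loud assumptions: the logistic map as the chaotic system, thresholding at 0.5 to obtain bits, a cyclic left rotation of the 8-bit gray level as the "recirculation", and the key values shown are all hypothetical choices, not the paper's. It only illustrates why such a transform introduces no distortion (it is exactly invertible) and makes no security claim.

```python
def logistic_bits(x0, mu, count):
    """Return `count` bits obtained by thresholding a logistic-map orbit at 0.5."""
    x = x0
    bits = []
    for _ in range(count):
        x = mu * x * (1.0 - x)
        bits.append(1 if x > 0.5 else 0)
    return bits


def rotate_left(value, amount, width=8):
    """Cyclically rotate a `width`-bit value left by `amount` positions."""
    amount %= width
    mask = (1 << width) - 1
    return ((value << amount) | (value >> (width - amount))) & mask


def transform(pixels, key, decrypt=False):
    """Rotate each 8-bit pixel by a 3-bit amount drawn from the chaotic bits."""
    x0, mu = key
    bits = logistic_bits(x0, mu, 3 * len(pixels))
    out = []
    for i, pixel in enumerate(pixels):
        b2, b1, b0 = bits[3 * i: 3 * i + 3]
        amount = 4 * b2 + 2 * b1 + b0
        # Rotating left by (8 - amount) undoes a left rotation by amount.
        out.append(rotate_left(pixel, (8 - amount) if decrypt else amount))
    return out


key = (0.3141, 3.99)  # hypothetical (initial value, map parameter)
plain = [0, 17, 128, 200, 255, 66, 91, 3]
cipher = transform(plain, key)
recovered = transform(cipher, key, decrypt=True)
```

Because both directions regenerate the same bit sequence from the key, decryption recovers every gray level exactly.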

Proceedings ArticleDOI
01 Jun 1999
TL;DR: A new VLSI layout methodology which addresses the main problems faced in deep sub-micron (DSM) integrated circuit design, and shows how the uniform parasitics of the fabric give rise to a reliable and predictable design.
Abstract: We propose a new VLSI layout methodology which addresses the main problems faced in deep sub-micron (DSM) integrated circuit design. Our layout "fabric" scheme eliminates the conventional notion of power and ground routing on the integrated circuit die. Instead, power and ground are essentially "pre-routed" all over the die. By a clever arrangement of power/ground and signal pins, we almost completely eliminate the capacitive effects between signal wires. Additionally, we get a power and ground distribution network with a very low resistance at any point on the die. Another advantage of our scheme is that the arrangement of conductors ensures that on-chip inductances are uniformly negligible. Finally, characterization of the circuit delays, capacitances and resistances becomes extremely simple in our scheme, and needs to be done only once for a design. We show how the uniform parasitics of our fabric give rise to a reliable and predictable design. We have implemented our scheme using public domain layout software. Preliminary results show that it holds much promise as the layout methodology of choice in DSM integrated circuit design.

Proceedings ArticleDOI
01 Dec 1999
TL;DR: In this paper, a 40nm-gate-length ultra-thin body (UTB) nMOSFET is proposed to eliminate the punchthrough path between source and drain.
Abstract: A 40nm-gate-length ultra-thin body (UTB) nMOSFET is demonstrated. A self-aligned thin body SOI device has previously been proposed for suppressing the short channel effect. UTB structure can eliminate the punchthrough path between source and drain and provide a more evolutionary alternative to the double-gate MOSFET for deep-sub-tenth micron technology. The advantage of using UTB is illustrated through device simulation (with the aid of Silvaco ATLAS) using simple doping profiles for the body and S/D (simple Gaussian).

Journal ArticleDOI
TL;DR: This paper describes a novel communication scheme, which is guaranteed to be free of synchronization failures, amongst multiple synchronous and asynchronous modules operating independently, through an asynchronous first-in first-out (FIFO) channel.
Abstract: This paper describes a novel communication scheme, which is guaranteed to be free of synchronization failures, amongst multiple synchronous and asynchronous modules operating independently. In this scheme, communication between every pair of modules is done through an asynchronous first-in first-out (FIFO) channel; communication between a module and the FIFO is done using a request/acknowledge handshaking. Synchronization of handshake signals to the local module clock is done in an unconventional way: the local clock built out of a ring oscillator is paused or stretched, if necessary, to ensure that the handshake signal satisfies setup and hold time constraints with respect to the local clock. In order to validate this scheme, we implemented a test chip in 0.5-µm CMOS. This chip is designed as a ring, composed of two synchronous modules, an asynchronous module, and two asynchronous FIFOs. Each module functions as a receiver to one module and a sender to another module. Test results show that the chip functions reliably up to 456 MHz.

Journal ArticleDOI
TL;DR: In this paper, a one-dimensional visual sensor, implemented on a single VLSI chip using analog neuromorphic circuits, is proposed for selectively detecting and tracking the position of the feature with the highest spatial contrast present in the visual scene.
Abstract: This paper presents a one-dimensional visual sensor, implemented on a single VLSI chip using analog neuromorphic circuits, for selectively detecting and tracking the position of the feature with the highest spatial contrast present in the visual scene. The chip's photoreceptors adapt to stationary backgrounds and can be tuned to respond maximally to specific target velocities. The sensor drastically reduces the amount of data to be transmitted to further processing stages by encoding, in real time, the position of the target in the form of a single continuous-time analog variable. We describe the circuits implementing the sensor and show applications to three examples of tracking tasks: a stand-alone visual tracking system, an active fully analog tracking system, and a mobile platform line-following system.

Proceedings ArticleDOI
21 Mar 1999
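TL;DR: Examines the impact of VLSI technology on the evolution of computer architecture, argues that building multicomputers is the way to keep converting increasing VLSI density into performance, and sketches a VLSI multicomputer constructed from c. 2009 processor-DRAM chips.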
Abstract: This paper examines the impact of VLSI technology on the evolution of computer architecture and projects the future of this evolution. We see that over the past 20 years, the increased density of VLSI chips was applied to close the gap between microprocessors and high-end CPUs. Today this gap is fully closed and adding devices to uniprocessors is well beyond the point of diminishing returns. To continue to convert the increasing density of VLSI to computer performance we see little alternative to building multicomputers. We sketch the architecture of a VLSI multicomputer constructed from c. 2009 processor-DRAM chips and outline some of the challenges involved in building such a system. We suggest that the software transition from sequential processors to such fine-grain multicomputers can be eased by using the multicomputer as the memory system of a conventional computer.

Journal ArticleDOI
TL;DR: This paper presents the problem of storage bandwidth optimization (SBO) in VLSI system realizations and shows that it is important to take into account which data is being accessed in parallel, instead of only considering the number of simultaneous memory accesses.
Abstract: In this paper, we present the problem of storage bandwidth optimization (SBO) in VLSI system realizations. Our goal is to minimize the required memory bandwidth within the given cycle budget by adding ordering constraints to the flow graph. This allows the subsequent memory allocation and assignment tasks to come up with a cheaper memory architecture with fewer memories and memory ports. The importance and the effect of SBO are shown on realistic examples both in the video and asynchronous transfer-mode (ATM) domains. We show that it is important to take into account which data is being accessed in parallel, instead of only considering the number of simultaneous memory accesses. Our problem formulation leads to the optimization of a conflict (hyper) graph. For the target domain of ATM, only flat graphs without loops have to be treated. For this subproblem, a prototype tool has been implemented to demonstrate the feasibility of automating this important system design step.

Book
01 Jan 1999
TL;DR: This book presents a detailed analysis of low-voltage very large scale integration (VLSI) circuit techniques, geared to engineers and designers working to reduce power consumption in electronic products.
Abstract: Low-voltage very large scale integration (VLSI) circuits represent the electronics of the future. All electronic products are striving to reduce power consumption to create more economical, efficient, and compact devices. Despite the inevitable trend towards low-voltage, few books address the technology needed. Geared to the needs of engineers and designers in the field, this comprehensive volume presents a remarkably detailed analysis of one of today's hottest and most compelling research techniques for VLSI systems.

Journal ArticleDOI
TL;DR: In this article, the synchronous computation of the partial sums of the two operands is proposed for the parallel multiplication of two n-bit numbers, which permits an efficient realization of parallel multiplication using iterative arrays.
Abstract: A new algorithm for the multiplication of two n-bit numbers based on the synchronous computation of the partial sums of the two operands is presented. The proposed algorithm permits an efficient realization of the parallel multiplication using iterative arrays. At the same time, it permits high-speed operation. Multiplier arrays for positive numbers and numbers in two's complement form based on the proposed technique are implemented. Also, an efficient pipeline form of the proposed multiplication scheme is introduced. All multipliers obtained have low circuit complexity permitting high-speed operation and the interconnections of the cells are regular, well-suited for VLSI realization.

Journal ArticleDOI
01 Apr 1999
TL;DR: An introduction to the classic 1974 paper on MOSFET scaling by R. Dennard et al., tracing the history of scaling and its application to very large scale integration (VLSI) MOSFET technology from 1970 to 1998.
Abstract: This is an introduction to the Classic Paper on MOSFET scaling by R. Dennard et al., 'Design of Ion-Implanted MOSFET's with Very Small Physical Dimensions,' published in the IEEE Journal of Solid-State Circuits in October 1974. The history of scaling and its application to very large scale integration (VLSI) MOSFET technology is traced from 1970 to 1998. The role of scaling in the profound improvements in power delay product over the last three decades is analyzed in basic terms.
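
To make the power-delay claim concrete, the sketch below works through the constant-field scaling rules usually associated with the 1974 paper: dimensions, voltage, and current all shrink by a factor κ, and the circuit-level consequences follow by arithmetic. The abstract itself does not list these rules, so treat them as the standard textbook statement rather than a quotation; the function and key names are illustrative assumptions.

```python
def constant_field_scaling(kappa: float) -> dict:
    """Relative circuit parameters after scaling by kappa (> 1 means smaller devices)."""
    # Primary scaling assumptions: everything that sets the field shrinks together.
    dimension = 1.0 / kappa      # gate length, width, oxide thickness
    voltage = 1.0 / kappa
    current = 1.0 / kappa
    capacitance = 1.0 / kappa    # area / thickness scales as (1/k^2) / (1/k)

    # Derived quantities.
    delay = capacitance * voltage / current   # C * V / I
    power = voltage * current                 # per circuit
    area = dimension * dimension
    return {
        "delay": delay,
        "power_per_circuit": power,
        "power_density": power / area,
        "power_delay_product": power * delay,
    }


scaled = constant_field_scaling(2.0)
```

With κ = 2, delay halves, power per circuit drops to one quarter, power density stays constant, and the power-delay product falls to one eighth, that is, by κ³.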

Proceedings ArticleDOI
07 Nov 1999
TL;DR: This work proposes a method to combine interconnect planning with floorplanning based on the Wong-Liu (1986) floorplanning algorithm, which uses a multi-stage simulated annealing approach in which different interconnect planning methods are used in different ranges of temperature to reduce running time.
Abstract: The VLSI fabrication has entered the deep sub-micron era and communication between different components has significantly increased. Interconnect delay has become the dominant factor in total circuit delay. As a result, it is necessary to start interconnect planning as early as possible. In this paper, we propose a method to combine interconnect planning with floorplanning. Our approach is based on the Wong-Liu floorplanning algorithm. When the positions, orientations, and shapes of the cells are decided, the pin positions and routing of the interconnects are decided as well. We use a multi-stage simulated annealing approach in which different interconnect planning methods are used in different ranges of temperatures to reduce running time. A temperature adjustment scheme is designed to give smooth transitions between different stages of simulated annealing. Experimental results show that our approach performs well.

27 Aug 1999
TL;DR: It is demonstrated that scaling the architecture leads to near linear application speedup, and the effect of scaling the capacity and parallelism of the on-chip memory system to die area and sustained performance is evaluated.
Abstract: Next generation portable devices will require processors with both low energy consumption and high performance for media functions. At the same time, modern CMOS technology creates the need for highly scalable VLSI architectures. Conventional processor architectures fail to meet these requirements. This paper presents the architecture of Vector IRAM (VIRAM), a processor that combines vector processing with embedded DRAM technology. Vector processing achieves high multimedia performance with simple hardware, while embedded DRAM provides high memory bandwidth at low energy consumption. VIRAM provides flexible support for media data types, short vectors, and DSP features. The vector pipeline is enhanced to hide DRAM latency without using caches. The peak performance is 3.2 GFLOPS (single precision) and maximum memory bandwidth is 25.6 GBytes/s. With a target power consumption of 2 Watts for the vector pipeline and the memory system, VIRAM supports 1.6 GFLOPS/Watt. For a set of representative media kernels, VIRAM sustains on average 88% of its peak performance, outperforming conventional SIMD media extensions and DSP processors by factors of 4.5 to 17. Using a clustered implementation approach, the modular design can be scaled without complicating control logic. We demonstrate that scaling the architecture leads to near linear application speedup. We also evaluate the effect of scaling the capacity and parallelism of the on-chip memory system to die area and sustained performance.

Journal ArticleDOI
TL;DR: Using analog, non-linear and highly parallel networks, this work attempts to perform decoding of block and convolutional codes, equalization of certain frequency-selective channels, decoding of multi-level coded modulation and reconstruction of coded PCM signals.
Abstract: Using analog, non-linear and highly parallel networks, we attempt to perform decoding of block and convolutional codes, equalization of certain frequency-selective channels, decoding of multi-level coded modulation and reconstruction of coded PCM signals. This is in contrast to common practice where these tasks are performed by sequentially operating processors. Our advantage is that we operate fully on soft values for input and output, similar to what is done in 'turbo' decoding. However, we do not have explicit iterations because the networks float freely in continuous time. The decoder has almost no latency in time because we are only restricted by the time constants from the parasitic RC values of integrated circuits. Simulation results for several simple examples are shown which, in some cases, achieve the performance of a conventional MAP detector. For more complicated codes we indicate promising solutions with more complex analog networks based on the simple ones. Furthermore, we discuss the principles of the analog VLSI implementation of these networks.

Journal ArticleDOI
Hiroomi Hikawa
TL;DR: The simple and modular structure of the proposed MNN leads to a massively parallel and flexible network architecture, which is well suited for very large scale integration (VLSI) implementation.
Abstract: A new digital architecture of the frequency-based multilayer neural network (MNN) with on-chip learning is proposed. As the signal level is expressed by the frequency, the multiplier is replaced by a simple frequency converter, and the neuron unit uses the voting circuit as the nonlinear adder to improve the nonlinear characteristic. In addition, the pulse multiplier is employed to enhance the neuron characteristics. The backpropagation algorithm is modified for the on-chip learning. The proposed MNN architecture is implemented on field programmable gate arrays (FPGA) and various experiments are conducted to test the performance of the system. The experimental results show that the proposed neuron has a very good nonlinear function owing to the voting circuit. The learning behavior of the MNN with on-chip learning is also tested by experiments, which show that the proposed MNN has good learning and generalization capabilities. The simple and modular structure of the proposed MNN leads to a massively parallel and flexible network architecture, which is well suited for VLSI implementation.