Showing papers on "Very-large-scale integration published in 1992"

PDF

Open Access

Journal Article•DOI•

[...]

Anantha P. Chandrakasan¹, Samuel Sheng¹, Robert W. Brodersen¹•Institutions (1)

01 Apr 1992-IEEE Journal of Solid-state Circuits

TL;DR: In this paper, techniques for low power operation are presented which use the lowest possible supply voltage coupled with architectural, logic style, circuit, and technology optimizations to reduce power consumption in CMOS digital circuits while maintaining computational throughput.

...read moreread less

Abstract: Motivated by emerging battery-operated applications that demand intensive computation in portable environments, techniques are investigated which reduce power consumption in CMOS digital circuits while maintaining computational throughput. Techniques for low-power operation are shown which use the lowest possible supply voltage coupled with architectural, logic style, circuit, and technology optimizations. An architecturally based scaling strategy is presented which indicates that the optimum voltage is much lower than that determined by other scaling considerations. This optimum is achieved by trading increased silicon area for reduced power consumption. >

...read moreread less

2,690 citations

Proceedings Article•DOI•

Estimation of average switching activity in combinational and sequential circuits

[...]

Abhijit Ghosh¹, Srinivas Devadas², Kurt Keutzer³, Jacob K. White²•Institutions (3)

Mitsubishi¹, Massachusetts Institute of Technology², Synopsys³

01 Jul 1992

TL;DR: The authors address the problem of estimating the average power dissipated in VLSI combinational and sequential circuits, under random input sequences, by presenting methods to probabilistically estimate switching activity in sequential circuits.

...read moreread less

Abstract: The authors address the problem of estimating the average power dissipated in VLSI combinational and sequential circuits, under random input sequences. Switching activity is strongly affected by gate delays and for this reason a general delay model is used in estimating switching activity. The method takes into account correlation caused at internal gates in the circuit due to reconvergence of input signals. In sequential circuits, the input sequence applied to the combinational portion of the circuit is highly correlated because some of the inputs to the combinational logic are flip-flop outputs representing the state of the circuit. Methods are presented to probabilistically estimate switching activity in sequential circuits. These methods automatically compute the switching rates and correlations between flip-flop outputs. >

...read moreread less

506 citations

Journal Article•DOI•

A 200-MHz 64-b dual-issue CMOS microprocessor

[...]

Daniel W. Dobberpuhl, R. Witek, R. Allmon, R. Anglin, D. Bertucci, S.M. Britton, L. Chao, R.A. Conrad, D.E. Dever, B. Gieseke, Soha Hassoun, G. Hoeppner, K. Kuchler, M. Ladd, B.M. Leary, L. Madden, Edward J. McLellan, D.R. Meyer, J. Montanaro, Donald A. Priore, V. Rajagopalan, S. Samudrala, S. Santhanam - Show less +19 more

19 Feb 1992

TL;DR: A RISC (reduced-instruction-set computer)-style microprocessor operating up to 200 MHz, implements a 64-b architecture that provides huge linear address space without bottlenecks that would impede highly concurrent implementations.

...read moreread less

Abstract: A RISC (reduced-instruction-set computer)-style microprocessor operating up to 200 MHz, implements a 64-b architecture that provides huge linear address space without bottlenecks that would impede highly concurrent implementations. Fully pipelined and capable of issuing two instructions per clock cycle, this implementation can execute up to 400 M operations per second. The chip includes an 8-kB I-cache, an 8-kB D-cache, and two associated translation buffers, a four-entry 32-B/entry write buffer, a pipelined 64-b integer execution unit with 32-entry register file, and a pipelined floating-point unit with an additional 32 registers. The pin interface includes integral support for an external secondary cache. The package is a 431-pin PGA with 140 pins dedicated to VDD/VSS. The chip is fabricated in 0.75- mu m n-well CMOS with three layers of metallization. The die measures 16.8*13.9 mm/sup 2/ and contains 1.68 M transistors. Power dissipation is 30 W from a 3.3-V supply at 200 MHz. >

...read moreread less

401 citations

Dissertation•

VLSI analogs of neuronal visual processing: a synthesis of form and function

[...]

Misha Mahowald

01 Jan 1992

TL;DR: The overall synthetic visual system demonstrates that analog VLSI can capture a significant portion of the function of neural structures at a systems level, and that incorporating neural architectures leads to new engineering approaches to computation in V LSI.

...read moreread less

Abstract: This thesis describes the development and testing of a simple visual system fabricated using complementary metal-oxide-semiconductor (CMOS) very large scale integration (VLSI) technology. This visual system is composed of three subsystems. A silicon retina, fabricated on a single chip, transduces light and performs signal processing in a manner similar to a simple vertebrate retina. A stereocorrespondence chip uses bilateral retinal input to estimate the location of objects in depth. A silicon optic nerve allows communication between chips by a method that preserves the idiom of action potential transmission in the nervous system. Each of these subsystems illuminates various aspects of the relationship between VLSI analogs and their neurobiological counterparts. The overall synthetic visual system demonstrates that analog VLSI can capture a significant portion of the function of neural structures at a systems level, and concomitantly, that incorporating neural architectures leads to new engineering approaches to computation in VLSI. The relationship between neural systems and VLSI is rooted in the shared limitations imposed by computing in similar physical media. The systems discussed in this text support the belief that the physical limitations imposed by the computational medium significantly affect the evolving algorithm. Since circuits are essentially physical structures, I advocate the use of analog VLSI as powerful medium of abstraction, suitable for understanding and expressing the function of real neural systems. The working chip elevates the circuit description to a kind of synthetic formalism. The behaving physical circuit provides a formal test of theories of function that can be expressed in the language of circuits.

...read moreread less

314 citations

Dissertation•

Wiring considerations in analog VLSI systems, with application to field-programmable networks

[...]

Massimo A. Sivilotti¹•Institutions (1)

California Institute of Technology¹

01 Jan 1992

TL;DR: A polynomial-time programming algorithm for embedding the desired circuit graph onto the prefabricated routing resources is presented, and is implemented as part of a general design tool for specifying, manipulating and comparing circuit netlists.

...read moreread less

Abstract: This thesis develops a theoretical model for the wiring complexity of wide classes of systems, relating the degree of connectivity of a circuit to the dimensionality of its interconnect technology. This model is used to design an efficient, hierarchical interconnection network capable of accommodating large classes of circuits. Predesigned circuit elements can be incorporated into this hierarchy, permitting semi-customization for particular classes of systems (e.g., photoreceptors included on vision chips). A polynomial-time programming algorithm for embedding the desired circuit graph onto the prefabricated routing resources is presented, and is implemented as part of a general design tool for specifying, manipulating and comparing circuit netlists. This thesis presents a system intended to facilitate analog circuit design. At its core is a VLSI chip that is electrically configured in the field by selectively connecting predesigned elements to form a desired circuit, which is then tested electrically. The system may be considered a hardware accelerator for simulation, and its large capacity permits testing system ideas, which is impractical using current means. A fast-turnaround simulator permitting rapid conception and evaluation of circuit ideas is an invaluable aid to developing an understanding of system design in a VLSI context. We have constructed systems using both reconfigurable interconnection switches and laser-programmed interconnect. Prototypes capable of synthesizing circuits consisting of over 1000 transistors have been constructed. The flexibility of the system has been demonstrated, and data from parametric tests have proven the validity of the approach. Finally, this thesis presents several new circuits that have become key components in many analog VLSI systems. Fast, dense and provably safe one-phase latches and hierarchical arbiters are presented, as are a low-noise analog switch, an isotropic novelty filter, a dense, active high-resistance element, and a subthreshold differential amplifier with a large linear input range.

...read moreread less

303 citations

Journal Article•DOI•

Circuit models for three-dimensional geometries including dielectrics

[...]

Albert E. Ruehli¹, H. Heeb¹•Institutions (1)

IBM¹

01 Jul 1992-IEEE Transactions on Microwave Theory and Techniques

TL;DR: In this paper, the authors extended the partial element equivalent circuit (PEEC) approach to include arbitrary homogeneous dielectric regions, and applied the new circuit models in the frequency as well as the time domain.

...read moreread less

Abstract: The partial element equivalent circuit (PEEC) approach has proved useful for modeling many different electromagnetic problems. The technique can be viewed as an approach for the electrical circuit modeling for arbitrary 3-D geometries. Recently, the authors extended the method to include retardation with the rPEEC models. So far the dielectrics have been taken into account only in an approximate way. In this work, they generalize the technique to include arbitrary homogeneous dielectric regions. The new circuit models are applied in the frequency as well as the time domain. The time solution allows the modeling of VLSI systems which involve interconnects as well as nonlinear transistor circuits. >

...read moreread less

289 citations

Proceedings Article•DOI•

Estimation of average switching activity in combinational and sequential circuits

[...]

Ghosh¹, Devadas, Keutzer, White•Institutions (1)

Mitsubishi¹

01 Jun 1992

270 citations

Journal Article•DOI•

Weight perturbation: an optimal architecture and learning technique for analog VLSI feedforward and recurrent multilayer networks

[...]

Marwan A. Jabri¹, Barry Flower¹•Institutions (1)

University of Sydney¹

01 Jan 1992-IEEE Transactions on Neural Networks

TL;DR: It is shown that using gradient descent with direct approximation of the gradient instead of back-propagation is more economical for parallel analog implementations and is suitable for multilayer recurrent networks as well.

...read moreread less

Abstract: Previous work on analog VLSI implementation of multilayer perceptrons with on-chip learning has mainly targeted the implementation of algorithms such as back-propagation. Although back-propagation is efficient, its implementation in analog VLSI requires excessive computational hardware. It is shown that using gradient descent with direct approximation of the gradient instead of back-propagation is more economical for parallel analog implementations. It is shown that this technique (which is called 'weight perturbation') is suitable for multilayer recurrent networks as well. A discrete level analog implementation showing the training of an XOR network as an example is presented. >

...read moreread less

264 citations

Journal Article•DOI•

VLSI architecture for block-matching motion estimation algorithm

[...]

Chaur-Heh Hsieh, T.-P. Lin

01 Jun 1992-IEEE Transactions on Circuits and Systems for Video Technology

TL;DR: A VLSI architecture for implementing a full-search block-matching algorithm is presented, based on a systolic array processor and shift register arrays with programmable length, which has the following advantages: it allows serial data input to save the pin counts but performs parallel processing.

...read moreread less

Abstract: The block-matching motion estimation is the most popular method for motion-compensated coding of image sequence. A VLSI architecture for implementing a full-search block-matching algorithm is presented. Based on a systolic array processor and shift register arrays with programmable length, the proposed architecture has the following advantages: it allows serial data input to save the pin counts but performs parallel processing; it is flexible in adaptation to the dimensional change of the search area via simple control; it can operate in real time for videoconference applications; and it is simple and modular in design, and thus is suitable for VLSI implementation. >

...read moreread less

232 citations

Journal Article•DOI•

The message-driven processor: a multicomputer processing node with efficient mechanisms

[...]

William J. Dally¹, J.A.S. Fiske¹, John S. Keen¹, Richard Lethin¹, Michael D. Noakes¹, P.R. Nuth¹, R. E. Davison, G. A. Fyler² - Show less +4 more•Institutions (2)

Massachusetts Institute of Technology¹, Intel²

01 Mar 1992-IEEE Micro

TL;DR: The message-driven processor (MDP), a 36-b, 1.1-million transistor, VLSI microcomputer, specialized to operate efficiently in a multicomputer, is described and incorporates primitive mechanisms for communication, synchronization, and naming which support most proposed parallel programming models.

...read moreread less

Abstract: The message-driven processor (MDP), a 36-b, 1.1-million transistor, VLSI microcomputer, specialized to operate efficiently in a multicomputer, is described. The MDP chip includes a processor, a 4096-word by 36-b memory, and a network port. An on-chip memory controller with error checking and correction (ECC) permits local memory to be expanded to one million words by adding external DRAM chips. The MDP incorporates primitive mechanisms for communication, synchronization, and naming which support most proposed parallel programming models. The MDP system architecture, instruction set architecture, network architecture, implementation, and software are discussed. >

...read moreread less

221 citations

Journal Article•DOI•

A new class of iterative Steiner tree heuristics with good performance

[...]

Andrew B. Kahng¹, Gabriel Robins¹•Institutions (1)

University of California, Los Angeles¹

01 Jul 1992-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: The method yields results that reduce wire length by up to 2% to 3% over the previous methods, and is the first heuristic which has been shown to have a performance ratio less than 3/2.

...read moreread less

Abstract: A fast approach to the minimum rectilinear Steiner tree (MRST) problem is presented. The method yields results that reduce wire length by up to 2% to 3% over the previous methods, and is the first heuristic which has been shown to have a performance ratio less than 3/2; in fact, the performance ratio is less than or equal to 4/3 on the entire class of instances where the ratio c(MST)/c(MRST) is exactly equal to 3/2. The algorithm has practical asymptotic complexity owing to an elegant implementation which uses methods from computation geometry and which parallelizes readily. A randomized variation of the algorithm, along with a batched variant, has also proved successful. >

...read moreread less

Journal Article•DOI•

High-speed parallel CRC circuits in VLSI

[...]

T.-B. Pei¹, C. Zukowski¹•Institutions (1)

Columbia University¹

01 Apr 1992-IEEE Transactions on Communications

TL;DR: It is shown that parallel architectures fall somewhat short of ideal speedups in practice, but they should still enable current CMOS technologies to go well beyond 1 Gb/s data rates.

...read moreread less

Abstract: The use of VLSI technology to speed up cyclic redundancy checking (CRC) circuits used for error detection in telecommunications systems is investigated By generalizing the analysis of a parallel prototype, performance is estimated over a wide range of external constraints and design choices It is shown that parallel architectures fall somewhat short of ideal speedups in practice, but they should still enable current CMOS technologies to go well beyond 1 Gb/s data rates >

...read moreread less

Journal Article•DOI•

Improved implementation of the silicon cochlea

[...]

L. Watts¹, D.A. Kerns¹, Richard F. Lyon¹, Carver A. Mead¹•Institutions (1)

California Institute of Technology¹

01 May 1992-IEEE Journal of Solid-state Circuits

TL;DR: The original 'Analog electronic cochlea' design is discussed in light of issues, and circuit and layout techniques are described which significantly improve its performance, robustness, and efficiency.

...read moreread less

Abstract: The original 'Analog electronic cochlea' of R.F. Lyon and C.A. Mead (Trans. Acoust., Speech, Signal Processing, vol.36, no.7, p.1119-34, 1988) used a cascade of second-order filter sections in subthreshold analog VLSI to implement a low-power, real-time model of early auditory processing. Experience with many silicon-cochlea chips has allowed the identification of a number of important design issues, namely dynamic range, stability, device mismatch, and compactness. In this, paper, the original design is discussed in light of these issues, and circuit and layout techniques are described which significantly improve its performance, robustness, and efficiency. Measurements from test chips verify the improved performance. >

...read moreread less

Journal Article•DOI•

A reconfigurable VLSI neural network

[...]

S. Satyanarayana¹, Yannis Tsividis², Hans Peter Graf³•Institutions (3)

Philips¹, National and Kapodistrian University of Athens², AT&T³

01 Jan 1992-IEEE Journal of Solid-state Circuits

TL;DR: The results of this work indicate that reconfigurable neural networks built using distributed neuron synapses can be used to solve various problems efficiently.

...read moreread less

Abstract: Due to the variety of architectures that need be considered while attempting solutions to various problems using neural networks, the implementation of a neural network with programmable topology and programmable weights has been undertaken. A new circuit block, the distributed neuron-synapse, has been used to implement a 1024 synapse reconfigurable network on a VLSI chip. In order to evaluate the performance of the VLSI chip, a complete test setup consisting of hardware for configuring the chip, programming the synaptic weights, presenting analog input vectors to the chip, and recording the outputs of the chip, has been built. Following the performance verification of each circuit block on the chip, various sample problems were solved. In each of the problems the synaptic weights were determined by training the neural network using a gradient-based learning algorithm which is incorporated in the experimental test setup. The results of this work indicate that reconfigurable neural networks built using distributed neuron synapses can be used to solve various problems efficiently. >

...read moreread less

Journal Article•DOI•

Beware the isochronic fork

[...]

Kees van Berkel¹•Institutions (1)

Philips¹

01 Jun 1992-Integration

TL;DR: It is shown that it is also important to limit the variation in logic threshold voltages of VLSI operators to produce uniform thresholds, and that a particular CMOS implementation of sequential operators is not capable of producing uniform thresholds.

...read moreread less

Journal Article•DOI•

Putting routing tables in silicon

[...]

T.-B. Pei¹, C. Zukowski¹•Institutions (1)

Columbia University¹

01 Jan 1992-IEEE Network

TL;DR: The routing table problem is presented by discussing the available architectures and how they are related and it is shown that simple table lookup is just a special case of the standard trie structure and that the use of partitioning combined with the Trie structure provides a continuum that can lead to a CAM implementation at one extreme.

...read moreread less

Abstract: Moving routing tables from RAM to custom or semicustom VLSI can lower cost and boost performance. The routing table problem is presented by discussing the available architectures and how they are related. It is shown that simple table lookup is just a special case of the standard trie structure and that the use of partitioning combined with the trie structure provides a continuum that can lead to a CAM implementation at one extreme. The high-level tradeoffs in the choice of various parameters for the trie are estimated. A careful choice of word size can balance the requirements for speed with the costs of area. Also considered are the costs and benefits of splitting the table into a number of tries, which are searched simultaneously. VLSI implementations are outlined, and the costs are compared. General CAM structures are not needed for the routing table application, and custom CAMs can be very efficient. Tries, however, can be competitive in many cases, due to the resources available for building conventional memories. >

...read moreread less

Proceedings Article•DOI•

CRIS: a test cultivation program for sequential VLSI circuits

[...]

Daniel G. Saab¹, Youssef G. Saab², Jacob A. Abraham²•Institutions (2)

University of Missouri¹, University of Texas at Austin²

08 Nov 1992

TL;DR: An approach to cultivating a test for combinational and sequential VLSI circuits described hierarchically at the transistor, gate, and higher levels is discussed, based on continuous mutation of a given input sequence and on analyzing the mutated vectors for selecting the test set.

...read moreread less

Abstract: This paper discusses a novel approach to cultivating a test for combinational and sequential VLSI circuits described hierarchically at the transistor, gate, and higher levels. The approach is based on continuous mutation of a given input sequence and on analyzing the mutated vectors for selecting the test set. The approach uses hierarchical simulation technique in the analysis to drastically reduce the memory requirement, thus allowing the test generation for large VLSI circuits. The algorithms are at the switch level so that general MOS digital designs can be handed, and both stuck-at and transistor faults are handle accurately. The approach has been implemented in a hierarchical test generation system, CRIS, that runs under UNM on SPARC workstations. CRIS has been used successfully to generate tests with high fault coverage for large combinational and sequential circuits.

...read moreread less

Journal Article•DOI•

A VLSI neural processor for image data compression using self-organization networks

[...]

Wai-Chi Fang¹, Bing J. Sheu¹, Oscal T.-C. Chen¹, J. Choi¹•Institutions (1)

University of Southern California¹

01 May 1992-IEEE Transactions on Neural Networks

TL;DR: An adaptive electronic neural network processor has been developed for high-speed image compression based on a frequency-sensitive self-organization algorithm that is quite efficient and can achieve near-optimal results.

...read moreread less

Abstract: An adaptive electronic neural network processor has been developed for high-speed image compression based on a frequency-sensitive self-organization algorithm The performance of this self-organization network and that of a conventional algorithm for vector quantization are compared The proposed method is quite efficient and can achieve near-optimal results The neural network processor includes a pipelined codebook generator and a paralleled vector quantizer, which obtains a time complexity O(1) for each quantization vector A mixed-signal design technique with analog circuitry to perform neural computation and digital circuitry to process multiple-bit address information are used A prototype chip for a 25-D adaptive vector quantizer of 64 code words was designed, fabricated, and tested It occupies a silicon area of 46 mm*68 mm in a 20 mu m scalable CMOS technology and provides a computing capability as high as 32 billion connections/s The experimental results for the chip and the winner-take-all circuit test structure are presented >

...read moreread less

Journal Article•DOI•

Integrated pulse stream neural networks: results, issues, and pointers

[...]

Alister Hamilton¹, Alan F. Murray¹, D.J. Baxter¹, S. Churcher¹, H.M. Reekie¹, Lionel Tarassenko² - Show less +2 more•Institutions (2)

University of Edinburgh¹, University of Oxford²

01 May 1992-IEEE Transactions on Neural Networks

TL;DR: Results from working analog VLSI implementations of two different pulse stream neural network forms are reported, and a strategy for interchip communication of large numbers of neural states has been implemented in silicon.

...read moreread less

Abstract: Results from working analog VLSI implementations of two different pulse stream neural network forms are reported. The circuits are rendered relatively invariant to processing variations, and the problem of cascadability of synapses to form large systems is addressed. A strategy for interchip communication of large numbers of neural states has been implemented in silicon and results are presented. The circuits demonstrated confront many of the issues that blight massively parallel analog systems, and offer solutions. >

...read moreread less

Journal Article•DOI•

Grain-size considerations for optoelectronic multistage interconnection networks.

[...]

Ashok V. Krishnamoorthy¹, Philippe J. Marchand¹, Fouad Kiamilev¹, Sadik C. Esener¹•Institutions (1)

University of California, San Diego¹

10 Sep 1992-Applied Optics

TL;DR: This paper investigates, at the system level, the performance-cost trade-off between optical and electronic interconnects in an optoelectronic interconnection network and indicates that system bandwidth can be increased, but at the price of reduced performance/cost.

...read moreread less

Abstract: This paper investigates, at the system level, the performance–cost trade-off between optical and electronic interconnects in an optoelectronic interconnection network. The specific system considered is a packet-switched, free-space optoelectronic shuffle-exchange multistage interconnection network (MIN). System bandwidth is used as the performance measure, while system area, system power, and system volume constitute the cost measures. A detailed design and analysis of a two-dimensional (2-D) optoelectronic shuffle-exchange routing network with variable grain size K is presented. The architecture permits the conventional 2 × 2 switches or grains to be generalized to larger K × K grain sizes by replacing optical interconnects with electronic wires without affecting the functionality of the system. Thus the system consists of logKN optoelectronic stages interconnected with free-space K-shuffles. When K = N, the MIN consists of a single electronic stage with optical input–output. The system design uses an efficient 2-D VLSI layout and a single diffractive optical element between stages to provide the 2-D K-shuffle interconnection. Results indicate that there is an optimum range of grain sizes that provides the best performance per cost. For the specific VLSI/GaAs multiple quantum well technology and system architecture considered, grain sizes larger than 256 × 256 result in a reduced performance, while grain sizes smaller than 16 × 16 have a high cost. For a network with 4096 channels, the useful range of grain sizes corresponds to approximately 250–400 electronic transistors per optical input–output channel. The effect of varying certain technology parameters such as the number of hologram phase levels, the modulator driving voltage, the minimum detectable power, and VLSI minimum feature size on the optimum grain-size system is studied. For instance, results show that using four phase levels for the interconnection hologram is a good compromise for the cost functions mentioned above. As VLSI minimum feature sizes decrease, the optimum grain size increases, whereas, if optical interconnect performance in terms of the detector power or modulator driving voltage requirements improves, the optimum grain size may be reduced. Finally, several architectural modifications to the system, such as K × K contention-free switches and sorting networks, are investigated and optimized for grain size. Results indicate that system bandwidth can be increased, but at the price of reduced performance/cost. The optoelectronic MIN architectures considered thus provide a broad range of performance/cost alternatives and offer a superior performance over purely electronic MIN’s.

...read moreread less

Proceedings Article•DOI•

CRIS: A test cultivation program for sequential VLSI circuits

[...]

Abraham

01 Jan 1992

TL;DR: In this paper, a test generation approach for combinational and sequential VLSI circuits described hierarchically at the transistor, gate, and higher levels is discussed, based on continuous mutation of a given input sequence and on analyzing the mutated vectors for selecting the test set.

...read moreread less

Abstract: An approach to cultivating a test for combinational and sequential VLSI circuits described hierarchically at the transistor, gate, and higher levels is discussed. The approach is based on continuous mutation of a given input sequence and on analyzing the mutated vectors for selecting the test set. The approach uses a hierarchical simulation technique in the analysis to drastically reduce the memory requirement, thus allowing the test generation for large VLSI circuits. The algorithms are at the switch level so that general MOS digital designs can be handled, and both stuck-at and transistor faults are handled accurately. The approach was implemented in a hierarchical test generation system, CRIS, that runs under UNIX on SPARC workstations. CRIS was used successfully to generate tests with high fault coverage for large combinational and sequential circuits. >

...read moreread less

Proceedings Article•

A Parallel Gradient Descent Method for Learning in Analog VLSI Neural Networks

[...]

Joshua Alspector, Ron Meir¹, B. Yuhas, Anthony Jayakumar, D. Lippe² - Show less +1 more•Institutions (2)

Technion – Israel Institute of Technology¹, Massachusetts Institute of Technology²

30 Nov 1992

TL;DR: A perturbation technique that measures, not calculates, the gradient, since the technique uses the actual network as a measuring device, errors in modeling neuron activation and synaptic weights do not cause errors in gradient descent.

...read moreread less

Abstract: Typical methods for gradient descent in neural network learning involve calculation of derivatives based on a detailed knowledge of the network model. This requires extensive, time consuming calculations for each pattern presentation and high precision that makes it difficult to implement in VLSI. We present here a perturbation technique that measures, not calculates, the gradient. Since the technique uses the actual network as a measuring device, errors in modeling neuron activation and synaptic weights do not cause errors in gradient descent. The method is parallel in nature and easy to implement in VLSI. We describe the theory of such an algorithm, an analysis of its domain of applicability, some simulations using it and an outline of a hardware implementation.

...read moreread less

Journal Article•DOI•

The efficient memory-based VLSI array designs for DFT and DCT

[...]

Jiun-In Guo¹, Chi-Min Liu¹, Chein-Wei Jen¹•Institutions (1)

National Chiao Tung University¹

01 Jan 1992-IEEE Transactions on Circuits and Systems Ii: Analog and Digital Signal Processing

TL;DR: Efficient memory-based VLSI arrays and a new design approach for the discrete Fourier transform (DFT) and discrete cosine transform (DCT) are presented.

...read moreread less

Abstract: Efficient memory-based VLSI arrays and a new design approach for the discrete Fourier transform (DFT) and discrete cosine transform (DCT) are presented. The DFT and DCT are formulated as cyclic convolution forms and mapped into linear arrays which characterize small numbers of I/O channels and low I/O bandwidth. Since the multipliers consume much hardware area, the designs utilize small ROMs and adders to implement the multiplications. Moreover, the ROM size can be reduced effectively by arranging the data in the designs appropriately. The arrays outperform others in the architectural topology (local and regular connection), computing speeds, hardware complexity, the number of I/O channels, and I/O bandwidth. They benefit from the advantages of both systolic array and the memory-based architectures. >

...read moreread less

Proceedings Article•DOI•

Design of delay insensitive circuits using multi-ring structures

[...]

Jens Sparsø¹, Jørgen Staunstrup¹, Michael Dantzer-Sørensen¹•Institutions (1)

University of Copenhagen¹

01 Nov 1992

TL;DR: The design and VLSI implementation of a delay insensitive circuit that computes the inner product of two vec.tors is described, based on an iterative serial-parallel multiplication algorithm.

...read moreread less

Abstract: The design and VLSI implementation of a delay insensitive circuit that computes the inner product of two vec.tors is described. The circuit is based on an iterative serial-parallel multiplication algorithm. The design is based on a data flow approach using pipelines and rings that are combined into larger multi ring structures by the joining and forking of signals. The implementation is based on a small set of building blocks (latches, combinational circuits and switches) that are composed of C-elements and simple gates. By following this approach, delay insensitive circuits with nontrivial functionality and reasonable performance are readily designed. >

...read moreread less

Journal Article•DOI•

Lneuro 1.0: a piece of hardware LEGO for building neural network systems

[...]

N. Mauduit¹, Marc Duranton², Jean Gobert², J.-A. Sirat²•Institutions (2)

University of California, San Diego¹, Philips²

01 May 1992-IEEE Transactions on Neural Networks

TL;DR: The architecture is scalable and flexible enough to be useful for simulating various kinds of networks and paradigms, and the speedup factor increases regularly with the number of clusters involved (to a factor of 80).

...read moreread less

Abstract: Neural network simulations on a parallel architecture are reported. The architecture is scalable and flexible enough to be useful for simulating various kinds of networks and paradigms. The computing device is based on an existing coarse-grain parallel framework (INMOS transputers), improved with finer-grain parallel abilities through VLSI chips, and is called the Lneuro 1.0 (for LEP neuromimetic) circuit. The modular architecture of the circuit makes it possible to build various kinds of boards to match the expected range of applications or to increase the power of the system by adding more hardware. The resulting machine remains reconfigurable to accommodate a specific problem to some extent. A small-scale machine has been realized using 16 Lneuros, to experimentally test the behavior of this architecture. Results are presented on an integer version of Kohonen feature maps. The speedup factor increases regularly with the number of clusters involved (to a factor of 80). Some ways to improve this family of neural network simulation machines are also investigated. >

...read moreread less

Journal Article•DOI•

Cellular neural networks: Theory and circuit design

[...]

Josef A. Nossek¹, Gerhard Seiler¹, Tamás Roska², Leon O. Chua³•Institutions (3)

Technische Universität München¹, Hungarian Academy of Sciences², University of California, Berkeley³

01 Sep 1992-International Journal of Circuit Theory and Applications

TL;DR: An abstract normalized definition of cellular neural networks with arbitrary interconnection topology is given and the property of convergence is found to be of central importance: large classes of convergent CNNs in practice always asymptotically approach some stable equilibrium where each component of the corresponding output is binary-valued.

...read moreread less

Abstract: Cellular neural networks or CNNs are a novel neural network architecture introduced by Chua and Yang which is very general and flexible, has some important properties desirable for design applications and can be efficiently implemented on custom hardware based on analogue VLSI technology. In this paper an abstract normalized definition of cellular neural networks with arbitrary interconnection topology is given. Instead of stability, the property of convergence is found to be of central importance: large classes of convergent CNNs in practice always asymptotically approach some stable equilibrium where each component of the corresponding output is binary-valued. A highly efficient CMOS-compatible CNN circuit architecture is then presented where a basic cell consists of only two fully differential op amps, two capacitors and several MOSFETs, while a variable interconnection weight is realized with only four MOSFETs. Since all these elements are standard components in the current analogue IC technology and since all network functions are implemented directly on the device level, this architecture promises high cell and interconnection densities and extremely high operating speeds.

...read moreread less

Journal Article•DOI•

Simulation and analysis of transient faults in digital circuits

[...]

F.L. Yang¹, Resve A. Saleh¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Mar 1992-IEEE Journal of Solid-state Circuits

TL;DR: In this paper, a simulation tool called DYNAMO has been developed to study the effect of transient faults in large digital circuits, which allows transient faults to be introduced in a circuit during a transient analysis so that its behavior can be observed and recorded.

...read moreread less

Abstract: To study the effect of transient faults in large digital circuits, a simulation tool called DYNAMO has been developed. It allows transient faults to be introduced in a circuit during a transient analysis so that its behavior can be observed and recorded. For efficiency, a dynamic mixed-mode simulation approach is employed whereby the representation of various portions of the circuit may switch between different levels of abstraction during the simulation, as dictated by the location of the transient fault and the resulting behavior of the circuit. Experiments have shown very encouraging results with significant speedups in CPU run times relative to the previous approach. The results of transient-fault simulation using the DYNAMO program on an avionic control microprocessor are also included. >

...read moreread less

Proceedings Article•DOI•

Zero skew clock net routing

[...]

Chao, Hsu, Ho

01 Jun 1992

Journal Article•DOI•

Automatic verification of sequential control systems using temporal logic

[...]

Il Moon¹, Gary J. Powers¹, Jerry R. Burch¹, Edmund M. Clarke¹•Institutions (1)

Carnegie Mellon University¹

01 Jan 1992-Aiche Journal

TL;DR: A model-based verification method is developed and applied to validation of VLSI circuits and to reveal discrete event errors, the method is applied to a simple combustion system and an alarm acknowledge system.

...read moreread less

Abstract: Clarke et al. (1986) have developed a model-based verification method and have applied it to validation of VLSI circuits. We have used the method to test automatically the safety and operability of discrete chemical process control systems. The technique involves: (1) a “system model” describing the process and its software; (2) “assertions” in temporal logic expressing user-supplied questions about the system behavior with respect to safety and operability; and (3) a “model checker” that determines if the system model satisfies each of the assertions and provides a counterexample to locate the error if one exists. Temporal logic is used for reasoning about occurrence of events over time. To reveal discrete event errors, we have applied the verification method to a simple combustion system and an alarm acknowledge system.

...read moreread less

Journal Article•DOI•

Performance of synchronous and asynchronous schemes for VLSI systems

[...]

M. Afghahi¹, C. Svensson¹•Institutions (1)

Linköping University¹

01 Jul 1992-IEEE Transactions on Computers

TL;DR: System and delay models necessary for the study of time performances of synchronous and asynchronous systems are developed and a mode of clocking that reduces the clock skew substantially is proposed and examined.

...read moreread less

Abstract: Continuous advances in VLSI technology have made it possible to implement a system on a chip. One consequence of this is that the system will use a homogeneous technology for interconnections, gates, and synchronizers. Another consequence is that the system size and operation speed increase, which leads to increased problems with timing and synchronization. System and delay models necessary for the study of time performances of synchronous and asynchronous systems are developed. Clock skew is recognized as a key factor for the performance of synchronous systems. A mode of clocking that reduces the clock skew substantially is proposed and examined. Time penalty introduced by synchronizers is recognized as a key factor for the performance of asynchronous systems. This parameter is expressed in terms of system parameters. Different techniques and recommendations concerning performance improvement of synchronous and asynchronous systems are discussed. >

...read moreread less

Collapse