scispace - formally typeset

Showing papers on "Very-large-scale integration" published in 1990


Book
01 Jan 1990

1,840 citations


Book ChapterDOI
01 Mar 1990
TL;DR: Asynchronous techniques —that is, techniques that do not use clocks to implement sequencing— are currently attracting considerable interest for digital VLSI circuit design, in particular when the circuits produced are delay-insensitive (DI).
Abstract: Asynchronous techniques —that is, techniques that do not use clocks to implement sequencing— are currently attracting considerable interest for digital VLSI circuit design, in particular when the circuits produced are delay-insensitive (DI). A digital circuit is DI when its correct operation is independent of the delays in operators and in the wires connecting the operators, except that the delays are finite and positive.

418 citations
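
The abstract does not name specific components, but a standard state-holding primitive in asynchronous and delay-insensitive design is the Muller C-element: its output switches only once all inputs agree and otherwise holds its previous value. The sketch below is a minimal behavioral model of that element, included only as an illustration. It is not taken from the chapter, and the class name `CElement` is hypothetical.

```python
class CElement:
    """Behavioral model of an n-input Muller C-element (illustrative sketch).

    The output switches to the common input value once all inputs agree;
    with mixed inputs it holds its previous value.
    """

    def __init__(self, initial: int = 0) -> None:
        self.out = initial

    def update(self, *inputs: int) -> int:
        if all(i == 1 for i in inputs):
            self.out = 1
        elif all(i == 0 for i in inputs):
            self.out = 0
        # Mixed inputs: keep the previous output (state-holding behavior).
        return self.out


c = CElement()
trace = [c.update(a, b) for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 0, 1, 1, 0]
```

Because the output waits for every input before changing, a late-arriving input delays the output rather than corrupting it. This is the property that delay-insensitive compositions rely on.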


Journal ArticleDOI
TL;DR: Taguchi's off-line quality control methods for product and process improvement emphasize experiments to design quality "into" products and processes.
Abstract: Taguchi's off-line quality control methods for product and process improvement emphasize experiments to design quality "into" products and processes. In Very Large Scale Integrated (VLSI) circuit design, the application of interest here, compu..

380 citations
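
The abstract is truncated, so the paper's specific experiments are unknown. Taguchi's methods typically rank parameter settings by a signal-to-noise (S/N) ratio computed over replicated measurements. The sketch below implements two of the standard S/N formulas and picks the setting that is most robust to noise. The measurement data and setting labels are hypothetical.

```python
import math
from statistics import mean, variance

def sn_nominal_the_best(ys):
    """Taguchi nominal-the-best S/N ratio in dB: 10*log10(mean^2 / s^2)."""
    return 10 * math.log10(mean(ys) ** 2 / variance(ys))

def sn_smaller_the_better(ys):
    """Taguchi smaller-the-better S/N ratio in dB: -10*log10(mean(y^2))."""
    return -10 * math.log10(mean(y * y for y in ys))

# Hypothetical replicated measurements for three candidate parameter settings.
runs = {
    "A": [1.02, 0.98, 1.01, 0.99],
    "B": [1.10, 0.90, 1.05, 0.95],
    "C": [1.00, 1.20, 0.80, 1.00],
}
best = max(runs, key=lambda k: sn_nominal_the_best(runs[k]))
print(best)  # A
```

All three settings have the same mean. Setting A wins because its spread is smallest, which is the sense in which quality is designed "into" the process.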


Proceedings ArticleDOI
26 Feb 1990
TL;DR: The MasPar MP-1 architecture is described, a massively parallel SIMD machine with the following key characteristics: scalable architecture in terms of the number of processing elements, system memory, and system communication bandwidth.
Abstract: The MasPar MP-1 architecture is described. It is a massively parallel SIMD machine with the following key characteristics: scalable architecture in terms of the number of processing elements, system memory, and system communication bandwidth; reduced-instruction-set-computer-like instruction set design which leverages optimizing compiler technology; adherence to industry-standard floating point formats, specifically VAX and IEEE floating point; and an architectural design amenable to a VLSI implementation. The architecture provides not only high computational capability, but also a mesh and global interconnect style of communication. The computational model and subsystems of the MP-1, including the interconnection mechanisms, are described.

305 citations


Journal ArticleDOI
TL;DR: A general-purpose fuzzy logic inference engine for real-time control applications, designed and fabricated in a 1.1-µm, 3.3-V, double-level-metal CMOS technology, is discussed.
Abstract: A general-purpose fuzzy logic inference engine for real-time control applications, designed and fabricated in a 1.1-µm, 3.3-V, double-level-metal CMOS technology, is discussed. Up to 102 rules are processed in parallel with a single 688K-transistor device. Features include a dynamically reconfigurable and cascadable architecture, TTL-compatible host interface, laser-programmable redundancy, a special mode for testability, RAM rule storage, and on-chip fuzzification and defuzzification.

294 citations
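
The abstract lists on-chip fuzzification and defuzzification but does not state the inference method. The sketch below assumes the common min-max (Mamdani) scheme with triangular membership functions and centroid defuzzification over a sampled output universe. The three rules and all set boundaries are hypothetical.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical fuzzy sets on the input e and output u, both over [-10, 10].
SETS = {"NEG": (-20.0, -10.0, 0.0), "ZERO": (-10.0, 0.0, 10.0), "POS": (0.0, 10.0, 20.0)}
RULES = [("NEG", "NEG"), ("ZERO", "ZERO"), ("POS", "POS")]  # IF e is A THEN u is B

def infer(e, steps=201):
    # Fuzzification: degree to which the crisp input belongs to each antecedent set.
    strengths = [(tri(e, *SETS[a]), b) for a, b in RULES]
    num = den = 0.0
    for k in range(steps):
        u = -10.0 + 20.0 * k / (steps - 1)
        # Min implication clips each consequent; max aggregates across rules.
        mu = max(min(w, tri(u, *SETS[b])) for w, b in strengths)
        num += u * mu
        den += mu
    # Centroid defuzzification over the sampled output universe.
    return num / den if den else 0.0

print(round(infer(0.0), 6), infer(5.0) > 0)
```

In hardware, the rules would be evaluated in parallel (the abstract cites up to 102). Here they are evaluated in a loop purely for clarity.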


Proceedings ArticleDOI
26 Feb 1990
TL;DR: The design and implementation of the MasPar MP-1, a general-purpose massively parallel computer system that achieves peak computation rates beyond a billion floating point operations per second yet is priced like a minicomputer, are described.
Abstract: The design and implementation of the MasPar MP-1 are described. It is a general-purpose massively parallel computer system that achieves peak computation rates beyond a billion floating point operations per second yet is priced like a minicomputer. The architecture of the MP-1 is scalable in a way that permits its computational power to be increased along two axes: the performance of each processor and the number of processors. This feasibility is well matched to VLSI technology, where circuit densities continue to increase at a rapid rate. The scalable nature of massively parallel systems protects the customers' software investment while providing increasing performance in successive products. The array control unit, processor array, processor elements, memory, X-Net mesh interconnect, and multistage crossbar interconnect are discussed.

225 citations
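
The abstract names the X-Net mesh interconnect without describing its topology. The sketch below assumes an eight-neighbor (N, NE, E, SE, S, SW, W, NW) two-dimensional grid with toroidal wraparound. In SIMD style, every processing element reads from the same direction at once. The function names are hypothetical.

```python
DIRS = {
    "N": (-1, 0), "NE": (-1, 1), "E": (0, 1), "SE": (1, 1),
    "S": (1, 0), "SW": (1, -1), "W": (0, -1), "NW": (-1, -1),
}

def xnet_fetch(grid, direction):
    """Every PE simultaneously reads the value held by its neighbor in `direction`.

    Rows and columns wrap around (toroidal edges), an assumption of this sketch.
    """
    dr, dc = DIRS[direction]
    rows, cols = len(grid), len(grid[0])
    return [[grid[(r + dr) % rows][(c + dc) % cols] for c in range(cols)]
            for r in range(rows)]

def neighborhood_sum(grid):
    """Each PE's own value plus its eight neighbors', built from eight mesh steps."""
    total = [row[:] for row in grid]
    for d in DIRS:
        shifted = xnet_fetch(grid, d)
        total = [[t + s for t, s in zip(trow, srow)] for trow, srow in zip(total, shifted)]
    return total

g = [[1 if (r, c) == (0, 0) else 0 for c in range(4)] for r in range(4)]
print(neighborhood_sum(g)[3][3])  # 1: PE (3, 3) sees PE (0, 0) through the wraparound
```

Stencil-style operations like this one map naturally onto a nearest-neighbor mesh. Irregular, long-distance traffic would instead go through the global (crossbar) interconnect the abstract mentions.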


Journal ArticleDOI
01 Feb 1990
TL;DR: The current status of VLSI layout and directions for future research are addressed; the field of computational geometry and its application to layout, in particular to gridless routing and compaction, is reviewed, and layout engines are considered.
Abstract: The current status of VLSI layout and directions for future research are addressed, with emphasis on the authors' own work. Necessary terminology and definitions and, whenever possible, a precise formulation of the problems are provided. Placement and floorplanning for both the sea-of-gates and building-block designs are examined. The former emphasizes the connectivity specification, whereas the latter must also consider module shape and size. Global routing based on a method of successive cuts on a chip is discussed. This is a hierarchical top-down approach that is useful for both of the above designs. A two-dimensional detailed routing problem and the rip-up and rerouting problem are also discussed. The field of computational geometry and its application to layout, in particular to gridless routing and compaction, are reviewed, and layout engines are considered.

225 citations
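
Compaction is mentioned only in passing. A textbook formulation of one-dimensional compaction expresses each spacing rule as a constraint x[j] - x[i] >= d. The tightest legal positions are then the longest-path distances from the left boundary in the constraint graph. The sketch below is that standard technique, not necessarily the survey's. It assumes the constraint graph is acyclic, and the element numbering is hypothetical.

```python
from collections import defaultdict, deque

def compact(num_elems, constraints):
    """Minimum x-coordinates subject to x[j] - x[i] >= d for each (i, j, d).

    Positions start at 0 (the left boundary). Constraints must form a DAG,
    an assumption of this sketch; a cycle raises ValueError.
    """
    succ = defaultdict(list)
    indeg = [0] * num_elems
    for i, j, d in constraints:
        succ[i].append((j, d))
        indeg[j] += 1
    x = [0] * num_elems
    ready = deque(k for k in range(num_elems) if indeg[k] == 0)
    seen = 0
    while ready:                      # process elements in topological order
        i = ready.popleft()
        seen += 1
        for j, d in succ[i]:
            x[j] = max(x[j], x[i] + d)   # longest path = tightest legal position
            indeg[j] -= 1
            if indeg[j] == 0:
                ready.append(j)
    if seen != num_elems:
        raise ValueError("constraint graph has a cycle")
    return x

print(compact(4, [(0, 1, 2), (1, 2, 3), (0, 2, 4), (2, 3, 2), (1, 3, 1)]))  # [0, 2, 5, 7]
```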


Patent
30 Jan 1990
TL;DR: A data-flow architecture and software environment for high-performance signal and data processing is described, in which the data-flow processor consists of multiple processing elements connected in a three-dimensional bussed packet routing network.
Abstract: A data-flow architecture and software environment for high-performance signal and data processing. The programming environment allows applications coding in a functional high-level language 20 which a compiler 30 converts to a data-flow graph form 40 which a global allocator 50 then automatically partitions and distributes to multiple processing elements 80, or, in the case of smaller problems, coding in a data-flow graph assembly language so that an assembler 15 operates directly on an input data-flow graph file 13 and produces an output which is then sent to a local allocator 17 for partitioning and distribution. In the former case a data-flow processor description file 45 is read into the global allocator 50, and in the latter case a data-flow processor description file 14 is read into the assembler 15. The data-flow processor 70 consists of multiple processing elements 80 connected in a three-dimensional bussed packet routing network. Data enters and leaves the processor 70 via input/output devices 90 connected to the processor. The processing elements are designed for implementation in VLSI (very-large-scale integration) to provide real-time processing with very large throughput. The modular nature of the computer allows adding more processing elements to meet a range of throughput and reliability requirements. Simulation results have demonstrated high-performance operation, with over 64 million operations per second being attainable using only 64 processing elements.

209 citations
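
The defining rule of a data-flow machine is that a node fires as soon as a token is present on every input arc. The sketch below models only that firing rule, in a single process. It ignores the patent's allocators, processing elements, and packet routing network, and the graph itself is hypothetical. Each token is consumed once, so an input used twice is supplied on two arcs.

```python
import operator
from collections import deque

# Hypothetical graph computing (a + b) * (a - b): node -> (function, input arcs, output arcs).
GRAPH = {
    "add": (operator.add, ["a1", "b1"], ["s"]),
    "sub": (operator.sub, ["a2", "b2"], ["d"]),
    "mul": (operator.mul, ["s", "d"], ["out"]),
}

def run_dataflow(graph, initial_tokens):
    """Fire any node whose input arcs all hold a token, until no node can fire."""
    arcs = {}
    for _, ins, outs in graph.values():
        for arc in ins + outs:
            arcs.setdefault(arc, deque())
    for arc, value in initial_tokens.items():
        arcs[arc].append(value)
    fired = True
    while fired:
        fired = False
        for fn, ins, outs in graph.values():
            if all(arcs[a] for a in ins):              # firing rule: all operands present
                result = fn(*(arcs[a].popleft() for a in ins))
                for arc in outs:                       # result token goes to every output arc
                    arcs[arc].append(result)
                fired = True
    return arcs

arcs = run_dataflow(GRAPH, {"a1": 5, "b1": 3, "a2": 5, "b2": 3})
print(arcs["out"][0])  # 16
```

No program counter orders `add` and `sub`. Either could fire first, which is the source of the parallelism that a multi-element data-flow processor exploits.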


Journal ArticleDOI
TL;DR: A current-estimation approach to support the analysis of electromigration (EM) failures in power supply and ground buses of CMOS VLSI circuits is discussed and has shown excellent accuracy and dramatic speedups compared with traditional approaches.
Abstract: A current-estimation approach to support the analysis of electromigration (EM) failures in power supply and ground buses of CMOS VLSI circuits is discussed. It uses the original concept of probabilistic simulation to efficiently generate accurate estimates of the expected current waveform required for electromigration analysis. Thus, the approach is pattern-independent and relieves the designer of the tedious task of specifying logical input waveforms. This approach has been implemented in the program CREST (current estimator) which has shown excellent accuracy and dramatic speedups compared with traditional approaches. The approach and its implementation are described, and the results of numerous CREST runs on real circuits are presented.

202 citations
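
CREST's probabilistic simulation is not detailed in the abstract. A much simpler relative of the idea replaces explicit input patterns with signal probabilities and propagates them through the logic. It assumes independent inputs and zero delay, and estimates how often each node switches, since switching draws supply current. The sketch below shows that simplified technique, not CREST's algorithm. The circuit and input probabilities are hypothetical.

```python
def and_prob(*ps):
    """P(output = 1) for an AND gate with independent inputs."""
    result = 1.0
    for p in ps:
        result *= p
    return result

def or_prob(*ps):
    """P(output = 1) for an OR gate with independent inputs."""
    none_high = 1.0
    for p in ps:
        none_high *= 1.0 - p
    return 1.0 - none_high

def not_prob(p):
    return 1.0 - p

def toggle_prob(p):
    """P(node switches between consecutive cycles), assuming temporal independence."""
    return 2.0 * p * (1.0 - p)

# Hypothetical circuit y = NOT(a AND b) OR c with every input at probability 0.5.
n1 = and_prob(0.5, 0.5)      # 0.25
n2 = not_prob(n1)            # 0.75
y = or_prob(n2, 0.5)         # 0.875
print(toggle_prob(y))        # 0.21875
```

The independence assumptions break down under reconvergent fanout. Handling correlations and timing more carefully is what separates a tool like CREST from this sketch.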


Proceedings ArticleDOI
01 May 1990
TL;DR: It is argued that analog circuits will remain irreplaceable for the implementation of the interface circuitry between digital processing and the external world, and that most of their characteristics can be improved with scaled-down processes.
Abstract: It is argued that analog circuits will remain irreplaceable for the implementation of the interface circuitry between digital processing and the external world. Most of their characteristics can be improved with scaled-down processes. Analog circuits will also compete with digital circuits and will remain advantageous for low-precision signal processing. Since little precision is required to carry out many cognitive tasks, analog VLSI (very-large-scale integration) will dominate in the implementation of many types of neural networks.

200 citations


Journal ArticleDOI
TL;DR: A functional-level concurrent error-detection scheme is presented for such VLSI signal processing architectures as those proposed for the FFT and QR factorization, and it is shown that the error coverage is high with large word sizes.
Abstract: The increasing demands for high-performance signal processing, along with the availability of inexpensive high-performance processors, have resulted in numerous proposals for special-purpose array processors for signal processing applications. A functional-level concurrent error-detection scheme is presented for such VLSI signal processing architectures as those proposed for the FFT and QR factorization. Some basic properties involved in such computations are used to check the correctness of the computed output values. Unlike earlier algorithm-based error-detection techniques, this fault-detection scheme is shown to be applicable to a class of problems rather than to a particular problem. The effects of roundoff/truncation errors due to finite-precision arithmetic are evaluated. It is shown that the error coverage is high with large word sizes.
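
The abstract says basic properties of the computation are used to check its outputs, without naming them. For the DFT, one such property is Parseval's relation: the energy of the input equals 1/N times the energy of the transform. The sketch below assumes that check and uses a tolerance to absorb roundoff. The tolerance value and the test vector are illustrative.

```python
import cmath

def dft(x):
    """Direct O(N^2) DFT, standing in for the computation being checked."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * k * m / n) for k in range(n))
            for m in range(n)]

def parseval_check(x, X, rel_tol=1e-9):
    """Accept X as the DFT of x if sum|x|^2 matches (1/N) sum|X|^2 within a tolerance.

    Too tight a tolerance raises false alarms from roundoff; too loose a
    tolerance lets small errors through.
    """
    n = len(x)
    e_time = sum(abs(v) ** 2 for v in x)
    e_freq = sum(abs(v) ** 2 for v in X) / n
    return abs(e_time - e_freq) <= rel_tol * max(e_time, 1.0)

x = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 1.5]
X = dft(x)
print(parseval_check(x, X))   # True
X[3] += 0.5                   # inject a fault into one output
print(parseval_check(x, X))   # False
```

An energy check cannot catch every fault. For example, an error that changes only the phase of an output leaves the total energy unchanged. This is one reason coverage has to be evaluated, as the abstract does for finite word sizes.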

Book
03 Jan 1990
TL;DR: In this article, the authors describe a complementary metal-oxide-semiconductor (CMOS) very-large-scale integrated (VLSI) circuit implementing a connectionist neural-network model.
Abstract: The authors describe a complementary metal-oxide-semiconductor (CMOS) very-large-scale integrated (VLSI) circuit implementing a connectionist neural-network model. It consists of an array of 54 simple processors fully interconnected with a programmable connection matrix. This experimental design tests the behavior of a large network of processors integrated on a chip. The circuit can be operated in several different configurations by programming the interconnections between the processors. Tests made with the circuit working as an associative memory and as a pattern classifier were so encouraging that the chip has been interfaced to a minicomputer and is being used as a coprocessor in pattern-recognition experiments. This mode of operation is making it possible to test the chip's behavior in a real application and study how pattern-recognition algorithms can be mapped in such a network.
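
The chip is tested as an associative memory, but the abstract does not specify how the connection matrix is programmed. The sketch below assumes a Hopfield-style network with Hebbian (outer-product) weights and asynchronous sign updates. It is one common way to use a fully interconnected array of simple processors as an associative memory. The 8-unit patterns are hypothetical and far smaller than the chip's 54 processors.

```python
def train(patterns):
    """Hebbian (outer-product) weights for +/-1 patterns, with a zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, state, sweeps=10):
    """Asynchronous sign updates until the state stops changing."""
    s = list(state)
    for _ in range(sweeps):
        changed = False
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s

p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = train([p1, p2])
noisy = [-1] + p1[1:]              # p1 with its first bit flipped
print(recall(w, noisy) == p1)      # True
```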

Journal ArticleDOI
01 Feb 1990
TL;DR: In this article, the manufacturing-oriented component of the CAD of VLSI circuits is discussed, and a number of issues and design problems relevant to achieving a high level of IC manufacturability are examined.
Abstract: It is noted that the nominal design created by CAD tools must often be modified to maximize manufacturing yield. Such maximization must be performed during the design to achieve an acceptable level of initial manufacturing yield and during fabrication to achieve the maximum rate of yield improvement in the entire product development cycle. The manufacturing-oriented component of the CAD of VLSI circuits is discussed. The concept of design for manufacturability is explained, and a number of issues and design problems relevant to achieving a high level of IC manufacturability are examined. An overview of needed and existing CAD tools that can be used to solve the problems listed above is presented.

Proceedings ArticleDOI
16 Apr 1990
TL;DR: The modified comparison rule is found to produce a more efficient ACS architecture than previous results based on subtraction, and an efficient VLSI design of ACS units based on this technique is discussed.
Abstract: In the realization of Viterbi decoders with finite precision arithmetic, the values of the survivor metrics computed by the add-compare-select (ACS) recursion must remain within a finite numerical range to avoid catastrophic overflow (or underflow) situations. The authors compare several metric normalization techniques which are suitable for VLSI implementations with fixed-point arithmetic. The modulo normalization technique is found to be the most local and uniform approach. An efficient VLSI design of ACS units based on this technique is discussed. The modified comparison rule is found to produce a more efficient ACS architecture than previous results based on subtraction.
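
Modulo normalization lets path metrics wrap around a fixed-width register. Comparisons still work as long as the true spread between the compared metrics stays below half the register range. The sketch below uses plain modular subtraction for the compare step, which is the subtraction-based style the abstract contrasts with. It does not reproduce the paper's modified comparison rule. The 8-bit width and the smaller-is-better metric convention are assumptions.

```python
BITS = 8                      # hypothetical metric register width
MOD = 1 << BITS
HALF = MOD >> 1

def wrap(metric):
    """Store a path metric modulo 2**BITS, as a fixed-width register would."""
    return metric % MOD

def mod_less(a, b):
    """True if the unwrapped metric behind `a` is smaller than the one behind `b`.

    Valid while the true spread between compared metrics stays below 2**(BITS-1).
    """
    diff = (a - b) % MOD      # two's-complement difference of the registers
    return diff >= HALF       # "sign bit" set -> a < b

def acs(m0, bm0, m1, bm1):
    """Add-compare-select on wrapped metrics; returns (new metric, decision bit)."""
    c0, c1 = wrap(m0 + bm0), wrap(m1 + bm1)
    return (c1, 1) if mod_less(c1, c0) else (c0, 0)

# True candidates are 260 and 257; both wrap, yet 257 is still selected.
print(acs(wrap(250), 10, wrap(255), 2))  # (1, 1)
```

Because no metric is ever rescaled, each ACS unit stays local. It needs no global minimum search across states, which is the locality property the abstract credits to modulo normalization.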

Journal ArticleDOI
TL;DR: A technique called micro rollback, which allows most of the performance penalty for concurrent error detection to be eliminated, is presented, and several critical issues related to its use in a complete system are discussed.
Abstract: A technique called micro rollback, which allows most of the performance penalty for concurrent error detection to be eliminated, is presented. Detection is performed in parallel with the transmission of information between modules, thus removing the delay for detection from the critical path. Erroneous information may thus reach its destination module several clock cycles before an error indication. Operations performed on this erroneous information are undone using a hardware mechanism for fast rollback of a few cycles. The implementation of a VLSI processor capable of micro rollback is discussed, as well as several critical issues related to its use in a complete system.
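
The abstract mentions a hardware mechanism for fast rollback of a few cycles without describing it. One way to get this effect is to hold each write in a FIFO for a fixed number of cycles before committing it to the register file. A rollback then simply discards the newest uncommitted writes. The sketch below models that idea, assuming one write per cycle. The class and method names are hypothetical.

```python
from collections import deque

class RollbackRegisterFile:
    """Register file whose writes wait in a FIFO for `depth` cycles before commit.

    Uncommitted writes can be discarded, undoing up to `depth` recent cycles.
    """

    def __init__(self, num_regs, depth):
        self.regs = [0] * num_regs
        self.depth = depth
        self.pending = deque()           # one (reg, value) entry per cycle, oldest first

    def write(self, reg, value):
        self.pending.append((reg, value))
        if len(self.pending) > self.depth:   # oldest write has aged past the error window
            r, v = self.pending.popleft()
            self.regs[r] = v

    def read(self, reg):
        # The newest pending write to `reg` wins; otherwise use committed state.
        for r, v in reversed(self.pending):
            if r == reg:
                return v
        return self.regs[reg]

    def rollback(self, cycles):
        if cycles > len(self.pending):
            raise ValueError("cannot roll back past committed state")
        for _ in range(cycles):
            self.pending.pop()

rf = RollbackRegisterFile(4, depth=3)
rf.write(0, 10); rf.write(1, 20); rf.write(0, 99)
rf.rollback(1)                 # an error flagged late undoes the last cycle
print(rf.read(0))              # 10
```

The FIFO depth sets how late an error indication may arrive, matching the abstract's "several clock cycles" of detection latency.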

Proceedings ArticleDOI
11 Nov 1990
TL;DR: A fast heuristic algorithm based on a simple, local criterion is proposed, and it is proved that for highly structured circuits the clusters found by this algorithm correspond with high probability to the 'natural' clusters.
Abstract: Circuit partitioning plays a fundamental role in hierarchical layout systems. Identifying the strongly connected subcircuits (the clusters) of the logic can significantly reduce the delay of the circuit and the total interconnection length. Finding such a cluster partition, however, is NP-complete. The authors propose a fast heuristic algorithm based on a simple, local criterion. They are able to prove that for highly structured circuits the clusters found by this algorithm correspond with high probability to the 'natural' clusters. An application to large-scale real-world circuits shows that this method reduces the number of nets cut by up to 46% compared to the standard mincut approach.
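
The reported improvement is measured in nets cut, meaning nets whose pins fall into more than one block of the partition. The sketch below computes that metric only. It does not implement the authors' clustering criterion. The netlist and partition are hypothetical.

```python
def nets_cut(nets, part):
    """Count nets whose pins land in more than one block of the partition.

    nets: list of nets, each a list of cell names; part: dict cell -> block id.
    """
    return sum(1 for net in nets if len({part[c] for c in net}) > 1)

nets = [["a", "b"], ["b", "c", "d"], ["d", "e"], ["a", "e"]]
print(nets_cut(nets, {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1}))  # 2
```

Both mincut partitioners and cluster-based methods can be compared on this single number, which is how the 46% figure is stated.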

Proceedings ArticleDOI
W. Swartz, Carl Sechen
01 Jan 1990
TL;DR: Novel algorithms are described for timing driven placement and routing of rectilinearly shaped macro cells and a negative feedback scheme is described that optimizes the relative weighting between the primary objective term and the penalty function terms in the cost function.
Abstract: Novel algorithms are described for timing driven placement and routing of rectilinearly shaped macro cells. Algorithms are also presented for the implementation of simulated annealing, based on a theoretically derived statistical annealing schedule. A negative feedback scheme is described that optimizes the relative weighting between the primary objective term and the penalty function terms in the cost function. A placement refinement method has been developed for rectilinear cells which spaces the cells at a density which avoids the need for post-routing compaction. In addition, a detailed routing method has been developed which avoids the classically difficult problem of defining channels for detailed routing. The result for the ami33 benchmark circuit is better than the previously published results.
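
The basic simulated-annealing loop underlying such placers is shown below for a one-dimensional placement that minimizes total wire span. The sketch deliberately simplifies the paper's method. It uses a fixed geometric cooling schedule instead of the paper's statistically derived one, and it has no penalty terms or feedback weighting. All parameter values and the netlist are hypothetical.

```python
import math
import random

def wirelength(order, nets):
    """Sum over nets of the span (max - min slot) of the cells on the net."""
    pos = {cell: i for i, cell in enumerate(order)}
    return sum(max(pos[c] for c in net) - min(pos[c] for c in net) for net in nets)

def anneal(cells, nets, t0=5.0, alpha=0.95, moves_per_t=50, t_min=0.01, seed=0):
    """Simulated-annealing linear placement with pairwise swaps and geometric cooling."""
    rng = random.Random(seed)
    order = list(cells)
    cost = wirelength(order, nets)
    best, best_cost = order[:], cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            i, j = rng.sample(range(len(order)), 2)
            order[i], order[j] = order[j], order[i]
            new_cost = wirelength(order, nets)
            delta = new_cost - cost
            # Always accept improvements; accept uphill moves with prob. exp(-delta/t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cost = new_cost
                if cost < best_cost:
                    best, best_cost = order[:], cost
            else:
                order[i], order[j] = order[j], order[i]   # reject: undo the swap
        t *= alpha
    return best, best_cost

cells = ["a", "d", "b", "f", "c", "e"]
nets = [["a", "b"], ["b", "c"], ["c", "d"], ["d", "e"], ["e", "f"]]
best, best_cost = anneal(cells, nets)
print(wirelength(cells, nets), best_cost)
```

Accepting occasional uphill moves at high temperature lets the search escape local minima. The paper's feedback scheme would additionally rebalance penalty weights (for example, cell overlap) against wirelength as the run proceeds.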

Journal ArticleDOI
TL;DR: A neural network implementation that uses MOSFET analog multipliers to construct weighted sums is described, which permits asynchronous analog operation of Hopfield-style networks with fully programmable digital weights.
Abstract: A neural network implementation that uses MOSFET analog multipliers to construct weighted sums is described. The scheme permits asynchronous analog operation of Hopfield-style networks with fully programmable digital weights. This approach avoids the use of components that waste chip area or require special processing. Two small chips have been fabricated and tested: one implementing a fully connected (recursive) network and the other containing isolated portions of a neuron. The fully connected network chip successfully solves simple graph partitioning problems, in confirmation of network simulations performed using an analytic model of the analog neuron. This result verifies the operation of the complete network, including common-mode biasing circuits and connection weight data paths. A direct scaling of this chip would allow the complete integration of 81-neuron fully connected networks with 6-bit plus sign connection weights using 1.25-µm design rules on a 1-cm die.

Journal ArticleDOI
TL;DR: Yield models for predicting the yield of chips with redundancy are introduced, and the optimal amount of redundancy is determined.
Abstract: The defects that can occur when manufacturing VLSI ICs and the faults that can result are described. Some commonly used restructuring techniques for avoiding defective components are discussed. Several defect-tolerant designs of memory ICs, logic ICs, and wafer-scale circuits are presented. Yield models for predicting the yield of chips with redundancy are introduced, and the optimal amount of redundancy is determined.
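
The specific yield models in the paper are not given in the abstract. The sketch below assumes a simple one. Each module is defect-free with Poisson probability exp(-A·D), and modules fail independently. A chip with spares works if at least the required number of modules is good. The optimal spare count then maximizes good chips per unit area, with area taken as proportional to module count. The module area and defect density are hypothetical.

```python
import math

def module_yield(area_cm2, defect_density):
    """Poisson model: probability that a module of the given area has no fatal defect."""
    return math.exp(-area_cm2 * defect_density)

def chip_yield(needed, spares, y):
    """Probability that at least `needed` of `needed + spares` modules are good."""
    n = needed + spares
    return sum(math.comb(n, k) * y**k * (1 - y) ** (n - k) for k in range(needed, n + 1))

def best_spares(needed, area_cm2, defect_density, max_spares=10):
    """Spare count maximizing good chips per unit area (area proportional to module count)."""
    y = module_yield(area_cm2, defect_density)
    return max(range(max_spares + 1),
               key=lambda s: chip_yield(needed, s, y) / (needed + s))

y = module_yield(0.05, 2.0)   # hypothetical: 0.05 cm^2 modules, 2 defects/cm^2
print(round(chip_yield(16, 0, y), 3), round(chip_yield(16, 2, y), 3), best_spares(16, 0.05, 2.0))
```

Adding spares raises yield quickly at first, but each spare also costs area. The per-area objective captures that trade-off. Real models would also charge for reconfiguration logic and account for defect clustering.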

Proceedings ArticleDOI
17 Jun 1990
TL;DR: The key innovation is the representation of neuron activations and synaptic weights as stochastic functions of time, leading to efficient implementations of the synapses, and a flexible architecture that permits the realization of arbitrary network topologies and dimensions.
Abstract: An approach to solving the two most serious shortcomings of previous artificial neural network implementations is discussed. A flexible architecture that permits the realization of arbitrary network topologies and dimensions is presented. Furthermore, the performance of this architecture is independent of the size of the network and permits the processing of typically 100,000 patterns per second. The key innovation is the representation of neuron activations and synaptic weights as stochastic functions of time, leading to efficient implementations of the synapses. High densities of synapses per silicon area, exceeding even analog implementations, have been achieved. Finally, the neuron activations are represented digitally, as are the synaptic computations, thereby permitting fabrication of digital neural network architectures using a variety of standard, low-cost semiconductor processes. A pair of general-purpose chips (SU3232 and NU32) that permit post facto construction of neural networks of arbitrary topology and virtually unlimited dimensions is presented.
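
Representing values as stochastic functions of time makes a synapse cheap. In the textbook unipolar stochastic-computing scheme, a value p in [0, 1] becomes a bitstream whose fraction of 1s is p, and a single AND gate multiplies two independent streams. Whether the SU3232 uses exactly this encoding is an assumption. The sketch below only illustrates the principle, and the stream length and seed are arbitrary.

```python
import random

def to_stream(p, length, rng):
    """Unipolar stochastic bitstream whose fraction of 1s approximates p in [0, 1]."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def stochastic_multiply(p, q, length=100_000, seed=1):
    """Estimate p*q by ANDing two independent bitstreams and counting the 1s."""
    rng = random.Random(seed)
    a = to_stream(p, length, rng)
    b = to_stream(q, length, rng)
    return sum(x & y for x, y in zip(a, b)) / length

print(stochastic_multiply(0.5, 0.4))  # approximately 0.2
```

Accuracy improves only with the square root of stream length, so precision is traded for time. This cost is tolerable in neural networks, which need little precision per synapse.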

Journal ArticleDOI
TL;DR: A report is presented on a multiplication scheme (left-to-right, carry-free, LRCF) that performs the multiplication most-significant bit first and produces a conventional sign-and-magnitude product by means of an on-the-fly conversion.
Abstract: Conventional schemes for fast multiplication accumulate the partial products in redundant form (carry-save or signed-digit) and convert the result to conventional representation in the last step. This step requires a carry-propagate adder which is comparatively slow and occupies a significant area of the chip in a VLSI implementation. A report is presented on a multiplication scheme (left-to-right, carry-free, LRCF) that does not require this carry-propagate step. The LRCF scheme performs the multiplication most-significant bit first and produces a conventional sign-and-magnitude product (most significant n bits) by means of an on-the-fly conversion. The resulting implementation is fast and regular and is very well suited for VLSI. The LRCF scheme for general radix r and a radix-4 signed-digit implementation are presented.
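
On-the-fly conversion avoids the final carry-propagate step. It keeps two conventional digit strings: Q, the value so far, and QM, which equals Q minus one unit in the last place. Each incoming signed digit is absorbed by appending a digit to one string or copying one string into the other, so no carry ever ripples. The sketch below uses these standard update rules and, for simplicity, assumes a positive value with a leading digit of at least 1. Digit lists stand in for the hardware registers.

```python
def on_the_fly_convert(digits, r):
    """Convert MSB-first signed digits in [-(r-1), r-1] to ordinary radix-r digits.

    Q holds the value so far and QM holds Q - 1 (in the last digit position).
    Each step only appends a digit or copies one register into the other.
    Assumes digits[0] >= 1, i.e. a positive value.
    """
    if not digits or digits[0] < 1:
        raise ValueError("sketch assumes a positive leading digit")
    q, qm = [digits[0]], [digits[0] - 1]
    for d in digits[1:]:
        if d > 0:
            q, qm = q + [d], q + [d - 1]
        elif d == 0:
            q, qm = q + [0], qm + [r - 1]
        else:                                  # negative digit: borrow via QM
            q, qm = qm + [r + d], qm + [r + d - 1]
    return q

def value(digits, r):
    """Integer value of an MSB-first digit string (digits may be negative)."""
    v = 0
    for d in digits:
        v = v * r + d
    return v

sd = [2, -3, 1, 0, -1]                         # radix-4 signed digits, value 335
conv = on_the_fly_convert(sd, 4)
print(conv, value(conv, 4) == value(sd, 4))    # [1, 1, 0, 3, 3] True
```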

Journal ArticleDOI
TL;DR: A single-phase clocking scheme is described that provides a structure that can contain all components of a digital VLSI system, including static, dynamic, and precharged logic as well as memories and PLAs.
Abstract: Two of the main consequences of advances in VLSI technologies are increased cost of design and wiring. In CMOS synchronous systems, this cost is partly due to tedious synchronization of different clock phases and routing of these clock signals. Here, a single-phase clocking scheme that makes the design very compact and simple is described. It is shown that this scheme is general, simple, and safe. It provides a structure that can contain all components of a digital VLSI system, including static, dynamic, and precharged logic as well as memories and PLAs. Clock and data signals are presented in a clean way that makes VLSI circuits and systems well suited for design compilation.


Journal ArticleDOI
TL;DR: A general-purpose median filter unit configuration in the form of two single-chip median filters, one extensible and one real time, is described, along with some possible applications.
Abstract: A general-purpose median filter unit configuration in the form of two single-chip median filters, one extensible and one real-time, is described. The networks of the chips are pipelined and systolic at the bit level and based on odd/even transposition sorting. The chips are implemented in 3-µm standard CMOS using full-custom VLSI design techniques. The exact median of the elements in a window of size w = 9 with arbitrary word length L can be found using only one extensible median filter chip. The filter can be extended to arbitrary window sizes and word lengths by using multiple chips. Simulation results show that the extensible median filter chip can be clocked at up to 40 MHz and can generate 30/L megamedians per second. The real-time median filter chip can find the exact running medians of elements in a window of fixed size w = 9 with L = 8. Simulations show that it can generate up to 50 megamedians per second with a 50-MHz clock. The algorithms, VLSI implementations, and chip test results are presented, along with some possible applications.
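
Odd/even transposition sorting suits systolic hardware because each round compares disjoint neighbor pairs, and n rounds are enough to sort n elements. The sketch below is a word-level software model of that network used as a w = 9 running median. The chips themselves operate bit-serially, which this sketch does not model, and the input signal is hypothetical.

```python
def odd_even_transposition_sort(values):
    """Sort with n rounds of compare-exchange on alternating even/odd neighbor pairs.

    Compare-exchanges within a round touch disjoint pairs, so hardware can do
    them all in parallel, one round per pipeline stage.
    """
    a = list(values)
    n = len(a)
    for rnd in range(n):
        for i in range(rnd % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

def running_median(signal, w=9):
    """Exact median of each full length-w window (w odd)."""
    return [odd_even_transposition_sort(signal[k:k + w])[w // 2]
            for k in range(len(signal) - w + 1)]

x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7]
print(running_median(x))  # [4, 4, 5, 5, 5, 6]
```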

Journal ArticleDOI
TL;DR: Two algorithms are presented for computing the discrete cosine transform (DCT) on existing VLSI structures, and a new prime factor DCT algorithm is presented for the class of DCTs of length $N = N_1 N_2$, where $N_1$ and $N_2$ are relatively prime and odd numbers.
Abstract: Two algorithms are presented for computing the discrete cosine transform (DCT) on existing VLSI structures. First, it is shown that the N-point DCT can be implemented on the existing systolic architecture for the N-point discrete Fourier transform (DFT) by introducing some modifications. Second, a new prime factor DCT algorithm is presented for the class of DCTs of length $N = N_1 N_2$, where $N_1$ and $N_2$ are relatively prime and odd numbers. It is shown that the proposed algorithm can be implemented on the already existing VLSI structures for prime factor DFT. The number of multipliers required is comparable to that required for the other fast DCT algorithms. It is shown that the discrete sine transform (DST) can be computed by the same structure.
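
Prime factor structures rest on index maps that turn a length-N transform, with N = N1·N2 and N1, N2 coprime, into an N1 × N2 two-dimensional one without twiddle factors. The sketch below builds the standard Good-Thomas maps for the DFT: the Ruritanian input map and the Chinese-remainder output map. It checks that both are bijections and that the kernel exponent separates. The DCT-specific mapping in the paper is not given in the abstract and is not reproduced here.

```python
from math import gcd

def pfa_maps(n1, n2):
    """Good-Thomas index maps for N = n1 * n2 with gcd(n1, n2) == 1.

    Input map (Ruritanian):  n = (n2*i1 + n1*i2) mod N
    Output map (CRT):        k = (t1*k1 + t2*k2) mod N, where
                             t1 = 1 (mod n1), 0 (mod n2) and t2 = 0 (mod n1), 1 (mod n2)
    """
    if gcd(n1, n2) != 1:
        raise ValueError("factors must be relatively prime")
    n = n1 * n2
    t1 = n2 * pow(n2, -1, n1)
    t2 = n1 * pow(n1, -1, n2)
    in_map = {(i1, i2): (n2 * i1 + n1 * i2) % n for i1 in range(n1) for i2 in range(n2)}
    out_map = {(k1, k2): (t1 * k1 + t2 * k2) % n for k1 in range(n1) for k2 in range(n2)}
    return in_map, out_map

in_map, out_map = pfa_maps(3, 5)
print(sorted(in_map.values()) == list(range(15)))   # True: every index used exactly once
```

With these maps, (n·k) mod N reduces to N2·(i1·k1 mod N1) + N1·(i2·k2 mod N2). As a result, the N-point DFT kernel factors into an N1-point kernel times an N2-point kernel. This is why existing prime factor DFT hardware can be reused.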

24 Jan 1990
TL;DR: The existence of decision algorithms with low-degree polynomial running times for a number of well-studied graph layout, placement, and routing problems is nonconstructively proved using the recent Robertson–Seymour theorems on the well-partial-ordering of graphs.
Abstract: We nonconstructively prove the existence of decision algorithms with low-degree polynomial running times for a number of well-studied graph layout, placement, and routing problems. Some were not previously known to be in P at all; others were only known to be in P by way of brute force or dynamic programming formulations with unboundedly high-degree polynomial running times. Our methods include the application of the recent Robertson-Seymour theorems on the well-partial-ordering of graphs under both the minor and immersion orders. We also briefly address the complexity of search versions of these problems.

Journal ArticleDOI
TL;DR: A polynomial time algorithm is developed for finding feasible reconfigurations in an augmented single-track model and in array grid models with multiple-track switches and it is shown that the set of conditions in the reconfigurability theorem is not necessary.
Abstract: The issue of developing efficient algorithms for reconfiguring processor arrays in the presence of faulty processors and fixed hardware resources is discussed. The models discussed consist of a set of identical processors embedded in a flexible interconnection structure that is configured in the form of a rectangular grid. An array grid model based on single-track switches is considered. An efficient polynomial time algorithm is proposed for determining feasible reconfigurations for an array with a given distribution of faulty processors. In the process, it is shown that the set of conditions in the reconfigurability theorem is not necessary. A polynomial time algorithm is developed for finding feasible reconfigurations in an augmented single-track model and in array grid models with multiple-track switches.

Journal ArticleDOI
01 Apr 1990
TL;DR: ‘Switched Currents’ is a new analogue sampled-data signal processing technique that can be implemented without the need for linear floating capacitors, which enables signal processors to be integrated in a standard digital VLSI process.
Abstract: ‘Switched Currents’ is a new analogue sampled-data signal processing technique that can be implemented without the need for linear floating capacitors. This feature enables signal processors to be integrated in a standard digital VLSI process (CMOS), making the technique ideally suited for mixed analogue/digital ICs. Switched current cells are described, and a configuration for a universal integrator is developed that is exactly equivalent to a switched capacitor counterpart. With this integrator, complete switched current filters may be constructed from switched capacitor prototypes. This is demonstrated by the ‘exact’ design and simulation of a sixth-order lowpass filter with a cutoff frequency of 5 MHz.