
Showing papers on "Very-large-scale integration published in 1985"


Journal ArticleDOI
TL;DR: This article provides a general overview of VLSI array processors and a unified treatment from algorithm, architecture, and application perspectives, covering a broad range of application domains including digital filtering, spectrum estimation, adaptive array processing, image/vision processing, and seismic and tomographic signal processing.
Abstract: High speed signal processing depends critically on parallel processor technology. In most applications, general-purpose parallel computers cannot offer satisfactory real-time processing speed due to severe system overhead. Therefore, for real-time digital signal processing (DSP) systems, special-purpose array processors have become the only appealing alternative. In designing or using such array processors, most signal processing algorithms share the critical attributes of regularity, recursiveness, and local communication. These properties are effectively exploited in innovative systolic and wavefront array processors. These arrays maximize the strength of very large scale integration (VLSI) in terms of intensive and pipelined computing, and yet circumvent its main limitation on communication. The application domain of such array processors covers a very broad range, including digital filtering, spectrum estimation, adaptive array processing, image/vision processing, and seismic and tomographic signal processing. This article provides a general overview of VLSI array processors and a unified treatment from algorithm, architecture, and application perspectives.

1,633 citations
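
Below is a minimal, software-only sketch of the pipelined computation pattern such arrays exploit: a broadcast-input, moving-result FIR array in which each cell holds one tap weight and partial sums advance one cell per clock. It illustrates the general idea, not any specific design from the article; the tap weights and input signal are made up.

```python
# Broadcast-input, moving-result FIR array: cell i holds one weight and a
# partial-sum register; all cells update simultaneously on a global clock.
def fir_systolic(samples, weights):
    """Computes y[n] = sum_k weights[k] * x[n-k] (zero history at start)."""
    w = list(reversed(weights))     # cell 0 holds the highest-order tap
    y = [0.0] * len(w)              # one partial-sum register per cell
    out = []
    for x in samples:               # one clock tick per input sample
        # Scan cells high-to-low so each reads its neighbor's *old* value,
        # mimicking simultaneous register updates in hardware.
        for i in reversed(range(len(w))):
            y[i] = (y[i - 1] if i else 0.0) + w[i] * x
        out.append(y[-1])           # the last cell emits one result per tick
    return out

print(fir_systolic([1, 0, 0, 2], [0.5, 0.25, 0.125]))
```

One result emerges per clock tick regardless of filter length, which is the throughput benefit the abstract attributes to intensive, pipelined computing.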


Journal ArticleDOI
TL;DR: The Cosmic Cube, as discussed by the authors, is a hardware simulation of a future VLSI implementation that will consist of single-chip nodes; the machine offers high degrees of concurrency in applications and suggests that future machines with thousands of nodes are both feasible and attractive.
Abstract: Sixty-four small computers are connected by a network of point-to-point communication channels in the plan of a binary 6-cube. This “Cosmic Cube” computer is a hardware simulation of a future VLSI implementation that will consist of single-chip nodes. The machine offers high degrees of concurrency in applications and suggests that future machines with thousands of nodes are both feasible and attractive.

1,232 citations
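
For a sense of the binary 6-cube plan, here is a tiny sketch (an illustration, not code from the paper): each of the 64 nodes is addressed by six bits, its point-to-point channels go to the six addresses differing in one bit, and message distance is the Hamming distance between addresses.

```python
# Binary n-cube topology: node i is wired to the n nodes whose addresses
# differ from i in exactly one bit position.
def hypercube_neighbors(node, dim=6):
    return [node ^ (1 << k) for k in range(dim)]

def hops(a, b):
    """Minimum channel count between two nodes = Hamming distance."""
    return bin(a ^ b).count("1")

assert sorted(hypercube_neighbors(0)) == [1, 2, 4, 8, 16, 32]
assert hops(0b000000, 0b111111) == 6   # worst case across the 64 nodes
```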


Journal ArticleDOI
TL;DR: In this article, the authors presented a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer, and proved that a fat-tree of a given size is nearly the best routing network of that size.
Abstract: The author presents a new class of universal routing networks, called fat-trees, which might be used to interconnect the processors of a general-purpose parallel supercomputer. A fat-tree routing network is parameterized not only in the number of processors, but also in the amount of simultaneous communication it can support. Since communication can be scaled independently from the number of processors, substantial hardware can be saved for such applications as finite-element analysis without resorting to a special-purpose architecture. It is proved that a fat-tree of a given size is nearly the best routing network of that size. This universality theorem is established using a three-dimensional VLSI model that incorporates wiring as a direct cost. In this model, hardware size is measured as physical volume. It is proved that for any given amount of communications hardware, a fat-tree built from that amount of hardware can simulate every other network built from the same amount of hardware, using only slightly more time (a polylogarithmic factor greater).

1,147 citations
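
The defining property is easy to state concretely: channel capacity grows from the leaves toward the root, so communication bandwidth scales independently of processor count. The sketch below assumes a simple doubling rule purely for illustration; the paper treats channel capacities as free parameters.

```python
# Illustrative fat-tree channel capacities, leaves (level 0) up to the root.
# The doubling growth factor is an assumption, not the paper's choice.
def fat_tree_capacities(levels, leaf_capacity=1, growth=2):
    return [leaf_capacity * growth**lvl for lvl in range(levels + 1)]

print(fat_tree_capacities(4))   # [1, 2, 4, 8, 16] -- fatter toward the root
```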


Journal ArticleDOI
A.E. Dunlop1, B.W. Kernighan2
TL;DR: A method of automatic placement for standard cells (polycells) that yields areas within 10-20 percent of careful hand placements is described, based on graph partitioning to identify groups of modules that ought to be close to each other.
Abstract: This paper describes a method of automatic placement for standard cells (polycells) that yields areas within 10-20 percent of careful hand placements. The method is based on graph partitioning to identify groups of modules that ought to be close to each other, and a technique for properly accounting for external connections at each level of partitioning. The placement procedure is in production use as part of an automated design system; it has been used in the design of more than 40 chips, in CMOS, NMOS, and bipolar technologies.

493 citations
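
The core move of the method, partitioning to keep tightly connected modules together, can be sketched in a few lines. The toy bipartitioner below uses a naive swap-improvement pass as a stand-in for a Kernighan-Lin-style partitioner; module and net names are hypothetical, and a real placer would recurse on the halves and account for external connections as the paper describes.

```python
import itertools

def cut_size(nets, left):
    """Number of nets with pins on both sides of the partition."""
    left = set(left)
    return sum(1 for net in nets if left & set(net) and set(net) - left)

def bipartition(modules, nets):
    half = len(modules) // 2
    left, right = modules[:half], modules[half:]
    improved = True
    while improved:                 # keep swapping while any swap helps
        improved = False
        for a, b in itertools.product(list(left), list(right)):
            trial = [m for m in left if m != a] + [b]
            if cut_size(nets, trial) < cut_size(nets, left):
                left.remove(a); right.remove(b)
                left.append(b); right.append(a)
                improved = True
                break               # rescan from the updated partition
    return left, right

nets = [("m1", "m2"), ("m2", "m3"), ("m3", "m4"), ("m1", "m2", "m3")]
print(bipartition(["m1", "m3", "m2", "m4"], nets))  # pairs m1/m2 and m3/m4
```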



Journal ArticleDOI
TL;DR: In this article, a model for interconnection time delay is developed that includes the effects of scaling transistor, interconnection, and chip dimensions, and propagation delays of aluminum, WSi2, and polysilicon lines are compared.
Abstract: The propagation delay of interconnection lines is a major factor in determining the performance of VLSI circuits because the RC time delay of these lines increases rapidly as chip size is increased and cross-sectional interconnection dimensions are reduced. In this paper, a model for interconnection time delay is developed that includes the effects of scaling transistor, interconnection, and chip dimensions. The delays of aluminum, WSi2, and polysilicon lines are compared, and propagation delays in future VLSI circuits are projected. Properly scaled multilevel conductors, repeaters, cascaded drivers, and cascaded driver/repeater combinations are investigated as potential methods for reducing propagation delay. The model yields optimal cross-sectional interconnection dimensions and driver/repeater configurations that can lower propagation delays by more than an order of magnitude in MOSFET circuits.

400 citations
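
A back-of-the-envelope version of the central observation, that distributed RC delay grows with the square of line length and with resistivity, can be coded directly. The resistivities and geometry below are textbook ballpark values chosen for illustration, not the paper's calibrated parameters.

```python
# Rough distributed-RC delay of an interconnect line over field oxide.
# Resistivities (ohm-m) are ballpark illustrative values.
RHO = {"aluminum": 2.7e-8, "WSi2": 7.0e-7, "polysilicon": 1.0e-5}
EPS_OX = 3.9 * 8.85e-12             # permittivity of SiO2, F/m

def rc_delay(material, length, width, thickness, t_ox):
    """~0.5*R*C Elmore-style delay; parallel-plate capacitance only."""
    r_total = RHO[material] * length / (width * thickness)
    c_total = EPS_OX * length * width / t_ox
    return 0.5 * r_total * c_total

# A 1 cm line, 1 um wide, 0.5 um thick, over 1 um of oxide:
for m in RHO:
    print(f"{m:12s} {rc_delay(m, 1e-2, 1e-6, 0.5e-6, 1e-6) * 1e9:8.3f} ns")
```

Doubling the length quadruples the delay, which is why repeaters and cascaded drivers, which break the line into shorter segments, pay off so well.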


Journal ArticleDOI
TL;DR: In this article, a pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2^m), with the simple squaring property of the normal basis representation used together with this multiplier.
Abstract: Finite field arithmetic logic is central in the implementation of Reed-Solomon coders and in some cryptographic algorithms. There is a need for good multiplication and inversion algorithms that can be easily realized on VLSI chips. Massey and Omura [1] recently developed a new multiplication algorithm for Galois fields based on a normal basis representation. In this paper, a pipeline structure is developed to realize the Massey-Omura multiplier in the finite field GF(2^m). With the simple squaring property of the normal basis representation used together with this multiplier, a pipeline architecture is also developed for computing inverse elements in GF(2^m). The designs developed for the Massey-Omura multiplier and the computation of inverse elements are regular, simple, expandable, and therefore, naturally suitable for VLSI implementation.

373 citations
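
The algebra the inversion pipeline leans on can be demonstrated in miniature: in GF(2^m), the inverse of a nonzero element a is a^(2^m - 2), so inversion reduces to multiplications and squarings, and in a normal basis squaring is merely a cyclic shift. The sketch below uses a polynomial basis for GF(2^4) (primitive polynomial x^4 + x + 1) and naive exponentiation, so it shows the underlying identity rather than the Massey-Omura structure itself.

```python
M, POLY = 4, 0b10011                 # GF(2^4) with x^4 + x + 1

def gf_mul(a, b):
    """Carry-free multiply with reduction modulo the field polynomial."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & (1 << M):             # degree overflow: reduce
            a ^= POLY
        b >>= 1
    return p

def gf_inv(a):
    """a^(2^M - 2) = a^(-1) for nonzero a (Fermat's little theorem)."""
    result = 1
    for _ in range(2**M - 2):
        result = gf_mul(result, a)
    return result

for a in range(1, 2**M):             # check every nonzero element
    assert gf_mul(a, gf_inv(a)) == 1
```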


Journal ArticleDOI
TL;DR: The proposed algorithm for optimal state assignment is based on an innovative strategy: logic minimization of the combinational component of the finite state machine is applied before state encoding; the algorithm has been coded in a computer program, KISS, and tested on several examples of finite state machines.
Abstract: Computer-Aided synthesis of sequential functions of VLSI systems, such as microprocessor control units, must include design optimization procedures to yield area-effective circuits. We model sequential functions as deterministic synchronous Finite State Machines (FSM's), and we consider a regular and structured implementation by means of Programmable Logic Arrays (PLA's) and feedback registers. State assignment, i.e., binary encoding of the internal states of the finite state machine, affects substantially the silicon area taken by such an implementation. Several state assignment techniques have been proposed in the past. However, to the best of our knowledge, no Computer-Aided Design tool is in use today for an efficient encoding of control logic. We propose an algorithm for optimal state assignment. Optimal state assignment is based on an innovative strategy: logic minimization of the combinational component of the finite state machine is applied before state encoding. Logic minimization is performed on a symbolic (code independent) description of the finite state machine. The minimal symbolic representation defines the constraints of a new encoding problem, whose solutions are the state assignments that allow the implementation of the PLA with at most as many product-terms as the cardinality of the minimal symbolic representation. In this class, an optimal encoding is one of minimal length satisfying these constraints. A heuristic algorithm constructs a solution to the constrained encoding problem. The algorithm has been coded in a computer program, KISS, and tested on several examples of finite state machines. Experimental results have shown that the method is an effective tool for designing finite state machines.

340 citations
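
As a concrete (and much simpler) illustration of why encoding matters, the toy below searches for codes that place strongly interacting states at small Hamming distance, a classic heuristic that tends to shrink the PLA. This is not the KISS algorithm, whose encoding constraints come from symbolic minimization; the state names and transition weights are hypothetical.

```python
import itertools

# Hypothetical FSM: weight = how often each pair of states interacts.
affinity = {("IDLE", "RUN"): 5, ("RUN", "WAIT"): 3, ("WAIT", "IDLE"): 1}
states = ["IDLE", "RUN", "WAIT"]

def cost(assign):
    """Weighted sum of Hamming distances between interacting states."""
    return sum(w * bin(assign[a] ^ assign[b]).count("1")
               for (a, b), w in affinity.items())

best = min((dict(zip(states, codes))
            for codes in itertools.permutations(range(4), len(states))),
           key=cost)
print(best)   # heavily interacting pairs end up one bit apart
```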


Journal ArticleDOI
TL;DR: The procedures are designed to minimize the length of the longest wire in the system, thus minimizing the communication time between cells; although the underlying network problems are NP-complete, the procedures are proved reliable under a probabilistic model of cell failure.
Abstract: VLSI technologists are fast developing wafer-scale integration. Rather than partitioning a silicon wafer into chips as is usually done, the idea behind wafer-scale integration is to assemble an entire system (or network of chips) on a single wafer, thus avoiding the costs and performance loss associated with individual packaging of chips. A major problem with assembling a large system of microprocessors on a single wafer, however, is that some of the processors, or cells, on the wafer are likely to be defective. In the paper, we describe practical procedures for integrating "around" such faults. The procedures are designed to minimize the length of the longest wire in the system, thus minimizing the communication time between cells. Although the underlying network problems are NP-complete, we prove that the procedures are reliable by assuming a probabilistic model of cell failure. We also discuss applications of the work to problems in VLSI layout theory, graph theory, fault-tolerant systems, planar geometry, and the probabilistic analysis of algorithms.

268 citations
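
The flavor of "integrating around" faults is easy to convey with a toy version: walk the wafer's cell grid in snake order, skip dead cells, and chain the live ones into a linear array, then measure the longest hop. The paper's procedures are far more sophisticated (and come with probabilistic guarantees), but they optimize exactly this longest-wire metric.

```python
def chain_live_cells(alive):
    """Chain working cells in boustrophedon order; alive is a 2D bool grid."""
    order = []
    for r, row in enumerate(alive):
        cols = range(len(row)) if r % 2 == 0 else reversed(range(len(row)))
        order += [(r, c) for c in cols if row[c]]
    return order

def longest_wire(chain):
    """Manhattan length of the worst hop between consecutive cells."""
    return max(abs(r1 - r2) + abs(c1 - c2)
               for (r1, c1), (r2, c2) in zip(chain, chain[1:]))

wafer = [[True, False, True],        # False = defective cell
         [True, True,  False],
         [True, True,  True]]
chain = chain_live_cells(wafer)
print(chain, "-> longest wire:", longest_wire(chain))
```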


Journal ArticleDOI
TL;DR: An overview of the physical principles and numerical methods used to solve the coupled system of non-linear partial differential equations that model the transient behavior of silicon VLSI device structures and a simple data structure for nonsymmetric matrices with symmetric nonzero structures is presented.
Abstract: In this paper, we present an overview of the physical principles and numerical methods used to solve the coupled system of nonlinear partial differential equations that model the transient behavior of silicon VLSI device structures. We also describe how the same techniques are applicable to circuit simulation. A composite linear multistep formula is introduced as the time-integration scheme. Newton-iterative methods are exploited to solve the nonlinear equations that arise at each time step. We also present a simple data structure for nonsymmetric matrices with symmetric nonzero structures that facilitates iterative or direct methods with substantial efficiency gains over other storage schemes. Several computational examples, including a CMOS latchup problem, are presented and discussed.

265 citations
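
The numerical backbone described here, an implicit linear multistep formula with a Newton solve at every time step, can be shown on a scalar problem. Backward Euler (the simplest such formula) applied to y' = -y^3 stands in for the huge nonlinear systems of device simulation; everything below is an illustrative assumption, not the paper's composite formula.

```python
def backward_euler_step(y_prev, h, tol=1e-12, max_iter=50):
    """Solve g(y) = y - y_prev + h*y**3 = 0 by Newton's method."""
    y = y_prev                        # predictor: previous time point
    for _ in range(max_iter):
        g = y - y_prev + h * y**3
        dg = 1.0 + 3.0 * h * y**2     # scalar "Jacobian"
        y_next = y - g / dg
        if abs(y_next - y) < tol:
            return y_next
        y = y_next
    raise RuntimeError("Newton iteration did not converge")

y, h = 1.0, 0.5                       # large step is fine: the method is implicit
for _ in range(5):
    y = backward_euler_step(y, h)
    print(y)                          # decays smoothly toward 0
```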


Journal ArticleDOI
TL;DR: A pipeline structure of a transform decoder similar to a systolic array is developed to decode Reed-Solomon (RS) codes, using a modified Euclidean algorithm for computing the error-locator polynomial.
Abstract: A pipeline structure of a transform decoder similar to a systolic array is developed to decode Reed-Solomon (RS) codes. An important ingredient of this design is a modified Euclidean algorithm for computing the error-locator polynomial. The computation of inverse field elements is completely avoided in this modification of Euclid's algorithm. The new decoder is regular and simple, and naturally suitable for VLSI implementation. An example illustrating both the pipeline and systolic array aspects of this decoder structure is given for a (15,9) RS code.

Journal ArticleDOI
TL;DR: The delay modeler executes 10 000 times as fast as SPICE, yet produces delay estimates that are typically within 10 percent of SPICE for digital circuits.
Abstract: Crystal is a timing verification program for digital nMOS and CMOS circuits. Using the circuit extracted from a mask set, the program determines the length of each clock phase and pinpoints the longest paths. Crystal can process circuits with about 40 000 transistors in about 20-30 min of VAX-11/780 CPU time. The program uses a switch-level approach in which the circuit is decomposed into chains of switches called stages. A depth-first search, with pruning, is used to trace out stages and locate the critical paths. Bidirectional pass transistor arrays are handled by having the designer tag such structures with flow control information, which is used by Crystal to avoid endless searches. Delays are computed on a stage-by-stage basis, using a simple resistor-switch model based on rise-time ratios (a measure of how fully turned-on the transistors in the stage are). The delay modeler executes 10 000 times as fast as SPICE, yet produces delay estimates that are typically within 10 percent of SPICE for digital circuits.
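
The path-search half of this approach reduces to finding the slowest path through a DAG of stages, which a memoized depth-first search handles neatly. The circuit, stage delays, and node names below are hypothetical, and the real program models delays with rise-time ratios rather than fixed numbers.

```python
import functools

# Hypothetical stage graph: (driver node, driven node, stage delay in ns).
edges = [("in", "n1", 2.0), ("in", "n2", 5.0), ("n1", "n3", 4.0),
         ("n2", "n3", 2.0), ("n3", "out", 3.0)]

@functools.cache
def worst_arrival(node):
    """Latest arrival time at node via depth-first search with memoization."""
    fan_in = [(u, d) for (u, v, d) in edges if v == node]
    return max((worst_arrival(u) + d for u, d in fan_in), default=0.0)

print(worst_arrival("out"))   # 10.0: critical path in -> n2 -> n3 -> out
```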

Journal ArticleDOI
TL;DR: This article describes efforts to build a knowledge-based expert system for designing testable VLSI chips and introduces a framework for a methodology incorporating structural, behavioral, qualitative, and quantitative aspects of known DFT techniques.
Abstract: The complexity of VLSI circuits has increased the need for design for testability (DFT). Numerous techniques for designing more easily tested circuits have evolved over the years, with particular emphasis on built-in testing approaches. What has not evolved is a design methodology for evaluating and making choices among the numerous existing approaches. This article describes efforts to build a knowledge-based expert system for designing testable VLSI chips. A framework for a methodology incorporating structural, behavioral, qualitative, and quantitative aspects of known DFT techniques is introduced. This methodology provides a designer with a systematic DFT synthesis approach. The process of partitioning a design into subcircuits for individual processing is described, and a new concept, the I-path, is used to transfer data from one place in the circuit to another. Rules for applying testable design methodologies to circuit partitions and for evaluating the various solutions obtained are also presented. Finally, a case study using a prototype system is described.

Journal ArticleDOI
TL;DR: In this paper, a design methodology was developed that yields devices which have low threshold voltage, high drive current, low leakage current, tight parametric control, and reduced topology, while requiring no nonstandard materials, processes, or tools.
Abstract: Building on nearly two decades of reported results for MOSFET's fabricated in small-grain polycrystalline silicon, a design methodology is developed that yields devices which have low threshold voltage, high drive current, low leakage current, tight parametric control, and reduced topology, while requiring no nonstandard materials, processes, or tools. Design criteria and device performance are discussed, grain boundary characterization techniques are described, technological issues pertinent to VLSI implementation are investigated, and long-term device reliability is studied. The potential applications of the polysilicon MOSFET's in high-density DRAM and SRAM are explored. The successful implementation of an experimental stacked CMOS 64K static RAM proves the utility of these devices for three-dimensional integration in a VLSI environment.

Journal ArticleDOI
TL;DR: Optimization of geometrical design rules, evaluation of VLSI IC artwork, and maximization of the wafer yield are discussed as examples illustrating applications and advantages of the proposed modeling technique.
Abstract: In this paper, a modeling technique, describing IC manufacturing yield losses in terms of parameters characterizing lithography related point defects and line registration errors, is presented. Optimization of geometrical design rules, evaluation of VLSI IC artwork, and maximization of the wafer yield are discussed as examples illustrating applications and advantages of the proposed modeling technique.
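
For context, these are the classic point-defect yield models that analyses like this one refine: Poisson (defects land independently) and negative binomial (defects cluster, with clustering parameter alpha). The formulas are standard; the numbers are illustrative assumptions, and the paper's model adds lithography-specific terms such as registration errors.

```python
import math

def yield_poisson(area_cm2, d0_per_cm2):
    """Poisson model: Y = exp(-A * D0)."""
    return math.exp(-area_cm2 * d0_per_cm2)

def yield_negative_binomial(area_cm2, d0_per_cm2, alpha):
    """Clustered defects: Y = (1 + A*D0/alpha)^(-alpha)."""
    return (1.0 + area_cm2 * d0_per_cm2 / alpha) ** -alpha

A, D0 = 0.5, 1.0                         # 0.5 cm^2 die, 1 defect per cm^2
print(yield_poisson(A, D0))              # ~0.607
print(yield_negative_binomial(A, D0, 2)) # ~0.640: clustering raises yield
```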

Book
01 Jan 1985
TL;DR: This dissertation shows that the recent trend in computer architecture towards instruction sets of increasing complexity leads to inefficient use of scarce resources and investigates the alternative of Reduced Instruction Set Computer (RISC) architectures which allow effective use of on-chip transistors in functional units that provide fast access to frequently used operands and instructions.
Abstract: The Reduced Instruction Set Computer (RISC) concept is an important new way of optimizing computer architecture. This book demonstrates the practicality of the RISC approach. Integrated circuits offer compact and low-cost implementation of digital systems, and provide performance gains through their high-bandwidth on-chip communication. In a single-chip microcomputer, however, the implementation trade-offs are different from those in traditional board-based mainframe computers. Because the total silicon area and the amount of allowable power dissipation are strictly limited, extra resources added to speed up some function of the chip will typically slow down other operations. This work demonstrates that the recent trend in computer architecture toward the use of increasingly complex instruction sets leads to the inefficient use of those scarce resources. Reduced Instruction Set Computer architectures offer an alternative by allowing for the effective use of on-chip transistors in functional units that provide fast access to frequently used operands and instructions. Contents: Introduction (the RISC Concept, Effective Use of Hardware Resources, Evolution of the Berkeley RISC Project); The Nature of General-Purpose Computations; The RISC I and II Architecture and Pipeline; The RISC II Design and Layout; Debugging and Testing RISC II; Additional Hardware Support for General-Purpose Computations; Conclusions; Appendix A: Detailed Description of the RISC II Architecture. Manolis G. H. Katevenis received his doctorate from the University of California at Berkeley. He is currently Assistant Professor of Computer Science at Stanford University. "Reduced Instruction Set Computer Architectures for VLSI" is the winner of the 1984 ACM Doctoral Dissertation Award.

Proceedings ArticleDOI
01 Dec 1985
TL;DR: These algorithms subsume most of the polynomial-time algorithms in the literature for planar routing and routability testing in the rectilinear grid model and provide an explicit construction of a database to support computation involving the layout topology.
Abstract: This paper studies the problem of routing wires in a grid among features on one layer of a VLSI chip, when a sketch of the layer is given. A sketch specifies the positions of features and the topology of the interconnecting wires. We give polynomial-time algorithms that (1) determine the routability of a sketch, and (2) produce a routing of a sketch that optimizes both individual and total wire length. These algorithms subsume most of the polynomial-time algorithms in the literature for planar routing and routability testing in the rectilinear grid model. We also provide an explicit construction of a database, called the rubber-band equivalent, to support computation involving the layout topology.

Book
01 Jan 1985
TL;DR: A review of microelectronics and an introduction to MOS technology, basic electrical properties of MOS and BiCMOS circuits, design processes, basic circuit concepts, scaling, and an illustration of the design process.
Abstract: A review of microelectronics and an introduction to MOS technology; basic electrical properties of MOS and BiCMOS circuits; MOS and BiCMOS circuit design processes; basic circuit concepts; scaling of MOS circuits; subsystem design and layout; subsystems design processes; illustration of the design process - computational elements; memory, registers and aspects of system timing; practical aspects and testability; some CMOS design projects; ultra fast VLSI circuits and systems - introduction to GaAs technology.

Proceedings ArticleDOI
Michael Burstein1, Mary N. Youssef1
01 Jun 1985
TL;DR: This work presents a new approach to the automatic layout design for VLSI chips which incorporates timing information to influence the placement and wiring processes, and adds a third phase of timing to the hierarchy without affecting the computational complexity of the basic algorithm.
Abstract: We present a new approach to the automatic layout design for VLSI chips which incorporates timing information to influence the placement and wiring processes. This approach is an extension of the hierarchical layout method, in which placement and wiring are performed simultaneously [1]. We add a third phase of timing to the hierarchy, without affecting the computational complexity of the basic algorithm. Prior to the physical design, timing analysis is performed using statistical estimates for the unknown parameters; namely the lengths of interconnecting wires. The output of this analysis includes a measure for each net that indicates the degree of its contribution to the timing problem. This set of measures is used to bias the placement at the highest level of the hierarchy. Since wiring is performed after each level of partitioning, lengths of interconnecting nets among the partitions become available. These data are used to update the timing information that bias the design. Preliminary results show that, while delays due to interconnections are reduced, wireability of the chip does not deteriorate.
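
The biasing mechanism can be pictured as a weighted wirelength objective: each net's length contribution is scaled by its timing criticality, so critical nets pull their cells together harder. The half-perimeter metric, net weights, and positions below are illustrative assumptions, not the paper's exact cost function.

```python
def weighted_wirelength(nets, pos):
    """Sum over nets of criticality * half-perimeter bounding box."""
    total = 0.0
    for pins, criticality in nets:
        xs = [pos[p][0] for p in pins]
        ys = [pos[p][1] for p in pins]
        total += criticality * ((max(xs) - min(xs)) + (max(ys) - min(ys)))
    return total

pos = {"a": (0, 0), "b": (3, 1), "c": (1, 4)}          # cell placements
nets = [(("a", "b"), 3.0),   # timing-critical net: weight 3
        (("b", "c"), 1.0)]   # slack-rich net: weight 1
print(weighted_wirelength(nets, pos))   # 3*(3+1) + 1*(2+3) = 17
```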

Book
01 Jan 1985
TL;DR: Design for testability techniques offer one approach toward alleviating the growing difficulty and cost of testing dense VLSI circuits, by adding enough extra circuitry to a circuit or chip to reduce the complexity of testing.
Abstract: Today's computers must perform with increasing reliability, which in turn depends on the problem of determining whether a circuit has been manufactured properly or behaves correctly. However, the greater circuit density of VLSI circuits and systems has made testing more difficult and costly. This book notes that one solution is to develop faster and more efficient algorithms to generate test patterns or use design techniques to enhance testability - that is, "design for testability." Design for testability techniques offer one approach toward alleviating this situation by adding enough extra circuitry to a circuit or chip to reduce the complexity of testing. Because the cost of hardware is decreasing as the cost of testing rises, there is now a growing interest in these techniques for VLSI circuits.The first half of the book focuses on the problem of testing: test generation, fault simulation, and complexity of testing. The second half takes up the problem of design for testability: design techniques to minimize test application and/or test generation cost, scan design for sequential logic circuits, compact testing, built-in testing, and various design techniques for testable systems.Hideo Fujiwara is an associate professor in the Department of Electronics and Communication, Meiji University. Logic Testing and Design for Testability is included in the Computer Systems Series, edited by Herb Schwetman.

Journal ArticleDOI
01 Jun 1985
TL;DR: PIPE (Parallel Instructions and Pipelined Execution) is a research vehicle for studying high-performance VLSI architectures and organizations; it is pipelined, makes extensive use of architectural queues, and supports a decoupled mode of operation. This paper describes the architecture and its implementation.
Abstract: PIPE (Parallel Instructions and Pipelined Execution) is a research vehicle for studying high performance VLSI architectures and organizations. Its principal features are: 1) it is pipelined; 2) it makes extensive use of architectural queues; 3) it is capable of a decoupled mode of operation where two processors cooperate in executing the same task and communicate via hardware queues; 4) it has an instruction cache; and 5) it has a memory interface that allows overlapped memory transactions. This paper describes the architecture and its implementation. Results of extensive simulation studies are reported.

Journal ArticleDOI
TL;DR: A new family of ternary logic circuits that uses both depletion and enhancement types of complementary metal-oxide semiconductor (CMOS) transistors is presented.
Abstract: A new family of ternary logic circuits that uses both depletion and enhancement types of complementary metal-oxide semiconductor (CMOS) transistors is presented. These circuits use two power supplies, each below the transistors' threshold voltages, and do not include resistors. Circuit designs of basic ternary operators (inverters, NAND, NOR) are described. These basic ternary operators can be used as building blocks in the VLSI implementation of three-valued digital systems. An example of the design of a ternary full adder using this family of logic circuits is also presented.
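
Functionally, the basic ternary operators reduce to simple arithmetic on the three logic levels, conventionally encoded 0, 1, 2: NOT(x) = 2 - x, AND = min, OR = max. The sketch below shows only this logic-level behavior, not the two-supply depletion/enhancement CMOS circuits the paper actually builds.

```python
def t_not(x):      return 2 - x          # ternary inverter
def t_nand(x, y):  return 2 - min(x, y)  # negation of ternary AND (min)
def t_nor(x, y):   return 2 - max(x, y)  # negation of ternary OR (max)

for x in range(3):
    for y in range(3):
        print(f"x={x} y={y}  NAND={t_nand(x, y)}  NOR={t_nor(x, y)}")
```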


Journal ArticleDOI
Fisher1, Kung1
TL;DR: This paper provides a spectrum of synchronization models; based on the assumptions made for each model, theoretical lower bounds on clock skew are derived, and appropriate or best possible synchronization schemes for large processor arrays are proposed.
Abstract: Highly parallel VLSI computing structures consist of many processing elements operating simultaneously. In order for such processing elements to communicate among themselves, some provision must be made for synchronization of data transfer. The simplest means of synchronization is the use of a global clock. Unfortunately, large clocked systems can be difficult to implement because of the inevitable problem of clock skews and delays, which can be especially acute in VLSI systems as feature sizes shrink. For the near term, good engineering and technology improvements can be expected to maintain the feasibility of clocking in such systems; however, clock distribution problems crop up in any technology as systems grow. An alternative means of enforcing necessary synchronization is the use of self-timed asynchronous schemes, at the cost of increased design complexity and hardware cost. Realizing that different circumstances call for different synchronization methods, this paper provides a spectrum of synchronization models; based on the assumptions made for each model, theoretical lower bounds on clock skew are derived, and appropriate or best possible synchronization schemes for large processor arrays are proposed.

Proceedings ArticleDOI
01 May 1985
TL;DR: A pipeline structure of a transform decoder similar to a systolic array is developed to decode Reed-Solomon (RS) codes; it is regular and simple, and naturally suitable for VLSI implementation.
Abstract: A pipeline structure of a transform decoder similar to a systolic array is developed to decode Reed-Solomon (RS) codes. The error locator polynomial is computed by a modified Euclid's algorithm which avoids computing inverse field elements. The new decoder is regular and simple, and naturally suitable for VLSI implementation.

Journal ArticleDOI
TL;DR: Parallel algorithms for data compression by textual substitution that are suitable for VLSI implementation are studied and both “static” and “dynamic” dictionary schemes are considered.
Abstract: Parallel algorithms for data compression by textual substitution that are suitable for VLSI implementation are studied. Both “static” and “dynamic” dictionary schemes are considered.
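
A minimal static-dictionary scheme makes the idea concrete: parse the text by longest match against a fixed dictionary, emitting dictionary references or literal characters. This is a sequential toy for illustration; the paper's contribution is parallel, VLSI-suitable versions of such schemes, and the dictionary here is a made-up example.

```python
DICT = ["the", "there", "he", "her"]     # hypothetical static dictionary

def compress(text):
    """Longest-match textual substitution into reference/literal tokens."""
    out, i = [], 0
    while i < len(text):
        match = max((w for w in DICT if text.startswith(w, i)),
                    key=len, default=None)
        if match:
            out.append(("D", DICT.index(match)))   # dictionary reference
            i += len(match)
        else:
            out.append(("L", text[i]))             # literal character
            i += 1
    return out

def decompress(tokens):
    return "".join(DICT[v] if k == "D" else v for k, v in tokens)

tokens = compress("hethere")
assert decompress(tokens) == "hethere"
print(tokens)   # [('D', 2), ('D', 1)]
```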

Journal ArticleDOI
TL;DR: Magic is a new IC layout system that includes several facilities traditionally contained in separate batch-processing programs that incorporate expertise about design rules, connectivity, and routing directly into the layout editor and uses this information to provide several unusual features.
Abstract: Magic is a new IC layout system that includes several facilities traditionally contained in separate batch-processing programs. Magic incorporates expertise about design rules, connectivity, and routing directly into the layout editor and uses this information to provide several unusual features. They include a continuous design-rule checker that operates in background and maintains an up-to-date picture of violations; a hierarchical circuit extractor that only re-extracts portions of the circuit that have changed; an operation called plowing that permits interactive stretching and compaction; and a suite of routing tools that can work under and around existing connections in the channels. A design style called logs and a data structure called corner stitching are used to achieve an efficient implementation of the system.

Journal ArticleDOI
TL;DR: By processing several practical VLSI circuits, it is shown that the method is very effective for handling various kinds of blocks and is able to reduce the design effort required to achieve the chip floor plan.
Abstract: In a hierarchical VLSI layout design, the block-level layout design is called a "chip floor plan." In this paper, a semi-automatic VLSI chip floor plan algorithm and its implementation are presented. The initial block placement is obtained by an attractive and repulsive force method (AR method), and the subsequent block packing process is performed by gradually moving and reshaping blocks with chip boundary shrinking. The chip area estimation is performed by using individual block area calculations from empirically obtained equations. A set of interactive commands is also provided to facilitate the manual optimization processes using a color graphic terminal. By processing several practical VLSI circuits, it is shown that the method is very effective for handling various kinds of blocks and is able to reduce the design effort required to achieve the chip floor plan.
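
One update step of an attractive/repulsive scheme of the kind the AR method's name suggests might look as follows: connected blocks attract in proportion to their separation, and all block pairs repel at close range. The constants, block names, and force laws are illustrative guesses, not the paper's formulation.

```python
def ar_step(pos, nets, k_attr=0.1, k_rep=0.5, min_d2=1e-3):
    """One force-directed update of block positions (2D points)."""
    force = {b: [0.0, 0.0] for b in pos}
    for a, b in nets:                              # attraction along nets
        dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
        force[a][0] += k_attr * dx; force[a][1] += k_attr * dy
        force[b][0] -= k_attr * dx; force[b][1] -= k_attr * dy
    blocks = list(pos)
    for i, a in enumerate(blocks):                 # pairwise repulsion
        for b in blocks[i + 1:]:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d2 = max(dx * dx + dy * dy, min_d2)
            force[a][0] += k_rep * dx / d2; force[a][1] += k_rep * dy / d2
            force[b][0] -= k_rep * dx / d2; force[b][1] -= k_rep * dy / d2
    return {b: (pos[b][0] + fx, pos[b][1] + fy) for b, (fx, fy) in force.items()}

pos = {"alu": (0.0, 0.0), "rom": (4.0, 0.0), "ram": (2.0, 3.0)}
print(ar_step(pos, nets=[("alu", "ram"), ("ram", "rom")]))
```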

Journal Article
TL;DR: The paper discusses VLSI design, trace theory, finite state machines, and an implementation strategy based on what the authors have rejected.