
Showing papers on "Very-large-scale integration published in 1984"


Journal ArticleDOI
01 Jul 1984
TL;DR: The combination of decreasing feature sizes and increasing chip sizes is leading to a communication crisis in the area of VLSI circuits and systems, and the possibility of applying optical and electrooptical technologies to such interconnection problems is investigated.
Abstract: The combination of decreasing feature sizes and increasing chip sizes is leading to a communication crisis in the area of VLSI circuits and systems. It is anticipated that the speeds of MOS circuits will soon be limited by interconnection delays, rather than gate delays. This paper investigates the possibility of applying optical and electrooptical technologies to such interconnection problems. The origins of the communication crisis are discussed. Those aspects of electrooptic technology that are applicable to the generation, routing, and detection of light at the level of chips and boards are reviewed. Algorithmic implications of interconnections are discussed, with emphasis on the definition of a hierarchy of interconnection problems from the signal-processing area having an increasing level of complexity. One potential application of optical interconnections is to the problem of clock distribution, for which a single signal must be routed to many parts of a chip or board. More complex is the problem of supplying data interconnections via optical technology. Areas in need of future research are identified.

1,187 citations


Book
01 Jan 1984

862 citations


Journal ArticleDOI
TL;DR: A formal theory of MOS logic circuits is developed starting from a description of circuit behavior in terms of switch graphs and an algorithm for a logic simulator based on the switch-level model which computes the new state of the network by solving a set of equations in a simple, discrete algebra.
Abstract: The switch-level model describes the logical behavior of digital systems implemented in metal oxide semiconductor (MOS) technology. In this model a network consists of a set of nodes connected by transistor "switches" with each node having a state 0, 1, or X (for invalid or uninitialized), and each transistor having a state "open," "closed," or "indeterminate." Many characteristics of MOS circuits can be modeled accurately, including: ratioed, complementary, and precharged logic; dynamic and static storage; (bidirectional) pass transistors; buses; charge sharing; and sneak paths. In this paper we present a formal development of the switch-level model starting from a description of circuit behavior in terms of switch graphs. Then we describe an algorithm for a logic simulator based on the switch-level model which computes the new state of the network by solving a set of equations in a simple, discrete algebra. This algorithm has been implemented in the simulator MOSSIM II and operates at speeds approaching those of conventional logic gate simulators. By developing a formal theory of MOS logic circuits, we have achieved a greater degree of generality and accuracy than is found in other logic simulators for MOS.
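To make the three-valued setting concrete, here is a minimal Python sketch (with invented names; MOSSIM II itself is not written this way) of the state set {0, 1, X} and the kind of merge operation a switch-level simulator uses to resolve a node driven through several closed switches:

```python
X = "X"  # invalid or uninitialized, the third node state

def merge(a, b):
    """Combine two values reaching the same node: agreement keeps the
    value, disagreement (a 'fight' between drivers) yields X."""
    if a is None:            # None marks 'no driver on this path'
        return b
    if b is None or a == b:
        return a
    return X

def resolve(driven_values):
    """New state of a node given the values arriving through all of its
    closed switches.  A node with no driver is left as X here; a fuller
    model would instead retain its stored charge (dynamic storage)."""
    value = None
    for v in driven_values:
        value = merge(value, v)
    return X if value is None else value

print(resolve([1, 1]))   # 1
print(resolve([1, 0]))   # X  (conflicting drivers)
print(resolve([0, X]))   # X  (uncertainty propagates)
```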

386 citations


01 Jan 1984

329 citations


Journal ArticleDOI
TL;DR: VLSI implementations have constraints which differ from those of discrete implementations, requiring another look at some of the typical FFT algorithms in the light of these constraints.
Abstract: In some signal processing applications, it is desirable to build very high performance fast Fourier transform (FFT) processors. To meet the performance requirements, these processors are typically highly pipelined. Until the advent of VLSI, it was not possible to build a single chip which could be used to construct pipeline FFT processors of a reasonable size. However, VLSI implementations have constraints which differ from those of discrete implementations, requiring another look at some of the typical FFT algorithms in the light of these constraints.
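For reference, the computational kernel at issue is the radix-2 butterfly, repeated in log2(N) stages; a pipeline FFT processor dedicates hardware to each stage. A plain software sketch (illustrative only; the paper's point is that the VLSI-appropriate variant may differ from this textbook form):

```python
import cmath

def fft_radix2(x):
    """Iterative radix-2 Cooley-Tukey FFT (decimation in time).  Each
    pass of the outer `while` loop is one stage of butterflies, the
    unit a pipeline FFT processor replicates in hardware."""
    n = len(x)
    assert n and n & (n - 1) == 0, "length must be a power of two"
    a = list(x)
    # Bit-reversal permutation of the input order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # log2(n) butterfly stages.
    size = 2
    while size <= n:
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1.0
            for k in range(start, start + size // 2):
                t = w * a[k + size // 2]
                a[k], a[k + size // 2] = a[k] + t, a[k] - t
                w *= w_step
        size *= 2
    return a

# Quick check: the DFT of an impulse is flat.
print(fft_radix2([1, 0, 0, 0]))  # four (1+0j) values
```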

327 citations


Journal ArticleDOI
TL;DR: The algorithms are presented under a simplified model of VLSI circuits, and the storage requirements of the structure are discussed.
Abstract: Corner stitching is a technique for representing rectangular two-dimensional objects. It is especially well suited for interactive VLSI layout editing systems. The data structure has two important features: first, empty space is represented explicitly; and second, rectangular areas are stitched together at their corners like a patchwork quilt. This organization results in fast algorithms (linear or constant expected time) for searching, creation, deletion, stretching, and compaction. The algorithms are presented under a simplified model of VLSI circuits, and the storage requirements of the structure are discussed. Corner stitching has been implemented in a working layout editor. Initial measurements indicate that it requires about three times as much memory space as the simplest possible representation.
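The essence of the structure is easy to sketch: every tile (solid or space) carries four corner stitches, and point location walks those stitches. A minimal Python illustration (field names follow the usual rt/tr/bl/lb convention; the working editor's implementation is of course richer):

```python
class Tile:
    """A corner-stitched tile: an axis-aligned rectangle with four
    corner stitches.  Space tiles are stored explicitly, which is the
    first of the paper's two key features."""
    def __init__(self, x_lo, y_lo, x_hi, y_hi, solid=False):
        self.x_lo, self.y_lo, self.x_hi, self.y_hi = x_lo, y_lo, x_hi, y_hi
        self.solid = solid
        self.rt = None  # from the top-right corner to the tile above
        self.tr = None  # from the top-right corner to the tile to the right
        self.bl = None  # from the bottom-left corner to the tile to the left
        self.lb = None  # from the bottom-left corner to the tile below

def point_find(start, x, y):
    """Walk stitches from an arbitrary starting tile to the tile that
    contains point (x, y): move vertically until the y-range matches,
    then horizontally, repeating until both ranges hold.  Expected cost
    is low for nearby targets, which suits interactive editing."""
    t = start
    while True:
        while not (t.y_lo <= y < t.y_hi):
            t = t.rt if y >= t.y_hi else t.lb
        if t.x_lo <= x < t.x_hi:
            return t
        while not (t.x_lo <= x < t.x_hi):
            t = t.tr if x >= t.x_hi else t.bl
        if t.y_lo <= y < t.y_hi:
            return t
```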

286 citations



Journal ArticleDOI
TL;DR: This tutorial paper addresses some of the principles and provides examples of concurrent architectures and designs that have been inspired by VLSI technology.
Abstract: This tutorial paper addresses some of the principles and provides examples of concurrent architectures and designs that have been inspired by VLSI technology. The circuit density offered by VLSI provides the means for implementing systems with very large numbers of computing elements, while its physical characteristics provide an incentive to organize systems so that the elements are relatively loosely coupled. One class of computer architectures that evolve from this reasoning includes an interesting and varied class of concurrent machines that adhere to a structural model based on the repetition of regularly connected elements. The systems included under this structural model range from 1) systems that combine storage and logic at a fine grain size, and are typically aimed at computations with images or storage retrieval, to 2) systems that combine registers and arithmetic at a medium grain size to form computational or systolic arrays for signal processing and matrix computations, to 3) arrays of instruction interpreting computers that use teamwork to perform many of the same demanding computations for which we use high-performance single process computers today.
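As a concrete instance of the "medium grain" class, here is a cycle-by-cycle toy simulation of an n×n systolic array computing a matrix product, with operands entering skewed from the left and top edges (a generic textbook arrangement, not a specific machine from the paper):

```python
def systolic_matmul(A, B):
    """Simulate an n x n systolic array computing C = A @ B.  Row i of A
    enters from the left edge delayed by i cycles; column j of B enters
    from the top delayed by j cycles.  Each cell multiply-accumulates
    and passes its operands right and down one cell per cycle."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]  # operand held by cell, moving right
    b_reg = [[0] * n for _ in range(n)]  # operand held by cell, moving down
    for t in range(3 * n - 2):           # enough cycles to drain the array
        for i in reversed(range(n)):     # far cells first, so neighbor
            for j in reversed(range(n)): # registers still hold old values
                a_in = a_reg[i][j - 1] if j > 0 else (
                    A[i][t - i] if 0 <= t - i < n else 0)
                b_in = b_reg[i - 1][j] if i > 0 else (
                    B[t - j][j] if 0 <= t - j < n else 0)
                C[i][j] += a_in * b_in
                a_reg[i][j], b_reg[i][j] = a_in, b_in
    return C

print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
# [[19, 22], [43, 50]]
```

The skewing guarantees that A[i][k] and B[k][j] meet at cell (i, j) on cycle i + j + k, so each cell needs only a multiplier, an adder, and two pass-through registers.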

252 citations


Proceedings ArticleDOI
25 Jun 1984
TL;DR: The Magic layout system incorporates expertise about design rules and connectivity directly into the layout system in order to implement powerful new operations, including: a continuous design-rule checker that operates in background to maintain an up-to-date picture of violations.
Abstract: Magic is a "smart" layout system for integrated circuits. The user interface is based on a new design style called logs, which combines the efficiency of mask-level design with the flexibility of symbolic design. The system incorporates expertise about design rules and connectivity directly into the layout system in order to implement powerful new operations, including: a continuous design-rule checker that operates in background to maintain an up-to-date picture of violations; an operation called plowing that permits interactive stretching and compaction; and routing tools that can work under and around existing connections in the channels. Magic uses a new data structure called corner stitching to achieve an efficient implementation of these operations.

244 citations


Journal ArticleDOI
Hennessy
TL;DR: In a VLSI implementation of an architecture, many problems can arise from the base technology and its limitations, so the architects must be aware of these limitations and understand their implications at the instruction set level.
Abstract: A processor architecture attempts to compromise between the needs of programs hosted on the architecture and the performance attainable in implementing the architecture. The needs of programs are most accurately reflected by the dynamic use of the instruction set as the target for a high level language compiler. In VLSI, the issue of implementation of an instruction set architecture is significant in determining the features of the architecture. Recent processor architectures have focused on two major trends: large microcoded instruction sets and simplified, or reduced, instruction sets. The attractiveness of these two approaches is affected by the choice of a single-chip implementation. The two different styles require different tradeoffs to attain an implementation in silicon with a reasonable area. The two styles consume the chip area for different purposes, thus achieving performance by different strategies. In a VLSI implementation of an architecture, many problems can arise from the base technology and its limitations. Although circuit design techniques can help alleviate many of these problems, the architects must be aware of these limitations and understand their implications at the instruction set level.

216 citations


Journal ArticleDOI
TL;DR: Attention is paid to fabrication tolerances, wire capacitance, wire resistance, coupling capacitances, capacitance associated with contacts, and the aspect ratio of (non-rectangular) transistors.

Journal ArticleDOI
TL;DR: An MOS ternary-logic family comprising a set of inverters, NOR gates, and NAND gates is proposed, and the paper concludes with an implementation of cyclic convolution, an application in which a significant advantage can be gained through the use of ternary digital hardware.
Abstract: An MOS ternary-logic family comprising a set of inverters, NOR gates, and NAND gates is proposed. These gates are used to design basic ternary arithmetic and memory circuits. The circuits thus obtained are then used to synthesize complex ternary arithmetic circuits and shift registers. The ternary circuits developed are shown to have some significant advantages relative to other known ternary circuits; these include low power dissipation and reduced propagation delay and component count. For a given dynamic range, the complexity of the new ternary circuits is shown to be comparable to that of corresponding binary circuits. Nevertheless, the associated reduction in the wordlength in the case of the ternary circuits tends to alleviate to a large extent the pin limitation problem associated with VLSI implementation. The authors conclude with an implementation of the cyclic convolution, an application in which a significant advantage can be gained through the use of ternary digital hardware.
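Algebraically, such a gate family can be summarized over the value set {0, 1, 2}; the sketch below gives the customary truth functions (the paper's contribution is the MOS circuit realizations and their power/delay figures, not these definitions):

```python
def sti(x):   return 2 - x                 # simple ternary inverter
def nti(x):   return 2 if x == 0 else 0    # negative ternary inverter
def pti(x):   return 0 if x == 2 else 2    # positive ternary inverter
def tnand(a, b): return 2 - min(a, b)      # ternary NAND = invert(min)
def tnor(a, b):  return 2 - max(a, b)      # ternary NOR  = invert(max)

# The wordlength argument: each ternary digit carries log2(3) ~ 1.58
# bits, so fewer digits (and fewer pins) cover the same dynamic range.
import math
print(round(8 * math.log2(3), 2))  # an 8-trit bus spans ~12.68 bits
```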

Patent
28 Jun 1984
TL;DR: In this paper, placement and wiring steps for the layout of a master-slice VLSI chip are alternated in an iterative process: the chip area is partitioned into subareas of decreasing size, the set of components is partitioned into subsets that fit the respective subareas, and after each partitioning step the global wiring is determined for the existing subnets of the whole network.
Abstract: For designing the layout of a master-slice VLSI chip, steps for placing components and for determining the wiring pattern interconnecting them are alternated in an iterative process. The chip area is partitioned into subareas of decreasing size, the set of components is partitioned into subsets that fit the respective subareas, and after each partitioning step the global wiring is determined for the existing subnets of the whole network. Due to this interrelation of placement and wiring procedures, advantages with respect to total wire length, number of overflow wires, and processing time can be gained.
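The placement half of such a flow rests on repeated bipartitioning of the netlist. A toy sketch of one cut-reducing pass (the patent does not specify this particular heuristic; greedy pair-swap improvement merely stands in for it):

```python
def cut_size(nets, side):
    """Number of nets with components on both sides of the split."""
    return sum(1 for net in nets
               if any(side[c] for c in net) and not all(side[c] for c in net))

def bipartition(components, nets, passes=3):
    """Start from an arbitrary balanced split, then accept any pair swap
    that reduces the number of cut nets.  In the patent's flow, each
    partitioning level is followed by global wiring of the subnets."""
    half = len(components) // 2
    side = {c: (i < half) for i, c in enumerate(components)}
    best = cut_size(nets, side)
    for _ in range(passes):
        for a in components:
            for b in components:
                if side[a] == side[b]:
                    continue
                side[a], side[b] = side[b], side[a]      # trial swap
                new = cut_size(nets, side)
                if new < best:
                    best = new                            # keep it
                else:
                    side[a], side[b] = side[b], side[a]  # undo

    return side, best

comps = ["c1", "c2", "c3", "c4"]
nets = [["c1", "c2"], ["c3", "c4"], ["c1", "c3"]]
print(bipartition(comps, nets))  # keeps c1/c2 and c3/c4 together; cut = 1
```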

Proceedings ArticleDOI
06 Aug 1984
TL;DR: This paper presents a VLSI design language μFP, which is a variant of Backus' FP, and shows how μFP is constructed by the addition of a single combining form μ, which encapsulates a very simple notion of “state”.
Abstract: In this paper, we present a VLSI design language μFP, which is a variant of Backus' FP [Backus 78, 81]. μFP differs from conventional VLSI design languages in that it can describe both the semantics (or behaviour) of a circuit and its layout (or floorplan) [Sheeran 83]. We chose to base our design language on FP for several reasons. Functional programs are easier to write and to reason about than imperative ones. We hope to bring some of these benefits to IC design. FP, in particular, is designed to allow the programmer to reason about his or her programs by manipulating the programs themselves. Likewise, in μFP, programs (or circuit descriptions) are just expressions “made” from a small number of primitive functions and combining forms (functionals that map functions into functions). These functions and combining forms (CFs) were chosen because they have nice algebraic properties. Thus, circuit descriptions are concise and can be easily manipulated using the algebraic laws of the language. Also, each CF has a simple geometric interpretation, so that every μFP expression has an associated floorplan. This interpretation exists because μFP expressions represent functions rather than objects, allowing us to associate a function with each section of the floorplan. Most VLSI design languages are designed either for layout description or for behavioural specification. μFP, with its dual interpretation, allows the designer to consider the effect on the final layout of a particular design decision or to manipulate the layout while keeping the semantics constant. In the following sections, we show how μFP is constructed from FP by the addition of a single combining form μ, which encapsulates a very simple notion of “state”.
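The flavor of combining forms, and of μ in particular, can be mimicked in any functional host. A Python toy (not μFP itself; just its "state" idea rendered as a stream transformer):

```python
def compose(f, g):           # FP composition: (f . g)(x) = f(g(x))
    return lambda x: f(g(x))

def construction(*fs):       # FP construction: apply every f to one input
    return lambda x: tuple(f(x) for f in fs)

def apply_to_all(f):         # FP 'alpha': map f over a sequence
    return lambda xs: tuple(f(x) for x in xs)

def mu(f, state0):
    """The single new combining form: f maps (input, state) to
    (output, new_state); mu(f) threads the state through a whole input
    stream, modelling a circuit with feedback through a register."""
    def run(inputs):
        state, outputs = state0, []
        for x in inputs:
            y, state = f((x, state))
            outputs.append(y)
        return tuple(outputs)
    return run

# mu of 'swap' is a unit delay: a register initialized to 0.
delay = mu(lambda p: (p[1], p[0]), 0)
print(delay([1, 2, 3]))  # (0, 1, 2)
```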

Proceedings ArticleDOI
25 Jun 1984
TL;DR: Three delay models for large digital MOS circuits are presented, ranging from an RC model that typically errs by 25% to a slope-based model whose delay estimates are typically within 10% of SPICE's estimates.
Abstract: This paper presents fast, simple, and relatively accurate delay models for large digital MOS circuits. Delay modeling is organized around chains of switches and nodes called stages, instead of logic gates. The use of stages permits both logic gates and pass transistor arrays to be handled in a uniform fashion. Three delay models are presented, ranging from an RC model that typically errs by 25% to a slope-based model whose delay estimates are typically within 10% of SPICE's estimates. The slope model is parameterized in terms of the ratio between the slopes of a stage's input and output waveforms. All the models have been implemented in the Crystal timing analyzer. They are evaluated by comparing their delay estimates to SPICE, using a dozen critical paths from two VLSI designs.
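The simplest of the three models is Elmore-style RC estimation over a stage's switch chain; a bare-bones version for a ladder is below (a generic sketch of the RC model's flavor, not Crystal's actual code; the slope model then corrects such estimates using the ratio of input and output waveform slopes):

```python
def elmore_delay(segments):
    """Elmore delay of an RC ladder: each resistance is charged with all
    of the capacitance downstream of it.  `segments` is a source-to-sink
    list of (R_ohms, C_farads) pairs."""
    downstream = sum(c for _, c in segments)
    delay = 0.0
    for r, c in segments:
        delay += r * downstream
        downstream -= c
    return delay

# Pass chain of three segments, 1 kOhm and 50 fF each:
print(elmore_delay([(1e3, 50e-15)] * 3))  # 3e-10 s = 300 ps
```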

Proceedings Article
16 Oct 1984
TL;DR: A methodology relating physical features of point defects inherent in the fabrication process to the circuit-level faulty behaviors caused by these defects is proposed and a simulation approach to support this methodology is introduced.
Abstract: A methodology relating physical features of point defects inherent in the fabrication process to the circuit-level faulty behaviors caused by these defects is proposed. A simulation approach to support this methodology is introduced and illustrated using an example n-MOS circuit. Using this methodology, technology and layout dependent faults can be generated and ranked according to their likelihood. Using a ranked fault list, a new and more effective testing approach for MOS VLSI circuits can be developed.

Proceedings ArticleDOI
19 Mar 1984
TL;DR: The paper presents a revised functional description of Volder's Coordinate Rotation Digital Computer algorithm (CORDIC), as well as allied VLSI-implementable processor architectures; the approach benefits execution speed in array configurations, since it allows pipelining at the bit level.
Abstract: The paper presents a revised functional description of Volder's Coordinate Rotation Digital Computer algorithm (CORDIC), as well as allied VLSI implementable processor architectures. Both pipelined and sequential structures are considered. In the general purpose or multi-function case, pipeline length (number of cycles), function evaluation time and accuracy are all independent of the various executable functions. High regularity and minimality of data-paths, simplicity of control circuits and enhancement of function evaluation speed are ensured, partly by mapping a unified set of micro-operations, and partly by invoking a natural encoding of the angle parameters. The approach benefits the execution speed in array configurations, since it will allow pipelining at the bit level, thereby providing fast VLSI implementations of certain algorithms exhibiting substantial structural pipelining or parallelism.
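For readers unfamiliar with the underlying recurrence, here is a floating-point sketch of rotation-mode CORDIC computing sin and cos (the paper's contribution is the fixed-point, pipelined hardware mapping, not this loop):

```python
import math

def cordic_sincos(theta, n=32):
    """Volder's rotation mode: rotate (K, 0) toward angle theta using
    n shift-and-add micro-rotations; converges for |theta| <~ 1.74 rad.
    K pre-compensates the constant gain of the micro-rotations."""
    angles = [math.atan(2.0 ** -i) for i in range(n)]
    k = 1.0
    for i in range(n):
        k /= math.sqrt(1.0 + 4.0 ** -i)
    x, y, z = k, 0.0, theta
    for i in range(n):
        d = 1.0 if z >= 0.0 else -1.0
        # Only shifts and adds in fixed-point hardware:
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return x, y  # (cos(theta), sin(theta))

print(cordic_sincos(math.pi / 6))  # ~(0.8660, 0.5000)
```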

Patent
12 Jul 1984
TL;DR: In this article, a system for cooling integrated circuit chips and particularly those involving very large scale integrated circuits is described, where the cooling chip is provided with a plurality of spaced parallel grooves which extend along the one side or surface opposite the surface that is in bearing contact with the integrated circuit chip.
Abstract: A system for cooling integrated circuit chips and particularly those involving very large scale integrated circuits; the system provides for closely associating the heat-sink or heat exchange element with the integrated circuit chip by having the heat-sink, in the form of a "cooling chip", in intimate contact with the back surface of an integrated circuit chip (in a "flip chip" configuration, the front, or circuit-implemented, surface, makes contact with a ceramic carrier or module); the cooling chip is provided with a plurality of spaced parallel grooves which extend along the one side or surface opposite the surface that is in bearing contact with the integrated circuit chip, whereby liquid coolant flows through the grooves so as to remove heat from the integrated circuit chip; further included in the system is a specially configured bellows for conducting the liquid coolant from a source to the heat-sink, and for removing the liquid coolant; a coolant distribution means, in the form of at least one glass plate or manifold, is provided with spaced passageways interconnecting the respective incoming and outgoing coolant flow paths of the bellows with the heat-sink.

Journal ArticleDOI
TL;DR: The testability of two well-known array multiplier structures is studied in detail and it is shown that, with appropriate cell design, array multipliers can be designed to be very easily testable.
Abstract: Array multipliers are well suited for VLSI implementation because of the regularity in their iterative structure. However, most VLSI circuits are difficult to test. This correspondence shows that, with appropriate cell design, array multipliers can be designed to be very easily testable. An array multiplier is called C-testable if all its adder cells can be exhaustively tested while requiring only a constant number of test patterns. The testability of two well-known array multiplier structures is studied in detail. The conventional design of the carry–save array multiplier is modified. The modified design is shown to be C-testable and requires only 16 test patterns. Similar results are obtained for the Baugh–Wooley two's complement array multiplier. A modified design of the Baugh–Wooley array multiplier is shown to be C-testable and requires 55 test patterns. The C-testability of two other array multipliers, namely the carry–propagate and the TRW designs, is also presented.

Journal ArticleDOI
TL;DR: Two models of the cost of data movement in parallel numerical algorithms are described, one suitable for shared memory multiprocessors where each processor has vector capabilities and the other applicable to highly parallel nonshared memory MIMD systems.
Abstract: This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In this second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm-independent upper bounds on system performance are derived for several problems that are important to scientific computation.
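The first model's general shape is the familiar two-parameter linear law; a tiny sketch follows (parameter values are made up for illustration):

```python
def transfer_time(n, startup, rate):
    """Hockney-style model of a length-n transfer or vector operation:
    t(n) = startup + n / rate.  The 'half-performance length'
    n_half = startup * rate is where overhead equals payload time."""
    return startup + n / rate

startup, rate = 50e-6, 1e7            # 50 us startup, 10 Mword/s asymptotic
n_half = startup * rate               # 500 words
print(transfer_time(n_half, startup, rate))       # 1e-4 s: twice the startup
print(transfer_time(10 * n_half, startup, rate))  # overhead now ~9% of total
```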

Journal ArticleDOI
TL;DR: Performance measures for the evaluation of the effectiveness and area utilization of various fault-tolerant techniques are devised and the reduction in wafer yield is analyzed and the possibility of yield enhancement through redundancy is investigated.
Abstract: Fault-tolerance is undoubtedly a desirable property of any processor array. However, increased design and implementation costs should be expected when fault-tolerance is being introduced into the architecture of a processor array. When the processor array is implemented within a single VLSI chip, these cost increases are directly related to the chip silicon area. Thus, the increase in area should be weighed against the improved performance of the gracefully degrading fault-tolerant processor array. In addition, a larger chip area might reduce the wafer yield to an unacceptable level, making the use of fault-tolerant VLSI processor arrays impractical. The objective of this paper is to devise performance measures for the evaluation of the effectiveness and area utilization of various fault-tolerant techniques. Another goal is to analyze the reduction in wafer yield and investigate the possibility of yield enhancement through redundancy.

Journal ArticleDOI
TL;DR: A substrate resistance modeling technique which may be applied to the design of both FET and bipolar chips and its use in developing a substrate resistance model required for studying a disturb problem encountered with a high-speed array chip is described.
Abstract: With the advent of VLSI and the use of statistical simulation techniques to perform integrated circuit design, modeling of chip substrate resistance is becoming increasingly important to successful chip design. This paper will present a substrate resistance modeling technique which may be applied to the design of both FET and bipolar chips. After briefly presenting the theory behind the technique, we will describe its use in developing a substrate resistance model required for studying a disturb problem encountered with a high-speed array chip. The steps involved in building and simplifying the substrate model will be described. The effect on circuit simulations and noise sensitivity will then be shown.

Proceedings Article
16 Oct 1984
TL;DR: This contribution outlines the applications of a recently proposed testability analysis algorithm that has been implemented as a controllability/observability program named COP, and establishes that CPU cost grows only linearly with the size of the modules in all of the applications considered to date.

Abstract: This contribution outlines the applications of a recently proposed testability analysis algorithm that has been implemented as a controllability/observability program named COP. Several benchmark modules, some approaching the complexity of VLSI, established that CPU cost grows only linearly with the size of the modules in all of the applications considered to date. These include effective heuristics for automatic test pattern generation (ATPG), assessment of random-pattern testability, an effective approach to approximate fault simulation, structural partitioning into cones, critical delay path tracing, and characterization of the module in terms of a "testability signature". In view of the increasing complexity of VLSI, such performance provides strong motivation that the calibration of testability analysis be enhanced and its role further expanded.
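The linear cost comes from a single topological sweep. A COP-flavored sketch of the controllability half (signal probabilities under an input-independence assumption; the real COP also derives observabilities and handles more gate types):

```python
def cop_one_controllability(netlist, input_prob=0.5):
    """Propagate 1-controllabilities (signal probabilities) through a
    topologically ordered netlist of AND/OR/NOT gates, assuming
    independent inputs.  One visit per gate, hence linear cost."""
    prob = {}
    for gate, (kind, inputs) in netlist.items():
        ps = [prob.get(i, input_prob) for i in inputs]
        if kind == "NOT":
            p = 1.0 - ps[0]
        elif kind == "AND":
            p = 1.0
            for q in ps:
                p *= q
        elif kind == "OR":
            p = 1.0
            for q in ps:
                p *= 1.0 - q
            p = 1.0 - p
        else:
            raise ValueError(f"unknown gate kind: {kind}")
        prob[gate] = p
    return prob

# g1 = a AND b, g2 = NOT g1, g3 = g2 OR c (dict order is topological):
net = {"g1": ("AND", ["a", "b"]),
       "g2": ("NOT", ["g1"]),
       "g3": ("OR", ["g2", "c"])}
print(cop_one_controllability(net))  # {'g1': 0.25, 'g2': 0.75, 'g3': 0.875}
```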

Proceedings ArticleDOI
25 Jun 1984
TL;DR: An O(n^2) algorithm for finding a rectangular dual of a planar triangulated graph is presented and is useful for solving area planning problems in VLSI IC design.

Abstract: An O(n^2) algorithm for finding a rectangular dual of a planar triangulated graph is presented. In practice, almost linear running times have been observed. The algorithm is useful for solving area planning problems in VLSI IC design.

Journal ArticleDOI
TL;DR: In this correspondence the logic structure of a universal VLSI chip called the symbol-slice Reed-Solomon (RS) encoder chip is presented and it is shown that a (255, 223) RS encoder requiring around 40 discrete CMOS IC's may be replaced by an RS encoder consisting of four identical interconnected VLSI RS encoding chips.
Abstract: In this paper, the known decoding procedures for Reed-Solomon (RS) codes are modified to obtain a repetitive and recursive decoding technique which is suitable for VLSI implementation and pipelining. The chip architectures of two basic building blocks for VLSI RS decoder systems are then presented. It is shown that a VLSI RS decoder has the potential advantage of achieving a high decoding speed through parallel-pipeline processing.

Journal ArticleDOI
TL;DR: In this paper, the authors analyzed on-chip interconnection delay in very high-speed LSI/VLSI's in the time domain, changing interconnection geometry, substrate resistivity, and terminal conditions.
Abstract: Using an MIS (metal-insulator-semiconductor) microstrip-line model for interconnection and its equivalent circuit representation, on-chip interconnection delay in very high-speed LSI/VLSI's is analyzed in the time domain, changing interconnection geometry, substrate resistivity, and terminal conditions. The results show the following: 1) the "lumped capacitance" approximation is inapplicable for interconnections in very high-speed LSI/VLSI's (t_pd of below 100-200 ps); 2) as compared to the semi-insulating substrate, the presence of the slow-wave mode and mode transition in the semiconducting substrates causes a 1.5-2 times increase in the delay time and a 2-10 times increase in the rise time; and 3) in order to realize propagation delay times of less than 100 ps per gate at LSI/VLSI levels, the effective signal source resistance of the gate should be less than 500 Ω so as to drive long interconnections.

Journal ArticleDOI
01 Jun 1984
TL;DR: In this article, the use of redundancy for the yield improvement of VLSI circuits is explored through the use of a mathematical model, and it is shown that interconnection density and pattern complexities around each section determine the effectiveness of yield improvement.
Abstract: Redundancy of both logic circuits and interconnections is the core principle of both RVLSI (Restructurable or Fault-Tolerant VLSI) and WSI (Wafer Scale Integration). For varying complexity and sizes of circuits, different factors of redundancy are required. Effective use of redundancy requires understanding of the failures and failure modes at different stages of the processing and lifetime of VLSI and WSI circuits. This paper consists of two parts. In Part I, sources of failures for MOS devices are discussed. Manifestations of physical failures are described. Use of redundancy for the yield improvement of VLSI circuits is explored through the use of a mathematical model. It is shown that interconnection density and pattern complexities around each section determine the effectiveness of yield improvement. In Part II (to be published in a forthcoming issue), programmable interconnect technologies are described to facilitate restructuring of VLSI and WSI circuits, in this case as they apply to yield improvement through the use of redundancy.
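A common form for such a model combines a per-module Poisson yield with a binomial survival requirement. The sketch below (a generic textbook formulation, not the paper's model, and ignoring the interconnect-pattern effects the paper shows to be decisive) illustrates how spares lift yield:

```python
from math import comb, exp

def module_yield(area, defect_density):
    """Poisson model: a module of area A survives with prob. e^(-A*D)."""
    return exp(-area * defect_density)

def array_yield(area, defect_density, needed, total):
    """Probability that at least `needed` of `total` identical modules
    are defect-free, i.e. that redundancy can restructure around the
    bad ones (restructuring itself is assumed perfect here)."""
    p = module_yield(area, defect_density)
    return sum(comb(total, k) * p**k * (1 - p)**(total - k)
               for k in range(needed, total + 1))

# 8 modules required, 0.1 cm^2 each, 2 defects/cm^2:
print(array_yield(0.1, 2.0, 8, 8))    # no spares:  ~0.20
print(array_yield(0.1, 2.0, 8, 10))   # two spares: ~0.73
```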


Journal ArticleDOI
Johnny James LeBlanc1
TL;DR: The advantages of this technique, namely very low hardware overhead cost, design-independent implementation, and effective static testing, make LOCST an attractive and powerful technique.
Abstract: A built-in self-test technique utilizing on-chip pseudorandom-pattern generation, on-chip signature analysis, a "boundary scan" feature, and an on-chip monitor test controller has been implemented on three VLSI chips by the IBM Federal Systems Division. This method (designated LSSD on-chip self-test, or LOCST) uses existing level-sensitive scan design strings to serially scan random test patterns to the chip's combinational logic and to collect test results. On-chip pseudorandom-pattern generation and signature analysis compression are provided via existing latches, which are configured into linear-feedback shift registers during the self-test operation. The LOCST technique is controlled through the on-chip monitor, IBM FSD's standard VLSI test interface/controller. Boundary scan latches are provided on all primary inputs and primary outputs to maximize self-test effectiveness and to facilitate chip I/O testing. Stuck-fault simulation using statistical fault analysis was used to evaluate test coverage effectiveness. Total test coverage values of 81.5, 85.3, and 88.6 percent were achieved for the three chips with less than 5000 random-pattern sequences. Outstanding test coverage (>97%) was achieved for the interior logic of the chips. The advantages of this technique, namely very low hardware overhead cost (<2%), design-independent implementation, and effective static testing, make LOCST an attractive and powerful technique.
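The two on-chip mechanisms, pseudorandom generation and signature compression, are both just linear-feedback shift registers. A behavioral sketch (polynomial, widths, and framing are arbitrary examples, not the LOCST configuration):

```python
def lfsr_patterns(seed, taps, width, count):
    """Fibonacci LFSR used as a pseudorandom test-pattern generator,
    the role the reconfigured scan latches play during self-test."""
    state, mask = seed, (1 << width) - 1
    for _ in range(count):
        yield state
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1       # XOR of the tapped bits
        state = ((state << 1) | fb) & mask

def signature(responses, taps, width):
    """Signature analysis: fold each test response into an LFSR; the
    final state is the compressed signature, compared at the end
    against a known-good value."""
    sig, mask = 0, (1 << width) - 1
    for r in responses:
        fb = 0
        for t in taps:
            fb ^= (sig >> t) & 1
        sig = (((sig << 1) | fb) ^ r) & mask
    return sig

pats = list(lfsr_patterns(seed=1, taps=(7, 5, 4, 3), width=8, count=5))
print([f"{p:08b}" for p in pats])
print(f"{signature(pats, taps=(7, 5, 4, 3), width=8):08b}")
```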

Proceedings ArticleDOI
25 Jun 1984
TL;DR: Macro-models are developed and new theorems on the optimal sizing of the transistors in a critical path are presented, and the results of a design automation procedure to perform the optimization are discussed.
Abstract: The problem of optimally sizing the transistors in a digital MOS VLSI circuit is examined. Macro-models are developed and new theorems on the optimal sizing of the transistors in a critical path are presented. The results of a design automation procedure to perform the optimization are discussed.
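A classical special case gives the flavor of such sizing results: for a chain of inverters driving a fixed load, delay is minimized when every stage bears the same effective fanout. A sketch of that standard result (far simpler than the paper's macro-models and theorems, and ignoring parasitic delay):

```python
def size_inverter_chain(c_in, c_load, n_stages):
    """Uniform-fanout sizing: with per-stage fanout
    f = (c_load / c_in)**(1/n), total normalized delay is n * f.
    Returns the stage input capacitances and that delay."""
    f = (c_load / c_in) ** (1.0 / n_stages)
    sizes = [c_in * f**i for i in range(n_stages)]
    return sizes, n_stages * f

# Driving a 64x load: the best chain length here is 4 stages (f ~ 2.83),
# consistent with the rule of thumb that optimum stage effort is near e.
for n in (1, 2, 3, 4, 5):
    _, d = size_inverter_chain(1.0, 64.0, n)
    print(n, round(d, 2))   # 64.0, 16.0, 12.0, 11.31, 11.49
```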