Showing papers in "IEEE Micro in 1994"


Journal ArticleDOI
TL;DR: Maintaining a finite-state machine model throughout, this approach automatically synthesizes the entire design, including hardware-software interfaces, and preserves the formal properties of the design.
Abstract: Designers generally implement embedded controllers for reactive real-time applications as mixed software-hardware systems. In our formal methodology for specifying, modeling, automatically synthesizing, and verifying such systems, design takes place within a unified framework that prejudices neither hardware nor software implementation. After interactive partitioning, this approach automatically synthesizes the entire design, including hardware-software interfaces. Maintaining a finite-state machine model throughout, it preserves the formal properties of the design. It also allows verification of both specification and implementation, as well as the use of specification refinement through formal verification.

214 citations


Journal ArticleDOI
TL;DR: The approach presented injects transient faults at internal locations in VLSI circuits by using heavy-ion radiation from a Californium-252 source.
Abstract: Fault injection is an effective method for studying the effects of faults in computer systems and for validating fault-handling mechanisms. The approach presented involves injecting transient faults into integrated circuits by using heavy-ion radiation from a Californium-252 source. The proliferation of safety-critical and fault-tolerant systems using VLSI technology makes such attempts to inject faults at internal locations in VLSI circuits increasingly important.

188 citations


Journal ArticleDOI
TL;DR: While keeping the system interface compatible with the 601 microprocessor, the 604 microprocessor improves upon it by incorporating a phase-locked loop and an IEEE-Std 1149.1 boundary-scan (JTAG) interface on chip.
Abstract: The 604 microprocessor is the third member of the PowerPC family being developed jointly by Apple, IBM, and Motorola. Developed for use in desktop personal computers, workstations, and servers, this 32-bit implementation works with the software and bus in the PowerPC 601 and 603 microprocessors. While keeping the system interface compatible with the 601 microprocessor, we improved upon it by incorporating a phase-locked loop and an IEEE-Std 1149.1 boundary-scan (JTAG) interface on chip. In addition, an advanced machine organization delivers one and a half to two times the 601’s integer performance.

125 citations


Journal ArticleDOI
TL;DR: This framework provides a basis for understanding transient fault problems in digital systems and can be helpful in selecting optimum techniques to mask or eliminate transient fault effects in developed systems.
Abstract: It is hard to shield systems effectively from transient faults (fault avoidance techniques). So some other means must be employed to assure appropriate levels of transient fault tolerance (insensitivity to transient faults). They are based on fault-masking and fault recovery ideas. Having analyzed this problem, the author identifies critical design points and outlines some practical solutions that refer to efficient on-line detectors (detecting errors during the system operation) and error handling procedures. This framework provides a basis for understanding transient fault problems in digital systems. It can be helpful in selecting optimum techniques to mask or eliminate transient fault effects in developed systems.

94 citations


Journal ArticleDOI
TL;DR: Europe's Delta-4 project argues persuasively for implementing fault tolerance in a distributed fashion by replicating capsules (runtime representations of application objects) on distributed, LAN-interconnected nodes.
Abstract: Because they avoid extensive redesign of specialized hardware, software-implemented approaches to fault tolerance are very resilient to change. Europe's Delta-4 project argues persuasively for implementing fault tolerance in a distributed fashion. The Delta-4 approach achieves fault tolerance by replicating capsules (runtime representations of application objects) on distributed, LAN-interconnected nodes. It can configure capsule groups to tolerate either stopping or arbitrary failures. Its multipoint protocols serve to coordinate capsule groups and for error processing and fault treatment.

93 citations


Journal ArticleDOI
TL;DR: This survey addresses the challenge of tuning the hardware to its software applications, considers different architectures and their uses, and reports on the status of CAD codesign tools, with particular reference to simulation and synthesis.
Abstract: Most digital systems consist of a hardware component and software programs that execute on the hardware platform. Obviously, a system can deliver higher performance when we tune the hardware to its software applications and vice versa. Today's novel architectures and the possible use of computer-aided design tools have created new opportunities to find solutions to codesign problems. This survey addresses this challenge, considers different architectures and their uses, and reports on the status of CAD codesign tools, with particular reference to simulation and synthesis.

79 citations


Journal ArticleDOI
TL;DR: The TFP (short for Tremendous Floating-Point) microprocessor is a superscalar implementation of the Mips Technologies architecture that dispatches up to four instructions each clock cycle to two floating-point execution units, two memory load/store units, and two integer execution units.
Abstract: Designed to efficiently support large, real-world, floating-point-intensive applications, the TFP (short for Tremendous Floating-Point) microprocessor is a superscalar implementation of the Mips Technologies architecture. This floating-point, computation-oriented processor uses a superscalar machine organization that dispatches up to four instructions each clock cycle to two floating-point execution units, two memory load/store units, and two integer execution units. Its split-level cache structure reduces cache misses by directing integer data references to a 16-Kbyte on-chip cache, while channeling floating-point data references off chip to a 4-Mbyte cache.

66 citations



Journal ArticleDOI
TL;DR: This work has divided constraints into two levels, corresponding to low-level interactions with device interfaces and high-level real-time response and rate requirements, and developed solutions tailored to each aspect.
Abstract: In designing Chinook, a hardware-software cosynthesis system for reactive real-time controllers, the impact of timing constraints on software scheduling has been a central concern. By dividing constraints into two levels, corresponding to low-level interactions with device interfaces and high-level real-time response and rate requirements, we have developed solutions tailored to each aspect. These scheduling techniques enable Chinook to map a high-level specification onto a specified collection of processors and peripheral devices while respecting performance requirements.

59 citations


Journal ArticleDOI
TL;DR: A low-power analog integrated circuit which implements a biologically inspired algorithm for the spectral analysis of sound, and supports the modification of these parameters via Fowler-Nordheim tunneling, under the control of a digital interface.
Abstract: Presents a low-power analog integrated circuit which implements a biologically inspired algorithm for the spectral analysis of sound. The chip features an efficient interface to digital systems; preserving analog processing's low-power, high-density advantages requires careful attention to interface issues. To send the spectral representation off chip, it generates a sparse coding of the output spectrum, and communicates the code as an asynchronous stream of events. We store parameters for the spectral analysis algorithm as charge on floating nodes, and support the modification of these parameters via Fowler-Nordheim tunneling, under the control of a digital interface. A prototype system uses this chip as a preprocessor.

57 citations


Journal ArticleDOI
TL;DR: This special issue reports on the methodological progress in some selected areas of fault tolerance as well as practical experience gained by developing concrete fault-tolerant systems.
Abstract: This special issue reports on the methodological progress in some selected areas of fault tolerance as well as practical experience gained by developing concrete fault-tolerant systems. Correspondingly, the issue contains contributions from the research community as well as from industry.

Journal ArticleDOI
TL;DR: The PowerPC, a new RISC architecture derived from IBM’s POWER architecture, is available as an open-system standard; its designers specified it in four books to provide both software compatibility among PowerPC processors and maximum implementation flexibility.
Abstract: The PowerPC, a new RISC architecture derived from IBM’s POWER architecture, is currently available as an open-system standard. To provide both software compatibility among PowerPC processors and maximum implementation flexibility, its designers specified the PowerPC architecture in four books. As we show here, Book 1 describes the user-mode programming model and instruction set common to all PowerPC processors.

Journal ArticleDOI
TL;DR: The simulation compiler is described and it is shown how it can be used to improve simulation performance by up to a factor of two over an all-software simulator.
Abstract: Our approach to digital system simulation compiles a high-level system model into a high-performance simulator that consists of software and hardware components. The target architecture for the simulation compiler is a tightly coupled processor and field-programmable gate array. We describe the simulation compiler and show how it can be used to improve simulation performance by up to a factor of two over an all-software simulator.

Journal ArticleDOI
S. Undy, M. Bass, D. Hollenbeck, W. Kever, Larry J. Thayer
TL;DR: Hewlett-Packard's latest workstation class lets designers optimize performance and cost at the system level with its Hummingbird microprocessor, which features two-way superscalar execution incorporating two integer units and a floating-point unit.
Abstract: With just three VLSI parts, Hewlett-Packard's latest workstation class lets designers optimize performance and cost at the system level. Its Hummingbird microprocessor features two-way superscalar execution incorporating two integer units, a floating-point unit, a 1-Kbyte internal instruction cache, an integrated external cache controller, an integrated memory and I/O controller, plus enhancements for little-endian and multimedia applications. Its Artist graphics controller integrates a graphical user interface accelerator, a frame buffer controller, and a video controller on a single chip.

Journal ArticleDOI
Keith Boland, Apostolos Dollas
TL;DR: This research survey starts with the fundamentals of single-level caches and moves to the need for multilevel cache hierarchies, and looks at some of the current techniques for boosting cache performance, especially compiler-based methods for code restructuring and instruction and data prefetching.
Abstract: By examining the rate at which successive generations of processor and DRAM cycle times have been diverging over time, we can track the latency problem of computer memory systems. Our research survey starts with the fundamentals of single-level caches and moves to the need for multilevel cache hierarchies. We look at some of the current techniques for boosting cache performance, especially compiler-based methods for code restructuring and instruction and data prefetching. These two areas will likely yield improvements for a much larger domain of applications in the future.

Journal ArticleDOI
TL;DR: The coherence problem in multilevel cache hierarchies and large-scale, shared-memory multiprocessors and the principles of the two major groups of hardware protocols are discussed and relevant representatives are summarized.
Abstract: Improving performance and scalability in shared-memory multiprocessors requires an appropriate solution to the well-known cache coherence problem. Hardware schemes, highly convenient because of their transparency for software, offer fully dynamic solutions, with an ability to achieve high performance. In Part 1 of this two-part series, we discussed the principles of the two major groups of hardware protocols and summarized relevant representatives. Here, we also briefly consider the coherence problem in multilevel cache hierarchies and large-scale, shared-memory multiprocessors.

Journal ArticleDOI
TL;DR: Analysis, a key to hardware-software codesign, guided design decisions in developing an optimized architecture for automotive powertrain modules; the methodology extends to similar real-time embedded systems.
Abstract: To guide design decisions in developing an optimized architecture for automotive powertrain modules, we relied upon analysis, a key to hardware-software codesign. Complicating such efforts are ongoing refinements to the underlying algorithms, ever stricter government standards, reusability demands, and late-arriving specifications for the controlled components. In our approach, configuration-level analysis lets us quickly and efficiently explore a large design space. Behavioral-level analysis validates decisions and optimizes hardware and software. Our codesign methodology extends to similar real-time embedded systems.

Journal ArticleDOI
D.P. Foty, E.J. Nowak
TL;DR: Certain limits influence MOSFET technology in low-voltage applications: strong off-state power consumption requirements and increasing numbers of FETs in each integrated circuit, combined with the physical limit to the subthreshold slope, force designers to choose between high performance and high density.
Abstract: Certain limits influence MOSFET technology in low-voltage applications. When we reduce the power supply voltage in modern short-channel devices, both active power dissipation and hot carrier reliability improve more than linearly. However, strong off-state power consumption requirements and increasing numbers of FETs in each integrated circuit, combined with the physical limit to the subthreshold slope, force designers to choose between high performance and high density.
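Note: the "physical limit to the subthreshold slope" mentioned in this abstract can be made concrete with the standard textbook expression for the subthreshold swing. The symbols below (S, V_GS, I_D, C_dep, C_ox, V_T, I_off) are not taken from the abstract; they are the usual device-physics notation, introduced here only as an illustrative sketch.

```latex
% Subthreshold swing: gate-voltage change needed for a tenfold change in drain current
S \;=\; \frac{\mathrm{d}V_{GS}}{\mathrm{d}\log_{10} I_D}
  \;=\; \ln(10)\,\frac{kT}{q}\left(1 + \frac{C_{dep}}{C_{ox}}\right)
  \;\ge\; \ln(10)\,\frac{kT}{q}
  \;\approx\; 60\ \text{mV/decade} \quad (T = 300\ \text{K})

% Consequence: lowering the threshold voltage by \Delta V_T raises the off-state current by
\frac{I_{off}(V_T - \Delta V_T)}{I_{off}(V_T)} \;=\; 10^{\,\Delta V_T / S}
```

Under these assumptions, every reduction of the threshold voltage by one swing S (to recover speed at a lower supply voltage) costs roughly a tenfold increase in off-state leakage, which is one way to read the performance-versus-density trade-off the abstract describes.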

Journal ArticleDOI
TL;DR: A generic analog chip could implement in a parallel way all basic functions found in these algorithms, permitting construction of a fast, portable classification system.
Abstract: Many neural-like algorithms currently under study support classification tasks. Several of these algorithms base their functionality on LVQ-like procedures to find locations of centroids in the data space, and on kernel (or radial-basis) functions centered on these centroids to approximate functions or probability densities. A generic analog chip could implement in a parallel way all basic functions found in these algorithms, permitting construction of a fast, portable classification system.

Journal ArticleDOI
TL;DR: The PowerPC is a new RISC architecture derived from IBM's POWER architecture; the changes made to POWER simplify implementations, increase clock rates, enable a higher degree of superscalar execution, extend the architecture to 64 bits, and add multiprocessor support.
Abstract: The PowerPC is a new RISC architecture derived from IBM's POWER architecture. The changes made to POWER simplify implementations, increase clock rates, enable a higher degree of superscalar execution, extend the architecture to 64 bits, and add multiprocessor support. For compatibility with existing software, the developers retained POWER's basic instruction set, opcode assignments, and programming model.

Journal ArticleDOI
TL;DR: This special-purpose analog neural processor can classify up to 70 dimensional vectors within 50 nanoseconds; the decision-making process of its feedforward neural network tolerates weight discretization, synapse nonlinearity, noise, and other non-ideal effects.
Abstract: Targeted at high-energy physics research applications, our special-purpose analog neural processor can classify up to 70 dimensional vectors within 50 nanoseconds. The decision-making process of the implemented feedforward neural network enables this type of computation to tolerate weight discretization, synapse nonlinearity, noise, and other non-ideal effects. Although our prototype does not take advantage of advanced CMOS technology, and was fabricated using a 2.5-μm CMOS process, it performs 6 billion multiplications per second, with only 2 W dissipation, and has as high as 1.5 Gbyte/s equivalent bandwidth.

Journal ArticleDOI
TL;DR: This work designed the optical neurochip so that variable sensitivity photodiodes are monolithically integrated on top of an LED array, serving both as fast analog multipliers and as on-chip weight storage elements with learning capability.
Abstract: An array of photodetectors, each having a variable sensitivity, forms the most important component of our two gallium arsenide chips. We designed the optical neurochip so that variable sensitivity photodiodes are monolithically integrated on top of an LED array, serving both as fast analog multipliers and as on-chip weight storage elements with learning capability. Our artificial retina device combines a VSPD array with a neural network for postprocessing, allowing us to perform fast, yet flexible, processing operations on projected images.

Journal ArticleDOI
TL;DR: Presents an amorphous silicon memory, tested in a modest pulse stream neural chip, and a target-based training algorithm, demonstrated in a prototype learning device on a realistic problem.
Abstract: EPSILON, a large, working, VLSI device, demonstrates pulse stream methods in the wider context of analog neural networks. EPSILON uses dynamic weight storage techniques, but a nonvolatile alternative is desirable. To that end, we have developed an amorphous silicon memory, which we present in experiments incorporating the device in a modest pulse stream neural chip. We have also developed a target-based training algorithm, which we demonstrate in a prototype learning device using a realistic problem. Finally, we explore system-level problems in experiments with a second version of EPSILON in a small, autonomous robot.

Journal ArticleDOI
TL;DR: This article shows that the PowerPC 60X bus definition has demonstrated sufficient performance capability and flexibility to become the standard interface for several follow-on projects, including the PowerPC 603 and 604 microprocessors.
Abstract: The 601 is the first implementation of the PowerPC architecture. Given this project’s aggressive schedule goals, the 601 designers chose to use Motorola’s existing 88110 bus, with some enhancements, rather than introducing an entirely new bus definition. As this article will show, this new bus definition has demonstrated sufficient performance capability and flexibility to become the standard interface for several follow-on projects, including the PowerPC 603 and 604 microprocessors. The bus definition elements common to all of these projects (PowerPC 601, 603, and 604) form the PowerPC 60X bus. The 60X bus must support a wide range of system configurations, including single-processor, low-cost laptop machines, high-performance desktop personal computers and workstations, and multiprocessor file and compute server systems. Figure 1 shows a typical box system configuration. The bus provides the interconnection and transfer protocols between one or more processor nodes, memory, and typically at least one expansion bridge to a system bus such as PCI, Microchannel, or VME. The processor node consists of two or more PowerPC microprocessors and, optionally, a secondary cache (L2 cache). Figure 1 also shows the L2 cache as a look-through design, though lookaside cache designs can be used. Depending on the goals for a particular system, the graphics subsystem (not shown) may attach directly to the 60X bus, or to the system bus. The primary considerations for the 601 project were quick time to market and high performance. We targeted the 601 towards systems ranging from desktop PCs to multiprocessor file/compute servers. Therefore, the bus definition had to provide a robust multiprocessing solution with minimal silicon and schedule impact. The 603 project goals dictated reduced multiprocessing support and several minor changes in the bus definition, while maintaining backwards compatibility with the 601. Though the 604 maintains this backwards compatibility with the 601, it again incorporates a small evolutionary step forward in the bus definition.

Journal ArticleDOI
TL;DR: This proposed combination of analog and digital technologies produces a densely packed, high-speed, scalable architecture, designed to easily accommodate learning capabilities.
Abstract: Developed for the VLSI implementation of neural network models, our novel analog architecture adds flexibility and adaptability by incorporating digital processing capabilities. Its systolic-based architecture avoids static storage of analog values by transferring the activation values through the chip's processing units. This proposed combination of analog and digital technologies produces a densely packed, high-speed, scalable architecture, designed to easily accommodate learning capabilities.

Journal Article
TL;DR: Presents a fabricated digital integrated circuit for self-organizing feature maps, based on the idea that restrictions to the algorithm can simplify the implementation, together with performance figures for a system architecture based on these chips.
Abstract: The use of self-organizing feature maps in real-time applications requires a high computational performance. Especially for embedded systems, neural network chips are needed. In this paper a fabricated integrated circuit for self-organizing feature maps is presented. The architecture of this digital chip is based on the idea that restrictions to the algorithm can simplify the implementation. Using the Manhattan Distance and a special treatment of the adaptation factor α decreases the necessary chip area, so that a high number of processor elements can be integrated on one chip. The effects of these restrictions on the function of the self-organizing feature map are discussed. The paper concludes with performance figures for a system architecture based on these chips.
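Note: to illustrate why the Manhattan distance is attractive for hardware, here is a minimal software sketch of a self-organizing feature map step that finds the winning unit using only subtractions, absolute values, and additions. It is not the chip's design: the function names, the 1-D map layout, and the `radius` parameter are hypothetical, and because the abstract does not describe the "special treatment" of α, the sketch uses the ordinary update rule with a plain floating-point `alpha`.

```python
def manhattan(a, b):
    """L1 distance: sum of absolute component differences (no multiplications)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_matching_unit(weights, x):
    """Index of the weight vector closest to x under the Manhattan distance."""
    return min(range(len(weights)), key=lambda i: manhattan(weights[i], x))


def adapt(weights, x, alpha=0.5, radius=1):
    """Move the winner and its neighbours on a 1-D map toward x by factor alpha.

    alpha is an ordinary float here (an assumption); the paper's restricted
    form of the adaptation factor is not specified in the abstract.
    """
    bmu = best_matching_unit(weights, x)
    lo, hi = max(0, bmu - radius), min(len(weights) - 1, bmu + radius)
    for i in range(lo, hi + 1):
        weights[i] = [w + alpha * (xi - w) for w, xi in zip(weights[i], x)]
    return bmu


# Three units on a 1-D map, two-dimensional inputs.
weights = [[0.0, 0.0], [4.0, 4.0], [8.0, 8.0]]
winner = adapt(weights, [5.0, 3.0], alpha=0.5, radius=0)
print(winner, weights[winner])
```

With `radius=0` only the winning unit moves; a larger radius also pulls its map neighbours toward the input, which is the neighbourhood behaviour that gives the map its topological ordering.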

Journal ArticleDOI
TL;DR: The design, fabrication, and performance of DOEs for use as free-space interconnection elements in optical-computing and photonic-switching applications are reviewed, focusing on Fourier- and Fresnel-regime surface-relief structures and their microlithographic fabrication.
Abstract: Diffractive optical elements easily handle tasks such as the generation of uniform arrays of beams, beam steering and focusing, and aberration correction that conventional optics alone often find difficult or impossible. Here we review the design, fabrication, and performance of DOEs for use as free-space interconnection elements in optical-computing and photonic-switching applications. Our discussion focuses on Fourier- and Fresnel-regime surface-relief structures and their microlithographic fabrication. We include two practical examples to demonstrate the potential of such optical elements.

Journal ArticleDOI
TL;DR: It is argued that storage is the critical enabling technology for many new multimedia applications and addressing its rapidly increasing requirements is key to bringing forward this new technology.
Abstract: Notebook, palmtop, and pen-based computers have unique storage requirements. In addition to a rapid growth in capacity to support emerging multimedia applications, portable systems require small, removable media; high shock resistance; thermal resilience; and light-weight, unique form factors dictated by ergonomic constraints and choices. It is argued that storage is the critical enabling technology for many new multimedia applications and addressing its rapidly increasing requirements is key to bringing forward this new technology.

Journal ArticleDOI
L. Spainhower, T.A. Gregg, R. Chillarege
TL;DR: It is argued that the Model 982 provides fault tolerance by combining enhanced circuit-level error detection and failure isolation techniques with system-level techniques exploiting inherent redundancy.
Abstract: Consolidated work loads running around the clock mean that today's large, general-purpose computers must meet high availability demands. To meet these demands, it is argued that the Model 982 provides fault tolerance by combining enhanced circuit-level error detection and failure isolation techniques with system-level techniques exploiting inherent redundancy.

Journal ArticleDOI
TL;DR: The modular design approach for fault-tolerant Boolean n-cube architectures reduces hardware cost and increases system reliability while assuring high performance during message routing.
Abstract: The modular design approach we propose for fault-tolerant Boolean n-cube architectures lets us reduce hardware cost and increase system reliability, while assuring high performance during message routing. Use of a specific module size with shared links and switches during reconfiguration helps keep system costs low. Though this strategy may increase the number of routing steps needed to transmit a message between nodes, routing performance nonetheless surpasses competing approaches, if we consider switching delays.