
Showing papers on "PowerPC published in 2001"


Proceedings ArticleDOI
A. Gattiker1, S. Nassif, R. Dinakar, C. Long
26 Mar 2001
TL;DR: This paper presents a means for estimating parametric timing yield and guiding robust design-for-quality in the presence of manufacturing and operating environment variations, by basing the proposed methodology on a post-processing step applied to the report generated as a by-product of static timing analysis.
Abstract: This paper presents a means for estimating parametric timing yield and guiding robust design-for-quality in the presence of manufacturing and operating environment variations. Dual emphasis is on computational efficiency and providing meaningful robust-design guidance. Computational efficiency is achieved by basing the proposed methodology on a post-processing step applied to the report generated as a by-product of static timing analysis. Efficiency is also ensured by exploiting the fact that for small processing/environment variations, a linear model is adequate for capturing the resulting delay change. Meaningful design guidance is achieved by analyzing the timing-related influence of variations on a path-by-path basis, allowing designers to perform a quality-oriented design pass focused on key paths. A coherent strategy is provided to handle both die-to-die and within-die variations. Examples from a PowerPC microprocessor illustrate the methodology and its capabilities.
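The core idea of the abstract, modeling each path's delay as a nominal value plus linear sensitivities to small parameter variations, and estimating the fraction of dies meeting timing, can be sketched as follows. This is an illustrative toy, not the paper's implementation; all names, sensitivities, and the Monte Carlo approach are assumptions for demonstration.

```python
import random

def path_delay(nominal, sensitivities, deltas):
    """Linear delay model: d = d0 + sum_i (dd/dp_i) * delta_p_i."""
    return nominal + sum(s * d for s, d in zip(sensitivities, deltas))

def timing_yield(paths, clock_period, sigma=1.0, trials=10000, seed=0):
    """Monte Carlo estimate of the fraction of dies meeting timing.

    `paths` is a list of (nominal_delay, sensitivities) tuples; all paths
    on a die share one sample of the die-to-die parameter deltas.
    """
    rng = random.Random(seed)
    n_params = len(paths[0][1])
    passing = 0
    for _ in range(trials):
        deltas = [rng.gauss(0.0, sigma) for _ in range(n_params)]
        if all(path_delay(d0, s, deltas) <= clock_period for d0, s in paths):
            passing += 1
    return passing / trials
```

A per-path report of sensitivities (as produced by post-processing an STA report) would feed directly into such a model, which is what makes the approach cheap relative to re-running full timing analysis per process corner.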

130 citations


Proceedings ArticleDOI
27 May 2001
TL;DR: In this article, a GA-based framework is presented to automatically generate biases for a new version of the PowerPC architecture, and the results show that the GA is effective in achieving high buffer utilization and the best approach to use depends on whether the objectives are related.
Abstract: Biased random instruction generators are commonly used in architectural verification of microprocessors, with biases specified manually by designers. As the complexity of processors grows, so does the complexity of specifying biases. Automatic bias generation speeds up the verification flow and may lead to better coverage of potential design errors. In this work, we present a genetic algorithm based framework to automatically generate biases. We target utilization of specific buffers for a new version of the PowerPC architecture. Our results show that the GA is effective in achieving high buffer utilization. Also, in targeting multiple objectives, the best approach to use depends on whether the objectives are related.
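A genetic algorithm over bias vectors, as described above, can be sketched minimally as below. The fitness function, gene encoding, and GA parameters here are all invented stand-ins; in the paper's framework, fitness would come from running the biased instruction generator against a processor model and measuring buffer utilization.

```python
import random

def evolve(fitness, n_genes, pop_size=20, generations=30, seed=1):
    """Maximize `fitness` over bias vectors (genes in [0, 1])."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)        # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_genes)             # point mutation
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

# Stand-in fitness: reward biases near a hypothetical target profile.
best = evolve(lambda g: -sum((x - 0.8) ** 2 for x in g), n_genes=4)
```

The multi-objective question the abstract raises (related vs. unrelated objectives) would correspond to whether a single combined fitness or separate populations per objective works better.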

42 citations


Proceedings ArticleDOI
01 Jan 2001
TL;DR: The ObjectAgent system is being developed to create an agent-based software architecture for autonomous distributed systems that uses agents to implement all of the software functionality and communicate through simplified natural language messages.
Abstract: The ObjectAgent system is being developed to create an agent-based software architecture for autonomous distributed systems. Agents are used to implement all of the software functionality and communicate through simplified natural language messages. Decision-making and fault detection and recovery capabilities are built-in at all levels. During the first phase of development, ObjectAgent was prototyped in Matlab. A complete, GUI-based environment was developed for the creation, simulation, and analysis of multiagent multisatellite systems. Collision avoidance and reconfiguration simulations were performed for a cluster of four satellites. ObjectAgent is now being ported to C++ for demonstration on a real-time, distributed testbed and deployment on TechSat 21 in 2003. The present architecture runs on a PowerPC 750 running Enea's OSE operating system. A preliminary demonstration of using ObjectAgent to perform a cluster reconfiguration of three satellites was performed in November 2000.

40 citations


Proceedings ArticleDOI
Ying Zhang1, Kimon Roufas, M. Yim
29 Oct 2001
TL;DR: This work presents a software architecture for modular, self-reconfigurable robots, in particular the PolyBot, which has been developed through its third generation, and features a multi-master/multi-slave structure in a multi-threaded environment, with three layers of communication protocols.
Abstract: Modular, self-reconfigurable robots show the promise of great versatility, robustness and low cost. However, programming such robots for specific tasks, with hundreds of modules, each with multiple actuators and sensors, can be tedious and error-prone. The extreme versatility of the modular systems requires a new paradigm in programming. We present a software architecture for this type of robot, in particular the PolyBot, which has been developed through its third generation. The architecture, based on the properties of the PolyBot electro-mechanical design, features a multi-master/multi-slave structure in a multi-threaded environment, with three layers of communication protocols. The architecture is currently being implemented for Motorola PowerPC using vxWorks.

38 citations


Journal ArticleDOI
01 Mar 2001
TL;DR: In this article, the architecture of a new class of computers optimized for lattice QCD calculations, called QCD On a Chip (QCDOC), is described. An individual node is based on a single integrated circuit containing a PowerPC 32-bit integer processor with a 1 Gflops 64-bit IEEE floating point unit, 4 Mbyte of memory, 8 Gbit/sec nearest-neighbor communications and additional control and diagnostic circuitry.
Abstract: The architecture of a new class of computers, optimized for lattice QCD calculations, is described. An individual node is based on a single integrated circuit containing a PowerPC 32-bit integer processor with a 1 Gflops 64-bit IEEE floating point unit, 4 Mbyte of memory, 8 Gbit/sec nearest-neighbor communications and additional control and diagnostic circuitry. The machine's name, QCDOC, derives from “QCD On a Chip”.

37 citations


Journal Article
TL;DR: In this article, a power-performance modeling toolkit is developed to aid in the evaluation and definition of future power-efficient, PowerPC processors, based on real, circuit-level power simulation data.
Abstract: We describe a new power-performance modeling toolkit, developed to aid in the evaluation and definition of future power-efficient, PowerPC processors. The base performance models in use in this project are: (a) a fast but cycle-accurate, parameterized research simulator and (b) a slower, pre-RTL reference model that models a specific high-end machine in full, latch-accurate detail. Energy characterizations are derived from real, circuit-level power simulation data. These are then combined to form higher-level energy models that are driven by microarchitecture-level parameters of interest. The overall methodology allows us to conduct power-performance tradeoff studies in defining the follow-on design points within a given product family. We present a few experimental results to illustrate the kinds of tradeoffs one can study using this tool.

35 citations


Journal Article
TL;DR: The predictor is able to compute a good estimation of the WCET even for complex tasks that contain a lot of dynamic cache usage, and its requirements are met by today's performance monitoring hardware.
Abstract: The control system of many complex mechatronic products requires for each task the Worst Case Execution Time (WCET), which is needed for the scheduler's admission tests and subsequently limits a task's execution time during operation. If a task exceeds the WCET, this situation is detected and either a handler is invoked or control is transferred to a human operator. Such control systems usually support preemptive multitasking, and if an object-oriented programming language (e.g., Java, C++, Oberon) is used, then the system may also provide dynamic loading and unloading of software components (modules). Only modern, state-of-the-art microprocessors can provide the necessary compute cycles, but this combination of features (preemption, dynamic unloading of modules, advanced processors) creates unique challenges when estimating the WCET. Preemption makes it difficult to take the state of the caches and pipelines into account when determining the WCET, yet for modern processors, a WCET based on worst-case assumptions about caches and pipelines is too large to be useful, especially for big and complex real-time products. Since modules can be loaded and unloaded, each task must be analyzed in isolation, without explicit reference to other tasks that may execute concurrently. To obtain a realistic estimate of a task's execution time, we use static analysis of the source code combined with information about the task's runtime behavior. Runtime information is gathered by the performance monitor that is included in the processor's hardware implementation. Our predictor is able to compute a good estimation of the WCET even for complex tasks that contain a lot of dynamic cache usage, and its requirements are met by today's performance monitoring hardware.
The paper includes data to evaluate the effectiveness of the proposed technique for a number of robotics control kernels that are written in an object-oriented programming language and execute on a PowerPC 604e-based system.
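The combination the abstract describes (static source-level bounds plus monitored runtime behavior) reduces to a simple accounting identity, sketched here with invented block structure and a hypothetical miss penalty; it is not the paper's actual predictor.

```python
MISS_PENALTY = 40  # hypothetical cache-miss penalty, in cycles

def wcet_estimate(blocks):
    """Estimate WCET from per-basic-block data.

    Each block carries a statically derived execution bound ('max_count',
    e.g. from loop-bound analysis) and dynamic figures from the hardware
    performance monitor ('base_cycles' and worst observed 'miss_rate').
    """
    total = 0
    for b in blocks:
        per_exec = b["base_cycles"] + b["miss_rate"] * MISS_PENALTY
        total += b["max_count"] * per_exec
    return total

# A loop body bounded at 100 iterations, 12 cycles per pass,
# missing the cache on half its executions:
blocks = [{"max_count": 100, "base_cycles": 12, "miss_rate": 0.5}]
```

Using a monitored miss rate instead of the pessimistic "every access misses" assumption is what keeps the estimate usable for large tasks, at the cost of the estimate being empirical rather than a guaranteed bound.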

29 citations


Proceedings ArticleDOI
10 Mar 2001
TL;DR: The RAD750 achieves radiation hardness of 1E-11 upsets/bit-day and is designed for use in high performance spaceborne applications and a new companion ASIC, the Power PCI, provides the bridge between the RAD750, the 33 MHz PCI backplane bus, and system memory.
Abstract: BAE SYSTEMS has developed the RAD750/sup TM/, a fully licensed radiation hardened implementation of the PowerPC 750/sup TM/ microprocessor, based on the original design database. The processor is implemented in a 2.5 volt, 0.25 micron, six-layer metal CMOS technology. Employing a superscalar RISC architecture, processor performance of 240 million Dhrystone 2.1 instructions per second (MIPS) at 133 MHz is provided, while dissipating less than six watts of power. The RAD750 achieves radiation hardness of 1E-11 upsets/bit-day and is designed for use in high performance spaceborne applications. A new companion ASIC, the Power PCI, provides the bridge between the RAD750, the 33 MHz PCI backplane bus, and system memory. The Power PCI is implemented in a 3.3 volt, 0.5 micron, five-layer metal CMOS technology, and achieves radiation hardness of <1E-10 upsets/bit-day. This paper describes the implementation of both designs.

29 citations


Proceedings ArticleDOI
13 Jun 2001
TL;DR: The focus of this paper is upon the provision of virtual memory for processes of all integrity levels without complicating the timing analysis of safety-critical processes with hard deadlines and for lower integrity processes without hard deadlines.
Abstract: Conventionally, the use of virtual memory in safety-critical real-time systems has been avoided, one reason being the difficulties it presents to timing analysis. The difficulties arise because the Memory Management Unit (MMU) on commercial processors is optimised to improve average performance, to the detriment of simple worst-case analysis. However, within safety-critical systems there is a move towards implementations where processes of differing integrity levels are allocated to the same processor. This requires adequate partitioning between processes of different integrity levels. One method for achieving this on commercial processors is via use of the MMU and its support for virtual memory. The focus of this paper is upon the provision of virtual memory for processes of all integrity levels without complicating the timing analysis of safety-critical processes with hard deadlines. Also, for lower integrity processes without hard deadlines, the flexibility of the virtual memory provided does not restrict the process functionality. The virtual memory system proposed is generic and can be implemented on many commercial architectures, e.g. PowerPC, ARM and MIPS. This paper details the PowerPC implementation.

27 citations


01 Jan 2001
TL;DR: In this paper, a subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled porting code, without modification, from other parallel processors to the Macintosh cluster, and achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations.
Abstract: We have constructed a parallel cluster consisting of 25 Apple Macintosh G3 and G4 computers running the MacOS, and have achieved very good performance on numerically intensive, parallel plasma particle-in-cell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. For large problems where message packets are large and relatively few in number, performance of 50-150 MFlops/node is possible, depending on the problem. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. Introduction In recent years there has been a growing interest in clustering commodity computers to build inexpensive parallel computers. A number of projects have demonstrated that for certain classes of problems, this is a viable approach for cheap, numerically intensive computing. The most common platform for building such a parallel cluster is based on the Pentium processor running the Linux version of Unix [1]. When Apple Computer introduced the Macintosh G3 computer based on the Motorola PowerPC 750 processor, we decided to investigate whether a cluster based on the G3 was practical. This investigation was initially motivated by the impressive single node performance we achieved on our well-benchmarked suite of plasma particle-in-cell (PIC) simulation codes [23] on the Macintosh G3, as shown in Table I. This was due in part to the availability of an excellent optimizing Fortran compiler for the Macintosh produced by the Absoft Corporation [4]. Not only was the performance faster than the Pentiums, but it was comparable to the performance achieved on some of the Crays.

20 citations


Proceedings ArticleDOI
23 Apr 2001
TL;DR: In this article, a PIM-based multiprocessor system, the System Level Intelligent Intensive Computing (SLIIC) Quick look (QL) board, is discussed.
Abstract: The growing gap in performance between processor and memory speeds has created a problem for data-intensive applications. A recent approach for solving this problem is to use processor-in-memory (PIM) technology. PIM technology integrates a processor on a DRAM memory chip, which increases bandwidth between the processor and memory. In this paper, we discuss a PIM-based multiprocessor system, the System Level Intelligent Intensive Computing (SLIIC) Quick look (QL) board. This system includes eight COTS PIM chips and two FPGA chips that implement a flexible interconnect network. The performance of the SLIIC QL board is measured and analyzed for the distributed corner-turn application. We show that the performance of the current SLIIC QL on the distributed corner turn application is better than a PowerPC-based multicomputer that consumes more power and occupies more area. This advantage, which can be achieved in a limited context, demonstrates that even limited COTS PIMs have some advantages for data-intensive computations.

01 Feb 2001
TL;DR: A modification to the SimpleScalar tool set to support the PowerPC ISA is described and modifications to the suite of five simulators that model the micro-architecture at different levels of detail are made.
Abstract: In this report, we describe a modification to the SimpleScalar tool set to support the PowerPC ISA. Our work is based on Version 3.0 of the publicly available SimpleScalar tool set. We briefly describe features of the PowerPC ISA relevant to the simulator and provide operating system specific implementation details. We made modifications to the suite of five simulators that model the micro-architecture at different levels of detail. The timing simulator sim-outorder simulates PowerPC binaries on the Register Update Unit (RUU) micro-architecture. The five simulators were tested by simulating the SPEC CPU95 benchmarks to completion. The tool set simulates binaries compiled for 32-bit IBM AIX running on PowerPC.

15 Oct 2001
TL;DR: The Integrated Computer Control System (ICCS) for the National Ignition Facility (NIF) is a layered architecture of 300 front-end processors coordinated by supervisor subsystems including automatic beam alignment and wavefront control, laser and target diagnostics, pulse power, and shot control timed to 30 ps.
Abstract: The Integrated Computer Control System (ICCS) for the National Ignition Facility (NIF) is a layered architecture of 300 front-end processors (FEP) coordinated by supervisor subsystems including automatic beam alignment and wavefront control, laser and target diagnostics, pulse power, and shot control timed to 30 ps. FEP computers incorporate either VxWorks on PowerPC or Solaris on UltraSPARC processors that interface to over 45,000 control points attached to VME-bus or PCI-bus crates respectively. Typical devices are stepping motors, transient digitizers, calorimeters, and photodiodes. The front-end layer is divided into another segment comprised of an additional 14,000 control points for industrial controls including vacuum, argon, synthetic air, and safety interlocks implemented with Allen-Bradley programmable logic controllers (PLCs). The computer network is augmented asynchronous transfer mode (ATM) that delivers video streams from 500 sensor cameras monitoring the 192 laser beams to operator workstations. Software is based on an object-oriented framework using CORBA distribution that incorporates services for archiving, machine configuration, graphical user interface, monitoring, event logging, scripting, alert management, and access control. Software coding using a mixed language environment of Ada95 and Java is one-third complete at over 300 thousand source lines. Control system installation is currently under way for the first 8 beams, with project completion scheduled for 2008.


Proceedings ArticleDOI
G.R. Brown1
10 Mar 2001
TL;DR: The RHPPC is a radiation hardened processor derived from PowerPC 603e/sup TM/ technology licensed from Motorola allowing all the mature COTS PowerPC/Sup TM/ software development tools to be used with the RHPPC.
Abstract: The RHPPC Single Board Computer (SBC) has an open architecture based on COTS standards for form factor, instruction set, operating system, backplane bus, and I/O. The RHPPC is a radiation hardened processor derived from PowerPC 603e/sup TM/ technology licensed from Motorola. The RHPPC is 100% software compatible with the commercial PowerPC 603e/sup TM/ part, allowing all the mature COTS PowerPC/sup TM/ software development tools to be used with the RHPPC. The RHPPC SBC architecture has been defined in conjunction with several key users. The RHPPC has a VxWorks/sup TM/ integrated operating environment consisting of startup code (SUROM), a Board Support Package (BSP) and I/O drivers. This flight code is written in C using Wind River's COTS Tornado/sup TM/ software development environment. The RHPPC SBC provides enough throughput to enable on-board payload processing or can be used for spacecraft control functions. The RHPPC SBC offers 210 MIPS performance.

Proceedings ArticleDOI
17 Jun 2001
TL;DR: A new framework for selecting, duplicating and sequencing instructions so as to decrease register pressure based on backwards scheduling and a unique feature of this approach is the ability to perform these transformations on intermediate-language instructions in a general dependence graph.
Abstract: In this paper, we present a new framework for selecting, duplicating and sequencing instructions so as to decrease register pressure. The motivation for this work is to target current and future high-performance processors where reductions in register pressure in the compiled programs can lead to improved performance. For instruction selection and duplication, a unique feature of our approach is the ability to perform these transformations on intermediate-language instructions in a general dependence graph that contains both true and non-true dependences, unlike past work that restricted their attention to a single expression tree or a single expression dag. For instruction sequencing, we present a new algorithm for reducing register pressure that is based on backwards scheduling. We present preliminary performance results to validate our approach. Our results show that register-sensitive instruction duplication can deliver significant speedups (up to 1.22x) for the SPECint95 benchmarks on an IA-32 processor. We also show that register-sensitive sequencing delivers smaller speedups (up to 1.12x) for the SPECjvm and Java Grande benchmarks on a PowerPC processor (when utilizing two-thirds of its registers). We expect to see more significant speedups due to register-sensitive sequencing on processors with fewer registers than the PowerPC (such as the IA-32).

Proceedings ArticleDOI
Pradip Bose1
01 Jul 2001
TL;DR: The authors review the performance validation methodology that they have developed and experimented with over the past few years and present examples and experimental results illustrating the use of this methodology in high end PowerPC processor development projects.
Abstract: The focus of today's processor validation methodology is primarily on ensuring functional integrity. Increasingly, however, pre-silicon performance validation is becoming part of the design verification challenge. Identification and elimination of performance deficiencies and bugs in the design prior to tape-out is an important aspect of building robust and dependable hardware. Many performance bugs are caused by latent functional defects in the pre-silicon software model of the machine. Besides, robust performance can be a key determinant of quality of service in applications like Web-serving. The authors review the performance validation methodology that they have developed and experimented with over the past few years. They also present examples and experimental results illustrating the use of this methodology in high end PowerPC processor development projects. The scope of the paper is limited to architectural performance, measured by metrics like instructions per cycle (IPC) or its inverse, CPI.

Proceedings ArticleDOI
13 Mar 2001
TL;DR: This paper will fully analyze all the timing paths using the ATPG techniques, thus overcoming the gap between the testing and timing analysis techniques and enabling us to do false path identification at the full-chip level of the circuit.
Abstract: Static timing analysis sets the industry standard in the design methodology of high speed/performance microprocessors to determine whether timing requirements have been met. Unfortunately, not all the paths identified using such analysis can be sensitized. This leads to a pessimistic estimation of the processor speed. Also, no amount of engineering effort spent on optimizing such paths can improve the timing performance of the chip. In the past we demonstrated initial results of how ATPG techniques can be used to identify false paths efficiently. Due to the gap between the physical design on which the static timing analysis of the chip is based and the test view on which the ATPG techniques are applied to identify false paths, in many cases only sections of some of the paths in the full-chip were analyzed in our initial results. In this paper, we will fully analyze all the timing paths using the ATPG techniques, thus overcoming the gap between the testing and timing analysis techniques. This enables us to do false path identification at the full-chip level of the circuit. Results of applying our technique to the second generation G4 PowerPC/sup TM/ will be presented.

Proceedings ArticleDOI
02 Dec 2001
TL;DR: It is found that two multithreaded Java server benchmarks have generally the same characteristics on both platforms: in particular, high instruction cache, ITLB, and BTAC (Branch Target Address Cache) miss rates.
Abstract: Java has, in recent years, become fairly popular as a platform for commercial servers. However, the behavior of Java server applications has not been studied extensively. We characterize two multithreaded Java server benchmarks, SPECjbb2000 and VolanoMark 2.1.2, on two IBM PowerPC architectures, the RS64-III and the POWER3-II, and compare them to more traditional workloads as represented by selected benchmarks from SPECint2000. We find that our Java server benchmarks have generally the same characteristics on both platforms: in particular, high instruction cache, ITLB, and BTAC (Branch Target Address Cache) miss rates. These benchmarks also exhibit high L2 miss rates due mostly to loads. As one would expect, instruction cache and L2 misses are primary contributors to CPI. Also, the proportion of zero dispatch cycles is high, indicating the difficulty in exploiting ILP for these workloads.

Book ChapterDOI
TL;DR: A methodology in which the behavior of a switch level device is specified using abstract parameterized regular expressions to generate a finite automaton that forms a symbolic simulation model representing an abstraction of the array core embedded in a larger design under analysis.
Abstract: We present a methodology in which the behavior of a switch level device is specified using abstract parameterized regular expressions. These specifications are used to generate a finite automaton representing an abstraction of the behavior of a block of memory comprised of a set of such switch level devices. The automaton, in conjunction with an Efficient Memory Model [1], [2] for the devices, forms a symbolic simulation model representing an abstraction of the array core embedded in a larger design under analysis. Using Symbolic Trajectory Evaluation, we check the equivalence between a register transfer level description and a schematic description augmented with abstract specifications for one of the custom memories embedded in the MPC7450 PowerPC processor.

Proceedings ArticleDOI
G.R. Brown1
14 Oct 2001
TL;DR: Honeywell is developing a product line of standard electronics boards for on-orbit payload processing applications based on commercial standards for instruction sets, form factors, back plane busses, I/O and software tools.
Abstract: Honeywell is developing a product line of standard electronics boards for on-orbit payload processing applications. This product line is based on commercial standards for instruction sets, form factors, back plane busses, I/O and software tools. This architecture is based on a data flow model. The radiation hardened, Single Board Computer (SBC) is based on PowerPC 603e/sup TM/ technology licensed from Motorola. The SBC is one building block for realizing a space computing solution. Other building blocks offered by Honeywell are a radiation hardened vector processor (RHVP), mass memory, sensor interface electronics, and I/O suite. The RHPPC SBC provides enough throughput to enable on-board payload processing or can be used for spacecraft control functions. The current generation of space computers is based on R3000 class architectures and offer 20 MIPS performance. The RHPPC SBC offers 222 MIPS performance.

Journal ArticleDOI
TL;DR: It is found that simple decode predictors can reach better than 90% accuracy for guiding speculative decode and advocate adoption of speculative decode to optimize instruction translations for the common case.
Abstract: We present the design of a PowerPC-based simulation infrastructure for architectural research. Our infrastructure uses an execution-driven out-of-order processor timing simulator from the SimpleScalar tool set. While porting SimpleScalar to the PowerPC architecture, we would like to remain compatible with other versions of SimpleScalar. We accomplish this by performing dynamic binary translation of the PowerPC instruction set architecture to the SimpleScalar instruction set architecture, and by mapping the PowerPC architectural state onto the SimpleScalar register set. Using this infrastructure, we execute unmodified PowerPC binaries on an out-of-order processor timing simulator which implements the SimpleScalar architecture. We describe and investigate trade-offs in the translation of some complex PowerPC instructions and advocate adoption of speculative decode to optimize instruction translations for the common case. We find that simple decode predictors can reach better than 90% accuracy for guiding speculative decode.
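The "speculative decode" idea above, translating a complex source instruction into its short common-case target sequence and falling back to an exact translation when a check fails, can be illustrated with a toy predictor. The opcode, its two translations, and the two-mispredict threshold are all invented for illustration and are not the paper's design.

```python
# Hypothetical common-case vs. exact translations for one opcode.
COMMON_CASE = {"lwzu": ["load", "add"]}
SLOW_PATH = {"lwzu": ["load", "add", "check_update"]}

class DecodePredictor:
    """Predicts, per opcode, whether the common-case translation holds.

    After repeated mispredictions for an opcode, it stops assuming the
    common case and emits the exact (slower) translation unconditionally.
    """
    def __init__(self, threshold=2):
        self.threshold = threshold
        self.mispredicts = {}

    def translate(self, opcode, common_case_ok):
        predicted_common = self.mispredicts.get(opcode, 0) < self.threshold
        if predicted_common and common_case_ok:
            return COMMON_CASE[opcode]   # fast path, prediction correct
        if predicted_common:             # misprediction: record and repair
            self.mispredicts[opcode] = self.mispredicts.get(opcode, 0) + 1
        return SLOW_PATH[opcode]
```

The abstract's finding that simple predictors exceed 90% accuracy suggests that even a small counter per opcode, as sketched here, captures most of the benefit.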

Journal ArticleDOI
TL;DR: A programmable thermal management interface circuit for PowerPC systems has been designed, implemented, and tested for the Integrated Thermal Management (ITEM) System and yields intricate control and optimal management with little system overhead and minimum hardware requirements.

Proceedings ArticleDOI
07 Dec 2001
TL;DR: It is shown that (i) a simulation model is an approximation of the corresponding abstract specification and (ii) the abstracted memory core can be composed with the un-abstracted surrounding logic using a simple theory of composition.
Abstract: We present a methodology in which the behavior of custom memories can be abstracted by a couple of artifacts-one for the interface and another for the contents. Memories consisting of several ports result into several user-provided abstract specifications, which in turn can be converted to simulation models. We show that (i) a simulation model is an approximation of the corresponding abstract specification and (ii) the abstracted memory core can be composed with the un-abstracted surrounding logic using a simple theory of composition. We make use of this methodology to verify equivalence between register transfer level and transistor level descriptions of custom memories.

Proceedings ArticleDOI
29 Mar 2001
TL;DR: This paper compares three industry-adopted methodologies for testing custom blocks; their pros and cons are analyzed and discussed based on factors such as the stability of the methodologies, the resulting sizes of gate-level models, the ATPG process, and testing quality in terms of non-target defect detection.
Abstract: Custom circuits, in contrast to those synthesized by automatic tools, are manually designed blocks whose performance is critical to full-chip operation. Testing these blocks represents a major DFT challenge and hence a crucial time-to-market factor in the microprocessor design flow. This paper compares three industry-adopted methodologies for testing custom blocks. Pros and cons are analyzed and discussed based on factors such as the stability of the methodologies, the resulting sizes of gate-level models, the ATPG process, and testing quality in terms of non-target defect detection. Experience and results from a recent PowerPC microprocessor are reported.

01 Jan 2001
TL;DR: In this paper, a distributed architecture utilizes SPARC AXi computers running Solaris to perform real-time image processing of sensor data and PowerPC-based computers running VxWorks to compute mirror commands.
Abstract: The National Ignition Facility (NIF) requires that pulses from each of the 192 laser beams be positioned on target with an accuracy of 50 um rms. Beam quality must be sufficient to focus a total of 1.8 MJ of 0.351-um light into a 600-um-diameter volume. An optimally flat beam wavefront can achieve this pointing and focusing accuracy. The control system performs closed-loop compensation during laser alignment to correct wavefront aberrations caused by gas density variations. Static compensation of flashlamp-induced thermal distortion is established just prior to the laser shot. The control system compensates each laser beam at 10 Hz by measuring the wavefront with a 77-lenslet Hartmann sensor and applying corrections with a 39-actuator deformable mirror. The distributed architecture utilizes SPARC AXi computers running Solaris to perform real-time image processing of sensor data and PowerPC-based computers running VxWorks to compute mirror commands. A single pair of SPARC and PowerPC processors accomplishes wavefront control for a group of eight beams. The software design uses proven adaptive optic control algorithms that are implemented in a multi-tasking environment to economically control the beam wavefronts in parallel. Prototype tests have achieved a closed-loop residual error of 0.03 waves rms.
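The 10 Hz correction loop can be caricatured as a simple integrator, under the unrealistic simplifying assumption that each sensor measurement drives exactly one actuator (the real system reconstructs 39 actuator commands from 77 Hartmann-sensor lenslets via an influence matrix). Gains and step counts below are illustrative, not from the paper:

```python
# Toy closed-loop wavefront correction: each iteration measures the residual
# aberration and nudges the actuator commands against it.  With integrator
# gain g, the residual decays geometrically as (1 - g)**k.

def closed_loop_step(commands, residual, gain=0.3):
    """One integrator update: move each actuator against its residual error."""
    return [c - gain * r for c, r in zip(commands, residual)]

def run_loop(aberration, steps=50, gain=0.3):
    """Drive a static aberration toward zero residual; returns final residual."""
    commands = [0.0] * len(aberration)
    for _ in range(steps):
        residual = [a + c for a, c in zip(aberration, commands)]
        commands = closed_loop_step(commands, residual, gain)
    return [a + c for a, c in zip(aberration, commands)]
```

After 50 iterations at gain 0.3 the residual has shrunk by a factor of roughly 0.7**50, i.e. effectively to zero for a static aberration.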

Proceedings ArticleDOI
G. Vandling1
30 Oct 2001
TL;DR: The combination of accurate memory models and good delay testing has produced a tenfold reduction in customer returns for this chip compared with prior PowerPC programs.
Abstract: This paper describes the approach used to model the memory circuits contained in the Gekko microprocessor and the delay testing that was done at functional speeds using these models. The combination of accurate memory models and good delay testing has produced a tenfold reduction in customer returns for this chip compared with prior PowerPC programs.

Proceedings ArticleDOI
28 Mar 2001
TL;DR: This paper describes several practical approaches used in timing convergence of the IBM Gekko PowerPC microprocessor used in the Nintendo Gamecube system, and their impact on the timing and size of the microprocessor.
Abstract: Wire capacitance models used in some synthesis tools have been based on the number of fanouts. These wire capacitance models can be misleading when compared to real wiring. This discrepancy can cause synthesis tools to optimize incorrectly, causing severe problems with chip-level timing convergence. Designs may take longer than expected, and designers may work on timing paths that are not critical, thus increasing the design cycle. In sub-micron designs it is crucial to improve the timing convergence between synthesis and physical design. This paper describes several practical approaches used in timing convergence of the IBM Gekko PowerPC microprocessor that is used in the Nintendo Gamecube system. The impact of each approach is evaluated on the timing and size of the microprocessor.
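The mismatch the abstract describes can be pictured with a toy comparison between a fanout-only pre-route estimate and a length-based post-route extracted value. The coefficients below are invented for illustration, not taken from the paper or any real technology library:

```python
# A fanout-based wire-load model guesses capacitance from fanout count alone;
# the routed wire's capacitance actually scales with its length.  Assumed,
# illustrative coefficients:
CAP_PER_FANOUT_FF = 5.0    # fF per fanout, hypothetical lookup-table value
CAP_PER_MICRON_FF = 0.2    # fF per um of routed wire, hypothetical

def estimated_wire_cap(fanout):
    """Pre-route estimate used by synthesis: a function of fanout only."""
    return fanout * CAP_PER_FANOUT_FF

def extracted_wire_cap(routed_length_um):
    """Post-route value: proportional to the wire's real length."""
    return routed_length_um * CAP_PER_MICRON_FF
```

For example, a 2-fanout net that happens to route 200 um long is estimated at 10 fF but extracts at 40 fF; it is exactly this kind of gap that makes synthesis optimize the wrong paths.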

Proceedings ArticleDOI
27 Jul 2001
TL;DR: The use of performance monitoring to characterize the machine learning based data mining program C4.5 running on an IBM Power II processor node in an IBM RS/6000 SP is explored.
Abstract: In many fields, such as data mining and e-commerce, performance issues are typically addressed by waiting for the next generation of processors and/or distributing the application in a parallel environment. An alternative has been to instrument the code so that observation can drive modifications to improve performance. Success is typically measured by the improvement in wall-clock time of program execution. In the latest generation of commercial processors (IBM Power/PowerPC, Compaq Alpha, Intel Pentium III), programmable counters are included in the hardware to gather data that can be used for performance monitoring. These counters allow internal events in the processor to be observed without impacting the performance of the program that is being monitored. This paper explores the use of performance monitoring to characterize the machine-learning-based data mining program C4.5 running on an IBM Power II processor node in an IBM RS/6000 SP. Development and verification of the methodology to utilize the performance-monitoring hardware are presented. The starting point of this work is an existing performance monitoring application that has been extended to allow monitoring of individual programs running on the single-chip implementation of the Power II architecture. Examples of the data collected from the monitoring of C4.5 are presented and analyzed. With the experience gained from the work on a single node, the potential issues in extending this methodology into a parallel environment such as the IBM RS/6000 SP are explored.
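Once raw counter values are read out, deriving summary metrics such as CPI or cache miss rate is simple arithmetic. The sketch below uses invented counter names, not the actual Power/PowerPC performance-monitor event names:

```python
# Derive summary metrics from sampled hardware-counter values.  The keys
# ("cycles", "instructions_completed", ...) are illustrative placeholders
# for whatever events the performance-monitor counters were programmed with.

def derive_metrics(counters):
    cycles = counters["cycles"]
    instrs = counters["instructions_completed"]
    misses = counters["dcache_misses"]
    loads  = counters["loads"]
    return {
        "cpi": cycles / instrs,              # cycles per completed instruction
        "dcache_miss_rate": misses / loads,  # data-cache misses per load
    }
```

Because the counters tick in hardware, such metrics can be gathered without perturbing the monitored program, which is the property the abstract emphasizes.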

Journal Article
TL;DR: Based on the FIFO communication model of MPI/PVM, an implementation of a record-and-replay technique is presented that produces much less temporal and spatial overhead than other solutions and provides users with an easy way to completely debug indeterminate MPI/PVM parallel programs.
Abstract: This paper discusses how to completely debug indeterminate MPI/PVM parallel programs. Due to the indeterminacy, earlier bugs may not be repeatable in successive executions during a cyclic debugging session. Based on the FIFO communication model of MPI/PVM, an implementation of a record-and-replay technique is presented. Moreover, users are provided with an easy way to completely debug their programs by covering all possible execution paths through controllable replay. Compared with other solutions, the proposed method produces much less temporal and spatial overhead. The implementation has been completed on two kinds of message-passing architectures: one is the Dawning 2000 super server (developed by the National Research Center for Intelligent Computing Systems of China), with single-processor (PowerPC) nodes interconnected by a custom-built wormhole mesh network; the other is a cluster of workstations (PowerPC/AIX) built at the National High Performance Computing Center at Hefei.
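The core idea behind such record-and-replay schemes is that, under FIFO channels, the only nondeterminism is which sender satisfies a wildcard receive; logging that choice is enough to replay the run. The sketch below illustrates the idea with an invented in-memory channel interface, which is not the MPI/PVM API:

```python
# Record phase: log the sender matched by each wildcard receive.
# Replay phase: turn each wildcard receive into a specific-source receive
# forced from the log.  With FIFO channels this reproduces the execution.

class FakeChannel:
    """In-memory stand-in for a message-passing endpoint (illustrative)."""
    def __init__(self, queues):
        self.queues = queues               # sender name -> list of messages

    def recv_any(self):
        # Nondeterministic in a real system; here: first nonempty queue.
        for sender, q in self.queues.items():
            if q:
                return sender, q.pop(0)
        raise RuntimeError("no message available")

    def recv_from(self, sender):
        return sender, self.queues[sender].pop(0)

def record_receive(channel, log):
    sender, msg = channel.recv_any()       # wildcard receive
    log.append(sender)                     # record which sender matched
    return sender, msg

def replay_receive(channel, log, step):
    return channel.recv_from(log[step])    # force the recorded sender
```

Only sender identities are logged, not message contents, which is why the overhead of the approach stays small.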