Showing papers on "PowerPC" published in 2004

Journal Article•DOI•

Mambo: a full system simulator for the PowerPC architecture

Patrick J. Bohrer¹, James L. Peterson¹, Mootaz Elnozahy¹, Ramakrishnan Rajamony¹, Ahmed Gheith¹, Ron Rockhold¹, Charles R. Lefurgy¹, Hazim Shafi¹, Tarun Nakra¹, Rick Simpson¹, Evan Speight¹, Kartik Sudeep¹, Eric Van Hensbergen¹, Lixin Zhang¹•Institutions (1)

IBM¹

01 Mar 2004

TL;DR: The experience in implementing the simulator and its uses within IBM to model future systems, support early software development, and design new system software are described.

Abstract: Mambo is a full-system simulator for modeling PowerPC-based systems. It provides building blocks for creating simulators that range from purely functional to timing-accurate. Functional versions support fast emulation of individual PowerPC instructions and the devices necessary for executing operating systems. Timing-accurate versions add the ability to account for device timing delays, and support the modeling of the PowerPC processor microarchitecture. We describe our experience in implementing the simulator and its uses within IBM to model future systems, support early software development, and design new system software.

152 citations

Proceedings Article•DOI•

ArchC: a systemC-based architecture description language

Sandro Rigo, Guido Araujo, Marcus Bartholomeu, Rodolfo Azevedo

27 Oct 2004

TL;DR: This paper presents ArchC, an open-source SystemC-based architecture description language (ADL) specialized for processor architecture description; it includes a storage-based co-verification mechanism that automatically checks the consistency of a refined ArchC model against a reference (functional) description.

Abstract: This paper presents an architecture description language (ADL) called ArchC, which is an open-source SystemC-based language that is specialized for processor architecture description. Its main goal is to provide enough information, at the right level of abstraction, in order to allow users to explore and verify new architectures, by automatically generating software tools like simulators and co-verification interfaces. ArchC's key features are a storage-based co-verification mechanism that automatically checks the consistency of a refined ArchC model against a reference (functional) description, memory hierarchy modeling capability, the possibility of integration with other SystemC IPs and the automatic generation of high-level SystemC simulators. We have used ArchC to synthesize both functional and cycle-based simulators for the MIPS, Intel 8051 and SPARC V8 processors, as well as functional models of modern architectures like TMS320C62x, XScale and PowerPC.

93 citations

Proceedings Article•DOI•

On correlating structural tests with functional tests for speed binning of high performance design

J. Zeng¹, Magdy S. Abadir¹, G. Vandling², Li-C. Wang³, A. Kolhatkar¹, Jacob A. Abraham⁴•Institutions (4)

Freescale Semiconductor¹, Cadence Design Systems², University of California, Santa Barbara³, University of Texas at Austin⁴

09 Sep 2004

TL;DR: This work investigates the correlation between functional test frequency and that of various types of structural patterns on MPC7455, a Motorola processor executing to the PowerPC™ instruction set architecture.

Abstract: The use of functional vectors has been an industry standard for speed binning purposes of high performance ICs. This practice can be prohibitively expensive as the ICs become faster and more complex. In comparison, structural patterns can target performance related faults in a more systematic manner. To make structural testing an effective alternative to functional testing for speed binning, structural patterns need to correlate with functional test frequencies closely. We investigate the correlation between functional test frequency and that of various types of structural patterns on MPC7455, a Motorola processor executing to the PowerPC™ instruction set architecture.

82 citations

Proceedings Article•DOI•

Error sensitivity of the Linux kernel executing on PowerPC G4 and Pentium 4 processors

Weining Gu¹, Zbigniew Kalbarczyk¹, Ravishankar K. Iyer¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

28 Jun 2004

TL;DR: Analysis of the obtained data indicates significant differences between the two platforms in how errors manifest and how they are detected in the hardware and the operating system.

Abstract: The goals of this study are: (i) to compare Linux kernel (2.4.22) behavior under a broad range of errors on two target processors - the Intel Pentium 4 (P4) running RedHat Linux 9.0 and the Motorola PowerPC (G4) running YellowDog Linux 3.0 - and (ii) to understand how architectural characteristics of the target processors impact the error sensitivity of the operating system. Extensive error injection experiments involving over 115,000 faults/errors are conducted targeting the kernel code, data, stack, and CPU system registers. Analysis of the obtained data indicates significant differences between the two platforms in how errors manifest and how they are detected in the hardware and the operating system. In addition to quantifying the observed differences and similarities, the paper provides several examples to support the insights gained from this research.

62 citations

Journal Article•DOI•

Elliptic and hyperelliptic curves on embedded μP

Thomas Wollinger¹, Jan Pelzl¹, Volker Wittelsberger¹, Christof Paar¹, Gokay Saldamli², Çetin Kaya Koç²•Institutions (2)

Ruhr University Bochum¹, Oregon State University²

01 Aug 2004-ACM Transactions on Embedded Computing Systems

TL;DR: This contribution appears to be the first thorough comparison of two public-key families, namely elliptic curve (ECC) and hyperelliptic curve cryptosystems on a wide range of embedded processor types (ARM, ColdFire, PowerPC).

Abstract: It is widely recognized that data security will play a central role in future IT systems. Providing public-key cryptographic primitives, which are the core tools for security, is often difficult on embedded processors due to computational, memory, and power constraints. This contribution appears to be the first thorough comparison of two public-key families, namely elliptic curve (ECC) and hyperelliptic curve (HECC) cryptosystems, on a wide range of embedded processor types (ARM, ColdFire, PowerPC). We investigated the influence of the processor type, resources, and architecture on throughput. Further, we improved previously known HECC algorithms, resulting in more efficient arithmetic.

43 citations

Architecture of a Reconfigurable Software Receiver

Gregory W. Heckler, James L. Garrison

24 Sep 2004

TL;DR: The Purdue Software Receiver (PSR) is a real-time software-defined GPS receiver developed at Purdue University for research and teaching purposes; its software architecture is designed to maximize reusability of the code.

Abstract: The Purdue Software Receiver (PSR) is a real-time software-defined GPS receiver developed at Purdue University for research and teaching purposes. The receiver's software architecture was designed to maximize reusability of the code. This includes employing the receiver in a non-real-time mode as a postprocessing tool for sampled GPS data as well as in a real-time mode operating from an antenna and digital receiver card. Real-time operation is enabled by single instruction multiple data (SIMD) instructions found on modern x86 and PowerPC processors. The PSR is coded in C++, making use of threaded objects to encapsulate functions and related data together and to reduce unnecessary copying of data. A software construct termed the "pipewall" is used to separate the low level (correlation and tracking) functions from the higher level navigation processing. A short description of a laboratory GPS signal recording system will also be presented.

35 citations

Proceedings Article•DOI•

A High-Performance SIMD Floating Point Unit for BlueGene/L: Architecture, Compilation, and Algorithm Design

Leonardo Bachega¹, Siddhartha Chatterjee¹, Kenneth Alan Dockser², John A. Gunnels¹, Manish Gupta¹, Fred G. Gustavson¹, Christopher A. Lapkowski¹, Gary K. Liu¹, M. Mendell¹, Charles D. Wait¹, T. J. Chris Ward¹•Institutions (2)

IBM¹, Research Triangle Park²

29 Sep 2004

TL;DR: Preliminary performance data shows that the algorithm-compiler-hardware combination delivers a significant fraction of peak floating-point performance for compute-bound kernels such as matrix multiplication, and a significant fraction of peak memory bandwidth for memory-bound kernels such as daxpy, while being largely insensitive to data alignment.

Abstract: We describe the design, implementation, and evaluation of a dual-issue SIMD-like extension of the PowerPC 440 floating-point unit (FPU) core. This extended FPU is targeted at both IBM's massively parallel BlueGene/L machine and more pervasive embedded platforms. It has several novel features, such as a computational crossbar and cross-load/store instructions, which enhance the performance of numerical codes. We further discuss the hardware-software co-design that was essential to fully realize the performance benefits of the FPU when constrained by the memory bandwidth limitations and high penalties for misaligned data access imposed by the memory hierarchy on a BlueGene/L node. We describe several novel compiler and algorithmic techniques to take advantage of this architecture. Using both hand-optimized and compiled code for key linear algebraic kernels, we validate the architectural design choices, evaluate the success of the compiler, and quantify the effectiveness of the novel algorithm design techniques. Preliminary performance data shows that the algorithm-compiler-hardware combination delivers a significant fraction of peak floating-point performance for compute-bound kernels such as matrix multiplication, and delivers a significant fraction of peak memory bandwidth for memory-bound kernels such as daxpy, while being largely insensitive to data alignment.

34 citations
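
For readers unfamiliar with the daxpy kernel named in the abstract above, here is a minimal scalar sketch in Python. It is not the paper's SIMD implementation; the function name and list-based interface are illustrative assumptions. Each element needs two loads and one store for only two floating-point operations, which is why the kernel is usually limited by memory bandwidth rather than by arithmetic.

```python
def daxpy(a, x, y):
    """Return a*x + y elementwise (the BLAS level-1 daxpy update).

    Plain-Python illustration only; a real implementation would update
    y in place and use SIMD loads and stores.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return [a * xi + yi for xi, yi in zip(x, y)]


result = daxpy(2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
```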

Patent•

Method and system for parallel CRC calculation

Roger Maitland¹, Mark Turnbull¹•Institutions (1)

Nortel¹

12 Oct 2004

TL;DR: A system and method for parallel CRC calculation are presented, in which a parallel table look-up operation retrieves CRC entries for a set of parallel inputs using a single instruction; the look-up may be executed using the PowerPC Altivec vperm instruction.

Abstract: A system and method for a parallel CRC calculation is provided. A set of parallel inputs are loaded into a control register, and this control register is then used with a parallel table look-up operation to look up CRC entries for each of the inputs using a single instruction. This is repeated until each input has been processed completely to produce a complete CRC. The parallel table look-up operation may be executed using the PowerPC Altivec vperm instruction.

28 citations
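
The table look-up idea in the abstract above can be illustrated with a scalar sketch. This is not the patented method: it assumes the common reflected CRC-32 polynomial (0xEDB88320) and a 16-entry table indexed by 4-bit nibbles, chosen here only because a small table is the kind of structure a byte-permute instruction could index in parallel. All names are hypothetical.

```python
import zlib

POLY = 0xEDB88320  # assumption: reflected CRC-32 polynomial, not taken from the patent


def make_nibble_table():
    """Build a 16-entry table: the CRC contribution of each 4-bit value."""
    table = []
    for i in range(16):
        crc = i
        for _ in range(4):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = make_nibble_table()


def crc32_nibble(data):
    """Table-driven CRC-32 that consumes one nibble per table look-up."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        crc = (crc >> 4) ^ TABLE[crc & 0xF]  # low nibble
        crc = (crc >> 4) ^ TABLE[crc & 0xF]  # high nibble
    return crc ^ 0xFFFFFFFF


# Cross-check against the standard library implementation.
matches_zlib = crc32_nibble(b"123456789") == zlib.crc32(b"123456789")
```

A vectorized version would perform the same look-up for several independent inputs in one instruction; the scalar loop only shows the data flow.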

Proceedings Article•DOI•

PowerTune: advanced frequency and power scaling on 64b PowerPC microprocessor

Cedric Lichtenau¹, Mathew I. Ringler, T. Pfluger, Stephen Frank Geissler, Rolf Hilgendorf, Jay G. Heaslip, U. Weiss, Peter A. Sandon, Norman J. Rohrer, Erwin B. Cohen, M. Canada•Institutions (1)

IBM¹

13 Sep 2004

TL;DR: The challenges and implementation of a dynamically controlled clock frequency with noise suppression as well as a synchronization circuit for a multi-processor system are discussed.

Abstract: PowerTune is a power-management technique for a multi-gigahertz superscalar 64b PowerPC® processor in a 90nm technology. This paper discusses the challenges and implementation of a dynamically controlled clock frequency with noise suppression as well as a synchronization circuit for a multi-processor system.

24 citations

Journal Article•DOI•

Progress in real-time feedback control systems in RFX

Oliviero Barana¹, Adriano Luchetta¹, Gabriele Manduchi¹, Cesare Taliercio¹•Institutions (1)

European Atomic Energy Community¹

01 Jun 2004-Fusion Engineering and Design

TL;DR: Wind River VxWorks was chosen as the real-time operating system; PowerPC and Pentium processors were considered as candidates and tested, and the PowerPC was selected for its better floating-point performance.

24 citations

Patent•

Graded task switching method based on PowerPC processor structure

Sun Xiaomin, Cai Yunpeng

21 Apr 2004

TL;DR: In this method, the user task context is divided into three parts (basic, expansion, and selectable) according to the characteristics of the PowerPC processor structure, and only three stack-entering modes (basic; basic and expansion; and the full context) are applied, depending on the conditions of system handling and task dispatching.

Abstract: The method has the following characteristics: the user task context is divided into three parts (basic, expansion, and selectable) according to the characteristics of the PowerPC processor structure. During interrupt processing, only three stack-entering modes are applied (basic; basic and expansion; and the full context), depending on the conditions of system handling and task dispatching. The basic part is pushed onto the stack first. After interrupt processing finishes, the nature of the task dispatching is evaluated to decide whether to execute the next stage of stack entering, call the dispatcher, or return to the user task, thereby reducing unnecessary stacking operations.

Book Chapter•DOI•

The PowerPC Backend Molen Compiler

Elena Moscu Panainte¹, Koen Bertels¹, Stamatis Vassiliadis¹•Institutions (1)

Delft University of Technology¹

30 Aug 2004

TL;DR: This paper reports on the backend C compiler developed to target the Virtex II Pro PowerPC processor and to incorporate the Molen architecture programming paradigm; a performance efficiency of 84% is achieved using an automatically generated but non-optimized DCT* hardware implementation.

Abstract: In this paper, we report on the backend C compiler developed to target the Virtex II Pro PowerPC processor and to incorporate the Molen architecture programming paradigm. To verify the compiler, we used the multimedia video frame M-JPEG encoder of which the Discrete Cosine Transform (DCT*) function was mapped on the FPGA. We obtained an overall speedup of 2.5 against a maximal theoretical speedup of 2.96. The performance efficiency of 84 % is achieved using automatically generated but non-optimized DCT* hardware implementation.
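
The 84% figure in the abstract above follows from dividing the achieved speedup by the theoretical maximum. The sketch below reproduces that arithmetic. The Amdahl's-law part is an assumption added for illustration: the abstract does not say how the 2.96 bound was derived, so treating it as the limit 1/(1 - f) for an accelerated fraction f is hypothetical.

```python
def amdahl_speedup(fraction, kernel_speedup):
    """Overall speedup when `fraction` of the run time is sped up by `kernel_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / kernel_speedup)


achieved = 2.5          # overall speedup reported in the abstract
theoretical_max = 2.96  # maximal theoretical speedup reported in the abstract

# Efficiency as the ratio of achieved to theoretical speedup.
efficiency = achieved / theoretical_max

# Assumption (not from the abstract): the bound is the Amdahl limit 1 / (1 - f),
# which would put the accelerated fraction f at roughly two thirds of the run time.
fraction = 1.0 - 1.0 / theoretical_max
```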

Book Chapter•DOI•

The Virtex II Pro™ MOLEN Processor

Georgi Kuzmanov¹, Georgi Gaydadjiev¹, Stamatis Vassiliadis¹•Institutions (1)

Delft University of Technology¹

21 Jul 2004

TL;DR: The paper focuses on hardware synthesis results and experimental performance evaluation, proving the viability of the MOLEN concept, where the MPEG-2 application is accelerated very closely to its theoretical limits by implementing SAD, DCT and IDCT as reconfigurable co-processors.

Abstract: We use the Xilinx Virtex II Pro™ technology as a prototyping platform to design a MOLEN polymorphic processor, a custom computing machine based on the co-processor architectural paradigm. The PowerPC embedded in the FPGA operates as a general purpose (core) processor and the reconfigurable fabric is used as a reconfigurable co-processor. The paper focuses on hardware synthesis results and experimental performance evaluation, proving the viability of the MOLEN concept. More precisely, the MPEG-2 application is accelerated very closely to its theoretical limits by implementing SAD, DCT and IDCT as reconfigurable co-processors. For a set of popular test video sequences the MPEG-2 encoder overall speedup is in the range between 2.64 and 3.18. The speedup of the MPEG-2 decoder varies between 1.65 and 1.94.

Proceedings Article•

PowerPC 970 in 130nm and 90nm technologies

Norman J. Rohrer, M. Canada, Erwin B. Cohen, Mat Ringler, Mike Mayfield, Peter A. Sandon, Paul D. Kartschoke, Jay G. Heaslip, James Allen, Peter Mccormick, Thomas Pflüger, Jeff Zimmerman, Cedric Lichtenau, Tobias Werner, Gerard M. Salem, Mike Ross, David Appenzeller, Dana J. Thygesen

01 Jan 2004

Proceedings Article•DOI•

Cooperative software multithreading to enhance utilization of embedded processors for network applications

Carsten Albrecht¹, Rainer Hagenau¹, A. Doring•Institutions (1)

University of Lübeck¹

08 Mar 2004

TL;DR: This work aims to reduce the overhead of cooperative multithreading context switches at compile time by using standard compiler techniques such as context-insensitive analysis; register usage is also rearranged to reduce the amount of context-switch code.

Abstract: Multithreading is an efficient way to improve efficiency of processor cores in embedded products for networking infrastructures. To make such improvements also accessible to processor cores without hardware support for multithreading, we present a concept for efficient software multithreading through compiler post-pass optimization of the application code. Our approach aims at reducing the overhead for cooperative multithreading context switches at compile time by using standard compiler techniques such as context-insensitive analysis. Additionally, register usage is rearranged to reduce the amount of context-switch code by exploiting multiple-load/store instructions. Performance model analysis encourages the use of software multithreading to improve processor utilization by showing the benefit of our approach. We present results obtained by an implementation for the PowerPC ISA (Instruction Set Architecture) using the code of a real network application (iSCSI). We were able to reduce the expected run-time of a context switch to as little as 38% of the original.

Journal Article•DOI•

A distributed memory parallel implementation of the multigrid method for solving three-dimensional implicit solid mechanics problems

A. Namazifard¹, I. D. Parsons²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Lawrence Livermore National Laboratory²

28 Oct 2004-International Journal for Numerical Methods in Engineering

TL;DR: The parallel implementation of a multigrid method for unstructured finite element discretizations of solid mechanics problems is described and an algebraic framework for the parallel computations is presented, and an object‐based programming methodology using Fortran90 is described.

Abstract: We describe the parallel implementation of a multigrid method for unstructured finite element discretizations of solid mechanics problems. We focus on a distributed memory programming model and use the MPI library to perform the required interprocessor communications. We present an algebraic framework for our parallel computations, and describe an object-based programming methodology using Fortran90. The performance of the implementation is measured by solving both fixed- and scaled-size problems on three different parallel computers (an SGI Origin2000, an IBM SP2 and a Cray T3E). The code performs well in terms of speedup, parallel efficiency and scalability. However, the floating point performance is considerably below the peak values attributed to these machines. Lazy processors are documented on the Origin that produce reduced performance statistics. The solution of two problems on an SGI Origin2000, an IBM PowerPC SMP and a Linux cluster demonstrates that the algorithm performs well when applied to the unstructured meshes required for practical engineering analysis. Copyright © 2004 John Wiley & Sons, Ltd.

Proceedings Article•DOI•

PowerPC 970 in 130 nm and 90 nm technologies

Norman J. Rohrer¹, M. Canada¹, Erwin B. Cohen¹, Mathew I. Ringler¹, M. Mayfield, Peter A. Sandon, Paul D. Kartschoke, Jay G. Heaslip, James Allen, P. McCormick, T. Pfluger, Jeffrey S. Zimmerman, Cedric Lichtenau, Tobias Werner, Gerard M. Salem, M. Ross, David Appenzeller, Dana J. Thygesen•Institutions (1)

IBM¹

13 Sep 2004

TL;DR: A 64 b PowerPC microprocessor is introduced in 130 nm and redesigned in 90 nm SOI technology, which features PowerTune for rapid frequency and power scaling and electronic fuses.

Abstract: A 64 b PowerPC microprocessor is introduced in 130 nm and redesigned in 90 nm SOI technology. PowerPC 970 implements a SIMD instruction set with 512 kB L2 cache. It runs at 2.0 GHz with a 1.0 GHz bus in 130 nm. The 90 nm design features PowerTune for rapid frequency and power scaling and electronic fuses.

Dissertation•

Embedded software streaming via block streaming

Pramote Kuacharoen, Vincent J. Mooney

01 Jan 2004

TL;DR: This dissertation presents a streaming method, which is implemented and simulated on an MBX860 board and on a hardware/software co-simulation platform in which the PowerPC architecture was used, that enables small memory footprint devices to run applications larger than the physical memory by using the memory management technique.

Abstract: Downloading software from a server usually takes a noticeable amount of time, that is, noticeable to the user who wants to run the program. However, this issue can be mitigated by the use of streaming software. Software streaming is a means by which software can begin execution even while transmission of the full software program may still be in progress. Therefore, the application load time (i.e., the amount of time from when an application is selected for download to when the application can be executed) observed by the user can be significantly reduced. Moreover, unneeded software components might not be downloaded to the device, lowering memory and bandwidth usage. As a result, resource utilization such as memory and bandwidth usage may also be more efficient. Using our streaming method, an embedded device can support a wide range of applications which can be run on demand. Software streaming also enables small memory footprint devices to run applications larger than the physical memory by using our memory management technique. In this dissertation, we present a streaming method we call block streaming to transmit stream-enabled applications, including stream-enabled file I/O. We implemented a tool to partition software into blocks which can be transmitted (streamed) to the embedded device. Our streaming method was implemented and simulated on an MBX860 board and on a hardware/software co-simulation platform in which we used the PowerPC architecture. We show a robotics application that, with our software streaming method, is able to meet its deadline. The application load time for this application also improves by a factor of more than 10X when compared to downloading the entire application before running it. The experimental results also show that our implementation improves file I/O operation latency; in our examples, the performance improves up to 55.83X when compared with direct download. Finally, we show a stream-enabled game application combined with stream-enabled file I/O for which the user can start playing the game 3.18X more quickly than using only the stream-enabled game program file alone.

Proceedings Article•

Real-time combustion knock processing using a single instruction multiple data automotive PowerPC system-on-a-chip

M. Anas¹, R.J. Paling¹, William H. Nailon, D.R.S. Cumming•Institutions (1)

Motorola¹

20 Jul 2004

TL;DR: The efficient coding and optimisation techniques used for the single instruction multiple data implementation of the algorithm have been shown to improve overall performance and as a result utilises minimum combustion event timing.

Abstract: This paper discusses a novel high performance knock processing strategy using a next generation Motorola automotive PowerPC system-on-a-chip. The proposed methodology is based on an auxiliary signal processing extension to the main PowerPC system-on-a-chip core along with various intelligent autonomous on-chip modules. Real-time software development techniques with an advanced software circular buffer implementation for processing the streaming knock sensor data have been developed. Various single instruction multiple data software optimisation techniques are employed to reduce the real-time knock algorithmic execution time. Real-time and simulation results are presented for the detection of knock on a four cylinder internal combustion engine; however, the approach is widely applicable. The efficient coding and optimisation techniques used for the single instruction multiple data implementation of the algorithm have been shown to improve overall performance and as a result utilises minimum combustion event timing.

Journal Article•DOI•

Enhanced equivalence checking: toward a solidarity of functional verification and manufacturing test generation

Jayanta Bhadra¹, N. Krishnamurthy¹, Magdy S. Abadir¹•Institutions (1)

Freescale Semiconductor¹

01 Nov 2004-IEEE Design & Test of Computers

TL;DR: This article uses RTL, gate, and switch models of a design in two different flows, one for test and one for functional verification, to show that rectifying constraints and merging tests between the two flows saves significant presilicon debug effort.

Abstract: This article, from the Motorola (now Freescale) PowerPC design group, presents an interesting synergy among test, equivalence verification, and constraints. The authors use RTL, gate, and switch models of a design in two different flows, one for test and one for functional verification, to show that rectifying constraints and merging tests between the two flows saves significant presilicon debug effort.

Proceedings Article•DOI•

16-bit FP sub-word parallelism to facilitate compiler vectorization and improve performance of image and media processing

D. Etiemble, L. Lacassagne¹•Institutions (1)

University of Paris-Sud¹

15 Aug 2004

TL;DR: This work considers the implementation of 16-bit floating point instructions on a Pentium 4 and a PowerPC G5 for image and media processing and shows that significant speed-up is obtained compared to 32-bit FP versions.

Abstract: We consider the implementation of 16-bit floating point instructions on a Pentium 4 and a PowerPC G5 for image and media processing. By measuring the execution time of benchmarks with these new simulated instructions, we show that significant speed-up is obtained compared to 32-bit FP versions. For image processing, the speed-up both comes from doubling the number of operations per SIMD instruction and the better cache behavior with byte storage. For data stream processing with arrays of structures, the speed-up mainly comes from the wider SIMD instructions.
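
The claim above about doubling the number of operations per SIMD instruction can be checked with simple lane arithmetic. The sketch assumes a 128-bit SIMD register, the width of both SSE2 and AltiVec registers; the helper names are hypothetical, and Python's `struct` half-precision format (`e`) stands in for the simulated 16-bit instructions, which the abstract does not specify in detail.

```python
import struct

REGISTER_BYTES = 16  # assumption: 128-bit SIMD register (SSE2 / AltiVec width)


def lanes(fmt_char):
    """Number of elements of the given struct format that fit in one register."""
    return REGISTER_BYTES // struct.calcsize("<" + fmt_char)


def round_trip_half(value):
    """Store a value as IEEE 754 half precision and read it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


float32_lanes = lanes("f")  # 32-bit floats per register
float16_lanes = lanes("e")  # 16-bit floats per register
```

The wider lane count comes at the cost of precision: `round_trip_half(0.1)` returns a value close to, but not equal to, 0.1.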

Book Chapter•DOI•

EVE, an Object Oriented SIMD Library

Joel Falcou¹, Jocelyn Serot¹•Institutions (1)

Centre national de la recherche scientifique¹

06 Jun 2004

TL;DR: Eve (Expressive Velocity Engine), an object oriented C++ library designed to ease the process of writing efficient numerical applications using AltiVec, the SIMD extension designed by Apple, Motorola and IBM for PowerPC processors, offers a significant improvement in terms of expressivity.

Abstract: This paper describes eve (Expressive Velocity Engine), an object oriented C++ library designed to ease the process of writing efficient numerical applications using AltiVec, the SIMD extension designed by Apple, Motorola and IBM for PowerPC processors. Compared to the original AltiVec C API, eve offers a significant improvement in terms of expressivity. By relying on template metaprogramming techniques, this is not obtained at the expense of efficiency.

Proceedings Article•DOI•

Efficient data driven run-time code generation

Karine Brifault, Henri-Pierre Charles

22 Oct 2004

TL;DR: This paper introduces a low-level compiling technique based on a minimal code generator with parametric embedded sections that generates binary code at run-time; it is particularly useful for intensively reused functions in graphic applications, where the advantages of dynamic compilation have not been fully taken into account yet.

Abstract: Knowledge of data values at run-time allows us to generate better code in terms of efficiency, size and power consumption. This paper introduces a low-level compiling technique based on a minimal code generator with parametric embedded sections to generate binary code at run-time. This generator, called a "compilet", creates code and allocates registers using the data input. Then, it generates the needed instructions. Our measurements, performed on Itanium 2 and PowerPC platforms, have shown a speed improvement of 43% on the Itanium 2 platform and 41% on the PowerPC one. The proposed technique proves to be particularly useful in the case of intensively reused functions in graphic applications, where the advantages of dynamic compilation have not been fully taken into account yet.

Proceedings Article•DOI•

On correlating structural tests with functional tests for speed binning

Jing Zeng¹, M. Abadir¹•Institutions (1)

Motorola¹

05 Apr 2004

TL;DR: This paper demonstrates the correlations between the functional test frequency and that of various types of structural patterns on MPC7455, a Motorola processor executing to the PowerPC™ instruction set architecture.

Abstract: The utilization of functional vectors has been an industry standard for speed binning purposes. This practice can be prohibitively expensive as the ICs become faster and more complex. In comparison, structural patterns can target performance related faults in a more systematic manner. To make structural test an effective alternative to functional test for speed binning, structural patterns need to correlate with functional test frequency closely. In this paper, we demonstrate the correlations between the functional test frequency and that of various types of structural patterns on MPC7455, a Motorola processor executing to the PowerPC™ instruction set architecture.

IBM PowerPC 405 SEU mitigation using processor voting techniques in Xilinx Virtex-II Pro FPGA

Mandy Wang, Gary S. Bolotin

08 Sep 2004

TL;DR: At Jet Propulsion Laboratory (JPL), the feasibility of running multiple processors in a lock-step fashion to accomplish SEU mitigation and fault tolerance is demonstrated.

Abstract: Xilinx has recently developed a new field programmable gate array (FPGA) device family, Virtex-II Pro. A single device not only offers logic cell density (3K to 125K), gigabit connectivity, on-chip memory, and digital clock management, but can also have up to four IBM PowerPC 405 processor hard cores, running at up to 400 MHz and 633 Mbps. To utilize this cutting-edge device in space applications, a few Single Event Upset (SEU) mitigation techniques need to be implemented in a design for the device. At Jet Propulsion Laboratory (JPL), we have successfully demonstrated the feasibility of running multiple processors in a lock-step fashion to accomplish SEU mitigation and fault tolerance.
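
The title refers to processor voting, but the abstract does not describe the voter itself. As a generic illustration only, the sketch below shows bitwise majority voting over three redundant outputs, the usual triple-modular-redundancy pattern. The three-way arrangement, the function names, and the mismatch flag are assumptions, not details of the JPL design.

```python
def majority_vote(a, b, c):
    """Bitwise majority of three integer words: each output bit follows at least two inputs."""
    return (a & b) | (a & c) | (b & c)


def vote_with_flag(a, b, c):
    """Return the voted word plus a flag telling whether any input disagreed."""
    voted = majority_vote(a, b, c)
    mismatch = not (a == b == c)
    return voted, mismatch


# One processor output has a single flipped bit; the voter masks it.
voted_word, upset_seen = vote_with_flag(0b1010, 0b1010, 0b0010)
```

A single upset in any one of the three words is outvoted, while the mismatch flag lets the system notice that a correction happened.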

Proceedings Article•DOI•

System X: building the Virginia Tech supercomputer

S. Varadarajan

11 Oct 2004

TL;DR: The paper focuses on transparent fault tolerance for massively parallel supercomputers, scalable network emulation, compiler-directed strategies for flexible data sharing models, and routing algorithms for backbone IP networks.

Abstract: System X was conceived in March 2003, designed in July 2003, and by October it had achieved a sustained performance of 10.28 Teraflops, making it the third fastest supercomputer in the world today. System X has several novel features. First, it is based on an Apple G5 platform with the new IBM PowerPC 970 64-bit CPUs. Second, it uses a high-performance switched communications fabric called InfiniBand. Finally, System X is cooled by a hybrid liquid-air cooling system. In this paper, the author presents the motivation for System X, its architecture, and the challenges faced in building, deploying, and maintaining a large-scale supercomputer. The paper focuses on transparent fault tolerance for massively parallel supercomputers, scalable network emulation, compiler-directed strategies for flexible data sharing models, and routing algorithms for backbone IP networks.

Proceedings Article•DOI•

Development of BGA solution for the IBM PowerPC 970 module in Apple's Power Mac G5

David L. Edwards¹, H. Chambers, Mukta G. Farooq, L. Goldmann, A. Salehi•Institutions (1)

IBM¹

01 Jun 2004

TL;DR: This paper describes how through a cooperative effort between Apple and IBM, a BGA reliability enhancement was evaluated and successfully implemented, which strengthens the BGA connections between the processor module and the processor card and increases long term reliability performance affected by creep and cyclic fatigue.

Abstract: Apple's Power Mac G5 systems use either one or two IBM PowerPC 970 chips. Initial systems built with the PowerPC 970 64-bit processor run at speeds up to 2.0 GHz. These chips are packaged on IBM ceramic BGA (ball grid array) modules. The high-performance modules dissipate high power, which presents new packaging challenges. One of these challenges has been addressed successfully by improving the thermo-mechanical integrity of the solder interconnections between the chip carrier module and the organic processor board. The PowerPC 970 chip dissipates high power in a small area and is aggressively cooled using a state-of-the-art heatsink design. This paper describes how, through a cooperative effort between Apple and IBM, a BGA reliability enhancement was evaluated and successfully implemented. Use of BGA underfill strengthens the BGA connections between the processor module and the processor card and increases long-term reliability performance affected by creep and cyclic fatigue.

CFD and EDA tools: The Interoperability of FLOTHERM® and Board Station®/AutoTherm®: Concurrent Design of a Freescale PowerPC™ RISC Microprocessor-based Microcomputer

Gary Kromann, Vincent Pimont, Steve Addison

01 Jan 2004

TL;DR: In this paper, the thermal aspects of this concurrent process that required the use of a board-level (AutoTherm from Mentor Graphics) and system-level thermal analysis tool (FLOTHERM from Flomerics) were discussed.

Abstract: This paper discusses an attempt to bring thermal analysis early into the printed-circuit board design process when designing Motorola’s PowerPC 603™ and PowerPC 604™ microprocessor-based desktop system. The goal was to assess a methodology that should help to define a truly concurrent design process for future projects. We emphasize here the thermal aspects of this concurrent process, which required the use of a board-level thermal analysis tool (AutoTherm from Mentor Graphics) and a system-level thermal analysis tool (FLOTHERM from Flomerics). After describing the project and the dataflow currently available between AutoTherm and FLOTHERM, we describe the practical steps that were carried out in this project and how thermal design was finally included as one of the constraints during the component placement phase of the printed-circuit board design. Overall, the experience gained through this project on multi-level thermal analysis, as well as on working in a cross-functional team environment, is presented. Also presented are the steps for implementing such a concurrent design flow.

Patent•

Method for keeping Bootrom and VxWorks images compiling and running normally

Wu Xingmei

10 Nov 2004

TL;DR: In this article, the authors propose a method that keeps Bootrom and VxWorks images compiling and running normally: it redefines and redistributes the address space for RAM_LOW_ADRS, RAM_HIGH_ADRS, etc. in the two images, and accordingly handles the change to the system's own memory pool.

Abstract: The invention provides a method that keeps Bootrom and VxWorks images compiling and running normally. It redefines and redistributes the address space for RAM_LOW_ADRS, RAM_HIGH_ADRS, etc. in the two images, and accordingly handles the change to the system's own memory pool, so that when the code of the two images in products that use a PowerPC-series CPU is longer than 32M, the images can still be compiled and run normally.
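
The kind of address-space check implied by this abstract can be sketched as follows. This is an illustration under assumptions, not the patented method: it assumes the common VxWorks convention that the VxWorks image is loaded at RAM_LOW_ADRS and that the boot ROM image occupies RAM starting at RAM_HIGH_ADRS, so the image must end at or below RAM_HIGH_ADRS. The helper `image_fits` and all addresses and sizes are hypothetical.

```python
MIB = 1 << 20  # one mebibyte


def image_fits(ram_low_adrs: int, ram_high_adrs: int, image_size: int) -> bool:
    """True if an image loaded at ram_low_adrs ends at or below ram_high_adrs."""
    return ram_low_adrs + image_size <= ram_high_adrs


# Hypothetical layout values, chosen only for illustration.
RAM_LOW_ADRS = 0x0001_0000
RAM_HIGH_ADRS_OLD = 0x0200_0000  # leaves just under 32 MiB for the image
RAM_HIGH_ADRS_NEW = 0x0400_0000  # relocated upward to make room

image_size = 40 * MIB  # an image larger than 32 MiB

fits_old = image_fits(RAM_LOW_ADRS, RAM_HIGH_ADRS_OLD, image_size)
fits_new = image_fits(RAM_LOW_ADRS, RAM_HIGH_ADRS_NEW, image_size)
```

With the hypothetical old layout the image would run into the region at RAM_HIGH_ADRS; after moving RAM_HIGH_ADRS upward it fits, which is the general effect the abstract attributes to redistributing the address space.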

Proceedings Article•DOI•

Architectural approaches for dynamic translation and reconfiguration

Brian F. Veale¹, John K. Antonio¹, Monte P. Tull¹•Institutions (1)

University of Oklahoma¹

02 Apr 2004

TL;DR: Inspired by features from both the DAISY and Crusoe™ microprocessors, a conceptual design of a dynamically reconfigurable microprocessor is given.

Abstract: A microprocessor taxonomy is introduced based on whether: (1) the hardware is static or reconfigurable and (2) the code translation process is static or dynamic. The IBM DAISY and Transmeta Crusoe™ microprocessors are reviewed. These static hardware microprocessors support a dynamic translation process to execute programs originally compiled for the PowerPC and Intel® X86 microprocessors, respectively. Inspired by features from both the DAISY and Crusoe™ microprocessors, a conceptual design of a dynamically reconfigurable microprocessor is given. Driven by the results of a preliminary study, a specific approach to designing a reconfigurable microprocessor is presented. As a part of this approach, the concept of partitioning the instruction set of a microprocessor in order to support an application, instead of partitioning the functionality of the application, is developed.
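
The two-axis taxonomy described in this abstract can be modeled as a small data structure. This is a minimal sketch: the class name `Microprocessor`, its field names, and the category strings are hypothetical, and only the placement of DAISY and Crusoe (static hardware, dynamic translation) comes from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Microprocessor:
    name: str
    reconfigurable_hardware: bool  # axis 1: static vs. reconfigurable hardware
    dynamic_translation: bool      # axis 2: static vs. dynamic code translation

    def category(self) -> str:
        """Name the taxonomy cell this processor falls into."""
        hardware = "reconfigurable" if self.reconfigurable_hardware else "static"
        translation = "dynamic" if self.dynamic_translation else "static"
        return f"{hardware} hardware, {translation} translation"


# Both reviewed processors are static hardware with dynamic translation.
daisy = Microprocessor("IBM DAISY", reconfigurable_hardware=False, dynamic_translation=True)
crusoe = Microprocessor("Transmeta Crusoe", reconfigurable_hardware=False, dynamic_translation=True)

# The conceptual design in the paper is dynamically reconfigurable; treating it as
# also using dynamic translation is an assumption made here for illustration.
proposed = Microprocessor("conceptual design", reconfigurable_hardware=True, dynamic_translation=True)
```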
