Topic

PowerPC

About: PowerPC is a research topic. Over its lifetime, 1,184 publications have been published within this topic, receiving 22,297 citations. The topic is also known as: ppc.


Papers
Proceedings ArticleDOI
14 Apr 2008
TL;DR: Preliminary work is presented on a domain-specific compiler that generates implementations for arbitrary sequences of basic linear algebra operations and tunes them for memory efficiency.
Abstract: The performance bottleneck for many scientific applications is the cost of memory access inside linear algebra kernels. Tuning such kernels for memory efficiency is a complex task that reduces the productivity of computational scientists. Software libraries such as the Basic Linear Algebra Subprograms (BLAS) ameliorate this problem by providing a standard interface for which computer scientists and hardware vendors have created highly tuned implementations. Scientific applications often require a sequence of BLAS operations, which presents further opportunities for memory optimization. However, because BLAS routines are tuned in isolation, they do not take advantage of these opportunities. This phenomenon motivated the recent addition to the BLAS of several routines that perform sequences of operations. Unfortunately, the exact sequence of operations needed in a given situation is highly application dependent, so many more routines are needed. In this paper we present preliminary work on a domain-specific compiler that generates implementations for arbitrary sequences of basic linear algebra operations and tunes them for memory efficiency. We report experimental results for dense kernels and show speedups of 25% to 120% relative to sequences of calls to GotoBLAS and vendor-tuned BLAS on Intel Xeon and IBM PowerPC platforms.
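The memory-traffic argument is easy to see in code. Below is a minimal C sketch (not output from the paper's compiler) contrasting a sequence of two CBLAS calls with a hand-fused loop for y = alpha*x + y followed by s = dot(y, z); the function names and the fused variant are illustrative. It assumes a CBLAS installation (link with -lcblas or a vendor BLAS).

```c
#include <stddef.h>
#include <cblas.h>

/* Two library calls: daxpy writes all of y to memory, then ddot
   reads it all back, costing roughly 2n extra memory traffic for y. */
double axpy_then_dot(size_t n, double alpha,
                     const double *x, double *y, const double *z)
{
    cblas_daxpy((int)n, alpha, x, 1, y, 1);
    return cblas_ddot((int)n, y, 1, z, 1);
}

/* Fused loop: each y[i] is produced and consumed while still in a
   register, so y is streamed through memory only once. */
double axpy_dot_fused(size_t n, double alpha,
                      const double *x, double *y, const double *z)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double yi = alpha * x[i] + y[i];
        y[i] = yi;
        s += yi * z[i];
    }
    return s;
}
```

The paper's compiler automates exactly this kind of fusion for arbitrary operation sequences, which hand-written BLAS calls cannot express.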

48 citations

Journal ArticleDOI
TL;DR: A generic framework is introduced for defining instructions, programs, and the semantics of their instantiation by operations in a multiprocessor environment; it allows an architect to reveal the programming view induced by a shared-memory architecture and guides architecture-level verification.
Abstract: This paper introduces a generic framework for defining instructions, programs, and the semantics of their instantiation by operations in a multiprocessor environment. The framework captures information flow between operations in a multiprocessor program by means of a reads-from mapping from read operations to write operations. Two fundamental relations are defined on the operations: a program order between operations which instantiate the program of some processor, and view orders which are specific to each shared memory model. An operation cannot read from the "hidden" past or from the future; the future and the past causality can be examined either relative to the program order or relative to the view orders. A shared memory model specifies, for a given program, the permissible transformations of resource states. The memory model should reflect the programmer's view by citing the guaranteed behavior of the multiprocessor in the interface visible to the programmer. The model should refrain from dictating the design practices that should be followed by the implementation. Our framework allows an architect to reveal the programming view induced by a shared-memory architecture; it serves programmers exploring the limits of the programming interface and guides architecture-level verification. The framework is applicable to complex, commercial architectures as it can capture subtle programming-interface details, exposing the underlying aggressive microarchitecture mechanisms. As an illustration, we define the shared memory model supported by the PowerPC architecture within our framework.
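The kind of subtlety such a framework must capture can be illustrated with the classic message-passing litmus test. The C11 sketch below is a general illustration of weak ordering, not the paper's formalism: with relaxed atomics, PowerPC's memory model permits the reader to observe the flag set yet still read stale data, i.e., the second load "reads from" the initial write. A single run will rarely exhibit the outcome; litmus-testing tools repeat it millions of times.

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

atomic_int data, flag;   /* static storage: both start at 0 */
int r1, r2;

void *writer(void *arg) {
    (void)arg;
    atomic_store_explicit(&data, 1, memory_order_relaxed);
    /* memory_order_release here would emit the needed barrier
       (lwsync on PowerPC) before the flag store. */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
    return NULL;
}

void *reader(void *arg) {
    (void)arg;
    r1 = atomic_load_explicit(&flag, memory_order_relaxed);
    /* memory_order_acquire here would order the two loads. */
    r2 = atomic_load_explicit(&data, memory_order_relaxed);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, writer, NULL);
    pthread_create(&t2, NULL, reader, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    /* Under PowerPC's weak model, r1==1 && r2==0 is a permitted
       outcome: the reader saw the flag but read data from the
       "past" write, exactly the reads-from behavior a memory-model
       framework must specify. */
    printf("r1=%d r2=%d\n", r1, r2);
    return 0;
}
```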

48 citations

Proceedings ArticleDOI
J. Parry, H. Rosten, G.B. Kromann
28 May 1996
TL;DR: In this paper, thermal models of the PowerPC 603 and 604 microprocessors in a controlled-collapse-chip-connection/ceramic-ball-grid-array (C4/CBGA) single-chip package are derived from "detailed" three-dimensional (3-D) conduction models by both analytical and data-fitting techniques, and the behavioral correctness of these models is assessed by comparing the die-junction temperatures predicted by the compact models with the detailed-model results for a range of boundary conditions applied at the surfaces of the package.
Abstract: Thermal resistance networks or "compact" models of the PowerPC 603 and PowerPC 604 microprocessors in a controlled-collapse-chip-connection/ceramic-ball-grid-array (C4/CBGA) single-chip package are derived from "detailed" three-dimensional (3-D) conduction models of the parts by both analytical and data-fitting techniques. The behavioral correctness of these models is assessed by comparing the die-junction temperatures predicted by the compact models with the detailed-model results for a range of boundary conditions applied at the surfaces of the package. The performance of these models is then verified by comparing the detailed and compact models in an application-specific environment (a wind tunnel) using a computational fluid dynamics program. The interaction between the package and its environment is also discussed. The work reported here forms part of a long-term European research program to create and validate generic thermal models of a range of electronic parts.
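To make the compact-model idea concrete, here is a minimal C sketch of evaluating a simple two-path resistor network. All resistance and power values are illustrative placeholders, not the PowerPC 603/604 data from the paper, and the two-path topology is a simplification of the networks the authors derive.

```c
#include <stdio.h>

/* Two thermal paths act like parallel electrical resistors. */
static double parallel(double a, double b) { return (a * b) / (a + b); }

int main(void) {
    double power    = 4.0;    /* W, die power dissipation (assumed) */
    double t_amb    = 45.0;   /* degC, local ambient (assumed) */
    /* Top path: junction-to-case, then case-to-ambient (assumed values) */
    double theta_jc = 1.5, theta_ca = 12.0;   /* degC/W */
    /* Board path: junction-to-board, then board-to-ambient (assumed) */
    double theta_jb = 4.0, theta_ba = 20.0;   /* degC/W */

    /* Effective junction-to-ambient resistance of the network,
       then the die-junction temperature rise above ambient. */
    double theta_ja    = parallel(theta_jc + theta_ca, theta_jb + theta_ba);
    double t_junction  = t_amb + power * theta_ja;

    printf("theta_ja = %.2f degC/W, T_junction = %.1f degC\n",
           theta_ja, t_junction);
    return 0;
}
```

The paper's contribution is fitting such small networks so that they reproduce the detailed 3-D model's junction temperatures across many boundary conditions, which is what makes them usable inside a system-level CFD simulation.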

47 citations

Proceedings ArticleDOI
26 Oct 2003
TL;DR: The system overview of the Java Just-In-Time (JIT) compiler is described; it is the basis for the latest production version of the IBM Java JIT compiler, which supports a diversity of processor architectures including both 32-bit and 64-bit modes and CISC, RISC, and VLIW architectures.
Abstract: This paper describes the system overview of our Java Just-In-Time (JIT) compiler, which is the basis for the latest production version of the IBM Java JIT compiler and supports a diversity of processor architectures including both 32-bit and 64-bit modes, CISC, RISC, and VLIW architectures. In particular, we focus on the design and evaluation of the cross-platform optimizations that are common across different architectures. We studied the effectiveness of each optimization by selectively disabling it in our JIT compiler on three different platforms: IA-32, IA-64, and PowerPC. Our detailed measurements allowed us to rank the optimizations in terms of the greatest performance improvements with the smallest compilation times. The identified set includes method inlining only for tiny methods, exception check eliminations using forward dataflow analysis and partial redundancy elimination, scalar replacement for instance and class fields using dataflow analysis, optimizations for type inclusion checks, and the elimination of merge points in the control flow graphs. These optimizations can achieve 90% of the peak performance for two industry-standard benchmark programs on these platforms with only 34% of the compilation time compared to using all of the optimizations.
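Of the optimizations listed, scalar replacement is perhaps the easiest to picture. The sketch below uses C rather than Java bytecode, and both functions are illustrative: the "before" version reloads and re-stores a field on every iteration, while the "after" version shows the shape of the transformation a JIT can apply once dataflow analysis proves the field is unobserved inside the loop.

```c
struct counter { long value; };

/* Before scalar replacement: every iteration loads and stores
   c->value through memory. */
long sum_before(struct counter *c, const long *a, int n) {
    for (int i = 0; i < n; ++i)
        c->value += a[i];
    return c->value;
}

/* After scalar replacement: the field lives in a register for the
   whole loop and is written back exactly once. */
long sum_after(struct counter *c, const long *a, int n) {
    long v = c->value;          /* scalarized copy of the field */
    for (int i = 0; i < n; ++i)
        v += a[i];
    c->value = v;               /* single write-back */
    return v;
}
```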

47 citations

Journal ArticleDOI
James E. Smith, Shlomo Weiss
TL;DR: Two RISC implementations are discussed: Digital Equipment Corporation's Alpha 21064 and the IBM/Motorola/Apple PowerPC 601; both are superscalar, that is, they can sustain execution of two or more instructions per clock cycle.
Abstract: Two RISC implementations are discussed: Digital Equipment Corporation's Alpha 21064 and the IBM/Motorola/Apple PowerPC 601. Both are superscalar implementations, that is, they can sustain execution of two or more instructions per clock cycle. Otherwise, these two implementations present vastly different philosophies for achieving high performance. The PowerPC 601 focuses on powerful instructions and great flexibility in processing order, while the Alpha 21064 depends on a very fast clock, with simpler instructions and a more streamlined implementation structure. These two RISC microprocessors exemplify contrasting, but equally valid, implementation philosophies. An overview is given of the instruction sets, and the authors emphasize the differences in design: PowerPC uses powerful instructions so that fewer are needed to get the job done; Alpha uses simple instructions so that the hardware can be kept simpler and faster. The authors also discuss the pipelined implementations of the two architectures; again, the contrast is between powerful and simple.
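The "powerful versus simple" contrast shows up even in a pointer-walking loop. The C sketch below is illustrative only; the instruction sequences in the comments are plausible hand-translations under the stated assumptions, not actual compiler output for either chip.

```c
/* Summing an array by walking a pointer. */
double sum(const double *p, int n) {
    double s = 0.0;
    while (n-- > 0)
        s += *p++;   /* PowerPC: a single load-with-update (lfdu) can
                        both fetch the element and advance p, the
                        "powerful instruction" style; Alpha has no
                        auto-update addressing, so it uses a separate
                        load (ldt) plus pointer add, each simple and
                        easy to clock fast. */
    return s;
}
```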

47 citations


Network Information
Related Topics (5)
Scalability: 50.9K papers, 931.6K citations (77% related)
CMOS: 81.3K papers, 1.1M citations (77% related)
Software: 130.5K papers, 2M citations (77% related)
Integrated circuit: 82.7K papers, 1M citations (76% related)
Cache: 59.1K papers, 976.6K citations (76% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    2
2022    6
2021    5
2020    8
2019    16
2018    23