Topic

PowerPC

About: PowerPC is a research topic. Over the lifetime, 1184 publications have been published within this topic receiving 22297 citations. The topic is also known as: ppc.


Papers
Journal ArticleDOI
01 Sep 2019
TL;DR: This multi-node characterization of the Emu Chick extends an earlier single-node investigation of the memory bandwidth characteristics of the system through benchmarks like STREAM, pointer chasing, and sparse matrix-vector multiplication, and demonstrates that for many basic operations the Emu Chick can use available memory bandwidth more efficiently than a more traditional, cache-based architecture.
Abstract: The Emu Chick is a prototype system designed around the concept of migratory memory-side processing. Rather than transferring large amounts of data across power-hungry, high-latency interconnects, the Emu Chick moves lightweight thread contexts to near-memory cores before the beginning of each memory read. The current prototype hardware uses FPGAs to implement cache-less “Gossamer” cores for computational work and relies on a typical stationary core (PowerPC) to run basic operating system functions and migrate threads between nodes. In this multi-node characterization of the Emu Chick, we extend an earlier single-node investigation [1] of the memory bandwidth characteristics of the system through benchmarks like STREAM, pointer chasing, and sparse matrix-vector multiplication. We compare the Emu Chick hardware to architectural simulation and an Intel Xeon-based platform. Our results demonstrate that for many basic operations the Emu Chick can use available memory bandwidth more efficiently than a more traditional, cache-based architecture, although bandwidth usage suffers for computationally intensive workloads like SpMV. Moreover, the Emu Chick provides stable, predictable performance with up to 65% of the peak bandwidth utilization on a random-access pointer chasing benchmark with weak locality.
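The pointer-chasing benchmark mentioned above has a simple structure: build a random cyclic permutation in memory so that every load depends on the result of the previous one, defeating hardware prefetching and exposing raw memory latency. The sketch below illustrates the general idea only; the function names and sizes are our own, not the Emu benchmark's.

```python
import random

def make_chain(n, seed=0):
    """Build a single random cycle over n slots: nxt[i] is the next index."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    nxt = [0] * n
    for i in range(n):
        nxt[order[i]] = order[(i + 1) % n]  # link each slot to the next in the cycle
    return nxt

def chase(nxt, steps):
    """Follow the chain: each load depends on the previous one (no ILP, no prefetch)."""
    i = 0
    for _ in range(steps):
        i = nxt[i]
    return i
```

Because the chain is one cycle covering all slots, chasing `n` steps returns to the start; the interesting measurement is the time per step, which on a cache-based machine is dominated by the miss latency once the chain exceeds cache capacity.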

8 citations

01 Jan 2002
TL;DR: A performance analysis of SMT in a PowerPC-based wide superscalar processor architecture under a broad range of workloads, which includes combinations of TPC-C, SPECint and SPECfp is presented.
Abstract: Simultaneous multithreading (SMT) is an approach to address the well-known problems of memory accesses increasingly dominating processor execution time and of limited instruction level parallelism. Previous research has explored the benefits and limitations of SMT based on specific processor architectures under a variety of workloads. In this paper, we present a performance analysis of SMT in a PowerPC-based wide superscalar processor architecture under a broad range of workloads, which includes combinations of TPC-C, SPECint and SPECfp. Although some of our results are consistent with previous work, our results also demonstrate some differences and we use these results to explore and identify the primary causes of such differences. This includes an investigation of thread characteristics that work well together in SMT environments, and thread characteristics that do not work well together.

8 citations

Proceedings ArticleDOI
02 Dec 2001
TL;DR: It is found that two multithreaded Java server benchmarks have generally the same characteristics on both platforms: in particular, high instruction cache, ITLB, and BTAC (Branch Target Address Cache) miss rates.
Abstract: Java has, in recent years, become fairly popular as a platform for commercial servers. However, the behavior of Java server applications has not been studied extensively. We characterize two multithreaded Java server benchmarks, SPECjbb2000 and VolanoMark 2.1.2, on two IBM PowerPC architectures, the RS64-III and the POWER3-II, and compare them to more traditional workloads as represented by selected benchmarks from SPECint2000. We find that our Java server benchmarks have generally the same characteristics on both platforms: in particular, high instruction cache, ITLB, and BTAC (Branch Target Address Cache) miss rates. These benchmarks also exhibit high L2 miss rates, due mostly to loads. As one would expect, instruction cache and L2 misses are primary contributors to CPI. Also, the proportion of zero-dispatch cycles is high, indicating the difficulty of exploiting ILP for these workloads.

8 citations

Proceedings ArticleDOI
18 May 2009
TL;DR: This work investigates how bioinformatics applications benefit from parallel architectures that combine different alternatives to exploit coarse- and fine-grain parallelism, and shows that a shared-memory architecture like the PowerPC 970MP of MareNostrum can surpass a heterogeneous machine like the current Cell BE.
Abstract: The exponential growth of databases that contain biological information (such as protein and DNA data) demands great efforts to improve the performance of computational platforms. In this work we investigate how bioinformatics applications benefit from parallel architectures that combine different alternatives to exploit coarse- and fine-grain parallelism. As a case study, we analyze the performance behavior of the Ssearch application, which implements the Smith-Waterman algorithm, a dynamic programming approach that explores the similarity between a pair of sequences. The inherent large parallelism of the algorithm makes it ideal for architectures supporting multiple dimensions of parallelism (TLP, DLP, and ILP). We study how this algorithm can take advantage of different parallel machines such as the SGI Altix, IBM Power6, Cell BE machines, and MareNostrum. Our results show that a shared-memory architecture like the PowerPC 970MP of MareNostrum can surpass a heterogeneous machine like the current Cell BE. Our quantitative analysis includes not only a study of performance scalability in terms of speedup, but also an analysis of the bottlenecks in the execution of the application, carried out through a study of the execution phases that the application presents.
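The Smith-Waterman recurrence behind Ssearch is compact: each cell of a scoring matrix takes the maximum of zero, a diagonal (mis)match step, and two gap steps, and the best local alignment score is the matrix maximum. The sketch below uses a linear gap penalty and illustrative scoring parameters; the actual Ssearch implementation uses affine gaps and substitution matrices.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b
    (linear gap penalty; parameters are illustrative)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best alignment ending at (i, j)
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                       # restart: local alignment floor
                          H[i - 1][j - 1] + score, # diagonal: align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,       # gap in b
                          H[i][j - 1] + gap)       # gap in a
            best = max(best, H[i][j])
    return best
```

The parallelism the abstract refers to comes from the data dependences: each anti-diagonal of H depends only on the previous two, so its cells can be computed concurrently (DLP/TLP), while independent database sequences can be scored in parallel across cores.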

8 citations

Journal ArticleDOI
TL;DR: A hierarchical, hybrid software-cache architecture designed to enable prefetch techniques, including automatic prefetch and modulo scheduling transformations, that can achieve performance on the Cell BE processor similar to that of a modern server-class multicore such as the IBM PowerPC 970MP processor for a set of parallel NAS applications.
Abstract: Ease of programming is one of the main requirements for the broad acceptance of multicore systems without hardware support for transparent data transfer between local and global memories. Software cache is a robust approach to provide the user with a transparent view of the memory architecture; but this software approach can suffer from poor performance. In this paper, we propose a hierarchical, hybrid software-cache architecture that targets enabling prefetch techniques. Memory accesses are classified at compile time into two classes: high locality and irregular. Our approach then steers the memory references toward one of two specific cache structures optimized for their respective access pattern. The specific cache structures are optimized to enable high-level compiler optimizations to aggressively unroll loops, reorder cache references, and/or transform surrounding loops so as to practically eliminate the software-cache overhead in the innermost loop. The cache design enables automatic prefetch and modulo scheduling transformations. Performance evaluation indicates that optimized software-cache structures combined with the proposed prefetch techniques translate into speedups of between 10 and 20 percent. As a result of the proposed technique, we can achieve similar performance on the Cell BE processor as on a modern server-class multicore such as the IBM PowerPC 970MP processor for a set of parallel NAS applications.
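The motivation for steering high-locality and irregular references to different structures can be seen with a toy hit/miss model: a long-line, direct-mapped structure serves strided streams with one miss per line, while irregular references over a large footprint would thrash it. All sizes and names below are illustrative, not the paper's design.

```python
class CacheModel:
    """Toy direct-mapped software-cache tag tracker (hit/miss counting only)."""
    def __init__(self, line_words, num_lines):
        self.line_words = line_words
        self.tags = [None] * num_lines
        self.hits = self.misses = 0

    def access(self, addr):
        line = addr // self.line_words   # which memory line holds this word
        slot = line % len(self.tags)     # direct-mapped slot for that line
        if self.tags[slot] == line:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[slot] = line       # model fetching the line from global memory

# A sequential, high-locality stream: one miss per line, every other access hits.
regular = CacheModel(line_words=32, num_lines=32)
for addr in range(1024):
    regular.access(addr)
```

Here the 1024-word stream touches 32 lines, so the model records 32 misses and 992 hits; a random-address stream over a large footprint would miss on nearly every access, which is why a separate structure tuned for irregular references pays off.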

8 citations


Network Information
Related Topics (5)
Scalability: 50.9K papers, 931.6K citations (77% related)
CMOS: 81.3K papers, 1.1M citations (77% related)
Software: 130.5K papers, 2M citations (77% related)
Integrated circuit: 82.7K papers, 1M citations (76% related)
Cache: 59.1K papers, 976.6K citations (76% related)
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  2
2022  6
2021  5
2020  8
2019  16
2018  23