Topic

PowerPC

About: PowerPC is a research topic. Over the lifetime, 1184 publications have been published within this topic receiving 22297 citations. The topic is also known as: ppc.


Papers
Journal ArticleDOI
01 Sep 2019
TL;DR: This multi-node characterization of the Emu Chick extends an earlier single-node investigation of the memory bandwidth characteristics of the system through benchmarks like STREAM, pointer chasing, and sparse matrix-vector multiplication, and demonstrates that for many basic operations the Emu Chick can use available memory bandwidth more efficiently than a more traditional, cache-based architecture.
Abstract: The Emu Chick is a prototype system designed around the concept of migratory memory-side processing. Rather than transferring large amounts of data across power-hungry, high-latency interconnects, the Emu Chick moves lightweight thread contexts to near-memory cores before the beginning of each memory read. The current prototype hardware uses FPGAs to implement cache-less “Gossamer” cores for computational work and relies on a typical stationary core (PowerPC) to run basic operating system functions and migrate threads between nodes. In this multi-node characterization of the Emu Chick, we extend an earlier single-node investigation [1] of the memory bandwidth characteristics of the system through benchmarks like STREAM, pointer chasing, and sparse matrix-vector multiplication. We compare the Emu Chick hardware to architectural simulation and an Intel Xeon-based platform. Our results demonstrate that for many basic operations the Emu Chick can use available memory bandwidth more efficiently than a more traditional, cache-based architecture, although bandwidth usage suffers for computationally intensive workloads like SpMV. Moreover, the Emu Chick provides stable, predictable performance with up to 65% of the peak bandwidth utilization on a random-access pointer chasing benchmark with weak locality.
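The pointer-chasing benchmark mentioned above has a simple structure: build a random cyclic permutation in memory so that every load depends on the result of the previous one, defeating hardware prefetching and exposing raw memory latency. The sketch below illustrates the general idea only; the function names and sizes are our own, not the Emu benchmark's.

```python
import random

def make_chain(n, seed=0):
    """Build a single random cycle over n slots: nxt[i] is the next index."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    nxt = [0] * n
    for i in range(n):
        nxt[order[i]] = order[(i + 1) % n]  # link each slot to the next in the cycle
    return nxt

def chase(nxt, steps):
    """Follow the chain: each load depends on the previous one (no ILP, no prefetch)."""
    i = 0
    for _ in range(steps):
        i = nxt[i]
    return i
```

Because the chain is one cycle covering all slots, chasing `n` steps returns to the start; the interesting measurement is the time per step, which on a cache-based machine is dominated by the miss latency once the chain exceeds cache capacity.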

8 citations

01 Jan 2002
TL;DR: A performance analysis of SMT in a PowerPC-based wide superscalar processor architecture under a broad range of workloads, which includes combinations of TPC-C, SPECint and SPECfp is presented.
Abstract: Simultaneous multithreading (SMT) is an approach to address the well-known problems of memory accesses increasingly dominating processor execution time and of limited instruction level parallelism. Previous research has explored the benefits and limitations of SMT based on specific processor architectures under a variety of workloads. In this paper, we present a performance analysis of SMT in a PowerPC-based wide superscalar processor architecture under a broad range of workloads, which includes combinations of TPC-C, SPECint and SPECfp. Although some of our results are consistent with previous work, our results also demonstrate some differences and we use these results to explore and identify the primary causes of such differences. This includes an investigation of thread characteristics that work well together in SMT environments, and thread characteristics that do not work well together.

8 citations

Proceedings ArticleDOI
02 Dec 2001
TL;DR: It is found that two multithreaded Java server benchmarks have generally the same characteristics on both platforms: in particular, high instruction cache, ITLB, and BTAC (Branch Target Address Cache) miss rates.
Abstract: Java has, in recent years, become fairly popular as a platform for commercial servers. However, the behavior of Java server applications has not been studied extensively. We characterize two multithreaded Java server benchmarks, SPECjbb2000 and VolanoMark 2.1.2, on two IBM PowerPC architectures, the RS64-III and the POWER3-II, and compare them to more traditional workloads as represented by selected benchmarks from SPECint2000. We find that our Java server benchmarks have generally the same characteristics on both platforms: in particular, high instruction cache, ITLB, and BTAC (Branch Target Address Cache) miss rates. These benchmarks also exhibit high L2 miss rates, due mostly to loads. As one would expect, instruction cache and L2 misses are primary contributors to CPI. Also, the proportion of zero-dispatch cycles is high, indicating the difficulty of exploiting ILP for these workloads.

8 citations

Proceedings ArticleDOI
18 May 2009
TL;DR: This work investigates how bioinformatics applications benefit from parallel architectures that combine different alternatives to exploit coarse- and fine-grain parallelism, and shows that a shared-memory architecture like the PowerPC 970MP of MareNostrum can surpass a heterogeneous machine like the current Cell BE.
Abstract: The exponential growth of databases that contain biological information (such as protein and DNA data) demands great efforts to improve the performance of computational platforms. In this work we investigate how bioinformatics applications benefit from parallel architectures that combine different alternatives to exploit coarse- and fine-grain parallelism. As a case study, we analyze the performance behavior of the Ssearch application, which implements the Smith-Waterman algorithm, a dynamic programming approach that explores the similarity between a pair of sequences. The inherent large parallelism of the algorithm makes it ideal for architectures supporting multiple dimensions of parallelism (TLP, DLP, and ILP). We study how this algorithm can take advantage of different parallel machines such as the SGI Altix, IBM Power6, Cell BE machines, and MareNostrum. Our results show that a shared-memory architecture like the PowerPC 970MP of MareNostrum can surpass a heterogeneous machine like the current Cell BE. Our quantitative analysis includes not only a study of performance scalability in terms of speedup, but also an analysis of the bottlenecks in the execution of the application, carried out through a study of the execution phases that the application presents.
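The Smith-Waterman recurrence behind Ssearch is compact: each cell of a scoring matrix takes the maximum of zero, a diagonal (mis)match step, and two gap steps, and the best local alignment score is the matrix maximum. The sketch below uses a linear gap penalty and illustrative scoring parameters; the actual Ssearch implementation uses affine gaps and substitution matrices.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between strings a and b
    (linear gap penalty; parameters are illustrative)."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best alignment ending at (i, j)
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            score = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                       # restart: local alignment floor
                          H[i - 1][j - 1] + score, # diagonal: align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,       # gap in b
                          H[i][j - 1] + gap)       # gap in a
            best = max(best, H[i][j])
    return best
```

The parallelism the abstract refers to comes from the data dependences: each anti-diagonal of H depends only on the previous two, so its cells can be computed concurrently (DLP/TLP), while independent database sequences can be scored in parallel across cores.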

8 citations

Journal ArticleDOI
TL;DR: A hierarchical, hybrid software-cache architecture designed to enable prefetch techniques, including automatic prefetch and modulo scheduling transformations, that can achieve performance on the Cell BE processor similar to that of a modern server-class multicore such as the IBM PowerPC 970MP processor for a set of parallel NAS applications.
Abstract: Ease of programming is one of the main requirements for the broad acceptance of multicore systems without hardware support for transparent data transfer between local and global memories. Software cache is a robust approach to provide the user with a transparent view of the memory architecture; but this software approach can suffer from poor performance. In this paper, we propose a hierarchical, hybrid software-cache architecture that targets enabling prefetch techniques. Memory accesses are classified at compile time into two classes: high locality and irregular. Our approach then steers the memory references toward one of two specific cache structures optimized for their respective access pattern. The specific cache structures are optimized to enable high-level compiler optimizations to aggressively unroll loops, reorder cache references, and/or transform surrounding loops so as to practically eliminate the software-cache overhead in the innermost loop. The cache design enables automatic prefetch and modulo scheduling transformations. Performance evaluation indicates that optimized software-cache structures combined with the proposed prefetch techniques translate into speedups of between 10 and 20 percent. As a result of the proposed technique, we can achieve similar performance on the Cell BE processor as on a modern server-class multicore such as the IBM PowerPC 970MP processor for a set of parallel NAS applications.
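The motivation for steering high-locality and irregular references to different structures can be seen with a toy hit/miss model: a long-line, direct-mapped structure serves strided streams with one miss per line, while irregular references over a large footprint would thrash it. All sizes and names below are illustrative, not the paper's design.

```python
class CacheModel:
    """Toy direct-mapped software-cache tag tracker (hit/miss counting only)."""
    def __init__(self, line_words, num_lines):
        self.line_words = line_words
        self.tags = [None] * num_lines
        self.hits = self.misses = 0

    def access(self, addr):
        line = addr // self.line_words   # which memory line holds this word
        slot = line % len(self.tags)     # direct-mapped slot for that line
        if self.tags[slot] == line:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[slot] = line       # model fetching the line from global memory

# A sequential, high-locality stream: one miss per line, every other access hits.
regular = CacheModel(line_words=32, num_lines=32)
for addr in range(1024):
    regular.access(addr)
```

Here the 1024-word stream touches 32 lines, so the model records 32 misses and 992 hits; a random-address stream over a large footprint would miss on nearly every access, which is why a separate structure tuned for irregular references pays off.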

8 citations


Network Information
Related Topics (5)
Scalability: 50.9K papers, 931.6K citations (77% related)
CMOS: 81.3K papers, 1.1M citations (77% related)
Software: 130.5K papers, 2M citations (77% related)
Integrated circuit: 82.7K papers, 1M citations (76% related)
Cache: 59.1K papers, 976.6K citations (76% related)
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  2
2022  6
2021  5
2020  8
2019  16
2018  23