Journal Article
Cell broadband engine architecture and its first implementation: a performance view
TL;DR: It is shown that the Cell Broadband Engine (Cell/B.E.) processor can outperform other modern processors by approximately an order of magnitude, and by even more in some cases.
Abstract:
The Cell Broadband Engine™ (Cell/B.E.) processor is the first implementation of the Cell Broadband Engine Architecture (CBEA), developed jointly by Sony, Toshiba, and IBM. In addition to use of the Cell/B.E. processor in the Sony Computer Entertainment PLAYSTATION® 3 system, there is much interest in using it for workstations, media-rich electronics devices, and video and image processing systems. The Cell/B.E. processor includes one PowerPC® processor element (PPE) and eight synergistic processor elements (SPEs). The CBEA is designed to be well suited for a wide variety of programming models, and it allows for partitioning of work between the PPE and the eight SPEs. In this paper we show that the Cell/B.E. processor can outperform other modern processors by approximately an order of magnitude and by even more in some cases.
Citations
Proceedings Article
The potential of the cell processor for scientific computing
TL;DR: This work introduces a performance model for Cell and applies it to several key scientific computing kernels: dense matrix multiply, sparse matrix vector multiply, stencil computations, and 1D/2D FFTs, and proposes modest microarchitectural modifications that could significantly increase the efficiency of double-precision calculations.
Proceedings Article
Entering the petaflop era: the architecture and performance of Roadrunner
Kevin J. Barker, Kei Davis, Adolfy Hoisie, Darren J. Kerbyson, Michael Lang, Scott Pakin, José Carlos Sancho +6 more
TL;DR: A detailed architectural description of Roadrunner and a detailed performance analysis of the system are presented and a case study of optimizing the MPI-based application Sweep3D to exploit Roadrunner's hybrid architecture is also included.
Journal Article
State-of-the-art in heterogeneous computing
TL;DR: In this paper, the authors present an overview of the state-of-the-art in heterogeneous computing, focusing on three commonly found architectures: the Cell Broadband Engine Architecture, graphics processing units (GPUs), and field programmable gate arrays (FPGAs).
The place and role of innovative technologies in mathematics lessons
TL;DR: In this paper, the authors proposed a method to compute the probability of a given node having a negative value for a given value of 0, i.e., a node having no negative value is 0.
Proceedings Article
CudaDMA: optimizing GPU memory bandwidth via warp specialization
TL;DR: This work proposes an approach for programming GPUs with tightly-coupled specialized DMA warps for performing memory transfers between on-chip and off-chip memories, and presents an extensible API, CudaDMA, that encapsulates synchronization and common sequential and strided data transfer patterns.
References
Journal Article
A Fast Computational Algorithm for the Discrete Cosine Transform
TL;DR: A Fast Discrete Cosine Transform algorithm has been developed which provides a factor of six improvement in computational complexity when compared to conventional Discrete Cosine Transform algorithms using the Fast Fourier Transform.
Journal Article
Introduction to the cell multiprocessor
TL;DR: This paper discusses the history of the project, the program objectives and challenges, the design concept, the architecture and programming models, and the implementation of the Cell multiprocessor.
Journal Article
Synergistic Processing in Cell's Multicore Architecture
TL;DR: The streamlined architecture provides an efficient multithreaded execution environment for both scalar and SIMD threads and represents a reaffirmation of the RISC principles of combining leading edge architecture and compiler optimizations.
Proceedings Article
Optimizing Compiler for the CELL Processor
Alexandre E. Eichenberger, Kevin John Patrick O'Brien, Peng Wu, Tong Chen, Peter Howland Oden, Daniel A. Prener, J.C. Shepherd, Byoungro So, Zehra Sura, Amy Wang, Tao Zhang, Peng Zhao, Michael K. Gschwind +12 more
TL;DR: Several compiler techniques that aim to automatically generate high-quality code across the wide range of heterogeneous parallelism available on the CELL processor are described, and results indicate that significant speedup can be achieved with a high level of support from the compiler.
Journal Article
Software libraries for linear algebra computations on high performance computers
Jack Dongarra, David W. Walker +1 more
TL;DR: This paper discusses the design of linear algebra libraries for high performance computers, with particular emphasis on the development of scalable algorithms for multiple instruction multiple data (MIMD) distributed memory concurrent computers.