Proceedings Article

Performance and power evaluation of an in-line accelerator

Abstract
In this paper, we evaluate the performance and power of a processor-attached in-line accelerator. The accelerator provides high-performance SIMD computing and power efficiency by means of a very large register file and a set of vector multimedia extensions based on IBM's PowerPC VMX. Our experiments show significant performance improvements and power reduction compared to a baseline vector execution unit, mainly due to the drastic decrease in memory accesses that results from the software-managed locality of the very large register file. On average, total execution time is reduced by 61% and energy consumption by 55%.


Citations
Proceedings Article

Architectural perspectives of future wireless base stations based on the IBM PowerEN™ processor

TL;DR: The applicability and potential benefits of the IBM PowerEN processor in the realm of base stations for the 3G and 4G standards are studied, and the in-line universal accelerator and the PIR strategy focusing on two specific applications for base stations are evaluated.
Dissertation

Performance and power optimizations in chip multiprocessors for throughput-aware computation

TL;DR: This thesis presents innovations to improve bandwidth and power consumption in chip multiprocessors (CMPs) for throughput-aware computation: a bandwidth-optimized last-level cache (LLC), a bandwidth-optimized vector register file, and a power/performance-aware thread placement heuristic.
Dissertation

Raising the level of abstraction : simulation of large chip multiprocessors running multithreaded applications

TL;DR: This thesis proposes a simulation methodology that employs a trace-driven simulator together with a runtime system that allows the proper simulation of multithreaded applications by reproducing the timing-dependent dynamic behavior at simulation time.
References
Proceedings Article

A High-Performance SIMD Floating Point Unit for BlueGene/L: Architecture, Compilation, and Algorithm Design

TL;DR: Preliminary performance data shows that the algorithm-compiler-hardware combination delivers a significant fraction of peak floating-point performance for compute-bound kernels such as matrix multiplication, and delivers peak memory bandwidth for memory-bound kernels such as daxpy, while being largely insensitive to data alignment.
Proceedings Article

VICTORIA: VMX indirect compute technology oriented towards in-line acceleration

TL;DR: The VICTORIA PowerPC architecture is described. It is based on the iVMX accelerator technology, which extends the existing VMX architecture with indirect register addressing and opens the door for highly optimized vector algorithms that can sustain very high processing rates.