Open Access
Federation: Out-of-Order Execution using Simple In-Order Cores
TLDR
Federating each pair of neighboring, scalar cores provides a scalable, energy-efficient, and area-efficient solution for limited thread counts, with the ability to boost performance across a wide range of thread counts, until thread count returns to a level at which the baseline, multithreaded, “throughput mode” can resume.
Abstract:
Manycore architectures with dozens, hundreds, or thousands of threads are likely to use single-issue, in-order execution cores with simple pipelines but multiple thread contexts per core. This approach is beneficial for throughput but only with thread counts high enough to keep most thread contexts occupied. If these manycore architectures do not want to be limited to niches with embarrassing levels of parallelism, they must cope with the case when thread count is limited: too many threads for dedicated, high-performance cores (which come at high area cost), but too few to exploit the huge number of thread contexts. The only solution is to augment the simple, scalar cores. This paper describes how to create an out-of-order processor on the fly by “federating” each pair of neighboring, scalar cores. This adds a few new structures between each pair but otherwise repurposes the existing cores. It can be accomplished with less than 2KB of extra hardware per pair, nearly doubling the performance of a single, scalar core and approaching that of a traditional, dedicated 2-way out-of-order core. The key insights that make this possible are the use of the large number of registers in multi-threaded scalar cores to support out-of-order execution and the removal of large, associative structures. Federation provides a scalable, energy-efficient, and area-efficient solution for limited thread counts, with the ability to boost performance across a wide range of thread counts, until thread count returns to a level at which the baseline, multithreaded, “throughput mode” can resume.
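The trade-off between federated and throughput operation can be illustrated with a small allocation sketch. This is not the paper's mechanism: the function `plan_federation`, the `Plan` record, and the policy itself (one thread per core while cores are spare, spare cores donated to neighbors to form federated pairs, no federation once threads equal or exceed cores) are assumptions made purely for illustration.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    federated_pairs: int  # neighboring core pairs fused into one out-of-order core
    scalar_cores: int     # cores left running as plain in-order scalar cores
    mode: str             # "federated", "throughput", or "idle"


def plan_federation(num_cores: int, num_threads: int) -> Plan:
    """Hypothetical policy: federate as many neighboring pairs as thread count allows.

    Assumes cores are grouped into fixed neighboring pairs (so num_cores is even)
    and that each federated pair runs exactly one thread.
    """
    if num_cores < 0 or num_cores % 2 != 0 or num_threads < 0:
        raise ValueError("need an even, non-negative core count and a non-negative thread count")

    # Enough threads to occupy every core: fall back to the multithreaded baseline.
    if num_threads >= num_cores:
        return Plan(federated_pairs=0, scalar_cores=num_cores, mode="throughput")

    # Each federated pair costs one extra core, so at most (cores - threads)
    # pairs can be formed, and never more pairs than there are threads.
    federated = min(num_threads, num_cores - num_threads)
    scalar = num_threads - federated
    return Plan(federated, scalar, "federated" if federated else "idle")


if __name__ == "__main__":
    for threads in (0, 2, 6, 8, 32):
        print(threads, plan_federation(8, threads))
```

Under these assumptions, an 8-core chip with 2 threads federates two pairs and leaves the rest idle, with 6 threads it federates two pairs and runs four scalar cores, and with 8 or more threads it reverts to throughput mode.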
Citations
Proceedings Article
Federation: repurposing scalar cores for out-of-order instruction issue
TL;DR: Presents a way to repurpose a pair of scalar cores into a 2-way out-of-order issue core with minimal area overhead; the resulting core achieves performance comparable to a dedicated out-of-order core while dissipating less power.
Journal Article
Scaling Power and Performance via Processor Composability
Madhu Saravana Sibi Govindan,Behnam Robatmili,Dong Li,Bertrand A. Maher,Aaron L. Smith,Stephen W. Keckler,Doug Burger +6 more
TL;DR: The study shows that composing multiple dual-issue cores (up to eight) provides performance scaling that is as energy-efficient as frequency scaling in a balanced microarchitecture, and is considerably more efficient than scaling the voltage to achieve additional performance once the maximum frequency at the minimum voltage is attained.
Journal Article
Multitasking workload scheduling on flexible core chip multiprocessors
TL;DR: This paper describes a new resource allocation and scheduling problem which must determine how many logical processors should be configured, how powerful each processor should be, and where/when each task should run, and examines and evaluates several algorithms appropriate for such flexible-core CMPs.
Proceedings Article
Strategies for mapping dataflow blocks to distributed hardware
TL;DR: By choosing an appropriate runtime block mapping strategy, average performance can be increased by 18%, while simultaneously reducing average operand communication by 70%, saving energy as well as improving performance.
Proceedings Article
Multitasking workload scheduling on flexible-core chip multiprocessors
TL;DR: Flexible-core CMPs introduce a new resource allocation and scheduling problem which must determine how many logical processors should be configured, how powerful each processor should be, and where/when each task should run.
References
Proceedings Article
Wattch: a framework for architectural-level power analysis and optimizations
TL;DR: Wattch is presented, a framework for analyzing and optimizing microprocessor power dissipation at the architecture-level and opens up the field of power-efficient computing to a wider range of researchers by providing a power evaluation methodology within the portable and familiar SimpleScalar framework.
The Landscape of Parallel Computing Research: A View from Berkeley
Krste Asanovic,Ras Bodik,Bryan Catanzaro,Joseph Gebis,Parry Husbands,Kurt Keutzer,David A. Patterson,William Plishker,John Shalf,Samuel Williams,Katherine Yelick +10 more
TL;DR: The parallel landscape is framed with seven questions, and the following are recommended to explore the design space rapidly: • The overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems • The target should be 1000s of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
Journal Article
A Survey of General-Purpose Computation on Graphics Hardware
John D. Owens,David Luebke,Naga K. Govindaraju,Mark J. Harris,Jens Krüger,Aaron Lefohn,Timothy John Purcell +6 more
TL;DR: This report describes, summarizes, and analyzes the latest research in mapping general‐purpose computation to graphics hardware.
Proceedings Article
Automatically characterizing large scale program behavior
TL;DR: This work quantifies the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explores the large scale behavior of several programs, and develops a set of algorithms based on clustering capable of analyzing this behavior.
Journal Article
Niagara: a 32-way multithreaded Sparc processor
TL;DR: The Niagara processor implements a thread-rich architecture designed to provide a high-performance solution for commercial server applications that exploits the thread-level parallelism inherent to server applications, while targeting low levels of power consumption.