Open Access

Federation: Out-of-Order Execution using Simple In-Order Cores

TL;DR
Federating each pair of neighboring, scalar cores provides a scalable, energy-efficient, and area-efficient solution for limited thread counts, with the ability to boost performance across a wide range of thread counts, until thread count returns to a level at which the baseline, multithreaded, "throughput mode" can resume.
Abstract
Manycore architectures with dozens, hundreds, or thousands of threads are likely to use single-issue, in-order execution cores with simple pipelines but multiple thread contexts per core. This approach is beneficial for throughput, but only with thread counts high enough to keep most thread contexts occupied. If these manycore architectures are not to be limited to niches with embarrassing levels of parallelism, they must cope with the case when thread count is limited: too many threads for dedicated, high-performance cores (which come at high area cost), but too few to exploit the huge number of thread contexts. The only solution is to augment the simple, scalar cores. This paper describes how to create an out-of-order processor on the fly by "federating" each pair of neighboring, scalar cores. This adds a few new structures between each pair but otherwise repurposes the existing cores. It can be accomplished with less than 2KB of extra hardware per pair, nearly doubling the performance of a single, scalar core and approaching that of a traditional, dedicated 2-way out-of-order core. The key insights that make this possible are the use of the large number of registers in multi-threaded scalar cores to support out-of-order execution and the removal of large, associative structures. Federation provides a scalable, energy-efficient, and area-efficient solution for limited thread counts, with the ability to boost performance across a wide range of thread counts, until thread count returns to a level at which the baseline, multithreaded, "throughput mode" can resume.


Citations
Proceedings ArticleDOI

Federation: repurposing scalar cores for out-of-order instruction issue

TL;DR: Presents a way to repurpose a pair of scalar cores into a 2-way out-of-order issue core with minimal area overhead; the federated core achieves performance comparable to a dedicated out-of-order core while dissipating less power.
Journal ArticleDOI

Scaling Power and Performance via Processor Composability

TL;DR: The study shows that composing multiple dual-issue cores (up to eight) provides performance scaling that is as energy-efficient as frequency scaling in a balanced microarchitecture, and considerably more efficient than scaling the voltage to achieve additional performance once the maximum frequency at the minimum voltage is attained.
Journal ArticleDOI

Multitasking workload scheduling on flexible core chip multiprocessors

TL;DR: This paper describes a new resource allocation and scheduling problem — determining how many logical processors should be configured, how powerful each processor should be, and where/when each task should run — and examines and evaluates several algorithms appropriate for such flexible-core CMPs.
Proceedings ArticleDOI

Strategies for mapping dataflow blocks to distributed hardware

TL;DR: By choosing an appropriate runtime block-mapping strategy, average performance can be increased by 18% while simultaneously reducing average operand communication by 70%, saving energy as well as improving performance.
Proceedings ArticleDOI

Multitasking workload scheduling on flexible-core chip multiprocessors

TL;DR: Flexible-core CMPs introduce a new resource allocation and scheduling problem which must determine how many logical processors should be configured, how powerful each processor should be, and where/when each task should run.
References
Proceedings ArticleDOI

Wattch: a framework for architectural-level power analysis and optimizations

TL;DR: Presents Wattch, a framework for analyzing and optimizing microprocessor power dissipation at the architecture level; it opens up the field of power-efficient computing to a wider range of researchers by providing a power-evaluation methodology within the portable and familiar SimpleScalar framework.

The Landscape of Parallel Computing Research: A View from Berkeley

TL;DR: Frames the parallel landscape with seven questions and recommends exploring the design space rapidly: the overarching goal should be to make it easy to write programs that execute efficiently on highly parallel computing systems, and the target should be thousands of cores per chip, as these chips are built from processing elements that are the most efficient in MIPS (Million Instructions Per Second) per watt, MIPS per area of silicon, and MIPS per development dollar.
Journal ArticleDOI

A Survey of General-Purpose Computation on Graphics Hardware

TL;DR: This report describes, summarizes, and analyzes the latest research in mapping general-purpose computation to graphics hardware.
Proceedings ArticleDOI

Automatically characterizing large scale program behavior

TL;DR: This work quantifies the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explores the large-scale behavior of several programs, and develops a set of clustering-based algorithms capable of analyzing this behavior.
Journal ArticleDOI

Niagara: a 32-way multithreaded Sparc processor

TL;DR: The Niagara processor implements a thread-rich architecture designed to provide a high-performance solution for commercial server applications; it exploits the thread-level parallelism inherent to server workloads while targeting low power consumption.