Lightweight Chip Multi-Threading (LCMT): Maximizing Fine-Grained Parallelism On-Chip

doi:10.1109/TPDS.2010.169

Journal Article

Sheng Li, +3 more

01 Jul 2011

IEEE Transactions on Parallel and Distri...

Vol. 22, Iss. 7, pp. 1178-1191

TLDR

This work proposes a Lightweight Chip Multi-Threaded (LCMT) architecture that further exploits thread-level parallelism (TLP) by incorporating direct architectural support for an “unlimited” number of dynamically created lightweight threads with very low thread management and synchronization overhead.

Abstract:

Irregular and dynamic applications, such as graph problems and agent-based simulations, often require fine-grained parallelism to achieve good performance. However, current multicore processors provide architectural support only for coarse-grained parallelism, so fine-grained parallelism must be implemented through software-based multithreading environments. Although these environments have demonstrated superior performance over heavyweight, OS-level threads, they are still limited by the significant overhead of thread management and synchronization. To address this, we propose a Lightweight Chip Multi-Threaded (LCMT) architecture that further exploits thread-level parallelism (TLP) by incorporating direct architectural support for an “unlimited” number of dynamically created lightweight threads with very low thread management and synchronization overhead. The LCMT architecture can be implemented atop a mainstream architecture with minimal extra hardware in order to leverage existing legacy software environments. We compare the LCMT architecture with a Niagara-like baseline architecture. For irregular and dynamic benchmarks, our results show up to 1.8X better scalability, 1.91X better performance, and, more importantly, 1.74X better performance per watt than the baseline architecture. For regular benchmarks, the LCMT architecture delivers performance similar to the baseline architecture.
