Compiler-assisted data distribution for chip multiprocessors

doi:10.1145/1854273.1854335

Proceedings Article

- pp 501-512

TLDR

This paper presents a compiler-based approach for analyzing data access behavior in multi-threaded applications and shows a 20% speedup over shared caching and a 5% speedup over the closest runtime approximation, “first touch”.

Abstract:

Data access latency, a limiting factor in the performance of chip multiprocessors, grows significantly with the number of cores in non-uniform cache architectures with distributed cache banks. To mitigate this effect, it is necessary to leverage data access locality and choose an optimal data placement. Achieving this is especially challenging when other constraints such as cache capacity, coherence messages, and runtime overhead need to be considered. This paper presents a compiler-based approach for analyzing data access behavior in multi-threaded applications. The proposed experimental compiler framework employs novel compilation techniques to discover and represent multi-threaded memory access patterns (MMAPs). At run time, symbolic MMAPs are resolved and used by a partitioning algorithm to choose a partition of allocated memory blocks among the forked threads in the analyzed application. This partition is used to enforce data ownership by associating the data with the core that executes the thread owning the data. We demonstrate how this information can be used in an experimental architecture to accelerate applications. In particular, our compiler-assisted approach shows a 20% speedup over shared caching and a 5% speedup over the closest runtime approximation, "first touch".
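The idea of partitioning memory blocks among threads, and how it can differ from "first touch", can be illustrated with a minimal Python sketch. The abstract does not describe the paper's actual partitioning algorithm, so this sketch substitutes a simple stand-in heuristic: each block is owned by the thread expected to access it most often. The tuple layout for a resolved access pattern, the 64-byte block size, and all function names are assumptions made for illustration only.

```python
from collections import defaultdict

BLOCK_SIZE = 64  # bytes per memory block (assumed value, not from the paper)


def blocks_touched(start, end, block_size=BLOCK_SIZE):
    """Return the block indices covered by the byte range [start, end)."""
    if end <= start:
        return range(0)
    return range(start // block_size, (end - 1) // block_size + 1)


def partition_by_access_count(patterns, block_size=BLOCK_SIZE):
    """Assign each block to the thread with the highest expected access count.

    patterns: list of (thread_id, start, end, accesses_per_block) tuples,
    a hypothetical stand-in for access patterns resolved at run time.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for tid, start, end, weight in patterns:
        for block in blocks_touched(start, end, block_size):
            counts[block][tid] += weight
    owner = {}
    for block, per_thread in counts.items():
        # Highest count wins; ties go to the lowest thread id.
        owner[block] = min(per_thread, key=lambda t: (-per_thread[t], t))
    return owner


def partition_by_first_touch(patterns, block_size=BLOCK_SIZE):
    """Assign each block to the first thread that touches it, in list order."""
    owner = {}
    for tid, start, end, _ in patterns:
        for block in blocks_touched(start, end, block_size):
            owner.setdefault(block, tid)
    return owner


# Thread 0 touches bytes 0..127 once; thread 1 touches bytes 64..255 ten times.
patterns = [(0, 0, 128, 1), (1, 64, 256, 10)]
by_count = partition_by_access_count(patterns)
by_touch = partition_by_first_touch(patterns)
```

In this example the two policies disagree only on block 1: first touch gives it to thread 0 because thread 0 reaches it first, while the count-based heuristic gives it to thread 1, which accesses it far more often.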

Citations

Proceedings Article

Complexity-effective multicore coherence

Alberto Ros, +1 more

TL;DR: This paper shows a virtually costless coherence scheme that outperforms a MESI directory protocol while reducing shared cache and network energy consumption for 15 parallel benchmarks on 16 cores.

Patent

System and method for simplifying cache coherence using multiple write policies

Stefanos Kaxiras, +1 more

TL;DR: In this article, a multi-core cache coherence system with a local/shared cache hierarchy is described. The system includes multiple processor cores, a main memory, and a local cache memory associated with each core for storing cache lines accessible only by the associated core.

Proceedings Article

A software approach for combating asymmetries of non-volatile memories

Yong Li, +2 more

TL;DR: This paper proposes software dispatch, a cross-layer approach to distribute data to appropriate memory resources based on an application's data access characteristics, and demonstrates the application of the proposed technique through a case study system with hybrid memory caches.

Proceedings Article

Compiler support for selective page migration in NUMA architectures

Guilherme Guaglianoni Piccoli, +5 more

TL;DR: This paper proposes compiler analyses and code generation methods to support a lightweight runtime system that dynamically migrates memory pages to improve data locality. It compares the approach against Minas, a middleware that supports NUMA-aware data allocation, and shows that the approach can outperform Minas by up to 50% in some cases.

Proceedings Article

Practically private: enabling high performance CMPs through compiler-assisted data classification

Yong Li, +2 more

TL;DR: This paper develops a novel compiler-based approach to speculatively detect a third data classification, practically private, and demonstrates that practically private data is ubiquitous in parallel applications and that leveraging this classification provides opportunities to improve performance.

References

Book

Compilers: Principles, Techniques, and Tools

Alfred V. Aho, +2 more

TL;DR: This book discusses the design of a Code Generator, the role of the Lexical Analyzer, and other topics related to code generation and optimization.

Proceedings Article

The PARSEC benchmark suite: characterization and architectural implications

Christian Bienia, +3 more

TL;DR: This paper presents and characterizes the Princeton Application Repository for Shared-Memory Computers (PARSEC), a benchmark suite for studies of Chip-Multiprocessors (CMPs), and shows that the benchmark suite covers a wide spectrum of working sets, locality, data sharing, synchronization and off-chip traffic.

Journal Article

Simics: A full system simulation platform

Peter S. Magnusson, +8 more

- 01 Feb 2002 -

IEEE Computer

TL;DR: Simics is a platform for full system simulation that can run actual firmware and completely unmodified kernel and driver code, and it provides both functional accuracy for running commercial workloads and sufficient timing accuracy to interface to detailed hardware models.

Book

Compilers: Principles, Techniques, and Tools (2nd Edition)

Alfred V. Aho, +3 more

Journal Article

SUIF: an infrastructure for research on parallelizing and optimizing compilers

Robert P. Wilson, +10 more

- 01 Dec 1994 -

Sigplan Notices

TL;DR: The SUIF compiler has been built into a powerful, flexible system that may be useful to many other researchers; the authors invite readers to use it and welcome contributions to this infrastructure.

Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

Milo M. K. Martin, +8 more

- 01 Nov 2005 -

ACM Sigarch Computer Architecture News

Related Papers (5)

Reactive NUCA: near-optimal block placement and replication in distributed caches

Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks

The PARSEC benchmark suite: characterization and architectural implications

Simics: A full system simulation platform

Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset