Book Chapter DOI

ACM: An Efficient Approach for Managing Shared Caches in Chip Multiprocessors

Abstract
This paper proposes and studies a hardware-based adaptive controlled migration strategy for managing distributed L2 caches in chip multiprocessors. Building on an area-efficient shared cache design, the proposed scheme dynamically migrates cache blocks to the cache banks that minimize the average L2 access latency. Cache blocks are continuously monitored, and the optimal bank location for each block is predicted, to alleviate the impact of non-uniform cache access latency. By adopting migration alone, without replication, each cache block remains exclusive in the L2, which further improves the cache miss rate. Simulation results using a full-system simulator demonstrate that the proposed controlled migration scheme outperforms the shared caching strategy and compares favorably with previously proposed replication schemes.
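The core decision the abstract describes, migrating a block to the bank that minimizes average access latency given who is accessing it, can be sketched as follows. This is a hypothetical illustration, not the authors' hardware design: the latency table, access counters, and function names are all assumptions.

```python
# Hypothetical sketch of latency-minimizing block migration (not the paper's scheme).
# latency[c][b] = L2 access latency from core c to bank b (e.g. network hops + 1).
# accesses[c]   = observed access count for one cache block from core c.

def best_bank(accesses, latency):
    """Return the bank index minimizing the access-count-weighted latency."""
    num_banks = len(latency[0])

    def cost(b):
        # Total latency paid if the block lived in bank b.
        return sum(a * latency[c][b] for c, a in enumerate(accesses))

    return min(range(num_banks), key=cost)

# Example: 4 cores and 4 banks on a line; latency = distance in hops + 1.
latency = [[abs(c - b) + 1 for b in range(4)] for c in range(4)]
accesses = [0, 10, 10, 0]            # cores 1 and 2 share the block equally
print(best_bank(accesses, latency))  # -> 1, a bank adjacent to both sharers
```

A real controller would evaluate this only periodically and only for hot blocks, since recomputing the cost on every access would be far too expensive in hardware.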


Citations
Proceedings Article DOI

CloudCache: Expanding and shrinking private caches

TL;DR: This work proposes CloudCache, a novel scalable cache management framework that creates dynamically expanding and shrinking L2 caches for active threads using fine-grained hardware monitoring and control, and demonstrates that CloudCache significantly improves the performance of a wide range of workloads whether all or only a subset of the cores are occupied.
Proceedings Article DOI

HK-NUCA: Boosting Data Searches in Dynamic Non-Uniform Cache Architectures for Chip Multiprocessors

TL;DR: A novel, implementable data search algorithm for D-NUCA designs in CMP architectures, called HK-NUCA (Home Knows where to find data within the NUCA cache), which exploits migration while providing fast and power-efficient access to data located close to the requesting core.
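The idea summarized above, that each address has a statically mapped home bank which knows where the (possibly migrated) block currently resides, can be sketched minimally. This is an assumed illustration of the general home-directory concept, not HK-NUCA's actual mechanism; the class and method names are hypothetical.

```python
# Hypothetical sketch of a home-directory lookup in a D-NUCA (not HK-NUCA itself).

class HomeDirectory:
    """Each address maps statically to a home bank; the home tracks the block's
    current bank, so a search probes one bank instead of broadcasting to all."""

    def __init__(self, num_banks):
        self.num_banks = num_banks
        self.location = {}               # addr -> bank currently holding the block

    def home_bank(self, addr):
        return addr % self.num_banks     # static address-interleaved mapping

    def on_migrate(self, addr, new_bank):
        self.location[addr] = new_bank   # the home is notified of migrations

    def lookup(self, addr):
        # Ask the home bank; if the block never migrated, it is at home.
        return self.location.get(addr, self.home_bank(addr))

d = HomeDirectory(16)
d.on_migrate(0x40, 3)
print(d.lookup(0x40))  # -> 3: one directed probe, no broadcast
print(d.lookup(0x41))  # -> 1: the home bank itself (0x41 % 16)
```

The point of such a scheme is power: a broadcast search touches every bank on every miss in the local bank, while a home lookup touches exactly one.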
Patent

Accelerating cache state transfer on a directory-based multicore architecture

TL;DR: In this paper, the authors describe techniques for accelerating cache state transfer in a multicore processor that includes first, second, and third tiles, with a directory at the third tile corresponding to the block addresses.
Journal Article DOI

Exploiting replication to improve performances of NUCA-based CMP systems

TL;DR: Results show that a Re-NUCA LLC improves performance by more than 5% on average, and by up to 15% for applications that suffer strongly from conflicting accesses to shared data, while reducing network traffic and power consumption relative to D-NUCA caches.
Proceedings Article DOI

The auction: optimizing banks usage in Non-Uniform Cache Architectures

TL;DR: A novel mechanism based on the bank replacement policy for NUCA caches on CMPs, called The Auction, which manages the cache efficiently and significantly reduces requests to off-chip memory by increasing the hit ratio in the NUCA cache.
References
Proceedings Article DOI

The SPLASH-2 programs: characterization and methodological considerations

TL;DR: This paper quantitatively characterizes the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understanding them well, including computational load balance, communication-to-computation ratio and traffic needs, important working-set sizes, and issues related to spatial locality.
Proceedings Article DOI

An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

TL;DR: This paper proposes physical designs for Non-Uniform Cache Architectures (NUCAs) and extends these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache.
Proceedings Article DOI

An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS

TL;DR: A 275 mm² network-on-chip architecture contains 80 tiles arranged as a 10 × 8 2D array of floating-point cores and packet-switched routers operating at 4 GHz, designed to achieve a peak performance of 1.0 TFLOPS at 1 V while dissipating 98 W.
Journal Article DOI

Low-Latency Virtual-Channel Routers for On-Chip Networks

TL;DR: Simulations illustrate that dramatic cycle time improvements are possible without compromising router efficiency, and these reductions permit flits to be routed in a single cycle, maximising the effectiveness of the router's limited buffering resources.
Proceedings Article DOI

Managing Wire Delay in Large Chip-Multiprocessor Caches

TL;DR: This paper develops L2 cache designs for CMPs that incorporate block migration and stride-based prefetching between the L1 and L2 caches, and presents a hybrid design, combining all three techniques, that improves performance by an additional 2% to 19% over prefetching alone.