Dynamic cache clustering for chip multiprocessors

doi:10.1145/1542275.1542289

Proceedings Article

Mohammad Hammoud, +2 more

- pp 56-67

TLDR

Simulation results using a full-system simulator demonstrate that DCC outperforms alternative L2 cache designs. DCC optimizes both average L2 access latency and L2 miss rate, and continuously tracks a near-optimal cache organization from many possible configurations.

Abstract:

This paper proposes DCC (Dynamic Cache Clustering), a novel distributed cache management scheme for large-scale chip multiprocessors. With DCC, each core has a cache cluster comprising a number of L2 cache banks, and cache clusters are constructed, expanded, and contracted dynamically to match each core's cache demand. Varying the on-chip cache clusters trades off average L2 access latency against L2 miss rate. DCC uniquely and efficiently optimizes both metrics and continuously tracks a near-optimal cache organization from many possible configurations. Simulation results using a full-system simulator demonstrate that DCC outperforms alternative L2 cache designs.
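
The trade-off the abstract describes can be illustrated with a toy model. The sketch below is not the paper's mechanism: it assumes a simple cost of the form hit latency plus miss rate times miss penalty, and every name and number in it (`average_access_latency`, `pick_cluster_size`, the `candidates` table, the miss penalties) is hypothetical. It only shows why a mid-sized cluster can beat both the smallest and the largest one.

```python
# Toy model only: all numbers and names below are illustrative assumptions,
# not values or mechanisms taken from the DCC paper.

def average_access_latency(hit_latency, miss_rate, miss_penalty):
    """Estimated average L2 access latency in cycles for one cluster size."""
    return hit_latency + miss_rate * miss_penalty


def pick_cluster_size(candidates, miss_penalty):
    """Return the cluster size (in banks) with the lowest estimated latency.

    `candidates` maps cluster size -> (average hit latency, miss rate).
    """
    return min(
        candidates,
        key=lambda size: average_access_latency(*candidates[size], miss_penalty),
    )


# Hypothetical per-core measurements: larger clusters reach farther banks
# (higher hit latency) but hold more data (lower miss rate).
candidates = {
    1: (10, 0.30),
    2: (14, 0.20),
    4: (20, 0.12),
    8: (30, 0.10),
    16: (45, 0.09),
}

best = pick_cluster_size(candidates, miss_penalty=200)
print(best)
```

In this made-up table, a cheaper miss penalty favors a smaller cluster and a costlier one favors a larger cluster, which is the kind of per-core tension a dynamic scheme would have to track at run time.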

Citations

Proceedings Article

CloudCache: Expanding and shrinking private caches

Hyunjin Lee, +2 more

TL;DR: This work proposes a novel scalable cache management framework called CloudCache that creates dynamically expanding and shrinking L2 caches for working threads with fine-grained hardware monitoring and control and demonstrates that CloudCache significantly improves performance of a wide range of workloads when all or a subset of cores are occupied.

Proceedings Article

METE: meeting end-to-end QoS in multicores through system-wide resource management

Akbar Sharifi, +4 more

TL;DR: The collected results indicate that the proposed scheme is able to provision shared resources among co-runner applications dynamically over the course of execution, to provide end-to-end QoS and satisfy specified performance targets.

Thèse de doctorat de l'Université Pierre et Marie Curie

M. Daoudi Khalid

TL;DR: In this article, NIR spectra of macroaggregates from the matrix of the forest site exhibited high variability within groups of macroaggregates, and PCA projections did not reveal a clear superimposition of the spectral signatures of unknown macroaggregates taken from the soil matrix with those of structures of known origin produced in laboratory cultures.

Proceedings Article

HK-NUCA: Boosting Data Searches in Dynamic Non-Uniform Cache Architectures for Chip Multiprocessors

Javier Lira, +2 more

TL;DR: This paper presents a novel and implementable data search algorithm for D-NUCA designs in CMP architectures, called HK-NUCA (Home Knows where to find data within the NUCA cache), which exploits migration features by providing fast and power-efficient accesses to data located close to the requesting core.

Journal Article

Victim retention for reducing cache misses in tiled chip multiprocessors

Shirshendu Das, +1 more

- 01 Jun 2014 -

Microprocessors and Microsystems

TL;DR: Experimental evaluation using full-system simulation shows that CMP-VR has a lower off-chip miss rate than the baseline tiled CMP, and that the combined reduction in CPI and miss rate guarantees a performance improvement.

References

Proceedings Article

The SPLASH-2 programs: characterization and methodological considerations

Steven Cameron Woo, +4 more

TL;DR: This paper quantitatively characterizes the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understand them well, including the computational load balance, communication-to-computation ratio and traffic needs, important working set sizes, and issues related to spatial locality.

Proceedings Article

Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

Moinuddin K. Qureshi, +1 more

TL;DR: In this article, the authors propose a low-overhead, runtime mechanism that partitions a shared cache between multiple applications depending on the reduction in cache misses that each application is likely to obtain for a given amount of cache resources.

Proceedings Article

An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Changkyu Kim, +2 more

TL;DR: This paper proposes physical designs for these Non-Uniform Cache Architectures (NUCAs) and extends these physical designs with logical policies that allow important data to migrate toward the processor within the same level of the cache.

Proceedings Article

An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS

Sriram R. Vangal, +13 more

TL;DR: A 275 mm² network-on-chip architecture contains 80 tiles arranged as a 10 × 8 2D array of floating-point cores and packet-switched routers, operating at 4 GHz, designed to achieve a peak performance of 1.0 TFLOPS at 1 V while dissipating 98 W.

Journal Article

Low-Latency Virtual-Channel Routers for On-Chip Networks

Robert Mullins, +2 more

TL;DR: Simulations illustrate that dramatic cycle time improvements are possible without compromising router efficiency, and these reductions permit flits to be routed in a single cycle, maximising the effectiveness of the router's limited buffering resources.

Related Papers (5)

An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Changkyu Kim, +2 more

Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

Milo M. K. Martin, +8 more

- 01 Nov 2005 -

ACM Sigarch Computer Architecture News

Reactive NUCA: near-optimal block placement and replication in distributed caches

The PARSEC benchmark suite: characterization and architectural implications

Cooperative Caching for Chip Multiprocessors