
Smart Cache

About: Smart Cache is a research topic. Over its lifetime, 7,680 publications have been published within this topic, receiving 180,618 citations.


Papers
Proceedings ArticleDOI
12 Feb 2005
TL;DR: Three performance models are proposed that predict the impact of cache sharing on co-scheduled threads; the most accurate, the inductive probability model, achieves an average error of only 3.9%.
Abstract: This paper studies the impact of L2 cache sharing on threads that simultaneously share the cache, on a chip multi-processor (CMP) architecture. Cache sharing impacts threads nonuniformly, where some threads may be slowed down significantly, while others are not. This may cause severe performance problems such as sub-optimal throughput, cache thrashing, and thread starvation for threads that fail to occupy sufficient cache space to make good progress. Unfortunately, there is no existing model that allows extensive investigation of the impact of cache sharing. To allow such a study, we propose three performance models that predict the impact of cache sharing on co-scheduled threads. The input to our models is the isolated L2 cache stack distance or circular sequence profile of each thread, which can be easily obtained on-line or off-line. The output of the models is the number of extra L2 cache misses for each thread due to cache sharing. The models differ by their complexity and prediction accuracy. We validate the models against a cycle-accurate simulation that implements a dual-core CMP architecture, on fourteen pairs of mostly SPEC benchmarks. The most accurate model, the inductive probability model, achieves an average error of only 3.9%. Finally, to demonstrate the usefulness and practicality of the model, a case study that details the relationship between an application's temporal reuse behavior and its cache sharing impact is presented.

543 citations
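
As background for the models above, the sketch below (in Python, with function names of our own choosing) shows how an isolated LRU stack distance profile can be collected from a block-address trace and used to count the accesses whose reuse distance exceeds a given effective cache capacity. The paper's per-thread shared-cache predictors, including the inductive probability model, build on such profiles but are considerably more involved.

from collections import defaultdict

def stack_distance_profile(trace):
    # Collect an LRU stack distance histogram for a sequence of block addresses.
    # Illustrative only: real profiles are gathered per cache set, on-line or in simulation.
    stack = []                    # most recently used block kept at the end
    hist = defaultdict(int)       # distance -> count; 'inf' marks cold misses
    for block in trace:
        if block in stack:
            depth = len(stack) - 1 - stack.index(block)   # 0 = most recently used
            hist[depth] += 1
            stack.remove(block)
        else:
            hist['inf'] += 1
        stack.append(block)
    return hist

def estimated_misses(hist, effective_capacity):
    # Accesses whose reuse distance exceeds the capacity a thread effectively
    # retains in the cache are counted as misses.
    misses = hist.get('inf', 0)
    for depth, count in hist.items():
        if depth != 'inf' and depth >= effective_capacity:
            misses += count
    return misses

trace = [1, 2, 3, 1, 2, 4, 1, 5, 2, 3]
profile = stack_distance_profile(trace)
print(estimated_misses(profile, effective_capacity=8))   # isolated: cold misses only
print(estimated_misses(profile, effective_capacity=2))   # shared: extra capacity misses appear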

Proceedings ArticleDOI
01 Jan 1984
TL;DR: This paper presents a cache coherence solution for multiprocessors organized around a single time-shared bus. The solution aims at reducing bus traffic and hence bus wait time, which in turn increases overall processor utilization.
Abstract: This paper presents a cache coherence solution for multiprocessors organized around a single time-shared bus. The solution aims at reducing bus traffic and hence bus wait time. This in turn increases the overall processor utilization. Unlike most traditional high-performance coherence solutions, this solution does not use any global tables. Furthermore, this coherence scheme is modular and easily extensible, requiring no modification of cache modules to add more processors to a system. The performance of this scheme is evaluated by using an approximate analysis method. It is shown that the performance of this scheme is closely tied with the miss ratio and the amount of sharing between processors.

531 citations
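
For context, the following sketch shows a generic write-invalidate snooping protocol in which every cache watches the single shared bus and updates its own line state, so no global table is required. It is an illustrative MSI-style state machine with names of our own choosing, not the specific coherence scheme proposed or evaluated in the paper.

from enum import Enum

class State(Enum):
    INVALID = 0
    SHARED = 1
    MODIFIED = 2

class SnoopyCacheLine:
    # One line of a write-invalidate snooping cache. Each cache reacts to bus traffic
    # on its own, so no global table is needed.
    def __init__(self):
        self.state = State.INVALID

    # Requests from the local processor.
    def read(self, bus):
        if self.state is State.INVALID:
            bus.broadcast('BusRd', self)       # fetch a shared copy
            self.state = State.SHARED

    def write(self, bus):
        if self.state is not State.MODIFIED:
            bus.broadcast('BusRdX', self)      # invalidate copies held by other caches
            self.state = State.MODIFIED

    # Transactions observed on the bus from other caches.
    def snoop(self, op):
        if op == 'BusRdX':
            self.state = State.INVALID
        elif op == 'BusRd' and self.state is State.MODIFIED:
            self.state = State.SHARED          # supply the dirty data, then downgrade

class Bus:
    # Single time-shared bus: every transaction is seen by every other cache.
    def __init__(self, caches):
        self.caches = caches

    def broadcast(self, op, requester):
        for cache in self.caches:
            if cache is not requester:
                cache.snoop(op)

lines = [SnoopyCacheLine(), SnoopyCacheLine()]
bus = Bus(lines)
lines[0].read(bus)
lines[1].write(bus)
print(lines[0].state, lines[1].state)          # State.INVALID State.MODIFIED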

Proceedings ArticleDOI
17 May 1988
TL;DR: The problem of cache coherence in shared-memory multiprocessors has been addressed with two basic approaches, directory schemes and snoopy cache schemes; directory schemes have received less attention in recent years, although they are potentially attractive for large multiprocessor systems beyond the scaling limits of snoopy caches.
Abstract: The problem of cache coherence in shared-memory multiprocessors has been addressed using two basic approaches: directory schemes and snoopy cache schemes. Directory schemes have been given less attention in the past several years, while snoopy cache methods have become extremely popular. Directory schemes for cache coherence are potentially attractive in large multiprocessor systems that are beyond the scaling limits of the snoopy cache schemes. Slight modifications to directory schemes can make them competitive in performance with snoopy cache schemes for small multiprocessors. Trace driven simulation, using data collected from several real multiprocessor applications, is used to compare the performance of standard directory schemes, modifications to these schemes, and snoopy cache protocols.

525 citations
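
As a reference point for the directory approach, the sketch below models a full-map directory entry that keeps a presence bit per cache and a dirty bit, and returns the invalidation or write-back messages a read or write request would trigger. It is a simplified, generic directory organization of our own construction, not necessarily one of the exact schemes compared in the paper.

class DirectoryEntry:
    # Full-map directory entry for one memory block: a presence bit per cache plus a
    # dirty bit (illustrative only).
    def __init__(self, n_caches):
        self.present = [False] * n_caches
        self.dirty = False

    def handle_read(self, cache_id):
        messages = []
        if self.dirty:
            owner = self.present.index(True)          # single dirty owner writes back
            messages.append(('writeback', owner))
            self.dirty = False
        self.present[cache_id] = True
        return messages

    def handle_write(self, cache_id):
        # Invalidate every other sharer before granting exclusive ownership.
        messages = [('invalidate', i)
                    for i, p in enumerate(self.present) if p and i != cache_id]
        self.present = [False] * len(self.present)
        self.present[cache_id] = True
        self.dirty = True
        return messages

entry = DirectoryEntry(n_caches=4)
entry.handle_read(0)                               # cache 0 obtains a shared copy
print(entry.handle_write(2))                       # [('invalidate', 0)]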

Journal ArticleDOI
TL;DR: Both model-based and real trace simulation studies show that the proposed cooperative architecture results in more than 50% memory saving and substantial central processing unit (CPU) power saving for the management and update of cache entries compared with the traditional uncooperative hierarchical caching architecture.
Abstract: This paper aims at finding fundamental design principles for hierarchical Web caching. An analytical modeling technique is developed to characterize an uncooperative two-level hierarchical caching system where the least recently used (LRU) algorithm is locally run at each cache. With this modeling technique, we are able to identify a characteristic time for each cache, which plays a fundamental role in understanding the caching processes. In particular, a cache can be viewed roughly as a low-pass filter with its cutoff frequency equal to the inverse of the characteristic time. Documents with access frequencies lower than this cutoff frequency have good chances to pass through the cache without cache hits. This viewpoint enables us to take any branch of the cache tree as a tandem of low-pass filters at different cutoff frequencies, which further results in the finding of two fundamental design principles. Finally, to demonstrate how to use the principles to guide the caching algorithm design, we propose a cooperative hierarchical Web caching architecture based on these principles. Both model-based and real trace simulation studies show that the proposed cooperative architecture results in more than 50% memory saving and substantial central processing unit (CPU) power saving for the management and update of cache entries compared with the traditional uncooperative hierarchical caching architecture.

512 citations
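
The characteristic time mentioned above can be illustrated with the standard fixed-point approximation for an LRU cache under independent requests: solve sum_i (1 - exp(-lambda_i * T)) = C for T, after which a document with request rate lambda_i hits with probability roughly 1 - exp(-lambda_i * T). The sketch below solves for T by bisection using Zipf-like rates chosen purely for illustration; the paper's hierarchical analysis goes well beyond this single-cache picture.

import math

def characteristic_time(rates, cache_size, tol=1e-6):
    # Solve sum_i (1 - exp(-lambda_i * T)) = cache_size for T by bisection.
    def filled(T):
        return sum(1 - math.exp(-r * T) for r in rates)
    lo, hi = 0.0, 1.0
    while filled(hi) < cache_size:
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if filled(mid) < cache_size:
            lo = mid
        else:
            hi = mid
    return hi

# Zipf-like request rates for 1000 documents and a cache holding 100 of them
# (values chosen only for illustration).
rates = [1.0 / (i + 1) for i in range(1000)]
T = characteristic_time(rates, cache_size=100)

# Low-pass filter view: documents requested much less often than 1/T rarely hit.
hit_prob = [1 - math.exp(-r * T) for r in rates]
print(T, hit_prob[0], hit_prob[500])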

Proceedings ArticleDOI
12 May 1981
TL;DR: A cache organization is presented that essentially eliminates the penalty on subsequent cache references following a cache miss; it has been incorporated into a cache/memory interface subsystem design that has been implemented and prototyped.
Abstract: In the past decade, there has been much literature describing various cache organizations that exploit general programming idiosyncrasies to obtain maximum hit rate (the probability that a requested datum is now resident in the cache). Little, if any, has been presented to exploit: (1) the inherent dual input nature of the cache and (2) the many-datum reference type central processor instructions. No matter how high the cache hit rate is, a cache miss may impose a penalty on subsequent cache references. This penalty is the necessity of waiting until the missed requested datum is received from central memory and, possibly, for cache update. For the two cases above, the cache references following a miss do not require the information of the datum not resident in the cache, and are therefore penalized in this fashion. In this paper, a cache organization is presented that essentially eliminates this penalty. This cache organizational feature has been incorporated in a cache/memory interface subsystem design, and the design has been implemented and prototyped. An existing simple instruction set machine has verified the advantage of this feature; future, more extensive and sophisticated instruction set machines may obviously take more advantage. Prior to prototyping, simulations verified the advantage.

504 citations
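
The penalty being removed can be illustrated with a toy hit-under-miss cache: an outstanding miss is recorded and completed later, while subsequent independent references that hit are serviced immediately rather than waiting for the missed datum. The class and method names below are ours, and the sketch abstracts away the cache/memory interface subsystem described in the paper.

class NonBlockingCache:
    # Toy hit-under-miss cache: an outstanding miss is parked in a pending-miss table
    # while later, independent references that hit are serviced immediately instead of
    # stalling behind it. A sketch of the idea, not the prototyped design.
    def __init__(self):
        self.lines = {}       # address -> data already resident in the cache
        self.pending = {}     # address -> requests waiting on central memory

    def access(self, addr):
        if addr in self.lines:
            return ('hit', self.lines[addr])            # no penalty from earlier misses
        self.pending.setdefault(addr, []).append(addr)  # record the miss and move on
        return ('miss-pending', None)

    def fill(self, addr, data):
        # Called when central memory returns the missed block.
        self.lines[addr] = data
        return self.pending.pop(addr, [])               # requests that can now complete

cache = NonBlockingCache()
cache.lines[0x200] = 'blockB'       # already resident
print(cache.access(0x100))          # ('miss-pending', None): fetch from memory starts
print(cache.access(0x200))          # ('hit', 'blockB'): serviced while the miss is outstanding
cache.fill(0x100, 'blockA')
print(cache.access(0x100))          # ('hit', 'blockA')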


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations, 92% related
Server: 79.5K papers, 1.4M citations, 88% related
Scalability: 50.9K papers, 931.6K citations, 88% related
Network packet: 159.7K papers, 2.2M citations, 85% related
Quality of service: 77.1K papers, 996.6K citations, 84% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    50
2022    114
2021    5
2020    1
2019    8
2018    18