Topic

Smart Cache

About: Smart Cache is a research topic. Over the lifetime, 7680 publications have been published within this topic receiving 180618 citations.

...read moreread less

Papers published on a yearly basis

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch

[...]

Jaewoong Sim¹, Gabriel H. Loh¹, Hyesoon Kim², Mike O'Connor¹, Mithuna Thottethodi¹ - Show less +1 more•Institutions (2)

Advanced Micro Devices¹, Georgia Institute of Technology²

01 Dec 2012

TL;DR: Two innovations that exploit the bursty nature of memory requests to streamline the DRAM cache are presented, including a low-cost Hit-Miss Predictor (HMP) that virtually eliminates the hardware overhead of the previously proposed multi-megabyte Miss Map structure and a Self-Balancing Dispatch mechanism that dynamically sends some requests to the off-chip memory even though the request may have hit in the die-stackedDRAM cache.

...read moreread less

Abstract: Die-stacking technology allows conventional DRAM to be integrated with processors. While numerous opportunities to make use of such stacked DRAM exist, one promising way is to use it as a large cache. Although previous studies show that DRAM caches can deliver performance benefits, there remain inefficiencies as well as significant hardware costs for auxiliary structures. This paper presents two innovations that exploit the bursty nature of memory requests to streamline the DRAM cache. The first is a low-cost Hit-Miss Predictor (HMP) that virtually eliminates the hardware overhead of the previously proposed multi-megabyte Miss Map structure. The second is a Self-Balancing Dispatch (SBD) mechanism that dynamically sends some requests to the off-chip memory even though the request may have hit in the die-stacked DRAM cache. This makes effective use of otherwise idle off-chip bandwidth when the DRAM cache is servicing a burst of cache hits. These techniques, however, are hampered by dirty (modified) data in the DRAM cache. To ensure correctness in the presence of dirty data in the cache, the HMP must verify that a block predicted as a miss is not actually present, otherwise the dirty block must be provided. This verification process can add latency, especially when DRAM cache banks are busy. In a similar vein, SBD cannot redirect requests to off-chip memory when a dirty copy of the block exists in the DRAM cache. To relax these constraints, we introduce a hybrid write policy for the cache that simultaneously supports write-through and write-back policies for different pages. Only a limited number of pages are permitted to operate in a write-back mode at one time, thereby bounding the amount of dirty data in the DRAM cache. By keeping the majority of the DRAM cache clean, most HMP predictions do not need to be verified, and the self balancing dispatch has more opportunities to redistribute requests (i.e., only requests to the limited number of dirty pages must go to the DRAM cache to maintain correctness). Our proposed techniques improve performance compared to the Miss Map-based DRAM cache approach while simultaneously eliminating the costly Miss Map structure.

...read moreread less

90 citations

Patent•

Tagging packets with a lookup key to facilitate usage of a unified packet forwarding cache

[...]

Kjeld Egevang¹, Niels Beier, Jacob Christensen¹•Institutions (1)

Intel¹

27 Sep 2002

TL;DR: In this article, the authors present an approach and methods for a Network Address Translation (NAT)-aware unified cache, which allows multiple packet processing applications distributed among one or more processors of a network device to share a unified cache without requiring a cache synchronization protocol.

...read moreread less

Abstract: Apparatus and methods are provided for a Network Address Translation (NAT)-aware unified cache. According to one embodiment, multiple packet-processing applications distributed among one or more processors of a network device share one or more unified caches without requiring a cache synchronization protocol. When a packet is received at the network device, a first packet-processing application, such as NAT or another application that modifies part of the packet header upon which a cache lookup key is based, tags the packet with a cache lookup key based upon the original contents of the packet header. Then, other packet-processing applications attempting to access the cache entry from the unified cache subsequent to the tagging by the first packet-processing application use the tag (the cache lookup key generated by the first packet-processing application) rather than determining the cache lookup key based upon the current contents of the packet header.

...read moreread less

89 citations

Proceedings Article•

JIA-JIA : An SVM System Based on A New Cache Coherence Protocol

[...]

W. Hu

01 Jan 1999

89 citations

Proceedings Article•DOI•

Adjustable block size coherent caches

[...]

Czarek Dubnicki¹, Thomas J. LeBlanc•Institutions (1)

University of Rochester¹

01 Apr 1992

TL;DR: It is concluded that an adjustable block size cache offers significantly better performance than every fixed block size caches, especially when there is variability in the granularity of sharing exhibited by applications.

...read moreread less

Abstract: Several studies have shown that the performance of coherent caches depends on the relationship between the granularity of sharing and locality exhibited by the program and the cache block size. Large cache blocks exploit processor and spatial locality, but may cause unnecessary cache invalidations due to false sharing. Small cache blocks can reduce the number of cache invalidations, but increase the nuber of bus or network transactions required to load data into the cache. In this paper we describe a cache organization that dynamically adjusts the cache block size according to recently observed reference behavior. Cache blocks are split across cache lines when false sharing occurs, ad merged back into a single cache line to explit spatial locality. To evaluate this cache organization, we simulate a scalable multiprocessor with coherent caches, using a suite of memory reference traces to model program behavior. We show that for evry fixed block size, some program suffers a 33% increase in the average waiting time per reference, and a factor of 2 increase in the average number of words transferred per reference, when compared against the performance of an adjustable block size cache. In the few cases where adjusting the block size does not provide superior performance, it comes within 7% of the best fixed block size alternative. We conclude that an adjustable block size cache offers significantly better performance than every fixed block size cache, especially when there is variability in the granularity of sharing exhibited by applications.

...read moreread less

89 citations

Proceedings Article•DOI•

Main-memory index structures with fixed-size partial keys

[...]

Philip L. Bohannon¹, Peter Mcllroy¹, Rajeev Rastogi¹•Institutions (1)

Alcatel-Lucent¹

01 May 2001

TL;DR: This paper proposes two index structures, pkT-trees and pkB-tree, which significantly reduce cache misses by storing partial-key information in the index, and shows that a small, fixed amount of key information allows most cache misses to be avoided, allowing for a simple node structure and efficient implementation.

...read moreread less

Abstract: The performance of main-memory index structures is increasingly determined by the number of CPU cache misses incurred when traversing the index. When keys are stored indirectly, as is standard in main-memory databases, the cost of key retrieval in terms of cache misses can dominate the cost of an index traversal. Yet it is inefficient in both time and space to store even moderate sized keys directly in index nodes. In this paper, we investigate the performance of tree structures suitable for OLTP workloads in the face of expensive cache misses and non-trivial key sizes. We propose two index structures, pkT-trees and pkB-trees, which significantly reduce cache misses by storing partial-key information in the index. We show that a small, fixed amount of key information allows most cache misses to be avoided, allowing for a simple node structure and efficient implementation. Finally, we study the performance and cache behavior of partial-key trees by comparing them with other main-memory tree structures for a wide variety of key sizes and key value distributions.

...read moreread less

89 citations

Collapse

Network Information

Performance

Metrics

7,847

Papers

186,437

Citations

No. of papers in the topic in previous years
Year	Papers
2023	50
2022	114
2021	5
2020	1
2019	8
2018	18

Smart Cache

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics