Topic

Smart Cache

About: Smart Cache is a research topic. Over its lifetime, 7,680 publications have been published on this topic, receiving 180,618 citations.


Papers
Proceedings Article
12 Sep 1994
TL;DR: It is shown that there are significant benefits in redesigning traditional query processing algorithms so that they make better use of the cache; the new algorithms run 8%-200% faster than the traditional ones.
Abstract: The current main memory (DRAM) access speeds lag far behind CPU speeds. Cache memory, made of static RAM, is being used in today's architectures to bridge this gap. It provides access latencies of 2-4 processor cycles, in contrast to main memory, which requires 15-25 cycles. Therefore, the performance of the CPU depends upon how well the cache can be utilized. We show that there are significant benefits in redesigning our traditional query processing algorithms so that they can make better use of the cache. The new algorithms run 8%-200% faster than the traditional ones.
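For intuition (our arithmetic, not the paper's), here is a back-of-the-envelope sketch of how strongly the hit ratio drives average access latency, using midpoints of the cycle counts quoted in the abstract; the constants are illustrative assumptions.

```python
# Midpoints of the latency ranges quoted in the abstract (illustrative).
CACHE_CYCLES = 3   # cache access: 2-4 cycles
DRAM_CYCLES = 20   # main memory access: 15-25 cycles

def effective_access_cycles(hit_ratio: float) -> float:
    """Average cycles per memory reference at a given cache hit ratio."""
    return hit_ratio * CACHE_CYCLES + (1.0 - hit_ratio) * DRAM_CYCLES

# A modest hit-ratio improvement cuts average latency sharply, which is
# why cache-conscious query processing pays off.
for h in (0.70, 0.90, 0.99):
    print(f"hit ratio {h:.0%}: {effective_access_cycles(h):.1f} cycles/reference")
```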

215 citations

Journal ArticleDOI
TL;DR: This article proposes SDC (Static Dynamic Cache), a new caching strategy that exploits the temporal and spatial locality present in the stream of processed queries, and further improves SDC's hit ratio with an adaptive prefetching strategy.
Abstract: This article discusses efficiency and effectiveness issues in caching the results of queries submitted to a Web search engine (WSE). We propose SDC (Static Dynamic Cache), a new caching strategy aimed at efficiently exploiting the temporal and spatial locality present in the stream of processed queries. SDC extracts from historical usage data the results of the most frequently submitted queries and stores them in a static, read-only portion of the cache. The remaining entries of the cache are dynamically managed according to a given replacement policy and are used for those queries that cannot be satisfied by the static portion. Moreover, we improve the hit ratio of SDC by using an adaptive prefetching strategy, which anticipates future requests while introducing only a limited overhead on the back-end WSE. We experimentally demonstrate the superiority of SDC over purely static and dynamic policies by measuring the hit ratio achieved on three large query logs while varying the cache parameters and the replacement policy used for managing the dynamic part of the cache. Finally, we deploy and measure the throughput achieved by a concurrent version of our caching system. Our tests show how the SDC cache can be efficiently exploited by many threads that concurrently serve the queries of different users.
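The static/dynamic split described above is straightforward to sketch. Below is a minimal, illustrative Python rendering of an SDC-style query-result cache, assuming LRU for the dynamic portion (the paper evaluates several replacement policies) and omitting prefetching; the class and method names are ours, not the paper's.

```python
from collections import OrderedDict

class StaticDynamicCache:
    """SDC-style cache: a read-only static part seeded from historically
    frequent queries, plus an LRU-managed dynamic part for the rest."""

    def __init__(self, frequent_results: dict, dynamic_capacity: int):
        self.static = dict(frequent_results)  # filled offline from query logs
        self.dynamic = OrderedDict()          # insertion order tracks recency
        self.capacity = dynamic_capacity

    def get(self, query):
        if query in self.static:              # hit in the read-only portion
            return self.static[query]
        if query in self.dynamic:             # hit in the dynamic portion
            self.dynamic.move_to_end(query)   # refresh recency for LRU
            return self.dynamic[query]
        return None                           # miss: caller asks the back-end WSE

    def put(self, query, results):
        if query in self.static:              # never duplicate static entries
            return
        self.dynamic[query] = results
        self.dynamic.move_to_end(query)
        if len(self.dynamic) > self.capacity:
            self.dynamic.popitem(last=False)  # evict least recently used
```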

213 citations

Journal ArticleDOI
TL;DR: A unified cache maintenance algorithm, LNC-R-WS-U, is described, which integrates both cache replacement and consistency algorithms and considers in the eviction consideration the validation rate of each document, as provided by the cache consistency component of LNC.R-W3-U.
Abstract: Caching at proxy servers is one of the ways to reduce the response time perceived by World Wide Web users. Cache replacement algorithms play a central role in the response time reduction by selecting a subset of documents for caching, so that a given performance metric is maximized. At the same time, the cache must take extra steps to guarantee some form of consistency of the cached documents. Cache consistency algorithms enforce appropriate guarantees about the staleness of the cached documents. We describe a unified cache maintenance algorithm, LNC-R-W3-U, which integrates both cache replacement and consistency algorithms. The LNC-R-W3-U algorithm evicts documents from the cache based on the delay to fetch each document into the cache. Consequently, the documents that took a long time to fetch are preferentially kept in the cache. The LNC-R-W3-U algorithm also factors into its eviction decisions the validation rate of each document, as provided by its cache consistency component. Consequently, documents that are infrequently updated and thus seldom require validations are preferentially retained in the cache. We describe the implementation of LNC-R-W3-U and its integration with the Apache 1.2.6 code base. Finally, we present a trace-driven experimental study of LNC-R-W3-U performance and its comparison with other previously published algorithms for cache maintenance.
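The interplay of fetch delay and validation rate can be illustrated with a toy eviction score. The sketch below is a hypothetical profit-style metric in the spirit of the description above, not the actual LNC-R-W3-U cost function; all names, fields, and the weighting are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class CachedDoc:
    size_bytes: int
    fetch_delay_s: float    # how long the document took to fetch
    validation_rate: float  # validations needed per unit time
    reference_rate: float   # requests observed per unit time

def retention_score(doc: CachedDoc) -> float:
    """Illustrative 'profit density': delay saved by caching, net of the
    cost of keeping the document consistent, per byte of cache occupied.
    Slow-to-fetch, often-referenced, rarely-validated documents score
    high and are kept; the lowest scorer is the eviction victim."""
    delay_saved = doc.reference_rate * doc.fetch_delay_s
    consistency_cost = doc.validation_rate * doc.fetch_delay_s
    return (delay_saved - consistency_cost) / doc.size_bytes

def pick_victim(cache: dict) -> str:
    # cache maps URL -> CachedDoc; evict the least profitable entry
    return min(cache, key=lambda url: retention_score(cache[url]))
```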

211 citations

Dissertation
01 Jan 1989
TL;DR: Measurements of actual supercomputer cache performance have not previously been undertaken; PFC-Sim, a program-driven event-tracing facility that can simulate the data cache performance of very long programs, is used to measure the performance of various cache structures.
Abstract: Measurements of actual supercomputer cache performance have not previously been undertaken. PFC-Sim is a program-driven event-tracing facility that can simulate the data cache performance of very long programs. PFC-Sim simulates the cache concurrently with program execution, allowing very long traces to be used. Programs with traces in excess of 4 billion entries have been used to measure the performance of various cache structures. PFC-Sim was used to measure the cache performance of array references in a benchmark set of supercomputer applications, RiCEPS. Data cache hit ratios varied on average between 70% for a 16K cache and 91% for a 256K cache. Programs with very large working sets generate poor cache performance even with large caches. The hit ratios of individual references are measured to be either 0% or 100%. By locating the references that miss, attempts to improve memory performance can focus on references where improvement is possible. The compiler can estimate the number of loop iterations that can execute without filling the cache, called the overflow iteration. The overflow iteration, combined with the dependence graph, can be used to determine at each reference whether execution will result in hits or misses. Program transformation can be used to improve cache performance by reordering computation to move references to the same memory location closer together, thereby eliminating cache misses. Using the overflow iteration, the compiler can often perform this transformation automatically. Standard blocking transformations cannot be used on many loop nests that contain transformation-preventing dependences. Wavefront blocking allows any loop nest to be blocked when the components of dependence vectors are bounded. When the cache misses cannot be eliminated, software prefetching can overlap the miss delays with computation. Software prefetching uses a special instruction to preload values into the cache. A cache load resembles a register load in structure, but does not block computation; it only moves the addressed data into the cache, where a later register load will require it. The compiler can inform the cache (on average) over 100 cycles before a load is required. Cache misses can be serviced in parallel with computation.
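The reordering transformation described above is easiest to see on a standard example. Below is a minimal Python/NumPy sketch of a blocked matrix transpose, a classic blocking transformation of the kind the dissertation automates; the block size here is an illustrative assumption, where a compiler would derive it from the overflow iteration.

```python
import numpy as np

def transpose_blocked(a: np.ndarray, block: int = 64) -> np.ndarray:
    """Blocked transpose: each block-sized tile of the source and the
    destination fits in cache, so every fetched line is fully reused
    before eviction, instead of one element per line as in a naive
    column-order traversal."""
    n, m = a.shape
    out = np.empty((m, n), dtype=a.dtype)
    for i0 in range(0, n, block):          # tile over source rows
        for j0 in range(0, m, block):      # tile over source columns
            out[j0:j0 + block, i0:i0 + block] = a[i0:i0 + block, j0:j0 + block].T
    return out
```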

210 citations

Proceedings ArticleDOI
03 Dec 2003
TL;DR: NuRAPID is proposed, which leverages sequential tag-data access to decouple data placement from tag placement, resulting in higher performance and substantially lower cache energy.
Abstract: Wire delays continue to grow as the dominant component of latency for large caches. A recent work proposed an adaptive, non-uniform cache architecture (NUCA) to manage large, on-chip caches. By exploiting the variation in access time across widely-spaced subarrays, NUCA allows fast access to close subarrays while retaining slow access to far subarrays. While the idea of NUCA is attractive, NUCA does not employ design choices commonly used in large caches, such as sequential tag-data access for low power. Moreover, NUCA couples data placement with tag placement, foregoing the flexibility of data placement and replacement that is possible in a non-uniform access cache. Consequently, NUCA can place only a few blocks within a given cache set in the fastest subarrays, and must employ a high-bandwidth switched network to swap blocks within the cache for high performance. In this paper, we propose the "Non-uniform access with Replacement And Placement usIng Distance associativity" cache, or NuRAPID, which leverages sequential tag-data access to decouple data placement from tag placement. Distance associativity, the placement of data at a certain distance (and latency), is separated from set associativity, the placement of tags within a set. This decoupling enables NuRAPID to place flexibly the vast majority of frequently-accessed data in the fastest subarrays, with fewer swaps than NUCA. Distance associativity fundamentally changes the trade-offs made by NUCA's best-performing design, resulting in higher performance and substantially lower cache energy. A one-ported, non-banked NuRAPID cache improves performance by 3% on average and up to 15% compared to a multi-banked NUCA with an infinite-bandwidth switched network, while reducing L2 cache energy by 77%.
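A minimal sketch of that decoupling, in Python for illustration: tags stay in a conventional set-associative array, but each tag entry carries a forward pointer into one of several distance groups, so a block's data can migrate to a faster subarray without its tag moving. The structures, sizes, and function names are our assumptions, not the paper's design.

```python
# Tags are set-associative; each tag entry is (addr_tag, group, slot) and
# points into one of the distance groups (fast/near ... slow/far subarrays).
NUM_SETS, WAYS = 256, 8
GROUPS = ("near", "mid", "far")  # increasing access latency

tags = [[None] * WAYS for _ in range(NUM_SETS)]
data = {g: {} for g in GROUPS}   # group -> slot -> block contents

def lookup(set_idx: int, addr_tag: int):
    """Sequential tag-data access: probe the tag array first, then read
    exactly one data subarray through the forward pointer (low power)."""
    for entry in tags[set_idx]:
        if entry and entry[0] == addr_tag:
            _, group, slot = entry
            return data[group][slot]  # single data-array read
    return None                       # miss

def promote(set_idx: int, way: int, new_group: str, new_slot: int):
    """Distance replacement: move a hot block's data to a faster group and
    update the tag's pointer; the tag keeps its set/way position."""
    addr_tag, old_group, old_slot = tags[set_idx][way]
    data[new_group][new_slot] = data[old_group].pop(old_slot)
    tags[set_idx][way] = (addr_tag, new_group, new_slot)

# Example: install a block in the far group, then promote it on reuse.
tags[0][0] = (0xABC, "far", 0)
data["far"][0] = b"block contents"
assert lookup(0, 0xABC) == b"block contents"
promote(0, 0, "near", 0)
assert lookup(0, 0xABC) == b"block contents"
```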

210 citations


Network Information
Related Topics (5)

Cache: 59.1K papers, 976.6K citations, 92% related
Server: 79.5K papers, 1.4M citations, 88% related
Scalability: 50.9K papers, 931.6K citations, 88% related
Network packet: 159.7K papers, 2.2M citations, 85% related
Quality of service: 77.1K papers, 996.6K citations, 84% related
Performance Metrics
No. of papers in the topic in previous years:

2023: 50
2022: 114
2021: 5
2020: 1
2019: 8
2018: 18