Cache pollution

About: Cache pollution is a research topic. Over its lifetime, 11353 publications have been published within this topic, receiving 262139 citations.


Papers
Proceedings Article
30 Mar 2004
TL;DR: This work shows that the standard hash join algorithm for disk-oriented databases (i.e. GRACE) spends over 73% of its user time stalled on CPU cache misses, and explores the use of prefetching to improve its cache performance.
Abstract: Hash join algorithms suffer from extensive CPU cache stalls. We show that the standard hash join algorithm for disk-oriented databases (i.e. GRACE) spends over 73% of its user time stalled on CPU cache misses, and explore the use of prefetching to improve its cache performance. Applying prefetching to hash joins is complicated by the data dependencies, multiple code paths, and inherent randomness of hashing. We present two techniques, group prefetching and software-pipelined prefetching, that overcome these complications. These schemes achieve 2.0-2.9X speedups for the join phase and 1.4-2.6X speedups for the partition phase over GRACE and simple prefetching approaches. Compared with previous cache-aware approaches (i.e. cache partitioning), the schemes are at least 50% faster on large relations and do not require exclusive use of the CPU cache to be effective.

67 citations
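
The loop restructuring behind group prefetching, as named in the abstract above, can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: Python has no prefetch instruction, so `prefetch` is a hypothetical no-op that only marks where a native implementation would issue one, and `build_hash_table`, `probe_grouped`, and `group_size` are names invented here. The point is the two-stage shape: hash a whole group of probe tuples first, then visit their buckets, so that the memory accesses of one tuple could overlap with the work on the others.

```python
def prefetch(bucket):
    """Hypothetical stand-in: a no-op marking where native code would prefetch."""
    return None


def build_hash_table(build_rows, num_buckets):
    """Build a chained hash table from (key, payload) rows."""
    table = [[] for _ in range(num_buckets)]
    for key, payload in build_rows:
        table[hash(key) % num_buckets].append((key, payload))
    return table


def probe_grouped(table, probe_rows, group_size=4):
    """Probe the table one group of tuples at a time, in two stages."""
    num_buckets = len(table)
    results = []
    for start in range(0, len(probe_rows), group_size):
        group = probe_rows[start:start + group_size]

        # Stage 1: compute the bucket of every tuple in the group and
        # request it early, before any bucket is actually read.
        buckets = []
        for key, _ in group:
            bucket = table[hash(key) % num_buckets]
            prefetch(bucket)
            buckets.append(bucket)

        # Stage 2: visit the buckets; in native code the requested lines
        # would ideally have arrived in the cache by now.
        for (key, probe_payload), bucket in zip(group, buckets):
            for build_key, build_payload in bucket:
                if build_key == key:
                    results.append((key, build_payload, probe_payload))
    return results
```

With `group_size=1` the function degenerates to an ordinary tuple-at-a-time probe, which makes it easy to check that grouping changes only the access order and not the join result.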

Patent
13 Jun 2000
TL;DR: In this article, a memory allocator provides cache blocks private to each thread of a multi-threaded application, and thereby minimizes performance losses associated with mutual exclusion (MUTEX) contention, MUTEX locking and/or coalescence operations.
Abstract: A memory allocator provides cache blocks private to each thread of a multi-threaded application, and thereby minimizes performance losses associated with mutual exclusion (MUTEX) contention, MUTEX locking and/or coalescence operations. The memory allocator maintains thread local cache slots in a linked list of arrays. Upon a memory allocation request from a thread, blocks of the memory, which ordinarily require MUTEX locking, are cached in the local thread cache slot allocated to the requesting thread, and the request is satisfied from the cache slot allocated to the requesting thread. Each cache slot is private to the thread to which it is assigned, and thus does not require MUTEX locking. Further, the cache slots do not require defragmentation thereof, and thus require no coalescence operations. Thus, the performance of the multi-threaded application program is optimized.

67 citations
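
The per-thread cache slot described in the allocator abstract above might look roughly like this. It is a minimal sketch, assuming a batch refill policy and using `threading.local` in place of the patent's linked list of arrays; the class name `ThreadCachedAllocator`, the `refill` parameter, and the `lock_acquisitions` counter are hypothetical and exist only to make the locking behaviour visible.

```python
import threading


class ThreadCachedAllocator:
    """Serve blocks from a per-thread slot; lock only to refill the slot."""

    def __init__(self, block_size=64, refill=8):
        self._lock = threading.Lock()      # guards the shared memory pool
        self._local = threading.local()    # holds one private slot per thread
        self.block_size = block_size
        self.refill = refill               # assumed batch size per refill
        self.lock_acquisitions = 0         # instrumentation for the sketch

    def _slot(self):
        # Lazily create the calling thread's private slot.
        slot = getattr(self._local, "slot", None)
        if slot is None:
            slot = self._local.slot = []
        return slot

    def allocate(self):
        slot = self._slot()
        if not slot:
            # Slow path: take the mutex once and cache a batch of blocks.
            with self._lock:
                self.lock_acquisitions += 1
                slot.extend(bytearray(self.block_size)
                            for _ in range(self.refill))
        # Fast path: the slot is private to this thread, so no lock is needed.
        return slot.pop()

    def free(self, block):
        # Freed blocks return to the private slot; nothing is coalesced.
        self._slot().append(block)
```

In this sketch a thread that allocates and frees in a steady pattern takes the lock only when its slot runs empty, which is the effect the abstract attributes to private cache slots.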

Patent
Ching-Farn E. Wu
22 Nov 1994
TL;DR: In this article, a two-level virtual/real cache system and a method for detecting and resolving synonyms in that system are described. A translation-lookaside buffer (TLB) is used for translating virtual to real addresses for accessing the second level real cache.
Abstract: A two-level virtual/real cache system, and a method for detecting and resolving synonyms in the two-level virtual/real cache system, are described. Lines of a first level virtual cache are tagged with a virtual address and a real pointer which points to a corresponding line in a second level real cache. Lines in the second level real cache are tagged with a real address and a virtual pointer which points to a corresponding line in the first level virtual cache, if one exists. A translation-lookaside buffer (TLB) is used for translating virtual to real addresses for accessing the second level real cache. Synonym detection is performed at the second level real cache. An inclusion bit I is set in a directory of the second level real cache to indicate that a particular line is included in the first level virtual cache. Another bit, called a buffer bit B, is set whenever a line in the first level virtual cache is placed in a first level virtual cache writeback buffer for updating main memory. When a first level cache miss occurs, the TLB generates a corresponding real address for that page and the first level virtual cache selects a line for replacement and also notifies the second level real cache which line it chooses for replacement. The real address is then used to access the second level real cache. Synonym detection and resolution are performed by the second level real cache.

67 citations
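
The cross-pointer scheme in the two-level cache abstract above can be illustrated with a small model. This is a sketch under assumptions, not the patented design: addresses are treated at line granularity, the TLB is a plain dictionary, the buffer bit B and replacement notifications are left out, and the resolution policy (invalidate the old virtual line and retag the real line) is an assumption, since the abstract does not spell out how a synonym is resolved. The class name `TwoLevelCache` and the outcome strings are hypothetical.

```python
class TwoLevelCache:
    """Toy model: virtually tagged L1, real-tagged L2, synonym check at L2."""

    def __init__(self, tlb):
        self.tlb = tlb    # virtual address -> real address (assumed mapping)
        self.l1 = {}      # virtual address -> real pointer into L2
        self.l2 = {}      # real address -> {"vptr": ..., "inclusion": bool}
        self.synonyms_resolved = 0

    def access(self, vaddr):
        if vaddr in self.l1:
            return "l1_hit"

        # L1 miss: translate through the TLB and go to the real cache.
        raddr = self.tlb[vaddr]
        line = self.l2.setdefault(raddr, {"vptr": None, "inclusion": False})

        outcome = "l1_miss"
        if line["inclusion"] and line["vptr"] != vaddr:
            # The real line is already in L1 under another virtual address:
            # a synonym. Assumed resolution: drop the old virtual line.
            del self.l1[line["vptr"]]
            self.synonyms_resolved += 1
            outcome = "synonym"

        # Install the line in L1 and update the L2 directory entry.
        self.l1[vaddr] = raddr
        line["vptr"] = vaddr
        line["inclusion"] = True
        return outcome
```

For example, if two virtual addresses map to the same real address, accessing the first and then the second yields `"l1_miss"` followed by `"synonym"`, and only the second virtual address remains in the first-level cache.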

Patent
Couleur J, Lange R, Pine D
05 Nov 1973
TL;DR: In this article, the cache store is cleared by resetting tag directory indicators, a round robin counter and a column full flag, for each column in a four level set associative tag directory to the cache store.
Abstract: In a multiprocessor data processing system, all processors must have access to certain communications tables stored in the main memory shared by the processors. Each processor has a cache store embedded within for its individual use. A cache store in one processor might contain data from the communication tables which is obsoleted by operations of a second processor. The cache store clearing apparatus invalidates its data information any time its processor accesses the communication tables. The cache store is cleared by resetting tag directory indicators, a round robin counter and a column full flag, for each column in a four level set associative tag directory to the cache store. The data in the cache store need not be cleared. Using the four level set associative tag directory permits the data information in the cache store to be invalidated by a 16 pulse burst of signals for 1K words of cache store directed to the tag directory indicators.

67 citations
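
The clearing mechanism in the abstract above, where only the tag directory is reset and the data words are left alone, can be sketched as follows. This is a simplified model under assumptions: the address split into column index and tag, the round-robin fill policy details, and the names `Column`, `CacheStore`, `fill`, `lookup`, and `clear` are all hypothetical, and the 16-pulse hardware burst is not modelled.

```python
LEVELS = 4  # four-level set associative: four entries per column


class Column:
    """Tag directory state for one column."""

    def __init__(self):
        self.tags = [None] * LEVELS
        self.valid = [False] * LEVELS   # tag directory indicators
        self.round_robin = 0            # next level to replace
        self.full = False               # column full flag


class CacheStore:
    def __init__(self, num_columns):
        self.columns = [Column() for _ in range(num_columns)]
        self.data = [[None] * LEVELS for _ in range(num_columns)]

    def _split(self, address):
        # Assumed mapping: low part selects the column, the rest is the tag.
        return address % len(self.columns), address // len(self.columns)

    def lookup(self, address):
        col_idx, tag = self._split(address)
        col = self.columns[col_idx]
        for level in range(LEVELS):
            if col.valid[level] and col.tags[level] == tag:
                return self.data[col_idx][level]
        return None  # miss

    def fill(self, address, word):
        col_idx, tag = self._split(address)
        col = self.columns[col_idx]
        level = col.round_robin
        col.tags[level] = tag
        col.valid[level] = True
        self.data[col_idx][level] = word
        col.round_robin = (level + 1) % LEVELS
        col.full = all(col.valid)

    def clear(self):
        # Invalidate by resetting directory state only; data stays in place.
        for col in self.columns:
            col.valid = [False] * LEVELS
            col.round_robin = 0
            col.full = False
```

After `clear()`, every lookup misses even though the old words are still physically present in `data`, which mirrors the abstract's remark that the data in the cache store need not be cleared.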

Book
01 Jan 1975
TL;DR: It is shown that the performance of the Direct Mapping buffer under near-optimal restructuring is comparable to the performance of the Fully Associative buffer, and that the effect of this restructuring is potentially stronger than that of buffer replacement algorithms.
Abstract: Using the Independent Reference assumption to model program behavior, the performance of different buffer organizations (Fully Associative, Direct Mapping, Set Associative, and Sector) is analyzed. (1) The expressions for their fault rate are derived. To show more explicitly the dependence of the fault rate on the factors that affect it, distribution-free upper bounds on fault rates are computed for the Direct Mapping, Set Associative, and Sector buffers. The use of such bounds is illustrated in the case of the Direct Mapping buffer. (2) The performance of the buffers for FIFO and Random Replacement are shown to be identical. (3) It is possible to restructure programs to take advantage of the basic organization of the buffers. The effect of such restructuring is quantified for the Direct Mapping buffer. It is shown that the performance of the Direct Mapping buffer under near-optimal restructuring is comparable to the performance of the Fully Associative buffer. Further, the effect of this restructuring is shown to be potentially stronger than that of buffer replacement algorithms.

67 citations
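
The effect of restructuring on a Direct Mapping buffer, as discussed in the abstract above, can be illustrated with a small calculation. The formula used here is not quoted from the book; it is the steady-state fault rate that follows from the Independent Reference assumption: a reference to page i faults unless the previous reference to the same buffer slot was also to page i, which happens with probability p_i divided by the total probability of the pages sharing that slot. The function name `direct_mapped_fault_rate` and the example probabilities are assumptions made for illustration.

```python
def direct_mapped_fault_rate(probs, slot_of):
    """Steady-state fault rate of a direct-mapped buffer under independent references.

    probs[i]   -- reference probability of page i
    slot_of[i] -- buffer slot that page i maps to
    """
    # Total reference probability of the pages competing for each slot.
    slot_mass = {}
    for page, p in enumerate(probs):
        slot = slot_of[page]
        slot_mass[slot] = slot_mass.get(slot, 0.0) + p

    # Page i hits only if it was the last page referenced in its slot,
    # which has probability p_i / slot_mass; otherwise the reference faults.
    return sum(p * (1.0 - p / slot_mass[slot_of[page]])
               for page, p in enumerate(probs) if p > 0)


probs = [0.4, 0.4, 0.1, 0.1]
# The two hot pages share a slot and keep evicting each other.
clustered = direct_mapped_fault_rate(probs, [0, 0, 1, 1])
# Restructured layout: each hot page shares its slot with a cold page.
spread = direct_mapped_fault_rate(probs, [0, 1, 0, 1])
```

In this toy case the fault rate drops from about 0.5 to about 0.32 purely by changing which pages collide, with no change to the replacement policy.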


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations, 93% related
Compiler: 26.3K papers, 578.5K citations, 89% related
Scalability: 50.9K papers, 931.6K citations, 87% related
Server: 79.5K papers, 1.4M citations, 86% related
Static routing: 25.7K papers, 576.7K citations, 84% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  42
2022  110
2021  12
2020  20
2019  15
2018  30