Topic: Cache invalidation

About: Cache invalidation is a research topic. Over its lifetime, 10,539 publications have been published on this topic, receiving 245,409 citations.


Papers
Journal Article (DOI)
TL;DR: Two schemes for implementing associativity greater than two are proposed: the sequential multicolumn cache, an extension of the column-associative cache, and the parallel multicolumn cache, both of which can effectively reduce the average access time.
Abstract: In the race to improve cache performance, many researchers have proposed schemes that increase a cache's associativity. The associativity of a cache is the number of places in the cache where a block may reside. In a direct-mapped cache, which has an associativity of 1, there is only one location to search for a match for each reference. In a cache with associativity n-an n-way set-associative cache-there are n locations. Increasing associativity reduces the miss rate by decreasing the number of conflict, or interference, misses. The column-associative cache and the predictive sequential associative cache seem to have achieved near-optimal performance for an associativity of two. Increasing associativity beyond two, therefore, is one of the most important ways to further improve cache performance. We propose two schemes for implementing associativity greater than two: the sequential multicolumn cache, which is an extension of the column-associative cache, and the parallel multicolumn cache. For an associativity of four, they achieve the low miss rate of a four-way set-associative cache. Our simulation results show that both schemes can effectively reduce the average access time.
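To make the mechanics concrete, here is a minimal sketch (our illustration, not code from the paper) of an n-way set-associative cache with LRU replacement; the class name and address trace are invented for the example. With associativity four, four blocks that would all conflict in a direct-mapped cache can coexist in one set.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Toy n-way set-associative cache with LRU replacement."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]  # tag -> True, LRU-ordered

    def access(self, block_addr):
        """Return True on a hit, False on a miss (filling the line on a miss)."""
        index = block_addr % self.num_sets   # which set to search
        tag = block_addr // self.num_sets    # identifies the block within the set
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)           # refresh LRU position
            return True
        if len(lines) >= self.ways:
            lines.popitem(last=False)        # evict the least recently used line
        lines[tag] = True
        return False

cache = SetAssociativeCache(num_sets=64, ways=4)
trace = [0, 64, 128, 192, 0, 64]  # four blocks that all map to set 0
hits = sum(cache.access(a) for a in trace)
print(f"hits: {hits}/{len(trace)}")  # 2/6: the reuses of blocks 0 and 64 hit
```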

84 citations

Proceedings Article (DOI)
09 Mar 2015
TL;DR: A priority-based cache allocation (PCAL) scheme is proposed that provides preferential cache capacity to a subset of high-priority threads while simultaneously allowing lower-priority threads to execute without contending for the cache.
Abstract: GPUs employ massive multithreading and fast context switching to provide high throughput and hide memory latency. Multithreading can increase contention for various system resources, however, which may result in suboptimal utilization of shared resources. Previous research has proposed variants of throttling thread-level parallelism to reduce cache contention and improve performance. Throttling approaches can, however, lead to under-utilizing thread contexts, on-chip interconnect, and off-chip memory bandwidth. This paper proposes to tightly couple the thread scheduling mechanism with the cache management algorithms such that GPU cache pollution is minimized while off-chip memory throughput is enhanced. We propose priority-based cache allocation (PCAL), which provides preferential cache capacity to a subset of high-priority threads while simultaneously allowing lower-priority threads to execute without contending for the cache. By tuning thread-level parallelism while optimizing both caching efficiency and other shared resource usage, PCAL builds upon previous thread throttling approaches, improving overall performance by an average of 17% and a maximum of 51%.
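The core PCAL idea, as we read it, can be sketched as a fill policy gated by thread priority: low-priority threads may hit in the cache but bypass it on a miss, so they contribute throughput without polluting the cache. The class below is our simplified illustration, not the paper's implementation.

```python
from collections import OrderedDict

class PriorityCache:
    """Toy cache in which only high-priority threads may allocate lines."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # addr -> True, LRU-ordered

    def access(self, addr, high_priority):
        if addr in self.lines:
            self.lines.move_to_end(addr)     # a hit counts for any priority
            return "hit"
        if not high_priority:
            return "bypass"                  # miss serviced without allocating
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)   # evict the LRU line
        self.lines[addr] = True
        return "miss+fill"

cache = PriorityCache(capacity=2)
print(cache.access(0, high_priority=True))   # miss+fill
print(cache.access(1, high_priority=True))   # miss+fill
print(cache.access(9, high_priority=False))  # bypass: no cache pollution
print(cache.access(0, high_priority=True))   # hit
```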

84 citations

Proceedings Article (DOI)
04 Jun 2011
TL;DR: The parallel cache-oblivious (PCO) model is presented, a relatively simple modification to the CO model that can be used to account for costs on a broad range of cache hierarchies; a new scheduler is also described that attains provably good cache performance and runtime on parallel machine models with hierarchical caches.
Abstract: For nested-parallel computations with low depth (span, critical path length), analyzing the work, depth, and sequential cache complexity suffices to attain reasonably strong bounds on the parallel runtime and cache complexity on machine models with either shared or private caches. These bounds, however, do not extend to general hierarchical caches, due to limitations in (i) the cache-oblivious (CO) model used to analyze cache complexity and (ii) the schedulers used to map computation tasks to processors. This paper presents the parallel cache-oblivious (PCO) model, a relatively simple modification to the CO model that can be used to account for costs on a broad range of cache hierarchies. The first change is to avoid capturing artificial data sharing among parallel threads, and the second is to account for parallelism-memory imbalances within tasks. Despite the more restrictive nature of PCO compared to CO, many algorithms have the same asymptotic cache complexity bounds. The paper then describes a new scheduler for hierarchical caches, which extends recent work on "space-bounded schedulers" to allow for computations with arbitrary work imbalance among parallel subtasks. This scheduler attains provably good cache performance and runtime on parallel machine models with hierarchical caches, for nested-parallel computations analyzed using the PCO model. We show that under reasonable assumptions our scheduler is "work efficient" in the sense that the costs of the cache misses are evenly balanced across the processors, i.e., the runtime can be determined within a constant factor by taking the total cost of the cache misses analyzed for a computation and dividing it by the number of processors. In contrast, to further support our model, we show that no scheduler can achieve such bounds (optimizing for both cache misses and runtime) if work, depth, and sequential cache complexity are the only parameters used to analyze a computation.
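As a much-simplified illustration of the space-bounded scheduling idea (our sketch; the paper's scheduler handles nesting and work imbalance far more carefully), each task is pinned to the smallest cache level whose capacity covers the task's memory footprint, so its working set stays resident there. The level sizes below are invented for the example.

```python
# Assumed, illustrative cache hierarchy: (name, capacity in bytes).
CACHE_LEVELS = [("L1", 32_768), ("L2", 262_144), ("L3", 8_388_608)]

def assign_level(task_footprint_bytes):
    """Pick the smallest cache level whose capacity covers the task's footprint."""
    for name, capacity in CACHE_LEVELS:
        if task_footprint_bytes <= capacity:
            return name
    return "DRAM"  # the footprint exceeds every cache level

for footprint in (16_384, 100_000, 16_777_216):
    print(f"{footprint:>10} bytes -> {assign_level(footprint)}")
```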

84 citations

Patent
30 Sep 1999
TL;DR: A multiple-level cache structure and caching method is presented that distributes I/O processing loads, including caching operations, between processors to provide higher-performance processing, especially in a server environment.
Abstract: This invention provides a multiple-level cache structure and multiple-level caching method that distributes I/O processing loads, including caching operations, between processors to provide higher-performance I/O processing, especially in a server environment. A method of achieving optimal data throughput by taking full advantage of multiple processing resources is disclosed. A method for managing the allocation of the data caches to optimize host access time and parity generation is disclosed. A cache allocation for RAID stripes that guarantees fast access times for the XOR engine by ensuring that all cache lines are allocated from the same cache level is disclosed. Allocation of cache lines for RAID levels that do not require parity generation, performed in such a manner as to maximize utilization of memory bandwidth, is disclosed. Parity generation optimized to use the processor least utilized at the time the cache lines are allocated, thereby providing dynamic load balancing among the multiple processing resources, is disclosed. An inventive cache line descriptor for maintaining information about which cache data pool a cache line resides within, including enhancements that allow movement of cache data from one cache level to another, is disclosed. A cache line descriptor with enhancements for tracking the cache within which a RAID stripe's cache line siblings reside is disclosed. Systems, apparatus, computer program products, and methods to support these aspects, alone and in combination, are also provided.
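The patent publishes no code, but the cache line descriptor it describes might look roughly like the following hypothetical sketch; all field and method names are ours, not the patent's.

```python
from dataclasses import dataclass, field

@dataclass
class CacheLineDescriptor:
    """Hypothetical descriptor tracking where a cache line lives."""
    lba: int                     # logical block address of the cached data
    cache_level: int             # cache level currently holding the line
    pool_id: int                 # data pool within that level
    stripe_id: int               # RAID stripe this line belongs to
    sibling_locations: dict = field(default_factory=dict)  # sibling lba -> cache level

    def migrate(self, new_level, new_pool):
        """Move the line to another cache level, updating the descriptor."""
        self.cache_level = new_level
        self.pool_id = new_pool

desc = CacheLineDescriptor(lba=4096, cache_level=1, pool_id=0, stripe_id=7)
desc.migrate(new_level=2, new_pool=3)
print(desc)
```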

84 citations

Patent
29 May 2002
TL;DR: A method and apparatus in a data processing system for caching data in an internal cache and in an external cache is presented: a set of fragments is received for caching, and a storage location is identified for each fragment based on the rate of change of its data.
Abstract: A method and apparatus in a data processing system for caching data in an internal cache and in an external cache. A set of fragments is received for caching. A location is identified to store each fragment within the plurality of fragments based on a rate of change of data in each fragment. The set of fragments is stored in the internal cache and the external cache using the location identified for each fragment within the plurality of fragments.
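A minimal sketch of the selection step under one plausible reading of the patent; the policy direction (frequently changing fragments stay internal, stable fragments also propagate to the external cache) and the threshold are our assumptions, not taken from the patent text.

```python
# Assumed policy: every fragment is cached internally; only fragments whose
# data changes slowly are also pushed to the (farther, shared) external cache.
CHANGES_PER_HOUR_THRESHOLD = 10  # illustrative cutoff, not from the patent

internal_cache, external_cache = {}, {}

def store_fragment(fragment_id, content, changes_per_hour):
    internal_cache[fragment_id] = content         # always cached internally
    if changes_per_hour < CHANGES_PER_HOUR_THRESHOLD:
        external_cache[fragment_id] = content     # stable enough to cache externally

store_fragment("page_header", "<div>...</div>", changes_per_hour=1)
store_fragment("stock_ticker", "<span>...</span>", changes_per_hour=600)
print(sorted(external_cache))  # only the stable fragment propagated
```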

84 citations


Network Information
Related Topics (5)
Cache: 59.1K papers, 976.6K citations (93% related)
Scalability: 50.9K papers, 931.6K citations (88% related)
Server: 79.5K papers, 1.4M citations (88% related)
Network packet: 159.7K papers, 2.2M citations (83% related)
Dynamic Source Routing: 32.2K papers, 695.7K citations (83% related)
Performance Metrics
No. of papers in the topic in previous years:
Year: Papers
2023: 44
2022: 117
2021: 4
2020: 8
2019: 7
2018: 20