
Showing papers on "Smart Cache published in 1991"


Proceedings ArticleDOI
01 Apr 1991
TL;DR: It is shown that the degree of cache interference is highly sensitive to the stride of data accesses and the size of the blocks, and can cause wide variations in machine performance for different matrix sizes.
Abstract: Blocking is a well-known optimization technique for improving the effectiveness of memory hierarchies. Instead of operating on entire rows or columns of an array, blocked algorithms operate on submatrices or blocks, so that data loaded into the faster levels of the memory hierarchy are reused. This paper presents cache performance data for blocked programs and evaluates several optimizations to improve this performance. The data is obtained by a theoretical model of data conflicts in the cache, which has been validated by large amounts of simulation. We show that the degree of cache interference is highly sensitive to the stride of data accesses and the size of the blocks, and can cause wide variations in machine performance for different matrix sizes. The conventional wisdom of trying to use the entire cache, or even a fixed fraction of the cache, is incorrect. If a fixed block size is used for a given cache size, the block size that minimizes the expected number of cache misses is very small. Tailoring the block size according to the matrix size and cache parameters can improve the average performance and reduce the variance in performance for different matrix sizes. Finally, whenever possible, it is beneficial to copy non-contiguous reused data into consecutive locations.
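
The blocking transformation the paper studies can be summarized in a few lines of C. The sketch below is illustrative only: the block size is a parameter so it can be tailored to the matrix size and cache geometry, as the paper recommends, rather than fixed at a fraction of the cache; the function name and signature are assumptions.

```c
#include <stddef.h>

/* Blocked (tiled) matrix multiply C += A * B for n x n row-major matrices.
 * Operating on bsize x bsize submatrices keeps the working set resident in
 * the faster levels of the memory hierarchy so each loaded element is
 * reused.  Following the paper's argument, bsize should be chosen per
 * matrix size and cache geometry rather than fixed. */
void matmul_blocked(size_t n, size_t bsize,
                    const double *A, const double *B, double *C)
{
    for (size_t ii = 0; ii < n; ii += bsize)
        for (size_t kk = 0; kk < n; kk += bsize)
            for (size_t jj = 0; jj < n; jj += bsize)
                /* multiply one pair of blocks */
                for (size_t i = ii; i < ii + bsize && i < n; i++)
                    for (size_t k = kk; k < kk + bsize && k < n; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < jj + bsize && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```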

982 citations


Proceedings ArticleDOI
01 Apr 1991
TL;DR: This work fed address traces of the processes running on a multi-tasking operating system through a cache simulator, to compute accurate cache-hit rates over short intervals, and estimated the cache performance reduction caused by a context switch.
Abstract: The sustained performance of fast processors is critically dependent on cache performance. Cache performance in turn depends on locality of reference. When an operating system switches contexts, the assumption of locality may be violated because the instructions and data of the newly-scheduled process may no longer be in the cache(s). Context-switching thus has a cost above that of the operations performed by the kernel. We fed address traces of the processes running on a multi-tasking operating system through a cache simulator, to compute accurate cache-hit rates over short intervals. By marking the output of such a simulation whenever a context switch occurs, and then aggregating the post-context-switch results of a large number of context switches, it is possible to estimate the cache performance reduction caused by a switch. Depending on cache parameters the net cost of a context switch appears to be in the thousands of cycles, or tens to hundreds of microseconds.

272 citations


Proceedings ArticleDOI
01 Apr 1991
TL;DR: This paper presents a range of lock-based cache consistency algorithms that arise by viewing cache consistency as a variant of the well-understood problem of replicated data management, and uses a detailed simulation model to study the performance of these algorithms over a wide range of workloads and system resource configurations.
Abstract: In this paper, we examine the performance tradeoffs that are raised by caching data in the client workstations of a client-server DBMS. We begin by presenting a range of lock-based cache consistency algorithms that arise by viewing cache consistency as a variant of the well-understood problem of replicated data management. We then use a detailed simulation model to study the performance of these algorithms over a wide range of workloads and system resource configurations. The results illustrate the key performance tradeoffs related to client-server cache consistency, and should be of use to designers of next-generation DBMS prototypes and products.

230 citations


Proceedings ArticleDOI
01 Apr 1991
TL;DR: This paper reports the cache performance of a set of vectorized numerical programs from the Perfect Club benchmarks and describes two simple prefetch schemes to reduce the influence of long stride vector accesses and misses due to block invalidations in multiprocessor vector caches.
Abstract: This paper reports the cache performance of a set of vectorized numerical programs from the Perfect Club benchmarks. Using a low-cost trace-driven simulation technique we show how a non-prefetching vector cache can result in unpredictable performance and how this unpredictability makes it difficult to find a good block size. We describe two simple prefetch schemes to reduce the influence of long stride vector accesses and misses due to block invalidations in multiprocessor vector caches. These two schemes are shown to have better performance than a non-prefetching cache.
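
The abstract does not detail the paper's two schemes, so the self-contained sketch below shows only the general idea of stride-directed prefetch-on-miss on a toy direct-mapped cache model: when a strided vector access misses, the line that the next element will touch is fetched as well. The cache geometry, stride, and synthetic trace are illustrative assumptions, not the paper's configuration.

```c
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define LINE   32u                      /* bytes per cache line (assumed)  */
#define NSETS  1024u                    /* 32 KB direct-mapped cache       */

static uint64_t tags[NSETS];
static bool     valid[NSETS];
static unsigned demand_misses;

static bool touch(uint64_t addr)        /* access one line; true on hit */
{
    uint64_t line = addr / LINE;
    unsigned set  = (unsigned)(line % NSETS);
    uint64_t tag  = line / NSETS;
    if (valid[set] && tags[set] == tag)
        return true;
    valid[set] = true;                  /* fill the line on a miss */
    tags[set]  = tag;
    return false;
}

/* One vector element access with a known byte stride. */
static void vector_access(uint64_t addr, uint64_t stride, bool prefetch)
{
    if (!touch(addr)) {
        demand_misses++;                /* demand miss */
        if (prefetch && (addr + stride) / LINE != addr / LINE)
            (void)touch(addr + stride); /* prefetch the next element's line */
    }
}

int main(void)
{
    const uint64_t stride = 4096;       /* long-stride vector sweep */
    for (int pass = 0; pass < 2; pass++) {
        bool prefetch = (pass == 1);
        for (unsigned s = 0; s < NSETS; s++) valid[s] = false;
        demand_misses = 0;
        for (uint64_t i = 0; i < 10000; i++)
            vector_access(i * stride, stride, prefetch);
        printf("%s prefetch: %u demand misses\n",
               prefetch ? "with" : "without", demand_misses);
    }
    return 0;
}
```

With this toy trace, prefetch-on-miss roughly halves the demand misses of the long-stride sweep, which is the kind of effect the paper's schemes are aiming at.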

163 citations


Patent
20 Aug 1991
TL;DR: In this paper, a multilevel cache buffer for a multiprocessor system is described, where each processor has a level one cache storage unit which interfaces with a level two cache unit and main storage unit shared by all processors.
Abstract: A multilevel cache buffer for a multiprocessor system in which each processor has a level one cache storage unit which interfaces with a level two cache unit and main storage unit shared by all processors. The multiprocessors share the level two cache according to a priority algorithm. When data in the level two cache is updated, corresponding data in level one caches is invalidated until it is updated.

115 citations


Patent
30 Aug 1991
TL;DR: In this article, a method and apparatus for incorporating cache line replacement and cache write policy information into the tag directories in a cache system is presented, which can be generalized to caches which include a number of ways greater than two by using a pseudo-LRU algorithm and utilizing group select bits in each way to distinguish between least recently used groups.
Abstract: A method and apparatus for incorporating cache line replacement and cache write policy information into the tag directories in a cache system. In a 2-way set-associative cache, one bit in each way's tag RAM is reserved for LRU information, and the bits are manipulated such that the Exclusive-OR of each way's bits points to the actual LRU cache way. Since all of these bits must be read when the cache controller determines whether a hit or miss has occurred, the bits are available when a cache miss occurs and a cache line replacement is required. The method can be generalized to caches which include a number of ways greater than two by using a pseudo-LRU algorithm and utilizing group select bits in each of the ways to distinguish between least recently used groups. Cache write policy information is stored in the tag RAMs to designate various memory areas as write-back or write-through. In this manner, system memory situated on an I/O bus which does not recognize inhibit cycles can have its data cached.
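
A minimal sketch of the XOR trick described above, assuming a 2-way set-associative cache with one LRU bit stored alongside each way's tag; the structure and function names are illustrative. The point of the scheme is that only the accessed way's bit is rewritten, so the LRU update rides along with the tag write that is already happening on that way.

```c
#include <stdint.h>

/* Per-set LRU bookkeeping for a 2-way set-associative cache: one bit lives
 * in each way's tag RAM and the XOR of the two bits identifies the LRU way. */
struct set_lru {
    uint8_t bit[2];              /* one bit stored alongside each way's tag */
};

/* Way to replace on a miss in this set. */
int lru_way(const struct set_lru *s)
{
    return s->bit[0] ^ s->bit[1];
}

/* Called on a hit to (or a fill of) way w: rewrite only way w's bit so that
 * the XOR of the two bits now points at the other way. */
void touch_way(struct set_lru *s, int w)
{
    s->bit[w] = (uint8_t)((1 - w) ^ s->bit[1 - w]);
}
```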

113 citations


Patent
Jamshed H. Mirza
15 Apr 1991
TL;DR: A cache bypass mechanism automatically avoids caching of data for instructions whose data references, for whatever reason, exhibit a low cache hit ratio; a record of each instruction's behavior in the immediate past is used to decide whether its future references should be cached or not.
Abstract: A cache bypass mechanism automatically avoids caching of data for instructions whose data references, for whatever reason, exhibit low cache hit ratio. The mechanism keeps a record of an instruction's behavior in the immediate past, and this record is used to decide whether its future references should be cached or not. If an instruction is experiencing bad cache hit ratio, it is marked as non-cacheable, and its data references are made to bypass the cache. This avoids the additional penalty of unnecessarily fetching the remaining words in the line, reduces the demand on the memory bandwidth, avoids flushing the cache of useful data and, in parallel processing environments, prevents line thrashing. The cache management scheme is automatic and requires no compiler or user intervention.
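
The patent abstract describes the record only at a high level, so the following is a hedged sketch of one way such a per-instruction history could look: a small table, indexed by a hash of the instruction address, counts recent misses and marks an instruction's data references non-cacheable once the count crosses a threshold. Table size, counter width, hashing, and thresholds are assumptions, not the patent's exact design.

```c
#include <stdint.h>
#include <stdbool.h>

#define TABLE_SIZE 256
#define MISS_SAT   8     /* counter saturates here                 */
#define BYPASS_AT  6     /* this many recent misses => bypass      */

static uint8_t recent_misses[TABLE_SIZE];   /* zero-init: cache by default */

static unsigned slot(uint64_t pc) { return (unsigned)(pc >> 2) % TABLE_SIZE; }

/* Should this instruction's next data reference bypass the cache? */
bool should_bypass(uint64_t pc)
{
    return recent_misses[slot(pc)] >= BYPASS_AT;
}

/* Update the record after a cached reference resolves as a hit or a miss.
 * (A real design would also let a bypassed instruction periodically retry
 * the cache so it can be re-marked cacheable when its behaviour changes.) */
void record_outcome(uint64_t pc, bool hit)
{
    uint8_t *c = &recent_misses[slot(pc)];
    if (hit) { if (*c > 0) (*c)--; }
    else     { if (*c < MISS_SAT) (*c)++; }
}
```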

82 citations


Journal ArticleDOI
TL;DR: This work reduces the program traces to the extent that exact performance can still be obtained from the reduced traces and devises an algorithm that can produce performance results for a variety of metrics for a large number of set-associative write-back caches in just a single simulation run.
Abstract: We propose improvements to current trace-driven cache simulation methods to make them faster and more economical. We attack the large time and space demands of cache simulation in two ways. First, we reduce the program traces to the extent that exact performance can still be obtained from the reduced traces. Second, we devise an algorithm that can produce performance results for a variety of metrics (hit ratio, write-back counts, bus traffic) for a large number of set-associative write-back caches in just a single simulation run. The trace reduction and the efficient simulation techniques are extended to parallel multiprocessor cache simulations. Our simulation results show that our approach substantially reduces the disk space needed to store the program traces and can dramatically speed up cache simulations while still producing exact results.
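
Single-pass simulation of many cache sizes rests on the classic stack-distance property: a reference found at depth d of an LRU stack hits in every fully associative LRU cache holding at least d lines, so one pass over the trace yields hit ratios for all sizes at once. The paper extends this to many set-associative write-back caches and to reduced traces; the self-contained sketch below shows only the fully associative core, on a synthetic trace.

```c
#include <stdio.h>
#include <stdint.h>

#define MAX_LINES 4096            /* distinct lines tracked (illustrative) */

static uint64_t lru_stack[MAX_LINES];   /* lru_stack[0] = most recent */
static int      depth = 0;
static uint64_t hits_at_dist[MAX_LINES + 1];
static uint64_t refs = 0;

static void reference(uint64_t line)
{
    int d = 0;
    refs++;
    while (d < depth && lru_stack[d] != line)
        d++;
    if (d < depth) {
        hits_at_dist[d + 1]++;            /* hit at 1-based stack depth d+1 */
    } else {                              /* cold miss                      */
        if (depth < MAX_LINES) depth++;
        d = depth - 1;                    /* overwrite the LRU-most slot    */
    }
    for (int i = d; i > 0; i--)           /* move the line to the top       */
        lru_stack[i] = lru_stack[i - 1];
    lru_stack[0] = line;
}

/* Hit ratio of a fully associative LRU cache holding `lines` lines. */
static double hit_ratio(int lines)
{
    uint64_t h = 0;
    for (int d = 1; d <= lines && d <= MAX_LINES; d++)
        h += hits_at_dist[d];
    return refs ? (double)h / (double)refs : 0.0;
}

int main(void)
{
    /* Synthetic trace: a loop over 8 lines, repeated 1000 times. */
    for (int r = 0; r < 1000; r++)
        for (uint64_t line = 0; line < 8; line++)
            reference(line);

    for (int size = 2; size <= 16; size *= 2)
        printf("%2d lines: hit ratio %.3f\n", size, hit_ratio(size));
    return 0;
}
```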

77 citations


Patent
16 May 1991
TL;DR: In this article, a microprocessor architecture that includes capabilities for locking individual entries into its integrated instruction cache and data cache while leaving the remainder of the cache unlocked and available for use in capturing the microprocessor's dynamic locality of reference is presented.
Abstract: A microprocessor architecture that includes capabilities for locking individual entries into its integrated instruction cache and data cache while leaving the remainder of the cache unlocked and available for use in capturing the microprocessor's dynamic locality of reference. The microprocessor also includes the capability for locking instruction cache entries without requiring that the instructions be executed during the locking process.

57 citations


Proceedings Article
01 Jan 1991
TL;DR: This paper introduces the Express Ring architecture and presents a snooping cache coherence protocol for this machine, and shows how consistency of shared memory accesses can be efficiently maintained in a ring-connected multiprocessor.
Abstract: The Express Ring is a new architecture under investigation at the University of Southern California. Its main goal is to demonstrate that a slotted unidirectional ring with very fast point-to-point interconnections can be at least ten times faster than a shared bus, using the same technology, and may be the topology of choice for future shared-memory multiprocessors. In this paper we introduce the Express Ring architecture and present a snooping cache coherence protocol for this machine. This protocol shows how consistency of shared memory accesses can be efficiently maintained in a ring-connected multiprocessor. We analyze the proposed protocol and compare it to other more usual alternatives for point-to-point connected machines, such as the SCI cache coherence protocol and directory based protocols.

47 citations


ReportDOI
01 May 1991
TL;DR: Results suggest that garbage collection algorithms will play an important part in improving cache performance as processor speeds increase; two-way set-associative caches are shown to reduce the miss rate in stop-and-copy algorithms often by a factor of two and sometimes by almost five over direct-mapped caches.
Abstract: Cache performance is an important part of total performance in modern computer systems. This paper describes the use of trace-driven simulation to estimate the effect of garbage collection algorithms on cache performance. Traces from four large Common Lisp programs have been collected and analyzed with an all-associativity cache simulator. While previous work has focused on the effect of garbage collection on page reference locality, this evaluation unambiguously shows that garbage collection algorithms can have a profound effect on cache performance as well. On processors with a direct-mapped cache, a generation stop-and-copy algorithm exhibits a miss rate up to four times higher than a comparable generation mark-and-sweep algorithm. Furthermore, two-way set-associative caches are shown to reduce the miss rate in stop-and-copy algorithms often by a factor of two and sometimes by a factor of almost five over direct-mapped caches. As processor speeds increase, cache performance will play an increasing role in total performance. These results suggest that garbage collection algorithms will play an important part in improving that performance.

Patent
10 Jan 1991
TL;DR: In this paper, a data processing system (10) is provided having a secondary cache (34) for performing a deferred cache load, in which a prefetch address is translated into a physical address that is compared with the indexed entries in a primary cache (26) and with the physical address corresponding to the single cache line stored in the secondary cache (34).
Abstract: A data processing system (10) is provided having a secondary cache (34) for performing a deferred cache load. The data processing system (10) has a pipelined integer unit (12) which uses an instruction prefetch unit (IPU) (12). The IPU issues prefetch requests to a cache controller (22) and transfers a prefetch address to a cache address memory management unit (CAMMU) (24) for translation into a corresponding physical address. The physical address is compared with the indexed entries in a primary cache (26), and compared with the physical address corresponding to the single cache line stored in the secondary cache (34). When a prefetch miss occurs in both the primary (26) and the secondary cache (34), the cache controller (22) issues a bus transfer request to retrieve the requested cache line from an external memory (20). While a bus controller (16) performs the bus transfer, the cache controller (22) loads the primary cache (26) with the cache line currently stored in the secondary cache (34).

Patent
James E. Bohner, Thang T. Do, Richard J. Gusefski, Kevin Huang, Chon I. Lei
25 Feb 1991
TL;DR: An inpage buffer is used between a cache and a slower storage device; data returned from the slower storage is provided to the processor from the buffer, which can also serve subsequent requests for that data until it has been written into the cache.
Abstract: An inpage buffer is used between a cache and a slower storage device. When a processor requests data, the cache is checked to see if the data is already in the cache. If not, a request for the data is sent to the slower storage device. The buffer receives the data from the slower storage device and provides the data to the processor that requested the data. The buffer then provides the data to the cache for storage provided that the cache is not working on a separate storage request from the processor. The data will be written into the cache from the buffer when the cache is free from such requests. The buffer is also able to provide data corresponding to subsequent requests provided it contains such data. This may happen if a request for the same data occurs and the buffer has not yet written the data into the cache. It can also occur if the areas of the cache which can hold data from an area of the slower storage are inoperable for some reason. The buffer acts as a minicache when such a catastrophic error in the cache occurs.
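
A schematic sketch of the inpage-buffer flow described above, assuming a single-entry buffer and a toy direct-mapped cache; the structures and the slow_storage_read stub are illustrative, not the patent's design. Data arriving from the slower storage is served from the buffer immediately and written into the cache only when the cache is not busy with another request, and a later request for the same line can still hit in the buffer before it is drained.

```c
#include <stdint.h>
#include <stdbool.h>

#define NSETS 256u

struct cache {
    uint64_t tag[NSETS];
    bool     valid[NSETS];
    bool     busy;                   /* serving another processor request */
};

struct inpage_buffer {
    uint64_t line;
    bool     valid;
};

/* Stand-in for the slower storage device (assumed interface). */
static void slow_storage_read(uint64_t line) { (void)line; }

static bool cache_hit(const struct cache *c, uint64_t line)
{
    return c->valid[line % NSETS] && c->tag[line % NSETS] == line / NSETS;
}

static void cache_install(struct cache *c, uint64_t line)
{
    c->valid[line % NSETS] = true;
    c->tag[line % NSETS]   = line / NSETS;
}

/* Processor read of one line. */
void processor_read(struct cache *c, struct inpage_buffer *b, uint64_t line)
{
    if (cache_hit(c, line))
        return;                                 /* normal cache hit        */
    if (!(b->valid && b->line == line)) {       /* buffer can also hit     */
        slow_storage_read(line);                /* miss: fetch into buffer */
        b->line  = line;
        b->valid = true;
    }
    if (!c->busy) {                             /* drain buffer into cache */
        cache_install(c, b->line);
        b->valid = false;
    }
}
```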


Patent
James T. Brady, Balakrishna R. Iyer
23 Dec 1991
TL;DR: In this article, a method and apparatus for avoiding line-accessed cache misses during a replacement/selection (tournament) sorting process is presented, which avoids the second merge phase overhead that formerly doubled the sorting time necessary for larger cache sizes.
Abstract: A method and apparatus for avoiding line-accessed cache misses during a replacement/selection (tournament) sorting process. Prior to the sorting phase, the method includes the steps of sizing and writing maximal sets of sub-tree nodes of a nested ordering of keys, suitable for staging as cache lines. During the sort phase, the method includes the steps of prefetching into cache from CPU main memory one or more cache lines formed from a sub-tree of ancestor nodes immediate to the node in cache just selected for replacement. The combination of the clustering of ancestor nodes within individual cache lines and the prefetching of cache lines upon replacement node selection permits execution of the full tournament sort procedure without the normally-expected cache miss rate. For selection trees larger than those that can fit entirely into cache, the method avoids the second merge phase overhead that formerly doubled the sorting time necessary for larger cache sizes.

Book ChapterDOI
TL;DR: An overview of the SMART caching strategy is presented, along with a dynamic programming algorithm which finds an allocation of cache segments to a set of periodic tasks that both minimizes the utilization of the task set and guarantees that the task set remains schedulable using rate monotonic scheduling.
Abstract: Since they were first introduced in the IBM 360/85 in 1969, cache designs have been optimized for average case performance, which has opened a wide gap between average case performance and the worst case performance that is critical to the real-time computing community. The SMART (Strategic Memory Allocation for Real-Time) cache design narrows this gap. This paper focuses on an analytical approach to cache allocation. An overview of the SMART caching strategy is presented, as well as a dynamic programming algorithm which finds an allocation of cache segments to a set of periodic tasks that both minimizes the utilization of the task set and guarantees that the task set remains schedulable using rate monotonic scheduling. Results which show SMART caches narrowing the gap between average and worst case performance to less than 10% are then presented.
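
The dynamic-programming allocation can be sketched compactly: dp[i][s] is the minimum total utilization of the first i tasks when they share s cache segments, and each task's worst-case execution time is a non-increasing function of the segments it receives. The task set, the wcet table, and the closing check against the rate-monotonic utilization bound below are all illustrative assumptions, not the paper's data or its exact schedulability test. Compile with -lm.

```c
#include <stdio.h>
#include <math.h>

#define NTASKS    3
#define NSEGMENTS 8

static const double period[NTASKS] = { 10.0, 20.0, 50.0 };

/* Worst-case execution time of each task as a function of the cache
 * segments it receives (made-up numbers that shrink with more segments). */
static const double wcet[NTASKS][NSEGMENTS + 1] = {
    {  3.5,  3.0,  2.7,  2.5,  2.4,  2.35, 2.3,  2.3,  2.3  },
    {  7.0,  6.0,  5.4,  5.0,  4.8,  4.7,  4.6,  4.6,  4.6  },
    { 16.0, 14.0, 12.8, 12.0, 11.6, 11.3, 11.1, 11.0, 11.0  },
};

int main(void)
{
    /* dp[i][s] = minimum utilization of tasks 0..i-1 using s segments. */
    double dp[NTASKS + 1][NSEGMENTS + 1];
    for (int s = 0; s <= NSEGMENTS; s++) dp[0][s] = 0.0;

    for (int i = 1; i <= NTASKS; i++)
        for (int s = 0; s <= NSEGMENTS; s++) {
            dp[i][s] = INFINITY;
            for (int give = 0; give <= s; give++) {
                double u = dp[i - 1][s - give]
                         + wcet[i - 1][give] / period[i - 1];
                if (u < dp[i][s]) dp[i][s] = u;
            }
        }

    double best     = dp[NTASKS][NSEGMENTS];
    double rm_bound = NTASKS * (pow(2.0, 1.0 / NTASKS) - 1.0);
    printf("minimum utilization: %.3f (RM bound for %d tasks: %.3f)\n",
           best, NTASKS, rm_bound);
    printf("schedulable by the RM utilization test: %s\n",
           best <= rm_bound ? "yes" : "no");
    return 0;
}
```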

Journal ArticleDOI
TL;DR: A new cache design approach is described that makes use of a selective invalidation technique that invalidates only those cache entries that are not fresh, without interrupting the processor execution stream and without degrading the cache performance.
Abstract: On-chip memories are becoming an established feature in single-chip microprocessor designs because they significantly improve performance. It is particularly important for single-chip reduced instruction set computer (RISC) microprocessors to include large, high-speed memories, because RISC chips must reduce off-chip memory delays to achieve the shortest possible cycle time. The use of dynamic RAM for all on-chip cache results in an important increase in the density of local memory for a given amount of scarce chip area, but complicates the processor control due to the inherent requirement for refreshing. By using simple circuit techniques and making a few modifications to cache organization, the refreshing requirement of dynamic RAM can be eliminated. This new cache design approach is described. It makes use of a selective invalidation technique that invalidates only those cache entries that are not fresh. This is accomplished without interrupting the processor execution stream and without degrading the cache performance.

Patent
Steven Lee Gregor
26 Jun 1991
TL;DR: In this article, a cache storage system having hardware for in-cache execution of storage-storage and storage-immediate instructions is proposed to obviate the need for data to be moved from the cache to a separate execution unit and back to cache.
Abstract: A cache storage system having hardware for in-cache execution of storage-storage and storage-immediate instructions thereby obviating the need for data to be moved from the cache to a separate execution unit and back to cache.


Patent
14 Jun 1991
TL;DR: In this article, the caches align themselves on a "way" basis by their respective cache controllers communicating with each other as to which blocks of data they are replacing and which of their cache ways are being filled with data.
Abstract: A method for achieving multilevel inclusion in a computer system with first and second level caches. The caches align themselves on a "way" basis by their respective cache controllers communicating with each other as to which blocks of data they are replacing and which of their cache ways are being filled with data. On first and second level cache read misses the first level cache controller provides way information to the second level cache controller to allow received data to be placed in the same way. On first level cache read misses and second level cache read hits, the second level cache controller provides way information to the first level cache controller, which ignores its replacement indication and places data in the indicated way. On processor writes the first level cache controller caches the writes and provides the way information to the second level cache controller, which also caches the writes and uses the way information to select the proper way for data storage. An inclusion bit is set on data in the second level cache that is duplicated in the first level cache. Multilevel inclusion allows the second level cache controller to perform the principal snooping responsibilities for both caches, thereby enabling the first level cache controller to avoid snooping duties until a first level cache snoop hit occurs. On a second level cache snoop hit, the second level cache controller checks the respective inclusion bit to determine if a copy of this data also resides in the first level cache. The first level cache controller is directed to snoop the bus only if the respective inclusion bit is set.

Patent
19 Dec 1991
TL;DR: In this article, a chipset is provided which powers up in a default state with caching disabled and which writes permanently non-cacheable tags into tag RAM entries corresponding to memory addresses being read while caching is disabled.
Abstract: According to the invention, a chipset is provided which powers up in a default state with caching disabled and which writes permanently non-cacheable tags into tag RAM entries corresponding to memory addresses being read while caching is disabled. Even though no "valid" bit is cleared, erroneous cache hits after caching is enabled are automatically prevented, since any address which does match a tag in the tag RAM is a non-cacheable address and will force retrieval directly from main memory anyway.

Journal ArticleDOI
TL;DR: A simple program model for data and block sharing is introduced, and an analytical closed-form solution is found for all components of the cache coherence overhead based on the observation that shared writable blocks are accessed in critical or in semicritical sections.
Abstract: Simulation is used to analyze shared block contention in eight parallel algorithms and its effects on the performance of a cache coherence protocol under the assumption of infinite cache sizes. A simple program model for data and block sharing is introduced, and an analytical closed-form solution is found for all components of the cache coherence overhead. This model is based on the observation that shared writable blocks are accessed in critical or in semicritical sections. The program model is applied to the analysis of multiprocessor systems with finite cache sizes and for steady state computations. The authors compare the model predictions to the results of execution-driven simulations of eight parallel algorithms. The simulation is conducted for various numbers of processors and different cache block sizes.

Proceedings ArticleDOI
01 Apr 1991
TL;DR: The results show that memory access time and page-size constraints limit the size of the primary data and instruction caches to 4I
Abstract: In the near future, microprocessor systems with very high clock rates will use multichip module (MCM) packaging technology to reduce chip-crossing delays. In this paper we present the results of a study for the design of a 250 MHz Gallium Arsenide (GaAs) microprocessor that employs MCM technology to improve performance. The design study for the resulting two-level split cache starts with a baseline cache architecture and then examines the following aspects: 1) primary cache size and degree of associativity; 2) primary data-cache write policy; 3) secondary cache size and organization; 4) primary cache fetch size; 5) concurrency between instruction and data accesses. A trace-driven simulator is used to analyze each design's performance. The results show that memory access time and page-size constraints effectively limit the size of the primary data and instruction caches to 4I

Patent
25 Mar 1991
TL;DR: In this paper, a memory system utilizes miss caching by incorporating a small fully-associative miss cache between a cache (18 or 20) and second-level cache (26).
Abstract: (of EP0449540) A memory system (10) utilizes miss caching by incorporating a small fully-associative miss cache (42) between a cache (18 or 20) and a second-level cache (26). Misses in the cache (18 or 20) that hit in the miss cache have only a one-cycle miss penalty, as opposed to a many-cycle miss penalty without the miss cache (42). Victim caching is an improvement to miss caching that loads a small, fully associative cache (52) with the victim of a miss and not the requested line. Small victim caches (52) of 1 to 4 entries are even more effective at removing conflict misses than miss caching. Stream buffers (62) prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer (62) and not in the cache (18 or 20). Stream buffers (62) are useful in removing capacity and compulsory cache misses, as well as some instruction cache misses. Stream buffers (62) are more effective than previously investigated prefetch techniques when the next slower level in the memory hierarchy is pipelined. An extension to the basic stream buffer, called multi-way stream buffers (62), is useful for prefetching along multiple intertwined data reference streams.
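
A compact sketch of the miss-cache/victim-cache idea in the abstract: a direct-mapped cache backed by a small fully associative victim store. When a miss hits among the victims, the line is swapped back in at a small penalty; otherwise the displaced line becomes the new victim. Sizes and the FIFO victim replacement are illustrative choices, and stream buffers are not modeled here.

```c
#include <stdint.h>
#include <stdbool.h>

#define NSETS        1024u         /* direct-mapped first-level cache   */
#define VICTIM_WAYS  4u            /* "1 to 4 entries" per the abstract */

static struct { uint64_t tag;  bool valid; } dcache[NSETS];
static struct { uint64_t line; bool valid; } victims[VICTIM_WAYS];
static unsigned victim_next;       /* FIFO replacement pointer */

/* 0 = first-level hit, 1 = victim-cache hit (one-cycle penalty),
 * 2 = full miss serviced by the second-level cache. */
int cache_access(uint64_t line)
{
    unsigned set = (unsigned)(line % NSETS);
    uint64_t tag = line / NSETS;

    if (dcache[set].valid && dcache[set].tag == tag)
        return 0;

    bool     displaced      = dcache[set].valid;
    uint64_t displaced_line = dcache[set].tag * NSETS + set;

    dcache[set].tag   = tag;       /* the requested line moves into the set */
    dcache[set].valid = true;

    for (unsigned w = 0; w < VICTIM_WAYS; w++)
        if (victims[w].valid && victims[w].line == line) {
            if (displaced)         /* swap: old resident becomes the victim */
                victims[w].line = displaced_line;
            else
                victims[w].valid = false;
            return 1;
        }

    if (displaced) {               /* full miss: save the displaced line */
        victims[victim_next].line  = displaced_line;
        victims[victim_next].valid = true;
        victim_next = (victim_next + 1) % VICTIM_WAYS;
    }
    return 2;
}
```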

Proceedings ArticleDOI
01 Sep 1991
TL;DR: New cache architectures that address the problems of conflict misses and non-optimal line sizes in the context of direct-mapped caches and can be reconfigured by software in a way that matches the reference pattern for array data structures are presented.
Abstract: Cache memory has been shown to be the most important technique to bridge the gap between the processor speed and the memory access time. The advent of high-speed RISC and superscalar processors, however, calls for small on-chip data caches. Due to physical limitations, these should be simply designed and yet yield good performance. In this paper, we present new cache architectures that address the problems of conflict misses and non-optimal line sizes in the context of direct-mapped caches. Our cache architectures can be reconfigured by software in a way that matches the reference pattern for array data structures. We show that the implementation cost of the reconfiguration capability is negligible. We also show simulation results that demonstrate significant performance improvements for both methods.

Proceedings ArticleDOI
30 Apr 1991
TL;DR: Two simple prefetch schemes that reduce the influence of long stride vector accesses on cache performance and have better performance than the nonprefetching cache are presented.
Abstract: Reports the cache performance of a set of vectorized numerical programs from the Perfect Club benchmarks. Using a low cost trace driven simulation technique it is shown how a nonprefetching vector cache can result in unpredictable performance and how this unpredictability makes it difficult to find a good block size. Two simple prefetch schemes that reduce the influence of long stride vector accesses on cache performance and have better performance than the nonprefetching cache are presented.


01 May 1991
TL;DR: The SMART (Strategic Memory Allocation for Real-Time) cache design approach narrows the gap between average and worst case performance and the impressive average case performance provided by conventional caches.
Abstract: Since they were first introduced in the IBM 360/85 in 1969, the primary application of cache memories has been in the general purpose computing community. Thus, it is no surprise that modern cache designs are optimized for average case performance. This optimization criterion has opened a wide gap between the average case performance which is important to general purpose computing and the worst case performance that is critical to real-time computing, thereby delaying the adoption of caches by the real-time community. The SMART (Strategic Memory Allocation for Real-Time) cache design approach narrows the gap between this worst case performance and the impressive average case performance provided by conventional caches. The SMART design approach is a software controlled partitioning strategy which allocates cache partitions to qualifying tasks. The hardware requirements for this partitioning are minimal as demonstrated through an example implementation with the MIPS R3000 processor. An algorithm which optimally allocates cache segments to a set of periodic tasks using rate monotonic scheduling has been developed. This algorithm, which minimizes task set utilization while guaranteeing schedulability, uses dynamic programming to reduce the tree-based exponential search space to a polynomial one. This reduction of the search space is critical to the goal of providing dynamic reallocation of cache segments during mode switches in real-time systems. Simulation results show SMART caches narrowing the gap between average and worst case performance to less than 10%.

Proceedings Article
01 Jan 1991
TL;DR: It is found that the performance of the cache grouping scheme closely approaches that of a full-directory scheme, and the system performance is relatively insensitive to cache group size or the availability of sophisticated multicast and combining features in the network.
Abstract: A scheme which employs cache grouping and incomplete directory state in order to reduce the cost of maintaining directory state in a shared memory coherent cache system was introduced in earlier work by the authors and others. In this paper we report on detailed simulation studies of a cache grouping scheme employing multistaged network interconnects. We examine the effects of cache group size and support for multicast and combining in the network. We find that the performance of the cache grouping scheme closely approaches that of a full-directory scheme. We also learn that, due to the dominance of one-to-one invalidates in our example applications, the system performance is relatively insensitive to cache group size or the availability of sophisticated multicast and combining features in the network, at least for the relatively small systems we are capable of simulating.

Patent
Jeffrey L. Nye
30 Apr 1991
TL;DR: A data memory management unit for providing cache access in a signal processing system is described; a programmable translation unit and an address processor are used to alter the manner in which addresses are translated and the cache is filled, in accordance with a cache replacement mechanism selected so that the cache is filled in a manner most likely to have a high hit rate for the processing algorithm currently in operation.
Abstract: A data memory management unit for providing cache access in a signal processing system. A programmable translation unit and an address processor are used to alter the manner in which addresses are translated and the cache is filled, respectively. The translation unit and address processor operate in accordance with a selected cache replacement mechanism, which is selected so that the cache is filled in a manner most likely to have a high hit rate for the processing algorithm currently in operation.