
Showing papers on "Cache invalidation published in 1993"


Patent
08 Nov 1993
TL;DR: In this paper, the authors propose management logic in a shared high-speed cache to meet the serialization and data coherency requirements of the data systems when sharing the high-speed cache as a store-multiple cache in a multi-system environment.
Abstract: A high-speed cache is shared by a plurality of independently-operating data systems in a multi-system data sharing complex. Each data system has access both to the high-speed cache and to lower-speed, upper-level storage for obtaining and storing data. Management logic in the shared high-speed cache is provided to meet the serialization and data coherency requirements of the data systems when sharing the high speed cache as a store-multiple cache in a multi-system environment.

478 citations


Book
01 Jan 1993
TL;DR: What is Cache Memory?
Abstract: What is Cache Memory? How are Caches Designed? Cache Memories and RISC Processors. Maintaining Coherency in Cached Systems. Cute Cache Tricks. Subject Index.

447 citations


Proceedings ArticleDOI
01 May 1993
TL;DR: An adaptive protocol is proposed that effectively eliminates most single invalidations and improves the performance by reducing the shared access penalty and the network traffic.
Abstract: Parallel programs that use critical sections and are executed on a shared-memory multiprocessor with a write-invalidate protocol result in invalidation actions that could be eliminated. For this type of sharing, called migratory sharing, each processor typically causes a cache miss followed by an invalidation request which could be merged with the preceding cache-miss request. In this paper we propose an adaptive protocol that invokes this optimization dynamically for migratory blocks. For other blocks, the protocol works as an ordinary write-invalidate protocol. We show that the protocol is a simple extension to a write-invalidate protocol. Based on a program-driven simulation model of an architecture similar to the Stanford DASH, and a set of four benchmarks, we evaluate the potential performance improvements of the protocol. We find that it effectively eliminates most single invalidations, which improves the performance by reducing the shared access penalty and the network traffic.

239 citations
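
The abstract outlines the adaptive optimization only at a high level. The following is a minimal, hypothetical sketch of the detection idea, not the paper's actual protocol: when a read miss arrives for a block that a different processor holds dirty, the directory hands out an exclusive copy straight away, merging the later invalidation with the miss. All names and the state bookkeeping are assumptions, and only single-owner tracking is modeled.

```python
# Hypothetical sketch of migratory-sharing detection in a directory-based
# write-invalidate protocol (names and state encoding are assumptions).

class Directory:
    def __init__(self):
        # per-block state: current owner, whether the owner has written it,
        # and whether the block has been classified as migratory
        self.blocks = {}

    def read_miss(self, block, requester):
        st = self.blocks.get(block)
        if st and st["owner"] is not None and st["owner"] != requester and st["dirty"]:
            # Read miss on a block another processor holds dirty:
            # under the migratory optimization, grant an *exclusive* copy and
            # invalidate the old owner, merging the invalidation with this miss.
            st.update(owner=requester, dirty=False, migratory=True)
            return "exclusive"
        self.blocks[block] = {"owner": requester, "dirty": False,
                              "migratory": st["migratory"] if st else False}
        return "shared"

    def write_hit(self, block, writer):
        self.blocks[block]["dirty"] = True


d = Directory()
print(d.read_miss("A", 0))   # shared (first reference)
d.write_hit("A", 0)
print(d.read_miss("A", 1))   # exclusive: block now treated as migratory
```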


Proceedings ArticleDOI
01 May 1993
TL;DR: Tradeoffs on writes that miss in the cache are investigated, and a mixture of write-through and write-back caching, called write caching, is proposed that places a small fully-associative cache behind a write-through cache.
Abstract: This paper investigates issues involving writes and caches. First, tradeoffs on writes that miss in the cache are investigated. In particular, whether the missed cache block is fetched on a write miss, whether the missed cache block is allocated in the cache, and whether the cache line is written before hit or miss is known are considered. Depending on the combination of these policies chosen, the entire cache miss rate can vary by a factor of two on some applications. The combination of no-fetch-on-write and write-allocate can provide better performance than cache line allocation instructions. Second, tradeoffs between write-through and write-back caching when writes hit in a cache are considered. A mixture of these two alternatives, called write caching, is proposed. Write caching places a small fully-associative cache behind a write-through cache. A write cache can eliminate almost as much write traffic as a write-back cache.

234 citations
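
As a rough illustration of the write-caching idea (a small fully-associative buffer behind a write-through cache), the sketch below coalesces repeated writes to the same block and only generates traffic to the next level on eviction. Capacity, block size, and FIFO eviction are assumptions for the example, not the paper's parameters.

```python
from collections import OrderedDict

class WriteCache:
    """Hypothetical sketch of a small fully-associative write cache placed
    behind a write-through L1: writes are coalesced per block and only
    evictions generate traffic to the next memory level."""

    def __init__(self, entries=4, block_size=16):
        self.entries = entries
        self.block_size = block_size
        self.lines = OrderedDict()      # block address -> present flag
        self.traffic = 0                # writes passed on to the next level

    def write(self, addr):
        block = addr // self.block_size
        if block in self.lines:
            self.lines.move_to_end(block)   # coalesce: absorb the repeated write
            return
        if len(self.lines) >= self.entries:
            self.lines.popitem(last=False)  # evict the oldest entry ...
            self.traffic += 1               # ... which is what reaches memory
        self.lines[block] = True

    def flush(self):
        self.traffic += len(self.lines)
        self.lines.clear()


wc = WriteCache()
for addr in [0, 4, 8, 0, 4, 64, 0]:         # repeated writes to block 0 coalesce
    wc.write(addr)
wc.flush()
print("writes issued: 7, traffic to next level:", wc.traffic)   # 2
```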


Patent
23 Mar 1993
TL;DR: Cache server nodes play a key role in the LOCATE process and can prevent redundant network-wide broadcasts of LOCATE requests, as mentioned in this paper: when an origin cache server node receives a request from a served node, it searches its local directories first, then forwards the request to alternate cache server nodes if necessary.
Abstract: A computer network in which resources are dynamically located through the use of LOCATE requests includes multiple cache server nodes, network nodes which have an additional obligation to build and maintain large caches of directory entries. Cache server nodes play a key role in the LOCATE process and can prevent redundant network-wide broadcasts of LOCATE requests. Where an origin cache server node receives a request from a served node, the cache server node searches its local directories first, then forwards the request to alternate cache server nodes if necessary. If the necessary information isn't found locally or in alternate cache server nodes, the LOCATE request is then broadcast to all network nodes in the network. If the broadcast results are negative, the request is forwarded to selected gateway nodes to permit the search to continue in adjacent networks.

199 citations


Patent
03 Jun 1993
TL;DR: In this article, a cache management system and method coupled to at least one host and one data storage device is presented, where a cache indexer maintains a current index (25) of data elements which are stored in cache memory.
Abstract: A cache management system and method monitors and controls the contents of cache memory (12) coupled to at least one host (22a) and at least one data storage device (18a). A cache indexer (16) maintains a current index (25) of data elements which are stored in cache memory (12). A sequential data access indicator (30), responsive to the cache index (16) and to a user selectable sequential data access threshold, determines that a sequential data access is in progress for a given process and provides an indication of the same. The system and method allocate a micro-cache memory (12) to any process performing a sequential data access. In response to the indication of a sequential data access in progress and to a user selectable maximum number of data elements to be prefetched, a data retrieval requestor requests retrieval of up to the selected maximum number of data elements from a data storage device (18b). A user selectable number of sequential data elements determines when previously used micro-cache memory locations will be overwritten. A method of dynamically monitoring and adjusting cache management parameters is also disclosed.

179 citations
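
A minimal sketch of the sequential-access detection described in the abstract, assuming a simple run-length test and illustrative threshold values (the patent's actual thresholds are user-selectable and its bookkeeping is per process):

```python
# Sketch: if a process has touched `seq_threshold` consecutive data elements,
# treat it as a sequential stream and prefetch up to `max_prefetch` elements
# ahead. Parameter names and values are illustrative assumptions.

def detect_and_prefetch(accesses, seq_threshold=3, max_prefetch=4):
    prefetched = []
    run = 1
    for prev, cur in zip(accesses, accesses[1:]):
        run = run + 1 if cur == prev + 1 else 1
        if run >= seq_threshold:
            # sequential access in progress: request the next elements
            prefetched = list(range(cur + 1, cur + 1 + max_prefetch))
    return prefetched


print(detect_and_prefetch([10, 11, 12, 13]))   # -> [14, 15, 16, 17]
print(detect_and_prefetch([10, 40, 7, 100]))   # -> [] (no sequential run)
```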


Patent
06 Dec 1993
TL;DR: In this article, a method and system for maintaining coherency between a server processor and a client processor that has a cache memory is presented, where the server processor periodically broadcasts invalidation reports to the client processor.
Abstract: A method and system are provided for maintaining coherency between a server processor and a client processor that has a cache memory. The server may, for example, be a fixed location mobile unit support station. The client may, for example, be a palmtop computer. The server stores a plurality of data values, and the client stores a subset of the plurality of data values in the cache. The server processor periodically broadcasts invalidation reports to the client processor. Each respective invalidation report includes information identifying which, if any, of the plurality of data values have been updated within a predetermined period of time before the server processor broadcasts the respective invalidation report. The client processor determines, based on the invalidation reports, whether a selected data value in the cache memory of the client processor has been updated in the server processor since the selected data value was stored in the cache memory. The client processor invalidates the selected data value in the cache memory of the client processor, if the selected data value has been updated in the server processor.

172 citations
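
The broadcast invalidation-report scheme can be pictured with the small sketch below; the window length, data layout, and class names are assumptions for illustration rather than the patent's design.

```python
import time

class Server:
    """Tracks update times and periodically reports recently updated keys."""
    def __init__(self, window=10.0):
        self.window = window
        self.data = {}
        self.updated_at = {}

    def update(self, key, value, now):
        self.data[key] = value
        self.updated_at[key] = now

    def invalidation_report(self, now):
        # keys updated within the last `window` seconds
        return {k for k, t in self.updated_at.items() if now - t <= self.window}


class Client:
    """Drops cached entries named in each broadcast invalidation report."""
    def __init__(self):
        self.cache = {}

    def receive_report(self, report):
        for key in report:
            self.cache.pop(key, None)   # invalidate the stale copy


server, client = Server(), Client()
client.cache["x"] = 1
server.update("x", 2, now=time.time())
client.receive_report(server.invalidation_report(now=time.time()))
print("x" in client.cache)   # False: the cached copy was invalidated
```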


Proceedings ArticleDOI
01 Jun 1993
TL;DR: The OPT model is proposed that uses cache simulation under optimal (OPT) replacement to obtain a finer and more accurate characterization of misses than the three Cs model, and three new techniques for optimal cache simulation are presented.
Abstract: Cache miss characterization models such as the three Cs model are useful in developing schemes to reduce cache misses and their penalty. In this paper we propose the OPT model that uses cache simulation under optimal (OPT) replacement to obtain a finer and more accurate characterization of misses than the three Cs model. However, current methods for optimal cache simulation are slow and difficult to use. We present three new techniques for optimal cache simulation. First, we propose a limited lookahead strategy with error fixing, which allows one pass simulation of multiple optimal caches. Second, we propose a scheme to group entries in the OPT stack, which allows efficient tree based fully-associative cache simulation under OPT. Third, we propose a scheme for exploiting partial inclusion in set-associative cache simulation under OPT. Simulators based on these algorithms were used to obtain cache miss characterizations using the OPT model for nine SPEC benchmarks. The results indicate that miss ratios under OPT are substantially lower than those under LRU replacement, by up to 70% in fully-associative caches, and up to 32% in two-way set-associative caches.

168 citations
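
To make the OPT-versus-LRU comparison concrete, here is a small, self-contained simulator for a fully-associative cache under LRU and under optimal (Belady/OPT) replacement. It is a textbook illustration of the two policies being compared, not the paper's one-pass stack algorithms.

```python
def lru_misses(trace, capacity):
    cache, misses = [], 0
    for ref in trace:
        if ref in cache:
            cache.remove(ref)
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.pop(0)            # evict the least recently used block
        cache.append(ref)               # most recently used kept at the end
    return misses


def opt_misses(trace, capacity):
    cache, misses = set(), 0
    for i, ref in enumerate(trace):
        if ref in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            # evict the block whose next use is farthest in the future
            def next_use(block):
                for j in range(i + 1, len(trace)):
                    if trace[j] == block:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(ref)
    return misses


trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print("LRU misses:", lru_misses(trace, 3))   # 10
print("OPT misses:", opt_misses(trace, 3))   # 7
```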


Patent
19 Apr 1993
TL;DR: In this paper, a cache memory replacement scheme with a locking feature is provided, where the locking bits associated with each line in the cache are supplied in the tag table and used by the application program/process executing and are utilized in conjunction with cache replacement bits by the cache controller to determine the lines in cache to replace.
Abstract: In a memory system having a main memory and a faster cache memory, a cache memory replacement scheme with a locking feature is provided. Locking bits associated with each line in the cache are supplied in the tag table. These locking bits are preferably set and reset by the executing application program/process and are utilized in conjunction with cache replacement bits by the cache controller to determine the lines in the cache to replace. The lock bits and replacement bits for a cache line are "ORed" to create a composite bit for the cache line. If the composite bit is set, the cache line is not removed from the cache. When all composite bits are set and deadlock would result, all replacement bits are cleared. One cache line is always maintained as non-lockable. The locking bits "lock" the line of data in the cache until such time when the process resets the lock bit. By providing that the process controls the state of the lock bits, the intelligence and knowledge the process contains regarding the frequency of use of certain memory locations can be utilized to provide a more efficient cache.

153 citations
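
A simple sketch of the composite-bit rule described above (a line is protected when its lock bit OR replacement bit is set, with deadlock avoidance and one non-lockable line); representing the bits as Python lists is purely illustrative.

```python
import random

def pick_victim(lock_bits, repl_bits, non_lockable=0):
    """Choose a cache line to replace given per-line lock and replacement bits."""
    lock_bits = list(lock_bits)
    lock_bits[non_lockable] = 0                       # one line is always non-lockable
    composite = [l | r for l, r in zip(lock_bits, repl_bits)]
    if all(composite):                                # every line protected: deadlock ...
        repl_bits = [0] * len(repl_bits)              # ... so clear the replacement bits
        composite = lock_bits[:]
    candidates = [i for i, c in enumerate(composite) if c == 0]
    return random.choice(candidates)                  # any unprotected line may be replaced


print(pick_victim(lock_bits=[0, 1, 1, 0], repl_bits=[1, 0, 0, 0]))  # 3
print(pick_victim(lock_bits=[1, 1, 1, 1], repl_bits=[1, 1, 1, 1]))  # 0 (deadlock avoided)
```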


Journal ArticleDOI
Avraham Leff, Joel L. Wolf, Philip S. Yu
TL;DR: Performance of the distributed algorithms is found to be close to optimal, while that of the greedy algorithms is far from optimal.
Abstract: Studies the cache performance in a remote caching architecture. The authors develop a set of distributed object replication policies that are designed to implement different optimization goals. Each site is responsible for local cache decisions, and modifies cache contents in response to decisions made by other sites. The authors use the optimal and greedy policies as upper and lower bounds, respectively, for performance in this environment. Critical system parameters are identified, and their effect on system performance studied. Performance of the distributed algorithms is found to be close to optimal, while that of the greedy algorithms is far from optimal.

135 citations


Journal ArticleDOI
TL;DR: This paper surveys current cache coherence mechanisms and identifies several issues critical to their design; hybrid strategies are also presented that can enhance the performance of the multiprocessor memory system by combining several different coherence mechanisms into a single system.
Abstract: Private data caches have not been as effective in reducing the average memory delay in multiprocessors as in uniprocessors due to data spreading among the processors, and due to the cache coherence problem. A wide variety of mechanisms have been proposed for maintaining cache coherence in large-scale shared memory multiprocessors making it difficult to compare their performance and implementation implications. To help the computer architect understand some of the trade-offs involved, this paper surveys current cache coherence mechanisms, and identifies several issues critical to their design. These design issues include: 1) the coherence detection strategy, through which possibly incoherent memory accesses are detected either statically at compile-time, or dynamically at run-time; 2) the coherence enforcement strategy, such as updating or invalidating, that is used to ensure that stale cache entries are never referenced by a processor; 3) how the precision of block sharing information can be changed to trade-off the implementation cost and the performance of the coherence mechanism; and 4) how the cache block size affects the performance of the memory system. Trace-driven simulations are used to compare the performance and implementation impacts of these different issues. In addition, hybrid strategies are presented that can enhance the performance of the multiprocessor memory system by combining several different coherence mechanisms into a single system.

Patent
27 May 1993
TL;DR: In this article, the caches align on a "way" basis by their respective cache controllers communicating with each other which blocks of data they are replacing and which of their cache ways are being filled with data.
Abstract: A method for achieving multilevel inclusion in a computer system with first and second level caches. The caches align on a "way" basis by their respective cache controllers communicating with each other which blocks of data they are replacing and which of their cache ways are being filled with data. On first and second level cache read misses the first level cache controller provides way information to the second level cache controller to allow received data to be placed in the same way. On first level cache read misses and second level cache read hits, the second level cache controller provides way information to the first level cache controller, which places data in the indicated way. On processor writes the first level cache controller caches the writes and provides the way information to the second level cache controller which uses the way information to select the proper way for data storage. An inclusion bit is set on data in the second level cache that is duplicated in the first level cache. On a second level cache snoop hit, the second level cache controller checks the respective inclusion bit to determine if a copy of this data also resides in the first level cache. The first level cache controller is directed to snoop the bus only if the respective inclusion bit is set.
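
The inclusion-bit snoop filtering can be sketched as below; data movement and way selection are omitted, and the class and field names are assumptions for illustration.

```python
# Sketch: the L2 controller marks a line whose copy was also placed in the L1,
# and a snoop hit in the L2 is forwarded to the L1 only when that inclusion
# bit is set.

class L2Line:
    def __init__(self, tag):
        self.tag = tag
        self.inclusion = False   # set when the same line also lives in the L1


class L2Controller:
    def __init__(self):
        self.lines = {}
        self.l1_snoops = 0

    def fill(self, tag, also_in_l1):
        line = L2Line(tag)
        line.inclusion = also_in_l1
        self.lines[tag] = line

    def snoop(self, tag):
        line = self.lines.get(tag)
        if line is None:
            return
        if line.inclusion:
            self.l1_snoops += 1   # only now is the L1 directed to snoop the bus


l2 = L2Controller()
l2.fill(0x10, also_in_l1=True)
l2.fill(0x20, also_in_l1=False)
l2.snoop(0x10)
l2.snoop(0x20)
print("L1 snoops forwarded:", l2.l1_snoops)   # 1
```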

Patent
25 Jan 1993
TL;DR: In this paper, the authors propose an extension to the basic stream buffer, called multi-way stream buffers (62), which is useful for prefetching along multiple intertwined data reference streams.
Abstract: A memory system (10) utilizes miss caching by incorporating a small fully-associative miss cache (42) between a cache (18 or 20) and second-level cache (26). Misses in the cache (18 or 20) that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache (42). Victim caching is an improvement to miss caching that loads a small, fully associative cache (52) with the victim of a miss and not the requested line. Small victim caches (52) of 1 to 4 entries are even more effective at removing conflict misses than miss caching. Stream buffers (62) prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer (62) and not in the cache (18 or 20). Stream buffers (62) are useful in removing capacity and compulsory cache misses, as well as some instruction cache misses. Stream buffers (62) are more effective than previously investigated prefetch techniques when the next slower level in the memory hierarchy is pipelined. An extension to the basic stream buffer, called multi-way stream buffers (62), is useful for prefetching along multiple intertwined data reference streams.
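
The victim-cache behavior described in the abstract can be illustrated with a toy direct-mapped cache backed by a tiny fully-associative victim buffer; the sizes and eviction policy here are assumptions.

```python
from collections import OrderedDict

class DirectMappedWithVictim:
    """Direct-mapped cache plus a small fully-associative victim cache that
    holds recently evicted lines, catching conflict misses cheaply."""

    def __init__(self, sets=4, victim_entries=2):
        self.sets = [None] * sets
        self.capacity = victim_entries
        self.victims = OrderedDict()
        self.misses = self.victim_hits = 0

    def access(self, block):
        idx = block % len(self.sets)
        if self.sets[idx] == block:
            return                          # ordinary hit
        if block in self.victims:           # conflict miss caught by the victim cache
            self.victim_hits += 1
            del self.victims[block]
        else:
            self.misses += 1                # full miss to the next level
        evicted, self.sets[idx] = self.sets[idx], block
        if evicted is not None:
            self.victims[evicted] = True
            if len(self.victims) > self.capacity:
                self.victims.popitem(last=False)


c = DirectMappedWithVictim()
for b in [0, 4, 0, 4, 0, 4]:                # blocks 0 and 4 conflict in set 0
    c.access(b)
print("misses:", c.misses, "victim hits:", c.victim_hits)   # misses: 2, victim hits: 4
```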

Patent
10 May 1993
TL;DR: A two-level cache memory system for use in a computer system including two primary cache memories, one for storing instructions and the other for storing data, is described in this article.
Abstract: A two-level cache memory system for use in a computer system including two primary cache memories, one for storing instructions and one for storing data. The system also includes a secondary cache memory for storing both instructions and data. The primary and secondary caches each employ their own separate tag directory. The primary caches use a virtual addressing scheme employing both virtual tags and virtual addresses. The secondary cache employs a hybrid addressing scheme which uses virtual tags and partial physical addresses. The primary and secondary caches operate in parallel unless the larger and slower secondary cache is busy performing a previous operation. Only if a "miss" is encountered in both the primary and secondary caches does the system processor access the main memory.

Patent
John G. Aschoff, Jeffrey A. Berger, David Alan Burton, Bruce McNutt, Stanley C. Kurtz
19 May 1993
TL;DR: In this article, the authors allocate read cache space among bands of DASD cylinders rather than to data sets or processes as a function of a weighted average hit ratio to the counterpart cache space.
Abstract: Dynamic allocation of read cache space is allocated among bands of DASD cylinders rather than to data sets or processes as a function of a weighted average hit ratio to the counterpart cache space. Upon the hit ratio falling below a predetermined threshold, the bands are disabled for a defined interval as measured by cache accesses and then rebound to cache space again.

Patent
14 Oct 1993
TL;DR: In this paper, the authors describe a quick-choice cache into which are collected the names and aliases of networked devices or services that are expected to be most routinely used by a particular user.
Abstract: A personal computer or workstation on a network includes a quick-choice cache into which are collected the names and aliases of networked devices or services that are expected to be most routinely used by a particular user. The cache is initialized to contain the names and aliases of devices within a network zone assigned to the workstation. This collection of names/aliases is expanded each time the user makes a connection to a device not previously listed. The cache drives a graphic user interface (GUI) that shows the user what service categories are available within the cache, and then when a service category is selected, what specific devices are included within the cache under that service category. The GUI permits quick logical connection to devices whose aliases are stored in the user's cache. A connection map later graphically shows the user what connections he or she has made.

Patent
09 Feb 1993
TL;DR: In this article, a cache locking scheme in a two-set associative instruction cache that utilizes a specially designed Least Recently Used (LRU) unit to effectively lock a first portion of the instruction cache to allow high speed and predictable execution time for time critical program code sections residing in the first portion while leaving another portion of instruction cache free to operate as an instruction cache for other, non-critical, code sections.
Abstract: An instruction locking apparatus and method for a cache memory allowing execution time predictability and high speed performance. The present invention implements a cache locking scheme in a two set associative instruction cache that utilizes a specially designed Least Recently Used (LRU) unit to effectively lock a first portion of the instruction cache to allow high speed and predictable execution time for time critical program code sections residing in the first portion while leaving another portion of the instruction cache free to operate as an instruction cache for other, non-critical, code sections. The present invention provides the above features in a system that is virtually transparent to the program code and does not require a variety of complex or specialized instructions or address coding methods. The present invention is flexible in that the two set associative instruction cache is transformed into what may be thought of as a static RAM in cache, and in addition, a direct map cache unit. Several different time critical code sections may be loaded and locked into the cache at different times.

Patent
20 Oct 1993
TL;DR: In this article, a cache memory system is proposed to dynamically assign segments of cache memory to correspond to segments of the mass storage device, accept data written by the host into portions of the assigned segments, and determine if the elapsed time since any modified data has been written to the cache memory exceeds a predetermined period of time.
Abstract: A method for operating a cache memory system which has a high speed cache memory and a mass storage device that operate in a highly efficient manner with a host device. The system operates to dynamically assign segments of the cache memory to correspond to segments of the mass storage device, accept data written by the host into portions of the assigned segments of the cache memory, and determine if the elapsed time since any modified data has been written to the cache memory exceeds a predetermined period of time, or if the number of modified segments to be written to the mass storage device exceeds a preset limit. If so, the cache memory system enables a transfer mechanism to cause modified data to be written from the cache memory to the mass storage device, based on the location of segments relative to a currently selected track of the mass storage device. Movement of updated data from the cache memory (solid state storage) to the mass storage device (which may be, for example, a magnetic disk) and of prefetched data from the mass storage to the cache memory is done on a timely, but unobtrusive, basis as a background task. A direct, private channel between the cache memory and the mass storage device prevents communications between these two media from conflicting with transmission of data between the host and the cache memory system. A set of microprocessors manages and oversees the data transmission and storage. Data integrity is maintained in the event of a power interruption via a battery assisted, automatic and intelligent shutdown procedure.
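
The destage trigger (age limit or modified-segment count) reduces to a small predicate; the parameter names and values below are assumptions for illustration.

```python
# Sketch of the destage trigger in the abstract: modified segments are flushed
# to the mass storage device when either the oldest modified data has aged past
# a time limit or the count of modified segments exceeds a preset limit.

def should_destage(modified_segments, now, max_age=30.0, max_modified=8):
    """modified_segments: dict of segment id -> time the segment was modified."""
    if not modified_segments:
        return False
    oldest = min(modified_segments.values())
    return (now - oldest) > max_age or len(modified_segments) > max_modified


print(should_destage({1: 0.0, 2: 5.0}, now=40.0))              # True: data too old
print(should_destage({i: 39.0 for i in range(9)}, now=40.0))   # True: too many segments
print(should_destage({1: 39.0}, now=40.0))                     # False
```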

Patent
22 Dec 1993
TL;DR: In this paper, a decoded instruction cache with multiple instructions per cache line is proposed, where the decode logic fills the cache line with instructions up to its limit during run time cache misses, enabling the processor to dispatch multiple instructions during one clock cycle.
Abstract: A general purpose computer system is equipped with apparatus for enabling a processor to provide efficient execution of multiple instructions per clock cycle. The major feature is a decoded instruction cache with multiple instructions per cache line. During run time cache misses, the decode logic fills the cache line with instructions up to its limit. During run time cache hits, the cache line enables the processor to dispatch multiple instructions during one clock cycle. This achieves high performance with simple, but still powerful, decode and dispatch logic. An important feature of the instruction cache is that it holds the target addresses for the next instructions. No separate address logic is needed to proceed in the program execution during cache hits. A conditional branch holds its alternative target address in a separate field. This enables the processor, to a large degree, to be independent of the conditional branch bottleneck.

Patent
01 Oct 1993
TL;DR: In this paper, a method of data communication between asynchronous processes of a computer system is disclosed in connection with a cache coherency system for a processor-cache used in a multi-master computer system in which bus arbitration signals either are not available to the processor cache, or are not exclusively relied on by the processorcache to assure validity of the data in the cache.
Abstract: A method of data communication between asynchronous processes of a computer system is disclosed in connection with a cache coherency system for a processor-cache used in a multi-master computer system in which bus arbitration signals either are not available to the processor-cache, or are not exclusively relied on by the processor-cache to assure validity of the data in the cache (e.g., a 386-bus compatible computer system using an external secondary cache in which bus arbitration signals are only connected to and used by the secondary cache controller). In an exemplary external-chip implementation, the cache coherency system (120) comprises two PLAs: a FLUSH module (122) and a WAVESHAPING module (124). The FLUSH module (a) receives selected bus cycle definition and control signals from a microprocessor (110), (b) detects FLUSH (cache invalidation) conditions, i.e., bus master synchronization events, and for each such FLUSH condition, (c) provides a FLUSH output signal. The WAVESHAPING module provides a corresponding CPU/FLUSH signal to the microprocessor with the appropriate set up and hold time. The exemplary bus master synchronization events, or FLUSH conditions, that cause cache invalidation are: (a) hardware generated interrupts, and (b) read or read/write accesses to I/O address space, except for those directed to a hard disk or an external coprocessor. If the bus architecture uses memory-mapped I/O, accesses to selected regions of memory-mapped I/O space could also be used. The cache coherency functionality could be implemented on-board the microprocessor.

Patent
28 May 1993
TL;DR: In this article, a cache system which includes prefetch pointer fields for identifying lines of memory to prefetch thereby minimizing the occurrence of cache misses is proposed, which takes advantage of the previous execution history of the processor and the locality of reference exhibited by the requested addresses.
Abstract: A cache system which includes prefetch pointer fields for identifying lines of memory to prefetch thereby minimizing the occurrence of cache misses. This cache structure and method for implementing the same takes advantage of the previous execution history of the processor and the locality of reference exhibited by the requested addresses. In particular, each cache line contains a prefetch pointer field which contains a pointer to a line in memory to be prefetched and placed in the cache. By prefetching specified lines of data with temporal locality to the lines of data containing the prefetch pointers the number of cache misses is minimized.
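
A hypothetical sketch of the prefetch-pointer idea: each cache line remembers the line that last followed it, and a hit triggers a prefetch of that successor. The update rule shown (remember the most recent successor) is an assumption, not necessarily the patent's policy.

```python
class PrefetchPointerCache:
    """Each cached line carries a prefetch pointer to the line that
    historically followed it; hits follow the pointer to issue a prefetch."""

    def __init__(self):
        self.lines = {}        # line address -> prefetch pointer (or None)
        self.prev = None
        self.prefetches = []

    def access(self, line):
        if line in self.lines and self.lines[line] is not None:
            self.prefetches.append(self.lines[line])   # follow the prefetch pointer
        else:
            self.lines.setdefault(line, None)
        if self.prev is not None:
            self.lines[self.prev] = line               # record the observed successor
        self.prev = line


c = PrefetchPointerCache()
for line in [10, 50, 90, 10, 50, 90]:                  # second pass reuses the pointers
    c.access(line)
print(c.prefetches)                                    # [50, 90, 10]
```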

Patent
09 Dec 1993
TL;DR: In this paper, a microprocessor is provided with an integral, two-level cache memory architecture in which, on a first level cache miss, a first level cache entry is discarded and stored in the replacement cache.
Abstract: A microprocessor is provided with an integral, two level cache memory architecture. The microprocessor includes a microprocessor core and a set associative first level cache both located on a common semiconductor die. A replacement cache, which is at least as large as approximately one half the size of the first level cache, is situated on the same semiconductor die and is coupled to the first level cache. In the event of a first level cache miss, a first level entry is discarded and stored in the replacement cache. When such a first level cache miss occurs, the replacement cache is checked to see if the desired entry is stored therein. If a replacement cache hit occurs, then the hit entry is forwarded to the first level cache and stored therein. If a cache miss occurs in both the first level cache and the replacement cache, then a main memory access is commenced to retrieve the desired entry. In that event, the desired entry retrieved from main memory is forwarded to the first level cache and stored therein. When a replacement cache entry is removed from the replacement cache by the replacement algorithm associated therewith, that entry is written back to main memory if that entry was modified. Otherwise the entry is discarded.

Proceedings ArticleDOI
01 Jun 1993
TL;DR: This paper examines the issues around managing a non-volatile disk cache using a detailed trace-driven simulation and observes that even a simple write-behind policy for the write cache is effective in reducing the total number of writes by over 50%.
Abstract: The I/O subsystem in a computer system is becoming the bottleneck as a result of recent dramatic improvements in processor speeds. Disk caches have been effective in closing this gap but the benefit is restricted to the read operations as the write I/Os are usually committed to disk to maintain consistency and to allow for crash recovery. As a result, write I/O traffic is becoming dominant and solutions to alleviate this problem are becoming increasingly important. A simple solution which can easily work with existing file systems is to use non-volatile disk caches together with a write-behind strategy. In this study, we look at the issues around managing such a cache using a detailed trace driven simulation. Traces from three different commercial sites are used in the analysis of various policies for managing the write cache. We observe that even a simple write-behind policy for the write cache is effective in reducing the total number of writes by over 50%. We further observe that the use of hysteresis in the policy to purge the write cache, with two thresholds, yields substantial improvement over a single threshold scheme. The inclusion of a mechanism to piggyback blocks from the write cache with read miss I/Os further reduces the number of writes to only about 15% of the original total number of write operations. We compare two piggybacking options and also study the impact of varying the write cache size. We briefly looked at the case of a single non-volatile disk cache to estimate the performance impact of statically partitioning the cache for reads and writes.
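
The two-threshold hysteresis purge mentioned in the abstract can be sketched as follows, with illustrative high/low water marks (the paper's thresholds and destage granularity are not given here):

```python
# Sketch: destaging from the non-volatile write cache starts when occupancy
# rises above a high-water mark and continues until it drops below a low-water
# mark, instead of toggling around a single threshold.

def purge_with_hysteresis(occupancy, high=0.8, low=0.5, step=0.1):
    """Return the occupancy after one purge episode, plus the blocks destaged."""
    destaged = 0
    if occupancy >= high:                 # trigger only at the high-water mark ...
        while occupancy > low:            # ... but drain down to the low-water mark
            occupancy = max(0.0, occupancy - step)
            destaged += 1
    return occupancy, destaged


print(purge_with_hysteresis(0.85))   # roughly (0.45, 4): drains well below the trigger
print(purge_with_hysteresis(0.70))   # (0.7, 0): below the high threshold, no purge
```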

Patent
Yifong Shih
01 Sep 1993
TL;DR: In this article, a distributed file system controls memory allocation between a global cache shared storage memory unit and a plurality of local cache memory units by calculating a variable global cache LRU stack update interval.
Abstract: A distributed file system controls memory allocation between a global cache shared storage memory unit and a plurality of local cache memory units by calculating a variable global cache LRU stack update interval. A new update interval is calculated using fresh system statistics at the end of each update interval. A stack update command is issued at the end of the update interval only if the expected minimum data residency time in the global cache shared memory is less than or equal to the expected average residency time in the local cache memory.

Patent
24 Mar 1993
TL;DR: In this paper, a method and apparatus are presented for allowing a processor to invalidate an individual line of its internal cache while in a non-clocked low power state, with the processor being powered up out of the reduced power consumption state to perform the invalidation.
Abstract: A method and apparatus for allowing a processor to invalidate an individual line of its internal cache while in a non-clocked low power state. The present invention includes circuitry for placing the processor in a reduced power consumption state. The present invention also includes circuitry for powering up the processor out of the reduced power consumption state to invalidate data in the cache in order to maintain cache coherency while in the reduced power consumption state.

Proceedings ArticleDOI
01 Jul 1993
TL;DR: Compared to a compiler-based coherence strategy, the Shared Regions approach still performs better than a compiler that can achieve 90% accuracy in allowing caching, as long as the regions are a few hundred bytes or larger, or they are re-used a few times in the cache.
Abstract: The effective management of caches is critical to the performance of applications on shared-memory multiprocessors. In this paper, we discuss a technique for software cache coherence that is based upon the integration of a program-level abstraction for shared data with software cache management. The program-level abstraction, called Shared Regions, explicitly relates synchronization objects with the data they protect. Cache coherence algorithms are presented which use the information provided by shared region primitives, and ensure that shared regions are always cacheable by the processors accessing them. Measurements and experiments of the Shared Region approach on a shared-memory multiprocessor are shown. Comparisons with other software based coherence strategies, including a user-controlled strategy and an operating system-based strategy, show that this approach is able to deliver better performance, with relatively low corresponding overhead and only a small increase in the programming effort. Compared to a compiler-based coherence strategy, the Shared Regions approach still performs better than a compiler that can achieve 90% accuracy in allowing caching, as long as the regions are a few hundred bytes or larger, or they are re-used a few times in the cache.

Patent
15 Dec 1993
TL;DR: In this paper, a split level cache memory system for a data processor includes a single chip integer unit, an array processor such as a floating point unit and an external main memory.
Abstract: A split level cache memory system for a data processor includes a single chip integer unit, an array processor such as a floating point unit, an external main memory and a split level cache. The split level cache includes an on-chip, fast local cache with low latency for use by the integer unit for loads and stores of integer and address data and an off-chip, pipelined global cache for storing arrays of data such as floating point data for use by the array processor and integer and address data for refilling the local cache. Coherence between the local cache and global cache is maintained by writing through to the global cache during integer stores. Local cache words are invalidated when data is written to the global cache during an array processor store.

Patent
27 Sep 1993
TL;DR: In this article, a multi-port central cache memory is used to queue all the incoming data to be stored in a DRAM and all the outgoing data being retrieved from the DRAM.
Abstract: A multimedia video processor chip for a personal computer employs a multi-port central cache memory to queue all the incoming data to be stored in a DRAM and all the outgoing data being retrieved from the DRAM. Such a cache memory is used in one of several modes in which the cache memory is partitioned by cache boundaries into different groups of storage areas. Each storage area of the cache is dedicated to storing data from a specific data source. The cache boundaries are chosen such that, for a given mode, the storage areas are optimized for worst case conditions for all data streams.

Patent
02 Aug 1993
TL;DR: In this article, a high performance shared cache is provided to support multiprocessor systems and allow maximum parallelism in accessing the cache by the processors, servicing one processor request in each machine cycle, reducing system response time and increasing system throughput.
Abstract: A high performance shared cache is provided to support multiprocessor systems and allow maximum parallelism in accessing the cache by the processors, servicing one processor request in each machine cycle, reducing system response time and increasing system throughput. The shared cache of the present invention uses the additional performance optimization techniques of pipelining cache operations (loads and stores) and burst-mode data accesses. By including built-in pipeline stages, the cache is enabled to service one request every machine cycle from any processing element. This contributes to reduction in the system response time as well as the throughput. With regard to the burst-mode data accesses, the widest possible data out of the cache can be stored to, and retrieved from, the cache by one cache access operation. One portion of the data is held in logic in the cache (on the chip), while another portion (corresponding to the system bus width) gets transferred to the requesting element (processor or memory) in one cycle. The held portion of the data can then be transferred in the following machine cycle.

Patent
S. Craig Nelson
17 Dec 1993
TL;DR: In this article, the authors propose an improved method and apparatus for selecting and replacing a block of a set of cache memory by assigning indices to the memory blocks of a given set of caches.
Abstract: The present invention is an improved method and apparatus for selecting and replacing a block of a set of cache memory. The present invention provides for the weighted random replacement of blocks of cache memory by assigning indices to the memory blocks of a given set of cache memory. One of the assigned indices is then randomly selected by the present invention. The memory block of the given set to which the randomly selected index is assigned is replaced. The indices are assigned such that one or more blocks of the given set of cache memory have a high probability of replacement, whereas the other blocks of the given set of cache memory have significantly lower probabilities of replacement.
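
The weighted-random selection can be sketched by expanding each block's assigned indices and drawing one uniformly; the weights used below are illustrative assumptions, not the patent's assignment.

```python
import random

# Sketch: indices are assigned to the blocks of a set so that some blocks own
# more indices (and thus a higher replacement probability); one index is then
# drawn uniformly at random and its block is replaced.

def pick_block_to_replace(weights):
    """weights[i] = number of indices assigned to block i of the set."""
    indices = [block for block, w in enumerate(weights) for _ in range(w)]
    return random.choice(indices)   # uniform over indices = weighted over blocks


# Block 3 holds 5 of the 8 indices, so it is replaced far more often.
counts = [0, 0, 0, 0]
for _ in range(10_000):
    counts[pick_block_to_replace([1, 1, 1, 5])] += 1
print(counts)   # roughly [1250, 1250, 1250, 6250]
```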