
Showing papers on "Cache invalidation published in 1993"


Patent
08 Nov 1993
TL;DR: In this paper, the authors propose management logic in a shared high-speed cache to meet the serialization and data coherency requirements of the data systems when sharing the high-speed cache as a store-multiple cache in a multi-system environment.
Abstract: A high-speed cache is shared by a plurality of independently-operating data systems in a multi-system data sharing complex. Each data system has access both to the high-speed cache and to lower-speed, upper-level storage for obtaining and storing data. Management logic in the shared high-speed cache is provided to meet the serialization and data coherency requirements of the data systems when sharing the high speed cache as a store-multiple cache in a multi-system environment.

478 citations


Book
01 Jan 1993
TL;DR: What is Cache Memory?
Abstract: What is Cache Memory? How are Caches Designed? Cache Memories and RISC Processors. Maintaining Coherency in Cached Systems. Cute Cache Tricks. Subject Index.

447 citations


Proceedings ArticleDOI
01 May 1993
TL;DR: An adaptive protocol is proposed that effectively eliminates most single invalidations and improves the performance by reducing the shared access penalty and the network traffic.
Abstract: Parallel programs that use critical sections and are executed on a shared-memory multiprocessor with a write-invalidate protocol result in invalidation actions that could be eliminated. For this type of sharing, called migratory sharing, each processor typically causes a cache miss followed by an invalidation request which could be merged with the preceding cache-miss request. In this paper we propose an adaptive protocol that invokes this optimization dynamically for migratory blocks. For other blocks, the protocol works as an ordinary write-invalidate protocol. We show that the protocol is a simple extension to a write-invalidate protocol. Based on a program-driven simulation model of an architecture similar to the Stanford DASH, and a set of four benchmarks, we evaluate the potential performance improvements of the protocol. We find that it effectively eliminates most single invalidations, which improves the performance by reducing the shared access penalty and the network traffic.

239 citations
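
The abstract outlines the adaptive optimization only at a high level. The following is a minimal, hypothetical sketch of the detection idea, not the paper's actual protocol: when a read miss arrives for a block that a different processor holds dirty, the directory hands out an exclusive copy straight away, merging the later invalidation with the miss. All names and the state bookkeeping are assumptions, and only single-owner tracking is modeled.

```python
# Hypothetical sketch of migratory-sharing detection in a directory-based
# write-invalidate protocol (names and state encoding are assumptions).

class Directory:
    def __init__(self):
        # per-block state: current owner, whether the owner has written it,
        # and whether the block has been classified as migratory
        self.blocks = {}

    def read_miss(self, block, requester):
        st = self.blocks.get(block)
        if st and st["owner"] is not None and st["owner"] != requester and st["dirty"]:
            # Read miss on a block another processor holds dirty:
            # under the migratory optimization, grant an *exclusive* copy and
            # invalidate the old owner, merging the invalidation with this miss.
            st.update(owner=requester, dirty=False, migratory=True)
            return "exclusive"
        self.blocks[block] = {"owner": requester, "dirty": False,
                              "migratory": st["migratory"] if st else False}
        return "shared"

    def write_hit(self, block, writer):
        self.blocks[block]["dirty"] = True


d = Directory()
print(d.read_miss("A", 0))   # shared (first reference)
d.write_hit("A", 0)
print(d.read_miss("A", 1))   # exclusive: block now treated as migratory
```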


Proceedings ArticleDOI
01 May 1993
TL;DR: Tradeoffs on writes that miss in the cache are investigated, and a mixture of write-through and write-back caching, called write caching, is proposed that places a small fully-associative cache behind a write-through cache.
Abstract: This paper investigates issues involving writes and caches. First, tradeoffs on writes that miss in the cache are investigated. In particular, whether the missed cache block is fetched on a write miss, whether the missed cache block is allocated in the cache, and whether the cache line is written before hit or miss is known are considered. Depending on the combination of these policies chosen, the entire cache miss rate can vary by a factor of two on some applications. The combination of no-fetch-on-write and write-allocate can provide better performance than cache line allocation instructions. Second, tradeoffs between write-through and write-back caching when writes hit in a cache are considered. A mixture of these two alternatives, called write caching, is proposed. Write caching places a small fully-associative cache behind a write-through cache. A write cache can eliminate almost as much write traffic as a write-back cache.

234 citations
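
As a rough illustration of the write-caching idea (a small fully-associative buffer behind a write-through cache), the sketch below coalesces repeated writes to the same block and only generates traffic to the next level on eviction. Capacity, block size, and FIFO eviction are assumptions for the example, not the paper's parameters.

```python
from collections import OrderedDict

class WriteCache:
    """Hypothetical sketch of a small fully-associative write cache placed
    behind a write-through L1: writes are coalesced per block and only
    evictions generate traffic to the next memory level."""

    def __init__(self, entries=4, block_size=16):
        self.entries = entries
        self.block_size = block_size
        self.lines = OrderedDict()      # block address -> present flag
        self.traffic = 0                # writes passed on to the next level

    def write(self, addr):
        block = addr // self.block_size
        if block in self.lines:
            self.lines.move_to_end(block)   # coalesce: absorb the repeated write
            return
        if len(self.lines) >= self.entries:
            self.lines.popitem(last=False)  # evict the oldest entry ...
            self.traffic += 1               # ... which is what reaches memory
        self.lines[block] = True

    def flush(self):
        self.traffic += len(self.lines)
        self.lines.clear()


wc = WriteCache()
for addr in [0, 4, 8, 0, 4, 64, 0]:         # repeated writes to block 0 coalesce
    wc.write(addr)
wc.flush()
print("writes issued: 7, traffic to next level:", wc.traffic)   # 2
```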


Patent
23 Mar 1993
TL;DR: Cache server nodes play a key role in the LOCATE process and can prevent redundant network-wide broadcasts of LOCATE requests, as mentioned in this paper: when an origin cache server node receives a request from a served node, it searches its local directories first, then forwards the request to alternate cache server nodes if necessary.
Abstract: A computer network in which resources are dynamically located through the use of LOCATE requests includes multiple cache server nodes, network nodes which have an additional obligation to build and maintain large caches of directory entries. Cache server nodes play a key role in the LOCATE process and can prevent redundant network-wide broadcasts of LOCATE requests. Where an origin cache server node receives a request from a served node, the cache server node searches its local directories first, then forwards the request to alternate cache server nodes if necessary. If the necessary information isn't found locally or in alternate cache server nodes, the LOCATE request is then broadcast to all network nodes in the network. If the broadcast results are negative, the request is forwarded to selected gateway nodes to permit the search to continue in adjacent networks.

199 citations


Patent
03 Jun 1993
TL;DR: In this article, a cache management system and method coupled to at least one host and one data storage device is presented, where a cache indexer maintains a current index (25) of data elements which are stored in cache memory.
Abstract: A cache management system and method monitors and controls the contents of cache memory (12) coupled to at least one host (22a) and at least one data storage device (18a). A cache indexer (16) maintains a current index (25) of data elements which are stored in cache memory (12). A sequential data access indicator (30), responsive to the cache index (16) and to a user selectable sequential data access threshold, determines that a sequential data access is in progress for a given process and provides an indication of the same. The system and method allocate a micro-cache memory (12) to any process performing a sequential data access. In response to the indication of a sequential data access in progress and to a user selectable maximum number of data elements to be prefetched, a data retrieval requestor requests retrieval of up to the selected maximum number of data elements from a data storage device (18b). A user selectable number of sequential data elements determines when previously used micro-cache memory locations will be overwritten. A method of dynamically monitoring and adjusting cache management parameters is also disclosed.

179 citations
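
A minimal sketch of the sequential-access detection described in the abstract, assuming a simple run-length test and illustrative threshold values (the patent's actual thresholds are user-selectable and its bookkeeping is per process):

```python
# Sketch: if a process has touched `seq_threshold` consecutive data elements,
# treat it as a sequential stream and prefetch up to `max_prefetch` elements
# ahead. Parameter names and values are illustrative assumptions.

def detect_and_prefetch(accesses, seq_threshold=3, max_prefetch=4):
    prefetched = []
    run = 1
    for prev, cur in zip(accesses, accesses[1:]):
        run = run + 1 if cur == prev + 1 else 1
        if run >= seq_threshold:
            # sequential access in progress: request the next elements
            prefetched = list(range(cur + 1, cur + 1 + max_prefetch))
    return prefetched


print(detect_and_prefetch([10, 11, 12, 13]))   # -> [14, 15, 16, 17]
print(detect_and_prefetch([10, 40, 7, 100]))   # -> [] (no sequential run)
```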


Patent
06 Dec 1993
TL;DR: In this article, a method and system for maintaining coherency between a server processor and a client processor that has a cache memory is presented, where the server processor periodically broadcasts invalidation reports to the client processor.
Abstract: A method and system are provided for maintaining coherency between a server processor and a client processor that has a cache memory. The server may, for example, be a fixed location mobile unit support station. The client may, for example, be a palmtop computer. The server stores a plurality of data values, and the client stores a subset of the plurality of data values in the cache. The server processor periodically broadcasts invalidation reports to the client processor. Each respective invalidation report includes information identifying which, if any, of the plurality of data values have been updated within a predetermined period of time before the server processor broadcasts the respective invalidation report. The client processor determines, based on the invalidation reports, whether a selected data value in the cache memory of the client processor has been updated in the server processor since the selected data value was stored in the cache memory. The client processor invalidates the selected data value in the cache memory of the client processor, if the selected data value has been updated in the server processor.

172 citations
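
The broadcast invalidation-report scheme can be pictured with the small sketch below; the window length, data layout, and class names are assumptions for illustration rather than the patent's design.

```python
import time

class Server:
    """Tracks update times and periodically reports recently updated keys."""
    def __init__(self, window=10.0):
        self.window = window
        self.data = {}
        self.updated_at = {}

    def update(self, key, value, now):
        self.data[key] = value
        self.updated_at[key] = now

    def invalidation_report(self, now):
        # keys updated within the last `window` seconds
        return {k for k, t in self.updated_at.items() if now - t <= self.window}


class Client:
    """Drops cached entries named in each broadcast invalidation report."""
    def __init__(self):
        self.cache = {}

    def receive_report(self, report):
        for key in report:
            self.cache.pop(key, None)   # invalidate the stale copy


server, client = Server(), Client()
client.cache["x"] = 1
server.update("x", 2, now=time.time())
client.receive_report(server.invalidation_report(now=time.time()))
print("x" in client.cache)   # False: the cached copy was invalidated
```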


Proceedings ArticleDOI
01 Jun 1993
TL;DR: The OPT model is proposed that uses cache simulation under optimal (OPT) replacement to obtain a finer and more accurate characterization of misses than the three Cs model, and three new techniques for optimal cache simulation are presented.
Abstract: Cache miss characterization models such as the three Cs model are useful in developing schemes to reduce cache misses and their penalty. In this paper we propose the OPT model that uses cache simulation under optimal (OPT) replacement to obtain a finer and more accurate characterization of misses than the three Cs model. However, current methods for optimal cache simulation are slow and difficult to use. We present three new techniques for optimal cache simulation. First, we propose a limited lookahead strategy with error fixing, which allows one pass simulation of multiple optimal caches. Second, we propose a scheme to group entries in the OPT stack, which allows efficient tree based fully-associative cache simulation under OPT. Third, we propose a scheme for exploiting partial inclusion in set-associative cache simulation under OPT. Simulators based on these algorithms were used to obtain cache miss characterizations using the OPT model for nine SPEC benchmarks. The results indicate that miss ratios under OPT are substantially lower than those under LRU replacement, by up to 70% in fully-associative caches, and up to 32% in two-way set-associative caches.

168 citations
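
To make the OPT-versus-LRU comparison concrete, here is a small, self-contained simulator for a fully-associative cache under LRU and under optimal (Belady/OPT) replacement. It is a textbook illustration of the two policies being compared, not the paper's one-pass stack algorithms.

```python
def lru_misses(trace, capacity):
    cache, misses = [], 0
    for ref in trace:
        if ref in cache:
            cache.remove(ref)
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.pop(0)            # evict the least recently used block
        cache.append(ref)               # most recently used kept at the end
    return misses


def opt_misses(trace, capacity):
    cache, misses = set(), 0
    for i, ref in enumerate(trace):
        if ref in cache:
            continue
        misses += 1
        if len(cache) >= capacity:
            # evict the block whose next use is farthest in the future
            def next_use(block):
                for j in range(i + 1, len(trace)):
                    if trace[j] == block:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(ref)
    return misses


trace = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print("LRU misses:", lru_misses(trace, 3))   # 10
print("OPT misses:", opt_misses(trace, 3))   # 7
```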


Patent
19 Apr 1993
TL;DR: In this paper, a cache memory replacement scheme with a locking feature is provided, where the locking bits associated with each line in the cache are supplied in the tag table and used by the application program/process executing and are utilized in conjunction with cache replacement bits by the cache controller to determine the lines in cache to replace.
Abstract: In a memory system having a main memory and a faster cache memory, a cache memory replacement scheme with a locking feature is provided. Locking bits associated with each line in the cache are supplied in the tag table. These locking bits are preferably set and reset by the executing application program/process and are utilized in conjunction with cache replacement bits by the cache controller to determine the lines in the cache to replace. The lock bits and replacement bits for a cache line are "ORed" to create a composite bit for the cache line. If the composite bit is set, the cache line is not removed from the cache. When all composite bits are set and deadlock would result, all replacement bits are cleared. One cache line is always maintained as non-lockable. The locking bits "lock" the line of data in the cache until such time when the process resets the lock bit. By providing that the process controls the state of the lock bits, the intelligence and knowledge the process contains regarding the frequency of use of certain memory locations can be utilized to provide a more efficient cache.

153 citations
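
A simple sketch of the composite-bit rule described above (a line is protected when its lock bit OR replacement bit is set, with deadlock avoidance and one non-lockable line); representing the bits as Python lists is purely illustrative.

```python
import random

def pick_victim(lock_bits, repl_bits, non_lockable=0):
    """Choose a cache line to replace given per-line lock and replacement bits."""
    lock_bits = list(lock_bits)
    lock_bits[non_lockable] = 0                       # one line is always non-lockable
    composite = [l | r for l, r in zip(lock_bits, repl_bits)]
    if all(composite):                                # every line protected: deadlock ...
        repl_bits = [0] * len(repl_bits)              # ... so clear the replacement bits
        composite = lock_bits[:]
    candidates = [i for i, c in enumerate(composite) if c == 0]
    return random.choice(candidates)                  # any unprotected line may be replaced


print(pick_victim(lock_bits=[0, 1, 1, 0], repl_bits=[1, 0, 0, 0]))  # 3
print(pick_victim(lock_bits=[1, 1, 1, 1], repl_bits=[1, 1, 1, 1]))  # 0 (deadlock avoided)
```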


Journal ArticleDOI
Avraham Leff, Joel L. Wolf, Philip S. Yu
TL;DR: Performance of the distributed algorithms is found to be close to optimal, while that of the greedy algorithms is far from optimal.
Abstract: Studies the cache performance in a remote caching architecture. The authors develop a set of distributed object replication policies that are designed to implement different optimization goals. Each site is responsible for local cache decisions, and modifies cache contents in response to decisions made by other sites. The authors use the optimal and greedy policies as upper and lower bounds, respectively, for performance in this environment. Critical system parameters are identified, and their effect on system performance studied. Performance of the distributed algorithms is found to be close to optimal, while that of the greedy algorithms is far from optimal.

135 citations


Journal ArticleDOI
TL;DR: This paper surveys current cache coherence mechanisms and identifies several issues critical to their design; hybrid strategies are also presented that can enhance the performance of the multiprocessor memory system by combining several different coherence mechanisms into a single system.
Abstract: Private data caches have not been as effective in reducing the average memory delay in multiprocessors as in uniprocessors due to data spreading among the processors, and due to the cache coherence problem. A wide variety of mechanisms have been proposed for maintaining cache coherence in large-scale shared memory multiprocessors making it difficult to compare their performance and implementation implications. To help the computer architect understand some of the trade-offs involved, this paper surveys current cache coherence mechanisms, and identifies several issues critical to their design. These design issues include: 1) the coherence detection strategy, through which possibly incoherent memory accesses are detected either statically at compile-time, or dynamically at run-time; 2) the coherence enforcement strategy, such as updating or invalidating, that is used to ensure that stale cache entries are never referenced by a processor; 3) how the precision of block sharing information can be changed to trade-off the implementation cost and the performance of the coherence mechanism; and 4) how the cache block size affects the performance of the memory system. Trace-driven simulations are used to compare the performance and implementation impacts of these different issues. In addition, hybrid strategies are presented that can enhance the performance of the multiprocessor memory system by combining several different coherence mechanisms into a single system.

Patent
27 May 1993
TL;DR: In this article, the caches align on a "way" basis by their respective cache controllers communicating with each other which blocks of data they are replacing and which of their cache ways are being filled with data.
Abstract: A method for achieving multilevel inclusion in a computer system with first and second level caches. The caches align on a "way" basis by their respective cache controllers communicating with each other which blocks of data they are replacing and which of their cache ways are being filled with data. On first and second level cache read misses the first level cache controller provides way information to the second level cache controller to allow received data to be placed in the same way. On first level cache read misses and second level cache read hits, the second level cache controller provides way information to the first level cache controller, which places data in the indicated way. On processor writes the first level cache controller caches the writes and provides the way information to the second level cache controller which uses the way information to select the proper way for data storage. An inclusion bit is set on data in the second level cache that is duplicated in the first level cache. On a second level cache snoop hit, the second level cache controller checks the respective inclusion bit to determine if a copy of this data also resides in the first level cache. The first level cache controller is directed to snoop the bus only if the respective inclusion bit is set.
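
The inclusion-bit snoop filtering can be sketched as below; data movement and way selection are omitted, and the class and field names are assumptions for illustration.

```python
# Sketch: the L2 controller marks a line whose copy was also placed in the L1,
# and a snoop hit in the L2 is forwarded to the L1 only when that inclusion
# bit is set.

class L2Line:
    def __init__(self, tag):
        self.tag = tag
        self.inclusion = False   # set when the same line also lives in the L1


class L2Controller:
    def __init__(self):
        self.lines = {}
        self.l1_snoops = 0

    def fill(self, tag, also_in_l1):
        line = L2Line(tag)
        line.inclusion = also_in_l1
        self.lines[tag] = line

    def snoop(self, tag):
        line = self.lines.get(tag)
        if line is None:
            return
        if line.inclusion:
            self.l1_snoops += 1   # only now is the L1 directed to snoop the bus


l2 = L2Controller()
l2.fill(0x10, also_in_l1=True)
l2.fill(0x20, also_in_l1=False)
l2.snoop(0x10)
l2.snoop(0x20)
print("L1 snoops forwarded:", l2.l1_snoops)   # 1
```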

Patent
25 Jan 1993
TL;DR: In this paper, the authors propose an extension to the basic stream buffer, called multi-way stream buffers (62), which is useful for prefetching along multiple intertwined data reference streams.
Abstract: A memory system (10) utilizes miss caching by incorporating a small fully-associative miss cache (42) between a cache (18 or 20) and second-level cache (26). Misses in the cache (18 or 20) that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache (42). Victim caching is an improvement to miss caching that loads a small, fully associative cache (52) with the victim of a miss and not the requested line. Small victim caches (52) of 1 to 4 entries are even more effective at removing conflict misses than miss caching. Stream buffers (62) prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer (62) and not in the cache (18 or 20). Stream buffers (62) are useful in removing capacity and compulsory cache misses, as well as some instruction cache misses. Stream buffers (62) are more effective than previously investigated prefetch techniques when the next slower level in the memory hierarchy is pipelined. An extension to the basic stream buffer, called multi-way stream buffers (62), is useful for prefetching along multiple intertwined data reference streams.
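
The victim-cache behavior described in the abstract can be illustrated with a toy direct-mapped cache backed by a tiny fully-associative victim buffer; the sizes and eviction policy here are assumptions.

```python
from collections import OrderedDict

class DirectMappedWithVictim:
    """Direct-mapped cache plus a small fully-associative victim cache that
    holds recently evicted lines, catching conflict misses cheaply."""

    def __init__(self, sets=4, victim_entries=2):
        self.sets = [None] * sets
        self.capacity = victim_entries
        self.victims = OrderedDict()
        self.misses = self.victim_hits = 0

    def access(self, block):
        idx = block % len(self.sets)
        if self.sets[idx] == block:
            return                          # ordinary hit
        if block in self.victims:           # conflict miss caught by the victim cache
            self.victim_hits += 1
            del self.victims[block]
        else:
            self.misses += 1                # full miss to the next level
        evicted, self.sets[idx] = self.sets[idx], block
        if evicted is not None:
            self.victims[evicted] = True
            if len(self.victims) > self.capacity:
                self.victims.popitem(last=False)


c = DirectMappedWithVictim()
for b in [0, 4, 0, 4, 0, 4]:                # blocks 0 and 4 conflict in set 0
    c.access(b)
print("misses:", c.misses, "victim hits:", c.victim_hits)   # misses: 2, victim hits: 4
```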

Patent
10 May 1993
TL;DR: A two-level cache memory system for use in a computer system including two primary cache memories, one for storing instructions and the other for storing data, is described in this article.
Abstract: A two-level cache memory system for use in a computer system including two primary cache memories, one for storing instructions and one for storing data. The system also includes a secondary cache memory for storing both instructions and data. The primary and secondary caches each employ their own separate tag directory. The primary caches use a virtual addressing scheme employing both virtual tags and virtual addresses. The secondary cache employs a hybrid addressing scheme which uses virtual tags and partial physical addresses. The primary and secondary caches operate in parallel unless the larger and slower secondary cache is busy performing a previous operation. Only if a "miss" is encountered in both the primary and secondary caches does the system processor access the main memory.

Patent
John G. Aschoff, Jeffrey A. Berger, David Alan Burton, Bruce McNutt, Stanley C. Kurtz
19 May 1993
TL;DR: In this article, the authors allocate read cache space among bands of DASD cylinders rather than to data sets or processes as a function of a weighted average hit ratio to the counterpart cache space.
Abstract: Dynamic allocation of read cache space is allocated among bands of DASD cylinders rather than to data sets or processes as a function of a weighted average hit ratio to the counterpart cache space. Upon the hit ratio falling below a predetermined threshold, the bands are disabled for a defined interval as measured by cache accesses and then rebound to cache space again.

Patent
14 Oct 1993
TL;DR: In this paper, the authors describe a quick-choice cache into which are collected the names and aliases of networked devices or services that are expected to be most routinely used by a particular user.
Abstract: A personal computer or workstation on a network includes a quick-choice cache into which are collected the names and aliases of networked devices or services that are expected to be most routinely used by a particular user. The cache is initialized to contain the names and aliases of devices within a network zone assigned to the workstation. This collection of names/aliases is expanded each time the user makes a connection to a device not previously listed. The cache drives a graphic user interface (GUI) that shows the user what service categories are available within the cache, and then when a service category is selected, what specific devices are included within the cache under that service category. The GUI permits quick logical connection to devices whose aliases are stored in the user's cache. A connection map later graphically shows the user what connections he or she has made.

Patent
09 Feb 1993
TL;DR: In this article, a cache locking scheme in a two-set associative instruction cache that utilizes a specially designed Least Recently Used (LRU) unit to effectively lock a first portion of the instruction cache to allow high speed and predictable execution time for time critical program code sections residing in the first portion while leaving another portion of instruction cache free to operate as an instruction cache for other, non-critical, code sections.
Abstract: An instruction locking apparatus and method for a cache memory allowing execution time predictability and high speed performance. The present invention implements a cache locking scheme in a two set associative instruction cache that utilizes a specially designed Least Recently Used (LRU) unit to effectively lock a first portion of the instruction cache to allow high speed and predictable execution time for time critical program code sections residing in the first portion while leaving another portion of the instruction cache free to operate as an instruction cache for other, non-critical, code sections. The present invention provides the above features in a system that is virtually transparent to the program code and does not require a variety of complex or specialized instructions or address coding methods. The present invention is flexible in that the two set associative instruction cache is transformed into what may be thought of as a static RAM in cache, and in addition, a direct map cache unit. Several different time critical code sections may be loaded and locked into the cache at different times.

Patent
20 Oct 1993
TL;DR: In this article, a cache memory system is proposed to dynamically assign segments of cache memory to correspond to segments of the mass storage device, accept data written by the host into portions of the assigned segments, and determine if the elapsed time since any modified data has been written to the cache memory exceeds a predetermined period of time.
Abstract: A method for operating a cache memory system which has a high speed cache memory and a mass storage device that operate in a highly efficient manner with a host device. The system operates to dynamically assign segments of the cache memory to correspond to segments of the mass storage device, accept data written by the host into portions of the assigned segments of the cache memory, and determine if the elapsed time since any modified data has been written to the cache memory exceeds a predetermined period of time, or if the number of modified segments to be written to the mass storage device exceeds a preset limit. If so, the cache memory system enables a transfer mechanism to cause modified data to be written from the cache memory to the mass storage device, based on the location of segments relative to a currently selected track of the mass storage device. Movement of updated data from the cache memory (solid state storage) to the mass storage device (which may be, for example, a magnetic disk) and of prefetched data from the mass storage to the cache memory is done on a timely, but unobtrusive, basis as a background task. A direct, private channel between the cache memory and the mass storage device prevents communications between these two media from conflicting with transmission of data between the host and the cache memory system. A set of microprocessors manages and oversees the data transmission and storage. Data integrity is maintained in the event of a power interruption via a battery assisted, automatic and intelligent shutdown procedure.
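
The destage trigger (age limit or modified-segment count) reduces to a small predicate; the parameter names and values below are assumptions for illustration.

```python
# Sketch of the destage trigger in the abstract: modified segments are flushed
# to the mass storage device when either the oldest modified data has aged past
# a time limit or the count of modified segments exceeds a preset limit.

def should_destage(modified_segments, now, max_age=30.0, max_modified=8):
    """modified_segments: dict of segment id -> time the segment was modified."""
    if not modified_segments:
        return False
    oldest = min(modified_segments.values())
    return (now - oldest) > max_age or len(modified_segments) > max_modified


print(should_destage({1: 0.0, 2: 5.0}, now=40.0))              # True: data too old
print(should_destage({i: 39.0 for i in range(9)}, now=40.0))   # True: too many segments
print(should_destage({1: 39.0}, now=40.0))                     # False
```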

Patent
22 Dec 1993
TL;DR: In this paper, a decoded instruction cache with multiple instructions per cache line is proposed, where the decode logic fills the cache line with instructions up to its limit during run time cache misses, enabling the processor to dispatch multiple instructions during one clock cycle.
Abstract: A general purpose computer system is equipped with apparatus for enabling a processor to provide efficient execution of multiple instructions per clock cycle. The major feature is a decoded instruction cache with multiple instructions per cache line. During run time cache misses, the decode logic fills the cache line with instructions up to its limit. During run time cache hits, the cache line enables the processor to dispatch multiple instructions during one clock cycle. This achieves high performance with simple, but still powerful, decode and dispatch logic. An important feature of the instruction cache is that it holds the target addresses for the next instructions. No separate address logic is needed to proceed in the program execution during cache hits. A conditional branch holds its alternative target address in a separate field. This enables the processor, to a large degree, to be independent of the conditional branch bottleneck.

Patent
01 Oct 1993
TL;DR: In this paper, a method of data communication between asynchronous processes of a computer system is disclosed in connection with a cache coherency system for a processor-cache used in a multi-master computer system in which bus arbitration signals either are not available to the processor cache, or are not exclusively relied on by the processorcache to assure validity of the data in the cache.
Abstract: A method of data communication between asynchronous processes of a computer system is disclosed in connection with a cache coherency system for a processor-cache used in a multi-master computer system in which bus arbitration signals either are not available to the processor-cache, or are not exclusively relied on by the processor-cache to assure validity of the data in the cache (e.g., a 386-bus compatible computer system using an external secondary cache in which bus arbitration signals are only connected to and used by the secondary cache controller). In an exemplary external-chip implementation, the cache coherency system (120) comprises two PLAs: a FLUSH module (122) and a WAVESHAPING module (124). The FLUSH module (a) receives selected bus cycle definition and control signals from a microprocessor (110), (b) detects FLUSH (cache invalidation) conditions, i.e., bus master synchronization events, and for each such FLUSH condition, (c) provides a FLUSH output signal. The WAVESHAPING module provides a corresponding CPU/FLUSH signal to the microprocessor with the appropriate set up and hold time. The exemplary bus master synchronization events, or FLUSH conditions, that cause cache invalidation are: (a) hardware generated interrupts, and (b) read or read/write accesses to I/O address space, except for those directed to a hard disk or an external coprocessor. If the bus architecture uses memory-mapped I/O, accesses to selected regions of memory-mapped I/O space could also be used. The cache coherency functionality could be implemented on-board the microprocessor.

Patent
28 May 1993
TL;DR: In this article, a cache system which includes prefetch pointer fields for identifying lines of memory to prefetch thereby minimizing the occurrence of cache misses is proposed, which takes advantage of the previous execution history of the processor and the locality of reference exhibited by the requested addresses.
Abstract: A cache system which includes prefetch pointer fields for identifying lines of memory to prefetch thereby minimizing the occurrence of cache misses. This cache structure and method for implementing the same takes advantage of the previous execution history of the processor and the locality of reference exhibited by the requested addresses. In particular, each cache line contains a prefetch pointer field which contains a pointer to a line in memory to be prefetched and placed in the cache. By prefetching specified lines of data with temporal locality to the lines of data containing the prefetch pointers the number of cache misses is minimized.
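
A hypothetical sketch of the prefetch-pointer idea: each cache line remembers the line that last followed it, and a hit triggers a prefetch of that successor. The update rule shown (remember the most recent successor) is an assumption, not necessarily the patent's policy.

```python
class PrefetchPointerCache:
    """Each cached line carries a prefetch pointer to the line that
    historically followed it; hits follow the pointer to issue a prefetch."""

    def __init__(self):
        self.lines = {}        # line address -> prefetch pointer (or None)
        self.prev = None
        self.prefetches = []

    def access(self, line):
        if line in self.lines and self.lines[line] is not None:
            self.prefetches.append(self.lines[line])   # follow the prefetch pointer
        else:
            self.lines.setdefault(line, None)
        if self.prev is not None:
            self.lines[self.prev] = line               # record the observed successor
        self.prev = line


c = PrefetchPointerCache()
for line in [10, 50, 90, 10, 50, 90]:                  # second pass reuses the pointers
    c.access(line)
print(c.prefetches)                                    # [50, 90, 10]
```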

Patent
09 Dec 1993
TL;DR: In this paper, a microprocessor is provided with an integral, two-level cache memory architecture in which, on a first level cache miss, a first level cache entry is discarded and stored in the replacement cache.
Abstract: A microprocessor is provided with an integral, two level cache memory architecture. The microprocessor includes a microprocessor core and a set associative first level cache both located on a common semiconductor die. A replacement cache, which is at least as large as approximately one half the size of the first level cache, is situated on the same semiconductor die and is coupled to the first level cache. In the event of a first level cache miss, a first level entry is discarded and stored in the replacement cache. When such a first level cache miss occurs, the replacement cache is checked to see if the desired entry is stored therein. If a replacement cache hit occurs, then the hit entry is forwarded to the first level cache and stored therein. If a cache miss occurs in both the first level cache and the replacement cache, then a main memory access is commenced to retrieve the desired entry. In that event, the desired entry retrieved from main memory is forwarded to the first level cache and stored therein. When a replacement cache entry is removed from the replacement cache by the replacement algorithm associated therewith, that entry is written back to main memory if that entry was modified. Otherwise the entry is discarded.

Proceedings ArticleDOI
01 Jun 1993
TL;DR: This paper examines the issues around managing a non-volatile disk cache using a detailed trace-driven simulation and observes that even a simple write-behind policy for the write cache is effective in reducing the total number of writes by over 50%.
Abstract: The I/O subsystem in a computer system is becoming the bottleneck as a result of recent dramatic improvements in processor speeds. Disk caches have been effective in closing this gap but the benefit is restricted to the read operations as the write I/Os are usually committed to disk to maintain consistency and to allow for crash recovery. As a result, write I/O traffic is becoming dominant and solutions to alleviate this problem are becoming increasingly important. A simple solution which can easily work with existing file systems is to use non-volatile disk caches together with a write-behind strategy. In this study, we look at the issues around managing such a cache using a detailed trace driven simulation. Traces from three different commercial sites are used in the analysis of various policies for managing the write cache. We observe that even a simple write-behind policy for the write cache is effective in reducing the total number of writes by over 50%. We further observe that the use of hysteresis in the policy to purge the write cache, with two thresholds, yields substantial improvement over a single threshold scheme. The inclusion of a mechanism to piggyback blocks from the write cache with read miss I/Os further reduces the number of writes to only about 15% of the original total number of write operations. We compare two piggybacking options and also study the impact of varying the write cache size. We briefly looked at the case of a single non-volatile disk cache to estimate the performance impact of statically partitioning the cache for reads and writes.
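
The two-threshold hysteresis purge mentioned in the abstract can be sketched as follows, with illustrative high/low water marks (the paper's thresholds and destage granularity are not given here):

```python
# Sketch: destaging from the non-volatile write cache starts when occupancy
# rises above a high-water mark and continues until it drops below a low-water
# mark, instead of toggling around a single threshold.

def purge_with_hysteresis(occupancy, high=0.8, low=0.5, step=0.1):
    """Return the occupancy after one purge episode, plus the blocks destaged."""
    destaged = 0
    if occupancy >= high:                 # trigger only at the high-water mark ...
        while occupancy > low:            # ... but drain down to the low-water mark
            occupancy = max(0.0, occupancy - step)
            destaged += 1
    return occupancy, destaged


print(purge_with_hysteresis(0.85))   # roughly (0.45, 4): drains well below the trigger
print(purge_with_hysteresis(0.70))   # (0.7, 0): below the high threshold, no purge
```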

Patent
Yifong Shih
01 Sep 1993
TL;DR: In this article, a distributed file system controls memory allocation between a global cache shared storage memory unit and a plurality of local cache memory units by calculating a variable global cache LRU stack update interval.
Abstract: A distributed file system controls memory allocation between a global cache shared storage memory unit and a plurality of local cache memory units by calculating a variable global cache LRU stack update interval. A new update interval is calculated using fresh system statistics at the end of each update interval. A stack update command is issued at the end of the update interval only if the expected minimum data residency time in the global cache shared memory is less than or equal to the expected average residency time in the local cache memory.

Patent
24 Mar 1993
TL;DR: In this paper, a method and apparatus are presented for allowing a processor to invalidate an individual line of its internal cache while in a non-clocked low power state, with the processor being powered up out of the reduced power consumption state to perform the invalidation.
Abstract: A method and apparatus for allowing a processor to invalidate an individual line of its internal cache while in a non-clocked low power state. The present invention includes circuitry for placing the processor in a reduced power consumption state. The present invention also includes circuitry for powering up the processor out of the reduced power consumption state to invalidate data in the cache in order to maintain cache coherency while in the reduced power consumption state.

Proceedings ArticleDOI
01 Jul 1993
TL;DR: Compared to a compiler-based coherence strategy, the Shared Regions approach still performs better than a compiler that can achieve 90% accuracy in allowing caching, as long as the regions are a few hundred bytes or larger, or they are re-used a few times in the cache.
Abstract: The effective management of caches is critical to the performance of applications on shared-memory multiprocessors. In this paper, we discuss a technique for software cache coherence that is based upon the integration of a program-level abstraction for shared data with software cache management. The program-level abstraction, called Shared Regions, explicitly relates synchronization objects with the data they protect. Cache coherence algorithms are presented which use the information provided by shared region primitives, and ensure that shared regions are always cacheable by the processors accessing them. Measurements and experiments of the Shared Region approach on a shared-memory multiprocessor are shown. Comparisons with other software based coherence strategies, including a user-controlled strategy and an operating system-based strategy, show that this approach is able to deliver better performance, with relatively low corresponding overhead and only a small increase in the programming effort. Compared to a compiler-based coherence strategy, the Shared Regions approach still performs better than a compiler that can achieve 90% accuracy in allowing caching, as long as the regions are a few hundred bytes or larger, or they are re-used a few times in the cache.

Patent
15 Dec 1993
TL;DR: In this paper, a split level cache memory system for a data processor includes a single chip integer unit, an array processor such as a floating point unit and an external main memory.
Abstract: A split level cache memory system for a data processor includes a single chip integer unit, an array processor such as a floating point unit, an external main memory and a split level cache. The split level cache includes an on-chip, fast local cache with low latency for use by the integer unit for loads and stores of integer and address data and an off-chip, pipelined global cache for storing arrays of data such as floating point data for use by the array processor and integer and address data for refilling the local cache. Coherence between the local cache and global cache is maintained by writing through to the global cache during integer stores. Local cache words are invalidated when data is written to the global cache during an array processor store.

Patent
27 Sep 1993
TL;DR: In this article, a multi-port central cache memory is used to queue all the incoming data to be stored in a DRAM and all the outgoing data being retrieved from the DRAM.
Abstract: A multimedia video processor chip for a personal computer employs a multi-port central cache memory to queue all the incoming data to be stored in a DRAM and all the outgoing data being retrieved from the DRAM. Such a cache memory is used in one of several modes in which the cache memory is partitioned by cache boundaries into different groups of storage areas. Each storage area of the cache is dedicated to storing data from a specific data source. The cache boundaries are chosen such that, for a given mode, the storage areas are optimized for worst case conditions for all data streams.

Patent
02 Aug 1993
TL;DR: In this article, a high performance shared cache is provided to support multiprocessor systems and allow maximum parallelism in accessing the cache by the processors, servicing one processor request in each machine cycle, reducing system response time and increasing system throughput.
Abstract: A high performance shared cache is provided to support multiprocessor systems and allow maximum parallelism in accessing the cache by the processors, servicing one processor request in each machine cycle, reducing system response time and increasing system throughput. The shared cache of the present invention uses the additional performance optimization techniques of pipelining cache operations (loads and stores) and burst-mode data accesses. By including built-in pipeline stages, the cache is enabled to service one request every machine cycle from any processing element. This contributes to reduction in the system response time as well as the throughput. With regard to the burst-mode data accesses, the widest possible data out of the cache can be stored to, and retrieved from, the cache by one cache access operation. One portion of the data is held in logic in the cache (on the chip), while another portion (corresponding to the system bus width) gets transferred to the requesting element (processor or memory) in one cycle. The held portion of the data can then be transferred in the following machine cycle.

Patent
S. Craig Nelson
17 Dec 1993
TL;DR: In this article, the authors propose an improved method and apparatus for selecting and replacing a block of a set of cache memory by assigning indices to the memory blocks of a given set of caches.
Abstract: The present invention is an improved method and apparatus for selecting and replacing a block of a set of cache memory. The present invention provides for the weighted random replacement of blocks of cache memory by assigning indices to the memory blocks of a given set of cache memory. One of the assigned indices is then randomly selected by the present invention. The memory block of the given set to which the randomly selected index is assigned is replaced. The indices are assigned such that one or more blocks of the given set of cache memory have a high probability of replacement, whereas the other blocks of the given set of cache memory have significantly lower probabilities of replacement.
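
The weighted-random selection can be sketched by expanding each block's assigned indices and drawing one uniformly; the weights used below are illustrative assumptions, not the patent's assignment.

```python
import random

# Sketch: indices are assigned to the blocks of a set so that some blocks own
# more indices (and thus a higher replacement probability); one index is then
# drawn uniformly at random and its block is replaced.

def pick_block_to_replace(weights):
    """weights[i] = number of indices assigned to block i of the set."""
    indices = [block for block, w in enumerate(weights) for _ in range(w)]
    return random.choice(indices)   # uniform over indices = weighted over blocks


# Block 3 holds 5 of the 8 indices, so it is replaced far more often.
counts = [0, 0, 0, 0]
for _ in range(10_000):
    counts[pick_block_to_replace([1, 1, 1, 5])] += 1
print(counts)   # roughly [1250, 1250, 1250, 6250]
```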