
Showing papers on "Cache coloring published in 1993"


Patent
08 Nov 1993
TL;DR: In this paper, the authors propose a shared high-speed cache management logic to meet the serialization and data coherency requirements of data systems when sharing the high speed cache as a store-multiple cache in a multi-system environment.
Abstract: A high-speed cache is shared by a plurality of independently-operating data systems in a multi-system data sharing complex. Each data system has access both to the high-speed cache and to lower-speed, upper-level storage for obtaining and storing data. Management logic in the shared high-speed cache is provided to meet the serialization and data coherency requirements of the data systems when sharing the high speed cache as a store-multiple cache in a multi-system environment.

478 citations


Book
01 Jan 1993
TL;DR: What is Cache Memory?
Abstract: What is Cache Memory? How are Caches Designed? Cache Memories and RISC Processors. Maintaining Coherency in Cached Systems. Cute Cache Tricks. Subject Index.

447 citations


Proceedings ArticleDOI
01 Jun 1993
TL;DR: The Wisconsin Wind Tunnel (WWT) as mentioned in this paper runs a parallel shared-memory program on a parallel computer (CM-5) and uses execution-driven, distributed, discrete-event simulation to accurately calculate program execution time.
Abstract: We have developed a new technique for evaluating cache coherent, shared-memory computers. The Wisconsin Wind Tunnel (WWT) runs a parallel shared-memory program on a parallel computer (CM-5) and uses execution-driven, distributed, discrete-event simulation to accurately calculate program execution time. WWT is a virtual prototype that exploits similarities between the system under design (the target) and an existing evaluation platform (the host). The host directly executes all target program instructions and memory references that hit in the target cache. WWT's shared memory uses the CM-5 memory's error-correcting code (ECC) as valid bits for a fine-grained extension of shared virtual memory. Only memory references that miss in the target cache trap to WWT, which simulates a cache-coherence protocol. WWT correctly interleaves target machine events and calculates target program execution time. WWT runs on parallel computers with greater speed and memory capacity than uniprocessors. WWT's simulation time decreases as target system size increases for fixed-size problems and holds roughly constant as the target system and problem scale.

304 citations


Proceedings ArticleDOI
01 May 1993
TL;DR: Tradeoffs on writes that miss in the cache are investigated, and a mixture of write-through and write-back caching, called write caching, is proposed, which places a small fully-associative cache behind a write-through cache.
Abstract: This paper investigates issues involving writes and caches. First, tradeoffs on writes that miss in the cache are investigated. In particular, whether the missed cache block is fetched on a write miss, whether the missed cache block is allocated in the cache, and whether the cache line is written before hit or miss is known are considered. Depending on the combination of these policies chosen, the entire cache miss rate can vary by a factor of two on some applications. The combination of no-fetch-on-write and write-allocate can provide better performance than cache line allocation instructions. Second, tradeoffs between write-through and write-back caching when writes hit in a cache are considered. A mixture of these two alternatives, called write caching, is proposed. Write caching places a small fully-associative cache behind a write-through cache. A write cache can eliminate almost as much write traffic as a write-back cache.

234 citations
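The write-caching idea above (a small fully-associative buffer behind a write-through cache that absorbs and coalesces write traffic) can be illustrated with a minimal sketch; the buffer size, LRU eviction detail, and interface below are illustrative assumptions, not taken from the paper.

```python
from collections import OrderedDict

class WriteCache:
    """Minimal sketch of a small fully-associative write cache placed
    behind a write-through cache: writes land in the buffer, and only
    evictions (or flushes) generate traffic to the next memory level."""

    def __init__(self, num_entries=4):
        self.entries = OrderedDict()   # block address -> dirty data
        self.num_entries = num_entries
        self.traffic = 0               # blocks actually written downstream

    def write(self, block_addr, data):
        if block_addr in self.entries:          # coalesce repeated writes
            self.entries.move_to_end(block_addr)
        elif len(self.entries) >= self.num_entries:
            self.entries.popitem(last=False)    # evict LRU entry downstream
            self.traffic += 1
        self.entries[block_addr] = data

    def flush(self):
        self.traffic += len(self.entries)
        self.entries.clear()

# Repeated writes to a few hot blocks mostly coalesce in the write cache,
# so downstream traffic is far lower than one write per store.
wc = WriteCache(num_entries=4)
for i in range(100):
    wc.write(i % 3, i)
wc.flush()
print(wc.traffic)   # 3: only the three distinct blocks reach the next level
```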


Patent
23 Mar 1993
TL;DR: Cache server nodes play a key role in the LOCATE process and can prevent redundant network-wide broadcasts of LOCATE requests: when an origin cache server node receives a request from a served node, it searches its local directories first, then forwards the request to alternate cache server nodes if necessary.
Abstract: A computer network in which resources are dynamically located through the use of LOCATE requests includes multiple cache server nodes, network nodes which have an additional obligation to build and maintain large caches of directory entries. Cache server nodes play a key role in the LOCATE process and can prevent redundant network-wide broadcasts of LOCATE requests. When an origin cache server node receives a request from a served node, the cache server node searches its local directories first, then forwards the request to alternate cache server nodes if necessary. If the necessary information is not found locally or in the alternate cache server nodes, the LOCATE request is then broadcast to all network nodes in the network. If the broadcast results are negative, the request is forwarded to selected gateway nodes to permit the search to continue in adjacent networks.

199 citations
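As a rough illustration of the cascading search order the patent describes (local directories, then alternate cache servers, then a network-wide broadcast, then gateway nodes), here is a hypothetical sketch; the function signature and node representation are invented for illustration.

```python
def locate(resource, local_directory, alternate_servers, network_nodes, gateway_nodes):
    """Hypothetical sketch of the cascading LOCATE search order described
    in the patent: each stage is tried only if the previous one fails."""
    # 1. The origin cache server checks its own directories first.
    if resource in local_directory:
        return ("local", local_directory[resource])
    # 2. Forward the request to alternate cache server nodes.
    for server in alternate_servers:
        if resource in server:
            return ("alternate", server[resource])
    # 3. Broadcast to all network nodes only if the caches cannot answer.
    for node in network_nodes:
        if resource in node:
            return ("broadcast", node[resource])
    # 4. Finally forward to gateway nodes so adjacent networks can search.
    for gateway in gateway_nodes:
        if resource in gateway:
            return ("gateway", gateway[resource])
    return ("not found", None)

print(locate("printer42", {}, [{"printer42": "nodeB"}], [], []))
# ('alternate', 'nodeB') -- answered from a cache, no broadcast needed
```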


Patent
24 May 1993
TL;DR: In this paper, the authors propose a cache coherency protocol for multi-processor systems which provides for read/write, read-only and transitional data states and for an indication of these states to be stored in a memory directory in main memory.
Abstract: A cache coherency protocol for a multi-processor system which provides for read/write, read-only and transitional data states and for an indication of these states to be stored in a memory directory in main memory. The transitional data state occurs when a processor requests from main memory a data block in another processor's cache and the request is pending completion. All subsequent read requests for the data block during the pendency of the first request are inhibited until completion of the first request. Also provided in the memory directory for each data block is a field for identifying the processor which owns the data block in question. Data block ownership information is used to determine where requested owned data is located.

180 citations
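A hedged sketch of the directory bookkeeping the patent describes, with read/write, read-only, and transitional states plus an owner field per block; the state handling below is a simplified guess at one way such a directory could behave, not the patent's exact protocol.

```python
READ_WRITE, READ_ONLY, TRANSITIONAL = "RW", "RO", "TRANS"

class DirectoryEntry:
    """One main-memory directory entry per data block: its state, the
    processor that owns the block, and readers stalled while the block
    is in the transitional state."""
    def __init__(self):
        self.state = READ_ONLY
        self.owner = None
        self.pending = []

class Directory:
    def __init__(self, num_blocks):
        self.entries = [DirectoryEntry() for _ in range(num_blocks)]

    def read_request(self, block, requester):
        e = self.entries[block]
        if e.state == TRANSITIONAL:
            # A fetch from another processor's cache is still pending:
            # all subsequent reads are inhibited until it completes.
            e.pending.append(requester)
            return "stalled"
        if e.state == READ_WRITE and e.owner != requester:
            # The block lives in the owner's cache; mark it transitional
            # while it is fetched back on behalf of the requester.
            e.state = TRANSITIONAL
            return "fetch from owner P%d" % e.owner
        return "data from memory"

    def fetch_complete(self, block, new_owner):
        e = self.entries[block]
        e.state, e.owner = READ_WRITE, new_owner
        stalled, e.pending = e.pending, []
        return stalled   # these readers may now be retried

d = Directory(num_blocks=8)
d.entries[3].state, d.entries[3].owner = READ_WRITE, 1
print(d.read_request(3, requester=0))    # fetch from owner P1
print(d.read_request(3, requester=2))    # stalled (until the fetch completes)
print(d.fetch_complete(3, new_owner=0))  # [2]
```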


Patent
03 Jun 1993
TL;DR: In this article, a cache management system and method coupled to at least one host and one data storage device is presented, where a cache indexer maintains a current index (25) of data elements which are stored in cache memory.
Abstract: A cache management system and method monitors and controls the contents of cache memory (12) coupled to at least one host (22a) and at least one data storage device (18a). A cache indexer (16) maintains a current index (25) of data elements which are stored in cache memory (12). A sequential data access indicator (30), responsive to the cache index (16) and to a user selectable sequential data access threshold, determines that a sequential data access is in progress for a given process and provides an indication of the same. The system and method allocate a micro-cache memory (12) to any process performing a sequential data access. In response to the indication of a sequential data access in progress and to a user selectable maximum number of data elements to be prefetched, a data retrieval requestor requests retrieval of up to the selected maximum number of data elements from a data storage device (18b). A user selectable number of sequential data elements determines when previously used micro-cache memory locations will be overwritten. A method of dynamically monitoring and adjusting cache management parameters is also disclosed.

179 citations
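The sequential-access detection and bounded prefetch can be sketched roughly as follows; the threshold, prefetch limit, and per-process bookkeeping are simplified stand-ins for the patent's user-selectable parameters and micro-cache allocation.

```python
class SequentialPrefetcher:
    """Rough sketch: detect a sequential access pattern per process and,
    once a user-selectable threshold is reached, prefetch up to a
    user-selectable maximum number of data elements."""

    def __init__(self, sequential_threshold=3, max_prefetch=8):
        self.sequential_threshold = sequential_threshold
        self.max_prefetch = max_prefetch
        self.last_addr = {}    # process id -> last address accessed
        self.run_length = {}   # process id -> length of current sequential run

    def access(self, pid, addr):
        if self.last_addr.get(pid) is not None and addr == self.last_addr[pid] + 1:
            self.run_length[pid] = self.run_length.get(pid, 1) + 1
        else:
            self.run_length[pid] = 1
        self.last_addr[pid] = addr
        if self.run_length[pid] >= self.sequential_threshold:
            # Sequential access in progress: request the next elements.
            return list(range(addr + 1, addr + 1 + self.max_prefetch))
        return []

p = SequentialPrefetcher(sequential_threshold=3, max_prefetch=4)
for a in (10, 11, 12):
    plan = p.access(pid=1, addr=a)
print(plan)   # [13, 14, 15, 16] once three sequential accesses are seen
```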


Proceedings ArticleDOI
01 Jun 1993
TL;DR: The OPT model is proposed that uses cache simulation under optimal (OPT) replacement to obtain a finer and more accurate characterization of misses than the three Cs model, and three new techniques for optimal cache simulation are presented.
Abstract: Cache miss characterization models such as the three Cs model are useful in developing schemes to reduce cache misses and their penalty. In this paper we propose the OPT model that uses cache simulation under optimal (OPT) replacement to obtain a finer and more accurate characterization of misses than the three Cs model. However, current methods for optimal cache simulation are slow and difficult to use. We present three new techniques for optimal cache simulation. First, we propose a limited lookahead strategy with error fixing, which allows one pass simulation of multiple optimal caches. Second, we propose a scheme to group entries in the OPT stack, which allows efficient tree based fully-associative cache simulation under OPT. Third, we propose a scheme for exploiting partial inclusion in set-associative cache simulation under OPT. Simulators based on these algorithms were used to obtain cache miss characterizations using the OPT model for nine SPEC benchmarks. The results indicate that miss ratios under OPT are substantially lower than those under LRU replacement, by up to 70% in fully-associative caches, and up to 32% in two-way set-associative caches.

168 citations
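The optimal (OPT) replacement the paper simulates is Belady's rule: evict the block whose next reference lies furthest in the future. The sketch below implements that rule directly for a single fully-associative cache; it is deliberately naive, whereas the paper's contribution is making such simulation fast and able to cover multiple caches in one pass.

```python
def opt_misses(trace, cache_size):
    """Count misses for a fully-associative cache of `cache_size` blocks
    under optimal replacement: on a miss with a full cache, evict the
    resident block reused furthest in the future (or never again)."""
    cache, misses = set(), 0
    for i, block in enumerate(trace):
        if block in cache:
            continue
        misses += 1
        if len(cache) >= cache_size:
            def next_use(b):
                for j in range(i + 1, len(trace)):
                    if trace[j] == b:
                        return j
                return float("inf")      # never used again: ideal victim
            cache.remove(max(cache, key=next_use))
        cache.add(block)
    return misses

trace = ["a", "b", "c", "a", "b", "d", "a", "b", "c", "d"]
print(opt_misses(trace, cache_size=3))   # 5 misses under OPT on this trace
```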


Patent
19 Apr 1993
TL;DR: In this paper, a cache memory replacement scheme with a locking feature is provided, where the locking bits associated with each line in the cache are supplied in the tag table, set and reset by the executing application program/process, and utilized in conjunction with cache replacement bits by the cache controller to determine the lines in the cache to replace.
Abstract: In a memory system having a main memory and a faster cache memory, a cache memory replacement scheme with a locking feature is provided. Locking bits associated with each line in the cache are supplied in the tag table. These locking bits are preferably set and reset by the executing application program/process and are utilized in conjunction with cache replacement bits by the cache controller to determine the lines in the cache to replace. The lock bits and replacement bits for a cache line are "ORed" to create a composite bit for the cache line. If the composite bit is set, the cache line is not removed from the cache. When all composite bits are set, which would otherwise result in deadlock, all replacement bits are cleared. One cache line is always maintained as non-lockable. The locking bits "lock" the line of data in the cache until such time as the process resets the lock bit. Because the process controls the state of the lock bits, the intelligence and knowledge the process has regarding the frequency of use of certain memory locations can be exploited to provide a more efficient cache.

153 citations
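A small sketch of the composite-bit rule described above: the lock bit and replacement bit of each line are ORed, a line with its composite bit set is never replaced, and if every composite bit is set the replacement bits are cleared to avoid deadlock. The set size and return convention are illustrative.

```python
def choose_victim(lock_bits, replacement_bits):
    """Sketch of the patent's rule: a line whose composite (lock OR
    replacement) bit is set is not replaced; if every composite bit is
    set (deadlock), the replacement bits are cleared and the choice is
    retried.  One line is assumed to be kept non-lockable so a victim
    always exists."""
    composite = [l | r for l, r in zip(lock_bits, replacement_bits)]
    if all(composite):
        replacement_bits = [0] * len(replacement_bits)      # deadlock escape
        composite = list(lock_bits)
    for line, bit in enumerate(composite):
        if not bit:
            return line, replacement_bits   # first replaceable line
    raise RuntimeError("no non-lockable line; the patent keeps one unlocked")

# Lines 0 and 1 are locked, line 2 was recently used, line 3 is free.
print(choose_victim([1, 1, 0, 0], [0, 0, 1, 0]))   # (3, [0, 0, 1, 0])
# All composite bits set: replacement bits are cleared, line 2 is chosen.
print(choose_victim([1, 1, 0, 0], [1, 1, 1, 1]))   # (2, [0, 0, 0, 0])
```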


Journal ArticleDOI
TL;DR: This paper surveys current cache coherence mechanisms and identifies several issues critical to their design; hybrid strategies are also presented that can enhance the performance of the multiprocessor memory system by combining several different coherence mechanisms into a single system.
Abstract: Private data caches have not been as effective in reducing the average memory delay in multiprocessors as in uniprocessors due to data spreading among the processors, and due to the cache coherence problem. A wide variety of mechanisms have been proposed for maintaining cache coherence in large-scale shared memory multiprocessors making it difficult to compare their performance and implementation implications. To help the computer architect understand some of the trade-offs involved, this paper surveys current cache coherence mechanisms, and identifies several issues critical to their design. These design issues include: 1) the coherence detection strategy, through which possibly incoherent memory accesses are detected either statically at compile-time, or dynamically at run-time; 2) the coherence enforcement strategy, such as updating or invalidating, that is used to ensure that stale cache entries are never referenced by a processor; 3) how the precision of block sharing information can be changed to trade-off the implementation cost and the performance of the coherence mechanism; and 4) how the cache block size affects the performance of the memory system. Trace-driven simulations are used to compare the performance and implementation impacts of these different issues. In addition, hybrid strategies are presented that can enhance the performance of the multiprocessor memory system by combining several different coherence mechanisms into a single system.

123 citations


Patent
23 Dec 1993
TL;DR: In this paper, the authors propose a method for organizing the disk array into segments and dividing the cache memory into groups in order of least recently used memory locations, and then determining metrics that permit the disk array controller to identify the cache memory locations having the most dirty blocks by segment and group.
Abstract: A controller for a disk array with parity and sparing includes a non-volatile cache memory and optimizes the destaging process for blocks from the cache memory to both maximize the cache hit ratio and minimize disk utilization. The invention provides a method for organizing the disk array into segments and dividing the cache memory into groups in order of least recently used memory locations and then determining metrics that permit the disk array controller to identify the cache memory locations having the most dirty blocks by segment and group and to identify the utilization rates of the disks. These characteristics are considered to determine when, what, and how to destage. For example, in terms of maximizing the cache hit ratio, when the percentage of dirty blocks in a particular group of the cache memory locations reaches a predetermined level, destaging is begun. The destaging operation continues until the percentage of dirty blocks decreases to a predetermined level. In terms of minimizing disk utilization, all of the dirty blocks in a segment having the most dirty blocks in a group are destaged.
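The threshold-driven destaging decision can be sketched roughly as below; the start/stop percentages, the assumed group capacity, and the group layout are placeholders for the patent's "predetermined levels", not values taken from it.

```python
def plan_destage(groups, start_pct=50.0, stop_pct=25.0):
    """Rough sketch: for each LRU-ordered group of cache locations, begin
    destaging when the fraction of dirty blocks reaches start_pct and, once
    started, continue until it would fall to stop_pct; within a group the
    segment holding the most dirty blocks is destaged first."""
    plan = []
    for name, segments in groups.items():           # segment -> dirty block count
        total_dirty = sum(segments.values())
        total_blocks = 100                           # assumed group capacity
        if 100.0 * total_dirty / total_blocks < start_pct:
            continue                                 # not dirty enough yet
        target = stop_pct / 100.0 * total_blocks
        # Destage whole segments, dirtiest first, until below the stop level.
        for seg, dirty in sorted(segments.items(), key=lambda kv: -kv[1]):
            if total_dirty <= target:
                break
            plan.append((name, seg, dirty))
            total_dirty -= dirty
    return plan

groups = {"lru_group_0": {"seg3": 40, "seg7": 15, "seg1": 5}}
print(plan_destage(groups))   # [('lru_group_0', 'seg3', 40)] brings dirty blocks to 20%
```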

Patent
27 May 1993
TL;DR: In this article, the caches align on a "way" basis by their respective cache controllers communicating with each other which blocks of data they are replacing and which of their cache ways are being filled with data.
Abstract: A method for achieving multilevel inclusion in a computer system with first and second level caches. The caches align on a "way" basis by their respective cache controllers communicating with each other which blocks of data they are replacing and which of their cache ways are being filled with data. On first and second level cache read misses the first level cache controller provides way information to the second level cache controller to allow received data to be placed in the same way. On first level cache read misses and second level cache read hits, the second level cache controller provides way information to the first level cache controller, which places data in the indicated way. On processor writes the first level cache controller caches the writes and provides the way information to the second level cache controller which uses the way information to select the proper way for data storage. An inclusion bit is set on data in the second level cache that is duplicated in the first level cache. On a second level cache snoop hit, the second level cache controller checks the respective inclusion bit to determine if a copy of this data also resides in the first level cache. The first level cache controller is directed to snoop the bus only if the respective inclusion bit is set.
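A hedged sketch of the inclusion-bit filtering described above: the second level cache records, per line, whether a copy also resides in the first level cache, and directs the first level cache to snoop only when that bit is set. The cache geometry and method names are invented for illustration.

```python
class L2Controller:
    """Sketch of the second-level behaviour in the patent: remember, per
    L2 line, whether a copy also sits in the L1 cache (the inclusion
    bit), and only ask L1 to snoop when that bit is set."""

    def __init__(self, num_sets, num_ways):
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        self.inclusion = [[False] * num_ways for _ in range(num_sets)]
        self.num_sets = num_sets

    def fill(self, addr, way, also_in_l1):
        s = addr % self.num_sets
        self.tags[s][way] = addr
        self.inclusion[s][way] = also_in_l1   # way chosen to match L1's way

    def snoop(self, addr):
        s = addr % self.num_sets
        for way, tag in enumerate(self.tags[s]):
            if tag == addr:
                # Direct L1 to snoop only if the inclusion bit says it must.
                return "snoop L1" if self.inclusion[s][way] else "L2 only"
        return "miss"

l2 = L2Controller(num_sets=4, num_ways=2)
l2.fill(addr=12, way=1, also_in_l1=True)    # read miss filled into the same way in both caches
l2.fill(addr=9,  way=0, also_in_l1=False)   # line present only in L2
print(l2.snoop(12), "/", l2.snoop(9))       # snoop L1 / L2 only
```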

Patent
25 Jan 1993
TL;DR: In this paper, the authors propose an extension to the basic stream buffer, called multi-way stream buffers (62), which is useful for prefetching along multiple intertwined data reference streams.
Abstract: A memory system (10) utilizes miss caching by incorporating a small fully-associative miss cache (42) between a cache (18 or 20) and second-level cache (26). Misses in the cache (18 or 20) that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache (42). Victim caching is an improvement to miss caching that loads a small, fully associative cache (52) with the victim of a miss and not the requested line. Small victim caches (52) of 1 to 4 entries are even more effective at removing conflict misses than miss caching. Stream buffers (62) prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer (62) and not in the cache (18 or 20). Stream buffers (62) are useful in removing capacity and compulsory cache misses, as well as some instruction cache misses. Stream buffers (62) are more effective than previously investigated prefetch techniques when the next slower level in the memory hierarchy is pipelined. An extension to the basic stream buffer, called multi-way stream buffers (62), is useful for prefetching along multiple intertwined data reference streams.
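Victim caching, as described above, loads a small fully-associative buffer with the block evicted from the cache rather than the requested line, which removes conflict misses that ping-pong between addresses mapping to the same line. A minimal sketch follows, with sizes chosen only for illustration.

```python
from collections import OrderedDict

class DirectMappedWithVictim:
    """Minimal sketch: a direct-mapped cache backed by a tiny
    fully-associative victim cache holding the most recent victims, so
    conflict misses between two addresses sharing a line become hits."""

    def __init__(self, num_lines=8, victim_entries=2):
        self.lines = [None] * num_lines
        self.victims = OrderedDict()
        self.victim_entries = victim_entries
        self.hits = self.misses = 0

    def access(self, addr):
        idx = addr % len(self.lines)
        if self.lines[idx] == addr:
            self.hits += 1
            return
        if addr in self.victims:                       # one-cycle-penalty hit
            self.hits += 1
            self.victims.pop(addr)
        else:
            self.misses += 1
        victim, self.lines[idx] = self.lines[idx], addr
        if victim is not None:                         # store the victim, not the fetched line
            self.victims[victim] = True
            if len(self.victims) > self.victim_entries:
                self.victims.popitem(last=False)

# Two addresses mapping to the same direct-mapped line would otherwise
# conflict-miss on every access; the victim cache absorbs the ping-pong.
c = DirectMappedWithVictim()
for _ in range(10):
    c.access(0)
    c.access(8)
print(c.hits, c.misses)   # 18 2: only the two cold misses remain
```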

Proceedings ArticleDOI
01 Jun 1993
TL;DR: It is shown how the design of a memory allocator can significantly affect the reference locality for various applications, and measurements suggest an allocator design that is both very fast and has good locality of reference.
Abstract: The allocation and disposal of memory is a ubiquitous operation in most programs. Rarely do programmers concern themselves with details of memory allocators; most assume that memory allocators provided by the system perform well. This paper presents a performance evaluation of the reference locality of dynamic storage allocation algorithms based on trace-driven simulation of five large allocation-intensive C programs. In this paper, we show how the design of a memory allocator can significantly affect the reference locality for various applications. Our measurements show that poor locality in sequential-fit allocation algorithms reduces program performance, by increasing both paging and cache miss rates. While increased paging can be debilitating on any architecture, cache miss rates are also important for modern computer architectures. We show that algorithms attempting to be space-efficient by coalescing adjacent free objects show poor reference locality, possibly negating the benefits of space efficiency. At the other extreme, algorithms can expend considerable effort to increase reference locality yet gain little in total execution performance. Our measurements suggest an allocator design that is both very fast and has good locality of reference.

Patent
10 May 1993
TL;DR: A two-level cache memory system for use in a computer system including two primary cache memories, one for storing instructions and the other for storing data, is described in this article.
Abstract: A two-level cache memory system for use in a computer system including two primary cache memories, one for storing instructions and one for storing data. The system also includes a secondary cache memory for storing both instructions and data. The primary and secondary caches each employ their own separate tag directory. The primary caches use a virtual addressing scheme employing both virtual tags and virtual addresses. The secondary cache employs a hybrid addressing scheme which uses virtual tags and partial physical addresses. The primary and secondary caches operate in parallel unless the larger and slower secondary cache is busy performing a previous operation. Only if a "miss" is encountered in both the primary and secondary caches does the system processor access the main memory.

Patent
John G. Aschoff1, Jeffrey A. Berger1, David Alan Burton1, Bruce McNutt1, Stanley C. Kurtz1 
19 May 1993
TL;DR: In this article, the authors allocate read cache space among bands of DASD cylinders rather than to data sets or processes as a function of a weighted average hit ratio to the counterpart cache space.
Abstract: Read cache space is dynamically allocated among bands of DASD cylinders, rather than to data sets or processes, as a function of a weighted average hit ratio for the counterpart cache space. When the hit ratio of a band falls below a predetermined threshold, the band is disabled for a defined interval, as measured by cache accesses, and then re-bound to cache space again.
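A speculative sketch of the band enable/disable cycle: each band keeps a weighted average hit ratio, a band falling below the threshold is disabled for an interval measured in cache accesses, and is then re-bound. The exponential weighting, threshold, and interval length are all assumptions for illustration.

```python
class BandAllocator:
    """Sketch: each band of DASD cylinders keeps an exponentially weighted
    hit ratio; when it drops below a threshold the band is disabled (no
    cache space) for a fixed number of cache accesses, then re-bound."""

    def __init__(self, threshold=0.2, disable_interval=1000, weight=0.1):
        self.threshold = threshold
        self.disable_interval = disable_interval
        self.weight = weight
        self.hit_ratio = {}       # band -> weighted average hit ratio
        self.disabled_until = {}  # band -> access count at which it re-binds
        self.accesses = 0

    def record(self, band, hit):
        self.accesses += 1
        old = self.hit_ratio.get(band, 1.0)
        self.hit_ratio[band] = (1 - self.weight) * old + self.weight * (1.0 if hit else 0.0)
        if band in self.disabled_until:
            if self.accesses >= self.disabled_until[band]:
                del self.disabled_until[band]           # re-bind to cache space
            return
        if self.hit_ratio[band] < self.threshold:
            self.disabled_until[band] = self.accesses + self.disable_interval

    def is_cached(self, band):
        return band not in self.disabled_until

alloc = BandAllocator()
for _ in range(50):                     # a band that never hits gets disabled
    alloc.record("band_17", hit=False)
print(alloc.is_cached("band_17"))       # False until 1000 more accesses pass
```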

Patent
14 Oct 1993
TL;DR: In this paper, the authors describe a quick-choice cache into which are collected the names and aliases of networked devices or services that are expected to be most routinely used by a particular user.
Abstract: A personal computer or workstation on a network includes a quick-choice cache into which are collected the names and aliases of networked devices or services that are expected to be most routinely used by a particular user. The cache is initialized to contain the names and aliases of devices within a network zone assigned to the workstation. This collection of names/aliases is expanded each time the user makes a connection to a device not previously listed. The cache drives a graphic user interface (GUI) that shows the user what service categories are available within the cache, and then when a service category is selected, what specific devices are included within the cache under that service category. The GUI permits quick logical connection to devices whose aliases are stored in the user's cache. A connection map later graphically shows the user what connections he or she has made.

Patent
09 Feb 1993
TL;DR: In this article, a cache locking scheme is implemented in a two-way set-associative instruction cache that utilizes a specially designed Least Recently Used (LRU) unit to effectively lock a first portion of the instruction cache, allowing high speed and predictable execution time for time-critical program code sections residing in the first portion while leaving the other portion free to operate as an instruction cache for other, non-critical, code sections.
Abstract: An instruction locking apparatus and method for a cache memory allowing execution time predictability and high speed performance. The present invention implements a cache locking scheme in a two-way set-associative instruction cache that utilizes a specially designed Least Recently Used (LRU) unit to effectively lock a first portion of the instruction cache, allowing high speed and predictable execution time for time-critical program code sections residing in the first portion while leaving another portion of the instruction cache free to operate as an instruction cache for other, non-critical, code sections. The present invention provides the above features in a system that is virtually transparent to the program code and does not require a variety of complex or specialized instructions or address coding methods. The present invention is flexible in that the two-way set-associative instruction cache is transformed into what may be thought of as a static RAM in cache plus a direct-mapped cache unit. Several different time-critical code sections may be loaded and locked into the cache at different times.
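A rough sketch of the locking idea: if the LRU decision in a two-way set-associative cache is pinned so that replacements only ever fall on way 1, way 0 becomes locked storage with predictable access time while way 1 continues to behave as a direct-mapped cache. The preload interface and flag names are illustrative.

```python
class TwoWayLockableICache:
    """Sketch: a 2-way set-associative instruction cache whose LRU unit
    can be forced so that replacements only ever go to way 1, leaving
    way 0 locked (predictable, never evicted) for time-critical code."""

    def __init__(self, num_sets=4):
        self.ways = [[None] * num_sets, [None] * num_sets]
        self.lru = [0] * num_sets      # which way to replace next, per set
        self.locked = False            # when True, the LRU unit is pinned to way 1

    def lock_way0(self, addresses):
        for addr in addresses:         # preload critical code into way 0
            self.ways[0][addr % len(self.lru)] = addr
        self.locked = True

    def fetch(self, addr):
        s = addr % len(self.lru)
        for w in (0, 1):
            if self.ways[w][s] == addr:
                if not self.locked:
                    self.lru[s] = 1 - w
                return "hit"
        victim_way = 1 if self.locked else self.lru[s]   # locked: only way 1 is replaced
        self.ways[victim_way][s] = addr
        if not self.locked:
            self.lru[s] = 1 - victim_way
        return "miss"

c = TwoWayLockableICache()
c.lock_way0([0, 1, 2, 3])          # critical loop locked into way 0
for a in (8, 12, 16, 0, 1, 2, 3):  # other code only ever displaces way 1
    c.fetch(a)
print([c.ways[0][i] for i in range(4)])   # [0, 1, 2, 3] still resident
```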

Patent
20 Oct 1993
TL;DR: In this article, a cache memory system is proposed to dynamically assign segments of cache memory to correspond to segments of the mass storage device, accept data written by the host into portions of the assigned segments, and determine if the elapsed time since any modified data has been written to the cache memory exceeds a predetermined period of time.
Abstract: A method for operating a cache memory system which has a high speed cache memory and a mass storage device that operate in a highly efficient manner with a host device. The system operates to dynamically assign segments of the cache memory to correspond to segments of the mass storage device, accept data written by the host into portions of the assigned segments of the cache memory, and determine if the elapsed time since any modified data has been written to the cache memory exceeds a predetermined period of time, or if the number of modified segments to be written to the mass storage device exceeds a preset limit. If so, the cache memory system enables a transfer mechanism to cause modified data to be written from the cache memory to the mass storage device, based on the location of segments relative to a currently selected track of the mass storage device. Movement of updated data from the cache memory (solid state storage) to the mass storage device (which may be, for example, a magnetic disk) and of prefetched data from the mass storage to the cache memory is done on a timely, but unobtrusive, basis as a background task. A direct, private channel between the cache memory and the mass storage device prevents communications between these two media from conflicting with transmission of data between the host and the cache memory system. A set of microprocessors manages and oversees the data transmission and storage. Data integrity is maintained in the event of a power interruption via a battery assisted, automatic and intelligent shutdown procedure.

Patent
22 Dec 1993
TL;DR: In this paper, a decoded instruction cache with multiple instructions per cache line is proposed, where the decode logic fills the cache line with instructions up to its limit during run time cache misses, enabling the processor to dispatch multiple instructions during one clock cycle.
Abstract: A general purpose computer system is equipped with apparatus for enabling a processor to provide efficient execution of multiple instructions per clock cycle. The major feature is a decoded instruction cache with multiple instructions per cache line. During run time cache misses, the decode logic fills the cache line with instructions up to its limit. During run time cache hits, the cache line enables the processor to dispatch multiple instructions during one clock cycle. This achieves high performance with simple, but still powerful, decode and dispatch logic. An important feature of the instruction cache is that it holds the target addresses for the next instructions, so no separate address logic is needed to proceed in the program execution during cache hits. A conditional branch holds its alternative target address in a separate field. This enables the processor, to a large degree, to be independent of the conditional branch bottleneck.

Patent
28 May 1993
TL;DR: In this article, a cache system is proposed which includes prefetch pointer fields for identifying lines of memory to prefetch, thereby minimizing the occurrence of cache misses; it takes advantage of the previous execution history of the processor and the locality of reference exhibited by the requested addresses.
Abstract: A cache system which includes prefetch pointer fields for identifying lines of memory to prefetch thereby minimizing the occurrence of cache misses. This cache structure and method for implementing the same takes advantage of the previous execution history of the processor and the locality of reference exhibited by the requested addresses. In particular, each cache line contains a prefetch pointer field which contains a pointer to a line in memory to be prefetched and placed in the cache. By prefetching specified lines of data with temporal locality to the lines of data containing the prefetch pointers the number of cache misses is minimized.
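A minimal sketch of per-line prefetch pointers: when a miss on line B follows an access to line A, A's pointer field is set to B so that a later access to A prefetches B. The learning rule here is an illustrative guess at how such a field might be maintained from execution history, not the patent's exact method.

```python
class PointerPrefetchCache:
    """Sketch: each line carries a prefetch-pointer field naming another
    memory line to pull in with it; pointers are learned from which miss
    followed which access on a previous run."""

    def __init__(self):
        self.resident = set()   # lines currently in the cache
        self.pointers = {}      # line -> prefetch pointer field
        self.last_access = None
        self.prefetches = 0

    def access(self, line):
        hit = line in self.resident
        if not hit:
            self.resident.add(line)
            if self.last_access is not None:
                # Learn: the previously accessed line should prefetch this one.
                self.pointers[self.last_access] = line
        pointer = self.pointers.get(line)
        if pointer is not None and pointer not in self.resident:
            self.resident.add(pointer)          # prefetch the pointed-to line
            self.prefetches += 1
        self.last_access = line
        return "hit" if hit else "miss"

c = PointerPrefetchCache()
first = [c.access(x) for x in ("A", "B", "C")]     # cold misses; pointers learned
c.resident -= {"B", "C"}                           # pretend B and C were evicted
second = [c.access(x) for x in ("A", "B", "C")]
print(first, second)   # ['miss', 'miss', 'miss'] ['hit', 'hit', 'hit']
```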

Patent
09 Dec 1993
TL;DR: In this paper, a microprocessor is provided with an integral, two-level cache memory architecture in which an entry discarded from the first level cache on a miss is stored in a replacement cache.
Abstract: A microprocessor is provided with an integral, two level cache memory architecture. The microprocessor includes a microprocessor core and a set associative first level cache both located on a common semiconductor die. A replacement cache, which is at least as large as approximately one half the size of the first level cache, is situated on the same semiconductor die and is coupled to the first level cache. In the event of a first level cache miss, a first level entry is discarded and stored in the replacement cache. When such a first level cache miss occurs, the replacement cache is checked to see if the desired entry is stored therein. If a replacement cache hit occurs, then the hit entry is forwarded to the first level cache and stored therein. If a cache miss occurs in both the first level cache and the replacement cache, then a main memory access is commenced to retrieve the desired entry. In that event, the desired entry retrieved from main memory is forwarded to the first level cache and stored therein. When a replacement cache entry is removed from the replacement cache by the replacement algorithm associated therewith, that entry is written back to main memory if that entry was modified. Otherwise the entry is discarded.

Patent
09 Mar 1993
TL;DR: In this article, a computer system includes first and second processors each having a virtual cache memory, a main memory, a bus coupled to the main memory and the processors, and apparatus for addressing the cache associated with each processor so that each virtual cache stores data from the same physical location in main memory at the same index position.
Abstract: A computer system includes first and second processors each having a virtual cache memory, a main memory, a bus coupled to the main memory and the processors, and apparatus for addressing the cache associated with each processor which provides that each virtual cache stores data from the same physical location in main memory at the same index position. A memory management unit (MMU) is coupled to each processor such that addressing information is transferred to each memory management unit to indicate the virtual address of data to be written to the virtual cache. The memory management unit generates a physical address from the virtual address and determines whether any other virtual cache includes data from the same physical memory positions.

Patent
Takashi Nakayama1
30 Sep 1993
TL;DR: In this paper, a microprocessor includes a CPU, a main memory and primary and secondary cache memories of the direct-mapped type, all implemented on the same LSI chip.
Abstract: A microprocessor includes a CPU, a main memory and primary and secondary cache memories of the direct-mapped type, all implemented on the same LSI chip. The secondary cache memory's capacity is not greater than that of the primary cache memory. The primary and secondary cache memories are organized in a hierarchical structure so that the primary cache memory is accessed before the secondary cache memory, and when the primary cache memory is not hit, the secondary cache memory is accessed. Thus, a high performance microprocessor having a small chip area is constructed by adding a small, high speed secondary cache memory, rather than by increasing the memory capacity of the primary cache memory.

Patent
Yifong Shih1
01 Sep 1993
TL;DR: In this article, a distributed file system controls memory allocation between a global cache shared storage memory unit and a plurality of local cache memory units by calculating a variable global cache LRU stack update interval.
Abstract: A distributed file system controls memory allocation between a global cache shared storage memory unit and a plurality of local cache memory units by calculating a variable global cache LRU stack update interval. A new update interval is calculated using fresh system statistics at the end of each update interval. A stack update command is issued at the end of the update interval only if the expected minimum data residency time in the global cache shared memory is less than or equal to the expected average residency time in the local cache memory.
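A speculative sketch of the interval-end decision: using fresh statistics, the expected minimum residency in the global cache is compared with the expected average residency in the local caches, a stack update command is issued only when the former is less than or equal to the latter, and a new variable interval is computed. The residency estimates and interval formula below are invented stand-ins for the patent's statistics.

```python
def end_of_interval(global_cache_slots, global_insert_rate,
                    local_cache_slots, local_insert_rate,
                    base_interval=1000):
    """Sketch: decide whether to issue a global LRU stack update and
    compute the next variable update interval from current statistics.
    Residency times are approximated here as capacity / insertion rate."""
    # Expected minimum residency in the shared global cache.
    min_global_residency = global_cache_slots / max(global_insert_rate, 1e-9)
    # Expected average residency in a local cache memory.
    avg_local_residency = local_cache_slots / max(local_insert_rate, 1e-9)

    issue_update = min_global_residency <= avg_local_residency
    # Next interval shrinks when the global cache churns faster than the
    # locals, so its LRU ordering is refreshed more often (illustrative).
    next_interval = int(base_interval * min(
        max(min_global_residency / avg_local_residency, 0.1), 10.0))
    return issue_update, next_interval

print(end_of_interval(global_cache_slots=4000, global_insert_rate=80,
                      local_cache_slots=500,  local_insert_rate=5))
# (True, 500): global residency 50 <= local residency 100, so update now
```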

Patent
24 Mar 1993
TL;DR: In this paper, a method and apparatus for allowing a processor to invalidate an individual line of its internal cache while in a non-clocked low power state was presented, and the processor was powered up out of the reduced power consumption state.
Abstract: A method and apparatus for allowing a processor to invalidate an individual line of its internal cache while in a non-clocked low power state. The present invention includes circuitry for placing the processor in a reduced power consumption state. The present invention also includes circuitry for powering up the processor out of the reduced power consumption state to invalidate data in the cache in order to maintain cache coherency while in the reduced power consumption state.

Proceedings ArticleDOI
01 Jul 1993
TL;DR: Compared to a compiler-based coherence strategy, the Shared Regions approach still performs better than a compiler that can achieve 90% accuracy in allowing caching, as long as the regions are a few hundred bytes or larger, or they are re-used a few times in the cache.
Abstract: The effective management of caches is critical to the performance of applications on shared-memory multiprocessors. In this paper, we discuss a technique for software cache coherence that is based upon the integration of a program-level abstraction for shared data with software cache management. The program-level abstraction, called Shared Regions, explicitly relates synchronization objects with the data they protect. Cache coherence algorithms are presented which use the information provided by shared region primitives, and ensure that shared regions are always cacheable by the processors accessing them. Measurements and experiments of the Shared Regions approach on a shared-memory multiprocessor are shown. Comparisons with other software based coherence strategies, including a user-controlled strategy and an operating system-based strategy, show that this approach is able to deliver better performance, with relatively low corresponding overhead and only a small increase in the programming effort. Compared to a compiler-based coherence strategy, the Shared Regions approach still performs better than a compiler that can achieve 90% accuracy in allowing caching, as long as the regions are a few hundred bytes or larger, or they are re-used a few times in the cache.
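Since a Shared Region explicitly ties a synchronization object to the data it protects, the software coherence actions can be attached to region entry and exit; the sketch below assumes a simple self-invalidate-on-entry, flush-on-exit discipline and a simulated cache interface, all of which are illustrative rather than the paper's actual primitives.

```python
import threading

class SharedRegion:
    """Sketch of a Shared Region: a lock explicitly bound to the data it
    protects.  Software cache coherence is performed at the region
    boundaries: self-invalidate possibly stale copies on entry, write
    back dirty lines on exit, so the region stays cacheable in between."""

    def __init__(self, name, data_lines):
        self.name = name
        self.lock = threading.Lock()
        self.data_lines = data_lines      # cache lines covered by this region

    def enter(self, cpu_cache):
        self.lock.acquire()
        for line in self.data_lines:      # another processor may have written:
            cpu_cache.invalidate(line)    # discard possibly stale copies
        return self

    def exit(self, cpu_cache):
        for line in self.data_lines:      # make our writes visible before
            cpu_cache.flush(line)         # the next processor enters
        self.lock.release()

class FakeCache:
    """Stand-in for per-processor cache control operations."""
    def __init__(self):
        self.ops = []
    def invalidate(self, line):
        self.ops.append(("inv", hex(line)))
    def flush(self, line):
        self.ops.append(("flush", hex(line)))

cache = FakeCache()
region = SharedRegion("particle_list", data_lines=[0x100, 0x140])
region.enter(cache)
# ... read and write the protected data through the (now coherent) cache ...
region.exit(cache)
print(cache.ops)   # [('inv', '0x100'), ('inv', '0x140'), ('flush', '0x100'), ('flush', '0x140')]
```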

Patent
15 Dec 1993
TL;DR: In this paper, a split level cache memory system for a data processor includes a single chip integer unit, an array processor such as a floating point unit, an external main memory and a split level cache.
Abstract: A split level cache memory system for a data processor includes a single chip integer unit, an array processor such as a floating point unit, an external main memory and a split level cache. The split level cache includes an on-chip, fast local cache with low latency for use by the integer unit for loads and stores of integer and address data, and an off-chip, pipelined global cache for storing arrays of data such as floating point data for use by the array processor and integer and address data for refilling the local cache. Coherence between the local cache and global cache is maintained by writing through to the global cache during integer stores. Local cache words are invalidated when data is written to the global cache during an array processor store.

Patent
27 Sep 1993
TL;DR: In this article, a multi-port central cache memory is used to queue all the incoming data to be stored in a DRAM and all the outgoing data being retrieved from the DRAM.
Abstract: A multimedia video processor chip for a personal computer employs a multi-port central cache memory to queue all the incoming data to be stored in a DRAM and all the outgoing data being retrieved from the DRAM. Such a cache memory is used in one of several modes in which the cache memory is partitioned by cache boundaries into different groups of storage areas. Each storage area of the cache is dedicated to storing data from a specific data source. The cache boundaries are chosen such that, for a given mode, the storage areas are optimized for worst case conditions for all data streams.

Patent
02 Aug 1993
TL;DR: In this article, a high performance shared cache is provided to support multiprocessor systems and allow maximum parallelism in accessing the cache by the processors, servicing one processor request in each machine cycle, reducing system response time and increasing system throughput.
Abstract: A high performance shared cache is provided to support multiprocessor systems and allow maximum parallelism in accessing the cache by the processors, servicing one processor request in each machine cycle, reducing system response time and increasing system throughput. The shared cache of the present invention uses the additional performance optimization techniques of pipelining cache operations (loads and stores) and burst-mode data accesses. By including built-in pipeline stages, the cache is enabled to service one request every machine cycle from any processing element. This contributes to reduction in the system response time as well as the throughput. With regard to the burst-mode data accesses, the widest possible data out of the cache can be stored to, and retrieved from, the cache by one cache access operation. One portion of the data is held in logic in the cache (on the chip), while another portion (corresponding to the system bus width) gets transferred to the requesting element (processor or memory) in one cycle. The held portion of the data can then be transferred in the following machine cycle.