
Showing papers on "Cache pollution" published in 1990


Proceedings ArticleDOI
01 May 1990
TL;DR: In this article, hardware techniques to improve the performance of caches are presented, including a small fully-associative cache placed between a cache and its refill path to catch conflict misses, and stream buffers that hold prefetched data outside the cache.
Abstract: Projections of computer technology forecast processors with peak performance of 1,000 MIPS in the relatively near future. These processors could easily lose half or more of their performance in the memory hierarchy if the hierarchy design is based on conventional caching techniques. This paper presents hardware techniques to improve the performance of caches. Miss caching places a small fully-associative cache between a cache and its refill path. Misses in the cache that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache. Small miss caches of 2 to 5 entries are shown to be very effective in removing mapping conflict misses in first-level direct-mapped caches. Victim caching is an improvement to miss caching that loads the small fully-associative cache with the victim of a miss and not the requested line. Small victim caches of 1 to 5 entries are even more effective at removing conflict misses than miss caching. Stream buffers prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer and not in the cache. Stream buffers are useful in removing capacity and compulsory cache misses, as well as some instruction cache conflict misses. Stream buffers are more effective than previously investigated prefetch techniques at using the next slower level in the memory hierarchy when it is pipelined. An extension to the basic stream buffer, called multi-way stream buffers, is introduced. Multi-way stream buffers are useful for prefetching along multiple intertwined data reference streams. Together, victim caches and stream buffers reduce the miss rate of the first level in the cache hierarchy by a factor of two to three on a set of six large benchmarks.

1,481 citations
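
The victim-caching idea summarized above is compact enough to sketch in a few dozen lines. The following C fragment is an illustrative model only (sizes, the LRU rule and the access pattern are assumptions, not details from the paper): a direct-mapped cache is backed by a tiny fully-associative victim cache, and the line evicted on a miss is swapped into it so that conflicting lines keep hitting.

/* Minimal sketch (not from the paper) of a direct-mapped cache backed by a
 * small fully-associative victim cache: on a miss, the evicted line is
 * swapped into the victim cache so that conflicting lines keep hitting.
 * Sizes, the LRU rule and the access pattern are illustrative assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LINE_BITS    5            /* 32-byte lines                       */
#define MAIN_LINES   64           /* direct-mapped first-level cache     */
#define VICTIM_LINES 4            /* tiny fully-associative victim cache */

typedef struct { bool valid; uint32_t tag; } Line;
typedef struct { bool valid; uint32_t line; uint32_t lru; } VictimLine;

static Line       main_cache[MAIN_LINES];
static VictimLine victim[VICTIM_LINES];
static uint32_t   lru_clock;

/* Returns true on a hit in either the main cache or the victim cache. */
static bool access_line(uint32_t addr)
{
    uint32_t line = addr >> LINE_BITS;
    uint32_t idx  = line % MAIN_LINES;
    uint32_t tag  = line / MAIN_LINES;

    if (main_cache[idx].valid && main_cache[idx].tag == tag)
        return true;                              /* ordinary hit */

    /* Miss in the direct-mapped cache: probe the victim cache. */
    for (int i = 0; i < VICTIM_LINES; i++) {
        if (victim[i].valid && victim[i].line == line) {
            /* Swap: the promoted line moves to the main cache and the
             * displaced main-cache line becomes the new victim entry. */
            bool had_old = main_cache[idx].valid;
            uint32_t old_line = main_cache[idx].tag * MAIN_LINES + idx;
            main_cache[idx].valid = true;
            main_cache[idx].tag   = tag;
            victim[i].valid = had_old;
            victim[i].line  = old_line;
            victim[i].lru   = ++lru_clock;
            return true;                          /* fast victim-cache hit */
        }
    }

    /* Full miss: refill the main cache; push the evicted line, if any,
     * into the least recently used victim-cache slot. */
    int lru_i = 0;
    for (int i = 1; i < VICTIM_LINES; i++)
        if (!victim[i].valid || victim[i].lru < victim[lru_i].lru)
            lru_i = i;
    if (main_cache[idx].valid) {
        victim[lru_i].valid = true;
        victim[lru_i].line  = main_cache[idx].tag * MAIN_LINES + idx;
        victim[lru_i].lru   = ++lru_clock;
    }
    main_cache[idx].valid = true;
    main_cache[idx].tag   = tag;
    return false;
}

int main(void)
{
    /* Two addresses that map to the same direct-mapped index: without the
     * victim cache, every access after the first would conflict-miss. */
    uint32_t a = 0x00000, b = 0x10000;
    int hits = 0, total = 0;
    for (int i = 0; i < 10; i++) {
        hits += access_line(a); total++;
        hits += access_line(b); total++;
    }
    printf("hits: %d / %d\n", hits, total);
    return 0;
}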


Journal ArticleDOI
01 May 1990
TL;DR: This work has examined the sharing and synchronization behavior of a variety of shared memory parallel programs and found that the access patterns of a large percentage of shared data objects fall in a small number of categories for which efficient software coherence mechanisms exist.
Abstract: An adaptive cache coherence mechanism exploits semantic information about the expected or observed access behavior of particular data objects. We contend that, in distributed shared memory systems, adaptive cache coherence mechanisms will outperform static cache coherence mechanisms. We have examined the sharing and synchronization behavior of a variety of shared memory parallel programs. We have found that the access patterns of a large percentage of shared data objects fall in a small number of categories for which efficient software coherence mechanisms exist. In addition, we have performed a simulation study that provides two examples of how an adaptive caching mechanism can take advantage of semantic information.

166 citations


Book
03 Jan 1990
TL;DR: This book develops a performance-directed approach to cache design, covering the cache design problem and its solution, multi-level cache hierarchies, and the modelling of write strategy effects.
Abstract: 1. Introduction 2. Background Material 3. The Cache Design Problem and Its Solution 4. Performance-Directed Cache Design 5. Multi-Level Cache Hierarchies 6. Summary, Implications and Conclusions Appendix A. Validation of the Empirical Results Appendix B. Modelling Write Strategy Effects

156 citations


Patent
27 Mar 1990
TL;DR: In this article, the authors propose an extension to the basic stream buffer, called multi-way stream buffers (62), which is useful for prefetching along multiple intertwined data reference streams.
Abstract: A memory system (10) utilizes miss caching by incorporating a small fully-associative miss cache (42) between a cache (18 or 20) and a second-level cache (26). Misses in the cache (18 or 20) that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache (42). Victim caching is an improvement to miss caching that loads a small, fully associative cache (52) with the victim of a miss and not the requested line. Small victim caches (52) of 1 to 4 entries are even more effective at removing conflict misses than miss caching. Stream buffers (62) prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer (62) and not in the cache (18 or 20). Stream buffers (62) are useful in removing capacity and compulsory cache misses, as well as some instruction cache misses. Stream buffers (62) are more effective than previously investigated prefetch techniques when the next slower level in the memory hierarchy is pipelined. An extension to the basic stream buffer, called multi-way stream buffers (62), is useful for prefetching along multiple intertwined data reference streams.

143 citations


Patent
24 Oct 1990
TL;DR: In this paper, a method and system for independently resetting primary and secondary processors 20 and 120 respectively under program control in a multiprocessor, cache memory system is presented.
Abstract: A method and system for independently resetting primary and secondary processors 20 and 120 respectively under program control in a multiprocessor, cache memory system. Processors 20 and 120 are reset without causing cache memory controllers 24 and 124 to reset.

143 citations


Patent
17 Jul 1990
TL;DR: In this article, a disk drive access control apparatus for connection between a host computer and a plurality of disk drives to provide an asynchronously operating storage system is presented. The disk drive controller channels are connected to respective disk drives, and each of the channels includes a cache/buffer memory and a micro-processor unit.
Abstract: This invention provides disk drive access control apparatus for connection between a host computer and a plurality of disk drives to provide an asynchronously operating storage system. It also provides increases in performance over earlier versions thereof. There are a plurality of disk drive controller channels connected to respective ones of the disk drives and controlling transfers of data to and from the disk drives; each of the disk drive controller channels includes a cache/buffer memory and a micro-processor unit. An interface and driver unit interfaces with the host computer and there is a central cache memory. Cache memory control logic controls transfers of data from the cache/buffer memory of the plurality of disk drive controller channels to the cache memory and from the cache memory to the cache/buffer memory of the plurality of disk drive controller channels and from the cache memory to the host computer through the interface and driver unit. A central processing unit manages the use of the cache memory by requesting data transfers only of data not presently in the cache memory and by sending high level commands to the disk drive controller channels. A first (data) bus interconnects the plurality of disk drive cache/buffer memories, the interface and driver unit, and the cache memory for the transfer of information therebetween and a second (information and commands) bus interconnects the same elements with the central processing unit for the transfer of control and information therebetween.

140 citations


Journal ArticleDOI
01 May 1990
TL;DR: Simulations show that in terms of access time and network traffic both directory methods provide significant performance improvements over a memory system in which shared-writeable data is not cached.
Abstract: This paper presents an empirical evaluation of two memory-efficient directory methods for maintaining coherent caches in large shared memory multiprocessors. Both directory methods are modifications of a scheme proposed by Censier and Feautrier [5] that does not rely on a specific interconnection network and can be readily distributed across interleaved main memory. The schemes considered here overcome the large amount of memory required for tags in the original scheme in two different ways. In the first scheme each main memory block is sectored into sub-blocks for which the large tag overhead is shared. In the second scheme a limited number of large tags are stored in an associative cache and shared among a much larger number of main memory blocks. Simulations show that in terms of access time and network traffic both directory methods provide significant performance improvements over a memory system in which shared-writeable data is not cached. The large block sizes required for the sectored scheme, however, promote sufficient false sharing that its performance is markedly worse than using a tag cache.

135 citations
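
As a rough illustration of the second scheme above, the sketch below keeps full presence-bit directory entries only for a small, associatively searched set of memory blocks; when an entry must be reclaimed, the copies it tracks are invalidated. The entry count, replacement rule and invalidation action are assumptions, not details from the paper.

/* Sketch of the "tag cache" directory above: full presence-bit entries are
 * kept only for a small, associatively searched set of memory blocks.
 * Entry count, replacement and the invalidation action are assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define DIR_ENTRIES 8             /* small associative directory cache */

typedef struct {
    bool     valid;
    uint64_t block;               /* memory block address (the large tag) */
    uint32_t present;             /* one presence bit per processor       */
    uint32_t lru;
} DirEntry;

static DirEntry dir[DIR_ENTRIES];
static uint32_t dir_clock;

/* Coherence action taken when an entry is reclaimed: the cached copies the
 * entry was tracking must be invalidated (modelled here as a message). */
static void invalidate_copies(uint64_t block, uint32_t present)
{
    printf("reclaim entry for block %llu, invalidate mask 0x%x\n",
           (unsigned long long)block, (unsigned)present);
}

/* Record that processor 'cpu' has fetched 'block'. */
static void dir_note_read(uint64_t block, int cpu)
{
    int free_i = -1, lru_i = 0;
    for (int i = 0; i < DIR_ENTRIES; i++) {
        if (dir[i].valid && dir[i].block == block) {
            dir[i].present |= 1u << cpu;          /* block already tracked */
            dir[i].lru = ++dir_clock;
            return;
        }
        if (!dir[i].valid)
            free_i = i;
        else if (dir[i].lru < dir[lru_i].lru)
            lru_i = i;
    }
    int i = (free_i >= 0) ? free_i : lru_i;
    if (free_i < 0)                               /* tag shortage: reclaim */
        invalidate_copies(dir[i].block, dir[i].present);
    dir[i].valid   = true;
    dir[i].block   = block;
    dir[i].present = 1u << cpu;
    dir[i].lru     = ++dir_clock;
}

int main(void)
{
    for (uint64_t b = 0; b < 12; b++)             /* forces a few reclaims */
        dir_note_read(b, (int)(b % 4));
    return 0;
}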


Journal ArticleDOI
01 May 1990
TL;DR: This paper explores the interactions between a cache's block size, fetch size and fetch policy from the perspective of maximizing system-level performance and finds the most effective fetch strategy improved performance by between 1.7% and 4.5%.
Abstract: This paper explores the interactions between a cache's block size, fetch size and fetch policy from the perspective of maximizing system-level performance. It has been previously noted that given a simple fetch strategy the performance-optimal block size is almost always four or eight words [10]. If there is even a small cycle time penalty associated with either longer blocks or fetches, then the performance-optimal size is noticeably reduced. In split cache organizations, where the fetch and block sizes of instruction and data caches are all independent design variables, instruction cache block size and fetch size should be the same. For the workload and write-back write policy used in this trace-driven simulation study, the instruction cache block size should be about a factor of two greater than the data cache fetch size, which in turn should equal or double the data cache block size. The simplest fetch strategy of fetching only on a miss and stalling the CPU until the fetch is complete works well. Complicated fetch strategies do not produce the performance improvements indicated by the accompanying reductions in miss ratios because of limited memory resources and a strong temporal clustering of cache misses. For the environments simulated here, the most effective fetch strategy improved performance by between 1.7% and 4.5% over the simplest strategy described above.

92 citations


Patent
12 Dec 1990
TL;DR: In this article, an integrated cache unit (ICU) comprising both a cache memory and a cache controller on a single chip is presented; the ICU supports high speed data and instruction processing applications in both RISC and non-RISC architecture environments.
Abstract: Methods and apparatus are disclosed for realizing an integrated cache unit (ICU) comprising both a cache memory and a cache controller on a single chip. The novel ICU is capable of being programmed, supports high speed data and instruction processing applications in both Reduced Instruction Set Computers (RISC) and non-RISC architecture environments, and supports high speed processing applications in both single and multiprocessor systems. The preferred ICU has two buses, one for the processor interface and the other for a memory interface. The ICU supports single, burst and pipelined processor accesses and is capable of operating at frequencies in excess of 25 megahertz, achieving processor access times of two cycles for the first access in a sequence, and one cycle for burst mode or pipelined accesses. It can be used as either an instruction or data cache with flexible internal cache organization. A RISC processor and two ICUs (for instruction and data cache) implement a very high performance processor with 16k bytes of cache. Larger caches can be designed by using additional ICUs which, according to the preferred embodiment of the invention, are modular. Further features include flexible and extensive multiprocessor support hardware, low power requirements, and support of a combination of bus watching, ownership schemes, software control and hardware control schemes which may be used with the novel ICU to achieve cache consistency.

79 citations


Journal ArticleDOI
TL;DR: It is shown how the cache DRAM bridges the gap in speed between high-performance microprocessor units and existing DRAMs and its architecture is presented.
Abstract: A DRAM (dynamic RAM) with an on-chip cache, called the cache DRAM, has been proposed and fabricated. It is a hierarchical RAM containing a 1-Mb DRAM for the main memory and an 8-kb SRAM (static RAM) for cache memory. It uses a 1.2-µm CMOS technology. Suitable for no-wait-state memory access in low-end workstations and personal computers, the chip also serves high-end systems as a secondary cache scheme. It is shown how the cache DRAM bridges the gap in speed between high-performance microprocessor units and existing DRAMs. The cache DRAM concept is explained, and its architecture is presented. The error checking and correction scheme used to improve the cache DRAM's reliability is described. Performance results for an experimental device are reported.

78 citations


Proceedings ArticleDOI
05 Dec 1990
TL;DR: SMART, a technique for providing predictable cache performance for real-time systems with priority-based preemptive scheduling, is presented and the value density acceleration (VDA) cache allocation algorithm is introduced and shown to be suitable for run-time cache allocation.
Abstract: SMART, a technique for providing predictable cache performance for real-time systems with priority-based preemptive scheduling, is presented. The technique is implemented in an R3000 cache design. The value density acceleration (VDA) cache allocation algorithm is also introduced, and shown to be suitable for run-time cache allocation.

Patent
28 Nov 1990
TL;DR: In this paper, the cache controller tagram is configured into two ways, each way including tag and valid-bit storage for associatively searching the directory for cache data-array addresses, and the external cache memory is organized such that both ways are simultaneously available to a number of available memory modules in the system to thereby allow the way access time to occur in parallel with the tag lookup.
Abstract: A cache controller (10) which sits in parallel with a microprocessor bus (14, 15, 29) so as not to impede system response in the event of a cache miss. The cache controller tagram (24) is configured into two ways, each way including tag and valid-bit storage for associatively searching the directory for cache data-array addresses. The external cache memory (8) is organized such that both ways are simultaneously available to a number of available memory modules in the system to thereby allow the way access time to occur in parallel with the tag lookup.
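
The parallel way access described in this abstract can be pictured with a small sketch: both data ways of a set are read speculatively while the two tag comparisons are still in flight, and the matching way, if any, selects the word that is used. The cache geometry and address split below are illustrative assumptions.

/* Sketch of a two-way set-associative lookup in which both data ways are
 * read while the tag comparisons are still in flight, in the spirit of the
 * arrangement above. Sizes and address splitting are assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SETS 256

typedef struct {
    bool     valid[2];
    uint32_t tag[2];
    uint32_t data[2];             /* one word per way, for brevity */
} Set;

static Set cache[SETS];

/* Returns true on a hit and writes the selected word to *out. */
static bool lookup(uint32_t addr, uint32_t *out)
{
    uint32_t set = (addr >> 2) & (SETS - 1);
    uint32_t tag = addr >> 10;

    /* "Way access in parallel with the tag lookup": both candidate words
     * are fetched before it is known which way, if either, matched. */
    uint32_t word0 = cache[set].data[0];
    uint32_t word1 = cache[set].data[1];

    bool hit0 = cache[set].valid[0] && cache[set].tag[0] == tag;
    bool hit1 = cache[set].valid[1] && cache[set].tag[1] == tag;

    if (hit0) { *out = word0; return true; }
    if (hit1) { *out = word1; return true; }
    return false;                 /* miss: the memory access already started
                                     in parallel completes instead */
}

int main(void)
{
    cache[1].valid[1] = true;
    cache[1].tag[1]   = 0x123;
    cache[1].data[1]  = 42;

    uint32_t v = 0;
    uint32_t addr = (0x123u << 10) | (1u << 2);
    bool hit = lookup(addr, &v);
    printf("hit=%d value=%u\n", hit, v);
    return 0;
}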

Patent
30 Nov 1990
TL;DR: In this paper, a cache check request signal is transmitted on the shared data bus, which enables all the cache memories to update their contents if necessary, and each cache has control logic for determining when a specified address location is stored in one of its lines or blocks.
Abstract: In a multiprocessor computer system, a number of processors are coupled to main memory by a shared memory bus, and one or more of the processors have a two level direct mapped cache memory. When any one processor updates data in a shared portion of the address space, a cache check request signal is transmitted on the shared data bus, which enables all the cache memories to update their contents if necessary. Since both caches are direct mapped, each line of data stored in the first cache is also stored in one of the blocks in the second cache. Each cache has control logic for determining when a specified address location is stored in one of its lines or blocks. To avoid spurious accesses to the first level cache when a cache check is performed, the second cache has a special table which stores a pointer for each line in said first cache array. This pointer denotes the block in the second cache which stores the same data as is stored in the corresponding line of the first cache. When the control logic of the second cache indicates that the specified address for a cache check is located in the second cache, a lookup circuit compares the pointer in the special table which corresponds to the specified address with a subset of the bits of the specified address. If the two match, then the specified address is located in the first cache, and the first cache is updated.
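
The snoop-filtering trick above amounts to keeping, in the second-level cache, a pointer per first-level line naming the L2 block that currently backs it, so a bus-watch lookup in L2 alone decides whether the first-level cache must be touched. The sketch below models that check; geometries and address splitting are assumptions.

/* Sketch of the L1-inclusion check above: the second-level cache keeps a
 * pointer per first-level line naming the L2 block that backs it, so a
 * cache check (snoop) touches L1 only when the pointer matches the snooped
 * address. Geometries and the address split are illustrative assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LINE_BITS 5               /* 32-byte lines              */
#define L1_LINES  64              /* direct-mapped first level  */
#define L2_BLOCKS 1024            /* direct-mapped second level */

typedef struct { bool valid; uint32_t tag; } Block;

static Block    l2[L2_BLOCKS];
static bool     l1_backed[L1_LINES];   /* does L1 line i hold valid data?  */
static uint32_t l1_ptr[L1_LINES];      /* L2 block index backing L1 line i */

/* Bus-watch path: does a write to 'addr' by another processor require the
 * first-level cache to be updated as well? */
static bool snoop_hits_l1(uint32_t addr)
{
    uint32_t line   = addr >> LINE_BITS;
    uint32_t l2_idx = line % L2_BLOCKS;
    uint32_t l2_tag = line / L2_BLOCKS;
    uint32_t l1_idx = line % L1_LINES;

    if (!(l2[l2_idx].valid && l2[l2_idx].tag == l2_tag))
        return false;             /* not in L2, so (by inclusion) not in L1 */

    /* The special table: compare the stored pointer with the address bits
     * that select the L2 block; only on a match is the line also in L1,
     * so spurious first-level probes are avoided. */
    return l1_backed[l1_idx] && l1_ptr[l1_idx] == l2_idx;
}

int main(void)
{
    uint32_t addr = 0x1234A0;
    uint32_t line = addr >> LINE_BITS;

    /* Fill L2 and record that the corresponding L1 line is backed by it. */
    l2[line % L2_BLOCKS].valid = true;
    l2[line % L2_BLOCKS].tag   = line / L2_BLOCKS;
    l1_backed[line % L1_LINES] = true;
    l1_ptr[line % L1_LINES]    = line % L2_BLOCKS;

    printf("snoop hits L1: %d\n", snoop_hits_l1(addr));
    printf("snoop hits L1: %d\n", snoop_hits_l1(addr + 0x8000));
    return 0;
}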

Patent
18 Oct 1990
TL;DR: In this paper, a cache access system that shortens the address generation machine cycle of a digital computer, while simultaneously avoiding the synonym problem of logical addressing, is proposed, based on the concept of predicting what the real address used in the cache memory will be, independent of the generation of the logical address.
Abstract: This invention implements a cache access system that shortens the address generation machine cycle of a digital computer, while simultaneously avoiding the synonym problem of logical addressing. The invention is based on the concept of predicting what the real address used in the cache memory will be, independent of the generation of the logical address. The prediction involves recalling the last real address used to access the cache memory for a particular instruction, and then using that real address to access the cache memory. Incorrect guesses are corrected and kept to a minimum through monitoring the history of instructions and real addresses called for in the computer. This allows the cache memory to retrieve the information faster than waiting for the virtual address to be generated and then translating the virtual address into a real address. The address generation machine cycle is faster because the delays associated with the adder of the virtual address generation means and the translation buffer are bypassed.
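
The prediction described above can be modelled as a small table, indexed by instruction address, that remembers the last real (translated) address the instruction used: the cache is probed with that guess while translation proceeds, and a mismatch triggers a corrected access. The table size, hash and translation stand-in below are assumptions.

/* Sketch of real-address prediction: remember the last real address each
 * instruction used, probe the cache with that guess while translation
 * proceeds, and re-access with the translated address when the guess is
 * wrong. Table size, hash and the translation stand-in are assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PRED_ENTRIES 256

typedef struct { bool valid; uint32_t pc; uint32_t real; } PredEntry;

static PredEntry pred[PRED_ENTRIES];

/* Stand-in for the translation path (adder plus translation buffer). */
static uint32_t translate(uint32_t vaddr) { return vaddr + 0x40000; }

/* Stand-in for a cache probe. */
static void cache_access(uint32_t raddr) { (void)raddr; }

/* Returns true if the prediction was correct (no corrective re-access). */
static bool predicted_access(uint32_t pc, uint32_t vaddr)
{
    PredEntry *e = &pred[pc % PRED_ENTRIES];
    uint32_t guess = (e->valid && e->pc == pc) ? e->real : 0;

    if (guess)
        cache_access(guess);          /* early probe, before translation   */

    uint32_t real = translate(vaddr); /* translation completes later       */
    bool ok = (guess == real);
    if (!ok)
        cache_access(real);           /* incorrect guess: corrected access */

    e->valid = true;                  /* remember for the next execution   */
    e->pc    = pc;
    e->real  = real;
    return ok;
}

int main(void)
{
    uint32_t pc = 0x1000, addr = 0x2000;
    printf("first access predicted correctly: %d\n", predicted_access(pc, addr));
    printf("repeat access predicted correctly: %d\n", predicted_access(pc, addr));
    return 0;
}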

Patent
04 Oct 1990
TL;DR: A checkpoint retry system for recovery from an error condition in a multiprocessor type central processing unit which may have a store-in or store-through cache system is presented in this paper.
Abstract: A checkpoint retry system for recovery from an error condition in a multiprocessor type central processing unit which may have a store-in or a store-through cache system. At detection of a checkpoint instruction, the system initiates action to save the content of the program status word, the floating point registers, the access registers and the general purpose registers until the store operations are completed for the checkpointed sequence. For processors which have a store-in cache, modified cache data is saved in a store buffer until the checkpointed instructions are completed and then written to a cache which is accessible to other processors in the system. For processors which utilize store-through cache, the modified data for the checkpointed instructions is also stored in the store buffer prior to storage in the system memory.

Patent
21 Feb 1990
TL;DR: A cache management system for a computer system having a central processing unit, a main memory, and cache memory including a memory management unit for transferring page size blocks of information, apparatus for reading information from main memory and apparatus for writing information to the cache memory is described in this article.
Abstract: A cache management system for a computer system having a central processing unit, a main memory, and cache memory including a memory management unit for transferring page size blocks of information, apparatus for reading information from main memory, apparatus for writing information to the cache memory, and apparatus for overlapping the write of information to the cache memory to occur during the read of information from the main memory.

Patent
Hitoshi Yamahata
20 Jun 1990
TL;DR: In this paper, a cache bypass signal generator detects data to be cache-bypassed without checking bus status signals and, upon detection, generates a cache bypass request signal that prevents the cache memory from performing a caching operation on that data.
Abstract: A microprocessor capable of being incorporated in an information processing system with a cache memory unit and capable of realizing fine cache bypass control. The microprocessor can detect data to be cache-bypassed without checking bus status signals. The microprocessor is equipped with a cache bypass signal generator. Upon detection of data to be bypassed, the cache bypass signal generator generates a cache bypass request signal, which prevents the cache memory from performing a data caching operation on the data.

Patent
14 Dec 1990
TL;DR: In this paper, a method of controlling entry of a block of data is used with a high-speed cache which is shared by a plurality of independently-operating computer systems in a multi-system data sharing complex.
Abstract: A method of controlling entry of a block of data is used with a high-speed cache which is shared by a plurality of independently-operating computer systems in a multi-system data sharing complex. Each computer system has access both to the high-speed cache and to lower-speed, upper-level storage for obtaining and storing data. Management logic in the high-speed cache assures that the block of data entered into the cache will not be overwritten by an earlier version of the block of data obtained from the upper-level storage.

Patent
Thomas A. Dye
21 Feb 1990
TL;DR: In this article, the authors proposed an approach to reduce access time to RAM arrays, especially DRAMs, by including fast access cache rows, e.g., four rows, to store data from accessed rows of the array, where data can then be accessed without precharging, row decoding, sensing, and other cycling usually required to access the DRAM.
Abstract: A device for reducing access time to RAM arrays, especially DRAMs, by including fast access cache rows, e.g., four rows, to store data from accessed rows of the array, where data can then be accessed without precharging, row decoding, sensing, and other cycling usually required to access the DRAM. Address registers, comparators, an MRU/LRU register and other cache control logic may be included in the device. The device allows parallel transfer of data between the RAM array and the cache rows. The device may be constructed on a single chip. A system is disclosed which makes use of the cache RAM features in a data processing system to take advantage of the attributes of a cache RAM memory.
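
A behavioural sketch of the row-cache idea above: a handful of row buffers, each with an address register and comparator, are checked first, and only on a mismatch is the slow precharge/decode/sense row cycle paid, after which the row is loaded into the least recently used cache row. Row count, cycle costs and replacement below are assumptions.

/* Behavioural sketch of a DRAM with a few cache rows: an access that
 * matches a cached row skips the precharge/row-decode/sense cycle.
 * Row count, cycle costs and replacement are illustrative assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CACHE_ROWS  4
#define FAST_CYCLES 1             /* cache-row (buffered) access */
#define SLOW_CYCLES 5             /* full DRAM row cycle         */

typedef struct { bool valid; uint32_t row; uint32_t lru; } CacheRow;

static CacheRow rows[CACHE_ROWS];
static uint32_t lru_clock;

/* Returns the modelled cost, in cycles, of accessing DRAM row 'row'. */
static int dram_access(uint32_t row)
{
    int victim = 0;
    for (int i = 0; i < CACHE_ROWS; i++) {
        if (rows[i].valid && rows[i].row == row) {
            rows[i].lru = ++lru_clock;        /* MRU update            */
            return FAST_CYCLES;               /* row already buffered  */
        }
        if (!rows[i].valid || rows[i].lru < rows[victim].lru)
            victim = i;
    }
    /* Miss: pay the full row cycle, then transfer the row in parallel
     * into the least recently used cache row. */
    rows[victim].valid = true;
    rows[victim].row   = row;
    rows[victim].lru   = ++lru_clock;
    return SLOW_CYCLES;
}

int main(void)
{
    uint32_t pattern[] = { 7, 7, 7, 9, 7, 9, 9, 100, 7 };
    int n = (int)(sizeof pattern / sizeof pattern[0]);
    int total = 0;
    for (int i = 0; i < n; i++)
        total += dram_access(pattern[i]);
    printf("total cycles: %d (vs %d with no cache rows)\n",
           total, n * SLOW_CYCLES);
    return 0;
}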


Patent
01 Oct 1990
TL;DR: In this paper, a write-back cache with parity and error correction coding (ECC) is proposed, where the parity and ECC codes are generated by a memory interface when data is transferred by main memory to the central processing unit associated with the cache.
Abstract: This invention relates to a write-back cache which is protected with parity and error correction coding ("ECC"). The parity and ECC codes are generated by a memory interface when data is transferred by main memory to the central processing unit ("CPU") associated with the cache. Thus, all data originating in main memory will be parity and ECC encoded when it passes through the memory interface, and the data, and its related parity information and ECC codes will be stored in the cache. On the other hand, data which is taken from the cache and modified by the CPU during its processing operations is also transferred to the memory interface for ECC encoding. Thus, all data modified by the CPU is also protected, and the modified data, and its related parity information and ECC codes are also stored in the cache. The memory interface also contains ECC checking and correcting circuitry which can correct erroneous data, on the basis of its ECC code, if that data has been corrupted while stored in the cache, or during transmission on a bus. Therefore, if data in the cache is corrupted, it can be corrected when it is returned to main memory via the memory interface. Accordingly, the invention makes a write-back cache compatible with full ECC protection, even though, at times, the cache may contain the only correct, current copy of given data in the system.

Journal ArticleDOI
01 May 1990
TL;DR: A new lock-based cache scheme which incorporates synchronization into the cache coherency mechanism is presented, and a new simulation model embodying a widely accepted paradigm of parallel programming is developed to show that the lock-based protocol outperforms existing cache protocols.
Abstract: Introducing private caches in bus-based shared memory multiprocessors leads to the cache consistency problem since there may be multiple copies of shared data. However, the ability to snoop on the bus coupled with the fast broadcast capability allows the design of special hardware support for synchronization. We present a new lock-based cache scheme which incorporates synchronization into the cache coherency mechanism. With this scheme high-level synchronization primitives as well as low-level ones can be implemented without excessive overhead. Cost functions for well-known synchronization methods are derived for invalidation schemes, write update schemes, and our lock-based scheme. To accurately predict the performance implications of the new scheme, a new simulation model is developed embodying a widely accepted paradigm of parallel programming. It is shown that our lock-based protocol outperforms existing cache protocols.

Patent
14 Dec 1990
TL;DR: In this article, a high-speed cache is shared by a plurality of independently-operating data systems in a multi-system data sharing complex, where each data system has access both to the high speed cache and the lower-speed, secondary storage for obtaining and storing data.
Abstract: A high-speed cache is shared by a plurality of independently-operating data systems in a multi-system data sharing complex. Each data system has access both to the high-speed cache and the lower-speed, secondary storage for obtaining and storing data. Management logic in the high-speed cache assures that a block of data obtained from the cache for entry into the secondary storage will be consistent with the version of the block of data in the shared cache.

Proceedings ArticleDOI
01 Apr 1990
TL;DR: This work reduces the program traces to the extent that exact performance can still be obtained from the reduced traces and devises an algorithm that can produce performance results for a variety of metrics for a large number of set-associative write-back caches in just a single simulation run.
Abstract: We propose improvements to current trace-driven cache simulation methods to make them faster and more economical. We attack the large time and space demands of cache simulation in two ways. First, we reduce the program traces to the extent that exact performance can still be obtained from the reduced traces. Second, we devise an algorithm that can produce performance results for a variety of metrics (hit ratio, write-back counts, bus traffic) for a large number of set-associative write-back caches in just a single simulation run. The trace reduction and the efficient simulation techniques are extended to parallel multiprocessor cache simulations. Our simulation results show that our approach substantially reduces the disk space needed to store the program traces and can dramatically speed up cache simulations and still produce the exact results.
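
One well-known way to get results for many cache configurations from a single pass is the classic LRU stack-distance method, sketched below; it is offered as an illustration of one-pass simulation in general, not as this paper's exact algorithm, and the set count, line size and toy trace are assumptions.

/* Classic LRU stack-distance sketch: one pass over the trace yields hit
 * counts for every associativity at once, for a fixed set count and line
 * size. Offered as an illustration of one-pass simulation, not as this
 * paper's exact algorithm; all parameters and the toy trace are assumed. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SETS      64
#define MAX_ASSOC 8
#define LINE_BITS 5

static uint32_t stack[SETS][MAX_ASSOC];   /* per-set LRU stacks of tags     */
static int      depth[SETS];
static long     hits[MAX_ASSOC];          /* hits[d]: references at depth d */
static long     refs;

static void reference(uint32_t addr)
{
    uint32_t line = addr >> LINE_BITS;
    uint32_t set  = line % SETS;
    uint32_t tag  = line / SETS;
    int found = -1;

    refs++;
    for (int d = 0; d < depth[set]; d++)
        if (stack[set][d] == tag) { found = d; break; }

    if (found >= 0) {
        hits[found]++;                    /* a hit for every assoc > found  */
        memmove(&stack[set][1], &stack[set][0], found * sizeof(uint32_t));
    } else {
        int n = depth[set] < MAX_ASSOC ? depth[set] : MAX_ASSOC - 1;
        memmove(&stack[set][1], &stack[set][0], n * sizeof(uint32_t));
        if (depth[set] < MAX_ASSOC)
            depth[set]++;
    }
    stack[set][0] = tag;                  /* most recently used on top      */
}

int main(void)
{
    /* A toy "trace" with reuse distances of 1 and 3 in every set. */
    static const uint32_t seq[] = { 0, 1, 0, 2, 0, 3 };
    for (int pass = 0; pass < 16; pass++)
        for (int s = 0; s < 6; s++)
            for (uint32_t set = 0; set < SETS; set++)
                reference((seq[s] * SETS + set) << LINE_BITS);

    long cum = 0;
    for (int assoc = 1; assoc <= MAX_ASSOC; assoc++) {
        cum += hits[assoc - 1];
        printf("assoc %d: hit ratio %.3f\n", assoc, (double)cum / refs);
    }
    return 0;
}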

01 Jul 1990
TL;DR: In this article, the average response time for disk read requests under the periodic update write policy is determined, leading to design criteria that can be used to decide among competing cache write policies.
Abstract: A disk cache is typically used in file systems to reduce average access time for data storage and retrieval. The 'periodic update' write policy, widely used in existing computer systems, is one in which dirty cache blocks are written to a disk on a periodic basis. The average response time for disk read requests when the periodic update write policy is used is determined. Read and write load, cache-hit ratio, and the disk scheduler's ability to reduce service time under load are incorporated in the analysis, leading to design criteria that can be used to decide among competing cache write policies. The main conclusion is that the bulk arrivals generated by the periodic update policy cause a traffic jam effect which results in severely degraded service. Effective use of the disk cache and disk scheduling can alleviate this problem, but only under a narrow range of operating conditions. Based on this conclusion, alternative write policies that retain the periodic update policy's advantages and provide uniformly better service are proposed.
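
The periodic update policy is easy to caricature: writes only dirty blocks in the cache, and a timer flushes every dirty block at once, which produces exactly the bulk arrivals identified above as the source of degraded read service. All parameters in the sketch are assumptions.

/* Caricature of the periodic update write policy: cache writes only mark
 * blocks dirty, and every PERIOD ticks all dirty blocks are scheduled for
 * the disk at once, producing the bulk arrivals discussed above. All
 * parameters are illustrative assumptions. */
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define BLOCKS 1024
#define PERIOD 30                 /* ticks between update passes */
#define TICKS  300

static bool dirty[BLOCKS];

int main(void)
{
    int queued = 0;               /* writes waiting at the disk  */
    int worst  = 0;

    srand(1);
    for (int t = 1; t <= TICKS; t++) {
        /* A few cache writes per tick just mark blocks dirty. */
        for (int w = 0; w < 4; w++)
            dirty[rand() % BLOCKS] = true;

        /* The disk drains at most one queued write per tick. */
        if (queued > 0)
            queued--;

        /* Periodic update: every dirty block is scheduled at once. */
        if (t % PERIOD == 0) {
            int burst = 0;
            for (int b = 0; b < BLOCKS; b++)
                if (dirty[b]) { dirty[b] = false; burst++; }
            queued += burst;
            if (queued > worst)
                worst = queued;
            printf("t=%3d  flushed %3d blocks, disk queue now %3d\n",
                   t, burst, queued);
        }
    }
    printf("worst queue a read request could arrive behind: %d\n", worst);
    return 0;
}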

Journal ArticleDOI
TL;DR: A 32-kB cache macro with an experimental reduced instruction set computer (RISC) is realized and a pipelined cache access to realize a cycle time shorter than the cache access time is proposed.
Abstract: A 32-kB cache macro with an experimental reduced instruction set computer (RISC) is realized. A pipelined cache access to realize a cycle time shorter than the cache access time is proposed. A double-word-line architecture combines single-port cells, dual-port cells, and CAM cells into a memory array to improve silicon area efficiency. The cache macro exhibits 9-ns typical clock-to-HIT delay as a result of several circuit techniques, such as a section word-line selector, a dual transfer gate, and 1.0-µm CMOS technology. It supports multitask operation with logical addressing by a selective clear circuit. The RISC includes a double-word load/store instruction using a 64-b bus to fully utilize the on-chip cache macro. A test scheme allows measurement of the internal signal delay. The test device design is based on the unified design rules scalable through multigenerations of process technologies down to 0.8 µm.

Patent
11 Oct 1990
TL;DR: In this article, a permutation operation is performed on an M-bit portion (X) of the system memory address; this permutation determines the congruence class into which the address will map.
Abstract: An electronic computer system including a central processor and a hierarchical memory system having a large relatively low speed random access system memory and a small high speed set-associative cache memory including a data store section for storing lines of data from the system memory and a cache directory for indicating, by means of line identifier fields, the lines of the system memory data currently resident in cache at any time, is provided with a way to improve the distribution of data across the congruence classes within the cache. A mechanism is provided for performing a permutation operation on an M-bit portion (X) of the system memory address, which permutation determines the congruence class into which the address will map. The permutation mechanism performs a bit-matrix multiplication of said M-bit address with an M×M matrix (where M is a real positive integer greater than 1) to produce a permuted M-bit address (X'). The directory controls utilize the permuted M-bit address (X') to determine the congruence class of any given memory access and automatically access the congruence class of the permuted address (X') subsequent to the permutation operation to determine if one of the line identifiers, which identify every member of a congruence class currently stored in the directory, matches an identifier field from the memory access request from the CPU. If the match is successful the data store portion of the cache is accessed at the permuted M-bit address (X') and the requested data line is accessed at the address field specified by the CPU.
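
The permutation step above is a bit-matrix multiplication over GF(2): each bit of the permuted index X' is the XOR (parity) of the address bits selected by one row of the M×M matrix. The sketch below shows that mapping for a small M; the particular matrix is an arbitrary invertible example, not one taken from the patent.

/* Sketch of the congruence-class permutation above: X' = A * X over GF(2),
 * where each bit of X' is the parity of the address bits selected by one
 * row of an M x M bit matrix. M and the matrix are assumptions; any
 * invertible matrix gives a one-to-one remapping of congruence classes. */
#include <stdint.h>
#include <stdio.h>

#define M 6                       /* 2^M = 64 congruence classes */

/* Row i is an M-bit mask selecting the address bits XORed together to form
 * bit i of X'. This matrix is upper triangular with a unit diagonal, so it
 * is invertible over GF(2). */
static const uint32_t A[M] = { 0x21, 0x06, 0x2C, 0x18, 0x30, 0x20 };

static uint32_t parity(uint32_t v)
{
    v ^= v >> 16; v ^= v >> 8; v ^= v >> 4; v ^= v >> 2; v ^= v >> 1;
    return v & 1u;
}

static uint32_t permute(uint32_t x)
{
    uint32_t xp = 0;
    for (int i = 0; i < M; i++)
        xp |= parity(A[i] & x) << i;      /* bit i of X' = row i . X */
    return xp;
}

int main(void)
{
    /* Show the one-to-one remapping of a few raw index values. */
    for (uint32_t x = 0; x < 8; x++)
        printf("X = %2u  ->  X' = %2u\n", x, permute(x));
    return 0;
}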

Patent
11 Jan 1990
TL;DR: A data processing system that includes a plurality of processors with at least a portion of the processors each individually connected to a cache memory for storing data for that processor is considered in this paper.
Abstract: A data processing system that includes a plurality of processors with at least a portion of this plurality of processors each individually connected to a cache memory for storing data for that processor. Each cache memory includes a cache controller that is connected to a bus. Each controller includes a circuit for independently storing a data coherency procedure indicator indicating that the controller will perform one of two or more data coherency procedures. According to one procedure, when data is updated in a cache memory, corresponding data is updated in another cache that stores the corresponding data. In a second data coherency procedure, when data is updated in one cache, the corresponding data stored in another cache is invalidated. The individual and independent storing of the coherency procedure indicator enables each cache to perform either one or the other data coherency procedure without interfering with the data coherency procedures performed by other caches in the data processing system. Furthermore, the cache that is updating data provides an updating signal on the bus, which is received by the other caches on the bus. The controllers of those caches will then either update or invalidate any corresponding data in accordance with those caches' stored coherency procedure indicators.
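
The per-cache choice described above reduces to a stored mode flag consulted on every snooped write: one controller may copy the new data into its matching line while another invalidates its copy, with neither needing to know the other's policy. The line format and bus message shape in the sketch are assumptions.

/* Sketch of the per-cache coherency procedure indicator above: each
 * controller stores its own update-or-invalidate mode and applies it to
 * snooped writes independently of the other caches. The line format and
 * bus message shape are illustrative assumptions. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef enum { COHERE_UPDATE, COHERE_INVALIDATE } CoherencyProcedure;

typedef struct {
    bool     valid;
    uint32_t addr;
    uint32_t data;
} CacheLine;

typedef struct {
    CoherencyProcedure mode;      /* independently stored per cache       */
    CacheLine line;               /* a single line is enough for a sketch */
} Cache;

/* Bus snoop handler: another processor has written 'data' at 'addr'. */
static void snoop_write(Cache *c, uint32_t addr, uint32_t data)
{
    if (!c->line.valid || c->line.addr != addr)
        return;                   /* no copy held: nothing to do          */
    if (c->mode == COHERE_UPDATE)
        c->line.data = data;      /* refresh the local copy               */
    else
        c->line.valid = false;    /* invalidate the stale copy            */
}

int main(void)
{
    Cache a = { COHERE_UPDATE,     { true, 0x100, 1 } };
    Cache b = { COHERE_INVALIDATE, { true, 0x100, 1 } };

    /* One updating signal on the bus; each cache applies its own policy. */
    snoop_write(&a, 0x100, 99);
    snoop_write(&b, 0x100, 99);

    printf("cache A: valid=%d data=%u\n", a.line.valid, a.line.data);
    printf("cache B: valid=%d data=%u\n", b.line.valid, b.line.data);
    return 0;
}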

Patent
26 Feb 1990
TL;DR: In this article, a data processor is provided for reloading deferred pushes in a copy-back cache, which avoids the potential for multiple concurrent exception conditions and eliminates the problem of unnecessarily removing an otherwise valid cache entry from the cache.
Abstract: A data processor is provided for reloading deferred pushes in a copy-back cache. When a cache "miss" occurs, a cache controller selects a cache line for replacement, and requests a burst line read to transfer the required cache line from an external memory. When the data entries in the cache line selected for replacement are marked dirty, the cache controller "pushes" the cache line or dirty portions thereof into a buffer, which stores the cache line pending completion, by a bus interface controller, of the burst line read. When the burst line read terminates abnormally, due to a bus error or bus cache inhibit (or any other reason), the data cache controller reloads the copy-back cache with the cache line stored in the buffer. The reloading of the copy-back cache avoids the potential for multiple concurrent exception conditions, and eliminates the problem of unnecessarily removing an otherwise valid cache entry from the cache.

Proceedings ArticleDOI
01 Oct 1990
TL;DR: It is pointed out that aggressive compilers should be able to improve program performance by focusing on those array accesses that result in cache misses, and it was observed that the data caches contained the values for between 45% and 99% of the array accesses, depending on the cache and the program.
Abstract: Processor speed has been increasing faster than mass memory speed. One method of matching a processor's speed to memory's is the use of high-speed caches. This paper examines the data cache performance of a set of computationally intensive programs. Our interest in measuring cache performance arises from an interest in improving the performance of programs during compilation. We observed that the data caches contained the values for between 45% and 99+% of the array accesses, depending on the cache and the program. The delays from the misses accounted for up to half of the total execution time of the program. The misses were grouped in a subset of source program references which resulted in misses on every access. Aggressive compilers should be able to improve program performance by focusing on those array accesses that result in cache misses.