Showing papers on "Cache invalidation published in 1990"

PDF

Open Access

Proceedings Article•DOI•

Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

[...]

01 May 1990

TL;DR: In this article, a hardware technique to improve the performance of caches is presented, where a small fully-associative cache between a cache and its refill path is used to place prefetched data and not in the cache.

...read moreread less

Abstract: Projections of computer technology forecast processors with peak performance of 1,000 MIPS in the relatively near future. These processors could easily lose half or more of their performance in the memory hierarchy if the hierarchy design is based on conventional caching techniques. This paper presents hardware techniques to improve the performance of caches.Miss caching places a small fully-associative cache between a cache and its refill path. Misses in the cache that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache. Small miss caches of 2 to 5 entries are shown to be very effective in removing mapping conflict misses in first-level direct-mapped caches.Victim caching is an improvement to miss caching that loads the small fully-associative cache with the victim of a miss and not the requested line. Small victim caches of 1 to 5 entries are even more effective at removing conflict misses than miss caching.Stream buffers prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer and not in the cache. Stream buffers are useful in removing capacity and compulsory cache misses, as well as some instruction cache conflict misses. Stream buffers are more effective than previously investigated prefetch techniques at using the next slower level in the memory hierarchy when it is pipelined. An extension to the basic stream buffer, called multi-way stream buffers, is introduced. Multi-way stream buffers are useful for prefetching along multiple intertwined data reference streams.Together, victim caches and stream buffers reduce the miss rate of the first level in the cache hierarchy by a factor of two to three on a set of six large benchmarks.

...read moreread less

1,481 citations

Journal Article•DOI•

Directory-based cache coherence in large-scale multiprocessors

[...]

David Chaiken¹, Craig Fields¹, K. Kurihara¹, Anant Agarwal¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jun 1990-IEEE Computer

TL;DR: In this article, the usefulness of shared data caches in large-scale multiprocessors, the relative merits of different coherence schemes, and system-level methods for improving directory efficiency are addressed.

...read moreread less

Abstract: The usefulness of shared-data caches in large-scale multiprocessors, the relative merits of different coherence schemes, and system-level methods for improving directory efficiency are addressed. The research presented is part of an effort to build a high-performance, large-scale multiprocessor. The various classes of cache directory schemes are described, and a method of measuring cache coherence is presented. The various directory schemes are analyzed, and ways of improving the performance of directories are considered. It is found that the best solutions to the cache-coherence problem result from a synergy between a multiprocessor's software and hardware components. >

...read moreread less

291 citations

Journal Article•DOI•

Adaptive software cache management for distributed shared memory architectures

[...]

John K. Bennett¹, John B. Carter¹, Willy Zwaenepoel¹•Institutions (1)

Rice University¹

01 May 1990

TL;DR: This work has examined the sharing and synchronization behavior of a variety of shared memory parallel programs and found that the access patterns of a large percentage of shared data objects fall in a small number of categories for which efficient software coherence mechanisms exist.

...read moreread less

Abstract: An adaptive cache coherence mechanism exploits semantic information about the expected or observed access behavior of particular data objects. We contend that, in distributed shared memory systems, adaptive cache coherence mechanisms will outperform static cache coherence mechanisms. We have examined the sharing and synchronization behavior of a variety of shared memory parallel programs. We have found that the access patterns of a large percentage of shared data objects fall in a small number of categories for which efficient software coherence mechanisms exist. In addition, we have performed a simulation study that provides two examples of how an adaptive caching mechanism can take advantage of semantic information.

...read moreread less

166 citations

Book•

Cache and memory hierarchy design: a performance-directed approach

[...]

Steven A. Przybylski

03 Jan 1990

TL;DR: This paper presents a meta-modelling architecture that automates the very labor-intensive and therefore time-heavy and therefore expensive and expensive process of designing and implementing a caching system.

...read moreread less

Abstract: 1. Introduction 2. Background Material 3. The Cache Design Problem and Its Solution 4. Performance-Directed Cache Design 5. Multi-Level Cache Hierarchies 6. Summary, Implications and Conclusions Appendix A. Validation of the Empirical Results Appendix B. Modelling Write Strategy Effects

...read moreread less

156 citations

Patent•

Data processing system and method with small fully-associative cache and prefetch buffers

[...]

Norman P. Jouppi, Alan Eustace

27 Mar 1990

TL;DR: In this article, the authors propose an extension to the basic stream buffer, called multi-way stream buffers (62), which is useful for prefetching along multiple intertwined data reference streams.

...read moreread less

Abstract: A memory system (10) utilizes miss caching by incorporating a small fully-associative miss cache (42) between a cache (18 or 20) and second-level cache (26). misses in the cache (18 or 20) that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache (42). Victim caching is an improvement to miss caching that loads a small, fully associative cache (52) with the victim of a miss and not the requested line. Small victim caches (52) of 1 to 4 entries are even more effective at removing conflict misses than miss caching. Stream buffers (62) prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer (62) and not in the cache (18 or 20). Stream buffers (62) are useful in removing capacity and compulsory cache misses, as well as some instruction cache misses. Stream buffers (62) are more effective than previously investigated prefetch techniques when the next slower level in the memory hierarchy is pipelined. An extension to the basic stream buffer, called multi-way stream buffers (62), is useful for prefetching along multiple intertwined data reference streams.

...read moreread less

143 citations

Journal Article•DOI•

The performance impact of block sizes and fetch strategies

[...]

Steven Przybylski

01 May 1990

TL;DR: This paper explores the interactions between a cache's block size, fetch size and fetch policy from the perspective of maximizing system-level performance and finds the most effective fetch strategy improved performance by between 1.7% and 4.5%.

...read moreread less

Abstract: This paper explores the interactions between a cache's block size, fetch size and fetch policy from the perspective of maximizing system-level performance. It has been previously noted that given a simple fetch strategy the performance optimal block size is almost always four or eight words [10]. If there is even a small cycle time penalty associated with either longer blocks or fetches, then the performance-optimal size is noticeably reduced. In split cache organizations, where the fetch and block sizes of instruction and data caches are all independent design variables, instruction cache block size and fetch size should be the same. For the workload and write-back write policy used in this trace-driven simulation study, the instruction cache block size should be about a factor of two greater than the data cache fetch size, which in turn should equal to or double the data cache block size. The simplest fetch strategy of fetching only on a miss and stalling the CPU until the fetch is complete works well. Complicated fetch strategies do not produce the performance improvements indicated by the accompanying reductions in miss ratios because of limited memory resources and a strong temporal clustering of cache misses. For the environments simulated here, the most effective fetch strategy improved performance by between 1.7% and 4.5% over the simplest strategy described above.

...read moreread less

92 citations

Patent•

Integrated cache memory system with primary and secondary cache memories

[...]

Dennis L. Segers

21 Nov 1990

TL;DR: In this article, the data is stored in an appropriate one of the arrays (88)-(94) and transferred through the primary cache (26) via transfer circuits (96), (98), (100) and (102) to the data bus (32).

...read moreread less

Abstract: A central processing unit (10) has a cache memory system (24) associated therewith for interfacing with a main memory system (23). The cache memory system (24) includes a primary cache (26) comprised of SRAMS and a secondary cache (28) comprised of DRAM. The primary cache (26) has a faster access than the secondary cache (28). When it is determined that the requested data is stored in the primary cache (26) it is transferred immediately to the central processing unit (10). When it is determined that the data resides only in the secondary cache (28), the data is accessed therefrom and routed to the central processing unit (10) and simultaneously stored in the primary cache (26). If a hit occurs in the primary cache (26), it is accessed and output to a local data bus (32). If only the secondary cache (28) indicates a hit, data is accessed from the appropriate one of the arrays (80)-(86) and transferred through the primary cache ( 26) via transfer circuits (96), (98), (100) and (102) to the data bus (32). Simultaneously therewith, the data is stored in an appropriate one of the arrays (88)-(94). When a hit does not occur in either the secondary cache (28) or the primary cache (26), data is retrieved from the main system memory (23) through a buffer/multiplexer circuit on one side of the secondary cache (28) and passed through both the secondary cache (28) and the primary cache (26) and stored therein in a single operation due to the line for line transfer provided by the transfer circuits (96)-(102).

...read moreread less

92 citations

Patent•

Cache contained type semiconductor memory device and operating method therefor

[...]

Kazuyasu Fujishima¹, Charles A Hart¹•Institutions (1)

Mitsubishi¹

14 Jun 1990

TL;DR: In this article, a dynamic random access memory with a fast serial access mode for use in a simple cache system includes a plurality of memory cell blocks prepared by division of a memory cell array.

...read moreread less

Abstract: A dynamic random access memory with a fast serial access mode for use in a simple cache system includes a plurality of memory cell blocks prepared by division of a memory cell array, a plurality of data latches each provided for each column in the memory cell blocks and a block selector. When a cache miss signal is produced by the cache system, data on the column in the cell block selected by the block decoder are transferred into the data latches provided for the columns in the selected block after selection. When a cache hit signal is produced by the cache system, the data latches are isolated from the memory cell array. Accessing is made to at least one of the data latches based on an externally applied column address on cache hit, and to at least one of the columns in the selected block based on the column address on cache miss.

...read moreread less

86 citations

Proceedings Article•DOI•

Parallel trace-driven cache simulation by time partitioning

[...]

Philip Heidelberger¹, Harold S. Stone¹•Institutions (1)

IBM¹

01 Dec 1990

TL;DR: The authors describe a technique for performing parallel simulation of a trace of address references for the purpose of evaluating different cache structures, and found that after simulating the trace in parallel, a small amount of resimulation can produce the correct counts of cache hits and misses.

...read moreread less

Abstract: The authors describe a technique for performing parallel simulation of a trace of address references for the purpose of evaluating different cache structures. One way to achieve fast parallel simulation is to simulate the individual independent sets of a cache concurrently on different computers, but this technique is not efficient in a statistical sense because of a high correlation of the activity between different sets. Only a small fraction of sets should actually be simulated. To put parallelism to effective use, a trace of the sets to be simulated can be partitioned into disjoint time intervals, and each interval can be simulated concurrently. Because the contents of the cache are unknown at the start of the time intervals, this parallel simulation does not produce the correct counts of cache hits and misses. However, after simulating the trace in parallel, a small amount of resimulation can produce the correct counts. The resimulation effort required is proportional to the size of the cache simulated and not to the length of the trace. >

...read moreread less

78 citations

Proceedings Article•DOI•

SMART (strategic memory allocation for real-time) cache design using the MIPS R3000

[...]

D.B. Kirk¹, Jay K. Strosnider¹•Institutions (1)

Carnegie Mellon University¹

05 Dec 1990

TL;DR: SMART, a technique for providing predictable cache performance for real-time systems with priority-based preemptive scheduling, is presented and the value density acceleration (VDA) cache allocation algorithm is introduced and shown to be suitable for run-time cache allocation.

...read moreread less

Abstract: SMART, a technique for providing predictable cache performance for real-time systems with priority-based preemptive scheduling, is presented. The technique is implemented in a R3000 cache design. The value density acceleration (VDA) cache allocation algorithm is also introduced, and shown to be suitable for run-time cache allocation. >

...read moreread less

77 citations

Patent•

Controller for two-way set associative cache

[...]

John H. Crawford¹, Sundaravarathan R. Iyengar¹, James Nadir¹•Institutions (1)

Intel¹

28 Nov 1990

TL;DR: In this paper, the cache controller tagram is configured into two ways, each way including tag and valid-bit storage for associatively searching the directory for cache data-array addresses, and the external cache memory is organized such that both ways are simultaneously available to a number of available memory modules in the system to thereby allow the way access time to occur in parallel with the tag lookup.

...read moreread less

Abstract: A cache controller (10) which sits in parallel with a microprocessor bus (14, 15, 29) so as not to impede system response in the event of a cache miss. The cache controller tagram (24) is configured into a two ways, each way including tag and valid-bit storage for associatively searching the directory for cache data-array addresses. The external cache memory (8) is organized such that both ways are simultaneously available to a number of available memory modules in the system to thereby allow the way access time to occur in parallel with the tag lookup.

...read moreread less

Patent•

Apparatus and method for reducing interference in two-level cache memories

[...]

Charles P. Thacker

30 Nov 1990

TL;DR: In this paper, a cache check request signal is transmitted on the shared data bus, which enables all the cache memories to update their contents if necessary, and each cache has control logic for determining when a specified address location is stored in one of its lines or blocks.

...read moreread less

Abstract: In a multiprocessor computer system, a number of processors are coupled to main memory by a shared memory bus, and one or more of the processors have a two level direct mapped cache memory. When any one processor updates data in a shared portion of the address space, a cache check request signal is transmitted on the shared data bus, which enables all the cache memories to update their contents if necessary. Since both caches are direct mapped, each line of data stored in the first cache is also stored in one of the blocks in the second cache. Each cache has control logic for determining when a specified address location is stored in one of its lines or blocks. To avoid spurious accesses to the first level cache when a cache check is performed, the second cache has a special table which stores a pointer for each line in said first cache array. This pointer denotes the block in the second cache which stores the same data as is stored in the corresponding line of the first cache. When the control logic of the second cache indicates that the specified address for a cache check is located in the second cache, a lookup circuit compares the pointer in the special table which corresponds to the specified address with a subset of the bits of the specified address. If the two match, then the specified address is located in the first cache, and the first cache is updated.

...read moreread less

Patent•

Central processing unit checkpoint retry for store-in and store-through cache systems

[...]

Barbara A. Hall¹, Kevin Huang¹, John David Jabusch¹, Agnes Yee Ngai¹•Institutions (1)

IBM¹

04 Oct 1990

TL;DR: A checkpoint retry system for recovery from an error condition in a multiprocessor type central processing unit which may have a store-in or store-through cache system is presented in this paper.

...read moreread less

Abstract: A checkpoint retry system for recovery from an error condition in a multiprocessor type central processing unit which may have a store-in or a store-through cache system. At detection of a checkpoint instruction, the system initiates action to save the content of the program status word, the floating point registers, the access registers and the general purpose registers until the store operations are completed for the checkpointed sequence. For processors which have a store-in cache, modified cache data is saved in a store buffer until the checkpointed instructions are completed and then written to a cache which is accessible to other processors in the system. For processors which utilize store-through cache, the modified data for the checkpointed instructions is also stored in the store buffer prior to storage in the system memory.

...read moreread less

Patent•

Microprocessor having cache bypass signal terminal

[...]

Hitoshi Yamahata¹•Institutions (1)

NEC¹

20 Jun 1990

TL;DR: In this paper, a cache bypass signal generator is used to prevent the cache memory from performing a data caching operation on the data in order to prevent data to be cache-bypassed without checking bus status signals.

...read moreread less

Abstract: A microprocessor capable of being incorporated in an information processing system with a cache memory unit and capable of realizing fine cache bypass control. The microprocessor can detect data to be cache-bypassed without checking bus status signals. The microprocessor is equipped with a cache bypass signal generator. Upon detection of data to be bypassed, the cache bypass signal generator generates a cache bypass request signal, which prevents the cache memory from performing a data caching operation on the data.

...read moreread less

Patent•

Non-blocking serialization for caching data in a shared cache

[...]

Chandrasekaran Mohan¹, Inderpal S. Narang¹•Institutions (1)

IBM¹

14 Dec 1990

TL;DR: In this paper, a method of controlling entry of a block of data is used with a high-speed cache which is shared by a plurality of independently-operating computer systems in a multi-system data sharing complex.

...read moreread less

Abstract: A method of controlling entry of a block of data is used with a high-speed cache which is shared by a plurality of independently-operating computer systems in a multi-system data sharing complex. Each computer system has access both to the high-speed cache and to lower-speed, upper-level storage for obtaining and storing data. Management logic in the high-speed cache assures that the block of data entered into the cache will not be overwritten by an earlier version of the block of data obtained from the upper-level storage.

...read moreread less

Patent•

Method and apparatus for controlling one or more hierarchical memories using a virtual storage scheme and physical to virtual address translation

[...]

Toshio Doi¹, Takemoto Takeshi¹, Nakatsuka Yasuhiro¹•Institutions (1)

Hitachi¹

26 Oct 1990

TL;DR: In this article, an integrated circuit structure for a multiprocessor system includes an execution unit operative on the basis of a virtual storage scheme and a cache memory having entries designated by logical addresses from the execution unit.

...read moreread less

Abstract: A processing apparatus of an integrated circuit structure for a multiprocessor system includes an execution unit operative on the basis of a virtual storage scheme and a cache memory having entries designated by logical addresses from the execution unit For controlling the cache memory, a first address array containing entries designated by the same logical addresses as the cache memory and storing control information for the corresponding entries of the cache memory is provided in association with a second address array having entries designated by physical addresses and storing translation information for translation of physical addresses to logical addresses for the entries When a physical address at which invalidation is to be performed is inputted in response to a cache memory invalidation request supplied externally, access is made to the second address array by using the physical address to obtain the translation information from the second address array to thereby generate a logical address to be invalidated The first address array is accessed by using the generated logical address to perform a invalidation processing on the control information

...read moreread less

Patent•

Cached random access memory device and system

[...]

Thomas A. Dye¹•Institutions (1)

Texas Instruments¹

21 Feb 1990

TL;DR: In this article, the authors proposed an approach to reduce access time to RAM arrays, especially DRAMs, by including fast access cache rows, e.g., four rows, to store data from accessed rows of the array, where data can then be accessed without precharging, row decoding sensing, and other cycling usually required to access the DRAM.

...read moreread less

Abstract: A device for reducing access time to RAM arrays, especially DRAMs, by including fast access cache rows, e.g., four rows, to store data from accessed rows of the array, where data can then be accessed without precharging, row decoding sensing, and other cycling usually required to access the DRAM. Address registers, comparators, and MRU/LRU register and other cache control logic may be included in the device. The device allows parallel transfer of data between the RAM array and the cache rows. The device may be constructed on a single chip. A system is disclosed which makes use of the cache RAM features in a data processing system to take advantage of the attributes of a cache RAM memory.

...read moreread less

Patent•

Write-back cache with ECC protection

[...]

Michael A. Callander

01 Oct 1990

TL;DR: In this paper, a write-back cache with parity and error correction coding (ECC) is proposed, where the parity and ECC codes are generated by a memory interface when data is transferred by main memory to the central processing unit associated with the cache.

...read moreread less

Abstract: This invention relates to a write-back cache which is protected with parity and error correction coding ("ECC"). The parity and ECC codes are generated by a memory interface when data is transferred by main memory to the central processing unit ("CPU") associated with the cache. Thus, all data originating in main memory will be parity and ECC encoded when it passes through the memory interface, and the data, and its related parity information and ECC codes will be stored in the cache. On the other hand, data which is taken from the cache and modified by the CPU during its processing operations is also transferred to the memory interface for ECC encoding. Thus, all data modified by the CPU is also protected, and the modified data, and its related parity information and ECC codes are also stored in the cache. The memory interface also contains ECC checking and correcting circuitry which can correct erroneous data, on the basis of its ECC code, if that data has been corrupted while stored in the cache, or during transmission on a bus. Therefore, if data in the cache is corrupted, it can be corrected when it is returned to main memory via the memory interface. Accordingly, the invention makes a write-back cache compatible with full ECC protection, even though, at times, the cache may contain the only correct, current copy of given data in the system.

...read moreread less

Journal Article•DOI•

Synchronization with multiprocessor caches

[...]

Joonwon Lee¹, Umakishore Ramachandran¹•Institutions (1)

Georgia Institute of Technology¹

01 May 1990

TL;DR: A new lock-based cache scheme which incorporates synchronization into the cache coherency mechanism, and a new simulation model is developed embodying a widely accepted paradigm of parallel programming that outperforms existing cache protocols.

...read moreread less

Abstract: Introducing private caches in bus-based shared memory multiprocessors leads to the cache consistency problem since there may be multiple copies of shared data. However, the ability to snoop on the bus coupled with the fast broadcast capability allows the design of special hardware support for synchronization. We present a new lock-based cache scheme which incorporates synchronization into the cache coherency mechanism. With this scheme high-level synchronization primitives as well as low-level ones can be implemented without excessive overhead. Cost functions for well-known synchronization methods are derived for invalidation schemes, write update schemes, and our lock-based scheme. To accurately predict the performance implications of the new scheme, a new simulation model is developed embodying a widely accepted paradigm of parallel programming. It is shown that our lock-based protocol outperforms existing cache protocols.

...read moreread less

Patent•

Non-blocking serialization for removing data from a shared cache

[...]

Chandrasekaran Mohan¹, Inderpal S. Narang¹•Institutions (1)

IBM¹

14 Dec 1990

TL;DR: In this article, a high-speed cache is shared by a plurality of independently-operating data systems in a multi-system data sharing complex, where each data system has access both to the high speed cache and the lower-speed, secondary storage for obtaining and storing data.

...read moreread less

Abstract: A high-speed cache is shared by a plurality of independently-operating data systems in a multi-system data sharing complex. Each data system has access both to the high-speed cache and the lower-speed, secondary storage for obtaining and storing data. Management logic and the high-speed cache assures that a block of data obtained form the cache for entry into the secondary storage will be consistent with the version of the block of data in the shared cache.

...read moreread less

Proceedings Article•DOI•

Efficient trace-driven simulation method for cache performance analysis

[...]

Wen-Hann Wang¹, Jean-Loup Baer²•Institutions (2)

IBM¹, University of Washington²

01 Apr 1990

TL;DR: This work reduces the program traces to the extent that exact performance can still be obtained from the reduced traces and devise an algorithm that can produce performance results for a variety of metrics for a large number of set-association write-back caches in just a single simulation run.

...read moreread less

Abstract: We propose improvements to current trace-driven cache simulation methods to make them faster and more economical. We attack the large time and space demands of cache simulation in two ways. First, we reduce the program traces to the extent that exact performance can still be obtained from the reduced traces. Second, we devise an algorithm that can produce performance results for a variety of metrics (hit ratio, write-back counts, bus traffic) for a large number of set-associative write-back caches in just a single simulation run. The trace reduction and the efficient simulation techniques are extended to parallel multiprocessor cache simulations. Our simulation results show that our approach substantially reduces the disk space needed to store the program traces and can dramatically speedup cache simulations and still produce the exact results.

...read moreread less

Patent•

Pipelined error checking and correction for cache memories

[...]

Hu Herbert Chao¹, Jung-Herng Chang¹•Institutions (1)

IBM¹

20 Mar 1990

TL;DR: In this paper, a scheme for the implementation of single error correction, double error detection function is provided for cache memories wherein the normal cache access time is not affected by the addition of the ECC function.

...read moreread less

Abstract: A scheme for the implementation of single error correction, double error detection function is provided for cache memories wherein the normal cache access time is not affected by the addition of the ECC function. Check bits are provided for multiple bytes of data, thereby lowering the overhead of the error detecting and correcting technique. When a single error is detected, a cycle is inserted by the control circuitry of the cache chip. At the same time, the clocks for the CPU are held high until released by the cache chip on the next cycle. Error correction on multi-byte data is performed using, for example, the 72/64 Hamming code. The technique requires a 2-port cache array (one write port, and one read port). However, the density of a true 2-port array is too low; therefore, the technique is implemented with a 1-port array using a time multiplexing technique, providing an effective 2-port array but with the density of a single port array.

...read moreread less

Patent•

Variable capacity cache memory

[...]

Daniel Esbensen¹•Institutions (1)

CA Technologies¹

13 Jul 1990

TL;DR: In this article, a variable length cache system (58, 60) keeps track of the amount of available space on an output device (10) and the capacity of the cache system is continuously increased so long as it is less than the available output space on the output unit (10).

...read moreread less

Abstract: A variable length cache system (58, 60) keeps track of the amount of available space on an output device (10). The capacity of the cache system (60) is continuously increased so long as it is less than the available output space on the output unit (10). Once the size of the cache system exceeds the available output space on the output unit (10), which is less than the total space available on the output unit (10) by a predetermined amount, the contents of the cache memory (60) are flushed or written to the output device (10) and the size of the cache memory (60) is reduced to zero.

...read moreread less

Analysis of the periodic update write policy for disk cache systems

[...]

S. D. Carson¹, Sanjeev Setia¹•Institutions (1)

University of Maryland, College Park¹

01 Jul 1990

TL;DR: In this article, the average response time for disk read requests when the periodic update write policy is used is determined, and design criteria that can be used to decide among competing cache write policies.

...read moreread less

Abstract: A disk cache is typically used in file systems to reduce average access time for data storage and retrieval. The 'periodic update' write policy, widely used in existing computer systems, is one in which dirty cache blocks are written to a disk on a periodic basis. The average response time for disk read requests when the periodic update write policy is used is determined. Read and write load, cache-hit ratio, and the disk scheduler's ability to reduce service time under load are incorporated in the analysis, leading to design criteria that can be used to decide among competing cache write policies. The main conclusion is that the bulk arrivals generated by the periodic update policy cause a traffic jam effect which results in severely degraded service. Effective use of the disk cache and disk scheduling can alleviate this problem, but only under a narrow range of operating conditions. Based on this conclusion, alternate write packages that retain the periodic update policy's advantages and provide uniformly better service are proposed. >

...read moreread less

Proceedings Article•DOI•

Blocking: exploiting spatial locality for trace compaction

[...]

Anant Agarwal¹, Minor Huffman¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Apr 1990

TL;DR: A technique called blocking and a variant called blocking with temporal data are presented that compress traces by exploiting spatial locality and show that blocking filtering combined with cache filtering can reduce trace length by nearly two orders of magnitude.

...read moreread less

Abstract: Trace-driven simulation is a popular method of estimating the performance of cache memories, translation lookaside buffers, and paging schemes. Because the cost of trace-driven simulation is directly proportional to trace length, reducing the number of references in the trace significantly impacts simulation time. This paper concentrates on trace driven simulation for cache miss rate analysis. Previous schemes, such as cache filtering, exploited temporal locality for compressing traces and could yield an order of magnitude reduction in trace length. A technique called blocking and a variant called blocking with temporal data are presented that compress traces by exploiting spatial locality. Experimental results show that blocking filtering combined with cache filtering can reduce trace length by nearly two orders of magnitude while introducing about 10% error in cache miss rate estimates.

...read moreread less

Journal Article•DOI•

A 9-ns HIT-delay 32-kbyte cache macro for high-speed RISC

[...]

Kazutaka Nogami¹, Takayasu Sakurai¹, Kazuhiro Sawada¹, Kenji Sakaue¹, Yuichi Miyazawa¹, Sumio Tanaka¹, Yoichi Hiruta¹, K. Katoh¹, Toshinari Takayanagi¹, Tsukasa Shirotori¹, Y. Itoh¹, Masanori Uchida¹, Tetsuya Iizuka¹ - Show less +9 more•Institutions (1)

Toshiba¹

01 Feb 1990-IEEE Journal of Solid-state Circuits

TL;DR: A 32-kB cache macro with an experimental reduced instruction set computer (RISC) is realized and a pipelined cache access to realize a cycle time shorter than the cache access time is proposed.

...read moreread less

Abstract: A 32-kB cache macro with an experimental reduced instruction set computer (RISC) is realized. A pipelined cache access to realize a cycle time shorter than the cache access time is proposed. A double-word-line architecture combines single-port cells, dual-port cells, and CAM cells into a memory array to improve silicon area efficiency. The cache macro exhibits 9-ns typical clock-to-HIT delay as a result of several circuit techniques, such as a section word-line selector, a dual transfer gate, and 1.0- mu m CMOS technology. It supports multitask operation with logical addressing by a selective clear circuit. The RISC includes a double-word load/store instruction using a 64-b bus to fully utilize the on-chip cache macro. A test scheme allows measurement of the internal signal delay. The test device design is based on the unified design rules scalable through multigenerations of process technologies down to 0.8 mu m. >

...read moreread less

Patent•

A data processor for reloading deferred pushes in a copy-back data cache

[...]

Robin W. Edenfield¹, William B. Ledbetter¹•Institutions (1)

Motorola¹

26 Feb 1990

TL;DR: In this article, a data processor is provided for reloading deferred pushes in copy-back cache, which avoids the potential for multiple concurrent exception conditions, and eliminates the problem of unnecessarily removing an otherwise valid cache entry from the cache.

...read moreread less

Abstract: A data processor is provided for reloading deferred pushes in copy-back cache. When a cache "miss" occurs, a cache controller selects a cache line for replacement, and request a burst line read to transfer the required cache line from an external memory. When the date entries in the cache line selected for replacement are marked dirty, the cache controller "pushes" the cache line or dirty portions thereof into a buffer, which stores the cache line pending completion, by a bus interface controller, or the burst line read. When the burst line read terminates abnormally, due to a bus error or bus cache inhibit (or any other reason), the data cache controller reloads the copy-back cache with the cache line stored in the buffer. The reloading of the copy-back cache avoids the potential for multiple concurrent exception conditions, and eliminates the problem of unnecessarily removing an otherwise valid cache entry from the cache.

...read moreread less

Patent•

Efficient cache write technique through deferred tag modification

[...]

Paul Mageau¹•Institutions (1)

Hewlett-Packard¹

22 Jun 1990

TL;DR: In this article, the authors propose a two-stage cache access pipeline which embellishes a simple "write-thru with write-allocate" cache write policy to achieve single cycle cache write access even when the processor cycle time does not allow sufficient time for the cache control to check the cache tag for validity and to reflect those results to the processor within the same processor cycle.

...read moreread less

Abstract: An efficient cache write technique useful in digital computer systems wherein it is desired to achieve single cycle cache write access even when the processor cycle time does not allow sufficient time for the cache control to check the cache "tag" for validity and to reflect those results to the processor within the same processor cycle. The novel method and apparatus comprising a two-stage cache access pipeline which embellishes a simple "write-thru with write-allocate" cache write policy.

...read moreread less

Patent•

Method for operating a cached peripheral data storage subsystem including a step of subsetting the data transfer into subsets of data records

[...]

Brent Cameron Beardsley¹, Michael T. Benhase¹, Gail Andrea Spear¹, William Dennis Williams¹•Institutions (1)

IBM¹

31 Aug 1990

TL;DR: In this paper, the authors consider the problem of determining the maximum cache space for the peripheral cache DASD subsystem, where the write domain is established in the subsystem by the host processor as a number of records to be written on the disk only after the available cache space is compared with a needed cache space.

...read moreread less

Abstract: The peripheral cache DASD subsystem is connected to predetermined host processors. A channel connection between the host processor and the peripheral subsystem has a much higher burst rate then the burst data transfer rate of a DASD while having an extended signal propagation time preventing rapid exchanges of interactive control signals. The branch write occurs in that data is written both to the DASD and to the cache simultaneously. The write domain is established in the subsystem by the host processor as a number of records to be written on the DASD only after the available cache space is compared with a needed cache space for the entire write domain. Whenever the available cache space is less than the write domain needs, then the peripheral cache DASD subsystem subsets the data transfer into a plurality of subset data transfers each having data storable in the available cache data storage space. Calculations for determining the maximum cache space are described and sets of machine operations are disclosed for effecting the above operations.

...read moreread less

Patent•

Microprocessor with two groups of internal buses

[...]

Harutaka Goto¹•Institutions (1)

Toshiba¹

13 Jul 1990

TL;DR: In this paper, a microprocessor has a CPU, cache memories including TLBs, an internal memory control section (IMC), an external bus controller for controlling the data input/output operation between external memories and the cache memories, and a first group of internal buses for connecting the CPU, the cache memory and IMC, and for transferring logical addresses, logical data and data among the CPU and cache memories.

...read moreread less

Abstract: A microprocessor has a CPU, cache memories including TLBs, an internal memory control section (IMC) for controlling the data access operation to the cache memories, an external bus controller for controlling the data input/output operation between external memories and the cache memories, a first group of internal buses for connecting the CPU, the cache memories and IMC, and for transferring logical addresses, logical data and data among the CPU and the cache memories, and a second group of internal buses for connecting the cache memories, IMC and the external bus controller, and for transferring data among the cache memories and the external memories. Each cache memory and IMC are connected to the first group of internal buses and to the second group of internal buses in parallel, and the IMC controls the use of the second group of internal buses, and the data input/output operation to the group of the internal memories.

...read moreread less