
Showing papers on "Cache algorithms published in 1983"


Proceedings ArticleDOI
13 Jun 1983
TL;DR: It is demonstrated that a cache exploiting primarily temporal locality (look-behind) can indeed reduce traffic to memory greatly, and an elegant solution to the cache coherency problem is introduced.
Abstract: The importance of reducing processor-memory bandwidth is recognized in two distinct situations: single board computer systems and microprocessors of the future. Cache memory is investigated as a way to reduce the memory-processor traffic. We show that traditional caches which depend heavily on spatial locality (look-ahead) for their performance are inappropriate in these environments because they generate large bursts of bus traffic. A cache exploiting primarily temporal locality (look-behind) is then proposed and demonstrated to be effective in an environment where process switches are infrequent. We argue that such an environment is possible if the traffic to backing store is small enough that many processors can share a common memory and if the cache data consistency problem is solved. We demonstrate that such a cache can indeed reduce traffic to memory greatly, and introduce an elegant solution to the cache coherency problem.

431 citations
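As a rough illustration of why a look-behind cache keeps bus traffic low and unbursty, here is a minimal Python sketch (my own, not the authors' simulator): it counts words moved over the bus by a block-fetching (look-ahead) LRU cache versus a word-at-a-time LRU cache on the same toy trace. The trace, block size, and cache sizes are invented for the example.

```python
# Hypothetical comparison: words transferred by a block-fetching (look-ahead)
# cache versus a word-at-a-time (look-behind) cache on one toy address trace.
from collections import OrderedDict

def block_fetch_traffic(trace, num_blocks, block_words=4):
    """LRU cache of whole blocks; each miss moves block_words over the bus in a burst."""
    cache, words = OrderedDict(), 0
    for addr in trace:
        blk = addr // block_words
        if blk in cache:
            cache.move_to_end(blk)
        else:
            words += block_words            # burst of block_words bus transfers
            cache[blk] = True
            if len(cache) > num_blocks:
                cache.popitem(last=False)
    return words

def word_fetch_traffic(trace, num_words):
    """LRU cache of single words; each miss moves exactly one word."""
    cache, words = OrderedDict(), 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)
        else:
            words += 1
            cache[addr] = True
            if len(cache) > num_words:
                cache.popitem(last=False)
    return words

trace = [0, 1, 2, 0, 1, 2, 100, 0, 1, 2, 100, 101] * 50   # toy loop-like trace
print(block_fetch_traffic(trace, num_blocks=4),
      word_fetch_traffic(trace, num_words=16))
```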


Patent
21 Oct 1983
TL;DR: In this paper, short traces of consecutive CPU references to storage are accumulated and processed to ascertain hit ratio as a function of cache size. From this determination, an allocation of cache can be made.
Abstract: Short traces of consecutive CPU references to storage are accumulated and processed to ascertain hit ratio as a function of cache size. From this determination, an allocation of cache can be made. Because this determination requires minimal processing time, LRU-referenceable memory space among concurrently executing sequential processes is used dynamically by a CPU cache manager.

117 citations
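A minimal sketch of the underlying idea, assuming nothing about the patent's actual mechanism: given a short reference trace, the hit ratio as a function of cache size can be estimated by simulating an LRU cache at several candidate sizes. The toy trace and sizes below are illustrative only.

```python
# Illustrative only: hit ratio versus cache size from a short reference trace,
# obtained by simulating an LRU cache at several candidate sizes.
from collections import OrderedDict

def lru_hit_ratio(trace, size):
    cache, hits = OrderedDict(), 0
    for ref in trace:
        if ref in cache:
            hits += 1
            cache.move_to_end(ref)
        else:
            cache[ref] = True
            if len(cache) > size:
                cache.popitem(last=False)   # evict the least recently used entry
    return hits / len(trace)

trace = [1, 2, 3, 1, 2, 4, 1, 2, 3, 5, 1, 2] * 20          # toy short trace
for size in (2, 3, 4, 5):
    print(size, round(lru_hit_ratio(trace, size), 3))
```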


Patent
21 Dec 1983
TL;DR: In this paper, a hierarchical memory system for use with a high speed data processor is described, characterized by separate dedicated cache memories for data and instructions, each cache having a unique cache directory containing a plurality of control bits for assisting line replacement within the individual cache memories and for ensuring that unnecessary or incorrect data is never stored back into main memory.
Abstract: A hierarchical memory system for use with a high speed data processor characterized by having separate dedicated cache memories for storing data and instructions and further characterized by each cache having a unique cache directory containing a plurality of control bits for assisting line replacement within the individual cache memories and for eliminating many accesses to main memory and to insure that unnecessary or incorrect data is never stored back into said main memory. The present cache architecture and control features render broadcasting between the data cache and instruction cache unnecessary. Modification of the instruction cache is not permitted. Accordingly, control bits indicating a modification in the cache directory for the instruction cache are not necessary and similarly it is never necessary to store instruction cache lines back into main memory since their modification is not permitted. The cache architecture and controls permit normal instruction and data cache fetches and data cache stores. Additionally, special instructions are provided for setting the special control bits provided in both the instruction and data cache directories, independently of actual memory accessing OPS by the CPU and for storing and loading cache lines independently of memory OPS by the CPU.

107 citations


Journal ArticleDOI
TL;DR: Measurements are reported including the hit ratios of data and instruction references, the rate of cache invalidations by I/O, and the amount of waiting time due to cache misses.

96 citations


Journal ArticleDOI
TL;DR: Efficient methods are discussed for calculating the success function of replacement policies used to manage very large fixed-size caches, including how to modify Bennett and Kruskal's algorithm to run in bounded space.

88 citations
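For context, a success function of this kind can be computed in a single pass over a trace using LRU stack distances. The sketch below is my own illustration in that spirit, not the paper's algorithm: it uses a Fenwick (binary indexed) tree to count the distinct blocks touched since each block's previous reference, then accumulates the hit ratio for every cache size at once.

```python
# One-pass LRU stack-distance computation (illustrative, not the paper's
# exact method): returns hit_ratio(C) for every fully associative LRU cache
# size C over the given trace.
def success_function(trace):
    n = len(trace)
    bit = [0] * (n + 1)                      # Fenwick tree over positions 1..n

    def update(i, delta):
        while i <= n:
            bit[i] += delta
            i += i & -i

    def prefix(i):
        s = 0
        while i > 0:
            s += bit[i]
            i -= i & -i
        return s

    last = {}                                # block -> position of last reference
    dist_hist = {}                           # stack distance -> count
    for i, block in enumerate(trace, start=1):
        if block in last:
            p = last[block]
            d = prefix(i - 1) - prefix(p) + 1   # distinct blocks since last use
            dist_hist[d] = dist_hist.get(d, 0) + 1
            update(p, -1)                    # block's "live" position moves to i
        update(i, 1)
        last[block] = i

    hits, out = 0, {}
    for c in range(1, len(last) + 1):        # cumulative: hit ratio vs cache size
        hits += dist_hist.get(c, 0)
        out[c] = hits / n
    return out

print(success_function([1, 2, 3, 1, 2, 4, 1, 2, 3, 5] * 10))
```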


Proceedings ArticleDOI
13 Jun 1983
TL;DR: It is concluded theoretically that random replacement is better than LRU and FIFO, and that under certain circumstances a direct-mapped or set-associative cache may perform better than a fully associative cache organization.
Abstract: Instruction caches are analyzed both theoretically and experimentally. The theoretical analysis begins with a new model for cache referencing behavior—the loop model. This model is used to study cache organizations and replacement policies. It is concluded theoretically that random replacement is better than LRU and FIFO, and that under certain circumstances, a direct-mapped or set-associative cache may perform better than a fully associative cache organization. Experimental results using instruction trace data are then given. The experimental results are shown to support the theoretical conclusions.

79 citations
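The loop-model conclusion is easy to reproduce with a toy simulation (mine, not the paper's): when a loop of N blocks is replayed through a fully associative cache of C < N frames, LRU and FIFO miss on every reference, while random replacement retains some useful blocks. The loop length, cache size, and trace below are assumptions for illustration.

```python
import random
from collections import OrderedDict

def run(trace, cache_size, policy, seed=0):
    random.seed(seed)
    cache = OrderedDict()            # insertion order doubles as FIFO/LRU order
    hits = 0
    for b in trace:
        if b in cache:
            hits += 1
            if policy == "LRU":
                cache.move_to_end(b) # only LRU reorders on a hit
        else:
            if len(cache) == cache_size:
                if policy == "RANDOM":
                    cache.pop(random.choice(list(cache)))
                else:                # LRU and FIFO both evict the oldest entry
                    cache.popitem(last=False)
            cache[b] = True
    return hits / len(trace)

trace = list(range(12)) * 200        # a loop of 12 blocks, replayed many times
for policy in ("LRU", "FIFO", "RANDOM"):
    print(policy, round(run(trace, cache_size=8, policy=policy), 3))
```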


Patent
01 Jul 1983
TL;DR: In this article, the authors propose a method for Direct (DASD) cache management that reduces the volume of data transfer between DASD and cache while avoiding the complexity of managing variable length records in the cache.
Abstract: A method for direct access storage device (DASD) cache management that reduces the volume of data transfer between DASD and cache while avoiding the complexity of managing variable length records in the cache. This is achieved by always choosing the starting point for staging a record to be at the start of the missing record and, at the same time, allocating and managing cache space in fixed length blocks. The method steps require staging records, starting with the requested record and continuing until either the cache block is full, the end of track is reached, or a record already in the cache is encountered.

70 citations
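A hedged sketch of that staging rule: stage records into a fixed-length cache block starting at the missed record, stopping when the block is full, the end of the track is reached, or an already-cached record is met. Record identifiers, lengths, and the cache interface here are invented for the example.

```python
def stage(track_records, miss_index, cache, block_capacity):
    """track_records: list of (record_id, length); cache: set of cached record_ids."""
    staged, used = [], 0
    for record_id, length in track_records[miss_index:]:
        if record_id in cache:              # record already cached: stop staging
            break
        if used + length > block_capacity:  # fixed-length cache block is full
            break
        staged.append(record_id)
        used += length
        cache.add(record_id)
    return staged                           # end of track ends the loop naturally

cache = {"R5"}
track = [("R1", 2), ("R2", 3), ("R3", 1), ("R4", 4), ("R5", 2)]
print(stage(track, miss_index=1, cache=cache, block_capacity=8))  # ['R2', 'R3', 'R4']
```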


Proceedings ArticleDOI
13 Jun 1983
TL;DR: In designing a VLSI instruction cache for a RISC microprocessor, the authors have uncovered four ideas potentially applicable to other VLSI machines that provide expansible cache memory, increased cache speed, reduced program code size, and decreased manufacturing costs.
Abstract: A cache was first used in a commercial computer in 1968, and researchers have spent the last 15 years analyzing caches and suggesting improvements. In designing a VLSI instruction cache for a RISC microprocessor we have uncovered four ideas potentially applicable to other VLSI machines. These ideas provide expansible cache memory, increased cache speed, reduced program code size, and decreased manufacturing costs. These improvements blur the habitual distinction between an instruction cache and an instruction fetch unit. The next four sections present the four architectural ideas, followed by a section on performance evaluation of each idea. We then describe the implementation of the cache and finally summarize the results.

66 citations


Patent
17 Oct 1983
TL;DR: In this paper, the cache operating cycle is divided into two subcycles dedicated to mutually exclusive operations: the first subcycle is dedicated to receiving a central processor memory read request, with its address.
Abstract: A data processing machine in which the cache operating cycle is divided into two subcycles dedicated to mutually exclusive operations. The first subcycle is dedicated to receiving a central processor memory read request, with its address. The second subcycle is dedicated to every other kind of cache operation, in particular either (a) receiving an address from a peripheral processor for checking the cache contents after a peripheral processor write to main memory, or (b) writing anything to the cache, including an invalid bit after a cache check match condition, or data after either a cache miss or a central processor write to main memory. The central processor can continue uninterruptedly to read the cache on successive central processor microinstruction cycles, regardless of the fact that the cache contents are being "simultaneously" checked, invalidated or updated after central processor writes. After a cache miss, although the central processor must be stopped to permit updating, it can resume operations a cycle earlier than is possible without the divided cache cycle.

62 citations
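A rough behavioural sketch of the divided cycle (my own simplification, not the patent's circuitry): CPU read lookups occupy subcycle 1 of every cycle, while queued invalidations and updates drain one at a time in subcycle 2, so reads never wait for them. Addresses and operations below are invented.

```python
# Each cache cycle has two subcycles: CPU reads in subcycle 1, everything
# else (snoop invalidations, miss fills, CPU-write updates) in subcycle 2.
from collections import deque

cache = {0x10: "A", 0x20: "B", 0x30: "C"}
cpu_reads = deque([0x10, 0x20, 0x30, 0x10])        # one CPU lookup per cycle
other_ops = deque([("invalidate", 0x20, None),     # snooped peripheral write
                   ("update", 0x40, "D")])         # fill after an earlier miss

cycle = 0
while cpu_reads or other_ops:
    cycle += 1
    if cpu_reads:                                   # subcycle 1: CPU read lookup
        addr = cpu_reads.popleft()
        print(f"cycle {cycle}.1 read {addr:#x} ->", cache.get(addr, "MISS"))
    if other_ops:                                   # subcycle 2: everything else
        kind, addr, value = other_ops.popleft()
        if kind == "invalidate":
            cache.pop(addr, None)
        else:
            cache[addr] = value
        print(f"cycle {cycle}.2 {kind} {addr:#x}")
```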


Journal ArticleDOI
J. Voldman, Benoit B. Mandelbrot, Lee Windsor Hoevel, J. Knight, P. L. Rosenfeld
TL;DR: This paper uses fractals to model the clustering of cache misses as a discriminant between interactive and batch environments and finds that the cluster dimension provides a measure of the intrinsic differences between workloads.
Abstract: This paper uses fractals to model the clustering of cache misses. The clustering of cache misses can be quantified by a single number analogous to a fractional dimension, and we are intrigued by the possibility that this number can be used as a measure of software complexity. The essential intuition is that cache misses are a direct reflection of changes in locality of reference, and that complex software requires more frequent (and larger) changes in this locality than simple software. The cluster dimension provides a measure (and perhaps the basis for a model) of the intrinsic differences between workloads. In this paper, we focus on cache miss activity as a discriminant between interactive and batch environments.

52 citations
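A hedged illustration of the clustering idea, not the authors' fractal model: one crude way to assign a "cluster dimension" to a miss record is box counting over the miss timestamps, fitting the slope of log N(s) against log(1/s). The synthetic burst pattern and scales below are assumptions.

```python
import math, random

random.seed(0)
# Synthetic miss times: dense bursts (phase changes) separated by long gaps.
miss_times = [burst * 10_000 + random.randrange(500)
              for burst in range(20) for _ in range(50)]

def box_count(times, scale):
    return len({t // scale for t in times})          # boxes containing >= 1 miss

scales = [10, 100, 1_000, 10_000]
counts = [box_count(miss_times, s) for s in scales]

# Least-squares slope of log N(s) vs log(1/s): a crude cluster-dimension estimate.
xs = [math.log(1 / s) for s in scales]
ys = [math.log(n) for n in counts]
k = len(xs)
slope = (k * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / \
        (k * sum(x * x for x in xs) - sum(xs) ** 2)
print(list(zip(scales, counts)), "estimated dimension ~", round(slope, 2))
```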


Patent
22 Feb 1983
TL;DR: In this article, an instruction is transferred to a region from the main data memory in response to a program address and may be executed without waiting for simultaneous transfer of a large block or number of instructions.
Abstract: A memory system includes a high-speed, multi-region instruction cache, each region of which stores a variable number of instructions received from a main data memory said instructions forming part of a program. An instruction is transferred to a region from the main data memory in response to a program address and may be executed without waiting for simultaneous transfer of a large block or number of instructions. Meanwhile, instructions at consecutively subsequent addresses in the main data memory are transferred to the same region for building an expanding cache of rapidly accessible instructions. The expansion of a given region is brought about as a result of the addressing of that region, such that a cache region receiving a main line of the aforementioned program will be expanded in preference to a region receiving an occasionally used sub-routine. When a new program address is presented, a simultaneous comparison is made with pointers which are provided to be indicative of addresses of instructions currently stored in the various cache regions, and stored information is gated from a region which produces a favorable comparison. When a new address is presented to which no cache region is responsive, the least recently used region, that is the region that has been accessed least recently, is immediately invalidated and reused by writing thereover, starting with the new address to which no cache region was responsive, for accumulating a substituted cache of information from the main data memory.
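An illustrative software model of the multi-region scheme described above (the data structures, region count, and trace are my assumptions, not the patent's design): each region holds a base pointer and a growing run of consecutive instructions; an address matching no region reuses the least recently used region starting at the new address.

```python
class RegionCache:
    def __init__(self, num_regions=4):
        self.regions = [{"base": None, "instrs": []} for _ in range(num_regions)]
        self.lru = list(range(num_regions))          # front = least recently used

    def _touch(self, idx):
        self.lru.remove(idx)
        self.lru.append(idx)

    def fetch(self, addr, memory):
        for idx, r in enumerate(self.regions):       # compare addr with all region pointers
            if r["base"] is not None and r["base"] <= addr < r["base"] + len(r["instrs"]):
                self._touch(idx)
                return r["instrs"][addr - r["base"]]           # hit in an existing region
            if r["base"] is not None and addr == r["base"] + len(r["instrs"]):
                r["instrs"].append(memory[addr])               # expand this region
                self._touch(idx)
                return r["instrs"][-1]
        victim = self.lru[0]                          # total miss: reuse LRU region
        self.regions[victim] = {"base": addr, "instrs": [memory[addr]]}
        self._touch(victim)
        return memory[addr]

memory = {a: f"insn@{a}" for a in range(64)}
ic = RegionCache()
for a in [0, 1, 2, 32, 33, 0, 1, 3, 34]:              # main line plus a subroutine
    ic.fetch(a, memory)
print([r["base"] for r in ic.regions])                # [0, 32, None, None]
```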

Patent
30 Jun 1983
TL;DR: A multilevel set associative cache system, whose directory and cache store are organized into levels of memory locations, includes control apparatus which, in response to error signals from directory error checking circuits, selectively degrades cache operation to those levels detected to be free from errors.
Abstract: A multilevel set associative cache system whose directory and cache store are organized into levels of memory locations includes control apparatus which selectively degrades cache operation in response to error signals from directory error checking circuits to those levels detected to be free from errors. Test control apparatus which couples to the directory error checking apparatus operates to selectively enable and disable the directory error checking circuits in response to commands received from a central processing unit so as to enable the testing of the cache directory and other portions of the cache system using common test routines.

Patent
30 Jun 1983
TL;DR: In this paper, a multilevel set associative cache system whose directory and cache store are organized into levels of memory locations includes control apparatus which selectively degrades cache operation in response to error signals from directory error checking circuits to those levels detected to be free from errors.
Abstract: A multilevel set associative cache system whose directory and cache store are organized into levels of memory locations includes control apparatus which selectively degrades cache operation in response to error signals from directory error checking circuits to those levels detected to be free from errors. Test apparatus is coupled to the control apparatus and operates to selectively alter the operational states of the cache levels in response to commands received from a central processing unit for enabling testing of such control apparatus in addition to the other cache control areas.

Patent
28 Feb 1983
TL;DR: Cache memory includes a dual or two-part cache with one part of the cache being primarily designated for instruction data while the other part is designated for operand data, but not exclusively as discussed by the authors.
Abstract: Cache memory includes a dual or two part cache with one part of the cache being primarily designated for instruction data while the other is primarily designated for operand data, but not exclusively. For a maximum speed of operation, the two parts of the cache are equal in capacity. The two parts of the cache, designated I-Cache and O-Cache, are semi-independent in their operation and include arrangements for effecting synchronized searches; they can accommodate up to three separate operations substantially simultaneously. Each cache unit has a directory and a data array with the directory and data array being separately addressable. Each cache unit may be subjected to a primary and to one or more secondary concurrent uses with the secondary uses prioritized. Data is stored in the cache unit on a so-called store-into basis wherein data obtained from the main memory is operated upon and stored in the cache without returning the operated upon data to the main memory unit until subsequent transactions require such return.
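A minimal sketch of the "store-into" (write-back) policy described above, with the line granularity, LRU replacement, and sizes assumed for illustration: stores update only the cache, and a line is written back to main memory only when a later transaction forces its eviction.

```python
from collections import OrderedDict

class StoreIntoCache:
    def __init__(self, memory, size=4):
        self.memory, self.size = memory, size
        self.lines = OrderedDict()                  # addr -> (value, dirty)

    def _evict_if_needed(self):
        if len(self.lines) > self.size:
            addr, (value, dirty) = self.lines.popitem(last=False)
            if dirty:                               # written back only on eviction
                self.memory[addr] = value

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = (self.memory[addr], False)
            self._evict_if_needed()
        self.lines.move_to_end(addr)
        return self.lines[addr][0]

    def write(self, addr, value):
        self.lines[addr] = (value, True)            # main memory left stale for now
        self.lines.move_to_end(addr)
        self._evict_if_needed()

memory = {a: 0 for a in range(16)}
c = StoreIntoCache(memory)
c.write(1, 99)
print(memory[1], c.read(1))                         # 0 99: memory not yet updated
for a in range(2, 8):                               # later traffic evicts line 1
    c.read(a)
print(memory[1])                                    # 99 after the write-back
```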

Patent
28 Feb 1983
TL;DR: A cache memory includes a dual or two part cache with one part of the cache being primarily designated for instruction data while the other part is primarily reserved for operand data, but not exclusively as discussed by the authors.
Abstract: A cache memory includes a dual or two part cache with one part of the cache being primarily designated for instruction data while the other is primarily designated for operand data, but not exclusively. For a maximum speed of operation, the two parts of the cache are equal in capacity. The two parts of the cache, designated I-Cache and O-Cache, are semi-independent in their operation and include arrangements for effecting synchronized searches; they can accommodate up to three separate operations substantially simultaneously. Each cache unit has a directory and a data array with the directory and data array being separately addressable. Each cache unit may be subjected to a primary and to one or more secondary concurrent uses with the secondary uses prioritized. Data is stored in the cache unit on a so-called store-into basis wherein data obtained from the main memory is operated upon and stored in the cache without returning the operated upon data to the main memory unit until subsequent transactions require such return. An arrangement is included whereby the real page number of a delayed transaction may be verified.

Journal ArticleDOI
TL;DR: In this paper, a mathematical model is developed which predicts the effect on the miss ratio of running a program in a sequence of interrupted execution intervals, and results are compared to measured miss ratios of real programs executing in an interrupted execution environment.
Abstract: A cache is a small, fast associative memory located between a central processor and primary memory and used to hold copies of the contents of primary memory locations. A key performance measure of a cache is the miss ratio: the fraction of processor references which are not satisfied by the cache and result in primary memory references. The miss ratio is sometimes measured by running a single program to completion; however, in real systems rarely does such uninterrupted execution occur. In this paper a mathematical model is developed which predicts the effect on the miss ratio of running a program in a sequence of interrupted execution intervals. Results from the model are compared to measured miss ratios of real programs executing in an interrupted execution environment.
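A simple experiment in the spirit of that model, not the paper's actual mathematics: measure how the LRU miss ratio of a looping program degrades as execution is chopped into intervals of Q references, with the cache assumed cold at the start of each interval (as if purged by intervening tasks). Trace, cache size, and interval lengths are illustrative.

```python
from collections import OrderedDict

def miss_ratio(trace, cache_size, interval):
    cache, misses = OrderedDict(), 0
    for i, ref in enumerate(trace):
        if interval and i % interval == 0:
            cache.clear()                    # cold start after each interruption
        if ref in cache:
            cache.move_to_end(ref)
        else:
            misses += 1
            cache[ref] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)
    return misses / len(trace)

trace = [x % 8 for x in range(4000)]         # a program re-touching 8 blocks
for q in (0, 1000, 100, 20):                 # 0 means uninterrupted execution
    print(q, round(miss_ratio(trace, cache_size=16, interval=q), 3))
```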

Patent
13 May 1983
TL;DR: In this article, a circuit and method are provided for implementing a predetermined data replacement algorithm, such as least recently used (LRU), for a fast, low capacity cache, in a way that is fast and minimizes circuitry.
Abstract: A circuit and method for implementing a predetermined data replacement algorithm associated with a fast, low capacity cache, such as least recently used (LRU), which is fast and which minimizes circuitry is provided. A latch stores the present status of the replacement algorithm, and an address control signal indicates which one of n sets of stored information in the cache has been most recently accessed, where n is an integer. The predetermined algorithm is implemented by a predetermined permutation table stored in a translator which provides an output signal in response to both the present status of the replacement algorithm and the address control signal. The output signal indicates which one of the n sets of stored information in the cache may be replaced with new information.
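A hedged sketch of the table-driven idea: precompute, for a 4-way cache, a "translator" mapping (current LRU state, accessed way) to the next LRU state, plus a small table giving the way to replace in each state. Encoding the states as permutation indices is my own choice, not the patent's.

```python
from itertools import permutations

N = 4
states = list(permutations(range(N)))                # LRU order, leftmost = victim
index = {p: i for i, p in enumerate(states)}

# next_state[s][way]: state after 'way' becomes the most recently used set
next_state = [[index[tuple([w for w in p if w != way] + [way])]
               for way in range(N)] for p in states]
victim = [p[0] for p in states]                      # least recently used way per state

# Walk a few accesses through the tables, as the latch/translator pair would.
s = index[(0, 1, 2, 3)]
for way in (2, 0, 3, 2):
    s = next_state[s][way]
print(states[s], "-> replace way", victim[s])        # (1, 0, 3, 2) -> replace way 1
```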

Journal ArticleDOI
Yeh, Patel, Davidson
TL;DR: Cache memory organization for parallel-pipelined multiprocessor systems is evaluated: a shared cache avoids the coherence problem of private caches and can attain a higher hit ratio, but suffers performance degradation due to access conflicts.
Abstract: Cache memory organization for parallel-pipelined multiprocessor systems is evaluated. Private caches have a cache coherence problem. A shared cache avoids this problem and can attain a higher hit ratio due to sharing of single copies of common blocks and dynamic allocation of cache space among the processes. However, a shared cache suffers performance degradation due to access conflicts.

01 Jan 1983
TL;DR: An organisation for a cache memory system for use in a microprocessor-based system structured around the multibus or some similar bus is presented and standard dynamic random access memory (DRAM) is used to store the data in the cache.
Abstract: An organisation for a cache memory system for use in a microprocessor-based system structured around the multibus or some similar bus is presented. Standard dynamic random access memory (DRAM) is used to store the data in the cache. Information necessary for control of and access to the cache is held in a specially designed NMOS VLSI chip. The feasibility of this approach has been demonstrated by designing and fabricating the VLSI chip and a test facility. The critical parameters and implementation details are discussed. This implementation supports multiple cards, each containing a processor and a cache. The technique involves monitoring the bus for references to main storage. The contention for cache cycles between the processor and the bus is resolved by using two identical copies of the tag memory.
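A behavioural sketch of the duplicate-tag idea, with all details assumed rather than taken from the paper: the processor probes one copy of the tag memory while a bus monitor checks the second, identical copy against main-storage writes, so the two sides do not contend for the same tag array.

```python
class SnoopingCache:
    def __init__(self):
        self.data = {}                       # tag -> cached value
        self.cpu_tags = set()                # copy 1: used by the processor
        self.bus_tags = set()                # copy 2: used by the bus monitor

    def cpu_read(self, addr, main_memory):
        if addr in self.cpu_tags:            # processor side checks copy 1 only
            return self.data[addr]
        value = main_memory[addr]
        self.data[addr] = value
        self.cpu_tags.add(addr)
        self.bus_tags.add(addr)              # the two copies are kept identical
        return value

    def bus_write_seen(self, addr):
        if addr in self.bus_tags:            # monitor checks copy 2 only
            self.bus_tags.discard(addr)
            self.cpu_tags.discard(addr)      # invalidate the now-stale line
            self.data.pop(addr, None)

main_memory = {0: 10, 1: 11}
c = SnoopingCache()
print(c.cpu_read(0, main_memory))            # 10, now cached
main_memory[0] = 99
c.bus_write_seen(0)                          # another card wrote address 0
print(c.cpu_read(0, main_memory))            # 99, refetched after invalidation
```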

Proceedings ArticleDOI
13 Jun 1983
TL;DR: Effective shared cache organizations are proposed which retain the cache coherency advantage and which have very low access conflict even with very high request rates.
Abstract: Shared-cache memory organizations for parallel-pipelined multiple instruction stream processors avoid the cache coherence problem of private caches by sharing single copies of common blocks. A shared cache may have a higher hit ratio, but suffers performance degradation due to access conflicts. Effective shared cache organizations are proposed which retain the cache coherency advantage and which have very low access conflict even with very high request rates. Analytic expressions for performance based on a Markov model have been found for several important cases. Performance of shared cache organizations and design tradeoffs are discussed.
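For a back-of-the-envelope feel for access conflicts (a standard memory-interleaving estimate, hedged: this is not the Markov model used in the paper): if p processors issue independent, uniformly distributed requests to b cache banks each cycle, the expected number of distinct banks serviced is b(1 - (1 - 1/b)^p).

```python
# Expected banks serviced per cycle under uniform random requests (illustrative).
def expected_banks_busy(processors, banks):
    return banks * (1 - (1 - 1 / banks) ** processors)

for banks in (4, 8, 16):
    accepted = expected_banks_busy(processors=8, banks=banks)
    print(banks, "banks:", round(accepted, 2), "requests accepted,",
          round(accepted / 8, 2), "of the request rate")
```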

Patent
08 Nov 1983
TL;DR: A data storage system includes a host computer (10) and magnetic disk units (26) of diverse types as mentioned in this paper, which store data records at addresses which are generated by a microprocessor (40) in the cache manager.
Abstract: A data storage system includes a host computer (10) and magnetic disk units (26) of diverse types. A solid state cache memory (30) stores data records at addresses which are generated by a microprocessor (40) in the cache manager (32). These addresses include a beginning of track address and an end of track address which span a frame having enough memory locations to store an entire track for a particular type of disk unit (26).

01 Apr 1983
TL;DR: This paper describes a shared cache management scheme to maintain data integrity in a multiprocessing system and describes how this scheme can be implemented in a distributed system.
Abstract: This paper describes a shared cache management scheme to maintain data integrity in a multiprocessing system.

01 Mar 1983
TL;DR: The authors describe a new shared cache scheme for multiprocessors (MPS) that permits each processor to execute directly out of the shared cache.
Abstract: The authors describe a new shared cache scheme for multiprocessors (MPS). The scheme permits each processor to execute directly out of shared cache. The access time is longer than that to private cache. Store-through local caches are used.

Patent
Edward George Drimak
15 Dec 1983
TL;DR: In this article, a cache memory between a CPU and a main memory is employed to store vectors in a cache vector space, and three vector operand address registers (48-50) are employed for reading vector operands from said cache memory and for writing results of vector operations back into cache memory.
Abstract: A cache memory (10), between a CPU and a main memory, is employed to store vectors in a cache vector space (11). Three vector operand address registers (48-50) are employed for reading vector operand elements from said cache memory and for writing results of vector operations back into cache memory. A data path from the cache memory allows vector operand elements to be written into selected local storage registers of the CPU, and a path from the local storage registers to the cache memory includes a buffer. This apparatus allows overlapped reading and writing of vector elements to minimize the time required for vector processing.
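A loose software analogue of the apparatus above (the register names, element size, and the element-wise operation are illustrative, not the patent's): three address registers step through vector operands held in the cache vector space, reading two elements and writing the result back on each iteration.

```python
def vector_add(cache, addr_a, addr_b, addr_result, length):
    a_reg, b_reg, r_reg = addr_a, addr_b, addr_result   # three operand address registers
    for _ in range(length):
        result = cache[a_reg] + cache[b_reg]             # read two operand elements
        cache[r_reg] = result                            # write the result back to cache
        a_reg += 1; b_reg += 1; r_reg += 1               # registers step to the next element
    return cache

cache = list(range(10)) + [0] * 10 + [0] * 10            # toy vector space in the cache
print(vector_add(cache, addr_a=0, addr_b=10, addr_result=20, length=5)[20:25])
```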

Journal ArticleDOI
Dae-Wha Seo, Jung Wan Cho
TL;DR: A new directory-based scheme (BIND) based on a number-balanced binary tree can significantly reduce invalidation latency, directory memory requirements, and network traffic as compared to the existing directory- based schemes.