
Showing papers on "Cache" published in 1983


Proceedings ArticleDOI
13 Jun 1983
TL;DR: It is demonstrated that a cache exploiting primarily temporal locality (look-behind) can indeed greatly reduce traffic to memory, and an elegant solution to the cache coherency problem is introduced.
Abstract: The importance of reducing processor-memory bandwidth is recognized in two distinct situations: single board computer systems and microprocessors of the future. Cache memory is investigated as a way to reduce the memory-processor traffic. We show that traditional caches which depend heavily on spatial locality (look-ahead) for their performance are inappropriate in these environments because they generate large bursts of bus traffic. A cache exploiting primarily temporal locality (look-behind) is then proposed and demonstrated to be effective in an environment where process switches are infrequent. We argue that such an environment is possible if the traffic to backing store is small enough that many processors can share a common memory and if the cache data consistency problem is solved. We demonstrate that such a cache can indeed reduce traffic to memory greatly, and introduce an elegant solution to the cache coherency problem.

431 citations


Patent
21 Oct 1983
TL;DR: In this paper, short traces of consecutive CPU references to storage are accumulated and processed to ascertain hit ratio as a function of cache size. From this determination, an allocation of cache can be made.
Abstract: Short traces of consecutive CPU references to storage are accumulated and processed to ascertain hit ratio as a function of cache size. From this determination, an allocation of cache can be made. Because this determination requires minimal processing time, LRU-referenceable memory space among concurrently executing sequential processes is used dynamically by a CPU cache manager.
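
The abstract above turns on computing the hit ratio as a function of cache size from a short reference trace. As a rough illustration of how such a curve can be obtained in a single pass, the sketch below uses LRU stack distances, which is one common way to do this and not necessarily the patented method; the trace, block granularity, and the LRU assumption are all illustrative.

```python
from collections import Counter

def lru_stack_distances(trace):
    """Return the LRU stack distance of each reference in a trace.

    A distance of d means the referenced block was the d-th most recently
    used block; float('inf') marks a first-time (cold) reference.
    """
    stack = []          # most recently used block is at index 0
    distances = []
    for block in trace:
        if block in stack:
            d = stack.index(block) + 1
            stack.remove(block)
        else:
            d = float('inf')
        distances.append(d)
        stack.insert(0, block)
    return distances

def hit_ratio_by_cache_size(trace, max_size):
    """Hit ratio of an LRU cache of 1..max_size blocks, from one pass over the trace."""
    hist = Counter(lru_stack_distances(trace))
    hits = 0
    ratios = {}
    for size in range(1, max_size + 1):
        hits += hist.get(size, 0)      # references with stack distance <= size are hits
        ratios[size] = hits / len(trace)
    return ratios

# Example trace of block addresses (purely illustrative).
trace = [1, 2, 3, 1, 2, 4, 1, 5, 2, 1, 3, 2]
print(hit_ratio_by_cache_size(trace, max_size=5))
```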

117 citations


Patent
21 Dec 1983
TL;DR: In this paper, a hierarchical memory system for use with a high speed data processor is described, characterized by separate dedicated cache memories for storing data and instructions, and further characterized by each cache having a unique cache directory containing a plurality of control bits for assisting line replacement within the individual cache memories and for ensuring that unnecessary or incorrect data is never stored back into main memory.
Abstract: A hierarchical memory system for use with a high speed data processor characterized by having separate dedicated cache memories for storing data and instructions and further characterized by each cache having a unique cache directory containing a plurality of control bits for assisting line replacement within the individual cache memories and for eliminating many accesses to main memory and to insure that unnecessary or incorrect data is never stored back into said main memory. The present cache architecture and control features render broadcasting between the data cache and instruction cache unnecessary. Modification of the instruction cache is not permitted. Accordingly, control bits indicating a modification in the cache directory for the instruction cache are not necessary and similarly it is never necessary to store instruction cache lines back into main memory since their modification is not permitted. The cache architecture and controls permit normal instruction and data cache fetches and data cache stores. Additionally, special instructions are provided for setting the special control bits provided in both the instruction and data cache directories, independently of actual memory accessing OPS by the CPU and for storing and loading cache lines independently of memory OPS by the CPU.

107 citations


Patent
Michael Howard Hartung1
20 Sep 1983
TL;DR: In this article, fast and slow channels are attached to a cached peripheral storage system, having a front store (40) and a backing store (16) preferably with a plurality of data storage devices.
Abstract: Fast and slow channels (13, 14) are attached to a cached peripheral storage system, having a front store (40) and a backing store (16) preferably with a plurality of data storage devices. The peripheral data storage device (16) data transfer rate is not greater than the data rate of the fast channels but greater than the data rate of the slow channels. For data associated with a fast channel, data promotion from the backing store to the front store is designed to encourage read hits while discouraging write hits. For the slow channel, all data goes through the front store. Cache bypassing controls (61) are handled through the LRU (least recently used) replacement algorithm (43) for the slow channels. A common demotion of data from the front store to the backing store is used for all channels. Front store occupancy varies in that buffering for slow channels (data rate change) tends to store and keep full tracks, while caching for fast channels limits data occupancy. For a fast channel, a cache miss results in directly accessing the backing store. The data storage devices are preferably disk data storage devices (DASD) or magnetic tape recorders.

105 citations


Journal ArticleDOI
TL;DR: Measurements are reported including the hit ratios of data and instruction references, the rate of cache invalidations by I/O, and the amount of waiting time due to cache misses.

96 citations


Patent
Howard Thomas Olnowich1
13 Apr 1983
TL;DR: In this article, the authors describe circuits for writing into the cache and for adapting the cache to a multi-cache arrangement; each tag word read out must compare equal (28) with the high order sector bits (A18-A31) of the address, and an accompanying validity bit (Vi) for each accessed block location in its group must be ON, in order to effect a hit.
Abstract: A cache memory (10) for a data processing system having a tag array (22) in which each tag word can contain a sector address (A18-A31) and represents a predetermined set or block group of consecutively addressable data block locations in a data array (26). The lower order set address bits (A8-A17) concurrently access the tag word and its associated group of block locations in the data array while individual blocks within the group are accessed by supplemental block bits (A6-A7). Each tag word read out must compare equal (28) with the high order sector bits (A18-A31) of the address and an accompanying validity bit (Vi) for each accessed block location in its group must be ON in order to effect (30) a hit. Also described are circuits for writing into the cache and adapting the cache to a multi-cache arrangement.
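
As a rough sketch of the address decomposition and hit test described in the abstract, the fields named there (A6-A7 block within group, A8-A17 set index, A18-A31 sector) can be modeled as below; the directory layout, field widths, and the simple install policy are illustrative assumptions, not the patented circuitry.

```python
from dataclasses import dataclass, field
from typing import List

BLOCKS_PER_GROUP = 4   # selected by A6-A7
NUM_SETS = 1024        # selected by A8-A17

@dataclass
class TagEntry:
    sector: int = -1                                   # compared against A18-A31
    valid: List[bool] = field(default_factory=lambda: [False] * BLOCKS_PER_GROUP)

directory = [TagEntry() for _ in range(NUM_SETS)]

def split_address(addr: int):
    """Split a 32-bit address into the fields named in the abstract."""
    block  = (addr >> 6)  & 0x3      # A6-A7: block within the group
    setidx = (addr >> 8)  & 0x3FF    # A8-A17: tag/set index
    sector = (addr >> 18) & 0x3FFF   # A18-A31: sector address held in the tag
    return sector, setidx, block

def is_hit(addr: int) -> bool:
    sector, setidx, block = split_address(addr)
    entry = directory[setidx]
    # Hit only if the stored sector compares equal AND the block's validity bit is ON.
    return entry.sector == sector and entry.valid[block]

def install(addr: int):
    """Allocate the sector for this address and mark one block valid."""
    sector, setidx, block = split_address(addr)
    entry = directory[setidx]
    if entry.sector != sector:
        entry.sector = sector
        entry.valid = [False] * BLOCKS_PER_GROUP
    entry.valid[block] = True

install(0x0004_0140)
print(is_hit(0x0004_0140), is_hit(0x0008_0140))   # True False
```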

91 citations


Journal ArticleDOI
TL;DR: Efficient methods are discussed for calculating the success function of replacement policies used to manage very large fixed-size caches, including how to modify Bennett and Kruskal's algorithm to run in bounded space.

88 citations


Proceedings ArticleDOI
13 Jun 1983
TL;DR: It is concluded theoretically that random replacement is better than LRU and FIFO, and that under certain circumstances a direct-mapped or set-associative cache may perform better than a fully associative cache organization.
Abstract: Instruction caches are analyzed both theoretically and experimentally. The theoretical analysis begins with a new model for cache referencing behavior—the loop model. This model is used to study cache organizations and replacement policies. It is concluded theoretically that random replacement is better than LRU and FIFO, and that under certain circumstances, a direct-mapped or set associative cache may perform better than a fully associative cache organization. Experimental results using instruction trace data are then given. The experimental results are shown to support the theoretical conclusions.
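
The loop-model conclusion that random replacement can beat LRU and FIFO is easy to reproduce with a toy simulation (not taken from the paper): when a loop is one block larger than a fully associative cache, LRU thrashes and misses on every reference after warm-up, while random replacement keeps some blocks resident. The cache and loop sizes below are illustrative.

```python
import random

def simulate(policy, cache_size, loop_blocks, iterations, seed=0):
    """Return the hit ratio of a fully associative cache on a repeating loop."""
    rng = random.Random(seed)
    cache = []                      # for LRU: index 0 = most recently used
    hits = refs = 0
    for _ in range(iterations):
        for block in range(loop_blocks):
            refs += 1
            if block in cache:
                hits += 1
                if policy == "lru":
                    cache.remove(block)
                    cache.insert(0, block)
            else:
                if len(cache) >= cache_size:
                    if policy == "lru":
                        cache.pop()                        # evict least recently used
                    else:                                  # random replacement
                        cache.pop(rng.randrange(len(cache)))
                cache.insert(0, block)
    return hits / refs

# A loop of 9 blocks over an 8-block cache: LRU thrashes, random does not.
print("LRU   :", simulate("lru", 8, 9, 100))
print("random:", simulate("random", 8, 9, 100))
```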

79 citations


Patent
01 Jul 1983
TL;DR: In this article, the authors propose a method for direct access storage device (DASD) cache management that reduces the volume of data transfer between DASD and cache while avoiding the complexity of managing variable length records in the cache.
Abstract: A method for direct access storage device (DASD) cache management that reduces the volume of data transfer between DASD and cache while avoiding the complexity of managing variable length records in the cache. This is achieved by always choosing the starting point for staging a record to be at the start of the missing record and, at the same time, allocating and managing cache space in fixed length blocks. The method steps require staging records, starting with the requested record and continuing until either the cache block is full, the end of track is reached, or a record already in the cache is encountered.
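
A minimal sketch of the staging rule the abstract describes, staging from the missing record until the cache block is full, the end of track is reached, or an already-cached record is encountered; the record identifiers, lengths, and block capacity are invented for illustration.

```python
def stage_records(track, start_index, cache, block_capacity):
    """Stage records into a fixed-size cache block, per the three stop rules:
    the block fills up, the end of the track is reached, or a record that is
    already cached is encountered.  `track` is a list of (record_id, length)."""
    staged, used = [], 0
    for record_id, length in track[start_index:]:           # start at the missing record
        if record_id in cache:                               # already cached: stop
            break
        if used + length > block_capacity:                   # cache block is full: stop
            break
        staged.append(record_id)
        used += length
        cache.add(record_id)
    return staged                                            # loop also ends at end of track

# Illustrative track: record ids with lengths in bytes.
track = [("R1", 800), ("R2", 1200), ("R3", 600), ("R4", 2000), ("R5", 700)]
cache = {"R4"}                       # R4 is already resident
print(stage_records(track, start_index=1, cache=cache, block_capacity=4096))
# -> ['R2', 'R3']  (stops at R4, which is already in the cache)
```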

70 citations


Proceedings ArticleDOI
13 Jun 1983
TL;DR: In designing a VLSI instruction cache for a RISC microprocessor, the authors have uncovered four ideas potentially applicable to other VLSI machines that provide expansible cache memory, increased cache speed, reduced program code size, and decreased manufacturing costs.
Abstract: A cache was first used in a commercial computer in 1968, and researchers have spent the last 15 years analyzing caches and suggesting improvements. In designing a VLSI instruction cache for a RISC microprocessor we have uncovered four ideas potentially applicable to other VLSI machines. These ideas provide expansible cache memory, increased cache speed, reduced program code size, and decreased manufacturing costs. These improvements blur the habitual distinction between an instruction cache and an instruction fetch unit. The next four sections present the four architectural ideas, followed by a section on performance evaluation of each idea. We then describe the implementation of the cache and finally summarize the results.

66 citations


Patent
17 Oct 1983
TL;DR: In this paper, the cache operating cycle is divided into two subcycles dedicated to mutually exclusive operations: the first subcycle is dedicated to receiving a central processor memory read request, with its address.
Abstract: A data processing machine in which the cache operating cycle is divided into two subcycles dedicated to mutually exclusive operations. The first subcycle is dedicated to receiving a central processor memory read request, with its address. The second subcycle is dedicated to every other kind of cache operation, in particular either (a) receiving an address from a peripheral processor for checking the cache contents after a peripheral processor write to main memory, or (b) writing anything to the cache, including an invalid bit after a cache check match condition, or data after either a cache miss or a central processor write to main memory. The central processor can continue uninterruptedly to read the cache on successive central processor microinstruction cycles, regardless of the fact that the cache contents are being "simultaneously" checked, invalidated or updated after central processor writes. After a cache miss, although the central processor must be stopped to permit updating, it can resume operations a cycle earlier than is possible without the divided cache cycle.

Journal ArticleDOI
TL;DR: In this article, it is shown for a workload sample that a large fraction of all I/O requests are captured by a cache on the order of 8 Mbytes in size.

Journal ArticleDOI
J. Voldman, Benoit B. Mandelbrot, Lee Windsor Hoevel, J. Knight, P. L. Rosenfeld
TL;DR: This paper uses fractals to model the clustering of cache misses as a discriminant between interactive and batch environments and finds that the cluster dimension provides a measure of the intrinsic differences between workloads.
Abstract: This paper uses fractals to model the clustering of cache misses. The clustering of cache misses can be quantified by a single number analogous to a fractional dimension, and we are intrigued by the possibility that this number can be used as a measure of software complexity. The essential intuition is that cache misses are a direct reflection of changes in locality of reference, and that complex software requires more frequent (and larger) changes in this locality than simple software. The cluster dimension provides a measure (and perhaps the basis for a model) of the intrinsic differences between workloads. In this paper, we focus on cache miss activity as a discriminant between interactive and batch environments.
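
The abstract does not give the authors' estimator, but the idea of reducing clustered miss activity to a single fractal-like number can be illustrated with a generic box-counting sketch over the reference indices at which misses occur; the miss pattern and choice of scales here are assumptions.

```python
import math

def box_counting_dimension(event_times, scales):
    """Estimate a box-counting dimension for a set of event times (e.g. the
    reference indices at which cache misses occurred) by counting how many
    intervals of each width contain at least one event, then fitting
    log N(w) against log(1/w) with a least-squares slope."""
    xs, ys = [], []
    for width in scales:
        occupied = {int(t // width) for t in event_times}
        xs.append(math.log(1.0 / width))
        ys.append(math.log(len(occupied)))
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
            / sum((x - mean_x) ** 2 for x in xs)
    return slope

# Illustrative: misses clustered in bursts along a trace of 100,000 references.
misses = [i for burst in range(0, 100_000, 5_000) for i in range(burst, burst + 200, 3)]
print(round(box_counting_dimension(misses, scales=[10, 30, 100, 300, 1000, 3000]), 2))
```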

Patent
22 Feb 1983
TL;DR: In this article, an instruction is transferred to a region from the main data memory in response to a program address and may be executed without waiting for simultaneous transfer of a large block or number of instructions.
Abstract: A memory system includes a high-speed, multi-region instruction cache, each region of which stores a variable number of instructions received from a main data memory said instructions forming part of a program. An instruction is transferred to a region from the main data memory in response to a program address and may be executed without waiting for simultaneous transfer of a large block or number of instructions. Meanwhile, instructions at consecutively subsequent addresses in the main data memory are transferred to the same region for building an expanding cache of rapidly accessible instructions. The expansion of a given region is brought about as a result of the addressing of that region, such that a cache region receiving a main line of the aforementioned program will be expanded in preference to a region receiving an occasionally used sub-routine. When a new program address is presented, a simultaneous comparison is made with pointers which are provided to be indicative of addresses of instructions currently stored in the various cache regions, and stored information is gated from a region which produces a favorable comparison. When a new address is presented to which no cache region is responsive, the least recently used region, that is the region that has been accessed least recently, is immediately invalidated and reused by writing thereover, starting with the new address to which no cache region was responsive, for accumulating a substituted cache of information from the main data memory.

Patent
Robert Percy Fletcher1
04 Feb 1983
TL;DR: In this paper, a remote TID register (11) is provided to receive the TID from the remote CP in the MP on each cache miss cross-interrogation hit.
Abstract: Cache directory entry replacement for central processors (CPs) in a multiprocessor (MP) utilizes task identifiers (TIDs) provided in each directory entry to identify the program task which inserted the respective entry. A remote TID register (11) is provided to receive the TID from any remote CP in the MP on each cache miss cross-interrogation hit from any remote CP. Each time a respective CP (i.e. local CP) makes a storage request to its private cache directory, the TIDs are compared (in 18) to any remote TID in the CP's remote TID register. A TID candidate is any entry which compares equal to the remote TID and is not equal to the current local processor TID. It is identified as a candidate for replacement in the local cache directory on a cache miss. The candidate priorities in replacement selection circuit (51) are: highest priority is any invalid entry, next is any TID candidate, and lowest priority is the conventional LRU candidate. The TID operation obtains early castout to main storage of any cache line associated with a task being executed in a remote CP and not associated with the task being executed in the CP casting out the line, and thus reduces the potential for future cross-interrogation hits.
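
A small sketch of the replacement-priority order stated in the abstract (any invalid entry first, then a TID candidate, then the conventional LRU candidate); the entry representation and field names are illustrative, not the patented hardware.

```python
def choose_replacement(entries, remote_tid, local_tid, lru_index):
    """Pick the entry to replace in a congruence class, using the priority
    order described in the abstract: any invalid entry first, then any entry
    whose task identifier matches the remote CP's TID (and is not the local
    task's TID), and finally the conventional LRU candidate.
    `entries` is a list of dicts with 'valid' and 'tid' fields (illustrative)."""
    for i, e in enumerate(entries):
        if not e["valid"]:                     # highest priority: invalid entry
            return i
    for i, e in enumerate(entries):
        if e["tid"] == remote_tid and e["tid"] != local_tid:
            return i                           # next: remote-task (TID) candidate
    return lru_index                           # lowest priority: LRU candidate

entries = [
    {"valid": True, "tid": 7},
    {"valid": True, "tid": 3},     # matches the remote TID below
    {"valid": True, "tid": 7},
    {"valid": True, "tid": 7},
]
print(choose_replacement(entries, remote_tid=3, local_tid=7, lru_index=0))   # -> 1
```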

Patent
30 Jun 1983
TL;DR: A multilevel set associative cache system, whose directory and cache store are organized into levels of memory locations, includes control apparatus which, in response to error signals from directory error checking circuits, selectively degrades cache operation to those levels detected to be free from errors.
Abstract: A multilevel set associative cache system whose directory and cache store are organized into levels of memory locations includes control apparatus which selectively degrades cache operation in response to error signals from directory error checking circuits to those levels detected to be free from errors. Test control apparatus which couples to the directory error checking apparatus operates to selectively enable and disable the directory error checking circuits in response to commands received from a central processing unit so as to enable the testing of the cache directory and other portions of the cache system using common test routines.

Patent
30 Jun 1983
TL;DR: In this paper, a multilevel set associative cache system whose directory and cache store are organized into levels of memory locations includes control apparatus which, in response to error signals from directory error checking circuits, selectively degrades cache operation to those levels detected to be free from errors.
Abstract: A multilevel set associative cache system whose directory and cache store are organized into levels of memory locations includes control apparatus which selectively degrades cache operation in response to error signals from directory error checking circuits to those levels detected to be free from errors. Test apparatus is coupled to the control apparatus and operates to selectively alter the operational states of the cache levels in response to commands received from a central processing unit for enabling testing of such control apparatus in addition to the other cache control areas.

Proceedings ArticleDOI
13 Jun 1983
TL;DR: A highly effective stack which can support a value cache and a virtual stack, together with a high-speed garbage collection algorithm for virtual memory, is described.
Abstract: ALPHA is a dedicated machine designed for high-speed list processing. In this article, we describe a highly effective stack which can support a value cache and a virtual stack, and a high-speed garbage collection algorithm for virtual memory. These new ideas have been studied in ALPHA. ALPHA is designed as a back end processor for a large computer under TSS. ALPHA allows TSS users to do more high-speed list processing than a large computer does. Currently UTILISP is operating on ALPHA and runs several times faster than MACLISP on the DEC 2060.

Patent
Philip Lewis Rosenfeld1
02 May 1983
TL;DR: The Load Control Block Address Unit, as discussed by the authors, implements a load control block address instruction that permits prefetching of data from main memory into the cache simultaneously with execution of a sequence of instructions in a linked list, where the information determining the starting address of the next block in the list is stored in the current block at a fixed offset from the beginning of the block.
Abstract: In a digital data processing system including an Instruction Unit, an Execute Unit, and a multilevel Processor Storage System including a cache memory, additional apparatus is included referred to as a Load Control Block Address Unit for implementing a load control block address instruction which permits prefetching of data from main memory into cache simultaneous with execution of a sequence of instructions in a linked list wherein information determining starting address of a next block in the linked list is stored at a location in the current block at a fixed offset from the beginning of the block.
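
A software analogue (an assumption-laden sketch, not the patented instruction) of the idea in the abstract: while walking a linked list of control blocks, the link field at a fixed offset in the current block yields the next block's starting address early enough to prefetch it while the current block is still being processed.

```python
from collections import deque

def walk_with_prefetch(memory, head, next_offset, process, prefetch):
    """Walk a linked list of control blocks laid out in a flat `memory` array.
    The address of the next block is stored at a fixed offset inside the
    current block, so it can be handed to `prefetch` before the current block
    is processed, letting the next fetch overlap with current processing.
    This is only a software analogue of the instruction the patent describes."""
    addr = head
    while addr != 0:                               # an address of 0 terminates the list
        next_addr = memory[addr + next_offset]     # the fixed-offset link field
        if next_addr:
            prefetch(next_addr)                    # start bringing the next block in early
        process(memory, addr)
        addr = next_addr

# Illustrative memory image: blocks of 4 words, word 3 of each block is the link.
memory = [0] * 32
memory[4 + 3] = 12                                 # block at 4 links to block at 12
memory[12 + 3] = 20                                # block at 12 links to block at 20
memory[20 + 3] = 0                                 # end of list
issued = deque()
walk_with_prefetch(memory, head=4, next_offset=3,
                   process=lambda mem, a: None,
                   prefetch=issued.append)
print(list(issued))                                # -> [12, 20]
```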

Patent
28 Feb 1983
TL;DR: As discussed by the authors, the cache memory includes a dual or two-part cache, with one part primarily (but not exclusively) designated for instruction data while the other part is primarily designated for operand data.
Abstract: Cache memory includes a dual or two part cache with one part of the cache being primarily designated for instruction data while the other is primarily designated for operand data, but not exclusively. For a maximum speed of operation, the two parts of the cache are equal in capacity. The two parts of the cache, designated I-Cache and O-Cache, are semi-independent in their operation and include arrangements for effecting synchronized searches; they can accommodate up to three separate operations substantially simultaneously. Each cache unit has a directory and a data array with the directory and data array being separately addressable. Each cache unit may be subjected to a primary and to one or more secondary concurrent uses with the secondary uses prioritized. Data is stored in the cache unit on a so-called store-into basis wherein data obtained from the main memory is operated upon and stored in the cache without returning the operated upon data to the main memory unit until subsequent transactions require such return.

Patent
28 Feb 1983
TL;DR: As discussed by the authors, a cache memory includes a dual or two-part cache, with one part primarily (but not exclusively) designated for instruction data while the other part is primarily designated for operand data.
Abstract: A cache memory includes a dual or two part cache with one part of the cache being primarily designated for instruction data while the other is primarily designated for operand data, but not exclusively. For a maximum speed of operation, the two parts of the cache are equal in capacity. The two parts of the cache, designated I-Cache and O-Cache, are semi-independent in their operation and include arrangements for effecting synchronized searches; they can accommodate up to three separate operations substantially simultaneously. Each cache unit has a directory and a data array with the directory and data array being separately addressable. Each cache unit may be subjected to a primary and to one or more secondary concurrent uses with the secondary uses prioritized. Data is stored in the cache unit on a so-called store-into basis wherein data obtained from the main memory is operated upon and stored in the cache without returning the operated upon data to the main memory unit until subsequent transactions require such return. An arrangement is included whereby the real page number of a delayed transaction may be verified.

Journal ArticleDOI
TL;DR: In this paper, a mathematical model is developed which predicts the effect on the miss ratio of running a program in a sequence of interrupted execution intervals, and results are compared to measured miss ratios of real programs executing in an interrupted execution environment.
Abstract: A cache is a small, fast associative memory located between a central processor and primary memory and used to hold copies of the contents of primary memory locations. A key performance measure of a cache is the miss ratio: the fraction of processor references which are not satisfied by the cache and result in primary memory references. The miss ratio is sometimes measured by running a single program to completion; however, in real systems rarely does such uninterrupted execution occur. In this paper a mathematical model is developed which predicts the effect on the miss ratio of running a program in a sequence of interrupted execution intervals. Results from the model are compared to measured miss ratios of real programs executing in an interrupted execution environment.
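
The abstract does not reproduce the model itself, so the sketch below only illustrates the effect being modeled: the LRU miss ratio of a synthetic trace run to completion versus the same trace run in intervals whose cache contents are purged at each switch, a deliberately pessimistic stand-in for the disturbance caused by intervening tasks. Trace statistics, cache size, and interval length are assumptions.

```python
import random

def lru_miss_ratio(trace, cache_size, flush_every=None):
    """Miss ratio of a fully associative LRU cache; if flush_every is set,
    the cache is purged after every `flush_every` references, standing in
    (pessimistically) for the disturbance caused by other tasks."""
    cache, misses = [], 0
    for i, block in enumerate(trace):
        if flush_every and i > 0 and i % flush_every == 0:
            cache.clear()
        if block in cache:
            cache.remove(block)
        else:
            misses += 1
            if len(cache) >= cache_size:
                cache.pop()
        cache.insert(0, block)
    return misses / len(trace)

rng = random.Random(1)
# A synthetic trace with temporal locality: mostly re-references of recent blocks.
trace, recent = [], []
for _ in range(20_000):
    if recent and rng.random() < 0.9:
        block = rng.choice(recent[-32:])
    else:
        block = rng.randrange(10_000)
    trace.append(block)
    recent.append(block)

print("uninterrupted:", round(lru_miss_ratio(trace, 64), 3))
print("interrupted  :", round(lru_miss_ratio(trace, 64, flush_every=500), 3))
```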

Patent
13 May 1983
TL;DR: In this article, a circuit and method are provided for implementing a data replacement algorithm, such as least recently used (LRU), for a fast, low-capacity cache; the implementation is fast and minimizes circuitry.
Abstract: A circuit and method are provided for implementing a predetermined data replacement algorithm, such as least recently used (LRU), associated with a fast, low capacity cache; the circuit is fast and minimizes circuitry. A latch stores the present status of the replacement algorithm, and an address control signal indicates which one of n sets of stored information in the cache has been most recently accessed, where n is an integer. The predetermined algorithm is implemented by a predetermined permutation table stored in a translator which provides an output signal in response to both the present status of the replacement algorithm and the address control signal. The output signal indicates which one of the n sets of stored information in the cache may be replaced with new information.
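
A minimal sketch of the permutation-table idea in the abstract: the LRU state for an n-way group is a permutation of set numbers held in a latch, and a precomputed translator maps (current state, accessed set) to the next state, with the last element of the state naming the replacement candidate. Here n = 4 and the dictionary standing in for the hardware table are assumptions.

```python
from itertools import permutations

N_SETS = 4   # the patent's n; 4 is an illustrative choice

# Precompute the "translator": for every (current LRU state, accessed set)
# pair, the next LRU state.  A state is a tuple ordered from most recently
# used to least recently used; the replacement victim is the last element.
TRANSLATOR = {}
for state in permutations(range(N_SETS)):
    for accessed in range(N_SETS):
        new_state = (accessed,) + tuple(s for s in state if s != accessed)
        TRANSLATOR[(state, accessed)] = new_state

def update(state, accessed):
    """Look up the next LRU state, as the hardware translator would."""
    return TRANSLATOR[(state, accessed)]

def victim(state):
    """The set to replace: the least recently used one."""
    return state[-1]

state = (0, 1, 2, 3)              # initial latch contents (illustrative)
for accessed in [2, 0, 2, 3]:
    state = update(state, accessed)
print(state, "-> replace set", victim(state))   # (3, 2, 0, 1) -> replace set 1
```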

Journal ArticleDOI
Yeh, Patel, Davidson
TL;DR: Cache memory organization for parallel-pipelined multiprocessor systems is evaluated; a shared cache avoids the coherence problem of private caches and can attain a higher hit ratio, but suffers performance degradation due to access conflicts.
Abstract: Cache memory organization for parallel-pipelined multiprocessor systems is evaluated. Private caches have a cache coherence problem. A shared cache avoids this problem and can attain a higher hit ratio due to sharing of single copies of common blocks and dynamic allocation of cache space among the processes. However, a shared cache suffers performance degradation due to access conflicts.

Patent
21 Mar 1983
TL;DR: In this paper, when an abnormal condition prevents the transfer of a segment from the cache store to disk, an indicator is set for that segment by the storage control unit to prevent further attempts to transfer the segment; the segment remains in the cache store until the host processor issues an initialize or a reset segment command.
Abstract: In a system having a host processor connected through a storage control unit to a cache store and a plurality of disk devices, segments of data which have been written to, while resident in the cache store, are transferred to the disks at some later time. If an abnormal condition, such as a bad spot on the disk, prevents the transfer of a segment from the cache store to the disk, an indicator is set for that segment by the storage control unit to prevent further attempts to transfer the segment. A segment whose indicator is set remains in the cache store until the host processor issues an initialize or a reset segment command.

Patent
30 Jun 1983
TL;DR: In this article, a multilevel set associative cache system whose directory and cache store are organized into levels of memory locations is described, and a round robin replacement apparatus is used to identify in which level information is to be replaced.
Abstract: A multilevel set associative cache system is described whose directory and cache store are organized into levels of memory locations. Round robin replacement apparatus is used to identify in which level information is to be replaced. The directory includes error checking apparatus for generating address check bits which are written into directory locations together with addresses. Control apparatus, in response to error signals from the error checking apparatus, degrades cache operation to those levels detected to be free from errors. Test error mode control apparatus which couples to the replacement and check bit apparatuses causes the address check bits to be selectively forced to incorrect values in response to commands received from a central processing unit, enabling the verification of both the checking and control apparatus without interference from other operations initiated by the central processing unit.

Patent
30 Aug 1983
TL;DR: In this article, access control to a high speed data buffer is bypassed in order to enhance the speed of access of data which must be retrieved from a large capacity main storage facility without requiring additional circuitry in the data processing machine.
Abstract: Access control to a high speed data buffer in a data processing machine is bypassed in order to enhance the speed of access of data which must be retrieved from a large capacity main storage facility without requiring additional circuitry in the data processing machine. Access control normally requires an entire line of data to be present in a cache before allowing a read to a portion of the line. When lines are moved in to the cache, only sub-line increments are loaded at a time. Thus, when a line is being moved in from a main store to the cache, the increment that satisfies a pending request for access is transferred first. The access control is overridden to allow access to the increment before the balance of the line is transferred.
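
A sketch of the bypass described in the abstract: the sub-line increment that satisfies the pending request is transferred first and handed to the processor before the balance of the line arrives. The increment count, the transfer order beyond "requested increment first", and the callback interfaces are illustrative assumptions.

```python
def move_in_line(line_addr, requested_increment, increments_per_line, fetch, deliver):
    """Bring a line into the cache in sub-line increments: the increment that
    satisfies the pending request is transferred first and delivered to the
    processor immediately, before the balance of the line has arrived (a sketch
    of the bypass the abstract describes).  Returns the assembled line."""
    line = [None] * increments_per_line
    order = [requested_increment] + [i for i in range(increments_per_line)
                                     if i != requested_increment]
    for n, inc in enumerate(order):
        line[inc] = fetch(line_addr, inc)
        if n == 0:
            deliver(line[inc])     # access control is overridden for this increment
    return line

log = []
full_line = move_in_line(line_addr=0x1000, requested_increment=2, increments_per_line=4,
                         fetch=lambda addr, inc: f"data@{hex(addr)}+{inc}",
                         deliver=lambda d: log.append(("delivered early", d)))
print(log)         # -> [('delivered early', 'data@0x1000+2')]
print(full_line)   # all four increments present after the move-in completes
```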

Patent
Kanji Kubo1, Chikahiko Izumi1
21 Jan 1983
TL;DR: In this paper, a storage system includes a storage for storing data and a store buffer for temporarily buffering data before storing it into the storage; when no fetch request for the storage is pending, the store data buffered in the store buffer is transferred from the store buffer to the storage and is stored therein.
Abstract: A storage system includes a storage for storing data, and a store buffer for temporarily buffering data before storing it into the storage. A store request is applied to the store buffer, and store data accompanied by the store request is applied to the store buffer. When a fetch request for the storage does not exist, the store data buffered in the store buffer is transferred from the store buffer to the storage and is stored therein.
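
A small sketch of the buffering rule in the abstract: stores are held in a store buffer and drained to storage only on cycles with no pending fetch request. The class structure is illustrative, and the store-to-fetch forwarding in fetch() is an added assumption rather than something stated in the abstract.

```python
from collections import deque

class StorageSystem:
    """Sketch of the rule in the abstract: store requests and their data go
    into a store buffer, and buffered stores are written to storage only on
    cycles with no pending fetch request (structure is illustrative)."""
    def __init__(self):
        self.storage = {}
        self.store_buffer = deque()

    def store(self, addr, data):
        self.store_buffer.append((addr, data))      # buffer, do not write yet

    def fetch(self, addr):
        # Honour buffered stores so a fetch sees the newest data (an assumption).
        for a, d in reversed(self.store_buffer):
            if a == addr:
                return d
        return self.storage.get(addr)

    def cycle(self, fetch_addr=None):
        """One cycle: a fetch, if present, takes priority; otherwise one
        buffered store is transferred into storage."""
        if fetch_addr is not None:
            return self.fetch(fetch_addr)
        if self.store_buffer:
            addr, data = self.store_buffer.popleft()
            self.storage[addr] = data
        return None

s = StorageSystem()
s.store(0x10, "A")
print(s.cycle(fetch_addr=0x10))   # fetch has priority -> 'A' (forwarded from the buffer)
s.cycle()                         # idle cycle: the store drains to storage
print(s.storage)                  # -> {16: 'A'}
```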

01 Jan 1983
TL;DR: An organisation for a cache memory system for use in a microprocessor-based system structured around the multibus or some similar bus is presented, and standard dynamic random access memory (DRAM) is used to store the data in the cache.
Abstract: An organisation for a cache memory system for use in a microprocessor-based system structured around the multibus or some similar bus is presented. Standard dynamic random access memory (DRAM) is used to store the data in the cache. Information necessary for control of and access to the cache is held in a specially designed NMOS VLSI chip. The feasibility of this approach has been demonstrated by designing and fabricating the VLSI chip and a test facility. The critical parameters and implementation details are discussed. This implementation supports multiple cards, each containing a processor and a cache. The technique involves monitoring the bus for references to main storage. The contention for cache cycles between the processor and the bus is resolved by using two identical copies of the tag memory.

Journal ArticleDOI
Briggs, Dubois
TL;DR: An approximate model is developed to estimate the processor utilization and the speed-up improvement provided by the caches; the shared memory assumes a two-dimensional organization, previously studied under random and word access.
Abstract: A possible design alternative for improving the performance of a multiprocessor system is to insert a private cache between each processor and the shared memory. The caches act as high-speed buffers by reducing the effective memory access time, and affect the delays caused by memory conflicts. In this paper, we study the effectiveness of caches in a multiprocessor system. The shared memory is pipelined and interleaved to improve the block transfer rate, and it assumes a two-dimensional organization, previously studied under random and word access. An approximate model is developed to estimate the processor utilization and the speed-up improvement provided by the caches.