
Showing papers on "Cache" published in 1978


Patent
30 Jun 1978
TL;DR: In this article, a programmable control unit for a disk memory and controller is described, comprising a microprocessor (MPU), a random access memory (RAM) used as a buffer to cache disk resident data used by a data processing system (DPS), and a configurable data path (CDP) which couples the DPS to the disk controller.
Abstract: A programmable control unit for disk memory and controller comprising a microprocessor (MPU), random access memory (RAM) used as a buffer to cache disk resident data used by a data processing system (DPS), and a configurable data path (CDP) which couples the DPS to the disk controller. The MPU is programmed to provide the DPS rapid access to disk resident data by controlling the CDP so as to maintain a memory cache of disk resident data in the RAM. Data is cached either under directed control, via an application level task or operator console directive, or under dynamic control, through which a predetermined number of successive blocks of data are read from the disk and stored in the RAM each time any one block is addressed and, once the RAM is full, a block of data which the immediate history has shown to be least useful is discarded from the RAM.
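
To make the dynamic control policy concrete, here is a minimal Python sketch of the described behavior: on each block reference a fixed run of successive blocks is read ahead into RAM, and a least-recently-used block is discarded when RAM is full. The class and parameter names are ours, and LRU is only one plausible reading of the patent's "least useful" block; treat this as an illustration, not the patented mechanism.

```python
from collections import OrderedDict

class DiskCacheSketch:
    """Hypothetical model of the patent's dynamic caching mode."""

    def __init__(self, disk, capacity_blocks, readahead=4):
        self.disk = disk                    # block number -> data (assumed)
        self.capacity = capacity_blocks     # RAM buffer size in blocks
        self.readahead = readahead          # successive blocks per reference
        self.ram = OrderedDict()            # block -> data, in LRU order

    def read_block(self, n):
        if n in self.ram:                   # hit: refresh recency and return
            self.ram.move_to_end(n)
            return self.ram[n]
        # Miss: read the addressed block plus its successors from disk.
        for b in range(n, n + self.readahead):
            if b in self.disk and b not in self.ram:
                self._install(b, self.disk[b])
        self.ram.move_to_end(n)             # assumes block n exists on disk
        return self.ram[n]

    def _install(self, block, data):
        if len(self.ram) >= self.capacity:  # RAM full: discard the least
            self.ram.popitem(last=False)    # recently used block
        self.ram[block] = data
```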

145 citations


Journal ArticleDOI
TL;DR: It is suggested that as electronically accessed third-level memories composed of electron-beam tubes, magnetic bubbles, or charge-coupled devices become available, algorithms currently used only for cache paging will be applied to main memory, for the same reasons of efficiency, implementation ease, and cost.
Abstract: Set associative page mapping algorithms have become widespread for the operation of cache memories for reasons of cost and efficiency. We show how to calculate analytically the effectiveness of standard bit-selection set associative page mapping or random mapping relative to fully associative (unconstrained mapping) paging. For two miss ratio models, Saltzer's linear model and a mixed geometric model, we are able to obtain simple, closed-form expressions for the relative LRU fault rates. We also experiment with two (infeasible to implement) dynamic mapping algorithms, in which pages are assigned to sets either in an LRU or FIFO manner at fault times, and find that they often yield significantly lower miss ratios than static algorithms such as bit selection. Trace driven simulations are used to generate experimental results and to verify the accuracy of our calculations. We suggest that as electronically accessed third-level memories composed of electron-beam tubes, magnetic bubbles, or charge-coupled devices become available, algorithms currently used only for cache paging will be applied to main memory, for the same reasons of efficiency, implementation ease, and cost.
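
For readers unfamiliar with bit-selection set mapping, the following Python sketch contrasts it with a random static mapping; the function names and the power-of-two assumption are ours, and neither function reproduces the paper's analytic models.

```python
import random

def bit_selection_set(page_number, num_sets):
    # Bit selection: the set index is simply the low-order
    # log2(num_sets) bits of the page number
    # (num_sets must be a power of two).
    return page_number & (num_sets - 1)

def random_static_set(page_number, num_sets, seed=12345):
    # Random mapping: each page is assigned a pseudo-random but
    # fixed set, independent of its address bits.
    return random.Random(seed ^ page_number).randrange(num_sets)
```

The dynamic LRU/FIFO assignments the paper experiments with would instead choose a set at fault time, which is why they are infeasible to implement: the mapping can no longer be computed from the address alone.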

120 citations


Patent
Chang Shih-Jeh, Toy Wing Noom
08 Jun 1978
TL;DR: In this paper, a data processing system includes a memory arrangement comprising a main memory, and a cache memory including a validity bit per storage location to indicate the validity of data stored therein.
Abstract: A data processing system includes a memory arrangement comprising a main memory, and a cache memory including a validity bit per storage location to indicate the validity of data stored therein. Cache performance is improved by a special read operation to eliminate storage of data otherwise purged by a replacement scheme. A special read removes cache data after it is read and does not write data read from the main memory into the cache. Additional operations include: normal read, where data is read from the cache memory if available, or else from main memory and written into the cache; normal write, where data is written into main memory and the cache is interrogated, and in the event of a hit the data is either updated or effectively removed from the cache by invalidating its associated validity bit; and special write, where data is written both into main memory and the cache.
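
A minimal sketch, assuming a direct-mapped organization (the patent does not specify one), of how the four operations interact with the per-location validity bit:

```python
class ValidityBitCacheSketch:
    """Hypothetical direct-mapped cache with one validity bit per
    storage location, modeling the four operations in the abstract."""

    def __init__(self, main, num_lines):
        self.main = main                        # address -> data (assumed)
        self.n = num_lines
        self.line = [None] * num_lines          # (address, data) tuples
        self.valid = [False] * num_lines        # validity bit per location

    def _hit(self, addr):
        i = addr % self.n
        return i, self.valid[i] and self.line[i][0] == addr

    def normal_read(self, addr):
        i, hit = self._hit(addr)
        if hit:
            return self.line[i][1]
        data = self.main[addr]                  # miss: fetch and cache
        self.line[i], self.valid[i] = (addr, data), True
        return data

    def special_read(self, addr):
        i, hit = self._hit(addr)
        if hit:
            self.valid[i] = False               # remove after reading
            return self.line[i][1]
        return self.main[addr]                  # do NOT write into cache

    def normal_write(self, addr, data):
        self.main[addr] = data                  # always write main memory
        i, hit = self._hit(addr)
        if hit:                                 # on a hit: update (the
            self.line[i] = (addr, data)         # alternative is to clear
                                                # the validity bit instead)

    def special_write(self, addr, data):
        self.main[addr] = data                  # write main memory and
        i = addr % self.n                       # the cache unconditionally
        self.line[i], self.valid[i] = (addr, data), True
```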

65 citations


Patent
02 Oct 1978
TL;DR: In this paper, the authors propose an approach for avoiding ambiguous data in a multi-requestor computing system of the type where each of the requestors has its own dedicated cache memory.
Abstract: Apparatus for avoiding ambiguous data in a multi-requestor computing system of the type wherein each of the requestors has its own dedicated cache memory. Each requestor has access to its own dedicated cache memory for purposes of ascertaining whether a particular data word is present in its cache memory and of obtaining that data word directly from its cache memory without the necessity of referencing main memory. Each requestor also has access to all other dedicated cache memories for purposes of invalidating a particular data word contained therein when that same particular data word has been written by that requestor into its own dedicated cache memory. Requestors and addresses in a particular cache memory are time multiplexed in such a way as to allow a particular dedicated cache memory to service invalidate requests from other requestors without sacrificing the speed of reference or cycle time with which it services read requests from its own requestor.
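
The invalidation protocol reduces to a short sketch (the time multiplexing of the cache arrays, which is the heart of the patent, is not modeled here; class and method names are ours):

```python
class DedicatedCacheSketch:
    """Per-requestor cache with cross-invalidation on writes."""

    def __init__(self):
        self.words = {}                 # address -> data word

    def read(self, addr, main):
        if addr in self.words:          # hit: no main memory reference
            return self.words[addr]
        word = main[addr]               # miss: fetch and keep a copy
        self.words[addr] = word
        return word

    def write(self, addr, word, main, peer_caches):
        self.words[addr] = word
        main[addr] = word
        for peer in peer_caches:        # invalidate every other copy so
            peer.words.pop(addr, None)  # no requestor sees stale data
```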

49 citations


Patent
Marion G. Porter
11 Dec 1978
TL;DR: In this paper, a cache unit couples between a main store and data processing unit and includes a cache store organized into a plurality of levels, each for storing blocks of information in the form of data and instructions.
Abstract: A cache unit couples between a main store and a data processing unit. The cache unit includes a cache store organized into a plurality of levels, each for storing blocks of information in the form of data and instructions. The cache unit further includes a write command buffer, a transit block buffer and command queue apparatus coupled to the buffers for controlling the sequencing of commands stored in the buffers. The command queue apparatus includes a plurality of multibit storage locations for storing address and control information. The control information is coded to specify the type of command and the number of words the command contains. The address information is used as a pointer for read out of the command from either the write buffer or the transit block buffer, simplifying control. Control circuits included within the command queue apparatus, operating in accordance with signals corresponding to the control information and making use of the address information, control the sequencing of commands so as to maximize the use of the buffer storage capacity.
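
A rough Python rendering of the queue entries: each location holds coded control information (command type and word count) plus an address used as a pointer into one of the two buffers. The field layout and buffer representation are our assumptions.

```python
from collections import deque

class CommandQueueSketch:
    """Hypothetical model of the command queue apparatus."""

    def __init__(self, write_buffer, transit_block_buffer):
        # Both buffers assumed to be lists of word lists.
        self.buffers = {"write": write_buffer,
                        "transit": transit_block_buffer}
        self.queue = deque()    # (cmd_type, word_count, buffer_id, index)

    def enqueue(self, cmd_type, word_count, buffer_id, index):
        self.queue.append((cmd_type, word_count, buffer_id, index))

    def sequence_next(self):
        # Read out the next command via its pointer; no data is copied
        # into the queue itself, which is what simplifies control.
        cmd_type, word_count, buffer_id, index = self.queue.popleft()
        words = self.buffers[buffer_id][index][:word_count]
        return cmd_type, words
```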

37 citations



Patent
07 Mar 1978
TL;DR: In this article, the cache is accessible to the processor during one of the cache timing cycles and to the main storage during the other cache timing cycle, but no alternately accessible modules, buffering, delay, or interruption is provided for main storage line transfers to the cache.
Abstract: The disclosure enables concurrent access to a cache by main storage and a processor by means of a cache control which provides two cache access timing cycles during each processor storage request cycle. The cache is accessible to the processor during one of the cache timing cycles and is accessible to main storage during the other cache timing cycle. No alternately accessible modules, buffering, delay, or interruption is provided for main storage line transfers to the cache.

33 citations


Patent
11 Dec 1978
TL;DR: In this paper, a cache system includes a storage unit organized into a plurality of levels, each including a number of multiword blocks and a corresponding number of address selection switches and address registers.
Abstract: A cache system includes a storage unit organized into a plurality of levels, each including a number of multiword blocks and a corresponding number of address selection switches and address registers. Each address selection switch has a plurality of different positions connected to receive address signals from a plurality of address sources. A decoder circuit generates output signals for controlling the operation of the address selection switches. In response to previously defined level signals, the decoder circuit conditions a specified one of the number of switches to switch from a first position to a second position. An address specifying the location into which memory data is to be written is clocked into one address register while the address specifying the location from which an instruction is to be fetched is clocked into the remaining address registers. A comparator circuit compares signals indicating the level into which memory data is to be written with signals indicating the level from which a next instruction is to be fetched. The comparator circuit generates signals which cause the delay of instruction access when there is a conflict between writing memory data and accessing instructions.

33 citations


Patent
11 Dec 1978
TL;DR: A cache unit includes a cache store organized into a number of levels to provide fast access to instructions and data words, as mentioned in this paper; the cache unit further includes detection apparatus which, upon detecting a conflict condition that would result in an improper assignment, advances the replacement circuits forward to assign the next sequential group of locations or level, inhibiting the normal location assignment.
Abstract: A cache unit includes a cache store organized into a number of levels to provide fast access to instructions and data words. Directory circuits, associated with the cache store, contain address information identifying those instructions and data words stored in the cache store. The cache unit has at least one instruction register for storing address and level signals for specifying the location of the next instruction to be fetched and transferred to the processing unit. Replacement circuits are included which, during normal operation, assign cache locations sequentially for replacing old information with new information. The cache unit further includes detection apparatus for detecting a conflict condition resulting in an improper assignment. The detection apparatus, upon detecting such a condition, advances the replacement circuits forward for assigning the next sequential group of locations or level, inhibiting them from making the normal location assignment. It also inhibits the directory circuits from writing therein the information required for making the location assignment and prevents the information which produced the conflict from being written into the cache store when received from memory.
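
One plausible reading of the conflict handling, sketched in Python (the patent's notion of "conflict" and its grouping of locations are not fully specified here, so this is illustrative only):

```python
class ReplacementSketch:
    """Sequential (round-robin) level assignment with conflict skip."""

    def __init__(self, num_levels):
        self.num_levels = num_levels
        self.next_level = 0

    def assign(self, conflicting_level=None):
        level = self.next_level
        self.next_level = (level + 1) % self.num_levels  # advance pointer
        if level == conflicting_level:
            # Improper assignment detected: suppress the directory
            # write and the cache fill for this request.
            return None
        return level
```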

32 citations


Patent
16 Mar 1978
TL;DR: In this article, on a cache miss the successive fetch requests by the I-unit for sublines (e.g. doublewords) of a variable length field operand are satisfied by the first through the highest-addressed fetched sublines of a line being accessed from main storage via a cache bypass.
Abstract: In the case of a cache miss, the successive fetch requests by the I-unit for sublines (e.g. doublewords) of a variable length field operand are satisfied by the first through the highest-addressed fetched sublines in a line being accessed from main storage via a cache bypass. This avoids the time delay for the I-unit caused by waiting until the complete line has been transferred to the cache before all required sublines in the line are obtainable from the cache. Address operand pairs (AOP's), consisting of request and buffer registers, are provided in the I-unit to handle the fetched sublines as fast as the cache bypass can provide them from main storage. If there is a cache hit, the sublines are accessed from the cache.
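
The bypass behavior can be sketched in a few lines of Python: on a miss, each subline is handed to the I-unit as soon as main storage delivers it, while the cache fill proceeds alongside. A dict per subline stands in for the cache, and the AOP registers are not modeled; both are our simplifications.

```python
def fetch_sublines(first_addr, last_addr, cache, main_storage):
    """Yield sublines (e.g. doublewords) of an operand in order."""
    for addr in range(first_addr, last_addr + 1):
        if addr in cache:
            yield cache[addr]            # cache hit: serve from the cache
        else:
            word = main_storage[addr]    # miss: the bypass forwards the
            cache[addr] = word           # subline immediately while the
            yield word                   # line fill continues
```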

31 citations


Patent
11 Dec 1978
TL;DR: A cache system includes a high speed storage unit organized into a plurality of levels, each including a number of multiword blocks and at least one multiposition address selection switch and address register as discussed by the authors.
Abstract: A cache system includes a high speed storage unit organized into a plurality of levels, each including a number of multiword blocks and at least one multiposition address selection switch and address register. The address switch is connected to receive address signals from a plurality of address sources. The system further includes a directory organized into a plurality of levels for storing address information required for accessing blocks from the cache storage unit and timing circuits for defining first and second halves of a cache cycle of operation. Control circuits coupled to the timing circuits generate control signals for controlling the operation of the address selection switch. During the previous cycle, the control circuits condition the address selector switch to select an address which is loaded into the address register during the previous half cycle. This enables either the accessing of instructions from cache or the writing of data into cache during the first half of the next cache cycle. During the first half of the cycle, the address selected by the address switch in response to control signals from the control circuits is clocked into the address register. This permits processor operations, such as the accessing of operand data or the writing of data into cache to be performed during the second half of the same cycle.

Patent
11 Dec 1978
TL;DR: In this paper, a data processing system is described, where a cache store is organized into a plurality of levels, each for storing blocks of information in the form of data and instructions, and the cache unit further includes control apparatus, an instruction buffer for storing instructions received from the main store, and a transit block buffer comprising a plurality of locations for storing read commands.
Abstract: A data processing system comprises a data processing unit coupled to a cache unit which couples to a main store. The cache unit includes a cache store organized into a plurality of levels, each for storing blocks of information in the form of data and instructions. The cache unit further includes control apparatus, an instruction buffer for storing instructions received from main store and a transit block buffer comprising a plurality of locations for storing read commands. The control apparatus includes a plurality of groups of bit storage elements corresponding to the number of transit buffer locations. Each group includes at least a pair of instruction fetch indicator elements which are operatively connected to control the writing of first and second blocks of instructions into the instruction buffer. Each time a read command specifying the fetching of instructions of either a first or second block is received from the processing unit, the flag storage element associated with the transit block buffer location into which the read command is loaded is set to a binary ONE state while the corresponding ones of the flag storage elements associated with the other locations storing outstanding read commands specifying instruction fetches are reset to binary ZEROS. This permits only those instructions received from main store in response to that read command to be loaded into a specified section of the instruction buffer for enabling overlaps in processing several commands specifying instruction fetch operations.

Patent
11 Dec 1978
TL;DR: In this article, a data processing system comprises a data processing unit coupled to a cache unit which couples to a main store, and the cache unit includes a cache store organized into a plurality of levels, each for storing a number of blocks of information in the form of data and instructions.
Abstract: A data processing system comprises a data processing unit coupled to a cache unit which couples to a main store. The cache unit includes a cache store organized into a plurality of levels, each for storing a number of blocks of information in the form of data and instructions. Directories associated with the cache store contain addresses and level control information for indicating which blocks of information reside in the cache store. The cache unit further includes control apparatus and a transit block buffer comprising a number of sections each having a plurality of locations for storing read commands and transit block addresses associated therewith. A corresponding number of valid bit storage elements are included, each of which is set to a binary ONE state when a read command and the associated transit block address are loaded into a corresponding one of the buffer locations. Comparison circuits, coupled to the transit block buffer, compare the transit block address of each outstanding read command stored in the transit block buffer section with the address of each read command or write command received from the processing unit. When there is a conflict, the comparison circuits generate an output signal which conditions the control apparatus to hold or stop further processing of the command by the cache unit and the operation of the processing unit. Holding lasts until the valid bit storage element of the location storing the outstanding read command is reset to a binary ZERO indicating that execution of the read command is completed.
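
The hold condition amounts to an associative compare against every valid transit block entry; a minimal sketch (slot management and command formats are our assumptions):

```python
class TransitBlockBufferSketch:
    """Outstanding-read tracking with a valid bit per location."""

    def __init__(self, num_locations):
        self.block_addr = [None] * num_locations
        self.valid = [False] * num_locations    # ONE while read outstanding

    def load_read(self, slot, block_addr):
        self.block_addr[slot] = block_addr
        self.valid[slot] = True                 # set to ONE on load

    def read_complete(self, slot):
        self.valid[slot] = False                # reset to ZERO on completion

    def must_hold(self, block_addr):
        # Comparison circuits: hold the new command if any valid
        # entry targets the same transit block.
        return any(v and a == block_addr
                   for a, v in zip(self.block_addr, self.valid))
```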

Journal ArticleDOI
TL;DR: Simulation is presented as a practical technique for performance evaluation of alternative configurations of highly concurrent computers and it appears that many of the sophisticated pipelining and buffering techniques implemented in the architecture of the IBM 360/91 are of little value when high-speed (cache) memory is used, as in the IBM 360/195.
Abstract: Simulation is presented as a practical technique for performance evaluation of alternative configurations of highly concurrent computers. A technique is described for constructing a detailed deterministic simulation model of a system. In the model a control stream replaces the instruction and data streams of the real system. Simulation of the system model yields the timing and resource usage statistics needed for performance evaluation, without the necessity of emulating the system. As a case study, the implementation of a simulator of a model of the CPU-memory subsystem of the IBM 360/91 is described. The results of evaluating some alternative system designs are discussed. The experiments reveal that, for the case study, the major bottlenecks in the system are the memory unit and the fixed point unit. Further, it appears that many of the sophisticated pipelining and buffering techniques implemented in the architecture of the IBM 360/91 are of little value when high-speed (cache) memory is used, as in the IBM 360/195.

Patent
11 Dec 1978
TL;DR: In this paper, a data processing system is described whose cache unit includes an instruction buffer having first and second sections for storing instructions received from the main store, each section including a plurality of word storage locations, each location having a number of bit positions.
Abstract: A data processing system comprises a data processing unit coupled to a cache unit which couples to a main store. The cache unit includes a cache store organized into a plurality of levels, each for storing blocks of information in the form of data and instructions. The cache unit further includes an instruction buffer having first and second sections for storing instructions received from main store. Each instruction buffer section includes a plurality of word storage locations, each location having a number of bit positions. A predetermined bit position of each location is used to indicate when an instruction word has been written into the location. Control apparatus coupled to each of the buffer sections is operative to reset all of the word locations to binary ZEROS when a command requesting an instruction block from main store is ready to be transferred thereto; the predetermined bit position of a location is set to a binary ONE state when an instruction word is loaded into that location. Instruction buffer ready circuits included within the control apparatus are conditioned by the states of the predetermined bit positions of the locations to generate output signals to the processing unit, enabling the transfer of requested instruction words to the processing unit as soon as they are received from main store.
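
A sketch of the per-word ready bits (the "predetermined bit position" of each location), assuming a single section and our own method names:

```python
class InstructionBufferSectionSketch:
    """Word locations with a written-indicator bit per location."""

    def __init__(self, num_words):
        self.words = [None] * num_words
        self.written = [False] * num_words      # the predetermined bit

    def start_block_fetch(self):
        # Reset all locations to ZEROS when a new block request is sent.
        self.written = [False] * len(self.words)

    def word_arrives(self, index, word):
        self.words[index] = word
        self.written[index] = True              # set to ONE on arrival

    def ready(self, index):
        # Gates the transfer of the requested word to the processor,
        # so words flow out as soon as main store delivers them.
        return self.written[index]
```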

Proceedings ArticleDOI
03 Apr 1978
TL;DR: The entire Cray-1 machine is described, and efficient ways to use the 656+ programmer-accessible registers, along with some of the design's shortcomings, are discussed.
Abstract: The Cray-1 is an extremely high-speed computer, intended to be used for large floating-point scientific computations. However, it is a well-balanced machine that can gracefully be used on a wide class of problems. The machine has two major architectural innovations: (1) 128 backup registers which represent a new layer in the memory hierarchy, essentially a programmer or compiler-managed cache, and (2) 8 vector registers holding up to 64 words each, and operated on by vector instructions. In this paper, we will describe the entire machine, discuss efficient ways to use the 656+ programmer-accessible registers, and discuss some of the design shortcomings.

Proceedings ArticleDOI
01 Oct 1978
TL;DR: This paper examines the memory hierarchy both overall and with respect to its components in an attempt to identify research problems and project future directions for both research and development.
Abstract: The memory hierarchy is usually the largest identifiable part of a computer system and making effective use of it is critical to the operation and use of the system. We consider the levels of such a memory hierarchy and describe the state of the art and likely directions for both research and development. Algorithmic and logical features of the hierarchy not directly associated with specific components are also discussed. Among the problems we believe to be the most significant are the following: (a) evaluate the effectiveness of gap filler technology as a level of storage between main memory and disk, and if it proves to be effective, determine how/where it should be used, (b) develop algorithms for the use of mass storage in a large computer system and (c) determine how cache memories should be implemented in very large, fast multiprocessor systems.

Book ChapterDOI
01 Jan 1978
TL;DR: The storage subsystem of a data processing system comprises those components that store programs and data, which includes the spectrum from bulk storage with its own microprograms to the buffer stores and registers located in the Central Processing Unit.
Abstract: The storage subsystem of a data processing system comprises those components that store programs and data. This includes the spectrum from bulk storage with its own microprograms to the buffer stores (cache stores) and registers located in the Central Processing Unit (Fig. 1). Present storage subsystems are almost exclusively implemented in the form of several independent storage hierarchies with their own address spaces. The cost of the storage subsystem can be more than 50 % of the total system hardware cost (Fig. 2).

Patent
19 Jan 1978
TL;DR: In this article, the FIFO store may form the cache memory of a multilevel storage system, the output of circuit 10 being decoded to select the storage area to receive replacement data from main storage when the requested data is not present in the cache.
Abstract: 1533831 FIFO storage INTERNATIONAL BUSINESS MACHINES CORP 13 June 1977 [2 July 1976] 24608/77 Heading G4A Each storage area 0-7 has a marker bit a-h, an encoding (EXOR) circuit 10 produces from the marker bits an indication of which storage area was the first to be loaded, and each time a storage area is loaded its marker bit is inverted, whereby the encoding circuit identifies the storage areas in a predetermined cyclic sequence. The encoding circuit shown produces coded bit combinations ABC in Gray code sequence when the marker bits are set from 0 to 1 in the order a, b, d, c, g, h, f, e and are subsequently reset from 1 to 0 in the same order as successive storage areas are loaded (overwriting occurring after the first sequence). The FIFO store may form the cache memory of a multilevel storage system, the output of circuit 10 being decoded to select the storage area to receive replacement data from main storage when the requested data is not present in the cache.
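
The marker-bit scheme is easier to see in simulation. Because the bits are inverted in a fixed cyclic order, the 8-bit marker vector always consists of two runs (a Johnson-counter pattern over the permuted order), and the boundary between the runs identifies the oldest area. The sketch below recovers that boundary with a search where the patent uses an XOR encoding circuit; the slot numbering is our assumption.

```python
LOAD_ORDER = [0, 1, 3, 2, 6, 7, 5, 4]       # areas a, b, d, c, g, h, f, e

class MarkerBitFifoSketch:
    """FIFO replacement via one invert-on-load marker bit per area."""

    def __init__(self):
        self.marker = [0] * 8                # marker bits a..h

    def oldest_area(self):
        bits = [self.marker[s] for s in LOAD_ORDER]
        for i in range(1, 8):
            if bits[i] != bits[0]:           # boundary between the two
                return LOAD_ORDER[i]         # runs marks the oldest area
        return LOAD_ORDER[0]                 # all equal: wrap to the start

    def load(self, data, store):
        area = self.oldest_area()            # area to receive replacement
        store[area] = data
        self.marker[area] ^= 1               # invert its marker bit
        return area
```

Starting from all-zero marker bits, successive loads visit areas a, b, d, c, g, h, f, e and then overwrite them again in the same order, which is exactly the FIFO behavior the patent's encoder extracts combinationally.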


Book ChapterDOI
01 Jan 1978
TL;DR: The evolution of recent years has shown that, with improving technology, the concepts of cache and hierarchical memories must also be taken into consideration for minicomputer systems.
Abstract: The evolution of recent years has shown that, with improving technology, the concepts of cache and hierarchical memories must also be taken into consideration for minicomputer systems [2, 8, 10]. As is well known, system performance depends critically on the addressed data being present, at the time of memory access, in the cache, the memory level closest to the processor. Therefore, the size and structure of the cache, as well as the organizing strategies, are of decisive importance [2, 3, 4, 7, 8].

Book ChapterDOI
01 Jan 1978
TL;DR: This chapter discusses design decisions for the PDP-11/60 midrange minicomputer, which incorporates a 2048-byte cache, memory management unit, and an integral floating-point unit as standard components.
Abstract: Publisher Summary This chapter discusses design decisions for the PDP-11/60 midrange minicomputer. The design of a midrange minicomputer is used as a concrete illustration of tradeoffs made to affect a price/performance balance. Designers use technology advances, for example, doubling of density on a memory chip, to produce new designs in one of two design styles: constant cost/increasing functionality or constant functionality/decreasing cost. By choosing a less powerful cache organization, the 11/60 design obtains a factor of five component reduction. Cache design also illustrates how some design parameters are highly interdependent. The frequency-driven design approach, used on the floating-point processor, can lead to a 40% performance gain. Examples of added functionality in the constant-cost style of design include greater reliability and maintainability and user microprogramming. Internal structure of the PDP-11/60 incorporates a 2048-byte cache, memory management unit, and an integral floating-point unit as standard components. The unit can perform a register-to-register add instruction in an average time of 530 ns, the internal cycle time being 170 ns.



Book ChapterDOI
01 Jan 1978
TL;DR: This chapter focuses on cache memories for PDP-11 family computers, a small, fast, associative memory located between the central processor Pc and the primary memory Mp.
Abstract: Publisher Summary This chapter focuses on cache memories for PDP-11 family computers. One of the most important concepts in computer systems is that of a memory hierarchy. A memory hierarchy is simply a memory system built of two (or more) memory technologies. A cache memory is a small, fast, associative memory located between the central processor Pc and the primary memory Mp. Typically, the cache is implemented in bipolar technology, while Mp is implemented in MOS or magnetic core technology. The cache stores address-data (AD) pairs, each consisting of an Mp address and a copy of the contents of the Mp location corresponding to that address. The most common form of cache organization is fully associative, with the data portion of the AD pair corresponding to the basic addressable unit of memory. In a fully associative cache, any AD pair can be stored in any cache location. A set associative cache consists of a number of sets, which are accessed by indexing rather than by association. Each of the sets contains one or more AD pairs. The performance goals of the PDP-11/70 computer system require the typical miss ratio to be 0.1 or less.
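
The AD-pair organization translates directly into code. The sketch below implements the set associative case with LRU ordering within each set (an assumption; the chapter describes the structure, not the replacement rule); a fully associative cache is the special case of a single set.

```python
class SetAssociativeCacheSketch:
    """Sets indexed by address bits; AD pairs searched associatively."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways                            # AD pairs per set
        self.sets = [[] for _ in range(num_sets)]   # lists of (addr, data)

    def access(self, mp_address, mp):
        s = self.sets[mp_address % self.num_sets]   # index into a set
        for i, (a, d) in enumerate(s):
            if a == mp_address:                     # associative match
                s.insert(0, s.pop(i))               # refresh LRU position
                return d, True                      # hit
        data = mp[mp_address]                       # miss: fetch from Mp
        if len(s) >= self.ways:
            s.pop()                                 # evict the LRU pair
        s.insert(0, (mp_address, data))
        return data, False
```

With num_sets=1 and ways equal to the total capacity, access degenerates to the fully associative organization described above.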

Journal ArticleDOI
01 Nov 1978
TL;DR: The results indicate the viability of systems utilising cache memories and pipelined switches, which exhibit performance comparable to systems with crosspoint switches, and suggest that m.i.m.d. systems with pipelined binary switches can be implemented at a lower cost than those with crosspoint switches.
Abstract: Simulation results of a multiple-instruction multiple-data-stream (m.i.m.d.) organisation are presented. The results deal with the behaviour of throughput performance with respect to variations in cache-memory parameters, number of processors, and processing time of an m.i.m.d. system in which a pipelined binary switch is used as the interconnection network. The results indicate the viability of systems utilising cache memories and pipelined switches, which exhibit performance comparable to systems with crosspoint switches. This aspect is attractive, since it is likely that m.i.m.d. systems with pipelined binary switches can be implemented at a lower cost than those with crosspoint switches.


Patent
17 Feb 1978
TL;DR: In this article, a cache store and a backing store are combined with parity check circuits for detecting errors in the addresses and information read from the cache store during a read cycle of operation.
Abstract: A memory system includes a cache store and a backing store. The cache store provides fast access to blocks of information previously fetched from the backing store in response to commands. The backing store includes error detection and correction apparatus for detecting and correcting errors in the information read from backing store during a backing store cycle of operation. The cache store includes parity generation circuits which generate check bits for the addresses to be written into a directory associated therewith. Additionally, the cache store includes parity check circuits for detecting errors in the addresses and information read from the cache store during a read cycle of operation. The memory system further includes control apparatus for enabling for operation the backing store and cache store in response to the commands. The control apparatus includes circuits which couple to the parity check circuits. Such circuits are operative, upon detecting an error in either an address or information read from the cache store, to simulate a condition that the information requested was not stored in the cache store. This causes the control apparatus to initiate a backing store cycle of operation for read out of a correct version of the requested information, thereby eliminating the necessity of including in the cache store more complex detection and correction circuits.
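
The error-handling trick is simple to state in code: a parity failure on a cache read is reported as a miss, so the normal miss path fetches the ECC-corrected copy from the backing store. The callable interfaces below are assumptions for illustration.

```python
def read_with_parity_fallback(addr, cache_lookup, parity_ok, backing_read):
    """Return the requested word, falling back to the backing store.

    Assumed interfaces: cache_lookup(addr) -> (hit, word);
    parity_ok(addr, word) -> bool; backing_read(addr) -> corrected
    word (and, in the real system, a cache refill).
    """
    hit, word = cache_lookup(addr)
    if hit and parity_ok(addr, word):
        return word                  # clean cache hit
    # Miss, or a hit whose address/data failed its parity check:
    # behave as if the data were simply not in the cache and take a
    # backing store cycle for the corrected version.
    return backing_read(addr)
```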