
Showing papers on "Cache coloring published in 1982"


Journal ArticleDOI
TL;DR: In this article, an analytical model for the program behavior of a multitasked system is introduced, including the behavior of each process and the interactions between processes with regard to the sharing of data blocks.
Abstract: In many commercial multiprocessor systems, each processor accesses the memory through a private cache. One problem that could limit the extensibility of the system and its performance is the enforcement of cache coherence. A mechanism must exist which prevents the existence of several different copies of the same data block in different private caches. In this paper, we present an in-depth analysis of the effects of cache coherency in multiprocessors. A novel analytical model for the program behavior of a multitasked system is introduced. The model includes the behavior of each process and the interactions between processes with regard to the sharing of data blocks. An approximation is developed to derive the main effects of the cache coherency contributing to degradations in system performance.

133 citations


Journal ArticleDOI
01 Apr 1982
TL;DR: An in-depth analysis of the effects of cache coherency in multiprocessors is presented and a novel analytical model for the program behavior of a multitasked system is introduced.
Abstract: In many commercial multiprocessor systems, each processor accesses the memory through a private cache. One problem that could limit the extensibility of the system and its performance is the enforcement of cache coherence. A mechanism must exist which prevents the existence of several different copies of the same data block in different private caches. In this paper, we present an in-depth analysis of the effects of cache coherency in multiprocessors. A novel analytical model for the program behavior of a multitasked system is introduced. The model includes the behavior of each process and the interactions between processes with regard to the sharing of data blocks. An approximation is developed to derive the main effects of the cache coherency contributing to degradations in system performance.

109 citations
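
Neither record reproduces the model's equations. As a purely hypothetical illustration of the degradation effect being analysed, the Python sketch below folds sharing-induced invalidation misses into an effective miss ratio; the additive form and all parameter names are invented here, not taken from the paper.

    # Toy approximation (not the authors' model): coherence invalidations are
    # treated as extra misses on top of the miss ratio each private cache
    # would see with no sharing at all.
    def effective_miss_ratio(base_miss_ratio, shared_ref_fraction, invalidation_prob):
        # shared_ref_fraction: fraction of references that touch shared blocks
        # invalidation_prob:   chance such a block was invalidated by another CPU
        return base_miss_ratio + shared_ref_fraction * invalidation_prob

    def avg_access_time(miss_ratio, miss_penalty, hit_time=1.0):
        return hit_time + miss_ratio * miss_penalty

    if __name__ == "__main__":
        alone = avg_access_time(effective_miss_ratio(0.02, 0.0, 0.0), miss_penalty=10)
        shared = avg_access_time(effective_miss_ratio(0.02, 0.1, 0.5), miss_penalty=10)
        print(f"slowdown due to coherence traffic: {shared / alone:.2f}x")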


Journal ArticleDOI
TL;DR: An approximate analytical model for the performance of multiprocessors with private cache memories and a single shared main memory is presented and is found to be very good over a broad range of parameters.
Abstract: This paper presents an approximate analytical model for the performance of multiprocessors with private cache memories and a single shared main memory. The accuracy of the model is compared with simulation results and is found to be very good over a broad range of parameters. The parameters of the model are the size of the multiprocessor, the size and type of the interconnection network, the cache miss-ratio, and the cache block transfer time. The analysis is extended to include several different read/write policies such as write-through, load-through, and buffered write-back. The analytical technique presented is also applicable to the performance of interconnection networks under block transfer mode.

78 citations
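
The abstract does not state the model itself; the sketch below is an invented toy version of the relationship it parameterizes, turning a cache miss ratio, a block transfer time and a stand-in contention term for the interconnection network into a processor efficiency figure.

    # Toy sketch, not the paper's equations: a processor completes one reference per
    # cycle when it hits, and stalls for the block transfer (plus any time spent
    # waiting on the interconnection network) when it misses.
    def processor_efficiency(miss_ratio, block_transfer_time, network_wait=0.0):
        stall_cycles_per_ref = miss_ratio * (block_transfer_time + network_wait)
        return 1.0 / (1.0 + stall_cycles_per_ref)

    if __name__ == "__main__":
        for m in (0.02, 0.05, 0.10):
            print(f"miss ratio {m:.2f}: efficiency "
                  f"{processor_efficiency(m, block_transfer_time=8, network_wait=4):.2f}")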


Patent
03 Mar 1982
TL;DR: In a hierarchical memory system, replacement of segments in a cache memory is governed by a least recently used algorithm, while trickling of segments from the cache memory to the bulk memory is governed by the age since first write.
Abstract: In a hierarchical memory system, replacement of segments in a cache memory is governed by a least recently used algorithm, while trickling of segments from the cache memory to the bulk memory is governed by the age since first write. The host processor passes an AGEOLD parameter to the memory subsystem and this parameter regulates the trickling of segments. Unless the memory system is idle (no I/O activity), no trickling takes place until the age of the oldest written-to segment is at least as great as AGEOLD. A command is generated for each segment to be trickled and the priority of execution assigned to such commands is variable and determined by the relationship of AGEOLD to the oldest age since first write of any of the segments. If the subsystem receives no command from the host processor for a predetermined interval, AGEOLD is ignored and any written-to segment becomes a candidate for trickling.

68 citations
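
A minimal sketch of the trickling rule described above, with invented class and method names: first-write ages are tracked per segment, and trickle candidates are released only when the subsystem is idle, the host has been silent for too long, or the oldest age reaches AGEOLD.

    import time

    class TrickleScheduler:
        def __init__(self, age_old_seconds):
            self.age_old = age_old_seconds     # AGEOLD, supplied by the host processor
            self.first_write_time = {}         # segment id -> time of its first write

        def note_first_write(self, segment_id):
            self.first_write_time.setdefault(segment_id, time.monotonic())

        def note_destaged(self, segment_id):
            self.first_write_time.pop(segment_id, None)

        def trickle_candidates(self, subsystem_idle, host_silent=False):
            if not self.first_write_time:
                return []
            oldest_age = time.monotonic() - min(self.first_write_time.values())
            # idle subsystem or a silent host ignores AGEOLD; otherwise wait for it
            if subsystem_idle or host_silent or oldest_age >= self.age_old:
                return sorted(self.first_write_time, key=self.first_write_time.get)
            return []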


Patent
03 Mar 1982
TL;DR: In this paper, a timestamp is generated with each write command and the timestamp accompanying a write command which is the first command to write to a segment after that segment is moved from bulk memory to the cache is entered into the list at the most recently used position.
Abstract: In a data processing system including a processor, a bulk memory, a cache, and a storage control unit for controlling the transfer of data between the bulk memory and the cache, a timestamp is generated with each write command. A linked list is maintained, having an entry therein corresponding to each segment in the cache which has been written to since it was moved from the bulk memory to the cache. The timestamp accompanying a write command which is the first command to write to a segment after that segment is moved from bulk memory to the cache is entered into the list at the most recently used position. An entry in the linked list is removed from the list when the segment corresponding thereto is transferred from the cache to the bulk memory. The linked list is utilized to update a value TOLDEST, which represents the age of the oldest written-to segment in the cache that has not been returned to bulk memory since it was first written to. The processor periodically issues a command which transfers TOLDEST from the subsystem to the processor. In case of a cache failure, such as might result from a power loss, the processor may sense the latest value of TOLDEST together with other file recovery synchronization information and determine what part of the data which it sent to the cache was lost because it did not get recorded in the bulk memory.

68 citations
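
A sketch of the timestamped list described above (names invented): only the first write to a resident segment is recorded, destaging removes the entry, and TOLDEST is the age of the entry at the head of the list.

    import collections
    import time

    class WrittenToList:
        def __init__(self):
            # insertion-ordered map: segment id -> timestamp of its FIRST write
            # while resident in cache; the oldest entry sits at the front
            self.first_write = collections.OrderedDict()

        def on_write(self, segment_id, timestamp=None):
            if segment_id not in self.first_write:        # later writes don't change it
                self.first_write[segment_id] = timestamp or time.monotonic()

        def on_destage(self, segment_id):
            self.first_write.pop(segment_id, None)         # copied back to bulk memory

        def toldest(self):
            # age of the oldest written-to segment not yet returned to bulk memory
            if not self.first_write:
                return 0.0
            oldest_ts = next(iter(self.first_write.values()))
            return time.monotonic() - oldest_ts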


Patent
18 Oct 1982
TL;DR: In this paper, the authors modify cache addressing in order to decrease the cache miss rate based on a statistical observation that the lowest and highest locations in pages in main storage page frames are usually accessed at a higher frequency than intermediate locations in the pages.
Abstract: The described embodiment modifies cache addressing in order to decrease the cache miss rate based on a statistical observation that the lowest and highest locations in pages in main storage page frames are usually accessed at a higher frequency than intermediate locations in the pages. Cache class addressing controls are modified to change the distribution of cache contained data more uniformly among the congruence classes in the cache (by comparison with conventional cache class distribution). The cache addressing controls change the congruence class address as a function of the state of a higher-order bit or field in any CPU requested address.

54 citations
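
The patent leaves the exact mapping open; as one hypothetical realization of "change the congruence class address as a function of a higher-order bit or field", the sketch below XORs a field taken from the page-frame bits into the conventional class index, so the heavily used lowest (and highest) locations of successive page frames no longer pile into the same congruence class.

    NUM_CLASSES = 128        # power of two, selected by mid-order address bits
    LINE_BYTES = 32
    PAGE_BYTES = 4096

    def conventional_class(addr):
        return (addr // LINE_BYTES) % NUM_CLASSES

    def modified_class(addr):
        mix = (addr // PAGE_BYTES) % NUM_CLASSES      # higher-order field (page frame)
        return conventional_class(addr) ^ mix         # XOR keeps the mapping 1:1 per page

    if __name__ == "__main__":
        for frame in range(4):
            lowest = frame * PAGE_BYTES               # first location of each page frame
            print(frame, conventional_class(lowest), modified_class(lowest))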


Patent
31 Mar 1982
TL;DR: In this paper, the directory and cache store of a multilevel set associative cache system are organized in levels of memory locations, and a round robin replacement apparatus is used to identify in which one of the multi-levels information is to be replaced.
Abstract: The directory and cache store of a multilevel set associative cache system are organized in levels of memory locations. Round robin replacement apparatus is used to identify in which one of the multilevels information is to be replaced. The directory includes parity detection apparatus for detecting errors in the addresses being written in the directory during a cache memory cycle of operation. Control apparatus combines such parity errors with signals indicative of directory hits to produce invalid hit detection signals. The control apparatus in response to the occurrence of a first invalid hit detection signal conditions the round robin apparatus as well as other portions of the cache system to limit cache operation to those sections whose levels are error free thereby gracefully degrading cache operation.

52 citations
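
A minimal sketch, with invented names, of the degradation behaviour described: the round-robin replacement pointer simply skips any level whose directory has produced an invalid-hit (parity-error) signal.

    class DegradableRoundRobin:
        def __init__(self, num_levels=4):
            self.num_levels = num_levels
            self.pointer = 0
            self.level_ok = [True] * num_levels   # cleared on an invalid hit detection

        def mark_level_bad(self, level):
            self.level_ok[level] = False          # graceful degradation: stop using it

        def next_replacement_level(self):
            if not any(self.level_ok):
                raise RuntimeError("all cache levels disabled")
            while True:
                level = self.pointer
                self.pointer = (self.pointer + 1) % self.num_levels
                if self.level_ok[level]:
                    return level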


Patent
James W. Keeley1
31 Mar 1982
TL;DR: In this paper, the circuits of a cache unit constructed from a single board are divided into a cache memory section and a controller section, and the cache unit is connectable to the central processing unit of a data processing system through the interface circuits of the controller section.
Abstract: The circuits of a cache unit constructed from a single board are divided into a cache memory section and a controller section. The cache unit is connectable to the central processing unit (CPU) of a data processing system through the interface circuits of the controller section. Test mode logic circuits included within the cache memory section enable cache memories to be tested without controller interference utilizing the same controller interface circuits.

51 citations


Patent
03 Mar 1982
TL;DR: In this article, a command queue is maintained for storing commands waiting to be executed, and each command is assigned a priority level for execution vis-a-vis other commands in the queue.
Abstract: In a system having a cache memory and a bulk memory, and wherein a command queue is maintained for storing commands waiting to be executed, each command is assigned a priority level for execution vis-a-vis other commands in the queue. Commands are generated for transferring from the cache memory to the bulk memory segments of data which have been written to while resident in the cache memory. Each generated command may be assigned a generated priority level which is dependent upon the number of segments in the cache memory which have been written to but not yet copied into the bulk memory. A second priority level may be generated which is dependent on the time which has elapsed since the first write into any of the cache memory segments, and the priority level assigned to any given generated command is the higher of the two generated priority levels.

50 citations
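
A sketch of the max-of-two-priorities rule; only that rule comes from the abstract, while the thresholds and scaling below are invented for illustration.

    def trickle_priority(dirty_segment_count, seconds_since_first_write):
        # one priority grows with the backlog of written-to, not-yet-copied segments
        backlog_priority = min(dirty_segment_count // 8, 7)
        # the other grows with the time elapsed since the first write to any segment
        age_priority = min(int(seconds_since_first_write) // 10, 7)
        # the generated command is assigned the higher of the two levels
        return max(backlog_priority, age_priority)

    if __name__ == "__main__":
        print(trickle_priority(dirty_segment_count=3, seconds_since_first_write=45))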


Patent
03 Mar 1982
TL;DR: In this article, a file status flipflop is provided with an auxiliary power supply to maintain it in its present state in the event all power to the cache memory is terminated, and the output of the status flip flop may be sampled by a command from the host processor to find out if any data was lost because written-to segments were present in cache memory at the time all power was lost.
Abstract: A data processing system has a host processor, a RAM, a cache memory for storing segments of data, a plurality of disk drive devices and a storage control unit for controlling data transfers, data from the host processor being written to the cache memory and subsequently destaged to the disks. The storage control unit continuously updates a variable indicating the number of written-to segments resident in the cache memory that have not been destaged to the disks. The variable is stored in the RAM. A File Status flipflop is responsive to the variable to produce a "good file" signal when the variable is zero. The File Status flipflop is provided with an auxiliary power supply to maintain it in its present state in the event all power to the cache memory is terminated. Upon restoration of power after a complete power loss, the output of the status flipflop may be sampled by a command from the host processor to find out if any data was lost because written-to segments were present in the cache memory at the time all power was lost. The output of the File Status flipflop is also utilized to inhibit manually actuated functions which would destroy or invalidate written-to segments in the cache memory if the power to the cache memory were turned off, the cache memory were taken off line, or the port select switches between cache memory and the storage control unit were actuated to change port connections.

48 citations


Patent
25 Jan 1982
TL;DR: In this paper, a cache clearing apparatus for a multiprocessor data processing system having a cache unit and a duplicate directory associated with each processor is described, where commands affecting information segments within the main memory are transferred by the system controller unit to each of the duplicate directories to determine if the information segment affected is stored in the cache memory of its associated cache memory.
Abstract: A cache clearing apparatus for a multiprocessor data processing system having a cache unit and a duplicate directory associated with each processor. The duplicate directory, which reflects the contents of the cache directory within its associated cache unit, and the cache directory are connected through a system controller unit. Commands affecting information segments within the main memory are transferred by the system controller unit to each of the duplicate directories to determine if the information segment affected is stored in the cache memory of its associated cache unit. If the information segment is stored therein, the duplicate directory issues a clear command through the system controller to clear the information segment from the associated cache unit.
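
An illustrative model of the clearing flow (data structures invented): the system controller checks every processor's duplicate directory and sends a clear command only to the cache units that actually hold the affected segment.

    class DuplicateDirectory:
        def __init__(self, cache):
            self.cache = cache                    # the cache unit this directory mirrors
            self.entries = set(cache)             # segment addresses currently resident

        def holds(self, address):
            return address in self.entries

        def clear(self, address):
            self.entries.discard(address)
            self.cache.pop(address, None)         # clear command to the associated cache

    def broadcast_store(address, duplicate_directories):
        # a command affecting main memory is routed to every duplicate directory
        for dup in duplicate_directories:
            if dup.holds(address):
                dup.clear(address)

    if __name__ == "__main__":
        caches = [{0x100: "a", 0x200: "b"}, {0x200: "c"}]
        broadcast_store(0x200, [DuplicateDirectory(c) for c in caches])
        print(caches)    # 0x200 cleared everywhere, 0x100 untouched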

Patent
25 Mar 1982
TL;DR: In this paper, a cache memory system reduces cache interference during direct memory access block write operations to main memory by resetting all validity bits for the block in a single cache cycle.
Abstract: A cache memory system reduces cache interference during direct memory access block write operations to main memory. A control memory within cache contains in a single location validity bits for each word in a memory block. In response to the first word transferred at the beginning of a direct memory access block write operation to main memory, all validity bits for the block are reset in a single cache cycle. Cache is thereafter free to be read by the central processor during the time that the remaining words of the block are written without the need for additional cache invalidation memory cycles.
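
A sketch of the control-memory layout this implies: one word of validity bits per block, so the first word of a direct memory access block write can invalidate the whole block in a single operation instead of one cycle per word (sizes invented).

    WORDS_PER_BLOCK = 8

    class BlockValidity:
        def __init__(self, num_blocks):
            # one integer per block; bit i means "word i of the block is valid in cache"
            self.bits = [0] * num_blocks

        def mark_word_valid(self, block, word):
            self.bits[block] |= 1 << word

        def word_valid(self, block, word):
            return bool(self.bits[block] & (1 << word))

        def invalidate_block(self, block):
            # on the first word of a DMA block write: one reset covers every word,
            # leaving the cache free for CPU reads while the rest of the block arrives
            self.bits[block] = 0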

Dissertation
01 Jan 1982
TL;DR: Two cache management models are developed: the prompting model and the explicit management model. Both rely on software-based enhancement methods that proved successful in boosting main memory performance, and optimal data packing is found to be a hard problem.
Abstract: An ideal high performance computer includes a fast processor and a multi-million byte memory of comparable speed. Since it is currently economically infeasible to have large memories with speeds matching the processor, hardware designers have included the cache. Because of its small size, and its effectiveness in eliminating the speed mismatch, the cache has become a common feature of high performance computers. Enhancing cache performance proved to be instrumental in the speed-up of cache-based computers. In most cases enhancement methods could be classified as either software-based or hardware-controlled. Software-based improvement methods that proved to be very effective in main memory were usually considered to be inapplicable to the cache. A main reason has been the cache's transparency to programs, and the fast response time of main memory. This resulted in only hardware enhancement features being considered and implemented for the cache. Developments in program optimization by the compiler were successful in improving the program's performance, and the understanding of program behavior. Coupling the information about a program's behavior with knowledge of the hardware structure became a good approach to optimization. With this premise we developed two cache management models: the prompting model and the explicit management model. Both models rely on the underlying concepts of prefetching, clustering (packing), and loop transformations. All three are software-based enhancement methods that proved to be successful in boosting main memory performance. In analyzing these methods for possible implementation in the cache we found that optimal data packing is a hard problem. Nevertheless, we suggested various heuristic methods for effective packing. We then set forth a number of conditions for loop transformations. The aim of these transformations is to facilitate prefetching (preloading) of cache blocks during loop execution. In both models the compiler places preload requests within the program's code. These requests are serviced in parallel with program execution. Replacement decisions are determined at compile time in the explicit model, but are fully controlled by the hardware in the prompting model. In this model special tag bits are introduced to each cache block in order to facilitate replacement decisions. The handling of aggregate data elements (arrays) is also discussed in the thesis. In the explicit model a special indexing scheme is introduced for controlling array access in the cache. In addition, main memory addresses are only generated for block load requests; all other addresses are for the cache.
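
The thesis places preload requests in the program's code at compile time; as a hedged, high-level analogue of that idea (the preload primitive and distances below are invented), a loop can be transformed so that each block's load is requested a fixed number of blocks ahead of its use and serviced in parallel with execution.

    BLOCK_WORDS = 16
    PRELOAD_DISTANCE = 2          # request blocks this far ahead of the current access

    def preload(block_index):
        pass                      # stand-in for the compiler-inserted cache prompt

    def sum_array(a):
        total = 0
        for i, x in enumerate(a):
            if i % BLOCK_WORDS == 0:                          # once per block boundary
                preload(i // BLOCK_WORDS + PRELOAD_DISTANCE)  # ask for a future block
            total += x
        return total

    if __name__ == "__main__":
        print(sum_array(list(range(100))))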

Patent
24 May 1982
TL;DR: In this paper, a cache memory (42) is provided for storing blocks of data which are most likely to be needed in the near future, and a conflict chain is set-up so that checking the contents of the cache memory can be done simply and quickly.
Abstract: A controller I/O (20) for transferring data between a host processor (10) and a plurality of attachment devices (16) comprises a cache memory (42) provided for storing blocks of data which are most likely to be needed in the near future. When transferring data to cache memory (42) from an attachment device (16), additional unrequested information can be transferred at the same time if it is likely that this additional data will soon be requested. Further memory (47) includes a directory table wherein all data in cache memory (42) is listed at a "home" position and, if more than one block of data in cache memory (42) has the same home position, a conflict chain is set up so that checking the contents of the cache memory (42) can be done simply and quickly.
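
A sketch of the directory-table idea in the abstract (slot count and names invented): every cached block is listed at a home slot derived from its address, and blocks whose homes collide are kept on a conflict chain so a lookup is a short walk rather than a full search.

    class CacheDirectory:
        def __init__(self, num_slots=256):
            self.num_slots = num_slots
            self.slots = [[] for _ in range(num_slots)]   # each slot is a conflict chain

        def home(self, block_address):
            return block_address % self.num_slots

        def insert(self, block_address, cache_frame):
            self.slots[self.home(block_address)].append((block_address, cache_frame))

        def lookup(self, block_address):
            for addr, frame in self.slots[self.home(block_address)]:
                if addr == block_address:
                    return frame                          # block is resident in cache
            return None                                   # cache miss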

Patent
10 May 1982
TL;DR: In this article, a cache memory 30 having a thirteen bit word length is illustrated for storing more than one data word read from a system memory 80 having an eight bit word size and providing the stored words to a video display 18 on a reoccurring basis.
Abstract: A cache memory 30 having a thirteen bit word length is illustrated for storing more than one data word read from a system memory 80 having an eight bit word length and providing the stored words to a video display 18 on a reoccurring basis. The cache memory 30 has a storage capacity sufficient to store the words for one row of display text characters. Two system memory bytes are concatenated by a latch 74 and storage buffer 96 prior to writing into the cache memory 30. After the scanning of the first word of the last scan line in the current cathode ray tube (19) display text row, the first word of the new text row is written into the cache memory (30) from the system memory (80). The write procedure is completed before the last word of the new text row is read from the cache memory 30 to the cathode ray tube display 19 during the writing of the first scan line of the new text row.

Patent
Gerner Manfred Dipl Ing1
17 Aug 1982
TL;DR: In this article, a cache store comprising an associative store (CAM) and a write/read store (RAM) is integrated in a micropressor chip, which can be divided on the logic level into a program cache store, a micro-programme cache store and a data cache store of variable size.
Abstract: 1. A cache store comprising an associative store (CAM) and a write/read store (RAM), characterized by the following features : - the cache store is integrated in a micropressor chip, - it can be divided on the logic level into a programme cache store, a micro-programme cache store and a data cache store of variable size.

Patent
26 Nov 1982
TL;DR: In this article, a cache memory (30) is provided to receive data expected to be required by the host computer (10), and a single cache memory serves all directors (12, 14).
Abstract: A host computer (10) has a plurality of channels (16,18) for writing data to and reading data from secondary memory means (26) such as disk memories, via directors (12,14) and control modules (24) for the secondary memories. In order to speed up memory accesses, a cache memory (30) is provided to receive data expected to be required by the host computer (10). A single cache memory (30) serves all directors (12,14). Data entered into the cache memory flows from a disk memory (26), a control module (24) therefor and a director (12 or 14) to the cache memory (30). Data flows to the host computer from the cache memory (30) through a director (12 or 14) and the corresponding channel (16 or 18). A microprocessor control unit (32) determines what data should be cached and memory allocation within the cache memory (30).

Patent
Robert Percy Fletcher1
16 Nov 1982
TL;DR: In this article, the authors propose a hybrid cache storage system, where a sharing flag is provided with each line representation in the cache directory to uniquely indicate for each cache line whether it is to be handled as a store-in-cache (SIC) line when its SH flag is in non-sharing state, and as a SIC line in sharing state.
Abstract: In a cache storage system a sharing (SH) flag is provided with each line representation in the cache directory to uniquely indicate for each cache line whether it is to be handled as a store-in-cache (SIC) line when its SH flag is in non-sharing state, and as a store-through (ST) cache line when its SH flag is in sharing state. At any time the hybrid cache can have some lines operating as ST lines, and other lines as SIC lines. Such cache storage systems are used as private caches in a multiprocessor (MP) system. A newly fetched line (resulting from a cache miss) has its SH flag set to non-sharing (SIC) state in its location determined by cache replacement selection circuits, unless the SH flag for the requested line is dynamically set to sharing (ST) state when a cross-interrogation (XI) hit in another cache is found by the cross-interrogation (XI) controls, which interrogate all other cache directories in the MP for every store or fetch cache miss and for every store cache hit of a ST line (having SH=1). An XI hit signals that a conflicting copy of the line has been found in another cache. If the conflicting cache line has been changed from its corresponding main storage (MS) line, the cache line is cast out to MS. The sharing (SH) flag for the conflicting line is set to sharing state for a fetch miss, but the conflicting line is invalidated for a store miss.
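
A much-simplified model of the SH-flag behaviour summarized above (it ignores replacement, directories and timing, and the helper names are invented): SH=False means the line is handled store-in, SH=True means store-through, and cross-interrogation on a miss decides the flag and casts out or invalidates conflicting copies.

    class Line:
        def __init__(self, changed=False, sh=False):
            self.changed = changed        # line differs from its main storage (MS) copy
            self.sh = sh                  # sharing flag: False = SIC, True = ST

    def fetch_miss(local, others, addr, main_storage):
        shared = False
        for cache in others:              # cross-interrogation of the other private caches
            line = cache.get(addr)
            if line is not None:
                shared = True
                if line.changed:          # changed conflicting copy is cast out to MS
                    main_storage[addr] = "castout"
                    line.changed = False
                line.sh = True            # fetch miss: conflicting line becomes ST
        local[addr] = Line(sh=shared)     # new line is SIC unless an XI hit was found

    def store_miss(local, others, addr, main_storage):
        for cache in others:
            line = cache.pop(addr, None)  # store miss: conflicting copies are invalidated
            if line is not None and line.changed:
                main_storage[addr] = "castout"
        local[addr] = Line(changed=True)  # stored-into line starts out as SIC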

Proceedings ArticleDOI
01 Jan 1982
TL;DR: In this paper, the authors present a unified nomenclature for the description of cache memory systems and compare the performance of different cache memory architectures, including a programmable cache memory architecture, where intelligence is added to the cache to direct the activity between the cache and the main memory.
Abstract: The tutorial presents a unified nomenclature for the description of cache memory systems. Using this foundation, examples of existing cache memory systems are detailed and compared. The second presentation discusses a programmable cache memory architecture. In this architecture, intelligence is added to the cache to direct the activity between the cache and the main memory. Also to be described are heuristics for programming the cache which allow the additional power to be exploited. The third presentation deals with innovations involving systems where the cache memory is not used as a simple high speed buffer for main memory. A straightforward example of this appears in IBM's Translation Lookaside Buffer on 370s with dynamic address translation hardware. Other examples to be described include a cache system for the activation stack of a block structured language, a cache system to store subexpressions for an expression oriented architecture, and a multiprocessor architecture that relies on two levels of cache.

Patent
Dana R Spencer1
28 May 1982
TL;DR: In this paper, the cache allocates a line (i.e. block) for LS use by the instruction unit (IE) sending a special signal with an address for a line in a special area in main storage which is non-program addressable.
Abstract: The processor contains a relatively small local storage (LS 12) which can be effectively expanded by utilizing a portion of a processor's store-in-cache (63). The cache allocates a line (i.e. block) for LS use by the instruction unit (IE) sending a special signal with an address for a line in a special area in main storage which is non-program addressable (i.e. not addressable by any of the architected instructions of the processor). The special signal suppresses the normal line fetch operation of the cache from main storage caused when the cache does not have a requested line. After the initial allocation of the line space in the cache to LS use, the normal cache operation is again enabled, and the LS line can be castout to the special area in main storage and be retrieved therefrom to the cache for LS use.

Patent
26 Nov 1982
TL;DR: In this paper, a host computer (10) is backed up by long term secondary magnetic disk storage means (14) coupled to the computer by channels (12), a storage director (16), and a control module (18).
Abstract: A host computer (10) is backed up by long term secondary magnetic disk storage means (14) coupled to the computer by channels (12), a storage director (16) and a control module (18). A cache memory (22) with an associated cache manager (24) are also connected to the storage director (16) for storing data which the host computer (10) is likely to require. In order to allow automatic transfer to the cache memory (22) of only that data which is likely to be required, the storage director (16) and cache manager (24) determine when accessed data from the disk storage means (14) appears to be part of sequential data because it lacks indications to the contrary, such as embedded SEEK instructions. When data lacks such counter indications, automatic transfers to the cache memory (22) occur a track at a time.

Patent
26 Oct 1982
TL;DR: In this paper, a multiprocessing three level memory hierarchy implementation is described which uses a "write" flag and a "share" flag per pages of information stored in a level three main memory.
Abstract: A multiprocessing three level memory hierarchy implementation is described which uses a "write" flag and a "share" flag per pages of information stored in a level three main memory. These two flag bits are utilized to communicate from main memory (4) at level three to private and shared caches (12, 27; 20, 30; 14; 22) at memory levels one and two how a given page of information is to be used. Essentially, pages which can be both written and shared are moved from main memory to the shared level two cache and then to the shared level one cache, with the processors executing from the shared level one cache. All other pages are moved from main memory to the private level two and level one caches of the requesting processor. Thus, a processor executes either from its private or shared level one cache. This allows several processors to share a level three common main memory without encountering cross interrogation overhead. If uniform status within a page cannot be guaranteed at the main memory interface, the shared cache configuration does not interface with main memory but, in parallel, with the private caches at an appropriate intermediate level.
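
A sketch of the page-routing rule the two flags encode: only pages marked both writable and shared are placed in the shared level two and level one caches; every other page goes to the requesting processor's private caches.

    def destination_caches(write_flag, share_flag):
        # flags travel with the page from the level three main memory
        if write_flag and share_flag:
            return "shared L2 cache -> shared L1 cache"
        return "private L2 and L1 caches of the requesting processor"

    if __name__ == "__main__":
        for w in (False, True):
            for s in (False, True):
                print(f"write={w!s:5} share={s!s:5} -> {destination_caches(w, s)}")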

Patent
14 Jul 1982
TL;DR: In this paper, the authors propose an efficient promotion of data from a backing store (disk storage apparatus 16-18 termed DASD) to a random access cache 40 in a storage system such as used for swap and paging data transfers.
Abstract: The efficient promotion of data from a backing store (disk storage apparatus 16-18 termed DASD) to a random access cache 40 in a storage system such as used for swap and paging data transfers. When a sequential access indicator (SEQ in 22) is sent to and retained in the storage system, all data specified in a subsequent read "paging mode" command is fetched to the cache from DASD. If such prefetched data is replaced from cache and the sequential bit is on, a subsequent host access request for such data causes all related data not yet read to be promoted to cache. A maximal amount only of related data may be promoted; such maximal amount is determined by cache addressing characteristics and DASD access delay boundaries. Without the sequential bit on, only the addressed data block is promoted to cache.

Patent
03 Aug 1982
TL;DR: In this article, the n-bit portion of a desired address from an associated CPU selects a location in directory 202, and the m-bit address portions in the 4 levels I to IV of that location are compared with the desired address on a match, the corresponding level of the corresponding location of the data store is accessed to access the desired word.
Abstract: A cache memory comprises a directory 202 and a data store 201. The n-bit portion of a desired address from an associated CPU selects a location in directory 202, and the m-bit address portions in the 4 levels I to IV of that location are compared at 203 with the m-bit portion of the desired address. On a match, the corresponding level of the corresponding location of the data store 201 is accessed to obtain the desired word. The cache words should mirror the contents of the main memory, but the latter may be changed by e.g. another CPU or an IOC, and the resulting invalid addresses must be cleared from the cache memory. This is done by searching the directory 202 for an invalid address during the second half of a cache cycle, after the directory has been searched to determine whether the desired word is in the cache and while that desired word is being accessed in the cache store 201. If an invalid address is found, the second half of the next cache cycle is used to clear it from the cache, by resetting the full/empty indicator in the directory control portion C for that level and that location.
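
A sketch of the lookup and clearing steps described above, with the field widths invented: the n-bit portion picks a directory location, the m-bit portions of the four levels are compared against the request, and clearing an invalid address is just resetting the matching level's full/empty indicator.

    LEVELS = 4
    SETS = 256                            # addressed by the n-bit portion

    class Directory:
        def __init__(self):
            # per location, one [m-bit tag, full/empty indicator] pair per level
            self.locations = [[[None, False] for _ in range(LEVELS)]
                              for _ in range(SETS)]

        def split(self, addr):
            return addr % SETS, addr // SETS      # (n-bit set index, m-bit tag)

        def lookup(self, addr):
            set_index, tag = self.split(addr)
            for level, (stored_tag, full) in enumerate(self.locations[set_index]):
                if full and stored_tag == tag:
                    return level                  # data store is accessed at this level
            return None                           # miss

        def clear_invalid(self, addr):
            set_index, tag = self.split(addr)
            for entry in self.locations[set_index]:
                if entry[1] and entry[0] == tag:
                    entry[1] = False              # reset the full/empty indicator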

Patent
25 Aug 1982
Abstract: A storage hierarchy has a backing store (14) and a cache (15). During a series of accesses to the hierarchy by a user (10), write commands are monitored and analysed. Writing data to the hierarchy results in data being selectively removed from the cache. When cache space is not allocated to the data being written, that data is written to the backing store to the exclusion of the cache. Writing as part of a chain or sequential set of commands causes further removal of the data from the cache at the end of the chain or sequence. Removal of data increases the probability that subsequent writes go directly to the backing store while reads are still satisfied from the cache.