
Showing papers on "Cache invalidation published in 1986"


Journal ArticleDOI
TL;DR: The magnitude of the potential performance difference between the various approaches indicates that the choice of coherence solution is very important in the design of an efficient shared-bus multiprocessor, since it may limit the number of processors in the system.
Abstract: Using simulation, we examine the efficiency of several distributed, hardware-based solutions to the cache coherence problem in shared-bus multiprocessors. For each of the approaches, the associated protocol is outlined. The simulation model is described, and results from that model are presented. The magnitude of the potential performance difference between the various approaches indicates that the choice of coherence solution is very important in the design of an efficient shared-bus multiprocessor, since it may limit the number of processors in the system.
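As an illustration of the kind of distributed, hardware-based coherence mechanism being compared, the following C sketch models a write-invalidate snoop transition table (MSI-style). It is a generic illustration under assumed state names, not any of the specific protocols simulated in the paper.

    /* Generic write-invalidate snoop transition (MSI-style); illustrative only. */
    #include <stdio.h>

    typedef enum { INVALID, SHARED, MODIFIED } line_state_t;
    typedef enum { BUS_READ, BUS_WRITE } bus_op_t;

    /* Next state of a locally cached line when another processor's bus request is snooped. */
    static line_state_t snoop_next_state(line_state_t cur, bus_op_t op, int *must_flush)
    {
        *must_flush = 0;
        switch (cur) {
        case MODIFIED:
            *must_flush = 1;                            /* supply the dirty copy on the bus */
            return (op == BUS_READ) ? SHARED : INVALID;
        case SHARED:
            return (op == BUS_READ) ? SHARED : INVALID; /* a remote write invalidates us */
        default:
            return INVALID;
        }
    }

    int main(void)
    {
        int flush;
        line_state_t next = snoop_next_state(MODIFIED, BUS_WRITE, &flush);
        printf("next state %d, flush %d\n", next, flush);  /* INVALID, flush dirty data */
        return 0;
    }

The simulated protocols differ mainly in how such transitions are defined and in how much bus traffic each transition generates, which is what drives the performance differences reported above.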

671 citations


Proceedings ArticleDOI
27 Oct 1986
TL;DR: This work presents new on-line algorithms which decide, for each cache, which blocks to retain and which to drop in order to minimize communication over the bus in a snoopy cache multiprocessor system.
Abstract: In a snoopy cache multiprocessor system, each processor has a cache in which it stores blocks of data. Each cache is connected to a bus used to communicate with the other caches and with main memory. For several of the proposed models of snoopy caching, we present new on-line algorithms which decide, for each cache, which blocks to retain and which to drop in order to minimize communication over the bus. We prove that, for any sequence of operations, our algorithms' communication costs are within a constant factor of the minimum required for that sequence; for some of our algorithms we prove that no on-line algorithm has this property with a smaller constant.
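The flavor of these on-line decisions can be shown with a rent-or-buy style retention rule: keep paying per-word bus traffic to stay consistent with a shared block until the accumulated cost equals the cost of refetching the block, then drop it. This C sketch is only in the spirit of the paper's algorithms; the constant and structure are illustrative assumptions, not the proven-competitive rules themselves.

    #include <stdbool.h>
    #include <stdio.h>

    #define BLOCK_RELOAD_COST 8   /* assumed bus cost (words) to refetch one block */

    typedef struct {
        bool valid;
        int  paid;                /* bus cost paid so far to keep this copy consistent */
    } cached_block_t;

    /* Called when a write by another cache to this block is snooped on the bus. */
    static void on_remote_write(cached_block_t *b)
    {
        if (!b->valid)
            return;
        if (++b->paid >= BLOCK_RELOAD_COST) {
            b->valid = false;     /* drop: further updates would cost more than a reload */
            b->paid = 0;
        }
    }

    int main(void)
    {
        cached_block_t b = { true, 0 };
        for (int i = 0; i < 10 && b.valid; i++)
            on_remote_write(&b);
        printf("block still cached: %s\n", b.valid ? "yes" : "no");
        return 0;
    }

Balancing the cost already paid against the cost of one refetch is the standard way such rules achieve a constant competitive factor against the optimal off-line choice.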

268 citations


Patent
James Gerald Brenza1
01 May 1986
TL;DR: In this paper, a common directory and an L1 control array (L1CA) are provided for the CPU to access both the L1 and L2 caches; the directory is addressed by the CPU's requested logical addresses, each of which is either a real/absolute address or a virtual address, according to whichever address mode the CPU is in.
Abstract: A data processing system which contains a multi-level storage hierarchy, in which the two highest hierarchy levels (e.g. L1 and L2) are private (not shared) to a single CPU, in order to be in close proximity to each other and to the CPU. Each cache has a data line length convenient to the respective cache. A common directory and an L1 control array (L1CA) are provided for the CPU to access both the L1 and L2 caches. The common directory contains, and is addressed by, the CPU's requested logical addresses, each of which is either a real/absolute address or a virtual address, according to whichever address mode the CPU is in. Each entry in the directory contains a logical address representation derived from a logical address that previously missed in the directory. A CPU request "hits" in the directory if its requested address is in any private cache (e.g. in L1 or L2). A line presence field (LPF) is included in each directory entry to aid in determining a hit in the L1 cache. The L1CA contains L1 cache information to supplement the corresponding common directory entry; the L1CA is used during an L1 LRU castout, but is not in the critical path of an L1 or L2 hit. A translation lookaside buffer (TLB) is not used to determine cache hits. The TLB output is used only during the infrequent times that a CPU request misses in the cache directory, and the translated address (i.e. absolute address) is then used to access the data in a synonym location in the same cache, or in main storage, or in the L1 or L2 cache in another CPU in a multiprocessor system using synonym/cross-interrogate directories.
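A directory entry of the kind described might look like the following C sketch: a logical-address tag shared by L1 and L2 plus a line presence field (LPF) recording which L1 sub-lines of the longer L2 line are resident. Field names and widths are assumptions for illustration, not the patent's layout.

    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint32_t logical_tag;   /* derived from the requesting logical address          */
        uint8_t  valid;
        uint8_t  lpf;           /* line presence field: bit i => L1 sub-line i resident */
    } common_dir_entry_t;

    /* A request hits in L1 only if the common directory hits and the LPF bit is set;
     * a directory hit with the bit clear is an L2-only hit. */
    static int hits_in_l1(const common_dir_entry_t *e, uint32_t tag, unsigned sub_line)
    {
        return e->valid && e->logical_tag == tag && ((e->lpf >> sub_line) & 1u);
    }

    int main(void)
    {
        common_dir_entry_t e = { 0xABCD, 1, 0x03 };   /* sub-lines 0 and 1 present in L1 */
        printf("L1 hit: %d, L2-only hit: %d\n",
               hits_in_l1(&e, 0xABCD, 1), !hits_in_l1(&e, 0xABCD, 5));
        return 0;
    }

Because the tag is a logical address, the TLB stays off the hit path, exactly as the abstract notes; translation is needed only on a directory miss.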

134 citations


Proceedings ArticleDOI
01 May 1986
TL;DR: An analytical model for a cache-reload transient is developed and it is shown that the reload transient is related to the area in the tail of a normal distribution whose mean is a function of the footprints of the programs that compete for the cache.
Abstract: This paper develops an analytical model for a cache-reload transient. When an interrupt program or system program runs periodically in a cache-based computer, a short cache-reload transient occurs each time the interrupt program is invoked. That transient depends on the size of the cache, the fraction of the cache used by the interrupt program, and the fraction of the cache used by background programs that run between interrupts. We call the portion of a cache used by a program its footprint in the cache, and we show that the reload transient is related to the area in the tail of a normal distribution whose mean is a function of the footprints of the programs that compete for the cache. We believe that the model may be useful as well for predicting paging behavior in virtual-memory systems with round-robin scheduling.
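For a rough feel of the model, the expected reload work can be approximated with a simple urn argument: if an interrupt routine with a footprint of f_i lines maps uniformly into a cache of c lines, it displaces about f_b * (1 - (1 - 1/c)^f_i) of the background program's f_b resident lines, all of which must be refetched on return. This C sketch is only that back-of-the-envelope approximation, not the paper's normal-tail formula.

    #include <math.h>
    #include <stdio.h>

    /* Expected number of background-program lines displaced by an interrupt footprint,
     * assuming uniform placement (illustrative urn model, not the paper's derivation). */
    static double expected_reload(double cache_lines, double fp_background, double fp_interrupt)
    {
        return fp_background * (1.0 - pow(1.0 - 1.0 / cache_lines, fp_interrupt));
    }

    int main(void)
    {
        /* 4096-line cache, background footprint 2048 lines, interrupt footprint 512 lines */
        printf("expected lines to reload: %.1f\n", expected_reload(4096.0, 2048.0, 512.0));
        return 0;
    }

The key qualitative point carries over: the transient grows with both footprints and shrinks as the cache grows relative to them.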

112 citations


Patent
12 Nov 1986
TL;DR: In this paper, the authors propose a system for maintaining data consistency among distributed processors, each having its associated cache memory, where a processor addresses data in its cache by specifying the virtual address.
Abstract: A system for maintaining data consistency among distributed processors, each having its associated cache memory. A processor addresses data in its cache by specifying the virtual address. The cache will search its cells for the data associatively. Each cell has a virtual address, a real address, flags and a plurality of associated data words. If there is no hit on the virtual address supplied by the processor, a map processor supplies the equivalent real address which the cache uses to access the data from another cache if one has it, or else from real memory. When a processor writes into a data word in the cache, the cache will update all other caches that share the data before allowing the write to the local cache.
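The dual tagging described above can be pictured with the following C sketch of one cache cell, carrying both a virtual tag for the processor-side associative search and a real tag for cross-cache and memory traffic. The layout and flag set are assumptions for illustration.

    #include <stdint.h>
    #include <stdio.h>

    #define WORDS_PER_CELL 4   /* assumed number of data words per cell */

    typedef struct {
        uint32_t vaddr;                 /* virtual address tag (processor-side lookup)  */
        uint32_t raddr;                 /* real address tag (cross-cache / memory side) */
        uint8_t  valid, shared, dirty;  /* consistency flags                            */
        uint32_t data[WORDS_PER_CELL];
    } cache_cell_t;

    static int hit_by_vaddr(const cache_cell_t *c, uint32_t vaddr)
    {
        return c->valid && c->vaddr == vaddr;
    }

    static int hit_by_raddr(const cache_cell_t *c, uint32_t raddr)
    {
        return c->valid && c->raddr == raddr;
    }

    int main(void)
    {
        cache_cell_t c = { 0x1000, 0x8000, 1, 1, 0, { 0 } };
        printf("virtual hit: %d, real hit: %d\n",
               hit_by_vaddr(&c, 0x1000), hit_by_raddr(&c, 0x8000));
        return 0;
    }

On a virtual miss, the map processor's translation yields the real address used for the second, real-tagged search in the other caches or memory, matching the flow in the abstract.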

106 citations


Journal ArticleDOI
01 May 1986
TL;DR: This paper shows how the VMP design provides the high memory bandwidth required by modern high-performance processors with a minimum of hardware complexity and cost, and describes simple solutions to the consistency problems associated with virtually addressed caches.
Abstract: VMP is an experimental multiprocessor that follows the familiar basic design of multiple processors, each with a cache, connected by a shared bus to global memory. Each processor has a synchronous, virtually addressed, single master connection to its cache, providing very high memory bandwidth. An unusually large cache page size and fast sequential memory copy hardware make it feasible for cache misses to be handled in software, analogously to the handling of virtual memory page faults. Hardware support for cache consistency is limited to a simple state machine that monitors the bus and interrupts the processor when a cache consistency action is required. In this paper, we show how the VMP design provides the high memory bandwidth required by modern high-performance processors with a minimum of hardware complexity and cost. We also describe simple solutions to the consistency problems associated with virtually addressed caches. Simulation results indicate that the design achieves good performance provided that data contention is not excessive.

99 citations


Patent
25 Jul 1986
TL;DR: A method and apparatus for marking data that is temporarily cacheable to facilitate the efficient management of said data: a bit in the segment and/or page descriptor of the data, called the marked data bit (MDB), is generated by the compiler, included in the processor's request for data from memory in the form of a memory address, and stored in the cache directory at a location related to the particular line of data involved.
Abstract: A method and apparatus for marking data that is temporarily cacheable to facilitate the efficient management of said data. A bit in the segment and/or page descriptor of the data, called the marked data bit (MDB), is generated by the compiler and included in a request for data from memory by the processor in the form of a memory address, and will be stored in the cache directory at a location related to the particular line of data involved. The bit is passed to the cache together with the associated real address after address translation (in the case of a real cache). When the cache controls load the address of the data into the directory, they also store the marked data bit (MDB) in the directory with the address. When the cacheability of the temporarily cacheable data changes from cacheable to non-cacheable, a single instruction is issued to cause the cache to invalidate all marked data. When an "invalidate marked data" instruction is received, the cache controls sweep through the entire cache directory and invalidate, in a single pass, any cache line which has the "marked data bit" set. An extension of the invention involves using a multi-bit field rather than a single bit to provide more versatile control of the temporary cacheability of data.
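The single-pass sweep can be sketched in C as follows; the directory layout and names are illustrative, not the patent's implementation.

    #include <stdbool.h>
    #include <stdio.h>

    #define DIR_ENTRIES 1024

    typedef struct {
        bool     valid;
        bool     mdb;        /* marked data bit stored alongside the address tag */
        unsigned tag;
    } dir_entry_t;

    static dir_entry_t directory[DIR_ENTRIES];

    /* Handler for the "invalidate marked data" instruction: one pass over the directory. */
    static unsigned invalidate_marked(void)
    {
        unsigned dropped = 0;
        for (unsigned i = 0; i < DIR_ENTRIES; i++) {
            if (directory[i].valid && directory[i].mdb) {
                directory[i].valid = false;
                dropped++;
            }
        }
        return dropped;
    }

    int main(void)
    {
        directory[3] = (dir_entry_t){ true, true,  0x42 };   /* temporarily cacheable line */
        directory[7] = (dir_entry_t){ true, false, 0x43 };   /* ordinary line, untouched   */
        printf("invalidated %u marked lines\n", invalidate_marked());
        return 0;
    }

The payoff is that one instruction retires an entire class of temporarily cacheable data instead of requiring a per-line invalidate for each address.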

89 citations


Patent
Lishing Liu1
12 Sep 1986
TL;DR: A method and apparatus is provided for associating in cache directories the Control Domain Identifications (CDIDs) of the software covered by each cache line; through the use of this provision and/or the addition of identifications of users actively using lines, cache coherence of certain data is controlled without performing conventional Cross-Interrogates (XIs), provided the accesses to such objects are properly synchronized with locking-type concurrency controls.
Abstract: A method and apparatus is provided for associating in cache directories the Control Domain Identifications (CDIDs) of software covered by each cache line. Through the use of such provision and/or the addition of Identifications of users actively using lines, cache coherence of certain data is controlled without performing conventional Cross-Interrogates (XIs), if the accesses to such objects are properly synchronized with locking type concurrency controls. Software protocols to caches are provided for the resource kernel to control the flushing of released cache lines. The parameters of these protocols are high level Domain Identifications and Task Identifications.

86 citations


Patent
03 Oct 1986
TL;DR: In this paper, a cache memory system with multiple-word boundary registers, multiple-word line registers, and a multiple-word boundary detector system is presented, where the cache memory stores four words per addressable line of cache storage.
Abstract: In a cache memory system, multiple-word boundary registers, multiple-word line registers, and a multiple-word boundary detector system provide accelerated access of data contained within the cache memory within the multiple-word boundaries, and provides for effective prefetch of sequentially ascending locations of stored data from the cache memory. In an illustrated embodiment, the cache memory stores four words per addressable line of cache storage, and accordingly quad-word boundary registers determine boundary limits on quad-words, quad-word line registers store, in parallel, a selected line from the cache memory, and a quad-word boundary detector system determines when to prefetch the next set of quad-words from the cache memory for storage in the quad-word line registers.
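A minimal C sketch of the boundary logic, under the four-words-per-line assumption of the illustrated embodiment: reads within the latched quad-word are served from the line registers, and the boundary detector requests a prefetch of the next line when the last word of the quad-word is touched. Names are illustrative.

    #include <stdint.h>
    #include <stdio.h>

    #define WORDS_PER_LINE 4u

    typedef struct {
        uint32_t line_base;               /* word address of the latched quad-word         */
        uint32_t line[WORDS_PER_LINE];    /* quad-word line registers (loaded in parallel) */
    } line_regs_t;

    /* Serve a word from the line registers; return 1 when the boundary detector should
     * trigger a prefetch of the next sequential quad-word from the cache array. */
    static int access_word(const line_regs_t *r, uint32_t word_addr, uint32_t *out)
    {
        uint32_t offset = word_addr & (WORDS_PER_LINE - 1u);
        *out = r->line[offset];
        return offset == WORDS_PER_LINE - 1u;
    }

    int main(void)
    {
        line_regs_t r = { 0x100, { 10, 11, 12, 13 } };
        uint32_t value;
        int prefetch = access_word(&r, 0x103, &value);   /* last word of the quad-word */
        printf("value %u, prefetch next quad-word: %d\n", value, prefetch);
        return 0;
    }

Sequentially ascending accesses thus stream out of the line registers while the next quad-word is fetched behind them, which is the acceleration the patent describes.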

82 citations


Patent
16 Oct 1986
TL;DR: In this article, a pipelined digital computer processor system is provided comprising an instruction prefetch unit (IPU,2) for prefetching instructions and an arithmetic logic processing unit (ALPU, 4) for executing instructions.
Abstract: A pipelined digital computer processor system (10, FIG. 1) is provided comprising an instruction prefetch unit (IPU,2) for prefetching instructions and an arithmetic logic processing unit (ALPU, 4) for executing instructions. The IPU (2) has associated with it a high speed instruction cache (6), and the ALPU (4) has associated with it a high speed operand cache (8). Each cache comprises a data store (84, 94, FIG. 3) for storing frequently accessed data, and a tag store (82, 92, FIG. 3) for indicating which main memory locations are contained in the respective cache. The IPU and ALPU processing units (2, 4) may access their associated caches independently under most conditions. When the ALPU performs a write operation to main memory, it also updates the corresponding data in the operand cache and, if contained therein, in the instruction cache permitting the use of self-modifying code. The IPU does not write to either cache. Provision is made for clearing the caches on certain conditions when their contents become invalid.
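The write policy described above (stores update the operand cache and, when the location is also resident, the instruction cache, so self-modifying code stays coherent) can be sketched in C as follows. The direct-mapped layout and helper names are assumptions for illustration.

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define CACHE_WORDS 256
    #define MEM_WORDS   (1u << 16)

    typedef struct {
        bool     valid[CACHE_WORDS];
        uint32_t tag[CACHE_WORDS];     /* full address kept as tag for simplicity */
        uint32_t data[CACHE_WORDS];
    } cache_t;

    static cache_t  icache, ocache;
    static uint32_t main_memory[MEM_WORDS];

    /* Update a cache only if the word is already resident (no allocation on write). */
    static void update_if_present(cache_t *c, uint32_t addr, uint32_t value)
    {
        uint32_t idx = addr % CACHE_WORDS;
        if (c->valid[idx] && c->tag[idx] == addr)
            c->data[idx] = value;
    }

    /* ALPU write path: main memory, operand cache, and (if resident) instruction cache. */
    static void alpu_write(uint32_t addr, uint32_t value)
    {
        main_memory[addr] = value;
        update_if_present(&ocache, addr, value);
        update_if_present(&icache, addr, value);   /* keeps self-modifying code coherent */
    }

    int main(void)
    {
        icache.valid[5] = true; icache.tag[5] = 5;
        alpu_write(5, 0xDEAD);
        printf("instruction cache word now %#x\n", icache.data[5]);
        return 0;
    }

The IPU itself never writes either cache, which keeps the two tag stores simple and lets the prefetcher and the ALPU access their caches independently most of the time.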

76 citations



Patent
03 Oct 1986
TL;DR: Separate very high speed instruction and data interface circuitry is described for coupling, via respective separate very high speed instruction and data interface buses, to external instruction cache and data cache circuitry.
Abstract: A microprocessor architecture is disclosed having separate very high speed instruction and data interface circuitry for coupling via respective separate very high speed instruction and data interface buses to respective external instruction cache and data cache circuitry. The microprocessor is comprised of an instruction interface, a data interface, and an execution unit. The instruction interface controls communications with the external instruction cache and couples the instructions from the instruction cache to the microprocessor at very high speed. The data interface controls communications with the external data cache and communicates data bidirectionally at very high speed between the data cache and the microprocessor. The execution unit selectively processes the data received via the data interface from the data cache responsive to the execution unit decoding and executing a respective one of the instructions received via the instruction interface from the instruction cache. In one embodiment, the external instruction cache is comprised of a program counter and addressable memory for outputting stored instructions responsive to its program counter and to an instruction cache advance signal output from the instruction interface. Circuitry in the instruction interface selectively outputs an initial instruction address for storage in the instruction cache program counter responsive to a context switch or branch, such that the instruction interface repetitively couples a plurality of instructions from the instruction cache to the microprocessor responsive to the cache advance signal, independent of and without the need for any intermediate or further address output from the instruction interface to the instruction cache except upon the occurrence of another context switch or branch.

Patent
06 Aug 1986
TL;DR: Cache memory includes a dual or two-part cache with one part of the cache being primarily designated for instruction data while the other part is designated for operand data, but not exclusively.
Abstract: Cache memory includes a dual or two-part cache with one part of the cache being primarily, but not exclusively, designated for instruction data while the other is primarily designated for operand data. For a maximum speed of operation, the two parts of the cache are equal in capacity. The two parts of the cache, designated I-Cache and O-Cache, are semi-independent in their operation and include arrangements for effecting synchronized searches; they can accommodate up to three separate operations substantially simultaneously. Each cache unit has a directory and a data array, with the directory and data array being separately addressable. Each cache unit may be subjected to a primary and to one or more secondary concurrent uses, with the secondary uses prioritized. Data is stored in the cache unit on a so-called store-into basis, wherein data obtained from the main memory is operated upon and stored in the cache without returning the operated-upon data to the main memory unit until subsequent transactions require such return.

Patent
02 Jan 1986
TL;DR: In this article, a lock warning mechanism is proposed to warn the paged memory management unit that the translator cache is in danger of becoming full of locked translators, if the last translator is removed from the cache.
Abstract: In a data processing system, a paged memory management unit (PMMU) translates logical addresses provided by a processor to physical addresses in a memory using translators constructed from page descriptors comprising, in part, translation tables stored in the memory. The PMMU maintains a set of recently used translators in a translator cache. In response to a particular lock value contained in a lock field of the page descriptor for a particular page, the PMMU sets a lock indicator in the translator cache associated with the corresponding translator, to preclude replacement of this translator in the translator cache. A lock warning mechanism provides a lock warning signal whenever all but a predetermined number of the translators in the cache are locked. In response, the PMMU can warn the processor that the translator cache is in danger of becoming full of locked translators. Preferably, the PMMU is also inhibited from locking the last translator in the cache.
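A small C sketch of the lock-warning rule: the translator cache warns once all but a predetermined number of translators are locked, and refuses to lock the last unlocked one. Sizes and the threshold are illustrative assumptions.

    #include <stdbool.h>
    #include <stdio.h>

    #define TLB_SLOTS      16
    #define WARN_UNLOCKED   2   /* warn when this few unlocked translators remain */

    typedef struct { bool valid, locked; } translator_t;

    static translator_t tcache[TLB_SLOTS];

    static unsigned unlocked_count(void)
    {
        unsigned n = 0;
        for (unsigned i = 0; i < TLB_SLOTS; i++)
            if (!tcache[i].locked)
                n++;
        return n;
    }

    /* Try to lock one translator; sets *warning when the cache is close to full of locks. */
    static bool try_lock(unsigned slot, bool *warning)
    {
        if (unlocked_count() <= 1) {        /* never lock the last unlocked translator */
            *warning = true;
            return false;
        }
        tcache[slot].locked = true;
        *warning = unlocked_count() <= WARN_UNLOCKED;
        return true;
    }

    int main(void)
    {
        bool warn = false;
        for (unsigned i = 0; i < TLB_SLOTS; i++)
            if (!try_lock(i, &warn))
                printf("slot %u left unlocked (warning=%d)\n", i, warn);
        return 0;
    }

Keeping at least one replaceable translator guarantees the PMMU can always bring in a translation for a newly touched page rather than stalling on a cache full of locked entries.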

Book ChapterDOI
01 Jun 1986
TL;DR: This paper describes a cache coherence protocol for an architecture composed of several processors, each with its own local cache, connected via a switching structure to a shared memory that is itself split into several modules managed by independent controllers.
Abstract: This paper describes a cache coherence protocol for an architecture composed of several processors, each with its own local cache, connected via a switching structure to a shared memory that is itself split into several modules managed by independent controllers. The protocol prevents processors from simultaneously modifying their respective copies and always provides a processor requiring a copy of a memory location with the most up-to-date version. A top-down description and modeling of the protocol is given using Predicate/Transition nets. This modeling makes it possible to formally describe the complex synchronizations of the protocol. Invariants are then obtained directly, without unfolding the Predicate/Transition net; they are the basis for studying behavioral properties.

Proceedings Article
01 Jun 1986
TL;DR: The design and implementation issues associated with realizing an instruction cache for a machine that uses an "extended" version of the delayed branch instruction are discussed, and timing results are presented which indicate the performance of critical circuits.
Abstract: In this paper, we present the design of an instruction cache for a machine that uses an "extended" version of the delayed branch instruction. The extended delayed branch, which we call the prepare-to-branch, or PBR, instruction, permits the unconditional execution of between 0 and 7 instruction parcels after the branch instruction. The instruction cache is designed to fit on the same chip as the processor and takes advantage of the PBR instruction to minimize the effective latency associated with memory references and the filling of the instruction register. This paper discusses the design and implementation issues associated with realizing such an instruction cache. We present critical aspects of the design and the philosophy used to guide the development of the design. Finally, some timing results are presented which indicate the performance of critical circuits.

Patent
18 Dec 1986
TL;DR: In this article, a raster scan video controller is provided with an update cache for selective updating of a display memory under control of an updating device, where each display memory connection is paired to a corresponding connection to the updating device.
Abstract: A raster scan video controller is provided with an update cache for selective updating of a display memory under control of an updating device. The display memory has a larger bit width than the updating device. The update cache is full-width connected to the display memory. Each display memory connection is paired to a corresponding connection to the updating device. The updating device has random-access facility to the update cache, accessing only a selective part of it for receiving, transmitting and latching data with respect to the update cache. Corresponding functions are available with respect to the memory.

Patent
25 Jun 1986
TL;DR: The hit rate under shared access is improved by linking track slots that contain head data with a pointer and retaining them in cache memory through LRU processing.
Abstract: PURPOSE: To improve the hit rate under shared access by linking track slots that contain head data with a pointer and retaining them in cache memory through LRU processing. CONSTITUTION: A CPU 1 passes the addresses (extent information) of the head record and tail record on a disk and requests the setting of a cache access mode in which processing within this range is performed on a cache-access basis in sequential access mode. The CPU 1 then passes record addresses to a storage director 31 and requests that the corresponding records on the disk be read in. The storage director 31 searches an extent information control table 332 in a directory memory, on the basis of the received record addresses, to determine which cache access mode the referenced record corresponds to.

Journal ArticleDOI
TL;DR: There is a significant improvement in the instruction execution rate due to the increase in bandwidth and decrease in access time, and performance is further improved by using the C-access notion in cache interleaving.

01 Jan 1986
TL;DR: The style of use and performance improvement of caching in an existing file system are measured, and the protocol and interface architecture of the Caching Ring are developed: a combination of an intelligent network interface and an efficient network protocol that allows caching of all types of file blocks at the client machines.
Abstract: Caching has long been recognized as a powerful performance enhancement technique in many areas of computer design. Most modern computer systems include a hardware cache between the processor and main memory, and many operating systems include a software cache between the file system routines and the disk hardware. In a distributed file system, where the file systems of several client machines are separated from the server backing store by a communications network, it is desirable to have a cache of recently used file blocks at the client, to avoid some of the communications overhead. In this configuration, special care must be taken to maintain consistency between the client caches, as some disk blocks may be in use by more than one client. For this reason, most current distributed file systems do not provide a cache at the client machine. Those systems that do either place restrictions on the types of file blocks that may be shared or require extra communication to confirm that a cached block is still valid each time the block is to be used. The Caching Ring is a combination of an intelligent network interface and an efficient network protocol that allows caching of all types of file blocks at the client machines. Blocks held in a client cache are guaranteed to be valid copies. We measure the style of use and performance improvement of caching in an existing file system, and develop the protocol and interface architecture of the Caching Ring. Using simulation, we study the performance of the Caching Ring and compare it to similar schemes using conventional network hardware.
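The performance argument for guaranteed-valid client caching can be made concrete with a small C sketch contrasting the two read paths: a conventional client cache that revalidates a block with the server on every use, versus a cache whose blocks are invalidated asynchronously by the network interface and can therefore be used immediately. All structures and counts here are illustrative, not the Caching Ring protocol itself.

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct {
        bool valid;      /* cleared by the interface when another client writes the block */
        char data[512];
    } fs_block_t;

    static int server_round_trips;

    static bool server_says_valid(int block_no) { (void)block_no; server_round_trips++; return true; }
    static void fetch_from_server(int block_no, fs_block_t *b) { (void)block_no; b->valid = true; server_round_trips++; }

    /* validate_each_use=true models a conventional client cache;
     * false models one whose validity is maintained by the ring interface. */
    static void read_block(int block_no, fs_block_t *b, bool validate_each_use)
    {
        if (!b->valid || (validate_each_use && !server_says_valid(block_no)))
            fetch_from_server(block_no, b);
        /* ... use b->data ... */
    }

    int main(void)
    {
        fs_block_t b = { false, { 0 } };
        for (int i = 0; i < 100; i++) read_block(7, &b, true);
        printf("validate-on-use round trips: %d\n", server_round_trips);    /* 100 */

        server_round_trips = 0;
        b.valid = false;
        for (int i = 0; i < 100; i++) read_block(7, &b, false);
        printf("guaranteed-valid round trips: %d\n", server_round_trips);   /* 1 */
        return 0;
    }

Eliminating the per-use validation traffic is exactly the overhead the Caching Ring's intelligent interface is designed to remove.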

Patent
Masatoshi Kofuji1
03 Feb 1986
TL;DR: In this article, a cache memory circuit is responsive to a read request to fetch a data block by block transfer from a main memory to cache memory when the data block is not stored in the cache memory.
Abstract: In a cache memory circuit responsive to a read request to fetch a data block by block transfer from a main memory to a cache memory when the data block is not stored in the cache memory, a sequence of data units into which the data block is divided is successively assigned to a plurality of cache write registers one by one. The assigned data units are simultaneously moved to one of the sub-blocks of the cache memory during each of the write-in durations, with an idle interval left between two adjacent write-in durations. The state of each sub-block is monitored in a controller. During the idle interval, a following read request can be processed with reference to the states of the sub-blocks, even when it requests the data block being transferred. In addition, a read address for the following read request may be preserved in a saving address register to process another read request (Figure 1).

Patent
21 Feb 1986
TL;DR: Very high speed instruction and data interface circuitry is proposed for coupling via separate instruction and data interface buses to the respective external instruction cache and data cache.
Abstract: A microprocessor architecture is disclosed having separate very high speed instruction and data interface circuitry for coupling via respective separate very high speed instruction and data interface buses to respective external instruction cache and data cache circuitry. The microprocessor is comprised of an instruction interface, a data interface, and an execution unit. The instruction interface controls communications with the external instruction cache and couples the instructions from the instruction cache to the microprocessor at very high speed. The data interface controls communications with the external data cache and communicates data bi-directionally at very high speed between the data cache and the microprocessor. The execution unit selectively processes the data received via the data interface from the data cache responsive to the execution unit decoding and executing a respective one of the instructions received via the instruction interface from the instruction cache. In one embodiment, the external instruction cache is comprised of a program counter and addressable memory for outputting stored instructions responsive to its program counter and to an instruction cache advance signal output from the instruction interface. An address generator in the instruction interface selectively outputs an initial instruction address for storage in the instruction cache program counter responsive to a context switch or branch, such that the instruction interface repetitively couples a plurality of instructions from the instruction cache to the microprocessor responsive to the cache advance signal, independent of and without the need for any intermediate or further address output from the instruction interface to the instruction cache except upon the occurrence of another context switch or branch.


Patent
28 May 1986
TL;DR: A magnetic disk controller incorporating a cache system having a cache memory is provided, comprising cache operation designation data producing means (45) or using a data transfer direction control code in cache operation designation data representing five modes of operation.
Abstract: The present invention provides a magnetic disk controller incorporating a cache system having a cache memory. In order to improve the operation of the cache memory, the magnetic disk controller comprises cache operation designation data producing means (45) or uses a data transfer direction control code in cache operation designation data representing five modes of operation.