
Showing papers on "Cache published in 1986"


Journal ArticleDOI
TL;DR: In highly-pipelined machines, instructions and data are prefetched and buffered in both the processor and the cache, as discussed by the authors; this is done to reduce the average memory access latency and to take advantage of memory interleaving.
Abstract: In highly-pipelined machines, instructions and data are prefetched and buffered in both the processor and the cache. This is done to reduce the average memory access latency and to take advantage of memory interleaving.

398 citations


Proceedings ArticleDOI
27 Oct 1986
TL;DR: This work presents new on-line algorithms which decide, for each cache, which blocks to retain and which to drop in order to minimize communication over the bus in a snoopy cache multiprocessor system.
Abstract: In a snoopy cache multiprocessor system, each processor has a cache in which it stores blocks of data. Each cache is connected to a bus used to communicate with the other caches and with main memory. For several of the proposed models of snoopy caching, we present new on-line algorithms which decide, for each cache, which blocks to retain and which to drop in order to minimize communication over the bus. We prove that, for any sequence of operations, our algorithms' communication costs are within a constant factor of the minimum required for that sequence; for some of our algorithms we prove that no on-line algorithm has this property with a smaller constant.

268 citations
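
A minimal sketch of one countdown-style block-retention policy in the spirit of the on-line algorithms described above; the constants, field names, and the exact decrement rule are illustrative assumptions, not taken from the paper.

/* Illustrative countdown-based block retention for snoopy caching.
 * The block keeps a "credit" equal to the bus cost of re-fetching it;
 * each snooped remote write spends one unit, and the block is dropped
 * when the credit is exhausted. */

#include <stdbool.h>

#define BLOCK_WORDS 8          /* assumed cost (in bus words) of re-fetching a block */

struct block_state {
    bool valid;
    int  credit;               /* remaining budget before the block is dropped */
};

/* Local processor touches the block: (re)load it and reset its credit. */
static void on_local_access(struct block_state *b)
{
    if (!b->valid) {
        /* a bus read of BLOCK_WORDS words would happen here */
        b->valid = true;
    }
    b->credit = BLOCK_WORDS;
}

/* Snooped bus write by another cache to a word of this block: each such
 * write costs one bus word, so spend one unit of credit.  When the credit
 * is gone, drop the block so further remote writes need not be broadcast
 * to this cache. */
static void on_remote_write(struct block_state *b)
{
    if (!b->valid)
        return;
    if (--b->credit == 0)
        b->valid = false;      /* invalidate: stop paying for updates */
}

The point of such rules is that the total bus traffic spent keeping a block is bounded by a constant multiple of what any off-line strategy would pay for the same access sequence.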


Patent
06 Jun 1986
TL;DR: Memory integrity is maintained in a system with a hierarchical memory using a set of explicit cache control instructions, as mentioned in this paper; the caches store two status flags, a valid bit and a dirty bit, with each block of information.
Abstract: Memory integrity is maintained in a system with a hierarchical memory using a set of explicit cache control instructions. The caches in the system have two status flags, a valid bit and a dirty bit, with each block of information stored. The operating system executes selected cache control instructions to ensure memory integrity whenever there is a possibility that integrity could be compromised.

134 citations
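
A small model of the two per-block status flags and the kind of explicit cache-control operations the abstract describes; the operation names (cache_flush, cache_invalidate) and the block layout are illustrative assumptions, not the patent's mnemonics.

#include <stdbool.h>
#include <stdint.h>

#define BLOCK_BYTES 32

struct cache_block {
    bool     valid;            /* block holds a usable copy            */
    bool     dirty;            /* copy has been modified since loading */
    uint32_t tag;
    uint8_t  data[BLOCK_BYTES];
};

void write_block_to_memory(const struct cache_block *b);   /* provided elsewhere */

/* Flush: make memory consistent with the cache, then drop the copy.
 * The operating system would issue this before, e.g., handing a buffer
 * to an I/O device. */
void cache_flush(struct cache_block *b)
{
    if (b->valid && b->dirty)
        write_block_to_memory(b);
    b->valid = false;
    b->dirty = false;
}

/* Invalidate: drop the copy without writing it back, used when memory
 * has been changed behind the cache's back. */
void cache_invalidate(struct cache_block *b)
{
    b->valid = false;
    b->dirty = false;
}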


Patent
James Gerald Brenza1
01 May 1986
TL;DR: In this paper, a common directory and an L1 control array (L1CA) are provided for the CPU to access both the L1 and L2 caches; the common directory is addressed by the CPU's requested logical addresses, each of which is either a real/absolute address or a virtual address, according to whichever address mode the CPU is in.
Abstract: A data processing system which contains a multi-level storage hierarchy, in which the two highest hierarchy levels (e.g. L1 and L2) are private (not shared) to a single CPU, in order to be in close proximity to each other and to the CPU. Each cache has a data line length convenient to the respective cache. A common directory and an L1 control array (L1CA) are provided for the CPU to access both the L1 and L2 caches. The common directory contains and is addressed by the CPU requesting logical addresses, each of which is either a real/absolute address or a virtual address, according to whichever address mode the CPU is in. Each entry in the directory contains a logical address representation derived from a logical address that previously missed in the directory. A CPU request "hits" in the directory if its requested address is in any private cache (e.g. in L1 or L2). A line presence field (LPF) is included in each directory entry to aid in determining a hit in the L1 cache. The L1CA contains L1 cache information to supplement the corresponding common directory entry; the L1CA is used during a L1 LRU castout, but is not the critical path of an L1 or L2 hit. A translation lookaside buffer (TLB) is not used to determine cache hits. The TLB output is used only during the infrequent times that a CPU request misses in the cache directory, and the translated address (i.e. absolute address) is then used to access the data in a synonym location in the same cache, or in main storage, or in the L1 or L2 cache in another CPU in a multiprocessor system using synonym/cross-interrogate directories.

134 citations


Patent
17 Jun 1986
TL;DR: A cache memory capable of concurrently accepting and working on completion of more than one cache access from a plurality of processors connected in parallel is discussed in this paper; accesses that cannot be completed immediately are transferred to pending-access-completion circuitry.
Abstract: A cache memory capable of concurrently accepting and working on completion of more than one cache access from a plurality of processors connected in parallel. Current accesses to the cache are handled by current-access-completion circuitry which determines whether the current access is capable of immediate completion and either completes the access immediately if so capable or transfers the access to pending-access-completion circuitry if not so capable. The latter circuitry works on completion of pending accesses; it determines and stores for each pending access status information prescribing the steps required to complete the access and redetermines that status information as conditions change. In working on completion of current and pending accesses, the addresses of the accesses are compared to those of memory accesses in progress on the system.

120 citations


Journal ArticleDOI
01 May 1986
TL;DR: In the design of SPUR, a high-performance multiprocessor workstation, the use of large caches and hardware-supported cache consistency suggests a new approach to virtual address translation, which substantially reduces the hardware cost and complexity of the translation mechanism and eliminates the translation consistency problem.
Abstract: In the design of SPUR, a high-performance multiprocessor workstation, the use of large caches and hardware-supported cache consistency suggests a new approach to virtual address translation. By performing translation in each processor's virtually-tagged cache, the need for separate translation lookaside buffers (TLBs) is eliminated. Eliminating the TLB substantially reduces the hardware cost and complexity of the translation mechanism and eliminates the translation consistency problem. Trace-driven simulations show that normal cache behavior is only minimally affected by caching page table entries, and that in many cases, using a separate device would actually reduce system performance.

115 citations


Proceedings ArticleDOI
01 May 1986
TL;DR: An analytical model for a cache-reload transient is developed and it is shown that the reload transient is related to the area in the tail of a normal distribution whose mean is a function of the footprints of the programs that compete for the cache.
Abstract: This paper develops an analytical model for a cache-reload transient. When an interrupt program or system program runs periodically in a cache-based computer, a short cache-reload transient occurs each time the interrupt program is invoked. That transient depends on the size of the cache, the fraction of the cache used by the interrupt program, and the fraction of the cache used by background programs that run between interrupts. We call the portion of a cache used by a program its footprint in the cache, and we show that the reload transient is related to the area in the tail of a normal distribution whose mean is a function of the footprints of the programs that compete for the cache. We believe that the model may be useful as well for predicting paging behavior in virtual-memory systems with round-robin scheduling.

112 citations
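
As a rough illustration of the footprint argument the abstract describes (the notation below is ours, not the paper's): suppose the cache has $S$ frames, the interrupt program's footprint occupies $F_I$ frames, the background programs' footprints occupy $F_B$ frames, and placements are treated as independent and roughly uniform. Then the number $X$ of interrupt-program frames displaced between invocations has

$$E[X] \approx F_I \cdot \frac{F_B}{S},$$

and, being a sum of many weakly dependent indicator variables, $X$ is approximately normally distributed, so the probability that the reload transient exceeds $x$ lines is a normal tail area:

$$\Pr[X > x] \approx 1 - \Phi\!\left(\frac{x - E[X]}{\sigma_X}\right).$$

This matches the abstract's statement that the reload transient is governed by the tail of a normal distribution whose mean is a function of the competing footprints; the exact mean and variance used in the paper's model differ in detail.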


Patent
12 Nov 1986
TL;DR: In this paper, the authors propose a system for maintaining data consistency among distributed processors, each having its associated cache memory, where a processor addresses data in its cache by specifying the virtual address.
Abstract: A system for maintaining data consistency among distributed processors, each having its associated cache memory. A processor addresses data in its cache by specifying the virtual address. The cache will search its cells for the data associatively. Each cell has a virtual address, a real address, flags and a plurality of associated data words. If there is no hit on the virtual address supplied by the processor, a map processor supplies the equivalent real address which the cache uses to access the data from another cache if one has it, or else from real memory. When a processor writes into a data word in the cache, the cache will update all other caches that share the data before allowing the write to the local cache.

106 citations
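
A schematic of the per-cell lookup and update path described above; the structures and function names are illustrative assumptions, and the real system performs the search associatively in hardware rather than in a loop.

#include <stdbool.h>
#include <stdint.h>

#define CELLS 256

struct cell {
    bool     valid;
    bool     shared;                 /* other caches also hold this data */
    uint32_t va;                     /* virtual address tag              */
    uint32_t ra;                     /* real address tag                 */
    uint32_t data;
};

uint32_t map_virtual_to_real(uint32_t va);                 /* map processor */
bool     fetch_from_other_cache(uint32_t ra, uint32_t *out);
uint32_t fetch_from_real_memory(uint32_t ra);
void     update_other_sharers(uint32_t ra, uint32_t value);

/* Associative search of the cells on the virtual address. */
static struct cell *lookup(struct cell cache[], uint32_t va)
{
    for (int i = 0; i < CELLS; i++)
        if (cache[i].valid && cache[i].va == va)
            return &cache[i];
    return NULL;
}

uint32_t read_word(struct cell cache[], uint32_t va, int victim)
{
    struct cell *c = lookup(cache, va);
    if (c == NULL) {                                       /* miss on the VA      */
        uint32_t ra = map_virtual_to_real(va);             /* ask the map         */
        c = &cache[victim];
        c->shared = fetch_from_other_cache(ra, &c->data);  /* prefer another cache */
        if (!c->shared)
            c->data = fetch_from_real_memory(ra);
        c->va = va;
        c->ra = ra;
        c->valid = true;
    }
    return c->data;
}

/* Writes update all sharing caches before the local copy is changed. */
void write_word(struct cell cache[], uint32_t va, int victim, uint32_t value)
{
    struct cell *c = lookup(cache, va);
    if (c == NULL) {
        (void)read_word(cache, va, victim);                /* bring the cell in   */
        c = lookup(cache, va);
    }
    if (c->shared)
        update_other_sharers(c->ra, value);
    c->data = value;
}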


Journal ArticleDOI
01 May 1986
TL;DR: This paper shows how the VMP design provides the high memory bandwidth required by modern high-performance processors with a minimum of hardware complexity and cost, and describes simple solutions to the consistency problems associated with virtually addressed caches.
Abstract: VMP is an experimental multiprocessor that follows the familiar basic design of multiple processors, each with a cache, connected by a shared bus to global memory. Each processor has a synchronous, virtually addressed, single master connection to its cache, providing very high memory bandwidth. An unusually large cache page size and fast sequential memory copy hardware make it feasible for cache misses to be handled in software, analogously to the handling of virtual memory page faults. Hardware support for cache consistency is limited to a simple state machine that monitors the bus and interrupts the processor when a cache consistency action is required.In this paper, we show how the VMP design provides the high memory bandwidth required by modern high-performance processors with a minimum of hardware complexity and cost. We also describe simple solutions to the consistency problems associated with virtually addressed caches. Simulation results indicate that the design achieves good performance providing data contention is not excessive.

99 citations


Patent
Edmund J. Kelly1
24 Jul 1986
TL;DR: In this article, row and column addresses are used to access data stored in a dynamic random access memory (DRAM), where the high order bits represent a virtual row address and the low order bits represent a real column address.
Abstract: A memory architecture having particular application for use in computer systems employing virtual memory techniques. A processor provides row and column addresses to access data stored in a dynamic random access memory (DRAM). The virtual address supplied by the processor includes high and low order bits. In the present embodiment, the high order bits represent a virtual row address and the low order bits represent a real column address. The virtual row address is applied to a memory management unit (MMU) for translation into a real row address. The real column address need not be translated. A comparator compares the current virtual row address to the previous row address stored in a latch. If the current row and previous row addresses match, a cycle control circuit couples the real column address to the DRAM, and applies a strobe signal such that the desired data is accessed in the memory without the need to reapply the row address. If the row addresses do not match, the cycle control circuit initiates a complete memory fetch cycle and applies both row and column addresses to the DRAM, along with the respective strobe signals. By properly organizing data in the memory, the probability that sequential memory operations access the same row in the DRAM may be significantly increased. By using such an organization, the present invention provides data retrieval at speeds on the order of a cache based memory system for a subset of data stored.

94 citations
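
A minimal sketch of the row-comparison logic the abstract describes; the function names and the translation interface are illustrative assumptions, not taken from the patent.

#include <stdbool.h>
#include <stdint.h>

uint32_t mmu_translate_row(uint32_t virtual_row);     /* MMU: virtual row -> real row   */
void     dram_full_cycle(uint32_t row, uint32_t col); /* apply RAS + CAS                */
void     dram_column_only(uint32_t col);              /* row already latched: CAS only  */

static uint32_t latched_virtual_row;
static bool     row_latch_valid = false;

/* Access one word given a virtual address split into a virtual row
 * (high-order bits) and a real column (low-order bits). */
void memory_access(uint32_t virtual_row, uint32_t real_column)
{
    if (row_latch_valid && virtual_row == latched_virtual_row) {
        /* Same DRAM row as the previous access: skip the row setup and
         * strobe only the column, giving a fast, cache-like access time. */
        dram_column_only(real_column);
    } else {
        /* Different row: translate it, run a full RAS/CAS cycle,
         * and remember the row for subsequent accesses. */
        uint32_t real_row = mmu_translate_row(virtual_row);
        dram_full_cycle(real_row, real_column);
        latched_virtual_row = virtual_row;
        row_latch_valid = true;
    }
}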


Patent
25 Jul 1986
TL;DR: In this article, a method and apparatus for marking data that is temporarily cacheable to facilitate the efficient management of said data is presented, where a bit in the segment and/or page descriptor of the data called the marked data bit (MDB) is generated by the compiler and included in a request for data from memory by the processor in the form of a memory address and will be stored in the cache directory at a location related to the particular line of data involved.
Abstract: A method and apparatus for marking data that is temporarily cacheable to facilitate the efficient management of said data. A bit in the segment and/or page descriptor of the data called the marked data bit (MDB) is generated by the compiler and included in a request for data from memory by the processor in the form of a memory address and will be stored in the cache directory at a location related to the particular line of data involved. The bit is passed to the cache together with the associated real address after address translation (in the case of a real cache). when the cache controls load the address of the data in the directory it is also stored the marked data bit (MDB) in the directory with the address. When the cacheability of the temporarily cacheable data changes from cacheable to non-cacheable, a single instruction is issued to cause the cache to invalidate all marked data. When an "invalidate marked data" instruction is received, the cache controls sweep through the entire cache directory and invalidate any cache line which has the "marked data bit" set in a single pass. An extension of the invention involves using a multi-bit field rather than a single bit to provide a more versatile control of the temporary cacheability of data.
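
A small sketch of the "invalidate marked data" sweep described above; the directory layout and names are illustrative assumptions.

#include <stdbool.h>
#include <stdint.h>

#define DIRECTORY_LINES 1024

struct dir_entry {
    bool     valid;
    bool     marked;        /* marked data bit (MDB): line is only temporarily cacheable */
    uint32_t address_tag;
};

static struct dir_entry directory[DIRECTORY_LINES];

/* On a line fill, store the MDB that arrived with the translated address. */
void directory_load(int line, uint32_t tag, bool marked_data_bit)
{
    directory[line].valid       = true;
    directory[line].marked      = marked_data_bit;
    directory[line].address_tag = tag;
}

/* Single "invalidate marked data" operation: one pass over the directory,
 * dropping every line whose MDB is set. */
void invalidate_marked_data(void)
{
    for (int line = 0; line < DIRECTORY_LINES; line++)
        if (directory[line].valid && directory[line].marked)
            directory[line].valid = false;
}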

Patent
27 Mar 1986
TL;DR: In this paper, the authors propose a computer having a cache memory and a main memory, with a transformation unit between the main memory and the cache memory, so that at least a portion of an information unit retrieved from the main memory may be transformed during retrieval of the information (fetch) from the main memory and prior to storage in the cache memory (cache).
Abstract: A computer having a cache memory and a main memory is provided with a transformation unit between the main memory and the cache memory so that at least a portion of an information unit retrieved from the main memory may be transformed during retrieval of the information (fetch) from a main memory and prior to storage in the cache memory (cache). In a specific embodiment, an instruction may be predecoded prior to storage in the cache memory. In another embodiment involving a branch instruction, the address of the target of the branch is calculated prior to storing in the instruction cache. The invention has advantages where a particular instruction is repetitively executed since a needed decode operation which has been partially performed previously need not be repeated with each execution of an instruction. Consequently, the latency time of each machine cycle may be reduced, and the overall efficiency of the computing system can be improved. If the architecture defines delayed branch instructions, such branch instructions may be executed in effectively zero machine cycles. This requires a wider bus and an additional register in the processor to allow the fetching of two instructions from the cache memory in the same cycle.

Patent
Lishing Liu1
12 Sep 1986
TL;DR: In this article, a method and apparatus is provided for associating in cache directories the Control Domain Identifications (CDIDs) of software covered by each cache line; through the use of such provision and/or the addition of identifications of users actively using lines, cache coherence of certain data is controlled without performing conventional Cross-Interrogates (XIs), provided the accesses to such objects are properly synchronized with locking-type concurrency controls.
Abstract: A method and apparatus is provided for associating in cache directories the Control Domain Identifications (CDIDs) of software covered by each cache line. Through the use of such provision and/or the addition of Identifications of users actively using lines, cache coherence of certain data is controlled without performing conventional Cross-Interrogates (XIs), if the accesses to such objects are properly synchronized with locking type concurrency controls. Software protocols to caches are provided for the resource kernel to control the flushing of released cache lines. The parameters of these protocols are high level Domain Identifications and Task Identifications.

Patent
03 Oct 1986
TL;DR: In this paper, a cache memory system with multiple-word boundary registers, multiple-word line registers, and a multiple-word boundary detector system is presented, where the cache memory stores four words per addressable line of cache storage.
Abstract: In a cache memory system, multiple-word boundary registers, multiple-word line registers, and a multiple-word boundary detector system provide accelerated access of data contained within the cache memory within the multiple-word boundaries, and provides for effective prefetch of sequentially ascending locations of stored data from the cache memory. In an illustrated embodiment, the cache memory stores four words per addressable line of cache storage, and accordingly quad-word boundary registers determine boundary limits on quad-words, quad-word line registers store, in parallel, a selected line from the cache memory, and a quad-word boundary detector system determines when to prefetch the next set of quad-words from the cache memory for storage in the quad-word line registers.
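
A schematic of the quad-word line-register and boundary-detection idea; the field widths, the prefetch trigger point, and the function names are assumptions for illustration.

#include <stdbool.h>
#include <stdint.h>

#define WORDS_PER_LINE 4                          /* quad-word lines               */

struct line_register {
    bool     valid;
    uint32_t line_address;                        /* address of the quad-word line */
    uint32_t words[WORDS_PER_LINE];               /* whole line held in parallel   */
};

void cache_read_line(uint32_t line_address, uint32_t words[WORDS_PER_LINE]);

static struct line_register line_reg;

/* Read one word; serve it from the line register when the address falls
 * inside the currently held quad-word, and prefetch the next line once the
 * last word of the current line is consumed. */
uint32_t read_word(uint32_t address)
{
    uint32_t line   = address / WORDS_PER_LINE;
    uint32_t offset = address % WORDS_PER_LINE;

    if (!line_reg.valid || line_reg.line_address != line) {
        cache_read_line(line, line_reg.words);    /* crossed a boundary: refill    */
        line_reg.line_address = line;
        line_reg.valid = true;
    }

    uint32_t value = line_reg.words[offset];

    if (offset == WORDS_PER_LINE - 1) {
        /* Boundary detector: sequential access is about to leave this
         * quad-word, so prefetch the next line from the cache. */
        cache_read_line(line + 1, line_reg.words);
        line_reg.line_address = line + 1;
    }
    return value;
}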

Patent
21 Feb 1986
TL;DR: In this article, a cache and memory management system architecture and associated protocol is described, comprising a set associative memory cache subsystem, a set associative translation logic memory subsystem, hardwired page translation, selectable access mode logic, and selectively enableable instruction prefetch logic.
Abstract: A cache and memory management system architecture and associated protocol is disclosed. The cache and memory management system is comprised of a set associative memory cache subsystem, a set associative translation logic memory subsystem, hardwired page translation, selectable access mode logic, and selectively enableable instruction prefetch logic. The cache and memory management system includes a system interface for coupling to a system bus to which a main memory is coupled, and is also comprised of a processor/cache bus interface for coupling to an external CPU. As disclosed, the cache memory management system can function either as an instruction cache, with instruction prefetch capability and on-chip program counter capabilities, or as a data cache memory management system which has an address register for receiving addresses from the CPU, to initiate a transfer of defined numbers of words of data commencing at the transmitted address. Another novel feature disclosed is the quadword boundary registers, quadword line registers, and quadword boundary detector subsystem, which accelerates access of data within quadword boundaries, and provides for effective prefetch of sequentially ascending locations of storage instructions or data from the cache memory subsystem.

Journal ArticleDOI
01 May 1986
TL;DR: This work describes a protocol that the authors are currently exploring for cache synchronization in a broadcast system, and analyzes the evolution of options that have been proposed under a write-in (or write-back) policy.
Abstract: Many options are possible in a cache synchronization (or consistency) scheme for a broadcast system. We clarify basic concepts, analyze the handling of shared data, and then describe a protocol that we are currently exploring. Finally, we analyze the evolution of options that have been proposed under write-in (or write-back) policy. We show how our protocol extends this evolution with new methods for efficient busy-wait locking, waiting, and unlocking. The lock scheme allows locking and unlocking to occur in zero time, eliminating the need for test-and-set. The scheme also integrates processor atomic read-modify-write instructions and programmer/compiler busy-wait-synchronized operations under the same mechanism. The wait scheme eliminates all unsuccessful retries from the bus, and allows a process to work while waiting.

Patent
16 Oct 1986
TL;DR: In this article, a pipelined digital computer processor system is provided comprising an instruction prefetch unit (IPU,2) for prefetching instructions and an arithmetic logic processing unit (ALPU, 4) for executing instructions.
Abstract: A pipelined digital computer processor system (10, FIG. 1) is provided comprising an instruction prefetch unit (IPU,2) for prefetching instructions and an arithmetic logic processing unit (ALPU, 4) for executing instructions. The IPU (2) has associated with it a high speed instruction cache (6), and the ALPU (4) has associated with it a high speed operand cache (8). Each cache comprises a data store (84, 94, FIG. 3) for storing frequently accessed data, and a tag store (82, 92, FIG. 3) for indicating which main memory locations are contained in the respective cache. The IPU and ALPU processing units (2, 4) may access their associated caches independently under most conditions. When the ALPU performs a write operation to main memory, it also updates the corresponding data in the operand cache and, if contained therein, in the instruction cache permitting the use of self-modifying code. The IPU does not write to either cache. Provision is made for clearing the caches on certain conditions when their contents become invalid.
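
A sketch of the write path described above, where an ALPU write to main memory also updates the operand cache and, when the location is present there, the instruction cache, which is what permits self-modifying code; the structures and names are illustrative assumptions.

#include <stdbool.h>
#include <stdint.h>

struct cache;                                       /* tag store + data store */

bool cache_contains(struct cache *c, uint32_t addr);
void cache_update(struct cache *c, uint32_t addr, uint32_t data);
void main_memory_write(uint32_t addr, uint32_t data);

/* ALPU write: main memory is always written; the corresponding data in the
 * operand cache is updated, and the instruction cache is updated only if it
 * already holds the location. The IPU itself never writes to either cache. */
void alpu_write(struct cache *operand_cache, struct cache *instruction_cache,
                uint32_t addr, uint32_t data)
{
    main_memory_write(addr, data);
    if (cache_contains(operand_cache, addr))
        cache_update(operand_cache, addr, data);
    if (cache_contains(instruction_cache, addr))
        cache_update(instruction_cache, addr, data);
}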

Patent
06 Feb 1986
TL;DR: In this article, an address translation unit is included on the same chip as, and logically between, the address generating unit and the tag comparator logic, and interleaved access to more than one cache may be accomplished on the external address, data and tag busses.
Abstract: A cache-based computer architecture has the address generating unit and the tag comparator packaged together and separately from the cache RAMS. If the architecture supports virtual memory, an address translation unit may be included on the same chip as, and logically between, the address generating unit and the tag comparator logic. Further, interleaved access to more than one cache may be accomplished on the external address, data and tag busses.



Patent
29 Jul 1986
TL;DR: In this paper, a prefetch buffer is provided along with a prefetch control register and control logic for splitting the prefetch buffer into two logical channels, a first channel for handling prefetches associated with requests from the first processor, and a second channel for handling prefetches associated with requests from the second processor.
Abstract: Control logic for controlling references to a cache (24) including a cache directory (62) which is capable of being configured into a plurality of ways, each way including tag and valid-bit storage for associatively searching the directory (62) for cache data-array addresses. A cache-configuration register and control logic (64) splits the cache directory (62) into two logical directories, one directory for controlling requests from a first processor and the other directory for controlling requests from a second processor. A prefetch buffer (63) is provided along with a prefetch control register for splitting the prefetch buffer into two logical channels, a first channel for handling prefetches associated with requests from the first processor, and a second channel for handling prefetches associated with requests from the second processor.

Patent
James W. Keeley1
27 Jun 1986
TL;DR: In this paper, a cache memory subsystem has multilevel directory memory and buffer memory pipeline stages shared by at least a pair of independently operated central processing units; each processing unit is allocated one-half of the total available cache memory space by separate accounting replacement apparatus included within the buffer memory stage.
Abstract: A cache memory subsystem has multilevel directory memory and buffer memory pipeline stages shared by at least a pair of independently operated central processing units. For completely independent operation, each processing unit is allocated one-half of the total available cache memory space by separate accounting replacement apparatus included within the buffer memory stage. A multiple allocation memory (MAM) is also included in the buffer memory stage. During each directory allocation cycle performed for a processing unit, the allocated space of the other processing unit is checked for the presence of a multiple allocation. The address of the multiple allocated location associated with the processing unit having the lower priority is stored in the MAM allowing for earliest data replacement thereby maintaining data coherency between both independently operated processing units.

Proceedings Article
01 Jan 1986
TL;DR: The goal is for the cache behavior to dominate performance but the large storage facility to dominate costs, thus giving the illusion of a large, fast, inexpensive storage system.
Abstract: ... a relatively slow and inexpensive source of information and a much faster consumer of that information. The cache capacity is relatively small and expensive, but quickly accessible. The goal is for the cache behavior to dominate performance but the large storage facility to dominate costs, thus giving the illusion of a large, fast, inexpensive storage system. Successful operation depends both on a substantial level of locality being exhibited by the consumer, and careful strategies being chosen for cache operation disciplines for replacement of contents, update synchronization, ...

Patent
03 Oct 1986
TL;DR: In this article, a microprocessor architecture is described having separate very high speed instruction and data interface circuitry for coupling, via respective separate very high speed instruction and data interface buses, to respective external instruction cache and data cache circuitry.
Abstract: A microprocessor architecture is disclosed having separate very high speed instruction and data interface circuitry for coupling via respective separate very high speed instruction and data interface buses to respective external instruction cache and data cache circuitry. The microprocessor is comprised of an instruction interface, a data interface, and an execution unit. The instruction interface controls communications with the external instruction cache and couples the instructions from the instruction cache to the microprocessor at very high speed. The data interface controls communications with the external data cache and communicates data bidirectionally at very high speed between the data cache and the microprocessor. The execution unit selectively processes the data received via the data interface from the data cache responsive to the execution unit decoding and executing a respective one of the instructions received via the instruction interface from the instruction cache. In one embodiment, the external instruction cache is comprised of a program counter and addressable memory for outputting stored instructions responsive to its program counter and to an instruction cache advance signal output from the instruction interface. Circuitry in the instruction interface selectively outputs an initial instruction address for storage in the instruction cache program counter responsive to a context switch or branch, such that the instruction interface repetitively couples a plurality of instructions from the instruction cache to the microprocessor responsive to the cache advance signal, independent of and without the need for any intermediate or further address output from the instruction interface to the instruction cache except upon the occurrence of another context switch or branch.

Patent
06 Aug 1986
TL;DR: Cache memory includes a dual or two-part cache with one part of the cache being primarily designated for instruction data while the other part is designated for operand data, but not exclusively.
Abstract: Cache memory includes a dual or two-part cache, with one part of the cache being primarily, but not exclusively, designated for instruction data while the other is primarily designated for operand data. For maximum speed of operation, the two parts of the cache are equal in capacity. The two parts of the cache, designated I-Cache and O-Cache, are semi-independent in their operation and include arrangements for effecting synchronized searches; together they can accommodate up to three separate operations substantially simultaneously. Each cache unit has a directory and a data array, with the directory and data array being separately addressable. Each cache unit may be subjected to a primary and to one or more secondary concurrent uses, with the secondary uses prioritized. Data is stored in the cache unit on a so-called store-into basis, wherein data obtained from the main memory is operated upon and stored in the cache without returning the operated-upon data to the main memory unit until subsequent transactions require such return.

Patent
20 Oct 1986
TL;DR: In this article, main memory is not accessible other than from the cache memory, so no main memory access delay is caused by requests from other system modules such as the I/O controller.
Abstract: A computer system in which only the cache memory is permitted to communicate with main memory, and in which the address being used in the cache is also sent at the same time to the main memory. Thus, as soon as it is discovered that the desired main memory address is not presently in the cache, the main memory RAMs can be read to the cache without being delayed by the main memory address set-up time. In addition, since the main memory is not accessible other than from the cache memory, there is also no main memory access delay caused by requests from other system modules such as the I/O controller. Likewise, since the contents of the cache memory are written into a temporary register before being sent to the main memory, a main memory read can be performed before doing a writeback of the cache to the main memory, so that data can be returned to the cache in approximately the same amount of time required for a normal main memory access. The result is a significant reduction in the overhead time normally associated with cache memories.
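
An illustrative sketch of the read flow described above: the address is presented to main memory at the same time as the cache lookup, and on a miss the dirty victim is parked in a temporary register so the memory read need not wait for the writeback. The names and exact sequencing are assumptions.

#include <stdbool.h>
#include <stdint.h>

struct block { bool valid, dirty; uint32_t tag, data; };

void     main_memory_begin_read(uint32_t addr);   /* start RAM access (address setup) */
uint32_t main_memory_finish_read(void);           /* collect the data once ready      */
void     main_memory_write(uint32_t addr, uint32_t data);

static struct block temp_register;                /* dirty victim awaiting writeback  */
static uint32_t     temp_register_addr;
static bool         temp_register_full = false;

uint32_t cached_read(struct block *victim, uint32_t addr, bool hit)
{
    /* The same address goes to main memory and to the cache in parallel,
     * so the RAM setup time is already under way if this turns out to miss. */
    main_memory_begin_read(addr);

    if (hit)
        return victim->data;                      /* cache supplies the data          */

    if (victim->valid && victim->dirty) {
        /* Park the victim instead of writing it back first. */
        temp_register      = *victim;
        temp_register_addr = victim->tag;         /* illustrative: tag stands in for the address */
        temp_register_full = true;
    }

    victim->data  = main_memory_finish_read();    /* refill without waiting for the writeback */
    victim->tag   = addr;
    victim->valid = true;
    victim->dirty = false;

    if (temp_register_full) {                     /* writeback happens after the refill */
        main_memory_write(temp_register_addr, temp_register.data);
        temp_register_full = false;
    }
    return victim->data;
}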

Patent
28 May 1986
TL;DR: In this article, a cache system is described that employs an LRU (Least Recently Used) scheme in its cache-block replacement algorithm and comprises a directory memory whose entries have LRU counter fields, a host system issuing a read/write command to which an arbitrary LRU setting value is appended, a directory search circuit, and a microprocessor.
Abstract: A cache system employing an LRU (Least Recently Used) scheme in a replacement algorithm of cache blocks and comprising a directory memory whose entries have LRU counter fields, a host system issuing a read/write command to which an arbitrary LRU setting value is appended, a directory search circuit, and a microprocessor. The directory search circuit searches the directory memory in response to the read/write command issued from the host system. The microprocessor stores the LRU setting value appended to the read/write command in the LRU counter field of the hit entry of the directory memory, or of the entry corresponding to the replacement-target cache block, in response to the search result of the directory search circuit.
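
A small sketch of how a host-supplied LRU setting value might be stored in the directory entry that hits, or in the entry chosen for replacement; the structures, the victim-selection rule, and the names are illustrative assumptions.

#include <stdbool.h>
#include <stdint.h>

#define DIR_ENTRIES 256

struct dir_entry {
    bool     valid;
    uint32_t block_address;
    uint8_t  lru_counter;           /* LRU counter field of this entry */
};

static struct dir_entry directory[DIR_ENTRIES];

/* Directory search: return the hit entry, or otherwise the current
 * replacement target (here, simply the entry with the largest counter). */
static struct dir_entry *search_or_replace(uint32_t block_address)
{
    struct dir_entry *target = &directory[0];
    for (int i = 0; i < DIR_ENTRIES; i++) {
        if (directory[i].valid && directory[i].block_address == block_address)
            return &directory[i];                       /* hit              */
        if (directory[i].lru_counter > target->lru_counter)
            target = &directory[i];                     /* candidate victim */
    }
    return target;
}

/* Handle a read/write command carrying an arbitrary LRU setting value:
 * whichever entry serves the command receives the host's value, letting
 * the host bias how long the block survives in the cache. */
void handle_command(uint32_t block_address, uint8_t lru_setting_value)
{
    struct dir_entry *e = search_or_replace(block_address);
    e->valid         = true;
    e->block_address = block_address;
    e->lru_counter   = lru_setting_value;
}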

Patent
31 Jul 1986
TL;DR: In this paper, each entry in the cache memory has two valid bits, one associated with the user execution space and the other with the supervisor or operating system execution space; each collection of valid bits can be cleared in unison independently of the other.
Abstract: Each entry in a cache memory located between a processor and an MMU has two valid bits. One valid bit is associated with the user execution space and the other with the supervisor or operating system execution space. Each collection of valid bits can be cleared in unison independently of the other. This allows supervisor entries in the cache to survive context changes without being purged along with the user entries.
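
A minimal sketch of the dual valid bits described above; the entry layout and function names are illustrative assumptions.

#include <stdbool.h>
#include <stdint.h>

#define CACHE_ENTRIES 512

struct cache_entry {
    bool     user_valid;        /* valid in the user execution space       */
    bool     supervisor_valid;  /* valid in the supervisor execution space */
    uint32_t tag;
    uint32_t data;
};

static struct cache_entry cache[CACHE_ENTRIES];

/* Clear all user-space valid bits in unison, e.g. on a context change.
 * Supervisor entries survive untouched. */
void flush_user_entries(void)
{
    for (int i = 0; i < CACHE_ENTRIES; i++)
        cache[i].user_valid = false;
}

/* The supervisor entries can be cleared independently when needed. */
void flush_supervisor_entries(void)
{
    for (int i = 0; i < CACHE_ENTRIES; i++)
        cache[i].supervisor_valid = false;
}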

Proceedings ArticleDOI
01 May 1986
TL;DR: It is shown that the logical problem of buffering is directly related to the problem of synchronization, and a simple model is presented to evaluate the performance improvement resulting from buffering.
Abstract: In highly-pipelined machines, instructions and data are prefetched and buffered in both the processor and the cache. This is done to reduce the average memory access latency and to take advantage of memory interleaving. Lock-up free caches are designed to avoid processor blocking on a cache miss. Write buffers are often included in a pipelined machine to avoid processor waiting on writes. In a shared memory multiprocessor, there are more advantages in buffering memory requests, since each memory access has to traverse the memory-processor interconnection and has to compete with memory requests issued by different processors. Buffering, however, can cause logical problems in multiprocessors. These problems are aggravated if each processor has a private memory in which shared writable data may be present, such as in a cache-based system or in a system with a distributed global memory. In this paper, we analyze the benefits and problems associated with the buffering of memory requests in shared memory multiprocessors. We show that the logical problem of buffering is directly related to the problem of synchronization. A simple model is presented to evaluate the performance improvement resulting from buffering.
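
A classic two-processor example of the kind of logical problem the paper analyzes: with write buffers, each processor's flag write can still be sitting in its buffer when the other processor reads, so both can enter the critical section. This fragment only illustrates the hazard; the paper's analysis and model are more general.

/* Dekker-style flags on two processors with write buffering.  Without
 * forcing the buffered writes to memory before the reads, both processors
 * can observe the other's flag as 0 and proceed. */

#include <stdint.h>

volatile int flag0 = 0, flag1 = 0;

void processor0(void)
{
    flag0 = 1;                 /* may linger in P0's write buffer             */
    /* a synchronization/fence here would force the write to global memory   */
    if (flag1 == 0) {
        /* critical section: P1 may be here at the same time                 */
    }
}

void processor1(void)
{
    flag1 = 1;                 /* may linger in P1's write buffer             */
    if (flag0 == 0) {
        /* critical section */
    }
}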