Showing papers on "Cache algorithms published in 1986"

PDF

Open Access

Journal Article•DOI•

Cache coherence protocols: evaluation using a multiprocessor simulation model

[...]

James Archibald¹, Jean-Loup Baer¹•Institutions (1)

01 Sep 1986-ACM Transactions on Computer Systems

TL;DR: The magnitude of the potential performance difference between the various approaches indicates that the choice of coherence solution is very important in the design of an efficient shared-bus multiprocessor, since it may limit the number of processors in the system.

...read moreread less

Abstract: Using simulation, we examine the efficiency of several distributed, hardware-based solutions to the cache coherence problem in shared-bus multiprocessors. For each of the approaches, the associated protocol is outlined. The simulation model is described, and results from that model are presented. The magnitude of the potential performance difference between the various approaches indicates that the choice of coherence solution is very important in the design of an efficient shared-bus multiprocessor, since it may limit the number of processors in the system.

...read moreread less

671 citations

Proceedings Article•DOI•

Competitive snoopy caching

[...]

Anna R. Karlin¹, Mark S. Manasse, Larry Rudolph², Daniel D. Sleator³•Institutions (3)

Stanford University¹, Hebrew University of Jerusalem², Carnegie Mellon University³

27 Oct 1986

TL;DR: This work presents new on-line algorithms which decide, for each cache, which blocks to retain and which to drop in order to minimize communication over the bus in a snoopy cache multiprocessor system.

...read moreread less

Abstract: In a snoopy cache multiprocessor system, each processor has a cache in which it stores blocks of data. Each cache is connected to a bus used to communicate with the other caches and with main memory. For several of the proposed models of snoopy caching, we present new on-line algorithms which decide, for each cache, which blocks to retain and which to drop in order to minimize communication over the bus. We prove that, for any sequence of operations, our algorithms' communication costs are within a constant factor of the minimum required for that sequence; for some of our algorithms we prove that no on-line algorithm has this property with a smaller constant.

...read moreread less

268 citations

Patent•

Cache memory consistency control with explicit software instructions

[...]

William S. Worley¹, William R. Bryg¹, Allen J. Baum¹•Institutions (1)

Hewlett-Packard¹

06 Jun 1986

TL;DR: Memory integrity is maintained in a system with a hierarchical memory using a set of explicit cache control instructions as mentioned in this paper, with two status flags, a valid bit and a dirty bit, with each block of information stored.

...read moreread less

Abstract: Memory integrity is maintained in a system with a hierarchical memory using a set of explicit cache control instructions. The caches in the system have two status flags, a valid bit and a dirty bit, with each block of information stored. The operating system executes selected cache control instructions to ensure memory integrity whenever there is a possibility that integrity could be compromised.

...read moreread less

134 citations

Patent•

Variable address mode cache

[...]

James Gerald Brenza¹•Institutions (1)

IBM¹

01 May 1986

TL;DR: In this paper, a common directory and an L1 control array (L1CA) are provided for the CPU to access both the L1 and L2 caches, each of which is either a real/absolute address or a virtual address according to whichever address mode the CPU is in.

...read moreread less

Abstract: A data processing system which contains a multi-level storage hierarchy, in which the two highest hierarchy levels (e.g. L1 and L2) are private (not shared) to a single CPU, in order to be in close proximity to each other and to the CPU. Each cache has a data line length convenient to the respective cache. A common directory and an L1 control array (L1CA) are provided for the CPU to access both the L1 and L2 caches. The common directory contains and is addressed by the CPU requesting logical addresses, each of which is either a real/absolute address or a virtual address, according to whichever address mode the CPU is in. Each entry in the directory contains a logical address representation derived from a logical address that previously missed in the directory. A CPU request "hits" in the directory if its requested address is in any private cache (e.g. in L1 or L2). A line presence field (LPF) is included in each directory entry to aid in determining a hit in the L1 cache. The L1CA contains L1 cache information to supplement the corresponding common directory entry; the L1CA is used during a L1 LRU castout, but is not the critical path of an L1 or L2 hit. A translation lookaside buffer (TLB) is not used to determine cache hits. The TLB output is used only during the infrequent times that a CPU request misses in the cache directory, and the translated address (i.e. absolute address) is then used to access the data in a synonym location in the same cache, or in main storage, or in the L1 or L2 cache in another CPU in a multiprocessor system using synonym/cross-interrogate directories.

...read moreread less

134 citations

Patent•

Digital computer with cache capable of concurrently handling multiple accesses from parallel processors

[...]

Blau Jonathan Seth, Robert L. Fredieu, Michael L. Ziegler

17 Jun 1986

TL;DR: A cache memory capable of concurrently accepting and working on completion of more than one cache access from a plurality of processors connected in parallel is discussed in this paper. But the work in this paper is restricted to the case of a single processor.

...read moreread less

Abstract: A cache memory capable of concurrently accepting and working on completion of more than one cache access from a plurality of processors connected in parallel. Current accesses to the cache are handled by current-access-completion circuitry which determines whether the current access is capable of immediate completion and either completes the access immediately if so capable or transfers the access to pending-access-completion circuitry if not so capable. The latter circuitry works on completion of pending accesses; it determines and stores for each pending access status information prescribing the steps required to complete the access and redetermines that status information as conditions change. In working on completion of current and pending accesses, the addresses of the accesses are compared to those of memory accesses in progress on the system.

...read moreread less

120 citations

Proceedings Article•DOI•

Footprints in the cache

[...]

Harold S. Stone¹, Dominique Thibaut²•Institutions (2)

IBM¹, University of Massachusetts Amherst²

01 May 1986

TL;DR: An analytical model for a cache-reload transient is developed and it is shown that the reload transient is related to the area in the tail of a normal distribution whose mean is a function of the footprints of the programs that compete for the cache.

...read moreread less

Abstract: This paper develops an analytical model for a cache-reload transient. When an interrupt program or system program runs periodically in a cache-based computer, a short cache-reload transient occurs each time the interrupt program is invoked. That transient depends on the size of the cache, the fraction of the cache used by the interrupt program, and the fraction of the cache used by background programs that run between interrupts. We call the portion of a cache used by a program its footprint in the cache, and we show that the reload transient is related to the area in the tail of a normal distribution whose mean is a function of the footprints of the programs that compete for the cache. We believe that the model may be useful as well for predicting paging behavior in virtual-memory systems with round-robin scheduling.

...read moreread less

112 citations

Patent•

Virtual memory cache for use in multi-processing systems

[...]

Stephen R. Dashiell¹, Bindiganavele A. Prasad¹, Ronald E. Rider¹, James A. Steffen¹•Institutions (1)

Xerox¹

12 Nov 1986

TL;DR: In this paper, the authors propose a system for maintaining data consistency among distributed processors, each having its associated cache memory, where a processor addresses data in its cache by specifying the virtual address.

...read moreread less

Abstract: A system for maintaining data consistency among distributed processors, each having its associated cache memory. A processor addresses data in its cache by specifying the virtual address. The cache will search its cells for the data associatively. Each cell has a virtual address, a real address, flags and a plurality of associated data words. If there is no hit on the virtual address supplied by the processor, a map processor supplies the equivalent real address which the cache uses to access the data from another cache if one has it, or else from real memory. When a processor writes into a data word in the cache, the cache will update all other caches that share the data before allowing the write to the local cache.

...read moreread less

106 citations

Journal Article•DOI•

Software-controlled caches in the VMP multiprocessor

[...]

David R. Cheriton¹, Gert A. Slavenburg², P. D. Boyle¹•Institutions (2)

Stanford University¹, Philips²

01 May 1986

TL;DR: This paper shows how the VMP design provides the high memory bandwidth required by modern high-performance processors with a minimum of hardware complexity and cost, and describes simple solutions to the consistency problems associated with virtually addressed caches.

...read moreread less

Abstract: VMP is an experimental multiprocessor that follows the familiar basic design of multiple processors, each with a cache, connected by a shared bus to global memory. Each processor has a synchronous, virtually addressed, single master connection to its cache, providing very high memory bandwidth. An unusually large cache page size and fast sequential memory copy hardware make it feasible for cache misses to be handled in software, analogously to the handling of virtual memory page faults. Hardware support for cache consistency is limited to a simple state machine that monitors the bus and interrupts the processor when a cache consistency action is required.In this paper, we show how the VMP design provides the high memory bandwidth required by modern high-performance processors with a minimum of hardware complexity and cost. We also describe simple solutions to the consistency problems associated with virtually addressed caches. Simulation results indicate that the design achieves good performance providing data contention is not excessive.

...read moreread less

99 citations

Patent•

Method and apparatus for efficiently handling temporarily cacheable data

[...]

John H. Anthony¹, William C. Brantley¹, Kevin P. McAuliffe¹, Vern Alan Norton¹, Gregory Francis Pfister¹ - Show less +1 more•Institutions (1)

IBM¹

25 Jul 1986

TL;DR: In this article, a method and apparatus for marking data that is temporarily cacheable to facilitate the efficient management of said data is presented, where a bit in the segment and/or page descriptor of the data called the marked data bit (MDB) is generated by the compiler and included in a request for data from memory by the processor in the form of a memory address and will be stored in the cache directory at a location related to the particular line of data involved.

...read moreread less

Abstract: A method and apparatus for marking data that is temporarily cacheable to facilitate the efficient management of said data. A bit in the segment and/or page descriptor of the data called the marked data bit (MDB) is generated by the compiler and included in a request for data from memory by the processor in the form of a memory address and will be stored in the cache directory at a location related to the particular line of data involved. The bit is passed to the cache together with the associated real address after address translation (in the case of a real cache). when the cache controls load the address of the data in the directory it is also stored the marked data bit (MDB) in the directory with the address. When the cacheability of the temporarily cacheable data changes from cacheable to non-cacheable, a single instruction is issued to cause the cache to invalidate all marked data. When an "invalidate marked data" instruction is received, the cache controls sweep through the entire cache directory and invalidate any cache line which has the "marked data bit" set in a single pass. An extension of the invention involves using a multi-bit field rather than a single bit to provide a more versatile control of the temporary cacheability of data.

...read moreread less

89 citations

Patent•

Method and apparatus for facilitating instruction processing of a digital computer

[...]

Ruby B. Lee¹, Allen J. Baum¹, Russell Kao¹•Institutions (1)

Hewlett-Packard¹

27 Mar 1986

TL;DR: In this paper, the authors propose to use a cache memory and a main memory with a transformation unit between the main memory and the cache memory so that at least a portion of an information unit retrieved from the cache may be transformed during retrieval of the information (fetch) from a mainmemory and prior to storage in the cache (cache).

...read moreread less

Abstract: A computer having a cache memory and a main memory is provided with a transformation unit between the main memory and the cache memory so that at least a portion of an information unit retrieved from the main memory may be transformed during retrieval of the information (fetch) from a main memory and prior to storage in the cache memory (cache). In a specific embodiment, an instruction may be predecoded prior to storage in the cache memory. In another embodiment involving a branch instruction, the address of the target of the branch is calculated prior to storing in the instruction cache. The invention has advantages where a particular instruction is repetitively executed since a needed decode operation which has been partially performed previously need not be repeated with each execution of an instruction. Consequently, the latency time of each machine cycle may be reduced, and the overall efficiency of the computing system can be improved. If the architecture defines delayed branch instructions, such branch instructions may be executed in effectively zero machine cycles. This requires a wider bus and an additional register in the processor to allow the fetching of two instructions from the cache memory in the same cycle.

...read moreread less

86 citations

Patent•

A cache coherence mechanism based on locking

[...]

Lishing Liu¹•Institutions (1)

IBM¹

12 Sep 1986

TL;DR: In this article, a method and apparatus for associating in cache directories the Control Domain Identifications (CDIDs) of software covered by each cache line is provided, through the use of such provision and/or the addition of Identifications of users actively using lines, cache coherence of certain data is controlled without performing conventional Cross-Interrogates (XIs), if the accesses to such objects are properly synchronized with locking type concurrency controls.

...read moreread less

Abstract: A method and apparatus is provided for associating in cache directories the Control Domain Identifications (CDIDs) of software covered by each cache line. Through the use of such provision and/or the addition of Identifications of users actively using lines, cache coherence of certain data is controlled without performing conventional Cross-Interrogates (XIs), if the accesses to such objects are properly synchronized with locking type concurrency controls. Software protocols to caches are provided for the resource kernel to control the flushing of released cache lines. The parameters of these protocols are high level Domain Identifications and Task Identifications.

...read moreread less

Patent•

Quadword boundary cache system

[...]

Howard G. Sachs¹, James Y. Cho¹, Walter H. Hollingsworth¹•Institutions (1)

Intergraph¹

03 Oct 1986

TL;DR: In this paper, a cache memory system with multiple-word boundary registers, multipleword line registers, and a multipleword boundary detector system is presented, where the cache memory stores four words per addressable line of cache storage.

...read moreread less

Abstract: In a cache memory system, multiple-word boundary registers, multiple-word line registers, and a multiple-word boundary detector system provide accelerated access of data contained within the cache memory within the multiple-word boundaries, and provides for effective prefetch of sequentially ascending locations of stored data from the cache memory. In an illustrated embodiment, the cache memory stores four words per addressable line of cache storage, and accordingly quad-word boundary registers determine boundary limits on quad-words, quad-word line registers store, in parallel, a selected line from the cache memory, and a quad-word boundary detector system determines when to prefetch the next set of quad-words from the cache memory for storage in the quad-word line registers.

...read moreread less

Patent•

Cache MMU system

[...]

Howard G. Sachs¹, James Y. Cho¹, Walter H. Hollingsworth¹•Institutions (1)

Intergraph¹

21 Feb 1986

TL;DR: In this article, a cache and memory management system architecture and associated protocol is described, which is comprised of a set associative memory cache subsystem, a set associated translation logic memory subsystem, hardwired page translation, selectable access mode logic, and selectively enableable instruction prefetch logic.

...read moreread less

Abstract: A cache and memory management system architecture and associated protocol is disclosed. The cache and memory management system is comprised of a set associative memory cache subsystem, a set associative translation logic memory subsystem, hardwired page translation, selectable access mode logic, and selectively enableable instruction prefetch logic. The cache and memory management system includes a system interface for coupling to a systems bus to which a main memory is coupled, and is also comprised of a processor/cache bus interface for coupling to an external CPU. As disclosed, the cache memory management system can function as either an instruction cache with instruction prefetch capability, and on-chip program counter capabilities, and as a data cache memory management system which has an address register for receiving addresses from the CPU, to initiate a transfer of defined numbers of words of data commencing at the transmitted address. Another novel feature disclosed is the quadword boundary, quadword line registers, and quadword boundary detector subsystem, which accelerates access of data within quadword boundaries, and provides for effective prefetch of sequentially ascending locations of storage instructions or data from the cache memory subsystem.

...read moreread less

Patent•

Dual cache for independent prefetch and execution units

[...]

Richard F. Thompson¹, Daniel J. Disney¹, Swee-meng Quek¹, Eric C. Westerfeld¹•Institutions (1)

Motorola¹

16 Oct 1986

TL;DR: In this article, a pipelined digital computer processor system is provided comprising an instruction prefetch unit (IPU,2) for prefetching instructions and an arithmetic logic processing unit (ALPU, 4) for executing instructions.

...read moreread less

Abstract: A pipelined digital computer processor system (10, FIG. 1) is provided comprising an instruction prefetch unit (IPU,2) for prefetching instructions and an arithmetic logic processing unit (ALPU, 4) for executing instructions. The IPU (2) has associated with it a high speed instruction cache (6), and the ALPU (4) has associated with it a high speed operand cache (8). Each cache comprises a data store (84, 94, FIG. 3) for storing frequently accessed data, and a tag store (82, 92, FIG. 3) for indicating which main memory locations are contained in the respective cache. The IPU and ALPU processing units (2, 4) may access their associated caches independently under most conditions. When the ALPU performs a write operation to main memory, it also updates the corresponding data in the operand cache and, if contained therein, in the instruction cache permitting the use of self-modifying code. The IPU does not write to either cache. Provision is made for clearing the caches on certain conditions when their contents become invalid.

...read moreread less

Proceedings Article•

A Compiler-Assisted Cache Coherence Solution for Multiprcessors.

[...]

Alexander V. Veidenbaum

01 Jan 1986

Patent•

Bus expander with logic for virtualizing single cache control into dual channels with separate directories and prefetch for different processors

[...]

David B. Johnson¹, Ronald J. Ebersole¹, Joel C. Huang¹, Manfred Neugebauer¹, Page Steven Ray¹, Keith Self¹ - Show less +2 more•Institutions (1)

Intel¹

29 Jul 1986

TL;DR: In this paper, a prefetch buffer is provided along with a pre-configuration register and control logic for splitting the prefetch buffers into two logical channels, a first channel for handling prefetches associated with requests from the first processor, and a second channel for dealing with requests of the second processor.

...read moreread less

Abstract: Control logic for controlling references to a cache (24) including a cache directory (62) which is capable of being configured into a plurality of ways, each way including tag and valid-bit storage for associatively searching the directory (62) for cache data-array addresses. A cache-configuration register and control logic (64) splits the cache directory (62) into two logical directories, one directory for controlling requests from a first processor and the other directory for controlling requests from a second processor. A prefetch buffer (63) is provided along with a prefetch control register for splitting the prefetch buffer into two logical channels, a first channel for handling prefetches associated with requests from the first processor, and a second channel for handling prefetches associated with requests from the second processor.

...read moreread less

Proceedings Article•

Name Service Locality and Cache Design in a Distributed Operating System.

[...]

Alan B. Sheltzer, Robert Lindell, Gerald J. Popek

01 Jan 1986

TL;DR: The goal is for the cache behavior to dominate but the large storage facility to dominat osts, thus giving the illusion of a large, fast, inexpensive storage system.

...read moreread less

Abstract: m tively slow and inexpensive source of information and a uch faster consumer of that information. The cache c capacity is relatively small and expensive, but quickly ac essible. The goal is for the cache behavior to dominate e c performance but the large storage facility to dominat osts, thus giving the illusion of a large, fast, inexpensive a s storage system. Successful operation depends both on ubstantial level of locality being exhibited by the consut mer, and careful strategies being chosen for cache opera ion disciplines for replacement of contents, update syn-

...read moreread less

Patent•

Prioritized secondary use of a cache with simultaneous access

[...]

Charles P. Ryan¹, Russell W. Guenthner¹•Institutions (1)

Honeywell¹

06 Aug 1986

TL;DR: Cache memory includes a dual or two-part cache with one part of the cache being primarily designated for instruction data while the other part is designated for operand data, but not exclusively.

...read moreread less

Abstract: Cache memory includes a dual or two part cache with one part of the cache being primarily designated for instruction data while the other is primarily designated for operand data, but not exclusively. For a maximum speed of operation, the two parts of the cache are equal in capacity. The two parts of the cache, designated I-Cache and O-Cache, are semi-independent in their operation and include arrangements for effecting synchronized searches, they can accommodate up to three separate operations substantially simultaneously. Each cache unit has a directory and a data array with the directory and data array being separately addressable. Each cache unit may be subjected to a primary and to one or more secondary concurrent uses with the secondary uses prioritized. Data is stored in the cache unit on a so-called store-into basis wherein data obtained from the main memory is operated upon and stored in the cache without returning the operated upon data to the main memory unit until subsequent transactions require such return.

...read moreread less

Patent•

Write-back cache system using concurrent address transfers to setup requested address in main memory before dirty miss signal from cache

[...]

Steven C. Steps

20 Oct 1986

TL;DR: In this article, the cache memory is not accessable other than from cache memory, and the main memory access delay is not caused by requests from other system modules such as the I/O controller.

...read moreread less

Abstract: A computer system in which only the cache memory is permitted to communicate with main memory and the same address being used in the cache is also sent at the same time to the main memory. Thus, as soon as it is discovered that the desired main memory address is not presently in the cache, the main memory RAMs can be read to the cache without being delayed by the main memory address set up time. In addition, since the main memory is not accessable other than from the cache memory, there is also no main memory access delay caused by requests from other system modules such as the I/O controller. Likewise, since the contents of the cache memory is written into a temporary register before being sent to the main memory, a main memory read can be performed before doing a writeback of the cache to the main memory, so that data can be back to the cache in approximately the same amount of time required for a normal main memory access. The result is a significant reduction in the overhead time normally associated with cache memories.

...read moreread less

Patent•

Cache system adopting an lru system, and magnetic disk controller incorporating it

[...]

Akihiko Furuya¹, Kouiti Kanamaru¹, Junichi Inoue¹•Institutions (1)

Toshiba¹

28 May 1986

TL;DR: In this article, a cache system employing an LRU (Least Recently Used) scheme in a replacement algorithm of cache blocks and comprising a directory memory whose entires have LRU counter fields, a host system issuing a read/write command to which an arbitrary LRU setting value is appended, a directory search circuit, and a microprocessor.

...read moreread less

Abstract: A cache system employing an LRU (Least Recently Used) scheme in a replacement algorithm of cache blocks and comprising a directory memory whose entires have LRU counter fields, a host system issuing a read/write command to which an arbitrary LRU setting value is appended, a directory search circuit, and a microprocessor. The directory search circuit searches the directory memory in response to the read/write command issued from the host system. The microprocessor stores the LRU setting value appended to the read/write command in the LRU counter field of the hit entry of the directory memory or of the entry corresponding to the replacement target cache block, in response to the search result of the directory search circuit.

...read moreread less

Patent•

Cache memory with multiple valid bits for each data indication the validity within different contents

[...]

Jonathan J. Rubinstein¹•Institutions (1)

Hewlett-Packard¹

31 Jul 1986

TL;DR: In this paper, the cache memory is divided into two valid bits associated with the user execution space and the supervisor or operating system execution space, and each collection of valid bits can be cleared in unison independently of the other.

...read moreread less

Abstract: Each entry in a cache memory located between a processor and an MMU has two valid bits. One valid bit is associated with the user execution space and the other with the supervisor or operating system execution space. Each collection of valid bits can be cleared in unison independently of the other. This allows supervisor entries in the cache to survive context changes without being purged along with the user entries.

...read moreread less

Patent•

Cache disable for a data processor

[...]

David S. Motersole¹, Jay A. Hartvigsen², John Zolnowsky²•Institutions (2)

Freescale Semiconductor¹, Motorola²

14 Jul 1986

TL;DR: In this paper, a disable circuit is provided to prevent the cache from providing the item when a signal external to the data processor is provided, so that a user, with the external signal, can cause a data processor to make all of its requests for items of operating information to the memory where these requests can be detected.

...read moreread less

Abstract: A data processor is adapted for operation with a memory containing a plurality of items of operating information for the data processor. In addition a cache stores a selected number of all of the items of the operating information. When the cache provides an item of operating information, the memory is not requested to provide the item so that a user of the data processor cannot detect the request for the item. A disable circuit is provided to prevent the cache from providing the item when a signal external to the data processor is provided. Consequently, a user, with the external signal, can cause the data processor to make all of its requests for items of operating information to the memory where these requests can be detected.

...read moreread less

Patent•

Cache memory control apparatus

[...]

Tetsu Igarashi¹•Institutions (1)

Toshiba¹

15 Dec 1986

TL;DR: In this paper, a cache memory control apparatus is presented, which includes data register blocks which are individually controlled for each byte, cache memory blocks, and a decoder for generating control signals which control the access to those blocks.

...read moreread less

Abstract: A cache memory control apparatus according to the present invention includes data register blocks which are individually controlled for each byte, cache memory blocks, and a decoder for generating control signals which control the access to those blocks In this cache memory control apparatus, when a cache hit is made in a write mode for byte data, the control signal is supplied to the data register blocks and cache memory blocks to individually control the respective blocks, thereby allowing word data corresponding to the write byte data to be synthesized Thus, the word data can be output to an external device by one operation

...read moreread less

Patent•

Buffer error retry

[...]

Michael D. Taylor, Shen H. Wang

17 Oct 1986

TL;DR: In this paper, a data processing machine including an instruction and operand processing complex generates results during execution of some instructions, and a request to store the results in a high speed cache is generated.

...read moreread less

Abstract: A data processing machine including an instruction and operand processing complex. The instruction and operand processing complex generates results during execution of some instructions. When results are generated, a request to store the results in a high speed cache is generated. The cache receiving the request to store the results includes means for detecting errors in a line in the storage means to which the results are to be stored prior to writing the results to the line. If the line includes an error, a means for correcting the error reads to the line from the cache, corrects the error and restores the line. When the line is restored, the results are written to the cache and a correct ECC code is generated.

...read moreread less

Journal Article•DOI•

Cache memory performance in a unix enviroment

[...]

Cedell A. Alexander¹, William M. Keshlear¹, Furrokh Cooper¹, Faye Briggs²•Institutions (2)

Texas Instruments¹, Rice University²

01 Jun 1986-ACM Sigarch Computer Architecture News

TL;DR: The intent is to credibly quantify the performance implications of parameter selection in a manner which emphasizes implementation tradeoffs using address reference traces obtained from typical multitasking UNIX workloads to research cache memory performance.

...read moreread less

Abstract: In a previous issue, we described a study of translation buffer performance undertaken in conjunction with the design of a memory management unit for a new 32 bit microprocessor [Alex85b]. This work produced generalized results via trace-driven simulations. The address reference traces were obtained from typical multitasking UNIX workloads and have now been used to research cache memory performance. Caches are small, fast memories placed between a processor and the main storage of a computer system in order to reduce the amount of time spent waiting on memory accesses. Blocks of recently referenced information are stored in the cache because future references are likely to be nearby (termed \"property of locality\") [Denn72]. When memory references are satisfied by the cache, the overhead of accesing main storage is eliminated. This frees the system bus for DMA or multiple processor activity and provides a significant improvement in the cost/performance ratio of the memory hierarchy. Processors may therefore operate at cache speed while maintaining the economic advantages of a slower main storage. There have been many published reports concerning cache memory performance. Some of these studies have been based on measurements, others have relied on analytic modeling, but most of the work has utilized trace-driven simulations. Address traces are typically captured by interpretively executing a program and recording each of its memory references. These traces are then used to drive the simulation model of a particular cache design. Although this approach has produced valuable insights, the absolute accuracy is questionable because the results are usually based on short traces of user programs which exclude operating system code, interrupts, task switches, instruction prefetching effects and input/output related activities [Smit85a]. The traces used during this project are unique in that they represent the UNIX environment and do not suffer from the problems described above. They were obtained via a hardware monitor from a system actually executing commonly fouhd workloads. Once the domain of mainframe engineers, cache memory has recently become the object of much popular attention. Driven by the practical need for cost effective utilization of high performance microprocessors and the desire to harness the power of multiple microprocessor configurations, controversial questions are being asked. And after discussions with several system architects, we felt that a cache study based on our address traces would be worthwhile. We have re-examined the basic design parameters, obtained some new results and provided a commentary on \"state-of-the-art\" issues. Our intent is to credibly quantify the performance implications of parameter selection in a manner which emphasizes implementation tradeoffs. Topics addressed by this paper include: (I) the effects of varying the block size and degree of associativity for a wide range of cache sizes; (2) coherency techniques and the cold start impact of invalidation; (3) memory update mechanisms and the efficiency achieved by several bus oriented protocols; (4) replacement algorithms; (5) sub-block and sector mapping schemes; (6) instruction caches; and (7) split caches.

...read moreread less

Processor architecture and cache performance

[...]

Chad L. Mitchell

01 Jun 1986

TL;DR: This study provides new results about the relationship between processor architecture and memory traffic for instruction fetches for a general range of cache sizes and observation that relative instruction traffic differences between architectures are about the same with very large caches as with no cache and that intermediate sized caches tend to accentuate such relative differences.

...read moreread less

Abstract: Previously, analysis of processor architecture has involved the measurement of hardware or interpreters. The use of benchmarks written in high-level languages has added the requirement for the compiler targeted to each architecture studied. Herein, a methodology based on the use of compiler tools has been developed which allows simulation of different processors without the necessity of creating interpreters and compilers for each architecture simulated. The resource commitment per architecture studied is greatly reduced and the study of a spectrum of processor architectures is facilitated. Tools for the use of this methodology were developed from existing compiler and simulation tools. The new tools were validated and the methodology was then applied to study the effects of processor architecture on instruction cache performance. Over 50 architectures from three architectural families (Stack, Register Set and Direct Correspondence) were simulated. Earlier, studies have compared and contrasted the effects of various features of processor architecture. Instruction cache performance has also been studied in some depth. This study provides new results about the relationship between processor architecture and memory traffic for instruction fetches for a general range of cache sizes. Among the results is the general observation that relative instruction traffic differences between architectures are about the same with very large caches as with no cache and that intermediate sized caches tend to accentuate such relative differences.

...read moreread less

Book Chapter•DOI•

Specification and properties of a cache coherence protocol model

[...]

Claude Girault¹, C. Chatelain¹, Serge Haddad¹•Institutions (1)

University of Paris¹

01 Jun 1986

TL;DR: This paper describes a cache coherence protocol for an architecture composed of several processors, each with their own local cache, connected via a switching structure to a shared memory itself split into several modules managed by independent controllers.

...read moreread less

Abstract: This paper describes a cache coherence protocol for an architecture composed of several processors, each with their own local cache, connected via a switching structure to a shared memory itself split into several modules managed by independent controllers. The protocol prevents processors from simultaneously modifying their respective copies and always provides a processor requiring a copy of a memory location with the most up-to-date version. A top down description and modeling of the protocol is given using Predicate/Transition nets. This modeling allows to formally describe the complex synchronizations of this protocol. Then invariants are directly obtained without unfolding the Predicate/Transition net. They are the basis for studying behavioral properties.

...read moreread less

Patent•

Paged virtual cache system

[...]

Thomas F. Joyce¹, Ming T. Miu¹, Jian-Kuo Shen¹, Forrest M. Phillips¹•Institutions (1)

Wilmington University¹

17 Dec 1986

TL;DR: In this paper, the authors present a page level cache organization for multiprocessor computer systems, which allows the processing of either virtual or physical addresses with improved speed and reduced complexity and the ability to eliminate both consistency and synonym problems.

...read moreread less

Abstract: OF THE DISCLOSURE A multiprocessor computer system includes a main memory and a plurality of central processing units (CPU's) which are connected to share main memory via a common bus network. Each CPU has instruction and data cache units, each organized on a page basis for complete operating compatibility with user processes. Each cache unit includes a number of content addressable memories (CAM's) and directly addressable memories (RAM's) organized to combine associative and direct mapping of data or instructions on a page basis. An input CAM in response to a CPU address provides a cache address which includes a page level number for identifying where all of the required information resides in the other memories f or processing requests relating to the page. This organization permits the processing of either virtual or physical addresses with improved speed and reduced complexity and the ability to detect and eliminate both consistency and synonym problems.

...read moreread less

Proceedings Article•

An instruction cache design for use with a delayed branch

[...]

Andrew R. Pleszkun, Matthew Farrens

01 Jun 1986

TL;DR: The design and implementation issues associated with realizing an instruction cache for a machine that uses an "extended-" version of the delayed branch instruction are discussed and timing results are presented which indicate the performance of critica-i circuits.

...read moreread less

Abstract: _ In this paper, we present the design ofan instruction cache for a machine that uses an "extended-" version of the delayed branch instruction. The extended delayed branch, which we iall the prepa,re to branch, or PBR instruction, permits the unconditional execution of between 0 and 7 instruition parcels after the branch instruction. The instruction cache is designed to fit o-l_t!e. same chip with the processor and takes advanlage of the PBR instruction to minimiZe the effective latencv associated with memory references and the filling of the instruction register. Tlis paper discusses the design and implementation issues associated with realizing such an instruction cache. We pres_ent-critical aspects of the design and the philosophy used to guidg the developm-ent 9{ the design. Finally, somir timing results are presented which indicate-the performance of critica-i circuits.

...read moreread less

Patent•

Raster scan video controller provided with an update cache, update cache for use in such video controller, and CRT display station comprising such controller

[...]

Craig Mackenna¹, Jan-Kwei Jack Li¹•Institutions (1)

Philips¹

18 Dec 1986

TL;DR: In this article, a raster scan video controller is provided with an update cache for selective updating of a display memory under control of an updating device, where each display memory connection is paired to a corresponding connection to the updating device.

...read moreread less

Abstract: A raster scan video controller is provided with an update cache for selective updating of a display memory under control of an updating device. The display has a larger bit width than the updating device. The update cache is full-width connected to the display memory. Each display memory connection is paired to a corresponding connection to the updating device. The updating device has random-access facility to the update cache for accessing only a selective part thereof for receiving, transmitting and latching data with respect to the update cache. Corresponding functions are available with respect to the memory.

...read moreread less