Showing papers on "Smart Cache published in 1990"

PDF

Open Access

Proceedings Article•DOI•

Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers

[...]

01 May 1990

TL;DR: In this article, a hardware technique to improve the performance of caches is presented, where a small fully-associative cache between a cache and its refill path is used to place prefetched data and not in the cache.

...read moreread less

Abstract: Projections of computer technology forecast processors with peak performance of 1,000 MIPS in the relatively near future. These processors could easily lose half or more of their performance in the memory hierarchy if the hierarchy design is based on conventional caching techniques. This paper presents hardware techniques to improve the performance of caches.Miss caching places a small fully-associative cache between a cache and its refill path. Misses in the cache that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache. Small miss caches of 2 to 5 entries are shown to be very effective in removing mapping conflict misses in first-level direct-mapped caches.Victim caching is an improvement to miss caching that loads the small fully-associative cache with the victim of a miss and not the requested line. Small victim caches of 1 to 5 entries are even more effective at removing conflict misses than miss caching.Stream buffers prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer and not in the cache. Stream buffers are useful in removing capacity and compulsory cache misses, as well as some instruction cache conflict misses. Stream buffers are more effective than previously investigated prefetch techniques at using the next slower level in the memory hierarchy when it is pipelined. An extension to the basic stream buffer, called multi-way stream buffers, is introduced. Multi-way stream buffers are useful for prefetching along multiple intertwined data reference streams.Together, victim caches and stream buffers reduce the miss rate of the first level in the cache hierarchy by a factor of two to three on a set of six large benchmarks.

...read moreread less

1,481 citations

Journal Article•DOI•

Adaptive software cache management for distributed shared memory architectures

[...]

John K. Bennett¹, John B. Carter¹, Willy Zwaenepoel¹•Institutions (1)

Rice University¹

01 May 1990

TL;DR: This work has examined the sharing and synchronization behavior of a variety of shared memory parallel programs and found that the access patterns of a large percentage of shared data objects fall in a small number of categories for which efficient software coherence mechanisms exist.

...read moreread less

Abstract: An adaptive cache coherence mechanism exploits semantic information about the expected or observed access behavior of particular data objects. We contend that, in distributed shared memory systems, adaptive cache coherence mechanisms will outperform static cache coherence mechanisms. We have examined the sharing and synchronization behavior of a variety of shared memory parallel programs. We have found that the access patterns of a large percentage of shared data objects fall in a small number of categories for which efficient software coherence mechanisms exist. In addition, we have performed a simulation study that provides two examples of how an adaptive caching mechanism can take advantage of semantic information.

...read moreread less

166 citations

Patent•

Method and apparatus for independently resetting processors and cache controllers in multiple processor systems

[...]

David A. Miller, Kenneth A. Jansen, Paul R. Culley, Mark E. Taylor, Javier F. Izquierdo - Show less +1 more

24 Oct 1990

TL;DR: In this paper, a method and system for independently resetting primary and secondary processors 20 and 120 respectively under program control in a multiprocessor, cache memory system is presented.

...read moreread less

Abstract: A method and system for independently resetting primary and secondary processors 20 and 120 respectively under program control in a multiprocessor, cache memory system. Processors 20 and 120 are reset without causing cache memory controllers 24 and 124 to reset.

...read moreread less

143 citations

Patent•

Data processing system and method with small fully-associative cache and prefetch buffers

[...]

Norman P. Jouppi, Alan Eustace

27 Mar 1990

TL;DR: In this article, the authors propose an extension to the basic stream buffer, called multi-way stream buffers (62), which is useful for prefetching along multiple intertwined data reference streams.

...read moreread less

Abstract: A memory system (10) utilizes miss caching by incorporating a small fully-associative miss cache (42) between a cache (18 or 20) and second-level cache (26). misses in the cache (18 or 20) that hit in the miss cache have only a one cycle miss penalty, as opposed to a many cycle miss penalty without the miss cache (42). Victim caching is an improvement to miss caching that loads a small, fully associative cache (52) with the victim of a miss and not the requested line. Small victim caches (52) of 1 to 4 entries are even more effective at removing conflict misses than miss caching. Stream buffers (62) prefetch cache lines starting at a cache miss address. The prefetched data is placed in the buffer (62) and not in the cache (18 or 20). Stream buffers (62) are useful in removing capacity and compulsory cache misses, as well as some instruction cache misses. Stream buffers (62) are more effective than previously investigated prefetch techniques when the next slower level in the memory hierarchy is pipelined. An extension to the basic stream buffer, called multi-way stream buffers (62), is useful for prefetching along multiple intertwined data reference streams.

...read moreread less

143 citations

Journal Article•DOI•

The performance impact of block sizes and fetch strategies

[...]

Steven Przybylski

01 May 1990

TL;DR: This paper explores the interactions between a cache's block size, fetch size and fetch policy from the perspective of maximizing system-level performance and finds the most effective fetch strategy improved performance by between 1.7% and 4.5%.

...read moreread less

Abstract: This paper explores the interactions between a cache's block size, fetch size and fetch policy from the perspective of maximizing system-level performance. It has been previously noted that given a simple fetch strategy the performance optimal block size is almost always four or eight words [10]. If there is even a small cycle time penalty associated with either longer blocks or fetches, then the performance-optimal size is noticeably reduced. In split cache organizations, where the fetch and block sizes of instruction and data caches are all independent design variables, instruction cache block size and fetch size should be the same. For the workload and write-back write policy used in this trace-driven simulation study, the instruction cache block size should be about a factor of two greater than the data cache fetch size, which in turn should equal to or double the data cache block size. The simplest fetch strategy of fetching only on a miss and stalling the CPU until the fetch is complete works well. Complicated fetch strategies do not produce the performance improvements indicated by the accompanying reductions in miss ratios because of limited memory resources and a strong temporal clustering of cache misses. For the environments simulated here, the most effective fetch strategy improved performance by between 1.7% and 4.5% over the simplest strategy described above.

...read moreread less

92 citations

Patent•

Programmable cache memory as well as system incorporating same and method of operating programmable cache memory

[...]

Gigy Baror¹, William M. Johnson¹•Institutions (1)

Advanced Micro Devices¹

12 Dec 1990

TL;DR: In this article, an integrated cache unit (ICU) comprising both a cache memory and a cache controller on a single chip is presented. Butler et al. proposed an ICU that supports high speed data and instruction processing applications in both RISC and non-RISC architecture environments.

...read moreread less

Abstract: Methods and apparatus are disclosed for realizing an integrated cache unit (ICU) comprising both a cache memory and a cache controller on a single chip. The novel ICU is capable of being programmed, supports high speed data and instruction processing applications in both Reduced Instruction Set Computers (RISC) and non-RISC architecture environments, and supports high speed processing applications in both single and multiprocessor systems. The preferred ICU has two buses, one for the processor interface and the other for a memory interface. The ICU support single, burst and pipelined processor accesses and is capable of operating at frequencies in excess of 25 megahertz, achieving processor access times of two cycles for the first access in a sequence, and one cycle for burst mode or piplined accesses. It can be used as either an instruction or data cache with flexible internal cache organization. A RISC processor and two ICUs (for instruction and data cache) implements a very high performance processor with 16k bytes of cache. Larger caches can be designed by using additional ICUs which, according to the preferred embodiment of the invention, are modular. Further features include flexible and extensive multiprocessor support hardware, low power requirements, and support of a combination of bus watching, ownership schemes, software control and hardware control schemes which may be used with the novel ICU to achieve cache consistency.

...read moreread less

79 citations

Proceedings Article•DOI•

SMART (strategic memory allocation for real-time) cache design using the MIPS R3000

[...]

D.B. Kirk¹, Jay K. Strosnider¹•Institutions (1)

Carnegie Mellon University¹

05 Dec 1990

TL;DR: SMART, a technique for providing predictable cache performance for real-time systems with priority-based preemptive scheduling, is presented and the value density acceleration (VDA) cache allocation algorithm is introduced and shown to be suitable for run-time cache allocation.

...read moreread less

Abstract: SMART, a technique for providing predictable cache performance for real-time systems with priority-based preemptive scheduling, is presented. The technique is implemented in a R3000 cache design. The value density acceleration (VDA) cache allocation algorithm is also introduced, and shown to be suitable for run-time cache allocation. >

...read moreread less

77 citations

Patent•

Controller for two-way set associative cache

[...]

John H. Crawford¹, Sundaravarathan R. Iyengar¹, James Nadir¹•Institutions (1)

Intel¹

28 Nov 1990

TL;DR: In this paper, the cache controller tagram is configured into two ways, each way including tag and valid-bit storage for associatively searching the directory for cache data-array addresses, and the external cache memory is organized such that both ways are simultaneously available to a number of available memory modules in the system to thereby allow the way access time to occur in parallel with the tag lookup.

...read moreread less

Abstract: A cache controller (10) which sits in parallel with a microprocessor bus (14, 15, 29) so as not to impede system response in the event of a cache miss. The cache controller tagram (24) is configured into a two ways, each way including tag and valid-bit storage for associatively searching the directory for cache data-array addresses. The external cache memory (8) is organized such that both ways are simultaneously available to a number of available memory modules in the system to thereby allow the way access time to occur in parallel with the tag lookup.

...read moreread less

68 citations

Patent•

Apparatus and method for reducing interference in two-level cache memories

[...]

Charles P. Thacker

30 Nov 1990

TL;DR: In this paper, a cache check request signal is transmitted on the shared data bus, which enables all the cache memories to update their contents if necessary, and each cache has control logic for determining when a specified address location is stored in one of its lines or blocks.

...read moreread less

Abstract: In a multiprocessor computer system, a number of processors are coupled to main memory by a shared memory bus, and one or more of the processors have a two level direct mapped cache memory. When any one processor updates data in a shared portion of the address space, a cache check request signal is transmitted on the shared data bus, which enables all the cache memories to update their contents if necessary. Since both caches are direct mapped, each line of data stored in the first cache is also stored in one of the blocks in the second cache. Each cache has control logic for determining when a specified address location is stored in one of its lines or blocks. To avoid spurious accesses to the first level cache when a cache check is performed, the second cache has a special table which stores a pointer for each line in said first cache array. This pointer denotes the block in the second cache which stores the same data as is stored in the corresponding line of the first cache. When the control logic of the second cache indicates that the specified address for a cache check is located in the second cache, a lookup circuit compares the pointer in the special table which corresponds to the specified address with a subset of the bits of the specified address. If the two match, then the specified address is located in the first cache, and the first cache is updated.

...read moreread less

67 citations

Patent•

Central processing unit checkpoint retry for store-in and store-through cache systems

[...]

Barbara A. Hall¹, Kevin Huang¹, John David Jabusch¹, Agnes Yee Ngai¹•Institutions (1)

IBM¹

04 Oct 1990

TL;DR: A checkpoint retry system for recovery from an error condition in a multiprocessor type central processing unit which may have a store-in or store-through cache system is presented in this paper.

...read moreread less

Abstract: A checkpoint retry system for recovery from an error condition in a multiprocessor type central processing unit which may have a store-in or a store-through cache system. At detection of a checkpoint instruction, the system initiates action to save the content of the program status word, the floating point registers, the access registers and the general purpose registers until the store operations are completed for the checkpointed sequence. For processors which have a store-in cache, modified cache data is saved in a store buffer until the checkpointed instructions are completed and then written to a cache which is accessible to other processors in the system. For processors which utilize store-through cache, the modified data for the checkpointed instructions is also stored in the store buffer prior to storage in the system memory.

...read moreread less

62 citations

Patent•

Microprocessor having cache bypass signal terminal

[...]

Hitoshi Yamahata¹•Institutions (1)

NEC¹

20 Jun 1990

TL;DR: In this paper, a cache bypass signal generator is used to prevent the cache memory from performing a data caching operation on the data in order to prevent data to be cache-bypassed without checking bus status signals.

...read moreread less

Abstract: A microprocessor capable of being incorporated in an information processing system with a cache memory unit and capable of realizing fine cache bypass control. The microprocessor can detect data to be cache-bypassed without checking bus status signals. The microprocessor is equipped with a cache bypass signal generator. Upon detection of data to be bypassed, the cache bypass signal generator generates a cache bypass request signal, which prevents the cache memory from performing a data caching operation on the data.

...read moreread less

Patent•

Non-blocking serialization for caching data in a shared cache

[...]

Chandrasekaran Mohan¹, Inderpal S. Narang¹•Institutions (1)

IBM¹

14 Dec 1990

TL;DR: In this paper, a method of controlling entry of a block of data is used with a high-speed cache which is shared by a plurality of independently-operating computer systems in a multi-system data sharing complex.

...read moreread less

Abstract: A method of controlling entry of a block of data is used with a high-speed cache which is shared by a plurality of independently-operating computer systems in a multi-system data sharing complex. Each computer system has access both to the high-speed cache and to lower-speed, upper-level storage for obtaining and storing data. Management logic in the high-speed cache assures that the block of data entered into the cache will not be overwritten by an earlier version of the block of data obtained from the upper-level storage.

...read moreread less

Journal Article•DOI•

Characteristics of destination address locality in computer networks: a comparison of caching schemes

[...]

Raj Jain

01 May 1990-Computer Networks and Isdn Systems

TL;DR: In this paper, the authors compared the performance of MIN, LRU, FIFO, and random cache replacement algorithms and found that the interactive traffic in their sample had a quite different locality behavior than that of the noninteractive traffic.

...read moreread less

Abstract: The size of computer networks, along with their bandwidths, is growing exponentially. To support these large, high-speed networks, it is necessary to be able to forward packets in a few microseconds. One part of the forwarding operation consists of searching through a large address database. This problem is encountered in the design of adapters, bridges, routers, gateways, and name servers. Caching can reduce the lookup time if there is a locality in the address reference pattern. Using a destination reference trace measured on an extended local area network, we attempt to see if the destination references do have a significant locality. We compared the performance of MIN, LRU, FIFO, and random cache replacement algorithms. We found that the interactive (terminal) traffic in our sample had a quite different locality behavior than that of the noninteractive traffic. The interactive traffic did not follow the LRU stack model while the noninteractive traffic did. Examples are shown of the environments in which caching can help as well as those in which caching can hurt, unless the cache size is large.

...read moreread less

Journal Article•DOI•

Synchronization with multiprocessor caches

[...]

Joonwon Lee¹, Umakishore Ramachandran¹•Institutions (1)

Georgia Institute of Technology¹

01 May 1990

TL;DR: A new lock-based cache scheme which incorporates synchronization into the cache coherency mechanism, and a new simulation model is developed embodying a widely accepted paradigm of parallel programming that outperforms existing cache protocols.

...read moreread less

Abstract: Introducing private caches in bus-based shared memory multiprocessors leads to the cache consistency problem since there may be multiple copies of shared data. However, the ability to snoop on the bus coupled with the fast broadcast capability allows the design of special hardware support for synchronization. We present a new lock-based cache scheme which incorporates synchronization into the cache coherency mechanism. With this scheme high-level synchronization primitives as well as low-level ones can be implemented without excessive overhead. Cost functions for well-known synchronization methods are derived for invalidation schemes, write update schemes, and our lock-based scheme. To accurately predict the performance implications of the new scheme, a new simulation model is developed embodying a widely accepted paradigm of parallel programming. It is shown that our lock-based protocol outperforms existing cache protocols.

...read moreread less

Patent•

Non-blocking serialization for removing data from a shared cache

[...]

Chandrasekaran Mohan¹, Inderpal S. Narang¹•Institutions (1)

IBM¹

14 Dec 1990

TL;DR: In this article, a high-speed cache is shared by a plurality of independently-operating data systems in a multi-system data sharing complex, where each data system has access both to the high speed cache and the lower-speed, secondary storage for obtaining and storing data.

...read moreread less

Abstract: A high-speed cache is shared by a plurality of independently-operating data systems in a multi-system data sharing complex. Each data system has access both to the high-speed cache and the lower-speed, secondary storage for obtaining and storing data. Management logic and the high-speed cache assures that a block of data obtained form the cache for entry into the secondary storage will be consistent with the version of the block of data in the shared cache.

...read moreread less

Proceedings Article•DOI•

Efficient trace-driven simulation method for cache performance analysis

[...]

Wen-Hann Wang¹, Jean-Loup Baer²•Institutions (2)

IBM¹, University of Washington²

01 Apr 1990

TL;DR: This work reduces the program traces to the extent that exact performance can still be obtained from the reduced traces and devise an algorithm that can produce performance results for a variety of metrics for a large number of set-association write-back caches in just a single simulation run.

...read moreread less

Abstract: We propose improvements to current trace-driven cache simulation methods to make them faster and more economical. We attack the large time and space demands of cache simulation in two ways. First, we reduce the program traces to the extent that exact performance can still be obtained from the reduced traces. Second, we devise an algorithm that can produce performance results for a variety of metrics (hit ratio, write-back counts, bus traffic) for a large number of set-associative write-back caches in just a single simulation run. The trace reduction and the efficient simulation techniques are extended to parallel multiprocessor cache simulations. Our simulation results show that our approach substantially reduces the disk space needed to store the program traces and can dramatically speedup cache simulations and still produce the exact results.

...read moreread less

Journal Article•DOI•

A 9-ns HIT-delay 32-kbyte cache macro for high-speed RISC

[...]

Kazutaka Nogami¹, Takayasu Sakurai¹, Kazuhiro Sawada¹, Kenji Sakaue¹, Yuichi Miyazawa¹, Sumio Tanaka¹, Yoichi Hiruta¹, K. Katoh¹, Toshinari Takayanagi¹, Tsukasa Shirotori¹, Y. Itoh¹, Masanori Uchida¹, Tetsuya Iizuka¹ - Show less +9 more•Institutions (1)

Toshiba¹

01 Feb 1990-IEEE Journal of Solid-state Circuits

TL;DR: A 32-kB cache macro with an experimental reduced instruction set computer (RISC) is realized and a pipelined cache access to realize a cycle time shorter than the cache access time is proposed.

...read moreread less

Abstract: A 32-kB cache macro with an experimental reduced instruction set computer (RISC) is realized. A pipelined cache access to realize a cycle time shorter than the cache access time is proposed. A double-word-line architecture combines single-port cells, dual-port cells, and CAM cells into a memory array to improve silicon area efficiency. The cache macro exhibits 9-ns typical clock-to-HIT delay as a result of several circuit techniques, such as a section word-line selector, a dual transfer gate, and 1.0- mu m CMOS technology. It supports multitask operation with logical addressing by a selective clear circuit. The RISC includes a double-word load/store instruction using a 64-b bus to fully utilize the on-chip cache macro. A test scheme allows measurement of the internal signal delay. The test device design is based on the unified design rules scalable through multigenerations of process technologies down to 0.8 mu m. >

...read moreread less

Patent•

A data processor for reloading deferred pushes in a copy-back data cache

[...]

Robin W. Edenfield¹, William B. Ledbetter¹•Institutions (1)

Motorola¹

26 Feb 1990

TL;DR: In this article, a data processor is provided for reloading deferred pushes in copy-back cache, which avoids the potential for multiple concurrent exception conditions, and eliminates the problem of unnecessarily removing an otherwise valid cache entry from the cache.

...read moreread less

Abstract: A data processor is provided for reloading deferred pushes in copy-back cache. When a cache "miss" occurs, a cache controller selects a cache line for replacement, and request a burst line read to transfer the required cache line from an external memory. When the date entries in the cache line selected for replacement are marked dirty, the cache controller "pushes" the cache line or dirty portions thereof into a buffer, which stores the cache line pending completion, by a bus interface controller, or the burst line read. When the burst line read terminates abnormally, due to a bus error or bus cache inhibit (or any other reason), the data cache controller reloads the copy-back cache with the cache line stored in the buffer. The reloading of the copy-back cache avoids the potential for multiple concurrent exception conditions, and eliminates the problem of unnecessarily removing an otherwise valid cache entry from the cache.

...read moreread less

Proceedings Article•DOI•

Data cache performance of supercomputer applications

[...]

David Callahan¹, Allan Porterfield¹•Institutions (1)

Tera Computer Company¹

01 Oct 1990

TL;DR: It is pointed out that aggressive compilers should be able to improve program performance by focusing on those array accesses that result in cache misses, and it was observed that the data caches contained the values for between 45% and 99% of the array access, depending on the cache and the program.

...read moreread less

Abstract: Processor speed has been increasing faster than mass memory speed. One method of matching a processor's speed to memory's is high-speed caches. This paper examines the data cache performance of a set of computationally intensive programs. Our interset in measuring cache performance arises from an interest in improving the performance of program during compilation. We observed that the data caches contained the values for between 45% and 99+% of the array accesses, depending on the cache and the program. The delays from the misses accounted for up to half of the total execution time of the program. The misses were grouped in a subset of source program references which resulted in misses on every access. Aggressive compilers should be able to improve program performance by focusing on those array accesses that result in cache misses.

...read moreread less

Patent•

Efficient cache write technique through deferred tag modification

[...]

Paul Mageau¹•Institutions (1)

Hewlett-Packard¹

22 Jun 1990

TL;DR: In this article, the authors propose a two-stage cache access pipeline which embellishes a simple "write-thru with write-allocate" cache write policy to achieve single cycle cache write access even when the processor cycle time does not allow sufficient time for the cache control to check the cache tag for validity and to reflect those results to the processor within the same processor cycle.

...read moreread less

Abstract: An efficient cache write technique useful in digital computer systems wherein it is desired to achieve single cycle cache write access even when the processor cycle time does not allow sufficient time for the cache control to check the cache "tag" for validity and to reflect those results to the processor within the same processor cycle. The novel method and apparatus comprising a two-stage cache access pipeline which embellishes a simple "write-thru with write-allocate" cache write policy.

...read moreread less

Proceedings Article•DOI•

Improving instruction cache behavior by reducing cache pollution

[...]

Rajiv Gupta¹, Chi-Hung Chi²•Institutions (2)

University of Pittsburgh¹, Philips²

01 Oct 1990

TL;DR: Compiler techniques for improving instruction cache performance through repositioning of the code in the main memory, leaving memory locations unused, code duplication, and code propagation, can be improved due to reduced cache pollution and fewer cache misses.

...read moreread less

Abstract: In this paper we describe compiler techniques for improving instruction cache performance. Through repositioning of the code in main memory, leaving memory locations unused, code duplication, and code propagation, the effectiveness of the cache can be improved due to reduced cache pollution and fewer cache misses. Results of experiments indicate that significant reduction in bus traffic results from the use of these techniques. Since memory bandwidth is a critical resource in shared memory multiprocessors, such systems can benefit from the techniques described. The notion of control dependence is used to decide when instructions belonging to different basic blocks can be allowed to share the same cache line without increasing cache pollution.

...read moreread less

Journal Article•DOI•

Trace-driven simulations for a two-level cache design in open bus systems

[...]

Håkon O. Bugge, Ernst H. Kristiansen, Bjørn O. Bakka

01 May 1990

TL;DR: This paper evaluates various metrics for data cache* designs, and discusses two open bus systems supporting a coherent memory model, Futurebus+ and SCI, as the interconnect system for main memory.

...read moreread less

Abstract: Two-level cache hierarchies will be a design issue in future high-performance CPUs. In this paper we evaluate various metrics for data cache* designs. We discuss both one- and two-level cache hierarchies. Our target is a new 100+ mips CPU, but the methods are applicable to any cache design. The basis of our work is a new trace-driven, multiprocess cache simulator. The simulator incorporates a simple priority-based scheduler which controls the execution of the processes. The scheduler blocks a process when a system call is executed. A workload consists of a total of 60 processes, distributed among seven unique programs with about nine instances each. We discuss two open bus systems supporting a coherent memory model, Futurebus+ and SCI, as the interconnect system for main memory.

...read moreread less

Patent•

Cache coherency method and apparatus for a multiple path interconnection network

[...]

Vernon K. Boland¹•Institutions (1)

NCR Corporation¹

24 Dec 1990

TL;DR: In this article, a method and apparatus for providing coherency for cache data in a multiple processor system with the processors distributed among multiple independent data paths is presented, which includes a set of cache monitors, sometimes called snoopers, associated with each cache memory.

...read moreread less

Abstract: A method and apparatus for providing coherency for cache data in a multiple processor system with the processors distributed among multiple independent data paths. The apparatus includes a set of cache monitors, sometimes called snoopers, associated with each cache memory. There are the same number of monitors as there are independent data paths. Thus, each cache stores cache tags that correspond to its currently encached data into each of the monitors of the set associated therewith. Thus, each cache has an monitor associated therewith which monitors each of the multiple paths for an operation at an address that corresponds to data stored in its cache. If such an access is detected by one of the set of monitors, the monitor notifies its cache so that appropriate action will be taken to ensure cache data coherency.

...read moreread less

Journal Article•DOI•

Lockup-free caches in high-performance multiprocessors

[...]

C. Scheurich¹, Michel Dubois¹•Institutions (1)

University of Southern California¹

01 Dec 1990-Journal of Parallel and Distributed Computing

TL;DR: This paper proposes a cache design in which the handling of one or several cache misses occurs concurrently with processor activity, and identifies system configurations for which concurrent miss resolution is effective.

...read moreread less

Journal Article•DOI•

Multiple-bus shared-memory system: Aquarius project

[...]

M. Carlton¹, A. Despain²•Institutions (2)

University of California, Berkeley¹, University of Southern California²

01 Jun 1990-IEEE Computer

TL;DR: A multiple-bus architecture called a multi-multi is presented, designed to handle several dimensions with a moderate number of processors per bus, and features of snooping cache schemes, with features of directory schemes, to provide consistency between buses.

...read moreread less

Abstract: A multiple-bus architecture called a multi-multi is presented. The architecture is designed to handle several dimensions with a moderate number of processors per bus. It provides scaling to a large number of processors in a system. A key characteristic of the architecture is the large amount of bandwidth it provides. Each node in the architecture contains a microprocessor, memory, and a cache. The cache-coherence protocol for the multi-multi architecture combines features of snooping cache schemes, to provide consistency on individual buses, with features of directory schemes, to provide consistency between buses. The snooping cache component can take advantage of the low-latency communication possible on shared buses for efficiency, yet the complete protocol will support many more processors than a single bus can. The resulting protocol naturally extends cache coherence from a multi to a multi-multi. Cache and directory states are described. Concepts that allow efficient performance, namely, local sharing, root node, and bus addresses in the directory, are discussed. >

...read moreread less

Journal Article•DOI•

Cache performance of the integer SPEC benchmarks on a RISC

[...]

Dionisios Pnevmatikatos¹, Mark D. Hill¹•Institutions (1)

University of Wisconsin-Madison¹

01 May 1990-ACM Sigarch Computer Architecture News

TL;DR: It is shown that the cache miss ratio of the Integer SPEC benchmarks depends strongly on the program, and that large caches are not completely exercised by these benchmarks.

...read moreread less

Abstract: SPEC is a new set of benchmark programs designed to measure a computer system's performance. The performance measured by benchmarks is strongly affected by the existence and configuration of cache memory. In this paper we evaluate the cache miss ratio of the Integer SPEC benchmarks. We show that the cache miss ratio depends strongly on the program, and that large caches are not completely exercised by these benchmarks.

...read moreread less

Patent•

Adaptive segment control and method for simulating a multi-segment cache

[...]

Hung Fan Kee F Seagate Technol¹•Institutions (1)

Seagate Technology¹

17 Aug 1990

TL;DR: In this article, an adaptive segment control for controlling performance of a multi-segment cache in a storage system is proposed. But, the adaptive control is limited to a single cache table.

...read moreread less

Abstract: An adaptive segment control for controlling performance of a multi-segment cache in a storage system. The adaptive segment control segments the cache to operate at a selected working segmentation level. A plurality of virtual cache tables are segmented such that one table operates at the working segmentation level and the other tables operate at different segmentation levels. During operation, the adaptive segment control monitors memory instructions transmitted by a host computer to the storage system and stores the instructions in an instruction queue. While the storage system is in an idle state, the adaptive segment control performs hit ratio simulations on the virtual cache tables by executing a selected number of instructions stored in the instruction queue. The working segmentation level is adjusted to equal the segmentation level of the virtual cache table having the highest hit ratio.

...read moreread less

Proceedings Article•DOI•

Disk cache performance for distributed systems

[...]

Dwight Makaroff¹, Derek L. Eager¹•Institutions (1)

University of Saskatchewan¹

28 May 1990

TL;DR: The influence of client and server cache sizes and the number of clients on caching performance is studied through trace-driven simulation and the results indicate that the locality of reference in disk block reference patterns allows relatively small caches to reduce significantlyThe number of disk accesses required.

...read moreread less

Abstract: The influence of client and server cache sizes and the number of clients on caching performance is studied through trace-driven simulation. The results indicate that the locality of reference in disk block reference patterns allows relatively small caches to reduce significantly the number of disk accesses required. File server cache performance is significantly different from client cache performance owing to the capture of disk block references by the client caches. The major factor influencing overall miss ratio statistics (actual disk reference frequencies) is found to be the maximum of the server cache size and the size of client caches. >

...read moreread less

Patent•

Re-configurable block length cache

[...]

Paul G. Schnizlein¹, Donald M. Walters•Institutions (1)

Advanced Micro Devices¹

23 Mar 1990

TL;DR: A cache organizational signal ("CORG signal") selects between cache organizations as mentioned in this paper, which is chosen according to the speed of the main memory that is paired with the cache to handle different size blocks of instructions.

...read moreread less

Abstract: A cache organizational signal ("CORG signal") selects between cache organizations. A cache organization is chosen according to the speed of the main memory that is paired with the cache to handle different size blocks of instructions. When the CORG signal organizes the cache to handle blocks having few instructions per block, many blocks are present and a higher hit rate occurs, which works well with a fast main memory. When the CORG signal organizes the cache to handle blocks having more instructions per block, fewer blocks are present, a lower hit rate occurs, and processor idle cycles decrease, which works well with a slower main memory.

...read moreread less

Patent•

Partitioned cache memory management.

[...]

Richard Lewis Mattson¹•Institutions (1)

IBM¹

26 Sep 1990

TL;DR: In this article, the cache is partitioned into a global sub-cache and k local sub-caches, with data belonging to one of the categories being pushed from the global subcaches to a corresponding subcache, and additionally pushing data from the sub caches into the storage device.

...read moreread less

Abstract: Improved performance of the storage system cache memory can be obtained by using a partitioned cache, in which storage data is classified into k categories using a pre-specified scheme of classification, and the cache is partitioned into a global sub-cache and k local sub-caches. When the data is required by the processor, data can be staged from the storage device, or the local sub-caches, to the global sub-cache, with data belonging to one of the categories being pushed from the global sub-cache to a corresponding one of the sub-caches, and additionally pushing data from the sub-caches into the storage device.

...read moreread less