
Showing papers on "Cache algorithms published in 1984"


Proceedings ArticleDOI
01 Jan 1984
TL;DR: This paper presents a cache coherence solution for multiprocessors organized around a single time-shared bus; the solution aims at reducing bus traffic and hence bus wait time, which in turn increases overall processor utilization.
Abstract: This paper presents a cache coherence solution for multiprocessors organized around a single time-shared bus. The solution aims at reducing bus traffic and hence bus wait time. This in turn increases the overall processor utilization. Unlike most traditional high-performance coherence solutions, this solution does not use any global tables. Furthermore, this coherence scheme is modular and easily extensible, requiring no modification of cache modules to add more processors to a system. The performance of this scheme is evaluated by using an approximate analysis method. It is shown that the performance of this scheme is closely tied with the miss ratio and the amount of sharing between processors.
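The excerpt above does not reproduce the scheme itself, but a minimal sketch of a bus-snooping invalidation cache in the same general spirit (per-cache state only, no global table, every cache watching the shared bus) may help illustrate the idea. Class, state, and operation names below are assumptions for illustration, not the paper's exact protocol.

```python
# Minimal bus-snooping invalidation sketch (an illustration of the general
# approach, not the paper's exact protocol): each cache keeps only local
# per-block state and watches traffic on the shared bus -- no global table.
INVALID, VALID, DIRTY = "I", "V", "D"

class Bus:
    def __init__(self):
        self.caches = []

    def broadcast(self, op, addr, source):
        for cache in self.caches:
            if cache is not source:
                cache.snoop(op, addr)

class SnoopingCache:
    def __init__(self, bus):
        self.state = {}            # block address -> local state
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if self.state.get(addr, INVALID) == INVALID:
            self.bus.broadcast("read_miss", addr, self)   # block arrives over the bus
            self.state[addr] = VALID

    def write(self, addr):
        if self.state.get(addr, INVALID) != DIRTY:
            self.bus.broadcast("invalidate", addr, self)  # other copies must be dropped
        self.state[addr] = DIRTY

    def snoop(self, op, addr):
        if op == "invalidate" and addr in self.state:
            self.state[addr] = INVALID
        elif op == "read_miss" and self.state.get(addr) == DIRTY:
            self.state[addr] = VALID                      # supply data, keep a clean copy
```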

531 citations


Journal ArticleDOI
01 Jan 1984
TL;DR: This work reviews and qualitatively evaluates schemes to maintain cache coherence in tightly-coupled multiprocessor systems and proposes a more economical, expandable and modular variation of the “global directory” approach.
Abstract: In this paper we review and qualitatively evaluate schemes to maintain cache coherence in tightly-coupled multiprocessor systems. This leads us to propose a more economical (hardware-wise), expandable and modular variation of the “global directory” approach. Protocols for this solution are described. Performance evaluation studies indicate the limits (number of processors, level of sharing) within which this approach is viable.

127 citations


Patent
Richard E. Matick1, Daniel T. Ling1
01 Jun 1984
TL;DR: In this paper, a distributed cache is achieved by the use of communicating random access memory chips of the type incorporating a primary port (10) and a secondary port (14), which can run totally independently of each other.
Abstract: The cache reload time in small computer systems is improved by using a distributed cache located on the memory chips. The large bandwidth between the main memory and cache is provided by the usual on-chip interconnecting lines, which avoids pin input/output problems. This distributed cache is achieved by the use of communicating random access memory chips of the type incorporating a primary port (10) and a secondary port (14). Ideally, the primary and secondary ports can run totally independently of each other. The primary port functions as in a typical dynamic random access memory and is the usual input/output path for the memory chips. The secondary port, which provides the distributed cache, makes use of a separate master/slave row buffer (15) which is normally isolated from the sense amplifier/latches. Once this master/slave row buffer is loaded, it can be accessed very fast, and the large bandwidth between the main memory array and the on-chip row buffer provides a very fast reload time for a cache miss.
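As a rough illustration of the reload path described above, the sketch below models a memory chip whose row buffer is filled from the array in one wide internal transfer and then serves accesses to that row; sizes and the single-buffer organization are assumptions, not the patent's design.

```python
# Rough model of the on-chip row buffer used as a distributed cache: a miss
# on the buffered row reloads the whole buffer in one wide internal transfer,
# after which words in that row are served from the buffer.
class RowBufferChip:
    def __init__(self, rows=256, words_per_row=64):
        self.array = [[0] * words_per_row for _ in range(rows)]  # main memory array
        self.words_per_row = words_per_row
        self.buffered_row = None          # which row the master/slave buffer holds
        self.row_buffer = []

    def read(self, addr):
        row, col = divmod(addr, self.words_per_row)
        if row != self.buffered_row:                  # "cache miss": reload the buffer
            self.row_buffer = list(self.array[row])   # one wide array-to-buffer transfer
            self.buffered_row = row
        return self.row_buffer[col]                   # fast access from the buffer
```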

100 citations


Journal ArticleDOI
01 Jan 1984
TL;DR: This paper uses trace driven simulation to study design tradeoffs for small (on-chip) caches, and finds that general purpose caches of 64 bytes (net size) are marginally useful in some cases, while 1024-byte caches perform fairly well.
Abstract: Advances in integrated circuit density are permitting the implementation on a single chip of functions and performance enhancements beyond those of a basic processor. One performance enhancement of proven value is a cache memory; placing a cache on the processor chip can reduce both mean memory access time and bus traffic. In this paper we use trace driven simulation to study design tradeoffs for small (on-chip) caches. Miss ratio and traffic ratio (bus traffic) are the metrics for cache performance. Particular attention is paid to sub-block caches (also known as sector caches), in which address tags are associated with blocks, each of which contains multiple sub-blocks; sub-blocks are the transfer unit. Using traces from two 16-bit architectures (Z8000, PDP-11) and two 32-bit architectures (VAX-11, System/370), we find that general purpose caches of 64 bytes (net size) are marginally useful in some cases, while 1024-byte caches perform fairly well; typical miss and traffic ratios for a 1024-byte (net size) cache, 4-way set-associative with 8-byte blocks, are: PDP-11: .039, .156; Z8000: .015, .060; VAX-11: .080, .160; Sys/370: .244, .489. (These figures are based on traces of user programs and the performance obtained in practice is likely to be less good.) The use of sub-blocks allows tradeoffs between miss ratio and traffic ratio for a given cache size. Load forward is quite useful. Extensive simulation results are presented.
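A minimal trace-driven sketch of a sub-block (sector) cache of the kind studied may make the two metrics concrete: one address tag per block, a valid bit per sub-block, and the sub-block as the transfer unit. The sizes, the 4-way LRU organization, and the traffic measure (sub-blocks fetched per reference) are illustrative assumptions, not a reproduction of the paper's simulator.

```python
# Trace-driven sketch of a sub-block (sector) cache: one address tag per
# block, a valid bit per sub-block, sub-blocks as the transfer unit.
class SectorCache:
    def __init__(self, num_sets=32, ways=4, block_bytes=8, sub_bytes=4):
        self.num_sets, self.ways = num_sets, ways
        self.block_bytes, self.sub_bytes = block_bytes, sub_bytes
        # each set is an LRU-ordered list of (tag, set of valid sub-block indices)
        self.sets = [[] for _ in range(num_sets)]
        self.refs = self.misses = self.subs_fetched = 0

    def access(self, addr):
        self.refs += 1
        block = addr // self.block_bytes
        sub = (addr % self.block_bytes) // self.sub_bytes
        index, tag = block % self.num_sets, block // self.num_sets
        way_list = self.sets[index]
        for i, (t, valid_subs) in enumerate(way_list):
            if t == tag:
                way_list.insert(0, way_list.pop(i))       # move block to MRU position
                if sub not in valid_subs:                 # tag hit but sub-block missing
                    self.misses += 1
                    self.subs_fetched += 1
                    valid_subs.add(sub)
                return
        self.misses += 1                                  # tag miss: allocate the block,
        self.subs_fetched += 1                            # fetch only the needed sub-block
        way_list.insert(0, (tag, {sub}))
        if len(way_list) > self.ways:
            way_list.pop()                                # evict the LRU block

    def ratios(self):
        """Return (miss ratio, sub-blocks fetched per reference)."""
        return self.misses / self.refs, self.subs_fetched / self.refs
```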

99 citations


Patent
27 Sep 1984
TL;DR: In this paper, a cache memory unit is constructed to have a two-stage pipeline shareable by a plurality of sources which include two independently operated central processing units (CPUs).
Abstract: A cache memory unit is constructed to have a two-stage pipeline shareable by a plurality of sources which include two independently operated central processing units (CPUs). Apparatus included within the cache memory unit operates to allocate alternate time slots to the two CPUs which offset their operations by a pipeline stage. This permits one pipeline stage of the cache memory unit to perform a directory search for one CPU while the other pipeline stage performs a data buffer read for the other CPU. Each CPU is programmed to use less than all of the time slots allocated to it. Thus, the processing units operate conflict-free while pipeline stages are freed up for processing requests from other sources, such as replacement data from main memory or cache updates.
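A toy model of the time-slot allocation may clarify the overlap: in each cycle one CPU's request enters the directory-search stage while the other CPU's request, issued a cycle earlier, occupies the data-buffer stage. The request representation and cycle accounting below are assumptions, not the patent's implementation.

```python
# Toy model of two CPUs sharing a two-stage cache pipeline on alternating
# time slots: while one CPU's request is in the directory-search stage, the
# other's (issued one cycle earlier) is in the data-buffer-read stage.
from collections import deque

def run_pipeline(cpu0_requests, cpu1_requests, cycles):
    queues = [deque(cpu0_requests), deque(cpu1_requests)]
    search_stage = buffer_stage = None       # (cpu, request) occupying each stage
    completed = []
    for cycle in range(cycles):
        cpu = cycle % 2                      # even slots -> CPU 0, odd slots -> CPU 1
        if buffer_stage is not None:
            completed.append((cycle, buffer_stage))       # finished data-buffer read
        buffer_stage = search_stage                       # advance the pipeline
        if queues[cpu]:
            search_stage = (cpu, queues[cpu].popleft())   # start a directory search
        else:
            search_stage = None              # unused slot, free for other sources
    return completed

# The two request streams proceed without ever competing for a pipeline stage.
print(run_pipeline(["a", "b"], ["x", "y"], cycles=8))
```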

80 citations


Patent
11 Apr 1984
TL;DR: In this paper, the cache memory is implemented in two memory parts (301, 302) as a two-way interleaved two-way set-associative memory, where one memory part implements odd words of one cache set and even words of the other cache set.
Abstract: In a processing system (10) comprising a main memory (102) for storing blocks (150) of four contiguous words (160) of information, a cache memory (101) for storing selected ones of the blocks, and a two-word wide bus (110) for transferring words from the main memory to the cache, the cache memory is implemented in two memory parts (301, 302) as a two-way interleaved two-way set-associative memory. One memory part implements odd words of one cache set (0), and even words of the other cache set (1), while the other memory part implements even words of the one cache set and odd words of the other cache set. Storage locations (303) of the memory parts are grouped into at least four levels (204) with each level having a location from each of the memory parts and each of the cache sets. The cache receives a block over the bus in two pairs of contiguous words. The cache memory is updated with both words of a word pair simultaneously. The pairs of words are each stored simultaneously into locations of one of either of the cache sets, each word into a location of a different memory part and of a different level. Cache hit check is performed on all locations of a level simultaneously. In parallel with the hit check, all locations of the checked level are also accessed.
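The odd/even interleaving can be captured in a one-line mapping. The sketch below, with assumed part numbering, checks the stated property: the two words of any contiguous pair always land in different memory parts, whichever cache set the block goes to.

```python
# Sketch of the odd/even interleaving described above: memory part 0 holds
# odd words of cache set 0 and even words of cache set 1; part 1 holds the
# complement, so the two words of a contiguous pair always fall in different
# parts and can be written in the same cycle.  Part numbering is assumed.
def memory_part(cache_set, word_in_block):
    """Return which of the two memory parts (0 or 1) holds this word."""
    odd_word = word_in_block % 2
    return odd_word ^ cache_set ^ 1      # part 0: odd words of set 0, even words of set 1

# Any contiguous word pair of a 4-word block maps to both parts,
# for either cache set the block might be stored in.
for cache_set in (0, 1):
    for first_word in (0, 2):            # word pairs (0, 1) and (2, 3)
        parts = {memory_part(cache_set, first_word),
                 memory_part(cache_set, first_word + 1)}
        assert parts == {0, 1}
```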

58 citations


Patent
02 Nov 1984
TL;DR: In this article, a redundant error-detecting addressing code for use in a cache memory is presented, where the blocks are expanded to include redundant addressing information such as the logical data address and the physical cache address.
Abstract: A redundant error-detecting addressing code for use in a cache memory. A directory converts logical data addresses to physical addresses in the cache where the data is stored in blocks. The blocks are expanded to include redundant addressing information such as the logical data address and the physical cache address. When a block is accessed from the cache, the redundant addressing is compared to the directory addressing information to confirm that the correct data has been accessed.
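A minimal sketch of the check (field names assumed for illustration): the block carries redundant copies of the logical address and the cache address it was stored under, and each access verifies both against the directory's mapping.

```python
# Sketch of the redundant-addressing check: each cached block stores the
# logical data address and the physical cache address it was written under;
# on access these are compared against the directory, catching directory or
# addressing faults.  Field and structure names are assumptions.
from dataclasses import dataclass

@dataclass
class CacheBlock:
    data: bytes
    stored_logical_addr: int      # redundant copy of the logical data address
    stored_cache_addr: int        # redundant copy of the physical cache address

def checked_read(directory, cache, logical_addr):
    cache_addr = directory[logical_addr]          # directory: logical -> cache slot
    block = cache[cache_addr]
    if (block.stored_logical_addr != logical_addr or
            block.stored_cache_addr != cache_addr):
        raise RuntimeError("addressing error: wrong block accessed")
    return block.data
```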

29 citations


Journal ArticleDOI
01 Jan 1984
TL;DR: The Static Column RAM devices recently introduced offer the potential for implementing a direct-mapped cache on-chip with only a small increase in complexity over that needed for a conventional dynamic RAM memory system.
Abstract: The Static Column RAM devices recently introduced offer the potential for implementing a direct-mapped cache on-chip with only a small increase in complexity over that needed for a conventional dynamic RAM memory system. Trace-driven simulation shows that such a cache can only be marginally effective if used in the obvious way. However, it can be effective in satisfying the requests from a processor containing an on-chip cache. The SCRAM cache is more effective if the processor cache handles both instructions and data.

25 citations


Patent
06 Jul 1984
TL;DR: In this article, the authors describe a cache-based multiple processor computer system, where data operated upon by any one of the processor units is stored in the cache memory associated with that processor unit.
Abstract: A multiple processor computer system features a store-into cache arrangement wherein each processor unit of the system has its own unique cache memory unit. Data operated upon by any one of the processor units is stored in the cache memory associated with that processor unit. When a thus modified block of data is required by another one of the processor units, the requested data is transferred directly to the requesting processor unit without having to first transfer the data to a shared main memory. Provision is also made for transferring data, under prescribed conditions, from a cache to the main memory, but not as a precondition for transfer to a requesting processor.
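A small sketch of the miss-service path under such a store-into arrangement (class and method names are assumptions): a modified copy in another processor's cache is supplied directly to the requester, without first being written to the shared main memory.

```python
# Sketch of a store-into (write-back) arrangement with direct cache-to-cache
# transfer of modified blocks; main memory is bypassed on such transfers.
class WriteBackCache:
    def __init__(self):
        self.blocks = {}                        # addr -> (data, modified flag)

    def holds_modified(self, addr):
        return addr in self.blocks and self.blocks[addr][1]

    def install(self, addr, data, modified=False):
        self.blocks[addr] = (data, modified)

def serve_miss(requester, all_caches, main_memory, addr):
    """Serve a block miss, preferring a cache-to-cache transfer of a modified copy."""
    for cache in all_caches:
        if cache is not requester and cache.holds_modified(addr):
            data = cache.blocks[addr][0]        # transfer directly from the other cache
            requester.install(addr, data)       # main memory is not updated here
            return data
    data = main_memory.get(addr)                # no modified copy elsewhere
    requester.install(addr, data)
    return data
```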

20 citations


Book ChapterDOI
01 Nov 1984
TL;DR: This paper first defines and describes a highly parallel external data handling system and then shows how the capabilities of the system can be used to implement a high performance relational data base machine.
Abstract: This paper first defines and describes a highly parallel external data handling system and then shows how the capabilities of the system can be used to implement a high performance relational data base machine. The elements of the system architecture are an interconnection network, which implements both packet routing and circuit switching as well as data organization functions such as indexing and sort merge, and an intelligent memory unit with a self-managing cache, which implements associative search and capabilities for application of filtering operations on data streaming to and from storage.

11 citations


01 Jan 1984
TL;DR: This paper looks at each component of the memory hierarchy and addresses two issues: what are likely directions for development, and what are the interesting research problems.
Abstract: The effective and efficient use of the memory hierarchy of the computer system is one of the most important aspects of computer system design and use, if not the single most important. Cache memory performance is often the limiting factor in CPU performance, and cache memories also serve to cut the memory traffic in multiprocessor systems. Multiprocessor systems are also requiring advances in cache architecture with respect to cache consistency. Similarly, the study of the best means to share main memory is an important research topic. Disk cache is becoming important for performance in high end computer systems and is now widely available commercially; there are many related research problems. The development of mass storage, especially optical disk, will promote research in effective algorithms for file management and migration. In this paper, we look at each component of the memory hierarchy and address two issues: what are likely directions for development, and what are the interesting research problems.

Patent
19 Mar 1984
TL;DR: In this paper, a controller is proposed for communication between the auxiliary processor and a cache mechanism in the system interface; this communication is carried on independently of the main memory accesses required to update the cache mechanism, in an overlapped manner.
Abstract: A controller for communication between the auxiliary processor and a cache mechanism in the system interface, which communication is to be carried on independently of main memory accesses required to update the cache mechanism in an overlapped manner.

Patent
25 May 1984
TL;DR: In this article, the authors propose a method for Direct Access Storage Device (DASD) cache management that reduces the volume of data transfer between DASD (27, 29, 53) and cache (101) while avoiding the complexity of managing variable length records in the cache.
Abstract: A method for Direct Access Storage Device (DASD) cache management that reduces the volume of data transfer between DASD (27, 29, 53) and cache (101) while avoiding the complexity of managing variable length records in the cache. This is achieved by always choosing the starting point for staging a record to be at the start of the missing record and, at the same time, allocating and managing cache space in fixed length blocks. The method steps require staging records, starting with the requested record and continuing until either the cache block is full, the end of track is reached, or a record already in the cache is encountered.
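The staging rule lends itself to a direct sketch (the record and track representation is assumed): start at the requested record and stop at a full cache block, end of track, or a record already resident in the cache.

```python
# Sketch of the staging rule described above: on a miss, stage records
# starting at the requested record and stop when the fixed-size cache block
# is full, the end of the track is reached, or a record already resident in
# the cache is encountered.
def stage_on_miss(track_records, start_index, cache_resident, block_capacity):
    """track_records: list of (record_id, length); returns record ids staged."""
    staged, used = [], 0
    for record_id, length in track_records[start_index:]:
        if record_id in cache_resident:      # record already in cache: stop
            break
        if used + length > block_capacity:   # fixed-length cache block is full
            break
        staged.append(record_id)
        used += length
        cache_resident.add(record_id)
    return staged                            # the loop also ends at end of track
```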

01 Jan 1984
TL;DR: The design and single-chip implementation of a small data cache memory and associated controllers that implement an ownership-based cache consistency protocol are described, along with the evaluation of the protocol.
Abstract: We describe the design and single chip implementation of a small data cache memory and associated controllers. The chip can be used as a building block of a multiprocessor system, positioned between the main memory bus and an individual processor. It implements an ownership-based cache consistency protocol. The chip has been designed to be interfaced to the MultiBus system bus and the Motorola 68000 processor. In this paper, we present our cache consistency protocol and its evaluation, and the chip architecture, design decisions, and implementation details.

Patent
09 May 1984
TL;DR: In this article, specific flags that are set at various times during a cache-to-cache transfer will, in the event of an error, enable the system to identify the status of each cache involved in the transfer.
Abstract: In a multiprocessing system, specific flags that are set at various times during a cache-to-cache transfer will, in the event of an error, enable the system to identify the status of each cache involved in the transfer. Different procedures are utilized for the cache from which data was being fetched and the cache to which data was being stored, as well as the main memory. If a castout was in process, then main memory may not have been updated and may contain obsolete data. In that case, the mechanism will force an uncorrectable error into the target location to insure that obsolete data is not subsequently utilized by the system.
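The recovery decision can be sketched roughly as follows; the flag names and the poisoned-location set are assumptions for illustration, not the patent's actual mechanism.

```python
# Sketch of the recovery rule: flags recorded during a cache-to-cache transfer
# tell the error handler whether a castout was in flight and main memory may
# therefore hold obsolete data; if so, the target location is poisoned with a
# forced uncorrectable error so the stale copy cannot be used later.
def handle_transfer_error(flags, poisoned_locations, target_addr):
    """flags: booleans recorded during the transfer (names are assumed)."""
    if flags.get("castout_in_process") and not flags.get("memory_updated"):
        poisoned_locations.add(target_addr)   # force an uncorrectable error at the target
        return "main memory may be obsolete: target location poisoned"
    return "main memory consistent: transfer can be retried"
```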


Journal ArticleDOI
TL;DR: Limited, though encouraging, results are presented which show that this new algorithm can be at least as effective as the Critical LRU algorithm, even when the memory management policy is LRU itself, and can also be at least as effective as Critical Working Set, even when the memory management policy is the working set policy.

Journal ArticleDOI
TL;DR: Results show that 50–150 cells in the cache memory are sufficient to obtain reasonable performance; a possible emulator architecture and its instruction stream processing principles are also presented.

01 Feb 1984
TL;DR: The A900 computer provides approximately three times the performance of a previous HP 1000 computer while maintaining full software compatibility with other HP 1000 A-series machines.
Abstract: The A900 computer provides approximately three times the performance of a previous HP 1000 computer while maintaining full software compatibility with other HP 1000 A-series machines. The cost has been kept low by not using emitter-coupled logic. The A900 makes use of a pipelined data path and a cache memory. Why it uses them is described by the author.

01 Aug 1984
TL;DR: A cache coherence solution is proposed for a two-level cache organization for multiprocessors, and the performance of the proposed multiprocessor is evaluated with analytical methods.
Abstract: This thesis proposes a two-level cache organization for multiprocessors. The first level of cache consists of a private cache per processor. The second level of cache is shared by all processors. The main memory is also similarly shared. A cache coherence solution is proposed for such an organization. The performance of the proposed multiprocessor is evaluated with analytical methods. The factors that affect the performance are quantitatively discussed. A variation of the proposed coherence algorithm is presented to improve the performance. Keywords: High reliability; Cache memories; Mathematical analysis. (Author)
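As a flavor of this kind of analytical evaluation, a simple effective-access-time calculation for a two-level hierarchy might look like the following; the formula and the numbers are illustrative assumptions, not the thesis's model.

```python
# Illustrative two-level calculation: a private first-level cache per
# processor, a shared second-level cache, and shared main memory.
def effective_access_time(t_l1, t_l2, t_mem, h_l1, h_l2):
    """h_l2 is the hit ratio of the shared cache for requests that miss in L1."""
    l2_penalty = t_l2 + (1.0 - h_l2) * t_mem    # extra time paid on a first-level miss
    return t_l1 + (1.0 - h_l1) * l2_penalty

# Example: 1-cycle private cache, 5-cycle shared cache, 30-cycle main memory.
print(effective_access_time(t_l1=1, t_l2=5, t_mem=30, h_l1=0.90, h_l2=0.80))
```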