
Showing papers on "Cache algorithms published in 1984"


Proceedings ArticleDOI
01 Jan 1984
TL;DR: This paper presents a cache coherence solution for multiprocessors organized around a single time-shared bus; the solution aims at reducing bus traffic and hence bus wait time, which in turn increases overall processor utilization.
Abstract: This paper presents a cache coherence solution for multiprocessors organized around a single time-shared bus. The solution aims at reducing bus traffic and hence bus wait time. This in turn increases the overall processor utilization. Unlike most traditional high-performance coherence solutions, this solution does not use any global tables. Furthermore, this coherence scheme is modular and easily extensible, requiring no modification of cache modules to add more processors to a system. The performance of this scheme is evaluated by using an approximate analysis method. It is shown that the performance of this scheme is closely tied with the miss ratio and the amount of sharing between processors.
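The excerpt above does not reproduce the scheme itself, but a minimal sketch of a bus-snooping invalidation cache in the same general spirit (per-cache state only, no global table, every cache watching the shared bus) may help illustrate the idea. Class, state, and operation names below are assumptions for illustration, not the paper's exact protocol.

```python
# Minimal bus-snooping invalidation sketch (an illustration of the general
# approach, not the paper's exact protocol): each cache keeps only local
# per-block state and watches traffic on the shared bus -- no global table.
INVALID, VALID, DIRTY = "I", "V", "D"

class Bus:
    def __init__(self):
        self.caches = []

    def broadcast(self, op, addr, source):
        for cache in self.caches:
            if cache is not source:
                cache.snoop(op, addr)

class SnoopingCache:
    def __init__(self, bus):
        self.state = {}            # block address -> local state
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        if self.state.get(addr, INVALID) == INVALID:
            self.bus.broadcast("read_miss", addr, self)   # block arrives over the bus
            self.state[addr] = VALID

    def write(self, addr):
        if self.state.get(addr, INVALID) != DIRTY:
            self.bus.broadcast("invalidate", addr, self)  # other copies must be dropped
        self.state[addr] = DIRTY

    def snoop(self, op, addr):
        if op == "invalidate" and addr in self.state:
            self.state[addr] = INVALID
        elif op == "read_miss" and self.state.get(addr) == DIRTY:
            self.state[addr] = VALID                      # supply data, keep a clean copy
```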

531 citations


Journal ArticleDOI
01 Jan 1984
TL;DR: This work reviews and qualitatively evaluates schemes to maintain cache coherence in tightly-coupled multiprocessor systems and proposes a more economical, expandable and modular variation of the “global directory” approach.
Abstract: In this paper we review and qualitatively evaluate schemes to maintain cache coherence in tightly-coupled multiprocessor systems. This leads us to propose a more economical (hardware-wise), expandable and modular variation of the “global directory” approach. Protocols for this solution are described. Performance evaluation studies indicate the limits (number of processors, level of sharing) within which this approach is viable.

127 citations


Patent
Richard E. Matick1, Daniel T. Ling1
01 Jun 1984
TL;DR: In this paper, a distributed cache is achieved by the use of communicating random access memory chips of the type incorporating a primary port (10) and a secondary port (14), which can run totally independently of each other.
Abstract: The cache reload time in small computer systems is improved by using a distributed cache located on the memory chips. The large bandwidth between the main memory and cache is provided by the usual on-chip interconnecting lines, which avoids pin input/output problems. This distributed cache is achieved by the use of communicating random access memory chips of the type incorporating a primary port (10) and a secondary port (14). Ideally, the primary and secondary ports can run totally independently of each other. The primary port functions as in a typical dynamic random access memory and is the usual input/output path for the memory chips. The secondary port, which provides the distributed cache, makes use of a separate master/slave row buffer (15) which is normally isolated from the sense amplifier/latches. Once this master/slave row buffer is loaded, it can be accessed very fast, and the large bandwidth between the main memory array and the on-chip row buffer provides a very fast reload time for a cache miss.
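As a rough illustration of the reload path described above, the sketch below models a memory chip whose row buffer is filled from the array in one wide internal transfer and then serves accesses to that row; sizes and the single-buffer organization are assumptions, not the patent's design.

```python
# Rough model of the on-chip row buffer used as a distributed cache: a miss
# on the buffered row reloads the whole buffer in one wide internal transfer,
# after which words in that row are served from the buffer.
class RowBufferChip:
    def __init__(self, rows=256, words_per_row=64):
        self.array = [[0] * words_per_row for _ in range(rows)]  # main memory array
        self.words_per_row = words_per_row
        self.buffered_row = None          # which row the master/slave buffer holds
        self.row_buffer = []

    def read(self, addr):
        row, col = divmod(addr, self.words_per_row)
        if row != self.buffered_row:                  # "cache miss": reload the buffer
            self.row_buffer = list(self.array[row])   # one wide array-to-buffer transfer
            self.buffered_row = row
        return self.row_buffer[col]                   # fast access from the buffer
```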

100 citations


Journal ArticleDOI
01 Jan 1984
TL;DR: This paper uses trace driven simulation to study design tradeoffs for small (on-chip) caches, and finds that general purpose caches of 64 bytes (net size) are marginally useful in some cases, while 1024-byte caches perform fairly well.
Abstract: Advances in integrated circuit density are permitting the implementation on a single chip of functions and performance enhancements beyond those of a basic processor. One performance enhancement of proven value is a cache memory; placing a cache on the processor chip can reduce both mean memory access time and bus traffic. In this paper we use trace driven simulation to study design tradeoffs for small (on-chip) caches. Miss ratio and traffic ratio (bus traffic) are the metrics for cache performance. Particular attention is paid to sub-block caches (also known as sector caches), in which address tags are associated with blocks, each of which contains multiple sub-blocks; sub-blocks are the transfer unit. Using traces from two 16-bit architectures (Z8000, PDP-11) and two 32-bit architectures (VAX-11, System/370), we find that general purpose caches of 64 bytes (net size) are marginally useful in some cases, while 1024-byte caches perform fairly well; typical miss and traffic ratios for a 1024-byte (net size) cache, 4-way set-associative with 8-byte blocks, are: PDP-11: .039, .156; Z8000: .015, .060; VAX-11: .080, .160; Sys/370: .244, .489. (These figures are based on traces of user programs and the performance obtained in practice is likely to be less good.) The use of sub-blocks allows tradeoffs between miss ratio and traffic ratio for a given cache size. Load forward is quite useful. Extensive simulation results are presented.
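A minimal trace-driven sketch of a sub-block (sector) cache of the kind studied may make the two metrics concrete: one address tag per block, a valid bit per sub-block, and the sub-block as the transfer unit. The sizes, the 4-way LRU organization, and the traffic measure (sub-blocks fetched per reference) are illustrative assumptions, not a reproduction of the paper's simulator.

```python
# Trace-driven sketch of a sub-block (sector) cache: one address tag per
# block, a valid bit per sub-block, sub-blocks as the transfer unit.
class SectorCache:
    def __init__(self, num_sets=32, ways=4, block_bytes=8, sub_bytes=4):
        self.num_sets, self.ways = num_sets, ways
        self.block_bytes, self.sub_bytes = block_bytes, sub_bytes
        # each set is an LRU-ordered list of (tag, set of valid sub-block indices)
        self.sets = [[] for _ in range(num_sets)]
        self.refs = self.misses = self.subs_fetched = 0

    def access(self, addr):
        self.refs += 1
        block = addr // self.block_bytes
        sub = (addr % self.block_bytes) // self.sub_bytes
        index, tag = block % self.num_sets, block // self.num_sets
        way_list = self.sets[index]
        for i, (t, valid_subs) in enumerate(way_list):
            if t == tag:
                way_list.insert(0, way_list.pop(i))       # move block to MRU position
                if sub not in valid_subs:                 # tag hit but sub-block missing
                    self.misses += 1
                    self.subs_fetched += 1
                    valid_subs.add(sub)
                return
        self.misses += 1                                  # tag miss: allocate the block,
        self.subs_fetched += 1                            # fetch only the needed sub-block
        way_list.insert(0, (tag, {sub}))
        if len(way_list) > self.ways:
            way_list.pop()                                # evict the LRU block

    def ratios(self):
        """Return (miss ratio, sub-blocks fetched per reference)."""
        return self.misses / self.refs, self.subs_fetched / self.refs
```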

99 citations


Patent
27 Sep 1984
TL;DR: In this paper, a cache memory unit is constructed to have a two-stage pipeline shareable by a plurality of sources which include two independently operated central processing units (CPUs).
Abstract: A cache memory unit is constructed to have a two-stage pipeline shareable by a plurality of sources which include two independently operated central processing units (CPUs). Apparatus included within the cache memory unit operates to allocate alternate time slots to the two CPUs which offset their operations by a pipeline stage. This permits one pipeline stage of the cache memory unit to perform a directory search for one CPU while the other pipeline stage performs a data buffer read for the other CPU. Each CPU is programmed to use less than all of the time slots allocated to it. Thus, the processing units operate conflict-free while pipeline stages are freed up for processing requests from other sources, such as replacement data from main memory or cache updates.
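A toy model of the time-slot allocation may clarify the overlap: in each cycle one CPU's request enters the directory-search stage while the other CPU's request, issued a cycle earlier, occupies the data-buffer stage. The request representation and cycle accounting below are assumptions, not the patent's implementation.

```python
# Toy model of two CPUs sharing a two-stage cache pipeline on alternating
# time slots: while one CPU's request is in the directory-search stage, the
# other's (issued one cycle earlier) is in the data-buffer-read stage.
from collections import deque

def run_pipeline(cpu0_requests, cpu1_requests, cycles):
    queues = [deque(cpu0_requests), deque(cpu1_requests)]
    search_stage = buffer_stage = None       # (cpu, request) occupying each stage
    completed = []
    for cycle in range(cycles):
        cpu = cycle % 2                      # even slots -> CPU 0, odd slots -> CPU 1
        if buffer_stage is not None:
            completed.append((cycle, buffer_stage))       # finished data-buffer read
        buffer_stage = search_stage                       # advance the pipeline
        if queues[cpu]:
            search_stage = (cpu, queues[cpu].popleft())   # start a directory search
        else:
            search_stage = None              # unused slot, free for other sources
    return completed

# The two request streams proceed without ever competing for a pipeline stage.
print(run_pipeline(["a", "b"], ["x", "y"], cycles=8))
```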

80 citations


Patent
11 Apr 1984
TL;DR: In this paper, the cache memory is implemented in two memory parts (301, 302) as a two-way interleaved two-way set-associative memory, where one memory part implements odd words of one cache set and even words of the other cache set.
Abstract: In a processing system (10) comprising a main memory (102) for storing blocks (150) of four contiguous words (160) of information, a cache memory (101) for storing selected ones of the blocks, and a two-word wide bus (110) for transferring words from the main memory to the cache, the cache memory is implemented in two memory parts (301, 302) as a two-way interleaved two-way set-associative memory. One memory part implements odd words of one cache set (0), and even words of the other cache set (1), while the other memory part implements even words of the one cache set and odd words of the other cache set. Storage locations (303) of the memory parts are grouped into at least four levels (204) with each level having a location from each of the memory parts and each of the cache sets. The cache receives a block over the bus in two pairs of contiguous words. The cache memory is updated with both words of a word pair simultaneously. The pairs of words are each stored simultaneously into locations of one of either of the cache sets, each word into a location of a different memory part and of a different level. Cache hit check is performed on all locations of a level simultaneously. In parallel with the hit check, all locations of the checked level are also accessed.
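The odd/even interleaving can be captured in a one-line mapping. The sketch below, with assumed part numbering, checks the stated property: the two words of any contiguous pair always land in different memory parts, whichever cache set the block goes to.

```python
# Sketch of the odd/even interleaving described above: memory part 0 holds
# odd words of cache set 0 and even words of cache set 1; part 1 holds the
# complement, so the two words of a contiguous pair always fall in different
# parts and can be written in the same cycle.  Part numbering is assumed.
def memory_part(cache_set, word_in_block):
    """Return which of the two memory parts (0 or 1) holds this word."""
    odd_word = word_in_block % 2
    return odd_word ^ cache_set ^ 1      # part 0: odd words of set 0, even words of set 1

# Any contiguous word pair of a 4-word block maps to both parts,
# for either cache set the block might be stored in.
for cache_set in (0, 1):
    for first_word in (0, 2):            # word pairs (0, 1) and (2, 3)
        parts = {memory_part(cache_set, first_word),
                 memory_part(cache_set, first_word + 1)}
        assert parts == {0, 1}
```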

58 citations


Patent
02 Nov 1984
TL;DR: In this article, a redundant error-detecting addressing code for use in a cache memory is presented, where the blocks are expanded to include redundant addressing information such as the logical data address and the physical cache address.
Abstract: A redundant error-detecting addressing code for use in a cache memory. A directory converts logical data addresses to physical addresses in the cache where the data is stored in blocks. The blocks are expanded to include redundant addressing information such as the logical data address and the physical cache address. When a block is accessed from the cache, the redundant addressing is compared to the directory addressing information to confirm that the correct data has been accessed.
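A minimal sketch of the check (field names assumed for illustration): the block carries redundant copies of the logical address and the cache address it was stored under, and each access verifies both against the directory's mapping.

```python
# Sketch of the redundant-addressing check: each cached block stores the
# logical data address and the physical cache address it was written under;
# on access these are compared against the directory, catching directory or
# addressing faults.  Field and structure names are assumptions.
from dataclasses import dataclass

@dataclass
class CacheBlock:
    data: bytes
    stored_logical_addr: int      # redundant copy of the logical data address
    stored_cache_addr: int        # redundant copy of the physical cache address

def checked_read(directory, cache, logical_addr):
    cache_addr = directory[logical_addr]          # directory: logical -> cache slot
    block = cache[cache_addr]
    if (block.stored_logical_addr != logical_addr or
            block.stored_cache_addr != cache_addr):
        raise RuntimeError("addressing error: wrong block accessed")
    return block.data
```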

29 citations


Journal ArticleDOI
01 Jan 1984
TL;DR: The Static Column RAM devices recently introduced offer the potential for implementing a direct-mapped cache on-chip with only a small increase in complexity over that needed for a conventional dynamic RAM memory system.
Abstract: The Static Column RAM devices recently introduced offer the potential for implementing a direct-mapped cache on-chip with only a small increase in complexity over that needed for a conventional dynamic RAM memory system. Trace-driven simulation shows that such a cache can only be marginally effective if used in the obvious way. However, it can be effective in satisfying the requests from a processor containing an on-chip cache. The SCRAM cache is more effective if the processor cache handles both instructions and data.

25 citations


Patent
06 Jul 1984
TL;DR: In this article, the authors describe a cache-based multiple processor computer system, where data operated upon by any one of the processor units is stored in the cache memory associated with that processor unit.
Abstract: A multiple processor computer system features a store-into cache arrangement wherein each processor unit of the system has its own unique cache memory unit. Data operated upon by any one of the processor units is stored in the cache memory associated with that processor unit. When a thus modified block of data is required by another one of the processor units, the requested data is transferred directly to the requesting processor unit without having to first transfer the data to a shared main memory. Provision is also made for transferring data, under prescribed conditions, from a cache to the main memory, but not as a precondition for transfer to a requesting processor.
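A small sketch of the miss-service path under such a store-into arrangement (class and method names are assumptions): a modified copy in another processor's cache is supplied directly to the requester, without first being written to the shared main memory.

```python
# Sketch of a store-into (write-back) arrangement with direct cache-to-cache
# transfer of modified blocks; main memory is bypassed on such transfers.
class WriteBackCache:
    def __init__(self):
        self.blocks = {}                        # addr -> (data, modified flag)

    def holds_modified(self, addr):
        return addr in self.blocks and self.blocks[addr][1]

    def install(self, addr, data, modified=False):
        self.blocks[addr] = (data, modified)

def serve_miss(requester, all_caches, main_memory, addr):
    """Serve a block miss, preferring a cache-to-cache transfer of a modified copy."""
    for cache in all_caches:
        if cache is not requester and cache.holds_modified(addr):
            data = cache.blocks[addr][0]        # transfer directly from the other cache
            requester.install(addr, data)       # main memory is not updated here
            return data
    data = main_memory.get(addr)                # no modified copy elsewhere
    requester.install(addr, data)
    return data
```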

20 citations


Book ChapterDOI
01 Nov 1984
TL;DR: This paper first defines and describes a highly parallel external data handling system and then shows how the capabilities of the system can be used to implement a high performance relational data base machine.
Abstract: This paper first defines and describes a highly parallel external data handling system and then shows how the capabilities of the system can be used to implement a high performance relational data base machine. The elements of the system architecture are an interconnection network, which implements both packet routing and circuit switching as well as data organization functions such as indexing and sort merge, and an intelligent memory unit with a self-managing cache, which implements associative search and capabilities for application of filtering operations on data streaming to and from storage.

11 citations


01 Jan 1984
TL;DR: This paper looks at each component of the memory hierarchy and addresses two issues: what are likely directions for development, and what are the interesting research problems.
Abstract: The effective and efficient use of the memory hierarchy of the computer system is one of the most important aspects of computer system design and use, if not the single most important. Cache memory performance is often the limiting factor in CPU performance, and cache memories also serve to cut the memory traffic in multiprocessor systems. Multiprocessor systems are also requiring advances in cache architecture with respect to cache consistency. Similarly, the study of the best means to share main memory is an important research topic. Disk cache is becoming important for performance in high end computer systems and is now widely available commercially; there are many related research problems. The development of mass storage, especially optical disk, will promote research in effective algorithms for file management and migration. In this paper, we look at each component of the memory hierarchy and address two issues: what are likely directions for development, and what are the interesting research problems.

Patent
19 Mar 1984
TL;DR: In this paper, a controller is proposed for communication between the auxiliary processor and a cache mechanism in the system interface; this communication is carried on independently of the main memory accesses required to update the cache mechanism, in an overlapped manner.
Abstract: A controller for communication between the auxiliary processor and a cache mechanism in the system interface, which communication is to be carried on independently of main memory accesses required to update the cache mechanism in an overlapped manner.

Patent
25 May 1984
TL;DR: In this article, the authors propose a method for Direct Access Storage Device (DASD) cache management that reduces the volume of data transfer between DASD (27, 29, 53) and cache (101) while avoiding the complexity of managing variable length records in the cache.
Abstract: A method for Direct Access Storage Device (DASD) cache management that reduces the volume of data transfer between DASD (27, 29, 53) and cache (101) while avoiding the complexity of managing variable length records in the cache. This is achieved by always choosing the starting point for staging a record to be at the start of the missing record and, at the same time, allocating and managing cache space in fixed length blocks. The method steps require staging records, starting with the requested record and continuing until either the cache block is full, the end of track is reached, or a record already in the cache is encountered.
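The staging rule lends itself to a direct sketch (the record and track representation is assumed): start at the requested record and stop at a full cache block, end of track, or a record already resident in the cache.

```python
# Sketch of the staging rule described above: on a miss, stage records
# starting at the requested record and stop when the fixed-size cache block
# is full, the end of the track is reached, or a record already resident in
# the cache is encountered.
def stage_on_miss(track_records, start_index, cache_resident, block_capacity):
    """track_records: list of (record_id, length); returns record ids staged."""
    staged, used = [], 0
    for record_id, length in track_records[start_index:]:
        if record_id in cache_resident:      # record already in cache: stop
            break
        if used + length > block_capacity:   # fixed-length cache block is full
            break
        staged.append(record_id)
        used += length
        cache_resident.add(record_id)
    return staged                            # the loop also ends at end of track
```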

01 Jan 1984
TL;DR: The design and single-chip implementation of a small data cache memory and associated controllers that implement an ownership-based cache consistency protocol are described, along with the evaluation of the protocol.
Abstract: We describe the design and single chip implementation of a small data cache memory and associated controllers. The chip can be used as a building block of a multiprocessor system, positioned between the main memory bus and an individual processor. It implements an ownership-based cache consistency protocol. The chip has been designed to be interfaced to the MultiBus system bus and the Motorola 68000 processor. In this paper, we present our cache consistency protocol and its evaluation, and the chip architecture, design decisions, and implementation details.

Patent
09 May 1984
TL;DR: In this article, specific flags that are set at various times during a cache-to-cache transfer will, in the event of an error, enable the system to identify the status of each cache involved in the transfer.
Abstract: In a multiprocessing system, specific flags that are set at various times during a cache-to-cache transfer will, in the event of an error, enable the system to identify the status of each cache involved in the transfer. Different procedures are utilized for the cache from which data was being fetched and the cache to which data was being stored, as well as the main memory. If a castout was in process, then main memory may not have been updated and may contain obsolete data. In that case, the mechanism will force an uncorrectable error into the target location to insure that obsolete data is not subsequently utilized by the system.
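The recovery decision can be sketched roughly as follows; the flag names and the poisoned-location set are assumptions for illustration, not the patent's actual mechanism.

```python
# Sketch of the recovery rule: flags recorded during a cache-to-cache transfer
# tell the error handler whether a castout was in flight and main memory may
# therefore hold obsolete data; if so, the target location is poisoned with a
# forced uncorrectable error so the stale copy cannot be used later.
def handle_transfer_error(flags, poisoned_locations, target_addr):
    """flags: booleans recorded during the transfer (names are assumed)."""
    if flags.get("castout_in_process") and not flags.get("memory_updated"):
        poisoned_locations.add(target_addr)   # force an uncorrectable error at the target
        return "main memory may be obsolete: target location poisoned"
    return "main memory consistent: transfer can be retried"
```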


Journal ArticleDOI
TL;DR: Limited, though encouraging, results are presented which show that this new algorithm can be at least as effective as the Critical LRU algorithm, even when the memory management policy is LRU itself, and can also be at least as effective as Critical Working Set, even when the memory management policy is the working set policy.

Journal ArticleDOI
TL;DR: Results show that 50–150 cells in the cache memory are sufficient to obtain reasonable performance; a possible emulator architecture and its instruction stream processing principles are also presented.

01 Feb 1984
TL;DR: The A900 computer provides approximately three times the performance of a previous HP 1000 computer while maintaining full software compatibility with other HP 1000 A-series machines.
Abstract: The A900 computer provides approximately three times the performance of a previous HP 1000 computer while maintaining full software compatibility with other HP 1000 A-series machines. The cost has been kept low by not using emitter-coupled logic. The A900 makes use of a pipelined data path and a cache memory. Why it uses them is described by the author.

01 Aug 1984
TL;DR: A cache coherence solution is proposed for a two-level cache organization for multiprocessors, and the performance of the proposed multiprocessor is evaluated with analytical methods.
Abstract: This thesis proposes a two-level cache organization for multiprocessors. The first level of cache consists of a private cache per processor. The second level of cache is shared by all processors. The main memory is also similarly shared. A cache coherence solution is proposed for such an organization. The performance of the proposed multiprocessor is evaluated with analytical methods. The factors that affect the performance are quantitatively discussed. A variation of the proposed coherence algorithm is presented to improve the performance. Keywords: High reliability; Cache memories; Mathematical analysis. (Author)
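As a flavor of this kind of analytical evaluation, a simple effective-access-time calculation for a two-level hierarchy might look like the following; the formula and the numbers are illustrative assumptions, not the thesis's model.

```python
# Illustrative two-level calculation: a private first-level cache per
# processor, a shared second-level cache, and shared main memory.
def effective_access_time(t_l1, t_l2, t_mem, h_l1, h_l2):
    """h_l2 is the hit ratio of the shared cache for requests that miss in L1."""
    l2_penalty = t_l2 + (1.0 - h_l2) * t_mem    # extra time paid on a first-level miss
    return t_l1 + (1.0 - h_l1) * l2_penalty

# Example: 1-cycle private cache, 5-cycle shared cache, 30-cycle main memory.
print(effective_access_time(t_l1=1, t_l2=5, t_mem=30, h_l1=0.90, h_l2=0.80))
```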