
Dynamic decentralized cache schemes for mimd parallel processors

Larry Rudolph, Zary Segall
Vol. 12, Iss. 3, pp. 340-347

NOTICE WARNING CONCERNING COPYRIGHT RESTRICTIONS:
The copyright law of the United States (title 17, U.S. Code) governs the making
of photocopies or other reproductions of copyrighted material. Any copying of this
document without permission of its author may be prohibited by law.

CMU-CS-84-139
Dynamic Decentralized Cache Schemes for MIMD Parallel Processors
by
Larry Rudolph
Zary Segall
Computer Science Department
Carnegie-Mellon University
Abstract
This paper presents two cache schemes for a shared-memory shared bus multiprocessor. Both
schemes feature decentralized consistency control and dynamic type classification of the datum
cached (i.e. read-only, local, or shared). It is shown how to exploit these features to minimize the
shared bus traffic. The broadcasting ability of the shared bus is used not only to signal an event but
also to distribute data. In addition, by introducing a new synchronization construct, i.e. the Test-and-
Test-and-Set instruction, many of the traditional parallel processing "hot spots" or bottlenecks are
eliminated. Sketches of formal correctness proofs for the proposed schemes are also presented. It
appears that moderately large parallel processors can be designed by employing the principles
presented in this paper.
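The dynamic type classification described in the abstract can be pictured as a small per-entry tag. The sketch below is a hypothetical illustration in C, assuming only the three types the abstract names (read-only, local, or shared); the paper's actual transition rules are given in its Figures 3-1 and 5-1, and the `classify` function here is an invented placeholder, not the authors' algorithm.

```c
#include <stdbool.h>

/* Hypothetical per-entry type tag, after the abstract's classification:
 * read-only, local (private to one processor), or shared. */
typedef enum { ENTRY_READ_ONLY, ENTRY_LOCAL, ENTRY_SHARED } entry_type_t;

typedef struct {
    unsigned long tag;   /* address tag of the cached datum */
    entry_type_t  type;  /* current dynamic classification */
    int           data;  /* cached value */
} cache_entry_t;

/* Invented classification rule, for illustration only: a non-writable
 * datum is read-only; a writable datum is local until another cache is
 * observed (over the shared bus) to hold a copy, at which point it is
 * shared and subject to consistency control. */
entry_type_t classify(bool writable, bool another_cache_holds_copy)
{
    if (!writable)
        return ENTRY_READ_ONLY;
    return another_cache_holds_copy ? ENTRY_SHARED : ENTRY_LOCAL;
}
```

The point of such a tag is the one the abstract makes: bus traffic is spent only on entries classified as shared, while read-only and local entries are served entirely from the local cache.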
This research has been supported in part by National Science Foundation Grant MCS-8120270. The
views and conclusions contained in this paper are those of the authors and should not be interpreted
as representing the official policies, either expressed or implied, of NSF or Carnegie-Mellon
University.
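The Test-and-Test-and-Set construct named in the abstract reduces shared-bus traffic by spinning on an ordinary (cached) read and attempting the atomic test-and-set only when the lock appears free. A minimal sketch using C11 atomics, assuming a single boolean lock word (the names here are illustrative, not taken from the paper):

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_bool locked; } spinlock_t;

void spin_lock(spinlock_t *l)
{
    for (;;) {
        /* "Test": spin on a plain read, which is satisfied from the
         * local cache and generates no shared-bus traffic. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ; /* busy-wait */
        /* "Test-and-Set": attempt the atomic exchange only when the
         * lock looked free; failure means another processor won. */
        if (!atomic_exchange_explicit(&l->locked, true,
                                      memory_order_acquire))
            return;
    }
}

void spin_unlock(spinlock_t *l)
{
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

Contrast this with a plain test-and-set loop, where every failed atomic attempt forces a bus transaction; here waiters converge on cached copies of the lock word and contend on the bus only when the lock is released.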

Table of Contents
1. Introduction
2. Assumptions
3. The RB Cache Scheme
4. Proof of Consistency - Sketch
5. The RWB Cache Scheme
6. Synchronization Using Caches
6.1. Synchronization Using RB Scheme
6.2. Synchronization Using RWB Scheme
7. Shared Bus Bandwidth
8. Conclusion

List of Figures
Figure 3-1: State Transition Diagram for each Cache Entry for the RB Scheme
Figure 5-1: State Transition Diagram for each Cache Entry for the RWB Scheme
Figure 6-1: Synchronization with Test-and-Set for RB Scheme
Figure 6-2: Synchronization with Test-and-Test-and-Set for RB Scheme
Figure 6-3: Synchronization with Test-and-Test-and-Set for RWB Scheme
Figure 7-1: Multiple Shared Bus Cache Based Parallel Processor

List of Tables
Table 1-1: Cm* Emulated Cache Results

Citations
Proceedings Article

Transactional memory: architectural support for lock-free data structures

TL;DR: Simulation results show that transactional memory matches or outperforms the best known locking techniques for simple benchmarks, even in the absence of priority inversion, convoying, and deadlock.
Journal Article

Algorithms for scalable synchronization on shared-memory multiprocessors

TL;DR: The principal conclusion is that contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors; the existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides protection against so-called "dance hall" architectures.
Proceedings Article

Ligra: a lightweight graph processing framework for shared memory

TL;DR: This paper presents a lightweight graph processing framework specific to shared-memory parallel/multicore machines, which makes graph traversal algorithms easy to write and significantly more efficient than previously reported results using graph frameworks on machines with many more cores.
Journal Article

The performance of spin lock alternatives for shared-memory multiprocessors

TL;DR: The author examines the questions of whether there are efficient algorithms for software spin-waiting given hardware support for atomic instructions, or whether more complex kinds of hardware support are needed for performance.
Journal Article

Cache coherence protocols: evaluation using a multiprocessor simulation model

TL;DR: The magnitude of the potential performance difference between the various approaches indicates that the choice of coherence solution is very important in the design of an efficient shared-bus multiprocessor, since it may limit the number of processors in the system.
References
Journal Article

Cache Memories

TL;DR: Specific aspects of cache memories investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size.
Journal Article

The NYU Ultracomputer—Designing an MIMD Shared Memory Parallel Computer

TL;DR: The design for the NYU Ultracomputer is presented, a shared-memory MIMD parallel machine composed of thousands of autonomous processing elements that uses an enhanced message switching network with the geometry of an Omega-network to approximate the ideal behavior of Schwartz's paracomputers model of computation.
Journal Article

A New Solution to Coherence Problems in Multicache Systems

TL;DR: A memory hierarchy has coherence problems as soon as one of its levels is split into several independent units that are not equally accessible from faster levels or processors.
Proceedings Article

Using cache memory to reduce processor-memory traffic

TL;DR: It is demonstrated that a cache exploiting primarily temporal locality (look-behind) can indeed reduce traffic to memory greatly, and introduce an elegant solution to the cache coherency problem.
Proceedings Article

Cache system design in the tightly coupled multiprocessor system

C. K. Tang
TL;DR: System requirements in the multiprocessor environment as well as the cost-performance trade-offs of the cache system design are given in detail, and the possibility of sharing the cache system hardware with other multiprocessing facilities (such as dynamic address translation, storage protection, locks, serialization, and the system clocks) is discussed.