Proceedings Article

How to emulate shared memory

Abhiram Ranade
pp. 185-194
TL;DR
In this paper, the author presents a simple algorithm for emulating an N-processor CRCW PRAM on an N-node butterfly, in which each PRAM step is emulated in O(log N) time with high probability, using FIFO queues of size O(1) at each node.
Abstract
We present a simple algorithm for emulating an N processor CRCW PRAM on an N node butterfly. Each step of the PRAM is emulated in time O(log N) with high probability, using FIFO queues of size O(1) at each node. The only use of randomization is in selecting a hash function to distribute the shared address space of the PRAM onto the nodes of the butterfly. The routing itself is both deterministic and oblivious, and messages are combined without the use of associative memories or explicit sorting. As a corollary we improve the result of Pippenger [8] by routing permutations with bounded queues in logarithmic time, without the possibility of deadlock. Besides being optimal, our algorithm has the advantage of extreme simplicity and is readily suited for use in practice.
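The abstract names three ingredients: a randomly chosen hash function that spreads the shared address space over the nodes, deterministic oblivious routing on the butterfly, and combining of messages. The Python sketch below illustrates each one under stated assumptions; it is not the paper's algorithm. The hash family (a random polynomial of degree k-1 over a prime field, a standard k-wise independent family, only approximately uniform after the final reduction mod N), the bit-fixing route from level 0 to level log N, and the names `make_hash`, `butterfly_path`, and `route_with_combining` are all hypothetical. Timing and queue sizes are ignored; the sketch only shows why deterministic routes let requests for one address merge once they meet.

```python
import random


def make_hash(k, p, n_nodes, rng):
    """Return h(x) = ((a_0 + a_1 x + ... + a_{k-1} x^(k-1)) mod p) mod n_nodes.

    Assumption: a random degree-(k-1) polynomial over the prime field GF(p),
    a standard k-wise independent family; not necessarily the paper's choice.
    """
    coeffs = [rng.randrange(p) for _ in range(k)]

    def h(x):
        acc = 0
        for a in reversed(coeffs):  # Horner's rule, highest coefficient first
            acc = (acc * x + a) % p
        return acc % n_nodes

    return h


def butterfly_path(src, dst, n_bits):
    """Rows visited from level 0 to level n_bits, fixing one row bit per level
    (most significant bit first). The route depends only on (src, dst), so it
    is deterministic and oblivious."""
    row, path = src, [src]
    for level in range(n_bits):
        bit = 1 << (n_bits - 1 - level)
        row = (row & ~bit) | (dst & bit)  # take the straight or the cross edge
        path.append(row)
    return path


def route_with_combining(requests, h, n_bits):
    """requests: list of (processor_row, address). For each level, return the
    set of (row, address) messages left after merging requests for the same
    address that reach the same node. Merging is safe because the remaining
    route depends only on the current row and h(address)."""
    levels = [set() for _ in range(n_bits + 1)]
    for src, addr in requests:
        for level, row in enumerate(butterfly_path(src, h(addr), n_bits)):
            levels[level].add((row, addr))  # the set merges duplicates
    return levels


n_bits = 3  # N = 8 rows
h = make_hash(k=3, p=10007, n_nodes=1 << n_bits, rng=random.Random(1))
hot_spot = [(row, 42) for row in range(8)]  # every processor reads address 42
print([len(s) for s in route_with_combining(hot_spot, h, n_bits)])  # [8, 4, 2, 1]
```

With all eight processors reading one address, the number of distinct messages halves at each level, so the memory module holding that address receives a single request.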


Citations
Journal Article

A bridging model for parallel computation

TL;DR: The bulk-synchronous parallel (BSP) model is introduced as a candidate bridging model between parallel software and hardware, and results are given quantifying its efficiency both in implementing high-level language features and algorithms and in being implemented in hardware.
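To make the efficiency claim concrete, a BSP program is usually costed superstep by superstep. The sketch below assumes the commonly cited form of that cost, w + g·h + l, where w is the largest local work on any processor, h is the largest number of messages any processor sends or receives (an h-relation), g is the per-message communication cost, and l is the barrier synchronization cost. The TL;DR gives no formula, so this form and the names `Superstep`, `superstep_cost`, and `program_cost` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Superstep:
    work: list[int]      # local operations performed by each processor
    sent: list[int]      # messages sent by each processor
    received: list[int]  # messages received by each processor


def superstep_cost(step: Superstep, g: float, l: float) -> float:
    """Cost w + g*h + l of one superstep (assumed standard BSP cost form);
    l is the barrier synchronization cost."""
    w = max(step.work)
    h = max(max(step.sent), max(step.received))  # the superstep is an h-relation
    return w + g * h + l


def program_cost(steps: list[Superstep], g: float, l: float) -> float:
    """The cost of a BSP program is the sum of its superstep costs."""
    return sum(superstep_cost(s, g, l) for s in steps)


steps = [
    Superstep(work=[10, 8, 12, 9], sent=[3, 1, 2, 0], received=[1, 2, 2, 1]),
    Superstep(work=[5, 5, 5, 5], sent=[1, 1, 1, 1], received=[1, 1, 1, 1]),
]
print(program_cost(steps, g=4, l=20))  # 44 + 29 = 73
```

Because only the maxima w and h enter the cost, a single overloaded processor or a single hot receiver sets the price of the whole superstep.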
Journal Article

Scheduling multithreaded computations by work stealing

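TL;DR: This paper gives the first provably good work-stealing scheduler for multithreaded computations with dependencies, and shows that the expected time to execute a fully strict computation on $P$ processors using this scheduler is $T_1/P + O(T_\infty)$, where $T_1$ is the minimum serial execution time and $T_\infty$ is the minimum execution time with an infinite number of processors.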
Book Chapter

Parallel algorithms for shared-memory machines

TL;DR: This chapter discusses parallel algorithms for shared-memory machines, covering the theoretical foundations of parallel algorithms and parallel architectures, and presents a theoretical analysis of the appropriate logical organization of a massively parallel computer.
Proceedings Article

Scheduling multithreaded computations by work stealing

TL;DR: This paper gives the first provably good work-stealing scheduler for multithreaded computations with dependencies, and shows that the expected time $T_P$ to execute a fully strict computation on $P$ processors using this work-stealing scheduler is $T_P = O(T_1/P + T_\infty)$, where $T_1$ is the minimum serial execution time of the multithreaded computation and $T_\infty$ is the minimum execution time with an infinite number of processors.
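The bound quoted in this TL;DR compares the running time against two lower bounds that hold for any schedule: the total work spread over P processors, and the critical-path length. The Python sketch below is a toy simulation, not the paper's scheduler or computational model. It assumes unit-time tasks forming a complete binary spawn tree with no joins (a simplification of a fully strict computation), one action per processor per step, and victims chosen uniformly at random; the function name `work_steal` is hypothetical. Each processor works at the bottom of its own deque and, when idle, steals from the top of a victim's deque.

```python
import random
from collections import deque


def work_steal(depth, p, seed=0):
    """Simulate randomized work stealing on p processors.

    The computation is a complete binary spawn tree of the given depth with
    unit-time tasks. Each step, every processor either runs the task at the
    bottom of its own deque or, if that deque is empty, tries to steal the
    top task of a uniformly random other processor.
    Returns (steps taken, tasks executed).
    """
    rng = random.Random(seed)
    deques = [deque() for _ in range(p)]
    deques[0].append(0)               # root task, labelled by its depth
    total = 2 ** (depth + 1) - 1      # T_1: total number of unit tasks
    executed = steps = 0
    while executed < total:
        steps += 1
        for i in range(p):
            if deques[i]:
                d = deques[i].pop()                   # owner works at the bottom
                executed += 1
                if d < depth:
                    deques[i].extend((d + 1, d + 1))  # spawn two children
            elif p > 1:
                v = rng.randrange(p - 1)
                v = v + 1 if v >= i else v            # random victim other than i
                if deques[v]:
                    deques[i].append(deques[v].popleft())  # steal from the top
    return steps, executed


depth, p = 10, 4
t1, t_inf = 2 ** (depth + 1) - 1, depth + 1
steps, done = work_steal(depth, p)
print(f"T_P = {steps}, T_1/P = {t1 / p:.1f}, T_inf = {t_inf}")
```

Stealing from the top of the deque tends to take tasks near the root, which carry the most remaining work; this is one intuition for why few steals are needed.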
Monograph

Introduction to Parallel Computing

TL;DR: This book provides a comprehensive introduction to parallel computing, covering theoretical issues such as the fundamentals of concurrent processes, models of parallel and distributed computing, and metrics for evaluating and comparing parallel algorithms, as well as practical issues, including methods for designing and implementing shared- and distributed-memory programs and standards for parallel program implementation.
References
Proceedings Article

Universal schemes for parallel communication

TL;DR: This paper shows that there exists an N-processor computer that can simulate arbitrary N-processor parallel computations with only a factor of O(log N) loss of runtime efficiency, and isolates a combinatorial problem that lies at the heart of this question.
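The scheme usually associated with this line of work is two-phase randomized routing: each packet first travels to a uniformly random intermediate node and then on to its true destination, so no fixed permutation can concentrate traffic on a few links. The Python sketch below assumes an n-dimensional hypercube and bit-fixing paths (lowest dimension first) for illustration; the names `bit_fixing`, `two_phase`, `congestion`, and `transpose` are hypothetical. `congestion` counts the total number of packets per directed edge over the whole routing and ignores timing, so it is only a rough proxy for queueing delay.

```python
import random
from collections import Counter


def bit_fixing(u, v, n):
    """Path on the n-cube from u to v, correcting differing bits from dimension 0 up."""
    path = [u]
    for d in range(n):
        if ((u ^ v) >> d) & 1:
            u ^= 1 << d
            path.append(u)
    return path


def two_phase(u, v, n, rng):
    """Route u -> random intermediate w -> v (two-phase randomized routing)."""
    w = rng.randrange(1 << n)
    return bit_fixing(u, w, n) + bit_fixing(w, v, n)[1:]


def congestion(paths):
    """Largest number of packets that cross any single directed edge."""
    load = Counter()
    for path in paths:
        for a, b in zip(path, path[1:]):
            load[(a, b)] += 1
    return max(load.values(), default=0)


def transpose(x, n):
    """Swap the high and low halves of an n-bit address (n even)."""
    half = n // 2
    low = x & ((1 << half) - 1)
    return (low << half) | (x >> half)


n = 6
rng = random.Random(0)
direct = [bit_fixing(x, transpose(x, n), n) for x in range(1 << n)]
randomized = [two_phase(x, transpose(x, n), n, rng) for x in range(1 << n)]
print("bit-fixing:", congestion(direct), "two-phase:", congestion(randomized))
```

On the transpose permutation, deterministic bit-fixing funnels every packet with the same high half through one node, while the random intermediate spreads the load.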
Journal Article

“Hot spot” contention and combining in multistage interconnection networks

TL;DR: Contention for a "hot spot" location, such as a shared lock or synchronization variable, can severely degrade all memory access in the network, not just access to the shared lock locations, through an effect the authors call tree saturation; message combining was found to be an effective means of eliminating this problem when it arises from lock or synchronization contention.
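Message combining can be pictured at a single switch: when a read for an address arrives while another read for the same address is already outstanding, the switch records the requester instead of forwarding a second message, and it fans the single reply back out to everyone waiting. The Python sketch below is a hypothetical toy (the class `CombiningSwitch` and its methods are illustrative names, not the hardware design in the cited paper). It combines with any outstanding request for the address and ignores ordering with respect to writes.

```python
from collections import defaultdict, deque


class CombiningSwitch:
    """Toy switch that combines read requests to the same address."""

    def __init__(self):
        self.queue = deque()              # requests forwarded toward memory, FIFO
        self.waiting = defaultdict(list)  # address -> requesters awaiting a reply

    def request(self, requester, addr):
        """Register a read; return True if it had to be forwarded."""
        first = addr not in self.waiting
        self.waiting[addr].append(requester)
        if first:
            self.queue.append(addr)       # only the first request goes on
        return first

    def forward(self):
        """Send the oldest queued request toward memory."""
        return self.queue.popleft() if self.queue else None

    def reply(self, addr, value):
        """Memory answered once; decombine by replying to every waiter."""
        return [(r, value) for r in self.waiting.pop(addr, [])]


sw = CombiningSwitch()
for proc in range(5):
    sw.request(proc, 0x40)        # five processors hit one hot spot
sw.request(9, 0x80)
print(len(sw.queue))              # 2: one message per distinct address
addr = sw.forward()               # the hot-spot read reaches memory once
print(sw.reply(addr, 7))          # [(0, 7), (1, 7), (2, 7), (3, 7), (4, 7)]
```

Without combining, all five hot-spot reads would queue at the same memory port, which is the backlog that spreads back through the network as tree saturation.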
Journal Article

Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors

TL;DR: This paper implements several basic operating system primitives by using a "replace-add" operation, which can supersede the standard "test and set" and which appears to be a universal primitive for efficiently coordinating large numbers of independently acting sequential processors.
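Replace-add, as commonly described, atomically adds an increment e to a shared variable and returns the new value. Two replace-adds to the same address can be combined into one memory access by adding their increments; the single reply is then decombined so that one requester receives the final total and the other receives the total minus the first requester's increment, exactly as if the two had executed one after the other. The Python sketch below assumes this semantics; the names `Memory` and `combined_replace_add` are hypothetical.

```python
class Memory:
    """A toy shared memory with an atomic replace-add operation."""

    def __init__(self):
        self.cells = {}

    def replace_add(self, addr, e):
        """Add e to cells[addr] and return the new value (one memory access)."""
        self.cells[addr] = self.cells.get(addr, 0) + e
        return self.cells[addr]


def combined_replace_add(mem, addr, e1, e2):
    """Merge replace-add(addr, e1) and replace-add(addr, e2) into a single
    memory access, then split the reply so the two requesters see exactly what
    they would have seen had e1 executed before e2."""
    total = mem.replace_add(addr, e1 + e2)
    return total - e2, total


mem = Memory()
mem.cells["counter"] = 10
print(combined_replace_add(mem, "counter", 3, 4))  # (13, 17)
```

Because the combined result always matches some serial order, primitives built on replace-add, such as counters that hand out distinct indices, stay correct when a network combines requests.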
Journal Article

A logarithmic time sort for linear size networks

TL;DR: A randomized algorithm that sorts on an N-node network with constant valence in O(log N) time with probability at least $1 - N^{-\alpha}$ for all sufficiently large $\alpha$.
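The TL;DR does not describe the method. Random sampling of splitters is a standard randomized technique for sorting on networks: a small random sample, once sorted, yields splitters that cut the input into buckets of roughly equal size, and each bucket can then be sorted independently by its own group of nodes. The Python sketch below shows that idea sequentially and is an assumption for illustration, not the paper's network algorithm; the function name `sample_sort` and the `buckets` and `oversample` parameters are hypothetical, and `sorted()` stands in for the per-bucket parallel sort.

```python
import bisect
import random


def sample_sort(items, buckets=8, oversample=16, seed=0):
    """Sort by random sampling: pick buckets-1 splitters from a sorted random
    sample, partition the items by splitter, then sort each bucket.

    Returns (sorted list, size of the largest bucket).
    """
    items = list(items)
    if len(items) <= buckets * oversample:
        return sorted(items), len(items)
    rng = random.Random(seed)
    sample = sorted(rng.sample(items, buckets * oversample))
    splitters = sample[oversample::oversample]  # exactly buckets - 1 splitters
    parts = [[] for _ in range(buckets)]
    for x in items:
        # bucket j holds the x with splitters[j-1] <= x < splitters[j]
        parts[bisect.bisect_right(splitters, x)].append(x)
    result = [x for part in parts for x in sorted(part)]
    return result, max(len(part) for part in parts)


gen = random.Random(42)
data = [gen.randrange(10**6) for _ in range(10_000)]
out, largest = sample_sort(data)
print(out == sorted(data), largest, len(data) // 8)
```

Oversampling (taking several sample points per splitter) is what keeps the largest bucket close to the ideal size N/buckets with high probability; the printed `largest` can be compared against that ideal.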