Showing papers in "Journal of Parallel and Distributed Computing in 1996"


Journal ArticleDOI
TL;DR: It is shown that on real and synthetic applications, the “work” and “critical-path length” of a Cilk computation can be used to model performance accurately, and it is proved that for the class of “fully strict” (well-structured) programs, the Cilk scheduler achieves space, time, and communication bounds all within a constant factor of optimal.

1,688 citations


Journal ArticleDOI
TL;DR: The Connection Machine Model CM-5 supercomputer is a massively parallel computer system designed to offer performance in the range of 1 teraflops (10^12 floating-point operations per second).

453 citations


Journal ArticleDOI
TL;DR: This paper proposes an approach based on global pointer and remote service request mechanisms, and explains how these mechanisms support dynamic communication structures, asynchronous messaging, dynamic thread creation and destruction, and a global memory model via interprocessor references.

298 citations


Journal ArticleDOI
TL;DR: This paper presents two general algorithms for simulated annealing that have been applied to the job shop scheduling problem and the traveling salesman problem; it is observed that superlinear speedups can be achieved using the algorithms.

179 citations


Journal ArticleDOI
TL;DR: This paper presents a distributed algorithm for mutual exclusion based on path reversal, which requires only O(log(n)) messages on average, where n is the number of processes in the network.

163 citations


Journal ArticleDOI
TL;DR: In this article, the authors describe lazy threads, a new approach for implementing multithreaded execution models on conventional machines, which can implement a parallel call at nearly the efficiency of a sequential call.

88 citations


Journal ArticleDOI
TL;DR: These results show that using the virtual processor approach, efficient code can be generated for execution of array statements involving block-cyclically distributed arrays.

87 citations


Journal ArticleDOI
TL;DR: Algorithms are given for performing a conflict-free minimum-spanning tree broadcast, a pipelined algorithm that is similar to Ho and Johnsson's EDST algorithm for hypercubes, and a novel scatter-collect approach that is a natural choice for communication libraries due to its simplicity.

86 citations


Journal ArticleDOI
TL;DR: Compiler analyses and optimizations for explicitly parallel programs that communicate through a shared address space are presented; programs that use linguistic synchronization constructs rather than their user-defined shared memory counterparts will benefit from more accurate analysis and therefore better optimization.

86 citations


Journal ArticleDOI
TL;DR: Deterministic sublinear-time distributed algorithms for network decomposition and for constructing a sparse neighborhood cover of a network lead to improved distributed preprocessing time for a number of distributed algorithms, including all-pairs shortest paths computation, load balancing, broadcast, and bandwidth management.

69 citations


Journal ArticleDOI
TL;DR: A necessary and sufficient condition is proposed that can be used for any adaptive or nonadaptive routing algorithm for wormhole routing, as long as only local information is required for routing, and which omits most channel dependencies that cannot be used to create a deadlock configuration.

Journal ArticleDOI
TL;DR: It is demonstrated that providing both software caching and computation migration can improve the performance of these programs, and a compile-time heuristic that selects between them for each pointer dereference is provided.

Journal ArticleDOI
TL;DR: This paper presents efficient and portable implementations of two useful primitives in image processing algorithms, histogramming and connected components, which provide the best known execution times for these two primitives, even when compared with machine-specific implementations.

Journal ArticleDOI
TL;DR: It is shown that in DHC this effect can be reduced by reclaiming the leftover processors when the gang size is smaller than the allocated block of processors, and by adjusting the scheduling time quantum to control the adverse effect of badly matched gangs.

Journal ArticleDOI
TL;DR: This work investigates star graphs under the conditions of forbidden faulty sets, where all the neighbors of any node cannot be faulty simultaneously, and shows that under these conditions star graphs can tolerate up to (2n - 5) faulty nodes and the fault diameter is increased only by 2 in the worst case in the presence of the maximum number of faults.

Journal ArticleDOI
TL;DR: A new mathematical representation for regular distributions called FALLS is presented and algorithms for redistribution based on this representation are discussed, including being able to handle arbitrary source and target processor sets while performing redistribution.

Journal ArticleDOI
TL;DR: This work describes a new approach that separates real-time constraints from functional aspects of an application; real-time constraints are described by synchronization code between the interfaces of objects, which separates what an object does from when it does it.

Journal ArticleDOI
TL;DR: A range of new real-time virtual channel flow control schemes for wormhole networks are developed and a scheme for dropping messages that miss their deadlines is provided in order to reduce congestion.

Journal ArticleDOI
TL;DR: This paper provides definitions for several types of degenerate sharing, including false sharing, and provides an algorithm that computes the cost of unnecessary coherence (false coherence) in a shared memory system using a single memory trace.

Journal ArticleDOI
TL;DR: In this article, modular and composable synchronization and real-time specification extensions to the object-oriented model are proposed to overcome inheritance anomaly problems in object-based concurrent systems.

Journal ArticleDOI
TL;DR: A new class of asynchronous iterative methods is proposed: asynchronous iterations with flexible communication, which communicate to other processors the values of the components of the iteration vector resulting from intermediate steps of computation.

Journal ArticleDOI
TL;DR: This paper deals with the problem of deadlock detection in asynchronous message passing systems in a system model that covers unspecified receptions and non-FIFO channels and abstracts deadlocks by a general deadlock model that has the same modeling power as the OR-AND model.

Journal ArticleDOI
TL;DR: A special framework is developed on the generalized hypercube, a network that is currently receiving considerable attention; using this framework as the basic tool, a number of spanning subgraphs with special properties to fit various communication needs are constructed on the network.

Journal ArticleDOI
TL;DR: This work proposes novel scheduling policies based on locality information derived from cache miss counters; a prototype implementation shows that a locality-conscious scheduler outperforms approaches that ignore locality information.

Journal ArticleDOI
TL;DR: Tests with several parallel supercomputers demonstrate the speed of communication-free parallel Sobol' sequence generators, and the rapid convergence properties of quasirandom Monte Carlo schemes indicate that the method described here may be gainfully applied to a wide range of problems.

Journal ArticleDOI
TL;DR: A new parallel heuristic, PLF, is introduced, and it is shown that this heuristic has the same expected runtime under the PRAM computational model as the scalable coloring heuristic introduced by Jones and Plassmann.

Journal ArticleDOI
TL;DR: In this paper, the authors investigate the contribution of the abilities of a reconfigurable bus-based model to segment and fuse buses and show that the ability to fuse buses is the more crucial of the two.

Journal ArticleDOI
TL;DR: This paper surveys and analyzes several well-known distributed mutual exclusion algorithms according to their related characteristics and compares the performance of these algorithms by a simulation study.

Journal ArticleDOI
TL;DR: New methods for decomposing arrays into a cluster of machines with nonuniform computational power are developed and simulation results show that these methods provide superior decomposition over naive schemes.