
Showing papers in Journal of Parallel and Distributed Computing, 1995


Journal Article
TL;DR: A survey of recent results on distributed loop computer networks is given, covering the computation of the minimum diameter and the construction of loop networks that achieve this optimal value.

382 citations
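As a rough illustration of what "minimum diameter" means here, the sketch below brute-forces the diameter of a directed double-loop network G(n; 1, s), where node i has arcs to i + 1 and i + s (mod n). The choice of this particular loop-network family, and the names `double_loop_diameter` and `min_diameter`, are assumptions for illustration; the surveyed constructions are analytical, not exhaustive searches.

```python
from collections import deque


def double_loop_diameter(n: int, s: int) -> int:
    """Diameter of the directed double-loop network G(n; 1, s) (illustrative model).

    Node i has arcs to (i + 1) mod n and (i + s) mod n. Rotation i -> i + c is
    an automorphism, so the graph is vertex-transitive and a single BFS from
    node 0 yields the diameter.
    """
    dist = [-1] * n
    dist[0] = 0
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in ((u + 1) % n, (u + s) % n):
            if dist[v] < 0:  # first visit in BFS is the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist)  # every node is reachable through the +1 arcs


def min_diameter(n: int) -> tuple[int, int]:
    """Return (diameter, s) minimizing the diameter over s in [2, n - 1]; needs n >= 3."""
    return min((double_loop_diameter(n, s), s) for s in range(2, n))
```

The exhaustive search is O(n^2) and only practical for small n; its value is as a check against closed-form optimal constructions.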


Journal Article
TL;DR: This paper examines the partitioning and scheduling techniques required to obtain effective parallel performance on applications that use a range of hierarchical N-body methods, and also considers a recent hierarchical method for radiosity calculations in computer graphics.

205 citations


Journal Article
TL;DR: FORTRAN M is a small set of extensions to FORTRAN 77 that supports a modular approach to the design of message-passing programs; such programs can be compiled efficiently for uniprocessors, shared-memory computers, distributed-memory computers, and networks of workstations.

181 citations


Journal Article
TL;DR: An analytical model is developed that predicts the reneging probability and the expected resume delay. The model is used to optimally allocate channels for batching, on-demand playback, and contingency, and the proposed policy is shown to be more effective than a scheme with no contingency channels and no batching.

128 citations


Journal Article
TL;DR: A new static processor assignment policy, called Largest Task First Minimum Finish Time (LTFMFT), is introduced and the analysis shows that this policy is very sensitive to the degree of heterogeneity of the architecture, and that it outperforms all other policies analyzed.

124 citations
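Reading the policy name literally, LTFMFT takes tasks in decreasing size and gives each one to the processor on which it would finish earliest. The sketch below follows that reading under simplifying assumptions that are not taken from the paper: independent tasks, no communication cost, and a task of size w taking w / speed time on a processor of the given speed. The function name `ltfmft` is hypothetical.

```python
def ltfmft(task_sizes: list[float], speeds: list[float]) -> tuple[dict[int, int], float]:
    """Largest Task First, Minimum Finish Time (sketch under simplifying assumptions).

    Assumes independent tasks with no communication costs; a task of size w
    takes w / speed time units on a processor with that speed.
    Returns (task -> processor mapping, makespan).
    """
    finish = [0.0] * len(speeds)  # current finish time of each processor
    assignment: dict[int, int] = {}
    # Largest task first; sorted() is stable, so equal sizes keep input order.
    order = sorted(range(len(task_sizes)), key=lambda t: task_sizes[t], reverse=True)
    for t in order:
        # Minimum finish time: the processor that would complete task t earliest.
        best = min(range(len(speeds)), key=lambda p: finish[p] + task_sizes[t] / speeds[p])
        finish[best] += task_sizes[t] / speeds[best]
        assignment[t] = best
    return assignment, max(finish)
```

For example, `ltfmft([4, 2, 2], [2, 1])` places the largest task on the fast processor, the next on the idle slow one, and the last back on the fast processor, for a makespan of 3.0. The sensitivity to heterogeneity noted above shows up directly in the `task_sizes[t] / speeds[p]` term.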


Journal Article
TL;DR: This work demonstrates a storage scheme, which wastes no storage, for an array A affinely aligned to a template distributed across p processors with a cyclic(k) distribution. It also shows that the local memory access sequence of any processor for a computation involving the regular section A(ℓ:h:s) is characterized by a finite state machine with at most k states.

121 citations
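The finite-state-machine claim can be checked empirically in a simplified setting. The sketch below assumes 0-based indices, identity alignment (A(i) on template cell i) and a positive stride; the paper handles general affine alignments, which this does not. Under cyclic(k), index i lives on processor (i // k) % p at local address (i // (p*k)) * k + i % k. Using the offset within a block as the state, each state should always lead to the same next state and the same local-address gap. The helper names are hypothetical.

```python
def local_accesses(l: int, h: int, s: int, p: int, k: int, m: int) -> tuple[list[int], list[int]]:
    """Local addresses (and block offsets) touched on processor m by A(l:h:s).

    Assumes 0-based indices, identity alignment and s > 0. Under cyclic(k),
    index i belongs to processor (i // k) % p, and each processor stores its
    blocks contiguously, so the local address is (i // (p * k)) * k + i % k.
    """
    addrs, offsets = [], []
    for i in range(l, h + 1, s):
        if (i // k) % p == m:
            addrs.append((i // (p * k)) * k + i % k)
            offsets.append(i % k)
    return addrs, offsets


def access_fsm(l: int, h: int, s: int, p: int, k: int, m: int) -> dict[int, tuple[int, int]]:
    """Build state -> (next state, address gap), where state = offset within the block."""
    addrs, offsets = local_accesses(l, h, s, p, k, m)
    table: dict[int, tuple[int, int]] = {}
    for a0, a1, o0, o1 in zip(addrs, addrs[1:], offsets, offsets[1:]):
        step = (o1, a1 - a0)
        # A deterministic FSM requires each state to always yield the same step.
        if table.setdefault(o0, step) != step:
            raise ValueError(f"offset {o0} has two different successors")
    return table
```

With p = 2, k = 4 and the section A(0:63:3), processor 0 touches local addresses 0, 3, 5, 10, 12, ... and the table has four states, each with a fixed gap (3, 2, 5, 2), so the access sequence can be generated from the table instead of being recomputed per element.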


Journal Article
TL;DR: A detailed description of the design and implementation of the Munin prototype is given, with special emphasis on its novel write-shared protocol.

95 citations


Journal Article
TL;DR: All the communication algorithms presented in this paper are based on constructing spanning trees of the star graph with special properties suited to different communication needs, and are designed with respect to both time and number of message transmissions.

91 citations
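For orientation, the star graph S_n has the n! permutations of n symbols as nodes, with an edge whenever two permutations differ by swapping the first symbol with the symbol in some other position. The sketch below builds a plain breadth-first spanning tree rooted at the identity; it is a generic stand-in, not one of the specially structured trees the paper constructs, and the function names are hypothetical.

```python
from collections import deque


def star_neighbors(perm: tuple[int, ...]):
    """Neighbors in the star graph: swap the first symbol with the symbol at position i."""
    for i in range(1, len(perm)):
        nbr = list(perm)
        nbr[0], nbr[i] = nbr[i], nbr[0]
        yield tuple(nbr)


def bfs_spanning_tree(n: int):
    """BFS spanning tree of S_n rooted at the identity permutation.

    Returns (parent, depth) dictionaries keyed by permutation; the root's parent is None.
    """
    root = tuple(range(n))
    parent = {root: None}
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in star_neighbors(u):
            if v not in parent:  # first discovery gives a shortest path to the root
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)
    return parent, depth
```

Because S_n is vertex-transitive, the tree's depth equals the graph's diameter, floor(3(n-1)/2); for n = 4 that is 4 over 24 nodes. Under an all-port model, a broadcast forwarded along tree edges would therefore finish in that many rounds using n! - 1 transmissions.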


Journal Article
TL;DR: Hypermeshes are shown to have high bisection bandwidths, thereby minimizing the time for many common algorithms such as parallel sorting, and are considerably more powerful computational models than meshes, generalized hypercubes, and other orthogonal graphs.

87 citations


Journal Article
TL;DR: This work presents a strategy for building cost calculi for skeleton-based programming languages which can be used for derivational software development and which deals in a pragmatic way with the difficulties of composition.

87 citations


Journal Article
TL;DR: Nontrivial ways to use the Reconfigurable Mesh to solve several basic arithmetic problems in constant time are shown, based on novel number representations and on exploiting the reconfigurability of the architecture.

Journal Article
TL;DR: This paper explores affinity scheduling, a technique that helps reduce cache misses by preferentially scheduling a process on a processor where it has run recently, and shows that it is extremely simple to add to existing schedulers.
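A minimal sketch of the dispatch rule described above, assuming a single shared FIFO ready queue: when processor `cpu` becomes free, it first looks for a ready process that last ran on it (whose cache contents may still be warm) and otherwise takes the head of the queue. The `Process` class and `pick_next` function are hypothetical, not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Process:
    pid: int
    last_cpu: Optional[int] = None  # processor this process last ran on


def pick_next(ready: deque, cpu: int) -> Process:
    """Affinity-aware dispatch for processor `cpu` (raises IndexError if `ready` is empty).

    Prefer the first ready process that last ran on `cpu`; otherwise fall back
    to plain FIFO order by taking the head of the queue.
    """
    idx = next((i for i, proc in enumerate(ready) if proc.last_cpu == cpu), 0)
    proc = ready[idx]
    del ready[idx]
    proc.last_cpu = cpu  # record where it runs now, for future affinity decisions
    return proc
```

The change to an ordinary FIFO dispatcher is a single preference check, which matches the point that affinity is simple to retrofit. A production scheduler would also bound how long a process can be skipped, so that affinity does not cause starvation.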

Journal Article
TL;DR: Experimental results show that the storage system of the Onyx machine can potentially provide about 360 concurrent video accesses with guaranteed quality of service; the impact of different concurrent access patterns on server performance is also studied.

Journal Article
TL;DR: The RMRN is shown to be a truly scalable network: each node has a fixed degree of connectivity, and the reconfiguration mechanism ensures a network diameter of $O(\log_2 N)$ for an $N$-processor network.

Journal Article
TL;DR: The ultimate impact of fundamental physical limitations on parallel computing machines is considered, and it is found that scalability holds only for neighborly interconnections of bounded-size synchronous modules, presumably of the area-universal type.

Journal Article
TL;DR: An open architecture that achieves seamless binding between networking and multimedia devices is proposed and is embedded into a reference model for multimedia networking architectures that supports a clean separation between binding interfaces and binding algorithms.

Journal Article
TL;DR: This paper derives the optimal values of λ for the k-ary n-cube network and its variants (the ring, the torus, the chain, and the mesh), and concludes that the GDE method favors high-dimensional k-ary n-cubes.
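To show where λ enters, the sketch below runs one sweep of dimension-exchange load balancing on an even ring, assuming the usual GDE (generalized dimension exchange) update in which a matched pair (i, j) sets w_i to (1 - λ) w_i + λ w_j and symmetrically for w_j. The ring's edges are split into two matchings that are processed in turn. The optimal λ values derived in the paper are not reproduced here; λ = 0.5 below is only an example, and `gde_sweep` is a hypothetical name.

```python
def gde_sweep(load: list[float], lam: float) -> list[float]:
    """One sweep of dimension-exchange balancing on an even ring (illustrative sketch).

    Edges are split into two matchings (colors): (0,1), (2,3), ... and
    (1,2), (3,4), ..., (k-1,0). In each phase every matched pair (i, j)
    simultaneously sets w_i <- (1 - lam) * w_i + lam * w_j, and likewise for j.
    """
    k = len(load)
    if k % 2 != 0:
        raise ValueError("this sketch needs an even ring so two colors suffice")
    w = [float(x) for x in load]
    for start in (0, 1):  # one phase per edge color
        new = w[:]
        for i in range(start, k, 2):
            j = (i + 1) % k
            new[i] = (1 - lam) * w[i] + lam * w[j]
            new[j] = (1 - lam) * w[j] + lam * w[i]
        w = new
    return w
```

Each pairwise exchange conserves w_i + w_j, so total load is preserved for any λ; λ only controls how quickly the imbalance decays. On a four-node ring, one sweep with λ = 0.5 turns [8, 0, 0, 0] into a perfectly balanced [2, 2, 2, 2].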

Journal Article
TL;DR: A new, constructive and efficient method is presented to determine the optimal (i.e., with smallest latency) affine-by-statement scheduling, and it is shown that these schedules are asymptotically as efficient as parameter-dependent solutions while much more regular.

Journal Article
TL;DR: Results of porting and executing the NPB kernels in three different cluster environments, using low- to medium-powered workstations on Ethernet and on two types of FDDI networks, indicate that mediocre to good performance could be obtained despite the communication-intensive nature of the applications.

Journal Article
TL;DR: A fine-grained parallel implementation of the MPEG-2 video encoder on the Intel Paragon XP/S parallel computer is presented. Its data-parallel approach, which exploits parallelism within each frame, makes it suitable for real-time applications in which the complete video sequence may not be present on disk and instead becomes available frame by frame over time.

Journal Article
TL;DR: This paper introduces the reader to statistical global time estimation by presenting two methods from the literature, and uses a detailed experimental analysis of the estimation error observed on samples to show how a good balance between sample-period length and global time precision can be achieved.

Journal Article
TL;DR: This work presents a new approach, suitable for direct simulations, that avoids all-to-all communication without requiring any geometric clustering, and proves to be fastest for simulations of up to several thousand particles.

Journal Article
TL;DR: It is shown that update-based cache protocols can perform significantly better than write-invalidate protocols when a write cache is incorporated in each processing node, which drastically reduces the memory-access penalty associated with coherence misses.

Journal Article
TL;DR: This paper presents an evaluation of three software implementations of release consistency, which allow data communication to be aggregated and allow multiple writers to simultaneously modify a single page, and shows that the lazy protocols consistently outperform the eager protocol for all but one application and the lazy hybrid performs the best overall.

Journal Article
TL;DR: This work gives several positive answers to the self-simulation problem on dynamically reconfigurable meshes, showing that the simulation of a reconfigurable mesh by a smaller one can be carried out optimally, using standard methods, on meshes in which buses are established along rows or along columns.

Journal Article
TL;DR: This paper deals with the problem of deadlock detection in asynchronous message-passing systems under a system model that covers unspecified receptions and non-FIFO channels, and presents a hierarchy of algorithms.

Journal Article
TL;DR: This paper describes and evaluates the use of aggregates in a programming language, and evaluates language support in CA for composing multiaccess data abstractions (delegation, first-class messages, and first-class and user-defined continuations).

Journal Article
TL;DR: This fault-tolerant termination detection algorithm, for distributed systems in which processes are prone to failure, has shorter detection delays than existing algorithms in the literature and comparable message complexity.

Journal Article
TL;DR: A simple linear-time heuristic is presented which embeds an arbitrary binary tree into a hypercube with expansion 1 and average dilation no more than 2 and extends good embeddings for parity-balanced binary trees to arbitrary binary trees.

Journal Article
TL;DR: In this article, the compiler is given dense code and automatically converts it into code that operates on sparse data structures; the dependence information obtained by analyzing the original dense code can then be used to exploit potential concurrency in the generated sparse code.
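To make the dense-to-sparse idea concrete, the sketch below pairs a dense matrix-vector loop with a hand-written equivalent over Compressed Sparse Row (CSR) storage. CSR and these function names are assumptions chosen for illustration; this is not the output of the compiler described above. It does show why dependence facts carry over: the dense loop's rows are independent, and so are the rows of the sparse loop.

```python
def dense_matvec(A: list[list[float]], x: list[float]) -> list[float]:
    """The 'dense' input program: y[i] = sum over j of A[i][j] * x[j]."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def to_csr(A: list[list[float]]):
    """Compressed Sparse Row storage: nonzero values, their columns, row start offsets."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse version of the same loop nest; touches only the stored nonzeros."""
    y = []
    for i in range(len(row_ptr) - 1):  # rows are independent, as in the dense i-loop
        acc = 0
        for p in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[p] * x[col_idx[p]]
        y.append(acc)
    return y
```

Because each row's accumulation is independent in the dense version, the outer loop of `csr_matvec` can likewise be distributed across processors without further analysis of the sparse data structure.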