
Showing papers in "Journal of Parallel and Distributed Computing" in 2008


Journal Article
TL;DR: This paper uses NVIDIA's C-like CUDA language and an engineering sample of their recently introduced GTX 260 GPU to explore the effectiveness of GPUs for a variety of application types, and describes some specific coding idioms that improve their performance on the GPU.

660 citations


Journal Article
TL;DR: In the latest release, this package is improved to enable direct blocking/non-blocking communication of numeric arrays, and to support almost all MPI-2 features.

295 citations


Journal Article
TL;DR: Accelerating an advanced magnetic resonance imaging (MRI) reconstruction algorithm on NVIDIA's Quadro FX 5600 achieves up to 180 GFLOPS; reconstruction takes just over one minute on the Quadro, while on a quad-core CPU it is twenty-one times slower.

268 citations


Journal Article
TL;DR: The LDCP algorithm provides a practical solution for scheduling parallel applications with high communication costs in heterogeneous distributed computing systems (HeDCSs), and it outperforms the HEFT and DLS algorithms in terms of schedule length and speedup.

216 citations
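
HEFT, one of the baselines named above, orders tasks by their upward rank: a task's average cost plus the largest (communication cost + rank) over its successors. The sketch below computes that rank for a small DAG as a reference point for the comparison; it is not the LDCP algorithm, and the dictionary-based input format and the helper name upward_ranks are assumptions.

```python
def upward_ranks(succ, avg_cost, avg_comm):
    """HEFT-style upward rank (hypothetical helper, not LDCP).

    rank(n) = avg_cost[n] + max over successors s of (avg_comm[(n, s)] + rank(s)),
    and rank(n) = avg_cost[n] for exit tasks.
    succ: dict task -> list of successor tasks (must form a DAG).
    avg_cost: dict task -> average execution cost over processors.
    avg_comm: dict (task, successor) -> average communication cost.
    """
    memo = {}

    def rank(n):
        if n not in memo:
            memo[n] = avg_cost[n] + max(
                (avg_comm[(n, s)] + rank(s) for s in succ.get(n, [])),
                default=0,
            )
        return memo[n]

    return {n: rank(n) for n in avg_cost}


succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
cost = {"a": 2, "b": 3, "c": 4, "d": 1}
comm = {("a", "b"): 1, ("a", "c"): 5, ("b", "d"): 2, ("c", "d"): 1}
ranks = upward_ranks(succ, cost, comm)
# Scheduling in decreasing rank order gives HEFT's task priority list.
priority = sorted(ranks, key=ranks.get, reverse=True)
```

Here task "a" gets rank 13 because its heaviest path runs through the expensive a→c edge.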


Journal Article
TL;DR: The algorithm has O(n log n) complexity; for lists of 8M elements on a single GeForce 8800 GTS-512, it is 2.5 times as fast as bitonic sort algorithms, whose standard complexity is O(n log² n).

203 citations
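
For a sense of the O(n log² n) baseline, here is a sequential Python simulation of a bitonic sorting network (not the paper's hybrid GPU algorithm). It also counts compare-exchange operations, which total (n/2)·k(k+1)/2 for n = 2^k. Taking n = 2^23 for "8M elements" gives 2^22 × 276 ≈ 1.16 × 10^9 operations, versus n log₂ n ≈ 1.93 × 10^8, a ratio of (k+1)/4 = 6. The measured 2.5× speedup is well below that ratio, a reminder that operation counts ignore constant factors.

```python
def bitonic_sort(values):
    """Sort a list whose length is a power of two with a bitonic sorting network.

    Returns (sorted_list, compare_exchange_count). Every (size, stride) stage
    performs n/2 compare-exchanges, so the count is (n/2) * k * (k + 1) / 2
    for n = 2**k, i.e. O(n log^2 n).
    """
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    comparisons = 0
    size = 2
    while size <= n:                    # length of the bitonic runs being merged
        stride = size // 2
        while stride > 0:               # distance between compared elements
            for i in range(n):
                partner = i ^ stride
                if partner > i:
                    ascending = (i & size) == 0
                    comparisons += 1
                    # swap when the pair is out of order for this run's direction
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            stride //= 2
        size *= 2
    return a, comparisons
```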


Journal Article
TL;DR: This paper presents Jitter, a system that reduces the frequency on nodes that are assigned less computation and therefore have slack time. Jitter's goal is to make these nodes arrive "just in time", so that slowing them down does not increase overall execution time.

184 citations
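
The frequency choice behind "just in time" can be illustrated with a simplified static model, assuming each node's compute time scales as f_max / f and the most loaded node sets the arrival time. The hypothetical helper choose_frequencies picks, for each node, the lowest available frequency that does not delay it past that point. Jitter itself adapts at run time, so this only illustrates the idea.

```python
def choose_frequencies(work_at_fmax, freqs):
    """Hypothetical helper: lowest frequency per node that still arrives on time.

    work_at_fmax: compute time of each node at the highest frequency (seconds).
    freqs: available CPU frequencies (any consistent unit).
    Assumption: compute time scales as f_max / f, ignoring memory-bound effects.
    """
    f_max = max(freqs)
    deadline = max(work_at_fmax)        # the most loaded node sets the arrival time
    chosen = []
    for t in work_at_fmax:
        # f_max always qualifies, so the list is never empty
        ok = [f for f in sorted(freqs) if t * f_max / f <= deadline + 1e-12]
        chosen.append(ok[0])
    return chosen


# Node 0 is critical and stays at full speed; the others can slow down.
plan = choose_frequencies([10.0, 5.0, 8.0], [1.0, 1.4, 1.8, 2.0])
```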


Journal Article
TL;DR: This work proposes program optimization carving, a technique that begins with a complete optimization space and prunes it down to a set of configurations that are likely to contain the global maximum, and shows that this approach is significantly superior to random sampling of the search space.

137 citations
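
Pruning an optimization space down to promising candidates can be sketched as a Pareto filter. The sketch assumes every configuration has already been scored with two metrics where higher is better (the configuration names and scores below are hypothetical) and keeps only those that no other configuration beats on both. It is a minimal sketch of the pruning idea, not the paper's exact procedure.

```python
def pareto_prune(configs):
    """Keep configurations not dominated on two higher-is-better metrics.

    configs: dict mapping a configuration name to a (metric_a, metric_b) tuple.
    A configuration is dominated if another one is >= on both metrics and
    strictly > on at least one.
    """
    keep = []
    for name, (a, b) in configs.items():
        dominated = any(
            a2 >= a and b2 >= b and (a2 > a or b2 > b)
            for other, (a2, b2) in configs.items()
            if other != name
        )
        if not dominated:
            keep.append(name)
    return keep


# Hypothetical scores; only the surviving configurations would be timed.
scores = {
    "tile8": (0.5, 0.9),
    "tile16": (0.8, 0.7),
    "tile32": (0.6, 0.6),      # beaten by tile16 on both metrics
    "unroll4": (0.9, 0.2),
}
survivors = pareto_prune(scores)
```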


Journal Article
TL;DR: This work considers the problem of link scheduling in a sensor network employing a TDMA MAC protocol and develops a distributed edge-coloring algorithm, the first distributed algorithm that can edge-color a graph using at most Δ + 1 colors, where Δ is the maximum node degree.

104 citations
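
In TDMA link scheduling, each color of an edge coloring corresponds to a time slot, so two links that share a node never use the same slot. The sketch below is a simple sequential greedy edge coloring, not the distributed algorithm summarized above; greedy coloring can need up to 2Δ − 1 colors, which is why a Δ + 1 bound is notable.

```python
from collections import defaultdict


def greedy_edge_coloring(edges):
    """Give each edge (link) the smallest color (TDMA slot) unused at both endpoints.

    Sequential greedy: uses at most 2 * Delta - 1 colors, where Delta is the
    maximum node degree. This is NOT the distributed Delta + 1 algorithm.
    """
    used = defaultdict(set)            # node -> slots already taken by incident links
    coloring = {}
    for u, v in edges:
        c = 0
        while c in used[u] or c in used[v]:
            c += 1
        coloring[(u, v)] = c
        used[u].add(c)
        used[v].add(c)
    return coloring


slots = greedy_edge_coloring([(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)])
```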


Journal Article
TL;DR: This paper presents an overview of a typical plasma particle-in-cell (PIC) code, discusses its GPU implementation, and focuses on fast algorithms for Particle-To-Grid interpolation, the operation that is the performance bottleneck.

103 citations
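
Particle-to-grid interpolation scatters each particle's charge onto nearby grid points. Below is a minimal 1-D cloud-in-cell (linear weighting) sketch on a periodic grid; the weighting scheme, periodic boundary, and grid layout are assumptions, not details taken from the paper.

```python
import math


def deposit_charge_cic(positions, charges, num_cells, dx):
    """1-D cloud-in-cell particle-to-grid deposition on a periodic grid.

    Each particle's charge is split between the two nearest grid points in
    proportion to its distance from them. Returns charge density per point.
    """
    rho = [0.0] * num_cells
    for x, q in zip(positions, charges):
        s = x / dx
        left = math.floor(s)
        w = s - left                        # fractional distance past the left point
        i = left % num_cells                # periodic wrap-around
        rho[i] += q * (1.0 - w)
        rho[(i + 1) % num_cells] += q * w   # scatter: the GPU write-conflict step
    return [r / dx for r in rho]


density = deposit_charge_cic([0.25, 1.5], [1.0, 2.0], num_cells=4, dx=1.0)
```

Because different particles can add to the same grid point, a GPU version of the loop has to handle conflicting writes (for example with atomic additions or by sorting particles by cell), which is one reason this step is hard to make fast.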


Journal Article
TL;DR: This work claims that hypergraph partitioning with multiple constraints and fixed vertices should be implemented using direct K-way refinement, instead of the widely adopted recursive bisection paradigm.

92 citations
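
To make the objective concrete: a widely used cost for K-way hypergraph partitioning is the connectivity − 1 metric, where each net costs (λ − 1) times its weight and λ is the number of parts it touches. With multiple constraints, each vertex carries a weight vector and every part's load must be balanced per component. The sketch only evaluates a given partition under these assumptions; it does not perform direct K-way refinement, and the helper names are hypothetical.

```python
def connectivity_cutsize(nets, part, net_cost=None):
    """Connectivity-minus-one cutsize of a K-way hypergraph partition.

    nets: list of nets, each a list of vertex ids (its pins).
    part: dict mapping each vertex id to its part index.
    net_cost: optional list of net weights (defaults to 1 per net).
    """
    if net_cost is None:
        net_cost = [1] * len(nets)
    total = 0
    for pins, cost in zip(nets, net_cost):
        lam = len({part[v] for v in pins})   # number of parts this net touches
        total += cost * (lam - 1)
    return total


def part_loads(part, weights, k):
    """Per-part load for every constraint; weights[v] has one entry per constraint."""
    num_constraints = len(next(iter(weights.values())))
    loads = [[0] * num_constraints for _ in range(k)]
    for v, p in part.items():
        for c, w in enumerate(weights[v]):
            loads[p][c] += w
    return loads


nets = [[0, 1, 2], [2, 3], [0, 3]]
part = {0: 0, 1: 0, 2: 1, 3: 2}
vertex_weights = {0: (1, 0), 1: (1, 1), 2: (2, 0), 3: (1, 1)}
```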


Journal Article
TL;DR: The proposed algorithm, energy minimization with loop fusion and FU schedule (EMLFS), uses retiming and partitioning to fuse nested loops, and novel functional unit (FU) scheduling algorithms to maximize energy savings without sacrificing performance.

Journal Article
TL;DR: A new distributed self-diagnosis protocol, called Dynamic-DSDP, is developed for MANETs that identifies both hard and soft faults in a finite amount of time and is constructed on top of a reliable multi-hop architecture.

Journal Article
TL;DR: This paper presents parallel multilevel algorithms for the hypergraph partitioning problem, in particular, for parallel coarsening, parallel greedy k-way refinement and parallel multi-phase refinement and derives the isoefficiency function for these algorithms using an asymptotic theoretical performance model.

Journal Article
TL;DR: The stochastic robustness metric proposed in this research is based on a mathematical model in which the relationship between uncertainty in system parameters and its impact on system performance is described stochastically.
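
A minimal sketch of such a metric, assuming independent discrete execution-time distributions and makespan as the performance feature: convolve the task-time probability mass functions (pmfs) assigned to each machine to get its finish-time distribution, then take the probability that every machine finishes by a deadline τ. The input format, the independence assumption, and the helper names are illustrative choices, not the paper's model.

```python
def convolve(pmf_a, pmf_b):
    """Distribution of the sum of two independent discrete random variables."""
    out = {}
    for a, pa in pmf_a.items():
        for b, pb in pmf_b.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


def stochastic_robustness(allocation, tau):
    """Probability that every machine finishes by tau (hypothetical helper).

    allocation: list with one entry per machine; each entry is a list of
    task execution-time pmfs given as {time: probability} dicts.
    Assumes all task execution times are mutually independent.
    """
    prob = 1.0
    for task_pmfs in allocation:
        finish = {0: 1.0}                   # empty machine finishes at time 0
        for pmf in task_pmfs:
            finish = convolve(finish, pmf)
        prob *= sum(p for t, p in finish.items() if t <= tau)
    return prob


machine_a = [{1: 0.5, 2: 0.5}, {1: 0.5, 2: 0.5}]   # finish pmf {2: .25, 3: .5, 4: .25}
machine_b = [{3: 0.8, 5: 0.2}]
robustness = stochastic_robustness([machine_a, machine_b], tau=3)   # 0.75 * 0.8
```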

Journal Article
TL;DR: Squid is a peer-to-peer information discovery system that supports flexible searches with search guarantees; it effectively maps the multi-dimensional information space to physical peers while preserving lexical locality.
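
Mapping a multi-dimensional space to a one-dimensional key space while keeping nearby points close is typically done with a space-filling curve; Squid is described as using a Hilbert curve for this. The sketch below computes the 2-D Hilbert index of a grid cell with the standard iterative rotate-and-flip algorithm. How keywords are encoded as coordinates is left out; any such encoding would be an assumption.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its Hilbert index.

    Consecutive indices are always neighboring cells, so nearby points in the
    2-D space tend to receive nearby 1-D keys.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)        # which quadrant, in curve order
        # rotate/flip so the sub-curve in this quadrant has standard orientation
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Inverse table for a 4x4 grid: curve position -> cell.
order = {hilbert_index(4, x, y): (x, y) for x in range(4) for y in range(4)}
```

A range of consecutive indices covers a compact region of the space, so a DHT that assigns contiguous key ranges to peers can answer a query region by contacting a small number of peers.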

Journal Article
TL;DR: Simulation results show that the enhanced ODMRP (E-ODMRP) reduces overhead by up to 90% while keeping a packet delivery ratio similar to that of the original ODMRP.

Journal Article
TL;DR: This work identifies the matrix-matrix multiplication as a first natural entry-point for a minimally invasive integration of GPUs, and uses its GPU algorithm for PDE-constrained optimization problems and demonstrates that the commodity GPU is a useful co-processor for scientific applications.

Journal Article
TL;DR: A unified cost model is presented that captures the minimization of the total object transfer cost in the system, which in turn leads to effective utilization of storage space, replica consistency, fault-tolerance, and load-balancing.

Journal Article
TL;DR: For deque implementations on systems with low concurrency, the algorithm by Michael shows the best performance; however, because the proposed algorithm is designed for disjoint accesses, it performs significantly better on systems with high concurrency and non-uniform memory architecture.

Journal Article
TL;DR: For the cases studied here, the GENITOR technique finds the best results, but the faster two-phase greedy approach also performs very well.

Journal Article
TL;DR: This paper presents a scalable framework for parallelizing greedy graph coloring algorithms on distributed-memory computers; the framework unifies several existing algorithms and blends a variety of techniques for creating or facilitating concurrency.
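
One common technique for creating concurrency in distributed greedy coloring is speculation: processors color their vertices in parallel using possibly stale information about remote neighbors, then detect conflicts on cross-processor edges and recolor one endpoint in the next round. The sketch below simulates that scheme sequentially. The first-fit color choice, the rule that the endpoint with the larger vertex id is recolored, and the helper name are assumptions for illustration, not necessarily the framework's exact choices.

```python
def speculative_coloring(adj, owner):
    """Sequential simulation of speculative parallel greedy coloring.

    adj: dict vertex -> list of neighbors (undirected graph).
    owner: dict vertex -> processor id.
    Each round, every processor colors its pending vertices first-fit, seeing
    its own updates immediately but remote colors only as of the previous
    round. Cross-processor conflicts are then detected, and the endpoint with
    the larger id is uncolored and retried next round.
    Returns (coloring, number_of_rounds).
    """
    color = {}
    pending = set(adj)
    rounds = 0
    while pending:
        rounds += 1
        snapshot = dict(color)              # remote view: end of previous round
        for p in sorted(set(owner.values())):
            for v in sorted(u for u in pending if owner[u] == p):
                seen = set()
                for u in adj[v]:
                    view = color if owner[u] == p else snapshot
                    if u in view:
                        seen.add(view[u])
                c = 0
                while c in seen:            # first-fit: smallest free color
                    c += 1
                color[v] = c
        # conflict detection on cross-processor edges
        pending = {
            v for v in adj for u in adj[v]
            if owner[u] != owner[v] and color[u] == color[v] and v > u
        }
        for v in pending:
            del color[v]
    return color, rounds


# A 4-cycle split across two processors: round 1 conflicts, round 2 repairs it.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
coloring, num_rounds = speculative_coloring(cycle, {0: 0, 1: 1, 2: 0, 3: 1})
```

Only vertices colored in the same round can conflict, and the smallest pending vertex always keeps its color, so the loop terminates.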

Journal Article
TL;DR: This research proposes two replica selection techniques: the k-nearest algorithm, which shows a significant performance improvement over the traditional replica-catalog-based model, and a neural network predictive technique, which estimates the transfer time among sites more accurately than the multi-regression model.
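
A k-nearest-neighbor predictor for replica selection can be sketched as follows: for each site, average the observed times of the k past transfers most similar to the request, then pick the site with the lowest prediction. The feature vector (here just file size), the Euclidean distance, the history format, and the helper names are hypothetical choices, not the paper's.

```python
def knn_predict_transfer_time(history, query, k=3):
    """Average the observed times of the k past transfers closest to the query.

    history: list of (features, observed_seconds) pairs; features is a tuple
    of numbers (hypothetical: e.g. file size in MB).
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    nearest = sorted(history, key=lambda h: dist(h[0], query))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def select_replica(histories, query, k=3):
    """Pick the site with the lowest predicted transfer time for the query."""
    return min(histories,
               key=lambda site: knn_predict_transfer_time(histories[site], query, k))


past = {
    "siteA": [((100,), 10.0), ((200,), 20.0), ((300,), 30.0), ((1000,), 100.0)],
    "siteB": [((100,), 5.0), ((200,), 9.0), ((300,), 15.0)],
}
best_site = select_replica(past, (200,), k=3)
```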

Journal Article
TL;DR: This work develops a comprehensive set of performance modeling strategies for predicting execution times of parallel applications in both dedicated and non-dedicated environments, and finds that grid scheduling using these predicted execution times leads to perfect mapping of applications to resources in many cases.

Journal Article
TL;DR: A profile-driven online page migration scheme is introduced and it is demonstrated that cache miss profiles gathered from on-chip CPU monitors can be effectively used to guide dynamic page migrations in applications.

Journal Article
TL;DR: In the proposed algorithm, if the ancestor nodes of a join node are duplicated when scheduling the join node, the original allocations of these ancestor nodes are removed using a very efficient method.

Journal Article
TL;DR: PERD, a polynomial-based scheme for event region detection, builds an aggregation tree of sensor nodes; results show that PERD detects event(s) with a detection error that remains almost constant, within a 10% threshold, as the communication range increases.

Journal Article
TL;DR: This paper proposes a placement algorithm that finds the optimal locations for replicas so that their workload is balanced and describes new algorithms that ensure both workload balance and quality of service simultaneously.

Journal Article
TL;DR: This paper introduces a heuristic for the selection of resources based on a solution to the set covering problem (SCP), pairs this mapping heuristic with the well-known MinMin scheduling algorithm, and evaluates performance through extensive simulations.
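
Both halves of this pairing can be sketched briefly. The first function is the classic greedy approximation for set cover (assumed here; the paper's SCP solution may differ), choosing resources until every required item, such as a data file, is covered. The second is Min-Min: for each unscheduled task, find its minimum completion time over all machines, then schedule the task whose minimum is smallest. The expected-time-to-compute (ETC) matrix input and the example data are assumptions.

```python
def greedy_set_cover(universe, candidates):
    """Repeatedly pick the resource covering the most still-uncovered items.

    universe: set of required items; candidates: dict resource -> set of items.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(candidates, key=lambda r: len(candidates[r] & uncovered))
        if not candidates[best] & uncovered:
            raise ValueError("universe cannot be covered")
        chosen.append(best)
        uncovered -= candidates[best]
    return chosen


def min_min(etc):
    """Min-Min scheduling; etc[t][m] is the expected time of task t on machine m.

    Returns (assignment dict task -> machine, makespan).
    """
    num_machines = len(etc[0])
    ready = [0.0] * num_machines            # time at which each machine is free
    unscheduled = set(range(len(etc)))
    assignment = {}
    while unscheduled:
        best = None                         # (completion_time, task, machine)
        for t in sorted(unscheduled):
            m = min(range(num_machines), key=lambda j: ready[j] + etc[t][j])
            ct = ready[m] + etc[t][m]
            if best is None or ct < best[0]:
                best = (ct, t, m)
        ct, t, m = best
        assignment[t] = m
        ready[m] = ct
        unscheduled.remove(t)
    return assignment, max(ready)


hosts = greedy_set_cover({1, 2, 3, 4, 5},
                         {"r1": {1, 2, 3}, "r2": {2, 4}, "r3": {4, 5}, "r4": {5}})
schedule, makespan = min_min([[3, 5], [2, 4], [6, 1]])
```

One way to combine them is to let the set-cover step choose the candidate resources and then run Min-Min only over the machines it selected.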

Journal Article
TL;DR: A hash-based proximity clustering approach to load balancing in heterogeneous DHTs performs no worse than existing proximity-aware algorithms, exhibits strong resilience to the effect of churn, and greatly reduces the overhead of resilient randomized load balancing.

Journal Article
TL;DR: A performance model developed for the deployment design of IEEE 802.11s wireless mesh networks (WMNs) uses seven metrics to analyze the state of a WMN and includes novel mechanisms for applying multiple evaluation criteria in WMN performance optimization.