Showing papers in "Journal of Parallel and Distributed Computing in 2008"

Journal Article•DOI•

A performance study of general-purpose applications on graphics processors using CUDA

Shuai Che¹, Michael Boyer¹, Jiayuan Meng¹, David Tarjan¹, Jeremy W. Sheaffer¹, Kevin Skadron¹•Institutions (1)

University of Virginia¹

01 Oct 2008-Journal of Parallel and Distributed Computing

TL;DR: This paper uses NVIDIA's C-like CUDA language and an engineering sample of their recently introduced GTX 260 GPU to explore the effectiveness of GPUs for a variety of application types, and describes some specific coding idioms that improve their performance on the GPU.

660 citations

Journal Article•DOI•

MPI for Python: Performance improvements and MPI-2 extensions

Lisandro Dalcin¹, Rodrigo R. Paz¹, Mario A. Storti¹, Jorge D'Elía¹•Institutions (1)

Intec, Inc.¹

01 May 2008-Journal of Parallel and Distributed Computing

TL;DR: In the latest release, this package is improved to enable direct blocking/non-blocking communication of numeric arrays, and to support almost all MPI-2 features.

295 citations

Journal Article•DOI•

Accelerating advanced MRI reconstructions on GPUs

Sam S. Stone¹, Justin P. Haldar¹, Stephanie C. Tsao¹, Wen-mei W. Hwu¹, Bradley P. Sutton¹, Zhi-Pei Liang¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Oct 2008-Journal of Parallel and Distributed Computing

TL;DR: An advanced magnetic resonance imaging reconstruction algorithm accelerated on NVIDIA's Quadro FX 5600 achieves up to 180 GFLOPS; the reconstruction takes just over one minute on the Quadro, while reconstruction on a quad-core CPU is twenty-one times slower.

268 citations

Journal Article•DOI•

Research Note: A high performance algorithm for static task scheduling in heterogeneous distributed computing systems

Mohammad I. Daoud¹, Nawwaf Kharma¹•Institutions (1)

Concordia University¹

01 Apr 2008-Journal of Parallel and Distributed Computing

TL;DR: The LDCP algorithm provides a practical solution for scheduling parallel applications with high communication costs in heterogeneous distributed computing systems (HeDCSs) and outperforms the HEFT and DLS algorithms in terms of schedule length and speedup.

216 citations

Journal Article•DOI•

Fast parallel GPU-sorting using a hybrid algorithm

Erik Sintorn¹, Ulf Assarsson¹•Institutions (1)

Chalmers University of Technology¹

01 Oct 2008-Journal of Parallel and Distributed Computing

TL;DR: The algorithm is of complexity n log n, and for lists of 8 M elements and using a single GeForce 8800 GTS-512, it is 2.5 times as fast as the bitonic sort algorithms, which have a standard complexity of n (log n)^2.

203 citations

Journal Article•DOI•

Just-in-time dynamic voltage scaling: Exploiting inter-node slack to save energy in MPI programs

Vincent W. Freeh¹, Nandini Kappiah¹, David K. Lowenthal², Tyler Bletsch¹•Institutions (2)

North Carolina State University¹, University of Georgia²

01 Sep 2008-Journal of Parallel and Distributed Computing

TL;DR: This paper presents a system called Jitter, which reduces the frequency on nodes that are assigned less computation and therefore have slack time. The goal of Jitter is to ensure that these nodes arrive "just in time" so that they avoid increasing overall execution time.

184 citations

Journal Article•DOI•

Program optimization carving for GPU computing

Shane Ryoo¹, Christopher I. Rodrigues¹, Sam S. Stone¹, John A. Stratton¹, Sain-Zee Ueng¹, Sara S. Baghsorkhi¹, Wen-mei W. Hwu¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

01 Oct 2008-Journal of Parallel and Distributed Computing

TL;DR: This work proposes program optimization carving, a technique that begins with a complete optimization space and prunes it down to a set of configurations that are likely to contain the global maximum, and shows that this approach is significantly superior to random sampling of the search space.

137 citations

Journal Article•DOI•

Link scheduling in wireless sensor networks: Distributed edge-coloring revisited

S. Gandham, Milind Dawande¹, Ravi Prakash¹•Institutions (1)

University of Texas at Dallas¹

01 Aug 2008-Journal of Parallel and Distributed Computing

TL;DR: This work considers the problem of link scheduling in a sensor network employing a TDMA MAC protocol and develops the first distributed algorithm that can edge-color a graph using at most (Δ+1) colors.

104 citations

Journal Article•DOI•

Fast parallel Particle-To-Grid interpolation for plasma PIC simulations on the GPU

George Stantchev¹, William Dorland¹, Nail A. Gumerov¹•Institutions (1)

University of Maryland, College Park¹

01 Oct 2008-Journal of Parallel and Distributed Computing

TL;DR: This paper presents an overview of a typical plasma PIC code, discusses its GPU implementation, and focuses on fast algorithms for the performance bottleneck operation of Particle-To-Grid interpolation.

103 citations

Journal Article•DOI•

Multi-level direct K-way hypergraph partitioning with multiple constraints and fixed vertices

Cevdet Aykanat¹, B. Barla Cambazoglu², Bora Uçar•Institutions (2)

Bilkent University¹, Ohio State University²

01 May 2008-Journal of Parallel and Distributed Computing

TL;DR: This work claims that hypergraph partitioning with multiple constraints and fixed vertices should be implemented using direct K-way refinement, instead of the widely adopted recursive bisection paradigm.

92 citations

Journal Article•DOI•

Energy minimization with loop fusion and multi-functional-unit scheduling for multidimensional DSP

Meikang Qiu¹, Edwin H.-M. Sha², Meilin Liu³, Man Lin⁴, Shaoxiong Hua⁵, Laurence T. Yang⁴•Institutions (5)

University of New Orleans¹, University of Texas at Dallas², Wright State University³, St. Francis Xavier University⁴, Synopsys⁵

01 Apr 2008-Journal of Parallel and Distributed Computing

TL;DR: An algorithm, energy minimization with loop fusion and FU schedule (EMLFS), is proposed that uses retiming and partition to fuse nested loops and uses novel FU scheduling algorithms to maximize energy saving without sacrificing performance.

Journal Article•DOI•

A distributed fault identification protocol for wireless and mobile ad hoc networks

Mourad Elhadef¹, Azzedine Boukerche¹, Hisham Elkadiki¹•Institutions (1)

University of Ottawa¹

01 Mar 2008-Journal of Parallel and Distributed Computing

TL;DR: A new distributed self-diagnosis protocol, called Dynamic-DSDP, is developed for MANETs that identifies both hard and soft faults in a finite amount of time and is constructed on top of a reliable multi-hop architecture.

Journal Article•DOI•

Parallel multilevel algorithms for hypergraph partitioning

Aleksandar Trifunovic¹, William J. Knottenbelt¹•Institutions (1)

Imperial College London¹

01 May 2008-Journal of Parallel and Distributed Computing

TL;DR: This paper presents parallel multilevel algorithms for the hypergraph partitioning problem, in particular for parallel coarsening, parallel greedy k-way refinement, and parallel multi-phase refinement, and derives the isoefficiency function for these algorithms using an asymptotic theoretical performance model.

Journal Article•DOI•

Stochastic robustness metric and its use for static resource allocations

Vladimir Shestak¹, Jay Smith¹, Anthony A. Maciejewski¹, Howard Jay Siegel¹•Institutions (1)

Colorado State University¹

01 Aug 2008-Journal of Parallel and Distributed Computing

TL;DR: The stochastic robustness metric proposed in this research is based on a mathematical model in which the relationship between uncertainty in system parameters and its impact on system performance is described stochastically.

Journal Article•DOI•

Squid: Enabling search in DHT-based systems

C. Schmidt¹, Manish Parashar¹•Institutions (1)

Rutgers University¹

01 Jul 2008-Journal of Parallel and Distributed Computing

TL;DR: Squid is a peer-to-peer information discovery system that supports flexible searches and provides search guarantees; it effectively maps the multi-dimensional information space to physical peers while preserving lexical locality.

Journal Article•DOI•

E-ODMRP: Enhanced ODMRP with motion adaptive refresh

Soon Y. Oh¹, Joon-Sang Park², Mario Gerla¹•Institutions (2)

University of California, Los Angeles¹, Hongik University²

01 Aug 2008-Journal of Parallel and Distributed Computing

TL;DR: Simulation results show that the enhanced ODMRP (E-ODMRP) reduces overhead by up to 90% while keeping a packet delivery ratio similar to that of the original ODMRP.

Journal Article•DOI•

Algorithmic performance studies on graphics processing units

Olaf Schenk¹, Matthias Christen¹, Helmar Burkhart¹•Institutions (1)

University of Basel¹

01 Oct 2008-Journal of Parallel and Distributed Computing

TL;DR: This work identifies the matrix-matrix multiplication as a first natural entry-point for a minimally invasive integration of GPUs, and uses its GPU algorithm for PDE-constrained optimization problems and demonstrates that the commodity GPU is a useful co-processor for scientific applications.

Journal Article•DOI•

Comparison and analysis of ten static heuristics-based Internet data replication techniques

Samee U. Khan¹, Ishfaq Ahmad¹•Institutions (1)

University of Texas at Arlington¹

01 Feb 2008-Journal of Parallel and Distributed Computing

TL;DR: A unified cost model is presented that captures the minimization of the total object transfer cost in the system, which in turn leads to effective utilization of storage space, replica consistency, fault-tolerance, and load-balancing.

Journal Article•DOI•

Lock-free deques and doubly linked lists

Håkan Sundell¹, Philippas Tsigas¹•Institutions (1)

Chalmers University of Technology¹

01 Jul 2008-Journal of Parallel and Distributed Computing

TL;DR: Considering deque implementations and systems with low concurrency, the algorithm by Michael shows the best performance. However, as the authors' algorithm is designed for disjoint accesses, it performs significantly better on systems with high concurrency and non-uniform memory architecture.

Journal Article•DOI•

Static resource allocation for heterogeneous computing environments with tasks having dependencies, priorities, deadlines, and multiple versions

Tracy D. Braun¹, Howard Jay Siegel², Anthony A. Maciejewski², Ye Hong²•Institutions (2)

Purdue University¹, Colorado State University²

01 Nov 2008-Journal of Parallel and Distributed Computing

TL;DR: It is shown that for the cases studied here, the GENITOR technique finds the best results, but the faster two-phase greedy approach also performs very well.

Journal Article•DOI•

A framework for scalable greedy coloring on distributed-memory parallel computers

Doruk Bozdağ¹, Assefaw H. Gebremedhin², Fredrik Manne³, Erik G. Boman⁴, Ümit V. Çatalyürek¹•Institutions (4)

Ohio State University¹, Old Dominion University², University of Bergen³, Sandia National Laboratories⁴

01 Apr 2008-Journal of Parallel and Distributed Computing

TL;DR: In this paper, a scalable framework for parallelizing greedy graph coloring algorithms on distributed-memory computers is presented, which unifies several existing algorithms and blends a variety of techniques for creating or facilitating concurrency.

Journal Article•DOI•

Replica selection strategies in data grid

Rashedur M. Rahman¹, Reda Alhajj¹, Ken Barker¹•Institutions (1)

University of Calgary¹

01 Dec 2008-Journal of Parallel and Distributed Computing

TL;DR: This research proposes two different replica selection techniques, including the k-nearest algorithm, which shows a significant performance improvement over the traditional replica catalog based model, and the neural network predictive technique which estimates the transfer time among sites more accurately than the multi-regression model.

Journal Article•DOI•

Performance modeling of parallel applications for grid scheduling

H. A. Sanjay¹, Sathish Vadhiyar¹•Institutions (1)

Indian Institute of Science¹

01 Aug 2008-Journal of Parallel and Distributed Computing

TL;DR: This work has developed a comprehensive set of performance modeling strategies for predicting execution times of parallel applications on both dedicated and non-dedicated environments and found that grid scheduling using predictions of execution times from the performance modeling techniques will lead to perfect mapping of applications to resources in many cases.

Journal Article•DOI•

Hardware monitors for dynamic page migration

Mustafa M. Tikir¹, Jeffrey K. Hollingsworth²•Institutions (2)

San Diego Supercomputer Center¹, University of Maryland, College Park²

01 Sep 2008-Journal of Parallel and Distributed Computing

TL;DR: A profile-driven online page migration scheme is introduced and it is demonstrated that cache miss profiles gathered from on-chip CPU monitors can be effectively used to guide dynamic page migrations in applications.

Journal Article•DOI•

Task scheduling algorithm using minimized duplications in homogeneous systems

Kwang-Sik Shin¹, Myongjin Cha¹, Mun-Suck Jang¹, Jin-Ha Jung¹, Wan-Oh Yoon¹, Sang-Bang Choi¹•Institutions (1)

Inha University¹

01 Aug 2008-Journal of Parallel and Distributed Computing

TL;DR: In the proposed algorithm, if the ancestor nodes of a join node are duplicated when scheduling the join node, the original allocations of these ancestor nodes are removed using a very efficient method.

Journal Article•DOI•

Fault tolerant multiple event detection in a wireless sensor network

Torsha Banerjee¹, Bin Xie¹, Dharma P. Agrawal¹•Institutions (1)

University of Cincinnati¹

01 Sep 2008-Journal of Parallel and Distributed Computing

TL;DR: A Polynomial-based scheme that addresses the problems of Event Region Detection (PERD) by using an aggregation tree of sensor nodes. Results show that events can be detected by PERD with the detection error remaining almost constant, staying within a threshold of 10% as the communication range increases.

Journal Article•DOI•

Optimal replica placement in hierarchical Data Grids with locality assurance

Jan-Jan Wu¹, Yi-Fang Lin², Pangfeng Liu²•Institutions (2)

Academia Sinica¹, National Taiwan University²

01 Dec 2008-Journal of Parallel and Distributed Computing

TL;DR: This paper proposes a placement algorithm that finds the optimal locations for replicas so that their workload is balanced and describes new algorithms that ensure both workload balance and quality of service simultaneously.

Journal Article•DOI•

An SCP-based heuristic approach for scheduling distributed data-intensive applications on global grids

Srikumar Venugopal¹, Rajkumar Buyya¹•Institutions (1)

University of Melbourne¹

01 Apr 2008-Journal of Parallel and Distributed Computing

TL;DR: This paper introduces a heuristic for the selection of resources based on a solution to the set covering problem (SCP), pairs this mapping heuristic with the well-known MinMin scheduling algorithm, and conducts performance evaluation through extensive simulations.

Journal Article•DOI•

Hash-based proximity clustering for efficient load balancing in heterogeneous DHT networks

Haiying Shen¹, Cheng-Zhong Xu²•Institutions (2)

University of Arkansas¹, Wayne State University²

01 May 2008-Journal of Parallel and Distributed Computing

TL;DR: A hash-based proximity clustering approach for load balancing in heterogeneous DHTs that performs no worse than existing proximity-aware algorithms and exhibits strong resilience to the effect of churn, and greatly reduces the overhead of resilient randomized load balancing.

Journal Article•DOI•

Performance model for IEEE 802.11s wireless mesh network deployment design

Timo Vanhatupa¹, Marko Hännikäinen¹, Timo Hämäläinen¹•Institutions (1)

Tampere University of Technology¹

01 Mar 2008-Journal of Parallel and Distributed Computing

TL;DR: A performance model developed for the deployment design of IEEE 802.11s Wireless Mesh Networks contains seven metrics to analyze the state of WMN, and novel mechanisms to use multiple evaluation criteria in WMN performance optimization.
