Showing papers in "Journal of Parallel and Distributed Computing in 2013"


Journal ArticleDOI
TL;DR: This work formulates the problem of virtual machine allocation in clouds as a combinatorial auction problem and proposes two mechanisms to solve it. Extensive simulation experiments reveal that the combinatorial auction-based mechanisms can significantly improve allocation efficiency while generating higher revenue for cloud providers.

254 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present an overview of current GPU programming strategies, profile-driven development, and an outlook to future trends, as well as a discussion of the challenges of getting started with GPU programming.

203 citations


Journal ArticleDOI
TL;DR: This paper proposes a cooperative method to verify the positions of potential Sybil nodes, introduces a statistical method, and designs a system able to verify where a vehicle comes from, termed the Presence Evidence System (PES).

173 citations


Journal ArticleDOI
TL;DR: This paper deals with a GPU implementation of Ant Colony Optimization (ACO), a population-based optimization method which comprises two major stages: tour construction and pheromone update, and proposes a new mechanism called I-Roulette to replicate the classic roulette wheel while improving GPU parallelism.

122 citations


Journal ArticleDOI
TL;DR: A novel algorithm to solve the non-negative single-source shortest path problem on road networks and graphs with low highway dimension that needs fewer operations, has better locality, and is better able to exploit parallelism at multi-core and instruction levels.

115 citations


Journal ArticleDOI
TL;DR: Simulation results demonstrate that GjODE is better than, or at least comparable to, six other algorithms, and employing GPU can effectively reduce computational time.

105 citations


Journal ArticleDOI
TL;DR: A comparative experimental study highlights the performance impact of ACO parameters, GPU technical configuration, memory structures and parallelization granularity on a state-of-the-art Fermi GPU architecture.

105 citations


Journal ArticleDOI
TL;DR: The CRO scheme is used to formulate the scheduling of Directed Acyclic Graph (DAG) jobs in heterogeneous computing systems, and a Double Molecular Structure-based Chemical Reaction Optimization (DMSCRO) method is developed.

97 citations


Journal ArticleDOI
TL;DR: The design of the archive, in particular of the standard FTA data format, and the design of a toolbox that facilitates automated analysis of trace data sets are described, and how different interpretations of the meaning of failure data can result in different conclusions for failure modeling and job scheduling in distributed systems are shown.

90 citations


Journal ArticleDOI
TL;DR: A MapReduce-based algorithm is proposed to mine event association rules; it utilizes the computational resources of multiple dedicated nodes of the system and achieves nearly ideal speedup compared to centralized mining approaches.

86 citations


Journal ArticleDOI
Sujatha R. Upadhyaya
TL;DR: MapReduce is another important technique that has evolved during this period and, according to the literature, has proved to be an important aid in delivering performance of machine learning algorithms on GPUs.

Journal ArticleDOI
TL;DR: This work introduces a multicore-oblivious (MO) approach to algorithms and schedulers for HM, and presents efficient MO algorithms for several fundamental problems including matrix transposition, FFT, sorting, the Gaussian Elimination Paradigm, list ranking, and connected components.

Journal ArticleDOI
TL;DR: This paper proposes a robust and scalable mechanism that aims to detect malicious anomalies accurately and efficiently using distributed in-network processing in a hierarchical framework, with more than 96% less communication overhead than a centralized approach.

Journal ArticleDOI
TL;DR: A high-level and well-defined Multiscale Modeling Language (MML), which describes and specifies multiscale models and their computational architecture in a modular way, is enhanced and applied to two selected applications in nanotechnology and biophysics, showing its capabilities.

Journal ArticleDOI
TL;DR: In this paper, a multi-GPU SPH program is developed for free-surface flows based on a spatial decomposition technique, whereby different portions (subdomains) of the physical system under study are assigned to different GPUs.

Journal ArticleDOI
TL;DR: The KNEM module for the Linux kernel is presented that provides MPI implementations with a flexible and scalable interface for performing kernel-assisted single-copy data transfers between local processes and brings significant application performance improvements thanks to more efficient point-to-point and collective operations.

Journal ArticleDOI
TL;DR: The development of an MPI/OpenCL implementation of LU, an application-level benchmark from the NAS Parallel Benchmark Suite, is reported, demonstrating the importance of memory arrangement and work-item/work-group distribution strategies when applications are deployed on different device types.

Journal ArticleDOI
TL;DR: MaRCO is implemented in Hadoop's MapReduce and it is shown that on a 128-node Amazon EC2 cluster, MaRCO achieves 23% average speed-up over Hadoop for shuffle-heavy MapReductions.

Journal ArticleDOI
TL;DR: The experimental analysis confirms that the proposed K-Means algorithm is very accurate and fault tolerant under unreliable network conditions (message loss and node failures) and is suitable for asynchronous networks of very large and extreme scale.

Journal ArticleDOI
TL;DR: The MapReduce computing framework is designed for distributed computing on massive data sets, and the new algorithm leverages MapReduce techniques to enable processing of graphs with billions of vertices, including graphs that follow a power law degree distribution.

Journal ArticleDOI
TL;DR: GraphCell improves state-of-the-art solutions, especially for larger problems, and it provides an alternative to the GPU Min-min heuristic when more accurate solutions are needed, at the expense of an increased runtime.

Journal ArticleDOI
TL;DR: Detailed analysis of parallelisation possibilities, memory organisation and access patterns enables the implementation of a fast and effective heuristic for QAP on the GPU: the Parallel Multistart Tabu Search (PMTS).

Journal ArticleDOI
TL;DR: This paper attempts to expand PIC's data scalability by implementing a parallel power iteration clustering (p-PIC) algorithm that works well on low-end commodity computers (COTS-based clusters and general purpose servers found at most commercial cloud providers).

Journal ArticleDOI
TL;DR: A novel DAG scheduling approach is proposed to solve this stochastic scheduling problem, based on a Monte Carlo method, and empirical results show that a significant improvement of average application performance can be achieved by the proposed approach at a reasonable execution time cost.

Journal ArticleDOI
TL;DR: A novel parallel SVM training implementation is proposed to accelerate the cross validation procedure by running multiple training tasks simultaneously on a Graphics Processing Unit (GPU) to reduce redundant computations of kernel values across different training tasks.

Journal ArticleDOI
TL;DR: The results demonstrate that the proposed clustering algorithms can significantly improve the data reception ratio, reduce the total energy consumption in the network and prolong network lifetime compared to a typical distributed clustering algorithm, HEED, that does not consider lossy links.

Journal ArticleDOI
TL;DR: This work proposes both a transcription of existing GP parallelization strategies into the OpenCL programming platform and a freely available implementation to evaluate its suitability for GP, by assessing the performance of parallel strategies on the CPU and GPU processors from different vendors.

Journal ArticleDOI
TL;DR: General-Purpose Computation with Graphics Processing Units (GPGPU) is applied, in conjunction with a wildfire simulation model based on the Cellular Automata approach, to the process of BPM building.

Journal ArticleDOI
TL;DR: The main idea was to design and implement an MSA method that can take advantage of modern graphics cards. The method is based on T-Coffee, which is well known for its high-accuracy MSA algorithm, and is highly efficient, achieving up to 193-fold speedup on a single GPU.

Journal ArticleDOI
TL;DR: An extensive experimental study shows that the MPTS R-tree traversal algorithm on an NVIDIA Tesla M2090 GPU consistently outperforms the traditional recursive R-tree search algorithm on Intel Xeon E5506 processors.