
Showing papers in "Parallel Processing Letters in 2007"


Journal ArticleDOI
TL;DR: The inter-relationships between graph problems, software, and parallel hardware in the current state of the art are presented and the range of these challenges suggests a research agenda for the development of scalable high-performance software for graph problems.
Abstract: Graph algorithms are becoming increasingly important for solving many problems in scientific computing, data mining and other domains. As these problems grow in scale, parallel computing resources are required to meet their computational and memory requirements. Unfortunately, the algorithms, software, and hardware that have worked well for developing mainstream parallel scientific applications are not necessarily effective for large-scale graph problems. In this paper we present the inter-relationships between graph problems, software, and parallel hardware in the current state of the art and discuss how those issues present inherent challenges in solving large-scale graph problems. The range of these challenges suggests a research agenda for the development of scalable high-performance software for graph problems.

488 citations


Journal ArticleDOI
TL;DR: In this paper, a lazy list-based implementation of a concurrent set object is presented, which is based on an optimistic locking scheme for inserts and removes and includes a simple wait-free membership test.
Abstract: We present a novel “lazy” list-based implementation of a concurrent set object. It is based on an optimistic locking scheme for inserts and removes and includes a simple wait-free membership test. Our algorithm improves on the performance of all previous such algorithms.

143 citations
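
For illustration, a minimal Python sketch of the lazy-list discipline summarized above: per-node locks, validation after locking, logical deletion via a marked flag, and a membership test that only traverses the list. The class and method names are illustrative and not the paper's code; in particular, the original algorithm's wait-free guarantee for the membership test does not carry over to this toy version.

```python
import threading

class Node:
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node
        self.marked = False            # logical deletion flag
        self.lock = threading.Lock()   # per-node lock

class LazyList:
    """Sorted linked list holding keys between sentinel bounds."""
    def __init__(self):
        self.tail = Node(float('inf'))
        self.head = Node(float('-inf'), self.tail)

    def _validate(self, pred, curr):
        # Both nodes must be unmarked and still adjacent after locking.
        return (not pred.marked) and (not curr.marked) and pred.next is curr

    def add(self, key):
        while True:
            pred, curr = self.head, self.head.next
            while curr.key < key:
                pred, curr = curr, curr.next
            with pred.lock, curr.lock:
                if self._validate(pred, curr):
                    if curr.key == key:
                        return False          # already present
                    pred.next = Node(key, curr)
                    return True
            # validation failed: another thread interfered, retry optimistically

    def remove(self, key):
        while True:
            pred, curr = self.head, self.head.next
            while curr.key < key:
                pred, curr = curr, curr.next
            with pred.lock, curr.lock:
                if self._validate(pred, curr):
                    if curr.key != key:
                        return False
                    curr.marked = True        # logical delete first
                    pred.next = curr.next     # then physical unlink
                    return True

    def contains(self, key):
        # Membership test traverses without taking any locks.
        curr = self.head
        while curr.key < key:
            curr = curr.next
        return curr.key == key and not curr.marked
```

In Python the global interpreter lock hides most of the concurrency benefit; the sketch only illustrates the locking and validation discipline.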


Journal ArticleDOI
TL;DR: A Kolmogorov-Uspensky machine is implemented on the plasmodium of the slime mold Physarum polycephalum, and basic operations and elements of programming are illustrated.
Abstract: We implement a Kolmogorov-Uspensky machine on the plasmodium of the slime mold Physarum polycephalum. We provide experimental findings on the realization of the machine instructions and illustrate basic operations and elements of programming.

122 citations


Journal ArticleDOI
TL;DR: Cases where these combinatorial optimization problems are polynomial are identified, for example when the edges of a given color form a connected subgraph, and hardness and non-approximability results are given otherwise.
Abstract: This article investigates complexity and approximability properties of combinatorial optimization problems yielded by the notion of Shared Risk Resource Group (SRRG). SRRG has been introduced in order to capture network survivability issues where a failure may break a whole set of resources, and has been formalized as colored graphs, where a set of resources is represented by a set of edges with the same color. We consider here the analogues of classical problems such as determining paths or cuts with the minimum number of colors, or color-disjoint paths. These optimization problems are much more difficult than their counterparts in classical graph theory. In particular, standard relationships such as the Max Flow - Min Cut equality no longer hold. In this article we identify cases where these problems are polynomial, for example when the edges of a given color form a connected subgraph, and otherwise give hardness and non-approximability results for these problems.

92 citations
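
To make the "minimum number of colors" objective concrete, here is a hedged brute-force sketch (not one of the article's algorithms): it searches color subsets in order of increasing size and returns the smallest subset whose edges alone connect the two endpoints.

```python
from itertools import combinations

def min_color_path(edges, s, t):
    """edges: iterable of (u, v, color). Returns the smallest set of colors
    whose edges alone connect s to t, or None if they cannot be connected.
    Exponential in the number of colors -- illustration only."""
    colors = sorted({c for _, _, c in edges})

    def connected(allowed):
        adj = {}
        for u, v, c in edges:
            if c in allowed:
                adj.setdefault(u, []).append(v)
                adj.setdefault(v, []).append(u)
        stack, seen = [s], {s}
        while stack:
            u = stack.pop()
            if u == t:
                return True
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return False

    for k in range(1, len(colors) + 1):
        for subset in combinations(colors, k):
            if connected(set(subset)):
                return set(subset)
    return None

# One color ('red') already connects 'a' to 'd' via 'b'.
print(min_color_path([('a', 'b', 'red'), ('b', 'd', 'red'),
                      ('a', 'c', 'blue'), ('c', 'd', 'green')], 'a', 'd'))
```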


Journal ArticleDOI
TL;DR: The challenges facing any new language for scalable parallel computing are described, including the strong competition presented by MPI and the existing Partitioned Global Address Space (PGAS) languages.
Abstract: We present a summary of the current state of DARPA's HPCS language project. We describe the challenges facing any new language for scalable parallel computing, including the strong competition presented by MPI and the existing Partitioned Global Address Space (PGAS) languages. We identify some of the major features of the proposed languages, using MPI and the PGAS languages for comparison, and describe the opportunities for higher productivity along with the implementation challenges. Finally, we present the conclusions of a recent workshop in which a concrete plan for the next few years was proposed.

50 citations


Journal ArticleDOI
TL;DR: This paper studies the reasons behind Grid job failures in the context of EGEE, the largest Grid infrastructure currently in operation, and proposes the architecture for a system that could provide failure management support to administrators and end-users of large-scale Grid infrastructures like EGEE.
Abstract: The emergence of Grid infrastructures like EGEE has enabled the deployment of large-scale computational experiments that address challenging scientific problems in various fields. However, to realize their full potential, Grid infrastructures need to achieve a higher degree of dependability, i.e., they need to improve the ratio of Grid-job requests that complete successfully in the presence of Grid-component failures. To achieve this, however, we need to determine, analyze and classify the causes of job failures on Grids. In this paper we study the reasons behind Grid job failures in the context of EGEE, the largest Grid infrastructure currently in operation. We present points of failure in a Grid that affect the execution of jobs, and describe error types and contributing factors. We discuss various information sources that provide users and administrators with indications about failures, and assess their usefulness based on the accuracy and completeness of the error information. We present two real-life case studies of failures that occurred on a production site of EGEE, along with the troubleshooting process followed in each case. Finally, we propose the architecture for a system that could provide failure management support to administrators and end-users of large-scale Grid infrastructures like EGEE.

19 citations


Journal ArticleDOI
Fangpeng Dong
TL;DR: With the help of an abstract scheduling architecture, some key features of the task scheduling problem in the Grid are discussed, followed by a taxonomy of the scheduling algorithms.
Abstract: One motivation of Grid computing is to aggregate the power of widely distributed resources, and provide non-trivial services to users. To achieve this goal, efficient task scheduling algorithms are essential. However, scheduling algorithms in the Grid present high diversities that need to be classified. In this paper, with the help of an abstract scheduling architecture, some key features of the task scheduling problem in the Grid are discussed, followed by a taxonomy of the scheduling algorithms. Some typical examples are given in each category to present a picture of the current research and help to find new research problems.

16 citations
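
As one concrete example of the kind of heuristic such a taxonomy covers (chosen here purely for illustration, not taken from the survey), the classic Min-min heuristic maps each independent task to the resource giving it the earliest completion time and always schedules the task with the smallest such time first; the estimated execution times are assumed to be known in advance.

```python
def min_min(etc, n_tasks, n_hosts):
    """Min-min heuristic for independent tasks.
    etc[t][h] = estimated time to compute task t on host h (assumed known).
    Returns a task->host mapping and the resulting makespan."""
    ready = [0.0] * n_hosts              # time at which each host becomes free
    unmapped = set(range(n_tasks))
    mapping = {}
    while unmapped:
        # For every unmapped task, find its earliest completion time,
        # then commit the task whose earliest completion time is smallest.
        best = None
        for t in unmapped:
            h = min(range(n_hosts), key=lambda h: ready[h] + etc[t][h])
            ct = ready[h] + etc[t][h]
            if best is None or ct < best[0]:
                best = (ct, t, h)
        ct, t, h = best
        mapping[t] = h
        ready[h] = ct
        unmapped.remove(t)
    return mapping, max(ready)

# Two hosts, three tasks with heterogeneous costs.
etc = [[3.0, 6.0], [2.0, 1.0], [4.0, 8.0]]
print(min_min(etc, n_tasks=3, n_hosts=2))
```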


Journal ArticleDOI
TL;DR: This paper describes the three point-to-point communications protocols currently supported in the Open MPI implementation, together with performance data, including comparisons with other MPI implementations using the OpenIB, MX, and GM communications libraries.
Abstract: Open MPI's point-to-point communications abstractions, described in this paper, handle several different communications scenarios, with a portable, high-performance design and implementation. These abstractions support two types of low-level communication protocols – general-purpose point-to-point communications, like the OpenIB interface, and MPI-like interfaces, such as Myricom's MX library. Support for the first type of protocol makes use of all communications resources available to a given application run, with optional support for communications error recovery. The latter provides an interface layer that relies on the communications library to guarantee correct MPI message ordering and matching. This paper describes the three point-to-point communications protocols currently supported in the Open MPI implementation, together with performance data. This includes comparisons with other MPI implementations using the OpenIB, MX, and GM communications libraries.

15 citations
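
The protocols above sit below the MPI API; as a reminder of the interface they serve, here is a minimal point-to-point exchange using the mpi4py binding. This is an illustrative sketch, unrelated to Open MPI's internal protocol layering.

```python
# Run with: mpiexec -n 2 python ping.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Rank 0 sends a small Python object; message matching is by
    # (source, tag, communicator), independent of the underlying transport.
    comm.send({'payload': list(range(10))}, dest=1, tag=7)
    reply = comm.recv(source=1, tag=8)
    print('rank 0 got reply:', reply)
elif rank == 1:
    msg = comm.recv(source=0, tag=7)
    comm.send(len(msg['payload']), dest=0, tag=8)
```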


Journal ArticleDOI
TL;DR: There is a deterministic algorithm that, given a uniformly δ-sparse hypergraph and a positive integer k, outputs k or all of its minimal transversals in O(δ log(1 + k) polylog(δ|V|)) time using |V|^O(log δ) k^O(δ) processors.
Abstract: A hypergraph is called uniformly δ-sparse if, for every nonempty subset X ⊆ V of vertices, the average degree of the sub-hypergraph induced by X is at most δ. We show that there is a deterministic algorithm that, given a uniformly δ-sparse hypergraph and a positive integer k, outputs k or all of its minimal transversals in O(δ log(1 + k) polylog(δ|V|)) time using |V|^O(log δ) k^O(δ) processors. Equivalently, the algorithm can be used to compute in parallel k or all maximal independent sets of the hypergraph.

15 citations
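
For background, a transversal (hitting set) of a hypergraph intersects every hyperedge, and it is minimal if no proper subset does. The brute-force sketch below enumerates all minimal transversals of a tiny example hypergraph; it is a definition-by-example only and has none of the parallel complexity guarantees stated above.

```python
from itertools import combinations

def minimal_transversals(vertices, edges):
    """Enumerate all minimal transversals (minimal hitting sets) of a
    hypergraph by brute force -- exponential, for illustration only."""
    def hits_all(s):
        return all(s & e for e in edges)

    found = []
    for k in range(0, len(vertices) + 1):
        for cand in combinations(vertices, k):
            s = set(cand)
            # s is minimal iff it hits every edge and contains no
            # previously found (smaller) transversal.
            if hits_all(s) and not any(t <= s for t in found):
                found.append(s)
    return found

# Hyperedges over vertices {1, 2, 3, 4}.
edges = [{1, 2}, {2, 3}, {3, 4}]
print(minimal_transversals([1, 2, 3, 4], edges))   # {1,3}, {2,3}, {2,4}
```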


Journal ArticleDOI
TL;DR: This analysis focuses on the performance of individual codes for finite systems, and addresses several important heretofore unanswered questions about employing LDPC codes in real-world systems.
Abstract: As peer-to-peer and widely distributed storage systems proliferate, the need to perform efficient erasure coding, instead of replication, is crucial to performance and efficiency. Low-Density Parity-Check (LDPC) codes have arisen as alternatives to standard erasure codes, such as Reed-Solomon codes, trading off vastly improved decoding performance for inefficiencies in the amount of data that must be acquired to perform decoding. The scores of papers written on LDPC codes typically analyze their collective and asymptotic behavior. Unfortunately, their practical application requires the generation and analysis of individual codes for finite systems. This paper attempts to illuminate the practical considerations of LDPC codes for peer-to-peer and distributed storage systems. The three main types of LDPC codes are detailed, and a huge variety of codes are generated, then analyzed using simulation. This analysis focuses on the performance of individual codes for finite systems, and addresses several important heretofore unanswered questions about employing LDPC codes in real-world systems.

9 citations
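
For background on the decoding-overhead trade-off mentioned above, the sketch below shows the standard iterative ("peeling") erasure decoder used with LDPC-style codes: whenever a parity check involves exactly one erased symbol, that symbol is recovered as the XOR of the others. The tiny set of checks is invented for illustration and is not one of the codes analyzed in the paper.

```python
def peel_decode(checks, symbols):
    """checks: list of index lists; each check asserts XOR of its symbols == 0.
    symbols: list of ints or None (None = erased). Decodes in place and
    returns True if every symbol was recovered."""
    progress = True
    while progress:
        progress = False
        for idxs in checks:
            erased = [i for i in idxs if symbols[i] is None]
            if len(erased) == 1:
                # Recover the single missing symbol from the parity constraint.
                val = 0
                for i in idxs:
                    if symbols[i] is not None:
                        val ^= symbols[i]
                symbols[erased[0]] = val
                progress = True
    return all(s is not None for s in symbols)

# Four symbols, two checks; symbol 2 was lost.
checks = [[0, 1, 2], [1, 2, 3]]
symbols = [1, 0, None, 1]
ok = peel_decode(checks, symbols)
print(ok, symbols)   # True, with symbols[2] recovered as 1 ^ 0 = 1
```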


Journal ArticleDOI
TL;DR: This paper is intended as a survey of the state of the art of some branches of Biomolecular Computing, a field in full development, with the promise of important results from the perspective of both Computer Science and Biology.
Abstract: This paper is intended as a survey of the state of the art of some branches of Biomolecular Computing. Biomolecular Computing aims to use biological hardware (bioware), rather than chips, to build a computer. We discuss the following three main research directions: DNA computing, membrane systems, and gene assembly in ciliates. DNA computing combines practical results with theoretical algorithm design. Various search problems have been implemented using DNA strands. Membrane systems are a family of computational models inspired by the membrane structure of living cells. The process of gene assembly in ciliates has been formalized as an abstract computational model. Biomolecular Computing is a field in full development, with the promise of important results from the perspective of both Computer Science (models of computation) and Biology (understanding biological processes).

Journal ArticleDOI
TL;DR: It is shown that clustering either by requested time, requested number of processors, or the product of the two generally produces more accurate predictions than earlier, more naive, approaches and that automatic clustering outperforms administrator-determined clustering.
Abstract: Most space-sharing parallel computers presently operated by production high-performance computing centers use batch-queuing systems to manage processor allocation. In many cases, users wishing to use these batch-queued resources may choose among different queues (charging different amounts) potentially on a number of machines to which they have access. In such a situation, the amount of time a user's job will wait in any one batch queue can be a significant portion of the overall time from job submission to job completion. It thus becomes desirable to provide a prediction for the amount of time a given job can expect to wait in the queue. Further, it is natural to expect that attributes of an incoming job, specifically the number of processors requested and the amount of time requested, might impact that job's wait time. In this work, we explore the possibility of generating accurate predictions by automatically grouping jobs having similar attributes using model-based clustering. Moreover, we implement this clustering technique for a time series of jobs so that predictions of future wait times can be generated in real time. Using trace-based simulation on data from 7 machines over a 9-year period from across the country, comprising over one million job records, we show that clustering either by requested time, requested number of processors, or the product of the two generally produces more accurate predictions than earlier, more naive, approaches and that automatic clustering outperforms administrator-determined clustering.
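
A hedged sketch of the general idea (not the paper's model-based clustering machinery): group historical jobs by their requested attributes, here with k-means over requested time and processor count, and predict a new job's wait as a high quantile of the waits observed in its cluster. The use of scikit-learn and the synthetic history are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_wait_predictor(req_time, req_procs, observed_wait, n_clusters=4, q=0.95):
    """Cluster historical jobs on (requested time, requested processors) and
    keep a per-cluster wait-time quantile as the prediction. Inputs are 1-D
    arrays over past jobs; an illustration, not the paper's method."""
    observed_wait = np.asarray(observed_wait, dtype=float)
    X = np.column_stack([req_time, req_procs]).astype(float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    preds = {c: np.quantile(observed_wait[km.labels_ == c], q)
             for c in range(n_clusters)}

    def predict(job_time, job_procs):
        c = km.predict(np.array([[job_time, job_procs]], dtype=float))[0]
        return preds[c]
    return predict

# Synthetic history: 200 jobs with correlated requests and waits.
rng = np.random.default_rng(0)
rt = rng.uniform(10, 600, 200)             # requested minutes
rp = rng.integers(1, 64, 200)              # requested processors
wait = rt * 0.5 + rp * 2 + rng.exponential(30, 200)
predict = fit_wait_predictor(rt, rp, wait)
print(predict(job_time=120, job_procs=16))
```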

Journal ArticleDOI
Ke Qiu
TL;DR: This work uses a general neighbourhood broadcasting scheme to develop a neighbourhood broadcasting algorithm for the star interconnection network that is asymptotically optimal, conceptually simple, and easy to implement since routing for all nodes involved is uniform.
Abstract: The neighbourhood broadcasting problem in an interconnection network is defined as sending a fixed sized message from the source node to all its neighbours in a single-port model. Previously, this problem has been studied for several interconnection networks including the hypercube and the star. The objective of such works has been to minimize the total number of steps required for the neighbourhood broadcasting algorithms. Here, we first use a general neighbourhood broadcasting scheme to develop a neighbourhood broadcasting algorithm for the star interconnection network that is asymptotically optimal, conceptually simple, and easy to implement since routing for all nodes involved is uniform. It uses the cycle structures of the star graph as well as the standard technique of recursive doubling. We then show that the scheme for the star network is general enough to be applied to a broader family of interconnection networks such as the pancake interconnection network for which no previous neighbourhood broadcasting algorithm is known, resulting in asymptotically optimal algorithms. Finally, we use this scheme to develop neighbourhood broadcasting algorithms for multiple messages for several interconnection networks.

Journal ArticleDOI
TL;DR: The NP-completeness of the problem of scheduling and redistributing data on master-slave platforms is proved, and optimal polynomial algorithms for special important topologies are presented.
Abstract: In this work we are interested in the problem of scheduling and redistributing data on master-slave platforms. We consider the case where the workers possess initial loads, some of which have to be redistributed in order to balance their completion times. We assume that the data consists of independent and identical tasks. We prove the NP-completeness of the problem for fully heterogeneous platforms. We also present optimal polynomial algorithms for special important topologies: a simple greedy algorithm for homogeneous star networks, and a more complicated algorithm for platforms with homogeneous communication links and heterogeneous workers.
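
To make the setting concrete, here is a hedged sketch (not the paper's algorithm) of greedy redistribution on a homogeneous star: with identical workers and identical links, balancing completion times amounts to repeatedly shifting tasks from the most-loaded worker to the least-loaded one.

```python
def redistribute(loads):
    """loads[i] = number of unit tasks initially held by worker i (homogeneous
    workers and links assumed). Returns a list of (src, dst, amount) transfers
    and the balanced loads; a sketch, not the paper's optimal schedule."""
    loads = list(loads)
    transfers = []
    while True:
        hi = max(range(len(loads)), key=lambda i: loads[i])
        lo = min(range(len(loads)), key=lambda i: loads[i])
        if loads[hi] - loads[lo] <= 1:      # as balanced as integers allow
            break
        move = (loads[hi] - loads[lo]) // 2  # split the difference
        loads[hi] -= move
        loads[lo] += move
        transfers.append((hi, lo, move))
    return transfers, loads

print(redistribute([10, 2, 7, 1]))   # converges to loads of 5 tasks each
```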

Journal ArticleDOI
Tom Head
TL;DR: An algorithm for solving instances of the Boolean satisfiability problem is given and illustrated using a photocopying machine with plastic transparencies as medium and requires the assumption that information can be stored with a density that is exponential in the number of variables in the problem instance.
Abstract: We continue to search for methods of parallel computing using light. An algorithm for solving instances of the Boolean satisfiability problem is given and illustrated using a photocopying machine with plastic transparencies as medium. The algorithm solves satisfiability problems in linear time but requires the assumption that information can be stored with a density that is exponential in the number of variables in the problem instance. Consideration is given to situations in which this density limitation is not quite absolute.

Journal ArticleDOI
TL;DR: Deterministic solutions for the Write-All and iterative Write-All problems are considered in the fail-stop synchronous CRCW PRAM model, where memory access concurrency needs to be controlled.
Abstract: The abstract problem of using P failure-prone processors to cooperatively update all locations of an N-element shared array is called Write-All. Solutions to Write-All can be used iteratively to construct efficient simulations of PRAM algorithms on failure-prone PRAMs. Such use of Write-All in simulations is abstracted in terms of the iterative Write-All problem. The efficiency of the algorithmic solutions for Write-All and iterative Write-All is measured in terms of work complexity, where all processing steps taken by the processors are counted. This paper considers deterministic solutions for the Write-All and iterative Write-All problems in the fail-stop synchronous CRCW PRAM model, where memory access concurrency needs to be controlled. A deterministic algorithm of Kanellakis, Michailidis, and Shvartsman [16] efficiently solves the Write-All problem in this model, while controlling read and write memory access concurrency. However, it was not shown how the number of processor failures f affects the work efficiency of the algorithm. The results herein give a new analysis of the algorithm of [16] that obtains failure-sensitive work bounds, while retaining the known memory access concurrency bounds. Specifically, the new result expresses the work bound as a function of N, P, and f. Another contribution of this paper is a new failure-sensitive analysis for iterative Write-All with controlled memory access concurrency. This result yields tighter bounds on work (vs. [16]) for simulations of PRAM algorithms on fail-stop PRAMs.

Journal ArticleDOI
TL;DR: This paper presents an island-based parallelization of five multi-objective evolutionary algorithms: NSGAII, SPEA2, PESA, msPESA, and a new hybrid version they propose; experimental results indicate that the quality of the solutions tends to improve when the number of islands increases.
Abstract: Recently, the research interest in multi-objective optimization has increased remarkably. Most of the proposed methods use a population of solutions that are simultaneously improved, trying to approximate them to the Pareto-optimal front. When the population size increases, the quality of the solutions tends to be better, but the runtime is higher. This paper presents how to apply parallel processing to enhance the convergence to the Pareto-optimal front without increasing the runtime. In particular, we present an island-based parallelization of five multi-objective evolutionary algorithms: NSGAII, SPEA2, PESA, msPESA, and a new hybrid version we propose. Experimental results on some test problems indicate that the quality of the solutions tends to improve when the number of islands increases.
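
A minimal sketch of the island scheme itself, not of NSGAII, SPEA2, PESA, or msPESA: subpopulations evolve independently and periodically exchange their best individuals in a ring. The toy single-objective fitness, the (mu+1)-style evolution step, and the sequential stepping of islands are all illustrative simplifications; in a parallel run each island would evolve in its own process.

```python
import random

def evolve(pop, fitness, steps=20, mut=0.1):
    """Very small (mu+1)-style evolution of one island's population."""
    for _ in range(steps):
        parent = min(pop, key=fitness)                      # minimization
        child = [g + random.gauss(0, mut) for g in parent]
        worst = max(range(len(pop)), key=lambda i: fitness(pop[i]))
        if fitness(child) < fitness(pop[worst]):
            pop[worst] = child
    return pop

def island_model(n_islands=4, pop_size=10, dim=5, epochs=10):
    fitness = lambda x: sum(g * g for g in x)               # toy objective
    islands = [[[random.uniform(-5, 5) for _ in range(dim)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for _ in range(epochs):
        # Islands are stepped sequentially here for clarity; each call is
        # independent and could run on its own processor.
        islands = [evolve(pop, fitness) for pop in islands]
        # Ring migration: the best of island i replaces the worst of island i+1.
        bests = [min(pop, key=fitness) for pop in islands]
        for i, best in enumerate(bests):
            nxt = islands[(i + 1) % n_islands]
            worst = max(range(len(nxt)), key=lambda j: fitness(nxt[j]))
            nxt[worst] = list(best)
    return min((min(pop, key=fitness) for pop in islands), key=fitness)

random.seed(1)
print(island_model())
```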

Journal ArticleDOI
TL;DR: In this article, the authors address the problem of accurately estimating the runtime and communication time of a client request in a Network Enabled Server (NES) middleware such as GridSolve, using a template based model for the runtime estimation and a client-server communication test for the transfer time estimation.
Abstract: In this paper we address the problem of accurately estimating the runtime and communication time of a client request in a Network Enabled Server (NES) middleware such as GridSolve. We use a template-based model for the runtime estimation and a client-server communication test for the transfer-time estimation. We implement these two mechanisms in GridSolve and test them on a real testbed. Experiments show that they allow for significant improvements in client execution time in various scenarios.
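
A hedged sketch of what a template-based runtime estimate can look like (the concrete templates used in GridSolve may differ): assume the service's runtime follows a power law in the problem size, fit the template's coefficients by least squares in log space on past runs, and predict from the fitted curve.

```python
import numpy as np

def fit_runtime_template(sizes, runtimes):
    """Fit runtime ~= a * size**b by linear least squares in log space.
    Returns a predictor function. Illustrative template only."""
    logn = np.log(np.asarray(sizes, dtype=float))
    logt = np.log(np.asarray(runtimes, dtype=float))
    b, log_a = np.polyfit(logn, logt, 1)          # slope, intercept
    a = np.exp(log_a)
    return lambda n: a * n ** b

# Past executions of the same service with different problem sizes.
sizes = [100, 200, 400, 800]
runtimes = [0.11, 0.45, 1.8, 7.1]                 # roughly quadratic growth
predict = fit_runtime_template(sizes, runtimes)
print(predict(1600))                              # extrapolated estimate
```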

Journal ArticleDOI
TL;DR: A new analytical model of Disha in wormhole-routed k-ary n-cubes is proposed, and simulation experiments confirm that the proposed model exhibits a good degree of accuracy for various network sizes and under different traffic conditions.
Abstract: A number of analytical models for predicting message latency in k-ary n-cubes have recently been reported in the literature. Most of these models, however, have been discussed for adaptive routing algorithms based on deadlock avoidance, e.g. Duato's routing. Several research studies have empirically demonstrated that routing algorithms based on deadlock recovery offer maximal adaptivity that can result in considerable improvement in network performance. Disha is an example of a true fully adaptive routing algorithm that uses minimal hardware to implement a simple and efficient progressive method to recover from potential deadlocks. This paper proposes a new analytical model of Disha in wormhole-routed k-ary n-cubes. Simulation experiments confirm that the proposed model exhibits a good degree of accuracy for various network sizes and under different traffic conditions.

Journal ArticleDOI
TL;DR: A new layer-based architecture for building grids that can execute parallel programs based on legacy code is presented, and the performance of its software components is validated with benchmarks.
Abstract: In this paper, we present a new architecture for building grids that can execute parallel programs based on legacy code. The architecture is layer-based, and the performance of its software components is validated with benchmarks. To illustrate the construction of a grid using the proposed architecture, we develop a case study consisting of a grid oriented toward the efficient execution of Java bytecode, for which we validate and integrate legacy parallel linear algebra code.

Journal ArticleDOI
TL;DR: This work decomposes the automaton into pieces and uses OpenMP to parallelize the process; results show that, by using a decomposition procedure and distributing the mesh among a set of processors, 3D Cellular Automata can be studied without long execution times.
Abstract: This paper describes our research on using Genetic Programming to obtain transition rules for Cellular Automata, which are one type of massively parallel computing system. Our purpose is to determine the existence of a limit of chaos for three-dimensional Cellular Automata, empirically demonstrated for the two-dimensional case. To do so, we must study statistical properties of 3D Cellular Automata over long simulation periods. When dealing with big three-dimensional meshes, applying the transition rule to the whole structure can become an extremely slow task. In this work we decompose the automaton into pieces and use OpenMP to parallelize the process. Results show that, by using a decomposition procedure and distributing the mesh among a set of processors, 3D Cellular Automata can be studied without long execution times.
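
A hedged sketch of the decomposition idea only (the OpenMP implementation and the GP-evolved rules are out of scope): split the 3D mesh into slabs along one axis, update each slab from the previous generation plus one ghost layer per side, and concatenate the results; with a worker pool the slab updates could run concurrently. The 3D "majority of face neighbours" rule used here is a placeholder, not an evolved rule.

```python
import numpy as np

def step_slab(prev, z0, z1):
    """Update rows z0:z1 of a 3D automaton from the previous generation,
    fetching one ghost layer on each side (periodic boundaries).
    Placeholder rule: a cell becomes 1 iff >= 3 of its 6 face neighbours are 1."""
    nz = prev.shape[0]
    idx = np.arange(z0 - 1, z1 + 1) % nz          # slab plus ghost layers
    local = prev[idx]
    core = local[1:-1]
    neigh = local[:-2] + local[2:]                # z-neighbours
    for axis in (1, 2):                           # y- and x-neighbours (periodic)
        neigh = neigh + np.roll(core, 1, axis) + np.roll(core, -1, axis)
    return (neigh >= 3).astype(np.uint8)

def step(grid, n_slabs=4):
    # Slabs depend only on the previous generation, so each call to
    # step_slab is independent and could be dispatched to its own worker.
    bounds = np.linspace(0, grid.shape[0], n_slabs + 1, dtype=int)
    slabs = [step_slab(grid, bounds[i], bounds[i + 1]) for i in range(n_slabs)]
    return np.concatenate(slabs, axis=0)

rng = np.random.default_rng(0)
grid = rng.integers(0, 2, size=(32, 32, 32), dtype=np.uint8)
for _ in range(10):
    grid = step(grid)
print(grid.sum(), "live cells after 10 generations")
```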