
Showing papers in "Parallel Processing Letters in 2007"


Journal ArticleDOI
TL;DR: The inter-relationships between graph problems, software, and parallel hardware in the current state of the art are presented and the range of these challenges suggests a research agenda for the development of scalable high-performance software for graph problems.
Abstract: Graph algorithms are becoming increasingly important for solving many problems in scientific computing, data mining and other domains. As these problems grow in scale, parallel computing resources are required to meet their computational and memory requirements. Unfortunately, the algorithms, software, and hardware that have worked well for developing mainstream parallel scientific applications are not necessarily effective for large-scale graph problems. In this paper we present the inter-relationships between graph problems, software, and parallel hardware in the current state of the art and discuss how those issues present inherent challenges in solving large-scale graph problems. The range of these challenges suggests a research agenda for the development of scalable high-performance software for graph problems.

488 citations


Journal ArticleDOI
TL;DR: In this paper, a lazy list-based implementation of a concurrent set object is presented, which is based on an optimistic locking scheme for inserts and removes and includes a simple wait-free membership test.
Abstract: We present a novel “lazy” list-based implementation of a concurrent set object. It is based on an optimistic locking scheme for inserts and removes and includes a simple wait-free membership test. Our algorithm improves on the performance of all previous such algorithms.

143 citations
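
For illustration, a minimal Python sketch of the lazy-list discipline summarized above: per-node locks, validation after locking, logical deletion via a marked flag, and a membership test that only traverses the list. The class and method names are illustrative and not the paper's code; in particular, the original algorithm's wait-free guarantee for the membership test does not carry over to this toy version.

```python
import threading

class Node:
    def __init__(self, key, next_node=None):
        self.key = key
        self.next = next_node
        self.marked = False            # logical deletion flag
        self.lock = threading.Lock()   # per-node lock

class LazyList:
    """Sorted linked list holding keys between sentinel bounds."""
    def __init__(self):
        self.tail = Node(float('inf'))
        self.head = Node(float('-inf'), self.tail)

    def _validate(self, pred, curr):
        # Both nodes must be unmarked and still adjacent after locking.
        return (not pred.marked) and (not curr.marked) and pred.next is curr

    def add(self, key):
        while True:
            pred, curr = self.head, self.head.next
            while curr.key < key:
                pred, curr = curr, curr.next
            with pred.lock, curr.lock:
                if self._validate(pred, curr):
                    if curr.key == key:
                        return False          # already present
                    pred.next = Node(key, curr)
                    return True
            # validation failed: another thread interfered, retry optimistically

    def remove(self, key):
        while True:
            pred, curr = self.head, self.head.next
            while curr.key < key:
                pred, curr = curr, curr.next
            with pred.lock, curr.lock:
                if self._validate(pred, curr):
                    if curr.key != key:
                        return False
                    curr.marked = True        # logical delete first
                    pred.next = curr.next     # then physical unlink
                    return True

    def contains(self, key):
        # Membership test traverses without taking any locks.
        curr = self.head
        while curr.key < key:
            curr = curr.next
        return curr.key == key and not curr.marked
```

In Python the global interpreter lock hides most of the concurrency benefit; the sketch only illustrates the locking and validation discipline.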


Journal ArticleDOI
TL;DR: A Kolmogorov-Uspensky machine is implemented on the plasmodium of the slime mold Physarum polycephalum, and basic operations and elements of programming are illustrated.
Abstract: We implement a Kolmogorov-Uspensky machine on the plasmodium of the slime mold Physarum polycephalum. We provide experimental findings on the realization of the machine instructions and illustrate basic operations and elements of programming.

122 citations


Journal ArticleDOI
TL;DR: Cases where these combinatorial optimization problems are polynomial are identified, for example when the edges of a given color form a connected subgraph, and hardness and non-approximability results are given otherwise.
Abstract: This article investigates complexity and approximability properties of combinatorial optimization problems yielded by the notion of Shared Risk Resource Group (SRRG). SRRG has been introduced in order to capture network survivability issues where a failure may break a whole set of resources, and has been formalized as colored graphs, where a set of resources is represented by a set of edges with the same color. We consider here the analogues of classical problems such as determining paths or cuts with the minimum number of colors, or color-disjoint paths. These optimization problems are much more difficult than their counterparts in classical graph theory. In particular, standard relationships such as the Max Flow - Min Cut equality no longer hold. In this article we identify cases where these problems are polynomial, for example when the edges of a given color form a connected subgraph, and otherwise give hardness and non-approximability results for these problems.

92 citations
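
To make the "minimum number of colors" objective concrete, here is a hedged brute-force sketch (not one of the article's algorithms): it searches color subsets in order of increasing size and returns the smallest subset whose edges alone connect the two endpoints.

```python
from itertools import combinations

def min_color_path(edges, s, t):
    """edges: iterable of (u, v, color). Returns the smallest set of colors
    whose edges alone connect s to t, or None if they cannot be connected.
    Exponential in the number of colors -- illustration only."""
    colors = sorted({c for _, _, c in edges})

    def connected(allowed):
        adj = {}
        for u, v, c in edges:
            if c in allowed:
                adj.setdefault(u, []).append(v)
                adj.setdefault(v, []).append(u)
        stack, seen = [s], {s}
        while stack:
            u = stack.pop()
            if u == t:
                return True
            for v in adj.get(u, []):
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return False

    for k in range(1, len(colors) + 1):
        for subset in combinations(colors, k):
            if connected(set(subset)):
                return set(subset)
    return None

# One color ('red') already connects 'a' to 'd' via 'b'.
print(min_color_path([('a', 'b', 'red'), ('b', 'd', 'red'),
                      ('a', 'c', 'blue'), ('c', 'd', 'green')], 'a', 'd'))
```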


Journal ArticleDOI
TL;DR: The challenges facing any new language for scalable parallel computing are described, including the strong competition presented by MPI and the existing Partitioned Global Address Space (PGAS) languages.
Abstract: We present a summary of the current state of DARPA's HPCS language project. We describe the challenges facing any new language for scalable parallel computing, including the strong competition presented by MPI and the existing Partitioned Global Address Space (PGAS) languages. We identify some of the major features of the proposed languages, using MPI and the PGAS languages for comparison, and describe the opportunities for higher productivity along with the implementation challenges. Finally, we present the conclusions of a recent workshop in which a concrete plan for the next few years was proposed.

50 citations


Journal ArticleDOI
TL;DR: This paper studies the reasons behind Grid job failures in the context of EGEE, the largest Grid infrastructure currently in operation, and proposes the architecture for a system that could provide failure management support to administrators and end-users of large-scale Grid infrastructures like EGEE.
Abstract: The emergence of Grid infrastructures like EGEE has enabled the deployment of large-scale computational experiments that address challenging scientific problems in various fields. However, to realize their full potential, Grid infrastructures need to achieve a higher degree of dependability, i.e., they need to improve the ratio of Grid-job requests that complete successfully in the presence of Grid-component failures. To achieve this, however, we need to determine, analyze and classify the causes of job failures on Grids. In this paper we study the reasons behind Grid job failures in the context of EGEE, the largest Grid infrastructure currently in operation. We present points of failure in a Grid that affect the execution of jobs, and describe error types and contributing factors. We discuss various information sources that provide users and administrators with indications about failures, and assess their usefulness based on the accuracy and completeness of the error information. We present two real-life case studies of failures that occurred on a production site of EGEE, along with the troubleshooting process followed in each case. Finally, we propose the architecture for a system that could provide failure management support to administrators and end-users of large-scale Grid infrastructures like EGEE.

19 citations


Journal ArticleDOI
Fangpeng Dong
TL;DR: With the help of an abstract scheduling architecture, some key features of the task scheduling problem in the Grid are discussed, followed by a taxonomy of the scheduling algorithms.
Abstract: One motivation of Grid computing is to aggregate the power of widely distributed resources, and provide non-trivial services to users. To achieve this goal, efficient task scheduling algorithms are essential. However, scheduling algorithms in the Grid present high diversities that need to be classified. In this paper, with the help of an abstract scheduling architecture, some key features of the task scheduling problem in the Grid are discussed, followed by a taxonomy of the scheduling algorithms. Some typical examples are given in each category to present a picture of the current research and help to find new research problems.

16 citations
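
As one concrete example of the kind of heuristic such a taxonomy covers (chosen here purely for illustration, not taken from the survey), the classic Min-min heuristic maps each independent task to the resource giving it the earliest completion time and always schedules the task with the smallest such time first; the estimated execution times are assumed to be known in advance.

```python
def min_min(etc, n_tasks, n_hosts):
    """Min-min heuristic for independent tasks.
    etc[t][h] = estimated time to compute task t on host h (assumed known).
    Returns a task->host mapping and the resulting makespan."""
    ready = [0.0] * n_hosts              # time at which each host becomes free
    unmapped = set(range(n_tasks))
    mapping = {}
    while unmapped:
        # For every unmapped task, find its earliest completion time,
        # then commit the task whose earliest completion time is smallest.
        best = None
        for t in unmapped:
            h = min(range(n_hosts), key=lambda h: ready[h] + etc[t][h])
            ct = ready[h] + etc[t][h]
            if best is None or ct < best[0]:
                best = (ct, t, h)
        ct, t, h = best
        mapping[t] = h
        ready[h] = ct
        unmapped.remove(t)
    return mapping, max(ready)

# Two hosts, three tasks with heterogeneous costs.
etc = [[3.0, 6.0], [2.0, 1.0], [4.0, 8.0]]
print(min_min(etc, n_tasks=3, n_hosts=2))
```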


Journal ArticleDOI
TL;DR: This paper describes the three point-to-point communications protocols currently supported in the Open MPI implementation, together with performance data, including comparisons with other MPI implementations using the OpenIB, MX, and GM communications libraries.
Abstract: Open MPI's point-to-point communications abstractions, described in this paper, handle several different communications scenarios, with a portable, high-performance design and implementation. These abstractions support two types of low-level communication protocols – general-purpose point-to-point communications, like the OpenIB interface, and MPI-like interfaces, such as Myricom's MX library. Support for the first type of protocol makes use of all communications resources available to a given application run, with optional support for communications error recovery. The latter provides an interface layer that relies on the communications library to guarantee correct MPI message ordering and matching. This paper describes the three point-to-point communications protocols currently supported in the Open MPI implementation, together with performance data. This includes comparisons with other MPI implementations using the OpenIB, MX, and GM communications libraries.

15 citations
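
The protocols above sit below the MPI API; as a reminder of the interface they serve, here is a minimal point-to-point exchange using the mpi4py binding. This is an illustrative sketch, unrelated to Open MPI's internal protocol layering.

```python
# Run with: mpiexec -n 2 python ping.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Rank 0 sends a small Python object; message matching is by
    # (source, tag, communicator), independent of the underlying transport.
    comm.send({'payload': list(range(10))}, dest=1, tag=7)
    reply = comm.recv(source=1, tag=8)
    print('rank 0 got reply:', reply)
elif rank == 1:
    msg = comm.recv(source=0, tag=7)
    comm.send(len(msg['payload']), dest=0, tag=8)
```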


Journal ArticleDOI
TL;DR: There is a deterministic algorithm that, given a uniformly δ-sparse hypergraph and a positive integer k, outputs k or all of its minimal transversals in O(δ log(1 + k) polylog(δ|V|)) time using |V|^O(log δ) k^O(δ) processors.
Abstract: A hypergraph is called uniformly δ-sparse if, for every nonempty subset X ⊆ V of vertices, the average degree of the sub-hypergraph induced by X is at most δ. We show that there is a deterministic algorithm that, given a uniformly δ-sparse hypergraph and a positive integer k, outputs k or all of its minimal transversals in O(δ log(1 + k) polylog(δ|V|)) time using |V|^O(log δ) k^O(δ) processors. Equivalently, the algorithm can be used to compute in parallel k or all maximal independent sets of the hypergraph.

15 citations
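
For background, a transversal (hitting set) of a hypergraph intersects every hyperedge, and it is minimal if no proper subset does. The brute-force sketch below enumerates all minimal transversals of a tiny example hypergraph; it is a definition-by-example only and has none of the parallel complexity guarantees stated above.

```python
from itertools import combinations

def minimal_transversals(vertices, edges):
    """Enumerate all minimal transversals (minimal hitting sets) of a
    hypergraph by brute force -- exponential, for illustration only."""
    def hits_all(s):
        return all(s & e for e in edges)

    found = []
    for k in range(0, len(vertices) + 1):
        for cand in combinations(vertices, k):
            s = set(cand)
            # s is minimal iff it hits every edge and contains no
            # previously found (smaller) transversal.
            if hits_all(s) and not any(t <= s for t in found):
                found.append(s)
    return found

# Hyperedges over vertices {1, 2, 3, 4}.
edges = [{1, 2}, {2, 3}, {3, 4}]
print(minimal_transversals([1, 2, 3, 4], edges))   # {1,3}, {2,3}, {2,4}
```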


Journal ArticleDOI
TL;DR: This analysis focuses on the performance of individual codes for finite systems, and addresses several important heretofore unanswered questions about employing LDPC codes in real-world systems.
Abstract: As peer-to-peer and widely distributed storage systems proliferate, the need to perform efficient erasure coding, instead of replication, is crucial to performance and efficiency. Low-Density Parity-Check (LDPC) codes have arisen as alternatives to standard erasure codes, such as Reed-Solomon codes, trading off vastly improved decoding performance for inefficiencies in the amount of data that must be acquired to perform decoding. The scores of papers written on LDPC codes typically analyze their collective and asymptotic behavior. Unfortunately, their practical application requires the generation and analysis of individual codes for finite systems. This paper attempts to illuminate the practical considerations of LDPC codes for peer-to-peer and distributed storage systems. The three main types of LDPC codes are detailed, and a huge variety of codes are generated, then analyzed using simulation. This analysis focuses on the performance of individual codes for finite systems, and addresses several important heretofore unanswered questions about employing LDPC codes in real-world systems.

9 citations
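
For background on the decoding-overhead trade-off mentioned above, the sketch below shows the standard iterative ("peeling") erasure decoder used with LDPC-style codes: whenever a parity check involves exactly one erased symbol, that symbol is recovered as the XOR of the others. The tiny set of checks is invented for illustration and is not one of the codes analyzed in the paper.

```python
def peel_decode(checks, symbols):
    """checks: list of index lists; each check asserts XOR of its symbols == 0.
    symbols: list of ints or None (None = erased). Decodes in place and
    returns True if every symbol was recovered."""
    progress = True
    while progress:
        progress = False
        for idxs in checks:
            erased = [i for i in idxs if symbols[i] is None]
            if len(erased) == 1:
                # Recover the single missing symbol from the parity constraint.
                val = 0
                for i in idxs:
                    if symbols[i] is not None:
                        val ^= symbols[i]
                symbols[erased[0]] = val
                progress = True
    return all(s is not None for s in symbols)

# Four symbols, two checks; symbol 2 was lost.
checks = [[0, 1, 2], [1, 2, 3]]
symbols = [1, 0, None, 1]
ok = peel_decode(checks, symbols)
print(ok, symbols)   # True, with symbols[2] recovered as 1 ^ 0 = 1
```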


Journal ArticleDOI
TL;DR: This paper is intended as a survey of the state of the art of some branches of Biomolecular Computing, a field in full development, with the promise of important results from the perspective of both Computer Science and Biology.
Abstract: This paper is intended as a survey of the state of the art of some branches of Biomolecular Computing. Biomolecular Computing aims to use biological hardware (bioware), rather than chips, to build a computer. We discuss the following three main research directions: DNA computing, membrane systems, and gene assembly in ciliates. DNA computing combines practical results with theoretical algorithm design. Various search problems have been implemented using DNA strands. Membrane systems are a family of computational models inspired by the membrane structure of living cells. The process of gene assembly in ciliates has been formalized as an abstract computational model. Biomolecular Computing is a field in full development, with the promise of important results from the perspective of both Computer Science (models of computation) and Biology (understanding biological processes).

Journal ArticleDOI
TL;DR: It is shown that clustering either by requested time, requested number of processors, or the product of the two generally produces more accurate predictions than earlier, more naive, approaches and that automatic clustering outperforms administrator-determined clustering.
Abstract: Most space-sharing parallel computers presently operated by production high-performance computing centers use batch-queuing systems to manage processor allocation. In many cases, users wishing to use these batch-queued resources may choose among different queues (charging different amounts) potentially on a number of machines to which they have access. In such a situation, the amount of time a user's job will wait in any one batch queue can be a significant portion of the overall time from job submission to job completion. It thus becomes desirable to provide a prediction for the amount of time a given job can expect to wait in the queue. Further, it is natural to expect that attributes of an incoming job, specifically the number of processors requested and the amount of time requested, might impact that job's wait time. In this work, we explore the possibility of generating accurate predictions by automatically grouping jobs having similar attributes using model-based clustering. Moreover, we implement this clustering technique for a time series of jobs so that predictions of future wait times can be generated in real time. Using trace-based simulation on data from 7 machines over a 9-year period from across the country, comprising over one million job records, we show that clustering either by requested time, requested number of processors, or the product of the two generally produces more accurate predictions than earlier, more naive, approaches and that automatic clustering outperforms administrator-determined clustering.
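
A hedged sketch of the general idea (not the paper's model-based clustering machinery): group historical jobs by their requested attributes, here with k-means over requested time and processor count, and predict a new job's wait as a high quantile of the waits observed in its cluster. The use of scikit-learn and the synthetic history are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def fit_wait_predictor(req_time, req_procs, observed_wait, n_clusters=4, q=0.95):
    """Cluster historical jobs on (requested time, requested processors) and
    keep a per-cluster wait-time quantile as the prediction. Inputs are 1-D
    arrays over past jobs; an illustration, not the paper's method."""
    observed_wait = np.asarray(observed_wait, dtype=float)
    X = np.column_stack([req_time, req_procs]).astype(float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    preds = {c: np.quantile(observed_wait[km.labels_ == c], q)
             for c in range(n_clusters)}

    def predict(job_time, job_procs):
        c = km.predict(np.array([[job_time, job_procs]], dtype=float))[0]
        return preds[c]
    return predict

# Synthetic history: 200 jobs with correlated requests and waits.
rng = np.random.default_rng(0)
rt = rng.uniform(10, 600, 200)             # requested minutes
rp = rng.integers(1, 64, 200)              # requested processors
wait = rt * 0.5 + rp * 2 + rng.exponential(30, 200)
predict = fit_wait_predictor(rt, rp, wait)
print(predict(job_time=120, job_procs=16))
```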

Journal ArticleDOI
Ke Qiu
TL;DR: This work uses a general neighbourhood broadcasting scheme to develop a neighbourhood broadcasting algorithm for the star interconnection network that is asymptotically optimal, conceptually simple, and easy to implement since routing for all nodes involved is uniform.
Abstract: The neighbourhood broadcasting problem in an interconnection network is defined as sending a fixed sized message from the source node to all its neighbours in a single-port model. Previously, this problem has been studied for several interconnection networks including the hypercube and the star. The objective of such works has been to minimize the total number of steps required for the neighbourhood broadcasting algorithms. Here, we first use a general neighbourhood broadcasting scheme to develop a neighbourhood broadcasting algorithm for the star interconnection network that is asymptotically optimal, conceptually simple, and easy to implement since routing for all nodes involved is uniform. It uses the cycle structures of the star graph as well as the standard technique of recursive doubling. We then show that the scheme for the star network is general enough to be applied to a broader family of interconnection networks such as the pancake interconnection network for which no previous neighbourhood broadcasting algorithm is known, resulting in asymptotically optimal algorithms. Finally, we use this scheme to develop neighbourhood broadcasting algorithms for multiple messages for several interconnection networks.

Journal ArticleDOI
TL;DR: The NP-completeness of the problem of scheduling and redistributing data on master-slave platforms is proved, and optimal polynomial algorithms for special important topologies are presented.
Abstract: In this work we are interested in the problem of scheduling and redistributing data on master-slave platforms. We consider the case where the workers possess initial loads, some of which have to be redistributed in order to balance their completion times. We assume that the data consists of independent and identical tasks. We prove the NP-completeness of the problem for fully heterogeneous platforms. We also present optimal polynomial algorithms for special important topologies: a simple greedy algorithm for homogeneous star networks, and a more complicated algorithm for platforms with homogeneous communication links and heterogeneous workers.
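
To make the setting concrete, here is a hedged sketch (not the paper's algorithm) of greedy redistribution on a homogeneous star: with identical workers and identical links, balancing completion times amounts to repeatedly shifting tasks from the most-loaded worker to the least-loaded one.

```python
def redistribute(loads):
    """loads[i] = number of unit tasks initially held by worker i (homogeneous
    workers and links assumed). Returns a list of (src, dst, amount) transfers
    and the balanced loads; a sketch, not the paper's optimal schedule."""
    loads = list(loads)
    transfers = []
    while True:
        hi = max(range(len(loads)), key=lambda i: loads[i])
        lo = min(range(len(loads)), key=lambda i: loads[i])
        if loads[hi] - loads[lo] <= 1:      # as balanced as integers allow
            break
        move = (loads[hi] - loads[lo]) // 2  # split the difference
        loads[hi] -= move
        loads[lo] += move
        transfers.append((hi, lo, move))
    return transfers, loads

print(redistribute([10, 2, 7, 1]))   # converges to loads of 5 tasks each
```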

Journal ArticleDOI
Tom Head
TL;DR: An algorithm for solving instances of the Boolean satisfiability problem is given and illustrated using a photocopying machine with plastic transparencies as medium and requires the assumption that information can be stored with a density that is exponential in the number of variables in the problem instance.
Abstract: We continue to search for methods of parallel computing using light. An algorithm for solving instances of the Boolean satisfiability problem is given and illustrated using a photocopying machine with plastic transparencies as medium. The algorithm solves satisfiability problems in linear time but requires the assumption that information can be stored with a density that is exponential in the number of variables in the problem instance. Consideration is given to situations in which this density limitation is not quite absolute.

Journal ArticleDOI
TL;DR: Deterministic solutions for the Write-All and iterative Write-All problems are considered in the fail-stop synchronous CRCW PRAM model, where memory access concurrency needs to be controlled.
Abstract: The abstract problem of using P failure-prone processors to cooperatively update all locations of an N-element shared array is called Write-All. Solutions to Write-All can be used iteratively to construct efficient simulations of PRAM algorithms on failure-prone PRAMs. Such use of Write-All in simulations is abstracted in terms of the iterative Write-All problem. The efficiency of the algorithmic solutions for Write-All and iterative Write-All is measured in terms of work complexity, where all processing steps taken by the processors are counted. This paper considers deterministic solutions for the Write-All and iterative Write-All problems in the fail-stop synchronous CRCW PRAM model, where memory access concurrency needs to be controlled. A deterministic algorithm of Kanellakis, Michailidis, and Shvartsman [16] efficiently solves the Write-All problem in this model, while controlling read and write memory access concurrency. However, it was not shown how the number of processor failures f affects the work efficiency of the algorithm. The results herein give a new analysis of the algorithm of [16] that obtains failure-sensitive work bounds, while retaining the known memory access concurrency bounds. Specifically, the new result expresses the work bound as a function of N, P, and f. Another contribution of this paper is a new failure-sensitive analysis for iterative Write-All with controlled memory access concurrency. This result yields tighter bounds on work (vs. [16]) for simulations of PRAM algorithms on fail-stop PRAMs.

Journal ArticleDOI
TL;DR: This paper presents an island-based parallelization of five multi-objective evolutionary algorithms: NSGAII, SPEA2, PESA, msPESA, and a new hybrid version they propose; experimental results indicate that the quality of the solutions tends to improve when the number of islands increases.
Abstract: Recently, the research interest in multi-objective optimization has increased remarkably. Most of the proposed methods use a population of solutions that are simultaneously improved, trying to approximate them to the Pareto-optimal front. When the population size increases, the quality of the solutions tends to be better, but the runtime is higher. This paper presents how to apply parallel processing to enhance the convergence to the Pareto-optimal front without increasing the runtime. In particular, we present an island-based parallelization of five multi-objective evolutionary algorithms: NSGAII, SPEA2, PESA, msPESA, and a new hybrid version we propose. Experimental results on some test problems indicate that the quality of the solutions tends to improve when the number of islands increases.
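
A minimal sketch of the island scheme itself, not of NSGAII, SPEA2, PESA, or msPESA: subpopulations evolve independently and periodically exchange their best individuals in a ring. The toy single-objective fitness, the (mu+1)-style evolution step, and the sequential stepping of islands are all illustrative simplifications; in a parallel run each island would evolve in its own process.

```python
import random

def evolve(pop, fitness, steps=20, mut=0.1):
    """Very small (mu+1)-style evolution of one island's population."""
    for _ in range(steps):
        parent = min(pop, key=fitness)                      # minimization
        child = [g + random.gauss(0, mut) for g in parent]
        worst = max(range(len(pop)), key=lambda i: fitness(pop[i]))
        if fitness(child) < fitness(pop[worst]):
            pop[worst] = child
    return pop

def island_model(n_islands=4, pop_size=10, dim=5, epochs=10):
    fitness = lambda x: sum(g * g for g in x)               # toy objective
    islands = [[[random.uniform(-5, 5) for _ in range(dim)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for _ in range(epochs):
        # Islands are stepped sequentially here for clarity; each call is
        # independent and could run on its own processor.
        islands = [evolve(pop, fitness) for pop in islands]
        # Ring migration: the best of island i replaces the worst of island i+1.
        bests = [min(pop, key=fitness) for pop in islands]
        for i, best in enumerate(bests):
            nxt = islands[(i + 1) % n_islands]
            worst = max(range(len(nxt)), key=lambda j: fitness(nxt[j]))
            nxt[worst] = list(best)
    return min((min(pop, key=fitness) for pop in islands), key=fitness)

random.seed(1)
print(island_model())
```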

Journal ArticleDOI
TL;DR: In this article, the authors address the problem of accurately estimating the runtime and communication time of a client request in a Network Enabled Server (NES) middleware such as GridSolve, using a template based model for the runtime estimation and a client-server communication test for the transfer time estimation.
Abstract: In this paper we address the problem of accurately estimating the runtime and communication time of a client request in a Network Enabled Server (NES) middleware such as GridSolve. We use a template-based model for the runtime estimation and a client-server communication test for the transfer-time estimation. We implement these two mechanisms in GridSolve and test them on a real testbed. Experiments show that they allow for significant improvements in client execution time in various scenarios.
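
A hedged sketch of what a template-based runtime estimate can look like (the concrete templates used in GridSolve may differ): assume the service's runtime follows a power law in the problem size, fit the template's coefficients by least squares in log space on past runs, and predict from the fitted curve.

```python
import numpy as np

def fit_runtime_template(sizes, runtimes):
    """Fit runtime ~= a * size**b by linear least squares in log space.
    Returns a predictor function. Illustrative template only."""
    logn = np.log(np.asarray(sizes, dtype=float))
    logt = np.log(np.asarray(runtimes, dtype=float))
    b, log_a = np.polyfit(logn, logt, 1)          # slope, intercept
    a = np.exp(log_a)
    return lambda n: a * n ** b

# Past executions of the same service with different problem sizes.
sizes = [100, 200, 400, 800]
runtimes = [0.11, 0.45, 1.8, 7.1]                 # roughly quadratic growth
predict = fit_runtime_template(sizes, runtimes)
print(predict(1600))                              # extrapolated estimate
```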

Journal ArticleDOI
TL;DR: A new analytical model of Disha in wormhole-routed k-ary n-cubes is proposed, and simulation experiments confirm that the proposed model exhibits a good degree of accuracy for various network sizes and under different traffic conditions.
Abstract: A number of analytical models for predicting message latency in k-ary n-cubes have recently been reported in the literature. Most of these models, however, have been discussed for adaptive routing algorithms based on deadlock avoidance, e.g. Duato's routing. Several research studies have empirically demonstrated that routing algorithms based on deadlock recovery offer maximal adaptivity that can result in considerable improvement in network performance. Disha is an example of a true fully adaptive routing algorithm that uses minimal hardware to implement a simple and efficient progressive method to recover from potential deadlocks. This paper proposes a new analytical model of Disha in wormhole-routed k-ary n-cubes. Simulation experiments confirm that the proposed model exhibits a good degree of accuracy for various network sizes and under different traffic conditions.

Journal ArticleDOI
TL;DR: A new layer-based architecture for building grids that can execute parallel programs based on legacy code is presented, and the performance of its software components is validated with benchmarks.
Abstract: In this paper, we present a new architecture for building grids that can execute parallel programs based on legacy code. The architecture is layer-based, and the performance of its software components is validated with benchmarks. To illustrate the construction of a grid using the proposed architecture, we develop a case study consisting of a grid oriented toward the efficient execution of Java bytecode, for which we validate and integrate legacy parallel linear algebra code.

Journal ArticleDOI
TL;DR: This work decomposes the automaton into pieces and uses OpenMP to parallelize the process; results show that, by using a decomposition procedure and distributing the mesh among a set of processors, 3D Cellular Automata can be studied without long execution times.
Abstract: This paper describes our research on using Genetic Programming to obtain transition rules for Cellular Automata, which are one type of massively parallel computing system. Our purpose is to determine the existence of a limit of chaos for three-dimensional Cellular Automata, empirically demonstrated for the two-dimensional case. To do so, we must study statistical properties of 3D Cellular Automata over long simulation periods. When dealing with big three-dimensional meshes, applying the transition rule to the whole structure can become an extremely slow task. In this work we decompose the automaton into pieces and use OpenMP to parallelize the process. Results show that, by using a decomposition procedure and distributing the mesh among a set of processors, 3D Cellular Automata can be studied without long execution times.
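
A hedged sketch of the decomposition idea only (the OpenMP implementation and the GP-evolved rules are out of scope): split the 3D mesh into slabs along one axis, update each slab from the previous generation plus one ghost layer per side, and concatenate the results; with a worker pool the slab updates could run concurrently. The 3D "majority of face neighbours" rule used here is a placeholder, not an evolved rule.

```python
import numpy as np

def step_slab(prev, z0, z1):
    """Update rows z0:z1 of a 3D automaton from the previous generation,
    fetching one ghost layer on each side (periodic boundaries).
    Placeholder rule: a cell becomes 1 iff >= 3 of its 6 face neighbours are 1."""
    nz = prev.shape[0]
    idx = np.arange(z0 - 1, z1 + 1) % nz          # slab plus ghost layers
    local = prev[idx]
    core = local[1:-1]
    neigh = local[:-2] + local[2:]                # z-neighbours
    for axis in (1, 2):                           # y- and x-neighbours (periodic)
        neigh = neigh + np.roll(core, 1, axis) + np.roll(core, -1, axis)
    return (neigh >= 3).astype(np.uint8)

def step(grid, n_slabs=4):
    # Slabs depend only on the previous generation, so each call to
    # step_slab is independent and could be dispatched to its own worker.
    bounds = np.linspace(0, grid.shape[0], n_slabs + 1, dtype=int)
    slabs = [step_slab(grid, bounds[i], bounds[i + 1]) for i in range(n_slabs)]
    return np.concatenate(slabs, axis=0)

rng = np.random.default_rng(0)
grid = rng.integers(0, 2, size=(32, 32, 32), dtype=np.uint8)
for _ in range(10):
    grid = step(grid)
print(grid.sum(), "live cells after 10 generations")
```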