Showing papers on "Bulk synchronous parallel published in 2009"

PDF

Open Access

Proceedings Article•DOI•

Parallel community detection on large networks with propinquity dynamics

[...]

Yuzhou Zhang¹, Jianyong Wang¹, Yi Wang², Lizhu Zhou¹•Institutions (2)

28 Jun 2009

TL;DR: This paper proposes a novel community detection algorithm, which utilizes a dynamic process by contradicting the network topology and the topology-based propinquity, where thepropinquity is a measure of the probability for a pair of nodes involved in a coherent community structure.

...read moreread less

Abstract: Graphs or networks can be used to model complex systems. Detecting community structures from large network data is a classic and challenging task. In this paper, we propose a novel community detection algorithm, which utilizes a dynamic process by contradicting the network topology and the topology-based propinquity, where the propinquity is a measure of the probability for a pair of nodes involved in a coherent community structure. Through several rounds of mutual reinforcement between topology and propinquity, the community structures are expected to naturally emerge. The overlapping vertices shared between communities can also be easily identified by an additional simple postprocessing. To achieve better efficiency, the propinquity is incrementally calculated. We implement the algorithm on a vertex-oriented bulk synchronous parallel(BSP) model so that the mining load can be distributed on thousands of machines. We obtained interesting experimental results on several real network data.

...read moreread less

131 citations

Journal Article•DOI•

A Parallel Harmonic-Balance Approach to Steady-State and Envelope-Following Simulation of Driven and Autonomous Circuits

[...]

Wei Dong¹, Peng Li¹•Institutions (1)

Texas A&M University¹

01 Apr 2009-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: The proposed parallel preconditioning technique can be combined with more conventional parallel approaches such as parallel device model evaluation, parallel fast Fourier transform operation, and parallel matrix-vector product to further improve runtime efficiency.

...read moreread less

Abstract: In this paper, we present a parallel harmonic-balance approach, applicable to the steady-state and envelope-following analyses of both driven and autonomous circuits. Our approach is centered on a naturally parallelizable preconditioning technique that speeds up the core computation in harmonic-balance-based analysis. As a coarse-grained parallel approach by algorithm construction, the proposed method facilitates parallel computing via the use of domain knowledge and simplifies parallel programming compared with fine-grained strategies. The proposed parallel preconditioning technique can be combined with more conventional parallel approaches such as parallel device model evaluation, parallel fast Fourier transform operation, and parallel matrix-vector product to further improve runtime efficiency. In our message-passing-interface-based implementation over a cluster of workstations and multithreading-based implementation on a shared-memory machine, favorable runtime speedups with respect to the conventional serial approaches and the serial implementations of the same parallel algorithms are achieved.

...read moreread less

24 citations

Book Chapter•DOI•

OSL: Optimized Bulk Synchronous Parallel Skeletons on Distributed Arrays

[...]

Noman Javed¹, Frédéric Loulergue¹•Institutions (1)

University of Orléans¹

21 Aug 2009-Lecture Notes in Computer Science

TL;DR: OSL, the Orleans Skeleton Library is presented: it is a library of BSP algorithmic skeletons in C++ that offers data-parallel skeletons on arrays as well as communication oriented skeletons.

...read moreread less

Abstract: The existing solutions to program parallel architectures range from parallelizing compilers to distributed concurrent programming Intermediate approaches propose a more structured parallelism: Algorithmic skeletons are higher-order functions that capture the patterns of parallel algorithms The user of the library has just to compose some of the skeletons to write her parallel application When one is designing a parallel program, the parallel performance is important It is thus very interesting for the programmer to rely on a simple yet realistic parallel performance model such as the Bulk Synchronous Parallel (BSP) model We present OSL, the Orleans Skeleton Library: it is a library of BSP algorithmic skeletons in C++ It offers data-parallel skeletons on arrays as well as communication oriented skeletons The performance of OSL is demonstrated with two applications: heat equation and FFT

...read moreread less

20 citations

Proceedings Article•DOI•

An efficient parallel algorithm for evaluating join queries on heterogeneous distributed systems

[...]

M. Al Hajj Hassan¹, Mostafa Bamha¹•Institutions (1)

University of Orléans¹

01 Dec 2009

TL;DR: A new parallel join algorithm for heterogeneous distributed architectures based on an efficient dynamic data distribution and task allocation which makes it insensitive to data skew and ensures perfect balancing properties during all stages of join computation is presented.

...read moreread less

Abstract: Owing to the fast development of network technologies, executing parallel programs on distributed systems that connect heterogeneous machines became feasible but we still face some challenges: Workload imbalance in such environment may not only be due to uneven load distribution among machines as in parallel systems but also due to distribution that is not adequate with the characteristics of each machine. In this paper, we present a new parallel join algorithm for heterogeneous distributed architectures based on an efficient dynamic data distribution and task allocation which makes it insensitive to data skew and ensures perfect balancing properties during all stages of join computation. The performance of this algorithm is analyzed using the scalable and portable BSP (Bulk Synchronous Parallel) cost model. We show that our algorithm guarantees optimal complexity and near linear speed-up while reducing communication and disk input/output costs to a minimum.

...read moreread less

17 citations

Proceedings Article•DOI•

MigBSP: A Novel Migration Model for Bulk-Synchronous Parallel Processes Rescheduling

[...]

Rodrigo da Rosa Righi, Laércio Lima Pilla, Alexandre Carissimi, Philippe O. A. Navaux, Hans-Ulrich Heiss¹ - Show less +1 more•Institutions (1)

Technical University of Berlin¹

25 Jun 2009

TL;DR: MigBSP is a model that controls processes rescheduling in BSP (Bulk Synchronous Parallel) applications and its final idea is to adjust the processes location in order to reduce the supersteps’ times.

...read moreread less

Abstract: We have developed a model called MigBSP that controls processes rescheduling in BSP (Bulk Synchronous Parallel)applications. A BSP application is composed by one or more supersteps, each one containing both computation and communication phases followed by a synchronization barrier. Since the barrier waits for the slowest process, MigBSP’s final idea is to adjust the processes location in order to reduce the supersteps’ times. Considering the scope of the BSP model, the novel ideas of MigBSPare: (i) combination of three metrics - Memory, Computation and Communication - to measure the potential of migration of each BSP process; (ii) use of both Computation and Communication Patterns to control processes’ regularity;(iii) adaptation regarding the periodicity to launch the processes rescheduling. This paper describes MigBSP and presents some experimental results and related work.

...read moreread less

15 citations

Book Chapter•DOI•

Applying Processes Rescheduling over Irregular BSP Application

[...]

Rodrigo da Rosa Righi¹, Laércio Lima Pilla¹, Alexandre Carissimi¹, Philippe O. A. Navaux¹, Hans-Ulrich Heiss² - Show less +1 more•Institutions (2)

Universidade Federal do Rio Grande do Sul¹, Technical University of Berlin²

20 May 2009

TL;DR: The main contribution of this paper includes the viability to use processes migration on irregular BSP applications, and presented that automatic processes rebalancing is an effortless technique to obtain performance.

...read moreread less

Abstract: This paper shows an evaluation of processes rescheduling over an irregular BSP (Bulk Synchronous Parallel) application. Such application is based on dynamic programming and its irregularity is presented through the variation of computation density along the matrix' cells. We are using MigBSP model for processes rescheduling, which combines multiple metrics - Computation, Communication and Memory - to decide about processes migration. The main contribution of this paper includes the viability to use processes migration on irregular BSP applications. Instead to adjust the load of each process by hand, we presented that automatic processes rebalancing is an effortless technique to obtain performance. The results showed gains greater than 10% over our multi-cluster architecture. Moreover, an acceptable overhead from MigBSP was observed when no migrations happen during application execution.

...read moreread less

7 citations

New syntax of a High-level BSP language

[...]

Louis Gesbert, Frédéric Gava

01 Jan 2009

TL;DR: The problem of the past syntax, the new one, matching of parallel values and exceptions, and a smart and uniform syntax for parallel patterns and exceptions handlers in BSML are presented.

...read moreread less

Abstract: Bulk-Synchronous Parallel (BSP) ML is a high-level language for programming parallel algorithms. Built upon OCaml, it provides a safe setting for the implementation of BSP algorithms and avoiding concurrency related problems (deadlocks, indeterminism etc.). Currently, BSML is based on a very small core of parallel primitives that extended ML sequential programming to BSP one. But we found that currently the price was to read programs with hardness. We have thus choose to design a new syntax that makes programs easier to read and so to debug. This new syntax also gives us a smart and uniform syntax for parallel patterns and exceptions handlers in BSML. In this paper, we present the problem of the past syntax, the new one, matching of parallel values and exceptions. Implementations are also detailed and examples are given to show the useful of the work (and BSML). In final, some benchmarks complete this article.

...read moreread less

7 citations

Proceedings Article•DOI•

New implementation of a BSP composition primitive with application to the implementation of algorithmic skeletons

[...]

Frédéric Gava¹, Ilias Garnier•Institutions (1)

University of Paris¹

23 May 2009

TL;DR: A new implementation of this primitive, called parallel superposition, is presented based on a continuation-passing-style (CPS) transformation guided by a flow analysis, which allows an estimation of execution time, avoids deadlocks and non-determinism.

...read moreread less

Abstract: BSML is a ML based language designed to code Bulk Synchronous Parallel (BSP) algorithms. It allows an estimation of execution time, avoids deadlocks and non-determinism. BSML proposes an extension of ML programming with a small set of primitives. One of these primitives, called parallel superposition, allows the parallel composition of two BSP programs. Nevertheless, its past implementation used system threads and have unjustified limitations. This paper presents a new implementation of this primitive based on a continuation-passing-style (CPS) transformation guided by a flow analysis. To test it and show its usefulness, we also have implemented the OCamlP3l algorithmic skeletons and compared their efficiencies with the original ones.

...read moreread less

7 citations

Journal Article•DOI•

Efficient parallel Text Retrieval techniques on Bulk Synchronous Parallel (BSP)/Coarse Grained Multicomputers (CGM)

[...]

Charalampos Konstantopoulos¹, Basilis Mamalis², Grammati Pantziou², Damianos Gavalas³•Institutions (3)

Research Academic Computer Technology Institute¹, Technological Educational Institute of Athens², University of the Aegean³

01 Jun 2009-The Journal of Supercomputing

TL;DR: This paper presents efficient, scalable, and portable parallel algorithms for the off-line clustering, the on-line retrieval and the update phases of the Text Retrieval (TR) problem based on the vector space model and using clustering to organize and handle a dynamic document collection.

...read moreread less

Abstract: In this paper, we present efficient, scalable, and portable parallel algorithms for the off-line clustering, the on-line retrieval and the update phases of the Text Retrieval (TR) problem based on the vector space model and using clustering to organize and handle a dynamic document collection. The algorithms are running on the Coarse-Grained Multicomputer (CGM) and/or the Bulk Synchronous Parallel (BSP) model which are two models that capture within a few parameters the characteristics of the parallel machine. To the best of our knowledge, our parallel retrieval algorithms are the first ones analyzed under these specific parallel models. For all the phases of the proposed algorithms, we analytically determine the relevant communication and computation cost thereby formally proving the efficiency of the proposed solutions. In addition, we prove that our technique for the on-line retrieval phase performs very well in comparison to other possible alternatives in the typical case of a multiuser information retrieval (IR) system where a number of user queries are concurrently submitted to an IR system. Finally, we discuss external memory issues and show how our techniques can be adapted to the case when processors have limited main memory but sufficient disk capacity for holding their local data.

...read moreread less

6 citations

MigBSP : a new approach for processes rescheduling management on bulk synchronous parallel applications

[...]

Rodrigo da Rosa Righi

01 Jan 2009

TL;DR: A chronology of key events and stories from the reporting and editing of the Pulitzer Prize-winning book, “Jurassic Park: The Making of a Novel” (2003):.

...read moreread less

Abstract: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 RESUMO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

...read moreread less

5 citations

Book Chapter•DOI•

Study on Parallel Computing

[...]

Guoliang Chen¹•Institutions (1)

University of Science and Technology of China¹

20 Jun 2009

TL;DR: It is argued that the parallel computing research should form an integrated methodology of "architecture-algorithm-programming-application" only in this way, parallel Computing research becomes continuous development and more realistic.

...read moreread less

Abstract: In this talk, we present a general survey on parallel computing. The main contents include parallel computer system which is the hardware platform of parallel computing, parallel algorithm which is the theoretical base of parallel computing, parallel programming which is the software support of parallel programming, parallel application which is the development impetus of parallel computing. Specially, we also introduce some enabling technologies of parallel application. We argue that the parallel computing research should form an integrated methodology of "architecture-algorithm-programming-application". Only in this way, parallel computing research becomes continuous development and more realistic.

...read moreread less

Book Chapter•DOI•

Characterising effective resource analyses for parallel and distributed coordination

[...]

Phil Trinder¹, Murray Cole², Hans-Wolfgang Loidl¹, Greg Michaelson¹•Institutions (2)

Heriot-Watt University¹, University of Edinburgh²

06 Nov 2009

TL;DR: A continuum of coordination cost models and a range of analysis techniques are outlined, including six representative parallel/distributed applications of resource analysis techniques, and general principles governing why the combination of techniques is effective in its context are extracted.

...read moreread less

Abstract: An important application of resource analysis is to improve the performance of parallel and distributed programs. In this context key resources are time, space and communication. Given the spectrum of cost models and associated analysis techniques available, what combination should be selected for a specific parallel or distributed context? We address the question as follows. We outline a continuum of coordination cost models and a range of analysis techniques. We consider six representative parallel/distributed applications of resource analysis techniques, and aim to extract general principles governing why the combination of techniques is effective in its context.

...read moreread less

Journal Article•DOI•

Boolean circuit programming: A new paradigm to design parallel algorithms

[...]

Kunsoo Park¹, Heejin Park², Woo-Chul Jeun¹, Soonhoi Ha¹•Institutions (2)

Seoul National University¹, Hanyang University²

01 Jun 2009-Journal of Discrete Algorithms

TL;DR: Through example codes, it is shown that the description language is a convenient tool to design parallel algorithms due to its general iterative and recursive structures and the ease of modular design.

...read moreread less

Proceedings Article•DOI•

Distributed Monte Carlo simulation for option pricing: The first completed benchmark and applications of distributed Monte Carlo simulation model on high-performance computing architecture

[...]

Frank Yonghui Wang¹•Institutions (1)

Barclays Investment Bank¹

01 Dec 2009

TL;DR: This work has taken Monte Carlo algorithm from my C++ Quantitative Library and rewrite it for this benchmark purpose and test the algorithm with some clever numerical adaptation with the Bulk Synchronous Parallel (BSP) computing model in order to leverage the distributed computing architecture.

...read moreread less

Abstract: As financial institution computing requirements grow exponentially, we have explored the potential for the ClearSpeed Accelerator, the Cell processor and the FPGA (A field-programmable gate array) to run risk analytics applications. We also invented the Smoothed Alias Method based generator for FPGA in order to achieve the fast result. We have taken Monte Carlo algorithm from my C++ Quantitative Library and rewrite it for this benchmark purpose and test the algorithm with some clever numerical adaptation with the Bulk Synchronous Parallel (BSP) computing model in order to leverage the distributed computing architecture. Following the initial benchmark, we have chosen to use the ClearSpeed Accelerator. With some smart quant re-engineering, we have further optimized the Distributed MC algorithm for pricing Bermudan Swaption to exploit the potential of distributed-based architecture. We will show the comparative benchmark results of the MC algorithm on ClearSpeed Accelerator, Cell and FPGA platform for the first time within our industry based on my working notes from my time in Barclays Capital London.

...read moreread less

Proceedings Article•DOI•

Parallel integration model

[...]

Dingju Zhu¹, Jianping Fan¹•Institutions (1)

Chinese Academy of Sciences¹

24 Nov 2009

TL;DR: A parallel integration model is brought forward, in which, the parallel programs need interoperate each other for the integration, and the interoperation ability can be achieved by the message communication in the parallel environment.

...read moreread less

Abstract: The parallel programs can be integrated into the parallel software in order to avoid different departments or different applications developing a same type of parallel programs many times. This article brought forward a parallel integration model. In which, the parallel programs need interoperate each other for the integration, and the interoperation ability can be achieved by the message communication in the parallel environment, and the parallel programs can compose parallel software or bigger parallel programs.

...read moreread less