
Showing papers on "Bulk synchronous parallel" published in 2009


Proceedings ArticleDOI
28 Jun 2009
TL;DR: This paper proposes a novel community detection algorithm that utilizes a dynamic process coupling the network topology and the topology-based propinquity, where the propinquity is a measure of the probability that a pair of nodes is involved in a coherent community structure.
Abstract: Graphs or networks can be used to model complex systems. Detecting community structures from large network data is a classic and challenging task. In this paper, we propose a novel community detection algorithm, which utilizes a dynamic process coupling the network topology and the topology-based propinquity, where the propinquity is a measure of the probability that a pair of nodes is involved in a coherent community structure. Through several rounds of mutual reinforcement between topology and propinquity, the community structures are expected to emerge naturally. The overlapping vertices shared between communities can also be easily identified by a simple additional post-processing step. To achieve better efficiency, the propinquity is calculated incrementally. We implement the algorithm on a vertex-oriented bulk synchronous parallel (BSP) model so that the mining load can be distributed over thousands of machines. We obtained interesting experimental results on several real network datasets.
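The propinquity computation itself is specific to the paper; as a hedged illustration of the vertex-oriented BSP execution model it runs on, here is a minimal Pregel-style superstep loop in Python (the names `bsp_run` and `compute` are hypothetical, not from the paper):

```python
# Minimal vertex-oriented BSP (Pregel-style) superstep loop.
# Each superstep: every vertex runs a local compute phase on its inbox,
# outgoing messages are buffered, and the barrier delivers them all at once.

def bsp_run(graph, compute, max_supersteps=10):
    """graph: dict vertex -> list of neighbour vertices."""
    inbox = {v: [] for v in graph}
    for _ in range(max_supersteps):
        outbox = {v: [] for v in graph}
        for v, neighbours in graph.items():       # local computation phase
            for dst, msg in compute(v, inbox[v], neighbours):
                outbox[dst].append(msg)           # buffered communication
        if not any(outbox.values()):              # no traffic: converged
            break
        inbox = outbox                            # barrier: deliver messages
    return inbox

# Example compute: each vertex announces itself to its neighbours once.
def announce(v, msgs, neighbours):
    return [] if msgs else [(n, v) for n in neighbours]
```

In a real deployment the vertex loop is partitioned across machines; the sequential loop above only demonstrates the superstep semantics.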

131 citations


Journal ArticleDOI
Wei Dong, Peng Li
TL;DR: The proposed parallel preconditioning technique can be combined with more conventional parallel approaches such as parallel device model evaluation, parallel fast Fourier transform operation, and parallel matrix-vector product to further improve runtime efficiency.
Abstract: In this paper, we present a parallel harmonic-balance approach, applicable to the steady-state and envelope-following analyses of both driven and autonomous circuits. Our approach is centered on a naturally parallelizable preconditioning technique that speeds up the core computation in harmonic-balance-based analysis. As a coarse-grained parallel approach by algorithm construction, the proposed method facilitates parallel computing via the use of domain knowledge and simplifies parallel programming compared with fine-grained strategies. The proposed parallel preconditioning technique can be combined with more conventional parallel approaches such as parallel device model evaluation, parallel fast Fourier transform operation, and parallel matrix-vector product to further improve runtime efficiency. In our message-passing-interface-based implementation over a cluster of workstations and multithreading-based implementation on a shared-memory machine, favorable runtime speedups with respect to the conventional serial approaches and the serial implementations of the same parallel algorithms are achieved.

24 citations


Book ChapterDOI
TL;DR: OSL, the Orleans Skeleton Library is presented: it is a library of BSP algorithmic skeletons in C++ that offers data-parallel skeletons on arrays as well as communication oriented skeletons.
Abstract: The existing solutions for programming parallel architectures range from parallelizing compilers to distributed concurrent programming. Intermediate approaches propose a more structured parallelism: algorithmic skeletons are higher-order functions that capture the patterns of parallel algorithms. The user of the library just has to compose some of the skeletons to write her parallel application. When one is designing a parallel program, parallel performance is important. It is thus very interesting for the programmer to rely on a simple yet realistic parallel performance model such as the Bulk Synchronous Parallel (BSP) model. We present OSL, the Orleans Skeleton Library: a library of BSP algorithmic skeletons in C++. It offers data-parallel skeletons on arrays as well as communication-oriented skeletons. The performance of OSL is demonstrated with two applications: the heat equation and FFT.
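OSL itself is a C++ library; as a language-neutral sketch of the skeleton idea, the following Python stand-ins show how a user composes higher-order patterns instead of writing communication code (`skel_map`, `skel_zip`, `skel_reduce` are illustrative names, not OSL's API):

```python
# Algorithmic skeletons as higher-order functions. Sequential stand-ins
# shown here; a real skeleton library distributes the array and handles
# the communication behind the same interface.

def skel_map(f, xs):            # data-parallel map skeleton
    return [f(x) for x in xs]

def skel_zip(f, xs, ys):        # element-wise combination of two arrays
    return [f(x, y) for x, y in zip(xs, ys)]

def skel_reduce(op, xs, unit):  # communication-oriented reduction skeleton
    acc = unit
    for x in xs:
        acc = op(acc, x)
    return acc

# Composing skeletons: dot product = zip with *, then reduce with +.
def dot(xs, ys):
    return skel_reduce(lambda a, b: a + b,
                       skel_zip(lambda x, y: x * y, xs, ys), 0)
```

The user-facing program stays purely compositional, which is what makes the BSP cost of the whole application predictable from the costs of the individual skeletons.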

20 citations


Proceedings ArticleDOI
01 Dec 2009
TL;DR: A new parallel join algorithm for heterogeneous distributed architectures based on an efficient dynamic data distribution and task allocation which makes it insensitive to data skew and ensures perfect balancing properties during all stages of join computation is presented.
Abstract: Owing to the fast development of network technologies, executing parallel programs on distributed systems that connect heterogeneous machines has become feasible, but we still face some challenges: workload imbalance in such an environment may be due not only to uneven load distribution among machines, as in parallel systems, but also to a distribution that does not match the characteristics of each machine. In this paper, we present a new parallel join algorithm for heterogeneous distributed architectures based on an efficient dynamic data distribution and task allocation, which makes it insensitive to data skew and ensures good balancing properties during all stages of the join computation. The performance of this algorithm is analyzed using the scalable and portable BSP (Bulk Synchronous Parallel) cost model. We show that our algorithm guarantees optimal complexity and near-linear speed-up while reducing communication and disk input/output costs to a minimum.
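The paper's exact distribution scheme is not reproduced here; as a hedged sketch of the general idea of skew-aware allocation on heterogeneous machines, the following assigns join keys greedily, heaviest first, to the machine whose speed-normalized load is smallest (a generic longest-processing-time heuristic; all names are hypothetical):

```python
# Skew-aware allocation sketch: heavy (skewed) keys are placed first, and
# machine loads are normalized by relative speed, so a machine that is
# twice as fast absorbs twice the tuples before looking "full".

def allocate(key_freqs, speeds):
    """key_freqs: dict key -> tuple count; speeds: relative speed per machine."""
    loads = [0.0] * len(speeds)
    assignment = {}
    for key, freq in sorted(key_freqs.items(), key=lambda kv: -kv[1]):
        m = min(range(len(speeds)),
                key=lambda i: (loads[i] + freq) / speeds[i])
        assignment[key] = m
        loads[m] += freq
    return assignment, loads
```

With a skewed key ('a' below), the heuristic isolates it on one machine and packs the light keys on the other, instead of hashing blindly.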

17 citations


Proceedings ArticleDOI
25 Jun 2009
TL;DR: MigBSP is a model that controls process rescheduling in BSP (Bulk Synchronous Parallel) applications; its goal is to adjust process placement in order to reduce superstep times.

Abstract: We have developed a model called MigBSP that controls process rescheduling in BSP (Bulk Synchronous Parallel) applications. A BSP application is composed of one or more supersteps, each containing a computation phase and a communication phase followed by a synchronization barrier. Since the barrier waits for the slowest process, the goal of MigBSP is to adjust process placement in order to reduce superstep times. Within the scope of the BSP model, the novel ideas of MigBSP are: (i) the combination of three metrics - Memory, Computation and Communication - to measure the migration potential of each BSP process; (ii) the use of both computation and communication patterns to track process regularity; (iii) adaptation of the period at which process rescheduling is launched. This paper describes MigBSP and presents experimental results and related work.
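The superstep structure described above has a standard cost model: because the barrier waits for the slowest process, one superstep costs max_i(w_i) + g * max_i(h_i) + L, where w_i is local computation, h_i the communication volume of process i, g the per-word communication cost, and L the barrier latency. A minimal sketch (this is the standard BSP cost model, not MigBSP's own migration metric; function names are illustrative):

```python
# Standard BSP cost model: a superstep is as slow as its slowest process,
# which is exactly why rebalancing process placement can shrink the max terms.

def superstep_time(w, h, g, L):
    """w: computation per process; h: words sent/received per process."""
    return max(w) + g * max(h) + L

def application_time(supersteps, g, L):
    """supersteps: list of (w, h) pairs, one per superstep."""
    return sum(superstep_time(w, h, g, L) for w, h in supersteps)
```

For example, with w = [5, 9, 7] and h = [2, 4, 1], the superstep is dominated by the process doing 9 units of work and 4 words of traffic, regardless of how light the others are.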

15 citations


Book ChapterDOI
20 May 2009
TL;DR: The main contribution of this paper is demonstrating the viability of process migration for irregular BSP applications, showing that automatic process rebalancing is a low-effort technique for obtaining performance.

Abstract: This paper evaluates process rescheduling on an irregular BSP (Bulk Synchronous Parallel) application. The application is based on dynamic programming, and its irregularity appears as a variation of computation density across the matrix cells. We use the MigBSP model for process rescheduling, which combines multiple metrics - Computation, Communication and Memory - to decide on process migration. The main contribution of this paper is demonstrating the viability of process migration for irregular BSP applications. Instead of adjusting the load of each process by hand, we show that automatic process rebalancing is a low-effort technique for obtaining performance. The results showed gains greater than 10% on our multi-cluster architecture. Moreover, MigBSP imposed an acceptable overhead when no migrations happened during application execution.

7 citations


01 Jan 2009
TL;DR: The problems of the past syntax, the new syntax, the matching of parallel values and exceptions, and a smart and uniform syntax for parallel patterns and exception handlers in BSML are presented.

Abstract: Bulk-Synchronous Parallel (BSP) ML is a high-level language for programming parallel algorithms. Built upon OCaml, it provides a safe setting for the implementation of BSP algorithms, avoiding concurrency-related problems (deadlocks, indeterminism, etc.). Currently, BSML is based on a very small core of parallel primitives that extend ML sequential programming to BSP programming. But we found that the price was programs that are hard to read. We have therefore chosen to design a new syntax that makes programs easier to read and thus to debug. This new syntax also gives us a smart and uniform syntax for parallel patterns and exception handlers in BSML. In this paper, we present the problems of the past syntax, the new one, and the matching of parallel values and exceptions. Implementations are also detailed, and examples are given to show the usefulness of the work (and of BSML). Finally, some benchmarks complete this article.

7 citations


Proceedings ArticleDOI
23 May 2009
TL;DR: A new implementation of the parallel superposition primitive is presented, based on a continuation-passing-style (CPS) transformation guided by a flow analysis.

Abstract: BSML is an ML-based language designed for writing Bulk Synchronous Parallel (BSP) algorithms. It allows an estimation of execution time and avoids deadlocks and non-determinism. BSML extends ML programming with a small set of primitives. One of these primitives, called parallel superposition, allows the parallel composition of two BSP programs. However, its previous implementation used system threads and had unjustified limitations. This paper presents a new implementation of this primitive based on a continuation-passing-style (CPS) transformation guided by a flow analysis. To test it and show its usefulness, we have also implemented the OCamlP3l algorithmic skeletons and compared their efficiency with the original ones.
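CPS itself can be illustrated independently of BSML: the transformation reifies "what to do next" as an explicit function argument, which is what lets an implementation interleave two computations without system threads. A generic sketch in Python (this is not BSML's actual transformation, just the style it targets):

```python
# A direct-style function and its continuation-passing-style (CPS) form.
# In CPS, instead of returning a value, the function hands it to an
# explicitly passed continuation k. Because control flow is now a plain
# function value, a runtime can suspend, resume, or interleave computations.

def fact(n):                        # direct style
    return 1 if n == 0 else n * fact(n - 1)

def fact_cps(n, k):                 # CPS: k receives the result
    if n == 0:
        return k(1)
    return fact_cps(n - 1, lambda r: k(n * r))
```

Calling `fact_cps(5, lambda r: r)` threads the pending multiplications through the chain of continuations and yields the same result as the direct-style version.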

7 citations


Journal ArticleDOI
TL;DR: This paper presents efficient, scalable, and portable parallel algorithms for the off-line clustering, the on-line retrieval and the update phases of the Text Retrieval (TR) problem based on the vector space model and using clustering to organize and handle a dynamic document collection.
Abstract: In this paper, we present efficient, scalable, and portable parallel algorithms for the off-line clustering, on-line retrieval and update phases of the Text Retrieval (TR) problem, based on the vector space model and using clustering to organize and handle a dynamic document collection. The algorithms run on the Coarse-Grained Multicomputer (CGM) and/or the Bulk Synchronous Parallel (BSP) model, two models that capture the characteristics of the parallel machine within a few parameters. To the best of our knowledge, our parallel retrieval algorithms are the first ones analyzed under these specific parallel models. For all phases of the proposed algorithms, we analytically determine the relevant communication and computation costs, thereby formally proving the efficiency of the proposed solutions. In addition, we prove that our technique for the on-line retrieval phase performs very well in comparison to other possible alternatives in the typical case of a multiuser information retrieval (IR) system, where a number of user queries are concurrently submitted. Finally, we discuss external-memory issues and show how our techniques can be adapted to the case where processors have limited main memory but sufficient disk capacity for holding their local data.
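The retrieval phase rests on the vector space model: documents and queries are term-weight vectors, ranked by cosine similarity. A minimal sequential sketch (the clustering and the CGM/BSP distribution are not shown; function names are illustrative):

```python
# Vector space model: a document or query is a sparse term -> weight map.
# Ranking retrieves documents in decreasing order of cosine similarity
# with the query vector.

import math

def cosine(a, b):
    """a, b: dict mapping term -> weight."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank(query, docs):
    """docs: dict doc_id -> term-weight vector; returns ids, best first."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
```

In the parallel setting, document vectors (grouped by cluster) are spread over the processors, each processor ranks its local share, and the per-processor top lists are merged; the sketch above covers only the local scoring step.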

6 citations



Book ChapterDOI
20 Jun 2009
TL;DR: It is argued that parallel computing research should form an integrated methodology of "architecture-algorithm-programming-application"; only in this way can parallel computing research develop continuously and stay realistic.

Abstract: In this talk, we present a general survey of parallel computing. The main contents include the parallel computer system, which is the hardware platform of parallel computing; the parallel algorithm, which is its theoretical base; parallel programming, which is its software support; and the parallel application, which is the development impetus of parallel computing. In particular, we also introduce some enabling technologies for parallel applications. We argue that parallel computing research should form an integrated methodology of "architecture-algorithm-programming-application". Only in this way can parallel computing research develop continuously and stay realistic.

Book ChapterDOI
06 Nov 2009
TL;DR: A continuum of coordination cost models and a range of analysis techniques are outlined, including six representative parallel/distributed applications of resource analysis techniques, and general principles governing why the combination of techniques is effective in its context are extracted.
Abstract: An important application of resource analysis is to improve the performance of parallel and distributed programs. In this context key resources are time, space and communication. Given the spectrum of cost models and associated analysis techniques available, what combination should be selected for a specific parallel or distributed context? We address the question as follows. We outline a continuum of coordination cost models and a range of analysis techniques. We consider six representative parallel/distributed applications of resource analysis techniques, and aim to extract general principles governing why the combination of techniques is effective in its context.

Journal ArticleDOI
TL;DR: Through example code, it is shown that the description language is a convenient tool for designing parallel algorithms, thanks to its general iterative and recursive structures and its ease of modular design.

Proceedings ArticleDOI
01 Dec 2009
TL;DR: This work takes a Monte Carlo algorithm from the author's C++ quantitative library, rewrites it for benchmarking, and tests it, with some numerical adaptation, under the Bulk Synchronous Parallel (BSP) computing model in order to leverage the distributed computing architecture.

Abstract: As financial institutions' computing requirements grow exponentially, we have explored the potential of the ClearSpeed Accelerator, the Cell processor and the FPGA (field-programmable gate array) to run risk-analytics applications. We also invented a Smoothed Alias Method based generator for the FPGA in order to achieve fast results. We took a Monte Carlo algorithm from my C++ quantitative library, rewrote it for this benchmark, and tested it, with some numerical adaptation, under the Bulk Synchronous Parallel (BSP) computing model in order to leverage the distributed computing architecture. Following the initial benchmark, we chose to use the ClearSpeed Accelerator. With some quantitative re-engineering, we further optimized the distributed MC algorithm for pricing Bermudan swaptions to exploit the potential of the distributed architecture. We show, for the first time within our industry, comparative benchmark results for the MC algorithm on the ClearSpeed Accelerator, Cell and FPGA platforms, based on my working notes from my time at Barclays Capital, London.
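Monte Carlo fits the BSP model naturally: each process draws its share of samples in the computation phase, partial sums are exchanged in the communication phase, and the estimate is combined after the barrier. A hedged sketch that estimates pi rather than a swaption price (the actual pricer is far more involved; all names are illustrative):

```python
# BSP-style Monte Carlo: P independent workers, one communication step.
# Each worker gets its own seeded RNG so the "parallel" streams are
# independent and the run is reproducible.

import random

def worker(seed, n):
    """Count points of n uniform draws landing inside the unit quarter-circle."""
    rng = random.Random(seed)
    return sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0 for _ in range(n))

def mc_pi(total_samples, processes=4):
    n = total_samples // processes
    partials = [worker(seed, n) for seed in range(processes)]  # computation phase
    return 4.0 * sum(partials) / (n * processes)               # reduce after barrier
```

The single reduction at the end is the entire communication cost, which is why Monte Carlo workloads map so cheaply onto the BSP superstep structure.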

Proceedings ArticleDOI
24 Nov 2009
TL;DR: A parallel integration model is brought forward, in which the parallel programs need to interoperate with each other for integration; this interoperation is achieved through message communication in the parallel environment.

Abstract: Parallel programs can be integrated into parallel software in order to avoid different departments or different applications developing the same type of parallel program many times. This article brings forward a parallel integration model in which the parallel programs interoperate with each other for integration; this interoperation is achieved through message communication in the parallel environment, and the parallel programs can be composed into parallel software or into bigger parallel programs.