
Showing papers on "Sequential algorithm published in 2011"


Journal ArticleDOI
TL;DR: This work introduces the Routing, Modulation Level and Spectrum Allocation (RMLSA) problem, as opposed to the typical Routing and Wavelength Assignment (RWA) problem of traditional WDM networks, proves that it is also NP-complete and presents various algorithms to solve it.
Abstract: Orthogonal Frequency Division Multiplexing (OFDM) has recently been proposed as a modulation technique for optical networks, because of its good spectral efficiency, flexibility, and tolerance to impairments. We consider the planning problem of an OFDM optical network, where we are given a traffic matrix that includes the requested transmission rates of the connections to be served. Connections are provisioned for their requested rate by elastically allocating spectrum using a variable number of OFDM subcarriers and choosing an appropriate modulation level, taking into account the transmission distance. We introduce the Routing, Modulation Level and Spectrum Allocation (RMLSA) problem, as opposed to the typical Routing and Wavelength Assignment (RWA) problem of traditional WDM networks, prove that it is also NP-complete and present various algorithms to solve it. We start by presenting an optimal ILP RMLSA algorithm that minimizes the spectrum used to serve the traffic matrix, and also present a decomposition method that breaks RMLSA into its two constituent subproblems, namely 1) routing and modulation level and 2) spectrum allocation (RML+SA), and solves them sequentially. We also propose a heuristic algorithm that serves connections one-by-one and use it to solve the planning problem by sequentially serving all the connections in the traffic matrix. In the sequential algorithm, we investigate two policies for defining the order in which connections are considered. We also use a simulated annealing meta-heuristic to obtain even better orderings. We examine the performance of the proposed algorithms through simulation experiments and evaluate the spectrum utilization benefits that can be obtained by utilizing OFDM elastic bandwidth allocation, when compared to a traditional WDM network.
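The one-by-one serving policy lends itself to a compact sketch. Below is a minimal first-fit spectrum-allocation heuristic in Python that enforces spectrum continuity and contiguity along a path. The link names, demand sizes, and slot counts are invented for illustration, and this is a simplified stand-in for the general idea of sequential spectrum allocation, not the paper's actual RMLSA algorithm.

```python
def first_fit_allocate(path_links, n_slots, spectrum):
    """Place a demand needing n_slots contiguous spectrum slots on every link of
    its path (continuity + contiguity constraints). Returns the starting slot,
    or None if the demand is blocked. `spectrum` maps link -> occupied flags."""
    total = len(next(iter(spectrum.values())))
    for start in range(total - n_slots + 1):
        if all(not any(spectrum[l][start:start + n_slots]) for l in path_links):
            for l in path_links:
                for s in range(start, start + n_slots):
                    spectrum[l][s] = True
            return start
    return None

# Ten slots per link; serve three demands one by one (a sequential policy).
spectrum = {"A-B": [False] * 10, "B-C": [False] * 10, "C-D": [False] * 10}
demands = [(["A-B", "B-C"], 3), (["B-C", "C-D"], 2), (["A-B", "B-C", "C-D"], 4)]
starts = [first_fit_allocate(path, n, spectrum) for path, n in demands]
```

The order in which demands are served changes the outcome, which is exactly why the paper investigates ordering policies and simulated-annealing-derived orderings.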

732 citations


Journal ArticleDOI
TL;DR: An extension of a combinatorial characterization due to Erdős and Gallai is used to develop a sequential algorithm for generating a random labeled graph with a given degree sequence, which allows for surprisingly efficient sequential importance sampling.
Abstract: Random graphs with given degrees are a natural next step in complexity beyond the Erdős–Rényi model, yet the degree constraint greatly complicates simulation and estimation. We use an extension of a combinatorial characterization due to Erdős and Gallai to develop a sequential algorithm for generating a random labeled graph with a given degree sequence. The algorithm is easy to implement and allows for surprisingly efficient sequential importance sampling. The resulting probabilities are easily computed on the fly, allowing the user to reweight estimators appropriately, in contrast to some ad hoc approaches that generate graphs with the desired degrees but with completely unknown probabilities. Applications are given, including simulating an ecological network and estimating the number of graphs with a given degree sequence.
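The Erdős–Gallai characterization mentioned above can be stated compactly in code. The sketch below only checks whether a degree sequence is graphical; a sequential sampler of the kind described would apply such a feasibility test repeatedly while assigning edges. This illustrates the criterion itself, not the authors' importance-sampling algorithm.

```python
def is_graphical(degrees):
    """Erdős–Gallai test: a degree sequence is realizable by a simple graph iff
    its sum is even and, for every k (with d sorted non-increasingly),
        sum_{i<=k} d_i <= k(k-1) + sum_{i>k} min(d_i, k)."""
    d = sorted(degrees, reverse=True)
    if sum(d) % 2:
        return False
    for k in range(1, len(d) + 1):
        if sum(d[:k]) > k * (k - 1) + sum(min(x, k) for x in d[k:]):
            return False
    return True
```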

355 citations


Book
27 Aug 2011
TL;DR: This work presents a simple sequential algorithm for the maximum flow problem on a network with n nodes, m arcs, and integer arc capacities bounded by U, and describes a parallel implementation that runs in O(n² log U log p / p) time in the EREW PRAM model using only p processors.
Abstract: We present a simple sequential algorithm for the maximum flow problem on a network with n nodes, m arcs, and integer arc capacities bounded by U. Under the practical assumption that U is polynomially bounded in n, our algorithm runs in time O(nm + n² log n). This result improves the previous best bound of O(nm log(n²/m)), obtained by Goldberg and Tarjan, by a factor of log n for networks that are both nonsparse and nondense, without using any complex data structures. We also describe a parallel implementation of the algorithm that runs in O(n² log U log p / p) time in the EREW PRAM model and uses only p processors, where p = ⌈m/n⌉.
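For orientation, here is a minimal baseline: the Edmonds–Karp shortest-augmenting-path algorithm, which runs in O(nm²) and is far simpler than the scaling algorithm described above. This sketch is a point of comparison only and is not the paper's algorithm; the example network is invented.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp baseline: repeatedly push flow along a shortest (fewest-arc)
    augmenting path found by BFS. `capacity` is a dict of dicts of integer
    capacities, updated in place into residual capacities."""
    for u in list(capacity):
        for v in list(capacity[u]):
            capacity.setdefault(v, {}).setdefault(u, 0)   # reverse residual arcs
    flow = 0
    while True:
        parent, q = {s: None}, deque([s])
        while q and t not in parent:                      # BFS for a shortest path
            u = q.popleft()
            for v, c in capacity[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        bottleneck, v = float("inf"), t                   # min residual on the path
        while parent[v] is not None:
            bottleneck = min(bottleneck, capacity[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:                      # update residual graph
            u = parent[v]
            capacity[u][v] -= bottleneck
            capacity[v][u] += bottleneck
            v = u
        flow += bottleneck
```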

134 citations


Book
12 Sep 2011
TL;DR: The mapping of algorithms structured as depth-p nested FOR loops into special-purpose systolic VLSI linear arrays is addressed by using linear functions to transform the original sequential algorithms into a form suitable for parallel execution on linear arrays.
Abstract: The mapping of algorithms structured as depth-p nested FOR loops into special-purpose systolic VLSI linear arrays is addressed. The mappings are done by using linear functions to transform the original sequential algorithms into a form suitable for parallel execution on linear arrays. A feasible mapping is derived by identifying formal criteria to be satisfied by both the original sequential algorithm and the proposed transformation function. The methodology is illustrated by synthesizing algorithms for matrix multiplication and a version of the Warshall-Floyd transitive closure algorithm.

116 citations


Proceedings ArticleDOI
21 Aug 2011
TL;DR: By combining simple but effective indexing and disk block accessing techniques, a sequential algorithm iOrca is developed that is up to an order-of-magnitude faster than the state-of-the-art.
Abstract: The problem of distance-based outlier detection is difficult to solve efficiently in very large datasets because of potential quadratic time complexity. We address this problem and develop sequential and distributed algorithms that are significantly more efficient than state-of-the-art methods while still guaranteeing the same outliers. By combining simple but effective indexing and disk block accessing techniques, we have developed a sequential algorithm iOrca that is up to an order-of-magnitude faster than the state-of-the-art. The indexing scheme is based on sorting the data points in order of increasing distance from a fixed reference point and then accessing those points based on this sorted order. To speed up the basic outlier detection technique, we develop two distributed algorithms (DOoR and iDOoR) for modern distributed multi-core clusters of machines, connected on a ring topology. The first algorithm passes data blocks from each machine around the ring, incrementally updating the nearest neighbors of the points passed. By maintaining a cutoff threshold, it is able to prune a large number of points in a distributed fashion. The second distributed algorithm extends this basic idea with the indexing scheme discussed earlier. In our experiments, both distributed algorithms exhibit significant improvements compared to the state-of-the-art distributed method [13].
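The indexing idea (scan points in order of distance from a fixed reference point, prune with a running cutoff) can be sketched as follows. This is a simplified, single-machine illustration of an ORCA-style nested loop with cutoff pruning, not the iOrca implementation; taking the first point as the reference is an arbitrary choice for the example.

```python
import math

def top_outliers(points, k, n_out):
    """Distance-based outlier detection (score = distance to the k-th nearest
    neighbor) with cutoff pruning. Points are scanned in order of distance
    from a fixed reference point, echoing the indexing scheme above."""
    ref = points[0]
    order = sorted(points, key=lambda p: math.dist(p, ref))
    top, cutoff = [], 0.0           # current best outliers and pruning threshold
    for p in order:
        neighbors, pruned = [], False
        for q in order:
            if q is p:
                continue
            neighbors.append(math.dist(p, q))
            neighbors.sort()
            del neighbors[k:]       # keep only the k nearest found so far
            if len(neighbors) == k and neighbors[-1] <= cutoff:
                pruned = True       # p can no longer beat the current top-n
                break
        if not pruned:
            top.append((neighbors[-1], p))
            top.sort(reverse=True)
            del top[n_out:]
            if len(top) == n_out:
                cutoff = top[-1][0]
    return [p for _, p in top]
```

The cutoff rises as stronger outliers are found, so most inliers are discarded after examining only a few neighbors, which is the source of the speedup in this family of algorithms.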

89 citations


Journal ArticleDOI
Duhu Man1, Kenji Uda1, Hironobu Ueyama1, Yasuaki Ito1, Koji Nakano1 
TL;DR: A simple parallel algorithm for the EDM is developed and implemented and it achieves a speedup factor of 18 over the performance of a sequential algorithm using a single processor in the same system.
Abstract: Given a 2-D binary image of size n×n, the Euclidean Distance Map (EDM) is a 2-D array of the same size such that each element stores the Euclidean distance to the nearest black pixel. It is known that a sequential algorithm can compute the EDM in O(n²) time, and this algorithm is therefore optimal. Also, work-time optimal parallel algorithms for the shared memory model have been presented. However, the presented parallel algorithms are too complicated to implement on existing shared memory parallel machines. The main contribution of this paper is to develop a simple parallel algorithm for the EDM and implement it on two different parallel platforms: multicore processors and Graphics Processing Units (GPUs). We have implemented our parallel algorithm on a Linux server with four Intel six-core processors (Intel Xeon X7460 2.66GHz). We have also implemented it on two modern GPU systems, the Tesla C1060 and the GTX 480. The experimental results have shown that, for an input binary image with size of 9216×9216, our implementation on the multicore system achieves a speedup factor of 18 over the performance of a sequential algorithm using a single processor in the same system. Meanwhile, for the same input binary image, our implementation on the GPU achieves a speedup factor of 26 over the sequential algorithm implementation.
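One well-known way to reach the optimal O(n²) sequential bound is the two-pass squared-distance transform of Felzenszwalb and Huttenlocher, sketched below. The abstract does not say which O(n²) algorithm it refers to, so this is an illustrative stand-in rather than the algorithm the authors benchmark against.

```python
import math

INF = 1e18  # stands in for "no black pixel seen in this row/column"

def dt1d(f):
    """Exact 1-D squared-distance transform: d[q] = min_p (q - p)^2 + f[p],
    computed in linear time via the lower envelope of parabolas."""
    n = len(f)
    d = [0.0] * n
    v = [0] * n              # parabola origins on the envelope
    z = [0.0] * (n + 1)      # envelope breakpoints
    k = 0
    z[0], z[1] = -INF, INF
    for q in range(1, n):
        s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        while s <= z[k]:
            k -= 1
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        k += 1
        v[k] = q
        z[k], z[k + 1] = s, INF
    k = 0
    for q in range(n):
        while z[k + 1] < q:
            k += 1
        d[q] = (q - v[k]) ** 2 + f[v[k]]
    return d

def edm(image):
    """Euclidean distance map of a binary image (True = black pixel): run the
    1-D transform over columns, then rows, then take square roots."""
    rows, cols = len(image), len(image[0])
    g = [[0.0 if image[y][x] else INF for x in range(cols)] for y in range(rows)]
    for x in range(cols):                       # transform each column
        col = dt1d([g[y][x] for y in range(rows)])
        for y in range(rows):
            g[y][x] = col[y]
    return [[math.sqrt(v) for v in dt1d(row)] for row in g]  # then each row
```

The row pass and the column pass each consist of independent 1-D problems, which is also what makes this family of algorithms attractive to parallelize.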

72 citations


Journal ArticleDOI
TL;DR: This paper presents a new algorithm for the general case of multiple LCS, i.e., finding an LCS of any number of strings, and its parallel realization, based on the dominant point approach and employs a fast divide-and-conquer technique to compute the dominant points.
Abstract: Finding the longest common subsequence (LCS) of multiple strings is an NP-hard problem, with many applications in the areas of bioinformatics and computational genomics. Although significant efforts have been made to address the problem and its special cases, the increasing complexity and size of biological data require more efficient methods applicable to an arbitrary number of strings. In this paper, we present a new algorithm for the general case of multiple LCS (or MLCS) problem, i.e., finding an LCS of any number of strings, and its parallel realization. The algorithm is based on the dominant point approach and employs a fast divide-and-conquer technique to compute the dominant points. When applied to a case of three strings, our algorithm demonstrates the same performance as the fastest existing MLCS algorithm designed for that specific case. When applied to more than three strings, our algorithm is significantly faster than the best existing sequential methods, reaching up to 2-3 orders of magnitude faster speed on large-size problems. Finally, we present an efficient parallel implementation of the algorithm. Evaluating the parallel algorithm on a benchmark set of both random and biological sequences reveals a near-linear speedup with respect to the sequential algorithm.
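As a point of reference, the classic two-string LCS dynamic program, which dominant-point methods generalize to an arbitrary number of strings, looks like this (a baseline sketch, not the paper's MLCS algorithm):

```python
def lcs_length(a, b):
    """Classic O(|a| * |b|) dynamic program for the LCS of two strings, kept to
    two rows of the table. Dominant-point MLCS methods work on the same lattice
    but only materialize the non-dominated ("dominant") points."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ch == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

For d strings this table becomes d-dimensional, which is why the full dynamic program is infeasible and sparse dominant-point representations pay off.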

69 citations


Book ChapterDOI
11 Sep 2011
TL;DR: This work extends work on analyzing massive graph data with the first massively parallel algorithm for community detection that scales to current data sizes, handling graphs of over 122 million vertices and nearly 2 billion edges in under 7300 seconds on a massively multithreaded Cray XMT.
Abstract: Tackling the current volume of graph-structured data requires parallel tools. We extend our work on analyzing such massive graph data with the first massively parallel algorithm for community detection that scales to current data sizes, handling graphs of over 122 million vertices and nearly 2 billion edges in under 7300 seconds on a massively multithreaded Cray XMT. Our algorithm achieves moderate parallel scalability without sacrificing sequential operational complexity. Community detection partitions a graph into subgraphs more densely connected within the subgraph than to the rest of the graph. We take an agglomerative approach similar to Clauset, Newman, and Moore's sequential algorithm, merging pairs of connected intermediate subgraphs to optimize different graph properties. Working in parallel opens new approaches to high performance. On smaller data sets, we find the output's modularity compares well with the standard sequential algorithms.
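The modularity objective that such agglomerative methods optimize can be computed directly from an edge list. The sketch below uses the community-wise form Q = Σ_c (m_c/m − (d_c/2m)²); it illustrates the quantity being optimized, not the parallel algorithm itself, and the example graph is invented.

```python
def modularity(edges, community):
    """Newman modularity in community-wise form, Q = sum_c (m_c/m - (d_c/2m)^2),
    where m_c counts intra-community edges and d_c is the total degree inside
    community c. Agglomerative methods greedily merge communities to raise Q."""
    m = len(edges)
    m_c, d_c = {}, {}
    for u, v in edges:
        d_c[community[u]] = d_c.get(community[u], 0) + 1
        d_c[community[v]] = d_c.get(community[v], 0) + 1
        if community[u] == community[v]:
            m_c[community[u]] = m_c.get(community[u], 0) + 1
    return sum(m_c.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in d_c.items())
```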

60 citations


Proceedings ArticleDOI
30 Nov 2011
TL;DR: This paper presents a GPU implementation of computing the Euclidean Distance Map (EDM) with efficient memory access, and shows that, for an input binary image with size of 9216×9216, the implementation can achieve a speedup factor of 52 over the sequential algorithm implementation.
Abstract: Recent Graphics Processing Units (GPUs), which have many processing units, can be used for general purpose parallel computation. To utilize their powerful computing ability, GPUs are widely used for general purpose processing. Since GPUs have very high memory bandwidth, the performance of GPUs greatly depends on memory access. The main contribution of this paper is to present a GPU implementation of computing the Euclidean Distance Map (EDM) with efficient memory access. Given a 2-D binary image, the EDM is a 2-D array of the same size such that each element stores the Euclidean distance to the nearest black pixel. In the proposed GPU implementation, we have considered many programming issues of the GPU system such as coalesced access of global memory, shared memory bank conflicts, and partition camping. In practice, we have implemented our parallel algorithm on two modern GPU systems: Tesla C1060 and GTX 480. The experimental results have shown that, for an input binary image with size of 9216×9216, our implementation can achieve a speedup factor of 52 over the sequential algorithm implementation.

49 citations


Book ChapterDOI
01 Jan 2011
TL;DR: Most important is the way in which the number of registers required, the details of data dependency in advancing the state, and the desire for memory coalescence in storing the output lead to different implementations in the three cases.
Abstract: Publisher Summary Random number generation is a key component of many forms of simulation, and fast parallel generation is particularly important for the naturally parallel Monte Carlo simulations that are used extensively in computational finance and many areas of computational science and engineering. This chapter discusses the parallelization of three very popular random number generators. In each case, the random number sequence that is generated is identical to that produced on a CPU by the standard sequential algorithm. The key to the parallelization is that each CUDA thread block generates a particular block of numbers within the original sequence, and to do this step, it needs an efficient skip-ahead algorithm to jump to the start of its block. Although there is much in common in the underlying mathematical formulation of these three generators, there are also very significant differences owing to differences in the size of the state information required by each generator. The Intel random number generators are contained in the vector statistical library (VSL). This library is not multithreaded, but is thread safe and contains all the necessary skip-ahead functions to advance the generators' states. Most importantly, the chapter shows how the number of registers required, the details of data dependency in advancing the state, and the desire for memory coalescence in storing the output lead to different implementations in the three cases.
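For the simplest generator family, a linear congruential generator, the skip-ahead that each thread block needs reduces to binary exponentiation of an affine map. The sketch below shows that idea only; the chapter's actual generators require analogous but more involved jump-ahead machinery, and the constants here are just commonly quoted LCG parameters, not ones from the chapter.

```python
def lcg_next(s, a, c, m):
    """One step of a linear congruential generator."""
    return (a * s + c) % m

def lcg_skip(s, k, a, c, m):
    """Jump the LCG state forward k steps in O(log k) multiplications by
    binary exponentiation of the affine map s -> a*s + c (mod m)."""
    A, C = 1, 0        # accumulated map, initially the identity
    ba, bc = a, c      # base map raised to successive powers of two
    while k:
        if k & 1:
            A, C = (A * ba) % m, (C * ba + bc) % m   # compose base into result
        ba, bc = (ba * ba) % m, (ba * bc + bc) % m   # square the base map
        k >>= 1
    return (A * s + C) % m
```

A thread block responsible for the i-th chunk of the sequence would call the skip function once with k = i × chunk_size and then generate its chunk sequentially.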

33 citations


Journal ArticleDOI
TL;DR: The calculation results for the test set show that the native catalytic residue sites were successfully identified and ranked within the top 10 designs for 7 of the 10 chemical reactions, indicating that the matching algorithm has the potential to be used for designing industrial enzymes for desired reactions.
Abstract: A loop closure-based sequential algorithm, PRODA_MATCH, was developed to match catalytic residues onto a scaffold for enzyme design in silico. The computational complexity of this algorithm is polynomial with respect to the number of active sites, the number of catalytic residues, and the maximal iteration number of cyclic coordinate descent steps. This matching algorithm is independent of a rotamer library that enables the catalytic residue to take any required conformation during the reaction coordinate. The catalytic geometric parameters defined between functional groups of transition state (TS) and the catalytic residues are continuously optimized to identify the accurate position of the TS. Pseudo-spheres are introduced for surrounding residues, which make the algorithm take binding into account as early as during the matching process. Recapitulation of native catalytic residue sites was used as a benchmark to evaluate the novel algorithm. The calculation results for the test set show that the native catalytic residue sites were successfully identified and ranked within the top 10 designs for 7 of the 10 chemical reactions. This indicates that the matching algorithm has the potential to be used for designing industrial enzymes for desired reactions.

Proceedings ArticleDOI
08 May 2011
TL;DR: The results show a significant speed-up of the algorithm compared to the time required to solve the algorithm in a conventional CPU, even when a more efficient sequential algorithm, such as the Newton-Raphson, is used.
Abstract: This paper presents an implementation of the Jacobi power flow algorithm to be run on a single instruction multiple data (SIMD) unit processor. The purpose is to be able to solve a large number of power flows in parallel as quickly as possible. This well-known algorithm was modified taking into account the characteristics of the SIMD architecture. The results show a significant speed-up of the algorithm compared to the time required to solve the algorithm in a conventional CPU, even when a more efficient sequential algorithm, such as the Newton-Raphson, is used. The accuracy of the performance has been validated with the results of the IEEE-118 standard network.
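The Jacobi method's appeal for SIMD hardware is that every unknown is updated independently from the previous iterate. A generic Jacobi iteration for a linear system (not the power-flow-specific update used in the paper) can be sketched as:

```python
def jacobi(A, b, iters=100):
    """Jacobi iteration for A x = b: every component of the new iterate is
    computed independently from the old one,
        x_i <- (b_i - sum_{j != i} A_ij * x_j) / A_ii,
    which is exactly the independence that SIMD lanes can exploit."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x
```

Convergence is guaranteed for diagonally dominant systems; the trade-off against Newton-Raphson noted in the abstract is more iterations of much cheaper, fully parallel work.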

Book ChapterDOI
11 Sep 2011
TL;DR: The deconvolution of 3D Fluorescence Microscopy RGB images is considered, describing the benefits arising from facing medical imaging problems on modern graphics processing units (GPUs), which are inexpensive parallel processing devices available on many up-to-date personal computers.
Abstract: We consider the deconvolution of 3D Fluorescence Microscopy RGB images, describing the benefits arising from facing medical imaging problems on modern graphics processing units (GPUs), which are inexpensive parallel processing devices available on many up-to-date personal computers. We found that the execution time of the CUDA version is about two orders of magnitude less than that of the sequential algorithm. Nevertheless, the experiments prompt some reflections on the best configuration for the CUDA-based algorithm. That is, we note the need to model GPU architectures and their characteristics to better describe the performance of GPU algorithms and what we can expect of them.

Book ChapterDOI
26 Mar 2011
TL;DR: This work presents IFDS-A, a parallel algorithm for solving context-sensitive interprocedural finite distributive subset (IFDS) dataflow problems, and concludes that Actors are an effective way to parallelize this type of algorithm.
Abstract: Defining algorithms in a way which allows parallel execution is becoming increasingly important as multicore computers become ubiquitous. We present IFDS-A, a parallel algorithm for solving context-sensitive interprocedural finite distributive subset (IFDS) dataflow problems. IFDS-A defines these problems in terms of Actors, and dataflow dependencies as messages passed between these Actors. We implement the algorithm in Scala, and evaluate its performance against a comparable sequential algorithm. With eight cores, IFDS-A is 6.12 times as fast as with one core, and 3.35 times as fast as a baseline sequential algorithm. We also found that Scala's default Actors implementation is not optimal for this algorithm, and that a custom-built implementation outperforms it by a significant margin. We conclude that Actors are an effective way to parallelize this type of algorithm.

01 Jan 2011
TL;DR: A novel data structure, g-tries, is proposed to represent a collection of graphs. Akin to a prefix tree, it takes advantage of common substructures both to reduce the memory needed to store the graphs and to produce a new, more efficient sequential algorithm to compute their frequency as subgraphs of another larger graph.
Abstract: Networks are a powerful representation for a multitude of natural and artificial systems. They are ubiquitous in real-world systems, presenting substantial non-trivial topological features. These are called complex networks and have received increasing attention in recent years. In order to understand their design principles, the concept of network motifs emerged. These are recurrent over-represented patterns of interconnections, conjectured to have some significance, that can be seen as basic building blocks of networks. Algorithmically, discovering network motifs is a hard problem related to graph isomorphism. The needed execution time grows exponentially as the size of networks or motifs increases, thus limiting their applicability. Since motifs are a fundamental concept, increasing the efficiency in its detection can lead to new insights in several areas of knowledge. To develop efficient and scalable algorithms for motifs discovery is precisely the main aim of this thesis. We provide a thorough survey of existing methods, complete with an associated chronology, taxonomy, algorithmic description and empirical evaluation and comparison. We propose a novel data-structure, g-tries, designed to represent a collection of graphs. Akin to a prefix tree, it takes advantage of common substructures to both reduce the memory needed to store the graphs, and to produce a new more efficient sequential algorithm to compute their frequency as subgraphs of another larger graph. We also introduce a sampling methodology for g-tries that successfully trades accuracy for faster execution times. We identify opportunities for parallelism in motif discovery, creating an associated taxonomy. We expose the whole motif computation as a tree based search and devise a general methodology for parallel execution with dynamic load balancing, including a novel strategy capable of efficiently stopping and dividing computation on the fly. 
In particular we provide parallel algorithms for ESU and g-tries. Finally, we extensively evaluate our algorithms on a set of diversified complex networks. We show that we are able to outperform all existing sequential algorithms, and are able to scale our parallel algorithms up to 128 processors almost linearly. By combining the power of g-tries and parallelism, we speedup motif discovery by several orders of magnitude, thus effectively pushing the limits in its applicability.
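The prefix-tree analogy can be made concrete with strings: a trie stores each shared prefix once, just as a g-trie stores shared subgraph structure once. The sketch below is only the string analogue; an actual g-trie stores graphs and is substantially more involved.

```python
class Trie:
    """String prefix tree illustrating the sharing that g-tries exploit for
    graphs: common prefixes (here, of strings; there, of subgraph structure)
    are stored once, so a search can discard whole families of patterns at
    a single node."""
    def __init__(self):
        self.children = {}
        self.terminal = False   # True if a stored word ends here

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.terminal = True

    def count_nodes(self):
        return 1 + sum(c.count_nodes() for c in self.children.values())
```

Storing "car", "cart" and "cat" separately takes ten characters; the trie holds them in five non-root nodes because the shared prefixes exist only once.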

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a sequential approach to determine the unknown parameters for inverse heat conduction problems which have multiple time-dependent heat sources and discussed the sensitivity problem and analyzed what factors cause the growth in error sensitivity.

Book ChapterDOI
13 Jun 2011
TL;DR: This work introduces a parallel iterated tabu search heuristic for solving eight different variants of the vehicle routing problem and shows that the proposed heuristic is both general and competitive with specific heuristics designed for each problem type.
Abstract: We introduce a parallel iterated tabu search heuristic for solving eight different variants of the vehicle routing problem. Through extensive computational results we show that the proposed heuristic is both general and competitive with specific heuristics designed for each problem type.

Proceedings ArticleDOI
14 Mar 2011
TL;DR: A novel floorplanning algorithm based on simulated annealing on GPUs that achieves 6–160X speedup for a range of MCNC and GSRC benchmarks, while delivering comparable or better solution quality.
Abstract: In this paper, we propose a novel floorplanning algorithm based on simulated annealing on GPUs. Simulated annealing is an inherently sequential algorithm, far from the typical programs suitable for Single Instruction Multiple Data (SIMD) style concurrency in a GPU. We propose a fundamentally different approach of exploring the floorplan solution space, where we evaluate concurrent moves on a given floorplan. We illustrate several performance optimization techniques for this algorithm on GPUs. Compared to the sequential algorithm, our techniques achieve 6–160X speedup for a range of MCNC and GSRC benchmarks, while delivering comparable or better solution quality.
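The idea of evaluating several moves on the same floorplan before committing can be mimicked sequentially. The sketch below is a generic simulated-annealing skeleton that scores a small batch of candidate moves per temperature step and keeps the best accepted one; the toy cost function and all parameters are invented, and this is not the paper's GPU algorithm.

```python
import math, random

def anneal(cost, neighbor, x0, t0=10.0, cooling=0.95, steps=200, batch=4, seed=1):
    """Simulated-annealing skeleton: at each temperature step, score a batch of
    candidate moves on the current solution and commit the best accepted one
    (a sequential mimic of evaluating concurrent moves)."""
    rng = random.Random(seed)
    x = best_x = x0
    t = t0
    for _ in range(steps):
        candidates = [neighbor(x, rng) for _ in range(batch)]
        accepted = [y for y in candidates if cost(y) < cost(x)
                    or rng.random() < math.exp((cost(x) - cost(y)) / t)]
        if accepted:
            x = min(accepted, key=cost)      # commit the best accepted move
            if cost(x) < cost(best_x):
                best_x = x                   # remember the best solution seen
        t *= cooling                         # geometric cooling schedule
    return best_x

# Toy "placement" problem: order modules to minimize adjacent-pair differences.
def cost(perm):
    return sum(abs(a - b) for a, b in zip(perm, perm[1:]))

def neighbor(perm, rng):
    i, j = rng.sample(range(len(perm)), 2)   # swap two positions
    y = list(perm)
    y[i], y[j] = y[j], y[i]
    return y

best = anneal(cost, neighbor, x0=[5, 1, 4, 2, 3, 0])
```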

Journal ArticleDOI
TL;DR: In this paper, a D-optimal minimax design criterion is proposed to construct two-level fractional factorial designs, which can be used to estimate a linear model with main effects and some specified interactions.

01 Jan 2011
TL;DR: A variant of the universal construction that keeps a bounded state, provides wait-free parallel processing, tolerates thread crashes, and handles non-terminating operations is introduced.
Abstract: The universal construction shows how to convert a sequential algorithm into a concurrent wait-free algorithm. We introduce a variant of this construction that (1) keeps a bounded state, (2) provides wait-free parallel processing, (3) tolerates thread crashes, and (4) handles non-terminating operations. The foundation of this construction is a wait-free transactional memory that is capable of isolating crash failures and non-termination failures.

Journal ArticleDOI
TL;DR: Two parallel algorithms for numerical simulation of optical wave propagation have been constructed and it is shown that the parallel algorithms have a significant speed advantage (by tens of times) over the common sequential algorithm; and the larger the grids in a computation task, the more significant the advantage.
Abstract: Methods and peculiarities of parallel algorithms for numerical simulation of optical wave propagation are considered. A scalar parabolic equation for the complex amplitude of monochromatic-wave field was solved numerically using the Fourier transform method for homogeneous media and split-step Fourier method for inhomogeneous media. Two parallel algorithms have been constructed—with the use of OpenMP technology with the MKL library for Intel multicore processors and CUDA technology for NVIDIA graphics accelerators. Speed comparison of these algorithms with each other and with a conventional sequential two-dimensional algorithm from the FFTW library is carried out by calculating the average number of test task solutions per second. It is shown that the parallel algorithms have a significant speed advantage (by tens of times) over the common sequential algorithm; and the larger the grids in a computation task, the more significant the advantage. Comparison of the above parallel algorithms shows the following: the approach based on the OpenMP technology holds the lead for grids of up to 1024 × 1024 in size, while the approach using CUDA technology was faster for large grids (from 1024 × 1024 or larger). The results are discussed, and recommendations on switching from sequential algorithms to the parallel ones are given.
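For a homogeneous medium, one propagation step of the Fourier method is: transform, multiply each mode by a phase quadratic in the spatial frequency, transform back. The sketch below shows that structure in 1-D with a deliberately naive O(N²) DFT and illustrative (unit-free) scaling; production code would use FFTW, MKL, or CUDA FFTs as in the paper.

```python
import cmath

def dft(f, sign):
    """Naive O(N^2) discrete Fourier transform (sign=-1: forward; sign=+1:
    inverse, which also divides by N). A real implementation would use an FFT."""
    n = len(f)
    out = [sum(f[j] * cmath.exp(sign * 2j * cmath.pi * k * j / n) for j in range(n))
           for k in range(n)]
    return [v / n for v in out] if sign > 0 else out

def propagate(field, z):
    """One Fourier-method step for a homogeneous medium: transform, apply a
    phase quadratic in spatial frequency, transform back. The frequency
    scaling here is illustrative, not in physical units."""
    n = len(field)
    spectrum = dft(field, -1)
    freqs = [k if k <= n // 2 else k - n for k in range(n)]   # signed frequencies
    spectrum = [s * cmath.exp(-1j * (2 * cmath.pi * kx / n) ** 2 * z)
                for s, kx in zip(spectrum, freqs)]
    return dft(spectrum, +1)
```

Because the step is unitary, propagating forward and then backward by the same distance recovers the input field, which makes a convenient sanity check.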

Journal ArticleDOI
TL;DR: Empirical results show that the proposed parallel scatter search algorithms yield good speed-up and improve solution quality because they explore larger parts of the search space within reasonable time, in contrast with the sequential algorithm.
Abstract: University exam timetabling refers to scheduling exams into predefined days, time periods and rooms, given a set of constraints. Exam timetabling is a computationally intractable optimization problem, which requires heuristic techniques for producing adequate solutions within reasonable execution time. For large numbers of exams and students, sequential algorithms are likely to be time consuming. This paper presents parallel scatter search meta-heuristic algorithms for producing good sub-optimal exam timetables in a reasonable time. Scatter search is a population-based approach that generates solutions over a number of iterations and aims to combine diversification and search intensification. The authors propose parallel scatter search algorithms that are based on distributing the population of candidate solutions over a number of processors in a PC cluster environment. The main components of scatter search are computed in parallel and efficient communication techniques are employed. Empirical results show that the proposed parallel scatter search algorithms yield good speed-up. Also, they show that parallel scatter search algorithms improve solution quality because they explore larger parts of the search space within reasonable time, in contrast with the sequential algorithm.

Journal ArticleDOI
TL;DR: A new reduction technique, which preserves a non-prescribed subset of the original state variables in the reduced model, is presented in this work, derived from the Petrov–Galerkin projection by adding constraints on the projection matrix.

Journal ArticleDOI
TL;DR: This paper describes the approach to adapting a text document similarity classifier based on the Term Frequency Inverse Document Frequency (TFIDF) metric to two massively multi-core hardware platforms.

Book ChapterDOI
06 Jul 2011
TL;DR: An efficient parallel algorithm for reconstruction from markers, and multi-scale analysis through differential morphological profiles, which are top-hat scale spaces based on openings and closings by reconstruction, which provides speed gain through parallelism and more efficient re-use of previously computed data.
Abstract: In this paper we provide an efficient parallel algorithm for reconstruction from markers, and multi-scale analysis through differential morphological profiles, which are top-hat scale spaces based on openings and closings by reconstruction. The new algorithms provide speed gain in two ways: (i) through parallelism, and (ii) through more efficient re-use of previously computed data. The best version of the algorithm provided a 17× speed-up on 24 cores, over computation of the same algorithm on a single core. Compared to the basic method of repeated reconstructions by a sequential algorithm, a speed gain of 25.1 times was obtained.
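Reconstruction from markers has a simple queue-based formulation in the binary case: flood the marker through the mask. The gray-scale reconstructions used for differential morphological profiles generalize this; the sketch below shows only the binary, 4-connected case, and is not the paper's parallel algorithm.

```python
from collections import deque

def reconstruct(marker, mask):
    """Binary morphological reconstruction: grow the marker inside the mask by
    repeated conditional dilation, implemented as a BFS flood fill over
    4-connected neighbors."""
    rows, cols = len(mask), len(mask[0])
    out = [[False] * cols for _ in range(rows)]
    q = deque((y, x) for y in range(rows) for x in range(cols)
              if marker[y][x] and mask[y][x])
    for y, x in q:                      # seed the output with the marker
        out[y][x] = True
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not out[ny][nx]:
                out[ny][nx] = True
                q.append((ny, nx))
    return out
```

Only connected components of the mask that contain a marker pixel survive, which is what makes reconstruction useful for openings and closings "by reconstruction".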

Book ChapterDOI
08 Sep 2011
TL;DR: An abstraction to alleviate the difficulty of programming with threads is proposed, which makes available a virtual time in which events in different program time-lines are sequentialized.
Abstract: We propose an abstraction to alleviate the difficulty of programming with threads This abstraction is not directly usable by application programmers Instead, application-visible behavior is defined through a semantical plugin, and invoked via a language or library that uses the plugin The main benefit is that parallel language runtimes become simpler to implement, because they use sequential algorithms for the parallel semantics This is possible because the abstraction makes available a virtual time in which events in different program time-lines are sequentialized The parallel semantics relate events in different time-lines via relating the sequentialized versions within the virtual time-line

Journal ArticleDOI
TL;DR: In this article, a sequential Monte Carlo algorithm is proposed to estimate a stochastic volatility model with leverage effects and non-constant conditional mean and jumps, which relies on the auxiliary particle filter algorithm mixed together with Markov Chain Monte Carlo (MCMC) methodology.
Abstract: In this paper we propose a sequential Monte Carlo algorithm to estimate a stochastic volatility model with leverage effects, non-constant conditional mean and jumps. We are interested in estimating the time-invariant parameters and the non-observable dynamics involved in the model. Our idea relies on the auxiliary particle filter algorithm mixed together with Markov Chain Monte Carlo (MCMC) methodology. Adding an MCMC step to the auxiliary particle filter prevents numerical degeneracies in the sequential algorithm and allows sequential evaluation of the fixed parameters and the latent processes. Empirical evaluation on simulated and real data is presented to assess the performance of the algorithm.
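The propagate/weight/resample cycle underlying such filters can be shown with a plain bootstrap particle filter on a toy stochastic-volatility-style model. The auxiliary weighting and the MCMC rejuvenation step that the paper adds are omitted, and the model parameters below are invented for the example.

```python
import math, random

def bootstrap_pf(obs, n_particles=500, phi=0.95, sigma=0.3, seed=7):
    """Plain bootstrap particle filter for a toy stochastic-volatility-style
    model: x_t = phi * x_{t-1} + N(0, sigma^2) and y_t ~ N(0, exp(x_t)).
    Only the propagate / weight / resample cycle is shown."""
    rng = random.Random(seed)
    parts = [rng.gauss(0, 1) for _ in range(n_particles)]
    means = []
    for y in obs:
        parts = [phi * x + rng.gauss(0, sigma) for x in parts]     # propagate
        # weight by the likelihood of y under N(0, exp(x)), up to a constant
        w = [math.exp(-0.5 * (x + y * y * math.exp(-x))) for x in parts]
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, parts)))       # filtered mean
        parts = rng.choices(parts, weights=w, k=n_particles)       # resample
    return means
```

Repeated multinomial resampling like this is exactly what causes the particle degeneracy that the paper's MCMC step is designed to counteract.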


Journal ArticleDOI
TL;DR: In this article, a soft-decision-driven sequential channel estimation algorithm is proposed for the pipelined turbo equalizer architecture operating on orthogonal frequency division multiplexing (OFDM) symbols.
Abstract: We consider channel estimation specific to turbo equalization for multiple-input multiple-output (MIMO) wireless communication. We develop a soft-decision-driven sequential algorithm geared to the pipelined turbo equalizer architecture operating on orthogonal frequency division multiplexing (OFDM) symbols. One interesting feature of the pipelined turbo equalizer is that multiple soft-decisions become available at various processing stages. A tricky issue is that these multiple decisions from different pipeline stages have varying levels of reliability. This paper establishes an effective strategy for the channel estimator to track the target channel, while dealing with observation sets with different qualities. The resulting algorithm is basically a linear sequential estimation algorithm and, as such, is Kalman-based in nature. The main difference here, however, is that the proposed algorithm employs puncturing on observation samples to effectively deal with the inherent correlation among the multiple demapper/decoder module outputs that cannot easily be removed by the traditional innovations approach. The proposed algorithm continuously monitors the quality of the feedback decisions and incorporates it in the channel estimation process. The proposed channel estimation scheme shows clear performance advantages relative to existing channel estimation techniques.
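The Kalman-based flavor of the estimator, and the idea of discounting unreliable soft decisions, can be illustrated with a scalar tracker in which each observation carries a reliability weight that inflates its measurement variance. This is a crude stand-in for the paper's quality monitoring and puncturing strategy; all parameters and the observation format are invented.

```python
def kalman_track(observations, q=1e-3, base_r=0.05):
    """Scalar Kalman-style sequential estimator of a slowly varying channel tap
    h, with observations y_t = h_t * x_t + noise. Each sample carries a
    reliability in (0, 1]; unreliable soft decisions get an inflated
    measurement variance, crudely standing in for decision-quality monitoring."""
    h, p = 0.0, 1.0                     # estimate of h and its variance
    for x, y, reliability in observations:
        p += q                          # random-walk process noise
        r = base_r / reliability        # less reliable -> larger noise variance
        gain = p * x / (x * x * p + r)  # Kalman gain
        h += gain * (y - x * h)         # innovation update
        p *= (1.0 - gain * x)
    return h
```

Setting a sample's reliability near zero makes its gain vanish, which is the same effect as puncturing that observation out of the update.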

Proceedings ArticleDOI
06 Jul 2011
TL;DR: Improvements to an existing parallel model checking algorithm are proposed and the resulting new algorithm has better scalability and performance than both the former parallel approach and the sequential algorithm.
Abstract: Formal verification is becoming a fundamental step of safety-critical and model-based software development. As part of the verification process, model checking is one of the current advanced techniques to analyze the behavior of a system. In this paper, we examine an existing parallel model checking algorithm and we propose improvements to eliminate some computational bottlenecks. Our measurements show that the resulting new algorithm has better scalability and performance than both the former parallel approach and the sequential algorithm.