
Showing papers on "Sequential algorithm" published in 2012


Proceedings ArticleDOI
25 Jun 2012
TL;DR: It is shown that for any graph, and for a random ordering of the vertices, the dependence length of the sequential greedy MIS algorithm is polylogarithmic (O(log^2 n) with high probability).
Abstract: The greedy sequential algorithm for maximal independent set (MIS) loops over the vertices in an arbitrary order, adding a vertex to the resulting set if and only if no previous neighboring vertex has been added. In this loop, as in many sequential loops, each iterate will only depend on a subset of the previous iterates (i.e. knowing that any one of a vertex's previous neighbors is in the MIS, or knowing that it has no previous neighbors, is sufficient to decide its fate one way or the other). This leads to a dependence structure among the iterates. If this structure is shallow then running the iterates in parallel while respecting the dependencies can lead to an efficient parallel implementation mimicking the sequential algorithm. In this paper, we show that for any graph, and for a random ordering of the vertices, the dependence length of the sequential greedy MIS algorithm is polylogarithmic (O(log^2 n) with high probability). Our results extend previous results that show polylogarithmic bounds only for random graphs. We show similar results for greedy maximal matching (MM). For both problems we describe simple linear-work parallel algorithms based on the approach. The algorithms allow for a smooth tradeoff between more parallelism and reduced work, but always return the same result as the sequential greedy algorithms. We present experimental results that demonstrate efficiency and the tradeoff between work and parallelism.
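
For illustration, the greedy loop described above fits in a few lines of Python. This is a hedged sketch, not the authors' code; the adjacency-list representation and the random ordering are assumptions of the example:

```python
import random

def greedy_mis(adj, order):
    # Greedy sequential MIS: v joins iff no earlier neighbor joined.
    in_mis = set()
    for v in order:
        if not any(u in in_mis for u in adj[v]):
            in_mis.add(v)
    return in_mis

adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}  # a 4-cycle
order = list(adj)
random.shuffle(order)                 # the random ordering the paper analyzes
print(greedy_mis(adj, order))         # e.g. {0, 2} or {1, 3}
```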

100 citations


Proceedings ArticleDOI
21 May 2012
TL;DR: SAHAD is the first Hadoop-based subgraph/subtree analysis algorithm; it performs significantly better than prior approaches for very large graphs and templates and is amenable to running quite easily on Amazon EC2, without the need for any system-level optimization.
Abstract: Relational subgraph analysis, e.g. finding labeled subgraphs in a network which are isomorphic to a template, is a key problem in many graph-related applications. It is computationally challenging for large networks and complex templates. In this paper, we develop SAHAD, an algorithm for relational subgraph analysis using Hadoop, in which the subgraph is in the form of a tree. SAHAD is able to solve a variety of problems closely related to subgraph isomorphism, including counting labeled/unlabeled subgraphs, finding supervised motifs, and computing graphlet frequency distributions. We prove that the worst-case work complexity for SAHAD is asymptotically very close to that of the best sequential algorithm. On a mid-size cluster with about 40 compute nodes, SAHAD scales to networks with up to 9 million nodes and a quarter billion edges, and templates with up to 12 nodes. To the best of our knowledge, SAHAD is the first such Hadoop-based subgraph/subtree analysis algorithm, and it performs significantly better than prior approaches for very large graphs and templates. Another unique aspect is that SAHAD is also amenable to running quite easily on Amazon EC2, without the need for any system-level optimization.

82 citations


01 Jan 2012
TL;DR: In this paper, the authors propose the first massively parallel algorithm for community detection that scales to current data sizes, handling graphs of over 122 million vertices and nearly 2 billion edges on a massively multithreaded Cray XMT.
Abstract: Tackling the current volume of graph-structured data requires parallel tools. We extend our work on analyzing such massive graph data with the first massively parallel algorithm for community detection that scales to current data sizes, handling graphs of over 122 million vertices and nearly 2 billion edges in under 7300 seconds on a massively multithreaded Cray XMT. Our algorithm achieves moderate parallel scalability without sacrificing sequential operational complexity. Community detection partitions a graph into subgraphs more densely connected within the subgraph than to the rest of the graph. We take an agglomerative approach similar to Clauset, Newman, and Moore's sequential algorithm, merging pairs of connected intermediate subgraphs to optimize different graph properties. Working in parallel opens new approaches to high performance. On smaller data sets, we find the output's modularity compares well with the standard sequential algorithms.
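
As an illustration of the agglomerative approach described above, here is a toy CNM-style merging loop in Python. The modularity-gain formula dQ = e_ab/m - d_a*d_b/(2m^2) for merging communities a and b is standard, but this sequential sketch is not the paper's Cray XMT implementation:

```python
from collections import defaultdict

def greedy_modularity(edges):
    """Toy CNM-style agglomeration: repeatedly merge the connected pair of
    communities with the largest modularity gain until no merge helps."""
    m = len(edges)
    comm = {}                        # vertex -> community id
    d = defaultdict(int)             # community -> total degree
    for u, v in edges:
        comm.setdefault(u, u)
        comm.setdefault(v, v)
        d[u] += 1
        d[v] += 1
    while True:
        e = defaultdict(int)         # edge counts between communities
        for u, v in edges:
            a, b = comm[u], comm[v]
            if a != b:
                e[min(a, b), max(a, b)] += 1
        gains = {ab: n / m - d[ab[0]] * d[ab[1]] / (2 * m * m)
                 for ab, n in e.items()}
        if not gains or max(gains.values()) <= 0:
            return comm
        a, b = max(gains, key=gains.get)
        for v in comm:               # merge community b into a
            if comm[v] == b:
                comm[v] = a
        d[a] += d[b]

# Two triangles joined by one edge -> two communities
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(greedy_modularity(edges))
```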

66 citations


Posted Content
TL;DR: In this paper, it was shown that for any graph, and for a random ordering of the vertices, the dependence depth of the sequential greedy MIS algorithm is polylogarithmic (O(log^2 n) with high probability).
Abstract: The greedy sequential algorithm for maximal independent set (MIS) loops over the vertices in arbitrary order, adding a vertex to the resulting set if and only if no previous neighboring vertex has been added. In this loop, as in many sequential loops, each iterate will only depend directly on a subset of the previous iterates (i.e. knowing that any one of a vertex's previous neighbors is in the MIS, or knowing that it has no previous neighbors, is sufficient to decide its fate). This leads to a dependence structure among the iterates. If this structure is shallow then running the iterates in parallel while respecting the dependencies can lead to an efficient parallel implementation mimicking the sequential algorithm. In this paper, we show that for any graph, and for a random ordering of the vertices, the dependence depth of the sequential greedy MIS algorithm is polylogarithmic (O(log^2 n) with high probability). Our results extend previous results that show polylogarithmic bounds only for random graphs. We show similar results for a greedy maximal matching (MM). For both problems we describe simple linear-work parallel algorithms based on the approach. The algorithms allow for a smooth tradeoff between more parallelism and reduced work, but always return the same result as the sequential greedy algorithms. We present experimental results that demonstrate efficiency and the tradeoff between work and parallelism.
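
The dependence structure can also be probed experimentally. The sketch below (a hypothetical helper, not the paper's linear-work algorithm) counts the rounds needed when every vertex resolves once all of its earlier-ordered neighbors have resolved; this is a conservative proxy for the dependence depth, since the paper's analysis additionally lets a vertex resolve as soon as any earlier neighbor joins the MIS:

```python
import random

def greedy_rounds(adj, order):
    """Rounds needed when each vertex resolves once all earlier-ordered
    neighbors have resolved; an upper proxy for the dependence depth."""
    rank = {v: i for i, v in enumerate(order)}
    undecided = set(order)
    rounds = 0
    while undecided:
        ready = [v for v in undecided
                 if all(u not in undecided or rank[u] > rank[v] for u in adj[v])]
        undecided.difference_update(ready)
        rounds += 1
    return rounds

n = 1000                              # a path graph as a small test case
adj = {i: {j for j in (i - 1, i + 1) if 0 <= j < n} for i in range(n)}
order = list(range(n))
random.shuffle(order)
print(greedy_rounds(adj, order))      # small relative to n for a random order
```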

50 citations


Journal ArticleDOI
TL;DR: This work devise secure distributed algorithms that allow the different sites to obtain a k-anonymized and ℓ-diverse view of the union of their databases, without disclosing sensitive information.
Abstract: We consider the problem of computing efficient anonymizations of partitioned databases. Given a database that is partitioned between several sites, either horizontally or vertically, we devise secure distributed algorithms that allow the different sites to obtain a k-anonymized and ℓ-diverse view of the union of their databases, without disclosing sensitive information. Our algorithms are based on the sequential algorithm [Goldberger and Tassa 2010] that offers anonymizations with significantly better utility than other anonymization algorithms, in particular those implemented so far in the distributed setting. Our algorithms can apply to different generalization techniques and utility measures and to any number of sites. While previous distributed algorithms depend on costly cryptographic primitives, the cryptographic assumptions of our solution are surprisingly minimal.

45 citations


Journal ArticleDOI
TL;DR: The algorithm incorporates an unsplit second-order Godunov scheme that provides accurate resolution of sharp fronts and is implemented within a block structured adaptive mesh refinement (AMR) framework that allows grids to dynamically adapt to features of the flow and enables efficient parallelization of the algorithm.
Abstract: We describe a second-order accurate sequential algorithm for solving two-phase multicomponent flow in porous media. The algorithm incorporates an unsplit second-order Godunov scheme that provides accurate resolution of sharp fronts. The method is implemented within a block structured adaptive mesh refinement (AMR) framework that allows grids to dynamically adapt to features of the flow and enables efficient parallelization of the algorithm. We demonstrate the second-order convergence rate of the algorithm and the accuracy of the AMR solutions compared to uniform fine-grid solutions. The algorithm is then used to simulate the leakage of gas from a Liquefied Petroleum Gas (LPG) storage cavern, demonstrating its capability to capture complex behavior of the resulting flow. We further examine differences resulting from using different relative permeability functions.

40 citations



Proceedings ArticleDOI
25 Mar 2012
TL;DR: It is confirmed that it would be more appropriate to use the point processes induced by the Random Sequential Algorithm in order to describe such point patterns, and it is shown that this point process is in fact as tractable as the Matérn model.
Abstract: In order to represent the set of transmitters simultaneously accessing a wireless network using carrier sensing based medium access protocols, one needs tractable point processes satisfying certain exclusion rules. Such exclusion rules forbid the use of Poisson point processes within this context. It has been observed that Matérn point processes, which have been advocated in the past because of their exclusion based definition, are rather conservative within this context. The present paper confirms that it would be more appropriate to use the point processes induced by the Random Sequential Algorithm in order to describe such point patterns. It also shows that this point process is in fact as tractable as the Matérn model. The generating functional of this point process is shown to be the solution of a differential equation, which is the main new mathematical result of the paper. In comparison, no equivalent result is known for the Matérn hard-core model. Using this differential equation, a new heuristic method is proposed, which leads to simple bounds and estimates for several important network performance metrics. These bounds and estimates are evaluated by Monte Carlo simulation.
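
A Monte Carlo sketch of the point process induced by a random sequential scheme with a hard exclusion radius (illustrative parameters and a square observation window; not the paper's analytical machinery):

```python
import random

def rsa_sample(n_candidates, radius, size=1.0):
    """Random sequential scheme: propose uniform points in a size x size
    square in random order; keep a point iff it is at least `radius` away
    from every previously kept point. Illustrative sketch only."""
    kept = []
    r2 = radius * radius
    for _ in range(n_candidates):
        x, y = random.uniform(0, size), random.uniform(0, size)
        if all((x - px) ** 2 + (y - py) ** 2 >= r2 for px, py in kept):
            kept.append((x, y))
    return kept

points = rsa_sample(2000, 0.05)
print(len(points), "accepted transmitters")
```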

33 citations


Journal ArticleDOI
12 Aug 2012
TL;DR: In this paper, an objective-oriented sequential sampling approach to robust design is presented that accounts for both noise variable uncertainty and interpolation uncertainty, the latter resulting from limits on the number of simulation runs.
Abstract: Sequential sampling strategies have been developed for managing complexity when using computationally expensive computer simulations in engineering design. However, much of the literature has focused on objective-oriented sequential sampling methods for deterministic optimization. These methods cannot be directly applied to robust design, which must account for uncontrollable variations in certain input variables (i.e., noise variables). Obtaining a robust design that is insensitive to variations in the noise variables is more challenging. Even though methods exist for sequential sampling in design under uncertainty, the majority of the existing literature does not systematically take into account the interpolation uncertainty that results from limitations on the number of simulation runs, the effect of which is inherently more severe than in deterministic design. In this paper, we develop a systematic objective-oriented sequential sampling approach to robust design with consideration of both noise variable uncertainty and interpolation uncertainty. The method uses Gaussian processes to model the costly simulator and quantify the interpolation uncertainty within a robust design objective. We examine several criteria, including our own proposed criteria, for sampling the design and noise variables and provide insight into their performance behaviors. We show that for both of the examples considered in this paper the proposed sequential algorithm is more efficient in finding the robust design solution than a one-shot space-filling design.
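
As a minimal illustration of the fit/select/augment loop behind objective-oriented sequential sampling (the paper's robust-design criteria are more elaborate), one can fit a Gaussian process surrogate and sample where its predictive uncertainty is largest, e.g. with scikit-learn; the toy simulator and kernel settings below are assumptions of the example:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def expensive_sim(x):                 # stand-in for the costly simulator
    return np.sin(3 * x) + 0.5 * x

X = np.array([[0.0], [0.5], [1.0]])   # small initial design
y = expensive_sim(X).ravel()
grid = np.linspace(0, 1, 201).reshape(-1, 1)

for _ in range(10):                   # sequential sampling loop
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(X, y)
    _, std = gp.predict(grid, return_std=True)
    x_new = grid[np.argmax(std)]      # largest interpolation uncertainty
    X = np.vstack([X, x_new])
    y = np.append(y, expensive_sim(x_new)[0])

print(X.ravel())                      # the sequentially chosen sample sites
```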

31 citations


Book ChapterDOI
Yuri Gurevich
21 Jan 2012
TL;DR: It is attempted to put the title problem and the Church-Turing thesis into a proper perspective and to clarify some common misconceptions related to Turing's analysis of computation.
Abstract: We attempt to put the title problem and the Church-Turing thesis into a proper perspective and to clarify some common misconceptions related to Turing's analysis of computation. We examine two approaches to the title problem, one well-known among philosophers and another among logicians.

30 citations


Journal ArticleDOI
TL;DR: In this article, a link is provided, for the first time, between the recent literature on optimal secondary looks and optimal route-planning software, enabling current remote mine-hunting systems to achieve secondary paths that minimize the total distance to be traveled while satisfying all motion and imaging constraints.
Abstract: When conducting remote mine-hunting operations with a sidescan-sonar-equipped vehicle, a lawn-mowing search pattern is standard if no prior information on potential target locations is available. Upon completion of this initial search, a list of contacts is obtained. The overall classification performance can be significantly improved by revisiting these contacts to collect additional looks. This paper provides, for the first time, a link between the recent literature on optimal secondary looks and optimal route-planning software. Automated planning algorithms are needed to generate multiaspect routes to improve the performance of mine-hunting systems and increase the capability of navies to efficiently clear potential mine fields. This paper introduces two new numerical techniques designed to enable current remote mine-hunting systems to achieve secondary paths minimizing the total distance to be traveled and satisfying all motion and imaging constraints. The first "local" approach is based on a sequential algorithm dealing with more tractable subproblems, while the second is "global" and based on simulated annealing. These numerical techniques are applied to two test sites created for the Mongoose sea trial held at the 2007 Autonomous Underwater Vehicle (AUV) Fest, Panama City, FL. Highly satisfactory planning solutions are obtained.
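
The "global" simulated-annealing approach can be illustrated generically: the sketch below orders contact revisits to shorten the total travel distance using 2-opt moves. The motion and imaging constraints central to the paper are omitted, and all parameters are illustrative:

```python
import math, random

def route_length(pts, order):
    return sum(math.dist(pts[order[i]], pts[order[i + 1]])
               for i in range(len(order) - 1))

def anneal(pts, iters=20000, t=1.0, cooling=0.9995):
    # Simulated annealing over visit orders using 2-opt reversals.
    order = list(range(len(pts)))
    random.shuffle(order)
    cur = best = route_length(pts, order)
    best_order = order[:]
    for _ in range(iters):
        i, j = sorted(random.sample(range(len(pts)), 2))
        cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
        c = route_length(pts, cand)
        if c < cur or random.random() < math.exp((cur - c) / t):
            order, cur = cand, c
            if cur < best:
                best, best_order = cur, order[:]
        t *= cooling                  # geometric cooling schedule
    return best_order, best

contacts = [(random.random(), random.random()) for _ in range(15)]
print(anneal(contacts)[1])            # total revisit path length
```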

Proceedings ArticleDOI
09 Mar 2012
TL;DR: A parallel LU factorization (with partial pivoting) algorithm on shared-memory computers with multi-core CPUs is proposed to accelerate circuit simulation, together with a predictive method that decides whether a matrix should use the parallel or the sequential algorithm.
Abstract: The sparse matrix solver has become the bottleneck in SPICE simulators. It is difficult to parallelize the solver because of the high data-dependency during the numerical LU factorization. This paper proposes a parallel LU factorization (with partial pivoting) algorithm on shared-memory computers with multi-core CPUs, to accelerate circuit simulation. Since not every matrix is suitable for the parallel algorithm, a predictive method is proposed to decide whether a matrix should use the parallel or the sequential algorithm. The experimental results on 35 circuit matrices reveal that the developed algorithm achieves speedups of 2.11×–8.38× (geometric average), compared with KLU, with 1–8 threads, on the matrices which are suitable for the parallel algorithm. Our solver can be downloaded from http://nicslu.weebly.com.

Proceedings ArticleDOI
06 Nov 2012
TL;DR: This work adapts an existing serial algorithm into a GPU parallel algorithm, resulting in substantial speed-ups, in some cases up to 11× faster, and increasing the size of the data that can be handled in a reasonable amount of time.
Abstract: Given a trajectory T, we study the problem of reporting all subtrajectory clusters of T. To measure similarity between curves we choose the Frechet distance. We show how the existing sequential algorithm can be modified to exploit parallel algorithms together with the GPU's computational power, showing substantial speed-ups. This is, to the best of our knowledge, not only the first GPU implementation of a subtrajectory clustering algorithm but also the first implementation using the continuous Frechet distance instead of the discrete Frechet distance.
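
For contrast with the continuous Frechet distance used in the paper, the discrete variant it is compared against is the classic dynamic program, sketched here for short curves (illustration only):

```python
from functools import lru_cache
from math import dist

def discrete_frechet(P, Q):
    """Classic O(|P|*|Q|) dynamic program for the discrete Frechet
    distance between polygonal curves P and Q (sequences of points)."""
    @lru_cache(maxsize=None)
    def c(i, j):
        d = dist(P[i], Q[j])
        if i == 0 and j == 0:
            return d
        if i == 0:
            return max(c(0, j - 1), d)
        if j == 0:
            return max(c(i - 1, 0), d)
        return max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
    return c(len(P) - 1, len(Q) - 1)

print(discrete_frechet(((0, 0), (1, 0), (2, 0)),
                       ((0, 1), (1, 1), (2, 1))))   # 1.0
```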

Proceedings ArticleDOI
21 May 2012
TL;DR: A highly-parallel reuse distance analysis algorithm (HP-RDA) is presented that speeds up the process using the SPMD execution model of GPUs, based on a hybrid data structure of hash tables and local arrays that flattens the traditional tree representation of memory access traces.
Abstract: Reuse distance analysis is a runtime approach that has been widely used to accurately model the memory system behavior of applications. However, traditional reuse distance analysis algorithms use tree-based data structures and are hard to parallelize, missing the tremendous computing power of modern architectures such as the emerging GPUs. This paper presents a highly-parallel reuse distance analysis algorithm (HP-RDA) to speedup the process using the SPMD execution model of GPUs. In particular, we propose a hybrid data structure of hash table and local arrays to flatten the traditional tree representation of memory access traces. Further, we use a probabilistic model to correct any loss of precision from a straightforward parallelization of the original sequential algorithm. Our experimental results show that using an NVIDIA GPU, our algorithm achieves a factor of 20 speedup over the traditional sequential algorithm with less than 1% loss in precision.
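
The sequential quantity being accelerated is easy to state: the reuse distance of an access is the number of distinct addresses touched since the previous access to the same address. A naive quadratic reference, for illustration only (the tree-based baselines mentioned above bring this to near-linearithmic time):

```python
def reuse_distances(trace):
    """Naive O(n^2) sequential reuse-distance computation; returns None
    for cold misses (first accesses to an address)."""
    last = {}                       # address -> index of previous access
    out = []
    for i, addr in enumerate(trace):
        if addr in last:
            out.append(len(set(trace[last[addr] + 1:i])))
        else:
            out.append(None)        # cold miss: no previous access
        last[addr] = i
    return out

print(reuse_distances("abcab"))     # [None, None, None, 2, 2]
```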

Proceedings ArticleDOI
21 May 2012
TL;DR: A new parallel programming model based on CUDA is presented; the best parallel algorithm on GPU achieves a speedup of 40–80, and the OPT-block-thread parallel algorithm takes full advantage of the powerful parallel capability of GPU.
Abstract: Approximate string matching using the k-mismatch technique has been widely applied to many fields such as virus detection and computational biology. The traditional parallel algorithms are all based on multiple processors, which have high costs of computing and communication. GPUs offer high parallel processing capability, low computing cost, and low communication time. To the best of our knowledge, there is no parallel algorithm for approximate string matching with k mismatches on GPU. With a new parallel programming model based on CUDA, we present three parallel algorithms and their implementations on GPU, namely, the thread parallel algorithm, the block-thread parallel algorithm, and the OPT-block-thread parallel algorithm. The OPT-block-thread parallel algorithm can take full advantage of the powerful parallel capability of GPU. Furthermore, it balances the load among the threads and optimizes the execution time with the memory model of GPU. Experimental results show that compared with the traditional sequential algorithm on CPU, our best parallel algorithm on GPU in this paper achieves a speedup of 40–80.
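
The k-mismatch problem itself has a simple sequential reference (the naive O(nm) baseline that GPU algorithms parallelize across threads and blocks); a sketch, for illustration:

```python
def k_mismatch(text, pattern, k):
    """Report all alignments of `pattern` in `text` with at most k
    mismatches (naive reference implementation)."""
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        mism = 0
        for a, b in zip(text[i:i + m], pattern):
            mism += a != b
            if mism > k:            # early exit once budget exceeded
                break
        if mism <= k:
            hits.append(i)
    return hits

print(k_mismatch("ACGTACGT", "ACGA", 1))   # [0, 4]
```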

Posted Content
TL;DR: In this paper, a sequential linear-time algorithm for the longest path problem in meshes is presented, improving on [13]; based on it, a constant-time parallel algorithm is derived which can be run on every parallel machine.
Abstract: In this paper, first we give a sequential linear-time algorithm for the longest path problem in meshes. This algorithm can be considered as an improvement of [13]. Then based on this sequential algorithm, we present a constant-time parallel algorithm for the problem which can be run on every parallel machine.

Journal ArticleDOI
TL;DR: This letter proposes a new criterion which is related to the estimated endmember abundances and uses the sequential forward floating search method as a substitute for SFS, which can improve the performance of all the sequential endmember extraction algorithms.
Abstract: Endmember extraction is an important step in spectral mixture analysis when endmembers are unknown. Endmembers are usually assumed to be pure pixels present in an image scene. Under this circumstance, endmember extraction is to find the most distinctive pixels. To make the searching process more efficient, the sequential forward search (SFS) method is generally used, where the next endmember is determined with a certain criterion based on the currently extracted endmember set. This letter proposes a new criterion which is related to the estimated endmember abundances. Compared to other sequential endmember extraction algorithms, the proposed method can find all the different endmembers faster. This letter also proposes to use the sequential forward floating search method as a substitute for SFS, which can improve the performance of all the sequential endmember extraction algorithms.

Proceedings ArticleDOI
26 Sep 2012
TL;DR: In this article, the authors study and compare two different approaches, relying on distributed and cloud frameworks, respectively, for symbolic state-space exploration of real-time systems specified by Petri Nets.
Abstract: The growing availability of distributed and cloud computing frameworks makes it possible to face complex computational problems in a more effective and convenient way. A notable example is state-space exploration of discrete-event systems specified in a formal way. The exponential complexity of this task is a major limitation to the usage of consolidated analysis techniques and tools. Several techniques for addressing the state space explosion problem within this context have been studied in the literature. One of these is to use distributed memory and computation. In this paper we study and compare two different approaches, relying on distributed and cloud frameworks, respectively. These approaches were designed and implemented following the same computational schema, a sort of map and fold. They are applied to symbolic state-space exploration of real-time systems specified by (a timed extension of) Petri Nets, by re-adapting a sequential algorithm implemented as a command-line Java tool. The outcomes of several tests performed on a benchmarking specification are presented, thus showing the convenience of distributed approaches.

Proceedings ArticleDOI
27 Jun 2012
TL;DR: The limitations of this strategy are shown, and a more general setting is presented in which the candidate solution may violate the specifications for a reduced number of elements of the validation set.
Abstract: In this paper, we present a randomized strategy for design under uncertainty. The main contribution is to provide a general class of sequential algorithms which satisfy the required specifications using probabilistic validation. At each iteration of the sequential algorithm, a candidate solution is probabilistically validated by means of a set of randomly generated uncertainty samples. The idea of validation sets has been used in some randomized algorithms, where a given candidate solution is classified as a probabilistic solution when it satisfies all the constraints on the validation set. In this paper, we show the limitations of this strategy and present a more general setting where the candidate solution may violate the specifications for a reduced number of elements of the validation set. This generalized scheme exhibits some advantages, in particular in terms of obtaining a probabilistic solution.
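
A schematic of the generalized validation test described above, with placeholder specification and sampling functions: accept the candidate iff it violates the specification on at most m of the N random validation samples (m = 0 recovers the classical validation-set test):

```python
import random

def probabilistic_validate(candidate, spec_ok, sample, N=1000, m=5):
    """Accept `candidate` iff it violates the spec on at most m of N
    randomly drawn uncertainty samples. `spec_ok` and `sample` are
    problem-specific placeholders."""
    violations = sum(not spec_ok(candidate, sample()) for _ in range(N))
    return violations <= m

# Toy use: "design" x must keep x*delta <= 1 for uncertain delta in [0, 2]
ok = probabilistic_validate(0.51,
                            lambda x, d: x * d <= 1.0,
                            lambda: random.uniform(0, 2),
                            N=2000, m=50)
print(ok)
```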

Posted Content
TL;DR: A sequential algorithm adapted from Del Moral et al. (2012) is proposed which runs twice as fast as traditional ABC algorithms and is calibrated to minimize the number of simulations from the model.
Abstract: Approximate Bayesian Computation has been successfully used in population genetics models to bypass the calculation of the likelihood. These algorithms provide an accurate estimator by comparing the observed dataset to a sample of datasets simulated from the model. Although parallelization is easily achieved, computation times for assuring a suitable approximation quality of the posterior distribution are still long. To alleviate this issue, we propose a sequential algorithm adapted from Del Moral et al. (2012) which runs twice as fast as traditional ABC algorithms. Its parameters are calibrated to minimize the number of simulations from the model.
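
For reference, the plain ABC rejection baseline that such sequential samplers accelerate is only a few lines; the Del Moral-style SMC variant, which adapts the tolerance over a sequence of rounds, is not reproduced here. The toy model and tolerances below are assumptions of the example:

```python
import random

def abc_rejection(observed_stat, simulate, prior, eps, n_accept):
    """Plain ABC rejection: keep a prior draw theta iff the summary
    statistic of data simulated under theta is within eps of the
    observed statistic."""
    accepted = []
    while len(accepted) < n_accept:
        theta = prior()
        if abs(simulate(theta) - observed_stat) <= eps:
            accepted.append(theta)
    return accepted

# Toy model: estimate the mean of a Gaussian from its sample mean
post = abc_rejection(
    observed_stat=1.3,
    simulate=lambda th: sum(random.gauss(th, 1) for _ in range(30)) / 30,
    prior=lambda: random.uniform(-5, 5),
    eps=0.1, n_accept=200)
print(sum(post) / len(post))          # posterior mean estimate, near 1.3
```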

Proceedings ArticleDOI
15 Mar 2012
TL;DR: This paper adopts the features of the Particle Swarm Optimization (PSO) algorithm with the Smallest Position Value (SPV) rule and designs a sequential algorithm, named DSAPSO, to solve DNA sequence assembly.
Abstract: DNA sequence assembly is one of the most popular problems in molecular biology. DNA sequence assembly refers to aligning and merging fragments of a much longer DNA sequence in order to reconstruct the original sequence. Many solutions have been provided by researchers. In this paper we adopt the features of the Particle Swarm Optimization (PSO) algorithm with the Smallest Position Value (SPV) rule and design a sequential algorithm, named DSAPSO, to solve DNA sequence assembly. The DNA sequence assembly problem is a discrete optimization problem, so a discrete optimization algorithm is needed to solve it. We use the continuous version of PSO with the SPV rule to solve the DNA sequence assembly problem; the SPV rule transforms the continuous version of PSO into a discrete one. To check the efficiency of the proposed methodology, the results of DSAPSO are compared with the results of a genetic algorithm (GA).
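
The SPV rule itself is simple to state: a particle's continuous position vector is mapped to a discrete solution (here, a fragment visiting order) by ranking its components, smallest value first. A minimal sketch:

```python
def spv(position):
    """Smallest Position Value rule: map a continuous PSO position
    vector to a permutation by sorting indices by component value."""
    return sorted(range(len(position)), key=lambda i: position[i])

# The particle's 4 components encode an ordering of 4 DNA fragments
print(spv([0.7, -1.2, 0.3, 2.0]))   # [1, 2, 0, 3]
```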

Proceedings ArticleDOI
10 Sep 2012
TL;DR: This paper proposes integrated maximum flow algorithms for the generalized optimal response time retrieval problem using the Ford-Fulkerson method and the push-relabel algorithm; results show the integrated algorithm runs up to 2.5X faster than the black-box version.
Abstract: Efficient retrieval of replicated data from multiple disks is a challenging problem. Traditional retrieval techniques assume that replication is done at a single site using homogeneous disk arrays having no initial load or network delay. Recently, generalized retrieval algorithms have been proposed to cover heterogeneous disk arrays, initial loads, and network delays. Generalized retrieval algorithms achieve the optimal response time retrieval schedule by performing multiple runs of a maximum flow algorithm. Since the maximum flow algorithm is used as a black-box technique, flow values of the previous runs cannot be conserved to speed up the process. In this paper, we propose integrated maximum flow algorithms for the generalized optimal response time retrieval problem. Our first algorithm uses the Ford-Fulkerson method and the second uses the push-relabel algorithm. Besides the sequential implementations, a multi-threaded version of the push-relabel algorithm is also implemented. The proposed algorithms are investigated using various replication schemes, query types, query loads, disk specifications, and system delays. Experimental results show that the sequential integrated push-relabel algorithm runs up to 2.5X faster than the black-box version. Furthermore, the parallel integrated push-relabel implementation achieves up to 1.7X speedup (~1.2X on average) over the sequential algorithm using two threads, which makes the integrated algorithm up to 4.25X (~3X on average) faster than its black-box counterpart.

Journal ArticleDOI
TL;DR: A hierarchical sequential algorithm with progressive data transmission considerably reduces bandwidth requirements in cloud-based detection systems.
Abstract: Background: In the concept of cloud-computing-based systems, various authorized users have secure access to patient records from a number of care delivery organizations from any location. This creates a growing need for remote visualization, advanced image processing, state-of-the-art image analysis, and computer-aided diagnosis. Objectives: This paper proposes a system of algorithms for automatic detection of anatomical landmarks in 3D volumes in the cloud computing environment. The system addresses the inherent problem of limited bandwidth between a (thin) client, data center, and data analysis server. Methods: The problem of limited bandwidth is solved by a hierarchical sequential detection algorithm that obtains data by progressively transmitting only image regions required for processing. The client sends a request to detect a set of landmarks for region visualization or further analysis. The algorithm running on the data analysis server obtains a coarse level image from the data center and generates landmark location candidates. The candidates are then used to obtain image neighborhood regions at a finer resolution level for further detection. This way, the landmark locations are hierarchically and sequentially detected and refined. Results: Only image regions surrounding landmark location candidates need to be transmitted during detection. Furthermore, the image regions are lossy compressed with JPEG 2000. Together, these properties amount to at least 30 times bandwidth reduction while achieving similar accuracy when compared to an algorithm using the original data. Conclusions: The hierarchical sequential algorithm with progressive data transmission considerably reduces bandwidth requirements in cloud-based detection systems.
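
A toy sketch of the coarse-to-fine, progressively transmitted detection loop described in the Methods: a brightest-pixel "detector" on a 2D image stands in for the trained 3D landmark detectors, and the shapes, levels, and halo size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
volume = rng.random((256, 256))      # stands in for the full scan
volume[180, 40] = 5.0                # the "landmark": brightest point

def fetch(level, lo, hi):
    """Progressive-transmission stand-in: return only the requested
    window of the image at 2**level downsampling."""
    f = 2 ** level
    sub = volume[::f, ::f]
    return sub[lo[0]:hi[0], lo[1]:hi[1]]

def detect(levels=3, halo=3):
    lo = (0, 0)
    hi = fetch(levels - 1, (0, 0), (10**9, 10**9)).shape  # whole coarse grid
    for level in range(levels - 1, -1, -1):
        region = fetch(level, lo, hi)                 # only this region is "sent"
        i, j = np.unravel_index(np.argmax(region), region.shape)
        cand = (lo[0] + i, lo[1] + j)
        if level:                                     # refine at the finer level
            cand = (cand[0] * 2, cand[1] * 2)
            lo = (max(0, cand[0] - halo), max(0, cand[1] - halo))
            hi = (cand[0] + halo + 1, cand[1] + halo + 1)
    return cand

print(detect())                      # (180, 40)
```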

Posted Content
TL;DR: Many Control Systems are indeed Software Based Control Systems, i.e. control systems whose controller consists of control software running on a microcontroller device, which motivates investigation of Formal Model Based Design approaches for automatic synthesis of control software.
Abstract: Many Control Systems are indeed Software Based Control Systems, i.e. control systems whose controller consists of control software running on a microcontroller device. This motivates investigation of Formal Model Based Design approaches for automatic synthesis of control software. Available algorithms and tools (e.g., QKS) may require weeks or even months of computation to synthesize control software for large-size systems. This motivates the search for parallel algorithms for control software synthesis. In this paper, we present a Map-Reduce style parallel algorithm for control software synthesis when the controlled system (plant) is modeled as a discrete-time linear hybrid system. Furthermore, we present an MPI-based implementation PQKS of our algorithm. To the best of our knowledge, this is the first parallel approach for control software synthesis. We experimentally show the effectiveness of PQKS on two classical control synthesis problems: the inverted pendulum and the multi-input buck DC/DC converter. Experiments show that PQKS efficiency is above 65%. As an example, PQKS requires about 16 hours to complete the synthesis of control software for the pendulum on a cluster with 60 processors, instead of the 25 days needed by the sequential algorithm in QKS.

Proceedings ArticleDOI
10 Jun 2012
TL;DR: An application of fuzzy modeling to the problem of telecommunications data prediction is proposed and real world telecommunications data are used in order to highlight the characteristics of the proposed forecaster and to provide a comparative analysis with well-established forecasting models.
Abstract: An application of fuzzy modeling to the problem of telecommunications data prediction is proposed in this paper. The model building process is a two-stage sequential algorithm, based on the Orthogonal Least Squares (OLS) technique. In particular, the OLS is first employed to partition the input space and determine the number of fuzzy rules and the premise parameters. In the sequel, a second orthogonal estimator determines the input terms which should be included in the consequent part of each fuzzy rule and calculates their parameters. Input selection is automatically performed, given a large input candidate set. Real world telecommunications data are used in order to highlight the characteristics of the proposed forecaster and to provide a comparative analysis with well-established forecasting models.

Proceedings Article
01 Jan 2012
TL;DR: This paper proposes a parallel greedy algorithm based on a depth-first branch-and-bound search strategy that efficiently covers a larger portion of the search space by exchanging information about improvements, and finds better solutions for more complicated disturbances such as infrastructure problems.
Abstract: Railways are an important part of the infrastructure in most countries. As the railway networks become more and more saturated, even small traffic disturbances can propagate and have severe consequences. Therefore, efficient re-scheduling support for the traffic managers is needed. In this paper, the train real-time re-scheduling problem is studied in order to minimize the total delay, subject to a set of safety and operational constraints. We propose a parallel greedy algorithm based on a depth-first branch-and-bound search strategy. A number of comprehensive numerical experiments are conducted to compare the parallel implementation to the sequential implementation of the same algorithm in terms of the quality of the solution and the number of nodes evaluated. The comparison is based on 20 disturbance scenarios from three different types of disturbances. Our results show that the parallel algorithm (i) efficiently covers a larger portion of the search space by exchanging information about improvements, and (ii) finds better solutions for more complicated disturbances such as infrastructure problems. Our results show that the parallel implementation significantly improves the solution for 5 out of 20 disturbance scenarios, as compared to the sequential algorithm.

Journal ArticleDOI
TL;DR: In this article, a modified Piyavskii's algorithm is proposed to maximize a univariate differentiable function f by iteratively constructing an upper bounding piecewise-concave function Φ of f and evaluating f at a point where Φ reaches its maximum.
Abstract: Piyavskii's algorithm maximizes a univariate function satisfying a Lipschitz condition. We propose a modified Piyavskii's sequential algorithm which maximizes a univariate differentiable function f by iteratively constructing an upper bounding piecewise-concave function Φ of f and evaluating f at a point where Φ reaches its maximum. We compare the number of iterations needed by the modified Piyavskii's algorithm (n_C) to obtain a bounding piecewise-concave function Φ whose maximum is within ε of the globally optimal value f_opt with the number required by the reference sequential algorithm (n_ref). The main result is that n_C ≤ 2 n_ref + 1, and this bound is sharp. We also show a corresponding bound, in terms of n_ref, on the number of iterations n_B needed by the modified Piyavskii's algorithm to obtain a globally ε-optimal value together with a corresponding point. Lower and upper bounds for n_ref are obtained as functions of f(x), ε, M_1 and M_0, where M_0 is a constant defined by M_0 = sup_{x∈[a,b]} −f''(x) and M_1 ≥ M_0 is an estimate of M_0.
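
For reference, the classic Piyavskii scheme that the paper modifies (piecewise-linear sawtooth bounds built from a Lipschitz constant L, rather than the piecewise-concave bounds built from M_0) fits in a few lines; a minimal sketch under those assumptions:

```python
import heapq

def piyavskii_max(f, a, b, L, eps=1e-4, max_iter=10000):
    """Classic Piyavskii maximization with Lipschitz constant L: on
    [x1, x2] the sawtooth upper bound peaks at
    x = (x1+x2)/2 + (f2-f1)/(2L), with value (f1+f2)/2 + L(x2-x1)/2."""
    fa, fb = f(a), f(b)
    best = max(fa, fb)
    heap = [(-((fa + fb) / 2 + L * (b - a) / 2), a, fa, b, fb)]
    for _ in range(max_iter):
        neg_ub, x1, f1, x2, f2 = heapq.heappop(heap)
        if -neg_ub - best <= eps:     # global upper bound meets incumbent
            break
        x = (x1 + x2) / 2 + (f2 - f1) / (2 * L)
        fx = f(x)
        best = max(best, fx)
        for l, fl, r, fr in ((x1, f1, x, fx), (x, fx, x2, f2)):
            heapq.heappush(heap, (-((fl + fr) / 2 + L * (r - l) / 2),
                                  l, fl, r, fr))
    return best

print(piyavskii_max(lambda x: 1 - (x - 0.3) ** 2, 0.0, 1.0, L=2.0))  # ~1.0
```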

Proceedings ArticleDOI
01 Jul 2012
TL;DR: Combined with the Hyperspectral Image Reduction for Endmember Extraction (HIREE) technique, this version provides an algorithm that is 8 times faster than the original sequential N-FINDER algorithm.
Abstract: The N-FINDER algorithm is widely used for endmember extraction from hyperspectral images. One of the disadvantages of N-FINDER is that its sequential implementations have long run times due to their relatively large computational complexity. A fast parallel version of N-FINDER is developed in this paper. This version, combined with the Hyperspectral Image Reduction for Endmember Extraction (HIREE) technique, provides an algorithm that is 8 times faster than the original sequential N-FINDER algorithm.

Proceedings ArticleDOI
25 Feb 2012
TL;DR: This work presents an algorithm that asymptotically reduces communication for tridiagonalizing a symmetric band matrix and shows that it performs well in practice, demonstrating that avoiding communication improves runtime even at the expense of extra arithmetic.
Abstract: The running time of an algorithm depends on both arithmetic and communication (i.e., data movement) costs, and the relative costs of communication are growing over time. In this work, we present both theoretical and practical results for tridiagonalizing a symmetric band matrix: we present an algorithm that asymptotically reduces communication, and we show that it indeed performs well in practice. The tridiagonalization of a symmetric band matrix is a key kernel in solving the symmetric eigenvalue problem for both full and band matrices. In order to preserve sparsity, tridiagonalization routines use annihilate-and-chase procedures that previously have suffered from poor data locality. We improve data locality by reorganizing the computation, asymptotically reducing communication costs compared to existing algorithms. Our sequential implementation demonstrates that avoiding communication improves runtime even at the expense of extra arithmetic: we observe a 2x speedup over Intel MKL while doing 43% more floating point operations. Our parallel implementation targets shared-memory multicore platforms. It uses pipelined parallelism and a static scheduler while retaining the locality properties of the sequential algorithm. Due to lightweight synchronization and effective data reuse, we see 9.5x scaling over our serial code and up to 6x speedup over the PLASMA library, comparing parallel performance on a ten-core processor.

Proceedings ArticleDOI
10 May 2012
TL;DR: A parallel version of the sequential algorithm "Partition" is proposed, which is fundamentally different from the other sequential algorithms because it scans the database only twice to generate the significant association rules.
Abstract: With the expansion of physical storage media and the ceaseless need to accumulate ever more data, sequential algorithms for mining association rules have proved ineffective, making the introduction of new parallel versions imperative. We propose in this paper a parallel version of the sequential algorithm "Partition". The latter is fundamentally different from the other sequential algorithms because it scans the database only twice to generate the significant association rules. As a consequence, the parallel approach does not require much communication between the sites. The proposed approach was implemented for an experimental study. The obtained results show a great reduction in execution time compared to the sequential version and the Count Distribution algorithm.
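
The two-scan structure of the sequential Partition algorithm being parallelized can be sketched as follows; the local miner here is a deliberately naive enumeration (illustration only, not the paper's implementation):

```python
from itertools import combinations

def local_frequent(transactions, minsup, max_size=3):
    """Naive local miner: all itemsets (up to max_size) meeting the
    relative support threshold within one partition."""
    out = set()
    for size in range(1, max_size + 1):
        counts = {}
        for t in transactions:
            for c in combinations(sorted(t), size):
                counts[c] = counts.get(c, 0) + 1
        out |= {c for c, n in counts.items()
                if n >= minsup * len(transactions)}
    return out

def partition_mine(db, n_parts, minsup):
    """Two scans, as in Partition: (1) union of locally frequent
    itemsets forms the global candidate set, (2) exact counting."""
    size = -(-len(db) // n_parts)
    parts = [db[i:i + size] for i in range(0, len(db), size)]
    candidates = set().union(*(local_frequent(p, minsup) for p in parts))
    counts = {c: sum(set(c) <= set(t) for t in db) for c in candidates}
    return {c: n for c, n in counts.items() if n >= minsup * len(db)}

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(partition_mine(db, n_parts=2, minsup=0.6))
```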