
Showing papers on "Sequential algorithm published in 2011"


Journal ArticleDOI
TL;DR: This work introduces the Routing, Modulation Level and Spectrum Allocation (RMLSA) problem, as opposed to the typical Routing and Wavelength Assignment (RWA) problem of traditional WDM networks, proves that it is also NP-complete and presents various algorithms to solve it.
Abstract: Orthogonal Frequency Division Multiplexing (OFDM) has recently been proposed as a modulation technique for optical networks, because of its good spectral efficiency, flexibility, and tolerance to impairments. We consider the planning problem of an OFDM optical network, where we are given a traffic matrix that includes the requested transmission rates of the connections to be served. Connections are provisioned for their requested rate by elastically allocating spectrum using a variable number of OFDM subcarriers and choosing an appropriate modulation level, taking into account the transmission distance. We introduce the Routing, Modulation Level and Spectrum Allocation (RMLSA) problem, as opposed to the typical Routing and Wavelength Assignment (RWA) problem of traditional WDM networks, prove that it is also NP-complete and present various algorithms to solve it. We start by presenting an optimal ILP RMLSA algorithm that minimizes the spectrum used to serve the traffic matrix, and also present a decomposition method that breaks RMLSA into its two constituent subproblems, namely 1) routing and modulation level and 2) spectrum allocation (RML+SA), and solves them sequentially. We also propose a heuristic algorithm that serves connections one-by-one and use it to solve the planning problem by sequentially serving all the connections in the traffic matrix. In the sequential algorithm, we investigate two policies for defining the order in which connections are considered. We also use a simulated annealing meta-heuristic to obtain even better orderings. We examine the performance of the proposed algorithms through simulation experiments and evaluate the spectrum utilization benefits that can be obtained by utilizing OFDM elastic bandwidth allocation, when compared to a traditional WDM network.
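The one-by-one serving policy lends itself to a compact sketch. Below is a minimal first-fit spectrum-allocation heuristic in Python that enforces spectrum continuity and contiguity along a path. The link names, demand sizes, and slot counts are invented for illustration, and this is a simplified stand-in for the general idea of sequential spectrum allocation, not the paper's actual RMLSA algorithm.

```python
def first_fit_allocate(path_links, n_slots, spectrum):
    """Place a demand needing n_slots contiguous spectrum slots on every link of
    its path (continuity + contiguity constraints). Returns the starting slot,
    or None if the demand is blocked. `spectrum` maps link -> occupied flags."""
    total = len(next(iter(spectrum.values())))
    for start in range(total - n_slots + 1):
        if all(not any(spectrum[l][start:start + n_slots]) for l in path_links):
            for l in path_links:
                for s in range(start, start + n_slots):
                    spectrum[l][s] = True
            return start
    return None

# Ten slots per link; serve three demands one by one (a sequential policy).
spectrum = {"A-B": [False] * 10, "B-C": [False] * 10, "C-D": [False] * 10}
demands = [(["A-B", "B-C"], 3), (["B-C", "C-D"], 2), (["A-B", "B-C", "C-D"], 4)]
starts = [first_fit_allocate(path, n, spectrum) for path, n in demands]
```

The order in which demands are served changes the outcome, which is exactly why the paper investigates ordering policies and simulated-annealing-derived orderings.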

732 citations


Journal ArticleDOI
TL;DR: An extension of a combinatorial characterization due to Erdős and Gallai is used to develop a sequential algorithm for generating a random labeled graph with a given degree sequence, which allows for surprisingly efficient sequential importance sampling.
Abstract: Random graphs with given degrees are a natural next step in complexity beyond the Erdős–Rényi model, yet the degree constraint greatly complicates simulation and estimation. We use an extension of a combinatorial characterization due to Erdős and Gallai to develop a sequential algorithm for generating a random labeled graph with a given degree sequence. The algorithm is easy to implement and allows for surprisingly efficient sequential importance sampling. The resulting probabilities are easily computed on the fly, allowing the user to reweight estimators appropriately, in contrast to some ad hoc approaches that generate graphs with the desired degrees but with completely unknown probabilities. Applications are given, including simulating an ecological network and estimating the number of graphs with a given degree sequence.
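The Erdős–Gallai characterization mentioned above can be stated compactly in code. The sketch below only checks whether a degree sequence is graphical; a sequential sampler of the kind described would apply such a feasibility test repeatedly while assigning edges. This illustrates the criterion itself, not the authors' importance-sampling algorithm.

```python
def is_graphical(degrees):
    """Erdős–Gallai test: a degree sequence is realizable by a simple graph iff
    its sum is even and, for every k (with d sorted non-increasingly),
        sum_{i<=k} d_i <= k(k-1) + sum_{i>k} min(d_i, k)."""
    d = sorted(degrees, reverse=True)
    if sum(d) % 2:
        return False
    for k in range(1, len(d) + 1):
        if sum(d[:k]) > k * (k - 1) + sum(min(x, k) for x in d[k:]):
            return False
    return True
```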

355 citations


Book
27 Aug 2011
TL;DR: This work presents a simple sequential algorithm for the maximum flow problem on a network with n nodes, m arcs, and integer arc capacities bounded by U, and describes a parallel implementation that runs in O(n² log U log p / p) time in the EREW PRAM model using only p processors.
Abstract: We present a simple sequential algorithm for the maximum flow problem on a network with n nodes, m arcs, and integer arc capacities bounded by U. Under the practical assumption that U is polynomially bounded in n, our algorithm runs in time O(nm + n² log n). This result improves the previous best bound of O(nm log(n²/m)), obtained by Goldberg and Tarjan, by a factor of log n for networks that are both nonsparse and nondense, without using any complex data structures. We also describe a parallel implementation of the algorithm that runs in O(n² log U log p / p) time in the EREW PRAM model and uses only p processors, where p = ⌈m/n⌉.
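For orientation, here is a minimal baseline: the Edmonds–Karp shortest-augmenting-path algorithm, which runs in O(nm²) and is far simpler than the scaling algorithm described above. This sketch is a point of comparison only and is not the paper's algorithm; the example network is invented.

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp baseline: repeatedly push flow along a shortest (fewest-arc)
    augmenting path found by BFS. `capacity` is a dict of dicts of integer
    capacities, updated in place into residual capacities."""
    for u in list(capacity):
        for v in list(capacity[u]):
            capacity.setdefault(v, {}).setdefault(u, 0)   # reverse residual arcs
    flow = 0
    while True:
        parent, q = {s: None}, deque([s])
        while q and t not in parent:                      # BFS for a shortest path
            u = q.popleft()
            for v, c in capacity[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            return flow
        bottleneck, v = float("inf"), t                   # min residual on the path
        while parent[v] is not None:
            bottleneck = min(bottleneck, capacity[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:                      # update residual graph
            u = parent[v]
            capacity[u][v] -= bottleneck
            capacity[v][u] += bottleneck
            v = u
        flow += bottleneck
```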

134 citations


Book
12 Sep 2011
TL;DR: The mapping of algorithms structured as depth-p nested FOR loops into special-purpose systolic VLSI linear arrays is addressed by using linear functions to transform the original sequential algorithms into a form suitable for parallel execution on linear arrays.
Abstract: The mapping of algorithms structured as depth-p nested FOR loops into special-purpose systolic VLSI linear arrays is addressed. The mappings are done by using linear functions to transform the original sequential algorithms into a form suitable for parallel execution on linear arrays. A feasible mapping is derived by identifying formal criteria to be satisfied by both the original sequential algorithm and the proposed transformation function. The methodology is illustrated by synthesizing algorithms for matrix multiplication and a version of the Warshall-Floyd transitive closure algorithm.

116 citations


Proceedings ArticleDOI
21 Aug 2011
TL;DR: By combining simple but effective indexing and disk block accessing techniques, a sequential algorithm iOrca is developed that is up to an order-of-magnitude faster than the state-of-the-art.
Abstract: The problem of distance-based outlier detection is difficult to solve efficiently in very large datasets because of potential quadratic time complexity. We address this problem and develop sequential and distributed algorithms that are significantly more efficient than state-of-the-art methods while still guaranteeing the same outliers. By combining simple but effective indexing and disk block accessing techniques, we have developed a sequential algorithm iOrca that is up to an order-of-magnitude faster than the state-of-the-art. The indexing scheme is based on sorting the data points in order of increasing distance from a fixed reference point and then accessing those points based on this sorted order. To speed up the basic outlier detection technique, we develop two distributed algorithms (DOoR and iDOoR) for modern distributed multi-core clusters of machines, connected on a ring topology. The first algorithm passes data blocks from each machine around the ring, incrementally updating the nearest neighbors of the points passed. By maintaining a cutoff threshold, it is able to prune a large number of points in a distributed fashion. The second distributed algorithm extends this basic idea with the indexing scheme discussed earlier. In our experiments, both distributed algorithms exhibit significant improvements compared to the state-of-the-art distributed method [13].
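The indexing idea (scan points in order of distance from a fixed reference point, prune with a running cutoff) can be sketched as follows. This is a simplified, single-machine illustration of an ORCA-style nested loop with cutoff pruning, not the iOrca implementation; taking the first point as the reference is an arbitrary choice for the example.

```python
import math

def top_outliers(points, k, n_out):
    """Distance-based outlier detection (score = distance to the k-th nearest
    neighbor) with cutoff pruning. Points are scanned in order of distance
    from a fixed reference point, echoing the indexing scheme above."""
    ref = points[0]
    order = sorted(points, key=lambda p: math.dist(p, ref))
    top, cutoff = [], 0.0           # current best outliers and pruning threshold
    for p in order:
        neighbors, pruned = [], False
        for q in order:
            if q is p:
                continue
            neighbors.append(math.dist(p, q))
            neighbors.sort()
            del neighbors[k:]       # keep only the k nearest found so far
            if len(neighbors) == k and neighbors[-1] <= cutoff:
                pruned = True       # p can no longer beat the current top-n
                break
        if not pruned:
            top.append((neighbors[-1], p))
            top.sort(reverse=True)
            del top[n_out:]
            if len(top) == n_out:
                cutoff = top[-1][0]
    return [p for _, p in top]
```

The cutoff rises as stronger outliers are found, so most inliers are discarded after examining only a few neighbors, which is the source of the speedup in this family of algorithms.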

89 citations


Journal ArticleDOI
Duhu Man1, Kenji Uda1, Hironobu Ueyama1, Yasuaki Ito1, Koji Nakano1 
TL;DR: A simple parallel algorithm for the EDM is developed and implemented and it achieves a speedup factor of 18 over the performance of a sequential algorithm using a single processor in the same system.
Abstract: Given a 2-D binary image of size n×n, the Euclidean Distance Map (EDM) is a 2-D array of the same size such that each element stores the Euclidean distance to the nearest black pixel. It is known that a sequential algorithm can compute the EDM in O(n²) time, and this algorithm is therefore optimal. Also, work-time optimal parallel algorithms for the shared memory model have been presented. However, the presented parallel algorithms are too complicated to implement on existing shared memory parallel machines. The main contribution of this paper is to develop a simple parallel algorithm for the EDM and implement it on two different parallel platforms: multicore processors and Graphics Processing Units (GPUs). We have implemented our parallel algorithm on a Linux server with four Intel six-core processors (Intel Xeon X7460 2.66GHz). We have also implemented it on two modern GPU systems, the Tesla C1060 and the GTX 480. The experimental results have shown that, for an input binary image with size of 9216×9216, our implementation on the multicore system achieves a speedup factor of 18 over the performance of a sequential algorithm using a single processor in the same system. Meanwhile, for the same input binary image, our implementation on the GPU achieves a speedup factor of 26 over the sequential algorithm implementation.
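One well-known way to reach the optimal O(n²) sequential bound is the two-pass squared-distance transform of Felzenszwalb and Huttenlocher, sketched below. The abstract does not say which O(n²) algorithm it refers to, so this is an illustrative stand-in rather than the algorithm the authors benchmark against.

```python
import math

INF = 1e18  # stands in for "no black pixel seen in this row/column"

def dt1d(f):
    """Exact 1-D squared-distance transform: d[q] = min_p (q - p)^2 + f[p],
    computed in linear time via the lower envelope of parabolas."""
    n = len(f)
    d = [0.0] * n
    v = [0] * n              # parabola origins on the envelope
    z = [0.0] * (n + 1)      # envelope breakpoints
    k = 0
    z[0], z[1] = -INF, INF
    for q in range(1, n):
        s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        while s <= z[k]:
            k -= 1
            s = ((f[q] + q * q) - (f[v[k]] + v[k] * v[k])) / (2 * q - 2 * v[k])
        k += 1
        v[k] = q
        z[k], z[k + 1] = s, INF
    k = 0
    for q in range(n):
        while z[k + 1] < q:
            k += 1
        d[q] = (q - v[k]) ** 2 + f[v[k]]
    return d

def edm(image):
    """Euclidean distance map of a binary image (True = black pixel): run the
    1-D transform over columns, then rows, then take square roots."""
    rows, cols = len(image), len(image[0])
    g = [[0.0 if image[y][x] else INF for x in range(cols)] for y in range(rows)]
    for x in range(cols):                       # transform each column
        col = dt1d([g[y][x] for y in range(rows)])
        for y in range(rows):
            g[y][x] = col[y]
    return [[math.sqrt(v) for v in dt1d(row)] for row in g]  # then each row
```

The row pass and the column pass each consist of independent 1-D problems, which is also what makes this family of algorithms attractive to parallelize.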

72 citations


Journal ArticleDOI
TL;DR: This paper presents a new algorithm for the general case of multiple LCS, i.e., finding an LCS of any number of strings, and its parallel realization, based on the dominant point approach and employs a fast divide-and-conquer technique to compute the dominant points.
Abstract: Finding the longest common subsequence (LCS) of multiple strings is an NP-hard problem, with many applications in the areas of bioinformatics and computational genomics. Although significant efforts have been made to address the problem and its special cases, the increasing complexity and size of biological data require more efficient methods applicable to an arbitrary number of strings. In this paper, we present a new algorithm for the general case of multiple LCS (or MLCS) problem, i.e., finding an LCS of any number of strings, and its parallel realization. The algorithm is based on the dominant point approach and employs a fast divide-and-conquer technique to compute the dominant points. When applied to a case of three strings, our algorithm demonstrates the same performance as the fastest existing MLCS algorithm designed for that specific case. When applied to more than three strings, our algorithm is significantly faster than the best existing sequential methods, reaching up to 2-3 orders of magnitude faster speed on large-size problems. Finally, we present an efficient parallel implementation of the algorithm. Evaluating the parallel algorithm on a benchmark set of both random and biological sequences reveals a near-linear speedup with respect to the sequential algorithm.
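As a point of reference, the classic two-string LCS dynamic program, which dominant-point methods generalize to an arbitrary number of strings, looks like this (a baseline sketch, not the paper's MLCS algorithm):

```python
def lcs_length(a, b):
    """Classic O(|a| * |b|) dynamic program for the LCS of two strings, kept to
    two rows of the table. Dominant-point MLCS methods work on the same lattice
    but only materialize the non-dominated ("dominant") points."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ch == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

For d strings this table becomes d-dimensional, which is why the full dynamic program is infeasible and sparse dominant-point representations pay off.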

69 citations


Book ChapterDOI
11 Sep 2011
TL;DR: This work extends work on analyzing massive graph data with the first massively parallel algorithm for community detection that scales to current data sizes, handling graphs of over 122 million vertices and nearly 2 billion edges in under 7300 seconds on a massively multithreaded Cray XMT.
Abstract: Tackling the current volume of graph-structured data requires parallel tools. We extend our work on analyzing such massive graph data with the first massively parallel algorithm for community detection that scales to current data sizes, handling graphs of over 122 million vertices and nearly 2 billion edges in under 7300 seconds on a massively multithreaded Cray XMT. Our algorithm achieves moderate parallel scalability without sacrificing sequential operational complexity. Community detection partitions a graph into subgraphs more densely connected within the subgraph than to the rest of the graph. We take an agglomerative approach similar to Clauset, Newman, and Moore's sequential algorithm, merging pairs of connected intermediate subgraphs to optimize different graph properties. Working in parallel opens new approaches to high performance. On smaller data sets, we find the output's modularity compares well with the standard sequential algorithms.
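The modularity objective that such agglomerative methods optimize can be computed directly from an edge list. The sketch below uses the community-wise form Q = Σ_c (m_c/m − (d_c/2m)²); it illustrates the quantity being optimized, not the parallel algorithm itself, and the example graph is invented.

```python
def modularity(edges, community):
    """Newman modularity in community-wise form, Q = sum_c (m_c/m - (d_c/2m)^2),
    where m_c counts intra-community edges and d_c is the total degree inside
    community c. Agglomerative methods greedily merge communities to raise Q."""
    m = len(edges)
    m_c, d_c = {}, {}
    for u, v in edges:
        d_c[community[u]] = d_c.get(community[u], 0) + 1
        d_c[community[v]] = d_c.get(community[v], 0) + 1
        if community[u] == community[v]:
            m_c[community[u]] = m_c.get(community[u], 0) + 1
    return sum(m_c.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in d_c.items())
```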

60 citations


Proceedings ArticleDOI
30 Nov 2011
TL;DR: This paper presents a GPU implementation of computing the Euclidean Distance Map (EDM) with efficient memory access, and shows that, for an input binary image with size of 9216×9216, the implementation can achieve a speedup factor of 52 over the sequential algorithm implementation.
Abstract: Recent Graphics Processing Units (GPUs), which have many processing units, can be used for general purpose parallel computation. To utilize their powerful computing ability, GPUs are widely used for general purpose processing. Since GPUs have very high memory bandwidth, the performance of GPUs greatly depends on memory access. The main contribution of this paper is to present a GPU implementation of computing the Euclidean Distance Map (EDM) with efficient memory access. Given a 2-D binary image, the EDM is a 2-D array of the same size such that each element stores the Euclidean distance to the nearest black pixel. In the proposed GPU implementation, we have considered many programming issues of the GPU system such as coalesced access of global memory, shared memory bank conflicts, and partition camping. In practice, we have implemented our parallel algorithm on two modern GPU systems: Tesla C1060 and GTX 480. The experimental results have shown that, for an input binary image with size of 9216×9216, our implementation can achieve a speedup factor of 52 over the sequential algorithm implementation.

49 citations


Book ChapterDOI
01 Jan 2011
TL;DR: Most important is the way in which the number of registers required, the details of data dependency in advancing the state, and the desire for memory coalescence in storing the output lead to different implementations in the three cases.
Abstract: Publisher Summary Random number generation is a key component of many forms of simulation, and fast parallel generation is particularly important for the naturally parallel Monte Carlo simulations that are used extensively in computational finance and many areas of computational science and engineering. This chapter discusses the parallelization of three very popular random number generators. In each case, the random number sequence that is generated is identical to that produced on a CPU by the standard sequential algorithm. The key to the parallelization is that each CUDA thread block generates a particular block of numbers within the original sequence, and to do this step, it needs an efficient skip-ahead algorithm to jump to the start of its block. Although there is much in common in the underlying mathematical formulation of these three generators, there are also very significant differences owing to differences in the size of the state information required by each generator. The Intel random number generators are contained in the vector statistical library (VSL). This library is not multithreaded, but is thread safe and contains all the necessary skip-ahead functions to advance the generators' states. Most importantly, the chapter shows how the number of registers required, the details of data dependency in advancing the state, and the desire for memory coalescence in storing the output lead to different implementations in the three cases.
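For the simplest generator family, a linear congruential generator, the skip-ahead that each thread block needs reduces to binary exponentiation of an affine map. The sketch below shows that idea only; the chapter's actual generators require analogous but more involved jump-ahead machinery, and the constants here are just commonly quoted LCG parameters, not ones from the chapter.

```python
def lcg_next(s, a, c, m):
    """One step of a linear congruential generator."""
    return (a * s + c) % m

def lcg_skip(s, k, a, c, m):
    """Jump the LCG state forward k steps in O(log k) multiplications by
    binary exponentiation of the affine map s -> a*s + c (mod m)."""
    A, C = 1, 0        # accumulated map, initially the identity
    ba, bc = a, c      # base map raised to successive powers of two
    while k:
        if k & 1:
            A, C = (A * ba) % m, (C * ba + bc) % m   # compose base into result
        ba, bc = (ba * ba) % m, (ba * bc + bc) % m   # square the base map
        k >>= 1
    return (A * s + C) % m
```

A thread block responsible for the i-th chunk of the sequence would call the skip function once with k = i × chunk_size and then generate its chunk sequentially.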

33 citations


Journal ArticleDOI
TL;DR: The calculation results for the test set show that the native catalytic residue sites were successfully identified and ranked within the top 10 designs for 7 of the 10 chemical reactions, indicating that the matching algorithm has the potential to be used for designing industrial enzymes for desired reactions.
Abstract: A loop closure-based sequential algorithm, PRODA_MATCH, was developed to match catalytic residues onto a scaffold for enzyme design in silico. The computational complexity of this algorithm is polynomial with respect to the number of active sites, the number of catalytic residues, and the maximal iteration number of cyclic coordinate descent steps. This matching algorithm is independent of a rotamer library that enables the catalytic residue to take any required conformation during the reaction coordinate. The catalytic geometric parameters defined between functional groups of transition state (TS) and the catalytic residues are continuously optimized to identify the accurate position of the TS. Pseudo-spheres are introduced for surrounding residues, which make the algorithm take binding into account as early as during the matching process. Recapitulation of native catalytic residue sites was used as a benchmark to evaluate the novel algorithm. The calculation results for the test set show that the native catalytic residue sites were successfully identified and ranked within the top 10 designs for 7 of the 10 chemical reactions. This indicates that the matching algorithm has the potential to be used for designing industrial enzymes for desired reactions.

Proceedings ArticleDOI
08 May 2011
TL;DR: The results show a significant speed-up of the algorithm compared to the time required to solve the algorithm in a conventional CPU, even when a more efficient sequential algorithm, such as the Newton-Raphson, is used.
Abstract: This paper presents an implementation of the Jacobi power flow algorithm to be run on a single instruction multiple data (SIMD) unit processor. The purpose is to be able to solve a large number of power flows in parallel as quickly as possible. This well-known algorithm was modified taking into account the characteristics of the SIMD architecture. The results show a significant speed-up of the algorithm compared to the time required to solve the algorithm in a conventional CPU, even when a more efficient sequential algorithm, such as the Newton-Raphson, is used. The accuracy of the performance has been validated with the results of the IEEE-118 standard network.
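The Jacobi method's appeal for SIMD hardware is that every unknown is updated independently from the previous iterate. A generic Jacobi iteration for a linear system (not the power-flow-specific update used in the paper) can be sketched as:

```python
def jacobi(A, b, iters=100):
    """Jacobi iteration for A x = b: every component of the new iterate is
    computed independently from the old one,
        x_i <- (b_i - sum_{j != i} A_ij * x_j) / A_ii,
    which is exactly the independence that SIMD lanes can exploit."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x
```

Convergence is guaranteed for diagonally dominant systems; the trade-off against Newton-Raphson noted in the abstract is more iterations of much cheaper, fully parallel work.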

Book ChapterDOI
11 Sep 2011
TL;DR: The deconvolution of 3D Fluorescence Microscopy RGB images is considered, describing the benefits arising from facing medical imaging problems on modern graphics processing units (GPUs), which are inexpensive parallel processing devices available on many up-to-date personal computers.
Abstract: We consider the deconvolution of 3D Fluorescence Microscopy RGB images, describing the benefits arising from facing medical imaging problems on modern graphics processing units (GPUs), which are inexpensive parallel processing devices available on many up-to-date personal computers. We found that the execution time of the CUDA version is about two orders of magnitude less than that of the sequential algorithm. Nevertheless, the experiments prompt some reflections on the best configuration for the CUDA-based algorithm. That is, we note the need to model GPU architectures and their characteristics to better describe the performance of GPU algorithms and what we can expect of them.

Book ChapterDOI
26 Mar 2011
TL;DR: This work presents IFDS-A, a parallel algorithm for solving context-sensitive interprocedural finite distributive subset (IFDS) dataflow problems, and concludes that Actors are an effective way to parallelize this type of algorithm.
Abstract: Defining algorithms in a way which allows parallel execution is becoming increasingly important as multicore computers become ubiquitous. We present IFDS-A, a parallel algorithm for solving context-sensitive interprocedural finite distributive subset (IFDS) dataflow problems. IFDS-A defines these problems in terms of Actors, and dataflow dependencies as messages passed between these Actors. We implement the algorithm in Scala, and evaluate its performance against a comparable sequential algorithm. With eight cores, IFDS-A is 6.12 times as fast as with one core, and 3.35 times as fast as a baseline sequential algorithm. We also found that Scala's default Actors implementation is not optimal for this algorithm, and that a custom-built implementation outperforms it by a significant margin. We conclude that Actors are an effective way to parallelize this type of algorithm.

01 Jan 2011
TL;DR: A novel data structure, g-tries, is proposed to represent a collection of graphs. Akin to a prefix tree, it takes advantage of common substructures both to reduce the memory needed to store the graphs and to produce a new, more efficient sequential algorithm to compute their frequency as subgraphs of another larger graph.
Abstract: Networks are a powerful representation for a multitude of natural and artificial systems. They are ubiquitous in real-world systems, presenting substantial non-trivial topological features. These are called complex networks and have received increasing attention in recent years. In order to understand their design principles, the concept of network motifs emerged. These are recurrent over-represented patterns of interconnections, conjectured to have some significance, that can be seen as basic building blocks of networks. Algorithmically, discovering network motifs is a hard problem related to graph isomorphism. The needed execution time grows exponentially as the size of networks or motifs increases, thus limiting their applicability. Since motifs are a fundamental concept, increasing the efficiency in its detection can lead to new insights in several areas of knowledge. To develop efficient and scalable algorithms for motifs discovery is precisely the main aim of this thesis. We provide a thorough survey of existing methods, complete with an associated chronology, taxonomy, algorithmic description and empirical evaluation and comparison. We propose a novel data-structure, g-tries, designed to represent a collection of graphs. Akin to a prefix tree, it takes advantage of common substructures to both reduce the memory needed to store the graphs, and to produce a new more efficient sequential algorithm to compute their frequency as subgraphs of another larger graph. We also introduce a sampling methodology for g-tries that successfully trades accuracy for faster execution times. We identify opportunities for parallelism in motif discovery, creating an associated taxonomy. We expose the whole motif computation as a tree based search and devise a general methodology for parallel execution with dynamic load balancing, including a novel strategy capable of efficiently stopping and dividing computation on the fly. 
In particular we provide parallel algorithms for ESU and g-tries. Finally, we extensively evaluate our algorithms on a set of diversified complex networks. We show that we are able to outperform all existing sequential algorithms, and are able to scale our parallel algorithms up to 128 processors almost linearly. By combining the power of g-tries and parallelism, we speedup motif discovery by several orders of magnitude, thus effectively pushing the limits in its applicability.
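The prefix-tree analogy can be made concrete with strings: a trie stores each shared prefix once, just as a g-trie stores shared subgraph structure once. The sketch below is only the string analogue; an actual g-trie stores graphs and is substantially more involved.

```python
class Trie:
    """String prefix tree illustrating the sharing that g-tries exploit for
    graphs: common prefixes (here, of strings; there, of subgraph structure)
    are stored once, so a search can discard whole families of patterns at
    a single node."""
    def __init__(self):
        self.children = {}
        self.terminal = False   # True if a stored word ends here

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.terminal = True

    def count_nodes(self):
        return 1 + sum(c.count_nodes() for c in self.children.values())
```

Storing "car", "cart" and "cat" separately takes ten characters; the trie holds them in five non-root nodes because the shared prefixes exist only once.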

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a sequential approach to determine the unknown parameters for inverse heat conduction problems which have multiple time-dependent heat sources and discussed the sensitivity problem and analyzed what factors cause the growth in error sensitivity.

Book ChapterDOI
13 Jun 2011
TL;DR: This work introduces a parallel iterated tabu search heuristic for solving eight different variants of the vehicle routing problem and shows that the proposed heuristic is both general and competitive with specific heuristics designed for each problem type.
Abstract: We introduce a parallel iterated tabu search heuristic for solving eight different variants of the vehicle routing problem. Through extensive computational results we show that the proposed heuristic is both general and competitive with specific heuristics designed for each problem type.

Proceedings ArticleDOI
14 Mar 2011
TL;DR: A novel floorplanning algorithm based on simulated annealing on GPUs that achieves 6–160X speedup for a range of MCNC and GSRC benchmarks, while delivering comparable or better solution quality.
Abstract: In this paper, we propose a novel floorplanning algorithm based on simulated annealing on GPUs. Simulated annealing is an inherently sequential algorithm, far from the typical programs suitable for Single Instruction Multiple Data (SIMD) style concurrency in a GPU. We propose a fundamentally different approach of exploring the floorplan solution space, where we evaluate concurrent moves on a given floorplan. We illustrate several performance optimization techniques for this algorithm on GPUs. Compared to the sequential algorithm, our techniques achieve 6–160X speedup for a range of MCNC and GSRC benchmarks, while delivering comparable or better solution quality.
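The idea of evaluating several moves on the same floorplan before committing can be mimicked sequentially. The sketch below is a generic simulated-annealing skeleton that scores a small batch of candidate moves per temperature step and keeps the best accepted one; the toy cost function and all parameters are invented, and this is not the paper's GPU algorithm.

```python
import math, random

def anneal(cost, neighbor, x0, t0=10.0, cooling=0.95, steps=200, batch=4, seed=1):
    """Simulated-annealing skeleton: at each temperature step, score a batch of
    candidate moves on the current solution and commit the best accepted one
    (a sequential mimic of evaluating concurrent moves)."""
    rng = random.Random(seed)
    x = best_x = x0
    t = t0
    for _ in range(steps):
        candidates = [neighbor(x, rng) for _ in range(batch)]
        accepted = [y for y in candidates if cost(y) < cost(x)
                    or rng.random() < math.exp((cost(x) - cost(y)) / t)]
        if accepted:
            x = min(accepted, key=cost)      # commit the best accepted move
            if cost(x) < cost(best_x):
                best_x = x                   # remember the best solution seen
        t *= cooling                         # geometric cooling schedule
    return best_x

# Toy "placement" problem: order modules to minimize adjacent-pair differences.
def cost(perm):
    return sum(abs(a - b) for a, b in zip(perm, perm[1:]))

def neighbor(perm, rng):
    i, j = rng.sample(range(len(perm)), 2)   # swap two positions
    y = list(perm)
    y[i], y[j] = y[j], y[i]
    return y

best = anneal(cost, neighbor, x0=[5, 1, 4, 2, 3, 0])
```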

Journal ArticleDOI
TL;DR: In this paper, a D-optimal minimax design criterion is proposed to construct two-level fractional factorial designs, which can be used to estimate a linear model with main effects and some specified interactions.

01 Jan 2011
TL;DR: A variant of the universal construction that keeps a bounded state, provides wait-free parallel processing, tolerates thread crashes, and handles non-terminating operations is introduced.
Abstract: The universal construction shows how to convert a sequential algorithm into a concurrent wait-free algorithm. We introduce a variant of this construction that (1) keeps a bounded state, (2) provides wait-free parallel processing, (3) tolerates thread crashes, and (4) handles non-terminating operations. The foundation of this construction is a wait-free transactional memory that is capable of isolating crash failures and non-termination failures.

Journal ArticleDOI
TL;DR: Two parallel algorithms for numerical simulation of optical wave propagation have been constructed and it is shown that the parallel algorithms have a significant speed advantage (by tens of times) over the common sequential algorithm; and the larger the grids in a computation task, the more significant the advantage.
Abstract: Methods and peculiarities of parallel algorithms for numerical simulation of optical wave propagation are considered. A scalar parabolic equation for the complex amplitude of monochromatic-wave field was solved numerically using the Fourier transform method for homogeneous media and split-step Fourier method for inhomogeneous media. Two parallel algorithms have been constructed—with the use of OpenMP technology with the MKL library for Intel multicore processors and CUDA technology for NVIDIA graphics accelerators. Speed comparison of these algorithms with each other and with a conventional sequential two-dimensional algorithm from the FFTW library is carried out by calculating the average number of test task solutions per second. It is shown that the parallel algorithms have a significant speed advantage (by tens of times) over the common sequential algorithm; and the larger the grids in a computation task, the more significant the advantage. Comparison of the above parallel algorithms shows the following: the approach based on the OpenMP technology holds the lead for grids of up to 1024 × 1024 in size, while the approach using CUDA technology was faster for large grids (from 1024 × 1024 or larger). The results are discussed, and recommendations on switching from sequential algorithms to the parallel ones are given.
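For a homogeneous medium, one propagation step of the Fourier method is: transform, multiply each mode by a phase quadratic in the spatial frequency, transform back. The sketch below shows that structure in 1-D with a deliberately naive O(N²) DFT and illustrative (unit-free) scaling; production code would use FFTW, MKL, or CUDA FFTs as in the paper.

```python
import cmath

def dft(f, sign):
    """Naive O(N^2) discrete Fourier transform (sign=-1: forward; sign=+1:
    inverse, which also divides by N). A real implementation would use an FFT."""
    n = len(f)
    out = [sum(f[j] * cmath.exp(sign * 2j * cmath.pi * k * j / n) for j in range(n))
           for k in range(n)]
    return [v / n for v in out] if sign > 0 else out

def propagate(field, z):
    """One Fourier-method step for a homogeneous medium: transform, apply a
    phase quadratic in spatial frequency, transform back. The frequency
    scaling here is illustrative, not in physical units."""
    n = len(field)
    spectrum = dft(field, -1)
    freqs = [k if k <= n // 2 else k - n for k in range(n)]   # signed frequencies
    spectrum = [s * cmath.exp(-1j * (2 * cmath.pi * kx / n) ** 2 * z)
                for s, kx in zip(spectrum, freqs)]
    return dft(spectrum, +1)
```

Because the step is unitary, propagating forward and then backward by the same distance recovers the input field, which makes a convenient sanity check.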

Journal ArticleDOI
TL;DR: Empirical results show that the proposed parallel scatter search algorithms yield good speed-up and improve solution quality because they explore larger parts of the search space within reasonable time, in contrast with the sequential algorithm.
Abstract: University exam timetabling refers to scheduling exams into predefined days, time periods and rooms, given a set of constraints. Exam timetabling is a computationally intractable optimization problem, which requires heuristic techniques for producing adequate solutions within reasonable execution time. For large numbers of exams and students, sequential algorithms are likely to be time consuming. This paper presents parallel scatter search meta-heuristic algorithms for producing good sub-optimal exam timetables in a reasonable time. Scatter search is a population-based approach that generates solutions over a number of iterations and aims to combine diversification and search intensification. The authors propose parallel scatter search algorithms that are based on distributing the population of candidate solutions over a number of processors in a PC cluster environment. The main components of scatter search are computed in parallel and efficient communication techniques are employed. Empirical results show that the proposed parallel scatter search algorithms yield good speed-up. Also, they show that parallel scatter search algorithms improve solution quality because they explore larger parts of the search space within reasonable time, in contrast with the sequential algorithm.

Journal ArticleDOI
TL;DR: A new reduction technique, which preserves a non-prescribed subset of the original state variables in the reduced model, is presented in this work, derived from the Petrov–Galerkin projection by adding constraints on the projection matrix.

Journal ArticleDOI
TL;DR: This paper describes the approach to adapting a text document similarity classifier based on the Term Frequency Inverse Document Frequency (TFIDF) metric to two massively multi-core hardware platforms.

Book ChapterDOI
06 Jul 2011
TL;DR: An efficient parallel algorithm for reconstruction from markers, and multi-scale analysis through differential morphological profiles, which are top-hat scale spaces based on openings and closings by reconstruction, which provides speed gain through parallelism and more efficient re-use of previously computed data.
Abstract: In this paper we provide an efficient parallel algorithm for reconstruction from markers, and multi-scale analysis through differential morphological profiles, which are top-hat scale spaces based on openings and closings by reconstruction. The new algorithms provide speed gain in two ways: (i) through parallelism, and (ii) through more efficient re-use of previously computed data. The best version of the algorithm provided a 17× speed-up on 24 cores, over computation of the same algorithm on a single core. Compared to the basic method of repeated reconstructions by a sequential algorithm, a speed gain of 25.1 times was obtained.
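Reconstruction from markers has a simple queue-based formulation in the binary case: flood the marker through the mask. The gray-scale reconstructions used for differential morphological profiles generalize this; the sketch below shows only the binary, 4-connected case, and is not the paper's parallel algorithm.

```python
from collections import deque

def reconstruct(marker, mask):
    """Binary morphological reconstruction: grow the marker inside the mask by
    repeated conditional dilation, implemented as a BFS flood fill over
    4-connected neighbors."""
    rows, cols = len(mask), len(mask[0])
    out = [[False] * cols for _ in range(rows)]
    q = deque((y, x) for y in range(rows) for x in range(cols)
              if marker[y][x] and mask[y][x])
    for y, x in q:                      # seed the output with the marker
        out[y][x] = True
    while q:
        y, x = q.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not out[ny][nx]:
                out[ny][nx] = True
                q.append((ny, nx))
    return out
```

Only connected components of the mask that contain a marker pixel survive, which is what makes reconstruction useful for openings and closings "by reconstruction".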

Book ChapterDOI
08 Sep 2011
TL;DR: An abstraction to alleviate the difficulty of programming with threads is proposed, which makes available a virtual time in which events in different program time-lines are sequentialized.
Abstract: We propose an abstraction to alleviate the difficulty of programming with threads This abstraction is not directly usable by application programmers Instead, application-visible behavior is defined through a semantical plugin, and invoked via a language or library that uses the plugin The main benefit is that parallel language runtimes become simpler to implement, because they use sequential algorithms for the parallel semantics This is possible because the abstraction makes available a virtual time in which events in different program time-lines are sequentialized The parallel semantics relate events in different time-lines via relating the sequentialized versions within the virtual time-line

Journal ArticleDOI
TL;DR: In this article, a sequential Monte Carlo algorithm is proposed to estimate a stochastic volatility model with leverage effects and non-constant conditional mean and jumps, which relies on the auxiliary particle filter algorithm mixed together with Markov Chain Monte Carlo (MCMC) methodology.
Abstract: In this paper we propose a sequential Monte Carlo algorithm to estimate a stochastic volatility model with leverage effects, non-constant conditional mean and jumps. We are interested in estimating the time-invariant parameters and the non-observable dynamics involved in the model. Our idea relies on the auxiliary particle filter algorithm mixed together with Markov Chain Monte Carlo (MCMC) methodology. Adding an MCMC step to the auxiliary particle filter prevents numerical degeneracies in the sequential algorithm and allows sequential evaluation of the fixed parameters and the latent processes. Empirical evaluation on simulated and real data is presented to assess the performance of the algorithm.
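The propagate/weight/resample cycle underlying such filters can be shown with a plain bootstrap particle filter on a toy stochastic-volatility-style model. The auxiliary weighting and the MCMC rejuvenation step that the paper adds are omitted, and the model parameters below are invented for the example.

```python
import math, random

def bootstrap_pf(obs, n_particles=500, phi=0.95, sigma=0.3, seed=7):
    """Plain bootstrap particle filter for a toy stochastic-volatility-style
    model: x_t = phi * x_{t-1} + N(0, sigma^2) and y_t ~ N(0, exp(x_t)).
    Only the propagate / weight / resample cycle is shown."""
    rng = random.Random(seed)
    parts = [rng.gauss(0, 1) for _ in range(n_particles)]
    means = []
    for y in obs:
        parts = [phi * x + rng.gauss(0, sigma) for x in parts]     # propagate
        # weight by the likelihood of y under N(0, exp(x)), up to a constant
        w = [math.exp(-0.5 * (x + y * y * math.exp(-x))) for x in parts]
        total = sum(w)
        w = [wi / total for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, parts)))       # filtered mean
        parts = rng.choices(parts, weights=w, k=n_particles)       # resample
    return means
```

Repeated multinomial resampling like this is exactly what causes the particle degeneracy that the paper's MCMC step is designed to counteract.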


Journal ArticleDOI
TL;DR: In this article, a soft-decision-driven sequential channel estimation algorithm is proposed for the pipelined turbo equalizer architecture operating on orthogonal frequency division multiplexing (OFDM) symbols.
Abstract: We consider channel estimation specific to turbo equalization for multiple-input multiple-output (MIMO) wireless communication. We develop a soft-decision-driven sequential algorithm geared to the pipelined turbo equalizer architecture operating on orthogonal frequency division multiplexing (OFDM) symbols. One interesting feature of the pipelined turbo equalizer is that multiple soft-decisions become available at various processing stages. A tricky issue is that these multiple decisions from different pipeline stages have varying levels of reliability. This paper establishes an effective strategy for the channel estimator to track the target channel, while dealing with observation sets with different qualities. The resulting algorithm is basically a linear sequential estimation algorithm and, as such, is Kalman-based in nature. The main difference here, however, is that the proposed algorithm employs puncturing on observation samples to effectively deal with the inherent correlation among the multiple demapper/decoder module outputs that cannot easily be removed by the traditional innovations approach. The proposed algorithm continuously monitors the quality of the feedback decisions and incorporates it in the channel estimation process. The proposed channel estimation scheme shows clear performance advantages relative to existing channel estimation techniques.
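The Kalman-based flavor of the estimator, and the idea of discounting unreliable soft decisions, can be illustrated with a scalar tracker in which each observation carries a reliability weight that inflates its measurement variance. This is a crude stand-in for the paper's quality monitoring and puncturing strategy; all parameters and the observation format are invented.

```python
def kalman_track(observations, q=1e-3, base_r=0.05):
    """Scalar Kalman-style sequential estimator of a slowly varying channel tap
    h, with observations y_t = h_t * x_t + noise. Each sample carries a
    reliability in (0, 1]; unreliable soft decisions get an inflated
    measurement variance, crudely standing in for decision-quality monitoring."""
    h, p = 0.0, 1.0                     # estimate of h and its variance
    for x, y, reliability in observations:
        p += q                          # random-walk process noise
        r = base_r / reliability        # less reliable -> larger noise variance
        gain = p * x / (x * x * p + r)  # Kalman gain
        h += gain * (y - x * h)         # innovation update
        p *= (1.0 - gain * x)
    return h
```

Setting a sample's reliability near zero makes its gain vanish, which is the same effect as puncturing that observation out of the update.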

Proceedings ArticleDOI
06 Jul 2011
TL;DR: Improvements to an existing parallel model checking algorithm are proposed and the resulting new algorithm has better scalability and performance than both the former parallel approach and the sequential algorithm.
Abstract: Formal verification is becoming a fundamental step of safety-critical and model-based software development. As part of the verification process, model checking is one of the current advanced techniques to analyze the behavior of a system. In this paper, we examine an existing parallel model checking algorithm and we propose improvements to eliminate some computational bottlenecks. Our measurements show that the resulting new algorithm has better scalability and performance than both the former parallel approach and the sequential algorithm.