Showing papers on "QR decomposition" published in 2015


Journal ArticleDOI
TL;DR: The versatility of the SDF framework is demonstrated by means of four diverse applications, which are all solved entirely within Tensorlab's DSL.
Abstract: We present structured data fusion (SDF) as a framework for the rapid prototyping of knowledge discovery in one or more possibly incomplete data sets. In SDF, each data set—stored as a dense, sparse, or incomplete tensor—is factorized with a matrix or tensor decomposition. Factorizations can be coupled, or fused, with each other by indicating which factors should be shared between data sets. At the same time, factors may be imposed to have any type of structure that can be constructed as an explicit function of some underlying variables. With the right choice of decomposition type and factor structure, even well-known matrix factorizations such as the eigenvalue decomposition, singular value decomposition and QR factorization can be computed with SDF. A domain specific language (DSL) for SDF is implemented as part of the software package Tensorlab, with which we offer a library of tensor decompositions and factor structures to choose from. The versatility of the SDF framework is demonstrated by means of four diverse applications, which are all solved entirely within Tensorlab’s DSL.

185 citations


Journal ArticleDOI
TL;DR: A new framework for constructing the discrete empirical interpolation method (DEIM) projection operator is introduced, formulated using the QR factorization with column pivoting, and it enjoys a sharper error bound for the DEIM projection error.
Abstract: This paper introduces a new framework for constructing the Discrete Empirical Interpolation Method (DEIM) projection operator. The interpolation node selection procedure is formulated using the QR factorization with column pivoting, and it enjoys a sharper error bound for the DEIM projection error. Furthermore, for a subspace $\mathcal{U}$ given as the range of an orthonormal $U$, the DEIM projection does not change if $U$ is replaced by $U \Omega$ with an arbitrary unitary matrix $\Omega$. In a large-scale setting, the new approach allows modifications that use only randomly sampled rows of $U$, but with the potential of producing good approximations with corresponding probabilistic error bounds. Another salient feature of the new framework is that robust and efficient software implementation is easily developed, based on readily available high performance linear algebra packages.

177 citations
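
The node selection described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: it assumes SciPy's pivoted QR is an acceptable stand-in for the column-pivoted factorization, and the function names `qdeim_indices` and `deim_project` are hypothetical.

```python
import numpy as np
from scipy.linalg import qr

def qdeim_indices(U):
    """Pick m interpolation rows for an n-by-m orthonormal basis U via pivoted QR of U^T."""
    m = U.shape[1]
    # Column pivoting on U^T ranks the rows of U; the first m pivots are the nodes.
    _, _, piv = qr(U.T, mode="economic", pivoting=True)
    return np.sort(piv[:m])

def deim_project(U, idx, f):
    """Apply the oblique projector U (S^T U)^{-1} S^T to f, reading f only at rows idx."""
    coeffs = np.linalg.solve(U[idx, :], f[idx])
    return U @ coeffs

rng = np.random.default_rng(0)
n, m = 200, 5
U, _ = np.linalg.qr(rng.standard_normal((n, m)))   # orthonormal basis of the subspace
idx = qdeim_indices(U)

f_in = U @ rng.standard_normal(m)                  # a vector that lies in range(U)
f_hat = deim_project(U, idx, f_in)                 # should reproduce f_in

# Rotate the basis by a random orthogonal Omega; the selected nodes should not change.
Omega, _ = np.linalg.qr(rng.standard_normal((m, m)))
idx_rot = qdeim_indices(U @ Omega)
```

The last two lines probe the invariance claim: the column norms of $U^T$ are unchanged by an orthogonal factor on the left, so the pivot choices are expected to agree.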


Journal ArticleDOI
TL;DR: A low-rank format called Block Low-Rank (BLR) is proposed, and it is explained how it can be used to reduce the memory footprint and the complexity of direct solvers for sparse matrices based on the multifrontal method.
Abstract: Matrices coming from elliptic Partial Differential Equations (PDEs) have been shown to have a low-rank property: well defined off-diagonal blocks of their Schur complements can be approximated by low-rank products. Given a suitable ordering of the matrix which gives to the blocks a geometrical meaning, such approximations can be computed using an SVD or a rank-revealing QR factorization. The resulting representation offers a substantial reduction of the memory requirement and gives efficient ways to perform many of the basic dense algebra operations. Several strategies have been proposed to exploit this property. We propose a low-rank format called Block Low-Rank (BLR), and explain how it can be used to reduce the memory footprint and the complexity of direct solvers for sparse matrices based on the multifrontal method. We present experimental results that show how the BLR format delivers gains that are comparable to those obtained with hierarchical formats such as Hierarchical matrices (H matrices) and Hierarchically Semi-Separable (HSS matrices) but provides much greater flexibility and ease of use which are essential in the context of a general purpose, algebraic solver.

170 citations
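
As a rough illustration of how a single off-diagonal block might be compressed with a rank-revealing QR factorization, the sketch below truncates a column-pivoted QR at a relative tolerance. It is not the BLR solver itself: the test block (a kernel between two separated point sets), the tolerance, and the name `compress_block` are assumptions made for the example.

```python
import numpy as np
from scipy.linalg import qr

def compress_block(B, tol):
    """Return X, Y with B ~= X @ Y, from a column-pivoted QR truncated at relative tolerance tol."""
    Q, R, piv = qr(B, mode="economic", pivoting=True)
    diag = np.abs(np.diag(R))
    # |R[k, k]| is non-increasing under column pivoting, so it acts as a rank estimate.
    rank = int(np.sum(diag > tol * diag[0])) if diag[0] > 0 else 0
    X = Q[:, :rank]
    Y = np.empty((rank, B.shape[1]))
    Y[:, piv] = R[:rank, :]              # undo the column permutation
    return X, Y

# Hypothetical off-diagonal block: interaction between two well-separated point clusters.
x = np.linspace(0.0, 1.0, 60)
y = np.linspace(3.0, 4.0, 80)
B = 1.0 / np.abs(x[:, None] - y[None, :])

X, Y = compress_block(B, tol=1e-8)
rel_err = np.linalg.norm(B - X @ Y) / np.linalg.norm(B)
stored = X.size + Y.size                 # entries kept, versus B.size for the dense block
```

Storing `X` and `Y` instead of `B` is where the memory reduction comes from when the numerical rank is small.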


Journal ArticleDOI
TL;DR: A novel fast implementation of the Non-Negative OMP is presented, which is based on the QR decomposition and an iterative coefficients update, which can fully incorporate the positivity constraint of the coefficients, throughout the selection stage of the algorithm.
Abstract: One of the important classes of sparse signals is the non-negative signals. Many algorithms have already been proposed to recover such non-negative representations, where greedy and convex relaxed algorithms are among the most popular methods. The greedy techniques have been modified to incorporate the non-negativity of the representations. One such modification has been proposed for Orthogonal Matching Pursuit (OMP), which first chooses positive coefficients and uses a non-negative optimisation technique as a replacement for the orthogonal projection onto the selected support. Beside the extra computational costs of the optimisation program, it does not benefit from the fast implementation techniques of OMP. These fast implementations are based on the matrix factorisations. We here first investigate the problem of positive representation, using pursuit algorithms. We will then describe a new implementation, which can fully incorporate the positivity constraint of the coefficients, throughout the selection stage of the algorithm. As a result, we present a novel fast implementation of the Non-Negative OMP, which is based on the QR decomposition and an iterative coefficients update. We will empirically show that such a modification can easily accelerate the implementation by a factor of ten in a reasonable size problem.

80 citations


Journal ArticleDOI
TL;DR: Numerical experiments demonstrate that the proposed new batch LDA algorithm called LDA/QR is very efficient and competitive with the state-of-the-art ILDA algorithms in terms of classification accuracy, computational complexity, and space complexity.
Abstract: It has always been a challenging task to develop a fast and an efficient incremental linear discriminant analysis (ILDA) algorithm. For this purpose, we conduct a new study for linear discriminant analysis (LDA) in this paper and develop a new ILDA algorithm. We propose a new batch LDA algorithm called LDA/QR. LDA/QR is a simple and fast LDA algorithm, which is obtained by computing the economic QR factorization of the data matrix followed by solving a lower triangular linear system. The relationship between LDA/QR and uncorrelated LDA (ULDA) is also revealed. Based on LDA/QR, we develop a new incremental LDA algorithm called ILDA/QR. The main features of our ILDA/QR include that: 1) it can easily handle the update from one new sample or a chunk of new samples; 2) it has efficient computational complexity and space complexity; and 3) it is very fast and always achieves competitive classification accuracy compared with ULDA algorithm and existing ILDA algorithms. Numerical experiments based on some real-world data sets demonstrate that our ILDA/QR is very efficient and competitive with the state-of-the-art ILDA algorithms in terms of classification accuracy, computational complexity, and space complexity.

59 citations


Journal ArticleDOI
TL;DR: A digital image watermarking algorithm using partial pivoting lower and upper triangular (PPLU) decomposition is proposed; it is highly reliable, with better imperceptibility of the embedded image, and computationally efficient compared with recent existing methods.
Abstract: A digital image watermarking algorithm using partial pivoting lower and upper triangular (PPLU) decomposition is proposed. In this method, a digital watermark image is factorised into lower triangular, upper triangular and permutation matrices by PPLU decomposition. The permutation matrix is used as the valid key matrix for authentication of the rightful ownership of the watermark image. The product of the lower and upper triangular matrices is embedded into particular sub-bands of a cover image that is decomposed by wavelet transform using the singular value decomposition. The weightage-based differential evolution algorithm is used to achieve the possible scaling factor for obtaining the maximum possible robustness against various image processing operations and pirate attacks. The authors' experiments show that the proposed algorithm is highly reliable, with better imperceptibility of the embedded image, and computationally efficient compared with recent existing methods.

57 citations


Journal ArticleDOI
TL;DR: This paper analyzes the numerical properties of the mixed-precision CholQR, which requires only one global reduction between the parallel processing units and performs most of its computation using BLAS-3 kernels.
Abstract: To orthonormalize the columns of a dense matrix, the Cholesky QR (CholQR) requires only one global reduction between the parallel processing units and performs most of its computation using BLAS-3 kernels. As a result, compared to other orthogonalization algorithms, CholQR obtains superior performance on many of the current computer architectures, where the communication is becoming increasingly expensive compared to the arithmetic operations. This is especially true when the input matrix is tall-skinny. Unfortunately, the orthogonality error of CholQR depends quadratically on the condition number of the input matrix, and it is numerically unstable when the matrix is ill-conditioned. To enhance the stability of CholQR, we recently used mixed-precision arithmetic; the input and output matrices are in the working precision, but some of its intermediate results are accumulated in the doubled precision. In this paper, we analyze the numerical properties of this mixed-precision CholQR. Our analysis shows that ...

54 citations
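
The basic (working-precision) Cholesky QR that this abstract builds on can be sketched as follows. This is only the textbook variant, not the mixed-precision version analyzed in the paper; the test matrices and the helper names `cholqr` and `with_condition` are assumptions for illustration.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def cholqr(A):
    """Cholesky QR: A = Q R, with R the Cholesky factor of the Gram matrix A^T A."""
    G = A.T @ A                              # one Gram product (the single reduction in parallel)
    R = cholesky(G, lower=False)             # G = R^T R with R upper triangular
    # Q = A R^{-1}, computed by solving R^T Q^T = A^T.
    Q = solve_triangular(R, A.T, trans="T", lower=False).T
    return Q, R

def with_condition(m, n, cond, rng):
    """Tall-skinny test matrix with prescribed 2-norm condition number (hypothetical helper)."""
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return (U * np.logspace(0, -np.log10(cond), n)) @ V.T

rng = np.random.default_rng(0)
I = np.eye(8)

A_good = with_condition(2000, 8, 1e1, rng)
Q, R = cholqr(A_good)
err_good = np.linalg.norm(Q.T @ Q - I)

A_bad = with_condition(2000, 8, 1e6, rng)
Qb, Rb = cholqr(A_bad)
err_bad = np.linalg.norm(Qb.T @ Qb - I)      # grows roughly with the square of the condition number
```

Comparing `err_good` with `err_bad` shows the loss of orthogonality that motivates accumulating intermediate results in higher precision.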


Journal ArticleDOI
TL;DR: This work proposes a truncated QR factorization with column pivoting that avoids trailing matrix updates which are used in current implementations of level-3 BLAS QR and QRCP and demonstrates strong parallel scalability on shared-memory multiple core systems using an implementation in Fortran with OpenMP.
Abstract: The dominant contribution to communication complexity in factorizing a matrix using QR with column pivoting is due to column-norm updates that are required to process pivot decisions. We use randomized sampling to approximate this process, which dramatically reduces communication in column selection. We also introduce a sample update formula to reduce the cost of sampling trailing matrices. Using our column selection mechanism, we observe results that are comparable in quality to those obtained from the QRCP algorithm, but with performance near unpivoted QR. We also demonstrate strong parallel scalability on shared memory multiple core systems using an implementation in Fortran with OpenMP. This work immediately extends to produce low-rank truncated approximations of large matrices. We propose a truncated QR factorization with column pivoting that avoids trailing matrix updates which are used in current implementations of level-3 BLAS QR and QRCP. Provided the truncation rank is small, avoiding trailing matrix updates reduces approximation time by nearly half. By using these techniques and employing a variation on Stewart's QLP algorithm, we develop an approximate truncated SVD that runs nearly as fast as truncated QR.

53 citations
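
The core idea of choosing pivots from a small random sample rather than from the full matrix can be sketched as below. This is a simplified, unblocked illustration under stated assumptions: it uses a Gaussian sketch, omits the sample update formula and the trailing-update avoidance described in the abstract, and the names `sampled_pivots`, `truncated_qr`, and the `oversample` parameter are hypothetical.

```python
import numpy as np
from scipy.linalg import qr

def sampled_pivots(A, k, oversample=8, seed=0):
    """Choose k pivot columns of A from a small Gaussian sketch instead of A itself."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((k + oversample, A.shape[0]))
    B = Omega @ A                                    # (k + oversample)-by-n sample matrix
    _, _, piv = qr(B, mode="economic", pivoting=True)
    return piv[:k]

def truncated_qr(A, cols):
    """Rank-k approximation A ~= Q (Q^T A) built from the selected columns."""
    Q, _ = np.linalg.qr(A[:, cols])
    return Q, Q.T @ A

rng = np.random.default_rng(1)
k = 10
A = rng.standard_normal((500, k)) @ rng.standard_normal((k, 300))   # exactly rank k

cols = sampled_pivots(A, k)
Q, W = truncated_qr(A, cols)
rel_err = np.linalg.norm(A - Q @ W) / np.linalg.norm(A)
```

Pivoting is performed on the small matrix `B`, so the column-norm bookkeeping touches only `k + oversample` rows instead of all rows of `A`.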


Journal ArticleDOI
TL;DR: A stable algorithm to compute the roots of polynomials by computing the eigenvalues of the associated companion matrix by Francis's implicitly shifted QR algorithm is presented.
Abstract: A stable algorithm to compute the roots of polynomials is presented. The roots are found by computing the eigenvalues of the associated companion matrix by Francis's implicitly shifted QR algorithm. A companion matrix is an upper Hessenberg matrix that is unitary-plus-rank-one, that is, it is the sum of a unitary matrix and a rank-one matrix. These properties are preserved by iterations of Francis's algorithm, and it is these properties that are exploited here. The matrix is represented as a product of $3n-1$ Givens rotators plus the rank-one part, so only $O(n)$ storage space is required. In fact, the information about the rank-one part is also encoded in the rotators, so it is not necessary to store the rank-one part explicitly. Francis's algorithm implemented on this representation requires only $O(n)$ flops per iteration and thus $O(n^{2})$ flops overall. The algorithm is described, normwise backward stability is proved, and an extensive set of numerical experiments is presented. The algorithm is shown to...

52 citations
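
To make the companion-matrix formulation concrete, the sketch below builds the upper Hessenberg companion matrix of a small monic polynomial, finds its roots with a dense eigenvalue solver, and checks the unitary-plus-rank-one splitting. It uses NumPy's general-purpose `eigvals` (cubic cost), not the structured $O(n^2)$ algorithm of the paper; the function name `companion` is an assumption.

```python
import numpy as np

def companion(coeffs):
    """Upper Hessenberg companion matrix of the monic polynomial
    x^n + c[n-1] x^(n-1) + ... + c[0], given coeffs = [c[0], ..., c[n-1]]."""
    c = np.asarray(coeffs, dtype=float)
    n = c.size
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)       # ones on the subdiagonal
    C[:, -1] = -c                    # last column carries the coefficients
    return C

# p(x) = (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
C = companion([-6.0, 11.0, -6.0])
roots = np.sort(np.linalg.eigvals(C).real)

# Unitary-plus-rank-one splitting: replace the last column by e_1 to get a cyclic shift.
U = C.copy()
U[:, -1] = 0.0
U[0, -1] = 1.0
rank_one_part = C - U                # nonzero only in the last column
```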


Journal ArticleDOI
TL;DR: A hardware design to achieve high-throughput QR decomposition, using the Givens rotation method, which utilizes a new 2-D systolic array architecture with pipelined processing elements, which are based on the COordinate Rotation DIgital Computer (CORDIC) algorithm.
Abstract: This brief presents a hardware design to achieve high-throughput QR decomposition, using the Givens rotation method. It utilizes a new 2-D systolic array architecture with pipelined processing elements, which are based on the COordinate Rotation DIgital Computer (CORDIC) algorithm. CORDIC computes vector rotations through shifts and additions. This approach allows a continuous computation of QR factorizations with simple hardware. A fixed-point field-programmable gate array (FPGA) architecture for 4 $\times$ 4 matrices has been optimized by balancing the number of CORDIC iterations with the final error. As a result, compared with other previous proposals for FPGA, our design achieves at least 50% more throughput, as well as much less resource utilization.

42 citations
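
For readers unfamiliar with the Givens rotation method, here is a plain floating-point sketch of QR by Givens rotations on a 4 × 4 matrix. It is a software illustration only: it computes each rotation with a square root and divisions rather than CORDIC shift-and-add iterations, it does not model the systolic array, and the name `givens_qr` is hypothetical.

```python
import numpy as np

def givens_qr(A):
    """QR factorization by Givens rotations; returns Q (m-by-m) and R (m-by-n)."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    Q = np.eye(m)
    for j in range(n):
        # Zero the entries below the diagonal in column j, from the bottom up.
        for i in range(m - 1, j, -1):
            a, b = R[i - 1, j], R[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue                         # nothing to eliminate
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])      # rotates (a, b) onto (r, 0)
            R[[i - 1, i], :] = G @ R[[i - 1, i], :]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T
    return Q, R

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q, R = givens_qr(A)
```

Each rotation touches only two rows, which is what makes the method attractive for an array of simple processing elements.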


Book ChapterDOI
12 Jul 2015
TL;DR: The development of one-sided factorizations that work for a set of small dense matrices in parallel, and the development and optimization of the batched factorization to achieve up to a 2-fold speedup and a 3-fold energy efficiency improvement compared to highly optimized batched CPU implementations based on the MKL library.
Abstract: As modern hardware keeps evolving, an increasingly effective approach to developing energy efficient and high-performance solvers is to design them to work on many small size and independent problems. Many applications already need this functionality, especially for GPUs, which are currently known to be about four to five times more energy efficient than multicore CPUs. We describe the development of one-sided factorizations that work for a set of small dense matrices in parallel, and we illustrate our techniques on the QR factorization based on Householder transformations. We refer to this mode of operation as a batched factorization. Our approach is based on representing the algorithms as a sequence of batched BLAS routines for GPU-only execution. This is in contrast to the hybrid CPU-GPU algorithms that rely heavily on using the multicore CPU for specific parts of the workload. But for a system to benefit fully from the GPU’s significantly higher energy efficiency, avoiding the use of the multicore CPU must be a primary design goal, so the system can rely more heavily on the more efficient GPU. Additionally, this will result in the removal of the costly CPU-to-GPU communication. Furthermore, we do not use a single symmetric multiprocessor (on the GPU) to factorize a single problem at a time. We illustrate how our performance analysis, and the use of profiling and tracing tools, guided the development and optimization of our batched factorization to achieve up to a 2-fold speedup and a 3-fold energy efficiency improvement compared to our highly optimized batched CPU implementations based on the MKL library (when using two sockets of Intel Sandy Bridge CPUs). Compared to a batched QR factorization featured in the CUBLAS library for GPUs, we achieved up to \(5\times \) speedup on the K40 GPU.

Journal ArticleDOI
TL;DR: The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared with Communication-Avoiding QR (CAQR) algorithm.

Posted Content
TL;DR: The manuscript describes an algorithm for computing a QR factorization $AP=QR$ where $P$ is a permutation matrix, $Q$ is orthonormal, and $R$ is upper triangular; the algorithm is blocked, to allow it to be implemented efficiently.
Abstract: Given a matrix $A$ of size $m\times n$, the manuscript describes an algorithm for computing a QR factorization $AP=QR$ where $P$ is a permutation matrix, $Q$ is orthonormal, and $R$ is upper triangular. The algorithm is blocked, to allow it to be implemented efficiently. The need for single vector pivoting in classical algorithms for computing QR factorizations is avoided by the use of randomized sampling to find blocks of pivot vectors at once. The advantage of blocking becomes particularly pronounced when $A$ is very large, and possibly stored out-of-core, or on a distributed memory machine. The manuscript also describes a generalization of the QR factorization that allows $P$ to be a general orthonormal matrix. In this setting, one can at moderate cost compute a \textit{rank-revealing} factorization where the mass of $R$ is concentrated to the diagonal entries. Moreover, the diagonal entries of $R$ closely approximate the singular values of $A$. The algorithms described have asymptotic flop count $O(m\,n\,\min(m,n))$, just like classical deterministic methods. The scaling constant is slightly higher than those of classical techniques, but this is more than made up for by reduced communication and the ability to block the computation.

Posted Content
TL;DR: A spectral method for solving univariate singular integral equations over unions of intervals by utilizing Chebyshev and ultraspherical polynomials to reformulate the equations as almost-banded infinite-dimensional systems is developed.
Abstract: We develop a spectral method for solving univariate singular integral equations over unions of intervals by utilizing Chebyshev and ultraspherical polynomials to reformulate the equations as almost-banded infinite-dimensional systems. This is accomplished by utilizing low rank approximations for sparse representations of the bivariate kernels. The resulting system can be solved in ${\cal O}(m^2n)$ operations using an adaptive QR factorization, where $m$ is the bandwidth and $n$ is the optimal number of unknowns needed to resolve the true solution. The complexity is reduced to ${\cal O}(m n)$ operations by pre-caching the QR factorization when the same operator is used for multiple right-hand sides. Stability is proved by showing that the resulting linear operator can be diagonally preconditioned to be a compact perturbation of the identity. Applications considered include the Faraday cage, and acoustic scattering for the Helmholtz and gravity Helmholtz equations, including spectrally accurate numerical evaluation of the far- and near-field solution. The Julia software package SingularIntegralEquations.jl implements our method with a convenient, user-friendly interface.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: The method combines an established model order reduction method and a clustering algorithm to produce a graph partition used for reduction, thus preserving structure and consensus.
Abstract: In this paper we present an efficient model order reduction method for multi-agent systems with Laplacian-based dynamics. The method combines an established model order reduction method and a clustering algorithm to produce a graph partition used for reduction, thus preserving structure and consensus. By the Iterative Rational Krylov Algorithm, a good reduced order model can be found which is not necessarily structure preserving. However, based on this we can efficiently find a partition using the QR decomposition with column pivoting as a clustering algorithm, so that the structure can be restored. We illustrate the effectiveness on an example from the open literature.

Journal ArticleDOI
18 Feb 2015
TL;DR: This article proposes a new hybrid approach, based on Algorithm-Based Fault Tolerance (ABFT), to help matrix factorization algorithms survive fail-stop failures and presents a generic solution for protecting the right factor, where the updates are applied, of all the above-mentioned factorizations.
Abstract: Dense matrix factorizations, such as LU, Cholesky and QR, are widely used for scientific applications that require solving systems of linear equations, eigenvalues and linear least squares problems. Such computations are normally carried out on supercomputers, whose ever-growing scale induces a fast decline of the Mean Time To Failure (MTTF). This article proposes a new hybrid approach, based on Algorithm-Based Fault Tolerance (ABFT), to help matrix factorization algorithms survive fail-stop failures. We consider extreme conditions, such as the absence of any reliable node and the possibility of losing both data and checksum from a single failure. We will present a generic solution for protecting the right factor, where the updates are applied, of all the above-mentioned factorizations. For the left factor, where the panel has been applied, we propose a scalable checkpointing algorithm. This algorithm features a high degree of checkpointing parallelism and cooperatively utilizes the checksum storage left over from the right factor protection. The fault-tolerant algorithms derived from this hybrid solution are applicable to a wide range of dense matrix factorizations, with minor modifications. Theoretical analysis shows that the fault tolerance overhead decreases inversely to the scaling in the number of computing units and the problem size. Experimental results of LU and QR factorization on the Kraken (Cray XT5) supercomputer validate the theoretical evaluation and confirm negligible overhead, with and without errors. Applicability to tolerate multiple failures and accuracy after multiple recovery is also considered.
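
The checksum idea behind ABFT protection of the right factor can be illustrated on a single node. The sketch below is a toy example, not the distributed scheme of the article: it assumes one checksum column with unit weights `w`, simulates the loss of one column of $R$, and rebuilds it from the checksum.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
w = np.ones(n)                                  # checksum weights (an assumption)

A_ext = np.hstack([A, (A @ w)[:, None]])        # append the checksum column A w
Q, R_ext = np.linalg.qr(A_ext)                  # left transformations also act on the checksum
R, checksum = R_ext[:, :n], R_ext[:, n]

# Invariant: the transformed checksum column equals R w.
# Simulate losing column 3 of R and rebuild it from the surviving data.
lost = 3
R_damaged = R.copy()
R_damaged[:, lost] = 0.0
recovered = (checksum - R_damaged @ w) / w[lost]
```

Because the orthogonal transformations are applied to the whole extended matrix, the checksum relation is preserved without extra communication during the factorization.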

Journal ArticleDOI
TL;DR: Applications of these algorithms to frequency estimation and adaptive beamforming in time-varying speech and audio signals and the convergence and tracking performance of the proposed algorithms compare favorably with conventional algorithms.
Abstract: This paper proposes a new class of local polynomial modeling (LPM)-based variable forgetting factor (VFF) recursive least squares (RLS) algorithms called the LPM-based VFF RLS (LVFF-RLS) algorithms. It models the time-varying channel coefficients as local polynomials so as to obtain the expressions of the bias and variance terms in the mean square error (MSE) of the RLS algorithm. A new locally optimal VFF (LOVFF) is then derived by minimizing the resulting MSE and the theoretical analysis is found to be in good agreement with experimental results. Methods for estimating the parameters involved in this LOVFF are also developed, resulting in an improved RLS algorithm with VFF. The algorithm is further extended to include variable regularization and a QR decomposition (QRD) version which is numerically more stable and amenable to multiplier-less implementation using coordinate rotation digital computer (CORDIC) algorithm. Applications of these algorithms to frequency estimation and adaptive beamforming in time-varying speech and audio signals are also presented to illustrate the effectiveness of the proposed algorithms. Simulations show that the convergence and tracking performance of the proposed algorithms compare favorably with conventional algorithms.

Journal ArticleDOI
TL;DR: The results indicate that the proposed intelligent audio watermarking method, which combines the QR decomposition (QR factorization) with a Genetic Algorithm, is more robust than previous robust audio watermarking methods.
Abstract: Watermarking is a method used to hide the owner's data in the host signal in an inaudible way. The watermark signal must not reduce the quality of the host signal. Furthermore, it must be resistant to various attacks. In this paper, we propose an intelligent audio watermarking method that combines the QR decomposition (QR factorization) with a Genetic Algorithm (GA). At the outset, the host signal is segmented into several frames. Then, every frame is decomposed using the QR decomposition, and, subsequently, the GA searches for the best place to embed the watermark bit, that is, the place with high robustness to the possible attacks. In order to evaluate the effectiveness of this method, we have examined the robustness of the watermark against several attacks for several different audio signals. The results indicate that the proposed method is more robust than previous robust audio watermarking methods.

Journal ArticleDOI
TL;DR: In this article, the authors considered the problem of efficiently computing the eigenvalues of limited-memory quasi-Newton matrices that exhibit a compact formulation, and proposed a compact formula for quasi-Newton matrices generated by any member of the Broyden convex class of updates.
Abstract: In this paper, we consider the problem of efficiently computing the eigenvalues of limited-memory quasi-Newton matrices that exhibit a compact formulation. In addition, we produce a compact formula for quasi-Newton matrices generated by any member of the Broyden convex class of updates. Our proposed method makes use of efficient updates to the QR factorization that substantially reduce the cost of computing the eigenvalues after the quasi-Newton matrix is updated. Numerical experiments suggest that the proposed method is able to compute eigenvalues to high accuracy. Applications for this work include modified quasi-Newton methods and trust-region methods for large-scale optimization, the efficient computation of condition numbers and singular values, and sensitivity analysis.
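
One way a QR factorization can expose the eigenvalues of a compactly represented matrix is sketched below. The sketch assumes the compact form $B = \gamma I + \Psi M \Psi^T$ with $\Psi \in \mathbb{R}^{n\times k}$ of full column rank and symmetric $M$; these symbols and the function name `compact_eigenvalues` are assumptions, and the QR factor is recomputed from scratch rather than updated as in the paper.

```python
import numpy as np

def compact_eigenvalues(gamma, Psi, M):
    """Eigenvalues of B = gamma*I + Psi M Psi^T using only a k-by-k eigenproblem."""
    n, k = Psi.shape
    R = np.linalg.qr(Psi, mode="r")              # thin QR; only the k-by-k factor R is needed
    small = np.linalg.eigvalsh(R @ M @ R.T)      # B = gamma*I + Q (R M R^T) Q^T
    # k eigenvalues are shifted by the small problem; the other n - k equal gamma.
    return np.sort(np.concatenate([gamma + small, np.full(n - k, gamma)]))

rng = np.random.default_rng(0)
n, k, gamma = 50, 6, 2.5
Psi = rng.standard_normal((n, k))
M = rng.standard_normal((k, k))
M = 0.5 * (M + M.T)                              # symmetric middle matrix

eigs_fast = compact_eigenvalues(gamma, Psi, M)
eigs_dense = np.linalg.eigvalsh(gamma * np.eye(n) + Psi @ M @ Psi.T)
```

The dense computation is included only as a reference; the point of the compact route is that the eigenproblem has size $k$ rather than $n$.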

Journal ArticleDOI
TL;DR: A flexible dual-mode soft-output multiple-input multiple-output (MIMO) detector to support open-loop and closed-loop in Chinese enhanced ultra high throughput (EUHT) wireless local area network (LAN) standard is proposed.
Abstract: This paper proposes a flexible dual-mode soft-output multiple-input multiple-output (MIMO) detector to support open-loop and closed-loop in Chinese enhanced ultra high throughput (EUHT) wireless local area network (LAN) standard. The proposed detector uses minimum mean square error (MMSE) sorted QR decomposition (MMSE-SQRD) to produce channel preprocessing result, which is realized by a modified systolic array architecture with concurrent sorting. Moreover, the adopted square-root MMSE algorithm for closed-loop reuses MMSE-SQRD preprocessing to largely save hardware overhead. In addition, an optimized K-Best detection algorithm is proposed for open-loop, which increases throughput by odd-even parallel sorting and produces high quality soft-output with discarded paths (DPs). A flexible VLSI architecture is designed for the proposed dual-mode detector, which supports $1\times 1\sim 4\times 4$ antennas and BPSK $\sim$ 64-QAM modulation configuration. Implemented in SMIC 65 nm CMOS technology, the detector is capable of running at 550 MHz, which has a maximum throughput of 2.64 Gb/s for K-Best detection and 3.3 Gb/s for linear MMSE detection. The proposed detector is competitive to recent published works and meets the data-rate requirement of the EUHT standard.

Journal ArticleDOI
TL;DR: A robust and imperceptible nonblind color image watermarking algorithm is proposed, which benefits from the fact that the watermark can be hidden in different color channels, which results in further robustness of the proposed technique to attacks.
Abstract: The Internet has affected our everyday life drastically. Expansive volumes of information are exchanged over the Internet consistently, which causes numerous security concerns. Issues like content identification, document and image security, audience measurement, ownership, copyrights and others can be settled by using digital watermarking. In this work, a robust and imperceptible nonblind color image watermarking algorithm is proposed, which benefits from the fact that the watermark can be hidden in different color channels, which results in further robustness of the proposed technique to attacks. The given method uses algorithms such as entropy, discrete wavelet transform, Chirp z-transform, orthogonal-triangular decomposition and singular value decomposition in order to embed the watermark in a color image. Many experiments are performed using well-known signal processing attacks such as histogram equalization, adding noise and compression. Experimental results show that the proposed scheme is imperceptible and robust against common signal processing attacks.

Journal ArticleDOI
TL;DR: This paper proposes some computational improvements of the forward search algorithm and provides a recursive implementation of the procedure which exploits the information of the previous step and produces a set of efficient routines for fast updating of the model parameter estimates and fast computation of likelihood contributions.
Abstract: The identification of atypical observations and the immunization of data analysis against both outliers and failures of modeling are important aspects of modern statistics. The forward search is a graphics rich approach that leads to the formal detection of outliers and to the detection of model inadequacy combined with suggestions for model enhancement. The key idea is to monitor quantities of interest, such as parameter estimates and test statistics, as the model is fitted to data subsets of increasing size. In this paper we propose some computational improvements of the forward search algorithm and we provide a recursive implementation of the procedure which exploits the information of the previous step. The output is a set of efficient routines for fast updating of the model parameter estimates, which do not require any data sorting, and fast computation of likelihood contributions, which do not require matrix inversion or QR decomposition. It is shown that the new algorithms enable a reduction of the computation time by more than 80%. Furthermore, the running time now increases almost linearly with the sample size. All the routines described in this paper are included in the FSDA toolbox for MATLAB which is freely downloadable from the internet.

Journal ArticleDOI
TL;DR: In this paper, the cubature rule was combined with a QR decomposition, singular value decomposition and a linear update without requirement of cubature points, and the convergence analysis of NL-SCKF was performed.
Abstract: This paper extends the cubature Kalman filter (CKF) to deal with systems involving nonlinear states and linear measurements (herein called the nonlinear–linear combined systems) with additive noise. The method is referred to as the nonlinear–linear square-root cubature Kalman filtering (NL-SCKF). In NL-SCKF, the cubature rule, combined with a QR decomposition, singular value decomposition and a linear update without requirement of cubature points, is designed to update nonlinear states and linear measurements. In addition, the convergence analysis of NL-SCKF is performed. Simulation results in two selected problems, namely filtering chaotic signals and chaos-based communications, indicate that the proposed NL-SCKF with lower computation complexity achieves the same accuracy as the standard SCKF, and outperforms CKF significantly.
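
The role QR typically plays in square-root filters can be shown with a generic covariance square-root propagation for a linear map. This is a standard textbook step offered as background, not the NL-SCKF update itself: the matrices `F`, `S`, `S_q` and the name `sqrt_time_update` are assumptions, and no cubature points are involved.

```python
import numpy as np

def sqrt_time_update(F, S, S_q):
    """Propagate a covariance square root through a linear map.
    Returns S_new with S_new S_new^T = F (S S^T) F^T + S_q S_q^T,
    without ever forming the covariance matrices explicitly."""
    stacked = np.hstack([F @ S, S_q]).T          # (2n)-by-n
    R = np.linalg.qr(stacked, mode="r")          # n-by-n upper triangular factor
    return R.T                                   # lower-triangular square root

rng = np.random.default_rng(0)
n = 4
F = rng.standard_normal((n, n))                              # hypothetical linear state transition
S = np.linalg.cholesky(np.eye(n) + 0.1 * np.ones((n, n)))    # square root of the prior covariance
S_q = 0.3 * np.eye(n)                                        # square root of the process noise covariance

S_new = sqrt_time_update(F, S, S_q)
P_ref = F @ (S @ S.T) @ F.T + S_q @ S_q.T                    # reference, formed explicitly
```

Working with the triangular factor keeps the propagated covariance symmetric and positive semidefinite by construction.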

Posted Content
TL;DR: A truncated QR factorization with column pivoting that avoids trailing matrix updates which are used in current implementations of BLAS-3 QR and QRCP is proposed, and an approximate truncated SVD is developed that runs nearly as fast as truncated QR.
Abstract: The dominant contribution to communication complexity in factorizing a matrix using QR with column pivoting is due to column-norm updates that are required to process pivot decisions. We use randomized sampling to approximate this process which dramatically reduces communication in column selection. We also introduce a sample update formula to reduce the cost of sampling trailing matrices. Using our column selection mechanism we observe results that are comparable to those obtained from the QRCP algorithm, but with performance near unpivoted QR. We also demonstrate strong parallel scalability on shared memory multiple core systems using an implementation in Fortran with OpenMP. This work immediately extends to produce low-rank truncated approximations of large matrices. We propose a truncated QR factorization with column pivoting that avoids trailing matrix updates which are used in current implementations of BLAS-3 QR and QRCP. Provided the truncation rank is small, avoiding trailing matrix updates reduces approximation time by nearly half. By using these techniques and employing a variation on Stewart's QLP algorithm, we develop an approximate truncated SVD that runs nearly as fast as truncated QR.

Journal ArticleDOI
01 Nov 2015
TL;DR: A parallel solver for general tridiagonal irreducible systems and its CUDA implementation are described, indicating that g-Spike is competitive in runtime with existing GPU methods, and can provide acceptable results when other methods cannot be applied or fail.
Abstract: A parallel solver for general tridiagonal irreducible systems is described. Solver based on Spike framework and Givens-QR with occasional low-rank modification. Modifications handle singularities exposed by QR in blocks of the parallel partition. The GPU implementation has similar performance to existing methods. Method returns accurate results when current GPU tridiagonal solvers fail. g-Spike, a parallel algorithm for solving general nonsymmetric tridiagonal systems for the GPU, and its CUDA implementation are described. The solver is based on the Spike framework, applying Givens rotations and QR factorization without pivoting. It also implements a low-rank modification strategy to compute the Spike DS decomposition even when the partitioning defines singular submatrices along the diagonal. The method is also used to solve the reduced system resulting from the Spike partitioning. Numerical experiments with problems of high order indicate that g-Spike is competitive in runtime with existing GPU methods, and can provide acceptable results when other methods cannot be applied or fail.
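
To make the Givens-QR building block concrete, here is a minimal sequential sketch that solves one nonsingular tridiagonal system with Givens rotations and no pivoting. It is not g-Spike: there is no Spike partitioning, no low-rank modification for singular blocks, and no GPU code. The function name and argument layout are hypothetical.

```python
import math

def solve_tridiagonal_givens(sub, diag, sup, rhs):
    """Solve T x = rhs for a nonsingular tridiagonal T using Givens QR.
    sub/sup hold the n-1 sub/superdiagonal entries, diag the n diagonal ones."""
    n = len(diag)
    r0 = [float(v) for v in diag]          # diagonal of R
    r1 = [float(v) for v in sup] + [0.0]   # first superdiagonal of R
    r2 = [0.0] * n                         # second superdiagonal (fill-in)
    y = [float(v) for v in rhs]
    for k in range(n - 1):
        a, b = r0[k], float(sub[k])
        h = math.hypot(a, b)
        if h == 0.0:
            continue  # zero column: T is singular, back-substitution will fail
        c, s = a / h, b / h
        r0[k] = h                          # rotation zeroes the subdiagonal entry
        t1, t2 = r1[k], r0[k + 1]
        r1[k], r0[k + 1] = c * t1 + s * t2, -s * t1 + c * t2
        if k + 2 < n:
            t2 = r1[k + 1]
            r2[k], r1[k + 1] = s * t2, c * t2
        y[k], y[k + 1] = c * y[k] + s * y[k + 1], -s * y[k] + c * y[k + 1]
    # Back-substitution on the banded upper-triangular factor.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        acc = y[i]
        if i + 1 < n:
            acc -= r1[i] * x[i + 1]
        if i + 2 < n:
            acc -= r2[i] * x[i + 2]
        x[i] = acc / r0[i]
    return x

# A zero leading diagonal entry breaks LU without pivoting; Givens QR is unaffected.
x = solve_tridiagonal_givens([1, 1], [0, 2, 3], [1, 1], [2, 8, 11])
```

The example matrix has a zero in its first diagonal position, which is the kind of case where elimination without pivoting breaks down while the orthogonal rotations remain well defined.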

Journal ArticleDOI
TL;DR: In this paper, the roots of a polynomial are approximated by computing the eigenvalues of a permuted version of the companion matrix associated with the polynomial, using a fast structured QR eigenvalue iteration.
Abstract: In this paper we present a novel matrix method for polynomial rootfinding. We approximate the roots by computing the eigenvalues of a permuted version of the companion matrix associated with the polynomial. This form, referred to as a lower staircase form of the companion matrix in the literature, has a block upper Hessenberg shape with possibly nonsquare subdiagonal blocks. It is shown that this form is well suited to the application of the QR eigenvalue algorithm. In particular, each matrix generated under this iteration is block upper Hessenberg and, moreover, all its submatrices located in a specified upper triangular portion are of rank two at most, with entries represented by means of four given vectors. By exploiting these properties we design a fast and computationally simple structured QR iteration which computes the eigenvalues of a companion matrix of size $n$ in lower staircase form using $O(n^2)$ flops and $O(n)$ memory storage. So far, this iteration is theoretically faster than the fastest ...
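
The basic link between rootfinding and eigenvalues can be sketched with a plain companion matrix. This is only the dense baseline: it uses the standard (unpermuted) companion form and NumPy's general eigenvalue solver, so it costs $O(n^3)$ flops and $O(n^2)$ storage and does not exploit the staircase or rank structure described in the abstract. The function name `companion_roots` is hypothetical.

```python
import numpy as np

def companion_roots(coeffs):
    """Roots of a polynomial given coefficients from highest degree down,
    computed as eigenvalues of its companion matrix."""
    c = np.asarray(coeffs, dtype=float)
    c = c / c[0]                    # normalize to a monic polynomial
    n = len(c) - 1
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)      # ones on the subdiagonal
    C[:, -1] = -c[:0:-1]            # last column: -c_0, ..., -c_{n-1}
    return np.linalg.eigvals(C)     # dense QR-type eigenvalue solver

# x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
roots = np.sort(np.real(companion_roots([1, -6, 11, -6])))
```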

Journal ArticleDOI
TL;DR: Combining the modified matrix–vector equation approach with the technique of Lyapunov majorant function and the Banach fixed point theorem, improved rigorous perturbation bounds for the LU and QR factorizations with normwise perturbations in the given matrix are obtained.
Abstract: Combining the modified matrix–vector equation approach with the technique of Lyapunov majorant function and the Banach fixed point theorem, we obtain improved rigorous perturbation bounds for the LU and QR factorizations with normwise perturbation in the given matrix. Each of the improved rigorous perturbation bounds is a rigorous version of the first-order perturbation bound derived by the matrix–vector equation approach in the literature, and we present their explicit expressions. These bounds are always tighter than those given by Chang and Stehle in the paper entitled “Rigorous perturbation bounds of some matrix factorizations”. This fact is illustrated by numerical examples. Copyright © 2015 John Wiley & Sons, Ltd.

Journal ArticleDOI
TL;DR: This approach, QR factorization based Incremental Extreme Learning Machine (QRI-ELM), is able to add random hidden nodes to SLFNs one by one and is fast and effective with good generalization and accuracy performance.

Journal ArticleDOI
TL;DR: This study solves this ill-posed equation by Tikhonov regularization and the least squares QR (LSQR) method, and automatically determines an optional interval and a typical value for the damping factor of the regularization, which are dependent on the peak removal rate of tool influence functions.
Abstract: The linear equation dwell time model can translate the 2D convolution process of material removal during subaperture polishing into a more intuitive expression, and may provide relatively fast and reliable results. However, solving this ill-posed equation accurately is difficult, and its practicability for a large scale surface error matrix is still limited. This study first solves this ill-posed equation by Tikhonov regularization and the least squares QR (LSQR) method, and automatically determines an optional interval and a typical value for the damping factor of the regularization, which are dependent on the peak removal rate of tool influence functions. Then, a constrained LSQR method is presented to increase the robustness of the damping factor, which can provide more consistent dwell time maps than traditional LSQR. Finally, a matrix segmentation and stitching method is used to cope with large scale surface error matrices. Using these proposed methods, the linear equation model becomes more reliable and efficient in practical engineering.
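
Tikhonov-regularized LSQR can be sketched with SciPy, whose `scipy.sparse.linalg.lsqr` takes a `damp` argument and minimizes $\|Ax - b\|_2^2 + \mathrm{damp}^2 \|x\|_2^2$. The example below uses a small random system as a stand-in for the dwell time equation; the matrix, the right-hand side, the value of `damp` and the function name are all made up for illustration, and the paper's automatic choice of the factor and its constrained variant are not reproduced.

```python
import numpy as np
from scipy.sparse.linalg import lsqr

def tikhonov_lsqr(A, b, damp):
    """Minimize ||A x - b||^2 + damp^2 ||x||^2 with LSQR."""
    # lsqr returns a tuple; the solution vector is its first entry.
    return lsqr(A, b, damp=damp, atol=1e-12, btol=1e-12)[0]

# Hypothetical stand-in data (not a real tool influence function).
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
damp = 0.5
x = tikhonov_lsqr(A, b, damp)

# Reference solution from the regularized normal equations.
x_ref = np.linalg.solve(A.T @ A + damp**2 * np.eye(10), A.T @ b)
```

LSQR only needs products with `A` and its transpose, which is why it suits large convolution-type systems where forming or factorizing the matrix explicitly would be too expensive.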

Journal ArticleDOI
TL;DR: The new approach applies the QR decomposition and regression to solve for a new orthogonal projection vector at each iteration, leading to a far lower computational cost.