Showing papers on "QR decomposition published in 2015"

Open Access

Journal Article•DOI•

Laurent Sorber¹, Marc Van Barel¹, Lieven De Lathauwer²•Institutions (2)

University of Copenhagen Faculty of Science¹, Katholieke Universiteit Leuven²

11 Feb 2015-IEEE Journal of Selected Topics in Signal Processing

TL;DR: The versatility of the SDF framework is demonstrated by means of four diverse applications, which are all solved entirely within Tensorlab's DSL.

Abstract: We present structured data fusion (SDF) as a framework for the rapid prototyping of knowledge discovery in one or more possibly incomplete data sets. In SDF, each data set—stored as a dense, sparse, or incomplete tensor—is factorized with a matrix or tensor decomposition. Factorizations can be coupled, or fused, with each other by indicating which factors should be shared between data sets. At the same time, factors may be imposed to have any type of structure that can be constructed as an explicit function of some underlying variables. With the right choice of decomposition type and factor structure, even well-known matrix factorizations such as the eigenvalue decomposition, singular value decomposition and QR factorization can be computed with SDF. A domain specific language (DSL) for SDF is implemented as part of the software package Tensorlab, with which we offer a library of tensor decompositions and factor structures to choose from. The versatility of the SDF framework is demonstrated by means of four diverse applications, which are all solved entirely within Tensorlab’s DSL.

185 citations

Journal Article•DOI•

A New Selection Operator for the Discrete Empirical Interpolation Method -- improved a priori error bound and extensions

Zlatko Drmač, Serkan Gugercin

02 May 2015-arXiv: Numerical Analysis

TL;DR: A new framework for constructing the discrete empirical interpolation method (DEIM) projection operator is introduced. It is formulated using the QR factorization with column pivoting and enjoys a sharper error bound for the DEIM projection error.

Abstract: This paper introduces a new framework for constructing the Discrete Empirical Interpolation Method (DEIM) projection operator. The interpolation node selection procedure is formulated using the QR factorization with column pivoting, and it enjoys a sharper error bound for the DEIM projection error. Furthermore, for a subspace $\mathcal{U}$ given as the range of an orthonormal $U$, the DEIM projection does not change if $U$ is replaced by $U \Omega$ with an arbitrary unitary matrix $\Omega$. In a large-scale setting, the new approach allows modifications that use only randomly sampled rows of $U$, but with the potential of producing good approximations with corresponding probabilistic error bounds. Another salient feature of the new framework is that a robust and efficient software implementation is easily developed, based on readily available high performance linear algebra packages.
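
The node selection described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes SciPy's `scipy.linalg.qr` with `pivoting=True` as the column-pivoted QR, and the helper names `qdeim_indices` and `deim_project` are hypothetical.

```python
import numpy as np
from scipy.linalg import qr

def qdeim_indices(U):
    """Pick m interpolation rows for an n-by-m orthonormal basis U (hypothetical helper)."""
    n, m = U.shape
    # Column pivoting applied to U^T (m-by-n) ranks the rows of U;
    # the first m pivots are used as interpolation nodes.
    _, _, piv = qr(U.T, mode="economic", pivoting=True)
    return piv[:m]

def deim_project(U, idx, f):
    """Oblique projection U (S^T U)^{-1} S^T f, where S^T selects the rows in idx."""
    return U @ np.linalg.solve(U[idx, :], f[idx])

rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((200, 5)))   # orthonormal basis, 200-by-5
idx = qdeim_indices(U)
f = U @ rng.standard_normal(5)      # a vector that already lies in range(U)
f_hat = deim_project(U, idx, f)     # should reproduce f up to rounding
```

Only the selected rows of `f` are read by `deim_project`, which is the point of the interpolation: a vector in the range of `U` is recovered from `m` of its entries.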

177 citations

Journal Article•DOI•

Improving Multifrontal Methods by Means of Block Low-Rank Representations

Patrick R. Amestoy¹, Cleve Ashcraft, Olivier Boiteau, Alfredo Buttari², Jean-Yves L'Excellent, Clement Weisbecker¹•Institutions (2)

École Normale Supérieure¹, Centre national de la recherche scientifique²

11 Jun 2015-SIAM Journal on Scientific Computing

TL;DR: A low-rank format called Block Low-Rank (BLR) is proposed, and it is explained how it can be used to reduce the memory footprint and the complexity of direct solvers for sparse matrices based on the multifrontal method.

Abstract: Matrices coming from elliptic Partial Differential Equations (PDEs) have been shown to have a low-rank property: well defined off-diagonal blocks of their Schur complements can be approximated by low-rank products. Given a suitable ordering of the matrix which gives to the blocks a geometrical meaning, such approximations can be computed using an SVD or a rank-revealing QR factorization. The resulting representation offers a substantial reduction of the memory requirement and gives efficient ways to perform many of the basic dense algebra operations. Several strategies have been proposed to exploit this property. We propose a low-rank format called Block Low-Rank (BLR), and explain how it can be used to reduce the memory footprint and the complexity of direct solvers for sparse matrices based on the multifrontal method. We present experimental results that show how the BLR format delivers gains that are comparable to those obtained with hierarchical formats such as Hierarchical matrices (H matrices) and Hierarchically Semi-Separable (HSS matrices) but provides much greater flexibility and ease of use which are essential in the context of a general purpose, algebraic solver.
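
As a rough illustration of compressing one off-diagonal block with a rank-revealing QR, the sketch below truncates a column-pivoted QR at a relative threshold. It is not the BLR solver itself: the function name `compress_block`, the threshold rule, and the test kernel $1/|x-y|$ on two separated point sets are all assumptions chosen for the example.

```python
import numpy as np
from scipy.linalg import qr

def compress_block(B, tol):
    """Approximate B by X @ Y using truncated QR with column pivoting (hypothetical helper)."""
    Q, R, piv = qr(B, mode="economic", pivoting=True)   # B[:, piv] = Q @ R
    diag = np.abs(np.diag(R))
    # With column pivoting |R[k, k]| is non-increasing, so count entries above the threshold.
    rank = int(np.sum(diag > tol * diag[0])) if diag[0] > 0 else 0
    X = Q[:, :rank]
    Y = np.empty((rank, B.shape[1]))
    Y[:, piv] = R[:rank, :]          # undo the column permutation
    return X, Y

# Interaction block between two well-separated point sets (illustrative data).
x = np.linspace(0.0, 1.0, 100)
y = np.linspace(3.0, 4.0, 100)
B = 1.0 / np.abs(x[:, None] - y[None, :])
X, Y = compress_block(B, tol=1e-8)
rel_err = np.linalg.norm(B - X @ Y) / np.linalg.norm(B)
```

Storing `X` and `Y` instead of `B` costs `rank * (rows + cols)` entries rather than `rows * cols`, which is where the memory reduction mentioned in the abstract comes from.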

170 citations

Journal Article•DOI•

Fast Non-Negative Orthogonal Matching Pursuit

Mehrdad Yaghoobi¹, Di Wu¹, Michael Davies¹•Institutions (1)

University of Edinburgh¹

16 Jan 2015-IEEE Signal Processing Letters

TL;DR: A novel fast implementation of the Non-Negative OMP is presented, which is based on the QR decomposition and an iterative coefficients update, which can fully incorporate the positivity constraint of the coefficients, throughout the selection stage of the algorithm.

Abstract: One of the important classes of sparse signals is the non-negative signals. Many algorithms have already been proposed to recover such non-negative representations, where greedy and convex relaxed algorithms are among the most popular methods. The greedy techniques have been modified to incorporate the non-negativity of the representations. One such modification has been proposed for Orthogonal Matching Pursuit (OMP), which first chooses positive coefficients and uses a non-negative optimisation technique as a replacement for the orthogonal projection onto the selected support. Besides the extra computational cost of the optimisation program, it does not benefit from the fast implementation techniques of OMP, which are based on matrix factorisations. Here we first investigate the problem of positive representation using pursuit algorithms. We then describe a new implementation, which can fully incorporate the positivity constraint of the coefficients throughout the selection stage of the algorithm. As a result, we present a novel fast implementation of the Non-Negative OMP, which is based on the QR decomposition and an iterative coefficients update. We show empirically that such a modification can easily accelerate the implementation by a factor of ten in a reasonable size problem.

80 citations

Journal Article•DOI•

Incremental Linear Discriminant Analysis: A Fast Algorithm and Comparisons

Delin Chu¹, Li-Zhi Liao², Michael K. Ng², Xiaoyan Wang¹•Institutions (2)

National University of Singapore¹, Hong Kong Baptist University²

29 Jan 2015-IEEE Transactions on Neural Networks

TL;DR: Numerical experiments demonstrate that the proposed incremental algorithm ILDA/QR, which builds on a new batch LDA algorithm called LDA/QR, is very efficient and competitive with the state-of-the-art ILDA algorithms in terms of classification accuracy, computational complexity, and space complexity.

Abstract: It has always been a challenging task to develop a fast and efficient incremental linear discriminant analysis (ILDA) algorithm. For this purpose, we conduct a new study of linear discriminant analysis (LDA) in this paper and develop a new ILDA algorithm. We propose a new batch LDA algorithm called LDA/QR. LDA/QR is a simple and fast LDA algorithm, which is obtained by computing the economic QR factorization of the data matrix followed by solving a lower triangular linear system. The relationship between LDA/QR and uncorrelated LDA (ULDA) is also revealed. Based on LDA/QR, we develop a new incremental LDA algorithm called ILDA/QR. The main features of ILDA/QR are that: 1) it can easily handle the update from one new sample or a chunk of new samples; 2) it has efficient computational complexity and space complexity; and 3) it is very fast and always achieves competitive classification accuracy compared with the ULDA algorithm and existing ILDA algorithms. Numerical experiments based on some real-world data sets demonstrate that ILDA/QR is very efficient and competitive with the state-of-the-art ILDA algorithms in terms of classification accuracy, computational complexity, and space complexity.

59 citations

Journal Article•DOI•

Digital image watermarking using partial pivoting lower and upper triangular decomposition into the wavelet domain

Nazeer Muhammad, Nargis Bibi¹•Institutions (1)

University of Manchester¹

01 Sep 2015-IET Image Processing

TL;DR: A digital image watermarking algorithm using partial pivoting lower and upper triangular (PPLU) decomposition is proposed; experiments show that it is highly reliable, with better imperceptibility of the embedded image, and computationally efficient compared with recent existing methods.

Abstract: A digital image watermarking algorithm using partial pivoting lower and upper triangular (PPLU) decomposition is proposed. In this method, a digital watermark image is factorised into lower triangular, upper triangular and permutation matrices by PPLU decomposition. The permutation matrix is used as the valid key matrix for authentication of the rightful ownership of the watermark image. The product of the lower and upper triangular matrices is embedded into particular sub-bands of a cover image that is decomposed by wavelet transform using the singular value decomposition. The weightage-based differential evolution algorithm is used to find a scaling factor that gives the maximum possible robustness against various image processing operations and pirate attacks. The authors' experiments show that the proposed algorithm is highly reliable, with better imperceptibility of the embedded image, and computationally efficient compared with recent existing methods.

57 citations

Journal Article•DOI•

Mixed-Precision Cholesky QR Factorization and Its Case Studies on Multicore CPU with Multiple GPUs

Ichitaro Yamazaki¹, Stanimire Tomov¹, Jack Dongarra¹•Institutions (1)

University of Tennessee¹

12 May 2015-SIAM Journal on Scientific Computing

TL;DR: This paper analyzes the numerical properties of the mixed-precision CholQR, which requires only one global reduction between the parallel processing units and performs most of its computation using BLAS-3 kernels.

Abstract: To orthonormalize the columns of a dense matrix, the Cholesky QR (CholQR) requires only one global reduction between the parallel processing units and performs most of its computation using BLAS-3 kernels. As a result, compared to other orthogonalization algorithms, CholQR obtains superior performance on many of the current computer architectures, where the communication is becoming increasingly expensive compared to the arithmetic operations. This is especially true when the input matrix is tall-skinny. Unfortunately, the orthogonality error of CholQR depends quadratically on the condition number of the input matrix, and it is numerically unstable when the matrix is ill-conditioned. To enhance the stability of CholQR, we recently used mixed-precision arithmetic; the input and output matrices are in the working precision, but some of its intermediate results are accumulated in the doubled precision. In this paper, we analyze the numerical properties of this mixed-precision CholQR. Our analysis shows that ...
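
For readers unfamiliar with CholQR, the basic scheme can be sketched as follows. This is an illustration under stated assumptions rather than the paper's implementation: the function name `cholqr` and its `accum_dtype` parameter are hypothetical, and the "doubled precision" accumulation is only emulated by casting a float32 input to float64 for the intermediate steps.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def cholqr(A, accum_dtype=None):
    """Cholesky QR of a tall-skinny matrix A (hypothetical helper).

    If accum_dtype is given, intermediates are formed in that precision
    and the factors are cast back to A's dtype on return.
    """
    Aw = A if accum_dtype is None else A.astype(accum_dtype)
    G = Aw.T @ Aw                    # Gram matrix: the one global reduction
    R = cholesky(G, lower=False)     # G = R^T R with R upper triangular
    # Q = A R^{-1}, computed by solving R^T Q^T = A^T.
    Q = solve_triangular(R, Aw.T, trans="T", lower=False).T
    return Q.astype(A.dtype), R.astype(A.dtype)

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 8))            # well-conditioned tall-skinny input
Q, R = cholqr(A)

A32 = A.astype(np.float32)                    # working precision: float32
Q32, R32 = cholqr(A32, accum_dtype=np.float64)
```

Because the Gram matrix squares the condition number of `A`, this sketch is only reliable for well-conditioned inputs, which is the instability the abstract refers to.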

54 citations

Journal Article•DOI•

Randomized QR with Column Pivoting

Jed A. Duersch, Ming Gu

23 Sep 2015-arXiv: Numerical Analysis

TL;DR: This work proposes a truncated QR factorization with column pivoting that avoids trailing matrix updates which are used in current implementations of level-3 BLAS QR and QRCP and demonstrates strong parallel scalability on shared-memory multiple core systems using an implementation in Fortran with OpenMP.

Abstract: The dominant contribution to communication complexity in factorizing a matrix using QR with column pivoting is due to column-norm updates that are required to process pivot decisions. We use randomized sampling to approximate this process, which dramatically reduces communication in column selection. We also introduce a sample update formula to reduce the cost of sampling trailing matrices. Using our column selection mechanism, we observe results that are comparable in quality to those obtained from the QRCP algorithm, but with performance near unpivoted QR. We also demonstrate strong parallel scalability on shared memory multiple core systems using an implementation in Fortran with OpenMP. This work immediately extends to produce low-rank truncated approximations of large matrices. We propose a truncated QR factorization with column pivoting that avoids trailing matrix updates which are used in current implementations of level-3 BLAS QR and QRCP. Provided the truncation rank is small, avoiding trailing matrix updates reduces approximation time by nearly half. By using these techniques and employing a variation on Stewart's QLP algorithm, we develop an approximate truncated SVD that runs nearly as fast as truncated QR.

53 citations

Journal Article•DOI•

Fast and Backward Stable Computation of Roots of Polynomials

Jared L. Aurentz¹, Thomas Mach², Raf Vandebril², David S. Watkins³•Institutions (3)

University of Oxford¹, Katholieke Universiteit Leuven², Washington State University³

07 Jul 2015-SIAM Journal on Matrix Analysis and Applications

TL;DR: A stable algorithm to compute the roots of polynomials by computing the eigenvalues of the associated companion matrix by Francis's implicitly shifted QR algorithm is presented.

Abstract: A stable algorithm to compute the roots of polynomials is presented. The roots are found by computing the eigenvalues of the associated companion matrix by Francis's implicitly shifted QR algorithm. A companion matrix is an upper Hessenberg matrix that is unitary-plus-rank-one, that is, it is the sum of a unitary matrix and a rank-one matrix. These properties are preserved by iterations of Francis's algorithm, and it is these properties that are exploited here. The matrix is represented as a product of $3n-1$ Givens rotators plus the rank-one part, so only $O(n)$ storage space is required. In fact, the information about the rank-one part is also encoded in the rotators, so it is not necessary to store the rank-one part explicitly. Francis's algorithm implemented on this representation requires only $O(n)$ flops per iteration and thus $O(n^{2})$ flops overall. The algorithm is described, normwise backward stability is proved, and an extensive set of numerical experiments is presented. The algorithm is shown to...
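
The companion-matrix formulation and its unitary-plus-rank-one structure can be checked on a small example. The sketch below is a dense baseline only: it calls `numpy.linalg.eigvals`, a general-purpose eigenvalue routine, and does not implement the structured $O(n^2)$ algorithm of the paper. The helper name `companion` and the test polynomial are assumptions for illustration.

```python
import numpy as np

def companion(coeffs):
    """Upper Hessenberg companion matrix of the monic polynomial
    x^n + c[n-1] x^(n-1) + ... + c[0], given coeffs = [c[0], ..., c[n-1]]."""
    c = np.asarray(coeffs, dtype=float)
    n = c.size
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)   # ones on the subdiagonal
    C[:, -1] = -c                # last column carries the coefficients
    return C

# (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
coeffs = [-6.0, 11.0, -6.0]
C = companion(coeffs)
roots = np.sort(np.linalg.eigvals(C).real)

# Unitary-plus-rank-one splitting: Z is the cyclic shift, C - Z has rank one.
Z = C.copy()
Z[:, -1] = 0.0
Z[0, -1] = 1.0
```

Here `Z` is a permutation matrix (hence unitary), and `C - Z` is nonzero only in its last column, which is the structure the abstract says Francis's iterations preserve.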

52 citations

Journal Article•DOI•

High-Throughput FPGA Implementation of QR Decomposition

Sergio D. Munoz, Javier Hormigo

20 May 2015-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: A hardware design to achieve high-throughput QR decomposition, using the Givens rotation method, which utilizes a new 2-D systolic array architecture with pipelined processing elements, which are based on the COordinate Rotation DIgital Computer (CORDIC) algorithm.

Abstract: This brief presents a hardware design to achieve high-throughput QR decomposition, using the Givens rotation method. It utilizes a new 2-D systolic array architecture with pipelined processing elements, which are based on the COordinate Rotation DIgital Computer (CORDIC) algorithm. CORDIC computes vector rotations through shifts and additions. This approach allows a continuous computation of QR factorizations with simple hardware. A fixed-point field-programmable gate array (FPGA) architecture for 4 $\times$ 4 matrices has been optimized by balancing the number of CORDIC iterations with the final error. As a result, compared with other previous proposals for FPGA, our design achieves at least 50% more throughput, as well as much less resource utilization.
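
The Givens rotation method named in this abstract can be sketched in software as follows. This is a floating-point illustration only: the hardware design computes the rotations with fixed-point CORDIC shifts and additions in a systolic array, neither of which is modeled here, and the function name `givens_qr` is hypothetical.

```python
import numpy as np

def givens_qr(A):
    """QR factorization by Givens rotations; returns Q, R with A = Q @ R."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    Q = np.eye(m)
    for j in range(n):
        # Zero the entries below the diagonal of column j, bottom up.
        for i in range(m - 1, j, -1):
            a, b = R[i - 1, j], R[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue                      # nothing to rotate
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])   # maps (a, b) to (r, 0)
            R[[i - 1, i], :] = G @ R[[i - 1, i], :]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T
    return Q, R

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # same size as the 4 x 4 case in the abstract
Q, R = givens_qr(A)
```

Each rotation touches only two rows, which is what makes the method attractive for a 2-D array of simple processing elements.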

42 citations

Book Chapter•DOI•

A Framework for Batched and GPU-Resident Factorization Algorithms Applied to Block Householder Transformations

Azzam Haidar¹, Tingxing "Tim" Dong¹, Stanimire Tomov¹, Piotr Luszczek¹, Jack Dongarra¹²³•Institutions (3)

University of Tennessee¹, University of Manchester², Oak Ridge National Laboratory³

12 Jul 2015

TL;DR: This chapter describes the development of one-sided factorizations that work on a set of small dense matrices in parallel; the optimized batched factorization achieves up to a 2-fold speedup and a 3-fold energy efficiency improvement compared to highly optimized batched CPU implementations based on the MKL library.

Abstract: As modern hardware keeps evolving, an increasingly effective approach to developing energy efficient and high-performance solvers is to design them to work on many small size and independent problems. Many applications already need this functionality, especially for GPUs, which are currently known to be about four to five times more energy efficient than multicore CPUs. We describe the development of one-sided factorizations that work for a set of small dense matrices in parallel, and we illustrate our techniques on the QR factorization based on Householder transformations. We refer to this mode of operation as a batched factorization. Our approach is based on representing the algorithms as a sequence of batched BLAS routines for GPU-only execution. This is in contrast to the hybrid CPU-GPU algorithms that rely heavily on using the multicore CPU for specific parts of the workload. But for a system to benefit fully from the GPU’s significantly higher energy efficiency, avoiding the use of the multicore CPU must be a primary design goal, so the system can rely more heavily on the more efficient GPU. Additionally, this will result in the removal of the costly CPU-to-GPU communication. Furthermore, we do not use a single symmetric multiprocessor (on the GPU) to factorize a single problem at a time. We illustrate how our performance analysis, and the use of profiling and tracing tools, guided the development and optimization of our batched factorization to achieve up to a 2-fold speedup and a 3-fold energy efficiency improvement compared to our highly optimized batched CPU implementations based on the MKL library (when using two sockets of Intel Sandy Bridge CPUs). Compared to a batched QR factorization featured in the CUBLAS library for GPUs, we achieved up to $5\times $ speedup on the K40 GPU.

Journal Article•DOI•

Reconstructing Householder vectors from Tall-Skinny QR

Grey Ballard¹, James Demmel², Laura Grigori³, Mathias Jacquelin⁴, Nicholas Knight², Hong Diep Nguyen²•Institutions (4)

Sandia National Laboratories¹, University of California, Berkeley², French Institute for Research in Computer Science and Automation³, Lawrence Berkeley National Laboratory⁴

01 Nov 2015-Journal of Parallel and Distributed Computing

TL;DR: The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared to the Communication-Avoiding QR (CAQR) algorithm.

Posted Content•

Blocked rank-revealing QR factorizations: How randomized sampling can be used to avoid single-vector pivoting

Per-Gunnar Martinsson

29 May 2015-arXiv: Numerical Analysis

TL;DR: The manuscript describes an algorithm for computing a QR factorization $AP=QR$, where $P$ is a permutation matrix, $Q$ is orthonormal, and $R$ is upper triangular; the algorithm is blocked so that it can be implemented efficiently.

Abstract: Given a matrix $A$ of size $m\times n$, the manuscript describes an algorithm for computing a QR factorization $AP=QR$ where $P$ is a permutation matrix, $Q$ is orthonormal, and $R$ is upper triangular. The algorithm is blocked, to allow it to be implemented efficiently. The need for single vector pivoting in classical algorithms for computing QR factorizations is avoided by the use of randomized sampling to find blocks of pivot vectors at once. The advantage of blocking becomes particularly pronounced when $A$ is very large, and possibly stored out-of-core, or on a distributed memory machine. The manuscript also describes a generalization of the QR factorization that allows $P$ to be a general orthonormal matrix. In this setting, one can at moderate cost compute a rank-revealing factorization where the mass of $R$ is concentrated in the diagonal entries. Moreover, the diagonal entries of $R$ closely approximate the singular values of $A$. The algorithms described have asymptotic flop count $O(m\,n\,\min(m,n))$, just like classical deterministic methods. The scaling constant is slightly higher than those of classical techniques, but this is more than made up for by reduced communication and the ability to block the computation.
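
The idea of choosing a whole block of pivots from a random sample can be sketched as below. This is a single-step illustration under assumptions, not the manuscript's blocked algorithm: the function name `select_pivot_block`, the Gaussian test matrix, and the oversampling parameter are choices made for the example, and the subsequent update of the remaining matrix is omitted.

```python
import numpy as np
from scipy.linalg import qr

def select_pivot_block(A, b, oversample=5, seed=0):
    """Choose b pivot columns of A at once from a small random sample (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    G = rng.standard_normal((b + oversample, m))   # Gaussian test matrix
    Y = G @ A                                      # (b + oversample)-by-n sample of A
    # Classical column pivoting is applied to the small sample only.
    _, _, piv = qr(Y, mode="economic", pivoting=True)
    return piv[:b]

# Illustrative data: a numerically rank-4 matrix with small noise.
rng = np.random.default_rng(1)
L = rng.standard_normal((200, 4))
Rt = rng.standard_normal((4, 50))
A = L @ Rt + 1e-6 * rng.standard_normal((200, 50))

cols = select_pivot_block(A, b=4)
Qs, _ = np.linalg.qr(A[:, cols])                   # basis for the selected columns
residual = np.linalg.norm(A - Qs @ (Qs.T @ A)) / np.linalg.norm(A)
```

The pivoting work is done on the small matrix `Y`, so the large matrix `A` is touched only by a matrix-matrix product, which is the kind of operation that blocks well.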

Posted Content•

A fast and well-conditioned spectral method for singular integral equations

Richard Mikael Slevinsky¹, Sheehan Olver²•Institutions (2)

University of Manitoba¹, University of Sydney²

02 Jul 2015-arXiv: Numerical Analysis

TL;DR: A spectral method for solving univariate singular integral equations over unions of intervals by utilizing Chebyshev and ultraspherical polynomials to reformulate the equations as almost-banded infinite-dimensional systems is developed.

Abstract: We develop a spectral method for solving univariate singular integral equations over unions of intervals by utilizing Chebyshev and ultraspherical polynomials to reformulate the equations as almost-banded infinite-dimensional systems. This is accomplished by utilizing low rank approximations for sparse representations of the bivariate kernels. The resulting system can be solved in ${\cal O}(m^2n)$ operations using an adaptive QR factorization, where $m$ is the bandwidth and $n$ is the optimal number of unknowns needed to resolve the true solution. The complexity is reduced to ${\cal O}(m n)$ operations by pre-caching the QR factorization when the same operator is used for multiple right-hand sides. Stability is proved by showing that the resulting linear operator can be diagonally preconditioned to be a compact perturbation of the identity. Applications considered include the Faraday cage, and acoustic scattering for the Helmholtz and gravity Helmholtz equations, including spectrally accurate numerical evaluation of the far- and near-field solution. The Julia software package SingularIntegralEquations.jl implements our method with a convenient, user-friendly interface.

Proceedings Article•DOI•

Efficient model order reduction for multi-agent systems using QR decomposition-based clustering

Petar Mlinarić¹, Sara Grundel¹, Peter Benner¹•Institutions (1)

Max Planck Society¹

01 Dec 2015

TL;DR: The method combines an established model order reduction method and a clustering algorithm to produce a graph partition used for reduction, thus preserving structure and consensus.

Abstract: In this paper we present an efficient model order reduction method for multi-agent systems with Laplacian-based dynamics. The method combines an established model order reduction method and a clustering algorithm to produce a graph partition used for reduction, thus preserving structure and consensus. By the Iterative Rational Krylov Algorithm, a good reduced order model can be found which is not necessarily structure preserving. However, based on this we can efficiently find a partition using the QR decomposition with column pivoting as a clustering algorithm, so that the structure can be restored. We illustrate the effectiveness on an example from the open literature.

Journal Article•DOI•

Algorithm-Based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy

Aurelien Bouteiller¹, Thomas Herault¹, George Bosilca¹, Peng Du¹, Jack Dongarra¹•Institutions (1)

University of Tennessee¹

18 Feb 2015

TL;DR: This article proposes a new hybrid approach, based on Algorithm-Based Fault Tolerance (ABFT), to help matrix factorization algorithms survive fail-stop failures, and presents a generic solution for protecting the right factor, where the updates are applied, of the LU, Cholesky and QR factorizations.

Abstract: Dense matrix factorizations, such as LU, Cholesky and QR, are widely used for scientific applications that require solving systems of linear equations, eigenvalues and linear least squares problems. Such computations are normally carried out on supercomputers, whose ever-growing scale induces a fast decline of the Mean Time To Failure (MTTF). This article proposes a new hybrid approach, based on Algorithm-Based Fault Tolerance (ABFT), to help matrix factorization algorithms survive fail-stop failures. We consider extreme conditions, such as the absence of any reliable node and the possibility of losing both data and checksum from a single failure. We present a generic solution for protecting the right factor, where the updates are applied, of all the above-mentioned factorizations. For the left factor, where the panel has been applied, we propose a scalable checkpointing algorithm. This algorithm features a high degree of checkpointing parallelism and cooperatively utilizes the checksum storage left over from the right factor protection. The fault-tolerant algorithms derived from this hybrid solution are applicable to a wide range of dense matrix factorizations, with minor modifications. Theoretical analysis shows that the fault tolerance overhead decreases inversely to the scaling in the number of computing units and the problem size. Experimental results of LU and QR factorization on the Kraken (Cray XT5) supercomputer validate the theoretical evaluation and confirm negligible overhead, with and without errors. Applicability to tolerate multiple failures and accuracy after multiple recoveries are also considered.

Journal Article•DOI•

A new local polynomial modeling-based variable forgetting factor RLS algorithm and its acoustic applications

Y. J. Chu¹, S.C. Chan¹•Institutions (1)

University of Hong Kong¹

01 Nov 2015-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: The proposed algorithms are applied to frequency estimation and adaptive beamforming in time-varying speech and audio signals, and simulations show that their convergence and tracking performance compare favorably with conventional algorithms.

Abstract: This paper proposes a new class of local polynomial modeling (LPM)-based variable forgetting factor (VFF) recursive least squares (RLS) algorithms called the LPM-based VFF RLS (LVFF-RLS) algorithms. It models the time-varying channel coefficients as local polynomials so as to obtain the expressions of the bias and variance terms in the mean square error (MSE) of the RLS algorithm. A new locally optimal VFF (LOVFF) is then derived by minimizing the resulting MSE, and the theoretical analysis is found to be in good agreement with experimental results. Methods for estimating the parameters involved in this LOVFF are also developed, resulting in an improved RLS algorithm with VFF. The algorithm is further extended to include variable regularization and a QR decomposition (QRD) version which is numerically more stable and amenable to multiplier-less implementation using the coordinate rotation digital computer (CORDIC) algorithm. Applications of these algorithms to frequency estimation and adaptive beamforming in time-varying speech and audio signals are also presented to illustrate the effectiveness of the proposed algorithms. Simulations show that the convergence and tracking performance of the proposed algorithms compare favorably with conventional algorithms.

Journal Article•DOI•

Audio watermarking method using QR decomposition and genetic algorithm

Seyed Mohammadreza Mohsenfar¹, Mohammad Mosleh¹, Ali Barati¹•Institutions (1)

Islamic Azad University¹

01 Feb 2015-Multimedia Tools and Applications

TL;DR: The results indicate that the proposed intelligent audio watermarking method, which combines QR decomposition (QR factorization) with a Genetic Algorithm, is more robust than previous robust audio watermarking methods.

Abstract: Watermarking is a method used to hide the owner's data in the host signal in an inaudible way. The watermark signal must not reduce the quality of the host signal. Furthermore, it must be resistant to various attacks. In this paper, we propose an intelligent audio watermarking method that combines the QR decomposition (QR factorization) method with a Genetic Algorithm (GA). At the outset, the host signal is segmented into several frames. Then, every frame is decomposed by using the QR decomposition method, and, subsequently, the best place for embedding the watermark bit, one which has a high robustness to the possible attacks, is searched for by using the GA. In order to evaluate the effectiveness of this method, we have examined the robustness of the watermark against several attacks for several different audio signals. The results indicate that the proposed method is more robust than previous robust audio watermarking methods.

Journal Article•DOI•

On Efficiently Computing the Eigenvalues of Limited-Memory Quasi-Newton Matrices

Jennifer B. Erway, Roummel F. Marcia

15 Sep 2015-SIAM Journal on Matrix Analysis and Applications

TL;DR: This article considers the problem of efficiently computing the eigenvalues of limited-memory quasi-Newton matrices that exhibit a compact formulation, and produces a compact formula for quasi-Newton matrices generated by any member of the Broyden convex class of updates.

Abstract: In this paper, we consider the problem of efficiently computing the eigenvalues of limited-memory quasi-Newton matrices that exhibit a compact formulation. In addition, we produce a compact formula for quasi-Newton matrices generated by any member of the Broyden convex class of updates. Our proposed method makes use of efficient updates to the QR factorization that substantially reduce the cost of computing the eigenvalues after the quasi-Newton matrix is updated. Numerical experiments suggest that the proposed method is able to compute eigenvalues to high accuracy. Applications for this work include modified quasi-Newton methods and trust-region methods for large-scale optimization, the efficient computation of condition numbers and singular values, and sensitivity analysis.
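
To see why a QR factorization helps with such eigenvalue problems, consider a matrix of the form $B = \gamma I + \Psi M \Psi^T$ with $\Psi$ of size $n\times k$ and $k \ll n$. This form is an assumption about what "compact formulation" means here; the abstract does not spell it out. With a thin QR factorization $\Psi = QR$, the eigenvalues of $B$ are $\gamma$ plus the eigenvalues of the small matrix $R M R^T$, together with $\gamma$ repeated $n-k$ times. The sketch below checks this on random data; the function name is hypothetical, and the efficient QR updating step mentioned in the abstract is not shown.

```python
import numpy as np

def compact_eigenvalues(gamma, Psi, M):
    """Eigenvalues of gamma*I + Psi @ M @ Psi.T via a thin QR of Psi (hypothetical helper)."""
    n, k = Psi.shape
    _, R = np.linalg.qr(Psi)            # thin QR; only the k-by-k factor R is needed
    small = R @ M @ R.T
    small = 0.5 * (small + small.T)     # symmetrize against rounding
    lam = gamma + np.linalg.eigvalsh(small)
    # The remaining n - k eigenvalues all equal gamma.
    return np.concatenate([lam, np.full(n - k, gamma)])

rng = np.random.default_rng(2)
n, k, gamma = 30, 4, 2.0
Psi = rng.standard_normal((n, k))
M = rng.standard_normal((k, k))
M = 0.5 * (M + M.T)                     # symmetric middle matrix

eigs = np.sort(compact_eigenvalues(gamma, Psi, M))
dense = np.linalg.eigvalsh(gamma * np.eye(n) + Psi @ M @ Psi.T)   # reference, O(n^3)
```

The small eigenvalue problem costs $O(k^3)$ once `R` is available, so the dominant cost is the QR factorization of `Psi`, which is the part an updating scheme would avoid recomputing.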

Journal Article•DOI•

Design and Implementation of Flexible Dual-Mode Soft-Output MIMO Detector With Channel Preprocessing

Zhiting Yan¹, Guanghui He¹, Yifan Ren¹, Weifeng He¹, Jianfei Jiang¹, Zhigang Mao¹•Institutions (1)

Shanghai Jiao Tong University¹

28 Sep 2015-IEEE Transactions on Circuits and Systems

TL;DR: A flexible dual-mode soft-output multiple-input multiple-output (MIMO) detector is proposed to support open-loop and closed-loop operation in the Chinese enhanced ultra high throughput (EUHT) wireless local area network (LAN) standard.

Abstract: This paper proposes a flexible dual-mode soft-output multiple-input multiple-output (MIMO) detector to support open-loop and closed-loop operation in the Chinese enhanced ultra high throughput (EUHT) wireless local area network (LAN) standard. The proposed detector uses minimum mean square error (MMSE) sorted QR decomposition (MMSE-SQRD) to produce the channel preprocessing result, which is realized by a modified systolic array architecture with concurrent sorting. Moreover, the adopted square-root MMSE algorithm for closed-loop reuses the MMSE-SQRD preprocessing, which largely reduces hardware overhead. In addition, an optimized K-Best detection algorithm is proposed for open-loop, which increases throughput by odd-even parallel sorting and produces high-quality soft output with discarded paths (DPs). A flexible VLSI architecture is designed for the proposed dual-mode detector, which supports $1\times 1\sim 4\times 4$ antennas and BPSK $\sim$ 64-QAM modulation configurations. Implemented in SMIC 65 nm CMOS technology, the detector is capable of running at 550 MHz, with a maximum throughput of 2.64 Gb/s for K-Best detection and 3.3 Gb/s for linear MMSE detection. The proposed detector is competitive with recently published works and meets the data-rate requirement of the EUHT standard.
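
As a software-level illustration of the preprocessing step, a sorted QR decomposition can be sketched with modified Gram-Schmidt and a smallest-norm-first column ordering. This is a sketch under stated assumptions and not the paper's systolic-array design: the function names and the noise parameter `sigma` are hypothetical, and the MMSE variant is obtained here by the common device of stacking `sigma * I` under the channel matrix.

```python
import numpy as np

def sorted_qr(H):
    """Sorted QR via modified Gram-Schmidt.

    Returns Q, R, perm such that H[:, perm] == Q @ R (up to round-off).
    Assumes H has full column rank.
    """
    Q = np.array(H, dtype=complex)  # working copy; becomes Q
    n = Q.shape[1]
    R = np.zeros((n, n), dtype=complex)
    perm = np.arange(n)
    for i in range(n):
        # Choose the remaining column with the smallest residual norm.
        k = i + int(np.argmin(np.linalg.norm(Q[:, i:], axis=0)))
        Q[:, [i, k]] = Q[:, [k, i]]
        R[:, [i, k]] = R[:, [k, i]]
        perm[[i, k]] = perm[[k, i]]
        R[i, i] = np.linalg.norm(Q[:, i])
        Q[:, i] /= R[i, i]
        # Orthogonalize the remaining columns against the new direction.
        for j in range(i + 1, n):
            R[i, j] = np.vdot(Q[:, i], Q[:, j])
            Q[:, j] -= R[i, j] * Q[:, i]
    return Q, R, perm

def mmse_sorted_qr(H, sigma):
    """MMSE variant (assumption): sorted QR of the stacked matrix [H; sigma*I]."""
    n = np.shape(H)[1]
    return sorted_qr(np.vstack([H, sigma * np.eye(n)]))
```

Detecting the layers in the order given by `perm`, starting from the last row of `R`, is what makes the sorting useful for successive or K-Best style detection.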

Journal Article•DOI•

A Robust Color Image Watermarking Scheme Using Entropy and QR Decomposition

Lauri Laur, Pejman Rasti, Mary Agoyi, Gholamreza Anbarjafari

15 Sep 2015-Radioengineering

TL;DR: A robust and imperceptible nonblind color image watermarking algorithm is proposed, which benefits from the fact that the watermark can be hidden in different color channels, making the technique more robust to attacks.

Abstract: The Internet has affected our everyday life drastically. Large volumes of information are constantly exchanged over the Internet, which raises numerous security concerns. Issues like content identification, document and image security, audience measurement, ownership, copyrights and others can be settled by using digital watermarking. In this work, a robust and imperceptible nonblind color image watermarking algorithm is proposed, which benefits from the fact that the watermark can be hidden in different color channels, making the technique more robust to attacks. The method uses entropy, the discrete wavelet transform, the chirp z-transform, orthogonal-triangular (QR) decomposition and singular value decomposition to embed the watermark in a color image. Many experiments are performed using well-known signal processing attacks such as histogram equalization, adding noise and compression. Experimental results show that the proposed scheme is imperceptible and robust against common signal processing attacks.

Journal Article•DOI•

The Forward Search for Very Large Datasets

Marco Riani¹, Domenico Perrotta, Andrea Cerioli¹•Institutions (1)

University of Parma¹

07 Oct 2015-Journal of Statistical Software

TL;DR: This paper proposes some computational improvements of the forward search algorithm and provides a recursive implementation of the procedure which exploits the information of the previous step and produces a set of efficient routines for fast updating of the model parameter estimates and fast computation of likelihood contributions.

Abstract: The identification of atypical observations and the immunization of data analysis against both outliers and failures of modeling are important aspects of modern statistics. The forward search is a graphics-rich approach that leads to the formal detection of outliers and to the detection of model inadequacy combined with suggestions for model enhancement. The key idea is to monitor quantities of interest, such as parameter estimates and test statistics, as the model is fitted to data subsets of increasing size. In this paper we propose some computational improvements of the forward search algorithm and we provide a recursive implementation of the procedure which exploits the information of the previous step. The output is a set of efficient routines for fast updating of the model parameter estimates, which do not require any data sorting, and fast computation of likelihood contributions, which do not require matrix inversion or QR decomposition. It is shown that the new algorithms enable a reduction of the computation time by more than 80%. Furthermore, the running time now increases almost linearly with the sample size. All the routines described in this paper are included in the FSDA toolbox for MATLAB, which is freely downloadable from the internet.

Journal Article•DOI•

Novel cubature Kalman filtering for systems involving nonlinear states and linear measurements

Shiyuan Wang¹, Shiyuan Wang², Jiuchao Feng³, Chi K. Tse¹•Institutions (3)

Hong Kong Polytechnic University¹, Southwest University², South China University of Technology³

01 Jan 2015-AEU - International Journal of Electronics and Communications

TL;DR: In this paper, the cubature rule was combined with a QR decomposition, singular value decomposition and a linear update without requirement of cubature points, and the convergence analysis of NL-SCKF was performed.

Abstract: This paper extends the cubature Kalman filter (CKF) to deal with systems involving nonlinear states and linear measurements (herein called the nonlinear–linear combined systems) with additive noise. The method is referred to as the nonlinear–linear square-root cubature Kalman filtering (NL-SCKF). In NL-SCKF, the cubature rule, combined with a QR decomposition, singular value decomposition and a linear update without requirement of cubature points, is designed to update nonlinear states and linear measurements. In addition, the convergence analysis of NL-SCKF is performed. Simulation results in two selected problems, namely filtering chaotic signals and chaos-based communications, indicate that the proposed NL-SCKF with lower computation complexity achieves the same accuracy as the standard SCKF, and outperforms CKF significantly.

Posted Content•

True BLAS-3 Performance QRCP using Random Sampling

Jed A. Duersch, Ming Gu

23 Sep 2015-arXiv: Numerical Analysis

TL;DR: A truncated QR factorization with column pivoting that avoids the trailing matrix updates used in current implementations of BLAS-3 QR and QRCP is proposed, and an approximate truncated SVD is developed that runs nearly as fast as truncated QR.

Abstract: The dominant contribution to communication complexity in factorizing a matrix using QR with column pivoting is due to column-norm updates that are required to process pivot decisions. We use randomized sampling to approximate this process which dramatically reduces communication in column selection. We also introduce a sample update formula to reduce the cost of sampling trailing matrices. Using our column selection mechanism we observe results that are comparable to those obtained from the QRCP algorithm, but with performance near unpivoted QR. We also demonstrate strong parallel scalability on shared memory multiple core systems using an implementation in Fortran with OpenMP. This work immediately extends to produce low-rank truncated approximations of large matrices. We propose a truncated QR factorization with column pivoting that avoids trailing matrix updates which are used in current implementations of BLAS-3 QR and QRCP. Provided the truncation rank is small, avoiding trailing matrix updates reduces approximation time by nearly half. By using these techniques and employing a variation on Stewart's QLP algorithm, we develop an approximate truncated SVD that runs nearly as fast as truncated QR.

Journal Article•DOI•

A direct tridiagonal solver based on Givens rotations for GPU architectures

Ioannis E. Venetis¹, Alexandros Kouris¹, Alexandros Sobczyk¹, Efstratios Gallopoulos¹, Ahmed H. Sameh²•Institutions (2)

University of Patras¹, Purdue University²

01 Nov 2015

TL;DR: A parallel solver for general tridiagonal irreducible systems and its CUDA implementation are described, indicating that g-Spike is competitive in runtime with existing GPU methods, and can provide acceptable results when other methods cannot be applied or fail.

Abstract: A parallel solver for general tridiagonal irreducible systems is described. Solver based on Spike framework and Givens-QR with occasional low-rank modification. Modifications handle singularities exposed by QR in blocks of the parallel partition. The GPU implementation has similar performance to existing methods. Method returns accurate results when current GPU tridiagonal solvers fail. g-Spike, a parallel algorithm for solving general nonsymmetric tridiagonal systems for the GPU, and its CUDA implementation are described. The solver is based on the Spike framework, applying Givens rotations and QR factorization without pivoting. It also implements a low-rank modification strategy to compute the Spike DS decomposition even when the partitioning defines singular submatrices along the diagonal. The method is also used to solve the reduced system resulting from the Spike partitioning. Numerical experiments with problems of high order indicate that g-Spike is competitive in runtime with existing GPU methods, and can provide acceptable results when other methods cannot be applied or fail.
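
The Givens-QR building block can be illustrated with a sequential, single-block sketch. It is not g-Spike itself: there is no Spike partitioning, no low-rank modification for singular blocks, and no GPU code, and the function name and argument layout are assumptions made for this example.

```python
import math

def givens_tridiagonal_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system by Givens-QR without pivoting.

    lower[i] is entry (i+1, i), diag[i] is entry (i, i), upper[i] is entry (i, i+1).
    """
    n = len(diag)
    d = [float(v) for v in diag]             # main diagonal of R
    u1 = [float(v) for v in upper] + [0.0]   # first superdiagonal of R
    u2 = [0.0] * n                           # second superdiagonal (fill-in)
    low = [float(v) for v in lower]
    y = [float(v) for v in rhs]
    for i in range(n - 1):
        r = math.hypot(d[i], low[i])
        if r == 0.0:
            continue  # column already zero; singularity is reported below
        c, s = d[i] / r, low[i] / r
        # Rotate rows i and i+1 so that entry (i+1, i) becomes zero.
        a, b, b2 = u1[i], d[i + 1], u1[i + 1]
        d[i] = r
        u1[i] = c * a + s * b
        d[i + 1] = -s * a + c * b
        u2[i] = s * b2
        u1[i + 1] = c * b2
        y[i], y[i + 1] = c * y[i] + s * y[i + 1], -s * y[i] + c * y[i + 1]
    # Back substitution with the banded upper triangular factor.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        if d[i] == 0.0:
            raise ValueError("matrix is singular to working precision")
        acc = y[i]
        if i + 1 < n:
            acc -= u1[i] * x[i + 1]
        if i + 2 < n:
            acc -= u2[i] * x[i + 2]
        x[i] = acc / d[i]
    return x
```

Because rotations are orthogonal, a zero on the diagonal does not break the elimination: `givens_tridiagonal_solve([1], [0, 0], [1], [2, 3])` solves a system on which LU without pivoting would divide by zero.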

Journal Article•DOI•

A CMV-based eigensolver for companion matrices

Roberto Bevilacqua, G. M. Del Corso, Luca Gemignani

21 Jul 2015-SIAM Journal on Matrix Analysis and Applications

TL;DR: In this paper, the roots of a polynomial are approximated by computing the eigenvalues of a permuted version of the associated companion matrix with a fast structured QR eigenvalue iteration.

Abstract: In this paper we present a novel matrix method for polynomial rootfinding. We approximate the roots by computing the eigenvalues of a permuted version of the companion matrix associated with the polynomial. This form, referred to as a lower staircase form of the companion matrix in the literature, has a block upper Hessenberg shape with possibly nonsquare subdiagonal blocks. It is shown that this form is well suited to the application of the QR eigenvalue algorithm. In particular, each matrix generated under this iteration is block upper Hessenberg and, moreover, all its submatrices located in a specified upper triangular portion are of rank two at most, with entries represented by means of four given vectors. By exploiting these properties we design a fast and computationally simple structured QR iteration which computes the eigenvalues of a companion matrix of size $n$ in lower staircase form using $O(n^2)$ flops and $O(n)$ memory storage. So far, this iteration is theoretically faster than the fastest ...
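
For orientation, the baseline that the structured method improves upon can be sketched as below: build the ordinary (unpermuted) companion matrix and hand it to a dense eigenvalue solver. This is not the staircase-form algorithm of the abstract; it uses $O(n^2)$ storage and a general dense solver, and the function name is hypothetical.

```python
import numpy as np

def roots_via_companion(coeffs):
    """Roots of a polynomial from the eigenvalues of its companion matrix.

    coeffs lists the coefficients from the highest degree down;
    the leading coefficient must be nonzero.
    """
    c = np.asarray(coeffs, dtype=complex)
    c = c / c[0]                     # make the polynomial monic
    n = len(c) - 1
    C = np.zeros((n, n), dtype=complex)
    C[0, :] = -c[1:]                 # first row: negated monic coefficients
    C[1:, :-1] = np.eye(n - 1)       # ones on the subdiagonal
    return np.linalg.eigvals(C)      # dense, unstructured eigenvalue solver
```

For example, `roots_via_companion([1, -3, 2])` approximates the roots of $x^2 - 3x + 2$.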

Journal Article•DOI•

Improved rigorous perturbation bounds for the LU and QR factorizations

Hanyu Li¹, Yimin Wei²•Institutions (2)

Chongqing University¹, Fudan University²

01 Dec 2015-Numerical Linear Algebra With Applications

TL;DR: Combining the modified matrix–vector equation approach with the technique of Lyapunov majorant function and the Banach fixed point theorem, improved rigorous perturbation bounds for the LU and QR factorizations with normwise perturbations in the given matrix are obtained.

Abstract: Combining the modified matrix–vector equation approach with the technique of Lyapunov majorant function and the Banach fixed point theorem, we obtain improved rigorous perturbation bounds for the LU and QR factorizations with normwise perturbation in the given matrix. Each of the improved rigorous perturbation bounds is a rigorous version of the first-order perturbation bound derived by the matrix–vector equation approach in the literature, and we present their explicit expressions. These bounds are always tighter than those given by Chang and Stehle in the paper entitled “Rigorous perturbation bounds of some matrix factorizations”. This fact is illustrated by numerical examples. Copyright © 2015 John Wiley & Sons, Ltd.

Journal Article•DOI•

QR factorization based Incremental Extreme Learning Machine with growth of hidden nodes

Yibin Ye, Yang Qin

01 Nov 2015-Pattern Recognition Letters

TL;DR: This approach, QR factorization based Incremental Extreme Learning Machine (QRI-ELM), is able to add random hidden nodes to SLFNs one by one and is fast and effective with good generalization and accuracy performance.

Journal Article•DOI•

Robust linear equation dwell time model compatible with large scale discrete surface error matrix.

Zhichao Dong¹, Haobo Cheng¹, Hon-Yuen Tam²•Institutions (2)

Beijing Institute of Technology¹, City University of Hong Kong²

01 Apr 2015-Applied Optics

TL;DR: This study solves this ill-posed equation by Tikhonov regularization and the least square QR decomposition (LSQR) method, and automatically determines an optional interval and a typical value for the damped factor of regularization, which are dependent on the peak removal rate of tool influence functions.

Abstract: The linear equation dwell time model can translate the 2D convolution process of material removal during subaperture polishing into a more intuitional expression, and may provide relatively fast and reliable results. However, the accurate solution of this ill-posed equation is not so easy, and its practicability for a large scale surface error matrix is still limited. This study first solves this ill-posed equation by Tikhonov regularization and the least square QR decomposition (LSQR) method, and automatically determines an optional interval and a typical value for the damped factor of regularization, which are dependent on the peak removal rate of tool influence functions. Then, a constrained LSQR method is presented to increase the robustness of the damped factor, which can provide more consistent dwell time maps than traditional LSQR. Finally, a matrix segmentation and stitching method is used to cope with large scale surface error matrices. Using these proposed methods, the linear equation model becomes more reliable and efficient in practical engineering.
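
The link between Tikhonov regularization and a damped least squares solve can be written out explicitly. The symbols below are assumptions made for illustration and are not taken from the paper: $A$ stands for the removal (influence) matrix, $x$ for the dwell time vector, $b$ for the surface error, and $\lambda$ for the damping factor.

```latex
\min_{x}\; \|Ax - b\|_2^2 + \lambda^2 \|x\|_2^2
\;=\;
\min_{x}\; \left\| \begin{bmatrix} A \\ \lambda I \end{bmatrix} x
                 - \begin{bmatrix} b \\ 0 \end{bmatrix} \right\|_2^2 ,
\qquad\text{with normal equations}\qquad
\left(A^{\mathsf T} A + \lambda^2 I\right) x = A^{\mathsf T} b .
```

LSQR solves the stacked problem iteratively using only products with $A$ and $A^{\mathsf T}$, so neither $A^{\mathsf T}A$ nor the stacked matrix has to be formed, which is what makes the approach attractive for large surface error matrices.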

Journal Article•DOI•

Fast orthogonal linear discriminant analysis with application to image classification

Qiaolin Ye¹, Ning Ye¹, Tongming Yin¹•Institutions (1)

Nanjing Forestry University¹

22 Jun 2015-Neurocomputing

TL;DR: The new approach applies QR decomposition and regression to solve for a new orthogonal projection vector at each iteration, leading to a far lower computational cost.
