
Showing papers on "QR decomposition published in 2018"


Posted Content
TL;DR: numpywren, a system for linear algebra built on a serverless architecture, is presented, together with LAmbdaPACK, a domain-specific language designed to implement highly parallel linear algebra algorithms in a serverless setting; the work highlights how cloud providers could better support these types of computations through small changes in their infrastructure.
Abstract: Linear algebra operations are widely used in scientific computing and machine learning applications. However, it is challenging for scientists and data analysts to run linear algebra at scales beyond a single machine. Traditional approaches either require access to supercomputing clusters, or impose configuration and cluster management challenges. In this paper we show how the disaggregation of storage and compute resources in so-called "serverless" environments, combined with compute-intensive workload characteristics, can be exploited to achieve elastic scalability and ease of management. We present numpywren, a system for linear algebra built on a serverless architecture. We also introduce LAmbdaPACK, a domain-specific language designed to implement highly parallel linear algebra algorithms in a serverless setting. We show that, for certain linear algebra algorithms such as matrix multiply, singular value decomposition, and Cholesky decomposition, numpywren's performance (completion time) is within 33% of ScaLAPACK, and its compute efficiency (total CPU-hours) is up to 240% better due to elasticity, while providing an easier-to-use interface and better fault tolerance. At the same time, we show that the inability of serverless runtimes to exploit locality across the cores in a machine fundamentally limits their network efficiency, which limits performance on other algorithms such as QR factorization. This highlights how cloud providers could better support these types of computations through small changes in their infrastructure.

73 citations


Journal ArticleDOI
Mahdi Roozbeh1
TL;DR: A modified estimator based on the QR decomposition is proposed to combat the multicollinearity problem of the design matrix in the partially linear regression model; it distorts the data less than the other methods.

53 citations


Journal ArticleDOI
TL;DR: New real structure-preserving decompositions are introduced to develop fast and robust algorithms for the (right) eigenproblem of general quaternion matrices; numerical examples demonstrate the efficiency and accuracy of the newly proposed algorithms.

47 citations


Journal ArticleDOI
TL;DR: In this paper, the capacity of the intensity modulation direct detection multiple-input multiple-output (MIMO) channel is studied, and capacity lower bounds are obtained from the achievable rates of two precoding-free schemes: channel inversion and orthogonal-upper triangular matrix product decomposition.
Abstract: The capacity of the intensity modulation direct detection multiple-input-multiple-output channel is studied. Therein, the nonnegativity constraint of the transmit signal limits the applicability of classical schemes, including precoding. Thus, new ways are required for deriving capacity bounds for this channel. To this end, capacity lower bounds are developed in this paper by deriving the achievable rates of two precoding-free schemes: channel inversion and orthogonal-upper triangular matrix product decomposition. The achievable rate of a dc-offset singular-value decomposition-based scheme is also derived as a benchmark. Then, capacity upper bounds are derived and compared against the lower bounds. As a result, the capacity at high signal-to-noise ratio (SNR) is characterized for the case where the number of transmit apertures is not larger than the number of receive apertures, and is shown to be achievable by the QR decomposition scheme. This is shown for a channel with average intensity or peak intensity constraints. Under both constraints, the high-SNR capacity is approximated within a small gap. Extensions to a channel with more transmit apertures than receive apertures are discussed, and capacity bounds for this case are derived.

41 citations


Journal ArticleDOI
01 May 2018
TL;DR: Batched QR, SVD, and GEMM kernels are used to compress hierarchical matrices entirely on the GPU, a key component of hierarchical matrix compression that opens up opportunities to perform H-matrix arithmetic efficiently on GPUs.
Abstract: High-performance GPU-hosted batched QR decomposition kernels are developed and outperform current implementations for small and rectangular matrices. Various GPU-hosted batched singular value decomposition kernels are developed and used as building blocks of a batched randomized SVD kernel for numerically low-rank matrix blocks. Batched QR, SVD, and GEMM kernels are used to compress hierarchical matrices entirely on the GPU. We present high-performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU, with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low-rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.

37 citations
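As a sanity check of the building block, a plain numpy sketch of the one-sided Jacobi SVD (the textbook pairwise rotations, not the paper's batched GPU kernels) could look like:

```python
import numpy as np

def one_sided_jacobi_svd(A, tol=1e-12, max_sweeps=30):
    """One-sided Jacobi SVD: rotate pairs of columns until all are
    mutually orthogonal; column norms become the singular values."""
    U = A.astype(float).copy()
    n = U.shape[1]
    V = np.eye(n)
    for _ in range(max_sweeps):
        converged = True
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = U[:, p] @ U[:, p]
                beta = U[:, q] @ U[:, q]
                gamma = U[:, p] @ U[:, q]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue                     # pair already orthogonal
                converged = False
                zeta = (beta - alpha) / (2.0 * gamma)
                sign = 1.0 if zeta >= 0 else -1.0
                t = sign / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                for M in (U, V):                 # apply the same rotation to U and V
                    Mp, Mq = M[:, p].copy(), M[:, q].copy()
                    M[:, p] = c * Mp - s * Mq
                    M[:, q] = s * Mp + c * Mq
        if converged:
            break
    sigma = np.linalg.norm(U, axis=0)
    return U / sigma, sigma, V
```

Because each sweep touches independent column pairs with the same small kernel, the method batches naturally, which is the property the paper exploits on the GPU.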


Journal ArticleDOI
TL;DR: A novel data-driven predictive control method based on the subspace identification of Hammerstein–Wiener systems is presented, and the effectiveness and feasibility of the proposed controller is verified by numerical simulation on a fermentation bioreactor system.

36 citations


Journal ArticleDOI
Xiaopeng Gong1, Shengfeng Gu1, Yidong Lou1, Fu Zheng1, Maorong Ge, Jingnan Liu1 
TL;DR: The blocked QR factorization of the simulated matrix improves processing efficiency by nearly two orders of magnitude on a personal computer with four 3.30 GHz cores.
Abstract: Global navigation satellite systems (GNSS) are an indispensable tool for geodetic research and global monitoring of the Earth, and they have developed rapidly over the past few years with abundant GNSS networks, modern constellations, and significant improvements in the mathematical models of data processing. However, due to the increasing number of satellites and stations, computational efficiency becomes a key issue that could hamper the further development of GNSS applications. In this contribution, this problem is addressed from the aspects of both dense linear algebra algorithms and GNSS processing strategy. First, in order to fully explore the power of modern microprocessors, a square root information filter solution based on blocked QR factorization, employing as many matrix-matrix operations as possible, is introduced. In addition, the algorithmic complexity of GNSS data processing is further decreased by centralizing the carrier-phase observations and ambiguity parameters, as well as performing real-time ambiguity resolution and elimination. Based on the QR factorization of a simulated matrix, we conclude that, compared to unblocked QR factorization, blocked QR factorization improves processing efficiency by nearly two orders of magnitude on a personal computer with four 3.30 GHz cores. Then, with 82 globally distributed stations, the processing efficiency is further validated in multi-GNSS (GPS/BDS/Galileo) satellite clock estimation. The results show that the unblocked method takes about 31.38 s per epoch, whereas, without any loss of accuracy, our new algorithm takes only 0.50 s and 0.31 s per epoch for the float and fixed clock solutions, respectively.

35 citations
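The source of the blocked speedup is that the trailing-matrix update becomes a single matrix-matrix product. A minimal numpy sketch of a blocked QR (illustrative only; the paper's square-root information filter is far more involved):

```python
import numpy as np

def blocked_qr(A, block=32):
    """Blocked QR sketch: factor one panel at a time, then update the whole
    trailing submatrix with a single matrix-matrix product (the level-3
    BLAS step that delivers the speedup over unblocked QR)."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.eye(m)
    for k in range(0, n, block):
        kb = min(block, n - k)
        # factor the current panel (rows k..m of columns k..k+kb)
        Qp, _ = np.linalg.qr(A[k:, k:k + kb], mode='complete')
        A[k:, k:] = Qp.T @ A[k:, k:]   # one GEMM updates panel + trailing block
        Q[:, k:] = Q[:, k:] @ Qp       # accumulate the orthogonal factor
    return Q, np.triu(A)
```

The per-panel work is identical to unblocked QR; only the update granularity changes, which is why accuracy is unaffected while cache and core utilization improve.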


Journal ArticleDOI
TL;DR: This work proposes a new approach to find the interpolation points based on the centroidal Voronoi tessellation (CVT) method, which offers a much less expensive alternative to the QRCP procedure when ISDF is used in the context of hybrid functional electronic structure calculations.
Abstract: The recently developed interpolative separable density fitting (ISDF) decomposition is a powerful way of compressing the redundant information in the set of orbital pairs and has been used to accelerate quantum chemistry calculations in a number of contexts. The key ingredient of the ISDF decomposition is to select a set of nonuniform grid points, so that the values of the orbital pairs evaluated at such grid points can be used to accurately interpolate those evaluated at all grid points. The set of nonuniform grid points, called the interpolation points, can be automatically selected by a QR factorization with column pivoting (QRCP) procedure. This is the computationally most expensive step in the construction of the ISDF decomposition. In this work, we propose a new approach to find the interpolation points based on the centroidal Voronoi tessellation (CVT) method, which offers a much less expensive alternative to the QRCP procedure when ISDF is used in the context of hybrid functional electronic structure calculations.

31 citations
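The QRCP selection step maps directly onto SciPy's pivoted QR. A hedged sketch of generic column subset selection (the column indices here stand in for ISDF interpolation points, not the actual electronic-structure data):

```python
import numpy as np
from scipy.linalg import qr

def qrcp_select(M, k):
    """Choose k representative columns of M via QR with column pivoting:
    the first k pivots greedily pick the most linearly independent columns."""
    _, _, piv = qr(M, mode='economic', pivoting=True)
    return piv[:k]
```

In ISDF terms, columns would index grid points and M would hold orbital-pair values, so the selected pivots are the interpolation points the CVT method replaces.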


Journal ArticleDOI
TL;DR: Numerical simulation results confirm the validity and effectiveness of the proposed quick response (QR) code-based nonlinear optical image encryption technique using spiral phase transform (SPT), equal modulus decomposition (EMD) and singular value decomposition (SVD).
Abstract: In this study, we propose a quick response (QR) code-based nonlinear optical image encryption technique using spiral phase transform (SPT), equal modulus decomposition (EMD) and singular value decomposition (SVD). First, the primary image is converted into a QR code and then multiplied with a spiral phase mask (SPM). Next, the product is spiral phase transformed with a particular spiral phase function, and further, the EMD is performed on the output of SPT, which results in two complex images, Z1 and Z2. Among these, Z1 is further Fresnel propagated with distance d, and Z2 is reserved as a decryption key. Afterwards, SVD is performed on the Fresnel propagated output to get three decomposed matrices, i.e., one diagonal matrix and two unitary matrices. The two unitary matrices are modulated with two different SPMs and then the inverse SVD is performed using the diagonal matrix and modulated unitary matrices to get the final encrypted image. Numerical simulation results confirm the validity and effectiveness of the proposed technique. The proposed technique is robust against noise attack, specific attack, and brute-force attack. Simulation results are presented in support of the proposed idea.

29 citations


Journal ArticleDOI
TL;DR: This paper presents a direct and thorough comparison of many k-fold cross-validation versions as well as leave-one-out cross-validation versions, and demonstrates theoretically and experimentally that while the type of matrix decomposition plays one important role, another equally important role is played by the version of cross-validation.

28 citations


Journal ArticleDOI
Guiqiang Peng1, Leibo Liu1, Sheng Zhou1, Yang Xue1, Shouyi Yin1, Shaojun Wei1 
TL;DR: A fully pipelined very-large-scale-integration architecture with an optimal tradeoff among throughput, area consumption, and power consumption is proposed, demonstrating that the proposed system is much more efficient than state-of-the-art designs.
Abstract: As a branch of sphere decoding, the K-best method has played an important role in detection in large-scale multiple-input-multiple-output (MIMO) systems. However, as the numbers of users and antennas grow, the preprocessing complexity increases significantly, which is one of the major issues with the K-best method. To address this problem, this paper proposes a preprocessing algorithm combining Cholesky sorted QR decomposition and partial iterative lattice reduction (CHOSLAR) for K-best detection in a 64-quadrature amplitude modulation (QAM) 16×16 MIMO system. First, Cholesky decomposition is conducted to perform sorted QR decomposition. Compared with conventional sorted QR decomposition, this method reduces the number of multiplications by 25.1% and increases parallelism. Then, a constant-throughput partial iterative lattice reduction method is adopted to achieve near-optimal detection accuracy. This method further increases parallelism, reduces the number of matrix swaps by 45.5%, and reduces the number of multiplications by 67.3%. Finally, a sorting-reduced K-best strategy is used for vector estimation, thereby reducing the number of comparators by 84.7%. This method suffers an accuracy loss of only approximately 1.44 dB compared with maximum likelihood detection. Based on CHOSLAR, this paper proposes a fully pipelined very-large-scale-integration architecture. A series of different systolic arrays and parallel processing units achieves an optimal tradeoff among throughput, area consumption, and power consumption. This architectural layout is obtained via TSMC 65-nm 1P9M CMOS technology, and throughput metrics of 1.40 Gbps/W (throughput/power) and 0.62 Mbps/kG (throughput/area) are achieved, demonstrating that the proposed system is much more efficient than state-of-the-art designs.
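The sorting ingredient alone (without the paper's Cholesky reformulation or lattice reduction, both of which this sketch omits) can be written as a modified Gram-Schmidt loop that always orthogonalizes the smallest remaining column first:

```python
import numpy as np

def sorted_qr(H):
    """Sorted QR decomposition (SQRD) sketch: at each step, swap in the
    remaining column with the smallest residual norm, a common heuristic in
    the sorted-QR detection literature for curbing error propagation."""
    Q = H.astype(complex).copy()
    m, n = Q.shape
    R = np.zeros((n, n), dtype=complex)
    perm = np.arange(n)
    for k in range(n):
        # pick the remaining column with minimum norm and swap it into place
        j = k + int(np.argmin(np.linalg.norm(Q[:, k:], axis=0)))
        Q[:, [k, j]] = Q[:, [j, k]]
        R[:, [k, j]] = R[:, [j, k]]
        perm[[k, j]] = perm[[j, k]]
        R[k, k] = np.linalg.norm(Q[:, k])
        Q[:, k] /= R[k, k]
        # modified Gram-Schmidt update of the trailing columns
        R[k, k + 1:] = Q[:, k].conj() @ Q[:, k + 1:]
        Q[:, k + 1:] -= np.outer(Q[:, k], R[k, k + 1:])
    return Q, R, perm
```

The permutation `perm` records the detection order; the factorization satisfies H[:, perm] = QR.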

Journal ArticleDOI
TL;DR: FPGA implementation of the proposed direction-of-arrival estimation algorithms employing LU factorization establishes their superiority over existing methods reported in the literature, such as QR-decomposition-based implementations.
Abstract: In this paper, authors present their work on field-programmable gate array (FPGA) hardware implementation of proposed direction of arrival estimation algorithms employing LU factorization. Both L and U matrices were considered in computing the angle estimates. Hardware implementation was done on a Virtex-5 FPGA and its experimental verification was performed using the National Instruments PXI platform, which provides hardware modules for data acquisition, RF down-conversion, digitization, etc. A uniform linear array consisting of four antenna elements was deployed at the receiver. LabVIEW FPGA modules with high throughput math functions were used for implementing the proposed algorithms. MATLAB simulations of the proposed algorithms were also performed to validate their efficacy prior to hardware implementation. Both MATLAB simulation and experimental verification establish the superiority of the proposed methods over existing methods reported in the literature, such as QR decomposition-based implementations. FPGA compilation results report low resource usage and faster computation time compared with the QR-based hardware implementation. Performance comparison in terms of estimation accuracy, percentage resource utilization, and processing time is also presented for different data and matrix sizes.

Journal ArticleDOI
TL;DR: Experiments show that LU_CRTP provides a good low-rank approximation of the input matrix and is less expensive than the rank-revealing QR factorization in terms of computational and memory usage costs, while also minimizing the communication cost.
Abstract: In this paper we present an algorithm for computing a low rank approximation of a sparse matrix based on a truncated LU factorization with column and row permutations. We present various approaches for determining the column and row permutations that show a trade-off between speed versus deterministic/probabilistic accuracy. We show that if the permutations are chosen by using tournament pivoting based on QR factorization, then the obtained truncated LU factorization with column/row tournament pivoting, LU_CRTP, satisfies bounds on the singular values which have similarities with the ones obtained by a communication avoiding rank revealing QR factorization. Experiments on challenging matrices show that LU_CRTP provides a good low rank approximation of the input matrix and it is less expensive than the rank revealing QR factorization in terms of computational and memory usage costs, while also minimizing the communication cost. We also compare the computational complexity of our algorithm with randomized algorithms.
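As a toy stand-in for LU_CRTP, a truncated LU with ordinary partial pivoting already shows the low-rank idea; the paper's tournament pivoting on columns and rows is not reproduced here:

```python
import numpy as np
from scipy.linalg import lu

def truncated_lu(A, k):
    """Rank-k approximation from the first k columns of L and first k rows
    of U of a partially pivoted LU factorization (A = P @ L @ U). Plain
    partial pivoting is a simplification of the paper's tournament pivoting."""
    P, L, U = lu(A)
    return P @ L[:, :k] @ U[:k, :]
```

For a matrix of exact rank k, the Schur complement after k elimination steps vanishes, so the truncation is exact; for general matrices the pivoting strategy determines how close the truncation comes to the best rank-k approximation, which is where tournament pivoting earns its keep.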

Posted Content
TL;DR: In this paper, the authors extend the applicability of CholeskyQR2 by introducing a shift to the computed Gram matrix so as to guarantee that the Cholesky factorization succeeds numerically, yielding the QR factorization of a tall-skinny matrix.
Abstract: The Cholesky QR algorithm is an efficient communication-minimizing algorithm for computing the QR factorization of a tall-skinny matrix. Unfortunately, it suffers from inherent numerical instability and breaks down when the matrix is ill-conditioned. A recent work establishes that the instability can be cured by repeating the algorithm twice (called CholeskyQR2). However, the applicability of CholeskyQR2 is still limited by the requirement that the Cholesky factorization of the Gram matrix runs to completion, which means it does not always work for matrices $X$ with $\kappa_2(X)\gtrsim {\bf u}^{-\frac{1}{2}}$ where ${\bf u}$ is the unit roundoff. In this work we extend the applicability to $\kappa_2(X)=\mathcal{O}({\bf u}^{-1})$ by introducing a shift to the computed Gram matrix so as to guarantee the Cholesky factorization $R^TR= A^TA+sI$ succeeds numerically. We show that the computed $AR^{-1}$ has reduced condition number $\leq {\bf u}^{-\frac{1}{2}}$, for which CholeskyQR2 safely computes the QR factorization, yielding a computed $Q$ of orthogonality $\|Q^TQ-I\|_2$ and residual $\|A-QR\|_F/\|A\|_F$ both $\mathcal{O}({\bf u})$. Thus we obtain the required QR factorization by essentially running Cholesky QR thrice. We extensively analyze the resulting algorithm shiftedCholeskyQR to reveal its excellent numerical stability. shiftedCholeskyQR is also highly parallelizable, and is also applicable and effective when working in an oblique inner product space. We illustrate our findings through experiments, in which we achieve significant (up to 40×) speedup over alternative methods.
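The full scheme fits in a few lines of numpy. The shift constant below only loosely follows the paper's recipe and should be read as an illustrative assumption:

```python
import numpy as np

def cholesky_qr(A):
    """One CholeskyQR step: R from the Cholesky factor of the Gram matrix."""
    R = np.linalg.cholesky(A.T @ A).T          # upper triangular, A^T A = R^T R
    Q = np.linalg.solve(R.T, A.T).T            # Q = A R^{-1} via triangular solve
    return Q, R

def shifted_cholesky_qr3(A):
    """Shifted CholeskyQR followed by CholeskyQR2 (three passes in total).
    The shift s is a rough rendering of the paper's c * u * ||A||^2 choice."""
    m, n = A.shape
    u = np.finfo(A.dtype).eps / 2              # unit roundoff
    s = 11.0 * (m * n + n * (n + 1)) * u * np.linalg.norm(A, 2) ** 2
    Rs = np.linalg.cholesky(A.T @ A + s * np.eye(n)).T   # shifted Gram matrix
    Q1 = np.linalg.solve(Rs.T, A.T).T          # well-enough conditioned now
    Q2, R2 = cholesky_qr(Q1)                   # CholeskyQR2 restores
    Q3, R3 = cholesky_qr(Q2)                   # orthogonality to O(u)
    return Q3, R3 @ R2 @ Rs
```

Note that A = Q1 · Rs holds by construction regardless of the shift; the shift only improves the conditioning of the first Cholesky step, and the two follow-up passes clean up the orthogonality.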

Journal ArticleDOI
TL;DR: A reversed indoor multitarget localization system employing compressive sensing (CS) theory is proposed for the first time in terms of visible light positioning, and a subgrid localization algorithm is proposed to overcome a common impractical assumption of on-grid locations, tackle the false-peak problem in sparse matrix reconstruction, and ultimately improve the localization precision.
Abstract: A reversed indoor multitarget localization system employing compressive sensing (CS) theory is proposed for the first time in terms of visible light positioning (VLP). Unlike conventional VLP systems, where targets process the received light signals to localize themselves, our system works reversely by using multiple photodiodes (PDs) mounted on the ceiling to localize mobile targets that carry light emitting diodes. By utilizing its nature of sparsity, the problem of multitarget localization is formulated as a problem of sparse matrix reconstruction, and a 3-step workflow is developed to solve the problem. In this workflow, first, a sensing matrix is redesigned by using QR decomposition to enable CS theory. Next, the conventional l1-minimization (l1M) algorithm, which is highly vulnerable to noise in solving a localization problem, is theoretically analyzed and subsequently improved by adopting a reweighted l1M approach. Finally, a subgrid localization algorithm is proposed to overcome a common impractical assumption of on-grid locations, tackle the false-peak problem in sparse matrix reconstruction, and ultimately improve the localization precision. The feasibility of our system and supporting algorithms is verified through extensive simulations. Our system demonstrates a good positioning accuracy of 7.4 cm by using 25 PDs when SNR = 20 dB. We also investigate the impact of various factors on the positioning performance, and the obtained results provide an insightful reference paving the way to a practical system design.

Journal ArticleDOI
TL;DR: In this article, a complex J-symmetric Schur decomposition, computed with complex unitary matrices, is obtained for complex J-symmetric matrices H = [A, C; D, −A^T], where C = C^T and D = D^T ∈ C^{n×n}.

Journal ArticleDOI
TL;DR: A new QR-decomposition-based rotation-invariant search strategy is added to the ACS algorithm; the resulting algorithm, called A+, has only one control parameter, α, and experimental results show that its performance does not strongly depend on the initial value of α.
Abstract: The recently proposed artificial cooperative search (ACS) algorithm is a population-based iterative evolutionary algorithm (EA) for solving real-valued numerical optimization problems. It uses a rotation-invariant line recombination-based mutation strategy and rule-based crossover operator. However, it performs poorly for problems that include closely-related variables because, in these cases, generating uncorrelated feasible trial solution vectors using stochastic crossover methods is extremely difficult, and its mutation and crossover operators are also less effective. This paper adds a new QR-decomposition-based rotation-invariant search strategy to the ACS algorithm to improve its ability to solve such problems. This new, advanced ACS algorithm, called A+, has only one control parameter, α, and experimental results have shown that its performance does not strongly depend on the initial value of α. This paper also examines A+’s performance for noisy point cloud filtering, which is a complex real-world problem. The results of numerical experiments demonstrate that A+’s performance when solving numerical and real-world problems with closely-related variables is better than those of the comparison algorithms.
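The paper describes its operator only at a high level; one generic way to make a crossover rotation-invariant is to perform it in a random orthonormal frame obtained by QR factorization. The sketch below illustrates that generic idea, not the exact A+ operator:

```python
import numpy as np

def rotation_invariant_crossover(parent, donor, cr=0.5, rng=None):
    """Binomial crossover performed in a random orthonormal frame (Q from the
    QR of a Gaussian matrix), so the operator does not depend on the
    coordinate axes -- helpful when variables are closely correlated."""
    rng = np.random.default_rng() if rng is None else rng
    d = parent.size
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    p, v = Q.T @ parent, Q.T @ donor        # rotate both vectors into the Q frame
    mask = rng.random(d) < cr               # per-coordinate crossover in that frame
    child = np.where(mask, v, p)
    return Q @ child                        # rotate the offspring back
```

Limiting cases make the behavior easy to check: cr=0 returns the parent and cr=1 returns the donor, independently of the random frame.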

Journal ArticleDOI
TL;DR: In this article, a family of low-complexity detection schemes based on channel matrix puncturing targeted for large multiple-input multiple-output (MIMO) systems is proposed, which can be accelerated by employing standard schemes, such as chase detection, list detection, nulling-and-cancellation detection, and sub-space detection on the transformed matrix.
Abstract: A family of low-complexity detection schemes based on channel matrix puncturing targeted for large multiple-input multiple-output (MIMO) systems is proposed. It is well known that the computational cost of MIMO detection based on QR decomposition is directly proportional to the number of non-zero entries involved in back-substitution and slicing operations in the triangularized channel matrix, which can be too high for low-latency applications involving large MIMO dimensions. By systematically puncturing the channel to have a specific structure, it is demonstrated that the detection process can be accelerated by employing standard schemes, such as chase detection, list detection, nulling-and-cancellation detection, and sub-space detection on the transformed matrix. The performance of these schemes is characterized and analyzed mathematically, and bounds on the achievable diversity gain and probability of bit error are derived. Surprisingly, it is shown that puncturing does not negatively impact the receive diversity gain in hard-output detectors. The analysis is extended to soft-output detection when computing per-layer bit log-likelihood ratios; it is shown that significant performance gains are attainable by ordering the layer of interest to be at the root when puncturing the channel. Simulations of coded and uncoded scenarios certify that the proposed schemes scale up efficiently both in the number of antennas and constellation size, as well as in the presence of correlated channels. In particular, soft-output per-layer sub-space detection is shown to achieve a 2.5 dB signal-to-noise ratio gain at a 10^-4 bit error rate in 256-quadrature amplitude modulation (QAM) 16×16 MIMO, while saving 77% of nulling-and-cancellation computations.
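The back-substitution-and-slicing cost that puncturing attacks is easiest to see in the unpunctured case. A generic nulling-and-cancellation (SIC) sketch, not the paper's punctured detectors:

```python
import numpy as np

def sic_detect(H, y, constellation):
    """Nulling-and-cancellation detection via QR: back-substitute through the
    triangular factor R, slicing each layer to the nearest constellation
    point before cancelling it from the layers above."""
    Q, R = np.linalg.qr(H)
    z = Q.conj().T @ y                       # rotate the receive vector
    n = H.shape[1]
    s = np.zeros(n, dtype=complex)
    for i in range(n - 1, -1, -1):
        est = (z[i] - R[i, i + 1:] @ s[i + 1:]) / R[i, i]   # cancel detected layers
        s[i] = constellation[np.argmin(np.abs(constellation - est))]  # slice
    return s
```

Every nonzero entry of R above the diagonal costs one multiply-accumulate per detected vector, which is exactly the count that channel puncturing reduces.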

Journal ArticleDOI
TL;DR: An efficient compressed sensing method is proposed for measuring delay and Doppler spreading in underwater acoustic channels (UACs), inserting a projection matrix that adopts QR decomposition for efficient computation.
Abstract: The measurements of delay and Doppler (DD) spreading in underwater acoustic channels (UACs) have multiple applications, including communications as well as the development of a dynamic UAC simulator. However, these measurements suffer from the difficulties of fast time variations and large data sets. This paper presents an efficient compressed sensing (CS) method to solve these problems. First, the DD spreading in UACs is studied using a doubly spread model; second, the least-squares (LS) criterion is implemented and its limitations are analyzed. Subsequently, the matching pursuit (MP) method is applied to the problem by exploiting the sparsity of the DD model-based UACs. Although the MP method improves on the LS method, it has unavoidable deficiencies, e.g., redundant selections of bases that lead to a limited measurement of DD spreading. Thus, this paper proposes an improved version that inserts a projection matrix. The projected MP (PMP) method adopts QR decomposition for efficient computation. Finally, at-sea data-based comparisons among the three methods are conducted to verify the superiority of the PMP method.
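The projection idea, re-fitting all selected atoms by least squares each iteration with the solve done through QR, is the heart of orthogonal matching pursuit. A generic sketch (not the paper's PMP with its redesigned projection matrix):

```python
import numpy as np

def omp_qr(Phi, y, k):
    """Orthogonal matching pursuit: greedily pick the atom most correlated
    with the residual, then re-fit ALL selected atoms by least squares.
    The per-iteration solve goes through a thin QR factorization."""
    residual = y.astype(float).copy()
    support = []
    for _ in range(k):
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        Q, R = np.linalg.qr(Phi[:, support])       # thin QR of selected atoms
        coef = np.linalg.solve(R, Q.T @ y)         # least-squares coefficients
        residual = y - Phi[:, support] @ coef      # project y off the selection
    x = np.zeros(Phi.shape[1])
    x[support] = coef
    return x
```

Because the residual is re-projected against the whole selected set each step, previously chosen bases are never selected redundantly, which is the deficiency of plain MP that the projection fixes.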

Journal ArticleDOI
TL;DR: Sharp upper bounds for the derived normwise, mixed, and componentwise condition numbers are obtained; they can be estimated efficiently by means of the classical Hager–Higham algorithm for estimating the matrix one-norm.

Proceedings ArticleDOI
01 Oct 2018
TL;DR: A new structure (K,S,p,v,q) of the SR-K-best algorithm and a new discrete optimization method are proposed to increase performance; the proposed sorting demonstrates a large performance gain compared with a common power-based approach.
Abstract: In this paper, we propose a discrete sorting optimization approach for the uplink channel of a massive multiple-input multiple-output (MIMO) system. The algorithm includes user (UE) sorting before QR decomposition (QRD) and a sorting-reduced (SR) K-best detector for 48×64 MIMO uncoded systems. Simulation results show that the detector loss is about 1 dB relative to maximum-likelihood (ML) detection at low detector complexity. The UE sorting is required to sort the diagonal elements of the R matrix in ascending order to avoid error propagation in the multi-user (MU) scenario. A fast, low-complexity method of online discrete optimization is used to find the minimum of the loss function. Sorting tracking is proposed, so that the pre-sorted interpolated R matrix is used for further sorting optimization, resulting in low sorting complexity. The proposed sorting demonstrates a large performance gain compared with a common power-based one. Simulation results in the 5G QuaDRiGa channel are presented. The SR-K-best detector is a variant of the K-best detector. The SR-K-best detector with (K,S,p) parameters suffers significant losses in scenarios with highly correlated users; therefore, we propose a new structure (K,S,p,v,q) of the SR-K-best algorithm and a new discrete optimization method to increase performance. Discrete stochastic optimization was done offline in the QuaDRiGa channel to find optimal (K,S,p,v,q) parameters for a fixed detector structure.

Journal ArticleDOI
TL;DR: Simulation results show that the proposed algorithms offer improved performance over the conventional PAST algorithm and a comparable performance to the Kalman filter with variable measurement subspace tracking algorithm, which requires a considerably higher arithmetic complexity.
Abstract: This paper proposes a new local polynomial modeling based variable forgetting factor (VFF) and variable regularized (VR) projection approximation subspace tracking (PAST) algorithm, which is based on a novel VR-VFF recursive least squares (RLS) algorithm with multiple outputs. The subspace to be estimated is modeled as a local polynomial model so that a new locally optimal forgetting factor (LOFF) can be obtained by minimizing the resulting mean square deviation of the RLS algorithm after using the projection approximation. An l2-regularization term is also incorporated into the LOFF-PAST algorithm to reduce the estimation variance of the subspace during signal fading. The proposed LOFF-VR-PAST algorithm can be implemented by the conventional RLS algorithm as well as the numerically more stable QR decomposition. Applications of the proposed algorithms to subspace-based direction-of-arrival estimation under stationary and nonstationary environments are presented to validate their effectiveness. Simulation results show that the proposed algorithms offer improved performance over the conventional PAST algorithm and a comparable performance to the Kalman filter with variable measurement subspace tracking algorithm, which requires a considerably higher arithmetic complexity. The new LOFF-VR-RLS algorithm may also be applicable to other RLS problems involving multiple outputs.

Proceedings ArticleDOI
Martin Langhammer1, Bogdan Pasca1
15 Feb 2018
TL;DR: This work presents the mapping to parallel structures, with inter-vector connectivity, of a new QRD algorithm based on the Modified Gram-Schmidt (MGS) algorithm; it has a theoretical sustained-to-peak performance close to 100% for large matrices, roughly three times the functional density of the best previously known implementations.
Abstract: QR decomposition (QRD) is of increasing importance for many current applications, such as wireless and radar. Data dependencies in known algorithms and approaches, combined with the data access patterns used in many of these methods, restrict the achievable performance in software-programmable targets. Some FPGA architectures now incorporate hard floating-point (HFP) resources and, in combination with distributed memories, as well as the flexibility of internal connectivity, can support high-performance matrix arithmetic. In this work, we present the mapping to parallel structures with inter-vector connectivity of a new QRD algorithm. Based on a Modified Gram-Schmidt (MGS) algorithm, this new algorithm has a different loop organization, but the dependent functional sequences are unchanged, so error analysis and numerical stability are unaffected. This work has a theoretical sustained-to-peak performance close to 100% for large matrices, which is roughly three times the functional density of the previously best known implementations. Mapped to an Intel Arria 10 device, we achieve 80 µs for a 256×256 single-precision real matrix, equivalent to 417 GFLOPs. This corresponds to a 95% sustained-to-peak ratio for the portion of the device used for this work.
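For reference, the MGS recurrence whose loop order the paper rearranges (the arithmetic itself, and hence the error analysis, stays the same) is:

```python
import numpy as np

def mgs_qr(A):
    """Modified Gram-Schmidt QR: normalize one column at a time and
    immediately subtract its component from all remaining columns, which is
    numerically more stable than classical Gram-Schmidt."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for k in range(n):
        R[k, k] = np.linalg.norm(A[:, k])
        Q[:, k] = A[:, k] / R[k, k]
        for j in range(k + 1, n):
            R[k, j] = Q[:, k] @ A[:, j]
            A[:, j] -= R[k, j] * Q[:, k]
    return Q, R
```

The inner j-loop is the dependent functional sequence the paper keeps intact; its hardware mapping only changes which of these updates run concurrently.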

Journal ArticleDOI
TL;DR: In this paper, a convected Cartesian coordinate system is defined and strain rates for any state of deformation are defined and quantified in terms of these metrics and their rates.

Journal ArticleDOI
TL;DR: It is shown that state-of-the-art non-blind watermarking algorithms are vulnerable to TMO attacks, and an improved non-blind watermarking method is proposed and extensively evaluated against state-of-the-art non-blind watermarking schemes using a broad set of TMO attacks.
Abstract: High dynamic range (HDR) imaging has become widespread in recent years through various technologies, including social network applications, which necessitates robust watermarking schemes for copyright protection and image authentication. As watermarked high dynamic range images need to be tone mapped for visualization on traditional low dynamic range displays, the associated tone mapping operators (TMOs) can be deemed inevitable attacks. In this paper, we show that state-of-the-art non-blind watermarking algorithms are vulnerable to TMO attacks. Based on the results of our investigation, we propose an improved non-blind watermarking method and extensively evaluate it against state-of-the-art non-blind watermarking schemes using a broad set of TMO attacks. The proposed method first divides a given host image into patches, each of which is then decomposed using the discrete wavelet transform (DWT). The high-high sub-band of the DWT is then passed through a chirp-z transformation, followed by QR decomposition. The proposed solution embeds the watermark into each value of the upper triangular matrix obtained from the QR decomposition. The efficiency of the proposed embedding scheme is evaluated by applying 14 different TMOs to the watermarked image and extracting the embedded watermark. The average of 100 normalized correlation values for each image is then taken as the criterion for comparison, which demonstrates the noticeably stronger performance of the proposed watermarking scheme with respect to the state-of-the-art non-blind watermarking alternatives.
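Stripping away the patch decomposition, DWT, and chirp-z stages, the QR-domain embedding step alone can be sketched as follows; the value of alpha, the additive rule, and keeping (Q, R) as the extraction key are illustrative assumptions, not the paper's exact scheme:

```python
import numpy as np

def embed_qr(host, wm, alpha=0.05):
    """Embed a watermark additively into the upper-triangular factor R of
    the host block's QR decomposition, then recompose the marked block."""
    Q, R = np.linalg.qr(host)
    marked = Q @ (R + alpha * np.triu(wm))
    return marked, (Q, R)

def extract_qr(marked, key, alpha=0.05):
    """Non-blind extraction: the key is the original (Q, R) pair."""
    Q, R = key
    return np.triu(Q.T @ marked - R) / alpha
```

Because Q has orthonormal columns, Q.T applied to the marked block recovers the perturbed R exactly, so the watermark comes back up to floating-point roundoff in the attack-free case.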

Journal ArticleDOI
TL;DR: In this article, the authors considered continuous-time distributed synchronization of columns of rotation matrices and designed dynamic control laws based on the introduction of auxiliary variables in combination with a QR-factorization approach.

Journal ArticleDOI
TL;DR: This work presents a novel GPU-based implementation of the SENSE algorithm that employs QR decomposition to invert the rectangular encoding matrix, significantly reducing the computation time of SENSE reconstruction.
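The core inversion idea, solving the overdetermined encoding system by QR rather than forming normal equations, can be sketched as follows. Here `E` is merely a random stand-in for the SENSE encoding matrix, and the paper's GPU parallelization is not reproduced:

```python
import numpy as np

def qr_solve(E, y):
    """Least-squares solve of the overdetermined system E x = y via QR.
    With E = Q R (Q column-orthonormal), the problem reduces to the
    triangular system R x = Q^H y."""
    q, r = np.linalg.qr(E)
    return np.linalg.solve(r, q.conj().T @ y)

rng = np.random.default_rng(1)
# Complex rectangular "encoding" matrix: 200 measurements, 50 unknowns.
E = rng.standard_normal((200, 50)) + 1j * rng.standard_normal((200, 50))
x_true = rng.standard_normal(50)
y = E @ x_true
x = qr_solve(E, y)
print(np.allclose(x, x_true))
```

Compared with inverting `E.conj().T @ E`, the QR route avoids squaring the condition number of the encoding matrix, which matters for ill-conditioned coil sensitivity configurations.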

Journal ArticleDOI
TL;DR: The proposed algorithm inherits the high efficiency of the original LDA/QR and adapts better to the data through its ability to make full use of the discriminative information in the data.
Abstract: Discriminative Clustering (DC) can effectively cluster high-dimensional data sets. It iterates between Linear Discriminant Analysis (LDA) dimensionality reduction and clustering. However, most existing algorithms for DC have high computational complexity and are not feasible for practical problems. To improve the efficiency of DC, we first present a variant of the QR-decomposition-based LDA (LDA/QR) algorithm obtained by a minor modification. The proposed algorithm inherits the high efficiency of the original LDA/QR and adapts better to the data through its ability to make full use of the discriminative information in the data. We also present an objective function for the proposed variant of LDA/QR, which the variant solves approximately. We then combine the proposed variant of LDA/QR and K-means (KM) into a single clustering algorithm, obtaining an efficient algorithm for DC: LDA/QR-guided KM (LDA/QR-KM). Finally, to help LDA/QR-KM escape local minima, we adopt anomalous-cluster-based intelligent KM (IKM) to initialize it. Extensive experiments on a collection of benchmark data sets demonstrate the effectiveness and efficiency of the proposed LDA/QR-KM algorithm.
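As a rough illustration of the LDA/QR idea (the classical scheme, not the paper's modified variant), the Q factor of the class-centroid matrix supplies an orthonormal basis for a discriminant subspace onto which the data can be projected before K-means. The data and labels below are synthetic assumptions:

```python
import numpy as np

def lda_qr_project(X, labels):
    """Project data onto the subspace spanned by class centroids,
    orthonormalized via QR: the cheap core of the LDA/QR approach."""
    classes = np.unique(labels)
    # d x k matrix whose columns are the class centroids
    C = np.stack([X[labels == c].mean(axis=0) for c in classes], axis=1)
    q, _ = np.linalg.qr(C)             # orthonormal basis of the centroid span
    return X @ q                       # n x k reduced representation

rng = np.random.default_rng(2)
# Three well-separated synthetic classes in 50 dimensions, 30 points each.
X = np.vstack([rng.normal(m, 0.3, size=(30, 50)) for m in (0.0, 2.0, 4.0)])
y = np.repeat([0, 1, 2], 30)
Z = lda_qr_project(X, y)
print(Z.shape)                         # reduced to (n_samples, n_classes)
```

The appeal is cost: one thin QR of a d x k centroid matrix replaces the eigendecompositions of classical LDA, which is what makes the iterative DC loop affordable.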

Journal ArticleDOI
10 Jul 2018
TL;DR: In this paper, a rank-revealing QR (RRQR) decomposition is proposed, and the approximation error it incurs is shown to be similar to the proper orthogonal decomposition (POD) error.
Abstract: While the proper orthogonal decomposition (POD) is optimal under certain norms, it is also expensive to compute. For large matrix sizes, the QR decomposition provides a tractable alternative. Under the assumption that it is a rank-revealing QR (RRQR), the approximation error incurred is similar to the POD error; furthermore, the authors show the existence of an RRQR with exactly the same error estimate as POD. To numerically realize an RRQR decomposition, they discuss the (iterative) modified Gram-Schmidt algorithm with pivoting and reduced basis methods, and show that these two seemingly different approaches are equivalent. They then describe an MPI/OpenMP parallel code that implements one of the QR-based model reduction algorithms analyzed, document the code's scalability for large problems such as gravitational waves, and demonstrate excellent scalability up to 32,768 cores for complex dense matrices as large as 10,000 x 3,276,800.
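A serial textbook sketch of modified Gram-Schmidt with column pivoting, one of the RRQR realizations the authors discuss (their MPI/OpenMP parallelization and reduced basis formulation are omitted, and the test matrix is a synthetic low-rank example):

```python
import numpy as np

def mgs_pivoted_qr(A, rank):
    """Rank-revealing QR via modified Gram-Schmidt with column pivoting:
    greedily orthogonalize against the column of largest residual norm."""
    A = A.astype(float).copy()
    m, n = A.shape
    Q = np.zeros((m, rank))
    pivots = []
    for k in range(rank):
        norms = np.linalg.norm(A, axis=0)
        j = int(np.argmax(norms))              # pivot: dominant residual column
        pivots.append(j)
        Q[:, k] = A[:, j] / norms[j]
        A -= np.outer(Q[:, k], Q[:, k] @ A)    # deflate every remaining column
    return Q, pivots

rng = np.random.default_rng(3)
B = rng.standard_normal((100, 5)) @ rng.standard_normal((5, 400))  # exact rank 5
Q, piv = mgs_pivoted_qr(B, rank=5)
err = np.linalg.norm(B - Q @ (Q.T @ B)) / np.linalg.norm(B)
print(err)   # near machine precision, since the requested rank matches the true rank
```

For a matrix of exact rank r, r pivoted MGS steps capture the full column space; for general matrices the residual after k steps is what the RRQR error bounds in the paper control.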

Journal ArticleDOI
TL;DR: A simulation study shows that the proposed procedure can exactly identify the true sparsest models, and real data examples demonstrate the usefulness of the variable clustering performed by SSFA.
Abstract: We propose a new procedure for sparse factor analysis (FA) in which each variable loads on only one common factor. The loading matrix thus has a single nonzero element in each row and zeros elsewhere; such a loading matrix is the sparsest possible for a given number of variables and common factors. For this reason, the proposed method is named sparsest FA (SSFA). It may also be called FA-based variable clustering, since the variables loading the same common factor can be grouped into a cluster. In SSFA, all model parts of FA (common factors, their correlations, loadings, unique factors, and unique variances) are treated as fixed unknown parameter matrices, and their least squares function is minimized through a specific data matrix decomposition. A useful feature of the algorithm is that the matrix of common factor scores is re-parameterized using QR decomposition in order to efficiently estimate factor correlations. A simulation study shows that the proposed procedure can exactly identify the true sparsest models, and real data examples demonstrate the usefulness of the variable clustering performed by SSFA.
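The QR re-parameterization of the factor score matrix can be illustrated as follows. Writing F = QR with Q column-orthonormal separates an orthonormal score basis from a small triangular factor that carries the correlation structure; this is only a generic sketch of the trick, and the exact normalization used by SSFA may differ:

```python
import numpy as np

def qr_reparam_scores(F):
    """Re-parameterize an n x k factor score matrix F = Q R so that the
    new scores S = sqrt(n) * Q satisfy S.T @ S = n * I, while R absorbs
    the scale and correlation information."""
    n = F.shape[0]
    q, r = np.linalg.qr(F)
    return np.sqrt(n) * q, r / np.sqrt(n)

rng = np.random.default_rng(4)
F = rng.standard_normal((200, 3))          # synthetic scores: 200 cases, 3 factors
S, R = qr_reparam_scores(F)
print(np.allclose(S.T @ S, 200 * np.eye(3)))   # → True: orthonormal constraint holds
print(np.allclose(S @ R, F))                   # → True: S R reproduces F exactly
```

Optimizing over an orthonormal S plus a small triangular R is cheaper and better conditioned than optimizing over an unconstrained correlated score matrix, which is why the re-parameterization speeds up the correlation estimates.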