
Showing papers in "SIAM Journal on Matrix Analysis and Applications in 2008"


Journal ArticleDOI
TL;DR: It is argued that the naive approach to this problem is doomed to failure because, unlike matrices, tensors of order 3 or higher can fail to have best rank-r approximations, and a natural way of overcoming the ill-posedness of the low-rank approximation problem is proposed by using weak solutions when true solutions do not exist.
Abstract: There has been continued interest in seeking a theorem describing optimal low-rank approximations to tensors of order 3 or higher that parallels the Eckart-Young theorem for matrices. In this paper, we argue that the naive approach to this problem is doomed to failure because, unlike matrices, tensors of order 3 or higher can fail to have best rank-$r$ approximations. The phenomenon is much more widespread than one might suspect: examples of this failure can be constructed over a wide range of dimensions, orders, and ranks, regardless of the choice of norm (or even Bregman divergence). Moreover, we show that in many instances these counterexamples have positive volume: they cannot be regarded as isolated phenomena. In one extreme case, we exhibit a tensor space in which no rank-3 tensor has an optimal rank-2 approximation. The notable exceptions to this misbehavior are rank-1 tensors and order-2 tensors (i.e., matrices). In a more positive spirit, we propose a natural way of overcoming the ill-posedness of the low-rank approximation problem, by using weak solutions when true solutions do not exist. For this to work, it is necessary to characterize the set of weak solutions, and we do this in the case of rank 2, order 3 (in arbitrary dimensions). In our work we emphasize the importance of closely studying concrete low-dimensional examples as a first step toward more general results. To this end, we present a detailed analysis of equivalence classes of $2 \times 2 \times 2$ tensors, and we develop methods for extending results upward to higher orders and dimensions. Finally, we link our work to existing studies of tensors from an algebraic geometric point of view. The rank of a tensor can in theory be given a semialgebraic description; in other words, it can be determined by a system of polynomial inequalities. We study some of these polynomials in cases of interest to us; in particular, we make extensive use of the hyperdeterminant $\Delta$ on $\mathbb{R}^{2\times 2 \times 2}$.

874 citations
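To make the nonexistence phenomenon concrete, the following is a small numerical illustration (a standard construction of this kind, not code from the paper) of a rank-3 tensor that can be approximated arbitrarily well by rank-2 tensors, so it has no best rank-2 approximation; the vectors a, b and the step sizes are arbitrary choices.

```python
import numpy as np

def outer3(x, y, z):
    # order-3 outer product x (x) y (x) z
    return np.einsum('i,j,k->ijk', x, y, z)

a = np.array([1.0, 0.0])
b = np.array([0.0, 1.0])

# T has rank 3, but it is a limit of rank-2 tensors.
T = outer3(a, a, b) + outer3(a, b, a) + outer3(b, a, a)

for n in [1, 10, 100, 1000]:
    # S_n is a sum of two rank-1 terms, hence has rank at most 2
    S = n * outer3(a + b / n, a + b / n, a + b / n) - n * outer3(a, a, a)
    print(n, np.linalg.norm(S - T))   # the approximation error decays like 1/n
```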


Journal ArticleDOI
TL;DR: This paper introduces an algorithm for NMF based on alternating nonnegativity constrained least squares (NMF/ANLS) and the active set-based fast algorithm for nonnegativity constrained least squares with multiple right-hand side vectors, and discusses its convergence properties and a rigorous convergence criterion based on the Karush-Kuhn-Tucker (KKT) conditions.
Abstract: Nonnegative matrix factorization (NMF) determines a lower rank approximation of a matrix $A \in \mathbb{R}^{m \times n} \approx WH$ where an integer $k \ll \min(m,n)$ is given and nonnegativity is imposed on all components of the factors $W \in \mathbb{R}^{m \times k}$ and $H \in \mathbb{R}^{k \times n}$. NMF has attracted much attention for over a decade and has been successfully applied to numerous data analysis problems. In applications where the components of the data are necessarily nonnegative, such as chemical concentrations in experimental results or pixels in digital images, NMF provides a more relevant interpretation of the results since it gives nonsubtractive combinations of nonnegative basis vectors. In this paper, we introduce an algorithm for NMF based on alternating nonnegativity constrained least squares (NMF/ANLS) and the active set-based fast algorithm for nonnegativity constrained least squares with multiple right-hand side vectors, and we discuss its convergence properties and a rigorous convergence criterion based on the Karush-Kuhn-Tucker (KKT) conditions. In addition, we also describe algorithms for sparse NMFs and regularized NMF. We show how we impose a sparsity constraint on one of the factors by $L_1$-norm minimization and discuss its convergence properties. Our algorithms are compared to other commonly used NMF algorithms in the literature on several test data sets in terms of their convergence behavior.

612 citations
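A minimal sketch of the alternating nonnegativity-constrained least squares scheme described above, assuming scipy.optimize.nnls (one right-hand side at a time) as a stand-in for the paper's fast active-set solver with multiple right-hand sides; initialization and stopping are simplified.

```python
import numpy as np
from scipy.optimize import nnls

def nmf_anls(A, k, iters=100, seed=0):
    m, n = A.shape
    rng = np.random.default_rng(seed)
    W = rng.random((m, k))
    H = np.zeros((k, n))
    for _ in range(iters):
        # H-step: min_{H >= 0} ||A - W H||_F, solved column by column
        for j in range(n):
            H[:, j], _ = nnls(W, A[:, j])
        # W-step: min_{W >= 0} ||A^T - H^T W^T||_F, solved row by row
        for i in range(m):
            W[i, :], _ = nnls(H.T, A[i, :])
    return W, H

# Usage: W, H = nmf_anls(np.abs(np.random.rand(50, 30)), k=5)
```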


Journal ArticleDOI
TL;DR: A new unifying framework for the optimal $\mathcal{H}_2$ approximation problem is developed using best approximation properties in the underlying Hilbert space and leads to a new set of local optimality conditions taking the form of a structured orthogonality condition.
Abstract: The optimal $\mathcal{H}_2$ model reduction problem is of great importance in the area of dynamical systems and simulation. In the literature, two independent frameworks have evolved focusing either on solution of Lyapunov equations on the one hand or interpolation of transfer functions on the other, without any apparent connection between the two approaches. In this paper, we develop a new unifying framework for the optimal $\mathcal{H}_2$ approximation problem using best approximation properties in the underlying Hilbert space. This new framework leads to a new set of local optimality conditions taking the form of a structured orthogonality condition. We show that the existing Lyapunov- and interpolation-based conditions are each equivalent to our conditions and so are equivalent to each other. Also, we provide a new elementary proof of the interpolation-based condition that clarifies the importance of the mirror images of the reduced system poles. Based on the interpolation framework, we describe an iteratively corrected rational Krylov algorithm for $\mathcal{H}_2$ model reduction. The formulation is based on finding a reduced order model that satisfies interpolation-based first-order necessary conditions for $\mathcal{H}_2$ optimality and results in a method that is numerically effective and suited for large-scale problems. We illustrate the performance of the method with a variety of numerical experiments and comparisons with existing methods.

607 citations
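A hedged sketch of the iteratively corrected rational Krylov idea for a SISO system $(A, b, c)$: interpolate at the current shifts, then replace the shifts by the mirror images of the reduced poles until they stabilize. Shift initialization, deflation, and convergence handling below are simplified assumptions, not the paper's exact algorithm.

```python
import numpy as np

def irka_siso(A, b, c, r, iters=50, tol=1e-8, seed=0):
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    sigma = rng.random(r) + 1.0          # initial interpolation points (heuristic assumption)
    I = np.eye(n)
    for _ in range(iters):
        V = np.column_stack([np.linalg.solve(s * I - A, b) for s in sigma])
        W = np.column_stack([np.linalg.solve((s * I - A).conj().T, c.conj()) for s in sigma])
        M = W.conj().T @ V
        Ar = np.linalg.solve(M, W.conj().T @ A @ V)   # reduced system matrix
        sigma_new = -np.linalg.eigvals(Ar)            # mirror images of the reduced poles
        if np.linalg.norm(np.sort_complex(sigma_new) - np.sort_complex(sigma)) < tol:
            sigma = sigma_new
            break
        sigma = sigma_new
    br = np.linalg.solve(M, W.conj().T @ b)           # reduced input/output maps for the
    cr = c @ V                                        # last projection spaces built
    return Ar, br, cr
```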


Journal ArticleDOI
TL;DR: In this paper, the authors studied various properties of symmetric tensors in relation to a decomposition into a symmetric sum of outer products of vectors and showed that rank and symmetric rank are equal in a number of cases and that they always exist in an algebraically closed field.
Abstract: A symmetric tensor is a higher order generalization of a symmetric matrix. In this paper, we study various properties of symmetric tensors in relation to a decomposition into a symmetric sum of outer product of vectors. A rank-1 order-$k$ tensor is the outer product of $k$ nonzero vectors. Any symmetric tensor can be decomposed into a linear combination of rank-1 tensors, each of which is symmetric or not. The rank of a symmetric tensor is the minimal number of rank-1 tensors that is necessary to reconstruct it. The symmetric rank is obtained when the constituting rank-1 tensors are imposed to be themselves symmetric. It is shown that rank and symmetric rank are equal in a number of cases and that they always exist in an algebraically closed field. We will discuss the notion of the generic symmetric rank, which, due to the work of Alexander and Hirschowitz [J. Algebraic Geom., 4 (1995), pp. 201-222], is now known for any values of dimension and order. We will also show that the set of symmetric tensors of symmetric rank at most $r$ is not closed unless $r=1$.

545 citations


Journal ArticleDOI
TL;DR: The MCL process is the engine for the graph clustering algorithm called the MCL algorithm, and the process (and algorithm) iterands possess structural properties generalizing the mapping from process limits onto clusterings.
Abstract: A discrete uncoupling process for finite spaces is introduced, called the Markov Cluster Process or the MCL process. The process is the engine for the graph clustering algorithm called the MCL algorithm. The MCL process takes a stochastic matrix as input, and then alternates expansion and inflation, each step defining a stochastic matrix in terms of the previous one. Expansion corresponds with taking the $k$th power of a stochastic matrix, where $k\in\mathbb{N}$. Inflation corresponds with a parametrized operator $\Gamma_r$, $r\geq 0$, that maps the set of (column) stochastic matrices onto itself. The image $\Gamma_r M$ is obtained by raising each entry in $M$ to the $r$th power and rescaling each column to have sum 1 again. In practice the process converges very fast towards a limit that is invariant under both matrix multiplication and inflation, with quadratic convergence around the limit points. The heuristic behind the process is its expected behavior for (Markov) graphs possessing cluster structure. The process is typically applied to the matrix of random walks on a given graph $G$, and the connected components of (the graph associated with) the process limit generically allow a clustering interpretation of $G$. The limit is in general extremely sparse and iterands are sparse in a weighted sense, implying that the MCL algorithm is very fast and highly scalable. Several mathematical properties of the MCL process are established. Most notably, the process (and algorithm) iterands possess structural properties generalizing the mapping from process limits onto clusterings. The inflation operator $\Gamma_r$ maps the class of matrices that are diagonally similar to a symmetric matrix onto itself. The phrase diagonally positive semi-definite (dpsd) is used for matrices that are diagonally similar to a positive semi-definite matrix. For $r\in\mathbb{N}$ and for $M$ a stochastic dpsd matrix, the image $\Gamma_r M$ is again dpsd. Determinantal inequalities satisfied by a dpsd matrix $M$ imply a natural ordering among the diagonal elements of $M$, generalizing the mapping of process limits onto clusterings. The spectrum of $\Gamma_{\infty} M$ is of the form $\{0^{n-k}, 1^k\}$, where $k$ is the number of endclasses of the ordering associated with $M$, and $n$ is the dimension of $M$. This attests to the uncoupling effect of the inflation operator.

488 citations
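A minimal sketch of the MCL process exactly as described above (expansion = matrix power, inflation = entrywise power followed by column rescaling); the parameters k, r and the toy graph are arbitrary choices.

```python
import numpy as np

def inflate(M, r):
    P = M ** r                      # raise each entry to the r-th power
    return P / P.sum(axis=0)        # rescale each column to sum 1

def mcl(M, k=2, r=2.0, iters=100, tol=1e-10):
    # M: column-stochastic matrix, e.g. the random-walk matrix of a graph
    for _ in range(iters):
        M_next = inflate(np.linalg.matrix_power(M, k), r)   # expansion, then inflation
        if np.linalg.norm(M_next - M) < tol:
            return M_next
        M = M_next
    return M

# Example: random walks on a small graph; the nonzero structure of the limit
# indicates the clustering.
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
limit = mcl(A / A.sum(axis=0))
```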


Journal ArticleDOI
TL;DR: A new class of tensor decompositions is introduced, where the size is characterized by a set of mode-$n$ ranks, and conditions under which essential uniqueness is guaranteed are derived.
Abstract: In this paper we introduce a new class of tensor decompositions. Intuitively, we decompose a given tensor block into blocks of smaller size, where the size is characterized by a set of mode-$n$ ranks. We study different types of such decompositions. For each type we derive conditions under which essential uniqueness is guaranteed. The parallel factor decomposition and Tucker's decomposition can be considered as special cases in the new framework. The paper sheds new light on fundamental aspects of tensor algebra.

431 citations


Journal ArticleDOI
TL;DR: Using a new sampling method called “subspace sampling,” two randomized algorithms are given that compute low-rank approximations expressed in terms of actual columns and rows of the data matrix with relative-error guarantees; previously, in some cases, it was not even known whether such decompositions exist.
Abstract: Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an $m\times n$ matrix $A$ and a rank parameter $k$. In our first algorithm, $C$ is chosen, and we let $A'=CC^+A$, where $C^+$ is the Moore-Penrose generalized inverse of $C$. In our second algorithm $C$, $U$, $R$ are chosen, and we let $A'=CUR$. ($C$ and $R$ are matrices that consist of actual columns and rows, respectively, of $A$, and $U$ is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least $1-\delta$, $\|A-A'\|_F\leq(1+\epsilon)\,\|A-A_k\|_F$, where $A_k$ is the “best” rank-$k$ approximation provided by truncating the SVD of $A$, and where $\|X\|_F$ is the Frobenius norm of the matrix $X$. The number of columns of $C$ and rows of $R$ is a low-degree polynomial in $k$, $1/\epsilon$, and $\log(1/\delta)$. Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants of these matrix decompositions over the last ten years. However, our two algorithms are the first polynomial time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist. Both of our algorithms are simple and they take time of the order needed to approximately compute the top $k$ singular vectors of $A$. The technical crux of our analysis is a novel, intuitive sampling method we introduce in this paper called “subspace sampling.” In subspace sampling, the sampling probabilities depend on the Euclidean norms of the rows of the top singular vectors. This allows us to obtain provable relative-error guarantees by deconvoluting “subspace” information and “size-of-$A$” information in the input matrix. This technique is likely to be useful for other matrix approximation and data analysis problems.

398 citations
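A hedged sketch in the flavor of the second algorithm above: sample columns and rows with probabilities built from the top-$k$ singular vectors (the “subspace sampling” idea) and set $U$ to a generalized inverse of their intersection. The sample sizes, sampling without replacement, and the omission of rescaling are simplifications, not the paper's exact procedure.

```python
import numpy as np

def cur(A, k, c, r, seed=0):
    rng = np.random.default_rng(seed)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # leverage-type scores from the top-k right/left singular vectors
    p_col = (Vt[:k, :] ** 2).sum(axis=0); p_col /= p_col.sum()
    p_row = (U[:, :k] ** 2).sum(axis=1); p_row /= p_row.sum()
    cols = rng.choice(A.shape[1], size=c, replace=False, p=p_col)
    rows = rng.choice(A.shape[0], size=r, replace=False, p=p_row)
    C, R = A[:, cols], A[rows, :]
    U_mid = np.linalg.pinv(C[rows, :])   # generalized inverse of the intersection of C and R
    return C, U_mid, R

# A_approx = C @ U_mid @ R approximates A; with enough sampled columns and rows its
# Frobenius error is close to that of the best rank-k approximation A_k.
```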


Journal ArticleDOI
TL;DR: This work first formulates a convex relaxation of this combinatorial problem, then details two efficient first-order algorithms with low memory requirements to solve large-scale, dense problem instances.
Abstract: Given a sample covariance matrix, we solve a maximum likelihood problem penalized by the number of nonzero coefficients in the inverse covariance matrix. Our objective is to find a sparse representation of the sample data and to highlight conditional independence relationships between the sample variables. We first formulate a convex relaxation of this combinatorial problem and then detail two efficient first-order algorithms with low memory requirements to solve large-scale, dense problem instances.

299 citations
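Written out, the combinatorial problem penalizes the number of nonzero entries of the inverse covariance $X$; the convex relaxation referred to above replaces that count by an elementwise $\ell_1$ norm, giving (in notation that is an assumption here, with $S$ the sample covariance and $\rho>0$ the penalty weight) $\max_{X \succ 0}\ \log\det X - \operatorname{tr}(SX) - \rho\|X\|_1$, where $\|X\|_1 = \sum_{i,j}|X_{ij}|$.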


Journal ArticleDOI
TL;DR: In this article, the authors give an explicit expression for the rate of convergence for fully indecomposable matrices and compare the measure with some well known alternatives, including PageRank.
Abstract: As long as a square nonnegative matrix $A$ contains sufficient nonzero elements, then the Sinkhorn-Knopp algorithm can be used to balance the matrix, that is, to find a diagonal scaling of $A$ that is doubly stochastic. It is known that the convergence is linear, and an upper bound has been given for the rate of convergence for positive matrices. In this paper we give an explicit expression for the rate of convergence for fully indecomposable matrices. We describe how balancing algorithms can be used to give a measure of web page significance. We compare the measure with some well known alternatives, including PageRank. We show that, with an appropriate modification, the Sinkhorn-Knopp algorithm is a natural candidate for computing the measure on enormous data sets.

273 citations
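A minimal sketch of the Sinkhorn-Knopp iteration being analyzed: alternately rescale columns and rows until $D_1 A D_2$ is (nearly) doubly stochastic; the stopping rule is a simplification.

```python
import numpy as np

def sinkhorn_knopp(A, iters=1000, tol=1e-12):
    # A: square nonnegative matrix with enough nonzeros (e.g. fully indecomposable)
    r = np.ones(A.shape[0])
    for _ in range(iters):
        c = 1.0 / (A.T @ r)          # column scaling factors
        r_new = 1.0 / (A @ c)        # row scaling factors
        if np.linalg.norm(r_new - r) < tol * np.linalg.norm(r_new):
            r = r_new
            break
        r = r_new
    P = np.diag(r) @ A @ np.diag(c)  # doubly stochastic scaling of A
    return P, r, c
```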


Journal ArticleDOI
TL;DR: In this paper, alternating least squares algorithms for the computation of the block term decompositions introduced in Part II are derived and it is shown that degeneracy can also occur for block term decompositions.
Abstract: In this paper we derive alternating least squares algorithms for the computation of the block term decompositions introduced in Part II. We show that degeneracy can also occur for block term decompositions.

211 citations


Journal ArticleDOI
TL;DR: An enhanced line search method for speeding up the ALS fitting of the PARAFAC model that outperforms standard line search in simulations, especially when convergence is slowed by a bottleneck of nearly collinear factors in some but not all modes.
Abstract: Several modifications have been proposed to speed up the alternating least squares (ALS) method of fitting the PARAFAC model. The most widely used is line search, which extrapolates from linear trends in the parameter changes over prior iterations to estimate the parameter values that would be obtained after many additional ALS iterations. We propose some extensions of this approach that incorporate a more sophisticated extrapolation, using information on nonlinear trends in the parameters and changing all the parameter sets simultaneously. The new method, called “enhanced line search (ELS),” can be implemented at different levels of complexity, depending on how many different extrapolation parameters (for different modes) are jointly optimized during each iteration. We report some tests of the simplest parameter version, using simulated data. The performance of this lowest level of ELS depends on the nature of the convergence difficulty. It significantly outperforms standard line search when there is a “convergence bottleneck,” a situation where some modes have almost collinear factors but others do not, but is somewhat less effective in classic “swamp” situations where factors are highly collinear in all modes. This is illustrated by examples. To demonstrate how ELS can be adapted to different N-way decompositions, we also apply it to a four-way array to perform a blind identification of an under-determined mixture (UDM). Since analysis of this dataset happens to involve a serious convergence “bottleneck” (collinear factors in two of the four modes), it provides another example of a situation in which ELS dramatically outperforms standard line search.
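A hedged sketch of the extrapolation idea underlying line search and ELS: after an ALS sweep, move all CP factor matrices further along the direction of their last change and keep the move if it lowers the fit error. ELS itself chooses the step(s) by exact minimization of a polynomial; the grid of candidate steps below is a simplified stand-in.

```python
import numpy as np

def cp_reconstruct(A, B, C):
    # full tensor from CP factor matrices (columns are the rank-1 components)
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def extrapolate(T, old, new, steps=(1.0, 2.0, 4.0, 8.0)):
    # old, new: tuples (A, B, C) of factor matrices before/after the latest ALS sweep
    best, best_err = new, np.linalg.norm(T - cp_reconstruct(*new))
    for s in steps:
        cand = tuple(o + s * (n - o) for o, n in zip(old, new))   # move all modes at once
        err = np.linalg.norm(T - cp_reconstruct(*cand))
        if err < best_err:
            best, best_err = cand, err
    return best
```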

Journal ArticleDOI
TL;DR: It is shown that if the tensor admits a good Tucker approximation for some (small) rank $r$, then this approximation can be computed using only $\mathcal{O}(nr)$ entries with $\mathcal{O}(nr^{3})$ complexity.
Abstract: We consider Tucker-like approximations with an $r \times r \times r$ core tensor for three-dimensional $n \times n \times n$ arrays in the case of $r \ll n$ and possibly very large $n$ (up to $10^4$-$10^6$). As the approximation contains only $\mathcal{O}(rn + r^3)$ parameters, it is natural to ask if it can be computed using only a small amount of entries of the given array. A similar question for matrices (two-dimensional tensors) was asked and positively answered in [S. A. Goreinov, E. E. Tyrtyshnikov, and N. L. Zamarashkin, A theory of pseudo-skeleton approximations, Linear Algebra Appl., 261 (1997), pp. 1-21]. In the present paper we extend the positive answer to the case of three-dimensional tensors. More specifically, it is shown that if the tensor admits a good Tucker approximation for some (small) rank $r$, then this approximation can be computed using only $\mathcal{O}(nr)$ entries with $\mathcal{O}(nr^{3})$ complexity.

Journal ArticleDOI
TL;DR: A generalization of Kruskal's permutation lemma to partitioned matrices is studied and it is proved that Khatri-Rao products of partitioned matrices are generically full column rank.
Abstract: In this paper we study a generalization of Kruskal's permutation lemma to partitioned matrices. We define the k'-rank of partitioned matrices as a generalization of the k-rank of matrices. We derive a lower-bound on the k'-rank of Khatri-Rao products of partitioned matrices. We prove that Khatri-Rao products of partitioned matrices are generically full column rank.

Journal ArticleDOI
TL;DR: The tensor-CUR decomposition is used to reconstruct missing entries in a user-product-product preference tensor, and it is shown that high quality recommendations can be made on the basis of a small number of basis users and a small number of product-product comparisons from a new user.
Abstract: Motivated by numerous applications in which the data may be modeled by a variable subscripted by three or more indices, we develop a tensor-based extension of the matrix CUR decomposition. The tensor-CUR decomposition is most relevant as a data analysis tool when the data consist of one mode that is qualitatively different from the others. In this case, the tensor-CUR decomposition approximately expresses the original data tensor in terms of a basis consisting of underlying subtensors that are actual data elements and thus that have a natural interpretation in terms of the processes generating the data. Assume the data may be modeled as a $(2+1)$-tensor, i.e., an $m \times n \times p$ tensor $\mathcal{A}$ in which the first two modes are similar and the third is qualitatively different. We refer to each of the $p$ different $m \times n$ matrices as “slabs” and each of the $mn$ different $p$-vectors as “fibers.” In this case, the tensor-CUR algorithm computes an approximation to the data tensor $\mathcal{A}$ that is of the form $\mathcal{CUR}$, where $\mathcal{C}$ is an $m \times n \times c$ tensor consisting of a small number $c$ of the slabs, $\mathcal{R}$ is an $r \times p$ matrix consisting of a small number $r$ of the fibers, and $\mathcal{U}$ is an appropriately defined and easily computed $c \times r$ encoding matrix. Both $\mathcal{C}$ and $\mathcal{R}$ may be chosen by randomly sampling either slabs or fibers according to a judiciously chosen and data-dependent probability distribution, and both $c$ and $r$ depend on a rank parameter $k$, an error parameter $\epsilon$, and a failure probability $\delta$. Under appropriate assumptions, provable bounds on the Frobenius norm of the error tensor $\mathcal{A} - \mathcal{CUR}$ are obtained. In order to demonstrate the general applicability of this tensor decomposition, we apply it to problems in two diverse domains of data analysis: hyperspectral medical image analysis and consumer recommendation system analysis. In the hyperspectral data application, the tensor-CUR decomposition is used to compress the data, and we show that classification quality is not substantially reduced even after substantial data compression. In the recommendation system application, the tensor-CUR decomposition is used to reconstruct missing entries in a user-product-product preference tensor, and we show that high quality recommendations can be made on the basis of a small number of basis users and a small number of product-product comparisons from a new user.

Journal ArticleDOI
TL;DR: The existence of an optimal approximation is theoretically guaranteed under certain conditions, and this optimal approximation yields a tensor decomposition where the diagonal of the core is maximized.
Abstract: It is known that a higher order tensor does not necessarily have an optimal low rank approximation, and that a tensor might not be orthogonally decomposable (i.e., admit a tensor SVD). We provide several sufficient conditions which lead to the failure of the tensor SVD, and characterize the existence of the tensor SVD with respect to the higher order SVD (HOSVD). In the face of these difficulties to generalize standard results known in the matrix case to tensors, we consider the low rank orthogonal approximation of tensors. The existence of an optimal approximation is theoretically guaranteed under certain conditions, and this optimal approximation yields a tensor decomposition where the diagonal of the core is maximized. We present an algorithm to compute this approximation and analyze its convergence behavior. Numerical experiments indicate a linear convergence rate for this algorithm.

Journal ArticleDOI
TL;DR: This paper considers the uniqueness conditions for the problem of exact joint diagonalization (EJD), which is closely related to the issue of uniqueness in tensor decompositions, and derives the well-known identifiability conditions for independent component analysis (ICA) based on an EJD formulation of ICA.
Abstract: We investigate the sensitivity of the problem of nonorthogonal (matrix) joint diagonalization (NOJD). First, we consider the uniqueness conditions for the problem of exact joint diagonalization (EJD), which is closely related to the issue of uniqueness in tensor decompositions. As a byproduct, we derive the well-known identifiability conditions for independent component analysis (ICA) based on an EJD formulation of ICA. We next introduce some known cost functions for NOJD and derive flows based on these cost functions for NOJD. Then we define and investigate the noise sensitivity of the stationary points of these flows. We show that the condition number of the joint diagonalizer and uniqueness of the joint diagonalizer as measured by modulus of uniqueness (as defined in this paper) affect the sensitivity. We also investigate the effect of the number of matrices on the sensitivity. Our numerical experiments confirm the theoretical results. While this paper was under review, a few of its results were presented at the ICASSP07 conference in Honolulu, HI.

Journal ArticleDOI
TL;DR: It is shown that the implementation of the scaling and squaring method can be extended to compute both $e^A$ and the Frechet derivative at $A$ in the direction $E$, denoted by $L(A,E)$, at a cost about three times that for computing $e^A$ alone.
Abstract: The matrix exponential is a much-studied matrix function having many applications. The Frechet derivative of the matrix exponential describes the first-order sensitivity of $e^A$ to perturbations in $A$ and its norm determines a condition number for $e^A$. Among the numerous methods for computing $e^A$ the scaling and squaring method is the most widely used. We show that the implementation of the method in [N. J. Higham, The scaling and squaring method for the matrix exponential revisited, SIAM J. Matrix Anal. Appl., 26 (2005), pp. 1179-1193] can be extended to compute both $e^A$ and the Frechet derivative at $A$ in the direction $E$, denoted by $L(A,E)$, at a cost about three times that for computing $e^A$ alone. The algorithm is derived from the scaling and squaring method by differentiating the Pade approximants and the squaring recurrence, reusing quantities computed during the evaluation of the Pade approximant, and intertwining the recurrences in the squaring phase. To guide the choice of algorithmic parameters, an extension of the existing backward error analysis for the scaling and squaring method is developed which shows that, modulo rounding errors, the approximations obtained are $e^{A+\Delta A}$ and $L(A+\Delta A,E+\Delta E)$, with the same $\Delta A$ in both cases, and with computable bounds on $\|\Delta A\|$ and $\|\Delta E\|$. The algorithm for $L(A,E)$ is used to develop an algorithm that computes $e^A$ together with an estimate of its condition number. In addition to results specific to the exponential, we develop some results and techniques for arbitrary functions. We show how a matrix iteration for $f(A)$ yields an iteration for the Frechet derivative and show how to efficiently compute the Frechet derivative of a power series. We also show that a matrix polynomial and its Frechet derivative can be evaluated at a cost at most three times that of computing the polynomial itself and give a general framework for evaluating a matrix function and its Frechet derivative via Pade approximation.
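For illustration of the object being computed (this is a standard identity, not the paper's scaling-and-squaring algorithm): the Frechet derivative $L(A,E)$ appears as the off-diagonal block of the exponential of a $2\times2$ block matrix.

```python
import numpy as np
from scipy.linalg import expm

def expm_and_frechet_block(A, E):
    # exp([[A, E], [0, A]]) = [[exp(A), L(A,E)], [0, exp(A)]]
    n = A.shape[0]
    M = np.block([[A, E], [np.zeros((n, n)), A]])
    F = expm(M)
    return F[:n, :n], F[:n, n:]      # (e^A, L(A, E))

# scipy.linalg.expm_frechet offers a scaling-and-squaring based computation of the
# same quantity, in the spirit of the algorithm described above.
```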

Journal ArticleDOI
TL;DR: Absolute perturbation bounds for the coefficients of the characteristic polynomial of a complex matrix are derived, and it is suggested that the coefficients of normal matrices are better conditioned with regard to absolute perturbations than those of general matrices.
Abstract: We derive absolute perturbation bounds for the coefficients of the characteristic polynomial of an $n\times n$ complex matrix. The bounds consist of elementary symmetric functions of singular values, and suggest that coefficients of normal matrices are better conditioned with regard to absolute perturbations than those of general matrices. When the matrix is Hermitian positive-definite, the bounds can be expressed in terms of the coefficients themselves. We also improve absolute and relative perturbation bounds for determinants. The basis for all bounds is an expansion of the determinant of a perturbed diagonal matrix.

Journal ArticleDOI
TL;DR: Initial conditions $x_0$ that result in nonnegative states $x(t)$ in finite time are shown to form a convex cone that is related to the matrix exponential $e^{tA}$ and its eventual nonnegativity.
Abstract: Linear differential systems $\dot{x}(t)=Ax(t)$ ($A\in\mathbb{R}^{n\times n}$, $x_0=x(0)\in\mathbb{R}^n$, $t\geq0$) whose solutions become and remain nonnegative are studied. It is shown that the eigenvalue of $A$ furthest to the right must be real and must possess nonnegative right and left eigenvectors. Moreover, for some $a\geq0$, $A+aI$ must be eventually nonnegative, that is, its powers must become and remain entrywise nonnegative. Initial conditions $x_0$ that result in nonnegative states $x(t)$ in finite time are shown to form a convex cone that is related to the matrix exponential $e^{tA}$ and its eventual nonnegativity.

Journal ArticleDOI
TL;DR: Upper and lower bounds for the minimum rank of all matrices in the set of all positive semidefinite matrices whose graph is $G$ are given and used to determine $\operatorname{msr}(G)$ for some well-known graphs.
Abstract: Let $\mathcal{P}(G)$ be the set of all positive semidefinite matrices whose graph is $G$, and $\operatorname{msr}(G)$ be the minimum rank of all matrices in $\mathcal{P}(G)$. Upper and lower bounds for $\operatorname{msr}(G)$ are given and used to determine $\operatorname{msr}(G)$ for some well-known graphs, including chordal graphs, and for all simple graphs on fewer than seven vertices.

Journal ArticleDOI
TL;DR: This paper investigates some well-established and more recent methods that aim at approximating the vector $\exp(A)v$ when $A$ is a large symmetric negative semidefinite matrix, by efficiently combining subspace projections and spectral transformations.
Abstract: In this paper we investigate some well-established and more recent methods that aim at approximating the vector $\exp(A)v$ when $A$ is a large symmetric negative semidefinite matrix, by efficiently combining subspace projections and spectral transformations. We show that some recently developed acceleration procedures may be restated as preconditioning techniques for the partial fraction expansion form of an approximating rational function. These new results allow us to devise a priori strategies to select the associated acceleration parameters; theoretical and numerical results are shown to justify these choices. Moreover, we provide a performance evaluation among several numerical approaches to approximate the action of the exponential of large matrices. Our numerical experiments provide a new, and in some cases, unexpected picture of the actual behavior of the discussed methods.
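A minimal sketch of the basic Krylov projection these methods accelerate: build an orthonormal basis $V_m$ of the Krylov subspace of $A$ and $v$ and take $\exp(A)v \approx \|v\|\,V_m \exp(H_m) e_1$ (Arnoldi below; Lanczos applies in the symmetric case). The spectral transformations and acceleration procedures discussed in the paper are not included.

```python
import numpy as np
from scipy.linalg import expm

def expm_times_vector(A, v, m=30):
    n = len(v)
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):                 # modified Gram-Schmidt orthogonalization
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        if j + 1 < m:
            h = np.linalg.norm(w)
            if h < 1e-14:                      # happy breakdown: the subspace is invariant
                V, H = V[:, :j + 1], H[:j + 1, :j + 1]
                break
            H[j + 1, j] = h
            V[:, j + 1] = w / h
    e1 = np.zeros(H.shape[0]); e1[0] = 1.0
    return beta * V @ (expm(H) @ e1)
```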

Journal ArticleDOI
TL;DR: The original equation is transformed into an equivalent Riccati equation where the singularity is removed while the matrix coefficients maintain the same structure as in the original equation, leading to a quadratically convergent algorithm with complexity $O(n^2)$ which provides approximations with full precision.
Abstract: A special instance of the algebraic Riccati equation $XCX-XE-AX+B=0$ where the $n\times n$ matrix coefficients $A,B,C,E$ are rank structured matrices is considered. Relying on the structural properties of Cauchy-like matrices, an algorithm is designed for performing the customary Newton iteration in $O(n^2)$ arithmetic operations (ops). The same technique is used to reduce the cost of the algorithm proposed by L.-Z. Lu in [Numer. Linear Algebra Appl., 12 (2005), pp. 191-200] from $O(n^3)$ to $O(n^2)$ ops while still preserving quadratic convergence in the generic case. As a byproduct we show that the latter algorithm is closely related to the customary Newton method by simple formal relations. In critical cases where the Jacobian at the required solution is singular and quadratic convergence turns to linear, we provide an adaptation of the shift technique in order to get rid of the singularity. The original equation is transformed into an equivalent Riccati equation where the singularity is removed while the matrix coefficients maintain the same structure as in the original equation. This leads to a quadratically convergent algorithm with complexity $O(n^2)$ which provides approximations with full precision. Numerical experiments and comparisons which confirm the effectiveness of the new approach are reported.

Journal ArticleDOI
Roland Badeau, Remy Boyer
TL;DR: This paper presents fast algorithms for computing the full and the rank-truncated HOSVD of third-order structured tensors, derived by considering two specific ways to unfold a structured tensor, leading to structured matrix unfoldings whose SVD can be efficiently computed.
Abstract: The higher-order singular value decomposition (HOSVD) is a generalization of the singular value decomposition (SVD) to higher-order tensors (i.e., arrays with more than two indices) and plays an important role in various domains. Unfortunately, this decomposition is computationally demanding. Indeed, the HOSVD of a third-order tensor involves the computation of the SVD of three matrices, which are referred to as “modes” or “matrix unfoldings.” In this paper, we present fast algorithms for computing the full and the rank-truncated HOSVD of third-order structured (symmetric, Toeplitz, and Hankel) tensors. These algorithms are derived by considering two specific ways to unfold a structured tensor, leading to structured matrix unfoldings whose SVD can be efficiently computed.
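A minimal sketch of the plain (dense) third-order HOSVD whose cost the paper reduces: the mode-$n$ factors are left singular vectors of the three matrix unfoldings, and the core is obtained by multiplying the tensor by their transposes. The structured algorithms in the paper instead exploit the symmetric/Toeplitz/Hankel structure of these unfoldings.

```python
import numpy as np

def hosvd3(T):
    U = []
    for mode in range(3):
        # mode-n unfolding: mode-n fibers become the columns of a matrix
        unfolding = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        u, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        U.append(u)
    core = np.einsum('ijk,ia,jb,kc->abc', T, U[0], U[1], U[2])   # T x_1 U1^T x_2 U2^T x_3 U3^T
    return core, U

# Reconstruction: np.einsum('abc,ia,jb,kc->ijk', core, U[0], U[1], U[2]) recovers T;
# truncating the columns of each U[n] gives a rank-truncated approximation.
```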

Journal ArticleDOI
TL;DR: It is shown that under certain conditions, the 2-norm of residuals produced by GMRES combined with deflation is never larger than the 2-norm of residuals produced by GMRES combined with the abstract balancing preconditioner, indicating that many results for symmetric positive definite matrices carry over to arbitrary nonsymmetric matrices.
Abstract: For quite some time, the deflation preconditioner has been proposed and used to accelerate the convergence of Krylov subspace methods. For symmetric positive definite linear systems, the convergence of conjugate gradient methods combined with deflation has been analyzed and compared with other preconditioners, e.g., with the abstract balancing preconditioner [R. Nabben and C. Vuik, SIAM J. Sci. Comput., 27 (2006), pp. 1742-1759]. In this paper, we extend the convergence analysis to nonsymmetric linear systems in the context of GMRES iteration and compare it with the abstract nonsymmetric balancing preconditioner. We are able to show that many results for symmetric positive definite matrices carry over to arbitrary nonsymmetric matrices. First we establish that the spectra of the preconditioned systems are similar. Moreover, we show that under certain conditions, the 2-norm of residuals produced by GMRES combined with deflation is never larger than the 2-norm of residuals produced by GMRES combined with the abstract balancing preconditioner. Numerical experiments are performed on nonsymmetric linear systems arising from a finite volume discretization of the convection-diffusion equation, and the numerical results confirm our theoretical results.

Journal ArticleDOI
TL;DR: This paper formulates and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a “nearest” set of distances that satisfy the properties of a metric (principally the triangle inequality), and suggests various useful extensions and generalizations to metric nearness.
Abstract: Metric nearness refers to the problem of optimally restoring metric properties to distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric data can be important in various settings, for example, in clustering, classification, metric-based indexing, query processing, and graph theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a “nearest” set of distances that satisfy the properties of a metric—principally the triangle inequality. For solving this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and develops a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustration. We include experiments that demonstrate the computational superiority of triangle fixing over general purpose convex programming software. Finally, we conclude by suggesting various useful extensions and generalizations to metric nearness.
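A hedged illustration of the special case mentioned above: if distances may only be decreased, triangle violations can be repaired by running all-pairs shortest paths on the dissimilarity matrix (plain Floyd-Warshall below). The paper's triangle-fixing algorithms handle the general two-sided problem.

```python
import numpy as np

def decrease_only_metric_repair(D):
    # D: symmetric nonnegative dissimilarity matrix with zero diagonal
    M = D.copy()
    n = M.shape[0]
    for k in range(n):                               # Floyd-Warshall relaxation
        M = np.minimum(M, M[:, [k]] + M[[k], :])
    return M                                         # now M[i, j] <= M[i, k] + M[k, j] for all i, j, k
```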

Journal ArticleDOI
TL;DR: It is shown that if a best rank-$R$ approximation does not exist, then any sequence of CP updates will exhibit diverging CP components, which implies that several components are highly correlated in all three modes and their component weights become arbitrarily large.
Abstract: We consider the low-rank approximation over the real field of generic $p \times q \times 2$ arrays. For all possible combinations of $p$, $q$, and $R$, we present conjectures on the existence of a best rank-$R$ approximation. Our conjectures are motivated by a detailed analysis of the boundary of the set of arrays with at most rank $R$. We link these results to the Candecomp/Parafac (CP) model for three-way component analysis. Essentially, CP tries to find a best rank-$R$ approximation to a given three-way array. In the case of $p \times q \times 2$ arrays, we show (under some regularity condition) that if a best rank-$R$ approximation does not exist, then any sequence of CP updates will exhibit diverging CP components, which implies that several components are highly correlated in all three modes and their component weights become arbitrarily large. This extends Stegeman [Psychometrika, 71 (2006), pp. 483-501], who considers $p \times p \times 2$ arrays of rank $p+1$ or higher. We illustrate our results by means of simulations.

Journal ArticleDOI
TL;DR: The condition under which the SCF iteration becomes a contractive fixed point iteration that guarantees its convergence is identified, characterized by an upper bound placed on a parameter that weighs the contribution from the nonlinear component of the eigenvalue problem.
Abstract: We investigate the convergence of the self-consistent field (SCF) iteration used to solve a class of nonlinear eigenvalue problems. We show that for the class of problems considered, the SCF iteration produces a sequence of approximate solutions that contain two convergent subsequences. These subsequences may converge to two different limit points, neither of which is the solution to the nonlinear eigenvalue problem. We identify the condition under which the SCF iteration becomes a contractive fixed point iteration that guarantees its convergence. This condition is characterized by an upper bound placed on a parameter that weighs the contribution from the nonlinear component of the eigenvalue problem. We derive such a bound for the general case as well as for a special case in which the dimension of the problem is $2$.
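A hedged toy sketch of an SCF iteration: repeatedly assemble the matrix from the current eigenvectors and re-solve the resulting linear eigenvalue problem. The specific nonlinear term $H(X) = A + \alpha\,\mathrm{diag}(\rho(X))$ below is an illustrative assumption (with $\alpha$ playing the role of the parameter that weighs the nonlinear component), not the paper's model.

```python
import numpy as np

def scf(A, k, alpha, iters=100, tol=1e-10, seed=0):
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))   # initial orthonormal eigenvector block
    for _ in range(iters):
        rho = (X ** 2).sum(axis=1)                     # "density" built from current eigenvectors
        H = A + alpha * np.diag(rho)                   # linearized operator at the current iterate
        _, V = np.linalg.eigh(H)
        X_new = V[:, :k]                               # k lowest eigenvectors of H
        if np.linalg.norm(X_new @ X_new.T - X @ X.T) < tol:
            return X_new
        X = X_new
    return X
```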

Journal ArticleDOI
TL;DR: The eigenvalues and eigenvectors are shown to be expressible in terms of solutions of a certain scalar trigonometric equation, and explicit solutions of this equation are obtained for several special cases.
Abstract: The eigenvalue problem for a certain tridiagonal matrix with complex coefficients is considered. The eigenvalues and eigenvectors are shown to be expressible in terms of solutions of a certain scalar trigonometric equation. Explicit solutions of this equation are obtained for several special cases, and further analysis of this equation in several other cases provides information about the distribution of eigenvalues.

Journal ArticleDOI
TL;DR: It is shown that the principal Pade family of iterations for the matrix sign function and the matrix square root is a special case of a family of rational iterations due to Ernst Schroder; this characterization provides a family of iterations for the matrix $p$th root that preserve the structure of a group of automorphisms associated with a bilinear or a sesquilinear form.
Abstract: Matrix fixed-point iterations $z_{n+1}=\psi(z_n)$ defined by a rational function $\psi$ are considered. For these iterations a new proof is given that matrix convergence is essentially reduced to scalar convergence. It is shown that the principal Pade family of iterations for the matrix sign function and the matrix square root is a special case of a family of rational iterations due to Ernst Schroder. This characterization provides a family of iterations for the matrix $p$th root which preserve the structure of a group of automorphisms associated with a bilinear or a sesquilinear form. The first iteration in that family is the Halley method for which a convergence result is proved. Finally, new algorithms for the matrix $p$th root based on the Newton and Halley iterations are designed using the idea of the Schur-Newton method of Guo and Higham.
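For reference, the best-known iteration of this kind is the classical Newton iteration for the matrix sign function, $X_{k+1} = (X_k + X_k^{-1})/2$ with $X_0 = A$; below is a minimal sketch assuming a real $A$ with no purely imaginary eigenvalues (this is context, not the paper's new algorithms).

```python
import numpy as np

def matrix_sign_newton(A, iters=100, tol=1e-12):
    X = np.array(A, dtype=float)      # assumes real A with no purely imaginary eigenvalues
    for _ in range(iters):
        X_new = 0.5 * (X + np.linalg.inv(X))
        if np.linalg.norm(X_new - X) <= tol * np.linalg.norm(X_new):
            return X_new
        X = X_new
    return X
```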

Journal ArticleDOI
TL;DR: It is proved that if matrix inverses computed in finite precision arithmetic satisfy a backward-forward error model, then the numerical method is backward stable, and it is also proved that Newton's method with Higham's scaling or with Frobenius norm scaling is backward stable.
Abstract: We propose a scaling scheme for Newton's iteration for calculating the polar decomposition. The scaling factors are generated by a simple scalar iteration in which the initial value depends only on estimates of the extreme singular values of the original matrix, which can, for example, be the Frobenius norms of the matrix and its inverse. In exact arithmetic, for matrices with condition number no greater than $10^{16}$, with this scaling scheme no more than 9 iterations are needed for convergence to the unitary polar factor with a convergence tolerance roughly equal to $10^{-16}$. It is proved that if matrix inverses computed in finite precision arithmetic satisfy a backward-forward error model, then the numerical method is backward stable. It is also proved that Newton's method with Higham's scaling or with Frobenius norm scaling is backward stable.
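A minimal sketch of Newton's iteration for the polar decomposition with the Frobenius norm scaling mentioned above, $X_{k+1} = \frac{1}{2}(\zeta_k X_k + \zeta_k^{-1} X_k^{-*})$ with $\zeta_k = (\|X_k^{-1}\|_F/\|X_k\|_F)^{1/2}$; the iterates converge to the unitary polar factor. The paper's own scalar scaling iteration and its stability analysis are not reproduced here.

```python
import numpy as np

def polar_newton_frobenius(A, iters=100, tol=1e-12):
    # assumes A square and nonsingular
    X = np.array(A, dtype=complex)
    for _ in range(iters):
        Xinv = np.linalg.inv(X)
        z = np.sqrt(np.linalg.norm(Xinv, 'fro') / np.linalg.norm(X, 'fro'))
        X_new = 0.5 * (z * X + Xinv.conj().T / z)   # scaled Newton step
        if np.linalg.norm(X_new - X, 'fro') <= tol * np.linalg.norm(X_new, 'fro'):
            X = X_new
            break
        X = X_new
    U = X                         # unitary polar factor
    H = U.conj().T @ A            # Hermitian positive (semi)definite factor, A = U H
    return U, H
```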