Journal ArticleDOI

Orthogonal least squares methods and their application to non-linear system identification

01 Nov 1989 · International Journal of Control (Taylor & Francis) · Vol. 50, Iss. 5, pp. 1873-1896
TL;DR: Identification algorithms based on the well-known linear least squares methods of Gaussian elimination, Cholesky decomposition, classical Gram-Schmidt, modified Gram-Schmidt, Householder transformation, Givens method, and singular value decomposition are reviewed.
Abstract: Identification algorithms based on the well-known linear least squares methods of Gaussian elimination, Cholesky decomposition, classical Gram-Schmidt, modified Gram-Schmidt, Householder transformation, Givens method, and singular value decomposition are reviewed. The classical Gram-Schmidt, modified Gram-Schmidt, and Householder transformation algorithms are then extended to combine structure determination (that is, which terms to include in the model) and parameter estimation in a very simple and efficient manner for a class of multivariate discrete-time non-linear stochastic systems which are linear in the parameters.
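A minimal sketch of the combined structure-determination and parameter-estimation procedure the abstract describes, using classical Gram-Schmidt orthogonalisation and the error reduction ratio (ERR) to rank candidate terms. This is an illustration under assumed names (ols_select, a candidate regressor matrix P whose columns are candidate model terms, and an output vector y), not the authors' original code:

    import numpy as np

    def ols_select(P, y, n_terms):
        """Forward OLS term selection via classical Gram-Schmidt (sketch).
        P: (N, M) matrix of candidate regressors, y: (N,) output."""
        N, M = P.shape
        selected, W, err = [], [], []
        sigma = float(y @ y)                      # total output energy
        for _ in range(n_terms):
            best, best_err, best_w = None, -1.0, None
            for j in range(M):
                if j in selected:
                    continue
                w = P[:, j].astype(float)
                for wk in W:                      # orthogonalise against chosen terms
                    w = w - (wk @ P[:, j]) / (wk @ wk) * wk
                d = w @ w
                if d < 1e-12:                     # skip (near-)dependent candidates
                    continue
                e = (w @ y) ** 2 / (d * sigma)    # error reduction ratio
                if e > best_err:
                    best, best_err, best_w = j, e, w
            if best is None:
                break
            selected.append(best)
            W.append(best_w)
            err.append(best_err)
        # Recover parameters: P_selected = W A, with A unit upper triangular
        A = np.array([[(wk @ P[:, j]) / (wk @ wk) for j in selected] for wk in W])
        g = np.array([(wk @ y) / (wk @ wk) for wk in W])
        theta = np.linalg.solve(A, g)
        return selected, theta, err

Each chosen term's ERR measures the fraction of the output energy it explains, which is what makes the structure determination simple and efficient: terms are ranked and selected first, and the parameters theta are then recovered by back-substitution.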


Citations
Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations

Journal ArticleDOI
TL;DR: Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions.
Abstract: The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries (stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few). Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB). Basis pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation denoising, and multiscale edge denoising. BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.

9,950 citations


Cites methods from "Orthogonal least squares methods and their application to non-linear system identification"

  • ...For an earlier instance of a related algorithm, see [5]....

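As a hedged illustration of the l1 principle in the basis pursuit entry above: BP can be recast as a linear program by splitting the coefficient vector into non-negative parts alpha = u - v. The function name and the use of SciPy's general-purpose linprog (rather than the specialised primal-dual barrier solver the paper reports) are assumptions for the sketch:

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(Phi, s):
        """Solve min ||alpha||_1 subject to Phi @ alpha = s (sketch)."""
        n, m = Phi.shape
        c = np.ones(2 * m)                 # objective: sum(u) + sum(v) = ||alpha||_1
        A_eq = np.hstack([Phi, -Phi])      # equality constraint Phi @ (u - v) = s
        res = linprog(c, A_eq=A_eq, b_eq=s, bounds=(0, None))
        u, v = res.x[:m], res.x[m:]
        return u - v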

Journal ArticleDOI
TL;DR: The K-SVD algorithm, a novel method for adapting dictionaries in order to achieve sparse signal representations, is an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data.
Abstract: In recent years there has been a growing interest in the study of sparse representation of signals. Using an overcomplete dictionary that contains prototype signal-atoms, signals are described by sparse linear combinations of these atoms. Applications that use sparse representation are many and include compression, regularization in inverse problems, feature extraction, and more. Recent activity in this field has concentrated mainly on the study of pursuit algorithms that decompose signals with respect to a given dictionary. Designing dictionaries to better fit the above model can be done by either selecting one from a prespecified set of linear transforms or adapting the dictionary to a set of training signals. Both of these techniques have been considered, but this topic is largely still open. In this paper we propose a novel algorithm for adapting dictionaries in order to achieve sparse signal representations. Given a set of training signals, we seek the dictionary that leads to the best representation for each member in this set, under strict sparsity constraints. We present a new method, the K-SVD algorithm, generalizing the K-means clustering process. K-SVD is an iterative method that alternates between sparse coding of the examples based on the current dictionary and a process of updating the dictionary atoms to better fit the data. The update of the dictionary columns is combined with an update of the sparse representations, thereby accelerating convergence. The K-SVD algorithm is flexible and can work with any pursuit method (e.g., basis pursuit, FOCUSS, or matching pursuit). We analyze this algorithm and demonstrate its results both on synthetic tests and in applications on real image data.

8,905 citations


Cites background from "Orthogonal least squares methods and their application to non-linear system identification"

  • ...It is in this setting that we ask what the proper dictionary is....

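A minimal sketch of the dictionary-update half of the alternation the K-SVD entry describes (the sparse-coding half can use any pursuit method, e.g. the OMP sketch further below). All names are illustrative assumptions, not the authors' reference implementation:

    import numpy as np

    def ksvd_dictionary_update(D, X, Y):
        """One K-SVD dictionary-update sweep (sketch).
        D: (n, K) dictionary, X: (K, N) sparse codes, Y: (n, N) training signals."""
        for k in range(D.shape[1]):
            users = np.nonzero(X[k])[0]            # signals whose code uses atom k
            if users.size == 0:
                continue
            # Residual with atom k's contribution restored, restricted to its users
            E = Y[:, users] - D @ X[:, users] + np.outer(D[:, k], X[k, users])
            U, S, Vt = np.linalg.svd(E, full_matrices=False)
            D[:, k] = U[:, 0]                      # best rank-1 fit: new atom
            X[k, users] = S[0] * Vt[0]             # ...and its refitted coefficients
        return D, X

Updating each atom jointly with its nonzero coefficients is the rank-1 SVD step that gives the algorithm its name and accelerates convergence.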

Journal ArticleDOI
TL;DR: This article presents new results on using a greedy algorithm, orthogonal matching pursuit (OMP), to solve the sparse approximation problem over redundant dictionaries and develops a sufficient condition under which OMP can identify atoms from an optimal approximation of a nonsparse signal.
Abstract: This article presents new results on using a greedy algorithm, orthogonal matching pursuit (OMP), to solve the sparse approximation problem over redundant dictionaries. It provides a sufficient condition under which both OMP and Donoho's basis pursuit (BP) paradigm can recover the optimal representation of an exactly sparse signal. It leverages this theory to show that both OMP and BP succeed for every sparse input signal from a wide class of dictionaries. These quasi-incoherent dictionaries offer a natural generalization of incoherent dictionaries, and the cumulative coherence function is introduced to quantify the level of incoherence. This analysis unifies all the recent results on BP and extends them to OMP. Furthermore, the paper develops a sufficient condition under which OMP can identify atoms from an optimal approximation of a nonsparse signal. From there, it argues that OMP is an approximation algorithm for the sparse problem over a quasi-incoherent dictionary. That is, for every input signal, OMP calculates a sparse approximant whose error is only a small factor worse than the minimal error that can be attained with the same number of terms.

3,865 citations


Cites background from "Orthogonal least squares methods and their application to non-linear system identification"

  • ...The earliest reference appears to be a 1989 paper of Chen, Billings, and Luo [17]....

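For concreteness, a minimal sketch of the greedy OMP iteration the entry above analyzes: select the atom most correlated with the residual, then refit by least squares over all atoms selected so far. Names (omp, Phi, s, k) are assumptions for illustration:

    import numpy as np

    def omp(Phi, s, k):
        """Orthogonal matching pursuit (sketch).
        Phi: (n, m) dictionary, s: (n,) signal, k: number of atoms to select."""
        residual, idx = s.astype(float), []
        for _ in range(k):
            j = int(np.argmax(np.abs(Phi.T @ residual)))   # best-matching atom
            idx.append(j)
            coef, *_ = np.linalg.lstsq(Phi[:, idx], s, rcond=None)
            residual = s - Phi[:, idx] @ coef              # orthogonal to chosen atoms
        x = np.zeros(Phi.shape[1])
        x[idx] = coef
        return x

The least squares refit is what distinguishes OMP from plain matching pursuit, and it is the step that connects OMP to the orthogonal least squares algorithm of the 1989 paper cited above.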

References
Book
01 Jan 1966
TL;DR: A standard text on regression analysis, progressing from fitting a straight line by least squares through the general regression situation, model checking (including the Durbin-Watson test), selection of the "best" regression equation, ridge regression, and nonlinear estimation.
Abstract: Basic Prerequisite Knowledge. Fitting a Straight Line by Least Squares. Checking the Straight Line Fit. Fitting Straight Lines: Special Topics. Regression in Matrix Terms: Straight Line Case. The General Regression Situation. Extra Sums of Squares and Tests for Several Parameters Being Zero. Serial Correlation in the Residuals and the Durbin-Watson Test. More of Checking Fitted Models. Multiple Regression: Special Topics. Bias in Regression Estimates, and Expected Values of Mean Squares and Sums of Squares. On Worthwhile Regressions, Big F's, and R². Models Containing Functions of the Predictors, Including Polynomial Models. Transformation of the Response Variable. "Dummy" Variables. Selecting the "Best" Regression Equation. Ill-Conditioning in Regression Data. Ridge Regression. Generalized Linear Models (GLIM). Mixture Ingredients as Predictor Variables. The Geometry of Least Squares. More Geometry of Least Squares. Orthogonal Polynomials and Summary Data. Multiple Regression Applied to Analysis of Variance Problems. An Introduction to Nonlinear Estimation. Robust Regression. Resampling Procedures (Bootstrapping). Bibliography. True/False Questions. Answers to Exercises. Tables. Indexes.

18,952 citations

Journal ArticleDOI
TL;DR: The decomposition $A = U \Sigma V^T$ is called the singular value decomposition (SVD); the diagonal elements of $\Sigma$ are the non-negative square roots of the eigenvalues of $A^T A$ and are called singular values.
Abstract: Let A be a real m×n matrix with m ≥ n. It is well known (cf. [4]) that $$A = U \Sigma V^T \qquad (1)$$ where $$U^T U = V^T V = V V^T = I_n \quad \text{and} \quad \Sigma = \operatorname{diag}(\sigma_1, \ldots, \sigma_n).$$ The matrix U consists of n orthonormalized eigenvectors associated with the n largest eigenvalues of $A A^T$, and the matrix V consists of the orthonormalized eigenvectors of $A^T A$. The diagonal elements of $\Sigma$ are the non-negative square roots of the eigenvalues of $A^T A$; they are called singular values. We shall assume that $$\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n \geq 0.$$ Thus if rank(A) = r, then $\sigma_{r+1} = \sigma_{r+2} = \cdots = \sigma_n = 0$. The decomposition (1) is called the singular value decomposition (SVD).

3,036 citations
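Since SVD is among the linear least squares methods reviewed in the main paper, a brief sketch of how the decomposition (1) above yields the minimum-norm least squares solution; the function name and truncation tolerance are illustrative assumptions:

    import numpy as np

    def svd_least_squares(A, b, tol=1e-10):
        """Minimum-norm least squares theta = V diag(1/sigma_i) U^T b,
        zeroing singular values below a relative tolerance (sketch)."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        inv = np.divide(1.0, s, out=np.zeros_like(s), where=s > tol * s[0])
        return Vt.T @ (inv * (U.T @ b))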

Book
01 Jan 1977
TL;DR: This book provides up-to-date accounts of the computational methods and algorithms currently used in regression analysis, reflecting the development of regression computer programs that are efficient and accurate, without getting entrenched in minor computing details.
Abstract: Description: Regression analysis is an often-used tool in the statistician's toolbox. This new edition takes into serious consideration the further development of regression computer programs that are efficient, accurate, and considered an important part of statistical research. The book provides up-to-date accounts of computational methods and algorithms currently in use without getting entrenched in minor computing details.

2,811 citations

Journal ArticleDOI
TL;DR: Recursive input-output models for non-linear multivariate discrete-time systems are derived, and sufficient conditions for their existence are defined.
Abstract: Recursive input-output models for non-linear multivariate discrete-time systems are derived, and sufficient conditions for their existence are defined. The paper is divided into two parts. The first part introduces and defines concepts such as Nerode realization, multistructural forms and results from differential geometry which are then used to derive a recursive input-output model for multivariable deterministic non-linear systems. The second part introduces several examples, compares the derived model with other representations and extends the results to create prediction error or innovation input-output models for non-linear stochastic systems. These latter models are the generalization of the multivariable ARMAX models for linear systems and are referred to as NARMAX or Non-linear AutoRegressive Moving Average models with eXogenous inputs.

1,198 citations


"Orthogonal least squares methods an..." refers background or methods in this paper

  • ...Non-linear system identification and linear least squares problems: under some mild assumptions, a discrete-time multivariable non-linear stochastic control system with m outputs and r inputs can be described by the NARMAX model (Leontaritis and Billings 1985), y(t) = f(y(t -...


  • ...The NARMAX (Non-linear AutoRegressive Moving Average with eXogenous inputs) model, which was introduced by Leontaritis and Billings (1985), provides a basis for such a development....

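For reference, the NARMAX model quoted in truncated form in the first excerpt has the standard single-output form given by Leontaritis and Billings (1985), where u is the input, e the noise, f some non-linear function, and $n_y$, $n_u$, $n_e$ the maximum lags:

$$y(t) = f\bigl(y(t-1), \ldots, y(t-n_y),\; u(t-1), \ldots, u(t-n_u),\; e(t-1), \ldots, e(t-n_e)\bigr) + e(t)$$

When f is expanded over a set of candidate terms that enter linearly in the parameters, model fitting reduces to exactly the linear-in-the-parameters least squares and structure-determination problem the main paper addresses.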