Journal ArticleDOI

Atomic Norm Denoising With Applications to Line Spectral Estimation

TL;DR: It is demonstrated that the SDP outperforms the ℓ1 optimization, which in turn outperforms the MUSIC, Cadzow's, and matrix pencil approaches, in terms of MSE over a wide range of signal-to-noise ratios.
Abstract: Motivated by recent work on atomic norms in inverse problems, we propose a new approach to line spectral estimation that provides theoretical guarantees for the mean-squared-error (MSE) performance in the presence of noise and without knowledge of the model order. We propose an abstract theory of denoising with atomic norms and specialize this theory to provide a convex optimization problem for estimating the frequencies and phases of a mixture of complex exponentials. We show that the associated convex optimization problem can be solved in polynomial time via semidefinite programming (SDP). We also show that the SDP can be approximated by an ℓ1-regularized least-squares problem that achieves nearly the same error rate as the SDP but can scale to much larger problems. We compare both the SDP and ℓ1-based approaches with classical line spectral analysis methods and demonstrate that the SDP outperforms the ℓ1 optimization, which in turn outperforms the MUSIC, Cadzow's, and matrix pencil approaches, in terms of MSE over a wide range of signal-to-noise ratios.
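For concreteness, the SDP in question (quoted as problem (3.4) in the excerpts further below) can be stated directly; here y is the noisy observation, τ the regularization parameter, T(u) the Hermitian Toeplitz matrix with first column u, and the optimal x is the denoised signal:

```latex
\underset{t,\,u,\,x}{\text{minimize}} \quad
\tfrac{1}{2}\,\|x - y\|_2^2 + \tfrac{\tau}{2}\,(t + u_1)
\qquad \text{subject to} \qquad
\begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \succeq 0.
```

The paper then recovers the frequencies from the dual solution ẑ (cf. the excerpt below noting that ẑ can be obtained as y − x̂ from the primal solution x̂).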
Citations
01 Mar 1995
TL;DR: This thesis applies neural network feature selection techniques to multivariate time series data to improve prediction of a target time series; the results indicate that the Stochastics and RSI indicators yield better predictions than the moving averages.
Abstract: This thesis applies neural network feature selection techniques to multivariate time series data to improve prediction of a target time series. Two approaches to feature selection are used. First, a subset enumeration method is used to determine which financial indicators are most useful for aiding in prediction of the S&P 500 futures daily price. The candidate indicators evaluated include RSI, Stochastics, and several moving averages. Results indicate that the Stochastics and RSI indicators yield better predictions than the moving averages. The second approach to feature selection is the calculation of individual saliency metrics. A new decision-boundary-based individual saliency metric and a classifier-independent saliency metric are developed and tested. Ruck's saliency metric, the decision-boundary-based saliency metric, and the classifier-independent saliency metric are compared on a data set consisting of the RSI and Stochastics indicators as well as delayed closing price values. The decision-boundary-based metric and Ruck's metric give similar results, but the classifier-independent metric agrees with neither of the other metrics. The nine most salient features, as determined by the decision-boundary-based metric, are used to train a neural network, and the results are presented and compared to other published results.

1,545 citations

Journal ArticleDOI
TL;DR: In this article, the authors developed a mathematical theory of super-resolution, the problem of recovering the fine details of an object (the high end of its spectrum) from coarse-scale information only (samples at the low end of the spectrum).
Abstract: This paper develops a mathematical theory of super-resolution. Broadly speaking, super-resolution is the problem of recovering the fine details of an object (the high end of its spectrum) from coarse-scale information only (samples at the low end of the spectrum). Suppose we have many point sources at unknown locations in [0, 1] and with unknown complex-valued amplitudes. We only observe Fourier samples of this object up until a frequency cut-off f_c. We show that one can super-resolve these point sources with infinite precision, i.e. recover the exact locations and amplitudes, by solving a simple convex optimization problem, which can essentially be reformulated as a semidefinite program. This holds provided that the distance between sources is at least 2/f_c. This result extends to higher dimensions and other models. In one dimension, for instance, it is possible to recover a piecewise smooth function by resolving the discontinuity points with infinite precision as well. We also show that the theory and methods are robust to noise. In particular, in the discrete setting we develop some theoretical results explaining how the accuracy of the super-resolved signal is expected to degrade when both the noise level and the super-resolution factor vary.
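Concretely, the convex program referred to here is total-variation-norm minimization over measures on [0, 1]; a schematic rendering, with y_k denoting the observed Fourier samples:

```latex
\min_{\tilde{x}} \; \|\tilde{x}\|_{\mathrm{TV}}
\quad \text{subject to} \quad
y_k = \int_0^1 e^{-i 2\pi k t}\, \tilde{x}(\mathrm{d}t),
\qquad |k| \le f_c,
```

which, as the abstract notes, can essentially be reformulated as a semidefinite program, with exact recovery guaranteed once the sources are separated by at least 2/f_c.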

1,157 citations

Posted Content
TL;DR: In this article, the frequency components of a mixture of s complex sinusoids from a random subset of n regularly spaced samples are estimated using an atomic norm minimization approach to exactly recover the unobserved samples.
Abstract: We consider the problem of estimating the frequency components of a mixture of s complex sinusoids from a random subset of n regularly spaced samples. Unlike previous work in compressed sensing, the frequencies are not assumed to lie on a grid, but can assume any values in the normalized frequency domain [0,1]. We propose an atomic norm minimization approach to exactly recover the unobserved samples. We reformulate this atomic norm minimization as an exact semidefinite program. Even with this continuous dictionary, we show that most sampling sets of size O(s log s log n) are sufficient to guarantee the exact frequency estimation with high probability, provided the frequencies are well separated. Numerical experiments are performed to illustrate the effectiveness of the proposed method.
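The exact semidefinite reformulation rests on an SDP characterization of the atomic norm over the continuous dictionary of complex sinusoids used in this line of work; for x ∈ C^n, with T(u) the Hermitian Toeplitz matrix with first column u, it can be written as:

```latex
\|x\|_{\mathcal{A}}
= \inf_{t,\,u}\left\{ \frac{1}{2n}\,\operatorname{Tr} T(u) + \frac{t}{2}
\;:\;
\begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \succeq 0 \right\}.
```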

704 citations

Book
26 May 2015
TL;DR: This book provides a comprehensive guide to the theory and practice of sampling from an engineering perspective and is also an invaluable reference or self-study guide for engineers and students across industry and academia.
Abstract: Covering the fundamental mathematical underpinnings together with key principles and applications, this book provides a comprehensive guide to the theory and practice of sampling from an engineering perspective. Beginning with traditional ideas such as uniform sampling in shift-invariant spaces and working through to the more recent fields of compressed sensing and sub-Nyquist sampling, the key concepts are addressed in a unified and coherent way. Emphasis is given to applications in signal processing and communications, as well as hardware considerations, throughout. With 200 worked examples and over 200 end-of-chapter problems, this is an ideal course textbook for senior undergraduate and graduate students. It is also an invaluable reference or self-study guide for engineers and students across industry and academia.

371 citations

Posted Content
Vincent Duval, Gabriel Peyré
TL;DR: This paper shows that when the signal-to-noise level is large enough, and provided the aforementioned dual certificate is non-degenerate, the solution of the discretized problem is supported on pairs of Diracs which are neighbors of the Diracs of the input measure, as the grid size tends to zero.
Abstract: This paper studies sparse spikes deconvolution over the space of measures. We focus our attention on the recovery properties of the support of the measure, i.e. the location of the Dirac masses. For non-degenerate sums of Diracs, we show that, when the signal-to-noise ratio is large enough, total variation regularization (which is the natural extension of the L1 norm of vectors to the setting of measures) recovers the exact same number of Diracs. We also show that both the locations and the heights of these Diracs converge toward those of the input measure when the noise drops to zero. The exact speed of convergence is governed by a specific dual certificate, which can be computed by solving a linear system. We draw connections between the support of the recovered measure on a continuous domain and on a discretized grid. We show that when the signal-to-noise level is large enough, the solution of the discretized problem is supported on pairs of Diracs which are neighbors of the Diracs of the input measure. This gives a precise description of the convergence of the solution of the discretized problem toward the solution of the continuous grid-free problem, as the grid size tends to zero.
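Schematically, the continuous-domain problem studied here (often called the BLASSO) is total-variation-regularized least squares over the space M(X) of measures on the domain X, with Φ the measurement operator and λ > 0 the regularization weight:

```latex
\min_{m \in \mathcal{M}(X)} \; \frac{1}{2}\,\|\Phi m - y\|^2 + \lambda\, |m|(X),
```

where |m|(X) is the total-variation norm of the measure m, the extension of the L1 norm mentioned in the abstract.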

243 citations


Cites background from "Atomic Norm Denoising With Applications to Line Spectral Estimation"

  • ...In a series of papers [2, 30] the authors study the prediction (i....

  • ...To the best of our knowledge, the work of [2] is the only one to provide some conclusion about this convergence in terms of denoising error....

  • ...Following recent proposals [12, 4, 8, 2], we consider here this sparse deconvolution over a continuous domain, i....

References
Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
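In this notation, the lasso in Lagrangian form solves min_β ½‖y − Xβ‖₂² + λ‖β‖₁. A minimal sketch using scikit-learn; the synthetic data and the choice alpha=0.1 are illustrative, not from the paper:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k = 100, 50, 5                      # samples, features, true nonzeros
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:k] = rng.standard_normal(k)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# alpha is the regularization weight; larger alpha gives a sparser estimate,
# with some coefficients exactly 0 (the interpretability property above).
model = Lasso(alpha=0.1).fit(X, y)
print("nonzero coefficients:", np.count_nonzero(model.coef_))
```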

40,785 citations


"Atomic Norm Denoising With Applicat..." refers methods or result in this paper

  • ...1) Example: Sparse Model Selection: We can specialize our stability guarantee to Lasso [16] and recover known results....

  • ...Therefore, the proposed optimization problem (1) coincides with the Lasso estimator [16]....

  • ...Our approach is essentially a generalization of the Lasso [16], [17] to infinite dictionaries....

Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
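As a concrete instance, here is a minimal numpy sketch of ADMM applied to the lasso (one of the applications listed in the abstract), using the standard x/z splitting with soft-thresholding as the z-update; lam, rho, and the iteration count are illustrative choices:

```python
import numpy as np

def soft_threshold(v, kappa):
    # Proximal operator of kappa * ||.||_1: element-wise shrinkage toward 0.
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=200):
    # Solves min 0.5*||Ax - b||^2 + lam*||x||_1 via the splitting x = z.
    n = A.shape[1]
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))  # cached factorization
    Atb = A.T @ b
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)  # u: scaled dual variable
    for _ in range(n_iter):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = soft_threshold(x + u, lam / rho)
        u = u + x - z
    return z

# Illustrative run on a random sparse regression instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((80, 200))
x_true = np.zeros(200); x_true[:4] = [1.5, -2.0, 1.0, 0.8]
b = A @ x_true + 0.05 * rng.standard_normal(80)
print(np.nonzero(np.abs(admm_lasso(A, b, lam=1.0)) > 1e-3)[0])
```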

17,433 citations


"Atomic Norm Denoising With Applicat..." refers methods in this paper

  • ...To put our problem in an appropriate form for ADMM, rewrite (3.4) as

        \begin{aligned}
        \underset{t,\,u,\,x,\,Z}{\text{minimize}}\quad & \tfrac{1}{2}\,\|x - y\|_2^2 + \tfrac{\tau}{2}\,(t + u_1) \\
        \text{subject to}\quad & Z = \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix}, \qquad Z \succeq 0,
        \end{aligned}

    and dualize the equality constraint via an augmented Lagrangian:

        L_\rho(t, u, x, Z, \Lambda) = \tfrac{1}{2}\,\|x - y\|_2^2 + \tfrac{\tau}{2}\,(t + u_1)
        + \Big\langle \Lambda,\; Z - \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \Big\rangle
        + \tfrac{\rho}{2}\, \Big\| Z - \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \Big\|_F^2.

    ADMM then consists of the update steps:

        \begin{aligned}
        (t^{l+1}, u^{l+1}, x^{l+1}) &\leftarrow \arg\min_{t,u,x}\; L_\rho(t, u, x, Z^l, \Lambda^l) \\
        Z^{l+1} &\leftarrow \arg\min_{Z \succeq 0}\; L_\rho(t^{l+1}, u^{l+1}, x^{l+1}, Z, \Lambda^l) \\
        \Lambda^{l+1} &\leftarrow \Lambda^l + \rho\,\Big( Z^{l+1} - \begin{bmatrix} T(u^{l+1}) & x^{l+1} \\ (x^{l+1})^* & t^{l+1} \end{bmatrix} \Big).
        \end{aligned}

    ....

  • ...We used the stopping criteria described in [20] and set for all experiments....

  • ...A thorough survey of the ADMM algorithm is given in [20]....

  • ...Note that the dual solution ẑ can be obtained as ẑ = y − x̂ from the primal solution x̂ obtained from ADMM by using Lemma 2....

  • ...For the interested reader, we provide a reasonably efficient algorithm based upon the Alternating Direction Method of Multipliers (ADMM) [20] in Appendix...

Journal ArticleDOI
TL;DR: In this article, the authors describe the multiple signal classification (MUSIC) algorithm, which provides asymptotically unbiased estimates of 1) the number of incident wavefronts present; 2) the directions of arrival (DOA) (or emitter locations); 3) the strengths and cross-correlations among the incident waveforms; and 4) the noise/interference strength.
Abstract: Processing the signals received on an array of sensors for the location of the emitter is of great enough interest to have been treated under many special case assumptions. The general problem considers sensors with arbitrary locations and arbitrary directional characteristics (gain/phase/polarization) in a noise/interference environment of arbitrary covariance matrix. This report is concerned first with the multiple emitter aspect of this problem and second with the generality of solution. A description is given of the multiple signal classification (MUSIC) algorithm, which provides asymptotically unbiased estimates of 1) number of incident wavefronts present; 2) directions of arrival (DOA) (or emitter locations); 3) strengths and cross correlations among the incident waveforms; 4) noise/interference strength. Examples and comparisons with methods based on maximum likelihood (ML) and maximum entropy (ME), as well as conventional beamforming are included. An example of its use as a multiple frequency estimator operating on time series is included.
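For the time-series use mentioned at the end of the abstract, here is a minimal numpy sketch of the MUSIC pseudospectrum for frequency estimation; the subvector length n_sub, the grid size, and the assumption that the number of sources is known are all illustrative simplifications:

```python
import numpy as np

def music_spectrum(x, n_sources, n_sub=20, n_grid=512):
    # Covariance estimate from overlapping length-n_sub windows of x.
    N = len(x)
    snaps = np.stack([x[i:i + n_sub] for i in range(N - n_sub + 1)])
    R = snaps.conj().T @ snaps / snaps.shape[0]
    # Noise subspace: eigenvectors of the n_sub - n_sources smallest eigenvalues.
    _, eigvecs = np.linalg.eigh(R)
    En = eigvecs[:, : n_sub - n_sources]
    freqs = np.linspace(0.0, 1.0, n_grid, endpoint=False)
    a = np.exp(2j * np.pi * np.outer(np.arange(n_sub), freqs))  # steering vectors
    # Pseudospectrum peaks where steering vectors are orthogonal to the noise subspace.
    return freqs, 1.0 / np.linalg.norm(En.conj().T @ a, axis=0) ** 2

# Two complex sinusoids in noise.
rng = np.random.default_rng(0)
t = np.arange(128)
x = (np.exp(2j * np.pi * 0.20 * t) + 0.5 * np.exp(2j * np.pi * 0.35 * t)
     + 0.1 * (rng.standard_normal(128) + 1j * rng.standard_normal(128)))
freqs, p = music_spectrum(x, n_sources=2)
# Crude peak picking (a real implementation would locate local maxima).
print(freqs[np.argsort(p)[-2:]])
```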

12,446 citations

Journal ArticleDOI
TL;DR: Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest ℓ1 norm of coefficients among all such decompositions.
Abstract: The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries: stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB). Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest ℓ1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation denoising, and multiscale edge denoising. BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.
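As a sketch of the BP principle itself (not of the paper's wavelet-packet experiments), the following solves the equality-constrained ℓ1 problem min ‖c‖₁ subject to Dc = y with CVXPY over a random overcomplete dictionary; all sizes are illustrative:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, p = 64, 256                         # signal length, dictionary size
D = rng.standard_normal((m, p))        # stand-in for an overcomplete dictionary
c_true = np.zeros(p)
c_true[rng.choice(p, 5, replace=False)] = rng.standard_normal(5)
y = D @ c_true

c = cp.Variable(p)
prob = cp.Problem(cp.Minimize(cp.norm1(c)), [D @ c == y])
prob.solve()  # internally a linear program, as the abstract notes
print("recovered nonzeros:", int(np.sum(np.abs(c.value) > 1e-6)))
```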

9,950 citations


"Atomic Norm Denoising With Applicat..." refers methods in this paper

  • ...This method is also known as Basis Pursuit Denoising [17]....

  • ...Our approach is essentially a generalization of the Lasso [16], [17] to infinite dictionaries....

Journal ArticleDOI
TL;DR: The authors prove two results about this type of estimator that are unprecedented in several ways: with high probability f̂*_n is at least as smooth as f, in any of a wide variety of smoothness measures.
Abstract: Donoho and Johnstone (1994) proposed a method for reconstructing an unknown function f on [0, 1] from noisy data d_i = f(t_i) + σ z_i, i = 0, ..., n−1, t_i = i/n, where the z_i are independent and identically distributed standard Gaussian random variables. The reconstruction f̂*_n is defined in the wavelet domain by translating all the empirical wavelet coefficients of d toward 0 by an amount σ·√(2 log(n)/n). The authors prove two results about this type of estimator. [Smooth]: with high probability f̂*_n is at least as smooth as f, in any of a wide variety of smoothness measures. [Adapt]: the estimator comes nearly as close in mean square to f as any measurable estimator can come, uniformly over balls in each of two broad scales of smoothness classes. These two properties are unprecedented in several ways. The present proof of these results develops new facts about abstract statistical inference and its connection with an optimal recovery model.
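The shrinkage rule in this abstract is plain soft-thresholding of wavelet coefficients at a universal threshold. A minimal sketch using PyWavelets; the wavelet 'db4', the test signal, and the noise level are illustrative, and the threshold σ√(2 log n) is the coefficient-domain form corresponding to the abstract's σ·√(2 log(n)/n) normalization:

```python
import numpy as np
import pywt

def wavelet_denoise(d, sigma, wavelet="db4"):
    # Universal threshold sigma * sqrt(2 log n), applied to detail coefficients.
    n = len(d)
    thresh = sigma * np.sqrt(2.0 * np.log(n))
    coeffs = pywt.wavedec(d, wavelet)
    # Keep the coarse approximation, shrink detail coefficients toward 0.
    shrunk = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(shrunk, wavelet)[:n]

# Noisy samples of a smooth function on [0, 1].
rng = np.random.default_rng(0)
n = 1024
t = np.arange(n) / n
f = np.sin(4 * np.pi * t)
sigma = 0.2
d = f + sigma * rng.standard_normal(n)
print("MSE:", np.mean((wavelet_denoise(d, sigma) - f) ** 2))
```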

9,359 citations


"Atomic Norm Denoising With Applicat..." refers background in this paper

  • ...Indeed, when A is the set of 1-sparse atoms, the atomic norm is the ℓ1-norm, and the proximal operator corresponds to soft-thresholding by element-wise shrinking towards zero [29]....

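The soft-thresholding map mentioned in the excerpt above is the proximal operator of the scaled ℓ1-norm; explicitly, for a threshold τ > 0 and each coordinate i:

```latex
\operatorname{prox}_{\tau\|\cdot\|_1}(y)_i
= \operatorname{sign}(y_i)\,\max\{\,|y_i| - \tau,\; 0\,\}.
```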