Journal ArticleDOI

Atomic Norm Denoising With Applications to Line Spectral Estimation

TL;DR: It is demonstrated that the SDP outperforms the ℓ1 optimization, which in turn outperforms the MUSIC, Cadzow's, and matrix pencil approaches, in terms of MSE over a wide range of signal-to-noise ratios.
Abstract: Motivated by recent work on atomic norms in inverse problems, we propose a new approach to line spectral estimation that provides theoretical guarantees for the mean-squared-error (MSE) performance in the presence of noise and without knowledge of the model order. We propose an abstract theory of denoising with atomic norms and specialize this theory to provide a convex optimization problem for estimating the frequencies and phases of a mixture of complex exponentials. We show that the associated convex optimization problem can be solved in polynomial time via semidefinite programming (SDP). We also show that the SDP can be approximated by an ℓ1-regularized least-squares problem that achieves nearly the same error rate as the SDP but can scale to much larger problems. We compare both the SDP and ℓ1-based approaches with classical line spectral analysis methods and demonstrate that the SDP outperforms the ℓ1 optimization, which in turn outperforms the MUSIC, Cadzow's, and matrix pencil approaches, in terms of MSE over a wide range of signal-to-noise ratios.
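For concreteness, the SDP in question (quoted as problem (3.4) in the excerpts further below) can be stated directly; here y is the noisy observation, τ the regularization parameter, T(u) the Hermitian Toeplitz matrix with first column u, and the optimal x is the denoised signal:

```latex
\underset{t,\,u,\,x}{\text{minimize}} \quad
\tfrac{1}{2}\,\|x - y\|_2^2 + \tfrac{\tau}{2}\,(t + u_1)
\qquad \text{subject to} \qquad
\begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \succeq 0.
```

The paper then recovers the frequencies from the dual solution ẑ (cf. the excerpt below noting that ẑ can be obtained as y − x̂ from the primal solution x̂).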
Citations
01 Mar 1995
TL;DR: This thesis applies neural network feature selection techniques to multivariate time series data to improve prediction of a target time series; the results indicate that the Stochastics and RSI indicators yield better predictions than the moving averages.
Abstract: This thesis applies neural network feature selection techniques to multivariate time series data to improve prediction of a target time series. Two approaches to feature selection are used. First, a subset enumeration method is used to determine which financial indicators are most useful for aiding in prediction of the S&P 500 futures daily price. The candidate indicators evaluated include RSI, Stochastics, and several moving averages. Results indicate that the Stochastics and RSI indicators yield better predictions than the moving averages. The second approach to feature selection is the calculation of individual saliency metrics. A new decision-boundary-based individual saliency metric and a classifier-independent saliency metric are developed and tested. Ruck's saliency metric, the decision-boundary-based saliency metric, and the classifier-independent saliency metric are compared on a data set consisting of the RSI and Stochastics indicators as well as delayed closing price values. The decision-boundary-based metric and Ruck's metric give similar results, but the classifier-independent metric agrees with neither of the other metrics. The nine most salient features, as determined by the decision-boundary-based metric, are used to train a neural network, and the results are presented and compared to other published results.

1,545 citations

Journal ArticleDOI
TL;DR: In this article, the authors developed a mathematical theory of super-resolution, the problem of recovering the fine details of an object (the high end of its spectrum) from coarse-scale information only (samples at the low end of the spectrum).
Abstract: This paper develops a mathematical theory of super-resolution. Broadly speaking, super-resolution is the problem of recovering the fine details of an object (the high end of its spectrum) from coarse-scale information only (samples at the low end of the spectrum). Suppose we have many point sources at unknown locations in [0, 1] and with unknown complex-valued amplitudes. We only observe Fourier samples of this object up until a frequency cut-off f_c. We show that one can super-resolve these point sources with infinite precision, i.e. recover the exact locations and amplitudes, by solving a simple convex optimization problem, which can essentially be reformulated as a semidefinite program. This holds provided that the distance between sources is at least 2/f_c. This result extends to higher dimensions and other models. In one dimension, for instance, it is possible to recover a piecewise smooth function by resolving the discontinuity points with infinite precision as well. We also show that the theory and methods are robust to noise. In particular, in the discrete setting we develop some theoretical results explaining how the accuracy of the super-resolved signal is expected to degrade when both the noise level and the super-resolution factor vary.
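Concretely, the convex program referred to here is total-variation-norm minimization over measures on [0, 1]; a schematic rendering, with y_k denoting the observed Fourier samples:

```latex
\min_{\tilde{x}} \; \|\tilde{x}\|_{\mathrm{TV}}
\quad \text{subject to} \quad
y_k = \int_0^1 e^{-i 2\pi k t}\, \tilde{x}(\mathrm{d}t),
\qquad |k| \le f_c,
```

which, as the abstract notes, can essentially be reformulated as a semidefinite program, with exact recovery guaranteed once the sources are separated by at least 2/f_c.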

1,157 citations

Posted Content
TL;DR: In this article, the frequency components of a mixture of s complex sinusoids from a random subset of n regularly spaced samples are estimated using an atomic norm minimization approach to exactly recover the unobserved samples.
Abstract: We consider the problem of estimating the frequency components of a mixture of s complex sinusoids from a random subset of n regularly spaced samples. Unlike previous work in compressed sensing, the frequencies are not assumed to lie on a grid, but can assume any values in the normalized frequency domain [0,1]. We propose an atomic norm minimization approach to exactly recover the unobserved samples. We reformulate this atomic norm minimization as an exact semidefinite program. Even with this continuous dictionary, we show that most sampling sets of size O(s log s log n) are sufficient to guarantee the exact frequency estimation with high probability, provided the frequencies are well separated. Numerical experiments are performed to illustrate the effectiveness of the proposed method.
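The exact semidefinite reformulation rests on an SDP characterization of the atomic norm over the continuous dictionary of complex sinusoids used in this line of work; for x ∈ C^n, with T(u) the Hermitian Toeplitz matrix with first column u, it can be written as:

```latex
\|x\|_{\mathcal{A}}
= \inf_{t,\,u}\left\{ \frac{1}{2n}\,\operatorname{Tr} T(u) + \frac{t}{2}
\;:\;
\begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \succeq 0 \right\}.
```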

704 citations

Book
26 May 2015
TL;DR: This book provides a comprehensive guide to the theory and practice of sampling from an engineering perspective and is also an invaluable reference or self-study guide for engineers and students across industry and academia.
Abstract: Covering the fundamental mathematical underpinnings together with key principles and applications, this book provides a comprehensive guide to the theory and practice of sampling from an engineering perspective. Beginning with traditional ideas such as uniform sampling in shift-invariant spaces and working through to the more recent fields of compressed sensing and sub-Nyquist sampling, the key concepts are addressed in a unified and coherent way. Emphasis is given to applications in signal processing and communications, as well as hardware considerations, throughout. With 200 worked examples and over 200 end-of-chapter problems, this is an ideal course textbook for senior undergraduate and graduate students. It is also an invaluable reference or self-study guide for engineers and students across industry and academia.

371 citations

Posted Content
Vincent Duval, Gabriel Peyré
TL;DR: This paper shows that when the signal-to-noise level is large enough, and provided the aforementioned dual certificate is non-degenerate, the solution of the discretized problem is supported on pairs of Diracs which are neighbors of the Diracs of the input measure, as the grid size tends to zero.
Abstract: This paper studies sparse spikes deconvolution over the space of measures. We focus our attention on the recovery properties of the support of the measure, i.e. the location of the Dirac masses. For non-degenerate sums of Diracs, we show that, when the signal-to-noise ratio is large enough, total variation regularization (which is the natural extension of the L1 norm of vectors to the setting of measures) recovers the exact same number of Diracs. We also show that both the locations and the heights of these Diracs converge toward those of the input measure when the noise drops to zero. The exact speed of convergence is governed by a specific dual certificate, which can be computed by solving a linear system. We draw connections between the support of the recovered measure on a continuous domain and on a discretized grid. We show that when the signal-to-noise level is large enough, the solution of the discretized problem is supported on pairs of Diracs which are neighbors of the Diracs of the input measure. This gives a precise description of the convergence of the solution of the discretized problem toward the solution of the continuous grid-free problem, as the grid size tends to zero.
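Schematically, the continuous-domain problem studied here (often called the BLASSO) is total-variation-regularized least squares over the space M(X) of measures on the domain X, with Φ the measurement operator and λ > 0 the regularization weight:

```latex
\min_{m \in \mathcal{M}(X)} \; \frac{1}{2}\,\|\Phi m - y\|^2 + \lambda\, |m|(X),
```

where |m|(X) is the total-variation norm of the measure m, the extension of the L1 norm mentioned in the abstract.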

243 citations


Cites background from "Atomic Norm Denoising With Applications to Line Spectral Estimation"

  • ...In a series of papers [2, 30] the authors study the prediction (i....

  • ...To the best of our knowledge, the work of [2] is the only one to provide some conclusion about this convergence in terms of denoising error....

  • ...Following recent proposals [12, 4, 8, 2], we consider here this sparse deconvolution over a continuous domain, i....

References
Journal ArticleDOI
TL;DR: A new method for estimation in linear models called the lasso, which minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant, is proposed.
Abstract: We propose a new method for estimation in linear models. The 'lasso' minimizes the residual sum of squares subject to the sum of the absolute value of the coefficients being less than a constant. Because of the nature of this constraint it tends to produce some coefficients that are exactly 0 and hence gives interpretable models. Our simulation studies suggest that the lasso enjoys some of the favourable properties of both subset selection and ridge regression. It produces interpretable models like subset selection and exhibits the stability of ridge regression. There is also an interesting relationship with recent work in adaptive function estimation by Donoho and Johnstone. The lasso idea is quite general and can be applied in a variety of statistical models: extensions to generalized regression models and tree-based models are briefly described.
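In this notation, the lasso in Lagrangian form solves min_β ½‖y − Xβ‖₂² + λ‖β‖₁. A minimal sketch using scikit-learn; the synthetic data and the choice alpha=0.1 are illustrative, not from the paper:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k = 100, 50, 5                      # samples, features, true nonzeros
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:k] = rng.standard_normal(k)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# alpha is the regularization weight; larger alpha gives a sparser estimate,
# with some coefficients exactly 0 (the interpretability property above).
model = Lasso(alpha=0.1).fit(X, y)
print("nonzero coefficients:", np.count_nonzero(model.coef_))
```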

40,785 citations


"Atomic Norm Denoising With Applicat..." refers methods or result in this paper

  • ...1) Example: Sparse Model Selection: We can specialize our stability guarantee to Lasso [16] and recover known results....

  • ...Therefore, the proposed optimization problem (1) coincides with the Lasso estimator [16]....

  • ...Our approach is essentially a generalization of the Lasso [16], [17] to infinite dictionaries....

Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
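As a concrete instance, here is a minimal numpy sketch of ADMM applied to the lasso (one of the applications listed in the abstract), using the standard x/z splitting with soft-thresholding as the z-update; lam, rho, and the iteration count are illustrative choices:

```python
import numpy as np

def soft_threshold(v, kappa):
    # Proximal operator of kappa * ||.||_1: element-wise shrinkage toward 0.
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=200):
    # Solves min 0.5*||Ax - b||^2 + lam*||x||_1 via the splitting x = z.
    n = A.shape[1]
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(n))  # cached factorization
    Atb = A.T @ b
    x = np.zeros(n); z = np.zeros(n); u = np.zeros(n)  # u: scaled dual variable
    for _ in range(n_iter):
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = soft_threshold(x + u, lam / rho)
        u = u + x - z
    return z

# Illustrative run on a random sparse regression instance.
rng = np.random.default_rng(0)
A = rng.standard_normal((80, 200))
x_true = np.zeros(200); x_true[:4] = [1.5, -2.0, 1.0, 0.8]
b = A @ x_true + 0.05 * rng.standard_normal(80)
print(np.nonzero(np.abs(admm_lasso(A, b, lam=1.0)) > 1e-3)[0])
```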

17,433 citations


"Atomic Norm Denoising With Applicat..." refers methods in this paper

  • ...To put our problem in an appropriate form for ADMM, rewrite (3.4) as

        \begin{aligned}
        \underset{t,\,u,\,x,\,Z}{\text{minimize}}\quad & \tfrac{1}{2}\,\|x - y\|_2^2 + \tfrac{\tau}{2}\,(t + u_1) \\
        \text{subject to}\quad & Z = \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix}, \qquad Z \succeq 0,
        \end{aligned}

    and dualize the equality constraint via an augmented Lagrangian:

        L_\rho(t, u, x, Z, \Lambda) = \tfrac{1}{2}\,\|x - y\|_2^2 + \tfrac{\tau}{2}\,(t + u_1)
        + \Big\langle \Lambda,\; Z - \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \Big\rangle
        + \tfrac{\rho}{2}\, \Big\| Z - \begin{bmatrix} T(u) & x \\ x^* & t \end{bmatrix} \Big\|_F^2.

    ADMM then consists of the update steps:

        \begin{aligned}
        (t^{l+1}, u^{l+1}, x^{l+1}) &\leftarrow \arg\min_{t,u,x}\; L_\rho(t, u, x, Z^l, \Lambda^l) \\
        Z^{l+1} &\leftarrow \arg\min_{Z \succeq 0}\; L_\rho(t^{l+1}, u^{l+1}, x^{l+1}, Z, \Lambda^l) \\
        \Lambda^{l+1} &\leftarrow \Lambda^l + \rho\,\Big( Z^{l+1} - \begin{bmatrix} T(u^{l+1}) & x^{l+1} \\ (x^{l+1})^* & t^{l+1} \end{bmatrix} \Big).
        \end{aligned}

    ....

  • ...We used the stopping criteria described in [20] and set for all experiments....

  • ...A thorough survey of the ADMM algorithm is given in [20]....

  • ...Note that the dual solution ẑ can be obtained as ẑ = y − x̂ from the primal solution x̂ obtained from ADMM by using Lemma 2....

  • ...For the interested reader, we provide a reasonably efficient algorithm based upon the Alternating Direction Method of Multipliers (ADMM) [20] in Appendix...

Journal ArticleDOI
TL;DR: In this article, the authors describe the multiple signal classification (MUSIC) algorithm, which provides asymptotically unbiased estimates of 1) the number of incident wavefronts present; 2) the directions of arrival (DOA) (or emitter locations); 3) the strengths and cross-correlations among the incident waveforms; and 4) the noise/interference strength.
Abstract: Processing the signals received on an array of sensors for the location of the emitter is of great enough interest to have been treated under many special case assumptions. The general problem considers sensors with arbitrary locations and arbitrary directional characteristics (gain/phase/polarization) in a noise/interference environment of arbitrary covariance matrix. This report is concerned first with the multiple emitter aspect of this problem and second with the generality of solution. A description is given of the multiple signal classification (MUSIC) algorithm, which provides asymptotically unbiased estimates of 1) number of incident wavefronts present; 2) directions of arrival (DOA) (or emitter locations); 3) strengths and cross correlations among the incident waveforms; 4) noise/interference strength. Examples and comparisons with methods based on maximum likelihood (ML) and maximum entropy (ME), as well as conventional beamforming are included. An example of its use as a multiple frequency estimator operating on time series is included.
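For the time-series use mentioned at the end of the abstract, here is a minimal numpy sketch of the MUSIC pseudospectrum for frequency estimation; the subvector length n_sub, the grid size, and the assumption that the number of sources is known are all illustrative simplifications:

```python
import numpy as np

def music_spectrum(x, n_sources, n_sub=20, n_grid=512):
    # Covariance estimate from overlapping length-n_sub windows of x.
    N = len(x)
    snaps = np.stack([x[i:i + n_sub] for i in range(N - n_sub + 1)])
    R = snaps.conj().T @ snaps / snaps.shape[0]
    # Noise subspace: eigenvectors of the n_sub - n_sources smallest eigenvalues.
    _, eigvecs = np.linalg.eigh(R)
    En = eigvecs[:, : n_sub - n_sources]
    freqs = np.linspace(0.0, 1.0, n_grid, endpoint=False)
    a = np.exp(2j * np.pi * np.outer(np.arange(n_sub), freqs))  # steering vectors
    # Pseudospectrum peaks where steering vectors are orthogonal to the noise subspace.
    return freqs, 1.0 / np.linalg.norm(En.conj().T @ a, axis=0) ** 2

# Two complex sinusoids in noise.
rng = np.random.default_rng(0)
t = np.arange(128)
x = (np.exp(2j * np.pi * 0.20 * t) + 0.5 * np.exp(2j * np.pi * 0.35 * t)
     + 0.1 * (rng.standard_normal(128) + 1j * rng.standard_normal(128)))
freqs, p = music_spectrum(x, n_sources=2)
# Crude peak picking (a real implementation would locate local maxima).
print(freqs[np.argsort(p)[-2:]])
```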

12,446 citations

Journal ArticleDOI
TL;DR: Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest ℓ1 norm of coefficients among all such decompositions.
Abstract: The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries: stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB). Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest ℓ1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation denoising, and multiscale edge denoising. BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.
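As a sketch of the BP principle itself (not of the paper's wavelet-packet experiments), the following solves the equality-constrained ℓ1 problem min ‖c‖₁ subject to Dc = y with CVXPY over a random overcomplete dictionary; all sizes are illustrative:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, p = 64, 256                         # signal length, dictionary size
D = rng.standard_normal((m, p))        # stand-in for an overcomplete dictionary
c_true = np.zeros(p)
c_true[rng.choice(p, 5, replace=False)] = rng.standard_normal(5)
y = D @ c_true

c = cp.Variable(p)
prob = cp.Problem(cp.Minimize(cp.norm1(c)), [D @ c == y])
prob.solve()  # internally a linear program, as the abstract notes
print("recovered nonzeros:", int(np.sum(np.abs(c.value) > 1e-6)))
```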

9,950 citations


"Atomic Norm Denoising With Applicat..." refers methods in this paper

  • ...This method is also known as Basis Pursuit Denoising [17]....

  • ...Our approach is essentially a generalization of the Lasso [16], [17] to infinite dictionaries....

Journal ArticleDOI
TL;DR: The authors prove two results about this type of estimator that are unprecedented in several ways: with high probability f̂*_n is at least as smooth as f, in any of a wide variety of smoothness measures.
Abstract: Donoho and Johnstone (1994) proposed a method for reconstructing an unknown function f on [0, 1] from noisy data d_i = f(t_i) + σ z_i, i = 0, ..., n−1, t_i = i/n, where the z_i are independent and identically distributed standard Gaussian random variables. The reconstruction f̂*_n is defined in the wavelet domain by translating all the empirical wavelet coefficients of d toward 0 by an amount σ·√(2 log(n)/n). The authors prove two results about this type of estimator. [Smooth]: with high probability f̂*_n is at least as smooth as f, in any of a wide variety of smoothness measures. [Adapt]: the estimator comes nearly as close in mean square to f as any measurable estimator can come, uniformly over balls in each of two broad scales of smoothness classes. These two properties are unprecedented in several ways. The present proof of these results develops new facts about abstract statistical inference and its connection with an optimal recovery model.
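The shrinkage rule in this abstract is plain soft-thresholding of wavelet coefficients at a universal threshold. A minimal sketch using PyWavelets; the wavelet 'db4', the test signal, and the noise level are illustrative, and the threshold σ√(2 log n) is the coefficient-domain form corresponding to the abstract's σ·√(2 log(n)/n) normalization:

```python
import numpy as np
import pywt

def wavelet_denoise(d, sigma, wavelet="db4"):
    # Universal threshold sigma * sqrt(2 log n), applied to detail coefficients.
    n = len(d)
    thresh = sigma * np.sqrt(2.0 * np.log(n))
    coeffs = pywt.wavedec(d, wavelet)
    # Keep the coarse approximation, shrink detail coefficients toward 0.
    shrunk = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(shrunk, wavelet)[:n]

# Noisy samples of a smooth function on [0, 1].
rng = np.random.default_rng(0)
n = 1024
t = np.arange(n) / n
f = np.sin(4 * np.pi * t)
sigma = 0.2
d = f + sigma * rng.standard_normal(n)
print("MSE:", np.mean((wavelet_denoise(d, sigma) - f) ** 2))
```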

9,359 citations


"Atomic Norm Denoising With Applicat..." refers background in this paper

  • ...Indeed, when A is the set of 1-sparse atoms, the atomic norm is the ℓ1-norm, and the proximal operator corresponds to soft-thresholding by element-wise shrinking towards zero [29]....

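The soft-thresholding map mentioned in the excerpt above is the proximal operator of the scaled ℓ1-norm; explicitly, for a threshold τ > 0 and each coordinate i:

```latex
\operatorname{prox}_{\tau\|\cdot\|_1}(y)_i
= \operatorname{sign}(y_i)\,\max\{\,|y_i| - \tau,\; 0\,\}.
```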