
Signal Recovery from Random Measurements Via Orthogonal Matching Pursuit: The Gaussian Case

TL;DR: In this paper, a greedy algorithm called Orthogonal Matching Pursuit (OMP) was proposed to recover a signal with m nonzero entries in dimension d given O(m ln d) random linear measurements of that signal.
Abstract: This report demonstrates theoretically and empirically that a greedy algorithm called Orthogonal Matching Pursuit (OMP) can reliably recover a signal with m nonzero entries in dimension d given O(m ln d) random linear measurements of that signal. This is a massive improvement over previous results, which require O(m²) measurements. The new results for OMP are comparable with recent results for another approach called Basis Pursuit (BP). In some settings, the OMP algorithm is faster and easier to implement, so it is an attractive alternative to BP for signal recovery problems.
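
As a concrete illustration of the algorithm the abstract describes, here is a minimal NumPy sketch of OMP: greedily pick the column most correlated with the residual, then re-fit by least squares (the "orthogonal" step). The variable names and the Gaussian measurement setup below are illustrative choices, not the paper's code.

```python
import numpy as np

def omp(Phi, y, m):
    """Recover an m-sparse x from y = Phi @ x by Orthogonal Matching Pursuit."""
    d = Phi.shape[1]
    residual = y.copy()
    support = []
    for _ in range(m):
        # Greedy selection: column of Phi most correlated with the residual.
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        support.append(j)
        # Orthogonal step: least-squares re-fit over all selected columns.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(d)
    x_hat[support] = coef
    return x_hat

# m = 4 nonzeros in dimension d = 256, n on the order of m ln d measurements.
rng = np.random.default_rng(0)
d, m = 256, 4
n = int(4 * m * np.log(d))
Phi = rng.standard_normal((n, d)) / np.sqrt(n)   # Gaussian measurement matrix
x = np.zeros(d)
x[rng.choice(d, size=m, replace=False)] = rng.standard_normal(m)
print(np.max(np.abs(omp(Phi, Phi @ x, m) - x)))  # ~1e-15 with high probability
```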


Citations
Journal ArticleDOI
TL;DR: A label consistent K-SVD (LC-KSVD) algorithm is proposed to learn a discriminative dictionary for sparse coding; it introduces a new label consistency constraint called "discriminative sparse-code error" to enforce discriminability in sparse codes during the dictionary learning process.
Abstract: A label consistent K-SVD (LC-KSVD) algorithm to learn a discriminative dictionary for sparse coding is presented. In addition to using class labels of training data, we also associate label information with each dictionary item (columns of the dictionary matrix) to enforce discriminability in sparse codes during the dictionary learning process. More specifically, we introduce a new label consistency constraint called "discriminative sparse-code error" and combine it with the reconstruction error and the classification error to form a unified objective function. The optimal solution is efficiently obtained using the K-SVD algorithm. Our algorithm learns a single overcomplete dictionary and an optimal linear classifier jointly; it yields dictionaries such that feature points with the same class labels have similar sparse codes. An incremental dictionary learning algorithm is also presented for situations with limited memory resources. Experimental results demonstrate that our algorithm outperforms many recently proposed sparse-coding techniques for face, action, scene, and object category recognition under the same learning conditions.

1,232 citations
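
For orientation, the unified objective described above can be written as follows. This is my reconstruction from the abstract (Y: training signals, D: dictionary, X: sparse codes, Q: the "discriminative" sparse-code targets, H: class labels, W: the linear classifier, A: a linear transform of the codes), so the weights α, β and symbol names should be checked against the paper:

```latex
\min_{D, W, A, X}\;
  \underbrace{\|Y - DX\|_F^2}_{\text{reconstruction error}}
  + \alpha\,\underbrace{\|Q - AX\|_F^2}_{\text{discriminative sparse-code error}}
  + \beta\,\underbrace{\|H - WX\|_F^2}_{\text{classification error}}
  \quad \text{s.t.}\; \|x_i\|_0 \le T \;\;\forall i
```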

Journal ArticleDOI
TL;DR: A new type of data acquisition system, called a random demodulator, constructed from robust, readily available components, is described; a detailed theoretical analysis of the system's performance is provided that supports the empirical observations.
Abstract: Wideband analog signals push contemporary analog-to-digital conversion (ADC) systems to their performance limits. In many applications, however, sampling at the Nyquist rate is inefficient because the signals of interest contain only a small number of significant frequencies relative to the band limit, although the locations of the frequencies may not be known a priori. For this type of sparse signal, other sampling strategies are possible. This paper describes a new type of data acquisition system, called a random demodulator, that is constructed from robust, readily available components. Let K denote the total number of frequencies in the signal, and let W denote its band limit in hertz. Simulations suggest that the random demodulator requires just O(K log(W/K)) samples per second to stably reconstruct the signal. This sampling rate is exponentially lower than the Nyquist rate of W hertz. In contrast to Nyquist sampling, one must use nonlinear methods, such as convex programming, to recover the signal from the samples taken by the random demodulator. This paper provides a detailed theoretical analysis of the system's performance that supports the empirical observations.

1,138 citations
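
A toy discrete-time simulation of the acquisition chain the abstract describes (chip the signal with a random ±1 sequence, then low-pass and sample) might look as follows; the accumulate-and-dump stage is a crude stand-in for the paper's low-pass filter, and all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
W = 1024   # Nyquist-rate grid standing in for a band limit of W hertz
K = 5      # number of active tones (the signal is K-sparse in frequency)
R = 128    # sub-Nyquist sampling rate; W must be divisible by R

# Sparse multitone input on the Nyquist grid.
t = np.arange(W)
tones = rng.choice(W // 2, size=K, replace=False)
x = sum(np.cos(2 * np.pi * f * t / W) for f in tones)

# Random demodulator: chipping sequence, then integrate-and-dump to R samples.
chips = rng.choice([-1.0, 1.0], size=W)
y = (x * chips).reshape(R, W // R).sum(axis=1)
print(y.shape)  # (128,) -- far fewer samples than the W-point Nyquist grid
```

Recovery from y then proceeds through a nonlinear solver (convex programming in the paper) applied to the equivalent linear system relating y to the signal's frequency coefficients.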

Journal ArticleDOI
TL;DR: The typical paradigm for obtaining a compressed version of a discrete signal represented by a vector x ∈ ℝ^N is to choose an appropriate basis, compute the coefficients of x in this basis, and then retain only the k largest of these with k < N.
Abstract: The typical paradigm for obtaining a compressed version of a discrete signal represented by a vector x ∈ ℝ^N is to choose an appropriate basis, compute the coefficients of x in this basis, and then retain only the k largest of these with k < N. If we are interested in a bit stream representation, we also need in addition to quantize these k coefficients. Assuming, without loss of generality, that x already represents the coefficients of the signal in the appropriate basis, this means that we pick an approximation to x in the set Σ_k of k-sparse vectors,

(1.1)   Σ_k := {x ∈ ℝ^N : #supp(x) ≤ k},

where supp(x) := {i : x_i ≠ 0} is the support of x and #A is the number of elements in the set A. The best performance that we can achieve by such an approximation process in some given norm ‖·‖_X of interest is described by the best k-term approximation error σ_k(x)_X := inf_{z ∈ Σ_k} ‖x − z‖_X.

1,105 citations
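
In any ℓ_p norm, the best k-term approximation is achieved simply by keeping the k largest-magnitude entries, as in this small sketch:

```python
import numpy as np

def best_k_term(x, k):
    """Best k-term approximation: zero out all but the k largest-magnitude entries."""
    x_k = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]
    x_k[keep] = x[keep]
    return x_k

x = np.array([0.1, -3.0, 0.02, 1.5, -0.4])
x2 = best_k_term(x, 2)
print(x2)                            # [ 0.  -3.   0.   1.5  0. ]
print(np.linalg.norm(x - x2, 1))     # sigma_2(x) in the l1 norm
```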

Journal ArticleDOI
TL;DR: Experimental results show that the proposed sparsity-based algorithm for the classification of hyperspectral imagery outperforms the classical supervised classifier, support vector machines, in most cases.
Abstract: A new sparsity-based algorithm for the classification of hyperspectral imagery is proposed in this paper. The proposed algorithm relies on the observation that a hyperspectral pixel can be sparsely represented by a linear combination of a few training samples from a structured dictionary. The sparse representation of an unknown pixel is expressed as a sparse vector whose nonzero entries correspond to the weights of the selected training samples. The sparse vector is recovered by solving a sparsity-constrained optimization problem, and it can directly determine the class label of the test sample. Two different approaches are proposed to incorporate the contextual information into the sparse recovery optimization problem in order to improve the classification performance. In the first approach, an explicit smoothing constraint is imposed on the problem formulation by forcing the vector Laplacian of the reconstructed image to become zero; in this approach, the reconstructed pixel of interest has spectral characteristics similar to those of its four nearest neighbors. The second approach is via a joint sparsity model, where hyperspectral pixels in a small neighborhood around the test pixel are simultaneously represented by linear combinations of a few common training samples, which are weighted with a different set of coefficients for each pixel. The proposed sparsity-based algorithm is applied to several real hyperspectral images for classification. Experimental results show that our algorithm outperforms the classical supervised classifier, support vector machines, in most cases.

1,099 citations
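
The per-pixel decision rule described above (without the contextual extensions) reduces to sparse coding followed by a minimum-residual class assignment. A minimal sketch, reusing the omp function from the top of this page as the sparse solver; the names A, labels, and sparsity are mine, not the paper's:

```python
import numpy as np

def classify_pixel(A, labels, y, sparsity):
    """Sparse-representation classification: code the test pixel y over the
    training dictionary A, then assign the class whose atoms best reconstruct y."""
    x = omp(A, y, sparsity)              # sparse code over all training samples
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - A[:, labels == c] @ x[labels == c])
                 for c in classes]       # class-wise reconstruction error
    return classes[int(np.argmin(residuals))]
```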

Journal ArticleDOI
TL;DR: A theoretical framework is presented in which dynamic mode decomposition (DMD) is defined as the eigendecomposition of an approximating linear operator; this generalizes DMD to a larger class of datasets, including nonsequential time series, and shows that, under certain conditions, DMD is equivalent to LIM.
Abstract: Originally introduced in the fluid mechanics community, dynamic mode decomposition (DMD) has emerged as a powerful tool for analyzing the dynamics of nonlinear systems. However, existing DMD theory deals primarily with sequential time series for which the measurement dimension is much larger than the number of measurements taken. We present a theoretical framework in which we define DMD as the eigendecomposition of an approximating linear operator. This generalizes DMD to a larger class of datasets, including nonsequential time series. We demonstrate the utility of this approach by presenting two novel sampling strategies, one that increases computational efficiency and one that mitigates the effects of noise. We also introduce the concept of linear consistency, which helps explain the potential pitfalls of applying DMD to rank-deficient datasets, illustrating with examples. Such computations are not considered in the existing literature, but they can be understood using our more general framework. In addition, we show that our theory strengthens the connections between DMD and Koopman operator theory. It also establishes connections between DMD and other techniques, including the eigensystem realization algorithm (ERA), a system identification method, and linear inverse modeling (LIM), a method from climate science. We show that under certain conditions, DMD is equivalent to LIM.

1,067 citations
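
A minimal sketch of the SVD-based "exact DMD" computation the paper defines: given paired snapshot matrices X and Y with Y ≈ A X, project the best-fit operator A onto the leading r POD modes and eigendecompose. Variable names are mine:

```python
import numpy as np

def dmd(X, Y, r):
    """Exact DMD: eigendecomposition of the best-fit linear operator A ~ Y pinv(X),
    computed through a rank-r SVD of X. Returns DMD eigenvalues and modes."""
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]
    A_tilde = (U.conj().T @ Y @ Vh.conj().T) / s      # A projected onto POD modes
    eigvals, W = np.linalg.eig(A_tilde)
    modes = ((Y @ Vh.conj().T) / s) @ W / eigvals     # "exact" DMD modes
    return eigvals, modes
```

For sequential data the columns of X and Y are the snapshots x₀…x_{m−1} and x₁…x_m; the framework's generalization is that any collection of snapshot pairs works.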

References
Book
01 Jan 1983

34,729 citations

Book
D.L. Donoho
01 Jan 2004
TL;DR: It is possible to design n = O(N log(m)) nonadaptive measurements allowing reconstruction with accuracy comparable to that attainable with direct knowledge of the N most important coefficients; a good approximation to those N important coefficients is extracted from the n measurements by solving a linear program (Basis Pursuit, in signal processing terms).
Abstract: Suppose x is an unknown vector in ℝ^m (a digital image or signal); we plan to measure n general linear functionals of x and then reconstruct. If x is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements n can be dramatically smaller than the size m. Thus, certain natural classes of images with m pixels need only n = O(m^{1/4} log^{5/2}(m)) nonadaptive nonpixel samples for faithful recovery, as opposed to the usual m pixel samples. More specifically, suppose x has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor), so the coefficients belong to an ℓ_p ball for 0 < p ≤ 1.

18,609 citations
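
The linear program in question is the ℓ¹-minimization principle (Basis Pursuit, detailed in the next reference). Schematically, with y the vector of n measurements and Φ the measurement matrix (my notation, not the paper's):

```latex
\hat{x} \;=\; \arg\min_{\tilde{x}\,\in\,\mathbb{R}^m} \|\tilde{x}\|_1
\quad \text{subject to} \quad \Phi\,\tilde{x} = y .
```

Splitting x̃ = u − v with u, v ≥ 0 turns this into a standard linear program; a code sketch of that recast is given after the Basis Pursuit reference below.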

Journal ArticleDOI
TL;DR: Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions.
Abstract: The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries --- stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB). Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation denoising, and multiscale edge denoising. BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.

9,950 citations
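
A minimal sketch of that linear-programming recast using an off-the-shelf solver; the paper's experiments used a primal-dual logarithmic barrier interior-point method with a conjugate-gradient solver, whereas scipy.optimize.linprog here is only for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """min ||x||_1 subject to Phi @ x = y, as an LP via the split x = u - v."""
    n, d = Phi.shape
    c = np.ones(2 * d)                 # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([Phi, -Phi])      # equality constraint: Phi u - Phi v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    return res.x[:d] - res.x[d:]
```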

Journal ArticleDOI
TL;DR: The authors introduce an algorithm, called matching pursuit, that decomposes any signal into a linear expansion of waveforms that are selected from a redundant dictionary of functions, chosen in order to best match the signal structures.
Abstract: The authors introduce an algorithm, called matching pursuit, that decomposes any signal into a linear expansion of waveforms that are selected from a redundant dictionary of functions. These waveforms are chosen in order to best match the signal structures. Matching pursuits are general procedures to compute adaptive signal representations. With a dictionary of Gabor functions, a matching pursuit defines an adaptive time-frequency transform. They derive a signal energy distribution in the time-frequency plane, which does not include interference terms, unlike Wigner and Cohen class distributions. A matching pursuit isolates the signal structures that are coherent with respect to a given dictionary. An application to pattern extraction from noisy signals is described. They compare a matching pursuit decomposition with a signal expansion over an optimized wavepacket orthonormal basis, selected with the algorithm of Coifman and Wickerhauser (IEEE Trans. Inform. Theory, vol. 38, Mar. 1992).

9,380 citations
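
For contrast with OMP above, plain matching pursuit never re-fits the coefficients of previously selected atoms; it just subtracts one atom's contribution per iteration. A minimal sketch (columns of D assumed unit-norm):

```python
import numpy as np

def matching_pursuit(D, y, n_iter):
    """Greedy MP: repeatedly subtract the dictionary atom best matching the residual."""
    residual = y.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual                  # correlations with unit-norm atoms
        j = int(np.argmax(np.abs(corr)))
        coeffs[j] += corr[j]                   # same atom may be picked again later
        residual -= corr[j] * D[:, j]
    return coeffs, residual
```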

Journal ArticleDOI
TL;DR: A publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates is described.
Abstract: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.

7,828 citations
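
Property (1), the full Lasso path from a single LARS-style pass, is available off the shelf; for instance, scikit-learn's lars_path (a standard implementation, not the paper's own code) traces it:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 10))
beta = np.zeros(10)
beta[:3] = [2.0, -1.0, 0.5]                    # three true covariates
y = X @ beta + 0.01 * rng.standard_normal(100)

# One pass computes every Lasso solution along the regularization path.
alphas, active, coefs = lars_path(X, y, method="lasso")
print(active)        # order in which covariates enter the model
print(coefs[:, -1])  # end of the path: close to the least-squares fit of beta
```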