Journal ArticleDOI

The Noise-Sensitivity Phase Transition in Compressed Sensing

01 Oct 2011-IEEE Transactions on Information Theory (IEEE)-Vol. 57, Iss: 10, pp 6920-6941
TL;DR: In this paper, the authors derived exact expressions for the asymptotic MSE of x1,λ, and evaluated its worst-case noise sensitivity over all types of k-sparse signals.
Abstract: Consider the noisy underdetermined system of linear equations: y = Ax0 + z, with A an n × N measurement matrix, n < N, and z ~ N(0, σ²I) a Gaussian white noise. Both y and A are known, both x0 and z are unknown, and we seek an approximation to x0. When x0 has few nonzeros, useful approximations are often obtained by l1-penalized l2 minimization, in which the reconstruction x1,λ solves min{||y − Ax||₂²/2 + λ||x||₁}. Consider the reconstruction mean-squared error MSE = E||x1,λ − x0||₂²/N, and define the ratio MSE/σ² as the noise sensitivity. Consider matrices A with i.i.d. Gaussian entries and a large-system limit in which n, N → ∞ with n/N → δ and k/n → ρ. We develop exact expressions for the asymptotic MSE of x1,λ, and evaluate its worst-case noise sensitivity over all types of k-sparse signals. The phase space 0 ≤ δ, ρ ≤ 1 is partitioned by the curve ρ = ρMSE(δ) into two regions. Formal noise sensitivity is bounded throughout the region ρ < ρMSE(δ) and is unbounded throughout the region ρ > ρMSE(δ). The phase boundary ρ = ρMSE(δ) is identical to the previously known phase transition curve for equivalence of l1-l0 minimization in the k-sparse noiseless case. Hence, a single phase boundary describes the fundamental phase transitions both for the noiseless and noisy cases. Extensive computational experiments validate these predictions, including the existence of game-theoretical structures underlying the formalism (saddlepoints in the payoff, least-favorable signals, and maximin penalization). Underlying our formalism is an approximate message passing soft thresholding algorithm (AMP) introduced earlier by the authors. Other papers by the authors detail expressions for the formal MSE of AMP and its close connection to l1-penalized reconstruction. The focus of the present paper is on computing the minimax formal MSE within the class of sparse signals x0.
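
As a rough illustration of this setup (not code from the paper), the following numpy sketch draws an i.i.d. Gaussian A at one point (δ, ρ) of the phase plane, solves the l1-penalized problem with scikit-learn's Lasso, and reports the empirical noise sensitivity MSE/σ². The signal, penalty level, and problem sizes are arbitrary illustrative choices; the paper's analysis concerns the worst case over k-sparse signals and the maximin λ.

    import numpy as np
    from sklearn.linear_model import Lasso

    # Phase-plane coordinates: delta = n/N (undersampling), rho = k/n (sparsity).
    N, delta, rho = 2000, 0.5, 0.15
    n, k = int(delta * N), int(rho * delta * N)
    sigma, lam = 0.1, 0.2       # noise level and l1 penalty (untuned; the paper derives the maximin lambda)

    rng = np.random.default_rng(0)
    A = rng.normal(size=(n, N)) / np.sqrt(n)   # i.i.d. Gaussian measurement matrix
    x0 = np.zeros(N)
    x0[:k] = 1.0                               # one simple k-sparse signal; the paper studies the worst case
    y = A @ x0 + sigma * rng.normal(size=n)

    # scikit-learn's Lasso minimizes ||y - Ax||^2/(2n) + alpha*||x||_1,
    # so alpha = lam/n matches the objective ||y - Ax||^2/2 + lam*||x||_1.
    x1 = Lasso(alpha=lam / n, fit_intercept=False, max_iter=100000).fit(A, y).coef_
    print("empirical noise sensitivity MSE/sigma^2:", np.mean((x1 - x0) ** 2) / sigma**2)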


Citations
Book
01 Jan 2015
TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way; it is designed for an advanced undergraduate or beginning graduate course.
Abstract: Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics, and engineering.

3,857 citations

Journal ArticleDOI
TL;DR: This paper proves that state evolution indeed holds asymptotically in the large system limit for sensing matrices with independent and identically distributed Gaussian entries, thereby placing it on a rigorous foundation.
Abstract: “Approximate message passing” (AMP) algorithms have proved to be effective in reconstructing sparse signals from a small number of incoherent linear measurements. Extensive numerical experiments further showed that their dynamics is accurately tracked by a simple one-dimensional iteration termed state evolution. In this paper, we provide rigorous foundation to state evolution. We prove that indeed it holds asymptotically in the large system limit for sensing matrices with independent and identically distributed Gaussian entries. While our focus is on message passing algorithms for compressed sensing, the analysis extends beyond this setting, to a general class of algorithms on dense graphs. In this context, state evolution plays the role that density evolution has for sparse graphs. The proof technique is fundamentally different from the standard approach to density evolution, in that it copes with a large number of short cycles in the underlying factor graph. It relies instead on a conditioning technique recently developed by Erwin Bolthausen in the context of spin glass theory.
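
To make the two objects concrete, here is a hedged numpy sketch of AMP with soft thresholding alongside the scalar state-evolution recursion that tracks it. The threshold policy (a multiple alpha of the estimated noise level) and the unit-amplitude signal used in the Monte Carlo evaluation are illustrative assumptions, not the paper's exact setup.

    import numpy as np

    def soft(x, t):
        # soft-thresholding denoiser eta(x; t)
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def amp(y, A, alpha=1.5, iters=30):
        # AMP with soft thresholding at alpha times the estimated noise level.
        # The last term of the residual update is the Onsager correction; it is
        # what keeps the effective noise Gaussian and makes state evolution exact.
        n, N = A.shape
        x, z = np.zeros(N), y.copy()
        for _ in range(iters):
            tau = np.sqrt(np.mean(z ** 2))    # effective noise std
            r = x + A.T @ z                   # pseudo-data ~ x0 + tau * Gaussian
            x = soft(r, alpha * tau)
            z = y - A @ x + (N / n) * z * np.mean(np.abs(r) > alpha * tau)
        return x

    def state_evolution(sigma, delta, rho, alpha=1.5, iters=30, mc=200000):
        # Scalar recursion tau_{t+1}^2 = sigma^2 + E[(eta(X0 + tau_t Z) - X0)^2]/delta,
        # evaluated by Monte Carlo for a unit-amplitude signal with eps = rho*delta nonzeros.
        rng = np.random.default_rng(1)
        X0 = np.where(rng.random(mc) < rho * delta, 1.0, 0.0)
        tau2 = sigma ** 2 + np.mean(X0 ** 2) / delta
        for _ in range(iters):
            err = soft(X0 + np.sqrt(tau2) * rng.normal(size=mc), alpha * np.sqrt(tau2)) - X0
            tau2 = sigma ** 2 + np.mean(err ** 2) / delta
        return tau2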

1,008 citations


Additional excerpts

  • ...A prominent example is the noise sensitivity of LASSO, which is investigated in [DMM10c]....


Journal ArticleDOI
TL;DR: In this paper, a denoising-based approximate message passing (D-AMP) framework is proposed that integrates a wide class of denoisers within its iterations. A key element of D-AMP is an appropriate Onsager correction term, which coerces the signal perturbation at each iteration to be very close to the white Gaussian noise that denoisers are typically designed to remove.
Abstract: A denoising algorithm seeks to remove noise, errors, or perturbations from a signal. Extensive research has been devoted to this arena over the last several decades, and as a result, today's denoisers can effectively remove large amounts of additive white Gaussian noise. A compressed sensing (CS) reconstruction algorithm seeks to recover a structured signal acquired using a small number of randomized measurements. Typical CS reconstruction algorithms can be cast as iteratively estimating a signal from a perturbed observation. This paper answers a natural question: How can one effectively employ a generic denoiser in a CS reconstruction algorithm? In response, we develop an extension of the approximate message passing (AMP) framework, called denoising-based AMP (D-AMP), that can integrate a wide class of denoisers within its iterations. We demonstrate that, when used with a high-performance denoiser for natural images, D-AMP offers state-of-the-art CS recovery performance while operating tens of times faster than competing methods. We explain the exceptional performance of D-AMP by analyzing some of its theoretical features. A key element in D-AMP is the use of an appropriate Onsager correction term in its iterations, which coerces the signal perturbation at each iteration to be very close to the white Gaussian noise that denoisers are typically designed to remove.
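
A minimal sketch of one D-AMP iteration, assuming a black-box denoiser(r, sigma); the Monte Carlo divergence probe follows the approach described for D-AMP, but the helper name and the probe step size are hypothetical.

    import numpy as np

    def damp_step(x, z, y, A, denoiser, rng, probe=1e-3):
        # One D-AMP iteration with a black-box denoiser D(r, sigma). The Onsager
        # term needs div D, estimated here with a Monte Carlo probe:
        # div D ~ b . (D(r + eps*b) - D(r)) / eps for a Gaussian vector b.
        n, N = A.shape
        sigma_hat = np.sqrt(np.mean(z ** 2))   # effective noise level at this iteration
        r = x + A.T @ z                        # pseudo-data: x0 plus near-Gaussian noise
        x_new = denoiser(r, sigma_hat)
        b = rng.normal(size=N)
        div = b @ (denoiser(r + probe * b, sigma_hat) - x_new) / probe
        z_new = y - A @ x_new + z * div / n    # Onsager correction term
        return x_new, z_new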

535 citations

Journal ArticleDOI
TL;DR: In this article, the authors introduced a simple and very general theory of compressive sensing, in which the sensing mechanism simply selects sensing vectors independently at random from a probability distribution F; it includes all standard models discussed in the literature (e.g., Gaussian, frequency measurements) and also provides a framework for new measurement strategies.
Abstract: This paper introduces a simple and very general theory of compressive sensing. In this theory, the sensing mechanism simply selects sensing vectors independently at random from a probability distribution F; it includes all standard models discussed in the literature (e.g., Gaussian, frequency measurements) and also provides a framework for new measurement strategies. We prove that if the probability distribution F obeys a simple incoherence property and an isotropy property, one can faithfully recover approximately sparse signals from a minimal number of noisy measurements. The novelty is that our recovery results do not require the restricted isometry property (RIP) to hold near the sparsity level in question, nor a random model for the signal. As an example, the paper shows that a signal with s nonzero entries can be faithfully recovered from about s log n Fourier coefficients that are contaminated with noise.

520 citations

Journal ArticleDOI
TL;DR: An empirical-Bayesian technique is proposed that simultaneously learns the signal distribution while MMSE-recovering the signal (according to the learned distribution) using AMP; the non-zero distribution is modeled as a Gaussian mixture whose parameters are learned through expectation maximization, with AMP implementing the expectation step.
Abstract: When recovering a sparse signal from noisy compressive linear measurements, the distribution of the signal's non-zero coefficients can have a profound effect on recovery mean-squared error (MSE). If this distribution was a priori known, then one could use computationally efficient approximate message passing (AMP) techniques for nearly minimum MSE (MMSE) recovery. In practice, however, the distribution is unknown, motivating the use of robust algorithms like LASSO, which is nearly minimax optimal, at the cost of significantly larger MSE for non-least-favorable distributions. As an alternative, we propose an empirical-Bayesian technique that simultaneously learns the signal distribution while MMSE-recovering the signal, according to the learned distribution, using AMP. In particular, we model the non-zero distribution as a Gaussian mixture and learn its parameters through expectation maximization, using AMP to implement the expectation step. Numerical experiments on a wide range of signal classes confirm the state-of-the-art performance of our approach, in both reconstruction error and runtime, in the high-dimensional regime, for most (but not all) sensing operators.
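
As a toy version of the idea (not the paper's full algorithm), the sketch below implements the MMSE denoiser for a single Bernoulli-Gaussian prior and the EM-style re-estimate of its sparsity rate; EM-GM-AMP performs the analogous updates for a full Gaussian mixture inside AMP.

    import numpy as np

    def gauss_pdf(r, var):
        return np.exp(-r ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

    def bg_mmse(r, tau2, eps, v):
        # MMSE denoiser for the prior (1 - eps)*delta_0 + eps*N(0, v), observed
        # through r = x + N(0, tau2). A single Gaussian component stands in for
        # the Gaussian mixture that EM-GM-AMP learns.
        p1 = eps * gauss_pdf(r, v + tau2)
        pi = p1 / (p1 + (1 - eps) * gauss_pdf(r, tau2))   # posterior P(x != 0 | r)
        return pi * (v / (v + tau2)) * r, pi

    # EM "M-step" sketch: re-estimate the sparsity rate from posterior activations
    # (the full algorithm updates the mixture weights, means, and variances too):
    #     xhat, pi = bg_mmse(r, tau2, eps, v)
    #     eps = np.mean(pi)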

375 citations

References
Journal ArticleDOI
TL;DR: A new fast iterative shrinkage-thresholding algorithm (FISTA) is presented that preserves the computational simplicity of ISTA but has a global rate of convergence that is proven to be significantly better, both theoretically and practically.
Abstract: We consider the class of iterative shrinkage-thresholding algorithms (ISTA) for solving linear inverse problems arising in signal/image processing. This class of methods, which can be viewed as an extension of the classical gradient algorithm, is attractive due to its simplicity and thus is adequate for solving large-scale problems even with dense matrix data. However, such methods are also known to converge quite slowly. In this paper we present a new fast iterative shrinkage-thresholding algorithm (FISTA) which preserves the computational simplicity of ISTA but with a global rate of convergence which is proven to be significantly better, both theoretically and practically. Initial promising numerical results for wavelet-based image deblurring demonstrate the capabilities of FISTA which is shown to be faster than ISTA by several orders of magnitude.
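
A compact numpy sketch of FISTA for the l1-penalized least-squares problem, following the gradient-plus-shrinkage structure the abstract describes; step sizes and iteration counts are illustrative.

    import numpy as np

    def fista(A, y, lam, iters=500):
        # FISTA for min ||y - Ax||^2/2 + lam*||x||_1: an ISTA gradient/shrinkage
        # step evaluated at an extrapolated point, giving O(1/k^2) convergence.
        L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        w, t = x.copy(), 1.0
        for _ in range(iters):
            g = w - A.T @ (A @ w - y) / L      # gradient step
            x_new = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # shrinkage
            t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
            w = x_new + ((t - 1) / t_new) * (x_new - x)   # momentum (the "fast" part)
            x, t = x_new, t_new
        return x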

11,413 citations


"The Noise-Sensitivity Phase Transit..." refers background in this paper

  • ...this one, but with a different choice of [4], [5]....


Journal ArticleDOI
TL;DR: Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions.
Abstract: The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries --- stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few. Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB). Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation denoising, and multiscale edge denoising. BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.
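
Since BP is exactly l1 minimization subject to linear constraints, it can be recast as the linear program the abstract alludes to. A minimal scipy sketch (the variable split x = u − v is the standard reformulation, not code from the paper):

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(A, y):
        # min ||x||_1 subject to Ax = y, recast as a linear program:
        # write x = u - v with u, v >= 0 and minimize sum(u) + sum(v).
        n, N = A.shape
        res = linprog(c=np.ones(2 * N),
                      A_eq=np.hstack([A, -A]), b_eq=y,
                      bounds=[(0, None)] * (2 * N), method="highs")
        return res.x[:N] - res.x[N:]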

9,950 citations


"The Noise-Sensitivity Phase Transit..." refers methods in this paper

  • ...Thousands of articles use or study this approach, which has variously been called LASSO, Basis Pursuit, or more prosaically, l1-penalized least-squares [36], [6], [7]....


Journal ArticleDOI
TL;DR: f can be recovered exactly by solving a simple convex optimization problem (which one can recast as a linear program), and numerical experiments suggest that this recovery procedure works unreasonably well; f is recovered exactly even in situations where a significant fraction of the output is corrupted.
Abstract: This paper considers a natural error correcting problem with real valued input/output. We wish to recover an input vector f ∈ Rⁿ from corrupted measurements y = Af + e. Here, A is an m by n (coding) matrix and e is an arbitrary and unknown vector of errors. Is it possible to recover f exactly from the data y? We prove that under suitable conditions on the coding matrix A, the input f is the unique solution to the l1-minimization problem (||x||₁ := Σᵢ |xᵢ|) min over g ∈ Rⁿ of ||y − Ag||₁, provided that the support of the vector of errors is not too large: ||e||₀ := |{i : eᵢ ≠ 0}| ≤ ρ·m for some ρ > 0. In short, f can be recovered exactly by solving a simple convex optimization problem (which one can recast as a linear program). In addition, numerical experiments suggest that this recovery procedure works unreasonably well; f is recovered exactly even in situations where a significant fraction of the output is corrupted. This work is related to the problem of finding sparse solutions to vastly underdetermined systems of linear equations. There are also significant connections with the problem of recovering signals from highly incomplete measurements. In fact, the results introduced in this paper improve on our earlier work. Finally, underlying the success of l1 is a crucial property we call the uniform uncertainty principle that we shall describe in detail.

6,853 citations


"The Noise-Sensitivity Phase Transit..." refers methods in this paper

  • ...The most well-known analytic approach is based on the Restricted Isometry Property (RIP), developed by Candès and Tao [9], [10]....



Journal ArticleDOI
TL;DR: In this paper, the authors considered the problem of recovering a vector x0 ∈ R^m from incomplete and contaminated observations y = Ax0 + e, where e is an error term.
Abstract: Suppose we wish to recover a vector x0 ∈ R^m (e.g., a digital signal or image) from incomplete and contaminated observations y = Ax0 + e; A is an n by m matrix with far fewer rows than columns (n ≪ m) and e is an error term. Is it possible to recover x0 accurately based on the data y? To recover x0, we consider the solution x# to the l1-regularization problem min ||x||₁ subject to ||Ax − y||₂ ≤ ε, where ε is the size of the error term e. We show that if A obeys a uniform uncertainty principle (with unit-normed columns) and if the vector x0 is sufficiently sparse, then the solution is within the noise level: ||x# − x0||₂ ≤ C·ε. As a first example, suppose that A is a Gaussian random matrix; then stable recovery occurs for almost all such A's provided that the number of nonzeros of x0 is of about the same order as the number of observations. As a second instance, suppose one observes few Fourier samples of x0; then stable recovery occurs for almost any set of n coefficients provided that the number of nonzeros is of the order of n/[log m]^6. In the case where the error term vanishes, the recovery is of course exact, and this work actually provides novel insights into the exact recovery phenomenon discussed in earlier papers. The methodology also explains why one can also very nearly recover approximately sparse signals.
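
The constrained program in the abstract can be posed directly in a convex-optimization layer. A minimal sketch using cvxpy (an assumed dependency; the function name bpdn is illustrative):

    import cvxpy as cp
    import numpy as np

    def bpdn(A, y, eps):
        # The constrained program from the abstract:
        # min ||x||_1  subject to  ||Ax - y||_2 <= eps.
        # Solved with cvxpy; any installed conic solver will do.
        x = cp.Variable(A.shape[1])
        cp.Problem(cp.Minimize(cp.norm1(x)), [cp.norm2(A @ x - y) <= eps]).solve()
        return x.value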

6,727 citations

Journal ArticleDOI
TL;DR: A generic message-passing algorithm, the sum-product algorithm, operates in a factor graph and computes, either exactly or approximately, various marginal functions derived from the global function.
Abstract: Algorithms that must deal with complicated global functions of many variables often exploit the manner in which the given functions factor as a product of "local" functions, each of which depends on a subset of the variables. Such a factorization can be visualized with a bipartite graph that we call a factor graph. In this tutorial paper, we present a generic message-passing algorithm, the sum-product algorithm, that operates in a factor graph. Following a single, simple computational rule, the sum-product algorithm computes, either exactly or approximately, various marginal functions derived from the global function. A wide variety of algorithms developed in artificial intelligence, signal processing, and digital communications can be derived as specific instances of the sum-product algorithm, including the forward/backward algorithm, the Viterbi algorithm, the iterative "turbo" decoding algorithm, Pearl's (1988) belief propagation algorithm for Bayesian networks, the Kalman filter, and certain fast Fourier transform (FFT) algorithms.
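
A tiny worked instance of the sum-product algorithm, assuming a three-variable binary chain with two pairwise factors (the factor tables are invented for illustration); on a tree like this the messages are exact and reduce to forward/backward passes:

    import numpy as np

    # Two pairwise factors on a binary chain x1 - f12 - x2 - f23 - x3.
    f12 = np.array([[0.9, 0.1],
                    [0.2, 0.8]])      # psi(x1, x2)
    f23 = np.array([[0.7, 0.3],
                    [0.4, 0.6]])      # psi(x2, x3)

    # Leaf variables send the all-ones message, so factor-to-variable messages
    # are sums over the eliminated variable, weighted by incoming messages.
    fwd2 = f12.sum(axis=0)            # f12 -> x2 : sum over x1
    fwd3 = f23.T @ fwd2               # f23 -> x3 : sum over x2
    bwd2 = f23.sum(axis=1)            # f23 -> x2 : sum over x3

    # A variable's marginal is the normalized product of its incoming messages.
    marg_x2 = fwd2 * bwd2
    print("p(x2):", marg_x2 / marg_x2.sum())   # [0.55, 0.45]
    print("p(x3):", fwd3 / fwd3.sum())         # [0.565, 0.435]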

6,637 citations


"The Noise-Sensitivity Phase Transit..." refers background in this paper

  • ...The underlying factor graph [23] is the complete bipartite graph over variable nodes and factor nodes....
