Journal ArticleDOI

One-Bit Compressed Sensing by Linear Programming

01 Aug 2013-Communications on Pure and Applied Mathematics (Wiley Subscription Services, Inc., A Wiley Company)-Vol. 66, Iss: 8, pp 1275-1297
TL;DR: In this paper, the authors give the first computationally tractable and almost optimal solution to the problem of one-bit compressed sensing, showing how to accurately recover an s-sparse vector x ∈ Rⁿ from the signs of O(s log²(n/s)) random linear measurements of x.
Abstract: We give the first computationally tractable and almost optimal solution to the problem of one-bit compressed sensing, showing how to accurately recover an s-sparse vector x ∈ Rⁿ from the signs of O(s log²(n/s)) random linear measurements of x. The recovery is achieved by a simple linear program. This result extends to approximately sparse vectors x. Our result is universal in the sense that with high probability, one measurement scheme will successfully recover all sparse vectors simultaneously. The argument is based on solving an equivalent geometric problem on random hyperplane tessellations.

1. Introduction. Compressed sensing is a modern paradigm of data acquisition which is having an impact on several disciplines; see [21]. The scientist has access to a measurement vector v ∈ Rᵐ obtained as v = Ax, where A is a given m × n measurement matrix and x ∈ Rⁿ is an unknown signal that one needs to recover from v. One would like to take m ≪ n, rendering A non-invertible; the key ingredient to successful recovery of x is to take into account its assumed structure: sparsity. Thus one assumes that x has at most s nonzero entries, although the support pattern is unknown. The strongest known results are for random measurement matrices A. In particular, if A has i.i.d. Gaussian entries, then we may take m = O(s log(n/s)) and still recover x exactly with high probability [10, 7]; see [26] for an overview. Furthermore, this recovery may be achieved in polynomial time by solving the convex minimization program min ‖x′‖₁ subject to Ax′ = v. Stability results are also available when noise is added to the problem [9, 8, 3, 27]. However, while the focus of compressed sensing is signal recovery with minimal information, the classical set-up (1.1), (1.2) assumes infinite bit precision of the measurements. This disaccord raises an important question: how many bits per measurement (i.e., per coordinate of v) are sufficient for tractable and accurate sparse recovery? This paper shows that one bit per measurement is enough. There are many applications where such severe quantization may be inherent or preferred: analog-to-digital conversion [20, 18], binomial regression in statistical modeling, and threshold group testing [12], to name a few.

1.1. Main results. This paper demonstrates that a simple modification of the convex program (1.2) is able to accurately estimate x from the extremely quantized measurement vector y = sign(Ax).
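
For readers who want to see the mechanics, here is a minimal numerical sketch. The paper's convex program has the form min ‖x′‖₁ subject to the sign constraints yᵢ⟨aᵢ, x′⟩ ≥ 0 and the normalization Σᵢ yᵢ⟨aᵢ, x′⟩ = m; the standard split x′ = u − v with u, v ≥ 0 turns this into a linear program. All problem sizes below are illustrative choices, not values from the paper.

```python
# Hedged sketch of one-bit CS recovery via linear programming.
# Program: min ||u||_1 + ||v||_1  s.t.  y_i <a_i, u - v> >= 0,
#          sum_i y_i <a_i, u - v> = m,  u, v >= 0  (so x_hat = u - v).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, s = 200, 800, 5                       # illustrative sizes

x = np.zeros(n)                             # ground-truth s-sparse unit vector
x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
x /= np.linalg.norm(x)

A = rng.standard_normal((m, n))             # Gaussian measurement matrix
y = np.sign(A @ x)                          # one-bit measurements

D = y[:, None] * A                          # rows are y_i * a_i
c = np.ones(2 * n)                          # objective: l1 norm of u - v
A_ub = np.hstack([-D, D])                   # -y_i <a_i, u - v> <= 0
b_ub = np.zeros(m)
s_row = D.sum(axis=0)
A_eq = np.hstack([s_row, -s_row]).reshape(1, -1)   # normalization constraint
b_eq = np.array([float(m)])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, method="highs")
x_hat = res.x[:n] - res.x[n:]
x_hat /= np.linalg.norm(x_hat)              # one-bit data fixes only the direction
print("recovery error:", np.linalg.norm(x_hat - x))
```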


Citations
Journal ArticleDOI
TL;DR: This paper investigates an alternative CS approach that shifts the emphasis from the sampling rate to the number of bits per measurement, and introduces the binary iterative hard thresholding algorithm for signal reconstruction from 1-bit measurements that offers state-of-the-art performance.
Abstract: The compressive sensing (CS) framework aims to ease the burden on analog-to-digital converters (ADCs) by reducing the sampling rate required to acquire and stably recover sparse signals. Practical ADCs not only sample but also quantize each measurement to a finite number of bits; moreover, there is an inverse relationship between the achievable sampling rate and the bit depth. In this paper, we investigate an alternative CS approach that shifts the emphasis from the sampling rate to the number of bits per measurement. In particular, we explore the extreme case of 1-bit CS measurements, which capture just their sign. Our results come in two flavors. First, we consider ideal reconstruction from noiseless 1-bit measurements and provide a lower bound on the best achievable reconstruction error. We also demonstrate that i.i.d. random Gaussian matrices provide measurement mappings that, with overwhelming probability, achieve nearly optimal error decay. Next, we consider reconstruction robustness to measurement errors and noise and introduce the binary ε-stable embedding property, which characterizes the robustness of the measurement process to sign changes. We show that the same class of matrices that provide almost optimal noiseless performance also enable such a robust mapping. On the practical side, we introduce the binary iterative hard thresholding algorithm for signal reconstruction from 1-bit measurements that offers state-of-the-art performance.
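
As a rough companion to the algorithm named above, the following is a hedged sketch of binary iterative hard thresholding as it is usually described: a sign-consistency gradient step followed by hard thresholding to the s largest entries. The step size, iteration count, and toy problem sizes are our assumptions.

```python
# Hedged BIHT sketch: sign-consistency gradient step + hard thresholding.
import numpy as np

def biht(y, A, s, iters=100, tau=None):
    m, n = A.shape
    tau = 1.0 / m if tau is None else tau   # step size: an assumption
    x = np.zeros(n)
    for _ in range(iters):
        # push sign(Ax) toward the observed signs y
        x = x + tau * A.T @ (y - np.sign(A @ x))
        # hard threshold: zero out all but the s largest-magnitude entries
        x[np.argsort(np.abs(x))[:-s]] = 0.0
    return x / np.linalg.norm(x)            # one-bit data fixes only direction

rng = np.random.default_rng(0)
n, m, s = 100, 500, 5                       # toy sizes
x0 = np.zeros(n); x0[:s] = rng.standard_normal(s); x0 /= np.linalg.norm(x0)
A = rng.standard_normal((m, n))
print("error:", np.linalg.norm(biht(np.sign(A @ x0), A, s) - x0))
```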

645 citations

Posted Content
TL;DR: In this paper, the authors consider the case of 1-bit CS measurements and provide a lower bound on the best achievable reconstruction error, and show that the same class of matrices that provide almost optimal noiseless performance also enable a robust mapping.
Abstract: The Compressive Sensing (CS) framework aims to ease the burden on analog-to-digital converters (ADCs) by reducing the sampling rate required to acquire and stably recover sparse signals. Practical ADCs not only sample but also quantize each measurement to a finite number of bits; moreover, there is an inverse relationship between the achievable sampling rate and the bit depth. In this paper, we investigate an alternative CS approach that shifts the emphasis from the sampling rate to the number of bits per measurement. In particular, we explore the extreme case of 1-bit CS measurements, which capture just their sign. Our results come in two flavors. First, we consider ideal reconstruction from noiseless 1-bit measurements and provide a lower bound on the best achievable reconstruction error. We also demonstrate that i.i.d. random Gaussian matrices describe measurement mappings achieving, with overwhelming probability, nearly optimal error decay. Next, we consider reconstruction robustness to measurement errors and noise and introduce the Binary ε-Stable Embedding (BεSE) property, which characterizes the robustness of the measurement process to sign changes. We show that the same class of matrices that provide almost optimal noiseless performance also enable such a robust mapping. On the practical side, we introduce the Binary Iterative Hard Thresholding (BIHT) algorithm for signal reconstruction from 1-bit measurements that offers state-of-the-art performance.
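
The BεSE property formalizes how well sign measurements preserve geometry. A small numerical check (our illustration, not an experiment from the paper) uses the classical fact that for a standard Gaussian vector a, the probability that sign⟨a, x⟩ ≠ sign⟨a, z⟩ equals the angle between x and z divided by π, so the normalized Hamming distance between sign patterns concentrates around the angular distance:

```python
# Empirical check of the sign-embedding idea behind BeSE: for Gaussian A, the
# normalized Hamming distance between sign(Ax) and sign(Az) concentrates around
# the angular distance arccos(<x, z>) / pi (exact in expectation, per row).
import numpy as np

rng = np.random.default_rng(1)
n, m = 100, 5000
x = rng.standard_normal(n); x /= np.linalg.norm(x)
z = rng.standard_normal(n); z /= np.linalg.norm(z)
A = rng.standard_normal((m, n))
hamming = np.mean(np.sign(A @ x) != np.sign(A @ z))
angular = np.arccos(np.clip(x @ z, -1.0, 1.0)) / np.pi
print(f"Hamming {hamming:.4f} vs angular {angular:.4f}")
```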

461 citations

Journal ArticleDOI
TL;DR: In this paper, the problem of matrix completion is extended to the case of 1-bit observations, and a new theory is proposed for matrix completion in the context of recommender systems, where each rating consists of a single bit representing a positive or negative rating.
Abstract: The problem of recovering a matrix from an incomplete sampling of its entries—also known as matrix completion—arises in a wide variety of practical situations, including collaborative filtering, system identification, sensor localization, rank aggregation, and many more. While many of these applications have a relatively long history, recent advances in the closely related field of compressed sensing have enabled a burst of progress in the last few years, and we now have a strong base of theoretical results concerning matrix completion. A typical result from this literature is that a generic d × d matrix of rank r can be exactly recovered from O(rd·polylog(d)) randomly chosen entries. Similar results can be established in the case of noisy observations and approximately low-rank matrices. See [1] and references therein for further details. Although these results are quite impressive, there is an important gap between the statement of the problem as considered in the matrix completion literature and many of the most common applications discussed therein. As an example, consider collaborative filtering and the now-famous "Netflix problem." In this setting, we assume that there is some unknown matrix whose entries each represent a rating for a particular user on a particular movie. Since any user will rate only a small subset of possible movies, we are only able to observe a small fraction of the total entries in the matrix, and our goal is to infer the unseen ratings from the observed ones. If the rating matrix is low-rank, then this would seem to be the exact problem studied in the matrix completion literature. However, there is a subtle difference: the theory developed in this literature generally assumes that observations consist of (possibly noisy) continuous-valued entries of the matrix, whereas in the Netflix problem the observations are "quantized" to the set of integers between 1 and 5. If we believe that it is possible for a user's true rating for a particular movie to be, for example, 4.5, then we must account for the impact of this "quantization noise" on our recovery. Of course, one could potentially treat quantization simply as a form of bounded noise, but this is somewhat unsatisfying because the ratings aren't just quantized — there are also hard limits placed on the minimum and maximum allowable ratings. (Why should we suppose that a movie given a rating of 5 could not have a true underlying rating of 6 or 7 or 10?) The inadequacy of standard matrix completion techniques in dealing with this effect is particularly pronounced when we consider recommender systems where each rating consists of a single bit representing a positive or negative rating (consider for example rating music on Pandora, the relevance of advertisements on Hulu, or posts on Reddit or MathOverflow). Similar situations arise in nearly every application that has been proposed for matrix completion, including the analysis of incomplete survey data, the recovery of pairwise distance matrices (multidimensional scaling), quantum state tomography, and many others. In such cases, the assumptions made in the existing theory of matrix completion do not apply, standard algorithms are ill-posed, and a new theory is required. In this work we describe the approach we take in [1] to extend the theory of matrix completion to the case of 1-bit observations.
We consider a statistical model for such data where a binary output is generated according to a probability distribution which is parameterized by the corresponding entry of the unknown matrix M. The central question we ask is: "Given observations of this form, can we recover the underlying matrix?" Several new challenges arise when trying to develop a theory for 1-bit matrix completion. First, matrix completion is in some sense a more challenging problem than compressed sensing. Specifically, some additional difficulty arises because the set of low-rank matrices is "coherent" with single-entry measurements—there will always be certain (sparse) low-rank matrices that we cannot hope to recover without essentially sampling every entry of the matrix. The typical way to deal with this possibility is to consider a reduced set of low-rank matrices by placing restrictions on the entry-wise maximum of the matrix or its singular vectors—informally, we require that the matrix is not too "spiky". However, we introduce an entirely new dimension of ill-posedness by restricting ourselves to 1-bit observations. An example of this is described in detail in [1] and shows that in the case where we simply observe Y = sign(M), the problem of recovering M is ill-posed. To see this, let M = uv∗ for any vectors u, v ∈ Rᵈ, and for simplicity assume that there are no zero entries in u or v. Now let ũ and ṽ be any vectors with the same sign patterns as u and v respectively. It is apparent that either of M or M̃ = ũṽ∗ will yield the same observations Y, and thus M and M̃ are indistinguishable. Note that while it is obvious that this 1-bit measurement process will destroy any information we have regarding the scaling of M, this ill-posedness remains even if we knew something about the scaling a priori (such as the Frobenius norm of M). For any given set of observations, there will always be radically different possible matrices that are all consistent with the observed measurements. After considering this example, the problem might seem hopeless. However, an interesting surprise is that when we add noise to the problem (that is, when we observe a subset of the matrix Y = sign(M + Z), where Z ≠ 0 is an appropriate stochastic matrix) the picture completely changes—this noise has a "dithering" effect and the problem becomes well-posed. In fact, we will show that in this setting we can sometimes recover M to the same degree of accuracy that is possible when given access to completely unquantized measurements! In particular, under appropriate conditions, O(rd) measurements are sufficient to accurately recover M. We will provide an overview of these results and discuss a number of practical applications.
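
The rank-one ambiguity described above is easy to verify numerically. The snippet below is a toy check of exactly that example; the dimensions and random magnitudes are arbitrary.

```python
# Toy check: rank-1 matrices built from vectors with the same sign patterns
# yield identical observations Y = sign(M), hence are indistinguishable.
import numpy as np

rng = np.random.default_rng(2)
d = 5
u, v = rng.standard_normal(d), rng.standard_normal(d)
u_t = u * rng.uniform(0.1, 10.0, d)         # same signs, wildly different scale
v_t = v * rng.uniform(0.1, 10.0, d)
M, M_t = np.outer(u, v), np.outer(u_t, v_t)
assert np.array_equal(np.sign(M), np.sign(M_t))
print("sign observations identical despite different matrices")
```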

294 citations

Journal ArticleDOI
TL;DR: In this article, the uplink performance of a quantized massive MIMO system that deploys orthogonal frequency division multiplexing (OFDM) for wideband communication is investigated.
Abstract: Coarse quantization at the base station (BS) of a massive multi-user (MU) multiple-input multiple-output (MIMO) wireless system promises significant power and cost savings. Coarse quantization also enables significant reductions of the raw analog-to-digital converter data that must be transferred from a spatially separated antenna array to the baseband processing unit. The theoretical limits as well as practical transceiver algorithms for such quantized MU-MIMO systems operating over frequency-flat, narrowband channels have been studied extensively. However, the practically relevant scenario where such communication systems operate over frequency-selective, wideband channels is less well understood. This paper investigates the uplink performance of a quantized massive MU-MIMO system that deploys orthogonal frequency-division multiplexing (OFDM) for wideband communication. We propose new algorithms for quantized maximum a posteriori channel estimation and data detection, and we study the associated performance/quantization tradeoffs. Our results demonstrate that coarse quantization (e.g., four to six bits, depending on the ratio between the number of BS antennas and the number of users) in massive MU-MIMO-OFDM systems entails virtually no performance loss compared with the infinite-precision case at no additional cost in terms of baseband processing complexity.
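
For intuition about the front end being modeled, here is a hedged sketch of a b-bit uniform quantizer applied separately to the real and imaginary parts of each received sample. The mid-rise layout and clipping level are our assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch of a b-bit uniform mid-rise quantizer applied per I/Q component.
import numpy as np

def quantize(r, bits, clip=3.0):
    """Quantize real and imaginary parts to 2**bits uniform levels in [-clip, clip]."""
    step = 2.0 * clip / (2 ** bits)
    q = lambda t: np.clip(np.floor(t / step) * step + step / 2,
                          -clip + step / 2, clip - step / 2)
    return q(r.real) + 1j * q(r.imag)

rng = np.random.default_rng(3)
r = rng.standard_normal(8) + 1j * rng.standard_normal(8)   # toy received samples
print(quantize(r, bits=4))
```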

221 citations

Posted Content
TL;DR: In this article, the authors study the landscape of the empirical risk, namely its stationary points and their properties, and establish uniform convergence of the gradient and Hessian of empirical risk to their population counterparts.
Abstract: Most high-dimensional estimation and prediction methods propose to minimize a cost function (empirical risk) that is written as a sum of losses associated to each data point. In this paper we focus on the case of non-convex losses, which is practically important but still poorly understood. Classical empirical process theory implies uniform convergence of the empirical risk to the population risk. While uniform convergence implies consistency of the resulting M-estimator, it does not ensure that the latter can be computed efficiently. In order to capture the complexity of computing M-estimators, we propose to study the landscape of the empirical risk, namely its stationary points and their properties. We establish uniform convergence of the gradient and Hessian of the empirical risk to their population counterparts, as soon as the number of samples becomes larger than the number of unknown parameters (modulo logarithmic factors). Consequently, good properties of the population risk can be carried to the empirical risk, and we can establish one-to-one correspondence of their stationary points. We demonstrate that in several problems such as non-convex binary classification, robust regression, and Gaussian mixture models, this result implies a complete characterization of the landscape of the empirical risk, and of the convergence properties of descent algorithms. We extend our analysis to the very high-dimensional setting in which the number of parameters exceeds the number of samples, and provide a characterization of the empirical risk landscape under a nearly information-theoretically minimal condition. Namely, if the number of samples exceeds the sparsity of the unknown parameters vector (modulo logarithmic factors), then a suitable uniform convergence result takes place. We apply this result to non-convex binary classification and robust regression in very high dimensions.
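
As a toy illustration of the theme (our construction, not an experiment from the paper), consider squared loss on a sigmoid link, a standard example of a non-convex binary classification risk, and watch the empirical gradient at a fixed point approach a large-sample proxy for the population gradient as n grows:

```python
# Toy illustration: empirical-risk gradient of a non-convex loss converges to
# a population proxy as n grows. Model and sizes are assumptions for display.
import numpy as np

sigmoid = lambda t: 0.5 * (1 + np.tanh(t / 2))   # numerically stable sigmoid

def risk_grad(theta, X, y):
    """Gradient of the empirical risk (1/n) * sum (sigmoid(<x,theta>) - y)^2."""
    p = sigmoid(X @ theta)
    return (2 * (p - y) * p * (1 - p)) @ X / len(y)

rng = np.random.default_rng(5)
d = 10
theta_star = rng.standard_normal(d)          # ground-truth parameters
theta = rng.standard_normal(d)               # fixed point to probe

def sample(n):
    X = rng.standard_normal((n, d))
    y = (rng.uniform(size=n) < sigmoid(X @ theta_star)).astype(float)
    return X, y

X_pop, y_pop = sample(200_000)               # large-sample proxy for population
for n in (100, 1_000, 10_000):
    Xn, yn = sample(n)
    gap = np.linalg.norm(risk_grad(theta, Xn, yn) - risk_grad(theta, X_pop, y_pop))
    print(n, f"{gap:.4f}")                   # shrinks roughly like 1/sqrt(n)
```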

181 citations

References
Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of recovering a vector x₀ ∈ Rᵐ from incomplete and contaminated observations y = Ax₀ + e, where e is an error term.
Abstract: Suppose we wish to recover a vector x₀ ∈ Rᵐ (e.g., a digital signal or image) from incomplete and contaminated observations y = Ax₀ + e; A is an n by m matrix with far fewer rows than columns (n ≪ m) and e is an error term. Is it possible to recover x₀ accurately based on the data y? To recover x₀, we consider the solution x♯ to the ℓ₁-regularization problem min ‖x‖₁ subject to ‖Ax − y‖₂ ≤ ε, where ε is the size of the error term e. We show that if A obeys a uniform uncertainty principle (with unit-normed columns) and if the vector x₀ is sufficiently sparse, then the solution is within the noise level: ‖x♯ − x₀‖₂ ≤ C·ε. As a first example, suppose that A is a Gaussian random matrix; then stable recovery occurs for almost all such A's provided that the number of nonzeros of x₀ is of about the same order as the number of observations. As a second instance, suppose one observes few Fourier samples of x₀; then stable recovery occurs for almost any set of n coefficients provided that the number of nonzeros is of the order of n/[log m]⁶. In the case where the error term vanishes, the recovery is of course exact, and this work actually provides novel insights into the exact recovery phenomenon discussed in earlier papers. The methodology also explains why one can also very nearly recover approximately sparse signals.
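
A hedged sketch of the quoted program, min ‖x‖₁ subject to ‖Ax − y‖₂ ≤ ε, written with cvxpy as the solver front end (our tooling choice; problem sizes are illustrative):

```python
# Hedged sketch of min ||x||_1 s.t. ||Ax - y||_2 <= eps using cvxpy.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(6)
n_obs, n_dim, s, eps = 80, 200, 5, 0.1      # illustrative sizes
x0 = np.zeros(n_dim)
x0[rng.choice(n_dim, s, replace=False)] = 1.0
A = rng.standard_normal((n_obs, n_dim)) / np.sqrt(n_obs)
e = rng.standard_normal(n_obs)
e *= eps / np.linalg.norm(e)                # error term with ||e||_2 = eps
y = A @ x0 + e

x = cp.Variable(n_dim)
cp.Problem(cp.Minimize(cp.norm1(x)), [cp.norm2(A @ x - y) <= eps]).solve()
print("error (should be within C * eps):", np.linalg.norm(x.value - x0))
```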

6,727 citations

Journal ArticleDOI
TL;DR: If the objects of interest are sparse in a fixed basis or compressible, then it is possible to reconstruct f to within very high accuracy from a small number of random measurements by solving a simple linear program.
Abstract: Suppose we are given a vector f in a class F ⊂ Rᴺ, e.g., a class of digital signals or digital images. How many linear measurements do we need to make about f to be able to recover f to within precision ε in the Euclidean (ℓ₂) metric? This paper shows that if the objects of interest are sparse in a fixed basis or compressible, then it is possible to reconstruct f to within very high accuracy from a small number of random measurements by solving a simple linear program. More precisely, suppose that the nth largest entry of the vector |f| (or of its coefficients in a fixed basis) obeys |f|₍ₙ₎ ≤ R·n^(−1/p), where R > 0 and p > 0. Suppose that we take measurements y_k = ⟨f, X_k⟩, k = 1, …, K, where the X_k are N-dimensional Gaussian vectors with independent standard normal entries. Then for each f obeying the decay estimate above for some 0 < p < 1, …
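
The "simple linear program" referenced here is, in the noiseless case, min ‖f′‖₁ subject to ⟨f′, X_k⟩ = y_k for all k. A minimal sketch, using the usual split f′ = u − v to obtain an LP (sizes are illustrative):

```python
# Noiseless l1 recovery, min ||f'||_1 s.t. X f' = y, as an LP via f' = u - v.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(7)
N, K, s = 128, 60, 4                        # illustrative sizes
f = np.zeros(N)
f[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
X = rng.standard_normal((K, N))             # Gaussian measurement vectors X_k
y = X @ f                                   # y_k = <f, X_k>

res = linprog(np.ones(2 * N), A_eq=np.hstack([X, -X]), b_eq=y, method="highs")
f_hat = res.x[:N] - res.x[N:]
print("max entrywise error:", np.abs(f_hat - f).max())
```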

6,342 citations

Journal ArticleDOI
TL;DR: This algorithm gives the first substantial progress in approximating MAX CUT in nearly twenty years, and represents the first use of semidefinite programming in the design of approximation algorithms.
Abstract: We present randomized approximation algorithms for the maximum cut (MAX CUT) and maximum 2-satisfiability (MAX 2SAT) problems that always deliver solutions of expected value at least 0.87856 times the optimal value. These algorithms use a simple and elegant technique that randomly rounds the solution to a nonlinear programming relaxation. This relaxation can be interpreted both as a semidefinite program and as an eigenvalue minimization problem. The best previously known approximation algorithms for these problems had performance guarantees of 1/2 for MAX CUT and 3/4 for MAX 2SAT. Slight extensions of our analysis lead to a 0.79607-approximation algorithm for the maximum directed cut problem (MAX DICUT) and a 0.758-approximation algorithm for MAX SAT, where the best previously known approximation algorithms had performance guarantees of 1/4 and 3/4, respectively. Our algorithm gives the first substantial progress in approximating MAX CUT in nearly twenty years, and represents the first use of semidefinite programming in the design of approximation algorithms.
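
A compact sketch of the technique: solve the MAX CUT semidefinite relaxation, factor the solution, and cut with a random hyperplane. The graph weights are arbitrary toy data, and cvxpy is our choice of SDP front end.

```python
# Sketch: MAX CUT semidefinite relaxation + random-hyperplane rounding.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(8)
n = 10
U = np.triu(rng.uniform(size=(n, n)), 1)
W = U + U.T                                  # arbitrary symmetric toy weights

Xv = cp.Variable((n, n), PSD=True)           # relaxation: X_ij = <v_i, v_j>
prob = cp.Problem(cp.Maximize(cp.sum(cp.multiply(W, 1 - Xv)) / 4),
                  [cp.diag(Xv) == 1])
prob.solve()

V = np.linalg.cholesky(Xv.value + 1e-6 * np.eye(n))   # rows are the v_i
signs = np.sign(V @ rng.standard_normal(n))           # random hyperplane cut
cut = W[signs[:, None] != signs[None, :]].sum() / 2
print("rounded cut:", cut, "SDP upper bound:", prob.value)
```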

3,932 citations

Journal ArticleDOI
TL;DR: In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n; the authors ask whether it is possible to estimate β reliably from the noisy data y.
Abstract: In many important statistical applications, the number of variables or parameters p is much larger than the number of observations n. Suppose then that we have observations y = Xβ + z, where β ∈ Rᵖ is a parameter vector of interest, X is a data matrix with possibly far fewer rows than columns, n ≪ p, and the zᵢ's are i.i.d. N(0, σ²). Is it possible to estimate β reliably based on the noisy data y?
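
The cited work answers this question with an ℓ₁-based estimator (the Dantzig selector), which minimizes ‖β‖₁ subject to ‖Xᵀ(y − Xβ)‖∞ ≤ t and can be written as a linear program. The sketch below makes the standard split β = u − v; the threshold choice and problem sizes are our assumptions.

```python
# Dantzig-selector-style LP sketch: min ||b||_1 s.t. ||X^T (y - X b)||_inf <= t.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(9)
n, p, s, sigma = 50, 120, 4, 0.1            # illustrative sizes, n << p regime
beta = np.zeros(p)
beta[rng.choice(p, s, replace=False)] = 1.0
X = rng.standard_normal((n, p)) / np.sqrt(n)      # roughly unit-norm columns
y = X @ beta + sigma * rng.standard_normal(n)

t = sigma * np.sqrt(2 * np.log(p))          # a standard threshold choice
G, b = X.T @ X, X.T @ y
A_ub = np.vstack([np.hstack([G, -G]),       #  X^T X (u - v) <= b + t
                  np.hstack([-G, G])])      # -X^T X (u - v) <= t - b
b_ub = np.concatenate([b + t, t - b])
res = linprog(np.ones(2 * p), A_ub=A_ub, b_ub=b_ub, method="highs")
beta_hat = res.x[:p] - res.x[p:]
print("estimation error:", np.linalg.norm(beta_hat - beta))
```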

3,539 citations

Book ChapterDOI
01 May 2012
TL;DR: This is a tutorial on some basic non-asymptotic methods and concepts in random matrix theory, particularly for the problem of estimating covariance matrices in statistics and for validating probabilistic constructions of measurement matrices in compressed sensing.
Abstract: This is a tutorial on some basic non-asymptotic methods and concepts in random matrix theory. The reader will learn several tools for the analysis of the extreme singular values of random matrices with independent rows or columns. Many of these methods sprang from the development of geometric functional analysis since the 1970s. They have applications in several fields, most notably in theoretical computer science, statistics and signal processing. A few basic applications are covered in this text, particularly for the problem of estimating covariance matrices in statistics and for validating probabilistic constructions of measurement matrices in compressed sensing. These notes are written particularly for graduate students and beginning researchers in different areas, including functional analysts, probabilists, theoretical statisticians, electrical engineers, and theoretical computer scientists.
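
A tiny numerical companion to the covariance-estimation application mentioned above (our toy, with an arbitrary ground-truth covariance): the operator-norm error of the sample covariance decays as the number of samples grows past the dimension.

```python
# Toy: operator-norm error of the sample covariance vs. sample size N.
import numpy as np

rng = np.random.default_rng(10)
n = 50
L = rng.standard_normal((n, n)) / np.sqrt(n)
Sigma = L @ L.T                              # arbitrary ground-truth covariance
for N in (100, 1_000, 10_000):
    samples = rng.standard_normal((N, n)) @ L.T   # rows have covariance Sigma
    err = np.linalg.norm(samples.T @ samples / N - Sigma, 2)
    print(N, f"{err:.3f}")                   # decays roughly like sqrt(n / N)
```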

2,780 citations