Author

Emmanuel J. Candès

Bio: Emmanuel J. Candès is an academic researcher from Stanford University. The author has contributed to research in topics: Convex optimization & Compressed sensing. The author has an h-index of 102, co-authored 262 publications receiving 135,077 citations. Previous affiliations of Emmanuel J. Candès include Samsung & École Normale Supérieure.


Papers
Journal ArticleDOI
TL;DR: In this paper, it was shown that for paths of length $m$ starting at the origin, the hypotheses become distinguishable (in a minimax sense) if $\mu_m\gg1/\sqrt{\log m}$, while they are not if $\mu_m\ll1/\log m$.
Abstract: Consider a graph with a set of vertices and oriented edges connecting pairs of vertices. Each vertex is associated with a random variable and these are assumed to be independent. In this setting, suppose we wish to solve the following hypothesis testing problem: under the null, the random variables have common distribution $N(0,1)$ while under the alternative, there is an unknown path along which random variables have distribution $N(\mu,1)$, $\mu> 0$, and distribution $N(0,1)$ away from it. For which values of the mean shift $\mu$ can one reliably detect and for which values is this impossible? Consider, for example, the usual regular lattice with vertices of the form $\{(i,j) : 0\le i,\ -i\le j\le i,\ j \text{ has the parity of } i\}$ and oriented edges $(i,j)\to (i+1,j+s)$, where $s=\pm1$. We show that for paths of length $m$ starting at the origin, the hypotheses become distinguishable (in a minimax sense) if $\mu_m\gg1/\sqrt{\log m}$, while they are not if $\mu_m\ll1/\log m$. We derive equivalent results in a Bayesian setting where one assumes that all paths are equally likely; there, the asymptotic threshold is $\mu_m\approx m^{-1/4}$. We obtain corresponding results for trees (where the threshold is of order 1 and independent of the size of the tree), for distributions other than the Gaussian and for other graphs. The concept of the predictability profile, first introduced by Benjamini, Pemantle and Peres, plays a crucial role in our analysis.

1 citation

Journal ArticleDOI
01 Dec 2007 - PAMM
TL;DR: It is shown that it is possible to recover the message with nearly the same accuracy as in the setting where no gross errors occur.
Abstract: This article discusses a recently proposed error correction method involving convex optimization [1]. From an encoded and corrupted real-valued message, a receiver would like to determine the original message. A few entries of the encoded message are corrupted arbitrarily (which we call gross errors) and all the entries of the encoded message are corrupted slightly. We show that it is possible to recover the message with nearly the same accuracy as in the setting where no gross errors occur. (© 2008 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
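The following is a rough, hypothetical sketch of the kind of convex decoding the abstract describes, not the paper's own code or its exact program: recover a real-valued message from corrupted linear measurements by minimizing the $\ell_1$ norm of the residual. The coding matrix, dimensions, and noise levels are made-up illustration values.

```python
# Hypothetical l1-decoding sketch (not the paper's method): recover a message x
# from y = A @ x + a few gross corruptions + small noise everywhere.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, m = 128, 512                                   # assumed message / codeword lengths
A = rng.standard_normal((m, n))                   # assumed random coding matrix
x_true = rng.standard_normal(n)
y = A @ x_true + 0.01 * rng.standard_normal(m)    # small noise on every entry
idx = rng.choice(m, size=20, replace=False)
y[idx] += 10 * rng.standard_normal(20)            # a few arbitrarily large (gross) errors

x = cp.Variable(n)
cp.Problem(cp.Minimize(cp.norm1(y - A @ x))).solve()
print(np.linalg.norm(x.value - x_true))           # typically close to the small-noise level
```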

1 citation

Journal ArticleDOI
TL;DR: In this article, the authors use multiple hypothesis testing to evaluate the performance of a black-box model on sensitive subpopulations, for example in recidivism prediction, and to certify that the model performs adequately.
Abstract: Before deploying a black-box model in high-stakes problems, it is important to evaluate the model's performance on sensitive subpopulations. For example, in a recidivism prediction task, we may wish to identify demographic groups for which our prediction model has unacceptably high false positive rates or certify that no such groups exist. In this paper, we frame this task, often referred to as "fairness auditing," in terms of multiple hypothesis testing. We show how the bootstrap can be used to simultaneously bound performance disparities over a collection of groups with statistical guarantees. Our methods can be used to flag subpopulations affected by model underperformance, and certify subpopulations for which the model performs adequately. Crucially, our audit is model-agnostic and applicable to nearly any performance metric or group fairness criterion. Our methods also accommodate extremely rich -- even infinite -- collections of subpopulations. Further, we generalize beyond subpopulations by showing how to assess performance over certain distribution shifts. We test the proposed methods on benchmark datasets in predictive inference and algorithmic fairness and find that our audits can provide interpretable and trustworthy guarantees.
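As a rough illustration only (not the authors' procedure, which handles far richer collections of groups and metrics), the minimal numpy sketch below shows one simple way to bootstrap simultaneous upper bounds on group-wise false positive rates, assuming binary labels, binary predictions, and a fixed finite list of groups. All names and parameters are invented for the example.

```python
# Minimal bootstrap-audit sketch: simultaneous upper bounds on group-wise FPRs.
import numpy as np

def audit_fpr_upper_bounds(y_true, y_pred, groups, alpha=0.05, n_boot=2000, seed=0):
    """Simultaneous (1 - alpha) bootstrap upper bounds on each group's false positive rate."""
    rng = np.random.default_rng(seed)
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    group_ids = np.unique(groups)
    n = len(y_true)

    def fpr_by_group(idx):
        fprs = []
        for g in group_ids:
            neg = (groups[idx] == g) & (y_true[idx] == 0)        # group-g negatives
            fprs.append(y_pred[idx][neg].mean() if neg.any() else np.nan)
        return np.array(fprs)

    point = fpr_by_group(np.arange(n))
    # Bootstrap the largest deviation across groups so that one critical value
    # yields bounds holding simultaneously for every group.
    max_dev = [np.nanmax(point - fpr_by_group(rng.integers(0, n, n)))
               for _ in range(n_boot)]
    crit = np.quantile(max_dev, 1 - alpha)
    return group_ids, point, point + crit
```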

1 citation

01 Jan 2010
TL;DR: Novel results are introduced showing that matrix completion is provably accurate even when the few observed entries are corrupted with a small amount of noise, and that, in practice, nuclear-norm minimization accurately fills in the many missing entries of large low-rank matrices from just a few noisy samples.
Abstract: On the heels of compressed sensing, a new field has very recently emerged. This field addresses a broad range of problems of significant practical interest, namely, the recovery of a data matrix from what appears to be incomplete, and perhaps even corrupted, information. In its simplest form, the problem is to recover a matrix from a small sample of its entries. It comes up in many areas of science and engineering, including collaborative filtering, machine learning, control, remote sensing, and computer vision, to name a few. This paper surveys the novel literature on matrix completion, which shows that under some suitable conditions, one can recover an unknown low-rank matrix from a nearly minimal set of entries by solving a simple convex optimization problem, namely, nuclear-norm minimization subject to data constraints. Further, this paper introduces novel results showing that matrix completion is provably accurate even when the few observed entries are corrupted with a small amount of noise. A typical result is that one can recover an unknown $n \times n$ matrix of low rank $r$ from just about $nr\log^2 n$ noisy samples with an error that is proportional to the noise level. We present numerical results that complement our quantitative analysis and show that, in practice, nuclear-norm minimization accurately fills in the many missing entries of large low-rank matrices from just a few noisy samples. Some analogies between matrix completion and compressed sensing are discussed throughout.
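A minimal sketch of noisy matrix completion by nuclear-norm minimization, in the spirit of the survey above but not its code: the dimensions, rank, sampling rate, and noise level are assumed illustration values, and the residual bound is a crude choice.

```python
# Noisy matrix completion sketch: minimize the nuclear norm subject to fitting
# the observed entries up to the (assumed) noise level.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
n, r = 50, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # rank-r ground truth
mask = (rng.random((n, n)) < 0.4).astype(float)                  # pattern of observed entries
Y = M + 0.05 * rng.standard_normal((n, n))                       # noisy observations

X = cp.Variable((n, n))
delta = 0.05 * np.sqrt(mask.sum())       # rough bound on the noise energy over observed entries
cp.Problem(cp.Minimize(cp.normNuc(X)),
           [cp.norm(cp.multiply(mask, X - Y), "fro") <= delta]).solve()
print(np.linalg.norm(X.value - M, "fro") / np.linalg.norm(M, "fro"))   # relative error
```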

1 citation

Posted Content
TL;DR: The sequential conditional randomization test (CRT) as discussed by the authors is a variable selection procedure that combines the CRT and Selective SeqStep+ to produce a list of discoveries.
Abstract: This paper introduces the sequential CRT, which is a variable selection procedure that combines the conditional randomization test (CRT) and Selective SeqStep+. Valid p-values are constructed via the flexible CRT, which are then ordered and passed through the Selective SeqStep+ filter to produce a list of discoveries. We develop theory guaranteeing control on the false discovery rate (FDR) even though the p-values are not independent. We show in simulations that our novel procedure indeed controls the FDR and is competitive with -- and sometimes outperforms -- state-of-the-art alternatives in terms of power. Finally, we apply our methodology to a breast cancer dataset with the goal of identifying biomarkers associated with cancer stage.
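As a rough illustration of the filtering stage only, the sketch below implements a Selective SeqStep+-style rule applied to an already-ordered list of p-values (the CRT step that produces and orders the p-values is omitted). The threshold c and level q are generic choices, not the paper's recommendations, and the exact formula here follows the commonly cited Barber-Candès form rather than anything specific to this paper.

```python
# Selective SeqStep+-style filter sketch: scan ordered p-values, estimate the FDP,
# and reject the small p-values among the first k_hat positions.
import numpy as np

def selective_seqstep_plus(p_ordered, q=0.1, c=0.5):
    """Return indices (in the given ordering) of the rejected hypotheses."""
    p = np.asarray(p_ordered)
    k_hat = 0
    for k in range(1, len(p) + 1):
        ratio = (1 + np.sum(p[:k] > c)) / max(np.sum(p[:k] <= c), 1)
        if ratio <= q * (1 - c) / c:      # estimated FDP at position k is below q
            k_hat = k
    if k_hat == 0:
        return np.array([], dtype=int)
    return np.flatnonzero(p[:k_hat] <= c)
```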

1 citation


Cited by
Book
D.L. Donoho
01 Jan 2004
TL;DR: It is possible to design $n = O(N\log(m))$ nonadaptive measurements allowing reconstruction with accuracy comparable to that attainable with direct knowledge of the $N$ most important coefficients, and a good approximation to those $N$ important coefficients is extracted from the $n$ measurements by solving a linear program (Basis Pursuit in signal processing).
Abstract: Suppose $x$ is an unknown vector in $\mathbb{R}^m$ (a digital image or signal); we plan to measure $n$ general linear functionals of $x$ and then reconstruct. If $x$ is known to be compressible by transform coding with a known transform, and we reconstruct via the nonlinear procedure defined here, the number of measurements $n$ can be dramatically smaller than the size $m$. Thus, certain natural classes of images with $m$ pixels need only $n = O(m^{1/4}\log^{5/2}(m))$ nonadaptive nonpixel samples for faithful recovery, as opposed to the usual $m$ pixel samples. More specifically, suppose $x$ has a sparse representation in some orthonormal basis (e.g., wavelet, Fourier) or tight frame (e.g., curvelet, Gabor), so the coefficients belong to an $\ell_p$ ball for $0 < p \le 1$.
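The linear program mentioned in the TL;DR is Basis Pursuit: minimize the $\ell_1$ norm of the coefficients subject to the measurement constraints. Below is a small, self-contained sketch of that recovery on a synthetic sparse vector; the sizes, sparsity level, and random Gaussian measurement matrix are assumptions for illustration, not the paper's construction.

```python
# Basis Pursuit sketch: exact recovery of a sparse vector from few random measurements.
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n, k = 256, 64, 8                       # signal length, number of measurements, sparsity
A = rng.standard_normal((n, m)) / np.sqrt(n)   # assumed random measurement matrix
x_true = np.zeros(m)
x_true[rng.choice(m, k, replace=False)] = rng.standard_normal(k)
y = A @ x_true

x = cp.Variable(m)
cp.Problem(cp.Minimize(cp.norm1(x)), [A @ x == y]).solve()
print(np.max(np.abs(x.value - x_true)))    # typically near solver precision for small k
```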

18,609 citations

Book
23 May 2011
TL;DR: It is argued that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas.
Abstract: Many problems of recent interest in statistics and machine learning can be posed in the framework of convex optimization. Due to the explosion in size and complexity of modern datasets, it is increasingly important to be able to solve problems with a very large number of features or training examples. As a result, both the decentralized collection or storage of these datasets as well as accompanying distributed solution methods are either necessary or at least highly desirable. In this review, we argue that the alternating direction method of multipliers is well suited to distributed convex optimization, and in particular to large-scale problems arising in statistics, machine learning, and related areas. The method was developed in the 1970s, with roots in the 1950s, and is equivalent or closely related to many other algorithms, such as dual decomposition, the method of multipliers, Douglas–Rachford splitting, Spingarn's method of partial inverses, Dykstra's alternating projections, Bregman iterative algorithms for l1 problems, proximal methods, and others. After briefly surveying the theory and history of the algorithm, we discuss applications to a wide variety of statistical and machine learning problems of recent interest, including the lasso, sparse logistic regression, basis pursuit, covariance selection, support vector machines, and many others. We also discuss general distributed optimization, extensions to the nonconvex setting, and efficient implementation, including some details on distributed MPI and Hadoop MapReduce implementations.
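One of the worked examples in that review is the lasso. Below is a compact numpy sketch of ADMM applied to the lasso, splitting the smooth least-squares term from the $\ell_1$ penalty; the penalty parameter rho, regularization lambda, and iteration count are assumed illustration values, not recommendations from the review.

```python
# ADMM sketch for the lasso: minimize 0.5*||A x - b||^2 + lam*||z||_1 subject to x = z.
import numpy as np

def lasso_admm(A, b, lam, rho=1.0, n_iter=200):
    m, n = A.shape
    x = np.zeros(n)
    z = np.zeros(n)
    u = np.zeros(n)                              # scaled dual variable
    AtA_rhoI = A.T @ A + rho * np.eye(n)         # factor reused in every x-update
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(AtA_rhoI, Atb + rho * (z - u))                 # x-update (ridge-like solve)
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)    # z-update (soft-thresholding)
        u = u + x - z                                                       # dual ascent step
    return z
```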

17,433 citations

Journal ArticleDOI
TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods and can handle large problems and can also deal efficiently with sparse features.
Abstract: We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems, while the penalties include $\ell_1$ (the lasso), $\ell_2$ (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
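To make the core update concrete, here is a toy sketch of cyclical coordinate descent for the plain lasso, illustrating the kind of one-coordinate soft-thresholding update the abstract refers to. It is not the glmnet implementation (no regularization path, warm starts, or sparse-matrix handling), and it assumes columns of X are non-degenerate.

```python
# Cyclical coordinate descent sketch for the lasso: 0.5*||y - X b||^2 + lam*||b||_1.
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=100):
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)          # per-column squared norms
    r = y.copy()                           # running residual y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]         # remove coordinate j's contribution
            rho = X[:, j] @ r              # correlation with the partial residual
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]   # soft-threshold
            r -= X[:, j] * beta[j]         # add the updated contribution back
    return beta
```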

13,656 citations

Journal Article
TL;DR: This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment.
Abstract: THE DESIGN AND ANALYSIS OF EXPERIMENTS. By Oscar Kempthorne. New York, John Wiley and Sons, Inc., 1952. 631 pp. $8.50. This book by a teacher of statistics (as well as a consultant for "experimenters") is a comprehensive study of the philosophical background for the statistical design of experiment. It is necessary to have some facility with algebraic notation and manipulation to be able to use the volume intelligently. The problems are presented from the theoretical point of view, without such practical examples as would be helpful for those not acquainted with mathematics. The mathematical justification for the techniques is given. As a somewhat advanced treatment of the design and analysis of experiments, this volume will be interesting and helpful for many who approach statistics theoretically as well as practically. With emphasis on the "why," and with description given broadly, the author relates the subject matter to the general theory of statistics and to the general problem of experimental inference. MARGARET J. ROBERTSON

13,333 citations

Journal ArticleDOI
TL;DR: A new fast iterative shrinkage-thresholding algorithm (FISTA) which preserves the computational simplicity of ISTA but with a global rate of convergence which is proven to be significantly better, both theoretically and practically.
Abstract: We consider the class of iterative shrinkage-thresholding algorithms (ISTA) for solving linear inverse problems arising in signal/image processing. This class of methods, which can be viewed as an extension of the classical gradient algorithm, is attractive due to its simplicity and thus is adequate for solving large-scale problems even with dense matrix data. However, such methods are also known to converge quite slowly. In this paper we present a new fast iterative shrinkage-thresholding algorithm (FISTA) which preserves the computational simplicity of ISTA but with a global rate of convergence which is proven to be significantly better, both theoretically and practically. Initial promising numerical results for wavelet-based image deblurring demonstrate the capabilities of FISTA which is shown to be faster than ISTA by several orders of magnitude.
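The accelerated scheme described above is easy to state in a few lines. The sketch below applies FISTA to the lasso as a representative linear inverse problem; the step size is set from the spectral norm of A, and the regularization weight and iteration count are assumed illustration values, not the paper's settings.

```python
# FISTA sketch for the lasso: ISTA (gradient + soft-threshold) step at an
# extrapolated point, with the classical momentum sequence t_k.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fista_lasso(A, b, lam, n_iter=300):
    L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    y, t = x.copy(), 1.0
    for _ in range(n_iter):
        x_new = soft_threshold(y - A.T @ (A @ y - b) / L, lam / L)   # proximal gradient step at y
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = x_new + (t - 1) / t_new * (x_new - x)                    # momentum extrapolation
        x, t = x_new, t_new
    return x
```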

11,413 citations