Author

Prateek Jain

Bio: Prateek Jain is an academic researcher from Google. The author has contributed to research in topics: Matrix (mathematics) & Computer science. The author has an h-index of 51 and has co-authored 173 publications receiving 11,698 citations. Previous affiliations of Prateek Jain include Microsoft & University of California, Berkeley.


Papers
Proceedings ArticleDOI
20 Jun 2007
TL;DR: An information-theoretic approach to learning a Mahalanobis distance function that handles a wide variety of constraints and can optionally incorporate a prior on the distance function; an online version with regret bounds is also derived.
Abstract: In this paper, we present an information-theoretic approach to learning a Mahalanobis distance function. We formulate the problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the distance function. We express this problem as a particular Bregman optimization problem---that of minimizing the LogDet divergence subject to linear constraints. Our resulting algorithm has several advantages over existing methods. First, our method can handle a wide variety of constraints and can optionally incorporate a prior on the distance function. Second, it is fast and scalable. Unlike most existing methods, no eigenvalue computations or semi-definite programming are required. We also present an online version and derive regret bounds for the resulting algorithm. Finally, we evaluate our method on a recent error reporting system for software called Clarify, in the context of metric learning for nearest neighbor classification, as well as on standard data sets.
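
To make the update concrete, the following is a minimal NumPy sketch of the rank-one Bregman (LogDet) projection loop described in the abstract, reconstructed from the algorithm as published. It assumes an identity prior A0 = I and a fixed slack parameter gamma; the function name itml_sketch and the sweep schedule are illustrative, not the paper's reference implementation.

```python
import numpy as np

def itml_sketch(X, constraints, u, l, gamma=1.0, n_sweeps=50):
    """Sketch of the ITML Bregman-projection loop (Davis et al., 2007).

    X           : (n, d) data matrix.
    constraints : list of (i, j, delta) with delta = +1 for "similar"
                  (target distance <= u) and -1 for "dissimilar" (>= l).
    Returns a learned Mahalanobis matrix A, starting from the prior A0 = I.
    """
    n, d = X.shape
    A = np.eye(d)                      # identity prior: regularize toward Euclidean
    lam = np.zeros(len(constraints))   # one dual variable per constraint
    xi = np.array([u if delta == 1 else l for (_, _, delta) in constraints],
                  dtype=float)         # slack-adjusted distance targets

    for _ in range(n_sweeps):
        for c, (i, j, delta) in enumerate(constraints):
            z = X[i] - X[j]
            p = z @ A @ z              # current Mahalanobis distance d_A(x_i, x_j)
            if p < 1e-12:
                continue
            alpha = min(lam[c], delta / 2.0 * (1.0 / p - gamma / xi[c]))
            beta = delta * alpha / (1.0 - delta * alpha * p)
            xi[c] = gamma * xi[c] / (gamma + delta * alpha * xi[c])
            lam[c] -= alpha
            Az = A @ z
            A += beta * np.outer(Az, Az)   # rank-one LogDet update: A + beta*A z z^T A
    return A
```

Each projection touches a single constraint and updates A with a rank-one term, which is why no eigenvalue computation or semidefinite programming is needed.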

2,058 citations

Proceedings ArticleDOI
01 Jun 2013
TL;DR: This paper presents one of the first theoretical analyses of the performance of alternating minimization for matrix completion and the related problem of matrix sensing, showing that alternating minimization guarantees faster convergence to the true matrix while allowing a significantly simpler analysis.
Abstract: Alternating minimization represents a widely applicable and empirically successful approach for finding low-rank matrices that best fit the given data. For example, for the problem of low-rank matrix completion, this method is believed to be one of the most accurate and efficient, and formed a major component of the winning entry in the Netflix Challenge [17]. In the alternating minimization approach, the low-rank target matrix is written in a bi-linear form, i.e. X = UV†; the algorithm then alternates between finding the best U and the best V. Typically, each alternating step in isolation is convex and tractable. However, the overall problem becomes non-convex and is prone to local minima. In fact, there has been almost no theoretical understanding of when this approach yields a good result. In this paper we present one of the first theoretical analyses of the performance of alternating minimization for matrix completion, and the related problem of matrix sensing. For both these problems, celebrated recent results have shown that they become well-posed and tractable once certain (now standard) conditions are imposed on the problem. We show that alternating minimization also succeeds under similar conditions. Moreover, compared to existing results, our paper shows that alternating minimization guarantees faster (in particular, geometric) convergence to the true matrix, while allowing a significantly simpler analysis.
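
As a sketch of the method being analyzed: with V fixed, each row of U solves an ordinary least-squares problem over that row's observed entries, and symmetrically for V. The NumPy sketch below uses a random initialization for brevity; the analyzed variant initializes U from an SVD of the observed matrix and uses fresh samples in each iteration. Names are illustrative.

```python
import numpy as np

def altmin_complete(M_obs, mask, rank, n_iters=50, seed=0):
    """Sketch of alternating minimization for low-rank matrix completion.

    M_obs : (m, n) matrix holding the observed entries (zeros elsewhere).
    mask  : (m, n) boolean array, True where an entry is observed.
    rank  : target rank k; the estimate is written as X = U V^T.
    """
    m, n = M_obs.shape
    U = np.random.default_rng(seed).standard_normal((m, rank))
    V = np.zeros((n, rank))

    for _ in range(n_iters):
        # Fix U, solve one least-squares problem per column of V (convex step).
        for j in range(n):
            rows = mask[:, j]
            if rows.any():
                V[j], *_ = np.linalg.lstsq(U[rows], M_obs[rows, j], rcond=None)
        # Fix V, solve one least-squares problem per row of U.
        for i in range(m):
            cols = mask[i, :]
            if cols.any():
                U[i], *_ = np.linalg.lstsq(V[cols], M_obs[i, cols], rcond=None)
    return U @ V.T
```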

1,072 citations

Journal ArticleDOI
22 Jun 2015
TL;DR: In this paper, the authors show that a resampling version of the alternating minimization algorithm converges geometrically to the solution of a non-convex phase retrieval problem.
Abstract: Phase retrieval problems involve solving linear equations, but with missing sign (or phase, for complex numbers) information. More than four decades after it was first proposed, the seminal error reduction algorithm of Gerchberg and Saxton, and of Fienup, is still the popular choice for solving many variants of this problem. The algorithm is based on alternating minimization; i.e., it alternates between estimating the missing phase information and the candidate solution. Despite its wide usage in practice, no global convergence guarantees for this algorithm are known. In this paper, we show that a (resampling) variant of this approach converges geometrically to the solution of one such problem—finding a vector ${\bf x}$ from ${\bf y}, {\bf A}$, where ${\bf y} = \vert {\bf A}^T{\bf x}\vert$ and $\vert{\bf z}\vert$ denotes a vector of element-wise magnitudes of ${\bf z}$—under the assumption that ${\bf A}$ is Gaussian. Empirically, we demonstrate that alternating minimization performs similarly to recently proposed convex techniques for this problem (which are based on "lifting" to a convex matrix problem) in sample complexity and robustness to noise. However, it is much more efficient and can scale to large problems. Analytically, for a resampling version of alternating minimization, we show geometric convergence to the solution, and sample complexity that is off by log factors from obvious lower bounds. We also establish close to optimal scaling for the case when the unknown vector is sparse. Our work represents the first theoretical guarantee for alternating minimization (albeit with resampling) for any variant of phase retrieval problems in the non-convex setting.
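
A minimal sketch of the real-valued alternating minimization loop, assuming NumPy and a standard spectral initialization: alternate between estimating the missing signs and solving the resulting linear least-squares problem. Unlike the resampled variant analyzed in the paper, this sketch reuses the same measurements at every iteration, which is the common practical choice; recovery is up to a global sign.

```python
import numpy as np

def altmin_phase(A, y, n_iters=100):
    """Sketch of alternating minimization for real phase retrieval:
    recover x from y = |A x|, with A an (m, d) Gaussian measurement matrix.
    """
    m, d = A.shape
    # Spectral initialization: top eigenvector of (1/m) * sum_i y_i^2 a_i a_i^T,
    # scaled so that ||x|| roughly matches sqrt(mean(y^2)).
    S = (A * (y ** 2)[:, None]).T @ A / m
    _, eigvecs = np.linalg.eigh(S)
    x = eigvecs[:, -1] * np.sqrt(np.mean(y ** 2))

    for _ in range(n_iters):
        c = np.sign(A @ x)                              # estimate the missing signs
        x, *_ = np.linalg.lstsq(A, c * y, rcond=None)   # best x given those signs
    return x
```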

459 citations

Proceedings Article
06 Dec 2010
TL;DR: Singular value projection (SVP), as discussed by the authors, is a simple and fast algorithm for rank minimization under affine constraints (ARMP); SVP is shown to recover the minimum-rank solution for affine constraints that satisfy a restricted isometry property (RIP).
Abstract: Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton-step for our SVP framework to speed-up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP - the problem of low-rank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem.
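
A minimal sketch of SVP specialized to matrix completion, assuming NumPy: a gradient step on the observed-entry residual followed by projection onto rank-r matrices via truncated SVD. The fixed step size and iteration count are illustrative (a common heuristic sets the step near 1/p, with p the fraction of observed entries), and the Newton acceleration from the paper is omitted.

```python
import numpy as np

def svp_complete(M_obs, mask, rank, step=1.0, n_iters=200):
    """Sketch of Singular Value Projection (SVP) for matrix completion,
    where the affine constraint is 'agree with M on the observed entries'.
    """
    X = np.zeros_like(M_obs, dtype=float)
    for _ in range(n_iters):
        # Gradient of 0.5 * || P_Omega(X) - P_Omega(M) ||_F^2.
        grad = np.where(mask, X - M_obs, 0.0)
        Y = X - step * grad
        # Project onto the set of rank-<=r matrices via truncated SVD.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return X
```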

445 citations

Posted Content
TL;DR: Results show that the SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem.
Abstract: Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization with affine constraints (ARMP), show that SVP recovers the minimum rank solution for affine constraints that satisfy the "restricted isometry property" (RIP), and show robustness of our method to noise. Our results improve upon a recent breakthrough by Recht, Fazel and Parrilo (RFP07) and Lee and Bresler (LB09) in three significant ways: 1) our method (SVP) is significantly simpler to analyze and easier to implement, 2) we give recovery guarantees under strictly weaker isometry assumptions, and 3) we give geometric convergence guarantees for SVP even in the presence of noise; as demonstrated empirically, SVP is also significantly faster on real-world and synthetic problems. In addition, we address the practically important problem of low-rank matrix completion (MCP), which can be seen as a special case of ARMP. We empirically demonstrate that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We make partial progress towards proving exact recovery and provide some intuition for the strong performance of SVP applied to matrix completion by showing a more restricted isometry property. Our algorithm outperforms existing methods, such as those of \cite{RFP07,CR08,CT09,CCS08,KOM09,LB09}, for ARMP and the matrix-completion problem by an order of magnitude and is also significantly more robust to noise.

412 citations


Cited by
Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).

13,246 citations

Christopher M. Bishop1
01 Jan 2006
TL;DR: Probability distributions and linear models for regression and classification are presented in this book, along with a discussion of combining models in the context of machine learning and classification.
Abstract: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Book
01 Jan 2009

8,216 citations