
Showing papers by "Lawrence K. Saul published in 2006"


Journal Article
TL;DR: An algorithm for unsupervised learning of image manifolds by semidefinite programming, which computes a low dimensional representation of each image such that distances between nearby images are preserved.
Abstract: Can we detect low dimensional structure in high dimensional data sets of images? In this paper, we propose an algorithm for unsupervised learning of image manifolds by semidefinite programming. Given a data set of images, our algorithm computes a low dimensional representation of each image with the property that distances between nearby images are preserved. More generally, it can be used to analyze high dimensional data that lies on or near a low dimensional manifold. We illustrate the algorithm on easily visualized examples of curves and surfaces, as well as on actual images of faces, handwritten digits, and solid objects.

590 citations
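
To make the construction concrete, here is a minimal sketch of the semidefinite program described in the abstract above. The function name mvu, the neighborhood size k, and the use of numpy, scikit-learn, and cvxpy are illustrative assumptions, not the authors' implementation; the dense SDP is practical only for small data sets.

import numpy as np
import cvxpy as cp
from sklearn.neighbors import NearestNeighbors

def mvu(X, k=4, dim=2):
    # Illustrative sketch, not the authors' code: embed the rows of the
    # (n, d) array X by maximizing variance under local distance constraints.
    n = len(X)
    _, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)

    K = cp.Variable((n, n), PSD=True)   # Gram matrix of the output vectors
    constraints = [cp.sum(K) == 0]      # center the outputs on the origin
    for i in range(n):
        for j in idx[i, 1:]:            # skip idx[i, 0], the point itself
            # Preserve the squared distance between neighbors i and j:
            # |y_i - y_j|^2 = K_ii - 2 K_ij + K_jj
            d2 = float(np.sum((X[i] - X[j]) ** 2))
            constraints.append(K[i, i] - 2 * K[i, j] + K[j, j] == d2)

    # Maximize the total variance, trace(K); low rank solutions emerge.
    cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()

    # Top eigenvectors of K give the low dimensional representation.
    w, V = np.linalg.eigh(K.value)
    return V[:, -dim:] * np.sqrt(np.maximum(w[-dim:], 0))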


Proceedings Article
16 Jul 2006
TL;DR: A recently proposed algorithm--maximum variance unfolding--for learning faithful low dimensional representations of high dimensional data, which relies on modern tools in convex optimization that are proving increasingly useful in many areas of machine learning.
Abstract: Many problems in AI are simplified by clever representations of sensory or symbolic input. How to discover such representations automatically, from large amounts of unlabeled data, remains a fundamental challenge. The goal of statistical methods for dimensionality reduction is to detect and discover low dimensional structure in high dimensional data. In this paper, we review a recently proposed algorithm--maximum variance unfolding--for learning faithful low dimensional representations of high dimensional data. The algorithm relies on modern tools in convex optimization that are proving increasingly useful in many areas of machine learning.

285 citations
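
For reference, the optimization reviewed in this paper can be stated as a semidefinite program over the Gram matrix K of the output vectors. The formulation below follows standard presentations of maximum variance unfolding; the symbols are chosen here for exposition:

\begin{align*}
\text{maximize}   \quad & \operatorname{tr}(K) \\
\text{subject to} \quad & K \succeq 0, \qquad \sum_{ij} K_{ij} = 0, \\
                        & K_{ii} - 2K_{ij} + K_{jj} = \|x_i - x_j\|^2
                          \quad \text{for all neighboring pairs } (i, j).
\end{align*}

The centering constraint fixes the translational degree of freedom, maximizing the trace "unfolds" the data set, and the low dimensional representation is read off from the top eigenvectors of the optimal K.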


Book Chapter
01 Jan 2006
TL;DR: This chapter provides an overview of unsupervised learning algorithms that can be viewed as spectral methods for linear and nonlinear dimensionality reduction and manifold learning.
Abstract: How can we search for low dimensional structure in high dimensional data? If the data is mainly confined to a low dimensional subspace, then simple linear methods can be used to discover the subspace and estimate its dimensionality. More generally, though, if the data lies on (or near) a low dimensional submanifold, then its structure may be highly nonlinear, and linear methods are bound to fail. Spectral methods have recently emerged as a powerful tool for nonlinear dimensionality reduction and manifold learning. These methods are able to reveal low dimensional structure in high dimensional data from the top or bottom eigenvectors of specially constructed matrices. To analyze data that lies on a low dimensional submanifold, the matrices are constructed from sparse weighted graphs whose vertices represent input patterns and whose edges indicate neighborhood relations. The main computations for manifold learning are based on tractable, polynomial-time optimizations, such as shortest path problems, least squares fits, semidefinite programming, and matrix diagonalization. This chapter provides an overview of unsupervised learning algorithms that can be viewed as spectral methods for linear and nonlinear dimensionality reduction.

278 citations
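
To make the recipe above concrete, here is a minimal sketch of one of the surveyed methods, Laplacian eigenmaps. The adjacency matrix W, the function name, and the dense eigensolver are simplifying assumptions, not code from the chapter.

import numpy as np
from scipy.sparse import csgraph

def laplacian_eigenmaps(W, dim=2):
    # W: symmetric (n, n) adjacency matrix of the neighborhood graph,
    # with positive weights on edges and zeros elsewhere.
    L = csgraph.laplacian(W)        # graph Laplacian L = D - W
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # The bottom eigenvector is constant (eigenvalue 0); the next
    # `dim` eigenvectors give the low dimensional embedding.
    return vecs[:, 1:dim + 1]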


Proceedings Article
04 Dec 2006
TL;DR: This work proposes a learning algorithm based on the goal of margin maximization in continuous density hidden Markov models for automatic speech recognition (ASR) using Gaussian mixture models, and obtains competitive results for phonetic recognition on the TIMIT speech corpus.
Abstract: We study the problem of parameter estimation in continuous density hidden Markov models (CD-HMMs) for automatic speech recognition (ASR). As in support vector machines, we propose a learning algorithm based on the goal of margin maximization. Unlike earlier work on max-margin Markov networks, our approach is specifically geared to the modeling of real-valued observations (such as acoustic feature vectors) using Gaussian mixture models. Unlike previous discriminative frameworks for ASR, such as maximum mutual information and minimum classification error, our framework leads to a convex optimization, without any spurious local minima. The objective function for large margin training of CD-HMMs is defined over a parameter space of positive semidefinite matrices. Its optimization can be performed efficiently with simple gradient-based methods that scale well to large problems. We obtain competitive results for phonetic recognition on the TIMIT speech corpus.

209 citations
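
The margin constraints can be written schematically as follows (a paraphrase based on the abstract, with symbols chosen here for illustration rather than taken verbatim from the paper): for an utterance with acoustic sequence \(\bar{x}\) and target state sequence \(\bar{y}\), the discriminant function \(D\) is required to satisfy

\[
D(\bar{x}, \bar{y}) - D(\bar{x}, \bar{s}) \;\ge\; \mathcal{H}(\bar{y}, \bar{s})
\qquad \text{for all state sequences } \bar{s} \neq \bar{y},
\]

where \(\mathcal{H}(\bar{y}, \bar{s})\) is the Hamming distance between the two sequences, so the required margin grows with the number of mislabeled states. Relaxing these constraints with slack penalties and reparameterizing the Gaussian mixture parameters as positive semidefinite matrices yields a convex objective of the kind described in the abstract.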


Proceedings Article
14 May 2006
TL;DR: A framework for large margin classification by Gaussian mixture models (GMMs), which have many parallels to support vector machines (SVMs) but use ellipsoids to model classes instead of half-spaces is developed.
Abstract: We develop a framework for large margin classification by Gaussian mixture models (GMMs). Large margin GMMs have many parallels to support vector machines (SVMs) but use ellipsoids to model classes instead of half-spaces. Model parameters are trained discriminatively to maximize the margin of correct classification, as measured in terms of Mahalanobis distances. The required optimization is convex over the model's parameter space of positive semidefinite matrices and can be performed efficiently. Large margin GMMs are naturally suited to large problems in multiway classification; we apply them to phonetic classification and recognition on the TIMIT database. On both tasks, we obtain significant improvement over baseline systems trained by maximum likelihood estimation. For the problem of phonetic classification, our results are competitive with other state-of-the-art classifiers, such as hidden conditional random fields.

152 citations
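
As a rough illustration of the decision rule only (the training procedure is what the paper contributes and is omitted here), a large margin GMM with one ellipsoid per class scores an input by Mahalanobis distance and picks the smallest score; the names below are hypothetical.

import numpy as np

def classify(x, centroids, precisions, offsets):
    # centroids[c]:  mean vector mu_c of class c's ellipsoid
    # precisions[c]: positive semidefinite matrix Psi_c (trained
    #                discriminatively to maximize the margin)
    # offsets[c]:    scalar offset theta_c
    scores = [
        (x - mu) @ psi @ (x - mu) + theta   # Mahalanobis distance + offset
        for mu, psi, theta in zip(centroids, precisions, offsets)
    ]
    return int(np.argmin(scores))           # class with the smallest score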


Proceedings Article
04 Dec 2006
TL;DR: This paper shows how to solve very large semidefinite programs with low rank solutions by a matrix factorization that leads to much smaller SDPs than those previously studied, and illustrates the approach on localization in large scale sensor networks, where optimizations involving tens of thousands of nodes can be solved in just a few minutes.
Abstract: In many areas of science and engineering, the problem arises how to discover low dimensional representations of high dimensional data. Recently, a number of researchers have converged on common solutions to this problem using methods from convex optimization. In particular, many results have been obtained by constructing semidefinite programs (SDPs) with low rank solutions. While the rank of matrix variables in SDPs cannot be directly constrained, it has been observed that low rank solutions emerge naturally by computing high variance or maximal trace solutions that respect local distance constraints. In this paper, we show how to solve very large problems of this type by a matrix factorization that leads to much smaller SDPs than those previously studied. The matrix factorization is derived by expanding the solution of the original problem in terms of the bottom eigenvectors of a graph Laplacian. The smaller SDPs obtained from this matrix factorization yield very good approximations to solutions of the original problem. Moreover, these approximations can be further refined by conjugate gradient descent. We illustrate the approach on localization in large scale sensor networks, where optimizations involving tens of thousands of nodes can be solved in just a few minutes.

128 citations
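
A minimal sketch of the factorization described in the abstract, under the assumption that the original problem is an SDP over an n x n Gram matrix K with local distance constraints; the library choices and names are illustrative.

import numpy as np
from scipy.sparse import csgraph

def laplacian_basis(W, m=10):
    # W: symmetric adjacency matrix of the neighborhood (or sensor) graph.
    L = csgraph.laplacian(W)
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, :m]              # bottom m eigenvectors form the basis Q

# With the expansion K = Q P Q^T, a local distance constraint such as
#   K_ii - 2 K_ij + K_jj = d_ij^2
# becomes (q_i - q_j) P (q_i - q_j)^T = d_ij^2, where q_i is row i of Q.
# The SDP over the small PSD matrix P has only m(m+1)/2 unknowns, with
# m much smaller than n, and its solution can be refined afterward by
# conjugate gradient descent, as the paper describes.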


Journal Article
TL;DR: A matrix factorization model for representing and predicting distances in large-scale networks is presented that, unlike previous approaches, can capture suboptimal and asymmetric routing policies; on top of this model, a scalable system is designed and implemented that predicts large numbers of network distances from limited samples of Internet measurements.
Abstract: The responsiveness of networked applications is limited by communications delays, making network distance an important parameter in optimizing the choice of communications peers. Since accurate global snapshots are difficult and expensive to gather and maintain, it is desirable to use sampling techniques in the Internet to predict unknown network distances from a set of partially observed measurements. This paper makes three contributions. First, we present a model for representing and predicting distances in large-scale networks by matrix factorization, which can model suboptimal and asymmetric routing policies, an improvement on previous approaches. Second, we describe two algorithms--singular value decomposition and non-negative matrix factorization--for representing a matrix of network distances as the product of two smaller matrices. Third, based on our model and algorithms, we have designed and implemented a scalable system--the Internet Distance Estimation Service (IDES)--that predicts large numbers of network distances from limited samples of Internet measurements. Extensive simulations on real-world data sets show that IDES leads to more accurate, efficient, and robust predictions of latencies in large-scale networks than existing approaches.

123 citations
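
A minimal sketch of the matrix factorization model behind this approach, using plain numpy and assuming, for clarity, a fully observed distance matrix; the real system must estimate the factors from partial measurements, and the function names are hypothetical.

import numpy as np

def factorize_distances(D, rank=8):
    # Approximate the (possibly asymmetric) distance matrix D as the
    # product of two smaller matrices, D ~= X @ Y.T, via truncated SVD.
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    X = U[:, :rank] * s[:rank]   # "outgoing" vector for each host
    Y = Vt[:rank].T              # "incoming" vector for each host
    return X, Y

def predict(X, Y, i, j):
    # Predicted network distance from host i to host j: a dot product.
    return float(X[i] @ Y[j])

Because X and Y are separate factors, the model can represent asymmetric distances (D[i][j] != D[j][i]), which is what lets it capture asymmetric routing policies that a symmetric embedding into a metric space cannot.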