
Showing papers by "Kilian Q. Weinberger" published in 2004


Proceedings Article
04 Jul 2004
TL;DR: This work investigates how to learn a kernel matrix for high dimensional data that lies on or near a low dimensional manifold and shows how to discover a mapping that "unfolds" the underlying manifold from which the data was sampled.
Abstract: We investigate how to learn a kernel matrix for high dimensional data that lies on or near a low dimensional manifold. Noting that the kernel matrix implicitly maps the data into a nonlinear feature space, we show how to discover a mapping that "unfolds" the underlying manifold from which the data was sampled. The kernel matrix is constructed by maximizing the variance in feature space subject to local constraints that preserve the angles and distances between nearest neighbors. The main optimization involves an instance of semidefinite programming---a fundamentally different computation than previous algorithms for manifold learning, such as Isomap and locally linear embedding. The optimized kernels perform better than polynomial and Gaussian kernels for problems in manifold learning, but worse for problems in large margin classification. We explain these results in terms of the geometric properties of different kernels and comment on various interpretations of other manifold learning algorithms as kernel methods.
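The optimization described in this abstract can be sketched compactly. Below is a minimal, illustrative Python implementation of the kernel learning step, assuming the cvxpy package as the semidefinite programming interface; the function name, the neighborhood size, and the use of only point-to-neighbor distance constraints are simplifications (the paper also constrains distances among neighbors of a common point, which is what preserves angles), not the authors' exact formulation.

```python
# Hedged sketch of the "maximum variance" kernel learning step, assuming cvxpy.
import numpy as np
import cvxpy as cp

def unfold_kernel(X, k=4):
    """Learn a kernel matrix that maximizes variance in feature space
    while preserving distances to each point's k nearest neighbors."""
    n = X.shape[0]
    D = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    neighbors = np.argsort(D, axis=1)[:, 1:k + 1]               # k nearest neighbors per point

    K = cp.Variable((n, n), PSD=True)         # kernel matrix, constrained to be PSD
    constraints = [cp.sum(K) == 0]            # center the embedding at the origin
    for i in range(n):
        for j in neighbors[i]:
            # preserve local geometry: ||phi(x_i) - phi(x_j)||^2 = ||x_i - x_j||^2
            constraints.append(K[i, i] - 2 * K[i, j] + K[j, j] == D[i, j])

    # maximize total variance in feature space (trace of the centered kernel)
    cp.Problem(cp.Maximize(cp.trace(K)), constraints).solve()

    # embedding from the top eigenvectors of the learned kernel, as in kernel PCA
    vals, vecs = np.linalg.eigh(K.value)
    return vecs[:, ::-1] * np.sqrt(np.maximum(vals[::-1], 0))
```

The low dimensional coordinates are then read off from the leading columns of the returned matrix, in the same way an embedding is obtained from any kernel matrix in kernel PCA.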

561 citations


Proceedings Article
27 Jun 2004
TL;DR: The proposed algorithm can be used to analyze high dimensional data that lies on or near a low dimensional manifold, and overcomes certain limitations of previous work in manifold learning, such as Isomap and locally linear embedding.
Abstract: Can we detect low dimensional structure in high dimensional data sets of images and video? The problem of dimensionality reduction arises often in computer vision and pattern recognition. In this paper, we propose a new solution to this problem based on semidefinite programming. Our algorithm can be used to analyze high dimensional data that lies on or near a low dimensional manifold. It overcomes certain limitations of previous work in manifold learning, such as Isomap and locally linear embedding. We illustrate the algorithm on easily visualized examples of curves and surfaces, as well as on actual images of faces, handwritten digits, and solid objects.
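For a concrete sense of the pipeline on image data, the short sketch below reuses the hypothetical unfold_kernel function from the previous entry (assumed to be in scope) on flattened image vectors; the scikit-learn digits dataset, the 200-point subset, and the choice of k=6 neighbors are stand-ins for illustration, not the experiments reported in the paper.

```python
# Illustrative use of the unfold_kernel sketch above on flattened images.
from sklearn.datasets import load_digits

digits = load_digits()          # 8x8 grayscale digit images
X = digits.data[:200]           # each image flattened to a 64-dimensional vector
Y = unfold_kernel(X, k=6)       # learn the unfolded kernel and read off the embedding
print(Y[:, :2].shape)           # keep the top 2 coordinates: (200, 2)
```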

243 citations


Proceedings Article
01 Dec 2004
TL;DR: This paper shows how to learn hierarchical, distributed representations of word contexts that maximize the predictive value of a statistical language model, and demonstrates consistent improvement over class-based bigram models.
Abstract: Statistical language models estimate the probability of a word occurring in a given context. The most common language models rely on a discrete enumeration of predictive contexts (e.g., n-grams) and consequently fail to capture and exploit statistical regularities across these contexts. In this paper, we show how to learn hierarchical, distributed representations of word contexts that maximize the predictive value of a statistical language model. The representations are initialized by unsupervised algorithms for linear and nonlinear dimensionality reduction [14], then fed as input into a hierarchical mixture of experts, where each expert is a multinomial distribution over predicted words [12]. While the distributed representations in our model are inspired by the neural probabilistic language model of Bengio et al. [2, 3], our particular architecture enables us to work with significantly larger vocabularies and training corpora. For example, on a large-scale bigram modeling task involving a sixty thousand word vocabulary and a training corpus of three million sentences, we demonstrate consistent improvement over class-based bigram models [10, 13]. We also discuss extensions of our approach to longer multiword contexts.
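As a rough illustration of the prediction step, the sketch below implements a flat (non-hierarchical) mixture of multinomial experts over distributed context embeddings in NumPy; the dimensions, the softmax gating, and the random parameters are assumptions chosen only to show the shape of the computation, and the hierarchical structure and training procedure are omitted.

```python
# Minimal NumPy sketch of a mixture-of-experts bigram model over distributed
# context representations; all parameters here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
V, d, m = 1000, 30, 8            # vocabulary size, embedding dim, number of experts

C = rng.normal(size=(V, d))      # context embeddings (the paper initializes these with
                                 # linear/nonlinear dimensionality reduction)
G = rng.normal(size=(d, m))      # gating weights: context -> expert responsibilities
E = rng.normal(size=(m, V))      # expert logits: each row defines a multinomial over words

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def bigram_prob(prev_word):
    """P(w | prev_word) = sum_k P(expert k | context) * P(w | expert k)."""
    gate = softmax(C[prev_word] @ G)   # mixing weights over experts
    experts = softmax(E, axis=1)       # each expert: multinomial over the vocabulary
    return gate @ experts              # predictive distribution over the vocabulary

p = bigram_prob(42)
print(p.shape, p.sum())                # (1000,) probabilities summing to ~1.0
```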

32 citations