
Showing papers by "Kilian Q. Weinberger" published in 2007


Proceedings Article
11 Mar 2007
TL;DR: This paper presents a novel algorithm for supervised metric learning that learns a distance function by directly minimizing the leave-one-out regression error, and shows that it makes kernel regression comparable with the state of the art on several benchmark datasets.
Abstract: Kernel regression is a well-established method for nonlinear regression in which the target value for a test point is estimated using a weighted average of the surrounding training samples. The weights are typically obtained by applying a distance-based kernel function to each of the samples, which presumes the existence of a well-defined distance metric. In this paper, we construct a novel algorithm for supervised metric learning, which learns a distance function by directly minimizing the leave-one-out regression error. We show that our algorithm makes kernel regression comparable with the state of the art on several benchmark datasets, and we provide efficient implementation details enabling application to datasets on the order of 10,000 instances. Further, we show that our algorithm can be viewed as a supervised variation of PCA and can be used for dimensionality reduction and high-dimensional data visualization.

161 citations
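The leave-one-out objective above is compact enough to sketch. Below is a minimal numpy sketch of that objective and its gradient, assuming the paper's Gaussian-kernel setup with a learned linear map A, so the metric is d(x_i, x_j) = ||A(x_i - x_j)||^2; the function name is illustrative and this is not the authors' implementation.

```python
import numpy as np

def mlkr_loss_grad(A, X, y):
    """Leave-one-out kernel regression loss and gradient for the metric
    d(x_i, x_j) = ||A(x_i - x_j)||^2 -- a sketch of the objective described
    in the abstract, not the authors' code."""
    Z = X @ A.T                                     # map inputs through A
    sq = (Z ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T   # pairwise squared distances
    K = np.exp(-D)
    np.fill_diagonal(K, 0.0)                        # leave-one-out: no self-weight
    W = K / K.sum(axis=1, keepdims=True)
    yhat = W @ y                                    # LOO prediction at each training point
    err = yhat - y
    loss = err @ err
    # dL/dA = 4 A sum_ij err_i W_ij (yhat_i - y_j) (x_i - x_j)(x_i - x_j)^T,
    # computed here via the Laplacian-style identity
    # sum_ij S_ij (x_i - x_j)(x_i - x_j)^T = X^T (diag(S1) + diag(S^T 1) - S - S^T) X
    S = err[:, None] * W * (yhat[:, None] - y[None, :])
    Lap = np.diag(S.sum(1)) + np.diag(S.sum(0)) - S - S.T
    grad = 4.0 * A @ (X.T @ Lap @ X)
    return loss, grad
```

Descending on A with this gradient, from an identity or PCA initialization, trains the metric; choosing A rectangular, with fewer rows than input dimensions, gives the dimensionality-reduction and visualization use mentioned at the end of the abstract.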


Patent
11 Oct 2007
TL;DR: This patent presents a method and apparatus for reward-based learning of policies for managing or controlling a system or plant: a distance metric over (state, action) pairs and a distance-based function approximator estimating long-range expected value are initialized, and both are then adjusted so that a Bellman error measure of the approximator on a set of exemplars is minimized.
Abstract: The present invention is a method and an apparatus for reward-based learning of policies for managing or controlling a system or plant. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance metric and a distance-based function approximator estimating long-range expected value are then initialized, where the distance metric computes a distance between two (state, action) pairs, and the distance metric and function approximator are adjusted such that a Bellman error measure of the function approximator on the set of exemplars is minimized. A management policy is then derived based on the trained distance metric and function approximator.

25 citations
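As a rough illustration of the pieces named in the abstract, a distance metric over (state, action) pairs, a distance-based value approximator, and a Bellman error measure to minimize, here is a hedged Python sketch. The Gaussian kernel, the fixed-point smoothing loop, and the successor indexing `nxt` are simplifying assumptions of the sketch, not the patented method.

```python
import numpy as np

def kernel_weights(SA, A):
    """Normalized Gaussian weights between exemplar (state, action) vectors,
    with distances measured under the linear map A (an assumed metric form)."""
    Z = SA @ A.T
    sq = (Z ** 2).sum(axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0)
    K = np.exp(-D)
    np.fill_diagonal(K, 0.0)
    return K / (K.sum(axis=1, keepdims=True) + 1e-12)

def bellman_error(A, SA, R, nxt, gamma=0.9, sweeps=100):
    """Mean squared Bellman residual of a kernel-smoothed Q estimate on the
    exemplars; nxt[i] indexes a successor (state, action) exemplar of i,
    a simplifying assumption of this sketch."""
    W = kernel_weights(SA, A)
    Q = np.zeros(len(SA))
    for _ in range(sweeps):             # relax toward a smoothed fixed point
        Q = W @ (R + gamma * Q[nxt])
    resid = (R + gamma * Q[nxt]) - Q    # one-step Bellman residual
    return (resid ** 2).mean()
```

In the abstract's terms, the metric (here the map A) would then be adjusted, for example by gradient or direct search, until this residual is minimized, and the management policy read off from the trained approximator.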


Patent
11 Oct 2007
TL;DR: This patent presents a method and apparatus for reward-based learning of management policies, in which a set of one or more exemplars is used to compute a Non-Linear Dimensionality Reduction (NLDR) mapping of (state, action) pairs into a lower-dimensional representation, with NLDR parameters tuned to minimize a cross-validation Bellman error on a holdout set.
Abstract: The present invention is a method and an apparatus for reward-based learning of management policies. In one embodiment, a method for reward-based learning includes receiving a set of one or more exemplars, where at least two of the exemplars comprise a (state, action) pair for a system, and at least one of the exemplars includes an immediate reward responsive to a (state, action) pair. A distance measure between pairs of exemplars is used to compute a Non-Linear Dimensionality Reduction (NLDR) mapping of (state, action) pairs into a lower-dimensional representation, thereby producing embedded exemplars, wherein one or more parameters of the NLDR are tuned to minimize a cross-validation Bellman error on a holdout set taken from the set of one or more exemplars. The mapping is then applied to the set of exemplars, and reward-based learning is applied to the embedded exemplars to obtain a learned management policy.

16 citations
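The tuning loop described here, embed the exemplars with an NLDR and score the embedding by Bellman error on a holdout set, can be sketched as below. Isomap stands in for the unspecified NLDR, and the holdout split, Gaussian kernel averager, and successor indexing `nxt` are all illustrative assumptions of this sketch.

```python
import numpy as np
from sklearn.manifold import Isomap   # one concrete NLDR choice; the abstract does not fix one

def holdout_bellman_error(E, R, nxt, gamma=0.9, sweeps=100, hold_frac=0.2):
    """Bellman residual measured only on a holdout slice of the embedded
    exemplars E; value estimates average over the remaining exemplars."""
    n = len(E)
    hold = np.zeros(n, dtype=bool)
    hold[: int(hold_frac * n)] = True
    sq = (E ** 2).sum(axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * E @ E.T, 0.0)
    K = np.exp(-D)
    np.fill_diagonal(K, 0.0)
    K[:, hold] = 0.0                   # never average over holdout exemplars
    W = K / (K.sum(axis=1, keepdims=True) + 1e-12)
    Q = np.zeros(n)
    for _ in range(sweeps):
        Q = W @ (R + gamma * Q[nxt])
    resid = (R + gamma * Q[nxt]) - Q
    return (resid[hold] ** 2).mean()

def tune_nldr(SA, R, nxt, neighbor_grid=(5, 10, 20), dim=3):
    """Grid-search one NLDR parameter against holdout Bellman error and
    return (error, winning parameter, embedded exemplars)."""
    scored = []
    for k in neighbor_grid:
        E = Isomap(n_neighbors=k, n_components=dim).fit_transform(SA)
        scored.append((holdout_bellman_error(E, R, nxt), k, E))
    return min(scored, key=lambda t: t[0])
```

Reward-based learning would then run on the winning embedded exemplars to obtain the management policy, as the abstract describes.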


01 Jan 2007
TL;DR: Two approaches to metric learning based on convex optimization are investigated for two different data scenarios: Large Margin Nearest Neighbor (LMNN), designed for the supervised setting, and Maximum Variance Unfolding (MVU), designed for the unsupervised setting.
Abstract: Many machine learning algorithms rely heavily on the existence of a good measure of (dis-)similarity between input vectors. One of the most commonly used measures of dissimilarity is the Euclidean distance in input space. This is often suboptimal: the Euclidean distance metric does not incorporate any side-information that might be available, and it does not take advantage of the data structure or the specifics of the machine learning goals. Ideally, a metric should be learned for each specific task. Recent advances in numerical optimization provide us with a powerful tool for metric learning (and machine learning in general): convex optimization. I will investigate two approaches to metric learning based on convex optimization for two different data scenarios. The first algorithm, Large Margin Nearest Neighbor (LMNN), operates in a supervised scenario. LMNN learns a metric specifically to improve k-nearest neighbors classification. This is achieved through a linear transformation of the input data that moves similarly labeled inputs close together and separates differently labeled inputs by a large margin. LMNN can be written as a semidefinite program that can be applied to large data sets with up to 60,000 training samples. The second algorithm, Maximum Variance Unfolding (MVU), is designed for an unsupervised scenario. The algorithm finds a low-dimensional Euclidean embedding of the data that preserves local distances while globally maximizing the variance. Similar to LMNN, MVU can also be phrased as a semidefinite program. This formulation gives local guarantees and distinguishes the algorithm from prior work.

9 citations
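Both algorithms in the abstract are concrete enough to sketch. First, a minimal numpy version of the LMNN objective: a pull term over target neighbors plus a hinged push term over differently labeled "impostors". This is the loss in its linear-map form, not the semidefinite formulation actually solved in the thesis, and the function name and `targets` structure are illustrative assumptions.

```python
import numpy as np

def lmnn_loss(L, X, y, targets, c=1.0):
    """Hinge-form LMNN objective for a linear map L; targets[i] lists the
    same-class neighbors that x_i should stay close to (a sketch only)."""
    Z = X @ L.T
    sq = (Z ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T   # pairwise distances under L
    pull, push = 0.0, 0.0
    for i, nbrs in enumerate(targets):
        for j in nbrs:
            pull += D[i, j]                          # pull target neighbors close
            viol = np.maximum(1.0 + D[i, j] - D[i, :], 0.0)
            viol[y == y[i]] = 0.0                    # only differently labeled points push
            push += viol.sum()                       # margin violations by impostors
    return pull + c * push
```

And a sketch of MVU as the semidefinite program the abstract mentions, written here with cvxpy (a modeling-tool assumption; the thesis predates it): maximize total variance, i.e. the trace of the Gram matrix, subject to centering and preservation of local distances between neighbor pairs. Practical only for small n, since the variable is an n-by-n matrix.

```python
import numpy as np
import cvxpy as cp

def mvu(X, neighbors):
    """Maximum Variance Unfolding: learn a centered PSD Gram matrix K that
    keeps neighbor distances fixed while maximizing trace(K), then read the
    embedding off its eigendecomposition. Illustrative sketch."""
    n = len(X)
    K = cp.Variable((n, n), PSD=True)
    cons = [cp.sum(K) == 0]                          # center the embedding
    for i, j in neighbors:
        d2 = float(np.sum((X[i] - X[j]) ** 2))
        cons.append(K[i, i] - 2 * K[i, j] + K[j, j] == d2)
    cp.Problem(cp.Maximize(cp.trace(K)), cons).solve()
    w, V = np.linalg.eigh(K.value)                   # ascending eigenvalues
    return V[:, ::-1] * np.sqrt(np.maximum(w[::-1], 0.0))
```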