Author

Guy Lebanon

Other affiliations: University of Coimbra, LinkedIn, Georgia Institute of Technology
Bio: Guy Lebanon is an academic researcher from Amazon.com. The author has contributed to research in topics including statistical models and the Fisher information metric. The author has an h-index of 31 and has co-authored 113 publications receiving 3,281 citations. Previous affiliations of Guy Lebanon include University of Coimbra and LinkedIn.


Papers
Journal Article
TL;DR: Experimental results are presented for document classification, for which the use of multinomial geometry is natural and well motivated; improvements are obtained over Gaussian and linear kernels, which have been the standard choices for text classification.
Abstract: A family of kernels for statistical learning is introduced that exploits the geometric structure of statistical models. The kernels are based on the heat equation on the Riemannian manifold defined by the Fisher information metric associated with a statistical family, and generalize the Gaussian kernel of Euclidean space. As an important special case, kernels based on the geometry of multinomial families are derived, leading to kernel-based learning algorithms that apply naturally to discrete data. Bounds on covering numbers and Rademacher averages for the kernels are proved using bounds on the eigenvalues of the Laplacian on Riemannian manifolds. Experimental results are presented for document classification, for which the use of multinomial geometry is natural and well motivated, and improvements are obtained over the standard use of Gaussian or linear kernels, which have been the standard for text classification.

270 citations
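For intuition, here is a minimal sketch of the multinomial diffusion-kernel idea described above, assuming documents are represented as term-frequency vectors on the probability simplex: the kernel is a Gaussian-like function of the Fisher-information geodesic distance (the leading-order heat-kernel approximation, with normalization constants omitted). The function name and toy vectors are illustrative, not taken from the paper.

```python
import numpy as np

def multinomial_diffusion_kernel(theta, theta_prime, t=1.0):
    """Approximate heat (diffusion) kernel on the multinomial simplex.

    theta, theta_prime: non-negative vectors summing to 1 (e.g. normalized
    term frequencies of two documents); t is the diffusion time. Uses the
    Fisher-information geodesic distance d = 2*arccos(sum(sqrt(theta_i *
    theta'_i))) inside a Gaussian-like exponential; normalization constants
    are omitted.
    """
    inner = np.clip(np.sum(np.sqrt(theta * theta_prime)), -1.0, 1.0)
    d = 2.0 * np.arccos(inner)
    return np.exp(-d**2 / (4.0 * t))

# Toy usage: two short "documents" as term-frequency vectors on the simplex.
doc_a = np.array([3, 1, 0, 1], dtype=float); doc_a /= doc_a.sum()
doc_b = np.array([2, 2, 1, 0], dtype=float); doc_b /= doc_b.sum()
print(multinomial_diffusion_kernel(doc_a, doc_b, t=0.5))
```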

Proceedings Article
08 Jul 2002
TL;DR: A new approach to ensemble learning is introduced that takes ranking rather than classification as fundamental, leading to models on the symmetric group and its cosets; a generalization of the Mallows model on permutations is used to combine multiple input rankings.
Abstract: A new approach to ensemble learning is introduced that takes ranking rather than classification as fundamental, leading to models on the symmetric group and its cosets. The approach uses a generalization of the Mallows model on permutations to combine multiple input rankings. Applications include the task of combining the output of multiple search engines and multiclass or multilabel classification, where a set of input classifiers is viewed as generating a ranking of class labels. Experiments for both types of applications are presented.

222 citations
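A minimal sketch of the underlying Mallows model on permutations, which the paper generalizes to cosets of the symmetric group in order to combine multiple input rankings: the probability of a ranking decays exponentially with its Kendall tau distance from a central ranking. The helper names are illustrative, and the brute-force normalization is only feasible for small numbers of items.

```python
import itertools
import numpy as np

def kendall_tau_distance(pi, sigma):
    """Number of pairwise disagreements between two rankings, each given as
    a tuple listing items 0..n-1 in rank order."""
    pos_pi = {item: r for r, item in enumerate(pi)}
    pos_sigma = {item: r for r, item in enumerate(sigma)}
    n = len(pi)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if (pos_pi[i] - pos_pi[j]) * (pos_sigma[i] - pos_sigma[j]) < 0
    )

def mallows_probability(pi, sigma, theta):
    """Mallows model: P(pi | sigma, theta) proportional to
    exp(-theta * d(pi, sigma)), normalized by brute force over all
    permutations (feasible only for small n)."""
    items = tuple(range(len(sigma)))
    z = sum(np.exp(-theta * kendall_tau_distance(p, sigma))
            for p in itertools.permutations(items))
    return np.exp(-theta * kendall_tau_distance(pi, sigma)) / z

# Toy usage: probability of the identity ranking around a central ranking sigma.
sigma = (2, 0, 1, 3)
print(mallows_probability((0, 1, 2, 3), sigma, theta=0.7))
```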

Proceedings Article
03 Jan 2001
TL;DR: An equivalence is derived between AdaBoost and the dual of a convex optimization problem, showing that the only difference between minimizing the exponential loss used by AdaBoost and maximum likelihood for exponential models is that the latter requires the model to be normalized to form a conditional probability distribution over labels.
Abstract: We derive an equivalence between AdaBoost and the dual of a convex optimization problem, showing that the only difference between minimizing the exponential loss used by AdaBoost and maximum likelihood for exponential models is that the latter requires the model to be normalized to form a conditional probability distribution over labels. In addition to establishing a simple and easily understood connection between the two methods, this framework enables us to derive new regularization procedures for boosting that directly correspond to penalized maximum likelihood. Experiments on UCI datasets support our theoretical analysis and give additional insight into the relationship between boosting and logistic regression.

198 citations
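To make the comparison concrete, the following is a small sketch contrasting the unnormalized exponential loss minimized by AdaBoost with the negative log-likelihood of a normalized logistic (conditional exponential) model, both evaluated on the same linear score f(x) = w·x. Margin-scaling conventions differ across formulations, and the data and weights below are synthetic and purely illustrative.

```python
import numpy as np

def exponential_loss(w, X, y):
    """Unnormalized exponential loss minimized by AdaBoost:
    sum_i exp(-y_i * f(x_i)) with f(x) = w . x and labels y in {-1, +1}."""
    return np.sum(np.exp(-y * (X @ w)))

def logistic_loss(w, X, y):
    """Negative log-likelihood of a normalized conditional model
    p(y | x) = 1 / (1 + exp(-y * f(x))) with the same score f(x) = w . x."""
    return np.sum(np.log1p(np.exp(-y * (X @ w))))

# Toy usage: both losses evaluated on the same synthetic data and weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = np.sign(X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50))
w = np.array([0.8, -1.5, 0.3])
print(exponential_loss(w, X, y), logistic_loss(w, X, y))
```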

Proceedings Article
16 Jun 2013
TL;DR: A new matrix approximation model is proposed in which the matrix is assumed to be locally of low rank, leading to a representation of the observed matrix as a weighted sum of low-rank matrices and to improvements in prediction accuracy over classical approaches on recommendation tasks.
Abstract: Matrix approximation is a common tool in recommendation systems, text mining, and computer vision. A prevalent assumption in constructing matrix approximations is that the partially observed matrix is of low rank. We propose a new matrix approximation model where we assume instead that the matrix is locally of low rank, leading to a representation of the observed matrix as a weighted sum of low-rank matrices. We analyze the accuracy of the proposed local low-rank modeling. Our experiments show improvements in prediction accuracy over classical approaches for recommendation tasks.

185 citations
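The following is a simplified sketch of the local low-rank idea under stated assumptions: each anchor point gets its own low-rank factorization fitted with kernel weights that decay away from the anchor, and the reconstruction is the kernel-weighted average of the local models. For brevity it assumes a fully observed matrix and uses Gaussian kernels on row/column indices, whereas the paper handles partial observation and defines the kernels differently; all names are illustrative.

```python
import numpy as np

def local_low_rank_reconstruction(M, anchors, rank=2, bandwidth=1.0,
                                  n_iter=50, lr=0.01):
    """Simplified local low-rank reconstruction of a fully observed matrix M.

    anchors: list of (row, col) anchor points; each anchor gets a rank-`rank`
    factorization U V^T fitted under Gaussian kernel weights centered at the
    anchor, and the output is the kernel-weighted average of the local fits.
    """
    n_rows, n_cols = M.shape
    recon = np.zeros_like(M, dtype=float)
    weight_sum = np.zeros_like(M, dtype=float)
    rng = np.random.default_rng(0)
    for (ar, ac) in anchors:
        # Kernel weights on row/column indices (an illustrative choice only).
        wr = np.exp(-((np.arange(n_rows) - ar) ** 2) / (2 * bandwidth ** 2))
        wc = np.exp(-((np.arange(n_cols) - ac) ** 2) / (2 * bandwidth ** 2))
        W = np.outer(wr, wc)
        # Fit U V^T to M under weights W by simple gradient descent.
        U = 0.1 * rng.normal(size=(n_rows, rank))
        V = 0.1 * rng.normal(size=(n_cols, rank))
        for _ in range(n_iter):
            R = W * (U @ V.T - M)   # weighted residual
            U -= lr * (R @ V)
            V -= lr * (R.T @ U)
        recon += W * (U @ V.T)
        weight_sum += W
    return recon / np.maximum(weight_sum, 1e-12)

# Toy usage: reconstruct a small synthetic matrix using three anchor points.
rng = np.random.default_rng(1)
M = rng.normal(size=(20, 15))
M_hat = local_low_rank_reconstruction(M, [(3, 3), (10, 7), (17, 12)],
                                      rank=2, bandwidth=3.0)
print(np.linalg.norm(M - M_hat) / np.linalg.norm(M))
```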

Journal ArticleDOI
Guy Lebanon
TL;DR: This work considers the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points, and discusses in detail learning a metric on the multinomial simplex where the metric candidates are pull-back metrics of the Fisher information under a Lie group of transformations.
Abstract: Many algorithms in machine learning rely on being given a good distance metric over the input space. Rather than using a default metric such as the Euclidean metric, it is desirable to obtain a metric based on the provided data. We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach involves choosing a metric from a parametric family by maximizing the inverse volume of a given data set of points. From a statistical perspective, it is related to maximum likelihood under a model that assigns probabilities inversely proportional to the Riemannian volume element. We discuss in detail learning a metric on the multinomial simplex where the metric candidates are pull-back metrics of the Fisher information under a Lie group of transformations. When applied to text document classification, the resulting geodesic distances resemble, but outperform, the tf-idf cosine similarity measure.

147 citations
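A sketch of the kind of pull-back geodesic distance discussed above, under assumptions: the simplex is rescaled coordinate-wise by a positive weight vector and renormalized, and distance is measured by the Fisher-information (spherical) geodesic between the transformed points. The weight vector lam below is hand-picked for illustration; in the paper's setting such a parameter would be learned by maximizing the inverse volume of the training set, which is not shown here.

```python
import numpy as np

def pullback_geodesic_distance(theta, theta_prime, lam):
    """Geodesic distance under a pull-back of the Fisher information metric.

    lam: positive vector parameterizing a coordinate-wise rescaling of the
    simplex, F_lam(theta) = (lam * theta) / sum(lam * theta). The returned
    distance is the Fisher (spherical) geodesic distance between the
    transformed points, d = 2 * arccos(sum(sqrt(F_lam(theta) * F_lam(theta')))).
    """
    def transform(p):
        q = lam * p
        return q / q.sum()
    a, b = transform(theta), transform(theta_prime)
    inner = np.clip(np.sum(np.sqrt(a * b)), -1.0, 1.0)
    return 2.0 * np.arccos(inner)

# Toy usage: compare against plain cosine similarity on the same vectors.
doc_a = np.array([3, 1, 0, 1], dtype=float); doc_a /= doc_a.sum()
doc_b = np.array([2, 2, 1, 0], dtype=float); doc_b /= doc_b.sum()
lam = np.array([0.5, 1.0, 2.0, 1.5])   # illustrative, hand-picked weights
cos_sim = doc_a @ doc_b / (np.linalg.norm(doc_a) * np.linalg.norm(doc_b))
print(pullback_geodesic_distance(doc_a, doc_b, lam), cos_sim)
```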


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, approximate inference, and approaches to combining models in the context of machine learning.
Abstract: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations

Book
08 Jul 2008
TL;DR: This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems and focuses on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis.
Abstract: An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

7,452 citations

Journal ArticleDOI
TL;DR: A review of the Pascal Visual Object Classes challenge from 2008-2012 and an appraisal of the aspects of the challenge that worked well, and those that could be improved in future challenges.
Abstract: The Pascal Visual Object Classes (VOC) challenge consists of two components: (i) a publicly available dataset of images together with ground truth annotation and standardised evaluation software; and (ii) an annual competition and workshop. There are five challenges: classification, detection, segmentation, action classification, and person layout. In this paper we provide a review of the challenge from 2008-2012. The paper is intended for two audiences: algorithm designers, researchers who want to see what the state of the art is, as measured by performance on the VOC datasets, along with the limitations and weak points of the current generation of algorithms; and, challenge designers, who want to see what we as organisers have learnt from the process and our recommendations for the organisation of future challenges. To analyse the performance of submitted algorithms on the VOC datasets we introduce a number of novel evaluation methods: a bootstrapping method for determining whether differences in the performance of two algorithms are significant or not; a normalised average precision so that performance can be compared across classes with different proportions of positive instances; a clustering method for visualising the performance across multiple algorithms so that the hard and easy images can be identified; and the use of a joint classifier over the submitted algorithms in order to measure their complementarity and combined performance. We also analyse the community's progress through time using the methods of Hoiem et al. (Proceedings of European Conference on Computer Vision, 2012) to identify the types of occurring errors. We conclude the paper with an appraisal of the aspects of the challenge that worked well, and those that could be improved in future challenges.

6,061 citations
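As an illustration of the bootstrap-style significance testing mentioned in the abstract, here is a generic paired-bootstrap sketch, not the VOC organisers' exact procedure: per-image scores for two methods are resampled with replacement, and the fraction of resamples in which the ranking flips estimates how likely the observed gap is under chance. All names and the synthetic scores are illustrative.

```python
import numpy as np

def paired_bootstrap_significance(scores_a, scores_b, n_boot=10000, seed=0):
    """Paired bootstrap test of whether method A's mean per-image score
    genuinely exceeds method B's: resample images with replacement and
    report the fraction of resamples where B is at least as good as A
    (a small fraction suggests the observed gap is unlikely to be chance)."""
    rng = np.random.default_rng(seed)
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    n = len(scores_a)
    count = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        if scores_b[idx].mean() >= scores_a[idx].mean():
            count += 1
    return count / n_boot

# Toy usage: per-image scores for two hypothetical detectors.
rng = np.random.default_rng(1)
a = rng.normal(0.62, 0.1, size=200)
b = rng.normal(0.58, 0.1, size=200)
print(paired_bootstrap_significance(a, b))
```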
