Proceedings Article

Learning Rankings via Convex Hull Separation

05 Dec 2005 - Vol. 18, pp. 395-402
TL;DR: Experiments indicate that the proposed algorithm for learning ranking functions from order constraints between sets—i.e. classes—of training samples is at least as accurate as the current state-of-the-art and several orders of magnitude faster than current methods.
Abstract: We propose efficient algorithms for learning ranking functions from order constraints between sets—i.e. classes—of training samples. Our algorithms may be used for maximizing the generalized Wilcoxon-Mann-Whitney statistic that accounts for the partial ordering of the classes: special cases include maximizing the area under the ROC curve for binary classification and its generalization for ordinal regression. Experiments on public benchmarks indicate that: (a) the proposed algorithm is at least as accurate as the current state-of-the-art; (b) computationally, it is several orders of magnitude faster and—unlike current methods—it is easily able to handle even large datasets with over 20,000 samples.
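The generalized Wilcoxon-Mann-Whitney statistic the abstract refers to can be stated concretely. The sketch below is my own illustration, not the paper's convex-hull algorithm: it computes the statistic as the fraction of correctly ordered cross-class pairs, which reduces to ROC AUC in the binary case. Function and variable names are illustrative.

```python
# Minimal sketch (not the paper's method) of the generalized
# Wilcoxon-Mann-Whitney statistic that the ranking functions are
# trained to maximize: the fraction of correctly ordered pairs drawn
# from classes with a known order (ties count 1/2).
import numpy as np

def generalized_wmw(scores, labels):
    """Fraction of pairs (i, j) with labels[i] < labels[j] whose
    predicted scores are ordered the same way."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    correct, total = 0.0, 0
    classes = np.unique(labels)
    for a_idx, a in enumerate(classes):
        for b in classes[a_idx + 1:]:          # every ordered class pair a < b
            lo, hi = scores[labels == a], scores[labels == b]
            diff = hi[:, None] - lo[None, :]   # all cross-class score differences
            correct += (diff > 0).sum() + 0.5 * (diff == 0).sum()
            total += diff.size
    return correct / total                     # equals ROC AUC when classes = {0, 1}

# made-up ordinal labels and scores, three ordered classes
y = np.array([0, 0, 1, 1, 2])
f = np.array([0.1, 0.4, 0.35, 0.8, 0.9])
print(generalized_wmw(f, y))
```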


Citations
Book
Tie-Yan Liu
27 Jun 2009
TL;DR: Three major approaches to learning to rank are introduced (the pointwise, pairwise, and listwise approaches), the relationship between the loss functions used in these approaches and the widely used IR evaluation measures is analyzed, and the performance of these approaches on the LETOR benchmark datasets is evaluated.
Abstract: This tutorial is concerned with a comprehensive introduction to the research area of learning to rank for information retrieval. In the first part of the tutorial, we will introduce three major approaches to learning to rank, i.e., the pointwise, pairwise, and listwise approaches, analyze the relationship between the loss functions used in these approaches and the widely-used IR evaluation measures, evaluate the performance of these approaches on the LETOR benchmark datasets, and demonstrate how to use these approaches to solve real ranking applications. In the second part of the tutorial, we will discuss some advanced topics regarding learning to rank, such as relational ranking, diverse ranking, semi-supervised ranking, transfer ranking, query-dependent ranking, and training data preprocessing. In the third part, we will briefly mention the recent advances on statistical learning theory for ranking, which explain the generalization ability and statistical consistency of different ranking methods. In the last part, we will conclude the tutorial and show several future research directions.
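As a concrete companion to the three categories the tutorial contrasts, here is a toy sketch of a pointwise loss and a pairwise loss for a single query; the scores and relevance labels are made up, and a listwise loss in the ListNet style is sketched under the next entry.

```python
# Toy illustration (not from the tutorial) of the pointwise and pairwise
# loss families; `scores` are model outputs for one query, `rels` the
# graded relevance labels -- both invented for this example.
import numpy as np

scores = np.array([2.0, 0.5, 1.5])
rels   = np.array([2,   0,   1  ])

# Pointwise: treat each document independently, e.g. squared error on labels.
pointwise_loss = np.mean((scores - rels) ** 2)

# Pairwise: penalize mis-ordered document pairs, e.g. a hinge on score margins
# (the flavour of loss used by Ranking SVM / RankBoost-style methods).
pairwise_loss, pairs = 0.0, 0
for i in range(len(rels)):
    for j in range(len(rels)):
        if rels[i] > rels[j]:                       # i should be ranked above j
            pairwise_loss += max(0.0, 1.0 - (scores[i] - scores[j]))
            pairs += 1
pairwise_loss /= pairs

print(pointwise_loss, pairwise_loss)
```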

2,515 citations


Cites background from "Learning Rankings via Convex Hull S..."

  • ...Other learning-to-rank algorithms [15, 19, 32, 93, 109, 127, 142, 143, 144] that are based on association rules, decision systems, and other technologies; other theoretical analysis on ranking [50]; and applications of learning-to-rank methods [87, 128]....


Proceedings ArticleDOI
20 Jun 2007
TL;DR: It is proposed that learning to rank should adopt the listwise approach, in which lists of objects are used as 'instances' in learning; two probability models, respectively referred to as permutation probability and top k probability, are introduced to define a listwise loss function for learning.
Abstract: The paper is concerned with learning to rank, which is to construct a model or a function for ranking objects. Learning to rank is useful for document retrieval, collaborative filtering, and many other applications. Several methods for learning to rank have been proposed, which take object pairs as 'instances' in learning. We refer to them as the pairwise approach in this paper. Although the pairwise approach offers advantages, it ignores the fact that ranking is a prediction task on lists of objects. The paper postulates that learning to rank should adopt the listwise approach in which lists of objects are used as 'instances' in learning. The paper proposes a new probabilistic method for the approach. Specifically, it introduces two probability models, respectively referred to as permutation probability and top k probability, to define a listwise loss function for learning. Neural Network and Gradient Descent are then employed as model and algorithm in the learning method. Experimental results on information retrieval show that the proposed listwise approach performs better than the pairwise approach.
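A minimal sketch of the top-1 special case of the probability models described above, assuming the standard softmax (Plackett-Luce) form: scores and relevance labels each induce a top-1 distribution over the list, and the listwise loss is the cross entropy between them. The scores, labels, and function names are illustrative; the actual method trains a neural network on such a loss by gradient descent.

```python
# Hedged sketch of a ListNet-style top-1 listwise loss.
import numpy as np

def top1_probability(v):
    """Probability that each item is ranked first given real-valued
    scores v (a softmax over the list)."""
    e = np.exp(v - np.max(v))      # subtract max for numerical stability
    return e / e.sum()

def listnet_top1_loss(model_scores, label_scores):
    p_label = top1_probability(label_scores)   # "true" listwise distribution
    p_model = top1_probability(model_scores)   # model's listwise distribution
    return -np.sum(p_label * np.log(p_model))  # cross entropy over the list

scores = np.array([2.0, 0.5, 1.5])             # model outputs for one query
labels = np.array([2.0, 0.0, 1.0])             # graded relevance, used as scores
print(listnet_top1_loss(scores, labels))
```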

2,003 citations

Proceedings ArticleDOI
Jun Xu, Hang Li
23 Jul 2007
TL;DR: The proposed learning algorithm, referred to as AdaRank, repeatedly constructs 'weak rankers' on the basis of reweighted training data and finally linearly combines the weak rankers to make ranking predictions; the training process of AdaRank is proven to be exactly that of enhancing the performance measure used.
Abstract: In this paper we address the issue of learning to rank for document retrieval. In the task, a model is automatically created with some training data and then is utilized for ranking of documents. The goodness of a model is usually evaluated with performance measures such as MAP (Mean Average Precision) and NDCG (Normalized Discounted Cumulative Gain). Ideally a learning algorithm would train a ranking model that could directly optimize the performance measures with respect to the training data. Existing methods, however, are only able to train ranking models by minimizing loss functions loosely related to the performance measures. For example, Ranking SVM and RankBoost train ranking models by minimizing classification errors on instance pairs. To deal with the problem, we propose a novel learning algorithm within the framework of boosting, which can minimize a loss function directly defined on the performance measures. Our algorithm, referred to as AdaRank, repeatedly constructs 'weak rankers' on the basis of reweighted training data and finally linearly combines the weak rankers for making ranking predictions. We prove that the training process of AdaRank is exactly that of enhancing the performance measure used. Experimental results on four benchmark datasets show that AdaRank significantly outperforms the baseline methods of BM25, Ranking SVM, and RankBoost.
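The boosting loop the abstract describes can be sketched roughly as follows. This is a compressed reading, not the authors' code: `weak_rankers` is an assumed pool of candidate scoring functions, and `measure(scorer, q)` is an assumed callback returning a per-query performance value in [-1, 1] (e.g. a rescaled MAP or NDCG) for ranking query q's documents by `scorer`.

```python
# Rough sketch of an AdaRank-style boosting loop (assumptions noted above).
import numpy as np

def adarank(queries, weak_rankers, measure, rounds=10):
    P = np.ones(len(queries)) / len(queries)         # weight per training query
    ensemble = []                                     # list of (alpha, weak ranker)
    combined = lambda doc: sum(a * h(doc) for a, h in ensemble)
    for _ in range(rounds):
        # pick the weak ranker doing best on the currently hard (high-weight) queries
        perf = np.array([[measure(h, q) for q in queries] for h in weak_rankers])
        t = int(np.argmax(perf @ P))
        E = perf[t]
        alpha = 0.5 * np.log((P @ (1 + E)) / (P @ (1 - E)))   # weak ranker's weight
        ensemble.append((alpha, weak_rankers[t]))
        # re-weight: queries the combined ranker still handles poorly gain weight
        F = np.array([measure(combined, q) for q in queries])
        P = np.exp(-F)
        P /= P.sum()
    return combined
```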

873 citations


Cites background from "Learning Rankings via Convex Hull S..."

  • ...For other approaches to learning to rank, refer to [2, 11, 31]....


Journal ArticleDOI
Tie-Yan Liu
TL;DR: A statistical ranking theory is introduced, which can describe different learning-to-rank algorithms, and be used to analyze their query-level generalization abilities.
Abstract: Learning to rank for Information Retrieval (IR) is a task to automatically construct a ranking model using training data, such that the model can sort new objects according to their degrees of relevance, preference, or importance. Many IR problems are by nature ranking problems, and many IR technologies can be potentially enhanced by using learning-to-rank techniques. The objective of this tutorial is to give an introduction to this research direction. Specifically, the existing learning-to-rank algorithms are reviewed and categorized into three approaches: the pointwise, pairwise, and listwise approaches. The advantages and disadvantages with each approach are analyzed, and the relationships between the loss functions used in these approaches and IR evaluation measures are discussed. Then the empirical evaluations on typical learning-to-rank methods are shown, with the LETOR collection as a benchmark dataset, which seems to suggest that the listwise approach is the most effective one among all the approaches. After that, a statistical ranking theory is introduced, which can describe different learning-to-rank algorithms, and be used to analyze their query-level generalization abilities. At the end of the tutorial, we provide a summary and discuss potential future work on learning to rank.

591 citations

References
Proceedings Article
03 Jan 2001
TL;DR: A simple and efficient online algorithm is described, its performance in the mistake bound model is analyzed, its correctness is proved, and it outperforms online algorithms for regression and classification applied to ranking.
Abstract: We discuss the problem of ranking instances. In our framework each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank which is as close as possible to the instance's true rank. We describe a simple and efficient online algorithm, analyze its performance in the mistake bound model, and prove its correctness. We describe two sets of experiments, with synthetic data and with the EachMovie dataset for collaborative filtering. In the experiments we performed, our algorithm outperforms online algorithms for regression and classification applied to ranking.
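The online algorithm described here is a perceptron-style ranking rule with one weight vector and ordered thresholds, corrected whenever the predicted rank misses the true rank. Below is a small sketch of that update with my own variable names, not a claim to reproduce the paper's exact pseudocode.

```python
# Hedged sketch of a PRank-style online ranking update.
import numpy as np

class PRank:
    def __init__(self, n_features, n_ranks):
        self.w = np.zeros(n_features)
        self.b = np.zeros(n_ranks - 1)          # thresholds b_1 <= ... <= b_{k-1}

    def predict(self, x):
        score = self.w @ x
        # smallest rank r with score below threshold b_r; else the top rank
        below = np.nonzero(score < self.b)[0]
        return int(below[0]) + 1 if below.size else len(self.b) + 1

    def update(self, x, y):
        """Single online step for example x with true rank y in {1, ..., k}."""
        score = self.w @ x
        # y_r = +1 if the true rank lies above threshold r, else -1
        y_r = np.where(np.arange(1, len(self.b) + 1) < y, 1.0, -1.0)
        tau = np.where((score - self.b) * y_r <= 0, y_r, 0.0)   # violated thresholds
        self.w += tau.sum() * x
        self.b -= tau

# one toy step: 3 features, ranks in {1, 2, 3}
model = PRank(n_features=3, n_ranks=3)
model.update(np.array([1.0, 0.0, 0.0]), y=3)
print(model.predict(np.array([1.0, 0.0, 0.0])))
```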

657 citations


Additional excerpts

  • ...Another, less related, approach is [2]....


Proceedings Article
01 Jan 2002
TL;DR: A framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretic principles, which allows for Bayesian model selection and is less complex in implementation is presented.
Abstract: We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretic principles, previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated in O(d) rather than O(n), d ≪ n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n · d²), and in large real-world classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities ('error bars'), allows for Bayesian model selection and is less complex in implementation.
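A rough sketch of the d-sparse idea under simplifying assumptions: inducing points are added by greedy forward selection, and prediction uses only the d selected points. The residual-based selection score below is a cheap stand-in for the paper's information-theoretic criterion, and the kernel, noise level, and toy data are my own choices.

```python
# Hedged sketch: greedy forward selection of d inducing points plus a
# subset-of-regressors predictor (illustrative, not the paper's algorithm).
import numpy as np

def kern(A, B, ell=1.0):
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) / ell) ** 2)

def sparse_gp_fit(X, y, d=5, noise=0.1):
    active = []                                   # indices of inducing points
    residual = y.copy()
    for _ in range(d):
        # add the point the current sparse model explains worst
        candidates = [i for i in range(len(X)) if i not in active]
        active.append(max(candidates, key=lambda i: abs(residual[i])))
        Knm = kern(X, X[active])
        Kmm = kern(X[active], X[active])
        # subset-of-regressors posterior over the selected points
        Sigma = np.linalg.inv(Kmm + Knm.T @ Knm / noise**2)
        alpha = Sigma @ Knm.T @ y / noise**2
        residual = y - Knm @ alpha
    return X[active], alpha

def sparse_gp_predict(Xstar, Xm, alpha):
    return kern(Xstar, Xm) @ alpha               # O(d) kernel evaluations per test point

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200)
y = np.sin(X) + 0.1 * rng.standard_normal(200)
Xm, alpha = sparse_gp_fit(X, y, d=8)
print(np.abs(sparse_gp_predict(X, Xm, alpha) - np.sin(X)).mean())
```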

590 citations

Journal Article
TL;DR: A probabilistic kernel approach to ordinal regression based on Gaussian processes is presented, where a threshold model that generalizes the probit function is used as the likelihood function for ordinal variables.
Abstract: We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on some benchmark and real-world data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach.
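The threshold model that "generalizes the probit function" can be illustrated in isolation: a latent function value is compared against ordered cut points, and each ordinal label receives the Gaussian probability mass of its interval. The threshold values and noise scale below are arbitrary choices for the sketch, not values from the paper.

```python
# Hedged sketch of an ordinal-probit (threshold) likelihood for a latent GP value f.
import numpy as np
from scipy.stats import norm

def ordinal_likelihood(f, thresholds, noise=1.0):
    """P(y = r | f) for r = 1..K, with thresholds b_1 < ... < b_{K-1}
    (b_0 = -inf and b_K = +inf are implicit)."""
    b = np.concatenate(([-np.inf], thresholds, [np.inf]))
    upper = norm.cdf((b[1:] - f) / noise)    # Phi((b_r     - f) / sigma)
    lower = norm.cdf((b[:-1] - f) / noise)   # Phi((b_{r-1} - f) / sigma)
    return upper - lower                     # one probability per ordinal class

print(ordinal_likelihood(f=0.3, thresholds=np.array([-1.0, 0.0, 1.0])))
```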

475 citations


"Learning Rankings via Convex Hull S..." refers methods in this paper

  • ...A non-parametric Bayesian model for ordinal regression based on Gaussian processes (GP) was defined [1]....


01 Jan 1998
TL;DR: By setting apart the two functions of a support vector machine: separation of points by a nonlinear surface in the original space of patterns, and maximizing the distance between separating planes in a higher dimensional space, this work is able to define indefinite, possibly discontinuous, kernels, not necessarily inner product ones, that generate highly nonlinear separating surfaces.
Abstract: By setting apart the two functions of a support vector machine: separation of points by a nonlinear surface in the original space of patterns, and maximizing the distance between separating planes in a higher dimensional space, we are able to define indefinite, possibly discontinuous, kernels, not necessarily inner product ones, that generate highly nonlinear separating surfaces. Maximizing the distance between the separating planes in the higher dimensional space is surrogated by support vector suppression, which is achieved by minimizing any desired norm of support vector multipliers. The norm may be one induced by the separation kernel if it happens to be positive definite, or a Euclidean or a polyhedral norm. The latter norm leads to a linear program whereas the former norms lead to convex quadratic programs, all with an arbitrary separation kernel. A standard support vector machine can be recovered by using the same kernel for separation and support vector suppression. On a simple test example, all models perform equally well when a positive definite kernel is used. When a negative definite kernel is used, we are unable to solve the nonconvex quadratic program associated with a conventional support vector machine, while all other proposed models remain convex and easily generate a surface that separates all given points.
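The linear-programming case mentioned above (polyhedral-norm suppression of the support vector multipliers with an arbitrary kernel) can be sketched with an off-the-shelf LP solver. This is a simplified reading of that formulation rather than the paper's exact program; the RBF kernel, the constant C, and the toy XOR data are my own choices.

```python
# Hedged sketch of a kernel classifier trained by a linear program:
# 1-norm suppression of multipliers u, hinge-style slacks, arbitrary kernel.
import numpy as np
from scipy.optimize import linprog

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def lp_kernel_classifier(X, d, C=1.0):
    """Fit f(x) = K(x, X) diag(d) u - g with 1-norm suppression of u;
    d must be labels in {-1, +1}."""
    m = len(d)
    K = rbf(X, X)
    DKD = d[:, None] * K * d[None, :]
    # variables: [u (m), g (1), slack y (m), s (m) with |u| <= s]
    c = np.concatenate([np.zeros(m), [0.0], C * np.ones(m), np.ones(m)])
    A_ub = np.block([
        [-DKD,           d[:, None],       -np.eye(m),       np.zeros((m, m))],
        [ np.eye(m),     np.zeros((m, 1)),  np.zeros((m, m)), -np.eye(m)],
        [-np.eye(m),     np.zeros((m, 1)),  np.zeros((m, m)), -np.eye(m)],
    ])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * m)])
    bounds = [(None, None)] * (m + 1) + [(0, None)] * (2 * m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    u, g = res.x[:m], res.x[m]
    return lambda Xnew: np.sign(rbf(Xnew, X) @ (d * u) - g)

# tiny XOR-style check (not linearly separable in the original space)
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
d = np.array([1., 1., -1., -1.])
predict = lp_kernel_classifier(X, d, C=10.0)
print(predict(X))   # should reproduce d on this toy set
```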

296 citations

Proceedings Article
01 Jan 2002
TL;DR: A distance-based conditional model on the ranking poset is presented, an extension of the Mallows ϕ model, and generalizes the classifier combination methods used by several ensemble learning algorithms, including error correcting output codes, discrete AdaBoost, logistic regression and cranking.
Abstract: A distance-based conditional model on the ranking poset is presented for use in classification and ranking. The model is an extension of the Mallows ϕ model, and generalizes the classifier combination methods used by several ensemble learning algorithms, including error correcting output codes, discrete AdaBoost, logistic regression and cranking. The algebraic structure of the ranking poset leads to a simple Bayesian interpretation of the conditional model and its special cases. In addition to a unifying view, the framework suggests a probabilistic interpretation for error correcting output codes and an extension beyond the binary coding scheme.
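For reference, the Mallows ϕ model that the conditional model extends assigns a ranking a probability that decays exponentially with its Kendall tau distance from a central ranking. A brute-force sketch is below; it normalizes over all permutations, so it is only practical for tiny item sets, and the dispersion parameter and rankings are made up.

```python
# Hedged sketch of the Mallows phi model over full rankings.
import numpy as np
from itertools import permutations

def kendall_tau_distance(pi, sigma):
    """Number of item pairs ordered differently by the two rankings
    (rankings given as tuples of item indices, best first)."""
    pos_pi = {item: i for i, item in enumerate(pi)}
    pos_sigma = {item: i for i, item in enumerate(sigma)}
    items = list(pi)
    return sum(
        (pos_pi[a] < pos_pi[b]) != (pos_sigma[a] < pos_sigma[b])
        for i, a in enumerate(items) for b in items[i + 1:]
    )

def mallows_probability(pi, sigma, theta=1.0):
    """P(pi | sigma) proportional to exp(-theta * d(pi, sigma)),
    normalized by brute force over all permutations."""
    items = tuple(sorted(sigma))
    Z = sum(np.exp(-theta * kendall_tau_distance(p, sigma))
            for p in permutations(items))
    return np.exp(-theta * kendall_tau_distance(pi, sigma)) / Z

print(mallows_probability((0, 1, 2), (0, 1, 2)))   # the central ranking is the mode
```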

51 citations


"Learning Rankings via Convex Hull S..." refers methods in this paper

  • ...In other related work, boosting methods have been proposed for learning preferences [3], and a combinatorial structure called the ranking poset was used for conditional modeling of partially ranked data [8], in the context of combining ranked sets of web pages produced by various web-page search engines....
