Proceedings Article

Learning Rankings via Convex Hull Separation

05 Dec 2005 - Vol. 18, pp. 395-402
TL;DR: Experiments indicate that the proposed algorithm for learning ranking functions from order constraints between sets—i.e. classes—of training samples is at least as accurate as the current state-of-the-art and several orders of magnitude faster than current methods.
Abstract: We propose efficient algorithms for learning ranking functions from order constraints between sets—i.e. classes—of training samples. Our algorithms may be used for maximizing the generalized Wilcoxon-Mann-Whitney statistic that accounts for the partial ordering of the classes: special cases include maximizing the area under the ROC curve for binary classification and its generalization for ordinal regression. Experiments on public benchmarks indicate that: (a) the proposed algorithm is at least as accurate as the current state-of-the-art; (b) computationally, it is several orders of magnitude faster and—unlike current methods—it is easily able to handle even large datasets with over 20,000 samples.
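The generalized Wilcoxon-Mann-Whitney statistic the abstract refers to can be stated concretely. The sketch below is my own illustration, not the paper's convex-hull algorithm: it computes the statistic as the fraction of correctly ordered cross-class pairs, which reduces to ROC AUC in the binary case. Function and variable names are illustrative.

```python
# Minimal sketch (not the paper's method) of the generalized
# Wilcoxon-Mann-Whitney statistic that the ranking functions are
# trained to maximize: the fraction of correctly ordered pairs drawn
# from classes with a known order (ties count 1/2).
import numpy as np

def generalized_wmw(scores, labels):
    """Fraction of pairs (i, j) with labels[i] < labels[j] whose
    predicted scores are ordered the same way."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    correct, total = 0.0, 0
    classes = np.unique(labels)
    for a_idx, a in enumerate(classes):
        for b in classes[a_idx + 1:]:          # every ordered class pair a < b
            lo, hi = scores[labels == a], scores[labels == b]
            diff = hi[:, None] - lo[None, :]   # all cross-class score differences
            correct += (diff > 0).sum() + 0.5 * (diff == 0).sum()
            total += diff.size
    return correct / total                     # equals ROC AUC when classes = {0, 1}

# made-up ordinal labels and scores, three ordered classes
y = np.array([0, 0, 1, 1, 2])
f = np.array([0.1, 0.4, 0.35, 0.8, 0.9])
print(generalized_wmw(f, y))
```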


Citations
Book
Tie-Yan Liu
27 Jun 2009
TL;DR: Three major approaches to learning to rank are introduced (the pointwise, pairwise, and listwise approaches), the relationship between the loss functions used in these approaches and the widely used IR evaluation measures is analyzed, and the performance of these approaches on the LETOR benchmark datasets is evaluated.
Abstract: This tutorial is concerned with a comprehensive introduction to the research area of learning to rank for information retrieval. In the first part of the tutorial, we will introduce three major approaches to learning to rank, i.e., the pointwise, pairwise, and listwise approaches, analyze the relationship between the loss functions used in these approaches and the widely-used IR evaluation measures, evaluate the performance of these approaches on the LETOR benchmark datasets, and demonstrate how to use these approaches to solve real ranking applications. In the second part of the tutorial, we will discuss some advanced topics regarding learning to rank, such as relational ranking, diverse ranking, semi-supervised ranking, transfer ranking, query-dependent ranking, and training data preprocessing. In the third part, we will briefly mention the recent advances on statistical learning theory for ranking, which explain the generalization ability and statistical consistency of different ranking methods. In the last part, we will conclude the tutorial and show several future research directions.
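As a concrete companion to the three categories the tutorial contrasts, here is a toy sketch of a pointwise loss and a pairwise loss for a single query; the scores and relevance labels are made up, and a listwise loss in the ListNet style is sketched under the next entry.

```python
# Toy illustration (not from the tutorial) of the pointwise and pairwise
# loss families; `scores` are model outputs for one query, `rels` the
# graded relevance labels -- both invented for this example.
import numpy as np

scores = np.array([2.0, 0.5, 1.5])
rels   = np.array([2,   0,   1  ])

# Pointwise: treat each document independently, e.g. squared error on labels.
pointwise_loss = np.mean((scores - rels) ** 2)

# Pairwise: penalize mis-ordered document pairs, e.g. a hinge on score margins
# (the flavour of loss used by Ranking SVM / RankBoost-style methods).
pairwise_loss, pairs = 0.0, 0
for i in range(len(rels)):
    for j in range(len(rels)):
        if rels[i] > rels[j]:                       # i should be ranked above j
            pairwise_loss += max(0.0, 1.0 - (scores[i] - scores[j]))
            pairs += 1
pairwise_loss /= pairs

print(pointwise_loss, pairwise_loss)
```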

2,515 citations


Cites background from "Learning Rankings via Convex Hull S..."

  • ...Other learning-to-rank algorithms [15, 19, 32, 93, 109, 127, 142, 143, 144] that are based on association rules, decision systems, and other technologies; other theoretical analysis on ranking [50]; and applications of learning-to-rank methods [87, 128]....


Proceedings ArticleDOI
20 Jun 2007
TL;DR: It is proposed that learning to rank should adopt the listwise approach, in which lists of objects are used as 'instances' in learning; two probability models, respectively referred to as permutation probability and top k probability, are introduced to define a listwise loss function for learning.
Abstract: The paper is concerned with learning to rank, which is to construct a model or a function for ranking objects. Learning to rank is useful for document retrieval, collaborative filtering, and many other applications. Several methods for learning to rank have been proposed, which take object pairs as 'instances' in learning. We refer to them as the pairwise approach in this paper. Although the pairwise approach offers advantages, it ignores the fact that ranking is a prediction task on lists of objects. The paper postulates that learning to rank should adopt the listwise approach in which lists of objects are used as 'instances' in learning. The paper proposes a new probabilistic method for the approach. Specifically, it introduces two probability models, respectively referred to as permutation probability and top k probability, to define a listwise loss function for learning. Neural Network and Gradient Descent are then employed as model and algorithm in the learning method. Experimental results on information retrieval show that the proposed listwise approach performs better than the pairwise approach.
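A minimal sketch of the top-1 special case of the probability models described above, assuming the standard softmax (Plackett-Luce) form: scores and relevance labels each induce a top-1 distribution over the list, and the listwise loss is the cross entropy between them. The scores, labels, and function names are illustrative; the actual method trains a neural network on such a loss by gradient descent.

```python
# Hedged sketch of a ListNet-style top-1 listwise loss.
import numpy as np

def top1_probability(v):
    """Probability that each item is ranked first given real-valued
    scores v (a softmax over the list)."""
    e = np.exp(v - np.max(v))      # subtract max for numerical stability
    return e / e.sum()

def listnet_top1_loss(model_scores, label_scores):
    p_label = top1_probability(label_scores)   # "true" listwise distribution
    p_model = top1_probability(model_scores)   # model's listwise distribution
    return -np.sum(p_label * np.log(p_model))  # cross entropy over the list

scores = np.array([2.0, 0.5, 1.5])             # model outputs for one query
labels = np.array([2.0, 0.0, 1.0])             # graded relevance, used as scores
print(listnet_top1_loss(scores, labels))
```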

2,003 citations

Proceedings ArticleDOI
Jun Xu, Hang Li
23 Jul 2007
TL;DR: The proposed learning algorithm, referred to as AdaRank, repeatedly constructs 'weak rankers' on the basis of reweighted training data and finally linearly combines the weak rankers to make ranking predictions; the training process of AdaRank is proven to be exactly that of enhancing the performance measure used.
Abstract: In this paper we address the issue of learning to rank for document retrieval. In the task, a model is automatically created with some training data and then is utilized for ranking of documents. The goodness of a model is usually evaluated with performance measures such as MAP (Mean Average Precision) and NDCG (Normalized Discounted Cumulative Gain). Ideally a learning algorithm would train a ranking model that could directly optimize the performance measures with respect to the training data. Existing methods, however, are only able to train ranking models by minimizing loss functions loosely related to the performance measures. For example, Ranking SVM and RankBoost train ranking models by minimizing classification errors on instance pairs. To deal with the problem, we propose a novel learning algorithm within the framework of boosting, which can minimize a loss function directly defined on the performance measures. Our algorithm, referred to as AdaRank, repeatedly constructs 'weak rankers' on the basis of reweighted training data and finally linearly combines the weak rankers for making ranking predictions. We prove that the training process of AdaRank is exactly that of enhancing the performance measure used. Experimental results on four benchmark datasets show that AdaRank significantly outperforms the baseline methods of BM25, Ranking SVM, and RankBoost.
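The boosting loop the abstract describes can be sketched roughly as follows. This is a compressed reading, not the authors' code: `weak_rankers` is an assumed pool of candidate scoring functions, and `measure(scorer, q)` is an assumed callback returning a per-query performance value in [-1, 1] (e.g. a rescaled MAP or NDCG) for ranking query q's documents by `scorer`.

```python
# Rough sketch of an AdaRank-style boosting loop (assumptions noted above).
import numpy as np

def adarank(queries, weak_rankers, measure, rounds=10):
    P = np.ones(len(queries)) / len(queries)         # weight per training query
    ensemble = []                                     # list of (alpha, weak ranker)
    combined = lambda doc: sum(a * h(doc) for a, h in ensemble)
    for _ in range(rounds):
        # pick the weak ranker doing best on the currently hard (high-weight) queries
        perf = np.array([[measure(h, q) for q in queries] for h in weak_rankers])
        t = int(np.argmax(perf @ P))
        E = perf[t]
        alpha = 0.5 * np.log((P @ (1 + E)) / (P @ (1 - E)))   # weak ranker's weight
        ensemble.append((alpha, weak_rankers[t]))
        # re-weight: queries the combined ranker still handles poorly gain weight
        F = np.array([measure(combined, q) for q in queries])
        P = np.exp(-F)
        P /= P.sum()
    return combined
```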

873 citations


Cites background from "Learning Rankings via Convex Hull S..."

  • ...For other approaches to learning to rank, refer to [2, 11, 31]....


Journal ArticleDOI
Tie-Yan Liu
TL;DR: A statistical ranking theory is introduced, which can describe different learning-to-rank algorithms, and be used to analyze their query-level generalization abilities.
Abstract: Learning to rank for Information Retrieval (IR) is a task to automatically construct a ranking model using training data, such that the model can sort new objects according to their degrees of relevance, preference, or importance. Many IR problems are by nature ranking problems, and many IR technologies can be potentially enhanced by using learning-to-rank techniques. The objective of this tutorial is to give an introduction to this research direction. Specifically, the existing learning-to-rank algorithms are reviewed and categorized into three approaches: the pointwise, pairwise, and listwise approaches. The advantages and disadvantages with each approach are analyzed, and the relationships between the loss functions used in these approaches and IR evaluation measures are discussed. Then the empirical evaluations on typical learning-to-rank methods are shown, with the LETOR collection as a benchmark dataset, which seems to suggest that the listwise approach is the most effective one among all the approaches. After that, a statistical ranking theory is introduced, which can describe different learning-to-rank algorithms, and be used to analyze their query-level generalization abilities. At the end of the tutorial, we provide a summary and discuss potential future work on learning to rank.

591 citations

References
Proceedings Article
03 Jan 2001
TL;DR: A simple and efficient online algorithm is described, its performance in the mistake bound model is analyzed, its correctness is proved, and it outperforms online algorithms for regression and classification applied to ranking.
Abstract: We discuss the problem of ranking instances. In our framework each instance is associated with a rank or a rating, which is an integer from 1 to k. Our goal is to find a rank-prediction rule that assigns each instance a rank which is as close as possible to the instance's true rank. We describe a simple and efficient online algorithm, analyze its performance in the mistake bound model, and prove its correctness. We describe two sets of experiments, with synthetic data and with the EachMovie dataset for collaborative filtering. In the experiments we performed, our algorithm outperforms online algorithms for regression and classification applied to ranking.
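The online algorithm described here is a perceptron-style ranking rule with one weight vector and ordered thresholds, corrected whenever the predicted rank misses the true rank. Below is a small sketch of that update with my own variable names, not a claim to reproduce the paper's exact pseudocode.

```python
# Hedged sketch of a PRank-style online ranking update.
import numpy as np

class PRank:
    def __init__(self, n_features, n_ranks):
        self.w = np.zeros(n_features)
        self.b = np.zeros(n_ranks - 1)          # thresholds b_1 <= ... <= b_{k-1}

    def predict(self, x):
        score = self.w @ x
        # smallest rank r with score below threshold b_r; else the top rank
        below = np.nonzero(score < self.b)[0]
        return int(below[0]) + 1 if below.size else len(self.b) + 1

    def update(self, x, y):
        """Single online step for example x with true rank y in {1, ..., k}."""
        score = self.w @ x
        # y_r = +1 if the true rank lies above threshold r, else -1
        y_r = np.where(np.arange(1, len(self.b) + 1) < y, 1.0, -1.0)
        tau = np.where((score - self.b) * y_r <= 0, y_r, 0.0)   # violated thresholds
        self.w += tau.sum() * x
        self.b -= tau

# one toy step: 3 features, ranks in {1, 2, 3}
model = PRank(n_features=3, n_ranks=3)
model.update(np.array([1.0, 0.0, 0.0]), y=3)
print(model.predict(np.array([1.0, 0.0, 0.0])))
```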

657 citations


Additional excerpts

  • ...Another, less related, approach is [2]....


Proceedings Article
01 Jan 2002
TL;DR: A framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretic principles, which allows for Bayesian model selection and is less complex in implementation is presented.
Abstract: We present a framework for sparse Gaussian process (GP) methods which uses forward selection with criteria based on information-theoretic principles, previously suggested for active learning. Our goal is not only to learn d-sparse predictors (which can be evaluated in O(d) rather than O(n), d ≪ n, n the number of training points), but also to perform training under strong restrictions on time and memory requirements. The scaling of our method is at most O(n · d²), and in large real-world classification experiments we show that it can match prediction performance of the popular support vector machine (SVM), yet can be significantly faster in training. In contrast to the SVM, our approximation produces estimates of predictive probabilities ('error bars'), allows for Bayesian model selection and is less complex in implementation.
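A rough sketch of the d-sparse idea under simplifying assumptions: inducing points are added by greedy forward selection, and prediction uses only the d selected points. The residual-based selection score below is a cheap stand-in for the paper's information-theoretic criterion, and the kernel, noise level, and toy data are my own choices.

```python
# Hedged sketch: greedy forward selection of d inducing points plus a
# subset-of-regressors predictor (illustrative, not the paper's algorithm).
import numpy as np

def kern(A, B, ell=1.0):
    return np.exp(-0.5 * ((A[:, None] - B[None, :]) / ell) ** 2)

def sparse_gp_fit(X, y, d=5, noise=0.1):
    active = []                                   # indices of inducing points
    residual = y.copy()
    for _ in range(d):
        # add the point the current sparse model explains worst
        candidates = [i for i in range(len(X)) if i not in active]
        active.append(max(candidates, key=lambda i: abs(residual[i])))
        Knm = kern(X, X[active])
        Kmm = kern(X[active], X[active])
        # subset-of-regressors posterior over the selected points
        Sigma = np.linalg.inv(Kmm + Knm.T @ Knm / noise**2)
        alpha = Sigma @ Knm.T @ y / noise**2
        residual = y - Knm @ alpha
    return X[active], alpha

def sparse_gp_predict(Xstar, Xm, alpha):
    return kern(Xstar, Xm) @ alpha               # O(d) kernel evaluations per test point

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 200)
y = np.sin(X) + 0.1 * rng.standard_normal(200)
Xm, alpha = sparse_gp_fit(X, y, d=8)
print(np.abs(sparse_gp_predict(X, Xm, alpha) - np.sin(X)).mean())
```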

590 citations

Journal Article
TL;DR: A probabilistic kernel approach to ordinal regression based on Gaussian processes is presented, where a threshold model that generalizes the probit function is used as the likelihood function for ordinal variables.
Abstract: We present a probabilistic kernel approach to ordinal regression based on Gaussian processes. A threshold model that generalizes the probit function is used as the likelihood function for ordinal variables. Two inference techniques, based on the Laplace approximation and the expectation propagation algorithm respectively, are derived for hyperparameter learning and model selection. We compare these two Gaussian process approaches with a previous ordinal regression method based on support vector machines on some benchmark and real-world data sets, including applications of ordinal regression to collaborative filtering and gene expression analysis. Experimental results on these data sets verify the usefulness of our approach.
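The threshold model that "generalizes the probit function" can be illustrated in isolation: a latent function value is compared against ordered cut points, and each ordinal label receives the Gaussian probability mass of its interval. The threshold values and noise scale below are arbitrary choices for the sketch, not values from the paper.

```python
# Hedged sketch of an ordinal-probit (threshold) likelihood for a latent GP value f.
import numpy as np
from scipy.stats import norm

def ordinal_likelihood(f, thresholds, noise=1.0):
    """P(y = r | f) for r = 1..K, with thresholds b_1 < ... < b_{K-1}
    (b_0 = -inf and b_K = +inf are implicit)."""
    b = np.concatenate(([-np.inf], thresholds, [np.inf]))
    upper = norm.cdf((b[1:] - f) / noise)    # Phi((b_r     - f) / sigma)
    lower = norm.cdf((b[:-1] - f) / noise)   # Phi((b_{r-1} - f) / sigma)
    return upper - lower                     # one probability per ordinal class

print(ordinal_likelihood(f=0.3, thresholds=np.array([-1.0, 0.0, 1.0])))
```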

475 citations


"Learning Rankings via Convex Hull S..." refers methods in this paper

  • ...A non-parametric Bayesian model for ordinal regression based on Gaussian processes (GP) was defined [1]....


01 Jan 1998
TL;DR: By setting apart the two functions of a support vector machine: separation of points by a nonlinear surface in the original space of patterns, and maximizing the distance between separating planes in a higher dimensional space, this work is able to define indefinite, possibly discontinuous, kernels, not necessarily inner product ones, that generate highly nonlinear separating surfaces.
Abstract: By setting apart the two functions of a support vector machine: separation of points by a nonlinear surface in the original space of patterns, and maximizing the distance between separating planes in a higher dimensional space, we are able to define indefinite, possibly discontinuous, kernels, not necessarily inner product ones, that generate highly nonlinear separating surfaces. Maximizing the distance between the separating planes in the higher dimensional space is surrogated by support vector suppression, which is achieved by minimizing any desired norm of support vector multipliers. The norm may be one induced by the separation kernel if it happens to be positive definite, or a Euclidean or a polyhedral norm. The latter norm leads to a linear program whereas the former norms lead to convex quadratic programs, all with an arbitrary separation kernel. A standard support vector machine can be recovered by using the same kernel for separation and support vector suppression. On a simple test example, all models perform equally well when a positive definite kernel is used. When a negative definite kernel is used, we are unable to solve the nonconvex quadratic program associated with a conventional support vector machine, while all other proposed models remain convex and easily generate a surface that separates all given points.
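The linear-programming case mentioned above (polyhedral-norm suppression of the support vector multipliers with an arbitrary kernel) can be sketched with an off-the-shelf LP solver. This is a simplified reading of that formulation rather than the paper's exact program; the RBF kernel, the constant C, and the toy XOR data are my own choices.

```python
# Hedged sketch of a kernel classifier trained by a linear program:
# 1-norm suppression of multipliers u, hinge-style slacks, arbitrary kernel.
import numpy as np
from scipy.optimize import linprog

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def lp_kernel_classifier(X, d, C=1.0):
    """Fit f(x) = K(x, X) diag(d) u - g with 1-norm suppression of u;
    d must be labels in {-1, +1}."""
    m = len(d)
    K = rbf(X, X)
    DKD = d[:, None] * K * d[None, :]
    # variables: [u (m), g (1), slack y (m), s (m) with |u| <= s]
    c = np.concatenate([np.zeros(m), [0.0], C * np.ones(m), np.ones(m)])
    A_ub = np.block([
        [-DKD,           d[:, None],       -np.eye(m),       np.zeros((m, m))],
        [ np.eye(m),     np.zeros((m, 1)),  np.zeros((m, m)), -np.eye(m)],
        [-np.eye(m),     np.zeros((m, 1)),  np.zeros((m, m)), -np.eye(m)],
    ])
    b_ub = np.concatenate([-np.ones(m), np.zeros(2 * m)])
    bounds = [(None, None)] * (m + 1) + [(0, None)] * (2 * m)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    u, g = res.x[:m], res.x[m]
    return lambda Xnew: np.sign(rbf(Xnew, X) @ (d * u) - g)

# tiny XOR-style check (not linearly separable in the original space)
X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
d = np.array([1., 1., -1., -1.])
predict = lp_kernel_classifier(X, d, C=10.0)
print(predict(X))   # should reproduce d on this toy set
```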

296 citations

Proceedings Article
01 Jan 2002
TL;DR: A distance-based conditional model on the ranking poset is presented, an extension of the Mallows ϕ model, and generalizes the classifier combination methods used by several ensemble learning algorithms, including error correcting output codes, discrete AdaBoost, logistic regression and cranking.
Abstract: A distance-based conditional model on the ranking poset is presented for use in classification and ranking. The model is an extension of the Mallows ϕ model, and generalizes the classifier combination methods used by several ensemble learning algorithms, including error correcting output codes, discrete AdaBoost, logistic regression and cranking. The algebraic structure of the ranking poset leads to a simple Bayesian interpretation of the conditional model and its special cases. In addition to a unifying view, the framework suggests a probabilistic interpretation for error correcting output codes and an extension beyond the binary coding scheme.
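For reference, the Mallows ϕ model that the conditional model extends assigns a ranking a probability that decays exponentially with its Kendall tau distance from a central ranking. A brute-force sketch is below; it normalizes over all permutations, so it is only practical for tiny item sets, and the dispersion parameter and rankings are made up.

```python
# Hedged sketch of the Mallows phi model over full rankings.
import numpy as np
from itertools import permutations

def kendall_tau_distance(pi, sigma):
    """Number of item pairs ordered differently by the two rankings
    (rankings given as tuples of item indices, best first)."""
    pos_pi = {item: i for i, item in enumerate(pi)}
    pos_sigma = {item: i for i, item in enumerate(sigma)}
    items = list(pi)
    return sum(
        (pos_pi[a] < pos_pi[b]) != (pos_sigma[a] < pos_sigma[b])
        for i, a in enumerate(items) for b in items[i + 1:]
    )

def mallows_probability(pi, sigma, theta=1.0):
    """P(pi | sigma) proportional to exp(-theta * d(pi, sigma)),
    normalized by brute force over all permutations."""
    items = tuple(sorted(sigma))
    Z = sum(np.exp(-theta * kendall_tau_distance(p, sigma))
            for p in permutations(items))
    return np.exp(-theta * kendall_tau_distance(pi, sigma)) / Z

print(mallows_probability((0, 1, 2), (0, 1, 2)))   # the central ranking is the mode
```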

51 citations


"Learning Rankings via Convex Hull S..." refers methods in this paper

  • ...In other related work, boosting methods have been proposed for learning preferences [3], and a combinatorial structure called the ranking poset was used for conditional modeling of partially ranked data [8], in the context of combining ranked sets of web pages produced by various web-page search engines....
