
Proceedings Article

Learning Rankings via Convex Hull Separation

05 Dec 2005-Vol. 18, pp 395-402

TL;DR: Experiments indicate that the proposed algorithm for learning ranking functions from order constraints between sets—i.e. classes—of training samples is at least as accurate as the current state-of-the-art and several orders of magnitude faster than current methods.

Abstract: We propose efficient algorithms for learning ranking functions from order constraints between sets (i.e. classes) of training samples. Our algorithms may be used for maximizing the generalized Wilcoxon-Mann-Whitney statistic that accounts for the partial ordering of the classes: special cases include maximizing the area under the ROC curve for binary classification and its generalization for ordinal regression. Experiments on public benchmarks indicate that: (a) the proposed algorithm is at least as accurate as the current state-of-the-art; (b) computationally, it is several orders of magnitude faster and, unlike current methods, it is easily able to handle even large datasets with over 20,000 samples.
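As a rough illustration of the quantity the abstract refers to, the sketch below computes a Wilcoxon-Mann-Whitney statistic for one pair of classes (which equals the empirical area under the ROC curve in the binary case) and pools it over a set of class-order constraints. The function names, the half-credit treatment of ties, and the pooling by pair count are assumptions made for this example; the paper's exact definition may differ.

```python
from itertools import product


def wmw_statistic(scores_low, scores_high):
    """Fraction of (low, high) score pairs that are ordered correctly.

    Ties receive half credit (an assumption of this sketch).
    """
    if not scores_low or not scores_high:
        raise ValueError("both classes need at least one sample")
    total = 0.0
    for low, high in product(scores_low, scores_high):
        if high > low:
            total += 1.0
        elif high == low:
            total += 0.5
    return total / (len(scores_low) * len(scores_high))


def generalized_wmw(scores_by_class, order_constraints):
    """Pool correctly ordered pairs over all (lower, higher) class constraints."""
    correct = 0.0
    pairs = 0
    for lower, higher in order_constraints:
        low_scores = scores_by_class[lower]
        high_scores = scores_by_class[higher]
        n_pairs = len(low_scores) * len(high_scores)
        correct += wmw_statistic(low_scores, high_scores) * n_pairs
        pairs += n_pairs
    return correct / pairs


# Binary case: this is the empirical AUC of the scores.
binary_auc = wmw_statistic([0.1, 0.4], [0.35, 0.8])

# Ordinal case: three classes with the chain of constraints 1 < 2 < 3.
ordinal_scores = {1: [0.1, 0.2], 2: [0.15, 0.5], 3: [0.6]}
ordinal_wmw = generalized_wmw(ordinal_scores, [(1, 2), (2, 3)])
```

In the binary example three of the four negative/positive pairs are ordered correctly, so the statistic is 0.75. Note that this pairwise loop costs time proportional to the product of the class sizes.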

Topics: Ordinal regression (56%), Convex hull (53%), Ranking (53%), Binary classification (51%)


Citations


Book
Tie-Yan Liu
27 Jun 2009
TL;DR: Three major approaches to learning to rank are introduced (the pointwise, pairwise, and listwise approaches), the relationship between the loss functions used in these approaches and the widely used IR evaluation measures is analyzed, and the performance of these approaches on the LETOR benchmark datasets is evaluated.
Abstract: This tutorial is concerned with a comprehensive introduction to the research area of learning to rank for information retrieval. In the first part of the tutorial, we will introduce three major approaches to learning to rank, i.e., the pointwise, pairwise, and listwise approaches, analyze the relationship between the loss functions used in these approaches and the widely-used IR evaluation measures, evaluate the performance of these approaches on the LETOR benchmark datasets, and demonstrate how to use these approaches to solve real ranking applications. In the second part of the tutorial, we will discuss some advanced topics regarding learning to rank, such as relational ranking, diverse ranking, semi-supervised ranking, transfer ranking, query-dependent ranking, and training data preprocessing. In the third part, we will briefly mention the recent advances on statistical learning theory for ranking, which explain the generalization ability and statistical consistency of different ranking methods. In the last part, we will conclude the tutorial and show several future research directions.

2,244 citations


Cites background from "Learning Rankings via Convex Hull S..."

  • ...• Other learning-to-rank algorithms [15, 19, 32, 93, 109, 127, 142, 143, 144] that are based on association rules, decision systems, and other technologies; other theoretical analysis on ranking [50]; and applications of learning-to-rank methods [87, 128]....

    [...]


Proceedings Article
20 Jun 2007
TL;DR: The paper proposes that learning to rank should adopt the listwise approach, in which lists of objects are used as 'instances' in learning, and introduces two probability models, referred to as permutation probability and top k probability, to define a listwise loss function for learning.
Abstract: The paper is concerned with learning to rank, which is to construct a model or a function for ranking objects. Learning to rank is useful for document retrieval, collaborative filtering, and many other applications. Several methods for learning to rank have been proposed, which take object pairs as 'instances' in learning. We refer to them as the pairwise approach in this paper. Although the pairwise approach offers advantages, it ignores the fact that ranking is a prediction task on lists of objects. The paper postulates that learning to rank should adopt the listwise approach in which lists of objects are used as 'instances' in learning. The paper proposes a new probabilistic method for the approach. Specifically it introduces two probability models, respectively referred to as permutation probability and top k probability, to define a listwise loss function for learning. Neural Network and Gradient Descent are then employed as model and algorithm in the learning method. Experimental results on information retrieval show that the proposed listwise approach performs better than the pairwise approach.
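The two probability models named in this abstract can be sketched as follows. This is a minimal illustration, assuming an exponential transform of the scores and a cross-entropy loss between top-one distributions (the k = 1 case of top k probability); the abstract itself does not spell out these choices, and all function names here are hypothetical.

```python
import math


def permutation_probability(scores, permutation):
    """Probability of ranking the objects in the given order.

    Each position is filled by drawing from the objects not yet placed,
    with probability proportional to exp(score).
    """
    weights = [math.exp(scores[i]) for i in permutation]
    prob = 1.0
    for pos in range(len(weights)):
        prob *= weights[pos] / sum(weights[pos:])
    return prob


def top_one_probabilities(scores):
    """Probability of each object being ranked first (a softmax of the scores)."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def listwise_cross_entropy(true_scores, predicted_scores):
    """Cross entropy between the top-one distributions of two score lists."""
    p = top_one_probabilities(true_scores)
    q = top_one_probabilities(predicted_scores)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))


relevance = [2.0, 1.0, 0.0]
loss_good = listwise_cross_entropy(relevance, [2.0, 1.0, 0.0])
loss_bad = listwise_cross_entropy(relevance, [0.0, 1.0, 2.0])
```

Under these assumptions the loss is smallest when the predicted scores induce the same top-one distribution as the relevance labels, so `loss_good` is below `loss_bad`. The top-one form avoids enumerating all permutations of a list.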

1,752 citations


Proceedings Article
Jun Xu, Hang Li
23 Jul 2007
TL;DR: The proposed learning algorithm, referred to as AdaRank, repeatedly constructs 'weak rankers' on the basis of reweighted training data and finally linearly combines the weak rankers for making ranking predictions; the training process of AdaRank is proved to be exactly that of enhancing the performance measure used.
Abstract: In this paper we address the issue of learning to rank for document retrieval. In the task, a model is automatically created with some training data and then is utilized for ranking of documents. The goodness of a model is usually evaluated with performance measures such as MAP (Mean Average Precision) and NDCG (Normalized Discounted Cumulative Gain). Ideally a learning algorithm would train a ranking model that could directly optimize the performance measures with respect to the training data. Existing methods, however, are only able to train ranking models by minimizing loss functions loosely related to the performance measures. For example, Ranking SVM and RankBoost train ranking models by minimizing classification errors on instance pairs. To deal with the problem, we propose a novel learning algorithm within the framework of boosting, which can minimize a loss function directly defined on the performance measures. Our algorithm, referred to as AdaRank, repeatedly constructs 'weak rankers' on the basis of reweighted training data and finally linearly combines the weak rankers for making ranking predictions. We prove that the training process of AdaRank is exactly that of enhancing the performance measure used. Experimental results on four benchmark datasets show that AdaRank significantly outperforms the baseline methods of BM25, Ranking SVM, and RankBoost.

828 citations


Cites background from "Learning Rankings via Convex Hull S..."

  • ...For other approaches to learning to rank, refer to [2, 11, 31]....

    [...]


Journal Article
Tie-Yan Liu
TL;DR: A statistical ranking theory is introduced, which can describe different learning-to-rank algorithms, and be used to analyze their query-level generalization abilities.
Abstract: Learning to rank for Information Retrieval (IR) is a task to automatically construct a ranking model using training data, such that the model can sort new objects according to their degrees of relevance, preference, or importance. Many IR problems are by nature ranking problems, and many IR technologies can be potentially enhanced by using learning-to-rank techniques. The objective of this tutorial is to give an introduction to this research direction. Specifically, the existing learning-to-rank algorithms are reviewed and categorized into three approaches: the pointwise, pairwise, and listwise approaches. The advantages and disadvantages with each approach are analyzed, and the relationships between the loss functions used in these approaches and IR evaluation measures are discussed. Then the empirical evaluations on typical learning-to-rank methods are shown, with the LETOR collection as a benchmark dataset, which seems to suggest that the listwise approach be the most effective one among all the approaches. After that, a statistical ranking theory is introduced, which can describe different learning-to-rank algorithms, and be used to analyze their query-level generalization abilities. At the end of the tutorial, we provide a summary and discuss potential future work on learning to rank.

496 citations


References

Book
Vladimir Vapnik
01 Jan 1995
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

38,164 citations


"Learning Rankings via Convex Hull S..." refers background in this paper

  • ...Note that enforcing the constraints defined above indeed implies the desired ordering, since we have: Aw + y ≥ −γ ≥ γ̂ + 1 ≥ γ̂ ≥ Aw − y. It is also important to note the connection with the Support Vector Machines (SVM) formulation [10, 14] for the binary case....

    [...]


Book
01 Jan 1983
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).

23,204 citations


"Learning Rankings via Convex Hull S..." refers methods in this paper

  • ...Ordinal regression and methods for handling structured output classes: For a classic description of generalized linear models for ordinal regression, see [11]....

    [...]


01 Feb 1977

5,933 citations


"Learning Rankings via Convex Hull S..." refers background in this paper

  • ...B′u − w′[A⁻′ − A⁺′] = 0, b′u ≤ −1, u ≥ 0, (7) where the second equivalent form of the constraints was obtained by negation (as before), and the third equivalent form results from our third key insight: the application of Farkas' theorem of alternatives [9]....

    [...]


Proceedings Article
23 Jul 2002
TL;DR: The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking.
Abstract: This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.

4,297 citations


Book
01 Jan 1969
TL;DR: It is shown that for all xₖ → x, yₖ → y, where yₖ ∈ A(xₖ), we have y ∈ A(x), so A is closed.
Abstract: Part 1 (if): Assume that Z is closed. We must show that for all xₖ → x, yₖ → y, where yₖ ∈ A(xₖ), we have y ∈ A(x). By the definition of Z being closed, we know that all points arbitrarily close to Z are in Z. Let xₖ → x, yₖ → y, and yₖ ∈ A(xₖ). Now, for any ε > 0, there exists an N such that for all k ≥ N we have ‖xₖ − x‖ < ε and ‖yₖ − y‖ < ε, which implies that (x, y) is arbitrarily close to Z, so (x, y) ∈ Z and y ∈ A(x). Thus, A is closed.

2,142 citations


"Learning Rankings via Convex Hull S..." refers background in this paper

  • ...B′u − w′[A⁻′ − A⁺′] = 0, b′u ≤ −1, u ≥ 0, (7) where the second equivalent form of the constraints was obtained by negation (as before), and the third equivalent form results from our third key insight: the application of Farkas' theorem of alternatives [9]....

    [...]