Proceedings Article

Learning Rankings via Convex Hull Separation

05 Dec 2005, Vol. 18, pp. 395-402
TL;DR: Experiments indicate that the proposed algorithm for learning ranking functions from order constraints between sets—i.e. classes—of training samples is at least as accurate as the current state-of-the-art and several orders of magnitude faster than current methods.
Abstract: We propose efficient algorithms for learning ranking functions from order constraints between sets—i.e. classes—of training samples. Our algorithms may be used for maximizing the generalized Wilcoxon Mann Whitney statistic that accounts for the partial ordering of the classes: special cases include maximizing the area under the ROC curve for binary classification and its generalization for ordinal regression. Experiments on public benchmarks indicate that: (a) the proposed algorithm is at least as accurate as the current state-of-the-art; (b) computationally, it is several orders of magnitude faster and—unlike current methods—it is easily able to handle even large datasets with over 20,000 samples.
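For concreteness, the generalized Wilcoxon-Mann-Whitney statistic mentioned above can be read as the fraction of sample pairs from ordered classes j < k whose predicted scores respect that order. The sketch below is my own minimal rendering of that definition, not the authors' code; with exactly two classes it reduces to the area under the ROC curve, matching the special case named in the abstract.

import numpy as np

def generalized_wmw(scores, labels):
    """Fraction of correctly ordered pairs over all class pairs j < k.
    scores: real-valued predictions; labels: ordinal class indices.
    Score ties count as half-correct, as in the usual AUC convention."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    classes = np.unique(labels)
    correct, total = 0.0, 0
    for a, j in enumerate(classes):
        for k in classes[a + 1:]:
            lo, hi = scores[labels == j], scores[labels == k]
            diff = hi[None, :] - lo[:, None]   # every (low-class, high-class) pair
            correct += (diff > 0).sum() + 0.5 * (diff == 0).sum()
            total += diff.size
    return correct / total

print(generalized_wmw([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75 (binary AUC)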


Citations
Proceedings Article
Jun Xu, Tie-Yan Liu, Min Lu, Hang Li, Wei-Ying Ma
20 Jul 2008
TL;DR: Experimental results show that the methods based on direct optimization of evaluation measures can always outperform the conventional methods of Ranking SVM and RankBoost; however, no significant difference exists among the performances of the direct optimization methods themselves.
Abstract: One of the central issues in learning to rank for information retrieval is to develop algorithms that construct ranking models by directly optimizing evaluation measures used in information retrieval such as Mean Average Precision (MAP) and Normalized Discounted Cumulative Gain (NDCG). Several such algorithms including SVMmap and AdaRank have been proposed and their effectiveness has been verified. However, the relationships between the algorithms are not clear, and furthermore no comparisons have been conducted between them. In this paper, we conduct a study on the approach of directly optimizing evaluation measures in learning to rank for Information Retrieval (IR). We focus on the methods that minimize loss functions upper bounding the basic loss function defined on the IR measures. We first provide a general framework for the study and analyze the existing algorithms of SVMmap and AdaRank within the framework. The framework is based on upper bound analysis and two types of upper bounds are discussed. Moreover, we show that we can derive new algorithms on the basis of this analysis and create one example algorithm called PermuRank. We have also conducted comparisons between SVMmap, AdaRank, PermuRank, and conventional methods of Ranking SVM and RankBoost, using benchmark datasets. Experimental results show that the methods based on direct optimization of evaluation measures can always outperform conventional methods of Ranking SVM and RankBoost. However, no significant difference exists among the performances of the direct optimization methods themselves.
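For readers unfamiliar with the target measures, here is a small self-contained sketch of NDCG (my own illustration; the algorithms in the paper optimize upper bounds on losses defined from such measures rather than computing them directly like this):

import numpy as np

def dcg(rel):
    """Discounted cumulative gain of a list of relevance grades in ranked order."""
    rel = np.asarray(rel, float)
    discounts = np.log2(np.arange(2, rel.size + 2))   # position i gets log2(i + 1)
    return np.sum((2.0 ** rel - 1.0) / discounts)

def ndcg(rel):
    """DCG normalized by the DCG of the ideal (relevance-sorted) ordering."""
    ideal = dcg(np.sort(rel)[::-1])
    return dcg(rel) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 3, 0, 1]))   # near 1.0 when the most relevant items rank first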

159 citations


Cites methods from "Learning Rankings via Convex Hull S..."

  • ...For other methods belonging to the approach, refer to [12, 29, 6, 25, 21, 30, 24, 31]....


Book Chapter
01 Jan 2010
TL;DR: Label ranking is a complex prediction task where the goal is to map instances to a total order over a finite set of predefined labels; it subsumes several supervised learning problems, such as multiclass prediction, multilabel classification, and hierarchical classification.
Abstract: Label ranking is a complex prediction task where the goal is to map instances to a total order over a finite set of predefined labels. An interesting aspect of this problem is that it subsumes several supervised learning problems, such as multiclass prediction, multilabel classification, and hierarchical classification. Unsurprisingly, there exists a plethora of label ranking algorithms in the literature due, in part, to this versatile nature of the problem. In this paper, we survey these algorithms.

126 citations

Journal Article
TL;DR: This work discusses the problem of learning to rank labels from real-valued feedback associated with each label and describes an efficient algorithm, called SOPOPO, for solving the reduced problem by employing a soft projection onto the polyhedron defined by a reduced set of constraints.
Abstract: We discuss the problem of learning to rank labels from real-valued feedback associated with each label. We cast the feedback as a preference graph where the nodes of the graph are the labels and edges express preferences over labels. We tackle the learning problem by defining a loss function for comparing a predicted graph with a feedback graph. This loss is materialized by decomposing the feedback graph into bipartite sub-graphs. We then adopt the maximum-margin framework, which leads to a quadratic optimization problem with linear constraints. While the size of the problem grows quadratically with the number of nodes in the feedback graph, we derive a problem of significantly smaller size and prove that it attains the same minimum. We then describe an efficient algorithm, called SOPOPO, for solving the reduced problem by employing a soft projection onto the polyhedron defined by a reduced set of constraints. We also describe and analyze a wrapper procedure for batch learning when multiple graphs are provided for training. We conclude with a set of experiments which show significant improvements in run time over a state-of-the-art interior-point algorithm.
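One way to picture the decomposition step described above: labels with equal feedback are incomparable, so all preference edges run between distinct feedback levels, and the edges between any two levels form a complete bipartite subgraph. The sketch below is my own toy rendering of that idea, not the paper's construction:

from itertools import combinations

def bipartite_decomposition(feedback):
    """Split the preference graph induced by real-valued feedback into
    complete bipartite edge sets, one per pair of feedback levels.
    feedback: dict mapping label -> real-valued feedback."""
    levels = {}
    for label, value in feedback.items():
        levels.setdefault(value, []).append(label)
    subgraphs = {}
    for hi, lo in combinations(sorted(levels, reverse=True), 2):
        subgraphs[(hi, lo)] = [(a, b) for a in levels[hi] for b in levels[lo]]
    return subgraphs

print(bipartite_decomposition({"a": 2.0, "b": 2.0, "c": 1.0, "d": 0.0}))
# {(2.0, 1.0): [('a', 'c'), ('b', 'c')], (2.0, 0.0): ..., (1.0, 0.0): [('c', 'd')]}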

102 citations

Proceedings Article
20 Jul 2008
TL;DR: This paper analyzes the properties of ties and develops novel learning frameworks which combine ties and preference data using statistical paired comparison models to improve the performance of learned ranking functions.
Abstract: Designing effective ranking functions is a core problem for information retrieval and Web search since the ranking functions directly impact the relevance of the search results. The problem has been the focus of much of the research at the intersection of Web search and machine learning, and learning ranking functions from preference data in particular has recently attracted much interest. The objective of this paper is to empirically examine several objective functions that can be used for learning ranking functions from preference data. Specifically, we investigate the roles of ties in the learning process. By ties, we mean preference judgments that two documents have equal degree of relevance with respect to a query. This type of data has largely been ignored or not properly modeled in the past. In this paper, we analyze the properties of ties and develop novel learning frameworks which combine ties and preference data using statistical paired comparison models to improve the performance of learned ranking functions. The resulting optimization problems explicitly incorporating ties and preference data are solved using gradient boosting methods. Experimental studies are conducted using three publicly available data sets which demonstrate the effectiveness of the proposed new methods.
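As background on the kind of statistical paired-comparison model with ties the abstract refers to, here is a hedged sketch of the classical Rao-Kupper extension of the Bradley-Terry model, applied to document scores; the concrete function below is my illustration, not the authors' objective:

import math

def rao_kupper_nll(s_i, s_j, outcome, theta=1.5):
    """Negative log-likelihood of one judgment under a Rao-Kupper model.
    s_i, s_j: real-valued document scores; theta > 1 controls tie propensity;
    outcome: 'i>j', 'j>i', or 'tie'."""
    p_i, p_j = math.exp(s_i), math.exp(s_j)
    if outcome == 'i>j':
        prob = p_i / (p_i + theta * p_j)
    elif outcome == 'j>i':
        prob = p_j / (p_j + theta * p_i)
    else:   # a tie takes the remaining probability mass
        prob = (theta**2 - 1) * p_i * p_j / ((p_i + theta * p_j) * (p_j + theta * p_i))
    return -math.log(prob)

# A tie between near-equal scores is cheap; one between far-apart scores is costly.
print(rao_kupper_nll(0.1, 0.0, 'tie'), rao_kupper_nll(2.0, 0.0, 'tie'))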

45 citations


Cites methods from "Learning Rankings via Convex Hull S..."

  • ...Other methods for learning ranking functions are proposed in [4, 12]. Another approach to learning a ranking function addresses the problem of optimizing a loss function directly related to the performance measures of information retrieval, such as precision, mean average precision, and Normalized Discounted Cumulative Gain [6, 17, 28, 29]....


Proceedings Article
21 Jun 2014
TL;DR: Theoretical analysis demonstrates that the proposed large margin optimization criteria can strengthen and improve the robustness and generalization performance of preference learning algorithms on the obtained low-dimensional subspace.
Abstract: This paper studies dimensionality reduction in a weakly supervised setting, in which the preference relationship between examples is indicated by weak cues. A novel framework is proposed that integrates two aspects of the large margin principle (angle and distance), which simultaneously encourage angle consistency between preference pairs and maximize the distance between examples in preference pairs. Two specific algorithms are developed: an alternating direction method to learn a linear transformation matrix and a gradient boosting technique to optimize a non-linear transformation directly in the function space. Theoretical analysis demonstrates that the proposed large margin optimization criteria can strengthen and improve the robustness and generalization performance of preference learning algorithms on the obtained low-dimensional subspace. Experimental results on real-world datasets demonstrate the significance of studying dimensionality reduction in the weakly supervised setting and the effectiveness of the proposed framework.
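Reading the abstract's two large-margin criteria literally, a toy loss might couple a hinge on the pairwise angles of transformed preference directions with a reward for separating pair members. This is speculation on my part (the loss shape, names, and data are all assumptions, not the authors' formulation):

import numpy as np

def pair_margin_loss(L, pairs, alpha=0.1):
    """Toy objective over preference pairs (preferred, other) under linear map L:
    term 1 pushes transformed difference vectors to point the same way (angle
    consistency); term 2 rewards distance between the members of each pair."""
    diffs = np.array([L @ (a - b) for a, b in pairs])
    norms = np.linalg.norm(diffs, axis=1, keepdims=True) + 1e-12
    units = diffs / norms
    cos = units @ units.T                               # pairwise cosines
    angle_term = np.mean(np.maximum(0.0, 1.0 - cos))    # hinge on misalignment
    dist_term = -np.mean(norms)                         # larger separation is better
    return angle_term + alpha * dist_term

rng = np.random.default_rng(0)
pairs = [(rng.normal(size=5) + 1.0, rng.normal(size=5)) for _ in range(10)]
L = rng.normal(size=(2, 5))    # map 5 input dimensions down to 2
print(pair_margin_loss(L, pairs))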

43 citations


Cites methods from "Learning Rankings via Convex Hull S..."

  • ...In this paper we study the weakly supervised dimensionality reduction problem, where the preference relationships between examples are provided rather than explicit class labels....


  • ...Three datasets (Table 1) were used that have previously been used to evaluate ranking and ordinal regression (Fung et al., 2006)....


References
Book
Vladimir Vapnik
01 Jan 1995
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations


"Learning Rankings via Convex Hull S..." refers background in this paper

  • ...Note that enforcing the constraints defined above indeed implies the desired ordering, since we have: Aw + y ≥ −γ ≥ γ̂ + 1 ≥ γ̂ ≥ Aw − y. It is also important to note the connection with the Support Vector Machine (SVM) formulation [10, 14] for the binary case....

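To make the ordering argument in the quoted passage concrete, the following toy check (my own construction, ignoring the slack variables y of the quoted formulation) verifies that a linear scoring function with per-boundary thresholds separates consecutive classes by a unit margin, which is what chains of inequalities of this shape guarantee:

import numpy as np

def ordering_satisfied(X, y, w, gammas, margin=1.0):
    """True if, for every boundary j, class-j scores stay below gamma_j and
    class-(j+1) scores stay above gamma_j + margin.
    X: samples; y: ordinal labels 0..K-1; gammas: K-1 boundary thresholds."""
    scores = X @ w
    for j, gamma in enumerate(gammas):
        if scores[y == j].max() > gamma or scores[y == j + 1].min() < gamma + margin:
            return False
    return True

X = np.array([[0.0], [0.5], [2.0], [2.5]])
y = np.array([0, 0, 1, 1])
print(ordering_satisfied(X, y, np.array([1.0]), gammas=[0.6]))   # True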

Book
01 Jan 1983
TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
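The "iterative weighted linear regression" the abstract opens with is what is now called iteratively reweighted least squares (IRLS). As a reminder of the mechanics, here is a minimal sketch for the Poisson case with log link (my own compact implementation, not from the book):

import numpy as np

def irls_poisson(X, y, iters=25):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = np.exp(X @ w)             # current mean under the log link
        z = X @ w + (y - mu) / mu      # working response
        W = mu                         # weights: (dmu/deta)^2 / Var(y) = mu
        WX = X * W[:, None]
        w = np.linalg.solve(X.T @ WX, X.T @ (W * z))
    return w

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = rng.poisson(np.exp(0.5 + 0.8 * X[:, 1]))
print(irls_poisson(X, y))   # approximately [0.5, 0.8]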

23,215 citations


"Learning Rankings via Convex Hull S..." refers methods in this paper

  • ...Ordinal regression and methods for handling structured output classes: For a classic description of generalized linear models for ordinal regression, see [11]....


01 Feb 1977

5,933 citations


"Learning Rankings via Convex Hull S..." refers background in this paper

  • ...B′u − w′[A⁻′ − A⁺′] = 0, b′u ≤ −1, u ≥ 0, (7) where the second equivalent form of the constraints was obtained by negation (as before), and the third equivalent form results from our third key insight: the application of Farkas' theorem of alternatives [9]....

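For reference, the theorem of alternatives invoked in the quote is Farkas' lemma. One standard statement (generic notation, not matched to the paper's A, B, b):

$$
\text{For } A \in \mathbb{R}^{m \times n},\ b \in \mathbb{R}^m, \text{ exactly one holds:}\quad
(\mathrm{i})\ \exists\, x \ge 0 \text{ with } Ax = b; \qquad
(\mathrm{ii})\ \exists\, y \text{ with } A^{\top} y \ge 0,\ b^{\top} y < 0.
$$

Such a statement lets the infeasibility of one linear system be rewritten as the feasibility of a dual system, which is the role it plays in deriving the equivalent form (7) quoted above.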

Proceedings Article
23 Jul 2002
TL;DR: The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking.
Abstract: This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
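The core trick of this approach is often summarized as a pairwise transform: each clickthrough-derived preference "document a over document b for query q" becomes a binary classification example on the feature difference, which a linear SVM can then separate. A toy rendering with scikit-learn (data and names are mine; the paper's actual formulation optimizes over pairs within each query):

import numpy as np
from sklearn.svm import LinearSVC

def pairwise_transform(X_pref, X_other):
    """Turn preference pairs into a balanced binary problem on feature
    differences: +1 for (preferred - other), -1 for the reverse."""
    diffs = X_pref - X_other
    X = np.vstack([diffs, -diffs])
    y = np.hstack([np.ones(len(diffs)), -np.ones(len(diffs))])
    return X, y

rng = np.random.default_rng(0)
X_other = rng.normal(size=(50, 4))
X_pref = X_other + np.array([1.0, 0.5, 0.0, 0.0])   # preferred docs score higher
X, y = pairwise_transform(X_pref, X_other)
model = LinearSVC(C=1.0).fit(X, y)                  # hinge loss on the pairs
new_docs = rng.normal(size=(3, 4))
print(np.argsort(-(new_docs @ model.coef_[0])))     # rank new docs by w.x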

4,453 citations

Book
01 Jan 1969
TL;DR: It is shown that the set A is closed: if x_k → x and y_k → y with (x_k, y_k) ∈ A, then (x, y) ∈ A.
Abstract: Part 1 (if): Assume that Z is closed. We must show that if x_k → x and y_k → y, where (x_k, y_k) ∈ A, then (x, y) ∈ A. By the definition of Z being closed, we know that all points arbitrarily close to Z are in Z. Let x_k → x, y_k → y, and (x_k, y_k) ∈ A. Now, for any ε > 0, there exists an N such that for all k ≥ N we have ||x_k − x|| < ε and ||y_k − y|| < ε, which implies that (x, y) is arbitrarily close to Z, so (x, y) ∈ Z and (x, y) ∈ A. Thus, A is closed.

2,146 citations


"Learning Rankings via Convex Hull S..." refers background in this paper

  • ...B′u − w′[A⁻′ − A⁺′] = 0, b′u ≤ −1, u ≥ 0, (7) where the second equivalent form of the constraints was obtained by negation (as before), and the third equivalent form results from our third key insight: the application of Farkas' theorem of alternatives [9]....
