Optimizing search engines using clickthrough data

doi:10.1145/775047.775067

Proceedings Article•DOI•

Thorsten Joachims¹•Institutions (1)

Cornell University¹

23 Jul 2002-pp 133-142

TL;DR: The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking.

Abstract: This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
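The abstract does not spell out how a click log becomes training data. As a minimal sketch, assume one common reading: a clicked link is preferred over every non-clicked link that was ranked above it. The function name `preference_pairs` and the document ids are hypothetical and serve only as illustration.

```python
def preference_pairs(ranking, clicked):
    """Turn one logged query into relative preference pairs.

    ranking: list of document ids in the order they were presented.
    clicked: set of document ids the user clicked on.
    Returns (preferred, less_preferred) tuples. Assumed rule: a clicked
    document beats every non-clicked document shown above it.
    """
    pairs = []
    for pos, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        for above in ranking[:pos]:
            if above not in clicked:
                pairs.append((doc, above))
    return pairs


# Hypothetical log entry: five results shown, three of them clicked.
pairs = preference_pairs(["d1", "d2", "d3", "d4", "d5"], {"d1", "d3", "d5"})
print(pairs)
```

Pairs of this kind are relative judgments rather than absolute relevance labels, which is why a pairwise learner such as an SVM over pair constraints is a natural fit.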

Citations

Book•

Learning to Rank for Information Retrieval

Tie-Yan Liu¹•Institutions (1)

Microsoft¹

27 Jun 2009

TL;DR: Three major approaches to learning to rank are introduced (the pointwise, pairwise, and listwise approaches), the relationship between the loss functions used in these approaches and widely used IR evaluation measures is analyzed, and the performance of these approaches on the LETOR benchmark datasets is evaluated.

Abstract: This tutorial is concerned with a comprehensive introduction to the research area of learning to rank for information retrieval. In the first part of the tutorial, we will introduce three major approaches to learning to rank, i.e., the pointwise, pairwise, and listwise approaches, analyze the relationship between the loss functions used in these approaches and the widely-used IR evaluation measures, evaluate the performance of these approaches on the LETOR benchmark datasets, and demonstrate how to use these approaches to solve real ranking applications. In the second part of the tutorial, we will discuss some advanced topics regarding learning to rank, such as relational ranking, diverse ranking, semi-supervised ranking, transfer ranking, query-dependent ranking, and training data preprocessing. In the third part, we will briefly mention the recent advances on statistical learning theory for ranking, which explain the generalization ability and statistical consistency of different ranking methods. In the last part, we will conclude the tutorial and show several future research directions.

2,515 citations

Cites background or methods from "Optimizing search engines using cli..."

...2 Note that there are some algorithms, such as [68], which were also referred to as ordinal regression based algorithms in the literature....
[...]
...Ranking SVM [63, 68] uses SVM for the task of pairwise classification....
[...]
...• Ground truth mining [3, 68, 105], which targets automatically mining ground truth labels for learning to rank, mainly from click-through logs of search engines....
[...]
...Some work has been done along this direction [3, 68, 105], however, they also have certain limitations....
[...]
...6 This kind of judgment can also be mined from click-through logs of search engines [68, 69, 105]....
[...]

Book•

Foundations of Machine Learning

Mehryar Mohri, Afshin Rostamizadeh¹ ², Ameet Talwalkar¹ ²•Institutions (2)

University of California, Berkeley¹, New York University²

17 Aug 2012

TL;DR: This graduate-level textbook introduces fundamental concepts and methods in machine learning, provides the theoretical underpinnings of several important modern algorithms, and illustrates key aspects of their application.

Abstract: This graduate-level textbook introduces fundamental concepts and methods in machine learning. It describes several important modern algorithms, provides the theoretical underpinnings of these algorithms, and illustrates key aspects for their application. The authors aim to present novel theoretical tools and concepts while giving concise proofs even for relatively advanced topics. Foundations of Machine Learning fills the need for a general textbook that also offers theoretical details and an emphasis on proofs. Certain topics that are often treated with insufficient attention are discussed in more detail here; for example, entire chapters are devoted to regression, multi-class classification, and ranking. The first three chapters lay the theoretical foundation for what follows, but each remaining chapter is mostly self-contained. The appendix offers a concise probability review, a short introduction to convex optimization, tools for concentration bounds, and several basic properties of matrices and norms used in the book. The book is intended for graduate students and researchers in machine learning, statistics, and related areas; it can be used either as a textbook or as a reference text for a research seminar.

2,511 citations

Cites methods from "Optimizing search engines using cli..."

...MN is due to Taskar, Guestrin, and Koller [2003] and StructSVM was presented by Tsochantaridis, Joachims, Hofmann, and Altun [2005]. An alternative technique for tackling structured prediction as a regression problem was presented and analyzed by Cortes, Mohri, and Weston [2007c]....
[...]

Proceedings Article•DOI•

Factorization Machines

Steffen Rendle¹•Institutions (1)

University of Hildesheim¹

13 Dec 2010

TL;DR: Factorization Machines (FMs) are introduced, a new model class that combines the advantages of Support Vector Machines (SVMs) with factorization models and can mimic these models just by specifying the input data (i.e. the feature vectors).

Abstract: In this paper, we introduce Factorization Machines (FM) which are a new model class that combines the advantages of Support Vector Machines (SVM) with factorization models. Like SVMs, FMs are a general predictor working with any real valued feature vector. In contrast to SVMs, FMs model all interactions between variables using factorized parameters. Thus they are able to estimate interactions even in problems with huge sparsity (like recommender systems) where SVMs fail. We show that the model equation of FMs can be calculated in linear time and thus FMs can be optimized directly. So unlike nonlinear SVMs, a transformation in the dual form is not necessary and the model parameters can be estimated directly without the need of any support vector in the solution. We show the relationship to SVMs and the advantages of FMs for parameter estimation in sparse settings. On the other hand there are many different factorization models like matrix factorization, parallel factor analysis or specialized models like SVD++, PITF or FPMC. The drawback of these models is that they are not applicable for general prediction tasks but work only with special input data. Furthermore their model equations and optimization algorithms are derived individually for each task. We show that FMs can mimic these models just by specifying the input data (i.e. the feature vectors). This makes FMs easily applicable even for users without expert knowledge in factorization models.
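The linear-time claim can be illustrated with a short sketch. The abstract does not give the model equation, so the code assumes the usual degree-2 form: a bias `w0`, linear weights `w`, and pairwise interactions weighted by dot products of rows of a factor matrix `V`. The function names and the toy numbers are hypothetical.

```python
def fm_predict(x, w0, w, V):
    """Degree-2 factorization machine score in O(k * n) time.

    x: list of n feature values, w0: bias, w: n linear weights,
    V: n rows of k latent factors (list of lists).
    Uses the identity
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        score += 0.5 * (s * s - s_sq)
    return score


def fm_predict_naive(x, w0, w, V):
    """Same score via the explicit O(k * n^2) double sum, for checking."""
    n, k = len(x), len(V[0])
    score = w0 + sum(w[i] * x[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(V[i][f] * V[j][f] for f in range(k))
            score += dot * x[i] * x[j]
    return score


# Toy parameters, chosen only for illustration.
x = [1.0, 0.0, 2.0]
w0, w = 0.5, [0.1, 0.2, 0.3]
V = [[1.0, 2.0], [0.5, 0.5], [3.0, -1.0]]
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

Both functions compute the same value; the first avoids the double loop over feature pairs, and with sparse inputs only the non-zero entries of `x` contribute to each sum.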

2,460 citations

Cites background from "Optimizing search engines using cli..."

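...$\begin{cases} 1, & \text{if } \theta \text{ is } w_0 \\ x_i, & \text{if } \theta \text{ is } w_i \\ x_i \sum_{j=1}^{n} v_{j,f} x_j - v_{i,f} x_i^2, & \text{if } \theta \text{ is } v_{i,f} \end{cases}$ (4) The sum $\sum_{j=1}^{n} v_{j,f} x_j$ is independent of $i$ and thus can be precomputed (e.g. when computing $\hat{y}(x)$)....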
[...]
...Scoring functions can be learned with pairwise training data [5], where a feature tuple (x(A),x(B)) ∈ D means that x(A) should be ranked higher than x(B)....
[...]

Journal Article•

Large Margin Methods for Structured and Interdependent Output Variables

Ioannis Tsochantaridis, Thorsten Joachims¹, Thomas Hofmann¹, Yasemin Altun¹•Institutions (1)

Max Planck Society¹

01 Dec 2005-Journal of Machine Learning Research

TL;DR: This paper proposes to appropriately generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation and presents a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems.

Abstract: Learning general functional dependencies between arbitrary input and output spaces is one of the key challenges in computational intelligence. While recent progress in machine learning has mainly focused on designing flexible and powerful input representations, this paper addresses the complementary issue of designing classification algorithms that can deal with more complex outputs, such as trees, sequences, or sets. More generally, we consider problems involving multiple dependent output variables, structured output spaces, and classification problems with class attributes. In order to accomplish this, we propose to appropriately generalize the well-known notion of a separation margin and derive a corresponding maximum-margin formulation. While this leads to a quadratic program with a potentially prohibitive, i.e. exponential, number of constraints, we present a cutting plane algorithm that solves the optimization problem in polynomial time for a large class of problems. The proposed method has important applications in areas such as computational biology, natural language processing, information retrieval/extraction, and optical character recognition. Experiments from various domains involving different types of output spaces emphasize the breadth and generality of our approach.

2,292 citations

Cites background or methods from "Optimizing search engines using cli..."

...In contrast, we have proposed an efficient algorithm (Hofmann et al., 2002; Altun et al., 2003; Joachims, 2003) even in the case of very large output spaces, that takes advantage of the sparseness of the maximum-margin solution....
[...]
...The same is true also for other ranking algorithms (Cohen et al., 1999; Herbrich et al., 2000; Schapire and Singer, 2000; Crammer and Singer, 2002; Joachims, 2002)....
[...]

Proceedings Article•DOI•

Training linear SVMs in linear time

Thorsten Joachims¹•Institutions (1)

Cornell University¹

20 Aug 2006

TL;DR: A cutting-plane algorithm for training linear SVMs that provably has training time O(sn) for classification problems and O(sn log(n)) for ordinal regression problems, and is several orders of magnitude faster than decomposition methods like SVM-light for large datasets.

Abstract: Linear Support Vector Machines (SVMs) have become one of the most prominent machine learning techniques for high-dimensional sparse data commonly encountered in applications like text classification, word-sense disambiguation, and drug design. These applications involve a large number of examples n as well as a large number of features N, while each example has only s

2,173 citations

References

Book•

The Nature of Statistical Learning Theory

Vladimir Vapnik¹•Institutions (1)

Bell Labs¹

01 Jan 1995

TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations

Journal Article•DOI•

Support-Vector Networks

Corinna Cortes¹, Vladimir Vapnik¹•Institutions (1)

Bell Labs¹

15 Sep 1995-Machine Learning

TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

37,861 citations

"Optimizing search engines using cli..." refers background in this paper

...However, just like in classification SVMs [7], it is possible to approximate the solution by introducing (non-negative) slack variables ξi,j,k and minimizing the upper bound ∑ ξi,j,k. Adding SVM regularization for margin maximization to the objective leads to the following optimization problem, which is similar to the ordinal regression approach in [12]....
[...]
...However, just like in classification SVMs [7], it is possible to approximate the solution by introducing (non-negative) slack variables ξi,j,k and minimizing the upper bound ∑ ξi,j,k....
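The excerpts above stop before the optimization problem itself. As a hedged sketch, the standard soft-margin form for pairwise ranking constraints can be written as follows. The notation is an assumption rather than a quotation: $\Phi(q_k, d_i)$ denotes a feature map for query $q_k$ and document $d_i$, $r_k^{*}$ the set of observed preference pairs for query $k$, and $C$ the trade-off parameter between margin size and training error.

```latex
\begin{aligned}
\min_{\vec{w},\,\xi}\quad & \tfrac{1}{2}\,\vec{w}\cdot\vec{w} \;+\; C \sum_{i,j,k} \xi_{i,j,k} \\
\text{s.t.}\quad & \vec{w}\cdot\Phi(q_k, d_i) \;\ge\; \vec{w}\cdot\Phi(q_k, d_j) + 1 - \xi_{i,j,k}
  \qquad \forall (d_i, d_j) \in r_k^{*},\; k = 1,\dots,n \\
& \xi_{i,j,k} \;\ge\; 0 \qquad \forall i, j, k
\end{aligned}
```

Each constraint can be rearranged to $\vec{w}\cdot\bigl(\Phi(q_k, d_i) - \Phi(q_k, d_j)\bigr) \ge 1 - \xi_{i,j,k}$, so the problem has the same shape as a classification SVM trained on pairwise difference vectors.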
[...]

Statistical learning theory

Vladimir Vapnik

01 Jan 1998

TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

26,531 citations

"Optimizing search engines using cli..." refers background or methods in this paper

...This makes it possible to use Kernels [4][25] and extend the Ranking SVM algorithm to non-linear retrieval functions....
[...]
...Therefore, the following algorithm directly addresses (6), taking an empirical risk minimization approach [25]....
[...]
...Note that (6) is (proportional to) a risk functional [25] with −τ as the loss function....
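For readers unfamiliar with the quantity τ in the last excerpt, here is a minimal sketch assuming it denotes Kendall's τ between two tie-free rankings: the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs. The function name and the sample rankings are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings without ties.

    rank_a, rank_b: dicts mapping the same document ids to rank positions.
    Returns (P - Q) / (P + Q), where P counts pairs ordered the same way
    in both rankings and Q counts pairs ordered oppositely.
    """
    concordant = discordant = 0
    for d1, d2 in combinations(list(rank_a), 2):
        agreement = (rank_a[d1] - rank_a[d2]) * (rank_b[d1] - rank_b[d2])
        if agreement > 0:
            concordant += 1
        elif agreement < 0:
            discordant += 1
    total = concordant + discordant
    return (concordant - discordant) / total if total else 0.0


# Hypothetical target ranking and system ranking over three documents.
target = {"a": 1, "b": 2, "c": 3}
system = {"a": 1, "c": 2, "b": 3}
print(kendall_tau(target, system))
```

Because τ grows as the two rankings agree on more pairs, using −τ as a loss means that minimizing the loss maximizes pairwise agreement with the target ranking.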
[...]

Proceedings Article•DOI•

A training algorithm for optimal margin classifiers

Bernhard E. Boser¹, Isabelle Guyon², Vladimir Vapnik²•Institutions (2)

University of California, Berkeley¹, Bell Labs²

01 Jul 1992

TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented, applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions.

Abstract: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented. The technique is applicable to a wide variety of the classification functions, including Perceptrons, polynomials, and Radial Basis Functions. The effective number of parameters is adjusted automatically to match the complexity of the problem. The solution is expressed as a linear combination of supporting patterns. These are the subset of training patterns that are closest to the decision boundary. Bounds on the generalization performance based on the leave-one-out method and the VC-dimension are given. Experimental results on optical character recognition problems demonstrate the good generalization obtained when compared with other learning algorithms.

11,211 citations

Book•

Modern Information Retrieval

Ricardo Baeza-Yates, Berthier Ribeiro-Neto

15 May 1999

TL;DR: In this article, the authors present a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, which provides an up-to-date student oriented treatment of the subject.

Abstract: From the Publisher: This is a rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective. The advent of the Internet and the enormous increase in volume of electronically stored information generally has led to substantial work on IR from the computer science perspective - this book provides an up-to-date student oriented treatment of the subject.

9,923 citations