Optimizing search engines using clickthrough data

doi:10.1145/775047.775067

Proceedings Article

Thorsten Joachims

pp. 133-142

Abstract:

This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
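The abstract does not spell out how clicks are turned into training examples or how the SVM is solved, so the following is only a minimal sketch under stated assumptions. It assumes a common reading of clickthrough data: a clicked result is preferred over every unclicked result ranked above it. It then fits a linear scoring function to those pairs with a pairwise hinge loss, using plain subgradient descent as a stand-in for a real SVM solver. The function names, feature vectors, and hyperparameters are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]


def preference_pairs(ranking: Sequence[str], clicked: Set[str]) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs from one logged query.

    Assumed heuristic: each clicked document is preferred over every
    unclicked document that was ranked above it.
    """
    pairs: List[Tuple[str, str]] = []
    for pos, doc in enumerate(ranking):
        if doc in clicked:
            pairs.extend((doc, above) for above in ranking[:pos] if above not in clicked)
    return pairs


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(
    pairs: List[Tuple[str, str]],
    features: Dict[str, Vector],
    dim: int,
    epochs: int = 200,
    lr: float = 0.1,
    reg: float = 0.01,
) -> Vector:
    """Fit weights w so that w.x(preferred) exceeds w.x(less_preferred) by a margin.

    Minimizes an L2-regularized pairwise hinge loss by subgradient descent
    (an illustrative substitute for a quadratic-programming SVM solver).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [a - b for a, b in zip(features[better], features[worse])]
            violated = dot(w, diff) < 1.0  # margin constraint not yet satisfied
            for i in range(dim):
                grad = reg * w[i] - (diff[i] if violated else 0.0)
                w[i] -= lr * grad
    return w


# Hypothetical log entry: four results were shown, the user clicked only the third.
ranking = ["d1", "d2", "d3", "d4"]
clicked = {"d3"}
pairs = preference_pairs(ranking, clicked)

# Hypothetical two-dimensional query-document features.
features = {
    "d1": [1.0, 0.0],
    "d2": [0.8, 0.1],
    "d3": [0.2, 1.0],
    "d4": [0.1, 0.2],
}
w = train_pairwise(pairs, features, dim=2)
reranked = sorted(ranking, key=lambda d: dot(w, features[d]), reverse=True)
```

After training on these pairs, the learned weights score the clicked document above the two unclicked documents that were originally ranked ahead of it. Nothing is learned about d4, because under the assumed heuristic a click says nothing about results below it.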

Citations

Learning joint query interpretation and response ranking

Uma Sawant, +1 more

TL;DR: This work proposes two new, natural formulations for joint query interpretation and response ranking that exploit bidirectional flow of information between the knowledge base and the corpus, inspired by probabilistic language models and max-margin discriminative learning.

SoDeep: A Sorting Deep Net to Learn Ranking Loss Surrogates

Martin Engilberge, +3 more

TL;DR: The present work introduces a new method to learn approximations of such non-differentiable objective functions, based on a deep architecture that approximates the sorting of arbitrary sets of scores, that is trained virtually for free using synthetic data.

Multi-model Ontology-Based Hybrid Recommender System in E-learning Domain

Leyla Zhuhadar, +3 more

TL;DR: A multi-model ontology-based framework for semantic search of educational content in an E-learning repository of courses, lectures, multimedia resources, etc. is introduced and implemented on the HyperManyMedia platform.

Ranking related news predictions

Nattiya Kanhabua, +2 more

TL;DR: This paper proposes a new task, called ranking related news predictions, which addresses the problem of retrieving and ranking sentences that mention future events. It proposes a learning-to-rank approach based on four classes of features: term similarity, entity-based similarity, topic similarity, and temporal similarity.

Modeling coherence in ESOL learner texts

Helen Yannakoudakis, +1 more

TL;DR: This work presents the first systematic analysis of several methods for assessing coherence under the framework of automated assessment (AA) of learner free-text responses, and examines the predictive power of different coherence models by measuring the effect on performance when combined with an AA system that achieves competitive results.
