Proceedings ArticleDOI
Optimizing search engines using clickthrough data
Thorsten Joachims
- pp 133-142
TL;DR: The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking.
Abstract:
This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
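The idea of training a retrieval function from clicks can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not state how preferences are extracted or how the SVM is solved, so both are assumptions here. The sketch assumes that a clicked document is preferred over every non-clicked document shown above it, and it replaces a real SVM solver with plain subgradient descent on a regularized pairwise hinge loss. All function names, feature vectors, and hyperparameters are hypothetical.

```python
import random


def preference_pairs(ranking, clicked):
    """Turn one logged query into (preferred, less_preferred) document pairs.

    Assumed heuristic: a clicked document is preferred over every
    non-clicked document that was presented above it.
    """
    pairs = []
    for pos, doc in enumerate(ranking):
        if doc in clicked:
            pairs.extend(
                (doc, above) for above in ranking[:pos] if above not in clicked
            )
    return pairs


def train_pairwise_linear(pairs, features, dim,
                          epochs=200, lr=0.1, reg=0.01, seed=0):
    """Learn weights w so that w.x(preferred) exceeds w.x(less_preferred).

    Stand-in for an SVM solver: subgradient descent on
    reg/2 * |w|^2 + sum over pairs of max(0, 1 - w.(x_better - x_worse)).
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for better, worse in pairs:
            diff = [a - b for a, b in zip(features[better], features[worse])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            for k in range(dim):
                # The hinge term contributes only while the margin is violated.
                hinge_grad = -diff[k] if margin < 1.0 else 0.0
                w[k] -= lr * (reg * w[k] + hinge_grad)
    return w


def score(w, x):
    """Linear retrieval score used to sort documents for a query."""
    return sum(wi * xi for wi, xi in zip(w, x))


if __name__ == "__main__":
    # Toy log entry: four results were shown and the user clicked the third.
    ranking = ["d1", "d2", "d3", "d4"]
    clicked = {"d3"}
    # Hypothetical two-dimensional query-document features.
    features = {
        "d1": [1.0, 0.0],
        "d2": [0.8, 0.1],
        "d3": [0.2, 1.0],
        "d4": [0.1, 0.2],
    }
    pairs = preference_pairs(ranking, clicked)
    w = train_pairwise_linear(pairs, features, dim=2)
    reranked = sorted(ranking, key=lambda d: score(w, features[d]), reverse=True)
    print(pairs, reranked)
```

Because only relative preferences are used, the sketch never needs absolute relevance labels, which is what makes cheap click logs usable as training data. A production system would use a dedicated SVM solver and features computed per query.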
Citations
Proceedings ArticleDOI
Structured learning for non-smooth ranking losses
TL;DR: This paper proposes new, almost-linear-time algorithms to optimize for two other criteria widely used to evaluate search systems: MRR (mean reciprocal rank) and NDCG (normalized discounted cumulative gain) in the max-margin structured learning framework.
Proceedings ArticleDOI
A probabilistic method for inferring preferences from clicks
TL;DR: This paper derives an unbiased estimator of comparison outcomes and shows how marginalizing over possible comparison outcomes given the observed click data can make this estimator even more effective.
Proceedings ArticleDOI
Towards context-aware search by learning a very large variable length hidden markov model from search logs
TL;DR: A strategy for parameter initialization in vlHMM learning is developed, which can greatly reduce the number of parameters to be estimated in practice, and a method for distributed vlHMM learning under the map-reduce model is devised.
Proceedings ArticleDOI
Probabilistic latent preference analysis for collaborative filtering
Nathan Liu, Min Zhao, Qiang Yang +2 more
TL;DR: The probabilistic latent preference analysis (pLPA) model for ranking predictions is proposed by directly modeling user preferences with respect to a set of items rather than the rating scores on individual items.
Proceedings ArticleDOI
Ranking Approaches for Microblog Search
TL;DR: This paper describes several new strategies for ranking microblogs in a real-time search engine and develops a framework to obtain such validation data, as well as evaluation measures to assess the accuracy of the proposed ranking strategies.
References
Book
The Nature of Statistical Learning Theory
TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Journal ArticleDOI
Support-Vector Networks
Corinna Cortes, Vladimir Vapnik
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Statistical learning theory
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.
Proceedings ArticleDOI
A training algorithm for optimal margin classifiers
TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented; it is applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions.
Book
Modern Information Retrieval
TL;DR: A rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, providing an up-to-date, student-oriented treatment of the subject.