Proceedings Article

Optimizing search engines using clickthrough data

Abstract
This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
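The idea of training a ranking function from clicks can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not spell out: a clicked result is read as preferred over every unclicked result ranked above it, and a linear scoring function is fitted to those pairs with a hinge loss by plain subgradient descent. The names `preferences_from_clicks` and `train_pairwise_linear`, the toy feature vectors, and the hyperparameters are all hypothetical; this is not the paper's solver.

```python
from typing import Dict, List, Sequence, Set, Tuple


def preferences_from_clicks(
    ranking: Sequence[str], clicked: Set[str]
) -> List[Tuple[str, str]]:
    """Return (preferred, less_preferred) pairs.

    Assumption: a clicked result is preferred over every unclicked
    result that was presented above it in the ranking.
    """
    pairs = []
    for i, doc in enumerate(ranking):
        if doc not in clicked:
            continue
        for skipped in ranking[:i]:
            if skipped not in clicked:
                pairs.append((doc, skipped))
    return pairs


def train_pairwise_linear(
    pairs: List[Tuple[str, str]],
    features: Dict[str, List[float]],
    dim: int,
    c: float = 1.0,
    epochs: int = 200,
    lr: float = 0.01,
) -> List[float]:
    """Subgradient descent on 0.5*||w||^2 + c * sum(hinge) over preference pairs.

    Each pair contributes max(0, 1 - w . (x_preferred - x_other)).
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)  # gradient of the regularisation term
        for pref, other in pairs:
            diff = [a - b for a, b in zip(features[pref], features[other])]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin < 1.0:  # pair violates the margin
                grad = [g - c * d for g, d in zip(grad, diff)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Toy usage: four results, the user clicked only the third one.
ranking = ["d1", "d2", "d3", "d4"]
pairs = preferences_from_clicks(ranking, {"d3"})

# Hypothetical two-dimensional query-document features.
features = {
    "d1": [1.0, 0.0],
    "d2": [0.8, 0.1],
    "d3": [0.2, 1.0],
    "d4": [0.1, 0.2],
}
w = train_pairwise_linear(pairs, features, dim=2)


def score(doc: str) -> float:
    """Linear retrieval score under the learned weights."""
    return sum(wi * xi for wi, xi in zip(w, features[doc]))
```

Because only relative preferences are extracted, the sketch never needs an absolute relevance label from an expert, which is the cost advantage the abstract points to.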


Citations
Patent

Systems and methods to tune a general-purpose search engine for a search entry point

TL;DR: Statistical filtering and ranking techniques are employed to improve the results of a content search engine by tuning a general-purpose search engine for an entry point for a group of users.
Patent

Selection of advertisements to present on a web page or other destination based on search activities of users who selected the destination

TL;DR: A tracking system passively tracks and records searches conducted by actual search engine users; the collected data is aggregated and analyzed to generate data regarding the search queries used to locate and access particular destinations (e.g., web pages and sites).
Proceedings Article

An Analysis of Factors Used in Search Engine Ranking

TL;DR: This paper investigates the influence of different page features on the ranking of search engine results and reformulates the problem of learning the underlying, hidden scores as a binary classification problem, applying both linear and non-linear methods.
Proceedings Article

ViewSer: enabling large-scale remote user studies of web search examination and interaction

TL;DR: This work introduces ViewSer, a novel methodology for performing web search examination studies remotely, at scale, and without requiring eye-tracking equipment, and explores applications of ViewSer to practical search tasks, such as analyzing the search result summary (snippet) attractiveness, result re-ranking, and evaluating snippet quality.
Journal Article

Search Engines that Learn from Their Users

TL;DR: A new online evaluation paradigm called multileaving, which extends interleaving, is introduced; with it, fewer users need to be exposed to possibly inferior search engines, as they adapt more quickly to changes in user preferences.
References
Book

The Nature of Statistical Learning Theory

TL;DR: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?
Journal Article

Support-Vector Networks

TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.

Statistical learning theory

TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.
Proceedings Article

A training algorithm for optimal margin classifiers

TL;DR: A training algorithm that maximizes the margin between the training patterns and the decision boundary is presented, applicable to a wide variety of classification functions, including Perceptrons, polynomials, and Radial Basis Functions.
Book

Modern Information Retrieval

TL;DR: A rigorous and complete textbook for a first course on information retrieval from the computer science (as opposed to a user-centred) perspective, providing an up-to-date, student-oriented treatment of the subject.