Showing papers on "Ranking (information retrieval)" published in 2002


Proceedings ArticleDOI
23 Jul 2002
TL;DR: The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking.
Abstract: This paper presents an approach to automatically optimizing the retrieval quality of search engines using clickthrough data. Intuitively, a good information retrieval system should present relevant documents high in the ranking, with less relevant documents following below. While previous approaches to learning retrieval functions from examples exist, they typically require training data generated from relevance judgments by experts. This makes them difficult and expensive to apply. The goal of this paper is to develop a method that utilizes clickthrough data for training, namely the query-log of the search engine in connection with the log of links the users clicked on in the presented ranking. Such clickthrough data is available in abundance and can be recorded at very low cost. Taking a Support Vector Machine (SVM) approach, this paper presents a method for learning retrieval functions. From a theoretical perspective, this method is shown to be well-founded in a risk minimization framework. Furthermore, it is shown to be feasible even for large sets of queries and features. The theoretical results are verified in a controlled experiment. It shows that the method can effectively adapt the retrieval function of a meta-search engine to a particular group of users, outperforming Google in terms of retrieval quality after only a couple of hundred training examples.
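
The core of the method can be sketched compactly: each click generates a preference for the clicked result over the unclicked results ranked above it, and a linear SVM is trained on the feature differences of those pairs. The log format, feature extraction, and use of scikit-learn's LinearSVC below are illustrative assumptions, not the paper's own implementation:

```python
# A minimal sketch of learning a linear ranking function from clickthrough
# data. Joachims-style preferences: a clicked document is preferred over
# every unclicked document ranked above it.
import numpy as np
from sklearn.svm import LinearSVC

def preferences_from_clicks(features, clicked):
    """features[i] = feature vector of the result at rank i; clicked = set of ranks."""
    pairs = []
    for i in clicked:
        for j in range(i):                  # results ranked above the click
            if j not in clicked:
                pairs.append((features[i], features[j]))
    return pairs

def train_ranking_svm(sessions):
    """Train on difference vectors so that w . (preferred - skipped) > 0."""
    X, y = [], []
    for features, clicked in sessions:
        for preferred, skipped in preferences_from_clicks(features, clicked):
            X.append(preferred - skipped)
            y.append(1)
            X.append(skipped - preferred)   # mirrored pair for a 2-class SVM
            y.append(-1)
    clf = LinearSVC(C=1.0).fit(np.array(X), np.array(y))
    return clf.coef_.ravel()                # the learned ranking weight vector

# Toy usage: one query, four results, the result at rank 2 was clicked.
features = np.random.rand(4, 5)
w = train_ranking_svm([(features, {2})])
print(np.argsort(-(features @ w)))          # re-rank by learned scores
```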

4,453 citations


Proceedings ArticleDOI
07 May 2002
TL;DR: A set of PageRank vectors are proposed, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic, and are shown to generate more accurate rankings than with a single, generic PageRank vector.
Abstract: In the original PageRank algorithm for improving the ranking of search-query results, a single PageRank vector is computed, using the link structure of the Web, to capture the relative "importance" of Web pages, independent of any particular search query. To yield more accurate search results, we propose computing a set of PageRank vectors, biased using a set of representative topics, to capture more accurately the notion of importance with respect to a particular topic. By using these (precomputed) biased PageRank vectors to generate query-specific importance scores for pages at query time, we show that we can generate more accurate rankings than with a single, generic PageRank vector. For ordinary keyword search queries, we compute the topic-sensitive PageRank scores for pages satisfying the query using the topic of the query keywords. For searches done in context (e.g., when the search query is performed by highlighting words in a Web page), we compute the topic-sensitive PageRank scores using the topic of the context in which the query appeared.
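
The biasing step can be illustrated with a short power-iteration sketch in which the teleport (random-jump) distribution is concentrated on the pages of one representative topic. The graph encoding, damping factor, and dangling-page handling are illustrative assumptions:

```python
# A minimal sketch of topic-biased PageRank: the random jump lands only on
# pages belonging to the representative topic, yielding one importance
# vector per topic.
import numpy as np

def topic_sensitive_pagerank(adj, topic_pages, damping=0.85, iters=100):
    """adj[i] = pages that page i links to; topic_pages = ids of topic pages."""
    n = len(adj)
    v = np.zeros(n)                              # topic-biased teleport vector
    v[list(topic_pages)] = 1.0 / len(topic_pages)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        nxt = (1 - damping) * v
        for i, out in enumerate(adj):
            if out:
                nxt[np.array(out)] += damping * r[i] / len(out)
            else:
                nxt += damping * r[i] * v        # dangling page: jump by the bias
        r = nxt
    return r

# Toy 4-page web; pages {0, 1} form the representative topic set.
adj = [[1, 2], [2], [0], [2]]
print(topic_sensitive_pagerank(adj, {0, 1}))
```

At query time, several such precomputed vectors would be combined, weighted by the estimated probability of each topic given the query keywords or the query's context.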

1,765 citations


Patent
01 Apr 2002
TL;DR: In this paper, a modular intelligent personal agent system is presented for search, navigation, control, retrieval, analysis, and results reporting on networks and databases, where hypertext documents and associated content media are displayed as symbol or thumbnail web documents as nodes with connector lines representing links between the documents.
Abstract: A modular intelligent personal agent system is presented for search, navigation, control, retrieval, analysis, and results reporting on networks and databases. A client-side or server-side software application retrieves and interprets hypertext documents executing a search algorithm, and the search results are displayed in alternate three-dimensional and two-dimensional graphical visualization formats. Hypertext documents and associated content media are displayed as symbol or thumbnail web documents, as nodes with connector lines representing links between the documents. Nodes and connector lines are color-coded in symbol form for the user according to the truth of search terms, numeric data tested in hypertext documents, domain type, link density, and metric counts. Different symbols represent search and Boolean evaluation status and document type, and thumbnails represent whole or incremental portions of the document pages or types of documents found. The three-dimensional displayed nodes are visually navigated based on recency, chronology of discovery, and metric information values. The result of searches performed by the system can retrieve user-selected documents from a network and automatically format results of the search and content retrieval using a plurality of ranking methods. The system provides alerts and content delivery to users using email, instant messaging, and audio. Multiple agents can operate to accomplish complex tasks a singular agent cannot. Agents are deployed

596 citations


Patent
10 Dec 2002
TL;DR: In this article, the authors propose a search engine that deals with the problems of synonymy, polysemy, and retrieval by concept by allowing for a wide margin of uncertainty in the initial choice of keywords in a query.
Abstract: An information retrieval system that deals with the problems of synonymy, polysemy, and retrieval by concept by allowing for a wide margin of uncertainty in the initial choice of keywords in a query. For each input query vector and an information matrix, the disclosed system solves an optimization problem which maximizes the stability of a solution at a given level of misfit. The disclosed system may include a decomposition of the information matrix in terms of orthogonal basis functions. Each basis encodes groups of conceptually related keywords. The bases are arranged in order of decreasing statistical relevance to a query. The disclosed search engine approximates the input query with a weighted sum of the first few bases. Commercial applications other than the disclosed search engine can also be built on the disclosed techniques.
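
The abstract does not spell out the optimization, but one common way to realize "orthogonal bases that encode groups of conceptually related keywords" is a truncated SVD of the term-document (information) matrix, with the query approximated by a weighted sum of the leading bases. The sketch below is an LSI-style stand-in under that assumption, not necessarily the patented method:

```python
# A hedged sketch: approximate the query in the subspace spanned by the
# first k orthogonal bases of the term-document matrix, then score documents
# against the smoothed query.
import numpy as np

def rank_documents(A, q, k=2):
    """A: terms x documents matrix; q: query term vector; k: bases kept."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k]                  # orthogonal bases over the keyword space
    q_hat = Uk @ (Uk.T @ q)        # query as a weighted sum of the first bases
    scores = A.T @ q_hat           # documents scored against the smoothed query
    return np.argsort(-scores)

A = np.array([[1., 0., 1.], [1., 1., 0.], [0., 1., 1.], [0., 0., 1.]])
q = np.array([1., 1., 0., 0.])
print(rank_documents(A, q))        # documents in descending relevance order
```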

545 citations


Journal ArticleDOI
TL;DR: This work proposes ranking fuzzy numbers with the area between the centroid point and original point to overcome shortcomings in the coefficient of variation (CV index).
Abstract: To improve the ranking method of Lee and Li [1], Cheng [2] proposed the coefficient of variation (CV index). Shortcomings are also found in the CV index. Cheng [2] also proposed the distance method to improve the ranking method of Murakami et al. However, the distance method is not sound either. Moreover, the CV index contradicts the distance method in ranking some fuzzy numbers. Therefore, to overcome the above shortcomings, we propose ranking fuzzy numbers with the area between the centroid point and original point.
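
For triangular fuzzy numbers the proposed index can be read as the area spanned between the centroid point (x̄, ȳ) and the origin. The sketch below uses the standard centroid formulas for a triangular membership function (a, b, c); treat it as an illustrative reading of the method rather than a verbatim transcription:

```python
# A minimal sketch: rank triangular fuzzy numbers by the area x_bar * y_bar
# between their centroid point and the origin (larger area => larger number).
def centroid_area(a, b, c):
    x_bar = (a + b + c) / 3.0   # centroid abscissa of the triangle
    y_bar = 1.0 / 3.0           # centroid ordinate of a normal triangular number
    return x_bar * y_bar

fuzzy = {"A": (1, 2, 3), "B": (2, 3, 4), "C": (1.5, 3, 4.5)}
print(sorted(fuzzy, key=lambda k: centroid_area(*fuzzy[k]), reverse=True))
```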

512 citations


Proceedings ArticleDOI
07 May 2002
TL;DR: The experimental results show that the log-based probabilistic query expansion method can greatly improve the search performance and has several advantages over other existing methods.
Abstract: Query expansion has long been suggested as an effective way to resolve the short query and word mismatching problems. A number of query expansion methods have been proposed in traditional information retrieval. However, these previous methods do not take into account the specific characteristics of web searching; in particular, the availability of a large amount of user interaction information recorded in the web query logs. In this study, we propose a new method for query expansion based on query logs. The central idea is to extract probabilistic correlations between query terms and document terms by analyzing query logs. These correlations are then used to select high-quality expansion terms for new queries. The experimental results show that our log-based probabilistic query expansion method can greatly improve the search performance and has several advantages over other existing methods.
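
The central idea lends itself to a compact sketch: count how often document terms co-occur with query terms through click sessions, estimate P(document term | query term), and expand new queries with the top correlated terms. The log format and the simple conditional-probability estimator are assumptions for illustration:

```python
# A minimal sketch of log-based probabilistic query expansion.
from collections import Counter, defaultdict

def term_correlations(click_log):
    """click_log: iterable of (query_terms, clicked_document_terms) pairs."""
    co = defaultdict(Counter)   # co[q_term][d_term] = joint session count
    qf = Counter()              # sessions containing each query term
    for q_terms, d_terms in click_log:
        for qt in set(q_terms):
            qf[qt] += 1
            co[qt].update(set(d_terms))
    # P(d_term | q_term) estimated from co-occurrence through clicks
    return {qt: {dt: n / qf[qt] for dt, n in cnt.items()} for qt, cnt in co.items()}

def expand(query_terms, corr, k=3):
    scores = Counter()
    for qt in query_terms:
        for dt, p in corr.get(qt, {}).items():
            if dt not in query_terms:
                scores[dt] += p
    return list(query_terms) + [t for t, _ in scores.most_common(k)]

log = [(["jaguar"], ["jaguar", "car", "dealer"]),
       (["jaguar", "cat"], ["jaguar", "wildlife", "habitat"])]
print(expand(["jaguar"], term_correlations(log)))
```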

495 citations


Journal Article
TL;DR: A new query clustering method that makes use of user logs, which make it possible to identify the documents users have selected for a query; experiments show that a combination of both keywords and user logs is better than using either method alone.

Abstract: Query clustering is a process used to discover frequently asked questions or most popular topics on a search engine. This process is crucial for search engines based on question-answering. Because of the short lengths of queries, approaches based on keywords are not suitable for query clustering. This paper describes a new query clustering method that makes use of user logs, which allow us to identify the documents the users have selected for a query. The similarity between two queries may be deduced from the common documents the users selected for them. Our experiments show that a combination of both keywords and user logs is better than using either method alone.
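
The combined similarity can be sketched in a few lines: compare queries both by shared keywords and by the overlap of the documents users clicked for them. The Jaccard measure and the 50/50 weighting below are illustrative choices, not necessarily the paper's:

```python
# A minimal sketch of query similarity combining keywords with user logs.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def query_similarity(q1_terms, q1_clicks, q2_terms, q2_clicks, alpha=0.5):
    """alpha weights keyword similarity against clicked-document similarity."""
    return alpha * jaccard(q1_terms, q2_terms) + (1 - alpha) * jaccard(q1_clicks, q2_clicks)

# Two differently worded queries whose users clicked the same documents:
print(query_similarity(["atomic", "bomb"], ["doc7", "doc9"],
                       ["manhattan", "project"], ["doc7", "doc9"]))
```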

486 citations


Book ChapterDOI
01 Jan 2002
TL;DR: A broad and diverse group of experimental results is presented to demonstrate that the algorithms are effective, efficient, robust, and scalable.
Abstract: A multi-database model of distributed information retrieval is presented, in which people are assumed to have access to many searchable text databases. In such an environment, full-text information retrieval consists of discovering database contents, ranking databases by their expected ability to satisfy the query, searching a small number of databases, and merging results returned by different databases. This paper presents algorithms for each task. It also discusses how to reorganize conventional test collections into multi-database testbeds, and evaluation methodologies for multi-database experiments. A broad and diverse group of experimental results is presented to demonstrate that the algorithms are effective, efficient, robust, and scalable.

450 citations


Journal ArticleDOI
TL;DR: In this article, techniques that have been proposed to tackle several underlying challenges for building a good metasearch engine are surveyed.
Abstract: Frequently a user's information needs are stored in the databases of multiple search engines. It is inconvenient and inefficient for an ordinary user to invoke multiple search engines and identify useful documents from the returned results. To support unified access to multiple search engines, a metasearch engine can be constructed. When a metasearch engine receives a query from a user, it invokes the underlying search engines to retrieve useful information for the user. Metasearch engines have other benefits as a search tool such as increasing the search coverage of the Web and improving the scalability of the search. In this article, we survey techniques that have been proposed to tackle several underlying challenges for building a good metasearch engine. Among the main challenges, the database selection problem is to identify search engines that are likely to return useful documents to a given query. The document selection problem is to determine what documents to retrieve from each identified search engine. The result merging problem is to combine the documents returned from multiple search engines. We will also point out some problems that need to be further researched.

443 citations


Journal ArticleDOI
TL;DR: The strategies that have evolved to deal with the problem of matching material and process attributes to design requirements are reviewed, along with the progress that has been made and the challenges that remain.

371 citations


Proceedings ArticleDOI
04 Nov 2002
TL;DR: A graph-theoretic analysis is applied to one of the two major classes of voting procedures from Social Choice Theory, the Condorcet procedure, and yields a sorting-based algorithm that performs very well on TREC data, often outperforming existing metasearch algorithms whether or not relevance scores and training data are available.
Abstract: We present a new algorithm for improving retrieval results by combining document ranking functions: Condorcet-fuse. Beginning with one of the two major classes of voting procedures from Social Choice Theory, the Condorcet procedure, we apply a graph-theoretic analysis that yields a sorting-based algorithm that is elegant, efficient, and effective. The algorithm performs very well on TREC data, often outperforming existing metasearch algorithms whether or not relevance scores and training data are available. Condorcet-fuse significantly outperforms Borda-fuse, the analogous representative from the other major class of voting algorithms.
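
The algorithm reduces to a sort with a pairwise-majority comparator: document d1 precedes d2 if a majority of the input rankings place d1 above d2. In this sketch, documents absent from a ranking are treated as ranked below all present ones, and ties go to the second document; both are simplifying assumptions:

```python
# A minimal sketch of Condorcet-fuse: sorting by the pairwise majority vote.
from functools import cmp_to_key

def condorcet_fuse(rankings):
    """rankings: ranked doc-id lists from the input retrieval systems."""
    pos = [{d: i for i, d in enumerate(r)} for r in rankings]
    docs = {d for r in rankings for d in r}

    def majority(d1, d2):
        wins = 0
        for p in pos:
            r1 = p.get(d1, len(p))           # absent => worst position
            r2 = p.get(d2, len(p))
            wins += 1 if r1 < r2 else -1
        return -1 if wins > 0 else 1         # d1 first iff it wins the vote

    return sorted(docs, key=cmp_to_key(majority))

print(condorcet_fuse([["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]]))
# -> ['a', 'b', 'c']: 'a' beats 'b' and 'c' in a majority of the rankings
```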

Proceedings ArticleDOI
S. Muthukrishnan
06 Jan 2002
TL;DR: This paper considers document retrieval problems that are motivated by online query processing in databases, Information Retrieval systems and Computational Biology, and provides the first known optimal algorithm for the document listing problem.
Abstract: We are given a collection D of text documents d1,…,dk, with ∑i |di| = n, which may be preprocessed. In the document listing problem, we are given an online query comprising a pattern string p of length m, and our goal is to return the set of all documents that contain one or more copies of p. In the closely related occurrence listing problem, we output the set of all positions within the documents where pattern p occurs. In 1973, Weiner [24] presented an algorithm with O(n) time and space preprocessing, following which the occurrence listing problem can be solved in time O(m + output), where output is the number of positions where p occurs; this algorithm is clearly optimal. In contrast, no optimal algorithm is known for the closely related document listing problem, which is perhaps more natural and certainly well-motivated. We provide the first known optimal algorithm for the document listing problem. More generally, we initiate the study of pattern matching problems that require retrieving documents matched by the patterns; this contrasts with pattern matching problems that have been studied more frequently, namely, those that involve retrieving all occurrences of patterns. We consider document retrieval problems that are motivated by online query processing in databases, Information Retrieval systems and Computational Biology. We present very efficient (optimal) algorithms for our document retrieval problems. Our approach for solving such problems involves performing "local" encodings whereby they are reduced to range query problems on geometric objects --- points and lines --- that have color. We present improved algorithms for these colored range query problems that arise in our reductions using the structural properties of strings. This approach is quite general and yields simple, efficient, implementable algorithms for all the document retrieval problems in this paper.
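
The chain-array idea behind the optimal algorithm can be sketched directly: over the suffix array of the concatenated documents, C[i] stores the previous suffix-array position belonging to the same document, and the distinct documents for a pattern with suffix-array range [l, r] are exactly the positions with C[i] < l, recovered output-sensitively by recursive range-minimum queries. The quadratic suffix-array construction and the linear scan for the pattern range below are toy stand-ins for the paper's linear-time machinery:

```python
# A minimal sketch of document listing via the chain array C and RMQs.
def build(docs, sep="\x00"):
    text = sep.join(docs) + sep
    doc_of, d = [], 0
    for ch in text:
        doc_of.append(d)
        if ch == sep:
            d += 1
    sa = sorted(range(len(text)), key=lambda i: text[i:])   # toy O(n^2 log n) build
    prev_seen, C = {}, []
    for i in sa:                     # C in suffix-array order
        C.append(prev_seen.get(doc_of[i], -1))
        prev_seen[doc_of[i]] = len(C) - 1
    return text, sa, C, doc_of

def list_documents(docs, pattern):
    text, sa, C, doc_of = build(docs)
    hits = [k for k, i in enumerate(sa) if text[i:i + len(pattern)] == pattern]
    if not hits:
        return set()
    l, r = hits[0], hits[-1]         # pattern's (contiguous) suffix-array range
    out = set()

    def report(lo, hi):              # recursive range-minimum query over C
        if lo > hi:
            return
        k = min(range(lo, hi + 1), key=C.__getitem__)
        if C[k] >= l:                # no first occurrence of any doc in here
            return
        out.add(doc_of[sa[k]])       # first occurrence of this doc in [l, r]
        report(lo, k - 1)
        report(k + 1, hi)

    report(l, r)
    return out

print(list_documents(["abab", "bbb", "aab"], "ab"))   # -> {0, 2}
```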

Patent
25 Apr 2002
TL;DR: In this article, a spoken query is represented as a lattice indicating possible sequential combinations of words in the spoken query, and the lattice is converted to a query certainty vector.
Abstract: A system and method indexes and retrieves documents stored in a database. A document feature vector is extracted for each document to be indexed. The feature vector is projected to a low dimension document feature vector, and the documents are indexed according to the low dimension document feature vectors. A spoken query is represented as a lattice indicating possible sequential combinations of words in the spoken query. The lattice is converted to a query certainty vector, which is also projected to a low dimension query certainty vector. The low dimension query vector is compared to each of the low dimension document feature vectors to retrieve a matching result set of documents.

Patent
16 Apr 2002
TL;DR: In this paper, the content of the messages, the recipient's address book, and parameters such as desired keywords and undesired keywords are employed in determining a numeric ranking for each message.
Abstract: Electronic messages are processed based on criteria relating to the sender, the content, and the personalization of the message. The content of the messages, the recipient's address book, and parameters such as desired keywords and undesired keywords are employed in determining a numeric ranking for each message. Based upon the ranking, messages are assigned to a category indicating an expected response of the recipient, such as read, reply, and save, or simply read. Messages in the lowest category (spam) are marked for deletion. Fuzzy logic is preferably applied in determining the category to which a message is assigned based on its ranking. Content importance, sender importance, and degree of personalization are combined in a non-linear manner to rank a message. Based on the recipient's actual response to a message, the priority of subsequent similar messages is adjusted to more accurately assign the messages to a category.

Journal ArticleDOI
TL;DR: It is argued that evaluation methods should credit IR methods for their ability to retrieve highly relevant documents, and a novel application of P-R curves and average precision computations based on separate recall bases for documents of different degrees of relevance is proposed.
Abstract: This article proposes evaluation methods based on the use of nondichotomous relevance judgements in IR experiments. It is argued that evaluation methods should credit IR methods for their ability to retrieve highly relevant documents. This is desirable from the user point of view in modern large IR environments. The proposed methods are (1) a novel application of P-R curves and average precision computations based on separate recall bases for documents of different degrees of relevance, and (2) generalized recall and precision based directly on multiple-grade relevance assessments (i.e., not dichotomizing the assessments). We demonstrate the use of the traditional and the novel evaluation measures in a case study on the effectiveness of query types, based on combinations of query structures and expansion, in retrieving documents of various degrees of relevance. The test was run with a best-match retrieval system (InQuery) in a text database consisting of newspaper articles. To gain insight into the retrieval process, one should use both graded relevance assessments and effectiveness measures that enable one to observe the differences, if any, between retrieval methods in retrieving documents of different levels of relevance. In times of information overload, one should pay particular attention to the capability of retrieval methods to retrieve highly relevant documents.
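
The second proposed measure is easy to state concretely: precision and recall computed directly on the relevance grades instead of on binary judgments. The normalization of grades to [0, 1] below is an assumption; the article's exact scaling may differ:

```python
# A minimal sketch of generalized precision and recall over graded judgments.
def generalized_precision(retrieved, grades):
    """Mean relevance grade of the retrieved documents."""
    return sum(grades.get(d, 0.0) for d in retrieved) / len(retrieved)

def generalized_recall(retrieved, grades):
    """Share of the total relevance-grade mass that was retrieved."""
    total = sum(grades.values())
    return sum(grades.get(d, 0.0) for d in retrieved) / total if total else 0.0

# 1.0 = highly relevant, 0.5 = fairly relevant, 0.25 = marginally relevant
grades = {"d1": 1.0, "d2": 0.5, "d3": 0.25, "d4": 1.0}
run = ["d1", "d5", "d2"]             # ranked list returned by a system
print(generalized_precision(run, grades), generalized_recall(run, grades))
```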

Patent
Michael E. Barrett, Alan Levin
15 Jan 2002
TL;DR: In this article, a method of organizing information in which the search activities of previous users is monitored and such activity is used to organize information for future users is presented, where user activities are monitored from a time and use based perspective to insure more relevant results can be provided in response to a user's search for information.
Abstract: A method of organizing information in which the search activities of previous users is monitored and such activity is used to organize information for future users. The user activities are monitored from a time and use based perspective to insure more relevant results can be provided in response to a user's search for information.

Patent
08 Aug 2002
TL;DR: In this paper, a categorization engine classifies incoming documents to topics and then assigns each document to a topic using confidence scores expressing how confident the algorithm is in this assignment, and the confidence score is compared to the topic's (configurable) threshold.
Abstract: Automatic classification is applied in two stages: classification and ranking. In the first stage, a categorization engine classifies incoming documents to topics. A document may be classified to a single topic or multiple topics or no topics. For each topic, a raw score is generated for a document and that raw score is used to determine whether the document should be at least preliminarily classified to the topic. In the second stage, for each document assigned to a topic (i.e., for each document-topic association) the categorization engine generates confidence scores expressing how confident the algorithm is in this assignment. The confidence score of the assigned document is compared to the topic's (configurable) threshold. If the confidence score is higher than this configurable threshold, the document is placed in the topic's Published list. If not, the document is placed in the topic's Proposed list, where it awaits approval by a knowledge management expert. By modifying a topic's threshold, a knowledge management expert can advantageously control the tradeoff between human oversight and control vs. time and human effort expended.
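
The second-stage routing is simple to sketch: a document-topic assignment whose confidence clears the topic's configurable threshold is published automatically; otherwise it is queued for expert review. Names and thresholds below are placeholders:

```python
# A minimal sketch of threshold-based routing into Published/Proposed lists.
def route_documents(assignments, thresholds, default=0.5):
    """assignments: (doc_id, topic, confidence) triples; thresholds: topic -> float."""
    published, proposed = {}, {}
    for doc, topic, conf in assignments:
        target = published if conf >= thresholds.get(topic, default) else proposed
        target.setdefault(topic, []).append(doc)
    return published, proposed

pub, prop = route_documents(
    [("d1", "security", 0.91), ("d2", "security", 0.40), ("d3", "hr", 0.75)],
    {"security": 0.8, "hr": 0.6},    # raising a threshold shifts docs to review
)
print(pub)   # {'security': ['d1'], 'hr': ['d3']}
print(prop)  # {'security': ['d2']}
```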

Proceedings ArticleDOI
04 Nov 2002
TL;DR: An approach to retrieval of documents that contain both free text and semantically enriched markup, in which both documents and queries can be marked up with statements in the DAML+OIL semantic web language, is described.
Abstract: We describe an approach to retrieval of documents that contain both free text and semantically enriched markup. In particular, we present the design and prototype implementation of a framework in which both documents and queries can be marked up with statements in the DAML+OIL semantic web language. These statements provide both structured and semi-structured information about the documents and their content. We claim that indexing text and semantic markup together will significantly improve retrieval performance. Our approach allows inferencing to be done over this information at several points: when a document is indexed, when a query is processed, and when query results are evaluated.

Patent
Boris Chidlovskii
02 Oct 2002
TL;DR: In this article, a user of a meta-search engine submits a query formulated with operators defining relationships between keywords and answers are retrieved from each source as a summary of each document found that satisfies the query.
Abstract: A user of a meta-search engine submits a query formulated with operators defining relationships between keywords. Information sources are selected for interrogation by the user or by the meta-search engine. If necessary, the query is translated for each selected source to adapt the operators of the query to a form accepted by that source. The query is submitted to each selected source and answers are retrieved from each source as a summary of each document found that satisfies the query. The answers are post-filtered from each source to determine if the answers satisfy the originally formulated query. Answers that satisfy the query are displayed as a list of selectable document summaries. The analysis includes computing a subsumption ratio of filtered answers to answers received that satisfy a translated query. The subsumption ratio is used to improve the accuracy of subsequent queries submitted by the user to the meta-search engine.

Patent
03 Jan 2002
TL;DR: In this paper, a search and a browse on a single user query is performed, and a refined query is selected from the results of the first user query by selecting concepts from a first directory associated with the refined query.
Abstract: A search and a browse on a single user query are performed. A refined query is selected from the results of the first user query. Thereafter, a list of concepts from a first directory associated with the refined query is obtained. The concepts are defined in a hierarchical relationship, with concepts having broader scope being higher in the hierarchy and concepts having narrower scope being lower in the hierarchy. Additionally, a list of web sites associated with the search concept is obtained from a second directory.

Book ChapterDOI
25 Mar 2002
TL;DR: This paper presents the XXL search engine that supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match as well as semantic-similarity search conditions.
Abstract: Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a rank list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do, however, not consider the additional information and rich annotations provided by the structure of XML documents and their element names. This paper presents the XXL search engine that supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match as well as semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness. XXL is fully implemented as a suite of Java servlets. Experiments with a variety of structurally diverse XML data demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.

Proceedings ArticleDOI
11 Jul 2002
TL;DR: In this article, the authors discuss manual and automatic evaluation of summaries using data from the Document Understanding Conference 2001 (DUC-2001) and show the instability of the manual evaluation.
Abstract: In this paper we discuss manual and automatic evaluations of summaries using data from the Document Understanding Conference 2001 (DUC-2001). We first show the instability of the manual evaluation. Specifically, the low inter-human agreement indicates that more reference summaries are needed. To investigate the feasibility of automated summary evaluation based on the recent BLEU method from machine translation, we use accumulative n-gram overlap scores between system and human summaries. The initial results provide encouraging correlations with human judgments, based on the Spearman rank-order correlation coefficient. However, relative ranking of systems needs to take into account the instability.
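
The accumulative n-gram overlap at the heart of the automatic evaluation (the BLEU-inspired idea that later evolved into ROUGE) can be sketched as clipped n-gram recall against the reference summaries. Tokenization by whitespace and uniform weighting across n are simplifying assumptions:

```python
# A minimal sketch of accumulative n-gram overlap between a system summary
# and human reference summaries.
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_overlap(system, references, max_n=4):
    """Average over n of clipped n-gram recall against the references."""
    sys_tokens = system.split()
    scores = []
    for n in range(1, max_n + 1):
        sys_ng = ngrams(sys_tokens, n)
        matched = total = 0
        for ref in references:
            ref_ng = ngrams(ref.split(), n)
            matched += sum(min(c, sys_ng[g]) for g, c in ref_ng.items())
            total += sum(ref_ng.values())
        scores.append(matched / total if total else 0.0)
    return sum(scores) / len(scores)

refs = ["the cat sat on the mat", "a cat was sitting on the mat"]
print(ngram_overlap("the cat sat on a mat", refs))
```

Ranking several systems by such scores and correlating that ranking with the human one (e.g., via Spearman's rank-order coefficient) mirrors the comparison the paper describes.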

Proceedings ArticleDOI
Dragomir R. Radev, Weiguo Fan, Hong Qi, Harris Wu, Amardeep Grewal
07 May 2002
TL;DR: An architecture, called NSIR, that augments existing search engines so that they support natural language question answering is developed, and probabilistic approaches to the last three of its five stages are described.
Abstract: Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this paper we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), which uses proximity and question type features, achieves a total reciprocal document rank of .20 on the TREC 8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
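
Total reciprocal document rank is not defined in this abstract; a common reading is to sum 1/rank over every returned document that contains a correct answer and average over questions. The sketch below follows that reading and may differ from the paper's exact normalization:

```python
# A hedged sketch of total reciprocal document rank (TRDR).
def trdr(ranked_lists, answer_docs):
    """ranked_lists: qid -> ranked doc ids; answer_docs: qid -> docs with the answer."""
    per_query = []
    for qid, docs in ranked_lists.items():
        per_query.append(sum(1.0 / (i + 1)
                             for i, d in enumerate(docs) if d in answer_docs[qid]))
    return sum(per_query) / len(per_query)

print(trdr({"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9"]},
           {"q1": {"d1", "d7"}, "q2": set()}))   # (1/2 + 1/3 + 0) / 2
```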

Patent
19 Nov 2002
TL;DR: A search engine that utilizes both record-based data and user activity data to develop, update, and refine ranking protocols and to identify words and phrases that give rise to search ambiguity, so that the engine can interact with the user to better respond to user queries and enhance data acquisition from databases, intranets, and internets.
Abstract: A search engine is disclosed that utilizes both record-based data and user activity data to develop, update, and refine ranking protocols and to identify words and phrases that give rise to search ambiguity, so that the engine can interact with the user to better respond to user queries and enhance data acquisition from databases, intranets, and internets.

Patent
Bradley Scott Rubin
04 Jan 2002
TL;DR: In this paper, a framework for use with object-oriented programming systems provides a reusable object oriented (OO) framework that provides an information retrieval (IR) shell that permits a framework user to define an index class that includes word index objects and provides an extensible information retrieval system that evaluates a user query by comparing information contained in the user query with information contained within the word index object that relates to stored documents.
Abstract: A framework for use with object-oriented programming systems provides a reusable object oriented (OO) framework for use with object oriented programming systems that provides an information retrieval (IR) shell that permits a framework user to define an index class that includes word index objects and provides an extensible information retrieval system that evaluates a user query by comparing information contained in the user query with information contained in the word index objects that relates to stored documents. The information in word index objects is produced by preprocessing operations on documents such that the documents relevant to the user query will be identified, thereby providing a query result. The information retrieval system user can load documents into the computer system storage, index documents so their information can be subject to a query search, and request query evaluation to identify and retrieve documents most closely related to the subject matter of a user query.

Proceedings ArticleDOI
07 Nov 2002
TL;DR: This work studies two real search engine traces by examining query locality and its implications for caching, and shows that with proxy or user side caching, prefetching based on the user lexicon looks promising.
Abstract: Caching is a popular technique for reducing both server load and user response time in distributed systems. We consider the question of whether caching might be effective for search engines as well. We study two real search engine traces by examining query locality and its implications for caching. Our trace analysis produced three results. One result shows that queries have significant locality, with query frequency following a Zipf distribution. Very popular queries are shared among different users and can be cached at servers or proxies, while 16% to 22% of the queries are from the same users and should be cached at the user side. Multiple-word queries are shared less and should be cached mainly at the user side. Another result shows that if caching is to be done at the user side, short-term caching for hours is enough to cover query temporal locality, while server/proxy caching should use longer periods, such as days. The third result shows that most users have small lexicons when submitting queries. Frequent users who submit many search requests tend to reuse a small subset of words to form queries. Thus, with proxy or user side caching, prefetching based on the user lexicon looks promising.
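
The locality analysis can be reproduced in miniature: fit query frequencies against a Zipf law (frequency proportional to rank^-alpha, a straight line in log-log space) and measure how much of the load the most popular queries would absorb if cached. The trace format is an assumption:

```python
# A minimal sketch of query-locality analysis on a search trace.
import math
from collections import Counter

def zipf_slope_and_coverage(queries, top_k=100):
    freqs = sorted(Counter(queries).values(), reverse=True)
    xs = [math.log(rank + 1) for rank in range(len(freqs))]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))      # ~ -alpha of the Zipf fit
    coverage = sum(freqs[:top_k]) / sum(freqs)      # hit rate of caching top-k
    return slope, coverage

trace = ["news"] * 50 + ["weather"] * 25 + ["maps"] * 12 + ["q%d" % i for i in range(40)]
print(zipf_slope_and_coverage(trace, top_k=3))
```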

Journal ArticleDOI
TL;DR: This article presents a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries and term-based ranking.
Abstract: XML represents both content and structure of documents. Taking advantage of the document structure promises to greatly improve the retrieval precision. In this article, we present a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Our query model is based on tree matching as a simple and elegant means to formulate queries without knowing the exact structure of the data. Using this query model we propose a logical document concept by deciding on the document boundaries at query time. We combine structured queries and term-based ranking by extending the term concept to structural terms that include substructures of queries and documents. The notions of term frequency and inverse document frequency are adapted to logical documents and structural terms. We introduce an efficient technique to calculate all necessary term frequencies and inverse document frequencies at query time. By adjusting parameters of the retrieval process we are able to model two contrary approaches: the classical vector space model, and the original tree matching approach.

01 Jan 2002
TL;DR: TREC-2001 saw the falling into abeyance of the Large Web Task but a strengthening and broadening of activities based on the 1.69 million page WT10g corpus.
Abstract: TREC-2001 saw the falling into abeyance of the Large Web Task but a strengthening and broadening of activities based on the 1.69 million page WT10g corpus. There were two tasks. The topic relevance task was like traditional TREC ad hoc but used queries taken from real web search logs, from which the description and narrative fields of a topic description were inferred by the topic developers. There were 50 topics. In the homepage finding task, queries corresponded to the name of an entity whose home page (site entry page) was included in WT10g. The challenge in this task was to return all of the homepages at the very top of the ranking. Cursory analysis suggests that once again, exploitation of link information did not help on the topic relevance task. By contrast, in the homepage finding task, the best performing run which did not make use of either link information or properties of the document's URL achieved only half of the mean reciprocal rank of the best run.

Patent
30 Apr 2002
TL;DR: In this paper, a system and method for aggregating rankings from a plurality of ranking sources to generate a maximally consistent ranking by minimizing a distance measure is presented, where the ranking sources may be search engines executing queries on web pages that have been deliberately modified to cause an incorrect estimate of their relevance.
Abstract: A system and method for aggregating rankings from a plurality of ranking sources to generate a maximally consistent ranking by minimizing a distance measure. The ranking sources may be search engines executing queries on web pages that have been deliberately modified to cause an incorrect estimate of their relevance. The invention supports combining partial rankings.
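
The abstract leaves the distance measure and aggregation procedure unspecified; one standard instance is footrule-optimal aggregation, which for full lists is well approximated by ordering items by their median rank, and which resists the single spammed ranking described above. The sketch below follows that reading; tie handling and partial-list treatment are simplified:

```python
# A hedged sketch of rank aggregation by median rank (an approximation to
# minimizing the Spearman footrule distance to the input rankings).
from statistics import median

def aggregate_by_median_rank(rankings):
    items = {d for r in rankings for d in r}
    def med(d):
        # items absent from a list are treated as ranked at its bottom
        return median(r.index(d) if d in r else len(r) for r in rankings)
    return sorted(sorted(items), key=med)   # inner sort makes ties deterministic

# Three engines; the third was spammed to push "spam-page" to the top.
print(aggregate_by_median_rank([
    ["a", "b", "spam-page"],
    ["b", "a", "spam-page"],
    ["spam-page", "a", "b"],
]))  # -> ['a', 'b', 'spam-page']: the majority view prevails
```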