scispace - formally typeset
Search or ask a question
Topic

Ranking (information retrieval)

About: Ranking (information retrieval) is a research topic. Over the lifetime, 21109 publications have been published within this topic receiving 435130 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: Experimental results show that the proposed approach to relevant term extraction and term suggestion can provide organized and highly relevant terms, and can exploit the contextual information in a user's query session to make more effective suggestions.
Abstract: This paper proposes an effective term suggestion approach to interactive Web search. Conventional approaches to making term suggestions involve extracting co-occurring keyterms from highly ranked retrieved documents. Such approaches must deal with term extraction difficulties and interference from irrelevant documents, and, more importantly, have difficulty extracting terms that are conceptually related but do not frequently co-occur in documents. In this paper, we present a new, effective log-based approach to relevant term extraction and term suggestion. Using this approach, the relevant terms suggested for a user query are those that co-occur in similar query sessions from search engine logs, rather than in the retrieved documents. In addition, the suggested terms in each interactive search step can be organized according to its relevance to the entire query session, rather than to the most recent single query as in conventional approaches. The proposed approach was tested using a proxy server log containing about two million query transactions submitted to search engines in Taiwan. The obtained experimental results show that the proposed approach can provide organized and highly relevant terms, and can exploit the contextual information in a user's query session to make more effective suggestions.

273 citations

Proceedings ArticleDOI
28 Jul 2003
TL;DR: It is shown that the CORI algorithm does not do well in environments with a mix of "small" and "very large" databases, and a new resource selection algorithm is proposed that uses information about database sizes as well as database contents.
Abstract: Prior research under a variety of conditions has shown the CORI algorithm to be one of the most effective resource selection algorithms, but the range of database sizes studied was not large. This paper shows that the CORI algorithm does not do well in environments with a mix of "small" and "very large" databases. A new resource selection algorithm is proposed that uses information about database sizes as well as database contents. We also show how to acquire database size estimates in uncooperative environments as an extension of the query-based sampling used to acquire resource descriptions. Experiments demonstrate that the database size estimates are more accurate for large databases than estimates produced by a competing method; the new resource ranking algorithm is always at least as effective as the CORI algorithm; and the new algorithm results in better document rankings than the CORI algorithm.

273 citations

Journal ArticleDOI
TL;DR: A series of knowledge extractors are developed, which employ a combination of knowledge-based and statistical techniques, for automatically identifying clinically relevant aspects of MEDLINE abstracts, and which significantly outperforms the already competitive PubMed baseline.
Abstract: The combination of recent developments in question-answering research and the availability of unparalleled resources developed specifically for automatic semantic processing of text in the medical domain provides a unique opportunity to explore complex question answering in the domain of clinical medicine. This article presents a system designed to satisfy the information needs of physicians practicing evidence-based medicine. We have developed a series of knowledge extractors, which employ a combination of knowledge-based and statistical techniques, for automatically identifying clinically relevant aspects of MEDLINE abstracts. These extracted elements serve as the input to an algorithm that scores the relevance of citations with respect to structured representations of information needs, in accordance with the principles of evidence-based medicine. Starting with an initial list of citations retrieved by PubMed, our system can bring relevant abstracts into higher ranking positions, and from these abstracts generate responses that directly answer physicians' questions. We describe three separate evaluations: one focused on the accuracy of the knowledge extractors, one conceptualized as a document reranking task, and finally, an evaluation of answers by two physicians. Experiments on a collection of real-world clinical questions show that our approach significantly outperforms the already competitive PubMed baseline.

272 citations

Posted Content
TL;DR: This paper forms the ranking problem in a rigorous statistical framework, establishes in particular a tail inequality for degenerate U-processes, and applies it for showing that fast rates of convergence may be achieved under specific noise assumptions, just like in classification.
Abstract: The problem of ranking/ordering instances, instead of simply classifying them, has recently gained much attention in machine learning. In this paper we formulate the ranking problem in a rigorous statistical framework. The goal is to learn a ranking rule for deciding, among two instances, which one is "better," with minimum ranking risk. Since the natural estimates of the risk are of the form of a U-statistic, results of the theory of U-processes are required for investigating the consistency of empirical risk minimizers. We establish in particular a tail inequality for degenerate U-processes, and apply it for showing that fast rates of convergence may be achieved under specific noise assumptions, just like in classification. Convex risk minimization methods are also studied.

272 citations

Journal ArticleDOI
TL;DR: In this article, the authors formulate the ranking problem in a rigorous statistical framework, where the goal is to learn a ranking rule for deciding, among two instances, which one is "better" with minimum ranking risk.
Abstract: The problem of ranking/ordering instances, instead of simply classifying them, has recently gained much attention in machine learning. In this paper we formulate the ranking problem in a rigorous statistical framework. The goal is to learn a ranking rule for deciding, among two instances, which one is "better," with minimum ranking risk. Since the natural estimates of the risk are of the form of a U-statistic, results of the theory of U-processes are required for investigating the consistency of empirical risk minimizers. We establish in particular a tail inequality for degenerate U-processes, and apply it for showing that fast rates of convergence may be achieved under specific noise assumptions, just like in classification. Convex risk minimization methods are also studied.

272 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
83% related
Ontology (information science)
57K papers, 869.1K citations
82% related
Graph (abstract data type)
69.9K papers, 1.2M citations
82% related
Feature learning
15.5K papers, 684.7K citations
81% related
Supervised learning
20.8K papers, 710.5K citations
81% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20241
20233,112
20226,541
20211,105
20201,082
20191,168