Topic
Ranking (information retrieval)
About: Ranking (information retrieval) is a research topic. Over the lifetime, 21109 publications have been published within this topic receiving 435130 citations.
Papers published on a yearly basis
Papers
01 Jul 1997
TL;DR: It is concluded that interactive query expansion has good potential, particularly for term sources that are poorer than relevance feedback, but it may be difficult for searchers to realise this potential without experience or training in term selection and free-text search strategies.
Abstract: In query expansion, terms from a source such as relevance feedback are added to the query. This often improves retrieval effectiveness but results are variable across queries. In interactive query expansion (IQE) the automatically-derived terms are instead offered as suggestions to the searcher, who decides which to add. There is little evidence of whether IQE is likely to be effective over multiple iterations in a large scale retrieval context, or whether inexperienced users can achieve this effectiveness in practice. These experiments address these two questions. A small but significant improvement in potential retrieval effectiveness is found. This is consistent across a range of topics. Inexperienced users’ term selections consistently fail to improve on automatic query expansion, however. It is concluded that interactive query expansion has good potential, particularly for term sources that are poorer than relevance feedback. But it may be difficult for searchers to realise this potential without experience or training in term selection and free-text search strategies.
165 citations
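The IQE setup above can be sketched in a few lines: an automatic component mines candidate terms from relevance-feedback documents, and those candidates are offered to the searcher rather than added silently. The frequency-based scoring below is a toy stand-in for whatever term-ranking model the paper actually used; the function and data are hypothetical.

```python
from collections import Counter

def expansion_candidates(query_terms, feedback_docs, k=5):
    """Rank terms from relevance-feedback documents as expansion
    candidates by occurrence count, excluding original query terms.
    In IQE these would be shown to the searcher as suggestions."""
    counts = Counter()
    for doc in feedback_docs:
        counts.update(t for t in doc.lower().split() if t not in query_terms)
    return [term for term, _ in counts.most_common(k)]

query = {"ranking", "retrieval"}
docs = [
    "ranking models for document retrieval and relevance feedback",
    "query expansion improves retrieval using relevance feedback terms",
]
print(expansion_candidates(query, docs, k=3))
```

Automatic query expansion would simply append the top-ranked terms; the paper's finding is that inexperienced users selecting from such a list tend not to beat that automatic baseline.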
11 Jul 2021
TL;DR: Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations, aiming to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture.
Abstract: Pyserini is a Python toolkit for reproducible information retrieval research with sparse and dense representations. It aims to provide effective, reproducible, and easy-to-use first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections. We aim to support, out of the box, the entire research lifecycle of efforts aimed at improving ranking with modern neural approaches. In particular, Pyserini supports sparse retrieval (e.g., BM25 scoring using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well as hybrid retrieval that integrates both approaches. This paper provides an overview of toolkit features and presents empirical results that illustrate its effectiveness on two popular ranking tasks. Around this toolkit, our group has built a culture of reproducibility through shared norms and tools that enable rigorous automated testing.
165 citations
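The sparse-retrieval side mentioned in the abstract is BM25 scoring over bag-of-words representations. A minimal sketch of the standard BM25 formula is below; this is not Pyserini's actual code, and the k1/b defaults are an assumption based on commonly used Anserini-style settings.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc, corpus, k1=0.9, b=0.4):
    """Score one tokenized document against a query with BM25.
    corpus is a list of tokenized documents (used for df and avgdl)."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc)
    score = 0.0
    for t in query_terms:
        df = sum(1 for d in corpus if t in d)  # document frequency
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        score += idf * tf[t] * (k1 + 1) / (
            tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [
    ["sparse", "retrieval", "with", "bm25"],
    ["dense", "retrieval", "with", "transformers"],
    ["hybrid", "ranking", "architectures"],
]
print(bm25_score(["sparse", "retrieval"], corpus[0], corpus))
```

First-stage retrieval ranks the whole corpus by this score; hybrid retrieval, as the abstract notes, then combines such sparse scores with dense nearest-neighbor scores.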
13 Jun 1997
TL;DR: In this paper, a method is proposed for classifying a document based on content within a class hierarchy, which comprises a plurality of category nodes stored in a tree data structure; each node includes a category name corresponding to a unique directory and a category definition comprising a set of defining terms.
Abstract: A method for classifying a document based on content within a class hierarchy. The class hierarchy comprises a plurality of category nodes stored within a tree data structure. Each of the plurality of category nodes includes a category name corresponding to a unique directory and a category definition comprising a set of defining terms. The class hierarchy is searched to determine appropriate categories for classification of the document. The document is then stored in directories corresponding to the categories selected for classification. If no categories are produced by the search, a system administrator is notified of the unsuccessful search.
165 citations
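The search-the-hierarchy step described above can be sketched as a recursive walk over a tree of category nodes, each with a name and a set of defining terms. The node structure and matching rule here are hypothetical simplifications of the patent's method; the fallback when no category matches mirrors the administrator notification it describes.

```python
def classify(doc_terms, node, path=""):
    """Walk the class hierarchy top-down, collecting directory-like
    paths of every category whose defining terms overlap the document."""
    here = f"{path}/{node['name']}"
    matches = []
    if node["terms"] & doc_terms:  # any defining term present in the document
        matches.append(here)
    for child in node.get("children", []):
        matches.extend(classify(doc_terms, child, here))
    return matches

tree = {
    "name": "root", "terms": set(),
    "children": [
        {"name": "sports", "terms": {"game", "team"}, "children": []},
        {"name": "science", "terms": {"experiment", "data"},
         "children": [{"name": "physics", "terms": {"quantum"}, "children": []}]},
    ],
}
doc = {"quantum", "data", "results"}
matched = classify(doc, tree)
print(matched)
if not matched:
    print("notify administrator: no matching category")
```

The document would then be stored in the directories corresponding to the matched paths.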
25 Oct 2010
TL;DR: A method for transliteration includes receiving input such as a word, sentence, phrase, or paragraph in a source language, creating source-language sub-phonetic units for the word, and converting those units to target-language sub-phonetic units.
Abstract: A method for transliteration includes receiving input such as a word, a sentence, a phrase, and a paragraph, in a source language, creating source language sub-phonetic units for the word and converting the source language sub-phonetic units for the word to target language sub-phonetic units, retrieving ranking for each of the target language sub-phonetic units from a database and creating target language words for the word in the source language based on the target language sub-phonetic units and ranking of the each of the target language sub-phonetic units. The method further includes identifying candidate target language words based on predefined criteria, and displaying candidate target language words.
164 citations
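The ranked-combination step can be sketched as follows: each source sub-phonetic unit maps to scored target-language candidates (the database lookup in the patent), and candidate words are formed by combining units and ranking the combinations. The unit inventory, scores, and product-of-scores combination rule here are all hypothetical illustration, not the patent's actual data or criteria.

```python
from itertools import product

# Hypothetical ranking table: source sub-phonetic unit ->
# scored target-language candidates (stand-in for the database).
RANKS = {
    "ka": [("ca", 0.9), ("ka", 0.7)],
    "ri": [("ri", 0.8), ("ree", 0.5)],
}

def transliterate(units, top_n=2):
    """Combine ranked target sub-phonetic units into candidate words,
    scored by the product of unit scores, best first."""
    candidates = []
    for combo in product(*(RANKS[u] for u in units)):
        word = "".join(piece for piece, _ in combo)
        score = 1.0
        for _, s in combo:
            score *= s
        candidates.append((word, score))
    candidates.sort(key=lambda c: -c[1])
    return candidates[:top_n]

print(transliterate(["ka", "ri"]))  # best-scoring candidate words first
```

The top-N candidates are what the method would identify and display to the user.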
24 Jul 2011
TL;DR: This work proposes a probabilistic mechanism for generating query suggestions from the corpus without using query logs and utilizes the document corpus to extract a set of candidate phrases that are highly correlated with the partial user query.
Abstract: After an end-user has partially input a query, intelligent search engines can suggest possible completions of the partial query to help end-users quickly express their information needs. All major web-search engines and most proposed methods that suggest queries rely on search engine query logs to determine possible query suggestions. However, for customized search systems in the enterprise domain, intranet search, or personalized search such as email or desktop search or for infrequent queries, query logs are either not available or the user base and the number of past user queries is too small to learn appropriate models. We propose a probabilistic mechanism for generating query suggestions from the corpus without using query logs. We utilize the document corpus to extract a set of candidate phrases. As soon as a user starts typing a query, phrases that are highly correlated with the partial user query are selected as completions of the partial query and are offered as query suggestions. Our proposed approach is tested on a variety of datasets and is compared with state-of-the-art approaches. The experimental results clearly demonstrate the effectiveness of our approach in suggesting queries with higher quality.
164 citations
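The log-free pipeline above (extract candidate phrases from the corpus, then match them against the partial query as the user types) can be sketched with word n-grams as the candidate phrases and raw frequency as the ranking signal. This is a toy stand-in for the paper's probabilistic correlation model; the phrase-extraction and scoring choices here are assumptions.

```python
from collections import Counter

def suggest(partial, corpus, k=3):
    """Suggest completions of a partial query from the corpus alone:
    extract word bigrams/trigrams as candidate phrases, keep those
    starting with the partial query, rank by corpus frequency."""
    phrases = Counter()
    for doc in corpus:
        words = doc.lower().split()
        for n in (2, 3):
            for i in range(len(words) - n + 1):
                phrases[" ".join(words[i:i + n])] += 1
    matches = [(p, c) for p, c in phrases.items()
               if p.startswith(partial.lower())]
    matches.sort(key=lambda m: -m[1])
    return [p for p, _ in matches[:k]]

corpus = [
    "information retrieval systems rank documents",
    "ranking in information retrieval research",
]
print(suggest("information ret", corpus))
```

Because the candidates come from the documents themselves, the approach works in enterprise, intranet, or desktop search settings where query logs are unavailable or too sparse.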