Topic

Ranking (information retrieval)

About: Ranking (information retrieval) is a research topic. Over its lifetime, 21,109 publications have been published within this topic, receiving 435,130 citations.


Papers
Posted Content
TL;DR: This tutorial introduces basic concepts and intuitions behind neural IR models, and places them in the context of traditional retrieval models, by introducing fundamental concepts of IR and different neural and non-neural approaches to learning vector representations of text.
Abstract: Neural ranking models for information retrieval (IR) use shallow or deep neural networks to rank search results in response to a query. Traditional learning to rank models employ machine learning techniques over hand-crafted IR features. By contrast, neural models learn representations of language from raw text that can bridge the gap between query and document vocabulary. Unlike classical IR models, these new machine learning based approaches are data-hungry, requiring large scale training data before they can be deployed. This tutorial introduces basic concepts and intuitions behind neural IR models, and places them in the context of traditional retrieval models. We begin by introducing fundamental concepts of IR and different neural and non-neural approaches to learning vector representations of text. We then review shallow neural IR methods that employ pre-trained neural term embeddings without learning the IR task end-to-end. We introduce deep neural networks next, discussing popular deep architectures. Finally, we review the current DNN models for information retrieval. We conclude with a discussion on potential future directions for neural IR.

155 citations

Journal ArticleDOI
TL;DR: This paper used corpus analysis techniques to automatically discover similar words directly from the contents of the databases which are not tagged with part-of-speech labels, resulting in conceptual retrieval rather than requiring exact word matches between queries and documents.
Abstract: Searching online text collections can be both rewarding and frustrating. While valuable information can be found, typically many irrelevant documents are also retrieved, while many relevant ones are missed. Terminology mismatches between the user's query and document contents are a main cause of retrieval failures. Expanding a user's query with related words can improve search performance, but finding and using related words is an open problem. This research uses corpus analysis techniques to automatically discover similar words directly from the contents of the databases, which are not tagged with part-of-speech labels. Using these similarities, user queries are automatically expanded, resulting in conceptual retrieval rather than requiring exact word matches between queries and documents. We are able to achieve a 7.6% improvement for TREC 5 queries and up to a 28.5% improvement on the narrow-domain Cystic Fibrosis collection. This work has been extended to multidatabase collections where each subdatabase has a collection-specific similarity matrix associated with it. If the best matrix is selected, substantial search improvements are possible. Various techniques to select the appropriate matrix for a particular query are analyzed, and a 4.8% improvement in the results is validated.

154 citations
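
The query expansion idea in the abstract above can be illustrated with a minimal sketch. The abstract does not say which similarity measure the authors use, so this example assumes cosine similarity between term-by-document count vectors; the function names, `top_k`, and `threshold` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import sqrt


def term_vectors(docs):
    """Map each term to a sparse {doc_id: count} vector built from raw text."""
    vectors = defaultdict(lambda: defaultdict(int))
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            vectors[term][doc_id] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def expand_query(query, docs, top_k=2, threshold=0.5):
    """Append to the query the terms most similar to each query term.

    Assumption: similarity is cosine over document co-occurrence counts;
    the original paper may use a different corpus-derived measure.
    """
    vectors = term_vectors(docs)
    query_terms = query.lower().split()
    expanded = list(query_terms)
    for term in query_terms:
        if term not in vectors:
            continue  # unseen term: nothing to expand with
        scored = [
            (cosine(vectors[term], vec), other)
            for other, vec in vectors.items()
            if other != term and other not in expanded
        ]
        # Highest similarity first; ties broken alphabetically for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        expanded.extend(other for score, other in scored[:top_k] if score >= threshold)
    return expanded
```

With a toy corpus such as `["cystic fibrosis lung", "cystic fibrosis gene", "stock market price"]`, the query `"fibrosis"` picks up `"cystic"` first because the two terms occur in exactly the same documents, while unrelated terms fall below the threshold.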

Journal ArticleDOI
TL;DR: A new method of document retrieval based on the fundamental operations of fuzzy set theory is presented. The paper introduces basic notions, gives the syntax and semantics of the proposed document retrieval language, and describes an algorithm that allocates documents to particular queries, together with its properties.
Abstract: The aim of a document retrieval system is to issue documents which contain the information needed by a given user of an information system. The process of retrieving documents in response to a given query is carried out by means of the search patterns of these documents and the query. It is thus clear that the quality of this process, i.e. the pertinence of the information system response to the information need of a given user, depends on the degree of accuracy with which document and query contents are represented by their search patterns. It seems obvious that the weighting of descriptors entering document search patterns improves the quality of the document retrieval process. A mathematical apparatus which takes into consideration, in a natural manner, the fact that the grades of importance of the descriptors in document search patterns are of the continuum type, that is, an apparatus adequate to the description of a retrieval system of documents indexed by weighted descriptors, is, among known mathematical methods, the theory of fuzzy sets, formulated by L. A. Zadeh. It is the aim of this paper to present a new method of document retrieval based on the fundamental operations of the fuzzy set theory. We start by introducing basic notions, then the syntax and semantics of the proposed language for document retrieval will be given and an algorithm allocating documents to particular queries will be described and its properties discussed. The basic advantage of the use of the fuzzy set theory for document retrieval system description is that it takes into consideration, in a simple way, the differentiation of the importance of descriptors in document search patterns and the differentiation of the formal relevance grades of particular documents of an information system to a given query. Documents of the highest grades (in the given information system) of formal relevance to the given query may be retrieved by means of the application of simple operations of the fuzzy set theory.

154 citations
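
A rough sketch of retrieval over weighted descriptors, in the spirit of the abstract above. It assumes the standard Zadeh operations (min for AND, max for OR, 1 - x for NOT); the abstract does not spell out the paper's query language, so the nested-tuple query form and the function names here are hypothetical.

```python
def relevance(query, weights):
    """Grade of formal relevance of one document to a query.

    query   -- a descriptor (str), or a tuple ("and" | "or", q1, q2, ...),
               or ("not", q). This tuple syntax is an illustrative assumption.
    weights -- the document's search pattern: {descriptor: weight in [0, 1]}.
    """
    if isinstance(query, str):
        return weights.get(query, 0.0)  # absent descriptor has membership 0
    op, *args = query
    grades = [relevance(arg, weights) for arg in args]
    if op == "and":
        return min(grades)        # fuzzy intersection
    if op == "or":
        return max(grades)        # fuzzy union
    if op == "not":
        return 1.0 - grades[0]    # fuzzy complement
    raise ValueError(f"unknown operator: {op!r}")


def rank_documents(query, index):
    """Return (doc_id, grade) pairs, highest grade first, ties by doc_id."""
    graded = [(doc_id, relevance(query, weights)) for doc_id, weights in index.items()]
    return sorted(graded, key=lambda pair: (-pair[1], pair[0]))


# Toy index of documents with weighted descriptors (made-up values).
index = {
    "d1": {"fuzzy": 0.9, "retrieval": 0.4},
    "d2": {"fuzzy": 0.5, "retrieval": 0.8},
    "d3": {"ranking": 0.7},
}
print(rank_documents(("and", "fuzzy", "retrieval"), index))
# [('d2', 0.5), ('d1', 0.4), ('d3', 0.0)]
```

Note that `d2` outranks `d1` for the conjunctive query even though `d1` has the single highest weight: under min-based conjunction a document is only as relevant as its weakest required descriptor.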

Proceedings ArticleDOI
01 Sep 2007
TL;DR: Results show that when the number of objectives increases, selection based on just Pareto-dominance without diversity maintenance is able to advance the search better than with diversity maintenance, and that diversity maintenance connives at difficulties solving problems with a high number of objectives.
Abstract: An alternative relation to Pareto-dominance is studied. The relation is based on ranking a set of solutions according to each separate objective and an aggregation function to calculate a scalar fitness value for each solution. The relation is called ranking-dominance, and it tries to tackle the curse of dimensionality commonly observed in multi-objective optimization. Ranking-dominance can be used to sort a set of solutions even for a large number of objectives, when the Pareto-dominance relation can no longer distinguish solutions from one another. This permits the search to advance even with a large number of objectives. Experimental results indicate that in some cases the selection based on ranking-dominance is able to advance the search towards the Pareto-front better than the selection based on Pareto-dominance. However, in some cases it is also possible that the search does not proceed in the direction of the Pareto-front, because the ranking-dominance relation permits deterioration of individual objectives. The results also show that when the number of objectives increases, the selection based on just Pareto-dominance without diversity maintenance is able to advance the search better than with diversity maintenance. Therefore, diversity maintenance connives at difficulties solving problems with a high number of objectives.

154 citations
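
The ranking-dominance idea described in the abstract above (rank solutions per objective, then aggregate the ranks into a scalar fitness) might look like the following sketch. It assumes all objectives are minimized, that tied values share a rank, and that the aggregation function is the sum of ranks; these choices and the function names are illustrative assumptions, not taken from the paper.

```python
def objective_ranks(values):
    """Rank values in ascending order (minimization); equal values share a rank."""
    position = {v: i for i, v in enumerate(sorted(set(values)))}
    return [position[v] for v in values]


def ranking_fitness(solutions, aggregate=sum):
    """Scalar fitness per solution: aggregate of its per-objective ranks.

    solutions -- list of equal-length tuples of objective values.
    aggregate -- function over a solution's ranks (assumed here: sum).
    Lower fitness is better.
    """
    per_objective = [objective_ranks(column) for column in zip(*solutions)]
    return [aggregate(ranks) for ranks in zip(*per_objective)]


def pareto_dominates(a, b):
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Three mutually non-dominated solutions: Pareto-dominance cannot order them.
solutions = [(1, 4, 4), (2, 2, 2), (4, 1, 3)]
print(ranking_fitness(solutions))  # [4, 2, 3]
```

In this toy set no solution Pareto-dominates another, yet the aggregated ranks still give a total order, with `(2, 2, 2)` preferred. The same mechanism also shows the caveat from the abstract: a solution can win on aggregate while being worse on an individual objective.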

Journal ArticleDOI
TL;DR: This article presents a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries and term-based ranking.
Abstract: XML represents both content and structure of documents. Taking advantage of the document structure promises to greatly improve the retrieval precision. In this article, we present a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Our query model is based on tree matching as a simple and elegant means to formulate queries without knowing the exact structure of the data. Using this query model we propose a logical document concept by deciding on the document boundaries at query time. We combine structured queries and term-based ranking by extending the term concept to structural terms that include substructures of queries and documents. The notions of term frequency and inverse document frequency are adapted to logical documents and structural terms. We introduce an efficient technique to calculate all necessary term frequencies and inverse document frequencies at query time. By adjusting parameters of the retrieval process we are able to model two contrary approaches: the classical vector space model, and the original tree matching approach.

154 citations
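
For orientation, the classical vector space model that the abstract above takes as its starting point can be sketched as follows. This covers only plain term-based tf-idf ranking with cosine similarity; it does not implement the paper's structural terms, logical documents, or tree matching. The smoothed idf formula (`log(N / df) + 1`) and all names are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (per-document tf-idf vectors, idf table) for whitespace-tokenized docs."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumed idf variant: log(N / df) + 1, so ubiquitous terms keep a small weight.
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query, docs):
    """Return document indices ordered by decreasing cosine similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    query_tf = Counter(query.lower().split())
    query_vec = {t: tf * idf[t] for t, tf in query_tf.items() if t in idf}
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    return [i for _, i in sorted(scores, key=lambda pair: (-pair[0], pair[1]))]
```

For example, with the documents `["xml tree matching query", "vector space model ranking", "xml document structure"]`, the query `"xml tree"` ranks the first document highest, since it matches both query terms, followed by the third, which matches only `xml`.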


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 83% related
Ontology (information science): 57K papers, 869.1K citations, 82% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 82% related
Feature learning: 15.5K papers, 684.7K citations, 81% related
Supervised learning: 20.8K papers, 710.5K citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    3,112
2022    6,541
2021    1,105
2020    1,082
2019    1,168