Topic

Ranking (information retrieval)

About: Ranking (information retrieval) is a research topic. Over its lifetime, 21,109 publications have been published within this topic, receiving 435,130 citations.


Papers
Patent
29 Jun 2000
TL;DR: In this patent, meta-descriptors are generated for multimedia information in a repository by extracting the descriptors from the multimedia information and clustering the multimedia information based on those descriptors.
Abstract: Multimedia information retrieval is performed using meta-descriptors in addition to descriptors. A 'descriptor' is a representation of a feature, a 'feature' being a distinctive characteristic of multimedia information, while a 'meta-descriptor' is information about the descriptor. Meta-descriptors are generated for multimedia information in a repository (10, 12, 14, 16, 18, 20, 22, 24) by extracting the descriptors from the multimedia information (111), clustering the multimedia information based on the descriptors (112), assigning meta-descriptors to each cluster (113), and attaching the meta-descriptors to the multimedia information in the repository (114). The multimedia repository is queried by formulating a query using query-by-example (131), acquiring the descriptor/s and meta-descriptor/s for a repository multimedia item (132), generating a query descriptor/s if none of the same type has been previously generated (133, 134), comparing the descriptors of the repository multimedia item and the query multimedia item (135), and ranking and displaying the results (136, 137).

133 citations
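
The indexing-and-query flow in the abstract above can be made concrete with a short sketch. This is a minimal illustration only: it assumes each multimedia item has already been reduced to a numeric feature vector, and k-means clustering, Euclidean distance, and the helper names (build_repository_index, query_by_example) are illustrative choices rather than anything specified in the patent.

# Minimal sketch of the meta-descriptor indexing pipeline described above.
# Assumes each multimedia item is already reduced to a numeric feature vector;
# k-means and the helper names are illustrative choices, not from the patent.
import numpy as np
from sklearn.cluster import KMeans

def build_repository_index(descriptors: np.ndarray, n_clusters: int = 8):
    """Cluster item descriptors and attach a meta-descriptor (cluster id + centroid)."""
    kmeans = KMeans(n_clusters=n_clusters, n_init=10).fit(descriptors)
    index = []
    for item_id, (vec, cluster) in enumerate(zip(descriptors, kmeans.labels_)):
        index.append({
            "item": item_id,
            "descriptor": vec,
            "meta_descriptor": {"cluster": int(cluster),
                                "centroid": kmeans.cluster_centers_[cluster]},
        })
    return index

def query_by_example(query_vec: np.ndarray, index: list, top_k: int = 5):
    """Compare the query descriptor against repository descriptors and rank results."""
    scored = [(item["item"], float(np.linalg.norm(query_vec - item["descriptor"])))
              for item in index]
    return sorted(scored, key=lambda pair: pair[1])[:top_k]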

Posted Content
TL;DR: The proposed Dual Embedding Space Model (DESM) captures evidence on whether a document is about a query term, in addition to what is modelled by traditional term-frequency based approaches; experiments show that the DESM can re-rank top documents returned by a commercial Web search engine, like Bing, better than a term-matching based signal like TF-IDF.
Abstract: A fundamental goal of search engines is to identify, given a query, documents that have relevant text. This is intrinsically difficult because the query and the document may use different vocabulary, or the document may contain query words without being relevant. We investigate neural word embeddings as a source of evidence in document ranking. We train a word2vec embedding model on a large unlabelled query corpus, but in contrast to how the model is commonly used, we retain both the input and the output projections, allowing us to leverage both the embedding spaces to derive richer distributional relationships. During ranking we map the query words into the input space and the document words into the output space, and compute a query-document relevance score by aggregating the cosine similarities across all the query-document word pairs. We postulate that the proposed Dual Embedding Space Model (DESM) captures evidence on whether a document is about a query term in addition to what is modelled by traditional term-frequency based approaches. Our experiments show that the DESM can re-rank top documents returned by a commercial Web search engine, like Bing, better than a term-matching based signal like TF-IDF. However, when ranking a larger set of candidate documents, we find the embeddings-based approach is prone to false positives, retrieving documents that are only loosely related to the query. We demonstrate that this problem can be solved effectively by ranking based on a linear mixture of the DESM and the word counting features.

133 citations
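
The DESM scoring rule described in the abstract can be sketched directly: query terms are looked up in the input (IN) embedding matrix, document terms in the output (OUT) matrix, and the relevance score aggregates cosine similarities over all query-document word pairs, optionally mixed with a term-matching signal. The plain mean used as the aggregate and the mixture weight alpha are illustrative assumptions, not the paper's exact formulation.

# Hedged sketch of the DESM scoring described above: query terms use the IN
# (input) embedding matrix, document terms the OUT (output) matrix, and the
# score aggregates cosine similarities over all query-document word pairs.
# A plain mean is used as the aggregate; the mixture weight with a
# term-matching score is likewise an illustrative choice.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def desm_score(query_terms, doc_terms, emb_in, emb_out):
    """Average cosine similarity between IN-space query vectors and OUT-space doc vectors."""
    sims = [cosine(emb_in[q], emb_out[d])
            for q in query_terms if q in emb_in
            for d in doc_terms if d in emb_out]
    return sum(sims) / len(sims) if sims else 0.0

def mixture_score(query_terms, doc_terms, emb_in, emb_out, tf_idf_score, alpha=0.5):
    """Linear mixture of DESM and a term-matching signal, as the abstract suggests."""
    return alpha * desm_score(query_terms, doc_terms, emb_in, emb_out) + (1 - alpha) * tf_idf_score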

Proceedings ArticleDOI
TL;DR: This paper proposes a joint approach that incorporates BERT's classification vector into existing neural models and shows that it outperforms state-of-the-art ad-hoc ranking baselines.
Abstract: Although considerable attention has been given to neural ranking architectures recently, far less attention has been paid to the term representations that are used as input to these models. In this work, we investigate how two pretrained contextualized language models (ELMo and BERT) can be utilized for ad-hoc document ranking. Through experiments on TREC benchmarks, we find that several existing neural ranking architectures can benefit from the additional context provided by contextualized language models. Furthermore, we propose a joint approach that incorporates BERT's classification vector into existing neural models and show that it outperforms state-of-the-art ad-hoc ranking baselines. We call this joint approach CEDR (Contextualized Embeddings for Document Ranking). We also address practical challenges in using these models for ranking, including the maximum input length imposed by BERT and runtime performance impacts of contextualized language models.

133 citations
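
A minimal PyTorch-style sketch of the joint idea described above, assuming the Hugging Face transformers package: BERT's [CLS] classification vector is concatenated with the features produced by an existing neural ranking model before a final scoring layer. The class name, dimensions, and the placeholder neural_ranker module are illustrative assumptions, not the authors' released CEDR code.

# Hedged PyTorch sketch of the joint approach described above: the [CLS]
# vector from BERT is concatenated with the output features of an existing
# neural ranking model before a final scoring layer. Dimensions and the
# placeholder neural_ranker module are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import BertModel

class CedrStyleRanker(nn.Module):
    def __init__(self, neural_ranker: nn.Module, ranker_feat_dim: int):
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.neural_ranker = neural_ranker          # any existing neural ranking model
        hidden = self.bert.config.hidden_size       # 768 for bert-base
        self.score = nn.Linear(hidden + ranker_feat_dim, 1)

    def forward(self, input_ids, attention_mask, ranker_features):
        # [CLS] classification vector from BERT
        cls_vec = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state[:, 0]
        # combine with the existing model's query-document features
        joint = torch.cat([cls_vec, self.neural_ranker(ranker_features)], dim=-1)
        return self.score(joint).squeeze(-1)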

Journal ArticleDOI
TL;DR: This work puts forward an active-learning selection criterion that minimizes redundancy between the candidate images shown to the user at every feedback round, argues that insensitivity to scale is desirable in this context, and shows how to obtain it through specific kernel functions.
Abstract: As the resolution of remote-sensing imagery increases, the full complexity of the scenes becomes increasingly difficult to approach. User-defined classes in large image databases are often composed of several groups of images and span very different scales in the space of low-level visual descriptors. The interactive retrieval of such image classes is then very difficult. To address this challenge, we evaluate here, in the context of satellite image retrieval, two general improvements for relevance feedback using support vector machines (SVMs). First, to optimize the transfer of information between the user and the system, we focus on the criterion employed by the system for selecting the images presented to the user at every feedback round. We put forward an active-learning selection criterion that minimizes redundancy between the candidate images shown to the user. Second, for image classes spanning very different scales in the low-level description space, we find that a high sensitivity of the SVM to the scale of the data brings about a low retrieval performance. We argue that the insensitivity to scale is desirable in this context, and we show how to obtain it by the use of specific kernel functions. Experimental evaluation of both ranking and classification performance on a ground-truth database of satellite images confirms the effectiveness of our approach.

133 citations
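
The abstract's active-learning selection step can be sketched as follows. Because the exact criterion is not spelled out here, this sketch uses a common margin-plus-diversity heuristic as a stand-in: candidates near the SVM decision boundary are preferred, and a redundancy penalty discourages showing mutually similar images in the same feedback round.

# Hedged sketch of an active-learning selection step in SVM relevance
# feedback: candidates close to the decision boundary are preferred, and a
# diversity penalty discourages showing redundant (mutually similar) images.
# This margin-plus-diversity heuristic is a stand-in for the paper's exact
# criterion, which the abstract does not fully specify.
import numpy as np
from sklearn.svm import SVC

def select_feedback_batch(svm: SVC, candidates: np.ndarray, batch_size: int = 8,
                          redundancy_weight: float = 0.5):
    batch_size = min(batch_size, len(candidates))
    margins = np.abs(svm.decision_function(candidates))   # small = uncertain
    chosen = []
    for _ in range(batch_size):
        best, best_score = None, np.inf
        for i in range(len(candidates)):
            if i in chosen:
                continue
            # redundancy = similarity to already-chosen candidates
            redundancy = 0.0
            if chosen:
                dists = np.linalg.norm(candidates[chosen] - candidates[i], axis=1)
                redundancy = float(np.exp(-dists).max())
            score = margins[i] + redundancy_weight * redundancy
            if score < best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen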

Patent
16 Mar 2001
TL;DR: A query information retrieval content-enhancing system and method are disclosed that take a user query and generate not only results corresponding to the exact query but also related results, as discussed by the authors.
Abstract: A query information retrieval content enhancing system, and a method using the system, are disclosed that take a user query and generate not only results corresponding to the exact query, but also results that relate to the exact query. The related results are generated by identifying query keywords and connectors and determining related keywords and/or connectors. The original keywords and connectors and the related keywords and connectors are then submitted to data mining routines that generate the related results. The normal results and related results are then made available to the user through an interface so that the user can review, analyze and manipulate the results.

133 citations
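
A minimal sketch of the related-results flow in the abstract above: keywords and boolean connectors are separated, related keywords are looked up (a toy synonym table stands in for the patent's data mining routines), and both the exact and the related queries are submitted to a search backend. The RELATED table and the run_search callback are hypothetical placeholders.

# Hedged sketch of the related-results idea described above: keywords and
# boolean connectors are split out of the query, related keywords are looked
# up (a toy synonym table stands in for the patent's data-mining routines),
# and both the exact and the related queries are submitted to the search
# backend. The run_search function is a placeholder.
RELATED = {"ranking": ["ordering", "scoring"], "retrieval": ["search"]}
CONNECTORS = {"AND", "OR", "NOT"}

def expand_query(query: str):
    tokens = query.split()
    keywords = [t for t in tokens if t.upper() not in CONNECTORS]
    connectors = [t for t in tokens if t.upper() in CONNECTORS]
    related_keywords = [r for k in keywords for r in RELATED.get(k.lower(), [])]
    return keywords, connectors, related_keywords

def retrieve_with_related(query: str, run_search):
    keywords, connectors, related = expand_query(query)
    exact_results = run_search(" ".join(keywords))             # normal results
    related_results = [run_search(term) for term in related]   # related results
    return {"exact": exact_results, "related": related_results}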


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 83% related
Ontology (information science): 57K papers, 869.1K citations, 82% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 82% related
Feature learning: 15.5K papers, 684.7K citations, 81% related
Supervised learning: 20.8K papers, 710.5K citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    3,112
2022    6,541
2021    1,105
2020    1,082
2019    1,168