scispace - formally typeset
Search or ask a question
Topic

Ranking (information retrieval)

About: Ranking (information retrieval) is a research topic. Over the lifetime, 21109 publications have been published within this topic receiving 435130 citations.


Papers
More filters
Proceedings ArticleDOI
20 Apr 2009
TL;DR: A multi-modality recommendation based on both tag and visual correlation is proposed, and the tag recommendation is formulates as a learning problem, and Rankboost algorithm is applied to learn an optimal combination of these ranking features from different modalities.
Abstract: Social tagging provides valuable and crucial information for large-scale web image retrieval. It is ontology-free and easy to obtain; however, irrelevant tags frequently appear, and users typically will not tag all semantic objects in the image, which is also called semantic loss. To avoid noises and compensate for the semantic loss, tag recommendation is proposed in literature. However, current recommendation simply ranks the related tags based on the single modality of tag co-occurrence on the whole dataset, which ignores other modalities, such as visual correlation. This paper proposes a multi-modality recommendation based on both tag and visual correlation, and formulates the tag recommendation as a learning problem. Each modality is used to generate a ranking feature, and Rankboost algorithm is applied to learn an optimal combination of these ranking features from different modalities. Experiments on Flickr data demonstrate the effectiveness of this learning-based multi-modality recommendation strategy.

197 citations

Proceedings ArticleDOI
02 Feb 2015
TL;DR: This paper proposes a probabilistic model that leverages user-generated information on the web to link queries to entities in a knowledge base and significantly outperforms several state-of-the-art baselines while being able to process queries in sub-millisecond times---at least two orders of magnitude faster than existing systems.
Abstract: Entity linking deals with identifying entities from a knowledge base in a given piece of text and has become a fundamental building block for web search engines, enabling numerous downstream improvements from better document ranking to enhanced search results pages. A key problem in the context of web search queries is that this process needs to run under severe time constraints as it has to be performed before any actual retrieval takes place, typically within milliseconds.In this paper we propose a probabilistic model that leverages user-generated information on the web to link queries to entities in a knowledge base. There are three key ingredients that make the algorithm fast and space-efficient. First, the linking process ignores any dependencies between the different entity candidates, which allows for a O(k2) implementation in the number of query terms. Second, we leverage hashing and compression techniques to reduce the memory footprint. Finally, to equip the algorithm with contextual knowledge without sacrificing speed, we factor the distance between distributional semantics of the query words and entities into the model.We show that our solution significantly outperforms several state-of-the-art baselines by more than 14% while being able to process queries in sub-millisecond times---at least two orders of magnitude faster than existing systems.

197 citations

Book
David Carmel1, Elad Yom-Tov1
30 Apr 2010
TL;DR: This tutorial is to expose participants to the current research on query performance prediction (also known as query difficulty estimation), and participants will become familiar with states-of-the-art performance prediction methods, and with common evaluation methodologies for prediction quality.
Abstract: Many information retrieval (IR) systems suffer from a radical variance in performance when responding to users' queries. Even for systems that succeed very well on average, the quality of results returned for some of the queries is poor. Thus, it is desirable that IR systems will be able to identify "difficult" queries in order to handle them properly. Understanding why some queries are inherently more difficult than others is essential for IR, and a good answer to this important question will help search engines to reduce the variance in performance, hence better servicing their customer needs. The high variability in query performance has driven a new research direction in the IR field on estimating the expected quality of the search results, i.e. the query difficulty, when no relevance feedback is given. Estimating the query difficulty is a significant challenge due to the numerous factors that impact retrieval performance. Many prediction methods have been proposed recently. However, as many researchers observed, the prediction quality of state-of-the-art predictors is still too low to be widely used by IR applications. The low prediction quality is due to the complexity of the task, which involves factors such as query ambiguity, missing content, and vocabulary mismatch. The goal of this tutorial is to expose participants to the current research on query performance prediction (also known as query difficulty estimation). Participants will become familiar with states-of-the-art performance prediction methods, and with common evaluation methodologies for prediction quality. We will discuss the reasons that cause search engines to fail for some of the queries, and provide an overview of several approaches for estimating query difficulty. We then describe common methodologies for evaluating the prediction quality of those estimators, and some experiments conducted recently with their prediction quality, as measured over several TREC benchmarks. We will cover a few potential applications that can utilize query difficulty estimators by handling each query individually and selectively based on its estimated difficulty. Finally we will summarize with a discussion on open issues and challenges in the field.

197 citations

01 Jan 2006
TL;DR: A formal model and a new search algorithm for folksonomies, called FolkRank, that exploits the structure of the folksonomy is presented and is used to structure search results.
Abstract: In social bookmark tools users are setting up lightweight conceptual structures called folksonomies. Currently, the information retrieval support is limited. We present a formal model and a new search algorithm for folksonomies, called FolkRank, that exploits the structure of the folksonomy. The proposed algorithm is also applied to find communities within the folksonomy and is used to structure search results. All findings are demonstrated on a large scale dataset. A long version of this paper has been published at the European Semantic Web Conference 2006 [3].

196 citations

Journal ArticleDOI
TL;DR: Three retrieval models for probabilistic indexing are described along with evaluation results for each, including the binary independence indexing (BII) model, which is a generalized version of the Maron and Kuhns indexing model.
Abstract: In this article three retrieval models for probabilistic indexing are described along with evaluation results for each. First is the binary independence indexing (BII) model, which is a generalized version of the Maron and Kuhns indexing model. In this model, the indexing weight of a descriptor in a document is an estimate of the probability of relevance of this document with respect to queries using this descriptor. Second is the retrieval-with-probabilistic-indexing (RPI) model, which is suited to different kinds of probabilistic indexing. For that we assume that each indexing scheme has its own concept of “correctness” to which the probabilities relate. In addition to the probabilistic indexing weights, the RPI model provides the possibility of relevance weighting of search terms. A third model that is similar was proposed by Croft some years ago as an extension of the binary independence retrieval model but it can be shown that this model is not based on the probabilistic ranking principle. The probabilistic indexing weights required for any of these models can be provided by an application of the Darmstadt indexing approach (DIA) for indexing with descriptors from a controlled vocabulary. The experimental results show significant improvements over retrieval with binary indexing. Finally, suggestions are made regarding how the DIA can be applied to probabilistic indexing with free text terms.

196 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
83% related
Ontology (information science)
57K papers, 869.1K citations
82% related
Graph (abstract data type)
69.9K papers, 1.2M citations
82% related
Feature learning
15.5K papers, 684.7K citations
81% related
Supervised learning
20.8K papers, 710.5K citations
81% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20241
20233,112
20226,541
20211,105
20201,082
20191,168