Topic

Ranking (information retrieval)

About: Ranking (information retrieval) is a research topic. Over its lifetime, 21,109 publications have appeared within this topic, receiving 435,130 citations.


Papers
Proceedings Article
Xiubo Geng1, Tie-Yan Liu1, Tao Qin1, Hang Li1
23 Jul 2007
TL;DR: This paper proposes a feature selection method for ranking that uses each feature's values to rank the training instances and defines the feature's importance as the resulting ranking accuracy, measured by a performance measure or a loss function.
Abstract: Ranking is a very important topic in information retrieval. While algorithms for learning ranking models have been intensively studied, this is not the case for feature selection, despite its importance. The reality is that many feature selection methods used in classification are directly applied to ranking. We argue that because of the striking differences between ranking and classification, it is better to develop different feature selection methods for ranking. To this end, we propose a new feature selection method in this paper. Specifically, for each feature we use its value to rank the training instances, and define the ranking accuracy in terms of a performance measure or a loss function as the importance of the feature. We also define the correlation between the ranking results of two features as the similarity between them. Based on these definitions, we formulate feature selection as an optimization problem whose goal is to find the features with maximum total importance scores and minimum total similarity scores. We also demonstrate how to solve the optimization problem in an efficient way. We have tested the effectiveness of our feature selection method on two information retrieval datasets and with two ranking models. Experimental results show that our method can outperform traditional feature selection methods for the ranking task.

285 citations
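
The abstract above describes scoring each feature by how well it ranks the training instances on its own, measuring similarity between features by the correlation of their rankings, and then selecting features with high importance and low mutual similarity. The following is a minimal sketch of that idea under several assumptions that are not taken from the paper: a single query, pairwise ordering accuracy as the importance measure, Kendall's tau as the similarity, and a simple greedy selection with a hypothetical `penalty` weight. The function names are illustrative only.

```python
from itertools import combinations


def kendall_tau(a, b):
    """Kendall's tau between two score lists over the same items.

    Tied pairs count as neither concordant nor discordant.
    """
    n = len(a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = n * (n - 1) / 2
    return (concordant - discordant) / pairs if pairs else 0.0


def pairwise_accuracy(feature_values, labels):
    """Fraction of differently labeled pairs that the feature orders correctly.

    Assumption: this stands in for the 'performance measure' in the abstract.
    """
    correct = total = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue
        total += 1
        if (feature_values[i] - feature_values[j]) * (labels[i] - labels[j]) > 0:
            correct += 1
    return correct / total if total else 0.0


def select_features(columns, labels, k, penalty=0.5):
    """Greedily pick k features: high importance, low similarity to picks so far.

    `columns` maps a feature name to its values over the training instances.
    `penalty` (hypothetical) weights how strongly similarity is punished.
    """
    score = {name: pairwise_accuracy(vals, labels) for name, vals in columns.items()}
    selected = []
    while score and len(selected) < k:
        best = max(score, key=score.get)  # ties resolve to insertion order
        selected.append(best)
        del score[best]
        # Discount every remaining feature by its similarity to the new pick.
        for name in score:
            similarity = abs(kendall_tau(columns[best], columns[name]))
            score[name] -= penalty * similarity
    return selected


columns = {
    "f1": [3, 2, 1, 0],
    "f2": [3.1, 2.2, 1.1, 0.5],  # ranks instances exactly like f1
    "f3": [1, 3, 0, 2],
}
labels = [2, 1, 1, 0]
print(select_features(columns, labels, k=2, penalty=1.0))
```

With a strong penalty the redundant feature `f2` is skipped in favor of the weaker but uncorrelated `f3`; with `penalty=0.0` the selection falls back to pure importance ordering.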

Patent
12 Nov 1996
TL;DR: In this patent, the outputs of a plurality of text search engines based on substantially different computational searching techniques are combined into a single list of information items, and a ranking process ranks the items in the combined list using the ordering data that each search engine supplies about the relevance of its output to the user's request.
Abstract: An information retrieval system is disclosed, wherein the system includes a plurality of text search engines based on substantially different computational searching techniques. By activating each search engine with input from a user information request, the output from each of the search engines is combined into a single list of information items. A ranking process ranks the information items in the combined list by utilizing information item ordering data also received from each of the search engines as to the relevance of the information items output by the search engine to the user's request. Thus, by providing higher rankings to those information items determined to be most relevant to the user's request by each of (or a majority of) the search engines, these information items have been found to be highly consistent in satisfying the user's request for information.

281 citations
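
The combination step described in the abstract above, where items favored by each (or most) of the engines rise to the top of the merged list, can be illustrated with a Borda-style count. This is a swapped-in, generic rank-fusion technique chosen for illustration; the patent's actual ranking process is not specified here. The function name `fuse_rankings` is hypothetical, and each input list is assumed to contain no duplicates.

```python
def fuse_rankings(result_lists):
    """Merge several ranked result lists into one using Borda-style points.

    Each engine awards an item more points the higher it ranks that item;
    items an engine did not return get no points from that engine.
    """
    depth = max((len(results) for results in result_lists), default=0)
    points = {}
    for results in result_lists:
        for position, item in enumerate(results):
            points[item] = points.get(item, 0) + (depth - position)
    # Highest total first; ties are broken alphabetically for determinism.
    return sorted(points, key=lambda item: (-points[item], item))


engine_outputs = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
print(fuse_rankings(engine_outputs))
```

In this toy input, `b` is ranked first by two of the three engines and therefore leads the fused list, while `d`, returned by only one engine in last place, ends up at the bottom.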

Journal Article
TL;DR: It is shown how the concept of relevance may be replaced by the condition of being highly rated by a similarity measure, and it becomes possible to identify the stop words in a collection by automated statistical testing.
Abstract: A stop word may be identified as a word that has the same likelihood of occurring in those documents not relevant to a query as in those documents relevant to the query. In this paper we show how the concept of relevance may be replaced by the condition of being highly rated by a similarity measure. Thus it becomes possible to identify the stop words in a collection by automated statistical testing. We describe the nature of the statistical test as it is realized with a vector retrieval methodology based on the cosine coefficient of document-document similarity. As an example, this technique is then applied to a large MEDLINE subset in the area of biotechnology. The initial processing of this database involves a 310-word stop list of common non-content terms. Our technique is then applied and 75% of the remaining terms are identified as stop words. We compare retrieval with and without the removal of these stop words and find that of the top twenty documents retrieved in response to a random query docume...

281 citations
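
The idea in the abstract above can be sketched roughly as follows: treat the documents most similar to a query document (by cosine coefficient) as a stand-in for relevant ones, then flag a term as a stop-word candidate when it occurs about equally often in the similar and dissimilar groups. The abstract does not say which statistical test the authors use, so the two-proportion z statistic, the `z_cutoff` threshold, and the function names below are all assumptions made for illustration, and the toy corpus is far too small for the statistic to be meaningful.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine coefficient between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def stop_word_candidates(docs, query_index, top_k, z_cutoff=1.0):
    """Return query terms whose occurrence rate does not depend on similarity.

    Assumes 0 < top_k < len(docs) - 1 so that both groups are non-empty.
    """
    vectors = [Counter(doc.lower().split()) for doc in docs]
    query = vectors[query_index]
    others = [i for i in range(len(docs)) if i != query_index]
    others.sort(key=lambda i: cosine(query, vectors[i]), reverse=True)
    near, far = others[:top_k], others[top_k:]

    candidates = []
    for term in sorted(query):
        hits_near = sum(term in vectors[i] for i in near)
        hits_far = sum(term in vectors[i] for i in far)
        if hits_near + hits_far == 0:
            continue  # the term appears only in the query: no evidence
        p_near, p_far = hits_near / len(near), hits_far / len(far)
        pooled = (hits_near + hits_far) / (len(near) + len(far))
        std_err = math.sqrt(pooled * (1 - pooled) * (1 / len(near) + 1 / len(far)))
        z = (p_near - p_far) / std_err if std_err else 0.0
        if abs(z) < z_cutoff:
            candidates.append(term)
    return candidates


docs = [
    "the gene expression in the cell",
    "the gene expression of the protein",
    "the gene in the cell membrane",
    "the stock market of the city",
    "the weather in the city",
]
print(stop_word_candidates(docs, query_index=0, top_k=2))
```

Here "the" and "in" occur at the same rate in both groups and are flagged, whereas "gene", "cell", and "expression" occur only among the similar documents and are kept as content terms.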

Proceedings Article
23 Aug 2010
TL;DR: The problem is formulated as a bipartite graph, and the well-known web page ranking algorithm HITS is used to find important features and rank them high; experiments on diverse real-life datasets show promising results.
Abstract: An important task of opinion mining is to extract people's opinions on features of an entity. For example, the sentence, "I love the GPS function of Motorola Droid" expresses a positive opinion on the "GPS function" of the Motorola phone. "GPS function" is the feature. This paper focuses on mining features. Double propagation is a state-of-the-art technique for solving the problem. It works well for medium-size corpora. However, for large and small corpora, it can result in low precision and low recall. To deal with these two problems, two improvements based on part-whole and "no" patterns are introduced to increase the recall. Then feature ranking is applied to the extracted feature candidates to improve the precision of the top-ranked candidates. We rank feature candidates by feature importance which is determined by two factors: feature relevance and feature frequency. The problem is formulated as a bipartite graph and the well-known web page ranking algorithm HITS is used to find important features and rank them high. Experiments on diverse real-life datasets show promising results.

280 citations
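
To illustrate the ranking step described in the abstract above, here is a minimal sketch of HITS run on a bipartite graph. The abstract does not say what the two node sets are, so the sketch assumes one side holds hypothetical extraction patterns (acting as hubs) and the other holds feature candidates (acting as authorities). The function name, the edge list, and the fixed iteration count are illustrative choices, and the feature-frequency factor mentioned in the abstract is not modeled.

```python
import math


def hits_bipartite(edges, iterations=50):
    """Run HITS on a bipartite graph given as (hub_node, authority_node) edges.

    Returns two dicts: hub scores and authority scores, each L2-normalized.
    """
    if not edges:
        return {}, {}
    hubs = {hub: 1.0 for hub, _ in edges}
    auths = {auth: 1.0 for _, auth in edges}
    for _ in range(iterations):
        # Authority update: sum the hub scores of all nodes pointing at it.
        new_auths = dict.fromkeys(auths, 0.0)
        for hub, auth in edges:
            new_auths[auth] += hubs[hub]
        norm = math.sqrt(sum(v * v for v in new_auths.values()))
        auths = {auth: v / norm for auth, v in new_auths.items()}
        # Hub update: sum the authority scores of all nodes it points at.
        new_hubs = dict.fromkeys(hubs, 0.0)
        for hub, auth in edges:
            new_hubs[hub] += auths[auth]
        norm = math.sqrt(sum(v * v for v in new_hubs.values()))
        hubs = {hub: v / norm for hub, v in new_hubs.items()}
    return hubs, auths


# Hypothetical data: patterns p1..p3 each extracted some feature candidates.
edges = [
    ("p1", "screen"), ("p2", "screen"), ("p3", "screen"),
    ("p1", "battery"), ("p2", "battery"),
    ("p3", "thing"),
]
hub_scores, authority_scores = hits_bipartite(edges)
ranked = sorted(authority_scores, key=authority_scores.get, reverse=True)
print(ranked)
```

Candidates reached by many well-connected patterns receive high authority scores, so "screen" ranks above "battery", and the weakly supported "thing" falls to the bottom.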


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 83% related
Ontology (information science): 57K papers, 869.1K citations, 82% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 82% related
Feature learning: 15.5K papers, 684.7K citations, 81% related
Supervised learning: 20.8K papers, 710.5K citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    3,112
2022    6,541
2021    1,105
2020    1,082
2019    1,168