Topic

Ranking (information retrieval)

About: Ranking (information retrieval) is a research topic. Over its lifetime, 21,109 publications have been published within this topic, receiving 435,130 citations.


Papers
Patent
05 Nov 1993
TL;DR: In this paper, a procedure for determining text relevancy is proposed, which can be used to enhance the retrieval of text documents by search queries and can help a user intelligently and rapidly locate information found in large textual databases.
Abstract: This is a procedure for determining text relevancy and can be used to enhance the retrieval of text documents by search queries. This system helps a user intelligently and rapidly locate information found in large textual databases. A first embodiment determines the common meanings between each word in the query and each word in the document. Then an adjustment is made for words in the query that are not in the documents. Further, weights are calculated for both the semantic components in the query and the semantic components in the documents. These weights are multiplied together, and their products are subsequently added to one another to determine a real value number (similarity coefficient) for each document. Finally, the documents are sorted in sequential order according to their real value number from largest to smallest value. Another embodiment is for routing documents to topics/headings (sometimes referred to as filtering). Here, the importance of each word in both topics and documents is calculated. Then, the real value number (similarity coefficient) for each document is determined. Then each document is routed, one at a time, to one or more topics according to its respective real value number. Finally, once the documents are located with their topics, the documents can be sorted. This system can be used to search and route all kinds of document collections, such as collections of legal documents, medical documents, news stories, and patents.

215 citations
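
The scoring step described in this abstract (multiply query and document weights, sum the products, sort from largest to smallest) can be sketched roughly as follows. This is only an illustration under assumptions: the function names, the dictionary representation of weights, and the sample weights are hypothetical, and the patent's actual method for deriving semantic weights is not reproduced here.

```python
def similarity(query_weights, doc_weights):
    """Sum the products of weights for components present in both query and document."""
    return sum(
        weight * doc_weights[component]
        for component, weight in query_weights.items()
        if component in doc_weights
    )


def rank_documents(query_weights, docs):
    """Return (doc_id, score) pairs sorted from largest to smallest score."""
    scored = [(doc_id, similarity(query_weights, weights)) for doc_id, weights in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical weights, chosen only to make the arithmetic easy to follow.
query = {"court": 2.0, "appeal": 1.0}
docs = {
    "d1": {"court": 1.0, "ruling": 3.0},   # 2.0 * 1.0 = 2.0
    "d2": {"court": 2.0, "appeal": 2.0},   # 2.0 * 2.0 + 1.0 * 2.0 = 6.0
    "d3": {"patient": 1.0},                # no shared components, score 0
}
ranking = rank_documents(query, docs)
```

Here `ranking` lists `d2` first, then `d1`, then `d3`.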

Proceedings ArticleDOI
29 Mar 2009
TL;DR: This work is able to prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query, and provides efficient solutions to compute this ranking across the major models of uncertain data, such as attribute-level and tuple-level uncertainty.
Abstract: When dealing with massive quantities of data, top-k queries are a powerful technique for returning only the k most relevant tuples for inspection, based on a scoring function. The problem of efficiently answering such ranking queries has been studied and analyzed extensively within traditional database settings. The importance of the top-k is perhaps even greater in probabilistic databases, where a relation can encode exponentially many possible worlds. There have been several recent attempts to propose definitions and algorithms for ranking queries over probabilistic data. However, these all lack many of the intuitive properties of a top-k over deterministic data. Specifically, we define a number of fundamental properties, including exact-k, containment, unique-rank, value-invariance, and stability, which are all satisfied by ranking queries on certain data. We argue that all these conditions should also be fulfilled by any reasonable definition for ranking uncertain data. Unfortunately, none of the existing definitions is able to achieve this. To remedy this shortcoming, this work proposes an intuitive new approach of expected rank. This uses the well-founded notion of the expected rank of each tuple across all possible worlds as the basis of the ranking. We are able to prove that, in contrast to all existing approaches, the expected rank satisfies all the required properties for a ranking query. We provide efficient solutions to compute this ranking across the major models of uncertain data, such as attribute-level and tuple-level uncertainty. For an uncertain relation of N tuples, the processing cost is O(N log N), no worse than simply sorting the relation. In settings where there is a high cost for generating each tuple in turn, we provide pruning techniques based on probabilistic tail bounds that can terminate the search early and guarantee that the top-k has been found. Finally, a comprehensive experimental study confirms the effectiveness of our approach.

214 citations
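
As a rough illustration of the expected-rank idea in this abstract, the sketch below handles a simplified attribute-level setting in which each tuple's score is an independent discrete distribution. It assumes a tuple's rank in a possible world is the number of other tuples with a strictly higher score, so by linearity of expectation the expected rank is the sum of pairwise probabilities. This is a quadratic brute-force sketch, not the O(N log N) algorithm the abstract refers to, and all names and sample data are hypothetical.

```python
def prob_greater(dist_a, dist_b):
    """Pr[A > B] for independent discrete scores given as {score: probability}."""
    return sum(
        pa * pb
        for a, pa in dist_a.items()
        for b, pb in dist_b.items()
        if a > b
    )


def expected_ranks(relation):
    """Expected number of other tuples scoring strictly higher, per tuple."""
    return {
        tid: sum(
            prob_greater(other, dist)
            for other_id, other in relation.items()
            if other_id != tid
        )
        for tid, dist in relation.items()
    }


def top_k(relation, k):
    """Return the k tuple ids with the smallest expected rank."""
    ranks = expected_ranks(relation)
    return sorted(ranks, key=ranks.get)[:k]


# Hypothetical uncertain relation: tuple id -> score distribution.
relation = {
    "t1": {100: 0.5, 60: 0.5},
    "t2": {80: 1.0},
    "t3": {90: 0.5, 40: 0.5},
}
ranks = expected_ranks(relation)
best_two = top_k(relation, 2)
```

With these distributions the expected ranks work out to 0.75 for `t1`, 1.0 for `t2`, and 1.25 for `t3`, so `best_two` is `["t1", "t2"]`.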

Journal ArticleDOI
TL;DR: An algorithm is presented which not only analyzes the overall sentiment of a document/review, but also identifies the semantic orientation of specific components of the review that lead to a particular sentiment.

214 citations

Proceedings ArticleDOI
23 May 2006
TL;DR: This work shows that it can significantly outperform PageRank using features that are independent of the link structure of the Web, and uses RankNet, a ranking machine learning algorithm, to combine these and other static features based on anchor text and domain characteristics.
Abstract: Since the publication of Brin and Page's paper on PageRank, many in the Web community have depended on PageRank for the static (query-independent) ordering of Web pages. We show that we can significantly outperform PageRank using features that are independent of the link structure of the Web. We gain a further boost in accuracy by using data on the frequency at which users visit Web pages. We use RankNet, a ranking machine learning algorithm, to combine these and other static features based on anchor text and domain characteristics. The resulting model achieves a static ranking pairwise accuracy of 67.3% (vs. 56.7% for PageRank or 50% for random).

213 citations
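
The abstract reports results as pairwise accuracy without defining it. One common reading, assumed here, is the fraction of item pairs with different relevance labels that the model's scores order the same way as the labels. The sketch below follows that reading; the function name and sample data are hypothetical, and pairs with tied scores are counted as incorrect.

```python
from itertools import combinations


def pairwise_accuracy(scores, labels):
    """Fraction of differently-labeled pairs whose score order matches the label order."""
    correct = 0
    total = 0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue  # pairs with equal labels carry no ordering information
        total += 1
        # Same sign of both differences means the pair is ordered correctly.
        if (scores[i] - scores[j]) * (labels[i] - labels[j]) > 0:
            correct += 1
    return correct / total if total else 0.0


# Hypothetical static-rank scores and human quality labels for four pages.
labels = [3, 2, 1, 0]
scores = [0.9, 0.4, 0.5, 0.1]
accuracy = pairwise_accuracy(scores, labels)
```

In this example five of the six pairs are ordered correctly (only the second and third pages are swapped), so `accuracy` is 5/6. Under this definition a scorer that orders pairs at random would be expected to land near 50%, which matches the random baseline quoted in the abstract.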

Patent
16 Jul 1998
TL;DR: A system and method for relative ranking and contextual summarization of search hits from multiple distributed, heterogeneous information resources based upon the original content of each hit is disclosed in this article.
Abstract: A system and method for relative ranking and contextual summarization of search hits from multiple distributed, heterogeneous information resources based upon the original content of each hit is disclosed. In particular, the system and method of the present invention improve upon metasearch engine techniques by downloading the original documents (text or multimedia) identified by standard search engines as relevant and using the original content of each “hit” to re-rank them relative to each other according to the original query pattern for the search, providing a uniform ranking methodology for the user. The present invention is also directed to an improved summarization process where the downloaded documents are re-summarized relative to each other according to the original query pattern for the search, providing a uniform summarization methodology for the user.

213 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 83% related
Ontology (information science): 57K papers, 869.1K citations, 82% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 82% related
Feature learning: 15.5K papers, 684.7K citations, 81% related
Supervised learning: 20.8K papers, 710.5K citations, 81% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    3,112
2022    6,541
2021    1,105
2020    1,082
2019    1,168