
Ranking (information retrieval)

About: Ranking (information retrieval) is a research topic. Over its lifetime, 21,109 publications have been published within this topic, receiving 435,130 citations.


Papers
Journal Article
TL;DR: An examination of the effects of query operators on the performance of three major Web search engines found that the use of most query operators had no significant effect on coverage, relative precision, or ranking, although the effect varied depending on the search engine.
Abstract: Research has reported that about 10% of Web searchers utilize advanced query operators, with the other 90% using extremely simple queries. It is often assumed that the use of query operators, such as Boolean operators and phrase searching, improves the effectiveness of Web searching. We test this assumption by examining the effects of query operators on the performance of three major Web search engines. We selected one hundred queries from the transaction log of a Web search service. Each of these original queries contained query operators such as AND, OR, MUST APPEAR (+), or PHRASE (" "). We then removed the operators from these one hundred advanced queries. We submitted both the original and modified queries to three major Web search engines; a total of 600 queries were submitted and 5,748 documents evaluated. We compared the results from the original queries with the operators to the results from the modified queries without the operators. We examined the results for changes in coverage, relative precision, and ranking of relevant documents. The use of most query operators had no significant effect on coverage, relative precision, or ranking, although the effect varied depending on the search engine. We discuss implications for the effectiveness of searching techniques as currently taught, for future information retrieval system design, and for future research.
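To make the study's manipulation concrete (removing the operators from an advanced query and comparing what comes back), here is a minimal sketch; it is not the authors' procedure. The helper names `strip_operators` and `matches` and the matching semantics are illustrative assumptions: quoted phrases and `+` terms are treated as required, the remaining terms are OR-ed, and the AND/OR keywords are simply ignored. Real engines interpret these operators in their own ways.

```python
import re

PHRASE = re.compile(r'"([^"]+)"')

def strip_operators(query: str) -> str:
    """Drop quotes, leading '+' signs and AND/OR keywords, keeping only the words."""
    words = query.replace('"', " ").split()
    words = [w.lstrip("+") for w in words if w not in ("AND", "OR")]
    return " ".join(w for w in words if w)

def matches(doc: str, query: str) -> bool:
    """Toy semantics (assumed): phrases and +terms are required, other terms are OR-ed."""
    text = doc.lower()
    tokens = set(re.findall(r"\w+", text))
    phrases = [p.lower() for p in PHRASE.findall(query)]
    rest = PHRASE.sub(" ", query).split()
    must = [w[1:].lower() for w in rest if w.startswith("+") and len(w) > 1]
    optional = [w.lower() for w in rest if not w.startswith("+") and w not in ("AND", "OR")]
    if any(p not in text for p in phrases) or any(m not in tokens for m in must):
        return False
    return not optional or any(o in tokens for o in optional)

docs = [
    "the big cat jaguar lives in the rainforest",
    "jaguar car dealership prices",
    "a big dog and a cat",
]
original = '+jaguar "big cat"'
modified = strip_operators(original)  # 'jaguar big cat'
print([i for i, d in enumerate(docs) if matches(d, original)])  # [0]
print([i for i, d in enumerate(docs) if matches(d, modified)])  # [0, 1, 2]
```

Under these toy semantics the operator query is stricter than its stripped form; whether that translates into better precision on a real engine is exactly what the study measured empirically.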

129 citations

Proceedings Article
31 Aug 2010
TL;DR: This paper describes several new strategies for ranking microblogs in a real-time search engine and develops a framework for obtaining ground-truth validation data, along with evaluation measures to assess the accuracy of the proposed ranking strategies.
Abstract: Ranking microblogs, such as tweets, as search results for a query is challenging, among other things because of the sheer number of microblogs being generated in real time, as well as the short length of each individual microblog. In this paper, we describe several new strategies for ranking microblogs in a real-time search engine. Evaluating these ranking strategies is non-trivial due to the lack of a publicly available ground truth validation dataset. We have therefore developed a framework to obtain such validation data, as well as evaluation measures to assess the accuracy of the proposed ranking strategies. Our experiments demonstrate that it is beneficial for microblog search engines to take into account social network properties of the authors of microblogs in addition to properties of the microblog itself.
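The abstract does not spell out the ranking strategies, so the following is only a hedged sketch of the general idea of mixing properties of the post with a social-network property of its author. The features (term overlap, log-scaled follower count, exponential recency decay), the `Microblog` type, the function names and the weights are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Microblog:
    text: str
    author_followers: int  # assumed social-network feature of the author
    age_minutes: float

def content_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the post (simple term overlap)."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

def author_score(followers: int, max_followers: int) -> float:
    """Log-scaled follower count, normalised to [0, 1] within the candidate set."""
    if max_followers <= 0:
        return 0.0
    return math.log1p(followers) / math.log1p(max_followers)

def rank_microblogs(query, posts, w_content=0.6, w_author=0.3, w_recency=0.1, half_life=60.0):
    """Linear combination of content, author and recency scores (weights are arbitrary)."""
    max_f = max((p.author_followers for p in posts), default=0)

    def score(p: Microblog) -> float:
        recency = 0.5 ** (p.age_minutes / half_life)  # halves every `half_life` minutes
        return (w_content * content_score(query, p.text)
                + w_author * author_score(p.author_followers, max_f)
                + w_recency * recency)

    return sorted(posts, key=score, reverse=True)

posts = [
    Microblog("earthquake in tokyo now", 10, 5),
    Microblog("earthquake in tokyo now", 5000, 5),
    Microblog("lunch was great", 100000, 1),
]
print([p.author_followers for p in rank_microblogs("earthquake tokyo", posts)])
# [5000, 10, 100000]
```

With identical text and age, the post from the better-connected author ranks first, while the off-topic post from the most-followed account still ranks last because the content weight dominates.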

129 citations

Journal Article
TL;DR: This paper proposes a novel ranking function discovery framework based on Genetic Programming and shows through various experiments how this new framework helps automate the ranking function design/discovery process.
Abstract: Ranking functions play a substantial role in the performance of information retrieval (IR) systems and search engines. Although there are many ranking functions available in the IR literature, various empirical evaluation studies show that ranking functions do not perform consistently well across different contexts (queries, collections, users). Moreover, it is often difficult and very expensive for human beings to design optimal ranking functions that work well in all these contexts. In this paper, we propose a novel ranking function discovery framework based on Genetic Programming and show through various experiments how this new framework helps automate the ranking function design/discovery process.
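The following toy genetic-programming loop illustrates what "discovering" a ranking function can look like; it is not the authors' framework. Assumptions: each candidate document carries three hypothetical features (`tf`, `idf`, `dl` for document length), candidate ranking functions are expression trees over those features with `+`, `-`, `*` and protected `/`, fitness is precision@1 over training queries, and evolution uses elitism, subtree mutation and a deliberately simplified crossover.

```python
import operator
import random

TERMINALS = ["tf", "idf", "dl"]  # assumed per-document features
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # protected division
}
MAX_DEPTH = 5

def random_tree(rng, depth=3):
    """Grow a random expression tree: a terminal name or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(OPS))
    return (op, random_tree(rng, depth - 1), random_tree(rng, depth - 1))

def depth(tree):
    return 0 if isinstance(tree, str) else 1 + max(depth(tree[1]), depth(tree[2]))

def evaluate(tree, feats):
    if isinstance(tree, str):
        return feats[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, feats), evaluate(right, feats))

def fitness(tree, queries):
    """Precision@1: share of queries whose top-scored document is relevant."""
    hits = 0
    for docs in queries:
        top = max(docs, key=lambda d: evaluate(tree, d["feats"]))
        hits += top["relevant"]
    return hits / len(queries)

def mutate(rng, tree):
    """Replace a randomly chosen subtree with a fresh random one."""
    if isinstance(tree, str) or rng.random() < 0.3:
        return random_tree(rng, 2)
    op, left, right = tree
    if rng.random() < 0.5:
        return (op, mutate(rng, left), right)
    return (op, left, mutate(rng, right))

def crossover(rng, a, b):
    """Simplified crossover: keep a's root and left subtree, take b's right subtree."""
    if isinstance(a, str) or isinstance(b, str):
        return a if rng.random() < 0.5 else b
    return (a[0], a[1], b[2])

def evolve(queries, pop_size=30, generations=15, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]
    history = []  # best fitness per generation
    for _ in range(generations):
        ranked = sorted(pop, key=lambda t: fitness(t, queries), reverse=True)
        history.append(fitness(ranked[0], queries))
        elite = ranked[: max(2, pop_size // 5)]  # elitism: best trees survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = mutate(rng, crossover(rng, a, b))
            children.append(child if depth(child) <= MAX_DEPTH else random_tree(rng))
        pop = elite + children
    best = max(pop, key=lambda t: fitness(t, queries))
    return best, history

train = [
    [{"feats": {"tf": 3, "idf": 2.0, "dl": 100}, "relevant": 1},
     {"feats": {"tf": 1, "idf": 2.0, "dl": 80}, "relevant": 0}],
    [{"feats": {"tf": 0, "idf": 1.5, "dl": 50}, "relevant": 0},
     {"feats": {"tf": 4, "idf": 1.5, "dl": 120}, "relevant": 1}],
]
best, history = evolve(train)
print(best, history)
```

Because the elite is copied into every new generation, the best training fitness never decreases. A realistic setup would use many more queries, a graded metric such as MAP or NDCG, and a held-out test set to check that the discovered function generalizes.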

129 citations

Journal Article
TL;DR: Relemed increases specificity and precision of retrieval by searching for query words within sentences rather than the whole article, and uses sentence-level concurrence as a statistical surrogate for the existence of a relationship between the words.
Abstract: Receiving extraneous articles in response to a query submitted to MEDLINE/PubMed is common. When a multi-word query is submitted (as most queries are), the presence of all query words within an article may be a necessary but not sufficient condition for retrieving relevant articles. Ideally, a relationship between the query words in the article is also required. We propose that if two words occur within an article, the probability that a relation between them is explained is higher when the words occur within adjacent sentences than when they occur in remote sentences. Therefore, sentence-level concurrence can be used as a surrogate for the existence of a relationship between the words. To avoid irrelevant articles, one solution would be to increase search specificity. Another solution is to estimate a relevance score by which to sort the retrieved articles. However, among the more than 30 retrieval services available for MEDLINE, only a few estimate a relevance score, and none detects and incorporates the relation between the query words as part of that score. We have developed "Relemed", a search engine for MEDLINE. Relemed increases specificity and precision of retrieval by searching for query words within sentences rather than the whole article. It uses sentence-level concurrence as a statistical surrogate for the existence of a relationship between the words. It also estimates a relevance score and sorts the results on this basis, thus shifting irrelevant articles lower down the list. In two case studies, we demonstrate that the most relevant articles appear at the top of the Relemed results, while this is not necessarily the case with a PubMed search. We have also shown that a Relemed search includes not only all the articles retrieved by PubMed, but potentially additional relevant articles, due to the extended 'automatic term mapping' and text-word searching features implemented in Relemed.
By using sentence-level matching, Relemed can deliver higher specificity, thus eliminating more false-positive articles. By introducing an appropriate relevance metric, Relemed lists first the most relevant articles on which the user wishes to focus. Relemed also reduces the amount of displayed text, and hence the time spent scanning the articles.
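The abstract does not give Relemed's actual relevance formula, so the following is only a hypothetical sketch of sentence-level concurrence scoring. Assumptions: sentences are split naively on `.`, `!` or `?` followed by whitespace; an article missing any query word scores 0; each pair of query words contributes 1.0 if the words share a sentence, 1/(1+d) if they are `d` sentences apart within a `window`, and 0 otherwise. The function names are illustrative.

```python
import re
from itertools import combinations

def split_sentences(text):
    """Naive splitter on ., ! or ? followed by whitespace (assumption)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def cooccurrence_score(query, text, window=1):
    """Sum over query-word pairs of a distance-decayed sentence co-occurrence score."""
    terms = sorted({t.lower() for t in query.split()})
    sents = [set(re.findall(r"\w+", s.lower())) for s in split_sentences(text)]
    all_words = set().union(*sents)
    if not all(t in all_words for t in terms):
        return 0.0  # every query word must be present (necessary, not sufficient)
    score = 0.0
    for a, b in combinations(terms, 2):
        best = 0.0
        for i, sent in enumerate(sents):
            if a not in sent:
                continue
            for j in range(max(0, i - window), min(len(sents), i + window + 1)):
                if b in sents[j]:
                    best = max(best, 1.0 / (1 + abs(i - j)))
        score += best
    return score

def rank_articles(query, articles, window=1):
    """Sort article ids by descending co-occurrence score."""
    return sorted(articles, key=lambda a: cooccurrence_score(query, articles[a], window),
                  reverse=True)

articles = {
    "far": "Aspirin is common. It is cheap. Many people take it. Stroke appears late.",
    "adjacent": "Aspirin is common. Stroke is a disease.",
    "same": "Aspirin reduces stroke risk. Other text follows.",
}
print(rank_articles("aspirin stroke", articles))  # ['same', 'adjacent', 'far']
```

All three articles contain both query words, so a plain Boolean AND query cannot distinguish them; the sentence-level score pushes the article that actually relates the two words to the top. Single-word queries have no pairs and therefore score 0 under this sketch.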

129 citations

Patent
Wacholder, Faye
26 Dec 2000
TL;DR: A "domain-general" method for representing the sense of a document includes the steps of extracting a list of simplex noun phrases representing candidate significant topics in the document, clustering the noun phrases by head, and ranking the noun phrases according to a significance measure.
Abstract: A "domain-general" method for representing the "sense" of a document includes the steps of extracting a list of simplex noun phrases representing candidate significant topics in the document, clustering the simplex noun phrases by head, and ranking the simplex noun phrases according to a significance measure to indicate the relative importance of the simplex noun phrases as significant topics of the document. Furthermore, the output can be filtered in a variety of ways, both for automatic processing and for presentation to users.
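A minimal sketch of the cluster-by-head and rank steps follows, under stated assumptions: the simplex noun phrases are taken as already extracted (a part-of-speech chunker would normally do that step), the head of a phrase is assumed to be its last word, and head frequency is used as a stand-in significance measure. The patent's own significance measure is not given here, so this is illustrative only; `head` and `rank_topics` are hypothetical names.

```python
from collections import Counter, defaultdict

def head(phrase: str) -> str:
    """Assumed heuristic: the head of an English simplex NP is its last word."""
    return phrase.lower().split()[-1]

def rank_topics(noun_phrases):
    """Cluster simplex NPs by head and rank clusters by head frequency (assumed measure)."""
    clusters = defaultdict(Counter)
    for phrase in noun_phrases:
        clusters[head(phrase)][phrase.lower()] += 1
    head_freq = {h: sum(c.values()) for h, c in clusters.items()}
    ranked_heads = sorted(clusters, key=lambda h: (-head_freq[h], h))
    # Each entry: (head, total occurrences, member phrases by frequency)
    return [(h, head_freq[h], [p for p, _ in clusters[h].most_common()]) for h in ranked_heads]

nps = ["search engine", "ranking function", "Web search engine", "engine", "query",
       "ranking function"]
for h, freq, members in rank_topics(nps):
    print(h, freq, members)
```

Grouping "search engine", "Web search engine" and "engine" under one head lets their counts reinforce each other, which is why "engine" outranks "ranking function" even though no single phrase in its cluster occurs more than once. The filtering step mentioned in the abstract could then, for example, keep only the top few clusters.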

129 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations (83% related)
Ontology (information science): 57K papers, 869.1K citations (82% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (82% related)
Feature learning: 15.5K papers, 684.7K citations (81% related)
Supervised learning: 20.8K papers, 710.5K citations (81% related)
Performance Metrics
Number of papers in the topic in previous years

Year    Papers
2024    1
2023    3,112
2022    6,541
2021    1,105
2020    1,082
2019    1,168