scispace - formally typeset
Search or ask a question
Topic

Ranking (information retrieval)

About: Ranking (information retrieval) is a research topic. Over the lifetime, 21109 publications have been published within this topic receiving 435130 citations.


Papers
More filters
Journal ArticleDOI
TL;DR: Eight well-known similarity/distance metrics are compared on a large dataset of molecular fingerprints with sum of ranking differences (SRD) and ANOVA analysis and the Tanimoto index, Dice index, Cosine coefficient and Soergel distance were identified to be the best metrics for similarity calculations.
Abstract: Cheminformaticians are equipped with a very rich toolbox when carrying out molecular similarity calculations. A large number of molecular representations exist, and there are several methods (similarity and distance metrics) to quantify the similarity of molecular representations. In this work, eight well-known similarity/distance metrics are compared on a large dataset of molecular fingerprints with sum of ranking differences (SRD) and ANOVA analysis. The effects of molecular size, selection methods and data pretreatment methods on the outcome of the comparison are also assessed. A supplier database ( https://mcule.com/ ) was used as the source of compounds for the similarity calculations in this study. A large number of datasets, each consisting of one hundred compounds, were compiled, molecular fingerprints were generated and similarity values between a randomly chosen reference compound and the rest were calculated for each dataset. Similarity metrics were compared based on their ranking of the compounds within one experiment (one dataset) using sum of ranking differences (SRD), while the results of the entire set of experiments were summarized on box and whisker plots. Finally, the effects of various factors (data pretreatment, molecule size, selection method) were evaluated with analysis of variance (ANOVA). This study complements previous efforts to examine and rank various metrics for molecular similarity calculations. Here, however, an entirely general approach was taken to neglect any a priori knowledge on the compounds involved, as well as any bias introduced by examining only one or a few specific scenarios. The Tanimoto index, Dice index, Cosine coefficient and Soergel distance were identified to be the best (and in some sense equivalent) metrics for similarity calculations, i.e. these metrics could produce the rankings closest to the composite (average) ranking of the eight metrics. The similarity metrics derived from Euclidean and Manhattan distances are not recommended on their own, although their variability and diversity from other similarity metrics might be advantageous in certain cases (e.g. for data fusion). Conclusions are also drawn regarding the effects of molecule size, selection method and data pretreatment on the ranking behavior of the studied metrics.

770 citations

Proceedings Article
09 Dec 2003
TL;DR: A simple universal ranking algorithm for data lying in the Euclidean space, such as text or image data, to rank the data with respect to the intrinsic manifold structure collectively revealed by a great amount of data.
Abstract: The Google search engine has enjoyed huge success with its web page ranking algorithm, which exploits global, rather than local, hyperlink structure of the web using random walks. Here we propose a simple universal ranking algorithm for data lying in the Euclidean space, such as text or image data. The core idea of our method is to rank the data with respect to the intrinsic manifold structure collectively revealed by a great amount of data. Encouraging experimental results from synthetic, image, and text data illustrate the validity of our method.

767 citations

Journal ArticleDOI
TL;DR: A novel probabilistic retrieval model forms a basis to interpret the TF-IDF term weights as making relevance decisions, and it is shown that the term-frequency factor of the ranking formula can be rendered into different term- frequency factors of existing retrieval systems.
Abstract: A novel probabilistic retrieval model is presented It forms a basis to interpret the TF-IDF term weights as making relevance decisions It simulates the local relevance decision-making for every location of a document, and combines all of these “local” relevance decisions as the “document-wide” relevance decision for the document The significance of interpreting TF-IDF in this way is the potential to: (1) establish a unifying perspective about information retrieval as relevance decision-making; and (2) develop advanced TF-IDF-related term weights for future elaborate retrieval models Our novel retrieval model is simplified to a basic ranking formula that directly corresponds to the TF-IDF term weights In general, we show that the term-frequency factor of the ranking formula can be rendered into different term-frequency factors of existing retrieval systems In the basic ranking formula, the remaining quantity - log p(r¯|t ∈ d) is interpreted as the probability of randomly picking a nonrelevant usage (denoted by r¯) of term t Mathematically, we show that this quantity can be approximated by the inverse document-frequency (IDF) Empirically, we show that this quantity is related to IDF, using four reference TREC ad hoc retrieval data collections

752 citations

Patent
17 Aug 1995
TL;DR: In this paper, a negotiated matching system includes a plurality of remote terminals associated with respective potential counterparties, a communications network for permitting communication between the remote terminals, and a matching station.
Abstract: A negotiated matching system includes a plurality of remote terminals associated with respective potential counterparties, a communications network for permitting communication between the remote terminals, and a matching station. Each user enters trading information and ranking information into his or her remote terminal. The matching station then uses the trading and ranking information from each user to identify transactions between counterparties that are mutually acceptable based on the ranking information, thereby matching potential counterparties to a transaction. Once a match occurs, the potential counterparties transmit negotiating messages to negotiate some or all terms of the transaction. Thus, the negotiated matching system first matches potential counterparties who are acceptable to each other based on trading and ranking information, and then enables the two counterparties to negotiate and finalize the terms of a transaction.

742 citations

Journal ArticleDOI
TL;DR: This work almost settles a long-standing conjecture of Bang-Jensen and Thomassen and shows that unless NP⊆BPP, there is no polynomial time algorithm for the problem of minimum feedback arc set in tournaments.
Abstract: We address optimization problems in which we are given contradictory pieces of input information and the goal is to find a globally consistent solution that minimizes the extent of disagreement with the respective inputs. Specifically, the problems we address are rank aggregation, the feedback arc set problem on tournaments, and correlation and consensus clustering. We show that for all these problems (and various weighted versions of them), we can obtain improved approximation factors using essentially the same remarkably simple algorithm. Additionally, we almost settle a long-standing conjecture of Bang-Jensen and Thomassen and show that unless NP⊆BPP, there is no polynomial time algorithm for the problem of minimum feedback arc set in tournaments.

740 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
83% related
Ontology (information science)
57K papers, 869.1K citations
82% related
Graph (abstract data type)
69.9K papers, 1.2M citations
82% related
Feature learning
15.5K papers, 684.7K citations
81% related
Supervised learning
20.8K papers, 710.5K citations
81% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20241
20233,112
20226,541
20211,105
20201,082
20191,168