Topic

Pairwise comparison

About: Pairwise comparison is a research topic. Over its lifetime, 6,804 publications have appeared within this topic, receiving 174,081 citations.


Papers
Journal Article
01 Jul 2012
TL;DR: This paper introduces a new matching-based pairwise distance measure for phylogenetic trees, proves that it induces a metric on the space of trees, shows how to compute it in low polynomial time, verifies through statistical testing that it is robust, and notes that it does not exhibit unexpected behavior under the inputs that cause problems with other measures.
Abstract: Comparing two or more phylogenetic trees is a fundamental task in computational biology. The simplest outcome of such a comparison is a pairwise measure of similarity, dissimilarity, or distance. A large number of such measures have been proposed, but so far all suffer from problems varying from computational cost to lack of robustness; many can be shown to behave unexpectedly under certain plausible inputs. For instance, the widely used Robinson-Foulds distance is poorly distributed and thus affords little discrimination, while also lacking robustness in the face of very small changes: reattaching a single leaf elsewhere in a tree of any size can instantly maximize the distance. In this paper, we introduce a new pairwise distance measure, based on matching, for phylogenetic trees. We prove that our measure induces a metric on the space of trees, show how to compute it in low polynomial time, verify through statistical testing that it is robust, and finally note that it does not exhibit unexpected behavior under the same inputs that cause problems with other measures. We also illustrate its usefulness in clustering trees, demonstrating significant improvements in the quality of hierarchical clustering as compared to the same collections of trees clustered using the Robinson-Foulds distance.

73 citations
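
The robustness problem the abstract attributes to the Robinson-Foulds distance can be illustrated with a minimal sketch. It assumes trees are encoded as nested tuples of leaf labels (an encoding chosen here for illustration, not taken from the paper), and the helper names `leaf_set`, `splits`, and `rf_distance` are hypothetical. It implements the classical Robinson-Foulds count of non-shared bipartitions, not the paper's matching-based measure.

```python
def leaf_set(tree):
    """Collect the leaf labels of a tree given as nested tuples."""
    if not isinstance(tree, tuple):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree):
    """Return the non-trivial bipartitions of the tree, read as unrooted.

    Each bipartition is stored as the side that does NOT contain a fixed
    reference leaf, so the same split always has the same representation.
    """
    leaves = leaf_set(tree)
    ref = min(leaves)
    found = set()

    def walk(node):
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        # Skip trivial splits (a single leaf against everything else).
        if 1 < len(below) < len(leaves) - 1:
            found.add(below if ref not in below else leaves - below)
        return below

    walk(tree)
    return found


def rf_distance(t1, t2):
    """Robinson-Foulds distance: number of splits present in exactly one tree."""
    return len(splits(t1) ^ splits(t2))


# A caterpillar tree on six leaves, and the same tree after leaf 1 is
# detached and reattached next to leaf 6 (a single-leaf move).
original = (((((1, 2), 3), 4), 5), 6)
moved = (((((2, 3), 4), 5), 6), 1)

print(rf_distance(original, original))  # 0
print(rf_distance(original, moved))     # 6, the maximum 2 * (n - 3) for n = 6
```

In this example one leaf move makes every internal split differ, so the distance jumps straight to its maximum, which is the behavior the abstract describes.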

Proceedings Article
25 Aug 2013
TL;DR: Analysis shows that pre-clustering and a combination of heterogeneous features yield the best trade-off between number of clusters and their quality, demonstrating that a simple combination based on pairwise maximization of similarity is as effective as a non-trivial optimization of parameters.
Abstract: The increasing pervasiveness of social media creates new opportunities to study human social behavior, while challenging our capability to analyze their massive data streams. One of the emerging tasks is to distinguish between different kinds of activities, for example engineered misinformation campaigns versus spontaneous communication. Such detection problems require a formal definition of meme, or unit of information that can spread from person to person through the social network. Once a meme is identified, supervised learning methods can be applied to classify different types of communication. The appropriate granularity of a meme, however, is hardly captured from existing entities such as tags and keywords. Here we present a framework for the novel task of detecting memes by clustering messages from large streams of social data. We evaluate various similarity measures that leverage content, metadata, network features, and their combinations. We also explore the idea of pre-clustering on the basis of existing entities. A systematic evaluation is carried out using a manually curated dataset as ground truth. Our analysis shows that pre-clustering and a combination of heterogeneous features yield the best trade-off between number of clusters and their quality, demonstrating that a simple combination based on pairwise maximization of similarity is as effective as a non-trivial optimization of parameters. Our approach is fully automatic, unsupervised, and scalable for real-time detection of memes in streaming data.

73 citations
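
The "pairwise maximization of similarity" mentioned in the abstract can be read as scoring each pair of messages by the largest of several per-feature similarities. The sketch below follows that reading under stated assumptions: the message fields `words` and `tags`, the use of Jaccard similarity, and the function names are all hypothetical and are not taken from the paper.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_similarity(x, y, features):
    """Combine per-feature similarities by taking their maximum for the pair."""
    return max(jaccard(x[f], y[f]) for f in features)


# Two hypothetical messages with a content feature and a metadata feature.
m1 = {"words": ["vote", "today"], "tags": ["election"]}
m2 = {"words": ["go", "vote"], "tags": ["election"]}

score = combined_similarity(m1, m2, ["words", "tags"])
print(score)  # 1.0: the tag sets are identical, so they dominate the maximum
```

Taking the maximum needs no tuned weights, which is one way to understand why the abstract contrasts it with a non-trivial optimization of parameters.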

Journal Article
Matteo Brunelli
TL;DR: This article expands a recently proposed set of properties that characterize inconsistency indices by adding and justifying a new one, and checks whether four indices satisfy these properties; two fail in their present form, and an adjusted version of one is proposed.
Abstract: Pairwise comparisons between alternatives are a well-established tool to decompose decision problems into smaller and more easily tractable sub-problems. However, due to our limited rationality, the subjective preferences expressed by decision makers over pairs of alternatives can hardly ever be consistent. Therefore, several inconsistency indices have been proposed in the literature to quantify the extent of the deviation from complete consistency. Only recently, a set of properties has been proposed to define a family of functions representing inconsistency indices. The scope of this paper is twofold. Firstly, it expands the set of properties by adding and justifying a new one. Secondly, it continues the study of inconsistency indices to check whether or not they satisfy the above mentioned properties. Out of the four indices considered in this paper, in their present form, two fail to satisfy some properties. An adjusted version of one index is proposed so that it fulfills them.

73 citations
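
As a concrete picture of what an inconsistency index measures, the sketch below computes Koczkodaj's index for a reciprocal pairwise comparison matrix. This index is chosen here only as a well-known example; the abstract does not name the four indices it studies, and the function name `koczkodaj_index` is hypothetical.

```python
from itertools import combinations


def koczkodaj_index(a):
    """Koczkodaj's inconsistency index of a pairwise comparison matrix.

    For every triple i < j < k, consistency requires a[i][k] == a[i][j] * a[j][k].
    The index is the worst relative violation over all triples (0 means
    fully consistent).
    """
    n = len(a)
    worst = 0.0
    for i, j, k in combinations(range(n), 3):
        direct = a[i][k]
        indirect = a[i][j] * a[j][k]
        triad = min(abs(1 - direct / indirect), abs(1 - indirect / direct))
        worst = max(worst, triad)
    return worst


# Consistent matrix built from weights w = (1, 2, 4): a[i][j] = w[i] / w[j].
consistent = [
    [1.0, 0.5, 0.25],
    [2.0, 1.0, 0.5],
    [4.0, 2.0, 1.0],
]

# Inconsistent judgments: A is 2x B and B is 2x C, yet A is rated 8x C.
inconsistent = [
    [1.0, 2.0, 8.0],
    [0.5, 1.0, 2.0],
    [0.125, 0.5, 1.0],
]

print(koczkodaj_index(consistent))    # 0.0
print(koczkodaj_index(inconsistent))  # 0.5
```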

Proceedings Article
28 Jul 2003
TL;DR: This work adapts query retrieval to rate the quality of document similarity measures and demonstrates that an information-theoretic measure of pairwise document similarity yields statistically significant improvements over other popular similarity measures.
Abstract: Recent work has demonstrated that the assessment of pairwise object similarity can be approached in an axiomatic manner using information theory. We extend this concept specifically to document similarity and test the effectiveness of an information-theoretic measure for pairwise document similarity. We adapt query retrieval to rate the quality of document similarity measures and demonstrate that our proposed information-theoretic measure for document similarity yields statistically significant improvements over other popular measures of similarity.

72 citations


Network Information
Related Topics (5)
Markov chain: 51.9K papers, 1.3M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 76% related
Deep learning: 79.8K papers, 2.1M citations, 75% related
Optimization problem: 96.4K papers, 2.1M citations, 74% related
Robustness (computer science): 94.7K papers, 1.6M citations, 74% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2024    1
2023    1,305
2022    2,607
2021    581
2020    554
2019    520