Topic
Pairwise comparison
About: Pairwise comparison is a research topic. Over its lifetime, 6804 publications have been published within this topic, receiving 174081 citations.
Papers
01 Jul 2012
TL;DR: This paper introduces a new matching-based pairwise distance measure for phylogenetic trees, proves that it induces a metric on the space of trees, shows how to compute it in low polynomial time, verifies through statistical testing that it is robust, and notes that it does not exhibit unexpected behavior under the inputs that cause problems for other measures.
Abstract: Comparing two or more phylogenetic trees is a fundamental task in computational biology. The simplest outcome of such a comparison is a pairwise measure of similarity, dissimilarity, or distance. A large number of such measures have been proposed, but so far all suffer from problems varying from computational cost to lack of robustness; many can be shown to behave unexpectedly under certain plausible inputs. For instance, the widely used Robinson-Foulds distance is poorly distributed and thus affords little discrimination, while also lacking robustness in the face of very small changes: reattaching a single leaf elsewhere in a tree of any size can instantly maximize the distance. In this paper, we introduce a new pairwise distance measure, based on matching, for phylogenetic trees. We prove that our measure induces a metric on the space of trees, show how to compute it in low polynomial time, verify through statistical testing that it is robust, and finally note that it does not exhibit unexpected behavior under the same inputs that cause problems with other measures. We also illustrate its usefulness in clustering trees, demonstrating significant improvements in the quality of hierarchical clustering as compared to the same collections of trees clustered using the Robinson-Foulds distance.
73 citations
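The brittleness of the Robinson-Foulds distance described in this abstract can be reproduced in a few lines. What follows is a minimal sketch, not the paper's matching-based measure: it assumes unrooted caterpillar (comb-shaped) trees described only by the order of their leaves along the spine, and the helper names `caterpillar_splits` and `robinson_foulds` are hypothetical. Each tree is reduced to its set of non-trivial splits, and the distance is the number of splits found in exactly one of the two trees.

```python
def caterpillar_splits(order):
    """Return the non-trivial splits of an unrooted caterpillar tree.

    `order` lists the leaves along the spine (an assumed toy encoding).
    Each split is stored as the side that does not contain the smallest
    leaf, so identical splits always compare equal.
    """
    leaves = frozenset(order)
    ref = min(leaves)
    splits = set()
    # Cutting the spine after the first k leaves (2 <= k <= n - 2)
    # corresponds to one internal edge, i.e. one non-trivial split.
    for k in range(2, len(order) - 1):
        side = frozenset(order[:k])
        if ref in side:
            side = leaves - side
        splits.add(side)
    return splits


def robinson_foulds(splits_a, splits_b):
    """Number of splits present in exactly one of the two trees."""
    return len(splits_a ^ splits_b)


base = caterpillar_splits([1, 2, 3, 4, 5, 6])
swapped = caterpillar_splits([1, 2, 4, 3, 5, 6])  # two neighbouring leaves exchanged
moved = caterpillar_splits([2, 3, 4, 5, 6, 1])    # leaf 1 reattached at the far end

# A binary tree on 6 leaves has 6 - 3 non-trivial splits, so the
# largest possible distance between two such trees is 2 * (6 - 3).
max_rf = 2 * (6 - 3)
print(robinson_foulds(base, swapped), robinson_foulds(base, moved), max_rf)
```

In this toy setting the script prints `2 6 6`: exchanging two neighbouring leaves changes a single split in each tree, whereas reattaching one leaf at the opposite end of the spine already reaches the maximum distance.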
25 Aug 2013
TL;DR: Analysis shows that pre-clustering and a combination of heterogeneous features yield the best trade-off between number of clusters and their quality, demonstrating that a simple combination based on pairwise maximization of similarity is as effective as a non-trivial optimization of parameters.
Abstract: The increasing pervasiveness of social media creates new opportunities to study human social behavior, while challenging our capability to analyze their massive data streams. One of the emerging tasks is to distinguish between different kinds of activities, for example engineered misinformation campaigns versus spontaneous communication. Such detection problems require a formal definition of meme, or unit of information that can spread from person to person through the social network. Once a meme is identified, supervised learning methods can be applied to classify different types of communication. The appropriate granularity of a meme, however, is hardly captured from existing entities such as tags and keywords. Here we present a framework for the novel task of detecting memes by clustering messages from large streams of social data. We evaluate various similarity measures that leverage content, metadata, network features, and their combinations. We also explore the idea of pre-clustering on the basis of existing entities. A systematic evaluation is carried out using a manually curated dataset as ground truth. Our analysis shows that pre-clustering and a combination of heterogeneous features yield the best trade-off between number of clusters and their quality, demonstrating that a simple combination based on pairwise maximization of similarity is as effective as a non-trivial optimization of parameters. Our approach is fully automatic, unsupervised, and scalable for real-time detection of memes in streaming data.
73 citations
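One reading of "a simple combination based on pairwise maximization of similarity" is that, for every pair of messages, the combined similarity is the largest value reported by any individual measure. The sketch below illustrates that reading only; the function name `combine_by_pairwise_max` and the two small matrices are invented for illustration and are not taken from the paper.

```python
def combine_by_pairwise_max(*similarity_matrices):
    """Combine square similarity matrices by taking, for each pair (i, j),
    the maximum similarity reported by any of the input measures."""
    n = len(similarity_matrices[0])
    return [
        [max(matrix[i][j] for matrix in similarity_matrices) for j in range(n)]
        for i in range(n)
    ]


# Toy similarities between three messages (made-up values).
content = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
network = [
    [1.0, 0.3, 0.1],
    [0.3, 1.0, 0.7],
    [0.1, 0.7, 1.0],
]

combined = combine_by_pairwise_max(content, network)
for row in combined:
    print(row)
```

The appeal of such a rule is that it has no weights to tune: a pair counts as similar as soon as any one feature type says so.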
TL;DR: This paper expands a recently proposed set of properties characterizing inconsistency indices by adding and justifying a new one, then checks whether four indices satisfy these properties; two fail in their present form, and an adjusted version of one index is proposed.
Abstract: Pairwise comparisons between alternatives are a well-established tool to decompose decision problems into smaller and more easily tractable sub-problems. However, due to our limited rationality, the subjective preferences expressed by decision makers over pairs of alternatives can hardly ever be consistent. Therefore, several inconsistency indices have been proposed in the literature to quantify the extent of the deviation from complete consistency. Only recently, a set of properties has been proposed to define a family of functions representing inconsistency indices. The scope of this paper is twofold. Firstly, it expands the set of properties by adding and justifying a new one. Secondly, it continues the study of inconsistency indices to check whether or not they satisfy the above-mentioned properties. Out of the four indices considered in this paper, in their present form, two fail to satisfy some properties. An adjusted version of one index is proposed so that it fulfills them.
73 citations
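To make the notion of an inconsistency index concrete, here is a minimal sketch of one well-known example, Koczkodaj's triad-based index. The abstract does not name the four indices it studies, so choosing this one is an assumption, and the function name and the two matrices are illustrative. A reciprocal matrix is fully consistent when a[i][k] equals a[i][j] * a[j][k] for every triple; the index reports the worst relative deviation from that condition.

```python
from itertools import combinations


def koczkodaj_index(a):
    """Koczkodaj's inconsistency index of a pairwise comparison matrix.

    For each triple i < j < k, compare the direct judgment a[i][k] with
    the indirect one a[i][j] * a[j][k], and return the worst deviation.
    """
    worst = 0.0
    for i, j, k in combinations(range(len(a)), 3):
        direct = a[i][k]
        indirect = a[i][j] * a[j][k]
        triad = min(abs(1 - direct / indirect), abs(1 - indirect / direct))
        worst = max(worst, triad)
    return worst


# Built from weights (4, 2, 1), so every entry equals w_i / w_j.
consistent = [
    [1, 2, 4],
    [1 / 2, 1, 2],
    [1 / 4, 1 / 2, 1],
]
# Same matrix, but the first alternative is judged only 2x the third.
inconsistent = [
    [1, 2, 2],
    [1 / 2, 1, 2],
    [1 / 2, 1 / 2, 1],
]

print(koczkodaj_index(consistent), koczkodaj_index(inconsistent))
```

The first matrix scores 0.0 and the second 0.5, matching the intuition that an index should vanish exactly when the judgments are consistent.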
28 Jul 2003
TL;DR: This work adapts query retrieval to rate the quality of document similarity measures and demonstrates that an information-theoretic measure for pairwise document similarity yields statistically significant improvements over other popular measures of similarity.
Abstract: Recent work has demonstrated that the assessment of pairwise object similarity can be approached in an axiomatic manner using information theory. We extend this concept specifically to document similarity and test the effectiveness of an information-theoretic measure for pairwise document similarity. We adapt query retrieval to rate the quality of document similarity measures and demonstrate that our proposed information-theoretic measure for document similarity yields statistically significant improvements over other popular measures of similarity.
72 citations
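The abstract does not give the measure itself. As a rough illustration of what an information-theoretic pairwise similarity can look like, the sketch below uses a Lin-style ratio: the information carried by the terms two documents share, divided by the information carried by both descriptions. Treating documents as term sets, estimating term probabilities from document frequency, and the names `term_information` and `lin_similarity` are all assumptions made here, not the paper's formulation.

```python
import math


def term_information(term, corpus):
    """Information content -log P(term), with P estimated as the fraction
    of corpus documents containing the term (term must occur at least once)."""
    doc_freq = sum(1 for doc in corpus if term in doc)
    return -math.log(doc_freq / len(corpus))


def lin_similarity(a, b, corpus):
    """Shared information of two term sets relative to their total information."""
    def info(terms):
        return sum(term_information(t, corpus) for t in terms)

    total = info(a) + info(b)
    if total == 0:
        return 0.0  # neither document carries any information
    return 2 * info(a & b) / total


# Toy corpus of four documents, each reduced to a set of terms.
d1 = {"pairwise", "similarity", "document"}
d2 = {"pairwise", "similarity", "tree"}
d3 = {"pairwise", "tree", "metric"}
d4 = {"pairwise", "ranking"}
corpus = [d1, d2, d3, d4]

print(lin_similarity(d1, d2, corpus), lin_similarity(d1, d3, corpus))
```

Because "pairwise" occurs in every document of this toy corpus, it carries no information, so d1 and d3 score zero even though they share a term; rare shared terms dominate the score instead.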