Topic

# Similarity measure

About: Similarity measure is a research topic. Over the lifetime, 11498 publications have been published within this topic receiving 268081 citations. The topic is also known as: similarity function & measure of similarity.

##### Papers published on a yearly basis

##### Papers

More filters

••

TL;DR: A measure is presented which indicates the similarity of clusters which are assumed to have a data density which is a decreasing function of distance from a vector characteristic of the cluster which can be used to infer the appropriateness of data partitions.

Abstract: A measure is presented which indicates the similarity of clusters which are assumed to have a data density which is a decreasing function of distance from a vector characteristic of the cluster. The measure can be used to infer the appropriateness of data partitions and can therefore be used to compare relative appropriateness of various divisions of the data. The measure does not depend on either the number of clusters analyzed nor the method of partitioning of the data and can be used to guide a cluster seeking algorithm.

6,757 citations

••

TL;DR: A new approach toward target representation and localization, the central component in visual tracking of nonrigid objects, is proposed, which employs a metric derived from the Bhattacharyya coefficient as similarity measure, and uses the mean shift procedure to perform the optimization.

Abstract: A new approach toward target representation and localization, the central component in visual tracking of nonrigid objects, is proposed. The feature histogram-based target representations are regularized by spatial masking with an isotropic kernel. The masking induces spatially-smooth similarity functions suitable for gradient-based optimization, hence, the target localization problem can be formulated using the basin of attraction of the local maxima. We employ a metric derived from the Bhattacharyya coefficient as similarity measure, and use the mean shift procedure to perform the optimization. In the presented tracking examples, the new method successfully coped with camera motion, partial occlusions, clutter, and target scale variations. Integration with motion filters and data association techniques is also discussed. We describe only a few of the potential applications: exploitation of background information, Kalman tracking using motion models, and face tracking.

4,996 citations

••

TL;DR: This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings and proposes three effective and efficient techniques for obtaining high-quality combiners (consensus functions).

Abstract: This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings. We first identify several application scenarios for the resultant 'knowledge reuse' framework that we call cluster ensembles. The cluster ensemble problem is then formalized as a combinatorial optimization problem in terms of shared mutual information. In addition to a direct maximization approach, we propose three effective and efficient techniques for obtaining high-quality combiners (consensus functions). The first combiner induces a similarity measure from the partitionings and then reclusters the objects. The second combiner is based on hypergraph partitioning. The third one collapses groups of clusters into meta-clusters which then compete for each object to determine the combined clustering. Due to the low computational costs of our techniques, it is quite feasible to use a supra-consensus function that evaluates all three approaches against the objective function and picks the best solution for a given situation. We evaluate the effectiveness of cluster ensembles in three qualitatively different application scenarios: (i) where the original clusters were formed based on non-identical sets of features, (ii) where the original clustering algorithms worked on non-identical sets of objects, and (iii) where a common data-set is used and the main purpose of combining multiple clusterings is to improve the quality and robustness of the solution. Promising results are obtained in all three situations for synthetic as well as real data-sets.

4,375 citations

••

Yahoo!

^{1}TL;DR: The standard similarity measure based on untransformed indices is shown to give misleading results, but transforming the indices or entropies to effective numbers of species produces a stable, easily interpreted, sensitive general similarity measure.

Abstract: Entropies such as the Shannon–Wiener and Gini–Simpson indices are not themselves diversities. Conversion of these to effective number of species is the key to a unified and intuitive interpretation of diversity. Effective numbers of species derived from standard diversity indices share a common set of intuitive mathematical properties and behave as one would expect of a diversity, while raw indices do not. Contrary to Keylock, the lack of concavity of effective numbers of species is irrelevant as long as they are used as transformations of concave alpha, beta, and gamma entropies. The practical importance of this transformation is demonstrated by applying it to a popular community similarity measure based on raw diversity indices or entropies. The standard similarity measure based on untransformed indices is shown to give misleading results, but transforming the indices or entropies to effective numbers of species produces a stable, easily interpreted, sensitive general similarity measure. General overlap measures derived from this transformed similarity measure yield the Jaccard index, Sorensen index, Horn index of overlap, and the Morisita–Horn index as special cases.

3,677 citations

••

25 Jun 2005TL;DR: A meta-algorithm is applied, based on a metric labeling formulation of the rating-inference problem, that alters a given n-ary classifier's output in an explicit attempt to ensure that similar items receive similar labels.

Abstract: We address the rating-inference problem, wherein rather than simply decide whether a review is "thumbs up" or "thumbs down", as in previous sentiment analysis work, one must determine an author's evaluation with respect to a multi-point scale (e.g., one to five "stars"). This task represents an interesting twist on standard multi-class text categorization because there are several different degrees of similarity between class labels; for example, "three stars" is intuitively closer to "four stars" than to "one star".We first evaluate human performance at the task. Then, we apply a meta-algorithm, based on a metric labeling formulation of the problem, that alters a given n-ary classifier's output in an explicit attempt to ensure that similar items receive similar labels. We show that the meta-algorithm can provide significant improvements over both multi-class and regression versions of SVMs when we employ a novel similarity measure appropriate to the problem.

2,544 citations