Journal ISSN: 0176-4268

Journal of Classification 

Springer Science+Business Media
About: Journal of Classification is an academic journal published by Springer Science+Business Media. The journal publishes primarily in the areas of cluster analysis and hierarchical clustering. It has the ISSN identifier 0176-4268. Over its lifetime, 707 publications have been published, receiving 30,409 citations. The journal is also known as: Journal of Classification (Print).


Papers
Journal Article
TL;DR: The survey work and case studies will be useful for all those involved in developing software for data analysis using Ward’s hierarchical clustering method.
Abstract: The Ward error sum of squares hierarchical clustering method has been very widely used since its first description by Ward in a 1963 publication. It has also been generalized in various ways. Two algorithms are found in the literature and software, both announcing that they implement the Ward clustering method. When applied to the same distance matrix, they produce different results. One algorithm preserves Ward's criterion, the other does not. Our survey work and case studies will be useful for all those involved in developing software for data analysis using Ward's hierarchical clustering method.
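The discrepancy described above can be observed with any off-the-shelf implementation. Below is a minimal sketch, assuming SciPy's `linkage` routine rather than either of the paper's two algorithms: feeding the same condensed distance matrix in directly versus squaring it first yields different merge heights and, in general, different trees, which is the kind of implementation difference the survey examines.

```python
# Minimal sketch (not the paper's own code): run SciPy's Ward linkage on the
# same condensed distance matrix, once as-is and once squared first, and
# compare the resulting merge heights.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))                # 20 points in 3 dimensions
d = pdist(X)                                # condensed Euclidean distances

Z_plain = linkage(d, method="ward")         # distances used directly
Z_squared = linkage(d ** 2, method="ward")  # distances squared before clustering

print("merge heights (plain):  ", np.round(Z_plain[:, 2], 2))
print("merge heights (squared):", np.round(Z_squared[:, 2], 2))
```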

2,331 citations

Journal Article
TL;DR: In this article, an entropy criterion is proposed to estimate the number of clusters arising from a mixture model, which is derived from a relation linking the likelihood and the classification likelihood of a mixture.
Abstract: In this paper, we consider an entropy criterion to estimate the number of clusters arising from a mixture model. This criterion is derived from a relation linking the likelihood and the classification likelihood of a mixture. Its performance is investigated through Monte Carlo experiments, and it shows favorable results compared to other classical criteria.
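As a rough illustration of how an entropy criterion of this kind can be computed in practice, here is a hedged sketch using scikit-learn's GaussianMixture. The formula NEC(K) = E(K) / (log L(K) − log L(1)), with E(K) the entropy of the posterior membership probabilities, is an assumption about the criterion's form and is not taken from the paper's text.

```python
# Sketch of an entropy criterion for choosing the number of mixture components.
# Assumption (not quoted from the paper): NEC(K) = E(K) / (logL(K) - logL(1)),
# where E(K) is the entropy of the posterior membership probabilities.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(2.0, 1.0, size=(100, 2))])   # two well-separated groups

def total_loglik(gm, X):
    return gm.score(X) * len(X)   # score() is the mean per-sample log-likelihood

L1 = total_loglik(GaussianMixture(n_components=1, random_state=0).fit(X), X)

for K in range(2, 6):
    gm = GaussianMixture(n_components=K, random_state=0).fit(X)
    tau = gm.predict_proba(X)                                # posterior probabilities
    entropy = -np.sum(tau * np.log(np.clip(tau, 1e-12, None)))
    nec = entropy / (total_loglik(gm, X) - L1)
    print(f"K={K}: NEC={nec:.4f}")   # smaller values favour that K
```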

1,689 citations

Journal Article
TL;DR: In this paper, the authors present properties of dissimilarity coefficients with respect to their metric and Euclidean status, and the response to different types of data is investigated, leading to guidance on the choice of an appropriate coefficient.
Abstract: We assemble here properties of certain dissimilarity coefficients and are specially concerned with their metric and Euclidean status. No attempt is made to be exhaustive as far as coefficients are concerned, but certain mathematical results that we have found useful are presented and should help establish similar properties for other coefficients. The response to different types of data is investigated, leading to guidance on the choice of an appropriate coefficient.
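The metric and Euclidean status of a given dissimilarity matrix can be checked directly. The sketch below uses the standard criteria (triangle inequality for metric status, positive semi-definiteness of the doubly centred matrix −½ J D² J for Euclidean embeddability); it is an illustration of those criteria, not code from the paper.

```python
# Sketch: check whether a dissimilarity matrix D is metric (triangle inequality)
# and Euclidean-embeddable (-0.5 * J * D^2 * J positive semi-definite).
import numpy as np

def is_metric(D, tol=1e-10):
    n = len(D)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if D[i, j] > D[i, k] + D[k, j] + tol:
                    return False
    return True

def is_euclidean(D, tol=1e-10):
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n   # centring matrix
    B = -0.5 * J @ (D ** 2) @ J           # double centring of squared dissimilarities
    return np.all(np.linalg.eigvalsh(B) >= -tol)

# Example: pairwise Euclidean distances are both metric and Euclidean.
X = np.random.default_rng(2).normal(size=(6, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(is_metric(D), is_euclidean(D))      # expected: True True
```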

893 citations

Journal Article
TL;DR: A centroid SAHN clustering algorithm is described that requires O(n²) time, in the worst case, for fixed k and for a family of dissimilarity measures including the Manhattan, Euclidean, Chebychev, and all other Minkowski metrics.
Abstract: Whenever n objects are characterized by a matrix of pairwise dissimilarities, they may be clustered by any of a number of sequential, agglomerative, hierarchical, nonoverlapping (SAHN) clustering methods. These SAHN clustering methods are defined by a paradigmatic algorithm that usually requires O(n³) time, in the worst case, to cluster the objects. An improved algorithm (Anderberg 1973), while still requiring O(n³) worst-case time, can reasonably be expected to exhibit O(n²) expected behavior. By contrast, we describe a SAHN clustering algorithm that requires O(n² log n) time in the worst case. When SAHN clustering methods exhibit reasonable space distortion properties, further improvements are possible. We adapt a SAHN clustering algorithm, based on the efficient construction of nearest neighbor chains, to obtain a reasonably general SAHN clustering algorithm that requires in the worst case O(n²) time and space. Whenever n objects are characterized by k-tuples of real numbers, they may be clustered by any of a family of centroid SAHN clustering methods. These methods are based on a geometric model in which clusters are represented by points in k-dimensional real space and points being agglomerated are replaced by a single (centroid) point. For this model, we have solved a class of special packing problems involving point-symmetric convex objects and have exploited it to design an efficient centroid clustering algorithm. Specifically, we describe a centroid SAHN clustering algorithm that requires O(n²) time, in the worst case, for fixed k and for a family of dissimilarity measures including the Manhattan, Euclidean, Chebychev and all other Minkowski metrics.
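For flavour, here is a compact sketch of the nearest-neighbour-chain idea behind the O(n²) approach, written for average linkage rather than the centroid methods the paper analyses; it illustrates the principle (merge whenever the top of the chain and its predecessor are reciprocal nearest neighbours) and is not the authors' algorithm.

```python
# Sketch of the nearest-neighbour-chain idea (average linkage shown here;
# not the paper's centroid algorithm). Clusters are merged whenever the top
# two elements of the chain are reciprocal nearest neighbours.
import numpy as np

def nn_chain_average_linkage(D):
    """Agglomerate objects with average linkage via nearest-neighbour chains.
    D is a symmetric (n, n) dissimilarity matrix; returns (a, b, height) merges."""
    D = D.astype(float).copy()
    np.fill_diagonal(D, np.inf)           # a cluster is never its own neighbour
    size = {i: 1 for i in range(len(D))}  # active clusters and their sizes
    merges, chain = [], []
    while len(size) > 1:
        if not chain:
            chain.append(next(iter(size)))
        a = chain[-1]
        b = min((c for c in size if c != a), key=lambda c: D[a, c])
        if len(chain) > 1 and D[a, chain[-2]] <= D[a, b]:
            b = chain[-2]                 # prefer the previous chain element on ties
        if len(chain) > 1 and b == chain[-2]:
            chain.pop(); chain.pop()      # reciprocal nearest neighbours: merge a and b
            merges.append((a, b, D[a, b]))
            for c in size:
                if c not in (a, b):       # Lance-Williams update for average linkage
                    D[a, c] = D[c, a] = (size[a] * D[a, c] + size[b] * D[b, c]) / (size[a] + size[b])
            size[a] += size[b]
            del size[b]
            D[b, :] = D[:, b] = np.inf    # deactivate b; cluster a now stands for the merge
        else:
            chain.append(b)
    return merges

X = np.random.default_rng(3).normal(size=(8, 2))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
for a, b, h in nn_chain_average_linkage(D):
    print(f"merge {a} + {b} at height {h:.3f}")
```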

877 citations

Journal Article
TL;DR: The shape of a set of labeled points corresponds to those attributes of the configuration that are invariant to translation, rotation, and scale; Procrustes distance can be used to compare different shapes and serves as a metric for defining multidimensional shape spaces.
Abstract: The shape of a set of labeled points corresponds to those attributes of the configuration that are invariant to the effects of translation, rotation, and scale. Procrustes distance may be used to compare different shapes and may also serve as a metric that can be used to define multidimensional shape spaces. This paper demonstrates that the preshape space of planar triangles Procrustes aligned to a reference triangle corresponds to a unit hemisphere. An overview of methods used as linear approximations of D. G. Kendall's non-Euclidean shape space is given, and the equivalence of several methods based on orthogonal projections is shown. Some problems with approximations based on stereographic projections are also discussed. A simple example using artificial data is included.
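As a small worked example of the Procrustes comparison described above, the sketch below aligns two labelled planar triangles with SciPy's `scipy.spatial.procrustes`; the returned disparity acts as a squared Procrustes-style distance once translation, scale, and rotation have been removed.

```python
# Sketch: compare two labelled planar triangles after removing translation,
# scale, and rotation. scipy.spatial.procrustes standardises both configurations
# and reports a "disparity" (sum of squared differences) between them.
import numpy as np
from scipy.spatial import procrustes

tri_a = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 0.8]])
tri_b = np.array([[2.0, 1.0], [2.0, 3.0], [0.4, 2.0]])   # tri_a rotated, scaled, shifted

mtx_a, mtx_b, disparity = procrustes(tri_a, tri_b)
print(f"Procrustes disparity: {disparity:.6f}")   # ~0 means the shapes coincide
```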

789 citations

Performance Metrics

No. of papers from the journal in previous years:

Year    Papers
2023    16
2022    23
2021    40
2020    43
2019    34
2018    24