Showing papers on "Cluster analysis" published in 1981


Journal ArticleDOI
TL;DR: This survey summarizes some of the proposed segmentation techniques in the area of biomedical image segmentation, which fall into the categories of characteristic feature thresholding or clustering and edge detection.

1,160 citations


Journal ArticleDOI
TL;DR: In this article, a random sample is divided into the $k$ clusters that minimise the within cluster sum of squares, and conditions are found that ensure the almost sure convergence, as the sample size increases, of the set of means of the $k$ clusters.
Abstract: A random sample is divided into the $k$ clusters that minimise the within cluster sum of squares. Conditions are found that ensure the almost sure convergence, as the sample size increases, of the set of means of the $k$ clusters. The result is proved for a more general clustering criterion.

490 citations
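
As a rough illustration of the criterion named in this abstract, the sketch below evaluates a within-cluster sum of squares for a fixed set of candidate centers, with each point charged to its nearest center. The function name `within_cluster_ss` and the toy data are assumptions made for illustration; they do not come from the paper, and the code does not search for the minimising centers.

```python
def within_cluster_ss(points, centers):
    """Sum, over all points, of the squared Euclidean distance to the nearest center."""
    total = 0.0
    for p in points:
        # Each point contributes its squared distance to the closest candidate center.
        total += min(sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centers)
    return total


# Hypothetical toy data: two tight pairs of points, far apart along the x-axis.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
good_centers = [(0.0, 1.0), (10.0, 1.0)]  # one center per pair
bad_centers = [(5.0, 0.0), (5.0, 2.0)]    # centers placed between the pairs

wss_good = within_cluster_ss(points, good_centers)
wss_bad = within_cluster_ss(points, bad_centers)
```

The centers that sit on the pair means give the smaller criterion value, which is the quantity a $k$-means partition tries to minimise.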


Journal ArticleDOI
TL;DR: Results indicated that a subset of internal criterion measures could be identified which appear to be valid indices of correct cluster recovery and could form the basis of a permutation test for the existence of cluster structure or a clustering algorithm.
Abstract: A Monte Carlo evaluation of thirty internal criterion measures for cluster analysis was conducted. Artificial data sets were constructed with clusters which exhibited the properties of internal cohesion and external isolation. The data sets were analyzed by four hierarchical clustering methods. The resulting values of the internal criteria were compared with two external criterion indices which determined the degree of recovery of correct cluster structure by the algorithms. The results indicated that a subset of internal criterion measures could be identified which appear to be valid indices of correct cluster recovery. Indices from this subset could form the basis of a permutation test for the existence of cluster structure or a clustering algorithm.

391 citations



Journal ArticleDOI
TL;DR: A review of Monte Carlo validation studies of clustering algorithms indicates that other algorithms may provide better recovery under a variety of conditions than Ward's minimum variance hierarchical method.
Abstract: A review of Monte Carlo validation studies of clustering algorithms is presented. Several validation studies have tended to support the view that Ward's minimum variance hierarchical method gives the best recovery of cluster structure. However, a more complete review of the validation literature on clustering indicates that other algorithms may provide better recovery under a variety of conditions. Applied researchers are cautioned concerning the uncritical selection of Ward's method for empirical research. Alternative explanations for the differential recovery performance are explored and recommendations are made for future Monte Carlo experiments.

236 citations


Journal ArticleDOI
TL;DR: Several experiments reported here show that the proposed performance measure puts an order on different partitions of the same data which is consistent with the error rate of a classifier designed on the basis of the obtained cluster labelings.
Abstract: Clustering is primarily used to uncover the true underlying structure of a given data set and, for this purpose, it is desirable to subject the same data to several different clustering algorithms. This paper attempts to put an order on the various partitions of a data set obtained from different clustering algorithms. The goodness of each partition is expressed by means of a performance measure based on a fuzzy set decomposition of the data set under consideration. Several experiments reported here show that the proposed performance measure puts an order on different partitions of the same data which is consistent with the error rate of a classifier designed on the basis of the obtained cluster labelings.

192 citations


Book ChapterDOI
01 Jan 1981
TL;DR: S8 illustrates some of the difficulties inherent with cluster analysis; its aim is to alert investigators to the fact that various algorithms can suggest radically different substructures in the same data set.
Abstract: S8 illustrates some of the difficulties inherent with cluster analysis; its aim is to alert investigators to the fact that various algorithms can suggest radically different substructures in the same data set. The balance of Chapter 3 concerns objective functional methods based on fuzzy c-partitions of finite data. The nucleus for all these methods is optimization of nonlinear objectives involving the weights $u_{ik}$; functionals using these weights will be differentiable over $M_{fc}$ (but not over $M_c$), a decided advantage for the fuzzy embedding of hard c-partition space. Classical first- and second-order conditions yield iterative algorithms for finding the optimal fuzzy c-partitions defined by various clustering criteria.

184 citations
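
The abstract above mentions iterative algorithms derived from first-order conditions on fuzzy c-partitions. One widely known scheme of that kind is fuzzy c-means, sketched below for 1-D data. This is an assumption about which algorithm to show, not a reproduction of the chapter. The function name, the fuzzifier `m=2.0`, the fixed iteration count, and the toy data are all hypothetical. Note that the code stores memberships as `u[k][i]` (point `k`, cluster `i`).

```python
def fuzzy_c_means_1d(xs, centers, m=2.0, iters=50):
    """Alternate membership and center updates for 1-D data (illustrative sketch)."""
    centers = list(centers)
    u = []
    for _ in range(iters):
        # Membership update: u[k][i] is the weight of point k in cluster i.
        u = []
        for x in xs:
            d = [abs(x - v) for v in centers]
            if 0.0 in d:
                # A point lying exactly on a center belongs fully to that center.
                hit = d.index(0.0)
                row = [1.0 if i == hit else 0.0 for i in range(len(centers))]
            else:
                inv = [dist ** (-2.0 / (m - 1.0)) for dist in d]
                s = sum(inv)
                row = [w / s for w in inv]
            u.append(row)
        # Center update: mean of the data weighted by u ** m.
        centers = [
            sum((u[k][i] ** m) * xs[k] for k in range(len(xs)))
            / sum(u[k][i] ** m for k in range(len(xs)))
            for i in range(len(centers))
        ]
    return u, centers


# Hypothetical toy data: two well-separated groups on the real line.
xs = [0.0, 1.0, 2.0, 9.0, 10.0, 11.0]
memberships, final_centers = fuzzy_c_means_1d(xs, centers=[1.5, 8.0])
```

Each row of `memberships` sums to one, so every point is shared among the clusters rather than assigned to exactly one of them.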


Journal ArticleDOI
TL;DR: The proportion exponent is introduced as a measure of the validity of the clustering obtained for a data set using a fuzzy clustering algorithm and its use as a validity functional is illustrated with four numerical examples and its effectiveness compared to other validity functionals.

178 citations


Journal ArticleDOI
TL;DR: Trees and castles show general size effects, the change of whole clusters of variables from point to point, trends, and outliers, and are especially appropriate for evaluating the clustering of variables and for observing clusters of points.
Abstract: A number of points in k dimensions are displayed by associating with each point a symbol: a drawing of a tree or a castle. All symbols have the same structure derived from a hierarchical clustering algorithm applied to the k variables (dimensions) over all points, but their parts are coded according to the coordinates of each individual point. Trees and castles show general size effects, the change of whole clusters of variables from point to point, trends, and outliers. They are especially appropriate for evaluating the clustering of variables and for observing clusters of points. Their major advantage over earlier attempts to represent multivariate observations (such as profiles, stars, faces, boxes, and Andrews's curves) lies in their matching of relationships between variables to relationships between features of the representing symbol. Several examples are given, including one with 48 variables.

154 citations


Posted Content
TL;DR: An overlapping clustering model, ADCLUS, is described which can be used in marketing studies involving products/subjects that can belong to more than one group or cluster simultaneously.
Abstract: Most clustering techniques used in product positioning and market segmentation studies render mutually exclusive equivalence classes of the relevant products or subjects space. Such classificatory techniques are thus restricted to the extent that they preclude overlap between subsets or equivalence classes. An overlapping clustering model, ADCLUS, is described which can be used in marketing studies involving products/subjects that can belong to more than one group or cluster simultaneously. The authors provide theoretical justification for and an application of the approach, using the MAPCLUS algorithm for fitting the ADCLUS model.

146 citations


Book ChapterDOI
01 Jan 1981
TL;DR: The paper explains the recently introduced method of conjunctive conceptual clustering in terms of dynamic clustering and shows by an example its advantages over methods of numerical taxonomy from the viewpoint of cluster interpretation.
Abstract: Clustering is described as a multistep process in which some of the steps are performed by a data analyst and some by a computer program. At present, those performed by a computer program do not produce any description of the generated clusters. The recently introduced method of conjunctive conceptual clustering overcomes this problem by requiring that each cluster has a conjunctive description built from relations on object attributes and closely “fitting” the cluster. The paper explains the above clustering method in terms of dynamic clustering and shows by an example its advantages over methods of numerical taxonomy from the viewpoint of cluster interpretation.

Journal ArticleDOI
TL;DR: Most clustering techniques used in product positioning and market segmentation studies render mutually exclusive equivalence classes of the relevant products or subjects space.
Abstract: Most clustering techniques used in product positioning and market segmentation studies render mutually exclusive equivalence classes of the relevant products or subjects space. Such classificatory ...

Journal ArticleDOI
Laszlo A. Belady1, C. J. Evangelisti1
TL;DR: A method to perform automatic clustering is described and a metric to quantify the complexity of the resulting partition is developed.



Journal ArticleDOI
TL;DR: This paper proposes a methodology based on clustering methods for inferring hierarchical choice processes from panel data for a homogeneous group of consumers and implements it for one product category.
Abstract: This paper proposes a methodology based on clustering methods for inferring hierarchical choice processes from panel data for a homogeneous group of consumers. Details of the methodology are presented and implemented for one product category.

Journal ArticleDOI
TL;DR: A combination of two popular mapping algorithms-Sammon's mean-square error technique and the triangulation method-is proposed to overcome the limitations in the individual algorithms.
Abstract: A number of linear and nonlinear mapping algorithms for the projection of patterns from a high-dimensional space to two dimensions are available. These two-dimensional representations allow quick visual observation of a data set. A combination of two popular mapping algorithms-Sammon's mean-square error technique and the triangulation method-is proposed to overcome the limitations in the individual algorithms. Some factors which describe the goodness of a projection are described, and a comparison is made of six of these algorithms by running them on four data sets. The results obtained support the use of the proposed algorithm.

Journal ArticleDOI
TL;DR: It is shown that many known algorithms of clustering and pattern recognition can be characterized as efforts to minimize entropy, when suitably defined.

Journal ArticleDOI
TL;DR: In this paper, a technique for testing the difference between two dependent correlations developed by Wolfe is proposed in a more general matrix context for evaluating a variety of data analysis schemes that are supposed to clarify the structure underlying a set of proximity measures.

Book ChapterDOI
01 Jan 1981
TL;DR: Clustering analysis is a newly developed computer-oriented data analysis technique that is a product of many research fields: statistics, computer science, operations research, and pattern recognition.
Abstract: Clustering analysis (1–4) is a newly developed computer-oriented data analysis technique. It is a product of many research fields: statistics, computer science, operations research, and pattern recognition. Because of the diverse backgrounds of researchers, clustering analysis has many different names. In biology, clustering analysis is called “taxonomy” (5,6). In pattern recognition (7–15) it is called “unsupervised learning.” Perhaps the most confusing name of all, the term “classification” sometimes also denotes clustering analysis. Since classification may denote discriminant analysis, which is totally different from clustering analysis, it is perhaps important to distinguish these two terms.

Journal ArticleDOI
TL;DR: Among the clustering methods in any of several families of graph theoretic methods, clusters defined as the connected components are the most stable and the clusters specified as the maximal complete subgraphs are the least stable.
Abstract: Assessing the stability of a clustering method involves the measurement of the extent to which the generated clusters are affected by perturbations in the input data. A measure which specifies the disturbance in a set of clusters as the minimum number of operations required to restore the set of modified clusters to the original ones is adopted. A number of well-known graph theoretic clustering methods are compared in terms of their stability as determined by this measure. Specifically, it is shown that among the clustering methods in any of several families of graph theoretic methods, clusters defined as the connected components are the most stable and the clusters specified as the maximal complete subgraphs are the least stable. Furthermore, as one proceeds from the method producing the most narrow clusters (maximal complete subgraphs) to those producing relatively broader clusters, the clustering process is shown to remain at least as stable as any method in the previous stages. Finally, the lower and the upper bounds for the measure of stability, when clusters are defined as the connected components, are derived.
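
To make the "connected components" end of this comparison concrete, here is a minimal sketch that links 1-D points whose distance is at most a threshold and returns the connected components of the resulting graph as clusters. The function name, the threshold rule, and the sample values are assumptions for illustration; the paper's stability measure is not implemented here.

```python
def connected_component_clusters(points, threshold):
    """Link pairs of 1-D points within `threshold`; return the graph's components."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if abs(points[i] - points[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two components

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(points[i])
    return sorted(sorted(g) for g in groups.values())


# Hypothetical sample: 1.0 and 2.0 are not directly linked, but chain through 1.5.
clusters = connected_component_clusters([1.0, 1.5, 2.0, 8.0, 8.4], threshold=0.6)
```

The first cluster contains 1.0 and 2.0 even though they are farther apart than the threshold. A maximal-complete-subgraph method would need every pair in a cluster to be linked, so it would not put those two points in the same cluster, which is why it sits at the narrow end of the families the abstract compares.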

Journal ArticleDOI
TL;DR: In this paper, a Goldstone-type theorem for a wide class of lattice and continuum quantum systems, both for the ground state and at nonzero temperature, was proved.
Abstract: We prove a Goldstone-type theorem for a wide class of lattice and continuum quantum systems, both for the ground state and at nonzero temperature. For the ground state (T=0) spontaneous breakdown of a continuous symmetry implies no energy gap. For nonzero temperature, spontaneous symmetry breakdown implies slow clustering (no $L^1$ clustering). The methods apply also to nonzero-temperature classical systems.

Book ChapterDOI
01 Jan 1981
TL;DR: The main basic choices which are preliminary to any clustering are presented and the dynamic clustering method which gives a solution to a family of optimization problems related to those choices is presented.
Abstract: We present first the main basic choices which are preliminary to any clustering and then the dynamic clustering method which gives a solution to a family of optimization problems related to those choices. We show then how these choices interfere in pattern recognition using three approaches: the syntactic approach, the logical approach and the numerical approach. For each approach we present a practical application.

01 Jul 1981
TL;DR: The goal of this research is the investigation of the ability of certain cluster techniques to segment monochromatic data collected by remote sensing devices, and the work was done on three levels: underlying motivations, simulations, and real data analysis.
Abstract: The goal of this research is the investigation of the ability of certain cluster techniques to segment monochromatic data collected by remote sensing devices. The particular example that will be considered consists of temperature data collected by the Air Force Geophysical Laboratory at Hanscom Air Force Base. The work was done on three levels: underlying motivations, simulations, and real data analysis. In this particular paper, we shall restrict our consideration to a single feature, with multiple feature selection to be investigated at a later date. Section 2 presents the motivation for both the cluster techniques and the statistics that they use, while section 3 briefly explains the techniques. In section 4, some simulated data is analyzed, and in section 5, the techniques are applied to actual infrared temperature data. The paper concludes with an indication of possible future directions for the current research effort.

Journal ArticleDOI
TL;DR: The results indicate that the method is useful in extracting elementary patterns from an EEG and that the piecewise analysis approach is feasible.

Journal ArticleDOI
TL;DR: Classification of characteristic neural spike shapes in multi-unit recordings is performed in real time using a feature set reduced by means of a model of uncorrelated signal-related noise.
Abstract: Classification of characteristic neural spike shapes in multi-unit recordings is performed in real time using a reduced feature set. A model of uncorrelated signal-related noise is used to reduce the feature set by choosing a subset of aperiodic samples which is effective for discrimination between signals by a nearest-mean algorithm. Initial signal classes are determined by an unsupervised clustering algorithm applied to the reduced features of the learning set events. Classification is carried out in real time using a distance measure derived for the reduced feature set. Examples of separation and correlation of multiunit activity from cat and frog visual systems are described.

Proceedings ArticleDOI
01 Apr 1981
TL;DR: Two methods for automatically obtaining a set of acoustic prototypes for use by a centisecond labeling acoustic processor are described, one of which is based on bootstrapping, the other on clustering.
Abstract: Automatic selection of acoustic prototypes is an important step towards making speech recognition systems automatically adaptable to new speakers. Two methods for automatically obtaining a set of acoustic prototypes for use by a centisecond labeling acoustic processor are described. One method is based on bootstrapping, the other on clustering. Recognition results using these automatically obtained prototypes on the 1000-word vocabulary natural language Laser Patent task are presented. These results are compared to those from an experiment in which the acoustic prototypes were manually selected.

Journal ArticleDOI
TL;DR: The problem of performing multiple attribute clustering in a dynamic database is studied and the extended K-d tree method is presented, in which the basic k-d tree structure, after modification, serves as the structure of the directory that organizes the data records in secondary storage.
Abstract: The problem of performing multiple attribute clustering in a dynamic database is studied. The extended K-d tree method is presented. In an extended K-d tree organization, the basic k-d tree structure after modification is used as the structure of the directory which organizes the data records in the secondary storage. The discriminator value of each level of the directory determines the partitioning direction of the corresponding attribute subspace. When the record insertion causes the data page to overload, the attribute space will be further partitioned along the direction specified by the corresponding discriminator.

Proceedings Article
09 Sep 1981
TL;DR: Land Information Systems shall be used to store data on objects in space to get a map drawn on a CRT screen; the typical query is therefore a two-dimensional range query which yields all the data needed to draw the map.
Abstract: Land Information Systems shall be used to store data on objects in space (e.g. buildings, roads, electricity networks etc.). Interactive retrieval is mainly done in order to get a map drawn on a CRT screen; the typical query is therefore a two-dimensional range query which yields all the data needed to draw the map. A method to implement such a LIS based on a commercially available DBMS (e.g. DBMS-10 from DEC) is shown and some practical results are reported. The method used relies heavily on control of physical placement of stored records and clustering of data according to neighbourhood.

Journal ArticleDOI
TL;DR: In this article, clustering operators, when restricted to $k$-particle invariant subspaces, are shown still to cluster.
Abstract: Clustering operators, when restricted to $k$-particle invariant subspaces, are shown still to cluster.