
Showing papers on "Cluster analysis" published in 1974


Book
01 Jan 1974
TL;DR: This fourth edition of the highly successful Cluster Analysis represents a thorough revision of the third edition and covers new and developing areas such as classification likelihood and neural networks for clustering.
Abstract: Cluster analysis comprises a range of methods for classifying multivariate data into subgroups. By organising multivariate data into such subgroups, clustering can help reveal the characteristics of any structure or patterns present. These techniques are applicable in a wide range of areas such as medicine, psychology and market research. This fourth edition of the highly successful Cluster Analysis represents a thorough revision of the third edition and covers new and developing areas such as classification likelihood and neural networks for clustering. Real life examples are used throughout to demonstrate the application of the theory, and figures are used extensively to illustrate graphical techniques. The book is comprehensive yet relatively non-mathematical, focusing on the practical aspects of cluster analysis.

9,857 citations


Journal ArticleDOI
TL;DR: An algorithm for the analysis of multivariate data is presented and discussed in terms of specific examples; it seeks one- and two-dimensional linear projections of multivariate data that are relatively highly revealing.
Abstract: An algorithm for the analysis of multivariate data is presented and is discussed in terms of specific examples. The algorithm seeks to find one- and two-dimensional linear projections of multivariate data that are relatively highly revealing.

1,635 citations


Journal ArticleDOI
01 May 1974
TL;DR: In this article, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Abstract: Provides a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition. The topics treated include Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

1,222 citations


Journal ArticleDOI
TL;DR: Several graph-theoretic criteria are proposed for use within a general clustering paradigm as a means of developing procedures “in between” the extremes of complete-link and single-link hierarchical partitioning.
Abstract: This paper attempts to review and expand upon the relationship between graph theory and the clustering of a set of objects. Several graph-theoretic criteria are proposed for use within a general clustering paradigm as a means of developing procedures “in between” the extremes of complete-link and single-link hierarchical partitioning; these same ideas are then extended to include the more general problem of constructing subsets of objects with overlap. Finally, a number of related topics are surveyed within the general context of reinterpreting and justifying methods of clustering either through standard concepts in graph theory or their simple extensions.

151 citations


Journal ArticleDOI
01 Jan 1974
TL;DR: A fuzzy partitioning algorithm has potential value as a heuristic tool for identifying clusters within large finite data sets, and more specifically, for estimating the parameters in a mixture of unimodal probability densities, given a finite sample drawn from the mixture.
Abstract: Recent results pertaining to a newly developed fuzzy partitioning algorithm are surveyed. The algorithm has potential value as a heuristic tool for identifying clusters within large finite data sets, and more specifically, for estimating the parameters in a mixture of unimodal probability densities, given a finite sample drawn from the mixture. The topics treated are: fuzzy partitions, conventional clustering algorithms, fuzzy clustering algorithms, asymptotic behavior of optimal fuzzy partitions with increasing cluster separation, scalar measures of partition fuzziness, and unsupervised learning and parameter estimation.

126 citations


Journal ArticleDOI
TL;DR: A technique tests the hypothesis that a hierarchical sequence of partitions constructed by the single-link or complete-link clustering method could have been obtained because of “noise” by referring the Goodman-Kruskal rank correlation γ statistic to an approximate permutation distribution.
Abstract: A technique is presented for testing the hypothesis that a hierarchical sequence of partitions constructed by the single-link or complete-link clustering method could have been obtained because of “noise.” Two rank orderings of the object pairs are compared. One of the orderings is obtained from the initial proximity values; the second is derived from the levels at which an object pair first appears within a single subset within the hierarchy. The hypothesis that the given set of proximity values have been assigned randomly is tested by referring the Goodman-Kruskal rank correlation γ statistic to an approximate permutation distribution.
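The γ statistic mentioned in this abstract can be illustrated with a minimal sketch. It assumes the standard definition γ = (C − D) / (C + D), where C and D count concordant and discordant pairs and tied pairs are ignored. The function name and the toy data are hypothetical and not taken from the paper, and the permutation distribution itself is not reproduced here.

```python
from itertools import combinations


def goodman_kruskal_gamma(x, y):
    """Return (C - D) / (C + D) for two equally long sequences; ties are ignored."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Made-up example with four objects a, b, c, d and their six object pairs,
# listed in the order (a,b), (c,d), (a,c), (a,d), (b,c), (b,d).
proximities = [1, 2, 3, 4, 5, 6]
# Level at which each pair first shares a subset in a single-link hierarchy:
# {a,b} forms at 1, {c,d} at 2, and the two subsets join at 3.
merge_levels = [1, 2, 3, 3, 3, 3]

gamma = goodman_kruskal_gamma(proximities, merge_levels)
```

A value of γ near 1 means the hierarchy preserves the rank order of the original proximities; the test described in the abstract asks how unusual the observed value is under random assignment of proximities.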

122 citations


Journal ArticleDOI
TL;DR: A clustering of a nonnegative M×N array is obtained by permuting its rows and columns and can be stated as two traveling-salesman problems.
Abstract: A clustering of a nonnegative M×N array is obtained by permuting its rows and columns. W. T. McCormick et al. [Opns. Res. 20, 993-1009 (1972)] measure the effectiveness of a clustering by the sum of all products of nearest-neighbor elements in the permuted array. This note points out that this clustering problem can be stated as two traveling-salesman problems.
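A small sketch can make the decomposition in this abstract concrete. It assumes that the measure counts each horizontally or vertically adjacent pair once. Under that assumption the vertical products depend only on the row order and the horizontal products only on the column order, so the two orderings can be chosen independently. The function names and the toy array are hypothetical, and brute-force enumeration stands in for a traveling-salesman solver.

```python
from itertools import permutations


def measure_of_effectiveness(a):
    """Sum of products of horizontally and vertically adjacent entries."""
    rows, cols = len(a), len(a[0])
    horizontal = sum(a[i][j] * a[i][j + 1]
                     for i in range(rows) for j in range(cols - 1))
    vertical = sum(a[i][j] * a[i + 1][j]
                   for i in range(rows - 1) for j in range(cols))
    return horizontal + vertical


def permute(a, row_order, col_order):
    """Return the array with its rows and columns rearranged."""
    return [[a[i][j] for j in col_order] for i in row_order]


def best_order(vectors):
    """Brute-force the ordering that maximizes the summed dot products
    of consecutive vectors (a path-style traveling-salesman objective)."""
    def score(order):
        return sum(
            sum(p * q for p, q in zip(vectors[u], vectors[v]))
            for u, v in zip(order, order[1:])
        )
    return max(permutations(range(len(vectors))), key=score)


# Made-up 4x4 array with a checkerboard pattern.
array = [
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
row_order = best_order(array)                           # rows as vectors
col_order = best_order([list(c) for c in zip(*array)])  # columns as vectors
clustered = permute(array, row_order, col_order)
```

Solving the row problem and the column problem separately gives the same result as searching over all joint row and column permutations, which is only feasible for very small arrays.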

88 citations



Journal ArticleDOI
TL;DR: It is demonstrated that cluster density can be described analytically from the distributions of intervals between errors, and that the derived clustering properties hold for any stationary process.
Abstract: This correspondence is concerned with binary processes and presents results with immediate applications in the modeling of digital channels for the purpose of evaluating code performance. It is demonstrated that cluster density can be analytically described from the distributions of intervals between errors. These relations and derived clustering properties hold for any stationary process. Analyses of real error data exemplify the use of these results in regard to channels having dependent inter-error intervals.

44 citations


Journal ArticleDOI
TL;DR: A framework for the evaluation of cluster-based retrieval strategies is constructed and these strategies are shown to be dependent on the method of cluster representation (cluster profile) adopted.

38 citations



Journal ArticleDOI
TL;DR: In this article, a Markov chain generalization of the binomial model is proposed to investigate clustering of like atoms within an alloy of two types of metallic elements, and a combinatorial analysis is developed which provides the exact distribution of the sufficient statistics and permits small sample comparisons of expectations and mean square errors of estimates with their large sample approximations.
Abstract: The field ion microscope atom probe, used in the exploration of crystal structure, is discussed. Data is generated from a probe of an alloy of two types of metallic elements. A statistical model is formulated to investigate clustering of like atoms within the alloy. Physical considerations motivate a Markov chain generalization of the binomial model. A parameter in the model which measures the degree of clustering is estimated by maximum likelihood and large sample distribution theory is given using results from Billingsley. A combinatorial analysis is developed which provides the exact distribution of the sufficient statistics and permits small sample comparisons of expectations and mean square errors of estimates with their large sample approximations. The model is applied to some preliminary data and is used to point up and quantify some difficulties in the experiment.

Journal ArticleDOI
TL;DR: Most of the paper is devoted to stating relationships between spanning trees, single-link and complete-link hierarchical clustering, network flow and two divisive clustering procedures.
Abstract: The concept of a spanning tree for a weighted graph is used to characterize several methods of clustering a set of objects. In particular, most of the paper is devoted to stating relationships between spanning trees, single-link and complete-link hierarchical clustering, network flow and two divisive clustering procedures. Several related topics using the notion of a spanning tree are also mentioned.

Journal ArticleDOI
Masahiro Koiwa
TL;DR: The “grouping method” is applied to a model problem that can be solved rigorously, and it is concluded that, at least for this model problem, the grouping method is not valid; it seems difficult to justify such a procedure mathematically.
Abstract: The validity of the “grouping method” has been examined critically; the method was contrived by Kiritani in treating an enormously large number of simultaneous differential equations which represent the clustering process of quenched-in vacancies. The method is applied to a model problem which can be solved rigorously. Size distributions of clusters at the completion of a reaction are derived by the exact and the grouping methods; when calculated with the latter method, the distribution becomes broader and the position of the maximum shifts to lower sizes. Thus, it is concluded that, at least for the present model problem, the grouping method is not valid. By the grouping method one attempts to obtain an average of many differential equations. It seems difficult to justify such a procedure mathematically.

Journal ArticleDOI
C. T. Yu
TL;DR: A clustering algorithm which is tree-like in structure, and is based on user queries, is presented and experimental results indicate that the proposed method is superior to the other methods.
Abstract: A clustering algorithm which is tree-like in structure, and is based on user queries, is presented. It is compared to Bonner's Method, Rocchio's Method, Dattola's Method and the Single Link Method in three different aspects, namely system effectiveness, system efficiency and the time required for clustering. Experimental results using the Cranfield 424 collection indicate that the proposed method is superior to the other methods.

Journal ArticleDOI
01 May 1974
TL;DR: A new clustering algorithm based on dimensional information is presented; it includes an inherent feature selection criterion and, combined with local clustering techniques in a global-local scheme, is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.
Abstract: A new clustering algorithm is presented that is based on dimensional information. The algorithm includes an inherent feature selection criterion, which is discussed. Further, a heuristic method for choosing the proper number of intervals for a frequency distribution histogram, a feature necessary for the algorithm, is presented. The algorithm, although usable as a stand-alone clustering technique, is then utilized as a global approximator. Local clustering techniques and configuration of a global-local scheme are discussed, and finally the complete global-local and feature selector configuration is shown in application to a real-time adaptive classification scheme for the analysis of remote sensed multispectral scanner data.

Journal ArticleDOI
TL;DR: The article asks whether superclusters are entities distinguishable in some natural and fundamental way from clusters, from groups, or even from individual galaxies, and discusses an attempt to find an acceptable model for clustering consistent with the picture of a continuous hierarchy.
Abstract: A new view of the nature of the large-scale distribution of matter is suggested by the fact that the covariance function for the distribution varies smoothly, like a power law, over a wide range of separations. This leads one to ask whether superclusters are entities distinguishable in some natural and fundamental way from clusters, or from groups, or even from individual galaxies. I discuss here an attempt to find an acceptable model for clustering consistent with the picture of a continuous hierarchy.


Journal ArticleDOI
TL;DR: The main aim of this paper is a synthetic study of optimality properties in spaces formed by partitions of a finite set; it formalizes, and takes as a model for that study, a family of particularly efficient techniques of the “cluster centers” type.


Journal ArticleDOI
TL;DR: In this article, the authors analyzed the event-to-event fluctuations of the rapidity distributions for semi-inclusive data obtained from 205 GeV/c and 303 GeV/c proton-proton collisions observed at NAL.

Journal ArticleDOI
TL;DR: The authors argue that the existence of clusters is strongly suggested by a number of correlations which have recently been observed in high-energy experiments, and call attention to strategies for studying the properties of produced clusters.
Abstract: The notion of clustering in many-particle production is defined and applied to experimental situations. We argue that the existence of clusters is strongly suggested by a number of correlations which have recently been observed in high-energy experiments, and call attention to strategies for studying the properties of produced clusters.

Journal ArticleDOI
01 Jul 1974
TL;DR: Experimental results are presented to demonstrate the utility of the self-organizing PSV search algorithms; clustering analysis, investigated as a presearch scheme, is shown to have great promise as a means of assessing the complexity of an optimization problem.
Abstract: Self-organizing probability state variable (PSV) parameter search algorithms possessing long-term memory have been formulated to cope with systems that must avoid high performance-penalty operating regions. The information gained from all previous experiments is efficiently encoded in multivariate probability distribution functions (pdf's). This long-term memory capability enables the PSV algorithms to avoid effectively future experiments in high penalty regions. The systems considered are resource-limited, and catastrophic failure may occur if parameter values lying in high penalty regions are implemented. Those cases in which the high penalty regions are not known in advance were investigated. The PSV algorithms have the capability of adaptively learning the location and hypervolume of these regions as the search proceeds. The algorithms are explicitly guided in their internal strategies as a function of the remaining system resources and the updated probability distribution functions. Clustering analysis is used both in the discovery of new operating regions and for updating the pdf's. As a by-product of this research, clustering was also investigated as a presearch scheme. It is shown that this procedure has great promise as a means of assessing the complexity of an optimization problem. Experimental results are presented to demonstrate the utility of the self-organizing PSV search algorithms.

Journal ArticleDOI
TL;DR: In this paper, a clustering method is presented that groups sample plots (stands or other units) together, based on their proximity in a multidimensional test space in which the axes represent the attributes (species) of the individuals (sample plots, etc.).
Abstract: A clustering method is presented that groups sample plots (stands or other units) together, based on their proximity in a multidimensional test space in which the axes represent the attributes (species) of the individuals (sample plots, etc.). The resulting dendrogram is used to make subjective judgements on the type and distinctiveness of the groupings.

Journal ArticleDOI
TL;DR: The aim of this paper is to draw attention to work going on in document retrieval which parallels and in some cases is in advance of the work in data retrieval, aimed at reducing the number of comparisons needed to achieve the desired result.
Abstract: In a recent paper Burkhard and Keller [1] discuss the best-match problem, the problem "of searching the set of keys in a file to find a key which is closest to a given query." Taking my cue from their paper, I present some work which I have done on the same problem in a related field. The aim of this paper is to draw attention to work going on in document retrieval which parallels and in some cases is in advance of the work in data retrieval. In both cases retrieval is based on a file structure imposed on the information, whether keys or documents, aimed at reducing the number of comparisons needed to achieve the desired result. In the case of keys, Burkhard and Keller recommend for their more sophisticated file structure (they recommend several simpler ones) a minimal cover of cliques C such that (1) every key is in at least one element of C, and (2) for no other smaller set C' does (1) hold. Unfortunately finding C requires the generation of (almost) all cliques on the set of keys. It is well known that the computation time to generate all cliques can be excessive. The only known bound on this time is so high, of order O(k^n) for n keys, that it amounts to no bound at all. The number of cliques in a graph can increase dramatically with the number of nodes in the graph. This in itself has been found to be a hurdle in applications to document retrieval (see e.g. Minker, et al. [6]). So, for applications in document retrieval, where the number of documents to be clustered may be of the order of hundreds of thousands, clique generation is just too slow. Nevertheless, related clustering approaches have been attempted by Salton [8], Litofsky [5], Crouch [2], and Van Rijsbergen [10], who have called in the techniques of cluster analysis to classify the documents so that the search time may be reduced. In both data and document retrieval search time is reduced by selection of a good clique or cluster representative. Burkhard and Keller proposed a method for selecting clique representatives. In document retrieval, one of the cluster representatives selected for use on heuristic grounds in [4] has proved to have an interesting theoretical basis and I describe it here.
Copyright © 1974, Association for Computing Machinery, Inc. Author's address: Department of Information Science, Monash University, Clayton, Victoria, 3168, Australia.

Journal ArticleDOI
TL;DR: In this article, a simple theoretical framework is presented for examining the effects of clustering on local moment formation and magnetic ordering in metallic alloys in which the solute is a transition metal ferromagnetic in the bulk, but for which an isolated atom in the solvent does not necessarily carry a magnetic moment (in a Hartree-Fock sense).
Abstract: A simple theoretical framework is presented for examining the effects of clustering on local moment formation and magnetic ordering in metallic alloys in which the solute is a transition metal ferromagnetic in the bulk, but for which an isolated atom in the solvent does not necessarily carry a magnetic moment (in a Hartree-Fock sense). Consequences of the model are discussed and compared qualitatively with experiment. Some transition metals exhibit spontaneous long range magnetic order, e.g. Ni, Co, Fe. These we shall call type A. Those which have no such spontaneous order, e.g. Pd, Rh, Pt, we shall call type B. Simple metals, such as Zn, Cu, Al, we shall refer to as type C. We are interested in investigating theoretically the magnetic properties of BA and CA alloys at finite concentrations. We are particularly concerned with the effects of statistical (or metallurgical) clustering in producing “local moments” in alloys for which an isolated A atom (in B or C) is non-magnetic in a Friedel-Anderson or Hartree sense. We restrict our discussion to simple models and simple but non-trivial approximations. As a model for a BA alloy, we take a d-electron Hamiltonian in which, in general, the parameters t, V, U depend upon whether the atoms at i and j are A or B type. The problem may be simplified whilst retaining its essential features by assuming t_ij independent of the type of atoms, by taking the constituents to have the same number of d-electrons per atom, by treating the local electron number in Hartree approximation, and by assuming a combination of V_i and U_i to be constant, due to sp charge transfer. We then obtain a simple model d-electron Hamiltonian consisting of a hopping term in t_ij and an on-site term in S_i·S_i.

Journal ArticleDOI
TL;DR: The properties of the graphs whose edges correspond to the dissimilarities left invariant by the Jardine-Sibson Bk clustering method are examined and algorithms are given for the determination of Bk clusters.
Abstract: A cluster analysis can be interpreted as a function which maps the input dissimilarity matrix into an output dissimilarity matrix whose elements indicate the dissimilarity between pairs of objects. Some cluster analyses leave invariant the dissimilarities between certain pairs of objects. The set of elements left invariant by the single-linkage clustering method corresponds to the edges in the minimum spanning tree. The properties of the graphs whose edges correspond to the dissimilarities left invariant by the Jardine-Sibson Bk clustering method are examined and algorithms are given for the determination of Bk clusters.
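The single-linkage statement in this abstract can be checked on a toy example. The sketch below assumes that the single-linkage output dissimilarity between two objects is the smallest achievable largest step on any path joining them (the minimax path value), and it uses Prim's algorithm for the minimum spanning tree. The function names and the matrix are hypothetical, the dissimilarities are chosen to be distinct, and the Bk method itself is not covered.

```python
def minimum_spanning_tree(d):
    """Prim's algorithm on a full dissimilarity matrix; returns (i, j) edges."""
    n = len(d)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        i, j = min(
            ((u, v) for u in in_tree for v in range(n) if v not in in_tree),
            key=lambda e: d[e[0]][e[1]],
        )
        edges.append((i, j))
        in_tree.add(j)
    return edges


def single_linkage_output(d):
    """Minimax path values between all pairs (Floyd-Warshall style update)."""
    n = len(d)
    u = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


# Made-up symmetric dissimilarity matrix with distinct off-diagonal values.
d = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]
output = single_linkage_output(d)
mst_edges = {tuple(sorted(e)) for e in minimum_spanning_tree(d)}
invariant_pairs = {
    (i, j)
    for i in range(len(d))
    for j in range(i + 1, len(d))
    if output[i][j] == d[i][j]
}
```

In this example the pairs whose dissimilarity is unchanged by single linkage are exactly the three spanning-tree edges, while every other pair is reduced to the largest edge on its tree path.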

Journal ArticleDOI
TL;DR: Clustering was confirmed, a critical distance between cases of up to 100 metres giving a highly significant result (P = 0·001), and with one exception the observed number of pairs significantly exceeds the expected number even up to 1,000 metres.
Abstract: Eighteen infants with neural tube defects occurring in 979 births over five years in a small Wiltshire town were investigated for evidence of spatial epidemicity. Applying a method not used previously in the study of these defects, clustering was confirmed, a critical distance between cases of up to 100 metres giving a highly significant result (P = 0·001), and with one exception the observed number of pairs significantly exceeds the expected number (P < 0·01) even up to 1,000 metres.

01 Jan 1974
TL;DR: A new multispectral data analysis procedure, based on LARSYS, has been developed which substantially reduces the influence of the analyst.
Abstract: A new multispectral data analysis procedure, based on LARSYS, has been developed which substantially reduces the influence of the analyst. The analysis is automated, including the interpretation of clustering results. The classification results obtained are repeatable and not biased by analyst subjectivity during the analysis.

Journal ArticleDOI
TL;DR: A similarity coefficient involving two measurements is defined, and various clustering procedures are studied using it; emphasis is placed on the application of the cluster analysis procedures to the construction of a diagnostic classification.