
Showing papers on "Cluster analysis published in 1976"



Journal ArticleDOI
TL;DR: This paper examines eight clustering programs which are representative of the various available techniques and compares their performances from several points of view, in order to set some guidelines for a potential user of a clustering technique.

336 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show how Strauss's model for clustering arises naturally from a family of finite point processes with a Markov property and as the equilibrium distributions of certain Markov processes.
Abstract: We show how Strauss's model for clustering arises naturally from a family of finite point processes with a Markov property and as the equilibrium distributions of certain Markov processes.

208 citations
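The Strauss model referred to above has a simple closed form, f(x) proportional to beta^n(x) * gamma^s_r(x), where n(x) is the number of points and s_r(x) the number of pairs closer than a range r. As a minimal illustration (not taken from the paper), the unnormalised density of a planar point pattern can be evaluated directly:

```python
import numpy as np
from scipy.spatial.distance import pdist

def strauss_unnormalised_density(points, beta, gamma, r):
    """Unnormalised Strauss density beta**n * gamma**s, where n is the number
    of points and s the number of point pairs closer than r."""
    n = len(points)
    s = int(np.sum(pdist(points) < r))
    return beta ** n * gamma ** s

pattern = np.random.default_rng(0).random((30, 2))   # 30 points in the unit square
print(strauss_unnormalised_density(pattern, beta=50.0, gamma=0.5, r=0.1))
```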


Journal ArticleDOI
Koontz, Narendra, Fukunaga
TL;DR: This paper presents a noniterative, graph-theoretic approach to nonparametric cluster analysis that is governed by a single-scalar parameter, requires no starting classification, and is capable of determining the number of clusters.
Abstract: Nonparametric clustering algorithms, including mode-seeking, valley-seeking, and unimodal set algorithms, are capable of identifying generally shaped clusters of points in metric spaces. Most mode and valley-seeking algorithms, however, are iterative and the clusters obtained are dependent on the starting classification and the assumed number of clusters. In this paper, we present a noniterative, graph-theoretic approach to nonparametric cluster analysis. The resulting algorithm is governed by a single-scalar parameter, requires no starting classification, and is capable of determining the number of clusters. The resulting clusters are unimodal sets.

197 citations
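The abstract emphasises a noniterative procedure controlled by a single scalar parameter that also determines the number of clusters. The sketch below shares those properties by taking clusters to be the connected components of a distance-threshold graph; it is only a simplified stand-in, since the authors' algorithm is built on density (valley-seeking) ideas rather than a plain distance cutoff.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import pdist, squareform

def threshold_graph_clusters(X, radius):
    """Connect every pair of points closer than `radius` and return the
    connected components: noniterative, no starting classification, and the
    number of clusters falls out of the single scalar parameter."""
    adjacency = squareform(pdist(X)) < radius
    np.fill_diagonal(adjacency, False)
    n_clusters, labels = connected_components(adjacency, directed=False)
    return n_clusters, labels

X = np.random.default_rng(0).random((100, 2))
print(threshold_graph_clusters(X, radius=0.12)[0], "clusters found")
```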


Journal ArticleDOI
TL;DR: A self-scaling local edge detector that can be applied in parallel on a picture is described; clustering and sequential boundary-following algorithms then process the edge data to locate images of objects and generate a data structure that represents the imaged objects.
Abstract: A computer solution to the problem of automatic location of objects in digital pictures is presented. A self-scaling local edge detector that can be applied in parallel on a picture is described. Clustering algorithms and sequential boundary-following algorithms process the edge data to locate images of objects and generate a data structure that represents the imaged objects.

147 citations


Journal ArticleDOI
J. Douglas Carroll
TL;DR: In this paper, hierarchical and non-hierarchical tree structures are proposed as models of similarity data and extensions to the analysis of individual differences are suggested.
Abstract: In this paper, hierarchical and non-hierarchical tree structures are proposed as models of similarity data. Trees are viewed as intermediate between multidimensional scaling and simple clustering. Procedures are discussed for fitting both types of trees to data. The concept of multiple tree structures shows great promise for analyzing more complex data. Hybrid models in which multiple trees and other discrete structures are combined with continuous dimensions are discussed. Examples of the use of multiple tree structures and hybrid models are given. Extensions to the analysis of individual differences are suggested.

143 citations
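One way to see how well a hierarchical tree serves as a model of similarity data is to compare the ultrametric distances implied by the tree with the observed dissimilarities. The sketch below uses average-linkage clustering and the cophenetic correlation as the fit measure; it illustrates the general idea rather than the least-squares fitting procedures discussed in the paper, and the data are a random stand-in.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
objects = rng.random((15, 4))
d = pdist(objects)                      # observed dissimilarities
Z = linkage(d, method='average')        # fit a hierarchical (ultrametric) tree
tree_d = cophenet(Z)                    # distances implied by the tree
fit = np.corrcoef(d, tree_d)[0, 1]      # cophenetic correlation
print(f"the tree reproduces the dissimilarities with r = {fit:.2f}")
```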


Journal ArticleDOI
TL;DR: This paper presents a technique for clustering orientation data with minimum constraint on resulting partitions, and testing clusters against a probability distribution defined on the unit sphere which admits elliptical symmetry about its mean.
Abstract: This paper presents a technique for (1) clustering orientation data with minimum constraint on resulting partitions, and (2) testing clusters against a probability distribution defined on the unit sphere which admits elliptical symmetry about its mean. The use of an objective function to highlight certain features of the data is discussed. The technique for delineation and analysis of clusters is applied to an example problem through use of a computer code.

113 citations
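Orientation data live on the unit sphere, so clustering has to respect direction rather than Euclidean position. The sketch below groups unit vectors by cosine similarity, updating each cluster's mean direction as the normalised resultant; it is a generic sketch only and does not implement the paper's objective function or its test against an elliptically symmetric distribution on the sphere.

```python
import numpy as np

def cluster_orientations(vectors, k, n_iter=20, seed=0):
    """k-means-style clustering of unit vectors using cosine similarity."""
    rng = np.random.default_rng(seed)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    centres = v[rng.choice(len(v), k, replace=False)]
    for _ in range(n_iter):
        labels = (v @ centres.T).argmax(axis=1)      # most similar mean direction
        for j in range(k):
            members = v[labels == j]
            if len(members):
                resultant = members.sum(axis=0)
                centres[j] = resultant / np.linalg.norm(resultant)
    return labels, centres
```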


Journal ArticleDOI
TL;DR: This paper describes methods, both old and new, for the statistical analysis of non-stationary univariate stochastic point processes and sequences of positive random variables in computer systems.
Abstract: Central problems in the performance evaluation of computer systems are the description of the behavior of the system and characterization of the workload. One approach to these problems comprises the interactive combination of data-analytic procedures with probability modeling. This paper describes methods, both old and new, for the statistical analysis of non-stationary univariate stochastic point processes and sequences of positive random variables. Such processes are frequently encountered in computer systems. As an illustration of the methodology an analysis is given of the stochastic point process of transactions initiated in a running data base system. On the basis of the statistical analysis, a non-homogeneous Poisson process model for the transaction initiation process is postulated for periods of high system activity and found to be an adequate characterization of the data. For periods of lower system activity, the transaction initiation process has a complex structure, with more clustering evident. Overall models of this type have application to the validation of proposed data base subsystem models.

93 citations
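A non-homogeneous Poisson process of the kind postulated for the busy periods can be simulated by the standard thinning construction: generate candidates from a homogeneous process whose rate dominates the target rate function, and keep each candidate with probability rate(t)/rate_max. The rate function below is an arbitrary stand-in, not one fitted to the paper's transaction data.

```python
import numpy as np

def simulate_nhpp(rate_fn, t_end, rate_max, seed=0):
    """Simulate event times of a non-homogeneous Poisson process on [0, t_end]
    by thinning a homogeneous process of rate `rate_max` (which must bound
    rate_fn from above)."""
    rng = np.random.default_rng(seed)
    t, events = 0.0, []
    while True:
        t += rng.exponential(1.0 / rate_max)
        if t > t_end:
            return np.array(events)
        if rng.random() < rate_fn(t) / rate_max:
            events.append(t)

# a rate that rises and falls over a ten-minute window (illustrative only)
times = simulate_nhpp(lambda t: 5.0 + 4.0 * np.sin(np.pi * t / 10.0), 10.0, 9.0)
```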


Journal ArticleDOI
TL;DR: The complete-link hierarchical clustering strategy is reinterpreted as a heuristic procedure for coloring the nodes of a graph, and goodness-of-fit is assessed through the number of “extraneous” edges in the fit of the constructed partitions to a sequence of graphs obtained from the basic proximity data.
Abstract: The complete-link hierarchical clustering strategy is reinterpreted as a heuristic procedure for coloring the nodes of a graph. Using this framework, the problem of assessing goodness-of-fit in complete-link clustering is approached through the number of “extraneous” edges in the fit of the constructed partitions to a sequence of graphs obtained from the basic proximity data. Several simple numerical examples that illustrate the suggested paradigm are given and some Monte Carlo results presented.

64 citations


Journal ArticleDOI
01 Jan 1976
TL;DR: An efficient algorithm for data reorganization and clustering is presented; it has some decided merits over the algorithm of Slagle et al., which provided the immediate stimulus for this work.
Abstract: An efficient algorithm for data reorganization and clustering is presented. This algorithm has some decided merits over the algorithm of Slagle et al., which provided the immediate stimulus for this work.

62 citations


01 Jan 1976
TL;DR: The main clustering algorithms are presented according to their symbolic descriptions: hierarchies, minimum spanning trees, partitions and their representations; special attention is given to the dynamic cluster method, which adaptively handles a partition into clusters together with a set of symbolic representations of the clusters.
Abstract: Cluster analysis is one of the Pattern Recognition techniques and should be appreciated as such. It may be characterized by the use of resemblance or dissemblance measures between the objects to be identified; the significance of such measures is evaluated and examples are given. After presenting a general model of clustering techniques, the general properties of a cluster, of a clustering operator and of a clustering model are examined. The goal of a cluster analysis is usually to obtain a symbolic description of the problem and, from this, an identification procedure. The main clustering algorithms are presented according to their symbolic descriptions: hierarchies, minimum spanning trees, partitions and their representations. Special attention is given to the dynamic cluster method, which adaptively handles a partition into clusters together with a set of symbolic representations of the clusters. Examples are given of this method when the symbolic representation is either a kernel, a probability law or a linear manifold. Results are given for the dynamic cluster algorithm when the distance measure may also be modified within a class of distances. Some general conclusions and proposals for research are given.
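The dynamic cluster method alternates between assigning objects to the representation they resemble most and recomputing each cluster's representation. The sketch below is a toy version in which the representation is a kernel of m points, assignment uses the mean distance to the kernel, and the kernel is re-chosen as the m members nearest the cluster mean; these specific rules are illustrative assumptions, and the method as described also admits probability laws or linear manifolds as representations.

```python
import numpy as np

def dynamic_clusters(X, k, m=3, n_iter=20, seed=0):
    """Toy dynamic-cluster alternation with multi-point kernels."""
    rng = np.random.default_rng(seed)
    kernels = [X[rng.choice(len(X), m, replace=False)] for _ in range(k)]
    for _ in range(n_iter):
        # assignment: mean distance from every point to every kernel
        d = np.stack([np.linalg.norm(X[:, None] - K[None], axis=2).mean(axis=1)
                      for K in kernels], axis=1)
        labels = d.argmin(axis=1)
        # representation: new kernel = the m members nearest the cluster mean
        for j in range(k):
            members = X[labels == j]
            if len(members):
                order = np.argsort(np.linalg.norm(members - members.mean(axis=0), axis=1))
                kernels[j] = members[order[:m]]
    return labels, kernels
```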

Journal ArticleDOI
TL;DR: The behaviour of agglomerative clustering methods while character weights are adjusted in this way is investigated; the results suggest that concept formation as done in classical taxonomy may be modelled as a feedback of more global information, obtained from local similarities, onto the evaluation of those local similarities.


01 Jan 1976
TL;DR: This paper deals with methods found to be practical and useful in the evaluation of multi-component neutron activation analyses and related studies of archaeological artifacts at Brookhaven National Laboratory.
Abstract: The accumulation in various laboratories of large numbers of multi-component analyses of archaeological artifacts has required the development of increasingly sophisticated methods for intercomparing these data and analyzing them statistically. A number of different methods of both clustering of specimens into groups and multivariate evaluation of group membership are possible. This paper deals with methods found to be practical and useful in the evaluation of multi-component neutron activation analyses and related studies of archaeological artifacts at Brookhaven National Laboratory. The methods were applied most extensively and successfully to data on pottery and related clays. The subject is treated under the following topics: the use of log-normal distributions; clustering methods; preliminary univariate, element-by-element evaluation of the groups indicated by clustering; multivariate probability calculations; the need for multivariate data handling; use of characteristic vectors of the variance-covariance matrix; standardized multivariate coordinates; the handling of missing data; and auxiliary programs. It is felt that multivariate techniques must ultimately be employed to resolve a set of data fully, but that much can be accomplished by simpler element-by-element methods.
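The multivariate probability calculations mentioned above are commonly done by scoring a specimen against a group's centroid and variance-covariance matrix. The sketch below, assuming a specimens-by-elements concentration array, log-transforms the data (consistent with the log-normal assumption), computes the squared Mahalanobis distance of a candidate specimen to the group, and converts it to a chi-square tail probability; it illustrates the general approach rather than the Brookhaven programs themselves.

```python
import numpy as np
from scipy.stats import chi2

def membership_probability(group_concentrations, candidate):
    """Rough probability that `candidate` belongs to the reference group,
    via Mahalanobis distance on log10-transformed element concentrations."""
    G = np.log10(group_concentrations)          # specimens x elements
    x = np.log10(candidate)
    diff = x - G.mean(axis=0)
    cov = np.cov(G, rowvar=False)
    d2 = diff @ np.linalg.solve(cov, diff)      # squared Mahalanobis distance
    return chi2.sf(d2, df=G.shape[1])           # tail probability
```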

01 Jan 1976
TL;DR: The specification, analysis and evaluation of some routing and topology design procedures for large store-and-forward packet-switched computer communication networks are described; the procedures rely on a hierarchical clustering of the network nodes.
Abstract: This research deals with the specification, analysis and evaluation of some routing and topology design procedures for large store-and-forward packet-switched computer communication networks. The procedures studied are an extension of present techniques and rely on a hierarchical clustering of the network nodes.

Journal ArticleDOI
TL;DR: Clustering by the unweighted pair group method, using the phi coefficient, is recommended for the analysis of biostratigraphic and paleoecologic presence-absence data.
Abstract: Binary coefficients can be assigned to several categories on the basis of algebraic and conceptual properties. The phi coefficient of association is related algebraically to the chi-square statistic for 2-by-2 contingency tables, and use of this coefficient in cluster analysis permits the objective, nonarbitrary partitioning of objects among groups on the basis of previously selected levels of significant, positive association. Similarity, matching, and distance coefficients possess neither conceptual nor operational statistical meaning for many geological data sets. The weighted pair group method and flexible clustering strategy may give an overly conservative partitioning of objects among groups. Clustering by the unweighted pair group method, using the phi coefficient, is recommended for the analysis of biostratigraphic and paleoecologic presence-absence data.
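The recommended combination is easy to reproduce: the phi coefficient for a 2-by-2 presence/absence table is (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d)), and the unweighted pair group method corresponds to 'average' linkage in SciPy. The sketch below, on random stand-in presence/absence data, clusters taxa on 1 - phi so that strong positive association becomes small distance.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

def phi_coefficient(x, y):
    """Phi coefficient of association for two presence/absence vectors."""
    a = np.sum((x == 1) & (y == 1))
    b = np.sum((x == 1) & (y == 0))
    c = np.sum((x == 0) & (y == 1))
    d = np.sum((x == 0) & (y == 0))
    denom = np.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

taxa = np.random.default_rng(1).integers(0, 2, size=(8, 30))   # 8 taxa x 30 samples
n = len(taxa)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = 1.0 - phi_coefficient(taxa[i], taxa[j])
Z = linkage(squareform(dist), method='average')    # UPGMA dendrogram
```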


Journal ArticleDOI
TL;DR: The paper presents an exposition of two data reduction methods widely used in the behavioral sciences and commonly referred to as the single-link and complete-link hierarchical clustering procedures.
Abstract: The paper presents an exposition of two data reduction methods widely used in the behavioral sciences and commonly referred to as the single-link and complete-link hierarchical clustering procedures. Major emphasis is placed on several statistical techniques for evaluating the adequacy of a completed partition hierarchy and, in particular, the individual partitions within the sequence. A numerical reanalysis of some previously published data is included as an illustration of the suggested methodology.

Journal ArticleDOI
TL;DR: The algorithm, based on the ISODATA technique, calculates all required thresholds from the actual data, thus eliminating a priori estimates; an empirical derivation of the set of rules for calculating these parameters is presented.
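ISODATA-style clustering alternates assignment to the nearest centre with splitting of over-dispersed clusters and merging of centres that sit too close together. The sketch below derives its split and merge thresholds from the data, in the spirit of the TL;DR, but the particular rules used (split when a large cluster's per-feature spread exceeds the overall spread, merge centres closer than half the median centre spacing) are illustrative guesses, not the paper's rule set.

```python
import numpy as np

def isodata_like(X, k_init=4, n_iter=8, seed=0):
    """ISODATA-flavoured clustering with data-derived split/merge thresholds."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), k_init, replace=False)]
    overall_spread = X.std(axis=0).max()
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centres[None], axis=2).argmin(axis=1)
        groups = [X[labels == j] for j in range(len(centres)) if np.any(labels == j)]
        new_centres = []
        for g in groups:
            spread = g.std(axis=0)
            if len(g) > 4 and spread.max() > overall_spread:        # split
                axis = spread.argmax()
                cut = g[:, axis].mean()
                new_centres += [g[g[:, axis] <= cut].mean(axis=0),
                                g[g[:, axis] > cut].mean(axis=0)]
            else:
                new_centres.append(g.mean(axis=0))
        centres = np.array(new_centres)
        # merge centres that sit closer than half the median centre spacing
        d = np.linalg.norm(centres[:, None] - centres[None], axis=2)
        pairs = np.triu_indices(len(centres), 1)
        spacing = np.median(d[pairs]) if len(centres) > 1 else 0.0
        keep = np.ones(len(centres), dtype=bool)
        for i, j in zip(*pairs):
            if keep[i] and keep[j] and d[i, j] < 0.5 * spacing:
                centres[i] = (centres[i] + centres[j]) / 2
                keep[j] = False
        centres = centres[keep]
    labels = np.linalg.norm(X[:, None] - centres[None], axis=2).argmin(axis=1)
    return labels, centres
```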

Journal ArticleDOI
TL;DR: In this paper, the authors considered an infinite dynamical system idealized as a C*-algebra acted upon by time-translation automorphisms and showed that a stationary state of such a system which is stable for local perturbations of the dynamics and is clustering in time, either gives rise to a one-sided energy spectrum or is a KMS state.
Abstract: We consider as in [1] an infinite dynamical system idealized as a C*-algebra acted upon by time-translation automorphisms. We show that a stationary state of such a system which is stable for local perturbations of the dynamics and is clustering in time, either gives rise to a one-sided energy spectrum or is a KMS state. The clustering property assumed here is weaker than the one assumed in [1]. The new proof makes explicit use of spectral properties of clustering states.

01 Oct 1976
TL;DR: The DBC is one of the first database machines with built-in protection mechanisms for access control and clustering mechanisms for performance enhancement.
Abstract: The database computer (DBC) is a specialized back-end computer which is capable of managing databases of 10^9 to 10^10 bytes in size and supporting known data models such as relational, network, hierarchical and attribute-based models. In addition to its intended purpose of handling large databases and interfacing with various data models, the DBC is one of the first database machines with built-in protection mechanisms for access control and clustering mechanisms for performance enhancement.

Proceedings ArticleDOI
13 Oct 1976
TL;DR: Clustering analysis is introduced as a tool for attacking two problems in the use of computerized data base systems, and it detected an error in the book “Weyer's Warships of the World 1969.”
Abstract: Two problems in the use of computerized data base systems are: (1) How can we estimate the values for missing data? (2) How can we improve data integrity, that is, reduce the number of errors in the data? The tool that we introduce to attack these problems is clustering analysis. Experimental results indicate that our method is feasible. Our algorithm detected an error in the book “Weyer's Warships of the World 1969.” Each of the approximately 2000 warships listed in the book has 18 variables associated with it. It would be difficult for a person to find errors in the book. Our methods do not require any a priori knowledge about the data, for example, about warships.
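A simple way to use cluster structure for the first problem (missing data) is to cluster the records and fill each gap with the mean of its column within the record's own cluster. The sketch below, using NaN for missing entries and SciPy's k-means, is a minimal illustration of that idea rather than the authors' algorithm; records that lie far from their cluster's profile could similarly be flagged for the second (data-integrity) problem.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def impute_by_cluster(X, k=5, seed=0):
    """Fill missing entries (NaN) with the per-cluster column mean, falling
    back to the global column mean when a cluster has no observed value."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    filled = np.where(np.isnan(X), col_means, X)   # provisional fill for clustering
    _, labels = kmeans2(filled, k, seed=seed, minit='++')
    out = X.copy()
    for j in range(k):
        rows = labels == j
        if not rows.any():
            continue
        cluster_means = np.nanmean(X[rows], axis=0)
        cluster_means = np.where(np.isnan(cluster_means), col_means, cluster_means)
        out[rows] = np.where(np.isnan(out[rows]), cluster_means, out[rows])
    return out
```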

01 Jan 1976
TL;DR: From preliminary tests it appears that the CLUSTR classifier is as accurate as the Bayes maximum likelihood decision rule and may be useful for proportion estimation, especially in cases where ground truth is limited, where there are variations in the data, or where conventional signature extraction is difficult.
Abstract: Conventional classification procedures have several difficulties which sometimes limit the usefulness of computer-aided analysis techniques on multispectral scanner data. In order to minimize some of these problems, the clustering algorithm used at ERIM (called CLUSTR) was adapted for use as a classifier. Briefly, the technique devised is to cluster the scene, assigning each pixel to a cluster, and then to identify the crop type of the clusters by examining training areas to determine the crop type of pixels assigned to each cluster. In this manner, the classification of each pixel to a particular crop class is accomplished. This approach to classification has several advantages over more conventional classification techniques. Among these advantages are: 1) CLUSTR is designed to use several small normal distributions (clusters) to approximate the non-gaussian spectral distributions of the various ground classes, thus minimizing problems with non-gaussian distributions. 2) CLUSTR continually updates its estimate of the various spectral distributions, including modifying the means, variances and even the number of clusters as the distributions in the data change. This minimizes the effects of most variations in the data. 3) Problems stemming from the inability to obtain representative training data are reduced, because all of the data is used in constructing the signatures, instead of just the data from the training areas. 4) Inaccuracies in the ground truth for training areas are less important than in conventional techniques; e.g., you do not cluster the wheat training regions and call all resulting clusters "wheat" even if they look like corn, instead you cluster the entire scene and only those clusters which have more "wheat" pixels assigned to them than "other" pixels are identified as "wheat". With conventional techniques, all pixels must be correctly identified. 5) Human participation in the signature extraction and classification procedures is reduced, because they are combined into one step. From preliminary tests it appears that the CLUSTR classifier is as accurate as the Bayes maximum likelihood decision rule and may be useful for proportion estimation, especially in cases where ground truth is limited, where there are variations in the data, or where conventional signature extraction is difficult. (The effort described was supported by the Earth Observations Division of the NASA/Johnson Space Center under contract NAS9-14123.)
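The cluster-then-identify step described above can be sketched compactly: cluster every pixel in the scene, then give each cluster the class that is most frequent among the training pixels falling into it. The code below uses plain k-means as a stand-in for CLUSTR (which adapts means, variances and even the number of clusters as it scans the data); the pixel, index and label arrays are hypothetical inputs.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def cluster_then_label(pixels, train_idx, train_labels, k=20, seed=0):
    """Cluster the whole scene, then label each cluster by majority vote of
    the training pixels assigned to it (clusters with no training pixels
    stay unlabelled as -1); returns a per-pixel class array."""
    _, assign = kmeans2(pixels, k, seed=seed, minit='++')
    cluster_class = np.full(k, -1)
    for c in range(k):
        mask = assign[train_idx] == c
        if mask.any():
            vals, counts = np.unique(train_labels[mask], return_counts=True)
            cluster_class[c] = vals[counts.argmax()]
    return cluster_class[assign]
```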

Journal ArticleDOI
TL;DR: A design algorithm based on clustering of terminals, followed by optimization of the location, capacity and number of concentrators in each cluster, is developed and evaluated and is shown to have several advantages over the popular Add algorithm.
Abstract: Cost-effective design of networks linking many remote terminals to a central computer (CPU) involves use of low-speed data lines to link geographically close terminals to concentrators. The concentrators are connected via high-speed data lines to the CPU. A design algorithm based on clustering of terminals followed by optimization of location, capacity and number of concentrators in each cluster is developed and evaluated. Evaluation is based on network designs for sets of 20 randomly (uniformly) generated locations of up to 500 terminals, with specific (realistic) cost versus capacity schedules being used for data lines and concentrators. In comparison with the popular Add algorithm, our linear regression clustering (LRC) algorithm has the following advantages: 1) the total cost of the concentrators, low-speed terminal lines, and high-speed CPU lines is typically 8 percent less; 2) the average transmission time delay at the terminals is typically 40 percent less; 3) the cost of adding low-speed data lines to connect additional terminals to concentrators in existing networks is typically 50 percent less; 4) the computational cost of design is typically 20 times less for 100-terminal networks and 150 times less for 500-terminal networks. Implications of the results and suggestions for further work are discussed.

01 Jan 1976
TL;DR: In this paper, four modifications of agglomerative hierarchic non-overlapping clustering methods were used on a previously published material of south Finnish peatland vegetation: nearest-neighbour, furthest neighbor, group average and minimum variance.
Abstract: Four modifications of agglomerative hierarchic non-overlapping clustering methods were used on a previously published material of south Finnish peatland vegetation: nearest-neighbour, furthest-neighbour, group average and minimum variance. The dendrogram of the nearest-neighbour clustering proved unsatisfactory, while the others were in reasonably good agreement with the earlier subjective classification. Ordination of mire types was carried out with factor analysis. The resulting two-dimensional diagram was in concordance with the agglomerative clustering. Factor analysis was found to be a useful method for the summarization of the major structure of the material studied, while for a detailed classification the hierarchic clustering was preferable. The results of clustering and ordination suggest that wooded fens should be divided into several major types. The position of some transitional mire types is also discussed.
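The four strategies named here map directly onto standard linkage methods (nearest-neighbour = single, furthest-neighbour = complete, group average = average, minimum variance = Ward). The sketch below runs all four on a random stand-in for a stands-by-species table and reports the cophenetic correlation as a rough indication of how faithfully each dendrogram reflects the input distances.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

X = np.random.default_rng(0).random((25, 12))   # 25 stands x 12 species (stand-in)
d = pdist(X)
for name, method in [("nearest-neighbour", "single"),
                     ("furthest-neighbour", "complete"),
                     ("group average", "average"),
                     ("minimum variance", "ward")]:
    Z = linkage(d, method=method)
    c, _ = cophenet(Z, d)   # cophenetic correlation as a rough fit measure
    print(f"{name:20s} cophenetic correlation = {c:.2f}")
```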

Journal ArticleDOI
TL;DR: It is shown, through examples of writings, that clustering of optical data (in particular recognition of scriptors) is achieved in a 2-D Karhunen-Loève space, while a 3-D space describes the inner evolution of data within a class, allowing the dating of texts.
Abstract: A Fourier transform gives a first dimensionally reduced description of optical data, but it is not sensitive to the statistical variations that characterize class properties and allow clustering and statistical recognition. A Karhunen-Loeve transform of the Fourier spectra leads to a space better suited to classification: it is shown, through examples of writings, that clustering of optical data (especially recognition of scriptors) is achieved in a 2-D Karhunen-Loeve space. The inner evolution of data belonging to a given class is described in a 3-D KL space, allowing the dating of texts.
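The pipeline described (a Fourier transform followed by a Karhunen-Loève transform of the spectra) corresponds to taking Fourier-magnitude features and projecting them onto the leading principal components. A minimal sketch, with image arrays as hypothetical input:

```python
import numpy as np

def fourier_kl_features(images, n_components=2):
    """Describe each image by the magnitude of its 2-D Fourier spectrum, then
    project the spectra onto the leading Karhunen-Loeve (principal component)
    axes, e.g. a 2-D space for clustering or a 3-D space for finer structure."""
    spectra = np.array([np.abs(np.fft.fft2(im)).ravel() for im in images])
    spectra -= spectra.mean(axis=0)                          # centre the spectra
    _, _, vt = np.linalg.svd(spectra, full_matrices=False)   # KL axes via SVD
    return spectra @ vt[:n_components].T                     # n_images x n_components
```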

Journal ArticleDOI
TL;DR: In this article, an unsupervised classification scheme for the production of thematic maps from LANDSAT imagery is described, which can be implemented in an interactive mode on a timesharing computer system.
Abstract: SUMMARYThis paper describes an unsupervised classification scheme for the production of thematic maps from LANDSAT imagery. In this method, spectrally separable classes are identified from a four-dimensional histogram generated from a portion of a LANDSAT image. The scheme is very rapid, memory requirements are modest, and it can be implemented in an interactive mode on a timesharing computer system. Test results from an agricultural area near Melfort, Saskatchewan, exhibit accuracies of the classifications comparable to those obtained by supervised methods.

Journal ArticleDOI
TL;DR: In this paper, the use of cluster analysis to aid in empirically validating the course objectives of an industrial training curriculum is described, and two major findings emerged from the data analysis: the discovery of an important job activity missing from the curriculum and the identification of sizeable groups of workers with distinctly different training needs.
Abstract: This paper describes the use of cluster analysis to aid in empirically validating the course objectives of an industrial training curriculum. For instance, clusters of basic training needs were found and compared against existing course contents. Two major findings emerged from the data analysis: the discovery of an important job activity missing from the curriculum and the identification of sizeable groups of workers with distinctly different training needs. Considerable attention is also devoted to the statistical aspects of clustering which were important in obtaining these results.

Journal ArticleDOI
TL;DR: The results of field center pixel classification using MASC-extended signatures have been compared with classification results using untransformed signatures, and in all three data set pairs the MASC algorithm yielded very good results.
Abstract: A new signature extension method for use with LANDSAT data has been developed. The MASC (Multiplicative and Additive Signature Correction) algorithm uses an unsupervised clustering routine to gain relative information from two data sets. This information is then used to map the signatures derived from one data set onto the other data set. The MASC algorithm can be totally automated, thus making it suitable for use in large area crop inventories. This signature extension method has been tested on agricultural LANDSAT data. The results of field center pixel classification using MASC-extended signatures have been compared with classification results using untransformed signatures. In all three data set pairs the MASC algorithm yielded very good results.
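MASC derives a multiplicative and an additive correction per band from an unsupervised clustering of the two data sets. The sketch below approximates the idea: cluster both scenes, pair cluster means band by band (here simply by sorting, which is an illustrative assumption rather than MASC's correspondence rule), and fit a least-squares line whose slope and intercept give the gain and offset applied to the signatures.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def masc_like_correction(ref_scene, target_scene, k=8, seed=0):
    """Estimate per-band gain and offset mapping reference-scene signatures
    onto the target scene, from paired cluster means of the two scenes."""
    ref_means, _ = kmeans2(ref_scene, k, seed=seed, minit='++')
    tgt_means, _ = kmeans2(target_scene, k, seed=seed, minit='++')
    n_bands = ref_scene.shape[1]
    gain, offset = np.empty(n_bands), np.empty(n_bands)
    for b in range(n_bands):
        x = np.sort(ref_means[:, b])                 # crude pairing by sorted means
        y = np.sort(tgt_means[:, b])
        gain[b], offset[b] = np.polyfit(x, y, 1)     # y ~ gain*x + offset
    return gain, offset   # apply to signatures: mean' = gain*mean + offset
```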

Journal ArticleDOI
TL;DR: Factor analysis provided a clearer indication of relationships among taxa of the Limnanthaceae than did conventional clustering analysis; apparently, ordination of taxa in three dimensions is necessary to express accurately the relationships between divergent evolutionary lines of the family.
Abstract: The Limnanthaceae is a small family of North American herbs with uncertain internal relationships. Taxa of the family were compared on the basis of 46 flavonol glycosides occurring in all plant tissues or in petals only. The flavonoid data were analyzed by three numerical taxonomic techniques: 1) clustering by the Weighted Pair Group method considering positive and negative matches (using the Simple Matching coefficient); 2) clustering by the Weighted Pair Group method considering only positive matches (using the Jaccard coefficient); 3) Varimax Factor Analysis with rotation. No significantly different results were produced by the two clustering methods. Factor analysis provided a clearer indication of relationships among taxa of the Limnanthaceae than did conventional clustering analysis. Apparently, ordination of taxa in three dimensions is necessary to express accurately the relationships between divergent evolutionary lines of the family, since much of the expressed variation in flavonoids cannot be accounted for by a one-dimensional clustering technique.
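The difference between the two binary coefficients used here is whether joint absences count as agreement. A tiny worked example, with made-up presence/absence scores for ten flavonol glycosides:

```python
import numpy as np

def simple_matching(x, y):
    """Fraction of characters on which the two taxa agree
    (positive and negative matches both count)."""
    return np.mean(x == y)

def jaccard(x, y):
    """Shared positives over the union of positives; joint absences ignored."""
    both = np.sum((x == 1) & (y == 1))
    either = np.sum((x == 1) | (y == 1))
    return both / either if either else 0.0

# two taxa scored for presence (1) / absence (0) of 10 compounds (made-up data)
a = np.array([1, 1, 0, 0, 1, 0, 0, 0, 1, 0])
b = np.array([1, 0, 0, 0, 1, 0, 0, 0, 1, 1])
print(simple_matching(a, b), jaccard(a, b))   # 0.8 and 0.6
```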