Showing papers on "Cluster analysis published in 1975"


Book
01 Feb 1975

6,068 citations


Journal ArticleDOI
TL;DR: Applications of gradient estimation to pattern recognition are presented using clustering and intrinsic dimensionality problems, with the ultimate goal of providing further understanding of these problems in terms of density gradients.
Abstract: Nonparametric density gradient estimation using a generalized kernel approach is investigated. Conditions on the kernel functions are derived to guarantee asymptotic unbiasedness, consistency, and uniform consistency of the estimates. The results are generalized to obtain a simple mean-shift estimate that can be extended in a k-nearest-neighbor approach. Applications of gradient estimation to pattern recognition are presented using clustering and intrinsic dimensionality problems, with the ultimate goal of providing further understanding of these problems in terms of density gradients.

3,125 citations
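
The mean-shift estimate named in the abstract above can be illustrated with a minimal sketch. It assumes a flat (uniform) kernel of fixed radius, which is only one of the kernels the generalized approach allows; the function name `mean_shift_point`, the `bandwidth` parameter, and the toy data are hypothetical and not taken from the paper.

```python
import math

def mean_shift_point(x, data, bandwidth, max_iter=100, tol=1e-6):
    """Move x uphill on a flat-kernel density estimate until it stops moving."""
    x = list(x)
    for _ in range(max_iter):
        # Sample points inside the window of radius `bandwidth` around x.
        window = [p for p in data if math.dist(p, x) <= bandwidth]
        if not window:
            break
        # Sample mean of the window; (mean - x) is the mean-shift vector,
        # which points in the direction of the estimated density gradient.
        mean = [sum(c) / len(window) for c in zip(*window)]
        shift = math.dist(mean, x)
        x = mean
        if shift < tol:
            break
    return x

# Toy data (assumed): two well-separated groups in the plane.
data = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.2), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
modes = [mean_shift_point(p, data, bandwidth=1.0) for p in data]
```

Points whose trajectories end at the same location can be treated as one cluster, which is how a gradient estimate becomes a clustering procedure.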


Journal ArticleDOI
TL;DR: The CONCOR procedure is applied to several illustrative sets of social network data and is found to give results that are highly compatible with analyses and interpretations of the same data using the blockmodel approach of White.

750 citations


Journal ArticleDOI

462 citations


Journal ArticleDOI
TL;DR: In this article, the performance of six hierarchical clustering methods (given by one algorithm, Wishart [1969]) is compared on bivariate and multivariate normal Monte Carlo samples.
Abstract: The performance of six hierarchical clustering methods (given by one algorithm, Wishart [1969]) is compared on bivariate and multivariate normal Monte Carlo samples. The methods are stopped with the correct number of clusters and compared with respect to correct classification (placing pairs of points in the same or different clusters correctly or incorrectly) and with each other (both methods agree or disagree in placing a pair of points in the same or differing clusters).

255 citations
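
The pair-based comparison described in the abstract above (whether two partitions agree on placing each pair of points together or apart) can be sketched as follows. The function name and the example labels are hypothetical; the fraction computed here is the pair-counting agreement commonly known as the Rand index, and whether the paper reports exactly this ratio is an assumption.

```python
from itertools import combinations

def pair_agreement(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]  # pair together in partition A?
        same_b = labels_b[i] == labels_b[j]  # pair together in partition B?
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)

# Assumed toy example: the true grouping versus a clustering that misplaces one point.
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 1, 1]
score = pair_agreement(truth, found)
```

The same function serves both comparisons in the abstract: a method against the true classification, or one method against another.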


Journal ArticleDOI
TL;DR: The concept of power for monotone invariant clustering procedures is developed via the possible partitions of objects at each iteration level in the obtained hierarchy, and the probability of rejecting the randomness hypothesis is obtained empirically for the possible types of partitions of the n objects employed.
Abstract: The concept of power for monotone invariant clustering procedures is developed via the possible partitions of objects at each iteration level in the obtained hierarchy. At a given level, the probability of rejecting the randomness hypothesis is obtained empirically for the possible types of partitions of the n objects employed. The results indicate that the power of a particular hierarchical clustering procedure is a function of the type of partition. The additional problem of estimating a “true” partition at a certain level of a hierarchy is discussed briefly.

240 citations


Journal ArticleDOI
TL;DR: An algorithm is described for generating fuzzy partitions which extremize a fuzzy extension of the k-means squared-error criterion function on finite data sets X, and the behavior of the algorithm is compared with that of the ordinary ISODATA clustering process and the maximum likelihood method.
Abstract: An algorithm is described for generating fuzzy partitions which extremize a fuzzy extension of the k-means squared-error criterion function on finite data sets X. It is shown how this algorithm may be applied to the problem of estimating the parameters (a priori probabilities, means, and covariances) of a mixture of multivariate normal densities, given a finite sample X drawn from the mixture. The behavior of the algorithm is compared with that of the ordinary ISODATA clustering process and the maximum likelihood method, for a specific bivariate mixture.

236 citations
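
A fuzzy extension of the k-means criterion, as described in the abstract above, can be sketched with the widely used fuzzy c-means update rules. This is a minimal one-dimensional illustration under stated assumptions: the fuzzifier `m = 2`, the fixed iteration count, the function name, and the toy data are all hypothetical, and the paper's own algorithm may differ in detail.

```python
def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and center updates for 1-D data."""
    centers = list(centers)
    memberships = []
    for _ in range(iters):
        memberships = []
        for x in points:
            # Distances to every center, clamped to avoid division by zero.
            d = [max(abs(x - c), 1e-12) for c in centers]
            # Membership of x in cluster k: 1 / sum_j (d_k / d_j)^(2 / (m - 1)).
            row = [1.0 / sum((dk / dj) ** (2.0 / (m - 1.0)) for dj in d) for dk in d]
            memberships.append(row)
        # Each center becomes the mean of the data weighted by membership^m.
        centers = [
            sum(u[k] ** m * x for u, x in zip(memberships, points))
            / sum(u[k] ** m for u in memberships)
            for k in range(len(centers))
        ]
    # Memberships are those computed just before the final center update.
    return centers, memberships

# Assumed toy sample with two groups near 1 and 5.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centers, memberships = fuzzy_c_means(points, centers=[0.0, 6.0])
```

Unlike hard k-means, every point keeps a graded membership in every cluster, and each row of `memberships` sums to one.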


Journal ArticleDOI
TL;DR: A clustering algorithm based on the branch and bound method of combinatorial optimization determines the globally optimum classification and is computationally efficient.
Abstract: The problem of clustering N objects into M classes may be viewed as a combinatorial optimization problem. In the literature on clustering, iterative hill-climbing techniques are used to find a locally optimum classification. In this paper, we develop a clustering algorithm based on the branch and bound method of combinatorial optimization. This algorithm determines the globally optimum classification and is computationally efficient.

215 citations


Proceedings Article
03 Sep 1975
TL;DR: A self-scaling local edge detector that can be applied in parallel on a picture is described; clustering algorithms and sequential boundary-following algorithms process the edge data to locate images of objects and generate a data structure that represents the imaged objects.
Abstract: A solution to the problem of automatic location of objects in digital pictures by computer is presented. A self-scaling local edge detector which can be applied in parallel on a picture is described. Clustering algorithms and boundary-following algorithms, which are sequential in nature, process the edge data to locate images of objects and generate a data structure that represents the imaged objects.

163 citations


Journal ArticleDOI
J.A. Hartigan1
TL;DR: Methods of estimating multivariate densities may be converted to clustering techniques, and clustering techniques may be helpful in estimating multivariate densities.
Abstract: One model for clusters in multivariate data is that the data are sampled from a density with many modes, one mode for each cluster. Methods of estimating multivariate densities may therefore be converted to clustering techniques, and clustering techniques may be helpful in estimating multivariate densities. Graphical techniques for representing clusters are closely related to multivariate histograms. Block histograms in two dimensions are constructed by finding a rectangle of minimum area containing a fixed number of points, deleting this rectangle and the points it contains, then finding another rectangle of minimum area containing a fixed number of points and so on. These histograms are simple visual representations of a density estimate in two dimensions. Analogous block histograms in many dimensions are useful but more difficult to represent graphically. A different approach represents each point by a box drawn in three or more dimensions. If the points are first ordered by some other clustering techn...

151 citations


Proceedings ArticleDOI
22 Sep 1975
TL;DR: A metric with which to measure the similarity of usage among data items is developed and used by a clustering algorithm to reduce the space of alternative designs to a point where solution is economically feasible.
Abstract: The physical structure and relative placement of information elements within a data base is critical for the efficient design of a computerized information system which is shared by a community of users. Traditionally the selection among alternative structural designs has been handled largely via heuristics. Recent research has shown that a number of significant design problems can be stated mathematically as nonlinear, integer, zero-one programming problems. In concept, therefore, mathematical programming algorithms can be used to determine "optimal" data base designs. In practice, one finds that realistic problems of even modest size are computationally infeasible. This paper presents a means for overcoming this difficulty in the design of data base records. A metric with which to measure the similarity of usage among data items is developed and used by a clustering algorithm to reduce the space of alternative designs to a point where solution is economically feasible.


Journal ArticleDOI
01 Jan 1975
TL;DR: A clustering and data-reorganizing algorithm based on the concept of the shortest spanning path of a graph is given that can be used to reorganize and/or cluster a large file of data.
Abstract: A clustering and data-reorganizing algorithm based on the concept of the shortest spanning path of a graph is given. This algorithm can be used to reorganize and/or cluster a large file of data.

Journal ArticleDOI
TL;DR: A variety of retrieval strategies applied to this hierarchy are evaluated in terms of effectiveness and efficiency and comparisons are made between these results and those of similar experiments in document clustering on the Smart project.
Abstract: The single-link cluster method is used to construct a hierarchic classification for the 1400 documents in the Cranfield test collection. A variety of retrieval strategies applied to this hierarchy are evaluated in terms of effectiveness and efficiency. Comparisons are made between our results and those of similar experiments in document clustering on the Smart project.
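
The single-link method used to build the hierarchy can be sketched with a naive agglomeration over a dissimilarity matrix. This is an illustrative cubic-time version, not the procedure used on the Cranfield collection; the function name and the four-object matrix are hypothetical.

```python
def single_link(dist):
    """Naive single-link agglomeration; returns merges as (level, cluster_a, cluster_b)."""
    clusters = [frozenset([i]) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: cluster dissimilarity is the smallest
                # member-to-member dissimilarity.
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((d, sorted(clusters[a]), sorted(clusters[b])))
        merged = clusters[a] | clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Assumed toy dissimilarities: four objects at positions 0, 1, 3, 7 on a line.
positions = [0, 1, 3, 7]
dist = [[abs(p - q) for q in positions] for p in positions]
merges = single_link(dist)
levels = [level for level, _, _ in merges]
```

The sequence of merge levels defines the hierarchy; a retrieval strategy can then search it top-down or bottom-up.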

Journal ArticleDOI
TL;DR: The concept of "neighborhood of a point" has been used in various programs for analyzing spatial dot patterns and an alternative definition is proposed to reflect the intuitive cluster associations of certain points in simple patterns more satisfactorily.
Abstract: The concept of "neighborhood of a point" has been used in various programs for analyzing spatial dot patterns. The common definition based on k-nearest neighbors does not reflect the intuitive cluster associations of certain points in simple patterns. An alternative definition is proposed to reflect such associations more satisfactorily. Its applications include cluster analysis and descriptive measures for dot patterns.


Journal ArticleDOI
TL;DR: Transformations which map noisy feature points originating from the same curve in a picture into dense regions are considered and their properties are treated as they relate to subsequent clustering to detect curves in the original picture.

Journal ArticleDOI
TL;DR: In this paper, a spatial clustering procedure applicable to multispectral image data is discussed, which takes into account the spatial distribution of the measurements as well as their distribution in measurement space.
Abstract: A spatial clustering procedure applicable to multispectral image data is discussed. The procedure takes into account the spatial distribution of the measurements as well as their distribution in measurement space. The procedure calls for the generation and then thresholding of the gradient image, cleaning the thresholded image, labeling the connected regions in the cleaned image, and clustering the labeled regions. An experiment was carried out on ERTS data in order to study the effect of the selection of the gradient image, the threshold, and the cleaning process. Three gradients, three gradient thresholds, and two cleaning parameters yielded 18 gradient-threshold-cleaning combinations. The combination that yielded connected homogeneous regions with the smallest variance was the Roberts gradient with distance 2, thresholded by its running mean, and a cleaning process that considered a resolution cell to be homogeneous if and only if at least 7 of its nearest neighbors were homogeneous.

01 Jan 1975
TL;DR: Three approaches for analyzing Landsat-1 data from Ludwig Mountain in the San Juan Mountain range in Colorado are considered.

Journal ArticleDOI
TL;DR: An iterative, non-hierarchical, divisive, polythetic method of clustering phytosociological data is described and an efficient means of revising the results is presented.
Abstract: An iterative, nonhierarchical, divisive, polythetic method of clustering phytosociological data is described and an efficient means of revising the results is presented. The procedures are applied to two sets of forest vegetation in western Washington (U.S.A.) and found to produce ecologically interpretable results. The clustering technique, ISODATA, produces clusters in n-species space on the basis of distance between cluster means. The iterations are truncated much sooner than is recommended by its originators, but the provisional clusters are analyzed by stepwise multiple discriminant analysis. This produces an effective and efficient analysis of the individual contribution of species to the classification and arranges the samples in canonical space, the axes of which are mutually orthogonal. This last procedure results in the clusters displayed in a reduced dimensional space and establishes the relationship between stands. The combination of procedures appears superior to several with which it was compared.

01 Jan 1975
TL;DR: In this article, three approaches for analyzing Landsat-1 data from Ludwig Mountain in the San Juan Mountain range in Colorado are considered: supervised, nonsupervised, and modified-supervised.
Abstract: Three approaches for analyzing Landsat-1 data from Ludwig Mountain in the San Juan Mountain range in Colorado are considered. In the 'supervised' approach the analyst selects areas of known spectral cover types and specifies these to the computer as training fields. Statistics are obtained for each cover type category and the data are classified. Such classifications are called 'supervised' because the analyst has defined specific areas of known cover types. The second approach uses a clustering algorithm which divides the entire training area into a number of spectrally distinct classes. Because the analyst need not define particular portions of the data for use but has only to specify the number of spectral classes into which the data is to be divided, this classification is called 'nonsupervised'. A hybrid method which selects training areas of known cover type but then uses the clustering algorithm to refine the data into a number of unimodal spectral classes is called the 'modified-supervised' approach.

Journal ArticleDOI
TL;DR: In this article, a class of related nonmetric (monotone invariant) hierarchical grouping methods is presented, defined in terms of generalized cliques, based on a systematically varying specification of the degree of indirectness of permitted relationships (i.e., degree of "chaining").
Abstract: A class of related nonmetric (“monotone invariant”) hierarchical grouping methods is presented. The methods are defined in terms of generalized cliques, based on a systematically varying specification of the degree of indirectness of permitted relationships (i.e., degree of “chaining”). This approach to grouping is shown to provide a useful framework for grouping methods based on an a priori specification of the properties of the desired subsets, and includes a natural generalization for “complete linkage” and “single linkage” clustering, such as the methods of Johnson [1967]. The central feature of the class of methods is a simple iterative matrix operation on the original disparities (“inverse-proximities” or “dissimilarities”) matrix, and one of the methods also constitutes a very efficient single linkage clustering procedure.
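
One well-known iterative matrix operation that yields single linkage is the min-max closure of the dissimilarity matrix: each entry is repeatedly replaced by the smallest, over all intermediate objects, of the larger of the two connecting dissimilarities. The sketch below assumes this is the kind of operation meant; the abstract does not spell out the paper's exact operation, and the function name and toy matrix are hypothetical.

```python
def minimax_closure(d):
    """Iterate the min-max matrix product until the matrix stops changing."""
    n = len(d)
    while True:
        # new[i][j] = min over k of max(d[i][k], d[k][j]); with a zero diagonal,
        # k = i reproduces d[i][j], so entries can only decrease.
        new = [
            [min(max(d[i][k], d[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        if new == d:
            return d
        d = new

# Assumed toy dissimilarities: four objects at positions 0, 1, 3, 7 on a line.
positions = [0, 1, 3, 7]
disparities = [[abs(p - q) for q in positions] for p in positions]
closure = minimax_closure(disparities)
```

Each entry of the result is the level at which the two objects first join in a single-linkage hierarchy. Because only the ordering of the values matters to `min` and `max`, the operation is monotone invariant.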

01 Jan 1975
TL;DR: The Symposium topics were analysis algorithms, clustering feature selection, analysis techniques for forest and agricultural applications, water resources, image processing, computer systems, monitoring and evaluation of natural resources, and land use and geologic applications.
Abstract: The purpose of the Symposium was an in-depth presentation of new results in the theory, technology, and application of computer processing of remotely sensed data. In addition to the regular papers published in full, there are also included titles and abstracts of the short papers presented. The Symposium topics were: analysis algorithms, clustering feature selection, analysis techniques for forest and agricultural applications, water resources, image processing, computer systems, monitoring and evaluation of natural resources, and land use and geologic applications. (JSR)



Journal ArticleDOI
TL;DR: In this paper, a simple clustering method for preliminary classification of very large sets of phytosociological releves is described and applied to a collection of European salt marsh releves.
Abstract: A simple clustering method for preliminary classification of very large sets of phytosociological releves is described and applied to a collection of European salt marsh releves. The essence of the method is that each releve is considered separately and only in relation to the releves considered before. Clusters are formed by either assigning a new releve to an already existing cluster or designating it as a separate cluster. The decision depends on whether the highest releve-cluster similarity exceeds a threshold value or not. The ‘similarity ratio’ is used as a similarity measure. The maximum number of clusters that can be formed is 50. An agglomerative cluster analysis of the clusters is added to the program. A table is printed with the distribution of all species over the clusters (with their score sums per cluster). With a punched output the procedure can be repeated for a part of the original dataset, with a higher threshold value. The resulting classification can be used as a starting point for more detailed analyses such as agglomerative clustering with relocation and principal component analysis.
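
The one-pass procedure described above can be sketched as follows. Several details are assumptions not stated in the abstract: the similarity ratio is taken in its usual form, sum(xy) / (sum(x^2) + sum(y^2) - sum(xy)); releve-cluster similarity is computed against the cluster's mean score vector; and once the cluster limit is reached a releve joins its most similar cluster. All names and the toy data are hypothetical.

```python
def similarity_ratio(x, y):
    """Similarity ratio: sum(xy) / (sum(x^2) + sum(y^2) - sum(xy))."""
    xy = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - xy
    return xy / denom if denom else 0.0

def sequential_cluster(releves, threshold, max_clusters=50):
    """Assign each releve in turn to the best existing cluster or open a new one."""
    sums, counts, labels = [], [], []
    for r in releves:
        best_k, best_s = None, -1.0
        for k, (s, n) in enumerate(zip(sums, counts)):
            centroid = [v / n for v in s]  # mean species scores of cluster k
            sim = similarity_ratio(r, centroid)
            if sim > best_s:
                best_k, best_s = k, sim
        full = len(sums) >= max_clusters
        if best_k is not None and (best_s > threshold or full):
            # Join the most similar cluster and update its score sums.
            sums[best_k] = [a + b for a, b in zip(sums[best_k], r)]
            counts[best_k] += 1
            labels.append(best_k)
        else:
            # No cluster is similar enough: start a new one.
            sums.append(list(r))
            counts.append(1)
            labels.append(len(sums) - 1)
    return labels

# Assumed toy table: five releves scored on four species.
releves = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 1], [0, 1, 3, 1], [3, 3, 0, 0]]
labels = sequential_cluster(releves, threshold=0.5)
```

Because each releve is compared only with clusters formed so far, the result depends on input order, which is why the abstract presents it as a preliminary classification.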


Journal ArticleDOI
TL;DR: In this paper, a simple model with three variable parameters was developed for calculating observed clustering data from a free jet expansion, and the reactive cross sections that gave the best fit of experimental data were given by $(\pi/4)\,\sigma_{R1}^2 = 9.90\times10^{-16}$ cm$^2$ and $(\pi/4)\,\sigma_{R3}^2 = 5.22\times10^{-17}$ cm$^2$.
Abstract: A simple model with three variable parameters is developed for calculating observed clustering data from a free jet expansion. One parameter, $\sigma_{R1}$, can be associated with a reactive collision diameter for a monomer associating with a cluster to form a larger, excited cluster. A second parameter, $\sigma_{R3}$, can be associated with a reactive collision diameter for energy transfer from the excited cluster to the surrounding gas. The third parameter, K′, is a constant in a unimolecular decay rate expression for the excited cluster. The reactive cross sections that give the best fit of experimental data are given by $(\pi/4)\,\sigma_{R1}^2 = 9.90\times10^{-16}$ cm$^2$ and $(\pi/4)\,\sigma_{R3}^2 = 5.22\times10^{-17}$ cm$^2$. No value could be assigned to K′. Apparently the excited clusters are stabilized by energy exchange collisions in this particular flow field before there is appreciable unimolecular decay. Good fits of clustering data are obtained even though these same values are used for clusters varying in size from 6 to 26 water molecules. Since the clustering...
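
As a small worked step, the two reported cross sections can be inverted to the collision diameters they imply, assuming each cross section equals (π/4) times the squared diameter as written in the abstract. The function name is hypothetical and the conversion to angstroms is added only for readability.

```python
import math

def collision_diameter_cm(cross_section_cm2):
    """Invert cross_section = (pi / 4) * sigma**2 for sigma, in cm."""
    return math.sqrt(4.0 * cross_section_cm2 / math.pi)

sigma_r1 = collision_diameter_cm(9.90e-16)  # monomer-cluster association
sigma_r3 = collision_diameter_cm(5.22e-17)  # energy transfer to the surrounding gas

# 1 cm = 1e8 angstroms.
print(f"sigma_R1 = {sigma_r1 * 1e8:.2f} A, sigma_R3 = {sigma_r3 * 1e8:.2f} A")
```

Under this reading, the association diameter is roughly 3.55 Å and the energy-transfer diameter roughly 0.82 Å.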


Journal ArticleDOI
TL;DR: An interactive method for decomposing mixtures consisting of an arbitrary number of bivariate Gaussian components is described, which can handle problems currently attacked by cluster analysis methods.
Abstract: An interactive method for decomposing mixtures consisting of an arbitrary number of bivariate Gaussian components is described, which can handle problems currently attacked by cluster analysis methods. In contradistinction to most clustering methods, this procedure does not require selection of a metric or distance function with sample element arguments. Instead, estimates of population bivariate contours are examined graphically to yield estimates of subpopulation parameters. This approach is based on properties of the underlying population rather than on heuristic measures of distance between elements of a sample. Besides discussing the theory underlying this new class of procedures, several examples involving real and simulated data are presented.