Showing papers on "Cluster analysis" published in 1975

Book•

Clustering Algorithms

John A. Hartigan

01 Feb 1975

6,068 citations

Journal Article•DOI•

The estimation of the gradient of a density function, with applications in pattern recognition

K. Fukunaga¹, L. Hostetler²•Institutions (2)

Purdue University¹, Sandia National Laboratories²

01 Jan 1975-IEEE Transactions on Information Theory

TL;DR: Applications of gradient estimation to pattern recognition are presented using clustering and intrinsic dimensionality problems, with the ultimate goal of providing further understanding of these problems in terms of density gradients.

Abstract: Nonparametric density gradient estimation using a generalized kernel approach is investigated. Conditions on the kernel functions are derived to guarantee asymptotic unbiasedness, consistency, and uniform consistency of the estimates. The results are generalized to obtain a simple mean-shift estimate that can be extended in a k-nearest-neighbor approach. Applications of gradient estimation to pattern recognition are presented using clustering and intrinsic dimensionality problems, with the ultimate goal of providing further understanding of these problems in terms of density gradients.
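The mean-shift idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes a flat (uniform) kernel of radius `bandwidth`; the function names, the kernel choice, and the stopping rule are illustrative assumptions, not details taken from the paper.

```python
import math

def mean_shift_step(x, points, bandwidth):
    """Move x to the mean of the samples lying within `bandwidth` of it."""
    neighbors = [p for p in points if math.dist(p, x) <= bandwidth]
    if not neighbors:
        return x  # no samples in the window: nothing to shift toward
    dim = len(x)
    return tuple(sum(p[d] for p in neighbors) / len(neighbors) for d in range(dim))

def seek_mode(x, points, bandwidth, tol=1e-6, max_iter=100):
    """Repeat the shift until the point stops moving (hypothetical stopping rule)."""
    for _ in range(max_iter):
        nxt = mean_shift_step(x, points, bandwidth)
        if math.dist(nxt, x) < tol:
            return nxt
        x = nxt
    return x

points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
mode_a = seek_mode((0.5, 0.5), points, bandwidth=1.0)
mode_b = seek_mode((4.5, 4.5), points, bandwidth=1.0)
```

Starting points that converge to the same location can be treated as members of one cluster, which is how a gradient estimate turns into a clustering procedure.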

3,125 citations

Journal Article•DOI•

An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling

Ronald L. Breiger¹, Scott A. Boorman², Phipps Arabie³•Institutions (3)

Harvard University¹, University of Pennsylvania², University of Minnesota³

01 Aug 1975-Journal of Mathematical Psychology

TL;DR: The CONCOR procedure is applied to several illustrative sets of social network data and is found to give results that are highly compatible with analyses and interpretations of the same data using the blockmodel approach of White.

750 citations

Journal Article•DOI•

A model for clustering

David J. Strauss

01 Aug 1975-Biometrika

462 citations

Journal Article•DOI•

391: A Monte Carlo Comparison of Six Clustering Procedures

F. Kent Kuiper, Lloyd D. Fisher

01 Sep 1975-Biometrics

TL;DR: In this article, the performance of six hierarchical clustering methods (given by one algorithm, Wishart [1969]) is compared on bivariate and multivariate normal Monte Carlo samples.

Abstract: The performance of six hierarchical clustering methods (given by one algorithm, Wishart [1969]) is compared on bivariate and multivariate normal Monte Carlo samples. The methods are stopped with the correct number of clusters and compared with respect to correct classification (placing pairs of points in the same or different clusters correctly or incorrectly) and with each other (both methods agree or disagree in placing a pair of points in the same or differing clusters).
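The pair-based comparison described in this abstract can be sketched as follows. The function counts the fraction of point pairs on which two labelings agree about "same cluster" versus "different clusters", a quantity commonly known as the Rand index. Whether the paper uses exactly this statistic is an assumption, and the function name is hypothetical.

```python
from itertools import combinations

def pair_agreement(labels_a, labels_b):
    """Fraction of point pairs that both labelings treat the same way."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Comparing a clustering with the true classes, or two clusterings with each other.
same_partition = pair_agreement([0, 0, 1, 1], [1, 1, 0, 0])
partial = pair_agreement([0, 0, 1, 1], [0, 1, 1, 1])
```

Because only pair co-membership is compared, relabeling the clusters does not change the score.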

255 citations

Journal Article•DOI•

Measuring the Power of Hierarchical Cluster Analysis

Frank B. Baker¹, Lawrence Hubert¹•Institutions (1)

University of Wisconsin-Madison¹

01 Mar 1975-Journal of the American Statistical Association

TL;DR: The concept of power for monotone invariant clustering procedures is developed via the possible partitions of objects at each iteration level in the obtained hierarchy, and the probability of rejecting the randomness hypothesis is obtained empirically for the possible types of partitions of the n objects employed.

Abstract: The concept of power for monotone invariant clustering procedures is developed via the possible partitions of objects at each iteration level in the obtained hierarchy. At a given level, the probability of rejecting the randomness hypothesis is obtained empirically for the possible types of partitions of the n objects employed. The results indicate that the power of a particular hierarchical clustering procedure is a function of the type of partition. The additional problem of estimating a “true” partition at a certain level of a hierarchy is discussed briefly.

240 citations

Journal Article•DOI•

Optimal Fuzzy Partitions: A Heuristic for Estimating the Parameters in a Mixture of Normal Distributions

James C. Bezdek¹, J.C. Dunn•Institutions (1)

Marquette University¹

01 Aug 1975-IEEE Transactions on Computers

TL;DR: An algorithm is described for generating fuzzy partitions which extremize a fuzzy extension of the k-means squared-error criterion function on finite data sets X, and the behavior of the algorithm is compared with that of the ordinary ISODATA clustering process and the maximum likelihood method.

Abstract: An algorithm is described for generating fuzzy partitions which extremize a fuzzy extension of the k-means squared-error criterion function on finite data sets X. It is shown how this algorithm may be applied to the problem of estimating the parameters (a priori probabilities, means, and covariances) of a mixture of multivariate normal densities, given a finite sample X drawn from the mixture. The behavior of the algorithm is compared with that of the ordinary ISODATA clustering process and the maximum likelihood method, for a specific bivariate mixture.
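A fuzzy extension of the k-means criterion is usually minimized by alternating two updates, as in the sketch below. The update formulas are the standard fuzzy c-means ones. That they match the paper's algorithm in every detail is an assumption, as are the function name, the fuzzifier `m`, and the fixed iteration count.

```python
import math

def fuzzy_c_means(points, centers, m=2.0, n_iter=50):
    """Alternate membership and center updates; m > 1 controls fuzziness."""
    dim = len(points[0])
    memberships = []
    for _ in range(n_iter):
        # Membership update: closer centers get larger weights; each row sums to 1.
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on a center: use crisp membership there.
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                row = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(row)
            memberships.append([r / total for r in row])
        # Center update: mean of the points weighted by membership ** m.
        new_centers = []
        for i, old in enumerate(centers):
            w = [row[i] ** m for row in memberships]
            wsum = sum(w)
            if wsum == 0.0:
                new_centers.append(old)  # no support: keep the previous center
                continue
            new_centers.append(
                tuple(sum(wk * p[d] for wk, p in zip(w, points)) / wsum
                      for d in range(dim))
            )
        centers = new_centers
    # Memberships are those computed in the final iteration's first step.
    return centers, memberships

data = [(0.0,), (0.2,), (0.1,), (10.0,), (10.2,), (10.1,)]
centers, memberships = fuzzy_c_means(data, [(1.0,), (9.0,)])
```

The membership-weighted means are the kind of quantity that could serve as estimates of mixture component means; estimating priors and covariances the same way is not shown here.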

236 citations

Journal Article•DOI•

A Branch and Bound Clustering Algorithm

Warren L. G. Koontz¹, Patrenahalli M. Narendra, Keinosuke Fukunaga•Institutions (1)

Bell Labs¹

01 Sep 1975-IEEE Transactions on Computers

TL;DR: A clustering algorithm based on the branch and bound method of combinatorial optimization determines the globally optimum classification and is computationally efficient.

Abstract: The problem of clustering N objects into M classes may be viewed as a combinatorial optimization problem. In the literature on clustering, iterative hill-climbing techniques are used to find a locally optimum classification. In this paper, we develop a clustering algorithm based on the branch and bound method of combinatorial optimization. This algorithm determines the globally optimum classification and is computationally efficient.

215 citations

Proceedings Article•

Boundary and object detection in real world images

Yoram Yakimovsky¹•Institutions (1)

California Institute of Technology¹

03 Sep 1975

TL;DR: A self-scaling local edge detector that can be applied in parallel on a picture is described; clustering algorithms and sequential boundary-following algorithms process the edge data to locate images of objects and generate a data structure that represents the imaged objects.

Abstract: A solution to the problem of automatic location of objects in digital pictures by computer is presented. A self-scaling local edge detector which can be applied in parallel on a picture is described. Clustering algorithms and boundary-following algorithms, which are sequential in nature, process the edge data to locate images of objects and generate a data structure which represents the imaged objects.

163 citations

Journal Article•DOI•

Printer graphics for clustering

J.A. Hartigan¹•Institutions (1)

Yale University¹

01 Jan 1975-Journal of Statistical Computation and Simulation

TL;DR: Methods of estimating multivariate densities may be converted to clustering techniques, and clustering techniques may be helpful in estimating multivariate densities.

Abstract: One model for clusters in multivariate data is that the data are sampled from a density with many modes, one mode for each cluster. Methods of estimating multivariate densities may therefore be converted to clustering techniques, and clustering techniques may be helpful in estimating multivariate densities. Graphical techniques for representing clusters are closely related to multivariate histograms. Block histograms in two dimensions are constructed by finding a rectangle of minimum area containing a fixed number of points, deleting this rectangle and the points it contains, then finding another rectangle of minimum area containing a fixed number of points and so on. These histograms are simple visual representations of a density estimate in two dimensions. Analogous block histograms in many dimensions are useful but more difficult to represent graphically. A different approach represents each point by a box drawn in three or more dimensions. If the points are first ordered by some other clustering techn...

151 citations

Proceedings Article•DOI•

The use of cluster analysis in physical data base design

Jeffrey A. Hoffer¹, Dennis G. Severance²•Institutions (2)

Case Western Reserve University¹, University of Minnesota²

22 Sep 1975

TL;DR: A metric with which to measure the similarity of usage among data items is developed and used by a clustering algorithm to reduce the space of alternative designs to a point where solution is economically feasible.

Abstract: The physical structure and relative placement of information elements within a data base is critical for the efficient design of a computerized information system which is shared by a community of users. Traditionally the selection among alternative structural designs has been handled largely via heuristics. Recent research has shown that a number of significant design problems can be stated mathematically as nonlinear, integer, zero-one programming problems. In concept, therefore, mathematical programming algorithms can be used to determine "optimal" data base designs. In practice, one finds that realistic problems of even modest size are computationally infeasible. This paper presents a means for overcoming this difficulty in the design of data base records. A metric with which to measure the similarity of usage among data items is developed and used by a clustering algorithm to reduce the space of alternative designs to a point where solution is economically feasible.

Graphical evaluation of hierarchical clustering schemes

Henry M. Halff

01 Jan 1975

Journal Article•DOI•

A Clustering and Data-Reorganizing Algorithm

J. R. Slagle¹, C. L. Chang², S. R. Heller³•Institutions (3)

United States Naval Research Laboratory¹, IBM², United States Environmental Protection Agency³

01 Jan 1975

TL;DR: A clustering and data-reorganizing algorithm based on the concept of the shortest spanning path of a graph is given that can be used to reorganize and/or cluster a large file of data.

Abstract: A clustering and data-reorganizing algorithm based on the concept of the shortest spanning path of a graph is given. This algorithm can be used to reorganize and/or cluster a large file of data.

Journal Article•DOI•

Document clustering: An evaluation of some experiments with the Cranfield 1400 collection

C.J. van Rijsbergen¹, W.B. Croft¹•Institutions (1)

Monash University, Clayton campus¹

01 Jan 1975-Information Processing and Management

TL;DR: A variety of retrieval strategies applied to this hierarchy are evaluated in terms of effectiveness and efficiency and comparisons are made between these results and those of similar experiments in document clustering on the Smart project.

Abstract: The single-link cluster method is used to construct a hierarchic classification for the 1400 documents in the Cranfield test collection. A variety of retrieval strategies applied to this hierarchy are evaluated in terms of effectiveness and efficiency. Comparisons are made between our results and those of similar experiments in document clustering on the Smart project.

Journal Article•DOI•

An Alternative Definition for "Neighborhood of a Point"

J.F. O'Callaghan

01 Nov 1975-IEEE Transactions on Computers

TL;DR: The concept of "neighborhood of a point" has been used in various programs for analyzing spatial dot patterns and an alternative definition is proposed to reflect the intuitive cluster associations of certain points in simple patterns more satisfactorily.

Abstract: The concept of "neighborhood of a point" has been used in various programs for analyzing spatial dot patterns. The common definition based on k-nearest neighbors does not reflect the intuitive cluster associations of certain points in simple patterns. An alternative definition is proposed to reflect such associations more satisfactorily. Its applications include cluster analysis and descriptive measures for dot patterns.

Journal Article•DOI•

Third order ion-molecule clustering reactions

Anthony Good

01 Oct 1975-Chemical Reviews

Journal Article•DOI•

Transformations for the Computer Detection of Curves in Noisy Pictures

Stephen D. Shapiro¹•Institutions (1)

Stevens Institute of Technology¹

01 Dec 1975-Computer Graphics and Image Processing

TL;DR: Transformations which map noisy feature points originating from the same curve in a picture into dense regions are considered and their properties are treated as they relate to subsequent clustering to detect curves in the original picture.

Journal Article•DOI•

A spatial clustering procedure for multi-image data

Robert M. Haralick, Its'hak Dinstein

01 May 1975-IEEE Transactions on Circuits and Systems

TL;DR: In this paper, a spatial clustering procedure applicable to multispectral image data is discussed, which takes into account the spatial distribution of the measurements as well as their distribution in measurement space.

Abstract: A spatial clustering procedure applicable to multispectral image data is discussed. The procedure takes into account the spatial distribution of the measurements as well as their distribution in measurement space. The procedure calls for the generation and then thresholding of the gradient image, cleaning the thresholded image, labeling the connected regions in the cleaned image, and clustering the labeled regions. An experiment was carried out on ERTS data in order to study the effect of the selection of the gradient image, the threshold, and the cleaning process. Three gradients, three gradient thresholds, and two cleaning parameters yielded 18 gradient-thresholds combinations. The combination that yielded connected homogeneous regions with the smallest variance was Robert's gradient with distance 2, thresholded by its running mean, and a cleaning process that considered a resolution cell to be homogeneous if and only if at least 7 of its nearest neighbors were homogeneous.

Computer-Aided Analysis of Landsat-1 MSS Data: A Comparison of Three Approaches, Including a "Modified Clustering" Approach

M. D. Fleming, J. S. Berkebile, R. M. Hoffer

01 Jan 1975

TL;DR: Three approaches for analyzing Landsat-1 data from Ludwig Mountain in the San Juan Mountain range in Colorado are considered.

Journal Article•DOI•

Vegetation clustering by means of isodata: Revision by multiple discriminant analysis

Roger del Moral¹•Institutions (1)

University of Washington¹

01 Jan 1975-Plant Ecology

TL;DR: An iterative, non-hierarchical, divisive, polythetic method of clustering phytosociological data is described and an efficient means of revising the results is presented.

Abstract: An iterative, nonhierarchical, divisive, polythetic method of clustering phytosociological data is described and an efficient means of revising the results is presented. The procedures are applied to two sets of forest vegetation in western Washington (U.S.A.) and found to produce ecologically interpretable results. The clustering technique, ISODATA, produces clusters in n-species space on the basis of distance between cluster means. The iterations are truncated much sooner than is recommended by its originators, but the provisional clusters are analyzed by stepwise multiple discriminant analysis. This produces an effective and efficient analysis of the individual contribution of species to the classification and arranges the samples in canonical space, the axes of which are mutually orthogonal. This last procedure results in the clusters displayed in a reduced dimensional space and establishes the relationship between stands. The combination of procedures appears superior to several with which it was compared.

Computer-aided analysis of Landsat-1 MSS data - A comparison of three approaches, including a 'modified clustering' approach

M. D. Fleming¹, J. S. Berkebile, R. M. Hoffer¹•Institutions (1)

Purdue University¹

01 Jan 1975

TL;DR: In this article, three approaches for analyzing Landsat-1 data from Ludwig Mountain in the San Juan Mountain range in Colorado are considered: supervised, non-supervised and modified supervised.

Abstract: Three approaches for analyzing Landsat-1 data from Ludwig Mountain in the San Juan Mountain range in Colorado are considered. In the 'supervised' approach the analyst selects areas of known spectral cover types and specifies these to the computer as training fields. Statistics are obtained for each cover type category and the data are classified. Such classifications are called 'supervised' because the analyst has defined specific areas of known cover types. The second approach uses a clustering algorithm which divides the entire training area into a number of spectrally distinct classes. Because the analyst need not define particular portions of the data for use but has only to specify the number of spectral classes into which the data is to be divided, this classification is called 'nonsupervised'. A hybrid method which selects training areas of known cover type but then uses the clustering algorithm to refine the data into a number of unimodal spectral classes is called the 'modified-supervised' approach.

Journal Article•DOI•

Nonmetric Grouping: Clusters and Cliques.

Edmund R. Peay¹•Institutions (1)

Flinders University¹

01 Sep 1975-Psychometrika

TL;DR: In this article, a class of related nonmetric (monotone invariant) hierarchical grouping methods is presented, defined in terms of generalized cliques, based on a systematically varying specification of the degree of indirectness of permitted relationships (i.e., degree of "chaining").

Abstract: A class of related nonmetric (“monotone invariant”) hierarchical grouping methods is presented. The methods are defined in terms of generalized cliques, based on a systematically varying specification of the degree of indirectness of permitted relationships (i.e., degree of “chaining”). This approach to grouping is shown to provide a useful framework for grouping methods based on an a priori specification of the properties of the desired subsets, and includes a natural generalization for “complete linkage” and “single linkage” clustering, such as the methods of Johnson [1967]. The central feature of the class of methods is a simple iterative matrix operation on the original disparities (“inverse-proximities” or “dissimilarities”) matrix, and one of the methods also constitutes a very efficient single linkage clustering procedure.

Machine processing of remotely sensed data

C.D. McGillem, D.B. Morrison

01 Jan 1975

TL;DR: The Symposium topics were analysis algorithms, clustering feature selection, analysis techniques for forest and agricultural applications, water resources, image processing, computer systems, monitoring and evaluation of natural resources, and land use and geologic applications.

Abstract: The purpose of the Symposium was an in-depth presentation of new results in the theory, technology, and application of computer processing of remotely sensed data. In addition to the regular papers published in full, there are also included titles and abstracts of the short papers presented. The Symposium topics were: analysis algorithms, clustering feature selection, analysis techniques for forest and agricultural applications, water resources, image processing, computer systems, monitoring and evaluation of natural resources, and land use and geologic applications. (JSR)

An Algorithm for Clustering Relational Data with Applications to Social Network Analysis and Com-par

Ronald L. Breiger, Scott A. Boorman, Phipps Arabie

01 Jan 1975

Journal Article•DOI•

Systematics of the (d, ⁶Li) Reaction and α Clustering in Heavy Nuclei

Fredrick D. Becchetti, L.T. Chua, J. Jänecke, A. M. Vandermolen

27 Jan 1975-Physical Review Letters

Journal Article•DOI•

A simple clustering procedure for preliminary classification of very large sets of phytosociological releves

J. G. M. Janssen¹•Institutions (1)

Radboud University Nijmegen¹

01 Mar 1975-Plant Ecology

TL;DR: In this paper, a simple clustering method for preliminary classification of very large sets of phytosociological releves is described and applied to a collection of European salt marsh releves.

Abstract: A simple clustering method for preliminary classification of very large sets of phytosociological releves is described and applied to a collection of European salt marsh releves. The essence of the method is that each releve is considered separately and only in relation to the releves considered before. Clusters are formed by either assigning a new releve to an already existing cluster or designating it as a separate cluster. The decision depends on whether the highest releve-cluster similarity exceeds a threshold value or not. The ‘similarity ratio’ is used as a similarity measure. The maximum number of clusters that can be formed is 50. An agglomerative cluster analysis of the clusters is added to the program. A table is printed with the distribution of all species over the clusters (with their score sums per cluster). With a punched output the procedure can be repeated for a part of the original dataset, with a higher threshold value. The resulting classification can be used as a starting point for more detailed analyses such as agglomerative clustering with relocation and principal component analysis.
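The single-pass procedure described in this abstract can be sketched as follows. Several details are assumptions not stated in the abstract: the similarity ratio is taken as Σxy / (Σx² + Σy² − Σxy), each cluster is represented by the mean of its members' species scores, and once the cluster limit is reached a releve joins its most similar cluster. All names are hypothetical.

```python
def similarity_ratio(x, y):
    """Assumed form of the similarity ratio between two score vectors."""
    xy = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - xy
    return xy / denom if denom else 0.0

def sequential_cluster(releves, threshold, max_clusters=50):
    """Assign each releve in turn, comparing it only with clusters formed so far."""
    sums = []    # per-cluster species score sums
    sizes = []   # number of releves in each cluster
    labels = []
    for r in releves:
        best, best_sim = None, -1.0
        for k, s in enumerate(sums):
            profile = [v / sizes[k] for v in s]  # assumed cluster representation
            sim = similarity_ratio(r, profile)
            if sim > best_sim:
                best, best_sim = k, sim
        full = len(sums) >= max_clusters
        if best is not None and (best_sim > threshold or full):
            sums[best] = [a + b for a, b in zip(sums[best], r)]
            sizes[best] += 1
            labels.append(best)
        else:
            sums.append(list(r))
            sizes.append(1)
            labels.append(len(sums) - 1)
    return labels, sums

releves = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 1, 4, 3]]
labels, score_sums = sequential_cluster(releves, threshold=0.5)
```

Because each releve is compared only with existing clusters, the result depends on input order. The returned score sums correspond to the kind of per-cluster species table the abstract mentions.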

Journal Article•DOI•

Note on a clustering problem

B. Saperstein

01 Sep 1975-Journal of Applied Probability

Journal Article•DOI•

A kinetic model for clustering of water on hydrated protons in a supersonic free jet expansion

J. Q. Searcy

15 Nov 1975-Journal of Chemical Physics

TL;DR: In this paper, a simple model with three variable parameters was developed for calculating observed clustering data from a free jet expansion, and the reactive cross sections that gave the best fit of experimental data were given by $(\pi/4)\,\sigma_{R1}^{2} = 9.90\times10^{-16}\ \mathrm{cm}^2$ and $(\pi/4)\,\sigma_{R3}^{2} = 5.22\times10^{-17}\ \mathrm{cm}^2$.

Abstract: A simple model with three variable parameters is developed for calculating observed clustering data from a free jet expansion. One parameter, $\sigma_{R1}$, can be associated with a reactive collision diameter for a monomer associating with a cluster to form a larger, excited cluster. A second parameter, $\sigma_{R3}$, can be associated with a reactive collision diameter for energy transfer from the excited cluster to the surrounding gas. The third parameter, $K'$, is a constant in a unimolecular decay rate expression for the excited cluster. The reactive cross sections that give the best fit of experimental data are given by $(\pi/4)\,\sigma_{R1}^{2} = 9.90\times10^{-16}\ \mathrm{cm}^2$ and $(\pi/4)\,\sigma_{R3}^{2} = 5.22\times10^{-17}\ \mathrm{cm}^2$. No value could be assigned to $K'$. Apparently the excited clusters are stabilized by energy exchange collisions in this particular flow field before there is appreciable unimolecular decay. Good fits of clustering data are obtained even though these same values are used for clusters varying in size from 6 to 26 water molecules. Since the clustering...

A clustering approach to the generation of subfiles for the design of a computer data base.

Jeffrey Alan Hoffer

01 Jan 1975

Journal Article•DOI•

Implementation and Applications of Bivariate Gaussian Mixture Decomposition

Michael E. Tarter, Abraham Silvers

01 Mar 1975-Journal of the American Statistical Association

TL;DR: An interactive method for decomposing mixtures consisting of an arbitrary number of bivariate Gaussian components is described, which can handle problems currently attacked by cluster analysis methods.

Abstract: An interactive method for decomposing mixtures consisting of an arbitrary number of bivariate Gaussian components is described, which can handle problems currently attacked by cluster analysis methods. In contradistinction to most clustering methods, this procedure does not require selection of a metric or distance function with sample element arguments. Instead, estimates of population bivariate contours are examined graphically to yield estimates of subpopulation parameters. This approach is based on properties of the underlying population rather than on heuristic measures of distance between elements of a sample. Besides discussing the theory underlying this new class of procedures, several examples involving real and simulated data are presented.
