
Showing papers on "Cluster analysis published in 1979"



Journal ArticleDOI
TL;DR: A measure is presented which indicates the similarity of clusters that are assumed to have a data density which is a decreasing function of distance from a vector characteristic of the cluster; the measure can be used to infer the appropriateness of data partitions.
Abstract: A measure is presented which indicates the similarity of clusters which are assumed to have a data density which is a decreasing function of distance from a vector characteristic of the cluster. The measure can be used to infer the appropriateness of data partitions and can therefore be used to compare the relative appropriateness of various divisions of the data. The measure depends on neither the number of clusters analyzed nor the method of partitioning of the data, and can be used to guide a cluster-seeking algorithm.
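The abstract does not reproduce the formula, so the following is only a minimal sketch of a separation score of the kind described: each cluster's average scatter about a characteristic vector (here, the centroid) is compared with its distance to every other cluster, and the partition is scored by averaging each cluster's worst-case ratio. The Euclidean metric, the centroid choice, and the function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cluster_similarity_index(X, labels):
    """Score a partition: lower values suggest compact, well-separated clusters."""
    ids = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ids])
    # Average distance of each cluster's members to its characteristic vector.
    scatter = np.array([
        np.mean(np.linalg.norm(X[labels == k] - c, axis=1))
        for k, c in zip(ids, centroids)
    ])
    worst = []
    for i in range(len(ids)):
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(ids)) if j != i
        ]
        worst.append(max(ratios))  # most similar (least separated) other cluster
    return float(np.mean(worst))
```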

6,757 citations



Journal ArticleDOI
01 May 1979
TL;DR: The technique does not require training prototypes but operates in an "unsupervised" mode; it is based on a mathematical pattern-recognition model whose clustering-quality parameter achieves a maximum value that is postulated to represent an intrinsic number of clusters in the data.
Abstract: This paper describes a procedure for segmenting imagery using digital methods and is based on a mathematical pattern-recognition model. The technique does not require training prototypes but operates in an "unsupervised" mode. The features most useful for the given image to be segmented are retained by the algorithm without human interaction, by rejecting those attributes which do not contribute to homogeneous clustering in N-dimensional vector space. The basic procedure is a K-means clustering algorithm which converges to a local minimum in the average squared intercluster distance for a specified number of clusters. The algorithm iterates on the number of clusters, evaluating the clustering based on a parameter of clustering quality. The parameter proposed is a product of between- and within-cluster scatter measures, which achieves a maximum value that is postulated to represent an intrinsic number of clusters in the data. At this value, feature rejection is implemented via a Bhattacharyya measure to make the image segments more homogeneous (thereby removing "noisy" features), and reclustering is performed. The resulting parameter of clustering fidelity is maximized, with segmented imagery resulting in psychovisually pleasing and culturally logical image segments.
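As a rough sketch of the outer loop only: K-means is run for a range of candidate cluster counts and each partition is scored with a scatter-based quality criterion, keeping the count at which the score peaks. The paper's exact scatter product and its Bhattacharyya feature-rejection step are not reproduced; scikit-learn's KMeans and the Calinski-Harabasz score are stand-ins chosen for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

def pick_cluster_count(X, k_range=range(2, 11)):
    """Iterate on the number of clusters and keep the partition whose
    scatter-based quality score is largest."""
    best = None
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = calinski_harabasz_score(X, labels)  # stand-in quality criterion
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best  # (score, postulated intrinsic cluster count, labels)
```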

595 citations


Journal ArticleDOI
TL;DR: A relationship between the time variation of intensity, the spatial gradient, and velocity is developed which allows motion to be determined using clustering techniques; the clustering technique is described.

593 citations



Journal ArticleDOI
TL;DR: This paper provides a semi-tutorial review of the state-of-the-art in cluster validity, or the verification of results from clustering algorithms, and covers ways of measuring clustering tendency, the fit of hierarchical and partitional structures and indices of compactness and isolation for individual clusters.

298 citations




Journal ArticleDOI
TL;DR: A speaker-independent isolated word recognition system is described which is based on the use of multiple templates for each word in the vocabulary, and shows error rates that are comparable to, or better than, those obtained with speaker-trained isolated word recognition systems.
Abstract: A speaker-independent isolated word recognition system is described which is based on the use of multiple templates for each word in the vocabulary. The word templates are obtained from a statistical clustering analysis of a large database consisting of 100 replications of each word (i.e., once by each of 100 talkers). The recognition system, which accepts telephone quality speech input, is based on an LPC analysis of the unknown word, dynamic time warping of each reference template to the unknown word (using the Itakura LPC distance measure), and the application of a K-nearest neighbor (KNN) decision rule. Results for several test sets of data are presented. They show error rates that are comparable to, or better than, those obtained with speaker-trained isolated word recognition systems.
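A compressed sketch of the recognition step described above, assuming each word has already been converted to a sequence of feature vectors: every reference template is time-warped against the unknown word and a K-nearest-neighbour vote decides the label. The per-frame Euclidean distance below is only a placeholder for the Itakura LPC distance used in the paper, and the function names are invented for illustration.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping between two feature sequences (frames x dims)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # placeholder frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)  # length-normalised path cost

def recognize(unknown, templates, k=3):
    """templates: list of (word_label, feature_sequence) pairs, several per word.
    Applies a K-nearest-neighbour rule over the DTW distances."""
    scored = sorted((dtw_distance(unknown, t), w) for w, t in templates)
    votes = [w for _, w in scored[:k]]
    return max(set(votes), key=votes.count)
```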

245 citations


Journal ArticleDOI
TL;DR: It is shown that the optimization algorithm is an effective solution technique for the homogeneous clustering problem, and also a good method for providing tight lower bounds for evaluating the quality of solutions generated by other procedures.
Abstract: This paper presents and tests an effective optimization algorithm for clustering homogeneous data. The algorithm iteratively employs a subgradient method for determining lower bounds and a simple search procedure for determining upper bounds. The overall objective is to assign n objects to m mutually exclusive “clusters” such that the sum of the distances from each object to a designated cluster median is minimum. The model represents a special case of the uncapacitated facility location and m-median problems. This technique has proven efficient for examples with n ≤ 200 (i.e., the number of 0-1 variables ≤ 40,000); computational experiences with 10 real-world clustering applications are provided. A comparison with a hierarchical agglomerative heuristic, the minimum squared error method, is included. It is shown that the optimization algorithm is an effective solution technique for the homogeneous clustering problem, and also a good method for providing tight lower bounds for evaluating the quality of solutions generated by other procedures.
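To make the objective concrete, the sketch below evaluates the m-median cost for a candidate set of medians and includes a naive greedy selection purely as a stand-in; the paper's subgradient lower-bounding and search for upper bounds are not reproduced, and both function names are invented for illustration.

```python
import numpy as np

def m_median_cost(D, medians):
    """D: n x n distance matrix; medians: indices of designated cluster medians.
    Each object joins its nearest median; the objective is the summed distance."""
    return D[:, medians].min(axis=1).sum()

def greedy_medians(D, m):
    """Naive greedy choice of m medians (illustrative only, no optimality bound)."""
    chosen = []
    for _ in range(m):
        rest = [j for j in range(D.shape[0]) if j not in chosen]
        chosen.append(min(rest, key=lambda j: m_median_cost(D, chosen + [j])))
    return chosen, m_median_cost(D, chosen)
```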

Journal ArticleDOI
TL;DR: Using this procedure on hand-drawn colon shapes copied from an X-ray and on handprinted characters, the parts determined by the clustering often correspond well to decompositions that a human might make.
Abstract: This paper describes a technique for transforming a two-dimensional shape into a binary relation whose clusters represent the intuitively pleasing simple parts of the shape. The binary relation can be defined on the set of boundary points of the shape or on the set of line segments of a piecewise linear approximation to the boundary. The relation includes all pairs of vertices (or segments) such that the line segment joining the pair lies entirely interior to the boundary of the shape. The graph-theoretic clustering method first determines dense regions, which are local regions of high compactness, and then forms clusters by merging together those dense regions having high enough overlap. Using this procedure on hand-drawn colon shapes copied from an X-ray and on handprinted characters, the parts determined by the clustering often correspond well to decompositions that a human might make.
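A small sketch of the first step only: building the binary relation on boundary vertices whose joining segment lies inside the shape. The dense-region detection and merging are not shown, and Shapely is used here as a convenience for the interior test; it is an assumption, not part of the paper.

```python
from itertools import combinations
from shapely.geometry import LineString, Polygon

def interior_visibility_relation(boundary_points):
    """Return all index pairs (i, j) whose joining segment lies inside the shape."""
    poly = Polygon(boundary_points)
    relation = set()
    for (i, p), (j, q) in combinations(enumerate(boundary_points), 2):
        if poly.contains(LineString([p, q])):  # segment entirely interior
            relation.add((i, j))
    return relation
```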

Journal ArticleDOI
Shin-Yee Lu
TL;DR: An algorithm that generates the distance for any two trees is presented and cluster analysis for patterns represented by tree structures is discussed, using a tree-to-tree distance to measure the similarity between patterns.
Abstract: A distance measure between two trees is proposed. Using the idea of language transformation, a tree can be derived from another by a series of transformations. The distance between the two trees is the minimum-cost sequence of transformations. Based on this definition, an algorithm that generates the distance for any two trees is presented. Cluster analysis for patterns represented by tree structures is discussed. Using a tree-to-tree distance, the similarity between patterns is measured in terms of distance between their tree representations. An illustrative example on clustering of character patterns is presented.


Proceedings ArticleDOI
01 Apr 1979
TL;DR: In this paper, a speaker independent, isolated word recognition system is proposed which is based on the use of multiple templates for each word in the vocabulary, which are obtained from a statistical clustering analysis of a large data base consisting of 100 replications of each word (i.e. once by each of 100 talkers).
Abstract: A speaker independent, isolated word recognition system is proposed which is based on the use of multiple templates for each word in the vocabulary. The word templates are obtained from a statistical clustering analysis of a large data base consisting of 100 replications of each word (i.e. once by each of 100 talkers). The recognition system, which uses telephone recordings, is based on an LPC analysis of the unknown word, dynamic time warping of each reference template to the unknown word (using the Itakura LPC distance measure), and the application of a K-nearest neighbor (KNN) decision rule to lower the probability of error. Results are presented on two test sets of data which show error rates that are comparable to, or better than, those obtained with speaker trained, isolated word recognition systems.

ReportDOI
TL;DR: It is proved that in one dimension, ISODATA always converges, and this algorithm is applied to requantize images into specified numbers of gray levels.
Abstract: A recently proposed iterative thresholding scheme turns out to be essentially the well-known ISODATA clustering algorithm, applied to a one-dimensional feature space (the sole feature of a pixel is its gray level). We prove that in one dimension, ISODATA always converges. We also apply it to requantize images into specified numbers of gray levels.
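A minimal sketch of the one-dimensional, two-class case: the threshold is repeatedly reset to the midpoint of the two class means until it stops moving, which is the iteration whose convergence the report proves. The tolerance value and function name are illustrative choices.

```python
import numpy as np

def isodata_threshold(gray_levels, tol=0.5):
    """Two-cluster ISODATA on gray levels: iterate the threshold to the midpoint
    of the class means until it converges."""
    g = np.asarray(gray_levels, dtype=float)
    t = g.mean()
    while True:
        low, high = g[g <= t], g[g > t]
        if low.size == 0 or high.size == 0:
            return t
        new_t = 0.5 * (low.mean() + high.mean())
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
```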

Journal ArticleDOI
TL;DR: It is demonstrated that clustering can be a powerful tool for selecting reference templates for speaker-independent word recognition by identifying coarse structure, fine structure, overlap of, and outliers from clusters.
Abstract: It is demonstrated that clustering can be a powerful tool for selecting reference templates for speaker-independent word recognition. We describe a set of clustering techniques specifically designed for this purpose. These interactive procedures identify coarse structure, fine structure, overlap of, and outliers from clusters. The techniques have been applied to a large speech data base consisting of four repetitions of a 39 word vocabulary (the letters of the alphabet, the digits, and three auxiliary commands) spoken by 50 male and 50 female speakers. The results of the cluster analysis show that the data are highly structured containing large prominent clusters. Some statistics of the analysis and their significance are presented.

Journal ArticleDOI
01 Sep 1979
TL;DR: Experimental results show that it is possible to devise clustering strategies based on the principles of adaptation in natural systems that are both effective and efficient.
Abstract: Given a set of objects each of which is represented by a finite number of attributes or features and a clustering criterion that associates a value of utility to any classification, the objective of a clustering method is to identify that classification of the objects which optimizes the criterion. A new strategy to solve this problem is developed. The approach is, in essence, a modification of the reproductive plan, a type of adaptive procedure devised by Holland [2], which embodies many principles found in the adaptation of natural systems through evolution. The proposed approach differs from conventional methods in the sense that the search through the space of possible solutions proceeds in a parallel fashion. The adaptive clustering strategy requires the specification of methods for the generation of an initial population of classifications, the parent selection, the modifications, and the replacement of current classifications with new ones. The effects of changing several of these features are investigated. Experimental results show that it is possible to devise clustering strategies based on the principles of adaptation in natural systems that are both effective and efficient.
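A minimal sketch of such a reproductive plan over cluster assignments, assuming the clustering criterion is the (negated) within-cluster sum of squares: a population of candidate classifications is evolved in parallel with fitness-proportional parent selection, one-point crossover, and mutation. The operators, parameters, and criterion here are illustrative assumptions, not the paper's specific choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(X, labels, k):
    """Clustering criterion: negative within-cluster sum of squares (higher is better)."""
    total = 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members):
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return -total

def adaptive_clustering(X, k, pop_size=30, generations=100, p_mut=0.05):
    """Evolve a population of candidate classifications in parallel."""
    n = len(X)
    pop = [rng.integers(0, k, n) for _ in range(pop_size)]
    for _ in range(generations):
        scores = np.array([fitness(X, ind, k) for ind in pop])
        probs = scores - scores.min() + 1e-9          # fitness-proportional selection
        probs = probs / probs.sum()
        children = []
        for _ in range(pop_size):
            pa, pb = rng.choice(pop_size, size=2, p=probs)
            cut = rng.integers(1, n)                  # one-point crossover
            child = np.concatenate([pop[pa][:cut], pop[pb][cut:]])
            mutate = rng.random(n) < p_mut            # random label mutation
            child[mutate] = rng.integers(0, k, mutate.sum())
            children.append(child)
        pop = children                                # replace current classifications
    return max(pop, key=lambda ind: fitness(X, ind, k))
```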

Journal ArticleDOI
TL;DR: A clustering study of computer science literature is described, using bibliographic citations as a clustering criterion, and conclusions are drawn regarding the scope of computer science and the characteristics of individual documents in the area.
Abstract: The bibliographic reference and citations which exist among documents in a given document collection can be used to study the history and scope of particular subject areas and to assess the importance of individual authors, documents, and journals. A clustering study of computer science literature is described, using bibliographic citations as a clustering criterion, and conclusions are drawn regarding the scope of computer science and the characteristics of individual documents in the area. In particular, the clustering characteristics lead to a distinction between core and fringe areas in the field and to the identification of particularly influential articles.

Journal ArticleDOI
TL;DR: Johnson has shown that the single linkage and the complete linkage hierarchical clustering algorithms induce a metric on the data known as the ultrametric; through the use of the Lance and Williams recurrence formula, this result is extended to four other common clustering algorithms.
Abstract: Johnson has shown that the single linkage and the complete linkage hierarchical clustering algorithms induce a metric on the data known as the ultrametric. Through the use of the Lance and Williams recurrence formula, Johnson's proof is extended to four other common clustering algorithms. It is also noted that two additional methods produce hierarchical structures which can violate the ultrametric inequality.
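For reference, the Lance and Williams recurrence updates the dissimilarity between a newly merged cluster and any other cluster from the pre-merge dissimilarities; the sketch below shows the update and the coefficient choices that recover Johnson's two cases. The function name is illustrative.

```python
def lance_williams_update(d_ik, d_jk, d_ij, a_i, a_j, b=0.0, g=0.0):
    """Dissimilarity between the merged cluster (i joined with j) and cluster k."""
    return a_i * d_ik + a_j * d_jk + b * d_ij + g * abs(d_ik - d_jk)

# Johnson's two cases expressed in the recurrence:
#   single linkage:   a_i = a_j = 0.5, b = 0, g = -0.5  -> min(d_ik, d_jk)
#   complete linkage: a_i = a_j = 0.5, b = 0, g = +0.5  -> max(d_ik, d_jk)
# Group average, centroid, median and Ward's method also fit the formula,
# with coefficients that depend on cluster sizes.
```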

Journal ArticleDOI
TL;DR: Some attempts are described to segment textured black-and-white images by detecting clusters of local feature values and partitioning the feature space so as to separate these clusters.

Journal ArticleDOI
Jack Bryant
TL;DR: A new approach to problems of clustering and classification of multidimensional pictorial data is presented and the development of a clustering technique and program is described.

Journal ArticleDOI
TL;DR: The next important step is to investigate fully automatic techniques for clustering multiple versions of a single word into a set of speaker‐independent word templates.
Abstract: Recent work at Bell Laboratories has demonstrated the utility of applying sophisticated pattern recognition techniques to obtain a set of speaker‐independent word templates for an isolated word recognition system [Levinson et al., IEEE Trans. Acoust. Speech Signal Process. ASSP‐27 (2), 134–141 (1979); Rabiner et al., IEEE Trans. Acoust. Speech Signal Process.(in press)]. In these studies, it was shown that a careful experimenter could guide the clustering algorithms to choose a small set of templates that were representative of a large number of replications for each word in the vocabulary. Subsequent word recognition tests verified that the templates chosen were indeed representative of a fairly large population of talkers. Given the success of this approach, the next important step is to investigate fully automatic techniques for clustering multiple versions of a single word into a set of speaker‐independent word templates. Two such techniques are described in this paper. The first method uses distance data (between replications of a word) to segment the population into stable clusters. The word template is obtained as either the cluster minimax, or as an averaged version of all the elements in the cluster. The second method is a variation of the one described by Rabiner [IEEE Trans. Acoust. Speech Signal Process. ASSP‐26 (3), 34–42 (1978)] in which averaging techniques are directly combined with the nearest neighbor rule to simultaneously define both the word template (i.e., the cluster center) and the elements in the cluster. Experimental data show the first method to be superior to the second method when three or more clusters per word are used in the recognition task.
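A small sketch of the cluster-minimax choice mentioned for the first method, assuming a precomputed matrix of distances between the replications of a word; the averaging alternative and the clustering itself are not shown, and the function name is invented.

```python
import numpy as np

def minimax_template(cluster_indices, D):
    """Within a cluster of replications, pick as the word template the element
    whose largest distance to the other members is smallest (the cluster minimax)."""
    idx = list(cluster_indices)
    sub = D[np.ix_(idx, idx)]
    return idx[int(sub.max(axis=1).argmin())]
```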

Journal ArticleDOI
TL;DR: The formal definitions of both variants of the hierarchical clustering procedure with the dissimilarity coefficient D are given, by means of which the properties of these procedures can be investigated.


Journal ArticleDOI
TL;DR: A new automatic procedure for EEG segmentation based on the autocorrelation function is described; it is simple to implement and gives good segmentation and clustering results.

Journal ArticleDOI
TL;DR: This paper presents a method of cluster analysis based on a pseudo F-statistic (PFS) criterion function, designed to subdivide an ensemble into an optimal set of groups, where the number of groups is not specified and no ad hoc parameters are employed.
Abstract: This paper presents a method of cluster analysis based on a pseudo F-statistic (PFS) criterion function. It is designed to subdivide an ensemble into an optimal set of groups, where the number of groups is not specified and no ad hoc parameters are employed. Univariate and multivariate F-statistic and pseudo F-statistic consistency is displayed. Algorithms for feasible application of PFS are given. Results from simulations are utilized to demonstrate the capabilities of the PFS clustering method and to provide a comparative guide for other users.
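A minimal sketch of one common multivariate form of the pseudo F-statistic, using traces of the between- and within-group scatter, each scaled by its degrees of freedom; the paper's exact PFS definition and its group-search algorithms are not reproduced here.

```python
import numpy as np

def pseudo_f_statistic(X, labels):
    """Ratio of between-group to within-group dispersion, scaled by degrees of freedom."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, k = len(X), len(groups)
    grand = X.mean(axis=0)
    ssb = sum(len(X[labels == c]) * ((X[labels == c].mean(axis=0) - grand) ** 2).sum()
              for c in groups)
    ssw = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
              for c in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

# The clustering method searches over candidate groupings (the number of groups
# is not fixed in advance) and keeps the one that maximizes this statistic.
```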

Journal ArticleDOI
TL;DR: Unbiased methods for measuring mutation rate and determining the precision of these measurements are given to replace a biased method now frequently used.
Abstract: When mutation or recombination events occur premeiotically, the distribution of exceptional individuals among the offspring will be "clustered" as opposed to binomial. Even though the exact nature of the clustering is usually unknown, unbiased methods for measuring mutation rate and determining the precision of these measurements are given to replace a biased method now frequently used. When clustering is pronounced, the unweighted average mutation rate is found to be a more efficient estimator than the usual average weighted by family size. Methods of statistical inference and optimal experimental design in the absence of specific knowledge of the mechanism of clustering are also discussed.
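To make the two estimators concrete, a short sketch under the assumption that each family reports its number of exceptional offspring and its size: the usual estimate weights per-family rates by family size (equivalently, pools all offspring), while the unweighted estimate averages the per-family rates directly.

```python
def mutation_rate_estimates(families):
    """families: list of (exceptional_count, family_size) pairs."""
    weighted = sum(x for x, _ in families) / sum(n for _, n in families)
    unweighted = sum(x / n for x, n in families) / len(families)
    return weighted, unweighted
```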

Journal ArticleDOI
TL;DR: A texture grammar inference procedure is introduced which employs a clustering algorithm and a stochastic regular grammar inference procedure.

Journal ArticleDOI
TL;DR: A new technique for automatic clustering of multivariate data is proposed, in which a performance index is introduced in terms of the ratio of the minimum interset distance to maximum intraset distance.
Abstract: A new technique for automatic clustering of multivariate data is proposed. In this approach a performance index for determining optimal clusters is introduced. This performance index is expressed in terms of the ratio of the minimum interset distance to the maximum intraset distance. The optimal clusters are found when the performance index reaches a global maximum. If there are alternative groupings with an equal number of clusters, the one with the largest performance index is chosen.
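A minimal sketch of the performance index under one plausible reading of the set distances (point-to-point: interset distance between points in different clusters, intraset distance between points in the same cluster); the paper's exact definitions may differ, and in use the index would be evaluated over alternative groupings to find its global maximum.

```python
import numpy as np

def performance_index(X, labels):
    """Ratio of the minimum interset distance to the maximum intraset distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(X), dtype=bool)
    min_inter = D[~same].min()
    max_intra = D[same & off_diag].max()
    return min_inter / max_intra
```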