
Showing papers on "Cluster analysis published in 1979"



Journal ArticleDOI
TL;DR: A measure is presented which indicates the similarity of clusters that are assumed to have a data density which is a decreasing function of distance from a vector characteristic of the cluster; the measure can be used to infer the appropriateness of data partitions.
Abstract: A measure is presented which indicates the similarity of clusters which are assumed to have a data density which is a decreasing function of distance from a vector characteristic of the cluster. The measure can be used to infer the appropriateness of data partitions and can therefore be used to compare the relative appropriateness of various divisions of the data. The measure depends on neither the number of clusters analyzed nor the method of partitioning of the data, and can be used to guide a cluster-seeking algorithm.
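The abstract does not reproduce the formula, so the following is only a minimal sketch of a separation score of the kind described: each cluster's average scatter about a characteristic vector (here, the centroid) is compared with its distance to every other cluster, and the partition is scored by averaging each cluster's worst-case ratio. The Euclidean metric, the centroid choice, and the function name are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def cluster_similarity_index(X, labels):
    """Score a partition: lower values suggest compact, well-separated clusters."""
    ids = np.unique(labels)
    centroids = np.array([X[labels == k].mean(axis=0) for k in ids])
    # Average distance of each cluster's members to its characteristic vector.
    scatter = np.array([
        np.mean(np.linalg.norm(X[labels == k] - c, axis=1))
        for k, c in zip(ids, centroids)
    ])
    worst = []
    for i in range(len(ids)):
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(len(ids)) if j != i
        ]
        worst.append(max(ratios))  # most similar (least separated) other cluster
    return float(np.mean(worst))
```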

6,757 citations



Journal ArticleDOI
01 May 1979
TL;DR: The technique does not require training prototypes but operates in an "unsupervised" mode; it is based on a mathematical pattern-recognition model whose clustering-quality parameter achieves a maximum value that is postulated to represent an intrinsic number of clusters in the data.
Abstract: This paper describes a procedure for segmenting imagery using digital methods and is based on a mathematical pattern-recognition model. The technique does not require training prototypes but operates in an "unsupervised" mode. The features most useful for the given image to be segmented are retained by the algorithm without human interaction, by rejecting those attributes which do not contribute to homogeneous clustering in N-dimensional vector space. The basic procedure is a K-means clustering algorithm which converges to a local minimum in the average squared intercluster distance for a specified number of clusters. The algorithm iterates on the number of clusters, evaluating the clustering based on a parameter of clustering quality. The parameter proposed is a product of between- and within-cluster scatter measures, which achieves a maximum value that is postulated to represent an intrinsic number of clusters in the data. At this value, feature rejection is implemented via a Bhattacharyya measure to make the image segments more homogeneous (thereby removing "noisy" features), and reclustering is performed. The resulting parameter of clustering fidelity is maximized, with segmented imagery resulting in psychovisually pleasing and culturally logical image segments.
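As a rough sketch of the outer loop only: K-means is run for a range of candidate cluster counts and each partition is scored with a scatter-based quality criterion, keeping the count at which the score peaks. The paper's exact scatter product and its Bhattacharyya feature-rejection step are not reproduced; scikit-learn's KMeans and the Calinski-Harabasz score are stand-ins chosen for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score

def pick_cluster_count(X, k_range=range(2, 11)):
    """Iterate on the number of clusters and keep the partition whose
    scatter-based quality score is largest."""
    best = None
    for k in k_range:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        score = calinski_harabasz_score(X, labels)  # stand-in quality criterion
        if best is None or score > best[0]:
            best = (score, k, labels)
    return best  # (score, postulated intrinsic cluster count, labels)
```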

595 citations


Journal ArticleDOI
TL;DR: A relationship between the time variation of intensity, the spatial gradient, and velocity is developed which allows motion to be determined using clustering techniques; the clustering technique is described.

593 citations



Journal ArticleDOI
TL;DR: This paper provides a semi-tutorial review of the state-of-the-art in cluster validity, or the verification of results from clustering algorithms, and covers ways of measuring clustering tendency, the fit of hierarchical and partitional structures and indices of compactness and isolation for individual clusters.

298 citations




Journal ArticleDOI
TL;DR: A speaker-independent isolated word recognition system is described which is based on the use of multiple templates for each word in the vocabulary, and shows error rates that are comparable to, or better than, those obtained with speaker-trained isolated word recognition systems.
Abstract: A speaker-independent isolated word recognition system is described which is based on the use of multiple templates for each word in the vocabulary. The word templates are obtained from a statistical clustering analysis of a large database consisting of 100 replications of each word (i.e., once by each of 100 talkers). The recognition system, which accepts telephone quality speech input, is based on an LPC analysis of the unknown word, dynamic time warping of each reference template to the unknown word (using the Itakura LPC distance measure), and the application of a K-nearest neighbor (KNN) decision rule. Results for several test sets of data are presented. They show error rates that are comparable to, or better than, those obtained with speaker-trained isolated word recognition systems.
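A compressed sketch of the recognition step described above, assuming each word has already been converted to a sequence of feature vectors: every reference template is time-warped against the unknown word and a K-nearest-neighbour vote decides the label. The per-frame Euclidean distance below is only a placeholder for the Itakura LPC distance used in the paper, and the function names are invented for illustration.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping between two feature sequences (frames x dims)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # placeholder frame distance
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)  # length-normalised path cost

def recognize(unknown, templates, k=3):
    """templates: list of (word_label, feature_sequence) pairs, several per word.
    Applies a K-nearest-neighbour rule over the DTW distances."""
    scored = sorted((dtw_distance(unknown, t), w) for w, t in templates)
    votes = [w for _, w in scored[:k]]
    return max(set(votes), key=votes.count)
```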

245 citations


Journal ArticleDOI
TL;DR: It is shown that the optimization algorithm is an effective solution technique for the homogeneous clustering problem, and also a good method for providing tight lower bounds for evaluating the quality of solutions generated by other procedures.
Abstract: This paper presents and tests an effective optimization algorithm for clustering homogeneous data. The algorithm iteratively employs a subgradient method for determining lower bounds and a simple search procedure for determining upper bounds. The overall objective is to assign n objects to m mutually exclusive “clusters” such that the sum of the distances from each object to a designated cluster median is minimum. The model represents a special case of the uncapacitated facility location and m-median problems. This technique has proven efficient for examples with n ≤ 200 (i.e., the number of 0-1 variables ≤ 40,000); computational experiences with 10 real-world clustering applications are provided. A comparison with a hierarchical agglomerative heuristic, the minimum squared error method, is included. It is shown that the optimization algorithm is an effective solution technique for the homogeneous clustering problem, and also a good method for providing tight lower bounds for evaluating the quality of solutions generated by other procedures.
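To make the objective concrete, the sketch below evaluates the m-median cost for a candidate set of medians and includes a naive greedy selection purely as a stand-in; the paper's subgradient lower-bounding and search for upper bounds are not reproduced, and both function names are invented for illustration.

```python
import numpy as np

def m_median_cost(D, medians):
    """D: n x n distance matrix; medians: indices of designated cluster medians.
    Each object joins its nearest median; the objective is the summed distance."""
    return D[:, medians].min(axis=1).sum()

def greedy_medians(D, m):
    """Naive greedy choice of m medians (illustrative only, no optimality bound)."""
    chosen = []
    for _ in range(m):
        rest = [j for j in range(D.shape[0]) if j not in chosen]
        chosen.append(min(rest, key=lambda j: m_median_cost(D, chosen + [j])))
    return chosen, m_median_cost(D, chosen)
```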

Journal ArticleDOI
TL;DR: Using this procedure on hand-drawn colon shapes copied from an X-ray and on handprinted characters, the parts determined by the clustering often correspond well to decompositions that a human might make.
Abstract: This paper describes a technique for transforming a two-dimensional shape into a binary relation whose clusters represent the intuitively pleasing simple parts of the shape. The binary relation can be defined on the set of boundary points of the shape or on the set of line segments of a piecewise linear approximation to the boundary. The relation includes all pairs of vertices (or segments) such that the line segment joining the pair lies entirely interior to the boundary of the shape. The graph-theoretic clustering method first determines dense regions, which are local regions of high compactness, and then forms clusters by merging together those dense regions having high enough overlap. Using this procedure on hand-drawn colon shapes copied from an X-ray and on handprinted characters, the parts determined by the clustering often correspond well to decompositions that a human might make.
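A small sketch of the first step only: building the binary relation on boundary vertices whose joining segment lies inside the shape. The dense-region detection and merging are not shown, and Shapely is used here as a convenience for the interior test; it is an assumption, not part of the paper.

```python
from itertools import combinations
from shapely.geometry import LineString, Polygon

def interior_visibility_relation(boundary_points):
    """Return all index pairs (i, j) whose joining segment lies inside the shape."""
    poly = Polygon(boundary_points)
    relation = set()
    for (i, p), (j, q) in combinations(enumerate(boundary_points), 2):
        if poly.contains(LineString([p, q])):  # segment entirely interior
            relation.add((i, j))
    return relation
```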

Journal ArticleDOI
Shin-Yee Lu
TL;DR: An algorithm that generates the distance for any two trees is presented and cluster analysis for patterns represented by tree structures is discussed, using a tree-to-tree distance to measure the similarity between patterns.
Abstract: A distance measure between two trees is proposed. Using the idea of language transformation, a tree can be derived from another by a series of transformations. The distance between the two trees is the minimum-cost sequence of transformations. Based on this definition, an algorithm that generates the distance for any two trees is presented. Cluster analysis for patterns represented by tree structures is discussed. Using a tree-to-tree distance, the similarity between patterns is measured in terms of distance between their tree representations. An illustrative example on clustering of character patterns is presented.


Proceedings ArticleDOI
01 Apr 1979
TL;DR: In this paper, a speaker independent, isolated word recognition system is proposed which is based on the use of multiple templates for each word in the vocabulary, which are obtained from a statistical clustering analysis of a large data base consisting of 100 replications of each word (i.e. once by each of 100 talkers).
Abstract: A speaker independent, isolated word recognition system is proposed which is based on the use of multiple templates for each word in the vocabulary. The word templates are obtained from a statistical clustering analysis of a large data base consisting of 100 replications of each word (i.e. once by each of 100 talkers). The recognition system, which uses telephone recordings, is based on an LPC analysis of the unknown word, dynamic time warping of each reference template to the unknown word (using the Itakura LPC distance measure), and the application of a K-nearest neighbor (KNN) decision rule to lower the probability of error. Results are presented on two test sets of data which show error rates that are comparable to, or better than, those obtained with speaker trained, isolated word recognition systems.

ReportDOI
TL;DR: It is proved that in one dimension, ISODATA always converges, and this algorithm is applied to requantize images into specified numbers of gray levels.
Abstract: A recently proposed iterative thresholding scheme turns out to be essentially the well-known ISODATA clustering algorithm, applied to a one-dimensional feature space (the sole feature of a pixel is its gray level). We prove that in one dimension, ISODATA always converges. We also apply it to requantize images into specified numbers of gray levels.
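A minimal sketch of the one-dimensional, two-class case: the threshold is repeatedly reset to the midpoint of the two class means until it stops moving, which is the iteration whose convergence the report proves. The tolerance value and function name are illustrative choices.

```python
import numpy as np

def isodata_threshold(gray_levels, tol=0.5):
    """Two-cluster ISODATA on gray levels: iterate the threshold to the midpoint
    of the class means until it converges."""
    g = np.asarray(gray_levels, dtype=float)
    t = g.mean()
    while True:
        low, high = g[g <= t], g[g > t]
        if low.size == 0 or high.size == 0:
            return t
        new_t = 0.5 * (low.mean() + high.mean())
        if abs(new_t - t) < tol:
            return new_t
        t = new_t
```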

Journal ArticleDOI
TL;DR: It is demonstrated that clustering can be a powerful tool for selecting reference templates for speaker-independent word recognition by identifying coarse structure, fine structure, overlap of, and outliers from clusters.
Abstract: It is demonstrated that clustering can be a powerful tool for selecting reference templates for speaker-independent word recognition. We describe a set of clustering techniques specifically designed for this purpose. These interactive procedures identify coarse structure, fine structure, overlap of, and outliers from clusters. The techniques have been applied to a large speech data base consisting of four repetitions of a 39 word vocabulary (the letters of the alphabet, the digits, and three auxiliary commands) spoken by 50 male and 50 female speakers. The results of the cluster analysis show that the data are highly structured containing large prominent clusters. Some statistics of the analysis and their significance are presented.

Journal ArticleDOI
01 Sep 1979
TL;DR: Experimental results show that it is possible to devise clustering strategies based on the principles of adaptation in natural systems that are both effective and efficient.
Abstract: Given a set of objects each of which is represented by a finite number of attributes or features and a clustering criterion that associates a value of utility to any classification, the objective of a clustering method is to identify that classification of the objects which optimizes the criterion. A new strategy to solve this problem is developed. The approach is, in essence, a modification of the reproductive plan, a type of adaptive procedure devised by Holland [2], which embodies many principles found in the adaptation of natural systems through evolution. The proposed approach differs from conventional methods in the sense that the search through the space of possible solutions proceeds in a parallel fashion. The adaptive clustering strategy requires the specification of methods for the generation of an initial population of classifications, the parent selection, the modifications, and the replacement of current classifications with new ones. The effects of changing several of these features are investigated. Experimental results show that it is possible to devise clustering strategies based on the principles of adaptation in natural systems that are both effective and efficient.
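A minimal sketch of such a reproductive plan over cluster assignments, assuming the clustering criterion is the (negated) within-cluster sum of squares: a population of candidate classifications is evolved in parallel with fitness-proportional parent selection, one-point crossover, and mutation. The operators, parameters, and criterion here are illustrative assumptions, not the paper's specific choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(X, labels, k):
    """Clustering criterion: negative within-cluster sum of squares (higher is better)."""
    total = 0.0
    for c in range(k):
        members = X[labels == c]
        if len(members):
            total += ((members - members.mean(axis=0)) ** 2).sum()
    return -total

def adaptive_clustering(X, k, pop_size=30, generations=100, p_mut=0.05):
    """Evolve a population of candidate classifications in parallel."""
    n = len(X)
    pop = [rng.integers(0, k, n) for _ in range(pop_size)]
    for _ in range(generations):
        scores = np.array([fitness(X, ind, k) for ind in pop])
        probs = scores - scores.min() + 1e-9          # fitness-proportional selection
        probs = probs / probs.sum()
        children = []
        for _ in range(pop_size):
            pa, pb = rng.choice(pop_size, size=2, p=probs)
            cut = rng.integers(1, n)                  # one-point crossover
            child = np.concatenate([pop[pa][:cut], pop[pb][cut:]])
            mutate = rng.random(n) < p_mut            # random label mutation
            child[mutate] = rng.integers(0, k, mutate.sum())
            children.append(child)
        pop = children                                # replace current classifications
    return max(pop, key=lambda ind: fitness(X, ind, k))
```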

Journal ArticleDOI
TL;DR: A clustering study of computer science literature is described, using bibliographic citations as a clustering criterion, and conclusions are drawn regarding the scope of computer science and the characteristics of individual documents in the area.
Abstract: The bibliographic reference and citations which exist among documents in a given document collection can be used to study the history and scope of particular subject areas and to assess the importance of individual authors, documents, and journals. A clustering study of computer science literature is described, using bibliographic citations as a clustering criterion, and conclusions are drawn regarding the scope of computer science and the characteristics of individual documents in the area. In particular, the clustering characteristics lead to a distinction between core and fringe areas in the field and to the identification of particularly influential articles.

Journal ArticleDOI
TL;DR: Johnson has shown that the single linkage and the complete linkage hierarchical clustering algorithms induce a metric on the data known as the ultrametric; through the use of the Lance and Williams recurrence formula, this result is extended to four other common clustering algorithms.
Abstract: Johnson has shown that the single linkage and the complete linkage hierarchical clustering algorithms induce a metric on the data known as the ultrametric. Through the use of the Lance and Williams recurrence formula, Johnson's proof is extended to four other common clustering algorithms. It is also noted that two additional methods produce hierarchical structures which can violate the ultrametric inequality.
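For reference, the Lance and Williams recurrence updates the dissimilarity between a newly merged cluster and any other cluster from the pre-merge dissimilarities; the sketch below shows the update and the coefficient choices that recover Johnson's two cases. The function name is illustrative.

```python
def lance_williams_update(d_ik, d_jk, d_ij, a_i, a_j, b=0.0, g=0.0):
    """Dissimilarity between the merged cluster (i joined with j) and cluster k."""
    return a_i * d_ik + a_j * d_jk + b * d_ij + g * abs(d_ik - d_jk)

# Johnson's two cases expressed in the recurrence:
#   single linkage:   a_i = a_j = 0.5, b = 0, g = -0.5  -> min(d_ik, d_jk)
#   complete linkage: a_i = a_j = 0.5, b = 0, g = +0.5  -> max(d_ik, d_jk)
# Group average, centroid, median and Ward's method also fit the formula,
# with coefficients that depend on cluster sizes.
```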

Journal ArticleDOI
TL;DR: Some attempts are described to segment textured black-and-white images by detecting clusters of local feature values and partitioning the feature space so as to separate these clusters.

Journal ArticleDOI
Jack Bryant
TL;DR: A new approach to problems of clustering and classification of multidimensional pictorial data is presented and the development of a clustering technique and program is described.

Journal ArticleDOI
TL;DR: The next important step is to investigate fully automatic techniques for clustering multiple versions of a single word into a set of speaker‐independent word templates.
Abstract: Recent work at Bell Laboratories has demonstrated the utility of applying sophisticated pattern recognition techniques to obtain a set of speaker‐independent word templates for an isolated word recognition system [Levinson et al., IEEE Trans. Acoust. Speech Signal Process. ASSP‐27 (2), 134–141 (1979); Rabiner et al., IEEE Trans. Acoust. Speech Signal Process.(in press)]. In these studies, it was shown that a careful experimenter could guide the clustering algorithms to choose a small set of templates that were representative of a large number of replications for each word in the vocabulary. Subsequent word recognition tests verified that the templates chosen were indeed representative of a fairly large population of talkers. Given the success of this approach, the next important step is to investigate fully automatic techniques for clustering multiple versions of a single word into a set of speaker‐independent word templates. Two such techniques are described in this paper. The first method uses distance data (between replications of a word) to segment the population into stable clusters. The word template is obtained as either the cluster minimax, or as an averaged version of all the elements in the cluster. The second method is a variation of the one described by Rabiner [IEEE Trans. Acoust. Speech Signal Process. ASSP‐26 (3), 34–42 (1978)] in which averaging techniques are directly combined with the nearest neighbor rule to simultaneously define both the word template (i.e., the cluster center) and the elements in the cluster. Experimental data show the first method to be superior to the second method when three or more clusters per word are used in the recognition task.
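A small sketch of the cluster-minimax choice mentioned for the first method, assuming a precomputed matrix of distances between the replications of a word; the averaging alternative and the clustering itself are not shown, and the function name is invented.

```python
import numpy as np

def minimax_template(cluster_indices, D):
    """Within a cluster of replications, pick as the word template the element
    whose largest distance to the other members is smallest (the cluster minimax)."""
    idx = list(cluster_indices)
    sub = D[np.ix_(idx, idx)]
    return idx[int(sub.max(axis=1).argmin())]
```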

Journal ArticleDOI
TL;DR: The formal definitions of both variants of the hierarchical clustering procedure with the dissimilarity coefficient D are given, by means of which the properties of these procedures can be investigated.


Journal ArticleDOI
TL;DR: A new automatic procedure for EEG segmentation based on the autocorrelation function is described; it is simple to implement and gives good segmentation and clustering results.

Journal ArticleDOI
TL;DR: This paper presents a method of cluster analysis based on a pseudo F-statistic (PFS) criterion function, designed to subdivide an ensemble into an optimal set of groups, where the number of groups is not specified and no ad hoc parameters are employed.
Abstract: This paper presents a method of cluster analysis based on a pseudo F-statistic (PFS) criterion function. It is designed to subdivide an ensemble into an optimal set of groups, where the number of groups is not specified and no ad hoc parameters are employed. Univariate and multivariate F-statistic and pseudo F-statistic consistency is displayed. Algorithms for feasible application of PFS are given. Results from simulations are utilized to demonstrate the capabilities of the PFS clustering method and to provide a comparative guide for other users.
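A minimal sketch of one common multivariate form of the pseudo F-statistic, using traces of the between- and within-group scatter, each scaled by its degrees of freedom; the paper's exact PFS definition and its group-search algorithms are not reproduced here.

```python
import numpy as np

def pseudo_f_statistic(X, labels):
    """Ratio of between-group to within-group dispersion, scaled by degrees of freedom."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n, k = len(X), len(groups)
    grand = X.mean(axis=0)
    ssb = sum(len(X[labels == c]) * ((X[labels == c].mean(axis=0) - grand) ** 2).sum()
              for c in groups)
    ssw = sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
              for c in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

# The clustering method searches over candidate groupings (the number of groups
# is not fixed in advance) and keeps the one that maximizes this statistic.
```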

Journal ArticleDOI
TL;DR: Unbiased methods for measuring mutation rate and determining the precision of these measurements are given to replace a biased method now frequently used.
Abstract: When mutation or recombination events occur premeiotically, the distribution of exceptional individuals among the offspring will be "clustered" as opposed to binomial. Even though the exact nature of the clustering is usually unknown, unbiased methods for measuring mutation rate and determining the precision of these measurements are given to replace a biased method now frequently used. When clustering is pronounced, the unweighted average mutation rate is found to be a more efficient estimator than the usual average weighted by family size. Methods of statistical inference and optimal experimental design in the absence of specific knowledge of the mechanism of clustering are also discussed.
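To make the two estimators concrete, a short sketch under the assumption that each family reports its number of exceptional offspring and its size: the usual estimate weights per-family rates by family size (equivalently, pools all offspring), while the unweighted estimate averages the per-family rates directly.

```python
def mutation_rate_estimates(families):
    """families: list of (exceptional_count, family_size) pairs."""
    weighted = sum(x for x, _ in families) / sum(n for _, n in families)
    unweighted = sum(x / n for x, n in families) / len(families)
    return weighted, unweighted
```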

Journal ArticleDOI
TL;DR: A texture grammar inference procedure is introduced which employs a clustering algorithm and a stochastic regular grammar inference procedure.

Journal ArticleDOI
TL;DR: A new technique for automatic clustering of multivariate data is proposed, in which a performance index is introduced in terms of the ratio of the minimum interset distance to maximum intraset distance.
Abstract: A new technique for automatic clustering of multivariate data is proposed. In this approach a performance index for determining optimal clusters is introduced. This performance index is expressed in terms of the ratio of the minimum interset distance to the maximum intraset distance. The optimal clusters are found when the performance index reaches a global maximum. If there are alternative groupings with an equal number of clusters, the one with the largest performance index is chosen.
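A minimal sketch of the performance index under one plausible reading of the set distances (point-to-point: interset distance between points in different clusters, intraset distance between points in the same cluster); the paper's exact definitions may differ, and in use the index would be evaluated over alternative groupings to find its global maximum.

```python
import numpy as np

def performance_index(X, labels):
    """Ratio of the minimum interset distance to the maximum intraset distance."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(X), dtype=bool)
    min_inter = D[~same].min()
    max_intra = D[same & off_diag].max()
    return min_inter / max_intra
```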