Showing papers on "Cluster analysis" published in 1982


Journal Article
TL;DR: The requirements and components for a proposed Document Analysis System, which assists a user in encoding printed documents for computer processing, are outlined, and the technical approaches to several critical functions are discussed.
Abstract: This paper outlines the requirements and components for a proposed Document Analysis System, which assists a user in encoding printed documents for computer processing. Several critical functions have been investigated and the technical approaches are discussed. The first is the segmentation and classification of digitized printed documents into regions of text and images. A nonlinear, run-length smoothing algorithm has been used for this purpose. By using the regular features of text lines, a linear adaptive classification scheme discriminates text regions from others. The second technique studied is an adaptive approach to the recognition of the hundreds of font styles and sizes that can occur on printed documents. A preclassifier is constructed during the input process and used to speed up a well-known pattern-matching method for clustering characters from an arbitrary print source into a small sample of prototypes. Experimental results are included.
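The abstract names run-length smoothing without giving its rule. A minimal sketch, assuming the commonly described form in which short runs of background pixels (0) between foreground pixels (1) are filled along one row. The function name `smooth_runs`, the `limit` parameter, and the choice to leave leading and trailing runs untouched are illustrative assumptions, not details taken from the paper:

```python
def smooth_runs(row, limit):
    """Fill runs of 0s of length <= limit that lie between two 1s (assumed rule)."""
    out = list(row)
    last_one = None  # index of the most recent 1 seen so far
    for i, pixel in enumerate(row):
        if pixel == 1:
            # The gap is the run of 0s strictly between two 1s.
            if last_one is not None and 0 < i - last_one - 1 <= limit:
                for j in range(last_one + 1, i):
                    out[j] = 1
            last_one = i
    return out


row = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
smoothed = smooth_runs(row, limit=2)
# The gap of two 0s is filled; the gap of four 0s and the trailing 0 are kept.
```

Under this assumed rule, closely spaced marks merge into longer blocks while wide gaps survive, which is the kind of regularity a later text/image classifier could exploit.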

718 citations


Journal Article
H.M. Chan, D.A. Milner
TL;DR: In this paper, a new algorithm was developed for forming component families and machine groups for cellular manufacture by progressively restructuring the machine component matrix, allowing interaction from the user when exceptions and overlap between groups cause the iterative algorithm to prematurely stop.

547 citations


Journal Article
TL;DR: A new technique for matching image features to maps or models forms all possible pairs of image features and model features that match on the basis of local evidence alone, and is robust with respect to changes of image orientation and content.
Abstract: A new technique is presented for matching image features to maps or models. The technique forms all possible pairs of image features and model features which match on the basis of local evidence alone. For each possible pair of matching features the parameters of an RST (rotation, scaling, and translation) transformation are derived. Clustering in the space of all possible RST parameter sets reveals a good global transformation which matches many image features to many model features. Results with a variety of data sets are presented which demonstrate that the technique does not require sophisticated feature detection and is robust with respect to changes of image orientation and content. Examples in both cartography and object detection are given.
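The idea of deriving RST parameters per candidate pair and then clustering them can be sketched as follows. This is only an illustration under stated assumptions: features are taken to be line segments with known endpoint order, points are encoded as complex numbers, and coarse binning stands in for whatever clustering the paper actually uses. The names `rst_from_segments` and `vote` and the bin widths are hypothetical:

```python
import cmath
from collections import Counter


def rst_from_segments(image_seg, model_seg):
    """Return (rotation, scale, translation) mapping image_seg onto model_seg.

    Each segment is a pair of complex endpoints (x + yj).
    Degenerate (zero-length) image segments are not handled in this sketch.
    """
    p1, p2 = image_seg
    q1, q2 = model_seg
    a = (q2 - q1) / (p2 - p1)   # rotation and scale combined as one complex factor
    b = q1 - a * p1             # translation
    return cmath.phase(a), abs(a), b


def vote(pairs, angle_bin=0.1, scale_bin=0.1, shift_bin=1.0):
    """Bin the RST parameters of all candidate pairs; return (fullest bin, count)."""
    votes = Counter()
    for image_seg, model_seg in pairs:
        theta, s, t = rst_from_segments(image_seg, model_seg)
        key = (round(theta / angle_bin), round(s / scale_bin),
               round(t.real / shift_bin), round(t.imag / shift_bin))
        votes[key] += 1
    return votes.most_common(1)[0]


# Two pairs agree on "rotate 90 degrees, scale by 2, shift by (1, 1)"; one does not.
pairs = [
    ((0j, 1 + 0j), (1 + 1j, 1 + 3j)),
    ((1 + 1j, 2 + 3j), (-1 + 3j, -5 + 5j)),
    ((0j, 1 + 0j), (5 + 5j, 6 + 5j)),
]
best_bin, support = vote(pairs)
```

Fixed bins can split a cluster that straddles a bin edge, so a real implementation would likely need a more tolerant clustering step.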

304 citations


Journal Article
TL;DR: In this paper, a stochastic method for global optimization is described and evaluated, which involves a combination of sampling, clustering and local search, and terminates with a range of confidence intervals on the value of the global optimum.
Abstract: A stochastic method for global optimization is described and evaluated. The method involves a combination of sampling, clustering and local search, and terminates with a range of confidence intervals on the value of the global optimum. Computational results on standard test functions are included as well.

263 citations



Journal Article
TL;DR: The global optimal solution is shown to be difficult to obtain and an alternative iterative procedure is presented which is easily implemented and converges to a local optimum.

228 citations


Journal Article
David Pollard
TL;DR: Asymptotic results from the statistical theory of k-means clustering are applied to problems of vector quantization and the behavior of quantizers constructed from long training sequences of data is analyzed.
Abstract: Asymptotic results from the statistical theory of k-means clustering are applied to problems of vector quantization. The behavior of quantizers constructed from long training sequences of data is analyzed by relating it to the consistency problem for k-means.
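To make the link between k-means and quantizer design concrete, here is a small scalar sketch. It assumes Lloyd-style iterations (assign each training sample to its nearest codeword, then move each codeword to the mean of its cell), which reach a local optimum rather than the global minimizer that asymptotic theory typically concerns. The names `train_quantizer` and `quantize` are hypothetical:

```python
def train_quantizer(samples, codebook, iterations=20):
    """Refine a scalar codebook with k-means (Lloyd) iterations."""
    codebook = list(codebook)
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for x in samples:
            nearest = min(range(len(codebook)),
                          key=lambda i: (x - codebook[i]) ** 2)
            cells[nearest].append(x)
        # Move each codeword to the mean of its cell; keep it if the cell is empty.
        codebook = [sum(c) / len(c) if c else w
                    for c, w in zip(cells, codebook)]
    return codebook


def quantize(x, codebook):
    """Map x to the nearest codeword."""
    return min(codebook, key=lambda w: (x - w) ** 2)


training = [0.0, 0.2, 0.4, 9.6, 9.8, 10.0]
codebook = train_quantizer(training, codebook=[0.0, 1.0])
```

The consistency question is then whether codebooks built this way from longer and longer training sequences settle near the best codebook for the underlying source.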

226 citations


Journal Article
TL;DR: The uniform data function is a function which assigns to the output of the fuzzy c-means (Fc-M) or fuzzy isodata algorithm a number which measures the quality or validity of the clustering produced by the algorithm.
Abstract: The uniform data function is a function which assigns to the output of the fuzzy c-means (Fc-M) or fuzzy isodata algorithm a number which measures the quality or validity of the clustering produced by the algorithm. For the preselected number of clusters c, the Fc-M algorithm produces c vectors in the space in which the data lie, called cluster centers, which represent points about which the data are concentrated. It also produces for each data point c-membership values, numbers between zero and one which measure the similarity of the data points to each of the cluster centers. It is these membership values which indicate how the point is classified. They also indicate how well the point has been classified, in that values close to one indicate that the point is close to a particular center, but uniformly low memberships indicate that the point has not been classified clearly. The uniform data functional (UDF) combines the memberships in such a way as to indicate how well the data have been classified and is computed as follows. For each data point compute the ratio of its smallest membership to its largest and then compute the probability that one could obtain a smaller ratio (indicating better classification) from a clustering of a standard data set in which there is no cluster structure. These probabilities are then averaged over the data set to obtain the values of the UDF.
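The computation described above can be sketched in a few lines. The abstract does not say how the probability is obtained, so this sketch assumes an empirical estimate from a caller-supplied sample of ratios produced by clustering structureless reference data. The names `membership_ratios`, `uniform_data_functional`, and `reference_ratios` are hypothetical:

```python
from bisect import bisect_left


def membership_ratios(memberships):
    """Ratio of smallest to largest membership for each data point."""
    return [min(row) / max(row) for row in memberships]


def uniform_data_functional(memberships, reference_ratios):
    """Average, over data points, of the estimated probability that a
    structureless reference clustering yields a smaller ratio."""
    ref = sorted(reference_ratios)
    probs = []
    for r in membership_ratios(memberships):
        smaller = bisect_left(ref, r)  # count of reference ratios strictly below r
        probs.append(smaller / len(ref))
    return sum(probs) / len(probs)


# One clearly classified point (0.9 vs 0.1) and one ambiguous point (0.5 vs 0.5).
memberships = [[0.9, 0.1], [0.5, 0.5]]
reference_ratios = [0.2, 0.4, 0.6, 0.8]   # assumed sample, for illustration only
udf = uniform_data_functional(memberships, reference_ratios)
```

Under this reading, a clearly classified point contributes a probability near zero and an ambiguous point a probability near one, so lower averages suggest a crisper clustering.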

221 citations


Book
01 Jan 1982
TL;DR: Discriminant Analysis for Time Series (R.H. Shumway).
Abstract: Discriminant Analysis for Time Series (R.H. Shumway). Optimum Rules for Classification into Two Multivariate Normal Populations with the Same Covariance Matrix (S. D. Gupta). Large Sample Approximations and Asymptotic Expansions of Classification Statistics (M. Siotani). Bayesian Discrimination (S. Geisser). Classification of Growth Curves (J.C. Lee). Nonparametric Classification (J.D. Broffitt). Logistic Discrimination (J.A. Anderson). Nearest Neighbor Methods in Discrimination (L. Devroye, T.J. Wagner). The Classification and Mixture Maximum Likelihood Approaches to Cluster Analysis (G.J. McLachlan). Graphical Techniques for Multivariate Data and for Clustering (J.M. Chambers, B. Kleiner). Cluster Analysis Software (R.K. Blashfield, M.S. Aldenderfer, L.C. Morey). Single-link Clustering Algorithms (F.J. Rohlf). Theory of Multidimensional Scaling (J. de Leeuw, W. Heiser). Multidimensional Scaling and its Applications (M. Wish, J.D. Carroll). Intrinsic Dimensionality Extraction (K. Fukunaga). Structural Methods in Image Analysis and Recognition (L.N. Kanal, B.A. Lambird, D. Lavine). Image Models (N. Ahuja, A. Rosenfeld). Image Texture Survey (R.M. Haralick). Applications of Stochastic Languages (K.S. Fu). A Unifying Viewpoint on Pattern Recognition (J.C. Simon, E. Backer, J. Sallentin). Logical Functions in the Problems of Empirical Prediction (G.S. Lbov). Inference and Data Tables with Missing Values (N.G. Zagoruiko, V.N. Yolkina). Recognition of Electrocardiographic Patterns (J.H. van Bemmel). Waveform Parsing Systems (C.G. Stockman). Continuous Speech Recognition: Statistical Methods (F. Jelinek, R.L. Mercer, L.R. Bahl). Applications of Pattern Recognition in Radar (A. Grometstein, W.H. Schoendorf). White Blood Cell Recognition (E.S. Gelsema, G.H. Landeweerd). Pattern Recognition Techniques for Remote Sensing Applications (P.H. Swain). Optical Character Recognition - Theory and Practice (G. Nagy). 
Computer and Statistical Considerations for Oil Spill Identification (Y.T. Chien, T.J. Killeen). Pattern Recognition in Chemistry (B.R. Kowalski, S. Wold). Covariance Matrix Representation and Object-Predicate Symmetry (T. Kaminuma, S. Tomita, S. Watanabe). Multivariate Morphometrics (P.A. Reyment). Multivariate Analysis with Latent Variables (P.M. Bentler, D.G. Weeks). Use of Distance Measures, Information Measures and Error Bounds in Feature Evaluation (M. Ben-Bassat). Topics in Measurement Selection (J.M. Van Campenhout). Selection of Variables under Univariate Regression Models (P.R. Krishnaiah). On the Selection of Variables under Regression Models using Krishnaiah's Finite Intersection Tests (J.L. Schmidhammer). Dimensionality and Sample Size Considerations in Pattern Recognition Practice (A.K. Jain, B. Chandrasekaran). Selecting Variables in Discriminant Analysis for Improving upon Classical Procedures (W. Schaafsma). Selection of Variables in Discriminant Analysis (P.R. Krishnaiah). Index.

201 citations



Journal Article
TL;DR: In this paper, likelihood methods are described for fitting cyclic Poisson and Hawkes' self-exciting models to Kawasumi's historical earthquake series and to more recent data supplied by the Japan Meteorological Agency.
Abstract: Likelihood methods are described for fitting cyclic Poisson and Hawkes' self-exciting models to Kawasumi's historical earthquake series and to more recent data supplied by the Japan Meteorological Agency. Identification of the model is discussed from the standpoint of an entropy maximization principle. The cyclic effect is shown to be not statistically significant after clustering has been allowed for; its physical significance therefore remains questionable.

Journal Article
TL;DR: In this article, the authors suggest that α-clustering may play an important role in the structure of heavy nuclei and propose a phenomenological model for treating it.

Journal Article
Wayne S. DeSarbo
TL;DR: A general class of nonhierarchical clustering models and associated algorithms for fitting them are presented and two applications concerning brand-switching data and celebrity-brand proximities are discussed.
Abstract: A general class of nonhierarchical clustering models and associated algorithms for fitting them are presented. These (metric) clustering models generalize the Shepard-Arabie Additive Clusters model in allowing for: (1) either overlapping or nonoverlapping clusters; (2) either symmetric (one-way clustering) or nonsymmetric (two-way clustering) proximities (input data); and (3) either symmetric or diagonal weights. The GENNCLUS algorithms utilize alternating least-squares methods combining ordinary and constrained least-squares, nonlinear constrained mathematical programming, and combinatorial optimization techniques in estimating model parameters. In addition to developing the mathematical bases of these models, a comprehensive set of Monte Carlo simulations of the different models is reported. Two applications concerning brand-switching data and celebrity-brand proximities are discussed. Finally, extensions to three-way models, nonmetric analyses, and other model specifications are provided.

Journal Article
TL;DR: A simple relational distance measure is defined and proved to be a metric; using this measure, two organizational/access methods are described: clustering and binary search trees.
Abstract: Relational models are commonly used in scene analysis systems. Most such systems are experimental and deal with only a small number of models. Unknown objects to be analyzed are usually sequentially compared to each model. In this paper, we present some ideas for organizing a large database of relational models. We define a simple relational distance measure, prove it is a metric, and using this measure, describe two organizational/access methods: clustering and binary search trees. We illustrate these methods with a set of randomly generated graphs.

Journal Article
TL;DR: A hybrid method for clustering multivariate observations is proposed, which combines elements of the k-means and the single-linkage clustering techniques, and is shown to be consistent, under certain regularity conditions, in one dimension.
Abstract: A hybrid method for clustering multivariate observations is proposed, which combines elements of the k-means and the single-linkage clustering techniques. One purpose of the proposed method is to discover the high-density clusters given a random sample of size N from some underlying population; a high-density cluster at level c in a population with density f is defined as a maximal connected set of points x with f(x) ≥ c. This clustering procedure is practicable for very large numbers of observations and is shown to be consistent, under certain regularity conditions, in one dimension.
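The definition of a high-density cluster is easy to illustrate in one dimension. The sketch below assumes the density is already known at sorted grid points, whereas the paper estimates clusters from a random sample, so this shows only the definition and not the proposed hybrid method. The name `high_density_clusters` is hypothetical:

```python
def high_density_clusters(xs, density, level):
    """Return maximal runs [(start, end), ...] of grid points with density >= level.

    xs must be sorted; density[i] is the (assumed known) density at xs[i].
    """
    clusters, start, end = [], None, None
    for x, f in zip(xs, density):
        if f >= level:
            if start is None:
                start = x          # a new run begins here
            end = x                # extend the current run
        elif start is not None:
            clusters.append((start, end))
            start = None
    if start is not None:          # close a run that reaches the last grid point
        clusters.append((start, end))
    return clusters


xs = [0, 1, 2, 3, 4, 5, 6]
density = [0.1, 0.5, 0.6, 0.2, 0.7, 0.8, 0.1]
two_modes = high_density_clusters(xs, density, level=0.4)
one_block = high_density_clusters(xs, density, level=0.15)
```

Lowering the level merges the two separate runs into one, which reflects the nested, tree-like structure of high-density clusters across levels.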

Journal Article
TL;DR: A methodology called grade of membership analysis is presented, which deals simultaneously with the dual problems of case clustering and estimation of discriminant coefficients and permits the representation of patient heterogeneity within diagnostic category.
Abstract: A number of classification techniques have been applied to the analysis of medical diagnostic systems and decision making. Commonly used approaches such as cluster analysis, linear discriminant analysis and Bayesian classification are subject to logical and statistical limitations. In this paper we present a methodology, called »grade of membership« analysis, which resolves many of those limitations. This methodology deals simultaneously with the dual problems of case clustering and estimation of discriminant coefficients. The methodology also permits the assessment of the reliability of externally defined medical diagnoses, multiple diagnoses for individuals, disease progression and severity, and permits the representation of patient heterogeneity within diagnostic category. Maximum likelihood principles are invoked both to obtain parameter estimates and as a basis for likelihood ratio testing of complex hypotheses about the model structure. The model is illustrated by an analysis of data on abdominal symptoms and disease.

Journal Article
TL;DR: In this article, the authors deal with clustering problems where grouping is constrained by a symmetric and reflexive relation, and two methods are adapted: the standard hierarchical clustering procedure based on the Lance and Williams formula, and the local optimization procedure CLUDIA.
Abstract: The paper deals with clustering problems where grouping is constrained by a symmetric and reflexive relation. For solving clustering problems with relational constraints two methods are adapted: the “standard” hierarchical clustering procedure based on the Lance and Williams formula, and the local optimization procedure CLUDIA. To illustrate these procedures, clusterings of the European countries are given based on the developmental indicators where the relation is determined by the geographical neighbourhoods of countries.


Journal Article
TL;DR: This paper introduces a new procedure, based on probability profiles, for judging the validity of clusters established from rank-order proximity data, and explains the background from graph theory and cluster analysis needed to treat cluster validity.

Journal Article
TL;DR: The modification described here relaxes two of the assumptions made in the original McGregor/Shen formulation of the algorithm, leading to solution networks which can be lower in cost than those generated by the algorithm in its original formulation.

Book Chapter
TL;DR: A variety of algorithms for the single-link method are presented as a convenient reference, and even the least efficient algorithm may sometimes be useful for small data sets.
Abstract: Publisher Summary This chapter focuses on the computational algorithms for the single-link clustering method that is one of the oldest methods of cluster analysis. This clustering method is also known by many other names because of the fact that it has been reinvented in different application areas and that there exist many very different computational algorithms corresponding to the single-link clustering model. Often this identity has gone unnoticed, as the new clustering methods are not always compared with the existing ones. Different clustering methods imply different definitions of what constitutes a “cluster” and should, thus, be expected to give different results for many data sets. A variety of algorithms to serve as a convenient source of algorithms for the single-link method are presented in the chapter. While the algorithms differ considerably in terms of their computational efficiency, even the least efficient algorithm may sometimes be useful for small data sets.
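As one simple illustration of the single-link model (not an algorithm taken from the chapter), the clusters at a given threshold are the connected components of the graph that joins every pair of points no farther apart than that threshold. The sketch below checks all pairs by brute force and merges them with a union-find structure. The name `single_link` and its parameters are hypothetical:

```python
from itertools import combinations


def single_link(points, threshold, dist):
    """Group points linked by chains of pairwise distances <= threshold."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Brute force over all pairs: quadratic, so only suitable for small data sets.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(points[i])
    return sorted(groups.values())


points = [1, 2, 3, 10, 11, 20]
clusters = single_link(points, threshold=1.5, dist=lambda a, b: abs(a - b))
```

Note that 1 and 3 land in the same cluster even though they are 2 apart, because 2 links them; this chaining behaviour is characteristic of single-link clustering.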


Journal Article
TL;DR: In this article, an image segmentation algorithm based on histogram clustering and probabilistic relaxation labeling is explored by means of a set of artificially generated test images with known parameters.
Abstract: An image segmentation algorithm based on histogram clustering and probabilistic relaxation labeling is explored. The algorithm is evaluated by means of a set of artificially generated test images with known parameters. Two sources of pixel labeling errors are revealed. The first derives from distribution overlap in the histogram and leads to fragmented or missing regions in a segmentation. The second derives from the global nature of the compatibility coefficients used in the relaxation process. The coefficients are shown to be insufficient to correct certain labeling errors and can even cause the destruction of fine image details during the course of the relaxation updating process. A potential solution to these problems is shown to be obtainable by using orientation dependent compatibility coefficients and localizing the scope of the algorithm to small subimages followed by a merging of the segmented subimages.

Journal Article
TL;DR: In this article, the modal-coherent approach is extended by developing efficient computational algorithms for evaluating the rms coherency measure for the required random disturbance and for determining the structurally coherent groups using the computed values of the measure.
Abstract: Recent research has shown that the most desirable features of conventional modal and coherent dynamic equivalents can be combined in a single equivalent when an rms coherency measure and a robust, random system disturbance are used to determine structurally coherent groups for coherency-based aggregation. In particular, a modal-coherent equivalent can be derived which preserves not only the coherent groups of the original system model, but also the modes of group to group oscillations. A modal-coherent equivalent represents a valuable tool for transient stability analysis since it is constructed only once for a given utility and can then be used in the transient stability study of any disturbance that might occur in that utility. Previous works have presented theoretical developments which explain the structural coherency mechanism on which the modal-coherent approach to dynamic equivalents is based, and have neglected the computational aspects of constructing modal-coherent equivalents. This paper extends the value of the modal-coherent approach by developing efficient computational algorithms for evaluating the rms coherency measure for the required random disturbance and for determining the structurally coherent groups using the computed values of the measure. These algorithms will allow modal-coherent equivalents to be constructed for large power systems at a reasonable cost.

Journal Article
K. R. Ito
TL;DR: Upper bounds for the two-point correlation functions in statistical models in one or two dimensions which have SO(N) symmetry are obtained and this clarifies upper bounds for long range interactions for which there exists clustering.
Abstract: We obtain upper bounds for the two-point correlation functions in statistical models in one or two dimensions which have SO(N) symmetry. This clarifies upper bounds for long range interactions for which there exists clustering.

Journal Article
TL;DR: Six different hierarchical clustering algorithms were used to cluster eleven sets of compounds for which associated property data were available, and the effectiveness of the clustering in each case was assessed by inspection of the resulting tree diagram representing the classification and by the utility of the classification for molecular property prediction.

Journal Article
TL;DR: In this paper, the authors attempted to determine why children under 8 years of age show categorical clustering in free recall above chance expectations, but such organization does not correlate with recall.

Book Chapter
TL;DR: This chapter reviews software that is divided into five categories: collections of subroutines and algorithms; general statistical packages that include clustering methods; cluster analysis packages; simple programs that perform one type of clustering, and special purpose clustering programs, including novel methods, graphics and other aids to cluster interpretation.
Abstract: Publisher Summary Clustering software comes in a variety of forms, ranging from the simple, 100-line FORTRAN programs to packages containing many thousands of statements. This chapter reviews software that is divided into five categories: collections of subroutines and algorithms; general statistical packages that include clustering methods; cluster analysis packages; simple programs that perform one type of clustering, and special purpose clustering programs, including novel methods, graphics and other aids to cluster interpretation. A software program that emphasizes hierarchical methods and those that contain iterative partitioning methods are discussed in the chapter. The special purpose programs and usability that concerns the users' manuals and error handling of the programs are described in the chapter.

Journal Article
TL;DR: The concept of validation is introduced, and the use of four readily available procedures is presented as applied to an archaeological problem.
Abstract: Despite increasing sophistication in their use of cluster analysis, archaeologists have yet to apply objective validation methods to the results of their research. Validation methods are a mixed bag of procedures and statistics which measure the adequacy of results obtained from a clustering process. There are many different methods, and most are biased in favor of particular types of clustering algorithms. This paper introduces the concept of validation, and presents the use of four readily available procedures as applied to an archaeological problem.

Journal Article
TL;DR: In this article, a hierarchical tree of 500 infrared spectra, using the recently proposed fractal or 3-distances clustering method, was described and discussed, with a very satisfactory clustering scheme with respect to the structure of the compounds involved.