
Showing papers on "Cluster analysis published in 1987"


Journal ArticleDOI
TL;DR: A new graphical display is proposed for partitioning techniques, where each cluster is represented by a so-called silhouette, which is based on the comparison of its tightness and separation, and provides an evaluation of clustering validity.
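The silhouette described in this TL;DR can be sketched directly from its definition, s(i) = (b(i) − a(i)) / max(a(i), b(i)), where a(i) is the mean distance from point i to its own cluster and b(i) is the smallest mean distance to any other cluster; the toy data below is illustrative, not from the paper.

```python
import math

def silhouette(points, labels):
    """Mean silhouette width over all points (higher = tighter, better separated)."""
    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))
    scores = []
    for i, (p, c) in enumerate(zip(points, labels)):
        # a(i): mean distance to the rest of i's own cluster
        own = [dist(p, q) for j, (q, d) in enumerate(zip(points, labels))
               if d == c and j != i]
        # b(i): smallest mean distance to any other cluster
        other = {}
        for q, d in zip(points, labels):
            if d != c:
                other.setdefault(d, []).append(dist(p, q))
        if not own or not other:
            continue  # singleton clusters get no silhouette in this sketch
        a = sum(own) / len(own)
        b = min(sum(v) / len(v) for v in other.values())
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(silhouette(pts, [0, 0, 0, 1, 1, 1]))  # well-separated clusters score near 1
```

A value near 1 indicates tight, well-separated clusters; values near 0 or below suggest the partition is doubtful.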

14,144 citations


Journal ArticleDOI
TL;DR: This paper provides an introduction to the field of artificial neural nets by reviewing six important neural net models that can be used for pattern classification and exploring how some existing classification and clustering algorithms can be performed using simple neuron-like components.
Abstract: Artificial neural net models have been studied for many years in the hope of achieving human-like performance in the fields of speech and image recognition. These models are composed of many nonlinear computational elements operating in parallel and arranged in patterns reminiscent of biological neural nets. Computational elements or nodes are connected via weights that are typically adapted during use to improve performance. There has been a recent resurgence in the field of artificial neural nets caused by new net topologies and algorithms, analog VLSI implementation techniques, and the belief that massive parallelism is essential for high performance speech and image recognition. This paper provides an introduction to the field of artificial neural nets by reviewing six important neural net models that can be used for pattern classification. These nets are highly parallel building blocks that illustrate neural net components and design principles and can be used to construct more complex systems. In addition to describing these nets, a major emphasis is placed on exploring how some existing classification and clustering algorithms can be performed using simple neuron-like components. Single-layer nets can implement algorithms required by Gaussian maximum-likelihood classifiers and optimum minimum-error classifiers for binary patterns corrupted by noise. More generally, the decision regions required by any classification algorithm can be generated in a straightforward manner by three-layer feed-forward nets.
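One of the reviewed ideas, that a single-layer net can implement the maximum-likelihood classifier for binary patterns corrupted by symmetric bit-flip noise, reduces to nearest-template matching: one neuron-like unit per class, with weights derived from the class template. The 3×3 "T" and "L" templates below are made up for illustration.

```python
# Illustrative 3x3 binary templates, one per class (not from the paper).
templates = {"T": [1, 1, 1, 0, 1, 0, 0, 1, 0],
             "L": [1, 0, 0, 1, 0, 0, 1, 1, 1]}

def classify(pattern):
    scores = {}
    for label, t in templates.items():
        # weight +1 where the template bit is 1, -1 where it is 0; the unit's
        # activation then grows with the number of matching bits
        w = [2 * b - 1 for b in t]
        scores[label] = sum(wi * xi for wi, xi in zip(w, pattern))
    # the maximum unit wins, i.e. the nearest template in Hamming distance
    return max(scores, key=scores.get)

noisy_T = [1, 1, 1, 0, 1, 0, 0, 0, 0]  # "T" with one bit flipped
print(classify(noisy_T))
```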

7,798 citations



01 Jan 1987

1,481 citations


Journal ArticleDOI
TL;DR: A new projection pursuit algorithm for exploring multivariate data is presented that has both statistical and computational advantages over previous methods and the emphasis here is on the discovery of nonlinear effects such as clustering or other general nonlinear associations among the variables.
Abstract: A new projection pursuit algorithm for exploring multivariate data is presented that has both statistical and computational advantages over previous methods. A number of practical issues concerning its application are addressed. A connection to multivariate density estimation is established, and its properties are investigated through simulation studies and application to real data. The goal of exploratory projection pursuit is to use the data to find low- (one-, two-, or three-) dimensional projections that provide the most revealing views of the full-dimensional data. With these views the human gift for pattern recognition can be applied to help discover effects that may not have been anticipated in advance. Since linear effects are directly captured by the covariance structure of the variable pairs (which are straightforward to estimate) the emphasis here is on the discovery of nonlinear effects such as clustering or other general nonlinear associations among the variables. Although arbitrary ...

829 citations


Journal ArticleDOI
TL;DR: A review of clustering methodology is presented, with emphasis on algorithm performance and the resulting implications for applied research, and two sets of recommendations are offered.
Abstract: A review of clustering methodology is presented, with emphasis on algorithm performance and the resulting implications for applied research. After an overview of the clustering literature, the clustering process is discussed within a seven-step framework. The four major types of clustering methods can be characterized as hierarchical, partitioning, overlapping, and ordination algorithms. The validation of such algorithms refers to the problem of determining the ability of the methods to recover cluster configurations which are known to exist in the data. Validation approaches include mathematical derivations, analyses of empirical datasets, and Monte Carlo simulation methods. Next, interpretation and inference procedures in cluster analysis are discussed. Inference procedures involve testing for significant cluster structure and the problem of determining the number of clusters in the data. The paper concludes with two sets of recommendations. One set deals with topics in clustering that would ben ...

624 citations


Journal ArticleDOI
TL;DR: The selection of the proper clustering procedure to use in the development of an objective synoptic methodology may have far-reaching implications on the composition of the final homogeneous groupings as mentioned in this paper.
Abstract: The selection of the proper clustering procedure to use in the development of an objective synoptic methodology may have far-reaching implications on the composition of the final “homogeneous” groupings. The goal of this study is to evaluate three common clustering techniques (Ward's, average linkage, and centroid) to determine which yields the most meaningful synoptic classification. The three clustering procedures were applied to a temporal synoptic index which classified days in Mobile, Alabama into meteorologically homogeneous units. The final meteorological groupings differed widely among the three procedures. Ward's tended to produce groups with relatively similar numbers of days. Thus, many extreme weather days were grouped with less extreme days, and the final meteorological units did not duplicate reality with great precision. The centroid procedure produced one very large group and many single-day groups, yielding unsatisfactory results. The average linkage procedure, which minimizes wit ...

510 citations


Journal ArticleDOI
TL;DR: In this stochastic approach to global optimization, clustering techniques are applied to identify local minima of a real valued objective function that are potentially global.
Abstract: In this stochastic approach to global optimization, clustering techniques are applied to identify local minima of a real valued objective function that are potentially global. Three different methods of this type are described; their accuracy and efficiency are analyzed in detail.
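The idea can be illustrated in miniature: run cheap local searches from random starting points and merge the resulting minima. The paper applies clustering to the sampled points themselves to avoid redundant local searches; the sketch below simplifies this by clustering (deduplicating) the found minima instead, on an assumed toy objective.

```python
import random

def f(x):
    # toy multimodal objective with global minima at x = -1 and x = 1
    return (x * x - 1) ** 2

def local_descent(x, step=0.01, iters=2000):
    # plain gradient descent; f'(x) = 4x(x^2 - 1)
    for _ in range(iters):
        x -= step * 4 * x * (x * x - 1)
    return x

random.seed(0)
starts = [random.uniform(-2, 2) for _ in range(30)]
# crude stand-in for the clustering step: round and deduplicate the minima
minima = sorted({round(local_descent(x), 3) for x in starts})
print(minima)
```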

478 citations


Journal ArticleDOI
TL;DR: A procedure to detect connected planar, convex, and concave surfaces of 3-D objects, which segments the range image into surface patches by a square error criterion clustering algorithm using surface points and associated surface normals.
Abstract: The recognition of objects in three-dimensional space is a desirable capability of a computer vision system. Range images, which directly measure 3-D surface coordinates of a scene, are well suited for this task. In this paper we report a procedure to detect connected planar, convex, and concave surfaces of 3-D objects. This is accomplished in three stages. The first stage segments the range image into ``surface patches'' by a square error criterion clustering algorithm using surface points and associated surface normals. The second stage classifies these patches as planar, convex, or concave based on a non-parametric statistical test for trend, curvature values, and eigenvalue analysis. In the final stage, boundaries between adjacent surface patches are classified as crease or noncrease edges, and this information is used to merge compatible patches to produce reasonable faces of the object(s). This procedure has been successfully applied to a large number of real and synthetic images, four of which we present in this paper.

464 citations


Journal ArticleDOI
TL;DR: In this paper, an algorithm for concurrent formation of part-families and machine-cells in group technology is presented, which is an expanded and improved version of the earlier ideal seed method for clustering.
Abstract: This paper deals with the development of an algorithm for concurrent formation of part-families and machine-cells in group technology. The acronym ZODIAC stands for zero-one data: ideal seed algorithm for clustering. The present algorithm is an expanded and improved version of the earlier ideal seed method. The formation of part-families and machine-cells has been treated as a problem of block diagonalization of the zero-one matrix. Different methods of choosing seeds have been developed and tested. A new concept called ‘relative efficiency’ has been developed and used as a stopping rule for the iterations. The ZODIAC procedure and its theory are given in detail. Test results with a 40 × 100 matrix are shown.

407 citations


Journal ArticleDOI
TL;DR: This article provides an introduction and a road map for applying clustering techniques productively to research in counseling psychology and culls those aspects most relevant and useful to psychologists from this literature.
Abstract: As a research technique that has grown rapidly in applications in many scientific disciplines, cluster analysis has potential for wider use in counseling psychology research. We begin with a simple example illustrating the clustering approach. Topics covered include the variety of approaches in clustering, the times when cluster analysis may be a choice for analysis, the steps in cluster analysis, the data features, such as level, shape, and scatter, that affect cluster results, alternate clustering methods and evidence indicating which are most effective, and examples of clustering applications in counseling research. Although we make an attempt to provide a comprehensive overview of major issues, the reader is encouraged to consult several good recent publications on the topic that are especially relevant for psychologists. Cluster analysis is a classification technique for forming homogeneous groups within complex data sets. Both the clustering methods and the ways of applying them are extremely diverse. Our purpose in writing this article is to provide an introduction and a road map for applying these techniques productively to research in counseling psychology. The cluster analysis literature is huge, is scattered among many diverse disciplines, and is often arcane. We have made an attempt to cull those aspects most relevant and useful to psychologists from this literature. Most of the discussion in the psychological community about how best to apply cluster analysis to obtain robust, valid, and useful results has taken place within the past 5 years. We seem to be on the verge of a consensus, which has long been needed in an often bewildering field. In the past 30 years, a number of clustering methods, often with their own vocabulary and approaches, have sprouted within a wide variety of scientific disciplines. 
The earliest sustained applications were in problems of biological classification, within the field called numerical taxonomy (Sokal & Sneath, 1963). Today, clustering is applied to problems as different as the grouping of chemical structures (Massart & Kaufman, 1983) and the classification of helpful and nonhelpful events in counseling (Elliott, 1985). Computerized methods for generating clusters have been developed and made increasingly available over the last decade. Applications of clustering have mushroomed in many disciplines, including the social sciences. In an annual bibliographic search performed by the Classification Society (Day, 1986), 1,166 entries are shown for the 1985 scientific literature alone.

Journal ArticleDOI
01 Jun 1987-Ecology
TL;DR: The method is applied to point location data for a sample of ponderosa pine (Pinus ponderosa) trees, and shows that heterogeneity within the forest is clearly a function of the scale of analysis.
Abstract: A technique based on second-order methods, called second-order neighborhood analysis, is used to quantify clustering at various spatial scales. The theoretical model represents the degree of clustering in a Poisson process from the perspective of each individual point. The method is applied to point location data for a sample of ponderosa pine (Pinus ponderosa) trees, and shows that heterogeneity within the forest is clearly a function of the scale of analysis.

Book
01 Apr 1987
TL;DR: This book describes the latest developments in computational techniques which increase the effectiveness of the storage, retrieval, and processing of information in computerized chemical information systems.
Abstract: From the Publisher: This book describes the latest developments in computational techniques which increase the effectiveness of the storage, retrieval, and processing of information in computerized chemical information systems. Covers the specifics of searching and clustering methods based upon the calculation of measures of similarity between chemical structures in machine-readable files. These techniques are used not only to enhance information retrieval, but also as a component of drug development programs. Chapters cover the representation of chemical structures, searching files, quantitative structure-activity relationships, similarity and cluster analysis, similarity in chemical information systems, clustering in chemical information systems, and nearest neighbor searching algorithms.

Proceedings ArticleDOI
01 Mar 1987
TL;DR: It is demonstrated that the affine viewing transformation is a reasonable approximation to perspective and a clustering approach, which produces a set of consistent assignments between vertex-pairs in the model and in the image is described.
Abstract: It is demonstrated that the affine viewing transformation is a reasonable approximation to perspective. A group of image vertices and edges, called the vertex-pair, which fully determines the affine transformation between a three-dimensional model and a two-dimensional image is defined. A clustering approach, which produces a set of consistent assignments between vertex-pairs in the model and in the image is described. A number of experimental results on outdoor images are presented.

Journal ArticleDOI
TL;DR: Experiments show that the positional accuracy of points placed in the data space by a model pose obtained via clustering is comparable to the positional accuracies of the sensed data from which pose candidates are computed.
Abstract: The general paradigm of pose clustering is discussed and compared to other techniques applicable to the problem of object detection. Pose clustering is also called hypothesis accumulation and generalized Hough transform and is characterized by a “parallel” accumulation of low level evidence followed by a maxima or clustering step which selects pose hypotheses with strong support from the set of evidence. Examples are given showing the use of pose clustering in both 2D and 3D problems. Experiments show that the positional accuracy of points placed in the data space by a model pose obtained via clustering is comparable to the positional accuracy of the sensed data from which pose candidates are computed. A specific sensing system is described which yields an accuracy of a few millimeters. Complexity of the pose clustering approach relative to alternative approaches is discussed with reference to conventional computers and massively parallel computers. It is conjectured that the pose clustering approach can produce superior results in real time on a massively parallel machine.
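For the simplest case, a known model under pure translation, pose clustering reduces to Hough-style vote accumulation: every model-point/image-point pairing votes for a translation, and the dominant cluster of votes is the pose hypothesis. The points below are invented for illustration.

```python
from collections import Counter

model = [(0, 0), (2, 0), (0, 3)]
true_shift = (5, 7)
# a scene containing the translated model plus one clutter point
scene = [(mx + true_shift[0], my + true_shift[1]) for mx, my in model]
scene.append((1, 1))

# every pairing votes for the translation that would explain it
votes = Counter((sx - mx, sy - my) for mx, my in model for sx, sy in scene)
pose, support = votes.most_common(1)[0]
print(pose, support)  # the correct translation gets one vote per model point
```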

BookDOI
01 Jan 1987
TL;DR: This volume takes up where multidimensional scaling leaves off: it assumes a working knowledge of the material covered in the earlier volumes and goes well beyond it, offering a sophisticated overview and analysis of 3-way scaling procedures.
Abstract: This volume takes up where multidimensional scaling leaves off; it assumes a working knowledge of the material covered in the earlier volumes and goes well beyond it. It begins with a review and application of the INDSCAL model. The authors begin their discussion with an example of the use of the INDSCAL model, then present the model itself, and finally return to another extended example. The initial example is the Rosenberg-Kim study of English kinship terms. The presentation of the model and concepts grows nicely from this context. A second example firms up the reader's understanding of the model. After they cover the INDSCAL model, the authors present a detailed analysis of SINDSCAL and provide an introduction to other 3-way scaling models as well as individual differences clustering models. The Rosenberg and Kim data are used again to illustrate the INDCLUS clustering model. A series of appendices provides readers with the control cards to analyze one of the examples using SINDSCAL and discusses several procedures for fitting the INDSCAL model. This monograph provides a sophisticated overview and analysis of 3-way scaling procedures.

Journal ArticleDOI
TL;DR: The modified Hubert index, proposed here for the first time, is shown to perform better than the Davies-Bouldin index under all experimental conditions and demonstrates the difficulty inherent in estimating the number of clusters.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed an event-covering approach which covers a subset of statistically relevant outcomes in the outcome space of variable-pairs, and once the covered event patterns are acquired, subsequent analysis tasks such as probabilistic inference, cluster analysis, and detection of event patterns for each cluster based on the incomplete probability scheme can be performed.
Abstract: The difficulties in analyzing and clustering (synthesizing) multivariate data of the mixed type (discrete and continuous) are largely due to: 1) nonuniform scaling in different coordinates, 2) the lack of order in nominal data, and 3) the lack of a suitable similarity measure. This paper presents a new approach which bypasses these difficulties and can acquire statistical knowledge from incomplete mixed-mode data. The proposed method adopts an event-covering approach which covers a subset of statistically relevant outcomes in the outcome space of variable-pairs. And once the covered event patterns are acquired, subsequent analysis tasks such as probabilistic inference, cluster analysis, and detection of event patterns for each cluster based on the incomplete probability scheme can be performed. There are four phases in our method: 1) the discretization of the continuous components based on a maximum entropy criterion so that the data can be treated as n-tuples of discrete-valued features; 2) the estimation of the missing values using our newly developed inference procedure; 3) the initial formation of clusters by analyzing the nearest-neighbor distance on subsets of selected samples; and 4) the reclassification of the n-tuples into more reliable clusters based on the detected interdependence relationships. For performance evaluation, experiments have been conducted using both simulated and real life data.
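Phase 1 above, maximum-entropy discretization, is commonly realized as equal-frequency binning: the entropy of the bin occupancies is maximized when each bin holds the same number of observations. A minimal sketch with made-up data (it assumes distinct values; ties would need extra handling):

```python
def equal_frequency_bins(values, k):
    """Label each value with a bin index 0..k-1 so bins hold ~n/k observations."""
    ordered = sorted(values)
    n = len(ordered)
    # cut points between consecutive bins
    cuts = [ordered[(i * n) // k] for i in range(1, k)]
    # a value's bin index = number of cut points it reaches
    return [sum(v >= c for c in cuts) for v in values]

data = [0.1, 2.5, 0.3, 9.9, 4.2, 0.2, 7.7, 5.1]
bins = equal_frequency_bins(data, 4)
print(bins)  # four bins, two observations each
```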

Journal ArticleDOI
TL;DR: In this article, the authors analyzed a catalog of earthquakes from the New Hebrides for the occurrence of temporal clusters that exhibit fractal behavior and found that significant deviations from random or Poisson behavior are found.
Abstract: The concept of fractals provides a means of testing whether clustering in time or space is a scale-invariant process. If the fraction x of the intervals of length T containing earthquakes is related to the interval length by x ~ T^(1-D), then fractal clustering is occurring with fractal dimension D (0 < D < 1). We have analyzed a catalog of earthquakes from the New Hebrides for the occurrence of temporal clusters that exhibit fractal behavior. Our studies have considered four distinct regions. The number of earthquakes considered in each region varies from 44 to 1,330. In all cases, significant deviations from random or Poisson behavior are found. The fractal dimensions found vary from 0.126 to 0.255. Our method introduces a new means of quantifying the clustering of earthquakes.
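The counting procedure is easy to reproduce on a synthetic scale-invariant set. The sketch below applies the box-counting relation x ~ T^(1-D), where x is the fraction of length-T intervals that are occupied, to a Cantor-set sample on [0, 1], whose dimension is known to be ln 2 / ln 3 ≈ 0.631; the construction is illustrative, not the paper's earthquake data.

```python
import math
from itertools import product

LEVEL = 8
# Cantor-set sample, kept in integer ternary form to avoid float error:
# each point is an 8-digit base-3 number whose digits are all 0 or 2.
points = [sum(d * 3 ** (LEVEL - 1 - i) for i, d in enumerate(digits))
          for digits in product((0, 2), repeat=LEVEL)]

def occupied_fraction(k):
    """Fraction x of the 3**k intervals of length T = 3**-k containing a point."""
    boxes = {p // 3 ** (LEVEL - k) for p in points}
    return len(boxes) / 3 ** k

# x ~ T^(1 - D): the slope of log x against log T estimates 1 - D
logs = [(-k * math.log(3), math.log(occupied_fraction(k))) for k in range(1, 7)]
slope = (logs[-1][1] - logs[0][1]) / (logs[-1][0] - logs[0][0])
D = 1 - slope
print(round(D, 3))  # ln 2 / ln 3 = 0.631 for the Cantor set
```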

Journal ArticleDOI
TL;DR: The basic data model of an object-oriented database and the basic architecture of the system implementing it are described and a secondary storage segmentation scheme and a transaction-processing scheme are discussed.
Abstract: This paper describes the basic data model of an object-oriented database and the basic architecture of the system implementing it. In particular, a secondary storage segmentation scheme and a transaction-processing scheme are discussed. The segmentation scheme allows for arbitrary clustering of objects, including duplicates. The transaction scheme allows for many different sharing protocols ranging from those that enforce serializability to those that are nonserializable and require communication with the server only on demand. The interaction of these two features is described such that segment-level transfer and object-level locking are achieved.

Proceedings Article
13 Jul 1987
TL;DR: COBWEB is presented, a conceptual clustering system that organizes data to maximize inference abilities by capturing attribute inter-correlations at classification tree nodes and generating inferences as a by-product of classification.
Abstract: Conceptual clustering is an important way to summarize data in an understandable manner. However, the recency of the conceptual clustering paradigm has allowed little exploration of conceptual clustering as a means of improving performance. This paper presents COBWEB, a conceptual clustering system that organizes data to maximize inference abilities. It does this by capturing attribute inter-correlations at classification tree nodes and generating inferences as a by-product of classification. Results from the domains of soybean and thyroid disease diagnosis support the success of this approach.
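The measure COBWEB maximizes when placing an instance, category utility, rewards partitions whose clusters make attribute values predictable. A minimal sketch for nominal attributes (the data below is invented):

```python
def category_utility(partition):
    """CU = (1/K) * sum_k P(C_k) * (sum P(A=v|C_k)^2 - sum P(A=v)^2)."""
    all_rows = [r for cluster in partition for r in cluster]
    n = len(all_rows)
    n_attrs = len(all_rows[0])

    def sq_prob_sum(rows):
        # sum over attributes of sum_v P(attribute = v)^2 within `rows`
        total = 0.0
        for a in range(n_attrs):
            counts = {}
            for r in rows:
                counts[r[a]] = counts.get(r[a], 0) + 1
            total += sum((c / len(rows)) ** 2 for c in counts.values())
        return total

    base = sq_prob_sum(all_rows)
    return sum(len(c) / n * (sq_prob_sum(c) - base) for c in partition) / len(partition)

rows = [("small", "round"), ("small", "round"), ("big", "square"), ("big", "square")]
good = [rows[:2], rows[2:]]                  # clusters align with the attributes
bad = [[rows[0], rows[2]], [rows[1], rows[3]]]  # clusters ignore the structure
print(category_utility(good) > category_utility(bad))
```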

Journal ArticleDOI
TL;DR: A Monte Carlo study was made of the recovery of cluster structure in binary data by five hierarchical techniques, with a view to finding which data structure factors influenced recovery and to determining differences between clustering methods with respect to these factors.
Abstract: A Monte Carlo study was made of the recovery of cluster structure in binary data by five hierarchical techniques, with a view to finding which data structure factors influenced recovery and to determining differences between clustering methods with respect to these factors. Recovery was found to increase as the number of groups decreased, as the number of variables increased, as the mixing proportions tended towards equality, and as the number of observations was increased. Single link was found to be much worse than the other clustering techniques.

Journal ArticleDOI
TL;DR: Spatial variances are introduced as additional parameters of the classification, with the aim of better separating clouds from the surface and distinguishing the various more or less homogeneous cloud classes.
Abstract: New developments of a cloud classification scheme based on histogram clustering by a statistical method are presented. The use of time series of geostationary satellite pictures, both for the construction of composite images representative of the surface properties and for the identification of significant cloud classes, is discussed. Spatial variances are introduced as additional parameters of the classification, with the aim of better separating clouds from the surface and distinguishing the various more or less homogeneous cloud classes.

Journal ArticleDOI
01 Jul 1987
TL;DR: A cluster identification algorithm which has relatively low computational time complexity O(2mn) is developed and allows checking for the existence of clusters and determines the number of mutually separable clusters.
Abstract: Clustering of large-scale binary matrices requires a considerable computational effort. In some cases this effort is lost since the matrix is not decomposable into mutually separable submatrices. A cluster identification algorithm which has relatively low computational time complexity O(2mn) is developed. It allows checking for the existence of clusters and determines the number of mutually separable clusters. A modified cluster identification algorithm for clustering nondiagonally structured matrices is also presented. The two algorithms are illustrated in numerical examples.
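The existence check can be sketched as a connected-components search on the bipartite graph linking each row to the columns where it has a 1: the number of components is the number of mutually separable submatrices. A small illustrative version (not the paper's algorithm verbatim):

```python
def count_clusters(matrix):
    """Number of mutually separable row/column blocks in a zero-one matrix."""
    rows, cols = len(matrix), len(matrix[0])
    seen_r, seen_c = set(), set()
    clusters = 0
    for start in range(rows):
        if start in seen_r:
            continue
        clusters += 1
        # flood-fill rows and columns reachable through 1-entries
        frontier = [("r", start)]
        while frontier:
            kind, i = frontier.pop()
            if kind == "r":
                if i in seen_r:
                    continue
                seen_r.add(i)
                frontier += [("c", j) for j in range(cols) if matrix[i][j]]
            else:
                if i in seen_c:
                    continue
                seen_c.add(i)
                frontier += [("r", r) for r in range(rows) if matrix[r][i]]
    return clusters

M = [[1, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 0, 1]]
print(count_clusters(M))  # two mutually separable clusters
```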

01 Feb 1987
TL;DR: A simple allocation algorithm with an enhanced method for representing association is presented and compared with a standard SAHN methodology, suggesting that non-hierarchical techniques have been under-estimated.
Abstract: Requirements for the pattern analysis of increasingly large sets of data have elevated the utility of nonhierarchical methods. While SAHN (hierarchical) techniques are still very much in vogue, they are unable to analyse large sets of data even on fast computers, and in many cases represent an inappropriate strategy. A simple allocation algorithm with an enhanced method for representing association is presented and compared with a standard SAHN methodology. Results suggest that non-hierarchical techniques have been under-estimated.

Journal ArticleDOI
TL;DR: This paper applies clustering methods to a new problem domain and presents a new method based on a cluster-structure approach for the recognition of 2-D partially occluded objects, which allows the identification of all parts of the model which match in the occluded scene.

Journal ArticleDOI
TL;DR: In this article, a new data structure called Voronoi tree is presented to support the solution of proximity problems in general pseudo metric spaces with efficiently computable distance functions. But this data structure is not suitable for clustering.

Book ChapterDOI
01 Jan 1987
TL;DR: This paper shows how methods of cluster analysis, principal component analysis, and multidimensional scaling may be combined in order to obtain an optimal fit between a classification underlying some set of objects 1,…,n and its visual representation in a low-dimensional Euclidean space ℝs.
Abstract: This paper shows how methods of cluster analysis, principal component analysis, and multidimensional scaling may be combined in order to obtain an optimal fit between a classification underlying some set of objects 1,…,n and its visual representation in a low-dimensional Euclidean space ℝs. We propose several clustering criteria and corresponding k-means-like algorithms which are based either on a probabilistic model or on geometrical considerations leading to matrix approximation problems. In particular, an MDS-clustering strategy is presented for displaying not only the n objects using their pairwise dissimilarities, but also the detected clusters and their average distances.
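The "k-means-like" algorithms referred to here alternate an assignment step and a centroid step in the classic Lloyd fashion. A minimal 2-D sketch, with illustrative data and k:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd iterations: assign each point to its nearest center,
    then move each center to its group's mean."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                + (p[1] - centers[c][1]) ** 2)
            groups[i].append(p)
        # empty groups keep their old center
        centers = [tuple(sum(x) / len(g) for x in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups

pts = [(0, 0), (1, 0), (0, 1), (9, 9), (10, 9), (9, 10)]
centers, groups = kmeans(pts, 2)
print(sorted(centers))
```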

Book ChapterDOI
01 Jan 1987
TL;DR: Current research emphasis in pattern recognition is on designing efficient algorithms, studying small sample properties of various estimators and decision rules, implementing the algorithms on novel computer architecture, and incorporating context and domain-specific knowledge in decision making.
Abstract: Statistical pattern recognition is now a mature discipline which has been successfully applied in several application domains. The primary goal in statistical pattern recognition is classification, where a pattern vector is assigned to one of a finite number of classes and each class is characterized by a probability density function on the measured features. A pattern vector is viewed as a point in the multidimensional space defined by the features. Design of a recognition system based on this paradigm requires careful attention to the following issues: type of classifier (single-stage vs. hierarchical), feature selection, estimation of classification error, parametric vs. nonparametric decision rules, and utilizing contextual information. Current research emphasis in pattern recognition is on designing efficient algorithms, studying small sample properties of various estimators and decision rules, implementing the algorithms on novel computer architecture, and incorporating context and domain-specific knowledge in decision making.

Journal ArticleDOI
TL;DR: In this article, the authors discuss the problem of selecting a proper threshold value and present a solution to it, which depends upon the similarity coefficient (threshold value) at which machine cells/machines are joined.
Abstract: Machine-component grouping is a basic step in the application of group technology to manufacturing. It is the process of finding families of similar parts (part-families) and forming the associated machine cells such that one or more part-families can be processed within a single cell. Among the algorithms used to form the machine cells, those based on the Similarity Coefficient Method (SCM) are more flexible in incorporating the manufacturing data into the machine-component grouping process. SCM is the application of clustering techniques in forming the machine cells. One of the major problems with SCM is that it generates a set of alternative solutions rather than a unique solution. The number and size of machine cells in a given solution depends upon the similarity coefficient (threshold value) at which machine cells/machines are joined. This paper discusses the problem of selecting a proper threshold value and presents a solution to it.
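The threshold sensitivity discussed here is easy to demonstrate: compute a Jaccard-style similarity coefficient between machines from a machine-part incidence matrix, join machines whose similarity meets the threshold, and watch the cells change as the threshold moves. The incidence matrix below is made up for illustration.

```python
incidence = [  # rows = machines, columns = parts (illustrative data)
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
]

def similarity(a, b):
    """Jaccard similarity: shared parts / parts visiting either machine."""
    both = sum(x and y for x, y in zip(a, b))
    either = sum(x or y for x, y in zip(a, b))
    return both / either if either else 0.0

def cells(matrix, threshold):
    """Union-find grouping of machines whose similarity meets the threshold."""
    m = len(matrix)
    parent = list(range(m))
    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i
    for i in range(m):
        for j in range(i + 1, m):
            if similarity(matrix[i], matrix[j]) >= threshold:
                parent[find(j)] = find(i)
    groups = {}
    for i in range(m):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

print(cells(incidence, 0.5))  # two machine cells
print(cells(incidence, 0.8))  # a stricter threshold leaves every machine alone
```

Raising the threshold fragments the cells; lowering it merges them, which is exactly why a principled rule for choosing the threshold value matters.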