
Showing papers on "Cluster analysis published in 1990"


Journal ArticleDOI
01 Sep 1990
TL;DR: The self-organizing map, an architecture suggested for artificial neural networks, is explained by presenting simulation experiments and practical applications, and the algorithm, which orders responses spatially, is reviewed, focusing on best-matching cell selection and adaptation of the weight vectors.
Abstract: The self-organizing map, an architecture suggested for artificial neural networks, is explained by presenting simulation experiments and practical applications. The self-organizing map has the property of effectively creating spatially organized internal representations of various features of input signals and their abstractions. One result of this is that the self-organization process can discover semantic relationships in sentences. Brain maps, semantic maps, and early work on competitive learning are reviewed. The self-organizing map algorithm (an algorithm which orders responses spatially) is reviewed, focusing on best-matching cell selection and adaptation of the weight vectors. Suggestions for applying the self-organizing map algorithm, demonstrations of the ordering process, and an example of hierarchical clustering of data are presented. Fine-tuning the map by learning vector quantization is addressed. The use of self-organizing maps in practical speech recognition and a simulation experiment on semantic mapping are discussed.

7,883 citations
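
For readers who want the mechanics of the two steps the abstract highlights, best-matching cell selection and neighborhood-weighted adaptation, the following is a minimal sketch in Python; the grid size, decay schedules, and random toy data are illustrative assumptions, not settings from the paper.

```python
import numpy as np

def train_som(data, grid=(10, 10), epochs=20, lr0=0.5, sigma0=3.0):
    """Minimal self-organizing map: pick the best-matching cell for each
    input, then adapt all weight vectors toward the input, weighted by a
    shrinking Gaussian neighborhood around that cell."""
    rng = np.random.default_rng(0)
    h, w = grid
    weights = rng.uniform(data.min(), data.max(), size=(h * w, data.shape[1]))
    coords = np.array([(i, j) for i in range(h) for j in range(w)], dtype=float)
    order = rng.permutation(len(data) * epochs) % len(data)   # shuffled passes
    for t, idx in enumerate(order):
        x = data[idx]
        frac = t / len(order)
        lr = lr0 * (1.0 - frac)                 # learning rate decays to zero
        sigma = sigma0 * (1.0 - frac) + 0.5     # neighborhood radius shrinks
        bmu = np.argmin(((weights - x) ** 2).sum(axis=1))   # best-matching cell
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)      # grid distance to BMU
        theta = np.exp(-d2 / (2 * sigma ** 2))              # neighborhood weight
        weights += lr * theta[:, None] * (x - weights)      # adapt weight vectors
    return weights.reshape(h, w, -1)

# Toy usage: map 3-D inputs onto a 10x10 grid.
som = train_som(np.random.default_rng(1).random((500, 3)))
```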


Journal ArticleDOI
01 Sep 1990
TL;DR: Regularization networks are mathematically related to radial basis functions, mainly used for strict interpolation tasks, as mentioned in this paper; two extensions of the regularization approach are presented, along with the approach's connections to splines, regularization, Bayes formulation, and clustering.
Abstract: The problem of approximating nonlinear mappings (especially continuous mappings) is considered. Regularization theory and a theoretical framework for approximation (based on regularization techniques) that leads to a class of three-layer networks called regularization networks are discussed. Regularization networks are mathematically related to radial basis functions, which are mainly used for strict interpolation tasks. Learning as approximation and learning as hypersurface reconstruction are discussed. Two extensions of the regularization approach are presented, along with the approach's connections to splines, regularization, Bayes formulation, and clustering. The theory of regularization networks is generalized to a formulation that includes task-dependent clustering and dimensionality reduction. Applications of regularization networks are discussed.

3,595 citations
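
As a rough illustration of the radial-basis-function view, with the clustering step the generalized formulation calls for, here is a sketch that places Gaussian centers by k-means and solves for the output weights by regularized least squares; all names and parameter values are hypothetical.

```python
import numpy as np

def fit_rbf(x, y, n_centers=10, width=1.0, reg=1e-3, iters=20):
    """Gaussian RBF network: centers placed by a small k-means pass (the
    clustering step), output weights by a ridge-regularized linear solve
    (the regularization step)."""
    rng = np.random.default_rng(0)
    centers = x[rng.choice(len(x), n_centers, replace=False)]
    for _ in range(iters):   # plain k-means to position the centers
        labels = np.argmin(((x[:, None, :] - centers) ** 2).sum(-1), axis=1)
        for j in range(n_centers):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    phi = np.exp(-((x[:, None, :] - centers) ** 2).sum(-1) / (2 * width ** 2))
    w = np.linalg.solve(phi.T @ phi + reg * np.eye(n_centers), phi.T @ y)
    return centers, w

# Toy usage: approximate a 1-D continuous mapping.
x = np.linspace(0, 2 * np.pi, 200)[:, None]
y = np.sin(x[:, 0])
centers, w = fit_rbf(x, y)
phi = np.exp(-((x[:, None, :] - centers) ** 2).sum(-1) / 2.0)
print(np.abs(phi @ w - y).max())   # approximation error of the network
```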


Book ChapterDOI
E.R. Davies
01 Jan 1990
TL;DR: This chapter introduces the subject of statistical pattern recognition (SPR) by considering how features are defined and emphasizes that the nearest neighbor algorithm achieves error rates comparable with those of an ideal Bayes’ classifier.
Abstract: This chapter introduces the subject of statistical pattern recognition (SPR). It starts by considering how features are defined and emphasizes that the nearest neighbor algorithm achieves error rates comparable with those of an ideal Bayes’ classifier. The concepts of an optimal number of features, representativeness of the training data, and the need to avoid overfitting to the training data are stressed. The chapter shows that methods such as the support vector machine and artificial neural networks are subject to these same training limitations, although each has its advantages. For neural networks, the multilayer perceptron architecture and back-propagation algorithm are described. The chapter distinguishes between supervised and unsupervised learning, demonstrating the advantages of the latter and showing how methods such as clustering and principal components analysis fit into the SPR framework. The chapter also defines the receiver operating characteristic, which allows an optimum balance between false positives and false negatives to be achieved.

1,189 citations
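
The nearest-neighbor rule the chapter leans on is simple enough to state in a few lines; this sketch (toy data, hypothetical names) illustrates why its error rate can approach that of the Bayes classifier given enough representative training data.

```python
import numpy as np

def nearest_neighbor_predict(train_x, train_y, queries):
    """1-NN rule: each query takes the label of its closest training point.
    Asymptotically its error rate is at most about twice the Bayes error."""
    d2 = ((queries[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    return train_y[np.argmin(d2, axis=1)]

# Toy usage: two Gaussian classes in the plane.
rng = np.random.default_rng(0)
train_x = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
train_y = np.array([0] * 50 + [1] * 50)
print(nearest_neighbor_predict(train_x, train_y, np.array([[0.0, 0.0], [3.0, 3.0]])))
```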


Journal ArticleDOI
TL;DR: In this article, a simple coding of many attractors with clustering is shown, in which the attractors are organized so that their change exhibits bifurcation-like phenomena, and a precision-dependent tree is constructed which reveals the similarity of these attractors to those of spin glasses.

817 citations


Journal ArticleDOI
TL;DR: In this paper, a new approach to clustering based on statistical physics is presented, where the problem is formulated as fuzzy clustering and the association probability distribution is obtained by maximizing the entropy at a given average variance.
Abstract: A new approach to clustering based on statistical physics is presented. The problem is formulated as fuzzy clustering, and the association probability distribution is obtained by maximizing the entropy at a given average variance. The corresponding Lagrange multiplier is related to the ``temperature'' and motivates a deterministic annealing process in which the free energy is minimized at each temperature. Critical temperatures are derived for the phase transitions at which existing clusters split. The result is a hierarchical clustering that estimates the most probable cluster parameters at various average variances.

486 citations
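
A minimal sketch of the annealing loop described above: at each temperature the association probabilities are the maximum-entropy (Gibbs) distribution, cluster means are re-estimated, and the temperature is lowered so clusters can split. Parameter values and the cooling schedule are illustrative assumptions.

```python
import numpy as np

def deterministic_annealing(x, k=3, t_start=10.0, t_end=0.05, cooling=0.9):
    """Deterministic annealing clustering: minimize the free energy at each
    temperature by alternating Gibbs association probabilities and
    weighted mean updates, then cool."""
    rng = np.random.default_rng(0)
    centers = x.mean(axis=0) + 1e-3 * rng.standard_normal((k, x.shape[1]))
    t = t_start
    while t > t_end:
        for _ in range(30):
            d2 = ((x[:, None, :] - centers) ** 2).sum(-1)
            p = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / t)  # Gibbs weights
            p /= p.sum(axis=1, keepdims=True)       # association probabilities
            centers = (p.T @ x) / p.sum(axis=0)[:, None]   # most probable means
        t *= cooling  # lower the "temperature"; clusters split at critical values
    return centers

# Toy usage: three 2-D blobs.
data = np.vstack([np.random.default_rng(i).normal(3 * i, 0.5, (100, 2)) for i in range(3)])
print(deterministic_annealing(data))
```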


Journal ArticleDOI
TL;DR: A new method for detecting spatial clustering of events in populations with non-uniform density is proposed, based on selecting controls from the population at risk and computing interpoint distances for the combined sample.
Abstract: A new method for detecting spatial clustering of events in populations with non-uniform density is proposed. The method is based on selecting controls from the population at risk and computing interpoint distances for the combined sample. Nonparametric tests are developed which are based on the number of cases among the k nearest neighbours of each case and on the number of cases nearer than the kth nearest control. The performance of these tests is evaluated analytically and by simulation, and the method is applied to a data set on the locations of cases of childhood leukaemia and lymphoma in a defined geographical area.

444 citations
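
The core counting statistic is easy to reproduce; the sketch below counts cases among each case's k nearest neighbours and assesses significance by permuting the case/control labels (a Monte Carlo stand-in for the paper's analytic evaluation; all names and values are illustrative).

```python
import numpy as np

def t_k_statistic(points, is_case, k=3):
    """Sum over cases of the number of that case's k nearest neighbours
    which are themselves cases; spatial clustering inflates this count."""
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbour
    nn = np.argsort(d2, axis=1)[:, :k]      # k nearest neighbours of each point
    return int(is_case[nn[is_case]].sum())

# Monte Carlo reference distribution under random labelling.
rng = np.random.default_rng(0)
pts = rng.random((200, 2))
labels = np.zeros(200, dtype=bool); labels[:40] = True   # 40 cases, 160 controls
observed = t_k_statistic(pts, labels)
null = [t_k_statistic(pts, rng.permutation(labels)) for _ in range(99)]
print(observed, np.mean([n >= observed for n in null]))  # statistic, approx p-value
```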


Proceedings ArticleDOI
H. V. Jagadish
01 May 1990
TL;DR: A mapping based on Hilbert's space-filling curve is presented, which outperforms previously proposed mappings on average over a variety of different operating conditions.
Abstract: There is often a need to map a multi-dimensional space onto a one-dimensional space. For example, this kind of mapping has been proposed to permit the use of one-dimensional indexing techniques in a multi-dimensional index space, such as in a spatial database. This kind of mapping is also of value in assigning physical storage, such as assigning buckets to records that have been indexed on multiple attributes, so as to minimize the disk access effort. In this paper, we discuss what the desired properties of such a mapping are and evaluate, through analysis and simulation, several mappings that have been proposed in the past. We present a mapping based on Hilbert's space-filling curve, which outperforms previously proposed mappings on average over a variety of different operating conditions.

443 citations
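
For concreteness, here is the standard bit-manipulation routine for the 2-D Hilbert mapping (adapted from the well-known public formulation, not taken from the paper itself); it returns the curve position of a grid cell, and nearby cells tend to receive nearby positions, which is the locality property the paper exploits.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along the Hilbert curve filling a
    2**order x 2**order grid."""
    d = 0
    s = 2 ** (order - 1)
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:              # rotate/reflect the quadrant so pieces join up
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        s //= 2
    return d

# Adjacent cells along the bottom row of an 8x8 grid.
print([hilbert_index(3, x, 0) for x in range(8)])
```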


Proceedings ArticleDOI
04 Nov 1990
TL;DR: A texture segmentation algorithm inspired by the multichannel filtering theory for visual information processing in the early stages of the human visual system is presented; the algorithm appears to perform as predicted by human preattentive texture discrimination.
Abstract: A texture segmentation algorithm inspired by the multichannel filtering theory for visual information processing in the early stages of the human visual system is presented. The channels are characterized by a bank of Gabor filters that nearly uniformly covers the spatial-frequency domain. A systematic filter selection scheme based on reconstruction of the input image from the filtered images is proposed. Texture features are obtained by subjecting each (selected) filtered image to a nonlinear transformation and computing a measure of energy in a window around each pixel. An unsupervised square-error clustering algorithm is then used to integrate the feature images and produce a segmentation. A simple procedure to incorporate spatial adjacency information in the clustering process is proposed. Experiments on images with natural textures as well as artificial textures with identical second- and third-order statistics are reported. The algorithm appears to perform as predicted by human preattentive texture discrimination.

426 citations
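
The pipeline in the abstract (filter bank, nonlinearity, local energy, square-error clustering) can be sketched compactly; the filter parameters, window sizes, and toy image below are illustrative assumptions rather than the paper's selected channels.

```python
import numpy as np

def gabor_kernel(size, freq, theta, sigma):
    """Even-symmetric Gabor filter: Gaussian envelope times an oriented cosine."""
    r = np.arange(size) - size // 2
    y, x = np.meshgrid(r, r, indexing="ij")
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2)) * np.cos(2 * np.pi * freq * xr)

def filter2d(img, kernel):
    """Circular convolution via the FFT (adequate for this sketch)."""
    kh, kw = kernel.shape
    pad = np.zeros_like(img, dtype=float)
    pad[:kh, :kw] = kernel
    pad = np.roll(pad, (-(kh // 2), -(kw // 2)), axis=(0, 1))  # center the kernel
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(pad)))

def texture_features(img, freqs=(0.1, 0.2), thetas=(0, np.pi/4, np.pi/2, 3*np.pi/4)):
    """One feature image per channel: Gabor filter, tanh nonlinearity, local energy."""
    window = np.ones((9, 9)) / 81.0
    feats = [filter2d(np.abs(np.tanh(0.25 * filter2d(img, gabor_kernel(31, f, th, 4.0)))),
                      window)
             for f in freqs for th in thetas]
    return np.stack(feats, axis=-1)

def kmeans_segment(feats, k=2, iters=25):
    """Unsupervised square-error clustering of the per-pixel feature vectors."""
    pts = feats.reshape(-1, feats.shape[-1])
    centers = pts[np.random.default_rng(0).choice(len(pts), k, replace=False)]
    for _ in range(iters):
        lab = np.argmin(((pts[:, None, :] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([pts[lab == j].mean(0) if np.any(lab == j) else centers[j]
                            for j in range(k)])
    return lab.reshape(feats.shape[:-1])

# Toy usage: random texture beside a sinusoidal grating.
rng = np.random.default_rng(1)
img = np.hstack([rng.standard_normal((64, 32)),
                 np.tile(np.sin(np.linspace(0, 12 * np.pi, 32)), (64, 1))])
segmentation = kmeans_segment(texture_features(img))
```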


Journal ArticleDOI
TL;DR: It is shown that, as the temperature approaches zero, the algorithm becomes the basic ISODATA algorithm, and that the method is independent of the initial choice of cluster means.

393 citations


Journal ArticleDOI
TL;DR: Estimation theory is used to derive a new approach to the clustering problem that unifies centroid and mode estimation by considering the effect of spatial scale on the estimator; the result is a multiresolution method which spans a range of spatial scales.

379 citations


Journal ArticleDOI
TL;DR: This paper presents another approach to the problem of comparing many secondary structures by utilizing a very efficient tree-matching algorithm that will compare two trees in O(|T1| x |T2| x L1 x L2) time in the worst case and very close to O(|T1| x |T2|) for average trees representing secondary structures.
Abstract: In a previous paper, an algorithm was presented for analyzing multiple RNA secondary structures utilizing a multiple string alignment algorithm. In this paper we present another approach to the problem of comparing many secondary structures by utilizing a very efficient tree-matching algorithm that will compare two trees in O(|T1| x |T2| x L1 x L2) time in the worst case and very close to O(|T1| x |T2|) for average trees representing secondary structures. The result of the pairwise comparison algorithm is then used with a clustering algorithm to produce a multiple structure clustering, which can be displayed in a taxonomy tree to show related structures.

Proceedings ArticleDOI
22 Oct 1990
TL;DR: A collection of clustering and decomposition techniques that make possible the construction of sparse and locality-preserving representations for arbitrary networks is presented and several other graph-theoretic structures that are strongly related to covers are discussed.
Abstract: A collection of clustering and decomposition techniques that make possible the construction of sparse and locality-preserving representations for arbitrary networks is presented. The representation method considered is based on breaking the network G(V,E) into connected regions, or clusters, thus obtaining a cover for the network, i.e. a collection of clusters that covers the entire set of vertices V. Several other graph-theoretic structures that are strongly related to covers are discussed. These include sparse spanners, tree covers of graphs and the concepts of regional matchings and diameter-based separators. All of these structures can be constructed by means of one of the clustering algorithms given, and each has proved a convenient representation for handling certain network applications.
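
A much simplified relative of these constructions, a greedy decomposition into bounded-radius connected clusters that together cover all vertices, can be written directly; the function name and radius parameter are hypothetical.

```python
from collections import deque

def cluster_cover(adj, radius=2):
    """Greedy decomposition of a graph into connected clusters of bounded
    radius: repeatedly grow a BFS ball around an uncovered vertex. The
    resulting collection of clusters covers every vertex."""
    unvisited = set(adj)
    clusters = []
    while unvisited:
        root = next(iter(unvisited))
        ball, frontier = {root}, deque([(root, 0)])
        while frontier:
            v, d = frontier.popleft()
            if d == radius:
                continue
            for u in adj[v]:
                if u in unvisited and u not in ball:
                    ball.add(u)
                    frontier.append((u, d + 1))
        unvisited -= ball
        clusters.append(ball)
    return clusters

# Toy usage: a 6-cycle decomposed into radius-1 clusters.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(cluster_cover(adj, radius=1))
```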

Journal ArticleDOI
TL;DR: In this article, a parametric approach is proposed in order to introduce a well-defined metric on the class of autoregressive integrated moving-average (ARIMA) invertible models as the Euclidean distance between their autoregressive expansions.
Abstract: In a number of practical problems where clustering or choosing from a set of dynamic structures is needed, the introduction of a distance between the data is an early step in the application of multivariate statistical methods. In this paper a parametric approach is proposed in order to introduce a well-defined metric on the class of autoregressive integrated moving-average (ARIMA) invertible models as the Euclidean distance between their autoregressive expansions. Two case studies for clustering economic time series and for assessing the consistency of seasonal adjustment procedures are discussed. Finally, some related proposals are surveyed and some suggestions for further research are made.
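
The metric itself is straightforward to compute: expand each invertible ARMA model into its AR(infinity) form by power-series division and take the Euclidean distance between the truncated coefficient sequences. The sketch below assumes the Box-Jenkins sign convention; the coefficient values in the usage line are hypothetical.

```python
import numpy as np

def ar_expansion(ar, ma, n_terms=50):
    """Coefficients of the AR(infinity) expansion pi(B) = phi(B)/theta(B)
    of an invertible ARMA model, by power-series division. `ar` and `ma`
    hold phi_1..phi_p and theta_1..theta_q under the convention
    phi(B) = 1 - phi_1 B - ... and theta(B) = 1 - theta_1 B - ..."""
    phi = np.zeros(n_terms); phi[0] = 1.0
    phi[1:len(ar) + 1] = -np.asarray(ar)
    theta = np.zeros(n_terms); theta[0] = 1.0
    theta[1:len(ma) + 1] = -np.asarray(ma)
    pi = np.zeros(n_terms)
    for k in range(n_terms):
        pi[k] = phi[k] - np.dot(theta[1:k + 1], pi[:k][::-1])
    return pi[1:]   # pi_1, pi_2, ... (pi_0 is always 1)

def arima_distance(model_a, model_b, n_terms=50):
    """Euclidean distance between the AR expansions of two ARMA models."""
    pa = ar_expansion(*model_a, n_terms)
    pb = ar_expansion(*model_b, n_terms)
    return float(np.sqrt(((pa - pb) ** 2).sum()))

# Toy usage: an AR(1) versus an MA(1), with hypothetical coefficients.
print(arima_distance(([0.8], []), ([], [0.4])))
```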

Journal ArticleDOI
TL;DR: An algorithm for multidimensional data clustering (termed the variance-based algorithm), based on the criterion of minimization of the sum-of-squared error, is applied to the problem of reducing the number of colors used to represent a given color image.
Abstract: Color image quantization is a process of representing an image with a small number of well-selected colors. In this article an algorithm for multidimensional data clustering (termed the variance-based algorithm), based on the criterion of minimization of the sum-of-squared error, is applied to the problem of reducing the number of colors used to represent a given color image. The suitability of the sum-of-squared error criterion for measuring the similarity between the original and quantized images is examined using a digitized image and a computer-generated image. The experimental results indicate that this error measure is basically consistent with the perceived quality of the quantized image. The performance of the variance-based algorithm is compared with that of other algorithms for color image quantization in terms of quantization error. Images generated using the colors selected by the variance-based and the median-cut algorithms are also presented.
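
In the same spirit, though not the paper's exact procedure, a greedy sum-of-squared-error quantizer can be sketched as repeated splitting of the worst cluster along its highest-variance axis; all names and values are illustrative.

```python
import numpy as np

def variance_split_palette(pixels, n_colors=8):
    """Greedy SSE quantizer: repeatedly split the cluster with the largest
    squared error along its highest-variance channel at the mean, then use
    cluster means as the palette (a sketch in the spirit of the
    variance-based algorithm)."""
    clusters = [pixels]
    while len(clusters) < n_colors:
        sse = [((c - c.mean(0)) ** 2).sum() for c in clusters]
        c = clusters.pop(int(np.argmax(sse)))        # worst cluster so far
        axis = int(np.argmax(c.var(axis=0)))         # highest-variance channel
        cut = c[:, axis].mean()
        left, right = c[c[:, axis] <= cut], c[c[:, axis] > cut]
        if len(left) == 0 or len(right) == 0:        # degenerate split; stop
            clusters.append(c); break
        clusters += [left, right]
    return np.array([c.mean(axis=0) for c in clusters])

# Toy usage: quantize random RGB pixels to an 8-color palette.
palette = variance_split_palette(np.random.default_rng(0).random((10000, 3)), 8)
```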

Journal ArticleDOI
TL;DR: The FSC algorithm is shown to be superior to the HT method with regard to memory requirements and computation time, and to be successful even if only part of a circular shape is present in the image.
Abstract: A new type of fuzzy clustering algorithm called Fuzzy-Shell Clustering (FSC) is introduced. The FSC algorithm seeks cluster prototypes that are p-dimensional hyperspherical shells. In two-dimensional data, this amounts to finding cluster prototypes that are circles. Thus the FSC algorithm can be applied for the detection of circles in digital images. The algorithm does not require the data points to be in any particular order, therefore its performance can be compared with that of global transformation techniques such as Hough transforms. Several numerical examples are considered and the performance of the FSC algorithm is compared to the performance of methods based on the generalized Hough transform (HT). The FSC is shown to be superior to the HT method with regard to memory requirements and computation time. Like the HT method, the FSC is successful even if only part of a circular shape is present in the image. Other potential applications of FSC are also considered.
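
A sketch of the shell idea: prototypes are (center, radius) pairs, memberships follow the usual fuzzy c-means rule on the shell distance | ||x - c|| - r |, and prototypes are refit each pass. The paper solves the prototype equations exactly; this sketch substitutes the closed-form radius update and a simple gradient step for the centers, with hypothetical parameters throughout.

```python
import numpy as np

def fuzzy_shell_clustering(points, n_circles=2, m=2.0, iters=150, lr=0.1):
    """Fuzzy shell clustering sketch: circle prototypes (center, radius),
    fuzzy memberships on the shell distance, closed-form radius update,
    and a gradient step for the centers (not the paper's exact updates)."""
    rng = np.random.default_rng(0)
    centers = points[rng.choice(len(points), n_circles, replace=False)].copy()
    radii = np.full(n_circles, points.std())
    for _ in range(iters):
        d = np.linalg.norm(points[:, None, :] - centers, axis=2) + 1e-9
        shell = np.abs(d - radii) + 1e-9               # distance to each shell
        u = shell ** (-2.0 / (m - 1.0))                # fuzzy memberships
        u /= u.sum(axis=1, keepdims=True)
        w = u ** m
        radii = (w * d).sum(axis=0) / w.sum(axis=0)    # optimal radius per shell
        for i in range(n_circles):                     # gradient step on centers
            g = (w[:, i, None] * (d[:, i, None] - radii[i])
                 * (points - centers[i]) / d[:, i, None]).sum(axis=0)
            centers[i] += lr * g / w[:, i].sum()
    return centers, radii

# Toy usage: two noisy circles of different radii.
t = np.linspace(0, 2 * np.pi, 200)
rng = np.random.default_rng(1)
ring1 = np.c_[np.cos(t), np.sin(t)] + 0.02 * rng.standard_normal((200, 2))
ring2 = np.c_[3 + 2 * np.cos(t), 2 * np.sin(t)] + 0.02 * rng.standard_normal((200, 2))
print(fuzzy_shell_clustering(np.vstack([ring1, ring2])))
```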

Proceedings ArticleDOI
17 Jun 1990
TL;DR: The authors present learning rate schedules for fast adaptive k-means clustering which surpass the standard MacQueen learning rate schedule in speed and quality of solution by several orders of magnitude for large k.
Abstract: The authors present learning rate schedules for fast adaptive k-means clustering which surpass the standard MacQueen learning rate schedule (J. MacQueen, 1967) in speed and quality of solution by several orders of magnitude for large k. The methods accomplish this by largely overcoming the problems of metastable local minima and nonstationarity of cluster region boundaries which plague the MacQueen approach. The authors use simulation results to compare the clustering performances of four learning rate schedules applied to independently sampled data from a uniform distribution in one and two dimensions.
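
The setting is easy to reproduce: online k-means where each sample nudges its winning center by a schedule-dependent step. The sketch below compares MacQueen's 1/n schedule with a generic "search-then-converge" style schedule; the specific schedule constants are illustrative, not the ones proposed in the paper.

```python
import numpy as np

def online_kmeans(data, k, rate_fn, passes=5):
    """Online adaptive k-means: each sample moves its winning center by a
    schedule-dependent step; rate_fn(t, n) sees the global step count t
    and the winner's own update count n."""
    rng = np.random.default_rng(0)
    centers = data[rng.choice(len(data), k, replace=False)].copy()
    counts = np.zeros(k)
    t = 0
    for _ in range(passes):
        for x in data[rng.permutation(len(data))]:
            j = int(np.argmin(((centers - x) ** 2).sum(axis=1)))  # winner
            counts[j] += 1; t += 1
            centers[j] += rate_fn(t, counts[j]) * (x - centers[j])
    return centers

data = np.random.default_rng(1).random((2000, 2))
macqueen = online_kmeans(data, 16, lambda t, n: 1.0 / n)          # MacQueen: 1/n
annealed = online_kmeans(data, 16, lambda t, n: 0.5 / (1.0 + t / 500.0))
```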

PatentDOI
TL;DR: The system in principle is different from neural networks and statistical classifiers and can deal with very difficult pattern classification problems that arise in speech and image recognition, robotics, medical diagnosis, warfare systems and others.
Abstract: A method and apparatus (system) is provided for the separation into and the identification of classes of events wherein each of the events is represented by a signal vector comprising the signals x1, x2, . . . , xj, . . . , xn. The system in principle is different from neural networks and statistical classifiers. The system comprises a plurality of assemblies. The training or adaptation module stores a set of training examples and has a set of procedures (linear programming, clustering and others) that operate on the training examples and determine a set of transfer functions and threshold values. These transfer functions and threshold values are installed on a recognition module for use in the identification phase. The training module is extremely fast and can deal with very difficult pattern classification problems that arise in speech and image recognition, robotics, medical diagnosis, warfare systems and others. The system also exploits parallelism in both the learning and recognition phases.

Journal ArticleDOI
TL;DR: The retrieval experiments show that the information-retrieval effectiveness of the algorithm is comparable to that of a very demanding complete linkage clustering method that is known to have good retrieval performance, with improvements in retrieval effectiveness.
Abstract: A new algorithm for document clustering is introduced. The base concept of the algorithm, the cover coefficient (CC) concept, provides a means of estimating the number of clusters within a document database and relates indexing and clustering analytically. The CC concept is also used to identify the cluster seeds and to form clusters with these seeds. It is shown that the complexity of the clustering process is very low. The retrieval experiments show that the information-retrieval effectiveness of the algorithm is comparable to that of a very demanding complete linkage clustering method that is known to have good retrieval performance. The experiments also show that the algorithm is 15.1 to 63.5 (with an average of 47.5) percent better than four other clustering algorithms in cluster-based information retrieval. The experiments have validated the indexing-clustering relationships and the complexity of the algorithm and have shown improvements in retrieval effectiveness. In the experiments two document databases are used: TODS214 and INSPEC. The latter is a common database with 12,684 documents.
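
The cover coefficient computation itself is a pair of normalizations of the document-term matrix; the sketch below (hypothetical toy matrix) also shows the diagonal sum that estimates the number of clusters.

```python
import numpy as np

def cover_coefficients(D):
    """Cover coefficient matrix C of a binary document-term matrix D:
    c_ij is the probability of selecting a term of document i and then
    picking document j from that term's postings. The sum of the
    diagonal (decoupling) entries estimates the number of clusters."""
    D = np.asarray(D, dtype=float)
    alpha = 1.0 / D.sum(axis=1, keepdims=True)   # row (document) normalizer
    beta = 1.0 / D.sum(axis=0, keepdims=True)    # column (term) normalizer
    return (alpha * D) @ (beta * D).T

# Toy usage: 4 documents over 5 terms.
D = np.array([[1, 1, 0, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 1, 1, 1]])
C = cover_coefficients(D)
print(np.trace(C))   # estimated number of clusters in the collection
```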

Proceedings ArticleDOI
13 May 1990
TL;DR: An overall strategy for inspection planning is discussed, and attention is focused on local and global accessibility cones and on algorithms for computing and clustering them.
Abstract: An overall strategy for inspection planning is discussed, and attention is focused on local and global accessibility cones and on algorithms for computing and clustering them. The direction-cone clusters serve to generate a minimal set of directions for inspecting a part. The algorithms are based primarily on the computation of Gaussian images and Minkowski (or sweeping) operations. Applications to computer vision and automatic machining are briefly presented.

Journal ArticleDOI
TL;DR: A model-fitting approach to the cluster validation problem based on Akaike's information criterion is proposed, and its efficacy and robustness are demonstrated through experimental results for synthetic mixture data and image data.
Abstract: A clustering scheme is used for model parameter estimation. Most of the existing clustering procedures require prior knowledge of the number of classes, which is often, as in unsupervised image segmentation, unavailable and must be estimated. This problem is known as the cluster validation problem. For unsupervised image segmentation the solution of this problem directly affects the quality of the segmentation. A model-fitting approach to the cluster validation problem based on Akaike's information criterion is proposed, and its efficacy and robustness are demonstrated through experimental results for synthetic mixture data and image data.
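
A minimal version of the model-fitting recipe: fit a mixture for each candidate number of clusters and keep the value minimizing AIC. The sketch uses a spherical Gaussian mixture fit by EM; the model family and all settings are illustrative assumptions.

```python
import numpy as np

def gmm_aic(x, k, iters=60):
    """Fit a spherical Gaussian mixture by EM and return Akaike's
    information criterion, AIC = 2 * n_params - 2 * log-likelihood."""
    rng = np.random.default_rng(0)
    n, d = x.shape
    mu = x[rng.choice(n, k, replace=False)].astype(float)
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        d2 = ((x[:, None, :] - mu) ** 2).sum(-1)
        logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)          # E-step: responsibilities
        nk = r.sum(axis=0) + 1e-12
        pi = nk / n                                # M-step
        mu = (r.T @ x) / nk[:, None]
        d2 = ((x[:, None, :] - mu) ** 2).sum(-1)
        var = (r * d2).sum(axis=0) / (d * nk) + 1e-6
    logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
    loglik = np.logaddexp.reduce(logp, axis=1).sum()
    n_params = k * (d + 2) - 1    # k means (d each), k variances, k-1 weights
    return 2 * n_params - 2 * loglik

# Toy usage: three well-separated blobs; AIC should pick k = 3.
x = np.vstack([np.random.default_rng(i).normal(4 * i, 1.0, (150, 2)) for i in range(3)])
print(min(range(1, 7), key=lambda k: gmm_aic(x, k)))
```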

Journal ArticleDOI
TL;DR: In this paper, three array-based clustering algorithms (rank order clustering (ROC), direct clustering analysis (DCA), and bond energy analysis (BEA)) were compared for cell formation.
Abstract: This paper examines three array-based clustering algorithms, rank order clustering (ROC), direct clustering analysis (DCA), and bond energy analysis (BEA), for manufacturing cell formation. According to our test, bond energy analysis outperforms the other two methods, regardless of which measure or data set is used. If exceptional elements exist in the data set, the BEA algorithm also produces better results than the other two methods without any additional processing. The BEA can compete with other more complicated methods that have appeared in the literature.
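
Of the three, rank order clustering is the simplest to sketch: treat rows and columns as binary numbers and re-sort until stable, which pulls machine-part blocks toward the diagonal. The toy incidence matrix below is hypothetical.

```python
import numpy as np

def rank_order_clustering(M, max_iters=20):
    """King's rank order clustering: read each row (column) as a binary
    number and re-sort rows (columns) in decreasing order until the
    matrix stops changing."""
    M = np.asarray(M)
    for _ in range(max_iters):
        before = M.copy()
        row_w = 2 ** np.arange(M.shape[1])[::-1]
        M = M[np.argsort(-(M @ row_w), kind="stable")]        # sort rows
        col_w = 2 ** np.arange(M.shape[0])[::-1]
        M = M[:, np.argsort(-(col_w @ M), kind="stable")]     # sort columns
        if np.array_equal(M, before):
            break
    return M

# Toy machine-part incidence matrix.
M = np.array([[1, 0, 0, 1],
              [0, 1, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 1, 0]])
print(rank_order_clustering(M))
```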

Journal ArticleDOI
TL;DR: In this article, a similarity coefficient based on the similarity of operation sequences is presented and its use for parts grouping is discussed; it is also illustrated that such a coefficient, augmented with an advanced clustering algorithm, can improve production effectiveness by identifying part families that allow machines to interleave between identical operations of different parts.

Journal ArticleDOI
TL;DR: Formal bounds are established on the efficacy of using the Hough transform to preselect likely subspaces of the search space, showing that the problem remains exponential, but that in practical terms the size of the problem is significantly decreased.

Journal ArticleDOI
TL;DR: Simulated annealing is used to find a near-optimal partitioning with respect to each of several clustering criteria for a variety of simulated data sets, which can then be used to determine the “best” clustering criterion for the multi-sensor fusion problem with a given fusion operator.
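
A bare-bones version of the search: propose single-point moves between clusters, accept by the Metropolis rule, and cool. Any clustering criterion can be plugged in as the cost, which is what makes the comparison across criteria possible; names and schedule constants below are illustrative.

```python
import numpy as np

def sa_partition(x, k, criterion, iters=20000, t0=1.0, cooling=0.9995):
    """Simulated annealing over partitions: move one point to a random
    other cluster, accept with the Metropolis rule, anneal the temperature.
    `criterion` scores a labelling (lower is better)."""
    rng = np.random.default_rng(0)
    labels = rng.integers(k, size=len(x))
    cost, t = criterion(x, labels), t0
    for _ in range(iters):
        i, new = rng.integers(len(x)), rng.integers(k)
        old = labels[i]
        if new == old:
            continue
        labels[i] = new
        new_cost = criterion(x, labels)
        if new_cost < cost or rng.random() < np.exp((cost - new_cost) / t):
            cost = new_cost          # accept the move
        else:
            labels[i] = old          # reject and restore
        t *= cooling
    return labels, cost

def within_ss(x, labels):  # one example criterion: within-cluster squared error
    return sum(((x[labels == j] - x[labels == j].mean(0)) ** 2).sum()
               for j in set(labels))

pts = np.vstack([np.random.default_rng(s).normal(3 * s, 0.4, (40, 2)) for s in range(2)])
print(sa_partition(pts, 2, within_ss)[1])
```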

Journal ArticleDOI
TL;DR: In this article, the mathematical foundation of common integrated-circuit yield models based on the assumption that the yield is dominated by random point defects is discussed, and the implications for overall system yield are examined.
Abstract: The mathematical foundation of common integrated-circuit yield models based on the assumption that the yield is dominated by random point defects is discussed. Various mathematical models which are commonly used to account for defect clustering are given a physical interpretation and are compared mathematically and graphically. A yield model applicable when the repair of some defects in a chip is possible is developed and discussed. Simple yield models for systems with two-fold block redundancy and triple modular redundancy in the presence of defect clustering are developed, and the implications for overall system yield are discussed. It is shown that the yield of systems with circuit redundancy can be substantially affected by defect clustering and, hence, that a correct understanding of defects and yield is essential to predict the yields and costs of wafer-scale products.
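
The standard formulas behind this comparison are compact enough to show: a Poisson model for unclustered point defects versus the widely used negative-binomial model with clustering parameter alpha (the numbers below are hypothetical).

```python
import numpy as np

def yield_poisson(area, defect_density):
    """Poisson model: random, unclustered point defects."""
    return np.exp(-area * defect_density)

def yield_negative_binomial(area, defect_density, alpha):
    """Negative-binomial model: alpha is the clustering parameter.
    As alpha -> infinity this recovers the Poisson model; small alpha
    means heavily clustered defects and higher yield at the same
    defect density."""
    return (1.0 + area * defect_density / alpha) ** (-alpha)

# Same chip area and defect density, different clustering assumptions.
A, D0 = 1.0, 1.0   # cm^2 and defects/cm^2 (hypothetical values)
print(yield_poisson(A, D0))                        # ~0.368
print(yield_negative_binomial(A, D0, alpha=0.5))   # clustered: ~0.577
```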

17 Sep 1990
TL;DR: An algorithm is examined for data clustering and incremental concept formation in the Cobweb/3 system, which features a flexible user interface that includes a graphical display of the concept hierarchies that the system constructs.
Abstract: An algorithm is examined for data clustering and incremental concept formation. An overview is given of the Cobweb/3 system and the algorithm on which it is based, as well as the practical details of obtaining and running the system code. The implementation features a flexible user interface which includes a graphical display of the concept hierarchies that the system constructs.

Journal ArticleDOI
TL;DR: In this article, the poles and zeros of the original system in the s-plane are clustered using a newly defined inverse distance measure, and a tuning factor k is used to minimize the cumulative square of the time-response deviations between the original system and the reduced-order approximant.

Book
18 Oct 1990
TL;DR: Most of the algorithms in this book are for hypercubes with the number of processors being a function of problem size; however, for image processing problems, the book also includes algorithms for an MIMD hypercube with a small number of processors.
Abstract: Fundamental algorithms for SIMD and MIMD hypercubes are developed. These include algorithms for such problems as data broadcasting, data sum, prefix sum, shift, data circulation, data accumulation, sorting, random access reads and writes, and data permutation. The fundamental algorithms are then used to obtain efficient hypercube algorithms for matrix multiplication; image processing problems such as convolution, template matching, Hough transform, clustering and image processing transformation; and string editing. Most of the algorithms in this book are for hypercubes with the number of processors being a function of problem size. However, for image processing problems, the book also includes algorithms for an MIMD hypercube with a small number of processors. Experimental results on an NCUBE/77 MIMD hypercube are also presented. The book is suitable for use in a one-semester or one-quarter course on hypercube algorithms. For students with no prior exposure to parallel algorithms, it is recommended that one week be spent on the material in chapter 1, about six weeks on chapter 2 and one week on chapter 3. The remainder of the term can be spent covering topics from the rest of the book.

Proceedings Article
01 Jan 1990
TL;DR: A collection of clustering and decomposition techniques enabling the construction of sparse and locality-preserving representations for arbitrary networks is presented; two applications are described, namely, routing with polynomial communication-space tradeoff and online tracking of mobile users.
Abstract: This abstract presents a collection of clustering and decomposition techniques enabling the construction of sparse and locality-preserving representations for arbitrary networks. These new clustering techniques have already found several powerful applications in the area of distributed network algorithms. Two of these applications are described in this abstract, namely, routing with polynomial communication-space tradeoff and online tracking of mobile users.

Proceedings ArticleDOI
01 Mar 1990
TL;DR: The AFC method requires less memory than the HT method, is shown to work better for polygonal descriptions of digital curves, and is comparable to global line detection techniques such as the Hough transform (HT).
Abstract: Detection of line segments in a digital picture is viewed as a clustering problem through application of the adaptive fuzzy clustering (AFC) algorithm. For each line detected, the AFC gives the line description in terms of the end-points of the line as well as its weighted geometric center. The results of the AFC technique are compared with the results of the fuzzy c-lines (FCL) and fuzzy c-elliptotypes (FCE) algorithms, and the superiority of AFC is demonstrated. It is also shown that the output of the AFC algorithm is not very sensitive to the number of clusters to be searched for. A major advantage of the AFC approach is that it does not require ordered (e.g. chain-coded) image data points. Thus it is comparable to global line detection techniques such as the Hough transform (HT). The AFC method requires less memory than the HT method and is shown to work better for polygonal descriptions of digital curves. A variation of the AFC algorithm is introduced in order to improve the computational efficiency.