Showing papers on "Fuzzy clustering" published in 1988


Book
01 Jan 1988

8,586 citations


Journal ArticleDOI
TL;DR: Reviews algorithms that allow hierarchic agglomerative clustering methods to be implemented for document retrieval; experimental evidence suggests that nearest neighbor clusters provide a reasonably efficient and effective means of including interdocument similarity information in document retrieval systems.
Abstract: This article reviews recent research into the use of hierarchic agglomerative clustering methods for document retrieval. After an introduction to the calculation of interdocument similarities and to clustering methods that are appropriate for document clustering, the article discusses algorithms that can be used to allow the implementation of these methods on databases of nontrivial size. The validation of document hierarchies is described using tests based on the theory of random graphs and on empirical characteristics of document collections that are to be clustered. A range of search strategies is available for retrieval from document hierarchies and the results are presented of a series of research projects that have used these strategies to search the clusters resulting from several different types of hierarchic agglomerative clustering method. It is suggested that the complete linkage method is probably the most effective method in terms of retrieval performance; however, it is also difficult to implement in an efficient manner. Other applications of document clustering techniques are discussed briefly; experimental evidence suggests that nearest neighbor clusters, possibly represented as a network model, provide a reasonably efficient and effective means of including interdocument similarity information in document retrieval systems.

842 citations
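
The complete linkage method named in the abstract above can be illustrated with a naive sketch. This is not code from the article: the function name `complete_linkage`, the use of Euclidean distance on numeric tuples, and the stopping rule (a target number of clusters `k`) are all assumptions made for illustration. The repeated all-pairs scan also hints at why the abstract calls the method difficult to implement efficiently.

```python
import math

def complete_linkage(points, k):
    """Naive agglomerative clustering with complete linkage (illustrative only)."""
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the LARGEST pairwise distance.
                d = max(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    return clusters

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(complete_linkage(pts, 2))
```

Production code would more likely rely on an existing hierarchical clustering library than on this cubic-time loop.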


Journal ArticleDOI
TL;DR: A forward selection procedure for identifying the subset of variables is proposed and studied in the context of complete linkage hierarchical clustering, and can be applied to other clustering methods, too.
Abstract: Standard clustering algorithms can completely fail to identify clear cluster structure if that structure is confined to a subset of the variables. A forward selection procedure for identifying the subset is proposed and studied in the context of complete linkage hierarchical clustering. The basic approach can be applied to other clustering methods, too.

157 citations


Journal ArticleDOI
TL;DR: The purpose of this paper is to collect the main global and local, numerical and stochastic, convergence results for FCM in a brief and unified way.
Abstract: One of the main techniques embodied in many pattern recognition systems is cluster analysis — the identification of substructure in unlabeled data sets. The fuzzy c-means algorithms (FCM) have often been used to solve certain types of clustering problems. During the last two years several new local results concerning both numerical and stochastic convergence of FCM have been found. Numerical results describe how the algorithms behave when evaluated as optimization algorithms for finding minima of the corresponding family of fuzzy c-means functionals. Stochastic properties refer to the accuracy of minima of FCM functionals as approximations to parameters of statistical populations which are sometimes assumed to be associated with the data. The purpose of this paper is to collect the main global and local, numerical and stochastic, convergence results for FCM in a brief and unified way.

144 citations
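
For readers unfamiliar with the algorithm whose convergence the paper above surveys, a minimal sketch of the standard fuzzy c-means alternating updates follows. It is an illustration under stated assumptions, not the paper's formulation: the function name `fuzzy_c_means`, the random initialisation, the fuzzifier default `m=2.0`, and the stopping tolerance are all choices made here, and the sketch converges only to a local minimum of the functional.

```python
import random

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Alternate centre and membership updates (illustrative sketch)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Assumed initialisation: random memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u_ik ** m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            total = sum(w)
            centers.append([sum(w[k] * points[k][d] for k in range(n)) / total
                            for d in range(dim)])
        # Membership update: u_ik proportional to (squared distance) ** (-1 / (m - 1)).
        new_u = []
        for k in range(n):
            d2 = [sum((points[k][d] - centers[i][d]) ** 2 for d in range(dim))
                  for i in range(c)]
            zeros = [i for i, v in enumerate(d2) if v == 0.0]
            if zeros:  # point sits exactly on one or more centres
                row = [1.0 / len(zeros) if i in zeros else 0.0 for i in range(c)]
            else:
                inv = [v ** (-1.0 / (m - 1.0)) for v in d2]
                total = sum(inv)
                row = [v / total for v in inv]
            new_u.append(row)
        change = max(abs(new_u[k][i] - u[k][i]) for k in range(n) for i in range(c))
        u = new_u
        if change < tol:
            break
    return centers, u

data = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
centers, memberships = fuzzy_c_means(data, c=2)
print(sorted(ctr[0] for ctr in centers))
```

Each row of the returned membership matrix sums to 1, which is the constraint under which the fuzzy c-means functionals are minimised.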


Journal ArticleDOI
TL;DR: A new divisive algorithm for multidimensional data clustering that produces much smaller quantization errors than the median-cut and mean-split algorithms, and whose solutions are close to the locally optimal ones derived by the k-means iterative procedure.
Abstract: A new divisive algorithm for multidimensional data clustering is suggested. Based on the minimization of the sum-of-squared-errors, the proposed method produces much smaller quantization errors than the median-cut and mean-split algorithms. It is also observed that the solutions obtained from our algorithm are close to the local optimal ones derived by the k-means iterative procedure.

115 citations
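
The general idea of divisive clustering driven by the sum of squared errors can be sketched as below. This is a generic illustration and should not be read as the algorithm proposed in the paper above, whose details the abstract does not give. The names `sse`, `best_split`, and `divisive_clustering`, the restriction to axis-aligned cuts, and the exhaustive search over cut positions are all assumptions.

```python
def sse(cluster):
    """Sum of squared distances from each point to the cluster mean."""
    n, dim = len(cluster), len(cluster[0])
    mean = [sum(p[d] for p in cluster) / n for d in range(dim)]
    return sum((p[d] - mean[d]) ** 2 for p in cluster for d in range(dim))

def best_split(cluster):
    """Try every axis-aligned cut; return the two halves with the smallest total SSE."""
    best = None
    for d in range(len(cluster[0])):
        ordered = sorted(cluster, key=lambda p: p[d])
        for cut in range(1, len(ordered)):
            left, right = ordered[:cut], ordered[cut:]
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, left, right)
    return best[1], best[2]

def divisive_clustering(points, k):
    """Repeatedly split the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) > 1]
        if not splittable:
            break  # only singletons left, nothing more to split
        worst = max(splittable, key=sse)
        clusters.remove(worst)
        clusters.extend(best_split(worst))
    return clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
parts = divisive_clustering(pts, 2)
print(parts, sum(sse(c) for c in parts))
```

The total SSE over the resulting clusters plays the role of the quantization error mentioned in the abstract.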


Journal ArticleDOI
TL;DR: An important result of this study is that a partition with a higher F_k(U) value is not always a better allocation than a partition with a lower value; this observation contradicts the role of the partition coefficient as a cluster validity measure.

108 citations
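
Assuming F_k(U) in the summary above denotes the usual partition coefficient, that is, the mean over data points of the squared memberships in an n-by-k fuzzy membership matrix U, it can be computed as in the following sketch. The function name `partition_coefficient` and the list-of-rows layout of U are illustrative assumptions.

```python
def partition_coefficient(u):
    """Mean over points of the sum of squared memberships (rows of u sum to 1)."""
    n = len(u)
    return sum(v * v for row in u for v in row) / n

crisp = [[1.0, 0.0], [0.0, 1.0]]   # every point fully in one cluster
vague = [[0.5, 0.5], [0.5, 0.5]]   # every point split evenly between two clusters
print(partition_coefficient(crisp), partition_coefficient(vague))
```

Under this definition the value ranges from 1/k for a maximally fuzzy partition up to 1 for a crisp one, so it measures how crisp a partition is, not whether the partition matches the true structure of the data.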


Journal ArticleDOI
TL;DR: The issue of validity in clustering is considered and a definition of fuzzy r-cluster that extends E. Ruspini's definition (1982) is proposed, using an indistinguishability relation based on the concept of t-norm.
Abstract: The issue of validity in clustering is considered and a definition of fuzzy r-cluster that extends E. Ruspini's definition (1982) is proposed. This definition is based on an indistinguishability relation based on the concept of t-norm. The fuzzy r-cluster's metrical properties are studied through the dual concept of t-conorm that leads to G-pseudometrics. From the concept of G-pseudometric, fuzzy r-clusters and fuzzy cluster coverages are defined. The authors propose a measure of cluster validity based on the concept of fuzzy coverage. The basic idea of the approach presented is that the smaller the difference between the degrees of membership and the degrees of indistinguishability, the better the clustering.

24 citations


Journal ArticleDOI
TL;DR: Investigates the general properties that similarity metrics, objective functions, and concept description languages must have to guarantee that a (conceptual) clustering problem is polynomial-time solvable by a simple and widely used clustering technique, the agglomerative-hierarchical algorithm.
Abstract: Research in cluster analysis has resulted in a large number of algorithms and similarity measurements for clustering scientific data. Machine learning researchers have published a number of methods for conceptual clustering, in which observations are grouped into clusters that have “good” descriptions in some language. In this paper we investigate the general properties that similarity metrics, objective functions, and concept description languages must have to guarantee that a (conceptual) clustering problem is polynomial-time solvable by a simple and widely used clustering technique, the agglomerative-hierarchical algorithm. We show that under fairly general conditions, the agglomerative-hierarchical method may be used to find an optimal solution in polynomial time.

24 citations


Proceedings ArticleDOI
12 Sep 1988
TL;DR: A generalisation of the FCM algorithm (GFCM) which is more versatile than the standard FCM, having discriminant functions which may, by changing parameters, be varied in order to suit particular applications.
Abstract: Several algorithms have been defined which can segment images, each algorithm having its own merits. The Maximum Likelihood (ML) algorithm is considered the most accurate, while the Fuzzy c-Means (FCM) algorithm converges more quickly. This paper describes a generalisation of the FCM algorithm (GFCM) which is more versatile than the standard FCM, having discriminant functions which may, by changing parameters, be varied in order to suit particular applications. The discriminant functions can thus be more realistic than those used in the standard FCM, and in the limiting case they approach Gaussians, where the algorithm produces results identical to some implementations of ML.

23 citations


Journal ArticleDOI
TL;DR: It is shown that contrary to popular belief these iterative clustering algorithms do not guarantee that each stable partition is locally optimal, so a multiple-point reassignment rule is derived which assumes a Gaussian density model for each cluster.

14 citations


Proceedings ArticleDOI
01 Jan 1988
TL;DR: The authors examine the problem of reliably detecting life-threatening ventricular arrhythmias, discriminating them from other rhythms and imitative artifacts, making use of spectral parameters and fuzzy clustering algorithms for a statistical characterization of the groups.
Abstract: The authors examine the problem of reliably detecting life-threatening ventricular arrhythmias, discriminating them from other rhythms and imitative artifacts, making use of spectral parameters. They try to show the existence of a cluster structure in the defined feature space, with each cluster representing a kind of rhythm or artifact. They used a design set of ECG (electrocardiogram) records labeled by a cardiologist. Even when the hard classification labels of these records indicate that they are reasonably separated, they overlap considerably in some regions of the feature space. For this reason, the authors use fuzzy clustering algorithms for a statistical characterization of the groups. The membership-function matrix of the prototypes of each of the clusters and the matrices that induce the norms in their environment give useful descriptions over the training set for constructing a feasible classifier for detecting ventricular arrhythmias.

Journal ArticleDOI
R. M. Umesh
TL;DR: The proposed method, which is not biased towards clusters of any particular shape or size, is compared with two other clustering techniques.

Journal ArticleDOI
TL;DR: A modified version of fuzzy c-means, called fuzzy c-means with additional data (FCM-AD), is presented in order to achieve robustness against a few outliers.

Journal ArticleDOI
TL;DR: The architectural configuration of the various systolic sub-arrays, the data flow scheme and the internal structure of each of the processing elements, and the algorithmic complexity of the design are described.

Proceedings ArticleDOI
08 Aug 1988
TL;DR: A method which can be used to evaluate rapidly the transient stability of an electric power system by using a fuzzy pattern recognition approach, incorporating fuzzy membership values for the system operating states is presented.
Abstract: The objective of the paper is to present a method which can be used to evaluate rapidly the transient stability of an electric power system. The stability evaluation is performed by using a fuzzy pattern recognition approach, incorporating fuzzy membership values for the system operating states. These values classify the operating states into the defined fuzzy sets representing the class of either the stable or the unstable states. The membership evaluation is performed by considering the lack of a known mathematical expression for the membership values and by using the fuzzy ISODATA clustering algorithm. The developed computational method is applied to evaluate the transient stability of a sample electric power system.

Book ChapterDOI
01 Jan 1988
TL;DR: Experimental results of various fuzzy clustering algorithms applied to imagery data are shown to illustrate applicability of the proposed way of cluster evaluation.
Abstract: The results of different fuzzy clustering algorithms are dealt with collectively in a formal framework of probabilistic set theory in order to interpret the structure of data. Special attention is paid to calculation of entropy of the fuzzy clusters detected by various grouping methods. Experimental results of various fuzzy clustering algorithms applied to imagery data are shown to illustrate applicability of the proposed way of cluster evaluation.


Journal ArticleDOI
TL;DR: This text develops a clustering algorithm based on A* search with a pruning feature that determines the globally optimum classification and is computationally very efficient.
Abstract: The problem of clustering n objects into m classes may be viewed as a combinatorial optimization problem. The optimum classification of n objects into m classes is considered under the assumption that there exists a criterion by which each classification can be evaluated and ultimately the optimum classification can be obtained. Most clustering algorithms described in the literature are iterative hill-climbing techniques which generally yield a locally optimum classification. In this text, we develop a clustering algorithm based on A* search with a pruning feature. This algorithm determines the globally optimum classification and is computationally very efficient.


Journal ArticleDOI
TL;DR: By introducing an idea of prototype theory from the psychological domain with respect to human category formation, an alternative methodology of conceptual clustering is presented; the algorithm and its clustering results are illustrated using a schematically modeled example.

Book ChapterDOI
Yasuo Ohashi
01 Jan 1988
TL;DR: Clustering is characterized as a method of increasing the goodness of fit of each model or decreasing the number of parameters of a well-fitted model through two operations—localization and merging, respectively.
Abstract: Publisher Summary: The results of clustering cases depend on the variables used for analysis and, vice versa, the results of clustering variables depend on the cases used, whether those results are obtained by the application of automated classification techniques such as the k-means method or by the visual inspection of the output of ordination techniques, such as principal component analysis. It is a usual practice for consultants of data analysis to recommend that their clients reanalyze the data by deleting or sub-setting cases and/or variables to verify the stability and generality of the results of the analysis. The remedy for the problems in selecting cases and/or variables to be used in classification is to classify all cases and all variables simultaneously. A number of block clustering methods are proposed for the simultaneous clustering. Clustering is characterized as a method of increasing the goodness of fit of each model or decreasing the number of parameters of a well-fitted model through two operations, localization and merging, respectively.