Journal Article

Cluster Validity with Fuzzy Sets

01 Jan 1973 - Vol. 3, Iss. 3, pp. 58-73
TL;DR: This paper uses membership function matrices associated with fuzzy c-partitions of X, together with their values in the Euclidean (matrix) norm, to formulate an a posteriori method for evaluating algorithmically suggested clusterings of X.
Abstract: Given a finite, unlabelled set of real vectors X, one often presumes the existence of (c) subsets (clusters) in X, the members of which somehow bear more similarity to each other than to members of adjoining clusters. In this paper, we use membership function matrices associated with fuzzy c-partitions of X, together with their values in the Euclidean (matrix) norm, to formulate an a posteriori method for evaluating algorithmically suggested clusterings of X. Several numerical examples are offered in support of the proposed technique.
Citations
Journal Article
TL;DR: An efficient method for estimating cluster centers of numerical data that can be used to determine the number of clusters and their initial values for initializing iterative optimization-based clustering algorithms such as fuzzy C-means is presented.
Abstract: We present an efficient method for estimating cluster centers of numerical data. This method can be used to determine the number of clusters and their initial values for initializing iterative optimization-based clustering algorithms such as fuzzy C-means. Here we use the cluster estimation method as the basis of a fast and robust algorithm for identifying fuzzy models. A benchmark problem involving the prediction of a chaotic time series shows this model identification method compares favorably with other, more computationally intensive methods. We also illustrate an application of this method in modeling the relationship between automobile trips and demographic factors.

2,815 citations

Proceedings Article
01 Jan 1978
TL;DR: Experimental results are presented which indicate that more accurate clustering may be obtained by using fuzzy covariances, a natural approach to fuzzy clustering.
Abstract: A class of fuzzy ISODATA clustering algorithms has been developed previously which includes fuzzy means. This class of algorithms is generalized to include fuzzy covariances. The resulting algorithm closely resembles maximum likelihood estimation of mixture densities. It is argued that use of fuzzy covariances is a natural approach to fuzzy clustering. Experimental results are presented which indicate that more accurate clustering may be obtained by using fuzzy covariances.

1,988 citations

Journal Article
TL;DR: Limit analysis indicates, and numerical experiments confirm, that the Fukuyama-Sugeno index is sensitive to both high and low values of m and may be unreliable because of this, and calculations suggest that the best choice for m is probably in the interval [1.5, 2.5], whose mean and midpoint, m=2, have often been the preferred choice for many users of FCM.
Abstract: Many functionals have been proposed for validation of partitions of object data produced by the fuzzy c-means (FCM) clustering algorithm. We examine the role a subtle but important parameter, the weighting exponent m of the FCM model, plays in determining the validity of FCM partitions. The functionals considered are the partition coefficient and entropy indexes of Bezdek, the Xie-Beni (1991) and extended Xie-Beni indexes, and the Fukuyama-Sugeno index (1989). Limit analysis indicates, and numerical experiments confirm, that the Fukuyama-Sugeno index is sensitive to both high and low values of m and may be unreliable because of this. Of the indexes tested, the Xie-Beni index provided the best response over a wide range of choices for the number of clusters (2-10) and for m from 1.01 to 7. Finally, our calculations suggest that the best choice for m is probably in the interval [1.5, 2.5], whose mean and midpoint, m=2, have often been the preferred choice for many users of FCM.

1,724 citations

Journal Article
TL;DR: This paper presents a fuzzy c-means (FCM) algorithm that incorporates spatial information into the membership function for clustering and yields regions more homogeneous than those of other methods.

1,296 citations


Cites methods from "Cluster Validity with Fuzzy Sets"

  • ...Disadvantages of Vpc and Vpe are that they measure only the fuzzy partition and lack a direct connection to the featuring property....

  • ...As a result, the best clustering is achieved when the value Vpc is maximal or Vpe is minimal....

  • ...The representative functions for the fuzzy partition are partition coefficient Vpc [9] and partition entropy Vpe [10]....

  • ...They are defined as follows: $V_{pc} = \frac{\sum_{j}^{N} \sum_{i}^{c} u_{ij}^{2}}{N}$ (6) and $V_{pe} = -\frac{\sum_{j}^{N} \sum_{i}^{c} [u_{ij} \log u_{ij}]}{N}$ (7). The idea of these validity functions is that the partition with less fuzziness means better performance....

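The two validity functions quoted in (6) and (7) can be sketched in Python as follows. This is a minimal illustration rather than code from either paper: the function names, the nested-list layout (one row per cluster, one column per data point), and the convention that 0 log 0 counts as 0 are assumptions made here.

```python
import math


def partition_coefficient(u):
    """V_pc: mean over the N points of the sum over clusters of u[i][j]**2.

    `u` is assumed to be a c x N membership matrix whose columns sum to 1.
    """
    n = len(u[0])
    return sum(x * x for row in u for x in row) / n


def partition_entropy(u):
    """V_pe: mean over the N points of -sum over clusters of u[i][j]*log(u[i][j]).

    Entries equal to 0 are skipped, i.e. 0*log(0) is treated as 0 (assumption).
    """
    n = len(u[0])
    return sum(-x * math.log(x) for row in u for x in row if x > 0.0) / n


# A hard (crisp) 2-partition of 4 points and the fuzziest possible one.
crisp = [[1.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 1.0]]
fuzzy = [[0.5, 0.5, 0.5, 0.5],
         [0.5, 0.5, 0.5, 0.5]]

for name, u in (("crisp", crisp), ("fuzzy", fuzzy)):
    print(name, partition_coefficient(u), partition_entropy(u))
```

For the crisp partition V_pc reaches its maximum of 1 and V_pe its minimum of 0. For the uniform partition V_pc drops to 1/c and V_pe rises to log c, which matches the quoted rule that the best clustering has maximal V_pc or minimal V_pe.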

Journal Article
TL;DR: This paper provides a structured and comprehensive overview of various facets of network anomaly detection so that a researcher can become quickly familiar with every aspect of network anomaly detection.
Abstract: Network anomaly detection is an important and dynamic research area. Many network intrusion detection methods and systems (NIDS) have been proposed in the literature. In this paper, we provide a structured and comprehensive overview of various facets of network anomaly detection so that a researcher can become quickly familiar with every aspect of network anomaly detection. We present attacks normally encountered by network intrusion detection systems. We categorize existing network anomaly detection methods and systems based on the underlying computational techniques used. Within this framework, we briefly describe and compare a large number of network anomaly detection methods and systems. In addition, we also discuss tools that can be used by network defenders and datasets that researchers in network anomaly detection can use. We also highlight research directions in network anomaly detection.

971 citations


Cites background from "Cluster Validity with Fuzzy Sets"

  • ...Bezdek [76] Classification entropy CE = 1 N ∑k...

References
Journal Article
01 Jan 1973
TL;DR: Two fuzzy versions of the k-means optimal, least squared error partitioning problem are formulated for finite subsets X of a general inner product space; in both cases, the extremizing solutions are shown to be fixed points of a certain operator T on the class of fuzzy, k-partitions of X, and simple iteration of T provides an algorithm which has the descent property relative to the least squared error criterion function.
Abstract: Two fuzzy versions of the k-means optimal, least squared error partitioning problem are formulated for finite subsets X of a general inner product space. In both cases, the extremizing solutions are shown to be fixed points of a certain operator T on the class of fuzzy, k-partitions of X, and simple iteration of T provides an algorithm which has the descent property relative to the least squared error criterion function. In the first case, the range of T consists largely of ordinary (i.e. non-fuzzy) partitions of X and the associated iteration scheme is essentially the well known ISODATA process of Ball and Hall. However, in the second case, the range of T consists mainly of fuzzy partitions and the associated algorithm is new; when X consists of k compact well separated (CWS) clusters, Xi , this algorithm generates a limiting partition with membership functions which closely approximate the characteristic functions of the clusters Xi . However, when X is not the union of k CWS clusters, the limi...

5,787 citations
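The fixed-point iteration and its descent property can be illustrated with a small sketch. The truncated abstract does not spell out the operator T, so the code below assumes the widely used fuzzy c-means updates with weighting exponent m = 2 (weighted-mean centers, inverse-squared-distance memberships). All function names and the toy data are hypothetical.

```python
def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def fcm_step(points, u, m=2.0):
    """One alternating update: centers from memberships, then memberships from centers.

    Assumes every row of `u` has at least one nonzero entry.
    """
    c, n, dim = len(u), len(points), len(points[0])
    centers = []
    for i in range(c):
        w = [u[i][j] ** m for j in range(n)]
        total = sum(w)
        centers.append([sum(w[j] * points[j][d] for j in range(n)) / total
                        for d in range(dim)])
    new_u = [[0.0] * n for _ in range(c)]
    for j in range(n):
        d2 = [sqdist(points[j], centers[i]) for i in range(c)]
        zeros = [i for i in range(c) if d2[i] == 0.0]
        if zeros:
            # The point coincides with a center: share membership among those centers.
            for i in zeros:
                new_u[i][j] = 1.0 / len(zeros)
        else:
            inv = [d2[i] ** (-1.0 / (m - 1.0)) for i in range(c)]
            s = sum(inv)
            for i in range(c):
                new_u[i][j] = inv[i] / s
    return centers, new_u


def objective(points, centers, u, m=2.0):
    """Weighted least squared error criterion: sum of u[i][j]**m * squared distance."""
    return sum(u[i][j] ** m * sqdist(points[j], centers[i])
               for i in range(len(u)) for j in range(len(points)))


def run_fcm(points, u, steps=20, m=2.0):
    """Iterate the update and record the criterion value after each step."""
    history = []
    centers = None
    for _ in range(steps):
        centers, u = fcm_step(points, u, m)
        history.append(objective(points, centers, u, m))
    return centers, u, history


# Toy data: two compact, well separated groups and an arbitrary fuzzy start.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
row0 = [0.6, 0.4, 0.7, 0.3, 0.5, 0.2]
u0 = [row0, [1.0 - x for x in row0]]
centers, u, history = run_fcm(points, u0)
```

On this toy set the recorded criterion values never increase from one step to the next, and the limiting memberships come close to the characteristic functions of the two groups, in line with the behaviour the abstract describes for compact, well separated clusters.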

01 Jan 1973
TL;DR: In this paper, two fuzzy versions of the k-means optimal, least squared error partitioning problem are formulated for finite subsets X of a general inner product space, and the extremizing solutions are shown to be fixed points of a certain operator T on the class of fuzzy, k-partitions of X, and simple iteration of T provides an algorithm which has the descent property relative to the LSE criterion function.
Abstract: Two fuzzy versions of the k-means optimal, least squared error partitioning problem are formulated for finite subsets X of a general inner product space. In both cases, the extremizing solutions are shown to be fixed points of a certain operator T on the class of fuzzy, k-partitions of X, and simple iteration of T provides an algorithm which has the descent property relative to the least squared error criterion function. In the first case, the range of T consists largely of ordinary (i.e. non-fuzzy) partitions of X and the associated iteration scheme is essentially the well known ISODATA process of Ball and Hall. However, in the second case, the range of T consists mainly of fuzzy partitions and the associated algorithm is new; when X consists of k compact well separated (CWS) clusters, Xi , this algorithm generates a limiting partition with membership functions which closely approximate the characteristic functions of the clusters Xi . However, when X is not the union of k CWS clusters, the limi...

5,254 citations

Journal Article
TL;DR: A family of graph-theoretical algorithms based on the minimal spanning tree are capable of detecting several kinds of cluster structure in arbitrary point sets; description of the detected clusters is possible in some cases by extensions of the method.
Abstract: A family of graph-theoretical algorithms based on the minimal spanning tree are capable of detecting several kinds of cluster structure in arbitrary point sets; description of the detected clusters is possible in some cases by extensions of the method. Development of these clustering algorithms was based on examples from two-dimensional space because we wanted to copy the human perception of gestalts or point groupings. On the other hand, all the methods considered apply to higher dimensional spaces and even to general metric spaces. Advantages of these methods include determinacy, easy interpretation of the resulting clusters, conformity to gestalt principles of perceptual organization, and invariance of results under monotone transformations of interpoint distance. Brief discussion is made of the application of cluster detection to taxonomy and the selection of good feature spaces for pattern recognition. Detailed analyses of several planar cluster detection problems are illustrated by text and figures. The well-known Fisher iris data, in four-dimensional space, have been analyzed by these methods also. PL/1 programs to implement the minimal spanning tree methods have been fully debugged.

1,832 citations
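A minimal sketch of clustering with a minimal spanning tree is given below. The abstract does not say which edges the family of algorithms removes, so this example assumes the simplest variant: build the tree with Prim's algorithm and cut the k-1 longest edges. The function names and the toy data are hypothetical.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (weight, a, b) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight linking each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the point outside the tree that is cheapest to attach.
        a = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[a] = True
        if parent[a] >= 0:
            edges.append((best[a], parent[a], a))
        for b in range(n):
            if not in_tree[b]:
                w = math.dist(points[a], points[b])
                if w < best[b]:
                    best[b], parent[b] = w, a
    return edges


def mst_clusters(points, k):
    """Cut the k-1 longest tree edges and label the resulting connected components."""
    edges = sorted(minimum_spanning_tree(points))
    kept = edges[: len(edges) - (k - 1)]
    root = list(range(len(points)))  # union-find parent array

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, a, b in kept:
        root[find(a)] = find(b)
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = mst_clusters(points, 2)
print(labels)
```

Because this variant depends only on the ordering of the edge weights, replacing each distance by a monotone increasing function of it leaves the labels unchanged, which is the invariance property the abstract mentions.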

Journal Article
H. P. Friedman, J. Rubin
TL;DR: This paper attacks the problem of exploring the structure of multivariate data in search of “clusters” by using a computer procedure to obtain the “best” partition of n objects into g groups.
Abstract: This paper deals with methods of “cluster analysis”. In particular we attack the problem of exploring the structure of multivariate data in search of “clusters”. The approach taken is to use a computer procedure to obtain the “best” partition of n objects into g groups. A number of mathematical criteria for “best” are discussed and related to statistical theory. A procedure for optimizing the criteria is outlined. Some of the criteria are compared with respect to their behavior on actual data. Results of data analysis are presented and discussed.

586 citations

Journal Article
George Nagy
01 Jan 1968
TL;DR: This paper reviews statistical, adaptive, and heuristic techniques used in laboratory investigations of pattern recognition problems and includes correlation methods, discriminant analysis, maximum likelihood decisions, minimax techniques, perceptron-like algorithms, feature extraction, preprocessing, clustering and nonsupervised learning.
Abstract: This paper reviews statistical, adaptive, and heuristic techniques used in laboratory investigations of pattern recognition problems. The discussion includes correlation methods, discriminant analysis, maximum likelihood decisions, minimax techniques, perceptron-like algorithms, feature extraction, preprocessing, clustering and nonsupervised learning. Two-dimensional distributions are used to illustrate the properties of the various procedures. Several experimental projects, representative of prospective applications, are also described.

317 citations