Proceedings ArticleDOI

Cluster validity index: Comparative study and a new validity index with high performance

TL;DR: A new validity index named Vcw is proposed for the fuzzy c-means algorithm, and the performance of eight fairly recent cluster validity indexes is compared to select the one that best identifies the optimal number of clusters when the clusters overlap heavily.
Abstract: Cluster validity indexes are used to identify the best partitioning of a dataset from the results of a clustering algorithm. The overlap phenomenon is a source of failure for most of these validity indexes. In this work, we propose a new validity index named Vcw for the fuzzy c-means algorithm, and we compare the performance of eight fairly recent cluster validity indexes with our new index on artificial and real data, in order to select the one that best gives the optimal number of clusters in the presence of high overlap between the clusters.
Citations
Journal Article
TL;DR: A new validity index for determining the number of clusters is proposed, based on a novel way of combining cohesion and discrepancy, which shows clearly the efficiency of the new index under the condition of overlapping clusters.
Abstract: In this paper, we propose a new validity index for determining the number of clusters. It is based on a novel way of combining cohesion and discrepancy. Extensive tests of the index in a conventional model selection process (FCM algorithm) have been performed using generated data sets and public domain data sets, and comparisons with several important existing indices have been made. The results clearly show the efficiency of the new index when clusters overlap.

14 citations

Journal ArticleDOI
TL;DR: In this paper, uncertainty fingerprints based on Type-2 fuzzy Gaussian Mixture Models (T2FGMM) and the Fréchet distance between clusters are used to assess the certainty of a well-defined partition.
References
Journal ArticleDOI
01 Jan 1973
TL;DR: Two fuzzy versions of the k-means optimal, least squared error partitioning problem are formulated for finite subsets X of a general inner product space; in both cases, the extremizing solutions are shown to be fixed points of a certain operator T on the class of fuzzy, k-partitions of X, and simple iteration of T provides an algorithm which has the descent property relative to the least squared error criterion function.
Abstract: Two fuzzy versions of the k-means optimal, least squared error partitioning problem are formulated for finite subsets X of a general inner product space. In both cases, the extremizing solutions are shown to be fixed points of a certain operator T on the class of fuzzy, k-partitions of X, and simple iteration of T provides an algorithm which has the descent property relative to the least squared error criterion function. In the first case, the range of T consists largely of ordinary (i.e. non-fuzzy) partitions of X and the associated iteration scheme is essentially the well known ISODATA process of Ball and Hall. However, in the second case, the range of T consists mainly of fuzzy partitions and the associated algorithm is new; when X consists of k compact well separated (CWS) clusters, Xi , this algorithm generates a limiting partition with membership functions which closely approximate the characteristic functions of the clusters Xi . However, when X is not the union of k CWS clusters, the limi...

5,787 citations


"Cluster validity index: Comparative..." refers methods in this paper

  • ...Fuzzy c-means (FCM) is a fuzzy clustering method proposed by Dunn [9] in 1973 and generalized by Bezdek in 1981 [10], which is an extension that allows elements to belong to multiple clusters simultaneously....

    [...]

  • ...[9] Dunn, J. C....

    [...]

  • ...[14] J.C.Dunn....

    [...]

01 Jan 1973
TL;DR: In this paper, two fuzzy versions of the k-means optimal, least squared error partitioning problem are formulated for finite subsets X of a general inner product space, and the extremizing solutions are shown to be fixed points of a certain operator T on the class of fuzzy, k-partitions of X, and simple iteration of T provides an algorithm which has the descent property relative to the LSE criterion function.
Abstract: Two fuzzy versions of the k-means optimal, least squared error partitioning problem are formulated for finite subsets X of a general inner product space. In both cases, the extremizing solutions are shown to be fixed points of a certain operator T on the class of fuzzy, k-partitions of X, and simple iteration of T provides an algorithm which has the descent property relative to the least squared error criterion function. In the first case, the range of T consists largely of ordinary (i.e. non-fuzzy) partitions of X and the associated iteration scheme is essentially the well known ISODATA process of Ball and Hall. However, in the second case, the range of T consists mainly of fuzzy partitions and the associated algorithm is new; when X consists of k compact well separated (CWS) clusters, Xi , this algorithm generates a limiting partition with membership functions which closely approximate the characteristic functions of the clusters Xi . However, when X is not the union of k CWS clusters, the limi...

5,254 citations

Journal ArticleDOI
TL;DR: The authors present a fuzzy validity criterion based on a validity function which identifies compact and separate fuzzy c-partitions without assumptions as to the number of substructures inherent in the data.
Abstract: The authors present a fuzzy validity criterion based on a validity function which identifies compact and separate fuzzy c-partitions without assumptions as to the number of substructures inherent in the data. This function depends on the data set, geometric distance measure, distance between cluster centroids and more importantly on the fuzzy partition generated by any fuzzy algorithm used. The function is mathematically justified via its relationship to a well-defined hard clustering validity function, the separation index for which the condition of uniqueness has already been established. The performance of this validity function compares favorably to that of several others. The application of this validity function to color image segmentation in a computer color vision system for recognition of IC wafer defects which are otherwise impossible to detect using gray-scale image processing is discussed.

3,237 citations


"Cluster validity index: Comparative..." refers background or methods in this paper

  • ...This index [5] is defined as the average of the WB [21] and XB [15] indexes and is formulated as follows: WXI(c) = (WB + XB)/2. The optimal number of clusters is obtained at the minimum value of WXI....

    [...]

  • ...i=1, where $cvi_i$ is one of the following ten cluster validity indexes (CVIs): PC [11], NPC [12], PE [13], NPE [14], XB [15], VK [16], PBMF [17], FS [18], VT [19] and SC [20]....

    [...]
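The WXI excerpt above averages two index values and picks the number of clusters at the minimum. A minimal Python sketch of that selection follows, together with the Xie-Beni (XB) index in its usual compactness-over-separation form. The function names (`xie_beni`, `wxi`, `best_c`) and the array layout are assumptions made for illustration, and the WB values are taken as given because their definition does not appear on this page. Whether the paper averages raw or rescaled values is not stated here; the sketch averages whatever it is handed.

```python
import numpy as np

def xie_beni(X, V, U, m=2.0):
    """Xie-Beni index (lower is better).

    X: (n, d) data, V: (c, d) cluster centers, U: (c, n) fuzzy memberships.
    """
    # Squared distance from every center to every point, shape (c, n).
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    compactness = ((U ** m) * d2).sum()
    # Smallest squared distance between two distinct centers.
    center_d2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(center_d2, np.inf)
    return compactness / (X.shape[0] * center_d2.min())

def wxi(wb, xb):
    """WXI(c) = (WB + XB) / 2, evaluated element-wise per candidate c."""
    return (np.asarray(wb, dtype=float) + np.asarray(xb, dtype=float)) / 2.0

def best_c(candidates, wb, xb):
    """Return the candidate number of clusters with the smallest WXI."""
    return candidates[int(np.argmin(wxi(wb, xb)))]
```

In use, `wb` and `xb` would hold one value per candidate c, each computed from the FCM partition obtained for that c.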

Journal ArticleDOI
01 Jan 1973
TL;DR: This paper uses membership function matrices associated with fuzzy c-partitions of X, together with their values in the Euclidean (matrix) norm, to formulate an a posteriori method for evaluating algorithmically suggested clusterings of X.
Abstract: Given a finite, unlabelled set of real vectors X, one often presumes the existence of (c) subsets (clusters) in X, the members of which somehow bear more similarity to each other than to members of adjoining clusters. In this paper, we use membership function matrices associated with fuzzy c-partitions of X, together with their values in the Euclidean (matrix) norm, to formulate an a posteriori method for evaluating algorithmically suggested clusterings of X. Several numerical examples are offered in support of the proposed technique.

1,170 citations


"Cluster validity index: Comparative..." refers methods in this paper

  • ...The FCM algorithm [10] is composed of the following steps: 1....

    [...]

  • ...Fuzzy c-means (FCM) is a fuzzy clustering method proposed by Dunn [9] in 1973 and generalized by Bezdek in 1981 [10], which is an extension that allows elements to belong to multiple clusters simultaneously....

    [...]
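The excerpts above refer to the FCM algorithm, but its steps are elided on this page. As a rough illustration, the sketch below implements the standard alternating FCM updates (centers from memberships, then memberships from distances) and the partition coefficient, the mean squared membership. It is not the paper's own listing: the function names, the random initialization, the stopping rule and the default parameters are all assumptions.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means sketch. Returns centers V (c, d) and memberships U (c, n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each column (one point) sums to 1.
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Centers are membership-weighted means of the data.
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared distances (c, n), floored to avoid division by zero.
        d2 = np.fmax(((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2), 1e-12)
        # u_ij is proportional to d_ij^(-2/(m-1)), normalized over clusters.
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U

def partition_coefficient(U):
    """Mean squared membership; ranges from 1/c (fully fuzzy) to 1 (crisp)."""
    return float((U ** 2).sum() / U.shape[1])
```

Running `fcm` once per candidate c and scoring each resulting partition with a validity index is the model selection loop that the indexes on this page are meant to serve.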

Journal ArticleDOI
TL;DR: A cluster validity index and its fuzzification is described, which can provide a measure of goodness of clustering on different partitions of a data set, and results demonstrating the superiority of the PBM-index in appropriately determining the number of clusters are provided.

710 citations


"Cluster validity index: Comparative..." refers background in this paper

  • ...$\sum_{i=1}^{n} cvi_i$, where $cvi_i$ is one of the following ten cluster validity indexes (CVIs): PC [11], NPC [12], PE [13], NPE [14], XB [15], VK [16], PBMF [17], FS [18], VT [19] and SC [20]....

    [...]

  • ...The PC, NPC and PBMF indexes are converted into their corresponding reciprocal types, and all the CVIs are normalized in the range [0, 1]....

    [...]

  • ...The three indexes: PC, NPC and PBMF are maximum type, while the other seven indexes are minimum type....

    [...]

  • ...i=1, where $cvi_i$ is one of the following ten cluster validity indexes (CVIs): PC [11], NPC [12], PE [13], NPE [14], XB [15], VK [16], PBMF [17], FS [18], VT [19] and SC [20]....

    [...]
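The excerpts above say that the three maximum-type indexes (PC, NPC, PBMF) are converted to reciprocals and that all CVIs are normalized to [0, 1]. The sketch below shows one way this preprocessing could look. Min-max scaling is an assumption, since the normalization method is not specified here, and the helper names are hypothetical. The formula that combines the normalized values is elided in the excerpts, so it is deliberately left out.

```python
import numpy as np

# Maximum-type indexes named in the excerpt; the other seven are minimum type.
MAX_TYPE = {"PC", "NPC", "PBMF"}

def to_min_type(name, values):
    """Take reciprocals of maximum-type indexes so that smaller is always better.

    Assumes strictly positive index values.
    """
    v = np.asarray(values, dtype=float)
    return 1.0 / v if name in MAX_TYPE else v

def min_max(values):
    """Scale values to [0, 1] (assumed normalization; constant input maps to zeros)."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    return np.zeros_like(v) if span == 0 else (v - v.min()) / span

def normalized_table(raw):
    """raw maps an index name to its values over the candidate numbers of clusters."""
    return {name: min_max(to_min_type(name, vals)) for name, vals in raw.items()}
```

After this step every index points the same way, so the best candidate under any single index is the position of its minimum.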