Journal Article

A New Cluster Validity Index for Fuzzy Clustering

01 Dec 2013, IFAC Proceedings Volumes (Elsevier), Vol. 46, Iss. 32, pp. 325-330
TL;DR: A new “Graded Distance index” (GD_index) is proposed for computing the optimal number of fuzzy clusters for a given data set; its efficiency is compared with well-known existing indices and tested on several data sets.
Abstract: The performance of any clustering algorithm depends critically on the number of clusters with which it is initialized. A practitioner might not know a priori the number of partitions into which the data should be divided; to address this issue, many cluster validity indices have been proposed for finding the optimal number of partitions. In this paper, we propose a new “Graded Distance index” (GD_index) for computing the optimal number of fuzzy clusters for a given data set. The efficiency of this index is compared with well-known existing indices and tested on several data sets. It is observed that the GD_index correctly computes the optimal number of partitions for most of the data sets tested.
Topics: Fuzzy clustering (61%), Cluster analysis (58%)
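The abstract describes the general workflow that any cluster validity index plugs into: cluster the data for each candidate number of clusters c, score each result with the index, and keep the best-scoring c. GD_index itself is not defined in this summary, so the minimal sketch below is not the paper's method: it assumes a plain NumPy fuzzy c-means and uses Bezdek's partition coefficient (PC, maximized) as a stand-in index. The names `fcm`, `partition_coefficient`, and `select_c` are hypothetical.

```python
import numpy as np

def fcm(X, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Plain fuzzy c-means. Returns U (c x n memberships) and V (c x d centers
    computed in the last iteration)."""
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)  # memberships of each point sum to 1
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)  # membership-weighted centers
        # d2[i, j] = squared distance from point j to center i (floored to avoid 1/0)
        d2 = np.fmax(((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2), 1e-12)
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0, keepdims=True)  # standard FCM membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, V

def partition_coefficient(U):
    """Bezdek's PC: (1/n) * sum_ij u_ij^2, which lies in [1/c, 1]."""
    return float((U ** 2).sum() / U.shape[1])

def select_c(X, c_values, m=2.0):
    """Run FCM for each candidate c and keep the c with the largest PC."""
    scores = {c: partition_coefficient(fcm(X, c, m)[0]) for c in c_values}
    best = max(scores, key=scores.get)
    return best, scores

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # three tight, well-separated synthetic blobs (illustrative data only)
    X = np.vstack([rng.normal(loc, 0.1, size=(30, 2))
                   for loc in [(0, 0), (10, 0), (5, 8.66)]])
    best, scores = select_c(X, range(2, 6))
    print(best, scores)
```

PC is known to drift downward as c grows, so other indices are often preferred in practice; trying a different index only changes the scoring function and whether `max` or `min` picks the winner.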
Citations

Journal Article
TL;DR: This review examines the various clustering algorithms applicable to gene expression data in order to identify the clustering techniques best suited to providing stable, highly accurate analysis.
Abstract: Gene expression data hide vital information needed to understand the biological processes that take place in an organism in relation to its environment. Deciphering the hidden patterns in gene expression data greatly strengthens the understanding of functional genomics. The complexity of biological networks and the number of genes involved make the resulting mass of data, which consists of millions of measurements, difficult to comprehend and interpret; these data also exhibit vagueness, imprecision, and noise. Clustering is therefore a first step toward addressing these challenges and an essential part of the data mining process, revealing natural structures and identifying interesting patterns in the underlying data. Clustering of gene expression data has proven useful for revealing the natural structure inherent in the data, understanding gene functions, cellular processes, and cell subtypes, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to identify the clustering techniques that offer stability and a high degree of accuracy in the analysis procedure.

82 citations


Journal Article
13 Aug 2019, Genes
TL;DR: A multi-objective optimization-based fuzzy clustering approach is proposed for detecting cell clusters from scRNA-seq data; differentially expressed genes (DEGs) are then obtained with Limma by comparing the expression of the samples in each resulting cluster with that in the remaining clusters.
Abstract: Rapid advances in single-cell RNA sequencing (scRNA-seq) allow the expression of genes to be measured at single-cell resolution in complex diseases or tissues. While many methods have been developed to detect cell clusters from scRNA-seq data, this task remains a major challenge. We proposed a multi-objective optimization-based fuzzy clustering approach for detecting cell clusters from scRNA-seq data. First, we performed initial filtering and SCnorm normalization. We considered various case studies by selecting different numbers of clusters (cl = 2 up to a user-defined number) and applied the fuzzy c-means clustering algorithm to each case individually. For each case, we evaluated the scores of four cluster validity indices: Partition Entropy (PE), Partition Coefficient (PC), Modified Partition Coefficient (MPC), and Fuzzy Silhouette Index (FSI). Next, we set the first measure as a minimization objective (↓) and the remaining three as maximization objectives (↑), and then applied a multi-objective decision-making technique, TOPSIS, to identify the best solution. The case study with the highest TOPSIS score was selected as the final optimal clustering. Finally, we obtained differentially expressed genes (DEGs) using Limma by comparing the expression of the samples in each resulting cluster with that in the remaining clusters. We applied our approach to a scRNA-seq dataset for the rare intestinal cell type in mice [GEO ID: GSE62270, 23,630 features (genes) and 288 cells]. The optimal clustering result (TOPSIS score = 0.858) comprised two clusters, one with 115 cells and the other with 91 cells. The scores of the four cluster validity indices FSI, PE, PC, and MPC for the optimized fuzzy clustering were 0.482, 0.578, 0.607, and 0.215, respectively. The Limma analysis identified 1240 DEGs (cluster 1 vs. cluster 2). The top ten gene markers were Rps21, Slc5a1, Crip1, Rpl15, Rpl3, Rpl27a, Khk, Rps3a1, Aldob, and Rps17. In this list, Khk (encoding ketohexokinase) is a novel marker for the rare intestinal cell type. In summary, this method is useful for detecting cell clusters from scRNA-seq data.

15 citations
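The selection step in this abstract (score each candidate clustering, then rank the candidates with TOPSIS) can be sketched as follows. This is a minimal sketch, not the authors' code: it computes PE, PC, and MPC from a c x n membership matrix, omits FSI (which also needs the data and pairwise distances) for brevity, and assumes equal criterion weights in TOPSIS because the weights are not given here. All function names are hypothetical.

```python
import numpy as np

def partition_entropy(U):
    """PE = -(1/n) * sum_ij u_ij * ln(u_ij); 0 for a crisp partition (minimize)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(U > 0, U * np.log(U), 0.0)  # treat 0 * ln 0 as 0
    return float(-terms.sum() / U.shape[1])

def partition_coefficient(U):
    """PC = (1/n) * sum_ij u_ij^2, in [1/c, 1] (maximize)."""
    return float((U ** 2).sum() / U.shape[1])

def modified_partition_coefficient(U):
    """MPC = 1 - c/(c-1) * (1 - PC), rescaling PC to [0, 1] (maximize)."""
    c = U.shape[0]
    return 1.0 - c / (c - 1.0) * (1.0 - partition_coefficient(U))

def topsis(matrix, benefit):
    """Closeness of each row (alternative) to the ideal solution; higher is better.
    benefit[k] is True if criterion k is maximized. Equal weights are assumed."""
    M = np.asarray(matrix, dtype=float)
    R = M / np.linalg.norm(M, axis=0)                  # vector-normalize each criterion column
    ideal = np.where(benefit, R.max(axis=0), R.min(axis=0))
    anti = np.where(benefit, R.min(axis=0), R.max(axis=0))
    d_plus = np.linalg.norm(R - ideal, axis=1)         # distance to the ideal solution
    d_minus = np.linalg.norm(R - anti, axis=1)         # distance to the anti-ideal solution
    return d_minus / (d_plus + d_minus)
```

Each candidate clustering contributes one row `[PE, PC, MPC]`, and `benefit=[False, True, True]` matches the ↓/↑ directions in the abstract; the row with the largest closeness is kept. As a consistency check on the reported values, with c = 2 and PC = 0.607 the MPC formula gives 1 - 2(1 - 0.607) = 0.214, matching the reported 0.215 up to rounding.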


Journal Article
TL;DR: A Neuro-Fuzzy C-Means Clustering algorithm (NFCM) is presented that adopts a novel Artificial Neural Network (ANN) based clustering approach to overcome the rapid growth in the number of decision variables that prevents evolutionary optimization solvers from being applied to FCM.
Abstract: One of the standard approaches to data analysis in unsupervised machine learning is cluster analysis, or clustering, in which data with similar features are grouped into a certain number of clusters. Among the significant ways of performing clustering, Fuzzy C-means (FCM) is a methodology in which every data point is hypothesized to be associated with all the clusters through a fuzzy membership function value. FCM is performed by minimizing an objective functional, optimally estimating the decision variables (the membership function values and the cluster representatives) subject to constraints. With this approach, even a marginal increase in the number of data points leads to a large increase in the number of decision variables. This explosion, in turn, prevents evolutionary optimization solvers from being applied to FCM, which leads to inefficient data clustering. In this paper, a Neuro-Fuzzy C-Means Clustering algorithm (NFCM) is presented to resolve these issues by adopting a novel Artificial Neural Network (ANN) based clustering approach. In NFCM, a functional map is constructed between the data points and the membership function values, which significantly reduces the number of decision variables. Additionally, NFCM implements an intelligent framework to optimally design the ANN structure, through which the optimal number of clusters is identified. Results on 9 different data sets with dimensions ranging from 2 to 30 are presented, along with a comprehensive comparison with current state-of-the-art clustering methods, to demonstrate the efficacy of the proposed algorithm.

13 citations
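To make the scaling argument concrete: in standard FCM the decision variables are the c × n membership values plus the coordinates of the c cluster representatives in d dimensions, so their count grows linearly with n, by c for every added data point. The numbers below (n = 10,000 and 20,000, c = 3, d = 2) are illustrative only and are not taken from the paper.

```latex
N_{\text{vars}} = \underbrace{c\,n}_{\text{memberships } u_{ij}} + \underbrace{c\,d}_{\text{centers } v_i}
\\
c = 3,\; d = 2,\; n = 10{,}000:\quad N_{\text{vars}} = 30{,}000 + 6 = 30{,}006
\\
c = 3,\; d = 2,\; n = 20{,}000:\quad N_{\text{vars}} = 60{,}000 + 6 = 60{,}006
```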


Journal Article
08 Feb 2021, Energies
Abstract: In the context of climate change and the scarcity of natural resources, the future of energy is increasingly associated with the development of so-called green energy. Its development is reflected in the European Commission's strategic vision of a transition to a climate-neutral economy. This is a challenge that the Central and Eastern European (CEE) countries that are members of the EU are also trying to meet. In recent years, these countries have seen an increase in the share of renewable energy and a reduction in greenhouse gas emissions (GGE). On the other hand, basing the energy sector on unstable energy sources (photovoltaics and wind technologies) may create new challenges on the path to sustainable development. These include old problems in a new form (ecology, diversification of supplies) and new ones related to the characteristics of renewable energy sources (RES), such as instability and dispersion. The aim of the article was to classify the CEE countries, using taxonomic methods, with respect to green energy transformation (an original indicator) and to predict new threats to Romania, Poland, and Bulgaria, countries representing different groups in the applied classification. The issues presented are part of a holistic view of RES and can be useful in energy policy.

9 citations


Cites background from "A New Cluster Validity Index for Fu..."

  • ...They are applicable (also in social sciences) in assessing the similarities and differences between the studied objects (countries) [12,13]....



Journal Article
01 Jan 2019
TL;DR: The study showed that regional cluster classification results depend strongly on the input development indicators and on the clustering technique used, and that considering different fuzzy partitions opens up new opportunities for developing recommendations on how to differentiate economic policies to achieve maximum growth for the regions and the country as a whole.
Abstract: Disparities in regional development within a country affect the entire national economy. Detecting these disparities can help formulate appropriate economic policies for each region by acting against the factors that slow down economic growth. This study applied clustering methods to analyse regional disparities based on the economic development indicators of the regions of Ukraine. Fuzzy clustering methods were considered, which generalize partitional clustering methods by allowing objects to be partially assigned to more than one cluster. Fuzzy clustering was applied, using R packages, to data sets of statistical indicators on economic activity in all administrative regions of Ukraine in 2017. Sets of development indicators for different sectors of economic activity, such as industry, agriculture, construction, and services, were reviewed and analysed. The study showed that the regional cluster classification results depend strongly on the input development indicators and on the clustering technique used. Considering different partitions into fuzzy clusters opens up new opportunities for developing recommendations on how to differentiate economic policies in order to achieve maximum growth for the regions and the entire country.

3 citations


References

Journal Article
TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
Abstract: Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a combinatorially difficult problem, and differences in assumptions and contexts across communities have made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.

13,346 citations


Journal Article
01 Jan 1973
TL;DR: Two fuzzy versions of the k-means optimal, least squared error partitioning problem are formulated for finite subsets X of a general inner product space; in both cases, the extremizing solutions are shown to be fixed points of a certain operator T on the class of fuzzy k-partitions of X, and simple iteration of T provides an algorithm which has the descent property relative to the least squared error criterion function.
Abstract: Two fuzzy versions of the k-means optimal, least squared error partitioning problem are formulated for finite subsets X of a general inner product space. In both cases, the extremizing solutions are shown to be fixed points of a certain operator T on the class of fuzzy k-partitions of X, and simple iteration of T provides an algorithm which has the descent property relative to the least squared error criterion function. In the first case, the range of T consists largely of ordinary (i.e. non-fuzzy) partitions of X and the associated iteration scheme is essentially the well-known ISODATA process of Ball and Hall. However, in the second case, the range of T consists mainly of fuzzy partitions and the associated algorithm is new; when X consists of k compact well separated (CWS) clusters, Xi, this algorithm generates a limiting partition with membership functions which closely approximate the characteristic functions of the clusters Xi. However, when X is not the union of k CWS clusters, the limi...

5,261 citations
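For reference, the fuzzy least squared error objective in this line of work is commonly written as follows for k clusters with a fuzzifier m > 1. Dunn's fuzzy (second) case corresponds to m = 2; the general-m form is the later generalization usually associated with Bezdek. Alternating the two update equations, which are the necessary conditions for a minimum, corresponds to the fixed-point iteration of T described in the abstract.

```latex
J_m(U, V) = \sum_{i=1}^{k} \sum_{j=1}^{n} u_{ij}^{m} \,\lVert x_j - v_i \rVert^2,
\qquad u_{ij} \in [0, 1], \quad \sum_{i=1}^{k} u_{ij} = 1 \ \text{for each } j
\\
v_i = \frac{\sum_{j=1}^{n} u_{ij}^{m}\, x_j}{\sum_{j=1}^{n} u_{ij}^{m}},
\qquad
u_{ij} = \left[ \sum_{l=1}^{k} \left( \frac{\lVert x_j - v_i \rVert}{\lVert x_j - v_l \rVert} \right)^{2/(m-1)} \right]^{-1}
```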


Journal Article
TL;DR: The authors present a fuzzy validity criterion based on a validity function which identifies compact and separate fuzzy c-partitions without assumptions as to the number of substructures inherent in the data.
Abstract: The authors present a fuzzy validity criterion based on a validity function which identifies compact and separate fuzzy c-partitions without assumptions as to the number of substructures inherent in the data. This function depends on the data set, geometric distance measure, distance between cluster centroids and more importantly on the fuzzy partition generated by any fuzzy algorithm used. The function is mathematically justified via its relationship to a well-defined hard clustering validity function, the separation index for which the condition of uniqueness has already been established. The performance of this validity function compares favorably to that of several others. The application of this validity function to color image segmentation in a computer color vision system for recognition of IC wafer defects which are otherwise impossible to detect using gray-scale image processing is discussed.

3,018 citations
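This entry appears to be Xie and Beni's validity function, widely known as the Xie-Beni index: the total fuzzy within-cluster scatter divided by n times the minimum squared distance between cluster centroids, so lower values indicate more compact, better-separated partitions. The minimal NumPy sketch below assumes squared Euclidean distance and memberships raised to a fuzzifier m (the original formulation uses squared memberships, i.e. m = 2); the function name `xie_beni` is hypothetical.

```python
import numpy as np

def xie_beni(X, U, V, m=2.0):
    """Xie-Beni index (lower is better).
    X: n x d data, U: c x n memberships, V: c x d cluster centers."""
    n = X.shape[0]
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)    # c x n point-center squared distances
    compactness = float(((U ** m) * d2).sum())                 # fuzzy within-cluster scatter
    cd2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # c x c center-center squared distances
    np.fill_diagonal(cd2, np.inf)                             # exclude each center's distance to itself
    separation = n * float(cd2.min())
    return compactness / separation
```

For example, with points (0, 0), (0, 1), (10, 0), (10, 1), centers (0, 0.5) and (10, 0.5), and a crisp assignment, the scatter is 4 × 0.25 = 1 and the separation is 4 × 100 = 400, so the index is 0.0025; making the memberships fuzzier, or moving the centers closer together, increases it.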


Journal Article
01 Jan 1973
TL;DR: This paper uses membership function matrices associated with fuzzy c-partitions of X, together with their values in the Euclidean (matrix) norm, to formulate an a posteriori method for evaluating algorithmically suggested clusterings of X.
Abstract: Given a finite, unlabelled set of real vectors X, one often presumes the existence of (c) subsets (clusters) in X, the members of which somehow bear more similarity to each other than to members of adjoining clusters. In this paper, we use membership function matrices associated with fuzzy c-partitions of X, together with their values in the Euclidean (matrix) norm, to formulate an a posteriori method for evaluating algorithmically suggested clusterings of X. Several numerical examples are offered in support of the proposed technique.

1,089 citations
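In the normalized form commonly used today, the partition coefficient is the squared Euclidean (Frobenius) matrix norm of the c × n membership matrix divided by n. It equals 1 for a crisp partition and 1/c when every membership equals 1/c, so values closer to 1 indicate a crisper partition. A small worked example with c = 2 and n = 2 (illustrative values):

```latex
PC(U) = \frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n} u_{ij}^{2} = \frac{\lVert U \rVert_F^{2}}{n},
\qquad \frac{1}{c} \le PC(U) \le 1
\\
U = \begin{pmatrix} 0.9 & 0.2 \\ 0.1 & 0.8 \end{pmatrix}
\;\Rightarrow\;
PC(U) = \frac{0.81 + 0.04 + 0.01 + 0.64}{2} = \frac{1.50}{2} = 0.75
```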


Performance Metrics
Number of citations received by the paper in previous years:
Year    Citations
2021    4
2020    3
2019    4
2018    1
2016    1
2013    1