Journal Article

Robust Rough-Fuzzy C-Means Algorithm: Design and Applications in Coding and Non-coding RNA Expression Data Clustering

01 Jan 2013, Fundamenta Informaticae (IOS Press), Vol. 124, Iss. 1, pp. 153-174
TL;DR: The proposed algorithm is robust in the sense that it can find overlapping and vaguely defined clusters of arbitrary shape in a noisy environment; its effectiveness is demonstrated on synthetic as well as coding and non-coding RNA expression data sets using several cluster validity indices.
Abstract: Cluster analysis is a technique that divides a given data set into a set of clusters in such a way that two objects from the same cluster are as similar as possible and objects from different clusters are as dissimilar as possible. In this context, different rough-fuzzy clustering algorithms have been shown to be successful in finding overlapping and vaguely defined clusters. However, the crisp lower approximation of a cluster in existing rough-fuzzy clustering algorithms is usually assumed to be spherical in shape, which prevents them from finding clusters of arbitrary shape. In this regard, this paper presents a new rough-fuzzy clustering algorithm, termed robust rough-fuzzy c-means (rRFCM). Each cluster in the proposed algorithm is represented by a set of three parameters, namely, a cluster prototype, a possibilistic fuzzy lower approximation, and a probabilistic fuzzy boundary. The possibilistic lower approximation helps in discovering clusters of various shapes. The cluster prototype depends on the weighted average of the possibilistic lower approximation and the probabilistic boundary. The proposed algorithm is robust in the sense that it can find overlapping and vaguely defined clusters with arbitrary shapes in a noisy environment. An efficient method based on Pearson's correlation coefficient is presented to select the initial prototypes of different clusters. A method based on a cluster validity index is also introduced to identify the optimum values of the different parameters of the initialization method and the proposed clustering algorithm. The effectiveness of the proposed algorithm, along with a comparison with other clustering algorithms, is demonstrated on synthetic as well as coding and non-coding RNA expression data sets using several cluster validity indices.
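The abstract describes each cluster by a prototype, a possibilistic lower approximation and a probabilistic boundary, with the prototype taken as a weighted average of the two regions. The Python sketch below illustrates one prototype-update step under stated assumptions; it is not the paper's exact update rule. It assumes FCM-style probabilistic memberships for the boundary, PCM-style possibilistic typicalities for the lower approximation, and, as an assumption borrowed from the rule commonly described for rough-fuzzy c-means (RFCM), that an object joins the lower approximation of its best cluster when its two largest memberships differ by more than a threshold, and otherwise joins the boundaries of both top clusters. The names `update_prototypes`, `eta`, `w`, `delta`, `m1` and `m2` are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fcm_memberships(data, protos, m1=2.0, eps=1e-12):
    """Probabilistic (FCM-style) memberships u[i][j]; each column sums to 1."""
    d = [[max(sq_dist(x, v), eps) for x in data] for v in protos]
    c, n = len(protos), len(data)
    u = [[1.0 / sum((d[i][j] / d[k][j]) ** (1.0 / (m1 - 1.0)) for k in range(c))
          for j in range(n)] for i in range(c)]
    return u, d

def pcm_typicalities(d, eta, m2=2.0):
    """Possibilistic (PCM-style) typicalities; eta[i] is an assumed per-cluster scale."""
    return [[1.0 / (1.0 + (dij / eta[i]) ** (1.0 / (m2 - 1.0))) for dij in row]
            for i, row in enumerate(d)]

def rough_partition(u, delta):
    """Assumed rule: lower approximation if the top-two membership gap exceeds delta,
    otherwise the object joins the boundaries of both top clusters (needs c >= 2)."""
    c, n = len(u), len(u[0])
    lower = [set() for _ in range(c)]
    boundary = [set() for _ in range(c)]
    for j in range(n):
        i, k = sorted(range(c), key=lambda r: u[r][j], reverse=True)[:2]
        if u[i][j] - u[k][j] > delta:
            lower[i].add(j)
        else:
            boundary[i].add(j)
            boundary[k].add(j)
    return lower, boundary

def weighted_mean(data, idx, weights):
    """Mean of data[j] for j in idx, each weighted by weights[j]."""
    total = sum(weights[j] for j in idx)
    return [sum(weights[j] * data[j][t] for j in idx) / total
            for t in range(len(data[0]))]

def update_prototypes(data, protos, eta, w=0.9, delta=0.1, m1=2.0, m2=2.0):
    """One update: w * (possibilistic lower part) + (1 - w) * (fuzzy boundary part)."""
    u, d = fcm_memberships(data, protos, m1)
    nu = pcm_typicalities(d, eta, m2)
    lower, boundary = rough_partition(u, delta)
    new = []
    for i, old in enumerate(protos):
        a = weighted_mean(data, lower[i], [t ** m2 for t in nu[i]]) if lower[i] else None
        b = weighted_mean(data, boundary[i], [x ** m1 for x in u[i]]) if boundary[i] else None
        if a is not None and b is not None:
            new.append([w * ai + (1.0 - w) * bi for ai, bi in zip(a, b)])
        else:
            new.append(a or b or list(old))  # one region empty: use the other, else keep old
    return new

# The middle point is equidistant from both prototypes, so it lands in both boundaries.
print(update_prototypes([[0.0], [5.0], [10.0]], [[0.0], [10.0]], eta=[1.0, 1.0]))
```

In this toy run each prototype is pulled slightly toward the shared boundary point, giving approximately [[0.5], [9.5]]. A full clustering loop would repeat the update until the prototypes stop moving. The weight `w` controls how much the lower approximation dominates the boundary.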
Citations
Journal Article
TL;DR: A modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets is proposed to perform feature clustering; it significantly improves clustering performance.
Abstract: To organize a wide variety of data sets automatically and obtain accurate classification, this paper presents a modified fuzzy c-means algorithm (SP-FCM) based on particle swarm optimization (PSO) and shadowed sets to perform feature clustering. SP-FCM uses the global search property of PSO to address the premature convergence of conventional fuzzy clustering, uses the vagueness balance property of shadowed sets to handle overlap among clusters, and models uncertainty in class boundaries. The method uses the Xie-Beni index as the cluster validity measure and automatically finds the optimal number of clusters within a specified range, with cluster partitions that are compact and well separated. Experiments show that the proposed approach significantly improves clustering performance.

29 citations
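The SP-FCM abstract selects the number of clusters with the Xie-Beni index. As a sketch (not SP-FCM's own code), the index can be computed from a fuzzy membership matrix `u[i][j]`, the prototypes and the data as the fuzzy within-cluster compactness divided by n times the minimum squared distance between prototypes; lower values indicate more compact, better-separated partitions. The function name `xie_beni` and the default fuzzifier `m = 2` are assumptions.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def xie_beni(data, protos, u, m=2.0):
    """sum_i sum_j u_ij^m ||x_j - v_i||^2 / (n * min_{i != k} ||v_i - v_k||^2)."""
    n, c = len(data), len(protos)
    compactness = sum(u[i][j] ** m * sq_dist(data[j], protos[i])
                      for i in range(c) for j in range(n))
    separation = min(sq_dist(protos[i], protos[k])
                     for i in range(c) for k in range(c) if i != k)
    return compactness / (n * separation)

# Crisp memberships keep the arithmetic easy to follow: compactness 4, separation 100.
data = [[0.0], [2.0], [10.0], [12.0]]
protos = [[1.0], [11.0]]
u = [[1, 1, 0, 0], [0, 0, 1, 1]]
print(xie_beni(data, protos, u))  # 0.01
```

To choose the number of clusters in the way the abstract describes, one would cluster once per candidate c in the allowed range and keep the c with the smallest index.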


Cites methods from "Robust Rough-Fuzzy C-Means Algorith..."

  • ...Moreover, shadowed set- and rough set-based clustering methods, namely, SPFCM, SRCM, RCM, and SCM, perform better than FCM....

  • ...Fuzzy sets and rough sets have been incorporated in the c-means framework to develop the fuzzy c-means (FCM) [7] and rough c-means (RCM) [8] algorithms....

  • ...In this section, the performance of FCM, RCM, shadowed 𝑐-means (SCM) [21], shadowed rough 𝑐-means (SRCM) [19], and SP-FCM algorithms is presented on four UCI datasets, four yeast gene expression datasets, and real data....

  • ...The SP-FCM and SRCM obtain the same effect and perform better than other clustering algorithms....

Journal Article
TL;DR: The formulation enables the proposed method to extract the required number of correlated features sequentially at a lower computational cost than existing methods, and provides an efficient way to find the optimum regularization parameters employed in CCA.
Abstract: One of the main problems associated with high-dimensional multimodal real-life data sets is how to extract relevant and significant features. In this regard, a fast and robust feature extraction algorithm, termed FaRoC, is proposed, judiciously integrating the merits of canonical correlation analysis (CCA) and rough sets. The proposed method extracts new features sequentially from two multidimensional data sets by maximizing their relevance with respect to the class label and their significance with respect to already-extracted features. To generate canonical variables sequentially, an analytical formulation is introduced to establish the relation between the regularization parameters and CCA. This formulation enables the proposed method to extract the required number of correlated features sequentially at a lower computational cost than existing methods. To compute both the significance and relevance measures of a feature, the concept of the hypercuboid equivalence partition matrix of the rough hypercuboid approach is used. It also provides an efficient way to find the optimum regularization parameters employed in CCA. The efficacy of the proposed FaRoC algorithm, along with a comparison with other existing methods, is extensively established on several real-life data sets.

23 citations

Journal Article
01 Sep 2016
TL;DR: A new clustering algorithm, termed rough-probabilistic clustering, is presented, judiciously integrating the merits of rough sets and a new probability distribution, called the stomped normal (SN) distribution, for accurate and robust segmentation of images.
Abstract: The segmentation of images into different meaningful classes is an important task in automatic image analysis. The finite Gaussian mixture model is one of the popular models for parametric model-based image segmentation. However, the normality assumption of this model induces certain limitations, as a single representative value is considered to represent each class. In this regard, the paper presents a new clustering algorithm, termed rough-probabilistic clustering, judiciously integrating the merits of rough sets and a new probability distribution, called the stomped normal (SN) distribution. The intensity distribution of a class is represented by an SN distribution, where each class consists of a crisp lower approximation and a probabilistic boundary region. The intensity distribution of any image is modeled as a mixture of a finite number of SN distributions. The expectation-maximization algorithm is used to estimate the parameters of each class. By incorporating a hidden Markov random field framework into rough-probabilistic clustering, a new method is proposed for accurate and robust segmentation of images. The performance of the proposed segmentation approach, along with a comparison with related methods, is demonstrated on a set of HEp-2 cell images, and on synthetic and real brain MR images with different bias fields and noise levels.

22 citations

Journal Article
TL;DR: The results on several microarray data sets demonstrate that the proposed method can bring a remarkable improvement on the miRNA selection problem and is a potentially useful tool for exploring miRNA expression data and identifying differentially expressed miRNAs worthy of further investigation.
Abstract: The miRNAs, a class of short, approximately 22‐nucleotide non‐coding RNAs, often act post‐transcriptionally to inhibit mRNA expression. In effect, they control gene expression by targeting mRNA. They also help in the normal functioning of a cell, as they play an important role in various cellular processes. However, dysregulation of miRNAs has been found to be a major cause of disease. It has been demonstrated that miRNA expression is altered in many human cancers, suggesting that miRNAs may play an important role as disease biomarkers. Multiple reports have also noted the utility of miRNAs for the diagnosis of cancer. Among the large number of miRNAs present in microarray data, a modest number might be sufficient to classify human cancers. Hence, the identification of differentially expressed miRNAs is an important problem, particularly for data sets with a large number of miRNAs and a small number of samples. In this regard, a new miRNA selection algorithm, called μHEM, is presented based on the rough hypercuboid approach. It selects a set of miRNAs from microarray data by maximizing both the relevance and the significance of the selected miRNAs. The degree of dependency of sample categories on miRNAs is defined, based on the concept of the hypercuboid equivalence partition matrix, to measure both the relevance and significance of miRNAs. The effectiveness of the new approach is demonstrated on six publicly available miRNA expression data sets using a support vector machine. The .632+ bootstrap error estimate is used to minimize the variability and bias of the derived results. An important finding is that the μHEM algorithm achieves the lowest B.632+ error rate of the support vector machine with a reduced set of differentially expressed miRNAs on four expression data sets compared with some existing machine learning and statistical methods, while for the other two data sets, the error rate of the μHEM algorithm is comparable with that of the existing techniques.
The results on several microarray data sets demonstrate that the proposed method can bring a remarkable improvement on the miRNA selection problem. The method is a potentially useful tool for exploring miRNA expression data and identifying differentially expressed miRNAs worthy of further investigation.

14 citations
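The μHEM abstract reports results with the .632+ bootstrap error estimate. The sketch below shows one common way of writing the .632+ combination (Efron and Tibshirani, 1997); it is not taken from the μHEM paper, and the function names are hypothetical. It takes the resubstitution (training) error, the leave-one-out bootstrap error and the no-information error rate gamma, and shifts weight from 0.632 toward 1 on the bootstrap error as the relative overfitting rate R grows.

```python
def no_information_rate(y_true, y_pred):
    """gamma = sum_k p_k * (1 - q_k): expected error if labels and predictions were independent.
    p_k: fraction of true labels equal to k; q_k: fraction of predictions equal to k."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    return sum((y_true.count(k) / n) * (1.0 - y_pred.count(k) / n) for k in classes)

def err_632_plus(err_train, err_boot1, gamma):
    """.632+ estimate from the resubstitution error, the leave-one-out bootstrap error
    and the no-information rate gamma."""
    err1 = min(err_boot1, gamma)  # clip the bootstrap error at the no-information rate
    if err1 > err_train and gamma > err_train:
        r = (err1 - err_train) / (gamma - err_train)  # relative overfitting rate in [0, 1]
    else:
        r = 0.0
    w = 0.632 / (1.0 - 0.368 * r)  # r = 0 gives w = 0.632, the plain .632 estimator
    return (1.0 - w) * err_train + w * err1

gamma = no_information_rate([0, 0, 1, 1], [0, 1, 1, 1])  # 0.5
print(round(err_632_plus(0.1, 0.3, gamma), 4))          # 0.2549
```

With R = 0.5 in this example the weight rises to about 0.775, so the estimate sits closer to the bootstrap error than the plain .632 estimate (0.2264) would.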


Cites methods from "Robust Rough-Fuzzy C-Means Algorith..."

  • ...The theory of rough sets has also been successfully applied to microarray data analysis in [9,24-35]....

Journal Article
TL;DR: A novel method for simultaneous segmentation and bias field correction in brain MR images is presented, which integrates the concept of rough sets and the merit of a recently introduced probability distribution, called stomped-t (St-t) distribution.

10 citations

References
01 Jan 1967
TL;DR: The k-means procedure described in this paper partitions an N-dimensional population into k sets on the basis of a sample; it generalizes the ordinary sample mean and appears to give partitions that are reasonably efficient in the sense of within-class variance.
Abstract: The main purpose of this paper is to describe a process for partitioning an N-dimensional population into k sets on the basis of a sample. The process, which is called 'k-means,' appears to give partitions which are reasonably efficient in the sense of within-class variance. That is, if $p$ is the probability mass function for the population, $S = \{S_1, S_2, \ldots, S_k\}$ is a partition of $E^N$, and $u_i$, $i = 1, 2, \ldots, k$, is the conditional mean of $p$ over the set $S_i$, then $W^2(S) = \sum_{i=1}^{k} \int_{S_i} |z - u_i|^2 \, dp(z)$ tends to be low for the partitions $S$ generated by the method. We say 'tends to be low,' primarily because of intuitive considerations, corroborated to some extent by mathematical analysis and practical computational experience. Also, the k-means procedure is easily programmed and is computationally economical, so that it is feasible to process very large samples on a digital computer. Possible applications include methods for similarity grouping, nonlinear prediction, approximating multivariate distributions, and nonparametric tests for independence among several variables. In addition to suggesting practical classification methods, the study of k-means has proved to be theoretically interesting. The k-means concept represents a generalization of the ordinary sample mean, and one is naturally led to study the pertinent asymptotic behavior, the object being to establish some sort of law of large numbers for the k-means. This problem is sufficiently interesting, in fact, for us to devote a good portion of this paper to it. The k-means are defined in section 2.1, and the main results which have been obtained on the asymptotic behavior are given there. The rest of section 2 is devoted to the proofs of these results. Section 3 describes several specific possible applications, and reports some preliminary results from computer experiments conducted to explore the possibilities inherent in the k-means idea. The extension to general metric spaces is indicated briefly in section 4. 
The original point of departure for the work described here was a series of problems in optimal classification (MacQueen [9]) which represented special

24,320 citations
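The abstract describes k-means as a process that moves k means through a sample. A minimal Python reading of the sequential idea follows, under assumptions not fixed by the text above: the k means are seeded with the first k points, Euclidean distance is used, and the empirical within-class sum of squares (each point contributing its squared distance to the nearest final mean) stands in as a sample analogue of W^2(S). The function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sequential_kmeans(points, k):
    """Seed k means with the first k points, then assign each further point to its
    nearest mean and move that mean to the running average of its members."""
    means = [list(p) for p in points[:k]]
    counts = [1] * k
    for p in points[k:]:
        i = min(range(k), key=lambda c: sq_dist(p, means[c]))
        counts[i] += 1
        means[i] = [u + (x - u) / counts[i] for u, x in zip(means[i], p)]  # running mean
    return means

def within_class_ss(points, means):
    """Sample analogue of W^2(S) for the minimum-distance partition induced by the means."""
    return sum(min(sq_dist(p, u) for u in means) for p in points)

pts = [(0, 0), (10, 10), (0, 1), (10, 11)]
means = sequential_kmeans(pts, 2)
print(means, within_class_ss(pts, means))  # [[0.0, 0.5], [10.0, 10.5]] 1.0
```

Because each mean is updated incrementally, a single pass over the sample is enough, which matches the abstract's point that the procedure is computationally economical; the result does depend on the order of the sample.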


"Robust Rough-Fuzzy C-Means Algorith..." refers to methods in this paper

  • ...However, rRFCM requires higher time compared to FCM/HCM....

  • ...This section presents the comparative performance of different c-means algorithms such as HCM, FCM, RFCM, and rRFCM on five mRNA expression data....

  • ...This section presents the performance of the proposed rRFCM algorithm, along with a comparison with HCM, FCM, and RFCM algorithms, on two types of real life data sets, namely, coding or messenger ribonucleic acid (mRNA) and non-coding RNA such as micro RNA (miRNA) expression data....

  • ...One of the most widely used prototype based partitional clustering algorithms is hard c-means (HCM) [24]....

  • ...The algorithms compared are hard c-means (HCM) [13, 24], fuzzy c-means (FCM) [5], possibilistic c-means (PCM) [16], rough c-means (RCM) [18], fuzzy-possibilistic c-means (FPCM) [26], rough-possibilistic c-means (RPCM) [22], rough-fuzzy c-means (RFCM) [21], and rough-fuzzy-possibilistic c-means (RFPCM) [22]....

Book
31 Jul 1981
Title: Pattern Recognition with Fuzzy Objective Function Algorithms

15,662 citations

Journal Article
TL;DR: A new graphical display is proposed for partitioning techniques, where each cluster is represented by a so-called silhouette, which is based on the comparison of its tightness and separation, and provides an evaluation of clustering validity.

14,144 citations
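The silhouette described here compares, for each object, its tightness within its own cluster with its separation from the nearest other cluster. A minimal Python sketch follows, assuming Euclidean distance and a hard labelling; s(i) = (b - a) / max(a, b), with s(i) = 0 for an object alone in its cluster (Rousseeuw's convention). The function name `silhouette_values` is hypothetical.

```python
import math

def silhouette_values(points, labels):
    """s(i) = (b - a) / max(a, b), with a the mean distance from object i to the other
    members of its own cluster and b the smallest mean distance to another cluster."""
    clusters = sorted(set(labels))
    values = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points)
               if labels[j] == labels[i] and j != i]
        if not own:  # singleton cluster: s(i) = 0 by convention
            values.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(math.dist(p, q) for q, l in zip(points, labels) if l == c)
                / labels.count(c) for c in clusters if c != labels[i])
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values

pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
s = silhouette_values(pts, [0, 0, 1, 1])
print(sum(s) / len(s))  # average silhouette width; near 1 means well-separated clusters
```

Averaging s(i) over all objects gives a single score per clustering, which is how the index is typically used to compare partitions.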


"Robust Rough-Fuzzy C-Means Algorith..." refers to methods in this paper

  • ...Silhouette Index To assess the quality of clusters, the Silhouette measure proposed by Rousseeuw [31] can be used....

  • ...Quantitative Indices To evaluate the performance of different clustering algorithms, several cluster validity indices such as Silhouette index [31], Davies-Bouldin (DB) index [6], Dunn index [6], and β index [27] are used to identify compact set of clusters, which are described next....

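The excerpt above also lists the Davies-Bouldin (DB) and Dunn indices. The Python sketch below uses one common form of each, which is an assumption; the variants used in the paper may differ. DB averages, over clusters, the worst ratio (S_i + S_k) / d(v_i, v_k), where S_i is the mean distance of members to their centroid v_i, and lower is better. Dunn divides the smallest distance between objects in different clusters by the largest cluster diameter, and higher is better. The β index is not sketched here, and all function names are hypothetical.

```python
import math

def _groups(points, labels):
    """Group points by label, in first-seen label order."""
    g = {}
    for p, l in zip(points, labels):
        g.setdefault(l, []).append(p)
    return list(g.values())

def _centroid(pts):
    """Coordinate-wise mean of a list of points."""
    return [sum(coord) / len(pts) for coord in zip(*pts)]

def davies_bouldin(points, labels):
    gs = _groups(points, labels)
    cs = [_centroid(g) for g in gs]
    s = [sum(math.dist(p, c) for p in g) / len(g) for g, c in zip(gs, cs)]  # scatter S_i
    n = len(gs)
    return sum(max((s[i] + s[k]) / math.dist(cs[i], cs[k]) for k in range(n) if k != i)
               for i in range(n)) / n

def dunn(points, labels):
    gs = _groups(points, labels)
    closest = min(math.dist(p, q) for a in range(len(gs)) for b in range(a + 1, len(gs))
                  for p in gs[a] for q in gs[b])            # smallest between-cluster distance
    widest = max(math.dist(p, q) for g in gs for p in g for q in g)  # largest diameter
    return closest / widest

pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
print(davies_bouldin(pts, labels), dunn(pts, labels))  # 0.1 9.0
```

Both functions need at least two clusters, and `dunn` raises ZeroDivisionError when every cluster is a single point (all diameters are zero).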

Journal Article
TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
Abstract: Clustering is the unsupervised classification of patterns (observations, data items, or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. However, clustering is a difficult problem combinatorially, and differences in assumptions and contexts in different communities has made the transfer of useful generic concepts and methodologies slow to occur. This paper presents an overview of pattern clustering methods from a statistical pattern recognition perspective, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. We present a taxonomy of clustering techniques, and identify cross-cutting themes and recent advances. We also describe some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.

14,054 citations


"Robust Rough-Fuzzy C-Means Algorith..." refers to background in this paper

  • ...A number of clustering algorithms have been proposed to suit different requirements [13, 14]....

Journal Article
09 Jun 2005-Nature
TL;DR: A new, bead-based flow cytometric miRNA expression profiling method is used to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers, and finds the miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours.
Abstract: Recent work has revealed the existence of a class of small non-coding RNA species, known as microRNAs (miRNAs), which have critical functions across various biological processes. Here we use a new, bead-based flow cytometric miRNA expression profiling method to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers. The miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours. We observe a general downregulation of miRNAs in tumours compared with normal tissues. Furthermore, we were able to successfully classify poorly differentiated tumours using miRNA expression profiles, whereas messenger RNA profiles were highly inaccurate when applied to the same samples. These findings highlight the potential of miRNA profiling in cancer diagnosis.

9,470 citations


"Robust Rough-Fuzzy C-Means Algorith..." refers to methods in this paper

  • ...In this background, several authors used hierarchical clustering in order to group miRNAs having similar function [10, 20, 35]....
