Book ChapterDOI

Rough-fuzzy c-means for clustering microarray gene expression data

12 Jan 2012 - pp 203-210
TL;DR: The rough-fuzzy c-means (RFCM) algorithm is applied to discover co-expressed gene clusters, and a Pearson correlation based initialization method is used to address the problem of selecting initial cluster prototypes.
Abstract: Clustering is a useful tool for elucidating similar patterns across a large number of transcripts and for identifying likely co-regulated genes. It attempts to partition the genes into groups exhibiting similar patterns of variation in expression level. An application of the rough-fuzzy c-means (RFCM) algorithm is presented in this paper to discover co-expressed gene clusters. Selecting the initial prototypes of the different clusters is one of the major issues in RFCM based microarray data clustering. A Pearson correlation based initialization method is used to address this limitation. It enables the RFCM algorithm to discover co-expressed gene clusters. The effectiveness of the RFCM algorithm and the initialization method, along with a comparison with other related methods, is demonstrated on five yeast gene expression data sets using standard cluster validity indices and gene ontology based analysis.
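
The abstract names Pearson correlation as the basis of the initialization method but does not spell the procedure out. The sketch below only illustrates the similarity measure itself, not the authors' initialization. The function name `pearson` and the three expression profiles are hypothetical, chosen to show that the coefficient compares the shape of two profiles rather than their scale.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (spread_x * spread_y)


# Hypothetical expression profiles over four conditions (illustrative values only).
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]  # same trend as gene_a, different scale
gene_c = [4.0, 3.0, 2.0, 1.0]  # opposite trend

print(pearson(gene_a, gene_b))  # close to +1: co-expressed in this toy example
print(pearson(gene_a, gene_c))  # close to -1: anti-correlated
```

Because the coefficient is insensitive to scaling and shifting of a profile, two genes whose expression rises and falls together score highly even when their absolute levels differ, which is one reason correlation is a common similarity measure for expression data.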
Citations
Journal ArticleDOI
TL;DR: This article compares k-means to fuzzy c-means and rough k-means as important representatives of soft clustering, and surveys important extensions and derivatives of these algorithms.

157 citations


Cites methods from "Rough-fuzzy c-means for clustering ..."

  • ...Applications of Maji and Pal’s RFCM are, for example, in the fields of microarray gene expression data [61,62] and image segmentation [60]....


Journal ArticleDOI
TL;DR: A novel extension of this clustering algorithm, called Rough-Fuzzy Support Vector Clustering (RFSVC), that obtains rough-fuzzy clusters using the support vectors as cluster representatives, showing its potential for detecting outliers and computing membership degrees for clusters with any silhouette.

34 citations


Cites result from "Rough-fuzzy c-means for clustering ..."

  • ...On the other hand, Rough–Fuzzy C-Means (RFCM) and Rough–Possibilistic C-Means (RPCM) algorithms developed by Maji and Pal [22–24], and Maji and Paul [25] were run in order to compare their results with those of our proposal....


Journal ArticleDOI
TL;DR: It is shown how all these clustering approaches are able to manage, in different ways, the uncertainty associated with the two components of the Informational Paradigm, i.e. the Empirical and Theoretical Information.

28 citations

Posted Content
01 Nov 2017-viXra
TL;DR: Marks fifty years since the publication of the first paper on clustering based on fuzzy sets theory, recalling that L.A. Zadeh published "Fuzzy Sets" in 1965.
Abstract: Fifty years have gone by since the publication of the first paper on clustering based on fuzzy sets theory. In 1965, L.A. Zadeh published “Fuzzy Sets”.

17 citations

Book ChapterDOI
24 Oct 2014
TL;DR: This paper studies, through extensive experiments, the properties of the general user-weighted π k-means, a form of π rough k-means that optionally integrates user-defined weights for parameter tuning using techniques such as evolutionary computing.
Abstract: Since its introduction by Lingras and West a decade ago, rough k-means has gained increasing attention in academia as well as in practice. A recently introduced extension, π rough k-means, eliminates the need for the weight parameter in rough k-means by applying probabilities derived from Laplace’s Principle of Indifference. However, the proposal in its more general form makes it possible to optionally integrate user-defined weights for parameter tuning using techniques such as evolutionary computing. In this paper, we study the properties of this general user-weighted π k-means through extensive experiments.

4 citations

References
Journal ArticleDOI
TL;DR: GO::TermFinder comprises a set of object-oriented Perl modules for accessing Gene Ontology information and evaluating and visualizing the collective annotation of a list of genes to GO terms, which can be used to draw conclusions from microarray and other biological data.
Abstract: Summary: GO::TermFinder comprises a set of object-oriented Perl modules for accessing Gene Ontology (GO) information and evaluating and visualizing the collective annotation of a list of genes to GO terms. It can be used to draw conclusions from microarray and other biological data, calculating the statistical significance of each annotation. GO::TermFinder can be used on any system on which Perl can be run, either as a command line application, in single or batch mode, or as a web-based CGI script. Availability: The full source code and documentation for GO::TermFinder are freely available from http://search.cpan.org/dist/GO-TermFinder/
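
The abstract says the statistical significance of each annotation is calculated but does not give the computation. A common choice for this kind of enrichment question is a hypergeometric tail probability, and the sketch below assumes that choice for illustration. The function name `enrichment_p_value` and the toy numbers are hypothetical, this is not the GO::TermFinder Perl API, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in a gene list.

    population: number of genes in the background set
    annotated:  number of background genes carrying the GO term
    sample:     size of the gene list (e.g. one cluster)
    hits:       number of genes in the list that carry the GO term
    """
    total = comb(population, sample)  # all ways to draw the gene list
    upper = min(sample, annotated)    # largest possible number of hits
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy example: 10 background genes, 3 annotated with a term,
# and a 3-gene cluster in which all 3 genes carry the term.
p = enrichment_p_value(population=10, annotated=3, sample=3, hits=3)
print(p)  # 1/120, since only one of the 120 possible clusters has 3 hits
```

A small value means a cluster this rich in the term would rarely arise by drawing genes at random. In practice many terms are tested per cluster, so the raw values would need a multiple-testing correction before interpretation.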

1,869 citations

Book
30 Sep 2008
TL;DR: This edition includes many more worked examples and diagrams to help give greater understanding of the methods and their application, including semi-supervised learning, combining clustering algorithms, and relevance feedback.
Abstract: This book considers classical and current theory and practice, of both supervised and unsupervised pattern recognition, to build a complete background for professionals and students of engineering. The authors, leading experts in the field of pattern recognition, have provided an up-to-date, self-contained volume encapsulating this wide spectrum of information. The very latest methods are incorporated in this edition: semi-supervised learning, combining clustering algorithms, and relevance feedback. This edition includes many more worked examples and diagrams (in two colour) to help give greater understanding of the methods and their application. Computer-based problems will be included with MATLAB code. An accompanying book contains extra worked examples and MATLAB code of all the examples used in this book. Thoroughly developed to include many more worked examples to give greater understanding of this mathematically oriented subject. Many more diagrams included (now in two color) to provide greater insight through visual presentation. An accompanying manual includes MATLAB code of the methods and algorithms in the book, together with solved problems and real-life data sets in medical imaging, remote sensing and audio recognition. The Manual is available separately or at a special packaged price (ISBN: 9780123744869). Latest hot topics included to further the reference value of the text, including semi-supervised learning, combining clustering algorithms, and relevance feedback.

627 citations

Journal ArticleDOI
TL;DR: By setting threshold levels for the membership values of the FCM method, genes which are tightly associated to a given cluster can be selected and this selection increases the overall biological significance of the genes within the cluster.
Abstract: Motivation: Clustering analysis of data from DNA microarray hybridization studies is essential for identifying biologically relevant groups of genes. Partitional clustering methods such as K-means or self-organizing maps assign each gene to a single cluster. However, these methods do not provide information about the influence of a given gene for the overall shape of clusters. Here we apply a fuzzy partitioning method, Fuzzy C-means (FCM), to attribute cluster membership values to genes. Results: A major problem in applying the FCM method for clustering microarray data is the choice of the fuzziness parameter m. We show that the commonly used value m = 2 is not appropriate for some data sets, and that optimal values for m vary widely from one data set to another. We propose an empirical method, based on the distribution of distances between genes in a given data set, to determine an adequate value for m. By setting threshold levels for the membership values, genes which are tightly associated to a given cluster can be selected. Using a yeast cell cycle data set as an example, we show that this selection increases the overall biological significance of the genes within the cluster. Availability: Supplementary text and Matlab functions are available at http://www-igbmc.u-strasbg.fr/fcm/
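
The role of the fuzziness parameter m and of a membership threshold can be seen in a small sketch. It uses the standard fuzzy c-means membership formula, in which a gene's membership in cluster i is proportional to d_i^(-2/(m-1)) for distance d_i to that cluster's centre. The function names, distances, and threshold below are hypothetical and are not taken from the Matlab functions mentioned in the abstract.

```python
def fcm_memberships(distances, m=2.0):
    """Fuzzy c-means memberships of one gene, given its distance to each cluster centre."""
    if m <= 1.0:
        raise ValueError("the fuzziness parameter m must be greater than 1")
    # A gene lying exactly on one or more centres belongs fully to those clusters.
    zeros = [i for i, d in enumerate(distances) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(distances))]
    exponent = 2.0 / (m - 1.0)
    weights = [d ** -exponent for d in distances]
    total = sum(weights)
    return [w / total for w in weights]  # memberships sum to 1


def tightly_associated(memberships, threshold):
    """Indices of the clusters in which the membership reaches the threshold."""
    return [i for i, u in enumerate(memberships) if u >= threshold]


# Hypothetical gene at distance 1.0 from cluster 0 and 2.0 from cluster 1.
distances = [1.0, 2.0]
for m in (1.25, 2.0, 3.0):
    print(m, fcm_memberships(distances, m))  # memberships flatten as m grows

u = fcm_memberships(distances, m=2.0)      # approximately [0.8, 0.2]
print(tightly_associated(u, threshold=0.7))  # only cluster 0 passes the threshold
```

With m close to 1 the assignment is nearly crisp, while larger m pushes all memberships toward equal values, so a fixed threshold selects fewer genes as m grows. This is why the choice of m matters before any thresholding is applied.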

534 citations

Proceedings Article
19 Aug 2000
TL;DR: This work has developed a novel clustering algorithm, called CLICK, which is applicable to gene expression analysis as well as to other biological applications, and which outperformed extant algorithms according to several common figures of merit.
Abstract: Novel DNA microarray technologies enable the monitoring of expression levels of thousands of genes simultaneously. This allows a global view on the transcription levels of many (or all) genes when the cell undergoes specific conditions or processes. Analyzing gene expression data requires the clustering of genes into groups with similar expression patterns. We have developed a novel clustering algorithm, called CLICK, which is applicable to gene expression analysis as well as to other biological applications. No prior assumptions are made on the structure or the number of the clusters. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups of highly similar elements (kernels), which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clustering. CLICK has been implemented and tested on a variety of biological datasets, ranging from gene expression, cDNA oligo-fingerprinting to protein sequence similarity. In all those applications it outperformed extant algorithms according to several common figures of merit. CLICK is also very fast, allowing clustering of thousands of elements in minutes, and over 100,000 elements in a couple of hours on a regular workstation.

388 citations

Journal ArticleDOI
01 Dec 2007
TL;DR: The RFPCM comprises a judicious integration of the principles of rough and fuzzy sets that incorporates both probabilistic and possibilistic memberships simultaneously to avoid the problems of noise sensitivity of fuzzy C-means and the coincident clusters of PCM.
Abstract: A generalized hybrid unsupervised learning algorithm, termed rough-fuzzy possibilistic C-means (RFPCM), is proposed in this paper. It comprises a judicious integration of the principles of rough and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in class definition, the membership function of fuzzy sets enables efficient handling of overlapping partitions. It incorporates both probabilistic and possibilistic memberships simultaneously to avoid the problems of noise sensitivity of fuzzy C-means and the coincident clusters of PCM. The concept of crisp lower bound and fuzzy boundary of a class, which is introduced in the RFPCM, enables efficient selection of cluster prototypes. The algorithm is generalized in the sense that all existing variants of C-means algorithms can be derived from the proposed algorithm as a special case. Several quantitative indices are introduced based on rough sets for the evaluation of performance of the proposed C-means algorithm. The effectiveness of the algorithm, along with a comparison with other algorithms, has been demonstrated both qualitatively and quantitatively on a set of real-life data sets.
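
The abstract describes a crisp lower bound and a fuzzy boundary for each class without giving the assignment rule. The sketch below assumes one rule used in the rough-fuzzy clustering literature: an object goes to the lower approximation of its best cluster when the gap between its two highest memberships exceeds a threshold, and to the boundary of both clusters otherwise. The function name `rough_assign`, the threshold name `delta`, and the membership values are hypothetical and may differ from the paper's exact formulation.

```python
def rough_assign(memberships, delta):
    """Place one object in a crisp lower approximation or a shared boundary.

    memberships: membership of the object in each cluster (at least two clusters)
    delta:       assumed threshold on the gap between the two highest memberships
    """
    if len(memberships) < 2:
        raise ValueError("at least two clusters are required")
    # Cluster indices ordered from highest to lowest membership.
    ranked = sorted(range(len(memberships)), key=lambda i: memberships[i], reverse=True)
    best, second = ranked[0], ranked[1]
    if memberships[best] - memberships[second] > delta:
        return "lower", [best]         # unambiguous: crisp member of one cluster
    return "boundary", [best, second]  # ambiguous: shared by the two closest clusters


# Hypothetical memberships for two objects across three clusters.
print(rough_assign([0.80, 0.15, 0.05], delta=0.3))  # ('lower', [0])
print(rough_assign([0.45, 0.40, 0.15], delta=0.3))  # ('boundary', [0, 1])
```

Under this assumed rule, objects in a lower approximation would contribute to a prototype as crisp members, while boundary objects would contribute through their fuzzy memberships, which matches the abstract's distinction between a crisp lower bound and a fuzzy boundary.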

220 citations