Rough-fuzzy c-means for clustering microarray gene expression data

doi:10.1007/978-3-642-27387-2_26

Book Chapter•DOI•

Pradipta Maji¹, Sushmita Paul¹•Institutions (1)

Indian Statistical Institute¹

12 Jan 2012-pp 203-210

TL;DR: This paper applies the rough-fuzzy c-means (RFCM) algorithm to discover co-expressed gene clusters, and uses a Pearson correlation based initialization method to select the initial cluster prototypes.

Abstract: Clustering is a useful tool for elucidating similar patterns across a large number of transcripts and for identifying likely co-regulated genes. It attempts to partition the genes into groups exhibiting similar patterns of variation in expression level. This paper presents an application of the rough-fuzzy c-means (RFCM) algorithm to discover co-expressed gene clusters. Selecting the initial prototypes of the different clusters is one of the major issues in RFCM-based microarray data clustering, and a Pearson correlation based initialization method is used to address it. This initialization enables the RFCM algorithm to discover co-expressed gene clusters. The effectiveness of the RFCM algorithm and the initialization method, along with a comparison with other related methods, is demonstrated on five yeast gene expression data sets using standard cluster validity indices and gene ontology based analysis.
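The abstract does not spell out the initialization procedure, so the following is only a minimal sketch of how a Pearson correlation based choice of initial prototypes could look. The function names, the `threshold` value, and the greedy "most strongly correlated neighbours first" heuristic are all assumptions made for illustration, not the chapter's method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile has no defined correlation; treat it as uncorrelated
    return cov / (sx * sy)


def init_prototypes(genes, c, threshold=0.9):
    """Pick up to c gene indices as initial prototypes (hypothetical heuristic).

    Genes with many strongly correlated neighbours are considered first, and a
    gene is accepted only if it is not strongly correlated with a prototype
    that was already chosen.
    """
    n = len(genes)
    corr = [[pearson(genes[i], genes[j]) for j in range(n)] for i in range(n)]
    # Density: number of other genes whose correlation with gene i reaches the threshold.
    density = [
        sum(1 for j in range(n) if j != i and corr[i][j] >= threshold)
        for i in range(n)
    ]
    chosen = []
    for i in sorted(range(n), key=lambda g: -density[g]):
        if all(corr[i][k] < threshold for k in chosen):
            chosen.append(i)
        if len(chosen) == c:
            break
    return chosen


# Toy profiles: genes 0-2 rise together, genes 3-4 fall together, gene 5 oscillates.
genes = [
    [1, 2, 3, 4],
    [2, 4, 6, 8.5],
    [1.1, 2, 3.2, 4],
    [4, 3, 2, 1],
    [8, 6, 4.1, 2],
    [1, 5, 1, 5],
]
prototypes = init_prototypes(genes, c=2)
```

On these toy profiles the sketch picks one gene from the rising group and one from the falling group, which is the kind of well-separated starting point an initialization step aims for.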

Citations

Journal Article•DOI•

Soft clustering -- Fuzzy and rough approaches and their extensions and derivatives

Georg Peters¹, Fernando Crespo², Pawan Lingras³, Richard Weber⁴•Institutions (4)

Munich University of Applied Sciences¹, Valparaiso University², University of Saint Mary³, University of Chile⁴

01 Feb 2013-International Journal of Approximate Reasoning

TL;DR: This article compares k-means to fuzzy c-means and rough k-means as important representatives of soft clustering, and surveys important extensions and derivatives of these algorithms.

157 citations

Cites methods from "Rough-fuzzy c-means for clustering ..."

...Applications of Maji and Pal’s RFCM are, for example, in the fields of microarray gene expression data [61,62] and image segmentation [60]....

Journal Article•DOI•

A Rough-Fuzzy approach for Support Vector Clustering

Ramiro Saltos¹, Richard Weber¹•Institutions (1)

University of Chile¹

20 Apr 2016-Information Sciences

TL;DR: This paper proposes Rough-Fuzzy Support Vector Clustering (RFSVC), a novel extension of support vector clustering that obtains rough-fuzzy clusters using the support vectors as cluster representatives, and shows its potential for detecting outliers and computing membership degrees for clusters with any silhouette.

34 citations

Cites result from "Rough-fuzzy c-means for clustering ..."

...On the other hand, Rough–Fuzzy C-Means (RFCM) and Rough–Possibilistic C-Means (RPCM) algorithms developed by Maji and Pal [22–24], and Maji and Paul [25] were run in order to compare their results with those of our proposal....

Journal Article•DOI•

Informational Paradigm, management of uncertainty and theoretical formalisms in the clustering framework

Pierpaolo D'Urso¹•Institutions (1)

Sapienza University of Rome¹

01 Aug 2017-Information Sciences

TL;DR: It is shown how all these clustering approaches are capable of managing, in different ways, the uncertainty associated with the two components of the Informational Paradigm, i.e. the Empirical and Theoretical Information.

28 citations

Posted Content•

Informational Paradigm, Management of Uncertainty and Theoretical Formalisms in the Clustering Framework: a Review

Pierpaolo D'Urso

01 Nov 2017-viXra

TL;DR: This review marks fifty years since the publication of the first paper on clustering based on fuzzy sets theory, recalling that L.A. Zadeh published "Fuzzy Sets" in 1965.

Abstract: Fifty years have gone by since the publication of the first paper on clustering based on fuzzy sets theory. In 1965, L.A. Zadeh published “Fuzzy Sets”.

17 citations

Book Chapter•DOI•

Analysis of User-Weighted π Rough k-Means

Georg Peters¹, Pawan Lingras²•Institutions (2)

Munich University of Applied Sciences¹, Saint Mary's University²

24 Oct 2014

TL;DR: This paper studies the properties of the general user-weighted π k-means through extensive experiments; in this general form, user-defined weights can optionally be integrated for parameter tuning using techniques such as evolutionary computing.

Abstract: Since its introduction by Lingras and West a decade ago, rough k-means has gained increasing attention in academia as well as in practice. A recently introduced extension, π rough k-means, eliminates the need for the weight parameter in rough k-means by applying probabilities derived from Laplace’s Principle of Indifference. However, the proposal in its more general form makes it possible to optionally integrate user-defined weights for parameter tuning using techniques such as evolutionary computing. In this paper, we study the properties of this general user-weighted π k-means through extensive experiments.

4 citations

References

Journal Article•DOI•

GO::TermFinder - open source software for accessing Gene Ontology information and finding significantly enriched Gene Ontology terms associated with a list of genes

Elizabeth I. Boyle, Shuai Weng, Jeremy Gollub¹, Heng Jin¹, David Botstein, J. Michael Cherry, Gavin Sherlock•Institutions (1)

Stanford University¹

12 Dec 2004-Bioinformatics

TL;DR: GO::TermFinder comprises a set of object-oriented Perl modules for accessing Gene Ontology information and evaluating and visualizing the collective annotation of a list of genes to GO terms, which can be used to draw conclusions from microarray and other biological data.

Abstract: Summary: GO::TermFinder comprises a set of object-oriented Perl modules for accessing Gene Ontology (GO) information and evaluating and visualizing the collective annotation of a list of genes to GO terms. It can be used to draw conclusions from microarray and other biological data, calculating the statistical significance of each annotation. GO::TermFinder can be used on any system on which Perl can be run, either as a command line application, in single or batch mode, or as a web-based CGI script. Availability: The full source code and documentation for GO::TermFinder are freely available from http://search.cpan.org/dist/GO-TermFinder/
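To make "calculating the statistical significance of each annotation" concrete, here is a small sketch of a hypergeometric enrichment p-value, a test commonly used for GO term enrichment. It is written in Python for illustration and is not GO::TermFinder's Perl API; the function name and argument names are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_pvalue(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in a gene list.

    population: total number of genes in the background set
    annotated:  number of background genes annotated to the GO term
    sample:     size of the gene list (for example, one cluster)
    hits:       number of genes in the list annotated to the GO term
    """
    upper = min(sample, annotated)
    # Sum the hypergeometric probabilities for hits, hits + 1, ..., upper.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


# 10 background genes, 4 annotated to a term; a cluster of 3 genes contains 2 of them.
p = enrichment_pvalue(population=10, annotated=4, sample=3, hits=2)
```

A small p-value suggests the term is over-represented in the cluster relative to the background; in practice the p-values of all tested terms would also need correcting for multiple comparisons.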

1,869 citations

Book•

Pattern Recognition, Fourth Edition

Sergios Theodoridis, Konstantinos Koutroumbas

30 Sep 2008

TL;DR: This edition includes many more worked examples and diagrams to help give greater understanding of the methods and their application, including semi-supervised learning, combining clustering algorithms, and relevance feedback.

Abstract: This book considers classical and current theory and practice, of both supervised and unsupervised pattern recognition, to build a complete background for professionals and students of engineering. The authors, leading experts in the field of pattern recognition, have provided an up-to-date, self-contained volume encapsulating this wide spectrum of information. The very latest methods are incorporated in this edition: semi-supervised learning, combining clustering algorithms, and relevance feedback. This edition includes many more worked examples and diagrams (in two colour) to help give greater understanding of the methods and their application. Computer-based problems will be included with MATLAB code. An accompanying book contains extra worked examples and MATLAB code of all the examples used in this book.

Thoroughly developed to include many more worked examples to give greater understanding of this mathematically oriented subject. Many more diagrams included, now in two color, to provide greater insight through visual presentation. An accompanying manual includes MATLAB code of the methods and algorithms in the book, together with solved problems and real-life data sets in medical imaging, remote sensing and audio recognition. The manual is available separately or at a special packaged price (ISBN: 9780123744869). Latest hot topics included to further the reference value of the text, including semi-supervised learning, combining clustering algorithms, and relevance feedback.

627 citations

Journal Article•DOI•

Fuzzy C-means method for clustering microarray data

Doulaye Dembélé¹, Philippe Kastner¹•Institutions (1)

Centre national de la recherche scientifique¹

22 May 2003-Bioinformatics

TL;DR: By setting threshold levels for the membership values of the FCM method, genes which are tightly associated to a given cluster can be selected, and this selection increases the overall biological significance of the genes within the cluster.

Abstract: Motivation: Clustering analysis of data from DNA microarray hybridization studies is essential for identifying biologically relevant groups of genes. Partitional clustering methods such as K-means or self-organizing maps assign each gene to a single cluster. However, these methods do not provide information about the influence of a given gene for the overall shape of clusters. Here we apply a fuzzy partitioning method, Fuzzy C-means (FCM), to attribute cluster membership values to genes. Results: A major problem in applying the FCM method for clustering microarray data is the choice of the fuzziness parameter m. We show that the commonly used value m = 2 is not appropriate for some data sets, and that optimal values for m vary widely from one data set to another. We propose an empirical method, based on the distribution of distances between genes in a given data set, to determine an adequate value for m. By setting threshold levels for the membership values, genes which are tightly associated to a given cluster can be selected. Using a yeast cell cycle data set as an example, we show that this selection increases the overall biological significance of the genes within the cluster. Availability: Supplementary text and Matlab functions are available at http://www-igbmc.u-strasbg.fr/fcm/
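The role of the fuzziness parameter m and of a membership threshold can be illustrated with the standard FCM membership update, u_i = 1 / Σ_k (d_i / d_k)^(2/(m-1)). The sketch below is in Python rather than the authors' Matlab functions; the function names and the 0.7 cutoff are assumptions chosen for the example, and the empirical method for choosing m is not reproduced.

```python
def fcm_memberships(distances, m=2.0):
    """Standard FCM memberships of one gene, given its distance to each cluster centre."""
    if m <= 1.0:
        raise ValueError("the fuzziness parameter m must be greater than 1")
    zeros = [i for i, d in enumerate(distances) if d == 0]
    if zeros:
        # The gene coincides with one or more centres: share full membership among them.
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(distances))]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


def tightly_associated(membership_rows, cutoff=0.7):
    """Indices of genes whose highest membership reaches the cutoff (hypothetical value)."""
    return [g for g, row in enumerate(membership_rows) if max(row) >= cutoff]


# One gene at distance 1 from the first centre and 3 from the second.
crisp_like = fcm_memberships([1.0, 3.0], m=2.0)  # roughly [0.9, 0.1]
fuzzier = fcm_memberships([1.0, 3.0], m=3.0)     # roughly [0.75, 0.25]
selected = tightly_associated([crisp_like, [0.55, 0.45]])
```

For the same distances, the larger m gives flatter memberships, which is why a single fixed value such as m = 2 may not suit every data set.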

534 citations

Proceedings Article•

CLICK: A Clustering Algorithm with Applications to Gene Expression Analysis

Roded Sharan¹, Ron Shamir¹•Institutions (1)

Tel Aviv University¹

19 Aug 2000

TL;DR: This work has developed a novel clustering algorithm, called CLICK, which is applicable to gene expression analysis as well as to other biological applications, and which outperformed extant algorithms according to several common figures of merit.

Abstract: Novel DNA microarray technologies enable the monitoring of expression levels of thousands of genes simultaneously. This allows a global view on the transcription levels of many (or all) genes when the cell undergoes specific conditions or processes. Analyzing gene expression data requires the clustering of genes into groups with similar expression patterns. We have developed a novel clustering algorithm, called CLICK, which is applicable to gene expression analysis as well as to other biological applications. No prior assumptions are made on the structure or the number of the clusters. The algorithm utilizes graph-theoretic and statistical techniques to identify tight groups of highly similar elements (kernels), which are likely to belong to the same true cluster. Several heuristic procedures are then used to expand the kernels into the full clustering. CLICK has been implemented and tested on a variety of biological datasets, ranging from gene expression and cDNA oligo-fingerprinting to protein sequence similarity. In all those applications it outperformed extant algorithms according to several common figures of merit. CLICK is also very fast, allowing clustering of thousands of elements in minutes, and over 100,000 elements in a couple of hours on a regular workstation.

388 citations

Journal Article•DOI•

Rough Set Based Generalized Fuzzy C-Means Algorithm and Quantitative Indices

Pradipta Maji, Sankar K. Pal¹•Institutions (1)

Indian Statistical Institute¹

01 Dec 2007

TL;DR: The RFPCM comprises a judicious integration of the principles of rough and fuzzy sets that incorporates both probabilistic and possibilistic memberships simultaneously to avoid the problems of noise sensitivity of fuzzy C-means and the coincident clusters of PCM.

Abstract: A generalized hybrid unsupervised learning algorithm, which is termed as rough-fuzzy possibilistic C-means (RFPCM), is proposed in this paper. It comprises a judicious integration of the principles of rough and fuzzy sets. While the concept of lower and upper approximations of rough sets deals with uncertainty, vagueness, and incompleteness in class definition, the membership function of fuzzy sets enables efficient handling of overlapping partitions. It incorporates both probabilistic and possibilistic memberships simultaneously to avoid the problems of noise sensitivity of fuzzy C-means and the coincident clusters of PCM. The concept of crisp lower bound and fuzzy boundary of a class, which is introduced in the RFPCM, enables efficient selection of cluster prototypes. The algorithm is generalized in the sense that all existing variants of C-means algorithms can be derived from the proposed algorithm as a special case. Several quantitative indices are introduced based on rough sets for the evaluation of performance of the proposed C-means algorithm. The effectiveness of the algorithm, along with a comparison with other algorithms, has been demonstrated both qualitatively and quantitatively on a set of real-life data sets.
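The idea of a crisp lower bound and a fuzzy boundary can be sketched as follows. This assumes one common rough-fuzzy rule: an object goes into the lower approximation of its best cluster when the gap between its two highest memberships exceeds a threshold, and otherwise into the boundaries of both clusters. The function name, the `delta` value, and the rule itself are assumptions for illustration; the abstract does not state the exact criterion.

```python
def split_lower_boundary(memberships, delta=0.2):
    """Assign each object to a crisp lower approximation or to fuzzy boundaries.

    memberships: one row per object, each row holding c >= 2 membership values.
    Returns (lower, boundary), two lists with one set of object indices per cluster.
    """
    c = len(memberships[0])
    lower = [set() for _ in range(c)]
    boundary = [set() for _ in range(c)]
    for j, row in enumerate(memberships):
        # Rank clusters by membership, highest first.
        ranked = sorted(range(c), key=lambda i: row[i], reverse=True)
        best, second = ranked[0], ranked[1]
        if row[best] - row[second] > delta:
            lower[best].add(j)  # clearly belongs to one cluster
        else:
            boundary[best].add(j)  # ambiguous: shared by the two closest clusters
            boundary[second].add(j)
    return lower, boundary


memberships = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8]]
lower, boundary = split_lower_boundary(memberships)
```

In this toy example the first and third objects fall in the lower approximations of clusters 0 and 1, while the second object, whose memberships are nearly equal, lands in the boundary of both. Prototype updates in rough-fuzzy methods typically weight these two regions differently, which is not shown here.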

220 citations