Journal Article

Noise-robust soft clustering of gene expression time-course data

TL;DR: To overcome the limitations of hard clustering, this work applies soft clustering, which offers several advantages for researchers: it is more robust to noise, and a priori pre-filtering of genes can be avoided.
Abstract: Clustering is an important tool in microarray data analysis. This unsupervised learning technique is commonly used to reveal structures hidden in large gene expression data sets. The vast majority of clustering algorithms applied so far produce hard partitions of the data, i.e. each gene is assigned exactly to one cluster. Hard clustering is favourable if clusters are well separated. However, this is generally not the case for microarray time-course data, where gene clusters frequently overlap. Additionally, hard clustering algorithms are often highly sensitive to noise. To overcome the limitations of hard clustering, we applied soft clustering which offers several advantages for researchers. First, it generates accessible internal cluster structures, i.e. it indicates how well corresponding clusters represent genes. This can be used for the more targeted search for regulatory elements. Second, the overall relation between clusters, and thus a global clustering structure, can be defined. Additionally, soft clustering is more noise robust and a priori pre-filtering of genes can be avoided. This prevents the exclusion of biologically relevant genes from the data analysis. Soft clustering was implemented here using the fuzzy c-means algorithm. Procedures to find optimal clustering parameters were developed. A software package for soft clustering has been developed based on the open-source statistical language R. The package called Mfuzz is freely available.
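The fuzzy c-means procedure named in the abstract can be sketched as follows. This is a minimal pure-Python illustration of the standard algorithm, not the Mfuzz implementation; the function name `fuzzy_c_means`, its parameters (`c` clusters, fuzzifier `m`, `max_iter`, `tol`, `seed`) and the toy data in the usage note are assumptions made for this example.

```python
import math
import random


def fuzzy_c_means(data, c, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Soft-cluster `data` (a list of equal-length numeric lists) into `c` clusters.

    Returns (centers, memberships). memberships[i][j] is the degree to which
    point i belongs to cluster j; each row sums to 1.
    Illustrative sketch only: values of m very close to 1 can overflow.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be > 1")
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])

    # Random initial memberships, normalised so that each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-12 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    power = 2.0 / (m - 1.0)
    for _ in range(max_iter):
        # Center update: mean of all points, weighted by u_ij ** m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            wsum = sum(weights)
            centers.append([
                sum(w * x[d] for w, x in zip(weights, data)) / wsum
                for d in range(dim)
            ])

        # Membership update from the distances to the new centers.
        new_u = []
        for x in data:
            dist = [math.dist(x, ctr) for ctr in centers]
            if min(dist) == 0.0:
                # The point coincides with a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dist]
                new_u.append([h / sum(hits) for h in hits])
                continue
            new_u.append([
                1.0 / sum((dist[j] / dist[k]) ** power for k in range(c))
                for j in range(c)
            ])

        # Stop once no membership value changes by more than `tol`.
        change = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if change < tol:
            break

    return centers, u
```

For a toy data set with two tight groups and one point halfway between them, calling `fuzzy_c_means(points, c=2, m=2.0)` should give the grouped points a membership close to 1 in their own cluster, while the in-between point receives roughly equal membership in both. That graded membership is the kind of information a hard partition discards.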
Citations
Journal Article
03 Nov 2006-Cell
TL;DR: A general mass spectrometric technology is developed and applied for identification and quantitation of phosphorylation sites as a function of stimulus, time, and subcellular location to provide a missing link in a global, integrative view of cellular regulation.

3,404 citations


Cites methods from "Noise-robust soft clustering of gen..."

  • ...The transformed profiles were then clustered using the Mfuzz toolbox (Futschik and Carlisle, 2005), which is based on the open-source statistical language R (RDC Team, 2006)....

    [...]

  • ...For our analysis, the optimal values of c and m were derived by the iterative refinement procedure as described in Futschik and Carlisle (2005)....

    [...]


Journal Article
TL;DR: An R package termed Mfuzz is constructed implementing soft clustering tools for microarray data analysis, which can overcome shortcomings of conventional hard clustering techniques and offer further advantages.
Abstract: For the analysis of microarray data, clustering techniques are frequently used. Most such methods are based on hard clustering of data, wherein one gene (or sample) is assigned to exactly one cluster. Hard clustering, however, suffers from several drawbacks such as sensitivity to noise and information loss. In contrast, soft clustering methods can assign a gene to several clusters. They can overcome shortcomings of conventional hard clustering techniques and offer further advantages. Thus, we constructed an R package termed Mfuzz implementing soft clustering tools for microarray data analysis. The additional package Mfuzzgui provides a convenient Tcl/Tk-based graphical user interface. Availability: The R packages Mfuzz and Mfuzzgui are available at http://itb1.biologie.hu-berlin.de/~futschik/software/R/Mfuzz/index.html. Their distribution is subject to the GPL version 2 license.

828 citations

Journal Article
Francine E. Garrett-Bakelman, Manjula Darshi, Stefan J. Green, Ruben C. Gur, Ling Lin, Brandon R. Macias, Miles J. McKenna, Cem Meydan, Tejaswini Mishra, Jad Nasrini, Brian D. Piening, Lindsay F. Rizzardi, Kumar Sharma, Jamila H. Siamwala, Lynn Taylor, Martha Hotz Vitaterna, Maryam Afkarian, Ebrahim Afshinnekoo, Sara Ahadi, Aditya Ambati, Maneesh Arya, Daniela Bezdan, Colin M. Callahan, Songjie Chen, Augustine M.K. Choi, George E. Chlipala, Kévin Contrepois, Marisa Covington, Brian Crucian, Immaculata De Vivo, David F. Dinges, Douglas J. Ebert, Jason I. Feinberg, Jorge Gandara, Kerry George, John Goutsias, George Grills, Alan R. Hargens, Martina Heer, Ryan P. Hillary, Andrew N. Hoofnagle, Vivian Hook, Garrett Jenkinson, Peng Jiang, Ali Keshavarzian, Steven S. Laurie, Brittany Lee-McMullen, Sarah B. Lumpkins, Matthew MacKay, Mark Maienschein-Cline, Ari Melnick, Tyler M. Moore, Kiichi Nakahira, Hemal H. Patel, Robert Pietrzyk, Varsha Rao, Rintaro Saito, Denis Salins, Jan M. Schilling, Dorothy D. Sears, Caroline Sheridan, Michael B. Stenger, Rakel Tryggvadottir, Alexander E. Urban, Tomas Vaisar, Benjamin Van Espen, Jing Zhang, Michael G. Ziegler, Sara R. Zwart, John B. Charles, Craig E. Kundrot, Graham B. I. Scott, Susan M. Bailey, Mathias Basner, Andrew P. Feinberg, Stuart M. C. Lee, Christopher E. Mason, Emmanuel Mignot, Brinda K. Rana, Scott M. Smith, Michael Snyder, Fred W. Turek
12 Apr 2019-Science
TL;DR: Given that the majority of the biological and human health variables remained stable, or returned to baseline, after a 340-day space mission, these data suggest that human health can be mostly sustained over this duration of spaceflight.
Abstract: INTRODUCTION: To date, 559 humans have been flown into space, but long-duration (>300 days) missions are rare (n = 8 total). Long-duration missions that will take humans to Mars and beyond are planned by public and private entities for the 2020s and 2030s; therefore, comprehensive studies are needed now to assess the impact of long-duration spaceflight on the human body, brain, and overall physiology. The space environment is made harsh and challenging by multiple factors, including confinement, isolation, and exposure to environmental stressors such as microgravity, radiation, and noise. The selection of one of a pair of monozygotic (identical) twin astronauts for NASA’s first 1-year mission enabled us to compare the impact of the spaceflight environment on one twin to the simultaneous impact of the Earth environment on a genetically matched subject.
RATIONALE: The known impacts of the spaceflight environment on human health and performance, physiology, and cellular and molecular processes are numerous and include bone density loss, effects on cognitive performance, microbial shifts, and alterations in gene regulation. However, previous studies collected very limited data, did not integrate simultaneous effects on multiple systems and data types in the same subject, or were restricted to 6-month missions. Measurement of the same variables in an astronaut on a year-long mission and in his Earth-bound twin indicated the biological measures that might be used to determine the effects of spaceflight. Presented here is an integrated longitudinal, multidimensional description of the effects of a 340-day mission onboard the International Space Station.
RESULTS: Physiological, telomeric, transcriptomic, epigenetic, proteomic, metabolomic, immune, microbiomic, cardiovascular, vision-related, and cognitive data were collected over 25 months. Some biological functions were not significantly affected by spaceflight, including the immune response (T cell receptor repertoire) to the first test of a vaccination in flight. However, significant changes in multiple data types were observed in association with the spaceflight period; the majority of these eventually returned to a preflight state within the time period of the study. These included changes in telomere length, gene regulation measured in both epigenetic and transcriptional data, gut microbiome composition, body weight, carotid artery dimensions, subfoveal choroidal thickness and peripapillary total retinal thickness, and serum metabolites. In addition, some factors were significantly affected by the stress of returning to Earth, including inflammation cytokines and immune response gene networks, as well as cognitive performance. For a few measures, persistent changes were observed even after 6 months on Earth, including some genes’ expression levels, increased DNA damage from chromosomal inversions, increased numbers of short telomeres, and attenuated cognitive function.
CONCLUSION: Given that the majority of the biological and human health variables remained stable, or returned to baseline, after a 340-day space mission, these data suggest that human health can be mostly sustained over this duration of spaceflight. The persistence of the molecular changes (e.g., gene expression) and the extrapolation of the identified risk factors for longer missions (>1 year) remain estimates and should be demonstrated with these measures in future astronauts. Finally, changes described in this study highlight pathways and mechanisms that may be vulnerable to spaceflight and may require safeguards for longer space missions; thus, they serve as a guide for targeted countermeasures or monitoring during future missions.

538 citations

Journal Article
TL;DR: Cellular events underlying the pluripotency of human embryonic stem cells (hESCs) are elucidated and a core hESC phosphoproteome of sites with similar robust changes in response to the two distinct treatments is identified.
Abstract: To elucidate cellular events underlying the pluripotency of human embryonic stem cells (hESCs), we performed parallel quantitative proteomic and phosphoproteomic analyses of hESCs during differentiation initiated by a diacylglycerol analog or transfer to media that had not been conditioned by feeder cells. We profiled 6521 proteins and 23,522 phosphorylation sites, of which almost 50% displayed dynamic changes in phosphorylation status during 24 hours of differentiation. These data are a resource for studies of the events associated with the maintenance of hESC pluripotency and those accompanying their differentiation. From these data, we identified a core hESC phosphoproteome of sites with similar robust changes in response to the two distinct treatments. These sites exhibited distinct dynamic phosphorylation patterns, which were linked to known or predicted kinases on the basis of the matching sequence motif. In addition to identifying previously unknown phosphorylation sites on factors associated with differentiation, such as kinases and transcription factors, we observed dynamic phosphorylation of DNA methyltransferases (DNMTs). We found a specific interaction of DNMTs during early differentiation with the PAF1 (polymerase-associated factor 1) transcriptional elongation complex, which binds to promoters of the pluripotency and known DNMT target genes encoding OCT4 and NANOG, thereby providing a possible molecular link for the silencing of these genes during differentiation.

450 citations


Cites methods from "Noise-robust soft clustering of gen..."

  • ...The resulting transformed ratios were standardized by dividing the peptide SILAC ratio for each time point by the SD for this peptide and then subjecting to unsupervised clustering with the fuzzy c-means algorithm as implemented in the Mfuzz package (45, 66) with a fuzzification parameter of 2 and 10 centers....

    [...]
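The standardization step described in the excerpt above (dividing a peptide's ratio at each time point by that peptide's SD) could look like the sketch below. The function name `scale_by_sd` is hypothetical, and the use of the sample SD (n - 1 denominator) is an assumption, since the excerpt does not say which SD was used. The scaled profiles would then be passed to a fuzzy c-means routine with a fuzzification parameter of 2 and 10 centers, as the excerpt states.

```python
import statistics


def scale_by_sd(profile):
    """Divide every value of one time-course profile by the profile's SD.

    Uses the sample standard deviation (n - 1 denominator); this choice is
    an assumption made for illustration.
    """
    sd = statistics.stdev(profile)
    if sd == 0:
        raise ValueError("profile is constant, so its SD is zero")
    return [value / sd for value in profile]
```

After scaling, every profile has unit SD, so clustering responds to the shape of a profile rather than to the magnitude of its changes.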

Journal Article
TL;DR: This paper compares the efficacy of three different implementations of techniques aimed to extend fuzzy c-means (FCM) clustering to VL data and concludes by demonstrating the VL algorithms on a dataset with 5 billion objects and presenting a set of recommendations regarding the use of different VL FCM clustering schemes.
Abstract: Very large (VL) data or big data are any data that you cannot load into your computer's working memory. This is not an objective definition, but a definition that is easy to understand and one that is practical, because there is a dataset too big for any computer you might use; hence, this is VL data for you. Clustering is one of the primary tasks used in the pattern recognition and data mining communities to search VL databases (including VL images) in various applications, and so, clustering algorithms that scale well to VL data are important and useful. This paper compares the efficacy of three different implementations of techniques aimed to extend fuzzy c-means (FCM) clustering to VL data. Specifically, we compare methods that are based on 1) sampling followed by noniterative extension; 2) incremental techniques that make one sequential pass through subsets of the data; and 3) kernelized versions of FCM that provide approximations based on sampling, including three proposed algorithms. We use both loadable and VL datasets to conduct the numerical experiments that facilitate comparisons based on time and space complexity, speed, quality of approximations to batch FCM (for loadable data), and assessment of matches between partitions and ground truth. Empirical results show that random sampling plus extension FCM, bit-reduced FCM, and approximate kernel FCM are good choices to approximate FCM for VL data. We conclude by demonstrating the VL algorithms on a dataset with 5 billion objects and presenting a set of recommendations regarding the use of different VL FCM clustering schemes.

424 citations

References
Book
01 Jan 1973

20,541 citations

Journal Article
TL;DR: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function.
Abstract: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not available currently.

16,371 citations

Book
31 Jul 1981
Pattern Recognition with Fuzzy Objective Function Algorithms

15,662 citations


"Noise-robust soft clustering of gen..." refers methods in this paper

  • ...For m → 1, it can be shown that the clustering becomes hard (9). The FCM algorithm is then equivalent to the k-means clustering....

    [...]

  • ...Soft clustering can be implemented using algorithms (such as fuzzy c-means) based on minimization of objective functions (9,10). Alternatively, probabilistic approaches such as Gaussian mixture models combined with expectation-maximization schemes can be applied....

    [...]

  • ...Several methods for minimizing the objective function Jm have been proposed (9,10). Fuzzy c-means (FCM) clustering is the most common algorithm for solving this problem....

    [...]
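For orientation, the objective function Jm mentioned in the excerpts above is commonly written as shown below, together with the alternating updates that fuzzy c-means uses to minimize it. The notation is assumed for illustration and is not taken from this page: x_i are the N expression profiles, v_j the c cluster centers, and u_ij the membership of profile i in cluster j.

```latex
% Standard fuzzy c-means objective (notation assumed, see the lead-in)
J_m = \sum_{i=1}^{N} \sum_{j=1}^{c} u_{ij}^{\,m} \, \lVert x_i - v_j \rVert^2,
\qquad \sum_{j=1}^{c} u_{ij} = 1, \quad m > 1

% Alternating updates: centers from memberships, memberships from centers
v_j = \frac{\sum_{i=1}^{N} u_{ij}^{\,m} \, x_i}{\sum_{i=1}^{N} u_{ij}^{\,m}},
\qquad
u_{ij} = \left[ \sum_{k=1}^{c}
  \left( \frac{\lVert x_i - v_j \rVert}{\lVert x_i - v_k \rVert} \right)^{2/(m-1)}
\right]^{-1}
```

As m approaches 1, the exponent 2/(m-1) grows without bound, so u_ij tends to 1 for the nearest center and to 0 for all others. This is the hard-clustering limit described in the first excerpt.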

01 Jan 1988

9,439 citations


"Noise-robust soft clustering of gen..." refers background in this paper

  • ...It has been widely used in numerous fields of scientific research (1). Clustering can be especially useful if prior knowledge is little or non-existent, since it requires minimal prior assumptions....

    [...]

Book
01 Jan 1988

8,586 citations