Journal Article
Biclustering Algorithms for Biological Data Analysis: A Survey
TL;DR: In this comprehensive survey, a large number of existing approaches to biclustering are analyzed and classified according to the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.
Abstract:
A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results from the application of standard clustering methods to genes are limited. This limitation is imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the data matrix have been proposed. The goal is to find submatrices, that is, subgroups of genes and subgroups of conditions, where the genes exhibit highly correlated activities for every condition. In this paper, we refer to this class of algorithms as biclustering. Biclustering is also referred to in the literature as coclustering and direct clustering, among other names, and has also been used in fields such as information retrieval and data mining. In this comprehensive survey, we analyze a large number of existing approaches to biclustering, and classify them in accordance with the type of biclusters they can find, the patterns of biclusters that are discovered, the methods used to perform the search, the approaches used to evaluate the solution, and the target applications.
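The idea of a submatrix in which genes behave coherently across a subset of conditions can be made concrete with a small sketch. The code below is not taken from the abstract: it assumes the mean squared residue of Cheng and Church as one possible coherence score (lower means more coherent), and the function name and the toy expression values are made up for illustration.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols.

    A score of 0 means every entry equals row effect + column effect,
    i.e. the selected genes vary in lockstep over the selected conditions.
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue * residue
    return total / (n_rows * n_cols)


# Toy matrix: rows are genes, columns are conditions (hypothetical values).
expression = [
    [1.0, 2.0, 9.0, 3.0],
    [2.0, 3.0, 1.0, 4.0],
    [5.0, 0.0, 7.0, 2.0],
    [4.0, 5.0, 2.0, 6.0],
]

# Genes 0, 1, 3 under conditions 0, 1, 3 follow a shared additive pattern.
coherent = mean_squared_residue(expression, rows=[0, 1, 3], cols=[0, 1, 3])
# The full matrix mixes in an uncorrelated gene and condition.
whole = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
print(coherent, whole)
```

In this toy case the selected submatrix scores essentially zero while the full matrix scores much higher, which mirrors the abstract's point that clustering over all conditions can hide patterns present only in a subset of them.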
Citations
Book
Machine Learning: A Probabilistic Perspective
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Journal Article
Survey of clustering algorithms
Rui Xu, Donald C. Wunsch
TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications to some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts, are illustrated.
Journal Article
Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering
TL;DR: This survey tries to clarify the different problem definitions related to subspace clustering in general; the specific difficulties encountered in this field of research; the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and how several prominent solutions tackle different problems.
Journal Article
A systematic comparison and evaluation of biclustering methods for gene expression data
Amela Prelić, Stefan Bleuler, Philip Zimmermann, Anja Wille, Peter Bühlmann, Wilhelm Gruissem, Lars Hennig, Lothar Thiele, Eckart Zitzler
TL;DR: A methodology for comparing and validating biclustering methods is presented; it includes a simple binary reference model that captures the essential features of most biclustering approaches, and a fast divide-and-conquer algorithm (Bimax) is proposed.
Journal Article
Computational cluster validation in post-genomic data analysis
TL;DR: In this article, the authors review cluster validation techniques, with a particular focus on their application to post-genomic data analysis.
References
Patent
MLL translocations specify a distinct gene expression profile, distinguishing a unique leukemia
TL;DR: This patent relates to the diagnosis of mixed lineage leukemia (MLL), acute lymphoblastic leukemia (ALL), and acute myelogenous leukemia (AML) according to the gene expression profile of a sample from an individual, as well as to methods of therapy and screening that utilize the genes identified herein as targets.
Journal Article
Gene-Expression Profiles in Hereditary Breast Cancer
Ingrid Hedenfalk, David Duggan, Yi Chen, Michael D. Radmacher, M. Bittner, Richard M. Simon, P. Meltzer, Barry A. Gusterson, Manel Esteller, O. P. Kallioniemi, Benjamin S. Wilfond, Åke Borg, J.M. Trent, Mark Raffeld, Zohar Yakhini, Amir Ben-Dor, Edward R. Dougherty, Juha Kononen, Lukas Bubendorf, W Fehrle, Stefania Pittaluga, Sofia Gruvberger, Niklas Loman, Oskar T. Johannsson, Håkan Olsson, Guido Sauter
TL;DR: Significantly different groups of genes are expressed by breast cancers with BRCA1 mutations and breast cancers with BRCA2 mutations; the results suggest that a heritable mutation influences the gene-expression profile of the cancer.
Proceedings Article
Information-theoretic co-clustering
TL;DR: This work presents an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages and demonstrates that the algorithm works well in practice, especially in the presence of sparsity and high-dimensionality.
Journal Article
An Information-Intensive Approach to the Molecular Pharmacology of Cancer
John N. Weinstein, Timothy G. Myers, Patrick M. O'Connor, Stephen H. Friend, Albert J. Fornace, Kurt W. Kohn, Tito Fojo, Susan E. Bates, Larry Rubinstein, N. Leigh Anderson, John K. Buolamwini, William W. van Osdol, Anne Monks, Dominic A. Scudiero, Edward A. Sausville, Daniel W. Zaharevitz, Barry Bunow, Vellarkad N. Viswanadhan, George S. Johnson, Robert Wittes, Kenneth D. Paull
TL;DR: Information is being used to search for candidate anticancer drugs that are not dependent on intact p53 suppressor gene function for their activity, and it remains to be seen how effective this information-intensive strategy will be at generating new clinically active agents.
Journal Article
Direct Clustering of a Data Matrix
TL;DR: This article presents a model and a technique for clustering cases and variables simultaneously; the principal advantage of this approach is the direct interpretation of the clusters on the data.