
Showing papers by "Pablo Tamayo" published in 2004


Journal Article
TL;DR: The authors describe nonnegative matrix factorization (NMF), an algorithm based on decomposition by parts that can reduce the dimension of expression data from thousands of genes to a handful of metagenes. They find it less sensitive to a priori selection of genes or initial conditions, and able to detect alternative or context-dependent patterns of gene expression in complex biological systems.
Abstract: We describe here the use of nonnegative matrix factorization (NMF), an algorithm based on decomposition by parts that can reduce the dimension of expression data from thousands of genes to a handful of metagenes. Coupled with a model selection mechanism, adapted to work for any stochastic clustering algorithm, NMF is an efficient method for identification of distinct molecular patterns and provides a powerful method for class discovery. We demonstrate the ability of NMF to recover meaningful biological information from cancer-related microarray data. NMF appears to have advantages over other methods such as hierarchical clustering or self-organizing maps. We found it less sensitive to a priori selection of genes or initial conditions and able to detect alternative or context-dependent patterns of gene expression in complex biological systems. This ability, similar to semantic polysemy in text, provides a general method for robust molecular pattern discovery.

1,818 citations
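The decomposition described in the abstract above can be illustrated with a minimal sketch. It assumes the standard multiplicative update rules for the squared-error objective, which may differ from the exact variant used in the paper, and it uses plain Python lists to stay dependency-free. The function names (`nmf`, `assign_clusters`), the toy matrix `V`, and the rule of assigning each sample to its largest metagene coefficient are illustrative assumptions, not taken from the paper. The model selection step mentioned in the abstract is not shown.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Frobenius norm of V - WH, i.e. the reconstruction error."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rwh in zip(v, wh) for x, y in zip(rv, rwh)) ** 0.5


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor nonnegative V (genes x samples) into W (genes x k) and H (k x samples).

    Uses multiplicative updates (an assumption of this sketch); every entry of
    W and H stays nonnegative because each update only multiplies by a
    nonnegative ratio.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def assign_clusters(h):
    """Assign each sample (column of H) to the metagene with the largest coefficient."""
    return [max(range(len(h)), key=lambda r: h[r][j]) for j in range(len(h[0]))]


# Hypothetical toy data: 4 genes x 6 samples with two obvious sample groups.
V = [
    [5.0, 4.0, 6.0, 0.1, 0.2, 0.1],
    [4.0, 5.0, 5.0, 0.2, 0.1, 0.3],
    [0.1, 0.2, 0.1, 6.0, 5.0, 4.0],
    [0.3, 0.1, 0.2, 5.0, 6.0, 5.0],
]
W, H = nmf(V, k=2)          # columns of W play the role of metagenes
labels = assign_clusters(H)  # one cluster label per sample
```

Here the rows of `H` give each sample's expression of the two metagenes, so a dataset with many genes is summarized by `k` numbers per sample. Because the starting point is random, different seeds can give different factorizations, which is why the abstract pairs NMF with a model selection mechanism suited to stochastic clustering algorithms.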


Journal Article
TL;DR: GeneCluster 2.0 greatly expands the data analysis capabilities of GeneCluster 1.0 by adding classification, class discovery and permutation test methods.
Abstract: Summary: GeneCluster 2.0 is a software package for analyzing gene expression and other bioarray data, giving users a variety of methods to build and evaluate class predictors, visualize marker lists, cluster data and validate results. GeneCluster 2.0 greatly expands the data analysis capabilities of GeneCluster 1.0 by adding classification, class discovery and permutation test methods. It includes algorithms for building and testing supervised models using weighted voting and k-nearest neighbor algorithms, a module for systematically finding and evaluating clustering via self-organizing maps, and modules for marker gene selection and heat map visualization that allow users to view and sort samples and genes by many criteria. GeneCluster 2.0 is a standalone Java application and runs on any platform that supports the Java Runtime Environment version 1.3.1 or greater. Availability: http://www.broad.mit.edu/cancer/software

95 citations
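Of the supervised methods named in the abstract above, k-nearest neighbor is simple enough to sketch in a few lines. This is a generic illustration and not GeneCluster's implementation: the function name `knn_predict`, the Euclidean distance, the majority vote, and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train, labels, sample, k=3):
    """Predict the class of `sample` by majority vote among its k nearest
    training samples, using Euclidean distance (an assumption of this sketch)."""
    # Pair each training sample's distance with its label, nearest first.
    neighbors = sorted((math.dist(x, sample), y) for x, y in zip(train, labels))
    votes = Counter(y for _, y in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each row is one sample's expression of two marker genes.
train = [
    [1.0, 0.9], [0.8, 1.1], [1.2, 1.0],
    [5.0, 5.2], [4.8, 5.1], [5.1, 4.9],
]
labels = ["class_a", "class_a", "class_a", "class_b", "class_b", "class_b"]

prediction_low = knn_predict(train, labels, [1.1, 1.0])
prediction_high = knn_predict(train, labels, [5.0, 5.0])
```

In practice the input would be restricted to selected marker genes, and the predictor's accuracy would be checked on held-out samples, in line with the build-and-test workflow the abstract describes.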


Journal Article
TL;DR: Gene expression profiling predicts medulloblastoma outcome independent of clinical variables, and univariate analysis demonstrated expression profiles to be the only significant clinical prognostic factor.
Abstract: Purpose: Stratification of risk in patients with medulloblastoma remains a challenge. As clinical parameters have been proven insufficient for accurately defining disease risk, molecular markers have become the focus of interest. Outcome predictions on the basis of microarray gene expression profiles have been the most accurate to date. We ask in a multivariate model whether clinical parameters enhance survival predictions of gene expression profiles. Patients and Methods: In a cohort of 55 young patients (whose medulloblastoma samples have been analyzed previously for gene expression profile), associations between clinical and gene expression variables and survival were assessed using Cox proportional hazards models. Available clinical variables included age, stage (ie, the presence of disseminated disease at diagnosis), sex, histologic subtype, treatment, and status. Results: Univariate analysis demonstrated expression profiles to be the only significant clinical prognostic factor (P = .03). In multivariat...

75 citations


Patent
27 Aug 2004
TL;DR: An implementation of NMF functionality integrated into a relational database management system provides the capability to apply NMF to relational datasets and to sparse datasets, generating a plurality of data tables, each smaller than the multi-dimensional data table and having reduced dimensionality relative to it.
Abstract: An implementation of NMF functionality integrated into a relational database management system provides the capability to apply NMF to relational datasets and to sparse datasets. A database management system comprises a multi-dimensional data table operable to store data and a processing unit operable to perform non-negative matrix factorization on data stored in the multi-dimensional data table and to generate a plurality of data tables, each data table being smaller than the multi-dimensional data table and having reduced dimensionality relative to the multi-dimensional data table. The multi-dimensional data table may be a relational data table.

25 citations


Patent
27 Aug 2004
TL;DR: An implementation of NMF functionality integrated into a relational database management system provides the capability to apply NMF to relational datasets and to sparse datasets, generating a plurality of data tables, each smaller than the multi-dimensional data table and having reduced dimensionality relative to it.
Abstract: An implementation of NMF functionality integrated into a relational database management system provides the capability to apply NMF to relational datasets and to sparse datasets. A database management system comprises a multi-dimensional data table operable to store data and a processing unit operable to perform non-negative matrix factorization on data stored in the multi-dimensional data table and to generate a plurality of data tables, each data table being smaller than the multi-dimensional data table and having reduced dimensionality relative to the multi-dimensional data table. The multi-dimensional data table may be a relational data table.

15 citations