
Showing papers by "Todd R. Golub" published in 2003


Journal ArticleDOI
TL;DR: An analytical strategy is introduced, Gene Set Enrichment Analysis, designed to detect modest but coordinate changes in the expression of groups of functionally related genes, which identifies a set of genes involved in oxidative phosphorylation whose expression is coordinately decreased in human diabetic muscle.
Abstract: DNA microarrays can be used to identify gene expression changes characteristic of human disease. This is challenging, however, when relevant differences are subtle at the level of individual genes. We introduce an analytical strategy, Gene Set Enrichment Analysis, designed to detect modest but coordinate changes in the expression of groups of functionally related genes. Using this approach, we identify a set of genes involved in oxidative phosphorylation whose expression is coordinately decreased in human diabetic muscle. Expression of these genes is high at sites of insulin-mediated glucose disposal, activated by PGC-1α and correlated with total-body aerobic capacity. Our results associate this gene set with clinically important variation in human metabolism and illustrate the value of pathway relationships in the analysis of genomic profiling experiments.

7,997 citations
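
The abstract above does not spell out the statistic behind Gene Set Enrichment Analysis. As a rough illustration of how modest but coordinated shifts in a gene set can be scored, the sketch below walks down a ranked gene list and tracks a Kolmogorov-Smirnov-style running sum. The function name `enrichment_score`, the equal-weight step sizes, and the toy gene labels are assumptions made for illustration, not the published procedure.

```python
def enrichment_score(ranked_genes, gene_set):
    """Largest deviation from zero of a running sum over a ranked gene list.

    The sum steps up at genes in the set and down at all other genes, with
    step sizes chosen (as an assumption) so the walk ends at zero.
    """
    members = set(gene_set) & set(ranked_genes)
    n_total = len(ranked_genes)
    n_hit = len(members)
    if n_hit == 0 or n_hit == n_total:
        raise ValueError("gene set must contain some, but not all, ranked genes")

    hit_step = 1.0 / n_hit                 # increment at a set member
    miss_step = 1.0 / (n_total - n_hit)    # decrement at a non-member

    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += hit_step if gene in members else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Toy usage: genes ranked from most up-regulated to most down-regulated.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranked, {"g5", "g6"}))  # negative: set sits at the bottom
```

A strongly negative score means the set's genes cluster near the bottom of the ranking, even if no single gene stands out. In practice the score would be compared against scores obtained after permuting the sample labels.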


Journal ArticleDOI
TL;DR: It is found that solid tumors carrying the gene-expression signature were most likely to be associated with metastasis and poor clinical outcome, suggesting that the metastatic potential of human tumors is encoded in the bulk of a primary tumor, thus challenging the notion that metastases arise from rare cells within a primary tumor that have the ability to metastasize.
Abstract: Metastasis is the principal event leading to death in individuals with cancer, yet its molecular basis is poorly understood. To explore the molecular differences between human primary tumors and metastases, we compared the gene-expression profiles of adenocarcinoma metastases of multiple tumor types to unmatched primary adenocarcinomas. We found a gene-expression signature that distinguished primary from metastatic adenocarcinomas. More notably, we found that a subset of primary tumors resembled metastatic tumors with respect to this gene-expression signature. We confirmed this finding by applying the expression signature to data on 279 primary solid tumors of diverse types. We found that solid tumors carrying the gene-expression signature were most likely to be associated with metastasis and poor clinical outcome (P < 0.03). These results suggest that the metastatic potential of human tumors is encoded in the bulk of a primary tumor, thus challenging the notion that metastases arise from rare cells within a primary tumor that have the ability to metastasize.

2,434 citations


Journal ArticleDOI
TL;DR: A new methodology of class discovery and clustering validation tailored to the task of analyzing gene expression data is presented, and in conjunction with resampling techniques, it provides for a method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters.
Abstract: In this paper we present a new methodology of class discovery and clustering validation tailored to the task of analyzing gene expression data. The method can best be thought of as an analysis approach, to guide and assist in the use of any of a wide range of available clustering algorithms. We call the new methodology consensus clustering, and in conjunction with resampling techniques, it provides for a method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters. The method can also be used to represent the consensus over multiple runs of a clustering algorithm with random restart (such as K-means, model-based Bayesian clustering, SOM, etc.), so as to account for its sensitivity to the initial conditions. Finally, it provides for a visualization tool to inspect cluster number, membership, and boundaries. We present the results of our experiments on both simulated data and real gene expression data aimed at evaluating the effectiveness of the methodology in discovering biologically meaningful clusters.

1,831 citations
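
The consensus idea in the abstract above (resample, cluster each resample, then record how often pairs of items land together) can be sketched as follows. This is a minimal illustration, assuming plain k-means as the base algorithm and an 80% subsampling rate. The helper names `kmeans` and `consensus_matrix` and all parameter defaults are hypothetical choices, not taken from the paper.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means on tuples of floats; returns one label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        for c in range(k):
            group = [p for p, lab in zip(points, labels) if lab == c]
            if group:  # keep the old center if a cluster goes empty
                centers[c] = tuple(sum(col) / len(group) for col in zip(*group))
    return labels


def consensus_matrix(points, k, runs=50, frac=0.8, seed=0):
    """Entry (i, j): fraction of runs sampling both i and j that co-clustered them."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(runs):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
```

Entries near 0 or 1 indicate stable assignments. Many intermediate values suggest that the chosen number of clusters does not fit the data well, which is how such a matrix can help in judging cluster number and membership.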


Journal Article
TL;DR: It is suggested that class prediction models, based on defined molecular profiles, classify diagnostically challenging malignant gliomas in a manner that better correlates with clinical outcome than does standard pathology.
Abstract: In modern clinical neuro-oncology, histopathological diagnosis affects therapeutic decisions and prognostic estimation more than any other variable. Among high-grade gliomas, histologically classic glioblastomas and anaplastic oligodendrogliomas follow markedly different clinical courses. Unfortunately, many malignant gliomas are diagnostically challenging; these nonclassic lesions are difficult to classify by histological features, generating considerable interobserver variability and limited diagnostic reproducibility. The resulting tentative pathological diagnoses create significant clinical confusion. We investigated whether gene expression profiling, coupled with class prediction methodology, could be used to classify high-grade gliomas in a manner more objective, explicit, and consistent than standard pathology. Microarray analysis was used to determine the expression of ∼12,000 genes in a set of 50 gliomas, 28 glioblastomas and 22 anaplastic oligodendrogliomas. Supervised learning approaches were used to build a two-class prediction model based on a subset of 14 glioblastomas and 7 anaplastic oligodendrogliomas with classic histology. A 20-feature k-nearest neighbor model correctly classified 18 of the 21 classic cases in leave-one-out cross-validation when compared with pathological diagnoses. This model was then used to predict the classification of clinically common, histologically nonclassic samples. When tumors were classified according to pathology, the survival of patients with nonclassic glioblastoma and nonclassic anaplastic oligodendroglioma was not significantly different (P = 0.19). However, class distinctions according to the model were significantly associated with survival outcome (P = 0.05). This class prediction model was capable of classifying high-grade, nonclassic glial tumors objectively and reproducibly. Moreover, the model provided a more accurate predictor of prognosis in these nonclassic lesions than did pathological classification. These data suggest that class prediction models, based on defined molecular profiles, classify diagnostically challenging malignant gliomas in a manner that better correlates with clinical outcome than does standard pathology.

926 citations
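
The evaluation scheme named in the abstract above, a k-nearest neighbor classifier scored by leave-one-out cross-validation, can be sketched in a few lines. This is an illustrative sketch only: the function names, the Euclidean distance, and the default `k=3` are assumptions, and the feature-selection step that yields the 20-feature model is omitted. When features are selected from the data, that selection should be repeated inside each fold to avoid an optimistic bias.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training samples nearest to `query`.

    `train` is a list of (features, label) pairs; distance is squared Euclidean.
    """
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(samples, k=3):
    """Hold out each sample in turn, predict it from the rest, return accuracy."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        correct += knn_predict(rest, features, k) == label
    return correct / len(samples)
```

Each sample is predicted by a model that never saw it, so the resulting accuracy is a less biased estimate than accuracy measured on the training set itself.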



Journal ArticleDOI
08 Aug 2003-Cell
TL;DR: Patterns of gene expression in human tumors have been deconvoluted to reveal a mechanism of action for the cyclin D1 oncogene and this work demonstrates that tumor gene expression databases can be used to study the function of a human oncogene in situ.

429 citations



Journal ArticleDOI
TL;DR: The MPAKT model may be useful in studying the role of Akt in prostate epithelial cell transformation and in the discovery of molecular markers relevant to human disease.
Abstract: To determine whether Akt activation was sufficient for the transformation of normal prostate epithelial cells, murine prostate restricted Akt kinase activity was generated in transgenic mice (MPAKT mice). Akt expression led to p70S6K activation, prostatic intraepithelial neoplasia (PIN), and bladder obstruction. mRNA expression profiles from MPAKT ventral prostate revealed similarities to human cancer and an angiogenic signature that included three angiogenin family members, one of which was found elevated in the plasma of men with prostate cancer. Thus, the MPAKT model may be useful in studying the role of Akt in prostate epithelial cell transformation and in the discovery of molecular markers relevant to human disease.

288 citations


Journal ArticleDOI
TL;DR: A statistical methodology for estimating dataset size requirements for classifying microarray data using learning curves is introduced, based on fitting inverse power-law models to construct empirical learning curves.
Abstract: A statistical methodology for estimating dataset size requirements for classifying microarray data using learning curves is introduced. The goal is to use existing classification results to estimate dataset size requirements for future classification experiments and to evaluate the gain in accuracy and significance of classifiers built with additional data. The method is based on fitting inverse power-law models to construct empirical learning curves. It also includes a permutation test procedure to assess the statistical significance of classification performance for a given dataset size. This procedure is applied to several molecular classification problems representing a broad spectrum of levels of complexity.

274 citations
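
The learning-curve idea in the abstract above can be illustrated by fitting an inverse power law to error rates observed at several training-set sizes. The sketch assumes the common three-parameter form error(n) ≈ a · n^(−α) + b, where b is the asymptotic error. The function name `fit_power_law` and the grid scan over b are illustrative assumptions, not the fitting procedure used in the paper, and all error rates must be strictly positive.

```python
import math


def fit_power_law(sizes, errors, b_steps=200):
    """Fit errors ≈ a * n**(-alpha) + b; returns (a, alpha, b).

    Scans candidate asymptotes b in [0, min(errors)) and, for each, solves an
    ordinary least-squares line in log-log space for a and alpha.
    """
    x = [math.log(n) for n in sizes]
    mx = sum(x) / len(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    best = None
    for step in range(b_steps):
        b = min(errors) * step / b_steps       # always strictly below min(errors)
        y = [math.log(e - b) for e in errors]
        my = sum(y) / len(y)
        slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
        a, alpha = math.exp(my - slope * mx), -slope
        sse = sum((a * n ** -alpha + b - e) ** 2 for n, e in zip(sizes, errors))
        if best is None or sse < best[0]:
            best = (sse, a, alpha, b)
    return best[1:]
```

With fitted values in hand, `a * n ** -alpha + b` gives an extrapolated error rate for a larger dataset size n, which is the kind of estimate the abstract describes using to plan future experiments.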


01 Jan 2003
TL;DR: A new methodology of class discovery and clustering validation tailored to the task of analyzing gene expression data is presented and in conjunction with resampling techniques, it provides for a method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters.
Abstract: In this paper we present a new methodology of class discovery and clustering validation tailored to the task of analyzing gene expression data. The method can best be thought of as an analysis approach, to guide and assist in the use of any of a wide range of available clustering algorithms. We call the new methodology consensus clustering, and in conjunction with resampling techniques, it provides for a method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters. The method can also be used to represent the consensus over multiple runs of a clustering algorithm with random restart (such as K-means, model-based Bayesian clustering, SOM, etc.), so as to account for its sensitivity to the initial conditions. Finally, it provides for a visualization tool to inspect cluster number, membership, and boundaries. We present the results of our experiments on both simulated data and real gene expression data aimed at evaluating the effectiveness of the methodology in discovering biologically meaningful clusters.

118 citations


Journal ArticleDOI
TL;DR: Insights gained from characterization of MLL-rearranged human leukemias by genome-wide expression profiling are reviewed and compared to data from model systems.

Journal ArticleDOI
TL;DR: In this paper, a computational methodology for multiclass prediction that combines class-specific (one vs. all) binary support vector machines was proposed for the diagnosis of multiple common adult malignancies using DNA microarray data.
Abstract: Modern cancer treatment relies upon microscopic tissue examination to classify tumors according to anatomical site of origin. This approach is effective but subjective and variable even among experienced clinicians and pathologists. Recently, DNA microarray-generated gene expression data has been used to build molecular cancer classifiers. Previous work from our group and others demonstrated methods for solving pairwise classification problems using such global gene expression patterns. However, classification across multiple primary tumor classes poses new methodological and computational challenges. In this paper we describe a computational methodology for multiclass prediction that combines class-specific (one vs. all) binary support vector machines. We apply this methodology to the diagnosis of multiple common adult malignancies using DNA microarray data from a collection of 198 tumor samples, spanning 14 of the most common tumor types. Overall classification accuracy is 78%, far exceeding the expecte...
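
The one-vs-all combination described in the abstract above can be sketched independently of the binary learner: train one scorer per class against all other classes, then assign a sample to the class whose scorer is most confident. To keep the sketch dependency-free, a simple centroid-based linear scorer stands in for the support vector machine; that substitution, and the names `centroid_scorer`, `train_one_vs_all`, and `predict`, are assumptions for illustration only.

```python
def centroid_scorer(X, y):
    """Stand-in binary learner (NOT an SVM).

    Returns a linear scoring function that is positive on the side of the
    +1 class centroid and negative on the side of the -1 class centroid.
    """
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == -1]
    mean_pos = [sum(col) / len(pos) for col in zip(*pos)]
    mean_neg = [sum(col) / len(neg) for col in zip(*neg)]
    w = [p - n for p, n in zip(mean_pos, mean_neg)]
    b = -sum(wi * (p + n) / 2 for wi, p, n in zip(w, mean_pos, mean_neg))
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


def train_one_vs_all(X, labels, train_binary=centroid_scorer):
    """Train one binary scorer per class: that class (+1) versus all others (-1)."""
    return {
        c: train_binary(X, [1 if lab == c else -1 for lab in labels])
        for c in sorted(set(labels))
    }


def predict(scorers, x):
    """Pick the class whose one-vs-all scorer gives the highest score."""
    return max(scorers, key=lambda c: scorers[c](x))
```

Any binary learner that returns a real-valued confidence can be passed as `train_binary`, so a linear SVM could be swapped in without changing the combination logic.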

Journal ArticleDOI
TL;DR: A new study examining gene expression profiles could signal a shift in the approach to combination therapy, according to researchers at the Massachusetts General Hospital.
Abstract: The evaluation of drug combinations for cancer treatment has progressed slowly through methodical clinical research. A new study examining gene expression profiles could signal a shift in the approach to combination therapy.

Patent
06 Aug 2003
TL;DR: A Large Bayes classification framework for cross-platform and multiple-dataset classification is described. In one embodiment, the systems combine a Large Bayes classifier with a definition of combined relative features to represent the original values.
Abstract: Systems and methods for across platform and multiple dataset classification. In one embodiment the systems combine a Large Bayes classification framework, constructed from discovered itemsets or common patterns of data, with a definition of combined relative features to represent the original values. One realization of this method is that different datasets representing the same biological system display some amount of invariant biological characteristics independent of the idiosyncrasies of sample sources, preparation and the technological platform used to obtain the measurements. These invariant biological characteristics, when captured and exposed, can provide the basis to build robust, general and accurate classification models based on reproducible biological behavior.


Journal ArticleDOI
TL;DR: Reply to "Genomic analysis of primary tumors does not address the prevalence of metastatic cells in the population" and "Genetic background is an important determinant of metastatic potential"
Abstract: Reply to "Genomic analysis of primary tumors does not address the prevalence of metastatic cells in the population" and "Genetic background is an important determinant of metastatic potential"