Author
Pablo Tamayo
Other affiliations: University of California, Berkeley; Harvard University; Massachusetts Institute of Technology; ...
Bio: Pablo Tamayo is an academic researcher from the University of California, San Diego. The author has contributed to research on gene expression profiling and cancer. The author has an h-index of 72 and has co-authored 177 publications receiving 97,318 citations. Previous affiliations of Pablo Tamayo include the University of California, Berkeley and Harvard University.
Papers
TL;DR: The studies reveal that SNF5 is a key mediator of Hh signaling and that aberrant activation of GLI1 is a previously undescribed targetable mechanism contributing to the growth of MRT cells.
Abstract: Aberrant activation of the Hedgehog (Hh) pathway can drive tumorigenesis [1]. To investigate the mechanism by which glioma-associated oncogene family zinc finger-1 (GLI1), a crucial effector of Hh signaling [2], regulates Hh pathway activation, we searched for GLI1-interacting proteins. We report that the chromatin remodeling protein SNF5 (encoded by SMARCB1, hereafter called SNF5), which is inactivated in human malignant rhabdoid tumors (MRTs), interacts with GLI1. We show that Snf5 localizes to Gli1-regulated promoters and that loss of Snf5 leads to activation of the Hh-Gli pathway. Conversely, re-expression of SNF5 in MRT cells represses GLI1. Consistent with this, we show the presence of a Hh-Gli–activated gene expression profile in primary MRTs and show that GLI1 drives the growth of SNF5-deficient MRT cells in vitro and in vivo. Therefore, our studies reveal that SNF5 is a key mediator of Hh signaling and that aberrant activation of GLI1 is a previously undescribed targetable mechanism contributing to the growth of MRT cells.
223 citations
08 Apr 2000
TL;DR: A method for performing class prediction is described and illustrated by correctly classifying bone marrow and blood samples from acute leukemia patients, and it is demonstrated how this technique could have discovered the key distinctions among leukemias if they were not already known.
Abstract: Classification of patient samples is a crucial aspect of cancer diagnosis and treatment. We present a method for classifying samples by computational analysis of gene expression data. We consider the classification problem in two parts: class discovery and class prediction. Class discovery refers to the process of dividing samples into reproducible classes that have similar behavior or properties, while class prediction places new samples into already known classes. We describe a method for performing class prediction and illustrate its strength by correctly classifying bone marrow and blood samples from acute leukemia patients. We also describe how to use our predictor to validate newly discovered classes, and we demonstrate how this technique could have discovered the key distinctions among leukemias if they were not already known. This proof-of-concept experiment paves the way for a wealth of future work on the molecular classification and understanding of disease.
220 citations
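The abstract does not spell out how the predictor works. Predictors of this kind are commonly built from a per-gene signal-to-noise score combined with weighted voting, so the sketch below illustrates that idea under this assumption; it is not the authors' implementation. The function names, the top-k gene selection rule and the toy expression values are all hypothetical.

```python
from statistics import mean, stdev

def signal_to_noise(a, b):
    """Per-gene score (mu_a - mu_b) / (sigma_a + sigma_b); 0.0 if both classes are constant."""
    spread = stdev(a) + stdev(b)
    return (mean(a) - mean(b)) / spread if spread else 0.0

def train(samples_a, samples_b, k):
    """Keep the k genes with the largest |score| as (gene index, weight, decision boundary)."""
    model = []
    for g in range(len(samples_a[0])):
        a = [s[g] for s in samples_a]
        b = [s[g] for s in samples_b]
        model.append((g, signal_to_noise(a, b), (mean(a) + mean(b)) / 2))
    model.sort(key=lambda gene: abs(gene[1]), reverse=True)
    return model[:k]

def predict(model, sample):
    """Each gene votes weight * (x - boundary); positive votes favor class A."""
    votes = [w * (sample[g] - boundary) for g, w, boundary in model]
    v_a = sum(v for v in votes if v > 0)
    v_b = -sum(v for v in votes if v < 0)
    total = v_a + v_b
    strength = abs(v_a - v_b) / total if total else 0.0  # prediction strength in [0, 1]
    return ("A" if v_a > v_b else "B"), strength

# Hypothetical toy data: gene 0 is high in A, gene 1 is high in B, gene 2 is uninformative.
class_a = [[5.0, 1.0, 3.0], [6.0, 1.2, 2.9], [5.5, 0.8, 3.1]]
class_b = [[1.0, 4.0, 3.0], [1.5, 4.5, 3.2], [0.5, 4.2, 2.8]]
model = train(class_a, class_b, k=2)
print(predict(model, [5.2, 1.1, 3.0]))  # → ('A', 1.0)
```

The prediction strength reports how lopsided the vote was, which gives a natural way to withhold a call on ambiguous samples.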
TL;DR: A method called PRISM is reported that allows pooled screening of mixtures of cancer cell lines by labeling each cell line with 24-nucleotide barcodes and revealed the expected patterns of cell killing seen in conventional (unpooled) assays.
Abstract: Hundreds of genetically characterized cell lines are available for the discovery of genotype-specific cancer vulnerabilities. However, screening large numbers of compounds against large numbers of cell lines is currently impractical, and such experiments are often difficult to control. Here we report a method called PRISM that allows pooled screening of mixtures of cancer cell lines by labeling each cell line with 24-nucleotide barcodes. PRISM revealed the expected patterns of cell killing seen in conventional (unpooled) assays. In a screen of 102 cell lines across 8,400 compounds, PRISM led to the identification of BRD-7880 as a potent and highly specific inhibitor of aurora kinases B and C. Cell line pools also efficiently formed tumors as xenografts, and PRISM recapitulated the expected pattern of erlotinib sensitivity in vivo.
213 citations
TL;DR: This work evaluates 21 human genome-wide interaction networks for their ability to recover 446 disease gene sets identified through literature curation, gene expression profiling, or genome-wide association studies, and uses the results to create a parsimonious composite network with both high efficiency and high performance.
Abstract: Gene networks are rapidly growing in size and number, raising the question of which networks are most appropriate for particular applications. Here, we evaluate 21 human genome-wide interaction networks for their ability to recover 446 disease gene sets identified through literature curation, gene expression profiling, or genome-wide association studies. While all networks have some ability to recover disease genes, we observe a wide range of performance with STRING, ConsensusPathDB, and GIANT networks having the best performance overall. A general tendency is that performance scales with network size, suggesting that new interaction discovery currently outweighs the detrimental effects of false positives. Correcting for size, we find that the DIP network provides the highest efficiency (value per interaction). Based on these results, we create a parsimonious composite network with both high efficiency and performance. This work provides a benchmark for selection of molecular networks in human disease research.
206 citations
TL;DR: An alternative approach is proposed that first separates the HER2+ tumors using a gene amplification signal for Her2/neu amplicon genes and then applies consensus ensemble clustering separately to the HER2+ and HER2- clusters to look for further substructure.
Abstract: Gene expression analysis has identified biologically relevant subclasses of breast cancer. However, most classification schemes do not robustly cluster all HER2+ breast cancers, in part due to limitations and bias of clustering techniques used. In this article, we propose an alternative approach that first separates the HER2+ tumors using a gene amplification signal for Her2/neu amplicon genes and then applies consensus ensemble clustering separately to the HER2+ and HER2- clusters to look for further substructure. We applied this procedure to a microarray data set of 286 early-stage breast cancers treated only with surgery and radiation and identified two basal and four luminal subtypes in the HER2- tumors, as well as two novel and robust HER2+ subtypes. HER2+ subtypes had median distant metastasis-free survival of 99 months [95% confidence interval (95% CI), 83-118 months] and 33 months (95% CI, 11-54 months), respectively, and recurrence rates of 11% and 58%, respectively. The low recurrence subtype had a strong relative overexpression of lymphocyte-associated genes and was also associated with a prominent lymphocytic infiltration on histologic analysis. These data suggest that early-stage HER2+ cancers associated with lymphocytic infiltration are a biologically distinct subtype with an improved natural history.
201 citations
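The consensus clustering step can be illustrated with a consensus matrix: recluster many random subsamples and record how often each pair of samples lands in the same cluster. The sketch below assumes 80% subsampling without replacement and a caller-supplied base clusterer; `largest_gap_split` is a hypothetical toy stand-in for a real clustering algorithm, and none of these settings are taken from the paper.

```python
import random

def consensus_matrix(data, cluster_fn, n_resamples=50, frac=0.8, seed=0):
    """M[i][j] = fraction of resamples containing both i and j in which they co-cluster.

    cluster_fn(list_of_samples) must return one label per sample.
    """
    rng = random.Random(seed)
    n = len(data)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(2, int(frac * n))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = cluster_fn([data[i] for i in idx])
        for a in range(size):
            for b in range(size):
                i, j = idx[a], idx[b]
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def largest_gap_split(values):
    """Hypothetical toy 2-cluster rule for 1-D data: cut at the widest gap."""
    order = sorted(values)
    _, cut = max((order[k + 1] - order[k], k) for k in range(len(order) - 1))
    threshold = order[cut]
    return [0 if v <= threshold else 1 for v in values]

data = [0.1, 0.2, 0.15, 5.0, 5.2, 5.1]
M = consensus_matrix(data, largest_gap_split)
print([round(x, 2) for x in M[0]])
```

Entries near 1 or 0 indicate stable structure; intermediate values flag samples whose cluster membership depends on which other samples happened to be drawn.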
Cited by
TL;DR: The Gene Set Enrichment Analysis (GSEA) method described by the authors focuses on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
34,830 citations
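The core GSEA statistic is a running-sum enrichment score computed by walking down a ranked gene list. The sketch below implements a weighted running sum of that kind (genes in the set step the sum up in proportion to |score|^p, all other genes step it down uniformly) and returns its maximum signed deviation from zero. Permutation testing, normalization and multiple-testing correction are omitted; the function name and toy ranking are illustrative assumptions, and it assumes that, when p > 0, at least one gene in the set has a nonzero score.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """ranked: (gene, score) pairs sorted by score, descending. Returns the signed ES."""
    hits = [gene in gene_set for gene, _ in ranked]
    n, n_hit = len(ranked), sum(hits)
    if n_hit in (0, n):
        raise ValueError("gene set must contain some, but not all, ranked genes")
    # Hits step up in proportion to |score|**p; misses step down uniformly.
    hit_norm = sum(abs(score) ** p for (_, score), h in zip(ranked, hits) if h)
    miss_step = 1.0 / (n - n_hit)
    running, best = 0.0, 0.0
    for (_, score), h in zip(ranked, hits):
        running += abs(score) ** p / hit_norm if h else -miss_step
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero, with its sign
    return best

ranked = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", -1.0), ("e", -2.0), ("f", -3.0)]
print(enrichment_score(ranked, {"a", "b"}))  # set concentrated at the top: positive ES
```

Setting p=0 gives every set member the same step size, which turns the statistic into a Kolmogorov-Smirnov-style running sum.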
TL;DR: Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.
Abstract: Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to lay out and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.
32,980 citations
TL;DR: By following this protocol, investigators are able to gain an in-depth understanding of the biological themes in lists of genes that are enriched in genome-scale studies.
Abstract: DAVID bioinformatics resources consists of an integrated biological knowledgebase and analytic tools aimed at systematically extracting biological meaning from large gene/protein lists. This protocol explains how to use DAVID, a high-throughput and integrated data-mining environment, to analyze gene lists derived from high-throughput genomic experiments. The procedure first requires uploading a gene list containing any number of common gene identifiers followed by analysis using one or more text and pathway-mining tools such as gene functional classification, functional annotation chart or clustering and functional annotation table. By following this protocol, investigators are able to gain an in-depth understanding of the biological themes in lists of genes that are enriched in genome-scale studies.
31,015 citations
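Enrichment tools such as DAVID test whether a gene list contains more members of an annotation category than chance would predict. As a simplified sketch, the one-sided hypergeometric tail below (equivalent to a one-sided Fisher exact test) computes that probability. DAVID's own EASE score is a more conservative modification of the Fisher exact test and is not reproduced here; the function name and example counts are hypothetical.

```python
from math import comb

def enrichment_p_value(hits, list_size, category_size, population):
    """One-sided hypergeometric P(X >= hits).

    Probability of seeing at least `hits` category genes in a list of `list_size`
    genes drawn at random from `population` genes, `category_size` of which are
    annotated to the category.
    """
    total = comb(population, list_size)
    upper = min(list_size, category_size)
    tail = sum(comb(category_size, i) * comb(population - category_size, list_size - i)
               for i in range(hits, upper + 1))
    return tail / total

# Hypothetical counts: 12 of 200 listed genes fall in a 150-gene category out of 20,000.
print(enrichment_p_value(12, 200, 150, 20000))
```

Python's arbitrary-precision integers keep the binomial coefficients exact, so only the final division is rounded.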
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
22,147 citations
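The information borrowing mentioned above refers to limma's empirical Bayes moderation: each gene's residual variance is shrunk toward a prior value before a t-statistic is formed. The sketch below computes a two-group moderated t for a single gene from that formula. limma estimates the prior degrees of freedom d0 and prior variance s0² from all genes, whereas here they are passed in as assumed constants; the function name and data are hypothetical.

```python
from statistics import mean, variance

def moderated_t(group1, group2, d0, s0_sq):
    """Two-group moderated t for one gene, with the prior (d0, s0_sq) taken as given.

    s_tilde^2 = (d0 * s0^2 + d * s^2) / (d0 + d) and t = beta / (s_tilde * sqrt(v)),
    where beta is the mean difference and v = 1/n1 + 1/n2. Returns (t, degrees of freedom).
    """
    n1, n2 = len(group1), len(group2)
    d = n1 + n2 - 2  # residual degrees of freedom
    s_sq = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / d
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)  # shrink toward the prior variance
    beta = mean(group1) - mean(group2)
    v = 1 / n1 + 1 / n2
    return beta / (s_tilde_sq * v) ** 0.5, d0 + d

# A gene with a tiny sample variance: the prior pulls its t-statistic back toward zero.
print(moderated_t([1.00, 1.00, 1.01], [2.00, 2.00, 2.01], d0=0, s0_sq=0.1))
print(moderated_t([1.00, 1.00, 1.01], [2.00, 2.00, 2.01], d0=4, s0_sq=0.1))
```

With d0 = 0 the statistic reduces to the ordinary pooled two-sample t; larger d0 gives the prior more weight, which matters most when each group has only a few replicates.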
TL;DR: It is shown that the elastic net often outperforms the lasso while enjoying a similar sparsity of representation, and an algorithm called LARS-EN is proposed for computing elastic net regularization paths efficiently, much as the LARS algorithm does for the lasso.
Abstract: Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together. The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the
16,538 citations
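The paper computes elastic net paths with LARS-EN; the sketch below swaps in cyclic coordinate descent, a different but standard way to minimize a naive elastic net criterion, here scaled as (1/2n)·||y − Xβ||² + λ1·||β||₁ + (λ2/2)·||β||² (a different scaling from the paper's own formulation). The function names and fixed iteration count are assumptions, and the paper's rescaling of the naive estimate is not applied. The usage line illustrates the grouping effect on two identical predictors.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the lasso's proximal step)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net_cd(X, y, lam1, lam2, n_iter=200):
    """Cyclic coordinate descent for the naive elastic net; X is a list of rows."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta starting at zero
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            z = sum(c * c for c in col) / n
            # Correlation of column j with the residual that excludes beta_j's contribution.
            rho = sum(col[i] * (residual[i] + col[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam1) / (z + lam2)  # L1 shrink, then L2 shrink
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= col[i] * delta
                beta[j] = new
    return beta

X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]  # two identical predictors
y = [1.0, 2.0, 3.0]
print(elastic_net_cd(X, y, lam1=0.1, lam2=0.5))  # coefficients converge to (nearly) equal values
```

Because the λ2 term makes the criterion strictly convex, identical predictors must share the same coefficient at the optimum, which is the grouping effect described in the abstract.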