scispace - formally typeset
Search or ask a question
Author

Pablo Tamayo

Bio: Pablo Tamayo is an academic researcher from University of California, San Diego. The author has contributed to research in topics: Gene expression profiling & Cancer. The author has an hindex of 72, co-authored 177 publications receiving 97318 citations. Previous affiliations of Pablo Tamayo include University of California, Berkeley & Harvard University.


Papers
More filters
Journal Articleā€¢DOIā€¢
TL;DR: This work obtained 190 samples from 14 tumor classes and generated a combined expression dataset containing 16063 genes for each of those samples, and performed multi-class classification by combining the outputs of binary classifiers.
Abstract: Using gene expression data to classify tumor types is a very promising tool in cancer diagnosis. Previous works show several pairs of tumor types can be successfully distinguished by their gene expression patterns (Golub et al. 1999, Ben-Dor et al. 2000, Alizadeh et al. 2000). However, the simultaneous classification across a heterogeneous set of tumor types has not been well studied yet. We obtained 190 samples from 14 tumor classes and generated a combined expression dataset containing 16063 genes for each of those samples. We performed multi-class classification by combining the outputs of binary classifiers. Three binary classifiers (k-nearest neighbors, weighted voting, and support vector machines) were applied in conjunction with three combination scenarios (one-vs-all, all-pairs, hierarchical partitioning). We achieved the best cross validation error rate of 18.75% and the best test error rate of 21.74% by using the one-vs-all support vector machine algorithm. The results demonstrate the feasibility of performing clinically useful classification from samples of multiple tumor types.

305Ā citations

Journal Articleā€¢DOIā€¢
TL;DR: A statistical methodology for estimating dataset size requirements for classifying microarray data using learning curves is introduced, based on fitting inverse power-law models to construct empirical learning curves.
Abstract: A statistical methodology for estimating dataset size requirements for classifying microarray data using learning curves is introduced. The goal is to use existing classification results to estimate dataset size requirements for future classification experiments and to evaluate the gain in accuracy and significance of classifiers built with additional data. The method is based on fitting inverse power-law models to construct empirical learning curves. It also includes a permutation test procedure to assess the statistical significance of classification performance for a given dataset size. This procedure is applied to several molecular classification problems representing a broad spectrum of levels of complexity.

274Ā citations

Journal Articleā€¢DOIā€¢
TL;DR: The original version of this Data Descriptor contained a typographical error in the spelling of the author Terence C. Wong, which was incorrectly given as Terrence C Wong as discussed by the authors.
Abstract: Scientific Data 1:140035 doi: 10.1038/sdata.2014.35 (2014); Published 30 September 2014; Updated 11 November 2014 The original version of this Data Descriptor contained a typographical error in the spelling of the author Terence C. Wong, which was incorrectly given as Terrence C. Wong. This has now been corrected in the PDF and HTML versions of the Data Descriptor.

244Ā citations

Journal Articleā€¢DOIā€¢
13 Dec 2007-Nature
TL;DR: A large study of in vivo expression profiles of parasites derived directly from blood samples from infected patients reveals a previously unknown physiological diversity in the in vivo biology of the malaria parasite, and indicates in vivo and in vitro studies to determine how this variation may affect disease manifestations and treatment.
Abstract: Infection with the malaria parasite Plasmodium falciparum leads to widely different clinical conditions in children, ranging from mild flu-like symptoms to coma and death. Despite the immense medical implications, the genetic and molecular basis of this diversity remains largely unknown. Studies of in vitro gene expression have found few transcriptional differences between different parasite strains. Here we present a large study of in vivo expression profiles of parasites derived directly from blood samples from infected patients. The in vivo expression profiles define three distinct transcriptional states. The biological basis of these states can be interpreted by comparison with an extensive compendium of expression data in the yeast Saccharomyces cerevisiae. The three states in vivo closely resemble, first, active growth based on glycolytic metabolism, second, a starvation response accompanied by metabolism of alternative carbon sources, and third, an environmental stress response. The glycolytic state is highly similar to the known profile of the ring stage in vitro, but the other states have not been observed in vitro. The results reveal a previously unknown physiological diversity in the in vivo biology of the malaria parasite, in particular evidence for a functional mitochondrion in the asexual-stage parasite, and indicate in vivo and in vitro studies to determine how this variation may affect disease manifestations and treatment.

243Ā citations

Journal Articleā€¢DOIā€¢
19 Jan 2016-Immunity
TL;DR: Analysis using ImmuneSigDB identified signatures induced in activated myeloid cells and differentiating lymphocytes that were highly conserved between humans and mice and species-specific biological processes in the sepsis transcriptional response.

226Ā citations


Cited by
More filters
Journal Articleā€¢DOIā€¢
TL;DR: The Gene Set Enrichment Analysis (GSEA) method as discussed by the authors focuses on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation.
Abstract: Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.

34,830Ā citations

Journal Articleā€¢DOIā€¢
TL;DR: Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.
Abstract: Cytoscape is an open source software project for integrating biomolecular interaction networks with high-throughput expression data and other molecular states into a unified conceptual framework. Although applicable to any system of molecular components and interactions, Cytoscape is most powerful when used in conjunction with large databases of protein-protein, protein-DNA, and genetic interactions that are increasingly available for humans and model organisms. Cytoscape's software Core provides basic functionality to layout and query the network; to visually integrate the network with expression profiles, phenotypes, and other molecular states; and to link the network to databases of functional annotations. The Core is extensible through a straightforward plug-in architecture, allowing rapid development of additional computational analyses and features. Several case studies of Cytoscape plug-ins are surveyed, including a search for interaction pathways correlating with changes in gene expression, a study of protein complexes involved in cellular recovery to DNA damage, inference of a combined physical/functional interaction network for Halobacterium, and an interface to detailed stochastic/kinetic gene regulatory models.

32,980Ā citations

Journal Articleā€¢DOIā€¢
TL;DR: By following this protocol, investigators are able to gain an in-depth understanding of the biological themes in lists of genes that are enriched in genome-scale studies.
Abstract: DAVID bioinformatics resources consists of an integrated biological knowledgebase and analytic tools aimed at systematically extracting biological meaning from large gene/protein lists. This protocol explains how to use DAVID, a high-throughput and integrated data-mining environment, to analyze gene lists derived from high-throughput genomic experiments. The procedure first requires uploading a gene list containing any number of common gene identifiers followed by analysis using one or more text and pathway-mining tools such as gene functional classification, functional annotation chart or clustering and functional annotation table. By following this protocol, investigators are able to gain an in-depth understanding of the biological themes in lists of genes that are enriched in genome-scale studies.

31,015Ā citations

Journal Articleā€¢DOIā€¢
TL;DR: The philosophy and design of the limma package is reviewed, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
Abstract: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

22,147Ā citations

Journal Articleā€¢DOIā€¢
Hui Zou1, Trevor Hastie1ā€¢
TL;DR: It is shown that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation, and an algorithm called LARSā€EN is proposed for computing elastic net regularization paths efficiently, much like algorithm LARS does for the lamba.
Abstract: Summary. We propose the elastic net, a new regularization and variable selection method. Real world data and a simulation study show that the elastic net often outperforms the lasso, while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, where strongly correlated predictors tend to be in or out of the model together.The elastic net is particularly useful when the number of predictors (p) is much bigger than the number of observations (n). By contrast, the lasso is not a very satisfactory variable selection method in the

16,538Ā citations