Showing papers by "Pablo Tamayo" published in 2001


Journal ArticleDOI
TL;DR: The results demonstrate the feasibility of accurate, multiclass molecular cancer classification and suggest a strategy for future clinical implementation of molecular cancer diagnostics.
Abstract: The optimal treatment of patients with cancer depends on establishing accurate diagnoses by using a complex combination of clinical and histopathological data. In some instances, this task is difficult or impossible because of atypical clinical presentation or histopathology. To determine whether the diagnosis of multiple common adult malignancies could be achieved purely by molecular classification, we subjected 218 tumor samples, spanning 14 common tumor types, and 90 normal tissue samples to oligonucleotide microarray gene expression analysis. The expression levels of 16,063 genes and expressed sequence tags were used to evaluate the accuracy of a multiclass classifier based on a support vector machine algorithm. Overall classification accuracy was 78%, far exceeding the accuracy of random classification (9%). Poorly differentiated cancers resulted in low-confidence predictions and could not be accurately classified according to their tissue of origin, indicating that they are molecularly distinct entities with dramatically different gene expression patterns compared with their well differentiated counterparts. Taken together, these results demonstrate the feasibility of accurate, multiclass molecular cancer classification and suggest a strategy for future clinical implementation of molecular cancer diagnostics.

2,099 citations


Journal ArticleDOI
TL;DR: An algorithm for classifying cell line chemosensitivity from gene expression profiles alone is developed; the results suggest that, at least for a subset of compounds, genomic approaches to chemosensitivity prediction are feasible.
Abstract: In an effort to develop a genomics-based approach to the prediction of drug response, we have developed an algorithm for classification of cell line chemosensitivity based on gene expression profiles alone. Using oligonucleotide microarrays, the expression levels of 6,817 genes were measured in a panel of 60 human cancer cell lines (the NCI-60) for which the chemosensitivity profiles of thousands of chemical compounds have been determined. We sought to determine whether the gene expression signatures of untreated cells were sufficient for the prediction of chemosensitivity. Gene expression-based classifiers of sensitivity or resistance for 232 compounds were generated and then evaluated on independent sets of data. The classifiers were designed to be independent of the cells’ tissue of origin. The accuracy of chemosensitivity prediction was considerably better than would be expected by chance. Eighty-eight of 232 expression-based classifiers performed accurately (with P < 0.05) on an independent test set, whereas only 12 of the 232 would be expected to do so by chance. These results suggest that at least for a subset of compounds genomic approaches to chemosensitivity prediction are feasible.
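
The chance baseline quoted in the abstract follows from simple arithmetic: at a significance threshold of 0.05, about 232 × 0.05 ≈ 12 classifiers would pass by chance alone. The Python sketch below reproduces that figure and, under the added assumption that the 232 classifiers behave as independent trials (the abstract does not state this), estimates how unlikely 88 or more passes would be. The function name is illustrative.

```python
from math import comb

N_CLASSIFIERS = 232   # number of classifiers, from the abstract
ALPHA = 0.05          # significance threshold, from the abstract
N_ACCURATE = 88       # classifiers that performed accurately, from the abstract


def binomial_upper_tail(n: int, p: float, k_min: int) -> float:
    """Return P(X >= k_min) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Expected number of classifiers passing at P < 0.05 if none carried real signal.
expected_by_chance = N_CLASSIFIERS * ALPHA

# Assumption: classifiers are independent trials, which may not hold in practice.
tail = binomial_upper_tail(N_CLASSIFIERS, ALPHA, N_ACCURATE)

print(f"expected by chance: {expected_by_chance:.1f}")
print(f"P(at least {N_ACCURATE} pass by chance, assuming independence): {tail:.3g}")
```

If the compounds' sensitivity profiles are correlated, the independence assumption is too optimistic and the true tail probability would be larger than this estimate.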

668 citations


Patent
27 Sep 2001
TL;DR: An enterprise-wide web data mining system, computer program product, and method of operation thereof are described; the system uses Internet-based data sources and operates in an automated and cost-effective manner.
Abstract: An enterprise-wide web data mining system, computer program product, and method of operation thereof, that uses Internet based data sources, and which operates in an automated and cost effective manner. The enterprise web mining system comprises: a database coupled to a plurality of data sources, the database operable to store data collected from the data sources; a data mining engine coupled to the web server and the database, the data mining engine operable to generate a plurality of data mining models using the collected data; a server coupled to a network, the server operable to: receive a request for a prediction or recommendation over the network, generate a prediction or recommendation using the data mining models, and transmit the generated prediction or recommendation.

440 citations


Journal ArticleDOI
TL;DR: This work obtained 190 samples from 14 tumor classes and generated a combined expression dataset containing 16063 genes for each of those samples, and performed multi-class classification by combining the outputs of binary classifiers.
Abstract: Using gene expression data to classify tumor types is a very promising tool in cancer diagnosis. Previous work shows that several pairs of tumor types can be successfully distinguished by their gene expression patterns (Golub et al. 1999, Ben-Dor et al. 2000, Alizadeh et al. 2000). However, simultaneous classification across a heterogeneous set of tumor types has not yet been well studied. We obtained 190 samples from 14 tumor classes and generated a combined expression dataset containing 16063 genes for each of those samples. We performed multi-class classification by combining the outputs of binary classifiers. Three binary classifiers (k-nearest neighbors, weighted voting, and support vector machines) were applied in conjunction with three combination scenarios (one-vs-all, all-pairs, hierarchical partitioning). The one-vs-all support vector machine algorithm gave the best results: a cross-validation error rate of 18.75% and a test error rate of 21.74%. The results demonstrate the feasibility of performing clinically useful classification from samples of multiple tumor types.
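
The one-vs-all combination scheme named in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the binary classifier here is a hypothetical nearest-centroid scorer standing in for the support vector machine, the function names are invented for the example, and the two-feature data is a toy set rather than expression data.

```python
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Scorer = Callable[[Vector], float]


def train_centroid_scorer(pos: List[Vector], neg: List[Vector]) -> Scorer:
    """Hypothetical stand-in binary classifier (not an SVM).

    The score is positive when a sample lies closer to the centroid of the
    positive class than to the centroid of all other samples.
    """
    def centroid(rows: List[Vector]) -> List[float]:
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sqdist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: sqdist(x, c_neg) - sqdist(x, c_pos)


def train_one_vs_all(samples: List[Vector], labels: List[str]) -> Dict[str, Scorer]:
    """Train one binary scorer per class: that class versus all the rest."""
    scorers: Dict[str, Scorer] = {}
    for cls in sorted(set(labels)):
        pos = [s for s, l in zip(samples, labels) if l == cls]
        neg = [s for s, l in zip(samples, labels) if l != cls]
        scorers[cls] = train_centroid_scorer(pos, neg)
    return scorers


def predict(scorers: Dict[str, Scorer], x: Vector) -> str:
    """Combine the binary outputs by picking the class with the highest score."""
    return max(scorers, key=lambda cls: scorers[cls](x))


# Toy data: three classes, two features each (illustrative only).
samples = [[0, 0], [0, 1], [5, 5], [5, 6], [10, 0], [10, 1]]
labels = ["A", "A", "B", "B", "C", "C"]
scorers = train_one_vs_all(samples, labels)
print(predict(scorers, [0.2, 0.5]), predict(scorers, [5, 5]), predict(scorers, [9, 0]))
```

The all-pairs scenario would instead train one scorer per pair of classes and combine them by voting; only the combination step changes, not the binary classifier.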

305 citations


Journal ArticleDOI
TL;DR: Expression profile analysis revealed the sequential induction of groups of functionally similar genes, whose temporal coregulation underscores known cellular events during T cell activation, which may prove useful for comparisons of lymphocyte responses under experimental conditions and in disease states.

32 citations


Patent
19 Sep 2001
TL;DR: Sets of genetic markers for specific tumor classes are described, along with methods of identifying a biological sample based on these markers and diagnostic, prognostic, and therapeutic screening uses for them.
Abstract: Sets of genetic markers for specific tumor classes are described, as well as methods of identifying a biological sample based on these markers. Also described are diagnostic, prognostic, and therapeutic screening uses for these markers, as well as oligonucleotide arrays comprising these markers.

32 citations


Patent
01 Oct 2001
TL;DR: Methods and apparatus for classifying or predicting the classes of samples based on gene expression are described, along with methods and apparatus for ascertaining or discovering new, previously unknown classes based on gene expression.
Abstract: Methods and apparatus for classifying or predicting the classes for samples based on gene expression are described. Also described are methods and apparatus for ascertaining or discovering new, previously unknown classes based on gene expression. Methods, computer systems and apparatus for classifying or predicting whether a sample is treatment sensitive (e.g., chemosensitive) or treatment resistant (e.g., chemoresistant) are also provided. Classification occurs based on analysis of gene expression data from samples that have been subjected to one or more compounds.

15 citations


Patent
23 Jan 2001
TL;DR: A method for identifying unknown classes based on gene expression is proposed: genes are sorted by the degree to which their expression in a sample correlates with a class distinction, the correlation is tested against what could arise by chance, and a set of informative genes is identified.
Abstract: PROBLEM TO BE SOLVED: To provide a method for identifying an unknown class based on gene expression by sorting genes according to the degree to which their expression in a sample correlates with a class distinction, determining whether the correlation is stronger than could be obtained by chance, and then identifying a set of genes. SOLUTION: This method comprises the steps of sorting genes according to the degree to which their expression in a sample correlates with a class distinction, such as a known class distinction or a disease class distinction (for example, a cancer class distinction); determining whether or not the correlation is stronger than could be obtained by chance, wherein a gene whose expression correlates with the class distinction more strongly than could be obtained by chance is an informative gene; and identifying a set of informative genes. The cancer class distinction is selected from a leukemia class distinction, an intracranial tumor class distinction, and a lymphoma class distinction. Examples of informative genes include C-myb and proteasome iota.
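
The chance test described in this abstract can be sketched as an exact permutation test. This is a minimal Python sketch under stated assumptions: the score (absolute difference of class means) and the function names are illustrative choices, since the abstract does not specify the statistic, and the expression values are toy numbers.

```python
from itertools import combinations
from statistics import mean
from typing import FrozenSet, Sequence


def abs_mean_diff(values: Sequence[float], class1_idx: FrozenSet[int]) -> float:
    """Illustrative correlation score: |mean(class 1) - mean(class 0)|."""
    in1 = [v for i, v in enumerate(values) if i in class1_idx]
    in0 = [v for i, v in enumerate(values) if i not in class1_idx]
    return abs(mean(in1) - mean(in0))


def permutation_p_value(values: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of all relabelings whose score is at least the observed score.

    Every way of assigning the same number of samples to class 1 is
    enumerated, so the result is exact (feasible only for small sample counts).
    """
    true_idx = frozenset(i for i, l in enumerate(labels) if l == 1)
    observed = abs_mean_diff(values, true_idx)
    hits = total = 0
    for idx in combinations(range(len(values)), len(true_idx)):
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs_mean_diff(values, frozenset(idx)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Toy data: 6 samples per class (illustrative values, not real measurements).
labels = [0] * 6 + [1] * 6
informative_gene = [1, 2, 1, 2, 1, 2, 9, 8, 9, 8, 9, 8]
uninformative_gene = [1, 9, 2, 8, 1, 9, 9, 1, 8, 2, 9, 1]

p_informative = permutation_p_value(informative_gene, labels)
p_uninformative = permutation_p_value(uninformative_gene, labels)
print(p_informative, p_uninformative)
```

Genes whose p-value falls below a chosen threshold would be kept as informative; with thousands of genes, a correction for multiple testing would also be needed.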

6 citations


Patent
27 Sep 2001
TL;DR: An enterprise-wide web data mining system, computer program product, and method of operation thereof are presented; the system uses Internet data sources and operates in an automated and cost-effective manner.
Abstract: The present invention relates to an enterprise-wide web data mining system, a computer program product, and a method of operation thereof, which uses Internet data sources and operates in an automated and cost-effective manner. The enterprise web mining system comprises: a database coupled to a plurality of data sources, the database operable to store data collected from the data sources; a data mining engine coupled to the web server and the database, the data mining engine operable to generate a plurality of data mining models using the collected data; and a server connected to the network, the server operable to: receive a request for a prediction or recommendation over the network, generate a prediction or recommendation using the data mining models, and transmit the generated prediction or recommendation.