Journal Article

Multiclass cancer diagnosis using tumor gene expression signatures

TL;DR: The results demonstrate the feasibility of accurate, multiclass molecular cancer classification and suggest a strategy for future clinical implementation of molecular cancer diagnostics.
Abstract: The optimal treatment of patients with cancer depends on establishing accurate diagnoses by using a complex combination of clinical and histopathological data. In some instances, this task is difficult or impossible because of atypical clinical presentation or histopathology. To determine whether the diagnosis of multiple common adult malignancies could be achieved purely by molecular classification, we subjected 218 tumor samples, spanning 14 common tumor types, and 90 normal tissue samples to oligonucleotide microarray gene expression analysis. The expression levels of 16,063 genes and expressed sequence tags were used to evaluate the accuracy of a multiclass classifier based on a support vector machine algorithm. Overall classification accuracy was 78%, far exceeding the accuracy of random classification (9%). Poorly differentiated cancers resulted in low-confidence predictions and could not be accurately classified according to their tissue of origin, indicating that they are molecularly distinct entities with dramatically different gene expression patterns compared with their well differentiated counterparts. Taken together, these results demonstrate the feasibility of accurate, multiclass molecular cancer classification and suggest a strategy for future clinical implementation of molecular cancer diagnostics.

Citations
Journal Article
TL;DR: In comparative timings, the new algorithms are considerably faster than competing methods; they can handle large problems and deal efficiently with sparse features.
Abstract: We develop fast algorithms for estimation of generalized linear models with convex penalties. The models include linear regression, two-class logistic regression, and multinomial regression problems while the penalties include ℓ1 (the lasso), ℓ2 (ridge regression) and mixtures of the two (the elastic net). The algorithms use cyclical coordinate descent, computed along a regularization path. The methods can handle large problems and can also deal efficiently with sparse features. In comparative timings we find that the new algorithms are considerably faster than competing methods.
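
The idea of cyclical coordinate descent along a regularization path can be illustrated with a minimal sketch for the lasso case only. This is not the authors' implementation: the function names, the objective scaling (1/2n), the absence of an intercept and of feature standardization, and the stopping rule are all assumptions made for illustration.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, beta=None, n_sweeps=200, tol=1e-10):
    """Minimise (1/2n)*||y - X beta||^2 + lam*||beta||_1 (assumed scaling).

    X is a list of n rows with p features each; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = list(beta) if beta is not None else [0.0] * p
    # Residual r = y - X beta, kept up to date after every coordinate update.
    r = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):  # cycle through the coordinates in order
            if col_sq[j] == 0.0:
                continue
            # Inner product of feature j with the partial residual
            # (the residual with feature j's own contribution added back).
            rho = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                beta[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def lasso_path(X, y, lambdas):
    """Fit along a decreasing sequence of penalties, warm-starting each fit."""
    beta, path = None, []
    for lam in sorted(lambdas, reverse=True):
        beta = lasso_coordinate_descent(X, y, lam, beta)
        path.append((lam, list(beta)))
    return path
```

Warm-starting each fit from the previous solution is what makes computing a whole path cheap in this sketch: neighbouring penalties usually give similar coefficients, so few sweeps are needed per step.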

13,656 citations


Cites background from "Multiclass cancer diagnosis using t..."

  • ...Cancer [Ramaswamy et al., 2001]: gene-expression data with 14 cancer classes...


Journal Article
09 Jun 2005-Nature
TL;DR: A new, bead-based flow cytometric miRNA expression profiling method is used to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers, and finds the miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours.
Abstract: Recent work has revealed the existence of a class of small non-coding RNA species, known as microRNAs (miRNAs), which have critical functions across various biological processes. Here we use a new, bead-based flow cytometric miRNA expression profiling method to present a systematic expression analysis of 217 mammalian miRNAs from 334 samples, including multiple human cancers. The miRNA profiles are surprisingly informative, reflecting the developmental lineage and differentiation state of the tumours. We observe a general downregulation of miRNAs in tumours compared with normal tissues. Furthermore, we were able to successfully classify poorly differentiated tumours using miRNA expression profiles, whereas messenger RNA profiles were highly inaccurate when applied to the same samples. These findings highlight the potential of miRNA profiling in cancer diagnosis.

9,470 citations

01 Aug 2000
TL;DR: A bioentrepreneur course on assessing medical technology in the context of commercialization, covering many issues unique to biomedical products.
Abstract: BIOE 402. Medical Technology Assessment. 2 or 3 hours. Bioentrepreneur course. Assessment of medical technology in the context of commercialization. Objectives, competition, market share, funding, pricing, manufacturing, growth, and intellectual property; many issues unique to biomedical products. Course Information: 2 undergraduate hours. 3 graduate hours. Prerequisite(s): Junior standing or above and consent of the instructor.

4,833 citations

Journal Article
TL;DR: Support vector machines are becoming popular in a wide variety of biological applications, but how do they work and what are their most promising applications in the life sciences?
Abstract: Support vector machines (SVMs) are becoming popular in a wide variety of biological applications. But, what exactly are SVMs and how do they work? And what are their most promising applications in the life sciences?

3,801 citations


Cites background from "Multiclass cancer diagnosis using t..."

  • ...Essentially, to recognize three classes, A, B and C, we simply have to train three separate SVMs to answer the binary questions, “Is it A?,” “Is it B?” and “Is it C?” This simple approach actually works quite well for cancer classificatio...
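
The one-versus-all scheme described in this excerpt can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any paper listed here: the linear SVM is fitted by plain subgradient descent on the regularised hinge loss, and the function names, hyperparameters (`lam`, `lr`, `epochs`), and the rule of picking the class with the largest decision value are all hypothetical choices.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit (w, b) for labels y in {-1, +1} by subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 penalty
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # sample is inside the margin: hinge is active
                for j in range(p):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_one_vs_rest(X, labels, **svm_kwargs):
    """Train one binary SVM per class, answering "Is it class c?"."""
    models = {}
    for c in sorted(set(labels)):
        y = [1.0 if lab == c else -1.0 for lab in labels]
        models[c] = train_linear_svm(X, y, **svm_kwargs)
    return models


def decision_value(model, x):
    """Signed score w.x + b of one binary classifier."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def predict(models, x):
    """Assign the class whose binary classifier is most confident."""
    return max(models, key=lambda c: decision_value(models[c], x))


# Toy data: three classes A, B, C, each with one highly expressed "gene".
X = [[5, 1, 1], [6, 0, 1], [1, 5, 0], [0, 6, 1], [1, 0, 5], [0, 1, 6]]
labels = ["A", "A", "B", "B", "C", "C"]
models = train_one_vs_rest(X, labels)
print(predict(models, [5.5, 0.5, 1.0]))
```

Taking the largest decision value resolves the cases where several binary classifiers say "yes" or none does; the size of that value can also serve as a rough confidence score.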


Journal Article
TL;DR: ONCOMINE, a cancer microarray database and web-based data-mining platform, is presented; it is aimed at facilitating discovery from genome-wide expression analyses, including the discovery of novel biomarkers and therapeutic targets.

3,244 citations


Cites background from "Multiclass cancer diagnosis using t..."

  • ...published a report on multicancer type classification highlighting a focused gene set that can accurately classify tumor types of different origin [16]....


References
Journal Article
01 Jan 1973
TL;DR: In this paper, a six-step framework for organizing and discussing multivariate data analysis techniques with flowcharts for each is presented, focusing on the use of each technique, rather than its mathematical derivation.
Abstract: Offers an applications-oriented approach to multivariate data analysis, focusing on the use of each technique, rather than its mathematical derivation. The text introduces a six-step framework for organizing and discussing techniques with flowcharts for each. Well-suited for the non-statistician, this applications-oriented introduction to multivariate analysis focuses on the fundamental concepts that affect the use of specific techniques rather than the mathematical derivation of the technique. Provides an overview of several techniques and approaches that are available to analysts today - e.g., data warehousing and data mining, neural networks and resampling/bootstrapping. Chapters are organized to provide a practical, logical progression of the phases of analysis and to group similar types of techniques applicable to most situations. Table of Contents 1. Introduction. I. PREPARING FOR A MULTIVARIATE ANALYSIS. 2. Examining Your Data. 3. Factor Analysis. II. DEPENDENCE TECHNIQUES. 4. Multiple Regression. 5. Multiple Discriminant Analysis and Logistic Regression. 6. Multivariate Analysis of Variance. 7. Conjoint Analysis. 8. Canonical Correlation Analysis. III. INTERDEPENDENCE TECHNIQUES. 9. Cluster Analysis. 10. Multidimensional Scaling. IV. ADVANCED AND EMERGING TECHNIQUES. 11. Structural Equation Modeling. 12. Emerging Techniques in Multivariate Analysis. Appendix A: Applications of Multivariate Data Analysis. Index.

37,124 citations

01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimates from small data pools, applying these estimations to real-life problems, and much more.

26,531 citations

Journal Article
TL;DR: This text covers multivariate analysis techniques, from examining data and factor analysis through dependence and interdependence techniques to structural equation modeling (SEM), confirmatory factor analysis (CFA), and testing a structural model.
Abstract: Table of contents: I. Introduction: 1. Introduction. II. Preparing for a MV Analysis: 2. Examining Your Data; 3. Factor Analysis. III. Dependence Techniques: 4. Multiple Regression Analysis; 5. Multiple Discriminant Analysis and Logistic Regression; 6. Multivariate Analysis of Variance; 7. Conjoint Analysis. IV. Interdependence Techniques: 8. Cluster Analysis; 9. Multidimensional Scaling and Correspondence Analysis. V. Moving Beyond the Basic Techniques: 10. Structural Equation Modeling: Overview; 10a. Appendix: SEM; 11. CFA: Confirmatory Factor Analysis; 11a. Appendix: CFA; 12. SEM: Testing a Structural Model; 12a. Appendix: SEM. Appendix A: Basic Stats.

23,353 citations

Journal Article
TL;DR: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function.
Abstract: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression. The output is displayed graphically, conveying the clustering and the underlying expression data simultaneously in a form intuitive for biologists. We have found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function, and we find a similar tendency in human data. Thus patterns seen in genome-wide expression experiments can be interpreted as indications of the status of cellular processes. Also, coexpression of genes of known function with poorly characterized or novel genes may provide a simple means of gaining leads to the functions of many genes for which information is not available currently.
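
Arranging genes by similarity of expression pattern can be sketched with a small agglomerative clustering routine. The abstract only says "standard statistical algorithms"; the use of Pearson correlation as the similarity measure and average linkage as the merge rule here is an assumption, as are the function names and the toy profiles.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def average_linkage(profiles):
    """Repeatedly merge the two most similar clusters.

    profiles maps a gene name to its expression values across samples.
    Returns the merges in order as (members_a, members_b, similarity),
    where similarity is the mean pairwise correlation between the clusters.
    """
    clusters = [[name] for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sims = [pearson(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)
                if best is None or score > best[0]:
                    best = (score, i, j)
        score, i, j = best
        merges.append((clusters[i], clusters[j], score))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical profiles: g1/g2 rise across samples, g3/g4 fall.
profiles = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.5],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.5, 2.0],
}
for members_a, members_b, similarity in average_linkage(profiles):
    print(members_a, members_b, round(similarity, 3))
```

The sequence of merges defines a tree, and ordering genes by that tree is what places co-expressed genes next to each other in a graphical display.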

16,371 citations

Journal Article
17 Aug 2000-Nature
TL;DR: Variation in gene expression patterns in a set of 65 surgical specimens of human breast tumours from 42 different individuals was characterized using complementary DNA microarrays representing 8,102 human genes, providing a distinctive molecular portrait of each tumour.
Abstract: Human breast tumours are diverse in their natural history and in their responsiveness to treatments. Variation in transcriptional programs accounts for much of the biological diversity of human cells and tumours. In each cell, signal transduction and regulatory systems transduce information from the cell's identity to its environmental status, thereby controlling the level of expression of every gene in the genome. Here we have characterized variation in gene expression patterns in a set of 65 surgical specimens of human breast tumours from 42 different individuals, using complementary DNA microarrays representing 8,102 human genes. These patterns provided a distinctive molecular portrait of each tumour. Twenty of the tumours were sampled twice, before and after a 16-week course of doxorubicin chemotherapy, and two tumours were paired with a lymph node metastasis from the same patient. Gene expression patterns in two tumour samples from the same individual were almost always more similar to each other than either was to any other sample. Sets of co-expressed genes were identified for which variation in messenger RNA levels could be related to specific features of physiological variation. The tumours could be classified into subtypes distinguished by pervasive differences in their gene expression patterns.

14,768 citations
