Journal ArticleDOI

Predicting Outcomes of Nonsmall Cell Lung Cancer Using CT Image Features

TL;DR: Focusing on cases of the adenocarcinoma nonsmall cell lung cancer tumor subtype from a larger data set, it is shown that classifiers can be built to predict survival time, the first known result to make such predictions from CT scans of lung cancer.
Abstract: Nonsmall cell lung cancer is a prevalent disease. It is diagnosed and treated with the help of computed tomography (CT) scans. In this paper, we apply radiomics to select 3-D features from CT images of the lung toward providing prognostic information. Focusing on cases of the adenocarcinoma nonsmall cell lung cancer tumor subtype from a larger data set, we show that classifiers can be built to predict survival time. This is the first known result to make such predictions from CT scans of lung cancer. We compare classifiers and feature selection approaches. The best accuracy when predicting survival was 77.5% using a decision tree in a leave-one-out cross validation and was obtained after selecting five features per fold from 219.
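The evaluation protocol described above, leave-one-out cross validation with five features re-selected inside each fold, can be sketched as follows. This is an illustration on synthetic data using scikit-learn stand-ins; the paper itself used 219 radiomic features from CT scans and Weka classifiers, so every name and number below other than the fold structure is a placeholder.

```python
# Sketch of per-fold feature selection inside leave-one-out CV
# (synthetic data; scikit-learn stand-ins for the paper's tools).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 219))      # 40 cases, 219 candidate features
y = rng.integers(0, 2, size=40)     # binary survival-time label

correct = 0
for train_idx, test_idx in LeaveOneOut().split(X):
    # Selecting features inside each fold keeps the held-out case
    # from influencing feature selection, as in the paper's setup.
    model = make_pipeline(SelectKBest(f_classif, k=5),
                          DecisionTreeClassifier(random_state=0))
    model.fit(X[train_idx], y[train_idx])
    correct += int(model.predict(X[test_idx])[0] == y[test_idx][0])

accuracy = correct / len(X)
```

On random labels the accuracy hovers near chance; the point is only that the selection step sits inside the fold loop, not before it.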
Citations
Journal ArticleDOI
TL;DR: Identification of optimal machine-learning methods for radiomic applications is a crucial step towards stable and clinically relevant radiomic biomarkers, providing a non-invasive way of quantifying and monitoring tumor-phenotypic characteristics in clinical practice.
Abstract: Radiomics extracts and mines large numbers of medical imaging features quantifying tumor phenotypic characteristics. Highly accurate and reliable machine-learning approaches can drive the success of radiomic applications in clinical care. In this radiomic study, fourteen feature selection methods and twelve classification methods were examined in terms of their performance and stability for predicting overall survival. A total of 440 radiomic features were extracted from pre-treatment computed tomography (CT) images of 464 lung cancer patients. To ensure the unbiased evaluation of different machine-learning methods, publicly available implementations along with reported parameter configurations were used. Furthermore, we used two independent radiomic cohorts for training (n = 310 patients) and validation (n = 154 patients). We identified that the Wilcoxon test based feature selection method WLCX (stability = 0.84 ± 0.05, AUC = 0.65 ± 0.02) and the classification method random forest RF (RSD = 3.52%, AUC = 0.66 ± 0.03) had the highest prognostic performance with high stability against data perturbation. Our variability analysis indicated that the choice of classification method is the most dominant source of performance variation (34.21% of total variance). Identification of optimal machine-learning methods for radiomic applications is a crucial step towards stable and clinically relevant radiomic biomarkers, providing a non-invasive way of quantifying and monitoring tumor-phenotypic characteristics in clinical practice.
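A minimal sketch of the winning combination this abstract reports, Wilcoxon-test feature ranking followed by a random forest, assuming synthetic data in place of the 440 radiomic features. The cohort sizes and ranking statistic follow the abstract; the cutoff of 30 features and all other parameters are illustrative.

```python
# Wilcoxon rank-sum (WLCX) feature ranking + random forest, on
# synthetic data shaped like the study's cohorts (310 train / 154 valid).
import numpy as np
from scipy.stats import ranksums
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X_train = rng.normal(size=(310, 440))
y_train = rng.integers(0, 2, size=310)   # survival label
X_valid = rng.normal(size=(154, 440))
y_valid = rng.integers(0, 2, size=154)

# Rank each feature by the Wilcoxon rank-sum p-value between groups.
pvals = np.array([ranksums(X_train[y_train == 0, j],
                           X_train[y_train == 1, j]).pvalue
                  for j in range(X_train.shape[1])])
top = np.argsort(pvals)[:30]             # keep the 30 smallest p-values

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train[:, top], y_train)
auc = roc_auc_score(y_valid, rf.predict_proba(X_valid[:, top])[:, 1])
```

Note that, as in the study's protocol, validation data play no role in the ranking step.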

749 citations

Journal ArticleDOI
TL;DR: The survey provides an overview of deep learning and the popular architectures used for cancer detection and diagnosis, presenting four in detail: convolutional neural networks, fully convolutional networks, auto-encoders, and deep belief networks.

356 citations

Journal ArticleDOI
TL;DR: This study identified prognostic and reliable machine-learning methods for predicting overall survival of head and neck cancer patients, pointing to optimal methods for radiomics-based prognostic analyses.
Abstract: Introduction: “Radiomics” extracts and mines large numbers of medical imaging features in a non-invasive and cost-effective way. The underlying assumption of radiomics is that these imaging features quantify phenotypic characteristics of the entire tumor. In order to enhance the applicability of radiomics in clinical oncology, highly accurate and reliable machine-learning approaches are required. In this radiomic study, thirteen feature selection methods and eleven machine-learning classification methods were evaluated in terms of their performance and stability for predicting overall survival in head and neck cancer patients. Methods: Two independent head and neck cancer cohorts were investigated. Training cohort HN1 consisted of 101 HNSCC patients. Cohort HN2 (n = 95) was used for validation. A total of 440 radiomic features were extracted from the segmented tumor regions in CT images. Feature selection and classification methods were compared using an unbiased evaluation framework. Results: We observed that the three feature selection methods MRMR (AUC = 0.69, Stability = 0.66), MIFS (AUC = 0.66, Stability = 0.69), and CIFE (AUC = 0.68, Stability = 0.7) had high prognostic performance and stability. The three classifiers BY (AUC = 0.67, RSD = 11.28), RF (AUC = 0.61, RSD = 7.36), and NN (AUC = 0.62, RSD = 10.52) also showed high prognostic performance and stability. Analysis investigating performance variability indicated that the choice of classification method is the major factor driving the performance variation (29.02% of total variance). Conclusions: Our study identified prognostic and reliable machine-learning methods for the prediction of overall survival of head and neck cancer patients. Identification of optimal machine-learning methods for radiomics-based prognostic analyses could broaden the scope of radiomics in precision oncology and cancer care.
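Both this abstract and the previous one report classifier stability as RSD. Assuming RSD here means the relative standard deviation of AUC across resamplings, it is a one-line computation; the AUC values below are made up for illustration.

```python
# Relative standard deviation (RSD, in percent) of AUC values
# collected across repeated resamplings (hypothetical numbers).
import statistics

aucs = [0.66, 0.64, 0.69, 0.61, 0.67]   # AUCs from 5 resamplings
rsd = 100 * statistics.stdev(aucs) / statistics.mean(aucs)
```

A lower RSD means the classifier's performance varies less under data perturbation, which is why it is paired with AUC when ranking methods.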

299 citations

Journal ArticleDOI
23 Aug 2019 - Cancers
TL;DR: This bibliographic review aims to give researchers setting out to implement deep learning and artificial neural networks for cancer diagnosis a from-scratch view of state-of-the-art achievements.
Abstract: In this paper, we first describe the basics of the field of cancer diagnosis, which includes the steps of cancer diagnosis followed by the typical classification methods used by doctors, providing readers with a historical view of cancer classification techniques. These methods include the Asymmetry, Border, Color and Diameter (ABCD) method, the seven-point detection method, the Menzies method, and pattern analysis. They are used regularly by doctors for cancer diagnosis, although they are not considered very efficient for obtaining better performance. Moreover, considering all types of audience, the basic evaluation criteria are also discussed. The criteria include the receiver operating characteristic curve (ROC curve), area under the ROC curve (AUC), F1 score, accuracy, specificity, sensitivity, precision, Dice coefficient, average accuracy, and Jaccard index. Previously used methods are considered inefficient, calling for better and smarter methods for cancer diagnosis. Artificial intelligence and cancer diagnosis are gaining attention as a way to define better diagnostic tools. In particular, deep neural networks can be successfully used for intelligent image analysis. The basic framework of how this machine learning works on medical imaging is provided in this study, i.e., pre-processing, image segmentation and post-processing. The second part of this manuscript describes the different deep learning techniques, such as convolutional neural networks (CNNs), generative adversarial models (GANs), deep autoencoders (DANs), restricted Boltzmann machine (RBM), stacked autoencoders (SAE), convolutional autoencoders (CAE), recurrent neural networks (RNNs), long short-term memory (LSTM), multi-scale convolutional neural network (M-CNN), and multi-instance learning convolutional neural network (MIL-CNN). For each technique, we provide Python codes, to allow interested readers to experiment with the cited algorithms on their own diagnostic problems.
The third part of this manuscript compiles the successfully applied deep learning models for different types of cancers. Considering the length of the manuscript, we restrict ourselves to the discussion of breast cancer, lung cancer, brain cancer, and skin cancer. The purpose of this bibliographic review is to give researchers setting out to implement deep learning and artificial neural networks for cancer diagnosis a from-scratch view of state-of-the-art achievements.
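Several of the evaluation criteria listed above (Dice coefficient, Jaccard index) are simple overlap formulas; a self-contained sketch for binary label sequences:

```python
# Dice coefficient and Jaccard index for binary masks/labels,
# written out directly from their set definitions.
def dice(a, b):
    """Dice coefficient: 2|A∩B| / (|A| + |B|)."""
    inter = sum(x and y for x, y in zip(a, b))
    return 2 * inter / (sum(a) + sum(b))

def jaccard(a, b):
    """Jaccard index: |A∩B| / |A∪B|."""
    inter = sum(x and y for x, y in zip(a, b))
    union = sum(x or y for x, y in zip(a, b))
    return inter / union

pred = [1, 1, 0, 1, 0]
true = [1, 0, 0, 1, 1]
# dice = 2*2/(3+3) = 0.666..., jaccard = 2/4 = 0.5
```

The two are monotonically related (Dice = 2J/(1+J)), which is why segmentation papers often report only one of them.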

256 citations

Journal ArticleDOI
TL;DR: This pilot study shows that radiomic data before treatment is able to predict mutation status and associated gefitinib response non-invasively, demonstrating the potential of radiomics-based phenotyping to improve the stratification and response assessment between tyrosine kinase inhibitors (TKIs) sensitive and resistant patient populations.
Abstract: Medical imaging plays a fundamental role in oncology and drug development, by providing a non-invasive method to visualize tumor phenotype. Radiomics can quantify this phenotype comprehensively by applying image-characterization algorithms, and may provide important information beyond tumor size or burden. In this study, we investigated if radiomics can identify a gefitinib response-phenotype, studying high-resolution computed-tomography (CT) imaging of forty-seven patients with early-stage non-small cell lung cancer before and after three weeks of therapy. On the baseline-scan, radiomic-feature Laws-Energy was significantly predictive for EGFR-mutation status (AUC = 0.67, p = 0.03), while volume (AUC = 0.59, p = 0.27) and diameter (AUC = 0.56, p = 0.46) were not. Although no features were predictive on the post-treatment scan (p > 0.08), the change in features between the two scans was strongly predictive (significant feature AUC-range = 0.74–0.91). A technical validation revealed that the associated features were also highly stable for test-retest (mean ± std: ICC = 0.96 ± 0.06). This pilot study shows that radiomic data before treatment is able to predict mutation status and associated gefitinib response non-invasively, demonstrating the potential of radiomics-based phenotyping to improve the stratification and response assessment between tyrosine kinase inhibitors (TKIs) sensitive and resistant patient populations.
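The delta-radiomics idea above, using the change in a feature between baseline and post-treatment scans as the predictor, can be sketched as follows. The data, feature, and effect size are simulated, so only the shape of the computation mirrors the study.

```python
# Delta-feature prediction sketch: simulate a radiomic feature at
# baseline and after therapy, then score the change against mutation
# status. All values are synthetic and illustrative.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)
n = 47                                   # patients, as in the study
mutated = rng.integers(0, 2, size=n)     # EGFR-mutation status

baseline = rng.normal(size=n)
# Simulate mutated (responding) tumors changing more after therapy.
post = baseline - 0.8 * mutated + 0.3 * rng.normal(size=n)

delta = post - baseline                  # the delta feature
auc = roc_auc_score(mutated, -delta)     # larger decrease -> mutated
```

With the simulated effect much larger than the noise, the delta feature separates the groups well even though the baseline value alone would not.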

172 citations

References
Book ChapterDOI


01 Jan 2012

139,059 citations


"Predicting Outcomes of Nonsmall Cel..." refers background in this paper

  • ...[6] showed the effectiveness of a support vector machine in classifying benign and malignant pulmonary nodules....


Journal ArticleDOI
TL;DR: Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
Abstract: LIBSVM is a library for Support Vector Machines (SVMs). We have been actively developing this package since the year 2000. The goal is to help users to easily apply SVM to their applications. LIBSVM has gained wide popularity in machine learning and many other areas. In this article, we present all implementation details of LIBSVM. Issues such as solving SVM optimization problems, theoretical convergence, multiclass classification, probability estimates, and parameter selection are discussed in detail.
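Two of the issues the abstract lists, probability estimates and parameter selection, can be exercised through scikit-learn's SVC, which is built on LIBSVM. The toy data and parameter grid below are illustrative, not from the paper.

```python
# Exercising LIBSVM via scikit-learn's SVC: probability estimates
# (Platt scaling) and parameter selection by cross-validated search.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, n_features=10, random_state=0)

# Parameter selection: search over C and the RBF kernel width gamma.
grid = GridSearchCV(SVC(probability=True),
                    {"C": [0.1, 1, 10], "gamma": ["scale", 0.1]},
                    cv=5)
grid.fit(X, y)
proba = grid.predict_proba(X[:3])        # class-probability estimates
```

Setting `probability=True` makes LIBSVM fit an extra calibration step, which is why probability estimates cost more than plain decision values.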

40,826 citations


"Predicting Outcomes of Nonsmall Cel..." refers methods in this paper

  • ...We used the support vector machine libSVM by Chang and Lin [24]....


Journal ArticleDOI
TL;DR: High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated, and the performance of the support-vector network is compared to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
Abstract: The support-vector network is a new learning machine for two-group classification problems. The machine conceptually implements the following idea: input vectors are non-linearly mapped to a very high-dimension feature space. In this feature space a linear decision surface is constructed. Special properties of the decision surface ensure high generalization ability of the learning machine. The idea behind the support-vector network was previously implemented for the restricted case where the training data can be separated without errors. We here extend this result to non-separable training data. High generalization ability of support-vector networks utilizing polynomial input transformations is demonstrated. We also compare the performance of the support-vector network to various classical learning algorithms that all took part in a benchmark study of Optical Character Recognition.
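The objects this abstract describes, a soft-margin decision surface for non-separable data with a polynomial input transformation, and the training points that define it (the support vectors), can be inspected directly. scikit-learn's SVC stands in for the original support-vector network, and the two overlapping clusters are synthetic.

```python
# Fit a soft-margin SVM with a polynomial kernel on non-separable
# data, then inspect the support vectors that define the surface.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
# Two overlapping (non-separable) Gaussian clusters in 2-D.
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(1.5, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel="poly", degree=3, C=1.0).fit(X, y)
n_sv = clf.support_vectors_.shape[0]     # points on or inside the margin
```

Only the support vectors enter the decision function; the rest of the training set could be discarded without changing the fitted surface.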

37,861 citations


"Predicting Outcomes of Nonsmall Cel..." refers background in this paper

  • ...Support vector machines are based on statistical learning theory developed by Cortes and Vapnik [20] and have been shown by Kramer et al. [21], among others, to obtain high accuracy on a diverse range of application domains such as the letter, page, pendigit, satimage, and waveform data sets [22]....


  • ...The hyperplane construction can be reduced to a quadratic optimization problem; subsets of training patterns that lie on the margin were termed support vectors by Cortes and Vapnik [20]....



Book
15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Abstract: From the Publisher: Classifier systems play a major role in machine learning and knowledge-based systems, and Ross Quinlan's work on ID3 and C4.5 is widely acknowledged to have made some of the most significant contributions to their development. This book is a complete guide to the C4.5 system as implemented in C for the UNIX environment. It contains a comprehensive guide to the system's use, the source code (about 8,800 lines), and implementation notes. The source code and sample datasets are also available on a 3.5-inch floppy diskette for a Sun workstation. C4.5 starts with large sets of cases belonging to known classes. The cases, described by any mixture of nominal and numeric properties, are scrutinized for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models, in the form of decision trees or sets of if-then rules, that can be used to classify new cases, with emphasis on making the models understandable as well as accurate. The system has been applied successfully to tasks involving tens of thousands of cases described by hundreds of properties. The book starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting. Advantages and disadvantages of the C4.5 approach are discussed and illustrated with several case studies. This book and software should be of interest to developers of classification-based intelligent systems and to students in machine learning and expert systems courses.
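C4.5 itself ships as C source, but its two visible outputs, an entropy-based decision tree and readable rules, can be approximated with scikit-learn. Note this is only a stand-in: scikit-learn implements CART with an entropy criterion, not C4.5's exact algorithm, and the dataset choice is illustrative.

```python
# Entropy-criterion decision tree as a rough C4.5 stand-in, with the
# fitted model rendered as readable rules (like C4.5's rule output).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

# Render the tree as indented if-then rules.
rules = export_text(tree)
```

The rule rendering is what makes such models "understandable as well as accurate" in the book's sense: each leaf reads off as a chain of threshold tests.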

21,674 citations


"Predicting Outcomes of Nonsmall Cel..." refers background in this paper

  • ...C4.5 release 8 code developed by Quinlan [16]....


Journal ArticleDOI
TL;DR: This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.
Abstract: More than twelve years have elapsed since the first public release of WEKA. In that time, the software has been rewritten entirely from scratch, evolved substantially and now accompanies a text on data mining [35]. These days, WEKA enjoys widespread acceptance in both academia and business, has an active community, and has been downloaded more than 1.4 million times since being placed on SourceForge in April 2000. This paper provides an introduction to the WEKA workbench, reviews the history of the project, and, in light of the recent 3.6 stable release, briefly discusses what has been added since the last stable version (Weka 3.4) released in 2003.

19,603 citations


"Predicting Outcomes of Nonsmall Cel..." refers methods in this paper

  • ...The decision tree used in this study was Weka’s J48, [15], which is an implementation of C4.5 release 8 code developed by Quinlan [16]....


  • ...The implementation used was found in WEKA, [15] and utilized local prediction....


  • ...The classifier labeled Naive Bayes [19] in Weka [15] was used for this work....



  • ...The rule based classifier used was Weka’s JRIP, [15], an implementation of the RIPPER algorithm by Cohen [17]....

