Author

John N. Korecki

Bio: John N. Korecki is an academic researcher from the University of South Florida. The author has contributed to research in topics: Statistical classification & Stability (learning theory). The author has an h-index of 3 and has co-authored 5 publications receiving 126 citations.

Papers
Journal ArticleDOI
TL;DR: Focusing on cases of the adenocarcinoma non-small cell lung cancer tumor subtype from a larger data set, it is shown that classifiers can be built to predict survival time, the first known result to make such predictions from CT scans of lung cancer.
Abstract: Non-small cell lung cancer is a prevalent disease. It is diagnosed and treated with the help of computed tomography (CT) scans. In this paper, we apply radiomics to select 3-D features from CT images of the lung toward providing prognostic information. Focusing on cases of the adenocarcinoma non-small cell lung cancer tumor subtype from a larger data set, we show that classifiers can be built to predict survival time. This is the first known result to make such predictions from CT scans of lung cancer. We compare classifiers and feature selection approaches. The best accuracy when predicting survival was 77.5%, obtained with a decision tree in a leave-one-out cross validation after selecting five features per fold from 219 candidates.

110 citations
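
As a rough illustration of the evaluation protocol described in the abstract above (five features selected per fold, a decision tree classifier, and leave-one-out cross validation), the sketch below uses scikit-learn on synthetic stand-in data. It is not the authors' code: the feature matrix X, the labels y, and the generic f_classif univariate filter are assumptions made only for illustration.

```python
# Hypothetical sketch: leave-one-out cross validation with per-fold selection
# of 5 features and a decision tree, mirroring the setup described above.
# X (n_samples x 219 radiomic features) and y (a binarized survival label)
# are synthetic placeholders, not the paper's data.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

def loocv_accuracy(X, y):
    correct = 0
    for train_idx, test_idx in LeaveOneOut().split(X):
        # The selector is refit inside each fold so the held-out case never
        # influences which five features are kept.
        model = make_pipeline(
            SelectKBest(f_classif, k=5),
            DecisionTreeClassifier(random_state=0),
        )
        model.fit(X[train_idx], y[train_idx])
        correct += int(model.predict(X[test_idx])[0] == y[test_idx][0])
    return correct / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 219))            # stand-in for 219 CT radiomic features
y = rng.integers(0, 2, size=40)
print(f"LOOCV accuracy: {loocv_accuracy(X, y):.3f}")
```

Refitting the selector inside every fold is what keeps the reported accuracy free of feature-selection leakage.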

Journal ArticleDOI
TL;DR: The iterative feature perturbation method (IFP), an embedded gene selector, is introduced and applied to four microarray cancer datasets; both IFP and SVM-RFE showed improved performance on datasets reduced by the t-test.
Abstract: Gene-expression microarray datasets often consist of a limited number of samples with a large number of gene-expression measurements, usually on the order of thousands. Therefore, dimensionality reduction is critical prior to any classification task. In this work, the iterative feature perturbation method (IFP), an embedded gene selector, is introduced and applied to four microarray cancer datasets: colon cancer, leukemia, Moffitt colon cancer, and lung cancer. We compare results obtained by IFP to those of support vector machine-recursive feature elimination (SVM-RFE) and the t-test as a feature filter, using a linear support vector machine as the base classifier. We analyzed the intersection of the gene sets selected by the three methods across the four datasets. Additional experiments included an initial pre-selection of the top 200 genes based on their p values; IFP and SVM-RFE were then applied to the reduced feature sets. These results showed up to a 3.32% average performance improvement for IFP across the four datasets. A statistical analysis (using the Friedman/Holm test) for both scenarios showed that the highest accuracies came from the t-test as a filter in experiments without gene pre-selection, while IFP and SVM-RFE had greater classification accuracy after gene pre-selection. The analysis showed that the t-test is a good gene selector for microarray data and that IFP and SVM-RFE show improved performance on datasets reduced by the t-test. The IFP approach resulted in comparable or superior average class accuracy when compared to SVM-RFE on three of the four datasets. The same or similar accuracies can be obtained with different sets of genes.

27 citations
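
Two of the selectors compared in the abstract above, SVM-RFE and the t-test filter, have straightforward reference implementations in scikit-learn and SciPy; the hedged sketch below runs them on synthetic data (IFP itself has no public library implementation and is not reproduced here). The sample count, gene count, and number of retained genes are placeholders.

```python
# Illustrative sketch of two of the compared selectors: SVM-RFE and a
# two-sample t-test filter, both built around a linear SVM. X and y are
# synthetic stand-ins for a gene-expression matrix and class labels.
import numpy as np
from scipy.stats import ttest_ind
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

def svm_rfe_select(X, y, n_genes=200):
    """Recursive feature elimination driven by a linear SVM's weights."""
    rfe = RFE(SVC(kernel="linear", C=1.0), n_features_to_select=n_genes, step=0.1)
    rfe.fit(X, y)
    return np.where(rfe.support_)[0]

def ttest_select(X, y, n_genes=200):
    """Filter: rank genes by two-sample t-test p-value between the classes."""
    classes = np.unique(y)
    _, pvals = ttest_ind(X[y == classes[0]], X[y == classes[1]], axis=0)
    return np.argsort(pvals)[:n_genes]

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 2000))      # 60 samples, 2000 gene-expression values
y = rng.integers(0, 2, size=60)
overlap = set(svm_rfe_select(X, y)) & set(ttest_select(X, y))
print(f"genes selected by both methods: {len(overlap)}")
```

Comparing the two selected sets mirrors the intersection analysis reported in the abstract.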

Proceedings ArticleDOI
01 Dec 2008
TL;DR: This work presents an algorithm for applying semi-supervised learning to disjoint data generated by complex simulations; the approach shows a statistically significant accuracy improvement over supervised learning with the same underlying learning algorithm and requires less labeled data for comparable results.
Abstract: Complex simulations can generate very large amounts of data stored disjointedly across many local disks. Learning from this data can be problematic due to the difficulty of obtaining labels for the data. We present an algorithm for the application of semi-supervised learning on disjoint data generated by complex simulations. Our semi-supervised technique shows a statistically significant accuracy improvement over supervised learning using the same underlying learning algorithm and requires less labeled data for comparable results.

11 citations
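
The paper's algorithm for disjoint, simulation-generated data is not reproduced here; as a generic illustration of the semi-supervised idea it builds on (train on a small labeled subset, then let the model pseudo-label the rest), the sketch below uses scikit-learn's standard self-training wrapper on synthetic data, with the 90% unlabeled split chosen arbitrarily.

```python
# Generic self-training illustration, not the paper's distributed algorithm.
# scikit-learn marks unlabeled samples with the label -1.
import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))
y_true = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hide 90% of the labels to simulate scarce labeling.
y_train = y_true.copy()
unlabeled = rng.random(len(y_true)) < 0.9
y_train[unlabeled] = -1

model = SelfTrainingClassifier(DecisionTreeClassifier(max_depth=3), threshold=0.9)
model.fit(X, y_train)
print(f"accuracy on the unlabeled portion: "
      f"{model.score(X[unlabeled], y_true[unlabeled]):.3f}")
```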

Proceedings ArticleDOI
21 Nov 2011
TL;DR: This paper explores two different gene selection techniques, examines how well the selected genes agree between methods, and checks gene set consistency between data sets collected using the same protocols at different research institutions.
Abstract: Typically, thousands of gene expression levels are recorded for a group of patients, leading to the situation where the number of features far exceeds the number of examples. To combat this, researchers would want to combine gene expression data collected at different sites into one data set to reduce the magnitude of the difference between the number of features (genes) and examples (samples). This makes gene selection a critical component of any process to build models using gene expression data. For instance, in the domain of ordering cancer patients based on survival time, one might assume that utilizing genes related to cancer development and progression will allow the best model to be built. In this paper, we explore two different gene selection techniques and examine how well the genes selected compare between methods. We also check gene set consistency between data sets collected using the same protocols at different research institutions. It is shown that gene selection can result in very different sets given different training data.
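
The cross-site consistency check described in the abstract above can be made concrete with a simple overlap measure: select genes independently on two cohorts using the same filter and compute the Jaccard index of the resulting sets. The sketch below assumes synthetic cohorts and a generic univariate filter rather than the paper's actual selection techniques.

```python
# Consistency sketch: same filter applied to two cohorts, agreement measured
# by the Jaccard index of the selected gene sets. Cohorts are synthetic.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

def top_genes(X, y, k=50):
    scores = SelectKBest(f_classif, k="all").fit(X, y).scores_
    return set(np.argsort(scores)[-k:])

def jaccard(a, b):
    return len(a & b) / len(a | b)

rng = np.random.default_rng(3)
site_a = (rng.normal(size=(80, 1000)), rng.integers(0, 2, size=80))
site_b = (rng.normal(size=(70, 1000)), rng.integers(0, 2, size=70))
print(f"cross-site gene-set consistency: "
      f"{jaccard(top_genes(*site_a), top_genes(*site_b)):.2f}")
```

A low Jaccard index is exactly the "very different sets given different training data" effect the abstract reports.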

Cited by
Journal ArticleDOI
TL;DR: Identification of optimal machine-learning methods for radiomic applications is a crucial step towards stable and clinically relevant radiomic biomarkers, providing a non-invasive way of quantifying and monitoring tumor-phenotypic characteristics in clinical practice.
Abstract: Radiomics extracts and mines large numbers of medical imaging features quantifying tumor phenotypic characteristics. Highly accurate and reliable machine-learning approaches can drive the success of radiomic applications in clinical care. In this radiomic study, fourteen feature selection methods and twelve classification methods were examined in terms of their performance and stability for predicting overall survival. A total of 440 radiomic features were extracted from pre-treatment computed tomography (CT) images of 464 lung cancer patients. To ensure the unbiased evaluation of different machine-learning methods, publicly available implementations along with reported parameter configurations were used. Furthermore, we used two independent radiomic cohorts for training (n = 310 patients) and validation (n = 154 patients). We identified that the Wilcoxon test based feature selection method WLCX (stability = 0.84 ± 0.05, AUC = 0.65 ± 0.02) and the classification method random forest (RF) (RSD = 3.52%, AUC = 0.66 ± 0.03) had the highest prognostic performance with high stability against data perturbation. Our variability analysis indicated that the choice of classification method is the most dominant source of performance variation (34.21% of total variance). Identification of optimal machine-learning methods for radiomic applications is a crucial step towards stable and clinically relevant radiomic biomarkers, providing a non-invasive way of quantifying and monitoring tumor-phenotypic characteristics in clinical practice.

749 citations
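
A hedged sketch of the best-performing combination reported above, Wilcoxon rank-sum (WLCX) feature ranking followed by a random forest scored by AUC, is given below; the data, the number of retained features, and the train/validation split are placeholders rather than the study's 440-feature lung cancer cohorts.

```python
# Rough illustration: Wilcoxon rank-sum feature ranking + random forest,
# evaluated by AUC on a held-out split. All data here is synthetic.
import numpy as np
from scipy.stats import ranksums
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(300, 440))           # stand-in for 440 radiomic features
y = rng.integers(0, 2, size=300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

# Rank features by Wilcoxon rank-sum p-value, computed on the training split only.
pvals = np.array([ranksums(X_tr[y_tr == 0, j], X_tr[y_tr == 1, j]).pvalue
                  for j in range(X_tr.shape[1])])
keep = np.argsort(pvals)[:30]

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr[:, keep], y_tr)
auc = roc_auc_score(y_te, rf.predict_proba(X_te[:, keep])[:, 1])
print(f"validation AUC: {auc:.2f}")
```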

Journal ArticleDOI
TL;DR: An experimental evaluation of the most representative datasets using well-known feature selection methods is presented, bearing in mind that the aim is not to provide the best feature selection method, but to facilitate their comparative study by the research community.

530 citations

Journal ArticleDOI
TL;DR: The survey provides an overview of deep learning and the popular architectures used for cancer detection and diagnosis, presenting four popular deep learning architectures: convolutional neural networks, fully convolutional networks, auto-encoders, and deep belief networks.

356 citations

Journal ArticleDOI
TL;DR: This study identified prognostic and reliable machine-learning methods for predicting the overall survival of head and neck cancer patients, a step toward optimal machine-learning methods for radiomics-based prognostic analyses.
Abstract: Introduction: “Radiomics” extracts and mines large numbers of medical imaging features in a non-invasive and cost-effective way. The underlying assumption of radiomics is that these imaging features quantify phenotypic characteristics of the entire tumor. In order to enhance the applicability of radiomics in clinical oncology, highly accurate and reliable machine learning approaches are required. In this radiomic study, thirteen feature selection methods and eleven machine learning classification methods were evaluated in terms of their performance and stability for predicting overall survival in head and neck cancer patients. Methods: Two independent head and neck cancer cohorts were investigated. Training cohort HN1 consisted of 101 HNSCC patients. Cohort HN2 (n=95) was used for validation. A total of 440 radiomic features were extracted from the segmented tumor regions in CT images. Feature selection and classification methods were compared using an unbiased evaluation framework. Results: We observed that the three feature selection methods MRMR (AUC = 0.69, Stability = 0.66), MIFS (AUC = 0.66, Stability = 0.69), and CIFE (AUC = 0.68, Stability = 0.7) had high prognostic performance and stability. The three classifiers BY (AUC = 0.67, RSD = 11.28), RF (AUC = 0.61, RSD = 7.36), and NN (AUC = 0.62, RSD = 10.52) also showed high prognostic performance and stability. Analysis investigating performance variability indicated that the choice of classification method is the major factor driving the performance variation (29.02% of total variance). Conclusions: Our study identified prognostic and reliable machine learning methods for the prediction of overall survival of head and neck cancer patients. Identification of optimal machine-learning methods for radiomics-based prognostic analyses could broaden the scope of radiomics in precision oncology and cancer care.

299 citations
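
The RSD values quoted in these two radiomics studies are a classifier-stability measure; assuming the usual definition of relative standard deviation (100 × standard deviation / mean, here over repeated AUC estimates), it takes only a few lines to compute, as in this sketch with made-up AUC values.

```python
# Minimal sketch of the relative standard deviation (RSD) stability measure:
# RSD = 100 * std(AUC) / mean(AUC) over repeated evaluations.
import numpy as np

aucs = np.array([0.66, 0.64, 0.67, 0.65, 0.66])   # placeholder AUCs over resamples
rsd = 100.0 * aucs.std(ddof=1) / aucs.mean()
print(f"RSD = {rsd:.2f}%")
```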

Journal ArticleDOI
23 Aug 2019-Cancers
TL;DR: This bibliographic review aims to give researchers who opt to apply deep learning and artificial neural networks to cancer diagnosis a from-scratch overview of the state-of-the-art achievements.
Abstract: In this paper, we first describe the basics of the field of cancer diagnosis, which includes the steps of cancer diagnosis followed by the typical classification methods used by doctors, providing readers with a historical view of cancer classification techniques. These methods include the Asymmetry, Border, Color and Diameter (ABCD) method, the seven-point detection method, the Menzies method, and pattern analysis. They are used regularly by doctors for cancer diagnosis, although they are not considered very efficient for obtaining better performance. Moreover, considering all types of audience, the basic evaluation criteria are also discussed. The criteria include the receiver operating characteristic curve (ROC curve), area under the ROC curve (AUC), F1 score, accuracy, specificity, sensitivity, precision, Dice coefficient, average accuracy, and Jaccard index. Previously used methods are considered inefficient, calling for better and smarter methods for cancer diagnosis. Artificial intelligence and cancer diagnosis are gaining attention as a way to define better diagnostic tools. In particular, deep neural networks can be successfully used for intelligent image analysis. The basic framework of how this machine learning works on medical imaging is provided in this study, i.e., pre-processing, image segmentation and post-processing. The second part of this manuscript describes the different deep learning techniques, such as convolutional neural networks (CNNs), generative adversarial models (GANs), deep autoencoders (DANs), restricted Boltzmann machines (RBM), stacked autoencoders (SAE), convolutional autoencoders (CAE), recurrent neural networks (RNNs), long short-term memory (LSTM), multi-scale convolutional neural networks (M-CNN), and multi-instance learning convolutional neural networks (MIL-CNN). For each technique, we provide Python code to allow interested readers to experiment with the cited algorithms on their own diagnostic problems. The third part of this manuscript compiles the successfully applied deep learning models for different types of cancers. Considering the length of the manuscript, we restrict ourselves to the discussion of breast cancer, lung cancer, brain cancer, and skin cancer. The purpose of this bibliographic review is to give researchers who opt to apply deep learning and artificial neural networks to cancer diagnosis a from-scratch overview of the state-of-the-art achievements.

256 citations
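
As a minimal, generic companion to the deep learning architectures surveyed in the review above, the sketch below builds a toy convolutional image classifier in Keras on random tensors; it is not one of the cited models or medical datasets, and the input size and two-class output are illustrative assumptions.

```python
# Toy CNN classifier sketch (random data), illustrating the kind of model
# discussed in the review; not any of the cited architectures.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(64, 64, 1)),          # grayscale image patches
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(2, activation="softmax"),    # e.g., benign vs. malignant
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

X = np.random.rand(32, 64, 64, 1).astype("float32")   # placeholder images
y = np.random.randint(0, 2, size=32)                   # placeholder labels
model.fit(X, y, epochs=1, batch_size=8, verbose=0)
```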