Improve Glioblastoma Multiforme Prognosis Prediction by Using Feature Selection and Multiple Kernel Learning

doi:10.1109/TCBB.2016.2551745

Journal Article•DOI•

Ya Zhang¹, Ao Li¹, Chen Peng¹, Minghui Wang¹•Institutions (1)

University of Science and Technology of China¹

01 Sep 2016-IEEE/ACM Transactions on Computational Biology and Bioinformatics (IEEE)-Vol. 13, Iss: 5, pp 825-835

TL;DR: The goal is to establish an integrated model that predicts GBM prognosis with high accuracy by combining the minimum redundancy maximum relevance (mRMR) feature selection method with multiple kernel learning (MKL).

Abstract: Glioblastoma multiforme (GBM) is a highly aggressive type of brain cancer with very low median survival. To predict patient prognosis, researchers have proposed rules for classifying glioma subtypes. However, survival time often varies widely among patients within the same GBM subtype because of individual differences. Recent developments in gene testing have refined the classic subtype rules into more specific classification rules based on single biomolecular features. These classification methods have been shown to perform better than traditional simple rules in GBM prognosis prediction. However, the real power of the massive data available remains largely undiscovered. We believe a combined prediction model based on more than one data type could perform better and would contribute further to the clinical treatment of GBM. The Cancer Genome Atlas (TCGA) database provides a large dataset covering various data types for many cancers, which enables us to inspect this aggressive cancer in a new way. In this research, we further improve GBM prognosis prediction accuracy by taking advantage of the minimum redundancy maximum relevance (mRMR) feature selection method and the multiple kernel learning (MKL) method. Our goal is to establish an integrated model that can predict GBM prognosis with high accuracy.
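The idea of fusing several data types through kernels can be illustrated with a minimal sketch. It is not the authors' implementation: a citing review quoted further down describes the paper's approach as SimpleMKL-based, whereas this sketch swaps in a simpler heuristic that weights each kernel by its kernel-target alignment. The function names, the toy cohort, and the data-type names `expression` and `methylation` are all hypothetical.

```python
import math

def rbf_kernel(X, gamma=1.0):
    """Gram matrix for one data type: K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]

def alignment(K, y):
    """Kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F), labels in {-1, +1}."""
    n = len(y)
    num = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(v * v for row in K for v in row))
    return num / (norm_k * n)  # ||yy^T||_F equals n for +/-1 labels

def fuse_kernels(kernels, y):
    """Weight each kernel by its non-negative alignment, then sum into one kernel."""
    scores = [max(alignment(K, y), 0.0) for K in kernels]
    total = sum(scores)
    if total > 0:
        weights = [s / total for s in scores]
    else:  # no kernel is aligned with the labels: fall back to uniform weights
        weights = [1.0 / len(kernels)] * len(kernels)
    n = len(y)
    fused = [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
             for i in range(n)]
    return weights, fused

# Hypothetical toy cohort: 6 patients, +1 = longer survival, -1 = shorter survival.
labels = [1, 1, 1, -1, -1, -1]
expression = [[0.0], [0.1], [0.2], [2.0], [2.1], [2.2]]   # tracks the labels
methylation = [[0.0], [2.0], [0.1], [2.1], [0.2], [2.2]]  # mostly unrelated to the labels

weights, fused = fuse_kernels([rbf_kernel(expression), rbf_kernel(methylation)], labels)
print(weights)  # the expression kernel receives the larger weight
```

The fused matrix could then be handed to any kernel classifier that accepts a precomputed Gram matrix. A full MKL solver would instead learn the weights jointly with the classifier rather than fixing them beforehand.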

Citations

The somatic genomic landscape of glioblastoma

Cameron Brennan, Roel G.W. Verhaak, Aaron McKenna, Benito Campos, Houtan Noushmehr, Sofie R. Salama, Siyuan Zheng, Debyani Chakravarty, J. Zachary Sanborn, Samuel H. Berman, Rameen Beroukhim, Brady Bernard, Chang-Jiun Wu, Giannicola Genovese, Ilya Shmulevich, Jill S. Barnholtz-Sloan, Lihua Zou, Rahulsimham Vegesna, Sachet A. Shukla, Giovanni Ciriello, W. K. Yung, Wei Zhang, Carrie Sougnez, Tom Mikkelsen, Kenneth Aldape, Darell D. Bigner, Erwin G. Van Meir, Michael D. Prados, Andrew E. Sloan, Keith L. Black, Jennifer M. Eschbacher, Gaetano Finocchiaro, William A. Friedman, David W. Andrews, Abhijit Guha, Mary Iacocca, Brian P. O'Neil, Greg Foltz, Jerome Myers, Daniel J. Weisenberger, Robert Penny, Raju Kucherlapati, Charles M. Perou, D. Neil Hayes, Richard A. Gibbs, Marco A. Marra, Gordon B. Mills, Eric S. Lander, Paul T. Spellman, Richard K. Wilson, Chris Sander, John N. Weinstein, Matthew Meyerson, Stacey Gabriel, Peter W. Laird, David Haussler, Gad Getz, Lynda Chin

01 Jan 2013

TL;DR: In this article, the landscape of somatic genomic alterations based on multidimensional and comprehensive characterization of more than 500 glioblastoma tumors (GBMs) was described, including several novel mutated genes as well as complex rearrangements of signature receptors, including EGFR and PDGFRA.

Abstract: We describe the landscape of somatic genomic alterations based on multidimensional and comprehensive characterization of more than 500 glioblastoma tumors (GBMs). We identify several novel mutated genes as well as complex rearrangements of signature receptors, including EGFR and PDGFRA. TERT promoter mutations are shown to correlate with elevated mRNA expression, supporting a role in telomerase reactivation. Correlative analyses confirm that the survival advantage of the proneural subtype is conferred by the G-CIMP phenotype, and MGMT DNA methylation may be a predictive biomarker for treatment response only in classical subtype GBM. Integrative analysis of genomic and proteomic profiles challenges the notion of therapeutic inhibition of a pathway as an alternative to inhibition of the target itself. These data will facilitate the discovery of therapeutic and diagnostic target candidates, the validation of research and clinical observations and the generation of unanticipated hypotheses that can advance our molecular understanding of this lethal cancer.

2,616 citations

Journal Article•DOI•

A review on machine learning principles for multi-view biological data integration.

Yifeng Li¹, Fang-Xiang Wu², Alioune Ngom³•Institutions (3)

National Research Council¹, University of Saskatchewan², University of Windsor³

22 Dec 2016-Briefings in Bioinformatics

TL;DR: It is shown that Bayesian models are able to use prior information and model measurements with various distributions, and a range of deep neural networks can be integrated in multi-modal learning for capturing the complex mechanism of biological systems.

Abstract: Driven by high-throughput sequencing techniques, modern genomic and clinical studies are in a strong need of integrative machine learning models for better use of vast volumes of heterogeneous information in the deep understanding of biological systems and the development of predictive models. How data from multiple sources (called multi-view data) are incorporated in a learning system is a key step for successful analysis. In this article, we provide a comprehensive review on omics and clinical data integration techniques, from a machine learning perspective, for various analyses such as prediction, clustering, dimension reduction and association. We shall show that Bayesian models are able to use prior information and model measurements with various distributions; tree-based methods can either build a tree with all features or collectively make a final decision based on trees learned from each view; kernel methods fuse the similarity matrices learned from individual views together for a final similarity matrix or learning model; network-based fusion methods are capable of inferring direct and indirect associations in a heterogeneous network; matrix factorization models have potential to learn interactions among features from different views; and a range of deep neural networks can be integrated in multi-modal learning for capturing the complex mechanism of biological systems.

333 citations

Cites methods from "Improve Glioblastoma Multiforme Pro..."

...SimpleMKL-based [74] multiple kernel learning has been applied in [75] to predict cancer prognosis by using gene expres-...
[...]

Proceedings Article•DOI•

Brain Tumor Type Classification via Capsule Networks

Parnian Afshar¹, Arash Mohammadi¹, Konstantinos N. Plataniotis²•Institutions (2)

Concordia University¹, University of Toronto²

27 Feb 2018

TL;DR: In this paper, the authors adopt and incorporate CapsNets for the problem of brain tumor classification to design an improved architecture which maximizes the accuracy of the classification problem at hand.

Abstract: Brain tumor is considered one of the deadliest and most common forms of cancer in both children and adults. Consequently, determining the correct type of brain tumor in its early stages is of significant importance for devising a precise treatment plan and predicting the patient's response to the adopted treatment. In this regard, there has been a recent surge of interest in designing Convolutional Neural Networks (CNNs) for the problem of brain tumor type classification. However, CNNs typically require a large amount of training data and cannot properly handle input transformations. Capsule networks (referred to as CapsNets) are new machine learning architectures proposed very recently to overcome these shortcomings of CNNs, and are poised to revolutionize deep learning solutions. Of particular interest to this work is that capsule networks are robust to rotation and affine transformation and require far less training data, which is the case for processing medical image datasets including brain Magnetic Resonance Imaging (MRI) images. In this paper, we focus on achieving the following four objectives: (i) Adopt and incorporate CapsNets for the problem of brain tumor classification to design an improved architecture which maximizes the accuracy of the classification problem at hand; (ii) Investigate the over-fitting problem of CapsNets based on a real set of MRI images; (iii) Explore whether or not CapsNets are capable of providing a better fit for whole-brain images or just the segmented tumor; and (iv) Develop a visualization paradigm for the output of the CapsNet to better explain the learned features. Our results show that the proposed approach can successfully outperform CNNs on the brain tumor classification problem.

304 citations

Journal Article•DOI•

Machine Learning and Integrative Analysis of Biomedical Big Data.

Bilal Mirza¹, Wei Wang, Jie Wang¹, Howard Choi¹, Neo Christopher Chung¹,², Peipei Ping•Institutions (2)

University of California, Los Angeles¹, University of Warsaw²

28 Jan 2019-Genes

TL;DR: This review discusses state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

Abstract: Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

185 citations

Cites background or methods from "Improve Glioblastoma Multiforme Pro..."

...For improving drug sensitivity in breast cancer, genomics, epigenomics, and proteomics, data were integrated using a multiview multiple kernel learning (MKL) approach [25]....
[...]
...For example, Glioblastoma Multiforme is a highly aggressive type of brain cancer whose prognostic prediction can be improved by considering multiple data types together [77], i....
[...]
...Moreover, scalable MKL methods like dual-layer kernel extreme learning machine (DKELM) [198] and easyMKL [199] can be employed in multi-omics integrative analysis since MKL, a popular approach for integrating multiple omics datasets, can be computationally very expensive for large datasets....
[...]
...Multiple kernel learning (MKL) [82] has become a popular approach to integrate data by calculating individual kernel matrices for each data type and fusing them into a global model....
[...]
...For example, a feed-forward neural network with multiple hidden layers can now be trained to accurately differentiate non-coding RNA types, i.e., circular RNAs (cirRNAs) from long non-coding RNAs (lncRNAs) in just a few hours on a single computer while the MKL method would take four days [182]....
[...]

Journal Article•DOI•

A Multimodal Deep Neural Network for Human Breast Cancer Prognosis Prediction by Integrating Multi-Dimensional Data

Dongdong Sun¹, Minghui Wang¹, Ao Li¹•Institutions (1)

University of Science and Technology of China¹

01 May 2019-IEEE/ACM Transactions on Computational Biology and Bioinformatics

TL;DR: This study proposes a Multimodal Deep Neural Network by integrating Multi-dimensional Data (MDNNMD) for the prognosis prediction of breast cancer and shows that the proposed method achieves a better performance than the prediction methods with single-dimensional data and other existing approaches.

Abstract: Breast cancer is a highly aggressive type of cancer with very low median survival. Accurate prognosis prediction of breast cancer can spare a significant number of patients from receiving unnecessary adjuvant systemic treatment and its related expensive medical costs. Previous work relies mostly on selected gene expression data to create a predictive model. The emergence of deep learning methods and multi-dimensional data offers opportunities for more comprehensive analysis of the molecular characteristics of breast cancer and therefore can improve diagnosis, treatment, and prevention. In this study, we propose a Multimodal Deep Neural Network by integrating Multi-dimensional Data (MDNNMD) for the prognosis prediction of breast cancer. The novelty of the method lies in the design of our method's architecture and the fusion of multi-dimensional data. The comprehensive performance evaluation results show that the proposed method achieves a better performance than the prediction methods with single-dimensional data and other existing approaches. The source code implemented by TensorFlow 1.0 deep learning library can be downloaded from the Github: https://github.com/USTC-HIlab/MDNNMD.

174 citations

Cites background or methods from "Improve Glioblastoma Multiforme Pro..."

...propose a multiple kernel machine learning method by fusing different types of data for the GBM prognosis prediction [16]....
[...]
...To further demonstrate the predictive results of the multi-dimensional data in assessing the risk of developing distant metastases in breast cancer patients, survival data analyses of the proposed method is also performed according to previous studies [16], [51], [52], the Kaplan-Meier curve is plotted and shown in Fig....
[...]
...The optimal parameters are chosen by the parameter combination leading to the best performance (AUC value) [16], [44]....
[...]
...mRMR [36] is one of the most common dimensionality reduction algorithm in a wide range of applications [16], [37], [38]....
[...]
...To comprehensively evaluate our proposed method, we use ten-fold cross validation experiment in consistent with previous existing studies of cancer prognosis prediction [16], [48]....
[...]
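The evaluation protocol mentioned in these excerpts (ten-fold cross validation scored by AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure of either paper: the fold assignment, the pooling of held-out scores into a single AUC, and the toy one-dimensional `centroid_scorer` model are all hypothetical choices.

```python
import random
from statistics import mean

def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(labels, k=10, seed=0):
    """Deal the shuffled indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds

def centroid_scorer(train_x, train_y):
    """Toy 1-D model: the score grows as x moves toward the positive-class mean."""
    c_pos = mean(x for x, l in zip(train_x, train_y) if l == 1)
    c_neg = mean(x for x, l in zip(train_x, train_y) if l == 0)
    return lambda x: abs(x - c_neg) - abs(x - c_pos)

def cross_validated_auc(x, y, fit, k=10, seed=0):
    """Score each sample with a model trained on the other folds, then pool into one AUC."""
    scores = [0.0] * len(y)
    for fold in stratified_folds(y, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in fold:
            scores[i] = model(x[i])
    return auc(scores, y)

# Hypothetical, well-separated toy data: 10 negatives followed by 10 positives.
x = [0.1 * i for i in range(10)] + [2.0 + 0.1 * i for i in range(10)]
y = [0] * 10 + [1] * 10
print(cross_validated_auc(x, y, centroid_scorer))
```

Parameter selection by best AUC, as the excerpt describes, would amount to calling `cross_validated_auc` once per candidate parameter setting and keeping the setting with the highest value.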

References

Journal Article•DOI•

Radiotherapy plus Concomitant and Adjuvant Temozolomide for Glioblastoma

Roger Stupp¹, Warren P. Mason, Martin J. van den Bent², Michael Weller³, Barbara Fisher⁴, Martin J.B. Taphoorn⁵, Karl Belanger⁶, Alba A. Brandes, Christine Marosi⁷, Ulrich Bogdahn⁸, Jürgen Curschmann⁹, Robert C. Janzer¹, Samuel K. Ludwin¹⁰, Thierry Gorlia, Anouk Allgeier, Denis Lacombe, J. Gregory Cairncross¹¹, Elizabeth Eisenhauer, René O. Mirimanoff¹•Institutions (11)

University of Lausanne¹, Erasmus University Rotterdam², University of Tübingen³, University of Western Ontario⁴, Utrecht University⁵, Université de Montréal⁶, Medical University of Vienna⁷, University of Regensburg⁸, University of Bern⁹, Queen's University¹⁰, University of Calgary¹¹

10 Mar 2005-The New England Journal of Medicine

TL;DR: The addition of temozolomide to radiotherapy for newly diagnosed glioblastoma resulted in a clinically meaningful and statistically significant survival benefit with minimal additional toxicity.

Abstract: Methods: Patients with newly diagnosed, histologically confirmed glioblastoma were randomly assigned to receive radiotherapy alone (fractionated focal irradiation in daily fractions of 2 Gy given 5 days per week for 6 weeks, for a total of 60 Gy) or radiotherapy plus continuous daily temozolomide (75 mg per square meter of body-surface area per day, 7 days per week from the first to the last day of radiotherapy), followed by six cycles of adjuvant temozolomide (150 to 200 mg per square meter for 5 days during each 28-day cycle). The primary end point was overall survival. Results: A total of 573 patients from 85 centers underwent randomization. The median age was 56 years, and 84 percent of patients had undergone debulking surgery. At a median follow-up of 28 months, the median survival was 14.6 months with radiotherapy plus temozolomide and 12.1 months with radiotherapy alone. The unadjusted hazard ratio for death in the radiotherapy-plus-temozolomide group was 0.63 (95 percent confidence interval, 0.52 to 0.75; P<0.001 by the log-rank test). The two-year survival rate was 26.5 percent with radiotherapy plus temozolomide and 10.4 percent with radiotherapy alone. Concomitant treatment with radiotherapy plus temozolomide resulted in grade 3 or 4 hematologic toxic effects in 7 percent of patients.

16,653 citations

"Improve Glioblastoma Multiforme Pro..." refers background in this paper

...Recent developments in gene testing have refined the classic subtype rules into more specific classification rules based on single biomolecular features....
[...]

Journal Article•DOI•

The 2007 WHO Classification of Tumours of the Central Nervous System

David N. Louis¹, Hiroko Ohgaki², Otmar D. Wiestler³, Webster K. Cavenee⁴, Peter C. Burger⁵, Anne Jouvet, Bernd W. Scheithauer⁶, Paul Kleihues⁷•Institutions (7)

Harvard University¹, International Agency for Research on Cancer², German Cancer Research Center³, Ludwig Institute for Cancer Research⁴, Johns Hopkins University⁵, Mayo Clinic⁶, University of Zurich⁷

06 Jul 2007-Acta Neuropathologica

TL;DR: The fourth edition of the World Health Organization (WHO) classification of tumours of the central nervous system, published in 2007, lists several new entities, including angiocentric glioma, papillary glioneuronal tumour, rosette-forming glioneuronal tumour of the fourth ventricle, papillary tumour of the pineal region, pituicytoma and spindle cell oncocytoma of the adenohypophysis.

Abstract: The fourth edition of the World Health Organization (WHO) classification of tumours of the central nervous system, published in 2007, lists several new entities, including angiocentric glioma, papillary glioneuronal tumour, rosette-forming glioneuronal tumour of the fourth ventricle, papillary tumour of the pineal region, pituicytoma and spindle cell oncocytoma of the adenohypophysis. Histological variants were added if there was evidence of a different age distribution, location, genetic profile or clinical behaviour; these included pilomyxoid astrocytoma, anaplastic medulloblastoma and medulloblastoma with extensive nodularity. The WHO grading scheme and the sections on genetic profiles were updated and the rhabdoid tumour predisposition syndrome was added to the list of familial tumour syndromes typically involving the nervous system. As in the previous, 2000 edition of the WHO ‘Blue Book’, the classification is accompanied by a concise commentary on clinico-pathological characteristics of each tumour type. The 2007 WHO classification is based on the consensus of an international Working Group of 25 pathologists and geneticists, as well as contributions from more than 70 international experts overall, and is presented as the standard for the definition of brain tumours to the clinical oncology and cancer research communities world-wide.

13,134 citations

"Improve Glioblastoma Multiforme Pro..." refers background in this paper

...However, the real power of the massive data available remains largely undiscovered....
[...]

Journal Article•DOI•

Cancer Statistics, 2009

Ahmedin Jemal¹, Rebecca L. Siegel¹, Elizabeth Ward¹, Yongping Hao¹, Jiaquan Xu², Michael J. Thun¹•Institutions (2)

American Cancer Society¹, Centers for Disease Control and Prevention²

01 Jul 2009-CA: A Cancer Journal for Clinicians

TL;DR: This report presents the American Cancer Society's estimates of new cancer cases and deaths expected in the United States in 2009, together with the most recent data on cancer incidence, mortality, and survival; overall incidence rates decreased largely because of decreases in the three major cancer sites in men (lung, prostate, and colorectum) and in two major cancer sites in women (breast and colorectum).

Abstract: Each year, the American Cancer Society estimates the number of new cancer cases and deaths expected in the United States in the current year and compiles the most recent data on cancer incidence, mortality, and survival based on incidence data from the National Cancer Institute, Centers for Disease Control and Prevention, and the North American Association of Central Cancer Registries and mortality data from the National Center for Health Statistics. Incidence and death rates are standardized by age to the 2000 United States standard million population. A total of 1,479,350 new cancer cases and 562,340 deaths from cancer are projected to occur in the United States in 2009. Overall cancer incidence rates decreased in the most recent time period in both men (1.8% per year from 2001 to 2005) and women (0.6% per year from 1998 to 2005), largely because of decreases in the three major cancer sites in men (lung, prostate, and colon and rectum [colorectum]) and in two major cancer sites in women (breast and colorectum). Overall cancer death rates decreased in men by 19.2% between 1990 and 2005, with decreases in lung (37%), prostate (24%), and colorectal (17%) cancer rates accounting for nearly 80% of the total decrease. Among women, overall cancer death rates between 1991 and 2005 decreased by 11.4%, with decreases in breast (37%) and colorectal (24%) cancer rates accounting for 60% of the total decrease. The reduction in the overall cancer death rates has resulted in the avoidance of about 650,000 deaths from cancer over the 15-year period. This report also examines cancer incidence, mortality, and survival by site, sex, race/ethnicity, education, geographic area, and calendar year. Although progress has been made in reducing incidence and mortality rates and improving survival, cancer still accounts for more deaths than heart disease in persons younger than 85 years of age. 
Further progress can be accelerated by applying existing cancer control knowledge across all segments of the population and by supporting new discoveries in cancer prevention, early detection, and treatment.

9,129 citations

Journal Article•DOI•

Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy

Hanchuan Peng¹, Fuhui Long¹, Chris Ding¹•Institutions (1)

Lawrence Berkeley National Laboratory¹

01 Aug 2005-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work studies how to select good features according to the maximal statistical dependency criterion based on mutual information. Because the maximal dependency condition is difficult to implement directly, it derives an equivalent form, the minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection.

Abstract: Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminate analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement on feature selection and classification accuracy.
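The first-order incremental selection described in this abstract can be sketched as a greedy loop: at each step, pick the feature whose mutual information with the class (relevance) minus its mean mutual information with the already selected features (redundancy) is largest. The sketch below uses this difference form on discrete features only. The feature names and toy values are hypothetical, and continuous data such as gene expression would first need discretization, which is not shown.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """I(A;B) in nats for two equal-length sequences of discrete values."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log(c * n / (count_a[u] * count_b[v]))
               for (u, v), c in count_ab.items())

def mrmr(features, target, k):
    """Greedily select k feature names by relevance minus mean redundancy.

    features: dict mapping a feature name to its list of discrete values.
    """
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected, remaining = [], list(features)

    def score(name):
        if not selected:  # first pick: relevance only
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s])
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical toy data: gene_A_copy duplicates gene_A, gene_B is weaker but complementary.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "gene_A":      [0, 0, 0, 1, 1, 1, 1, 1],
    "gene_A_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "gene_B":      [0, 1, 0, 0, 1, 1, 0, 1],
}
print(mrmr(features, target, 2))  # ['gene_A', 'gene_B']
```

Ranking by relevance alone would pick `gene_A` and its duplicate. The redundancy term is what pushes the second choice to `gene_B`.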

8,078 citations

Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy

Hanchuan Peng, Fuhui Long, Chris Ding

05 Aug 2003

TL;DR: This work derives an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection, and presents a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers).

7,075 citations