Open Access Journal Article

Discriminating early- and late-stage cancers using multiple kernel learning on gene sets

TLDR
This study addressed the problem of separating early‐ and late‐stage cancers using their gene expression profiles. It proposed a multiple kernel learning (MKL) formulation that uses pathways/gene sets to obtain satisfactory or improved predictive performance and to identify biological mechanisms that might have an effect on cancer progression.
Abstract
Motivation: Identifying molecular mechanisms that drive cancers from early to late stages is highly important for developing new preventive and therapeutic strategies. Standard machine learning algorithms could be used to discriminate early- and late-stage cancers from each other using their genomic characterizations. Even though these algorithms would obtain satisfactory predictive performance, their knowledge extraction capability would be quite restricted due to the highly correlated nature of genomic data. That is why we need algorithms that can also extract relevant information about these biological mechanisms using our prior knowledge about pathways/gene sets.

Results: In this study, we addressed the problem of separating early- and late-stage cancers from each other using their gene expression profiles. We proposed to use a multiple kernel learning (MKL) formulation that makes use of pathways/gene sets (i) to obtain satisfactory/improved predictive performance and (ii) to identify biological mechanisms that might have an effect on cancer progression. We extensively compared our proposed MKL on gene sets algorithm against two standard machine learning algorithms, namely, random forests and support vector machines, on 20 diseases from the Cancer Genome Atlas cohorts for two different sets of experiments. Our method obtained statistically significantly better or comparable predictive performance on most of the datasets using significantly fewer gene expression features. We also showed that our algorithm was able to extract meaningful and disease-specific information that gives clues about the progression mechanism.

Availability and implementation: Our implementations of support vector machine and multiple kernel learning algorithms in R are available at https://github.com/mehmetgonen/gsbc together with the scripts that replicate the reported experiments.
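
To make the idea of "MKL on gene sets" concrete, the sketch below builds one Gaussian kernel per gene set and combines the kernels as a weighted sum. It is only an illustration under stated assumptions, not the authors' implementation (which is in R at the repository above): the expression values, the gene set names `SET_A` and `SET_B`, the kernel width `sigma`, and the helper names are all hypothetical, and the kernel weights are set by hand here, whereas an MKL algorithm would learn them from labelled data.

```python
import math


def gaussian_kernel(X, genes, sigma):
    """Gaussian kernel matrix over samples, using only the columns in `genes`."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((X[i][g] - X[j][g]) ** 2 for g in genes)
            K[i][j] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return K


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices (weights assumed non-negative)."""
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for K, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Made-up expression matrix: 3 samples (rows) x 4 genes (columns).
X = [
    [0.1, 1.2, 0.0, 2.0],
    [0.2, 1.0, 0.1, 1.8],
    [1.5, 0.1, 2.2, 0.3],
]

# Hypothetical gene sets, given as column indices into X.
gene_sets = {"SET_A": [0, 1], "SET_B": [2, 3]}

# One kernel per gene set.
kernels = [gaussian_kernel(X, genes, sigma=1.0) for genes in gene_sets.values()]

# Hand-set weights (an assumption): a zero weight removes SET_B entirely,
# so the genes in that set no longer influence the combined kernel.
weights = [1.0, 0.0]
K = combine_kernels(kernels, weights)
```

The combined matrix `K` could then be passed to any kernel classifier that accepts a precomputed kernel. When a gene set receives zero weight, its genes drop out of the model, which is one way a kernel combination can end up using fewer gene expression features while pointing to the gene sets that matter.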

