scispace - formally typeset
Open AccessJournal ArticleDOI

PIMKL: Pathway-Induced Multiple Kernel Learning.

TLDR
A team led by María Rodríguez Martínez at IBM Research - Zürich has developed PIMKL, a methodology that exploits prior knowledge and enables the integration of multiple types of data with varying predictive power and produces a molecular signature that enables the interpretation of the results in terms of known biological functions.
Abstract
Reliable identification of molecular biomarkers is essential for accurate patient stratification. While state-of-the-art machine learning approaches for sample classification continue to push boundaries in terms of performance, most of these methods are not able to integrate different data types and lack generalization power, limiting their application in a clinical setting. Furthermore, many methods behave as black boxes, and we have very little understanding about the mechanisms that lead to the prediction. While opaqueness concerning machine behavior might not be a problem in deterministic domains, in health care, providing explanations about the molecular factors and phenotypes that are driving the classification is crucial to build trust in the performance of the predictive system. We propose Pathway-Induced Multiple Kernel Learning (PIMKL), a methodology to reliably classify samples that can also help gain insights into the molecular mechanisms that underlie the classification. PIMKL exploits prior knowledge in the form of a molecular interaction network and annotated gene sets, by optimizing a mixture of pathway-induced kernels using a Multiple Kernel Learning (MKL) algorithm, an approach that has demonstrated excellent performance in different machine learning applications. After optimizing the combination of kernels to predict a specific phenotype, the model provides a stable molecular signature that can be interpreted in the light of the ingested prior knowledge and that can be used in transfer learning tasks.

read more

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI

Integration strategies of multi-omics data for machine learning analysis.

TL;DR: In this article, the authors focus on challenges and existing multi-omics integration strategies by paying special attention to machine learning applications and summarize the most recent data integration methods/ frameworks into five different integration strategies: early, mixed, intermediate, late and hierarchical.
Journal ArticleDOI

Incorporating biological structure into machine learning models in biomedicine.

TL;DR: For machine learning in biomedicine, where sample size is limited and model interpretability is crucial, incorporating prior knowledge in the form of structured data can be particularly useful.
Journal ArticleDOI

Informative top-k class associative rule for cancer biomarker discovery on microarray data

TL;DR: An enhanced associative classification algorithm that integrates microarray data with biological information from gene ontology, KEGG pathways, and protein-protein interactions to generate informative class associative rules is introduced.
Journal ArticleDOI

The Multiple Dimensions of Networks in Cancer: A Perspective

TL;DR: The focus of the perspective is to demonstrate how networks can model the physics, analyse the interactions, and predict the evolution of the multiple processes behind tumour-host encounters across multiple scales.
Journal ArticleDOI

Accurate cancer phenotype prediction with AKLIMATE, a stacked kernel learner integrating multimodal genomic data and pathway knowledge.

TL;DR: AKLIMATE as mentioned in this paper is a kernel-based stacked learner that seamlessly incorporates multi-omics feature data with prior information in the form of pathways for either regression or classification tasks.
References
More filters
Journal ArticleDOI

KEGG: Kyoto Encyclopedia of Genes and Genomes

TL;DR: The Kyoto Encyclopedia of Genes and Genomes (KEGG) as discussed by the authors is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules.
Journal ArticleDOI

Gene Selection for Cancer Classification using Support Vector Machines

TL;DR: In this article, a Support Vector Machine (SVM) method based on recursive feature elimination (RFE) was proposed to select a small subset of genes from broad patterns of gene expression data, recorded on DNA micro-arrays.
Journal ArticleDOI

The Molecular Signatures Database Hallmark Gene Set Collection

TL;DR: A combination of automated approaches and expert curation is used to develop a collection of "hallmark" gene sets, derived from multiple "founder" sets, that conveys a specific biological state or process and displays coherent expression in MSigDB.
Journal ArticleDOI

The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible.

TL;DR: In the latest version 10.5 of STRING, the biggest changes are concerned with data dissemination: the web frontend has been completely redesigned to reduce dependency on outdated browser technologies, and the database can now also be queried from inside the popular Cytoscape software framework.
Related Papers (5)