PIMKL: Pathway-Induced Multiple Kernel Learning.
Matteo Manica, Joris Cadow, Roland Mathis, María Rodríguez Martínez +5 more
- Vol. 5, Iss: 1, pp 8-8
TLDR
A team led by María Rodríguez Martínez at IBM Research - Zürich has developed PIMKL, a methodology that exploits prior knowledge, integrates multiple types of data with varying predictive power, and produces a molecular signature that can be interpreted in terms of known biological functions.
Abstract:
Reliable identification of molecular biomarkers is essential for accurate patient stratification. While state-of-the-art machine learning approaches for sample classification continue to push boundaries in terms of performance, most of these methods are not able to integrate different data types and lack generalization power, limiting their application in a clinical setting. Furthermore, many methods behave as black boxes, and we have very little understanding about the mechanisms that lead to the prediction. While opaqueness concerning machine behavior might not be a problem in deterministic domains, in health care, providing explanations about the molecular factors and phenotypes that are driving the classification is crucial to build trust in the performance of the predictive system. We propose Pathway-Induced Multiple Kernel Learning (PIMKL), a methodology to reliably classify samples that can also help gain insights into the molecular mechanisms that underlie the classification. PIMKL exploits prior knowledge in the form of a molecular interaction network and annotated gene sets, by optimizing a mixture of pathway-induced kernels using a Multiple Kernel Learning (MKL) algorithm, an approach that has demonstrated excellent performance in different machine learning applications. After optimizing the combination of kernels to predict a specific phenotype, the model provides a stable molecular signature that can be interpreted in the light of the ingested prior knowledge and that can be used in transfer learning tasks.
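The idea of combining pathway-induced kernels can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how a kernel is induced from a pathway or which MKL solver is used. The sketch assumes a kernel of the form K = X_p L X_pᵀ, where L is the normalized graph Laplacian of the interaction network restricted to a gene set, and it replaces a full MKL optimizer with a simple kernel-target alignment heuristic. All function names, the toy network, and the two gene sets are hypothetical.

```python
import numpy as np


def normalized_laplacian(adj):
    """Symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2}.

    Isolated nodes (degree 0) keep a diagonal entry of 1 in this sketch.
    """
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(deg[nonzero])
    return np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]


def pathway_kernel(X, adj, gene_idx):
    """Assumed pathway-induced kernel: samples compared through the
    Laplacian of the network restricted to the genes of one pathway."""
    sub_adj = adj[np.ix_(gene_idx, gene_idx)]
    lap = normalized_laplacian(sub_adj)
    Xp = X[:, gene_idx]
    return Xp @ lap @ Xp.T


def alignment_weights(kernels, y):
    """Stand-in for an MKL solver: weight each kernel by its (clipped)
    alignment with the ideal target kernel y y^T, then normalize."""
    target = np.outer(y, y)
    scores = []
    for K in kernels:
        denom = np.linalg.norm(K) * np.linalg.norm(target)
        score = np.sum(K * target) / denom if denom > 0 else 0.0
        scores.append(max(score, 0.0))
    scores = np.array(scores)
    total = scores.sum()
    if total == 0:
        return np.full(len(kernels), 1.0 / len(kernels))
    return scores / total


# Toy data: 8 samples, 6 genes, binary labels (all values are illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 6))
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

# Hypothetical undirected interaction network over the 6 genes.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 2)]:
    adj[i, j] = adj[j, i] = 1.0

# Hypothetical gene sets standing in for annotated pathways.
pathways = {"pathway_A": [0, 1, 2], "pathway_B": [3, 4, 5]}

kernels = [pathway_kernel(X, adj, idx) for idx in pathways.values()]
weights = alignment_weights(kernels, y)
K_combined = sum(w * K for w, K in zip(weights, kernels))

# The per-pathway weights play the role of the molecular signature.
for name, w in zip(pathways, weights):
    print(f"{name}: weight {w:.3f}")
```

In this sketch the weight vector is what would be read as the signature: pathways with larger weights contribute more to the combined kernel. The combined matrix could then be passed to any classifier that accepts a precomputed kernel, such as a support vector machine.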