Showing papers by "Klaus-Robert Müller published in 2019"

PDF

Open Access

Journal Article•DOI•

Unmasking Clever Hans Predictors and Assessing What Machines Really Learn

[...]

Sebastian Lapuschkin¹, Stephan Wäldchen², Alexander Binder³, Grégoire Montavon², Wojciech Samek¹, Klaus-Robert Müller², Klaus-Robert Müller⁴, Klaus-Robert Müller⁵ - Show less +4 more•Institutions (5)

Heinrich Hertz Institute¹, Technical University of Berlin², Singapore University of Technology and Design³, Max Planck Society⁴, Korea University⁵

26 Feb 2019-arXiv: Artificial Intelligence

TL;DR: The authors investigate how these methods approach learning in order to assess the dependability of their decision making and propose a semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines.

...read moreread less

Abstract: Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly "intelligent" behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted, to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Furthermore, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner.

...read moreread less

614 citations

Book•DOI•

Explainable ai – preface

[...]

Wojciech Samek¹, Grégoire Montavon², Andrea Vedaldi³, Lars Kai Hansen, Klaus-Robert Müller² - Show less +1 more•Institutions (3)

Heinrich Hertz Institute¹, Technical University of Berlin², University of Oxford³

01 Jan 2019

541 citations

Posted Content•

Robust and Communication-Efficient Federated Learning from Non-IID Data

[...]

Felix Sattler, Simon Wiedemann, Klaus-Robert Müller, Wojciech Samek

07 Mar 2019-arXiv: Learning

TL;DR: Sparse ternary compression (STC) is proposed, a new compression framework that is specifically designed to meet the requirements of the federated learning environment and advocate for a paradigm shift in federated optimization toward high-frequency low-bitwidth communication, in particular in the bandwidth-constrained learning environments.

...read moreread less

Abstract: Federated Learning allows multiple parties to jointly train a deep learning model on their combined data, without any of the participants having to reveal their local data to a centralized server. This form of privacy-preserving collaborative learning however comes at the cost of a significant communication overhead during training. To address this problem, several compression methods have been proposed in the distributed training literature that can reduce the amount of required communication by up to three orders of magnitude. These existing methods however are only of limited utility in the Federated Learning setting, as they either only compress the upstream communication from the clients to the server (leaving the downstream communication uncompressed) or only perform well under idealized conditions such as iid distribution of the client data, which typically can not be found in Federated Learning. In this work, we propose Sparse Ternary Compression (STC), a new compression framework that is specifically designed to meet the requirements of the Federated Learning environment. Our experiments on four different learning tasks demonstrate that STC distinctively outperforms Federated Averaging in common Federated Learning scenarios where clients either a) hold non-iid data, b) use small batch sizes during training, or where c) the number of clients is large and the participation rate in every communication round is low. We furthermore show that even if the clients hold iid data and use medium sized batches for training, STC still behaves pareto-superior to Federated Averaging in the sense that it achieves fixed target accuracies on our benchmarks within both fewer training iterations and a smaller communication budget.

...read moreread less

529 citations

Book Chapter•DOI•

Layer-Wise Relevance Propagation: An Overview

[...]

Grégoire Montavon¹, Alexander Binder², Sebastian Lapuschkin³, Wojciech Samek³, Klaus-Robert Müller⁴, Klaus-Robert Müller¹, Klaus-Robert Müller⁵ - Show less +3 more•Institutions (5)

Technical University of Berlin¹, Singapore University of Technology and Design², Heinrich Hertz Institute³, Max Planck Society⁴, Korea University⁵

01 Jan 2019

TL;DR: This chapter gives a concise introduction to LRP with a discussion of how to implement propagation rules easily and efficiently, how the propagation procedure can be theoretically justified as a ‘deep Taylor decomposition’, how to choose the propagation rules at each layer to deliver high explanation quality, and how LRP can be extended to handle a variety of machine learning scenarios beyond deep neural networks.

...read moreread less

Abstract: For a machine learning model to generalize well, one needs to ensure that its decisions are supported by meaningful patterns in the input data. A prerequisite is however for the model to be able to explain itself, e.g. by highlighting which input features it uses to support its prediction. Layer-wise Relevance Propagation (LRP) is a technique that brings such explainability and scales to potentially highly complex deep neural networks. It operates by propagating the prediction backward in the neural network, using a set of purposely designed propagation rules. In this chapter, we give a concise introduction to LRP with a discussion of (1) how to implement propagation rules easily and efficiently, (2) how the propagation procedure can be theoretically justified as a ‘deep Taylor decomposition’, (3) how to choose the propagation rules at each layer to deliver high explanation quality, and (4) how LRP can be extended to handle a variety of machine learning scenarios beyond deep neural networks.

...read moreread less

428 citations

Journal Article•DOI•

Unmasking Clever Hans predictors and assessing what machines really learn.

[...]

Sebastian Lapuschkin¹, Stephan Wäldchen², Alexander Binder³, Grégoire Montavon², Wojciech Samek¹, Klaus-Robert Müller⁴, Klaus-Robert Müller⁵, Klaus-Robert Müller² - Show less +4 more•Institutions (5)

Heinrich Hertz Institute¹, Technical University of Berlin², Singapore University of Technology and Design³, Korea University⁴, Max Planck Society⁵

11 Mar 2019-Nature Communications

TL;DR: In this article, the authors apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games, and propose a semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines.

...read moreread less

Abstract: Current learning machines have successfully solved hard application problems, reaching high accuracy and displaying seemingly intelligent behavior. Here we apply recent techniques for explaining decisions of state-of-the-art learning machines and analyze various tasks from computer vision and arcade games. This showcases a spectrum of problem-solving behaviors ranging from naive and short-sighted, to well-informed and strategic. We observe that standard performance evaluation metrics can be oblivious to distinguishing these diverse problem solving behaviors. Furthermore, we propose our semi-automated Spectral Relevance Analysis that provides a practically effective way of characterizing and validating the behavior of nonlinear learning machines. This helps to assess whether a learned model indeed delivers reliably for the problem that it was conceived for. Furthermore, our work intends to add a voice of caution to the ongoing excitement about machine intelligence and pledges to evaluate and judge some of these recent successes in a more nuanced manner. Nonlinear machine learning methods have good predictive ability but the lack of transparency of the algorithms can limit their use. Here the authors investigate how these methods approach learning in order to assess the dependability of their decision making.

...read moreread less

394 citations

Journal Article•DOI•

Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions

[...]

Kristof T. Schütt¹, Michael Gastegger¹, Alexandre Tkatchenko², Klaus-Robert Müller¹, Klaus-Robert Müller³, Klaus-Robert Müller⁴, Reinhard J. Maurer⁵ - Show less +3 more•Institutions (5)

Technical University of Berlin¹, University of Luxembourg², Max Planck Society³, Korea University⁴, University of Warwick⁵

15 Nov 2019-Nature Communications

TL;DR: A deep learning framework for the prediction of the quantum mechanical wavefunction in a local basis of atomic orbitals from which all other ground-state properties can be derived and captures quantum mechanics in an analytically differentiable representation is presented.

...read moreread less

Abstract: Machine learning advances chemistry and materials science by enabling large-scale exploration of chemical space based on quantum chemical calculations. While these models supply fast and accurate predictions of atomistic chemical properties, they do not explicitly capture the electronic degrees of freedom of a molecule, which limits their applicability for reactive chemistry and chemical analysis. Here we present a deep learning framework for the prediction of the quantum mechanical wavefunction in a local basis of atomic orbitals from which all other ground-state properties can be derived. This approach retains full access to the electronic structure via the wavefunction at force-field-like efficiency and captures quantum mechanics in an analytically differentiable representation. On several examples, we demonstrate that this opens promising avenues to perform inverse design of molecular structures for targeting electronic property optimisation and a clear path towards increased synergy of machine learning and quantum chemistry. Machine learning models can accurately predict atomistic chemical properties but do not provide access to the molecular electronic structure. Here the authors use a deep learning approach to predict the quantum mechanical wavefunction at high efficiency from which other ground-state properties can be derived.

...read moreread less

334 citations

Posted Content•

Clustered Federated Learning: Model-Agnostic Distributed Multi-Task Optimization under Privacy Constraints

[...]

Felix Sattler, Klaus-Robert Müller, Wojciech Samek

04 Oct 2019-arXiv: Learning

TL;DR: Closed FL (CFL), a novel federated multitask learning (FMTL) framework, which exploits geometric properties of the FL loss surface to group the client population into clusters with jointly trainable data distributions, and comes with strong mathematical guarantees on the clustering quality.

...read moreread less

Abstract: Federated Learning (FL) is currently the most widely adopted framework for collaborative training of (deep) machine learning models under privacy constraints. Albeit it's popularity, it has been observed that Federated Learning yields suboptimal results if the local clients' data distributions diverge. To address this issue, we present Clustered Federated Learning (CFL), a novel Federated Multi-Task Learning (FMTL) framework, which exploits geometric properties of the FL loss surface, to group the client population into clusters with jointly trainable data distributions. In contrast to existing FMTL approaches, CFL does not require any modifications to the FL communication protocol to be made, is applicable to general non-convex objectives (in particular deep neural networks) and comes with strong mathematical guarantees on the clustering quality. CFL is flexible enough to handle client populations that vary over time and can be implemented in a privacy preserving way. As clustering is only performed after Federated Learning has converged to a stationary point, CFL can be viewed as a post-processing method that will always achieve greater or equal performance than conventional FL by allowing clients to arrive at more specialized models. We verify our theoretical analysis in experiments with deep convolutional and recurrent neural networks on commonly used Federated Learning datasets.

...read moreread less

288 citations

Journal Article•DOI•

SchNetPack: A Deep Learning Toolbox For Atomistic Systems

[...]

Kristof T. Schütt¹, Pan Kessel¹, Michael Gastegger¹, Kim A. Nicoli¹, Alexandre Tkatchenko², Klaus-Robert Müller¹, Klaus-Robert Müller³, Klaus-Robert Müller⁴ - Show less +4 more•Institutions (4)

Technical University of Berlin¹, University of Luxembourg², Korea University³, Max Planck Society⁴

08 Jan 2019-Journal of Chemical Theory and Computation

TL;DR: SchNetPack is a toolbox for the development and application of deep neural networks that predict potential energy surfaces and other quantum-chemical properties of molecules and materials that contains basic building blocks of atomistic neural networks, manages their training, and provides simple access to common benchmark datasets.

...read moreread less

Abstract: SchNetPack is a toolbox for the development and application of deep neural networks that predict potential energy surfaces and other quantum-chemical properties of molecules and materials. It contains basic building blocks of atomistic neural networks, manages their training, and provides simple access to common benchmark datasets. This allows for an easy implementation and evaluation of new models. For now, SchNetPack includes implementations of (weighted) atom-centered symmetry functions and the deep tensor neural network SchNet, as well as ready-to-use scripts that allow one to train these models on molecule and material datasets. Based on the PyTorch deep learning framework, SchNetPack allows one to efficiently apply the neural networks to large datasets with millions of reference calculations, as well as parallelize the model across multiple GPUs. Finally, SchNetPack provides an interface to the Atomic Simulation Environment in order to make trained models easily accessible to researchers that are no...

...read moreread less

288 citations

Book Chapter•DOI•

Towards Explainable Artificial Intelligence

[...]

Wojciech Samek¹, Klaus-Robert Müller², Klaus-Robert Müller³, Klaus-Robert Müller⁴•Institutions (4)

Heinrich Hertz Institute¹, Technical University of Berlin², Max Planck Society³, Korea University⁴

10 Sep 2019

TL;DR: This introductory paper presents recent developments and applications in deep learning, and makes a plea for a wider use of explainable learning algorithms in practice.

...read moreread less

Abstract: In recent years, machine learning (ML) has become a key enabling technology for the sciences and industry. Especially through improvements in methodology, the availability of large databases and increased computational power, today’s ML algorithms are able to achieve excellent performance (at times even exceeding the human level) on an increasing number of complex tasks. Deep learning models are at the forefront of this development. However, due to their nested non-linear structure, these powerful models have been generally considered “black boxes”, not providing any information about what exactly makes them arrive at their predictions. Since in many applications, e.g., in the medical domain, such lack of transparency may be not acceptable, the development of methods for visualizing, explaining and interpreting deep learning models has recently attracted increasing attention. This introductory paper presents recent developments and applications in this field and makes a plea for a wider use of explainable learning algorithms in practice.

...read moreread less

287 citations

Posted Content•

Deep Semi-Supervised Anomaly Detection

[...]

Lukas Ruff¹, Robert A. Vandermeulen², Nico Görnitz¹, Alexander Binder³, Emmanuel Müller⁴, Klaus-Robert Müller¹, Marius Kloft² - Show less +3 more•Institutions (4)

Technical University of Berlin¹, Kaiserslautern University of Technology², Singapore University of Technology and Design³, University of Bonn⁴

06 Jun 2019-arXiv: Learning

TL;DR: This work presents Deep SAD, an end-to-end deep methodology for general semi-supervised anomaly detection, and introduces an information-theoretic framework for deep anomaly detection based on the idea that the entropy of the latent distribution for normal data should be lower than the entropy the anomalous distribution, which can serve as a theoretical interpretation for the method.

...read moreread less

Abstract: Deep approaches to anomaly detection have recently shown promising results over shallow methods on large and complex datasets. Typically anomaly detection is treated as an unsupervised learning problem. In practice however, one may have---in addition to a large set of unlabeled samples---access to a small pool of labeled samples, e.g. a subset verified by some domain expert as being normal or anomalous. Semi-supervised approaches to anomaly detection aim to utilize such labeled samples, but most proposed methods are limited to merely including labeled normal samples. Only a few methods take advantage of labeled anomalies, with existing deep approaches being domain-specific. In this work we present Deep SAD, an end-to-end deep methodology for general semi-supervised anomaly detection. We further introduce an information-theoretic framework for deep anomaly detection based on the idea that the entropy of the latent distribution for normal data should be lower than the entropy of the anomalous distribution, which can serve as a theoretical interpretation for our method. In extensive experiments on MNIST, Fashion-MNIST, and CIFAR-10, along with other anomaly detection benchmark datasets, we demonstrate that our method is on par or outperforms shallow, hybrid, and deep competitors, yielding appreciable performance improvements even when provided with only little labeled data.

...read moreread less

227 citations

Journal Article•

iNNvestigate Neural Networks

[...]

Maximilian Alber¹, Sebastian Lapuschkin², Philipp Seegerer¹, Miriam Hägele¹, Kristof T. Schütt¹, Grégoire Montavon¹, Wojciech Samek², Klaus-Robert Müller, Sven Dähne¹, Pieter-Jan Kindermans¹ - Show less +6 more•Institutions (2)

Technical University of Berlin¹, Heinrich Hertz Institute²

01 May 2019-Journal of Machine Learning Research

TL;DR: iNNvestigate as mentioned in this paper provides a common interface and out-of-the-box implementation for many analysis methods, including the reference implementation for PatternNet and PatternAttribution as well as for LRP-methods.

...read moreread less

Abstract: In recent years, deep neural networks have revolutionized many application domains of machine learning and are key components of many critical decision or predictive processes. Therefore, it is crucial that domain specialists can understand and analyze actions and pre- dictions, even of the most complex neural network architectures. Despite these arguments neural networks are often treated as black boxes. In the attempt to alleviate this short- coming many analysis methods were proposed, yet the lack of reference implementations often makes a systematic comparison between the methods a major effort. The presented library iNNvestigate addresses this by providing a common interface and out-of-the- box implementation for many analysis methods, including the reference implementation for PatternNet and PatternAttribution as well as for LRP-methods. To demonstrate the versatility of iNNvestigate, we provide an analysis of image classifications for variety of state-of-the-art neural network architectures.

...read moreread less

Journal Article•DOI•

Machine learning for molecular simulation

[...]

Frank Noé¹, Alexandre Tkatchenko², Klaus-Robert Müller³, Cecilia Clementi⁴•Institutions (4)

Free University of Berlin¹, University of Luxembourg², Technical University of Berlin³, Rice University⁴

07 Nov 2019-arXiv: Chemical Physics

TL;DR: In this article, a review of deep neural networks for molecular simulation is presented, with particular focus on (deep) neural network for the prediction of quantum-mechanical energies and forces, coarse-grained molecular dynamics, the extraction of free energy surfaces and kinetics and generative network approaches to sample molecular equilibrium structures and compute thermodynamics.

...read moreread less

Abstract: Machine learning (ML) is transforming all areas of science. The complex and time-consuming calculations in molecular simulations are particularly suitable for a machine learning revolution and have already been profoundly impacted by the application of existing ML methods. Here we review recent ML methods for molecular simulation, with particular focus on (deep) neural networks for the prediction of quantum-mechanical energies and forces, coarse-grained molecular dynamics, the extraction of free energy surfaces and kinetics and generative network approaches to sample molecular equilibrium structures and compute thermodynamics. To explain these methods and illustrate open methodological problems, we review some important principles of molecular physics and describe how they can be incorporated into machine learning structures. Finally, we identify and describe a list of open challenges for the interface between ML and molecular simulation.

...read moreread less

Posted Content•

Explanations can be manipulated and geometry is to blame

[...]

Ann-Kathrin Dombrowski¹, Maximilian Alber², Christopher J. Anders², Marcel Ackermann, Klaus-Robert Müller³, Pan Kessel² - Show less +2 more•Institutions (3)

RWTH Aachen University¹, Technical University of Berlin², Max Planck Society³

19 Jun 2019-arXiv: Machine Learning

TL;DR: It is shown that explanations can be manipulated arbitrarily by applying visually hardly perceptible perturbations to the input that keep the network's output approximately constant, and theoretically this phenomenon can be related to certain geometrical properties of neural networks.

...read moreread less

Abstract: Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods which is disconcerting for both of these purposes. Namely, we show that explanations can be manipulated arbitrarily by applying visually hardly perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be related to certain geometrical properties of neural networks. This allows us to derive an upper bound on the susceptibility of explanations to manipulations. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.

...read moreread less

Journal Article•DOI•

sGDML: Constructing accurate and data efficient molecular force fields using machine learning

[...]

Stefan Chmiela¹, Huziel E. Sauceda², Igor Poltavsky³, Klaus-Robert Müller¹, Klaus-Robert Müller⁴, Klaus-Robert Müller⁵, Alexandre Tkatchenko³ - Show less +3 more•Institutions (5)

Technical University of Berlin¹, Fritz Haber Institute of the Max Planck Society², University of Luxembourg³, Max Planck Society⁴, Korea University⁵

01 Jul 2019-Computer Physics Communications

TL;DR: A Python software package is introduced to reconstruct and evaluate custom sGDML force fields (FFs), without requiring in-depth knowledge about the details of the model, in an effort to make this novel machine learning approach accessible to broad practitioners.

...read moreread less

Proceedings Article•DOI•

Sparse Binary Compression: Towards Distributed Deep Learning with minimal Communication

[...]

Felix Sattler¹, Simon Wiedemann¹, Klaus-Robert Müller², Wojciech Samek¹•Institutions (2)

Heinrich Hertz Institute¹, Technical University of Berlin²

14 Jul 2019

TL;DR: Sparse Binary Compression (SBC) as mentioned in this paper combines existing techniques of communication delay and gradient sparsification with a novel binarization method and optimal weight update encoding to push compression gains to new limits.

...read moreread less

Abstract: Currently, progressively larger deep neural networks are trained on ever growing data corpora. In result, distributed training schemes are becoming increasingly relevant. A major issue in distributed training is the limited communication bandwidth between contributing nodes or prohibitive communication cost in general. To mitigate this problem we propose Sparse Binary Compression (SBC), a compression framework that allows for a drastic reduction of communication cost for distributed training. SBC combines existing techniques of communication delay and gradient sparsification with a novel binarization method and optimal weight update encoding to push compression gains to new limits. By doing so, our method also allows us to smoothly trade-off gradient sparsity and temporal sparsity to adapt to the requirements of the learning task. Our experiments show, that SBC can reduce the upstream communication on a variety of convolutional and recurrent neural network architectures by more than four orders of magnitude without significantly harming the convergence speed in terms of forward-backward passes. For instance, we can train ResNet50 on ImageNet in the same number of iterations to the baseline accuracy, using ×3531 less bits or train it to a 1% lower accuracy using ×37208 less bits. In the latter case, the total upstream communication required is cut from 125 terabytes to 3.35 gigabytes for every participating client.

...read moreread less

Book Chapter•DOI•

Towards Explainable Artificial Intelligence

[...]

Wojciech Samek¹, Klaus-Robert Müller²•Institutions (2)

Fraunhofer Society¹, Technical University of Berlin²

26 Sep 2019-arXiv: Artificial Intelligence

TL;DR: In this article, the authors present recent developments and applications in this field and make a plea for a wider use of explainable learning algorithms in practice, and make an introductory paper on explainable deep learning models.

...read moreread less

Abstract: In recent years, machine learning (ML) has become a key enabling technology for the sciences and industry. Especially through improvements in methodology, the availability of large databases and increased computational power, today's ML algorithms are able to achieve excellent performance (at times even exceeding the human level) on an increasing number of complex tasks. Deep learning models are at the forefront of this development. However, due to their nested non-linear structure, these powerful models have been generally considered "black boxes", not providing any information about what exactly makes them arrive at their predictions. Since in many applications, e.g., in the medical domain, such lack of transparency may be not acceptable, the development of methods for visualizing, explaining and interpreting deep learning models has recently attracted increasing attention. This introductory paper presents recent developments and applications in this field and makes a plea for a wider use of explainable learning algorithms in practice.

...read moreread less

Posted Content•

Exploring Chemical Compound Space with Quantum-Based Machine Learning

[...]

O. Anatole von Lilienfeld¹, Klaus-Robert Müller², Klaus-Robert Müller³, Klaus-Robert Müller⁴, Alexandre Tkatchenko⁵ - Show less +1 more•Institutions (5)

University of Basel¹, Max Planck Society², Technical University of Berlin³, Korea University⁴, University of Luxembourg⁵

22 Nov 2019-arXiv: Chemical Physics

TL;DR: It is argued that significant progress in the exploration and understanding of chemical compound space can be made through a systematic combination of rigorous physical theories, comprehensive synthetic data sets of microscopic and macroscopic properties, and modern machine-learning methods that account for physical and chemical knowledge.

...read moreread less

Abstract: Rational design of compounds with specific properties requires conceptual understanding and fast evaluation of molecular properties throughout chemical compound space (CCS) -- the huge set of all potentially stable molecules. Recent advances in combining quantum mechanical (QM) calculations with machine learning (ML) provide powerful tools for exploring wide swaths of CCS. We present our perspective on this exciting and quickly developing field by discussing key advances in the development and applications of QM-based ML methods to diverse compounds and properties and outlining the challenges ahead. We argue that significant progress in the exploration and understanding of CCS can be made through a systematic combination of rigorous physical theories, comprehensive synthetic datasets of microscopic and macroscopic properties, and modern ML methods that account for physical and chemical knowledge.

...read moreread less

Journal Article•DOI•

Machine learning analysis of DNA methylation profiles distinguishes primary lung squamous cell carcinomas from head and neck metastases

[...]

Philipp Jurmeister¹, Philipp Jurmeister², Philipp Jurmeister³, Michael Bockmayr³, Michael Bockmayr⁴, Philipp Seegerer⁵, Teresa Bockmayr³, Denise Treue³, Grégoire Montavon⁵, Claudia Vollbrecht³, Claudia Vollbrecht², Alexander Arnold³, Daniel Teichmann³, Keno K. Bressem³, Ulrich Schüller⁴, Maximilian von Laffert³, Klaus-Robert Müller⁶, Klaus-Robert Müller⁵, Klaus-Robert Müller⁷, David Capper², David Capper³, Frederick Klauschen², Frederick Klauschen³ - Show less +19 more•Institutions (7)

Charité¹, German Cancer Research Center², Humboldt University of Berlin³, University of Hamburg⁴, Technical University of Berlin⁵, Max Planck Society⁶, Korea University⁷

11 Sep 2019-Science Translational Medicine

TL;DR: A machine learning algorithm that exploits the differential DNA methylation observed in primary LUSC and metastasized HNSC tumors in the lung was able to discriminate between these two tumor types with high accuracy across multiple cohorts, suggesting its potential as a clinical diagnostic tool.

...read moreread less

Abstract: Head and neck squamous cell carcinoma (HNSC) patients are at risk of suffering from both pulmonary metastases or a second squamous cell carcinoma of the lung (LUSC). Differentiating pulmonary metastases from primary lung cancers is of high clinical importance, but not possible in most cases with current diagnostics. To address this, we performed DNA methylation profiling of primary tumors and trained three different machine learning methods to distinguish metastatic HNSC from primary LUSC. We developed an artificial neural network that correctly classified 96.4% of the cases in a validation cohort of 279 patients with HNSC and LUSC as well as normal lung controls, outperforming support vector machines (95.7%) and random forests (87.8%). Prediction accuracies of more than 99% were achieved for 92.1% (neural network), 90% (support vector machine), and 43% (random forest) of these cases by applying thresholds to the resulting probability scores and excluding samples with low confidence. As independent clinical validation of the approach, we analyzed a series of 51 patients with a history of HNSC and a second lung tumor, demonstrating the correct classifications based on clinicopathological properties. In summary, our approach may facilitate the reliable diagnostic differentiation of pulmonary metastases of HNSC from primary LUSC to guide therapeutic decisions.

...read moreread less

Journal Article•DOI•

Molecular force fields with gradient-domain machine learning: Construction and application to dynamics of small molecules with coupled cluster forces.

[...]

Huziel E. Sauceda¹, Stefan Chmiela², Igor Poltavsky³, Klaus-Robert Müller², Klaus-Robert Müller⁴, Klaus-Robert Müller⁵, Alexandre Tkatchenko³ - Show less +3 more•Institutions (5)

Fritz Haber Institute of the Max Planck Society¹, Technical University of Berlin², University of Luxembourg³, Max Planck Society⁴, Korea University⁵

21 Mar 2019-Journal of Chemical Physics

TL;DR: The flexible nature of the sGDML model recovers local and non-local electronic interactions without imposing any restriction on the nature of interatomic potentials, and yields new qualitative insights into dynamics and spectroscopy of small molecules close to spectroscopic accuracy.

...read moreread less

Abstract: We present the construction of molecular force fields for small molecules (less than 25 atoms) using the recently developed symmetrized gradient-domain machine learning (sGDML) approach [Chmiela et al., Nat. Commun. 9, 3887 (2018) and Chmiela et al., Sci. Adv. 3, e1603015 (2017)]. This approach is able to accurately reconstruct complex high-dimensional potential-energy surfaces from just a few 100s of molecular conformations extracted from ab initio molecular dynamics trajectories. The data efficiency of the sGDML approach implies that atomic forces for these conformations can be computed with high-level wavefunction-based approaches, such as the “gold standard” coupled-cluster theory with single, double and perturbative triple excitations [CCSD(T)]. We demonstrate that the flexible nature of the sGDML model recovers local and non-local electronic interactions (e.g., H-bonding, proton transfer, lone pairs, changes in hybridization states, steric repulsion, and n → π* interactions) without imposing any restriction on the nature of interatomic potentials. The analysis of sGDML molecular dynamics trajectories yields new qualitative insights into dynamics and spectroscopy of small molecules close to spectroscopic accuracy.

...read moreread less

Journal Article•DOI•

Analyzing Neuroimaging Data Through Recurrent Deep Learning Models.

[...]

Armin W. Thomas¹, Hauke R. Heekeren², Klaus-Robert Müller¹, Klaus-Robert Müller³, Klaus-Robert Müller⁴, Wojciech Samek⁵ - Show less +2 more•Institutions (5)

Technical University of Berlin¹, Free University of Berlin², Korea University³, Max Planck Society⁴, Heinrich Hertz Institute⁵

10 Dec 2019-Frontiers in Neuroscience

TL;DR: DeepLight outperforms conventional approaches of uni- and multivariate fMRI analysis in decoding the cognitive states and in identifying the physiologically appropriate brain regions associated with these states, and is demonstrated to have the versatility to apply to a large fMRI dataset of the Human Connectome Project.

...read moreread less

Abstract: The application of deep learning (DL) models to neuroimaging data poses several challenges, due to the high dimensionality, low sample size and complex temporo-spatial dependency structure of these data. Even further, DL models often act as as black boxes, impeding insight into the association of cognitive state and brain activity. To approach these challenges, we introduce the DeepLight framework, which utilizes long short-term memory (LSTM) based DL models to analyze whole-brain functional Magnetic Resonance Imaging (fMRI) data. To decode a cognitive state (e.g., seeing the image of a house), DeepLight separates an fMRI volume into a sequence of axial brain slices, which is then sequentially processed by an LSTM. To maintain interpretability, DeepLight adapts the layer-wise relevance propagation (LRP) technique. Thereby, decomposing its decoding decision into the contributions of the single input voxels to this decision. Importantly, the decomposition is performed on the level of single fMRI volumes, enabling DeepLight to study the associations between cognitive state and brain activity on several levels of data granularity, from the level of the group down to the level of single time points. To demonstrate the versatility of DeepLight, we apply it to a large fMRI dataset of the Human Connectome Project. We show that DeepLight outperforms conventional approaches of uni- and multivariate fMRI analysis in decoding the cognitive states and in identifying the physiologically appropriate brain regions associated with these states. We further demonstrate DeepLight’s ability to study the fine-grained temporo-spatial variability of brain activity over sequences of single fMRI samples.

...read moreread less

Journal Article•DOI•

Explaining the unique nature of individual gait patterns with deep learning

[...]

Fabian Horst¹, Sebastian Lapuschkin², Wojciech Samek², Klaus-Robert Müller³, Klaus-Robert Müller⁴, Klaus-Robert Müller⁵, Wolfgang I. Schöllhorn¹ - Show less +3 more•Institutions (5)

University of Mainz¹, Heinrich Hertz Institute², Technical University of Berlin³, Max Planck Society⁴, Korea University⁵

20 Feb 2019-Scientific Reports

TL;DR: In this paper, the uniqueness of individual gait patterns in clinical biomechanics using DNNs was studied using Layer-Wise Relevance Propagation (LRP) technique, which reliably demonstrates which variables at what time windows of the gait cycle are most relevant for the characterisation of gait pattern from a certain individual.

...read moreread less

Abstract: Machine learning (ML) techniques such as (deep) artificial neural networks (DNN) are solving very successfully a plethora of tasks and provide new predictive models for complex physical, chemical, biological and social systems. However, in most cases this comes with the disadvantage of acting as a black box, rarely providing information about what made them arrive at a particular prediction. This black box aspect of ML techniques can be problematic especially in medical diagnoses, so far hampering a clinical acceptance. The present paper studies the uniqueness of individual gait patterns in clinical biomechanics using DNNs. By attributing portions of the model predictions back to the input variables (ground reaction forces and full-body joint angles), the Layer-Wise Relevance Propagation (LRP) technique reliably demonstrates which variables at what time windows of the gait cycle are most relevant for the characterisation of gait patterns from a certain individual. By measuring the time-resolved contribution of each input variable to the prediction of ML techniques such as DNNs, our method describes the first general framework that enables to understand and interpret non-linear ML methods in (biomechanical) gait analysis and thereby supplies a powerful tool for analysis, diagnosis and treatment of human gait.

...read moreread less

Journal Article•DOI•

A large scale screening study with a SMR-based BCI: Categorization of BCI users and differences in their SMR activity.

[...]

Claudia Sannelli¹, Carmen Vidaurre¹, Carmen Vidaurre², Klaus-Robert Müller¹, Klaus-Robert Müller³, Klaus-Robert Müller⁴, Benjamin Blankertz¹ - Show less +3 more•Institutions (4)

Technical University of Berlin¹, Universidad Pública de Navarra², Max Planck Society³, Korea University⁴

25 Jan 2019-PLOS ONE

TL;DR: Data from a large-scale screening study conducted on 80 novice participants with the Berlin BCI system and its standard machine-learning approach were investigated to understand this phenomenon in Sensorimotor Rhythm (SMR) based BCIs.

...read moreread less

Abstract: Brain-Computer Interfaces (BCIs) are inefficient for a non-negligible part of the population, estimated around 25%. To understand this phenomenon in Sensorimotor Rhythm (SMR) based BCIs, data from a large-scale screening study conducted on 80 novice participants with the Berlin BCI system and its standard machine-learning approach were investigated. Each participant performed one BCI session with resting state Encephalography, Motor Observation, Motor Execution and Motor Imagery recordings and 128 electrodes. A significant portion of the participants (40%) could not achieve BCI control (feedback performance > 70%). Based on the performance of the calibration and feedback runs, BCI users were stratified in three groups. Analyses directed to detect and elucidate the differences in the SMR activity of these groups were performed. Statistics on reactive frequencies, task prevalence and classification results are reported. Based on their SMR activity, also a systematic list of potential reasons leading to performance drops and thus hints for possible improvements of BCI experimental design are given. The categorization of BCI users has several advantages, allowing researchers 1) to select subjects for further analyses as well as for testing new BCI paradigms or algorithms, 2) to adopt a better subject-dependent training strategy and 3) easier comparisons between different studies.

...read moreread less

Proceedings Article•

Explanations can be manipulated and geometry is to blame

[...]

Ann-Kathrin Dombrowski¹, Maximilian Alber², Christopher J. Anders², Marcel Ackermann, Klaus-Robert Müller³, Pan Kessel² - Show less +2 more•Institutions (3)

RWTH Aachen University¹, Technical University of Berlin², Max Planck Society³

19 Jun 2019

TL;DR: In this paper, the authors show that explanations can be manipulated arbitrarily by applying visually hardly perceptible perturbations to the input that keep the network's output approximately constant, which is disconcerting for both trust and interpretability.

...read moreread less

Proceedings Article•DOI•

Evaluating Recurrent Neural Network Explanations

[...]

Leila Arras¹, Ahmed Osman¹, Klaus-Robert Müller², Wojciech Samek¹•Institutions (2)

Heinrich Hertz Institute¹, Korea University²

01 Apr 2019

TL;DR: Using the method that performed best in the authors' experiments, it is shown how specific linguistic phenomena such as the negation in sentiment analysis reflect in terms of relevance patterns, and how the relevance visualization can help to understand the misclassification of individual samples.

...read moreread less

Abstract: Recently, several methods have been proposed to explain the predictions of recurrent neural networks (RNNs), in particular of LSTMs. The goal of these methods is to understand the network’s decisions by assigning to each input variable, e.g., a word, a relevance indicating to which extent it contributed to a particular prediction. In previous works, some of these methods were not yet compared to one another, or were evaluated only qualitatively. We close this gap by systematically and quantitatively comparing these methods in different settings, namely (1) a toy arithmetic task which we use as a sanity check, (2) a five-class sentiment prediction of movie reviews, and besides (3) we explore the usefulness of word relevances to build sentence-level representations. Lastly, using the method that performed best in our experiments, we show how specific linguistic phenomena such as the negation in sentiment analysis reflect in terms of relevance patterns, and how the relevance visualization can help to understand the misclassification of individual samples.

...read moreread less

Posted Content•

From Clustering to Cluster Explanations via Neural Networks.

[...]

Jacob R. Kauffmann, Malte Esders, Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller - Show less +1 more

18 Jun 2019-arXiv: Learning

TL;DR: A new framework is proposed that can, for the first time, explain cluster assignments in terms of input features in a comprehensive manner, based on the novel theoretical insight that clustering models can be rewritten as neural networks, or 'neuralized'.

...read moreread less

Abstract: A wealth of algorithms have been developed to extract natural cluster structure in data. Identifying this structure is desirable but not always sufficient: We may also want to understand why the data points have been assigned to a given cluster. Clustering algorithms do not offer a systematic answer to this simple question. Hence we propose a new framework that can, for the first time, explain cluster assignments in terms of input features in a comprehensive manner. It is based on the novel theoretical insight that clustering models can be rewritten as neural networks, or 'neuralized'. Predictions of the obtained networks can then be quickly and accurately attributed to the input features. Several showcases demonstrate the ability of our method to assess the quality of learned clusters and to extract novel insights from the analyzed data and representations.

...read moreread less

Journal Article•DOI•

Molecular Force Fields with Gradient-Domain Machine Learning: Construction and Application to Dynamics of Small Molecules with Coupled Cluster Forces

[...]

Huziel E. Sauceda¹, Stefan Chmiela², Igor Poltavsky³, Klaus-Robert Müller², Klaus-Robert Müller⁴, Klaus-Robert Müller⁵, Alexandre Tkatchenko³ - Show less +3 more•Institutions (5)

Fritz Haber Institute of the Max Planck Society¹, Technical University of Berlin², University of Luxembourg³, Max Planck Society⁴, Korea University⁵

19 Jan 2019-arXiv: Chemical Physics

TL;DR: Chmiela et al. as mentioned in this paper used symmetrized gradient-domain machine learning (sGDML) to reconstruct complex high-dimensional potential energy surfaces from a few 100s of molecular conformations extracted from ab initio molecular dynamics trajectories.

...read moreread less

Abstract: We present the construction of molecular force fields for small molecules (less than 25 atoms) using the recently developed symmetrized gradient-domain machine learning (sGDML) approach [Chmiela et al., Nat. Commun. 9, 3887 (2018); Sci. Adv. 3, e1603015 (2017)]. This approach is able to accurately reconstruct complex high-dimensional potential-energy surfaces from just a few 100s of molecular conformations extracted from ab initio molecular dynamics trajectories. The data efficiency of the sGDML approach implies that atomic forces for these conformations can be computed with high-level wavefunction-based approaches, such as the "gold standard" CCSD(T) method. We demonstrate that the flexible nature of the sGDML model recovers local and non-local electronic interactions (e.g. H-bonding, proton transfer, lone pairs, changes in hybridization states, steric repulsion and $n\to\pi^*$ interactions) without imposing any restriction on the nature of interatomic potentials. The analysis of sGDML molecular dynamics trajectories yields new qualitative insights into dynamics and spectroscopy of small molecules close to spectroscopic accuracy.

...read moreread less

Book Chapter•DOI•

Explaining and Interpreting LSTMs

[...]

Leila Arras¹, Jose A. Arjona-Medina², Michael Widrich², Grégoire Montavon³, Michael Gillhofer², Klaus-Robert Müller⁴, Klaus-Robert Müller³, Klaus-Robert Müller⁵, Sepp Hochreiter², Wojciech Samek¹ - Show less +6 more•Institutions (5)

Heinrich Hertz Institute¹, Johannes Kepler University of Linz², Technical University of Berlin³, Korea University⁴, Max Planck Society⁵

01 Jan 2019

TL;DR: This chapter explores how to adapt the Layer-wise Relevance Propagation technique used for explaining the predictions of feed-forward networks to the LSTM architecture used for sequential data modeling and forecasting.

...read moreread less

Abstract: While neural networks have acted as a strong unifying force in the design of modern AI systems, the neural network architectures themselves remain highly heterogeneous due to the variety of tasks to be solved. In this chapter, we explore how to adapt the Layer-wise Relevance Propagation (LRP) technique used for explaining the predictions of feed-forward networks to the LSTM architecture used for sequential data modeling and forecasting. The special accumulators and gated interactions present in the LSTM require both a new propagation scheme and an extension of the underlying theoretical framework to deliver faithful explanations.

...read moreread less

Posted Content•

Finding and Removing Clever Hans: Using Explanation Methods to Debug and Improve Deep Models

[...]

Christopher J. Anders¹, Leander Weber², Leander Weber¹, David Neumann², Wojciech Samek², Klaus-Robert Müller, Sebastian Lapuschkin² - Show less +3 more•Institutions (2)

Technical University of Berlin¹, Heinrich Hertz Institute²

22 Dec 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper provides a comprehensive analysis framework based on a scalable statistical analysis of attributions from explanation methods for large data corpora, and proposes several approaches denoted as Class Artifact Compensation (ClArC), which are able to effectively and significantly reduce a model's CH behavior.

...read moreread less

Abstract: Contemporary learning models for computer vision are typically trained on very large (benchmark) datasets with millions of samples. These may, however, contain biases, artifacts, or errors that have gone unnoticed and are exploitable by the model. In the worst case, the trained model does not learn a valid and generalizable strategy to solve the problem it was trained for, and becomes a 'Clever-Hans' (CH) predictor that bases its decisions on spurious correlations in the training data, potentially yielding an unrepresentative or unfair, and possibly even hazardous predictor. In this paper, we contribute by providing a comprehensive analysis framework based on a scalable statistical analysis of attributions from explanation methods for large data corpora. Based on a recent technique - Spectral Relevance Analysis - we propose the following technical contributions and resulting findings: (a) a scalable quantification of artifactual and poisoned classes where the machine learning models under study exhibit CH behavior, (b) several approaches denoted as Class Artifact Compensation (ClArC), which are able to effectively and significantly reduce a model's CH behavior. I.e., we are able to un-Hans models trained on (poisoned) datasets, such as the popular ImageNet data corpus. We demonstrate that ClArC, defined in a simple theoretical framework, may be implemented as part of a Neural Network's training or fine-tuning process, or in a post-hoc manner by injecting additional layers, preventing any further propagation of undesired CH features, into the network architecture. Using our proposed methods, we provide qualitative and quantitative analyses of the biases and artifacts in various datasets. We demonstrate that these insights can give rise to improved, more representative and fairer models operating on implicitly cleaned data corpora.

...read moreread less

Journal Article•DOI•

A new blind source separation framework for signal analysis and artifact rejection in functional Near-Infrared Spectroscopy.

[...]

Alexander von Lühmann¹, Zois Boukouvalas², Zois Boukouvalas³, Klaus-Robert Müller⁴, Klaus-Robert Müller⁵, Tulay Adali⁶ - Show less +2 more•Institutions (6)

Boston University¹, University of Maryland, College Park², American University³, Max Planck Society⁴, Korea University⁵, University of Maryland, Baltimore County⁶

15 Oct 2019-NeuroImage

TL;DR: The framework and methods presented can serve as an introduction to a new type of multivariate methods for the analysis of fNIRS signals and as a blueprint for artifact rejection in complex environments beyond the applied paradigm.

...read moreread less

Book Chapter•DOI•

Explaining and Interpreting LSTMs

[...]

Leila Arras¹, Jose A. Arjona-Medina¹, Michael Widrich², Grégoire Montavon², Michael Gillhofer², Klaus-Robert Müller², Sepp Hochreiter³, Wojciech Samek³ - Show less +4 more•Institutions (3)

Fraunhofer Society¹, Johannes Kepler University of Linz², Technical University of Berlin³

25 Sep 2019-arXiv: Learning

TL;DR: In this article, the authors explore how to adapt the Layer-wise Relevance Propagation (LRP) technique used for explaining the predictions of feed-forward networks to the LSTM architecture used for sequential data modeling and forecasting.

...read moreread less