
Showing papers by "Klaus-Robert Müller published in 2018"


Journal ArticleDOI
TL;DR: The second part of the tutorial focuses on the recently proposed layer-wise relevance propagation (LRP) technique, for which the authors provide theory, recommendations, and tricks to make the most efficient use of it on real data.

1,939 citations


Journal ArticleDOI
TL;DR: SchNet as mentioned in this paper is a deep learning architecture specifically designed to model atomistic systems by making use of continuous-filter convolutional layers, where the model learns chemically plausible embeddings of atom types across the periodic table.
Abstract: Deep learning has led to a paradigm shift in artificial intelligence, including web, text, and image search, speech recognition, as well as bioinformatics, with growing impact in chemical physics. Machine learning in general, and deep learning in particular, are ideally suited for representing quantum-mechanical interactions, enabling us to model nonlinear potential-energy surfaces or enhancing the exploration of chemical compound space. Here we present the deep learning architecture SchNet that is specifically designed to model atomistic systems by making use of continuous-filter convolutional layers. We demonstrate the capabilities of SchNet by accurately predicting a range of properties across chemical space for molecules and materials, where our model learns chemically plausible embeddings of atom types across the periodic table. Finally, we employ SchNet to predict potential-energy surfaces and energy-conserving force fields for molecular dynamics simulations of small molecules and perform an exemplary study on the quantum-mechanical properties of C20-fullerene that would have been infeasible with regular ab initio molecular dynamics.
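As a concrete illustration of the continuous-filter convolution at the heart of SchNet: each atom aggregates its neighbors' features, weighted element-wise by a filter that a small network generates from the interatomic distance. The sketch below is illustrative only; the radial-basis grid, filter network, and sizes are assumptions, not the published implementation.

```python
# Minimal sketch of a continuous-filter convolution in the spirit of SchNet
# (illustrative only; names and sizes are assumptions, not the authors' code).
import numpy as np

def rbf_expand(distances, centers, gamma=10.0):
    """Expand interatomic distances on a grid of Gaussian radial basis functions."""
    return np.exp(-gamma * (distances[..., None] - centers) ** 2)

def cfconv(features, positions, W1, W2):
    """Each atom aggregates neighbor features, weighted element-wise by a
    filter generated from the interatomic distance."""
    n_atoms = features.shape[0]
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    centers = np.linspace(0.0, 5.0, W1.shape[0])
    rbf = rbf_expand(d, centers)                    # (n_atoms, n_atoms, n_rbf)
    filters = np.tanh(rbf @ W1) @ W2                # filter-generating network
    out = np.zeros_like(features)
    for i in range(n_atoms):
        for j in range(n_atoms):
            if i != j:
                out[i] += features[j] * filters[i, j]   # element-wise filtering
    return out

# toy usage: 3 atoms, 8 feature channels, 16 radial basis functions
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))
pos = rng.normal(size=(3, 3))
W1 = rng.normal(size=(16, 8)); W2 = rng.normal(size=(8, 8))
print(cfconv(feats, pos, W1, W2).shape)   # (3, 8)
```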

1,104 citations


Journal ArticleDOI
TL;DR: A deep neural network-based approach to image quality assessment (IQA) that allows for joint learning of local quality and local weights in a unified framework and shows a high ability to generalize between different databases, indicating high robustness of the learned features.
Abstract: We present a deep neural network-based approach to image quality assessment (IQA). The network is trained end-to-end and comprises ten convolutional layers and five pooling layers for feature extraction, and two fully connected layers for regression, which makes it significantly deeper than related IQA models. Unique features of the proposed architecture are that: 1) with slight adaptations it can be used in a no-reference (NR) as well as in a full-reference (FR) IQA setting and 2) it allows for joint learning of local quality and local weights, i.e., relative importance of local quality to the global quality estimate, in a unified framework. Our approach is purely data-driven and does not rely on hand-crafted features or other types of prior domain knowledge about the human visual system or image statistics. We evaluate the proposed approach on the LIVE, CSIQ, and TID2013 databases as well as the LIVE In the Wild Image Quality Challenge database and show superior performance to state-of-the-art NR and FR IQA methods. Finally, cross-database evaluation shows a high ability to generalize between different databases, indicating a high robustness of the learned features.
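The joint learning of local quality and local weights amounts to a weighted-average pooling of patch-wise predictions. The minimal sketch below shows only this aggregation step under assumed names; the trained network producing the per-patch quality q_i and weight w_i is omitted.

```python
# Hedged sketch of the weighted-average patch aggregation idea: a network predicts
# a local quality q_i and a local weight w_i per patch, and the global score is the
# weight-normalized average. All names and numbers are illustrative assumptions.
import numpy as np

def global_quality(local_q, local_w, eps=1e-6):
    """Combine per-patch quality estimates with learned (positive) weights."""
    w = np.maximum(local_w, 0.0) + eps          # keep weights positive
    return float(np.sum(w * local_q) / np.sum(w))

# toy usage with 5 patches
q = np.array([30.0, 45.0, 50.0, 20.0, 60.0])    # per-patch quality predictions
w = np.array([0.1, 0.9, 0.5, 0.05, 0.7])        # per-patch relevance weights
print(global_quality(q, w))
```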

479 citations


Journal ArticleDOI
TL;DR: A flexible machine-learning force field with high-level ab initio accuracy for molecular dynamics simulations is developed; MD simulations of flexible molecules with up to a few dozen atoms are presented and insights into the dynamical behavior of these molecules are provided.
Abstract: Molecular dynamics (MD) simulations employing classical force fields constitute the cornerstone of contemporary atomistic modeling in chemistry, biology, and materials science. However, the predictive power of these simulations is only as good as the underlying interatomic potential. Classical potentials often fail to faithfully capture key quantum effects in molecules and materials. Here we enable the direct construction of flexible molecular force fields from high-level ab initio calculations by incorporating spatial and temporal physical symmetries into a gradient-domain machine learning (sGDML) model in an automatic data-driven way. The developed sGDML approach faithfully reproduces global force fields at quantum-chemical CCSD(T) level of accuracy and allows converged molecular dynamics simulations with fully quantized electrons and nuclei. We present MD simulations for flexible molecules with up to a few dozen atoms and provide insights into the dynamical behavior of these molecules. Our approach provides the key missing ingredient for achieving spectroscopic accuracy in molecular simulations.
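To make the gradient-domain idea concrete, the toy sketch below fits a kernel model to a 1-D energy curve and reads the force off as the analytic derivative of that model. This is a drastic simplification offered for intuition: the actual sGDML model is trained directly on forces and encodes spatial and temporal symmetries; the toy potential, kernel, and regularization below are assumptions.

```python
# Drastically simplified, hedged illustration of energy/force learning with a kernel
# model; not the sGDML implementation. The 1-D harmonic toy potential, kernel width,
# and regularization strength are assumptions for illustration.
import numpy as np

def gauss_kernel(x, xp, sigma=0.5):
    return np.exp(-(x[:, None] - xp[None, :]) ** 2 / (2 * sigma ** 2))

def fit_krr(x_train, e_train, lam=1e-8, sigma=0.5):
    K = gauss_kernel(x_train, x_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(x_train)), e_train)

def predict_energy_and_force(x, x_train, alpha, sigma=0.5):
    K = gauss_kernel(x, x_train, sigma)
    energy = K @ alpha
    # analytic derivative of the Gaussian kernel gives the force F = -dE/dx
    dK = -(x[:, None] - x_train[None, :]) / sigma ** 2 * K
    force = -(dK @ alpha)
    return energy, force

# toy 1-D "potential energy surface"
x_train = np.linspace(-2, 2, 25)
e_train = 0.5 * x_train ** 2          # harmonic toy potential
alpha = fit_krr(x_train, e_train)
e, f = predict_energy_and_force(np.array([0.5]), x_train, alpha)
print(e, f)                            # expect E ~ 0.125, F ~ -0.5
```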

445 citations


Journal Article
TL;DR: In this article, a symmetric gradient-domain machine learning (sGDML) model is proposed that constructs flexible molecular force fields from high-level ab initio calculations by incorporating spatial and temporal physical symmetries in an automatic, data-driven way.
Abstract: Molecular dynamics (MD) simulations employing classical force fields constitute the cornerstone of contemporary atomistic modeling in chemistry, biology, and materials science. However, the predictive power of these simulations is only as good as the underlying interatomic potential. Classical potentials often fail to faithfully capture key quantum effects in molecules and materials. Here we enable the direct construction of flexible molecular force fields from high-level ab initio calculations by incorporating spatial and temporal physical symmetries into a gradient-domain machine learning (sGDML) model in an automatic data-driven way. The developed sGDML approach faithfully reproduces global force fields at quantum-chemical CCSD(T) level of accuracy and allows converged molecular dynamics simulations with fully quantized electrons and nuclei. We present MD simulations for flexible molecules with up to a few dozen atoms and provide insights into the dynamical behavior of these molecules. Our approach provides the key missing ingredient for achieving spectroscopic accuracy in molecular simulations.

312 citations


Proceedings Article
01 Jan 2018
TL;DR: This work argues that explanation methods for neural nets should work reliably in the limit of simplicity, the linear models, and proposes a generalization that yields two explanation techniques (PatternNet and PatternAttribution) that are theoretically sound for linear models and produce improved explanations for deep networks.
Abstract: DeConvNet, Guided BackProp, and LRP were invented to better understand deep neural networks. We show that these methods do not produce the theoretically correct explanation for a linear model. Yet they are used on multi-layer networks with millions of parameters. This is a cause for concern since linear models are simple neural networks. We argue that explanation methods for neural nets should work reliably in the limit of simplicity, the linear models. Based on our analysis of linear models we propose a generalization that yields two explanation techniques (PatternNet and PatternAttribution) that are theoretically sound for linear models and produce improved explanations for deep networks.
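For the linear case, the paper's argument can be reproduced in a few lines: the weight vector of a linear model mixes in distractor directions, whereas the "pattern" estimated from cov(x, y) recovers the signal direction. The toy data and the normalization choice below are illustrative assumptions.

```python
# Hedged sketch of the "pattern" for a linear model that PatternNet builds on:
# for y = w^T x, the signal direction is estimated from cov(x, y), not from w.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
signal_dir = np.array([1.0, 0.0])          # true signal direction
distractor = np.array([1.0, 1.0])          # distractor direction
s = rng.normal(size=n)                     # signal amplitude
d = rng.normal(size=n)                     # distractor amplitude
X = np.outer(s, signal_dir) + np.outer(d, distractor)

w = np.array([1.0, -1.0])                  # weight that extracts s: w^T x = s
y = X @ w

# the gradient/weight-based "explanation" points along w, which mixes in the
# distractor; the pattern a ~ cov(x, y) recovers the signal direction
a = (X * y[:, None]).mean(axis=0) - X.mean(axis=0) * y.mean()
a /= w @ a                                 # normalize so that w^T a = 1
print("weight vector    :", w)
print("estimated pattern:", np.round(a, 2))   # ~ [1, 0] = signal_dir
```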

218 citations


Posted Content
TL;DR: This paper presents a novel audio dataset of English spoken digits, used for classification tasks on spoken digits and speaker gender, and confirms that the networks rely heavily on features marked as relevant by LRP.
Abstract: Interpretability of deep neural networks is a recently emerging area of machine learning research targeting a better understanding of how models perform feature selection and derive their classification decisions. This paper explores the interpretability of neural networks in the audio domain by using the previously proposed technique of layer-wise relevance propagation (LRP). We present a novel audio dataset of English spoken digits which we use for classification tasks on spoken digits and speaker's gender. We use LRP to identify relevant features for two neural network architectures that process either waveform or spectrogram representations of the data. Based on the relevance scores obtained from LRP, hypotheses about the neural networks' feature selection are derived and subsequently tested through systematic manipulations of the input data. The results confirm that the networks are highly reliant on features marked as relevant by LRP.
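For reference, the propagation step of LRP for a single dense layer can be sketched with the common epsilon-rule; the toy weights and the choice of epsilon below are assumptions and stand in for the audio architectures analyzed in the paper.

```python
# Minimal sketch of the layer-wise relevance propagation (LRP) epsilon-rule for one
# dense layer; toy weights and epsilon are assumptions, not the paper's configuration.
import numpy as np

def lrp_epsilon(x, W, b, relevance_out, eps=1e-6):
    """Redistribute relevance from a layer's output back to its input."""
    z = x @ W + b                                  # forward pre-activations
    z = z + eps * np.sign(z)                       # stabilizer
    s = relevance_out / z                          # relevance per unit of output
    return x * (s @ W.T)                           # relevance per input dimension

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W = rng.normal(size=(4, 3)); b = np.zeros(3)
R_out = np.maximum(x @ W + b, 0.0)                 # stand-in output relevance
R_in = lrp_epsilon(x, W, b, R_out)
print(R_in, R_in.sum(), R_out.sum())               # relevance is (approximately) conserved
```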

129 citations


Journal ArticleDOI
01 Feb 2018
TL;DR: The results indicate that the use of the proposed motion-based RSVP paradigm is more beneficial for target recognition when developing BCI applications for severely paralyzed patients with complex ocular dysfunctions.
Abstract: Most event-related potential (ERP)-based brain–computer interface (BCI) spellers primarily use matrix layouts and generally require moderate eye movement for successful operation. The fundamental objective of this paper is to enhance the perceptibility of target characters by introducing motion stimuli to classical rapid serial visual presentation (RSVP) spellers that do not require any eye movement, thereby making them applicable to paralyzed patients with oculomotor dysfunctions. To test the feasibility of the proposed motion-based RSVP paradigm, we implemented three RSVP spellers: 1) fixed-direction motion (FM-RSVP); 2) random-direction motion (RM-RSVP); and 3) (the conventional) non-motion stimulation (NM-RSVP), and evaluated the effect of the three stimulation methods on spelling performance. The two motion-based stimulation methods, FM- and RM-RSVP, showed shorter P300 latencies and higher P300 amplitudes (i.e., 360.4–379.6 ms; 5.5867–5.7662 µV) than the NM-RSVP (i.e., 480.4 ms; 4.7426 µV). This led to higher and more stable performance for the FM- and RM-RSVP spellers than for the NM-RSVP speller (i.e., 79.06±6.45% for NM-RSVP, 90.60±2.98% for RM-RSVP, and 92.74±2.55% for FM-RSVP). In particular, the proposed motion-based RSVP paradigm was significantly beneficial for about half of the subjects, who might not accurately perceive rapidly presented static stimuli. These results indicate that the proposed motion-based RSVP paradigm is more beneficial for target recognition when developing BCI applications for severely paralyzed patients with complex ocular dysfunctions.

123 citations


Journal ArticleDOI
TL;DR: Different automated TIL scoring approaches are discussed, ranging from classical image segmentation, where cell boundaries are identified and the resulting objects classified according to shape properties, to machine learning-based approaches that directly classify cells without segmentation but rely on large amounts of training data.

114 citations


Posted Content
TL;DR: SBC combines existing techniques of communication delay and gradient sparsification with a novel binarization method and optimal weight update encoding, pushing compression gains to new limits and thereby mitigating the limited communication bandwidth between contributing nodes and the prohibitive communication cost of distributed training.
Abstract: Currently, progressively larger deep neural networks are trained on ever growing data corpora. As this trend is only going to increase in the future, distributed training schemes are becoming increasingly relevant. A major issue in distributed training is the limited communication bandwidth between contributing nodes or prohibitive communication cost in general. These challenges become even more pressing as the number of computation nodes increases. To counteract this development we propose sparse binary compression (SBC), a compression framework that allows for a drastic reduction of communication cost for distributed training. SBC combines existing techniques of communication delay and gradient sparsification with a novel binarization method and optimal weight update encoding to push compression gains to new limits. By doing so, our method also allows us to smoothly trade off gradient sparsity and temporal sparsity to adapt to the requirements of the learning task. Our experiments show that SBC can reduce the upstream communication on a variety of convolutional and recurrent neural network architectures by more than four orders of magnitude without significantly harming the convergence speed in terms of forward-backward passes. For instance, we can train ResNet50 on ImageNet in the same number of iterations to the baseline accuracy using 3531× fewer bits, or train it to a 1% lower accuracy using 37208× fewer bits. In the latter case, the total upstream communication required is cut from 125 terabytes to 3.35 gigabytes for every participating client.
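The sparsification-plus-binarization step described in the abstract can be sketched as follows: keep only the largest-magnitude gradient entries and replace the surviving entries of the dominant sign by their mean. Rates and details below are illustrative assumptions; the full SBC method additionally uses communication delay (residual accumulation) and an optimized position encoding.

```python
# Hedged sketch of the top-k sparsification + sign-wise binarization idea behind SBC.
# Sparsity rate and tie-breaking are illustrative assumptions, not the paper's values.
import numpy as np

def sparse_binary_compress(grad, sparsity=0.01):
    k = max(1, int(sparsity * grad.size))
    idx = np.argpartition(np.abs(grad), -k)[-k:]          # top-k by magnitude
    sparse = np.zeros_like(grad)
    pos = idx[grad[idx] > 0]
    neg = idx[grad[idx] < 0]
    mu_pos = grad[pos].mean() if pos.size else 0.0
    mu_neg = grad[neg].mean() if neg.size else 0.0
    if abs(mu_pos) >= abs(mu_neg):                        # keep the dominant sign set,
        sparse[pos] = mu_pos                              # represented by a single value
    else:
        sparse[neg] = mu_neg
    return sparse                  # the residual grad - sparse would be accumulated locally

g = np.random.default_rng(0).normal(size=100_000)
print(np.count_nonzero(sparse_binary_compress(g)))        # only a few hundred nonzeros
```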

103 citations


Journal ArticleDOI
TL;DR: An open access multimodal brain-imaging dataset of simultaneous electroencephalography (EEG) and near-infrared spectroscopy (NIRS) recordings is provided to facilitate performance evaluation and comparison of many neuroimaging analysis techniques.
Abstract: We provide an open access multimodal brain-imaging dataset of simultaneous electroencephalography (EEG) and near-infrared spectroscopy (NIRS) recordings. Twenty-six healthy participants performed three cognitive tasks: 1) n-back (0-, 2- and 3-back), 2) discrimination/selection response task (DSR) and 3) word generation (WG) tasks. The data provided includes: 1) measured data, 2) demographic data, and 3) basic analysis results. For n-back (dataset A) and DSR tasks (dataset B), event-related potential (ERP) analysis was performed, and spatiotemporal characteristics and classification results for 'target' versus 'non-target' (dataset A) and symbol 'O' versus symbol 'X' (dataset B) are provided. Time-frequency analysis was performed to show the EEG spectral power to differentiate the task-relevant activations. Spatiotemporal characteristics of hemodynamic responses are also shown. For the WG task (dataset C), the EEG spectral power and spatiotemporal characteristics of hemodynamic responses are analyzed, and the potential merit of hybrid EEG-NIRS BCIs was validated with respect to classification accuracy. We expect that the dataset provided will facilitate performance evaluation and comparison of many neuroimaging analysis techniques.

Journal ArticleDOI
TL;DR: A set of general two-body and three-body interaction descriptors which are invariant to translation, rotation, and atomic indexing are proposed and evaluated on predicting several properties of small organic molecules calculated using density-functional theory.
Abstract: Machine learning (ML) based prediction of molecular properties across chemical compound space is an important and alternative approach to efficiently estimate the solutions of highly complex many-electron problems in chemistry and physics. Statistical methods represent molecules as descriptors that should encode molecular symmetries and interactions between atoms. Many such descriptors have been proposed; all of them have advantages and limitations. Here, we propose a set of general two-body and three-body interaction descriptors which are invariant to translation, rotation, and atomic indexing. By adapting the successfully used kernel ridge regression methods of machine learning, we evaluate our descriptors on predicting several properties of small organic molecules calculated using density-functional theory. We use two data sets. The GDB-7 set contains 6868 molecules with up to 7 heavy atoms of type CNO. The GDB-9 set is composed of 131722 molecules with up to 9 heavy atoms containing CNO. When trained ...
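A minimal illustration of what a two-body interaction descriptor looks like: any featurization built from interatomic distances is automatically invariant to translation and rotation, and summing over all pairs removes the dependence on atomic indexing. The specific Gaussian-on-a-distance-grid featurization below is an assumption for illustration, not the paper's definition.

```python
# Hedged toy illustration of a two-body interaction descriptor.
import numpy as np

def two_body_descriptor(positions, charges, r_grid=np.linspace(0.5, 4.0, 20), width=0.2):
    """Sum of Gaussians placed at all pairwise distances, weighted by Z_i * Z_j."""
    n = len(positions)
    desc = np.zeros_like(r_grid)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(positions[i] - positions[j])
            desc += charges[i] * charges[j] * np.exp(-((r_grid - r) / width) ** 2)
    return desc

# toy "molecule": three atoms with nuclear charges 6, 8, 1
pos = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.1], [0.9, 0.0, -0.3]])
Z = np.array([6, 8, 1])
print(two_body_descriptor(pos, Z).round(2))
```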

Journal ArticleDOI
TL;DR: Current state-of-the-art machine learning techniques are more suited to predict extensive as opposed to intensive quantities; the authors speculate on the need to develop global descriptors that can describe both extensive and intensive properties on an equal footing.
Abstract: Machine learning has been successfully applied to the prediction of chemical properties of small organic molecules such as energies or polarizabilities. Compared to these properties, the electronic excitation energies pose a much more challenging learning problem. Here, we examine the applicability of two existing machine learning methodologies to the prediction of excitation energies from time-dependent density functional theory. To this end, we systematically study the performance of various 2- and 3-body descriptors as well as the deep neural network SchNet to predict extensive as well as intensive properties such as the transition energies from the ground state to the first and second excited state. As perhaps expected, current state-of-the-art machine learning techniques are more suited to predict extensive as opposed to intensive quantities. We speculate on the need to develop global descriptors that can describe both extensive and intensive properties on an equal footing.

Journal ArticleDOI
TL;DR: In this paper, the uniqueness of individual gait patterns in clinical biomechanics is studied using DNNs and the Layer-Wise Relevance Propagation (LRP) technique, which reliably demonstrates which variables at what time windows of the gait cycle are most relevant for characterising the gait pattern of a certain individual.
Abstract: Machine learning (ML) techniques such as (deep) artificial neural networks (DNN) are successfully solving a plethora of tasks and provide new predictive models for complex physical, chemical, biological and social systems. However, in most cases this comes with the disadvantage of acting as a black box, rarely providing information about what made them arrive at a particular prediction. This black box aspect of ML techniques can be problematic especially in medical diagnoses, so far hampering clinical acceptance. The present paper studies the uniqueness of individual gait patterns in clinical biomechanics using DNNs. By attributing portions of the model predictions back to the input variables (ground reaction forces and full-body joint angles), the Layer-Wise Relevance Propagation (LRP) technique reliably demonstrates which variables at what time windows of the gait cycle are most relevant for the characterisation of gait patterns from a certain individual. By measuring the time-resolved contribution of each input variable to the prediction of ML techniques such as DNNs, our method describes the first general framework that enables understanding and interpreting non-linear ML methods in (biomechanical) gait analysis and thereby supplies a powerful tool for analysis, diagnosis and treatment of human gait.

Journal ArticleDOI
TL;DR: The Layer-Wise Relevance Propagation (LRP) technique reliably demonstrates which variables at what time windows of the gait cycle are most relevant for the characterisation of gait patterns from a certain individual, and provides a powerful tool for analysis, diagnosis and treatment of human gait.
Abstract: Machine learning (ML) techniques such as (deep) artificial neural networks (DNN) are successfully solving a plethora of tasks and provide new predictive models for complex physical, chemical, biological and social systems. However, in most cases this comes with the disadvantage of acting as a black box, rarely providing information about what made them arrive at a particular prediction. This black box aspect of ML techniques can be problematic especially in medical diagnoses, so far hampering clinical acceptance. The present paper studies the uniqueness of individual gait patterns in clinical biomechanics using DNNs. By attributing portions of the model predictions back to the input variables (ground reaction forces and full-body joint angles), the Layer-Wise Relevance Propagation (LRP) technique reliably demonstrates which variables at what time windows of the gait cycle are most relevant for the characterisation of gait patterns from a certain individual. By measuring the time-resolved contribution of each input variable to the prediction of ML techniques such as DNNs, our method describes the first general framework that enables understanding and interpreting non-linear ML methods in (biomechanical) gait analysis and thereby supplies a powerful tool for analysis, diagnosis and treatment of human gait.

Posted Content
TL;DR: iNNvestigate as discussed by the authors provides a common interface and out-of-the-box implementation for many analysis methods, including the reference implementation for PatternNet and PatternAttribution as well as for LRP-methods.
Abstract: In recent years, deep neural networks have revolutionized many application domains of machine learning and are key components of many critical decision or predictive processes. Therefore, it is crucial that domain specialists can understand and analyze actions and predictions, even of the most complex neural network architectures. Despite these arguments neural networks are often treated as black boxes. In the attempt to alleviate this shortcoming many analysis methods were proposed, yet the lack of reference implementations often makes a systematic comparison between the methods a major effort. The presented library iNNvestigate addresses this by providing a common interface and out-of-the-box implementation for many analysis methods, including the reference implementation for PatternNet and PatternAttribution as well as for LRP-methods. To demonstrate the versatility of iNNvestigate, we provide an analysis of image classifications for a variety of state-of-the-art neural network architectures.
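A hedged usage sketch of the common interface the abstract describes is given below; the analyzer name and call signatures are reproduced from memory and should be checked against the iNNvestigate documentation, and the tiny stand-in classifier is an assumption.

```python
# Hedged usage sketch of the iNNvestigate interface (analyzer names, backend and
# version details are assumptions and should be verified against the library docs).
import numpy as np
import tensorflow as tf
import innvestigate

# tiny stand-in classifier (illustrative only); in practice one typically analyzes
# the pre-softmax scores, so a final softmax layer would be stripped first.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(3),
])

analyzer = innvestigate.create_analyzer("lrp.epsilon", model)
x_batch = np.random.randn(4, 10).astype("float32")
relevance = analyzer.analyze(x_batch)      # one relevance attribution per input sample
print(relevance.shape)                     # (4, 10)
```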

Journal ArticleDOI
TL;DR: A methodology is presented that unifies support vector data descriptions (SVDDs) and k-means clustering into a single formulation, allowing new algorithms to be derived by transferring knowledge from one-class learning settings to clustering settings and vice versa.
Abstract: We present ClusterSVDD, a methodology that unifies support vector data descriptions (SVDDs) and k-means clustering into a single formulation. This allows both methods to benefit from one another, i.e., by adding flexibility using multiple spheres for SVDDs and increasing anomaly resistance and flexibility through kernels to k-means. In particular, our approach leads to a new interpretation of k-means as a regularized mode seeking algorithm. The unifying formulation further allows for deriving new algorithms by transferring knowledge from one-class learning settings to clustering settings and vice versa. As a showcase, we derive a clustering method for structured data based on a one-class learning scenario. Additionally, our formulation can be solved via a particularly simple optimization scheme. We evaluate our approach empirically to highlight some of the proposed benefits on artificially generated data, as well as on real-world problems, and provide a Python software package comprising various implementations of primal and dual SVDD as well as our proposed ClusterSVDD.

Journal ArticleDOI
TL;DR: It is shown that the use of SSD not only increases the correlation between neural features and MOS to r=-0.93, but also solves the problem of channel selection in an EEG-based image-quality assessment.
Abstract: Steady-state visual evoked potentials (SSVEPs) are neural responses, measurable using electroencephalography (EEG), that are directly linked to sensory processing of visual stimuli. In this paper, SSVEP is used to assess the perceived quality of texture images. The EEG-based assessment method is compared with conventional methods, and recorded EEG data are correlated to obtained mean opinion scores (MOSs). A dimensionality reduction technique for EEG data called spatio-spectral decomposition (SSD) is adapted for the SSVEP framework and used to extract physiologically meaningful and plausible neural components from the EEG recordings. It is shown that the use of SSD not only increases the correlation between neural features and MOS to r = -0.93, but also solves the problem of channel selection in an EEG-based image-quality assessment.
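The core of SSD is a generalized eigenvalue problem: find spatial filters that maximize band power at the stimulation frequency relative to flanking noise bands. The sketch below shows this step on random stand-in data; band-pass filtering and all frequency-band choices are omitted assumptions.

```python
# Hedged sketch of the spatial-filter step of spatio-spectral decomposition (SSD).
import numpy as np
from scipy.linalg import eigh

def ssd_filters(X_signal, X_noise):
    """X_signal/X_noise: (channels, samples) band-pass filtered EEG segments."""
    Cs = np.cov(X_signal)
    Cn = np.cov(X_noise)
    evals, evecs = eigh(Cs, Cn)            # generalized eigenvalue problem Cs w = lambda Cn w
    order = np.argsort(evals)[::-1]        # largest signal-to-noise ratio first
    return evecs[:, order], evals[order]

rng = np.random.default_rng(0)
X_sig = rng.normal(size=(8, 1000))         # stand-in for signal-band filtered data
X_noi = rng.normal(size=(8, 1000))         # stand-in for noise-band filtered data
W, snr = ssd_filters(X_sig, X_noi)
components = W.T @ X_sig                   # SSD components, strongest SSVEP first
print(W.shape, snr[:3])
```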

Posted Content
TL;DR: This integration of microanatomic information of tumors with complex molecular profiling data, including protein or gene expression, copy number variation, gene methylation and somatic mutations, provides a novel means to computationally score molecular markers with respect to their relevance to cancer and their spatial associations within the tumor microenvironment.
Abstract: Recent advances in cancer research largely rely on new developments in microscopic or molecular profiling techniques offering high level of detail with respect to either spatial or molecular features, but usually not both. Here, we present a novel machine learning-based computational approach that allows for the identification of morphological tissue features and the prediction of molecular properties from breast cancer imaging data. This integration of microanatomic information of tumors with complex molecular profiling data, including protein or gene expression, copy number variation, gene methylation and somatic mutations, provides a novel means to computationally score molecular markers with respect to their relevance to cancer and their spatial associations within the tumor microenvironment.

Posted Content
TL;DR: A novel, simple yet effective defense strategy where off-manifold adversarial samples are driven towards high density regions of the data generating distribution of the (unknown) target class by the Metropolis-adjusted Langevin algorithm (MALA) with perceptual boundary taken into account.
Abstract: Adversarial attacks on deep learning models have compromised their performance considerably. As remedies, many defense methods were proposed, which, however, have been circumvented by newer attacking strategies. In the midst of this ensuing arms race, the problem of robustness against adversarial attacks still remains unsolved. This paper proposes a novel, simple yet effective defense strategy where adversarial samples are relaxed onto the underlying manifold of the (unknown) target class distribution. Specifically, our algorithm drives off-manifold adversarial samples towards high density regions of the data generating distribution of the target class by the Metropolis-adjusted Langevin algorithm (MALA) with the perceptual boundary taken into account. Although the motivation is similar to projection methods, e.g., Defense-GAN, our algorithm, called MALA for DEfense (MALADE), is equipped with significant dispersion: the projection is distributed broadly, and therefore any whitebox attack cannot accurately align the input so that MALADE moves it to a targeted untrained spot where the model predicts a wrong label. In our experiments, MALADE exhibited state-of-the-art performance against various elaborate attacking strategies.
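For orientation, the sampler that MALADE builds on, the Metropolis-adjusted Langevin algorithm, is sketched below on a toy two-dimensional Gaussian target; in MALADE the gradient of the log-density would come from an estimated data distribution of the target class, and the step size used here is an assumption.

```python
# Minimal sketch of the Metropolis-adjusted Langevin algorithm (MALA): a Langevin
# proposal driven by the gradient of the log-density, with a Metropolis correction.
import numpy as np

def log_p(x):                      # toy target: standard 2-D Gaussian
    return -0.5 * np.sum(x ** 2)

def grad_log_p(x):
    return -x

def mala(x0, n_steps=1000, tau=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        prop = x + tau * grad_log_p(x) + np.sqrt(2 * tau) * rng.normal(size=x.shape)

        def log_q(a, b):           # log q(a | b), the asymmetric Langevin proposal
            return -np.sum((a - b - tau * grad_log_p(b)) ** 2) / (4 * tau)

        log_alpha = log_p(prop) + log_q(x, prop) - log_p(x) - log_q(prop, x)
        if np.log(rng.uniform()) < log_alpha:
            x = prop
    return x

print(mala(np.array([5.0, -5.0])))   # ends near the high-density region at the origin
```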

Posted Content
TL;DR: In this article, the authors describe interpretation techniques for atomistic neural networks on the example of Behler-Parrinello networks as well as the end-to-end model SchNet.
Abstract: With the rise of deep neural networks for quantum chemistry applications, there is a pressing need for architectures that, beyond delivering accurate predictions of chemical properties, are readily interpretable by researchers. Here, we describe interpretation techniques for atomistic neural networks using the example of Behler-Parrinello networks as well as the end-to-end model SchNet. Both models obtain predictions of chemical properties by aggregating atom-wise contributions. These latent variables can serve as local explanations of a prediction and are obtained during training without additional cost. Due to their correspondence to well-known chemical concepts such as atomic energies and partial charges, these atom-wise explanations enable insights not only about the model but more importantly about the underlying quantum-chemical regularities. We generalize from atomistic explanations to 3D space, thus obtaining spatially resolved visualizations which further improve interpretability. Finally, we analyze learned embeddings of chemical elements that exhibit a partial ordering that resembles the order of the periodic table. As the examined neural networks show excellent agreement with chemical knowledge, the presented techniques open up new avenues for data-driven research in chemistry, physics and materials science.

Journal ArticleDOI
05 Jun 2018-Sensors
TL;DR: The suitability of implementing a more practical hBCI based on intuitive mental tasks without preliminary training and with a shorter trial length was validated and the average ITRs were improved, compared to those reported in previous studies.
Abstract: Electroencephalography (EEG) and near-infrared spectroscopy (NIRS) are non-invasive neuroimaging methods that record the electrical and metabolic activity of the brain, respectively. Hybrid EEG-NIRS brain-computer interfaces (hBCIs) that use complementary EEG and NIRS information to enhance BCI performance have recently emerged to overcome the limitations of existing unimodal BCIs, such as vulnerability to motion artifacts for EEG-BCI or low temporal resolution for NIRS-BCI. However, with respect to NIRS-BCI, in order to fully induce a task-related brain activation, a relatively long trial length (≥10 s) is selected owing to the inherent hemodynamic delay that lowers the information transfer rate (ITR; bits/min). To alleviate the ITR degradation, we propose a more practical hBCI operated by intuitive mental tasks, such as mental arithmetic (MA) and word chain (WC) tasks, performed within a short trial length (5 s). In addition, the suitability of the WC as a BCI task was assessed, which has so far rarely been used in the BCI field. In this experiment, EEG and NIRS data were simultaneously recorded while participants performed MA and WC tasks without preliminary training and remained relaxed (baseline; BL). Each task was performed for 5 s, which was a shorter time than previous hBCI studies. Subsequently, a classification was performed to discriminate MA-related or WC-related brain activations from BL-related activations. By using hBCI in the offline/pseudo-online analyses, average classification accuracies of 90.0 ± 7.1/85.5 ± 8.1% and 85.8 ± 8.6/79.5 ± 13.4% for MA vs. BL and WC vs. BL, respectively, were achieved. These were significantly higher than those of the unimodal EEG- or NIRS-BCI in most cases. Given the short trial length and improved classification accuracy, the average ITRs were improved by more than 96.6% for MA vs. BL and 87.1% for WC vs. BL, respectively, compared to those reported in previous studies. The suitability of implementing a more practical hBCI based on intuitive mental tasks without preliminary training and with a shorter trial length was validated when compared to previous studies.
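The information transfer rate referenced in the abstract is commonly computed with the Wolpaw formula, which makes explicit why shortening the trial length raises the ITR at a fixed accuracy; the helper below uses that standard formula with illustrative numbers, not the study's reported values.

```python
# Hedged helper for the information transfer rate (ITR, bits/min), Wolpaw formula.
import math

def itr_bits_per_min(n_classes, accuracy, trial_seconds):
    p, n = accuracy, n_classes
    bits = math.log2(n)
    if 0 < p < 1:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * (60.0 / trial_seconds)

# shorter trials raise ITR even at the same accuracy (2-class task, 85% accuracy)
print(itr_bits_per_min(2, 0.85, 10.0))   # ~ 2.3 bits/min
print(itr_bits_per_min(2, 0.85, 5.0))    # ~ 4.7 bits/min
```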

Posted Content
TL;DR: The DLight framework is introduced, which overcomes these challenges by utilizing a long short-term memory unit (LSTM) based deep neural network architecture to analyze the spatial dependency structure of whole-brain fMRI data, and which outperforms conventional decoding approaches while still detecting physiologically appropriate brain areas for the cognitive states classified.
Abstract: The analysis of neuroimaging data poses several strong challenges, in particular, due to its high dimensionality, its strong spatio-temporal correlation and the comparably small sample sizes of the respective datasets. To address these challenges, conventional decoding approaches such as the searchlight reduce the complexity of the decoding problem by considering only local clusters of voxels, thereby neglecting the distributed spatial patterns of brain activity underlying many cognitive states. In this work, we introduce the DLight framework, which overcomes these challenges by utilizing a long short-term memory unit (LSTM) based deep neural network architecture to analyze the spatial dependency structure of whole-brain fMRI data. In order to maintain interpretability of the neuroimaging data, we adapt the layer-wise relevance propagation (LRP) method, enabling the neuroscientist user to study the learned association of the LSTM between the data and the cognitive state of the individual. We demonstrate the versatility of DLight by applying it to a large fMRI dataset of the Human Connectome Project. We show that the decoding performance of our method scales better with large datasets, and moreover outperforms conventional decoding approaches, while still detecting physiologically appropriate brain areas for the cognitive states classified. We also demonstrate that DLight is able to detect these areas on several levels of data granularity (i.e., group, subject, trial, time point).

Posted Content
TL;DR: In this article, the authors present new efficient representations for matrices with low entropy statistics, which have the novel property that their memory and algorithmic complexity are implicitly bounded by the entropy of the matrix.
Abstract: At the core of any inference procedure in deep neural networks are dot product operations, which are the component that requires the most computational resources. A common approach to reduce the cost of inference is to reduce its memory complexity by lowering the entropy of the weight matrices of the neural network, e.g., by pruning and quantizing their elements. However, the quantized weight matrices are then usually represented either by a dense or sparse matrix storage format, whose associated dot product complexity is not bounded by the entropy of the matrix. This means that the associated inference complexity ultimately depends on the implicit statistical assumptions that these matrix representations make about the weight distribution, which can be in many cases suboptimal. In this paper we address this issue and present new efficient representations for matrices with low entropy statistics. These new matrix formats have the novel property that their memory and algorithmic complexity are implicitly bounded by the entropy of the matrix, consequently implying that they are guaranteed to become more efficient as the entropy of the matrix is being reduced. In our experiments we show that performing the dot product under these new matrix formats can indeed be more energy and time efficient under practically relevant assumptions. For instance, we are able to attain up to 42× compression ratios, 5× speed-ups and 90× energy savings when we convert in a lossless manner the weight matrices of state-of-the-art networks such as AlexNet, VGG-16, ResNet152 and DenseNet into the new matrix formats and benchmark their respective dot product operation.
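The gain the abstract alludes to can be illustrated directly: when a weight matrix contains only a few distinct values, the dot product can add inputs first and multiply once per distinct value. The row-wise codebook format below is an illustrative assumption, not one of the formats proposed in the paper.

```python
# Hedged sketch of a dot product over a low-entropy (few distinct values) weight matrix:
# group the columns sharing a value, sum the inputs first, multiply once per value.
import numpy as np

def encode_row(row):
    """Store a row as (distinct value, column indices) pairs, skipping zeros."""
    values = [v for v in np.unique(row) if v != 0.0]
    return [(v, np.flatnonzero(row == v)) for v in values]

def dot_encoded(encoded_rows, x):
    out = np.zeros(len(encoded_rows))
    for i, row in enumerate(encoded_rows):
        # one multiplication per distinct value instead of one per nonzero entry
        out[i] = sum(v * x[cols].sum() for v, cols in row)
    return out

rng = np.random.default_rng(0)
codebook = np.array([0.0, -0.5, 0.5])                 # heavily quantized weights
W = rng.choice(codebook, size=(4, 16), p=[0.8, 0.1, 0.1])
x = rng.normal(size=16)
enc = [encode_row(r) for r in W]
print(np.allclose(dot_encoded(enc, x), W @ x))        # True
```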

Posted Content
TL;DR: A machine learning model capable of learning the electron density and the corresponding energy functional based on a set of training examples is revisited, allowing us to bypass solving the Kohn-Sham equations, providing a significant decrease in computation time.
Abstract: The Kohn-Sham scheme of density functional theory is one of the most widely used methods to solve electronic structure problems for a vast variety of atomistic systems across different scientific fields. While the method is fast relative to other first principles methods and widely successful, the computational time needed is still not negligible, making it difficult to perform calculations for very large systems or over long time-scales. In this submission, we revisit a machine learning model capable of learning the electron density and the corresponding energy functional based on a set of training examples. It allows us to bypass solving the Kohn-Sham equations, providing a significant decrease in computation time. We specifically focus on the machine learning formulation of the Hohenberg-Kohn map and its decomposability. We give results and discuss challenges, limits and future directions.

Journal ArticleDOI
TL;DR: This work delivers the best possible proof in BCI that an unsupervised decoding method can in practice render a supervised method unnecessary: this is possible despite skipping the calibration, without losing much performance, and with the prospect of continuous improvement over a session.
Abstract: One of the fundamental challenges in brain-computer interfaces (BCIs) is to tune a brain signal decoder to reliably detect a user's intention. While information about the decoder can partially be transferred between subjects or sessions, optimal decoding performance can only be reached with novel data from the current session. Thus, it is preferable to learn from unlabeled data gained from the actual usage of the BCI application instead of conducting a calibration recording prior to BCI usage. We review such unsupervised machine learning methods for BCIs based on event-related potentials of the electroencephalogram. We present results of an online study with twelve healthy participants controlling a visual speller. Online performance is reported for three completely unsupervised learning methods: (1) learning from label proportions, (2) an expectation-maximization approach and (3) MIX, which combines the strengths of the two other methods. After a short ramp-up, we observed that the MIX method not only defeats its two unsupervised competitors but even performs on par with a state-of-the-art regularized linear discriminant analysis trained on the same number of data points and with full label access. With this online study, we deliver the best possible proof in BCI that an unsupervised decoding method can in practice render a supervised method unnecessary. This is possible despite skipping the calibration, without losing much performance and with the prospect of continuous improvement over a session. Thus, our findings pave the way for a transition from supervised to unsupervised learning methods in BCIs based on event-related potentials.

Journal ArticleDOI
TL;DR: The usefulness of the novel algorithms is shown for toy data, demonstrating their mathematical properties, and for real-world data, 1) allowing better segmentation of time series and 2) in brain–computer interfacing, where the Wasserstein-based measure of nonstationarity is used for spatial filter regularization and gives rise to higher decoding performance.
Abstract: Learning under nonstationarity can be achieved by decomposing the data into a subspace that is stationary and a nonstationary one [stationary subspace analysis (SSA)]. While SSA has been used in various applications, its robustness and computational efficiency have limits due to the difficulty in optimizing the Kullback-Leibler divergence based objective. In this paper, we contribute by extending SSA twofold: we propose SSA with 1) higher numerical efficiency by defining analytical SSA variants and 2) higher robustness by utilizing the Wasserstein-2 distance (Wasserstein SSA). We show the usefulness of our novel algorithms for toy data, demonstrating their mathematical properties, and for real-world data, 1) allowing better segmentation of time series and 2) in brain–computer interfacing, where the Wasserstein-based measure of nonstationarity is used for spatial filter regularization and gives rise to higher decoding performance.
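For two Gaussian-distributed epochs, the Wasserstein-2 distance used here has a closed form; the helper below implements that standard expression, with toy means and covariances as assumptions.

```python
# Hedged helper: squared Wasserstein-2 distance between two Gaussians (closed form).
import numpy as np
from scipy.linalg import sqrtm

def wasserstein2_gauss(mu1, cov1, mu2, cov2):
    s1 = sqrtm(cov1)
    cross = sqrtm(s1 @ cov2 @ s1)
    # W2^2 = ||mu1 - mu2||^2 + tr(C1 + C2 - 2 (C1^{1/2} C2 C1^{1/2})^{1/2})
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2 * cross).real)

mu_a, cov_a = np.zeros(2), np.eye(2)
mu_b, cov_b = np.array([1.0, 0.0]), np.diag([2.0, 0.5])
print(wasserstein2_gauss(mu_a, cov_a, mu_b, cov_b))
```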

Journal ArticleDOI
TL;DR: This computational approach can be used to identify mutational signatures that have protein-level effects and can therefore contribute to preclinical in silico tests of the efficacy of molecular classifications as well as the druggability of individual mutations.
Abstract: Comprehensive mutational profiling data now available on all major cancers have led to proposals of novel molecular tumor classifications that modify or replace the established organ- and tissue-based tumor typing. The rationale behind such molecular reclassifications is that genetic alterations underlying cancer pathology predict response to therapy and may therefore offer a more precise view on cancer than histology. The use of individual actionable mutations to select cancers for treatment across histotypes is already being tested in the so-called basket trials with variable success rates. Here, we present a computational approach that facilitates the systematic analysis of the histological context dependency of mutational effects by integrating genomic and proteomic tumor profiles across cancers. To determine effects of oncogenic mutations on protein profiles, we used the energy distance, which compares the Euclidean distances of protein profiles in tumors with an oncogenic mutation (inner distance) to that in tumors without the mutation (outer distance) and performed Monte Carlo simulations for the significance analysis. Finally, the proteins were ranked by their contribution to profile differences to identify proteins characteristic of oncogenic mutation effects across cancers. We apply our approach to four current proposals of molecular tumor classifications and major therapeutically relevant actionable genes. All 12 actionable genes evaluated show effects on the protein level in the corresponding tumor type and show additional mutation-related protein profiles in 21 tumor types. Moreover, our analysis identifies consistent cross-cancer effects for 4 genes (FGFR1, ERBB2, IDH1, KRAS/NRAS) in 14 tumor types. We further use cell line drug response data to validate our findings. This computational approach can be used to identify mutational signatures that have protein-level effects and can therefore contribute to preclinical in silico tests of the efficacy of molecular classifications as well as the druggability of individual mutations. It thus supports the identification of novel targeted therapies effective across cancers and guides efficient basket trial designs.
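The inner-versus-outer distance comparison described above is closely related to the classical energy distance between two samples; the sketch below computes it for toy protein profiles and adds a simple permutation test in place of the paper's Monte Carlo significance analysis (all data and thresholds are assumptions).

```python
# Hedged sketch: energy distance between mutated and wild-type profile samples,
# with an illustrative permutation test. Toy data, not the study's pipeline.
import numpy as np

def energy_distance(A, B):
    """Szekely's energy distance between two samples of profile vectors."""
    d = lambda X, Y: np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1).mean()
    return 2 * d(A, B) - d(A, A) - d(B, B)

rng = np.random.default_rng(0)
mutated = rng.normal(loc=0.5, size=(30, 50))      # profiles of tumors carrying the mutation
wildtype = rng.normal(loc=0.0, size=(40, 50))     # profiles of tumors without it
obs = energy_distance(mutated, wildtype)

# simple Monte Carlo permutation test (illustrative)
pooled = np.vstack([mutated, wildtype])
null = []
for _ in range(200):
    perm = rng.permutation(len(pooled))
    null.append(energy_distance(pooled[perm[:30]], pooled[perm[30:]]))
print(obs, np.mean(np.array(null) >= obs))        # energy distance and permutation p-value
```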

Posted Content
TL;DR: This chapter presents neural network architectures that are able to learn efficient representations of molecules and materials and shows that the continuous-filter convolutional network SchNet accurately predicts chemical properties across compositional and configurational space on a variety of datasets.
Abstract: Deep Learning has been shown to learn efficient representations for structured data such as image, text or audio. In this chapter, we present neural network architectures that are able to learn efficient representations of molecules and materials. In particular, the continuous-filter convolutional network SchNet accurately predicts chemical properties across compositional and configurational space on a variety of datasets. Beyond that, we analyze the obtained representations to find evidence that their spatial and chemical properties agree with chemical intuition.

Posted Content
TL;DR: In this paper, the authors propose a general framework for neural network compression motivated by the minimum description length (MDL) principle and derive an expression for the entropy of a neural network.
Abstract: We propose a general framework for neural network compression that is motivated by the Minimum Description Length (MDL) principle. For that we first derive an expression for the entropy of a neural network, which measures its complexity explicitly in terms of its bit-size. Then, we formalize the problem of neural network compression as an entropy-constrained optimization objective. This objective generalizes many of the compression techniques proposed in the literature, in that pruning or reducing the cardinality of the weight elements of the network can be seen as special cases of entropy-minimization techniques. Furthermore, we derive a continuous relaxation of the objective, which allows us to minimize it using gradient based optimization techniques. Finally, we show that we can reach state-of-the-art compression results on different network architectures and data sets, e.g. achieving 71× compression gains on a VGG-like architecture.
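The entropy measure underlying the objective can be illustrated directly: the empirical entropy of the (quantized) weight distribution bounds the average number of bits needed per weight, and pruning or quantization lowers it. The quantization grid and toy weights below are assumptions.

```python
# Hedged sketch of an empirical bits-per-weight measure for a quantized weight tensor.
import numpy as np

def weight_entropy_bits(weights, n_bins=32):
    """Empirical entropy (bits per weight) of a binned weight distribution."""
    edges = np.linspace(weights.min(), weights.max(), n_bins - 1)
    q = np.digitize(weights, edges)
    counts = np.bincount(q, minlength=n_bins).astype(float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
dense = rng.normal(size=100_000)                    # unpruned weights: high entropy
pruned = np.where(np.abs(dense) > 1.5, dense, 0.0)  # pruning concentrates mass at zero
print(weight_entropy_bits(dense), weight_entropy_bits(pruned))  # bits/weight drops
```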