
Showing papers by "Klaus-Robert Müller published in 2018"


Journal ArticleDOI
TL;DR: The second part of the tutorial focuses on the recently proposed layer-wise relevance propagation (LRP) technique, for which the authors provide theory, recommendations, and tricks to make the most efficient use of it on real data.

1,939 citations


Journal ArticleDOI
TL;DR: SchNet as mentioned in this paper is a deep learning architecture specifically designed to model atomistic systems by making use of continuous-filter convolutional layers, where the model learns chemically plausible embeddings of atom types across the periodic table.
Abstract: Deep learning has led to a paradigm shift in artificial intelligence, including web, text, and image search, speech recognition, as well as bioinformatics, with growing impact in chemical physics. Machine learning in general, and deep learning in particular, are ideally suited for representing quantum-mechanical interactions, enabling us to model nonlinear potential-energy surfaces or enhancing the exploration of chemical compound space. Here we present the deep learning architecture SchNet that is specifically designed to model atomistic systems by making use of continuous-filter convolutional layers. We demonstrate the capabilities of SchNet by accurately predicting a range of properties across chemical space for molecules and materials, where our model learns chemically plausible embeddings of atom types across the periodic table. Finally, we employ SchNet to predict potential-energy surfaces and energy-conserving force fields for molecular dynamics simulations of small molecules and perform an exemplary study on the quantum-mechanical properties of C20-fullerene that would have been infeasible with regular ab initio molecular dynamics.
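As a concrete illustration of the continuous-filter convolution at the heart of SchNet: each atom aggregates its neighbors' features, weighted element-wise by a filter that a small network generates from the interatomic distance. The sketch below is illustrative only; the radial-basis grid, filter network, and sizes are assumptions, not the published implementation.

```python
# Minimal sketch of a continuous-filter convolution in the spirit of SchNet
# (illustrative only; names and sizes are assumptions, not the authors' code).
import numpy as np

def rbf_expand(distances, centers, gamma=10.0):
    """Expand interatomic distances on a grid of Gaussian radial basis functions."""
    return np.exp(-gamma * (distances[..., None] - centers) ** 2)

def cfconv(features, positions, W1, W2):
    """Each atom aggregates neighbor features, weighted element-wise by a
    filter generated from the interatomic distance."""
    n_atoms = features.shape[0]
    d = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    centers = np.linspace(0.0, 5.0, W1.shape[0])
    rbf = rbf_expand(d, centers)                    # (n_atoms, n_atoms, n_rbf)
    filters = np.tanh(rbf @ W1) @ W2                # filter-generating network
    out = np.zeros_like(features)
    for i in range(n_atoms):
        for j in range(n_atoms):
            if i != j:
                out[i] += features[j] * filters[i, j]   # element-wise filtering
    return out

# toy usage: 3 atoms, 8 feature channels, 16 radial basis functions
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 8))
pos = rng.normal(size=(3, 3))
W1 = rng.normal(size=(16, 8)); W2 = rng.normal(size=(8, 8))
print(cfconv(feats, pos, W1, W2).shape)   # (3, 8)
```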

1,104 citations


Journal ArticleDOI
TL;DR: A deep neural network-based approach to image quality assessment (IQA) that allows for joint learning of local quality and local weights in a unified framework and shows a high ability to generalize between different databases, indicating high robustness of the learned features.
Abstract: We present a deep neural network-based approach to image quality assessment (IQA). The network is trained end-to-end and comprises ten convolutional layers and five pooling layers for feature extraction, and two fully connected layers for regression, which makes it significantly deeper than related IQA models. Unique features of the proposed architecture are that: 1) with slight adaptations it can be used in a no-reference (NR) as well as in a full-reference (FR) IQA setting and 2) it allows for joint learning of local quality and local weights, i.e., relative importance of local quality to the global quality estimate, in a unified framework. Our approach is purely data-driven and does not rely on hand-crafted features or other types of prior domain knowledge about the human visual system or image statistics. We evaluate the proposed approach on the LIVE, CSIQ, and TID2013 databases as well as the LIVE In the Wild Image Quality Challenge database and show superior performance to state-of-the-art NR and FR IQA methods. Finally, cross-database evaluation shows a high ability to generalize between different databases, indicating a high robustness of the learned features.
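The joint learning of local quality and local weights amounts to a weighted-average pooling of patch-wise predictions. The minimal sketch below shows only this aggregation step under assumed names; the trained network producing the per-patch quality q_i and weight w_i is omitted.

```python
# Hedged sketch of the weighted-average patch aggregation idea: a network predicts
# a local quality q_i and a local weight w_i per patch, and the global score is the
# weight-normalized average. All names and numbers are illustrative assumptions.
import numpy as np

def global_quality(local_q, local_w, eps=1e-6):
    """Combine per-patch quality estimates with learned (positive) weights."""
    w = np.maximum(local_w, 0.0) + eps          # keep weights positive
    return float(np.sum(w * local_q) / np.sum(w))

# toy usage with 5 patches
q = np.array([30.0, 45.0, 50.0, 20.0, 60.0])    # per-patch quality predictions
w = np.array([0.1, 0.9, 0.5, 0.05, 0.7])        # per-patch relevance weights
print(global_quality(q, w))
```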

479 citations


Journal ArticleDOI
TL;DR: A flexible machine-learning force field with high-level ab initio accuracy for molecular dynamics simulations is developed; MD simulations of flexible molecules with up to a few dozen atoms are presented and insights into the dynamical behavior of these molecules are provided.
Abstract: Molecular dynamics (MD) simulations employing classical force fields constitute the cornerstone of contemporary atomistic modeling in chemistry, biology, and materials science. However, the predictive power of these simulations is only as good as the underlying interatomic potential. Classical potentials often fail to faithfully capture key quantum effects in molecules and materials. Here we enable the direct construction of flexible molecular force fields from high-level ab initio calculations by incorporating spatial and temporal physical symmetries into a gradient-domain machine learning (sGDML) model in an automatic data-driven way. The developed sGDML approach faithfully reproduces global force fields at quantum-chemical CCSD(T) level of accuracy and allows converged molecular dynamics simulations with fully quantized electrons and nuclei. We present MD simulations for flexible molecules with up to a few dozen atoms and provide insights into the dynamical behavior of these molecules. Our approach provides the key missing ingredient for achieving spectroscopic accuracy in molecular simulations.
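To make the gradient-domain idea concrete, the toy sketch below fits a kernel model to a 1-D energy curve and reads the force off as the analytic derivative of that model. This is a drastic simplification offered for intuition: the actual sGDML model is trained directly on forces and encodes spatial and temporal symmetries; the toy potential, kernel, and regularization below are assumptions.

```python
# Drastically simplified, hedged illustration of energy/force learning with a kernel
# model; not the sGDML implementation. The 1-D harmonic toy potential, kernel width,
# and regularization strength are assumptions for illustration.
import numpy as np

def gauss_kernel(x, xp, sigma=0.5):
    return np.exp(-(x[:, None] - xp[None, :]) ** 2 / (2 * sigma ** 2))

def fit_krr(x_train, e_train, lam=1e-8, sigma=0.5):
    K = gauss_kernel(x_train, x_train, sigma)
    return np.linalg.solve(K + lam * np.eye(len(x_train)), e_train)

def predict_energy_and_force(x, x_train, alpha, sigma=0.5):
    K = gauss_kernel(x, x_train, sigma)
    energy = K @ alpha
    # analytic derivative of the Gaussian kernel gives the force F = -dE/dx
    dK = -(x[:, None] - x_train[None, :]) / sigma ** 2 * K
    force = -(dK @ alpha)
    return energy, force

# toy 1-D "potential energy surface"
x_train = np.linspace(-2, 2, 25)
e_train = 0.5 * x_train ** 2          # harmonic toy potential
alpha = fit_krr(x_train, e_train)
e, f = predict_energy_and_force(np.array([0.5]), x_train, alpha)
print(e, f)                            # expect E ~ 0.125, F ~ -0.5
```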

445 citations


Journal Article
TL;DR: In this article, a symmetric gradient-domain machine learning (sGDML) model is proposed that constructs flexible molecular force fields from high-level ab initio calculations by incorporating spatial and temporal physical symmetries in an automatic, data-driven way.
Abstract: Molecular dynamics (MD) simulations employing classical force fields constitute the cornerstone of contemporary atomistic modeling in chemistry, biology, and materials science. However, the predictive power of these simulations is only as good as the underlying interatomic potential. Classical potentials often fail to faithfully capture key quantum effects in molecules and materials. Here we enable the direct construction of flexible molecular force fields from high-level ab initio calculations by incorporating spatial and temporal physical symmetries into a gradient-domain machine learning (sGDML) model in an automatic data-driven way. The developed sGDML approach faithfully reproduces global force fields at quantum-chemical CCSD(T) level of accuracy and allows converged molecular dynamics simulations with fully quantized electrons and nuclei. We present MD simulations for flexible molecules with up to a few dozen atoms and provide insights into the dynamical behavior of these molecules. Our approach provides the key missing ingredient for achieving spectroscopic accuracy in molecular simulations.

312 citations


Proceedings Article
01 Jan 2018
TL;DR: This work argues that explanation methods for neural nets should work reliably in the limit of simplicity, the linear models, and proposes a generalization that yields two explanation techniques (PatternNet and PatternAttribution) that are theoretically sound for linear models and produce improved explanations for deep networks.
Abstract: DeConvNet, Guided BackProp, and LRP were invented to better understand deep neural networks. We show that these methods do not produce the theoretically correct explanation for a linear model. Yet they are used on multi-layer networks with millions of parameters. This is a cause for concern since linear models are simple neural networks. We argue that explanation methods for neural nets should work reliably in the limit of simplicity, the linear models. Based on our analysis of linear models we propose a generalization that yields two explanation techniques (PatternNet and PatternAttribution) that are theoretically sound for linear models and produce improved explanations for deep networks.
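For the linear case, the paper's argument can be reproduced in a few lines: the weight vector of a linear model mixes in distractor directions, whereas the "pattern" estimated from cov(x, y) recovers the signal direction. The toy data and the normalization choice below are illustrative assumptions.

```python
# Hedged sketch of the "pattern" for a linear model that PatternNet builds on:
# for y = w^T x, the signal direction is estimated from cov(x, y), not from w.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
signal_dir = np.array([1.0, 0.0])          # true signal direction
distractor = np.array([1.0, 1.0])          # distractor direction
s = rng.normal(size=n)                     # signal amplitude
d = rng.normal(size=n)                     # distractor amplitude
X = np.outer(s, signal_dir) + np.outer(d, distractor)

w = np.array([1.0, -1.0])                  # weight that extracts s: w^T x = s
y = X @ w

# the gradient/weight-based "explanation" points along w, which mixes in the
# distractor; the pattern a ~ cov(x, y) recovers the signal direction
a = (X * y[:, None]).mean(axis=0) - X.mean(axis=0) * y.mean()
a /= w @ a                                 # normalize so that w^T a = 1
print("weight vector    :", w)
print("estimated pattern:", np.round(a, 2))   # ~ [1, 0] = signal_dir
```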

218 citations


Posted Content
TL;DR: This paper presents a novel audio dataset of English spoken digits, used for classification tasks on spoken digits and speaker gender, and confirms that the networks rely heavily on features marked as relevant by LRP.
Abstract: Interpretability of deep neural networks is a recently emerging area of machine learning research targeting a better understanding of how models perform feature selection and derive their classification decisions. This paper explores the interpretability of neural networks in the audio domain by using the previously proposed technique of layer-wise relevance propagation (LRP). We present a novel audio dataset of English spoken digits which we use for classification tasks on spoken digits and speaker's gender. We use LRP to identify relevant features for two neural network architectures that process either waveform or spectrogram representations of the data. Based on the relevance scores obtained from LRP, hypotheses about the neural networks' feature selection are derived and subsequently tested through systematic manipulations of the input data. The results confirm that the networks are highly reliant on features marked as relevant by LRP.
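For reference, the propagation step of LRP for a single dense layer can be sketched with the common epsilon-rule; the toy weights and the choice of epsilon below are assumptions and stand in for the audio architectures analyzed in the paper.

```python
# Minimal sketch of the layer-wise relevance propagation (LRP) epsilon-rule for one
# dense layer; toy weights and epsilon are assumptions, not the paper's configuration.
import numpy as np

def lrp_epsilon(x, W, b, relevance_out, eps=1e-6):
    """Redistribute relevance from a layer's output back to its input."""
    z = x @ W + b                                  # forward pre-activations
    z = z + eps * np.sign(z)                       # stabilizer
    s = relevance_out / z                          # relevance per unit of output
    return x * (s @ W.T)                           # relevance per input dimension

rng = np.random.default_rng(0)
x = rng.normal(size=4)
W = rng.normal(size=(4, 3)); b = np.zeros(3)
R_out = np.maximum(x @ W + b, 0.0)                 # stand-in output relevance
R_in = lrp_epsilon(x, W, b, R_out)
print(R_in, R_in.sum(), R_out.sum())               # relevance is (approximately) conserved
```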

129 citations


Journal ArticleDOI
01 Feb 2018
TL;DR: The results indicate that the use of the proposed motion-based RSVP paradigm is more beneficial for target recognition when developing BCI applications for severely paralyzed patients with complex ocular dysfunctions.
Abstract: Most event-related potential (ERP)-based brain–computer interface (BCI) spellers primarily use matrix layouts and generally require moderate eye movement for successful operation. The fundamental objective of this paper is to enhance the perceptibility of target characters by introducing motion stimuli to classical rapid serial visual presentation (RSVP) spellers that do not require any eye movement, thereby making them applicable to paralyzed patients with oculomotor dysfunctions. To test the feasibility of the proposed motion-based RSVP paradigm, we implemented three RSVP spellers: 1) fixed-direction motion (FM-RSVP); 2) random-direction motion (RM-RSVP); and 3) (the conventional) non-motion stimulation (NM-RSVP), and evaluated the effect of the three stimulation methods on spelling performance. The two motion-based stimulation methods, FM- and RM-RSVP, showed shorter P300 latencies and higher P300 amplitudes (i.e., 360.4–379.6 ms; 5.5867–5.7662 µV) than the NM-RSVP (i.e., 480.4 ms; 4.7426 µV). This led to higher and more stable performance for the FM- and RM-RSVP spellers than for the NM-RSVP speller (i.e., 79.06±6.45% for NM-RSVP, 90.60±2.98% for RM-RSVP, and 92.74±2.55% for FM-RSVP). In particular, the proposed motion-based RSVP paradigm was significantly beneficial for about half of the subjects, who might not accurately perceive rapidly presented static stimuli. These results indicate that the proposed motion-based RSVP paradigm is more beneficial for target recognition when developing BCI applications for severely paralyzed patients with complex ocular dysfunctions.

123 citations


Journal ArticleDOI
TL;DR: Different automated TIL scoring approaches are discussed, ranging from classical image segmentation, where cell boundaries are identified and the resulting objects classified according to shape properties, to machine learning-based approaches that directly classify cells without segmentation but rely on large amounts of training data.

114 citations


Posted Content
TL;DR: SBC combines existing techniques of communication delay and gradient sparsification with a novel binarization method and optimal weight update encoding, pushing compression gains to new limits and thereby mitigating the limited communication bandwidth between contributing nodes and the prohibitive communication cost of distributed training.
Abstract: Currently, progressively larger deep neural networks are trained on ever growing data corpora. As this trend is only going to increase in the future, distributed training schemes are becoming increasingly relevant. A major issue in distributed training is the limited communication bandwidth between contributing nodes or prohibitive communication cost in general. These challenges become even more pressing as the number of computation nodes increases. To counteract this development we propose sparse binary compression (SBC), a compression framework that allows for a drastic reduction of communication cost for distributed training. SBC combines existing techniques of communication delay and gradient sparsification with a novel binarization method and optimal weight update encoding to push compression gains to new limits. By doing so, our method also allows us to smoothly trade off gradient sparsity and temporal sparsity to adapt to the requirements of the learning task. Our experiments show that SBC can reduce the upstream communication on a variety of convolutional and recurrent neural network architectures by more than four orders of magnitude without significantly harming the convergence speed in terms of forward-backward passes. For instance, we can train ResNet50 on ImageNet in the same number of iterations to the baseline accuracy using 3531× fewer bits, or train it to a 1% lower accuracy using 37208× fewer bits. In the latter case, the total upstream communication required is cut from 125 terabytes to 3.35 gigabytes for every participating client.
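The sparsification-plus-binarization step described in the abstract can be sketched as follows: keep only the largest-magnitude gradient entries and replace the surviving entries of the dominant sign by their mean. Rates and details below are illustrative assumptions; the full SBC method additionally uses communication delay (residual accumulation) and an optimized position encoding.

```python
# Hedged sketch of the top-k sparsification + sign-wise binarization idea behind SBC.
# Sparsity rate and tie-breaking are illustrative assumptions, not the paper's values.
import numpy as np

def sparse_binary_compress(grad, sparsity=0.01):
    k = max(1, int(sparsity * grad.size))
    idx = np.argpartition(np.abs(grad), -k)[-k:]          # top-k by magnitude
    sparse = np.zeros_like(grad)
    pos = idx[grad[idx] > 0]
    neg = idx[grad[idx] < 0]
    mu_pos = grad[pos].mean() if pos.size else 0.0
    mu_neg = grad[neg].mean() if neg.size else 0.0
    if abs(mu_pos) >= abs(mu_neg):                        # keep the dominant sign set,
        sparse[pos] = mu_pos                              # represented by a single value
    else:
        sparse[neg] = mu_neg
    return sparse                  # the residual grad - sparse would be accumulated locally

g = np.random.default_rng(0).normal(size=100_000)
print(np.count_nonzero(sparse_binary_compress(g)))        # only a few hundred nonzeros
```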

103 citations


Journal ArticleDOI
TL;DR: An open access multimodal brain-imaging dataset of simultaneous electroencephalography (EEG) and near-infrared spectroscopy (NIRS) recordings is provided to facilitate performance evaluation and comparison of many neuroimaging analysis techniques.
Abstract: We provide an open access multimodal brain-imaging dataset of simultaneous electroencephalography (EEG) and near-infrared spectroscopy (NIRS) recordings. Twenty-six healthy participants performed three cognitive tasks: 1) n-back (0-, 2- and 3-back), 2) discrimination/selection response task (DSR) and 3) word generation (WG) tasks. The data provided includes: 1) measured data, 2) demographic data, and 3) basic analysis results. For n-back (dataset A) and DSR tasks (dataset B), event-related potential (ERP) analysis was performed, and spatiotemporal characteristics and classification results for 'target' versus 'non-target' (dataset A) and symbol 'O' versus symbol 'X' (dataset B) are provided. Time-frequency analysis was performed to show the EEG spectral power to differentiate the task-relevant activations. Spatiotemporal characteristics of hemodynamic responses are also shown. For the WG task (dataset C), the EEG spectral power and spatiotemporal characteristics of hemodynamic responses are analyzed, and the potential merit of hybrid EEG-NIRS BCIs was validated with respect to classification accuracy. We expect that the dataset provided will facilitate performance evaluation and comparison of many neuroimaging analysis techniques.

Journal ArticleDOI
TL;DR: A set of general two-body and three-body interaction descriptors which are invariant to translation, rotation, and atomic indexing are proposed and evaluated on predicting several properties of small organic molecules calculated using density-functional theory.
Abstract: Machine learning (ML) based prediction of molecular properties across chemical compound space is an important and alternative approach to efficiently estimate the solutions of highly complex many-electron problems in chemistry and physics. Statistical methods represent molecules as descriptors that should encode molecular symmetries and interactions between atoms. Many such descriptors have been proposed; all of them have advantages and limitations. Here, we propose a set of general two-body and three-body interaction descriptors which are invariant to translation, rotation, and atomic indexing. By adapting the successfully used kernel ridge regression methods of machine learning, we evaluate our descriptors on predicting several properties of small organic molecules calculated using density-functional theory. We use two data sets. The GDB-7 set contains 6868 molecules with up to 7 heavy atoms of type CNO. The GDB-9 set is composed of 131722 molecules with up to 9 heavy atoms containing CNO. When trained ...
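A minimal illustration of what a two-body interaction descriptor looks like: any featurization built from interatomic distances is automatically invariant to translation and rotation, and summing over all pairs removes the dependence on atomic indexing. The specific Gaussian-on-a-distance-grid featurization below is an assumption for illustration, not the paper's definition.

```python
# Hedged toy illustration of a two-body interaction descriptor.
import numpy as np

def two_body_descriptor(positions, charges, r_grid=np.linspace(0.5, 4.0, 20), width=0.2):
    """Sum of Gaussians placed at all pairwise distances, weighted by Z_i * Z_j."""
    n = len(positions)
    desc = np.zeros_like(r_grid)
    for i in range(n):
        for j in range(i + 1, n):
            r = np.linalg.norm(positions[i] - positions[j])
            desc += charges[i] * charges[j] * np.exp(-((r_grid - r) / width) ** 2)
    return desc

# toy "molecule": three atoms with nuclear charges 6, 8, 1
pos = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.1], [0.9, 0.0, -0.3]])
Z = np.array([6, 8, 1])
print(two_body_descriptor(pos, Z).round(2))
```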

Journal ArticleDOI
TL;DR: Current state-of-the-art machine learning techniques are more suited to predict extensive as opposed to intensive quantities; the authors speculate on the need to develop global descriptors that can describe both extensive and intensive properties on an equal footing.
Abstract: Machine learning has been successfully applied to the prediction of chemical properties of small organic molecules such as energies or polarizabilities. Compared to these properties, the electronic excitation energies pose a much more challenging learning problem. Here, we examine the applicability of two existing machine learning methodologies to the prediction of excitation energies from time-dependent density functional theory. To this end, we systematically study the performance of various 2- and 3-body descriptors as well as the deep neural network SchNet to predict extensive as well as intensive properties such as the transition energies from the ground state to the first and second excited state. As perhaps expected, current state-of-the-art machine learning techniques are more suited to predict extensive as opposed to intensive quantities. We speculate on the need to develop global descriptors that can describe both extensive and intensive properties on an equal footing.

Journal ArticleDOI
TL;DR: In this paper, the uniqueness of individual gait patterns in clinical biomechanics is studied using DNNs and the Layer-Wise Relevance Propagation (LRP) technique, which reliably demonstrates which variables at what time windows of the gait cycle are most relevant for characterising the gait pattern of a certain individual.
Abstract: Machine learning (ML) techniques such as (deep) artificial neural networks (DNN) are successfully solving a plethora of tasks and provide new predictive models for complex physical, chemical, biological and social systems. However, in most cases this comes with the disadvantage of acting as a black box, rarely providing information about what made them arrive at a particular prediction. This black box aspect of ML techniques can be problematic especially in medical diagnoses, so far hampering clinical acceptance. The present paper studies the uniqueness of individual gait patterns in clinical biomechanics using DNNs. By attributing portions of the model predictions back to the input variables (ground reaction forces and full-body joint angles), the Layer-Wise Relevance Propagation (LRP) technique reliably demonstrates which variables at what time windows of the gait cycle are most relevant for the characterisation of gait patterns from a certain individual. By measuring the time-resolved contribution of each input variable to the prediction of ML techniques such as DNNs, our method describes the first general framework that enables understanding and interpreting non-linear ML methods in (biomechanical) gait analysis and thereby supplies a powerful tool for analysis, diagnosis and treatment of human gait.

Journal ArticleDOI
TL;DR: The Layer-Wise Relevance Propagation (LRP) technique reliably demonstrates which variables at what time windows of the gait cycle are most relevant for the characterisation of gait patterns from a certain individual, and provides a powerful tool for analysis, diagnosis and treatment of human gait.
Abstract: Machine learning (ML) techniques such as (deep) artificial neural networks (DNN) are successfully solving a plethora of tasks and provide new predictive models for complex physical, chemical, biological and social systems. However, in most cases this comes with the disadvantage of acting as a black box, rarely providing information about what made them arrive at a particular prediction. This black box aspect of ML techniques can be problematic especially in medical diagnoses, so far hampering clinical acceptance. The present paper studies the uniqueness of individual gait patterns in clinical biomechanics using DNNs. By attributing portions of the model predictions back to the input variables (ground reaction forces and full-body joint angles), the Layer-Wise Relevance Propagation (LRP) technique reliably demonstrates which variables at what time windows of the gait cycle are most relevant for the characterisation of gait patterns from a certain individual. By measuring the time-resolved contribution of each input variable to the prediction of ML techniques such as DNNs, our method describes the first general framework that enables understanding and interpreting non-linear ML methods in (biomechanical) gait analysis and thereby supplies a powerful tool for analysis, diagnosis and treatment of human gait.

Posted Content
TL;DR: iNNvestigate as discussed by the authors provides a common interface and out-of-the-box implementation for many analysis methods, including the reference implementation for PatternNet and PatternAttribution as well as for LRP-methods.
Abstract: In recent years, deep neural networks have revolutionized many application domains of machine learning and are key components of many critical decision or predictive processes. Therefore, it is crucial that domain specialists can understand and analyze actions and predictions, even of the most complex neural network architectures. Despite these arguments neural networks are often treated as black boxes. In the attempt to alleviate this shortcoming many analysis methods were proposed, yet the lack of reference implementations often makes a systematic comparison between the methods a major effort. The presented library iNNvestigate addresses this by providing a common interface and out-of-the-box implementation for many analysis methods, including the reference implementation for PatternNet and PatternAttribution as well as for LRP-methods. To demonstrate the versatility of iNNvestigate, we provide an analysis of image classifications for a variety of state-of-the-art neural network architectures.
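A hedged usage sketch of the common interface the abstract describes is given below; the analyzer name and call signatures are reproduced from memory and should be checked against the iNNvestigate documentation, and the tiny stand-in classifier is an assumption.

```python
# Hedged usage sketch of the iNNvestigate interface (analyzer names, backend and
# version details are assumptions and should be verified against the library docs).
import numpy as np
import tensorflow as tf
import innvestigate

# tiny stand-in classifier (illustrative only); in practice one typically analyzes
# the pre-softmax scores, so a final softmax layer would be stripped first.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
    tf.keras.layers.Dense(3),
])

analyzer = innvestigate.create_analyzer("lrp.epsilon", model)
x_batch = np.random.randn(4, 10).astype("float32")
relevance = analyzer.analyze(x_batch)      # one relevance attribution per input sample
print(relevance.shape)                     # (4, 10)
```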

Journal ArticleDOI
TL;DR: A methodology is presented that unifies support vector data descriptions (SVDDs) and k-means clustering into a single formulation, allowing new algorithms to be derived by transferring knowledge from one-class learning settings to clustering settings and vice versa.
Abstract: We present ClusterSVDD, a methodology that unifies support vector data descriptions (SVDDs) and k-means clustering into a single formulation. This allows both methods to benefit from one another, i.e., by adding flexibility using multiple spheres for SVDDs and increasing anomaly resistance and flexibility through kernels to k-means. In particular, our approach leads to a new interpretation of k-means as a regularized mode seeking algorithm. The unifying formulation further allows for deriving new algorithms by transferring knowledge from one-class learning settings to clustering settings and vice versa. As a showcase, we derive a clustering method for structured data based on a one-class learning scenario. Additionally, our formulation can be solved via a particularly simple optimization scheme. We evaluate our approach empirically to highlight some of the proposed benefits on artificially generated data, as well as on real-world problems, and provide a Python software package comprising various implementations of primal and dual SVDD as well as our proposed ClusterSVDD.

Journal ArticleDOI
TL;DR: It is shown that the use of SSD not only increases the correlation between neural features and MOS to r=-0.93, but also solves the problem of channel selection in an EEG-based image-quality assessment.
Abstract: Steady-state visual evoked potentials (SSVEPs) are neural responses, measurable using electroencephalography (EEG), that are directly linked to sensory processing of visual stimuli. In this paper, SSVEP is used to assess the perceived quality of texture images. The EEG-based assessment method is compared with conventional methods, and recorded EEG data are correlated to obtained mean opinion scores (MOSs). A dimensionality reduction technique for EEG data called spatio-spectral decomposition (SSD) is adapted for the SSVEP framework and used to extract physiologically meaningful and plausible neural components from the EEG recordings. It is shown that the use of SSD not only increases the correlation between neural features and MOS to r = -0.93, but also solves the problem of channel selection in an EEG-based image-quality assessment.
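The core of SSD is a generalized eigenvalue problem: find spatial filters that maximize band power at the stimulation frequency relative to flanking noise bands. The sketch below shows this step on random stand-in data; band-pass filtering and all frequency-band choices are omitted assumptions.

```python
# Hedged sketch of the spatial-filter step of spatio-spectral decomposition (SSD).
import numpy as np
from scipy.linalg import eigh

def ssd_filters(X_signal, X_noise):
    """X_signal/X_noise: (channels, samples) band-pass filtered EEG segments."""
    Cs = np.cov(X_signal)
    Cn = np.cov(X_noise)
    evals, evecs = eigh(Cs, Cn)            # generalized eigenvalue problem Cs w = lambda Cn w
    order = np.argsort(evals)[::-1]        # largest signal-to-noise ratio first
    return evecs[:, order], evals[order]

rng = np.random.default_rng(0)
X_sig = rng.normal(size=(8, 1000))         # stand-in for signal-band filtered data
X_noi = rng.normal(size=(8, 1000))         # stand-in for noise-band filtered data
W, snr = ssd_filters(X_sig, X_noi)
components = W.T @ X_sig                   # SSD components, strongest SSVEP first
print(W.shape, snr[:3])
```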

Posted Content
TL;DR: This integration of microanatomic information of tumors with complex molecular profiling data, including protein or gene expression, copy number variation, gene methylation and somatic mutations, provides a novel means to computationally score molecular markers with respect to their relevance to cancer and their spatial associations within the tumor microenvironment.
Abstract: Recent advances in cancer research largely rely on new developments in microscopic or molecular profiling techniques offering high level of detail with respect to either spatial or molecular features, but usually not both. Here, we present a novel machine learning-based computational approach that allows for the identification of morphological tissue features and the prediction of molecular properties from breast cancer imaging data. This integration of microanatomic information of tumors with complex molecular profiling data, including protein or gene expression, copy number variation, gene methylation and somatic mutations, provides a novel means to computationally score molecular markers with respect to their relevance to cancer and their spatial associations within the tumor microenvironment.

Posted Content
TL;DR: A novel, simple yet effective defense strategy where off-manifold adversarial samples are driven towards high density regions of the data generating distribution of the (unknown) target class by the Metropolis-adjusted Langevin algorithm (MALA) with perceptual boundary taken into account.
Abstract: Adversarial attacks on deep learning models have compromised their performance considerably. As remedies, many defense methods were proposed, which, however, have been circumvented by newer attacking strategies. In the midst of this ensuing arms race, the problem of robustness against adversarial attacks still remains unsolved. This paper proposes a novel, simple yet effective defense strategy where adversarial samples are relaxed onto the underlying manifold of the (unknown) target class distribution. Specifically, our algorithm drives off-manifold adversarial samples towards high density regions of the data generating distribution of the target class by the Metropolis-adjusted Langevin algorithm (MALA) with the perceptual boundary taken into account. Although the motivation is similar to projection methods, e.g., Defense-GAN, our algorithm, called MALA for DEfense (MALADE), is equipped with significant dispersion: the projection is distributed broadly, and therefore any whitebox attack cannot accurately align the input so that MALADE moves it to a targeted untrained spot where the model predicts a wrong label. In our experiments, MALADE exhibited state-of-the-art performance against various elaborate attacking strategies.
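For orientation, the sampler that MALADE builds on, the Metropolis-adjusted Langevin algorithm, is sketched below on a toy two-dimensional Gaussian target; in MALADE the gradient of the log-density would come from an estimated data distribution of the target class, and the step size used here is an assumption.

```python
# Minimal sketch of the Metropolis-adjusted Langevin algorithm (MALA): a Langevin
# proposal driven by the gradient of the log-density, with a Metropolis correction.
import numpy as np

def log_p(x):                      # toy target: standard 2-D Gaussian
    return -0.5 * np.sum(x ** 2)

def grad_log_p(x):
    return -x

def mala(x0, n_steps=1000, tau=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        prop = x + tau * grad_log_p(x) + np.sqrt(2 * tau) * rng.normal(size=x.shape)

        def log_q(a, b):           # log q(a | b), the asymmetric Langevin proposal
            return -np.sum((a - b - tau * grad_log_p(b)) ** 2) / (4 * tau)

        log_alpha = log_p(prop) + log_q(x, prop) - log_p(x) - log_q(prop, x)
        if np.log(rng.uniform()) < log_alpha:
            x = prop
    return x

print(mala(np.array([5.0, -5.0])))   # ends near the high-density region at the origin
```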

Posted Content
TL;DR: In this article, the authors describe interpretation techniques for atomistic neural networks on the example of Behler-Parrinello networks as well as the end-to-end model SchNet.
Abstract: With the rise of deep neural networks for quantum chemistry applications, there is a pressing need for architectures that, beyond delivering accurate predictions of chemical properties, are readily interpretable by researchers. Here, we describe interpretation techniques for atomistic neural networks using the example of Behler-Parrinello networks as well as the end-to-end model SchNet. Both models obtain predictions of chemical properties by aggregating atom-wise contributions. These latent variables can serve as local explanations of a prediction and are obtained during training without additional cost. Due to their correspondence to well-known chemical concepts such as atomic energies and partial charges, these atom-wise explanations enable insights not only about the model but more importantly about the underlying quantum-chemical regularities. We generalize from atomistic explanations to 3D space, thus obtaining spatially resolved visualizations which further improve interpretability. Finally, we analyze learned embeddings of chemical elements that exhibit a partial ordering that resembles the order of the periodic table. As the examined neural networks show excellent agreement with chemical knowledge, the presented techniques open up new avenues for data-driven research in chemistry, physics and materials science.

Journal ArticleDOI
05 Jun 2018-Sensors
TL;DR: The suitability of implementing a more practical hBCI based on intuitive mental tasks without preliminary training and with a shorter trial length was validated and the average ITRs were improved, compared to those reported in previous studies.
Abstract: Electroencephalography (EEG) and near-infrared spectroscopy (NIRS) are non-invasive neuroimaging methods that record the electrical and metabolic activity of the brain, respectively. Hybrid EEG-NIRS brain-computer interfaces (hBCIs) that use complementary EEG and NIRS information to enhance BCI performance have recently emerged to overcome the limitations of existing unimodal BCIs, such as vulnerability to motion artifacts for EEG-BCI or low temporal resolution for NIRS-BCI. However, with respect to NIRS-BCI, in order to fully induce a task-related brain activation, a relatively long trial length (≥10 s) is selected owing to the inherent hemodynamic delay that lowers the information transfer rate (ITR; bits/min). To alleviate the ITR degradation, we propose a more practical hBCI operated by intuitive mental tasks, such as mental arithmetic (MA) and word chain (WC) tasks, performed within a short trial length (5 s). In addition, the suitability of the WC as a BCI task was assessed, which has so far rarely been used in the BCI field. In this experiment, EEG and NIRS data were simultaneously recorded while participants performed MA and WC tasks without preliminary training and remained relaxed (baseline; BL). Each task was performed for 5 s, which was a shorter time than previous hBCI studies. Subsequently, a classification was performed to discriminate MA-related or WC-related brain activations from BL-related activations. By using hBCI in the offline/pseudo-online analyses, average classification accuracies of 90.0 ± 7.1/85.5 ± 8.1% and 85.8 ± 8.6/79.5 ± 13.4% for MA vs. BL and WC vs. BL, respectively, were achieved. These were significantly higher than those of the unimodal EEG- or NIRS-BCI in most cases. Given the short trial length and improved classification accuracy, the average ITRs were improved by more than 96.6% for MA vs. BL and 87.1% for WC vs. BL, respectively, compared to those reported in previous studies. The suitability of implementing a more practical hBCI based on intuitive mental tasks without preliminary training and with a shorter trial length was validated when compared to previous studies.
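The information transfer rate referenced in the abstract is commonly computed with the Wolpaw formula, which makes explicit why shortening the trial length raises the ITR at a fixed accuracy; the helper below uses that standard formula with illustrative numbers, not the study's reported values.

```python
# Hedged helper for the information transfer rate (ITR, bits/min), Wolpaw formula.
import math

def itr_bits_per_min(n_classes, accuracy, trial_seconds):
    p, n = accuracy, n_classes
    bits = math.log2(n)
    if 0 < p < 1:
        bits += p * math.log2(p) + (1 - p) * math.log2((1 - p) / (n - 1))
    return bits * (60.0 / trial_seconds)

# shorter trials raise ITR even at the same accuracy (2-class task, 85% accuracy)
print(itr_bits_per_min(2, 0.85, 10.0))   # ~ 2.3 bits/min
print(itr_bits_per_min(2, 0.85, 5.0))    # ~ 4.7 bits/min
```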

Posted Content
TL;DR: The DLight framework is introduced, which overcomes these challenges by utilizing a long short-term memory unit (LSTM) based deep neural network architecture to analyze the spatial dependency structure of whole-brain fMRI data, and which outperforms conventional decoding approaches while still detecting physiologically appropriate brain areas for the cognitive states classified.
Abstract: The analysis of neuroimaging data poses several strong challenges, in particular, due to its high dimensionality, its strong spatio-temporal correlation and the comparably small sample sizes of the respective datasets. To address these challenges, conventional decoding approaches such as the searchlight reduce the complexity of the decoding problem by considering only local clusters of voxels, thereby neglecting the distributed spatial patterns of brain activity underlying many cognitive states. In this work, we introduce the DLight framework, which overcomes these challenges by utilizing a long short-term memory unit (LSTM) based deep neural network architecture to analyze the spatial dependency structure of whole-brain fMRI data. In order to maintain interpretability of the neuroimaging data, we adapt the layer-wise relevance propagation (LRP) method, enabling the neuroscientist user to study the learned association of the LSTM between the data and the cognitive state of the individual. We demonstrate the versatility of DLight by applying it to a large fMRI dataset of the Human Connectome Project. We show that the decoding performance of our method scales better with large datasets, and moreover outperforms conventional decoding approaches, while still detecting physiologically appropriate brain areas for the cognitive states classified. We also demonstrate that DLight is able to detect these areas on several levels of data granularity (i.e., group, subject, trial, time point).

Posted Content
TL;DR: In this article, the authors present new efficient representations for matrices with low entropy statistics, which have the novel property that their memory and algorithmic complexity are implicitly bounded by the entropy of the matrix.
Abstract: At the core of any inference procedure in deep neural networks are dot product operations, which are the component that requires the most computational resources. A common approach to reduce the cost of inference is to reduce its memory complexity by lowering the entropy of the weight matrices of the neural network, e.g., by pruning and quantizing their elements. However, the quantized weight matrices are then usually represented either by a dense or sparse matrix storage format, whose associated dot product complexity is not bounded by the entropy of the matrix. This means that the associated inference complexity ultimately depends on the implicit statistical assumptions that these matrix representations make about the weight distribution, which can be in many cases suboptimal. In this paper we address this issue and present new efficient representations for matrices with low entropy statistics. These new matrix formats have the novel property that their memory and algorithmic complexity are implicitly bounded by the entropy of the matrix, consequently implying that they are guaranteed to become more efficient as the entropy of the matrix is being reduced. In our experiments we show that performing the dot product under these new matrix formats can indeed be more energy and time efficient under practically relevant assumptions. For instance, we are able to attain up to 42× compression ratios, 5× speed-ups and 90× energy savings when we convert in a lossless manner the weight matrices of state-of-the-art networks such as AlexNet, VGG-16, ResNet152 and DenseNet into the new matrix formats and benchmark their respective dot product operation.
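The gain the abstract alludes to can be illustrated directly: when a weight matrix contains only a few distinct values, the dot product can add inputs first and multiply once per distinct value. The row-wise codebook format below is an illustrative assumption, not one of the formats proposed in the paper.

```python
# Hedged sketch of a dot product over a low-entropy (few distinct values) weight matrix:
# group the columns sharing a value, sum the inputs first, multiply once per value.
import numpy as np

def encode_row(row):
    """Store a row as (distinct value, column indices) pairs, skipping zeros."""
    values = [v for v in np.unique(row) if v != 0.0]
    return [(v, np.flatnonzero(row == v)) for v in values]

def dot_encoded(encoded_rows, x):
    out = np.zeros(len(encoded_rows))
    for i, row in enumerate(encoded_rows):
        # one multiplication per distinct value instead of one per nonzero entry
        out[i] = sum(v * x[cols].sum() for v, cols in row)
    return out

rng = np.random.default_rng(0)
codebook = np.array([0.0, -0.5, 0.5])                 # heavily quantized weights
W = rng.choice(codebook, size=(4, 16), p=[0.8, 0.1, 0.1])
x = rng.normal(size=16)
enc = [encode_row(r) for r in W]
print(np.allclose(dot_encoded(enc, x), W @ x))        # True
```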

Posted Content
TL;DR: A machine learning model capable of learning the electron density and the corresponding energy functional based on a set of training examples is revisited, allowing us to bypass solving the Kohn-Sham equations, providing a significant decrease in computation time.
Abstract: The Kohn-Sham scheme of density functional theory is one of the most widely used methods to solve electronic structure problems for a vast variety of atomistic systems across different scientific fields. While the method is fast relative to other first principles methods and widely successful, the computational time needed is still not negligible, making it difficult to perform calculations for very large systems or over long time-scales. In this submission, we revisit a machine learning model capable of learning the electron density and the corresponding energy functional based on a set of training examples. It allows us to bypass solving the Kohn-Sham equations, providing a significant decrease in computation time. We specifically focus on the machine learning formulation of the Hohenberg-Kohn map and its decomposability. We give results and discuss challenges, limits and future directions.

Journal ArticleDOI
TL;DR: This work delivers the best possible proof in BCI that an unsupervised decoding method can in practice render a supervised method unnecessary: this is possible despite skipping the calibration, without losing much performance, and with the prospect of continuous improvement over a session.
Abstract: One of the fundamental challenges in brain-computer interfaces (BCIs) is to tune a brain signal decoder to reliably detect a user's intention. While information about the decoder can partially be transferred between subjects or sessions, optimal decoding performance can only be reached with novel data from the current session. Thus, it is preferable to learn from unlabeled data gained from the actual usage of the BCI application instead of conducting a calibration recording prior to BCI usage. We review such unsupervised machine learning methods for BCIs based on event-related potentials of the electroencephalogram. We present results of an online study with twelve healthy participants controlling a visual speller. Online performance is reported for three completely unsupervised learning methods: (1) learning from label proportions, (2) an expectation-maximization approach and (3) MIX, which combines the strengths of the two other methods. After a short ramp-up, we observed that the MIX method not only defeats its two unsupervised competitors but even performs on par with a state-of-the-art regularized linear discriminant analysis trained on the same number of data points and with full label access. With this online study, we deliver the best possible proof in BCI that an unsupervised decoding method can in practice render a supervised method unnecessary. This is possible despite skipping the calibration, without losing much performance and with the prospect of continuous improvement over a session. Thus, our findings pave the way for a transition from supervised to unsupervised learning methods in BCIs based on event-related potentials.

Journal ArticleDOI
TL;DR: The usefulness of the novel algorithms is shown for toy data, demonstrating their mathematical properties, and for real-world data, 1) allowing better segmentation of time series and 2) in brain–computer interfacing, where the Wasserstein-based measure of nonstationarity is used for spatial filter regularization and gives rise to higher decoding performance.
Abstract: Learning under nonstationarity can be achieved by decomposing the data into a subspace that is stationary and a nonstationary one [stationary subspace analysis (SSA)]. While SSA has been used in various applications, its robustness and computational efficiency have limits due to the difficulty in optimizing the Kullback-Leibler divergence based objective. In this paper, we contribute by extending SSA twofold: we propose SSA with 1) higher numerical efficiency by defining analytical SSA variants and 2) higher robustness by utilizing the Wasserstein-2 distance (Wasserstein SSA). We show the usefulness of our novel algorithms for toy data, demonstrating their mathematical properties, and for real-world data, 1) allowing better segmentation of time series and 2) in brain–computer interfacing, where the Wasserstein-based measure of nonstationarity is used for spatial filter regularization and gives rise to higher decoding performance.
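For two Gaussian-distributed epochs, the Wasserstein-2 distance used here has a closed form; the helper below implements that standard expression, with toy means and covariances as assumptions.

```python
# Hedged helper: squared Wasserstein-2 distance between two Gaussians (closed form).
import numpy as np
from scipy.linalg import sqrtm

def wasserstein2_gauss(mu1, cov1, mu2, cov2):
    s1 = sqrtm(cov1)
    cross = sqrtm(s1 @ cov2 @ s1)
    # W2^2 = ||mu1 - mu2||^2 + tr(C1 + C2 - 2 (C1^{1/2} C2 C1^{1/2})^{1/2})
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(cov1 + cov2 - 2 * cross).real)

mu_a, cov_a = np.zeros(2), np.eye(2)
mu_b, cov_b = np.array([1.0, 0.0]), np.diag([2.0, 0.5])
print(wasserstein2_gauss(mu_a, cov_a, mu_b, cov_b))
```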

Journal ArticleDOI
TL;DR: This computational approach can be used to identify mutational signatures that have protein-level effects and can therefore contribute to preclinical in silico tests of the efficacy of molecular classifications as well as the druggability of individual mutations.
Abstract: Comprehensive mutational profiling data now available on all major cancers have led to proposals of novel molecular tumor classifications that modify or replace the established organ- and tissue-based tumor typing. The rationale behind such molecular reclassifications is that genetic alterations underlying cancer pathology predict response to therapy and may therefore offer a more precise view on cancer than histology. The use of individual actionable mutations to select cancers for treatment across histotypes is already being tested in the so-called basket trials with variable success rates. Here, we present a computational approach that facilitates the systematic analysis of the histological context dependency of mutational effects by integrating genomic and proteomic tumor profiles across cancers. To determine effects of oncogenic mutations on protein profiles, we used the energy distance, which compares the Euclidean distances of protein profiles in tumors with an oncogenic mutation (inner distance) to that in tumors without the mutation (outer distance) and performed Monte Carlo simulations for the significance analysis. Finally, the proteins were ranked by their contribution to profile differences to identify proteins characteristic of oncogenic mutation effects across cancers. We apply our approach to four current proposals of molecular tumor classifications and major therapeutically relevant actionable genes. All 12 actionable genes evaluated show effects on the protein level in the corresponding tumor type and show additional mutation-related protein profiles in 21 tumor types. Moreover, our analysis identifies consistent cross-cancer effects for 4 genes (FGFR1, ERBB2, IDH1, KRAS/NRAS) in 14 tumor types. We further use cell line drug response data to validate our findings. This computational approach can be used to identify mutational signatures that have protein-level effects and can therefore contribute to preclinical in silico tests of the efficacy of molecular classifications as well as the druggability of individual mutations. It thus supports the identification of novel targeted therapies effective across cancers and guides efficient basket trial designs.
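The inner-versus-outer distance comparison described above is closely related to the classical energy distance between two samples; the sketch below computes it for toy protein profiles and adds a simple permutation test in place of the paper's Monte Carlo significance analysis (all data and thresholds are assumptions).

```python
# Hedged sketch: energy distance between mutated and wild-type profile samples,
# with an illustrative permutation test. Toy data, not the study's pipeline.
import numpy as np

def energy_distance(A, B):
    """Szekely's energy distance between two samples of profile vectors."""
    d = lambda X, Y: np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1).mean()
    return 2 * d(A, B) - d(A, A) - d(B, B)

rng = np.random.default_rng(0)
mutated = rng.normal(loc=0.5, size=(30, 50))      # profiles of tumors carrying the mutation
wildtype = rng.normal(loc=0.0, size=(40, 50))     # profiles of tumors without it
obs = energy_distance(mutated, wildtype)

# simple Monte Carlo permutation test (illustrative)
pooled = np.vstack([mutated, wildtype])
null = []
for _ in range(200):
    perm = rng.permutation(len(pooled))
    null.append(energy_distance(pooled[perm[:30]], pooled[perm[30:]]))
print(obs, np.mean(np.array(null) >= obs))        # energy distance and permutation p-value
```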

Posted Content
TL;DR: This chapter presents neural network architectures that are able to learn efficient representations of molecules and materials and shows that the continuous-filter convolutional network SchNet accurately predicts chemical properties across compositional and configurational space on a variety of datasets.
Abstract: Deep Learning has been shown to learn efficient representations for structured data such as image, text or audio. In this chapter, we present neural network architectures that are able to learn efficient representations of molecules and materials. In particular, the continuous-filter convolutional network SchNet accurately predicts chemical properties across compositional and configurational space on a variety of datasets. Beyond that, we analyze the obtained representations to find evidence that their spatial and chemical properties agree with chemical intuition.

Posted Content
TL;DR: In this paper, the authors propose a general framework for neural network compression motivated by the minimum description length (MDL) principle and derive an expression for the entropy of a neural network.
Abstract: We propose a general framework for neural network compression that is motivated by the Minimum Description Length (MDL) principle. For that we first derive an expression for the entropy of a neural network, which measures its complexity explicitly in terms of its bit-size. Then, we formalize the problem of neural network compression as an entropy-constrained optimization objective. This objective generalizes many of the compression techniques proposed in the literature, in that pruning or reducing the cardinality of the weight elements of the network can be seen as special cases of entropy-minimization techniques. Furthermore, we derive a continuous relaxation of the objective, which allows us to minimize it using gradient based optimization techniques. Finally, we show that we can reach state-of-the-art compression results on different network architectures and data sets, e.g. achieving 71× compression gains on a VGG-like architecture.
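The entropy measure underlying the objective can be illustrated directly: the empirical entropy of the (quantized) weight distribution bounds the average number of bits needed per weight, and pruning or quantization lowers it. The quantization grid and toy weights below are assumptions.

```python
# Hedged sketch of an empirical bits-per-weight measure for a quantized weight tensor.
import numpy as np

def weight_entropy_bits(weights, n_bins=32):
    """Empirical entropy (bits per weight) of a binned weight distribution."""
    edges = np.linspace(weights.min(), weights.max(), n_bins - 1)
    q = np.digitize(weights, edges)
    counts = np.bincount(q, minlength=n_bins).astype(float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
dense = rng.normal(size=100_000)                    # unpruned weights: high entropy
pruned = np.where(np.abs(dense) > 1.5, dense, 0.0)  # pruning concentrates mass at zero
print(weight_entropy_bits(dense), weight_entropy_bits(pruned))  # bits/weight drops
```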