
Showing papers by "Klaus-Robert Müller published in 2017"


Journal ArticleDOI
TL;DR: A novel methodology for interpreting generic multilayer neural networks by decomposing the network classification decision into contributions of its input elements by backpropagating the explanations from the output to the input layer is introduced.

1,247 citations


Journal ArticleDOI
TL;DR: In this article, a general methodology based on region perturbation for evaluating ordered collections of pixels such as heatmaps is presented, and the authors compare heatmaps computed by three different methods on the SUN397, ILSVRC2012, and MIT Places data sets.
Abstract: Deep neural networks (DNNs) have demonstrated impressive performance in complex machine learning tasks such as image classification or speech recognition. However, due to their multilayer nonlinear structure, they are not transparent, i.e., it is hard to grasp what makes them arrive at a particular classification or recognition decision, given a new unseen data sample. Recently, several approaches have been proposed enabling one to understand and interpret the reasoning embodied in a DNN for a single test image. These methods quantify the “importance” of individual pixels with respect to the classification decision and allow a visualization in terms of a heatmap in pixel/input space. While the usefulness of heatmaps can be judged subjectively by a human, an objective quality measure is missing. In this paper, we present a general methodology based on region perturbation for evaluating ordered collections of pixels such as heatmaps. We compare heatmaps computed by three different methods on the SUN397, ILSVRC2012, and MIT Places data sets. Our main result is that the recently proposed layer-wise relevance propagation algorithm qualitatively and quantitatively provides a better explanation of what made a DNN arrive at a particular classification decision than the sensitivity-based approach or the deconvolution method. We provide theoretical arguments to explain this result and discuss its practical implications. Finally, we investigate the use of heatmaps for unsupervised assessment of the neural network performance.
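The region-perturbation evaluation can be summarized in a few lines: remove the regions a heatmap ranks as most relevant first and track how fast the classifier score drops; a steeper drop means the heatmap ranked truly important pixels higher. The sketch below is an illustrative NumPy reconstruction, not the paper's code; the zero-valued perturbation and fixed 2x2 patches are simplifying assumptions.

```python
import numpy as np

def pixel_flipping_curve(predict, x, relevance, n_steps=20, patch=2):
    """Perturb the most-relevant regions first (most-relevant-first
    ordering) and record the classifier score after each step; a
    faster drop indicates a better heatmap."""
    x = x.copy()
    h, w = x.shape
    # Rank non-overlapping patches by their summed relevance.
    patches = [(i, j) for i in range(0, h, patch) for j in range(0, w, patch)]
    patches.sort(key=lambda ij: -relevance[ij[0]:ij[0] + patch,
                                           ij[1]:ij[1] + patch].sum())
    scores = [predict(x)]
    for (i, j) in patches[:n_steps]:
        x[i:i + patch, j:j + patch] = 0.0  # simple zero perturbation
        scores.append(predict(x))
    return np.array(scores)
```

For a model with positive pixel contributions, the resulting curve is monotonically decreasing, and the area over it can serve as a quantitative heatmap score.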

866 citations


Posted Content
TL;DR: Two approaches to explaining predictions of deep learning models are presented, one method which computes the sensitivity of the prediction with respect to changes in the input and one approach which meaningfully decomposes the decision in terms of the input variables.
Abstract: With the availability of large databases and recent improvements in deep learning methodology, the performance of AI systems is reaching or even exceeding the human level on an increasing number of complex tasks. Impressive examples of this development can be found in domains such as image classification, sentiment analysis, speech understanding or strategic game playing. However, because of their nested non-linear structure, these highly successful machine learning and artificial intelligence models are usually applied in a black box manner, i.e., no information is provided about what exactly makes them arrive at their predictions. Since this lack of transparency can be a major drawback, e.g., in medical applications, the development of methods for visualizing, explaining and interpreting deep learning models has recently attracted increasing attention. This paper summarizes recent developments in this field and makes a plea for more interpretability in artificial intelligence. Furthermore, it presents two approaches to explaining predictions of deep learning models, one method which computes the sensitivity of the prediction with respect to changes in the input and one approach which meaningfully decomposes the decision in terms of the input variables. These methods are evaluated on three classification tasks.
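The two families of explanation methods described above can be contrasted with a minimal sketch: sensitivity analysis looks at squared partial derivatives of the prediction with respect to the inputs, while a decomposition assigns each input a share of the output value. Both functions below are toy illustrations (numerical gradients, a purely linear model), not the evaluated implementations.

```python
import numpy as np

def sensitivity_map(f, x, eps=1e-5):
    """Sensitivity analysis: squared partial derivatives of a scalar
    prediction f(x) w.r.t. each input (numerical gradient for brevity)."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e.flat[i] = eps
        grad.flat[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return grad ** 2

def linear_decomposition(w, b, x):
    """Decomposition: for a linear model f(x) = w.x + b, attribute the
    decision to inputs as R_i = w_i * x_i, so sum(R) = f(x) - b."""
    return w * x
```

Sensitivity answers "what change would alter the prediction," whereas the decomposition answers "what made the prediction what it is"; the distinction carries over to the deep-network variants discussed in the paper.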

819 citations


Journal ArticleDOI
TL;DR: The GDML approach enables quantitative molecular dynamics simulations for molecules at a fraction of the cost of explicit AIMD calculations, thereby allowing the construction of efficient force fields with the accuracy and transferability of high-level ab initio methods.
Abstract: Using conservation of energy, a fundamental property of closed classical and quantum mechanical systems, we develop an efficient gradient-domain machine learning (GDML) approach to construct accurate molecular force fields using a restricted number of samples from ab initio molecular dynamics (AIMD) trajectories. The GDML implementation is able to reproduce global potential energy surfaces of intermediate-sized molecules with an accuracy of 0.3 kcal mol⁻¹ for energies and 1 kcal mol⁻¹ Å⁻¹ for atomic forces using only 1000 conformational geometries for training. We demonstrate this accuracy for AIMD trajectories of molecules, including benzene, toluene, naphthalene, ethanol, uracil, and aspirin. The challenge of constructing conservative force fields is accomplished in our work by learning in a Hilbert space of vector-valued functions that obey the law of energy conservation. The GDML approach enables quantitative molecular dynamics simulations for molecules at a fraction of the cost of explicit AIMD calculations, thereby allowing the construction of efficient force fields with the accuracy and transferability of high-level ab initio methods.
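The key constraint, that atomic forces must be the negative gradient of a single energy surface, can be illustrated independently of the GDML kernel machinery. The sketch below derives forces from an arbitrary learned energy surrogate by numerical differentiation; any force field obtained this way is conservative by construction. It is a didactic stand-in, not the vector-valued kernel model of the paper.

```python
import numpy as np

def forces_from_energy(energy, coords, eps=1e-5):
    """Compute atomic forces as the negative numerical gradient of an
    energy model E(coords); the resulting force field is conservative
    by construction, mirroring GDML's energy-conservation constraint."""
    f = np.zeros_like(coords, dtype=float)
    for idx in np.ndindex(coords.shape):
        e = np.zeros_like(coords, dtype=float)
        e[idx] = eps
        f[idx] = -(energy(coords + e) - energy(coords - e)) / (2 * eps)
    return f
```

For a harmonic surrogate E = 0.5 k ||x||^2 this recovers the textbook result F = -kx; GDML instead learns the force map directly while guaranteeing that an integrable energy surface exists.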

766 citations


Journal ArticleDOI
TL;DR: SchNet as discussed by the authors is a deep learning architecture specifically designed to model atomistic systems by making use of continuous-filter convolutional layers, which can accurately predict a range of properties across chemical space for molecules and materials.
Abstract: Deep learning has led to a paradigm shift in artificial intelligence, including web, text and image search, speech recognition, as well as bioinformatics, with growing impact in chemical physics. Machine learning in general and deep learning in particular is ideally suited for representing quantum-mechanical interactions, enabling the modeling of nonlinear potential-energy surfaces or enhancing the exploration of chemical compound space. Here we present the deep learning architecture SchNet that is specifically designed to model atomistic systems by making use of continuous-filter convolutional layers. We demonstrate the capabilities of SchNet by accurately predicting a range of properties across chemical space for molecules and materials, where our model learns chemically plausible embeddings of atom types across the periodic table. Finally, we employ SchNet to predict potential-energy surfaces and energy-conserving force fields for molecular dynamics simulations of small molecules and perform an exemplary study of the quantum-mechanical properties of C20-fullerene that would have been infeasible with regular ab initio molecular dynamics.

557 citations


Journal ArticleDOI
TL;DR: The first molecular dynamics simulation with a machine-learned density functional on malonaldehyde is performed and the authors are able to capture the intramolecular proton transfer process.
Abstract: Last year, at least 30,000 scientific papers used the Kohn-Sham scheme of density functional theory to solve electronic structure problems in a wide variety of scientific fields. Machine learning holds the promise of learning the energy functional via examples, bypassing the need to solve the Kohn-Sham equations. This should yield substantial savings in computer time, allowing larger systems and/or longer time-scales to be tackled, but attempts to machine-learn this functional have been limited by the need to find its derivative. The present work overcomes this difficulty by directly learning the density-potential and energy-density maps for test systems and various molecules. We perform the first molecular dynamics simulation with a machine-learned density functional on malonaldehyde and are able to capture the intramolecular proton transfer process. Learning density models now allows the construction of accurate density functionals for realistic molecular systems. Machine learning allows electronic structure calculations to access larger system sizes and, in dynamical simulations, longer time scales. Here, the authors perform such a simulation using a machine-learned density functional that avoids direct solution of the Kohn-Sham equations.

530 citations


Journal ArticleDOI
11 Aug 2017-PLOS ONE
TL;DR: A measure of model explanatory power is introduced and it is shown that, although the SVM and CNN models perform similarly in terms of classification accuracy, the latter exhibits a higher level of explainability which makes it more comprehensible for humans and potentially more useful for other applications.
Abstract: Text documents can be described by a number of abstract concepts such as semantic category, writing style, or sentiment. Machine learning (ML) models have been trained to automatically map documents to these abstract concepts, making it possible to annotate text collections far larger than a human could process in a lifetime. Besides predicting the text’s category very accurately, it is also highly desirable to understand how and why the categorization process takes place. In this paper, we demonstrate that such understanding can be achieved by tracing the classification decision back to individual words using layer-wise relevance propagation (LRP), a recently developed technique for explaining predictions of complex non-linear classifiers. We train two word-based ML models, a convolutional neural network (CNN) and a bag-of-words SVM classifier, on a topic categorization task and adapt the LRP method to decompose the predictions of these models onto words. Resulting scores indicate how much individual words contribute to the overall classification decision. This enables one to distill relevant information from text documents without an explicit semantic information extraction step. We further use the word-wise relevance scores for generating novel vector-based document representations which capture semantic information. Based on these document vectors, we introduce a measure of model explanatory power and show that, although the SVM and CNN models perform similarly in terms of classification accuracy, the latter exhibits a higher level of explainability which makes it more comprehensible for humans and potentially more useful for other applications.

284 citations


Proceedings Article
26 Jun 2017
TL;DR: This work proposes to use continuous-filter convolutional layers to be able to model local correlations without requiring the data to lie on a grid, and obtains a joint model for the total energy and interatomic forces that follows fundamental quantum-chemical principles.
Abstract: Deep learning has the potential to revolutionize quantum chemistry as it is ideally suited to learn representations for structured data and speed up the exploration of chemical space. While convolutional neural networks have proven to be the first choice for images, audio and video data, the atoms in molecules are not restricted to a grid. Instead, their precise locations contain essential physical information that would be lost if discretized. Thus, we propose continuous-filter convolutional layers to model local correlations without requiring the data to lie on a grid. We apply those layers in SchNet: a novel deep learning architecture modeling quantum interactions in molecules. We obtain a joint model for the total energy and interatomic forces that follows fundamental quantum-chemical principles. Our architecture achieves state-of-the-art performance for benchmarks of equilibrium molecules and molecular dynamics trajectories. Finally, we introduce a more challenging benchmark with chemical and structural variations that suggests the path for further work.
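The core operation, a convolution whose filter values are generated from continuous interatomic distances rather than indexed on a grid, can be sketched compactly. Below, a radial-basis expansion followed by a single weight matrix stands in for SchNet's filter-generating network; this is a simplified illustration, not the published architecture, but it shares the key property that outputs depend only on distances and are therefore rotation invariant.

```python
import numpy as np

def rbf_expand(d, centers, gamma=10.0):
    """Expand an interatomic distance on radial basis functions."""
    return np.exp(-gamma * (d - centers) ** 2)

def cfconv(features, positions, centers, W):
    """Continuous-filter convolution: each atom pair gets a filter
    generated from its distance, so no grid discretization is needed.
    features: (n_atoms, n_feat); positions: (n_atoms, 3);
    W: (n_rbf, n_feat) maps the distance expansion to filter values."""
    n, _ = features.shape
    out = np.zeros_like(features)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = np.linalg.norm(positions[i] - positions[j])
            filt = rbf_expand(d, centers) @ W   # distance-generated filter
            out[i] += features[j] * filt        # element-wise filtering
    return out
```

Because the filter depends only on scalar distances, rotating all atomic positions leaves the output unchanged, one of the quantum-chemical principles the abstract refers to.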

267 citations


Posted Content
TL;DR: This work applies a specific propagation rule applicable to multiplicative connections as they arise in recurrent network architectures such as LSTMs and GRUs to a word-based bi-directional LSTM model on a five-class sentiment prediction task and evaluates the resulting LRP relevances both qualitatively and quantitatively.
Abstract: Recently, a technique called Layer-wise Relevance Propagation (LRP) was shown to deliver insightful explanations in the form of input space relevances for understanding feed-forward neural network classification decisions. In the present work, we extend the usage of LRP to recurrent neural networks. We propose a specific propagation rule applicable to multiplicative connections as they arise in recurrent network architectures such as LSTMs and GRUs. We apply our technique to a word-based bi-directional LSTM model on a five-class sentiment prediction task, and evaluate the resulting LRP relevances both qualitatively and quantitatively, obtaining better results than a gradient-based related method which was used in previous work.
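The proposed rule for multiplicative connections can be stated in a few lines: through weighted (linear) connections, relevance is redistributed in proportion to each input's contribution, while at a gate-times-source product all relevance is routed to the source signal and none to the gate. The sketch below is a schematic illustration of these two rules, not the authors' LSTM implementation.

```python
import numpy as np

def lrp_linear(w, x, b, relevance_out, eps=1e-6):
    """Epsilon-LRP through a linear layer z = w @ x + b: redistribute
    relevance in proportion to each input's contribution w_ij * x_j.
    The eps stabilizer slightly perturbs strict conservation."""
    z = w @ x + b
    denom = z + eps * np.sign(z)
    return (w * x).T @ (relevance_out / denom)

def lrp_gate(relevance_out):
    """Multiplicative connection z = gate * source (as in LSTMs/GRUs):
    the proposed rule routes all relevance to the source signal and
    none to the gate. Returns (R_source, R_gate)."""
    return relevance_out, np.zeros_like(relevance_out)
```

With zero bias, the linear rule conserves relevance up to the stabilizer, and the gate rule preserves it exactly, so total relevance can be propagated from the output back to the input words.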

240 citations


Journal ArticleDOI
22 Feb 2017-PLOS ONE
TL;DR: A convolutional neural network (CNN) is contributed for the robust classification of a steady-state visual evoked potentials (SSVEPs) paradigm for a brain-controlled exoskeleton under ambulatory conditions in which numerous artifacts may deteriorate decoding.
Abstract: The robust analysis of neural signals is a challenging problem. Here, we contribute a convolutional neural network (CNN) for the robust classification of a steady-state visual evoked potentials (SSVEPs) paradigm. We measure electroencephalogram (EEG)-based SSVEPs for a brain-controlled exoskeleton under ambulatory conditions in which numerous artifacts may deteriorate decoding. The proposed CNN is shown to achieve reliable performance under these challenging conditions. To validate the proposed method, we have acquired an SSVEP dataset under two conditions: 1) a static environment, in a standing position while fixated into a lower-limb exoskeleton and 2) an ambulatory environment, walking along a test course wearing the exoskeleton (here, artifacts are most challenging). The proposed CNN is compared to a standard neural network and other state-of-the-art methods for SSVEP decoding (i.e., a canonical correlation analysis (CCA)-based classifier, a multivariate synchronization index (MSI), a CCA combined with k-nearest neighbors (CCA-KNN) classifier) in an offline analysis. We found highly encouraging SSVEP decoding results for the CNN architecture, surpassing those of other methods with classification rates of 99.28% and 94.03% in the static and ambulatory conditions, respectively. A subsequent analysis inspects the representation found by the CNN at each layer and can thus contribute to a better understanding of the CNN’s robust, accurate decoding abilities.

237 citations


Proceedings ArticleDOI
22 Jun 2017
TL;DR: The authors proposed a specific propagation rule applicable to multiplicative connections as they arise in recurrent network architectures such as LSTMs and GRUs and applied it to a word-based bi-directional LSTM model on a five-class sentiment prediction task, and evaluated the resulting LRP relevances both qualitatively and quantitatively, obtaining better results than a gradient-based related method which was used in previous work.
Abstract: Recently, a technique called Layer-wise Relevance Propagation (LRP) was shown to deliver insightful explanations in the form of input space relevances for understanding feed-forward neural network classification decisions. In the present work, we extend the usage of LRP to recurrent neural networks. We propose a specific propagation rule applicable to multiplicative connections as they arise in recurrent network architectures such as LSTMs and GRUs. We apply our technique to a word-based bi-directional LSTM model on a five-class sentiment prediction task, and evaluate the resulting LRP relevances both qualitatively and quantitatively, obtaining better results than a gradient-based related method which was used in previous work.

Posted Content
TL;DR: SchNet as mentioned in this paper uses continuous-filter convolutional layers to model local correlations without requiring the data to lie on a grid, and achieves state-of-the-art performance for benchmarks of equilibrium molecules and molecular dynamics trajectories.
Abstract: Deep learning has the potential to revolutionize quantum chemistry as it is ideally suited to learn representations for structured data and speed up the exploration of chemical space. While convolutional neural networks have proven to be the first choice for images, audio and video data, the atoms in molecules are not restricted to a grid. Instead, their precise locations contain essential physical information that would be lost if discretized. Thus, we propose continuous-filter convolutional layers to model local correlations without requiring the data to lie on a grid. We apply those layers in SchNet: a novel deep learning architecture modeling quantum interactions in molecules. We obtain a joint model for the total energy and interatomic forces that follows fundamental quantum-chemical principles. This includes rotationally invariant energy predictions and a smooth, differentiable potential energy surface. Our architecture achieves state-of-the-art performance for benchmarks of equilibrium molecules and molecular dynamics trajectories. Finally, we introduce a more challenging benchmark with chemical and structural variations that suggests the path for further work.

Posted Content
TL;DR: In this article, the authors argue that explanation methods for neural networks should work reliably in the limit of simplicity, the linear models, and propose a generalization that yields two explanation techniques (PatternNet and PatternAttribution) that are theoretically sound for linear models and produce improved explanations for deep networks.
Abstract: DeConvNet, Guided BackProp, and LRP were invented to better understand deep neural networks. We show that these methods do not produce the theoretically correct explanation for a linear model. Yet they are used on multi-layer networks with millions of parameters. This is a cause for concern since linear models are simple neural networks. We argue that explanation methods for neural nets should work reliably in the limit of simplicity, the linear models. Based on our analysis of linear models we propose a generalization that yields two explanation techniques (PatternNet and PatternAttribution) that are theoretically sound for linear models and produce improved explanations for deep networks.

28 Aug 2017
TL;DR: Two approaches to explaining predictions of deep learning models are presented, one method which computes the sensitivity of the prediction with respect to changes in the input and one approach which meaningfully decomposes the decision in terms of the input variables.
Abstract: With the availability of large databases and recent improvements in deep learning methodology, the performance of AI systems is reaching, or even exceeding, the human level on an increasing number of complex tasks. Impressive examples of this development can be found in domains such as image classification, sentiment analysis, speech understanding or strategic game playing. However, because of their nested non-linear structure, these highly successful machine learning and artificial intelligence models are usually applied in a black-box manner, i.e. no information is provided about what exactly makes them arrive at their predictions. Since this lack of transparency can be a major drawback, e.g. in medical applications, the development of methods for visualizing, explaining and interpreting deep learning models has recently attracted increasing attention. This paper summarizes recent developments in this field and makes a plea for more interpretability in artificial intelligence. Furthermore, it presents two approaches to explaining predictions of deep learning models, one method which computes the sensitivity of the prediction with respect to changes in the input and one approach which meaningfully decomposes the decision in terms of the input variables. These methods are evaluated on three classification tasks.

Journal ArticleDOI
01 Oct 2017
TL;DR: An open access dataset for hybrid brain–computer interfaces (BCIs) using electroencephalography (EEG) and near-infrared spectroscopy (NIRS) is provided.
Abstract: We provide an open access dataset for hybrid brain–computer interfaces (BCIs) using electroencephalography (EEG) and near-infrared spectroscopy (NIRS). For this, we conducted two BCI experiments (left versus right hand motor imagery; mental arithmetic versus resting state). The dataset was validated using baseline signal analysis methods, with which classification performance was evaluated for each modality and a combination of both modalities. As already shown in previous literature, the capability of discriminating different mental states can be enhanced by using a hybrid approach, when comparing to single modality analyses. This makes the provided data highly suitable for hybrid BCI investigations. Since our open access dataset also comprises motion artifacts and physiological data, we expect that it can be used in a wide range of future validation approaches in multimodal BCI research.

Journal ArticleDOI
TL;DR: The design and evaluation of a mobile, modular, multimodal biosignal acquisition architecture (M3BA) based on a high-performance analog front-end optimized for biopotential acquisition, a microcontroller, and openNIRS technology is presented.
Abstract: Objective: For the further development of the fields of telemedicine, neurotechnology, and brain–computer interfaces, advances in hybrid multimodal signal acquisition and processing technology are invaluable. Currently, there are no commonly available hybrid devices combining bioelectrical and biooptical neurophysiological measurements [here electroencephalography (EEG) and functional near-infrared spectroscopy (NIRS)]. Our objective was to design such an instrument in a miniaturized, customizable, and wireless form. Methods: We present here the design and evaluation of a mobile, modular, multimodal biosignal acquisition architecture (M3BA) based on a high-performance analog front-end optimized for biopotential acquisition, a microcontroller, and our openNIRS technology. Results: The designed M3BA modules are very small, configurable, high-precision and low-noise modules (EEG input-referred noise at 500 SPS: 1.39 µVpp; NIRS noise equivalent power: NEP at 750 nm = 5.92 pWpp, NEP at 850 nm = 4.77 pWpp) with full input linearity, Bluetooth, 3-D accelerometer, and low power consumption. They support flexible user-specified biopotential reference setups and wireless body area/sensor network scenarios. Conclusion: Performance characterization and in-vivo experiments confirmed functionality and quality of the designed architecture. Significance: Telemedicine and assistive neurotechnology scenarios will increasingly include wearable multimodal sensors in the future. The M3BA architecture can significantly facilitate future designs for research in these and other fields that rely on customized mobile hybrid biosignal acquisition hardware.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This work compares four popular neural network architectures, studies the effect of pretraining, evaluates the robustness of the considered alignment preprocessings via cross-method test set swapping and intuitively visualizes the model's prediction strategies in given preprocessing conditions using the recent Layer-wise Relevance Propagation (LRP) algorithm.
Abstract: Recently, deep neural networks have demonstrated excellent performances in recognizing the age and gender on human face images. However, these models were applied in a black-box manner with no information provided about which facial features are actually used for prediction and how these features depend on image preprocessing, model initialization and architecture choice. We present a study investigating these different effects. In detail, our work compares four popular neural network architectures, studies the effect of pretraining, evaluates the robustness of the considered alignment preprocessings via cross-method test set swapping and intuitively visualizes the model's prediction strategies in given preprocessing conditions using the recent Layer-wise Relevance Propagation (LRP) algorithm. Our evaluations on the challenging Adience benchmark show that suitable parameter initialization leads to a holistic perception of the input, compensating artefactual data representations. With a combination of simple preprocessing steps, we reach state of the art performance in gender recognition.

Journal ArticleDOI
TL;DR: A classifier chain model for multiclass classification (CCMC) is developed to transfer class information between classifiers and an easy-to-hard learning paradigm for multi-label classification is proposed to automatically identify easy and hard labels and then use the predictions from simpler classes to help solve harder classes.
Abstract: Many applications, such as human action recognition and object detection, can be formulated as a multiclass classification problem. One-vs-rest (OVR) is one of the most widely used approaches for multiclass classification due to its simplicity and excellent performance. However, many confusing classes in such applications will degrade its results. For example, hand clap and boxing are two confusing actions. Hand clap is easily misclassified as boxing, and vice versa. Therefore, precisely classifying confusing classes remains a challenging task. To obtain better performance for multiclass classifications that have confusing classes, we first develop a classifier chain model for multiclass classification (CCMC) to transfer class information between classifiers. Then, based on an analysis of our proposed model, we propose an easy-to-hard learning paradigm for multiclass classification to automatically identify easy and hard classes and then use the predictions from simpler classes to help solve harder classes. Similar to CCMC, the classifier chain (CC) model is also proposed by Read et al. (2009) to capture the label dependency for multi-label classification. However, CC does not consider the order of difficulty of the labels and achieves degenerated performance when there are many confusing labels. Therefore, it is non-trivial to learn the appropriate label order for CC. Motivated by our analysis for CCMC, we also propose the easy-to-hard learning paradigm for multi-label classification to automatically identify easy and hard labels, and then use the predictions from simpler labels to help solve harder labels. We also demonstrate that our proposed strategy can be successfully applied to a wide range of applications, such as ordinal classification and relationship prediction. Extensive empirical studies validate our analysis and the effectiveness of our proposed easy-to-hard learning strategies.
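The easy-to-hard chaining idea can be illustrated with a toy implementation: train one-vs-rest scorers in a chosen easy-to-hard order and append each earlier scorer's output to the feature vector seen by later, harder classes. The nearest-centroid scorers and the externally supplied class order below are simplifying assumptions; the paper's CCMC identifies the order automatically and uses stronger base classifiers.

```python
import numpy as np

class EasyToHardChain:
    """Toy CCMC-style chain: one-vs-rest scorers trained in an
    easy-to-hard order, each later scorer seeing the earlier scorers'
    outputs as extra features. Nearest-centroid margins stand in for
    the real base classifiers (an assumption for brevity)."""

    def fit(self, X, y, order):
        self.order = list(order)
        self.centroids = []
        feats = np.asarray(X, dtype=float)
        for c in self.order:
            pos = feats[y == c].mean(axis=0)   # class centroid
            neg = feats[y != c].mean(axis=0)   # rest centroid
            self.centroids.append((pos, neg))
            s = self._margin(feats, pos, neg)
            feats = np.column_stack([feats, s])  # chain the prediction
        return self

    @staticmethod
    def _margin(feats, pos, neg):
        # Positive when closer to the class centroid than to the rest.
        return (np.linalg.norm(feats - neg, axis=1)
                - np.linalg.norm(feats - pos, axis=1))

    def predict(self, X):
        feats = np.asarray(X, dtype=float)
        scores = []
        for pos, neg in self.centroids:
            s = self._margin(feats, pos, neg)
            scores.append(s)
            feats = np.column_stack([feats, s])
        return np.array(self.order)[np.argmax(np.column_stack(scores), axis=1)]
```

On well-separated classes this chain matches plain OVR; its intended benefit appears when later, confusable classes can exploit the earlier scores as disambiguating features.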

Posted Content
TL;DR: An improved method is proposed that may serve as an extension for existing back-projection and decomposition techniques and formulate a quality criterion for explanation methods.
Abstract: Deep learning has significantly advanced the state of the art in machine learning. However, neural networks are often considered black boxes. There is significant effort to develop techniques that explain a classifier's decisions. Although some of these approaches have resulted in compelling visualisations, there is a lack of theory of what is actually explained. Here we present an analysis of these methods and formulate a quality criterion for explanation methods. On this ground, we propose an improved method that may serve as an extension for existing back-projection and decomposition techniques.

Journal ArticleDOI
02 Nov 2017-PLOS ONE
TL;DR: It was observed that no benefit was attained when using a control model trained with multiple position data in terms of arm position change, and the degree of electrode shift caused by donning/doffing was not severely associated with the level of performance loss under practical conditions.
Abstract: There are some practical factors, such as arm position change and donning/doffing, which prevent robust myoelectric control. The objective of this study is to precisely characterize the impacts of the two representative factors on myoelectric controllability in practical control situations, thereby providing useful references that can be potentially used to find better solutions for clinically reliable myoelectric control. To this end, a real-time target acquisition task was performed by fourteen subjects including one individual with congenital upper-limb deficiency, where the impacts of arm position change, donning/doffing and a combination of both factors on control performance were systematically evaluated. The changes in online performance were examined with seven different performance metrics to comprehensively evaluate various aspects of myoelectric controllability. As a result, arm position change significantly affects offline prediction accuracy, but not online control performance due to real-time feedback, thereby showing no significant correlation between offline and online performance. Donning/doffing was still problematic in online control conditions. It was further observed that no benefit was attained when using a control model trained with multiple position data in terms of arm position change, and the degree of electrode shift caused by donning/doffing was not severely associated with the degree of performance loss under practical conditions (around 1 cm electrode shift). Since this study is the first to concurrently investigate the impacts of arm position change and donning/doffing in practical myoelectric control situations, all findings of this study provide new insights into robust myoelectric control with respect to arm position change and donning/doffing.

Journal ArticleDOI
13 Apr 2017-PLOS ONE
TL;DR: The continuously adapting LLP classifier is the first unsupervised decoder for ERP BCIs guaranteed to find the optimal decoder and makes it an ideal solution to avoid tedious calibration sessions.
Abstract: Objective Using traditional approaches, a brain-computer interface (BCI) requires the collection of calibration data for new subjects prior to online use. Calibration time can be reduced or eliminated e.g., by subject-to-subject transfer of a pre-trained classifier or unsupervised adaptive classification methods which learn from scratch and adapt over time. While such heuristics work well in practice, none of them can provide theoretical guarantees. Our objective is to modify an event-related potential (ERP) paradigm to work in unison with the machine learning decoder, and thus to achieve a reliable unsupervised calibrationless decoding with a guarantee to recover the true class means. Method We introduce learning from label proportions (LLP) to the BCI community as a new unsupervised, and easy-to-implement classification approach for ERP-based BCIs. The LLP estimates the mean target and non-target responses based on known proportions of these two classes in different groups of the data. We present a visual ERP speller to meet the requirements of LLP. For evaluation, we ran simulations on artificially created data sets and conducted an online BCI study with 13 subjects performing a copy-spelling task. Results Theoretical considerations show that LLP is guaranteed to minimize the loss function similar to a corresponding supervised classifier. LLP performed well in simulations and in the online application, where 84.5% of characters were spelled correctly on average without prior calibration. Significance The continuously adapting LLP classifier is the first unsupervised decoder for ERP BCIs guaranteed to find the optimal decoder. This makes it an ideal solution to avoid tedious calibration sessions. Additionally, LLP works on complementary principles compared to existing unsupervised methods, opening the door for their further enhancement when combined with LLP.
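The LLP estimation step reduces to linear algebra: each group's mean response is a known convex combination of the target and non-target class means, so two groups with different target proportions suffice to recover both means without any labels. A minimal sketch, with hypothetical group statistics:

```python
import numpy as np

def llp_class_means(group_means, proportions):
    """Recover target / non-target class means from group averages
    whose target proportions are known by experimental design:
        m_g = p_g * mu_T + (1 - p_g) * mu_N
    is linear in (mu_T, mu_N), so groups with at least two distinct
    proportions determine both means (least-squares solve)."""
    P = np.array([[p, 1.0 - p] for p in proportions])       # (G, 2)
    sol, *_ = np.linalg.lstsq(P, np.asarray(group_means), rcond=None)
    return sol[0], sol[1]  # mu_target, mu_nontarget
```

This is why the paper's speller is designed so that different stimulus groups contain targets in known, different proportions: the estimator then converges to the true class means as more unlabeled data arrives.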

Journal ArticleDOI
TL;DR: This is the first generic theoretical formulation of the co-adaptive learning problem and gives a simple example of two interacting linear learning systems, a human and a machine, where the two learning agents are coupled by a joint loss function.
Abstract: Objective. We present the first generic theoretical formulation of the co-adaptive learning problem and give a simple example of two interacting linear learning systems, a human and a machine. Approach. After the description of the training protocol of the two learning systems, we define a simple linear model where the two learning agents are coupled by a joint loss function. The simplicity of the model allows us to find learning rules for both human and machine that permit computing theoretical simulations. Main results. As seen in simulations, an astonishingly rich structure is found for this ecosystem of learners. While the co-adaptive learners are shown to easily stall or get out of sync for some parameter settings, we can find a broad sweet spot of parameters where the learning system can converge quickly. It is defined by mid-range learning rates on the side of the learning machine, quite independent of the human in the loop. Despite its simplistic assumptions, the theoretical study could be confirmed by a real-world experimental study in which human and machine co-adapt to perform cursor control under distortion. In this practical setting, too, the mid-range learning rates yield the best performance and behavioral ratings. Significance. The results presented in this mathematical study allow the computation of simple theoretical simulations and the performance of real experimental paradigms. Additionally, they are nicely in line with previous results in the BCI literature.
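A deliberately simplified sketch of the coupled-linear-learners setup (my own toy reduction, not the paper's exact model): a "human" gain a encodes an intention as s = a·g, a "machine" weight w decodes y = w·s, and both descend the joint loss (y − g)² at their own learning rates, here using the expected gradient over unit-variance intentions:

```python
# Toy co-adaptation: two scalar learners coupled by the joint loss (w*a - 1)^2.
# Convergence means the human/machine product w*a approaches 1.
a, w = 0.2, 0.2          # initial human and machine parameters
eta_h, eta_m = 0.05, 0.3 # human slow, machine at a mid-range learning rate

for _ in range(500):
    err = w * a - 1.0    # expected joint decoding error
    a -= eta_h * err * w # gradient step for the human-side gain
    w -= eta_m * err * a # gradient step for the machine-side decoder

print(a * w)             # approaches 1 when the pair converges
```

Varying eta_m in such a toy reproduces the qualitative picture from the abstract: very small or very large machine learning rates stall or destabilize the pair, while mid-range values converge quickly.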

Proceedings ArticleDOI
05 Mar 2017
TL;DR: This paper presents a general method, Layer-wise Relevance Propagation (LRP), to understand and interpret action recognition algorithms and apply it to a state-of-the-art compressed domain method based on Fisher vector encoding and SVM classification.
Abstract: Compressed domain human action recognition algorithms are extremely efficient, because they only require a partial decoding of the video bit stream. However, what exactly makes these algorithms decide on a particular action is still a mystery. In this paper, we present a general method, Layer-wise Relevance Propagation (LRP), to understand and interpret action recognition algorithms, and apply it to a state-of-the-art compressed domain method based on Fisher vector encoding and SVM classification. Using LRP, the classifier's decisions are propagated back through every step in the action recognition pipeline until the input is reached. This methodology allows us to identify where and when the important (from the classifier's perspective) action happens in the video. To our knowledge, this is the first work to interpret a compressed domain action recognition algorithm. We evaluate our method on the HMDB51 dataset and show that in many cases a few significant frames contribute most towards the prediction of a video as belonging to a particular class.
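The backward redistribution step of LRP can be illustrated for a single dense layer with the standard epsilon-rule. The weights and inputs below are invented for illustration; a real pipeline would chain this rule through every layer of the model:

```python
import numpy as np

# LRP epsilon-rule for one dense layer: redistribute the relevance R_out of
# the outputs onto the inputs x, in proportion to each input's contribution
# to the pre-activations z = x @ W. The eps term stabilizes small z.
def lrp_dense(x, W, R_out, eps=1e-6):
    z = x @ W                                  # forward pre-activations
    s = R_out / (z + eps * np.sign(z))         # stabilized relevance ratios
    return x * (W @ s)                         # input relevances R_in

x = np.array([1.0, 2.0])                       # illustrative inputs
W = np.array([[0.5, -1.0],
              [1.0,  0.8]])                    # illustrative weights
R_in = lrp_dense(x, W, R_out=x @ W)            # seed relevance with the output
print(R_in, R_in.sum())                        # total relevance is conserved
```

The conservation property — the input relevances sum to (approximately) the relevance that entered the layer — is what lets LRP trace a decision all the way back to input frames or pixels.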

Journal ArticleDOI
TL;DR: It is shown that the hybrid system outperforms the unimodal EEG and NIRS systems by 6.2% and 2.5%, respectively, and has the potential to be used in real-life situations, such as in neurorehabilitation.
Abstract: We realized a compact hybrid brain-computer interface (BCI) system by integrating a portable near-infrared spectroscopy (NIRS) device with an economical electroencephalography (EEG) system. The NIRS array was located on the subjects’ forehead, covering the prefrontal area. The EEG electrodes were distributed over the frontal, motor/temporal, and parietal areas. The experimental paradigm involved a Stroop word-picture matching test in combination with mental arithmetic (MA) and baseline (BL) tasks, in which the subjects were asked to perform either MA or BL in response to congruent or incongruent conditions, respectively. We compared the classification accuracies of each of the modalities (NIRS or EEG) with that of the hybrid system. We showed that the hybrid system outperforms the unimodal EEG and NIRS systems by 6.2% and 2.5%, respectively. Since the proposed hybrid system is based on portable platforms, it is not confined to a laboratory environment and has the potential to be used in real-life situations, such as in neurorehabilitation.

Journal ArticleDOI
TL;DR: The results suggest that the vertical disparity in 3D-3 condition decreases the perception of depth compared to other 3D conditions and the amplitude of P1 component can be used as a discriminative feature.
Abstract: OBJECTIVE Neurophysiological correlates of vertical disparity in 3D images are studied in an objective approach using the EEG technique. These disparities are known to negatively affect the quality of experience and to cause visual discomfort in stereoscopic visualizations. APPROACH We have presented four conditions to subjects: one in 2D and three in 3D, one without vertical disparity and two with different vertical disparity levels. Event-related potentials (ERPs) are measured for each condition and the differences between ERP components are studied. Analysis is also performed on the induced potentials in the time-frequency domain. MAIN RESULTS Results show that there is a significant increase in the amplitude of P1 components in the 3D conditions in comparison to 2D. These results are consistent with previous studies, which have shown that P1 amplitude increases due to depth perception in 3D compared to 2D. However, the amplitude is significantly smaller for maximum vertical disparity (3D-3) in comparison to 3D with no vertical disparity. Our results therefore suggest that the vertical disparity in the 3D-3 condition decreases the perception of depth compared to the other 3D conditions, and that the amplitude of the P1 component can be used as a discriminative feature. SIGNIFICANCE The results show that the P1 component increases in amplitude due to depth perception in the 3D stimuli compared to the 2D stimulus. The vertical disparity in stereoscopic images is also studied here. We suggest that the amplitude of the P1 component is modulated by this parameter and decreases with the decrease in perceived depth.

Posted Content
TL;DR: In this paper, the authors compare four popular neural network architectures, studies the effect of pretraining, evaluates the robustness of the considered alignment preprocessings via cross-method test set swapping and intuitively visualizes the model's prediction strategies in given preprocessing conditions using the recent Layer-wise Relevance Propagation (LRP) algorithm.
Abstract: Recently, deep neural networks have demonstrated excellent performance in recognizing age and gender on human face images. However, these models were applied in a black-box manner, with no information provided about which facial features are actually used for prediction and how these features depend on image preprocessing, model initialization and architecture choice. We present a study investigating these different effects. In detail, our work compares four popular neural network architectures, studies the effect of pretraining, evaluates the robustness of the considered alignment preprocessings via cross-method test set swapping, and intuitively visualizes the model's prediction strategies in given preprocessing conditions using the recent Layer-wise Relevance Propagation (LRP) algorithm. Our evaluations on the challenging Adience benchmark show that suitable parameter initialization leads to a holistic perception of the input, compensating artefactual data representations. With a combination of simple preprocessing steps, we reach state-of-the-art performance in gender recognition.

Journal ArticleDOI
TL;DR: A method is introduced to optimally combine these two unsupervised decoding methods, letting one method's strengths compensate for the weaknesses of the other and vice versa, which shows less dependency on random initialization of model parameters and is consequently more reliable.
Abstract: Objective. Brain-computer interfaces (BCI) based on event-related potentials (ERP) incorporate a decoder to classify recorded brain signals and subsequently select a control signal that drives a computer application. Standard supervised BCI decoders require a tedious calibration procedure prior to every session. Several unsupervised classification methods have been proposed that tune the decoder during actual use and as such omit this calibration. Each of these methods has its own strengths and weaknesses. Our aim is to improve overall accuracy of ERP-based BCIs without calibration. Approach. We consider two approaches for unsupervised classification of ERP signals. Learning from label proportions (LLP) was recently shown to be guaranteed to converge to a supervised decoder when enough data is available. In contrast, the formerly proposed expectation maximization (EM) based decoding for ERP-BCI does not have this guarantee. However, while this decoder has high variance due to random initialization of its parameters, it obtains a higher accuracy faster than LLP when the initialization is good. We introduce a method to optimally combine these two unsupervised decoding methods, letting one method's strengths compensate for the weaknesses of the other and vice versa. The new method is compared to the aforementioned methods in a resimulation of an experiment with a visual speller. Main Results. Analysis of the experimental results shows that the new method exceeds the performance of the previous unsupervised classification approaches in terms of ERP classification accuracy and symbol selection accuracy during the spelling experiment. Furthermore, the method shows less dependency on random initialization of model parameters and is consequently more reliable. Significance. Improving the accuracy and subsequent reliability of calibrationless BCIs makes these systems more appealing for frequent use.
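The combination idea — letting a low-variance but initialization-sensitive estimator (EM) and a guaranteed but slower estimator (LLP) compensate for each other — can be illustrated with inverse-variance weighting of two estimates of the same class mean. This is a schematic sketch with invented numbers, not the paper's exact combination rule:

```python
# Blend two estimates of the same quantity so that the lower-variance one
# dominates; gamma is the inverse-variance weight on the LLP estimate.
def combine(est_llp, var_llp, est_em, var_em):
    gamma = var_em / (var_llp + var_em)
    return gamma * est_llp + (1.0 - gamma) * est_em

# Illustrative values: LLP unbiased but noisy, EM currently more precise.
est = combine(est_llp=1.2, var_llp=0.4, est_em=0.95, var_em=0.1)
print(est)  # pulled toward the low-variance EM estimate
```

Early in a session, when EM may be poorly initialized (high variance), the weight shifts toward LLP; as EM stabilizes, it takes over — matching the "strengths compensate for weaknesses" description in the abstract.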

Proceedings Article
01 Jan 2017
TL;DR: This work contrasts random features of approximated kernel machines with learned features of neural networks, and presents basis adaptation schemes that allow for a more compact representation, while retaining the generalization properties of kernel machines.
Abstract: Kernel machines as well as neural networks possess universal function approximation properties. Nevertheless in practice their ways of choosing the appropriate function class differ. Specifically neural networks learn a representation by adapting their basis functions to the data and the task at hand, while kernel methods typically use a basis that is not adapted during training. In this work, we contrast random features of approximated kernel machines with learned features of neural networks. Our analysis reveals how these random and adaptive basis functions affect the quality of learning. Furthermore, we present basis adaptation schemes that allow for a more compact representation, while retaining the generalization properties of kernel machines.
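The "random features of approximated kernel machines" contrasted in the abstract can be made concrete with the classic random Fourier feature construction, where a fixed random basis approximates an RBF kernel without any training of the basis itself (dimensions and bandwidth below are arbitrary choices for illustration):

```python
import numpy as np

# Random Fourier features for the unit-bandwidth RBF kernel
# k(x, y) = exp(-||x - y||^2 / 2): with frequencies W drawn from the
# kernel's spectral density, z(x) . z(y) approximates k(x, y).
rng = np.random.default_rng(0)
d, D = 5, 20000                       # input dim, number of random features
W = rng.normal(size=(D, d))           # frequencies ~ N(0, I)
b = rng.uniform(0.0, 2.0 * np.pi, D)  # random phases

def z(x):
    return np.sqrt(2.0 / D) * np.cos(W @ x + b)

x, y = rng.normal(size=d), rng.normal(size=d)
approx = z(x) @ z(y)
exact = np.exp(-np.linalg.norm(x - y) ** 2 / 2.0)
print(approx, exact)                  # the two values nearly agree for large D
```

The basis W, b here is fixed at random, which is exactly the non-adaptive regime the paper contrasts with learned neural network features; the basis adaptation schemes it proposes would instead tune such a basis to the data.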

Proceedings ArticleDOI
22 May 2017
TL;DR: This work treats the encoding procedure as a decision process in time and makes it amenable to reinforcement learning, which results in a trade-off between RD-performance and computational complexity controlled by a single parameter.
Abstract: In today's video compression systems, the encoder typically follows an optimization procedure to find a compressed representation of the video signal. While the primary optimization criteria are bit rate and image distortion, low complexity of this procedure may also be important in some applications, making complexity a third objective. We approach this problem by treating the encoding procedure as a decision process in time, making it amenable to reinforcement learning. Our learning algorithm computes a strategy in a compact functional representation, which is then employed in the video encoder to control its search. By including the measured execution time in the reinforcement signal with a Lagrangian weight, we realize a trade-off between RD-performance and computational complexity controlled by a single parameter. Using the reference software test model (HM) of the HEVC video coding standard, we show that over half of the encoding time can be saved at the same RD-performance.
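The reinforcement signal described in the abstract can be sketched as a rate-distortion Lagrangian cost extended with an execution-time term. The function name, argument names, and numeric values below are illustrative assumptions, not the paper's exact formulation:

```python
# Sketch of a complexity-aware reinforcement signal: the usual RD Lagrangian
# cost (distortion + lam * rate) is extended with measured execution time,
# weighted by the single trade-off parameter mu the abstract mentions.
def reward(distortion, rate_bits, exec_time, lam=0.5, mu=0.1):
    # Larger mu trades RD-performance for lower encoding complexity.
    return -(distortion + lam * rate_bits) - mu * exec_time

# A faster but slightly worse encoding decision can become preferable:
slow = reward(distortion=1.0, rate_bits=2.0, exec_time=10.0)
fast = reward(distortion=1.2, rate_bits=2.0, exec_time=2.0)
print(slow, fast)
```

Sweeping mu from 0 upward traces out the RD-performance versus complexity trade-off that the paper controls with a single parameter.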

Journal ArticleDOI
TL;DR: This paper introduces two graph-based preprocessing techniques that adapt the original TCRFR to extremely weakly supervised scenarios; the approach outperforms previous automatic estimation methods on synthetic data and yields results comparable to the laborious, time-consuming manual geostatistics approach on real data.