
Showing papers by "Klaus-Robert Müller" published in 2016



Journal ArticleDOI
TL;DR: In this article, a deep neural network-based approach to image quality assessment (IQA) is presented, which is trained end-to-end and comprises ten convolutional layers and five pooling layers for feature extraction, and two fully connected layers for regression.
Abstract: We present a deep neural network-based approach to image quality assessment (IQA). The network is trained end-to-end and comprises ten convolutional layers and five pooling layers for feature extraction, and two fully connected layers for regression, which makes it significantly deeper than related IQA models. Unique features of the proposed architecture are that: 1) with slight adaptations it can be used in a no-reference (NR) as well as in a full-reference (FR) IQA setting and 2) it allows for joint learning of local quality and local weights, i.e., relative importance of local quality to the global quality estimate, in a unified framework. Our approach is purely data-driven and does not rely on hand-crafted features or other types of prior domain knowledge about the human visual system or image statistics. We evaluate the proposed approach on the LIVE, CSIQ, and TID2013 databases as well as the LIVE In the Wild Image Quality Challenge database and show superior performance to state-of-the-art NR and FR IQA methods. Finally, cross-database evaluation shows a high ability to generalize between different databases, indicating a high robustness of the learned features.
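As a rough illustration of what such an architecture looks like, here is a hypothetical PyTorch sketch (not the authors' code): ten 3×3 convolutions in pairs, each pair followed by 2×2 max-pooling, and two fully connected layers per head. The layer widths, the 32×32 patch size and the weighted-average pooling are plausible assumptions rather than the paper's exact specification.

```python
import torch
import torch.nn as nn

class IQANet(nn.Module):
    """Hypothetical network in the spirit of the described architecture."""
    def __init__(self):
        super().__init__()
        chans = [3, 32, 32, 64, 64, 128, 128, 256, 256, 512, 512]
        layers = []
        for i in range(10):                      # ten convolutional layers
            layers += [nn.Conv2d(chans[i], chans[i + 1], 3, padding=1), nn.ReLU()]
            if i % 2 == 1:                       # five 2x2 max-pooling layers
                layers.append(nn.MaxPool2d(2))
        self.features = nn.Sequential(*layers)
        # two fully connected layers per head: local quality and local weight
        self.quality = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 1))
        self.weight = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 1))

    def forward(self, patches):                  # patches: (N, 3, 32, 32)
        f = self.features(patches).flatten(1)    # (N, 512) after five poolings
        q = self.quality(f).squeeze(1)           # local quality per patch
        w = torch.relu(self.weight(f)).squeeze(1) + 1e-8   # positive local weights
        return (q * w).sum() / w.sum()           # weighted global quality estimate

score = IQANet()(torch.randn(8, 3, 32, 32))      # 8 patches -> one quality score
```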

365 citations


Journal ArticleDOI
TL;DR: In this paper, the density-potential and energy-density maps for test systems and various molecules are learned via examples, bypassing the need to solve the Kohn-Sham equations.
Abstract: Last year, at least 30,000 scientific papers used the Kohn-Sham scheme of density functional theory to solve electronic structure problems in a wide variety of scientific fields, ranging from materials science to biochemistry to astrophysics. Machine learning holds the promise of learning the kinetic energy functional via examples, bypassing the need to solve the Kohn-Sham equations. This should yield substantial savings in computer time, allowing either larger systems or longer time-scales to be tackled, but attempts to machine-learn this functional have been limited by the need to find its derivative. The present work overcomes this difficulty by directly learning the density-potential and energy-density maps for test systems and various molecules. Both improved accuracy and lower computational cost with this method are demonstrated by reproducing DFT energies for a range of molecular geometries generated during molecular dynamics simulations. Moreover, the methodology could be applied directly to quantum chemical calculations, allowing construction of density functionals of quantum-chemical accuracy.

336 citations


Book ChapterDOI
06 Sep 2016
TL;DR: This paper proposes an approach to extend layer-wise relevance propagation to neural networks with local renormalization layers, a very common product-type nonlinearity in convolutional neural networks.
Abstract: Layer-wise relevance propagation is a framework that allows decomposing the prediction of a deep neural network computed over a sample, e.g. an image, down to relevance scores for the single input dimensions of the sample, such as subpixels of an image. While this approach can be applied directly to generalized linear mappings, product-type non-linearities are not covered. This paper proposes an approach to extend layer-wise relevance propagation to neural networks with local renormalization layers, which is a very common product-type non-linearity in convolutional neural networks. We evaluate the proposed method for local renormalization layers on the CIFAR-10, ImageNet and MIT Places datasets.
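For orientation, the following minimal NumPy sketch shows the standard LRP epsilon-rule for fully connected layers, i.e., the generic linear-mapping case the paper starts from; the paper's actual contribution, a propagation rule for local renormalization layers, is more involved and not reproduced here.

```python
import numpy as np

def lrp_epsilon(a, W, b, R_out, eps=1e-2):
    # a: (d_in,) inputs; W: (d_in, d_out); b: (d_out,); R_out: (d_out,)
    z = a @ W + b                         # pre-activations of the layer
    z = z + eps * np.sign(z)              # epsilon stabilizer avoids division by ~0
    s = R_out / z                         # relevance per unit of pre-activation
    return a * (W @ s)                    # redistributed input relevances

rng = np.random.default_rng(0)
a0 = np.array([1.0, -0.5, 2.0])                       # toy "input pixels"
W1, b1 = rng.standard_normal((3, 4)), np.zeros(4)
a1 = np.maximum(0, a0 @ W1 + b1)                      # ReLU hidden activations
W2, b2 = rng.standard_normal((4, 1)), np.zeros(1)
R1 = lrp_epsilon(a1, W2, b2, a1 @ W2 + b2)            # relevance at hidden layer
R0 = lrp_epsilon(a0, W1, b1, R1)                      # relevance at the input
```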

321 citations


Journal ArticleDOI
TL;DR: In this article, layer-wise relevance propagation (LRP) has been introduced as a novel method to explain individual network decisions; the resulting heatmaps reveal neurophysiologically plausible patterns, resembling CSP-derived scalp maps.

294 citations


Journal ArticleDOI
TL;DR: This article discusses why, in some of the prospective application domains, considerable effort is still required to make the systems ready to deal with the full complexity of the real world.
Abstract: The combined effect of fundamental results about neurocognitive processes and advancements in decoding mental states from ongoing brain signals has brought forth a whole range of potential neurotechnological applications. In this article, we review our developments in this area and put them into perspective. These examples cover a wide range of maturity levels with respect to their applicability. While we assume we are still a long way away from integrating Brain-Computer Interface (BCI) technology in general interaction with computers, or from implementing neurotechnological measures in safety-critical workplaces, results have already been obtained involving a BCI as a research tool. In this article, we discuss the reasons why, in some of the prospective application domains, considerable effort is still required to make the systems ready to deal with the full complexity of the real world.

190 citations


Proceedings ArticleDOI
01 Jun 2016
TL;DR: This paper extends the Layer-wise Relevance Propagation (LRP) framework to Fisher vector classifiers and uses it as an analysis tool to quantify the importance of context for classification, qualitatively compare DNNs against FV classifiers in terms of important image regions, and detect potential flaws and biases in data.
Abstract: Fisher vector (FV) classifiers and Deep Neural Networks (DNNs) are popular and successful algorithms for solving image classification problems. However, both are generally considered 'black box' predictors as the non-linear transformations involved have so far prevented transparent and interpretable reasoning. Recently, a principled technique, Layer-wise Relevance Propagation (LRP), has been developed in order to better comprehend the inherent structured reasoning of complex nonlinear classification models such as Bag of Feature models or DNNs. In this paper we (1) extend the LRP framework to Fisher vector classifiers and then use it as an analysis tool to (2) quantify the importance of context for classification, (3) qualitatively compare DNNs against FV classifiers in terms of important image regions and (4) detect potential flaws and biases in data. All experiments are performed on the PASCAL VOC 2007 and ILSVRC 2012 data sets.

153 citations


Journal ArticleDOI
TL;DR: In this paper, Kernel ridge regression is used to approximate the kinetic energy of non-interacting fermions in a one dimensional box as a functional of their density, and a projected gradient descent algorithm is derived using local principal component analysis.
Abstract: Machine learning (ML) is an increasingly popular statistical tool for analyzing either measured or calculated data sets. Here, we explore its application to a well-defined physics problem, investigating issues of how the underlying physics is handled by ML, and how self-consistent solutions can be found by limiting the domain in which ML is applied. The particular problem is how to find accurate approximate density functionals for the kinetic energy (KE) of non-interacting electrons. Kernel ridge regression is used to approximate the KE of non-interacting fermions in a one-dimensional box as a functional of their density. The properties of different kernels and methods of cross-validation are explored, reproducing the physics faithfully in some cases, but not others. We also address how self-consistency can be achieved with information on only a limited electronic density domain. Accurate constrained optimal densities are found via a modified Euler-Lagrange constrained minimization of the machine-learned total energy, despite the poor quality of its functional derivative. A projected gradient descent algorithm is derived using local principal component analysis. Additionally, a sparse grid representation of the density can be used without degrading the performance of the methods. The implications for machine-learned density functional approximations are discussed.
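A minimal sketch of kernel ridge regression with a Gaussian kernel, the model class used here to map densities to kinetic energies; the toy data below is illustrative and not the paper's particle-in-a-box densities.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def krr_fit(X, y, sigma, lam=1e-6):
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)   # solves (K + lam*I) alpha = y

def krr_predict(X_train, alpha, X_test, sigma):
    return gaussian_kernel(X_test, X_train, sigma) @ alpha

rng = np.random.default_rng(0)
X = rng.random((50, 10))                 # 50 toy "densities" on a 10-point grid
y = (X ** 2).sum(axis=1)                 # stand-in for a kinetic-energy functional
alpha = krr_fit(X, y, sigma=2.0)
print(krr_predict(X, alpha, X[:3], sigma=2.0), y[:3])
```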

143 citations


Journal ArticleDOI
01 Sep 2016
TL;DR: The proposed supervised adaptation methods, which adapt a trained classifier using only a small calibration set, can improve the robustness of myoelectric pattern recognition in daily-life applications.
Abstract: Fundamental changes over time of surface EMG signal characteristics are a challenge for myocontrol algorithms controlling prosthetic devices. These changes are generally caused by electrode shifts after donning and doffing, sweating, additional weight or varying arm positions, which results in a change of the signal distribution—a scenario often referred to as covariate shift. A substantial decrease in classification accuracy due to these factors hinders the possibility to directly translate EMG signals into accurate myoelectric control patterns outside laboratory conditions. To overcome this limitation, we propose the use of supervised adaptation methods. The approach is based on adapting a trained classifier using a small calibration set only, which incorporates the relevant aspects of the nonstationarities but requires less than 1 min of data recording. The method was tested first through an offline analysis on signals acquired across 5 days from seven able-bodied individuals and four amputees. Moreover, we also conducted a three-day online experiment on eight able-bodied individuals and one amputee, assessing user performance and user ratings of the controllability. Across different testing days, both offline and online performance improved significantly when shrinking the training model parameters by a given estimator towards the calibration set parameters. In the offline data analysis, the classification accuracy remained above 92% over five days with the proposed approach, whereas it decreased to 75% without adaptation. Similarly, in the online study, with the proposed approach the performance increased by 25% compared to a test without adaptation. These results indicate that the proposed methodology can contribute to improving the robustness of myoelectric pattern recognition methods in daily-life applications.
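A hedged sketch of the shrinkage idea, assuming an LDA classifier: class means and the pooled covariance of the trained model are linearly shrunk towards estimates from the short calibration recording. The fixed shrinkage coefficient `lam` is a simplification; the paper selects it with a dedicated estimator.

```python
import numpy as np

def adapt_lda(mu_train, cov_train, X_calib, y_calib, lam=0.5):
    # mu_train: dict {class: mean vector}; cov_train: pooled covariance matrix
    classes = np.unique(y_calib)
    mu = {c: (1 - lam) * mu_train[c] + lam * X_calib[y_calib == c].mean(axis=0)
          for c in classes}
    cov = (1 - lam) * cov_train + lam * np.cov(X_calib, rowvar=False)
    # two-class LDA weight vector and bias from the adapted parameters
    w = np.linalg.solve(cov, mu[classes[1]] - mu[classes[0]])
    b = -w @ (mu[classes[0]] + mu[classes[1]]) / 2
    return w, b          # predict class 1 if w @ x + b > 0
```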

143 citations


Book ChapterDOI
01 Jan 2016
TL;DR: This work presents the application of layer-wise relevance propagation to several deep neural networks such as the BVLC reference neural net and GoogLeNet trained on the ImageNet and MIT Places datasets.
Abstract: We present the application of layer-wise relevance propagation to several deep neural networks such as the BVLC reference neural net and GoogLeNet trained on the ImageNet and MIT Places datasets. Layer-wise relevance propagation is a method to compute scores for image pixels and image regions denoting the impact of the particular image region on the prediction of the classifier for one particular test image. We demonstrate the impact of different parameter settings on the resulting explanation.

132 citations


Posted Content
TL;DR: This work shows how noise and distracting dimensions can influence the result of an explanation model and gives new theoretical insights to aid selection of the most appropriate explanation model within the deep Taylor decomposition framework.
Abstract: Understanding neural networks is becoming increasingly important. Over the last few years different types of visualisation and explanation methods have been proposed. However, none of them explicitly considered the behaviour in the presence of noise and distracting elements. In this work, we will show how noise and distracting dimensions can influence the result of an explanation model. This gives new theoretical insights that aid selection of the most appropriate explanation model within the deep Taylor decomposition framework.

Journal ArticleDOI
TL;DR: The results suggest that the use of higher frequency visual stimuli is more beneficial for performance improvement and stability as time passes when developing practical SSVEP-based BCI applications.
Abstract: Objective. Most existing brain–computer interface (BCI) designs based on steady-state visual evoked potentials (SSVEPs) primarily use low frequency visual stimuli (e.g., <20 Hz) to elicit relatively high SSVEP amplitudes. While low frequency stimuli could evoke photosensitivity-based epileptic seizures, high frequency stimuli generally show less visual fatigue and no stimulus-related seizures. The fundamental objective of this study was to investigate the effect of stimulation frequency and duty-cycle on the usability of an SSVEP-based BCI system. Approach. We developed an SSVEP-based BCI speller using multiple LEDs flickering with low frequencies (6–14.9 Hz) with a duty-cycle of 50%, or higher frequencies (26–34.7 Hz) with duty-cycles of 50%, 60%, and 70%. The four different experimental conditions were tested with 26 subjects in order to investigate the impact of stimulation frequency and duty-cycle on performance and visual fatigue, and evaluated with a questionnaire survey. Resting state alpha powers were utilized to interpret our results from the neurophysiological point of view. Main results. The stimulation method employing higher frequencies not only showed less visual fatigue, but it also showed higher and more stable classification performance compared to that employing relatively lower frequencies. Different duty-cycles in the higher frequency stimulation conditions did not significantly affect visual fatigue, but a duty-cycle of 50% was a better choice with respect to performance. The performance of the higher frequency stimulation method was also less susceptible to resting state alpha powers, while that of the lower frequency stimulation method was negatively correlated with alpha powers. Significance. These results suggest that the use of higher frequency visual stimuli is more beneficial for performance improvement and stability as time passes when developing practical SSVEP-based BCI applications.
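As a toy illustration of the stimulus parameters compared above, flicker waveforms of a given frequency and duty-cycle can be generated with SciPy's square wave; the concrete frequencies below are examples, not the study's exact LED settings.

```python
import numpy as np
from scipy.signal import square

t = np.linspace(0, 1, 2000, endpoint=False)       # one second, 2 kHz resolution
low_50 = square(2 * np.pi * 10.0 * t, duty=0.5)   # low-frequency, 50% duty-cycle
high_70 = square(2 * np.pi * 30.0 * t, duty=0.7)  # high-frequency, 70% duty-cycle
```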

Proceedings Article
01 Jan 2016
TL;DR: This work proposes a novel approach for Boltzmann machine training which assumes that a meaningful metric between observations is known, uses this metric to define the Wasserstein distance between the model distribution and the training distribution, and derives a gradient of that distance with respect to the model parameters.
Abstract: Boltzmann machines are able to learn highly complex, multimodal, structured and multiscale real-world data distributions. Parameters of the model are usually learned by minimizing the Kullback-Leibler (KL) divergence from training samples to the learned model. We propose in this work a novel approach for Boltzmann machine training which assumes that a meaningful metric between observations is known. This metric between observations can then be used to define the Wasserstein distance between the distribution induced by the Boltzmann machine on the one hand, and that given by the training sample on the other hand. We derive a gradient of that distance with respect to the model parameters. Minimization of this new objective leads to generative models with different statistical properties. We demonstrate their practical potential on data completion and denoising, for which the metric between observations plays a crucial role.
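A minimal sketch of the smoothed Wasserstein distance between a batch of model samples and a batch of data samples, computed with Sinkhorn iterations; this is the kind of objective the paper differentiates with respect to the Boltzmann machine parameters (the Boltzmann machine sampling itself is omitted here).

```python
import numpy as np

def sinkhorn_distance(X, Y, eps=0.5, n_iter=200):
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)  # pairwise squared costs
    K = np.exp(-C / eps)
    a = np.full(len(X), 1.0 / len(X))                   # uniform sample weights
    b = np.full(len(Y), 1.0 / len(Y))
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):                             # Sinkhorn fixed-point steps
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                     # (approximately) optimal coupling
    return (P * C).sum()

rng = np.random.default_rng(0)
print(sinkhorn_distance(rng.random((64, 10)), rng.random((64, 10))))
```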

Journal Article
TL;DR: The LRP Toolbox provides platform-agnostic implementations for explaining the predictions of pre-trained state-of-the-art Caffe networks and stand-alone implementations for fully connected neural network models.
Abstract: The Layer-wise Relevance Propagation (LRP) algorithm explains a classifier's prediction specific to a given data point by attributing relevance scores to important components of the input by using the topology of the learned model itself. With the LRP Toolbox we provide platform-agnostic implementations for explaining the predictions of pre-trained state-of-the-art Caffe networks and stand-alone implementations for fully connected neural network models. The implementations for MATLAB and Python shall serve as a playing field to familiarize oneself with the LRP algorithm and are implemented with readability and transparency in mind. Models and data can be imported and exported using raw text formats, MATLAB's .mat files and the .npy format for NumPy, or plain text.

Proceedings ArticleDOI
01 Aug 2016
TL;DR: This paper applies layer-wise relevance propagation for the first time to natural language processing (NLP) and uses it to explain the predictions of a convolutional neural network trained on a topic categorization task.
Abstract: Layer-wise relevance propagation (LRP) is a recently proposed technique for explaining predictions of complex non-linear classifiers in terms of input variables. In this paper, we apply LRP for the first time to natural language processing (NLP). More precisely, we use it to explain the predictions of a convolutional neural network (CNN) trained on a topic categorization task. Our analysis highlights which words are relevant for a specific prediction of the CNN. We compare our technique to standard sensitivity analysis, both qualitatively and quantitatively, using a “word deleting” perturbation experiment, a PCA analysis, and various visualizations. All experiments validate the suitability of LRP for explaining the CNN predictions, which is also in line with results reported in recent image classification studies.
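An illustrative sketch of the evaluation idea: LRP relevances are summed over embedding dimensions to obtain one score per word, and words are then deleted in order of decreasing relevance while tracking the classifier score. The `predict` callable and the relevance matrix `R` are assumed to come from the CNN and an LRP pass, respectively.

```python
import numpy as np

def word_relevances(R):
    # R: (seq_len, emb_dim) LRP relevances per embedding dimension
    return R.sum(axis=1)                          # one relevance score per word

def deletion_curve(tokens, R, predict, n_delete=5):
    order = np.argsort(word_relevances(R))[::-1]  # most relevant words first
    kept = list(tokens)
    scores = [predict(kept)]
    for i in order[:n_delete]:
        kept[i] = "<pad>"                         # "delete" the word
        scores.append(predict(kept))
    return scores      # a steep drop indicates a faithful explanation
```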

Posted Content
TL;DR: It is demonstrated that DNNs are a powerful non-linear tool for EEG analysis, and that LRP is a potential remedy for the lack of interpretability of DNNs that has limited their utility in neuroscientific applications.
Abstract: Background: In cognitive neuroscience the potential of Deep Neural Networks (DNNs) for solving complex classification tasks is yet to be fully exploited. The most limiting factor is that DNNs as notorious 'black boxes' do not provide insight into neurophysiological phenomena underlying a decision. Layer-wise Relevance Propagation (LRP) has been introduced as a novel method to explain individual network decisions. New Method: We propose the application of DNNs with LRP for the first time for EEG data analysis. Through LRP the single-trial DNN decisions are transformed into heatmaps indicating each data point's relevance for the outcome of the decision. Results: DNN achieves classification accuracies comparable to those of CSP-LDA. In subjects with low performance subject-to-subject transfer of trained DNNs can improve the results. The single-trial LRP heatmaps reveal neurophysiologically plausible patterns, resembling CSP-derived scalp maps. Critically, while CSP patterns represent class-wise aggregated information, LRP heatmaps pinpoint neural patterns to single time points in single trials. Comparison with Existing Method(s): We compare the classification performance of DNNs to that of linear CSP-LDA on two data sets related to motor-imagery BCI. Conclusion: We have demonstrated that DNN is a powerful non-linear tool for EEG analysis. With LRP a new quality of high-resolution assessment of neural activity can be reached. LRP is a potential remedy for the lack of interpretability of DNNs that has limited their utility in neuroscientific applications. The extreme specificity of the LRP-derived heatmaps opens up new avenues for investigating neural activity underlying complex perception or decision-related processes.

Journal ArticleDOI
TL;DR: It is proved that, for linear finite-order autoregressive processes with unidirectional information flow between two variables, the application of time reversal for testing Granger causality indeed leads to correct estimates of information flow and its directionality.
Abstract: Inferring causal interactions from observed data is a challenging problem, especially in the presence of measurement noise. To alleviate the problem of spurious causality, Haufe (2013) proposed to contrast measures of information flow obtained on the original data against the same measures obtained on time-reversed data. They show that this procedure, time-reversed Granger causality (TRGC), robustly rejects causal interpretations on mixtures of independent signals. While promising results have been achieved in simulations, it was so far unknown whether time reversal leads to valid measures of information flow in the presence of true interaction. Here, we prove that, for linear finite-order autoregressive processes with unidirectional information flow between two variables, the application of time reversal for testing Granger causality indeed leads to correct estimates of information flow and its directionality. Using simulations, we further show that TRGC is able to infer correct directionality with similar statistical power as the net Granger causality between two variables, while being much more robust to the presence of measurement noise.
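A minimal sketch of the contrast underlying TRGC: a net Granger score is computed on the original series and again on the time-reversed series, and a genuine interaction is expected to flip its sign under reversal. AR models are fit by ordinary least squares; the model order `p` is an arbitrary choice.

```python
import numpy as np

def ar_residual_var(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.var(y - X @ beta)

def granger(x, y, p=5):
    # log-ratio of residual variances: does the past of x help predict y?
    T = len(y)
    own  = np.array([y[t - p:t] for t in range(p, T)])             # (T-p, p)
    full = np.array([np.r_[y[t - p:t], x[t - p:t]] for t in range(p, T)])
    target = y[p:]
    return np.log(ar_residual_var(target, own) / ar_residual_var(target, full))

def net_trgc(x, y, p=5):
    net     = granger(x, y, p) - granger(y, x, p)                  # original data
    net_rev = granger(x[::-1], y[::-1], p) - granger(y[::-1], x[::-1], p)
    return net, net_rev   # genuine x -> y flow: net > 0 and net_rev < 0
```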

Journal ArticleDOI
TL;DR: Applying COMBI to data from a WTCCC study and measuring performance as replication by independent GWAS published within the 2008–2015 period shows that the method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods.
Abstract: The standard approach to the analysis of genome-wide association studies (GWAS) is based on testing each position in the genome individually for statistical significance of its association with the phenotype under investigation. To improve the analysis of GWAS, we propose a combination of machine learning and statistical testing that takes correlation structures within the set of SNPs under investigation in a mathematically well-controlled manner into account. The novel two-step algorithm, COMBI, first trains a support vector machine to determine a subset of candidate SNPs and then performs hypothesis tests for these SNPs together with an adequate threshold correction. Applying COMBI to data from a WTCCC study (2007) and measuring performance as replication by independent GWAS published within the 2008–2015 period, we show that our method outperforms ordinary raw p-value thresholding as well as other state-of-the-art methods. COMBI presents higher power and precision than the examined alternatives while yielding fewer false (i.e. non-replicated) and more true (i.e. replicated) discoveries when its results are validated on later GWAS studies. More than 80% of the discoveries made by COMBI upon WTCCC data have been validated by independent studies. Implementations of the COMBI method are available as a part of the GWASpi toolbox 2.0.
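A hedged two-step sketch of the COMBI idea using scikit-learn and SciPy: screen SNPs with a linear SVM, then run association tests only on the top-scoring subset with a correspondingly milder multiple-testing correction. The genotype encoding, the contingency-table test, and the choice of `k` are simplifications of the paper's procedure.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.svm import LinearSVC

def combi_like(G, y, k=100, alpha=0.05):
    # G: (n_subjects, n_snps) genotypes coded 0/1/2; y: binary phenotype
    svm = LinearSVC(C=1.0, dual=False).fit(G, y)
    scores = np.abs(svm.coef_).ravel()
    candidates = np.argsort(scores)[::-1][:k]        # step 1: SVM screening
    hits = []
    for j in candidates:                             # step 2: testing on the subset
        table = np.array([[np.sum((G[:, j] == g) & (y == c)) for g in (0, 1, 2)]
                          for c in (0, 1)]) + 1      # +1 avoids empty cells
        p = chi2_contingency(table)[1]
        if p < alpha / k:                            # Bonferroni over k, not n_snps
            hits.append((j, p))
    return hits
```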

Journal ArticleDOI
TL;DR: The results of the study show that CSPP adapts faster and thereby allows users to achieve better feedback within a shorter time than previous approaches performed with Laplacian derivations and CSP filters, thus reducing BCI inefficiency to one-fourth in comparison to previous non-adaptive paradigms.
Abstract: Objective: In electroencephalographic (EEG) data, signals from distinct sources within the brain are widely spread by volume conduction and superimposed such that sensors receive mixtures of a multitude of signals. This reduction of spatial information strongly hampers single-trial analysis of EEG data as, for example, required for brain–computer interfacing (BCI) when using features from spontaneous brain rhythms. Spatial filtering techniques are therefore greatly needed to extract meaningful information from EEG. Our goal is to show, in online operation, that common spatial pattern patches (CSPP) are valuable to counteract this problem. Approach: Even though the effect of spatial mixing can be countered by spatial filters, there is a trade-off between performance and the requirement of calibration data. Laplacian derivations do not require calibration data at all, but their performance for single-trial classification is limited. Conversely, data-driven spatial filters, such as common spatial patterns (CSP), can lead to highly distinctive features; however they require a considerable amount of training data. Recently, we showed in an offline analysis that CSPP can establish a valuable compromise. In this paper, we confirm these results in an online BCI study. In order to demonstrate the paramount feature that CSPP requires little training data, we used them in an adaptive setting with 20 participants and focused on users who did not have success with previous BCI approaches. Main results: The results of the study show that CSPP adapts faster and thereby allows users to achieve better feedback within a shorter time than previous approaches performed with Laplacian derivations and CSP filters. The success of the experiment highlights that CSPP has the potential to further reduce BCI inefficiency. Significance: CSPP are a valuable compromise between CSP and Laplacian filters. They allow users to attain better feedback within a shorter time and thus reduce BCI inefficiency to one-fourth in comparison to previous non-adaptive paradigms.
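For reference, the standard CSP computation that CSPP applies to small local electrode patches reduces to a generalized eigendecomposition of the two class-wise covariance matrices; a minimal NumPy/SciPy sketch:

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(trials_a, trials_b, n_filters=6):
    # trials_*: (n_trials, n_channels, n_samples) band-pass filtered EEG epochs
    Ca = np.mean([np.cov(t) for t in trials_a], axis=0)
    Cb = np.mean([np.cov(t) for t in trials_b], axis=0)
    evals, evecs = eigh(Ca, Ca + Cb)       # generalized eigendecomposition
    order = np.argsort(evals)              # extreme eigenvalues discriminate best
    picks = np.r_[order[:n_filters // 2], order[-(n_filters // 2):]]
    return evecs[:, picks]                 # one spatial filter per column

# features are then typically log-variances of the filtered epochs:
# f = np.log(np.var(W.T @ epoch, axis=1))
```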

Journal ArticleDOI
TL;DR: A feasibility study in which healthy users were able to use a non-invasive motor imagery-based brain-computer interface (BCI) to achieve linear control of an upper-limb functional electrical stimulation (FES)-controlled neuro-prosthesis.

Journal ArticleDOI
TL;DR: The LDA beamformer optimally reconstructs ERP sources by maximizing the ERP signal-to-noise ratio and is a highly suited tool for analyzing ERP source time series, particularly in EEG/MEG studies wherein a source model is not available.

Proceedings ArticleDOI
01 Oct 2016
TL;DR: An overview of the shortcomings of conventional approaches is given, the state of the art of BCI-based methods is presented, and open questions and challenges relevant to the BCI community are discussed.
Abstract: The assessment of perceived multimedia quality is a central research field in information and media technology. Conventionally, psychophysical techniques are used for determining the quality of multimedia signals. Recently, Brain-Computer Interfacing (BCI)-based methods have been proposed for the assessment of perceived multimedia signal quality. In this paper we give an overview of the shortcomings of conventional approaches, present the state of the art of BCI-based methods and discuss open questions and challenges relevant to the BCI community.

Posted Content
TL;DR: The Measure of Feature Importance (MFI) is general and can be applied to any arbitrary learning machine (including kernel machines and deep learning) and can detect features that by themselves are inconspicuous and only impact the prediction function through their interaction with other features.
Abstract: Complex problems may require sophisticated, non-linear learning methods such as kernel machines or deep neural networks to achieve state-of-the-art prediction accuracies. However, high prediction accuracies are not the only objective to consider when solving problems using machine learning. Instead, particular scientific applications require some explanation of the learned prediction function. Unfortunately, most methods do not come with straightforward, out-of-the-box interpretation. Even linear prediction functions are not straightforward to explain if features exhibit complex correlation structure. In this paper, we propose the Measure of Feature Importance (MFI). MFI is general and can be applied to any arbitrary learning machine (including kernel machines and deep learning). MFI is intrinsically non-linear and can detect features that by themselves are inconspicuous and only impact the prediction function through their interaction with other features. Lastly, MFI can be used for both model-based and instance-based feature importance (i.e., measuring the importance of a feature for a particular data point).

Posted Content
TL;DR: This short paper summarizes a recent technique introduced by Bach et al. that explains predictions by decomposing the classification decision of DNN models in terms of input variables.
Abstract: Complex nonlinear models such as deep neural network (DNNs) have become an important tool for image classification, speech recognition, natural language processing, and many other fields of application. These models however lack transparency due to their complex nonlinear structure and to the complex data distributions to which they typically apply. As a result, it is difficult to fully characterize what makes these models reach a particular decision for a given input. This lack of transparency can be a drawback, especially in the context of sensitive applications such as medical analysis or security. In this short paper, we summarize a recent technique introduced by Bach et al. [1] that explains predictions by decomposing the classification decision of DNN models in terms of input variables.

Posted Content
TL;DR: This paper proposes an approach to extend layer-wise relevance propagation to neural networks with local renormalization layers, which is a very common product-type non-linearity in convolutional neural networks.
Abstract: Layer-wise relevance propagation is a framework that allows decomposing the prediction of a deep neural network computed over a sample, e.g. an image, down to relevance scores for the single input dimensions of the sample, such as subpixels of an image. While this approach can be applied directly to generalized linear mappings, product-type non-linearities are not covered. This paper proposes an approach to extend layer-wise relevance propagation to neural networks with local renormalization layers, which is a very common product-type non-linearity in convolutional neural networks. We evaluate the proposed method for local renormalization layers on the CIFAR-10, ImageNet and MIT Places datasets.

Journal ArticleDOI
TL;DR: The present study provides the first neurophysiological evidence for the top-down SSVEP BMI paradigm, which potentially enables multi-class intentional control of EEG-BMIs without using gaze-shifting.
Abstract: We present a fast and accurate non-invasive brain-machine interface (BMI) based on demodulating steady-state visual evoked potentials (SSVEPs) in electroencephalography (EEG). Our study reports an SSVEP-BMI that, for the first time, decodes primarily based on top-down and not bottom-up visual information processing. The experimental setup presents a grid-shaped flickering line array that the participants observe while intentionally attending to a subset of flickering lines representing the shape of a letter. While the flickering pixels stimulate the participant's visual cortex uniformly with equal probability, the participant's intention groups the strokes so that a 'letter Gestalt' is perceived. We observed decoding accuracy of 35.81% (up to 65.83%) with a regularized linear discriminant analysis; on average 2.05-fold, and up to 3.77-fold greater than chance levels in multi-class classification. Compared to the EEG signals, an electrooculogram (EOG) did not significantly contribute to decoding accuracies. Further analysis reveals that the top-down SSVEP paradigm shows the most focalised activation pattern around occipital visual areas; Granger causality analysis consistently revealed prefrontal top-down control over early visual processing. Taken together, the present paradigm provides the first neurophysiological evidence for the top-down SSVEP BMI paradigm, which potentially enables multi-class intentional control of EEG-BMIs without using gaze-shifting.

Journal ArticleDOI
TL;DR: It is shown that Long-Range Temporal Correlations (LRTCs) estimated from the amplitude of EEG oscillations over a range of time-scales predict performance in a complex sensorimotor task, based on Brain-Computer Interfacing (BCI).

Proceedings ArticleDOI
01 Jan 2016
TL;DR: This paper proposes and evaluates three different feature combination methods and two aggregation approaches, achieving linear Pearson correlations superior to state-of-the-art IQA methods.
Abstract: This paper presents a full-reference (FR) image quality assessment (IQA) method based on a deep convolutional neural network (CNN). The CNN extracts features from distorted and reference image patches and estimates the perceived quality of the distorted ones by combining and regressing the feature vectors using two fully connected layers. The CNN consists of 12 convolution and max-pooling layers; activations use the rectified linear unit (ReLU). The overall IQA score is computed by aggregating the patch quality estimates. Three different feature combination methods and two aggregation approaches are proposed and evaluated in this paper. Experiments are performed on the LIVE and TID2013 databases. On both databases linear Pearson correlations superior to state-of-the-art IQA methods are achieved.
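A sketch of the patch-based pipeline described above; `extract` and `regressor` stand in for the convolutional feature extractor and the two fully connected layers, and the three fusion variants shown are plausible guesses rather than the paper's confirmed combination methods.

```python
import torch

def fuse(f_ref, f_dist, mode="concat"):
    if mode == "concat":                   # concatenated reference/distorted features
        return torch.cat([f_ref, f_dist], dim=1)
    if mode == "diff":                     # difference features
        return f_ref - f_dist
    if mode == "concat_diff":              # concatenation plus difference
        return torch.cat([f_ref, f_dist, f_ref - f_dist], dim=1)
    raise ValueError(mode)

def score_image(ref_patches, dist_patches, extract, regressor, mode="concat"):
    fused = fuse(extract(ref_patches), extract(dist_patches), mode)
    patch_scores = regressor(fused).squeeze(1)   # per-patch quality estimates
    return patch_scores.mean()                   # simple-average aggregation
```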

Journal ArticleDOI
TL;DR: While artifact removal does not enhance the BCI performance significantly, both ensemble classification and the 2-step classification combined with CSP significantly improve the performance compared to the standard procedure.
Abstract: OBJECTIVE: While motor-imagery based brain-computer interfaces (BCIs) have been studied over many years by now, most of these studies have taken place in controlled lab settings. Bringing BCI technology into everyday life is still one of the main challenges in this field of research. APPROACH: This paper systematically investigates BCI performance under 6 types of distractions that mimic out-of-lab environments. MAIN RESULTS: We report results of 16 participants and show that the performance of the standard common spatial patterns (CSP) + regularized linear discriminant analysis classification pipeline drops significantly in this 'simulated' out-of-lab setting. We then investigate three methods for improving the performance: (1) artifact removal, (2) ensemble classification, and (3) a 2-step classification approach. While artifact removal does not enhance the BCI performance significantly, both ensemble classification and the 2-step classification combined with CSP significantly improve the performance compared to the standard procedure. SIGNIFICANCE: Systematically analyzing out-of-lab scenarios is crucial when bringing BCI into everyday life. Algorithms must be adapted to overcome nonstationary environments in order to tackle real-world challenges.

Book ChapterDOI
12 Sep 2016
TL;DR: In this paper, the authors focus on the problem of explaining predictions of psychological attributes such as attractiveness, happiness, confidence and intelligence from face photographs using deep neural networks and apply transfer learning with two base models to avoid overfitting.
Abstract: This paper focuses on the problem of explaining predictions of psychological attributes such as attractiveness, happiness, confidence and intelligence from face photographs using deep neural networks. Since psychological attribute datasets typically suffer from small sample sizes, we apply transfer learning with two base models to avoid overfitting. These models were trained on an age and gender prediction task, respectively. Using a novel explanation method we extract heatmaps that highlight the parts of the image most responsible for the prediction. We further observe that the explanation method provides important insights into the nature of features of the base model, which allow one to assess the aptitude of the base model for a given transfer learning task. Finally, we observe that the multiclass model is more feature-rich than its binary counterpart. The experimental evaluation is performed on the 2222 images from the 10k US faces dataset containing psychological attribute labels as well as on a subset of KDEF images.