
Showing papers by "Klaus-Robert Müller" published in 2012


Journal ArticleDOI
TL;DR: A machine learning model is introduced to predict atomization energies of a diverse set of organic molecules, based on nuclear charges and atomic positions only, and applicability is demonstrated for the prediction of molecular atomization potential energy curves.
Abstract: We introduce a machine learning model to predict atomization energies of a diverse set of organic molecules, based on nuclear charges and atomic positions only. The problem of solving the molecular Schrödinger equation is mapped onto a nonlinear statistical regression problem of reduced complexity. Regression models are trained on and compared to atomization energies computed with hybrid density-functional theory. Cross validation over more than seven thousand organic molecules yields a mean absolute error of ≈ 10 kcal/mol. Applicability is demonstrated for the prediction of molecular atomization potential energy curves.
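The regression model described above is, in essence, kernel ridge regression on a descriptor built from nuclear charges and atomic positions (the Coulomb matrix). The sketch below illustrates that setup; the descriptor details, kernel choice and hyperparameters are illustrative assumptions, not a reproduction of the published model.

```python
import numpy as np

def coulomb_matrix(Z, R, size=23):
    """Coulomb-matrix-style descriptor from nuclear charges Z and positions R.
    Off-diagonal entries Z_i*Z_j/|R_i - R_j|, diagonal 0.5*Z_i**2.4; padded to a
    fixed size, sorted by row norm for a (roughly) permutation-invariant ordering,
    then flattened."""
    n = len(Z)
    M = np.zeros((size, size))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, j] = 0.5 * Z[i] ** 2.4
            else:
                M[i, j] = Z[i] * Z[j] / np.linalg.norm(R[i] - R[j])
    order = np.argsort(-np.linalg.norm(M, axis=1))
    M = M[order][:, order]
    return M[np.triu_indices(size)]

def krr_train(X, y, sigma=100.0, lam=1e-6):
    """Kernel ridge regression with a Gaussian kernel; returns dual coefficients."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-d2 / (2 * sigma ** 2))
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, alpha, X_test, sigma=100.0):
    d2 = np.sum((X_test[:, None, :] - X_train[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2)) @ alpha
```

With descriptors stacked into X and atomization energies in y, `krr_train` fits the model and `krr_predict` evaluates it on new molecules; kernel width and regularization would in practice be chosen by cross validation.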

1,755 citations


Journal ArticleDOI
TL;DR: The BCI competition IV stands in the tradition of prior BCI competitions that aim to provide high quality neuroscientific data for open access to the scientific community and it is the hope that winning entries may enhance the analysis methods of future BCIs.
Abstract: The BCI competition IV stands in the tradition of prior BCI competitions that aim to provide high quality neuroscientific data for open access to the scientific community. As in prior competitions, not only scientists from the narrow field of BCI compete, but also scholars with a broad variety of backgrounds and nationalities, ranging from highly specialized researchers to students. The goal of all BCI competitions has always been to pose challenges involving novel paradigms and complex data. We report on the following challenges: (1) asynchronous data, (2) synthetic data, (3) multi-class continuous data, (4) session-to-session transfer, (5) directionally modulated MEG, (6) finger movements recorded by ECoG. As after past competitions, our hope is that winning entries may enhance the analysis methods of future BCIs.

747 citations


Journal ArticleDOI
TL;DR: The results show that simultaneous measurements of NIRS and EEG can significantly improve the classification accuracy of motor imagery in over 90% of the considered subjects, increasing performance by 5% on average (p < 0.01).

536 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used machine learning to approximate density functionals for the model problem of the kinetic energy of noninteracting fermions in 1D, achieving mean absolute errors below 1 kcal/mol on test densities similar to the training set with fewer than 100 training densities.
Abstract: Machine learning is used to approximate density functionals. For the model problem of the kinetic energy of noninteracting fermions in 1D, mean absolute errors below 1 kcal/mol on test densities similar to the training set are reached with fewer than 100 training densities. A predictor identifies if a test density is within the interpolation region. Via principal component analysis, a projected functional derivative finds highly accurate self-consistent densities. The challenges for application of our method to real electronic structure problems are discussed.
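To make the regression setup concrete, the following is a minimal sketch of kernel ridge regression from a density sampled on a grid to its kinetic energy. Kernel choice and hyperparameters are illustrative assumptions, not values from the paper.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def fit_kinetic_energy_model(densities, energies, sigma=10.0, lam=1e-10):
    """densities: (n_samples, n_grid) array, each row a density n(x) on a uniform grid.
    energies: corresponding kinetic energies. Returns a prediction function."""
    K = gaussian_kernel(densities, densities, sigma)
    alpha = np.linalg.solve(K + lam * np.eye(len(densities)), energies)

    def predict(new_densities):
        return gaussian_kernel(new_densities, densities, sigma) @ alpha

    return predict
```

The interpolation-region predictor and the projected functional derivative mentioned in the abstract operate on top of such a model and are omitted here.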

473 citations


Journal ArticleDOI
TL;DR: Psychological parameters as measured in this study play a moderate role for one-session performance in a BCI controlled by modulation of SMR.

242 citations


Journal ArticleDOI
TL;DR: A method which regularizes CSP towards stationary subspaces (sCSP) is proposed and it is shown that this increases classification accuracy, especially for subjects who are hardly able to control a BCI.
Abstract: Classifying motion intentions in brain-computer interfacing (BCI) is a demanding task as the recorded EEG signal is not only noisy and of limited spatial resolution but also intrinsically non-stationary. The non-stationarities in the signal may come from many different sources, for instance, electrode artefacts, muscular activity or changes of task involvement, and often deteriorate classification performance. This is mainly because features extracted by standard methods like common spatial patterns (CSP) are not invariant to variations of the signal properties and thus change over time. Although many extensions of CSP were proposed to, for example, reduce the sensitivity to noise or incorporate information from other subjects, none of them tackles the non-stationarity problem directly. In this paper, we propose a method which regularizes CSP towards stationary subspaces (sCSP) and show that this increases classification accuracy, especially for subjects who are hardly able to control a BCI. We compare our method with state-of-the-art approaches on different datasets, show competitive results and analyse the reasons for the improvement.
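The idea can be summarized as solving the usual CSP generalized eigenvalue problem with an additional penalty term that discourages non-stationary directions. Below is a minimal sketch of such a regularized CSP step, assuming the penalty matrix has already been estimated (e.g., from deviations of per-trial covariances from the class means); the exact penalty used in sCSP may differ.

```python
import numpy as np
from scipy.linalg import eigh

def csp_stationary(C1, C2, penalty, lam=0.1, n_filters=3):
    """C1, C2   : average class covariance matrices (channels x channels)
    penalty  : symmetric PSD matrix penalizing non-stationary directions
    lam      : trade-off between discriminativity and stationarity
    Filters maximizing class-1 variance come from the generalized eigenvalue
    problem C1 w = mu (C1 + C2 + lam * penalty) w; class-2 filters follow by
    swapping C1 and C2."""
    vals, vecs = eigh(C1, C1 + C2 + lam * penalty)
    order = np.argsort(vals)[::-1]
    return vecs[:, order[:n_filters]]
```

Features for classification would then typically be the log-variances of the spatially filtered trials, fed into a linear classifier.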

220 citations


01 Jan 2012
TL;DR: In this paper, the authors identify some of the reasons for the seeming contradiction between the near-perfect classification accuracy achievable from EMG and the limited clinical and commercial impact of myoelectric control, and raise awareness for the necessity of additional parallel research efforts toward issues whose importance for practical implementations has been underestimated.
Abstract: SUMMARY AND CONCLUSIONS: Myoelectric control has a great potential for improving the quality of life of persons with limb deficiency. However, despite the tremendous success in obtaining almost perfect classification accuracy from EMG, its clinical and commercial impact is still limited. We have identified some of the reasons that we believe are relevant for explaining this seeming contradiction. The majority of current pattern classification methods do not provide simultaneous and proportional control, are not implemented with sensory feedback, do not adapt to the changes in EMG signal characteristics, and do not integrate other sensor modalities to allow complex actions. These problems hinder the possibility of using such a paradigm in applications that aim at clinical and commercial use. Academic research has focused in the past decades on refining classification accuracy and has relegated to secondary importance the aspects outlined in this article. As such, a gap between the academic and the industrial state of the art has been generated unnecessarily. This gap could be filled by addressing the specific needs of intuitive myoelectric control and system robustness. With this position, we are not questioning the need of further research within pattern classification of EMG. Indeed, three of the four demands that we have identified can be implemented within a pattern classification paradigm. Rather, our intention is to raise the awareness for the necessity of additional parallel research efforts toward issues whose importance for practical implementations has been underestimated.

164 citations


Journal ArticleDOI
TL;DR: An approach to the direct measurement of perception of video quality change using electroencephalography (EEG) is presented, suggesting that abrupt changes of videoquality give rise to specific components in the EEG that can be detected on a single-trial basis.
Abstract: An approach to the direct measurement of perception of video quality change using electroencephalography (EEG) is presented. Subjects viewed 8-s video clips while their brain activity was registered using EEG. The video signal was either uncompressed at full length or changed from uncompressed to a lower quality level at a random time point. The distortions were introduced by a hybrid video codec. Subjects had to indicate whether they had perceived a quality change. In response to a quality change, a positive voltage change in EEG (the so-called P3 component) was observed at a latency of about 400-600 ms for all subjects. The voltage change positively correlated with the magnitude of the video quality change, substantiating the P3 component as a graded neural index of the perception of video quality change within the presented paradigm. By applying machine learning techniques, we could classify on a single-trial basis whether a subject perceived a quality change. Interestingly, some video clips wherein changes were missed (i.e., not reported) by the subject were classified as quality changes, suggesting that the brain detected a change, although the subject did not press a button. In conclusion, abrupt changes of video quality give rise to specific components in the EEG that can be detected on a single-trial basis. Potentially, a neurotechnological approach to video assessment could lead to a more objective quantification of quality change detection, overcoming the limitations of subjective approaches (such as subjective bias and the requirement of an overt response). Furthermore, it allows for real-time applications wherein the brain response to a video clip is monitored while it is being viewed.

134 citations


Proceedings Article
03 Dec 2012
TL;DR: This paper adopts a learning-from-scratch approach where quantum-mechanical molecular energies are predicted directly from the raw molecular geometry, and suggests a benefit from setting flexible priors and enforcing invariance stochastically rather than structurally.
Abstract: The accurate prediction of molecular energetics in chemical compound space is a crucial ingredient for rational compound design. The inherently graph-like, non-vectorial nature of molecular data gives rise to a unique and difficult machine learning problem. In this paper, we adopt a learning-from-scratch approach where quantum-mechanical molecular energies are predicted directly from the raw molecular geometry. The study suggests a benefit from setting flexible priors and enforcing invariance stochastically rather than structurally. Our results improve the state-of-the-art by a factor of almost three, bringing statistical methods one step closer to chemical accuracy.
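"Enforcing invariance stochastically" can be illustrated by generating several randomly permuted variants of a Coulomb-matrix descriptor per molecule and training on all of them, so that the learner treats atom indexing as noise rather than structure. The sketch below is an illustration under assumptions; the noise model and parameters are not the paper's exact recipe.

```python
import numpy as np

def random_coulomb_variants(M, n_variants=10, noise=1.0, seed=None):
    """Draw several row/column permutations of the Coulomb matrix M that are
    'almost sorted' by row norm (sorting after adding noise to the norms).
    Training a regressor on all variants enforces permutation invariance
    stochastically rather than by fixing one canonical ordering."""
    rng = np.random.default_rng(seed)
    row_norms = np.linalg.norm(M, axis=1)
    variants = []
    for _ in range(n_variants):
        noisy = row_norms + noise * rng.standard_normal(len(row_norms))
        p = np.argsort(-noisy)          # random but norm-biased atom ordering
        variants.append(M[p][:, p])
    return variants
```

Each variant is paired with the same target energy, effectively augmenting the training set with equivalent representations of the molecule.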

122 citations


Journal ArticleDOI
TL;DR: A method for optimizing transition state theory dividing surfaces with support vector machines and it is shown that the machine-learned surfaces significantly increase the transmission coefficient for an adatom exchange involving many coupled degrees of freedom on a (100) surface when compared to a distance-based dividing surface.
Abstract: We present a method for optimizing transition state theory dividing surfaces with support vector machines. The resulting dividing surfaces require no a priori information or intuition about reaction mechanisms. To generate optimal dividing surfaces, we apply a cycle of machine-learning and refinement of the surface by molecular dynamics sampling. We demonstrate that the machine-learned surfaces contain the relevant low-energy saddle points. The mechanisms of reactions may be extracted from the machine-learned surfaces in order to identify unexpected chemically relevant processes. Furthermore, we show that the machine-learned surfaces significantly increase the transmission coefficient for an adatom exchange involving many coupled degrees of freedom on a (100) surface when compared to a distance-based dividing surface.
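Conceptually, the dividing surface is the decision boundary of a classifier trained to separate reactant-basin from product-basin configurations. A minimal sketch with a kernel SVM follows; the iterative refinement by molecular dynamics sampling described in the abstract is only indicated in a comment, and the hyperparameters are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def learn_dividing_surface(reactant_configs, product_configs, gamma=1.0, C=10.0):
    """Fit a nonlinear SVM separating reactant-basin from product-basin
    configurations; the zero level set of its decision function plays the role
    of the dividing surface. In the paper this is refined in a cycle with
    molecular-dynamics sampling near the current surface."""
    X = np.vstack([reactant_configs, product_configs])
    y = np.r_[np.zeros(len(reactant_configs)), np.ones(len(product_configs))]
    return SVC(kernel="rbf", gamma=gamma, C=C).fit(X, y)

def surface_value(clf, configs):
    """surface_value == 0 defines the dividing surface; its sign gives the basin."""
    return clf.decision_function(configs)
```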

101 citations


Journal ArticleDOI
TL;DR: Optimal spatial filters that enhance separation properties of EMG signals are investigated and cross-validation results show a significant improvement in performance and a higher robustness against noise than commonly used pattern recognition methods.
Abstract: Pattern recognition techniques have been applied to extract information from electromyographic (EMG) signals that can be used to control electrical powered hand prostheses. In this paper, optimized spatial filters that enhance separation properties of EMG signals are investigated. In particular, different multiclass extensions of the common spatial patterns algorithm are applied to high-density surface EMG signals acquired from the forearms of ten healthy subjects. Visualization of the obtained filter coefficients provides insight into the physiology of the muscles related to the performed contractions. The CSP methods are compared with a commonly used pattern recognition approach in a six-class classification task. Cross-validation results show a significant improvement in performance and a higher robustness against noise than commonly used pattern recognition methods.

Book ChapterDOI
TL;DR: This chapter presents the "centering trick" that consists of rewriting the energy of the system as a function of centered states, which improves the conditioning of the underlying optimization problem and makes learning more stable, leading to models with better generative and discriminative properties.
Abstract: Deep Boltzmann machines are in principle powerful models for extracting the hierarchical structure of data. Unfortunately, attempts to train layers jointly (without greedy layer-wise pretraining) have been largely unsuccessful. We propose a modification of the learning algorithm that initially recenters the output of the activation functions to zero. This modification leads to a better conditioned Hessian and thus makes learning easier. We test the algorithm on real data and demonstrate that our suggestion, the centered deep Boltzmann machine, learns a hierarchy of increasingly abstract representations and a better generative model of data.

Book ChapterDOI
01 Jan 2012
TL;DR: The centering trick as mentioned in this paper improves the conditioning of the underlying optimization problem and makes learning more stable, leading to models with better generative and discriminative properties for deep Boltzmann machines.
Abstract: Deep Boltzmann machines are in theory capable of learning efficient representations of seemingly complex data. Designing an algorithm that effectively learns the data representation can be subject to multiple difficulties. In this chapter, we present the “centering trick” that consists of rewriting the energy of the system as a function of centered states. The centering trick improves the conditioning of the underlying optimization problem and makes learning more stable, leading to models with better generative and discriminative properties.
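A minimal sketch of the centering idea for the simpler case of a restricted Boltzmann machine: states enter the energy, and hence the gradients, only through centered terms (v - beta) and (h - gamma), where beta and gamma are offsets such as running means of the unit activities. The CD-1 procedure, offsets and learning rate below are illustrative; the chapter develops the trick for deep Boltzmann machines.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def centered_cd1_step(v0, W, a, b, beta, gamma, lr=0.01, seed=None):
    """One contrastive-divergence update for a centered RBM.
    v0: (n, n_visible) batch; W: (n_visible, n_hidden); a, b: biases;
    beta, gamma: visible/hidden offsets."""
    rng = np.random.default_rng(seed)
    # positive phase
    ph0 = sigmoid((v0 - beta) @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # negative phase: one Gibbs step
    pv1 = sigmoid((h0 - gamma) @ W.T + a)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid((v1 - beta) @ W + b)
    # gradients computed on centered states
    dW = (v0 - beta).T @ (ph0 - gamma) - (v1 - beta).T @ (ph1 - gamma)
    da = (v0 - v1).sum(axis=0)
    db = (ph0 - ph1).sum(axis=0)
    n = v0.shape[0]
    return W + lr * dW / n, a + lr * da / n, b + lr * db / n
```

Centering the states in this way removes large mean-driven terms from the weight gradient, which is the better-conditioned optimization problem the chapter refers to.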

Journal ArticleDOI
TL;DR: Empirical results show that the local SVM formulation can effectively exploit the taxonomy structure and thus outperforms standard multi-class classification algorithms while it achieves on par results with taxonomy-based structured algorithms at a significantly decreased computing time.
Abstract: We study the problem of classifying images into a given, pre-determined taxonomy. This task can be elegantly translated into the structured learning framework. However, despite its power, structured learning has known limits in scalability due to its high memory requirements and slow training process. We propose an efficient approximation of the structured learning approach by an ensemble of local support vector machines (SVMs) that can be trained efficiently with standard techniques. A first theoretical discussion and experiments on toy-data allow to shed light onto why taxonomy-based classification can outperform taxonomy-free approaches and why an appropriately combined ensemble of local SVMs might be of high practical use. Further empirical results on subsets of Caltech256 and VOC2006 data indeed show that our local SVM formulation can effectively exploit the taxonomy structure and thus outperforms standard multi-class classification algorithms while it achieves on par results with taxonomy-based structured algorithms at a significantly decreased computing time.
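One simple way to realize an "ensemble of local SVMs" over a taxonomy is to train one standard SVM per inner node that decides among its children and to route test images from the root to a leaf. The sketch below follows this simplified scheme under assumed data structures (`taxonomy`, `leaves_under`); the paper's exact construction and score combination differ in detail.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_local_svms(X, labels, taxonomy, leaves_under):
    """taxonomy   : dict mapping each inner node to the list of its children
    leaves_under: dict mapping any node to the set of leaf classes in its subtree
    Trains, for every inner node, one SVM over its children using only the
    training images whose class lies below that node."""
    local = {}
    for node, children in taxonomy.items():
        mask = np.isin(labels, list(leaves_under[node]))
        child_of = {leaf: c for c in children for leaf in leaves_under[c]}
        y_local = np.array([children.index(child_of[l]) for l in labels[mask]])
        local[node] = LinearSVC().fit(X[mask], y_local)
    return local

def classify(x, local, taxonomy, root):
    """Route a single test sample from the root to a leaf via the local decisions."""
    node = root
    while node in taxonomy:
        node = taxonomy[node][local[node].predict(x[None, :])[0]]
    return node
```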

Proceedings ArticleDOI
12 Nov 2012
TL;DR: This pilot study shows that robust control is also possible with conventional linear regression if EMG power measures are available for a large number of electrodes and it is possible to linearize the problem with simple nonlinear transformations of band-pass power.
Abstract: Previous approaches for extracting real-time proportional control information simultaneously for multiple degrees of freedom (DoF) from the electromyogram (EMG) often used non-linear methods such as the multilayer perceptron (MLP). In this pilot study we show that robust control is also possible with conventional linear regression if EMG power measures are available for a large number of electrodes. In particular, we show that it is possible to linearize the problem with simple nonlinear transformations of band-pass power. Because of its simplicity the method scales well to high dimensions, is easily regularized when insufficient training data are available, and is particularly well suited for real-time control as well as on-line optimization.
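A minimal sketch of this kind of pipeline under stated assumptions: band-pass power per electrode is passed through a simple nonlinear transform (here the logarithm) and mapped to per-DoF control signals by regularized linear regression. The transform and the regularization constant are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def bandpower_features(emg_windows, eps=1e-12):
    """emg_windows: (n_trials, n_channels, n_samples) band-pass filtered EMG.
    Returns log band power per channel, a simple 'linearizing' transform."""
    power = np.mean(emg_windows ** 2, axis=-1)
    return np.log(power + eps)

def ridge_fit(X, Y, lam=1.0):
    """Linear map from features X to per-DoF control targets Y
    (bias column is regularized too, for simplicity)."""
    X1 = np.hstack([X, np.ones((len(X), 1))])
    return np.linalg.solve(X1.T @ X1 + lam * np.eye(X1.shape[1]), X1.T @ Y)

def ridge_predict(W, X):
    return np.hstack([X, np.ones((len(X), 1))]) @ W
```

Because prediction is a single matrix product, the approach is cheap enough for real-time control and the regularizer `lam` handles small training sets.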

Proceedings ArticleDOI
12 Nov 2012
TL;DR: It is shown that learning in a discriminative and stationary subspace is advantageous for BCI application and outperforms the standard SSA method.
Abstract: The non-stationary nature of neurophysiological measurements, e.g. EEG, makes the classification of motion intentions a demanding task. Variations in the underlying brain processes often lead to significant and unexpected changes in the feature distribution, resulting in decreased classification accuracy in Brain Computer Interfacing (BCI). Several methods were developed to tackle this problem by either adapting to these changes or extracting features that are invariant. Recently, a method called Stationary Subspace Analysis (SSA) was proposed and applied to BCI data. It diminishes the influence of non-stationary changes as learning and classification are performed in a stationary subspace of the data which can be extracted by SSA. In this paper we extend this method in two ways. First, we propose a variant of SSA that allows extracting stationary subspaces from labeled data without disregarding class-related variations or treating class differences as non-stationarities. Second, we propose a discriminant variant of SSA that trades off stationarity and discriminativity, thus allowing to extract stationary subspaces without losing relevant information. We show that learning in a discriminative and stationary subspace is advantageous for BCI application and outperforms the standard SSA method.

Journal ArticleDOI
TL;DR: Results show that abandoning the spatiotemporal separability assumption consistently improves the decoding accuracy of neural signals from fMRI data; the findings are compared with results from optical imaging and fMRI studies.

Proceedings Article
21 Mar 2012
TL;DR: It is argued that the emerging feature hierarchy is still explicit enough to be traversed in a feedforward fashion and produces a feed-forward hierarchy of increasingly invariant representations that clearly surpasses the layer-wise approach.
Abstract: The deep Boltzmann machine is a powerful model that extracts the hierarchical structure of observed data. While inference is typically slow due to its undirected nature, we argue that the emerging feature hierarchy is still explicit enough to be traversed in a feedforward fashion. The claim is corroborated by training a set of deep neural networks on real data and measuring the evolution of the representation layer after layer. The analysis reveals that the deep Boltzmann machine produces a feed-forward hierarchy of increasingly invariant representations that clearly surpasses the layer-wise approach.

Patent
14 Sep 2012
TL;DR: In this patent, a method is proposed for the automatic analysis of an image (1, 11, 12, 13) of a biological sample with respect to pathological relevance, using a bag-of-visual-words approach.

Abstract: Method for the automatic analysis of an image (1, 11, 12, 13) of a biological sample with respect to a pathological relevance, wherein a) local features of the image (1, 11, 12, 13) are aggregated to a global feature of the image (1, 11, 12, 13) using a bag-of-visual-words approach, b) step a) is repeated at least two times using different methods, resulting in at least two bag-of-words feature datasets, c) at least two similarity measures are computed using the bag-of-words features obtained from a training image dataset and the bag-of-words features from the image (1, 11, 12, 13), d) the image training dataset comprising a set of visual words, classifier parameters, including kernel weights, and bag-of-words features from the training images, e) the computation of the at least two similarity measures is subject to an adaptive computation of kernel normalization parameters and/or kernel width parameters, f) for each image (1, 11, 12, 13) one score is computed depending on the classifier parameters, the kernel weights and the at least two similarity measures, the score being a measure of the certainty of one pathological category compared to the image training dataset, g) for each pixel of the image (1, 11, 12, 13) a pixel-wise score is computed using the classifier parameters, the kernel weights, the at least two similarity measures, the bag-of-words features of the image (1, 11, 12, 13), all the local features used in the computation of the bag-of-words features of the image (1, 11, 12, 13) and the pixels used in the computation of the local features, h) the pixel-wise score is stored as a heatmap dataset linking the pixels of the image (1, 11, 12, 13) to the pixel-wise scores.
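A much-simplified sketch of the idea behind the claim, assuming a single linear classifier on one bag-of-words histogram instead of the patent's adaptively normalized multi-kernel combination: each local feature contributes its visual word's weight to the image score, and that contribution is redistributed to the feature's pixels to form the heatmap.

```python
import numpy as np

def bow_histogram(descriptors, vocabulary):
    """Nearest-visual-word assignment and normalized bag-of-words histogram."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return words, hist / max(words.size, 1)

def score_and_heatmap(descriptors, patch_pixels, vocabulary, w, bias, image_shape):
    """Image-level score and pixel-wise heatmap for a linear classifier on the
    BoW histogram (score = w . hist + bias). Each local feature contributes
    w[word]/n to the score; that contribution is spread over the pixels of its
    patch. patch_pixels[i] is the list of (row, col) pixels of descriptor i."""
    words, hist = bow_histogram(descriptors, vocabulary)
    score = float(w @ hist + bias)
    heat = np.zeros(image_shape)
    n = max(words.size, 1)
    for word, pixels in zip(words, patch_pixels):
        contribution = w[word] / n
        for (r, c) in pixels:
            heat[r, c] += contribution / len(pixels)
    return score, heat
```

With a linear kernel the decomposition is exact; the patent's non-linear kernels and learned kernel weights make the pixel-wise attribution more involved but follow the same principle.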

Journal ArticleDOI
30 Oct 2012-PLOS ONE
TL;DR: A novel interpretation of the dual of support vector machines (SVMs) in terms of scatter with respect to class prototypes and their mean is provided, providing a new joint Scatter SVM algorithm, at the level of its binary counterpart in the number of optimization variables.
Abstract: We provide a novel interpretation of the dual of support vector machines (SVMs) in terms of scatter with respect to class prototypes and their mean. As a key contribution, we extend this framework to multiple classes, providing a new joint Scatter SVM algorithm, at the level of its binary counterpart in the number of optimization variables. This enables us to implement computationally efficient solvers based on sequential minimal and chunking optimization. As a further contribution, the primal problem formulation is developed in terms of regularized risk minimization and the hinge loss, revealing the score function to be used in the actual classification of test patterns. We investigate Scatter SVM properties related to generalization ability, computational efficiency, sparsity and sensitivity maps, and report promising results.

Proceedings Article
21 Mar 2012
TL;DR: In this article, the authors propose a method called ideal regression for approximating an arbitrary system of polynomial equations by a system of a particular type using techniques from approximate computational algebraic geometry.
Abstract: We propose a method called ideal regression for approximating an arbitrary system of polynomial equations by a system of a particular type. Using techniques from approximate computational algebraic geometry, we show how we can solve ideal regression directly without resorting to numerical optimization. Ideal regression is useful whenever the solution to a learning problem can be described by a system of polynomial equations. As an example, we demonstrate how to formulate Stationary Subspace Analysis (SSA), a source separation problem, in terms of ideal regression, which also yields a consistent estimator for SSA. We then compare this estimator in simulations with previous optimization-based approaches for SSA.

Journal ArticleDOI
TL;DR: It is shown that by treating the cumulants as elements of the polynomial ring the authors can directly solve the unsupervised learning problem of finding the subspace on which several probability distributions agree, at a lower computational cost and with higher accuracy.
Abstract: We propose a novel algebraic algorithmic framework for dealing with probability distributions represented by their cumulants such as the mean and covariance matrix. As an example, we consider the unsupervised learning problem of finding the subspace on which several probability distributions agree. Instead of minimizing an objective function involving the estimated cumulants, we show that by treating the cumulants as elements of the polynomial ring we can directly solve the problem, at a lower computational cost and with higher accuracy. Moreover, the algebraic viewpoint on probability distributions allows us to invoke the theory of algebraic geometry, which we demonstrate in a compact proof for an identifiability criterion.

Journal ArticleDOI
24 Aug 2012-PLOS ONE
TL;DR: A recently developed non-sparse MKL variant is applied to state-of-the-art concept recognition tasks from the application domain of computer vision and compared against its direct competitors, the sum-kernel SVM and sparse MKL.
Abstract: Combining information from various image features has become a standard technique in concept recognition tasks. However, the optimal way of fusing the resulting kernel functions is usually unknown in practical applications. Multiple kernel learning (MKL) techniques allow one to determine an optimal linear combination of such similarity matrices. Classical approaches to MKL promote sparse mixtures. Unfortunately, 1-norm regularized MKL variants are often observed to be outperformed by an unweighted sum kernel. The main contributions of this paper are the following: we apply a recently developed non-sparse MKL variant to state-of-the-art concept recognition tasks from the application domain of computer vision. We provide insights on benefits and limits of non-sparse MKL and compare it against its direct competitors, the sum-kernel SVM and sparse MKL. We report empirical results for the PASCAL VOC 2009 Classification and ImageCLEF2010 Photo Annotation challenge data sets. Data sets (kernel matrices) as well as further information are available at http://doc.ml.tu-berlin.de/image_mkl/ (accessed 2012 Jun 25).
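For illustration, the sketch below implements a simple alternating scheme for non-sparse (lp-norm) MKL on precomputed kernel matrices: train an SVM on the weighted kernel sum, then update the weights from the per-kernel margin terms. Normalization details, stopping criteria and the exact solver are simplified relative to the toolboxes used in such experiments.

```python
import numpy as np
from sklearn.svm import SVC

def lp_mkl(kernels, y, p=2.0, C=1.0, n_iter=20):
    """kernels: list of (n, n) precomputed kernel matrices; y: labels in {-1, +1}.
    Alternates between an SVM on the combined kernel and a closed-form-style
    update of the kernel weights beta with ||beta||_p = 1."""
    M, n = len(kernels), len(y)
    beta = np.full(M, (1.0 / M) ** (1.0 / p))          # uniform, ||beta||_p = 1
    for _ in range(n_iter):
        K = sum(b * Km for b, Km in zip(beta, kernels))
        svm = SVC(C=C, kernel="precomputed").fit(K, y)
        a = np.zeros(n)
        a[svm.support_] = np.abs(svm.dual_coef_[0])    # alpha_i (dual_coef_ = alpha_i * y_i)
        ay = a * y
        norms = np.array([beta[m] ** 2 * ay @ kernels[m] @ ay for m in range(M)])
        beta = norms ** (1.0 / (p + 1))
        beta /= (beta ** p).sum() ** (1.0 / p)         # project back onto ||beta||_p = 1
    K = sum(b * Km for b, Km in zip(beta, kernels))
    return beta, SVC(C=C, kernel="precomputed").fit(K, y)
```

Large p pushes the weights towards the unweighted sum kernel, small p towards sparse mixtures, which is the spectrum the paper investigates empirically.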


Book ChapterDOI
01 Jan 2012
TL;DR: In this paper, the authors introduce basic concepts and ideas of the Support Vector Machines (SVM) and formulate the learning problem in a statistical framework, where a special focus is put on the concept of consistency, which leads to the principle of structural risk minimization.
Abstract: In this chapter we introduce basic concepts and ideas of the Support Vector Machines (SVM). In the first section we formulate the learning problem in a statistical framework. A special focus is put on the concept of consistency, which leads to the principle of structural risk minimization (SRM). Application of these ideas to classification problems brings us to the basic, linear formulation of the SVM, described in Sect. 30.3. We then introduce the so-called “kernel trick” as a tool for building a non-linear SVM as well as applying an SVM to non-vectorial data (Sect. 30.4). The practical issues of implementation of the SVM training algorithms and the related optimization problems are the topic of Sect. 30.5. Extensions of the SVM algorithms for the problems of non-linear regression and novelty detection are presented in Sect. 30.6. A brief description of the most successful applications of the SVM is given in Sect. 30.7. Finally, in the last Sect. 30.8 we summarize the main ideas of the chapter.
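The "kernel trick" mentioned above can be stated in one sentence: any algorithm that accesses the data only through inner products can be made non-linear by replacing those inner products with a kernel function. A small illustrative example on toy data with arbitrary hyperparameters:

```python
import numpy as np
from sklearn.svm import SVC

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian (RBF) kernel as a drop-in replacement for inner products."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Toy data with a circular class boundary: not linearly separable in input space,
# but an SVM trained on the precomputed RBF kernel separates it.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = np.where(np.linalg.norm(X, axis=1) < 1.0, 1, -1)
clf = SVC(kernel="precomputed", C=1.0).fit(rbf_kernel(X, X), y)
train_acc = (clf.predict(rbf_kernel(X, X)) == y).mean()
print(f"training accuracy: {train_acc:.2f}")
```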

Proceedings ArticleDOI
12 Nov 2012
TL;DR: The novel Common Spatial Patterns Patches (CSPP) technique is proposed as a good candidate to improve the co-adaptive calibration of BCI systems, and the evaluation of CSPP in online operation is presented for the first time.
Abstract: Brain-Computer Interfaces (BCIs) based on the voluntary modulation of sensorimotor rhythms (SMRs) induced by motor imagery are very prominent because they allow continuous control of the external device. Nevertheless, the design of an SMR-based BCI system that provides every user with reliable BCI control from the first session, i.e., without extensive training, is still a big challenge. Considerable advances in this direction have been made by the machine learning co-adaptive calibration approach, which combines online adaptation techniques with subject learning in order to offer the user feedback from the beginning of the experiment. Recently, based on offline analyses, we proposed the novel Common Spatial Patterns Patches (CSPP) technique as a good candidate to improve co-adaptive calibration. CSPP is an ensemble of localized spatial filters, each of them optimized on subject-specific data by CSP analysis. Here, the evaluation of CSPP in online operation is presented for the first time. Results on three BCI-naive participants are indeed promising: all three users reached the threshold criterion of 70% accuracy within one session, even one candidate for whom the weak SMR at rest predicted deficient BCI control. Concurrent recordings of the SMR during a relax condition as well as the course of BCI performance indicate a clear learning effect.

Proceedings ArticleDOI
01 Jan 2012
TL;DR: This study shows how healthy subjects are able to use a non-invasive Motor Imagery-based Brain Computer Interface (BCI) to achieve linear control of an upper-limb neuromuscular electrical stimulation (NMES) controlled neuroprosthesis in a simple binary target selection task.
Abstract: In this study we show how healthy subjects are able to use a non-invasive Motor Imagery (MI)-based Brain Computer Interface (BCI) to achieve linear control of an upper-limb neuromuscular electrical stimulation (NMES) controlled neuroprosthesis in a simple binary target selection task. Linear BCI control can be achieved if two motor imagery classes can be discriminated with a reliability of over 80% in single trials. The results presented in this work show that there was no significant loss of performance using the neuroprosthesis in comparison to MI where no stimulation was present. However, it is remarkable how different the experience of the users was in the same experiment: the stimulation either provoked a positive reinforcement feedback, or prevented the user from concentrating on the task.

Book ChapterDOI
01 Jan 2012
TL;DR: In many cases, the amount of labeled data is limited and does not allow for fully identifying the function that needs to be learned, so the learning algorithm starts to "invent" nonexistent regularities while at the same time not being able to model the true ones.
Abstract: In many cases, the amount of labeled data is limited and does not allow for fully identifying the function that needs to be learned. When labeled data is scarce, the learning algorithm is exposed to simultaneous underfitting and overfitting. The learning algorithm starts to “invent” nonexistent regularities (overfitting) while at the same time not being able to model the true ones (underfitting). In the extreme case, this amounts to perfectly memorizing training data and not being able to generalize at all to new data.
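The trade-off described above can be made concrete with a toy experiment: fit polynomials of increasing degree to a handful of noisy samples of a sine function and compare training and test error. The degrees and noise level below are arbitrary illustrative choices.

```python
import numpy as np

# With only 8 labeled points, a very flexible model memorizes the training data
# (overfitting: near-zero training error, large test error), while a too-rigid
# model cannot capture the true function (underfitting: both errors large).
rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 8))
y_train = np.sin(2 * np.pi * x_train) + 0.2 * rng.standard_normal(8)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (0, 3, 7):                      # rigid, reasonable, very flexible
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```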