
Showing papers by "Klaus-Robert Müller published in 2013"


Journal ArticleDOI
TL;DR: A number of established machine learning techniques are outlined and the influence of the molecular representation on their performance is investigated; the best methods achieve prediction errors of 3 kcal/mol for the atomization energies of a wide variety of molecules.
Abstract: The accurate and reliable prediction of properties of molecules typically requires computationally intensive quantum-chemical calculations. Recently, machine learning techniques applied to ab initio calculations have been proposed as an efficient approach for describing the energies of molecules in their given ground-state structure throughout chemical compound space (Rupp et al. Phys. Rev. Lett. 2012, 108, 058301). In this paper we outline a number of established machine learning techniques and investigate the influence of the molecular representation on the methods' performance. The best methods achieve prediction errors of 3 kcal/mol for the atomization energies of a wide variety of molecules. Rationales for this performance improvement are given together with pitfalls and challenges when applying machine learning approaches to the prediction of quantum-mechanical observables.

584 citations
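For readers who want to see how such kernel-based property prediction is typically set up, the following sketch runs kernel ridge regression on a Coulomb-matrix-style descriptor. Everything here (the toy molecules, the target values, the hyperparameters) is an illustrative stand-in, not the models or data benchmarked in the paper.

```python
# Sketch: kernel ridge regression on a Coulomb-matrix-like descriptor.
# All molecules and energies below are synthetic; hyperparameters are illustrative.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def coulomb_matrix(charges, coords):
    """Coulomb-matrix descriptor: Z_i * Z_j / |R_i - R_j| off-diagonal,
    0.5 * Z_i**2.4 on the diagonal (as in Rupp et al. 2012)."""
    n = len(charges)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                M[i, j] = 0.5 * charges[i] ** 2.4
            else:
                M[i, j] = charges[i] * charges[j] / np.linalg.norm(coords[i] - coords[j])
    return M[np.triu_indices(n)]  # flatten the upper triangle into a feature vector

# Fake "molecules": 200 random five-atom systems with made-up energies.
X, y = [], []
for _ in range(200):
    Z = rng.integers(1, 9, size=5).astype(float)
    R = rng.normal(scale=2.0, size=(5, 3))
    X.append(coulomb_matrix(Z, R))
    y.append(-Z.sum() + 0.1 * rng.normal())  # placeholder target, not physics
X, y = np.array(X), np.array(y)

# Gaussian-kernel ridge regression, one of the baselines such studies compare.
model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=1e-4)
print(cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error"))
```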


Journal ArticleDOI
TL;DR: In this article, a deep multi-task artificial neural network is used to predict multiple electronic ground and excited-state properties, such as atomization energy, polarizability, frontier orbital eigenvalues, ionization potential, electron affinity and excitation energies.
Abstract: The combination of modern scientific computing with electronic structure theory can lead to an unprecedented amount of data amenable to intelligent data analysis for the identification of meaningful, novel and predictive structure-property relationships. Such relationships enable high-throughput screening for relevant properties in an exponentially growing pool of virtual compounds that are synthetically accessible. Here, we present a machine learning model, trained on a database of ab initio calculation results for thousands of organic molecules, that simultaneously predicts multiple electronic ground- and excited-state properties. The properties include atomization energy, polarizability, frontier orbital eigenvalues, ionization potential, electron affinity and excitation energies. The machine learning model is based on a deep multi-task artificial neural network, exploiting the underlying correlations between various molecular properties. The input is identical to ab initio methods, i.e. nuclear charges and Cartesian coordinates of all atoms. For small organic molecules, the accuracy of such a 'quantum machine' is similar, and sometimes superior, to modern quantum-chemical methods, at negligible computational cost.

488 citations
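As a rough illustration of the multi-task idea (a shared hidden representation feeding several property outputs), the sketch below fits a small multi-output network to synthetic data. It is a didactic stand-in, far smaller than the deep network described above, and none of the data or settings come from the paper.

```python
# Sketch: a small multi-output neural network as a stand-in for the deep
# multi-task model described above. All data below is synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_mol, n_feat, n_props = 500, 30, 5            # pretend: 5 properties per molecule

X = rng.normal(size=(n_mol, n_feat))           # placeholder molecular descriptors
W = rng.normal(size=(n_feat, n_props))
Y = np.tanh(X @ W) + 0.05 * rng.normal(size=(n_mol, n_props))   # correlated targets

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, random_state=0)

# One shared hidden representation, several outputs: the multi-task idea.
net = make_pipeline(
    StandardScaler(),
    MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0),
)
net.fit(X_tr, Y_tr)
print("average test R^2 across properties:", net.score(X_te, Y_te))
```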


Journal ArticleDOI
TL;DR: In this paper, a deep multi-task artificial neural network is used to predict multiple electronic ground-and excited-state properties, such as atomization energy, polarizability, frontier orbital eigenvalues, ionization potential, electron affinity, and excitation energies.
Abstract: The combination of modern scientific computing with electronic structure theory can lead to an unprecedented amount of data amenable to intelligent data analysis for the identification of meaningful, novel, and predictive structure-property relationships. Such relationships enable high-throughput screening for relevant properties in an exponentially growing pool of virtual compounds that are synthetically accessible. Here, we present a machine learning (ML) model, trained on a database of ab initio calculation results for thousands of organic molecules, that simultaneously predicts multiple electronic ground- and excited-state properties. The properties include atomization energy, polarizability, frontier orbital eigenvalues, ionization potential, electron affinity, and excitation energies. The ML model is based on a deep multi-task artificial neural network, exploiting underlying correlations between various molecular properties. The input is identical to ab initio methods, i.e. nuclear charges and Cartesian coordinates of all atoms. For small organic molecules the accuracy of such a "Quantum Machine" is similar, and sometimes superior, to modern quantum-chemical methods, at negligible computational cost.

456 citations


Journal ArticleDOI
TL;DR: This paper describes an extension of the high efficiency video coding (HEVC) standard for coding of multi-view video and depth data, and develops and integrates a novel encoder control that guarantees that high-quality intermediate views can be generated from the decoded data.
Abstract: This paper describes an extension of the high efficiency video coding (HEVC) standard for coding of multi-view video and depth data. In addition to the known concept of disparity-compensated prediction, inter-view motion parameter prediction and inter-view residual prediction for coding of the dependent video views are developed and integrated. Furthermore, for depth coding, new intra coding modes, a modified motion compensation and motion vector coding as well as the concept of motion parameter inheritance are part of the HEVC extension. A novel encoder control uses view synthesis optimization, which guarantees that high-quality intermediate views can be generated based on the decoded data. The bitstream format supports the extraction of partial bitstreams, so that conventional 2D video, stereo video, and the full multi-view video plus depth format can be decoded from a single bitstream. Objective and subjective results are presented, demonstrating that the proposed approach provides 50% bit rate savings in comparison with HEVC simulcast and 20% in comparison with a straightforward multi-view extension of HEVC without the newly developed coding tools.

365 citations


Journal ArticleDOI
TL;DR: A theoretical framework is developed to characterize artifacts of volume conduction, which may still be present even in reconstructed source time series as zero-lag correlations, and to distinguish them from genuine time-delayed brain interactions.

302 citations


Journal ArticleDOI
TL;DR: This novel approach to learning from other subjects aims to reduce the adverse effects of common nonstationarities but does not transfer discriminative information; it not only achieves a significant increase in performance but also allows for a neurophysiologically meaningful interpretation.
Abstract: Compensating changes between a subject's training and testing sessions in brain-computer interfacing (BCI) is challenging but of great importance for a robust BCI operation. We show that such changes are very similar between subjects, and thus can be reliably estimated using data from other users and utilized to construct an invariant feature space. This novel approach to learning from other subjects aims to reduce the adverse effects of common nonstationarities, but does not transfer discriminative information. This is an important conceptual difference to standard multi-subject methods that, e.g., improve the covariance matrix estimation by shrinking it toward the average of other users or construct a global feature space. These methods do not reduce the shift between training and test data and may produce poor results when subjects have very different signal characteristics. In this paper, we compare our approach to two state-of-the-art multi-subject methods on toy data and two datasets of EEG recordings from subjects performing motor imagery. We show that it can not only achieve a significant increase in performance, but also that the extracted change patterns allow for a neurophysiologically meaningful interpretation.

184 citations
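The core idea (estimate the common session-to-session change from other users and remove those directions from the feature space) can be sketched as follows. This is a deliberately simplified illustration with synthetic covariances, not the estimation procedure of the paper.

```python
# Simplified sketch: find the principal directions of the average
# train-to-test covariance change across other subjects, then project the
# target subject's features onto their orthogonal complement.
import numpy as np

rng = np.random.default_rng(0)
n_channels = 20

def session_cov(shift):
    """Toy session covariance; `shift` perturbs one fixed channel direction."""
    d = np.zeros(n_channels)
    d[0] = 1.0                                   # common direction of change
    return (np.eye(n_channels) + shift * np.outer(d, d)
            + 0.01 * np.diag(rng.random(n_channels)))

# Covariance change (test minus train) estimated on several other subjects.
deltas = [session_cov(shift=1.0) - session_cov(shift=0.0) for _ in range(10)]
mean_delta = np.mean(deltas, axis=0)

# Principal directions of the common change.
eigvals, eigvecs = np.linalg.eigh(mean_delta)
change_dirs = eigvecs[:, np.argsort(np.abs(eigvals))[::-1][:2]]   # top two

# Projection removing these directions from the target subject's features.
P = np.eye(n_channels) - change_dirs @ change_dirs.T
X_target = rng.normal(size=(500, n_channels))    # placeholder EEG features
X_invariant = X_target @ P.T
print(P.shape, X_invariant.shape)
```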


Journal ArticleDOI
TL;DR: Using a one-dimensional model, this nonlinear interpolation between Kohn-Sham reference calculations can accurately dissociate a diatomic, be systematically improved with increased reference data and generate accurate self-consistent densities via a projection method that avoids directions with no data.
Abstract: Using a one-dimensional model, we explore the ability of machine learning to approximate the non-interacting kinetic energy density functional of diatomics. This nonlinear interpolation between Kohn-Sham reference calculations can (i) accurately dissociate a diatomic, (ii) be systematically improved with increased reference data and (iii) generate accurate self-consistent densities via a projection method that avoids directions with no data. With relatively few densities, the error due to the interpolation is smaller than typical errors in standard exchange-correlation functionals.

119 citations
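A minimal sketch of the underlying machinery (kernel ridge regression mapping discretized densities to an energy) is given below. The densities and the target curve are synthetic placeholders rather than the Kohn-Sham reference data used in the paper, and no self-consistent projection step is shown.

```python
# Sketch: kernel ridge regression from discretized 1D densities to an
# "energy", in the spirit of learning a density functional. Synthetic data only.
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
grid = np.linspace(-5, 5, 100)
dx = grid[1] - grid[0]

def toy_density(sep):
    """Two Gaussians a distance `sep` apart, normalized to two 'electrons'."""
    rho = np.exp(-(grid - sep / 2) ** 2) + np.exp(-(grid + sep / 2) ** 2)
    return 2 * rho / (rho.sum() * dx)

seps = rng.uniform(0.5, 4.0, size=80)
X = np.array([toy_density(s) for s in seps])     # densities sampled on the grid
y = 1.0 / seps + 0.3 * seps                      # placeholder "energy" curve

model = KernelRidge(kernel="rbf", alpha=1e-6, gamma=0.1).fit(X[:60], y[:60])
print("max abs error on held-out densities:",
      np.max(np.abs(model.predict(X[60:]) - y[60:])))
```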


Journal ArticleDOI
TL;DR: A set of recent methods that can be universally used to make kernel methods more transparent is reported on; among them, relevant dimension estimation (RDE) allows one to assess the underlying complexity and noise structure of a learning problem and thus to distinguish high- and low-noise scenarios of high and low complexity.
Abstract: Over the last decade, nonlinear kernel-based learning methods have been widely used in the sciences and in industry for solving, e.g., classification, regression, and ranking problems. While their users are more than happy with the performance of this powerful technology, there is an emerging need to additionally gain better understanding of both the learning machine and the data analysis problem to be solved. Opening the nonlinear black box, however, is a notoriously difficult challenge. In this review, we report on a set of recent methods that can be universally used to make kernel methods more transparent. In particular, we discuss relevant dimension estimation (RDE), which allows one to assess the underlying complexity and noise structure of a learning problem and thus to distinguish high/low-noise scenarios of high/low complexity, respectively. Moreover, we introduce a novel local technique based on RDE for quantifying the reliability of the learned predictions. Finally, we report on techniques that can explain individual nonlinear predictions. In this manner, our novel methods not only help to gain further knowledge about the nonlinear signal processing problem itself, but they broaden the general usefulness of kernel methods in practical signal processing applications.

47 citations
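The flavor of relevant dimension estimation can be conveyed by projecting the labels onto the eigenbasis of a centered kernel matrix and asking how many leading directions actually carry label information. The sketch below is a didactic simplification, not the RDE estimator of the review.

```python
# Didactic sketch of the RDE idea: decompose the kernel matrix and measure
# how much label "energy" falls on the leading kernel PCA directions.
import numpy as np

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 10))
y = X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)   # few relevant directions

# Centered Gaussian kernel matrix.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / (2 * X.shape[1]))
H = np.eye(n) - np.ones((n, n)) / n
K = H @ K @ H

eigvals, eigvecs = np.linalg.eigh(K)
order = np.argsort(eigvals)[::-1]
contrib = (eigvecs[:, order].T @ y) ** 2        # label energy per kernel direction
cum = np.cumsum(contrib) / contrib.sum()
print("directions needed for 95% of label energy:",
      int(np.searchsorted(cum, 0.95)) + 1)
```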


Journal ArticleDOI
TL;DR: A novel classification approach helps to detect trials with presumably non-conscious processing at the threshold of perception and uncovers a non-trivial confounder between neural hits and neural misses.
Abstract: Objective. Assessing speech quality perception is a challenge typically addressed in behavioral and opinion-seeking experiments. Only recently, neuroimaging methods were introduced, which were used to study the neural processing of quality at group level. However, our electroencephalography (EEG) studies show that the neural correlates of quality perception are highly individual. Therefore, it became necessary to establish dedicated machine learning methods for decoding subject-specific effects. Approach. The effectiveness of our methods is shown by the data of an EEG study that investigates how the quality of spoken vowels is processed neurally. Participants were asked to indicate whether they had perceived a degradation of quality (signal-correlated noise) in vowels, presented in an oddball paradigm. Main results. We find that the P3 amplitude is attenuated with increasing noise. Single-trial analysis allows one to show that this is partly due to an increasing jitter of the P3 component. A novel classification approach helps to detect trials with presumably non-conscious processing at the threshold of perception. We show that this approach uncovers a non-trivial confounder between neural hits and neural misses. Significance. The combined use of EEG signals and machine learning methods results in a significant ‘neural’ gain in sensitivity (in processing quality loss) when compared to standard behavioral evaluation; averaged over 11 subjects, this amounts to a relative improvement in sensitivity of 35%.

45 citations
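For orientation, a generic single-trial ERP classification baseline (shrinkage-regularized LDA on flattened channel-by-time features) looks roughly as follows. The "trials" are synthetic, and the paper's novel classification approach is not reproduced here.

```python
# Generic single-trial ERP baseline: shrinkage-regularized LDA on
# spatio-temporal features, shown on synthetic trials only.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_channels, n_times = 200, 30, 20

X = rng.normal(size=(n_trials, n_channels, n_times))
y = rng.integers(0, 2, size=n_trials)            # e.g. degraded vs. clean stimulus
X[y == 1, :, 10:15] += 0.3                       # toy "P3-like" deflection

features = X.reshape(n_trials, -1)               # flatten channels x time
clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
print("CV accuracy:", cross_val_score(clf, features, y, cv=5).mean())
```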


Proceedings Article
05 Dec 2013
TL;DR: This work formulates CSP as a divergence maximization problem and utilizes the property of a particular type of divergence, namely beta divergence, for robustifying the estimation of spatial filters in the presence of artifacts in the data.
Abstract: The efficiency of Brain-Computer Interfaces (BCI) largely depends upon a reliable extraction of informative features from the high-dimensional EEG signal. A crucial step in this protocol is the computation of spatial filters. The Common Spatial Patterns (CSP) algorithm computes filters that maximize the difference in band power between two conditions, thus it is tailored to extract the relevant information in motor imagery experiments. However, CSP is highly sensitive to artifacts in the EEG data, i.e. few outliers may alter the estimate drastically and decrease classification performance. Inspired by concepts from the field of information geometry we propose a novel approach for robustifying CSP. More precisely, we formulate CSP as a divergence maximization problem and utilize the property of a particular type of divergence, namely beta divergence, for robustifying the estimation of spatial filters in the presence of artifacts in the data. We demonstrate the usefulness of our method on toy data and on EEG recordings from 80 subjects.

44 citations
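For reference, standard (non-robust) CSP can be computed from the class-wise covariance matrices via a generalized eigenvalue problem, as sketched below with synthetic trials; the beta-divergence robustification proposed in the paper is not implemented here.

```python
# Standard CSP via a generalized eigenvalue problem (reference baseline,
# not the robust beta-divergence variant from the paper).
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n_trials, n_channels, n_samples = 50, 22, 500

def class_cov(trials):
    """Average spatial covariance over the trials of one condition."""
    return np.mean([np.cov(tr) for tr in trials], axis=0)

# Synthetic "band-passed EEG" trials for two imagery conditions.
trials_1 = rng.normal(size=(n_trials, n_channels, n_samples))
trials_2 = rng.normal(size=(n_trials, n_channels, n_samples))
trials_2[:, 0, :] *= 2.0                 # condition 2 has more power in channel 0

C1, C2 = class_cov(trials_1), class_cov(trials_2)

# Filters that maximize class-1 variance relative to the total variance.
eigvals, W = eigh(C1, C1 + C2)
W = W[:, np.argsort(eigvals)[::-1]]
csp_filters = np.hstack([W[:, :3], W[:, -3:]])   # three filters per class
print("CSP filter matrix:", csp_filters.shape)
```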


Journal ArticleDOI
TL;DR: This work tackles the problem of relating a nonlinear function of the raw EEG time-domain signal, say, EEG band power, to another modality such as the hemodynamic response, as measured with NIRS or fMRI, by defining a novel algorithm, multimodal source power correlation analysis (mSPoC).
Abstract: The urge to further our understanding of multimodal neural data has recently become an important topic due to the ever-increasing availability of simultaneously recorded data from different neural imaging modalities. In cases where EEG is one of the modalities, it is of interest to relate a nonlinear function of the raw EEG time-domain signal, say, EEG band power, to another modality such as the hemodynamic response, as measured with NIRS or fMRI. In this work we tackle exactly this problem by defining a novel algorithm that we denote multimodal source power correlation analysis (mSPoC). The validity and high performance of the mSPoC framework are demonstrated for simulated and real-world multimodal data.

Journal ArticleDOI
03 Jul 2013-PLOS ONE
TL;DR: The Directional Variance Adjustment (DVA) algorithm is introduced to diminish the systematic error of factor-based covariance estimates, and a thorough empirical study of the US, European, and Hong Kong stock markets shows that the proposed method leads to improved portfolio allocation.
Abstract: Robust and reliable covariance estimates play a decisive role in financial and many other applications. An important class of estimators is based on factor models. Here, we show by extensive Monte Carlo simulations that covariance matrices derived from the statistical Factor Analysis model exhibit a systematic error, which is similar to the well-known systematic error of the spectrum of the sample covariance matrix. Moreover, we introduce the Directional Variance Adjustment (DVA) algorithm, which diminishes the systematic error. In a thorough empirical study for the US, European, and Hong Kong stock markets, we show that our proposed method leads to improved portfolio allocation.
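The factor-model covariance baseline that the paper starts from can be sketched with scikit-learn's FactorAnalysis. The DVA correction itself is not implemented here, and the returns below are synthetic.

```python
# Sketch: covariance estimation from a statistical factor model, the class
# of estimators studied in the paper (without the DVA correction).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_days, n_assets, n_factors = 250, 50, 3

# Synthetic returns driven by a few common factors plus idiosyncratic noise.
F = rng.normal(size=(n_days, n_factors))
B = rng.normal(size=(n_factors, n_assets))
returns = F @ B + 0.5 * rng.normal(size=(n_days, n_assets))

fa = FactorAnalysis(n_components=n_factors).fit(returns)
cov_fa = fa.get_covariance()              # loadings @ loadings.T + diag(noise)
cov_sample = np.cov(returns, rowvar=False)

print("condition number, sample vs. factor model:",
      np.linalg.cond(cov_sample), np.linalg.cond(cov_fa))
```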

Journal ArticleDOI
TL;DR: A novel biased random sampling strategy for image representation in Bag-of-Words models is proposed, and its impact on the feature properties and the ranking quality for a set of semantic concepts is evaluated; the strategy improves the performance of classifiers in image annotation tasks and increases the correlation between kernels and labels.

Journal ArticleDOI
TL;DR: Afferent NMES motor patterns can support the calibration of BCI systems and be used to decode MI, which might be a new way to train sensorimotor rhythm (SMR) based BCI systems for healthy users who have difficulty attaining BCI control.

Book ChapterDOI
01 Jan 2013
TL;DR: A novel direction is discussed in which kernel-based models are used for property optimization; here, a stable estimation of the model's gradient is essential and non-trivial to achieve.
Abstract: In the last decade, kernel-based learning has become a state-of-the-art technology in Machine Learning. We briefly review kernel principal component analysis (kPCA) and the pre-image problem that occurs in kPCA. Subsequently, we discuss a novel direction where kernel-based models are used for property optimization. For this purpose, a stable estimation of the model’s gradient is essential and non-trivial to achieve. The appropriate use of pre-image projections is key to successful gradient-based optimization, as will be shown for toy and real-world problems from quantum chemistry and physics.
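A minimal kernel PCA example with an approximate pre-image map might look as follows. It uses scikit-learn's regression-based inverse transform, which is only one of several pre-image strategies discussed in this context, and the kernel parameters are illustrative.

```python
# Sketch: kernel PCA plus an approximate pre-image map (learned inverse
# transform). Parameters and data are illustrative only.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.datasets import make_circles

X, _ = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0,
                 fit_inverse_transform=True, alpha=0.1)
Z = kpca.fit_transform(X)              # nonlinear feature-space coordinates
X_pre = kpca.inverse_transform(Z)      # approximate pre-images in input space

print("mean pre-image reconstruction error:",
      np.mean(np.linalg.norm(X - X_pre, axis=1)))
```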

Proceedings Article
05 Dec 2013
TL;DR: It is shown that the proof of consistency requires bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data, and consistency is proved under assumptions which do not restrict the covariance structure and therefore better match real-world data.
Abstract: Analytic shrinkage is a statistical technique that offers a fast alternative to cross-validation for the regularization of covariance matrices and has appealing consistency properties. We show that the proof of consistency requires bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. We prove consistency under assumptions which do not restrict the covariance structure and therefore better match real-world data. In addition, we propose an extension of analytic shrinkage, orthogonal complement shrinkage, which adapts to the covariance structure. Finally, we demonstrate the superior performance of our novel approach on data from the domains of finance, spoken letter and optical character recognition, and neuroscience.
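Analytic shrinkage in this sense is the Ledoit-Wolf estimator; a small experiment in a "many variables, few samples" regime with synthetic data illustrates the baseline the paper analyzes. The proposed orthogonal complement shrinkage is not implemented here.

```python
# Sketch: Ledoit-Wolf (analytic) shrinkage of a covariance matrix compared
# to the plain sample covariance, on synthetic data with strong structure.
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.default_rng(0)
n_samples, n_features = 100, 80            # sample size comparable to dimension

# Ground-truth covariance with a dominant common direction.
A = rng.normal(size=(n_features, n_features))
cov_true = A @ A.T / n_features + 5.0 * np.ones((n_features, n_features)) / n_features
X = rng.multivariate_normal(np.zeros(n_features), cov_true, size=n_samples)

lw = LedoitWolf().fit(X)
err_sample = np.linalg.norm(np.cov(X, rowvar=False) - cov_true)
err_lw = np.linalg.norm(lw.covariance_ - cov_true)
print(f"shrinkage intensity {lw.shrinkage_:.2f}; "
      f"Frobenius error: sample {err_sample:.1f} vs. Ledoit-Wolf {err_lw:.1f}")
```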

Journal ArticleDOI
TL;DR: In this paper, the authors used machine learning to approximate the kinetic energy of one-dimensional diatomics as a functional of the electron density, which can accurately dissociate a diatomic, and can be systematically improved with training.
Abstract: Machine learning is used to approximate the kinetic energy of one-dimensional diatomics as a functional of the electron density. The functional can accurately dissociate a diatomic, and can be systematically improved with training. Highly accurate self-consistent densities and molecular forces are found, indicating the possibility for ab initio molecular dynamics simulations.

Journal ArticleDOI
TL;DR: Kernel methods offer a number of unique advantages for signal processing, and this special issue aims to review some of those.
Abstract: The importance of learning and adaptation in statistical signal processing creates a symbiotic relationship with machine learning. However, the two disciplines possess different momentum and emphasis, which makes it attractive to periodically review trends and new developments in their overlapping spheres of influence. Looking at the recent trends in machine learning, we see increasing interest in kernel methods, Bayesian reasoning, causality, information theoretic learning, reinforcement learning, and nonnumeric data processing, just to name a few. While some of the machine-learning community trends are clearly visible in signal processing, such as the increased popularity of the Bayesian methods and graphical models, others such as kernel approaches are still less prominent. However, kernel methods offer a number of unique advantages for signal processing, and this special issue aims to review some of those.

Proceedings ArticleDOI
03 Jul 2013
TL;DR: Multiple Kernel Learning, which has been widely used for feature fusion in computer vision, allows one to simultaneously learn the classifier and the optimal weighting, and is compared here to two baseline approaches.
Abstract: Combining information from different sources is a common way to improve classification accuracy in Brain-Computer Interfacing (BCI). For instance, in small sample settings it is useful to integrate data from other subjects or sessions in order to improve the estimation quality of the spatial filters or the classifier. Since data from different subjects may show large variability, it is crucial to weight the contributions according to importance. Many multi-subject learning algorithms determine the optimal weighting in a separate step by using heuristics; however, they do not ensure that the selected weights are optimal with respect to classification. In this work we apply Multiple Kernel Learning (MKL) to this problem. MKL has been widely used for feature fusion in computer vision and allows one to simultaneously learn the classifier and the optimal weighting. We compare the MKL method to two baseline approaches and investigate the reasons for performance improvement.
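The underlying idea, classifying with a weighted sum of kernels (for example, one kernel per data source), can be sketched with a precomputed-kernel SVM. True MKL learns the weights jointly with the classifier; in the sketch below the weights are fixed by hand and all data is synthetic.

```python
# Sketch: fixed-weight combination of two kernels fed to a precomputed-kernel
# SVM. Real MKL would optimize the kernel weights jointly with the classifier.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X_own = rng.normal(size=(200, 10))                    # e.g. the subject's own features
X_other = X_own + 0.5 * rng.normal(size=(200, 10))    # a second, noisier "source"
y = (X_own[:, 0] > 0).astype(int)

idx_tr, idx_te = train_test_split(np.arange(200), random_state=0)

def combined_kernel(idx_a, idx_b, weights=(0.7, 0.3)):
    K1 = rbf_kernel(X_own[idx_a], X_own[idx_b], gamma=0.1)
    K2 = rbf_kernel(X_other[idx_a], X_other[idx_b], gamma=0.1)
    return weights[0] * K1 + weights[1] * K2

clf = SVC(kernel="precomputed").fit(combined_kernel(idx_tr, idx_tr), y[idx_tr])
print("test accuracy:", clf.score(combined_kernel(idx_te, idx_tr), y[idx_te]))
```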

Journal ArticleDOI
TL;DR: A novel supervised-unsupervised learning scheme is proposed, which aims to differentiate true labels from random ones in a data-driven way, and shows that this approach provides a more crisp view of the brain states that experimenters are looking for, besides discovering additional brain states to which the classical analysis is blind.
Abstract: Recent years have seen a rise of interest in using electroencephalography-based brain-computer interfacing methodology for investigating non-medical questions, beyond the purpose of communication and control. One of these novel applications is to examine how signal quality is being processed neurally, which is of particular interest for industry, besides providing neuroscientific insights. As for most behavioral experiments in the neurosciences, the assessment of a given stimulus by a subject is required. Based on an EEG study on speech quality of phonemes, we will first discuss the information contained in the neural correlate of this judgement. Typically, this is done by analyzing the data along behavioral responses/labels. However, participants in such complex experiments often guess at the threshold of perception. This leads to labels that are only partly correct, and oftentimes random, which is a problematic scenario for using supervised learning. Therefore, we propose a novel supervised-unsupervised learning scheme, which aims to differentiate true labels from random ones in a data-driven way. We show that this approach provides a crisper view of the brain states that experimenters are looking for, besides discovering additional brain states to which the classical analysis is blind.

Journal ArticleDOI
TL;DR: A novel algorithm is proposed for disentangling such different causes of non-stationarity and in this manner enable better neurophysiological interpretation for a wider set of experimental paradigms.
Abstract: Neural recordings are non-stationary time series, i.e. their properties typically change over time. Identifying specific changes, e.g., those induced by a learning task, can shed light on the underlying neural processes. However, such changes of interest are often masked by strong unrelated changes, which can be of physiological origin or due to measurement artifacts. We propose a novel algorithm for disentangling such different causes of non-stationarity and in this manner enable better neurophysiological interpretation for a wider set of experimental paradigms. A key ingredient is the repeated application of Stationary Subspace Analysis (SSA) using different temporal scales. The usefulness of our explorative approach is demonstrated in simulations, theory and EEG experiments with 80 brain–computer interfacing subjects.

Proceedings ArticleDOI
23 Apr 2013
TL;DR: It is shown that even though the NIRS signal in the proposed multimodal imaging technique has high latency, the technique can be a reasonable system for real-time BCI.
Abstract: Electroencephalography (EEG) has been widely used for brain-computer interfaces (BCI) due to its high temporal resolution. Meanwhile, multimodal imaging techniques based on combined EEG and near-infrared spectroscopy (NIRS) have been studied in BCI research and shown to lead to beneficial results in terms of classification [1]. However, the performance results of this study show that there is a difference of about 5 s in the time of peak accuracy between NIRS and EEG, caused by the high latency of the NIRS signal. Based on our experimental results and analysis, we show that even though the NIRS signal in our proposed multimodal imaging technique has high latency, it can be a reasonable system for real-time BCI.

Journal ArticleDOI
TL;DR: This small guide intends to pinpoint some neural network pitfalls in computational physics and chemistry, along with corresponding solutions to successfully realize function approximation tasks in physics, chemistry or other fields.
Abstract: There is a long history of using neural networks for function approximation in computational physics and chemistry. Despite their conceptual simplicity, the practitioner may face difficulties when it comes to putting them to work. This small guide intends to pinpoint some neural network pitfalls, along with corresponding solutions to successfully realize function approximation tasks in physics, chemistry or other fields.
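One pitfall of the kind such a guide typically warns about is feeding a network badly scaled inputs. Whether this exact example appears in the paper is not known; the function, network size and settings below are arbitrary illustrations.

```python
# Illustration: fitting a simple 1D function with a small network, with and
# without input standardization. Settings are arbitrary, for demonstration only.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
x = rng.uniform(0, 1000, size=500)[:, None]      # badly scaled input range
y = np.sin(x[:, 0] / 100.0)

raw = MLPRegressor(hidden_layer_sizes=(50,), max_iter=5000, random_state=0)
scaled = make_pipeline(StandardScaler(),
                       MLPRegressor(hidden_layer_sizes=(50,), max_iter=5000,
                                    random_state=0))

print("R^2 without input scaling:", raw.fit(x, y).score(x, y))
print("R^2 with input scaling:   ", scaled.fit(x, y).score(x, y))
```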

Journal ArticleDOI
TL;DR: An unsupervised signal processing approach is presented, which tackles usability by an algorithmic improvement from the field of machine learning, which completely omits the necessity of a calibration recording for BCIs based on event-related potential (ERP) paradigms.
Abstract: This contribution reviews how usability in Brain-Computer Interfaces (BCI) can be enhanced. As an example, an unsupervised signal processing approach is presented, which tackles usability by an algorithmic improvement from the field of machine learning. The approach completely omits the necessity of a calibration recording for BCIs based on event-related potential (ERP) paradigms. The positive effect is twofold - first, the experimental time is shortened and the productive online use of the BCI system starts as early as possible. Second, the unsupervised session avoids the usual paradigmatic break between calibration phase and online phase, which is known to introduce data-analytic problems related to non-stationarity.

Proceedings ArticleDOI
23 Apr 2013
TL;DR: Two aspects of multi-modal imaging will be reviewed, namely combined EEG and NIRS measurements and how recordings of multiple subjects can help in finding subject-independent BCI classifiers, to help in enhancing as well as robustifying BCI performance.
Abstract: Multimodal techniques have seen a rising interest from the neuroscientific as well as the BCI community in recent times. In this abstract two aspects of multi-modal imaging will be reviewed. Firstly, how recordings of multiple subjects can help in finding subject-independent BCI classifiers and secondly how multi-modal neuroimaging methods, namely combined EEG and NIRS measurements can help in enhancing as well as robustifying BCI performance.

Book ChapterDOI
12 Jul 2013
TL;DR: This chapter presents a straightforward extension of CCA that estimates the correct solution even in the presence of noninstantaneous couplings, that is, temporal delays or convolutions between data sources.
Abstract: Technical advances in the field of noninvasive neuroimaging allow for innovative therapeutical strategies with application potential in neural rehabilitation. To improve these methods, combinations of multiple imaging modalities have become an important topic of research. This chapter reviews some of the most popular unsupervised statistical learning techniques used in the context of neuroscientific data analysis, and places a special focus on multimodal neural data. It starts with the well-known principal component analysis (PCA). First, the chapter shows how to derive the algorithm and provides illustrative examples of the advantages and disadvantages of standard PCA. The second method presented is canonical correlation analysis (CCA): a multivariate analysis method that reveals maximally correlated features of simultaneously acquired multiple data streams. Finally, the chapter presents a straightforward extension of CCA that estimates the correct solution even in the presence of noninstantaneous couplings, that is, temporal delays or convolutions between data sources.
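A minimal CCA example between two synthetic data streams that share latent sources (loosely standing in for simultaneously recorded modalities) is given below; the noninstantaneous extension discussed in the chapter is not shown.

```python
# Sketch: canonical correlation analysis between two synthetic data streams
# driven by shared latent sources (e.g. features from two modalities).
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
n = 1000
shared = rng.normal(size=(n, 2))            # latent sources seen by both streams

X = shared @ rng.normal(size=(2, 10)) + 0.5 * rng.normal(size=(n, 10))
Y = shared @ rng.normal(size=(2, 6)) + 0.5 * rng.normal(size=(n, 6))

cca = CCA(n_components=2).fit(X, Y)
Xc, Yc = cca.transform(X, Y)
corrs = [np.corrcoef(Xc[:, i], Yc[:, i])[0, 1] for i in range(2)]
print("canonical correlations:", np.round(corrs, 2))
```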

Proceedings ArticleDOI
01 Jan 2013
Abstract: It is again a great pleasure to welcome you to the 5th International Winter Conference on Brain-Computer Interface at the High1 resort. This is the fifth event of the annual Brain-Computer Interface winter conference, and it has been a great and successful tradition so far.