
Showing papers by "Klaus-Robert Müller" published in 2001


Journal ArticleDOI
TL;DR: This paper provides an introduction to support vector machines, kernel Fisher discriminant analysis, and kernel principal component analysis, as examples for successful kernel-based learning methods.
Abstract: This paper provides an introduction to support vector machines, kernel Fisher discriminant analysis, and kernel principal component analysis, as examples for successful kernel-based learning methods. We first give a short background on Vapnik-Chervonenkis theory and kernel feature spaces and then proceed to kernel-based learning in supervised and unsupervised scenarios, including practical and algorithmic considerations. We illustrate the usefulness of kernel algorithms by discussing applications such as optical character recognition and DNA analysis.

3,566 citations
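
To make the kernel feature-space idea concrete, here is a minimal kernel PCA sketch in Python (NumPy only); the RBF kernel, its width, and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))

def kernel_pca(X, n_components=2, gamma=1.0):
    """Project the data onto the leading principal components in kernel feature space."""
    K = rbf_kernel(X, gamma)
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n   # center in feature space
    eigvals, eigvecs = np.linalg.eigh(K_c)                 # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))
    return K_c @ alphas                                    # projections of the training points

X = np.random.RandomState(0).randn(100, 3)
print(kernel_pca(X, n_components=2, gamma=0.5).shape)      # (100, 2)
```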


Journal ArticleDOI
TL;DR: It is found that ADABOOST asymptotically achieves a hard margin distribution, i.e. the algorithm concentrates its resources on a few hard-to-learn patterns that are interestingly very similar to Support Vectors.
Abstract: Recently, ensemble methods like ADABOOST have been applied successfully in many problems, while seemingly defying the problem of overfitting. ADABOOST rarely overfits in the low-noise regime; however, we show that it clearly does so for higher noise levels. Central to the understanding of this fact is the margin distribution. ADABOOST can be viewed as a constrained gradient descent in an error function with respect to the margin. We find that ADABOOST asymptotically achieves a hard margin distribution, i.e. the algorithm concentrates its resources on a few hard-to-learn patterns that are, interestingly, very similar to Support Vectors. A hard margin is clearly a sub-optimal strategy in the noisy case, and regularization, in our case a “mistrust” in the data, must be introduced in the algorithm to alleviate the distortions that single difficult patterns (e.g. outliers) can cause to the margin distribution. We propose several regularization methods and generalizations of the original ADABOOST algorithm to achieve a soft margin. In particular we suggest (1) regularized ADABOOST-REG, where the gradient descent is done directly with respect to the soft margin, and (2) regularized linear and quadratic programming (LP/QP-) ADABOOST, where the soft margin is attained by introducing slack variables. Extensive simulations demonstrate that the proposed regularized ADABOOST-type algorithms are useful and yield competitive results for noisy data.

1,367 citations
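
As a rough illustration of the margin view (plain AdaBoost with decision stumps, not the paper's regularized ADABOOST-REG or LP/QP variants), the sketch below boosts a noisy toy problem and inspects the normalized margin distribution; the dataset, noise level, and normalization are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.RandomState(0)
X = rng.randn(300, 2)
y = np.where(X[:, 0] * X[:, 1] > 0, 1, -1)   # toy labels in {-1, +1}
flip = rng.rand(300) < 0.1                   # flip 10% of labels to mimic label noise
y[flip] = -y[flip]

clf = AdaBoostClassifier(n_estimators=200, random_state=0).fit(X, y)

# Margin of each training sample: label times the (roughly normalized) ensemble vote.
scores = clf.decision_function(X)
margins = y * scores / np.abs(scores).max()
# The smallest margins belong to the hard (often noisy) patterns the booster focuses on.
print("smallest margins:", np.round(np.sort(margins)[:5], 3))
```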


Proceedings Article
03 Jan 2001
TL;DR: This work detects upcoming finger movements in a natural keyboard typing condition and predicts their laterality in a pseudo-online simulation, and compares discriminative classifiers like Support Vector Machines (SVMs) and different variants of Fisher Discriminant that possess favorable regularization properties for dealing with high noise cases.
Abstract: Driven by the progress in the field of single-trial analysis of EEG, there is a growing interest in brain-computer interfaces (BCIs), i.e., systems that enable human subjects to control a computer only by means of their brain signals. In a pseudo-online simulation our BCI detects upcoming finger movements in a natural keyboard typing condition and predicts their laterality. This can be done on average 100-230 ms before the respective key is actually pressed, i.e., long before the onset of EMG. Our approach is appealing for its short response time and high classification accuracy (>96%) in a binary decision where no human training is involved. We compare discriminative classifiers like Support Vector Machines (SVMs) and different variants of the Fisher Discriminant that possess favorable regularization properties for dealing with high-noise cases (inter-trial variability).

496 citations
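
As a sketch of one regularized classifier of the kind compared in such settings, below is a shrinkage-regularized Fisher discriminant in NumPy; the shrinkage form and its strength are generic assumptions, not the exact regularization used in the paper.

```python
import numpy as np

def train_rlda(X, y, lam=0.1):
    """Regularized Fisher discriminant: shrink the pooled within-class covariance
    toward a scaled identity. X: (n_samples, n_features), y: labels in {-1, +1}."""
    mu_pos, mu_neg = X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)
    Xc = np.vstack([X[y == 1] - mu_pos, X[y == -1] - mu_neg])
    S = Xc.T @ Xc / len(Xc)                                   # pooled covariance
    S_reg = (1 - lam) * S + lam * (np.trace(S) / S.shape[0]) * np.eye(S.shape[0])
    w = np.linalg.solve(S_reg, mu_pos - mu_neg)               # discriminant direction
    b = -w @ (mu_pos + mu_neg) / 2                            # threshold between class means
    return w, b

def predict_rlda(X, w, b):
    return np.sign(X @ w + b)
```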


Journal ArticleDOI
03 Jan 2001
TL;DR: This work proposes a new discriminative TOP kernel derived from tangent vectors of posterior log-odds and develops a theoretical framework on feature extractors from probabilistic models and uses it for analyzing the TOP kernel.
Abstract: Recently, Jaakkola and Haussler (1999) proposed a method for constructing kernel functions from probabilistic models. Their so-called Fisher kernel has been combined with discriminative classifiers such as support vector machines and applied successfully in, for example, DNA and protein analysis. Whereas the Fisher kernel is calculated from the marginal log-likelihood, we propose the TOP kernel derived from tangent vectors of posterior log-odds. Furthermore, we develop a theoretical framework on feature extractors from probabilistic models and use it for analyzing the TOP kernel. In experiments, our new discriminative TOP kernel compares favorably to the Fisher kernel.

154 citations
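
A minimal sketch of the TOP idea for a toy probabilistic model (two isotropic unit-variance Gaussians with equal priors; the model and its parameters are illustrative assumptions, not those of the paper): the feature vector stacks the posterior log-odds with its tangent vectors with respect to the model parameters, and the kernel is their inner product.

```python
import numpy as np

def top_features(X, mu_pos, mu_neg):
    """TOP feature map for the toy model: features = (log-odds v, dv/d mu_pos, dv/d mu_neg)."""
    v = (-0.5 * np.sum((X - mu_pos) ** 2, axis=1)
         + 0.5 * np.sum((X - mu_neg) ** 2, axis=1))   # posterior log-odds (equal priors)
    dv_dmu_pos = X - mu_pos                           # tangent vector w.r.t. mu_pos
    dv_dmu_neg = -(X - mu_neg)                        # tangent vector w.r.t. mu_neg
    return np.hstack([v[:, None], dv_dmu_pos, dv_dmu_neg])

def top_kernel(X1, X2, mu_pos, mu_neg):
    """TOP kernel = inner product of the TOP feature vectors."""
    return top_features(X1, mu_pos, mu_neg) @ top_features(X2, mu_pos, mu_neg).T
```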


Book ChapterDOI
21 Aug 2001
TL;DR: An algorithm to predict the leave-one-out (LOO) error for kernel-based classifiers is proposed, inspired by geometrical intuition, and allows reliable selection of a good model, as demonstrated in simulations on Support Vector and Linear Programming Machines.
Abstract: We propose an algorithm to predict the leave-one-out (LOO) error for kernel-based classifiers. To achieve this goal with computational efficiency, we cast the LOO error approximation task into a classification problem. This means that we need to learn a classification of whether or not a given training sample - if left out of the data set - would be misclassified. For this learning task, simple data-dependent features are proposed, inspired by geometrical intuition. Our approach allows us to reliably select a good model, as demonstrated in simulations on Support Vector and Linear Programming Machines. Comparisons to existing learning-theoretical bounds, e.g. the span bound, are given for various model selection scenarios.

33 citations
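
A hedged sketch of casting LOO prediction as a learning problem: on a small toy set the exact LOO labels are computed by brute force, and a simple predictor is fit on per-sample features (the features below are simplified stand-ins for the paper's geometrically motivated ones; dataset and parameters are assumptions).

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(120, 2)
y = np.where(X[:, 0] + rng.randn(120) > 0, 1, -1)     # overlapping classes

svm = SVC(kernel="rbf", C=1.0).fit(X, y)

# Exact LOO labels: 1 if leaving sample i out would misclassify it (feasible on a toy set).
loo_err = np.empty(len(X), dtype=int)
for i in range(len(X)):
    mask = np.arange(len(X)) != i
    pred = SVC(kernel="rbf", C=1.0).fit(X[mask], y[mask]).predict(X[i:i + 1])[0]
    loo_err[i] = int(pred != y[i])

# Simple per-sample features: signed SVM output and support-vector membership.
is_sv = np.zeros(len(X)); is_sv[svm.support_] = 1.0
feats = np.column_stack([y * svm.decision_function(X), is_sv])

# Learn to predict which samples would be misclassified if left out.
predictor = LogisticRegression().fit(feats, loo_err)
print("predicted LOO error:", predictor.predict(feats).mean(), "exact:", loo_err.mean())
```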


Proceedings Article
03 Jan 2001
TL;DR: A new mathematical construction is proposed that makes it possible to adapt to the intrinsic dimension of the data and to find an orthonormal basis of this submanifold, and allows the derivation of elegant kernelized blind source separation (BSS) algorithms for arbitrary invertible nonlinear mixings.
Abstract: In kernel-based learning the data is mapped to a kernel feature space of a dimension that corresponds to the number of training data points. In practice, however, the data forms a smaller submanifold in feature space, a fact that has been used e.g. by reduced set techniques for SVMs. We propose a new mathematical construction that permits us to adapt to the intrinsic dimension and to find an orthonormal basis of this submanifold. In doing so, computations become much simpler and, more importantly, our theoretical framework allows us to derive elegant kernelized blind source separation (BSS) algorithms for arbitrary invertible nonlinear mixings. Experiments demonstrate the good performance and high computational efficiency of our kTDSEP algorithm for the problem of nonlinear BSS.

29 citations
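
A rough sketch of the basis-construction step in NumPy (the RBF kernel, random choice of basis-inducing points, and omission of the subsequent temporal-decorrelation/BSS step are all simplifications of kTDSEP): a subset of mapped points is orthonormalized via its kernel matrix, giving low-dimensional feature-space coordinates for every sample.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d)

def feature_space_coordinates(X, n_basis=20, gamma=1.0, seed=0):
    """Coordinates of all samples w.r.t. an orthonormal basis of the submanifold
    spanned by a subset of mapped points."""
    rng = np.random.RandomState(seed)
    V = X[rng.choice(len(X), n_basis, replace=False)]    # basis-inducing points
    K_vv = rbf_kernel(V, V, gamma)
    eigvals, eigvecs = np.linalg.eigh(K_vv)
    keep = eigvals > 1e-10                               # effective intrinsic dimension
    W = eigvecs[:, keep] / np.sqrt(eigvals[keep])        # whitening = orthonormalization
    K_vx = rbf_kernel(V, X, gamma)
    return W.T @ K_vx                                    # shape: (intrinsic_dim, n_samples)
```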


Journal ArticleDOI
TL;DR: Using Gaussian kernels to define the correlation sum, it is shown theoretically that the estimates, which are derived for additive white Gaussian noise, are also robust for moderately colored noise and underline the usefulness of the proposed correction schemes.
Abstract: Using Gaussian kernels to define the correlation sum, we derive simple formulas that correct the noise bias in estimates of the correlation dimension and K2 entropy of chaotic time series. The corrections are based only on the difference of correlation dimensions for adjacent embedding dimensions and hence preserve the full functional dependencies on both the scale parameter and the embedding dimension. It is shown theoretically that the estimates, which are derived for additive white Gaussian noise, are also robust for moderately colored noise. Simulations underline the usefulness of the proposed correction schemes. It is demonstrated that the method also gives satisfactory results for non-Gaussian and dynamical noise.

25 citations
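
For orientation, a sketch of a Gaussian-kernel correlation sum on a delay-embedded time series (the bandwidth convention, the logistic-map example, and all parameters are assumptions; the paper's noise-bias correction formulas are not reproduced here).

```python
import numpy as np

def delay_embed(x, m, tau=1):
    """Delay-embed a scalar time series into m-dimensional vectors."""
    n = len(x) - (m - 1) * tau
    return np.column_stack([x[i * tau: i * tau + n] for i in range(m)])

def gaussian_correlation_sum(x, m, h, tau=1):
    """Correlation sum with a Gaussian kernel instead of the usual hard threshold."""
    X = delay_embed(x, m, tau)
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T          # squared pairwise distances
    iu = np.triu_indices(len(X), k=1)                     # distinct pairs only
    return np.mean(np.exp(-d2[iu] / (4 * h ** 2)))

# Example: logistic-map series, compared across adjacent embedding dimensions.
x = np.empty(1000); x[0] = 0.3
for t in range(999):
    x[t + 1] = 4 * x[t] * (1 - x[t])
for m in (2, 3):
    print(m, gaussian_correlation_sum(x, m, h=0.05))
```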


Proceedings Article
03 Jan 2001
TL;DR: In this article, the authors use resampling methods to assess the quality of the discovered projections and show experimentally that their proposed variance estimations are strongly correlated with the separation error, and demonstrate that this reliability estimation can be used to choose the appropriate ICA model and to significantly enhance the separation performance.
Abstract: When applying unsupervised learning techniques like ICA or temporal decorrelation, a key question is whether the discovered projections are reliable. In other words: can we give error bars, or can we assess the quality of our separation? We use resampling methods to tackle these questions and show experimentally that our proposed variance estimations are strongly correlated with the separation error. We demonstrate that this reliability estimation can be used to choose the appropriate ICA model, to significantly enhance the separation performance, and, most importantly, to identify the components that have an actual physical meaning. Application to 49-channel data from a magnetoencephalography (MEG) experiment underlines the usefulness of our approach.

15 citations
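
A minimal sketch of the resampling idea using FastICA on toy mixtures (FastICA, the bootstrap scheme, and the direction-matching step are illustrative substitutes; the paper applies resampling to ICA/temporal-decorrelation separation and MEG data): refit on resampled data and measure how much each unmixing direction varies.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.RandomState(0)
S = np.column_stack([np.sign(rng.randn(2000)), rng.laplace(size=2000)])  # two sources
X = S @ np.array([[1.0, 0.6], [0.4, 1.0]])                               # linear mixtures

W_ref = FastICA(n_components=2, random_state=0).fit(X).components_
W_ref = W_ref / np.linalg.norm(W_ref, axis=1, keepdims=True)

# Bootstrap: refit ICA on resampled data and record how far each unmixing direction
# drifts from the reference (a rough stand-in for the paper's variance estimate).
deviations = []
for b in range(20):
    idx = rng.randint(0, len(X), len(X))
    W_b = FastICA(n_components=2, random_state=b).fit(X[idx]).components_
    W_b = W_b / np.linalg.norm(W_b, axis=1, keepdims=True)
    sim = np.abs(W_ref @ W_b.T)                 # match directions up to sign/permutation
    deviations.append(1 - sim.max(axis=1))
print("per-component instability:", np.mean(deviations, axis=0))
```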