
Showing papers by "Klaus-Robert Müller published in 1999"


Journal ArticleDOI
TL;DR: The geometry of feature space is reviewed, and the connection between feature space and input space is discussed by dealing with the question of how one can, given some vector in feature space, find a preimage in input space.
Abstract: This paper collects some ideas targeted at advancing our understanding of the feature spaces associated with support vector (SV) kernel functions. We first discuss the geometry of feature space. In particular, we review what is known about the shape of the image of input space under the feature space map, and how this influences the capacity of SV methods. Following this, we describe how the metric governing the intrinsic geometry of the mapped surface can be computed in terms of the kernel, using the example of the class of inhomogeneous polynomial kernels, which are often used in SV pattern recognition. We then discuss the connection between feature space and input space by dealing with the question of how one can, given some vector in feature space, find a preimage (exact or approximate) in input space. We describe algorithms to tackle this issue, and show their utility in two applications of kernel methods. First, we use it to reduce the computational complexity of SV decision functions; second, we combine it with the kernel PCA algorithm, thereby constructing a nonlinear statistical denoising technique which is shown to perform well on real-world data.
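
The preimage discussion invites a small illustration. Below is a minimal sketch of a fixed-point iteration for Gaussian (RBF) kernels: given expansion coefficients gamma_i of a feature-space vector Psi = sum_i gamma_i Phi(x_i), for instance the kernel-PCA-denoised image of a test point, an approximate preimage is sought as a kernel-weighted mean of the training points. All names and parameters are illustrative, and the iteration can get stuck in local optima, so the starting point z0 matters.

```python
import numpy as np

def rbf(z, X, sigma):
    """Gaussian kernel values k(z, x_i) for all training rows x_i of X."""
    return np.exp(-np.sum((X - z) ** 2, axis=1) / (2.0 * sigma ** 2))

def preimage(gamma, X, sigma, z0, n_iter=100, tol=1e-8):
    """Approximate preimage of Psi = sum_i gamma_i Phi(x_i) under the
    Gaussian kernel, via fixed-point iteration: z is updated to a
    kernel-weighted mean of the training points."""
    z = z0.astype(float).copy()
    for _ in range(n_iter):
        w = gamma * rbf(z, X, sigma)   # gamma_i * k(z, x_i)
        z_new = w @ X / w.sum()        # weighted mean of training points
        if np.linalg.norm(z_new - z) < tol:
            break
        z = z_new
    return z
```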

1,258 citations


Proceedings ArticleDOI
08 Feb 1999
TL;DR: In this paper, a nonlinear form of principal component analysis (PCA) is proposed to perform polynomial feature extraction in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible d-pixel products in images.
Abstract: A new method for performing a nonlinear form of Principal Component Analysis is proposed. By the use of integral operator kernel functions, one can efficiently compute principal components in high-dimensional feature spaces, related to input space by some nonlinear map; for instance, the space of all possible d-pixel products in images. We give the derivation of the method and present experimental results on polynomial feature extraction for pattern recognition.
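
For concreteness, here is a minimal kernel PCA sketch under the abstract's setup: center the kernel matrix in feature space, eigendecompose, and project the training points onto the leading nonlinear components. Function and variable names are illustrative.

```python
import numpy as np

def kernel_pca(K, n_components):
    """Kernel PCA on a precomputed kernel matrix K (n x n): center K in
    feature space, eigendecompose, and return the projections of the
    training points onto the leading nonlinear principal components."""
    n = K.shape[0]
    one_n = np.ones((n, n)) / n
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n   # feature-space centering
    eigvals, eigvecs = np.linalg.eigh(Kc)                # ascending eigenvalues
    idx = np.argsort(eigvals)[::-1][:n_components]
    # scale so each feature-space eigenvector has unit norm
    alphas = eigvecs[:, idx] / np.sqrt(np.maximum(eigvals[idx], 1e-12))
    return Kc @ alphas                                   # (n, n_components) projections
```

For the "d-pixel products" example, one would pass K = (X @ X.T) ** d, with one flattened image per row of X.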

430 citations


Proceedings Article
29 Nov 1999
TL;DR: Employing a unified framework in terms of a nonlinear variant of the Rayleigh coefficient, this work proposes non-linear generalizations of Fisher's discriminant and oriented PCA using Support Vector kernel functions.
Abstract: We incorporate prior knowledge to construct nonlinear algorithms for invariant feature extraction and discrimination. Employing a unified framework in terms of a nonlinear variant of the Rayleigh coefficient, we propose non-linear generalizations of Fisher's discriminant and oriented PCA using Support Vector kernel functions. Extensive simulations show the utility of our approach.
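
As a sketch of the Rayleigh-coefficient view in the two-class Fisher case: writing the discriminant direction as a kernel expansion w = sum_i alpha_i Phi(x_i) turns the between-class and within-class scatter into matrices M and N acting on alpha, and the maximizer of the Rayleigh coefficient is alpha proportional to N^{-1}(m_+ - m_-). A minimal, regularized version with illustrative names, assuming labels in {-1, +1}:

```python
import numpy as np

def kernel_fisher(K, y, reg=1e-3):
    """Kernel Fisher discriminant: maximize the Rayleigh coefficient
    J(alpha) = (alpha' M alpha) / (alpha' N alpha) over kernel expansion
    coefficients alpha.  K is the full n x n kernel matrix, y in {-1, +1}."""
    n = K.shape[0]
    m = {}
    N = reg * np.eye(n)                 # regularized within-class matrix
    for c in (-1, +1):
        idx = np.where(y == c)[0]
        Kc = K[:, idx]                  # n x l_c block of kernel columns
        m[c] = Kc.mean(axis=1)
        l = len(idx)
        N += Kc @ (np.eye(l) - np.ones((l, l)) / l) @ Kc.T
    # closed-form maximizer: alpha proportional to N^{-1}(m_+ - m_-)
    alpha = np.linalg.solve(N, m[+1] - m[-1])
    return alpha                        # project a new x via sum_i alpha_i k(x_i, x)
```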

207 citations



Proceedings Article
29 Nov 1999
TL;DR: This work assumes linear combinations of reflectance spectra with some additive normal sensor noise and derives a probabilistic MAP framework for analyzing hyperspectral data and develops an algorithm that can be understood as constrained independent component analysis (ICA).
Abstract: In hyperspectral imagery one pixel typically consists of a mixture of the reflectance spectra of several materials, where the mixture coefficients correspond to the abundances of the constituting materials. We assume linear combinations of reflectance spectra with some additive normal sensor noise and derive a probabilistic MAP framework for analyzing hyperspectral data. As the material reflectance characteristics are not known a priori, we face the problem of unsupervised linear unmixing. The incorporation of different prior information (e.g. positivity and normalization of the abundances) naturally leads to a family of interesting algorithms, for example in the noise-free case yielding an algorithm that can be understood as constrained independent component analysis (ICA). Simulations underline the usefulness of our theory.
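
To make the MAP view concrete, the toy sketch below performs the constrained abundance estimate for a single pixel when the endmember spectra are assumed known; the paper's actual setting is unsupervised (the spectra are unknown), so this only illustrates how the positivity and normalization priors act during inference. All names and step sizes are illustrative.

```python
import numpy as np

def unmix_pixel(x, A, n_iter=500, lr=1e-3):
    """MAP-style abundance estimate for one pixel under the linear mixing
    model x = A s + Gaussian noise.  A: (bands x materials) endmember
    spectra, assumed known here purely for illustration.  Positivity and
    sum-to-one priors are enforced by a crude projection (clip, then
    renormalize) after each gradient step."""
    m = A.shape[1]
    s = np.full(m, 1.0 / m)                    # start from uniform abundances
    for _ in range(n_iter):
        grad = A.T @ (A @ s - x)               # gradient of the Gaussian negative log-likelihood
        s = np.clip(s - lr * grad, 0.0, None)  # positivity
        s /= s.sum() + 1e-12                   # normalization of abundances
    return s
```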

158 citations


Proceedings ArticleDOI
01 Jan 1999
TL;DR: A new linear program for classifying data given in terms of pairwise proximities, which makes it possible to avoid the problems inherent in using feature spaces with an indefinite metric in support vector machines.
Abstract: We provide a new linear program for the classification of data given in terms of pairwise proximities. This makes it possible to avoid the problems inherent in using feature spaces with an indefinite metric in support vector machines, since the notion of a margin is needed purely in the input space, where the classification actually occurs. Moreover, in our approach we can enforce sparsity in the proximity representation by sacrificing training error. This turns out to be favorable for proximity data. Similar to ν-SV methods, the only parameter needed in the algorithm is the (asymptotic) number of data points being classified with a margin. Finally, the algorithm is successfully compared with ν-SV learning in proximity space and with K-nearest-neighbors on real-world data from neuroscience and molecular biology.
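
A linear program in this spirit can be set up with an off-the-shelf LP solver. The sketch below trains an L1-regularized classifier directly on the proximity matrix, which is what produces sparsity in the proximity representation; note that it uses a plain C-parameterization rather than the paper's ν-parameterization, and all names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def lp_machine(P, y, C=1.0):
    """Sparse linear-programming classifier on a proximity matrix P
    (n x n, P[i, j] = proximity of examples i and j), labels y in {-1, +1}.
    Variables: alpha = a_pos - a_neg, bias b = b_pos - b_neg, slacks xi.
    Objective: ||alpha||_1 + C * sum(xi), subject to
    y_i (sum_j alpha_j P[i, j] + b) >= 1 - xi_i."""
    n = P.shape[0]
    # variable layout: [a_pos (n), a_neg (n), b_pos, b_neg, xi (n)]
    c = np.concatenate([np.ones(2 * n), [0.0, 0.0], C * np.ones(n)])
    Y = np.diag(y.astype(float))
    yc = y[:, None].astype(float)
    # margin constraints rewritten as A_ub @ vars <= b_ub
    A_ub = np.hstack([-Y @ P, Y @ P, -yc, yc, -np.eye(n)])
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * (3 * n + 2))
    alpha = res.x[:n] - res.x[n:2 * n]
    b = res.x[2 * n] - res.x[2 * n + 1]
    return alpha, b   # classify new proximities p via sign(p @ alpha + b)
```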

96 citations



Proceedings Article
29 Nov 1999
TL;DR: A new boosting algorithm is proposed which allows for the possibility of a pre-specified fraction of points to lie in the margin area, even on the wrong side of the decision boundary.
Abstract: AdaBoost and other ensemble methods have successfully been applied to a number of classification tasks, seemingly defying problems of overfitting. AdaBoost performs gradient descent in an error function with respect to the margin, asymptotically concentrating on the patterns which are hardest to learn. For very noisy problems, however, this can be disadvantageous. Indeed, theoretical analysis has shown that the margin distribution, as opposed to just the minimal margin, plays a crucial role in understanding this phenomenon. Loosely speaking, some outliers should be tolerated if this has the benefit of substantially increasing the margin on the remaining points. We propose a new boosting algorithm which allows for the possibility of a pre-specified fraction of points lying in the margin area, or even on the wrong side of the decision boundary.
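
For reference, here is plain AdaBoost in a few lines, which makes the reweighting toward the hardest patterns explicit; the paper's proposed algorithm additionally tolerates a pre-specified fraction of margin errors, which this sketch deliberately does not reproduce.

```python
import numpy as np

def adaboost(X, y, weak_fit, n_rounds=50):
    """Plain AdaBoost.  weak_fit(X, y, w) must return a classifier h
    (callable) with h(X) in {-1, +1}, trained on sample weights w."""
    n = len(y)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(n_rounds):
        h = weak_fit(X, y, w)
        pred = h(X)
        err = w[pred != y].sum()
        if err <= 0 or err >= 0.5:             # degenerate weak learner: stop
            break
        beta = 0.5 * np.log((1 - err) / err)   # hypothesis weight
        w *= np.exp(-beta * y * pred)          # emphasize the hardest patterns
        w /= w.sum()
        ensemble.append((beta, h))
    return lambda Xq: np.sign(sum(b * h(Xq) for b, h in ensemble))
```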

15 citations


Proceedings Article
01 Jan 1999
TL;DR: With the described techniques, the recognition performance can be improved by 26% over leading existing approaches, and there is evidence that existing related methods could profit from advanced TIS (translation initiation site) recognition.

15 citations


Book ChapterDOI
01 May 1999
TL;DR: In this article, an adaptive on-line algorithm extending the learning of learning idea is proposed and theoretically motivated, which can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available.
Abstract: An adaptive on-line algorithm extending the "learning of learning" idea is proposed and theoretically motivated. Relying only on gradient flow information, it can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available. Its efficiency is demonstrated for drifting and switching non-stationary blind separation tasks of acoustic signals.

Introduction: Neural networks are powerful tools that can capture the structure in data by learning. Often the batch learning paradigm is assumed, where the learner is given all training examples simultaneously and allowed to use them as often as desired. In large practical applications, however, batch learning often proves infeasible, and on-line learning is employed instead. In the on-line learning scenario only one example is given at a time and then discarded after learning. It is thus less memory consuming, and at the same time it fits well into more natural learning, where the learner receives new information at every moment and should adapt to it, without needing a large memory for storing old data. Apart from easier feasibility and data handling, the most important advantage of on-line learning is its ability to adapt to changing environments, a quite common scenario in industrial applications where the data distribution changes gradually over time (e.g. due to wear and tear of the machines). If the learning machine does not detect and follow the change, it is impossible to learn the data properly and large generalization errors will result.
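
One way to read the "gradient flow only" idea as code: maintain a leaky average of recent stochastic gradients and let the learning rate grow while that flow stays large (drift or switch in the environment) and shrink as it vanishes (convergence). The update rule and hyperparameters below are an illustrative sketch, not the chapter's exact algorithm.

```python
import numpy as np

def adaptive_online(theta0, grad_fn, stream, eta0=0.05,
                    a=0.02, b=2.0, delta=0.1):
    """Learning-rate adaptation from gradient-flow information only.
    grad_fn(theta, x) returns a stochastic gradient for one example x;
    a, b, delta are illustrative hyperparameters."""
    theta, eta = theta0.astype(float).copy(), eta0
    r = np.zeros_like(theta)               # leaky average of the flow
    for x in stream:
        g = grad_fn(theta, x)
        theta -= eta * g                   # on-line update, one example at a time
        r = (1.0 - delta) * r + delta * g  # accumulated gradient flow
        # grow eta while the flow is large (drift), shrink near convergence
        eta += a * eta * (b * np.linalg.norm(r) - eta)
    return theta
```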

12 citations


Proceedings ArticleDOI
01 Jan 1999
TL;DR: A framework for modeling switching dynamics from a time series that allows for a fast online detection of dynamical mode changes based on a hidden Markov model (HMM) of prediction experts and an input-density estimator generated for each expert.
Abstract: We present a framework for modeling switching dynamics from a time series that allows for a fast online detection of dynamical mode changes. The method is based on a hidden Markov model (HMM) of prediction experts. The predictors are trained by expectation maximization (EM) and by using an annealing schedule for the HMM state probabilities. This leads to a segmentation of the time series into different dynamical modes and a simultaneous specialization of the prediction experts on the segments. In a second step, an input-density estimator is generated for each expert. It can simply be computed from the data subset assigned to the respective expert. In conjunction with the HMM state probabilities, this allows for a very fast online detection of mode changes: change points are detected as soon as the incoming input data stream contains sufficient information to indicate a change in the dynamics.
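
Once the experts and input-density estimators are trained, the online detection step amounts to the standard HMM filtering recursion. A sketch, where lik[m] stands for the combined evidence of the newest observation under expert m (for example, prediction-error likelihood times input density, as in the abstract):

```python
import numpy as np

def online_mode_probs(p_prev, A, lik):
    """One step of HMM filtering for on-line mode detection.
    p_prev: P(mode | data so far); A[i, j]: transition probability
    from mode i to mode j; lik[m]: likelihood of the newest
    observation under expert m."""
    p = (A.T @ p_prev) * lik
    return p / p.sum()

# a change point is flagged as soon as the argmax of the posterior switches
```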

Proceedings ArticleDOI
23 Aug 1999
TL;DR: In this paper, the mixtures of experts approach and a generalized hidden Markov model with an input-dependent transition matrix are combined to predict non-stationary dynamical systems by identifying appropriate sub-dynamics and detecting mode changes early.
Abstract: The prediction of non-stationary dynamical systems may be performed by identifying appropriate sub-dynamics and an early detection of mode changes. We present a framework which unifies the mixtures of experts approach and a generalized hidden Markov model with an input-dependent transition matrix: the hidden Markov mixtures of experts (HMME). The gating procedure incorporates state memory, information about the current location in phase space, and the previous prediction performance. The experts and the hidden Markov gating model are simultaneously trained by an EM algorithm that maximizes the likelihood during an annealing procedure. The HMME architecture allows for a fast online detection of mode changes: change points are detected as soon as the incoming input data stream contains sufficient information to indicate a change in the dynamics.
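
One plausible reading of the input-dependent transition matrix as code: modulate a base transition matrix by each expert's input density evaluated at the current point in phase space, then renormalize row-wise. This is an illustrative sketch, not the paper's exact gating procedure.

```python
import numpy as np

def gated_transitions(A0, densities, x):
    """Input-dependent transition matrix for HMM gating: scale base
    transition probabilities A0[i, j] by how well the current input x
    fits expert j's input-density model, then renormalize each row.
    densities[j](x) returns p_j(x)."""
    d = np.array([p(x) for p in densities])
    A = A0 * d[None, :]            # favor modes whose region contains x
    return A / A.sum(axis=1, keepdims=True)
```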

Journal ArticleDOI
TL;DR: Kernel algorithms in feature spaces are described as elegant and efficient methods of realizing such learning machines; their significance is underlined by a brief survey of industrial and academic applications, including ones where benchmark record results were obtained.
Abstract: This article explains new approaches and results in statistical learning theory. After an introduction, learning from examples is presented, and it is explained that, beyond accounting for the training data, the complexity of the learning machine is essential for successful learning. Kernel algorithms in feature spaces are then introduced; they constitute an elegant and efficient way of realizing various learning machines with controllable complexity through kernel functions. Examples of such algorithms are support vector machines (SVMs), which use kernel functions for estimating functions, and kernel PCA (principal component analysis), which uses kernel functions for extracting nonlinear features from data sets. Far more important than any single example, however, is the insight that every algorithm that can be formulated in terms of scalar products can be generalized nonlinearly by using kernel functions. The significance of kernel algorithms is underlined by a brief survey of industrial and academic applications, where we were able to obtain record results on important, practically relevant benchmarks.
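
The abstract's central insight, that any algorithm formulated in terms of scalar products can be made nonlinear via kernel functions, is easy to demonstrate. The kernel perceptron is perhaps the smallest example: the classical mistake-driven update touches the data only through dot products, so substituting a kernel matrix suffices (a sketch with illustrative names):

```python
import numpy as np

def kernel_perceptron(K, y, n_epochs=10):
    """Kernelized perceptron: the classical update uses the data only
    through scalar products, so replacing them by a kernel matrix makes
    the classifier nonlinear.  K[i, j] = k(x_i, x_j), y in {-1, +1}."""
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(n_epochs):
        for i in range(n):
            # decision value f(x_i) = sum_j alpha_j y_j k(x_j, x_i)
            if y[i] * ((alpha * y) @ K[:, i]) <= 0:
                alpha[i] += 1.0            # mistake-driven update
    return alpha   # classify x via sign(sum_j alpha_j y_j k(x_j, x))
```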

Proceedings ArticleDOI
01 Jan 1999
TL;DR: The results show that efficient search and retrieval in visual database systems is possible based on a normative feature description such as MPEG-7, and a search engine has been developed on the basis of this description scheme, which allows similarity-based retrieval from an image or video database.
Abstract: This paper reports about a description scheme for visual information content, which has been developed in the context of the forthcoming MPEG-7 standard. The system supports similarity-based retrieval of visual (image and video) data along feature axes like color, texture, shape/geometry and motion. The descriptors for these features have been developed in a way such that invariance against common transformations of visual material, e.g. filtering, contrast/color manipulation, resizing etc. is achieved, and that they are fitted to human perception properties. Furthermore, descriptors have been designed that allow a fast, hierarchical search procedure. A search engine has been developed on the basis of this description scheme, which allows similarity-based retrieval from an image or video database. The results show that efficient search and retrieval in visual database systems is possible based on a normative feature description such as MPEG-7.
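
Stripped to its core, similarity-based retrieval of this kind is a descriptor plus a distance ranking. The toy sketch below uses a normalized color histogram and an L1 distance purely for illustration; the MPEG-7 descriptors reported in the paper are far more elaborate and additionally invariant to common transformations such as filtering or resizing.

```python
import numpy as np

def color_histogram(img, bins=8):
    """A toy color descriptor: a normalized joint RGB histogram over an
    (h, w, 3) uint8 image (MPEG-7 defines its own, richer descriptors)."""
    h, _ = np.histogramdd(img.reshape(-1, 3),
                          bins=(bins,) * 3, range=((0, 256),) * 3)
    h = h.ravel()
    return h / h.sum()

def retrieve(query_desc, db_descs, k=5):
    """Rank database entries by L1 distance between descriptors and
    return the indices of the k most similar items."""
    d = np.abs(db_descs - query_desc).sum(axis=1)
    return np.argsort(d)[:k]
```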


Proceedings Article
01 Jan 1999
TL;DR: This work presents a unified framework of a mixtures of experts architecture and a generalized hidden Markov model (HMM) with a state-space-dependent transition matrix, which allows for fast on-line detection of mode changes in cases where the most recent input data together with the last dynamical mode contain sufficient information to indicate a dynamical change.
Abstract: The prediction of switching dynamical systems requires an identification of each individual dynamics and an early detection of mode changes. Here we present a unified framework of a mixtures of experts architecture and a generalized hidden Markov model (HMM) with a state space dependent transition matrix. The specialization of the experts in the dynamical regimes and the adaptation of the switching probabilities is performed simultaneously during the training procedure. We show that our method allows for a fast on-line detection of mode changes in cases where the most recent input data together with the last dynamical mode contain sufficient information to indicate a dynamical change.

Journal ArticleDOI
06 May 1999-Nature
TL;DR: The editor-in-chief of Nutrition, an international medical journal, who is also director of a research laboratory, found the Briefing on science and fraud most interesting, being both a producer and a consumer of science.
Abstract: Editors' responsibility in defeating fraud. Sir — As editor-in-chief of Nutrition, an international medical journal, and as director of a research laboratory, I found your Briefing on science and fraud most interesting, because I am both a producer and a consumer of science (Nature 398, 13–17; 1999). My editorial colleagues and I have a high state of awareness of 'fabrication, falsification and plagiarism (FFP)'. As reviewers of manuscripts, we have a difficult time detecting the two Fs, but allegations of the P have come to our attention several times. I believe that editors have an obligation to the scientific community to pass such concerns to the authors and to their institutes' research dean or administrative supervisor in a confidential manner for investigation according to the guidelines of the US Office of Research Integrity. In doing so, we do not act as "secret police", as the editor of the Journal of the Norwegian Medical Association maintains. Instead, we align ourselves with the UK Committee on Publication Ethics and the World Association of Medical Editors, whose recommendations are in my view appropriate. It does untold harm to the scientific community to be betrayed, deceived and defrauded. Such harm ranges from the squandering of limited research resources to the undermining of confidence and trust in the reporting of scientific findings. A journal should not be used to validate misconduct by publishing fraudulent data submitted knowingly by the author. If this occurs, editors bear an obligation to retract the paper. Our journal asks authors to sign a declaration of scientific integrity in their letter of transmittal. To avoid scientific misconduct in my laboratory, each new research fellow's attention is drawn to this potential problem via policy and procedure material given to them on arrival, and the consequences of such temptations are clearly spelled out. Each new fellow also repeats a portion of their predecessor's work to confirm the results, as an internal control standard. This has not dampened the lust for data among the 'young and hungry'. But, ultimately, solid, reliable laboratory habits and supervision and mentoring are critical components to prevent misconduct. Michael M. Meguid, Nutrition, Department of Surgery, 750 E. Adams St., Syracuse, New York 13210, USA