
Showing papers on "Statistical learning theory" published in 2001


Journal ArticleDOI
TL;DR: A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference); relations of the theory of learning to the mainstream of mathematics are emphasized.
Abstract: (1) A main theme of this report is the relationship of approximation to learning and the primary role of sampling (inductive inference). We try to emphasize relations of the theory of learning to the mainstream of mathematics. In particular, there are large roles for probability theory, for algorithms such as least squares, and for tools and ideas from linear algebra and linear analysis. An advantage of doing this is that communication is facilitated and the power of core mathematics is more easily brought to bear. We illustrate what we mean by learning theory by giving some instances. (a) The understanding of language acquisition by children or the emergence of languages in early human cultures. (b) In Manufacturing Engineering, the design of a new wave of machines is anticipated which uses sensors to sample properties of objects before, during, and after treatment. The information gathered from these samples is to be analyzed by the machine to decide how to better deal with new input objects (see [43]). (c) Pattern recognition of objects ranging from handwritten letters of the alphabet to pictures of animals, to the human voice. Understanding the laws of learning plays a large role in disciplines such as (Cognitive) Psychology, Animal Behavior, Economic Decision Making, all branches of Engineering, Computer Science, and especially the study of human thought processes (how the brain works). Mathematics has already played a big role towards the goal of giving a universal foundation of studies in these disciplines. We mention as examples the theory of Neural Networks going back to McCulloch and Pitts [25] and Minsky and Papert [27], the PAC learning of Valiant [40], Statistical Learning Theory as developed by Vapnik [42], and the use of reproducing kernels as in [17] among many other mathematical developments. We are heavily indebted to these developments. Recent discussions with a number of mathematicians have also been helpful.

1,651 citations


Proceedings ArticleDOI
21 May 2001
TL;DR: This paper gives a short introduction to some new developments related to support vector machines (SVM), a class of kernel-based techniques introduced within statistical learning theory and structural risk minimization; the approach leads to convex optimization problems, and the model complexity follows from the solution.
Abstract: Neural networks such as multilayer perceptrons and radial basis function networks have been very successful in a wide range of problems. In this paper we give a short introduction to some new developments related to support vector machines (SVM), a new class of kernel-based techniques introduced within statistical learning theory and structural risk minimization. This approach leads to convex optimization problems, and the model complexity follows from the solution. We especially focus on the least squares support vector machine formulation (LS-SVM), which makes it possible to solve highly nonlinear and noisy black-box modelling problems, even in very high dimensional input spaces. While standard SVMs have mainly been applied to static problems like classification and function estimation, LS-SVM models have been extended to recurrent models and to use in optimal control problems. Moreover, using weighted least squares and special pruning techniques, LS-SVMs can be employed for robust nonlinear estimation and sparse approximation. Applications of (LS-)SVMs to a large variety of artificial and real-life data sets indicate the huge potential of these methods.
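
To make the computational point concrete, here is a minimal NumPy sketch of LS-SVM regression: training amounts to solving a single linear system rather than a quadratic program. This is an illustrative reconstruction, not the authors' code; the RBF kernel and the parameter names (gamma, sigma) are our choices.

```python
# Minimal LS-SVM regression sketch (illustrative, not the paper's code).
# Training solves one linear system:
#   [ 0    1^T         ] [b]       [0]
#   [ 1    K + I/gamma ] [alpha] = [y]
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # Gaussian kernel from pairwise squared distances.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    n = len(y)
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[1:], sol[0]                      # dual coefficients, bias

def lssvm_predict(X_train, alpha, b, X_new, sigma=1.0):
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b
```

Because every training point receives a nonzero coefficient, sparseness is lost; the pruning techniques mentioned in the abstract are one way to recover it.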

275 citations


Journal ArticleDOI
TL;DR: In this article, it is shown that the uniform convergence of empirical means (UCEM) property holds in any problem in which the satisfaction of a performance constraint can be expressed in terms of a finite number of polynomial inequalities.

233 citations


Book ChapterDOI
Bernhard Schölkopf
01 Jan 2001
TL;DR: The main ideas of statistical learning theory, support vector machines, and kernel feature spaces are described.
Abstract: We briefly describe the main ideas of statistical learning theory, support vector machines, and kernel feature spaces.

157 citations


01 Jan 2001
TL;DR: This chapter contains sections titled: Data Representation and Similarity, A Simple Pattern Recognition Algorithm, Some Insights From Statistical Learning Theory, Hyperplane Classifiers, Support Vector Classification, Support Vector Regression, Kernel Principal Component Analysis, Empirical Results and Implementations.
Abstract: This chapter contains sections titled: Data Representation and Similarity, A Simple Pattern Recognition Algorithm, Some Insights From Statistical Learning Theory, Hyperplane Classifiers, Support Vector Classification, Support Vector Regression, Kernel Principal Component Analysis, Empirical Results and Implementations.

88 citations


Proceedings ArticleDOI
06 Aug 2001
TL;DR: Support vector regression techniques for black-box system identification are demonstrated; the theory underpinning SVR is described, and support vector methods are compared with other approaches based on radial basis networks.
Abstract: We demonstrate the use of support vector regression (SVR) techniques for black-box system identification. These methods derive from statistical learning theory, and are of great theoretical and practical interest. We describe the theory underpinning SVR, and compare support vector methods with other approaches using radial basis networks. Finally, we apply SVR to modeling the behaviour of a hydraulic robot arm, and show that SVR improves on previously published results.
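
As a hedged illustration of the black-box setting (the hydraulic robot arm data are not reproduced here), the sketch below fits an SVR model to lagged inputs and outputs of a toy nonlinear plant; the scikit-learn estimator and all hyperparameters are our assumptions, not the paper's.

```python
# SVR for black-box system identification: a sketch on synthetic data
# (the plant, lags, and hyperparameters below are illustrative choices).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 500)                   # excitation signal
y = np.zeros(500)
for t in range(2, 500):                           # toy nonlinear plant
    y[t] = 0.6 * y[t-1] - 0.2 * y[t-2] + np.tanh(u[t-1]) \
           + 0.01 * rng.standard_normal()

# NARX-style regressors: predict y[t] from lagged outputs and inputs.
X = np.column_stack([y[1:-1], y[:-2], u[1:-1]])
target = y[2:]

model = SVR(kernel="rbf", C=10.0, epsilon=0.01)
model.fit(X[:400], target[:400])
print("held-out MSE:", np.mean((model.predict(X[400:]) - target[400:]) ** 2))
```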

87 citations


Proceedings ArticleDOI
01 Jan 2001
TL;DR: Experiments with a face detection system show that combining feature reduction with hierarchical classification leads to a speed-up by a factor of 170 with similar classification performance.
Abstract: We present a two-step method to speed-up object detection systems in computer vision that use Support Vector Machines (SVMs) as classifiers. In a first step we perform feature reduction by choosing relevant image features according to a measure derived from statistical learning theory. In a second step we build a hierarchy of classifiers. On the bottom level, a simple and fast classifier analyzes the whole image and rejects large parts of the background. On the top level, a slower but more accurate classifier performs the final detection. Experiments with a face detection system show that combining feature reduction with hierarchical classification leads to a speed-up by a factor of 170 with similar classification performance.
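
The two-level structure can be sketched as follows, assuming scikit-learn classifiers and an illustrative rejection threshold; the paper's feature-reduction measure and exact classifiers are not reproduced here.

```python
# Sketch of a two-stage detection cascade (not the authors' system).
# y: 1 for object windows, 0 for background windows.
import numpy as np
from sklearn.svm import LinearSVC, SVC

def train_cascade(X, y):
    fast = LinearSVC(C=1.0).fit(X, y)            # bottom level: cheap, runs everywhere
    slow = SVC(kernel="rbf", C=1.0).fit(X, y)    # top level: accurate, runs rarely
    return fast, slow

def detect(fast, slow, windows, reject_margin=-0.5):
    # Stage 1: reject most background windows by their linear SVM score.
    scores = fast.decision_function(windows)
    survivors = windows[scores > reject_margin]
    if len(survivors) == 0:
        return np.empty(0)
    # Stage 2: final decision only on the few surviving windows.
    return slow.predict(survivors)
```

The speed-up comes from the first stage touching every window while the expensive kernel evaluations run only on the survivors.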

75 citations


Journal ArticleDOI
TL;DR: This paper uses standard bounds on empirical probabilities, as well as recent results from statistical learning theory on the VC-dimension of families of sets defined by a finite number of polynomial inequalities, to show that for each of the problems considered there exists a polynomial-time randomized algorithm that can provide a yes or no answer to arbitrarily small levels of accuracy and confidence.
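
For a single probability, the "standard bounds on empirical probabilities" are typically of the Hoeffding type; in our notation (not necessarily the paper's), estimating a probability $p$ by the empirical frequency $\hat{p}_n$ from $n$ i.i.d. samples gives:

```latex
\[
  n \;\ge\; \frac{1}{2\epsilon^{2}} \ln\frac{2}{\delta}
  \quad\Longrightarrow\quad
  \Pr\bigl(\lvert \hat{p}_n - p \rvert > \epsilon\bigr) \;\le\; \delta .
\]
```

The VC-dimension results extend such bounds uniformly over a whole family of sets, which underlies the polynomial-time randomized algorithms for any prescribed accuracy and confidence.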

72 citations


Proceedings ArticleDOI
19 Jan 2001
TL;DR: This paper presents the application to remote-sensing image classification of a new pattern recognition technique recently introduced within the framework of the Statistical Learning Theory developed by V. Vapnik and his co-workers, namely, Support Vector Machines (SVMs).
Abstract: In the last decade, the application of statistical and neural network classifiers to remote-sensing images has been deeply investigated. Therefore, the performances, characteristics, and pros and cons of such classifiers are quite well known, even to remote-sensing practitioners. In this paper, we present the application to remote-sensing image classification of a new pattern recognition technique recently introduced within the framework of the Statistical Learning Theory developed by V. Vapnik and his co-workers, namely, Support Vector Machines (SVMs). In section 1, the main theoretical foundations of SVMs are presented. In section 2, experiments carried out on a data set of multisensor remote-sensing images are described, with particular emphasis on the design and training phase of an SVM. In section 3, the experimental results are reported, together with a comparison between the performances of SVMs, neural network, and k-NN classifiers.

71 citations


Journal ArticleDOI
TL;DR: This work presents multivariate penalized least squares regression estimates using Vapnik-Chervonenkis theory, and shows strong consistency of the truncated versions of the estimates without any conditions on the underlying distribution.
Abstract: We present multivariate penalized least squares regression estimates. We use Vapnik-Chervonenkis theory (see Statistical Learning Theory, 1998) and bounds on covering numbers to analyze convergence of the estimates. We show strong consistency of the truncated versions of the estimates without any conditions on the underlying distribution.
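
In standard notation (ours, not necessarily the authors'), the objects involved are a penalized least squares estimate and its truncation at level $L$:

```latex
\[
  m_n \;=\; \arg\min_{f \in \mathcal{F}_n}
      \frac{1}{n}\sum_{i=1}^{n} \bigl(f(X_i) - Y_i\bigr)^{2}
      \;+\; \lambda_n\, J(f),
  \qquad
  (T_L f)(x) \;=\; \max\bigl\{-L,\ \min\{L,\ f(x)\}\bigr\} .
\]
```

Strong consistency is then shown for the truncated estimates $T_{L_n} m_n$, with covering-number bounds controlling the complexity of the function class.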

55 citations


Proceedings Article
03 Jan 2001
TL;DR: In contrast to standard statistical learning theory, which studies uniform bounds on the expected error, the authors present a framework that exploits the specific learning algorithm used; the main difference from previous approaches lies in the complexity measure: rather than covering all hypotheses in a given hypothesis space, it is only necessary to cover the functions which could have been learned using the fixed learning algorithm.
Abstract: In contrast to standard statistical learning theory, which studies uniform bounds on the expected error, we present a framework that exploits the specific learning algorithm used. Motivated by the luckiness framework [8], we are also able to exploit the serendipity of the training sample. The main difference from previous approaches lies in the complexity measure; rather than covering all hypotheses in a given hypothesis space, it is only necessary to cover the functions which could have been learned using the fixed learning algorithm. We show how the resulting framework relates to the VC, luckiness and compression frameworks. Finally, we present an application of this framework to the maximum margin algorithm for linear classifiers, which results in a bound that exploits both the margin and the distribution of the data in feature space.

Book ChapterDOI
16 Jul 2001
TL;DR: This paper shows that, assuming an i.i.d. data source but without any further assumptions, the problems of pattern recognition and regression can often be solved (and there are practically useful algorithms to solve them).
Abstract: Statistical learning theory considers three main problems, pattern recognition, regression and density estimation. This paper studies solvability of these problems (mainly concentrating on pattern recognition and density estimation) in the "high-dimensional" case, where the patterns in the training and test sets are never repeated. We show that, assuming an i.i.d. data source but without any further assumptions, the problems of pattern recognition and regression can often be solved (and there are practically useful algorithms to solve them). On the other hand, the problem of density estimation, as we formalize it, cannot be solved under the general i.i.d. assumption, and additional assumptions are required.

Book ChapterDOI
02 Jul 2001
TL;DR: Numerical results for different classifiers on a benchmark data set of handwritten digits are presented, and binary trees of SVMs are considered to solve the multi-class pattern recognition problem.
Abstract: Support vector machines (SVM) are learning algorithms derived from statistical learning theory. The SVM approach was originally developed for binary classification problems. In this paper SVM architectures for multi-class classification problems are discussed; in particular, we consider binary trees of SVMs to solve the multi-class pattern recognition problem. Numerical results for different classifiers on a benchmark data set of handwritten digits are presented.
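
A sketch of the tree idea, under assumptions of ours: scikit-learn SVMs and a naive halving of the class set (the paper may partition the classes differently).

```python
# Illustrative binary tree of SVMs for K classes (a sketch, not the paper's code).
import numpy as np
from sklearn.svm import SVC

class Node:
    def __init__(self, classes):
        self.classes, self.svm, self.left, self.right = list(classes), None, None, None

def build_tree(X, y, classes):
    node = Node(classes)
    if len(classes) == 1:
        return node                                # leaf: one class remains
    left, right = classes[: len(classes) // 2], classes[len(classes) // 2:]
    keep = np.isin(y, classes)                     # restrict to classes at this node
    node.svm = SVC(kernel="rbf").fit(X[keep], np.isin(y[keep], left).astype(int))
    node.left, node.right = build_tree(X, y, left), build_tree(X, y, right)
    return node

def predict(node, x):
    while len(node.classes) > 1:                   # each SVM halves the candidate set
        node = node.left if node.svm.predict(x.reshape(1, -1))[0] == 1 else node.right
    return node.classes[0]

# Usage: root = build_tree(X, y, sorted(set(y))); label = predict(root, x_new)
```

At test time only about log2(K) binary SVMs are evaluated per sample, versus K for a one-vs-rest scheme.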

Proceedings Article
01 Jan 2001
TL;DR: This paper addresses the issue of using the Support Vector Learning technique in combination with the currently well performing GMM models, in order to improve speaker verification results.
Abstract: The current best performing speaker recognition algorithms are based on Gaussian Mixture Models (GMM). Their results are not satisfactory for all experimental conditions, especially for mismatched (train/test) conditions. The Support Vector Machine is a new and very promising technique in statistical learning theory. Recently, this technique produced very interesting results in image processing [2], [3], [4] and for the fusion of experts in biometric authentication [5]. In this paper we address the issue of using the Support Vector Learning technique in combination with the currently well performing GMM models, in order to improve speaker verification results.

Journal ArticleDOI
TL;DR: A property from statistical learning theory known as uniform convergence of empirical means (UCEM) plays an important role in constructing efficient randomized algorithms for a wide variety of controller synthesis problems: whenever the UCEM property holds, there exists an efficient (i.e., polynomial-time) randomized algorithm.

Proceedings ArticleDOI
29 Oct 2001
TL;DR: This paper studies the speaker identification and verification problem using a support vector machine, and presents an SVM training method for large-scale speech samples.
Abstract: The support vector machine (SVM) is an important learning method of statistical learning theory, and is also a powerful tool for pattern recognition problems. This paper studies the speaker identification and verification problem using a support vector machine, and presents an SVM training method for large-scale speech samples. A text-independent speaker recognition system based on SVM was implemented, and the results show good performance.

Journal Article
TL;DR: SVM architectures for multi-class classification problems are discussed; in particular, binary trees of SVMs are considered to solve the multi-class problem.
Abstract: Support vector machines (SVM) are learning algorithms derived from statistical learning theory. The SVM approach was originally developed for binary classification problems. In this paper SVM architectures for multi-class classification problems are discussed, in particular we consider binary trees of SVMs to solve the multi-class problem. Numerical results for different classifiers on a benchmark data set of handwritten digits are presented.

Journal ArticleDOI
TL;DR: A new system identification method based on SVR is proposed for linear-in-parameters models, and the effectiveness of the proposed method is examined through numerical examples.

Proceedings Article
01 Jan 2001
TL;DR: The authors discuss the statistical theory underlying various parameter-estimation methods, give algorithms which depend on alternatives to (smoothed) maximum-likelihood estimation, and show how important concepts from the classification literature - specifically, generalization results based on margins on training data - can be derived for parsing models.
Abstract: A fundamental problem in statistical parsing is the choice of criteria and algorithms used to estimate the parameters in a model. The predominant approach in computational linguistics has been to use a parametric model with some variant of maximum-likelihood estimation. The assumptions under which maximum-likelihood estimation is justified are arguably quite strong. This chapter discusses the statistical theory underlying various parameter-estimation methods, and gives algorithms which depend on alternatives to (smoothed) maximum-likelihood estimation. We first give an overview of results from statistical learning theory. We then show how important concepts from the classification literature - specifically, generalization results based on margins on training data - can be derived for parsing models. Finally, we describe parameter estimation algorithms which are motivated by these generalization bounds.
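
In our notation (not necessarily the chapter's), for a linear model with parameter vector $w$ and feature map $\Phi$, the margin on a training sentence $s_i$ with correct parse $t_i$ is the gap to the best competing parse:

```latex
\[
  \gamma_i \;=\; \langle w, \Phi(s_i, t_i) \rangle
     \;-\; \max_{t \neq t_i} \,\langle w, \Phi(s_i, t) \rangle .
\]
```

Generalization bounds of the kind discussed improve as these margins grow, which motivates parameter estimation algorithms that explicitly maximize them.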

Book ChapterDOI
04 Dec 2001
TL;DR: A result is derived showing that, in the case of systems with fading memory, it is possible to combine standard results in statistical learning theory with some fading memory arguments to obtain finite time estimates of the desired kind.
Abstract: The problem of system identification is formulated as a problem in statistical learning theory, because statistical learning theory is devoted to the derivation of finite time estimates. If system identification is to be combined with robust control theory to develop a sound theory of indirect adaptive control, it is essential to have finite time estimates of the sort provided by statistical learning theory. As an illustration of the approach, a result is derived showing that in the case of systems with fading memory, it is possible to combine standard results in statistical learning theory (suitably modified to the present situation) with some fading memory arguments to obtain finite time estimates of the desired kind. In the case of linear systems, the results proved here are not overly conservative, but are more so in the case of nonlinear systems where the adjustable parameters enter linearly into the model description. Though the actual results derived here are rather preliminary in nature, it is hoped that future researchers will pursue the ideas presented here to extend the theory further.

Book ChapterDOI
12 Sep 2001
TL;DR: In this paper, SVM architectures for multi-class classification problems are discussed; in particular, binary trees of SVMs are used to solve the multi-class problem, and numerical results for different classifiers on a benchmark data set of handwritten digits are presented.
Abstract: Support vector machines (SVM) are learning algorithms derived from statistical learning theory. The SVM approach was originally developed for binary classification problems. In this paper SVM architectures for multi-class classification problems are discussed; in particular, we consider binary trees of SVMs to solve the multi-class problem. Numerical results for different classifiers on a benchmark data set of handwritten digits are presented.

Proceedings ArticleDOI
15 Oct 2001
TL;DR: This work investigates the ability of SVMs to perform appearance-based object recognition; initial experiments indicate that this may be a promising approach to the problem.
Abstract: Support vector machines (SVM) are a class of algorithms derived from statistical learning theory that is receiving growing interest from the computer vision community, as they present some advantages over classical techniques. This work investigates the ability of SVMs to perform appearance-based object recognition. Initial experiments indicate that this may be a promising approach to the problem.

01 Jan 2001
TL;DR: This chapter contains sections titled: Introduction, The Law of Large Numbers, When Does Learning Work: the Question of Consistency, Uniform Convergence and Consistency, How to Derive a VC Bound, A Model Selection Example, Summary, and Problems.
Abstract: This chapter contains sections titled: Introduction, The Law of Large Numbers, When Does Learning Work: the Question of Consistency, Uniform Convergence and Consistency, How to Derive a VC Bound, A Model Selection Example, Summary, Problems.
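
One standard form of the VC bound such a chapter derives (constants vary across presentations): with probability at least $1-\delta$ over an i.i.d. sample of size $n$, every function $f$ in a class of VC dimension $h$ satisfies

```latex
\[
  R(f) \;\le\; R_{\mathrm{emp}}(f)
     \;+\; \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) + \ln\frac{4}{\delta}}{n}} ,
\]
```

where $R$ is the expected risk and $R_{\mathrm{emp}}$ the empirical risk; model selection then trades the two terms against each other.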

Journal Article
TL;DR: An efficient statistical algorithm is used to design a robust, fixed-structure controller for a high-speed communication network with multiple uncertain propagation delays.
Abstract: Congestion control in the ABR class of ATM networks presents interesting challenges due to the presence of multiple uncertain delays. Recently, probabilistic methods and statistical learning theory have been shown to provide approximate solutions to challenging control problems. In this paper, using some recent results by the authors, an efficient statistical algorithm is used to design a robust, fixed-structure controller for a high-speed communication network with multiple uncertain propagation delays.

Proceedings ArticleDOI
21 Sep 2001
TL;DR: The problem of pattern recognition is formulated as a classification problem in statistical learning theory; Vapnik constructed a class of learning algorithms called the support vector machine (SVM) to solve it, but the algorithm still has some drawbacks.
Abstract: The problem of pattern recognition is formulated as a classification problem in statistical learning theory. Vapnik constructed a class of learning algorithms called the support vector machine (SVM) to solve the problem. The algorithm not only has a strong theoretical foundation but also provides a powerful tool for solving real-life problems. But it still has some drawbacks. Two of them are: 1) the computational complexity of finding the optimal separating hyperplane is quite high in the linearly separable case, and 2) in the linearly non-separable case, for any given sample set it is hard to choose a proper nonlinear mapping (kernel function) such that the sample set is linearly separable in the new space after the mapping. To overcome these drawbacks, we present some new approaches. The main idea and some experimental results of the approaches are presented.
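
The complexity drawback refers to the training problem itself; in its standard dual form (not specific to this paper), one must solve a quadratic program in $n$ variables:

```latex
\[
  \max_{\alpha}\ \sum_{i=1}^{n} \alpha_i
     \;-\; \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j\, y_i y_j\, K(x_i, x_j)
  \quad \text{s.t.} \quad
  0 \le \alpha_i \le C, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 .
\]
```

The cost of this program grows quickly with the sample size, and the choice of the kernel $K$ is exactly the second drawback the abstract lists.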

Proceedings ArticleDOI
06 May 2001
TL;DR: Pattern recognition is covered, including Vapnik-Chervonenkis (VC) theory and the implications for support vector machines, neural networks and decision trees, and large margin classification is analysed.
Abstract: The article applies statistical learning theory to the supervised learning problem. Pattern recognition is covered, including Vapnik-Chervonenkis (VC) theory and the implications for support vector machines (SVMs), neural networks and decision trees. Real-valued prediction is covered in terms of scale-sensitive dimensions. The article concludes by analysing large margin classification.

Journal Article
TL;DR: This paper is a tutorial in which the basic concepts of VC theory and the methodology of SVMs as applied to pattern recognition problems are reviewed.
Abstract: In the field of statistical pattern recognition, optimal classifiers may be designed theoretically based on the Bayesian decision rule; however, implementing such a design requires first solving the more difficult problem of density estimation. The strategy adopted in BP neural networks is to learn directly from the measurement data (training samples), which is more efficient and effective. The methodology of neural networks has therefore been widely used in real-life applications, but like other heuristic methods, it lacks a solid theoretical foundation to direct engineering practice. As the result of a breakthrough in the research of statistical inference, VC theory has been established and accepted as the modern statistical learning theory. The behavior of neural networks may be explained by VC theory with mathematical rigor; in addition, a more powerful learning method, the support vector machine, has been constructed based on the theory and has gained real-life applications. This paper is a tutorial in which the basic concepts of VC theory and the methodology of SVMs as applied to pattern recognition problems are reviewed.

Proceedings ArticleDOI
Ying Li, Licheng Jiao
20 Sep 2001
TL;DR: This paper analyzes support vector machines (SVMs) and several commonly used soft computing paradigms for pattern recognition, including neural and wavelet networks and fuzzy systems, and tries to outline the similarities and differences among them.
Abstract: This paper analyzes support vector machines (SVMs) and several commonly used soft computing paradigms for pattern recognition, including neural and wavelet networks, fuzzy systems, Bayesian classifiers, fuzzy partitions, etc., and tries to outline the similarities and differences among them. Support vector machines provide a new approach to the problem of pattern recognition with clear connections to the underlying statistical learning theory. We try to bring SVMs into the framework of the unification paradigm called the weighted radial basis function paradigm. Unifying different classes of methods has enormous advantages, such as the ability to merge all such techniques within the same system. It is hoped that this paper will provide theoretical guidance for the study and application of support vector machines and soft computing paradigms.

Proceedings ArticleDOI
07 Oct 2001
TL;DR: It is shown that the regularized solution can be derived from the Fourier transformation operator in the transform domain and, in an equivalent form, from linear differential operators in the spatial domain.
Abstract: The paper provides a new viewpoint on regularization theory, examined from several perspectives. It is shown that the regularized solution can be derived from the Fourier transformation operator in the transform domain and, in an equivalent form, from a linear differential operator in the spatial domain. The state-of-the-art research in regularization is briefly reviewed, with extended discussions on Occam's razor, minimum description length, the Bayesian framework, pruning algorithms, statistical learning theory, and equivalent regularization.
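
The equivalence in question is the standard one from regularization theory (notation ours): a data term plus a smoothness penalty expressed either through a linear differential operator $P$ in the spatial domain or through a weighting function $\tilde{G}$ in the Fourier domain,

```latex
\[
  \min_{f}\ \sum_{i=1}^{n} \bigl(y_i - f(x_i)\bigr)^{2}
     + \lambda \lVert P f \rVert^{2},
  \qquad
  \lVert P f \rVert^{2}
     = \int \frac{\lvert \tilde{f}(\omega) \rvert^{2}}{\tilde{G}(\omega)}\, d\omega .
\]
```

The minimizer is a kernel expansion $f(x) = \sum_i c_i\, G(x - x_i)$, with $G$ the Green's function of $P^{*}P$, which is how the spatial and transform-domain views coincide.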

Proceedings ArticleDOI
07 Oct 2001
TL;DR: This paper presents a qualitative introduction to, and justification of, the application of statistical learning theory to uncertainty modeling in business and engineering systems, and defines the main variables that govern the uncertainty in a physical system.
Abstract: Presents a qualitative introduction and justification of the application of statistical learning theory to uncertainty modeling in business and engineering systems. Using simple mathematical tools and metaphorical images, the main variables that govern the uncertainty in a physical system are defined. A general expression of uncertainty models is then obtained. The structure of this expression is the same as that of the uncertainty models that have been developed by rigorously applying the results of statistical learning theory.