Showing papers on "Statistical learning theory published in 1999"


Journal ArticleDOI
Vladimir Vapnik
TL;DR: Demonstrates how abstract learning theory established conditions for generalization that are more general than those discussed in classical statistical paradigms, and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.
Abstract: Statistical learning theory was introduced in the late 1960s. Until the 1990s it was a purely theoretical analysis of the problem of function estimation from a given collection of data. In the mid-1990s new types of learning algorithms (called support vector machines) based on the developed theory were proposed. This made statistical learning theory not only a tool for the theoretical analysis but also a tool for creating practical algorithms for estimating multidimensional functions. This article presents a very general overview of statistical learning theory including both theoretical and algorithmic aspects of the theory. The goal of this overview is to demonstrate how the abstract learning theory established conditions for generalization which are more general than those discussed in classical statistical paradigms and how the understanding of these conditions inspired new algorithmic approaches to function estimation problems.

5,370 citations
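The "conditions for generalization" referenced here are uniform bounds relating empirical and expected risk through the VC dimension. One widely quoted form of the bound for 0/1 loss is sketched below; exact constants vary across statements of the theorem, so treat this as representative rather than the paper's precise result.

```latex
% With probability at least 1 - \eta over an i.i.d. sample of size n,
% simultaneously for all functions f in a class of VC dimension h:
R(f) \;\le\; R_{\mathrm{emp}}(f)
  \;+\; \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

Structural risk minimization then picks the element of a nested family of classes that minimizes this bound rather than the empirical risk alone.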


Proceedings Article
01 Jan 1999
TL;DR: An iterative training algorithm for LS-SVMs based on a conjugate gradient method, which enables solving large-scale classification problems, illustrated on a multi two-spiral benchmark problem.
Abstract: Support vector machines (SVMs) have been introduced in the literature as a method for pattern recognition and function estimation, within the framework of statistical learning theory and structural risk minimization. A least squares version (LS-SVM) has recently been reported which expresses the training in terms of solving a set of linear equations instead of quadratic programming as for the standard SVM case. In this paper we present an iterative training algorithm for LS-SVMs which is based on a conjugate gradient method. This enables the solution of large-scale classification problems, as illustrated on a multi two-spiral benchmark problem. Keywords: support vector machines, classification, neural networks, RBF kernels, conjugate gradient method.

258 citations
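For readers unfamiliar with the least squares formulation, training reduces to one symmetric linear system. The NumPy sketch below solves it with a dense direct solve for clarity; the paper's contribution is to replace that solve with conjugate gradient iterations (after transforming the indefinite system into a positive definite one), so the kernel choice, parameter names (gamma, sigma), and the solver here are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    # Gaussian RBF kernel: K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM KKT system
        [ 0   y^T              ] [b]       [0]
        [ y   Omega + I/gamma  ] [alpha] = [1]
    with Omega[i, j] = y_i y_j K(x_i, x_j)."""
    n = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)   # stand-in for the paper's CG iterations
    return sol[1:], sol[0]          # alpha, b

def lssvm_predict(Xtrain, y, alpha, b, Xtest, sigma=1.0):
    # Decision function: sign(sum_i alpha_i y_i K(x, x_i) + b)
    return np.sign(rbf_kernel(Xtest, Xtrain, sigma) @ (alpha * y) + b)
```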


Proceedings ArticleDOI
15 Mar 1999
TL;DR: This paper explores the issues involved in applying SVMs to phonetic classification as a first step to speech recognition and presents results on several standard vowel and phonetic classification tasks, showing better performance than Gaussian mixture classifiers.
Abstract: Support vector machines (SVMs) represent a new approach to pattern classification which has attracted a great deal of interest in the machine learning community. Their appeal lies in their strong connection to the underlying statistical learning theory, in particular the theory of structural risk minimization. SVMs have been shown to be particularly successful in fields such as image identification and face recognition; in many problems SVM classifiers have been shown to perform much better than other nonlinear classifiers such as artificial neural networks and k-nearest neighbors. This paper explores the issues involved in applying SVMs to phonetic classification as a first step to speech recognition. We present results on several standard vowel and phonetic classification tasks and show better performance than Gaussian mixture classifiers. We also present an analysis of the difficulties we foresee in applying SVMs to continuous speech recognition problems.

184 citations
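The flavor of the comparison can be reproduced on synthetic data; everything below (the dataset, mixture component counts, SVM hyperparameters) is a placeholder rather than the paper's vowel or phone features.

```python
import numpy as np
from sklearn.datasets import make_classification   # stand-in for vowel data
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, n_clusters_per_class=2, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# RBF-kernel SVM (sklearn handles multiclass via one-vs-one internally)
svm = SVC(kernel="rbf", C=10.0, gamma="scale").fit(Xtr, ytr)

# Class-conditional Gaussian mixtures; classify by maximum likelihood
classes = np.unique(ytr)
gmms = [GaussianMixture(n_components=2, random_state=0).fit(Xtr[ytr == c])
        for c in classes]
loglik = np.column_stack([g.score_samples(Xte) for g in gmms])
gmm_pred = classes[loglik.argmax(axis=1)]

print("SVM accuracy:", svm.score(Xte, yte))
print("GMM accuracy:", (gmm_pred == yte).mean())
```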


Proceedings ArticleDOI
13 Aug 1999
TL;DR: The purpose of this paper is to introduce the concepts of SVM and to benchmark its performance on the Moving and Stationary Target Acquisition and Recognition (MSTAR) data set.
Abstract: Support vector machines (SVM) are one of the most recent tools to be developed from research in statistical learning theory. The foundations of SVM were developed by Vapnik, and are gaining popularity within the learning theory community due to many attractive features and excellent demonstrated performance. However, SVM have not yet gained popularity within the synthetic aperture radar (SAR) automatic target recognition (ATR) community. The purpose of this paper is to introduce the concepts of SVM and to benchmark its performance on the Moving and Stationary Target Acquisition and Recognition (MSTAR) data set.

49 citations


Proceedings ArticleDOI
10 Jul 1999
TL;DR: A support vector decision tree method for customer targeting in the framework of large databases (database marketing) is introduced to provide a tool to identify the best customers based on historical data.
Abstract: We introduce a support vector decision tree method for customer targeting in the framework of large databases (database marketing). The goal is to provide a tool to identify the best customers based on historical data. This tool is then used to forecast the best potential customers among a pool of prospects. We begin by recursively constructing a decision tree. Each decision consists of a linear combination of independent attributes. A linear program motivated by the support vector machine method from Vapnik's statistical learning theory is used to construct each decision. This linear program automatically selects the relevant subset of attributes for each decision. Each customer is scored based on the decision tree. A gain chart table is used to verify the goodness-of-fit of the targeting, to determine the likely prospects and the expected utility or profit. Successful results are given for three industrial problems.

40 citations
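Each split can be understood as a 1-norm soft-margin separator solvable by linear programming. The sketch below shows that construction with scipy; the paper's actual objective and attribute-selection details may differ, and C is an assumed trade-off parameter. The 1-norm on w is what drives many coefficients to zero, giving the automatic attribute selection mentioned above.

```python
import numpy as np
from scipy.optimize import linprog

def l1_svm_split(X, y, C=1.0):
    """1-norm soft-margin separator:
        min ||w||_1 + C * sum(xi)
        s.t. y_i (w . x_i + b) >= 1 - xi_i,  xi >= 0,
    linearized with w = u - v, u >= 0, v >= 0."""
    n, d = X.shape
    # variable vector z = [u (d), v (d), b (1), xi (n)]
    c = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])
    Yx = y[:, None] * X
    A_ub = np.hstack([-Yx, Yx, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    w = res.x[:d] - res.x[d:2 * d]
    return w, res.x[2 * d]          # sparse weight vector and intercept
```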


Proceedings ArticleDOI
21 Jun 1999
TL;DR: A new method is presented based on the structural risk minimization principle of statistical learning theory, which makes it possible to exploit knowledge about the topology of the pattern space for recognition of video sequences of action patterns.
Abstract: The linear combination of prototypical views has been shown to provide a powerful method for the recognition and analysis of images of three-dimensional stationary objects. We present preliminary results on an extension of this idea to video sequences. For this extension, the computation of correspondences in space-time turns out to be the central theoretical problem, which we solve with a new correspondence algorithm. Using simulated images of biological motion we demonstrate the usefulness of the superposition of prototypical sequences for the synthesis of new video sequences, and for the analysis and recognition of actions. Our method makes it possible to impose a topology over the space of video sequences of action patterns. This topology is more complicated than a linear space. We present a new method that is based on the structural risk minimization principle of statistical learning theory, which makes it possible to exploit this knowledge about the topology of the pattern space for recognition.

32 citations


Book ChapterDOI
06 Dec 1999
TL;DR: It is shown that the expectation of the generalization error in the unidentifiable cases is larger than what is given by the usual asymptotic theory, and dependent on the rank of the target function.
Abstract: The statistical asymptotic theory is often used in theoretical results in computational and statistical learning theory. It describes the limiting distribution of the maximum likelihood estimator (MLE) as a normal distribution. However, in layered models such as neural networks, the regularity condition of the asymptotic theory is not necessarily satisfied. The true parameter is not identifiable if the target function can be realized by a network of smaller size than the size of the model. Little is known about the behavior of the MLE in these cases of neural networks. In this paper, we analyze the expectation of the generalization error of three-layer linear neural networks, and elucidate a strange behavior in unidentifiable cases. We show that the expectation of the generalization error in the unidentifiable cases is larger than what is given by the usual asymptotic theory, and depends on the rank of the target function.

24 citations
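As a reference point for the result above: for a regular (identifiable) model with d free parameters, classical asymptotic theory gives the following well-known expansion of the expected generalization error of the MLE, measured as a Kullback-Leibler divergence. This is the standard baseline, not the paper's result; the paper's point is that the unidentifiable case can exceed it.

```latex
% Regular case, n i.i.d. samples, d free parameters:
\mathbb{E}\left[\,G_n\,\right] \;=\; \frac{d}{2n} + o\!\left(\frac{1}{n}\right),
\qquad
G_n \;=\; \mathrm{KL}\!\left(p_{\mathrm{true}} \,\middle\|\, p_{\hat\theta_{\mathrm{MLE}}}\right)
```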


Book ChapterDOI
29 Mar 1999
TL;DR: It is shown that the 2ε² term under the exponential of the deviation bound can be replaced by the corresponding Cramér transform, as given by large deviations theorems, and it is explained why these theoretical results can lead to practical estimates of the effective VC dimension of learning structures.
Abstract: Vapnik-Chervonenkis (VC) bounds play an important role in statistical learning theory as they are the fundamental result which explains the generalization ability of learning machines. There has been considerable mathematical work over the years on improving the VC rates of convergence of empirical means to their expectations. The result obtained by Talagrand in 1994 seems to provide more or less the final word on this issue as far as universal bounds are concerned. For fixed distributions, however, this bound can be outperformed in practice. We show indeed that it is possible to replace the 2ε² under the exponential of the deviation term by the corresponding Cramér transform, as given by large deviations theorems. Then, we formulate rigorous distribution-sensitive VC bounds and we also explain why these theoretical results on such bounds can lead to practical estimates of the effective VC dimension of learning structures.

24 citations
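To make the "2ε² under the exponential" concrete: universal VC-type bounds inherit a Hoeffding deviation term, while for a fixed distribution the large deviations rate function (Cramér transform) is never worse and is usually strictly better.

```latex
% Universal (Hoeffding-type) deviation term for variables in [0, 1]:
P\!\left(\tfrac{1}{n}\textstyle\sum_{i=1}^{n} X_i - \mathbb{E}X > \epsilon\right)
  \;\le\; e^{-2 n \epsilon^{2}}
% Distribution-sensitive (Chernoff/Cram\'er) refinement:
P\!\left(\tfrac{1}{n}\textstyle\sum_{i=1}^{n} X_i - \mathbb{E}X > \epsilon\right)
  \;\le\; e^{-n\,\Lambda^{*}(\epsilon)},
\qquad
\Lambda^{*}(\epsilon) \;=\; \sup_{\lambda > 0}
  \left[\lambda \epsilon - \ln \mathbb{E}\, e^{\lambda (X - \mathbb{E}X)}\right]
```

For [0,1]-valued variables Λ*(ε) ≥ 2ε², which is exactly the sense in which the distribution-sensitive bound can outperform the universal one.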


Proceedings ArticleDOI
27 Sep 1999
TL;DR: The very good results obtained on real image sequences indicate that SVM can be profitably used for the construction of flexible and effective systems based on computer vision.
Abstract: Support vector machines (SVM) have been recently introduced as techniques for solving pattern recognition and regression estimation problems. SVM are derived within the framework of statistical learning theory and combine a solid theoretical foundation with very good performance in several applications. In this paper we describe a system able to detect, represent, and recognize visual dynamic events from an image sequence. While the events are initially detected by means of low-level visual processing, both the representation and recognition stages are performed with SVM. Therefore, the system is trained, instead of programmed, to perform the required tasks. The very good results obtained on real image sequences indicate that SVM can be profitably used for the construction of flexible and effective systems based on computer vision.

22 citations


01 Jan 1999
TL;DR: A variational Bayesian model selection algorithm for general normalized loss functions has a wider applicability than other previously suggested Bayesian techniques and exhibits comparable performance in cases where both techniques are applicable.
Abstract: We present a common probabilistic framework for kernel or spline smoothing methods, including popular architectures such as Gaussian processes and Support Vector machines. We identify the problem of unnormalized loss functions and suggest a general technique to overcome this problem at least approximately. We give an intuitive interpretation of the effect an unnormalized loss function can induce, by comparing Support Vector classification (SVC) with Gaussian process classification (GPC) as a nonparametric generalization of logistic regression. This interpretation relates SVC to boosting techniques. We propose a variational Bayesian model selection algorithm for general normalized loss functions. This algorithm has a wider applicability than other previously suggested Bayesian techniques and exhibits comparable performance in cases where both techniques are applicable. We present and discuss results of a substantial number of experiments in which we applied the variational algorithm to common real-world classification tasks and compared it to a range of other known methods. The wider scope of this thesis is to provide a bridge between the fields of probabilistic Bayesian techniques and Statistical Learning Theory, and we present some material of tutorial nature which we hope will be useful to researchers of both fields.

22 citations
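The "unnormalized loss" problem can be stated in one line by viewing both classifiers through negative log-likelihoods; this is a standard comparison rather than the thesis's exact notation. For a label y in {-1, +1} and latent value f = f(x):

```latex
g_{\mathrm{GPC}}(y f) \;=\; \ln\!\left(1 + e^{-y f}\right)
  \quad\Rightarrow\quad e^{-g(f)} + e^{-g(-f)} = 1
  \;\;\text{(a proper likelihood)}
g_{\mathrm{SVC}}(y f) \;=\; \max\!\left(0,\; 1 - y f\right)
  \quad\Rightarrow\quad e^{-g(f)} + e^{-g(-f)} \neq 1
  \;\;\text{(the hinge loss does not normalize)}
```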


Proceedings ArticleDOI
24 Oct 1999
TL;DR: This work implements SVM as receivers in CDMA systems and compares SVM with traditional and adaptive receivers, and shows that a linear SVM converges to the MMSE receiver in the noiseless case.
Abstract: We apply support vector machines (SVM) or optimal margin classifiers to multiuser detection problems. SVM are well suited for multiuser detection problems as they are based on principles of statistical learning theory where the goal is to construct a maximum margin classifier. We show that a linear SVM converges to the MMSE receiver in the noiseless case. The SVM are also modified to construct nonlinear receivers by using kernel functions and they approximate optimal nonlinear multiuser detection receivers. Using the sequential minimal optimization (SMO) algorithm, we implement SVM as receivers in CDMA systems and compare SVM with traditional and adaptive receivers. The simulation performance of SVM compares favorably to these receivers.
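The convergence claim can be checked numerically. The sketch below sets up a synchronous CDMA model with invented spreading codes, amplitudes, and noise level, trains a linear SVM to detect user 1's bit, and compares its weight direction with the closed-form MMSE receiver; none of these simulation choices come from the paper.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
L, K, n = 8, 3, 2000                      # spreading gain, users, symbols
S = rng.choice([-1.0, 1.0], size=(L, K))  # binary spreading codes
A = np.array([1.0, 0.8, 0.6])             # received amplitudes
B = rng.choice([-1, 1], size=(n, K))      # transmitted bits
sigma = 0.1
R = (B * A) @ S.T + sigma * rng.standard_normal((n, L))   # received chips

# Closed-form MMSE receiver for user 1:
#   w = (S diag(A^2) S^T + sigma^2 I)^{-1} s_1 A_1
C = S @ np.diag(A ** 2) @ S.T + sigma ** 2 * np.eye(L)
w_mmse = np.linalg.solve(C, A[0] * S[:, 0])

# Linear SVM trained on received vectors, labeled by user 1's bits
w_svm = LinearSVC(C=100.0, max_iter=50000).fit(R, B[:, 0]).coef_.ravel()

cos = w_svm @ w_mmse / (np.linalg.norm(w_svm) * np.linalg.norm(w_mmse))
print(f"cosine similarity between SVM and MMSE directions: {cos:.4f}")
```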

01 Jan 1999
TL;DR: In the present study, SVMs were applied to real case studies with spatial data and compared with geostatistical methods such as indicator kriging; it was shown that they are efficient and work well in many applications.
Abstract: The report deals with a first application of Support Vector Machines to environmental spatial data classification. The simplest problem of classification is considered: using original data, develop a model for classifying regions as below or above some predefined level of contamination. Thus, we pose the problem as a pattern recognition task. The report presents 1) a short description of Support Vector Machines (SVM) and 2) an application of the SVM to spatial (environmental and pollution) data analysis and modelling. SVM are based on the developments of V. Vapnik's Statistical Learning Theory [1]. The ideas of SVM are very attractive both for research and applications. It has been shown that they are efficient and work well in many applications. In the present study SVM were applied to real case studies with spatial data and compared with geostatistical methods like indicator kriging. SVMs with different kernels were applied (radial basis functions - RBF, polynomial kernels, hyperbolic tangents). The basic results have been obtained with local RBF kernels. It was shown that the optimal kernel bandwidth can be chosen by minimising the testing error. Real data on sediment pollution in Lake Geneva were used.
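The bandwidth-selection step generalizes directly to standard tooling: scan the RBF width and keep the value that minimizes held-out error. The data below are synthetic placeholders for the sediment measurements, and cross-validation stands in for the report's train/test split.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Hypothetical stand-in for the spatial data: X holds 2-D coordinates,
# y indicates contamination above a predefined threshold.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(400, 2))
y = np.sin(X[:, 0]) + np.cos(X[:, 1]) + 0.3 * rng.standard_normal(400) > 0

Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=1)

# sklearn parameterizes the RBF kernel by gamma = 1 / (2 sigma^2),
# so scanning gamma scans the kernel bandwidth
grid = GridSearchCV(SVC(kernel="rbf", C=10.0),
                    {"gamma": np.logspace(-2, 2, 9)}, cv=5).fit(Xtr, ytr)
print("selected gamma:", grid.best_params_["gamma"])
print("held-out accuracy:", grid.score(Xte, yte))
```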

Book ChapterDOI
Se June Hong, Sholom M. Weiss
16 Sep 1999
TL;DR: Some theoretical developments in PAC learning and statistical learning theory leading to the emergence of support vector machines are discussed and some technical advances made in enhancing the performance of the models both in accuracy and scalability through distributed model generation are examined.
Abstract: Predictive models have been widely used long before the development of the new field that we call data mining. Expanding application demand for data mining of ever increasing data warehouses, and the need for understandability of predictive models with increased accuracy of prediction, all have fueled recent advances in automated predictive methods. We first examine a few successful application areas and the technical challenges they present. We discuss some theoretical developments in PAC learning and statistical learning theory leading to the emergence of support vector machines. We then examine some technical advances made in enhancing the performance of the models both in accuracy (boosting, bagging, stacking) and scalability of modeling through distributed model generation. Relatively new techniques for selecting good feature variables, feature discretization, generating probabilistic models, and the use of practical measures for performance will also be discussed.

01 Jan 1999
TL;DR: The approach is based on Vapnik's statistical learning theory (SLT), an important recent theoretical framework for learning with finite samples; VC-based model selection is proposed for SVMs, a new type of universal learning machine based on VC theory.
Abstract: In recent years, there has been an explosive growth of methods for predictive learning in the fields of engineering, computer science and statistics. The goal of predictive learning is to estimate (or learn) dependencies from (known) training data, in order to accurately predict (unknown) future data originating from the same (unknown) distribution. Model selection is the task of choosing a model of optimal complexity for the given data. It is one of the major issues in predictive learning with finite samples. This research work is focused on the fundamental issues as well as practical applications of model selection. Our approach is based on Vapnik's statistical learning theory (SLT), an important recent theoretical framework for learning with finite samples. Important contributions of this research include: VC-based model selection. We have studied the practical feasibility of VC-based model selection (complexity control). We have compared VC generalization bounds for model selection for linear and penalized linear estimators with classical model selection methods. We have also proposed a new method for wavelet signal denoising based on VC theory. Measuring the VC dimension. We have proposed an optimized experimental design technique for accurate estimation of the VC dimension, which is the measure of model complexity in VC theory. We have estimated the VC dimension for linear and penalized linear estimators using the proposed optimized procedure. We have also developed a method for estimating the VC dimension of Constrained Topological Mapping (CTM) networks. Support Vector Machine (SVM). SVM is a new type of universal learning machine based on VC theory. We have proposed an important extension to SVM: the Multiresolution Support Vector Machine methods. We have also proposed VC-based model selection for SVM.

Book ChapterDOI
TL;DR: There is a hierarchy to the difficulty of control problems expressed as polynomial inequalities and a similar hierarchy to the methods used to solve them; novel sequential learning methods are presented to illustrate the power of the statistical methods.
Abstract: This paper studies the relationship between quantified multivariable polynomial inequalities and robust control problems. We show that there is a hierarchy to the difficulty of control problems expressed as polynomial inequalities and a similar hierarchy to the methods used to solve them. At one end, we have quantifier elimination methods, which are exact but doubly exponential in their computational complexity and thus may only be used to solve small problems. Branch-and-bound methods sacrifice the exactness of quantifier elimination to approximately solve a larger class of problems, while Monte Carlo and statistical learning methods solve very large problems, though only probabilistically. We also present novel sequential learning methods to illustrate the power of the statistical methods.
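The probabilistic end of this hierarchy is simple to implement: sample the uncertain parameters, test the inequality, and size the sample with a Chernoff/Hoeffding bound. The polynomial and uncertainty set below are invented toy choices, and this is plain Monte Carlo rather than the authors' sequential methods.

```python
import math
import numpy as np

def mc_probability(holds, sample, eps=0.01, delta=1e-3, rng=None):
    """Estimate P[holds(q)] over random uncertainty q to additive accuracy
    eps with confidence 1 - delta, using N >= ln(2/delta) / (2 eps^2)."""
    rng = rng or np.random.default_rng(0)
    N = math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))
    hits = sum(holds(sample(rng)) for _ in range(N))
    return hits / N, N

# Toy quantified inequality: does q1*q2 - q1^2 + 1 > 0 hold for
# uncertain parameters q drawn uniformly from [-1, 1]^2?
est, N = mc_probability(lambda q: q[0] * q[1] - q[0] ** 2 + 1.0 > 0.0,
                        lambda rng: rng.uniform(-1.0, 1.0, size=2))
print(f"estimated probability {est:.3f} from {N} samples")
```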

Proceedings ArticleDOI
02 Jun 1999
TL;DR: A novel approach for studying the expected performance of estimators of product variables is introduced; using ideas from statistical learning theory, sufficient conditions on the manufacturing process, the estimation algorithm, and the design procedure are obtained that guarantee asymptotic convergence of the estimation algorithm to some optimal estimator as the amount of available data goes to infinity.
Abstract: We analyze the problem of estimating product variables from process measurements in manufacturing systems. In particular, a novel approach for studying such estimators in terms of their expected performance is introduced. Using ideas from statistical learning theory, we obtain sufficient conditions on the manufacturing process, the estimation algorithm, and the design procedure to guarantee asymptotic convergence of the estimation algorithm to some optimal estimator as the amount of available data goes to infinity.

Proceedings ArticleDOI
07 Dec 1999
TL;DR: A more efficient statistical algorithm is presented and this algorithm is used to design static output controllers for a nonlinear plant with uncertain delay.
Abstract: Recently, probabilistic methods and statistical learning theory have been shown to provide approximate solutions to "difficult" control problems. Unfortunately, the number of samples required in order to guarantee stringent performance levels may be prohibitively large. In this paper, using recent results by the authors, a more efficient statistical algorithm is presented. Using this algorithm we design static output controllers for a nonlinear plant with uncertain delay.
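The "prohibitively large" sample counts come from bounds of the following type (a standard additive Chernoff/Hoeffding bound, quoted here for orientation, not the authors' improved result):

```latex
% Samples sufficient to estimate a probability to accuracy \epsilon
% with confidence 1 - \delta:
N \;\ge\; \frac{1}{2\epsilon^{2}}\,\ln\frac{2}{\delta}
% e.g. \epsilon = 0.01, \delta = 10^{-6} already requires
% N \approx 7.3 \times 10^{4} samples per design candidate.
```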

Proceedings ArticleDOI
10 Jul 1999
TL;DR: This paper presents an approach to the experimental verification of the quality of "model selection" delivered by statistical learning theory (SLT), and finds great robustness in the predictive ability of SLT.
Abstract: This paper presents an approach to the experimental verification of the quality of "model selection" delivered by statistical learning theory (SLT). We start from a function whose approximation properties by polynomials are well known and readily verifiable in our experimental environment. For different sample sizes, the model predicted by SLT is contrasted against the model derived from the mathematical properties of the function. We found great robustness in the predictive ability of SLT.

Proceedings ArticleDOI
27 Sep 1999
TL;DR: Describes how committees of connectionist networks and symbolic learning, evolutionary computation, and statistical learning theory lead to increased performance on tasks including the location of facial landmarks, gender and ethnic categorization, face recognition, and pose discrimination.
Abstract: One of the most challenging tasks for visual form ('shape') analysis and object recognition is the understanding of how people process and recognize each other's faces, and the development of corresponding computational models. This paper describes the important and successful role that learning and evolution play in improved and robust face coding and classification schemes. In particular, we describe how committees of connectionist networks and symbolic learning, evolutionary computation, and statistical learning theory lead to increased performance on tasks including the location of facial landmarks, gender and ethnic categorization, face recognition, and pose discrimination.

Proceedings ArticleDOI
10 Jul 1999
TL;DR: This study considers adaptive training schemes for optimal margin classification with neural networks and describes some novel schemes and compares them with the conventional schemes.
Abstract: The concept of the optimal hyperplane has recently been investigated in the context of statistical learning theory. The important property of an optimal hyperplane is that it provides the maximum margin to each class to be separated, and such a decision boundary is expected to yield good generalization. Most neural network learning techniques do not make use of the optimal hyperplane concept; as a result, in many cases extensive tuning is required to reach good generalization. In this study we consider adaptive training schemes for optimal margin classification with neural networks. We describe some novel schemes and compare them with conventional schemes. Simple experiments are presented to demonstrate the performance of each scheme.
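One generic way to push a trained unit toward the optimal hyperplane, shown here only as an illustrative sketch and not as one of the paper's specific schemes, is subgradient descent on a regularized hinge loss for a single linear unit; as the regularization constant shrinks, the minimizer approaches the maximum-margin separator. Hyperparameter names (lam, lr, epochs) are assumptions.

```python
import numpy as np

def margin_train(X, y, lam=0.01, epochs=500, lr=0.1):
    """Subgradient descent on
        lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i (w . x_i + b)),
    a margin-maximizing objective for a single linear unit."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        viol = y * (X @ w + b) < 1      # points inside or beyond the margin
        gw = lam * w - (y[viol, None] * X[viol]).sum(axis=0) / n
        gb = -y[viol].sum() / n
        w, b = w - lr * gw, b - lr * gb
    return w, b
```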