Topic

Statistical learning theory

About: Statistical learning theory is a research topic. Over its lifetime, 1,618 publications have been published within this topic, receiving 158,033 citations.


Papers
Book Chapter (DOI)
14 Feb 2007
TL;DR: Support vector machines (SVM), a group of supervised learning methods that can be applied to classification or regression, extend the generalized portrait algorithm developed by Vapnik and Lerner to nonlinear models.
Abstract: Kernel-based techniques (such as support vector machines, Bayes point machines, kernel principal component analysis, and Gaussian processes) represent a major development in machine learning algorithms. Support vector machines (SVM) are a group of supervised learning methods that can be applied to classification or regression. In a short period of time, SVM found numerous applications in chemistry, such as in drug design (discriminating between ligands and nonligands, inhibitors and noninhibitors, etc.), quantitative structure-activity relationships (QSAR, where SVM regression is used to predict various physical, chemical, or biological properties), chemometrics (optimization of chromatographic separation or compound concentration prediction from spectral data as examples), sensors (for qualitative and quantitative prediction from sensor data), chemical engineering (fault detection and modeling of industrial processes), and text mining (automatic recognition of scientific information). Support vector machines represent an extension to nonlinear models of the generalized portrait algorithm developed by Vapnik and Lerner. The SVM algorithm is based on statistical learning theory and the Vapnik–Chervonenkis (VC) dimension.

375 citations
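As a concrete illustration of the kernel methods the chapter surveys, the sketch below trains an RBF-kernel SVM for classification and for regression with scikit-learn. The synthetic data and every parameter value (C, gamma, epsilon) are illustrative assumptions, not taken from the chapter.

```python
# Minimal sketch of kernel SVM classification and regression with
# scikit-learn; data and parameters are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, SVR

# Classification: e.g. discriminating inhibitors from noninhibitors
# from molecular descriptors (random features stand in for them here).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # nonlinear decision boundary
clf.fit(X_tr, y_tr)
print("classification accuracy:", clf.score(X_te, y_te))

# Regression: e.g. QSAR-style prediction of a continuous property.
rng = np.random.default_rng(0)
Xr = rng.uniform(-3, 3, size=(200, 1))
yr = np.sin(Xr).ravel() + 0.1 * rng.standard_normal(200)

reg = SVR(kernel="rbf", C=1.0, epsilon=0.1)  # epsilon-insensitive loss
reg.fit(Xr, yr)
print("regression R^2:", reg.score(Xr, yr))
```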

Book
30 Jun 2007
TL;DR: An alternative selection scheme based on relative bounds between estimators is described and studied, and a two-step localization technique which can handle the selection of a parametric model from a family of those is presented.
Abstract: This monograph deals with adaptive supervised classification, using tools borrowed from statistical mechanics and information theory, stemming from the PAC-Bayesian approach pioneered by David McAllester and applied to a conception of statistical learning theory forged by Vladimir Vapnik. Using convex analysis on the set of posterior probability measures, we show how to get local measures of the complexity of the classification model involving the relative entropy of posterior distributions with respect to Gibbs posterior measures. We then discuss relative bounds, comparing the generalization error of two classification rules, showing how the margin assumption of Mammen and Tsybakov can be replaced with some empirical measure of the covariance structure of the classification model. We show how to associate to any posterior distribution an effective temperature relating it to the Gibbs prior distribution with the same level of expected error rate, and how to estimate this effective temperature from data, resulting in an estimator whose expected error rate converges according to the best possible power of the sample size adaptively under any margin and parametric complexity assumptions. We describe and study an alternative selection scheme based on relative bounds between estimators, and present a two-step localization technique which can handle the selection of a parametric model from a family of those. We show how to extend systematically all the results obtained in the inductive setting to transductive learning, and use this to improve Vapnik's generalization bounds, extending them to the case when the sample is made of independent non-identically distributed pairs of patterns and labels. Finally we review briefly the construction of Support Vector Machines and show how to derive generalization bounds for them, measuring the complexity either through the number of support vectors or through the value of the transductive or inductive margin.

369 citations
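To make the flavour of such generalization bounds concrete, here is a minimal sketch of a McAllester-style PAC-Bayes bound for a discrete hypothesis set: with probability at least 1 − δ, the Gibbs risk is bounded by the empirical Gibbs risk plus sqrt((KL(ρ‖π) + ln(2√n/δ)) / (2n)). The prior, posterior, and risk values below are illustrative assumptions; the monograph itself develops sharper, localized variants.

```python
# Minimal sketch of a McAllester-style PAC-Bayes generalization bound;
# all numbers are illustrative assumptions.
import math

def kl_divergence(rho, pi):
    """KL(rho || pi) for discrete posterior/prior over a finite hypothesis set."""
    return sum(r * math.log(r / p) for r, p in zip(rho, pi) if r > 0)

def mcallester_bound(emp_risk, rho, pi, n, delta=0.05):
    """Empirical Gibbs risk plus the PAC-Bayes complexity term:
    E_rho[L] <= E_rho[L_hat] + sqrt((KL(rho||pi) + ln(2*sqrt(n)/delta)) / (2n)),
    holding with probability at least 1 - delta over samples of size n."""
    kl = kl_divergence(rho, pi)
    return emp_risk + math.sqrt((kl + math.log(2 * math.sqrt(n) / delta)) / (2 * n))

# Uniform prior over 4 classifiers; posterior concentrated on the best one.
pi = [0.25] * 4
rho = [0.85, 0.05, 0.05, 0.05]
print(mcallester_bound(emp_risk=0.10, rho=rho, pi=pi, n=5000))
```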

Journal Article (DOI)
TL;DR: For regression problems, the existence of a certain range of σ within which the generalization performance is stable is demonstrated on the basis of scale-space theory, and an appropriate σ within that range can be obtained via dynamic evaluation.

359 citations
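The sketch below mimics this idea on synthetic data by sweeping the Gaussian kernel width σ of an SVR and watching cross-validated performance for a stable plateau; the paper's dynamic-evaluation procedure is not reproduced here, and all data and parameter values are assumptions.

```python
# Illustrative sweep over the Gaussian kernel width sigma for SVR,
# looking for a stable performance range; data and values are assumptions.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sinc(X).ravel() + 0.05 * rng.standard_normal(300)

for sigma in [0.01, 0.05, 0.1, 0.5, 1.0, 5.0]:
    gamma = 1.0 / (2.0 * sigma**2)  # RBF kernel: exp(-gamma * ||x - x'||^2)
    score = cross_val_score(SVR(kernel="rbf", gamma=gamma, C=10.0), X, y, cv=5).mean()
    print(f"sigma={sigma:5.2f}  CV R^2={score:.3f}")
# Scores typically degrade for very small sigma (overfitting) and very large
# sigma (oversmoothing), with a plateau of stable performance in between.
```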

Journal Article (DOI)
TL;DR: The object of this paper is to illustrate the utility of the data-driven approach to damage identification by means of a number of case studies.
Abstract: In broad terms, there are two approaches to damage identification. Model-driven methods establish a high-fidelity physical model of the structure, usually by finite element analysis, and then establish a comparison metric between the model and the measured data from the real structure. If the model is for a system or structure in normal (i.e. undamaged) condition, any departures indicate that the structure has deviated from normal condition and damage is inferred. Data-driven approaches also establish a model, but this is usually a statistical representation of the system, e.g. a probability density function of the normal condition. Departures from normality are then signalled by measured data appearing in regions of very low density. The algorithms that have been developed over the years for data-driven approaches are mainly drawn from the discipline of pattern recognition, or more broadly, machine learning. The object of this paper is to illustrate the utility of the data-driven approach to damage identification by means of a number of case studies.

342 citations
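A minimal sketch of the data-driven scheme the paper describes: fit a density model to features from the normal (undamaged) condition and flag new measurements that fall in low-density regions. The kernel density estimator, the percentile threshold, and the synthetic features are illustrative assumptions, not the case studies' actual models.

```python
# Density-based novelty detection for damage identification: model the
# normal condition and flag low-density measurements. All details here
# (features, bandwidth, threshold) are illustrative assumptions.
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
normal_features = rng.normal(loc=0.0, scale=1.0, size=(500, 2))  # undamaged state

kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(normal_features)

# Threshold: e.g. the 1st percentile of log-density on the training data.
threshold = np.percentile(kde.score_samples(normal_features), 1)

new_measurements = np.array([[0.1, -0.2],   # plausible under normal condition
                             [4.0, 4.0]])   # far from the training density
log_density = kde.score_samples(new_measurements)
print("damage flagged:", log_density < threshold)  # expect [False, True]
```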

Book Chapter (DOI)
11 Apr 2011
TL;DR: ANN models are too often developed without due consideration given to the effect that the choice of input variables has on model complexity, learning difficulty, and performance of the subsequently trained ANN.
Abstract: The choice of input variables is a fundamental and crucial consideration in identifying the optimal functional form of statistical models. The task of selecting input variables is common to the development of all statistical models, and is largely dependent on the discovery of relationships within the available data to identify suitable predictors of the model output. In the case of parametric, or semi-parametric empirical models, the difficulty of the input variable selection task is somewhat alleviated by the a priori assumption of the functional form of the model, which is based on some physical interpretation of the underlying system or process being modelled. However, in the case of artificial neural networks (ANNs), and other similarly data-driven statistical modelling approaches, there is no such assumption made regarding the structure of the model. Instead, the input variables are selected from the available data, and the model is developed subsequently. The difficulty of selecting input variables arises due to (i) the number of available variables, which may be very large; (ii) correlations between potential input variables, which creates redundancy; and (iii) variables that have little or no predictive power. Variable subset selection has been a longstanding issue in fields of applied statistics dealing with inference and linear regression (Miller, 1984), and the advent of ANN models has only served to create new challenges in this field. The non-linearity, inherent complexity and non-parametric nature of ANN regression make it difficult to apply many existing analytical variable selection methods. The difficulty of selecting input variables is further exacerbated during ANN development, since the task of selecting inputs is often delegated to the ANN during the learning phase of development. A popular notion is that an ANN is adequately capable of identifying redundant and noise variables during training, and that the trained network will use only the salient input variables. ANN architectures can be built with arbitrary flexibility and can be successfully trained using any combination of input variables (assuming they are good predictors). Consequently, allowances are often made for a large number of input variables, with the belief that the ability to incorporate such flexibility and redundancy creates a more robust model. Such pragmatism is perhaps symptomatic of the popularisation of ANN models through machine learning, rather than statistical learning theory. ANN models are too often developed without due consideration given to the effect that the choice of input variables has on model complexity, learning difficulty, and performance of the subsequently trained ANN.

328 citations
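In the spirit of the chapter's argument that inputs should be selected deliberately rather than left to the network, the sketch below applies a simple mutual-information filter before training an MLP; the filter choice, dataset, and network architecture are illustrative assumptions rather than the chapter's own method.

```python
# Filter-style input variable selection before ANN training: rank candidate
# inputs by mutual information with the target and keep the strongest ones.
# Dataset, filter, and architecture are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import mutual_info_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# 20 candidate inputs, only 5 informative; the rest are noise or redundant.
X, y = make_regression(n_samples=500, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

mi = mutual_info_regression(X, y, random_state=0)  # relevance of each input
selected = np.argsort(mi)[-5:]                     # keep the 5 strongest
print("selected inputs:", sorted(selected.tolist()))

X_tr, X_te, y_tr, y_te = train_test_split(X[:, selected], y, random_state=0)
ann = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
ann.fit(X_tr, y_tr)
print("test R^2 with selected inputs:", round(ann.score(X_te, y_te), 3))
```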


Network Information
Related Topics (5)
Artificial neural network: 207K papers, 4.5M citations, 86% related
Cluster analysis: 146.5K papers, 2.9M citations, 82% related
Feature extraction: 111.8K papers, 2.1M citations, 81% related
Optimization problem: 96.4K papers, 2.1M citations, 80% related
Fuzzy logic: 151.2K papers, 2.3M citations, 79% related
Performance Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    9
2022    19
2021    59
2020    69
2019    72
2018    47