Topic

Statistical learning theory

About: Statistical learning theory is a research topic. Over the lifetime, 1618 publications have been published within this topic receiving 158033 citations.


Papers
Journal Article (DOI)
TL;DR: Support Vector Machines are a new generation of classification methods that produce boundaries between classes by minimising the empirical error on the training set while controlling the complexity of the decision boundary, which can be non-linear.
Abstract: Support Vector Machines (SVMs) are a new generation of classification method. Derived from well-principled statistical learning theory, the method attempts to produce boundaries between classes by minimising the empirical error on the training set while controlling the complexity of the decision boundary, which can be non-linear. SVMs use a kernel matrix to transform a non-linear separation problem in input space into a linear separation problem in feature space; common kernels include the radial basis function, polynomial, and sigmoidal functions. In many simulation studies and real applications, SVMs show superior generalisation performance compared to traditional classification methods. SVMs also provide several useful statistics for both model selection and feature selection, because these statistics are upper bounds on the leave-one-out cross-validation estimate of generalisation performance. SVMs can be employed for multiclass problems in addition to the traditional two-class case.
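As a concrete illustration of the kernel trick and the leave-one-out estimate mentioned in this abstract, the minimal sketch below fits an RBF-kernel SVM with scikit-learn and estimates its generalisation accuracy by leave-one-out cross-validation. The synthetic dataset and hyperparameter values are illustrative placeholders, not settings from the paper.

# Minimal sketch: RBF-kernel SVM evaluated by leave-one-out cross-validation.
# Dataset and hyperparameters are illustrative placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=5, random_state=0)

# Non-linear decision boundary in input space, linear in the RBF feature space.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")

# Leave-one-out error: the quantity the SVM statistics discussed above upper-bound.
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("LOO accuracy estimate:", loo_scores.mean())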

148 citations

Journal Article
TL;DR: The theoretical basis of support vector machines (SVM) is described systematically, the mainstream training algorithms of traditional SVM and some new learning models and algorithms are summed up in detail, and the research and development prospects of SVM are pointed out.
Abstract: Statistical learning theory is the statistical theory of small samples, and it focuses on the statistical laws and the nature of learning from small samples. The support vector machine is a new machine learning method based on statistical learning theory, and it has become an active research focus in machine learning because of its excellent performance. This paper describes the theoretical basis of support vector machines (SVM) systematically, sums up the mainstream training algorithms of traditional SVM and some new learning models and algorithms in detail, and finally points out the research and development prospects of support vector machines.
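For reference, the trade-off between empirical error and boundary complexity that both abstracts above describe is usually written as the standard soft-margin SVM optimisation problem (the common textbook form, not necessarily the exact notation used in this survey):

\min_{w,\,b,\,\xi}\; \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad y_i\bigl(w^\top\phi(x_i)+b\bigr)\ge 1-\xi_i,\quad \xi_i\ge 0,

where \phi is the feature map induced by the kernel k(x_i,x_j)=\phi(x_i)^\top\phi(x_j), the term \|w\|^2 controls the complexity of the decision boundary, and the slack penalty C\sum_i\xi_i accounts for empirical errors on the training set.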

144 citations

Journal Article (DOI)
TL;DR: In deep learning, simple gradient methods easily find near-optimal solutions to non-convex optimization problems and, despite fitting the training data near-perfectly without any explicit effort to control model complexity, exhibit excellent predictive accuracy; this article surveys recent statistical learning theory that illustrates these phenomena in simpler settings.
Abstract: The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal solutions to non-convex optimization problems, and despite giving a near-perfect fit to training data without any explicit effort to control model complexity, these methods exhibit excellent predictive accuracy. We conjecture that specific principles underlie these phenomena: that overparametrization allows gradient methods to find interpolating solutions, that these methods implicitly impose regularization, and that overparametrization leads to benign overfitting, that is, accurate predictions despite overfitting training data. In this article, we survey recent progress in statistical learning theory that provides examples illustrating these principles in simpler settings. We first review classical uniform convergence results and why they fall short of explaining aspects of the behaviour of deep learning methods. We give examples of implicit regularization in simple settings, where gradient methods lead to minimal norm functions that perfectly fit the training data. Then we review prediction methods that exhibit benign overfitting, focusing on regression problems with quadratic loss. For these methods, we can decompose the prediction rule into a simple component that is useful for prediction and a spiky component that is useful for overfitting but, in a favourable setting, does not harm prediction accuracy. We focus specifically on the linear regime for neural networks, where the network can be approximated by a linear model. In this regime, we demonstrate the success of gradient flow, and we consider benign overfitting with two-layer networks, giving an exact asymptotic analysis that precisely demonstrates the impact of overparametrization. We conclude by highlighting the key challenges that arise in extending these insights to realistic deep learning settings.
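The "minimal norm functions that perfectly fit the training data" mentioned in the abstract can be seen in a toy overparametrised linear regression. The sketch below is purely illustrative (the dimensions and noise level are arbitrary choices, not the article's setting); it uses the pseudoinverse, which returns the minimum-l2-norm interpolating solution when there are more parameters than samples.

# Minimal sketch: minimum-norm interpolation in an overparametrised linear model.
# Purely illustrative; not the setting analysed in the article.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 200                            # far more parameters than samples
X = rng.normal(size=(n, d))
y = X[:, 0] + 0.1 * rng.normal(size=n)    # signal in one coordinate plus noise

# The pseudoinverse gives the minimum-l2-norm solution among all interpolators.
w = np.linalg.pinv(X) @ y

print("max training residual:", np.max(np.abs(X @ w - y)))   # essentially zero: perfect fit

X_test = rng.normal(size=(1000, d))
y_test = X_test[:, 0]
print("test MSE:", np.mean((X_test @ w - y_test) ** 2))      # can still predict reasonably well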

141 citations

Proceedings Article
02 May 2009
TL;DR: In this paper, the authors present three novel drift detection tests whose test statistics are dynamically adapted to match the actual data at hand: the first is based on a rank statistic on density estimates for a binary representation of the data, the second compares average margins of a linear classifier induced by the 1-norm support vector machine (SVM), and the last is based on the average zero-one, sigmoid, or stepwise linear error rate of an SVM classifier.
Abstract: An established method to detect concept drift in data streams is to perform statistical hypothesis testing on the multivariate data in the stream. Statistical theory offers rank-based statistics for this task. However, these statistics depend on a fixed set of characteristics of the underlying distribution. Thus, they work well whenever the change in the underlying distribution affects the properties measured by the statistic, but they do not perform well if the drift influences the characteristics captured by the test statistic only to a small degree. To address this problem, we show how uniform convergence bounds in learning theory can be adjusted for adaptive concept drift detection. In particular, we present three novel drift detection tests whose test statistics are dynamically adapted to match the actual data at hand. The first is based on a rank statistic on density estimates for a binary representation of the data, the second compares average margins of a linear classifier induced by the 1-norm support vector machine (SVM), and the last is based on the average zero-one, sigmoid, or stepwise linear error rate of an SVM classifier. We compare these new approaches with the maximum mean discrepancy method, the StreamKrimp system, and the multivariate Wald–Wolfowitz test. The results indicate that the new methods are able to detect concept drift reliably and that they perform favorably in a precision-recall analysis. Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 311-327, 2009
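One simple way to turn a classifier error rate into a two-sample drift check, in the spirit of (but not identical to) the SVM error-rate test described above, is sketched below: label the points by the window they came from, train a linear SVM, and flag drift if its held-out error falls clearly below chance. The window construction, the fixed threshold, and the choice of LinearSVC are illustrative assumptions rather than the authors' procedure.

# Rough sketch of a classifier-based two-sample drift check; illustrative only,
# not the test statistic proposed in the paper.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

def drift_detected(X_ref, X_cur, threshold=0.45, seed=0):
    # Window membership is the label: 0 = reference window, 1 = current window.
    X = np.vstack([X_ref, X_cur])
    y = np.concatenate([np.zeros(len(X_ref)), np.ones(len(X_cur))])
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.5, stratify=y, random_state=seed
    )
    clf = LinearSVC().fit(X_tr, y_tr)
    err = np.mean(clf.predict(X_te) != y_te)
    # If both windows come from the same distribution, the held-out error should
    # hover near 0.5; a markedly lower error suggests the distribution has drifted.
    return err < threshold

A fixed threshold is a crude stand-in here; a proper version would calibrate the decision with a hypothesis test, as the paper does with adapted uniform convergence bounds.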

140 citations

14 Dec 2011
TL;DR: One of the standard and most thoroughly studied models for learning is the framework of statistical learning theory, and this paper begins by briefly reviewing that model.
Abstract: In a world where automatic data collection becomes ubiquitous, statisticians must update their paradigms to cope with new problems. Whether we consider the Internet, consumer data sets, or financial markets, a common feature emerges: huge amounts of dynamic data that need to be understood and quickly processed. This state of affairs is dramatically different from the classical statistical problems, with many observations and few variables of interest. Over the past decades, learning theory has tried to address this issue. One of the standard and most thoroughly studied models for learning is the framework of statistical learning theory. We start by briefly reviewing this model.
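For completeness, the framework being reviewed is usually set up as follows (the standard textbook formulation, stated here only as a reminder): given i.i.d. pairs (X_1,Y_1),\dots,(X_n,Y_n) drawn from an unknown distribution P and a loss function \ell, the risk of a predictor f is

R(f) = \mathbb{E}_{(X,Y)\sim P}\bigl[\ell(f(X),Y)\bigr],

empirical risk minimisation selects

\hat f \in \arg\min_{f\in\mathcal F}\; \frac{1}{n}\sum_{i=1}^{n}\ell(f(X_i),Y_i),

and statistical learning theory bounds the excess risk R(\hat f)-\inf_{f\in\mathcal F}R(f) in terms of the sample size n and the complexity of the class \mathcal F.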

137 citations


Network Information
Related Topics (5)

Topic                       Papers     Citations   Related
Artificial neural network   207K       4.5M        86%
Cluster analysis            146.5K     2.9M        82%
Feature extraction          111.8K     2.1M        81%
Optimization problem        96.4K      2.1M        80%
Fuzzy logic                 151.2K     2.3M        79%
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    9
2022    19
2021    59
2020    69
2019    72
2018    47