Topic

Statistical learning theory

About: Statistical learning theory is a research topic. Over its lifetime, 1,618 publications have been published within this topic, receiving 158,033 citations.


Papers
Proceedings ArticleDOI
01 Jul 2016
TL;DR: An up-to-date overview of research on complexity measures in GP learning, covering methods based on information-theoretic criteria such as the Akaike and Bayesian Information Criteria, methods based on statistical learning theory's generalization error bounds, and methods based on structural complexity.
Abstract: Model complexity of Genetic Programming (GP) as a learning machine is currently attracting considerable interest from the research community. Here we provide an up-to-date overview of the research concerning complexity measure techniques in GP learning. The scope of this review includes methods based on information theory, such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC); those based on statistical machine learning theory and its generalization error bounds, namely Vapnik-Chervonenkis (VC) theory; and some based on structural complexity. The research contributions from each of these are systematically summarized and compared, allowing us to clearly define existing research challenges and to highlight promising new research directions. The findings of this review provide valuable insights into the current GP literature and are a good source for anyone interested in research on model complexity and in applying statistical learning theory to GP.

25 citations
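
The review above centers on criteria such as AIC and BIC. As a minimal, illustrative sketch (not from the paper), the snippet below computes both criteria for a fitted regression model under a Gaussian-error assumption; treating the parameter count k as a proxy for GP model complexity (for instance, tree node count) is an assumption here, since the review discusses several conventions.

```python
import numpy as np

def aic(log_likelihood, k):
    # Akaike Information Criterion: 2k - 2 ln L
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    # Bayesian Information Criterion: k ln n - 2 ln L
    return k * np.log(n) - 2 * log_likelihood

def gaussian_log_likelihood(residuals):
    # Gaussian-error shortcut commonly used for regression models:
    # ln L = -(n/2) * (ln(2*pi*sigma^2) + 1), with sigma^2 the mean squared residual.
    n = len(residuals)
    sigma2 = np.mean(np.square(residuals))
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

# Hypothetical residuals of some fitted model; k = 5 stands in for its complexity.
residuals = np.random.default_rng(0).normal(scale=0.3, size=200)
ll = gaussian_log_likelihood(residuals)
print("AIC:", aic(ll, k=5), " BIC:", bic(ll, k=5, n=len(residuals)))
```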

Book ChapterDOI
06 Dec 1999
TL;DR: It is shown that the expected generalization error in the unidentifiable cases is larger than what is given by the usual asymptotic theory and depends on the rank of the target function.
Abstract: Statistical asymptotic theory is often used in theoretical results in computational and statistical learning theory. It describes the limiting distribution of the maximum likelihood estimator (MLE) as a normal distribution. However, in layered models such as neural networks, the regularity condition of the asymptotic theory is not necessarily satisfied. The true parameter is not identifiable if the target function can be realized by a network smaller than the model. Little has been known about the behavior of the MLE in these cases for neural networks. In this paper, we analyze the expectation of the generalization error of three-layer linear neural networks and elucidate a strange behavior in unidentifiable cases. We show that the expectation of the generalization error in the unidentifiable cases is larger than what is given by the usual asymptotic theory, and depends on the rank of the target function.

24 citations
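
To make the unidentifiability discussed above concrete, here is a small NumPy sketch (an illustration with assumed dimensions, not the paper's setup): a three-layer linear network y = B A x with a rank-deficient target admits many distinct parameter pairs (A, B) that realize exactly the same function.

```python
import numpy as np

rng = np.random.default_rng(0)

# A three-layer *linear* network computes y = B @ A @ x (hidden width 3 here).
# Choose a rank-1 target, i.e. a map realizable by a smaller network.
u = rng.normal(size=(2, 1))
v = rng.normal(size=(1, 4))
W = u @ v                                   # 2x4 target, rank 1

# Two different parameter pairs (A, B) realizing exactly the same map.
A1 = np.vstack([v, np.zeros((2, 4))])       # 3x4
B1 = np.hstack([u, np.zeros((2, 2))])       # 2x3
A2 = np.vstack([2.0 * v, rng.normal(size=(2, 4))])
B2 = np.hstack([0.5 * u, np.zeros((2, 2))])

assert np.allclose(B1 @ A1, W)
assert np.allclose(B2 @ A2, W)
# Distinct parameters, identical function: the true parameter is not identifiable,
# which is the regime whose generalization error the paper analyzes.
print("both factorizations reproduce the rank-1 target")
```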

Journal ArticleDOI
TL;DR: The input and weight Hessians are used to quantify a network's ability to generalize to unseen data, and to show how the generalization capability can be controlled during training using the learning rate, batch size, and number of training iterations as controls.

24 citations
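
As a toy stand-in for the quantities named in the summary above (the paper works with deep networks; this sketch uses a logistic-regression model and PyTorch autograd purely as an assumed tooling choice), the code below computes a weight Hessian and an input Hessian of the training loss after a few gradient steps.

```python
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)

# Toy data and a minimal model standing in for a trained network.
X = torch.randn(64, 5)
y = (X[:, 0] + 0.3 * torch.randn(64) > 0).float()

def loss_fn(w, inputs, targets):
    return torch.nn.functional.binary_cross_entropy_with_logits(inputs @ w, targets)

# A few full-batch gradient steps; learning rate, batch size and iteration count
# are exactly the training controls the paper studies.
w = torch.zeros(5, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.5)
for _ in range(200):
    opt.zero_grad()
    loss_fn(w, X, y).backward()
    opt.step()
w_trained = w.detach()

# Weight Hessian of the training loss at the trained weights.
H_w = hessian(lambda v: loss_fn(v, X, y), w_trained)
print("weight Hessian eigenvalues:", torch.linalg.eigvalsh(H_w))

# Input Hessian of the loss for a single example (curvature w.r.t. the input).
H_x = hessian(lambda x0: loss_fn(w_trained, x0.unsqueeze(0), y[:1]), X[0])
print("input Hessian eigenvalues:", torch.linalg.eigvalsh(H_x))
```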

Journal Article
TL;DR: This paper introduces a new classifier design method based on a kernel extension of the classical Ho-Kashyap procedure that leads to robustness against outliers and a better approximation of the misclassification error.
Abstract: This paper introduces a new classifier design method based on a kernel extension of the classical Ho-Kashyap procedure. The proposed method uses an approximation of the absolute error rather than the squared error to design a classifier, which leads to robustness against outliers and a better approximation of the misclassification error. Additionally, easy control of the generalization ability is obtained using the structural risk minimization induction principle from statistical learning theory. Finally, examples are given to demonstrate the validity of the introduced method.

24 citations
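
For context on the method above, here is a sketch of the classical linear, squared-error Ho-Kashyap procedure that the paper's kernel, absolute-error variant builds on; this is the textbook algorithm, not the proposed classifier, and the data and step size are made up for illustration.

```python
import numpy as np

def ho_kashyap(X, y, rho=0.1, n_iter=500, tol=1e-6):
    # Classical Ho-Kashyap procedure: find weights a and margins b > 0
    # minimizing ||Y a - b||^2, where Y is the reflected, augmented pattern matrix.
    n, _ = X.shape
    Y = np.hstack([X, np.ones((n, 1))]) * y[:, None]   # reflect class -1 rows
    b = np.ones(n)
    Y_pinv = np.linalg.pinv(Y)
    for _ in range(n_iter):
        a = Y_pinv @ b                    # least-squares weights for current margins
        e = Y @ a - b
        e_plus = 0.5 * (e + np.abs(e))    # keep only the positive part of the error
        b = b + 2 * rho * e_plus          # margins can only grow, staying positive
        if np.linalg.norm(e_plus) < tol:
            break
    return a

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.5, 1.0, (50, 2)), rng.normal(-1.5, 1.0, (50, 2))])
y = np.hstack([np.ones(50), -np.ones(50)])
a = ho_kashyap(X, y)
pred = np.sign(np.hstack([X, np.ones((100, 1))]) @ a)
print("training accuracy:", np.mean(pred == y))
```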

Book ChapterDOI
29 Mar 1999
TL;DR: It is shown that the 2ε² term under the exponential of the deviation bound can indeed be replaced by the corresponding Cramér transform, as given by large deviations theorems, and it is explained why these theoretical results on such bounds can lead to practical estimates of the effective VC dimension of learning structures.
Abstract: Vapnik-Chervonenkis (VC) bounds play an important role in statistical learning theory, as they are the fundamental result explaining the generalization ability of learning machines. There has been a consequent body of mathematical work over the years on improving the VC rates of convergence of empirical means to their expectations. The result obtained by Talagrand in 1994 seems to provide more or less the final word on this issue as far as universal bounds are concerned; for fixed distributions, however, this bound can be outperformed in practice. We show indeed that it is possible to replace the 2ε² under the exponential of the deviation term by the corresponding Cramér transform, as shown by large deviations theorems. Then, we formulate rigorous distribution-sensitive VC bounds, and we also explain why these theoretical results on such bounds can lead to practical estimates of the effective VC dimension of learning structures.

24 citations
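
A small numerical sketch of the paper's point (my illustration, not the authors' code): for a Bernoulli(p) sample mean, the Cramér transform exponent KL(p+ε || p) is never smaller than the distribution-free 2ε² appearing in the universal Hoeffding/VC deviation term, and the gap widens as p moves away from 1/2.

```python
import numpy as np

def hoeffding_exponent(eps):
    # Distribution-free exponent in the classical deviation term exp(-2 n eps^2).
    return 2.0 * eps ** 2

def cramer_exponent(p, eps):
    # Cramér transform (large-deviations rate) of a Bernoulli(p) sample mean,
    # evaluated at p + eps: the relative entropy KL(p + eps || p).
    q = p + eps
    return q * np.log(q / p) + (1.0 - q) * np.log((1.0 - q) / (1.0 - p))

eps = 0.1
for p in (0.5, 0.2, 0.05):
    print(f"p={p:0.2f}  universal exponent={hoeffding_exponent(eps):0.4f}"
          f"  Cramer exponent={cramer_exponent(p, eps):0.4f}")
```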


Network Information
Related Topics (5)
Artificial neural network: 207K papers, 4.5M citations, 86% related
Cluster analysis: 146.5K papers, 2.9M citations, 82% related
Feature extraction: 111.8K papers, 2.1M citations, 81% related
Optimization problem: 96.4K papers, 2.1M citations, 80% related
Fuzzy logic: 151.2K papers, 2.3M citations, 79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    9
2022    19
2021    59
2020    69
2019    72
2018    47