Topic
Statistical learning theory
About: Statistical learning theory is a research topic. Over its lifetime, 1618 publications have been published within this topic, receiving 158033 citations.
Papers published on a yearly basis
Papers
01 Jul 2016
TL;DR: An up-to-date overview of research on complexity measure techniques in GP learning, covering methods based on information theory (such as the Bayesian Information Criterion), methods based on statistical machine learning theory and generalization error bounds, and methods based on structural complexity.
Abstract: Model complexity of Genetic Programming (GP) as a learning machine is currently attracting considerable interest from the research community. Here we provide an up-to-date overview of the research concerning complexity measure techniques in GP learning. The scope of this review includes methods based on information theory techniques, such as the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC); those based on statistical machine learning theory and generalization error bounds, namely Vapnik-Chervonenkis (VC) theory; and some based on structural complexity. The research contributions from each of these are systematically summarized and compared, allowing us to clearly define existing research challenges and to highlight promising new research directions. The findings of this review provide valuable insights into the current GP literature and are a good source for anyone interested in research on model complexity and applying statistical learning theory to GP.
25 citations
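The information-theoretic criteria the review covers have simple closed forms: AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L, where k is the number of parameters, n the sample size, and L the maximized likelihood. A minimal sketch of how they trade fit against complexity, using hypothetical log-likelihood values (not from any paper above):

```python
import numpy as np

def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return k * np.log(n) - 2 * log_likelihood

# Hypothetical fits of two candidate models to n = 100 samples:
# the larger model fits slightly better but pays a complexity penalty.
n = 100
models = {"simple": (-120.0, 3), "complex": (-115.0, 10)}
for name, (logL, k) in models.items():
    print(name, "AIC:", aic(logL, k), "BIC:", bic(logL, k, n))
```

Lower scores are preferred; BIC's ln n factor penalizes extra parameters more heavily than AIC once n exceeds e² ≈ 7.4.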
••
06 Dec 1999
TL;DR: It is shown that the expectation of the generalization error in the unidentifiable cases is larger than what is given by the usual asymptotic theory, and dependent on the rank of the target function.
Abstract: The statistical asymptotic theory is often used in theoretical results in computational and statistical learning theory. It describes the limiting distribution of the maximum likelihood estimator (MLE) as a normal distribution. However, in layered models such as neural networks, the regularity condition of the asymptotic theory is not necessarily satisfied. The true parameter is not identifiable if the target function can be realized by a network of smaller size than the model. Little has been known about the behavior of the MLE in these cases of neural networks. In this paper, we analyze the expectation of the generalization error of three-layer linear neural networks and elucidate a strange behavior in unidentifiable cases. We show that the expectation of the generalization error in the unidentifiable cases is larger than what is given by the usual asymptotic theory, and depends on the rank of the target function.
24 citations
TL;DR: The input and weight Hessians are used to quantify a network's ability to generalize to unseen data, and to show how the generalization capability can be controlled during training through the learning rate, batch size, and number of training iterations.
24 citations
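The weight Hessian of the loss is a standard sharpness proxy for generalization studies. A generic numerical sketch (not the paper's method) computing it by finite differences for a linear model, where the largest eigenvalue serves as a sharpness measure:

```python
import numpy as np

def loss(w, X, y):
    # Squared-error loss of a linear model; stands in for a network loss.
    return 0.5 * np.mean((X @ w - y) ** 2)

def hessian_fd(f, w, eps=1e-4):
    """Finite-difference weight Hessian of a scalar loss f at w."""
    d = w.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i = np.zeros(d); e_i[i] = eps
            e_j = np.zeros(d); e_j[j] = eps
            # Mixed second difference approximates d^2 f / dw_i dw_j.
            H[i, j] = (f(w + e_i + e_j) - f(w + e_i)
                       - f(w + e_j) + f(w)) / eps**2
    return H

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
H = hessian_fd(lambda v: loss(v, X, y), w)
sharpness = np.linalg.eigvalsh(H).max()  # larger values ~ sharper minimum
```

For this quadratic loss the Hessian is exactly XᵀX/n, so the finite-difference result can be checked directly; for a real network one would use automatic differentiation instead.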
TL;DR: This paper introduces a new classifier design method based on a kernel extension of the classical Ho-Kashyap procedure that leads to robustness against outliers and a better approximation of the misclassification error.
Abstract: This paper introduces a new classifier design method based on a kernel extension of the classical Ho-Kashyap procedure. The proposed method uses an approximation of the absolute error rather than the squared error to design a classifier, which leads to robustness against outliers and a better approximation of the misclassification error. Additionally, easy control of the generalization ability is obtained using the structural risk minimization induction principle from statistical learning theory. Finally, examples are given to demonstrate the validity of the introduced method.
24 citations
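For context, the classical Ho-Kashyap procedure that the paper extends iteratively solves Yw = b with a positive margin vector b, increasing only the margins that are already exceeded. A minimal sketch of the classical linear version on toy data (the paper's kernel and absolute-error extensions are not reproduced here):

```python
import numpy as np

def ho_kashyap(X, y, rho=0.5, n_iter=200):
    """Classical Ho-Kashyap: find w with Y w = b, b > 0, for labels y in {-1, +1}."""
    # Augment with a bias column and negate class -1 rows ("normalized" patterns).
    Y = np.hstack([X, np.ones((X.shape[0], 1))]) * y[:, None]
    Y_pinv = np.linalg.pinv(Y)
    b = np.ones(Y.shape[0])
    w = Y_pinv @ b
    for _ in range(n_iter):
        e = Y @ w - b
        b = b + rho * (e + np.abs(e))  # raise only the positive errors
        w = Y_pinv @ b                 # least-squares refit to the new margins
    return w

# Toy linearly separable problem.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
w = ho_kashyap(X, y)
preds = np.sign(np.hstack([X, np.ones((4, 1))]) @ w)
```

The classical version minimizes a squared error; the paper's contribution is to replace this with an approximate absolute error and a kernel formulation for robustness against outliers.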
29 Mar 1999
TL;DR: It is shown that the 2ε² term under the exponential of the deviation bound can be replaced by the corresponding Cramér transform, as suggested by large deviations theorems, and that these theoretical results can lead to practical estimates of the effective VC dimension of learning structures.
Abstract: Vapnik-Chervonenkis (VC) bounds play an important role in statistical learning theory, as they are the fundamental result explaining the generalization ability of learning machines. There has been substantial mathematical work over the years on improving the VC rates of convergence of empirical means to their expectations. The result obtained by Talagrand in 1994 seems to provide more or less the final word on this issue as far as universal bounds are concerned. For fixed distributions, however, this bound can be outperformed in practice. We show that it is indeed possible to replace the 2ε² term under the exponential of the deviation bound by the corresponding Cramér transform, as given by large deviations theorems. We then formulate rigorous distribution-sensitive VC bounds and explain why these theoretical results can lead to practical estimates of the effective VC dimension of learning structures.
24 citations
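The substitution the abstract describes can be made concrete. Assuming the standard Hoeffding-type form of the deviation term (a generic sketch, not the paper's exact statement), the universal bound and its distribution-sensitive refinement read:

```latex
% Universal (Hoeffding-type) deviation term:
\Pr\left( \left| \tfrac{1}{n}\sum_{i=1}^{n} f(X_i) - \mathbb{E}f \right| > \varepsilon \right)
  \le 2 \exp\left( -2 n \varepsilon^2 \right)

% Distribution-sensitive replacement via the Cram\'er transform:
\Lambda^{*}(\varepsilon)
  = \sup_{\lambda} \left[ \lambda \varepsilon
      - \log \mathbb{E}\, e^{\lambda \left( f(X) - \mathbb{E}f \right)} \right],
\qquad
\Pr(\cdot) \le 2 \exp\left( -n\, \Lambda^{*}(\varepsilon) \right)
```

Since Λ*(ε) ≥ 2ε² for bounded f taking values in an interval of length 1 (by Hoeffding's lemma), the Cramér-transform exponent can only tighten the bound for a fixed distribution.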