Topic: Statistical learning theory

About: Statistical learning theory is a research topic. Over its lifetime, 1,618 publications have been published within this topic, receiving 158,033 citations.


Papers
Journal Article
TL;DR: This study investigates why mixup works well from the viewpoint of statistical learning theory, reveals how the effect of mixup changes in each situation, and examines the effects of changes in mixup's parameter.
Abstract: Machine learning models often suffer from the problem of over-fitting. Many data augmentation methods have been proposed to tackle this problem, and one of them is called mixup. Mixup is a recently proposed regularization procedure that linearly interpolates a random pair of training examples. This regularization method works very well experimentally, but its theoretical guarantee has not been adequately discussed. In this study, we aim to discover why mixup works well from the viewpoint of statistical learning theory. In addition, we reveal how the effect of mixup changes in each situation. Furthermore, we also investigate the effects of changes in the parameter of mixup. Our work contributes to searching for optimal parameters and to estimating the effects of the parameters currently used. The results of this study provide a theoretical clarification of when and how effective the regularization by mixup is.

5 citations
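
A minimal NumPy sketch of the mixup operation the abstract describes: each augmented example is a convex combination of a random pair of training examples, with the mixing coefficient drawn from a Beta(alpha, alpha) distribution. The array shapes, the one-hot label convention, and the alpha default below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Mixup augmentation: convex combination of a random pair of examples.

    x : array of shape (batch, ...) with input features
    y : array of shape (batch, num_classes) with one-hot labels
    alpha : Beta-distribution parameter controlling interpolation strength
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)        # mixing coefficient in (0, 1)
    perm = rng.permutation(len(x))      # random pairing of examples
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix
```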

Proceedings ArticleDOI
03 Jun 2007
TL;DR: A support vector machine (SVM) regression approach is introduced for table-based nonlinear modeling of field-effect transistors (FETs), and experimental results validate its ability to predict electrical performance.
Abstract: A support vector machine (SVM) regression approach is introduced in this paper for table-based nonlinear modeling of field-effect transistors (FETs). The support vector machine, which is based on statistical learning theory and the structural risk minimization (SRM) principle, offers good generalization ability. For the purpose of demonstration, a table-based SVM regression model is established using sets of training and testing data produced by an available empirical nonlinear model of a SiC MESFET. Experimental results are also given to validate its ability to predict electrical performance.

5 citations
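
As a rough illustration of the table-based SVM regression modeling described above, the sketch below fits an RBF-kernel support vector regressor to synthetic bias-point data standing in for a device look-up table. The synthetic I-V surface, the hyperparameter values, and the use of scikit-learn are assumptions made for demonstration, not the paper's setup.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for a device look-up table: bias voltages -> drain current.
rng = np.random.default_rng(0)
X = rng.uniform([-2.0, 0.0], [0.0, 10.0], size=(200, 2))  # (Vgs, Vds) samples
y = np.tanh(X[:, 1]) * (X[:, 0] + 2.0) ** 2               # toy nonlinear I-V surface

# RBF-kernel SVR; C and epsilon would normally be tuned on held-out table entries.
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.01))
model.fit(X, y)

print(model.predict([[-1.0, 5.0]]))  # predicted drain current at one bias point
```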

Book ChapterDOI
01 Jan 2013
TL;DR: The authors present an overview of statistical learning theory, describe key results regarding uniform convergence of empirical means and the related sample complexity, and extend the probability inequalities studied in Chap. 8 from a fixed function to parameterized families of functions.
Abstract: This chapter presents an overview of statistical learning theory, and describes key results regarding uniform convergence of empirical means and related sample complexity. This theory provides a fundamental extension of the probability inequalities studied in Chap. 8 to the case when parameterized families of functions are considered, instead of a fixed function. The chapter formally studies the UCEM (uniform convergence of empirical means) property and the VC dimension in the context of the Vapnik–Chervonenkis theory. Extensions to the Pollard theory for continuous-valued functions are also discussed.

5 citations
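
For concreteness, one standard uniform-convergence statement from Vapnik–Chervonenkis theory, quoted here in a common textbook form rather than necessarily the chapter's exact formulation, bounds the deviation between empirical means and expectations uniformly over a class of binary-valued functions via its growth function and VC dimension d:

```latex
% Classical VC uniform-convergence bound (constants differ between sources)
\Pr\left[\,\sup_{f \in \mathcal{F}}
  \left|\frac{1}{n}\sum_{i=1}^{n} f(X_i) - \mathbb{E}[f(X)]\right| > \varepsilon \right]
\;\le\; 4\,\Pi_{\mathcal{F}}(2n)\, e^{-n\varepsilon^{2}/8},
\qquad
\Pi_{\mathcal{F}}(n) \le \left(\frac{en}{d}\right)^{d} \quad (n \ge d,\ \text{Sauer's lemma}).
```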

Journal ArticleDOI
TL;DR: It is proved that, in a quasi-regular case, the two birational invariants are equal to each other, so that the symmetry of the generalization and training errors holds, and that the quasi-regular case is useful for studying statistical learning theory.
Abstract: Many learning machines such as normal mixtures and layered neural networks are not regular but singular statistical models, because the map from a parameter to a probability distribution is not one-to-one. The conventional statistical asymptotic theory cannot be applied to such learning machines because the likelihood function cannot be approximated by any normal distribution. Recently, a new statistical theory has been established based on algebraic geometry, and it was clarified that the generalization and training errors are determined by two birational invariants, the real log canonical threshold and the singular fluctuation. However, their concrete values remain unknown. In the present paper, we propose a new concept, the quasi-regular case in statistical learning theory. A quasi-regular case is not a regular case but a singular one; however, it has the same property as a regular case. In fact, we prove that, in a quasi-regular case, the two birational invariants are equal to each other, so that the symmetry of the generalization and training errors holds. Moreover, because the concrete values of the two birational invariants are explicitly obtained, the quasi-regular case is useful for studying statistical learning theory.

5 citations
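
For orientation, in the algebraic-geometric theory the abstract refers to (Watanabe's singular learning theory), the Bayes generalization error G_n and training error T_n are governed asymptotically by the real log canonical threshold lambda and the singular fluctuation nu. One commonly stated form, quoted from the general theory rather than from this paper, is:

```latex
% Asymptotics of Bayes generalization and training errors in singular learning theory
% (one standard formulation; \lambda = real log canonical threshold, \nu = singular fluctuation)
\mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),
\qquad
\mathbb{E}[T_n] = \frac{\lambda - 2\nu}{n} + o\!\left(\frac{1}{n}\right).
```

Under this formulation the sum of the two expectations is 2(lambda - nu)/n up to lower-order terms, so the symmetry between generalization and training errors holds exactly when the two invariants coincide, which is the defining property of the quasi-regular case discussed above.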

Proceedings Article
01 Dec 2011
TL;DR: A new method of quantifying information, effective information, is presented that links algorithmic information to Shannon information, and links both to capacities arising in statistical learning theory.
Abstract: There are (at least) three approaches to quantifying information. The first, algorithmic information or Kolmogorov complexity, takes events as strings and, given a universal Turing machine, quantifies the information content of a string as the length of the shortest program producing it. The second, Shannon information, takes events as belonging to ensembles and quantifies the information resulting from observing the given event in terms of the number of alternate events that have been ruled out. The third, statistical learning theory, has introduced measures of capacity that control (in part) the expected risk of classifiers. These capacities quantify the expectations regarding future data that learning algorithms embed into classifiers. This note describes a new method of quantifying information, effective information, that links algorithmic information to Shannon information, and also links both to capacities arising in statistical learning theory. After introducing the measure, we show that it provides a non-universal analog of Kolmogorov complexity. We then apply it to derive basic capacities in statistical learning theory: empirical VC-entropy and empirical Rademacher complexity. A nice byproduct of our approach is an interpretation of the explanatory power of a learning algorithm in terms of the number of hypotheses it falsifies, counted in two different ways for the two capacities. We also discuss how effective information relates to information gain, Shannon and mutual information.

5 citations
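
One of the capacities mentioned in the abstract, empirical Rademacher complexity, can be estimated for a finite hypothesis class by Monte Carlo averaging over random sign vectors. The sketch below is an illustration under that finite-class assumption; it is not the paper's construction of effective information.

```python
import numpy as np

def empirical_rademacher(values, n_draws=1000, rng=None):
    """Monte Carlo estimate of the empirical Rademacher complexity of a finite class.

    values : array of shape (n_hypotheses, n_samples); row h holds
             h(x_1), ..., h(x_n) for one hypothesis on the fixed sample.
    Estimates E_sigma[ max_h (1/n) * sum_i sigma_i * h(x_i) ].
    """
    rng = np.random.default_rng() if rng is None else rng
    n = values.shape[1]
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))  # Rademacher sign vectors
    corr = sigma @ values.T / n                          # (n_draws, n_hypotheses)
    return corr.max(axis=1).mean()                       # best correlation, averaged

# Example: 8 random {0,1}-valued hypotheses evaluated on 100 sample points.
rng = np.random.default_rng(0)
H = rng.integers(0, 2, size=(8, 100)).astype(float)
print(empirical_rademacher(H, rng=rng))
```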


Network Information

Related Topics (5)
Artificial neural network: 207K papers, 4.5M citations, 86% related
Cluster analysis: 146.5K papers, 2.9M citations, 82% related
Feature extraction: 111.8K papers, 2.1M citations, 81% related
Optimization problem: 96.4K papers, 2.1M citations, 80% related
Fuzzy logic: 151.2K papers, 2.3M citations, 79% related
Performance Metrics

No. of papers in the topic in previous years:
Year    Papers
2023    9
2022    19
2021    59
2020    69
2019    72
2018    47