Topic

Statistical learning theory

About: Statistical learning theory is a research topic. Over its lifetime, 1,618 publications on this topic have been published, receiving 158,033 citations.


Papers
Proceedings ArticleDOI
04 Dec 2013
TL;DR: This paper presents and evaluates a model that merges the advantages of graph regularization and kernel machines for transductive classification problems: a classifier that takes the feature vectors as input while being smooth over the network connections.
Abstract: Classical foundations of Statistical Learning Theory rely on the assumption that the input patterns are independently and identically distributed. However, in many applications the inputs, represented as feature vectors, are also embedded into a network of pairwise relations. Transductive approaches like graph regularization rely on the network topology without considering the feature vectors. Semi-supervised approaches like Manifold Regularization learn a function that takes the feature vectors as input while being smooth over the network connections. In this latter case, the connectivity information is processed at training time but is still neglected during generalization, as the final classification decision takes only the feature vector representations as input. This paper presents and evaluates a model merging the advantages of graph regularization and kernel machines for transductive classification problems.
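To make "smooth over the network connections" concrete, here is a minimal sketch of Laplacian-regularized kernel least squares (manifold regularization in the spirit of LapRLS), written in numpy. The RBF kernel, the function names, and the regularization weights are illustrative assumptions; this is not the specific model evaluated in the paper.

```python
# Minimal sketch: a kernel machine over feature vectors, regularized to be
# smooth over the network via the graph Laplacian (LapRLS-style).
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """RBF kernel matrix over the feature vectors (an illustrative choice)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def laplacian_rls(X, y_labeled, labeled_idx, W,
                  lam_ambient=1e-2, lam_manifold=1e-2, gamma=1.0):
    """Fit f(x) = sum_i alpha_i K(x, x_i) over labeled and unlabeled nodes.

    X           : (n, d) feature vectors for all nodes
    y_labeled   : (l,) labels in {-1, +1} for the labeled subset
    labeled_idx : integer indices of the labeled nodes inside X
    W           : (n, n) symmetric adjacency matrix of the network
    """
    n = X.shape[0]
    K = rbf_kernel(X, gamma)               # kernel over the feature vectors
    L = np.diag(W.sum(axis=1)) - W         # graph Laplacian over the network
    J = np.zeros((n, n))
    J[labeled_idx, labeled_idx] = 1.0      # selects the labeled rows
    y = np.zeros(n)
    y[labeled_idx] = y_labeled
    # Stationarity condition of
    #   min_alpha ||J (K alpha - y)||^2 + lam_ambient * alpha' K alpha
    #                                   + lam_manifold * alpha' K L K alpha
    # (normalizing constants absorbed into the two lambdas for this sketch).
    A = J @ K + lam_ambient * np.eye(n) + lam_manifold * (L @ K)
    alpha = np.linalg.solve(A, y)
    return alpha, K

# Usage: alpha, K = laplacian_rls(...); node j is classified by sign((K @ alpha)[j]).
```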
Posted Content
TL;DR: Under the assumption that all classification tasks are sampled from the same meta-distribution, margin theory and statistical learning theory are used to establish three margin-based transfer bounds for meta-learning based multiclass classification (MLMC), revealing that the expected error of a given classification algorithm for a future task can be estimated with the average empirical error on a finite number of previous tasks.
Abstract: By transferring knowledge learned from seen/previous tasks, meta-learning aims to generalize well to unseen/future tasks. Existing meta-learning approaches have shown promising empirical performance on various multiclass classification problems, but few provide theoretical analysis of the classifiers' generalization ability on future tasks. In this paper, under the assumption that all classification tasks are sampled from the same meta-distribution, we leverage margin theory and statistical learning theory to establish three margin-based transfer bounds for meta-learning based multiclass classification (MLMC). These bounds reveal that the expected error of a given classification algorithm on a future task can be estimated by the average empirical error on a finite number of previous tasks, uniformly over a class of preprocessing feature maps/deep neural networks (i.e., deep feature embeddings). To validate these bounds, instead of the commonly used cross-entropy loss, a multi-margin loss is employed to train a number of representative MLMC models. Experiments on three benchmarks show that these margin-based models still achieve competitive performance, validating the practical value of our margin-based theoretical analysis.
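As a concrete reference for the multi-margin loss mentioned above, below is a minimal numpy sketch of the standard multiclass margin (hinge) loss, the formulation implemented by, e.g., torch.nn.MultiMarginLoss with p=1; the exact loss used in the paper may differ.

```python
# Minimal sketch of a multi-margin (multiclass hinge) loss in numpy.
import numpy as np

def multi_margin_loss(scores, targets, margin=1.0):
    """scores: (batch, num_classes) class scores; targets: (batch,) integer labels."""
    batch = np.arange(scores.shape[0])
    true_scores = scores[batch, targets][:, None]            # score of the correct class
    hinge = np.maximum(0.0, margin - true_scores + scores)   # per-class margin violations
    hinge[batch, targets] = 0.0                              # the true class is not penalized
    return (hinge.sum(axis=1) / scores.shape[1]).mean()      # average over classes, then batch
```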
Proceedings ArticleDOI
01 Jan 2017
TL;DR: This work proposes three algorithms that generate a hierarchical hypothesis structure to achieve the goal of hypothesis selection, in which two hypotheses are combined based on a particular criterion.
Abstract: One of the goals of machine learning research is to improve classification accuracy. Over the past decades, many studies have focused on developing novel algorithms, guided by problem domains and statistical learning theory, to continuously improve classification performance. Recently, many researchers have found that a performance bottleneck often occurs when only a single classification algorithm is used, since each algorithm has its strengths but also its weaknesses. Ensemble learning, which combines several classifiers or hypotheses into a strong classifier or learner, relies on the combination of various hypotheses rather than on a single state-of-the-art algorithm. In ensemble learning, hypothesis selection is crucial to performance, and the diversity of the selected hypotheses is an important selection criterion. This work proposes three algorithms that generate a hierarchical hypothesis structure to achieve the goal of hypothesis selection, in which two hypotheses are combined based on a particular criterion. We conduct experiments on eight data sets, and the results indicate that the proposed method outperforms random forest, a state-of-the-art method.
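The abstract does not spell out the three hierarchical algorithms, so purely as an illustration of diversity as a selection criterion, the sketch below scores hypothesis pairs by their disagreement on a validation set and merges the most diverse pair with a soft vote. All function names and the combination rule are assumptions.

```python
# Illustrative sketch of diversity-driven pairwise hypothesis combination;
# not a reproduction of the paper's three algorithms.
import numpy as np
from itertools import combinations

def disagreement(pred_a, pred_b):
    """Fraction of validation points on which two hypotheses disagree."""
    return float(np.mean(pred_a != pred_b))

def most_diverse_pair(val_predictions):
    """val_predictions: dict mapping hypothesis name -> (n_val,) predicted labels."""
    return max(combinations(val_predictions, 2),
               key=lambda pair: disagreement(val_predictions[pair[0]],
                                             val_predictions[pair[1]]))

def combine_pair(proba_a, proba_b, weight=0.5):
    """Merge two hypotheses' class-probability outputs into a single prediction."""
    merged = weight * proba_a + (1.0 - weight) * proba_b
    return merged.argmax(axis=1)
```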
Book ChapterDOI
Saburou Saitoh1
01 Jan 2003
TL;DR: In this paper, a general formula based on the general theory of reproducing kernels combined with linear mappings in the framework of Hilbert spaces is proposed. But this formula is not applicable to the problem of regression estimation.
Abstract: In statistical learning theory, reproducing kernel Hilbert spaces are used basically as the hypothesis space in the approximation of the regression function. In this paper, in connection with a basic formula by S. Smale and D. X. Zhou, which is fundamental in approximation error estimates, we shall give a general formula based on the general theory of reproducing kernels combined with linear mappings in the framework of Hilbert spaces. We shall give a prototype example.
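For context, and as standard background rather than the paper's general formula, the regularized least-squares setting in a reproducing kernel Hilbert space $\mathcal{H}_K$, which the Smale-Zhou approximation error estimates concern, can be recalled as
\[
f(x) = \langle f, K_x \rangle_{\mathcal{H}_K} \quad \text{for all } f \in \mathcal{H}_K, \qquad K_x := K(x, \cdot),
\]
\[
f_{\mathbf{z},\lambda} = \arg\min_{f \in \mathcal{H}_K} \frac{1}{m} \sum_{i=1}^{m} \bigl( f(x_i) - y_i \bigr)^2 + \lambda \| f \|_K^2,
\qquad
f_{\mathbf{z},\lambda}(x) = \sum_{i=1}^{m} c_i \, K(x, x_i),
\]
where the first identity is the reproducing property, the last is the representer theorem, and the approximation error quantifies how well the regression function can be approximated within $\mathcal{H}_K$.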
01 Jan 2018
TL;DR: This thesis documents three different contributions in statistical learning theory, which concern themselves with advancements in information theory, dimension reduction and density estimation - three foundational topics in statistical theory with a plethora of applications in both practical problems and development of other aspects of statistical methodology.
Abstract: Author(s): Saha, Sujayam | Advisor(s): Yu, Bin; Guntuboyina, Aditya | Abstract: This thesis documents three different contributions in statistical learning theory. They were developed with careful emphasis on addressing the demands of modern statistical analysis upon large-scale modern datasets. The contributions concern themselves with advancements in information theory, dimension reduction and density estimation - three foundational topics in statistical theory with a plethora of applications in both practical problems and the development of other aspects of statistical methodology. In Chapter \ref{chapter:fdiv}, I describe the development of a unifying treatment of the study of inequalities between $f$-divergences, which are a general class of divergences between probability measures that include as special cases many commonly used divergences in probability, mathematical statistics and information theory, such as the Kullback-Leibler divergence, chi-squared divergence, squared Hellinger distance, total variation distance, etc. In contrast with previous research in this area, we study the problem of obtaining sharp inequalities between $f$-divergences in full generality. In particular, our main results allow $m$ to be an arbitrary positive integer and all the divergences $D_f$ and $D_{f_1}, \dots, D_{f_m}$ to be arbitrary $f$-divergences. We show that the underlying optimization problems can be reduced to low-dimensional optimization problems, and we outline methods for solving them. We also show that many of the existing results on inequalities between $f$-divergences can be obtained as special cases of our results, and we improve on some existing non-sharp inequalities. In Chapter \ref{chapter:srp}, I describe the development of a new dimension reduction technique specially suited for interpretable inference in supervised learning problems involving large-dimensional data. This new technique, Supervised Random Projections (SRP), is introduced with the goal of ensuring that, in comparison to ordinary dimension reduction, the compressed data is more relevant to the response variable at hand in a supervised learning problem. By incorporating variable importances, we explicate that the compressed data should still accurately explain the response variable, thus lending more interpretability to the dimension reduction step. Further, variable importances ensure that even in the presence of numerous nuisance parameters, the projected data retains at least a moderate amount of information from the important variables, thus allowing said important variables a fair chance at being selected by downstream formal tests of hypotheses. In Chapter \ref{chapter:npmle}, I describe the development of several adaptivity properties of the Non-Parametric Maximum Likelihood Estimator (NPMLE) in the problem of estimating an unknown Gaussian location mixture density based on independent identically distributed observations. Further, I explore the role of the NPMLE in the problem of denoising normal means, i.e., the problem of estimating unknown means based on observations, which has been studied widely. In this problem, I prove that the Generalized Maximum Likelihood Empirical Bayes estimator (GMLEB) approximates the Oracle Bayes estimator at adaptive parametric rates up to additional logarithmic factors in expected squared $\ell_2$ norm.
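For reference, and as textbook background rather than a result of the thesis, the $f$-divergences studied in the first chapter are defined, for a convex function $f$ with $f(1) = 0$, by
\[
D_f(P \,\|\, Q) = \int f\!\left( \frac{dP}{dQ} \right) dQ,
\]
with the Kullback-Leibler divergence ($f(t) = t \log t$), the chi-squared divergence ($f(t) = (t-1)^2$), the squared Hellinger distance ($f(t) = (1 - \sqrt{t})^2$), and the total variation distance ($f(t) = \tfrac{1}{2}|t - 1|$) recovered as special cases.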

Network Information
Related Topics (5)

Topic                       Papers    Citations    Relatedness
Artificial neural network   207K      4.5M         86%
Cluster analysis            146.5K    2.9M         82%
Feature extraction          111.8K    2.1M         81%
Optimization problem        96.4K     2.1M         80%
Fuzzy logic                 151.2K    2.3M         79%
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    9
2022    19
2021    59
2020    69
2019    72
2018    47