Topic

Statistical learning theory

About: Statistical learning theory is a research topic. Over its lifetime, 1,618 publications on this topic have been published, receiving 158,033 citations.


Papers
Proceedings ArticleDOI
04 Dec 2013
TL;DR: This paper presents and evaluates a model that merges the advantages of graph regularization and kernel machines for transductive classification problems: a classifier that takes the feature vectors as input while being smooth over the network connections.
Abstract: Classical foundations of Statistical Learning Theory rely on the assumption that the input patterns are independently and identically distributed. However, in many applications the inputs, represented as feature vectors, are also embedded into a network of pairwise relations. Transductive approaches like graph regularization rely on the network topology without considering the feature vectors. Semi-supervised approaches like Manifold Regularization learn a function that takes the feature vectors as input while being smooth over the network connections. In this latter case, the connectivity information is processed at training time but is still neglected during generalization, as the final classification decision takes only the feature vector representations as input. This paper presents and evaluates a model merging the advantages of graph regularization and kernel machines for transductive classification problems.
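To make "smooth over the network connections" concrete, here is a minimal sketch of Laplacian-regularized kernel least squares (manifold regularization in the spirit of LapRLS), written in numpy. The RBF kernel, the function names, and the regularization weights are illustrative assumptions; this is not the specific model evaluated in the paper.

```python
# Minimal sketch: a kernel machine over feature vectors, regularized to be
# smooth over the network via the graph Laplacian (LapRLS-style).
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """RBF kernel matrix over the feature vectors (an illustrative choice)."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def laplacian_rls(X, y_labeled, labeled_idx, W,
                  lam_ambient=1e-2, lam_manifold=1e-2, gamma=1.0):
    """Fit f(x) = sum_i alpha_i K(x, x_i) over labeled and unlabeled nodes.

    X           : (n, d) feature vectors for all nodes
    y_labeled   : (l,) labels in {-1, +1} for the labeled subset
    labeled_idx : integer indices of the labeled nodes inside X
    W           : (n, n) symmetric adjacency matrix of the network
    """
    n = X.shape[0]
    K = rbf_kernel(X, gamma)               # kernel over the feature vectors
    L = np.diag(W.sum(axis=1)) - W         # graph Laplacian over the network
    J = np.zeros((n, n))
    J[labeled_idx, labeled_idx] = 1.0      # selects the labeled rows
    y = np.zeros(n)
    y[labeled_idx] = y_labeled
    # Stationarity condition of
    #   min_alpha ||J (K alpha - y)||^2 + lam_ambient * alpha' K alpha
    #                                   + lam_manifold * alpha' K L K alpha
    # (normalizing constants absorbed into the two lambdas for this sketch).
    A = J @ K + lam_ambient * np.eye(n) + lam_manifold * (L @ K)
    alpha = np.linalg.solve(A, y)
    return alpha, K

# Usage: alpha, K = laplacian_rls(...); node j is classified by sign((K @ alpha)[j]).
```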
Posted Content
TL;DR: Under the assumption that all classification tasks are sampled from the same meta-distribution, margin theory and statistical learning theory are used to establish three margin-based transfer bounds for meta-learning based multiclass classification (MLMC), revealing that the expected error of a given classification algorithm for a future task can be estimated with the average empirical error on a finite number of previous tasks.
Abstract: By transferring knowledge learned from seen/previous tasks, meta-learning aims to generalize well to unseen/future tasks. Existing meta-learning approaches have shown promising empirical performance on various multiclass classification problems, but few provide theoretical analysis of the classifiers' generalization ability on future tasks. In this paper, under the assumption that all classification tasks are sampled from the same meta-distribution, we leverage margin theory and statistical learning theory to establish three margin-based transfer bounds for meta-learning based multiclass classification (MLMC). These bounds reveal that the expected error of a given classification algorithm on a future task can be estimated by the average empirical error on a finite number of previous tasks, uniformly over a class of preprocessing feature maps/deep neural networks (i.e., deep feature embeddings). To validate these bounds, instead of the commonly used cross-entropy loss, a multi-margin loss is employed to train a number of representative MLMC models. Experiments on three benchmarks show that these margin-based models still achieve competitive performance, validating the practical value of our margin-based theoretical analysis.
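As a concrete reference for the multi-margin loss mentioned above, below is a minimal numpy sketch of the standard multiclass margin (hinge) loss, the formulation implemented by, e.g., torch.nn.MultiMarginLoss with p=1; the exact loss used in the paper may differ.

```python
# Minimal sketch of a multi-margin (multiclass hinge) loss in numpy.
import numpy as np

def multi_margin_loss(scores, targets, margin=1.0):
    """scores: (batch, num_classes) class scores; targets: (batch,) integer labels."""
    batch = np.arange(scores.shape[0])
    true_scores = scores[batch, targets][:, None]            # score of the correct class
    hinge = np.maximum(0.0, margin - true_scores + scores)   # per-class margin violations
    hinge[batch, targets] = 0.0                              # the true class is not penalized
    return (hinge.sum(axis=1) / scores.shape[1]).mean()      # average over classes, then batch
```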
Proceedings ArticleDOI
01 Jan 2017
TL;DR: This work proposes three algorithms that generate a hierarchical hypothesis structure to achieve the goal of hypothesis selection, in which two hypotheses are combined based on a particular criterion.
Abstract: One of the goals of machine learning research is to improve classification accuracy. Over the past decades, many studies have focused on developing novel algorithms, guided by problem domains and statistical learning theory, to continuously improve classification performance. Recently, many researchers have found that a performance bottleneck often occurs when only a single classification algorithm is used, since each algorithm has its strengths but also its weaknesses. Ensemble learning, which combines several classifiers or hypotheses into a strong classifier or learner, relies on the combination of various hypotheses rather than on a single state-of-the-art algorithm. In ensemble learning, hypothesis selection is crucial to performance, and the diversity of the selected hypotheses is an important selection criterion. This work proposes three algorithms that generate a hierarchical hypothesis structure to achieve the goal of hypothesis selection, in which two hypotheses are combined based on a particular criterion. We conduct experiments on eight data sets, and the results indicate that the proposed method outperforms random forest, a state-of-the-art method.
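The abstract does not spell out the three hierarchical algorithms, so purely as an illustration of diversity as a selection criterion, the sketch below scores hypothesis pairs by their disagreement on a validation set and merges the most diverse pair with a soft vote. All function names and the combination rule are assumptions.

```python
# Illustrative sketch of diversity-driven pairwise hypothesis combination;
# not a reproduction of the paper's three algorithms.
import numpy as np
from itertools import combinations

def disagreement(pred_a, pred_b):
    """Fraction of validation points on which two hypotheses disagree."""
    return float(np.mean(pred_a != pred_b))

def most_diverse_pair(val_predictions):
    """val_predictions: dict mapping hypothesis name -> (n_val,) predicted labels."""
    return max(combinations(val_predictions, 2),
               key=lambda pair: disagreement(val_predictions[pair[0]],
                                             val_predictions[pair[1]]))

def combine_pair(proba_a, proba_b, weight=0.5):
    """Merge two hypotheses' class-probability outputs into a single prediction."""
    merged = weight * proba_a + (1.0 - weight) * proba_b
    return merged.argmax(axis=1)
```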
Book ChapterDOI
Saburou Saitoh1
01 Jan 2003
TL;DR: In this paper, a general formula based on the general theory of reproducing kernels combined with linear mappings in the framework of Hilbert spaces is proposed. But this formula is not applicable to the problem of regression estimation.
Abstract: In statistical learning theory, reproducing kernel Hilbert spaces are used basically as the hypothesis space in the approximation of the regression function. In this paper, in connection with a basic formula by S. Smale and D. X. Zhou, which is fundamental in approximation error estimates, we shall give a general formula based on the general theory of reproducing kernels combined with linear mappings in the framework of Hilbert spaces. We shall give a prototype example.
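For context, and as standard background rather than the paper's general formula, the regularized least-squares setting in a reproducing kernel Hilbert space $\mathcal{H}_K$, which the Smale-Zhou approximation error estimates concern, can be recalled as
\[
f(x) = \langle f, K_x \rangle_{\mathcal{H}_K} \quad \text{for all } f \in \mathcal{H}_K, \qquad K_x := K(x, \cdot),
\]
\[
f_{\mathbf{z},\lambda} = \arg\min_{f \in \mathcal{H}_K} \frac{1}{m} \sum_{i=1}^{m} \bigl( f(x_i) - y_i \bigr)^2 + \lambda \| f \|_K^2,
\qquad
f_{\mathbf{z},\lambda}(x) = \sum_{i=1}^{m} c_i \, K(x, x_i),
\]
where the first identity is the reproducing property, the last is the representer theorem, and the approximation error quantifies how well the regression function can be approximated within $\mathcal{H}_K$.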
01 Jan 2018
TL;DR: This thesis documents three different contributions in statistical learning theory, which concern themselves with advancements in information theory, dimension reduction and density estimation - three foundational topics in statistical theory with a plethora of applications in both practical problems and development of other aspects of statistical methodology.
Abstract: Author(s): Saha, Sujayam | Advisor(s): Yu, Bin; Guntuboyina, Aditya | Abstract: This thesis documents three different contributions in statistical learning theory. They were developed with careful emphasis on addressing the demands of modern statistical analysis upon large-scale modern datasets. The contributions concern themselves with advancements in information theory, dimension reduction and density estimation - three foundational topics in statistical theory with a plethora of applications in both practical problems and the development of other aspects of statistical methodology. In Chapter \ref{chapter:fdiv}, I describe the development of a unifying treatment of the study of inequalities between $f$-divergences, which are a general class of divergences between probability measures that include as special cases many commonly used divergences in probability, mathematical statistics and information theory, such as the Kullback-Leibler divergence, chi-squared divergence, squared Hellinger distance, total variation distance, etc. In contrast with previous research in this area, we study the problem of obtaining sharp inequalities between $f$-divergences in full generality. In particular, our main results allow $m$ to be an arbitrary positive integer and all the divergences $D_f$ and $D_{f_1}, \dots, D_{f_m}$ to be arbitrary $f$-divergences. We show that the underlying optimization problems can be reduced to low-dimensional optimization problems, and we outline methods for solving them. We also show that many of the existing results on inequalities between $f$-divergences can be obtained as special cases of our results, and we improve on some existing non-sharp inequalities. In Chapter \ref{chapter:srp}, I describe the development of a new dimension reduction technique specially suited for interpretable inference in supervised learning problems involving large-dimensional data. This new technique, Supervised Random Projections (SRP), is introduced with the goal of ensuring that, in comparison to ordinary dimension reduction, the compressed data is more relevant to the response variable at hand in a supervised learning problem. By incorporating variable importances, we explicate that the compressed data should still accurately explain the response variable, thus lending more interpretability to the dimension reduction step. Further, variable importances ensure that even in the presence of numerous nuisance parameters, the projected data retains at least a moderate amount of information from the important variables, thus allowing said important variables a fair chance at being selected by downstream formal tests of hypotheses. In Chapter \ref{chapter:npmle}, I describe the development of several adaptivity properties of the Non-Parametric Maximum Likelihood Estimator (NPMLE) in the problem of estimating an unknown Gaussian location mixture density based on independent identically distributed observations. Further, I explore the role of the NPMLE in the problem of denoising normal means, i.e., the problem of estimating unknown means based on observations, which has been studied widely. In this problem, I prove that the Generalized Maximum Likelihood Empirical Bayes estimator (GMLEB) approximates the Oracle Bayes estimator at adaptive parametric rates up to additional logarithmic factors in expected squared $\ell_2$ norm.
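For reference, and as textbook background rather than a result of the thesis, the $f$-divergences studied in the first chapter are defined, for a convex function $f$ with $f(1) = 0$, by
\[
D_f(P \,\|\, Q) = \int f\!\left( \frac{dP}{dQ} \right) dQ,
\]
with the Kullback-Leibler divergence ($f(t) = t \log t$), the chi-squared divergence ($f(t) = (t-1)^2$), the squared Hellinger distance ($f(t) = (1 - \sqrt{t})^2$), and the total variation distance ($f(t) = \tfrac{1}{2}|t - 1|$) recovered as special cases.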

Network Information
Related Topics (5)

Topic                       Papers    Citations    Relatedness
Artificial neural network   207K      4.5M         86%
Cluster analysis            146.5K    2.9M         82%
Feature extraction          111.8K    2.1M         81%
Optimization problem        96.4K     2.1M         80%
Fuzzy logic                 151.2K    2.3M         79%
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    9
2022    19
2021    59
2020    69
2019    72
2018    47