scispace - formally typeset
Topic

Statistical learning theory

About: Statistical learning theory is a research topic. Over its lifetime, 1,618 publications have been published within this topic, receiving 158,033 citations.


Papers
Posted Content
TL;DR: It is shown how a probabilistic program can be automatically represented in a theorem prover using the concept of reparameterization, and how some of the tedious proofs of measurability can be generated automatically from the probabilistic program.
Abstract: As machine learning is increasingly used in essential systems, it is important to reduce or eliminate the incidence of serious bugs. A growing body of research has developed machine learning algorithms with formal guarantees about performance, robustness, or fairness. Yet, the analysis of these algorithms is often complex, and implementing such systems in practice introduces room for error. Proof assistants can be used to formally verify machine learning systems by constructing machine checked proofs of correctness that rule out such bugs. However, reasoning about probabilistic claims inside of a proof assistant remains challenging. We show how a probabilistic program can be automatically represented in a theorem prover using the concept of reparameterization, and how some of the tedious proofs of measurability can be generated automatically from the probabilistic program. To demonstrate that this approach is broad enough to handle rather different types of machine learning systems, we verify both a classic result from statistical learning theory (PAC-learnability of decision stumps) and prove that the null model used in a Bayesian hypothesis test satisfies a fairness criterion called demographic parity.

2 citations
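The classic result verified above bounds how many samples suffice for PAC learning. As a hedged illustration (not the paper's formalization), the sketch below computes one common textbook form of the VC sample-complexity upper bound; the constants 4, 12, and 2 vary between presentations, and the VC dimension of 2 for one-dimensional decision stumps (threshold classifiers) is an assumption of this example.

```python
import math

def pac_sample_bound(vc_dim, epsilon, delta):
    """A classic (loose) upper bound on the number of i.i.d. samples
    sufficient to PAC-learn a class of VC dimension `vc_dim` to error
    `epsilon` with confidence 1 - `delta`.  Constants differ between
    textbooks; this uses one common form:
        m >= (4/eps) * (d * ln(12/eps) + ln(2/delta))
    """
    return math.ceil((4.0 / epsilon)
                     * (vc_dim * math.log(12.0 / epsilon)
                        + math.log(2.0 / delta)))

# One-dimensional decision stumps (threshold classifiers) have VC
# dimension 2, so the bound stays modest even for tight parameters.
m = pac_sample_bound(vc_dim=2, epsilon=0.05, delta=0.01)
```

Note the bound grows only logarithmically in 1/delta but roughly as (1/epsilon) log(1/epsilon), so tightening the error tolerance dominates the sample cost.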

Posted Content
TL;DR: In this article, the empirical duality gap is defined as the difference between an approximate, tractable solution and the solution of the original (non-convex) statistical problem.
Abstract: Though learning has become a core technology of modern information processing, there is now ample evidence that it can lead to biased, unsafe, and prejudiced solutions. The need to impose requirements on learning is therefore paramount, especially as it reaches critical applications in social, industrial, and medical domains. However, the non-convexity of most modern learning problems is only exacerbated by the introduction of constraints. Whereas good unconstrained solutions can often be learned using empirical risk minimization (ERM), even obtaining a model that satisfies statistical constraints can be challenging, all the more so a good one. In this paper, we overcome this issue by learning in the empirical dual domain, where constrained statistical learning problems become unconstrained, finite dimensional, and deterministic. We analyze the generalization properties of this approach by bounding the empirical duality gap, i.e., the difference between our approximate, tractable solution and the solution of the original (non-convex) statistical problem, and provide a practical constrained learning algorithm. These results establish a constrained counterpart of classical learning theory and enable the explicit use of constraints in learning. We illustrate this algorithm and theory in rate-constrained learning applications.

2 citations
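The core move in the abstract above is solving a constrained problem through its Lagrangian dual. The toy sketch below illustrates that mechanism on a one-variable problem with a known closed-form solution; it is an illustration of dual ascent in general, not the paper's constrained learning algorithm, and all names are hypothetical.

```python
# Toy dual ascent: minimize (x - 3)^2 subject to x <= 1.
# The constrained optimum is x* = 1 with multiplier lambda* = 4.

def primal_min(lam):
    # argmin_x of the Lagrangian (x - 3)^2 + lam * (x - 1)
    # has the closed form x = 3 - lam / 2.
    return 3.0 - lam / 2.0

lam, step = 0.0, 0.5
for _ in range(200):
    x = primal_min(lam)
    # Ascend on the dual: the gradient w.r.t. lam is the constraint
    # violation (x - 1); project lam back onto lam >= 0.
    lam = max(0.0, lam + step * (x - 1.0))

x = primal_min(lam)
```

The inner minimization here is exact, so the iteration converges geometrically; in learning problems the primal step is itself an (approximate) ERM solve, which is where the empirical duality gap the paper bounds comes from.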

Book ChapterDOI
01 Jan 2012
TL;DR: A new method called support vector estimating and selecting (SVES) is presented, which speeds up SVM training on large sample sets by removing samples that contribute little, are redundant, or are obvious noise.
Abstract: As a method grounded in statistical learning theory, the support vector machine (SVM) has shown many advantages in solving small-sample, nonlinear, and high-dimensional problems, and it has been widely applied in recent years. However, as the number of training samples grows, the training speed of a normal SVM becomes the bottleneck restricting its application. Therefore, this paper presents a new method called support vector estimating and selecting (SVES). It speeds up SVM training on large sample sets by removing samples that contribute little, are redundant, or are obvious noise.

2 citations
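The intuition behind pre-selecting samples before SVM training is that support vectors tend to lie near the opposite class, so interior points can be dropped cheaply. The sketch below is a generic nearest-opposite-class heuristic in that spirit, not the paper's SVES rule; the function name, the distance measure, and the keep ratio are all assumptions of this example.

```python
# Hedged sketch: keep only the samples closest to the other class,
# since candidate support vectors concentrate near the margin.

def filter_candidates(points, labels, keep_ratio=0.5):
    """Keep the `keep_ratio` fraction of samples with the smallest
    squared Euclidean distance to the nearest opposite-class sample."""
    def nearest_opposite(i):
        return min(
            sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            for j in range(len(points)) if labels[j] != labels[i]
        )
    order = sorted(range(len(points)), key=nearest_opposite)
    kept = order[: max(1, int(len(points) * keep_ratio))]
    return sorted(kept)

pts = [(0.0, 0.0), (0.2, 0.1), (0.9, 1.0), (1.1, 0.9), (2.0, 2.0), (-1.0, -1.0)]
ys  = [-1, -1, 1, 1, 1, -1]
kept = filter_candidates(pts, ys, keep_ratio=0.5)  # boundary points survive
```

On this toy set the far-away points (2.0, 2.0) and (-1.0, -1.0) are discarded while the points near the class boundary survive, so a subsequent SVM fit sees a smaller but margin-relevant training set.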

Proceedings ArticleDOI
13 Jun 2005
TL;DR: A classification model for document classification based on statistical learning theory is presented, which adopts organized vectors as the feature vectors of documents, trains the classifier by means of the SVM algorithm, and obtains satisfactory experimental results.
Abstract: Document classification is one of the important steps in document mining. Statistical learning theory provides machine learning methods suited to small sample sets. This paper presents a classification model for document classification based on statistical learning theory. In this model, we adopt organized vectors as the feature vectors of documents, train the classifier by means of the SVM algorithm, and obtain satisfactory experimental results.

2 citations
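The pipeline described above (documents to term vectors to a linear classifier) can be sketched with the standard library alone. The paper trains an SVM; a perceptron is substituted here only because it fits in a few stdlib-only lines, and both learn a separating hyperplane over the same term-frequency vectors. The corpus and all names are hypothetical.

```python
from collections import Counter

# Toy two-class corpus: documents paired with labels.
docs = [("the cat sat on the mat", +1),
        ("a cat and a kitten", +1),
        ("stocks fell on the market", -1),
        ("the market rallied today", -1)]

# Build the vocabulary and represent each document as a
# term-frequency vector over it.
vocab = sorted({w for text, _ in docs for w in text.split()})

def vectorize(text):
    tf = Counter(text.split())
    return [tf[w] for w in vocab]

# Train a perceptron: on each mistake, nudge the hyperplane
# toward the misclassified document.
w = [0.0] * len(vocab)
b = 0.0
for _ in range(20):
    for text, y in docs:
        x = vectorize(text)
        if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
            w = [wi + y * xi for wi, xi in zip(w, x)]
            b += y

def predict(text):
    x = vectorize(text)
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
```

An SVM would pick the maximum-margin hyperplane among the many that separate this data, which is precisely the statistical-learning-theory motivation the abstract cites for small sample sets.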

Book ChapterDOI
12 Apr 2015
TL;DR: A framework to analyze the sample complexity of problems that arise in the study of genomic datasets based on tools from combinatorial analysis and statistical learning theory, showing that sample sizes much larger than currently available may be required to identify all the cancer genes in a pathway.
Abstract: In this work we propose a framework to analyze the sample complexity of problems that arise in the study of genomic datasets. Our framework is based on tools from combinatorial analysis and statistical learning theory that have been used for the analysis of machine learning and probably approximately correct (PAC) learning. We use our framework to analyze the problem of the identification of cancer pathways through mutual exclusivity analysis of mutations from large cancer sequencing studies. We analytically derive matching upper and lower bounds on the sample complexity of the problem, showing that sample sizes much larger than currently available may be required to identify all the cancer genes in a pathway. We also provide two algorithms to find a cancer pathway from a large genomic dataset. On simulated and cancer data, we show that our algorithms can be used to identify cancer pathways from large genomic datasets.

2 citations
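The abstract's headline finding is that required sample sizes can far exceed available cohorts. As a hedged back-of-the-envelope version of that kind of calculation (a generic Hoeffding estimate, not the paper's matching bounds), the sketch below computes how many samples suffice to pin down a binary event frequency, such as a mutation rate, to a given precision.

```python
import math

def hoeffding_sample_size(epsilon, delta):
    """Samples sufficient for the empirical frequency of a binary
    event to be within `epsilon` of its true probability with
    confidence 1 - `delta` (two-sided Hoeffding bound):
        m >= ln(2/delta) / (2 * epsilon^2)
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

# Estimating a mutation frequency to within one percentage point
# at 95% confidence already demands tens of thousands of samples.
m = hoeffding_sample_size(epsilon=0.01, delta=0.05)  # m == 18445
```

The quadratic dependence on 1/epsilon is the driver: halving the tolerance quadruples the required cohort, which is consistent with the abstract's conclusion that current study sizes may be insufficient.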


Network Information
Related Topics (5)
Artificial neural network
207K papers, 4.5M citations
86% related
Cluster analysis
146.5K papers, 2.9M citations
82% related
Feature extraction
111.8K papers, 2.1M citations
81% related
Optimization problem
96.4K papers, 2.1M citations
80% related
Fuzzy logic
151.2K papers, 2.3M citations
79% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    9
2022    19
2021    59
2020    69
2019    72
2018    47