A comparative study on sampling with replacement vs Poisson sampling in optimal subsampling

Proceedings Article

18 Mar 2021-pp 289-297

About: This article was published in the International Conference on Artificial Intelligence and Statistics on 2021-03-18 and is currently open access. The article focuses on the topics: Poisson sampling & Simple random sample.

References

Journal Article

R: A language and environment for statistical computing.

R Core Team

01 Jan 2014-MSOR connections

Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

272,030 citations

UCI Machine Learning Repository

A. Asuncion

01 Jan 2007

17,341 citations

Journal Article

Searching for exotic particles in high-energy physics with deep learning

Pierre Baldi¹, Peter Sadowski¹, Daniel Whiteson¹

University of California, Irvine¹

02 Jul 2014-Nature Communications

TL;DR: It is shown that deep-learning methods need no manually constructed inputs and yet improve the classification metric by as much as 8% over the best current approaches, demonstrating that deep learning approaches can improve the power of collider searches for exotic particles.

Abstract: Collisions at high-energy particle colliders are a traditionally fruitful source of exotic particle discoveries. Finding these rare particles requires solving difficult signal-versus-background classification problems, hence machine-learning approaches are often used. Standard approaches have relied on 'shallow' machine-learning models that have a limited capacity to learn complex nonlinear functions of the inputs, and rely on a painstaking search through manually constructed nonlinear features. Progress on this problem has slowed, as a variety of techniques have shown equivalent performance. Recent advances in the field of deep learning make it possible to learn more complex functions and better discriminate between signal and background classes. Here, using benchmark data sets, we show that deep-learning methods need no manually constructed inputs and yet improve the classification metric by as much as 8% over the best current approaches. This demonstrates that deep-learning approaches can improve the power of collider searches for exotic particles.

1,175 citations

Book

Optimum Experimental Designs, with SAS

Anthony C. Atkinson, A. N. Donev, Randall D. Tobias

01 Jan 2007

TL;DR: This book presents the theory and methods of optimum experimental design, making them available through the use of SAS programs, and stresses the importance of models in the analysis of data and introduces least squares fitting and simple optimum experimental designs.

Abstract: Experiments on patients, processes or plants all have random error, making statistical methods essential for their efficient design and analysis. This book presents the theory and methods of optimum experimental design, making them available through the use of SAS programs. Little previous statistical knowledge is assumed. The first part of the book stresses the importance of models in the analysis of data and introduces least squares fitting and simple optimum experimental designs. The second part presents a more detailed discussion of the general theory and of a wide variety of experiments. The book stresses the use of SAS to provide hands-on solutions for the construction of designs in both standard and non-standard situations. The mathematical theory of the designs is developed in parallel with their construction in SAS, so providing motivation for the development of the subject. Many chapters cover self-contained topics drawn from science, engineering and pharmaceutical investigations, such as response surface designs, blocking of experiments, designs for mixture experiments and for nonlinear and generalized linear models. Understanding is aided by the provision of "SAS tasks" after most chapters as well as by more traditional exercises and a fully supported website. The authors are leading experts in key fields and this book is ideal for statisticians and scientists in academia, research and the process and pharmaceutical industries.

1,076 citations

Journal Article

CUR matrix decompositions for improved data analysis

Michael W. Mahoney¹, Petros Drineas²

Stanford University¹, Rensselaer Polytechnic Institute²

20 Jan 2009-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: An algorithm is presented that preferentially chooses columns and rows that exhibit high “statistical leverage” and exert a disproportionately large “influence” on the best low-rank fit of the data matrix, obtaining improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work.

Abstract: Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes generating the data. In this article, we develop CUR matrix decompositions for improved data analysis. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Because they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn (to the extent that the original data are). We present an algorithm that preferentially chooses columns and rows that exhibit high “statistical leverage” and, thus, in a very precise statistical sense, exert a disproportionately large “influence” on the best low-rank fit of the data matrix. By selecting columns and rows in this manner, we obtain improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work. In addition, since the construction involves computing quantities with a natural and widely studied statistical interpretation, we can leverage ideas from diagnostic regression analysis to employ these matrix decompositions for exploratory data analysis.

815 citations
