scispace - formally typeset
Search or ask a question
Proceedings Article

A comparative study on sampling with replacement vs Poisson sampling in optimal subsampling

18 Mar 2021-pp 289-297
About: This article is published in International Conference on Artificial Intelligence and Statistics.The article was published on 2021-03-18 and is currently open access. It has received None citations till now. The article focuses on the topics: Poisson sampling & Simple random sample.

Content maybe subject to copyright    Report

References
More filters
Journal Article
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

272,030 citations

01 Jan 2007

17,341 citations

Journal ArticleDOI
TL;DR: It is shown that deep-learning methods need no manually constructed inputs and yet improve the classification metric by as much as 8% over the best current approaches, demonstrating that deep learning approaches can improve the power of collider searches for exotic particles.
Abstract: Collisions at high-energy particle colliders are a traditionally fruitful source of exotic particle discoveries. Finding these rare particles requires solving difficult signal-versus-background classification problems, hence machine-learning approaches are often used. Standard approaches have relied on 'shallow' machine-learning models that have a limited capacity to learn complex nonlinear functions of the inputs, and rely on a painstaking search through manually constructed nonlinear features. Progress on this problem has slowed, as a variety of techniques have shown equivalent performance. Recent advances in the field of deep learning make it possible to learn more complex functions and better discriminate between signal and background classes. Here, using benchmark data sets, we show that deep-learning methods need no manually constructed inputs and yet improve the classification metric by as much as 8% over the best current approaches. This demonstrates that deep-learning approaches can improve the power of collider searches for exotic particles.

1,175 citations

Book
01 Jan 2007
TL;DR: This book presents the theory and methods of optimum experimental design, making them available through the use of SAS programs, and stresses the importance of models in the analysis of data and introduces least squares fitting and simple optimum experimental designs.
Abstract: Experiments on patients, processes or plants all have random error, making statistical methods essential for their efficient design and analysis. This book presents the theory and methods of optimum experimental design, making them available through the use of SAS programs. Little previous statistical knowledge is assumed. The first part of the book stresses the importance of models in the analysis of data and introduces least squares fitting and simple optimum experimental designs. The second part presents a more detailed discussion of the general theory and of a wide variety of experiments. The book stresses the use of SAS to provide hands-on solutions for the construction of designs in both standard and non-standard situations. The mathematical theory of the designs is developed in parallel with their construction in SAS, so providing motivation for the development of the subject. Many chapters cover self-contained topics drawn from science, engineering and pharmaceutical investigations, such as response surface designs, blocking of experiments, designs for mixture experiments and for nonlinear and generalized linear models. Understanding is aided by the provision of "SAS tasks" after most chapters as well as by more traditional exercises and a fully supported website. The authors are leading experts in key fields and this book is ideal for statisticians and scientists in academia, research and the process and pharmaceutical industries.

1,076 citations

Journal ArticleDOI
TL;DR: An algorithm is presented that preferentially chooses columns and rows that exhibit high “statistical leverage” and exert a disproportionately large “influence” on the best low-rank fit of the data matrix, obtaining improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work.
Abstract: Principal components analysis and, more generally, the Singular Value Decomposition are fundamental data analysis tools that express a data matrix in terms of a sequence of orthogonal or uncorrelated vectors of decreasing importance. Unfortunately, being linear combinations of up to all the data points, these vectors are notoriously difficult to interpret in terms of the data and processes generating the data. In this article, we develop CUR matrix decompositions for improved data analysis. CUR decompositions are low-rank matrix decompositions that are explicitly expressed in terms of a small number of actual columns and/or actual rows of the data matrix. Because they are constructed from actual data elements, CUR decompositions are interpretable by practitioners of the field from which the data are drawn (to the extent that the original data are). We present an algorithm that preferentially chooses columns and rows that exhibit high “statistical leverage” and, thus, in a very precise statistical sense, exert a disproportionately large “influence” on the best low-rank fit of the data matrix. By selecting columns and rows in this manner, we obtain improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work. In addition, since the construction involves computing quantities with a natural and widely studied statistical interpretation, we can leverage ideas from diagnostic regression analysis to employ these matrix decompositions for exploratory data analysis.

815 citations