Author

Anirban DasGupta

Bio: Anirban DasGupta is an academic researcher at Purdue University. The author has contributed to research on the topics Central limit theorem and Random variable. The author has an h-index of 19 and has co-authored 98 publications receiving 4973 citations. Previous affiliations of Anirban DasGupta include the University of California, San Diego.


Papers
Journal ArticleDOI
TL;DR: In this paper, the problem of interval estimation of a binomial proportion is revisited, and a number of natural alternatives are presented, each with its motivation and context; each interval is examined for its coverage probability and its length.
Abstract: We revisit the problem of interval estimation of a binomial proportion. The erratic behavior of the coverage probability of the standard Wald confidence interval has previously been remarked on in the literature (Blyth and Still, Agresti and Coull, Santner and others). We begin by showing that the chaotic coverage properties of the Wald interval are far more persistent than is appreciated. Furthermore, common textbook prescriptions regarding its safety are misleading and defective in several respects and cannot be trusted. This leads us to consideration of alternative intervals. A number of natural alternatives are presented, each with its motivation and context. Each interval is examined for its coverage probability and its length. Based on this analysis, we recommend the Wilson interval or the equal-tailed Jeffreys prior interval for small n and the interval suggested in Agresti and Coull for larger n. We also provide an additional frequentist justification for use of the Jeffreys interval.

2,893 citations
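
The erratic coverage described in the abstract above can be checked numerically. The following is a minimal Python sketch, not the authors' code: the function names `wald_interval` and `exact_coverage` are hypothetical, and it assumes the textbook form of the Wald interval with a 95% nominal level. The exact coverage at a given p is the sum of binomial probabilities over all outcomes x whose interval contains p.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(x, n, alpha=0.05):
    """Standard Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def exact_coverage(interval, n, p, alpha=0.05):
    """Exact coverage probability of an interval procedure at a fixed (n, p).

    Sums the Binomial(n, p) probability of every outcome x whose
    interval contains the true p.
    """
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n, alpha)
        if lo <= p <= hi:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total


if __name__ == "__main__":
    # Illustrative values only; these are not taken from the paper.
    for p in (0.005, 0.1, 0.3, 0.5):
        print(f"n=40, p={p}: coverage = {exact_coverage(wald_interval, 40, p):.4f}")
```

Passing a different interval function to `exact_coverage` allows the same check for any alternative procedure.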

Book
12 Aug 2008
TL;DR: This book covers asymptotic theory in statistics and probability, from basic convergence concepts and central limit theorems through maximum likelihood, testing, goodness of fit, the bootstrap, and density estimation, and closes with a collection of inequalities in probability, linear algebra, and analysis.
Abstract: Basic Convergence Concepts and Theorems.- Metrics, Information Theory, Convergence, and Poisson Approximations.- More General Weak and Strong Laws and the Delta Theorem.- Transformations.- More General Central Limit Theorems.- Moment Convergence and Uniform Integrability.- Sample Percentiles and Order Statistics.- Sample Extremes.- Central Limit Theorems for Dependent Sequences.- Central Limit Theorem for Markov Chains.- Accuracy of Central Limit Theorems.- Invariance Principles.- Edgeworth Expansions and Cumulants.- Saddlepoint Approximations.- U-statistics.- Maximum Likelihood Estimates.- M Estimates.- The Trimmed Mean.- Multivariate Location Parameter and Multivariate Medians.- Bayes Procedures and Posterior Distributions.- Testing Problems.- Asymptotic Efficiency in Testing.- Some General Large-Deviation Results.- Classical Nonparametrics.- Two-Sample Problems.- Goodness of Fit.- Chi-square Tests for Goodness of Fit.- Goodness of Fit with Estimated Parameters.- The Bootstrap.- Jackknife.- Permutation Tests.- Density Estimation.- Mixture Models and Nonparametric Deconvolution.- High-Dimensional Inference and False Discovery.- A Collection of Inequalities in Probability, Linear Algebra, and Analysis.

738 citations

Journal ArticleDOI
01 Jun 1994-Test
TL;DR: An overview of the subject of robust Bayesian analysis is provided, one that is accessible to statisticians outside the field, and recent developments in the area are reviewed.
Abstract: Robust Bayesian analysis is the study of the sensitivity of Bayesian answers to uncertain inputs. This paper seeks to provide an overview of the subject, one that is accessible to statisticians outside the field. Recent developments in the area are also reviewed, though with very uneven emphasis.

587 citations

Journal ArticleDOI
TL;DR: This paper compares the coverage properties of the standard Wald interval and four alternative interval methods using asymptotic expansions of their coverage probabilities and expected lengths.
Abstract: We address the classic problem of interval estimation of a binomial proportion. The Wald interval $\hat{p}\pm z_{\alpha/2} n^{-1/2} (\hat{p} (1 - \hat{p}))^{1/2}$ is currently in near universal use. We first show that the coverage properties of the Wald interval are persistently poor and defy virtually all conventional wisdom. We then proceed to a theoretical comparison of the standard interval and four additional alternative intervals by asymptotic expansions of their coverage probabilities and expected lengths. The four additional interval methods we study in detail are the score-test interval (Wilson), the likelihood-ratio-test interval, a Jeffreys prior Bayesian interval and an interval suggested by Agresti and Coull. The asymptotic expansions for coverage show that the first three of these alternative methods have coverages that fluctuate about the nominal value, while the Agresti–Coull interval has a somewhat larger and more nearly conservative coverage function. For the five interval methods we also investigate asymptotically their average coverage relative to distributions for $p$ supported within $(0, 1)$. In terms of expected length, asymptotic expansions show that the Agresti–Coull interval is always the longest of these. The remaining three are rather comparable and are shorter than the Wald interval except for $p$ near 0 or 1. These analytical calculations support and complement the findings and the recommendations in Brown, Cai and DasGupta (Statist. Sci. (2001) 16 101–133).

299 citations
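
To make the intervals named in the abstract above concrete, here is a short Python sketch of three of them. It is an illustration under stated assumptions, not the paper's code: the function names are hypothetical, and the Wilson and Agresti–Coull formulas are the commonly cited textbook forms, which are assumed to match the definitions used in the paper. The likelihood-ratio and Jeffreys intervals are omitted because they need numerical root finding or Beta quantiles.

```python
from math import sqrt
from statistics import NormalDist


def z_value(alpha=0.05):
    """Upper alpha/2 quantile of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def wald(x, n, alpha=0.05):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = z_value(alpha)
    p_hat = x / n
    half = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson(x, n, alpha=0.05):
    """Wilson (score-test) interval in its usual closed form."""
    z = z_value(alpha)
    p_hat = x / n
    center = (x + z * z / 2) / (n + z * z)
    half = (z * sqrt(n) / (n + z * z)) * sqrt(p_hat * (1 - p_hat) + z * z / (4 * n))
    return center - half, center + half


def agresti_coull(x, n, alpha=0.05):
    """Agresti-Coull interval: a Wald-type interval around the Wilson center."""
    z = z_value(alpha)
    n_tilde = n + z * z
    p_tilde = (x + z * z / 2) / n_tilde
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return p_tilde - half, p_tilde + half


if __name__ == "__main__":
    # Illustrative data only: 2 successes in 20 trials.
    for name, method in (("Wald", wald), ("Wilson", wilson), ("Agresti-Coull", agresti_coull)):
        lo, hi = method(2, 20)
        print(f"{name:14s} [{lo:.4f}, {hi:.4f}]  length = {hi - lo:.4f}")
```

With these forms, the Wilson and Agresti–Coull intervals share the same center and the Agresti–Coull interval is at least as long as the Wilson interval for every outcome x.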

Journal ArticleDOI
TL;DR: This article gives a contemporary, moderately quantitative exposition of the distribution theory associated with the matching and birthday problems, intended to give the reader an intuitive feel for these questions.

75 citations
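
As a small illustration of the two problems named in the entry above, the sketch below computes the classical probabilities in Python. It is not drawn from the article: the function names are hypothetical, the birthday calculation assumes 365 equally likely birthdays, and the matching calculation uses the standard inclusion-exclusion formula for a random permutation having no fixed point.

```python
from math import exp, factorial


def birthday_match_probability(people, days=365):
    """Probability that at least two of `people` share a birthday,
    assuming `days` equally likely and independent birthdays."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def no_match_probability(n):
    """Probability that a uniformly random permutation of n items has no
    fixed point (the matching problem): sum over k = 0..n of (-1)^k / k!."""
    return sum((-1) ** k / factorial(k) for k in range(n + 1))


if __name__ == "__main__":
    print(f"P(shared birthday, 23 people) = {birthday_match_probability(23):.4f}")
    print(f"P(no match, n = 10)           = {no_match_probability(10):.6f}")
    print(f"1/e                           = {exp(-1):.6f}")
```

With 23 people the shared-birthday probability is about 0.507, and the no-match probability approaches 1/e quickly as n grows.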


Cited by
Book
24 Aug 2012
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Abstract: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

8,059 citations

Journal ArticleDOI
TL;DR: A review notice of the book Convergence of Probability Measures by P. Billingsley (Wiley, 1968).
Abstract: Convergence of Probability Measures. By P. Billingsley. Chichester, Sussex, Wiley, 1968. xii, 253 p. 9 1/4". 117s.

5,689 citations

Journal ArticleDOI
TL;DR: In this paper, the problem of interval estimation of a binomial proportion is revisited, and a number of natural alternatives are presented, each with its motivation and context; each interval is examined for its coverage probability and its length.
Abstract: We revisit the problem of interval estimation of a binomial proportion. The erratic behavior of the coverage probability of the standard Wald confidence interval has previously been remarked on in the literature (Blyth and Still, Agresti and Coull, Santner and others). We begin by showing that the chaotic coverage properties of the Wald interval are far more persistent than is appreciated. Furthermore, common textbook prescriptions regarding its safety are misleading and defective in several respects and cannot be trusted. This leads us to consideration of alternative intervals. A number of natural alternatives are presented, each with its motivation and context. Each interval is examined for its coverage probability and its length. Based on this analysis, we recommend the Wilson interval or the equal-tailed Jeffreys prior interval for small n and the interval suggested in Agresti and Coull for larger n. We also provide an additional frequentist justification for use of the Jeffreys interval.

2,893 citations

Journal ArticleDOI
TL;DR: The Danish National Patient Registry is a valuable tool for epidemiological research, however, both its strengths and limitations must be considered when interpreting research results, and continuous validation of its clinical data is essential.
Abstract: Background: The Danish National Patient Registry (DNPR) is one of the world’s oldest nationwide hospital registries and is used extensively for research. Many studies have validated algorithms for identifying health events in the DNPR, but the reports are fragmented and no overview exists.

2,818 citations