Author
Anirban DasGupta
Other affiliations: University of California, San Diego
Bio: Anirban DasGupta is an academic researcher from Purdue University. The author has contributed to research on topics including the central limit theorem and random variables. The author has an h-index of 19 and has co-authored 98 publications receiving 4,973 citations. Previous affiliations of Anirban DasGupta include University of California, San Diego.
Papers
••
TL;DR: In this paper, the problem of interval estimation of a binomial proportion is revisited, and a number of natural alternatives are presented, each with its motivation and context; each interval is examined for its coverage probability and its length.
Abstract: We revisit the problem of interval estimation of a binomial proportion. The erratic behavior of the coverage probability of the standard Wald confidence interval has previously been remarked on in the literature (Blyth and Still, Agresti and Coull, Santner and others). We begin by showing that the chaotic coverage properties of the Wald interval are far more persistent than is appreciated. Furthermore, common textbook prescriptions regarding its safety are misleading and defective in several respects and cannot be trusted. This leads us to consideration of alternative intervals. A number of natural alternatives are presented, each with its motivation and context. Each interval is examined for its coverage probability and its length. Based on this analysis, we recommend the Wilson interval or the equal-tailed Jeffreys prior interval for small n and the interval suggested in Agresti and Coull for larger n. We also provide an additional frequentist justification for use of the Jeffreys interval.
2,893 citations
•
12 Aug 2008
TL;DR: This work covers topics ranging from basic convergence concepts and central limit theorems to goodness-of-fit tests, the bootstrap, and high-dimensional inference, and closes with a collection of inequalities in probability, linear algebra, and analysis.
Abstract: Basic Convergence Concepts and Theorems.- Metrics, Information Theory, Convergence, and Poisson Approximations.- More General Weak and Strong Laws and the Delta Theorem.- Transformations.- More General Central Limit Theorems.- Moment Convergence and Uniform Integrability.- Sample Percentiles and Order Statistics.- Sample Extremes.- Central Limit Theorems for Dependent Sequences.- Central Limit Theorem for Markov Chains.- Accuracy of Central Limit Theorems.- Invariance Principles.- Edgeworth Expansions and Cumulants.- Saddlepoint Approximations.- U-statistics.- Maximum Likelihood Estimates.- M Estimates.- The Trimmed Mean.- Multivariate Location Parameter and Multivariate Medians.- Bayes Procedures and Posterior Distributions.- Testing Problems.- Asymptotic Efficiency in Testing.- Some General Large-Deviation Results.- Classical Nonparametrics.- Two-Sample Problems.- Goodness of Fit.- Chi-square Tests for Goodness of Fit.- Goodness of Fit with Estimated Parameters.- The Bootstrap.- Jackknife.- Permutation Tests.- Density Estimation.- Mixture Models and Nonparametric Deconvolution.- High-Dimensional Inference and False Discovery.- A Collection of Inequalities in Probability, Linear Algebra, and Analysis.
738 citations
••
Purdue University, University of Granada, Simón Bolívar University, University of Valencia, University of Murcia, Autonomous University of Madrid, Technical University of Madrid, University of Nottingham, University of Basel, University of Rouen, University College London, Sapienza University of Rome, University of Cincinnati
TL;DR: An overview of the subject of robust Bayesian analysis is provided, one that is accessible to statisticians outside the field, and recent developments in the area are reviewed.
Abstract: Robust Bayesian analysis is the study of the sensitivity of Bayesian answers to uncertain inputs. This paper seeks to provide an overview of the subject, one that is accessible to statisticians outside the field. Recent developments in the area are also reviewed, though with very uneven emphasis.
587 citations
••
TL;DR: The coverage properties of the standard Wald interval and four alternative interval methods are compared via asymptotic expansions of their coverage probabilities and expected lengths.
Abstract: We address the classic problem of interval estimation of a binomial proportion. The Wald interval $\hat{p}\pm z_{\alpha/2} n^{-1/2} (\hat{p} (1 - \hat{p}))^{1/2}$ is currently in near universal use. We first show that the coverage properties of the Wald interval are persistently poor and defy virtually all conventional wisdom. We then proceed to a theoretical comparison of the standard interval and four additional alternative intervals by asymptotic expansions of their coverage probabilities and expected lengths. The four additional interval methods we study in detail are the score-test interval (Wilson), the likelihood-ratio-test interval, a Jeffreys prior Bayesian interval and an interval suggested by Agresti and Coull. The asymptotic expansions for coverage show that the first three of these alternative methods have coverages that fluctuate about the nominal value, while the Agresti–Coull interval has a somewhat larger and more nearly conservative coverage function. For the five interval methods we also investigate asymptotically their average coverage relative to distributions for $p$ supported within $(0, 1)$. In terms of expected length, asymptotic expansions show that the Agresti–Coull interval is always the longest of these. The remaining three are rather comparable and are shorter than the Wald interval except for $p$ near 0 or 1. These analytical calculations support and complement the findings and the recommendations in Brown, Cai and DasGupta (Statist. Sci. (2001) 16 101–133).
299 citations
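The Wald interval quoted in the abstract above, and two of the alternatives it names (Wilson and Agresti–Coull), can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function names, the default `alpha=0.05`, and the exact-coverage helper are assumptions introduced here, and the Wilson and Agresti–Coull formulas are the standard textbook forms, which the abstract itself does not spell out.

```python
from math import comb, sqrt
from statistics import NormalDist


def z_value(alpha=0.05):
    # Upper alpha/2 quantile of the standard normal, z_{alpha/2}.
    return NormalDist().inv_cdf(1 - alpha / 2)


def wald(x, n, alpha=0.05):
    # p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n), as in the abstract.
    z = z_value(alpha)
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half


def wilson(x, n, alpha=0.05):
    # Score-test interval (standard form; assumed, not quoted from the source).
    z = z_value(alpha)
    p = x / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


def agresti_coull(x, n, alpha=0.05):
    # Wald-type interval around the adjusted estimate (x + z^2/2) / (n + z^2).
    z = z_value(alpha)
    n_adj = n + z * z
    p_adj = (x + z * z / 2) / n_adj
    half = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half


def coverage(interval, n, p, alpha=0.05):
    # Exact coverage probability: sum the Binomial(n, p) mass over all
    # outcomes x whose interval contains the true p.
    total = 0.0
    for x in range(n + 1):
        lo, hi = interval(x, n, alpha)
        if lo <= p <= hi:
            total += comb(n, x) * p**x * (1 - p) ** (n - x)
    return total
```

Calling `coverage(wald, n, p)` over a grid of `p` values for a fixed `n` is one way to see the fluctuating coverage the abstract describes; the same call with `wilson` or `agresti_coull` gives the comparison.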
••
TL;DR: In this article, a contemporary exposition at a moderately quantitative level of the distribution theory associated with the matching and the birthday problems is provided to help a reader have a feeling for these questions at an intuitive level.
75 citations
Cited by
•
24 Aug 2012
TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
Abstract: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.
8,059 citations
••
TL;DR: A notice of the book Convergence of Probability Measures by P. Billingsley (Wiley, 1968).
Abstract: Convergence of Probability Measures. By P. Billingsley. Chichester, Sussex, Wiley, 1968. xii, 253 p. 9 1/4". 117s.
5,689 citations
••
TL;DR: In this paper, the problem of interval estimation of a binomial proportion is revisited, and a number of natural alternatives are presented, each with its motivation and context; each interval is examined for its coverage probability and its length.
Abstract: We revisit the problem of interval estimation of a binomial proportion. The erratic behavior of the coverage probability of the standard Wald confidence interval has previously been remarked on in the literature (Blyth and Still, Agresti and Coull, Santner and others). We begin by showing that the chaotic coverage properties of the Wald interval are far more persistent than is appreciated. Furthermore, common textbook prescriptions regarding its safety are misleading and defective in several respects and cannot be trusted. This leads us to consideration of alternative intervals. A number of natural alternatives are presented, each with its motivation and context. Each interval is examined for its coverage probability and its length. Based on this analysis, we recommend the Wilson interval or the equal-tailed Jeffreys prior interval for small n and the interval suggested in Agresti and Coull for larger n. We also provide an additional frequentist justification for use of the Jeffreys interval.
2,893 citations
••
TL;DR: The Danish National Patient Registry is a valuable tool for epidemiological research, however, both its strengths and limitations must be considered when interpreting research results, and continuous validation of its clinical data is essential.
Abstract: Background: The Danish National Patient Registry (DNPR) is one of the world’s oldest nationwide hospital registries and is used extensively for research. Many studies have validated algorithms for identifying health events in the DNPR, but the reports are fragmented and no overview exists.
2,818 citations
01 Dec 2009
2,243 citations