Journal Article

Post-Stratification: A Modeler's Perspective

01 Sep 1993-Journal of the American Statistical Association (Taylor & Francis Group)-Vol. 88, Iss: 423, pp 1001-1012
TL;DR: This article develops Bayesian model-based theory for post-stratification, a common technique in survey analysis for incorporating population distributions of variables into survey estimates such as functions of means and totals.
Abstract: Post-stratification is a common technique in survey analysis for incorporating population distributions of variables into survey estimates. The basic technique divides the sample into post-strata, and computes a post-stratification weight $w_{ih} = r P_h / r_h$ for each sample case in post-stratum $h$, where $r_h$ is the number of survey respondents in post-stratum $h$, $P_h$ is the population proportion from a census, and $r$ is the respondent sample size. Survey estimates, such as functions of means and totals, then weight cases by $w_h$. Variants and extensions of the method include truncation of the weights to avoid excessive variability and raking to a set of two or more univariate marginal distributions. Literature on post-stratification is limited and has mainly taken the randomization (or design-based) perspective, where inference is based on the sampling distribution with population values held fixed. This article develops Bayesian model-based theory for the method. A basic normal post-stratification mod...
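
The basic weighting step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration of the weight formula only, not the article's Bayesian model: the function names, the stratum labels, and the toy data are all hypothetical, and the sketch assumes every post-stratum in the population has at least one respondent.

```python
from collections import Counter


def poststrat_weights(strata, pop_props):
    """Return the weight w_h = r * P_h / r_h for each post-stratum h.

    strata    -- post-stratum label of each respondent (length r)
    pop_props -- population proportion P_h for each post-stratum
    """
    r = len(strata)            # respondent sample size
    counts = Counter(strata)   # r_h: respondents per post-stratum
    return {h: r * pop_props[h] / r_h for h, r_h in counts.items()}


def weighted_mean(values, strata, weights):
    """Weighted mean of the responses, each case weighted by its stratum weight."""
    total_weight = sum(weights[h] for h in strata)
    return sum(weights[h] * y for y, h in zip(values, strata)) / total_weight


# Hypothetical toy sample: stratum "a" is over-represented (3 of 4 respondents)
# relative to its assumed population share of 0.5.
strata = ["a", "a", "a", "b"]
values = [1.0, 2.0, 3.0, 10.0]
pop_props = {"a": 0.5, "b": 0.5}

weights = poststrat_weights(strata, pop_props)
estimate = weighted_mean(values, strata, weights)
print(weights, estimate)
```

With these weights the weighted mean coincides with the sum over post-strata of $P_h$ times the respondent mean in post-stratum $h$, so cases from under-represented post-strata count for more than cases from over-represented ones.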
Citations
Posted Content
TL;DR: Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.
Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

1,107 citations

Journal Article
16 Jan 2015-Science
TL;DR: Results from a nationwide survey of academics support the hypothesis that women are underrepresented in fields whose practitioners believe that raw, innate talent is the main requirement for success, because women are stereotyped as not possessing such talent.
Abstract: The gender imbalance in STEM subjects dominates current debates about women's underrepresentation in academia. However, women are well represented at the Ph.D. level in some sciences and poorly represented in some humanities (e.g., in 2011, 54% of U.S. Ph.D.'s in molecular biology were women versus only 31% in philosophy). We hypothesize that, across the academic spectrum, women are underrepresented in fields whose practitioners believe that raw, innate talent is the main requirement for success, because women are stereotyped as not possessing such talent. This hypothesis extends to African Americans' underrepresentation as well, as this group is subject to similar stereotypes. Results from a nationwide survey of academics support our hypothesis (termed the field-specific ability beliefs hypothesis) over three competing hypotheses.

963 citations

Journal Article
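TL;DR: This paper describes weighting adjustments, which are often used to compensate for noncoverage and total nonresponse, and imputation methods, which are used to compensate for item nonresponse, and discusses their benefits and limitations.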
Abstract: Missing data occur in survey research because an element in the target population is not included on the survey's sampling frame (noncoverage), because a sampled element does not participate in the survey (total nonresponse) and because a responding sampled element fails to provide acceptable responses to one or more of the survey items (item nonresponse). A variety of methods have been developed to attempt to compensate for missing survey data in a general purpose way that enables the survey's data file to be analysed without regard for the missing data. Weighting adjustments are often used to compensate for noncoverage and total nonresponse. Imputation methods that assign values for missing responses are used to compensate for item nonresponses. This paper describes the various weighting and imputation methods that have been developed, and discusses their benefits and limitations.

558 citations

Journal Article
TL;DR: In this paper, the authors discuss in the context of several ongoing public health and social surveys how to develop general families of multilevel probability models that yield reasonable Bayesian inferences.
Abstract: The general principles of Bayesian data analysis imply that models for survey responses should be constructed conditional on all variables that affect the probability of inclusion and nonresponse, which are also the variables used in survey weighting and clustering. However, such models can quickly become very complicated, with potentially thousands of poststratification cells. It is then a challenge to develop general families of multilevel probability models that yield reasonable Bayesian inferences. We discuss in the context of several ongoing public health and social surveys. This work is currently open-ended, and we conclude with thoughts on how research could proceed to solve these problems.

425 citations


Cites background from "Post-Stratification: A Modeler's Pe..."

  • ...Any of these equivalent expressions can be viewed as the posterior variance of θ given a noninformative prior distribution on the regression coefficients, and ignoring posterior uncertainty in $\sigma_y$ (Little, 1993)....

  • ...When cell means are estimated using certain linear regression models, poststratified estimates can be interpreted as weighted averages (Little, 1991, 1993)....

  • ...We now review the unified notation for poststratification and survey weighting of Little (1991, 1993) and Gelman and Carlin (2002); see also Holt and Smith (1979)....

Journal Article
TL;DR: In this article, the authors discuss in the context of several ongoing public health and social surveys how to develop general families of multilevel probability models that yield reasonable Bayesian inferences.
Abstract: The general principles of Bayesian data analysis imply that models for survey responses should be constructed conditional on all variables that affect the probability of inclusion and nonresponse, which are also the variables used in survey weighting and clustering. However, such models can quickly become very complicated, with potentially thousands of poststratification cells. It is then a challenge to develop general families of multilevel probability models that yield reasonable Bayesian inferences. We discuss in the context of several ongoing public health and social surveys. This work is currently open-ended, and we conclude with thoughts on how research could proceed to solve these problems.

382 citations

References
Journal Article
TL;DR: The authors discuss the central role of propensity scores and balancing scores in the analysis of observational studies and show that adjustment for the scalar propensity score is sufficient to remove bias due to all observed covariates.
Abstract: The results of observational studies are often disputed because of nonrandom treatment assignment. For example, patients at greater risk may be overrepresented in some treatment group. This paper discusses the central role of propensity scores and balancing scores in the analysis of observational studies. The propensity score is the (estimated) conditional probability of assignment to a particular treatment given a vector of observed covariates. Both large and small sample theory show that adjustment for the scalar propensity score is sufficient to remove bias due to all observed covariates. Applications include: matched sampling on the univariate propensity score, which is equal percent bias reducing under more general conditions than required for discriminant matching; multivariate adjustment by subclassification on balancing scores, where the same subclasses are used to estimate treatment effects for all outcome variables and in all subpopulations; and visual representation of multivariate adjustment by a two-dimensional plot.

23,744 citations

Journal Article
TL;DR: In this article, it was shown that ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the data, θ, is generally appropriate if and only if the missing data are missing at random and the observed data are observed at random, and then such inferences are generally conditional on the observed pattern of missing data.
Abstract: Two results are presented concerning inference when data may be missing. First, ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the data, θ, is generally appropriate if and only if the missing data are “missing at random” and the observed data are “observed at random,” and then such inferences are generally conditional on the observed pattern of missing data. Second, ignoring the process that causes missing data when making Bayesian inferences about θ is generally appropriate if and only if the missing data are missing at random and the parameter of the missing data is “independent” of θ. Examples and discussion indicating the implications of these results are included.

8,197 citations

Journal Article
TL;DR: In this paper, three types of Bayesianly justifiable and relevant frequency calculations are presented, using examples to convey their use for the applied statistician.
Abstract: A common reaction among applied statisticians is that the Bayesian statistician's energies in an applied problem must be directed at the a priori elicitation of one model specification from which an optimal design and all inferences follow automatically by applying Bayes's theorem to calculate conditional distributions of unknowns given knowns. I feel, however, that the applied Bayesian statistician's tool-kit should be more extensive and include tools that may be usefully labeled frequency calculations. Three types of Bayesianly justifiable and relevant frequency calculations are presented using examples to convey their use for the applied statistician.

1,284 citations

01 Jan 2011
TL;DR: The representative method has attracted the attention of many statisticians in different countries, very probably in part because of the general crisis, the scarcity of money, and the necessity of carrying out statistical investigations connected with social life in a somewhat hasty way.
Abstract: Owing to the work of the International Statistical Institute, and perhaps still more to personal achievements of Professor A.L. Bowley, the theory and the possibility of practical applications of the representative method has attracted the attention of many statisticians in different countries. Very probably this popularity of the representative method is also partly due to the general crisis, to the scarcity of money and to the necessity of carrying out statistical investigations connected with social life in a somewhat hasty way. The results are wanted in some few months, sometimes in a few weeks after the beginning of the work, and there is neither time nor money for an exhaustive research. But I think that if practical statistics has acquired something valuable in the representative method, this is due primarily to Professor A.L. Bowley, who not only was one of the first to apply this method in practice, but also wrote a very fundamental memoir giving the theory of the method. Since then the representative method has been often applied in different countries and for different purposes. My chief topic being the theory of the representative method, I shall not go into its history and shall not quote the examples of its practical application

1,081 citations