Post-Stratification: A Modeler's Perspective

doi:10.1080/01621459.1993.10476368

Journal Article•DOI•

Roderick J. A. Little¹•Institutions (1)

University of Michigan¹

01 Sep 1993-Journal of the American Statistical Association (Taylor & Francis Group)-Vol. 88, Iss: 423, pp 1001-1012

TL;DR: This article develops Bayesian model-based theory for post-stratification, a common technique in survey analysis for incorporating population distributions of variables into survey estimates.

Abstract: Post-stratification is a common technique in survey analysis for incorporating population distributions of variables into survey estimates. The basic technique divides the sample into post-strata, and computes a post-stratification weight $w_{ih} = r P_h / r_h$ for each sample case in post-stratum $h$, where $r_h$ is the number of survey respondents in post-stratum $h$, $P_h$ is the population proportion from a census, and $r$ is the respondent sample size. Survey estimates, such as functions of means and totals, then weight cases by $w_h$. Variants and extensions of the method include truncation of the weights to avoid excessive variability and raking to a set of two or more univariate marginal distributions. Literature on post-stratification is limited and has mainly taken the randomization (or design-based) perspective, where inference is based on the sampling distribution with population values held fixed. This article develops Bayesian model-based theory for the method. A basic normal post-stratification mod...
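The weight formula in the abstract can be illustrated with a minimal Python sketch. The function names, the dictionary-based inputs, and the simple capping rule used for truncation are assumptions made for illustration only; they are not taken from the article, which develops the model-based theory rather than any particular implementation.

```python
from collections import Counter


def poststratification_weights(strata, population_props):
    """Return w_h = r * P_h / r_h for every post-stratum h seen in the sample.

    strata           -- post-stratum label of each respondent (length r)
    population_props -- hypothetical dict mapping label h to the census proportion P_h
    """
    r = len(strata)                      # respondent sample size
    counts = Counter(strata)             # r_h: respondents per post-stratum
    missing = set(counts) - set(population_props)
    if missing:
        raise ValueError(f"no population proportion for strata: {sorted(missing)}")
    return {h: r * population_props[h] / r_h for h, r_h in counts.items()}


def poststratified_mean(values, strata, population_props):
    """Weighted mean of `values`, weighting each case by its stratum weight w_h."""
    weights = poststratification_weights(strata, population_props)
    total_weight = sum(weights[h] for h in strata)
    weighted_sum = sum(weights[h] * y for y, h in zip(values, strata))
    return weighted_sum / total_weight


def truncate_weights(weights, cap):
    """Assumed truncation rule: cap each weight at `cap` (one simple variant)."""
    return {h: min(w, cap) for h, w in weights.items()}


if __name__ == "__main__":
    strata = ["a", "a", "a", "b"]        # stratum "a" is over-represented
    props = {"a": 0.5, "b": 0.5}         # assumed census proportions
    values = [1, 1, 1, 5]
    print(poststratification_weights(strata, props))
    print(poststratified_mean(values, strata, props))
```

In this toy sample the three respondents in stratum "a" each receive weight 4 × 0.5 / 3 and the single respondent in stratum "b" receives weight 4 × 0.5 / 1, so the weighted mean should be close to 0.5 × 1 + 0.5 × 5 = 3, the population-proportion-weighted average of the two stratum means.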

Citations

Posted Content•

Advances and Open Problems in Federated Learning

Peter Kairouz, H. Brendan McMahan¹, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, Rafael G. L. D'Oliveira, Hubert Eichner, Salim El Rouayheb, David Evans, Josh Gardner, Zachary Garrett, Adrià Gascón, Badih Ghazi, Phillip B. Gibbons, Marco Gruteser, Zaid Harchaoui, Chaoyang He, Lie He, Zhouyuan Huo, Ben Hutchinson, Justin Hsu, Martin Jaggi, Tara Javidi, Gauri Joshi, Mikhail Khodak, Jakub Konečný, Aleksandra Korolova, Farinaz Koushanfar, Sanmi Koyejo, Tancrède Lepoint, Yang Liu, Prateek Mittal, Mehryar Mohri, Richard Nock, Ayfer Ozgur, Rasmus Pagh, Mariana Raykova, Hang Qi, Daniel Ramage, Ramesh Raskar, Dawn Song, Weikang Song, Sebastian U. Stich, Ziteng Sun, Ananda Theertha Suresh, Florian Tramèr, Praneeth Vepakomma, Jianyu Wang, Li Xiong, Zheng Xu, Qiang Yang, Felix X. Yu, Han Yu, Sen Zhao•Institutions (1)

Google¹

10 Dec 2019-arXiv: Learning

TL;DR: Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

Abstract: Federated learning (FL) is a machine learning setting where many clients (e.g. mobile devices or whole organizations) collaboratively train a model under the orchestration of a central server (e.g. service provider), while keeping the training data decentralized. FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges.

1,107 citations

Journal Article•DOI•

Expectations of brilliance underlie gender distributions across academic disciplines

Sarah-Jane Leslie¹, Andrei Cimpian², Meredith Meyer³, Edward Freeland¹•Institutions (3)

Princeton University¹, University of Illinois at Urbana–Champaign², Otterbein University³

16 Jan 2015-Science

TL;DR: Results from a nationwide survey of academics support the hypothesis that women are underrepresented in fields whose practitioners believe that raw, innate talent is the main requirement for success, because women are stereotyped as not possessing such talent.

Abstract: The gender imbalance in STEM subjects dominates current debates about women's underrepresentation in academia. However, women are well represented at the Ph.D. level in some sciences and poorly represented in some humanities (e.g., in 2011, 54% of U.S. Ph.D.'s in molecular biology were women versus only 31% in philosophy). We hypothesize that, across the academic spectrum, women are underrepresented in fields whose practitioners believe that raw, innate talent is the main requirement for success, because women are stereotyped as not possessing such talent. This hypothesis extends to African Americans' underrepresentation as well, as this group is subject to similar stereotypes. Results from a nationwide survey of academics support our hypothesis (termed the field-specific ability beliefs hypothesis) over three competing hypotheses.

963 citations

Journal Article•DOI•

Handling missing data in survey research

J M Brick¹, Graham Kalton²•Institutions (2)

Westat¹, University of Maryland, College Park²

01 Sep 1996-Statistical Methods in Medical Research

TL;DR: Various weighting and imputation methods that assign values for missing responses are used to compensate for item nonresponses.

Abstract: Missing data occur in survey research because an element in the target population is not included on the survey's sampling frame (noncoverage), because a sampled element does not participate in the survey (total nonresponse) and because a responding sampled element fails to provide acceptable responses to one or more of the survey items (item nonresponse). A variety of methods have been developed to attempt to compensate for missing survey data in a general purpose way that enables the survey's data file to be analysed without regard for the missing data. Weighting adjustments are often used to compensate for noncoverage and total nonresponse. Imputation methods that assign values for missing responses are used to compensate for item nonresponses. This paper describes the various weighting and imputation methods that have been developed, and discusses their benefits and limitations.

558 citations

Journal Article•

Struggles with survey weighting and regression modeling

Andrew Gelman

01 Jan 2008-Quality Engineering

TL;DR: The author discusses, in the context of several ongoing public health and social surveys, the challenge of developing general families of multilevel probability models that yield reasonable Bayesian inferences.

Abstract: The general principles of Bayesian data analysis imply that models for survey responses should be constructed conditional on all variables that affect the probability of inclusion and nonresponse, which are also the variables used in survey weighting and clustering. However, such models can quickly become very complicated, with potentially thousands of poststratification cells. It is then a challenge to develop general families of multilevel probability models that yield reasonable Bayesian inferences. We discuss in the context of several ongoing public health and social surveys. This work is currently open-ended, and we conclude with thoughts on how research could proceed to solve these problems.

425 citations

Cites background from "Post-Stratification: A Modeler's Pe..."

...Any of these equivalent expressions can be viewed as the posterior variance of θ given a noninformative prior distribution on the regression coefficients, and ignoring posterior uncertainty in σy (Little, 1993)....
...When cell means are estimated using certain linear regression models, poststratified estimates can be interpreted as weighted averages (Little, 1991, 1993)....
...We now review the unified notation for poststratification and survey weighting of Little (1991, 1993) and Gelman and Carlin (2002); see also Holt and Smith (1979)....

Journal Article•DOI•

Struggles with Survey Weighting and Regression Modeling

Andrew Gelman¹•Institutions (1)

Columbia University¹

01 May 2007-Statistical Science

TL;DR: The author discusses, in the context of several ongoing public health and social surveys, the challenge of developing general families of multilevel probability models that yield reasonable Bayesian inferences.

382 citations

References

Journal Article•DOI•

The central role of the propensity score in observational studies for causal effects

Paul R. Rosenbaum, Donald B. Rubin

01 Apr 1983-Biometrika

TL;DR: The authors discuss the central role of propensity scores and balancing scores in the analysis of observational studies and show that adjustment for the scalar propensity score is sufficient to remove bias due to all observed covariates.

Abstract: The results of observational studies are often disputed because of nonrandom treatment assignment. For example, patients at greater risk may be overrepresented in some treatment group. This paper discusses the central role of propensity scores and balancing scores in the analysis of observational studies. The propensity score is the (estimated) conditional probability of assignment to a particular treatment given a vector of observed covariates. Both large and small sample theory show that adjustment for the scalar propensity score is sufficient to remove bias due to all observed covariates. Applications include: matched sampling on the univariate propensity score which is equal percent bias reducing under more general conditions than required for discriminant matching, multivariate adjustment by subclassification on balancing scores where the same subclasses are used to estimate treatment effects for all outcome variables and in all subpopulations, and visual representation of multivariate adjustment by a two-dimensional plot.

23,744 citations

Journal Article•DOI•

Inference and missing data

Donald B. Rubin

01 Dec 1976-Biometrika

TL;DR: In this article, it was shown that ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the data, θ, is generally appropriate if and only if the missing data are missing at random and the observed data are observed at random, and then such inferences are generally conditional on the observed pattern of missing data.

Abstract: Two results are presented concerning inference when data may be missing. First, ignoring the process that causes missing data when making sampling distribution inferences about the parameter of the data, θ, is generally appropriate if and only if the missing data are “missing at random” and the observed data are “observed at random,” and then such inferences are generally conditional on the observed pattern of missing data. Second, ignoring the process that causes missing data when making Bayesian inferences about θ is generally appropriate if and only if the missing data are missing at random and the parameter of the missing data is “independent” of θ. Examples and discussion indicating the implications of these results are included.

8,197 citations

Journal Article•DOI•

On a Least Squares Adjustment of a Sampled Frequency Table When the Expected Marginal Totals are Known

W. Edwards Deming, Frederick F. Stephan

01 Dec 1940-Annals of Mathematical Statistics

1,517 citations

Journal Article•DOI•

Bayesianly Justifiable and Relevant Frequency Calculations for the Applied Statistician

Donald B. Rubin

01 Dec 1984-Annals of Statistics

TL;DR: In this paper, three types of Bayesianly justifiable and relevant frequency calculations are presented using examples to convey their use for the applied statistician, and they are discussed in detail.

Abstract: A common reaction among applied statisticians is that the Bayesian statistician's energies in an applied problem must be directed at the a priori elicitation of one model specification from which an optimal design and all inferences follow automatically by applying Bayes's theorem to calculate conditional distributions of unknowns given knowns. I feel, however, that the applied Bayesian statistician's tool-kit should be more extensive and include tools that may be usefully labeled frequency calculations. Three types of Bayesianly justifiable and relevant frequency calculations are presented using examples to convey their use for the applied statistician.

1,284 citations

On the Two Different Aspects of the Representative Method: the Method of Stratified Sampling and the Method of Purposive Selection

Jerzy Neyman

01 Jan 2011

TL;DR: The representative method has attracted the attention of many statisticians in different countries, probably owing in part to the general crisis, the scarcity of money, and the need to carry out statistical investigations connected with social life in a somewhat hasty way.

Abstract: Owing to the work of the International Statistical Institute, and perhaps still more to personal achievements of Professor A.L. Bowley, the theory and the possibility of practical applications of the representative method has attracted the attention of many statisticians in different countries. Very probably this popularity of the representative method is also partly due to the general crisis, to the scarcity of money and to the necessity of carrying out statistical investigations connected with social life in a somewhat hasty way. The results are wanted in some few months, sometimes in a few weeks after the beginning of the work, and there is neither time nor money for an exhaustive research. But I think that if practical statistics has acquired something valuable in the representative method, this is due primarily to Professor A.L. Bowley, who not only was one of the first to apply this method in practice, but also wrote a very fundamental memoir giving the theory of the method. Since then the representative method has been often applied in different countries and for different purposes. My chief topic being the theory of the representative method, I shall not go into its history and shall not quote the examples of its practical application.

1,081 citations