Journal ArticleDOI

New Important Developments in Small Area Estimation

01 Feb 2013-Statistical Science (Institute of Mathematical Statistics)-Vol. 28, Iss: 1, pp 40-68
TL;DR: The problem of small area estimation (SAE) is how to produce reliable estimates of characteristics of interest such as means, counts, quantiles, etc., for areas or domains for which only small samples or no samples are available, and how to assess their precision.
Abstract: The problem of small area estimation (SAE) is how to produce reliable estimates of characteristics of interest such as means, counts, quantiles, etc., for areas or domains for which only small samples or no samples are available, and how to assess their precision. The purpose of this paper is to review and discuss some of the new important developments in small area estimation methods. Rao (2003) wrote a very comprehensive book, which covers all the main developments in this topic until that time. A few review papers have been written after 2003 but they are limited in scope. Hence, the focus of this review is on new developments in the last 7-8 years, but to make the review more self-contained, I also briefly mention some of the older developments. The review covers both design-based and model-dependent methods, with the latter methods further classified into frequentist and Bayesian methods. The style of the paper is similar to the style of my previous review on SAE published in 2002, explaining the new problems investigated and describing the proposed solutions, but without dwelling on theoretical details, which can be found in the original articles. I hope that this paper will be useful both to researchers who would like to learn more about the research carried out in SAE and to practitioners who might be interested in the application of the new methods.


Citations
Journal ArticleDOI
TL;DR: External validation results suggest that multilevel regression and poststratification model-based SAEs using single-year Behavioral Risk Factor Surveillance System data are valid and could be used to characterize geographic variations in health indicators at local levels (such as counties) when high-quality local survey data are not available.
Abstract: Small area estimation is a statistical technique used to produce reliable estimates for smaller geographic areas than those for which the original surveys were designed. Such small area estimates (SAEs) often lack rigorous external validation. In this study, we validated our multilevel regression and poststratification SAEs from 2011 Behavioral Risk Factor Surveillance System data using direct estimates from 2011 Missouri County-Level Study and American Community Survey data at both the state and county levels. Coefficients for correlation between model-based SAEs and Missouri County-Level Study direct estimates for 115 counties in Missouri were all significantly positive (0.28 for obesity and no health-care coverage, 0.40 for current smoking, 0.51 for diabetes, and 0.69 for chronic obstructive pulmonary disease). Coefficients for correlation between model-based SAEs and American Community Survey direct estimates of no health-care coverage were 0.85 at the county level (811 counties) and 0.95 at the state level. Unweighted and weighted model-based SAEs were compared with direct estimates; unweighted models performed better. External validation results suggest that multilevel regression and poststratification model-based SAEs using single-year Behavioral Risk Factor Surveillance System data are valid and could be used to characterize geographic variations in health indicators at local levels (such as counties) when high-quality local survey data are not available.

127 citations

Journal ArticleDOI
TL;DR: A general framework for the production of small area statistics that is governed by the principle of parsimony and is based on three broadly defined stages, namely specification, analysis and adaptation, and evaluation is proposed.
Abstract: Small area estimation is a research area in official and survey statistics of great practical relevance for National Statistical Institutes and related organisations. Despite rapid developments in methodology and software, researchers and users would benefit from having practical guidelines that assist the process of small area estimation. In this paper we propose a general framework for the production of small area statistics that is based on three broadly defined stages, namely Specification, Analysis/Adaptation and Evaluation. The cornerstone of the proposed framework is the principle of parsimony. Emphasis is placed on the interaction between a user and a methodologist for specifying the target geography and parameters in light of the available data. Model-free and model-dependent methods are described with a focus on model selection and testing, model diagnostics and adaptations, e.g. the use of data transformations. The use of uncertainty measures and of model-based and design-based simulations for method evaluation is also at the centre of the paper. We illustrate each stage of the process both theoretically and by using real data for estimating simple and complex (non-linear) indicators.

69 citations

Journal ArticleDOI
TL;DR: This work develops computationally efficient, Bayesian spatial smoothing models that acknowledge the design weights, and analyzes data from the Washington State 2006 Behavioral Risk Factor Surveillance System to show mean squared error can be greatly reduced with the proposed methods.

68 citations

Journal ArticleDOI
TL;DR: The aim of this paper is to review the achievements of Bayesian demography, address some misconceptions, and make the case for wider use of Bayesian methods in population studies.
Abstract: Bayesian statistics offers an alternative to classical (frequentist) statistics. It is distinguished by its use of probability distributions to describe uncertain quantities, which leads to elegant solutions to many difficult statistical problems. Although Bayesian demography, like Bayesian statistics more generally, is around 250 years old, only recently has it begun to flourish. The aim of this paper is to review the achievements of Bayesian demography, address some misconceptions, and make the case for wider use of Bayesian methods in population studies. We focus on three applications: demographic forecasts, limited data, and highly structured or complex models. The key advantages of Bayesian methods are the ability to integrate information from multiple sources and to describe uncertainty coherently. Bayesian methods also allow for including additional (prior) information next to the data sample. As such, Bayesian approaches are complementary to many traditional methods, which can be productively re-expressed in Bayesian terms.

48 citations

References
Book
J. N. K. Rao
23 Jan 2003
TL;DR: This book covers direct domain estimation, traditional demographic methods, indirect domain estimation, small area models, and model-based estimation by empirical best linear unbiased prediction (EBLUP), empirical Bayes (EB) and hierarchical Bayes (HB) methods.
Abstract: List of Figures. List of Tables. Foreword. Preface. 1. Introduction. What is a Small Area? Demand for Small Area Statistics. Traditional Indirect Estimators. Small Area Models. Model-Based Estimation. Some Examples. 2. Direct Domain Estimation. Introduction. Design-based Approach. Estimation of Totals. Domain Estimation. Modified Direct Estimators. Design Issues. Proofs. 3. Traditional Demographic Methods. Introduction. Symptomatic Accounting Techniques. Regression Symptomatic Procedures. Dual-system Estimation of Total Population. Derivation of Average MSEs. 4. Indirect Domain Estimation. Introduction. Synthetic Estimation. Composite Estimation. James-Stein Method. Proofs. 5. Small Area Models. Introduction. Basic Area Level (Type A) Model. Basic Unit Level (Type B) Model. Extensions: Type A Models. Extensions: Type B Models. Generalized Linear Mixed Models. 6. Empirical Best Linear Unbiased Prediction: Theory. Introduction. General Linear Mixed Model. Block Diagonal Covariance Structure. Proofs. 7. EBLUP: Basic Models. Basic Area Level Model. Basic Unit Level Model. 8. EBLUP: Extensions. Multivariate Fay-Herriot Model. Correlated Sampling Errors. Time Series and Cross-sectional Models. Spatial Models. Multivariate Nested Error Regression Model. Random Error Variances Linear Model. Two-fold Nested Error Regression Model. Two-level Model. 9. Empirical Bayes (EB) Method. Introduction. Basic Area Level Model. Linear Mixed Models. Binary Data. Disease Mapping. Triple-goal Estimation. Empirical Linear Bayes. Constrained LB. Proofs. 10. Hierarchical Bayes (HB) Method. Introduction. MCMC Methods. Basic Area Level Model. Unmatched Sampling and Linking Area Level Models. Basic Unit Level Model. General ANOVA Model. Two-level Models. Time Series and Cross-sectional Models. Multivariate Models. Disease Mapping Models. Binary Data. Exponential Family Models. Constrained HB. Proofs. References. Author Index. Subject Index.

1,359 citations

Journal ArticleDOI
TL;DR: In this article, an adaptation of the James-Stein estimator is applied to sample estimates of income for small places (i.e., population less than 1,000) from the 1970 Census of Population and Housing.
Abstract: An adaptation of the James-Stein estimator is applied to sample estimates of income for small places (i.e., population less than 1,000) from the 1970 Census of Population and Housing. The adaptation incorporates linear regression in the context of unequal variances. Evidence is presented that the resulting estimates have smaller average error than either the sample estimates or an alternate procedure of using county averages. The new estimates for these small places now form the basis for the Census Bureau's updated estimates of per capita income for the General Revenue Sharing Program.

1,173 citations
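
The idea in the abstract above (a James-Stein-type adjustment combined with linear regression under unequal variances) is commonly written as a composite estimator that weights each direct estimate against a regression prediction. The following is a minimal sketch of that textbook form, not the paper's exact procedure. It assumes a single covariate and a known model variance `model_var` (in practice this variance is estimated); the function names `fit_wls` and `shrinkage_estimates` are hypothetical.

```python
def fit_wls(x, y, weights):
    """Weighted least squares fit of y ~ b0 + b1 * x with one covariate."""
    sw = sum(weights)
    xbar = sum(w * xi for w, xi in zip(weights, x)) / sw
    ybar = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - xbar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - xbar) * (yi - ybar)
              for w, xi, yi in zip(weights, x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1


def shrinkage_estimates(direct, covariate, sampling_var, model_var):
    """Shrink each direct estimate toward its regression prediction.

    Assumption (for illustration): model_var is known. Each area i has its
    own sampling variance D_i, so the shrinkage weight differs by area:
    gamma_i = model_var / (model_var + D_i).
    """
    # Areas with noisier direct estimates get less weight in the regression.
    weights = [1.0 / (model_var + d) for d in sampling_var]
    b0, b1 = fit_wls(covariate, direct, weights)

    estimates = []
    for y, x, d in zip(direct, covariate, sampling_var):
        gamma = model_var / (model_var + d)
        synthetic = b0 + b1 * x  # regression ("synthetic") prediction
        estimates.append(gamma * y + (1.0 - gamma) * synthetic)
    return estimates
```

An area whose direct estimate has a large sampling variance is pulled strongly toward the regression prediction, while an area with a precise direct estimate is left almost unchanged.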

Journal ArticleDOI
TL;DR: In this article, data from the 1978 June Enumerative Survey of the U.S. Department of Agriculture and LANDSAT satellite data from the 1978 growing season are combined through a linear regression model to predict the area under corn and soybeans in 12 Iowa counties.
Abstract: Knowledge of the area under different crops is important to the U.S. Department of Agriculture. Sample surveys have been designed to estimate crop areas for large regions, such as crop-reporting districts, individual states, and the United States as a whole. Predicting crop areas for small areas such as counties has generally not been attempted, due to a lack of available data from farm surveys for these areas. The use of satellite data in association with farm-level survey observations has been the subject of considerable research in recent years. This article considers (a) data for 12 Iowa counties, obtained from the 1978 June Enumerative Survey of the U.S. Department of Agriculture and (b) data obtained from land observatory satellites (LANDSAT) during the 1978 growing season. Emphasis is given to predicting the area under corn and soybeans in these counties. A linear regression model is specified for the relationship between the reported hectares of corn and soybeans within sample segments in...

740 citations

Journal ArticleDOI
TL;DR: In this paper, three small-area models, of Battese, Harter, and Fuller (1988), Dempster, Rubin, and Tsutakawa (1981), and Fay and Herriot (1979), are investigated.
Abstract: Small-area estimation has received considerable attention in recent years because of a growing demand for reliable small-area statistics. The direct-survey estimators, based only on the data from a given small area (or small domain), are likely to yield unacceptably large standard errors because of small sample size in the domain. Therefore, alternative estimators that borrow strength from other related small areas have been proposed in the literature to improve the efficiency. These estimators use models, either implicitly or explicitly, that connect the small areas through supplementary (e.g., census and administrative) data. For example, simple synthetic estimators are based on implicit modeling. In this article, three small-area models, of Battese, Harter, and Fuller (1988), Dempster, Rubin, and Tsutakawa (1981), and Fay and Herriot (1979), are investigated. These models are all special cases of a general mixed linear model involving fixed and random effects, and a small-area mean can be expr...

690 citations

Journal ArticleDOI
TL;DR: This paper proposes the conditional Akaike information and its corresponding criterion, the conditional AIC (CAIC), for linear mixed-effects models in the analysis of clustered data. The CAIC is defined for both maximum likelihood and residual maximum likelihood estimation, and its penalty term is related to the effective degrees of freedom p proposed by Hodges & Sargent (2001), which reflects an intermediate level of complexity between a fixed-effects model with no cluster effect and a corresponding model with fixed cluster effects.
Abstract: This paper focuses on the Akaike information criterion, AIC, for linear mixed-effects models in the analysis of clustered data. We make the distinction between questions regarding the population and questions regarding the particular clusters in the data. We show that the AIC in current use is not appropriate for the focus on clusters, and we propose instead the conditional Akaike information and its corresponding criterion, the conditional AIC, CAIC. The penalty term in CAIC is related to the effective degrees of freedom p for a linear mixed model proposed by Hodges & Sargent (2001); p reflects an intermediate level of complexity between a fixed-effects model with no cluster effect and a corresponding model with fixed cluster effects. The CAIC is defined for both maximum likelihood and residual maximum likelihood estimation. A pharmacokinetics data application is used to illuminate the distinction between the two inference settings, and to illustrate the use of the conditional AIC in model selection.

559 citations
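
To see why the effective degrees of freedom p in the abstract above lies between the two extremes it mentions, a small worked case may help. The derivation below is an illustration under stated assumptions, not the paper's general definition: a balanced random-intercept model with m clusters of n observations each, known variances, and p taken as the trace of the matrix H that maps the observations to the fitted values.

```latex
% Illustrative assumption: balanced random-intercept model, variances known
\begin{align*}
y_{ij} &= \mu + b_i + e_{ij}, \qquad b_i \sim N(0, \sigma_b^2), \quad
  e_{ij} \sim N(0, \sigma^2), \quad i = 1, \dots, m, \; j = 1, \dots, n, \\
\hat{y}_{ij} &= (1 - \gamma)\,\bar{y} + \gamma\,\bar{y}_i, \qquad
  \gamma = \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2 / n}, \\
p &= \operatorname{tr}(H)
   = \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{\partial \hat{y}_{ij}}{\partial y_{ij}}
   = mn \left( \frac{1 - \gamma}{mn} + \frac{\gamma}{n} \right)
   = 1 + (m - 1)\,\gamma .
\end{align*}
```

Here \bar{y} is the grand mean and \bar{y}_i the mean of cluster i. When \gamma = 0 (no cluster effect) p equals 1, the single overall mean; when \gamma = 1 (no shrinkage) p equals m, one fixed mean per cluster; intermediate values of \gamma give an intermediate level of complexity.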