
Showing papers on "Imputation (statistics)" published in 1986


Journal ArticleDOI
TL;DR: In this paper, several multiple imputation techniques for simple random samples with ignorable nonresponse on a scalar outcome variable are compared using both analytic and Monte Carlo results concerning coverages of the resulting intervals for the population mean.
Abstract: Several multiple imputation techniques are described for simple random samples with ignorable nonresponse on a scalar outcome variable. The methods are compared using both analytic and Monte Carlo results concerning coverages of the resulting intervals for the population mean. Using m = 2 imputations per missing value gives accurate coverages in common cases and is clearly superior to single imputation (m = 1) in all cases. The performances of the methods for various m can be predicted well by linear interpolation in 1/(m - 1) between the results for m = 2 and m = ∞. As a rough guide, to assure coverages of interval estimates within 2% of the nominal level when using the preferred methods, the number of imputations per missing value should increase from 2 to 3 as the nonresponse rate increases from 10% to 60%.
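The combining step this abstract relies on can be sketched in a few lines. The following is an illustrative Python sketch, not the authors' code: it draws m imputations from a simplified normal model fit to respondents (without drawing the model parameters themselves, so it does not reproduce any one of the paper's preferred methods) and pools the m complete-data estimates of the mean with Rubin's combining rules. All data and names are assumed.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simple random sample with ignorable nonresponse on a scalar outcome y.
n, resp_rate = 200, 0.7
y = rng.normal(50, 10, size=n)
observed = rng.random(n) < resp_rate          # missingness unrelated to y (ignorable)
y_obs = y[observed]

def multiply_impute_mean(y_obs, n_missing, m=3):
    """Impute each missing value m times from a simple normal model fit to
    respondents, then combine the m complete-data estimates of the mean."""
    est, var = [], []
    for _ in range(m):
        # Draws from N(ybar_obs, s2_obs); an assumed, simplified imputation model.
        draws = rng.normal(y_obs.mean(), y_obs.std(ddof=1), size=n_missing)
        completed = np.concatenate([y_obs, draws])
        est.append(completed.mean())
        var.append(completed.var(ddof=1) / completed.size)
    est, var = np.array(est), np.array(var)
    qbar = est.mean()                          # combined point estimate
    ubar = var.mean()                          # within-imputation variance
    b = est.var(ddof=1)                        # between-imputation variance
    total = ubar + (1 + 1 / m) * b
    df = (m - 1) * (1 + ubar / ((1 + 1 / m) * b)) ** 2   # Rubin's df approximation
    half = stats.t.ppf(0.975, df) * np.sqrt(total)
    return qbar, (qbar - half, qbar + half)

print(multiply_impute_mean(y_obs, n_missing=(~observed).sum(), m=3))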

725 citations


Journal ArticleDOI
TL;DR: In this paper, the theoretical properties of nonresponse adjustments based on adjustment cells are studied, for estimates of means for the whole population and in subclasses that cut across adjustment cells.
Abstract: Theoretical properties of nonresponse adjustments based on adjustment cells are studied, for estimates of means for the whole population and in subclasses that cut across adjustment cells. Three forms of adjustment are considered: weighting by the inverse response rate within cells, post-stratification on known population cell counts, and mean imputation within adjustment cells. Two dimensions of covariate information x are distinguished as particularly useful for reducing nonresponse bias: the response propensity f(x) and the conditional mean ŷ(x) of the outcome variable y given x. Weighting within adjustment cells based on f̂(x) controls bias, but not necessarily variance. Imputation within adjustment cells based on ŷ(x) controls bias and variance. Post-stratification yields some gains in efficiency for overall population means, and smaller gains for means in subclasses of the population. A simulation study similar to that of Holt & Smith (1979) is described which explores the mean squared error properties of the estimators. Finally, some modifications of response propensity weighting to control variance are suggested.
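For concreteness, a small Python sketch (on assumed data, not from the paper) of two of the three adjustments compared: weighting respondents by the inverse response rate within adjustment cells, and mean imputation within cells.

import numpy as np

rng = np.random.default_rng(1)

# Toy sample: adjustment cells defined by a covariate, outcome y, response indicator r.
cells = rng.integers(0, 3, size=500)              # 3 adjustment cells
y = 10 + 5 * cells + rng.normal(0, 2, size=500)   # y depends on the cell
resp_prob = np.array([0.9, 0.6, 0.3])[cells]      # response propensity varies by cell
r = rng.random(500) < resp_prob

def weighted_mean(y, r, cells):
    """Weight each respondent by the inverse response rate of its adjustment cell."""
    w = np.zeros(r.sum())
    yr, cr = y[r], cells[r]
    for c in np.unique(cells):
        rate = r[cells == c].mean()
        w[cr == c] = 1.0 / rate
    return np.average(yr, weights=w)

def mean_imputed_mean(y, r, cells):
    """Replace each nonrespondent's y by the respondent mean of its cell."""
    y_imp = y.copy()
    for c in np.unique(cells):
        y_imp[(~r) & (cells == c)] = y[r & (cells == c)].mean()
    return y_imp.mean()

print("respondent mean (biased):", y[r].mean())
print("inverse-response-rate weighting:", weighted_mean(y, r, cells))
print("mean imputation within cells:", mean_imputed_mean(y, r, cells))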

526 citations


Journal ArticleDOI
TL;DR: In the most frequently used microdata sets, over a quarter of all respondents now refuse to answer some questions about their incomes; the Census Bureau has dealt with this problem, which has been increasing in severity over time, by imputing the incomes of nonrespondents, and the authors evaluate that imputation methodology.
Abstract: In the most frequently used microdata sets, over a quarter of all respondents now refuse to answer some questions about their incomes. The Census Bureau has dealt with this problem, which has been increasing in severity over time, by imputing incomes of non-respondents. Their imputation procedure, called the "hot deck," essentially matches nonrespondents with demographically similar donors. In this paper we evaluate the census imputation methodology and raise some questions. First, the census procedure is tied to commonality of events in the population rather than the more appropriate informational content of regressors. Clearly, the census procedure severely understates income in certain occupations. Because it is based on the apparently invalid assumption that income does not affect reporting propensities, it most likely understates average incomes as well.
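A minimal Python sketch of a cell-based hot deck in the spirit described, matching nonrespondents to randomly chosen donors within demographic cells; the cells, income model, and nonignorable missingness below are assumptions for illustration, not the Census Bureau's procedure.

import numpy as np

rng = np.random.default_rng(2)

def hot_deck_impute(income, cell, rng):
    """Fill each missing income with a value drawn from a randomly chosen donor
    in the same demographic cell (e.g., age group x occupation)."""
    income = income.copy()
    for c in np.unique(cell):
        in_cell = cell == c
        donors = income[in_cell & ~np.isnan(income)]
        n_miss = np.isnan(income[in_cell]).sum()
        if donors.size and n_miss:
            income[in_cell & np.isnan(income)] = rng.choice(donors, size=n_miss)
    return income

# Toy data: income missing more often for high earners (nonignorable), which is
# the situation the authors argue the hot deck handles poorly.
cell = rng.integers(0, 4, size=1000)
income = rng.lognormal(10 + 0.1 * cell, 0.5)
missing = rng.random(1000) < 0.15 + 0.3 * (income > np.quantile(income, 0.8))
reported = np.where(missing, np.nan, income)

imputed = hot_deck_impute(reported, cell, rng)
print("true mean:", income.mean(), "imputed-data mean:", np.nanmean(imputed))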

184 citations


Journal ArticleDOI
TL;DR: In this paper, the authors distinguish two basic types of sample outliers: representative outliers, whose values have been correctly recorded and cannot be assumed unique in the target population, and nonrepresentative outliers, whose data values are incorrect or unique in some sense.
Abstract: Outliers in sample data are a perennial problem for applied survey statisticians. Moreover, it is a problem for which traditional sample survey theory offers no real solution, beyond the sensible advice that such sample elements should not be weighted to their fullest extent in estimation. Sample outliers can be identified as of two basic types. Here we are concerned with the first type, which may conveniently be termed representative outliers. These are sample elements with values that have been correctly recorded and that cannot be assumed to be unique. That is, there is no good reason to assume there are no more similar outliers in the nonsampled part of the target population. The remaining sample outliers, which by default are termed nonrepresentative, are sample elements whose data values are incorrect or unique in some sense. Methods for dealing with these nonrepresentative outliers lie basically within the scope of survey editing and imputation theory and are, therefore, not considered in ...
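The advice that such elements "should not be weighted to their fullest extent in estimation" can be illustrated with a simple winsorized expansion estimator; the cutoff choice and simulated data below are assumptions, and this is not an estimator proposed in the paper.

import numpy as np

rng = np.random.default_rng(3)

# Skewed population total estimated from a simple random sample with weight N/n.
N, n = 10_000, 100
population = rng.lognormal(5, 1.2, size=N)
sample = rng.choice(population, size=n, replace=False)
w = N / n

# Fully weighted expansion estimate of the total.
fully_weighted_total = w * sample.sum()

# Winsorized estimate: values above a cutoff keep a weight of 1 for the excess,
# so representative outliers are not expanded to their full weight.
cutoff = np.quantile(sample, 0.98)           # assumed cutoff choice
excess = np.clip(sample - cutoff, 0, None)
winsorized_total = w * np.minimum(sample, cutoff).sum() + excess.sum()

print("true total:", population.sum())
print("fully weighted:", fully_weighted_total, "winsorized:", winsorized_total)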

158 citations


Journal ArticleDOI
TL;DR: In this article, the authors compare the CPS hot deck imputations of wages and salary amounts with alternatives based on regression models for the logarithm of wages, and for the wage rate.
Abstract: The U.S. Bureau of the Census imputes missing income items in the income supplement of the Current Population Survey (CPS) by a technique commonly known as the CPS hot deck. This article compares CPS hot deck imputations of wages and salary amounts with alternatives based on regression models for the logarithm of wages and salary and for the wage rate. Imputations are compared with an Internal Revenue Service (IRS) wages and salary amount obtained by an exact match of CPS data to IRS records. Although limitations in the matching and in the comparison variable preclude a definitive conclusion, we find that (a) the CPS hot deck does not underestimate income aggregates to any serious extent; (b) model-based alternatives have slightly smaller mean absolute error than the hot deck, when comparable data bases of respondents are used to carry out imputations; and (c) multivariate models for imputing recipiency, weeks and hours worked, and earnings need to be developed to provide re...
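A sketch of the regression-based alternative being compared: fit a model for log wages on covariates among respondents and impute nonrespondents with predictions plus random residuals. The covariates, coefficients, and noise model are illustrative assumptions, not the article's specification.

import numpy as np

rng = np.random.default_rng(4)

# Respondent data: covariates X (e.g., education, experience) and log wages.
n = 800
X = np.column_stack([np.ones(n), rng.integers(8, 21, n), rng.uniform(0, 40, n)])
beta = np.array([1.0, 0.08, 0.02])
log_wage = X @ beta + rng.normal(0, 0.4, n)
respondent = rng.random(n) < 0.75

def regression_impute_log_wage(X, log_wage, respondent, rng):
    """Fit OLS for log wages on respondents; impute nonrespondents with the
    prediction plus a normal residual so the imputed values keep some spread."""
    Xr, yr = X[respondent], log_wage[respondent]
    bhat, *_ = np.linalg.lstsq(Xr, yr, rcond=None)
    sigma = np.sqrt(((yr - Xr @ bhat) ** 2).sum() / (Xr.shape[0] - Xr.shape[1]))
    y_imp = log_wage.copy()
    miss = ~respondent
    y_imp[miss] = X[miss] @ bhat + rng.normal(0, sigma, miss.sum())
    return np.exp(y_imp)                      # back to the wage scale

wages = regression_impute_log_wage(X, log_wage, respondent, rng)
print("aggregate wages, true vs imputed-data:", np.exp(log_wage).sum(), wages.sum())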

131 citations


Journal ArticleDOI
TL;DR: It is concluded that pairwise deletion and listwise deletion are among the least effective methods in terms of approximating the results that would have been obtained had the data been complete, whereas replacing missing values with estimates based on correlational procedures generally produces the most accurate results.
Abstract: Although research conducted in applied settings is frequently hindered by missing data, there is surprisingly little practical advice concerning effective methods for dealing with the problem. The purpose of this article is to describe several alternative methods for dealing with incomplete multivariate data and to examine the effectiveness of these methods. It is concluded that pairwise deletion and listwise deletion are among the least effective methods in terms of approximating the results that would have been obtained had the data been complete, whereas replacing missing values with estimates based on correlational procedures generally produces the most accurate results. In addition, some descriptive statistical procedures are recommended that permit researchers to investigate the causes and consequences of incomplete data more fully.
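A toy Python comparison, on assumed data, of the methods examined: listwise deletion, pairwise deletion, and replacing missing values with regression-based (correlational) estimates.

import numpy as np

rng = np.random.default_rng(5)

# Complete bivariate data, then knock out some x2 values at random.
n = 300
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(scale=0.8, size=n)
miss = rng.random(n) < 0.3
x2_obs = np.where(miss, np.nan, x2)
complete = ~miss

# Listwise deletion: keep only complete cases for every statistic.
r_listwise = np.corrcoef(x1[complete], x2_obs[complete])[0, 1]

# Pairwise deletion: each statistic uses all cases available for that pair
# (identical to listwise with only two variables, but diverges with more).
r_pairwise = np.corrcoef(x1[~np.isnan(x2_obs)], x2_obs[~np.isnan(x2_obs)])[0, 1]

# Regression-based replacement: predict missing x2 from x1 using complete cases.
slope, intercept = np.polyfit(x1[complete], x2_obs[complete], 1)
x2_filled = np.where(miss, slope * x1 + intercept, x2_obs)
r_regression = np.corrcoef(x1, x2_filled)[0, 1]

print("complete-data r:", np.corrcoef(x1, x2)[0, 1])
print("listwise:", r_listwise, "pairwise:", r_pairwise, "regression:", r_regression)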

124 citations


Journal ArticleDOI
TL;DR: A model in which a response is modified to pass a set of edits with as little change as possible is developed; the resulting problem is NP-hard for categorical data and general edits.
Abstract: Responses to surveys often contain large amounts of incorrect information. One option for dealing with the problem is to revise those erroneous responses that can be detected. Fellegi and Holt developed a model in which a response is modified to pass a set of edits with as little change as possible. The model is called Minimum Weighted Fields to Impute (MWFI) and is NP-hard for categorical data and general edits. We develop two algorithms for MWFI, based on set covering, and present computational experience.
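A brute-force Python sketch of the minimum-change idea (not the set-covering algorithms the article develops): find the cheapest weighted set of fields whose values can be changed so the record passes every edit. The edits, domains, and weights below are assumed.

from itertools import combinations, product

# Each edit is a predicate that returns True when a record FAILS the edit.
edits = [
    lambda r: r["age"] == "child" and r["marital"] == "married",
    lambda r: r["age"] == "child" and r["employment"] == "employed",
]
domains = {"age": ["child", "adult"],
           "marital": ["single", "married"],
           "employment": ["employed", "unemployed"]}
weights = {"age": 2.0, "marital": 1.0, "employment": 1.0}   # assumed field weights

def min_weighted_fields_to_impute(record):
    """Return the cheapest (weight, fields) whose values can be changed so the
    record passes every edit, by brute force over field subsets."""
    fields = list(domains)
    best = None
    for k in range(len(fields) + 1):
        for subset in combinations(fields, k):
            for values in product(*(domains[f] for f in subset)):
                trial = {**record, **dict(zip(subset, values))}
                if not any(edit(trial) for edit in edits):
                    cost = sum(weights[f] for f in subset)
                    if best is None or cost < best[0]:
                        best = (cost, subset)
    return best

print(min_weighted_fields_to_impute(
    {"age": "child", "marital": "married", "employment": "employed"}))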

51 citations


Journal ArticleDOI
TL;DR: In this article, cash returns from farming are expected to be nonnormally distributed under a wide range of joint price-yield distributions, and a positive correlation between skewness and kurtosis is found to reduce the likelihood of decision errors associated with falsely imputing normality.
Abstract: Cash returns from farming are expected to be nonnormally distributed under a wide range of joint price-yield distributions. Adequate testing for such nonnormality requires use of proper whitening procedures as well as appropriate statistics. With tests and sample sizes commonly employed, a false imputation of normality often will be made. However, positive correlation between skewness and kurtosis reduces the likelihood of associated decision errors. These results are illustrated with data for
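A small sketch of the kind of moment-based normality check alluded to: sample skewness and excess kurtosis of simulated price-times-yield returns combined into a Jarque-Bera-type statistic. The return model and sample size are assumptions, not the article's data or test.

import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Simulated cash returns as price x yield with non-normal components.
n = 60                                     # a modest sample size, as in practice
price = rng.lognormal(0.0, 0.15, n)
yield_ = np.maximum(rng.normal(3.0, 0.8, n), 0)   # truncation induces skewness
returns = price * yield_

skew = stats.skew(returns)
kurt = stats.kurtosis(returns)             # excess kurtosis (0 for a normal)
jb = n / 6 * (skew ** 2 + kurt ** 2 / 4)   # Jarque-Bera statistic, ~ chi-square(2)
p_value = stats.chi2.sf(jb, df=2)

print(f"skewness={skew:.2f}, excess kurtosis={kurt:.2f}, JB={jb:.2f}, p={p_value:.3f}")
# With small n, such tests often fail to reject normality even for skewed returns,
# which is the "false imputation of normality" the article warns about.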

47 citations


Journal ArticleDOI
TL;DR: In this paper, a new method is proposed for imputation of missing values in sample survey data, which uses standard statistical methodology, permits a general specification of the nonresponse process, and does not impose specific model assumptions.
Abstract: A new method is proposed for imputation of missing values in sample survey data. The procedure uses standard statistical methodology, permits a general specification of the nonresponse process, and does not impose specific model assumptions. Prior information from past similar surveys or from other sources may be incorporated in a routine manner.

20 citations


Journal ArticleDOI
TL;DR: In this article, it is argued that the importance of imputations that arise in a representation scheme depends strongly on the use to which the scheme is put: whether it serves as part of a formal, objective account of natural language, or rather as a representational tool within an agent.

17 citations


Book ChapterDOI
01 Jan 1986
TL;DR: In this paper, the authors addressed the analysis issues of longitudinal vs. cross-sectional methods of imputation and adjustment for missing values, and the use of weights in longitudinal analyses to adjust for unequal probabilities of selection and nonresponse.
Abstract: Longitudinal survey data can arise in many different settings, e.g., from rotating panel surveys, in cohort studies, and in the context of field experiments that involve economic and social phenomena that change over time. In all of these settings the longitudinal feature implies repeated interviews of respondents from nonstationary populations, and both panel attrition and missing data present special concerns. The issues here are ones involving both design and analysis. Among the design issues in a longitudinal survey is how to achieve a high degree of data continuity by following movers, when the cost of such continuity is high. If the sampling units of interest are groups as opposed to individuals, there is often a critical need for operational definitions of “family” and “household”, because the concepts are dynamic and change over time. Among the analysis issues addressed in the paper are (i) the use of longitudinal vs. cross-sectional methods of imputation and adjustment for missing values, and (ii) the use of weights in longitudinal analyses to adjust for unequal probabilities of selection and nonresponse.
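A short Python sketch of point (ii): combining a base weight for unequal selection probabilities with a wave-nonresponse adjustment computed within weighting classes. The strata, classes, and rates are assumptions for illustration, not from the chapter.

import numpy as np

rng = np.random.default_rng(7)

n = 400
strata = rng.integers(0, 2, size=n)                     # two design strata
sel_prob = np.where(strata == 0, 0.02, 0.10)            # unequal selection probabilities
base_weight = 1.0 / sel_prob

# Wave-2 response depends on a weighting class (e.g., movers vs non-movers).
mover = rng.random(n) < 0.3
responded = rng.random(n) < np.where(mover, 0.6, 0.9)

adjusted_weight = base_weight.copy()
for cls in (True, False):
    in_cls = mover == cls
    # Ratio of weighted sample to weighted respondents within the class.
    factor = base_weight[in_cls].sum() / base_weight[in_cls & responded].sum()
    adjusted_weight[in_cls & responded] *= factor
adjusted_weight[~responded] = 0.0

y = 20 + 5 * mover + rng.normal(0, 3, n)                # outcome at wave 2
print("unadjusted respondent mean:", y[responded].mean())
print("weight-adjusted mean:",
      np.average(y[responded], weights=adjusted_weight[responded]))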

Journal ArticleDOI
TL;DR: Responses to surveys often contain large amounts of incorrect information and one option for dealing with the problem is to revise those erroneous responses that can be detected.
Abstract: Responses to surveys often contain large amounts of incorrect information. One option for dealing with the problem is to revise those erroneous responses that can be detected. Fellegi and Holt deve...

Posted Content
TL;DR: In this article, the authors focus on the missing income problem in analyses of Engel functions, statistically link particular demographic attributes to the probability of reporting income information, and discuss several techniques for overcoming the problem, namely regression imputation, the Heckman procedure, and item deletion.
Abstract: The empirical evidence from the extant literature in demand analysis points to the importance of income in food expenditure relationships. However, roughly 30 percent of all households in the 1977-78 Nationwide Food Consumption Survey do not report income figures. The focus of this paper is on the missing income problem in analyses of Engel functions. This analysis statistically links particular demographic attributes to the probability of reporting income information. Additionally, several techniques to overcome the missing income problem, namely, regression imputation, the Heckman procedure, and item deletion, are discussed. Empirical evidence suggests that the Heckman procedure is statistically superior to item deletion, and that regression imputation and the Heckman procedure yield similar results.
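A stylized Python sketch of the Heckman two-step correction contrasted with item deletion for an Engel-type equation; the covariates, the probit selection equation, and the simulated data are all assumptions, not the paper's specification or data.

import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(8)

# Toy household data: food expenditure depends on log income and household size;
# income reporting depends on demographics and on an unobserved factor u that
# also enters the food equation, so item deletion is subject to selection bias.
n = 1_000
educ = rng.integers(8, 18, n)
hh_size = rng.integers(1, 7, n)
u = rng.normal(0, 1, n)
log_income = 7 + 0.1 * educ + rng.normal(0, 0.5, n)
food = 2 + 0.35 * log_income + 0.1 * hh_size + 0.3 * u + rng.normal(0, 0.2, n)
report = rng.random(n) < norm.cdf(-1.0 + 0.12 * educ + 0.8 * u)

Z = sm.add_constant(np.column_stack([educ, hh_size]))   # selection covariates
X = sm.add_constant(np.column_stack([log_income, hh_size]))

# Step 1: probit for the probability of reporting income.
probit = sm.Probit(report.astype(float), Z).fit(disp=0)
xb = Z @ probit.params
imr = norm.pdf(xb) / norm.cdf(xb)                       # inverse Mills ratio

# Step 2: Engel equation on reporters, augmented with the inverse Mills ratio.
heckman = sm.OLS(food[report], np.column_stack([X[report], imr[report]])).fit()
deletion = sm.OLS(food[report], X[report]).fit()        # item deletion for contrast

print("item deletion coefficients:   ", deletion.params.round(3))
print("Heckman two-step coefficients:", heckman.params.round(3))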

Journal ArticleDOI
TL;DR: In this article, the authors report several experimental tests of the deterrence set, a solution concept for n-person games proposed by Laffond and Moulin (1977), which is distinctive because it specifies equilibria attained through the use of threats, where threats are conceptualized as costly not only to the target but also to the user.
Abstract: This paper reports several experimental tests of the deterrence set, a solution concept for n-person games proposed by Laffond and Moulin (1977, 1981). This solution concept is distinctive because it specifies equilibria attained through the use of threats, where threats are conceptualized as costly not only to the target but also to the user. The laboratory tests were conducted in the context of 3- and 4-person cooperative non-sidepayment matrix games. In the first experimental test, the deterrence set was juxtaposed against the imputation set. Results indicate that the deterrence set has greater predictive accuracy than the imputation set. In a series of further tests, the deterrence set was juxtaposed against both the imputation set and the von Neumann-Morgenstern stable set solution. Results again show that the deterrence set is more accurate than the imputation set, but neither the deterrence set nor the stable set is reliably more accurate than the other. Overall, these results indicate that deterrence is a viable basis for stability in cooperative non-sidepayment games.
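To make the benchmark concrete, a tiny Python sketch of checking membership in the imputation set of a 3-person characteristic-function game (efficiency plus individual rationality); the characteristic function below is an assumed example, not one of the experimental games.

# Characteristic function of an assumed 3-person game: value of each coalition.
v = {(1,): 1, (2,): 1, (3,): 2, (1, 2): 4, (1, 3): 5, (2, 3): 5, (1, 2, 3): 9}

def is_imputation(x, v, players=(1, 2, 3)):
    """x (a dict of payoffs) is in the imputation set if the payoffs are
    efficient (sum to v(N)) and individually rational (x_i >= v({i}))."""
    efficient = abs(sum(x.values()) - v[players]) < 1e-9
    rational = all(x[i] >= v[(i,)] for i in players)
    return efficient and rational

print(is_imputation({1: 3, 2: 3, 3: 3}, v))   # True: efficient, individually rational
print(is_imputation({1: 6, 2: 2, 3: 1}, v))   # False: player 3 gets less than v({3}) = 2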

Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of missing data in end-use energy demand models, noting that analysts often discard cases in which values are missing for variables required by their models (see, e.g., U.S. Government, 1983; Pacific Gas and Electric, 1983; Hirst and Carney, 1978; and EPRI, 1977) and that doing so can bias the resulting forecasts.
Abstract: Although missing data are found in all types of data sets, surveys are particularly prone to produce data sets in which values of some respondent variables are missing (see, e.g., Cochran, 1977; Ericson, 1967; Kalton, 1983; and Hutcheson and Prather, 1977). Survey data collected for end-use energy demand models are no exception; high frequencies of nonresponse occur for many variables. This issue is, however, generally disregarded in the end-use literature, and analysts working with end-use models often discard cases in which values are missing for variables required by their models (see, e.g., U.S. Government, 1983; Pacific Gas and Electric, 1983; Hirst and Carney, 1978; and EPRI, 1977). Discarding cases with missing values has important consequences. It implicitly assumes that the missing values occur randomly rather than systematically. If, however, missing values do not occur randomly, discarding cases with missing values will result in misspecified models and biased forecasts. Furthermore, by discarding cases, the detail appropriate for a given end-use model can be lost.
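A small simulated illustration, with assumed numbers, of the consequence described: when missingness is systematic (here, larger dwellings report floor area less often), complete-case estimates of an energy-demand relation can be misleading.

import numpy as np

rng = np.random.default_rng(9)

# Toy end-use data: annual electricity use (kWh) driven by floor area (m^2).
n = 2_000
area = rng.gamma(shape=4, scale=40, size=n)
kwh = 1_000 + 30 * area + rng.normal(0, 800, n)

# Systematic nonresponse: large dwellings are less likely to report floor area.
p_missing = np.clip(0.05 + 0.004 * (area - area.mean()), 0.02, 0.9)
missing = rng.random(n) < p_missing
keep = ~missing

# Complete-case (discarded) fit vs the full-data fit.
slope_cc, intercept_cc = np.polyfit(area[keep], kwh[keep], 1)
slope_full, intercept_full = np.polyfit(area, kwh, 1)

print("full data:      slope=%.1f, mean kWh=%.0f" % (slope_full, kwh.mean()))
print("complete cases: slope=%.1f, mean kWh=%.0f" % (slope_cc, kwh[keep].mean()))
# The complete-case mean understates demand because high-use cases drop out,
# even if the slope changes little when missingness depends only on floor area.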