
Showing papers on "Imputation (statistics) published in 1994"


Journal ArticleDOI
TL;DR: When it is desirable to conduct inferences under models for nonresponse other than the original imputation model, a possible alternative to recreating imputation models is to incorporate appropriate importance weights into the standard combining rules.
Abstract: Conducting sample surveys, imputing incomplete observations, and analyzing the resulting data are three indispensable phases of modern practice with public-use data files and with many other statistical applications. Each phase inherits different input, including the information preceding it and the intellectual assessments available, and aims to provide output that is one step closer to arriving at statistical inferences with scientific relevance. However, the role of the imputation phase has often been viewed as merely providing computational convenience for users of data. Although facilitating computation is very important, such a viewpoint ignores the imputer's assessments and information inaccessible to the users. This view underlies the recent controversy over the validity of multiple-imputation inference when a procedure for analyzing multiply imputed data sets cannot be derived from (is "uncongenial" to) the model adopted for multiple imputation. Given sensible imputations and complete-data analysis procedures, inferences from standard multiple-imputation combining rules are typically superior to, and thus different from, users' incomplete-data analyses. The latter may suffer from serious nonresponse biases because such analyses often must rely on convenient but unrealistic assumptions about the nonresponse mechanism. When it is desirable to conduct inferences under models for nonresponse other than the original imputation model, a possible alternative to recreating imputations is to incorporate appropriate importance weights into the standard combining rules. These points are reviewed and explored by simple examples and general theory, from both Bayesian and frequentist perspectives, particularly from the randomization perspective. Some convenient terms are suggested for facilitating communication among researchers from different perspectives when evaluating multiple-imputation inferences with uncongenial sources of input.
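
For readers unfamiliar with the "standard combining rules" referred to above, the sketch below implements Rubin's rules for pooling m completed-data estimates; the function name and the numbers are illustrative and are not taken from the paper.

    import numpy as np

    def rubin_combine(estimates, variances):
        # Pool m completed-data estimates with Rubin's combining rules.
        # estimates : point estimates, one per imputed data set
        # variances : their completed-data variances
        q = np.asarray(estimates, dtype=float)
        u = np.asarray(variances, dtype=float)
        m = len(q)
        q_bar = q.mean()                 # pooled point estimate
        u_bar = u.mean()                 # within-imputation variance
        b = q.var(ddof=1)                # between-imputation variance
        t = u_bar + (1 + 1 / m) * b      # total variance
        return q_bar, t

    # Illustrative use with made-up results from m = 3 imputed data sets:
    print(rubin_combine([2.1, 2.4, 2.2], [0.30, 0.28, 0.33]))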

790 citations


Journal ArticleDOI
TL;DR: Three main topics are discussed: bootstrap methods for missing data, these methods' relationship to the theory of multiple imputation, and computationally efficient ways of executing them.
Abstract: Missing data refers to a class of problems made difficult by the absence of some portions of a familiar data structure. For example, a regression problem might have some missing values in the predi...
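
As a rough illustration of the general idea of combining the bootstrap with imputation (a generic sketch, not necessarily Efron's specific proposals), one can resample the incomplete rows, impute within each resample, and use the spread of the resulting estimates as a standard error; the mean-filling rule here is only a placeholder.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: the second column has missing values (np.nan).
    x = rng.normal(size=(100, 2))
    x[rng.random(100) < 0.2, 1] = np.nan

    def estimate_mean(sample):
        # Placeholder imputation: fill missing entries with the observed mean.
        col = sample[:, 1].copy()
        col[np.isnan(col)] = np.nanmean(col)
        return col.mean()

    boot = []
    for _ in range(1000):
        idx = rng.integers(0, len(x), len(x))   # resample rows with replacement
        boot.append(estimate_mean(x[idx]))

    print("bootstrap SE of the imputed-data mean:", np.std(boot, ddof=1))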

419 citations


Journal ArticleDOI
TL;DR: In this article, a new class of pattern-mixture models is described for the situation where missingness is assumed to depend on an arbitrary unspecified function of a linear combination of the two variables.
Abstract: Likelihood-based methods are developed for analyzing a random sample on two continuous variables when values of one of the variables are missing. Normal maximum likelihood estimates when values are missing completely at random were derived by Anderson (1957). They are also maximum likelihood providing the missing-data mechanism is ignorable, in Rubin's (1976) sense that the mechanism depends only on observed data. A new class of pattern-mixture models (Little, 1993) is described for the situation where missingness is assumed to depend on an arbitrary unspecified function of a linear combination of the two variables. Maximum likelihood for models in this class is straightforward, and yields the estimates of Anderson (1957) when missingness depends solely on the completely observed variable, and the estimates of Brown (1990) when missingness depends solely on the incompletely observed variable. Another choice of linear combination yields estimates from complete-case analysis. Large-sample and Bayesian methods are described for this model. The data do not supply information about the ratio of the coefficients of the linear combination that controls missingness. If this ratio is not well-determined based on prior knowledge, a prior distribution can be specified, and Bayesian inference is then readily accomplished. Alternatively, sensitivity of inferences can be displayed for a variety of choices of the ratio.
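
In symbols (notation mine, not the paper's), the class of models just described lets the probability that the incompletely observed variable y_2 is missing depend on the data only through a linear combination,

    \Pr(y_2 \text{ missing} \mid y_1, y_2) = g(\lambda_1 y_1 + \lambda_2 y_2),

with g an arbitrary, unspecified function. Taking \lambda_2 = 0 (missingness depends only on the fully observed y_1) recovers Anderson's (1957) estimates, \lambda_1 = 0 gives Brown's (1990) estimates, and the data carry no information about the ratio \lambda_2 / \lambda_1, which is why a prior distribution or a sensitivity analysis over that ratio is needed.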

395 citations


Journal ArticleDOI
TL;DR: The disparities in factor structure were essentially resolved by eliminating five CES-D items, suggesting the need to modify the CES-D in populations like the authors', although eliminating these items does not resolve the differences in prevalence of depressive symptoms.
Abstract: Having observed a three-fold difference in the prevalence of significant symptoms of depression among four race-gender groups of elderly adults attending an urban primary care practice, we investigated the extent to which these differences might be explained by variability in the measurement properties of the Centers for Epidemiologic Studies depression scale (CES-D). Although the internal consistency of the CES-D was acceptable for all groups, 5% of our patients were excluded for inability to complete the minimum required number of CES-D items, and nearly 40% of patients required response imputation for the allowable one to four items that they could not answer. Imputation was most frequently required for items tapping positive affect. Principal components factor analysis was performed separately for respondents answering all items and for respondents with imputed values. In both analyses we found important race-gender differences in factor structure. Moreover, the factor structure for those with imputed values was markedly different from that of respondents answering all items, including a dissolution of the positive affect dimension. Neither the race-gender differences in factor structure nor the differences among those with and without imputed data were resolved by eliminating respondents with poor education, cognitive impairment, or alcoholism, or by varying the assumptions for data imputation. However, the disparities in factor structure were essentially resolved by eliminating five CES-D items, suggesting the need to modify the CES-D in populations like ours. Although eliminating these five items results in a more pure factor structure, it does not resolve the differences in prevalence of depressive symptoms. These differences may, however, be partially due to differential response tendencies among the race-gender groups.

192 citations


Journal ArticleDOI
TL;DR: A condition is given that identifies the limit in probability of estimators that are solutions of estimating equations computed from the incomplete data; with discrete data, this condition suggests a simple algorithm to compute the asymptotic bias of these estimators that can be easily implemented with existing statistical software.
Abstract: It is well known that many standard analyses, including maximum likelihood estimation and the generalized estimating equation approach (Liang and Zeger, 1986, Biometrika 73, 13-22) can result in biased estimation when there are missing observations. In such cases it is of interest to calculate the magnitude of the bias incurred under specific assumptions about the process generating the full data and the nonresponse mechanism. In this paper we give a condition that identifies the limit in probability of estimators that are solutions of estimating equations computed from the incomplete data. With discrete data, this condition suggests a simple algorithm to compute the asymptotic bias of these estimators that can be easily implemented with existing statistical software. We illustrate our approach with asthma prevalence data in children.
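
To make the "simple algorithm" concrete in the most elementary discrete case, the snippet below computes the large-sample limit of a complete-case prevalence estimator when the probability of responding depends on the outcome; the probabilities are invented for illustration and this is not the authors' software.

    # Asymptotic limit of the complete-case prevalence estimator when the
    # probability of responding depends on the true outcome.
    p_true = 0.15          # true prevalence (e.g. of asthma), assumed
    p_resp_case = 0.60     # response probability among cases, assumed
    p_resp_noncase = 0.90  # response probability among non-cases, assumed

    # The complete-case estimator converges to P(Y = 1 | responded):
    num = p_true * p_resp_case
    den = num + (1 - p_true) * p_resp_noncase
    limit = num / den

    print("limit of complete-case estimator:", round(limit, 4))
    print("asymptotic bias:", round(limit - p_true, 4))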

108 citations


01 Jan 1994
TL;DR: Five methods currently used to obtain yearly indices of abundance and trends over time are discussed: the chain index, indexing according to the Mountford method, route regression, imputation of missing data, and loglinear Poisson regression.
Abstract: Large-scale monitoring of bird species is becoming increasingly important in many countries. In the datasets yielded by these censuses, values are often missing, which poses problems in the analysis of the data. Currently five methods are used to obtain yearly indices of abundance and trends over time: the chain index, indexing according to the Mountford method, route regression, imputation of missing data, and loglinear Poisson regression. The advantages and limitations of each method are discussed. The loglinear Poisson regression appears to be the most promising approach. 1) DLO-Institute for Forestry and Nature Research & DLO-Agricultural Mathematics Group, P.O. Box 100, 6700 AC Wageningen, The Netherlands 2) Statistics Netherlands, P.O. Box 4000, 2270 JM Voorburg, The Netherlands
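
A minimal sketch of the loglinear Poisson regression approach for site-by-year count data (invented data and column names, not the authors' code): counts get site and year effects, the fitted values can stand in for missing censuses, and the year effects give the yearly indices.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Invented monitoring data: counts per site and year, one census missing.
    df = pd.DataFrame({
        "site":  ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
        "year":  [1990, 1991, 1992] * 3,
        "count": [10, 12, 15, 4, np.nan, 6, 20, 22, 30],
    })

    obs = df.dropna(subset=["count"])
    model = smf.glm("count ~ C(site) + C(year)", data=obs,
                    family=sm.families.Poisson()).fit()

    # Model-based expected counts for every site-year cell; the value for the
    # missing census plays the role of the imputed count, and the year effects
    # yield the yearly indices of abundance.
    df["expected"] = model.predict(df)
    print(df)
    print(model.params)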

100 citations


Journal ArticleDOI
TL;DR: This paper investigated the effects of non-randomly missing data in two-predictor regression analyses and the differences in the effectiveness of five common treatments of missing data on estimates of R2 and of each of the two standardized regression weights.
Abstract: This research is an investigation of the effects of nonrandomly missing data in two-predictor regression analyses and the differences in the effectiveness of five common treatments of missing data on estimates of R2 and of each of the two standardized regression weights. Bootstrap samples of 50, 100, and 200 were drawn from three sets of actual field data. Nonrandomly missing data were created within each sample, and the parameter estimates were compared with those obtained from the same samples with no missing data. The results indicated that three imputation procedures (mean substitution, simple and multiple regression imputation) produced biased estimates of R2 and both regression weights. Two deletion procedures (listwise and pairwise) provided accurate parameter estimates with up to 30% of the data missing.

70 citations


Journal ArticleDOI
TL;DR: In this paper, the problem of item nonresponse is considered in the context of contingent valuation, and weighting and data imputation are discussed as strategies for correcting the resulting bias.
Abstract: INTRODUCTION

Researchers who use the contingent valuation (CV) method have as a goal the measurement of valid and reliable estimates of economic values, or willingness to pay (WTP), for environmental goods (Mitchell & Carson, 1989). The contingent values should be free of sample-related bias, such as unit nonresponse (Loomis, 1987; Edwards & Anderson, 1987; Dalecki, Whitehead, & Blomquist, 1993), sample selection (Whitehead, Groothuis, & Blomquist, 1993), or item nonresponse on the CV question (Mitchell & Carson, 1989).

Item nonresponse is a common problem in survey-based research (Madow, Olkin, & Rubin, 1983; Connelly & Brown, 1992). CV researchers have overlooked, however, the potential bias associated with item nonresponse on variables other than WTP. This may be due to a preoccupation with other important CV issues such as market design and appropriate econometric methods. Item nonresponse bias can be a critical problem in contingent valuation research, especially when estimating aggregate benefits.

THE POTENTIAL PROBLEMS

Consider a mail, telephone, or in-person survey that collects data on a random sample of a general population. Suppose a large proportion of respondents fail to report their income due to embarrassment or for privacy reasons. Several strategies are available when dealing with item nonresponse on income (Little & Rubin, 1989). The simplest strategy, and the one most often observed in CV research, is complete case analysis, which is the default in most statistical software packages. That is, the researcher discards the cases with item nonresponse and analyzes only the complete cases. There are two problems with this strategy.

First, when incomplete cases are discarded because of item nonresponse on an independent variable, information on other independent variables is lost. This type of item nonresponse creates problems similar to unit nonresponse. Throwing out incomplete cases will result in a biased sample unless the discarded cases are a random subsample. For example, if low-income households are less likely to report income and these cases are discarded, the remaining sample over-represents high-income households. Further, sample sizes can decrease substantially when cases with missing income data are discarded.

Second, when incomplete cases are discarded, information on the dependent variable, WTP, is lost. The sample bias is intensified when the variable with item nonresponse is a determinant of WTP. Extending the previous example, if incomplete cases are discarded a form of selection bias results. Respondents select themselves out of the analysis by failing to report income. For normal goods, income should be positively related with measures of WTP. If the sample under-represents low-income households, WTP will be biased upward if incomplete cases are discarded.

POTENTIAL SOLUTIONS

Two strategies that can be used to correct for item nonresponse bias are weighting and data imputation. The weighting approach analyzes only complete cases but corrects for the sample bias by explicitly recognizing the proportion of the population represented by the sample. Weighting reduces the bias from analyzing only the complete cases, but information from the incomplete cases, which may be different from information reported in the complete cases, is lost. Weighting reduces the effects of item nonresponse but does not alter the effects of item selection.(1)

Imputation requires replacing missing data with estimates of the missing values. Such a strategy allows analysis of the entire sample, which reduces the effects of item nonresponse. If the estimate of the missing value is unbiased, data imputation also reduces the effects of item selection. The least costly imputation methods are unconditional mean imputation and conditional mean imputation (Little & Rubin, 1987, 1989).(2) To employ the unconditional mean imputation method, calculate the univariate mean of the variable with problematic item nonresponse and replace missing values with the mean value. …
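
The two methods just named look like this in a small pandas sketch with invented variable names (income subject to item nonresponse, education fully observed); this is an illustration, not code from the paper.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n = 200
    educ = rng.integers(8, 18, n)
    income = 2000 * educ + rng.normal(0, 5000, n)
    df = pd.DataFrame({"educ": educ, "income": income})
    df.loc[rng.random(n) < 0.25, "income"] = np.nan   # item nonresponse on income

    # Unconditional mean imputation: replace missing income with the overall mean.
    df["income_uncond"] = df["income"].fillna(df["income"].mean())

    # Conditional mean imputation: regress income on education in the complete
    # cases and replace missing income with the predicted value.
    cc = df.dropna(subset=["income"])
    slope, intercept = np.polyfit(cc["educ"], cc["income"], 1)
    pred = intercept + slope * df["educ"]
    df["income_cond"] = df["income"].fillna(pred)

    print(df[["income", "income_uncond", "income_cond"]].describe())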

33 citations



01 Jan 1994
TL;DR: The construction of the panel dataset for the 1983-89 waves of the Survey of Consumer Finances (SCF) is described, focusing on multiple imputation of missing data, along with an application of the FRITZ imputation system.
Abstract: This paper describes the construction of the panel dataset for the 1983-89 waves of the Survey of Consumer Finances (SCF), focussing on multiple imputation of missing data. The existing literature on panel imputation is limited (Camphuis [1993], Little and Su [1989]). In the first section of this paper, we give some background on the design of the 1983-1989 SCF panel. The next section discusses the general sample design issues that lie behind the SCF, and the following section specializes the discussion to the 1983-89 panel. We discuss the construction of the panel dataset and some of the basic issues in data editing. The next section describes our application of the FRITZ imputation system, which was originally developed for the 1989 SCF cross-section. Finally, we present some data on the results of the panel imputations.

I. Background on the 1983-89 SCF Panel

In 1983, the first wave of the SCF panel was conducted as a part of a multi-agency effort, led by the Federal Reserve and made possible by the cooperation of Statistics of Income (SOI) at the Internal Revenue Service. Data were collected by the Survey Research Center of the University of Michigan. The survey was designed to gather comprehensive and detailed financial information from a representative sample of U.S. households. The questionnaire was complex and took, on average, about 75 minutes to administer. The 1983 SCF respondents were reinterviewed in 1986, and again in 1989. The data from the 1983-1986 panel have previously been processed and analyzed (Avery and Kennickell [1991]). However, the 1986 survey is very different from either the 1983 or 1989 waves of the survey. The 1986 survey was much shorter, and in many ways the data quality was inferior to that of the other two surveys. In addition, for most analytic purposes, the major data needs are related to changes between 1983 and 1989. For these reasons, the 1986 data have been used in the work reported here only for bounding imputations and for constructing some summary variables that were asked directly of only some respondents in 1989. Both the 1983 and 1989 surveys were previously edited and imputed independently using only cross-sectional information. However, this may not be an appropriate treatment if the data are to be used to analyze intertemporal relationships. For example, if we know in one wave of a survey that a household has an income of $1 million, we would need to capture this information in some way in other waves, and this need is independent of the ordering of the reporting of information in time. However, if one must first have "completed" data at each cross-section and panel stage, over time there may be many versions of the "same" data.

II. Sample Design

The sample design for the 1983 survey uses a dual-frame design to address two fundamental problems inherent in measuring wealth. Some components of wealth (for example, holdings of corporate stock) are highly skewed, while others (for example, mortgage debt) are more broadly distributed (Kennickell and Woodburn [1992]). In addition, wealthier households have a higher propensity to refuse participation in surveys (Kennickell and McManus [1993]). If there is no adjustment for this reporting difference, analysis of the survey results will be biased in many cases. A standard multi-stage area-probability sample with 3665 of the completed cases (a 71 percent response rate) provides good representation of broadly-distributed characteristics.
A special list sample designed using a file of individual tax data maintained by SOI (IRS [1990]) improves the precision of estimates of skewed financial variables and enables systematic corrections for unit nonresponse. The list sample was selected in a way that tends to oversample wealthy households. Under an agreement with SOI, each selected list case was mailed a packet containing a letter requesting cooperation with the survey and a postcard to be returned if the person agreed to participate. In 1983 only about 9 percent returned the postcard, but about 95 percent of those who did so were eventually interviewed (438 cases). While the level of nonresponse is high (even by more recent SCF experience), it is important to note that such nonresponse is implicit in most surveys, but usually there is no means of identifying the problem.

III. The Panel Sample

The 1989 wave of the SCF panel is part of a more complicated design. The 1989 survey was an overlapping panel/cross-section based on the

21 citations


Proceedings ArticleDOI
28 Sep 1994
TL;DR: This paper deals with problems concerning missing data in statistical databases by finding imputations that will extrapolate the structure of the data, as well as the uncertainty about this structure.
Abstract: This paper deals with problems concerning missing data in statistical databases. Multiple imputation is a statistically sound technique for handling incomplete data. Two problems should be addressed before the routine application of the technique becomes feasible. First, if imputations are to be appropriate for more than one statistical analysis, they should be generated independently of any scientific models that are to be applied to the data at a later stage. This is done by finding imputations that will extrapolate the structure of the data, as well as the uncertainty about this structure. A second problem is to use complete-data methods in an efficient way. The HERMES workstation encapsulates existing statistical packages in a client-server model. It forms a natural and convenient environment for implementing multiple imputation.
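
One standard way to generate imputations that carry both the structure of the data and the uncertainty about that structure is to draw the parameters of the imputation model before drawing each set of imputed values. The normal linear model below is only an illustrative stand-in for whatever models a HERMES-style system would encapsulate.

    import numpy as np

    rng = np.random.default_rng(2)

    # Toy data: y depends on x, and some y values are missing.
    n = 120
    x = rng.normal(size=n)
    y = 1.0 + 2.0 * x + rng.normal(scale=1.0, size=n)
    miss = rng.random(n) < 0.3
    y_obs, x_obs, x_mis = y[~miss], x[~miss], x[miss]

    X = np.column_stack([np.ones(len(x_obs)), x_obs])
    beta_hat, *_ = np.linalg.lstsq(X, y_obs, rcond=None)
    resid = y_obs - X @ beta_hat
    df_resid = len(y_obs) - 2

    def one_imputation():
        # Draw sigma^2 and beta from their approximate posterior, then draw the
        # missing y values: parameter uncertainty plus residual noise.
        sigma2 = resid @ resid / rng.chisquare(df_resid)
        cov = sigma2 * np.linalg.inv(X.T @ X)
        beta = rng.multivariate_normal(beta_hat, cov)
        X_mis = np.column_stack([np.ones(len(x_mis)), x_mis])
        return X_mis @ beta + rng.normal(scale=np.sqrt(sigma2), size=len(x_mis))

    imputations = [one_imputation() for _ in range(5)]   # m = 5 imputed data sets
    print(np.round(imputations[0][:5], 2))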


Journal ArticleDOI
A. L. Bello
TL;DR: A proportional bootstrap method is proposed that allows effective use of imputation techniques for all bootstrap samples, and particular emphasis is placed on the estimation of the standard error of the correlation coefficient.
Abstract: The bootstrap is a time-honoured distribution-free approach for attaching a standard error to any statistic of interest, but it has not received much attention for data with missing values, especially when imputation techniques are used to replace the missing values. We propose a proportional bootstrap method that allows effective use of imputation techniques for all bootstrap samples. Five deterministic imputation techniques are examined, and particular emphasis is placed on the estimation of the standard error of the correlation coefficient. Some real data examples are presented. Other possible applications of the proposed bootstrap method are discussed.
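
The abstract does not spell out the resampling scheme, but one reading of a "proportional" bootstrap that keeps imputation usable in every replicate is to resample complete and incomplete cases separately, in their observed proportions, so that each bootstrap sample always contains both kinds of cases. The sketch below follows that reading and uses simple mean imputation as a placeholder; neither is claimed to be Bello's actual procedure.

    import numpy as np

    rng = np.random.default_rng(3)

    # Toy bivariate data with missing values in the second variable.
    n = 150
    data = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=n)
    data[rng.random(n) < 0.25, 1] = np.nan

    complete = data[~np.isnan(data[:, 1])]
    incomplete = data[np.isnan(data[:, 1])]

    def corr_after_imputation(sample):
        # Placeholder deterministic imputation: mean of the observed values.
        filled = sample.copy()
        col = filled[:, 1]
        col[np.isnan(col)] = np.nanmean(col)
        return np.corrcoef(filled[:, 0], filled[:, 1])[0, 1]

    boot = []
    for _ in range(1000):
        # Resample each stratum separately so the complete/incomplete mix of the
        # original sample is preserved in every bootstrap sample.
        c = complete[rng.integers(0, len(complete), len(complete))]
        i = incomplete[rng.integers(0, len(incomplete), len(incomplete))]
        boot.append(corr_after_imputation(np.vstack([c, i])))

    print("bootstrap SE of the correlation:", np.std(boot, ddof=1))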

Journal ArticleDOI
TL;DR: In this article, the authors considered the inference of incomplete data when the missing data process is non-ignorable in the sense of Rubin (Biometrica 38 (1982) 963-974).

Journal ArticleDOI
TL;DR: An expectation-modeling-maximization (EMM) algorithm is presented in which censored data are imputed as pseudo-complete samples and a forward regression is used to compare all main effects and 2-factor interactions for process characterization, in order to find the best combination of controllable variables.
Abstract: Censored data resulting from life-test of durable products, coupled with complicated structures of screening experiments, makes process characterization very difficult. Existing methods can be inadequate for modeling such data because important effects and factor levels might be identified wrongly. This article presents an expectation-modeling-maximization (EMM) algorithm, where censored data are imputed as pseudo-complete samples and a forward regression is used to compare all main effects and 2-factor interactions for process characterization. Then, the best combination of controllable variables is determined in order to optimize predictions from the final model. A sensitivity study of the selected models, with changes of imputation and parameter estimation methods, shows the importance of using appropriate models and estimation methods in EMM. The author's analysis of the Specht (1985) heat-exchanger life-test data indicates that E, EG, EH in the wall data and A, K, D, DJ in the corner data are the dominating factors. However, in finding the best process recipe, one might use a model with a few additional terms, which leads to more accurate predictions for better process optimization.
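
To illustrate the "impute the censored observations, then refit" part of the idea in the simplest possible setting (a single normal response, not the screening-design forward regression of the article), censored values can be replaced by their conditional expectation above the censoring point under the current fit, iterating between imputation and estimation. Everything here, including the data, is a toy example.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(4)

    # Toy life-test data: true lifetimes are normal, and values above the test
    # horizon are right-censored at that horizon.
    mu_true, sigma_true, horizon = 10.0, 2.0, 11.0
    t = rng.normal(mu_true, sigma_true, size=200)
    censored = t > horizon
    y = np.where(censored, horizon, t)

    mu, sigma = y.mean(), y.std(ddof=1)     # crude starting values
    for _ in range(50):
        # Imputation step: replace each censored value by its conditional mean
        # given that it exceeds the censoring point, under the current fit.
        a = (horizon - mu) / sigma
        cond_mean = mu + sigma * norm.pdf(a) / norm.sf(a)
        pseudo = np.where(censored, cond_mean, y)
        # Refitting step: re-estimate the model from the pseudo-complete sample.
        mu = pseudo.mean()
        sigma = pseudo.std(ddof=1)

    print(round(mu, 3), round(sigma, 3))

Note that imputing only the conditional mean understates the spread somewhat; a full EM treatment would also carry the conditional second moment into the variance update.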

Proceedings Article
01 Jan 1994
TL;DR: This paper deals with problems concerning missing data in clinical databases by outlining the concepts behind multiple imputation, a statistically sound method for handling incomplete data.
Abstract: This paper deals with problems concerning missing data in clinical databases. After signalling some shortcomings of popular solutions to incomplete data problems, we outline the concepts behind multiple imputation. Multiple imputation is a statistically sound method for handling incomplete data. Application of multiple imputation requires a lot of work and not every user is able to do this. A transparent implementation of multiple imputation is necessary. Such an implementation is possible in the HERMES medical workstation. A remaining problem is to find proper imputations.

01 Jan 1994
TL;DR: The author argues that, for Kant, moral evil of all sorts is manifested in action which, on the one hand, is explicable in terms of an agent's own reasons for action and so imputable, though on the other hand it is, in some sense, irrational.
Abstract: For Kant, moral evil of all sorts — evil that is rooted in a person's character — is manifested in action which, on the one hand, is explicable in terms of an agent's own reasons for action and so imputable, though on the other hand it is, in some sense, irrational. Because such evil is rooted in a person's character, it "corrupts the ground of all maxims" and thus deserves to be called radical evil. Moreover, according to Kant, not only are human beings susceptible to such evil, being evil is an inescapable condition of being human. These claims raise a number of questions, among them the following: (1) How can we explain the possibility of irrational, yet explicable, freely done actions given Kant's views about human agency? (2) What is the nature of radical evil? (3) In what sense is it a corrupting ground of all maxims? (4) What reason does Kant have for claiming that radical evil is an inescapable part of the human condition? There are other questions to be added to this list, some of them addressed in the recent secondary literature, but for the most part I plan to focus on the ones just mentioned.

Journal Article
TL;DR: As discussed by the authors, the most common solution to the missing data problem is probably listwise deletion, which is the default option in several computer programs (LISREL, SPSS, NCSS).
Abstract: Many missing data studies have simulated data, randomly deleted values, and investigated which method of handling the missing values would most closely approximate the original data. Regression procedures have emerged as the most recommended methods. If the values are missing randomly, these procedures are effective. If, however, the values are not missing randomly, the use of regression procedures to impute values for missing data is questionable. The purpose of this study was to determine if values were missing randomly in samples selected from the student cohort of the National Education Longitudinal Study of 1988. Four samples were selected: two samples of eight variables, with average intercorrelations of .2 and .4 respectively; and two samples of four variables, with average intercorrelations of .2 and .4 respectively. All cases containing more than one missing value were selected. The pattern of simultaneously missing values for each selected case was determined. If values were missing randomly, it was assumed the proportion of jointly missing values would be equivalent. Chi-square goodness-of-fit analysis indicates the missing values are not missing randomly (p < …). Different methods of handling missing values may produce different results. When Jackson (1968) entered data on all the available variables in a discriminant analysis, the significance of the regression coefficients (as well as the interpretation of the importance) of individual variables changed with the missing value method used. Witty and Kaiser (1991) reported that the regression coefficients and total variance accounted for by the variables changed depending on the method used to handle missing values. After reanalyzing three studies of private/public school achievement, Ward Jr. and Clark III (1991) concluded that the method used to handle missing data influenced the outcome of these studies. They further add that the iterative regression procedures are considered the most effective. Witty (1993), however, found the regression procedures to be less effective in replicating the population covariance matrix and mean vector than listwise and pairwise deletion when sample sizes were large (2000) and no more effective when sample sizes were small (200). She attributed this to lack of randomness in the missing data. The dilemma is further entangled when ignoring the missing data problem may lead to analysis of data that is of dubious value. Publication of the results of this analysis without correctly handling the missing values may "jeopardize the credibility of the organization conducting the survey and preparing the analysis and report..." (Little & Smith, 1983, p. S18). Researchers are thus faced with the task of determining which missing data method is most appropriate for their research. Unfortunately, there is no established correct method for handling missing values when the mechanism causing them is unknown. This study examines the mechanism causing the missing values by investigating the pattern of missing values. The purpose of this study was to determine if simultaneously missing values were missing randomly from four samples selected from the National Education Longitudinal Study of 1988 (NELS-88). Because the NELS-88 data base is readily accessible and widely used, information concerning the pattern of missing values and its influence on missing data procedures is needed.
Literature Review

Methods that are currently being used to handle missing values include deletion methods (listwise and pairwise deletion) and imputation methods (mean substitution and various regression procedures). The most common solution to the missing data problem is probably listwise deletion. This procedure is the default option in several computer programs (LISREL, SPSS, NCSS). This method discards cases with a missing value on any variable and thus is very wasteful of data. If the data are assumed to be missing completely at random, this procedure is acceptable. …
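
For concreteness, the deletion and mean-substitution methods named above look like this in pandas (toy data, not NELS-88):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(5)
    df = pd.DataFrame(rng.normal(size=(500, 3)), columns=["x1", "x2", "x3"])
    # Make roughly 10% of the entries missing at random.
    df = df.mask(rng.random(df.shape) < 0.10)

    # Listwise deletion: drop any case with a missing value on any variable.
    listwise = df.dropna()

    # Pairwise deletion: each correlation uses all cases observed on that pair
    # of variables (pandas' corr() already works this way).
    pairwise_corr = df.corr()

    # Mean substitution: replace each missing value with that variable's mean.
    mean_sub = df.fillna(df.mean())

    print(len(df), "cases;", len(listwise), "remain after listwise deletion")
    print(pairwise_corr.round(3))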


Book ChapterDOI
01 Jan 1994
TL;DR: In this article, the authors consider the problem of imputing a value to an asset when determining the rate base for the regulated portion of the company, in particular when the asset is partially or even completely utilized by a subsidiary.
Abstract: Public utilities often impute a value to an asset when determining the rate base for the regulated portion of the company, in particular when the asset is partially or even completely utilized by a subsidiary (in such a case, an imputed reduction in the rate base). Further, imputation is a practice sometimes employed by state public utility regulatory agencies when a vertically integrated regulated firm is the primary supplier of a productive input both to itself and to its competitors in a downstream market; a value is imputed for the input which is used for the production of the integrated firm’s downstream service. In these contexts, the common thread is that there is not necessarily a market transaction that establishes a value for the item which must be imputed.

Journal ArticleDOI
TL;DR: In this paper, different approaches for dealing with clinical trials with a repeated binary endpoint are discussed, where many of the binary outcomes are likely to be missing, and the procedures are illustrated with an example of a clinical trial in opiate-dependent individuals.
Abstract: This paper discusses different approaches for dealing with clinical trials with a repeated binary endpoint, where many of the binary outcomes are likely to be missing. Ad hoc tests of means and their rank analogues are discussed as well as combination tests, simple imputation, and model-based approaches. The procedures are illustrated with an example of a clinical trial in opiate-dependent individuals.

Journal ArticleDOI
TL;DR: While imputation leads to undesirable results that are not easily corrected, REML estimation in which test statistics are compared to an F-distribution provides an elegant tool for the analysis of these designs.
Abstract: The statistical analysis of the repeated measures design with two factors within and no factors between subjects, which is popular in clinical pharmacology, is discussed. Use of restricted maximum likelihood (REML) methodology is compared to an imputation procedure in small sample situations with missing data and is illustrated by simulations. While imputation leads to undesirable results that are not easily corrected, REML estimation in which test statistics are compared to an F-distribution provides an elegant tool for the analysis of these designs.
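
A rough sketch of a REML fit for a repeated measures layout with missing observations, using statsmodels' mixed model routine; the design, effect sizes, and variable names are invented and this is not the analysis from the paper.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(6)

    # Toy repeated measures design: every subject is measured under all
    # combinations of two within-subject factors, with a few values missing.
    subjects, drugs, periods = 12, ["A", "B"], [1, 2, 3]
    rows = [(s, d, p) for s in range(subjects) for d in drugs for p in periods]
    df = pd.DataFrame(rows, columns=["subject", "drug", "period"])
    subj_effect = rng.normal(0, 1, subjects)
    df["y"] = (subj_effect[df["subject"]]
               + (df["drug"] == "B") * 0.5
               + df["period"] * 0.2
               + rng.normal(0, 0.5, len(df)))
    df.loc[rng.random(len(df)) < 0.1, "y"] = np.nan    # missing observations

    # REML fit with a random intercept per subject; incomplete cases are simply
    # dropped rather than imputed.
    fit = smf.mixedlm("y ~ C(drug) * C(period)", data=df.dropna(),
                      groups="subject").fit(reml=True)
    print(fit.summary())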

01 Jan 1994
TL;DR: In this paper, the techniques employed in the treatment of an hourly surface wind database during the development and calibration phases of an objective wind field interpolator model are presented; the model itself has been applied to estimate the regional wind energy resource, creating a layer in a GIS environment.
Abstract: The techniques employed in the treatment of an hourly surface wind database during the development and calibration phases of an objective wind field interpolator model are presented. The model itself has been applied to estimate the regional wind energy resource, creating a layer in a GIS environment. The outlier detection phase is presented in a companion paper; here the different techniques applied to impute the missing values are described. The comparative results obtained with an hourly dataset 15 years long are also presented. Two different problems have been simulated numerically: systematic missing values (i.e. at fixed hours) and non-systematic ones. Five different criteria were applied: imputation with the historical mean value; linear time interpolation within single station records; optimum interpolation (kriging); and the two newly developed Penalty Of the Principal Scores and linear Time Interpolation of the Principal Scores methods, which consider all station records in a multivariate fashion and prove to be the most accurate for this particular wind dataset. There is also some evidence of oversampling in time.
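
The two simplest of the five criteria, the historical mean and linear time interpolation, are easy to sketch with pandas on an invented hourly series; kriging and the principal-scores methods are not reproduced here.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(7)

    # Invented hourly wind-speed series with a daily cycle and some gaps.
    idx = pd.date_range("1994-01-01", periods=24 * 60, freq="h")
    wind = 5 + 2 * np.sin(2 * np.pi * idx.hour / 24) + rng.normal(0, 0.8, len(idx))
    s = pd.Series(wind, index=idx)
    s[rng.random(len(s)) < 0.05] = np.nan        # non-systematic missing values

    # Criterion 1: impute with the historical mean for the same hour of day.
    hourly_mean = s.groupby(s.index.hour).transform("mean")
    hist_filled = s.fillna(hourly_mean)

    # Criterion 2: linear interpolation in time within the station record.
    interp_filled = s.interpolate(method="time")

    print(hist_filled.isna().sum(), interp_filled.isna().sum())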

Journal ArticleDOI
TL;DR: The author assesses the imputation to the judiciary of economic reasoning in determining the standard of care in torts and argues that since the courts consider only the actual damage that occurred, they cannot be using, even implicitly, the economic concept of expected damage.
Abstract: The paper assesses the imputation to the judiciary of economic reasoning in determining the standard of care in torts. The author argues that since the courts consider only the actual damage that occurred, they cannot be using, even implicitly, the economic concept of expected damage. In general judges do not use “risk” or “probability” to weight the level of damage but use the concepts nonmultiplicatively in a manner similar to that implied in Shackle's criteria for decision making in the absence of certainty. Shackle's theory attempts to be descriptive, and it may well fit the rule employed by Judge Learned Hand.
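
For reference, the economic reading of the standard of care that the paper questions is usually summarized by the Learned Hand formula, under which precaution is required whenever its burden is less than the expected damage,

    B < P \cdot L,

where B is the burden of precaution, P the probability of the accident, and L the magnitude of the loss. The author's claim is that courts weigh the realized L itself rather than the product P \cdot L.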


Book ChapterDOI
01 Jan 1994
TL;DR: It is shown that the fair imputation belongs to the cores of the cooperative games being constructed, and that the method is applicable to all cooperative games with monotonic characteristic functions.
Abstract: A game-theoretic approach is presented for the problem of allocating the damage to the environment from pollution by several enterprises. This approach is based on the cost allocation method proposed in the paper as "a fair distribution" for the allocation problem. The present method is shown to be applicable to all cooperative games with monotonic characteristic functions. Properties of the fair distribution are discussed. It is shown that the fair imputation belongs to the cores of the cooperative games being constructed.
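
The game-theoretic sense of "imputation" used here is the standard one: for a cooperative game with player set N and characteristic function v, an imputation is a payoff vector x satisfying efficiency and individual rationality,

    \sum_{i \in N} x_i = v(N), \qquad x_i \ge v(\{i\}) \ \text{for all } i \in N,

and it belongs to the core when no coalition can improve on it, i.e. \sum_{i \in S} x_i \ge v(S) for every coalition S \subseteq N.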

Journal ArticleDOI
01 Jan 1994
TL;DR: In this article, a simple proportional imputation formula was used to estimate the percentage mean of the obscure suicide cipher (1970-1974) using the official statistical records on suicide.
Abstract: The official statistical records on suicide are not reliable due to under-reporting of suicide cases. Considering that the true suicide count is the sum of the official record plus the obscure cipher, an attempt was made to estimate the latter using a simple proportional imputation formula. The official statistical records were re-analysed under the following rationale: the category of violent death caused by accident or intention, but where it could not be established for certain, served as a reservoir that contained the obscure cipher. Using this indirect technique, it was possible to determine that the percentage mean of the obscure suicide cipher (1970-1974) is lower than that reported in the literature.
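
The abstract does not state the formula, but a "simple proportional imputation" over the undetermined-intent reservoir is most naturally read as allocating those deaths to suicide in proportion to suicide's share among violent deaths whose intent was established; in symbols (a reconstruction, not necessarily the author's exact formula),

    \hat{S}_{\text{hidden}} = U \cdot \frac{S}{S + A},

where S and A are the officially recorded suicides and accidents and U is the number of violent deaths of undetermined intent, so that the corrected total is S + \hat{S}_{\text{hidden}}.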


01 Jan 1994
TL;DR: It is shown, both by visual inspection on image data and by feeding the imputed values to another classification algorithm, how a mixture of Gaussians can model the data distribution so as to provide a valuable tool for missing values imputation.
Abstract: In data-mining applications, we are frequently faced with a large fraction of missing entries in the data matrix, which is problematic for most discriminant machine learning algorithms. A solution that we explore here is the use of a generative model (a mixture of Gaussians with full covariances) to learn the underlying data distribution and replace missing values by their conditional expectation given the observed variables. Since training a Gaussian mixture with many different patterns of missing values can be computationally very expensive, we introduce a spanning-tree based algorithm that significantly speeds up training in these conditions. Such mixtures of Gaussians can be applied directly to supervised problems (Ghahramani and Jordan, 1994), but we observe that using them for missing value imputation before applying a separate discriminant learning algorithm yields better results. Our contributions are two-fold:

1. We explain why the basic EM training algorithm is not practical in large-dimensional applications in the presence of missing values, and we propose a novel training algorithm that significantly speeds up training by EM. The algorithm we propose relies on the idea to re-use the computations performed on one training sample as a basis for the next sample, in order to obtain the quantities required by the EM update equations. We show how these computations can be minimized by ordering samples in such a way that two consecutive samples have similar "missing patterns", i.e. share missing values for similar variables. On 28x28 images with random squares of 5x5 pixels being forced to missing values, we obtain a speed-up on the order of 8 compared to standard EM training.

2. We show, both by visual inspection on image data (figure 1) and by feeding the imputed values to another classification algorithm (figure 2), how a mixture of Gaussians can model the data distribution so as to provide a valuable tool for missing values imputation.
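
The conditional expectation used for the imputation has a closed form under the Gaussian mixture: partitioning each observation into observed and missing parts x = (x_o, x_m), the imputed value is the responsibility-weighted sum of the per-component conditional means,

    \hat{x}_m = \sum_k r_k(x_o) \left[ \mu_m^{(k)} + \Sigma_{mo}^{(k)} \bigl(\Sigma_{oo}^{(k)}\bigr)^{-1} \bigl(x_o - \mu_o^{(k)}\bigr) \right],

where the responsibilities r_k(x_o) \propto \pi_k \, \mathcal{N}\bigl(x_o \mid \mu_o^{(k)}, \Sigma_{oo}^{(k)}\bigr) are computed from the observed coordinates alone.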