
Showing papers in "Journal of the American Statistical Association in 1986"





Journal ArticleDOI
TL;DR: In this article, the authors use a particular model for causal inference (Holland and Rubin 1983; Rubin 1974) to critique the discussions of other writers on causation and causal inference.
Abstract: Problems involving causal inference have dogged at the heels of statistics since its earliest days. Correlation does not imply causation, and yet causal conclusions drawn from a carefully designed experiment are often valid. What can a statistical model say about causation? This question is addressed by using a particular model for causal inference (Holland and Rubin 1983; Rubin 1974) to critique the discussions of other writers on causation and causal inference. These include selected philosophers, medical researchers, statisticians, econometricians, and proponents of causal modeling.

4,845 citations


Journal ArticleDOI
TL;DR: Approximations are described for the posterior means and variances of positive functions of a real or vector-valued parameter and for the marginal posterior densities of arbitrary parameters; the same approximations can also be used to compute approximate predictive densities.
Abstract: This article describes approximations to the posterior means and variances of positive functions of a real or vector-valued parameter, and to the marginal posterior densities of arbitrary (i.e., not necessarily positive) parameters. These approximations can also be used to compute approximate predictive densities. To apply the proposed method, one only needs to be able to maximize slightly modified likelihood functions and to evaluate the observed information at the maxima. Nevertheless, the resulting approximations are generally as accurate as, and in some cases more accurate than, approximations based on third-order expansions of the likelihood that require the evaluation of third derivatives. The approximate marginal posterior densities behave very much like saddle-point approximations for sampling distributions. The principal regularity condition required is that the likelihood times the prior be unimodal.
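A minimal numerical sketch of this style of Laplace approximation to a posterior mean, worked on a hypothetical exponential-likelihood, gamma-prior example so the exact answer is available for comparison; the data, prior parameters, and finite-difference curvature are illustrative choices, not the article's:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical example: y_i ~ Exponential(theta), theta ~ Gamma(a, b) prior,
# approximate the posterior mean of g(theta) = theta (a positive function).
rng = np.random.default_rng(0)
y = rng.exponential(scale=1/2.0, size=50)       # true rate theta = 2
n, a, b = len(y), 2.0, 1.0

def log_post(theta):                             # log prior + log likelihood (unnormalized)
    if theta <= 0:
        return -np.inf
    return (a - 1)*np.log(theta) - b*theta + n*np.log(theta) - theta*y.sum()

def neg_h(theta, extra=0.0):                     # -(log_post + extra)/n, to be minimized
    return -(log_post(theta) + extra)/n

def laplace_term(extra_log):
    # maximize h (or h*) and estimate the curvature by a central finite difference
    res = minimize_scalar(lambda t: neg_h(t, extra_log(t)), bounds=(1e-6, 50), method="bounded")
    t_hat, eps = res.x, 1e-4
    d2 = (neg_h(t_hat+eps, extra_log(t_hat+eps)) - 2*neg_h(t_hat, extra_log(t_hat))
          + neg_h(t_hat-eps, extra_log(t_hat-eps))) / eps**2     # = -h''(t_hat) > 0
    return t_hat, 1.0/(n*d2), -res.fun           # maximizer, sigma^2, maximized h

t0, s0, h0 = laplace_term(lambda t: 0.0)                         # denominator: h
t1, s1, h1 = laplace_term(lambda t: np.log(t))                   # numerator: h* = h + log g / n
approx = np.sqrt(s1/s0) * np.exp(n*(h1 - h0))
exact = (a + n)/(b + y.sum())                                    # conjugate posterior mean
print(approx, exact)                                             # the two should be very close
```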

2,081 citations



Journal ArticleDOI
TL;DR: Comparison properties of random variables and stochastic processes are given and are illustrated by application to various queueing models and questions in experimental design, renewal and reliability theory, PERT networks and branching processes.
Abstract: Stochastic models of queueing, reliability, inventory, and sequencing in which random influences are considered are studied. One stochastic model is approximated by another that is simpler in structure or about which simpler assumptions can be made. After general results on comparison properties of random variables and stochastic processes are given, the properties are illustrated by application to various queueing models and to questions in experimental design, renewal and reliability theory, PERT networks, and branching processes.

1,052 citations


Journal ArticleDOI
TL;DR: In this article, a nonlinear relationship between electricity sales and temperature is estimated using a semiparametric regression procedure that easily allows linear transformations of the data and accommodates introduction of covariates, timing adjustments due to the actual billing schedules, and serial correlation.
Abstract: A nonlinear relationship between electricity sales and temperature is estimated using a semiparametric regression procedure that easily allows linear transformations of the data. This accommodates introduction of covariates, timing adjustments due to the actual billing schedules, and serial correlation. The procedure is an extension of smoothing splines with the smoothness parameter estimated from minimization of the generalized cross-validation criterion introduced by Craven and Wahba (1979). Estimates are presented for residential sales for four electric utilities and are compared with models that represent the weather using only heating and cooling degree days or with piecewise linear splines.
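A rough sketch of the generalized cross-validation criterion used to pick the smoothness parameter, applied here to a generic linear smoother (a penalized polynomial basis standing in for a smoothing spline); the temperature and sales data are simulated placeholders:

```python
import numpy as np

# Choose a smoothing parameter by generalized cross-validation (Craven and Wahba 1979)
# for a linear smoother with hat matrix A(lambda); ridge on a polynomial basis is used
# here only as a stand-in for a smoothing spline.
rng = np.random.default_rng(1)
temp = np.sort(rng.uniform(0, 35, 200))                        # hypothetical daily temperatures
sales = 5 + 0.02*(temp - 18)**2 + rng.normal(0, 1, temp.size)  # hypothetical nonlinear response
X = np.vander((temp - temp.mean())/temp.std(), 8, increasing=True)

def gcv(lam):
    A = X @ np.linalg.solve(X.T @ X + lam*np.eye(X.shape[1]), X.T)   # hat matrix A(lambda)
    resid = sales - A @ sales
    n = sales.size
    return (resid @ resid / n) / (1 - np.trace(A)/n)**2              # GCV(lambda)

lams = np.logspace(-4, 4, 50)
best = lams[np.argmin([gcv(l) for l in lams])]
print("GCV-selected smoothing parameter:", best)
```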

954 citations



Journal ArticleDOI
TL;DR: In this article, the authors derived the risk functions and Bayes risks for a number of well-known models and compared them with those of usual estimators and predictors, and showed that some usual predictors are inadmissible relative to the asymmetric LINEX loss by providing alternative estimators.
Abstract: Estimators and predictors that are optimal relative to Varian's asymmetric LINEX loss function are derived for a number of well-known models. Their risk functions and Bayes risks are derived and compared with those of usual estimators and predictors. It is shown that some usual estimators, for example, a scalar sample mean or a scalar least squares regression coefficient estimator, are inadmissible relative to asymmetric LINEX loss by providing alternative estimators that dominate them uniformly in terms of risk.
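A small numerical sketch of the LINEX setup, assuming the standard loss form L(d) = b(exp(ad) − ad − 1) and the textbook fact that a normal posterior N(μ, σ²) gives the LINEX-optimal point estimate μ − aσ²/2; the values below are hypothetical and the code is illustrative, not from the paper:

```python
import numpy as np

# Varian's asymmetric LINEX loss, with d = estimate - theta.
def linex(delta, a=1.0, b=1.0):
    return b*(np.exp(a*delta) - a*delta - 1.0)

mu, sigma2, a = 0.0, 1.0, 1.0
delta_opt = mu - a*sigma2/2                     # shrinks below the posterior mean when a > 0

# Monte Carlo check that delta_opt beats the posterior mean under LINEX loss
theta = np.random.default_rng(2).normal(mu, np.sqrt(sigma2), 200_000)
print(linex(delta_opt - theta, a).mean(), linex(mu - theta, a).mean())
```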

839 citations


Journal ArticleDOI
TL;DR: The techniques of exploratory data analysis include a resistant rule for identifying possible outliers in univariate data that uses the lower and upper fourths, FL and FU (approximate quartiles), and defines the some-outside rate per sample as the probability that a sample will contain one or more outside observations.
Abstract: The techniques of exploratory data analysis include a resistant rule for identifying possible outliers in univariate data. Using the lower and upper fourths, FL and FU (approximate quartiles), it labels as “outside” any observations below FL − 1.5(FU − FL) or above FU + 1.5(FU − FL). For example, in the ordered sample −5, −2, 0, 1, 8, FL = −2 and FU = 1, so any observation below −6.5 or above 5.5 is outside. Thus the rule labels 8 as outside. Some related rules also use cutoffs of the form FL − k(FU − FL) and FU + k(FU − FL). This approach avoids the need to specify the number of possible outliers in advance; as long as they are not too numerous, any outliers do not affect the location of the cutoffs. To describe the performance of these rules, we define the some-outside rate per sample as the probability that a sample will contain one or more outside observations. Its complement is the all-inside rate per sample. We also define the outside rate per observation as the average fraction of outs...
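A short sketch of the outside-labeling rule, using one common convention for the fourths (Tukey-style hinges, an assumption rather than the authors' exact definition) and reproducing the worked example from the abstract:

```python
import numpy as np

# Resistant rule: label as "outside" anything beyond FL - k*(FU - FL) or FU + k*(FU - FL).
def fourths(x):
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    d = (np.floor((n + 1)/2) + 1)/2             # depth of the fourths (hinge depth)
    lo, hi = int(np.floor(d)) - 1, int(np.ceil(d)) - 1
    FL = (x[lo] + x[hi])/2
    FU = (x[n - 1 - lo] + x[n - 1 - hi])/2
    return FL, FU

def outside(x, k=1.5):
    FL, FU = fourths(x)
    spread = FU - FL
    return [v for v in x if v < FL - k*spread or v > FU + k*spread]

print(fourths([-5, -2, 0, 1, 8]))               # (-2.0, 1.0), so fences are -6.5 and 5.5
print(outside([-5, -2, 0, 1, 8]))               # [8], as in the abstract's example
```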

Journal ArticleDOI
TL;DR: In this article, a modified Bonferroni procedure with greater power than the B procedure was introduced, and the criterion continues to be modified in a stagewise manner, with the denominator of α′ reduced by 1 each time a hypothesis is rejected, so that tests can be conducted at successively higher significance levels.
Abstract: Suppose that n hypotheses H 1, H 2, …, H n with associated test statistics T 1, T 2, …, T n are to be tested by a procedure with experimentwise significance level (the probability of rejecting one or more true hypotheses) smaller than or equal to some specified value α. A commonly used procedure satisfying this condition is the Bonferroni (B) procedure, which consists of rejecting H i , for any i, iff the associated test statistic T i is significant at the level α′ = α/n. Holm (1979) introduced a modified Bonferroni procedure with greater power than the B procedure. Under Holm's sequentially rejective Bonferroni (SRB) procedure, if any hypothesis is rejected at the level α′ = α/n, the denominator of α′ for the next test is n − 1, and the criterion continues to be modified in a stagewise manner, with the denominator of α′ reduced by 1 each time a hypothesis is rejected, so that tests can be conducted at successively higher significance levels. Holm proved that the experimentwise significance level...
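A short sketch of Holm's sequentially rejective Bonferroni rule as described above: order the p-values, compare the smallest to α/n, the next to α/(n − 1), and so on, stopping at the first non-rejection; the p-values fed in at the end are hypothetical:

```python
from typing import List

def holm(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])   # indices sorted by p-value
    reject = [False]*n
    for step, i in enumerate(order):
        if p_values[i] <= alpha/(n - step):               # alpha/n, alpha/(n-1), ...
            reject[i] = True
        else:
            break                                         # retain all remaining hypotheses
    return reject

print(holm([0.012, 0.030, 0.001, 0.20]))                  # rejects the two smallest p-values
```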

Journal ArticleDOI
TL;DR: In this paper, several multiple imputation techniques for simple random samples with ignorable nonresponse on a scalar outcome variable are compared using both analytic and Monte Carlo results concerning coverages of the resulting intervals for the population mean.
Abstract: Several multiple imputation techniques are described for simple random samples with ignorable nonresponse on a scalar outcome variable. The methods are compared using both analytic and Monte Carlo results concerning coverages of the resulting intervals for the population mean. Using m = 2 imputations per missing value gives accurate coverages in common cases and is clearly superior to single imputation (m = 1) in all cases. The performances of the methods for various m can be predicted well by linear interpolation in 1/(m − 1) between the results for m = 2 and m = ∞. As a rough guide, to assure coverages of interval estimates within 2% of the nominal level when using the preferred methods, the number of imputations per missing value should increase from 2 to 3 as the nonresponse rate increases from 10% to 60%.
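A tiny sketch of the interpolation rule quoted above, predicting a method's coverage for general m from the m = 2 and m = ∞ results; the coverage numbers are made-up placeholders, not results from the paper:

```python
# Linear interpolation in 1/(m - 1), which equals 1 at m = 2 and 0 as m -> infinity.
def coverage_at_m(m, cov_m2, cov_inf):
    return cov_inf + (cov_m2 - cov_inf)/(m - 1)

cov_m2, cov_inf = 0.92, 0.95                      # hypothetical interval coverages
for m in (2, 3, 5, 10):
    print(m, round(coverage_at_m(m, cov_m2, cov_inf), 4))
```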


Journal ArticleDOI
TL;DR: This work provides simple estimates for the downward bias of the apparent error rate of logistic regression on binary data, with error rates measured by the proportion of misclassified cases.
Abstract: A regression model is fitted to an observed set of data. How accurate is the model for predicting future observations? The apparent error rate tends to underestimate the true error rate because the data have been used twice, both to fit the model and to check its accuracy. We provide simple estimates for the downward bias of the apparent error rate. The theory applies to general exponential family linear models and general measures of prediction error. Special attention is given to the case of logistic regression on binary data, with error rates measured by the proportion of misclassified cases. Several connected ideas are compared: Mallows's Cp , cross-validation, generalized cross-validation, the bootstrap, and Akaike's information criterion.
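A generic resampling sketch of the phenomenon described above, estimating the optimism of the apparent misclassification rate of a logistic regression by the bootstrap; this illustrates the idea rather than the article's closed-form bias estimates, and the data are simulated:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = (X[:, 0] + 0.5*rng.normal(size=n) > 0).astype(int)

def error_rate(model, X, y):
    return np.mean(model.predict(X) != y)

fit = LogisticRegression().fit(X, y)
apparent = error_rate(fit, X, y)                  # optimistic: the data are used twice

optimism = []
for _ in range(200):                              # bootstrap estimate of the downward bias
    idx = rng.integers(0, n, n)
    boot = LogisticRegression().fit(X[idx], y[idx])
    optimism.append(error_rate(boot, X, y) - error_rate(boot, X[idx], y[idx]))

print("apparent:", apparent, "bias-corrected:", apparent + np.mean(optimism))
```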

Journal ArticleDOI
TL;DR: In this article, a completely parametric system is presented for the decomposition of time-varying patterns of risk into additive, overlapping phases, descriptively labeled as early, constant-hazard, and late.
Abstract: The hazard function of time-related events, such as death or reoperation following heart valve replacement, often is time-varying in a structured fashion, as is the influence of risk factors associated with the events. A completely parametric system is presented for the decomposition of time-varying patterns of risk into additive, overlapping phases, descriptively labeled as early, constant-hazard, and late. Each phase is shaped by a different generic function of time constituting a family of nested equations and is scaled by a separate logit-linear or log-linear function of concomitant information. Model building uses maximum likelihood estimation. The resulting parametric equations permit hazard function, survivorship function, and probability estimates and their confidence limits to be portrayed and adjusted for concomitant information. These provide a comprehensive analysis of time-related events from which inferences may be drawn to improve, for example, the management of patients with valva...



Journal ArticleDOI
TL;DR: In this article, a class of regression families that allows the statistician to model overdispersion while carrying out generalized linear regressions is discussed, and the theory is illustrated with two examples: a logistic regression and a large two-way contingency table.
Abstract: In one-parameter exponential families such as the binomial and Poisson, the variance is a function of the mean. Double exponential families allow the introduction of a second parameter that controls variance independently of the mean. Double families are used as constituent distributions in generalized linear regressions, in which both means and variances are allowed to depend on observed covariates. The theory is applied to two examples—a logistic regression and a large two-way contingency table. In such cases the binomial model of variance is often untrustworthy. For example, because genuine random sampling was infeasible, the subjects may have been obtained in clumps so that the statistician should really be using smaller sample sizes. Clumped sampling is just one of many possible causes of overdispersion, a habitual source of concern to users of binomial and Poisson models. This article concerns a class of regression families that allow the statistician to model overdispersion while carrying ...

Journal ArticleDOI
TL;DR: In this paper, an iterative procedure is proposed to identify the outliers, to remove their effects, and to specify a tentative model for the underlying process, which is essentially based on the iterative estimation procedure of Chang and Tiao (1983) and the extended sample autocorrelation function (ESACF) model identification method of Tsay-Tiao (1984).
Abstract: Outliers are commonplace in data analysis. Time series analysis is no exception. Noting that the effect of outliers on model identification statistics could be serious, this article is concerned with the problem of time series model specification in the presence of outliers. An iterative procedure is proposed to identify the outliers, to remove their effects, and to specify a tentative model for the underlying process. The procedure is essentially based on the iterative estimation procedure of Chang and Tiao (1983) and the extended sample autocorrelation function (ESACF) model identification method of Tsay and Tiao (1984). An example is given. Properties of the proposed procedure are discussed.

Journal ArticleDOI
TL;DR: In this article, the beta-binomial distribution is extended to allow negative correlations among binary variates within an experimental unit, and regression models are proposed for both the binary variate response rate and for the pairwise correlation between binary variables.
Abstract: The beta-binomial distribution is extended to allow negative correlations among binary variates within an experimental unit. Regression models are proposed for both the binary variate response rate and for the pairwise correlation between binary variates, and corresponding likelihood estimation procedures are described. Binary regression problems are also considered, in which measurement errors in the regression variables represent the sole source of overdispersion. Some corresponding response rate estimation procedures are described and illustrated.

Journal ArticleDOI
TL;DR: In this paper, the stable unit treatment value assumption (SUTVA) for causal inference is discussed: the a priori assumption that the value of the outcome variable for each unit when exposed to treatment t will be the same no matter what mechanism is used to assign treatment t to that unit and no matter what treatments the other units receive.
Abstract: I congratulate my friend Paul Holland on his lucidly clear description of the basic perspective for causal inference referred to as Rubin's model. I have been advocating this general perspective for defining problems of causal inference since Rubin (1974), and with very little modification since Rubin (1978). The one point concerning the definition of causal effects that has continued to evolve in my thinking is the key role of the stable unit treatment value assumption (SUTVA, as labeled in Rubin 1980) for deciding which questions are formulated well enough to have causal answers. Under SUTVA, the model's representation of outcomes is adequate. More explicitly, consider the situation with N units indexed by u = 1, …, N; T treatments indexed by t = 1, …, T; and outcome variable Y, whose possible values are represented by Y_tu (t = 1, …, T; u = 1, …, N). SUTVA is simply the a priori assumption that the value of Y for unit u when exposed to treatment t will be the same no matter what mechanism is used to assign treatment t to unit u and no matter what treatments the other units receive, and this holds for all u = 1, …, N and all t = 1, …, T. SUTVA is violated when, for example, there exist unrepresented versions of treatments (Y_tu depends on which version of treatment t was received) or interference between units (Y_tu depends on whether unit u′ received treatment t or t′).
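A toy sketch of the notation just described, assuming SUTVA holds so that each unit has exactly one potential outcome per treatment; all values are invented for illustration:

```python
# Y[t][u]: potential outcome of unit u under treatment t, for T = 2 treatments and N = 3 units.
# Under SUTVA these values do not depend on the assignment mechanism or on what the
# other units receive, so unit-level causal effects are well defined.
Y = {
    0: {1: 5.0, 2: 7.0, 3: 6.0},
    1: {1: 6.5, 2: 7.5, 3: 6.0},
}
unit_effects = {u: Y[1][u] - Y[0][u] for u in Y[0]}    # unit-level causal effects
ate = sum(unit_effects.values()) / len(unit_effects)   # average causal effect
print(unit_effects, ate)
```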

Journal ArticleDOI
TL;DR: In this paper, the maximum familywise error rate (MFWER) of Fisher's least significant difference (LSD) test for testing the equality of k population means in a one-way layout was investigated.
Abstract: In this article an investigation is made of the maximum familywise error rate (MFWER) of Fisher's least significant difference (LSD) test for testing the equality of k population means in a one-way layout. An exact expression for the MFWER is derived (see Theorem 1) for all balanced models and for an unbalanced model with k = 3 populations (Type I models). A close upper bound for the MFWER is derived for all unbalanced models with four or more populations (Type II models). These expressions are used to illustrate that the MFWER may greatly exceed the nominal size α of the LSD test. In addition, a simple modification of the LSD test is proposed to control the MFWER. This modified procedure has MFWER equal to the nominal level α for Type I models and no greater than α for Type II models (Theorem 2) and is, therefore, recommended as an improvement over the LSD test. The key to the analysis is two theorems concerning the ranges of independent normal random variables, which are contained in the Append...

Journal ArticleDOI
TL;DR: In this paper, a generalized cross-validation procedure is presented for empirically assessing an appropriate amount of smoothing in penalized likelihood estimates of nonparametric regression functions within the generalized linear model framework.
Abstract: We consider the penalized likelihood method for estimating nonparametric regression functions in generalized linear models (Nelder and Wedderburn 1972) and present a generalized cross-validation procedure for empirically assessing an appropriate amount of smoothing in these estimates. Asymptotic arguments and numerical simulations are used to show that the generalized cross-validatory procedure performs well from the point of view of a weighted mean squared error criterion. The methodology adds to the battery of graphical tools for model building and checking within the generalized linear model framework. Included are two examples motivated by medical and horticultural applications.

Journal ArticleDOI
TL;DR: In this paper, residuals for generalized linear models are examined, including those defined as the signed square roots of the contributions to the Pearson goodness-of-fit statistic and those defined as the signed square roots of the contributions to the deviance.
Abstract: Generalized linear models are regression-type models for data not normally distributed, appropriately fitted by maximum likelihood rather than least squares. Typical examples are models for binomial or Poisson data, with a linear regression model for a given, ordinarily nonlinear, function of the expected values of the observations. Use of such models has become very common in recent years, and there is a clear need to study the issue of appropriate residuals to be used for diagnostic purposes. Several definitions of residuals are possible for generalized linear models. The statistical package GLIM (Baker and Nelder 1978) routinely prints out residuals (y_i − μ̂_i)/√V(μ̂_i), where V(μ) is the function relating the variance to the mean of y and μ̂_i is the maximum likelihood estimate of the ith mean as fitted to the regression model. These residuals are the signed square roots of the contributions to the Pearson goodness-of-fit statistic. Another choice of residual is the signed square root of the contribution to the devia...
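A brief sketch of the two residual definitions discussed here, computed for a Poisson model (V(μ) = μ); the responses and fitted means are hypothetical rather than taken from the article's examples:

```python
import numpy as np

y = np.array([0, 2, 1, 4, 7], dtype=float)        # hypothetical counts
mu_hat = np.array([0.8, 1.5, 2.0, 3.5, 5.0])       # hypothetical fitted means from a Poisson GLM

pearson = (y - mu_hat)/np.sqrt(mu_hat)              # signed roots of the Pearson X^2 contributions

with np.errstate(divide="ignore", invalid="ignore"):
    term = np.where(y > 0, y*np.log(y/mu_hat), 0.0)  # y*log(y/mu), with the y = 0 limit handled
deviance = np.sign(y - mu_hat)*np.sqrt(2*(term - (y - mu_hat)))  # signed roots of deviance terms

print(pearson.round(3), deviance.round(3))
```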

Journal ArticleDOI
TL;DR: This book presents an introduction to Probability Theory and Mathematical Statistics, a measure theory with a soul, which has been prepared taking both aesthetic and practical aspects into account.
Abstract: According to a remark attributed to Mark Kac, "Probability theory is a measure theory with a soul." This book, with its choice of proofs, remarks, examples, and exercises, has been prepared taking both these aesthetic and practical aspects into account.

Journal ArticleDOI
TL;DR: In this paper, a new class of simple tests is proposed for the general multivariate two-sample problem based on the (possibly weighted) proportion of all k nearest neighbor comparisons in which observations and their neighbors belong to the same sample.
Abstract: A new class of simple tests is proposed for the general multivariate two-sample problem based on the (possibly weighted) proportion of all k nearest neighbor comparisons in which observations and their neighbors belong to the same sample. Large values of the test statistics give evidence against the hypothesis H of equality of the two underlying distributions. Asymptotic null distributions are explicitly determined and shown to involve certain nearest neighbor interaction probabilities. Simple infinite-dimensional approximations are supplied. The unweighted version yields a distribution-free test that is consistent against all alternatives; optimally weighted statistics are also obtained and asymptotic efficiencies are calculated. Each of the tests considered is easily adapted to a permutation procedure that conditions on the pooled sample. Power performance for finite sample sizes is assessed in simulations.
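A minimal sketch of the unweighted statistic and its permutation version, under the assumption of Euclidean nearest neighbors in the pooled sample; the simulated data and the choice k = 3 are illustrative only:

```python
import numpy as np

# Proportion of k nearest-neighbor pairs in the pooled sample whose members come
# from the same sample; large values suggest the two distributions differ.
def knn_stat(pooled, labels, k=3):
    d = np.linalg.norm(pooled[:, None, :] - pooled[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                      # a point is not its own neighbor
    nn = np.argsort(d, axis=1)[:, :k]                # indices of the k nearest neighbors
    return np.mean(labels[nn] == labels[:, None])    # fraction of same-sample neighbor pairs

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1, size=(30, 2))
z = rng.normal(0.8, 1, size=(30, 2))                 # shifted second sample
pooled = np.vstack([x, z])
labels = np.repeat([0, 1], 30)

obs = knn_stat(pooled, labels)
perm = [knn_stat(pooled, rng.permutation(labels)) for _ in range(500)]   # permutation null
print("statistic:", obs, "permutation p-value:", np.mean(np.array(perm) >= obs))
```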

Journal ArticleDOI
TL;DR: In this paper, the authors consider the nonparametric estimation of an average growth curve and study the influence of correlation on the bandwidth minimizing mean squared error of a kernel estimator.
Abstract: The estimation of growth curves has been studied extensively in parametric situations. Here we consider the nonparametric estimation of an average growth curve. Suppose that there are observations from several experimental units, each following the regression model y(x_j) = f(x_j) + e_j (j = 1, …, n), where e_1, …, e_n are correlated zero mean errors and 0 ≤ x_1 < …
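A rough sketch of a nonparametric average-curve estimate using a simple Nadaraya-Watson smoother with a fixed bandwidth; it deliberately ignores the paper's central issue of choosing the bandwidth under correlated errors, and all data are simulated:

```python
import numpy as np

rng = np.random.default_rng(5)
x = np.linspace(0, 1, 25)                                            # common design points
units = np.sin(2*np.pi*x) + rng.normal(0, 0.3, size=(10, x.size))    # 10 units; errors simulated independently here
ybar = units.mean(axis=0)                                            # average curve over units

def kernel_fit(x0, x, y, h=0.1):
    w = np.exp(-0.5*((x - x0)/h)**2)                                 # Gaussian kernel weights
    return np.sum(w*y)/np.sum(w)

grid = np.linspace(0, 1, 101)
fhat = np.array([kernel_fit(g, x, ybar) for g in grid])              # estimated average growth curve
print(fhat[:5].round(3))
```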

Journal ArticleDOI
TL;DR: Common disclosure control policies, such as requiring released cell relative frequencies to be bounded away from both zero and one, are shown to be equivalent to disclosure rules based on specific uncertainty functions applied to particular predictive distributions.
Abstract: Statistical agencies use a variety of disclosure control policies with ad hoc justification in disseminating data. The issues involved are clarified here by showing that several of these policies are special cases of a general disclosure-limiting (DL) approach based on predictive distributions and uncertainty functions. A user's information posture regarding a target is represented by one predictive distribution before data release and another predictive distribution after data release. A user's lack of knowledge about the target at any time is measured by an uncertainty function applied either to the current predictive distribution or to the current predictive distribution and the previously held predictive distribution. Common disclosure control policies, such as requiring released cell relative frequencies to be bounded away from both zero and one, are shown to be equivalent to disclosure rules that allow data release only if specific uncertainty functions at particular predictive distribution...

Journal ArticleDOI
TL;DR: The authors examine how sensitive estimates of heterogeneity in the mortality risks of a population are to the choice of two types of function: one describing the age-specific rate of increase of mortality risks for individuals and the other describing the distribution of mortality risks across individuals.
Abstract: To develop a model to estimate the degree of unobserved heterogeneity in mortality risks in a population, it is necessary to specify two types of functions, one describing the age-specific rate of increase of mortality risks for individuals and the other describing the distribution of mortality risks across individuals. There has been considerable interest in the question of how sensitive the estimates of heterogeneity are to the choices of these functions. To explore this question, high-quality data were obtained from published Medicare mortality rates for the period 1968–1978 for analysis of total mortality among the aged. In addition, national vital statistics data for the period 1950–1977 were used to analyze adult lung cancer mortality. For these data, the estimates of structural parameters were less sensitive to reasonable choices of the heterogeneity distribution (gamma vs. inverse Gaussian) than to reasonable choices of the hazard rate function (Gompertz vs. Weibull).