
Showing papers in "Journal of the American Statistical Association in 1982"


Journal ArticleDOI
TL;DR: The book aims to instill in students an ability to think through biological research problems in such a way as to grasp the essentials of the experimental or analytical setup, to know which types of statistical tests to apply in a given case, and to carry out the computations required.
Abstract: This text develops the science of biometry from an elementary introduction up to the advanced methods necessary for biological research and for an understanding of the published literature. This text is aimed primarily at the academic biologist, including general zoologists, botanists, microbiologists, geneticists, and physiologists in universities, research institutes, and museums. This book, while furnishing ample directions for the analysis of experimental work, also stresses the descriptive and analytical statistical study of biological phenomena. It is intended both as a text to accompany a lecture course and as a complete course for self-study. The book aims to instill in students an ability to think through biological research problems in such a way as to grasp the essentials of the experimental or analytical setup, to know which types of statistical tests to apply in a given case, and to carry out the computations required. Chapters cover biological data, data handling, descriptive statistics, probability, estimation and hypothesis testing, analysis of variance, linear regression, correlation, multiple and curvilinear regression, analysis of frequencies, and miscellaneous methods.

4,145 citations


Journal ArticleDOI
TL;DR: In this paper, a text of applicable mathematics is presented, suitable for students of engineering and science at the third-year undergraduate level or beyond; it avoids the approach of listing only the techniques, followed by a few examples, without explaining why the techniques work.
Abstract: This is a mathematical text suitable for students of engineering and science who are at the third year undergraduate level or beyond. It is a book of applicable mathematics. It avoids the approach of listing only the techniques, followed by a few examples, without explaining why the techniques work. Thus, it provides not only the know-how but also the know-why. Equally, the text has not been written as a book of pure mathematics with a list of theorems followed by their proofs. The authors' aim is to help students develop an understanding of mathematics and its applications. They have refrained from using clichés like “it is obvious” and “it can be shown”, which may be true only to a mature mathematician. On the whole, the authors have been generous in writing down all the steps in solving the example problems. The book comprises ten chapters. Each chapter contains several solved problems clarifying the introduced concepts. Some of the examples are taken from the recent literature and serve to illustrate the applications in various fields of engineering and science. At the end of each chapter, there are assignment problems with two levels of difficulty. A list of references is provided at the end of the book. This book is the product of a close collaboration between two mathematicians and an engineer. The engineer has been helpful in pinpointing the problems which engineering students encounter in books written by mathematicians.

2,846 citations


Journal ArticleDOI
TL;DR: In this paper, the authors define measures of linear dependence and feedback for multiple time series, and a readily usable theory of inference for all of these measures and their decompositions is described; the computations involved are modest.
Abstract: Measures of linear dependence and feedback for multiple time series are defined. The measure of linear dependence is the sum of the measure of linear feedback from the first series to the second, linear feedback from the second to the first, and instantaneous linear feedback. The measures are nonnegative, and zero only when feedback (causality) of the relevant type is absent. The measures of linear feedback from one series to another can be additively decomposed by frequency. A readily usable theory of inference for all of these measures and their decompositions is described; the computations involved are modest.

1,874 citations
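
The one-directional measure can be illustrated in the bivariate case. Below is a minimal sketch, assuming autoregressions fitted by ordinary least squares at a fixed illustrative order p: the feedback from y to x is the log ratio of the innovation variance of x given its own past to that given the past of both series, which is nonnegative and zero when the past of y adds nothing. The function names, the order choice, and the simulated example are not from the paper, and the sketch omits the instantaneous measure and the frequency decomposition.

```python
# Sketch: Geweke-style linear feedback for a bivariate series via OLS
# autoregressions of fixed order p.  Illustrative only, not the paper's
# full multivariate treatment.
import numpy as np

def lagged_design(data, p):
    """Stack lags 1..p of each column of `data` (values at t-1, ..., t-p)."""
    n = data.shape[0]
    return np.hstack([data[p - k:n - k] for k in range(1, p + 1)])

def residual_variance(y, X):
    """Variance of OLS residuals of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return (y - X1 @ beta).var()

def linear_feedback(x, y, p=4):
    """F_{y->x} = ln(restricted variance / unrestricted variance) >= 0."""
    target = x[p:]
    v_own = residual_variance(target, lagged_design(x[:, None], p))
    v_both = residual_variance(target, lagged_design(np.column_stack([x, y]), p))
    return np.log(v_own / v_both)

rng = np.random.default_rng(0)
n = 5000
y = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):            # x is driven by lagged y, not vice versa
    x[t] = 0.5 * x[t - 1] + 0.4 * y[t - 1] + rng.standard_normal()

print(linear_feedback(x, y))     # clearly positive: feedback from y to x
print(linear_feedback(y, x))     # near zero: no feedback from x to y
```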



Journal ArticleDOI
TL;DR: In this paper, the authors present a method of measuring the direct influence along each separate path in a system and thus of finding the degree to which variation of a given effect is determined by each particular cause.
Abstract: The ideal method of science is the study of the direct influence of one condition on another in experiments in which all other possible causes of variation are eliminated. Unfortunately, causes of variation often seem to be beyond control. In the biological sciences, especially, one often has to deal with a group of characteristics or conditions which are correlated because of a complex of interacting, uncontrollable, and often obscure causes. The degree of correlation between two variables can be calculated by well-known methods, but when it is found it gives merely the resultant of all connecting paths of influence. The present paper is an attempt to present a method of measuring the direct influence along each separate path in such a system and thus of finding the degree to which variation of a given effect is determined by each particular cause. The method depends on the combination of knowledge of the degrees of correlation among the variables in a system with such knowledge as may be possessed of the causal relations. In cases in which the causal relations are uncertain the method can be used to find the logical consequences of any particular hypothesis in regard to them.

1,168 citations
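
The core computation can be shown in a few lines. Below is a minimal sketch with invented coefficients: for standardized variables, the direct path coefficients solve the linear system formed by the correlations among the causes and their correlations with the effect, and the correlation of a cause with the effect then decomposes into its direct path plus paths through correlated causes.

```python
# Sketch: path coefficients for two correlated causes of a standardized
# effect.  Coefficients and sample size are invented.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
z = rng.standard_normal((n, 3))
x1 = z[:, 0]
x2 = 0.6 * z[:, 0] + 0.8 * z[:, 1]          # corr(x1, x2) = 0.6
y = 0.5 * x1 + 0.3 * x2 + z[:, 2]
y = y / y.std()                              # work with a standardized effect

R_xx = np.corrcoef(np.column_stack([x1, x2]), rowvar=False)
r_xy = np.array([np.corrcoef(v, y)[0, 1] for v in (x1, x2)])

p = np.linalg.solve(R_xx, r_xy)              # direct path coefficients
print(p)                                     # (0.5, 0.3) scaled by 1/sd of raw y

# The total correlation splits into direct + indirect (via x2) paths:
assert np.isclose(r_xy[0], p[0] + R_xx[0, 1] * p[1])
```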


Journal ArticleDOI
TL;DR: In this article, an intrinsic diversity ordering of communities is defined and is shown to be equivalent to stochastic ordering, and the sensitivity of an index to rare species is developed, culminating in a crossing-point theorem and a response theory to perturbations.
Abstract: This paper puts forth the view that diversity is an average property of a community and identifies that property as species rarity. An intrinsic diversity ordering of communities is defined and is shown to be equivalent to stochastic ordering. Also, the sensitivity of an index to rare species is developed, culminating in a crossing-point theorem and a response theory to perturbations. Diversity decompositions, analogous to the analysis of variance, are discussed for two-way classifications and mixtures. The paper concludes with a brief survey of genetic diversity, linguistic diversity, industrial concentration, and income inequality.

681 citations
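
The "average rarity" view translates directly into code. Below is a minimal sketch with invented abundance vectors: an index is the community average of a rarity function of species abundance, and the Shannon and Gini-Simpson indices correspond to particular rarity choices; the more even community scores higher under both.

```python
# Sketch: diversity as the community average of a rarity function.
import numpy as np

def diversity(p, rarity):
    """Average rarity: sum_i p_i * R(p_i) for abundance vector p."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return float(np.sum(p * rarity(p)))

shannon = lambda q: -np.log(q)             # rarity R(p) = -ln p
simpson = lambda q: 1.0 - q                # rarity R(p) = 1 - p

even = [0.25, 0.25, 0.25, 0.25]
skewed = [0.85, 0.05, 0.05, 0.05]
print(diversity(even, shannon), diversity(skewed, shannon))
print(diversity(even, simpson), diversity(skewed, simpson))
```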


Journal ArticleDOI
TL;DR: In this article, the authors prove a theorem to the effect that a coherent Bayesian expects to be well calibrated (a forecaster is well calibrated if, for example, of those events to which he assigns a probability of 30 percent, the long-run proportion that actually occurs turns out to be 30 percent), and consider its destructive implications for the theory of coherence.
Abstract: Suppose that a forecaster sequentially assigns probabilities to events. He is well calibrated if, for example, of those events to which he assigns a probability 30 percent, the long-run proportion that actually occurs turns out to be 30 percent. We prove a theorem to the effect that a coherent Bayesian expects to be well calibrated, and consider its destructive implications for the theory of coherence.

588 citations
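
The calibration criterion is easy to check empirically. Below is a minimal sketch on simulated forecasts: group the events by the probability assigned, then compare that probability with the proportion that actually occurred. The probability levels and sample size are invented.

```python
# Sketch: an empirical calibration check on simulated forecasts.
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
forecasts = rng.choice([0.1, 0.3, 0.5, 0.7, 0.9], size=n)
outcomes = rng.random(n) < forecasts       # a perfectly calibrated world

for p in np.unique(forecasts):
    hit = outcomes[forecasts == p].mean()  # long-run proportion occurring
    print(f"assigned {p:.1f}  observed {hit:.3f}")
```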



Journal ArticleDOI
TL;DR: In this paper, the authors consider the construction of sample designs and estimators under a linear regression superpopulation model, and use the anticipated variance, the variance of the predictor computed with respect to the sampling design and the superpopulation model, as a criterion for evaluating probability designs and model-unbiased predictors.
Abstract: The construction of sample designs and estimators under a linear regression superpopulation model is considered. The anticipated variance, the variance of the predictor computed with respect to the sampling design and the superpopulation model, is used as a criterion for evaluating probability designs and model-unbiased predictors. Regression predictors that are model unbiased and design consistent are constructed.

453 citations


Journal ArticleDOI
TL;DR: In this paper, a model-based procedure to decompose a time series uniquely into mutually independent additive seasonal, trend, and irregular noise components is proposed, where the series is assumed to follow the Gaussian ARIMA model.
Abstract: This article proposes a model-based procedure to decompose a time series uniquely into mutually independent additive seasonal, trend, and irregular noise components. The series is assumed to follow the Gaussian ARIMA model. Properties of the procedure are discussed and an actual example is given.

434 citations
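
For readers unfamiliar with the target of such a procedure, the sketch below produces an additive trend + seasonal + irregular split of a monthly series. It uses a plain centered moving average purely to illustrate the form of the output; it is not the article's ARIMA-model-based decomposition, and the even period and window details are illustrative assumptions.

```python
# Sketch: the form of an additive decomposition, series = trend +
# seasonal + irregular, computed with a crude centered moving average
# rather than the article's ARIMA-model-based components.
import numpy as np

def additive_decompose(x, period=12):
    """Crude additive split for an even `period`; trend ends are NaN."""
    n, k = len(x), period // 2
    w = np.ones(period + 1)
    w[0] = w[-1] = 0.5
    w /= period                                  # 2 x 12 moving average
    trend = np.full(n, np.nan)
    trend[k:n - k] = np.convolve(x, w, mode="valid")
    detrended = x - trend
    seasonal = np.array([np.nanmean(detrended[i::period])
                         for i in range(period)])
    seasonal -= seasonal.mean()                  # effects sum to zero
    seasonal = np.resize(seasonal, n)            # tile over the series
    return trend, seasonal, x - trend - seasonal

rng = np.random.default_rng(3)
t = np.arange(120)
x = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 12) + 0.3 * rng.standard_normal(120)
trend, seasonal, irregular = additive_decompose(x)
print(np.round(seasonal[:12], 2))                # recovered monthly pattern
```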



Journal ArticleDOI
TL;DR: This article answers a recent theoretical paper by Bickel and Doksum (1981), which seems to suggest that serious dangers are associated with the Box and Cox transformation method and speaks of 'instability' and 'cost' of estimating the transformation; the claimed difficulties are traced to examples common sense would rule out, to conclusions stated on an arbitrary scale, and to failure to account for the Jacobian of the transformation.
Abstract: In a paper written in 1964, Box and Cox described a method for estimating transformations and showed how in suitable cases valuable increases in simplicity and efficiency were possible. Since that time, this technique has enjoyed wide practical use and considerable success. However, a recent theoretical paper by Bickel and Doksum (1981) seems to suggest that serious dangers are associated with the employment of this method, and speaks of 'instability' and 'cost' of estimation of the transformation. These difficulties seem to be associated with (1) examples which common sense would rule out, namely situations where the effect of transformation on the data is almost linear, so that it is a matter of indifference which transformation is used; (2) the idea that it makes sense to state conclusions in terms of a number measured on an arbitrary scale; (3) failure to take proper account of the Jacobian of the transformation.
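
Point (3) can be made concrete. Below is a minimal sketch of the Box-Cox profile log-likelihood for the transformation parameter λ in the simplest mean-plus-error model, with the Jacobian contribution (λ − 1) Σ log y included; the lognormal test data and the grid are invented.

```python
# Sketch: Box-Cox profile log-likelihood over lambda, with the Jacobian
# term (lambda - 1) * sum(log y) included.  For lognormal data the
# maximizing lambda should sit near 0 (the log transformation).
import numpy as np

def boxcox(y, lam):
    return np.log(y) if abs(lam) < 1e-8 else (y**lam - 1.0) / lam

def profile_loglik(y, lam):
    z = boxcox(y, lam)
    n = len(y)
    # Mean-only normal model: sigma^2 is profiled out as z.var().
    return -0.5 * n * np.log(z.var()) + (lam - 1.0) * np.log(y).sum()

rng = np.random.default_rng(8)
y = np.exp(1.0 + 0.3 * rng.standard_normal(200))    # lognormal sample
grid = np.arange(-10, 11) / 10.0
ll = np.array([profile_loglik(y, lam) for lam in grid])
print(grid[ll.argmax()])                            # typically near 0
```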

Journal ArticleDOI
TL;DR: In this article, an estimator is proposed that limits the influence of any small subset of the data and satisfies a first-order condition for strong efficiency subject to that constraint; the estimator is shown to be asymptotically normal.
Abstract: The least squares estimator for β in the classical linear regression model is strongly efficient under certain conditions. However, in the presence of heavy-tailed errors and/or anomalous data, the least squares efficiency can be markedly reduced. In this article we propose an estimator that limits the influence of any small subset of the data and show that it satisfies a first-order condition for strong efficiency subject to the constraint. We then show that the estimator is asymptotically normal. The article concludes with an outline of an algorithm for computing a bounded-influence regression estimator and with an example comparing least squares, robust regression as developed by Huber, and the estimator proposed in this article.

Journal ArticleDOI
TL;DR: In this paper, the authors discuss the mathematical properties of Jeffrey's rule, connecting it with sufficient partitions and with maximum entropy updating of contingency tables; the main results concern simultaneous revision on two partitions.
Abstract: Jeffrey's rule for revising a probability P to a new probability P* based on new probabilities P*(E_i) on a partition {E_i}, i = 1, ..., n, is P*(A) = Σ_{i=1}^{n} P(A | E_i) P*(E_i). Jeffrey's rule is applicable if it is judged that P*(A | E_i) = P(A | E_i) for all A and i. This article discusses some of the mathematical properties of this rule, connecting it with sufficient partitions and maximum entropy updating of contingency tables. The main results concern simultaneous revision on two partitions.
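
Below is a minimal sketch of the rule on a finite outcome space, with an invented three-outcome example: within each cell of the partition the conditional probabilities are kept, and each cell's total mass is reset to the new P*(E_i).

```python
# Sketch: Jeffrey's rule on a three-outcome space (invented numbers).
import numpy as np

P = np.array([0.2, 0.3, 0.5])                  # prior over outcomes {0,1,2}
partition = [np.array([True, True, False]),    # E_1 = {0, 1}
             np.array([False, False, True])]   # E_2 = {2}
new_PE = np.array([0.7, 0.3])                  # revised P*(E_i)

def jeffrey_update(P, partition, new_PE):
    """Keep conditional probabilities within each cell; reset cell mass."""
    P_star = np.zeros_like(P)
    for E, q in zip(partition, new_PE):
        P_star[E] = P[E] / P[E].sum() * q
    return P_star

P_star = jeffrey_update(P, partition, new_PE)
A = np.array([True, False, True])              # event A = {0, 2}
print(P_star[A].sum())                         # = sum_i P(A|E_i) P*(E_i)
```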


Journal ArticleDOI
TL;DR: Key concepts from the literature on incomplete data, such as factorizations of the likelihood for special data patterns, the EM algorithm for general data patterns, and ignorability of the response mechanism, are discussed within the survey context.
Abstract: The literature on the analysis of incomplete data using models is reviewed in the context of nonresponse in sample surveys. The modeling approach provides a large body of methods for handling unit and item nonresponse, some of which cannot be derived from the randomization theory of inference for surveys. Key concepts from the literature on incomplete data, such as factorizations of the likelihood for special data patterns, the EM algorithm for general data patterns, and ignorability of the response mechanism, are discussed within the survey context. Model-based procedures are related to common methods for handling nonresponse in surveys, such as weighting or imputation of means within subclasses of the population.

Journal ArticleDOI
TL;DR: In this article, the authors look at the effect of intracluster correlation on standard procedures in linear regression and show that the size of the effect tends to be smaller than the corresponding effect on the variance of an estimated mean in two-stage sampling.
Abstract: We look at the effect of intracluster correlation on standard procedures in linear regression. The ordinary least squares estimator, β̂, of the coefficient vector performs well in most cases, but the usual estimator of cov(β̂), and procedures based on it such as confidence intervals and hypothesis tests, can be seriously misleading. The size of the effect, however, tends to be smaller than the corresponding effect on the variance of an estimated mean in two-stage sampling, provided that the cluster sample sizes are approximately equal.
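
A small simulation makes the point. In the sketch below, both the regressor and the errors are given an intracluster-correlated component; across replications the OLS slope stays close to the truth, but its true sampling variance exceeds the average nominal OLS variance by roughly the factor 1 + (m − 1)·ρx·ρe. All sizes and correlation values are invented.

```python
# Sketch: OLS under intracluster correlation in both the regressor and
# the errors.  The slope estimate is fine; its nominal variance is not.
import numpy as np

rng = np.random.default_rng(7)
clusters, m = 200, 10
rho_x, rho_e = 0.5, 0.3
slopes, naive_vars = [], []

for _ in range(1000):
    cx = np.repeat(rng.standard_normal(clusters), m)       # cluster effects
    x = np.sqrt(rho_x) * cx + np.sqrt(1 - rho_x) * rng.standard_normal(clusters * m)
    ce = np.repeat(rng.standard_normal(clusters), m)
    e = np.sqrt(rho_e) * ce + np.sqrt(1 - rho_e) * rng.standard_normal(clusters * m)
    y = 1.0 + 2.0 * x + e
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - 2)
    slopes.append(beta[1])
    naive_vars.append(s2 * np.linalg.inv(X.T @ X)[1, 1])

print(np.var(slopes))        # true sampling variance of the slope
print(np.mean(naive_vars))   # nominal variance: understated by roughly
                             # the factor 1 + (m - 1) * rho_x * rho_e
```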

Journal ArticleDOI
TL;DR: In this paper, the rank version of the von Neumann ratio statistic is considered and its critical values under the randomization hypothesis are obtained; the resulting nonparametric test for randomness has far greater power than the test based on the number of runs up and down.
Abstract: Although rank tests for randomness were proposed in the literature as early as 1943, no such test has gained wide acceptance comparable to, say, Spearman's rho test. This may be due to the lack of small-sample theory and of tables of critical values to enable such a test to be carried out on small samples. In this article, we consider the rank version of the von Neumann ratio statistic and we obtain the critical values of this statistic under the randomization hypothesis. In a Monte Carlo experiment we then show that the resulting nonparametric test for randomness has far greater power than the test based on the number of runs up and down. Moreover, under normality, its power vis-à-vis the normal theory von Neumann ratio test is also very good. It is therefore suggested that with the tables presented in this article, the rank von Neumann ratio test for randomness provides an easy and powerful alternative to nonparametric tests now in common use.
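
The statistic itself is simple to compute. Below is a minimal sketch, assuming no ties: replace the series by its ranks and form the ratio of the sum of squared successive rank differences to the sum of squared rank deviations. Under randomness the ratio is near 2, while trend or positive dependence pushes it down; the critical values come from the article's tables, which are not reproduced here.

```python
# Sketch: the rank von Neumann ratio (no ties assumed).
import numpy as np

def rank_von_neumann(x):
    r = np.argsort(np.argsort(x)) + 1.0       # ranks 1..n
    num = np.sum(np.diff(r) ** 2)             # squared successive differences
    den = np.sum((r - r.mean()) ** 2)         # = n(n^2 - 1)/12 without ties
    return num / den

rng = np.random.default_rng(5)
iid = rng.standard_normal(50)
trended = iid + 0.2 * np.arange(50)           # dependence-like pattern
print(rank_von_neumann(iid), rank_von_neumann(trended))
```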

Journal ArticleDOI
TL;DR: In this paper, a method for imputing missing values when the probability of response depends upon the variable being imputed is developed, where the missing data problem is viewed as one of parameter estimation in a regression model with stochastic censoring of the dependent variable.
Abstract: A method is developed for imputing missing values when the probability of response depends upon the variable being imputed. The missing data problem is viewed as one of parameter estimation in a regression model with stochastic censoring of the dependent variable. The prediction approach to imputation is used to solve this estimation problem. Wages and salaries are imputed to nonrespondents in the Current Population Survey and the results are compared to the nonrespondents' IRS wage and salary data. The stochastic censoring approach gives improved results relative to a prediction approach that ignores the response mechanism.

Journal ArticleDOI
TL;DR: In this paper, the asymptotic distribution theory of sequentially computed modified-Wilcoxon scores is developed for two-sample survival data with random staggered entry and random loss to follow-up.
Abstract: The asymptotic distribution theory of sequentially computed modified-Wilcoxon scores is developed for two-sample survival data with random staggered entry and random loss to follow-up. The asymptotic covariance indicates generally dependent modified-Wilcoxon increments, contradicting (the authors' reading of) Jones and Whitehead (1979). A repeated significance testing procedure is presented for testing the equality of two survival distributions based on the asymptotic theory. The early stopping properties of this procedure are illustrated by a prostate cancer example.

Journal ArticleDOI
Glenn Shafer
TL;DR: This paper examines Lindley's paradox, the conflict in which a significance test can reject a sharp null hypothesis while the Bayesian posterior probability of that hypothesis, computed with a diffuse prior, remains high.
Abstract: (1982). Lindley's Paradox. Journal of the American Statistical Association: Vol. 77, No. 378, pp. 325-334.

Journal ArticleDOI
TL;DR: In this paper, the asymptotic joint distribution of a class of sequentially computed statistics used in survival analysis is derived for the case in which the individuals under study enter serially and are subject to random loss to follow-up.
Abstract: The asymptotic joint distribution of a class of sequentially computed statistics used in survival analysis is derived when the individuals under study enter serially and are subject to random loss to follow-up. These results are used for constructing repeated significance tests over real time.

Journal ArticleDOI
Joseph Naus
TL;DR: In this paper, the authors give a highly accurate approximation for extremal distributions of the number of points in a moving interval or window of fixed length; applications include the generalized birthday problem.
Abstract: Certain statistical applications deal with the extremal distributions of the number of points in a moving interval or window of fixed length. This article gives an approximation that is highly accurate for several of these distributions. Applications include the maximum cluster of points on a line or circle, multiple coverage by subintervals or subarcs of fixed size, the length of the longest success run in Bernoulli trials, and the generalized birthday problem.
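
The statistic whose distribution is being approximated can be computed directly. Below is a minimal sketch on invented uniform points: the scan statistic is the maximum count in any window of length w, and it suffices to check windows whose left edge sits on a point.

```python
# Sketch: the scan statistic, i.e., the maximum number of points in any
# moving window of length w; invented uniform points on (0, 1).
import bisect
import numpy as np

def scan_statistic(points, w):
    """Only windows whose left edge sits on a point need checking."""
    pts = sorted(points)
    return max(bisect.bisect_right(pts, p + w) - i
               for i, p in enumerate(pts))

rng = np.random.default_rng(4)
points = rng.uniform(0, 1, size=100)
print(scan_statistic(points, w=0.05))
```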

Journal ArticleDOI
TL;DR: In this paper, the analysis of multiple-response, repeated-measurement or growth curve models with a multivariate random-effects covariance structure for individuals is considered, and results on estimation, hypothesis testing, and simultaneous confidence bounds for the parameters of the generalized linear model of Potthoff and Roy (1964) are derived for the multiple-response repeated-measurements model under the multivariate random-effects covariance structure.
Abstract: The analysis of multiple-response, repeated-measurement or growth curve models with a multivariate random-effects covariance structure for individuals is considered. Results on estimation, hypothesis testing, and simultaneous confidence bounds for the parameters of the generalized linear model of Potthoff and Roy (1964) are derived for the multiple-response repeated-measurements model under the multivariate random-effects covariance structure. Two numerical examples are presented to illustrate these techniques.

Journal ArticleDOI
TL;DR: Goodman recently presented a class of models for the analysis of association between two discrete, ordinal variables, measured in terms of the odds ratios in 2 × 2 subtables formed from adjacent rows and adjacent columns of the cross-classification; this article generalizes that approach to multiway cross-classifications.
Abstract: Goodman recently presented a class of models for the analysis of association between two discrete, ordinal variables. The association was measured in terms of the odds ratios in 2 × 2 subtables formed from adjacent rows and adjacent columns of the cross-classification, and models were devised that allowed the odds ratios to depend on an overall effect, on row effects, on column effects, and on other effects. This article presents some generalizations of this approach appropriate for multiway cross-classifications, including (a) models for the analysis of conditional association, (b) models for the analysis of partial association, and (c) models for the analysis of symmetric association. Three cross-classifications are analyzed with these models and methods, and rather simple interpretations of the association in each are provided.
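
The building block of these models is easy to exhibit. The sketch below computes the local odds ratios from adjacent rows and adjacent columns of an invented 3 × 3 table; the models discussed in the article structure exactly these quantities.

```python
# Sketch: local odds ratios from adjacent rows and columns of an
# invented 3x3 cross-classification.
import numpy as np

counts = np.array([[40., 25., 10.],
                   [20., 30., 20.],
                   [ 8., 22., 35.]])

theta = (counts[:-1, :-1] * counts[1:, 1:]) / (counts[:-1, 1:] * counts[1:, :-1])
print(theta)       # all entries > 1 here: a uniformly positive association
```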

Journal ArticleDOI
Camil Fuchs
TL;DR: In this article, the problem of obtaining maximum likelihood estimates (MLE) for the parameters of log-linear models when values of one or more variables are missing for subsets of the sample is addressed; the appropriate systems of equations are presented, and the expectation-maximization (EM) algorithm (Dempster, Laird, and Rubin 1977) is suggested as one of the possible methods for solving them.
Abstract: In many studies the values of one or more variables are missing for subsets of the original sample. This article focuses on the problem of obtaining maximum likelihood estimates (MLE) for the parameters of log-linear models under this type of incomplete data. The appropriate systems of equations are presented and the expectation-maximization (EM) algorithm (Dempster, Laird, and Rubin 1977) is suggested as one of the possible methods for solving them. The algorithm has certain advantages but other alternatives may be computationally more effective. Tests of fit for log-linear models in the presence of incomplete data are considered. The data from the Protective Services Project for Older Persons (Blenkner, Bloom, and Nielsen 1971; Blenkner, Bloom, and Weber 1974) are used to illustrate the procedures discussed in the article.
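
Below is a minimal sketch of the EM idea for an incomplete 2 × 2 table, using invented counts and the simplest log-linear model (independence): the E-step allocates the partially classified counts across cells using the current fitted probabilities, and the M-step refits the model to the completed table.

```python
# Sketch: EM for an incomplete 2x2 table under the independence
# log-linear model (invented counts).
import numpy as np

full = np.array([[30., 10.], [20., 40.]])   # jointly classified counts
row_only = np.array([25., 15.])             # row variable observed only
col_only = np.array([12., 28.])             # column variable observed only

p = np.full((2, 2), 0.25)                   # initial cell probabilities
for _ in range(200):
    # E-step: allocate partially classified counts using current p.
    comp = full.copy()
    comp += row_only[:, None] * p / p.sum(axis=1, keepdims=True)
    comp += col_only[None, :] * p / p.sum(axis=0, keepdims=True)
    # M-step: MLE under independence is the product of the margins.
    n = comp.sum()
    p = np.outer(comp.sum(axis=1), comp.sum(axis=0)) / n**2

print(np.round(p, 4))                       # fitted cell probabilities
```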


Journal ArticleDOI
TL;DR: In this article, the results of empirically testing 16 alternative multipliers for a multiplicative congruential random number generator with modulus 2³¹ − 1 are presented; the test results raise serious doubts about several of the multipliers, including one in common use.
Abstract: This article presents the results of empirically testing 16 alternative multipliers for a multiplicative congruential random number generator with modulus 2³¹ − 1. Two of the multipliers are in common use, six are the best of 50 candidate multipliers according to the theoretical spectral and lattice tests, and eight are the worst, with regard to 2-tuples, among the 50. The test results raise serious doubts about several of the multipliers, including one in common use. The tests were also applied to a well-known theoretically poor generator, RANDU, and gave strong empirical evidence of its inadequacy. Since comparison of the results for the first eight multipliers with those for the eight worst multipliers failed to show any apparent gross differences, one may want to relax the currently employed stringent criteria for acceptable performance on the lattice and spectral tests.
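
For context, the generator family under test is a one-liner. Below is a minimal sketch using the widely used multiplier 16807 = 7⁵; the sixteen multipliers actually tested in the article are not reproduced here.

```python
# Sketch: the multiplicative congruential family x_{k+1} = a * x_k mod M
# with M = 2^31 - 1 and an illustrative multiplier.
M = 2**31 - 1

def mcg(seed, a=16807):
    """Yield uniform(0, 1) variates from the generator."""
    x = seed
    while True:
        x = (a * x) % M
        yield x / M

g = mcg(seed=12345)
print([round(next(g), 6) for _ in range(5)])
```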

Journal ArticleDOI
TL;DR: In this article, the authors consider a linear model with normally distributed but heteroscedastic errors and show that maximum likelihood is more sensitive than generalized least squares to small misspecifications in the functional relationship between the error variances and the regression parameter.
Abstract: We consider a linear model with normally distributed but heteroscedastic errors. When the error variances are functionally related to the regression parameter, one can use either maximum likelihood or generalized least squares to estimate the regression parameter. We show that maximum likelihood is more sensitive to small misspecifications in the functional relationship between the error variances and the regression parameter.
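
The generalized least squares side of the comparison can be sketched as iteratively reweighted least squares. Below, the error standard deviation is taken proportional to the mean, so the weights are 1/μᵢ²; the variance function, data, and iteration count are invented.

```python
# Sketch: generalized (weighted) least squares by iteration when the
# error standard deviation is proportional to the mean.
import numpy as np

rng = np.random.default_rng(9)
n = 500
x = rng.uniform(1, 5, n)
X = np.column_stack([np.ones(n), x])
mu = X @ np.array([1.0, 2.0])               # true means, all positive
y = mu + 0.2 * mu * rng.standard_normal(n)  # sd proportional to mean

beta = np.linalg.lstsq(X, y, rcond=None)[0] # start from OLS
for _ in range(10):                         # iterate weighted least squares
    w = 1.0 / (X @ beta) ** 2               # estimated weights 1/mu_i^2
    sw = np.sqrt(w)
    beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]

print(beta)                                 # close to (1.0, 2.0)
```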

Journal ArticleDOI
W. J. Studden
TL;DR: In this paper, a polynomial regression problem on an interval, together with a robustness-type formulation of Stigler's, is considered; a technique involving canonical moments is discussed and some explicit solutions are given.
Abstract: We consider a polynomial regression setting on an interval, together with a robustness-type formulation of Stigler's. A technique involving canonical moments is discussed, and some explicit solutions are given.