
Showing papers in "The American Statistician in 1989"


Journal ArticleDOI
TL;DR: This work examines alternative boxplot constructions and their consequences, discusses related background for boxplots (such as the probability that a sample contains one or more outside observations and the average proportion of outside observations in a sample), and offers recommendations that lead to a single standard form of the boxplot.
Abstract: An increasing number of statistical software packages offer exploratory data displays and summaries. For one of these, the graphical technique known as the boxplot, a selective survey of popular software packages revealed several definitions. These alternative constructions arise from different choices in computing quartiles and the fences that determine whether an observation is “outside” and thus plotted individually. We examine these alternatives and their consequences, discuss related background for boxplots (such as the probability that a sample contains one or more outside observations and the average proportion of outside observations in a sample), and offer recommendations that lead to a single standard form of the boxplot.
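As a concrete illustration of the quantities over which the surveyed packages differ, here is a minimal Python sketch (not the article's recommended standard form; the quartile convention is simply NumPy's default, and the function name is hypothetical) of quartiles, 1.5 x IQR fences, and outside observations:

```python
import numpy as np

def boxplot_summary(x, k=1.5):
    """Quartiles, fences, and outside observations for a 1-D sample.

    Uses NumPy's default quartile interpolation; other packages use
    different quartile definitions, which is the variation the article surveys.
    """
    x = np.asarray(x, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    outside = x[(x < lower_fence) | (x > upper_fence)]
    return {"q1": q1, "median": median, "q3": q3,
            "fences": (lower_fence, upper_fence), "outside": outside}

# Example: a small sample with one clear outside observation
print(boxplot_summary([2, 3, 4, 5, 6, 7, 30]))
```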

635 citations


Journal ArticleDOI
TL;DR: In this paper, the development of the several conditions for the OLS estimator to be best linear unbiased is presented, in a historical perspective, using generalized inverses and orthogonal projectors.
Abstract: It is well known that the ordinary least squares estimator of Xβ in the general linear model E y = Xβ, cov y = σ2 V, can be the best linear unbiased estimator even if V is not a multiple of the identity matrix. This article presents, in a historical perspective, the development of the several conditions for the ordinary least squares estimator to be best linear unbiased. Various characterizations of these conditions, using generalized inverses and orthogonal projectors, along with several examples, are also given. In addition, a complete set of references is provided.
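For reference, one standard statement of the condition (a sketch of the commonly cited Zyskind/Rao-type characterization, not a reproduction of the article's historical survey):

```latex
% Model: E(y) = X\beta, cov(y) = \sigma^2 V, with
% H = X(X'X)^{-}X' the orthogonal projector onto \mathcal{C}(X),
% the column space of X. The OLS estimator of X\beta is BLUE if and only if
HV = VH
\quad\Longleftrightarrow\quad
\mathcal{C}(VX) \subseteq \mathcal{C}(X).
```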

202 citations



Journal ArticleDOI
TL;DR: In this paper, the authors provide empirical evidence that this underestimation phenomenon is extreme for certain sample size formulas based on confidence interval width; they also discuss common sample size formulas that consider statistical power, which are shown to perform well even in small-sample situations.
Abstract: One concern in the early stages of study planning and design is the minimum sample size needed to provide statistically credible results. This minimum sample size is usually determined via the use of simple formulas or, equivalently, from tables. The more popular formulas, however, involve large-sample approximations and hence may underestimate required sample sizes. This article provides empirical evidence indicating that this underestimation phenomenon is extreme for certain sample size formulas based on confidence interval width. Common sample size formulas that consider statistical power are also discussed; these are shown to perform quite well, even for small sample size situations. In this department The American Statistician publishes articles, reviews, and notes of interest to teachers of the first mathematical statistics course and of applied statistics courses. The department includes the Accent on Teaching Materials section; suitable contents for the section are described under the sec...
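As an example of the kind of large-sample formula at issue (a generic confidence-interval-width formula, not necessarily one of the specific formulas the article examines), the usual normal-theory sample size for estimating a mean to within half-width d at confidence level 1 - alpha is

```latex
n \;\approx\; \left( \frac{z_{1-\alpha/2}\,\sigma}{d} \right)^{2},
```

which relies on a large-sample, known-sigma approximation and can therefore understate the sample size actually required when n is small and sigma must be estimated.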

151 citations


Journal ArticleDOI
TL;DR: A recent article by D'Agostino, Chase, and Belanger argues that the Fisher exact test and the Yates chi-squared test with continuity correction are much too conservative in small samples; this article reexamines that claim.
Abstract: The problem of testing equality of two independent binomial proportions is reexamined. A recent article by D'Agostino, Chase, and Belanger (1988) argues that the Fisher exact test and the Yates chi-squared test with continuity correction are much too conservative in small samples. The authors proposed using a studentized version of the Pearson chi-squared test, and showed that the empirical size of this test is generally close to the nominal level in repeated product-binomial sampling. Although the article is persuasive on its own terms, two central issues are completely ignored: (a) the propriety of analyzing tests based on discrete data using fixed nominal levels of significance, and (b) the question of whether the empirical size should be computed in repeated samples that fix one or both of the margins. The latter is the key issue; Yates and others (1984) argued that both margins should be held fixed for inference, even though only one margin is fixed by the product-binomial sampling design. T...
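For readers who want to reproduce this kind of comparison on a 2 x 2 table, a minimal modern sketch (Python with scipy, which is obviously not the software of the original exchange; the studentized Pearson statistic of D'Agostino et al. is not shown):

```python
import numpy as np
from scipy.stats import fisher_exact, chi2_contingency

# A small 2 x 2 table: rows = groups, columns = success/failure
table = np.array([[7, 3],
                  [2, 8]])

# Fisher exact test (conditions on both margins)
odds_ratio, p_fisher = fisher_exact(table, alternative="two-sided")

# Pearson chi-squared with and without the Yates continuity correction
chi2_yates, p_yates, _, _ = chi2_contingency(table, correction=True)
chi2_plain, p_plain, _, _ = chi2_contingency(table, correction=False)

print(f"Fisher exact p            = {p_fisher:.4f}")
print(f"Yates-corrected chi2 p    = {p_yates:.4f}")
print(f"Uncorrected chi2 p        = {p_plain:.4f}")
```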

107 citations


Journal ArticleDOI
TL;DR: Comparing Paired Data: A Simultaneous Test for Means and Variances. The American Statistician, Vol. 43, No. 4, pp. 234-235, 1989.
Abstract: (1989). Comparing Paired Data: A Simultaneous Test for Means and Variances. The American Statistician: Vol. 43, No. 4, pp. 234-235.

102 citations


Journal ArticleDOI
TL;DR: The authors found widespread apparent desire to make relative-importance statements, but little self-conscious interest in interpretation or in looking at more than one specification of the concept of relative importance.
Abstract: How is the ambiguous concept of relative importance for independent variables handled in the scientific literature? We sampled from a population of recent papers with relative importance (or the equivalent) in their titles. We found widespread apparent desire to make relative-importance statements, but little self-conscious interest in interpretation or in looking at more than one specification of the concept. We were unhappy to find that a substantial fraction (one-fifth) of the papers used statistical significance to measure relative importance.

100 citations


Journal ArticleDOI
TL;DR: In this article, a gamma distribution with arbitrary scale parameter θ and shape parameter r < 1 can be represented as a scale mixture of exponential distributions, where θ is a Gaussian distribution.
Abstract: A gamma distribution with arbitrary scale parameter θ and shape parameter r < 1 can be represented as a scale mixture of exponential distributions
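In symbols, the representation in question has the general form below (a sketch only; the explicit mixing distribution derived in the article is not reproduced here):

```latex
\frac{x^{r-1} e^{-x/\theta}}{\Gamma(r)\,\theta^{r}}
 \;=\; \int_{0}^{\infty} \lambda e^{-\lambda x}\, dG(\lambda),
 \qquad x > 0, \; r < 1,
```

for some mixing distribution G on (0, ∞). Such a representation exists because the gamma density with r < 1 is completely monotone, so Bernstein's theorem applies.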

98 citations


Journal ArticleDOI
TL;DR: In this article, the authors compared the accuracy of the median unbiased estimator with that of the maximum likelihood estimator for a logistic regression model with two binary covariates, and showed that the former estimator is uniformly more accurate than the latter for small to moderately large sample sizes and a broad range of parameter values.
Abstract: This article compares the accuracy of the median unbiased estimator with that of the maximum likelihood estimator for a logistic regression model with two binary covariates. The former estimator is shown to be uniformly more accurate than the latter for small to moderately large sample sizes and a broad range of parameter values. In view of the recently developed efficient algorithms for generating exact distributions of sufficient statistics in binary-data problems, these results call for a serious consideration of median unbiased estimation as an alternative to maximum likelihood estimation, especially when the sample size is not large, or when the data structure is sparse.

97 citations


Journal ArticleDOI
TL;DR: An extended test of fit for normality based on Kullback-Leibler information is introduced; because this information is an extended concept of entropy, the test can be applied not only to composite hypotheses but also to simple hypotheses.
Abstract: A goodness-of-fit test (based on sample entropy) for normality was given by Vasicek. The test, however, can be applied only to composite hypotheses. In this article an extended test of fit for normality is introduced based on Kullback-Leibler information. The Kullback-Leibler information is an extended concept of entropy, so the test can be applied not only to composite hypotheses but also to simple hypotheses. Power comparisons of the proposed test with some other tests are illustrated and discussed.
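A minimal sketch of the spacings-based entropy estimate that underlies Vasicek-type tests (this shows the classical composite-hypothesis statistic in Python; it is not the Kullback-Leibler extension introduced in the article, and the function names are hypothetical):

```python
import numpy as np

def vasicek_entropy(x, m=None):
    """Vasicek's spacings estimate of entropy: the mean of
    log( n/(2m) * (x_(i+m) - x_(i-m)) ) over the order statistics,
    with indices clamped to the range [1, n]."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    if m is None:
        m = max(1, int(round(np.sqrt(n))))
    upper = x[np.minimum(np.arange(n) + m, n - 1)]
    lower = x[np.maximum(np.arange(n) - m, 0)]
    return np.mean(np.log(n / (2.0 * m) * (upper - lower)))

def normality_statistic(x, m=None):
    """Compare the entropy estimate with the maximum entropy attainable
    under normality, 0.5*log(2*pi*e*s^2); small values suggest non-normality."""
    s2 = np.var(x, ddof=1)
    return np.exp(vasicek_entropy(x, m)) / np.sqrt(2 * np.pi * np.e * s2)

rng = np.random.default_rng(0)
print(normality_statistic(rng.normal(size=100)))       # near (somewhat below) 1 under normality
print(normality_statistic(rng.exponential(size=100)))  # noticeably smaller for skewed data
```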

95 citations


Journal ArticleDOI
TL;DR: In this paper, a method for estimating regression parameters from data containing covariate measurement errors by using Stein estimates of the unobserved true covariates is proposed, which produces consistent estimates for the slope parameter in the classical linear errors-in-variables model.
Abstract: A method is proposed for estimating regression parameters from data containing covariate measurement errors by using Stein estimates of the unobserved true covariates. The method produces consistent estimates for the slope parameter in the classical linear errors-in-variables model and applies to a broad range of nonlinear regression problems, provided the measurement error is Gaussian with known variance. Simulations are used to examine the performance of the estimates in a nonlinear regression problem and to compare them with the usual naive ones obtained by ignoring error and with other estimates proposed recently in the literature.

Journal ArticleDOI
TL;DR: In this paper, the exploratory analysis of data from a repeated measures design with one repeated factor and one treatment factor is considered, and recent developments in repeated measures analysis are reviewed and incorporated into an overall strategy for the analysis of such data.
Abstract: In this article, the exploratory analysis of data from a repeated measures design with one repeated factor and one treatment factor is considered. Recent developments in repeated measures analysis are reviewed and incorporated into an overall strategy for the analysis of such data. An example is given to illustrate the techniques.

Journal ArticleDOI
TL;DR: In this paper, Taylor series expansion is used to prove the asymptotic normality of the first set of variates, under appropriate conditions, and to develop needed covariance estimates.
Abstract: If random variables in one set are defined as explicit functions of random variables in a second set, Taylor series expansion (the delta method) may be used to prove the asymptotic normality of the first set of variates, under appropriate conditions, and to develop needed covariance estimates. Similar results are obtained for a set of random variables that are defined implicitly as functions of a second set of variables. This approach is used to calculate the variance of the attributable risk from case-control data.
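For reference, the statement usually meant by the delta method is the following (a standard formulation, not the article's implicit-function extension): if √n(Xₙ − μ) converges in distribution to N(0, Σ) and g is differentiable at μ, then

```latex
\sqrt{n}\,\bigl(g(X_n) - g(\mu)\bigr)
 \;\xrightarrow{d}\;
 N\!\bigl(0,\; \nabla g(\mu)^{\top}\, \Sigma \,\nabla g(\mu)\bigr),
```

and the plug-in version of the variance on the right is the covariance estimate the approach supplies.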

Journal ArticleDOI
TL;DR: In this article, a simple derivation of the asymptotic distribution of Fisher's Z statistic for general bivariate parent distributions F is obtained using U-statistic theory.
Abstract: A simple derivation of the asymptotic distribution of Fisher's Z statistic for general bivariate parent distributions F is obtained using U-statistic theory. This method easily reveals that the asymptotic variance of Z generally depends on the correlation ρ and on certain moments of F. It also reveals the particular structure of F that makes the asymptotic variance of Z independent of ρ, and shows that there are many distributions F with this property. The bivariate normal is only one such F.
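For reference, the statistic and the familiar bivariate-normal result are (standard definitions, stated here as a sketch):

```latex
Z \;=\; \tfrac{1}{2}\,\log\!\frac{1 + r}{1 - r} \;=\; \tanh^{-1}(r),
\qquad
\operatorname{var}(Z) \;\approx\; \frac{1}{n - 3}
\quad \text{under bivariate normality,}
```

whereas for general F the asymptotic variance depends on ρ and on certain moments of F, which is the article's point.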

Journal ArticleDOI
TL;DR: This article compares several MS/PC-DOS programs with respect to the statistical methods they cover, their user interface, their ease of use, their graphics capabilities, and their computational accuracy.
Abstract: Several MS/PC-DOS programs are now available to help with statistical power analysis and sample-size choice. This article compares these programs with respect to the statistical methods they cover, their user interface, their ease of use, their graphics capabilities, and their computational accuracy.
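The kind of calculation such packages perform can be sketched with a modern open-source analogue (statsmodels in Python, which is not one of the MS/PC-DOS programs reviewed):

```python
from statsmodels.stats.power import TTestIndPower

# Sample size per group for a two-sample t test detecting a
# standardized effect size of 0.5 at alpha = 0.05 with 80% power.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                    alternative="two-sided")
print(f"n per group ~ {n_per_group:.1f}")
```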



Journal ArticleDOI
TL;DR: In this article, the simple mean of the individual regression coefficients is proposed for estimating the parameters of the mean model; its large-sample properties are shown to be similar to those of the estimated generalized least squares (EGLS) estimator.
Abstract: Random coefficient regression models have been used to analyze cross-sectional and longitudinal data in economics and growth-curve data from biological and agricultural experiments. In the literature several estimators, including the ordinary least squares and the estimated generalized least squares (EGLS), have been considered for estimating the parameters of the mean model. Based on the asymptotic properties of the EGLS estimators, test statistics have been proposed for testing linear hypotheses involving the parameters of the mean model. An alternative estimator, the simple mean of the individual regression coefficients, provides estimation and hypothesis-testing procedures that are simple to compute and teach. The large sample properties of this simple estimator are shown to be similar to that of the EGLS estimator. The performance of the proposed estimator is compared with that of the existing estimators by Monte Carlo simulation.
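A minimal sketch of the proposed simple estimator (Python with hypothetical variable names; the EGLS comparison and the test statistics are not shown):

```python
import numpy as np

def simple_mean_estimator(X_list, y_list):
    """Fit OLS separately for each individual and average the coefficients.

    X_list, y_list: per-individual design matrices and response vectors.
    Returns the mean coefficient vector and its estimated covariance,
    based on the sample covariance of the individual estimates.
    """
    betas = np.array([np.linalg.lstsq(X, y, rcond=None)[0]
                      for X, y in zip(X_list, y_list)])
    beta_bar = betas.mean(axis=0)
    cov_beta_bar = np.cov(betas, rowvar=False) / len(betas)
    return beta_bar, cov_beta_bar

# Example with simulated growth-curve-like data
rng = np.random.default_rng(1)
t = np.column_stack([np.ones(6), np.arange(6)])      # intercept and time
X_list, y_list = [], []
for _ in range(20):
    coef = rng.normal([2.0, 0.5], [0.3, 0.1])        # random coefficients
    X_list.append(t)
    y_list.append(t @ coef + rng.normal(scale=0.2, size=6))
print(simple_mean_estimator(X_list, y_list))
```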

Journal ArticleDOI
TL;DR: Two rules of thumb for the approximation of the binomial distribution by the normal distribution are given in this paper.
Abstract: (1989). Two Rules of Thumb for the Approximation of the Binomial Distribution by the Normal Distribution. The American Statistician: Vol. 43, No. 1, pp. 23-24.
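The approximation that such rules govern is the usual one with continuity correction, shown below; the two specific rules of thumb given in the article are not reproduced here, and rules of this kind are typically phrased as lower bounds on quantities such as np, n(1 - p), or np(1 - p).

```latex
P(X \le k) \;\approx\; \Phi\!\left( \frac{k + \tfrac{1}{2} - np}{\sqrt{np(1-p)}} \right),
\qquad X \sim \mathrm{Binomial}(n, p).
```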

Journal ArticleDOI
TL;DR: In this article, the authors revisited an article by D. A. Freedman on screening variables for regression models and showed that the theory does not entirely explain the results of computer simulations of his model.
Abstract: We revisit an article by D. A. Freedman on screening variables for regression models. After summarizing his asymptotic results we show that the theory does not entirely explain the results of computer simulations of his model. We demonstrate that this is due to the random correlation between simulated independent random variables. Finally, we explore some consequences of the asymptotic results. In the case of uncorrelated variables using the proposed two-stage screening procedure, it is possible to obtain significance tests for the final F statistic and t statistic that have the correct Type I error rates.
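A minimal sketch of the kind of two-stage screening simulation involved (pure-noise predictors with hypothetical settings in Python; not Freedman's exact design):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, p = 100, 50                      # observations, candidate noise predictors
X = rng.normal(size=(n, p))
y = rng.normal(size=n)              # y is independent of every column of X

# Stage 1: screen predictors by their marginal correlation with y
keep = [j for j in range(p)
        if stats.pearsonr(X[:, j], y)[1] < 0.25]

# Stage 2: refit using only the screened predictors and test overall fit
Xk = np.column_stack([np.ones(n), X[:, keep]])
beta = np.linalg.lstsq(Xk, y, rcond=None)[0]
resid = y - Xk @ beta
r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
k = len(keep)
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
p_naive = stats.f.sf(f_stat, k, n - k - 1)
print(f"screened {k} of {p} noise predictors; naive overall F p-value = {p_naive:.3f}")
# The naive p-value ignores the first-stage selection and is typically far
# too small, which is the phenomenon Freedman's simulations illustrate.
```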


Journal ArticleDOI
TL;DR: In this article, Theil's estimator of slope and two intercept estimators are recommended for inclusion in nonparametrics courses as robust, efficient, and easy-to-calculate alternatives to least squares.
Abstract: Various estimators of slope, intercept, and mean response in the simple linear regression problem are compared in terms of unbiasedness, efficiency, breakdown, and mean squared error. Theil's estimator of slope and two intercept estimators based on Theil's estimator are recommended for inclusion in nonparametrics courses as robust, efficient, and easy-to-calculate alternatives to least squares.
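A minimal sketch of Theil's slope estimator together with one natural companion intercept estimator (the article compares two intercept estimators; only the median-residual version is shown here):

```python
import numpy as np
from itertools import combinations

def theil_slope(x, y):
    """Median of the pairwise slopes (y_j - y_i)/(x_j - x_i) over pairs with x_i != x_j."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    return np.median(slopes)

def theil_fit(x, y):
    """Slope by Theil's estimator; intercept as median(y - slope * x)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = theil_slope(x, y)
    b0 = np.median(y - b1 * x)
    return b0, b1

# Robust to the outlying last point, unlike least squares
x = np.arange(10.0)
y = 1.0 + 2.0 * x
y[-1] += 40.0
print(theil_fit(x, y))   # intercept and slope stay close to (1, 2)
```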

Journal ArticleDOI
TL;DR: In this paper, a general method is proposed by which nonnormally distributed data can be transformed to achieve approximate normality using an empirical nonlinear data-fitting approach and can be applied to a broad class of transformations including the Box-Cox, arcsine, generalized logit, and Weibull-type transformations.
Abstract: A general method is proposed by which nonnormally distributed data can be transformed to achieve approximate normality. The method uses an empirical nonlinear data-fitting approach and can be applied to a broad class of transformations including the Box-Cox, arcsine, generalized logit, and Weibull-type transformations. It is easy to implement using standard statistical software packages. Several examples are provided.
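The Box-Cox member of the family covered by the method can be fitted with standard software; a minimal sketch (scipy's maximum-likelihood Box-Cox fit, which is not the article's nonlinear data-fitting approach):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.lognormal(mean=0.0, sigma=0.8, size=200)   # positive, right-skewed data

# Box-Cox transform with lambda chosen by maximum likelihood
x_transformed, lam = stats.boxcox(x)
print(f"estimated lambda = {lam:.2f}")             # near 0, i.e. close to a log transform

# Check the improvement in normality with a Shapiro-Wilk test
print("before:", stats.shapiro(x).pvalue)
print("after: ", stats.shapiro(x_transformed).pvalue)
```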

Journal ArticleDOI
TL;DR: In this article, the authors present some of these approaches on a continuum so as to provide a basis for a meaningful comparison, and the issue of grouping data is also discussed, and a data-splitting algorithm is recommended for separating groups.
Abstract: The process of determining the adequacy of the fitted model is referred to as testing for lack of fit. When replicate measurements are not available, there are several approaches to testing for lack of fit. This article presents some of these approaches on a continuum so as to provide a basis for a meaningful comparison. The issue of grouping data is also discussed. A usual approach of forming groups by arbitrary cutoffs in the space of predictor variables is questioned, and a data-splitting algorithm is recommended for separating groups.

Journal ArticleDOI
TL;DR: Procedures for high-interaction two-variable color mapping are described, whereby the user interacts with a graphic display to produce a color statistical map in a few seconds.
Abstract: Procedures for high-interaction two-variable color mapping are described, whereby the user interacts with a graphic display to produce a color statistical map in a few seconds. The approach emphasizes the methodological benefits derived from the ability to examine the nature of the linkage between the statistical and spatial distributions of bivariate data. A series of examples illustrates this method.

Journal ArticleDOI
TL;DR: The authors show that the usual regression-to-the-mean statement (individuals that initially had extreme values regress back toward the mean on subsequent observations) and its corollary are generally false for models exhibiting a regression effect.
Abstract: Most social science descriptions of the statistical regression effect envision the effect as occurring toward the population mean. If individuals that initially had extreme values regress back toward the mean on subsequent observations, then as a corollary individuals who were at the population mean initially are expected to stay at the population mean. Both the regression to the mean statement and its corollary are generally false for models exhibiting a regression effect. For commonly used probability mixture models, conditional expectations of subsequent observations based on previous observations regress not to the mean, but to some other value. Examples are presented using mixtures of normal, Poisson, and binomial random variables. Commentaries are informative essays dealing with viewpoints of statistical practice, statistical education, and other topics considered to be of general interest to the broad readership of The American Statistician. Commentaries are similar in spirit to Letters to...
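A minimal simulation sketch of the phenomenon for a two-component normal mixture (hypothetical parameter values; the article also treats Poisson and binomial mixtures):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 400_000

# Latent "true" values from an asymmetric two-component normal mixture,
# observed twice with independent measurement error.
component = rng.random(n) < 0.75
true = np.where(component, rng.normal(0.0, 1.0, n), rng.normal(4.0, 1.0, n))
x1 = true + rng.normal(0.0, 1.0, n)
x2 = true + rng.normal(0.0, 1.0, n)

mu = x1.mean()                                  # population mean, about 1.0
at_mean = np.abs(x1 - mu) < 0.1                 # first observation near the mean
print("population mean      :", round(mu, 2))
print("E[x2 | x1 near mean] :", round(x2[at_mean].mean(), 2))
# Individuals observed at the population mean are NOT expected to stay there:
# the conditional expectation is pulled toward the dominant component, which
# is the point the commentary makes against the usual corollary.
```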

Journal ArticleDOI
TL;DR: In this article, a small group of statisticians participated in a workshop at which discussion focused on three major issues: statistics in the liberal arts, the teaching of statistics, and the role of a statistician at a liberal arts college.
Abstract: Statisticians and others who teach statistics at liberal arts colleges enjoy opportunities and encounter difficulties that are unique to the liberal arts setting. In July 1987 a small group of statisticians participated in a workshop at which discussion focused on three major issues: statistics in the liberal arts, the teaching of statistics, and the role of a statistician at a liberal arts college. By summarizing our discussion in this report we hope to provide support for statisticians at liberal arts colleges and to initiate discussion directed toward giving statistics education a prominent position in the liberal arts curriculum.

Journal ArticleDOI
TL;DR: In this paper, a real data set of considerable intrinsic interest and offering considerable scope for investigation with different techniques of exploratory data analysis is examined using principal components analysis and the biplot.
Abstract: A real data set of considerable intrinsic interest and offering considerable scope for investigation with different techniques of exploratory data analysis is examined using principal components analysis and the biplot. The analysis has an intuitively satisfying interpretation and illustrates well applications of the techniques. Plausible interpretations for the first and second principal components are suggested. A number of interesting aspects of the biplots are noted.
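A minimal sketch of the two techniques on a generic data matrix (hypothetical simulated data; the article's data set is not reproduced):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 4)) @ rng.normal(size=(4, 4))   # generic correlated data
labels = [f"var{j + 1}" for j in range(X.shape[1])]

# Principal components via the SVD of the column-centered data matrix
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * s                     # observation coordinates (principal components)
loadings = Vt.T                    # variable directions

# A basic biplot: observations as points, variables as arrows
plt.scatter(scores[:, 0], scores[:, 1], s=10)
scale = s[:2].max()
for j, name in enumerate(labels):
    plt.arrow(0, 0, scale * loadings[j, 0], scale * loadings[j, 1], color="red")
    plt.annotate(name, (scale * loadings[j, 0], scale * loadings[j, 1]))
plt.xlabel("PC1"); plt.ylabel("PC2")
plt.show()
```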

Journal ArticleDOI
TL;DR: In this paper, a simple recursive algorithm is provided for evaluating the noncentral chi-squared distribution function, which basically reduces the problem to that of evaluating a single central chi-square distribution function.
Abstract: A simple recursive algorithm is provided for evaluating the noncentral chi-squared distribution function. This algorithm basically reduces the problem to that of evaluating a single central chi-squared distribution function. The algorithm is summarized in a step-by-step form, and a good algorithm for evaluating the central chi-squared distribution function is also given in a step-by-step form.
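A sketch in the same spirit (the usual Poisson-mixture series with recursively updated weights; not necessarily the exact recursion given in the article):

```python
import numpy as np
from scipy.stats import chi2, ncx2

def noncentral_chi2_cdf(x, df, nc, tol=1e-12, max_terms=10_000):
    """P(X <= x) for X ~ noncentral chi-squared(df, nc), via the series
    sum_j Poisson(j; nc/2) * CentralChi2CDF(x; df + 2j),
    with the Poisson weight updated recursively from one term to the next."""
    weight = np.exp(-nc / 2.0)      # Poisson weight at j = 0
    total, j = 0.0, 0
    while j < max_terms:
        term = weight * chi2.cdf(x, df + 2 * j)
        total += term
        if weight < tol and term < tol:
            break
        j += 1
        weight *= (nc / 2.0) / j    # Poisson(j) from Poisson(j - 1)
    return total

print(noncentral_chi2_cdf(10.0, df=4, nc=3.0))
print(ncx2.cdf(10.0, 4, 3.0))       # scipy reference value for comparison
```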

Journal ArticleDOI
TL;DR: A look into the future of statistics-aided manufacturing is given in this article.
Abstract: (1989). Statistics-Aided Manufacturing: A Look into the Future. The American Statistician: Vol. 43, No. 2, pp. 74-79.