
Showing papers in "Journal of the American Statistical Association in 1972"


Journal ArticleDOI
TL;DR: Numerical methods; matrices; graphs and tables; histograms and graphs; computer routines in Pascal; Monte Carlo techniques; dependent and independent variables; least-squares fit to a polynomial; least-squares fit to an arbitrary function; fitting composite peaks; direct application of the maximum likelihood.
Abstract: Uncertainties in measurements; probability distributions; error analysis; estimates of means and errors; Monte Carlo techniques; dependent and independent variables; least-squares fit to a polynomial; least-squares fit to an arbitrary function; fitting composite peaks; direct application of the maximum likelihood. Appendices: numerical methods; matrices; graphs and tables; histograms and graphs; computer routines in Pascal.

10,546 citations


Journal ArticleDOI
J. A. Hartigan1
TL;DR: This article presents a model, and a technique, for clustering cases and variables simultaneously; the principal advantage of this approach is the direct interpretation of the clusters on the data.
Abstract: Clustering algorithms are now in widespread use for sorting heterogeneous data into homogeneous blocks. If the data consist of a number of variables taking values over a number of cases, these algorithms may be used either to construct clusters of variables (using, say, correlation as a measure of distance between variables) or clusters of cases. This article presents a model, and a technique, for clustering cases and variables simultaneously. The principal advantage in this approach is the direct interpretation of the clusters on the data.
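The simultaneous row/column idea can be sketched with an alternating k-means-style procedure. This is an illustrative stand-in, not Hartigan's published algorithm; the function name and parameters are assumptions.

```python
# Alternating row/column k-means: cluster cases and variables at once so
# each (row-cluster, column-cluster) block is roughly homogeneous.
import numpy as np

def block_cluster(X, kr, kc, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    r = rng.integers(0, kr, X.shape[0])          # row (case) labels
    c = rng.integers(0, kc, X.shape[1])          # column (variable) labels
    for _ in range(iters):
        # mean of each (row-cluster, column-cluster) block
        mu = np.zeros((kr, kc))
        for i in range(kr):
            for j in range(kc):
                block = X[np.ix_(r == i, c == j)]
                mu[i, j] = block.mean() if block.size else 0.0
        # reassign each row and each column to its best-fitting cluster
        r = np.array([np.argmin([np.sum((row - mu[i, c]) ** 2)
                                 for i in range(kr)]) for row in X])
        c = np.array([np.argmin([np.sum((col - mu[r, j]) ** 2)
                                 for j in range(kc)]) for col in X.T])
    return r, c

rng = np.random.default_rng(1)
X = np.kron(np.eye(2), np.ones((5, 5))) + 0.1 * rng.normal(size=(10, 10))
print(block_cluster(X, 2, 2))   # rows 0-4 / 5-9 and cols 0-4 / 5-9 separate
```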

1,150 citations


Journal ArticleDOI
TL;DR: In this paper, a modification of the Shapiro-Wilk W statistic for testing normality which can be used with large samples is presented. The proposed test uses coefficients which depend only on the expected values of the normal order statistics, which are generally available.
Abstract: This article presents a modification of the Shapiro-Wilk W statistic for testing normality which can be used with large samples. Shapiro and Wilk gave coefficients and percentage points for sample sizes up to 50. These coefficients required obtaining an approximation to the covariance matrix of the normal order statistics. The proposed test uses coefficients which depend only on the expected values of the normal order statistics which are generally available. Results of an empirical sampling study to compare the sensitivity of the test statistic to the W test statistic are briefly discussed.
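A minimal Python sketch of such a statistic, using Blom's plotting positions to approximate the expected normal order statistics (the coefficients here are an approximation, not the authors' published ones):

```python
# W': squared correlation between the ordered sample and (approximate)
# expected normal order statistics.
import numpy as np
from scipy import stats

def w_prime(x):
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    m = stats.norm.ppf((np.arange(1, n + 1) - 0.375) / (n + 0.25))  # Blom scores
    return np.dot(m, x) ** 2 / (np.dot(m, m) * np.sum((x - x.mean()) ** 2))

rng = np.random.default_rng(0)
print(w_prime(rng.normal(size=200)))        # near 1 under normality
print(w_prime(rng.exponential(size=200)))   # visibly smaller
```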

935 citations


Journal ArticleDOI
TL;DR: In this paper, a systematic procedure for constructing confidence bounds and point estimates based on rank statistics is given for the two-sample location parameter, two-sample scale parameter, and one-sample location parameter problems.
Abstract: Systematic procedures for constructing confidence bounds and point estimates based on rank statistics are given for the two-sample location parameter, two-sample scale parameter, and one-sample location parameter problems.
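For the one-sample location problem, the rank-based recipe reduces to order statistics of the Walsh averages; a sketch, where the normal approximation to the signed-rank critical rank is my shortcut rather than the paper's exact tables:

```python
# Hodges-Lehmann point estimate and ~95% confidence interval from
# symmetric order statistics of the Walsh averages.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.normal(0.5, 1.0, 40)
n = len(x)
walsh = np.sort([(x[i] + x[j]) / 2 for i in range(n) for j in range(i, n)])
m = n * (n + 1) // 2                                   # number of Walsh averages
k = int(m / 2 - stats.norm.ppf(0.975) * np.sqrt(n * (n + 1) * (2 * n + 1) / 24))
print(np.median(walsh), (walsh[k - 1], walsh[m - k]))  # estimate and interval
```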

631 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of discontinuous shifts in regression regimes at unknown points in the data series by assuming that nature chooses between regimes with probabilities λ and 1 − λ.
Abstract: In recent years much attention has been focused on the problem of discontinuous shifts in regression regimes at unknown points in the data series. This article approaches this problem by assuming that nature chooses between regimes with probabilities λ and 1 − λ. This allows formulation of the appropriate likelihood function, which is maximized with respect to the parameters in the regression equations and λ. The method is compared to another recent procedure in some sampling experiments and in a realistic economic problem and is found satisfactory.
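The likelihood idea can be sketched directly: each observation's density is a λ-weighted mixture of the two regimes' normal densities, maximized numerically. All data-generating values, starting values, and the common error variance below are illustrative assumptions.

```python
# Mixture likelihood for two regression regimes chosen with probabilities
# lam and 1 - lam.
import numpy as np
from scipy import optimize, stats

def neg_loglik(params, x, y):
    a1, b1, a2, b2, log_s, logit_lam = params
    s, lam = np.exp(log_s), 1.0 / (1.0 + np.exp(-logit_lam))
    f1 = stats.norm.pdf(y, a1 + b1 * x, s)
    f2 = stats.norm.pdf(y, a2 + b2 * x, s)
    return -np.sum(np.log(lam * f1 + (1 - lam) * f2))

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 300)
regime = rng.random(300) < 0.4
y = np.where(regime, 1 + 2 * x, 5 - x) + rng.normal(0, 0.5, 300)
res = optimize.minimize(neg_loglik, [0, 1, 0, -1, 0, 0], args=(x, y),
                        method="Nelder-Mead",
                        options={"maxiter": 5000, "maxfev": 5000})
print(res.x[:4])   # close to (1, 2, 5, -1) up to relabeling the regimes
```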

621 citations


Journal ArticleDOI
TL;DR: A table of critical values of the Spearman rank correlation coefficient, rs, is given for n = 4(1)50(2)100, for nine levels of significance as discussed by the authors.
Abstract: A table of critical values of the Spearman rank correlation coefficient, rs, is given for n = 4(1)50(2)100, for nine levels of significance: α = 0.50, 0.20, 0.10, 0.05, 0.02, 0.01, 0.005, 0.002, 0.001.
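Critical values of this kind can be approximated by simulation when tables are unavailable; a Monte Carlo sketch (the paper's tables are exact, so this is a substitute, not the authors' method):

```python
# Monte Carlo upper critical value of r_s under independence: r_s is the
# Pearson correlation of the ranks, simulated by permutation.
import numpy as np

def spearman_crit(n, alpha, reps=50_000, seed=0):
    rng = np.random.default_rng(seed)
    ranks = np.arange(1, n + 1, dtype=float)
    rs = np.array([np.corrcoef(ranks, rng.permutation(ranks))[0, 1]
                   for _ in range(reps)])
    return np.quantile(rs, 1 - alpha)

print(spearman_crit(10, 0.05))   # one-sided 5% point for n = 10, ~0.56
```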

605 citations


Journal ArticleDOI
TL;DR: In this article, it was shown that the Savage's sure-thing principle is not applicable to alternatives f and g that involve sequential operations, and that this can be seen as a paradox.
Abstract: This paradox is the possibility of P(A|B)
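The inequality pattern is easy to exhibit numerically; a check using the classic kidney-stone counts (illustrative data, not from the paper):

```python
# B beats B' within each subgroup C, C' yet loses in the aggregate.
def rate(successes, trials):
    return successes / trials

assert rate(81, 87) > rate(234, 270)       # P(A|BC)  > P(A|B'C)
assert rate(192, 263) > rate(55, 80)       # P(A|BC') > P(A|B'C')
assert rate(81 + 192, 87 + 263) < rate(234 + 55, 270 + 80)  # P(A|B) < P(A|B')
print("paradox reproduced")
```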

526 citations



Journal ArticleDOI
TL;DR: In this article, a linear model is written in the form Y = Xβ + ξ, where β is an unknown parameter vector and ξ is a hypothetical random variable with a given dispersion structure but containing unknown parameters called variance and covariance components.
Abstract: We write a linear model in the form Y = Xβ + ξ, where β is an unknown parameter vector and ξ is a hypothetical random variable with a given dispersion structure but containing unknown parameters called variance and covariance components. A new method of estimation called MINQUE (Minimum Norm Quadratic Unbiased Estimation) developed in a previous article [5] is extended for the estimation of variance and covariance components.

348 citations


Journal ArticleDOI
TL;DR: A detailed examination of the distribution of stock returns following reports that the distribution is best described by the symmetric stable class of distributions is made in this article, where the distributions are shown to be "fat-tailed" relative to the normal distribution but a number of properties inconsistent with the stable hypothesis are noted.
Abstract: A detailed examination is made of the distribution of stock returns following reports that the distribution is best described by the symmetric stable class of distributions. The distributions are shown to be “fat-tailed” relative to the normal distribution but a number of properties inconsistent with the stable hypothesis are noted. In particular, the standard deviation appears to be a well behaved measure of scale.

330 citations


Journal ArticleDOI
TL;DR: In this paper, the authors discuss compromises between Stein's estimator and the MLE which limit the risk to individual components of the estimation problem while sacrificing only a small fraction of the savings in total squared error loss given by Stein's rule.
Abstract: We discuss compromises between Stein's estimator and the MLE which limit the risk to individual components of the estimation problem while sacrificing only a small fraction of the savings in total squared error loss given by Stein's rule. The compromise estimators “limit translation” away from the MLE. The calculations are pursued in an empirical Bayesian manner by considering their performance against an entire family of prior distributions on the unknown parameters.
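A minimal sketch of the compromise idea. The hard cap at d standard errors below is my simplification of the authors' limited-translation rule, not their exact estimator:

```python
# Positive-part James-Stein shrinkage toward the grand mean, with each
# coordinate's translation away from the MLE capped at d.
import numpy as np

def limited_translation(z, d=1.0):
    k = len(z)
    zbar = z.mean()
    shrink = max(0.0, 1.0 - (k - 3) / np.sum((z - zbar) ** 2))
    js = zbar + shrink * (z - zbar)          # James-Stein estimate
    return z + np.clip(js - z, -d, d)        # limit the translation

rng = np.random.default_rng(1)
theta = np.zeros(12); theta[0] = 6.0         # one genuinely large effect
z = rng.normal(theta, 1.0)                   # MLEs with unit variance
print(np.abs(limited_translation(z) - z).max())   # never exceeds d = 1
```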

Journal ArticleDOI
TL;DR: In this paper, the authors propose several methods of estimating parameters in stable distributions, all of which involve sample characteristic functions; one method, based upon the method of moments, is treated in some detail.
Abstract: This paper proposes several methods of estimating parameters in stable distributions. All the methods involve sample characteristic functions. One of the methods which is based upon the method of moments is treated in some detail. Asymptotic normal distributions for the proposed moment estimators are provided. Moreover, all methods provide consistent estimators. The estimation problem is treated for both univariate and multivariate stable distributions.
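One simple characteristic-function device (a simplification of mine, not the paper's moment estimator): for a symmetric stable law, −log|φ(t)| = (c|t|)^α, so evaluating the empirical characteristic function at two points identifies α.

```python
# Estimate alpha from the empirical characteristic function at t1, t2.
import numpy as np

def alpha_hat(x, t1=0.2, t2=1.0):
    phi = lambda t: np.abs(np.mean(np.exp(1j * t * x)))
    return (np.log(-np.log(phi(t1))) - np.log(-np.log(phi(t2)))) / np.log(t1 / t2)

rng = np.random.default_rng(2)
print(alpha_hat(rng.standard_cauchy(100_000)))   # Cauchy: alpha = 1
print(alpha_hat(rng.normal(size=100_000)))       # Gaussian: alpha = 2
```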

Book ChapterDOI
TL;DR: In this paper, the authors simplify and unify these derivations by exploiting the expression of measures of association as ratios; comments on the use of asymptotic variances, and on a trap in their calculation, are also given.
Abstract: The asymptotic sampling theory discussed in our 1963 article [3] for measures of association presented in earlier articles [1, 2] turns on the derivation of asymptotic variances that may be complex and tedious in specific cases. In the present article, we simplify and unify these derivations by exploiting the expression of measures of association as ratios. Comments on the use of asymptotic variances, and on a trap in their calculation, are also given.
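The ratio device rests on the delta method for a ratio of sample means; a numeric check (the distributions below are mine, not one of the paper's measures of association):

```python
# Delta method for R = Nbar/Dbar with correlated numerator and denominator,
# checked against simulation.
import numpy as np

rng = np.random.default_rng(9)
n, reps = 200, 20_000
r = np.empty(reps)
for i in range(reps):
    num = rng.normal(2.0, 1.0, n)
    den = 0.5 * num + rng.normal(4.0, 1.0, n)   # mean 5, correlated with num
    r[i] = num.mean() / den.mean()

mN, mD, vN, vD, cND = 2.0, 5.0, 1.0, 1.25, 0.5
delta = (mN / mD) ** 2 * (vN / mN**2 - 2 * cND / (mN * mD) + vD / mD**2) / n
print(r.var(), delta)   # both near 1.6e-4
```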

Journal ArticleDOI
TL;DR: This work presents the relevant random-β regression theory as a natural extension of conventional fixed- β regression theory and derives the optimal recursive estimators in terms of the extended regression theory for a typical form of the recursive model.
Abstract: A large class of useful multivariate recursive time series models and estimation methods has appeared in the engineering literature. Despite the interest and utility which this recursive work has when viewed as an extension of regression analysis, little of it has reached statisticians working in regression. To overcome this we (a) present the relevant random-β regression theory as a natural extension of conventional fixed-β regression theory and (b) derive the optimal recursive estimators in terms of the extended regression theory for a typical form of the recursive model. This also opens the way for further developments in recursive estimation, which are more tractable in the regression approach and will be presented in future papers.
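The flavor of recursive estimation is easiest to see in the fixed-β special case, where recursive least squares reproduces batch OLS one observation at a time (a sketch; the notation is mine):

```python
# Recursive least squares: one rank-one update per observation.
import numpy as np

def rls(X, y, prior_var=1e6):
    p = X.shape[1]
    beta = np.zeros(p)
    P = prior_var * np.eye(p)                    # "covariance" of the estimate
    for x_t, y_t in zip(X, y):
        gain = P @ x_t / (1.0 + x_t @ P @ x_t)
        beta = beta + gain * (y_t - x_t @ beta)  # correct by the innovation
        P = P - np.outer(gain, x_t @ P)
    return beta

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=500)
print(rls(X, y))   # close to (1, -2, 0.5), matching batch OLS
```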

Journal ArticleDOI
TL;DR: In this paper, the authors formulate a generalization of the isotonic regression problem and calculate its Fenchel dual; problems in inventory theory and statistics are identified as dual isotonic regression problems.
Abstract: The isotonic regression problem is to minimize Σi [gi − xi]2wi over i = 1, 2, …, k, subject to xi ≤ xj whenever i ≼ j, where wi > 0 and gi (i = 1, 2, …, k) are given and ≼ is a specified partial ordering on {1, 2, …, k}. The solution is called the isotonic regression on g. We formulate a generalization of this problem and calculate its Fenchel dual. A function of the isotonic regression also solves these problems. Problems in inventory theory and statistics are identified as dual isotonic regression problems.
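For the totally ordered special case (a simpler setting than the paper's general partial orders), the isotonic regression is computed by pool-adjacent-violators; a sketch:

```python
# Pool-adjacent-violators for the chain order x1 <= x2 <= ... <= xk;
# each block stores (mean, weight, count).
def pava(g, w):
    blocks = [[gi, wi, 1] for gi, wi in zip(g, w)]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] > blocks[i + 1][0]:          # violator: pool the pair
            m1, w1, c1 = blocks[i]
            m2, w2, c2 = blocks[i + 1]
            blocks[i] = [(w1 * m1 + w2 * m2) / (w1 + w2), w1 + w2, c1 + c2]
            del blocks[i + 1]
            i = max(i - 1, 0)            # pooling may create a new violator
        else:
            i += 1
    return [m for m, _, c in blocks for _ in range(c)]

print(pava([1.0, 3.0, 2.0, 4.0, 3.5], [1, 1, 1, 1, 1]))
# [1.0, 2.5, 2.5, 3.75, 3.75]
```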


Journal ArticleDOI
TL;DR: The authors present two methods for approximating a representative schedule recording first marriage frequencies by age: one achieves a very close approximation with a simple closed-form frequency function, and the other provides a plausible model of nuptiality.
Abstract: The schedule recording first marriage frequencies has been shown to take the same basic form in different populations, with differences only in the origin, area, and horizontal scale. It is shown here that a representative schedule is very closely approximated by a simple closed form frequency function, which is the limiting distribution of the convolution of an infinite number of exponentially distributed components. The schedule is approximated equally well by the convolution of a normal distribution (of age of entry into a marriageable state) and as few as three exponentially distributed delays. The latter convolution provides a plausible model of nuptiality, a model that receives surprising empirical support.
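The convolution model is easy to simulate; the parameter values below are illustrative assumptions, not fitted values from the paper:

```python
# Age at first marriage = normal age of entry into the marriageable state
# plus three exponentially distributed delays.
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
age = (rng.normal(17.0, 2.0, n)      # entry into the marriageable state
       + rng.exponential(2.0, n)     # first delay
       + rng.exponential(1.5, n)     # second delay
       + rng.exponential(1.0, n))    # third delay
print(age.mean(), np.quantile(age, [0.25, 0.5, 0.75]))
```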

Journal ArticleDOI
TL;DR: The Kolmogorov test may be used as an exact goodness-of-fit test for all completely specified distribution functions, whether continuous or not, and a method for finding the exact critical level (approximate in the two-sided case) and the power in such cases is derived.
Abstract: The Kolmogorov goodness-of-fit test is known to be conservative when the hypothesized distribution function is not continuous. A method for finding the exact critical level (approximate in the two-sided case) and the power in such cases is derived. Thus the Kolmogorov test may be used as an exact goodness-of-fit test for all completely specified distribution functions, whether continuous or not continuous. Several examples of the application of this extension of the Kolmogorov test are also included.
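Where the paper derives the exact critical level, simulation offers a quick substitute; a Monte Carlo sketch of the attained level of D against a fully specified discrete distribution:

```python
# Attained level of the one-sample Kolmogorov D for a discrete null,
# estimated by simulation (the paper's method is exact).
import numpy as np

def ks_discrete_pvalue(x, support, probs, reps=20_000, seed=0):
    rng = np.random.default_rng(seed)
    cdf = np.cumsum(probs)
    def d_stat(sample):
        ecdf = np.searchsorted(np.sort(sample), support, side="right") / len(sample)
        return np.max(np.abs(ecdf - cdf))   # sup occurs at support points
    d_obs = d_stat(x)
    sims = np.array([d_stat(rng.choice(support, len(x), p=probs))
                     for _ in range(reps)])
    return (sims >= d_obs).mean()

support, probs = np.array([0, 1, 2, 3]), np.array([0.4, 0.3, 0.2, 0.1])
x = np.repeat(support, [35, 30, 20, 15])    # a sample of size 100
print(ks_discrete_pvalue(x, support, probs))
```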

Journal ArticleDOI
Frank K. Hwang1
TL;DR: A new group testing method is proposed to detect all defectives in a population when the actual number of defectives (or an upper bound) is known or when its probability distribution is known.
Abstract: Consider a population of items, each classified either as defective or as non-defective. This article presents a new group testing method to detect all defectives in the population. The proposed method applies either when the actual number of defectives (or an upper bound) is known or when its probability distribution is known. The method is surprisingly simple yet compares favorably with other existing methods in the number of tests required to detect all defectives.
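Plain binary splitting conveys the flavor of group testing; Hwang's procedure is more refined than this halving sketch:

```python
# One pooled test per group, recursing only into groups that test positive.
def find_defectives(items, defectives):
    if not items or not (set(items) & defectives):   # pooled test negative
        return []
    if len(items) == 1:                              # isolated a defective
        return list(items)
    mid = len(items) // 2
    return (find_defectives(items[:mid], defectives)
            + find_defectives(items[mid:], defectives))

print(find_defectives(list(range(16)), {3, 11}))     # -> [3, 11]
```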

MonographDOI
TL;DR: This monograph develops the concept of contiguity and related theorems, asymptotic expansions and distributions of likelihood functions, approximation of families of probability measures by exponential families, asymptotic sufficiency, and asymptotically optimal tests and asymptotically efficient estimates.
Abstract: 1. On the concept of contiguity and related theorems 2. Asymptotic expansion and asymptotic distribution of likelihood functions 3. Approximation of a given family of probability measures by an exponential family - asymptotic sufficiency 4. Some statistical applications: AUMP and AUMPU tests for certain testing hypotheses problems 5. Some statistical applications: asymptotic efficiency of estimates 6. Multiparameter asymptotically optimal tests.

Journal ArticleDOI
TL;DR: In this article, the two-way array interaction model with one observation per cell is discussed; likelihood ratio tests are presented for the hypotheses of no interaction (λ = 0) and equality of treatments (τ1 = τ2 = … = τt).
Abstract: In this article the two-way array interaction model with one observation per cell is discussed. The model contains a multiplicative interaction term with parameter λ. Likelihood ratio tests are presented for two hypotheses: (1) no interaction (λ = 0) and (2) equality of treatments (τ1 = τ2 = … = τt). Maximum likelihood estimators are also given for all parameters, including σ2 when λ ≠ 0.

Journal ArticleDOI
TL;DR: The nature of statistical confidentiality is explored: its essential role in the collection of data by statistical offices, its relationship to privacy, and the need for increased attention to potential statistical disclosures given the increased tabulation and dissemination capabilities of statistical offices.
Abstract: In Section 1 the nature of statistical confidentiality is explored, i.e., its essential role in the collection of data by statistical offices, its relationship to privacy and the need for increased attention to potential statistical disclosures because of the increased tabulation and dissemination capabilities of statistical offices. In Section 2 a definition of inadvertent direct disclosure is provided as well as a theorem concerning a test for residual disclosure of tabulations. In Section 3 different media and methods of data dissemination are considered from the point of view of potential for statistical disclosure.

Journal ArticleDOI
TL;DR: From such an aggregate model, the optimal predictor of the aggregate variable is derived and it performs remarkably well compared to the optimal disaggregate predictor.
Abstract: The article shows that if the original variable follows a pth order autoregressive system then the non-overlapping moving sum follows a pth order autoregression with at most a pth order moving-average of an independent sequence regardless of the length of the summation. From such an aggregate model we derive the optimal predictor of the aggregate variable and show that it performs remarkably well compared to the optimal disaggregate predictor. The article contains both theoretical and numerical analysis.
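A simulation sketch of the AR(1) case (all parameters illustrative): the m-period non-overlapping sum behaves like an ARMA(1,1) whose autoregressive coefficient is φ^m, so ρ(2)/ρ(1) for the aggregate should sit near φ^m.

```python
# Simulate an AR(1), sum non-overlapping blocks of length m, and compare
# rho(2)/rho(1) of the aggregate with phi**m.
import numpy as np

rng = np.random.default_rng(6)
phi, m, n = 0.9, 3, 300_000
e = rng.normal(size=n)
z = np.zeros(n)
for t in range(1, n):
    z[t] = phi * z[t - 1] + e[t]
agg = z[: n - n % m].reshape(-1, m).sum(axis=1)   # non-overlapping sums

def rho(x, k):
    x = x - x.mean()
    return x[:-k] @ x[k:] / (x @ x)

print(rho(agg, 2) / rho(agg, 1), phi ** m)        # both near 0.729
```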

Journal ArticleDOI
TL;DR: For various loss functions, the behavior of the optimal interval is investigated, a comparison is made with the usual non-decision-theoretic interval estimates, and applications and examples are discussed.
Abstract: Under an appropriate loss function, interval estimation may be regarded as a Bayesian decision-making procedure in which the objective is to find an interval that minimizes expected loss. For various loss functions, the behavior of the optimal interval is investigated, a comparison is made with the usual non-decision-theoretic interval estimates, and applications and examples are discussed.
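A numeric sketch under one simple loss, interval length plus a constant penalty for non-coverage (an illustrative assumption), for a standard normal posterior:

```python
# For a N(0,1) posterior and loss = (b - a) + c * P(theta outside [a, b]),
# the optimal symmetric interval [-h, h] satisfies c * pdf(h) = 1.
import numpy as np
from scipy import optimize, stats

c = 10.0                                         # penalty for non-coverage
loss = lambda h: 2 * h + c * 2 * stats.norm.sf(h)
res = optimize.minimize_scalar(loss, bounds=(0, 5), method="bounded")
print(res.x, c * stats.norm.pdf(res.x))          # h ~ 1.66; condition ~ 1
```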

Book ChapterDOI
TL;DR: Constancy in the total number of children per family results in constancy in the average number of children per cohort, and the Bureau of the Census and the Scripps Foundation have taken advantage of this constancy in their projections.
Abstract: Everything prior to Section 2.9 is in terms of aggregate births of the several years or other periods, considered as a time series. If each family made its decisions on having or not having children in relation to the conditions of the year, without reference to the number of children it has had in the past, nothing more need be said. But suppose now that couples aim at a certain number of children; good or bad times cause them to defer or to advance their childbearing, but not to change the total number. Then the fluctuations in the time series of births are less consequential; the drop of the birth rate in a depression would be made up in the subsequent business upswing, and the rate of increase of the population would be lower only insofar as older parents imply a greater length of generation. Constancy in the total number of children per family results in constancy in the average number of children per cohort, and the Bureau of the Census and the Scripps Foundation have taken advantage of this constancy in their projections.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the problem of finding the expected value of functions of a random variable X of the form f(X) = (X+A)−n where X+A>0 a.s.
Abstract: We investigate the problem of finding the expected value of functions of a random variable X of the form f(X) = (X+A)−n where X+A>0 a.s. and n is a non-negative integer. The technique is to successively integrate the probability generating function and is suggested by the well-known result that successive differentiation leads to the positive moments. The technique is applied to the problem of finding E[1/(X+A)] for the binomial and Poisson distributions.
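A numeric check of the identity behind the technique, E[1/(X+A)] = ∫0 to 1 of t^(A−1) G(t) dt with G the probability generating function, for the Poisson case:

```python
# Integrate the pgf once to get a negative moment; verify against the
# brute-force series.
import numpy as np
from scipy import integrate, stats

lam, A = 3.0, 1.0
G = lambda t: np.exp(lam * (t - 1.0))                 # Poisson pgf
val, _ = integrate.quad(lambda t: t ** (A - 1) * G(t), 0.0, 1.0)

k = np.arange(200)
direct = np.sum(stats.poisson.pmf(k, lam) / (k + A))  # direct series
print(val, direct)    # both ~0.3167 = (1 - exp(-3)) / 3
```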

Journal ArticleDOI
TL;DR: The possibility that some time series arising in economics have error or innovation terms drawn from infinite-variance distributions throws doubt on most classical methods of analysis; the evidence for infinite variance is discussed and a variety of alternative explanations of the long-tailed property of observed distributions are examined.
Abstract: The possibility that some time series that arise in economics have error or innovation terms that come from infinite variance distributions throws doubt on most of the classical methods of analysis. The evidence in favor of infinite-variance is discussed and a variety of alternative explanations of the long-tailed property of observed distributions examined. Some of these alternative explanations are based on mixtures of distributions and suggest data-transformations that reduce or remove the problem. Clipping of the series seems to be a particularly useful technique and is applied to U.S. Treasury daily cash flow data.
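A minimal clipping sketch; the threshold rule below is my assumption (the paper applies the idea to Treasury cash-flow data):

```python
# Clip a heavy-tailed series at a robust bound before computing scale.
import numpy as np

def clip_series(x, k=3.0):
    med = np.median(x)
    s = 1.4826 * np.median(np.abs(x - med))    # robust scale (MAD)
    return np.clip(x, med - k * s, med + k * s)

rng = np.random.default_rng(7)
heavy = rng.standard_cauchy(10_000)            # infinite-variance sample
print(np.var(heavy), np.var(clip_series(heavy)))   # wild vs. tamed
```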

Journal ArticleDOI
TL;DR: In this article, the authors compared owner's estimates of house value and professional appraisals for a sample of owner occupied single family and multifamily units in St. Louis, and found that errors of estimate are systematically related to the socioeconomic characteristics of the owner-occupants and that knowledge of these biases can be used to improve the accuracy of both the individual and aggregate estimates.
Abstract: This study compares owner's estimates of house value and professional appraisals for a sample of owner occupied single family and multifamily units in St. Louis. It confirms the findings of an earlier study by Kish and Lansing that: (1) the errors of estimate are quite large for individual properties, but, (2) the errors are largely offsetting for reasonably sized samples. In addition, the analysis indicates errors of estimate are systematically related to the socioeconomic characteristics of the owner-occupants and that knowledge of these biases can be used to improve the accuracy of both the individual and aggregate estimates of market value.

Journal ArticleDOI
TL;DR: In this paper, a new Gaussian approximation to the noncentral chi-square (χ2) distribution is found for which the coefficient of skewness is smaller than that of a cube-root transformation in the literature.
Abstract: Let Qk = Σj cj(xj + aj)2, j = 1, …, k, be a definite quadratic form in independent standardized Gaussian variables xj, with EQk = θ1. The normalizing transformation (Qk/θ1)h is investigated, where h is determined by the first three moments of Qk. A new Gaussian approximation to the noncentral chi-square (χ2) distribution is found for which the coefficient of skewness is smaller than that of a cube-root transformation in the literature. Our transformation specializes to the cube-root transformation of Wilson and Hilferty for the central χ2 distribution. The approximation is simple to apply and compares well with other approximations in a number of cases studied numerically.
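The central special case is easy to verify: the Wilson-Hilferty cube-root transformation makes (X/k)^(1/3) nearly Gaussian for X ~ χ2 with k degrees of freedom.

```python
# Wilson-Hilferty: (X/k)**(1/3) ~ N(1 - 2/(9k), 2/(9k)) for X ~ chi2(k).
import numpy as np
from scipy import stats

k, x = 10, 15.0
z = ((x / k) ** (1 / 3) - (1 - 2 / (9 * k))) / np.sqrt(2 / (9 * k))
print(stats.norm.cdf(z), stats.chi2.cdf(x, k))   # ~0.8686 vs ~0.8679
```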

Journal ArticleDOI
TL;DR: In distributed lag models, the lag distribution's form is often parameterized so that only a small finite number of parameters is required, even when it is likely that the model so written involves some specification error.
Abstract: In distributed lag models we often parameterize the lag distribution's form so that only small finite numbers of parameters are required even when it is likely that the model so written involves some specification error. The effects of such error depend on the autocorrelation properties of the independent variable; quasi-difference transforms of the data will have effects, possibly undesirable, on the nature of error due to approximation. Certain hypotheses, e.g., those concerning the sum of coefficients or the mean lag of the distribution, may be untestable in time series regressions in the presence of approximation error of this type.
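One concrete instance of such a finite parameterization is the polynomial (Almon) lag, used here purely as an illustration (the paper's analysis of specification error is more general):

```python
# Almon polynomial lag: constrain L lag weights to a degree-2 polynomial,
# so only three parameters are estimated (all values illustrative).
import numpy as np

L, deg, n = 12, 2, 400
lags = np.arange(L)
basis = np.vander(lags, deg + 1, increasing=True)    # columns 1, i, i**2

rng = np.random.default_rng(10)
x = rng.normal(size=n + L)
true_w = 0.5 + 0.2 * lags - 0.02 * lags ** 2         # weights on a parabola
X = np.column_stack([x[L - k: n + L - k] for k in range(L)])
y = X @ true_w + rng.normal(size=n)

theta, *_ = np.linalg.lstsq(X @ basis, y, rcond=None)  # fit 3 parameters
print(np.round(basis @ theta, 2))                      # ~ true lag weights
```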