
Showing papers in "Annals of Mathematical Statistics in 1947"


Journal ArticleDOI
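TL;DR: In this paper, the authors propose a rank statistic $U$ for testing whether two continuous distributions are equal, tabulate its null distribution for samples up to $n = m = 8$, and show that the limit distribution is normal if $m, n$ go to infinity in any arbitrary manner.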
Abstract: Let $x$ and $y$ be two random variables with continuous cumulative distribution functions $f$ and $g$. A statistic $U$ depending on the relative ranks of the $x$'s and $y$'s is proposed for testing the hypothesis $f = g$. Wilcoxon proposed an equivalent test in the Biometrics Bulletin, December, 1945, but gave only a few points of the distribution of his statistic. Under the hypothesis $f = g$ the probability of obtaining a given $U$ in a sample of $n$ $x$'s and $m$ $y$'s is the solution of a certain recurrence relation involving $n$ and $m$. Using this recurrence relation tables have been computed giving the probability of $U$ for samples up to $n = m = 8$. At this point the distribution is almost normal. From the recurrence relation explicit expressions for the mean, variance, and fourth moment are obtained. The 2rth moment is shown to have a certain form which enabled us to prove that the limit distribution is normal if $m, n$ go to infinity in any arbitrary manner. The test is shown to be consistent with respect to the class of alternatives $f(x) > g(x)$ for every $x$.
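The recurrence mentioned in this abstract can be sketched in a few lines. The abstract does not print the recurrence, so the form below is an assumption: the standard counting recurrence for $U$ taken as the number of times a $y$ precedes an $x$, obtained by conditioning on whether the largest observation is an $x$ or a $y$. The function names are illustrative only.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_u(u, n, m):
    """Number of orderings of n x's and m y's in which U equals u.

    Assumed recurrence: if the largest observation is an x, all m y's
    precede it (adding m to U); if it is a y, it adds nothing.
    """
    if u < 0:
        return 0
    if n == 0 or m == 0:
        return 1 if u == 0 else 0
    return count_u(u - m, n - 1, m) + count_u(u, n, m - 1)


def u_distribution(n, m):
    """Exact null distribution of U as a list indexed by u = 0..n*m."""
    total = comb(n + m, n)  # all orderings are equally likely when f = g
    return [Fraction(count_u(u, n, m), total) for u in range(n * m + 1)]


def moments(dist):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(u * p for u, p in enumerate(dist))
    var = sum((u - mean) ** 2 * p for u, p in enumerate(dist))
    return mean, var


if __name__ == "__main__":
    mean, var = moments(u_distribution(8, 8))
    print("mean:", mean, "variance:", var)
```

For $n = m = 8$ the computed mean and variance can be compared with the familiar closed forms $nm/2$ and $nm(n + m + 1)/12$, which are quoted here from general knowledge rather than from the abstract.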

11,055 citations


Journal ArticleDOI
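TL;DR: In this paper, it is shown that the probability function of $z = xy/\sigma_1\sigma_2$, where $x$ and $y$ follow a normal bivariate probability function, approaches a normal curve as $\rho_1$ and $\rho_2 \rightarrow \infty$, and that in case $r = 0$ the Type III function and the Gram-Charlier Type A series are excellent approximations in the proper region.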
Abstract: Let $x$ and $y$ follow a normal bivariate probability function with means $\bar X, \bar Y$, standard deviations $\sigma_1, \sigma_2$, respectively, $r$ the coefficient of correlation, and $\rho_1 = \bar X/\sigma_1, \rho_2 = \bar Y/\sigma_2$. Professor C. C. Craig [1] has found the probability function of $z = xy/\sigma_1\sigma_2$ in closed form as the difference of two integrals. For purposes of numerical computation he has expanded this result in an infinite series involving powers of $z, \rho_1, \rho_2$, and Bessel functions of a certain type; in addition, he has determined the moments, semi-invariants, and the moment generating function of $z$. However, for $\rho_1$ and $\rho_2$ large, as Craig points out, the series expansion converges very slowly. Even for $\rho_1$ and $\rho_2$ as small as 2, the expansion is unwieldy. We shall show that as $\rho_1$ and $\rho_2 \rightarrow \infty$, the probability function of $z$ approaches a normal curve and in case $r = 0$ the Type III function and the Gram-Charlier Type A series are excellent approximations to the $z$ distribution in the proper region. Numerical integration provides a substitute for the infinite series wherever the exact values of the probability function of $z$ are needed. Some extensions of the main theorem are given in section 5 and a practical problem involving the probability function of $z$ is solved.

482 citations



Journal ArticleDOI
TL;DR: The means, variances, and covariances for samples of size $\leq 10$ from the normal distribution, a selected long-tailed distribution, and the uniform distribution are tabled and compared with the usual asymptotic approximations.
Abstract: The means, variances, and covariances for samples of size $\leq 10$ from the normal distribution, a selected long-tailed distribution, and the uniform distribution are tabled and compared with the usual asymptotic approximations. The methods of computation used and the accuracy expected are discussed. Use is made of the representation of an arbitrarily distributed variate as a monotone function of a uniformly (rectangularly) distributed variate. It is hoped that these tables will encourage experimentation with new statistical procedures.
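For the uniform case, a table of this kind can be reproduced exactly, assuming (as the paper's subject suggests) that the tabled quantities are moments of order statistics. The sketch below uses the standard closed forms for the order statistics of $n$ uniform variates on $(0, 1)$; these formulae are quoted from general knowledge, not from the abstract, and the function name is illustrative.

```python
from fractions import Fraction


def uniform_order_moments(n):
    """Exact means and covariance matrix of the order statistics of
    n independent uniform(0, 1) variates.

    Assumed standard results, for 1 <= i <= j <= n:
        mean_i = i / (n + 1)
        cov_ij = i * (n + 1 - j) / ((n + 1)**2 * (n + 2))
    """
    means = [Fraction(i, n + 1) for i in range(1, n + 1)]
    denom = (n + 1) ** 2 * (n + 2)
    cov = [
        [Fraction(min(i, j) * (n + 1 - max(i, j)), denom)
         for j in range(1, n + 1)]
        for i in range(1, n + 1)
    ]
    return means, cov


if __name__ == "__main__":
    means, cov = uniform_order_moments(10)
    for i, (mu, row) in enumerate(zip(means, cov), start=1):
        print(i, mu, row[i - 1])  # rank, mean, variance
```

A quick consistency check is that the entries of the covariance matrix sum to $n/12$, the variance of the sum of $n$ independent uniform variates.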

352 citations


Journal ArticleDOI
TL;DR: In this paper, it is shown that whenever there is a sufficient statistic $u$ and an unbiased estimate $t$, not a function of $u$ only, for a parameter $\theta$, the function $E(t \mid u)$, which is a function of $u$ only, is an unbiased estimate for $\theta$ with a variance smaller than that of $t$.
Abstract: It is shown that $E\lbrack f(x) E(y \mid x)\rbrack = E(fy)$ whenever $E(fy)$ is finite, and that $\sigma^2E(y \mid x) \leq \sigma^2y$, where $E(y \mid x)$ denotes the conditional expectation of $y$ with respect to $x$. These results imply that whenever there is a sufficient statistic $u$ and an unbiased estimate $t$, not a function of $u$ only, for a parameter $\theta$, the function $E(t \mid u)$, which is a function of $u$ only, is an unbiased estimate for $\theta$ with a variance smaller than that of $t$. A sequential unbiased estimate for a parameter is obtained, such that when the sequential test terminates after $i$ observations, the estimate is a function of a sufficient statistic for the parameter with respect to these observations. A special case of this estimate is that obtained by Girshick, Mosteller, and Savage [4] for the parameter of a binomial distribution.
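The variance reduction described in this abstract can be checked exactly on a small case. The example is an assumption chosen for illustration, not taken from the paper: $n$ Bernoulli observations with parameter $p$, the crude unbiased estimate $t = x_1$, and the sufficient statistic $u = x_1 + \cdots + x_n$, for which $E(t \mid u) = u/n$.

```python
from fractions import Fraction
from itertools import product


def estimator_moments(n, p, estimator):
    """Exact mean and variance of an estimator over all 2**n Bernoulli(p) samples."""
    mean = Fraction(0)
    second = Fraction(0)
    for xs in product((0, 1), repeat=n):
        k = sum(xs)
        prob = p ** k * (1 - p) ** (n - k)  # probability of this sample
        value = estimator(xs)
        mean += prob * value
        second += prob * value * value
    return mean, second - mean ** 2


def crude(xs):
    """Unbiased estimate t that uses only the first observation."""
    return Fraction(xs[0])


def conditioned(xs):
    """E(t | u) for u = sum of the observations, which equals u / n."""
    return Fraction(sum(xs), len(xs))


if __name__ == "__main__":
    p = Fraction(1, 3)
    print("t        :", estimator_moments(4, p, crude))
    print("E(t | u) :", estimator_moments(4, p, conditioned))
```

Both estimates have mean $p$, while the conditioned estimate has variance $p(1 - p)/n$ instead of $p(1 - p)$.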

297 citations




Journal ArticleDOI
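TL;DR: In this article, Wald's principle is used to provide many new ways of using a sample of $n$ to divide the range of the population into $n + 1$ blocks of known behavior, and the exact tolerance distribution for the proportions of the population covered by these blocks is extended from the case of a continuous probability density function to the case of a continuous cumulative distribution function.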
Abstract: Wald [2, 1943] extended the usefulness of tolerance limits to the simplest multi-dimensional cases. His principle is here used to provide many new ways of using a sample of $n$ to divide the range of the population into $n + 1$ blocks of known behavior. The exact tolerance distribution for the proportions of the population covered by these blocks is extended from the case of a continuous probability density function to the case of a continuous cumulative distribution function. Such an extension is needed in dealing completely with multivariate cases even where the underlying distribution is as smooth as a multivariate normal distribution. The devices used in Paper I [1] to extend the usefulness of tolerance limits to the case of a discontinuous underlying distribution will be applied in the next paper of this series, with some extension, to extend the usefulness of these general tolerance regions to the case of a discontinuous distribution. Some of these results specialize into new results for the univariate case, although they do not seem to have any immediate practical application. The author wishes to acknowledge the stimulation given to his work on this problem by Henry Scheffe, whose modesty has kept this paper from the joint authorship of papers I [1, Scheffe and Tukey 1945] and IV (not yet written).

169 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigate the effect of intraclass correlation on the confidence coefficients and significance levels of several well known confidence intervals and significance tests which were derived under the assumption of independence, and extend these considerations to the case of two sets of values.
Abstract: In practical applications it is frequently assumed that the values obtained by a sampling process are independently drawn from the same normal population. Then confidence intervals and significance tests which were derived under the assumption of independence are applied using these values. Often the assumption of independence between the values may be at best only approximately valid. For some cases, however, it may be permissible to assume that the correlation between each two values is the same (intraclass correlation). The purpose of this paper is to investigate the effect of this intraclass correlation on the confidence coefficients and significance levels of several well known confidence intervals and significance tests which were derived under the assumption of independence, and to extend these considerations to the case of two sets of values. In the first part of the paper the relations given in Table I are used to compute tables which show the effect of intraclass correlation on the confidence coefficients and significance levels of the confidence intervals and significance tests listed in Table II. The second part of the paper consists of the proofs of the relations given in Table I.
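The kind of effect studied here can be illustrated on the simplest case, which is an assumption of this sketch and not necessarily one of the procedures in the paper's Table II: the known-variance interval $\bar x \pm z\sigma/\sqrt{n}$ for a normal mean. If each pair of values has correlation $\rho$, the variance of $\bar x$ is $\sigma^2\lbrack 1 + (n - 1)\rho\rbrack/n$, so the actual confidence coefficient differs from the nominal one. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def actual_confidence(n, rho, nominal=0.95):
    """Actual coverage of the nominal known-variance interval for a normal
    mean when the n values share a common pairwise correlation rho.

    Requires 1 + (n - 1) * rho > 0 so that the covariance matrix is valid.
    """
    inflation_sq = 1 + (n - 1) * rho
    if inflation_sq <= 0:
        raise ValueError("rho is too negative for this sample size")
    std = NormalDist()
    z = std.inv_cdf(0.5 + nominal / 2)  # nominal two-sided critical value
    return 2 * std.cdf(z / sqrt(inflation_sq)) - 1


if __name__ == "__main__":
    for rho in (0.0, 0.05, 0.1, 0.3):
        print(rho, round(actual_confidence(10, rho), 4))
```

Even a small positive correlation lowers the coverage noticeably below the nominal value for $n = 10$, while a negative correlation raises it.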

130 citations


Journal ArticleDOI
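TL;DR: In this paper, a lower bound is proved, under certain regularity conditions, for the variance of an estimate of a parameter when the number $n$ of observations is a chance variable; when $n$ is a constant the lower bound is the same as that obtained in [2], page 480, under different conditions of regularity.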
Abstract: Let $n$ successive independent observations be made on the same chance variable whose distribution function $f(x, \theta)$ depends on a single parameter $\theta$. The number $n$ is a chance variable which depends upon the outcomes of successive observations; it is precisely defined in the text below. Let $\theta^\ast(x_1, \cdots, x_n)$ be an estimate of $\theta$ whose bias is $b(\theta)$. Subject to certain regularity conditions stated below, it is proved that $\sigma^2(\theta^\ast) \geq \big(1 + \frac{db}{d\theta}\big)^2\big\lbrack EnE\big(\frac{\partial\log f}{\partial\theta}\big)^2\big\rbrack^{-1}.$ When $f(x, \theta)$ is the binomial distribution and $\theta^\ast$ is unbiased the lower bound given here specializes to one first announced by Girshick [3], obtained under no doubt different conditions of regularity. When the chance variable $n$ is a constant the lower bound given above is the same as that obtained in [2], page 480, under different conditions of regularity. Let the parameter $\theta$ consist of $l$ components $\theta_1, \cdots, \theta_l$ for which there are given the respective unbiased estimates $\theta^\ast_1(x_1, \cdots, x_n), \cdots, \theta^\ast_l(x_1, \cdots, x_n)$. Let $\|\lambda_{ij}\|$ be the non-singular covariance matrix of the latter, and $\|\lambda^{ij}\|$ its inverse. The concentration ellipsoid in the space of $(k_1, \cdots, k_l)$ is defined as $\sum_{i,j} \lambda^{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2.$ (This valuable concept is due to Cramer.) If a unit mass be uniformly distributed over the concentration ellipsoid, the matrix of its products of inertia will coincide with the covariance matrix $\|\lambda_{ij}\|$. In [4] Cramer proves that no matter what the unbiased estimates $\theta^\ast_1, \cdots, \theta^\ast_l$, (provided that certain regularity conditions are fulfilled), when $n$ is constant their concentration ellipsoid always contains within itself the ellipsoid $\sum_{i,j} \mu_{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2$ where $\mu_{ij} = nE\big(\frac{\partial\log f}{\partial\theta_i}\frac{\partial\log f}{\partial\theta_j}\big).$ Consider now the sequential procedure of this paper. Let $\theta^\ast_1, \cdots, \theta^\ast_l$ be, as before, unbiased estimates of $\theta_1, \cdots, \theta_l$, respectively, recalling, however, that the number $n$ of observations is a chance variable. It is proved that the concentration ellipsoid of $\theta^\ast_1, \cdots, \theta^\ast_l$ always contains within itself the ellipsoid $\sum_{i,j} \mu'_{ij}(k_i - \theta_i)(k_j - \theta_j) = l + 2$ where $\mu'_{ij} = EnE\big(\frac{\partial\log f}{\partial\theta_i}\frac{\partial\log f}{\partial\theta_j}\big).$ When $n$ is a constant this becomes Cramer's result (under different conditions of regularity). In section 7 a number of results are presented related to the equation $EZ_n = EnEX$, which is due to Wald [6] and is fundamental for sequential analysis.
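The equation $EZ_n = EnEX$ can be verified exactly for a simple stopping rule. The setting is an assumption made for illustration: $Z_n$ is read as the sum of the first $n$ observations, the observations are Bernoulli with parameter $p$, and sampling stops at the first success or after `max_n` observations, whichever comes first. The function name is hypothetical.

```python
from fractions import Fraction


def wald_expectations(p, max_n):
    """Return (E Z_n, E n) for the rule: stop at the first success,
    or after max_n Bernoulli(p) observations if no success occurs."""
    q = 1 - p
    e_sum = Fraction(0)    # expected value of the stopped sum Z_n
    e_count = Fraction(0)  # expected number of observations n
    for k in range(1, max_n + 1):
        prob = q ** (k - 1) * p  # first success occurs on observation k
        e_sum += prob * 1        # the stopped sum is 1 on this path
        e_count += prob * k
    e_count += q ** max_n * max_n  # no success: n = max_n and Z_n = 0
    return e_sum, e_count


if __name__ == "__main__":
    p = Fraction(1, 4)
    e_sum, e_count = wald_expectations(p, 6)
    print("E Z_n     =", e_sum)
    print("E n * E X =", e_count * p)
```

The two printed values agree, as the equation requires, because the decision to stop after $k$ observations depends only on the first $k$ observations.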

91 citations


Journal ArticleDOI
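TL;DR: In this paper, the author considers the class $C$ of decision procedures consisting of all Bayes solutions corresponding to all possible a priori distributions of the parameter $\theta$, and shows that, under some weak conditions, for any decision procedure not in $C$ there exists a procedure in $C$ whose risk function is nowhere larger.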
Abstract: With any statistical decision procedure (function) there will be associated a risk function $r(\theta)$ where $r(\theta)$ denotes the risk due to possible wrong decisions when $\theta$ is the true parameter point. If an a priori probability distribution of $\theta$ is given, a decision procedure which minimizes the expected value of $r(\theta)$ is called the Bayes solution of the problem. The main result in this note may be stated as follows: Consider the class C of decision procedures consisting of all Bayes solutions corresponding to all possible a priori distributions of $\theta$. Under some weak conditions, for any decision procedure $T$ not in $C$ there exists a decision procedure $T^\ast$ in $C$ such that $r^\ast(\theta) \leqq r(\theta)$ identically in $\theta$. Here $r(\theta)$ is the risk function associated with $T$, and $r^\ast(\theta)$ is the risk function associated with $T^\ast$. Applications of this result to the problem of testing a hypothesis are made.


Journal ArticleDOI
TL;DR: In this article, the general canonical correlation distribution is given as a multiple power series in the true canonical correlations; when only one true correlation is not zero, the series is expressible as a generalized hypergeometric function, for the cases both of non-central means and of correlations proper.
Abstract: The general canonical correlation distribution is given as a multiple power series in the true canonical correlations $\rho_i$. When only one true correlation is not zero, this series is expressible as a generalized hypergeometric function, for the cases both of non-central means and of correlations proper. In the general case of more than one non-zero true correlation the coefficients in the expansion depend on the conditional moments of the sample correlations between the pairs of transformed variables representing the true canonical variables, when the sample canonical correlations between the sample canonical variables are fixed. Methods are given of obtaining these coefficients for both cases, non-central means and correlations proper; and their form up to the fourth order, corresponding to $O(\rho^8)$ in the expansion, listed in Appendix I. The detailed terms making up these coefficients are given, in the case of two non-zero correlations, up to the fourth order, and in the general case, up to the third order, in Appendix II.

Journal ArticleDOI
TL;DR: In this article, a new test (the "quadrant sum") is proposed for the association of two continuous variables; it is non-parametric, easy to compute, and gives special weight to extreme values of the variables.
Abstract: This paper proposes a new test (the "quadrant sum") for the association of two continuous variables. Its notable properties are: (1) Special weight is given to extreme values of the variables. (2) Computation is very easy. (3) The test is non-parametric. Significance levels (for the quadrant sum) are given to the accuracy needed for practical use. To this accuracy they are independent of sample size (see Fig. 1). The generating function of the quadrant sum is given for the null hypothesis (no association = independence). A limiting distribution is deduced and compared with the cases $2n$ = 4, 6, 8, 10, and 14. Extension to higher dimensions and application to serial correlation are discussed.

Journal ArticleDOI
TL;DR: In this paper, the authors consider sequential procedures for obtaining confidence intervals of prescribed length and confidence coefficient for the mean of a normal distribution with known variance, and prove that the usual non-sequential procedure is optimum.
Abstract: We consider sequential procedures for obtaining confidence intervals of prescribed length and confidence coefficient for the mean of a normal distribution with known variance. A procedure achieving these aims is called optimum if it minimizes the least upper bound (with respect to the mean) of the expected number of observations. The result proved is that the usual nonsequential procedure is optimum.
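The nonsequential procedure referred to here is presumably the fixed-sample interval $\bar x \pm L/2$ with the smallest $n$ that attains the confidence coefficient. That reading, and the names below, are assumptions of this sketch. With known standard deviation $\sigma$, prescribed length $L$, and confidence coefficient $1 - \alpha$, the required sample size is the smallest integer $n$ with $n \geq (2 z_{\alpha/2}\sigma/L)^2$.

```python
from math import ceil, sqrt
from statistics import NormalDist


def fixed_sample_size(sigma, length, confidence):
    """Smallest n for which mean +/- length/2 covers the true mean with
    at least the given confidence, for a normal population with known sigma."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return ceil((2 * z * sigma / length) ** 2)


def coverage(sigma, length, n):
    """Confidence coefficient actually achieved with n observations."""
    return 2 * NormalDist().cdf(length * sqrt(n) / (2 * sigma)) - 1


if __name__ == "__main__":
    n = fixed_sample_size(sigma=1.0, length=1.0, confidence=0.95)
    print("n =", n, "coverage =", round(coverage(1.0, 1.0, n), 4))
```

The sample size does not depend on the unknown mean, so its expected value is the same constant for every mean; the paper's result is that no sequential procedure does better in the least-upper-bound sense.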

Journal ArticleDOI
TL;DR: This paper discusses some of the concepts underlying small sample estimation and reexamines the current notions on "unbiased" estimation; alternatives to the usual unbiased property are examined with respect to invariance under simultaneous one-to-one transformation of parameter and estimate, and one of these alternatives, closely related to the maximum likelihood method, seems to be new.
Abstract: This paper discusses some of the concepts underlying small sample estimation and reexamines, in particular, the current notions on "unbiased" estimation. Alternatives to the usual unbiased property are examined with respect to invariance under simultaneous one-to-one transformation of parameter and estimate; one of these alternatives, closely related to the maximum likelihood method, seems to be new. The property of being unbiased in the likelihood sense is essentially equivalent to the statement that the estimate is a maximum likelihood estimate based on some distribution derived by integration from the original sampling distribution, by virtue of a "hereditary" property of maximum likelihood estimation. An exposition of maximum likelihood estimation is given in terms of optimum pairwise selection with equal weights, providing a type of rationale for small sample estimation by maximum likelihood.




Journal ArticleDOI
TL;DR: In this paper, the asymptotic distribution of the range for a large sample taken from an initial unlimited distribution possessing all moments is obtained by the convolution of the asymptotic distribution of the two extremes.
Abstract: The asymptotic distribution of the range $w$ for a large sample taken from an initial unlimited distribution possessing all moments is obtained by the convolution of the asymptotic distribution of the two extremes. Let $\alpha$ and $u$ be the parameters of the distribution of the extremes for a symmetrical variate, and let $R = \alpha(w - 2u)$ be the reduced range. Then its asymptotic probability $\Psi(R)$ and its asymptotic distribution $\psi(R)$ may be expressed by the Hankel function of order one and zero. A table is given in the text. The asymptotic distribution $g(w)$ of the range proper is obtained from $\psi(R)$ by the usual linear transformation. The initial distribution and the sample size influence the position and the shape of the distribution of the range in the same way as they influence the distribution of the largest value. If we take the parameters from the calculated means and standard deviations, the asymptotic distribution of the range gives a good fit to the calculated distributions for normal samples from size 6 onward. Consequently the distribution of the range for normal samples of any size larger than 6 may be obtained from the asymptotic distribution of the reduced range. The asymptotic probabilities and the asymptotic distributions of the mth range and of the range for asymmetrical distributions are obtained by the same method and lead to integrals which may be evaluated by numerical methods.

Journal ArticleDOI
TL;DR: In this article, the equivalence of the methods of least squares, minimum chi-square, and maximum likelihood from a practical point of view is emphasized in order to facilitate the integration and adaptation of existing statistical techniques, and it is shown that the method of minimum chi-square is the sample-frequency form of the ideal method of least squares, which leads (by means of appropriate successive approximations) to maximum likelihood statistics in sample-frequency problems.
Abstract: In this article certain contributions are made to the theory of estimating linear functions of cell proportions in connection with the methods of (1) least squares, (2) minimum chi-square, and (3) maximum likelihood. Distinctions among these three methods made by previous writers arise out of (1) confusion concerning theoretical vs. practical weights, (2) neglect of effects of correlation between sampling errors, and (3) disagreement concerning methods of minimization. Throughout the paper the equivalence of these three methods from a practical point of view has been emphasized in order to facilitate the integration and adaptation of existing statistical techniques. To this end: 1. The method of least squares as derived by Gauss in 1821-23 [6, pp. 224-228] in which weights in theory are chosen so as to minimize sampling variances is herein called the ideal method of least squares and the theoretical estimates are called ideal linear estimates. This approach avoids confusion between practical approximations and theoretical exact weights. 2. The ideal method of least squares is applied to uncorrelated linear functions of correlated sample frequencies to determine the appropriate quantity to minimize in order to derive ideal linear estimates in sample-frequency problems. This approach leads to a sum of squares of standardized uncorrelated linear functions of sampling errors in which statistics are to be substituted in numerators. 3. A new elementary method is used to reduce the sum of squares in (2)--before substitution of statistics--to Pearson's expression for chi-square. In this result, obtained without approximation, appropriate substitution of statistics shows that the denominators of chi-square should be treated as constant parameters in the differentiation process in order to minimize chi-square in conformity with the ideal method of least squares. 4. The ideal method of minimum chi-square, derived in (3) as the sample-frequency form of the ideal method of least squares, yields ideal linear estimates in terms of the unknown parameters in the denominators of chi-square. When these parameters are estimated by successive approximations in such a way as to be consistent with statistics based on them, it is shown that the method of minimum chi-square leads to maximum likelihood statistics. 5. An iterative method which converges to maximum likelihood estimates is developed for the case in which observations are cross-classified and first order totals are known. In comparison with Deming's asymptotically efficient statistics, it is shown that, in a certain sense, maximum likelihood statistics are superior for any given value of $n$--especially in small samples. 6. The method of proportional distribution of marginal adjustments is developed. This method yields estimates of expected cell frequencies whose efficiency is 100 per cent when universe cell frequencies are proportional--a condition closely approximated in most practical surveys for which first order totals are available from complete censuses. Whether this favorable condition is satisfied or not, the method yields results which are easy to interpret and it has many computational advantages from the point of view of economy of time and effort. Throughout the article discussion is confined to the estimation of parameters whose relationships to cell proportions are linear. However, most of the results can be extended to the case of non-linear relationships, the necessary qualifications being similar to those in curve-fitting problems when the function to be fitted is not linear in its parameters. In this case, of course, least squares estimates are not linear estimates. In particular, obvious extensions of the general proofs in sections 5 and 6 make them applicable to the non-linear case. Thus even when relationships are non-linear, it can be shown that the method of minimum chi-square is the sample-frequency form of the method of least squares which leads (by means of appropriate successive approximations) to maximum likelihood statistics in sample-frequency problems. This principle which establishes the equivalence of the methods of least squares, minimum chi-square, and maximum likelihood greatly facilitates the integration and adaptation of existing techniques developed in connection with these important methods of estimation.

Journal ArticleDOI
TL;DR: In this paper, the authors developed methods of averaging functions over chains of dependent variables and found the probability distribution of these functions, and for certain types of chains these averages and distribution functions can be expressed in terms of the characteristic values and vectors of a certain operator equation.
Abstract: Although there exists voluminous literature on the theory of probability of independent events, and powerful techniques have been developed for the analysis of most of the interesting problems in this field, the theory of probability of dependent events has been rather neglected. The first detailed investigations in this subject were published by A. Markoff [1]. S. Bernstein [2] has extended the fundamental limit theorems to chains of dependent events. The most extensive exposition of this field has been made by M. Frechet [3]. In the present paper we shall develop methods of averaging functions over chains of dependent variables and find the probability distribution of these functions. It will be shown that for certain types of chains these averages and distribution functions can be expressed in terms of the characteristic values and vectors of a certain operator equation. Many of the methods discussed here have been applied to problems in statistical mechanics [4,5,6,7,8]. The most important application has been made by L. Onsager [8] who proved rigorously (on the basis of a simplified model) that Boltzmann's energy distribution in a solid with cooperative elements leads to a phase transition. The first explicit application of linear operator theory (through matrices and integral equations) to probability chains has apparently been made by Hostinsky [9].

Journal ArticleDOI
TL;DR: This paper deals with characteristic functions, convergence, fixed intervals, random variables and distribution functions.
Abstract: This paper deals with characteristic functions, convergence, fixed intervals, random variables and distribution functions.

Journal ArticleDOI
TL;DR: In this paper, the authors give a few theorems concerning the reciprocal relation between the convergence of a sequence of distribution functions and the corresponding sequence of their moment generating functions, based on the Helly selection principle for bounded sequences of monotonic functions.
Abstract: The purpose of this paper is to give a few theorems concerning the reciprocal relation between the convergence of a sequence of distribution functions and the convergence of the corresponding sequence of their moment generating functions. The paper consists of two parts. In the first part the univariate case is discussed. The content of this part is closely related to that of a recent paper by J. H. Curtiss [1, p. 430-433], but the results are of a somewhat more general nature, and the methods of proofs are different and do not make use of the theory of a complex variable. The second part deals with the multivariate case which, as far as the author knows, has not been treated before with proofs in as complete and rigorous a way. In both the univariate and multivariate cases the proofs are based on the well known Helly selection principle [2, p. 26] for bounded sequences of monotonic functions.



Journal ArticleDOI
TL;DR: In this paper, a family F of tests is determined such that given any test w of H there exists a test w' belonging to F which has power uniformly greater than or equal to that of w. The effect on F of various assumptions about the set of alternatives is considered.
Abstract: For each hypothesis H of a certain class of simple hypotheses, a family F of tests is determined such that (a) given any test w of H there exists a test w' belonging to F which has power uniformly greater than or equal to that of w, and (b) no member of F has power uniformly greater than or equal to that of any other member of F. The effect on F of various assumptions about the set of alternatives is considered. As an application an optimum property of the known type $A_1$ tests is proved, and a result is obtained concerning the most stringent tests of the hypotheses considered.



Book ChapterDOI
TL;DR: In this paper, the authors considered the problem of determining the optimum test region for a composite hypothesis specifying the value of the circular serial correlation coefficient in a normal distribution, and showed that for the simple hypothesis obtained by specifying values for the nuisance parameters no test with corresponding optimum properties exists.
Abstract: This paper is concerned with optimum tests of certain composite hypotheses. In section 2 various aspects of a theorem of Scheffe concerning type $B_1$ tests are discussed. It is pointed out that the theorem can be extended to cover uniformly most powerful tests against a one-sided set of alternatives. It is also shown that the method for determining explicitly the optimum test region may in certain cases be reduced to a simple formal procedure. These results are used in section 3 to obtain optimum tests for the composite hypothesis specifying the value of the circular serial correlation coefficient in a normal distribution. A surprising feature of this example is the fact that for the simple hypothesis obtained by specifying values for the nuisance parameters no test with the corresponding optimum properties exists. In section 4 the totality of similar regions is obtained for a large class of probability laws which admit a sufficient statistic. Some composite hypotheses concerning exponential and rectangular distributions are treated in section 5. It is proved that the likelihood ratio tests of these hypotheses have various optimum properties.