scispace - formally typeset

Showing papers in "Annals of Mathematical Statistics" in 1948


Book Chapter · DOI
TL;DR: In this article, U-statistics are studied: statistics formed by averaging a function Φ of m observations over all permutations (α_1, …, α_m) of distinct integers with 1 ≤ α_i ≤ n, which give unbiased estimates of a regular functional θ(F) of the population distribution function F.
Abstract: Let $X_1, \ldots, X_n$ be $n$ independent random vectors, $X_\nu = (X_\nu^{(1)}, \ldots, X_\nu^{(r)})$, and $\Phi(x_1, \ldots, x_m)$ a function of $m$ ($\leq n$) vectors $x_\nu = (x_\nu^{(1)}, \ldots, x_\nu^{(r)})$. A statistic of the form $U = \sum'' \Phi(X_{\alpha_1}, \ldots, X_{\alpha_m}) / [n(n-1)\cdots(n-m+1)]$, where the sum $\sum''$ is extended over all permutations $(\alpha_1, \ldots, \alpha_m)$ of $m$ different integers, $1 \leq \alpha_i \leq n$, is called a U-statistic. If $X_1, \ldots, X_n$ have the same (cumulative) distribution function (d.f.) $F(x)$, $U$ is an unbiased estimate of the population characteristic $\theta(F) = \int \cdots \int \Phi(x_1, \ldots, x_m)\, dF(x_1) \cdots dF(x_m)$. $\theta(F)$ is called a regular functional of the d.f. $F(x)$. Certain optimal properties of U-statistics as unbiased estimates of regular functionals have been established by Halmos [9] (cf. Section 4).

2,439 citations
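The definition above can be checked numerically. The sketch below evaluates a U-statistic by brute force over all ordered m-tuples of distinct indices; the names `u_statistic` and `variance_kernel` are illustrative, not from the paper. With the kernel $\Phi(x_1, x_2) = (x_1 - x_2)^2/2$, $\theta(F)$ is the population variance and $U$ reduces to the familiar unbiased sample variance.

```python
from fractions import Fraction
from itertools import permutations
from math import perm


def u_statistic(sample, kernel, m):
    """Average kernel(X_a1, ..., X_am) over all ordered m-tuples of distinct indices.

    Brute force: the sum has n(n-1)...(n-m+1) terms, so this is only for small n.
    """
    total = sum(kernel(*xs) for xs in permutations(sample, m))
    return total / perm(len(sample), m)


def variance_kernel(x1, x2):
    # E[(X1 - X2)^2 / 2] equals the population variance.
    return Fraction((x1 - x2) ** 2, 2)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(u_statistic(data, variance_kernel, 2))  # 32/7, the unbiased sample variance
```

Exact `Fraction` arithmetic keeps the identity with the unbiased sample variance free of rounding; for realistic sample sizes one would use a closed form rather than enumeration.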



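Journal Article · DOI
TL;DR: In this article, it is shown that among all tests for deciding between two simple hypotheses $H_0$ and $H_1$ whose error probabilities do not exceed those of a sequential probability ratio test, the sequential probability ratio test requires on average the fewest observations.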
Abstract: Let $S_0$ be any sequential probability ratio test for deciding between two simple alternatives $H_0$ and $H_1$, and $S_1$ another test for the same purpose. We define $(i, j = 0, 1):$ $\alpha_i(S_j) =$ probability, under $S_j$, of rejecting $H_i$ when it is true; $E_i^j (n) =$ expected number of observations to reach a decision under test $S_j$ when the hypothesis $H_i$ is true. (It is assumed that $E^1_i (n)$ exists.) In this paper it is proved that, if $\alpha_i(S_1) \leq \alpha_i(S_0)\quad(i = 0,1)$, it follows that $E_i^0 (n) \leq E_i^1 (n)\quad(i = 0, 1)$. This means that of all tests with the same power the sequential probability ratio test requires on the average fewest observations. This result had been conjectured earlier ([1], [2]).

1,184 citations
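To make the test being compared concrete, here is a minimal sketch of a sequential probability ratio test for Bernoulli observations. It assumes Wald's usual approximate stopping boundaries $A = (1-\beta)/\alpha$ and $B = \beta/(1-\alpha)$, where $\alpha$ is the target probability of rejecting $H_0$ when true and $\beta$ that of rejecting $H_1$ when true; these boundaries and the name `sprt_bernoulli` are assumptions of the sketch, not part of the abstract.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha, beta):
    """Sequential test of H0: p = p0 against H1: p = p1 on 0/1 observations.

    Returns (decision, number of observations used).
    """
    log_a = math.log((1 - beta) / alpha)   # upper boundary: accept H1
    log_b = math.log(beta / (1 - alpha))   # lower boundary: accept H0
    llr = 0.0                              # running log likelihood ratio
    for n, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= log_a:
            return "accept H1", n
        if llr <= log_b:
            return "accept H0", n
    return "continue", len(observations)


print(sprt_bernoulli([1, 1, 1, 1], p0=0.2, p1=0.8, alpha=0.05, beta=0.05))
```

With these settings each success adds $\log 4$ to the statistic, so three successes cross the upper boundary $\log 19$; the theorem says no test with error probabilities at least as small can need fewer observations on average.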


Journal Article · DOI
TL;DR: In this paper, the author gives the complete solution of the equations governing the generalised birth-and-death process in which the birth and death rates may be any specified functions of the time.
Abstract: The importance of stochastic processes in relation to problems of population growth was pointed out by W. Feller [1] in 1939. He considered among other examples the "birth-and-death" process in which the expected birth and death rates (per head of population per unit of time) were constants, $\lambda_0$ and $\mu_0$, say. In this paper, I shall give the complete solution of the equations governing the generalised birth-and-death process in which the birth and death rates $\lambda(t)$ and $\mu(t)$ may be any specified functions of the time. The mathematical method employed starts from M. S. Bartlett's idea of replacing the differential-difference equations for the distribution of the population size by a partial differential equation for its generating function. For an account of this technique, reference may be made to Bartlett's North Carolina lectures [2]. The formulae obtained lead to an expression for the probability of the ultimate extinction of the population, and to the necessary and sufficient condition for a birth-and-death process to be of "transient" type. For transient processes the distribution of the cumulative population is also considered, but here in general it is not found possible to do more than evaluate its mean and variance as functions of $t$, although a complete solution (including the determination of the asymptotic form of the distribution as $t$ tends to infinity) is obtained for the simple process in which the birth and death rates are independent of the time. It is shown that a birth-and-death process can be constructed to give an expected population size $\bar n_t$ which is any desired function of the time $t$, and among the many possible solutions the unique one is determined which makes the fluctuation, Var$(n_t)$, a minimum for all $t$. The general theory is illustrated with reference to two examples. The first of these is the $(\lambda_0, \mu_1 t)$ process introduced by N. Arley [3] in his study of the cascade showers associated with cosmic radiation; here the birth rate is constant and the death rate is a constant multiple of the "age," $t$, of the process. The $\bar n_t$-curve is then Gaussian in form, and the process is always of transient type. The second example is provided by the family of "periodic" processes, in which the birth and death rates are periodic functions of the time $t$. These appear well adapted to describe the response of population growth (or epidemic spread) to the influence of the seasons.

754 citations
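The abstract mentions an expression for the extinction probability without stating it. The sketch below assumes the standard form of the solution for a process started from a single individual: with $\rho(t) = \int_0^t [\mu(\tau) - \lambda(\tau)]\, d\tau$, the mean is $\bar n_t = e^{-\rho(t)}$ and $P_0(t) = I/(1+I)$ with $I = \int_0^t \mu(\tau) e^{\rho(\tau)}\, d\tau$. These formulas, the trapezoidal quadrature and the name `birth_death_extinction` are assumptions of the sketch, not quotations from the paper.

```python
import math


def birth_death_extinction(lam, mu, t, steps=100_000):
    """Extinction probability P0(t) and mean size for a process started from one individual.

    Assumed formulas: rho(s) = integral_0^s [mu(u) - lam(u)] du,
    mean n_t = exp(-rho(t)),
    P0(t) = I / (1 + I) with I = integral_0^t mu(s) exp(rho(s)) ds.
    Both integrals are approximated with the trapezoidal rule on a uniform grid.
    """
    h = t / steps
    rho = 0.0
    integral = 0.0
    prev = mu(0.0)  # integrand mu(s) * exp(rho(s)) at s = 0, where rho = 0
    for k in range(1, steps + 1):
        s0, s1 = (k - 1) * h, k * h
        # advance rho(s) across [s0, s1]
        rho += 0.5 * h * ((mu(s0) - lam(s0)) + (mu(s1) - lam(s1)))
        cur = mu(s1) * math.exp(rho)
        integral += 0.5 * h * (prev + cur)
        prev = cur
    return integral / (1.0 + integral), math.exp(-rho)


# Constant rates lam = 1, mu = 0.5: P0(t) approaches mu / lam = 0.5 as t grows.
p0, mean = birth_death_extinction(lambda s: 1.0, lambda s: 0.5, t=40.0)
print(p0, mean)
```

For Arley's $(\lambda_0, \mu_1 t)$ process, passing `lam=lambda s: lam0` and `mu=lambda s: mu1 * s` gives the mean $\exp(\lambda_0 t - \mu_1 t^2/2)$, the Gaussian-shaped curve mentioned in the abstract.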


Journal Article · DOI
TL;DR: In this article, a rank-based test is proposed for the independence of two random variables with a continuous distribution function (d.f.); the test is consistent with respect to the class Ω′ of d.f.'s with continuous joint and marginal probability densities.
Abstract: A test is proposed for the independence of two random variables with continuous distribution function (d.f.). The test is consistent with respect to the class Ω′ of d.f.'s with continuous joint and marginal probability densities (p.d.). The test statistic D depends only on the rank order of the observations. The mean and variance of D are given, and D is shown to have a normal limiting distribution for any parent distribution. In the case of independence this limiting distribution is degenerate, and nD has a non-normal limiting distribution whose characteristic function and cumulants are given. The exact distribution of D in the case of independence for samples of size n = 5, 6, 7 is tabulated. In the Appendix it is shown that there do not exist tests of independence based on ranks which are unbiased at any significance level with respect to the class Ω′. It is also shown that if the parent distribution belongs to Ω′ and for some n ≥ 5 the probabilities of the n! rank permutations are equal, the random variables are independent.

469 citations
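The abstract does not spell out the formula for D. The sketch below uses the rank form commonly quoted for Hoeffding's statistic, which is an assumption here: with $a_i$ and $b_i$ the numbers of observations whose $x$ (respectively $y$) value is smaller than that of observation $i$, and $c_i$ the number smaller in both coordinates, $D = [A - 2(n-2)B + (n-2)(n-3)C] / [n(n-1)(n-2)(n-3)(n-4)]$ with $A = \sum a_i(a_i-1)b_i(b_i-1)$, $B = \sum (a_i-1)(b_i-1)c_i$, $C = \sum c_i(c_i-1)$. Ties are assumed absent, as for continuous d.f.'s; the function name is illustrative.

```python
from fractions import Fraction


def hoeffding_d(x, y):
    """Rank form of Hoeffding's D for paired data without ties (n >= 5)."""
    n = len(x)
    if n < 5 or len(y) != n:
        raise ValueError("need two samples of equal size n >= 5")
    A = B = C = 0
    for i in range(n):
        a = sum(1 for j in range(n) if x[j] < x[i])                   # rank of x_i minus 1
        b = sum(1 for j in range(n) if y[j] < y[i])                   # rank of y_i minus 1
        c = sum(1 for j in range(n) if x[j] < x[i] and y[j] < y[i])  # below in both coordinates
        A += a * (a - 1) * b * (b - 1)
        B += (a - 1) * (b - 1) * c
        C += c * (c - 1)
    num = A - 2 * (n - 2) * B + (n - 2) * (n - 3) * C
    return Fraction(num, n * (n - 1) * (n - 2) * (n - 3) * (n - 4))


print(hoeffding_d([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1/30 for perfectly monotone data
```

This direct version costs $O(n^2)$ comparisons, which is enough for the small sample sizes (n = 5, 6, 7) tabulated in the paper.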


Book Chapter · DOI
TL;DR: In this article, unified and simplified derivations are given for the limiting forms of the difference between the empirical distribution of a large sample and the corresponding theoretical distribution, and of the difference between the empirical distributions of two large samples.
Abstract: Unified and simplified derivations are given for the limiting forms of the difference (1) between the empirical distribution of a large sample and the corresponding theoretical distribution and (2) between the distributions of two large samples.

269 citations
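For the one-sample case, the quantity whose limiting form is studied is $D_n = \sup_x |F_n(x) - F(x)|$. The sketch below computes $D_n$ for a continuous hypothesised $F$ and evaluates the classical Kolmogorov limit $P(\sqrt{n} D_n \le z) \to 1 - 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2 z^2}$. The series is stated as known background, not reproduced from the paper's derivation, and the function names are illustrative.

```python
import math


def ks_statistic(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)| for a continuous hypothesised d.f. F."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # the supremum is attained just before or at a jump of F_n
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def kolmogorov_limit_cdf(z, terms=100):
    """Limiting P(sqrt(n) * D_n <= z) = 1 - 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 z^2).

    The truncated series is adequate unless z is very close to 0.
    """
    if z <= 0:
        return 0.0
    s = sum((-1) ** (k - 1) * math.exp(-2 * k * k * z * z) for k in range(1, terms + 1))
    return 1.0 - 2.0 * s


uniform_cdf = lambda x: min(max(x, 0.0), 1.0)
print(ks_statistic([0.1, 0.2, 0.3, 0.4], uniform_cdf))
print(kolmogorov_limit_cdf(1.36))  # roughly 0.95
```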


Book Chapter · DOI
TL;DR: In this article, a test for deciding whether one of k populations has slipped to the right of the rest, under the null hypothesis that all populations are continuous and identical, is proposed, where the procedure is to pick the sample with the largest observation, and to count the number of observations r in it which exceed all observations of all other samples.
Abstract: A test is proposed for deciding whether one of k populations has slipped to the right of the rest, under the null hypothesis that all populations are continuous and identical. The procedure is to pick the sample with the largest observation, and to count the number of observations r in it which exceed all observations of all other samples. If all samples are of the same size n, n large, the probability of getting r or more such observations, when the null hypothesis is true, is about $k^{1-r}$.

199 citations
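The procedure just described is short enough to write out directly. This sketch (the name `slippage_test` is illustrative) returns the index of the sample holding the largest observation, the count r, and the large-sample approximation $k^{1-r}$ quoted in the abstract, which assumes equal, large sample sizes and no ties.

```python
def slippage_test(samples):
    """Return (index of slipped sample, r, approximate null probability k**(1 - r))."""
    k = len(samples)
    top = max(range(k), key=lambda i: max(samples[i]))  # sample holding the largest observation
    others_max = max(max(s) for i, s in enumerate(samples) if i != top)
    r = sum(1 for v in samples[top] if v > others_max)  # its observations above all others
    return top, r, k ** (1 - r)


print(slippage_test([[5.0, 6.0, 7.0], [1.0, 2.0, 3.0], [0.0, 4.0, 4.5]]))
```

Here all three observations of the first sample exceed every other observation, so r = 3 and the approximate null probability is $3^{-2}$; with such small samples the approximation is only indicative.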



Journal Article · DOI
TL;DR: In this paper, test criteria derived by the Neyman-Pearson likelihood-ratio method are developed for testing hypotheses of compound symmetry in a normal multivariate population of $t$ variates on the basis of samples.
Abstract: In this paper test criteria are developed for testing hypotheses of "compound symmetry" in a normal multivariate population of $t$ variates $(t \geq 3)$ on the basis of samples. A feature common to the twelve hypotheses considered is that the set of $t$ variates is partitioned into mutually exclusive subsets of variates. In regard to the partitioning, the twelve hypotheses can be divided into two contrasting but very similar types, and the six in one type can be paired off in a natural way with the six in the other type. Three of the hypotheses within a given type are associated with the case of a single sample and moreover are simple modifications of one another; the remaining three are direct extensions of the first three, respectively, to the case of $k$ samples $(k \geq 2)$. The gist of any of the hypotheses is indicated in the following statement of one, denoted by $H_1(mvc)$: within each subset of variates the means are equal, the variances are equal and the covariances are equal and between any two distinct subsets the covariances are equal. The twelve sample criteria for testing the hypotheses are developed by the Neyman-Pearson likelihood-ratio method. The following results are obtained for each criterion (assuming that the respective null hypotheses are true) for any admissible partition of the $t$ variates into subsets and for any sample size, $N$, for which the criterion's distribution exists: (i) the exact moments; (ii) an identification of the exact distribution as the distribution of a product of independent beta variates; (iii) the approximate distribution for large $N$. Exact distributions of the single-sample criteria are given explicitly for special values of $t$ and special partitionings. Certain psychometric and medical research problems in which hypotheses of compound symmetry are relevant are discussed in section 1. 
Sections 2-6 give statements of the hypotheses and an illustration, for $H_1(mvc)$, of the technique of obtaining the moments and identifying the distributions. Results for the other criteria are given in sections 7-8. Illustrative examples showing applications of the results are given in section 9.

144 citations
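The twelve criteria are not written out in the abstract. As a hedged illustration of the simplest ingredient only, the sketch below computes a likelihood-ratio-type ratio for equal variances and equal covariances within a single subset (one sample, means unrestricted): $|S| / [v^t (1-r)^{t-1} (1 + (t-1) r)]$, where $S$ is the sample covariance matrix, $v$ its average variance and $vr$ its average covariance. This form is an assumption modelled on the usual single-subset criterion, not one of the paper's twelve criteria, and the function name is hypothetical.

```python
import numpy as np


def compound_symmetry_ratio(S):
    """|S| divided by the determinant of the matrix with S's average variance on the
    diagonal and S's average covariance off the diagonal (single subset, one sample)."""
    S = np.asarray(S, dtype=float)
    t = S.shape[0]
    v = np.trace(S) / t                               # average variance
    c = S[~np.eye(t, dtype=bool)].mean()              # average covariance (= v * r)
    det_cs = (v - c) ** (t - 1) * (v + (t - 1) * c)   # = v^t (1 - r)^(t-1) (1 + (t-1) r)
    return np.linalg.det(S) / det_cs


S = np.array([[2.0, 0.5, 0.5], [0.5, 2.0, 0.5], [0.5, 0.5, 2.0]])
print(compound_symmetry_ratio(S))                          # 1 up to rounding: already compound-symmetric
print(compound_symmetry_ratio(np.diag([1.0, 2.0, 3.0])))  # 6 / 8 = 0.75
```

Because the averaged matrix has the same trace and the same sum of all entries as $S$, the ratio lies in $(0, 1]$, with values near 1 supporting the hypothesis.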


Journal Article · DOI
TL;DR: In this article, it is pointed out that the Mellin transform is a natural analytical tool to use in studying the distribution of products and quotients of independent random variables, and an extension of the transform technique to random variables which are not everywhere positive is given.
Abstract: It is well known that the Fourier transform is a powerful analytical tool in studying the distribution of sums of independent random variables. In this paper it is pointed out that the Mellin transform is a natural analytical tool to use in studying the distribution of products and quotients of independent random variables. Formulae are given for determining the probability density functions of the product $\xi\eta$ and the quotient $\frac{\xi}{\eta}$, where $\xi$ and $\eta$ are independent positive random variables with p.d.f.'s $f(x)$ and $g(y)$, in terms of the Mellin transforms $F(s) = \int_0^\infty f(x) x^{s-1} dx$ and $G(s) = \int_0^\infty g(y)y^{s-1} dy$. An extension of the transform technique to random variables which are not everywhere positive is given. A number of examples including Student's $t$-distribution and Snedecor's $F$-distribution are worked out by the technique of this paper.

143 citations
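Since $F(s) = E[\xi^{s-1}]$, independence gives the Mellin transform of the product's density as $F(s)G(s)$, and, because $E[\eta^{-(s-1)}] = G(2-s)$, that of the quotient's density as $F(s)G(2-s)$. These two rules are derived here from the definitions, not quoted from the paper. The sketch checks the quotient rule numerically for two independent Exp(1) variables, whose Mellin transforms are $\Gamma(s)$ and whose quotient has density $1/(1+z)^2$; the helper name `mellin` is illustrative.

```python
import math
from scipy.integrate import quad


def mellin(density, s):
    """M(s) = integral_0^inf density(x) * x**(s - 1) dx, by numerical quadrature."""
    value, _abserr = quad(lambda x: density(x) * x ** (s - 1), 0.0, math.inf)
    return value


# xi, eta independent Exp(1): each has Mellin transform Gamma(s),
# and xi / eta has density 1 / (1 + z)^2 on z > 0.
quotient_density = lambda z: 1.0 / (1.0 + z) ** 2

s = 1.5
lhs = mellin(quotient_density, s)          # transform of the quotient's density
rhs = math.gamma(s) * math.gamma(2.0 - s)  # F(s) * G(2 - s)
print(lhs, rhs)                            # both approximately pi / 2
```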


Journal Article · DOI
TL;DR: The use of the repeated Cauchy principal value affords greater facility in applying inversion formulae involving characteristic functions, and is especially useful in obtaining an inversion formula for the distribution of the ratio of linear combinations of random variables which may be correlated.
Abstract: The use of the repeated Cauchy principal value affords greater facility in the application of inversion formulae involving characteristic functions. Formula (2) below is especially useful in obtaining the inversion formula (1) for the distribution of the ratio of linear combinations of random variables which may be correlated. Formulae (1), (10), (12) generalize the special cases considered by Cramér [2], Curtiss [4], Geary [6], and are free of some restrictions they impose. The results are further generalized in section 6, where inversion formulae are given for the joint distribution of several ratios. In section 7, the joint distribution of several ratios of quadratic forms in random variables $X_1, X_2,\cdots,X_n$ having a multivariate normal distribution is considered.
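Formula (1) itself is not reproduced in the abstract. As a hedged stand-in, the sketch below uses the Gil-Pelaez inversion formula $P(Z \le x) = \tfrac12 - \tfrac1\pi \int_0^\infty \operatorname{Im}[e^{-itx}\varphi(t)]/t\, dt$, which is a different (though related) formula, not the paper's repeated principal-value result. It obtains the distribution of a ratio $X/Y$ with $Y > 0$ through $P(X/Y \le r) = P(X - rY \le 0)$, for the illustrative choice $X \sim N(0,1)$, $Y \sim$ Exp(1) independent, and checks it against direct conditioning on $Y$. All names are hypothetical.

```python
import math
from scipy.integrate import quad


def cdf_from_cf(phi, x):
    """Gil-Pelaez inversion: P(Z <= x) = 1/2 - (1/pi) * integral_0^inf Im[exp(-itx) phi(t)] / t dt."""
    def integrand(t):
        if t == 0.0:
            return 0.0  # finite limit at t = 0; a single point does not change the integral
        return (complex(math.cos(t * x), -math.sin(t * x)) * phi(t)).imag / t
    value, _abserr = quad(integrand, 0.0, math.inf, limit=200)
    return 0.5 - value / math.pi


def ratio_cdf(r):
    """P(X / Y <= r) for independent X ~ N(0, 1), Y ~ Exp(1); since Y > 0 this is P(X - rY <= 0)."""
    phi = lambda t: math.exp(-t * t / 2.0) / complex(1.0, r * t)  # c.f. of X - rY
    return cdf_from_cf(phi, 0.0)


def ratio_cdf_direct(r):
    # Check by conditioning on Y: average P(X <= r y) over the Exp(1) density of Y.
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    value, _abserr = quad(lambda y: Phi(r * y) * math.exp(-y), 0.0, math.inf)
    return value


print(ratio_cdf(1.0), ratio_cdf_direct(1.0))
```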

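Journal Article · DOI
TL;DR: In this article, the continuous analog of the linearity of Lebesgue-Stieltjes integration with respect to the measure, which underlies mixtures of measures or distributions, is treated: a general measure-theoretic form of the fundamental theorem is given in Section 2, and in Section 3 the theorem is formulated in terms of finite-dimensional spaces and distribution functions.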
Abstract: Mixtures of measures or distributions occur frequently in the theory and applications of probability and statistics. In the simplest case it may, for example, be reasonable to assume that one is dealing with the mixture in given proportions of a finite number of normal populations with different means or variances. The mixture parameter may also be denumerably infinite, as in the theory of sums of a random number of random variables, or continuous, as in the compound Poisson distribution. The operation of Lebesgue-Stieltjes integration, $\int f(x) d\mu ,$ is linear with respect to both integrand $f(x)$ and measure $\mu$. The first type of linearity has as its continuous analog the theorem of Fubini on interchange of order of integration; the second type of linearity has a corresponding continuous analog which is of importance whenever one deals with mixtures of measures or distributions, and which forms the subject of the present paper. Other treatments of the same subject have been given ([1], [2]; see also [3], [4]) but it is hoped that the discussion given here will be useful to the mathematical statistician. A general measure theoretic form of the fundamental theorem is given in Section 2, and in Section 3 the theorem is formulated in terms of finite dimensional spaces and distribution functions. The operation of convolution as an example of mixture is treated briefly in Section 4, while Section 5 is devoted to random sampling from a mixed population. We shall refer to Theory of the Integral by S. Saks (second edition, Warszawa, 1937) as [S], and the Mathematical Methods of Statistics by H. Cramér (Princeton, 1946) as [C].
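For the simplest case named in the abstract, a finite mixture of normal populations, linearity in the measure has a direct computational reading: the mixed d.f. is the weighted sum of the component d.f.'s, and random sampling from the mixed population can be done in two stages (pick a component with its mixing probability, then draw from it). The sketch assumes normal components; the function names are illustrative.

```python
import math
import random


def normal_cdf(x, mean=0.0, sd=1.0):
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mixture_cdf(x, weights, means, sds):
    # Linearity in the measure: F = sum_k w_k F_k.
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, sds))


def sample_mixture(n, weights, means, sds, rng):
    # Two-stage sampling: choose a component with probability w_k, then draw from it.
    out = []
    for _ in range(n):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        out.append(rng.gauss(means[k], sds[k]))
    return out


weights, means, sds = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
xs = sample_mixture(5000, weights, means, sds, random.Random(0))
print(mixture_cdf(0.0, weights, means, sds), sum(x <= 0.0 for x in xs) / len(xs))
```

The empirical proportion below 0 should be close to the mixture d.f. at 0, which is 1/2 for this symmetric mixture.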

Journal Article · DOI
TL;DR: In this paper, a systematic method for calculating matrix derivatives is devised; it is designed for application to statistics and gives a concise and suggestive way of treating such topics as multiple regression and canonical correlation.
Abstract: Let $X$ be the matrix $\lbrack x_{mn}\rbrack, t$ a scalar, and let $\partial X/\partial t, \partial t/\partial X$ denote the matrices $\lbrack\partial x_{mn}/\partial t\rbrack, \lbrack\partial t/\partial x_{mn}\rbrack$ respectively. Let $Y = \lbrack y_{pq}\rbrack$ be any matrix product involving $X, X'$ and independent matrices, for example $Y = AXBX'C$. Consider the matrix derivatives $\partial Y/\partial x_{mn}, \partial y_{pq}/\partial X$. Our purpose is to devise a systematic method for calculating these derivatives. Thus if $Y = AX$, we find that $\partial Y/\partial x_{mn} = AJ_{mn}, \partial y_{pq}/\partial X = A'K_{pq}$, where $J_{mn}$ is a matrix of the same dimensions as $X$, with all elements zero except for a unit in the $m$-th row and $n$-th column, and $K_{pq}$ is similarly defined with respect to $Y$. We consider also the derivatives of sums, differences, powers, the inverse matrix and the function of a function, thus setting up a matrix analogue of elementary differential calculus. This is designed for application to statistics, and gives a concise and suggestive method for treating such topics as multiple regression and canonical correlation.
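The two rules quoted for $Y = AX$ can be checked numerically. The sketch builds $J_{mn}$ and $K_{pq}$ as unit matrices and compares $AJ_{mn}$ and $A'K_{pq}$ with forward finite differences; because $Y$ is linear in $X$, the differences agree up to rounding. Matrix sizes, the random seed and the helper name `unit_matrix` are illustrative choices.

```python
import numpy as np


def unit_matrix(shape, i, j):
    """J_ij (or K_ij): all zeros except a 1 in row i, column j."""
    E = np.zeros(shape)
    E[i, j] = 1.0
    return E


rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))
X = rng.standard_normal((2, 4))
Y = A @ X  # Y = AX is 3 x 4
eps = 1e-6

# dY/dx_mn = A J_mn
m, n = 1, 2
rule = A @ unit_matrix(X.shape, m, n)
numeric = (A @ (X + eps * unit_matrix(X.shape, m, n)) - Y) / eps

# dy_pq/dX = A' K_pq, whose entry (i, j) is the derivative of y_pq with respect to x_ij
p, q = 0, 3
rule_pq = A.T @ unit_matrix(Y.shape, p, q)
numeric_pq = np.array([[(A @ (X + eps * unit_matrix(X.shape, i, j)))[p, q] - Y[p, q]
                        for j in range(X.shape[1])] for i in range(X.shape[0])]) / eps

print(np.allclose(rule, numeric, atol=1e-6), np.allclose(rule_pq, numeric_pq, atol=1e-6))
```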

Journal Article · DOI
TL;DR: In this paper, it was shown that under very broad conditions the usual theorems concerning the limiting distributions of estimates hold for estimates based on samples selected from finite universes, at random without replacement.
Abstract: The paper shows that under very broad conditions the usual theorems concerning the limiting distributions of estimates hold for estimates based on samples selected from finite universes, at random without replacement. It may be remarked that under the same conditions, the same conclusions are true for random sampling from finite universes with replacement, if the universes are permitted to change within the limitations set by condition $W$.
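Limiting normal approximations for estimates from finite universes are usually centred on the finite-population variance. As a small exact check of that quantity (not of the paper's limit theorems), the sketch enumerates all samples drawn at random without replacement from a toy population and compares the variance of the sample mean with the standard survey-sampling formula $(1 - n/N) S^2 / n$, $S^2 = \sum (y - \bar y)^2/(N-1)$. The formula is background knowledge, not stated in the abstract; the names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def enumerate_sample_means(population, n):
    """Exact mean and variance of the sample mean over all C(N, n) samples without replacement."""
    means = [Fraction(sum(s), n) for s in combinations(population, n)]
    mu = sum(means) / len(means)
    var = sum((m - mu) ** 2 for m in means) / len(means)
    return mu, var


def fpc_variance(population, n):
    """(1 - n/N) * S^2 / n with S^2 = sum (y - ybar)^2 / (N - 1)."""
    N = len(population)
    ybar = Fraction(sum(population), N)
    s2 = sum((y - ybar) ** 2 for y in population) / (N - 1)
    return (1 - Fraction(n, N)) * s2 / n


pop = [1, 2, 3, 4, 5]
print(enumerate_sample_means(pop, 2), fpc_variance(pop, 2))
```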

Journal Article · DOI
TL;DR: In this paper, graphs are presented of the minimum probable population coverage by sample blocks determined by the order statistics of a sample from a population with a continuous but unknown cumulative distribution function (c.d.f.).
Abstract: In this note are presented graphs of minimum probable population coverage by sample blocks determined by the order statistics of a sample from a population with a continuous but unknown cumulative distribution function (c.d.f.). The graphs are constructed for the three tolerance levels .90, .95, and .99. The number, $m$, of blocks excluded from the tolerance region runs as follows: $m$ = 1(1)6(2)10(5)30(10)60(20)100, and the sample size, $n$, runs from $m$ to 500. Thus the curves show the solution, $\beta$, of the equation $1 - \alpha = I_\beta(n - m + 1, m)$ for $\alpha = .90, .95, .99$ over the range of $n$ and $m$ given above, where $I_x(p, q)$ is Pearson's notation for the incomplete beta function. Examples are cited below for the one- and two-variate cases. Finally, the exact and approximate formulae used in computations for these graphs are given.
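The equation $1 - \alpha = I_\beta(n - m + 1, m)$ can be solved directly with an inverse regularized incomplete beta function. The sketch uses SciPy's `betaincinv`; the wrapper name is illustrative. It rests on the standard fact (assumed here) that the coverage of the $n + 1 - m$ retained blocks has a Beta$(n - m + 1, m)$ distribution, so that $P(\text{coverage} \ge \beta) = 1 - I_\beta(n-m+1, m) = \alpha$.

```python
from scipy.special import betaincinv


def min_coverage(n, m, alpha):
    """Coverage beta solving 1 - alpha = I_beta(n - m + 1, m).

    With probability alpha, the n + 1 - m retained blocks cover at least a
    fraction beta of a population with continuous c.d.f.
    """
    return betaincinv(n - m + 1, m, 1 - alpha)


for alpha in (0.90, 0.95, 0.99):
    print(alpha, round(float(min_coverage(100, 1, alpha)), 4))
```

For $m = 1$ the equation reduces to $\beta^n = 1 - \alpha$, so `min_coverage(100, 1, 0.95)` equals $0.05^{1/100} \approx 0.970$, a convenient check on points read from the graphs.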

Journal Article · DOI
TL;DR: In this paper, the authors discuss the extension of the discriminant function to the case where certain variates (called the covariance variates) are known to have the same means in all populations.
Abstract: This paper discusses the extension of the discriminant function to the case where certain variates (called the covariance variates) are known to have the same means in all populations. Although such variates have no discriminating power by themselves, they may still be utilized in the discriminant function. The first step is to adjust the discriminators by means of their `within-sample' regressions on the covariance variates. The discriminant function is then calculated in the usual way from these adjusted variates. The standard tests of significance for the discriminant function (e.g. Hotelling's $T^2$ test) can be extended to this case without difficulty. A measure is suggested of the gain in information due to covariance and the computations are illustrated by a numerical example. The discussion is confined to the case where only a single function of the population means is being investigated.
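The two steps described above (adjust the discriminators by their within-sample regressions on the covariance variates, then compute the discriminant as usual) can be written compactly for two populations. The sketch assumes the usual discriminant coefficients $a$ solve $W a = d$, with $W$ the within-sample sums of squares and products and $d$ the difference of sample means; names and numbers are illustrative.

```python
import numpy as np


def covariance_adjusted_discriminant(Wxx, Wxz, Wzz, dx, dz):
    """Discriminant coefficients after adjusting for covariance variates (two populations).

    Wxx, Wxz, Wzz: within-sample sums of squares and products (discriminators x, covariance variates z).
    dx, dz: differences between the two sample means of x and of z.
    """
    B = Wxz @ np.linalg.inv(Wzz)  # within-sample regression coefficients of x on z
    W_adj = Wxx - B @ Wxz.T       # within-sample SS and SP of the adjusted discriminators
    d_adj = dx - B @ dz           # mean differences of the adjusted discriminators
    return np.linalg.solve(W_adj, d_adj)


W = np.array([[4.0, 1.0, 0.8], [1.0, 3.0, 0.5], [0.8, 0.5, 2.0]])  # order: x1, x2, z
d = np.array([1.0, 0.5, 0.3])
a = covariance_adjusted_discriminant(W[:2, :2], W[:2, 2:], W[2:, 2:], d[:2], d[2:])
print(a, np.linalg.solve(W, d)[:2])
```

By the block-inverse identity, the adjusted coefficients coincide with the coefficients of the discriminators when the discriminant is computed from all variates jointly; the covariance variates enter only through the adjustment.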


Journal Article · DOI
TL;DR: In this article, the results of Paper II of the series on tolerance regions, obtained for a continuous joint cumulative distribution function, are extended to the discontinuous case by replacing equalities with inequalities, as Paper I had done for the one-dimensional theory.
Abstract: In Paper II of this series [2, 1947] it was shown that if $n$ functions and a sample of $n$ were used to divide the population space into $n + 1$ blocks in a particular way, and if the joint cumulative of the functions were continuous, then the $n + 1$ fractions of the population, corresponding to the $n + 1$ blocks, were distributed symmetrically and simply. In Paper I of this series [1, 1945] it was shown that the one-dimensional theory of tolerance regions could be extended to the discontinuous case, if equalities were replaced by inequalities. In this paper the results of Paper II will be extended to the discontinuous case with the same weakening of the conclusion. The devices involved are more complex, but the nature of the results is the same (See Section 5). As a tool, it is shown that any $n$-variate distribution can be represented in terms of an $n$-variate distribution with a continuous joint cumulative (in fact, with uniform univariate marginals), where each variate of the given distribution is a different monotone function of the corresponding variate from the continuous distribution.
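The representation used as a tool in the last sentence has a familiar one-dimensional special case: if $U$ is uniform on $(0, 1)$ and $Q(u) = \min\{x : F(x) \ge u\}$ is the generalized inverse of a d.f. $F$, then $Q$ is monotone and $Q(U)$ has d.f. $F$, even when $F$ is discontinuous. The sketch shows this for a discrete distribution; it does not reproduce the paper's multivariate construction, and the names are hypothetical.

```python
import bisect


def generalized_inverse(support, cdf_values):
    """Q(u) = min{x : F(x) >= u} for a distribution on the sorted points `support`,
    where cdf_values[i] = F(support[i]) and the last value is 1."""
    def Q(u):
        # first index whose cumulative probability reaches u
        return support[bisect.bisect_left(cdf_values, u)]
    return Q


Q = generalized_inverse([0, 1], [0.7, 1.0])  # Bernoulli(0.3): F(0) = 0.7, F(1) = 1
print([Q(u) for u in (0.2, 0.7, 0.8, 1.0)])  # [0, 0, 1, 1]
```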

Journal Article · DOI
TL;DR: In this paper, the close relation between the functional and the stochastic approaches to time series analysis is shown to amount to a genuine isomorphism; the functional equivalent of the author's decomposition theorem for stationary stochastic processes with a discrete time parameter is given, and the decomposition is applied to the problem of linear prediction.
Abstract: In time series analysis there are two lines of approach, here called the functional and the stochastic. In the former case, the given time series is interpreted as a mathematical function, in the latter case as a random specimen out of a universe of mathematical functions. The close relation between the two approaches is in section 2 shown to amount to a genuine isomorphism. Considering the problem of prediction from this viewpoint, the author gives in sections 3-4 the functional equivalent of his earlier theorem on the decomposition of a stationary stochastic process with a discrete time parameter (see [9], theorem 7). In section 5 the decomposition theorem is applied to the problem of linear prediction. Finally in section 6 a few comments are made. Since various aspects of the isomorphism in question are known, this paper might be regarded as essentially expository.
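Linear prediction of a stationary series from its autocovariances can be sketched with the normal (Yule-Walker) equations for a finite past. This is the textbook finite-past predictor, not Wold's functional construction; the AR(1) example, with unit innovation variance, and the function name are assumptions for illustration. For that process the best predictor uses only the last value, and its mean squared error equals the innovation variance.

```python
import numpy as np


def linear_predictor(gamma, p):
    """Best linear one-step predictor from the last p values of a stationary series.

    gamma(k) is the autocovariance at lag k. Solves Gamma a = (gamma(1), ..., gamma(p)),
    where a[k-1] multiplies the value k steps back, and returns (a, mean squared error).
    """
    G = np.array([[gamma(abs(i - j)) for j in range(p)] for i in range(p)])
    r = np.array([gamma(k) for k in range(1, p + 1)])
    a = np.linalg.solve(G, r)
    return a, gamma(0) - a @ r


rho = 0.6
ar1_gamma = lambda k: rho ** k / (1 - rho ** 2)  # AR(1) with unit innovation variance
coeffs, mse = linear_predictor(ar1_gamma, 3)
print(coeffs, mse)
```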

Book Chapter · DOI
TL;DR: In this article, critical regions for testing a composite hypothesis are determined which are most powerful against a particular alternative at a given level of significance, where a region has level of significance ε if its probability under the hypothesis tested is bounded above by ε.
Abstract: For testing a composite hypothesis, critical regions are determined which are most powerful against a particular alternative at a given level of significance. Here a region is said to have level of significance $\epsilon$ if the probability of the region under the hypothesis tested is bounded above by $\epsilon$. These problems have been considered by Neyman, Pearson and others, subject to the condition that the critical region be similar. In testing the hypothesis specifying the value of the variance of a normal distribution with unknown mean against an alternative with larger variance, and in some other problems, the best similar region is also most powerful in the sense of this paper. However, in the analogous problem when the variance under the alternative hypothesis is less than that under the hypothesis tested, in the case of Student's hypothesis when the level of significance is less than and in some other cases, the best similar region is not most powerful in the sense of this paper. There exist most powerful tests which are quite good against certain alternatives in some cases where no proper similar region exists. These results indicate that in some practical cases the standard test is not best if the class of alternatives is sufficiently restricted.

Journal Article · DOI
TL;DR: In this paper, it is shown that under certain restrictions on the joint probability distribution of dependent observations, the maximum likelihood equation has at least one root which is a consistent estimate of the parameter to be estimated, and that any such consistent root is asymptotically efficient.
Abstract: Asymptotic properties of maximum likelihood estimates have been studied so far mainly in the case of independent observations. In this paper the case of stochastically dependent observations is considered. It is shown that under certain restrictions on the joint probability distribution of the observations the maximum likelihood equation has at least one root which is a consistent estimate of the parameter $\theta$ to be estimated. Furthermore, any root of the maximum likelihood equation which is a consistent estimate of $\theta$ is shown to be asymptotically efficient. Since the maximum likelihood estimate is always a root of the maximum likelihood equation, consistency of the maximum likelihood estimate implies its asymptotic efficiency.
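A simple case of stochastically dependent observations is a first-order autoregression. The sketch assumes $x_t = \theta x_{t-1} + e_t$ with standard normal $e_t$ and $x_0 = 0$; the likelihood equation conditional on $x_0$ then has the single root $\hat\theta = \sum x_{t-1} x_t / \sum x_{t-1}^2$, which should settle near the true $\theta$ as the series grows. The model and names are illustrative assumptions, not the paper's conditions.

```python
import random


def simulate_ar1(theta, n, rng):
    """x_t = theta * x_{t-1} + e_t with e_t ~ N(0, 1) and x_0 = 0."""
    x = [0.0]
    for _ in range(n):
        x.append(theta * x[-1] + rng.gauss(0.0, 1.0))
    return x


def ar1_mle(x):
    """Unique root of the conditional likelihood equation sum x_{t-1} (x_t - theta x_{t-1}) = 0."""
    num = sum(prev * cur for prev, cur in zip(x[:-1], x[1:]))
    den = sum(prev * prev for prev in x[:-1])
    return num / den


rng = random.Random(1)
for n in (100, 1_000, 20_000):
    print(n, round(ar1_mle(simulate_ar1(0.6, n, rng)), 3))
```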

Journal Article · DOI
TL;DR: In this article, several properties of the compound and generalized Poisson distributions, in particular their closure and divisibility properties, are deduced, and an infinite class of functions whose members are both compound and generalized Poisson distributions is exhibited.
Abstract: In this note we deduce several properties of the compound and generalized Poisson distributions; in particular their closure and divisibility properties. An infinite class of functions whose members are both compound and generalized Poisson distributions is exhibited, and several of the distributions of Neyman, Polya, etc. are identified. The present note stems from a paper by Feller [2].
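Terminology for these families varies between authors, so the sketch below only claims to handle what is now commonly called a Poisson-stopped sum: $S = Y_1 + \dots + Y_N$ with $N \sim$ Poisson$(\lambda)$ and i.i.d. $Y_i$ on $\{0, 1, 2, \dots\}$; whether this matches the note's "compound" or "generalized" label is not assumed. Its probabilities follow from the recursion $g_0 = e^{-\lambda(1 - f_0)}$, $g_k = (\lambda/k)\sum_{j=1}^k j f_j g_{k-j}$. Neyman's Type A distribution (Poisson-distributed clusters of Poisson-distributed size) is used as the example; names are illustrative.

```python
import math


def poisson_stopped_sum_pmf(lam, f, kmax):
    """pmf of S = Y_1 + ... + Y_N on {0, ..., kmax}, with N ~ Poisson(lam) and i.i.d. Y_i
    having pmf f on {0, 1, 2, ...} (f[j] = P(Y = j); missing entries are 0).

    Uses the recursion g_k = (lam / k) * sum_{j=1}^{k} j f_j g_{k-j}.
    """
    fj = lambda j: f[j] if j < len(f) else 0.0
    g = [math.exp(-lam * (1.0 - fj(0)))] + [0.0] * kmax
    for k in range(1, kmax + 1):
        g[k] = lam / k * sum(j * fj(j) * g[k - j] for j in range(1, k + 1))
    return g


# Neyman Type A: Poisson(2) clusters, each of Poisson(1.5) size.
phi = 1.5
f = [math.exp(-phi) * phi ** j / math.factorial(j) for j in range(61)]
g = poisson_stopped_sum_pmf(2.0, f, 60)
print(sum(g), g[0])
```

If every $Y_i$ equals 1 (`f = [0.0, 1.0]`), the recursion reduces to the Poisson$(\lambda)$ probabilities, a quick sanity check.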

Journal Article · DOI
TL;DR: In this article, exact means, second moments and product-moments of order statistics from the normal distribution $N(0, 1)$ are given in terms of $\pi$ for small samples, derived by multiple integration and some general properties of the moments.
Abstract: Exact means in samples of size $\geq 3$, and exact second moments and product-moments in samples of size $\leq 4$, are given in Table 1 in terms of $\pi$ for order statistics selected from the normal distribution $N(0, 1)$. The derivation employs multiple integration and some general properties of the moments.
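Exact values of this kind can be cross-checked by numerical integration. The sketch computes the expected largest of $n$ standard normal observations, $E[X_{(n)}] = \int n x \varphi(x) \Phi(x)^{n-1} dx$, and compares it with the standard closed forms $1/\sqrt\pi$ for $n = 2$ and $3/(2\sqrt\pi)$ for $n = 3$; those two values are background knowledge, not copied from Table 1, and the function name is illustrative.

```python
import math
from scipy.integrate import quad


def expected_normal_max(n):
    """E[X_(n)] for n i.i.d. N(0, 1) variables: integral of n x phi(x) Phi(x)^(n-1) dx."""
    pdf = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    cdf = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    value, _abserr = quad(lambda x: n * x * pdf(x) * cdf(x) ** (n - 1), -math.inf, math.inf)
    return value


print(expected_normal_max(2), 1 / math.sqrt(math.pi))
print(expected_normal_max(3), 3 / (2 * math.sqrt(math.pi)))
```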

Journal Article · DOI
TL;DR: In this article, the distribution of the largest, smallest and any intermediate root when the roots are specified by their position in a monotonic arrangement has been derived for $p = 2, 3, 4,$ and 5 by the new method.
Abstract: S. N. Roy [2] obtained in 1943 the distribution of the maximum, minimum and any intermediate one of the roots of certain determinantal equations based on covariance matrices of two samples on the null hypothesis of equal covariance matrices in the two populations. The present paper gives a different method of working out the distribution of any of these roots under the same hypothesis. The distribution of the largest, smallest and any intermediate root when the roots are specified by their position in a monotonic arrangement has been derived for $p = 2, 3, 4,$ and 5 by the new method. The method is applicable for obtaining the distribution of the roots of an equation of any order, when the distributions of the roots of lower order equations have been worked out.
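For a given pair of sample covariance matrices, the roots themselves are generalized eigenvalues. The sketch assumes the determinantal equation has the form $|S_1 - \theta S_2| = 0$ with $S_2$ positive definite (the abstract does not write it out) and returns the roots in ascending order, the arrangement in which one speaks of the smallest, intermediate and largest root. Their null distributions, the subject of the paper, are not reproduced; the function name is hypothetical.

```python
import numpy as np
from scipy.linalg import eigh


def determinantal_roots(S1, S2):
    """Roots theta of |S1 - theta * S2| = 0, for symmetric S1 and positive definite S2, ascending."""
    return eigh(S1, S2, eigvals_only=True)


print(determinantal_roots(np.diag([2.0, 6.0]), np.diag([1.0, 2.0])))  # roots 2 and 3, up to rounding
```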


Journal Article · DOI
TL;DR: In this paper, a regression statistic is derived which is independent of change in scale, so that a priori knowledge of the frequency distribution parameters is not required in order to obtain a unique estimate.
Abstract: This paper deals with the problem of bivariate regression where both variates are random variables having a finite number of means distributed along a straight line. A regression statistic is derived which is independent of change in scale so that a priori knowledge of the frequency distribution parameters is not required in order to obtain a unique estimate. The statistic is shown to be consistent. The efficiency of the estimate is discussed and its asymptotic distribution is derived for the case when the random variables are normally distributed. A numerical example is presented which compares the performance of the statistic of this paper with that of other commonly used statistics. In the example it is found that the method of estimation proposed in this paper is more efficient.

Journal Article · DOI
TL;DR: In this paper, a study of the problem of "lock-in-step" is presented, where each of two events recurs with known period and duration, while the starting time of each event is unknown.
Abstract: This paper contains a study of the following problem: Each of two events recurs with definitely known period and duration, while the starting time of each event is unknown. It is desired that, before the elapse of a certain time, the events occur simultaneously and that this "overlap" be of at least a given minimum duration. The probability of this satisfactory coincidence is first evaluated, and it is found that the solution, while mathematically adequate, is of no value for practical application. This circumstance arises from the possibility that, with certain rational ratios of the periods, the events may "lock in step". Accordingly, an attempt is made to smooth the probability function with respect to small variations in the ratio of the periods. Due to difficulties in manipulating the number-theoretic expressions involved, this smoothing is carried through only by the use of certain approximations. Moreover, because of these same difficulties, an averaged value of the probability itself is not obtained, but, in its stead, there is derived a formula for that fraction of randomly related repeated trials in which the original probability will be less than one-half. Thus, the original problem is not completely solved. The results obtained, however, do allow one to compare the relative advantages of different situations and to make a rough estimate of the likelihood of success. Generally speaking, the analysis is applicable whenever the ratio of "on time" to "off time" is small for each event.
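The abstract does not give the probability formula, so the following Monte Carlo sketch only makes the setting concrete under explicit assumptions: each event starts at an independent uniform phase within its own period and repeats indefinitely, and success means some occurrence of one event overlaps some occurrence of the other by at least `w` time units before time `T` (overlaps are cut off at `T`). These modelling choices and all names are hypothetical. Comparing equal periods with slightly unequal ones is one way to explore the sensitivity to the period ratio that the abstract attributes to "lock in step".

```python
import random


def overlap_found(a, b, p1, d1, p2, d2, w, T):
    """True if an occurrence of event 1 ([a + k*p1, a + k*p1 + d1]) and one of event 2
    overlap for at least w time units within [0, T]."""
    s1 = a
    while s1 < T:
        e1 = min(s1 + d1, T)
        s2 = b
        while s2 < T:
            e2 = min(s2 + d2, T)
            if min(e1, e2) - max(s1, s2) >= w:
                return True
            s2 += p2
        s1 += p1
    return False


def coincidence_probability(p1, d1, p2, d2, w, T, trials=10_000, seed=0):
    """Monte Carlo estimate with independent uniform starting phases."""
    rng = random.Random(seed)
    hits = sum(
        overlap_found(rng.uniform(0, p1), rng.uniform(0, p2), p1, d1, p2, d2, w, T)
        for _ in range(trials)
    )
    return hits / trials


print(coincidence_probability(10.0, 1.0, 10.0, 1.0, 0.5, 100.0, trials=2_000))
print(coincidence_probability(10.0, 1.0, 10.3, 1.0, 0.5, 100.0, trials=2_000))
```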