
Showing papers in "Annals of Mathematical Statistics in 1960"





Journal ArticleDOI
TL;DR: In this article, the authors consider the sequential probability ratio test for the mean of a normal distribution with known variance, whose expected sample size is relatively large for values of the parameter between the two specified ones, that is, in cases in which one does not care greatly which decision is taken; they replace the two constant boundaries by linear functions of the number of observations and truncate sampling, which considerably reduces the average sample size at such parameter values.
Abstract: …expected sample sizes under either or both of the two hypotheses. Usually, however, one is interested in the performance of the procedure for more values of the parameter than these two. A disadvantage of the sequential probability ratio test is that in general the expected sample size is relatively large for values of the parameter between the two specified ones; that is, in cases in which one does not care greatly which decision is taken, a large number of observations is expected. The question is how to reduce the expected sample size for values of the parameter where this tends to be large. In this paper we consider a special case of the problem, when the distribution is normal with known variance and the parameter of interest is the mean. The sequential probability ratio test in this case consists in taking observations sequentially and, after each observation is taken, comparing the sum of the observations (referred to a suitable origin) with two constants. In this study the two constants are replaced by two linear functions of the number of observations taken, and the taking of observations is truncated (Section 2). Approximations to the operating characteristic (or power function) and the average sample number are given (Sections 4 and 5). Computations for two cases of special interest show a considerable decrease in average sample size at parameter values between the two specified ones (Section 3). The problem is studied by replacing the sum of observations by the Wiener stochastic process (of a continuous time parameter); this can be thought of intuitively as interpolating between observations in a manner consistent with the addition of independent random variables. For this procedure we calculate exactly the operating characteristic, the distribution of observation time, the expected observation time, and related probabilities.
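To make the modified procedure concrete, here is a small Monte Carlo sketch (not the paper's exact construction or its Wiener-process analysis): a sequential test for a normal mean with known variance in which the cumulative sum, referred to the midpoint of the two hypothesized means, is compared with straight-line boundaries and sampling is truncated. The function names, the boundary constants, and the decide-by-the-sign-of-the-sum rule at truncation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_sprt(mu, sigma=1.0, mu0=0.0, mu1=1.0,
                   a=-4.0, b=4.0, slope=0.05, n_max=100):
    """One run of a sequential test for a normal mean with known variance.

    The cumulative sum of observations (referred to the midpoint of the two
    hypothesized means) is compared after each observation with boundaries
    that are linear in n: lower = a + slope*n, upper = b - slope*n.
    slope = 0 gives the ordinary (truncated) SPRT region; slope > 0 gives
    converging straight-line boundaries.  Returns (decision, sample_size),
    where decision 1 means "decide in favor of mu1".
    """
    origin = 0.5 * (mu0 + mu1)            # suitable origin for the sum
    s = 0.0
    for n in range(1, n_max + 1):
        s += rng.normal(mu, sigma) - origin
        lower, upper = a + slope * n, b - slope * n
        if s >= upper:
            return 1, n
        if s <= lower:
            return 0, n
    return (1 if s > 0 else 0), n_max     # truncation: decide by the sign of the sum

def oc_and_asn(mu, reps=5000, **kw):
    """Monte Carlo estimates of the OC (prob. of deciding for mu0) and ASN at mu."""
    decisions, sizes = zip(*(truncated_sprt(mu, **kw) for _ in range(reps)))
    return 1 - np.mean(decisions), np.mean(sizes)

for mu in (0.0, 0.5, 1.0):                # mu = 0.5 lies between the two hypotheses
    oc, asn = oc_and_asn(mu)
    print(f"mu={mu:4.1f}  OC={oc:.3f}  ASN={asn:6.2f}")
```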

373 citations


Journal ArticleDOI
TL;DR: In this article, estimators of the mean and standard deviation of a normal distribution from a censored sample are given which use simple coefficients yet are almost as efficient as the best linear systematic statistics of Sarhan and Greenberg [1].
Abstract: 0. Summary. Estimators of mean and standard deviation for censored normal samples which are based on linear systematic statistics and which use simple coefficients are almost as efficient as estimators using the best possible coefficients. Estimators are given for samples of size N < 20 for censoring at one extreme and for several types of censoring at both extremes. 1. Introduction. A censored sample is a sample lacking one or more observations at either or both extremes, with the number and positions of the missing observations known. Censoring may take place naturally, i.e., an observation has a magnitude known only to be more extreme than the other observations in the sample. Censoring may also be imposed by the experimenter who from past experience knows that extreme observations are so unreliable that their magnitudes should not be used as observed. The experimenter may impose censoring to reduce the duration of an experiment and obtain estimates before the extreme cases are determined. Estimation of the mean and standard deviation of a normal distribution from a sample which is censored has been considered by Sarhan and Greenberg [1], who obtained coefficients for best linear systematic statistics. They also record efficiencies of these estimators compared to the case of no censoring. Winsor [4] and perhaps others have suggested using for the magnitude of an extreme, poorly known, or unknown observation the magnitude of the next largest (or smallest) observation. We shall show that when symmetry is maintained (or proper adjustment is made) this practice results in estimators of the mean whose efficiencies are scarcely distinguishable from those of best linear estimators. For non-symmetrical censoring, it is demonstrated that optimum simple estimators of the mean result from these "Winsorized" estimators. Also presented are estimators of the standard deviation using one or two ranges (not necessarily symmetrical) which have efficiency .94 or greater when compared with the best linear systematic statistics. The variances of the proposed estimators were computed from an original 21-decimal tabulation of the means, variances and covariances of the order statistics made available by Dan Teichroew. These tables are described in reference [5]. The efficiencies are the ratios of variances of corresponding estimators given by Sarhan and Greenberg [1]. 2. Symmetrical censoring. Estimation of mean. If natural or imposed censoring of the sample results in the same number of observations censored from each extreme of the sample, the practice of using for each missing observation the magnitude of its nearest neighbor whose magnitude is known has a minimum …
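A minimal sketch of the "Winsorized" idea described above, under assumptions of my own: each censored observation is replaced by the magnitude of its nearest observed neighbor and the ordinary mean is taken; the "proper adjustment" mentioned for asymmetric censoring is not implemented, and the helper name is hypothetical.

```python
import numpy as np

def winsorized_mean(observed, n_censored_low, n_censored_high):
    """Winsorized estimator of the mean for a censored normal sample:
    each censored observation is replaced by the magnitude of its nearest
    observed neighbour and the ordinary mean is taken.
    `observed` holds the uncensored observations (any order)."""
    x = np.sort(np.asarray(observed, dtype=float))
    filled = np.concatenate([
        np.full(n_censored_low, x[0]),    # censored low values -> smallest observed
        x,
        np.full(n_censored_high, x[-1]),  # censored high values -> largest observed
    ])
    return filled.mean()

# Example: N = 10 normal observations, the 2 largest censored.
rng = np.random.default_rng(1)
full_sample = np.sort(rng.normal(loc=5.0, scale=2.0, size=10))
observed = full_sample[:8]                # magnitudes of the top 2 are unknown
print(winsorized_mean(observed, 0, 2))
```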

365 citations




Journal ArticleDOI
TL;DR: In this paper, a non-parametric two-sample test on dispersions is proposed; the basic test requires that the difference in locations (medians) of the two populations be known, so that the two samples can be adjusted to have equal locations, and a modified test is suggested for the case where this difference is not known.
Abstract: This paper deals with non-parametric two-sample tests on dispersions. Two samples, $X$- and $Y$-samples of $m$ and $n$ independent observations from populations with continuous cumulative distribution functions $F(u)$ and $G(u)$ respectively, are considered. It is required for the basic test that the difference in locations (medians) of the two populations be known and, when this is so, the two samples may be adjusted to have equal locations. Taking these location parameters to be zero without loss of generality, we test the hypothesis that $G(u) \equiv F(u)$ against alternatives of the form $G(u) \equiv F(\theta u), \theta \neq 1$. The two samples are ordered in a single joint array and ranks are assigned from each end of the joint array towards the middle. The statistic used is $W$, the sum of ranks for the $X$-sample. The distribution of $W$ is studied and tables of significant values of $W$ are provided for $m + n \leqq 20$ and both upper- and lower-tail significance levels .005, .01, .025 and .05. The first four moments of $W$ are developed and a normal approximation to the null distribution of $W$ is devised. Large-sample properties of the $W$-test are considered. A proof of limiting normality is based on a theorem of Chernoff and Savage. Consistency of the $W$-test is indicated and its relative efficiency in comparison with the variance-ratio $F$-test is obtained as $6/\pi^2$ when $F(u)$ is the normal distribution function. Other non-parametric tests of dispersions are reviewed. The $W$-test is less efficient asymptotically than some of these other tests but is easier to apply, particularly with the tables provided. A modified test is suggested for the case where the difference in population locations is not known. This involves replacing the two original samples by two corresponding samples of deviations from sample medians. The procedure of the $W$-test is applied to the two samples of deviations. The properties of the modified test have not been investigated except for a sampling study of rather limited scope. That study indicates that the moments of $W$ for the modified test are not greatly different from those under the basic procedure.
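The ranking scheme can be illustrated with a short sketch (an illustration under assumptions of my own, not the authors' tables or exact procedure): ranks are assigned from each end of the pooled ordered array toward the middle and summed over the $X$-sample. The function name is hypothetical, and the samples are assumed to have already been adjusted to a common location, as the basic test requires.

```python
import numpy as np

def dispersion_rank_sum_w(x, y):
    """W statistic of the two-sample dispersion test: pool the samples,
    order them, assign rank 1 to both the smallest and the largest value,
    rank 2 to the next pair in from each end, and so on toward the middle;
    W is the sum of the ranks received by the X-sample."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    pooled = np.concatenate([x, y])
    order = np.argsort(pooled)
    N = len(pooled)
    ranks_by_position = np.minimum(np.arange(1, N + 1), np.arange(N, 0, -1))
    ranks = np.empty(N)
    ranks[order] = ranks_by_position      # rank of each pooled observation
    return ranks[:len(x)].sum()           # the first len(x) pooled entries are the X-sample

rng = np.random.default_rng(2)
x = rng.normal(0, 1.0, size=8)            # X-sample
y = rng.normal(0, 2.5, size=9)            # Y-sample with larger dispersion
# Large W: the X-values cluster toward the middle of the array, i.e. are less dispersed.
print(dispersion_rank_sum_w(x, y))
```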

298 citations


Journal ArticleDOI
TL;DR: In this article, a random mapping space $(X, \mathcal{J}, P)$ is studied, where $X$ is a finite set of cardinality $n$, $\mathcal{J}$ is a set of transformations of $X$ into itself, and $P$ is a probability measure over $\mathcal{J}$; the probability distributions of quantities such as the numbers of cyclical elements and of components are derived for four choices of $\mathcal{J}$.
Abstract: A Random Mapping Space $(X, \mathcal{J}, P)$ is a triplet, where $X$ is a finite set of elements $x$ of cardinality $n$, $\mathcal{J}$ is a set of transformations $T$ of $X$ into $X$, and $P$ is a probability measure over $\mathcal{J}$. In this paper, four choices of $\mathcal{J}$ are considered: (I) $\mathcal{J}$ is the set of all transformations of $X$ into $X$. (II) $\mathcal{J}$ is the set of all transformations of $X$ into $X$ such that for each $x \varepsilon X$, $Tx \neq x$. (III) $\mathcal{J}$ is the set of one-to-one mappings of $X$ onto $X$. (IV) $\mathcal{J}$ is the set of one-to-one mappings of $X$ onto $X$ such that for each $x \varepsilon X$, $Tx \neq x$. In each case $P$ is taken as the uniform probability distribution over $\mathcal{J}$. If $x \varepsilon X$ and $T \varepsilon \mathcal{J}$, we define $T^kx$ as the $k$th iteration of $T$ on $x$, where $k$ is a nonnegative integer, i.e. $T^kx = T(T^{k-1}x)$, and $T^0x = x$ for all $x$. If there exists an integer $m > 0$ such that $T^mx = x$, then $x$ is a cyclical element of $T$ and the set of elements $x, Tx, T^2x, \cdots, T^{m-1}x$ is the cycle containing $x$, $C_T(x)$. If $m$ is the smallest positive integer for which $T^m x = x$, then $C_T(x)$ has cardinality $m$. We note further an interesting equivalence relation induced by $T$: if there exists a pair of integers $k_1, k_2$ such that $T^{k_1}x = T^{k_2}y$, then $x \sim y$ under $T$. It is readily seen that this is in fact an equivalence, and hence decomposes $X$ into equivalence classes, which we shall call the components of $X$ in $T$; we designate by $K_T(x)$ the component containing $x$. We define $s_T(x)$ to be the number of elements in $S_T(x)$, $p_T(x)$ to be the number of elements in $P_T(x)$, and $l_T(x)$ to be the number of elements in the cycle contained in $K_T(x)$ (i.e. $l(x) =$ the number of elements in $C_T(x)$ if $x$ is cyclical). We designate by $q_T$ the number of elements of $X$ cyclical in $T$, and by $r_T$ the number of components of $X$ in $T$. Rubin and Sitgreaves [9] in a Stanford Technical Report have obtained the distributions of $s, p, l, q$, and have given a generating function for the distribution of $r$ in case I. Folkert [3], in an unpublished doctoral dissertation, has obtained the distribution of $r$ in cases I and II. The distribution of $r$ in case III is classical and may be found in Feller [2], Gontcharoff [4], and Riordan [8]. In the present paper, a number of these earlier results are rederived and extended. Specifically, for cases I and II, we compute the probability distributions of $s, p, l, q$ and $r$. In cases III and IV the distributions of $l$ and $r$ are given. In addition some asymptotic distributions and low order moments are obtained. For the convenience of the reader, an index of notations having a fixed meaning is provided in the appendix to the paper.
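As an illustration of the quantities $q_T$ and $r_T$, here is a small simulation sketch for case I (a mapping chosen uniformly from all $n^n$ transformations of $X$ into $X$). The algorithm for locating cyclical elements by $n$-fold iteration is an implementation choice of mine, not taken from the paper, and the function names are hypothetical.

```python
import numpy as np

def mapping_statistics(T):
    """For a mapping T: {0,...,n-1} -> {0,...,n-1} given as an integer array,
    return (q, r): the number of cyclical elements and the number of
    components (classes under x ~ y iff some iterates of T coincide)."""
    n = len(T)
    # An element lies on a cycle iff it is in the image of T^n, since every
    # tail reaches the cyclic part within at most n steps.
    reach = np.arange(n)
    for _ in range(n):
        reach = T[reach]
    cyclical = np.unique(reach)
    q = len(cyclical)
    # Each component contains exactly one cycle, so count distinct cycles.
    seen = {}
    r = 0
    for c in cyclical:
        if c in seen:
            continue
        r += 1
        x = c
        while x not in seen:              # walk once around the cycle containing c
            seen[x] = r
            x = T[x]
    return q, r

rng = np.random.default_rng(3)
n, reps = 50, 2000
qs, rs = zip(*(mapping_statistics(rng.integers(0, n, size=n)) for _ in range(reps)))
print("mean q:", np.mean(qs), " mean r:", np.mean(rs))
```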

225 citations


Journal ArticleDOI
TL;DR: In this paper, inequalities involving the minimum component or the product of the components of a random vector are considered, and bounds shown to be sharp are given under various assumptions concerning variances, covariances and independence.
Abstract: If $X$ is a random variable with $EX^2 = \sigma^2$, then by Chebyshev's inequality, \begin{equation*}\tag{1.1}P\{|X| \geqq \epsilon\} \leqq \sigma^2/\epsilon^2.\end{equation*} If in addition $EX = 0$, one obtains a corresponding one-sided inequality \begin{equation*}\tag{1.2}\quad P\{X \geqq \epsilon\} \leqq \sigma^2/ (\epsilon^2 + \sigma^2)\end{equation*} (see, e.g., [8] p. 198). In each case a distribution for $X$ is known that results in equality, so that the bounds are sharp. By a change of variable we can take $\epsilon = 1$. There are many possible multivariate extensions of (1.1) and (1.2). Those providing bounds for $P\{\max_{1 \leqq j \leqq k} |X_j| \geqq 1\}$ and $P\{\max_{1 \leqq j \leqq k} X_j \geqq 1\}$ have been investigated in [3, 5, 9] and [4], respectively. We consider here various inequalities involving (i) the minimum component or (ii) the product of the components of a random vector. Derivations and proofs of sharpness for these two classes of extensions show remarkable similarities. Some of each type occur as special cases of a general theorem in Section 3. Bounds are given under various assumptions concerning variances, covariances and independence. Notation. We denote the vector $(1, \cdots, 1)$ by $e$ and $(0, \cdots, 0)$ by $0$; the dimensionality will be clear from the context. If $x = (x_1, \cdots, x_k)$ and $y = (y_1, \cdots, y_k)$, we write $x \geqq y (x > y)$ to mean $x_j \geqq y_j (x_j > y_j), j = 1, 2, \cdots, k$. If $\Sigma = (\sigma_{ij}): k \times k$ is a moment matrix, for convenience we write $\sigma_{jj} = \sigma^2_j, j = 1, \cdots, k$. Unless otherwise stated, we assume that $\Sigma$ is positive definite.
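A quick numerical check of the one-sided bound (1.2) with $\epsilon = 1$, including the two-point distribution that attains equality (mass $\sigma^2/(1+\sigma^2)$ at $1$ and $1/(1+\sigma^2)$ at $-\sigma^2$); this is the standard univariate sharpness example and not one of the paper's multivariate extensions. Parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2 = 0.5                               # variance of X; epsilon taken as 1

# Monte Carlo check of the one-sided bound (1.2) for a generic distribution:
x = rng.normal(0.0, np.sqrt(sigma2), size=200_000)
print("normal X:   P{X >= 1} =", np.mean(x >= 1.0),
      " bound =", sigma2 / (1.0 + sigma2))

# The two-point distribution attaining equality in (1.2) with epsilon = 1:
#   P{X = 1} = sigma^2/(1 + sigma^2),  P{X = -sigma^2} = 1/(1 + sigma^2),
# which has mean 0 and second moment sigma^2.
p = sigma2 / (1.0 + sigma2)
x = rng.choice([1.0, -sigma2], p=[p, 1.0 - p], size=200_000)
print("extremal X: P{X >= 1} =", np.mean(x >= 1.0), " bound =", p)
```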

210 citations




Journal ArticleDOI
TL;DR: The distribution of the latent roots of the covariance matrix calculated from a sample from a normal multivariate population was found by Fisher [3], Hsu [6] and Roy [10] for the special case $\Sigma = \sigma^2 I$; this paper obtains the general distribution for arbitrary $\Sigma$.
Abstract: The distribution of the latent roots of the covariance matrix calculated from a sample from a normal multivariate population was found by Fisher [3], Hsu [6] and Roy [10] for the special, but important, case when the population covariance matrix is a scalar matrix, $\Sigma = \sigma^2I$. By use of the representation theory of the linear group, we are able to obtain the general distribution for arbitrary $\Sigma$.

Journal ArticleDOI
TL;DR: In this article, a description is given of the computation of tables of percentage points of the range, moments of the range, and percentage points of the studentized range for samples from a normal population.
Abstract: A description is given of the computation of tables of percentage points of the range, moments of the range, and percentage points of the studentized range for samples from a normal population. Percentage points of the (standardized) range $W = w/\sigma$ corresponding to cumulative probability $P = 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.025, 0.05, 0.1 (0.1) 0.9, 0.95, 0.975, 0.99, 0.995, 0.999, 0.9995$ and $0.9999$ are given to six decimal places for samples of size $n = 2 (1) 20 (2) 40 (10) 100$. Moments (mean, variance, skewness, and elongation) of the range $W$ are given to eight or more significant figures for samples of size $n = 2 (1) 100$. Percentage points of the studentized range $Q = w/s$ corresponding to cumulative probability $P = 0.9, 0.95, 0.975, 0.99, 0.995$, and $0.999$ are given to four significant figures or four decimal places, whichever is less accurate, for samples of size $n = 2 (1) 20 (2) 40 (10) 100$, with degrees of freedom $\nu = 1 (1) 20, 24, 30, 40, 60, 120$, and $\infty$ for the independent estimate $s^2$ of the population variance. All tabular values are accurate to within a unit in the last place.
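The tabulated quantities can be approximated crudely by simulation; the sketch below (nowhere near the stated six-decimal accuracy) estimates a few percentage points of the standardized range $W = w/\sigma$ and of the studentized range $Q = w/s$ with an independent $\chi^2$-based estimate $s$ on $\nu$ degrees of freedom. The sample size, degrees of freedom, and replication count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(5)
n, nu, reps = 10, 20, 200_000              # sample size, d.f. of s^2, replications

samples = rng.standard_normal((reps, n))
W = samples.max(axis=1) - samples.min(axis=1)          # standardized range w/sigma

# Independent estimate s of sigma on nu degrees of freedom.
s = np.sqrt(rng.chisquare(nu, size=reps) / nu)
Q = W / s                                              # studentized range w/s

for P in (0.90, 0.95, 0.99):
    print(f"P={P:.2f}  W: {np.quantile(W, P):7.4f}   Q: {np.quantile(Q, P):7.4f}")
```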

Journal ArticleDOI
TL;DR: In this paper, sufficient conditions and necessary conditions are given for a mixture of normal distributions to be normal, and a necessary and sufficient condition is obtained in the case of a product-measure mixture.
Abstract: If $\mathcal{F} = \{F\}$ is a family of distribution functions and $\mu$ is a measure on a Borel field of subsets of $\mathcal{F}$ with $\mu(\mathcal{F}) = 1$, then $\int F(\cdot) d\mu (F)$ is again a distribution function, which is called a $\mu$-mixture of $\mathcal{F}$. In Section 2, convergence questions when either $F_n$ or $\mu_k$ (or both) tend to limits are dealt with in the case where $\mathcal{F}$ is indexed by a finite number of parameters. In Section 3, mixtures of additively closed families are considered and the class of such $\mu$-mixtures is shown to be closed under convolution (Theorem 3). In Section 4, sufficient as well as necessary conditions are given for a $\mu$-mixture of normal distributions to be normal. In the case of a product-measure mixture, a necessary and sufficient condition is obtained (Theorem 7). Generation of mixtures is discussed in Section 5, and the concluding remarks of Section 6 link the problem of mixtures of Poisson distributions to a moment problem.
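One concrete special case in the spirit of Section 4 (a well-known fact, not the paper's general condition): a normal location mixture of normals is again normal, with the variances adding. The sketch below checks this by simulation; the parameter values and use of scipy are my choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# A mu-mixture of the normal family N(mu, sigma^2) with normal mixing measure
# mu ~ N(m, tau^2) is itself normal, N(m, sigma^2 + tau^2): one instance of a
# mixture of normal distributions that is again normal.
m, sigma, tau, size = 2.0, 1.0, 1.5, 100_000
mu = rng.normal(m, tau, size=size)         # draw the parameter from the mixing measure
x = rng.normal(mu, sigma)                  # then draw X from the mixed-over family

# Kolmogorov-Smirnov test against N(m, sigma^2 + tau^2): should not reject.
print(stats.kstest(x, "norm", args=(m, np.hypot(sigma, tau))))
```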







Journal ArticleDOI
TL;DR: In this article, the authors derived analytic expressions for the average queue length and consequently also the average delay under equilibrium conditions for the BMW model for traffic flow through a fixed-cycle traffic light.
Abstract: In their book Studies in the Economics of Transportation, Beckmann, McGuire and Winsten (BMW) ([2], pp. 11-13, 40-42) proposed a simple queuing model for traffic flow through a fixed-cycle traffic light. Although they derived a relation between the average delay per car and the average length of the queue at the beginning of a red phase of the light, they only indicated some possible numerical schemes for evaluating the latter. Here we shall derive analytic expressions for the average queue length and consequently also the average delay under equilibrium conditions for the BMW model.
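A rough discrete-time simulation of a fixed-cycle traffic light queue, to make the quantities of interest concrete; the arrival and departure assumptions here (Bernoulli arrivals per time slot, at most one departure per green slot) are mine and not necessarily those of the BMW model, and the equilibrium averages are estimated by simulation rather than by the paper's analytic expressions.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_fixed_cycle_light(p_arrival=0.3, red=20, green=20, cycles=20_000):
    """Discrete-time sketch of a queue at a fixed-cycle traffic light.

    In each time slot a car arrives with probability p_arrival; during a
    green slot one queued car departs.  Returns the long-run average queue
    length at the start of the red phase and the average queue length over
    all slots (by Little's law, average delay = average queue / arrival rate)."""
    q = 0
    q_at_red_start = []
    q_all = []
    for _ in range(cycles):
        q_at_red_start.append(q)
        for slot in range(red + green):
            q += rng.random() < p_arrival            # possible arrival
            if slot >= red and q > 0:                # departures only on green
                q -= 1
            q_all.append(q)
    return np.mean(q_at_red_start), np.mean(q_all)

q_red, q_bar = simulate_fixed_cycle_light()
print(f"avg queue at start of red: {q_red:.2f}   avg delay ~ {q_bar / 0.3:.2f} slots")
```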


Journal ArticleDOI
TL;DR: In this paper, the authors characterize new classes of totally positive kernels, in the sense of total positivity of order $k$ (TP$_k$), that arise from summing independent random variables and forming related first passage time distributions.
Abstract: 1. Introduction. The theory of totally positive kernels and Pólya type distributions has been decisively and extensively applied in several domains of mathematics, statistics, economics and mechanics. Totally positive kernels arise naturally in developing procedures for inverting, by differential polynomial operators [7], integral transformations defined in terms of convolution kernels. The theory of Pólya type distributions is fundamental in permitting characterizations of best statistical procedures for decision problems [8], [9], [13]. In clarifying the structure of stochastic processes with continuous path functions we encounter totally positive kernels [11], [12]. Studies in the stability of certain models in mathematical economics frequently use properties of totally positive kernels [10]. The theory of vibrations of certain types of mechanical systems (primarily coupled systems) involves aspects of the theory of totally positive kernels [5]. In this paper, we characterize new classes of totally positive kernels that arise from summing independent random variables and forming related first passage time distributions. A function $f(x, y)$ of two real variables ranging over linearly ordered one-dimensional sets $X$ and $Y$, respectively, is said to be totally positive of order $k$ (TP$_k$) if for all $x_1 < x_2 < \cdots < x_m$, $y_1 < y_2 < \cdots < y_m$ ($x_i \varepsilon X; y_j \varepsilon Y$) and all $1 \leqq m \leqq k$, the determinant $\det \| f(x_i, y_j) \|^m_{i,j=1}$ is nonnegative.
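For small examples the TP$_k$ condition can be checked directly from the determinant definition above. The sketch below tests the order-2 condition on a finite grid for a few kernels, using the classical facts that $e^{xy}$ and the Gaussian kernel $e^{-(x-y)^2}$ are totally positive while $x + y$ is not TP$_2$; the function name and grids are illustrative choices, not anything from the paper.

```python
import numpy as np
from itertools import combinations

def is_tp2_on_grid(f, xs, ys, tol=1e-12):
    """Check the order-2 total positivity condition numerically on finite
    grids: every 2x2 determinant det[f(x_i, y_j)] with x_1 < x_2 and
    y_1 < y_2 must be nonnegative (up to a small numerical tolerance)."""
    for x1, x2 in combinations(sorted(xs), 2):
        for y1, y2 in combinations(sorted(ys), 2):
            det = f(x1, y1) * f(x2, y2) - f(x1, y2) * f(x2, y1)
            if det < -tol:
                return False
    return True

xs = ys = np.linspace(-2, 2, 15)
print(is_tp2_on_grid(lambda x, y: np.exp(x * y), xs, ys))           # True
print(is_tp2_on_grid(lambda x, y: np.exp(-(x - y) ** 2), xs, ys))   # True
print(is_tp2_on_grid(lambda x, y: x + y, xs, ys))                   # False: not TP2
```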

Journal ArticleDOI
TL;DR: In this article, it was shown that the relations among the parameters of the triangular association scheme themselves imply the scheme if $n = 7$, but that this is false if $n = 8$.
Abstract: Connor [3] has shown that the relations among the parameters of the triangular association scheme themselves imply the scheme if $n \geqq 9$. This result was shown by Shrikhande [6] to hold also if $n \leqq 6$. (The problem has no meaning for $n < 4$.) This paper shows that the result holds if $n = 7$, but that it is false if $n = 8$.

Book ChapterDOI
TL;DR: In this article, lower bounds are obtained for the expected sample size $E_0(N)$ of an arbitrary sequential test whose error probabilities at two parameter points, $\theta_1$ and $\theta_2$, do not exceed given numbers $\alpha_1$ and $\alpha_2$, where $E_0(N)$ is evaluated at a third parameter point $\theta_0$.
Abstract: Sections 1–6 are concerned with lower bounds for the expected sample size, $E_0(N)$, of an arbitrary sequential test whose error probabilities at two parameter points, $\theta_1$ and $\theta_2$, do not exceed given numbers, $\alpha_1$ and $\alpha_2$, where $E_0(N)$ is evaluated at a third parameter point, $\theta_0$. The bounds in (1.3) and (1.4) are shown to be attainable or nearly attainable in certain cases where $\theta_0$ lies between $\theta_1$ and $\theta_2$. In Section 7 lower bounds for the average risk of a general sequential procedure are obtained. In Section 8 these bounds are used to derive further lower bounds for $E_0(N)$ which in general are better than (1.3).

Journal ArticleDOI
TL;DR: In this paper, the variance-stabilizing transformation for the noncentral $t$ distributions is obtained and its normalizing properties are examined; transformations for the approximate normalization of the topside noncentral $F$ distributions are also derived, with numerical comparisons to exact values.
Abstract: Let $X$ be a random variable governed by one of a family of distributions which is conveniently parameterized by $\mu$, the expectation of $X$, so that, in particular, the variance of $X, \sigma^2$, is a function of $\mu$, which we denote by $\sigma^2(\mu)$. A transformation, $\psi(X)$, is sometimes sought so that the variance of $\psi(X)$, as $\mu$ sweeps over its domain, is independent of $\mu$ (or much more nearly constant than $\sigma^2(\mu)$). A standard method of obtaining such a transformation for stabilization of the variance is to consider $X$ as one of a sequence of random variables, the sequence converging asymptotically in distribution, usually to a normal distribution. One form of the basic theorem is stated and proved by C. R. Rao [8], pp. 207-8, as follows. THEOREM (Rao). If $X$ is asymptotically normally distributed about $\mu$, with asymptotic variance $\sigma^2(\mu)$, then any function $\psi = \psi(X)$, with continuous first derivative in some neighborhood of $\mu$, is asymptotically normally distributed with mean $\psi(\mu)$ and variance $\sigma^2(\mu)(d\psi/d\mu)^2$, where $(d\psi/d\mu)$ denotes the derivative of $\psi(X)$ with respect to $X$, evaluated at the point $\mu$. From this we immediately have the following well-known COROLLARY. The random variable \begin{equation*}\tag{1}\psi(X) = c \int^X_K \frac{d\mu}{\sigma(\mu)}\end{equation*}, where $0 < x < \infty$, and where $K$ is an arbitrary constant, has a variance which is stabilized asymptotically at $c^2$. It is assumed, of course, that the integrand in (1) is integrable. If $\psi(X)$ is not a real-valued function on the domain of $X$, then the mapping is meaningless. Transformations such as (1), perhaps slightly modified, not only often work well for stabilizing non-asymptotic variances, but also often serve as well to normalize non-normal distributions. In general, however, nothing is known about the relative closeness to normality of the distribution of a random variable before and after a variance-stabilizing transformation is applied. Nor can anything general be said about the relative rapidity of approach to asymptotic normality. The study of concrete examples, however, suggests some connection between variance stabilization and normalization of non-normal distributions. A theoretical connection that may be relevant in certain cases has been put forward by N. L. Johnson [3], pp. 150-1. Johnson shows that, when the random variable of interest has a certain structure, then the differential equation for the normalizing transformation is similar to the differential equation for the variance-stabilizing transformation. The specified structure is that $X_n = Y_1 + Y_2G(X_1) + \cdots + Y_nG(X_{n-1})$, where the $Y$'s are independent and small, and $G(\cdot)$ is some function. In what follows, we obtain the variance-stabilizing transformation for the noncentral $t$ distributions and consider its normalizing properties. We repeat the same procedure for the topside noncentral $F$ distributions, although the variance-stabilizing transformation in this case is not well-defined. We then derive two other (well-defined) transformations for the approximate normalization of the topside noncentral $F$. Numerical comparisons of these approximations and the exact values are given.
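To illustrate the corollary (1) on a family not treated in the paper: for the Poisson family $\sigma^2(\mu) = \mu$, so with $c = 1$ and $K = 0$ the transformation is $\psi(X) = \int_0^X d\mu/\sqrt{\mu} = 2\sqrt{X}$, whose variance should settle near $c^2 = 1$ for large $\mu$. The quick simulation below checks this; it is only a sketch of the general recipe, not the paper's noncentral $t$ or $F$ results.

```python
import numpy as np

rng = np.random.default_rng(8)

# Poisson family: sigma^2(mu) = mu, so (1) with c = 1, K = 0 gives
# psi(X) = 2*sqrt(X), and var(psi(X)) should approach c^2 = 1 as mu grows.
for mu in (5, 20, 80, 320):
    x = rng.poisson(mu, size=200_000)
    print(f"mu={mu:4d}   var(X)={x.var():8.2f}   var(2*sqrt(X))={np.var(2*np.sqrt(x)):.3f}")
```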

Journal ArticleDOI
TL;DR: In this paper, the binomial tail probability $B_n(k)$ is expressed in terms of the hypergeometric function, a representation that yields bounds and asymptotic results useful for numerical and theoretical investigation of small tail probabilities.
Abstract: Let $p$ be given, $0 < p < 1$. Let $n$ and $k$ be positive integers such that $np \leqq k \leqq n$, and let $B_n(k) = \sum^n_{r=k} \binom{n}{r} p^rq^{n-r}$, where $q = 1 - p$. It is shown that $B_n(k) = \big\lbrack\binom{n}{k} p^kq^{n - k}\big\rbrack qF(n + 1, 1; k + 1; p),$ where $F$ is the hypergeometric function. This representation seems useful for numerical and theoretical investigations of small tail probabilities. The representation yields, in particular, the result that, with $A_n(k) = \big\lbrack\binom{n}{k}p^kq^{n - k + 1}\big\rbrack \lbrack(k + 1)/(k + 1 - (n + 1)p)\rbrack$, we have $1 \leqq A_n(k)/B_n(k) \leqq 1 + x^{-2}$, where $x = (k - np)/(npq)^{\frac{1}{2}}$. Next, let $N_n(k)$ denote the normal approximation to $B_n(k)$, and let $C_n(k) = (x + \sqrt{q/np}) \sqrt{2\pi} \exp \lbrack x^2/2 \rbrack$. It is shown that $(A_nN_nC_n)/B_n \rightarrow 1$ as $n \rightarrow \infty$, provided only that $k$ varies with $n$ so that $x \geqq 0$ for each $n$. It follows hence that $A_n/B_n \rightarrow 1$ if and only if $x \rightarrow \infty$ (i.e. $B_n \rightarrow 0$). It also follows that $N_n/B_n \rightarrow 1$ if and only if $A_nC_n \rightarrow 1$. This last condition reduces to $x = o(n^{1/6})$ for certain values of $p$, but is weaker for other values; in particular, there are values of $p$ for which $N_n/B_n$ can tend to one without even the requirement that $k/n$ tend to $p$.
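The bound $1 \leqq A_n(k)/B_n(k) \leqq 1 + x^{-2}$ can be checked numerically; the sketch below computes $B_n(k)$ directly and $A_n(k)$ from the formula above for a few cases with $k + 1 > (n + 1)p$. The use of scipy and the particular $(n, k, p)$ values are my choices.

```python
import numpy as np
from scipy import stats

def tail_and_bound(n, k, p):
    """Upper-tail probability B_n(k) = P{Bin(n,p) >= k} together with the
    bound A_n(k) from the abstract, for np <= k <= n with k + 1 > (n + 1)p."""
    q = 1.0 - p
    B = stats.binom.sf(k - 1, n, p)                 # sum_{r=k}^n C(n,r) p^r q^{n-r}
    A = stats.binom.pmf(k, n, p) * q * (k + 1) / (k + 1 - (n + 1) * p)
    x = (k - n * p) / np.sqrt(n * p * q)
    return B, A, x

for n, k, p in [(100, 60, 0.5), (100, 75, 0.5), (1000, 560, 0.5)]:
    B, A, x = tail_and_bound(n, k, p)
    print(f"n={n:5d} k={k:4d}  B={B:.3e}  A/B={A/B:.4f}  1+x^-2={1 + x**-2:.4f}")
```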