
Showing papers in "Annals of Mathematical Statistics in 1964"


Journal ArticleDOI
TL;DR: In this article, a new approach toward a theory of robust estimation is presented, which treats in detail the asymptotic theory of estimating a location parameter for contaminated normal distributions, and exhibits estimators that are asymptotically most robust (in a sense to be specified) among all translation invariant estimators.
Abstract: This paper contains a new approach toward a theory of robust estimation; it treats in detail the asymptotic theory of estimating a location parameter for contaminated normal distributions, and exhibits estimators—intermediaries between sample mean and sample median—that are asymptotically most robust (in a sense to be specified) among all translation invariant estimators. For the general background, see Tukey (1960) (p. 448 ff.)
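The "intermediaries between sample mean and sample median" can be illustrated with a Huber-type M-estimator of location. This is only a sketch: the clipping constant `k` and the MAD-based scale below are conventional illustrative choices, not taken from the paper.

```python
import statistics

def huber_location(x, k=1.345, tol=1e-8, max_iter=200):
    """Huber M-estimator of location via iteratively reweighted averaging.

    As k -> 0 the estimate approaches the median; as k -> infinity, the mean.
    """
    mu = statistics.median(x)
    # Robust scale: median absolute deviation (fallback 1.0 if degenerate).
    s = statistics.median([abs(v - mu) for v in x]) or 1.0
    for _ in range(max_iter):
        # Observations within k*s of mu get full weight; outliers are downweighted.
        w = [1.0 if abs(v - mu) <= k * s else k * s / abs(v - mu) for v in x]
        mu_new = sum(wi * vi for wi, vi in zip(w, x)) / sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

data = [0.0, 1.0, 2.0, 3.0, 100.0]  # one gross outlier
est = huber_location(data)
```

On this sample the estimate stays near the median (2) instead of being dragged toward the mean (21.2), which is the robustness the abstract describes.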

5,628 citations


Journal ArticleDOI
TL;DR: In this article, the hypergeometric functions $_pF_q$ of matrix argument which occur in the multivariate distributions are defined in Section 4 by their expansions in the zonal polynomials defined in Section 5.
Abstract: The paper is largely expository, but some new results are included to round out the paper and bring it up to date. The following distributions are quoted in Section 7. 1. Type $_0F_0$, exponential: (i) $\chi^2$, (ii) Wishart, (iii) latent roots of the covariance matrix. 2. Type $_1F_0$, binomial series: (i) variance ratio, $F$, (ii) latent roots with unequal population covariance matrices. 3. Type $_0F_1$, Bessel: (i) noncentral $\chi^2$, (ii) noncentral Wishart, (iii) noncentral means with known covariance. 4. Type $_1F_1$, confluent hypergeometric: (i) noncentral $F$, (ii) noncentral multivariate $F$, (iii) noncentral latent roots. 5. Type $_2F_1$, Gaussian hypergeometric: (i) multiple correlation coefficient, (ii) canonical correlation coefficients. The modifications required for the corresponding distributions derived from the complex normal distribution are outlined in Section 8, and the distributions are listed. The hypergeometric functions $_pF_q$ of matrix argument which occur in the multivariate distributions are defined in Section 4 by their expansions in zonal polynomials as defined in Section 5. Important properties of zonal polynomials and hypergeometric functions are quoted in Section 6. Formulae and methods for the calculation of zonal polynomials are given in Section 9 and the zonal polynomials up to degree 6 are given in the appendix. The distribution of quadratic forms is discussed in Section 10, orthogonal expansions of $_0F_0$ and $_1F_1$ in Laguerre polynomials in Section 11 and the asymptotic expansion of $_0F_0$ in Section 12. Section 13 has some formulae for moments.

1,432 citations



Journal ArticleDOI
TL;DR: The empirical Bayes approach is applicable when the same decision problem presents itself repeatedly and independently with a fixed but unknown a priori distribution of the parameter, as mentioned in this paper; not every statistical decision problem in practice arises embedded in such a sequence.
Abstract: The empirical Bayes approach to statistical decision problems is applicable when the same decision problem presents itself repeatedly and independently with a fixed but unknown a priori distribution of the parameter. Not all decision problems in practice come to us imbedded in such a sequence, but when they do the empirical Bayes approach offers certain advantages over any approach which ignores the fact that the parameter is itself a random variable, as well as over any approach which assumes a personal or a conventional distribution of the parameter not subject to change with experience. My own interest in the empirical Bayes approach was renewed by recent work of E. Samuel [10], [11] and J. Neyman [6], to both of whom I am very much indebted. In keeping with the purpose of the Rietz Lecture I shall not confine myself to presenting new results and shall try to make the argument explicit at the risk of being tedious. In the current controversy between the Bayesian school and their opponents it is obvious that any theory of statistical inference will find itself in and out of fashion as the winds of doctrine blow. Here, then, are some remarks and references for further reading which I hope will interest my audience in thinking the matter through for themselves. Considerations of space have confined mention of the non-parametric case, and of the closely related “compound” approach in which no a priori distribution of the parameter is assumed, to the references at the end of the article.
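For the Poisson case, the empirical Bayes idea yields Robbins' well-known estimate $E[\lambda \mid x] \approx (x+1)\,f_n(x+1)/f_n(x)$, where $f_n$ is the empirical frequency function of the observed counts (this formula is from Robbins' earlier work, not stated in the abstract above; the data below are made up).

```python
from collections import Counter

def robbins_estimate(counts, x):
    """Empirical Bayes estimate of E[lambda | X = x] for Poisson(lambda)
    observations with an unknown prior on lambda:
    (x + 1) * f(x + 1) / f(x), f being the empirical frequency function."""
    f = Counter(counts)
    n = len(counts)
    if f[x] == 0:
        raise ValueError("x was never observed")
    return (x + 1) * (f[x + 1] / n) / (f[x] / n)

observed = [0, 0, 1, 1, 1, 2, 2, 3]   # hypothetical past counts
est = robbins_estimate(observed, 1)    # estimate of E[lambda | X = 1]
```

Here $f(1) = 3/8$ and $f(2) = 2/8$, so the estimate is $2 \cdot (2/8)/(3/8) = 4/3$: the prior is never modeled explicitly, only the marginal frequencies are used.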

627 citations


Journal ArticleDOI
TL;DR: In this paper, a Bayesian approach is used to estimate the current position of an object traveling on a path from a series of observations, and a sequence of tests is designed to locate the last time point of change.
Abstract: A tracking problem is considered. Observations are taken on the successive positions of an object traveling on a path, and it is desired to estimate its current position. The objective is to arrive at a simple formula which implicitly accounts for possible changes in direction and discounts observations taken before the latest change. To develop a reasonable procedure, a simpler problem is studied. Successive observations are taken on n independently and normally distributed random variables X sub 1, X sub 2, ..., X sub n with means mu sub 1, mu sub 2, ..., mu sub n and variance 1. Each mean mu sub i is equal to the preceding mean mu sub (i-1) except when an occasional change takes place. The object is to estimate the current mean mu sub n. This problem is studied from a Bayesian point of view. An 'ad hoc' estimator is described, which applies a combination of the A.M.O.C. Bayes estimator and a sequence of tests designed to locate the last time point of change. The various estimators are then compared by a Monte Carlo study of samples of size 9. This Bayesian approach seems to be more appropriate for the related problem of testing whether a change in mean has occurred. This test procedure is simpler than that used by Page. The power functions of the two procedures are compared. (Author)
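The idea of "discounting observations taken before the latest change" can be sketched (this is a crude least-squares stand-in, not the authors' A.M.O.C. Bayes procedure) by searching for the last change point and averaging only the tail:

```python
def current_mean_after_last_change(xs):
    """Split the series at the single change point minimizing the total
    within-segment sum of squares, then return the mean of the tail.
    Illustrative stand-in for the paper's Bayesian/testing procedure."""
    def sse(seg):
        m = sum(seg) / len(seg)
        return sum((v - m) ** 2 for v in seg)

    n = len(xs)
    best_t = min(range(1, n), key=lambda t: sse(xs[:t]) + sse(xs[t:]))
    return sum(xs[best_t:]) / (n - best_t)

xs = [0, 0, 0, 0, 0, 5, 5, 5, 5, 5]   # mean jumps from 0 to 5 at i = 5
est = current_mean_after_last_change(xs)
```

On this noiseless series the split is found exactly at the jump, so the pre-change observations contribute nothing to the estimate of the current mean.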

554 citations


Journal ArticleDOI
TL;DR: In this paper, it was shown that the maximum in a stationary Gaussian sequence can be characterized as a simple mixture of extreme value d.f.'s of a single type.
Abstract: Let $\{X_n, n = 0, \pm 1, \cdots\}$ be a real valued discrete parameter stationary stochastic process on a probability space $(\Omega, \mathscr{F}, P);$ for each $n = 1, 2, \cdots$, let $Z_n = \max (X_1, \cdots, X_n)$. We shall find general conditions under which the random variable $Z_n$ has a limiting distribution function (d.f.) as $n \rightarrow \infty$; that is, there exist sequences $\{a_n\}$ and $\{b_n\}, a_n > 0$, and a proper nondegenerate d.f. $\Phi(x)$ such that \begin{equation*}\tag{1.1}\lim_{n \rightarrow \infty} P\{Z_n \leqq a_nx + b_n\} = \Phi(x)\end{equation*} for each $x$ in the continuity set of $\Phi(x)$. The simplest type of stationary sequence $\{X_n\}$ is one in which the random variables are mutually independent with some common d.f. $F(x)$. In this case, $Z_n$ has the d.f. $F^n(x)$ and (1.1) becomes \begin{equation*}\tag{1.2}\lim_{n \rightarrow \infty} F^n(a_nx + b_n) = \Phi(x).\end{equation*} It is well known that in (1.2) $\Phi(x)$ is of one of exactly three types; necessary and sufficient conditions on $F$ for the validity of (1.2) are also known [9]. The three types are usually called extreme value d.f.'s [10]. Theorem 2.1 gives the limiting d.f. of $Z_n$ in a stationary sequence satisfying a certain condition on the upper tail of the conditional d.f. of $X_1$, given the "past" of the sequence: the limiting d.f. is a simple mixture of extreme value d.f.'s of a single type. These are the same kind of d.f.'s found by us [3] to be the limiting d.f.'s of maxima in sequences of exchangeable random variables. The conditions of Theorem 2.1 are specialized to exchangeable and Markov sequences, and Theorem 2.2 extends the methods of Theorem 2.1 to general (not necessarily stationary) Markov sequences.
It is shown that stationary Gaussian sequences, except for the trivial case of independent, identically distributed Gaussian random variables, do not obey the requirements of the hypothesis of Theorem 2.1: hence, Sections 3, 4, and 5 are devoted to a detailed study of the maximum in a stationary Gaussian sequence. Theorem 3.1 provides conditions on the rate of convergence of the covariance sequence to 0 which are sufficient for $Z_n$ to have the same extreme value limiting d.f. as in the case of independence, namely, $\exp (-e^{-x})$. The relation of these conditions to the spectral d.f. of the process is also discussed. A weaker condition on the covariance sequence ensures the "relative stability in probability" of $Z_n$ (Theorem 4.1). Theorem 5.1 describes the behavior of $Z_n$ when the spectrum has a discrete component with "not too many large jumps" and a "smooth" continuous component: when properly normalized, $Z_n$ converges in probability to a random variable representing the maximum of the process corresponding to the discrete spectral component. A special case was given by us in [2]. We now summarize some known results used in the sequel. The extreme value d.f.'s are continuous, so that (1.2) holds for all $x$; furthermore, this holds if and only if it holds for all $x$ satisfying $0 < \Phi(x) < 1.$ (1.2) implies that for all such $x$ $0 < F^n (a_nx + b_n) < 1,\quad\text{for all large} n,$ and \begin{equation*}\tag{1.3}\lim_{n \rightarrow \infty} F(a_nx + b_n) = 1.\end{equation*} Let $x_\infty$ be the supremum of all real numbers $x'$ for which $F(x') < 1$; then, for all $x$ satisfying $0 < \Phi(x) < 1$, we have \begin{equation*}\tag{1.4}\lim_{n \rightarrow \infty} a_nx + b_n = x_\infty.\end{equation*} From (1.3), and the asymptotic relation $-\log F \sim (1 - F), F \rightarrow 1$, we see that (1.2) holds if and only if \begin{equation*}\tag{1.5}\lim_{n \rightarrow \infty} n\lbrack 1 - F (a_nx + b_n)\rbrack = -\log \Phi(x).\end{equation*}
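In the independent case (1.2) with $F = \Phi$, the standard normal d.f., the classical textbook normalizing constants give the limit $\exp(-e^{-x})$ mentioned in connection with Theorem 3.1; a quick numerical check of (1.2) (convergence is only logarithmically fast, hence the loose tolerance):

```python
import math

def Phi(z):
    """Standard normal d.f. via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gumbel_approx(n, x):
    """F^n(a_n x + b_n) for F = Phi, using the classical constants
    a_n = 1/sqrt(2 log n), b_n = sqrt(2 log n) - (log log n + log 4*pi)/(2 sqrt(2 log n))."""
    c = math.sqrt(2.0 * math.log(n))
    a_n = 1.0 / c
    b_n = c - (math.log(math.log(n)) + math.log(4.0 * math.pi)) / (2.0 * c)
    return Phi(a_n * x + b_n) ** n

# Compare with the limiting d.f. exp(-exp(-x)) at a couple of points.
vals = {x: (gumbel_approx(10**6, x), math.exp(-math.exp(-x))) for x in (0.0, 1.0)}
```

Even at $n = 10^6$ the agreement is only to a couple of decimal places, which illustrates why the rate conditions on the covariance sequence in Sections 3-5 are delicate.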

365 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present two different methods for obtaining simultaneous confidence intervals for judging contrasts among multinomial populations, and compare the intervals obtained by these methods with the simultaneous confidence intervals obtained earlier by Gold [5] for all linear functions of multinomial probabilities.
Abstract: In the present article, we shall present two different methods for obtaining simultaneous confidence intervals for judging contrasts among multinomial populations, and we shall compare the intervals obtained by these methods with the simultaneous confidence intervals obtained earlier by Gold [5] for all linear functions of multinomial probabilities. One of the methods presented herein is particularly suited to the situation where all contrasts among, say, $I$ multinomial populations may be of interest, where each population consists of, say, $J$ classes. The other method presented herein is suited to certain situations where a specific set of contrasts among these populations is of interest; e.g., where, for each of the $\frac{1}{2}I(I - 1)$ pairs of populations, the $J$ contrasts between the corresponding probabilities associated with the two populations in the pair are of interest. For judging all contrasts among the $I$ multinomial populations, the confidence intervals obtained with the first method presented herein have the desirable property that they are shorter than the corresponding intervals obtained with the method presented by Gold [5]. For judging the $\frac{1}{2}I(I - 1)J$ pair-wise contrasts between the multinomial populations, the confidence intervals obtained with the second method presented herein have the desirable property that they are shorter than the corresponding intervals obtained with the first method, for the usual probability levels. In the present paper we shall also solve a problem first mentioned by Gold [5] but left unsolved in the earlier article.
Gold took note of the fact that, in the usual analysis of variance context, the simultaneous confidence intervals obtained by Scheffe [14] and by Tukey [15] for judging contrasts among the parameters have the desirable property that rejection of the homogeneity hypothesis by the usual $F$ or Studentized range test implies the existence of at least one relevant contrast for which the corresponding confidence interval does not cover zero (see, for example, [14], pp. 66-77). She also noted that a result analogous to the Scheffe-Tukey result had not yet been obtained for her simultaneous confidence intervals, and she stated that the difficulty seemed to be that the homogeneity test is based on a $\chi^2$ statistic with $(I - 1)(J - 1)$ degrees of freedom in the case where $I$ multinomial populations, each consisting of $J$ classes, are tested for homogeneity, whereas her confidence intervals were based upon the $\chi^2$ distribution with $I(J - 1)$ degrees of freedom. In the present article, one of the methods we shall present for obtaining simultaneous confidence intervals for the contrasts among the $I$ multinomial populations will be based upon the $\chi^2$ distribution with $(I - 1)(J - 1)$ degrees of freedom, and these intervals will have desirable properties somewhat analogous to those enjoyed in the analysis of variance by the Scheffe confidence intervals and by the Tukey confidence intervals. A modification of the usual test of the null hypothesis that the $I$ multinomial populations are homogeneous will be presented herein, and we shall show that this modified test will lead to rejection of the null hypothesis if and only if there is at least one contrast, of the kind presented herein, for which the relevant confidence interval does not cover zero.

359 citations


Journal ArticleDOI
TL;DR: In this article, the authors derived necessary and sufficient conditions for asymptotic normality of estimates based on simple random sampling without replacement from a finite population, and thus solved a comparatively old problem initiated by W. G. Madow [8].
Abstract: In [3] the author established necessary and sufficient conditions for asymptotic normality of estimates based on simple random sampling without replacement from a finite population, and thus solved a comparatively old problem initiated by W. G. Madow [8]. The solution was obtained by approximating simple random sampling by so called Poisson sampling, which may be decomposed into independent subexperiments, each associated with a single unit in the population. In the present paper the same method is used for deriving asymptotic normality conditions for a special kind of sampling with varying probabilities called here rejective sampling. Rejective sampling may be realized by $n$ independent draws of one unit with fixed probabilities, generally varying from unit to unit, given the condition that samples in which all units are not distinct are rejected. If the drawing probabilities are constant, rejective sampling coincides with simple random sampling without replacement, and so the present paper is a generalization of [3]. Basic facts about rejective sampling are exposed in Section 2. To obtain more refined results, Poisson sampling is introduced and analyzed (Section 3) and then related to rejective sampling (Section 4). The next three sections deal with probabilities of inclusion, variance formulas and asymptotic normality of estimators for rejective sampling. In Section 8 asymptotic formulas are tested numerically and applications to sample surveys are indicated. The paper concludes with short-cuts in the practical performance of rejective sampling. The readers interested in applications only may concentrate upon Sections 1, 8 and 9. Those interested in the theory of mean values and variances only may omit Lemma 4.3 and Section 7.
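Rejective sampling as described above is straightforward to realize directly: make $n$ independent draws with fixed per-draw probabilities and reject any sample whose units are not all distinct. A minimal sketch (the unit labels and probabilities are made up):

```python
import random

def rejective_sample(probs, n, rng):
    """Draw n units independently with the given per-draw probabilities,
    rejecting (and redrawing) any sample in which the units are not all
    distinct -- Hajek's rejective sampling scheme."""
    units = list(range(len(probs)))
    while True:
        draw = rng.choices(units, weights=probs, k=n)
        if len(set(draw)) == n:
            return sorted(draw)

rng = random.Random(0)
# Unequal draw probabilities over 6 units; sample size 3.
sample = rejective_sample([0.05, 0.10, 0.15, 0.20, 0.25, 0.25], 3, rng)
```

With constant weights this reduces, as the abstract notes, to simple random sampling without replacement; the rejection loop is exactly the conditioning on all units being distinct.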

327 citations


Journal ArticleDOI
TL;DR: In this article, sequential procedures are given for selecting the normal population with the greatest mean when (a) the $k$ populations have a common known variance or (b) the $k$ populations have a common but unknown variance, so that in each case the probability of making the correct selection exceeds a specified value whenever the greatest mean exceeds all other means by at least a specified amount.
Abstract: In this paper sequential procedures are given for selecting the normal population with the greatest mean when (a) the $k$ populations have a common known variance or (b) the $k$ populations have a common but unknown variance, so that in each case the probability of making the correct selection exceeds a specified value when the greatest mean exceeds all other means by at least a specified amount. The procedures in the present paper all have the property that inferior populations can be eliminated from further consideration as the experiment proceeds.
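The elimination property can be sketched with a simple rule (illustrative only, not the paper's procedure): sample all surviving populations in parallel and drop any whose running total trails the current leader by more than a threshold $d$; stop when one survives. Fixed data streams keep the demonstration deterministic.

```python
def select_best(streams, d):
    """streams: dict name -> list of observations, read in parallel.
    Eliminate a population when its running sum trails the leader's by
    more than d; the sole survivor is selected. Illustrative rule only."""
    alive = {k: 0.0 for k in streams}
    for i in range(min(len(s) for s in streams.values())):
        for k in alive:
            alive[k] += streams[k][i]
        leader = max(alive.values())
        alive = {k: v for k, v in alive.items() if leader - v <= d}
        if len(alive) == 1:
            return next(iter(alive))
    return max(alive, key=alive.get)   # fall back to the current leader

# Hypothetical observation streams: population "B" has the greater mean.
streams = {"A": [0.1, -0.2, 0.3, 0.0], "B": [9.8, 10.2, 9.9, 10.1]}
best = select_best(streams, d=5.0)
```

The point of elimination, as in the paper, is economy: clearly inferior populations stop consuming observations as soon as the evidence against them crosses the threshold.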

238 citations



Journal ArticleDOI
TL;DR: The distribution of the number of successes in $n$ independent trials is "bell-shaped", as discussed by the authors: the expected number of successes either determines the most probable number of successes uniquely or restricts it to the pair of integers nearest to it.
Abstract: The distribution of the number of successes in $n$ independent trials is "bell-shaped". The expected number of successes, $\mu$ say, either determines the most probable number of successes uniquely or restricts it to the pair of integers nearest to $\mu$.
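A direct check of the statement in the equal-probability (binomial) special case: the most probable count is either unique or a tied adjacent pair, and in both cases it lies within one of the mean $\mu = np$.

```python
from math import comb

def binom_pmf(n, p, k):
    """Binomial(n, p) probability of exactly k successes."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def modes(n, p):
    """All k maximizing the Binomial(n, p) pmf: one value, or a tied pair."""
    pmf = [binom_pmf(n, p, k) for k in range(n + 1)]
    m = max(pmf)
    return [k for k, v in enumerate(pmf) if abs(v - m) < 1e-12]

unique = modes(10, 0.5)   # mean 5.0 -> a single most probable value
tied = modes(5, 0.5)      # mean 2.5 -> the pair of integers nearest 2.5
```

The two calls exhibit exactly the dichotomy of the abstract: $\mu = 5$ pins the mode down uniquely, while $\mu = 2.5$ only restricts it to the neighboring pair.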


Journal ArticleDOI
TL;DR: In this paper, the authors studied Doeblin Ratio limit laws, the weak and strong laws of large numbers, and the Central Limit theorem for Markov Renewal processes for the special case of a Markov chain.
Abstract: This paper is a study of Doeblin Ratio limit laws, the weak and strong laws of large numbers, and the Central Limit theorem for Markov Renewal processes. A general definition of these processes is given in Section 2. The means and variances of random variables associated with recurrence times are computed in Section 4. When restricted to the special case of a Markov chain, certain of the results of Sections 5 and 6 strengthen known results.


Journal ArticleDOI
TL;DR: In this article, the authors describe how data from a multinomial distribution, and in particular data in the form of a contingency table, may be studied by using a prior distribution of the parameters and expressing the results in the form of a posterior distribution, or some aspects thereof, of the parameters.
Abstract: Summary. This paper describes how data from a multinomial distribution, and in particular data in the form of a contingency table, may be studied by using a prior distribution of the parameters and expressing the results in the form of a posterior distribution, or some aspects thereof, of the parameters. The analysis used must depend on the prior distribution and the form described here only applies to a certain type of prior knowledge but, for reasons given below, it is believed that this type is of frequent occurrence. The binomial situation is first considered and the results obtained there suggest a general result for the multinomial distribution, which is then established. A few remarks on Bayesian analysis in general enable the result to be applied, first to certain multinomial problems and then, with the aid of another general result, to contingency tables. The method used there has close connections with the Analysis of Variance and these connections are examined, particularly with a view to simplifying the analysis of contingency tables involving three or more factors. 1. Binomial distributions. Although it will appear as a special case of results to be established for the general multinomial situation, it is instructive to begin with the binomial distribution which suggested the generalizations. Let N independent trials with constant probability $\theta$ of success result in n successes and (N - n) failures. The likelihood is
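The binomial analysis the summary begins with is the standard conjugate one: with a Beta$(a, b)$ prior for the success probability $\theta$ and $n$ successes in $N$ trials, the posterior is Beta$(a + n,\, b + N - n)$. A minimal sketch (the prior parameters and data are illustrative):

```python
def beta_binomial_posterior(a, b, N, n):
    """Posterior parameters and posterior mean for theta ~ Beta(a, b)
    after observing n successes in N independent Bernoulli(theta) trials."""
    a_post, b_post = a + n, b + (N - n)
    mean = a_post / (a_post + b_post)
    return a_post, b_post, mean

# Uniform prior Beta(1, 1); 7 successes in 10 trials.
a_post, b_post, post_mean = beta_binomial_posterior(1, 1, 10, 7)
```

The posterior here is Beta(8, 4) with mean $8/12$, slightly shrunk from the sample proportion $0.7$ toward the prior mean $0.5$, which is the kind of prior-to-posterior updating the paper then generalizes to the multinomial case.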

Journal ArticleDOI
TL;DR: In this article, the Wishart distribution plays the role of the chi-square distribution in the multivariate case, and several generalizations which lead to multivariate analogs of the Beta or F distribution are given.
Abstract: 1. Summary and introduction. If X and Y are independent random variables having chi-square distributions with n and m degrees of freedom, respectively, then except for constants, X/Y and X/(X + Y) are distributed as F and Beta variables. In the multivariate case, the Wishart distribution plays the role of the chi-square distribution. There is, however, no single natural generalization of a ratio in the multivariate case. In this paper several generalizations which lead to multivariate analogs of the Beta or F distribution are given. Some of these distributions arise naturally from a consideration of the sufficient statistic or maximal invariant in various multivariate problems, e.g., (i) testing that k normal populations are identical [1], p. 251, (ii) multivariate analysis of variance tests [9], (iii) multivariate slippage problems [4], p. 321. Although several of the results may be known as folklore, they have not been explicitly stated. Others of the distributions obtained are new. Intimately related to some of the distributional problems is the independence of certain statistics, and results in this direction are also given. 2. Notation and comments. If $V$ and $W$ are symmetric matrices, $V > W$ means that $V - W$ is positive definite. $I_p$ denotes the identity of order $p$; the subscript is omitted when the dimensionality is clear from the context. We write $\operatorname{etr} A$ to mean $\exp \operatorname{tr} A$. $X \sim Y$ means that $X$ and $Y$ have the same distribution. $V \sim W(\Sigma, p, n)$ means that $V$ is a $p \times p$ symmetric matrix whose $p(p + 1)/2$ elements are random variables having a Wishart distribution with (non-degenerate) covariance matrix $\Sigma \equiv A^{-1}$ and $n$ degrees of freedom ($n \geq p$ assumed throughout), i.e., with density function.

Journal ArticleDOI
TL;DR: In this paper, the authors focus on determining prior distributions on ignorance over parameter spaces, using invariance techniques similar to those of decision theory, and present a number of less compelling methods for exact determination.
Abstract: The paper is mainly concerned with determining prior distributions on ignorance over parameter spaces, using invariance techniques similar to those of decision theory. Prior distributions are rarely determined exactly by such techniques and a number of less compelling methods for exact determination are given.

Journal ArticleDOI
TL;DR: In this paper, it is shown that the goodness of fit statistic is a quadratic form in the observed proportions when the observed proportions are close to the expected proportions; the asymptotic efficiency of the maximum likelihood estimator is proved at the same time.
Abstract: This paper is concerned with the theorem that the $X^2$ goodness of fit statistic for a multinomial distribution with $r$ cells and with $s$ parameters fitted by the method of maximum likelihood is distributed as $\chi^2$ with $r - s - 1$ degrees of freedom. Karl Pearson formulated and proved the theorem for the special case $s = 0$. The general theorem was formulated by Fisher [2]. The first attempt at a rigorous proof is due to Cramer [1]. A serious weakness of Cramer's proof is that, in effect, he assumes that the maximum likelihood estimator is consistent. (To be precise, he proves the theorem for the subclass of maximum likelihood estimators that are consistent. But how are we in practice to distinguish between an inconsistent maximum likelihood estimator and a consistent one?) Rao [3] has closed this gap in Cramer's proof by proving the consistency of maximum likelihood for any family of discrete distributions under very general conditions. In this paper the theorem is proved under more general conditions than the combined conditions of Rao and Cramer. Cramer assumes the existence of continuous second partial derivatives with respect to the "unknown" parameter while here only total differentiability at the "true" parameter values is postulated. There is a radical difference in the method of proof. While Cramer regards the maximum likelihood estimate as being the point where the derivative of the log-likelihood function is zero, here it is regarded as the point at which the likelihood function takes values arbitrarily near to its supremum. The method of proof consists essentially of showing that the goodness of fit statistic is a quadratic form in the observed proportions when the observed proportions are close to the expected proportions. The known asymptotic properties of the multinomial distribution are then used. The asymptotic efficiency of the maximum likelihood estimator is proved at the same time.
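The theorem in use: with $r$ cells and $s$ parameters fitted by maximum likelihood, $X^2 = \sum_i (O_i - E_i)^2 / E_i$ is referred to $\chi^2$ with $r - s - 1$ degrees of freedom. A small worked computation (the observed counts are hypothetical, and one fitted parameter is assumed):

```python
def pearson_X2(observed, expected):
    """Pearson's goodness-of-fit statistic: sum of (O - E)^2 / E over cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 55, 27]          # hypothetical counts, n = 100, r = 3 cells
expected = [20.0, 50.0, 30.0]    # expected counts under the fitted model
s = 1                             # number of ML-fitted parameters (assumed)
X2 = pearson_X2(observed, expected)
df = len(observed) - s - 1        # r - s - 1, per the Fisher theorem
```

Here $X^2 = 4/20 + 25/50 + 9/30 = 1.0$ on $3 - 1 - 1 = 1$ degree of freedom; Pearson's original $s = 0$ case would instead use $r - 1 = 2$ degrees of freedom.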


Journal ArticleDOI
TL;DR: In this article, a class of prior distributions on the space of all substochastic distributions on positive integers is given, such that along almost all sample sequences the corresponding posterior distributions of the expectations of all bounded functions on the positive integers are asymptotically normal.
Abstract: This paper extends some of the results obtained by Freedman [2]. In Section 1 a class of prior distributions on the space of all substochastic distributions on the positive integers is given, such that along almost all sample sequences the corresponding posterior distributions of the expectations of all bounded functions on the positive integers are asymptotically normal. Section 2 shows that most of Freedman's results carry over to the case of distributions on the closed unit interval.

Journal ArticleDOI
TL;DR: In this article, it is shown that if a qualitative probability is atomless and monotonely continuous, then there is one and only one probability measure compatible with it, and this probability measure is countably additive.
Abstract: The first clear and precise statement of the axioms of qualitative probability was given by de Finetti ([1], Section 13). A more detailed treatment, based however on more complex axioms for conditional qualitative probability, was given later by Koopman [5]. De Finetti and Koopman derived a probability measure from a qualitative probability under the assumption that, for any integer $n$, there are $n$ mutually exclusive, equally probable events. L. J. Savage [6] has shown that this strong assumption is unnecessary. More precisely, he proves that if a qualitative probability is only fine and tight, then there is one and only one probability measure compatible with it. No property equivalent to countable additivity has been used as yet in the development of qualitative probability theory. However, since the concept of countable additivity is of such fundamental importance in measure theory, it is to be expected that an equivalent property would be of interest in qualitative probability theory, and that in particular it would simplify the proof of the existence of compatible probability measures. Such a property is introduced in this paper, under the name of monotone continuity. It is shown that, if a qualitative probability is atomless and monotonely continuous, then there is one and only one probability measure compatible with it, and this probability measure is countably additive. It is also proved that any fine and tight qualitative probability can be extended to a monotonely continuous qualitative probability, and therefore, contrary to what might be expected, there is no loss in generality if we consider only qualitative probabilities which are monotonely continuous. At the present time there is still a controversy over the interpretation which should be given to the word probability in the scientific and technical literature. 
Although the present writer subscribes to the opinion that this interpretation may be different in different contexts, in this paper we do not enter into this controversy. We simply remark that a qualitative probability, as a numerical one, may be interpreted either as an objective or as a subjective probability, and therefore the following axiomatic theory is compatible with both interpretations of probability.

Journal ArticleDOI
TL;DR: In this paper, a sufficient condition is obtained for the validity of (1) at a given value of the real valued parameter $\theta$, say $\theta^0$.
Abstract: For each $n$ let $t_n$ be an estimate (based on $n$ independent and identically distributed observations) of a real valued parameter $\theta$. Suppose that, for each $\theta, n^{\frac{1}{2}}(t_n - \theta)$ is asymptotically normally distributed with mean zero and variance $v(\theta)$. According to R. A. Fisher we then have \begin{equation*}\tag{(1)}v(\theta) \geqq I^{-1}(\theta),\end{equation*} where $I$ is the information contained in a single observation. It is known however that, in the absence of regularity conditions on the sequence $\{t_n\}$, (1) does not necessarily hold for each $\theta$. On the other hand, according to LeCam (1952, 1953, 1958) the set of points $\theta$ for which (1) fails is always of Lebesgue measure zero. This note gives a simple proof of the stated result of LeCam, along the following lines. First a sufficient condition for the validity of (1) at a given value of $\theta$, say $\theta^0$, is obtained. This is a little weaker than the condition that $t_n$ be asymptotically median-unbiased (i.e. $P(t_n \leqq \theta \mid \theta) \rightarrow \frac{1}{2}$). Suppose there exist a $\delta > 0$ and a $\mathscr{B}$-measurable function $M(x)$ such that $|L'' (\theta \mid x)| \leqq M(x)$ for all $x \varepsilon X$ and all $\theta \varepsilon (\theta^0 - \delta, \theta^0 + \delta)$, and such that $E(M(x) \mid \theta^0) < \infty$; then (Proposition 1) if \begin{equation*}\tag{(2)}\lim \sup_{n \rightarrow \infty} P(t_n > \theta^0 - n^{-\frac{1}{2}} \mid \theta^0 - n^{-\frac{1}{2}}) \leqq \frac{1}{2},\end{equation*} then also (1) holds for $\theta = \theta^0$. Another consequence of Proposition 1 is that if (6) holds uniformly for $\theta$ in some open interval of $\Theta$ then (1) holds for each $\theta$ in that interval. A somewhat weaker conclusion concerning the sufficiency of uniform convergence for (1) has been obtained independently by Rao (1963). The sequence $\{t_n\}$ is said to be superefficient if $v(\theta) \leqq I^{-1}(\theta)$ for all $\theta$ and the inequality is strict for at least one $\theta$. Examples of superefficient estimates were discovered by J. L. Hodges, Jr. (cf. LeCam (1953)).
General studies bearing on superefficiency, using methods different from the present ones, were carried out by LeCam (1953, 1958). An informal discussion along lines similar to those of LeCam was given independently by Wolfowitz (1953). It is shown in LeCam (1953) that if $\{t_n\}$ is superefficient then $v(\theta) = I^{-1}(\theta)$ for almost all $\theta$ in $\Theta$; the following more general conclusion is given in LeCam (1958): PROPOSITION 2. The set of all $\theta$ in $\Theta$ for which (1) does not hold is of Lebesgue measure zero. It was observed by Chernoff (1956) that the asymptotic variance of an estimate is always a lower bound to the asymptotic expected squared error; in view of Proposition 2, this observation yields: PROPOSITION 3. $\lim\inf_{n \rightarrow \infty} \{ nE\lbrack (t_n - \theta)^2 \mid \theta\rbrack\} \geqq I^{-1}(\theta)$ for almost all $\theta$ in $\Theta$. The conclusions stated in this section can be extended to the case when $\theta$ is a $p$ dimensional parameter; a brief account of these extensions is given in Section 3. An extension to sampling frameworks more general than the present one of independent and identically distributed observations is described in Section 4.
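Hodges' superefficient estimator mentioned above is easy to write down: shrink the sample mean to 0 when it is small. At $\theta = 0$ its asymptotic variance is 0, beating $I^{-1}(\theta) = 1$ for $N(\theta, 1)$ observations; by Proposition 2, such points form a set of measure zero. A sketch:

```python
import random

def hodges(xs):
    """Hodges' estimator: the sample mean, truncated to exactly 0 when
    |mean| < n^(-1/4). Superefficient at theta = 0 for N(theta, 1) data."""
    n = len(xs)
    m = sum(xs) / n
    return m if abs(m) >= n ** (-0.25) else 0.0

rng = random.Random(1)
# At theta = 0 with n = 10000, |sample mean| is typically about 0.01,
# far below the threshold n^(-1/4) = 0.1, so the estimate collapses to 0.
sample = [rng.gauss(0.0, 1.0) for _ in range(10000)]
est = hodges(sample)
```

Away from 0 the truncation almost never triggers and the estimator behaves like the sample mean, which is why the superefficiency is confined to the single point $\theta = 0$.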


Journal ArticleDOI
TL;DR: In this paper, a formal solution for the stationary distribution of queue-length at a fixed-cycle traffic light is found for a fairly general distribution of arrivals and for a single stream of vehicles which either all turn left or else all go straight on or turn right.
Abstract: A formal solution for the stationary distribution of queue-length at a fixed-cycle traffic light is found for a fairly general distribution of arrivals and for a single stream of vehicles which either all turn left or else all go straight on or turn right. (We assume that the vehicles are driving on the right of the road.) Some inequalities are derived for the expected queue-length and for the expected delay per vehicle.
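A minimal fixed-cycle simulation (deterministic arrivals and made-up cycle lengths, far simpler than the paper's general arrival distribution) showing the queue clearing each cycle when the green-phase service capacity keeps up with the arrivals per cycle:

```python
def queue_over_cycles(red, green, arrivals_per_step, served_per_green_step, cycles):
    """Queue length at the end of each cycle at a fixed-cycle light.
    Cars arrive every time step; departures occur only during green,
    at most served_per_green_step cars per step."""
    q = 0
    end_of_cycle = []
    for _ in range(cycles):
        for _ in range(red):
            q += arrivals_per_step
        for _ in range(green):
            q += arrivals_per_step
            q = max(0, q - served_per_green_step)
        end_of_cycle.append(q)
    return end_of_cycle

# Stable case: 5 red + 5 green steps, 1 arrival/step, capacity 2/green step,
# so the 10 arrivals per cycle are all served and the queue empties.
stable = queue_over_cycles(5, 5, 1, 2, 4)
# Overloaded case: 2 arrivals/step with the same capacity -> queue grows.
overloaded = queue_over_cycles(5, 5, 2, 2, 3)
```

The contrast between the two runs is the stationarity question the paper's formal solution addresses: a stationary queue-length distribution can only exist when the cycle's service capacity covers the expected arrivals.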

Journal ArticleDOI
TL;DR: In this paper, a solution is given to the problem of determining at which points in the interval −1, 1/brack observations should be taken and what proportion of the observations at each such point so as to minimize the variance of the predicted value of a polynomial regression curve at a specified point beyong the interval observations.
Abstract: A solution is given to the problem of how to determine at which points in the interval $\lbrack -1, 1\rbrack$ observations should be taken and what proportion of the observations should be taken at each such point so as to minimize the variance of the predicted value of a polynomial regression curve at a specified point beyong the interval observations. The solution obtained states that the points are to be chosen to be Chebychev points and the number of observations are to be selected proportional to the absolute value of the corresponding Lagrange polynomial at the specified point. The preceding Chebychev solution becomes the minimax solution for the interval $(-1, t),$ provided $t > t_1 > 1$ where $t_1$ is a value satisfying a certain equation. Under the customary normality assumptions, the Chebychev solution to the prediction problem is used to construct a confidence band for a polynomial curve that will possess minimum width at any specified point beyond the interval of observations.
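The design rule stated in the abstract can be sketched directly, assuming "Chebychev points" refers to the extrema $\cos(j\pi/k)$, $j = 0, \ldots, k$, of the degree-$k$ Chebyshev polynomial on $\lbrack -1, 1\rbrack$; the weight of each point is proportional to the absolute value of its Lagrange basis polynomial at the prediction point $t$. Function and variable names are illustrative.

```python
import numpy as np

def chebyshev_design(degree, t):
    # Support points: the degree+1 Chebyshev points cos(j*pi/degree),
    # j = 0..degree, on [-1, 1].  The proportion of observations at
    # point j is proportional to |L_j(t)|, the j-th Lagrange basis
    # polynomial evaluated at the extrapolation point t.
    k = degree
    x = np.cos(np.pi * np.arange(k + 1) / k)
    L = np.empty(k + 1)
    for j in range(k + 1):
        others = np.delete(x, j)
        L[j] = np.prod((t - others) / (x[j] - others))
    w = np.abs(L)
    return x, w / w.sum()

x, w = chebyshev_design(2, t=2.0)   # quadratic fit, extrapolate to t = 2
# x = [1, 0, -1]; proportions w = [3/7, 3/7, 1/7]
```

Note that the weights load most heavily on the support points nearest the extrapolation point, which is what keeps the predicted value's variance small there.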

Journal ArticleDOI
TL;DR: In this article, the authors present sufficient conditions under which the distribution of the sample average shows increasing peakedness with increasing sample size and provide a comparison of the peakedness of distributions of various convex combinations of sample observations.
Abstract: This paper presents simple sufficient conditions under which the distribution of the sample average shows increasing peakedness with increasing sample size. The results are actually more general, permitting a comparison of the peakedness of distributions of various convex combinations of sample observations.
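The peakedness comparison is easy to see empirically: taking peakedness about 0 in Birnbaum's sense ($X$ is more peaked than $Y$ if $P(|X| \leqq t) \geqq P(|Y| \leqq t)$ for every $t$), the average of four uniform observations dominates the average of two pointwise. This simulation is only an illustration of the conclusion, not the paper's proof; the uniform distribution and sample sizes are arbitrary choices.

```python
import numpy as np

def peakedness_curve(n, ts, reps=200000, seed=0):
    # Empirical P(|average of n obs| <= t) for Uniform(-1, 1) samples.
    rng = np.random.default_rng(seed)
    m = np.abs(rng.uniform(-1.0, 1.0, size=(reps, n)).mean(axis=1))
    return np.array([(m <= t).mean() for t in ts])

ts = np.linspace(0.05, 0.5, 10)
p2 = peakedness_curve(2, ts)
p4 = peakedness_curve(4, ts)
# the 4-observation average is more peaked about 0: p4 >= p2 pointwise
```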

Journal ArticleDOI
TL;DR: In this paper, a distribution of a sum of identically distributed Gamma-variables correlated according to an exponential autocorrelation law is derived, and an approximate distribution of the sum of these variables under the assumption that the sum itself is a Gamma-variable is given.
Abstract: A distribution of a sum of identically distributed Gamma-variables correlated according to an "exponential" autocorrelation law $\rho_{kj} = \rho^{|k - j|}\ (k, j = 1, \cdots, n)$, where $\rho_{kj}$ is the correlation coefficient between the $k$th and $j$th random variables and $0 < \rho < 1$ is a given number, is derived. An "approximate" distribution of the sum of these variables, under the assumption that the sum itself is a Gamma-variable, is given. A comparison between exact and approximate distributions for certain values of the correlation coefficient, the number of variables in the sum, and the values of the parameters of the initial distributions is presented.
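In the spirit of the "approximate" distribution above, one can match the first two moments of the correlated sum with a single Gamma-variable: the sum of $n$ terms, each Gamma with shape $\alpha$ and scale $\beta$ and correlations $\rho^{|k-j|}$, has mean $n\alpha\beta$ and variance $\alpha\beta^2\lbrack n + 2\sum_{d=1}^{n-1}(n-d)\rho^d\rbrack$. The sketch below implements only this moment matching; it is not the paper's exact derivation, and the function name is illustrative.

```python
import numpy as np

def gamma_sum_approx(alpha, beta, rho, n):
    # Each term ~ Gamma(shape=alpha, scale=beta) with
    # corr(X_k, X_j) = rho ** |k - j|.  Match the sum's first two
    # moments with a single Gamma(shape=a, scale=b).
    mean = n * alpha * beta
    d = np.arange(1, n)
    var = alpha * beta ** 2 * (n + 2.0 * np.sum((n - d) * rho ** d))
    b = var / mean            # matched scale
    a = mean / b              # matched shape
    return a, b
```

With $\rho = 0$ this reduces, as it should, to the exact result for independent terms (shape $n\alpha$, scale $\beta$); positive $\rho$ inflates the variance and hence the matched scale.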


Journal ArticleDOI
TL;DR: In this article, the authors present a modification of the general linear model for multivariate analysis, $E(YM) = A\Xi M$, where $Y(N \times p)$ is a matrix which contains all observations, $A(N \times m)$ is the design matrix, and $\Xi(m \times p)$ a matrix of parameters.
Abstract: or less by accident. In this paper we shall be concerned with experiments where variables are missing not by accident, but by design. As an example encountered frequently in psychological research, consider the construction of standardized tests. One phase in the standardization of such tests is the estimation of correlations between parallel forms. If three or more such forms are required, as is frequently the case for tests to be applied on the national level, estimation of correlation coefficients would necessitate the application of all forms to a representative standardization group. The application of more than two forms to the same student may however introduce errors, for recall, learning, or fatigue may seriously influence the results. A given student in the standardization group may receive only two tests, and symmetry suggests that an equal number of students be tested on each pair of examinations. To facilitate the handling of rather general situations, we shall assume a modification of the general linear model for multivariate analysis, $E(YM) = A\Xi M$, where $Y(N \times p)$ is a matrix which contains all observations, $A(N \times m)$ is the design matrix, and $\Xi(m \times p)$ a matrix of parameters. The matrix $M$, of order $(p \times u)$, was introduced by Roy [8] to allow for given linear combinations of variables in the model. It is particularly useful in the present case since, by a suitable array of ones and zeros in the matrix $M$, we can indicate whether or not a particular variable is observed in a given group of subjects. It will be recalled that models for simple and multiple regression and analysis of variance and covariance are special cases of this general linear model. In accordance with customary assumptions made in this model, we shall
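The role of the zero-one matrix $M$ described above can be sketched concretely: for a group of subjects who take only a subset of the $p$ variables, $M$ has one column per observed variable, with a single one in that variable's row, so that post-multiplying the score matrix by $M$ retains exactly the observed columns. The function name below is illustrative, not from the paper.

```python
import numpy as np

def selection_matrix(observed, p):
    # Indicator matrix M (p x u): one column per observed variable,
    # with a single 1 in that variable's row.  Post-multiplying the
    # score matrix Y by M keeps only the observed columns.
    M = np.zeros((p, len(observed)))
    for col, var in enumerate(observed):
        M[var, col] = 1.0
    return M

# Three parallel test forms (p = 3); a group tested on forms 0 and 2:
M = selection_matrix([0, 2], p=3)
```

With one such $M$ per group, the model $E(YM) = A\Xi M$ accommodates the missing-by-design structure without imputing the unobserved scores.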