
Showing papers in "Annals of Mathematical Statistics in 1968"


Journal ArticleDOI
TL;DR: In this paper, it was shown that a family of cumulative distribution functions (cdf's) induces identifiable finite mixtures if and only if the family is linearly independent in its span over the field of real numbers.
Abstract: H. Teicher [5] has initiated a valuable study of the identifiability of finite mixtures (these terms to be defined in the next section), revealing a sufficiency condition that a class of finite mixtures be identifiable and from this, establishing the identifiability of all finite mixtures of one-dimensional Gaussian distributions and all finite mixtures of gamma distributions. From other considerations, he has generalized [4] a result of Feller [1] that arbitrary (and hence finite) mixtures of Poisson distributions are identifiable, and has also shown binomial and uniform families do not generate identifiable mixtures. In this paper it is proven that a family $\mathscr{F}$ of cumulative distribution functions (cdf's) induces identifiable finite mixtures if and only if $\mathscr{F}$ is linearly independent in its span over the field of real numbers. Also we demonstrate that finite mixtures of $\mathscr{F}$ are identifiable if $\mathscr{F}$ is any of the following: the family of $n$ products of exponential distributions, the multivariate Gaussian family, the union of the last two families, the family of one-dimensional Cauchy distributions, and the non-degenerate members of the family of one-dimensional negative binomial distributions. Finally it is shown that the translation-parameter family generated by any one-dimensional cdf yields identifiable finite mixtures.

489 citations
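The non-identifiability of binomial mixtures cited in the abstract (Teicher's result) can be seen numerically. The sketch below is my own illustration, not taken from the paper: since the pmf of a mixture of Binomial(2, p) laws depends only on the first two moments of the mixing distribution over p, two different mixing distributions with matching moments give identical mixtures.

```python
# Illustration (mine, not from the paper): the binomial family does not
# generate identifiable mixtures.  Two distinct two-component mixtures of
# Binomial(2, p) laws share the same pmf because that pmf depends only on
# E[p] and E[p^2] under the mixing distribution.
def binom2_pmf(p):
    """pmf of Binomial(2, p) on {0, 1, 2}."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]

def mixture_pmf(weights, ps):
    comps = [binom2_pmf(p) for p in ps]
    return [sum(w * c[k] for w, c in zip(weights, comps)) for k in range(3)]

# Both mixing distributions have E[p] = 0.5 and E[p^2] = 0.34.
pmf_a = mixture_pmf([0.5, 0.5], [0.2, 0.8])
pmf_b = mixture_pmf([0.36, 0.64], [0.1, 0.725])

assert all(abs(a - b) < 1e-12 for a, b in zip(pmf_a, pmf_b))
```

By contrast, the paper's criterion (linear independence of the cdf's) holds for, e.g., distinct Gaussian components, which is why those mixtures are identifiable.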


Book ChapterDOI
TL;DR: In this paper, the Banach space of bounded continuous real-valued functions on a separable metric space is defined, and the Borel probability measures on the space are defined.
Abstract: Let (S, d) be a separable metric space. Let \(\mathscr{P}(S)\) be the set of Borel probability measures on S. \(C(S)\) denotes the Banach space of bounded continuous real-valued functions on S, with norm $$\left\| f \right\|_\infty = \sup \left\{ \left| f(x) \right| : x \varepsilon S \right\}.$$

324 citations


Journal ArticleDOI
TL;DR: A new method, simpler than previous methods due to Chung (1954) and Sacks (1958), is used to prove a theorem implying all known results on asymptotic normality in stochastic approximation; condition (2.2.4) is weaker than (2.2.3) if $\alpha = 1$ and corresponds to the usual Lindeberg condition, and both are weaker than the corresponding condition (3.4) in Sacks (1958).
Abstract: A new method, simpler than previous methods due to Chung (1954) and Sacks (1958), is used to prove Theorem 2.2 below, which implies in a simple way all known results on asymptotic normality in various cases of stochastic approximation. Two examples of application are concerned with Venter's (1967) extension of the RM method and Fabian's (1967) modification of the KW process. Previously, although there was no difficulty in adopting one or the other method, the proofs in various cases had to be done almost ab initio or skipped leaving a gap (see Venter (1967)). The new proof is similar to that of Chung except that the basic recurrence relation is used to obtain the asymptotic characteristic function rather than limits of all moments. We remark that Lemma 2.1, a simple corollary to Chung's lemma is used only to obtain condition (2.2.4) which is weaker than (2.2.3) if $\alpha = 1$ and which corresponds to the usual Lindeberg condition. Both conditions (2.2.3) and (2.2.4) are weaker than the corresponding condition (3.4) in Sacks (1958). In what follows $(\Omega, \mathscr{S}, P)$ will be a probability space, relations between and convergence of random variables, vectors, and matrices will be meant with probability one unless specified otherwise. We shall write $X_n \sim \mathscr{L}$ if $X_n$ is asymptotically $\mathscr{L}$-distributed and $X_n \sim Y_n$, for two sequences of random vectors, if for any $\mathscr{L}, X_n \sim \mathscr{L}$ if and only if $Y_n \sim \mathscr{L}$. The indicator function of a set $A$ will be denoted by $\chi A$, the expectation and conditional expectation by $E$ and $E_F$, respectively. $R^k$ is the $k$-dimensional Euclidean space the elements of which are considered to be column vectors, $R = R^1, R^{k\times k}$ is the space of all real $k \times k$ matrices. 
The symbols $\mathbf{R}, \mathbf{R}^k, \mathbf{R}^{k\times k}$, denote sets of all measurable transformations from $(\Omega, \mathscr{S})$ to $R, R^k, R^{k\times k}$, respectively. The unit matrix in $R^{k\times k}$ is denoted by $I$ and $|\cdot|$ is the Euclidean norm. With $h_n$ a sequence of numbers, $o(h_n), O(h_n), o_u(h_n), O_u(h_n)$ denote sequences $g_n, G_n, q_n, Q_n$, say, of elements in one of the sets $\mathbf{R}, \mathbf{R}^k, \mathbf{R}^{k\times k}$ such that $h_n^{-1} g_n \rightarrow 0, | h_n^{-1} G_n|\leqq f$ for an $f \varepsilon \mathbf{R}$ and all $n, h_n^{-1} q_n \rightarrow 0$ uniformly on a set of probability one, $| h_n^{-1} Q_n| \leqq K$ for a $K \varepsilon R$ and all $n$. In special cases $o(h_n)$ may be constant on $\Omega$ and considered as a sequence with elements in $R, R^k$ or $R^{k\times k}$. Similarly in other cases. For Chung's lemma, which will be frequently referred to, or used later without reference, see Fabian ((1967), Lemma 4.2); note that it holds with $\beta = 0$, too.

316 citations
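The asymptotic normality described above can be checked by simulation. This is my own toy example, not the paper's: a Robbins-Monro recursion $x_{n+1} = x_n - (1/n)(M(x_n) + \text{noise})$ with regression function $M(x) = 2(x - \theta)$; with step $a_n = 1/n$, $M'(\theta) = 2$ and unit noise variance, the classical theory gives $\sqrt{n}(x_n - \theta)$ asymptotically $N(0, 1/(2M'(\theta) - 1)) = N(0, 1/3)$.

```python
import random

# Simulation sketch (my own example, standard RM asymptotics):
# x_{n+1} = x_n - (1/n) * (2*(x_n - theta) + N(0,1) noise).
# sqrt(n) * (x_n - theta) should be approximately N(0, 1/3).
rng = random.Random(42)
theta, n_steps, reps = 1.5, 2000, 500

def rm_run():
    x = 0.0
    for n in range(1, n_steps + 1):
        x -= (1.0 / n) * (2.0 * (x - theta) + rng.gauss(0, 1))
    return n_steps ** 0.5 * (x - theta)

zs = [rm_run() for _ in range(reps)]
mean = sum(zs) / reps
var = sum((z - mean) ** 2 for z in zs) / reps
# empirical mean near 0 and variance near 1/3, up to Monte Carlo error
assert abs(mean) < 0.1 and 0.15 < var < 0.6
```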


Journal ArticleDOI
TL;DR: In this paper, the requirements concerning the scores-generating function are relaxed to a minimum: it is assumed only that this function is a difference of two non-decreasing and square integrable functions.
Abstract: This is a straightforward continuation of Hajek (1968). We provide a further extension of the Chernoff-Savage (1958) limit theorem. The requirements concerning the scores-generating function are relaxed to a minimum: we assume that this function is a difference of two non-decreasing and square integrable functions. Thus, in contradistinction to Hajek (1968), we dropped the assumption of absolute continuity. The main results are accumulated in Section 2 without proofs. The proofs are given in Sections 4 through 7. Section 3 contains auxiliary results.

284 citations
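A scores-generating function of the kind discussed above enters a linear rank statistic through scores $a_N(i) = \varphi(i/(N+1))$. The following minimal sketch (my own, with $\varphi(u) = u$, which is non-decreasing and square integrable and yields the Wilcoxon test) shows the computation on a tiny two-sample layout with no ties:

```python
# Minimal sketch (mine): a simple linear rank statistic with scores
# a_N(i) = phi(i / (N + 1)); phi(u) = u gives the Wilcoxon statistic.
def linear_rank_statistic(sample_x, sample_y, score=lambda u: u):
    pooled = sorted(sample_x + sample_y)
    N = len(pooled)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # assumes no ties
    # sum scores over the second sample's ranks
    return sum(score(rank[v] / (N + 1)) for v in sample_y)

# y-sample ranks in the pooled ordering are 3, 6, 4, so S = 13/7
S = linear_rank_statistic([1.2, 3.4, 0.5], [2.1, 4.0, 2.5])
assert abs(S - 13 / 7) < 1e-12
```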



Journal ArticleDOI
TL;DR: In this paper, detailed expressions are derived for the upper and lower probabilities of a general fixed closed interval determined by a general random closed interval, with an illustration concerning inference about binomial $p$.
Abstract: Within the class of models producing upper and lower probability systems, as discussed in Dempster (1967a), a simple and important subclass may be characterized by random intervals on the line. Detailed expressions are given here for the upper and lower probabilities of a general fixed closed interval determined by a general random closed interval. Such random closed intervals occur in the applications of the general class of models to statistical inference described in Dempster (1966, 1967b, 1968). The illustration given here concerns inference about binomial $p$ and stresses the flexibility allowed in the introduction of prior information.

270 citations
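A small discrete sketch (my own illustration, not the paper's worked example) of one standard reading of the Dempster (1967a) setup: if an unknown point is only known to lie in a random closed interval, then for a fixed closed interval $[c, d]$ the lower probability is the chance the random interval is contained in $[c, d]$, and the upper probability is the chance it intersects $[c, d]$.

```python
# Discrete sketch (mine): upper/lower probabilities that a point covered
# by a random closed interval lies in the fixed interval [c, d]:
#   lower P = P(random interval contained in [c, d])
#   upper P = P(random interval intersects [c, d])
def lower_upper(intervals, c, d):
    lower = sum(w for (a, b), w in intervals if c <= a and b <= d)
    upper = sum(w for (a, b), w in intervals if a <= d and b >= c)
    return lower, upper

# random interval: [0, 1] w.p. 0.5, [2, 3] w.p. 0.3, [0.5, 2.5] w.p. 0.2
rand_int = [((0.0, 1.0), 0.5), ((2.0, 3.0), 0.3), ((0.5, 2.5), 0.2)]
lo, up = lower_upper(rand_int, 0.0, 2.0)
assert lo == 0.5 and abs(up - 1.0) < 1e-12
```

The gap between `lo` and `up` is exactly the flexibility the framework leaves for prior information.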



Journal ArticleDOI
TL;DR: In this article, it was shown that the two-sided analogue of Slepian's result holds when the correlations are of the form $\lambda_i\lambda_j\rho_{ij}$, and holds locally for equicorrelated variables, but counterexamples show that a complete analogue does not hold in general.
Abstract: For a random vector $(X_1,\cdots, X_k)$ having a $k$-variate normal distribution with zero mean values, Slepian [16] has proved that the probability $P\{X_1 < c_1,\cdots, X_k < c_k\}$ is a non-decreasing function of correlations. The present paper deals with the "two-sided" analogue of this problem, namely, if also the probability $P\{|X_1| < c_1,\cdots, |X_k| < c_k\}$ is a non-decreasing function of correlations. It is shown that this is true in the important special case where the correlations are of the form $\lambda_i\lambda_j\rho_{ij}, \{\rho_{ij}\}$ being some fixed correlation matrix (Section 1), and that it is true locally in the case of equicorrelated variables (Section 3). However, some counterexamples are offered showing that a complete analogue of Slepian's result does not hold in general (Section 4). Some applications of the main positive result are mentioned briefly (Section 2).

213 citations
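The bivariate case of the "two-sided" monotonicity discussed above is easy to probe by Monte Carlo. This is my own sketch (the paper's positive result covers correlations of the form $\lambda_i\lambda_j\rho_{ij}$, which includes the bivariate case): using common random numbers across correlation values, $P\{|X_1| < 1, |X_2| < 1\}$ visibly increases with $\rho$.

```python
import random

# Monte Carlo sketch (mine): P(|X1| < c, |X2| < c) for a bivariate
# normal with correlation rho, using common random numbers so the
# comparison across rho values is sharp.
def two_sided_prob(rho, c=1.0, n=200_000):
    rng = random.Random(12345)          # same stream for every rho
    hits = 0
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        z = rng.gauss(0, 1)
        x2 = rho * x1 + (1 - rho * rho) ** 0.5 * z
        hits += abs(x1) < c and abs(x2) < c
    return hits / n

p0, p9 = two_sided_prob(0.0), two_sided_prob(0.9)
# independence gives P(|Z| < 1)^2 = 0.6827^2 ~ 0.466; rho = 0.9 is larger
assert p9 > p0
```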



Journal ArticleDOI
TL;DR: In this paper, a generalization of the Cornish-Fisher expansion to arbitrary analytic limiting distributions is presented, in which the polynomial terms are represented as sums of products of Hermite polynomials.
Abstract: Let $\{F_n(x)\}$ be a sequence of distribution functions depending on a parameter $n$, and converging to a limiting distribution $\Phi(x)$ as $n$ increases. Then a generalized expansion of Cornish-Fisher type is an asymptotic relation between the quantiles of $F_n$ and $\Phi$. The original Cornish-Fisher formulae [3], [5] provided leading terms of these expansions in the case of normal $\Phi$, expressing a normal deviate in terms of the corresponding quantile of $F_n$ and its cumulants (the "normalizing" expansion) and, conversely, the quantiles of $F_n$ in terms of its cumulants and the corresponding quantiles of $\Phi$ (the "inverse" expansion). The value of both these asymptotic formulae has been well illustrated by their use in approximating the quantiles of complicated distributions (Johnson and Welch [9], Fisher [4], Goldberg and Levine [6]), and for obtaining random quantiles for distribution sampling applications (Teichroew [13], Bol'shev [2]). For a survey of the literature on Cornish-Fisher expansions, and some discussion of their validity, see Wallace ([14], Section 4). In Sections 2, 3 of the present paper, formal expansions are obtained which generalize the Cornish-Fisher relations to arbitrary analytic $\Phi$. Essentially, these expansions provide algorithms for transforming an asymptotic expansion of $F_n$ in terms of the "standard" distribution $\Phi$ into asymptotic relations between the quantiles of these distributions. The "standardizing" expansion of the quantile $u$ of $\Phi$ in terms of the corresponding quantile $x$ of $F_n$ is expressed (Section 2) in terms of a sequence of functions defined by a differential recurrence operator. A similar differential operator appears in the generalized "inverse" expansion for $x$ in terms of $u$ (Section 3), which arises from the application of Lagrange's inversion formula to the equation of quantiles. 
An asymptotic expansion for quantiles of the Wilks likelihood ratio criterion is given as an example. Formal expansions in terms of the cumulants of $F_n$ and $\Phi$ are obtained in Section 4 by developing $F_n$ about $\Phi$ as a Charlier differential series and collecting terms of like degree in the resulting exponential series. For known cumulants and for normal $\Phi$ these formal expressions reduce, as shown in Section 5, to a general form of the Cornish-Fisher expansions, in which the polynomial terms are represented as sums of products of Hermite polynomials. This representation is shown in Section 6 to account for some properties of the Cornish-Fisher polynomials.

202 citations
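The leading term of the classical "inverse" Cornish-Fisher expansion mentioned above is $x \approx u + (u^2 - 1)\gamma_1/6$, where $u$ is the normal quantile and $\gamma_1$ the skewness. A hypothetical worked example (mine, not the paper's): for the standardized Erlang($k$) distribution (sum of $k$ unit exponentials), $\gamma_1 = 2/\sqrt{k}$ and the cdf is closed-form, so the correction can be compared against the exact quantile.

```python
import math

# Worked example (mine): one-term Cornish-Fisher correction for the
# Erlang(k) distribution, whose cdf is 1 - e^{-x} sum_{j<k} x^j / j!.
def erlang_cdf(x, k):
    return 1.0 - math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(k))

def erlang_quantile(p, k, lo=0.0, hi=100.0):
    for _ in range(200):                       # bisection on the cdf
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

k, u = 10, 1.6449                              # u = N(0,1) 0.95-quantile
gamma1 = 2.0 / math.sqrt(k)                    # skewness of Erlang(k)
normal_approx = k + math.sqrt(k) * u
cf_approx = k + math.sqrt(k) * (u + (u * u - 1.0) * gamma1 / 6.0)
exact = erlang_quantile(0.95, k)

# the skewness-corrected quantile beats the plain normal approximation
assert abs(cf_approx - exact) < abs(normal_approx - exact)
```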




Journal ArticleDOI
TL;DR: In this paper, a multivariate approach for the construction of a class of aligned rank order tests for the analysis of variance (ANOVA) problem relating to two-way layouts is presented.
Abstract: 1. Summary and introduction. The present investigation is concerned with the formulation of a multivariate approach for the construction of a class of aligned rank order tests for the analysis of variance (ANOVA) problem relating to two-way layouts. The problems of simultaneous testing and testing for ordered alternatives based on aligned rank order statistics are also considered. Various efficiency results pertaining to the proposed tests are studied. Let us consider a two factor experiment comprising n blocks, each block containing p( > 2) plots receiving p different treatments. In accordance with the two-way ANOVA model, we express the yield Xij of the plot receiving the jth treatment in the ith block as
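The alignment step behind the tests described above can be sketched in a few lines. This is my own minimal illustration of the idea (subtract a block estimate of location, then rank the aligned observations over all $np$ plots jointly), not the paper's notation:

```python
# Sketch (mine) of aligned ranking for a two-way layout X[i][j]
# (block i, treatment j): subtract each block's mean, then rank the
# aligned values over all n*p plots.  Ties are broken by position here;
# midranks are omitted for brevity.
def aligned_ranks(X):
    aligned = [x - sum(row) / len(row) for row in X for x in row]
    order = sorted(range(len(aligned)), key=lambda i: aligned[i])
    ranks = [0] * len(aligned)
    for r, i in enumerate(order):
        ranks[i] = r + 1
    p = len(X[0])
    return [ranks[i * p:(i + 1) * p] for i in range(len(X))]

X = [[10.0, 12.0, 17.0],   # block 1
     [20.0, 23.0, 26.0]]   # block 2
R = aligned_ranks(X)
# all n*p ranks 1..6 are used exactly once across the layout
assert sorted(r for row in R for r in row) == [1, 2, 3, 4, 5, 6]
```

Rank-order statistics built from these aligned ranks are what the abstract's multivariate approach then analyzes.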

Journal ArticleDOI
TL;DR: In this paper, the best upper and lower bounds on the integral $\mu(h)$ of a given function with respect to a probability measure $\mu$ with known generalized moments $\mu(g_j) = y_j$ are studied by a geometrical method.
Abstract: Let $g_1, \cdots, g_n$ and $h$ be given real-valued Borel measurable functions on a fixed measurable space $T = (T, \mathscr{A})$. We shall be interested in methods for determining the best upper and lower bound on the integral $\mu(h) = \int_Th(t)\mu(dt),$ given that $\mu$ is a probability measure on $T$ with known moments $\mu(g_j) = y_j, j = 1, \cdots, n$. More precisely, denote by $\mathscr{M}^+ = \mathscr{M}^+(T)$ the collection of all probability measures on $T$ such that $\mu(|g_j|) < \infty (j = 1, \cdots, n)$ and $\mu(|h|) < \infty$. For each $y = (y_1, \cdots, y_n) \varepsilon R^n$, consider the bounds $L(y) = L(y | h) = \inf \mu(h), U(y) = U(y | h) = \sup \mu(h),$ where $\mu$ is restricted by $\mu \varepsilon \mathscr{M}^+(T); \mu(g_1) = y_1, \cdots, \mu(g_n) = y_n.$ If there is no such measure $\mu$ we put $L(y) = + \infty, U(y) = - \infty$. In many applications, $h$ is the characteristic function (indicator function) $h = I_s$ of a given measurable subset $S$ of $T$. In that case we usually write instead $L(y | I_s) = L_s(y), U(y | I_s) = U_s(y)$. Thus, $L_s(y) \leqq \mu(S) \leqq U_s(y)$ are the best possible bounds on the probability mass $\mu(S)$ contained in $S$, given that $\mu \varepsilon \mathscr{M}^+$ and that $\mu(g) = y$. Here, $g$ denotes the mapping $g:T \rightarrow R^n$ defined by $g(t) = (g_1(t), \cdots, g_n(t))$. By $g_0$ we shall denote the function on $T$ with $g_0(t) = 1$ for all $t \varepsilon T$. The following tentative method for finding $L(y \mid h)$ may be said to go back to Markov [8] and Riesz [13], see [7]. 
Choose an $(n + 1)$-tuple $d^\ast = (d_0, d_1, \cdots, d_n)$ of real numbers such that $d_0 + d_1g_1(t) + \cdots + d_ng_n(t) \leqq h(t) \text{ for all } t \varepsilon T,$ and define $B(d^\ast) = \{z \varepsilon R^n: z = g(t) \text{ for some } t \varepsilon T \text{ with } \sum^n_{j=0} d_jg_j(t) = h(t)\}.$ Then $L(y \mid h) = d_0 + \sum^n_{j=1} d_jy_j \text{ for each } y \varepsilon \operatorname{conv} B(d^\ast),$ ($\operatorname{conv} =$ convex hull). The main purpose of the present paper is to investigate the merits of this method and certain more general methods. It turns out (Theorem 5) that for almost all $y \varepsilon R^n$ there exists at most one admissible $d^\ast$ with $y \varepsilon \operatorname{conv} B(d^\ast)$. Moreover, provided $y \varepsilon \operatorname{int}(V)$ where $V = \operatorname{conv} g(T)$, there exists at least one such $d^\ast$ if and only if there exists a measure $\mu \varepsilon \mathscr{M}^+$ with $\mu(g) = y$ and $\mu(h) = L(y \mid h)$. A sufficient condition for the latter would be that $T$ has a compact topology with respect to which $g$ is continuous and $h$ is lower semi-continuous. More interesting is a related method for finding $L(y \mid h)$, see Theorem 6, which will work for each $y \varepsilon \operatorname{int}(V)$ as soon as $g$ is bounded. The situation where $y \notin \operatorname{int}(V)$ is discussed in Section 4. It appears that the assumption $y \varepsilon \operatorname{int}(V)$ is a rather natural one. We have chosen to develop the important special case $h = I_s$ in a partly independent manner, see the Sections 5, 6, and 7. In this case, the $(n + 1)$-tuple $d^\ast$ must satisfy \begin{align*}d_0 + \sum^n_{j=1} d_jz_j &\leqq 1 \text{ for all } z \varepsilon g(T),\\ &\leqq 0 \text{ for all } z \varepsilon g(S').\end{align*} Here, $S'$ denotes the complement of $S$ in $T$.
Assuming that $d_1, \cdots, d_n$ are not all zero, let us associate to $d^\ast$ the pair of hyperplanes $H$ and $H'$ with equations $\sum^n_{j=1} d_jz_j = 1 - d_0 \text{ and } \sum^n_{j=1} d_jz_j = -d_0,$ respectively. This pair is such that $H, H'$ are distinct parallel hyperplanes with $g(S')$ and $H$ on opposite sides of $H'$ and $g(T)$ and $H'$ on the same side of $H;$ such a pair $H, H'$ will be said to be admissible. Observe that $B(d^\ast) = (g(S) \cap H) \cup (g(S') \cap H'),$ with $H, H'$ as the admissible pair determined by $d^\ast$. The present $(n + 1)$-tuple $d^\ast$ is useful, for determining $L_s(y) = L(y \mid I_s)$ for at least some points $y$, only when both $g(S) \cap H \neq \emptyset$ and $g(S') \cap H' \neq \emptyset$. That is, $H'$ should not only support the set $g(S')$ but even "intersect" it; similarly, $H$ and $g(S)$. Fortunately, one can usually replace "intersect" by "touch". More precisely (Corollary 13), if $H$ and $H'$ form an admissible pair as above then $L_s(y) = d_0 + \sum^n_{j=1} d_jy_j$ for each point $y$ such that both $y \varepsilon \operatorname{int}(V),\quad y \varepsilon \operatorname{conv}\lbrack\{H \cap \overline{\operatorname{conv}}\, g(S)\} \cup \{H' \cap \overline{\operatorname{conv}}\, g(S')\}\rbrack,$ a bar denoting closure. Provided $g$ is bounded the latter generalization will yield the value $L_s(y)$ for all relevant $y$, see Theorem 7. Whether or not $g$ is bounded, we have for almost all $y$ that there can be at most one admissible pair of hyperplanes $H$ and $H'$ yielding $L_s(y)$ in the above manner. A detailed discussion of the method on hand may be found in Section 6. The present method is geometrical in the following sense: (i) one only needs to know the sets $g(S)$ and $g(S')$ in $R^n;$ (ii) afterwards, one considers all the pairs $H$ and $H'$ of parallel hyperplanes touching $g(S)$ and $g(S')$ in the above manner.
Each such pair yields $L_s(y)$ for certain values $y;$ varying the pair $H, H'$ one often obtains the value $L_s(y)$ for all relevant $y \varepsilon R^n$. Usually, there are many different regions in $y$-space, each with its own analytic formula for $L_s(y)$. Nevertheless, all these different formulae are derived from one and the same geometrical principle. A number of specific illustrations, all with $n = 2$, are presented in Section 7. They indicate that it is often quite easy to solve the following problem in a geometric manner. Let $X$ be a random variable taking its values in a measurable space $T$, such that $E(g_1(X)) = y_1,\quad E(g_2(X)) = y_2,$ with $g_1$ and $g_2$ as known real-valued Borel measurable functions on $T$. The problem is to determine the best possible lower bound $L_s(y)$ on $\mathrm{Pr} (X \varepsilon S)$ where $S$ is a given Borel measurable subset of $T$.
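The Markov-Riesz device described above can be made concrete with the textbook Chebyshev-type case (my own standard illustration, using the analogous majorant version for the upper bound): with $n = 2$, $g_1(t) = t$, $g_2(t) = t^2$, $S = \{|t| \geq 1\}$, the quadratic $t^2$ (i.e. $d^\ast = (0, 0, 1)$) dominates $I_s$, giving $U_s(y) \leq y_2$, and the bound is attained by a measure supported where the quadratic touches the indicator, namely on $\{-1, 0, 1\}$.

```python
# Illustration (mine, the classical Chebyshev case of the device above):
# the quadratic d0 + d1*t + d2*t^2 = t^2 majorizes the indicator of
# S = {|t| >= 1}, so P(S) <= E[X^2]; the measure on {-1, 0, 1} below
# attains the bound.
def quadratic(t):
    return t * t                       # d* = (0, 0, 1)

def indicator_S(t):
    return 1.0 if abs(t) >= 1.0 else 0.0

# majorant condition checked on a grid of t values
assert all(quadratic(t) >= indicator_S(t)
           for t in [k / 100.0 for k in range(-300, 301)])

# extremal measure supported on {-1, 0, 1}: mean 0, second moment 0.4
mu = {-1.0: 0.2, 0.0: 0.6, 1.0: 0.2}
y1 = sum(t * w for t, w in mu.items())
y2 = sum(t * t * w for t, w in mu.items())
prob_S = sum(w for t, w in mu.items() if abs(t) >= 1.0)
assert abs(y1) < 1e-12 and abs(y2 - 0.4) < 1e-12
assert abs(prob_S - y2) < 1e-12        # the bound P(S) <= y2 is attained
```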

Journal ArticleDOI
TL;DR: In this article, an empirical stochastic process for two-sample problems is defined and its weak convergence is studied, based upon an identity which relates the two sample empirical process to the more usual one-sample empirical process.
Abstract: An empirical stochastic process for two-sample problems is defined and its weak convergence studied. The results are based upon an identity which relates the two-sample empirical process to the more usual one-sample empirical process. Based on this identity a relatively simple proof of a Chernoff-Savage theorem is obtained. The $c$-sample analogues of these results are also included.


Journal ArticleDOI
TL;DR: In this article, the problem of "estimating" the random variable $u = \sum_i p_i\varphi_i$, the sum of the probabilities of the outcomes unobserved in $n$ trials of an experiment with unknown outcome probabilities, is considered.
Abstract: An experiment has the possible outcomes $E_1, E_2, \cdots$ with unknown probabilities $p_1, p_2, \cdots; p_i \geqq 0, \sum_i p_i = 1$. In $n$ independent trials suppose that $E_i$ occurs $x_i$ times, $i = 1, 2, \cdots$, with $\sum_i x_i = n$. Let $\varphi_i = 1$ or $0$ according as $x_i = 0$ or $x_i \neq 0$. Then the random variable $u = \sum_i p_i\varphi_i$ is the sum of the probabilities of the unobserved outcomes. How can we "estimate" $u$? (The quotation marks appear because $u$ is not a parameter in the usual statistical sense.)
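A simulation sketch of the setup above. The estimator used here, the Good-Turing-style singleton fraction $n_1/n$ ($n_1$ = number of outcomes seen exactly once), is my own choice of illustration and is not claimed to be the paper's proposal; it is a standard device that tracks the unobserved mass $u$.

```python
import math
import random

# Sketch (mine): outcomes i = 1, 2, ... with p_i = 1/(i(i+1)), sampled
# by inversion (i = floor(1/(1-U)) has exactly this law).  The singleton
# fraction n1/n is compared with the true unobserved mass u.
rng = random.Random(7)
n = 5000
counts = {}
for _ in range(n):
    i = math.floor(1.0 / (1.0 - rng.random()))   # P(i) = 1/(i(i+1))
    counts[i] = counts.get(i, 0) + 1

# u = total probability of the outcomes never observed in the n trials
u = 1.0 - sum(1.0 / (i * (i + 1)) for i in counts)
n1_over_n = sum(1 for c in counts.values() if c == 1) / n
assert abs(n1_over_n - u) < 0.02
```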


Journal ArticleDOI
TL;DR: In this article, it is shown that the moments of the normed maximum converge to the corresponding moments of the limiting extremal distribution for any distribution function $F(x)$ in the domain of attraction of an extremal distribution, provided the moments are finite for sufficiently large $n$.
Abstract: Let $Z_n$ be the maximum of $n$ independent identically distributed random variables each having the distribution function $F(x)$. If there exists a non-degenerate distribution function (df) $\Lambda(x)$, and a pair of sequences $a_n, b_n$, with $a_n > 0$, such that \begin{equation*}\tag{1.1}\lim_{n\rightarrow\infty}P\{a_n^{-1}(Z_n - b_n) \leqq x\} = \lim_{n\rightarrow\infty} F^n (a_nx + b_n) = \Lambda(x)\end{equation*} at all points in the continuity set of $\Lambda(x)$, we say that $\Lambda(x)$ is an extremal distribution, and that $F(x)$ lies in its domain of attraction. The possible forms of $\Lambda(x)$ have been completely specified, and their domains of attraction characterized by Gnedenko [5]. These results and their applications are contained in the book by Gumbel [6]. A natural question is whether the various moments of $a_n^{-1} (Z_n - b_n)$ converge to the corresponding moments of the limiting extremal distribution. Sen [9] and McCord [8] have shown that they do for certain distribution functions $F(x)$, satisfying (1.1). Von Mises ([10] pages 271-294) has shown that they do for a wide class of distribution functions having two derivatives for all sufficiently large $x$. In Section 2, the question is answered affirmatively for all distribution functions $F(x)$ in the domain of attraction of any extremal distribution provided the moments are finite for sufficiently large $n$. If there exists a sequence $a_n$ such that \begin{equation*}\tag{1.2}Z_n - a_n \rightarrow 0, \text{i.p.}\end{equation*} we say that $Z_n$ is stable in probability. If \begin{equation*}\tag{1.3}Z_n/a_n \rightarrow 1, \text{i.p.}\end{equation*} we say that $Z_n$ is relatively stable in probability. Necessary and sufficient conditions are well known for stability and relative stability both in probability (see Gnedenko [5]) and with probability one (see Geffroy [4], and Barndorff-Nielsen [1]).
In Section 3 necessary and sufficient conditions are found for $m$th absolute mean stability and relative stability. The results of this work are valid for smallest values as well as for largest values.
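The moment convergence discussed above can be checked exactly in the simplest case (a standard fact, not taken from the paper): for $n$ iid Exp(1) variables, $Z_n - \log n$ converges to the Gumbel law $\Lambda(x) = \exp(-e^{-x})$, and since $E[Z_n] = H_n$ (the $n$th harmonic number), $E[Z_n - \log n] = H_n - \log n \rightarrow \gamma = 0.57721\ldots$, Euler's constant, which is exactly the Gumbel mean.

```python
import math

# Exact check (standard fact): for the max Z_n of n iid Exp(1)
# variables, E[Z_n] = H_n, so E[Z_n - log n] = H_n - log n, which
# converges to Euler's constant, the mean of the Gumbel distribution.
def mean_of_normed_max(n):
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return harmonic - math.log(n)

euler_gamma = 0.5772156649
assert abs(mean_of_normed_max(10 ** 6) - euler_gamma) < 1e-5
```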

Journal ArticleDOI
TL;DR: In this article, it was shown that the value of $m$ minimizing the asymptotic mean square error (AMSE) is of order $n^{\frac{1}{5}}$, yielding an AMSE of order $n^{-\frac{4}{5}}$.
Abstract: Let $x_1 < x_2 < \cdots < x_n$ be an ordered random sample of size $n$ from the absolutely continuous cdf $F(x)$ with positive density $f(x)$ having a continuous first derivative in a neighborhood of the $p$th population quantile $\nu_p (= F^{-1}(p))$. In order to convert the median or any other "quick estimator" [1] into a test we must estimate its variance, or for large samples its asymptotic variance which depends on $1/f(\nu_p)$. Siddiqui [4] proposed the estimator $S_{mn} = n(2m)^{-1}(x_{\lbrack np\rbrack+m} - x_{\lbrack np\rbrack-m+1})$ for $1/f(\nu_p)$, showed it is asymptotically normally distributed and suggested that $m$ be chosen to be of order $n^{\frac{1}{2}}$. In this note we show that the value of $m$ minimizing the asymptotic mean square error (AMSE) is of order $n^{\frac{1}{5}}$ (yielding an AMSE of order $n^{-\frac{4}{5}}$). Our analysis is similar to Rosenblatt's [2] study of a simple estimate of the density function.
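Siddiqui's estimator $S_{mn}$ above is easy to try out by simulation. This sketch (my own setup) uses Uniform(0, 1) samples, for which $1/f(\nu_p) = 1$ at every quantile, and the $m \sim n^{1/5}$ rate recommended in the note:

```python
import random

# Simulation sketch (mine): S_mn = n/(2m) * (x_(k+m) - x_(k-m+1)) for
# the median of Uniform(0,1) samples, where the target 1/f(nu_p) = 1.
# Averaging over replications tames the O(1/sqrt(m)) noise of one run.
rng = random.Random(1)
n, p = 10_000, 0.5
m = round(n ** 0.2)                    # m ~ n^{1/5} as suggested above
reps = []
for _ in range(50):
    x = sorted(rng.random() for _ in range(n))
    k = int(n * p)
    # 0-based indexing: order statistics [np]+m and [np]-m+1
    reps.append(n * (x[k + m - 1] - x[k - m]) / (2 * m))
estimate = sum(reps) / len(reps)
assert abs(estimate - 1.0) < 0.2
```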

Journal ArticleDOI
TL;DR: In this article, the authors prove by elementary means a martingale analogue, with an explicit constant $C_\nu$, of the Marcinkiewicz-Zygmund inequality, which was previously known to hold whenever the $X$'s are independent.
Abstract: We prove the following THEOREM. Let $\{S_n, n \geqq 1\}$ be a martingale, $S_0 = 0, X_n = S_n - S_{n-1}, \gamma_{\nu n} = E(|X_n|^\nu)$ and $\beta_{\nu n} = (1/n) \sum^n_{j = 1} \gamma_{\nu j}$. Then for all $\nu \geqq 2$ and $n = 1, 2, \cdots$ \begin{equation*}\tag{1.1}E(|S_n|^\nu) \leqq C_\nu n^{\nu/2}\beta_{\nu n},\end{equation*} where \begin{equation*}\tag{1.2}C_\nu = \lbrack 8(\nu - 1) \max (1, 2^{\nu - 3})\rbrack^\nu.\end{equation*} As shown by Chung ([3], pp. 348-349) an inequality of Marcinkiewicz and Zygmund ([5], p. 87) implies that the theorem holds (possibly with a different value of $C_\nu$) whenever the $X$'s are independent. In the same way the above theorem is implied by the generalization of the Marcinkiewicz-Zygmund result given by Burkholder ([2], Theorem 9). However, our proof is elementary. Although our choice of $C_\nu$ is not the best possible, it is explicit. For the case of independent $X$'s, von Bahr ([6], p. 817) has given a bound for $E(|S_n|^\nu)$ which may sometimes involve powers of $\beta_{\nu n}$ higher than 1. Finally Doob ([4], Chapter V, Section 7) has treated the case when the $X$'s form a Markov chain. After proving some lemmata in Section 2, we give the proof of the theorem in Section 3. The case of exchangeable random variables is dealt with in Section 4.
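Inequality (1.1) can be checked exactly in a simple case (my own numerical illustration): for $S_n$ a sum of $n$ iid $\pm 1$ signs (a martingale), $\gamma_{\nu j} = 1$, so $\beta_{\nu n} = 1$; with $\nu = 4$ the classical identity $E(S_n^4) = 3n^2 - 2n$ sits comfortably below $C_4 n^2 = 48^4 n^2$.

```python
from math import comb

# Exact check (mine) of (1.1) for the simple random walk martingale:
# S_n = 2*Binomial(n, 1/2) - n, gamma_{4j} = 1, beta_{4n} = 1.
def abs_moment(n, nu):
    return sum(comb(n, k) * 0.5 ** n * abs(2 * k - n) ** nu
               for k in range(n + 1))

C4 = (8 * (4 - 1) * max(1, 2 ** (4 - 3))) ** 4     # = 48^4, from (1.2)
for n in (1, 5, 50):
    m4 = abs_moment(n, 4)
    assert abs(m4 - (3 * n * n - 2 * n)) < 1e-6    # E(S_n^4) = 3n^2 - 2n
    assert m4 <= C4 * n ** 2                       # inequality (1.1)
```

As the abstract notes, the constant is far from best possible; its virtue is that it is explicit.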

Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of determining when a weakly stationary sequence of dependent random variables possesses the central limit property, by which is meant that the sum of the variables, suitably normed, is asymptotically normal in distribution.
Abstract: Considerations on stochastic models frequently involve sums of dependent random variables (rv's). In many such cases, it is worthwhile to know if asymptotic normality holds. If so, inference might be put on a nonparametric basis, or the asymptotic properties of a test might become more easily evaluated for certain alternatives. Of particular interest, for example, is the question of when a weakly stationary sequence of rv's possesses the central limit property, by which is meant that the sum $\sum^n_1 X_i$, suitably normed, is asymptotically normal in distribution. The feeling of many experimenters that the normal approximation is valid in situations "where a stationary process has been observed during a time interval long compared to time lags for which correlation is appreciable" has been discussed by Grenander and Rosenblatt ([10]; 181). (See Section 5 for definitions of stationarity.) The general class of sequences $\{X_i\}_{-\infty}^\infty$ considered in this paper is that whose members satisfy the variance condition \begin{equation*}\tag{1.1}\operatorname{Var} (\sum^{a+n}_{a+1} X_i) \sim nA^2\text{uniformly in} a (n \rightarrow \infty) (A^2 > 0).\end{equation*} Included in this class are the weakly stationary sequences for which the covariances $r_j$ have convergent sum $\sum_1^\infty r_j$. A familiar example is a sequence of mutually orthogonal rv's having common mean and common variance. As a mathematical convenience, it shall be assumed (without loss of generality) that the sequences $\{X_i\}$ under consideration satisfy $E(X_i) \equiv 0$, for the sequences $\{X_i\}$ and $\{X_i - E(X_i)\}$ are interchangeable as far as concerns the question of asymptotic normality under the assumption (1.1). As a practical convenience, it shall be assumed for each sequence $\{X_i\}$ that the absolute central moments $E|X_i - E(X_i)|^\nu$ are bounded uniformly in $i$ for some $\nu > 2$ ($\nu$ may depend upon the sequence).
When (1.1) holds, this is a mild additional restriction and a typical criterion for verifying a Lindeberg restriction ([15]; 295). We shall therefore confine attention to sequences $\{X_i\}$ which satisfy the following basic assumptions (A): \begin{equation*}\tag{A1}E(X_i) \equiv 0,\end{equation*}\begin{equation*}\tag{A2}E(T_a^2) \sim A^2 \text{uniformly in} a (n \rightarrow \infty) (A^2 > 0),\end{equation*}\begin{equation*}\tag{A3}E|X_i|^{2+\delta} \leqq M (\text{for some} \delta > 0 \text{and} M < \infty),\end{equation*} where $T_a$ denotes the normed sum $n^{-\frac{1}{2}} \sum^{a+n}_{a+1} X_i$. Note that the formulations of (A2) and (A3) presuppose (A1). We shall say, under assumptions (A), that a sequence $\{ X_i\}$ has the central limit property (clp), or that $T_1$ is asymptotically normal (with mean zero and variance $A^2$), if \begin{equation*}\tag{1.2}P\{(nA^2)^{-\frac{1}{2}}\sum^n_1 X_i \leqq z\} \rightarrow (2\pi)^{-\frac{1}{2}} \int^z_{-\infty} e^{-\frac{1}{2}t{}^2}dt\quad (n \rightarrow \infty).\end{equation*} The assumptions (A) do not in general suffice for (1.2) to hold. (The reader is referred to Grenander and Rosenblatt ([10]; 180) for examples in which (1.2) does not hold under assumptions (A), one case being a certain strictly stationary sequence of uncorrelated rv's, another case being a certain bounded sequence of uncorrelated rv's.) It is well known, however, that in the case of independent $X_i$'s the assumptions (A) suffice for (1.2) to hold. It is desirable to know in what ways the assumption of independence may be relaxed, retaining assumptions (A), without sacrificing (1.2). Investigators have weakened considerably the moment requirements (A2) and (A3) while retaining strong restrictions on the dependence. However, in many situations of practical interest, assumptions (A) hold but neither strong dependence restrictions nor strong stationarity restrictions seem to apply. 
Thus it is important to have theorems which take advantage of assumptions (A) when they hold, in order to utilize conclusion (1.2) without recourse to severe additional assumptions. A basic theorem in this regard is offered in Section 4. It is unfortunate that the additional assumptions required, while relatively mild, are not particularly amenable to verification, with present theory. This difficulty is alleviated somewhat by the strong intuitive appeal of the conditions. The variety of ways in which the assumption of independence may be relaxed in itself poses a problem. It is difficult to compare the results of sundry investigations in central limit theory because of the ad hoc nature of the suppositions made in each instance. In Section 2 we explore the relationships among certain alternative dependence restrictions, some introduced in the present paper and some already in the literature. Conditions involving the moments of sums $\sum^{a+n}_{a+1} X_i$ are treated in detail in Section 3. The central limit theorems available for sums of dependent rv's embrace diverse areas of application. The results of Bernstein [2] and Loeve [14], [15] have limited applicability within the class of sequences satisfying assumptions (A). A result that is apropos is one of Hoeffding and Robbins [11] for $m$-dependent sequences (defined in Section 2). In addition to assumptions (A1) and (A3), their theorem requires that, defining $A_a^2 = E(X^2_{a+m}) + 2 \sum^m_{j=1} E(X_{a+m-j}X_{a+m}),$ \begin{equation*}\tag{H}\lim_{n\rightarrow\infty} n^{-1}\sum^n_{i = 1} A^2_{a+i} = A^2 \text{exists uniformly in} a (n \rightarrow \infty).\end{equation*} Now it can be shown easily that conditions (A2) and (H) are equivalent in the case of an $m$-dependent sequence satisfying (A1) and (A3). Therefore, a formulation relevant to assumptions (A) is THEOREM 1.1 (Hoeffding-Robbins). If $\{ X_i\}$ is an $m$-dependent sequence satisfying assumptions (A), then it has the central limit property.
In the case of a weakly stationary (with mean zero, say) $m$-dependent sequence, the assumptions of the theorem are satisfied except for (A3), which then is a mild additional restriction. For applications in which the existence of moments is not presupposed, e.g., strictly stationary sequences, Theorem 1.1 has been extended by Diananda [6], [7], [8] and Orey [16] in a series of results reducing the moment requirements while retaining the assumption of $m$-dependence. In the present paper the interest is in extensions relaxing the $m$-dependence assumption. A result of Ibragimov [12] in this regard implies THEOREM 1.2 (Ibragimov). If $\{ X_i\}$ is a strictly stationary sequence satisfying assumptions (A) and regularity condition (I), then it has the central limit property. (Condition (I) is defined in Section 2.) Other extensions under condition (I) but not involving stationarity assumptions are Corollary 4.1.3 and Theorem 7.2 below. See also Rosenblatt [17]. Other extensions for strictly stationary sequences, further reducing the dependence restrictions, appear in [12] and [13] and Sections 5 and 6 below. Section 2 is devoted to dependence restrictions. The restrictions (2.1), (2.2) and (2.3), later utilized in Theorem 4.1, are introduced and shown to be closely related to assumptions (A). Although conditional expectations are involved in (2.2) and (2.3), the restrictions are easily interpreted. It is found, under assumptions (A), that if (2.3) is sufficiently stringent, then (2.1) holds in a stringent form (Theorem 2.1). A link between regularity assumptions formulated in terms of joint probability distributions and those involving conditional expectations is established by Theorem 2.2 and corollaries. Implications of condition (I) are given in Theorem 2.3. Section 3 is devoted to the particular dependence restriction (2.1). Theorem 3.1 gives, under assumptions (A), a condition necessary and sufficient for (2.1) to hold in the most stringent form, (3.1). 
The remaining sections deal largely with central limit theorems. Section 4 obtains the basic result and its general implications. Sections 5, 6 and 7 exhibit particular results for weakly stationary sequences, sequences of martingale differences and bounded sequences. NOTATION AND CONVENTIONS. We shall denote by $\{ X_i\}^\infty_{-\infty}$ a sequence of rv's defined on a probability space. Let $\mathscr{M}_a ^b$ denote the $\sigma$-algebra generated by events of the form $\{(X_{i_1},\cdots, X_{i_k}) \varepsilon E\}$, where $a - 1 < i_1 < \cdots < i_k < b + 1$ and $E$ is a $k$-dimensional Borel set. We shall denote by $\mathscr{P}_a$ the $\sigma$-algebra $\mathscr{M}^a_{-\infty}$ of "past" events, i.e., generated by the rv's $\{ X_a, X_{a-1},\cdots\}$. Conditional expectation given a subfield $\mathscr{B}$ will be represented by $E(\cdot\mid\mathscr{B}),$ which is to be regarded as a function measurable ($\mathscr{B}$). All expectations will be assumed finite whenever expressed.
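For orientation, the "central limit property" invoked in Theorems 1.1 and 1.2 can be sketched as follows; the precise display (1.2) lies in a portion of the paper not reproduced here, so the normalization shown is the standard one and should be read as an assumption rather than a quotation:

```latex
% Sketch (assumed normalization): a sequence {X_i} of zero-mean rv's has the
% central limit property if its normalized partial sums are asymptotically
% standard normal, uniformly in the starting index a:
\[
  \frac{1}{B_{a,n}} \sum_{i=a+1}^{a+n} X_i \xrightarrow{\;d\;} N(0,1),
  \qquad
  B_{a,n}^2 = E\Bigl(\sum_{i=a+1}^{a+n} X_i\Bigr)^{\!2}.
\]
```

Under $m$-dependence, condition (H) makes $n^{-1}B_{a,n}^2$ converge to the same constant $A^2$ uniformly in $a$, which is why (H) substitutes for assumption (A2) in Theorem 1.1.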


Journal ArticleDOI
TL;DR: In this article, arbitrary-state, finite-action Markovian decision processes are studied with respect to the (long-run) average cost criterion, and sufficient conditions are given for the existence of an optimal rule and for it to be of stationary deterministic type.
Abstract: Arbitrary state, finite action Markovian decision processes are studied with respect to the (long-run) average cost criterion. The problem is treated both as a limiting case of the discounted cost problem and as a limit of the n-stage problem. Sufficient conditions are given for the existence of an optimal rule and for it to be of stationary deterministic type.
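As an illustrative aside (not taken from the paper), the average cost criterion and a stationary deterministic optimal rule can be exhibited on a tiny finite example. The two-state, two-action process below, its transition probabilities, its costs, and the use of relative value iteration are all invented for this sketch:

```python
# Hypothetical 2-state, 2-action Markovian decision process.
# P[a][s][t] = transition probability s -> t under action a; c[a][s] = one-step cost.
P = {0: [[0.9, 0.1], [0.2, 0.8]],
     1: [[0.5, 0.5], [0.7, 0.3]]}
c = {0: [1.0, 3.0],
     1: [2.0, 0.5]}

h = [0.0, 0.0]                        # relative value function
for _ in range(1000):                 # relative value iteration
    q = [[c[a][s] + sum(P[a][s][t] * h[t] for t in range(2))
          for s in range(2)] for a in range(2)]
    h_new = [min(q[0][s], q[1][s]) for s in range(2)]
    gain = h_new[0]                   # normalize at reference state 0
    h = [v - gain for v in h_new]

# The minimizing action in each state defines a stationary deterministic
# rule; `gain` approximates the optimal long-run average cost.
policy = [0 if q[0][s] <= q[1][s] else 1 for s in range(2)]
```

At convergence the pair $(g, h)$ satisfies the average-cost optimality equation $\min_a \{c(s,a) + \sum_t P(t\mid s,a)h(t)\} = g + h(s)$, the finite-state analogue of the optimality conditions studied in the paper.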

Journal ArticleDOI
TL;DR: The zonal polynomials of the positive definite real symmetric matrices, which appear in the expansions of the functions occurring in many multivariate non-null distributions and moment formulae, are eigenfunctions of the Laplace-Beltrami operator, as mentioned in this paper.
Abstract: The zonal polynomials of the positive definite real symmetric matrices, which appear in the expansions of the functions occurring in many multivariate non-null distributions and moment formulae, are eigenfunctions of the Laplace-Beltrami operator. The resulting differential equation gives a recurrence relation between the coefficients from which they can be calculated.


Journal ArticleDOI
TL;DR: In this article, a nonparametric approach to the problem of testing for a shift in the level of a process occurring at an unknown time point when a fixed number of observations are drawn consecutively in time is presented.
Abstract: This work is an investigation of a nonparametric approach to the problem of testing for a shift in the level of a process occurring at an unknown time point when a fixed number of observations are drawn consecutively in time. We observe successively the independent random variables $X_1, X_2, \cdots, X_N$ which are distributed according to the continuous cdf $F_i, i = 1, 2, \cdots, N$. An upward shift in the level shall be interpreted to mean that the random variables after the change are stochastically larger than those before. Two versions of the testing problem are studied. The first deals with the case when the initial process level is known and the second when it is unknown. In the first case, we make the simplifying assumption that the distributions $F_i$ are symmetric before the shift and introduce the known initial level by saying that the point of symmetry $\gamma_0$ is known. Without loss of generality, we set $\gamma_0 = 0$. Defining a class of cdf's $\mathscr{F}_0 = \{F:F$ continuous, $F$ symmetric about origin$\}$, the problem of detecting an upward shift becomes that of testing the null hypothesis $H_0:F_0 = F_1 = \cdots = F_N,\quad\text{some}\quad F_0 \varepsilon\mathscr{F}_0,$ against the alternative $H_1:F_0 = F_1 = \cdots = F_m > F_{m + 1} = \cdots = F_N,\quad\text{some}\quad F_0 \varepsilon\mathscr{F}_0$ where $m(0 \leqq m \leqq N - 1)$ is unknown and the notation $F_m > F_{m + 1}$ indicates that $X_{m + 1}$ is stochastically larger than $X_m$. For the second situation with unknown initial level, the problem becomes that of testing the null hypothesis $H_0^\ast:F_1 = \cdots = F_N$, against the alternatives $H_1^\ast: F_1 = \cdots = F_m > F_{m + 1} = \cdots = F_N$, where $m(1 \leqq m \leqq N - 1)$ is unknown. Here the distributions are not assumed to be symmetric. The testing problem in the case of known initial level has been considered by Page [11], Chernoff and Zacks [2] and Kander and Zacks [7]. 
Assuming that the observations are initially from a symmetric distribution with known mean $\gamma_0$, Page proposes a test based on the variables $\operatorname{sgn} (X_i - \gamma_0)$. Chernoff and Zacks assume that the $F_i$ are normal cdf's with constant known variance and they derive a test for shift in the mean through a Bayesian argument. Their approach is extended to the one parameter exponential family of distributions by Kander and Zacks. Except for the test based on signs, all the previous work lies within the framework of parametric statistics. The second formulation of the testing problem, the case of unknown initial level, has not been treated in detail. The only test proposed thus far is the one derived by Chernoff and Zacks for normal distributions with constant known variance. In both problems, our approach generally is to find optimal invariant tests for certain local shift alternatives and then to examine their properties. Our optimality criterion is the maximization of local average power where the average is over the space of the nuisance parameter $m$ with respect to an arbitrary weighting $\{q_i, i = 1, 2, \cdots, N: q_i \geqq 0, \sum^N_{i = 1} q_i = 1\}$. From the Bayesian viewpoint, $q_i$ may be interpreted as the prior probability that $X_i$ is the first shifted variate. Invariant tests with maximum local average power are derived for the case of known initial level in Section 2 and for the case of unknown initial level in Section 3. In both cases, the tests are distribution-free and they are unbiased for general classes of shift alternatives. They all depend upon the weight function $\{q_i\}$. With uniform weights, certain tests in Section 3 reduce to the standard tests for trend while a degenerate weight function leads to the usual two sample tests. In Section 4, we obtain the asymptotic distributions of the test statistics under local translation alternatives and investigate their Pitman efficiencies. 
Some small sample powers for normal alternatives are given in Section 5.
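As an illustrative aside, the flavor of a distribution-free test in the known-initial-level case can be seen in a short Monte Carlo sketch. The statistic below (signs of the observations about $\gamma_0 = 0$, weighted more heavily at later times) is only in the spirit of the tests discussed, not one of the paper's derived statistics; the sample size, shift point, shift size, and weights are all invented:

```python
import random

random.seed(1)
N, REPS = 50, 2000

def stat(x):
    # Weighted sum of signs about the known level gamma_0 = 0; weight (i + 1)
    # favors alternatives in which the later observations shift upward.
    return sum((i + 1) * (1 if v > 0 else -1) for i, v in enumerate(x))

# Null: all N observations symmetric about 0.
null_vals = [stat([random.gauss(0, 1) for _ in range(N)]) for _ in range(REPS)]
# Alternative: upward shift of one unit after the (unknown) time m = 25.
alt_vals = [stat([random.gauss(0, 1) + (1.0 if i >= 25 else 0.0)
                  for i in range(N)]) for _ in range(REPS)]

crit = sorted(null_vals)[int(0.95 * REPS)]      # approximate 5% critical value
power = sum(v > crit for v in alt_vals) / REPS  # estimated power under the shift
```

Because only signs enter the statistic, its null distribution does not depend on the particular symmetric $F_0$, which is the distribution-free property the abstract emphasizes.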

Journal ArticleDOI
TL;DR: In this article, a general method for obtaining asymptotically pointwise optimal procedures in sequential analysis when the cost of observation is constant was introduced, and the validity of this method in both estimation and testing was established both for Koopman-Darmois families and for the general case.
Abstract: In [4] we introduced a general method for obtaining asymptotically pointwise optimal procedures in sequential analysis when the cost of observation is constant. The validity of this method in both estimation and testing was established in [4] for Koopman-Darmois families, and in [5] for the general case. Section 2 of this paper generalizes Theorem 2.1 of [4] to cover essentially the case of estimation with variable cost of observation. In Section 3 we show that in estimation problems, under a very weak condition, for constant cost of observation, the asymptotically pointwise optimal rules we propose are optimal in the sense of Kiefer and Sacks [9]. The condition given is further investigated in the context of Bayesian sequential estimation in Section 4 and is shown to be satisfied if reasonable estimates based on the method of moments exist. In Section 5 we consider the robustness of our rules under a change of prior. The main result of this section is given by Theorem 5.1. Finally, Theorem 5.2 deals with a generalization of Wald's [12] theory of asymptotically minimax rules and an application of that theory to the Bayesian model.

Journal ArticleDOI
TL;DR: In this paper, it was shown that the asymptotic almost sure equivalence of the standardized forms of a sample quantile and the empirical distribution function at the corresponding population quantile, established for a stationary independent process, extends to an $m$-dependent process, not necessarily stationary.
Abstract: The usual technique of deriving the asymptotic normality of a quantile of a sample in which the random variables are all independent and identically distributed [cf. Cramer (1946), pp. 367-369] fails to provide the same result for an $m$-dependent (and possibly non-stationary) process, where the successive observations are not independent and the (marginal) distributions are not necessarily all identical. For this reason, the derivation of the asymptotic normality is approached here indirectly. It is shown that under certain mild restrictions, the asymptotic almost sure equivalence of the standardized forms of a sample quantile and the empirical distribution function at the corresponding population quantile, studied by Bahadur (1966) [see also Kiefer (1967)] for a stationary independent process, extends to an $m$-dependent process, not necessarily stationary. Conclusions about the asymptotic normality of sample quantiles then follow by utilizing this equivalence in conjunction with the asymptotic normality of the empirical distribution function under suitable restrictions. For this purpose, the results of Hoeffding (1963) and Hoeffding and Robbins (1948) are extensively used. Useful applications of the derived results are also indicated.
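As an illustrative aside, the Bahadur-type equivalence that the abstract extends can be checked numerically in the simplest iid setting (the paper's contribution is the $m$-dependent, possibly non-stationary case, which this sketch does not attempt). The N(0,1) model, sample size, and replication count are invented for the illustration:

```python
import math
import random

random.seed(2)
n, reps = 2000, 300
f0 = 1.0 / math.sqrt(2 * math.pi)          # N(0,1) density at its median, 0

pairs = []
for _ in range(reps):
    x = sorted(random.gauss(0, 1) for _ in range(n))
    med = 0.5 * (x[n // 2 - 1] + x[n // 2])      # sample median
    fn0 = sum(v <= 0 for v in x) / n             # empirical df at the median
    # Standardized sample quantile vs. standardized empirical df:
    pairs.append((math.sqrt(n) * med, -math.sqrt(n) * (fn0 - 0.5) / f0))

# Sample correlation between the two standardized quantities; near 1 if the
# almost sure equivalence holds up to a negligible remainder.
mx = sum(a for a, _ in pairs) / reps
my = sum(b for _, b in pairs) / reps
cov = sum((a - mx) * (b - my) for a, b in pairs) / reps
vx = sum((a - mx) ** 2 for a, _ in pairs) / reps
vy = sum((b - my) ** 2 for _, b in pairs) / reps
corr = cov / math.sqrt(vx * vy)
```

The Bahadur (1966) remainder is of order $n^{-1/4}$ up to logarithmic factors, so at this sample size the two standardized quantities are nearly indistinguishable, consistent with the asymptotic normality conclusion drawn in the abstract.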