
Showing papers in "Annals of Mathematical Statistics" in 1971


Journal Article
TL;DR: In this paper, a problem of optimal stopping is formulated and simple rules are proposed which are asymptotically optimal in an appropriate sense. The problem is of central importance in quality control and also has applications in reliability theory.
Abstract: A problem of optimal stopping is formulated and simple rules are proposed which are asymptotically optimal in an appropriate sense. The problem is of central importance in quality control and also has applications in reliability theory and other areas.

1,346 citations
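The abstract does not specify the rules. A standard example of a simple stopping rule for reacting to a change in distribution is Page's CUSUM procedure, sketched below under the assumption of unit-variance normal observations whose mean may shift from `mu0` to `mu1`. The threshold `h` and the function name are illustrative choices, not taken from the paper.

```python
def cusum_stopping_time(xs, mu0, mu1, h):
    """Return the first index n (1-based) at which the CUSUM statistic
    reaches h, or None if it never does.

    Assumes unit-variance normal observations whose mean may shift from
    mu0 to mu1; the log-likelihood ratio of one observation is then
    (mu1 - mu0) * (x - (mu0 + mu1) / 2).
    """
    w = 0.0  # CUSUM statistic, reset to 0 whenever it would go negative
    for n, x in enumerate(xs, start=1):
        llr = (mu1 - mu0) * (x - (mu0 + mu1) / 2.0)
        w = max(0.0, w + llr)
        if w >= h:
            return n
    return None

print(cusum_stopping_time([0, 0, 0, 1, 1, 1, 1], 0.0, 1.0, 2.0))  # 7
```

Before the change each observation pulls the statistic toward 0; after it, each adds 0.5, so the alarm is raised at the fourth post-change observation.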


Journal Article
TL;DR: In this paper, two very closely related definitions of robustness of a sequence of estimators are given which take into account the types of deviations from parametric models that occur in practice.
Abstract: Two very closely related definitions of robustness of a sequence of estimators are given which take into account the types of deviations from parametric models that occur in practice. These definitions utilize the properties of the Prokhorov distance between probability distributions. It is proved that weak$^\ast$-continuous functionals on the space of probability distributions define robust sequences of estimators (in either sense). The concept of the "breakdown point" of a sequence of estimators is defined, and some examples are given.

949 citations
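The breakdown point in the abstract is an asymptotic notion defined through the Prokhorov distance. A common finite-sample simplification, assumed here, asks what fraction of the sample can be replaced by arbitrary values before the estimate can be dragged arbitrarily far. The helper names and the crude "arbitrarily far" threshold are hypothetical.

```python
import statistics

def breaks_down(estimator, sample, m, big=1e12):
    """Replace the first m points by `big` and report whether the estimate
    follows it (a crude stand-in for 'driven arbitrarily far')."""
    corrupted = [big] * m + list(sample[m:])
    return abs(estimator(corrupted)) > big / (10 * len(sample))

def empirical_breakdown(estimator, sample):
    """Smallest fraction m/n of replaced points that breaks the estimator."""
    n = len(sample)
    for m in range(1, n + 1):
        if breaks_down(estimator, sample, m):
            return m / n
    return 1.0

sample = list(range(1, 12))  # n = 11
print(empirical_breakdown(statistics.mean, sample))    # 1/11, about 0.091
print(empirical_breakdown(statistics.median, sample))  # 6/11, about 0.545
```

A single wild point moves the mean without bound, while the median tolerates corruption of just under half the sample.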


Journal Article
TL;DR: In this paper, an invariance principle (functional CLT) is proved for a class of martingales under a martingale version of the classical Lindeberg condition; for sums of independent random variables (rv's) the results reduce to the conventional invariance principle in the setting of the Lindeberg-Feller CLT.
Abstract: The classical Lindeberg-Feller CLT for sums of independent random variables (rv's) provides more than the convergence in distribution of the sum to a normal law. The independence of summands also guarantees the weak convergence of all finite dimensional distributions of an a.e. sample continuous stochastic process (suitably defined in terms of the partial sums) to those of a Gaussian process with independent increments, namely, the Wiener process. Moreover, the distributions of said process converge weakly to Wiener measure on $C\lbrack 0, 1\rbrack$, the latter result being known as an invariance principle, or functional CLT, an idea originating with Erdos and Kac [10] and Donsker [5], then developed by Billingsley, Prohorov, Skorohod and others. The present work contains an invariance principle for a certain class of martingales, under a martingale version of the classical Lindeberg condition. In the case of sums of independent rv's, our results reduce to the conventional invariance principle (see, for example, Parthasarathy [16]) in the setting of the classical Lindeberg-Feller CLT. Theorem 1 contains a type of martingale characteristic function convergence which is strictly analogous to the classical CLT, while Theorem 2 provides weak convergence of finite dimensional distributions to those of a Wiener process, followed by (Theorem 3) the weak convergence of corresponding induced measures on $C\lbrack 0, 1\rbrack$ to Wiener measure, thus entailing an invariance principle for martingales. Notation and results are listed in Section 2. Section 3 defines the Lindeberg condition for martingales and gives it several equivalent forms. Sections 4 and 5 contain the proofs of Theorems 1 and 2, respectively, while Theorem 3 is proved in Section 6 by use of a martingale inequality derived from an upcrossing inequality of Doob [7]. Section 7 contains brief remarks. 
Among the large literature on CLT's for sums of dependent rv's, mention of a martingale CLT is first made by Levy [12], [13], followed by Doob [6], page 383. Billingsley [2] and Ibragimov [11] gave a version for stationary ergodic martingales, and Csorgo [4] considered related problems. The author also knows of Dvoretzky [8]. Invariance principles for various dependent rv's were found by Billingsley [1], and in [3] for stationary ergodic martingales, the latter result being given by Rosen [17] for bounded summands. The present Theorem 3 relaxes the stationarity and ergodicity requirements of Billingsley's Theorem (23.1) of [3]. Since preparing the original version of this paper, the author's attention has been drawn to Dvoretzky [9], which announces a result strongly resembling the present Theorem 2, and to the CLT [14] and invariance principle [15] for reversed martingales, due to Loynes. The methods of Section 5 are owed to Billingsley [2] (who in turn acknowledges a "debt to Levy") and to ideas given by Dvoretzky [8]. Finally, I wish to thank Dr. C. C. Heyde for making a vital remark and Mr. David Scott for helpful criticism.

688 citations
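In the classical independent case, the "a.e. sample continuous stochastic process (suitably defined in terms of the partial sums)" is usually taken to be the polygonal interpolation of the normalized partial sums. The sketch below evaluates that process at a time $t$, assuming unit-variance summands so that the normalization is $\sqrt{n}$; for martingales the paper's normalization may instead involve conditional variances, which this sketch does not model.

```python
import math

def partial_sum_process(xs, t):
    """Polygonal partial-sum process at time t in [0, 1]:
    X_n(t) = (S_k + (n*t - k) * x_{k+1}) / sqrt(n), k = floor(n*t),
    where S_k = x_1 + ... + x_k (unit-variance summands assumed)."""
    n = len(xs)
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    k = math.floor(n * t)
    s = sum(xs[:k])
    if k < n:
        s += (n * t - k) * xs[k]  # interpolate linearly within step k + 1
    return s / math.sqrt(n)

print(partial_sum_process([1, -1, 1, -1], 0.375))  # 0.25
```

Because the path is continuous in $t$, its distribution lives on $C\lbrack 0, 1\rbrack$, which is where the invariance principle compares it with Wiener measure.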


Journal Article
TL;DR: In this paper, Chentsov-Billingsley type fluctuation inequalities are derived for stochastic processes whose time parameter ranges over the $q$-dimensional unit cube and are used to establish weak convergence results for such processes.
Abstract: Chentsov-Billingsley type fluctuation inequalities for stochastic processes whose time parameter ranges over the $q$-dimensional unit cube are derived and used to establish weak convergence results for such processes.

517 citations



Journal Article
TL;DR: Raiffa and Schlaifer's theory of conjugate prior distributions is applied to Jeffreys's theory of tests for a sharp hypothesis, for simple normal sampling, for model I analysis of variance, and for univariate and multivariate Behrens-Fisher problems.
Abstract: Raiffa and Schlaifer's theory of conjugate prior distributions is here applied to Jeffreys's theory of tests for a sharp hypothesis, for simple normal sampling, for model I analysis of variance, and for univariate and multivariate Behrens-Fisher problems. Leonard J. Savage's Bayesianization of Jeffreys's theory is given with new generalizations. A new conjugate prior family for normal sampling which allows prior independence of the unknown mean and variance is given.

385 citations




Journal Article
TL;DR: In this paper, an estimate of the regression parameter vector in the multiple regression setup is constructed from suitable rank statistics; the asymptotic linearity of these rank statistics is established and the asymptotic multi-normality of the derived estimates is deduced.
Abstract: 1. Summary and introduction. The present investigation extends [7] to a class of multiple regression problems and is devoted to the construction of an estimate of the regression parameter vector based on suitable rank statistics. Asymptotic linearity of these rank statistics in the multiple regression setup is established, and the asymptotic multi-normality of the derived estimates is deduced. For every basic distribution there is a choice of the score-generating function such that the asymptotic distribution of the estimates is the same as that of the maximum likelihood estimates.

326 citations
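The abstract gives no formulas. As a minimal sketch, assume simple linear regression (one regressor) and Wilcoxon scores, one particular score-generating function; the slope estimate is taken as the point where the rank statistic $S(b) = \sum_i (x_i - \bar{x}) R_i(b)$ changes sign, $R_i(b)$ being the rank of the residual $y_i - b x_i$. The function names and the search bracket are hypothetical; the paper itself treats a parameter vector and general scores.

```python
def wilcoxon_rank_statistic(x, y, b):
    """S(b) = sum_i (x_i - mean(x)) * R_i(b), where R_i(b) is the rank of the
    residual y_i - b * x_i (Wilcoxon scores; ties broken by position)."""
    n = len(x)
    xbar = sum(x) / n
    resid = [yi - b * xi for xi, yi in zip(x, y)]
    order = sorted(range(n), key=lambda i: resid[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    return sum((xi - xbar) * ri for xi, ri in zip(x, ranks))

def rank_slope_estimate(x, y, lo=-1e6, hi=1e6, iters=200):
    """Slope at which S(b) changes sign, found by bisection on [lo, hi].

    S(b) is nonincreasing in b: raising b lowers the residuals of points
    with larger x, which moves their ranks down."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if wilcoxon_rank_statistic(x, y, mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(rank_slope_estimate([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))  # approximately 2.0
```

Ranks are unchanged when a constant is added to every residual, so the intercept drops out and only the slope is estimated.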



Journal Article
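TL;DR: In this article, the space $D\lbrack 0, 1\rbrack$ is generalized to $k$ time dimensions; a Skorohod metric, tightness criteria and weak convergence results are given for the resulting space $D_k$ and applied to generalizations of the one-dimensional empirical process and to the Kolmogorov-Smirnov test statistic of independence.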
Abstract: The well-known space $D\lbrack 0, 1\rbrack$ is generalized to $k$ time dimensions and some properties of this space $D_k$ are derived. Then, following the "classical" lines as presented in Billingsley [1], a Skorohod metric, tightness criteria and some other results concerning weak convergence are given. The theory is applied to prove weak convergence of two generalizations of the one-dimensional empirical process and of the Kolmogorov-Smirnov test statistic of independence. Stochastic processes with multidimensional time parameter and their weak convergence have been investigated by several authors. Dudley [4] established a theory of convergence of stochastic processes with sample functions in nonseparable metric spaces. Later on, Wichura [11] (see also Wichura [12]) modified the concepts of Dudley and developed them systematically. He applied his theory to a space which is, up to minor changes, our space $D_k$. Weak convergence in the sense of Wichura [12] and weak convergence in our sense usually differ, but the two concepts coincide if the limit process has, with probability one, only continuous sample functions. It follows that the results of Dudley and Wichura concerning weak convergence of multivariate empirical processes are equivalent to ours. At least two further authors proved the convergence of multivariate empirical processes, namely LeCam [8] and Bickel [1]. Our proof follows the classical approach of Parthasarathy [9], using an argument of Kuelbs [7] to carry over the proof from 1 to $k$ dimensions. Kuelbs, however, deals properly with the "interpolated sum" process for two-dimensional time parameter. The space $D_k$ seems to have been defined for the first time in connection with multivariate processes by Winkler [13], yet his investigations are not concerned with weak convergence. 
Another generalization of the space $D\lbrack 0, 1\rbrack$ and the Skorohod metric to functions on more general spaces than $E_k$ is given in the paper [10] of Straf, in which there are applications to genuinely discontinuous limit processes.

Journal Article
TL;DR: In this paper, conditions are established under which maximum likelihood estimators are consistent and asymptotically normal when the observations are independent but not identically distributed.
Abstract: Conditions are established under which maximum likelihood estimators are consistent and asymptotically normal in the case where the observations are independent but not identically distributed. The key concept employed is uniform integrability; the required convergence theorems, which involve uniform integrability and are of independent interest, appear in the appendix. A motivational example involving estimation under variable censoring is presented. This example invokes the full generality of the theorems with regard to the lack of i.i.d. structure and the lack of densities with respect to Lebesgue or counting measure.
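The abstract mentions estimation under variable censoring without giving the model. Assuming, for illustration only, exponential lifetimes with a common rate and fixed, item-specific censoring times, the observations are independent but not identically distributed, and each has an atom at its own censoring time, so it has no density with respect to Lebesgue or counting measure alone. In this simple model the MLE has a closed form; the function name is hypothetical.

```python
def exponential_mle_censored(times, observed):
    """MLE of a common exponential failure rate from right-censored data.

    times[i] is a failure time if observed[i] is True, otherwise the item's
    own censoring time. The log-likelihood is d*log(rate) - rate*T, with d the
    number of observed failures and T the total time on test, so the
    maximizer is d / T."""
    d = sum(1 for o in observed if o)
    total_time = sum(times)
    if total_time <= 0:
        raise ValueError("total time on test must be positive")
    return d / total_time

print(exponential_mle_censored([1.0, 2.0, 3.0], [True, False, True]))  # 2 / 6
```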

Journal Article
TL;DR: In this article, a new and much simpler proof is given of a weaker version of Bahadur's representation of sample quantiles; the weaker version suffices for many statistical applications, and the proof involves fewer assumptions.
Abstract: Let $\{X_i\}$ be a sequence of independent random variables with the same distribution function $F(x) = \mathrm{Pr}\{X_i \leqq x\}$. Let $F(M_p) = p$, $0 < p < 1$, let $Y_{p,n}$ denote the sample $p$-quantile of $X_1,\cdots, X_n$, and let $G_n(x)$ denote the proportion of $X_1,\cdots, X_n$ that are $> x$. Bahadur (1966) has proved \begin{equation*}\tag{1}Y_{p,n} = M_p + \lbrack G_n(M_p) - (1 - p)\rbrack/F'(M_p) + R_n\end{equation*} where the remainder term $R_n$ becomes negligible as $n \rightarrow \infty$. More precisely, he has shown $R_n = O(n^{-\frac{3}{4}} \log n)$ a.s. as $n \rightarrow \infty$. The best result of this type is due to Kiefer (1967), who has calculated the exact order of $R_n$. Sen (1968) has extended Bahadur's result to random variables which are neither independent nor identically distributed. We shall give a new and much simpler proof of a weaker version of Bahadur's result which suffices for many statistical applications. Our proof involves fewer assumptions than Bahadur's. For arbitrary $p_n$ let $M_{p_n}$ be defined as $M_p + (p_n - p)/F'(M_p)$. Consider \begin{equation*}\tag{2}Y_{p_n,n} = M_{p_n} + \lbrack G_n(M_p) - (1 - p)\rbrack/F'(M_p) + R_n\end{equation*} where $Y_{p_n,n}$ is a sample $p_n$-quantile. In Section 2 we prove the following result about $R_n$. THEOREM 1. Suppose $F'(M_p)$ exists and is strictly positive and $p_n - p = O(1/n^{\frac{1}{2}})$. Then $R_n$ as defined in (2) (and, a fortiori, $R_n$ as defined in (1)) satisfies \begin{equation*}\tag{3}n^{\frac{1}{2}} R_n \rightarrow 0 \text{ in probability}.\end{equation*} (After writing this paper the author discovered that the result for $p_n = p$ is stated without proof in Chernoff et al. (1967).) It is easy to extend this result as in Sen (1968); an outline is sketched in one of the remarks. Once again it is possible to achieve some economy in assumptions. The representation (1) is not new. Its use in deriving the asymptotic moments of $Y_{p,n}$ goes back to Karl Pearson; see, for example, (1) in Hojo (1931). But the formulation therein is very imprecise and lacks a rigorous justification. 
We next consider an application of Theorem 1. Let $\bar{X}_n = (\sum^n_1 X_i)/n$ and let $P_n$ be the proportion of $X_i$'s above $\bar{X}_n$. David (1962) proved the asymptotic normality of $P_n$ when $F$ is a normal distribution function. Using the same elegant trick, Mustafi (1968) has proved a similar result for bivariate normal distributions. We shall extend these results considerably by providing alternative proofs based on Theorem 1, which dispense with the normality assumption on $F$. Moreover, in our proof we may consider (though, for simplicity, we shall not do so) a $U$-statistic to which the central limit theorem of Hoeffding (1948) applies instead of the sample mean $\bar{X}_n$.
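Representation (1) can be checked numerically. The sketch assumes uniform(0, 1) observations, for which $M_p = p$ and $F'(M_p) = 1$; it takes the sample $p$-quantile to be the $\lceil np \rceil$-th order statistic (one of several conventions) and $G_n(x)$ to be the proportion of observations exceeding $x$. Function names and the seed are illustrative.

```python
import math
import random

def sample_quantile(xs, p):
    """Sample p-quantile, taken here as the ceil(n*p)-th order statistic."""
    s = sorted(xs)
    k = max(1, math.ceil(len(s) * p))
    return s[k - 1]

def bahadur_remainder(xs, p, m_p, density_at_m_p):
    """R_n in (1): Y_{p,n} - M_p - [G_n(M_p) - (1 - p)] / F'(M_p),
    with G_n(x) the proportion of observations exceeding x."""
    n = len(xs)
    g_n = sum(1 for x in xs if x > m_p) / n
    return sample_quantile(xs, p) - m_p - (g_n - (1 - p)) / density_at_m_p

# Uniform(0, 1) data: M_p = p and F'(M_p) = 1.
rng = random.Random(12345)
xs = [rng.random() for _ in range(20000)]
r_n = bahadur_remainder(xs, 0.5, 0.5, 1.0)
print(math.sqrt(len(xs)) * r_n)  # small, in line with (3)
```

Rerunning with larger samples should show $n^{\frac{1}{2}} R_n$ shrinking, as (3) asserts; a single simulated run is only suggestive.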

Journal Article
TL;DR: In this article, conditions are studied under which a complete sufficient statistic exists for a family of multivariate normal distributions; the assumption of commutativity for certain pairs of matrices made in earlier work is replaced by the weaker requirement of a quadratic subspace.
Abstract: Since the notion of completeness for a family of distributions was introduced by Lehmann and Scheffe [7], a problem of interest has been to determine conditions under which a complete sufficient statistic exists for a family of multivariate normal distributions. One approach to this problem, first formulated for a completely random model in some work by Graybill and Hultquist [2] and extended to a mixed linear model by Basson [1], has a basic assumption of commutativity for certain pairs of matrices. In the present paper some of the commutativity conditions and an associated eigenvalue condition assumed in the theorems on completeness in both [1] and [2] are replaced by the weaker requirement of a quadratic subspace. Quadratic subspaces are introduced and briefly investigated in Section 2 and are found to possess some rather interesting mathematical properties. The existence of $\bar{\mathscr{A}}$-best estimators (e.g., [12]) is also examined for several situations, and it is found that the usual estimators in the weighting factors for the recovery of interblock information in a balanced incomplete block design (treatments fixed and blocks random) have an optimal property when the number of treatments is equal to the number of blocks. Throughout the paper $(\mathscr{A}, (\cdot,\cdot))$ denotes the FDHS (finite-dimensional Hilbert space or finite-dimensional inner product space) of $n \times n$ real symmetric matrices with the trace inner product. The notation $Y \sim N_n(X\beta, \sum^m_{i=1} \nu_iV_i)$ means that $Y$ is an $n \times 1$ random vector distributed according to a multivariate normal distribution with expectation $X\beta$ and covariance matrix $\sum^m_{i=1} \nu_iV_i$; for such a random vector the following is assumed: (a) $X$ is a known $n \times p$ matrix and $\beta$ is an unknown vector of parameters ranging over $\Omega_1 = R^p$. 
(b) Each $V_i\ (i = 1,2,\cdots, m)$ is a known $n \times n$ real symmetric matrix, $V_m = I$, and $\nu = (\nu_1,\cdots, \nu_m)'$ is a vector of unknown parameters ranging over a subset $\Omega_2$ of $R^m$. (c) The set $\Omega_2$ contains a non-void open set in $R^m$ and $\sum^m_{i=1} \nu_iV_i$ is a positive definite matrix for each $\nu \in \Omega_2$. (d) The parameters $\nu$ and $\beta$ are functionally independent so that the entire parameter space is $\Omega = \Omega_1 \times \Omega_2$. For the special case when $X = 0$ the notation $Y \sim N_n(0, \sum^m_{i=1} \nu_iV_i)$ is used, and in this situation the parameter space $\Omega$ reduces to $\Omega_2$. The notation and terminology in the following sections are generally consistent with the usage in [12]. The adjoint of a linear operator $\mathbf{T}$ is denoted by $\mathbf{T}^\ast$ and the transpose of a matrix $A$ is denoted by $A'$. Additionally, the unique Moore-Penrose generalized inverse of a matrix $A$ is denoted by $A^+$, and as in [12], only real finite-dimensional linear spaces are considered.
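The abstract does not state the definition. In the usual definition, as we understand it, a linear subspace $\mathscr{B}$ of $\mathscr{A}$ is quadratic if $B \in \mathscr{B}$ implies $B^2 \in \mathscr{B}$. The sketch below tests this property for the span of given symmetric matrices (for instance the $V_i$), in pure Python so that it is self-contained. It uses the fact that, for a linear span, closure under squaring is equivalent to closure of the spanning set under the Jordan product $AB + BA$, since $(A + B)^2 = A^2 + B^2 + (AB + BA)$. Flattening symmetric matrices turns the trace inner product into the ordinary dot product. Names and tolerance are illustrative.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def flatten(a):
    return [x for row in a for x in row]

def in_span(v, basis, tol=1e-9):
    """True if v lies in span(basis), via Gram-Schmidt on the basis vectors."""
    ortho = []
    for b in basis:
        w = list(b)
        for q in ortho:
            c = sum(x * y for x, y in zip(w, q))
            w = [x - c * y for x, y in zip(w, q)]
        norm = sum(x * x for x in w) ** 0.5
        if norm > tol:
            ortho.append([x / norm for x in w])
    r = list(v)
    for q in ortho:
        c = sum(x * y for x, y in zip(r, q))
        r = [x - c * y for x, y in zip(r, q)]
    scale = max(1.0, sum(x * x for x in v) ** 0.5)
    return sum(x * x for x in r) ** 0.5 <= tol * scale

def is_quadratic_subspace(mats):
    """True if span(mats) is closed under B -> B^2 (checked via AB + BA)."""
    basis = [flatten(m) for m in mats]
    for i, a in enumerate(mats):
        for b in mats[i:]:
            ab, ba = matmul(a, b), matmul(b, a)
            jordan = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(ab, ba)]
            if not in_span(flatten(jordan), basis):
                return False
    return True

I3 = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
J3 = [[1.0] * 3 for _ in range(3)]  # all-ones matrix: J^2 = 3J
print(is_quadratic_subspace([I3, J3]))  # True
```

By contrast, the span of the single matrix with rows (0, 1) and (1, 0) is not quadratic, because its square is the identity.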


Journal Article
TL;DR: In this article, a sequential competitive decision process generalizing noncooperative finite games and two-person zero-sum stochastic games (hence Markovian decision processes) is introduced, and the existence of equilibrium points is proved under criteria of discounted gain and of average gain.
Abstract: We introduce a sequential competitive decision process that is a generalization of noncooperative finite games and of two-person zero-sum stochastic games (hence, of Markovian decision processes). We prove the existence of equilibrium points under criteria of discounted gain and of average gain. Two-person zero-sum stochastic games and noncooperative finite games were introduced in elegant papers by Shapley [22] and Nash [16], [17]. Shapley's work prompted a series of papers [1], [4], [5], [10], [11], [12], [14], [18], [26] concerned with the existence of minimax solutions and algorithms for their computation. Even for the two-person zero-sum case, no finite algorithm yet exists. Nash's papers led to a sizeable literature in both mathematics and economics. Mills' [15] work, for example, is related to our characterization of equilibrium points in Section 4. Noncooperative stochastic games may yield fruitful models for several phenomena in the social sciences. Theories of economic markets, for example, have increasingly sought to encompass sequential economic decision processes. Some recent research in social psychology has taken an analogous direction [19], [25]. I became aware of recent work by Rogers [20] shortly after completing this paper. His results and ours nearly coincide, our Theorem 2 being slightly stronger than the comparable results in his paper. The basic difference between the papers is that Rogers relies on the Kakutani fixed point theorem whereas we use Brouwer's theorem. Our arguments are somewhat simpler as a consequence.

Journal Article
TL;DR: In this article, a model of asymmetric contamination of a symmetric distribution is formulated, in which the amount of asymmetry tends to zero as the sample size increases and the estimators are judged by their asymptotic mean squared error, a concept which is made meaningful by the model.
Abstract: The problem of finding location estimators which are "robust" against deviations from normality has received increasing attention in the last several years. See, for example, Tukey (1960), Huber (1968), and papers cited therein. In the theoretical work done on the estimation of a location parameter, the underlying distribution is usually assumed to be symmetric, and the estimand is taken to be the center of symmetry, a natural quantity to estimate in this situation. Since the finite sample size properties of many proposed estimators are difficult to study analytically, most research has focussed on their more easily ascertainable asymptotic properties, which, it is hoped, will provide useful approximations to the finite sample size case. Most of the estimators commonly studied are, under suitable regularity conditions, asymptotically normal about the center of symmetry, with asymptotic variance depending on the underlying distribution. We thus have a simple criterion, the asymptotic variance, for comparing the performance of different estimators for a given underlying distribution, and of a given estimator for different underlying distributions. Huber (1964) has formulated and solved some minimax problems, in which the estimators are judged by their asymptotic variance. In Section 2 we define and state the asymptotic variances which have been found for the three most commonly studied types of location estimators. In Section 3 we demonstrate some relationships among the three types of estimators, and in Section 4 we show that Huber's minimax result applies to all three types. Then, in Section 5 we consider an aspect of the more general estimation problem in which the distributions are not assumed symmetric. A model of asymmetric contamination of a symmetric distribution is formulated, in which the amount of asymmetry tends to zero as the sample size increases. 
The estimators here are thought of as estimating the center of the symmetric component of the distribution. The maximum likelihood type estimators are shown to be asymptotically normal under this model, but with a bias that tends to zero as the sample size increases. The estimators may be judged by their asymptotic mean squared error, a concept which is made meaningful by the model. We conclude in Section 6 with a minimax result analogous to Huber's, for which we allow both symmetric and asymmetric contamination of a given distribution and judge the estimators by their asymptotic mean squared error.
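Huber's estimator, with $\psi(r) = \max(-k, \min(k, r))$, is a standard maximum likelihood type (M-) estimator of location. A minimal sketch, assuming a known unit scale and tuning constant $k = 1.5$ (both illustrative), solves $\sum_i \psi(x_i - t) = 0$ by bisection, which works because the left side is nonincreasing in $t$. It does not reproduce the paper's asymmetric-contamination model.

```python
def huber_psi(r, k):
    """Huber's psi function: identity on [-k, k], clipped outside."""
    return max(-k, min(k, r))

def huber_location(xs, k=1.5, iters=200):
    """Solve sum_i psi(x_i - t) = 0 for t by bisection (unit scale assumed).

    The sum is nonincreasing in t and changes sign between min(xs) and max(xs)."""
    lo, hi = min(xs), max(xs)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if sum(huber_psi(x - mid, k) for x in xs) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(huber_location([0, 0, 0, 0, 100]))  # approximately 0.375
```

For the sample [0, 0, 0, 0, 100] the estimate solves $-4t + 1.5 = 0$, giving $0.375$, whereas the sample mean is 20: the clipped $\psi$ caps the influence of the outlier.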

Journal Article
Abstract: Recently several authors (cf. [5], [6], [8], [9]) have established for arbitrary positive numbers $c_1,\cdots, c_k$ the inequality \begin{equation*}\tag{1}P\{|X_1| \leqq c_1,\cdots, |X_k| \leqq c_k\} \geqq \prod^k_{i=1} P\{|X_i| \leqq c_i\}\end{equation*} valid for a random vector $X = (X_1,\cdots, X_k)$ having a multivariate normal distribution with mean values 0 and with an arbitrary covariance matrix. A question then arises whether an analogue of (1) also holds for multivariate Student distributions, i.e. the inequality \begin{equation*}\tag{2}P\{|X_1|/S_1 \leqq c_1,\cdots, |X_k|/S_k \leqq c_k\} \geqq \prod^k_{i=1} P\{|X_i|/S_i \leqq c_i\}\end{equation*} where $X = (X_1,\cdots, X_k)$ is as before, while $S_i = (\sum^p_{\nu=1} Z^2_{i\nu})^{\frac{1}{2}}, i = 1,\cdots, k$, where $Z_\nu = (Z_{1\nu},\cdots, Z_{k\nu}), \nu = 1,\cdots, p$, is a random sample of $p$ vectors, which are mutually independent and independent of $X$, and each of which has, in the simplest case, the same normal distribution as $X$. More generally, the $Z_\nu$'s have some normal distributions with mean values 0 and with some covariance matrices which need not coincide with that of $X$ and need not even be identical. A certain proof of (2) was presented by A. Scott [6], but we shall give here a counterexample showing that, unfortunately, this proof is incorrect. 
However, if the correlations between $X_i$ and $X_j$ have the form $\lambda_i\lambda_j\rho_{ij}\ (i,j = 1,\cdots, k;\ i \neq j)$, where $|\lambda_i| \leqq 1\ (i = 1,\cdots, k)$ and $\{\rho_{ij}\}$ is any fixed correlation matrix, and if the correlations between $Z_{i\nu}$ and $Z_{j\nu}$ have the form $\tau_{i\nu}\tau_{j\nu}\ (i,j = 1,\cdots, k;\ i \neq j;\ \nu = 1,\cdots, p)$, where $|\tau_{i\nu}| < 1\ (i = 1,\cdots, k;\ \nu = 1,\cdots, p)$, we shall prove here that the left-hand side probability in (2) is a non-decreasing function of each $|\lambda_i|$ and each $|\tau_{i\nu}|$; therefore, in this case of a special correlation structure, (2) is indeed true. The general validity of (2) still remains an open question.
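Inequality (1) can be illustrated by simulation in the bivariate case. The sketch below, with an arbitrary correlation of 0.8 and bounds $c_1 = c_2 = 1$, estimates the left side by Monte Carlo and compares it with the product of the marginal probabilities. A simulation illustrates (1) for one covariance; it proves nothing and says nothing about (2). Function names and the seed are illustrative.

```python
import math
import random

def normal_interval_prob(c):
    """P(|Z| <= c) for a standard normal Z."""
    return math.erf(c / math.sqrt(2.0))

def lhs_monte_carlo(rho, c1, c2, n=100_000, seed=0):
    """Monte Carlo estimate of P(|X1| <= c1, |X2| <= c2) for a standard
    bivariate normal pair with correlation rho."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    hits = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        x2 = rho * z1 + s * rng.gauss(0.0, 1.0)
        if abs(z1) <= c1 and abs(x2) <= c2:
            hits += 1
    return hits / n

rhs = normal_interval_prob(1.0) * normal_interval_prob(1.0)
print(lhs_monte_carlo(0.8, 1.0, 1.0), ">=", rhs)
```

With zero correlation the two sides agree up to simulation error, since $X_1$ and $X_2$ are then independent.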

Journal Article
TL;DR: In this paper, the rate of first passage of the integrated Wiener process to $x > 0$ is determined in terms of the "$\frac{1}{2}$-winding time" distribution of H. P. McKean, Jr.
Abstract: The rate of first passage of the integrated Wiener process to $x > 0$ is determined in terms of the "$\frac{1}{2}$-winding time" distribution of H. P. McKean, Jr. The probability that the integrated Wiener process is currently at its maximum is approximated.

Journal Article
TL;DR: In this article, convex analysis is used to generalize McCarthy's characterization of proper scoring rules; a scoring rule is called proper if the expected score is maximized when the true density is chosen.
Abstract: A probability forecaster is asked to give a density $p$ of a random variable $\omega$. In return he gets a reward (or score) depending on $p$ and on a subsequently observed value of $\omega$. A scoring rule is called proper if the expected score is maximized when the true density is chosen. The present paper uses convex analysis to generalize McCarthy's characterization of proper scoring rules.
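The paper works with densities and uses convex analysis. The sketch below only illustrates the definition of propriety on a finite outcome space (an assumption), comparing the expected reward of a truthful report with that of other reports for the logarithmic and quadratic (Brier) rules, which are standard proper rules, and for a linear rule, which is not. Function names are illustrative.

```python
import math

def log_score(q, outcome):
    """Logarithmic rule: reward log q(outcome)."""
    return math.log(q[outcome])

def brier_score(q, outcome):
    """Quadratic (Brier) rule, written as a reward (higher is better)."""
    return -sum((qi - (1.0 if i == outcome else 0.0)) ** 2 for i, qi in enumerate(q))

def linear_score(q, outcome):
    """Reward q(outcome): a rule that is NOT proper (included for contrast)."""
    return q[outcome]

def expected_score(score, p, q):
    """Expected reward when outcomes follow p and the forecaster reports q."""
    return sum(pi * score(q, i) for i, pi in enumerate(p) if pi > 0)

p = [0.2, 0.5, 0.3]
for rule in (log_score, brier_score):
    print(rule.__name__, expected_score(rule, p, p), expected_score(rule, p, [0.3, 0.4, 0.3]))
```

With $p = (0.2, 0.5, 0.3)$, the linear rule rewards reporting $(0, 1, 0)$ (expected score 0.5) over the truthful report (expected score about 0.38), which is exactly the failure that propriety rules out.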



Journal Article
TL;DR: In this article, uniformly most accurate unbiased confidence interval procedures of level $1 - \alpha$ are derived for linear functions of the mean and variance of a normal distribution, a problem that arises, for example, when data are transformed to normality before a statistical method is applied.
Abstract: If $Y = g(X)$ is normal $(\mu, \sigma^2)$, where $g$ is a one-to-one real function and $X$ is a random variable whose expectation exists, we may write $EX = f(\mu, \sigma^2)$. The practical importance of this observation is that we often are concerned with testing hypotheses about, and constructing confidence intervals for, known functions of both the mean and variance of a normal distribution. This may happen when we use a statistical model, such as the lognormal distribution, that is related to the normal distribution by a transformation of variables. A slightly different case occurs when a transformation of data is made before applying a statistical method, such as analysis of variance or regression analysis, that involves the assumption of normality for the transformed data. Some familiar examples in this context are $\mathrm{(i)} Y = X^{\frac{1}{2}}, EX = \mu^2 + \sigma^2$; $\mathrm{(ii)} Y = X^{\frac{1}{3}}, EX = \mu^3 + 3\mu\sigma^2$; $\mathrm{(iii)} Y = \arcsin (X^{\frac{1}{2}}), EX = \frac{1}{2}(1 - \cos(2\mu) \exp (-2\sigma^2))$; $\mathrm{(iv)} Y = \operatorname{arcsinh} (X^{\frac{1}{2}}), EX = \frac{1}{2}(\cosh(2\mu) \exp (2\sigma^2) -1)$; $\mathrm{(v)} Y = \log(X), EX = \exp (\mu + \frac{1}{2}\sigma^2)$. The theory of statistical inference in terms of $\mu = EY$ alone or $\sigma^2 = \operatorname{Var} Y$ alone is not easily extended to problems of inference in terms of $EX$ or $\operatorname{Var} X$, parametric functions of both $\mu$ and $\sigma^2$. Minimum variance unbiased estimators (MVUE's) for $EX$ and $\operatorname{Var} X$ were obtained by Finney (1941) for the case $Y = \log X$. Solutions for a much wider class of transformations were obtained by Neyman and Scott (1960) and Hoyle (1968). However, there have been no analogous achievements with respect to hypothesis tests and confidence interval estimates for $EX$ and $\operatorname{Var} X$. 
The present paper, in which uniformly most accurate unbiased confidence interval procedures of level $1 - \alpha$ are derived for linear functions of $\mu$ and $\sigma^2$, is an approach to these problems. The results of this paper define an optimal solution for $EX$ when $Y = \log X$, since in this case the parametric function of interest is a monotone function of $\mu + \frac{1}{2}\sigma^2$. The results also provide a basis for approximate confidence interval solutions for other parametric functions of $\mu$ and $\sigma^2$. It is helpful to consider the problem in terms of confidence regions in the half-plane of points $(\mu, \sigma^2)$. For any transformation $Y = g(X)$ likely to be of practical significance, a confidence interval for $f(\mu, \sigma^2) = EX$ or $\operatorname{Var} X$ is a region in this half-plane, bounded by one or two contours of the form $f(\mu, \sigma^2) = m$. Kanofsky (1969) has proposed a method of simultaneous confidence estimation for all functions of $\mu$ and $\sigma^2$. He constructs a trapezoidal-shaped confidence region of level $1 - \alpha$ for $\mu$ and $\sigma^2$, and for an arbitrary function $h(\mu, \sigma^2)$, defines a confidence set for this function as the set of values $m$ such that the curve $h(\mu, \sigma^2) = m$ intersects this confidence region. If one is only interested in a single function, the procedure is conservative. However, for most such functions this is the only method based on exact distribution theory, to my knowledge, that has been proposed. The usual approach to confidence interval estimation for $EX$ or $\operatorname{Var} X$ has been to rely on approximate methods. For example, a common method of confidence interval estimation for $EX$ is to transform a level $1 - \alpha$ confidence interval for $EY = E(g(X))$, say $(\mu_1, \mu_2)$, by the inverse transform. Then $(g^{-1}(\mu_1), g^{-1}(\mu_2))$ would be an approximate level $1 - \alpha$ confidence interval for $EX$ if $g$ is monotone increasing. 
More sophisticated versions of this method have been proposed by Patterson (1966) and Hoyle (1968). A more direct approach is to use an estimator $T$ of $f(\mu, \sigma^2)$ and an estimator $V$ of the variance of $T$. $T$ is then assumed to be approximately normally distributed with mean $f(\mu, \sigma^2)$ and variance equal to the observed value of $V$. For example, the sample mean $\bar{X}$ is an estimate of $EX$, and $S_X^2/(n(n - 1)) = \sum(X_i - \bar{X})^2/(n(n - 1))$ is an estimate of the variance of $\bar{X}$ (see, e.g., Aitchison and Brown (1957), Section 5.62). Hoyle (1968) has suggested letting $T$ be the MVUE of $EX$, and $V$ the MVUE of $\operatorname{Var} T$, which he has given for a number of transformations. In this paper an optimal exact confidence interval procedure is presented for linear functions of $\mu$ and $\sigma^2$. That is, the procedure gives uniformly most accurate unbiased joint confidence regions of level $1 - \alpha$ for $\mu$ and $\sigma^2$, bounded by one or two contours of form $\mu + \lambda\sigma^2 = m$, for arbitrary $\lambda$. This provides an optimal confidence interval procedure for $EX$ when $Y = \log (X)$ is normal. Also it provides the basis for a new approximate confidence interval method for $EX$ in the general case $Y = g(X)$. That is, by a proper choice of the linear coefficient $\lambda$, it seems reasonable that a confidence region bounded by one or two contours of form $f(\mu, \sigma^2) = m$ might be approximated with some success by a confidence region bounded by contours of the form $\mu + \lambda\sigma^2 = m$. Certainly the degree of approximation possible should be better than that obtainable using only vertical bounding contours, as when a confidence interval for $\mu$ is transformed to give an approximate confidence interval for $EX$. 
Also, if the contours $f(\mu, \sigma^2) = m$ are fairly straight within a convex joint confidence region of level $1 - \alpha$ for $\mu$ and $\sigma^2$, it is not unreasonable to hope that an approximate confidence region should be possible that would have a true level near $1 - \alpha$, and that would be less conservative than a level $1 - \alpha$ region for $f(\mu, \sigma^2)$ determined by Kanofsky's method. The main result of the paper is the derivation in Section 2 of uniformly most powerful unbiased level $\alpha$ hypothesis tests for linear functions of $\mu$ and $\sigma^2$. The theoretical interest of this section is mainly in the analytic detail of how a well-known theorem applies to this somewhat unusual case. A numerical example follows, illustrating the use of the tables of critical values given in the Appendix. It is not obvious that the confidence procedures defined by these tests in Section 4 define confidence sets that are intervals, an extremely desirable property both for ease of calculation and for practical usefulness of the confidence sets. The proof in Section 5 that the one-sided tests define one-sided confidence intervals provided that $v$, the number of degrees of freedom available for the estimate of $\sigma^2$, is at least two, is the second major result of the paper. In Section 6 it is shown that this property does not obtain when $v = 1$. The analogous result in the two-sided case is proved only for $v = 2$ in Section 7. However it is conjectured that, as in the one-sided case, the desired property also holds for all larger values of $v$. The final section contains a brief discussion of applications of the method to confidence interval estimation for $EX$ when $Y = g(X)$ is normal. It is shown that essentially the only direct application is to the case where $Y = \log(X)$, and that there are no nontrivial direct applications where $EX$ is a function of $\mu$ or $\sigma^2$ alone. 
The construction of normal tolerance limits involves confidence interval estimation of functions of the form $\mu + \delta\sigma$ (Owen (1958)). However, it is shown here that there are essentially no transformations to normality such that $EX$ is a function of $\mu + \delta\sigma$ for some $\delta$. A more complete discussion of approximate applications of the method is left for a subsequent paper.


Journal ArticleDOI
TL;DR: In this article, it is shown that functional central limit theorems (invariance principles) for $N(t)$ are equivalent to corresponding statements for the sequence of partial sums of the $u_n$'s.
Abstract: Let $\{u_n, n \geqq 1\}$ be a sequence of nonnegative random variables, not necessarily independent or identically distributed, with an associated counting process $\{N(t), t \geqq 0\}$ defined by $N(t) = \max\{k : u_1 + \cdots + u_k \leqq t\}$. It is shown that functional central limit theorems (invariance principles) for $N(t)$ are equivalent to corresponding statements for the sequence of partial sums of the $u_n$'s.
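The definition of $N(t)$ can be made concrete for a finite (truncated) sample path. The Python sketch below is illustrative only, and the name `counting_process` is hypothetical. It uses the inverse relationship $\{N(t) \geqq k\} = \{u_1 + \cdots + u_k \leqq t\}$, which holds because the $u_n$ are nonnegative, so the partial sums are nondecreasing; this duality is what ties statements about $N(t)$ to statements about the partial sums.

```python
from bisect import bisect_right
from itertools import accumulate


def counting_process(u):
    """Return N with N(t) = max{k : u_1 + ... + u_k <= t}, for t >= 0 and finite nonnegative u.

    For a truncation of an infinite sequence, N(t) is only meaningful for t below the
    total of the values supplied (beyond it, N saturates at len(u)).
    """
    if any(x < 0 for x in u):
        raise ValueError("the u_n must be nonnegative")
    partial_sums = list(accumulate(u))  # S_1, ..., S_n, nondecreasing because u_n >= 0

    def N(t):
        # Since the S_k are nondecreasing, {k : S_k <= t} is an initial segment of 1..n,
        # so the count of partial sums <= t equals the largest such k (0 if none).
        return bisect_right(partial_sums, t)

    return N
```

With `u = [1.0, 2.0, 3.0]` the partial sums are 1, 3, 6, so $N(2.9) = 1$ and $N(3.0) = 2$; zero-valued $u_n$ produce simultaneous jumps of $N$.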

Journal ArticleDOI
TL;DR: In this article, an explicit formula is found for the first passage probability of the Gaussian process with mean zero and covariance $\max(1 - |t - \tau|, 0)$, which was previously known only for $T \leqq 1$; for an integer horizon $T = n$, the formula is an $n$-fold integral of an $(n + 1) \times (n + 1)$ determinant.
Abstract: We find an explicit formula for the first passage probability, $Q_a(T \mid x) = \Pr(S(t) < a, 0 \leqq t \leqq T \mid S(0) = x)$, for all $T > 0$, where $S$ is the Gaussian process with mean zero and covariance $ES(\tau)S(t) = \max (1 - |t - \tau|, 0)$. Previously, $Q_a(T \mid x)$ was known only for $T \leqq 1$. In particular, for $T = n$ an integer and $-\infty < x < a < \infty$, $Q_a(T \mid x) = \frac{1}{\varphi(x)} \int \cdots \int_D \det \varphi(y_i - y_{j+1} + a)\, dy_2 \cdots dy_{n+1}$, where the integral is an $n$-fold integral on $y_2, \cdots, y_{n+1}$ over the region $D = \{a - x < y_2 < y_3 < \cdots < y_{n+1}\}$, and the determinant is of size $(n + 1) \times (n + 1)$, $0 \leqq i, j \leqq n$, with $y_0 \equiv 0$, $y_1 \equiv a - x$.
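The smallest case of the determinant formula can be checked numerically. Reading $\varphi$ as the standard normal density and $\Phi$ as its distribution function, the $n = 1$ case involves a $2 \times 2$ determinant whose integral over $y_2 > a - x$ can be done by hand, giving $Q_a(1 \mid x) = \Phi(a) - \varphi(a)\Phi(x)/\varphi(x)$. The Python sketch below compares quadrature of the determinant with that closed form, and adds a crude Monte Carlo check. The Monte Carlo part assumes the standard representation $S(t) = W(t + 1) - W(t)$ with $W$ a Wiener process (its covariance is the overlap length $\max(1 - |t - \tau|, 0)$, matching the abstract, but the representation itself is not taken from the abstract). All function names and default parameters are hypothetical.

```python
import math
import random


def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def q_formula_n1(a, x, span=12.0, steps=4000):
    """n = 1 case of the determinant formula, integrated numerically in y_2.

    Matrix entries are phi(y_i - y_{j+1} + a) for i, j in {0, 1}, with y_0 = 0 and
    y_1 = a - x; D is the half-line y_2 > a - x, truncated at a - x + span.
    Simpson's rule is used, so steps must be even.
    """
    y0, y1 = 0.0, a - x

    def det(y2):
        return phi(y0 - y1 + a) * phi(y1 - y2 + a) - phi(y0 - y2 + a) * phi(y1 - y1 + a)

    lo = y1
    h = span / steps
    total = det(lo) + det(lo + span)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * det(lo + k * h)
    return total * h / 3.0 / phi(x)


def q_closed_n1(a, x):
    """The same n = 1 integral evaluated by hand."""
    return Phi(a) - phi(a) * Phi(x) / phi(x)


def q_monte_carlo(a, x, T, m=100, n_paths=2000, seed=1):
    """Crude estimate of Q_a(T | x) from S(t) = W(t + 1) - W(t) on a grid of step 1/m.

    Conditioning on S(0) = x is done by taking W(0) = 0, W(1) = x: a Brownian bridge
    on [0, 1] followed by independent Brownian increments. Checking S only at grid
    points misses some crossings, so the estimate is biased upward.
    """
    rng = random.Random(seed)
    h = 1.0 / m
    sd = math.sqrt(h)
    n_w = int(round((T + 1) * m))  # W is needed at grid points 0..n_w, i.e. on [0, T + 1]
    n_s = int(round(T * m))  # S is checked at grid points 0..n_s, i.e. on [0, T]
    stayed_below = 0
    for _ in range(n_paths):
        b = [0.0]
        for _ in range(m):
            b.append(b[-1] + rng.gauss(0.0, sd))
        w = [b[k] - (k * h) * (b[m] - x) for k in range(m + 1)]  # bridge from 0 to x
        for _ in range(n_w - m):
            w.append(w[-1] + rng.gauss(0.0, sd))
        if all(w[k + m] - w[k] < a for k in range(n_s + 1)):
            stayed_below += 1
    return stayed_below / n_paths
```

For $a = 1$, $x = 0$ the closed form gives about 0.538; the Monte Carlo value should land somewhat above it because of the discrete monitoring bias. Only the $n = 1$ case is checked here; larger $n$ would need genuine multidimensional integration over $D$.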

Journal ArticleDOI
TL;DR: In this article, the authors give generalizations and minor extensions of known results in linear model theory utilizing both the coordinate-free approach of Kruskal and the usual parametric representations.
Abstract: Consideration is given to minimum variance unbiased estimation when the choice of estimators is restricted to a finite-dimensional linear space. The discussion gives generalizations and minor extensions of known results in linear model theory utilizing both the coordinate-free approach of Kruskal and the usual parametric representations. Included are (i) a restatement of a theorem on minimum variance unbiased estimation by Lehmann and Scheffe; (ii) a minor extension of a theorem by Zyskind on best linear unbiased estimation; (iii) a generalization of the covariance adjustment procedure described by Rao; (iv) a generalization of the normal equations; and (v) criteria for existence of minimum variance unbiased estimators by means of invariant subspaces. Illustrative examples are included.

Journal ArticleDOI
TL;DR: In this paper, the authors consider two procedures for estimating the center of a symmetric distribution, which use the observations themselves to choose the form of the estimator, and show that these procedures are asymptotically as good as knowing beforehand which estimator in the family is best for the given distribution, and using that estimator.
Abstract: This paper considers two procedures for estimating the center of a symmetric distribution, which use the observations themselves to choose the form of the estimator. Both procedures begin with a family of possible estimators. We use the observations to estimate the asymptotic variance of each member of the family of estimators. We then choose the estimator in the family with smallest estimated asymptotic variance and use the value given by that estimator as the location estimate. These procedures are shown to be asymptotically as good as knowing beforehand which estimator in the family is best for the given distribution, and using that estimator.
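The selection step described above (estimate each member's asymptotic variance, then use the member with the smallest estimate) can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: the family is taken to be trimmed means over a small grid of trimming proportions, and each asymptotic variance is estimated with the usual winsorized-variance formula. Function names are hypothetical.

```python
import math


def trimmed_mean_and_var(xs, alpha):
    """Alpha-trimmed mean and an estimate of its variance: s_w^2 / (1 - 2g/n)^2 / n.

    s_w^2 is the winsorized sample variance and g = floor(alpha * n) is the number
    of observations trimmed from each end.
    """
    s = sorted(xs)
    n = len(s)
    g = int(math.floor(alpha * n))
    if 2 * g >= n:
        raise ValueError("trimming proportion too large")
    kept = s[g:n - g]
    t = sum(kept) / len(kept)
    # Winsorize: replace the g smallest and g largest values by the nearest kept value
    w = [s[g]] * g + kept + [s[n - g - 1]] * g
    w_mean = sum(w) / n
    s2_w = sum((v - w_mean) ** 2 for v in w) / n
    var_hat = s2_w / (1.0 - 2.0 * g / n) ** 2 / n
    return t, var_hat


def adaptive_location(xs, alphas=(0.0, 0.05, 0.1, 0.15, 0.2, 0.25)):
    """Use the trimming proportion with the smallest estimated variance; return (estimate, alpha)."""
    best = None
    for alpha in alphas:
        t, v = trimmed_mean_and_var(xs, alpha)
        if best is None or v < best[2]:
            best = (t, alpha, v)
    return best[0], best[1]
```

On a short, evenly spaced symmetric sample with two gross outliers added at $\pm 100$, the untrimmed mean gets a very large estimated variance and a light trim that removes just the outliers is selected. The asymptotic claim in the abstract concerns large samples; with small $n$ the selection is noisy.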

Journal ArticleDOI
TL;DR: In this article, the characteristic roots of the information matrix of a balanced $2^m$ fractional factorial design $T$ are obtained when the parameters to be estimated are the general mean, the main effects, and the two-factor interactions, the remaining effects being assumed negligible.
Abstract: The characteristic roots of the information matrix $(M)_T$ of a balanced $2^m$ fractional factorial design $T$ are obtained, when the parameters to be estimated include the general mean $\mu$, the main effects $A_i$, and the two-factor interactions $A_iA_j$ (briefly, $A_{ij}$), the remaining effects being assumed negligible. (If $(M)_T$ is nonsingular, $T$ is a design of resolution $V$.) It is well known that $T$ depends on five nonnegative integers $(\mu_0, \mu_1, \mu_2, \mu_3, \mu_4)$, called its "index set." In Srivastava (1970), the special case $\mu_0 = \mu_4$ and $\mu_1 = \mu_3$ was considered; in this paper, the theory is presented for the general case. As a by-product of this work, we obtain a class of useful necessary conditions on the set $(\mu_0, \mu_1, \mu_2, \mu_3, \mu_4)$ for a design $T$ with this index set to exist (combinatorially). If $(M)_T$ is nonsingular and $(V)_T = \lbrack(M)_T\rbrack^{-1}$, an explicit expression (as a function of the $\mu_i$) is obtained for $\operatorname{tr}(V)_T$; similar expressions for $|(V)_T|$ and $\operatorname{ch}_{\max}(V)_T$ can easily be written down using our results. One reason is given why $\operatorname{tr}(V)_T$ (rather than the other two criteria) should be used for comparing balanced resolution $V$ fractions. Finally, it is shown (through an example of a previously unknown design of resolution $V$ with $m = 7$) that for a given $N$ (the number of runs), an (existing) optimal balanced design (optimal with respect to, say, the trace criterion) does not necessarily satisfy the restriction ($\mu_0 = \mu_4$ and $\mu_1 = \mu_3$), and may be distinct from the design that is optimal in the restricted class. (Scores of other such examples may be found in Srivastava and Chopra (1970a), where the results of this paper are used in a basic manner.) Thus the need for considering designs with general index sets (which is accomplished in the present paper) becomes obvious.
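The index set and the trace criterion can be computed directly for small designs. The following Python sketch is illustrative only, with hypothetical function names, and makes several assumptions: a design is stored as a list of runs (rows) of 0/1 levels, i.e. the transpose of writing $T$ as an $m \times N$ array; $\mu_i$ is read as the number of runs showing each 0/1 pattern of weight $i$ on every set of 4 factors (balance means this count depends only on the weight); and $(M)_T$ is taken as $X'X$ for a model matrix in $\pm 1$ coding with columns for $\mu$, the $A_i$, and the $A_{ij}$. It uses brute-force counting and Gauss-Jordan elimination, so it is only practical for small $m$.

```python
from itertools import combinations, product


def index_set(runs):
    """Return (mu_0, ..., mu_4) if the 0/1 design is balanced in every 4 factors, else None."""
    m = len(runs[0])
    if m < 4:
        raise ValueError("need at least four factors")
    mu = [None] * 5
    for cols in combinations(range(m), 4):
        for pattern in product((0, 1), repeat=4):
            count = sum(1 for r in runs if tuple(r[c] for c in cols) == pattern)
            wt = sum(pattern)
            if mu[wt] is None:
                mu[wt] = count
            elif mu[wt] != count:
                return None  # same weight, different count: not balanced
    return tuple(mu)


def model_row(run):
    """Model-matrix row: general mean, main effects, two-factor interactions (+/-1 coding)."""
    x = [2 * level - 1 for level in run]
    return [1] + x + [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting; raises ValueError if singular."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("information matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def trace_of_variance(runs):
    """tr (V)_T = tr [(M)_T]^{-1}, with (M)_T = X'X."""
    X = [model_row(r) for r in runs]
    k = len(X[0])
    M = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    V = inverse(M)
    return sum(V[i][i] for i in range(k))
```

For the full $2^4$ factorial the index set is (1, 1, 1, 1, 1) and $(M)_T = 16I$, so $\operatorname{tr}(V)_T = 11/16$. Appending one extra all-high run gives index set (1, 1, 1, 1, 2), a balanced but non-orthogonal design, and the trace drops to $(11/16)(26/27)$.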