
Showing papers in "Annals of Mathematical Statistics in 1970"



Journal ArticleDOI
TL;DR: In this paper, the authors present classes of prior distributions for which the Bayes estimate of an unknown function, given certain observations, is a spline function.
Abstract: The report presents classes of prior distributions for which the Bayes' estimate of an unknown function given certain observations is a spline function. (Author)

998 citations


Journal ArticleDOI
TL;DR: In this article, it is shown how to obtain convergence to a $D$-optimum measure by successively adding points to a given initial experimental design, which correspond to points of maximum variance of the usual least squares estimate of the response mean for the particular regression model at each stage.
Abstract: It is possible to obtain convergence to a $D$-optimum measure, as defined by Kiefer and Wolfowitz, by successively adding points to a given initial experimental design. The points added correspond to points of maximum variance of the usual least squares estimate of the response mean for the particular regression model at each stage. A new bound is given for the generalized variances involved and an example is worked out.
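The sequential procedure described in the abstract is easy to simulate. The Python sketch below is my own illustration (the function name `wynn_step`, the quadratic-regression model, and the candidate grid are assumptions, not from the paper): at each stage it adds the candidate point where the variance of the least squares estimate of the response mean is largest, and then checks that the maximum variance settles near the number of parameters, as the Kiefer-Wolfowitz equivalence theorem predicts for a D-optimum design.

```python
import numpy as np

def wynn_step(design, candidates, f):
    """One step of the sequential procedure: return the candidate point where
    d(x) = f(x)' M^{-1} f(x), the variance of the least squares estimate of
    the response mean, is largest for the current design."""
    F = np.array([f(x) for x in design])
    M = F.T @ F / len(design)                  # per-point information matrix
    Minv = np.linalg.inv(M)
    d = np.array([f(x) @ Minv @ f(x) for x in candidates])
    return candidates[np.argmax(d)]

f = lambda x: np.array([1.0, x, x * x])        # quadratic regression on [-1, 1]
grid = np.linspace(-1.0, 1.0, 201)
design = [-1.0, 0.0, 1.0]                      # initial design
for _ in range(60):
    design.append(wynn_step(design, grid, f))

# maximum variance over the region for the final design
F = np.array([f(x) for x in design])
M = F.T @ F / len(design)
d_max = max(f(x) @ np.linalg.inv(M) @ f(x) for x in grid)
```

For quadratic regression on [-1, 1] the D-optimum design puts equal mass on {-1, 0, 1}, so `d_max` should stay close to the lower bound p = 3.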

454 citations


Journal ArticleDOI
TL;DR: In this paper, a geometrical description of the procedure for two-way contingency tables is presented, which leads to a simple proof of the convergence of the iterative procedure.
Abstract: Deming and Stephan (1940) first proposed the use of an iterative proportional fitting procedure to estimate cell probabilities in a contingency table subject to certain marginal constraints. In this paper we first relate this procedure to a variety of sources and a variety of statistical problems. We then describe the procedure geometrically for two-way contingency tables using the concepts presented in Fienberg (1968). This geometrical description leads to a rather simple proof of the convergence of the iterative procedure. We conclude the paper with a discussion of extensions to multi-dimensional tables and to tables with some zero entries.
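The Deming-Stephan procedure itself is short to sketch. The snippet below (the function name `ipf` and the 2x2 example are my own illustrative choices, not from the paper) alternately rescales rows and columns of a nonnegative seed table until its margins match the targets, preserving the table's interaction (odds-ratio) structure:

```python
import numpy as np

def ipf(table, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Iterative proportional fitting for a two-way table: alternately
    rescale rows and columns until both margins match the targets."""
    t = table.astype(float).copy()
    for _ in range(max_iter):
        t *= (row_targets / t.sum(axis=1))[:, None]   # fit row margins
        t *= (col_targets / t.sum(axis=0))[None, :]   # fit column margins
        if np.allclose(t.sum(axis=1), row_targets, atol=tol):
            break
    return t

seed = np.array([[10.0, 20.0], [30.0, 40.0]])
fitted = ipf(seed,
             row_targets=np.array([40.0, 60.0]),
             col_targets=np.array([50.0, 50.0]))
```

The fitted table has the requested margins while keeping the seed table's odds ratio, since each step multiplies whole rows or whole columns by constants.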

361 citations


Journal ArticleDOI
TL;DR: In this paper, a family of estimators, each of which dominates the "usual" one, is given for the problem of simultaneously estimating means of three or more independent normal random variables which have a common unknown variance.
Abstract: A family of estimators, each of which dominates the "usual" one, is given for the problem of simultaneously estimating means of three or more independent normal random variables which have a common unknown variance. Charles Stein [4] established the existence of such estimators (for the case of a known variance) and later, with James [3], exhibited some, both for the case of unknown common variances considered here and for other cases as well. Alam and Thompson [1] have also obtained estimators which dominate the usual one. The class of estimators given in this paper contains those of James and Stein and also those of Alam and Thompson.
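The flavor of such estimators can be illustrated with the classical James-Stein shrinkage form for unknown common variance. The sketch below is an assumption-laden illustration, not the paper's full family: x is the observed mean vector, s an independent sum of squares with s/sigma^2 chi-squared on k degrees of freedom, and (p-2)/(k+2) the classical shrinkage constant. A Monte Carlo check compares its risk with the usual estimator at theta = 0, where shrinkage helps most.

```python
import numpy as np

def james_stein(x, s, k):
    """James-Stein-type estimator for p >= 3 normal means with a common
    unknown variance: shrink x toward 0 using the independent sum of
    squares s, where s / sigma^2 ~ chi-squared with k degrees of freedom."""
    p = len(x)
    return (1.0 - (p - 2) / (k + 2) * s / np.dot(x, x)) * x

rng = np.random.default_rng(0)
p, k, n_rep = 10, 20, 2000
theta = np.zeros(p)                      # true mean vector (favors shrinkage)
risk_usual = risk_js = 0.0
for _ in range(n_rep):
    x = rng.normal(theta, 1.0)           # observations, sigma = 1
    s = rng.chisquare(k)                 # independent variance information
    risk_usual += np.sum((x - theta) ** 2)
    risk_js += np.sum((james_stein(x, s, k) - theta) ** 2)
risk_usual /= n_rep
risk_js /= n_rep
```

At theta = 0 the usual estimator has risk p = 10, while the shrinkage estimator's risk is a small fraction of that; domination (weak inequality everywhere, strict somewhere) holds over the whole parameter space.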

295 citations


Journal ArticleDOI
TL;DR: In this article, a method is given for obtaining probability inequalities and related limit theorems concerning the behavior of an entire sequence of random variables with a specified joint probability distribution.
Abstract: 1. Extension and applications of an inequality of Ville and Wald. Let x_1, x_2, … be a sequence of random variables with a specified joint probability distribution P. We shall give a method for obtaining probability inequalities and related limit theorems concerning the behavior of the entire sequence of x's.

254 citations


Journal ArticleDOI
TL;DR: In this paper, the authors derive the "ergodic" definition of a maximum of height u and analyze the sample functions near local maxima of height u, especially as u → -∞, mainly using methods similar to those of [4] and [11].
Abstract: Consider a stationary normal process ξ(t) with mean zero and the covariance function r(t). Properties of the sample functions in the neighborhood of zeros, upcrossings of very high levels, etc. have been studied by, among others, Kac and Slepian, 1959 [4] and Slepian, 1962 [11]. In this paper we shall study the sample functions near local maxima of height u, especially as u → -∞, and mainly use similar methods as [4] and [11]. Then it is necessary to analyse carefully what is meant by "near a maximum of height u." In Section 2 we derive the "ergodic" definition, i.e. the definition which is possible to interpret by the aid of relative frequencies in a single realisation. This definition has been treated previously by Leadbetter, 1966 [5], and it turns out to be related to Kac and Slepian's horizontal window definition. In Section 3 we give a representation of ξ(t) near a maximum as the difference between a non-stationary normal process and a deterministic process, and in Section 4 we examine these processes as u → -∞. We have then to distinguish between two cases. A: Regular case. $r(t) = 1 - \lambda_2 t^2/2 + \lambda_4 t^4/4! - \lambda_6 t^6/6! + o(t^6)$ as $t \to 0$, where the positive $\lambda_{2k}$ are the spectral moments. Then it is proved that if ξ(t) has a maximum of height u at t = 0 then, as u → -∞, \begin{align*}(\lambda_2\lambda_6 - \lambda_4^2)(\lambda_4 - \lambda_2^2)^{-1}\{\xi((\lambda_2\lambda_6 - \lambda_4^2)^{-\frac{1}{2}}(\lambda_4 - \lambda_2^2)^{\frac{1}{2}}t|u|^{-1}) - u\} \\ \sim |u|^{-3}\{t^4/4! + \omega(\lambda_4 - \lambda_2^2)^{\frac{1}{2}}\lambda_2^{-\frac{1}{2}}t^3/3! - \zeta(\lambda_4 - \lambda_2^2)\lambda_2^{-1}t^2/2\}\end{align*} where ω and ζ are independent random variables (rv), ω has a standard normal distribution and ζ has the density $z e^{-z}$, $z > 0$. Thus, in the neighborhood of a very low maximum the sample functions are fourth degree polynomials with positive $t^4$-term, symmetrically distributed $t^3$-term, and a negatively distributed $t^2$-term but without $t$-term. B: Irregular case. $r(t) = 1 - \lambda_2 t^2/2 + \lambda_4 t^4/4! - \lambda_5|t|^5/5! + o(t^5)$ as $t \to 0$, where $\lambda_5 > 0$. Now $\xi(t|u|^{-2}) - u \sim |u|^{-5}\{\lambda_2\lambda_5(\lambda_4 - \lambda_2^2)^{-1}|t|^3/3! + (2\lambda_5)^{\frac{1}{2}}\omega(t) - \zeta(\lambda_4 - \lambda_2^2)\lambda_2^{-1}t^2/2\}$ where ω(t) is a non-stationary normal process whose second derivative is a Wiener process, independent of ζ which has the density $z e^{-z}$, $z > 0$. The term $\lambda_5|t|^5/5!$ "disturbs" the process in such a way that the order of the distance which can be surveyed is reduced from 1/|u| (in Case A) to $1/|u|^2$. The results are used in Section 5 to examine the distribution of the wave-length and the crest-to-trough wave-height, i.e., the amplitude, discussed by, among others, Cartwright and Longuet-Higgins, 1956 [1]. One hypothesis, sometimes found in the literature, [10], states that the amplitude has a Rayleigh distribution and is independent of the mean level. According to this hypothesis the amplitude is of the order 1/|u| as u → -∞, while the results of this paper show that it is of the order $1/|u|^3$.

246 citations


Journal ArticleDOI
TL;DR: In this paper, it is shown that any finite functional of Brownian motion can be represented as a stochastic integral; for sufficiently well-behaved Fréchet-differentiable functionals the integrand takes the form of conditional expectations of the differential.
Abstract: It is known that any functional of Brownian motion with finite second moment can be expressed as the sum of a constant and an Ito stochastic integral. It is also known that homogeneous additive functionals of Brownian motion with finite expectations have a similar representation. This paper extends these results in several ways. It is shown that any finite functional of Brownian motion can be represented as a stochastic integral. This representation is not unique, but if the functional has a finite expectation it does have a unique representation as a constant plus a stochastic integral in which the process of indefinite integrals is a martingale. A corollary of this result is that any martingale (on a closed interval) that is measurable with respect to the increasing family of $\sigma$-fields generated by a Brownian motion is equal to a constant plus an indefinite stochastic integral. Sufficiently well-behaved Frechet-differentiable functionals have an explicit representation as a stochastic integral in which the integrand has the form of conditional expectations of the differential.

228 citations



Journal ArticleDOI
TL;DR: In this article, an upper bound, uniform in the threshold t, is given on the mean of the excess over t when a random walk with positive drift is stopped the first time it exceeds t.
Abstract: A random walk, $\{S_n\}_{n=0}^\infty$, having positive drift and starting at the origin, is stopped the first time $S_n > t \geqq 0$. The present paper studies the "excess," $S_n - t$, when the walk is stopped. The main result is an upper bound on the mean of the excess, uniform in $t$. Through Wald's equation, this gives an upper bound on the mean stopping time, as well as upper bounds on the average sample numbers of sequential probability ratio tests. The same elementary approach yields simple upper bounds on the moments and tail probabilities of residual and spent waiting times of renewal processes.
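A quick Monte Carlo sketch (the function name `stop_walk` and the N(1,1) increment law are my own illustrative choices, not from the paper) shows the two facts the abstract ties together: the mean excess stays bounded uniformly in t, and Wald's equation E S_tau = (E X) E tau links the mean stopping time to the threshold plus the mean excess.

```python
import numpy as np

rng = np.random.default_rng(1)

def stop_walk(t, mu=1.0, sigma=1.0):
    """Run a normal random walk with positive drift mu until S_n > t;
    return the excess over the boundary and the stopping time n."""
    s, n = 0.0, 0
    while s <= t:
        s += rng.normal(mu, sigma)
        n += 1
    return s - t, n

results = {}
for t in [0.0, 5.0, 50.0]:
    sims = np.array([stop_walk(t) for _ in range(4000)])
    results[t] = (sims[:, 0].mean(), sims[:, 1].mean())  # (mean excess, mean n)
```

For each threshold, the sample mean of S_tau, namely t plus the mean excess, should agree with mu times the mean stopping time, up to Monte Carlo error.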

222 citations


Journal ArticleDOI
TL;DR: The Bayesian theory for testing a sharp hypothesis, defined by fixed values of parameters, is presented in general terms in this article, where an arbitrary positive prior probability is attached to the hypothesis and the ratio of posterior to prior odds for the hypothesis is given by the weighted likelihood ratio.
Abstract: The Bayesian theory for testing a sharp hypothesis, defined by fixed values of parameters, is here presented in general terms. Arbitrary positive prior probability is attached to the hypothesis. The ratio of posterior to prior odds for the hypothesis is given by the weighted likelihood ratio, shown here to equal Leonard J. Savage's (1963) ratio of a posterior to a prior density (2.21). This Bayesian approach to hypothesis testing was suggested by Jeffreys (1948), Savage (1959), (1961), Lindley (1961), and Good (1950), (1965), but obscured somewhat by approximations and unique choices of prior distributions. This Bayesian theory is distinct from that of Lindley (1965) and that of Dickey (1967a). Applications are given to hypotheses about multinomial means, for example, equality of two binomial probabilities. A new test is presented for the order of a finite-state Markov chain.
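For the two-binomial example the weighted likelihood ratio reduces to a ratio of Beta integrals, since the binomial coefficients cancel. The helper below is a minimal sketch with uniform priors as my own illustrative choice (a single uniform prior on the common p under H0, independent uniform priors under H1), not the paper's general prior structure:

```python
from math import exp, lgamma

def log_beta(a, b):
    """log of the Beta function via log-gamma, for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bayes_factor_equal(x1, n1, x2, n2):
    """Bayes factor (weighted likelihood ratio) for the sharp hypothesis
    H0: p1 = p2 given x1 successes in n1 trials and x2 in n2, with uniform
    priors; values above 1 favor H0."""
    m0 = log_beta(x1 + x2 + 1, n1 + n2 - x1 - x2 + 1)           # pooled Beta integral
    m1 = log_beta(x1 + 1, n1 - x1 + 1) + log_beta(x2 + 1, n2 - x2 + 1)
    return exp(m0 - m1)

bf_same = bayes_factor_equal(5, 10, 5, 10)   # concordant samples: evidence for H0
bf_diff = bayes_factor_equal(1, 10, 9, 10)   # discordant samples: evidence against H0
```

Multiplying this factor by the prior odds for the hypothesis gives the posterior odds, which is exactly the role the weighted likelihood ratio plays in the abstract.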

Journal ArticleDOI
TL;DR: In this paper, an extension of the investigation of Johnson (1967b) is made by giving a larger class of posterior distributions which possess asymptotic expansions having a normal distribution as a leading term.
Abstract: In this paper, an extension of the investigation of Johnson (1967b) is made by giving a larger class of posterior distributions which possess asymptotic expansions having a normal distribution as a leading term. Asymptotic expansions for the related normalizing transformation and percentiles are also presented. Before asymptotic expansions were treated rigorously, Laplace (1847) gave an expansion for certain posterior distributions. The method used in this paper is a variation of his technique. Bernstein (1934), page 406, and von Mises (1964), chapter VIII, Section C, also treat special cases of these expansions. The conditions imposed are sufficient to make the maximum likelihood estimate strongly consistent and asymptotically normal. They also include higher order derivative assumptions on the log of the likelihood. As shown by Schwartz (1966), the posterior distribution may behave well even when the maximum likelihood estimate does not. However, we have not attempted to find the weakest assumptions under which the posterior distribution has an expansion. For general conditions under which the posterior distribution converges in variation to a normal distribution with probability one see LeCam (1953) and (1958) for the independent case and Kallianpur and Borwanker (1968) for Markov processes. In Section 2, we show that with probability one, the centered and scaled posterior distribution possesses an asymptotic expansion in powers of $n^{-\frac{1}{2}}$ having the standard normal as a leading term. The number of terms in the expansion obtained is two less than the number of continuous derivatives of the log likelihood. All terms beyond the first consist of a polynomial multiplied by the standard normal density. The coefficients of the polynomial depend on the prior density $\rho$ and the likelihood. The moments of the posterior distribution are shown to possess an expansion in Section 3.
The following two sections present the normalizing transformation and percentile expansions. These last three expansions also apply for the case considered by Johnson (1967b) as does the information on the form of the terms in the expansion of the posterior distribution. To simplify the already heavy notation, these results are first proved for independent identically distributed random variables. The extension of all these results to the case of certain stationary ergodic Markov processes is immediate; Section 6 presents the necessary modifications. Throughout this paper, $\Phi$ and $\varphi$ will denote the standard normal cdf and pdf respectively. Also, $\mathbf{n}$ will be assumed to range over the positive integers; thus in some cases, the order of the error term in the expansion may be kept for smaller $n$ if the bounding constant is modified.

Journal ArticleDOI
TL;DR: In this article, the authors give sufficient conditions for the least squares estimates to be consistent in the case of nonlinear regression, i.e., without the assumption of linearity of g with respect to the parameters.
Abstract: This paper gives alternative sufficient conditions for the least squares estimates to be consistent in the case of nonlinear regression, i.e., without the assumption of linearity of g with respect to the parameters.
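Consistency aside, nonlinear least squares estimates of this kind are computed iteratively in practice. Below is a minimal Gauss-Newton sketch for the illustrative model g(t, θ) = exp(-θt); the model, data, starting value, and crude step safeguard are all my own assumptions, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(5)

# synthetic data y_i = g(t_i, theta_true) + noise, g(t, theta) = exp(-theta t)
t = np.linspace(0.0, 4.0, 200)
theta_true = 0.8
y = np.exp(-theta_true * t) + rng.normal(0.0, 0.05, t.size)

theta = 0.1                                   # starting value
for _ in range(50):
    g = np.exp(-theta * t)
    J = -t * g                                # derivative of g with respect to theta
    resid = y - g
    theta += (J @ resid) / (J @ J)            # Gauss-Newton update
    theta = float(np.clip(theta, 1e-3, 10.0)) # crude safeguard against wild steps
```

The iteration linearizes g around the current parameter value and solves the resulting linear least squares problem; for this one-parameter exponential it converges quickly to a value near theta_true.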


Journal ArticleDOI
TL;DR: In this paper, the theory of the design and performance of optimal finite-memory systems for the two-hypothesis testing problem with probability of error loss criterion is developed.
Abstract: This paper develops the theory of the design and performance of optimal finite-memory systems for the two-hypothesis testing problem. Let $X_1, X_2, \cdots$ be a sequence of independent identically distributed random variables drawn according to a probability measure $\mathscr{P}$. Consider the standard two-hypothesis testing problem with probability of error loss criterion in which $\mathscr{P} = \mathscr{P}_0$ with probability $\pi_0$; and $\mathscr{P} = \mathscr{P}_1$ with probability $\pi_1$. Let the data be summarized after each new observation by an $m$-valued statistic $T\in\{ 1, 2, \cdots, m\}$ which is updated according to the rule $T_n = f(T_{n-1}, X_n),$ where $f$ is a (perhaps randomized) time-invariant function. Let $d:\{ 1, 2,\cdots, m\} \rightarrow\{ H_0, H_1\}$ be a fixed decision function taking action $d(T_n)$ at time $n$, and let $P_e(f,d)$ be the long-run probability of error of the algorithm $(f, d)$ as the number of trials $n\rightarrow\infty$. Define $P^\ast = \inf_{(f,d)}P_e(f, d)$. Let the a.e. maximum and minimum likelihood ratios be defined by $\bar{l} = \sup(\mathscr{P}_0(A)/\mathscr{P}_1(A))$ and $\underline{l} = \inf(\mathscr{P}_0(A)/\mathscr{P}_1(A))$ where the supremum and infimum are taken over all measurable sets $A$ for which $\mathscr{P}_0(A) + \mathscr{P}_1(A) > 0$. Define $\gamma = \bar{l}/\underline{l}$. It will be shown that $P^\ast = \lbrack 2(\pi_0\pi_1\gamma^{m-1})^{\frac{1}{2}} - 1\rbrack/(\gamma^{m-1} - 1)$, under the nondegeneracy condition $\gamma^{m-1} \geqq \max\{\pi_0/\pi_1, \pi_1/\pi_0\}$; and a simple family of $\varepsilon$-optimal $(f, d)$'s will be exhibited. In general, an optimal $(f, d)$ does not exist; and $\varepsilon$-optimal algorithms involve randomization in $f$.
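The closed form for $P^\ast$ quoted in the abstract is easy to evaluate. A small sketch (the function name is mine; the formula is only valid under the stated nondegeneracy condition, which the code checks):

```python
from math import sqrt

def p_star(pi0, pi1, gamma, m):
    """Optimal long-run error probability for an m-valued memory statistic,
    per the abstract's formula; gamma is the ratio of the a.e. maximum to
    minimum likelihood ratio. Requires gamma**(m-1) >= max(pi0/pi1, pi1/pi0)."""
    g = gamma ** (m - 1)
    assert g >= max(pi0 / pi1, pi1 / pi0), "nondegeneracy condition violated"
    return (2.0 * sqrt(pi0 * pi1 * g) - 1.0) / (g - 1.0)

# with equal priors and gamma = 4, two memory states already give P* = 1/3,
# and the optimal error probability shrinks as the memory size m grows
errs = [p_star(0.5, 0.5, 4.0, m) for m in range(2, 7)]
```

With equal priors the expression simplifies to $1/(\gamma^{(m-1)/2} + 1)$, which makes the monotone decrease in m transparent.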

Journal ArticleDOI
TL;DR: In this article, the authors evaluate the probability that a Wiener process ever crosses certain boundaries g(t), and prove an invariance theorem which transfers this probability to the partial sums of i.i.d. random variables with mean 0 and variance 1.
Abstract: 1. Introduction and summary. Let W(t) denote a standard Wiener process for 0 ≦ t < ∞. We evaluate the probability that W(t) ≧ g(t) for some t ≧ τ > 0 (or for some t > 0) for a certain class of functions g(t), including functions which are ~ (2t log log t)^{1/2} as t → ∞. We also prove an invariance theorem which states that this probability is the limit as m → ∞ of the probability that S_n ≧ m^{1/2} g(n/m) for some n ≧ τm (or for some n ≧ 1), where S_n is the nth partial sum of any sequence x_1, x_2, … of independent and identically distributed (i.i.d.) random variables with mean 0 and variance 1.
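The invariance idea can be checked numerically for a linear boundary g(t) = a + bt, where the crossing probability for the Wiener process is the classical exp(-2ab) (a standard fact, not stated in this abstract). The Monte Carlo sketch below uses normal increments and my own choices of m, horizon, and replication count; discretization and the finite horizon bias the estimate slightly downward.

```python
import numpy as np

rng = np.random.default_rng(2)
a, b = 1.0, 1.0                       # linear boundary g(t) = a + b t
m, reps, horizon = 400, 1000, 4000    # scaling, replications, steps (t up to 10)

steps = rng.normal(0.0, 1.0, (reps, horizon))
S = np.cumsum(steps, axis=1)                   # random walk paths
n = np.arange(1, horizon + 1)
boundary = np.sqrt(m) * (a + b * n / m)        # m^(1/2) g(n/m)
p_hat = (S >= boundary).any(axis=1).mean()     # fraction of paths that cross
p_exact = np.exp(-2.0 * a * b)                 # P{W(t) >= a + b t for some t > 0}
```

As m grows the random-walk crossing probability approaches the Wiener-process value, which is what the invariance theorem asserts for this class of boundaries.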

Journal ArticleDOI
TL;DR: In this article, the inequality due to Rademacher-Mensov for orthogonal $X_i$'s is generalized to other types of dependent Rv's.
Abstract: Assume $E(X_i) \equiv 0$. For $\nu \geqq 2$, bounds on the $\nu$th moment of $\max_{1 \leqq k \leqq n}|\sum^{a + k}_{a + 1} X_i|$ are deduced from assumed bounds on the $\nu$th moment of $|\sum^{a + n}_{a + 1} X_i|$. The inequality due to Rademacher-Mensov for $\nu = 2$ and orthogonal $X_i$'s is generalized to $\nu \geqq 2$ and other types of dependent $\operatorname{rv's}$. In the case $\nu > 2$, a second result is obtained which is considerably stronger than the first for asymptotic applications.



Book ChapterDOI
TL;DR: In this paper, the authors establish a theorem on the departure from normality for martingales $\{X_n\}$ with $X_0 = 0$ a.s. and $E|Y_n|^{2+2\delta} < \infty$ for some constant $0 < \delta \leqq 1$.
Abstract: Let $\{X_n, \mathcal{F}_n, n = 0, 1, 2, \cdots\}$ be a martingale with $X_0 = 0$ a.s., $X_n = \sum_{i=1}^n Y_i$, $n \geqq 1$, and $\mathcal{F}_n$ the $\sigma$-field generated by $X_0, X_1, \cdots, X_n$. Write $$\sigma_n^2 = E(Y_n^2 \mid \mathcal{F}_{n-1}), \qquad S_n^2 = \sum_{i=1}^n E\sigma_i^2,$$ and suppose that there is a constant $\delta$, with $0 < \delta \leqq 1$, such that $E|Y_n|^{2+2\delta} < \infty$, $n = 1, 2, \cdots$. It is the object of this paper to establish the following theorem on departure from normality.

Book ChapterDOI
TL;DR: In this article, a super-critical Galton–Watson process whose non-degenerate offspring distribution has a probability generating function is studied; the main purpose of the paper is to establish a theorem which gives an ultimate form of the limit result for the case in question.
Abstract: Let $Z_0 = 1, Z_1, Z_2, \cdots$ denote a super-critical Galton–Watson process whose non-degenerate offspring distribution has probability generating function $F(s) = \sum_{j=0}^\infty s^j \mathbf{Pr}(Z_1 = j)$, $0 \leqq s \leqq 1$, where the offspring mean satisfies $1 < F'(1-) < \infty$. The Galton–Watson process evolves in such a way that the generating function $F_n(s)$ of $Z_n$ is the $n$th functional iterate of $F(s)$ and, for the super-critical case in question, the probability of extinction of the process, $q$, is well known to be the unique real number in $[0, 1)$ satisfying $F(q) = q$. It is the main purpose of this paper to establish the following theorem which gives an ultimate form of the limit result for the case in question.
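The fixed-point characterization F(q) = q is easy to check by simulation. In the sketch below the offspring law (0 children with probability 1/4, 2 with probability 3/4, so F(s) = 1/4 + (3/4)s², offspring mean 3/2, and q = 1/3) and the truncation constants are my own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(3)

def extinct(p2=0.75, max_gen=60, cap=10_000):
    """Simulate one Galton-Watson line with offspring 0 w.p. 1-p2 and
    2 w.p. p2; report whether the line dies out."""
    z = 1
    for _ in range(max_gen):
        if z == 0:
            return True
        z = 2 * rng.binomial(z, p2)   # each of the z individuals doubles w.p. p2
        if z > cap:                   # so large that extinction is essentially impossible
            return False
    return z == 0

q_hat = np.mean([extinct() for _ in range(3000)])
q = 1.0 / 3.0                         # the root of F(q) = q in [0, 1)
```

The empirical extinction frequency should match the smaller root of 1/4 + (3/4)q² = q, i.e. q = 1/3, up to Monte Carlo error.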

Journal ArticleDOI
TL;DR: In this article, it is shown that the sample function has an "approximate derivative" of infinite magnitude at each point (and so is nowhere differentiable), and that the set of values in the range of at most countable multiplicity is nowhere dense in the range.
Abstract: Let $X(t), 0 \leqq t \leqq 1$, be separable measurable Gaussian process with mean 0, stationary increments, and $\sigma^2(t) = E(X(t) - X(0))^2$. If $\sigma^2(t) \sim C|t|^\alpha, t \rightarrow 0$, for some $\alpha, 0 < \alpha < 2$, then the Hausdorff dimension of $\{s: X(t) = X(s)\}$ is equal to $1 - (\alpha/2)$ for almost all $t$, almost surely. Under further variations and refinements of this condition there is a jointly continuous local time for almost every sample function. This extends the author's previous results for stationary Gaussian processes and for continuity in the space variable alone. The result on joint continuity of the local time is used to prove that the sample function has an "approximate derivative" of infinite magnitude at each point (and so is nowhere differentiable); and that the set of values in the range of at most countable multiplicity is nowhere dense in the range.




Journal ArticleDOI
TL;DR: In this paper, the authors consider quadratic estimability, and procedures for obtaining unbiased quadratic estimators, for parametric functions of the form $\sum_{i\leqq j}\gamma_{ij}\beta_i\beta_j + \sum_k\gamma_k\nu_k$ in a mixed linear model, with special emphasis on linear combinations of variance components.
Abstract: Exemplification of the theory developed in [9] using a linear space of random variables other than linear combinations of the components of a random vector, and unbiased estimation for the parameters of a mixed linear model using quadratic estimators are the primary reasons for the considerations in this paper. For a random vector $Y$ with expectation $X\beta$ and covariance matrix $\sum_i \nu_iV_i$ ($\nu_1, \cdots, \nu_m$, and $\beta$ denote the parameters), interest centers upon quadratic estimability for parametric functions of the form $\sum_{i\leqq j}\gamma_{ij}\beta_i\beta_j + \sum_k\gamma_k\nu_k$ and procedures for obtaining quadratic estimators for such parametric functions. Special emphasis is given to parametric functions of the form $\sum_k\gamma_k\nu_k$. Unbiased estimation of variance components is the main reason for quadratic estimability considerations regarding parametric functions of the form $\sum_k\gamma_k\nu_k$. Concerning variance component models, Airy, in 1861 (Scheffe [6]), appears to have been the first to introduce a model with more than one source of variation. Such a model is also implied (Scheffe [6]) by Chauvenet in 1863. Fisher [1], [2] reintroduced variance component models and discussed, apparently for the first time, unbiased estimation in such models. Since Fisher's introduction and discussion of unbiased estimation in models with more than one source of variation, there has been considerable literature published on the subject. One of these papers is a description by Henderson [5] which popularized three methods (now known as Henderson's Methods I, II, and III) for obtaining unbiased estimates of variance components. We mention these methods since they seem to be commonly used in the estimation of variance components. For a review as well as a matrix formulation of the methods see Searle [7].
Among the several pieces of work which have dealt with Henderson's methods, only that of Harville [4] seems to have been concerned with consistency of the equations leading to the estimators and with the existence of unbiased (quadratic) estimators under various conditions. Harville, however, only treats a completely random two-way classification model with interaction. One other result which deals with existence of unbiased quadratic estimators in a completely random model is given by Graybill and Hultquist [3]. In Section 2 the form we assume for a mixed linear model is introduced and the pertinent quantities needed for the application of the results in [9] are obtained. Definitions, terminology, and notation are consistent with the usage in [9]. Section 3 considers parametric functions of the form $\sum_{i\leqq j}\gamma_{ij}\beta_i\beta_j + \sum_k\gamma_k\nu_k$ and Section 4 concerns parametric functions of the form $\sum_k\gamma_k\nu_k$. One particular method for obtaining unbiased estimators for linear combinations of variance components is given in Section 4 that is computationally simpler than the Henderson Method III procedure, which is the most widely used general approach applicable to any mixed linear model. The method described in Section 4 has the added advantage of giving necessary and sufficient conditions for the existence of unbiased quadratic estimators, which is not always the case with the Henderson Method III. In the last section an example is given which illustrates the Henderson Method III procedure from the viewpoint of this paper.
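For the simplest mixed model, the balanced one-way random effects layout, unbiased quadratic estimators of the variance components come from equating mean squares to their expectations, which is the ANOVA idea underlying Henderson's methods. The sketch below is an illustrative special case of my own construction, not the paper's general procedure:

```python
import numpy as np

rng = np.random.default_rng(4)
a, n = 200, 8                               # groups, observations per group
sigma_a2, sigma_e2 = 2.0, 1.0               # true variance components
u = rng.normal(0.0, np.sqrt(sigma_a2), a)   # random group effects
y = u[:, None] + rng.normal(0.0, np.sqrt(sigma_e2), (a, n))

group_means = y.mean(axis=1)
msb = n * ((group_means - y.mean()) ** 2).sum() / (a - 1)      # between-group MS
msw = ((y - group_means[:, None]) ** 2).sum() / (a * (n - 1))  # within-group MS

sigma_e2_hat = msw                  # E[MSW] = sigma_e^2
sigma_a2_hat = (msb - msw) / n      # E[MSB] = sigma_e^2 + n * sigma_a^2
```

Both estimators are quadratic forms in y and unbiased for their components, though (msb - msw)/n can be negative in small samples, one of the known drawbacks of ANOVA-type variance component estimation.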

Journal ArticleDOI
TL;DR: In this article, the problems of estimating the direction and unoriented direction of a vector are reformulated to permit solution by proper confidence sets, and it is shown that if only multiple testing is desired, a modified S-method is more powerful.
Abstract: 0. Summary. The "S-method" of multiple comparison ([5]; [6], Section 3.5) was intended for multiple estimation, possibly combined with multiple testing. It is shown that if only multiple testing is desired a certain "modified S-method" is more powerful. While this result is of some theoretical interest, it is recommended, after a discussion of the relative advantages of the two methods, that the new one generally not be used in applications. The multiple testing problems considered are related to estimating the direction of a vector or its unoriented direction, estimation problems which also have an inherent interest. A confidence set for a parameter point is called improper if the probability that it gives a trivially true statement is positive. The problems of estimating the direction and unoriented direction of a vector are reformulated to permit solution by proper confidence sets. In the case of the unoriented direction of a q-dimensional vector the confidence sets yield solutions of the problem of joint estimation of q - 1 ratios and the problem of multiple estimation of all ratios in a certain infinite set. Specializing to the case q = 2 yields a proper confidence set as a substitute for Fieller's improper confidence set for a ratio. 1. Introduction. The reader interested only in Fieller's problem of estimating a ratio may proceed directly to the discussion following the Corollary near the end of Section 5. The reader not interested in multiple testing but in the estimation of directions and ratios may read through the sentence containing equation (3) and then skip to Section 3. We use the term "testing" to include the trichotomous procedure where if a hypothesis θ = 0 is rejected by a two-tailed test we decide on one of the alternatives θ > 0 or θ < 0. "Estimation" refers to estimation by confidence intervals or other confidence sets. The problems will be treated under the underlying assumptions Ω usually made in the analysis of variance.

Journal ArticleDOI
TL;DR: In this paper, a multivariate IHR notion is introduced by requiring that $P[X_1 > x_1, \cdots, X_n > x_n \mid X_1 > x_1', \cdots, X_n > x_n']$ be nondecreasing in $x_1', \cdots, x_n'$ for every choice of $x_1, \cdots, x_n$.
Abstract: 2. Multivariate IHR Distributions. Consider the random vector $(X_1, X_2, \cdots, X_n)$ with distribution function $F(x_1, x_2, \cdots, x_n)$, and suppose that $P[X_1 > x_1, \cdots, X_n > x_n \mid X_1 > x_1', \cdots, X_n > x_n']$ is nondecreasing in $x_1', \cdots, x_n'$ for every choice of $x_1, \cdots, x_n$. This generalises some notions of dependence that were studied by Lehmann [4] and Esary and Proschan [3]. Setting $\bar{F}(x_1, x_2, \cdots, x_n) = P[X_1 > x_1, X_2 > x_2, \cdots, X_n > x_n]$ we have: