Showing papers in "Annals of Statistics in 1978"


Journal ArticleDOI
TL;DR: In this paper, the problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion.
Abstract: The problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion. These terms are a valid large-sample criterion beyond the Bayesian context, since they do not depend on the a priori distribution.

38,681 citations
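
The leading terms mentioned in the abstract above are commonly written as the maximized log-likelihood minus $(k/2)\log n$, where $k$ is the model dimension and $n$ the sample size. The following is a minimal sketch of that criterion, assuming two illustrative Gaussian models (mean fixed at 0 versus mean estimated); the function names and the choice of models are assumptions made for this example, not taken from the paper.

```python
import math


def gaussian_loglik(xs, mu, var):
    """Log-likelihood of i.i.d. N(mu, var) observations."""
    n = len(xs)
    rss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2.0 * math.pi * var) - rss / (2.0 * var)


def schwarz_criterion(loglik, k, n):
    """Maximized log-likelihood minus (k / 2) * log(n); larger is better."""
    return loglik - 0.5 * k * math.log(n)


def select_model(xs):
    """Return 0 (mean fixed at 0) or 1 (mean estimated) for the sample xs.

    Both models are hypothetical examples; the sample must not be constant.
    """
    n = len(xs)
    # Model 0: only the variance is estimated (dimension k = 1).
    var0 = sum(x * x for x in xs) / n
    score0 = schwarz_criterion(gaussian_loglik(xs, 0.0, var0), 1, n)
    # Model 1: mean and variance are estimated (dimension k = 2).
    mu1 = sum(xs) / n
    var1 = sum((x - mu1) ** 2 for x in xs) / n
    score1 = schwarz_criterion(gaussian_loglik(xs, mu1, var1), 2, n)
    return 0 if score0 >= score1 else 1
```

Calling `select_model` on a sample centred well away from zero should favour the larger model, while a sample centred at zero should favour the smaller one, because the extra parameter then buys no likelihood and still pays the $\tfrac{1}{2}\log n$ penalty.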


Journal ArticleDOI
TL;DR: In this article, the authors make clear the role of mechanisms that sample experimental units, assign treatments and record data, and show that unless these mechanisms are ignorable, the Bayesian must model them in the data analysis and confront inferences for causal effects that are sensitive to the specification of the prior distribution of the data.
Abstract: Causal effects are comparisons among values that would have been observed under all possible assignments of treatments to experimental units. In an experiment, one assignment of treatments is chosen and only the values under that assignment can be observed. Bayesian inference for causal effects follows from finding the predictive distribution of the values under the other assignments of treatments. This perspective makes clear the role of mechanisms that sample experimental units, assign treatments and record data. Unless these mechanisms are ignorable (known probabilistic functions of recorded values), the Bayesian must model them in the data analysis and, consequently, confront inferences for causal effects that are sensitive to the specification of the prior distribution of the data. Moreover, not all ignorable mechanisms can yield data from which inferences for causal effects are insensitive to prior specifications. Classical randomized designs stand out as especially appealing assignment mechanisms designed to make inference for causal effects straightforward by limiting the sensitivity of a valid Bayesian analysis.

2,430 citations


Journal ArticleDOI
TL;DR: In this article, the authors give an application of the recently developed martingale-based approach to the study of a multivariate counting process $\mathbf{N}$ via its intensity process $\mathbf{\Lambda}$.
Abstract: Let $\mathbf{N} = (N_1, \cdots, N_k)$ be a multivariate counting process and let $\mathscr{F}_t$ be the collection of all events observed on the time interval $\lbrack 0, t\rbrack.$ The intensity process is given by $\Lambda_i(t) = \lim_{h \downarrow 0} \frac{1}{h}E(N_i(t + h) - N_i(t) \mid \mathscr{F}_t)\quad i = 1, \cdots, k.$ We give an application of the recently developed martingale-based approach to the study of $\mathbf{N}$ via $\mathbf{\Lambda}.$ A statistical model is defined by letting $\Lambda_i(t) = \alpha_i(t)Y_i(t), i = 1, \cdots, k,$ where $\mathbf{\alpha} = (\alpha_1, \cdots, \alpha_k)$ is an unknown nonnegative function while $\mathbf{Y} = (Y_1, \cdots, Y_k),$ together with $\mathbf{N},$ is a process observable over a certain time interval. Special cases are time-continuous Markov chains on finite state spaces, birth and death processes and models for survival analysis with censored data. The model is termed nonparametric when $\mathbf{\alpha}$ is allowed to vary arbitrarily except for regularity conditions. The existence of complete and sufficient statistics for this model is studied. An empirical process estimating $\beta_i(t) = \int^t_0 \alpha_i(s) ds$ is given and studied by means of the theory of stochastic integrals. This empirical process is intended for plotting purposes and it generalizes the empirical cumulative hazard rate from survival analysis and is related to the product limit estimator. Consistency and weak convergence results are given. Tests for comparison of two counting processes, generalizing the two sample rank tests, are defined and studied. Finally, an application to a set of biological data is given.

1,391 citations
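
In the survival-analysis special case with $k = 1$, an empirical estimate of $\beta(t) = \int^t_0 \alpha(s) ds$ can be sketched by summing the jumps $dN(s)/Y(s)$ over observed event times, where $Y(s)$ counts the subjects still at risk. This form is commonly known as the Nelson-Aalen estimator. The sketch below assumes right-censored follow-up data; the function name and the data layout are choices made for this example.

```python
def cumulative_intensity(times, observed):
    """Return [(t, beta_hat(t))] at each distinct observed event time.

    times:    follow-up time of each subject
    observed: True if the subject's event was seen, False if censored
    """
    event_times = sorted({t for t, d in zip(times, observed) if d})
    estimate = []
    total = 0.0
    for t in event_times:
        # Y(t): number of subjects still under observation just before t.
        at_risk = sum(1 for u in times if u >= t)
        # dN(t): number of events recorded exactly at t.
        jumps = sum(1 for u, d in zip(times, observed) if d and u == t)
        total += jumps / at_risk
        estimate.append((t, total))
    return estimate
```

Plotting the returned step function against time gives the kind of diagnostic plot the abstract mentions; a roughly straight line would suggest a roughly constant $\alpha$.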


Journal ArticleDOI
TL;DR: In this paper, an asymptotic expansion is shown to be identical with a formal Edgeworth expansion of the distribution function of $W_n$, which settles a conjecture of Wallace (1958); an application yields asymptotic expansions of distributions of maximum likelihood estimators and, more generally, minimum contrast estimators of vector parameters under readily verifiable distributional assumptions.
Abstract: [...] $EZ_1$. This asymptotic expansion is shown to be identical with a formal Edgeworth expansion of the distribution function of $W_n$. This settles a conjecture of Wallace (1958). The class of statistics considered includes all appropriately smooth functions of sample moments. An application yields asymptotic expansions of distributions of maximum likelihood estimators and, more generally, minimum contrast estimators of vector parameters under readily verifiable distributional assumptions.

601 citations


Journal ArticleDOI
TL;DR: In this paper, an estimation procedure for stochastic processes based on the minimization of a sum of squared deviations about conditional expectations is developed, and the estimators and their limiting covariance matrix are worked out in detail for a subcritical branching process with immigration.
Abstract: An estimation procedure for stochastic processes based on the minimization of a sum of squared deviations about conditional expectations is developed. Strong consistency, asymptotic joint normality and an iterated logarithm rate of convergence are shown to hold for the estimators under a variety of conditions. Special attention is given to the widely studied cases of stationary ergodic processes and Markov processes which are asymptotically stationary and ergodic. The estimators and their limiting covariance matrix are worked out in detail for a subcritical branching process with immigration. A brief Monte Carlo study of the performance of the estimators is presented.

429 citations


Journal ArticleDOI
TL;DR: In this article, the estimation of a density and its derivatives by the kernel method is considered, uniform consistency properties over the whole real line are studied, and conditions on the density and on the behavior of the window width are found which are necessary and sufficient for weak and strong uniform consistency of the estimate of the density derivatives.
Abstract: The estimation of a density and its derivatives by the kernel method is considered. Uniform consistency properties over the whole real line are studied. For suitable kernels and uniformly continuous densities it is shown that the conditions $h \rightarrow 0$ and $(nh)^{-1} \log n \rightarrow 0$ are sufficient for strong uniform consistency of the density estimate, where $n$ is the sample size and $h$ is the "window width." Under certain conditions on the kernel, conditions are found on the density and on the behavior of the window width which are necessary and sufficient for weak and strong uniform consistency of the estimate of the density derivatives. Theorems on the rate of strong and weak consistency are also proved.

362 citations
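
As a minimal sketch of the kernel method discussed in the abstract above, the estimate at a point $x$ can be written as $(nh)^{-1}\sum_i K((x - X_i)/h)$. The code assumes a Gaussian kernel (one possible "suitable kernel"; the abstract does not name one) and also exposes the two quantities, $h$ and $(nh)^{-1}\log n$, that the abstract requires to tend to zero. All function names are illustrative.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel K (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(x, sample, h):
    """Kernel density estimate at x with window width h."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)


def bandwidth_conditions(n, h):
    """Return (h, log(n) / (n * h)); both should shrink as n grows."""
    return h, math.log(n) / (n * h)
```

For instance, a window width of the form $h = n^{-1/5}$ makes both returned quantities decrease with $n$, which is consistent with the sufficient conditions quoted in the abstract.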


Journal ArticleDOI
TL;DR: A methodology is presented for analyzing periodic autoregressions which is also applicable when inferring the second order properties of periodically correlated processes; a related method for multiple autoregressions, based on an orthogonal parameterization, overcomes the usual requirements of a large number of both parameters and computer storage locations.
Abstract: A methodology is presented for analyzing periodic autoregressions which is also applicable when inferring the second order properties of periodically correlated processes. In addition, capitalizing on the connection between periodic and multiple autoregressions, a method is set forth for analyzing the latter, which overcomes the usual requirements of a large number of both parameters and computer storage locations. This is achieved by introducing an orthogonal parameterization for multiple autoregressions.

305 citations


Journal ArticleDOI
TL;DR: This paper surveys the existence of Hadamard matrices and many of their applications: these matrices can be transformed to produce incomplete block designs, $t$-designs, Youden designs, orthogonal $F$-square designs, optimal saturated resolution III designs, optimal weighing designs, maximal sets of pairwise independent random variables with uniform measure, error correcting and detecting codes, Walsh functions, and other mathematical and statistical objects.
Abstract: An $n \times n$ matrix $H$ with all its entries $+1$ and $-1$ is Hadamard if $HH' = nI$. It is well known that $n$ must be 1, 2 or a multiple of 4 for such a matrix to exist, but is not known whether Hadamard matrices exist for every $n$ which is a multiple of 4. The smallest order for which a Hadamard matrix has not been constructed is (as of 1977) 268. Research in the area of Hadamard matrices and their applications has steadily and rapidly grown, especially during the last three decades. These matrices can be transformed to produce incomplete block designs, $t$-designs, Youden designs, orthogonal $F$-square designs, optimal saturated resolution III designs, optimal weighing designs, maximal sets of pairwise independent random variables with uniform measure, error correcting and detecting codes, Walsh functions, and other mathematical and statistical objects. In this paper we survey the existence of Hadamard matrices and many of their applications.

288 citations
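
The defining property $HH' = nI$ is easy to check numerically. The sketch below builds Hadamard matrices by the Sylvester doubling construction, which is a standard technique chosen here for illustration and is not claimed to be the one emphasized in the survey. It only produces orders that are powers of 2, so it says nothing about the open existence question for other multiples of 4.

```python
def sylvester_hadamard(m):
    """Hadamard matrix of order 2**m via repeated doubling [[H, H], [H, -H]]."""
    h = [[1]]
    for _ in range(m):
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


def is_hadamard(h):
    """Check that h is square, has entries +1 or -1, and satisfies HH' = nI."""
    n = len(h)
    if any(len(row) != n or any(x not in (1, -1) for x in row) for row in h):
        return False
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(h[i], h[j]))
            if dot != (n if i == j else 0):
                return False
    return True
```

A quick call such as `is_hadamard(sylvester_hadamard(3))` verifies the order-8 case; a $3 \times 3$ matrix of signs can never pass, in line with the restriction that $n$ must be 1, 2 or a multiple of 4.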


Journal ArticleDOI
TL;DR: In this article, nonparametric estimators for transition probabilities in partial Markov chains relative to multiple decrement models are proposed, which are generalizations of the product limit estimator.
Abstract: Nonparametric estimators are proposed for transition probabilities in partial Markov chains relative to multiple decrement models. The estimators are generalizations of the product limit estimator. We study the bias of the estimators, prove a strong consistency result and derive asymptotic normality of the estimators considered as stochastic processes. We also compute their efficiency relative to the maximum likelihood estimators in the case of constant forces of transition.

246 citations


Journal ArticleDOI
TL;DR: In this paper, the authors describe geometric results relating the natural parameter space and the expectation parameter space for multivariate exponential families; in the simplest case, that of a normal translation family, the two spaces coincide and the geometry is the familiar Euclidean one.
Abstract: There are two important spaces connected with every multivariate exponential family, the natural parameter space and the expectation parameter space. We describe some geometric results relating the two. (In the simplest case, that of a normal translation family, the two spaces coincide and the geometry is the familiar Euclidean one.) Maximum likelihood estimation, within one-parameter curved subfamilies of the multivariate family, has two simple and useful geometric interpretations. The geometry also relates to the Fisherian question: to what extent can the Fisher information be replaced by $-\partial^2/\partial\theta^2\lbrack\log f_\theta(x)\rbrack\mid_{\theta=\hat{\theta}}$ in the variance bound for $\hat{\theta}$, the maximum likelihood estimator?

245 citations


Journal ArticleDOI
TL;DR: In this article, a random variable $X$ is said to have distribution in the class $\mathscr{E}_0$ if, for some real valued, positive function $a(\bullet)$, the identity $E\{(X - \mu)g(X)\} = E\{a(X)g'(X)\}$ holds for suitable absolutely continuous real valued functions $g(\bullet)$; for $p \geqq 3$ independent variables with distributions in this class, the estimator $X - ((p - 2)/S)B$ is shown to dominate $X$ under sum of squared error loss.
Abstract: A random variable $X$ is said to have distribution in the class $\mathscr{E}_0$ if, for some real valued, positive function $a(\bullet)$, the identity $E\{(X - \mu)g(X)\} = E\{a(X)g'(X)\}$ holds for any absolutely continuous real valued function $g(\bullet)$ satisfying $E|a(X)g'(X)| < \infty$. [...] Suppose $X_1,\cdots, X_p, p \geqq 3$, are independently distributed with distributions in $\mathscr{E}_0$, for some function $a(\bullet)$, and with means $\mu_1,\cdots, \mu_p$. Define $b(x) = \int a(x)^{-1} dx$, where the integral is interpreted as indefinite, $B_i = b(X_i), S = \sum^p_{i=1} B_i^2, X' = (X_1,\cdots, X_p)$ and $B' = (B_1,\cdots, B_p)$. Then the estimator $X - ((p - 2)/S)B$ dominates $X$ if sum of squared error loss is assumed. Similar estimators are obtained, when $p \geqq 4$, which shrink towards an origin determined by the data. There are corresponding results for some discrete exponential families.
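
A minimal sketch of the estimator $X - ((p - 2)/S)B$ follows. It assumes the unit-variance normal case, where the identity holds with $a(x) = 1$, so that $b(x) = x$ (taking the constant of integration as zero, an assumption) and the estimator reduces to the familiar James-Stein form. The simulation helper and its parameters are illustrative only.

```python
import random


def shrinkage_estimate(x, b=lambda v: v):
    """Compute X - ((p - 2) / S) * B with B_i = b(X_i) and S = sum of B_i**2.

    The default b(x) = x corresponds to unit-variance normal observations.
    """
    p = len(x)
    if p < 3:
        raise ValueError("the domination result is stated for p >= 3")
    big_b = [b(v) for v in x]
    s = sum(v * v for v in big_b)
    return [xi - (p - 2) / s * bi for xi, bi in zip(x, big_b)]


def average_losses(mu, reps=2000, seed=0):
    """Monte Carlo average of squared error loss for X and for the estimator."""
    rng = random.Random(seed)
    raw = shrunk = 0.0
    for _ in range(reps):
        x = [rng.gauss(m, 1.0) for m in mu]
        est = shrinkage_estimate(x)
        raw += sum((xi - m) ** 2 for xi, m in zip(x, mu))
        shrunk += sum((e - m) ** 2 for e, m in zip(est, mu))
    return raw / reps, shrunk / reps
```

Running `average_losses` with a mean vector near the origin should show a clearly smaller average loss for the shrinkage estimate than for $X$ itself; for other families in $\mathscr{E}_0$ one would pass the appropriate `b`.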

Journal ArticleDOI
TL;DR: In this article, limit processes are obtained for the sequences of partial sums of polynomial regression residuals and properties of linear and quadratic functionals on the sequences are discussed.
Abstract: Limit processes are obtained for the sequences of partial sums of polynomial regression residuals. Properties of linear and quadratic functionals on the sequences are discussed. Distribution theory for Cramer-von Mises type functionals is obtained. An indication is given of the relevance of these results to the problem of testing for change of regression at unknown times.

Journal ArticleDOI
TL;DR: In this article, the adaptive biased coin design, which offers a compromise between perfect balance and complete randomization, is proposed and analyzed; the design forces a small-sized experiment to be balanced, but tends toward the complete randomization scheme as the size of the experiment increases.
Abstract: In comparing two treatments, eligible subjects come to the experiment sequentially and must be treated at once. To reduce experimental bias and to increase the precision of inference about treatment effects, the adaptive biased coin design, which offers a compromise between perfect balance and complete randomization, is proposed and analyzed. This new design has the property that it forces a small-sized experiment to be balanced, but tends toward the complete randomization scheme as the size of the experiment increases.
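
The abstract does not state an allocation rule, so the following is only a sketch of one rule with the described behavior. It assumes that the next subject receives treatment A with probability $(1 - D_n/n)/2$, where $D_n$ is the current excess of A over B after $n$ assignments. That formula, the function names and the treatment labels are assumptions made for illustration.

```python
import random


def prob_next_a(n_a, n_b):
    """Assumed allocation rule: probability that the next subject gets A."""
    n = n_a + n_b
    imbalance = (n_a - n_b) / n if n else 0.0
    return (1.0 - imbalance) / 2.0


def allocate(n_subjects, seed=0):
    """Sequentially assign subjects to treatments 'A' and 'B'."""
    rng = random.Random(seed)
    n_a = n_b = 0
    sequence = []
    for _ in range(n_subjects):
        if rng.random() < prob_next_a(n_a, n_b):
            sequence.append("A")
            n_a += 1
        else:
            sequence.append("B")
            n_b += 1
    return sequence
```

Under this rule the second subject is always given the treatment the first did not receive, so very small experiments are forced toward balance, while for large $n$ the relative imbalance $D_n/n$ is typically small and the probability stays close to $1/2$, as in complete randomization.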

Journal ArticleDOI
TL;DR: In this paper, the distance between the quantile process $q_n(y)$ and the uniform quantile process $u_n(y)$ is computed with rates, and it is shown that $q_n(y)$ can be approximated by a sequence of Brownian bridges as well as by a Kiefer process.
Abstract: Let $q_n(y), 0 < y < 1,$ be a quantile process based on a sequence of i.i.d. rv with distribution function $F$ and density function $f.$ Given some regularity conditions on $F$ the distance of $q_n(y)$ and the uniform quantile process $u_n(y),$ respectively defined in terms of the order statistics $X_{k:n}$ and $U_{k:n} = F(X_{k:n}),$ is computed with rates. As a consequence we have an extension of Kiefer's result on the distance between the empirical and quantile processes, a law of iterated logarithm for $q_n(y)$ and, using similar results for the uniform quantile process $u_n(y),$ it is also shown that $q_n(y)$ can be approximated by a sequence of Brownian bridges as well as by a Kiefer process.

Journal ArticleDOI
TL;DR: In this article, the authors construct some families of repeated measurements designs for those cases where a subject cannot participate in all tests, as in many pharmacological studies, provide an extensive list of references on repeated measurements designs, and state some unsolved problems which have an immediate application.
Abstract: Repeated measurements designs are concerned with scientific experiments in which subjects (experimental units) are repeatedly exposed to a sequence of different or identical tests (treatments). These designs have application in many branches of scientific inquiry such as biology, education, food science, marketing, environmental engineering, medicine and pharmacology. The objectives in the paper are threefold: (1) to construct some families of repeated measurements designs which researchers have been seeking (these designs are useful for those cases where a subject cannot participate in all tests, as in many pharmacological studies); (2) to provide an extensive list of references on repeated measurements designs which, it is hoped, will be useful to those who want to do further research in this area; (3) to state some unsolved problems which have an immediate application.

Journal ArticleDOI
TL;DR: In this paper, the problem of finding an optimal design for the elimination of one-way heterogeneity when a balanced block design does not exist is studied, and a general result on the optimality of certain asymmetrical designs is proved and applied to the block design setting.
Abstract: The problem of finding an optimal design for the elimination of one-way heterogeneity when a balanced block design does not exist is studied. A general result on the optimality of certain asymmetrical designs is proved and applied to the block design setting. It follows that if there is a group divisible partially balanced block design (GD PBBD) with 2 groups and $\lambda_2 = \lambda_1 + 1$, then it is optimal w.r.t. a very general class of criteria including all the commonly used ones. On the other hand, if there is a GD PBBD with 2 groups and $\lambda_1 = \lambda_2 + 1$, then it is optimal w.r.t. another class of criteria. Uniqueness of optimal designs and some other miscellaneous results are also obtained.

Journal ArticleDOI
TL;DR: In this article, it was shown that for any $k$-local discrimination rule, the mean-square difference between the probability of error for the rule and its deleted estimate is bounded by $A/n$, where $A$ is an explicitly given small constant which depends only on $M$ and $k$.
Abstract: In the discrimination problem the random variable $\theta$, known to take values in $\{1,\cdots, M\}$, is estimated from the random vector $X$. All that is known about the joint distribution of $(X, \theta)$ is that which can be inferred from a sample $(X_1, \theta_1),\cdots, (X_n, \theta_n)$ of size $n$ drawn from that distribution. A discrimination rule is any procedure which determines a decision $\hat{\theta}$ for $\theta$ from $X$ and $(X_1, \theta_1),\cdots, (X_n, \theta_n)$. A rule is called $k$-local if the decision $\hat{\theta}$ depends only on $X$ and the pairs $(X_i, \theta_i)$ for which $X_i$ is one of the $k$-closest to $X$ from $X_1,\cdots, X_n$. It is shown that for any $k$-local discrimination rule, the mean-square difference between the probability of error for the rule and its deleted estimate is bounded by $A/n$ where $A$ is an explicitly given small constant which depends only on $M$ and $k$. Thus distribution-free confidence intervals can be placed about probability of error estimates for $k$-local discrimination rules.
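
To make the terms concrete, the sketch below implements one $k$-local rule, a majority vote among the $k$ closest sample points, together with its deleted (leave-one-out) error estimate. It assumes scalar observations $X_i$ and a tie-breaking convention (smallest label wins); both are simplifications for this example, and the constant $A$ from the abstract is not reproduced because its explicit form is not given there.

```python
from collections import Counter


def knn_decision(x, data, k):
    """Majority label among the k sample points closest to x.

    data is a list of (x_i, theta_i) pairs; vote ties go to the smallest label.
    """
    nearest = sorted(data, key=lambda pair: abs(pair[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    best = max(votes.values())
    return min(label for label, count in votes.items() if count == best)


def deleted_error_estimate(data, k):
    """Fraction of sample points misclassified when each is held out in turn."""
    errors = 0
    for i, (x_i, theta_i) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_decision(x_i, rest, k) != theta_i:
            errors += 1
    return errors / len(data)
```

Well-separated classes give a deleted estimate of zero, while perfectly interleaved labels with $k = 1$ give an estimate of one; the abstract's bound concerns how far such an estimate can be, in mean square, from the rule's true probability of error.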

Journal ArticleDOI
TL;DR: In this paper, interval estimation of variance components is studied for the unbalanced one-way random effects model; the exact distribution of an easily calculated function, $W$, of the harmonic mean of the class sizes and of the sample variance of the class means is found and shown to be excellently approximated by a chi-square distribution.
Abstract: Interval estimation of variance components is studied for the unbalanced one-way random effects model. An easily calculated function, $W$, of the harmonic mean of the class sizes and of the sample variance of the class means is found to be important. The exact distribution of $W$ is found and is shown to be excellently approximated by a chi-square distribution. The random variable $W$ is used to construct interval estimates for (i) the between classes variance component and (ii) the ratio of the variance components and thus for the intraclass correlation and heritability. For most one-way unbalanced designs use of these approximate interval estimators will work very well.

Journal ArticleDOI
TL;DR: A decision maker is seen to be coherent in the sense of de Finetti if, and only if, his probabilities are computed in accordance with some finitely additive prior; if a bounded loss function is specified, then a decision rule is extended admissible (i.e., not uniformly dominated) if and only if it is Bayes for some finitely additive prior.
Abstract: A decision maker is seen to be coherent in the sense of de Finetti if, and only if, his probabilities are computed in accordance with some finitely additive prior. If a bounded loss function is specified, then a decision rule is extended admissible (i.e., not uniformly dominated) if and only if it is Bayes for some finitely additive prior. However, if an improper countably additive prior is used, then decisions need not cohere and decision rules need not be extended admissible. Invariant, finitely additive priors are found and their posteriors calculated for a class of problems including translation parameter problems.

Journal ArticleDOI
TL;DR: In this article, the asymptotic power functions of tests for heteroscedasticity and nonlinearity in the linear model were studied and some competitors robust against gross errors were introduced.
Abstract: We study the asymptotic power functions of tests for heteroscedasticity and nonlinearity in the linear model which were proposed by Anscombe, and we introduce and study some competitors that are robust against gross errors.

Journal ArticleDOI
TL;DR: In this paper, likelihood ratio tests are considered for hypotheses that a collection of parameters satisfies an order restriction; for means of normal distributions, equality of the means is shown to be the least favorable subhypothesis (the one yielding the largest type I error probability), and an application is given for underlying Poisson distributions.
Abstract: This paper considers likelihood ratio tests for testing hypotheses that a collection of parameters satisfy some order restriction. The first problem considered is to test a hypothesis specifying an order restriction on a collection of means of normal distributions. Equality of the means is the subhypothesis of the null hypothesis which yields the largest type I error probability (i.e., is least favorable). Furthermore, the distribution of $T = -\ln$ (likelihood ratio) is similar to that of a likelihood ratio statistic for testing the equality of a set of ordered normal means. The least favorable status of homogeneity is a consequence of a result that if $X$ is a point and $A$ a closed convex cone in a Hilbert space and if $Z \in A$, then the distance from $X + Z$ to $A$ is no larger than the distance from $X$ to $A$. The results of a Monte Carlo study of the power of the likelihood ratio statistic are discussed. The distribution of $T$ is also shown to serve as the asymptotic distribution for likelihood ratio statistics for testing trend when the sampled distributions belong to an exponential family. An application of this result is given for underlying Poisson distributions.

Journal ArticleDOI
TL;DR: In this paper, a set of observations is partitioned into $k$ clusters by optimizing a clustering criterion, and the asymptotic distribution of this clustering criterion may be determined simply in certain cases where the optimal sample partition differs negligibly from the optimal population partition.
Abstract: A set of observations is partitioned into $k$ clusters by optimizing a clustering criterion $W$. The asymptotic distribution of this clustering criterion may be determined simply in certain cases where the optimal sample partition differs negligibly from the optimal population partition. Detailed proofs are given in the one-dimensional case when the clustering criterion to be minimized is within cluster sum of squares. The asymptotic distributions are used to compute approximate significance levels of tests for the presence of clusters, and of tests for bimodality.
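
For the one-dimensional case with the within cluster sum of squares as the criterion $W$, an optimal partition can be found among contiguous splits of the sorted data, which is a known property of this criterion in one dimension. The sketch below handles $k = 2$ by trying every cut point; the function names are illustrative, and the significance levels discussed in the abstract are not computed here.

```python
def within_ss(values):
    """Sum of squared deviations of values about their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_two_clusters(xs):
    """Return (W, left, right) minimizing the within cluster sum of squares.

    Only contiguous splits of the sorted sample are examined; xs needs at
    least two observations.
    """
    s = sorted(xs)
    best = None
    for cut in range(1, len(s)):
        w = within_ss(s[:cut]) + within_ss(s[cut:])
        if best is None or w < best[0]:
            best = (w, s[:cut], s[cut:])
    return best
```

Comparing the minimized $W$ with the total sum of squares of the unsplit sample gives the raw ingredient of a test for the presence of clusters; calibrating that comparison is what the asymptotic distributions in the paper are for.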

Journal ArticleDOI
TL;DR: In this paper, the authors study a class of decision rules based on an adaptive partitioning of a Euclidean observation space, and provide sufficient conditions that a sequence of rules be asymptotically Bayes risk efficient as sample size increases.
Abstract: We study a class of decision rules based on an adaptive partitioning of a Euclidean observation space. The class of partitions has a computationally attractive form, and the related decision rule is invariant under strictly monotone transformations of coordinate axes. We provide sufficient conditions that a sequence of decision rules be asymptotically Bayes risk efficient as sample size increases. The sufficient conditions involve no regularity assumptions on the underlying parent distributions.

Journal ArticleDOI
TL;DR: In this article, an approximate linear model is proposed to allow for deviations from an underlying ideal linear model: if, in standard notation, $Y = A\beta + \varepsilon$ is the ideal model, then $Y = A\beta + r + \varepsilon$, where $|r_i| \leqq M_i$ for $M$ a given vector, is an approximate linear model.
Abstract: An approximate linear model is proposed to allow for deviations from an underlying ideal linear model as follows: If, in standard notation, $Y = A\beta + \varepsilon$ is the ideal model then $Y = A\beta + r + \varepsilon$ where $|r_i| \leqq M_i$ for $M$ a given vector is an approximate linear model. The problem solved here is that of finding a linear estimate of a single linear function of $\beta$ which minimaxes mean square error in the approximate model. The estimate obtained may be the standard one from the ideal model, but in general it is not. The estimate is calculated as a solution to a set of nonlinear equations (generalizing the usual normal equations) and an algorithm is given for obtaining the solution.

Journal ArticleDOI
TL;DR: Assuming only the existence of the third absolute moment, the authors prove a Berry-Esseen bound of order $n^{-\frac{1}{2}}$ for $U$-statistics, concluding a series of investigations by Grams and Serfling, Bickel, and Chan and Wierman.
Abstract: Assuming only the existence of the third absolute moment we prove that $\sup_x |P(\sigma_n^{-1} U_n \leqq x) - \Phi (x)| \leqq C\nu_3\sigma_g^{-3}n^{-\frac{1}{2}}$ where $U_n$ is a $U$-statistic. This concludes a series of investigations on the Berry-Esseen theorem for $U$-statistics by Grams and Serfling, Bickel, and Chan and Wierman.

Journal ArticleDOI
TL;DR: In this article, the standard estimator of life expectancy used in life tables is shown to be asymptotically unbiased and uniformly strongly consistent, and to converge in distribution to a Gaussian process.
Abstract: In the analysis of life tables one biometric function of interest is the life expectancy at age $x, e_x = E\lbrack X - x\mid X > x\rbrack$. Estimation of $e_x$ is considered. The standard estimator used in life tables is shown to be asymptotically unbiased and uniformly strongly consistent, and to converge in distribution to a Gaussian process. The connections between the estimator studied in this article and that used in reliability theory are illustrated.
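
A natural empirical counterpart of $e_x = E\lbrack X - x\mid X > x\rbrack$ averages the remaining lifetimes of the sample members who survive past age $x$. The sketch below assumes complete (uncensored) lifetimes; whether this coincides exactly with the life-table estimator analyzed in the paper is an assumption, and the function name is illustrative.

```python
def life_expectancy_at(x, lifetimes):
    """Average remaining lifetime among observations exceeding age x.

    Returns None when no observation survives past x, since the conditional
    expectation has no empirical counterpart in that case.
    """
    survivors = [t for t in lifetimes if t > x]
    if not survivors:
        return None
    return sum(t - x for t in survivors) / len(survivors)
```

Evaluating the function over a grid of ages gives an estimated life expectancy curve, which is the process-valued object whose limiting behavior the abstract describes.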

Journal ArticleDOI
TL;DR: The history of mathematical statistics in the United States prior to 1885 is reviewed, with emphasis upon the works of Robert Adrain, Benjamin and Charles Peirce, Simon Newcomb, and Erastus De Forest.
Abstract: The history of mathematical statistics in the United States prior to 1885 is reviewed, with emphasis upon the works of Robert Adrain, Benjamin and Charles Peirce, Simon Newcomb, and Erastus De Forest. While the period before 1850 produced little of substance, the years from 1850 to 1885 saw such innovations as an outlier rejection procedure, randomized design of experiments, elicitation of personal probabilities, kernel estimation of density functions, an anticipation of sufficiency, a runs test for fit, a Monte Carlo study, optimal linear smoothing, and the fitting of gamma distributions by the method of moments. Reasons for the rapid acceleration in the growth of the field are explored.

Journal ArticleDOI
TL;DR: The exact noncentral distributions of matrix variates and latent roots derived from normal samples involve hypergeometric functions of matrix argument, which are of very limited value for computation and inference; this paper reviews recent asymptotic results for large sample sizes or large population latent roots, which have so far proved to be much more useful.
Abstract: The exact noncentral distributions of matrix variates and latent roots derived from normal samples involve hypergeometric functions of matrix argument. These functions can be defined as power series, by integral representations, or as solutions of differential equations, and there is no doubt that these mathematical characterizations have been a unifying influence in multivariate noncentral distribution theory, at least from an analytic point of view. From a computational and inference point of view, however, the hypergeometric functions are themselves of very limited value due primarily to the many difficulties involved in evaluating them numerically and consequently in studying the effects of population parameters on the distributions. Asymptotic results for large sample sizes or large population latent roots have so far proved to be much more useful for such problems. The purpose of this paper is to review some of the recent results obtained in these areas.

Journal ArticleDOI
TL;DR: In this article, convergence with probability one of a recursive stochastic approximation algorithm is considered, and some extensions of previous results for the Robbins-Monro and Kiefer-Wolfowitz procedures are given.
Abstract: Convergence with probability one of a recursive stochastic approximation algorithm is considered. Some extensions of previous results for the Robbins-Monro and the Kiefer-Wolfowitz procedures are given. An important feature of the approach taken here is that the convergence analysis can be directly extended to more complex algorithms.
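
For readers unfamiliar with the Robbins-Monro procedure named in the abstract, the recursion can be sketched as $x_{n+1} = x_n - a_n (Y_n - \text{target})$, where $Y_n$ is a noisy observation of a function at $x_n$. The code assumes step sizes $a_n = c/n$, a common choice satisfying the usual summability conditions, and a hypothetical noisy linear function for the demonstration; it illustrates the classical procedure only, not the extensions proved in the paper.

```python
import random


def robbins_monro(noisy_f, x0, target=0.0, steps=5000, c=1.0):
    """Iterate x <- x - (c / n) * (noisy_f(x) - target) for n = 1..steps."""
    x = x0
    for n in range(1, steps + 1):
        x -= (c / n) * (noisy_f(x) - target)
    return x


def make_noisy_line(slope, root, sigma, seed=0):
    """Hypothetical test function: slope * (x - root) plus Gaussian noise."""
    rng = random.Random(seed)
    return lambda x: slope * (x - root) + rng.gauss(0.0, sigma)
```

With `make_noisy_line(2.0, 3.0, 1.0)` the iterates should settle near the root at 3; the Kiefer-Wolfowitz procedure follows the same pattern but replaces the noisy function value with a finite-difference estimate of a gradient.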

Journal ArticleDOI
TL;DR: In this article, a nonparametric minimum Hellinger distance estimator of location is introduced and shown to be asymptotically efficient at every symmetric density with finite Fisher information.
Abstract: A nonparametric minimum Hellinger distance estimator of location is introduced and shown to be asymptotically efficient at every symmetric density with finite Fisher information. Under small, possibly asymmetric, perturbations in such a density, the estimator is asymptotically robust in a technical sense which extends Hajek's concept of "regularity." A numerical example illustrates the computational feasibility of the estimator and its resistance to an arbitrary single outlier.