
Showing papers in "Annals of Statistics in 1981"


Journal ArticleDOI
TL;DR: In this article, an unbiased estimate of risk is obtained for an arbitrary estimate, and certain special classes of estimates are then discussed, such as smoothing by using moving averages and trimmed analogs of the James-Stein estimate.
Abstract: Estimation of the means of independent normal random variables is considered, using sum of squared errors as loss. An unbiased estimate of risk is obtained for an arbitrary estimate, and certain special classes of estimates are then discussed. The results are applied to smoothing by use of moving averages and to trimmed analogs of the James-Stein estimate. A suggestion is made for calculating approximate confidence sets for the mean vector centered at an arbitrary estimate.

2,866 citations
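A minimal numerical sketch of the idea (not code from the paper): for the James-Stein estimator with unit variances, the unbiased risk estimate reduces to $p - (p - 2)^2/\|x\|^2$, which can be compared with the realized squared error. The simulated means and all names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 20
theta = np.linspace(-1.0, 1.0, p)    # illustrative true means
x = rng.normal(theta, 1.0)           # X_i ~ N(theta_i, 1)

# James-Stein estimate: shrink x toward the origin.
js = (1.0 - (p - 2) / np.sum(x**2)) * x

# Unbiased risk estimate for x + g(x): p + 2*div g(x) + ||g(x)||^2;
# for the James-Stein choice of g it reduces to:
sure = p - (p - 2)**2 / np.sum(x**2)

print("realized squared error:", np.sum((js - theta)**2))
print("unbiased risk estimate:", sure)    # the raw estimate x has risk p
```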


Journal ArticleDOI
TL;DR: Efron's "bootstrap" method of distribution approximation is shown to be asymptotically valid in a large number of situations, including $t$-statistics, the empirical and quantile processes, and von Mises functionals as discussed by the authors.
Abstract: Efron's "bootstrap" method of distribution approximation is shown to be asymptotically valid in a large number of situations, including $t$-statistics, the empirical and quantile processes, and von Mises functionals. Some counter-examples are also given, to show that the approximation does not always succeed.

1,635 citations
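One of the covered cases in miniature: a bootstrap-$t$ interval for a mean, resampling with replacement and recentring at the observed mean. The sample size, population, and replication count are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(size=50)         # a skewed sample; the mean is the target
n, B = x.size, 2000
xbar, s = x.mean(), x.std(ddof=1)

# Bootstrap the t-statistic: in each replication the observed mean
# plays the role of the true mean.
t_star = np.empty(B)
for b in range(B):
    xb = rng.choice(x, size=n, replace=True)
    t_star[b] = (xb.mean() - xbar) / (xb.std(ddof=1) / np.sqrt(n))

# Bootstrap-t 95% confidence interval for the mean.
q_lo, q_hi = np.quantile(t_star, [0.025, 0.975])
print("CI:", (xbar - q_hi * s / np.sqrt(n), xbar - q_lo * s / np.sqrt(n)))
```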


Journal ArticleDOI
TL;DR: In this paper, it was shown that the natural jackknife variance estimate tends always to be biased upwards, a theorem to this effect being proved for the natural jackknife estimate of $\operatorname{Var} S(X_1, X_2, \cdots, X_{n-1})$ based on $X_1, X_2, \cdots, X_n$, where $S$ is a symmetric function of i.i.d. random variables.
Abstract: Tukey's jackknife estimate of variance for a statistic $S(X_1, X_2, \cdots, X_n)$ which is a symmetric function of i.i.d. random variables $X_i$, is investigated using an ANOVA-like decomposition of $S$. It is shown that the jackknife variance estimate tends always to be biased upwards, a theorem to this effect being proved for the natural jackknife estimate of $\operatorname{Var} S(X_1, X_2, \cdots, X_{n-1})$ based on $X_1, X_2, \cdots, X_n$.

1,409 citations
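A small sketch of the estimate under study, for an i.i.d. sample and a symmetric statistic; in this form it targets $\operatorname{Var} S(X_1, \cdots, X_{n-1})$. The data and statistic below are illustrative.

```python
import numpy as np

def jackknife_variance(x, stat):
    """Jackknife estimate of Var S(X_1,...,X_{n-1}) from n observations:
    the sum over i of (S_(i) - S_bar)^2, where S_(i) omits X_i."""
    n = x.size
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    return np.sum((loo - loo.mean())**2)

rng = np.random.default_rng(2)
x = rng.normal(size=30)
# For the sample mean the estimate is exactly unbiased for 1/(n-1) here;
# the paper's theorem says that in general it is biased upwards.
print(jackknife_variance(x, np.mean))    # compare with 1/29 ~ 0.0345
```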


Journal ArticleDOI
TL;DR: In this article, the authors developed diagnostic measures to aid the analyst in detecting such observations and quantifying their effect on various aspects of the maximum likelihood fit of a logistic regression model.
Abstract: A maximum likelihood fit of a logistic regression model (and other similar models) is extremely sensitive to outlying responses and extreme points in the design space. We develop diagnostic measures to aid the analyst in detecting such observations and in quantifying their effect on various aspects of the maximum likelihood fit. The elements of the fitting process which constitute the usual output (parameter estimates, standard errors, residuals, etc.) will be used for this purpose. With a properly designed computing package for fitting the usual maximum-likelihood model, the diagnostics are essentially "free for the asking." In particular, good data analysis for logistic regression models need not be expensive or time-consuming.

1,216 citations
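A rough sketch of where such diagnostics come from, assuming a logistic model fit by iteratively reweighted least squares: leverages (the diagonal of the weighted hat matrix) and Pearson residuals fall out of the usual fitting output. The simulated design and all names are illustrative, and this is only a fragment of the paper's toolkit.

```python
import numpy as np

def logistic_irls(X, y, iters=25):
    """Fit a logistic regression by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        W = p * (1.0 - p)                     # IRLS weights
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return beta, p, p * (1.0 - p)

rng = np.random.default_rng(3)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(0.5 + X[:, 1])))).astype(float)

beta, p, W = logistic_irls(X, y)
# Leverages: diagonal of H = W^{1/2} X (X'WX)^{-1} X' W^{1/2}.
A = np.linalg.inv(X.T @ (W[:, None] * X))
h = W * np.einsum('ij,jk,ik->i', X, A, X)
# Pearson residuals; large |r| or large h flags points worth a look.
r = (y - p) / np.sqrt(W)
print("max leverage:", h.max(), " max |Pearson residual|:", np.abs(r).max())
```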


Journal ArticleDOI
TL;DR: The Bayesian bootstrap, as discussed by the authors, is the Bayesian analogue of the bootstrap: instead of simulating the sampling distribution of a statistic estimating a parameter, it simulates the posterior distribution of the parameter.
Abstract: The Bayesian bootstrap is the Bayesian analogue of the bootstrap. Instead of simulating the sampling distribution of a statistic estimating a parameter, the Bayesian bootstrap simulates the posterior distribution of the parameter; operationally and inferentially the methods are quite similar. Because both methods of drawing inferences are based on somewhat peculiar model assumptions and the resulting inferences are generally sensitive to these assumptions, neither method should be applied without some consideration of the reasonableness of these model assumptions. In this sense, neither method is a true bootstrap procedure yielding inferences unaided by external assumptions.

1,005 citations
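Operationally, each Bayesian bootstrap replication reweights the observed values by a flat Dirichlet draw and recomputes the statistic; a minimal sketch with illustrative data and settings:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.gamma(2.0, size=40)                 # observed data (illustrative)
B = 5000

# Each replication draws Dirichlet(1,...,1) weights over the observed
# values; the weighted statistics approximate the parameter's posterior.
w = rng.dirichlet(np.ones(x.size), size=B)  # B x n probability vectors
post_mean = w @ x                           # posterior draws of the mean

print("posterior mean:", post_mean.mean())
print("95% interval:", np.quantile(post_mean, [0.025, 0.975]))
```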


Journal ArticleDOI
TL;DR: In this article, the regression and correlation models are considered; it is shown that the bootstrap approximation to the distribution of the least squares estimates is valid, and some error bounds are given.
Abstract: The regression and correlation models are considered. It is shown that the bootstrap approximation to the distribution of the least squares estimates is valid, and some error bounds are given.

893 citations
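A sketch of the regression-model case, in which residuals rather than observations are resampled and the fit is recomputed; the design, error law, and replication count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
resid -= resid.mean()                        # centre the residuals

# Residual (regression-model) bootstrap: resample residuals, rebuild y,
# refit, and use the refits to approximate the law of the LS estimates.
B = 2000
boot = np.empty((B, 2))
for b in range(B):
    y_star = X @ beta_hat + rng.choice(resid, size=n, replace=True)
    boot[b], *_ = np.linalg.lstsq(X, y_star, rcond=None)

print("bootstrap standard errors:", boot.std(axis=0, ddof=1))
```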


Journal ArticleDOI
TL;DR: In this paper, it was shown that the bootstrap approximation of the distribution of the standardized sample mean is asymptotically more accurate than approximation by the limiting normal distribution.
Abstract: In the non-lattice case it is shown that the bootstrap approximation of the distribution of the standardized sample mean is asymptotically more accurate than approximation by the limiting normal distribution. The exact convergence rate of the bootstrap approximation of the distributions of sample quantiles is obtained. A few other convergence rates regarding the bootstrap method are also studied.

765 citations
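A small simulation sketch of the comparison, assuming a skewed population: the bootstrap and normal approximations to the distribution of the standardized mean are lined up against a Monte Carlo stand-in for the true sampling distribution at a few quantiles.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, B, M = 20, 2000, 2000
mu = 1.0                                     # mean of the Exp(1) population

# "True" sampling distribution of the standardized mean, by simulation.
t_true = np.array([(s.mean() - mu) / (s.std(ddof=1) / np.sqrt(n))
                   for s in rng.exponential(size=(M, n))])

# One observed sample and its bootstrap approximation.
x = rng.exponential(size=n)
t_boot = np.array([(xb.mean() - x.mean()) / (xb.std(ddof=1) / np.sqrt(n))
                   for xb in rng.choice(x, size=(B, n))])

# The bootstrap tracks the skewness the normal approximation misses.
for q in (0.05, 0.10, 0.90, 0.95):
    print(q, np.quantile(t_true, q), np.quantile(t_boot, q),
          stats.norm.ppf(q))
```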


Journal ArticleDOI
TL;DR: In this paper, it was shown that the copula of a pair of random variables $X, Y$ is invariant under a.s. strictly increasing transformations of $X$ and $Y$, and that any property of the joint distribution function of the pair which is invariant under such transformations is solely a function of their copula.
Abstract: In 1959 A. Renyi proposed a set of axioms for a measure of dependence for pairs of random variables. In the same year A. Sklar introduced the general notion of a copula. This is a function which links an $n$-dimensional distribution function to its one-dimensional margins and is itself a continuous distribution function on the unit $n$-cube, with uniform margins. We show that the copula of a pair of random variables $X, Y$ is invariant under a.s. strictly increasing transformations of $X$ and $Y$, and that any property of the joint distribution function of $X$ and $Y$ which is invariant under such transformations is solely a function of their copula. Exploiting these facts, we use copulas to define several natural nonparametric measures of dependence for pairs of random variables. We show that these measures satisfy reasonable modifications of Renyi's conditions and compare them to various known measures of dependence, e.g., the correlation coefficient and Spearman's $\rho$.

610 citations
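A minimal illustration of the invariance at work for one rank-based measure: Spearman's $\rho$ depends on the sample only through ranks, hence only through the (empirical) copula, unlike the Pearson correlation. The data are illustrative.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the ranks, a
    copula-based measure of dependence."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(7)
x = rng.normal(size=500)
y = x + 0.5 * rng.normal(size=500)

# Strictly increasing transformations of the margins leave the copula,
# and hence any copula-based measure, unchanged.
print(spearman_rho(x, y))
print(spearman_rho(np.exp(x), y**3))     # same value up to ties
```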


Journal ArticleDOI
TL;DR: For a linear regression model, the necessary and sufficient condition for the asymptotic consistency of the least squares estimator is known, as mentioned in this paper; an analogous condition for the nonlinear model is proved to be necessary for the existence of any weakly consistent estimator, including the least squares estimator, and sufficient for the strong consistency of the nonlinear least squares estimator when the parameter space is finite.
Abstract: For a linear regression model, the necessary and sufficient condition for the asymptotic consistency of the least squares estimator is known. An analogous condition for the nonlinear model is considered in this paper. The condition is proved to be necessary for the existence of any weakly consistent estimator, including the least squares estimator. It is also sufficient for the strong consistency of the nonlinear least squares estimator if the parameter space is finite. For an arbitrary compact parameter space, its sufficiency for strong consistency is proved under additional conditions in a sense weaker than previously assumed. The proof involves a novel use of the strong law of large numbers in $C(S)$. Asymptotic normality is also established.

521 citations


Journal ArticleDOI
TL;DR: In this article, a random sample is divided into the $k$ clusters that minimise the within cluster sum of squares, and conditions are found that ensure the almost sure convergence, as the sample size increases, of the set of means of the k$ clusters.
Abstract: A random sample is divided into the $k$ clusters that minimise the within cluster sum of squares. Conditions are found that ensure the almost sure convergence, as the sample size increases, of the set of means of the $k$ clusters. The result is proved for a more general clustering criterion.

490 citations
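A one-dimensional sketch of the criterion, using Lloyd's algorithm to (locally) minimise the within cluster sum of squares; the mixture data and all settings are illustrative.

```python
import numpy as np

def kmeans1d(x, k, iters=50, seed=0):
    """Lloyd's algorithm in one dimension: alternately assign points to
    the nearest centre and recompute centres as cluster means, which
    locally minimises the within cluster sum of squares."""
    rng = np.random.default_rng(seed)
    centres = np.sort(rng.choice(x, size=k, replace=False))
    for _ in range(iters):
        labels = np.abs(x[:, None] - centres[None, :]).argmin(axis=1)
        centres = np.array([x[labels == j].mean() if np.any(labels == j)
                            else centres[j] for j in range(k)])
    return np.sort(centres)

rng = np.random.default_rng(8)
# The set of cluster means should settle down as the sample size grows.
for n in (100, 1000, 10000):
    x = np.concatenate([rng.normal(-2, 1, n), rng.normal(2, 1, n)])
    print(n, kmeans1d(x, 2))
```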


Journal ArticleDOI
TL;DR: In this article, strong consistency and asymptotic normality were established for the maximum partial likelihood estimate of the regression parameter in Cox's regression model, and estimates were derived for the underlying cumulative hazard function and survival distribution.
Abstract: Strong consistency and asymptotic normality are established for the maximum partial likelihood estimate of the regression parameter in Cox's regression model. Estimates are also derived for the underlying cumulative hazard function and survival distribution. We establish the asymptotic normality of these estimates and calculate the limiting variances.
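A minimal sketch of the log partial likelihood being maximized, assuming a scalar covariate and no tied event times; the data-generating choices are illustrative.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_partial_likelihood(beta, t, d, z):
    """Cox log partial likelihood (scalar covariate, no ties): sum over
    events of beta*z_i - log sum_{j in risk set at t_i} exp(beta*z_j)."""
    order = np.argsort(t)
    t, d, z = t[order], d[order], z[order]
    eta = beta * z
    # With times sorted ascending, the risk-set sums are reverse cumsums.
    log_risk = np.log(np.cumsum(np.exp(eta)[::-1])[::-1])
    return -np.sum(d * (eta - log_risk))

rng = np.random.default_rng(9)
n, beta_true = 300, 0.7
z = rng.normal(size=n)
t_event = rng.exponential(np.exp(-beta_true * z))   # hazard exp(beta*z)
t_cens = rng.exponential(2.0, size=n)
t = np.minimum(t_event, t_cens)
d = (t_event <= t_cens).astype(float)               # 1 = observed event

fit = minimize_scalar(neg_log_partial_likelihood, args=(t, d, z),
                      bounds=(-3, 3), method="bounded")
print("maximum partial likelihood estimate:", fit.x)
```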

Journal ArticleDOI
TL;DR: In this paper, a new estimator of the parameter vector in a linear regression model is proposed for observations that are randomly censored on the right when the error distribution is unknown, and sufficient conditions are given under which this estimator is mean square consistent and asymptotically normal.
Abstract: This paper proposes a new estimator of the parameter vector in a linear regression model when the observations are randomly censored on the right and when the error distribution is unknown. This estimator is explicitly defined and easily computable. The paper contains sufficient conditions under which this estimator is mean square consistent and asymptotically normal. A numerical example is given.

Journal ArticleDOI
TL;DR: In this paper, several signal plus noise or convolution models are examined which exhibit such behavior and satisfy the regularity conditions of the asymptotic theory, and a numerical comparison of asymptotic variances suggests that a pseudo maximum likelihood estimate of the signal parameter is uniformly more efficient than estimators that have been advanced by previous authors.
Abstract: Pseudo maximum likelihood estimation easily extends to $k$ parameter models, and is of interest in problems in which the likelihood surface is ill-behaved in higher dimensions but well-behaved in lower dimensions. Several signal plus noise or convolution models are examined which exhibit such behavior and satisfy the regularity conditions of the asymptotic theory. For specific models, a numerical comparison of asymptotic variances suggests that a pseudo maximum likelihood estimate of the signal parameter is uniformly more efficient than estimators that have been advanced by previous authors. A number of other potential applications are noted.

Journal ArticleDOI
TL;DR: The most famous priority dispute in the history of statistics is that between Gauss and Legendre, over the discovery of the method of least squares, and an attempt is made to evaluate Gauss's claim as discussed by the authors.
Abstract: The most famous priority dispute in the history of statistics is that between Gauss and Legendre, over the discovery of the method of least squares. New evidence, both documentary and statistical, is discussed, and an attempt is made to evaluate Gauss's claim. It is argued (though not conclusively) that Gauss probably possessed the method well before Legendre, but that he was unsuccessful in communicating it to his contemporaries. Data on the French meridian arc are presented that could, conceivably, permit a definitive verification of the claim.

Journal ArticleDOI
TL;DR: In this article, a class of probability density estimates can be obtained by penalizing the likelihood by a functional which depends on the roughness of the logarithm of the density.
Abstract: A class of probability density estimates can be obtained by penalizing the likelihood by a functional which depends on the roughness of the logarithm of the density. The limiting case of the estimates as the amount of smoothing increases has a natural form which makes the method attractive for data analysis and which provides a rationale for a particular choice of roughness penalty. The estimates are shown to be the solution of an unconstrained convex optimization problem, and mild natural conditions are given for them to exist. Rates of consistency in various norms and conditions for asymptotic normality and approximation by a Gaussian process are given, thus breaking new ground in the theory of maximum penalized likelihood density estimation.
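A heavily discretized sketch of the idea only (not the paper's estimator): represent $g = \log f$ on a grid, penalize the squared second differences of $g$, and minimize the resulting convex objective. The grid, penalty weight, and optimizer are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(10)
x = rng.normal(size=200)

grid = np.linspace(-4.0, 4.0, 41)     # grid carrying the values of g = log f
h = grid[1] - grid[0]
idx = np.clip(np.searchsorted(grid, x), 0, grid.size - 1)
lam = 0.01                            # roughness penalty weight

def objective(g):
    # Negative penalized log likelihood: -mean g(x_i) + log Z(g)
    # + lam * (discretized integral of (g'')^2); convex in g.
    logZ = np.log(np.sum(np.exp(g)) * h)
    rough = np.sum(np.diff(g, 2)**2) / h**3
    return -np.mean(g[idx]) + logZ + lam * rough

res = minimize(objective, np.zeros(grid.size), method="L-BFGS-B")
f = np.exp(res.x) / (np.sum(np.exp(res.x)) * h)   # normalized density
print("integrates to:", np.sum(f) * h, " mode near:", grid[f.argmax()])
```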

Journal ArticleDOI
TL;DR: In this paper, the asymptotic normality of both linear and nonlinear statistics and the consistency of the variance estimators obtained using the linearization, jackknife and balanced repeated replication (BRR) methods in stratified samples are established.
Abstract: The asymptotic normality of both linear and nonlinear statistics and the consistency of the variance estimators obtained using the linearization, jackknife and balanced repeated replication (BRR) methods in stratified samples are established. The results are obtained as $L \rightarrow \infty$ within the context of a sequence of finite populations $\{\Pi_L\}$ with $L$ strata in $\Pi_L$ and are valid for any stratified multistage design in which the primary sampling units (psu's) are selected with replacement and in which independent subsamples are taken within those psu's selected more than once. In addition, some exact analytical results on the bias and stability of these alternative variance estimators in the case of ratio estimation are obtained for small $L$ under a general linear regression model.
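A sketch of one of the three methods in the simplest case: the delete-one-PSU jackknife for a combined ratio estimator under with-replacement PSU sampling, where dropping PSU $i$ reweights the rest of its stratum by $n_h/(n_h - 1)$. The strata and PSU totals below are illustrative.

```python
import numpy as np

def jackknife_ratio_var(y, x):
    """Delete-one-PSU jackknife for theta = sum(y totals)/sum(x totals);
    y and x are lists of 1-D arrays of PSU totals, one array per stratum."""
    Y = sum(a.sum() for a in y)
    X = sum(a.sum() for a in x)
    theta = Y / X
    v = 0.0
    for yh, xh in zip(y, x):
        nh = len(yh)
        f = nh / (nh - 1)
        # theta recomputed with PSU i dropped, stratum h reweighted by f.
        th_i = np.array([(Y - yh.sum() + f * (yh.sum() - yh[i])) /
                         (X - xh.sum() + f * (xh.sum() - xh[i]))
                         for i in range(nh)])
        v += (nh - 1) / nh * np.sum((th_i - theta)**2)
    return theta, v

rng = np.random.default_rng(11)
y = [rng.normal(10, 2, 6), rng.normal(20, 3, 8)]   # 2 strata of PSU totals
x = [rng.normal(5, 1, 6), rng.normal(8, 1, 8)]
print(jackknife_ratio_var(y, x))
```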

Journal ArticleDOI
TL;DR: In this article, a multivariate "errors in variables" regression model is considered, in which the unknown mean vectors of the observations, rather than the observations themselves, are assumed to follow a linear relation, the errors being i.i.d. random vectors with a common covariance matrix.
Abstract: In a multivariate "errors in variables" regression model, the unknown mean vectors $\mathbf{u}_{1i}: p \times 1, \mathbf{u}_{2i}: r \times 1$ of the vector observations $\mathbf{x}_{1i}, \mathbf{x}_{2i}$, rather than the observations themselves, are assumed to follow the linear relation: $\mathbf{u}_{2i} = \alpha + B\mathbf{u}_{1i}, i = 1,2,\cdots, n$. It is further assumed that the random errors $\mathbf{e}_i = \mathbf{x}_i - \mathbf{u}_i, \mathbf{x}'_i = (\mathbf{x}'_{1i}, \mathbf{x}'_{2i}), \mathbf{u}'_i = (\mathbf{u}'_{1i}, \mathbf{u}'_{2i})$, are i.i.d. random vectors with common covariance matrix $\Sigma_e$. Such a model is a generalization of the univariate $(r = 1)$ "errors in variables" regression model which has been of interest to statisticians for over a century. In the present paper, it is shown that when $\Sigma_e = \sigma^2I_{p+r}$, a wide class of least squares approaches to estimation of the intercept vector $\alpha$ and slope matrix $B$ all lead to identical estimators $\hat{\alpha}$ and $\hat{B}$ of these respective parameters, and that $\hat{\alpha}$ and $\hat{B}$ are also the maximum likelihood estimators (MLE's) of $\alpha$ and $B$ under the assumption of normally distributed errors $\mathbf{e}_i$. Formulas for $\hat{\alpha}, \hat{B}$ and also the MLE's $\hat{U}_1$ and $\hat{\sigma}^2$ of the parameters $U_1 = (\mathbf{u}_{11}, \cdots, \mathbf{u}_{1n})$ and $\sigma^2$ are given. Under reasonable assumptions concerning the unknown sequence $\{\mathbf{u}_{1i}, i = 1,2,\cdots\}, \hat{\alpha}, \hat{B}$ and $r^{-1}(r + p)\hat{\sigma}^2$ are shown to be strongly (with probability one) consistent estimators of $\alpha, B$ and $\sigma^2$, respectively, as $n \rightarrow \infty$, regardless of the common distribution of the errors $\mathbf{e}_i$. When this common error distribution has finite fourth moments, $\hat{\alpha}, \hat{B}$ and $r^{-1}(r + p)\hat{\sigma}^2$ are also shown to be asymptotically normally distributed. Finally large-sample approximate $100(1 - u)\%$ confidence regions for $\alpha, B$ and $\sigma^2$ are constructed.
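Under the stated assumption $\Sigma_e = \sigma^2I_{p+r}$, the common least squares/ML estimator can be computed by total least squares from the SVD of the centred data matrix; a sketch with $r = 1$ and illustrative dimensions (tied to that isotropic-error assumption):

```python
import numpy as np

rng = np.random.default_rng(12)
n, p = 500, 2
u1 = rng.uniform(-3, 3, size=(n, p))            # true x-part means
alpha, B = 1.0, np.array([2.0, -1.0])           # intercept and slope (r = 1)
u2 = alpha + u1 @ B                             # true y-part means
sigma = 0.3
x1 = u1 + sigma * rng.normal(size=(n, p))       # errors with cov sigma^2 I
x2 = u2 + sigma * rng.normal(size=n)

# Total least squares: the right singular vector of the centred matrix
# [x1, x2] with smallest singular value is normal to the fitted plane.
Z = np.column_stack([x1, x2])
_, _, Vt = np.linalg.svd(Z - Z.mean(axis=0), full_matrices=False)
v = Vt[-1]                                      # normal vector, length p + 1
B_hat = -v[:p] / v[p]
alpha_hat = x2.mean() - x1.mean(axis=0) @ B_hat
print("B_hat:", B_hat, " alpha_hat:", alpha_hat)
```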

Journal ArticleDOI
TL;DR: In this article, the authors consider a heteroscedastic linear model in which the variances are given by a parametric function of the mean responses and a parameter $\theta$, and show that, as long as a reasonable starting estimate of $\theta$ is available, their estimates of $\beta$ are asymptotically equivalent to the natural estimate obtained with known variances.
Abstract: We consider a heteroscedastic linear model in which the variances are given by a parametric function of the mean responses and a parameter $\theta$. We propose robust estimates for the regression parameter $\beta$ and show that, as long as a reasonable starting estimate of $\theta$ is available, our estimates of $\beta$ are asymptotically equivalent to the natural estimate obtained with known variances. A particular method for estimating $\theta$ is proposed and shown by Monte-Carlo to work quite well, especially in power and exponential models for the variances. We also briefly discuss a "feedback" estimate of $\beta$.
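A simplified stand-in for the kind of scheme described, iterating between estimating $\beta$ by weighted least squares and estimating $\theta$ in a power-of-the-mean variance model; this sketch uses ordinary rather than robust steps and is not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(13)
n = 400
X = np.column_stack([np.ones(n), rng.uniform(1, 5, n)])
beta_true, theta_true = np.array([1.0, 2.0]), 1.5
mu = X @ beta_true
y = mu + 0.2 * np.sqrt(mu**theta_true) * rng.normal(size=n)

beta = np.linalg.lstsq(X, y, rcond=None)[0]     # start from OLS
theta = 1.0
for _ in range(10):
    mu_hat = X @ beta
    # Estimate theta by regressing log squared residuals on log mu_hat.
    r2 = np.log((y - mu_hat)**2 + 1e-12)
    theta = np.polyfit(np.log(mu_hat), r2, 1)[0]
    # Weighted least squares with weights 1 / mu_hat^theta.
    w = 1.0 / mu_hat**theta
    Xw = X * w[:, None]
    beta = np.linalg.solve(Xw.T @ X, Xw.T @ y)
print("beta:", beta, " theta:", theta)
```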

Journal ArticleDOI
TL;DR: In this article, the authors advance a thesis in three parts: (1) that inference is a procedure whereby one passes from a population (or sample) to a new individual; (2) that this connection can be established using de Finetti's idea of exchangeability or Fisher's concept of a subpopulation; and (3) that in making the connection use must be made of the appropriate probability.
Abstract: This paper is concerned with basic problems of statistical inference. The thesis is in three parts: (1) that inference is a procedure whereby one passes from a population (or sample) to a new individual; (2) that this connection can be established using de Finetti's idea of exchangeability or Fisher's concept of a subpopulation; and (3) in making the connection use must be made of the appropriate probability. These three principles are used in a variety of situations and the topics discussed include analysis of variance and covariance, contingency tables, and calibration. Some comments on randomization are also included.

Journal ArticleDOI
TL;DR: In this paper, a stochastic process is defined whose sample paths may be assumed to be either increasing hazard rates or decreasing hazard rates by properly choosing the parameter functions of the process.
Abstract: It is suggested that problems in a reliability context may be handled by a Bayesian non-parametric approach. A stochastic process is defined whose sample paths may be assumed to be either increasing hazard rates or decreasing hazard rates by properly choosing the parameter functions of the process. The posterior distributions of the hazard rates are derived for both exact and censored data. Bayes estimates of hazard rates, c.d.f.'s, densities, and means are found under squared error type loss functions. Some simulation is done and estimates graphed to better understand the estimators. Finally, estimates of the c.d.f. from some data in a paper by Kaplan and Meier are constructed.

Journal ArticleDOI
TL;DR: In this article, the weak and strong Bayes risk consistency of the corresponding nonparametric discrimination rules is proved for all possible distributions of the data, and sufficient conditions are given for large classes of kernel estimates and nearest neighbor estimates.
Abstract: Let $(X, Y), (X_1, Y_1), \cdots, (X_n, Y_n)$ be independent identically distributed random vectors from $R^d \times R$, and let $E(|Y|^p) < \infty$ for some $p \geq 1$. We wish to estimate the regression function $m(x) = E(Y \mid X = x)$ by $m_n(x)$, a function of $x$ and $(X_1, Y_1), \cdots, (X_n, Y_n)$. For large classes of kernel estimates and nearest neighbor estimates, sufficient conditions are given for $E\{|m_n(x) - m(x)|^p\} \rightarrow 0$ as $n \rightarrow \infty$, almost all $x$. No additional conditions are imposed on the distribution of $(X, Y)$. As a by-product, just assuming the boundedness of $Y$, the almost sure convergence to 0 of $E\{|m_n(X) - m(X)\| X_1, Y_1, \cdots, X_n, Y_n\}$ is established for the same estimates. Finally, the weak and strong Bayes risk consistency of the corresponding nonparametric discrimination rules is proved for all possible distributions of the data.
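A minimal sketch of one estimate in the class, the Nadaraya-Watson kernel estimate with a Gaussian kernel; consistency requires $h \rightarrow 0$ with $nh \rightarrow \infty$. The data and bandwidth are illustrative.

```python
import numpy as np

def kernel_regression(x0, x, y, h):
    """Nadaraya-Watson kernel estimate of m(x0) = E(Y | X = x0)
    with a Gaussian kernel and bandwidth h."""
    w = np.exp(-0.5 * ((x0[:, None] - x[None, :]) / h)**2)
    return (w @ y) / w.sum(axis=1)

rng = np.random.default_rng(14)
n = 500
x = rng.uniform(-2, 2, n)
y = np.sin(2 * x) + 0.3 * rng.normal(size=n)   # m(x) = sin(2x)

x0 = np.linspace(-2, 2, 9)
print(np.round(kernel_regression(x0, x, y, h=0.2), 2))
print(np.round(np.sin(2 * x0), 2))             # target values
```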

Journal ArticleDOI
TL;DR: In this paper, the results of Wald on the consistency of the maximum likelihood estimate are extended and applications are made to mixture distributions and to clustering when the number of clusters is not known.
Abstract: The results of Wald on the consistency of the maximum likelihood estimate are extended. Applications are made to mixture distributions and to clustering when the number of clusters is not known.

Journal ArticleDOI
TL;DR: In this article, it was shown that if the interval is small (approximately two standard deviations wide) then the Bayes rule against a two point prior is the unique minimax estimator under squared error loss.
Abstract: The problem of estimating a normal mean has received much attention in recent years. If one assumes, however, that the true mean lies in a bounded interval, the problem changes drastically. In this paper we show that if the interval is small (approximately two standard deviations wide) then the Bayes rule against a two point prior is the unique minimax estimator under squared error loss. For somewhat wider intervals we also derive sufficient conditions for minimaxity of the Bayes rule against a three point prior.
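For the two-point prior with equal mass at $\pm m$ and $X \sim N(\theta, 1)$, the posterior mean works out to $\delta(x) = m \tanh(mx)$; a quick Monte Carlo sketch of its risk across the interval, with illustrative settings:

```python
import numpy as np

m = 1.0                      # |theta| <= m: an interval of width two sds
# Bayes rule against the two-point prior on {-m, +m}: the posterior mean.
delta = lambda x: m * np.tanh(m * x)

rng = np.random.default_rng(15)
# Risk R(theta) = E(delta(X) - theta)^2 by Monte Carlo; for small m the
# maximum should sit at the prior support points +/- m (minimaxity).
for theta in np.linspace(-m, m, 5):
    x = theta + rng.normal(size=200_000)
    print(f"theta={theta:+.2f}  risk={np.mean((delta(x) - theta)**2):.4f}")
```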

Journal ArticleDOI
TL;DR: In this paper, the concept of conditional probability distributions is generalized to that of conditional measures, and Bayes theorem is extended to accommodate unbounded priors, and upper and lower expectations and variances induced by such intervals of measures are obtained.
Abstract: Partial prior knowledge is quantified by an interval $I(L, U)$ of $\sigma$-finite prior measures $Q$ satisfying $L(A) \leq Q(A) \leq U(A)$ for all measurable sets $A$, and is interpreted as acceptance of a family of bets. The concept of conditional probability distributions is generalized to that of conditional measures, and Bayes' theorem is extended to accommodate unbounded priors. According to Bayes' theorem, the interval $I(L, U)$ of prior measures is transformed upon observing $X$ into a similar interval $I(L_x, U_x)$ of posterior measures. Upper and lower expectations and variances induced by such intervals of measures are obtained. Under weak regularity conditions, as the amount of data increases, these upper and lower posterior expectations are strongly consistent estimators. The range of posterior expectations of an arbitrary function $b$ on the parameter space is asymptotically $b_N \pm \alpha\sigma_N + o(\sigma_N)$, where $b_N$ and $\sigma^2_N$ are the posterior mean and variance of $b$ induced by the upper prior measure $U$, and where $\alpha$ is a constant determined by the density of $L$ with respect to $U$, reflecting the uncertainty about the prior.

Journal ArticleDOI
TL;DR: In this article, asymptotic procedures for testing certain hypotheses concerning eigenvectors and for constructing confidence regions for eigen vectors are derived under fairly general conditions on the estimates of the matrix whose eigenvector is of interest.
Abstract: Asymptotic procedures are given for testing certain hypotheses concerning eigenvectors and for constructing confidence regions for eigenvectors. These asymptotic procedures are derived under fairly general conditions on the estimates of the matrix whose eigenvectors are of interest. Applications of the general results to principal components analysis and canonical variate analysis are given.

Journal ArticleDOI
TL;DR: In this paper, the authors derived parametric and nonparametric simultaneous upper confidence intervals for all distances from the "best" under the location model, improving upon the results of Bechhofer (1954), Gupta (1956, 1965), Fabian (1962), and Desu (1970).
Abstract: In practice, comparisons with the "best" are often the ones of primary interest. In this paper, parametric and nonparametric simultaneous upper confidence intervals for all distances from the "best" are derived under the location model. Their improvement upon the results of Bechhofer (1954), Gupta (1956, 1965), Fabian (1962), and Desu (1970) in the parametric case is discussed. In the nonparametric case, no comparable confidence statements were available previously.

Journal ArticleDOI
TL;DR: In this article, the minimax risk $\rho(m)$ for estimating a normal mean with quadratic loss subject to $|\theta| \leq m$ is shown to satisfy $\rho(m) = 1 - \pi^2/m^2 + o(m^{-2})$, and estimates which are asymptotically minimax to this order are exhibited.
Abstract: If $X$ is a $N(\theta, 1)$ random variable, let $\rho (m)$ be the minimax risk for estimation with quadratic loss subject to $|\theta| \leq m$. Then $\rho (m) = 1 - \pi^2/m^2 + o(m^{-2})$. We exhibit estimates which are asymptotically minimax to this order as well as approximations to the least favorable prior distributions. The approximate least favorable distributions (correct to order $m^{-2}$) have density $m^{-1} \cos^2 \big(\frac{\pi}{2m} s\big), |s| \leq m$ rather than the naively expected uniform density on $\lbrack -m, m \rbrack$. We also show how our results extend to estimation of a vector mean and give some explicit solutions.
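A quick check (not from the paper) that the approximate least favorable density integrates to one: writing $\cos^2 t = (1 + \cos 2t)/2$, $$\frac{1}{m}\int_{-m}^{m} \cos^2\Big(\frac{\pi s}{2m}\Big)\, ds = \frac{1}{m}\int_{-m}^{m} \frac{1 + \cos(\pi s/m)}{2}\, ds = \frac{1}{m}\Big(m + \frac{m}{2\pi}\big\lbrack \sin(\pi s/m) \big\rbrack^{m}_{-m}\Big) = 1,$$ since $\sin \pi - \sin(-\pi) = 0$.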

Journal ArticleDOI
TL;DR: In this paper, the authors search for designs for which the least squares estimator minimizes appropriate functionals of the dispersion matrix under various correlation models, in particular "nearest neighbor" correlation models.
Abstract: In this paper designs are found which are optimum for various models that include some autocorrelation in the covariance structure $V$. First it is noted that the ordinary least squares estimator is quite robust against small perturbations in $V$ from the uncorrelated case $V_0 = \sigma^2 I$. This "local" argument justifies our use of such estimators and restriction to the class of designs $\mathscr{X}^\ast$ (balanced incomplete block or Latin squares) optimum under $V_0$. Within $\mathscr{X}^\ast$ we search for designs for which the least squares estimator minimizes appropriate functionals of the dispersion matrix under various correlation models $V$. In particular, we consider "nearest neighbor" correlation models in detail. The solutions lead to interesting combinatorial conditions somewhat similar to those encountered in "repeated measurement" designs. Typically, however, the latter need not be BIBD's and require twice as many blocks. For Latin squares, and hypercubes, the conditions are less restrictive than those giving "completeness".

Journal ArticleDOI
TL;DR: In this paper, the authors describe the asymptotic theory of triple sampling as it pertains to the estimation of a mean and obtain limit theorems for the case of the normal distribution.
Abstract: We describe the asymptotic theory of triple sampling as it pertains to the estimation of a mean. We obtain limit theorems for the case of the normal distribution. Our results show that triple sampling combines the simplicity of Stein's double sampling technique with the efficiency of the fully sequential Anscombe-Chow-Robbins procedure.

Journal ArticleDOI
TL;DR: The authors generalized Brunk's result to points at which the regression function does not have positive slope and showed that the norming constants are of order $r^{\alpha/(2\alpha + 1)}$.
Abstract: An estimator for a monotone regression function was proposed by Brunk. He has shown that if the underlying regression function has positive slope at a point, then, based on $r$ observations, the difference of the regression function and its estimate at that point has a nondegenerate limiting distribution if this difference is multiplied by $r^{1/3}$. To understand how the behavior of the regression function at a point influences the asymptotic properties of the estimator at that point, we have generalized Brunk's result to points at which the regression function does not have positive slope. If the first $\alpha - 1$ derivatives of the regression function are zero at a point and the $\alpha$th derivative is positive there, then the norming constants are of order $r^{\alpha/(2\alpha + 1)}$.
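Brunk's estimator is the isotonic least squares fit, computable by the pool-adjacent-violators algorithm; a minimal sketch, where the cubic regression function is an illustrative case with $\alpha = 3$ at the origin (local rate $r^{3/7}$ rather than $r^{1/3}$):

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: the isotonic (monotone nondecreasing)
    least squares fit of y against its index order."""
    level, weight = list(map(float, y)), [1] * len(y)
    i = 0
    while i < len(level) - 1:
        if level[i] > level[i + 1]:
            # Pool violating neighbours into their weighted mean,
            # then step back to re-check against the previous block.
            w = weight[i] + weight[i + 1]
            level[i] = (weight[i] * level[i]
                        + weight[i + 1] * level[i + 1]) / w
            weight[i] = w
            del level[i + 1]
            del weight[i + 1]
            i = max(i - 1, 0)
        else:
            i += 1
    return np.repeat(level, weight)

rng = np.random.default_rng(16)
n = 200
x = np.sort(rng.uniform(-1, 1, n))
y = x**3 + 0.1 * rng.normal(size=n)   # m(x) = x^3: zero slope at x = 0
fit = pava(y)
print(fit[:5], fit[-5:])              # nondecreasing step function
```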