
Showing papers in "Annals of Statistics in 1982"


Journal ArticleDOI
TL;DR: In this article, the Cox regression model for censored survival data is extended to a model where covariate processes have a proportional effect on the intensity process of a multivariate counting process, allowing for complicated censoring patterns and time dependent covariates.
Abstract: The Cox regression model for censored survival data specifies that covariates have a proportional effect on the hazard function of the life-time distribution of an individual. In this paper we discuss how this model can be extended to a model where covariate processes have a proportional effect on the intensity process of a multivariate counting process. This permits a statistical regression analysis of the intensity of a recurrent event allowing for complicated censoring patterns and time dependent covariates. Furthermore, this formulation gives rise to proofs with very simple structure using martingale techniques for the asymptotic properties of the estimators from such a model. Finally an example of a statistical analysis is included.

3,719 citations


Journal ArticleDOI
TL;DR: In this article, it is shown that, under appropriate regularity conditions, the optimal rate of convergence for estimating an order-$m$ derivative of a $p$-times differentiable regression function of a $d$-dimensional variable from a training sample of size $n$ is $n^{-r}$ in $L^q$ norm if $0 < q < \infty$ and $(n^{-1} \log n)^r$ if $q = \infty$, where $r = (p - m)/(2p + d)$.
Abstract: Consider a $p$-times differentiable unknown regression function $\theta$ of a $d$-dimensional measurement variable. Let $T(\theta)$ denote a derivative of $\theta$ of order $m$ and set $r = (p - m)/(2p + d)$. Let $\hat{T}_n$ denote an estimator of $T(\theta)$ based on a training sample of size $n$, and let $\| \hat{T}_n - T(\theta)\|_q$ be the usual $L^q$ norm of the restriction of $\hat{T}_n - T(\theta)$ to a fixed compact set. Under appropriate regularity conditions, it is shown that the optimal rate of convergence for $\| \hat{T}_n - T(\theta)\|_q$ is $n^{-r}$ if $0 < q < \infty$; while $(n^{-1} \log n)^r$ is the optimal rate if $q = \infty$.

1,513 citations
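
To make the rate in the abstract above concrete, the following sketch evaluates the exponent $r = (p - m)/(2p + d)$ and the two stated rates, $n^{-r}$ and $(n^{-1} \log n)^r$. The function names and the input check ($0 \le m < p$, $d \ge 1$) are illustrative assumptions, not part of the paper.

```python
import math
from fractions import Fraction


def rate_exponent(p: int, m: int, d: int) -> Fraction:
    """Exponent r = (p - m) / (2p + d) as defined in the abstract."""
    # Assumption for this sketch: 0 <= m < p and d >= 1, so that r > 0.
    if not (0 <= m < p) or d < 1:
        raise ValueError("expected 0 <= m < p and d >= 1")
    return Fraction(p - m, 2 * p + d)


def optimal_rate(n: int, p: int, m: int, d: int, sup_norm: bool = False) -> float:
    """n^(-r) for 0 < q < infinity, (log(n) / n)^r for q = infinity."""
    r = float(rate_exponent(p, m, d))
    base = math.log(n) / n if sup_norm else 1.0 / n
    return base ** r


if __name__ == "__main__":
    # Estimating the function itself (m = 0) with two derivatives (p = 2):
    # the exponent shrinks as the dimension d grows.
    for d in (1, 2, 10):
        r = rate_exponent(p=2, m=0, d=d)
        print(f"d={d}: r={r}, L^q rate at n=10^6: {optimal_rate(10**6, 2, 0, d):.4g}")
```

For $p = 2$, $m = 0$ the exponent is $2/5$ in one dimension and $1/7$ in ten dimensions, which shows how quickly the attainable rate degrades with $d$.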


Journal ArticleDOI
TL;DR: In this article, Andersen and Gill (hereafter AG) present a stimulating development of asymptotic distribution theory for the Cox regression model with time-dependent covariates, which involves such conditions as $\sigma$-algebra right continuity and predictable, locally bounded, covariate processes.
Abstract: In this issue Andersen and Gill (hereafter AG) present a stimulating development of asymptotic distribution theory for the Cox regression model with time-dependent covariates. They use a counting process formulation for the failure time data and martingale convergence results. This approach involves such conditions as $\sigma$-algebra right continuity and predictable, locally bounded, covariate processes. In this commentary we consider the implications of such assumptions for likelihood factorization and covariate modeling. In particular, it is noted that the partial likelihood function modeled by AG cannot, in general, involve covariate measurements at the random failure times. Some related work by the authors on a partial likelihood function that may involve covariate values at the random failure times is briefly discussed. Assumptions under which the intensity process modeled by AG has a standard "hazard" function interpretation are described and some generalizations of the AG results are mentioned.

1,031 citations


Journal ArticleDOI
TL;DR: In this article, strong consistency and asymptotic normality of least squares estimates in stochastic regression models are established under certain weak assumptions on the stochastic regressors and errors.
Abstract: Strong consistency and asymptotic normality of least squares estimates in stochastic regression models are established under certain weak assumptions on the stochastic regressors and errors. We discuss applications of these results to interval estimation of the regression parameters and to recursive on-line identification and control schemes for linear dynamic systems.

667 citations


Journal ArticleDOI
TL;DR: The normal, Poisson, gamma, binomial, and negative binomial distributions are univariate natural exponential families with quadratic variance functions, meaning that the variance is at most a quadratic function of the mean; only one other such family exists.
Abstract: The normal, Poisson, gamma, binomial, and negative binomial distributions are univariate natural exponential families with quadratic variance functions (the variance is at most a quadratic function of the mean). Only one other such family exists. Much theory is unified for these six natural exponential families by appeal to their quadratic variance property, including infinite divisibility, cumulants, orthogonal polynomials, large deviations, and limits in distribution.

623 citations


Journal ArticleDOI
TL;DR: In this article, an adaptive dependence of the sharpness of the kernels on the underlying density is considered, and it is shown that proportionally varying the bandwidth at the contributing readings lowers the bias to a vanishing fraction of the usual value, and makes for performance seen in well-known estimators that force moment conditions on the kernel (and so sacrifice positivity of the curve estimate).
Abstract: We consider kernel estimation of a smooth density $f$ at a point, but depart from the usual approach in admitting an adaptive dependence of the sharpness of the kernels on the underlying density. Proportionally varying the bandwidths like $f^{-1/2}$ at the contributing readings lowers the bias to a vanishing fraction of the usual value, and makes for performance seen in well-known estimators that force moment conditions on the kernel (and so sacrifice positivity of the curve estimate). Issues of equivariance and variance stabilization are treated.

599 citations
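
The bandwidth rule in the abstract above can be sketched as follows. Since $f$ is unknown in practice, this sketch substitutes a fixed-bandwidth pilot estimate for $f$ when setting each observation's bandwidth proportional to $f^{-1/2}$; the pilot step, the Gaussian kernel, and all function names are assumptions made for illustration rather than details taken from the paper.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def fixed_kde(x: float, data: list, h: float) -> float:
    """Ordinary kernel estimate with one global bandwidth h (the pilot)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


def adaptive_bandwidths(data: list, h: float, pilot_h: float) -> list:
    """Per-observation bandwidths h_i = h * pilot(x_i)^(-1/2).

    The pilot is strictly positive at every observation because each
    point contributes K(0) to its own pilot value.
    """
    return [h / math.sqrt(fixed_kde(xi, data, pilot_h)) for xi in data]


def adaptive_kde(x: float, data: list, bandwidths: list) -> float:
    """Average of kernels, each scaled by its own observation's bandwidth."""
    total = sum(
        gaussian_kernel((x - xi) / hi) / hi for xi, hi in zip(data, bandwidths)
    )
    return total / len(data)


if __name__ == "__main__":
    data = [-2.1, -1.3, -0.8, -0.5, -0.2, 0.0, 0.1, 0.4, 0.9, 1.5, 3.0]
    bw = adaptive_bandwidths(data, h=0.4, pilot_h=0.5)
    print("bandwidth at 0.0:", round(bw[data.index(0.0)], 3))
    print("bandwidth at 3.0:", round(bw[data.index(3.0)], 3))
    print("estimate at 0.0:", round(adaptive_kde(0.0, data, bw), 4))
```

Because every kernel is a proper density, the estimate stays nonnegative and integrates to one, which is the positivity property the abstract contrasts with higher-order kernels.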


Journal ArticleDOI
TL;DR: In this paper, the authors simplify a general heuristic necessary condition of Stein's for adaptive estimation of a Euclidean parameter in the presence of an infinite dimensional shape nuisance parameter and other Euclidean nuisance parameters.
Abstract: We simplify a general heuristic necessary condition of Stein's for adaptive estimation of a Euclidean parameter in the presence of an infinite dimensional shape nuisance parameter and other Euclidean nuisance parameters. We derive sufficient conditions and apply them in the construction of adaptive estimates for the parameters of linear models and multivariate elliptic distributions. We conclude with a review of issues in adaptive estimation.

552 citations


Journal ArticleDOI
TL;DR: The second-order information loss is calculated for Fisher-efficient estimators, and is decomposed into the sum of two non-negative terms: one related to the exponential curvature of the statistical model and the other to the mixture curvature of the estimator.
Abstract: The differential-geometrical framework is given for analyzing statistical problems related to multi-parameter families of distributions. The dualistic structures of the exponential families and curved exponential families are elucidated from the geometrical viewpoint. The duality connected by the Legendre transformation is thus extended to include two kinds of affine connections and two kinds of curvatures. The second-order information loss is calculated for Fisher-efficient estimators, and is decomposed into the sum of two non-negative terms. One is related to the exponential curvature of the statistical model and the other is related to the mixture curvature of the estimator. Only the latter term depends on the estimator, and vanishes for the maximum-likelihood estimator. A set of statistics which recover the second-order information loss are given. The second-order efficiency also is obtained. The differential geometry of the function space of distributions is discussed.

383 citations


Journal ArticleDOI
TL;DR: In this paper, it is shown that maximum likelihood estimation, which often fails when the parameter takes values in an infinite dimensional space, can be salvaged by first maximizing over a constrained subspace of the parameter space and then relaxing the constraint as the sample size grows (Grenander's "method of sieves").
Abstract: Maximum likelihood estimation often fails when the parameter takes values in an infinite dimensional space. For example, the maximum likelihood method cannot be applied to the completely nonparametric estimation of a density function from an $\operatorname{iid}$ sample; the maximum of the likelihood is not attained by any density. In this example, as in many other examples, the parameter space (positive functions with area one) is too big. But the likelihood method can often be salvaged if we first maximize over a constrained subspace of the parameter space and then relax the constraint as the sample size grows. This is Grenander's "method of sieves." Application of the method sometimes leads to new estimators for familiar problems, or to a new motivation for an already well-studied technique. We will establish some general consistency results for the method, and then we will focus on three applications.

375 citations


Book ChapterDOI
TL;DR: In this paper, it is shown that the phase of the transfer function can be estimated under broad conditions and the asymptotic behavior of a phase estimate is determined under broad assumptions.
Abstract: NonGaussian linear processes are considered. It is shown that the phase of the transfer function can be estimated under broad conditions. This is not true of Gaussian linear processes and in this sense Gaussian linear processes are atypical. The asymptotic behavior of a phase estimate is determined. The phase estimates make use of bispectral estimates. These ideas are applied to a problem of deconvolution which is effective even when the transfer function is not minimum phase. A number of computational illustrations are given.

367 citations


Journal ArticleDOI
TL;DR: In this paper, the authors derive the nonparametric maximum likelihood estimate of a lifetime distribution $F$ on the basis of two independent samples, one a sample of size $m$ from $F$ and the other a sample of size $n$ from the length-biased distribution of $F$.
Abstract: We derive the nonparametric maximum likelihood estimate, $\hat{F}$ say, of a lifetime distribution $F$ on the basis of two independent samples, one a sample of size $m$ from $F$ and the other a sample of size $n$ from the length-biased distribution of $F$, i.e. from $G_F(x) = \int^x_0 u dF(u)/\mu, \mu = \int^\infty_0 x dF(x)$. We further show that $(m + n)^{1/2}(\hat{F} - F)$ converges weakly to a pinned Gaussian process with a simple covariance function, when $m + n \rightarrow \infty$ and $m/n \rightarrow$ constant. Potential applications are described.

Journal ArticleDOI
TL;DR: In this paper, it is shown that in a heteroscedastic linear model one can construct an estimate of the regression parameter that is asymptotically equivalent to the weighted least squares estimate with known variances, even when the variances are known only to be determined by an unknown but smooth function of the design or the mean response.
Abstract: In a heteroscedastic linear model, it is known that if the variances are a parametric function of the design, then one can construct an estimate of the regression parameter which is asymptotically equivalent to the weighted least squares estimate with known variances. We show that the same is true when the only thing known about the variances is that they are determined by an unknown but smooth function of the design or the mean response.

Journal ArticleDOI
TL;DR: In this paper, a method is given for finding the limiting distribution of a statistic $\hat{T}_n$ that depends on an estimated parameter $\hat{\lambda}_n$, for cases in which the limiting distribution of the statistic with known $\lambda$ is normal with mean independent of $\lambda$; a simple application to testing normality in regression models is given.
Abstract: In a variety of statistical problems, one is interested in the limiting distribution of statistics $\hat{T}_n = T_n(y_1, y_2, \cdots, y_n; \hat{\lambda}_n)$, where $\hat{\lambda}_n$ is an estimator of a parameter in the distribution of the $y_i$ and where the limiting distribution of $T_n = T_n(y_1, y_2, \cdots, y_n; \lambda)$ is relatively easy to find. For cases in which the limiting distribution of $T_n$ is normal with mean independent of $\lambda$, a useful method is given for finding the limiting distribution of $\hat{T}_n$. A simple application to testing normality in regression models is given.

Journal ArticleDOI
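TL;DR: In this article, methods are described for showing the asymptotic normality of statistics with estimated parameters, including $U$-statistics and adaptive $L$-statistics; as an example, the limiting normality of a resubstitution estimator of a correct classification probability when using Fisher's linear discriminant function is shown.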
Abstract: Often a statistic of interest would take the form of a member of a common family, except that some vital parameter is unknown and must be estimated. This paper describes methods for showing the asymptotic normality of such statistics with estimated parameters. Whether or not the limiting distribution is affected by the estimator is primarily a question of whether or not the limiting mean (derived by replacing the estimator by a mathematical variable) has a nonzero derivative with respect to that variable. Section 2 contains conditions yielding the asymptotic normality of $U$-statistics with estimated parameters. These results generalize previous theorems by Sukhatme (1958). As an example, we show the limiting normality of a resubstitution estimator of a correct classification probability when using Fisher's linear discriminant function. The results for $U$-statistics are extended to cover a broad class of families of statistics through the differential. Specifically, conditions are given which yield the asymptotic normality of adaptive $L$-statistics and an example due to de Wet and van Wyk (1979) is examined.

Journal ArticleDOI
TL;DR: In this paper, the authors present elements of a frequentist theory of statistics for concepts of upper and lower (interval-valued) probability (IVP), defined on finite event algebras.
Abstract: We present elements of a frequentist theory of statistics for concepts of upper and lower (interval-valued) probability (IVP), defined on finite event algebras. We consider IID models for unlinked repetitions of experiments described by IVP and suggest several generalizations of standard notions of independence, asymptotic certainty and estimability. Instability of relative frequencies is favoured under our IID models. Moreover, generalizations of Bernoulli's Theorem give some justification for the estimation of an underlying IVP mechanism from fluctuations of relative frequencies. Our results indicate that an objectivist, frequency- or propensity-oriented, view of probability does not necessitate an additive probability concept, and that IVP models can represent a type of indeterminacy not captured by additive probability.

Journal ArticleDOI
TL;DR: In this article, a general class of models for analysis of censored survival data with covariates is considered, and the asymptotic properties of linear functionals of the maximum likelihood estimate are studied in the general case where the true hazard rate function is not a step function.
Abstract: A general class of models for analysis of censored survival data with covariates is considered. If $n$ individuals are observed over a time period divided into $I(n)$ intervals, it is assumed that $\lambda_j(t)$, the hazard rate function of the time to failure of the individual $j$, is constant and equal to $\lambda_{ij} > 0$ on the $i$th interval, and that the vector $\ell = \{\log \lambda_{ij}: j = 1, \ldots, n; i = 1, \ldots, I(n)\}$ lies in a linear subspace. The maximum likelihood estimate $\hat{\ell}$ of $\ell$ provides a simultaneous estimate of the underlying hazard rate function, and of the effects of the covariates. Maximum likelihood equations and conditions for existence of $\hat{\ell}$ are given. The asymptotic properties of linear functionals of $\hat{\ell}$ are studied in the general case where the true hazard rate function $\lambda_0(t)$ is not a step function, and $I(n)$ increases without bound as the maximum interval length decreases. In comparison with recent work on regression analysis of survival data, the asymptotic results are obtained under more relaxed conditions on the regression variables.

Journal ArticleDOI
TL;DR: In this article, the authors considered nonparametric mixtures of exponential and Weibull (fixed shape) distributions as possible models for a lifetime distribution, and the maximum likelihood estimate of the mixing distribution was investigated and found to be supported on a finite number of points.
Abstract: Arbitrary nonparametric mixtures of exponential and Weibull (fixed shape) distributions are considered as possible models for a lifetime distribution. A characterization of such distributions is given by the well-known characterization of Laplace transforms. The maximum likelihood estimate of the mixing distribution is investigated and found to be supported on a finite number of points. It is shown to be unique and weakly convergent to the true mixing measure with probability one. A practical algorithm for computing the maximum likelihood estimate is described. Its performance is briefly discussed and some illustrative examples given.

Journal ArticleDOI
TL;DR: In this paper, a central limit theorem for the sample covariances of a linear process is proved and applied to the parameter estimation of a fitted spectral model, which does not necessarily include the true spectral density of the linear process.
Abstract: A central limit theorem is proved for the sample covariances of a linear process. The sufficient conditions for the theorem are described by more natural ones than usual. We apply this theorem to the parameter estimation of a fitted spectral model, which does not necessarily include the true spectral density of the linear process. We also deal with estimation problems for an autoregressive signal plus white noise. A general result is given for efficiency of Newton-Raphson iterations of the likelihood equation.

Journal ArticleDOI
TL;DR: In this paper, theorems are proved on the rate of almost sure convergence of autocovariances, and hence autocorrelations, to their true values; the rates are uniform in the lag up to some order $P(T)$, increasing with $T$, under the key assumption that the process is stationary and the best linear predictor is the best predictor.
Abstract: Theorems are proved relating to the rate of almost sure convergence of autocovariances, and hence autocorrelations, to their true values. These rates are uniform in the lag up to some order $P(T)$, increasing with $T$. The key assumption is that the process is stationary and the best linear predictor is the best predictor. In particular for an ARMA process and $P(T) = O\{(\ln T)^a\}, a < \infty$, the rate is $O\{(\ln \ln T/T)^{1/2}\}$. These results are used to discuss autoregressions and the use of autoregressions to approximate the structure of a more general process by increasing the order of the autoregression with $T$.

Journal ArticleDOI
TL;DR: In this article, a new algorithm based on a simple iterative technique is proposed for solving the isotonic regression problem in more than one dimension and is shown to converge to the correct solution; existing algorithms are difficult to implement because of the large number of lower sets present or because they involve search techniques which require a significant amount of checking and readjustment.
Abstract: Algorithms for solving the isotonic regression problem in more than one dimension are difficult to implement because of the large number of lower sets present or because they involve search techniques which require a significant amount of checking and readjustment. Here a new algorithm for solving this problem based on a simple iterative technique is proposed and shown to converge to the correct solution.

Journal ArticleDOI
TL;DR: In this paper, a methodology is presented for testing the hypothesis $(\alpha_1, \alpha_2, \alpha_3) = (1, 1, -1)$ in an autoregressive model for seasonal time series; percentiles for the test statistics are obtained, and extensions are presented for multiplicative processes, for higher order processes, and for processes containing deterministic trend and seasonal components.
Abstract: Let $Y_t$ be an autoregressive process satisfying $Y_t = \alpha_1 Y_{t - 1} + \alpha_2 Y_{t - d} + \alpha_3 Y_{t - d - 1} + e_t$, where $\{e_t\}^\infty_{t = 0}$ is a sequence of $\operatorname{iid}(0, \sigma^2)$ random variables and $d \geq 2$. Such processes have been used as parametric models for seasonal time series. Typical values of $d$ are 2, 4, and 12 corresponding to time series observed semi-annually, quarterly, and monthly, respectively. If $\alpha_1 = 1, \alpha_2 = 1, \alpha_3 = - 1$ then $\Delta_1\Delta_d Y_t = e_t$, where $\Delta_r Y_t$ denotes $Y_t - Y_{t - r}$. If $(\alpha_1, \alpha_2, \alpha_3) = (1, 1, - 1)$ the process is nonstationary and the theory for stationary autoregressive processes does not apply. A methodology for testing the hypothesis $(\alpha_1, \alpha_2, \alpha_3) = (1, 1, - 1)$ is presented and percentiles for test statistics are obtained. Extensions are presented for multiplicative processes, for higher order processes, and for processes containing deterministic trend and seasonal components.
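
The identity $\Delta_1\Delta_d Y_t = e_t$ under $(\alpha_1, \alpha_2, \alpha_3) = (1, 1, -1)$ can be checked numerically. The sketch below simulates the process from the abstract and applies the two differences; the zero starting values, the standard normal errors, and the function names are assumptions made only for this illustration.

```python
import random


def simulate_seasonal_unit_root(n: int, d: int, seed: int = 0):
    """Simulate Y_t = Y_{t-1} + Y_{t-d} - Y_{t-d-1} + e_t for t = d+1, ..., n-1."""
    rng = random.Random(seed)
    e = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Assumption: the first d + 1 values of Y are fixed at zero.
    y = [0.0] * (d + 1)
    for t in range(d + 1, n):
        y.append(y[t - 1] + y[t - d] - y[t - d - 1] + e[t])
    return y, e


def difference(series: list, r: int) -> list:
    """Lag-r difference: element k holds series[k + r] - series[k]."""
    return [series[t] - series[t - r] for t in range(r, len(series))]


if __name__ == "__main__":
    d = 4  # quarterly data
    y, e = simulate_seasonal_unit_root(n=200, d=d)
    dd = difference(difference(y, d), 1)  # Delta_1 Delta_d Y_t, t = d+1, ..., n-1
    worst = max(abs(dd[k] - e[k + d + 1]) for k in range(len(dd)))
    print("largest gap between the double difference and e_t:", worst)
```

Up to floating-point rounding, the doubly differenced series reproduces the error sequence, which is why the undifferenced process is nonstationary under the null hypothesis.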

Journal ArticleDOI
TL;DR: In this paper, a method of estimating the endpoint of a distribution when only limited information is available about the behaviour of the distribution in the neighbourhood of the endpoint is proposed, which improves on earlier estimators based on only a bounded number of extremes.
Abstract: We propose a method of estimating the endpoint, $\theta$, of a distribution when only limited information is available about the behaviour of the distribution in the neighbourhood of $\theta$. By using increasing numbers of extreme order statistics we obtain an estimator which improves on earlier estimators based on only a bounded number of extremes. In a certain particular model our estimator is equal to a maximum likelihood estimator, but it is robust against departures from this model.

Journal ArticleDOI
TL;DR: In this paper, it is shown that, under regularity assumptions, the bootstrap estimate $H_n(x, \hat{F}_n)$ of the exact distribution of $n^{1/2}\{\hat{T}_n - T_n(F)\}$ is asymptotically minimax, as is the estimated first-order Edgeworth expansion, whereas the straightforward normal approximation with estimated variance usually is not, because of bias.
Abstract: Let $X_1, X_2, \cdots, X_n$ be i.i.d. random variables with d.f. $F$. Suppose the $\{\hat{T}_n = \hat{T}_n(X_1, X_2, \cdots, X_n); n \geq 1\}$ are real-valued statistics and the $\{T_n(F); n \geq 1\}$ are centering functionals such that the asymptotic distribution of $n^{1/2}\{\hat{T}_n - T_n(F)\}$ is normal with mean zero. Let $H_n(x, F)$ be the exact d.f. of $n^{1/2}\{\hat{T}_n - T_n(F)\}$. The problem is to estimate $H_n(x, F)$ or functionals of $H_n(x, F)$. Under regularity assumptions, it is shown that the bootstrap estimate $H_n(x, \hat{F}_n)$, where $\hat{F}_n$ is the sample d.f., is asymptotically minimax; the loss function is any bounded monotone increasing function of a certain norm on the scaled difference $n^{1/2}\{H_n(x, \hat{F}_n) - H_n(x, F)\}$. The estimated first-order Edgeworth expansion of $H_n(x, F)$ is also asymptotically minimax and is equivalent to $H_n(x, \hat{F}_n)$ up to terms of order $n^{- 1/2}$. On the other hand, the straightforward normal approximation with estimated variance is usually not asymptotically minimax, because of bias. The results for estimating functionals of $H_n(x, F)$ are similar, with one notable difference: the analysis for functionals with skew-symmetric influence curve, such as the mean of $H_n(x, F)$, involves second-order Edgeworth expansions and rate of convergence $n^{-1}$.
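
As a minimal illustration of the bootstrap estimate $H_n(x, \hat{F}_n)$ described above, the sketch below takes $\hat{T}_n$ to be the sample mean and approximates the bootstrap distribution by Monte Carlo resampling. The choice of statistic, the number of resamples, and the function name are assumptions for illustration; the paper treats general statistics.

```python
import math
import random


def bootstrap_cdf(sample: list, x: float, n_boot: int = 2000, seed: int = 0) -> float:
    """Monte Carlo approximation to H_n(x, F_hat_n) when T_hat_n is the sample mean.

    Resamples of size n are drawn with replacement from the sample, so the
    centering functional evaluated at F_hat_n is the sample mean itself.
    """
    rng = random.Random(seed)
    n = len(sample)
    center = sum(sample) / n
    hits = 0
    for _ in range(n_boot):
        resample = rng.choices(sample, k=n)
        stat = math.sqrt(n) * (sum(resample) / n - center)
        if stat <= x:
            hits += 1
    return hits / n_boot


if __name__ == "__main__":
    data = [0.3, 1.2, 0.7, 2.9, 0.1, 0.8, 1.5, 0.4, 3.6, 0.9]
    for x in (-1.0, 0.0, 1.0):
        print(f"estimated H_n({x}) = {bootstrap_cdf(data, x):.3f}")
```

With a fixed seed the same resamples are reused for every $x$, so the estimated distribution function is monotone in $x$ by construction.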

Journal ArticleDOI
TL;DR: In this paper, the authors show that Johnson's result can be generalized to include asymmetric Dirichlet priors and those finitely exchangeable sequences with linear posterior expectation of success.
Abstract: How do Bayesians justify using conjugate priors on grounds other than mathematical convenience? In the 1920's the Cambridge philosopher William Ernest Johnson in effect characterized symmetric Dirichlet priors for multinomial sampling in terms of a natural and easily assessed subjective condition. Johnson's proof can be generalized to include asymmetric Dirichlet priors and those finitely exchangeable sequences with linear posterior expectation of success. Some interesting open problems that Johnson's result raises, and its historical and philosophical background are also discussed.

Journal ArticleDOI
TL;DR: The role of functional models and their associated fiducial analysis is explored in this paper, in an attempt to uncover a general theory of fiducial inference.
Abstract: The role of functional models and their associated fiducial analysis is explored in an attempt to uncover a general theory of fiducial inference.

Journal ArticleDOI
TL;DR: In this paper, the authors discuss five questions concerning maximum likelihood estimation: What kind of theory is maximum likelihood, how it is used in practice, to what extent can this theory and practice be justified from a decision-theoretic viewpoint, what are maximum likelihood's principal virtues and defects, and what improvements have been suggested by decision theory.
Abstract: This paper discusses five questions concerning maximum likelihood estimation: What kind of theory is maximum likelihood? How is maximum likelihood used in practice? To what extent can this theory and practice be justified from a decision-theoretic viewpoint? What are maximum likelihood's principal virtues and defects? What improvements have been suggested by decision theory?

Journal ArticleDOI
TL;DR: In this paper, generalizations of the arc sine laws are shown to provide insight into the operating characteristics of certain techniques for selecting models to fit a given data set, when the available models are nested.
Abstract: Generalizations of the arc sine laws are shown to provide insight into the operating characteristics of certain techniques for selecting models to fit a given data set, when the available models are nested. As a corollary, one sees that a popular technique may be expected to include about one superfluous parameter, even if the sample size is large.

Journal ArticleDOI
TL;DR: In this article, a general technique of improving upon the uniform minimum variance unbiased estimator (UMVUE) under possibly weighted squared error loss functions is developed, where improved estimators can be constructed by solving a difference inequality.
Abstract: Assume that $X_1, \cdots, X_p$ are independent random observations having discrete exponential densities $\rho_i(\theta_i)t_i(x_i)\theta_i^{x_i}, i = 1, \cdots, p$ respectively. A general technique of improving upon the uniform minimum variance unbiased estimator (UMVUE) of $(\theta_1, \cdots, \theta_p)$ is developed under possibly weighted squared error loss functions. It is shown that improved estimators can be constructed by solving a difference inequality. Typical difference inequalities of a fairly general type are presented and solved. When specialized to Poisson and Negative binomial cases, broad classes of estimators are given that dominate the UMVUE. These results unify many known results in this rapidly diverging field, and some of them are new (especially those related to Negative Binomial distributions). Improved estimators are also obtained for the problems in which some of the observations are from Poisson families and some from Negative Binomial families. For sum of squared errors loss, estimators which dominate the UMVUE in the discrete exponential families are also given explicitly.

Journal ArticleDOI
TL;DR: For the problem of estimating a $p$-variate normal mean, it is proved that for $p \geq 4$ the usual confidence sphere, recentered at the positive-part James Stein estimator instead of at the observations, has uniformly higher coverage probability and hence is a minimax confidence set.
Abstract: For the problem of estimating a $p$-variate normal mean, the existence of confidence procedures which dominate the usual one, a sphere centered at the observations, has long been known. However, no explicit procedure has yet been shown to dominate. For $p \geq 4$, we prove that if the usual confidence sphere is recentered at the positive-part James Stein estimator, then the resulting confidence set has uniformly higher coverage probability, and hence is a minimax confidence set. Moreover, the increase in coverage probability can be quite substantial. Numerical evidence is presented to support this claim.
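
The recentering idea in the abstract above can be sketched as a small simulation. This is an illustration only: it assumes an identity covariance matrix, uses the conventional shrinkage constant $a = p - 2$ (the abstract does not say which constants the theorem covers), and takes the sphere radius as an input; all function names are hypothetical.

```python
import math
import random


def positive_part_james_stein(x: list, a: float = None) -> list:
    """Shrink x toward the origin by the factor max(0, 1 - a / |x|^2)."""
    p = len(x)
    if a is None:
        a = p - 2  # conventional constant; an assumption of this sketch
    norm_sq = sum(v * v for v in x)
    if norm_sq == 0.0:
        return [0.0] * p
    shrink = max(0.0, 1.0 - a / norm_sq)
    return [shrink * v for v in x]


def covers(center: list, theta: list, radius: float) -> bool:
    """True if theta lies in the sphere of the given radius around center."""
    dist_sq = sum((c - t) ** 2 for c, t in zip(center, theta))
    return math.sqrt(dist_sq) <= radius


def coverage(theta: list, radius: float, recenter: bool,
             n_sim: int = 5000, seed: int = 0) -> float:
    """Simulated coverage probability for X ~ N(theta, I)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        x = [t + rng.gauss(0.0, 1.0) for t in theta]
        center = positive_part_james_stein(x) if recenter else x
        hits += covers(center, theta, radius)
    return hits / n_sim


if __name__ == "__main__":
    radius = math.sqrt(9.488)  # approximate 95% point of chi-square with 4 d.f.
    for theta in ([0.0] * 4, [1.0] * 4, [3.0] * 4):
        usual = coverage(theta, radius, recenter=False)
        shrunk = coverage(theta, radius, recenter=True)
        print(f"theta={theta}: usual={usual:.3f}, recentered={shrunk:.3f}")
```

Both sets have the same radius and therefore the same volume, so any gain in coverage comes purely from the choice of center; the same seed is used for both so that the two estimates share their random draws.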

Journal ArticleDOI
TL;DR: In this article, joint confidence and likelihood regions for the parameters in nonlinear regression models are defined using the geometric concepts of sample space and solution locus, and a quadratic approximation to the solution locus is used to show that these inference regions correspond to ellipsoids on the tangent plane at the least squares point.
Abstract: Joint confidence and likelihood regions for the parameters in nonlinear regression models can be defined using the geometric concepts of sample space and solution locus. Using a quadratic approximation to the solution locus, instead of the usual linear approximation, it is shown that these inference regions correspond to ellipsoids on the tangent plane at the least squares point. Accurate approximate inference regions can be obtained by mapping these ellipsoids into the parameter space, and measures of the effect of intrinsic nonlinearity on inference can be based on the difference between the tangent plane ellipsoids and the sphere which would be obtained using a linear approximation.