
Showing papers in "Annals of Statistics in 1985"


Journal ArticleDOI
TL;DR: The dip test as mentioned in this paper measures multimodality in a sample by the maximum difference, over all sample points, between the empirical distribution function, and the unimodal distribution function that minimizes that maximum difference.
Abstract: The dip test measures multimodality in a sample by the maximum difference, over all sample points, between the empirical distribution function, and the unimodal distribution function that minimizes that maximum difference. The uniform distribution is the asymptotically least favorable unimodal distribution, and the distribution of the test statistic is determined asymptotically and empirically when sampling from the uniform.
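The calibration step described above, tabulating the null distribution of a sup-distance statistic by sampling from the uniform, can be sketched in code. Computing the true dip requires minimizing over all unimodal distribution functions, so this hedged illustration substitutes the Kolmogorov-type sup distance to the uniform CDF itself as a stand-in; the sample size, replication count, and quantile level are arbitrary choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sup_distance_to_uniform(x):
    """Sup distance between the ECDF of x (on [0,1]) and the uniform CDF.
    A simplified stand-in for the dip, which instead minimizes this sup
    distance over all unimodal distribution functions."""
    x = np.sort(x)
    n = len(x)
    ecdf_hi = np.arange(1, n + 1) / n   # ECDF just after each point
    ecdf_lo = np.arange(0, n) / n       # ECDF just before each point
    return max(np.max(ecdf_hi - x), np.max(x - ecdf_lo))

# Tabulate the null distribution empirically by sampling from the uniform,
# the asymptotically least favorable unimodal distribution.
n, reps = 100, 2000
stats = np.array([sup_distance_to_uniform(rng.uniform(size=n))
                  for _ in range(reps)])
crit95 = np.quantile(stats, 0.95)   # empirical 95% critical value
```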

1,800 citations


Journal ArticleDOI
TL;DR: In this article, a variety of parametric and nonparametric models for the joint distribution of a pair of random variables are discussed in relation to flexibility, dimensionality, and interpretability.
Abstract: Let $(X, Y)$ be a pair of random variables such that $X = (X_1, \cdots, X_J)$ and let $f$ be a function that depends on the joint distribution of $(X, Y).$ A variety of parametric and nonparametric models for $f$ are discussed in relation to flexibility, dimensionality, and interpretability. It is then supposed that each $X_j \in \lbrack 0, 1\rbrack,$ that $Y$ is real valued with mean $\mu$ and finite variance, and that $f$ is the regression function of $Y$ on $X.$ Let $f^\ast,$ of the form $f^\ast(x_1, \cdots, x_J) = \mu + f^\ast_1(x_1) + \cdots + f^\ast_J(x_J),$ be chosen subject to the constraints $Ef^\ast_j = 0$ for $1 \leq j \leq J$ to minimize $E\lbrack(f(X) - f^\ast(X))^2\rbrack.$ Then $f^\ast$ is the closest additive approximation to $f,$ and $f^\ast = f$ if $f$ itself is additive. Spline estimates of $f^\ast_j$ and its derivatives are considered based on a random sample from the distribution of $(X, Y).$ Under a common smoothness assumption on $f^\ast_j, 1 \leq j \leq J,$ and some mild auxiliary assumptions, these estimates achieve the same (optimal) rate of convergence for general $J$ as they do for $J = 1.$
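A minimal sketch of fitting such an additive approximation $f^\ast$: the paper's spline estimates are replaced here with a crude bin-mean smoother inside a backfitting-style loop that recentres each component so $Ef^\ast_j = 0$. The data-generating functions, sample size, and bin count are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
n, J = 2000, 2
X = rng.uniform(size=(n, J))
f1 = lambda x: np.sin(2 * np.pi * x)
f2 = lambda x: (x - 0.5) ** 2
Y = 1.0 + f1(X[:, 0]) + f2(X[:, 1]) + 0.1 * rng.normal(size=n)

def bin_smooth(x, r, nbins=20):
    """Crude smoother: mean of r within equal-width bins of x on [0,1]."""
    edges = np.linspace(0, 1, nbins + 1)
    idx = np.clip(np.digitize(x, edges) - 1, 0, nbins - 1)
    means = np.array([r[idx == b].mean() if np.any(idx == b) else 0.0
                      for b in range(nbins)])
    return means[idx]

# Backfitting-style loop: cycle over coordinates, smooth the partial
# residuals, and recentre each component function to mean zero.
fhat = np.zeros((n, J))
for _ in range(10):
    for j in range(J):
        partial = Y - Y.mean() - fhat.sum(axis=1) + fhat[:, j]
        fhat[:, j] = bin_smooth(X[:, j], partial)
        fhat[:, j] -= fhat[:, j].mean()

resid = Y - Y.mean() - fhat.sum(axis=1)
```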

1,239 citations


Journal ArticleDOI
TL;DR: A stopping rule that is a limit of Bayes rules is first derived; then an almost minimax rule is presented, i.e. a stopping rule $N^\ast$ satisfying $E(N^\ast \mid \nu = \infty) = B$.
Abstract: Suppose one is able to observe sequentially a series of independent observations $X_1, X_2, \cdots$ such that $X_1, X_2, \cdots, X_{\nu-1}$ are iid distributed according to a known distribution $F_0$ and $X_\nu, X_{\nu+1}, \cdots$ are iid distributed according to a known distribution $F_1$. Assume that $\nu$ is unknown and the problem is to raise an alarm as soon as possible after the distribution changes from $F_0$ to $F_1$. Formally, the problem is to find a stopping rule $N$ which in some sense minimizes $E(N - \nu \mid N \geq \nu)$ subject to a restriction $E(N \mid \nu = \infty) \geq B$. A stopping rule that is a limit of Bayes rules is first derived. Then an almost minimax rule is presented; i.e. a stopping rule $N^\ast$ is described which satisfies $E(N^\ast \mid \nu = \infty) = B$ for which \begin{equation*}\begin{split}\sup_{1\leq \nu < \infty}E(N^\ast - \nu \mid N^\ast \geq \nu) \\ - \inf_{\{\text{stopping rules } N \mid E(N \mid \nu=\infty)\geq B\}} \sup_{1\leq \nu < \infty}E(N - \nu \mid N \geq \nu) = o(1)\end{split}\end{equation*} where $o(1) \rightarrow 0$ as $B \rightarrow \infty$.
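One classical likelihood-ratio rule of this type, the Shiryaev-Roberts recursion, arises as a limit of Bayes rules of the kind the abstract derives; a hedged sketch follows. The distributions, change time, and threshold are illustrative assumptions, and the code is not the paper's specific rule.

```python
import numpy as np
from math import exp

def shiryaev_roberts(xs, llr, threshold):
    """Raise an alarm at the first n with R_n >= threshold, where
    R_n = (1 + R_{n-1}) * exp(llr(x_n)), R_0 = 0, and llr is the
    log-likelihood ratio log(f1(x) / f0(x))."""
    R = 0.0
    for n, x in enumerate(xs, start=1):
        R = (1.0 + R) * exp(llr(x))
        if R >= threshold:
            return n
    return None

# Illustration: N(0,1) observations change to N(1,1) at time nu = 100.
rng = np.random.default_rng(2)
nu = 100
xs = np.concatenate([rng.normal(0.0, 1.0, nu - 1),
                     rng.normal(1.0, 1.0, 400)])
llr = lambda x: x - 0.5   # log likelihood ratio for N(0,1) vs N(1,1)
alarm = shiryaev_roberts(xs, llr, threshold=500.0)
```

Larger thresholds trade longer detection delay for fewer false alarms, mirroring the constraint $E(N \mid \nu = \infty) \geq B$.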

610 citations


Journal ArticleDOI
TL;DR: In this paper, the partially improper prior behind the smoothing spline model is used to obtain a generalization of the maximum likelihood (GML) estimate for smoothing parameter, then this estimate is compared with the generalized cross validation (GCV) estimate both analytically and by Monte Carlo methods.
Abstract: The partially improper prior behind the smoothing spline model is used to obtain a generalization of the maximum likelihood (GML) estimate for the smoothing parameter. Then this estimate is compared with the generalized cross validation (GCV) estimate both analytically and by Monte Carlo methods. The comparison is based on a predictive mean square error criterion. It is shown that if the true, unknown function being estimated is smooth in a sense to be defined then the GML estimate undersmooths relative to the GCV estimate and the predictive mean square error using the GML estimate goes to zero at a slower rate than the mean square error using the GCV estimate. If the true function is "rough" then the GCV and GML estimates have asymptotically similar behavior. A Monte Carlo experiment was designed to see if the asymptotic results in the smooth case were evident in small sample sizes. Mixed results were obtained for $n = 32$, GCV was somewhat better than GML for $n = 64$, and GCV was decidedly superior for $n = 128$. In the $n = 32$ case GCV was better for smaller $\sigma^2$ and the comparison was close for larger $\sigma^2$. The theoretical results are shown to extend to the generalized spline smoothing model, which includes the estimation of functions given noisy values of various integrals of them.
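As a concrete illustration of smoothing-parameter selection by GCV (the GML side of the comparison is omitted), here is a minimal sketch using ridge regression on a polynomial basis as the family of linear smoothers $\hat y = A(\lambda) y$; the basis, grid of $\lambda$ values, and noise level are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
x = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)

# Family of linear smoothers: ridge regression on a polynomial basis,
# with influence matrix A(lam) = B (B'B + lam I)^{-1} B'.
B = np.vander(x, 8, increasing=True)

def gcv(lam):
    """GCV score: (RSS / n) / (1 - tr(A)/n)^2."""
    A = B @ np.linalg.solve(B.T @ B + lam * np.eye(B.shape[1]), B.T)
    resid = y - A @ y
    return (resid @ resid / n) / ((1.0 - np.trace(A) / n) ** 2)

lams = 10.0 ** np.arange(-8, 2)
lam_gcv = lams[np.argmin([gcv(l) for l in lams])]  # GCV choice of lambda
```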

528 citations


Journal ArticleDOI
TL;DR: In this paper, the authors derive the nonparametric maximum likelihood estimators of two unknown distribution functions $F$ and $G$, together with an estimator of the population size $N$, from pairs observed only when $Y_e \leq X_e$, and establish their consistency and the conditions under which $\sqrt N$ times the estimation error for $F$ converges to a Gaussian process.
Abstract: Let $\mathscr{P}$ be a finite population with $N \geq 1$ elements; for each $e \in \mathscr{P}$, let $X_e$ and $Y_e$ be independent, positive random variables with unknown distribution functions $F$ and $G$; and suppose that the pairs $(X_e, Y_e)$ are iid. We consider the problem of estimating $F, G$, and $N$ when the data consist of those pairs $(X_e, Y_e)$ for which $e \in \mathscr{P}$ and $Y_e \leq X_e$. The nonparametric maximum likelihood estimators (MLEs) of $F$ and $G$ are described, and their asymptotic properties as $N \rightarrow \infty$ are derived. It is shown that the MLEs are consistent against pairs $(F, G)$ for which $F$ and $G$ are continuous, $G^{-1}(0) \leq F^{-1}(0)$, and $G^{-1}(1) \leq F^{-1}(1)$. $\sqrt N \times$ the estimation error for $F$ converges in distribution to a Gaussian process if $\int^\infty_0 (1/G) dF < \infty$, but may fail to converge if this integral is infinite.

474 citations


Journal ArticleDOI
TL;DR: In this paper, mild general conditions are presented which, respectively, assure weak or strong consistency or asymptotic normality of the maximum likelihood estimator in generalized linear models; responses with a bounded range, such as categorical responses, and stochastic regressors are also treated.
Abstract: Generalized linear models are used for regression analysis in a number of cases, including categorical responses, where the classical assumptions are violated. The statistical analysis of such models is based on the asymptotic properties of the maximum likelihood estimator. We present mild general conditions which, respectively, assure weak or strong consistency or asymptotic normality. Most of the previous work has been concerned with natural link functions. In this case our normality condition, though obtained by a different approach, is closely related to a condition of Haberman (1977a). Examples show how the general conditions reduce to weak requirements for special exponential families. Further, for regressors with a compact range, sufficient conditions are given which do not involve the unknown parameter, and are therefore easy to check in practice. Responses with a bounded range, e.g. categorical responses, and stochastic regressors also are treated.

466 citations


Journal ArticleDOI
TL;DR: In this paper, a bandwidth-selection rule is formulated in terms of cross validation, and under mild assumptions on the kernel and the unknown regression function, it is seen that this rule is asymptotically optimal.
Abstract: Kernel estimators of an unknown multivariate regression function are investigated. A bandwidth-selection rule is considered, which can be formulated in terms of cross validation. Under mild assumptions on the kernel and the unknown regression function, it is seen that this rule is asymptotically optimal.
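A hedged sketch of such a bandwidth-selection rule: leave-one-out cross validation for a Nadaraya-Watson kernel regression estimator, shown here in one dimension with a Gaussian kernel. The test function, noise level, and bandwidth grid are illustrative choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(size=n)
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=n)

def loo_fit(h):
    """Leave-one-out Nadaraya-Watson fits with a Gaussian kernel."""
    W = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2)
    np.fill_diagonal(W, 0.0)            # leave each point out of its own fit
    return (W @ y) / W.sum(axis=1)

def cv_score(h):
    """Cross-validation criterion: mean squared leave-one-out error."""
    return np.mean((y - loo_fit(h)) ** 2)

hs = np.array([0.02, 0.05, 0.1, 0.2, 0.5])
h_cv = hs[np.argmin([cv_score(h) for h in hs])]   # selected bandwidth
```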

447 citations


Journal ArticleDOI
TL;DR: In this paper, the problem of determining whether there exists a nonparametric maximum likelihood estimate (NPMLE) of an unknown distribution on the basis of samples from weighted versions of the distribution with known weight functions is treated.
Abstract: The following problem is treated: Given $s$ not-necessarily-random samples from an unknown distribution $F$, and assuming that we know the sampling rule of each sample, is it possible to combine the samples in order to estimate $F$, and if so what is the natural way of doing it? More formally, this translates to the problem of determining whether there exists a nonparametric maximum likelihood estimate (NPMLE) of $F$ on the basis of $s$ samples from weighted versions of $F$, with known weight functions, and if it exists, how to construct it? We give a simple necessary and sufficient condition, which can be checked graphically, for the existence and uniqueness of the NPMLE and, under this condition, we describe a simple method for constructing it. The method is numerically efficient and mathematically interesting because it reduces the problem to one of solving $s - 1$ nonlinear equations with $s - 1$ unknowns, the unique solution of which is easily obtained by the iterative, Gauss-Seidel type, scheme described in the paper. Extensions for the case where the weight functions are not completely specified and for censored samples, applications, numerical examples, and statistical properties of the NPMLE, are discussed. In particular, we prove under this condition that the NPMLE is a sufficient statistic for $F$. The technique has many potential applications, because it is not limited to the case where the sampled items are univariate. A FORTRAN program for the described algorithm is available from the author.

392 citations


Journal ArticleDOI
TL;DR: In this paper, the ill-posed maximum likelihood problem for a mixture of normal distributions is reformulated, using simple constraints, into an optimization problem having a strongly consistent, global solution.
Abstract: The method of maximum likelihood leads to an ill-posed optimization problem in the case of a mixture of normal distributions. Estimation in the univariate case is reformulated using simple constraints into an optimization problem having a strongly consistent, global solution.

391 citations



Journal ArticleDOI
TL;DR: In this paper, the problem of estimating shape and scale parameters for a distribution with regularly varying tails is related to that of nonparametrically estimating a density at a fixed point, in that optimal construction of the estimators depends substantially upon unknown features of the distribution.
Abstract: The problem of estimating shape and scale parameters for a distribution with regularly varying tails is related to that of nonparametrically estimating a density at a fixed point, in that optimal construction of the estimators depends substantially upon unknown features of the distribution. We show how to overcome this problem by using adaptive methods. Our main results hold very generally, for a large class of adaptive estimators. Later we consider specific versions of adaptive estimators, and describe their performance both in theory and by means of simulation studies. We also examine a technique proposed by Hill (1975) for solving similar problems.

Journal ArticleDOI
TL;DR: In this article, a new estimate of the exponent of a distribution whose tail varies regularly at infinity is introduced, expressed as the convolution of a kernel with the logarithm of the quantile function, and includes as particular cases the estimates introduced by Hill and by De Haan.
Abstract: We introduce a new estimate of the exponent of a distribution whose tail varies regularly at infinity. This estimate is expressed as the convolution of a kernel with the logarithm of the quantile function, and includes as particular cases the estimates introduced by Hill and by De Haan. Under very weak conditions, we prove asymptotic normality, consistency and discuss the optimal choices of the kernel and of the bandwidth parameter.

Journal ArticleDOI
TL;DR: In this article, a uniform normal approximation for the distribution of an $M$-estimator $\hat\beta$ is given, under which arbitrary linear combinations $a'_n(\hat{\beta} - \beta)$ are asymptotically normal (when appropriately normalized).
Abstract: In a general linear model $Y = X\beta + R$, with $Y$ and $R$ $n$-dimensional, $X$ an $n \times p$ matrix, and $\beta$ $p$-dimensional, let $\hat\beta$ be an $M$-estimator of $\beta$ satisfying $0 = \sum x_i\psi(y_i - x'_i\hat\beta)$. Let $p \rightarrow \infty$ such that $(p \log n)^{3/2}/n \rightarrow 0$. Then $\max_i|x'_i(\hat{\beta} - \beta)| \rightarrow_P 0$, and it is possible to find a uniform normal approximation for the distribution of $\hat{\beta}$ under which arbitrary linear combinations $a'_n(\hat{\beta} - \beta)$ are asymptotically normal (when appropriately normalized) and $(\hat{\beta} - \beta)'(X'X)(\hat{\beta} - \beta)$ is approximately $\chi^2_p$.

Journal ArticleDOI
TL;DR: In this paper, Stein's general technique for improving upon the best invariant unbiased and minimax estimators of the normal covariance matrix is described, and several improved estimators are obtained by solving the differential inequality.
Abstract: Stein's general technique for improving upon the best invariant unbiased and minimax estimators of the normal covariance matrix is described. The technique is to obtain solutions to a certain differential inequality involving the eigenvalues of the sample covariance matrix. Several improved estimators are obtained by solving the differential inequality. These estimators shrink or expand the sample eigenvalues depending on their magnitude. A scale invariant, adaptive minimax estimator is also obtained.

Journal ArticleDOI
TL;DR: In this article, the authors introduce a bias-adjusted estimator and two estimators appropriate for normally distributed measurement errors: a functional maximum likelihood estimator, and an estimator which exploits the consequences of sufficiency.
Abstract: In a logistic regression model, when covariates are subject to measurement error the naive estimator, obtained by regressing on the observed covariates, is asymptotically biased. We introduce a bias-adjusted estimator and two estimators appropriate for normally distributed measurement errors: a functional maximum likelihood estimator, and an estimator which exploits the consequences of sufficiency. The four proposals are studied asymptotically under conditions which are appropriate when the measurement error is small. A small Monte Carlo study illustrates the superiority of the measurement-error estimators in certain situations.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce two nonparametric multivariate density estimators that are particularly suitable for application in interactive computing environments, which are statistically comparable to kernel methods and computationally comparable to histogram methods.
Abstract: We introduce two nonparametric multivariate density estimators that are particularly suitable for application in interactive computing environments. These estimators are statistically comparable to kernel methods and computationally comparable to histogram methods. Asymptotic theory of the estimators is presented, and examples with univariate data and with simulated trivariate Gaussian data are illustrated.

Journal ArticleDOI
TL;DR: In this article, the author makes two points about the effect of the number of bootstrap simulations, $B$, on percentile-$t$ bootstrap confidence intervals: one concerning coverage probability, and one concerning the distance of the simulated critical point from the true critical point derived with $B = \infty$.
Abstract: The purpose of this document is to make two points about the effect of the number of bootstrap simulations, $B$, on percentile-$t$ bootstrap confidence intervals. The first point concerns coverage probability; the second, the distance of the simulated critical point from the true critical point derived with $B = \infty$. In both cases the author has in mind applications to smooth statistics, such as the Studentized mean of a sample drawn from a continuous distribution. He indicates the changes that have to be made if the distribution of the statistic is not smooth.
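A minimal sketch of the percentile-$t$ construction under discussion: bootstrap the Studentized mean and invert its simulated quantiles to form a confidence interval. The sample, the choice $B = 999$, and the 95% level are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(size=40)   # a sample from a continuous distribution
n = len(x)
xbar = x.mean()
se = x.std(ddof=1) / np.sqrt(n)

# Percentile-t: simulate the Studentized mean B times and read off
# its bootstrap quantiles.
B = 999
tstar = np.empty(B)
for b in range(B):
    xb = rng.choice(x, size=n, replace=True)
    tstar[b] = (xb.mean() - xbar) / (xb.std(ddof=1) / np.sqrt(n))

lo, hi = np.quantile(tstar, [0.025, 0.975])
ci = (xbar - hi * se, xbar - lo * se)   # percentile-t 95% interval
```

Because the Studentized bootstrap distribution is typically skewed, the resulting interval is asymmetric about $\bar x$, unlike the normal-approximation interval.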

Journal ArticleDOI
TL;DR: In this article, it is shown that bootstrap tests and confidence regions for functions of the population covariance matrix achieve the desired asymptotic levels, provided model restrictions, such as multiple eigenvalues in the covariance matrix, are taken into account in designing the bootstrap algorithm.
Abstract: Bootstrap tests and confidence regions for functions of the population covariance matrix have the desired asymptotic levels, provided model restrictions, such as multiple eigenvalues in the covariance matrix, are taken into account in designing the bootstrap algorithm.

Journal ArticleDOI
TL;DR: In this article, the likelihood is defined for a state space model with incompletely specified initial conditions by transforming the data to eliminate the dependence on the unspecified conditions, and this approach is extended to obtain estimates of the state vectors and predictors and interpolators for missing observations.
Abstract: The likelihood is defined for a state space model with incompletely specified initial conditions by transforming the data to eliminate the dependence on the unspecified conditions. This approach is extended to obtain estimates of the state vectors and predictors and interpolators for missing observations. It is then shown that this method is equivalent to placing a diffuse prior distribution on the unspecified part of the initial state vector, and modified versions of the Kalman filter and smoothing algorithms are derived to give exact numerical procedures for diffuse initial conditions. The results are extended to continuous time models, including smoothing splines and continuous time autoregressive processes.

Journal ArticleDOI
TL;DR: In this article, it was shown that Hill's estimator for the exponent of regular variation is asymptotically normal if the number of extreme order statistics used to construct it tends to infinity appropriately with the sample size $n$.
Abstract: It is shown that Hill's estimator (1975) for the exponent of regular variation is asymptotically normal if the number $k_n$ of extreme order statistics used to construct it tends to infinity appropriately with the sample size $n.$ As our main result, we derive a general condition which can be used to determine the optimal $k_n$ explicitly, provided that some prior knowledge is available on the underlying distribution function with regularly varying upper tail. This condition is simplified under appropriate assumptions and then applied to several examples.
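Hill's estimator itself is short enough to state in code; the following sketch applies it to a Pareto sample whose true tail index is known. The choice $k = 200$ here is arbitrary — choosing $k_n$ optimally is precisely the subject of the paper.

```python
import numpy as np

def hill_estimator(x, k):
    """Hill's (1975) estimator of the tail index, based on the k
    largest order statistics: 1 / mean(log(X_(n-i+1) / X_(n-k)))."""
    xs = np.sort(x)
    return 1.0 / np.mean(np.log(xs[-k:] / xs[-k - 1]))

# Pareto sample with minimum 1 and true tail index 2, for illustration.
rng = np.random.default_rng(6)
x = rng.pareto(2.0, size=10_000) + 1.0
alpha_hat = hill_estimator(x, k=200)   # should be near 2 for suitable k
```

Too small a $k$ inflates the variance of the estimate; too large a $k$ biases it by including observations outside the regularly varying tail.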

Journal ArticleDOI
TL;DR: This paper proposes some alternative distributions to independence to help interpret the chi-square test for independence in a two-way contingency table, which often rejects the independence hypothesis at an extremely small significance level, particularly when the sample size is large.
Abstract: The classical chi-square test for independence in a two-way contingency table often rejects the independence hypothesis at an extremely small significance level, particularly when the sample size is large. This paper proposes some alternative distributions to independence, to help interpret the $\chi^2$ statistic in such situations. The uniform alternative, in which every possible contingency table of the given dimension and sample size receives equal probability, leads to the volume test, as originally suggested in a regression context by H. Hotelling. Exponential family theory is used to generate a class of intermediate alternatives between independence and uniformity, leading to a random effects model for contingency tables.
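For reference, the classical test being reinterpreted can be sketched as follows for a 2x2 table, where the statistic has one degree of freedom and the p-value has a closed form via the complementary error function. The counts are hypothetical, chosen so that a modest departure from independence is rejected at a large sample size, illustrating the phenomenon the abstract describes.

```python
import numpy as np
from math import erfc, sqrt

# Hypothetical 2x2 contingency table with a fairly large sample size.
table = np.array([[560, 440],
                  [610, 390]])
n = table.sum()

# Expected counts under independence: (row total)(column total) / n.
expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
chi2_stat = ((table - expected) ** 2 / expected).sum()

# For a 2x2 table the statistic has 1 degree of freedom, and
# P(chi^2_1 > x) = erfc(sqrt(x / 2)).
pval = erfc(sqrt(chi2_stat / 2.0))
```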

Journal ArticleDOI
TL;DR: In this paper, the authors consider a nonparametric regression model in which the zero mean errors are uncorrelated with common variance $\sigma^2$ and the response function $f$ is assumed only to have a bounded square integrable $q$th derivative.
Abstract: Linear estimation is considered in nonparametric regression models of the form $Y_i = f(x_i) + e_i$, $x_i \in (a, b)$, where the zero mean errors are uncorrelated with common variance $\sigma^2$ and the response function $f$ is assumed only to have a bounded square integrable $q$th derivative. The linear estimator which minimizes the maximum mean squared error summed over the observation points is derived, and the exact minimax rate of convergence is obtained. For practical problems where bounds on $\|f^{(q)}\|^2$ and $\sigma^2$ may be unknown, generalized cross-validation is shown to give an adaptive estimator which achieves the minimax optimal rate under the additional assumption of normality.

Journal ArticleDOI
TL;DR: In this paper, a new approach to generalized cross validation based on Stein estimates and the associated unbiased risk estimates is developed, and the consistency results are obtained for the cross-validated (Steinized) estimates in the contexts of nearest neighbor nonparametric regression, model selection, ridge regression, and smoothing splines.
Abstract: This paper concerns the method of generalized cross validation (GCV), a promising way of choosing between linear estimates. Based on Stein estimates and the associated unbiased risk estimates (Stein, 1981), a new approach to GCV is developed. Many consistency results are obtained for the cross-validated (Steinized) estimates in the contexts of nearest-neighbor nonparametric regression, model selection, ridge regression, and smoothing splines. Moreover, the associated Stein's unbiased risk estimate is shown to be uniformly consistent in assessing the true loss (not the risk). Consistency properties are examined as well when the sampling error is unknown. Finally, we propose a variant of GCV to handle the case that the dimension of the raw data is known to be greater than that of their expected values.

Journal ArticleDOI
TL;DR: In this article, robust minimum distance estimators are constructed for a family of probability measures metrized by the $L_1$ distance, and their rate of convergence is shown to depend naturally on an entropy function for the parameter set.
Abstract: Let $(\mathscr{X, A})$ be a space with a $\sigma$-field, $M = \{P_s; s \in \Theta\}$ be a family of probability measures on $\mathscr{A}$ with $\Theta$ arbitrary, $X_1, \cdots, X_n$ i.i.d. observations on $P_\theta.$ Define $\mu_n(A) = (1/n) \sum^n_{i = 1} I_A(X_i),$ the empirical measure indexed by $A \in \mathscr{A}.$ Assume $\Theta$ is totally bounded when metrized by the $L_1$ distance between measures. Robust minimum distance estimators $\hat{\theta}_n$ are constructed for $\theta$ and the resulting rate of convergence is shown naturally to depend on an entropy function for $\Theta$.

Journal ArticleDOI
TL;DR: In this article, an analog of Fisher's bound for asymptotic variances is obtained for minimax risk over a Sobolev smoothness class, based on applying a recent result on minimax filtering in Hilbert space.
Abstract: For nonparametric regression estimation on a bounded interval, optimal rates of decrease for integrated mean square error are known but not the best possible constants. A sharp result on such a constant, i.e., an analog of Fisher's bound for asymptotic variances is obtained for minimax risk over a Sobolev smoothness class. Normality of errors is assumed. The method is based on applying a recent result on minimax filtering in Hilbert space. A variant of spline smoothing is developed to deal with noncircular models.

Journal ArticleDOI
TL;DR: In this article, it is shown that any two sequences of forecasts that both meet this criterion must be in asymptotic agreement, and these agreed values can then be considered as correct objective probability forecasts for the particular sequence of outcome results obtained.
Abstract: Probability forecasts for a sequence of uncertain events may be compared with the outcomes of those events by means of a natural criterion of empirical validity, calibration. It is shown that any two sequences of forecasts which both meet this criterion must be in asymptotic agreement. These agreed values can then be considered as correct objective probability forecasts for the particular sequence of outcome results obtained. However, the objective forecasts vary with the extent of the information taken into account when they are formulated. We thus obtain a general theory of empirical probability, relative to an information base. This theory does not require that such probabilities be interpreted in terms of repeated trials of the same event. Some implications of this theory are discussed.
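The calibration criterion can be checked empirically by binning forecasts and comparing each bin's relative frequency of the event with the forecast value. The following sketch simulates a perfectly calibrated forecaster; all parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 20_000
p_true = rng.uniform(size=N)                       # event probabilities
outcomes = (rng.uniform(size=N) < p_true).astype(float)
forecasts = p_true                                 # a calibrated forecaster

# Calibration check: among occasions where the forecast was near p,
# the relative frequency of the event should also be near p.
bins = np.linspace(0.0, 1.0, 11)
idx = np.clip(np.digitize(forecasts, bins) - 1, 0, 9)
freq = np.array([outcomes[idx == b].mean() for b in range(10)])
centers = (bins[:-1] + bins[1:]) / 2
max_miscal = np.max(np.abs(freq - centers))        # worst bin discrepancy
```

A systematically over- or under-confident forecaster would show bin frequencies drifting away from the diagonal, and by the paper's result two calibrated forecast sequences cannot disagree asymptotically.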

Journal ArticleDOI
TL;DR: In this article, a recursive procedure related to the multivariate Robbins-Monro procedure is proposed which does not require that $(x - \theta)^tf(x)$ have a constant signum, yet converges to the unknown root at the rate $n^{-1/2}$.
Abstract: Suppose that $f$ is a function from $\mathbb{R}^k$ to $\mathbb{R}^k$ and for some $\theta, f(\theta) = 0$. Initially $f$ is unknown, but for any $x$ in $\mathbb{R}^k$ we can observe a random vector $Y(x)$ with expectation $f(x)$. The unknown $\theta$ can be estimated recursively by Blum's (1954) multivariate version of the Robbins-Monro procedure. Blum's procedure requires the rather restrictive assumption that the infimum of the inner product $(x - \theta)^tf(x)$ over any compact set not containing $\theta$ be positive. Thus at each $x, f(x)$ gives information about the direction towards $\theta$. Blum's recursion is $X_{n+1} = X_n - a_n Y_n$ where the conditional expectation of $Y_n$ given $X_1, \cdots, X_n$ is $f(X_n)$ and $a_n > 0$. Unlike Blum's method, the procedure introduced in this paper does not necessarily attempt to move in a direction that decreases $\|X_n - \theta\|$, at least not during the initial stage of the procedure. Rather, except for random fluctuations it moves in a direction which decreases $\|f\|^2$, and it may follow a circuitous route to $\theta$. Consequently, it does not require that $(x - \theta)^tf(x)$ have a constant signum. This new procedure is somewhat similar to the multivariate Kiefer-Wolfowitz procedure applied to $\|f\|^2$, but unlike the latter it converges to $\theta$ at rate $n^{-1/2}$. Deterministic root finding methods are briefly discussed. The method of this paper is a stochastic analog of the Newton-Raphson and Gauss-Newton techniques.
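Blum's recursion $X_{n+1} = X_n - a_n Y_n$, which the paper takes as its starting point, is easy to sketch. The example below uses a linear $f(x) = x - \theta$, for which Blum's inner-product condition plainly holds; it does not implement the paper's new procedure, and the step sizes, noise level, and $\theta$ are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
theta = np.array([1.0, -2.0])   # the unknown root of f

def noisy_f(x):
    """Observation Y(x) with expectation f(x) = x - theta."""
    return (x - theta) + 0.5 * rng.normal(size=2)

# Blum's multivariate Robbins-Monro recursion X_{n+1} = X_n - a_n Y_n,
# with step sizes a_n = 1 / n.
x = np.zeros(2)
for n_step in range(1, 5001):
    x = x - (1.0 / n_step) * noisy_f(x)
```

With $a_n = 1/n$ and this linear $f$, the recursion behaves like a running average of the noisy observations and converges to $\theta$ at the $n^{-1/2}$ rate.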

Journal ArticleDOI
TL;DR: In this paper, a general procedure for multistage modification of pivotal statistics is developed to improve the normal approximation, and explicit formulae are given for some basic cases involving independent random samples and samples drawn without replacement.
Abstract: A general procedure for multistage modification of pivotal statistics is developed to improve the normal approximation. Bootstrapping a first stage modified statistic is shown to be equivalent, in terms of asymptotic order, to the normal approximation of a second stage modification. Explicit formulae are given for some basic cases involving independent random samples and samples drawn without replacement. The Hodges-Lehmann deficiency is calculated to compare the regular $t$-statistic with its one-step correction.

Journal ArticleDOI
TL;DR: In this article, a class of linear serial rank statistics for the problem of testing white noise against alternatives of ARMA serial dependence is introduced, and the efficiency properties of the proposed statistics are investigated, and an explicit formulation of the asymptotically most efficient score-generating functions is provided.
Abstract: In this paper we introduce a class of linear serial rank statistics for the problem of testing white noise against alternatives of ARMA serial dependence. The asymptotic normality of the proposed statistics is established, both under the null as well as alternative hypotheses, using LeCam's notion of contiguity. The efficiency properties of the proposed statistics are investigated, and an explicit formulation of the asymptotically most efficient score-generating functions is provided. Finally, we study the asymptotic relative efficiency of the proposed procedures with respect to their normal theory counterparts based on sample autocorrelations.

Journal ArticleDOI
TL;DR: In this article, the problem of testing for a constant failure rate against alternatives with failure rates involving a single change-point is considered, and the asymptotic significance levels of tests based on maximal score statistics are shown to involve the solution to a first passage time problem for an Ornstein-Uhlenbeck process.
Abstract: The problem of testing for a constant failure rate against alternatives with failure rates involving a single change-point is considered. The asymptotic significance levels of tests based on maximal score statistics are shown to involve the solution to a first passage time problem for an Ornstein-Uhlenbeck process. An example illustrates the methodology.