
Showing papers in "Annals of Statistics in 1998"


Journal ArticleDOI
TL;DR: It is shown that techniques used in the analysis of Vapnik's support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error.
Abstract: One of the surprising recurring phenomena observed in experiments with boosting is that the test error of the generated classifier usually does not increase as its size becomes very large, and often is observed to decrease even after the training error reaches zero. In this paper, we show that this phenomenon is related to the distribution of margins of the training examples with respect to the generated voting classification rule, where the margin of an example is simply the difference between the number of correct votes and the maximum number of votes received by any incorrect label. We show that techniques used in the analysis of Vapnik's support vector classifiers and of neural networks with small weights can be applied to voting methods to relate the margin distribution to the test error. We also show theoretically and experimentally that boosting is especially effective at increasing the margins of the training examples. Finally, we compare our explanation to those based on the bias-variance decomposition.

2,257 citations
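
The margin defined in the abstract is straightforward to compute. Below is a minimal sketch in Python (all names are our own) of the margin of a voting classifier: the normalized vote for the correct label minus the largest normalized vote for any incorrect label.

```python
import numpy as np

def margins(votes, y):
    """votes: (n_examples, n_classes) array of (possibly weighted) vote totals.
    y: (n_examples,) integer array of correct class indices."""
    frac = votes / votes.sum(axis=1, keepdims=True)   # normalize to vote fractions
    correct = frac[np.arange(len(y)), y]
    wrong = frac.copy()
    wrong[np.arange(len(y)), y] = -np.inf             # mask out the correct class
    return correct - wrong.max(axis=1)                # lies in [-1, 1]; > 0 iff correct

v = np.array([[2.0, 1.0, 0.0],                        # example vote totals for
              [0.5, 0.5, 2.0]])                       # two examples, three classes
print(margins(v, np.array([0, 2])))                   # [0.333..., 0.5]
```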


Journal ArticleDOI
TL;DR: In this article, the authors discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together, similar to the Bradley-Terry method for paired comparisons.
Abstract: We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar to the Bradley-Terry method for paired comparisons. We study the nature of the class probability estimates that arise, and examine the performance of the procedure in real and simulated data sets. Classifiers used include linear discriminants, nearest neighbors, adaptive nonlinear methods and the support vector machine.

1,569 citations
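
A minimal sketch of the coupling idea, assuming uniform weights across pairs and using the iterative proportional update associated with the Bradley-Terry model; the function names, iteration count and example values are our own, not the paper's.

```python
import numpy as np

def couple(r, iters=200):
    """r[i, j] estimates P(class i | class i or j), with r[j, i] = 1 - r[i, j]."""
    k = r.shape[0]
    p = np.full(k, 1.0 / k)
    off = ~np.eye(k, dtype=bool)
    for _ in range(iters):
        mu = p[:, None] / (p[:, None] + p[None, :])          # model pairwise probs
        p = p * (r[off].reshape(k, k - 1).sum(axis=1)
                 / mu[off].reshape(k, k - 1).sum(axis=1))    # proportional update
        p /= p.sum()
    return p

r = np.array([[0.0, 0.6, 0.7],      # class 0 beats 1 and 2 in the pairwise views
              [0.4, 0.0, 0.6],
              [0.3, 0.4, 0.0]])
print(couple(r))                    # coupled probabilities, decreasing in class index
```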


Journal ArticleDOI
TL;DR: A nonlinear method which works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coefficients is developed, and variants of this method based on simple threshold nonlinear estimators are shown to be nearly minimax.
Abstract: We attempt to recover an unknown function from noisy, sampled data. Using orthonormal bases of compactly supported wavelets, we develop a nonlinear method which works in the wavelet domain by simple nonlinear shrinkage of the empirical wavelet coefficients. The shrinkage can be tuned to be nearly minimax over any member of a wide range of Triebel- and Besov-type smoothness constraints and asymptotically minimax over Besov bodies with $p \leq q$. Linear estimates cannot achieve even the minimax rates over Triebel and Besov classes with $p < 2$, so the method can significantly outperform every linear method (e.g., kernel, smoothing spline, sieve) in a minimax sense. Variants of our method based on simple threshold nonlinear estimators are nearly minimax. Our method possesses the interpretation of spatial adaptivity; it reconstructs using a kernel which may vary in shape and bandwidth from point to point, depending on the data. Least favorable distributions for certain of the Triebel and Besov scales generate objects with sparse wavelet transforms. Many real objects have similarly sparse transforms, which suggests that these minimax results are relevant for practical problems. Sequels to this paper, which was first drafted in November 1990, discuss practical implementation, spatial adaptation properties, universal near minimaxity and applications to inverse problems.

1,066 citations
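
A minimal sketch of the shrinkage scheme, assuming the Haar basis (the simplest compactly supported orthonormal wavelet), soft thresholding and the universal threshold $\sigma\sqrt{2\log n}$; the paper's tuning for specific Triebel and Besov scales is more refined, and here the noise level is treated as known.

```python
import numpy as np

def haar_fwd(x):
    details = []
    while len(x) > 1:
        a = (x[0::2] + x[1::2]) / np.sqrt(2)
        d = (x[0::2] - x[1::2]) / np.sqrt(2)
        details.append(d)
        x = a
    return x, details[::-1]           # coarsest approximation, details coarse-to-fine

def haar_inv(a, details):
    for d in details:
        x = np.empty(2 * len(a))
        x[0::2], x[1::2] = (a + d) / np.sqrt(2), (a - d) / np.sqrt(2)
        a = x
    return a

def soft(c, t):
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

rng = np.random.default_rng(0)
n, sigma = 1024, 0.3                  # n must be a power of 2 for this transform
tt = np.linspace(0, 1, n)
y = np.sin(8 * np.pi * tt**2) + sigma * rng.standard_normal(n)
a, details = haar_fwd(y)
thr = sigma * np.sqrt(2 * np.log(n))  # sigma assumed known; estimate it in practice
fhat = haar_inv(a, [soft(d, thr) for d in details])
```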


Journal ArticleDOI
TL;DR: Two arcing algorithms are explored, compared to each other and to bagging, and the definitions of bias and variance for a classifier as components of the test set error are introduced.
Abstract: Recent work has shown that combining multiple versions of unstable classifiers such as trees or neural nets results in reduced test set error. One of the more effective is bagging. Here, modified training sets are formed by resampling from the original training set, classifiers constructed using these training sets and then combined by voting. Freund and Schapire propose an algorithm the basis of which is to adaptively resample and combine (hence the acronym “arcing”) so that the weights in the resampling are increased for those cases most often misclassified and the combining is done by weighted voting. Arcing is more successful than bagging in test set error reduction. We explore two arcing algorithms, compare them to each other and to bagging, and try to understand how arcing works. We introduce the definitions of bias and variance for a classifier as components of the test set error. Unstable classifiers can have low bias on a large range of data sets. Their problem is high variance. Combining multiple versions either through bagging or arcing reduces variance significantly.

998 citations
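
A minimal sketch of adaptive resampling in the style of the arc-x4 algorithm from this line of work: resampling probabilities grow with the fourth power of each case's misclassification count, and the ensemble combines by unweighted voting. scikit-learn trees are assumed as the unstable base classifier, and the depth, round count and seed are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def arc_x4(X, y, n_rounds=50, seed=0):
    """y: nonnegative integer class labels (needed for the voting step below)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    miss = np.zeros(n)                                 # m_i: misclassification counts
    ensemble = []
    for _ in range(n_rounds):
        w = 1 + miss**4                                # arc-x4 resampling weights
        idx = rng.choice(n, size=n, p=w / w.sum())     # adaptive resample
        clf = DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx])
        miss += clf.predict(X) != y
        ensemble.append(clf)
    return ensemble

def vote(ensemble, X):
    preds = np.array([clf.predict(X) for clf in ensemble])     # unweighted voting
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)
```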


Journal ArticleDOI
TL;DR: This work constructs Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic for examples including contingency tables, logistic regression, and spectral analysis of permutation data.
Abstract: We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Examples include contingency tables, logistic regression, and spectral analysis of permutation data. The algorithms involve computations in polynomial rings using Gröbner bases.

724 citations
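
For two-way contingency tables with fixed row and column margins, the moves reduce to simple +1/-1 swaps on 2x2 minors, so no Gröbner basis computation is needed; the sketch below runs that walk (the uniform base chain; a Metropolis correction would target the hypergeometric conditional distribution). All details are illustrative.

```python
import numpy as np

def swap_step(table, rng):
    r, c = table.shape
    i, j = rng.choice(r, 2, replace=False)
    k, l = rng.choice(c, 2, replace=False)
    eps = rng.choice([-1, 1])
    move = np.zeros_like(table)
    move[i, k] = move[j, l] = eps                 # +1/-1 checkerboard on a 2x2 minor
    move[i, l] = move[j, k] = -eps
    new = table + move
    return new if (new >= 0).all() else table     # reject moves leaving the fiber

rng = np.random.default_rng(0)
t = np.array([[3, 1], [2, 4]])
for _ in range(1000):
    t = swap_step(t, rng)
print(t, t.sum(axis=0), t.sum(axis=1))            # row and column margins preserved
```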


Journal ArticleDOI
TL;DR: In this article, the authors derive the asymptotic distributions of $L_1$-estimators of regression parameters under more general conditions on the behavior of the distribution function near 0.
Abstract: It is well known that $L_1$-estimators of regression parameters are asymptotically normal if the distribution function has a positive derivative at 0. In this paper, we derive the asymptotic distributions under more general conditions on the behavior of the distribution function near 0.

468 citations


Journal ArticleDOI
TL;DR: It is shown that under mild conditions the MLE for general hidden Markov models is also asymptotically normal, and the observed information matrix is proved to be a consistent estimator of the Fisher information.
Abstract: Hidden Markov models (HMMs) have during the last decade become a widespread tool for modeling sequences of dependent random variables. Inference for such models is usually based on the maximum-likelihood estimator (MLE), and consistency of the MLE for general HMMs was recently proved by Leroux. In this paper we show that under mild conditions the MLE is also asymptotically normal and prove that the observed information matrix is a consistent estimator of the Fisher information.

332 citations


Journal ArticleDOI
TL;DR: In this article, the authors extend Murphy's consistency and asymptotic normality results for the nonparametric maximum likelihood estimator to the correlated gamma-frailty model, allowing for covariates; their consistency proof is essentially the same as the classical proof for the maximum likelihood estimator.
Abstract: The frailty model is a generalization of Cox's proportional hazard model, where a shared unobserved quantity in the intensity induces a positive correlation among the survival times. Murphy showed consistency and asymptotic normality of the nonparametric maximum likelihood estimator (NPMLE) for the shared gamma-frailty model without covariates. In this paper we extend this result to the correlated gamma-frailty model, and we allow for covariates. We discuss the definition of the nonparametric likelihood function in terms of a classical proof of consistency for the maximum likelihood estimator, which goes back to Wald. Our proof of the consistency for the NPMLE is essentially the same as the classical proof for the maximum likelihood estimator. A new central limit theorem for processes of bounded variation is given. Furthermore, we prove that a consistent estimator for the asymptotic variance of the NPMLE is given by the inverse of a discrete observed information matrix.

279 citations


Journal ArticleDOI
TL;DR: In this paper, the local behavior of regression splines is studied and explicit expressions for the asymptotic pointwise bias and variance of splines are obtained, leading to the construction of approximate confidence intervals and confidence bands for the regression function.
Abstract: In this paper, we study the local behavior of regression splines. In particular, explicit expressions for the asymptotic pointwise bias and variance of regression splines are obtained. In addition, asymptotic normality for regression splines is established, leading to the construction of approximate confidence intervals and confidence bands for the regression function.

279 citations


Journal ArticleDOI
TL;DR: In this article, the authors studied the sample ACVF and ACF of a general stationary sequence under a weak mixing condition and in the case that the marginal distributions are regularly varying.
Abstract: We study the sample ACVF and ACF of a general stationary sequence under a weak mixing condition and in the case that the marginal distributions are regularly varying. This includes linear and bilinear processes with regularly varying noise and ARCH processes, their squares and absolute values. We show that the distributional limits of the sample ACF can be random, provided that the variance of the marginal distribution is infinite and the process is nonlinear. This is in contrast to infinite variance linear processes. If the process has a finite second but infinite fourth moment, then the sample ACF is consistent, with scaling rates that grow at a slower rate than the standard $\sqrt{n}$. Consequently, asymptotic confidence bands are wider than those constructed in the classical theory. We demonstrate the theory in full detail for an ARCH(1) process.

261 citations
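
A minimal sketch of the statistics under study, computed for an ARCH(1) process with a large ARCH parameter so that the marginal distribution has a finite second but infinite fourth moment (the paper's intermediate regime); the parameter values and lag range are our own choices.

```python
import numpy as np

def sample_acf(x, max_lag):
    x = x - x.mean()
    n = len(x)
    acvf = np.array([np.dot(x[: n - h], x[h:]) / n for h in range(max_lag + 1)])
    return acvf / acvf[0]

rng = np.random.default_rng(0)
n, a0, a1 = 10_000, 1.0, 0.9      # a1 = 0.9: finite second, infinite fourth moment
x = np.zeros(n)
for t in range(1, n):             # ARCH(1): x_t = sqrt(a0 + a1 * x_{t-1}^2) * z_t
    x[t] = np.sqrt(a0 + a1 * x[t - 1] ** 2) * rng.standard_normal()
print(sample_acf(x ** 2, 5))      # sample ACF of the squared process
```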


Journal ArticleDOI
TL;DR: In this paper, new probability inequalities involving the ordered components of an $MTP_2$ random vector are derived, which provide an analytical proof of an important conjecture in the field of multiple hypothesis testing.
Abstract: Some new probability inequalities involving the ordered components of an $MTP_2$ random vector are derived, which provide an analytical proof of an important conjecture in the field of multiple hypothesis testing. This conjecture has been mostly validated so far using simulation.

Journal ArticleDOI
TL;DR: In this paper, the covariance operator of a locally stationary process has approximate eigenvectors that are local cosine functions, and an adaptive covariance estimation is calculated by searching first for a "best" locally cosine basis which approximates the covariances by a band or a diagonal matrix.
Abstract: It is shown that the covariance operator of a locally stationary process has approximate eigenvectors that are local cosine functions. We model locally stationary processes with pseudo-differential operators that are time-varying convolutions. An adaptive covariance estimation is calculated by searching first for a “best” local cosine basis which approximates the covariance by a band or a diagonal matrix. The estimation is obtained from regularized versions of the diagonal coefficients in the best basis.

Journal ArticleDOI
TL;DR: In this article, an asymptotic treatment of the Linton and Nielsen estimator of additive regression models is proposed, which is based on weighted marginal integration for local linear fits and has the following advantages: (i) with an appropriate choice of the weight function, the additive components can be efficiently estimated: an additive component can be estimated with the same bias and variance as if the other components were known.
Abstract: Additive regression models have turned out to be a useful statistical tool in analyses of high-dimensional data sets. Recently, an estimator of additive components has been introduced by Linton and Nielsen which is based on marginal integration. The explicit definition of this estimator makes possible a fast computation and allows an asymptotic distribution theory. In this paper an asymptotic treatment of this estimate is offered for several models. A modification of this procedure is introduced. We consider weighted marginal integration for local linear fits and we show that this estimate has the following advantages. (i) With an appropriate choice of the weight function, the additive components can be efficiently estimated: An additive component can be estimated with the same asymptotic bias and variance as if the other components were known. (ii) Application of local linear fits reduces the design related bias.
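
A minimal sketch of marginal integration for one additive component: fit a full two-dimensional smoother, then average it over the observed values of the other coordinate. A Nadaraya-Watson smoother and a uniform weight function stand in for the paper's weighted local linear fits; both are simplifications, and all names and tuning values are our own.

```python
import numpy as np

def nw2(u, v, X, y, h):
    """Two-dimensional Nadaraya-Watson fit at the point (u, v)."""
    w = np.exp(-0.5 * ((u - X[:, 0]) ** 2 + (v - X[:, 1]) ** 2) / h**2)
    return np.dot(w, y) / w.sum()

def marginal_integration(grid, X, y, h=0.2):
    # average the 2-d fit over the empirical distribution of the 2nd coordinate
    return np.array([np.mean([nw2(u, v, X, y, h) for v in X[:, 1]]) for u in grid])

rng = np.random.default_rng(0)
n = 300
X = rng.uniform(-1, 1, (n, 2))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.standard_normal(n)
g1 = marginal_integration(np.linspace(-1, 1, 21), X, y)  # ~ sin(pi u) + constant
```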

Journal ArticleDOI
TL;DR: In this article, the authors propose to replace the residual cusum process by its innovation martingale, which is distribution free under composite null models and may be readily performed.
Abstract: In the context of regression analysis it is known that the residual cusum process may serve as a basis for the construction of various omnibus, smooth and directional goodness-of-fit tests. Since a deeper analysis requires the decomposition of the cusums into their principal components and this is difficult to obtain, we propose to replace this process by its innovation martingale. It turns out that the resulting tests are (asymptotically) distribution free under composite null models and may be readily performed. A simulation study is included which indicates that the distributional approximations already work for small to moderate sample sizes.

Journal ArticleDOI
TL;DR: In this paper, a general theory on rates of convergence of the least-squares projection estimate in multiple regression is developed and applied to the functional ANOVA model, where the regression function is modeled as a specified sum of a constant term, main effects (functions of one variable) and selected interaction terms (functions of two or more variables).
Abstract: A general theory on rates of convergence of the least-squares projection estimate in multiple regression is developed. The theory is applied to the functional ANOVA model, where the multivariate regression function is modeled as a specified sum of a constant term, main effects (functions of one variable) and selected interaction terms (functions of two or more variables). The least-squares projection is onto an approximating space constructed from arbitrary linear spaces of functions and their tensor products respecting the assumed ANOVA structure of the regression function. The linear spaces that serve as building blocks can be any of the ones commonly used in practice: polynomials, trigonometric polynomials, splines, wavelets and finite elements. The rate of convergence result that is obtained reinforces the intuition that low-order ANOVA modeling can achieve dimension reduction and thus overcome the curse of dimensionality. Moreover, the components of the projection estimate in an appropriately defined ANOVA decomposition provide consistent estimates of the corresponding components of the regression function. When the regression function does not satisfy the assumed ANOVA form, the projection estimate converges to its best approximation of that form.

Journal ArticleDOI
TL;DR: It is argued that block thresholding has a number of advantages, including that it produces adaptive estimators which achieve minimax-optimal convergence rates without the logarithmic penalty that is sometimes associated with term-by-term thresholding.
Abstract: Motivated by recently developed threshold rules for wavelet estimators, we suggest threshold methods for general kernel density estimators, including those of classical Rosenblatt–Parzen type. Thresholding makes kernel methods competitive in terms of their adaptivity to a wide variety of aberrations in complex signals. It is argued that term-by-term thresholding does not always produce optimal performance, since individual coefficients cannot be estimated sufficiently accurately for reliable decisions to be made. Therefore, we suggest grouping coefficients into blocks and making simultaneous threshold decisions about all coefficients within a given block. It is argued that block thresholding has a number of advantages, including that it produces adaptive estimators which achieve minimax-optimal convergence rates without the logarithmic penalty that is sometimes associated with term-by-term thresholding. More than this, the convergence rates are achieved over large classes of functions with discontinuities, indeed with a number of discontinuities that diverges polynomially fast with sample size. These results are also established for block thresholded wavelet estimators, which, although they can be interpreted within the kernel framework, are often most conveniently constructed in a slightly different way.
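
A minimal sketch of the block rule on a generic vector of empirical coefficients: each block is kept or killed as a unit, according to whether its mean squared coefficient exceeds a multiple of the noise variance. The block length and threshold constant are illustrative, not the paper's optimal choices.

```python
import numpy as np

def block_threshold(d, sigma, block_len, lam=4.0):
    """d: empirical coefficients = true coefficients + N(0, sigma^2) noise."""
    out = np.zeros_like(d)
    for start in range(0, len(d), block_len):
        blk = d[start:start + block_len]
        if np.mean(blk**2) > lam * sigma**2:     # one simultaneous decision per block
            out[start:start + block_len] = blk   # keep the whole block, else zero it
    return out
```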

Journal ArticleDOI
TL;DR: In this article, a state space model for long-range dependent data is developed; it is shown that, using the Kalman filter, the exact likelihood function can be computed recursively in a finite number of steps, and an approximation to the likelihood function based on the truncated state space equation is considered.
Abstract: This paper develops a state space modeling for long-range dependent data. Although a long-range dependent process has an infinite-dimensional state space representation, it is shown that by using the Kalman filter, the exact likelihood function can be computed recursively in a finite number of steps. Furthermore, an approximation to the likelihood function based on the truncated state space equation is considered. Asymptotic properties of these approximate maximum likelihood estimates are established for a class of long-range dependent models, namely, the fractional autoregressive moving average models. Simulation studies show rapid converging properties of the approximate maximum likelihood approach.
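
A minimal sketch of the truncation idea for the simplest long-memory case, ARFIMA(0, d, 0): approximate the fractional difference operator $(1-B)^d$ by its first $m$ autoregressive coefficients and evaluate a Gaussian likelihood from the resulting prediction errors. The paper's exact Kalman recursion and its treatment of general fractional ARMA models are richer; everything here, including the truncation point, is illustrative.

```python
import numpy as np

def frac_ar_coeffs(d, m):
    """First m+1 AR coefficients of the fractional difference (1 - B)^d."""
    pi = np.empty(m + 1)
    pi[0] = 1.0
    for j in range(1, m + 1):
        pi[j] = pi[j - 1] * (j - 1 - d) / j
    return pi

def truncated_loglik(x, d, sigma2, m=50):
    pi = frac_ar_coeffs(d, m)
    e = np.array([np.dot(pi[: min(t, m) + 1], x[t::-1][: min(t, m) + 1])
                  for t in range(len(x))])        # truncated prediction errors
    n = len(x)
    return -0.5 * (n * np.log(2 * np.pi * sigma2) + np.sum(e**2) / sigma2)
```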

Journal ArticleDOI
TL;DR: The paper studies the construction of confidence values and examines to what extent they approximate frequentist p-values and Bayesian a posteriori probabilities, and derives more accurate confidence levels using both frequentist and objective Bayesian approaches.
Abstract: In the problem of regions, we wish to know which one of a discrete set of possibilities applies to a continuous parameter vector. This problem arises in the following way: we compute a descriptive statistic from a set of data, notice an interesting feature and wish to assign a confidence level to that feature. For example, we compute a density estimate and notice that the estimate is bimodal. What confidence can we assign to bimodality? A natural way to measure confidence is via the bootstrap: we compute our descriptive statistic on a large number of bootstrap data sets and record the proportion of times that the feature appears. This seems like a plausible measure of confidence for the feature. The paper studies the construction of such confidence values and examines to what extent they approximate frequentist $p$-values and Bayesian a posteriori probabilities. We derive more accurate confidence levels using both frequentist and objective Bayesian approaches. The methods are illustrated with a number of examples, including polynomial model selection and estimating the number of modes of a density.
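
A minimal sketch of the naive bootstrap confidence value for the bimodality example: the proportion of resamples whose density estimate has at least two modes. The kernel bandwidth rule and mode-counting grid are our own choices, and the paper's point is precisely that this raw proportion needs correction.

```python
import numpy as np
from scipy.stats import gaussian_kde

def n_modes(sample, grid):
    f = gaussian_kde(sample)(grid)
    return np.sum((f[1:-1] > f[:-2]) & (f[1:-1] > f[2:]))  # interior local maxima

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 60), rng.normal(2, 1, 60)])  # two components
grid = np.linspace(-6, 6, 400)
B = 500
hits = sum(n_modes(rng.choice(x, size=len(x)), grid) >= 2 for _ in range(B))
print("naive bootstrap confidence in bimodality:", hits / B)
```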

Journal ArticleDOI
TL;DR: In this article, a wavelet shrinkage procedure for nonequispaced samples is proposed and shown to be adaptive and near optimal for global and piecewise Hölder classes, with a number of discontinuities that grows polynomially fast with the sample size.
Abstract: Standard wavelet shrinkage procedures for nonparametric regression are restricted to equispaced samples. There, data are transformed into empirical wavelet coefficients and threshold rules are applied to the coefficients. The estimators are obtained via the inverse transform of the denoised wavelet coefficients. In many applications, however, the samples are nonequispaced. It can be shown that these procedures would produce suboptimal estimators if they were applied directly to nonequispaced samples. We propose a wavelet shrinkage procedure for nonequispaced samples. We show that the estimate is adaptive and near optimal. For global estimation, the estimate is within a logarithmic factor of the minimax risk over a wide range of piecewise Hölder classes, indeed with a number of discontinuities that grows polynomially fast with the sample size. For estimating a target function at a point, the estimate is optimally adaptive to unknown degree of smoothness within a constant. In addition, the estimate enjoys a smoothness property: if the target function is the zero function, then with probability tending to 1 the estimate is also the zero function.

Journal ArticleDOI
TL;DR: In this paper, the maximum likelihood estimator (MLE) is investigated for unstable autoregressive moving-average (ARMA) time series with the noise sequence satisfying a generalized autoregressive conditional heteroscedastic (GARCH) process, and it is shown that the MLE satisfying the likelihood equation exists and is consistent.
Abstract: This paper investigates the maximum likelihood estimator (MLE) for unstable autoregressive moving-average (ARMA) time series with the noise sequence satisfying a generalized autoregressive conditional heteroscedastic (GARCH) process. Under some mild conditions, it is shown that the MLE satisfying the likelihood equation exists and is consistent. The limiting distribution of the MLE is derived in a unified manner for all types of characteristic roots on or outside the unit circle and is expressed as a functional of stochastic integrals in terms of Brownian motions. For various types of unit roots, the limiting distribution of the MLE does not depend on the parameters in the moving-average component and hence, when the GARCH innovations reduce to usual white noises with a constant conditional variance, they are the same as those for the least squares estimators (LSE) for unstable autoregressive models given by Chan and Wei (1988). In the presence of the GARCH innovations, the limiting distribution will involve a sequence of independent bivariate Brownian motions with correlated components. These results are different from those already known in the literature and, in this case, the MLE of unit roots will be much more efficient than the ordinary least squares estimation.

Journal ArticleDOI
TL;DR: In this article, the authors considered the problem of nonparametric function estimation in the Gaussian white noise model, where the unknown function belongs to one of the Sobolev classes, with an unknown regularity parameter.
Abstract: The problem of nonparametric function estimation in the Gaussian white noise model is considered. It is assumed that the unknown function belongs to one of the Sobolev classes, with an unknown regularity parameter. Asymptotically exact adaptive estimators of functions are proposed on the scale of Sobolev classes, with respect to pointwise and sup-norm risks. It is shown that, unlike the case of $L_2$-risk, a loss of efficiency under adaptation is inevitable here. Bounds on the value of the loss of efficiency are obtained.

Journal ArticleDOI
TL;DR: In this article, the authors define the sharp change point problem as an extension of earlier problems in change point analysis related to nonparametric regression and give a systematic treatment of the correct rate of convergence for estimating the position of acusp of an arbitrary order.
Abstract: We define the sharp change point problem as an extension of earlier problems in change point analysis related to nonparametric regression. As particular cases, these include estimation of jump points in smooth curves. More generally, we give a systematic treatment of the correct rate of convergence for estimating the position of a “cusp”of an arbitrary order. We propose a test function for the local regularity of a signal that characterizes such a point as a global maximum. In the sample implementation of our method, from observations of the signal at discrete time positions $i/n, i =1 \ldots,n$, we use a wavelet transformation to approximate the position of the change point in the no-noise case. We study the noise effect, in the worst case scenario over a wide class of functions having a unique irregularity of “order $\alpha$” and propose a sequence of estimators which converge at the rate $n_{-1/(1+2\alpha)}$, as $n$ tends to infinity. Finally we analyze the likelihood ration of the problem and show that this is actually the minimaz rate of convertence. Examples of thresholding empirical wavelet coefficients to estimate the position of sharp change points are also presented.
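
A minimal sketch of the wavelet recipe for the simplest case, a jump (a cusp of order zero): slide a Haar window across the samples and take the location of the largest absolute wavelet coefficient. The scale, signal and noise level are illustrative choices of ours.

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 1024, 0.37
t = np.arange(n) / n
y = np.sin(2 * np.pi * t) + 1.0 * (t >= tau) + 0.2 * rng.standard_normal(n)

s = 32                                         # scale: half-width of the Haar window
w = np.concatenate([-np.ones(s), np.ones(s)]) / np.sqrt(2 * s)
W = np.convolve(y, w[::-1], mode="valid")      # Haar coefficients at every shift
tau_hat = (np.argmax(np.abs(W)) + s) / n       # center of the maximizing window
print(tau_hat)                                 # close to the true jump at 0.37
```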

Journal ArticleDOI
TL;DR: In this paper, it was shown that the posterior distribution in a possibly incorrect parametric model concentrates in a strong sense on the set of pseudotrue parameters determined by the true distribution.
Abstract: We prove that the posterior distribution in a possibly incorrect parametric model a.s. concentrates in a strong sense on the set of pseudotrue parameters determined by the true distribution. As a consequence, we obtain in the case of a unique pseudotrue parameter the strong consistency of pseudo-Bayes estimators w.r.t. general loss functions. Further, we present a simple example based on normal distributions and having two different pseudotrue parameters, where pseudo-Bayes estimators have an essentially different asymptotic behavior than the pseudomaximum likelihood estimator. While the MLE is strongly consistent, the sequence of posterior means is strongly inconsistent and a.s. almost all its accumulation points are not pseudotrue. Finally, we give conditions under which a pseudo-Bayes estimator for a unique pseudotrue parameter has an asymptotic normal distribution.

Journal ArticleDOI
TL;DR: In this article, the authors propose a method of adaptive estimation of a regression function which is near optimal in the classical sense of the mean integrated error, and no assumptions are imposed on the design, number and size of jumps.
Abstract: We propose a method of adaptive estimation of a regression function which is near optimal in the classical sense of the mean integrated error. At the same time, the estimator is shown to be very sensitive to discontinuities or change-points of the underlying function $f$ or its derivatives. For instance, in the case of a jump of a regression function, beyond the intervals of length (in order) $n^{-1} \log n$ around change-points the quality of estimation is essentially the same as if locations of jumps were known. The method is fully adaptive and no assumptions are imposed on the design, number and size of jumps. The results are formulated in a nonasymptotic way and can therefore be applied for an arbitrary sample size.

Journal ArticleDOI
TL;DR: In this article, a nonparametric Bayes factor for testing the fit of a parametric model is proposed, which is based on a Gaussian process prior and an asymptotic consistency requirement.
Abstract: We develop a nonparametric Bayes factor for testing the fit of a parametric model. We begin with a nominal parametric family which we then embed into an infinite-dimensional exponential family. The new model then has a parametric and nonparametric component. We give the log density of the nonparametric component a Gaussian process prior. An asymptotic consistency requirement puts a restriction on the form of the prior, leaving us with a single hyperparameter for which we suggest a default value based on simulation experience. Then we construct a Bayes factor to test the nominal model versus the semiparametric alternative. Finally, we show that the Bayes factor is consistent. The proof of the consistency is based on approximating the model by a sequence of exponential families.

Journal ArticleDOI
TL;DR: In this article, a large convex class of linear estimators of the unknown signal plus white noise is chosen to minimize the estimated quadratic risk, which is done after orthogonal transformation of the data to a reasonable coordinate system.
Abstract: An unknown signal plus white noise is observed at $n$ discrete time points. Within a large convex class of linear estimators of $\xi$, we choose the estimator $\hat{\xi}$ that minimizes estimated quadratic risk. By construction, $\hat{\xi}$ is nonlinear. This estimation is done after orthogonal transformation of the data to a reasonable coordinate system. The procedure adaptively tapers the coefficients of the transformed data. If the class of candidate estimators satisfies a uniform entropy condition, then $\hat{\xi}$ is asymptotically minimax in Pinsker’s sense over certain ellipsoids in the parameter space and shares one such asymptotic minimax property with the James–Stein estimator. We describe computational algorithms for $\hat{\xi}$ and construct confidence sets for the unknown signal. These confidence sets are centered at $\hat{\xi}$, have correct asymptotic coverage probability and have relatively small risk as set-valued estimators of $\xi$.
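
A minimal sketch of shrinkage chosen by minimizing an estimated quadratic risk in an orthogonal coordinate system, with one James-Stein-type taper per block standing in for the paper's larger convex class of candidate estimators; the block length is an illustrative choice.

```python
import numpy as np

def blockwise_shrink(z, sigma, block_len):
    """z: orthogonally transformed data = true coefficients + N(0, sigma^2) noise."""
    out = np.empty_like(z)
    for a in range(0, len(z), block_len):
        blk = z[a:a + block_len]
        energy = max(np.sum(blk**2), 1e-12)
        c = max(0.0, 1.0 - len(blk) * sigma**2 / energy)  # minimizes estimated risk
        out[a:a + block_len] = c * blk                    # taper the block
    return out
```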

Journal ArticleDOI
TL;DR: In this paper, a sieve bootstrap procedure for time series with a deterministic trend is proposed, which is based on nonparametric trend estimation and autoregressive approximation for some noise process.
Abstract: We propose a sieve bootstrap procedure for time series with a deterministic trend. The sieve for constructing the bootstrap is based on nonparametric trend estimation and autoregressive approximation for some noise process. The bootstrap scheme itself does i.i.d. resampling of estimated innovations from fitted autoregressive models. We show the validity and indicate second-order correctness of such sieve bootstrap approximations for the limiting distribution of nonparametric linear smoothers. The resampling can then be used to construct nonparametric confidence intervals for the underlying trend. In particular, we show asymptotic validity for constructing confidence bands which are simultaneous within a neighborhood whose size is of the order of the smoothing bandwidth. Our resampling procedure yields satisfactory results in a simulation study for finite sample sizes. We also apply it to the longest series of total ozone measurements from Arosa (Switzerland) and find a significant decreasing trend.
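
A minimal sketch of the scheme: estimate the trend nonparametrically, fit a fixed-order autoregression to the detrended series by Yule-Walker, and rebuild bootstrap series from i.i.d. resampled, centered innovations. The Gaussian kernel smoother, bandwidth and AR order are our own choices; in the paper the AR order grows with the sample size.

```python
import numpy as np
from scipy.linalg import toeplitz, solve

def kernel_smooth(t, y, h):
    w = np.exp(-0.5 * ((t[:, None] - t[None, :]) / h) ** 2)
    return (w * y[None, :]).sum(axis=1) / w.sum(axis=1)

def fit_ar(x, p):
    """Yule-Walker AR(p) fit; returns coefficients and centered innovations."""
    x = x - x.mean()
    r = np.array([np.dot(x[: len(x) - k], x[k:]) / len(x) for k in range(p + 1)])
    phi = solve(toeplitz(r[:p]), r[1 : p + 1])
    resid = x[p:] - sum(phi[k] * x[p - 1 - k : len(x) - 1 - k] for k in range(p))
    return phi, resid - resid.mean()

def sieve_bootstrap(t, y, h=0.05, p=4, seed=0):
    rng = np.random.default_rng(seed)
    trend = kernel_smooth(t, y, h)
    phi, eps = fit_ar(y - trend, p)
    draws = rng.choice(eps, size=len(y))          # i.i.d. resampled innovations
    noise = np.zeros(len(y))
    for i in range(len(y)):
        past = sum(phi[k] * noise[i - 1 - k] for k in range(p) if i - 1 - k >= 0)
        noise[i] = past + draws[i]
    return trend + noise                          # one bootstrap series
```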

Journal ArticleDOI
TL;DR: In this paper, a new test is proposed for the comparison of two regression curves $f$ and $g$; an asymptotic normal law under fixed alternatives can be applied for power calculations, for constructing confidence regions and for testing precise hypotheses about a weighted $L_2$ distance between the regression curves.
Abstract: A new test is proposed for the comparison of two regression curves $f$ and $g$. We prove an asymptotic normal law under fixed alternatives which can be applied for power calculations, for constructing confidence regions and for testing precise hypotheses of a weighted $L_2$ distance between $f$ and $g$. In particular, the problem of nonequal sample sizes is treated, which is related to a peculiar formula for the area between two step functions. These results are extended in various directions, such as the comparison of $k$ regression functions or the optimal allocation of the sample sizes when the total sample size is fixed. The proposed pivot statistic is not based on a nonparametric estimator of the regression curves and therefore does not require the specification of any smoothing parameter.
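
A minimal sketch of a smoothing-free estimate of the squared distance between two curves observed on a common design: multiplying neighboring paired differences cancels the noise in expectation, so no bandwidth is required. This illustrates the flavor of the paper's pivot statistic rather than reproducing its exact form or the unequal-sample-size case.

```python
import numpy as np

def l2_distance_estimate(y, z):
    d = y - z                          # paired differences on the common design
    return np.mean(d[:-1] * d[1:])     # E[d_i d_{i+1}] ~ (f - g)^2; noise cancels

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 500)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(500)
z = np.sin(2 * np.pi * x) + 0.2 * x + 0.3 * rng.standard_normal(500)
print(l2_distance_estimate(y, z))      # ~ integral of (0.2 x)^2 dx = 0.04 / 3
```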

Journal ArticleDOI
TL;DR: In this article, the maximum likelihood prior density, if it exists, is defined as the density for which the corresponding Bayes estimate is asymptotically negligibly different from the original maximum likelihood estimate.
Abstract: Consider an estimate $\theta^*$ of a parameter $\theta$ based on repeated observations from a family of densities $f_\theta$ evaluated by the Kullback–Leibler loss function $K(\theta, \theta^*) = \int \log(f_\theta/f_{\theta^*})f_\theta$. The maximum likelihood prior density, if it exists, is the density for which the corresponding Bayes estimate is asymptotically negligibly different from the maximum likelihood estimate. The Bayes estimate corresponding to the maximum likelihood prior is identical to maximum likelihood for exponential families of densities. In predicting the next observation, the maximum likelihood prior produces a predictive distribution that is asymptotically at least as close, in expected truncated Kullback–Leibler distance, to the true density as the density indexed by the maximum likelihood estimate. It frequently happens in more than one dimension that maximum likelihood corresponds to no prior density, and in that case the maximum likelihood estimate is asymptotically inadmissible and may be improved upon by using the estimate corresponding to a least favorable prior. As in Brown, the asymptotic risk for an arbitrary estimate “near” maximum likelihood is given by an expression involving derivatives of the estimator and of the information matrix. Admissibility questions for these “near ML” estimates are determined by the existence of solutions to certain differential equations.

Journal ArticleDOI
TL;DR: In this paper, a strong approximation of a local polynomial estimator (LPE) in nonparametric autoregression by an LPE in a corresponding nonparametric regression model is derived.
Abstract: We derive a strong approximation of a local polynomial estimator (LPE) in nonparametric autoregression by an LPE in a corresponding nonparametric regression model. This generally suggests the application of regression-typical tools for statistical inference in nonparametric autoregressive models. It provides an important simplification for the bootstrap method to be used: it is enough to mimic the structure of a nonparametric regression model rather than to imitate the more complicated process structure in the autoregressive case. As an example we consider a simple wild bootstrap, which is used for the construction of simultaneous confidence bands and nonparametric supremum-type tests.
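
A minimal sketch of the wild bootstrap step in the regression setting that the strong approximation licenses: each centered residual is multiplied by an independent two-point variable with mean 0, variance 1 and third moment 1 (Mammen's distribution, a standard choice and not specific to this paper) before refitting.

```python
import numpy as np

def wild_bootstrap_sample(fitted, resid, rng):
    """One wild bootstrap response vector: fitted values plus perturbed residuals."""
    a, b = (1 - np.sqrt(5)) / 2, (1 + np.sqrt(5)) / 2   # Mammen's two-point law
    p = (np.sqrt(5) + 1) / (2 * np.sqrt(5))             # P(V = a); gives mean 0,
    v = np.where(rng.random(len(resid)) < p, a, b)      # variance 1, third moment 1
    return fitted + resid * v

# Usage: with fitted values and residuals from any pilot smoother,
# y_star = wild_bootstrap_sample(fitted, resid, np.random.default_rng(0));
# refit the smoother to y_star and repeat to approximate its sampling law.
```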