Showing papers in "Annals of Statistics in 2000"


Journal ArticleDOI
TL;DR: This work shows that this seemingly mysterious phenomenon of boosting can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood, and develops more direct approximations and shows that they exhibit nearly identical results to boosting.
Abstract: Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced. For many classification algorithms, this simple strategy results in dramatic improvements in performance. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multiclass generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multiclass generalizations of boosting in most situations, and far superior in some. We suggest a minor modification to boosting that can reduce computation, often by factors of 10 to 50. Finally, we apply these insights to produce an alternative formulation of boosting decision trees. This approach, based on best-first truncated tree induction, often leads to better performance, and can provide interpretable descriptions of the aggregate decision rule. It is also much faster computationally, making it more suitable to large-scale data mining applications.
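
For readers who want to see the reweight-and-vote scheme concretely, here is a minimal sketch of discrete AdaBoost with decision stumps as the base classifier; the stump search and the vote weights are the textbook formulation, not code from the paper.

```python
import numpy as np

def fit_stump(X, y, w):
    """Weighted decision stump: threshold one feature, predict +/-1."""
    best = (np.inf, 0, 0.0, 1)                      # (weighted error, feature, threshold, sign)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, t, s)
    return best

def adaboost(X, y, n_rounds=20):
    """Sequentially reweight the data and collect a weighted vote of stumps (y in {-1, +1})."""
    n = len(y)
    w = np.full(n, 1.0 / n)                         # uniform starting weights
    ensemble = []
    for _ in range(n_rounds):
        err, j, t, s = fit_stump(X, y, w)
        err = np.clip(err, 1e-12, 1 - 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)       # vote weight of this classifier
        pred = np.where(X[:, j] <= t, s, -s)
        w *= np.exp(-alpha * y * pred)              # up-weight misclassified points
        w /= w.sum()
        ensemble.append((alpha, j, t, s))
    return ensemble

def predict(ensemble, X):
    votes = sum(a * np.where(X[:, j] <= t, s, -s) for a, j, t, s in ensemble)
    return np.sign(votes)                           # weighted majority vote
```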

6,598 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider the asymptotic behavior of regression estimators that minimize the residual sum of squares plus a penalty proportional to $\sum|\beta_j|^{\gamma}$ for some $\gamma > 0$, and show that the limiting distributions can have positive probability mass at 0 when the true parameter value is 0.
Abstract: We consider the asymptotic behavior of regression estimators that minimize the residual sum of squares plus a penalty proportional to $\sum|\beta_j|^{\gamma}$ for some $\gamma > 0$. These estimators include the Lasso as a special case when $\gamma = 1$. Under appropriate conditions, we show that the limiting distributions can have positive probability mass at 0 when the true value of the parameter is 0. We also consider asymptotics for “nearly singular” designs.
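
The penalized criterion is easy to state concretely. The sketch below, on simulated data, minimizes the residual sum of squares plus $\lambda\sum|\beta_j|^{\gamma}$ numerically; the data, $\lambda$ and the use of Nelder-Mead are illustrative choices, not the paper's procedure.

```python
import numpy as np
from scipy.optimize import minimize

def bridge_fit(X, y, lam=1.0, gamma=1.0):
    """Minimize ||y - X b||^2 + lam * sum(|b_j|^gamma); gamma = 1 is the lasso."""
    def objective(b):
        resid = y - X @ b
        return resid @ resid + lam * np.sum(np.abs(b) ** gamma)
    b0 = np.zeros(X.shape[1])
    # Nelder-Mead avoids needing a gradient at b_j = 0, where the penalty has a kink.
    return minimize(objective, b0, method="Nelder-Mead").x

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([2.0, 0.0, -1.0])              # one coefficient is exactly zero
y = X @ beta_true + rng.normal(size=100)
print(bridge_fit(X, y, lam=5.0, gamma=1.0))         # lasso-type fit shrinks the null coefficient
```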

1,427 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of estimating $\|s\|^2$ when $s$ belongs to a separable Hilbert space, using model selection via a penalized least squares criterion, and show that the resulting estimators are adaptive over collections of hyperrectangles, ellipsoids, $l_p$-bodies or Besov bodies.
Abstract: We consider the problem of estimating $\|s\|^2$ when $s$ belongs to some separable Hilbert space and one observes the Gaussian process $Y(t) = \langle s, t\rangle + \sigma L(t)$, for all $t \in \mathbb{H}$, where $L$ is some Gaussian isonormal process. This framework allows us in particular to consider the classical “Gaussian sequence model” for which $\mathbb{H} = l_2(\mathbb{N}^*)$ and $L(t) = \sum_{\lambda\geq1}t_{\lambda}\varepsilon_{\lambda}$, where $(\varepsilon_{\lambda})_{\lambda\geq1}$ is a sequence of i.i.d. standard normal variables. Our approach consists in considering some at most countable families of finite-dimensional linear subspaces of $\mathbb{H}$ (the models) and then using model selection via some conveniently penalized least squares criterion to build new estimators of $\|s\|^2$. We prove a general nonasymptotic risk bound which allows us to show that such penalized estimators are adaptive on a variety of collections of sets for the parameter $s$, depending on the family of models from which they are built. In particular, in the context of the Gaussian sequence model, a convenient choice of the family of models allows defining estimators which are adaptive over collections of hyperrectangles, ellipsoids, $l_p$-bodies or Besov bodies. We take special care to describe the conditions under which the penalized estimator is efficient when the level of noise $\sigma$ tends to zero. Our construction is an alternative to the one by Efroimovich and Low for hyperrectangles and provides new results otherwise.

1,336 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider the asymptotic behavior of posterior distributions and Bayes estimators for infinite-dimensional statistical models and give general results on the rate of convergence of the posterior measure.
Abstract: We consider the asymptotic behavior of posterior distributions and Bayes estimators for infinite-dimensional statistical models. We give general results on the rate of convergence of the posterior measure. These are applied to several examples, including priors on finite sieves, log-spline models, Dirichlet processes and interval censoring.

865 citations


Journal ArticleDOI
TL;DR: In this article, the authors introduce several general structures for depth functions, classify many existing examples as special cases, and establish results on the possession, or lack thereof, of four key properties desirable for depth functions in general: affine invariance, maximality at center, monotonicity relative to deepest point, and vanishing at infinity.
Abstract: Statistical depth functions are being formulated ad hoc with increasing popularity in nonparametric inference for multivariate data. Here we introduce several general structures for depth functions, classify many existing examples as special cases, and establish results on the possession, or lack thereof, of four key properties desirable for depth functions in general. Roughly speaking, these properties may be described as: affine invariance, maximality at center, monotonicity relative to deepest point, and vanishing at infinity. This provides a more systematic basis for selection of a depth function. In particular, from these and other considerations it is found that the halfspace depth behaves very well overall in comparison with various competitors.
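
Since the halfspace depth is singled out, a brute-force sketch may help fix ideas: the Tukey halfspace depth of a point is approximated by scanning directions in the plane; the grid of directions and the test data below are illustrative, not from the paper.

```python
import numpy as np

def halfspace_depth(point, data, n_directions=360):
    """Approximate Tukey halfspace depth: the smallest fraction of data points
    in any closed halfplane whose boundary passes through `point`."""
    angles = np.linspace(0, np.pi, n_directions, endpoint=False)
    depth = 1.0
    for a in angles:
        u = np.array([np.cos(a), np.sin(a)])         # unit direction
        proj = (data - point) @ u
        frac = min(np.mean(proj >= 0), np.mean(proj <= 0))
        depth = min(depth, frac)
    return depth

rng = np.random.default_rng(1)
data = rng.normal(size=(500, 2))
print(halfspace_depth(np.array([0.0, 0.0]), data))   # near 0.5 at the center
print(halfspace_depth(np.array([3.0, 3.0]), data))   # near 0 far from the data
```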

858 citations


Journal ArticleDOI
TL;DR: An alternative MCMC method for Bayesian analysis of data from a finite mixture with an unknown number of components is described, which views the model parameters as a (marked) point process and constructs a Markov birth-death process with an appropriate stationary distribution, in contrast to Richardson and Green's reversible jump approach.
Abstract: Richardson and Green present a method of performing a Bayesian analysis of data from a finite mixture distribution with an unknown number of components. Their method is a Markov Chain Monte Carlo (MCMC) approach, which makes use of the “reversible jump” methodology described by Green. We describe an alternative MCMC method which views the parameters of the model as a (marked) point process, extending methods suggested by Ripley to create a Markov birth-death process with an appropriate stationary distribution. Our method is easy to implement, even in the case of data in more than one dimension, and we illustrate it on both univariate and bivariate data. There appears to be considerable potential for applying these ideas to other contexts, as an alternative to more general reversible jump methods, and we conclude with a brief discussion of how this might be achieved.

583 citations


Journal ArticleDOI
TL;DR: In this article, the asymptotic theory for the sample autocorrelations and extremes of a GARCH(I, 1) process is provided, and special attention is given to the case when the sum of the ARCH and GARCH parameters is close to 1.
Abstract: The asymptotic theory for the sample autocorrelations and extremes of a GARCH(I, 1) process is provided. Special attention is given to the case when the sum of the ARCH and GARCH parameters is close to 1, that is, when one is close to an infinite Variance marginal distribution. This situation has been observed for various financial log-return series and led to the introduction of the IGARCH model. In such a situation, the sample autocorrelations are unreliable estimators of their deterministic counterparts for the time series and its absolute values, and the sample autocorrelations of the squared time series have nondegenerate limit distributions. We discuss the consequences for a foreign exchange rate series.
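
To make the setting concrete, the following sketch simulates a GARCH(1,1) series with the ARCH and GARCH parameters summing to 0.99 and computes sample autocorrelations of the squared series; the parameter values and sample size are illustrative only.

```python
import numpy as np

def simulate_garch11(n, omega=0.1, alpha=0.3, beta=0.65, seed=0):
    """X_t = sigma_t * Z_t,  sigma_t^2 = omega + alpha * X_{t-1}^2 + beta * sigma_{t-1}^2."""
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    sigma2 = omega / (1 - alpha - beta)              # start at the stationary variance
    for t in range(n):
        x[t] = np.sqrt(sigma2) * rng.standard_normal()
        sigma2 = omega + alpha * x[t] ** 2 + beta * sigma2
    return x

def sample_acf(x, max_lag=10):
    x = x - x.mean()
    denom = np.sum(x ** 2)
    return np.array([np.sum(x[:len(x) - h] * x[h:]) / denom for h in range(1, max_lag + 1)])

x = simulate_garch11(5000, alpha=0.3, beta=0.69)     # alpha + beta = 0.99, near the IGARCH boundary
print(sample_acf(x ** 2, max_lag=5))                 # autocorrelations of the squared series
```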

426 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that an alternative method of plotting Hill estimator values is more revealing than the standard method unless the underlying data comes from a Pareto distribution.
Abstract: An abundance of high quality data sets requiring heavy tailed models necessitates reliable methods of estimating the shape parameter governing the degree of tail heaviness. The Hill estimator is a popular method for doing this but its practical use is encumbered by several difficulties. We show that an alternative method of plotting Hill estimator values is more revealing than the standard method unless the underlying data comes from a Pareto distribution.
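
The Hill estimator itself is a one-line formula. The sketch below computes $H_{k,n}$ over a range of $k$ (the values one would plot) for simulated Pareto data; the simulation and the grid of $k$ values are illustrative, and the paper's alternative plotting method is not reproduced here.

```python
import numpy as np

def hill_estimates(x, k_values):
    """Hill estimator of the tail index 1/alpha from the k largest observations:
    H_{k,n} = (1/k) * sum_{i=1..k} log( X_{(n-i+1)} / X_{(n-k)} )."""
    xs = np.sort(x)
    n = len(xs)
    out = []
    for k in k_values:
        top = xs[n - k:]                              # k largest order statistics
        out.append(np.mean(np.log(top)) - np.log(xs[n - k - 1]))
    return np.array(out)

rng = np.random.default_rng(2)
alpha = 2.0
x = rng.pareto(alpha, size=5000) + 1.0               # classical Pareto with tail index alpha
ks = np.arange(10, 1000, 10)
h = hill_estimates(x, ks)
print(h[:5], 1.0 / alpha)                            # estimates should hover near 1/alpha = 0.5
```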

253 citations


Journal ArticleDOI
TL;DR: In this paper, the authors study the scale space surface from a statistical viewpoint; their analysis provides new insights into nonparametric smoothing procedures and yields useful techniques for statistical exploration of features in the data.
Abstract: Scale space theory from computer vision leads to an interesting and novel approach to nonparametric curve estimation. The family of smooth curve estimates indexed by the smoothing parameter can be represented as a surface called the scale space surface. The smoothing parameter here plays the same role as that played by the scale of resolution in a visual system. In this paper, we study in detail various features of that surface from a statistical viewpoint. Weak convergence of the empirical scale space surface to its theoretical counterpart and some related asymptotic results have been established under appropriate regularity conditions. Our theoretical analysis provides new insights into nonparametric smoothing procedures and yields useful techniques for statistical exploration of features in the data. In particular, we have used the scale space approach for the development of an effective exploratory data analytic tool called SiZer.
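
A minimal sketch of the scale space surface: the same data smoothed by a Nadaraya-Watson estimator over a grid of bandwidths, with one row per scale. The kernel, bandwidth grid and test signal are illustrative choices, not the paper's SiZer implementation.

```python
import numpy as np

def kernel_smooth(x, y, grid, h):
    """Nadaraya-Watson estimate with a Gaussian kernel and bandwidth h."""
    w = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2)
    return (w @ y) / w.sum(axis=1)

def scale_space_surface(x, y, grid, bandwidths):
    """Rows index bandwidths (scales), columns index locations on the grid."""
    return np.vstack([kernel_smooth(x, y, grid, h) for h in bandwidths])

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(4 * np.pi * x) + rng.normal(scale=0.3, size=300)
grid = np.linspace(0, 1, 101)
surface = scale_space_surface(x, y, grid, bandwidths=np.logspace(-2, -0.3, 20))
print(surface.shape)                                 # (20 bandwidths, 101 grid points)
```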

247 citations


Journal ArticleDOI
TL;DR: In this paper, a class of estimators based on local polynomial regression is proposed; like generalized regression estimators, these estimators are weighted linear combinations of study variables in which the weights are calibrated to known control totals, but the assumptions on the superpopulation model are considerably weaker.
Abstract: Estimation of finite population totals in the presence of auxiliary information is considered. A class of estimators based on local polynomial regression is proposed. Like generalized regression estimators, these estimators are weighted linear combinations of study variables, in which the weights are calibrated to known control totals, but the assumptions on the superpopulation model are considerably weaker. The estimators are shown to be asymptotically design-unbiased and consistent under mild assumptions. A variance approximation based on Taylor linearization is suggested and shown to be consistent for the design mean squared error of the estimators. The estimators are robust in the sense of asymptotically attaining the Godambe–Joshi lower bound to the anticipated variance. Simulation experiments indicate that the estimators are more efficient than regression estimators when the model regression function is incorrectly specified, while being approximately as efficient when the parametric specification is correct.

Journal ArticleDOI
TL;DR: In this article, a new approximation to the Gaussian likelihood of a multivariate locally stationary process is introduced, based on an approximation of the inverse of the covariance matrix of such processes.
Abstract: A new approximation to the Gaussian likelihood of a multivariate locally stationary process is introduced. It is based on an approximation of the inverse of the covariance matrix of such processes. The new quasi-likelihood is a generalisation of the classical Whittle-likelihood for stationary processes. For parametric models asymptotic normality and efficiency of the resulting estimator are proved. Since the likelihood has a special local structure it can be used for nonparametric inference as well. This is briefly sketched for different estimates.

Journal ArticleDOI
TL;DR: This paper addresses the following aggregation problem: given $M$ functions $f_1, \dots, f_M$, find an aggregated estimator which approximates the best convex combination $f^*$ of them; algorithms are proposed that achieve expected $L_2$ accuracy $O(N^{-1/4}\ln^{1/4} M)$, and it is shown that this approximation rate cannot be significantly improved.
Abstract: We consider the problem of estimating an unknown function $f$ from $N$ noisy observations on a random grid. In this paper we address the following aggregation problem: given $M$ functions $f_1,\dots, f_M$, find an “aggregated” estimator which approximates $f$ nearly as well as the best convex combination $f^*$ of $f_1,\dots, f_M$. We propose algorithms which provide approximations of $f^*$ with expected $L_2$ accuracy $O(N^{-1/4}\ln^{1/4} M)$. We show that this approximation rate cannot be significantly improved. We discuss two specific applications: nonparametric prediction for a dynamic system with output nonlinearity and reconstruction in the Jones–Barron class.

Journal ArticleDOI
TL;DR: The authors show that the BIC estimator of the order of a Markov chain is consistent without any prior bound on the order, using a strong ratio-typicality result for Markov sample paths, and that the Bayesian (minimum description length) estimator, of which BIC is regarded as an approximation, fails to be consistent for the uniformly distributed i.i.d. process.
Abstract: The Bayesian Information Criterion (BIC) estimates the order of a Markov chain (with finite alphabet $A$) from observation of a sample path $x_1, x_2,\dots, x_n$, as that value $k = \hat{k}$ that minimizes the sum of the negative logarithm of the $k$th order maximum likelihood and the penalty term $\frac{|A|^k(|A|-1)}{2}\log n$. We show that $\hat{k}$ equals the correct order of the chain, eventually almost surely as $n \rightarrow \infty$, thereby strengthening earlier consistency results that assumed an a priori bound on the order. A key tool is a strong ratio-typicality result for Markov sample paths. We also show that the Bayesian estimator or minimum description length estimator, of which the BIC estimator is regarded as an approximation, fails to be consistent for the uniformly distributed i.i.d. process.
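
The criterion can be written down directly. The sketch below computes, for a simulated binary chain, the negative $k$th order log maximum likelihood plus the penalty $\frac{|A|^k(|A|-1)}{2}\log n$ and minimizes over $k$; the simulated chain and the cap on $k$ are illustrative.

```python
import numpy as np
from collections import Counter

def neg_log_ml(x, k):
    """Negative log maximum likelihood of a k-th order Markov chain,
    conditioning on the first k symbols of the sequence x."""
    counts = Counter(tuple(x[i:i + k + 1]) for i in range(len(x) - k))
    ctx_totals = Counter()
    for ctx_sym, c in counts.items():
        ctx_totals[ctx_sym[:k]] += c
    return -sum(c * np.log(c / ctx_totals[ctx_sym[:k]]) for ctx_sym, c in counts.items())

def bic_order(x, alphabet_size, max_k=5):
    """Minimize  -log ML_k  +  (|A|^k (|A|-1) / 2) * log n  over k."""
    n, A = len(x), alphabet_size
    scores = [neg_log_ml(x, k) + (A ** k) * (A - 1) / 2 * np.log(n) for k in range(max_k + 1)]
    return int(np.argmin(scores))

rng = np.random.default_rng(4)
x = [0]
for _ in range(5000):                                # simulate a first-order binary chain
    p1 = 0.8 if x[-1] == 0 else 0.2                  # P(next symbol = 1 | current symbol)
    x.append(int(rng.random() < p1))
print(bic_order(np.array(x), alphabet_size=2))       # expected to recover order 1
```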

Journal ArticleDOI
TL;DR: In this paper, two estimators of the mean function of a counting process based on "panel count data" are studied, and the authors show that the estimator proposed by Sun and Kalbfleisch can be viewed as a pseudo-maximum likelihood estimator when a nonhomogeneous Poisson process model is assumed for the counting process.
Abstract: We study two estimators of the mean function of a counting process based on “panel count data.” The setting for “panel count data” is one in which $n$ independent subjects, each with a counting process with common mean function, are observed at several possibly different times during a study. Following a model proposed by Schick and Yu, we allow the number of observation times, and the observation times themselves, to be random variables. Our goal is to estimate the mean function of the counting process. We show that the estimator of the mean function proposed by Sun and Kalbfleisch can be viewed as a pseudo-maximum likelihood estimator when a nonhomogeneous Poisson process model is assumed for the counting process. We establish consistency of both the nonparametric pseudo-maximum likelihood estimator of Sun and Kalbfleisch and the full maximum likelihood estimator, even if the underlying counting process is not a Poisson process. We also derive the asymptotic distribution of both estimators at a fixed time $t$, and compare the resulting theoretical relative efficiency with finite sample relative efficiency by way of a limited Monte Carlo study.

Journal ArticleDOI
TL;DR: In this paper, it is argued that the divergence of the maximum likelihood estimator provides a useful measure of the effective dimension of the model, and inequalities are derived for the expected mean squared error of the maximum likelihood estimator and the expected residual sum of squares, generalizing equalities from the case of linear regression.
Abstract: For the problem of estimating a regression function, $\mu$ say, subject to shape constraints, like monotonicity or convexity, it is argued that the divergence of the maximum likelihood estimator provides a useful measure of the effective dimension of the model. Inequalities are derived for the expected mean squared error of the maximum likelihood estimator and the expected residual sum of squares. These generalize equalities from the case of linear regression. As an application, it is shown that the maximum likelihood estimator of the error variance $\sigma^2$ is asymptotically normal with mean $\sigma^2$ and variance $2\sigma^4/n$. For monotone regression, it is shown that the maximum likelihood estimator of $\mu$ attains the optimal rate of convergence, and a bias correction to the maximum likelihood estimator of $\sigma^2$ is derived.
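
For monotone regression the maximum likelihood estimator under Gaussian errors is the least-squares isotonic fit, computable by pool-adjacent-violators; the short sketch below is a generic PAVA implementation on simulated data, not the paper's code, and the residual-based variance estimate is the uncorrected MLE of $\sigma^2$.

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least-squares nondecreasing fit to y,
    which is the maximum likelihood estimator under Gaussian errors."""
    blocks = []                                      # each block: [mean, number of pooled points]
    for v in map(float, y):
        blocks.append([v, 1.0])
        # Merge backwards while the monotonicity constraint is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    return np.concatenate([np.full(int(w), m) for m, w in blocks])

rng = np.random.default_rng(5)
y = np.linspace(0, 1, 50) ** 2 + rng.normal(scale=0.1, size=50)   # increasing mean plus noise
fit = pava(y)
sigma2_hat = np.mean((y - fit) ** 2)                 # uncorrected MLE of the error variance
print(fit[:5], sigma2_hat)
```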

Journal ArticleDOI
TL;DR: In this article, structural properties of the regions enclosed by contours, such as affine equivariance, nestedness, connectedness and compactness, and almost sure convergence results for sample depth contours are established.
Abstract: Statistical depth functions have become increasingly used in nonparametric inference for multivariate data. Here the contours of such functions are studied. Structural properties of the regions enclosed by contours, such as affine equivariance, nestedness, connectedness and compactness, and almost sure convergence results for sample depth contours, are established. Also, specialized results are established for some popular depth functions, including halfspace depth, and for the case of elliptical distributions. Finally, some needed foundational results on almost sure convergence of sample depth functions are provided.

Journal ArticleDOI
TL;DR: In this article, it is shown that without knowing which strategy works best for the underlying density, a single strategy can be constructed by mixing the proposed ones to be adaptive in terms of statistical risks.
Abstract: General results on adaptive density estimation are obtained with respect to any countable collection of estimation strategies under Kullback-Leibler and squared $L_2$ losses. It is shown that without knowing which strategy works best for the underlying density, a single strategy can be constructed by mixing the proposed ones to be adaptive in terms of statistical risks. A consequence is that under some mild conditions, an asymptotically minimax-rate adaptive estimator exists for a given countable collection of density classes; that is, a single estimator can be constructed to be simultaneously minimax-rate optimal for all the function classes being considered. A demonstration is given for high-dimensional density estimation on $[0,1]^d$ where the constructed estimator adapts to smoothness and interaction-order over some piecewise Besov classes and is consistent for all the densities with finite entropy.

Journal ArticleDOI
TL;DR: A bound on the rate of convergence in Hellinger distance for density estimation is established using the Gaussian mixture sieve assuming that the true density is itself a mixture of Gaussians; the underlying mixing measure of the true density is not necessarily assumed to have finite support.
Abstract: Gaussian mixtures provide a convenient method of density estimation that lies somewhere between parametric models and kernel density estimators. When the number of components of the mixture is allowed to increase as sample size increases, the model is called a mixture sieve. We establish a bound on the rate of convergence in Hellinger distance for density estimation using the Gaussian mixture sieve assuming that the true density is itself a mixture of Gaussians; the underlying mixing measure of the true density is not necessarily assumed to have finite support. Computing the rate involves some delicate calculations since the size of the sieve (as measured by bracketing entropy) and the saturation rate cannot be found using standard methods. When the mixing measure has compact support, using $k_n \sim n^{2/3}/(\log n)^{1/3}$ components in the mixture yields a rate of order $(\log n)^{(1+\eta)/6}/n^{1/6}$ for every $\eta > 0$. The rates depend heavily on the tail behavior of the true density. The sensitivity to the tail behavior is diminished by using a robust sieve which includes a long-tailed component in the mixture.
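
To illustrate the sieve, the sketch below lets the number of components grow at the quoted rate $k_n \sim n^{2/3}/(\log n)^{1/3}$ and fits the mixture with an off-the-shelf EM implementation (scikit-learn); the simulated data and the choice of scikit-learn are illustrative, not part of the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def mixture_sieve_fit(x):
    """Fit a one-dimensional Gaussian mixture whose number of components grows
    with the sample size at the rate k_n ~ n^(2/3) / (log n)^(1/3)."""
    n = len(x)
    k_n = max(1, int(round(n ** (2 / 3) / np.log(n) ** (1 / 3))))
    model = GaussianMixture(n_components=k_n, max_iter=500, random_state=0)
    model.fit(x.reshape(-1, 1))
    return model, k_n

rng = np.random.default_rng(6)
# Illustrative data: the true density is itself a two-component Gaussian mixture.
x = np.concatenate([rng.normal(-2.0, 1.0, 400), rng.normal(3.0, 0.5, 600)])
model, k_n = mixture_sieve_fit(x)
print(k_n, model.score(x.reshape(-1, 1)))            # sieve size and mean log-likelihood
```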

Journal ArticleDOI
TL;DR: The uniqueness results for the S-functionals are obtained by embedding them within a more general class of functionals, which the authors call M-functionals with auxiliary scale.
Abstract: The S-functionals of multivariate location and scatter, including the MVE-functionals, are known to be uniquely defined only at unimodal elliptically symmetric distributions. The goal of this paper is to establish the uniqueness of these functionals under broader classes of symmetric distributions. We also discuss some implications of the uniqueness of the functionals and give examples of strictly unimodal and symmetric distributions for which the MVE-functional is not uniquely defined. The uniqueness results for the S-functionals are obtained by embedding them within a more general class of functionals which we call the M-functionals with auxiliary scale. The uniqueness results of this paper are then obtained for this class of multivariate functionals. Besides the S-functionals, the class of multivariate M-functionals with auxiliary scale includes the constrained M-functionals recently introduced by Kent and Tyler, as well as a new multivariate generalization of Yohai's MM-functionals.

Journal ArticleDOI
TL;DR: The fast rate of convergence of the TPS-ANOVA model makes it attractive in high-dimensional function estimation, and many properties of the tensor product space of Sobolev-Hilbert spaces are given.
Abstract: To deal with the curse of dimensionality in high-dimensional nonparametric problems, we consider using tensor product space ANOVA models, which extend the popular additive models and are able to capture interactions of any order. The multivariate function is given an ANOVA decomposition, that is, it is expressed as a constant plus the sum of functions of one variable (main effects), plus the sum of functions of two variables (two-factor interactions) and so on. We assume the interactions to be in tensor product spaces. We show in both regression and white noise settings, the optimal rate of convergence for the TPS-ANOVA model is within a log factor of the one-dimensional optimal rate, and that the penalized likelihood estimator in TPS-ANOVA achieves this rate of convergence. This fast rate of convergence makes the TPS-ANOVA model very attractive in high-dimensional function estimation. Many properties of the tensor product space of Sobolev-Hilbert spaces are also given.

Journal ArticleDOI
TL;DR: In this article, it is shown that the global power function of any nonparametric test is flat on balls of alternatives except for alternatives coming from a finite-dimensional subspace, and that the level points are far away from the corresponding Neyman–Pearson test level points except for a finite number of orthogonal directions of alternatives.
Abstract: It is shown that the global power function of any nonparametric test is flat on balls of alternatives except for alternatives coming from a finite-dimensional subspace. The benchmark here is the upper one-sided (or two-sided) envelope power function. Every choice of a test fixes a priori a finite-dimensional region with high power. It turns out that the level points are also far away from the corresponding Neyman–Pearson test level points except for a finite number of orthogonal directions of alternatives. For certain submodels the result is independent of the underlying sample size. In the last section the statistical consequences and special goodness of fit tests are discussed.

Journal ArticleDOI
TL;DR: In this paper, the authors identify a large class of cascade generators uniquely determined by the scaling exponents of a single cascade realization as a.s. constants and provide both asymptotic consistency and confidence intervals for two different estimators of the cumulant generating function (log Laplace transform) of the cascade generator distribution.
Abstract: The probability distribution of the cascade generators in a random multiplicative cascade represents a hidden parameter which is reflected in the fine scale limiting behavior of the scaling exponents (sample moments) of a single sample cascade realization as a.s. constants. We identify a large class of cascade generators uniquely determined by these scaling exponents. For this class we provide both asymptotic consistency and confidence intervals for two different estimators of the cumulant generating function (log Laplace transform) of the cascade generator distribution. These results are derived from investigation of the convergence properties of the fine scale sample moments of a single cascade realization.

Journal ArticleDOI
TL;DR: A class of approximate numerical methods for solving the penalized likelihood variational problem is proposed which, in conjunction with the ranGACV method, allows the application of smoothing spline ANOVA models with Bernoulli data to much larger data sets than previously possible.
Abstract: We propose the randomized Generalized Approximate Cross Validation (ranGACV) method for choosing multiple smoothing parameters in penalized likelihood estimates for Bernoulli data. The method is intended for application with penalized likelihood smoothing spline ANOVA models. In addition we propose a class of approximate numerical methods for solving the penalized likelihood variational problem which, in conjunction with the ranGACV method, allows the application of smoothing spline ANOVA models with Bernoulli data to much larger data sets than previously possible. These methods are based on choosing an approximating subset of the natural (representer) basis functions for the variational problem. Simulation studies with synthetic data, including synthetic data mimicking demographic risk factor data sets, are used to examine the properties of the method and to compare the approach with the GRKPACK code of Wang (1997c). Bayesian “confidence intervals” are obtained for the fits and are shown in the simulation studies to have the “across the function” property usually claimed for these confidence intervals. Finally, the method is applied to an observational data set from the Beaver Dam Eye study, with scientifically interesting results.

Journal ArticleDOI
TL;DR: All Bayes estimators for proper Gaussian priors have zero asymptotic efficiency in this minimax sense, and a class of priors whose Bayes procedures attain the optimal minimax rate of convergence is presented.
Abstract: We study the Bayesian approach to nonparametric function estimation problems such as nonparametric regression and signal estimation. We consider the asymptotic properties of Bayes procedures for conjugate (= Gaussian) priors. We show that so long as the prior puts nonzero measure on the very large parameter set of interest, then the Bayes estimators are not satisfactory. More specifically, we show that these estimators do not achieve the correct minimax rate over norm-bounded sets in the parameter space. Thus all Bayes estimators for proper Gaussian priors have zero asymptotic efficiency in this minimax sense. We then present a class of priors whose Bayes procedures attain the optimal minimax rate of convergence. These priors may be viewed as compound, or hierarchical, mixtures of suitable Gaussian distributions.

Journal ArticleDOI
TL;DR: In this paper, a new approach to testing for monotonicity of a regression mean, not requiring computation of a curve estimator or a bandwidth, is suggested, based on the notion of "running gradients " over short intervals, although from some viewpoints it may be regarded as an analogue for testing of the dip/excess mass approach.
Abstract: A new approach to testing for monotonicity of a regression mean, not requiring computation of a curve estimator or a bandwidth, is suggested It is based on the notion of “running gradients ” over short intervals, although from some viewpoints it may be regarded as an analogue for monotonicity testing of the dip/excess mass approach for testing modality hypotheses about densities Like the latter methods, the new technique does not suffer difficulties caused by almost-flat parts of the target function In fact, it is calibrated so as to work well for flat response curves, and as a result it has relatively good power properties in boundary cases where the curve exhibits shoulders In this respect, as well as in its construction, the “running gradients” approach differs from alternative techniques based on the notion of a critical bandwidth

Journal ArticleDOI
TL;DR: In this article, the authors present the explicit solution of the Bayesian problem of sequential testing of two simple hypotheses about the intensity of an observed Poisson process, which consists of reducing the initial problem to a free-boundary differential-difference problem, and solving the latter by use of the principles of smooth and continuous fit.
Abstract: We present the explicit solution of the Bayesian problem of sequential testing of two simple hypotheses about the intensity of an observed Poisson process. The method of proof consists of reducing the initial problem to a free-boundary differential-difference Stephan problem, and solving the latter by use of the principles of smooth and continuous fit. A rigorous proof of the optimality of Wald's sequential probability ratio test in the variational formulation of the problem is obtained as a consequence of the solution of the Bayesian problem.
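
For comparison with the variational formulation, here is a small sketch of Wald's sequential probability ratio test for two candidate Poisson intensities, simulated in small time steps; the thresholds use the standard Wald approximations rather than the paper's exact Bayesian solution, and the intensities and error levels are illustrative.

```python
import numpy as np

def poisson_sprt(lam_true, lam0=1.0, lam1=2.0, alpha=0.05, beta=0.05, dt=0.01, seed=0):
    """Wald SPRT for H0: intensity lam0 vs H1: intensity lam1, simulated in small time steps.
    Log-likelihood ratio after time t with N_t observed events:
        N_t * log(lam1 / lam0) - (lam1 - lam0) * t.
    """
    rng = np.random.default_rng(seed)
    log_a = np.log(beta / (1 - alpha))               # lower Wald threshold
    log_b = np.log((1 - beta) / alpha)               # upper Wald threshold
    t, n_events, llr = 0.0, 0, 0.0
    while log_a < llr < log_b:
        n_events += rng.poisson(lam_true * dt)       # events in the next small interval
        t += dt
        llr = n_events * np.log(lam1 / lam0) - (lam1 - lam0) * t
    return ("accept H1" if llr >= log_b else "accept H0", t, n_events)

print(poisson_sprt(lam_true=2.0))                    # intensity 2 -> typically accepts H1
print(poisson_sprt(lam_true=1.0))                    # intensity 1 -> typically accepts H0
```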

Journal ArticleDOI
TL;DR: In this article, a procedure associated with nonlinear wavelet methods that provides adaptive confidence intervals around $f (x_0)$ in either a white noise model or a regression setting is presented.
Abstract: We present a procedure associated with nonlinear wavelet methods that provides adaptive confidence intervals around $f (x_0)$, in either a white noise model or a regression setting. A suitable modification in the truncation rule for wavelets allows construction of confidence intervals that achieve optimal coverage accuracy up to a logarithmic factor. The procedure does not require knowledge of the regularity of the unknown function $f$; it is also efficient for functions with a low degree of regularity.

Journal ArticleDOI
TL;DR: In this article, the estimation of structured correlation matrices in infinitely differentiable Gaussian random field models is considered, and the log-likelihood function is determined explicitly in closed-form and the sieve maximum likelihood estimators are shown to be strongly consistent under mild conditions.
Abstract: This article considers the estimation of structured correlation matrices in infinitely differentiable Gaussian random field models. The problem is essentially motivated by the stochastic modeling of smooth deterministic responses in computer experiments. In particular, the log-likelihood function is determined explicitly in closed form and the sieve maximum likelihood estimators are shown to be strongly consistent under mild conditions.

Journal ArticleDOI
TL;DR: In this article, the authors examine the two-step estimation procedure of Gilbert, Lele and Vardi for semiparametric biased sampling models, in which $\hat{\theta}_n$ is obtained by maximizing a profile partial likelihood and Vardi's NPMLE is evaluated at $\hat{\theta}_n$, and characterize conditions under which the resulting joint MLE is uniformly consistent, asymptotically Gaussian and efficient.
Abstract: Vardi [Ann. Statist. 13 (1985) 178-203] introduced an $s$-sample biased sampling model with known selection weight functions, gave a condition under which the common underlying probability distribution $G$ is uniquely estimable and developed a simple procedure for computing the nonparametric maximum likelihood estimator (NPMLE) $\mathbb{G}_n$ of $G$. Gill, Vardi and Wellner thoroughly described the large sample properties of Vardi's NPMLE, giving results on uniform consistency, convergence of $\sqrt{n}(\mathbb{G}_n - G)$ to a Gaussian process and asymptotic efficiency of $\mathbb{G}_n$. Gilbert, Lele and Vardi considered the class of semiparametric $s$-sample biased sampling models formed by allowing the weight functions to depend on an unknown finite-dimensional parameter $\theta$. They extended Vardi's estimation approach by developing a simple two-step estimation procedure in which $\hat{\theta}_n$ is obtained by maximizing a profile partial likelihood and $\mathbb{G}_n \equiv \mathbb{G}_n(\hat{\theta}_n)$ is obtained by evaluating Vardi's NPMLE at $\hat{\theta}_n$. Here we examine the large sample behavior of the resulting joint MLE $(\hat{\theta}_n,\mathbb{G}_n)$, characterizing conditions on the selection weight functions and data in order that $(\hat{\theta}_n, \mathbb{G}_n)$ is uniformly consistent, asymptotically Gaussian and efficient. Examples illustrated here include clinical trials (especially HIV vaccine efficacy trials), choice-based sampling in econometrics and case-control studies in biostatistics.