
Showing papers in "Annals of Statistics in 2001"


Journal ArticleDOI
TL;DR: A general gradient descent boosting paradigm is developed for additive expansions based on any fitting criterion, and specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification.
Abstract: Function estimation/approximation is viewed from the perspective of numerical optimization in function space, rather than parameter space. A connection is made between stagewise additive expansions and steepest-descent minimization. A general gradient descent “boosting” paradigm is developed for additive expansions based on any fitting criterion. Specific algorithms are presented for least-squares, least absolute deviation, and Huber-M loss functions for regression, and multiclass logistic likelihood for classification. Special enhancements are derived for the particular case where the individual additive components are regression trees, and tools for interpreting such “TreeBoost” models are presented. Gradient boosting of regression trees produces competitive, highly robust, interpretable procedures for both regression and classification, especially appropriate for mining less than clean data. Connections between this approach and the boosting methods of Freund and Schapire and of Friedman, Hastie and Tibshirani are discussed.

17,764 citations
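The least-squares version of the boosting recipe described above is simple enough to sketch in a few lines. The following is a minimal illustration, not the paper's own implementation: small scikit-learn regression trees serve as the additive components, and the function names, shrinkage factor and tree depth are our illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def ls_tree_boost(X, y, n_stages=100, shrinkage=0.1, max_depth=2):
    """Least-squares gradient boosting: each stage fits a small regression
    tree to the current residuals (the negative gradient of squared-error
    loss) and adds a shrunken copy of it to the additive expansion."""
    f0 = float(np.mean(y))              # stage 0: best constant fit
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_stages):
        resid = y - pred                # negative gradient at the current fit
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, resid)
        pred += shrinkage * tree.predict(X)
        trees.append(tree)
    return f0, trees

def ls_tree_boost_predict(f0, trees, X, shrinkage=0.1):
    return f0 + shrinkage * sum(t.predict(X) for t in trees)
```

Replacing the residuals by their signs (or a clipped version of them) gives the least-absolute-deviation and Huber variants the abstract mentions.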


Journal ArticleDOI
TL;DR: In this paper, it was shown that a simple FDR controlling procedure for independent test statistics can also control the false discovery rate when test statistics have positive regression dependency on each of the test statistics corresponding to the true null hypotheses.
Abstract: Benjamini and Hochberg suggest that the false discovery rate may be the appropriate error rate to control in many applied multiple testing problems. A simple procedure was given there as an FDR controlling procedure for independent test statistics and was shown to be much more powerful than comparable procedures which control the traditional familywise error rate. We prove that this same procedure also controls the false discovery rate when the test statistics have positive regression dependency on each of the test statistics corresponding to the true null hypotheses. This condition for positive dependency is general enough to cover many problems of practical interest, including the comparisons of many treatments with a single control, multivariate normal test statistics with positive correlation matrix and multivariate $t$. Furthermore, the test statistics may be discrete, and the tested hypotheses composite without posing special difficulties. For all other forms of dependency, a simple conservative modification of the procedure controls the false discovery rate. Thus the range of problems for which a procedure with proven FDR control can be offered is greatly increased.

9,335 citations
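The step-up procedure whose FDR control is extended here is itself only a few lines of code. A minimal sketch (function name and the toy p-values are ours):

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Step-up FDR procedure: reject the hypotheses with the k smallest
    p-values, where k is the largest index with p_(k) <= (k/m) * q."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # last index passing the line
        reject[order[: k + 1]] = True      # reject all smaller p-values too
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, q=0.05))   # rejects the two smallest here
```

The paper's point is that this unchanged procedure remains valid under positive regression dependency; under arbitrary dependence, dividing q by the harmonic sum 1 + 1/2 + ... + 1/m gives the conservative modification mentioned in the abstract.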


Journal ArticleDOI
TL;DR: In this article, it was shown that the largest eigenvalue of a $p$-variate Wishart distribution on $n$ degrees of freedom with identity covariance, suitably centered and scaled, follows the Tracy-Widom law of order 1 in the limit of large $p$ and $n$.
Abstract: Let $x_{(1)}$ denote the square of the largest singular value of an $n \times p$ matrix $X$, all of whose entries are independent standard Gaussian variates. Equivalently, $x_{(1)}$ is the largest principal component variance of the covariance matrix $X'X$, or the largest eigenvalue of a $p$-variate Wishart distribution on $n$ degrees of freedom with identity covariance. Consider the limit of large $p$ and $n$ with $n/p = \gamma \ge 1$. When centered by $\mu_p = (\sqrt{n-1} + \sqrt{p})^2$ and scaled by $\sigma_p = (\sqrt{n-1} + \sqrt{p})(1/\sqrt{n-1} + 1/\sqrt{p})^{1/3}$, the distribution of $x_{(1)}$ approaches the Tracy-Widom law of order 1, which is defined in terms of the Painlevé II differential equation and can be numerically evaluated and tabulated in software. Simulations show the approximation to be informative for $n$ and $p$ as small as 5. The limit is derived via a corresponding result for complex Wishart matrices using methods from random matrix theory. The result suggests that some aspects of large $p$ multivariate distribution theory may be easier to apply in practice than their fixed $p$ counterparts.

2,202 citations
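The centering and scaling constants are explicit, so the approximation is easy to try numerically. A small simulation sketch (the Tracy-Widom mean and standard deviation quoted in the comment are approximate values from the literature):

```python
import numpy as np

def scaled_largest_eigenvalue(n, p, n_rep=500, seed=None):
    """Simulate the largest eigenvalue of X'X for an n x p standard Gaussian
    matrix X, centered by mu_p and scaled by sigma_p as in the abstract."""
    rng = np.random.default_rng(seed)
    mu = (np.sqrt(n - 1) + np.sqrt(p)) ** 2
    sigma = (np.sqrt(n - 1) + np.sqrt(p)) * (1 / np.sqrt(n - 1) + 1 / np.sqrt(p)) ** (1 / 3)
    out = np.empty(n_rep)
    for i in range(n_rep):
        X = rng.standard_normal((n, p))
        out[i] = np.linalg.svd(X, compute_uv=False)[0] ** 2  # largest eig of X'X
    return (out - mu) / sigma

vals = scaled_largest_eigenvalue(40, 10, n_rep=500, seed=0)
print(vals.mean(), vals.std())  # TW(1) has mean ~ -1.21 and sd ~ 1.27
```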


Journal ArticleDOI
TL;DR: The generalized likelihood ratio statistics are shown to be general and powerful for nonparametric testing problems based on function estimation and can even be adaptively optimal in the sense of Spokoiny by using a simple choice of adaptive smoothing parameter.
Abstract: Likelihood ratio theory has had tremendous success in parametric inference, due to the fundamental theory of Wilks. Yet, there is no generally applicable approach for nonparametric inferences based on function estimation. Maximum likelihood ratio test statistics may not exist in the nonparametric function estimation setting. Even if they exist, they are hard to find and cannot be optimal, as shown in this paper. We introduce the generalized likelihood statistics to overcome the drawbacks of nonparametric maximum likelihood ratio statistics. A new Wilks phenomenon is unveiled. We demonstrate that a class of the generalized likelihood statistics based on some appropriate nonparametric estimators are asymptotically distribution free and follow $\chi^2$-distributions under null hypotheses for a number of useful hypotheses and a variety of useful models, including Gaussian white noise models, nonparametric regression models, varying coefficient models and generalized varying coefficient models. We further demonstrate that generalized likelihood ratio statistics are asymptotically optimal in the sense that they achieve optimal rates of convergence given by Ingster. They can even be adaptively optimal in the sense of Spokoiny by using a simple choice of adaptive smoothing parameter. Our work indicates that the generalized likelihood ratio statistics are indeed general and powerful for nonparametric testing problems based on function estimation.

676 citations


Journal ArticleDOI
TL;DR: In this paper, the authors compute the rate at which the posterior distribution concentrates around the true parameter value, and show that the rate is driven by the size of the space, as measured by bracketing entropy, and the degree to which the prior concentrates in a small ball around the true parameter.
Abstract: We compute the rate at which the posterior distribution concentrates around the true parameter value. The spaces we work in are quite general and include infinite-dimensional cases. The rates are driven by two quantities: the size of the space, as measured by bracketing entropy, and the degree to which the prior concentrates in a small ball around the true parameter. We consider two examples.

345 citations


Journal ArticleDOI
TL;DR: The run method converges slowly but can withstand blocks of outliers as well as a high proportion of isolated outliers, while the rate of convergence of the taut-string multiresolution method is almost optimal.
Abstract: The paper considers the problem of nonparametric regression with emphasis on controlling the number of local extremes. Two methods, the run method and the taut-string multiresolution method, are introduced and analyzed on standard test beds. It is shown that the number and locations of local extreme values are consistently estimated. Rates of convergence are proved for both methods. The run method converges slowly but can withstand blocks of outliers as well as a high proportion of isolated outliers. The rate of convergence of the taut-string multiresolution method is almost optimal. The method is extremely sensitive and can detect very low power peaks. Section 1 contains an introduction with special reference to the number of local extreme values. The run method is described in Section 2 and the taut-string multiresolution method in Section 3. Low power peaks are considered in Section 4. Section 5 contains a comparison with other methods and Section 6 a short conclusion. The proofs are given in Section 7 and the taut-string algorithm is described in the Appendix.

310 citations


Journal ArticleDOI
TL;DR: In this paper, a generalized minimum aberration criterion for comparing asymmetrical fractional factorial designs is proposed, which is independent of the choice of treatment contrasts and thus model-free.
Abstract: By studying treatment contrasts and ANOVA models, we propose a generalized minimum aberration criterion for comparing asymmetrical fractional factorial designs. The criterion is independent of the choice of treatment contrasts and thus model-free. It works for symmetrical and asymmetrical designs, regular and nonregular designs. In particular, it reduces to the minimum aberration criterion for regular designs and the minimum $G_2$-aberration criterion for two-level nonregular designs. In addition, by exploring the connection between factorial design theory and coding theory, we develop a complementary design theory for general symmetrical designs, which covers many existing results as special cases.

309 citations


Journal ArticleDOI
TL;DR: In this article, the average derivative estimator (ADE) of the index vector is iteratively improved by extending the weighting kernel in directions of small directional derivative; the whole procedure requires at most $2\log n$ iterations and the resulting estimator is $\sqrt{n}$-consistent under relatively mild assumptions on the model, independently of the dimensionality.
Abstract: Single-index modeling is widely applied in, for example, econometric studies as a compromise between too restrictive parametric models and flexible but hardly estimable purely nonparametric models. By such modeling the statistical analysis usually focuses on estimating the index coefficients. The average derivative estimator (ADE) of the index vector is based on the fact that the average gradient of a single index function $f(x^{\top}\beta)$ is proportional to the index vector $\beta$. Unfortunately, a straightforward application of this idea meets the so-called “curse of dimensionality” problem if the dimensionality $d$ of the model is larger than 2. However, prior information about the vector $\beta$ can be used for improving the quality of gradient estimation by extending the weighting kernel in a direction of small directional derivative. The method proposed in this paper consists of such iterative improvements of the original ADE. The whole procedure requires at most $2\log n$ iterations and the resulting estimator is $\sqrt{n}$-consistent under relatively mild assumptions on the model, independently of the dimensionality $d$.

299 citations
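The starting point of the procedure, the average derivative estimate itself, can be sketched directly; the paper's contribution, the iterative anisotropic reweighting, is omitted here. A rough density-weighted version in the spirit of Powell, Stock and Stoker, with a Gaussian product kernel (function name, bandwidth and the toy model are ours):

```python
import numpy as np

def ade(X, y, h):
    """Density-weighted average derivative estimate:
    delta = -(2/n) * sum_i y_i * grad fhat_i(X_i), with fhat_i a
    leave-one-out Gaussian kernel density estimate. In a single-index
    model E[y|x] = f(x'beta), delta is proportional to beta."""
    n, d = X.shape
    norm = (2 * np.pi) ** (-d / 2)
    delta = np.zeros(d)
    for i in range(n):
        u = (X[i] - np.delete(X, i, axis=0)) / h          # (n-1, d)
        k = norm * np.exp(-0.5 * (u ** 2).sum(axis=1))    # kernel at X[i]
        grad = -(u * k[:, None]).sum(axis=0) / ((n - 1) * h ** (d + 1))
        delta += y[i] * grad
    return -2.0 * delta / n

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 2))
beta = np.array([1.0, 0.5])
y = (X @ beta) ** 3 + 0.1 * rng.standard_normal(400)
est = ade(X, y, h=0.5)
print(est / est[0])  # direction roughly proportional to (1, 0.5)
```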


Journal ArticleDOI
TL;DR: A method is suggested for monotonizing general kernel-type estimators, for example local linear estimators and Nadaraya-Watson estimators, by adjusting the unconstrained estimate subject to the constraint of monotonicity.
Abstract: We suggest a method for monotonizing general kernel-type estimators, for example local linear estimators and Nadaraya-Watson estimators. Attributes of our approach include the fact that it produces smooth estimates, indeed with the same smoothness as the unconstrained estimate. The method is applicable to a particularly wide range of estimator types, it can be trivially modified to render an estimator strictly monotone and it can be employed after the smoothing step has been implemented. Therefore, an experimenter may use his or her favorite kernel estimator and bandwidth selector to construct the basic nonparametric smoother and then use our technique to render it monotone in a smooth way. Implementation involves only an off-the-shelf programming routine. The method is based on maximizing fidelity to the conventional empirical approach, subject to monotonicity. We adjust the unconstrained estimator by tilting the empirical distribution so as to make the least possible change, in the sense of a distance measure, subject to imposing the constraint of monotonicity.

287 citations
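The paper's tilting construction is delicate; as a much cruder stand-in, one can monotonize a kernel smoother by an isotonic least-squares projection, which illustrates the two-step "smooth first, constrain second" idea but sacrifices the smoothness preservation that is the paper's selling point. A sketch under those caveats:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def nadaraya_watson(grid, x, y, bandwidth):
    """Nadaraya-Watson kernel regression estimate with a Gaussian kernel."""
    w = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ y) / w.sum(axis=1)

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 100))
y = x ** 2 + rng.normal(0, 0.1, 100)          # monotone truth, noisy data
grid = np.linspace(0, 1, 200)
fhat = nadaraya_watson(grid, x, y, bandwidth=0.08)      # unconstrained smoother
fmono = IsotonicRegression().fit_transform(grid, fhat)  # monotone adjustment
```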


Journal ArticleDOI
TL;DR: In this article, the authors studied the convergence rate of the maximum likelihood estimator (MLE) and posterior distribution in density estimation problems, where the densities are location or location-scale mixtures of normal distributions with the scale parameter lying between two positive numbers.
Abstract: We study the rates of convergence of the maximum likelihood estimator (MLE) and posterior distribution in density estimation problems, where the densities are location or location-scale mixtures of normal distributions with the scale parameter lying between two positive numbers. The true density is also assumed to lie in this class with the true mixing distribution either compactly supported or having sub-Gaussian tails. We obtain bounds for Hellinger bracketing entropies for this class, and from these bounds, we deduce the convergence rates of (sieve) MLEs in Hellinger distance. The rate turns out to be $(\log n)^\kappa/\sqrt{n}$, where $\kappa \ge 1$ is a constant that depends on the type of mixtures and the choice of the sieve. Next, we consider a Dirichlet mixture of normals as a prior on the unknown density. We estimate the prior probability of a certain Kullback-Leibler type neighborhood and then invoke a general theorem that computes the posterior convergence rate in terms of the growth rate of the Hellinger entropy and the concentration rate of the prior. The posterior distribution is also seen to converge at the rate $(\log n)^\kappa/\sqrt{n}$, where $\kappa$ now depends on the tail behavior of the base measure of the Dirichlet process.

284 citations


Journal ArticleDOI
TL;DR: In this article, the authors study nonparametric estimation of convex regression and density functions by methods of least squares and maximum likelihood, and provide characterizations of these estimators, prove that they are consistent and establish their asymptotic distributions at a fixed point of positive curvature of the functions estimated.
Abstract: We study nonparametric estimation of convex regression and density functions by methods of least squares (in the regression and density cases) and maximum likelihood (in the density estimation case). We provide characterizations of these estimators, prove that they are consistent and establish their asymptotic distributions at a fixed point of positive curvature of the functions estimated. The asymptotic distribution theory relies on the existence of an “invelope function” for integrated two-sided Brownian motion $+ t^4$, which is established in a companion paper by Groeneboom, Jongbloed and Wellner.

Journal ArticleDOI
TL;DR: In this paper, the authors propose two classes of tests of qualitative nonparametric hypotheses about f, such as monotonicity or concavity, constructed via a new class of multiscale statistics and an extension of Lévy's modulus of continuity of Brownian motion.
Abstract: Suppose that one observes a process Y on the unit interval, where $dY(t) = n^{1/2} f(t)\,dt + dW(t)$ with an unknown function parameter f, given scale parameter $n \ge 1$ (“sample size”) and standard Brownian motion W. We propose two classes of tests of qualitative nonparametric hypotheses about f such as monotonicity or concavity. These tests are asymptotically optimal and adaptive in a certain sense. They are constructed via a new class of multiscale statistics and an extension of Lévy's modulus of continuity of Brownian motion.

Journal ArticleDOI
TL;DR: In this paper, the authors show that if the true density is itself a Bernstein density, the posterior distribution of a Bernstein polynomial prior converges at a nearly parametric rate for the Hellinger distance, and at the rate $n^{-1/3}(\log n)^{5/6}$ if the true density is twice differentiable and bounded away from 0.
Abstract: Mixture models for density estimation provide a very useful setup for the Bayesian or the maximum likelihood approach. For a density on the unit interval, mixtures of beta densities form a flexible model. The class of Bernstein densities is a much smaller subclass of the beta mixtures defined by Bernstein polynomials, which can approximate any continuous density. A Bernstein polynomial prior is obtained by putting a prior distribution on the class of Bernstein densities. The posterior distribution of a Bernstein polynomial prior is consistent under very general conditions. In this article, we present some results on the rate of convergence of the posterior distribution. If the underlying distribution generating the data is itself a Bernstein density, then we show that the posterior distribution converges at “nearly parametric rate” $(\log n)/\sqrt{n}$ for the Hellinger distance. If the true density is not of the Bernstein type, we show that the posterior converges at a rate $n^{-1/3}(\log n)^{5/6}$ provided that the true density is twice differentiable and bounded away from 0. Similar rates are also obtained for sieve maximum likelihood estimates. These rates are inferior to the pointwise convergence rate of a kernel type estimator. We show that the Bayesian bootstrap method gives a proxy for the posterior distribution and has a convergence rate at par with that of the kernel estimator.
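Concretely, a Bernstein density of order $k$ is the mixture $\sum_{j=1}^{k} w_j \mathrm{Beta}(x; j, k-j+1)$ with weights summing to one. A quick evaluator (function name and example weights ours):

```python
import numpy as np
from scipy.stats import beta

def bernstein_density(x, weights):
    """Evaluate a Bernstein density: a mixture of Beta(j, k - j + 1)
    densities, j = 1..k, with the given mixing weights."""
    k = len(weights)
    return sum(w * beta.pdf(x, j, k - j + 1)
               for j, w in enumerate(weights, start=1))

w = [0.1, 0.2, 0.4, 0.3]                       # an order-4 example
print(bernstein_density(np.linspace(0, 1, 5), w))
```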

Journal ArticleDOI
TL;DR: In this paper, a nonparametric estimation theory is developed in a nonstationary environment, more precisely in the framework of null recurrent Markov chains, using the split chain to decompose the time series under consideration into independent and identically distributed parts.
Abstract: We develop a nonparametric estimation theory in a nonstationary environment, more precisely in the framework of null recurrent Markov chains. An essential tool is the split chain, which makes it possible to decompose the time series under consideration into independent and identically distributed parts. A tail condition on the distribution of the recurrence time is introduced. This condition makes it possible to prove weak convergence results for sums of functions of the process depending on a smoothing parameter. These limit results are subsequently used to obtain consistency and asymptotic normality for local density estimators and for estimators of the conditional mean and the conditional variance. In contradistinction to the parametric case, the convergence rate is slower than in the stationary case, and it is directly linked to the tail behavior of the recurrence time. Applications to econometrics, and in particular to cointegration models, are indicated.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a method of effective dimension reduction for a multi-index model which is based on iterative improvement of the family of average derivative estimates, and showed that in the case when the effective dimension m of the index space does not exceed 3, this space can be estimated with the rate $n^{-1/2}$ under mild assumptions on the model.
Abstract: We propose a new method of effective dimension reduction for a multi-index model which is based on iterative improvement of the family of average derivative estimates. The procedure is computationally straightforward and does not require any prior information about the structure of the underlying model. We show that in the case when the effective dimension m of the index space does not exceed 3, this space can be estimated with the rate $n^{-1/2}$ under rather mild assumptions on the model.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a test for selecting explanatory variables in nonparametric regression, which does not need to estimate the conditional expectation function given all the variables, but only those which are significant under the null hypothesis.
Abstract: This paper proposes a test for selecting explanatory variables in nonparametric regression. The test does not need to estimate the conditional expectation function given all the variables, but only those which are significant under the null hypothesis. This feature is computationally convenient and solves, in part, the problem of the “curse of dimensionality” when selecting regressors in a nonparametric context. The proposed test statistic is based on functionals of a $U$-process. Contiguous alternatives, converging to the null at a rate $n^{-1/2}$, can be detected. The asymptotic null distribution of the statistic depends on certain features of the data generating process, and asymptotic tests are difficult to implement except in rare circumstances. We justify the consistency of two easy to implement bootstrap tests which exhibit good level accuracy for fairly small samples, according to the reported Monte Carlo simulations. These results are also applicable to test other interesting restrictions on nonparametric curves, like partial linearity and conditional independence.

Journal ArticleDOI
TL;DR: This work describes a hierarchy of exponential families which is useful for distinguishing types of graphical models and shows how to compute the dimension of a stratified exponential family.
Abstract: We describe a hierarchy of exponential families which is useful for distinguishing types of graphical models. Undirected graphical models with no hidden variables are linear exponential families (LEFs). Directed acyclic graphical (DAG) models and chain graphs with no hidden variables, including DAG models with several families of local distributions, are curved exponential families (CEFs). Graphical models with hidden variables are what we term stratified exponential families (SEFs). An SEF is a finite union of CEFs of various dimensions satisfying some regularity conditions. We also show that this hierarchy of exponential families is noncollapsing with respect to graphical models by providing a graphical model which is a CEF but not an LEF and a graphical model that is an SEF but not a CEF. Finally, we show how to compute the dimension of a stratified exponential family. These results are discussed in the context of model selection of graphical models.

Journal ArticleDOI
TL;DR: In this paper, the authors extend Robins' theory of causal inference for complex longitudinal data to the case of continuously varying covariates and treatments, and establish versions of the key results of the discrete theory: the g-computation formula and a collection of powerful characterizations of the g-null hypothesis of no treatment effect.
Abstract: We extend Robins' theory of causal inference for complex longitudinal data to the case of continuously varying as opposed to discrete covariates and treatments. In particular we establish versions of the key results of the discrete theory: the g-computation formula and a collection of powerful characterizations of the g-null hypothesis of no treatment effect. This is accomplished under natural continuity hypotheses concerning the conditional distributions of the outcome variable and of the covariates given the past. We also show that our assumptions concerning counterfactual variables place no restriction on the joint distribution of the observed variables: thus in a precise sense, these assumptions are “for free”, or if you prefer, harmless.
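In the discrete case, the g-computation formula being generalized can be written out directly. A toy two-time-point sketch, with hypothetical column names A0, L1, A1, Y for the first treatment, the intermediate covariate, the second treatment and the outcome:

```python
import pandas as pd

def g_formula(df, a0, a1):
    """Discrete g-computation for two treatment times:
    E[Y(a0, a1)] = sum_l E[Y | A0=a0, L1=l, A1=a1] * P(L1=l | A0=a0).
    Assumes every (a0, l, a1) stratum appearing in the sum is nonempty."""
    p_l = df.loc[df.A0 == a0, "L1"].value_counts(normalize=True)
    total = 0.0
    for l, p in p_l.items():
        stratum = df[(df.A0 == a0) & (df.L1 == l) & (df.A1 == a1)]
        total += stratum.Y.mean() * p
    return total
```

Under the usual sequential randomization assumptions this standardized mean identifies the counterfactual mean outcome; the paper's contribution is the extension of such formulas to continuously varying treatments and covariates.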

Journal ArticleDOI
TL;DR: In this article, the authors studied the problem of testing for equality at a fixed point in the setting of nonparametric estimation of a monotone function and derived a likelihood ratio test for this hypothesis.
Abstract: We study the problem of testing for equality at a fixed point in the setting of nonparametric estimation of a monotone function. The likelihood ratio test for this hypothesis is derived in the particular case of interval censoring (or current status data) and its limiting distribution is obtained. The limiting distribution is that of the integral of the difference of the squared slope processes corresponding to a canonical version of the problem involving Brownian motion $+ t^2$ and greatest convex minorants thereof. Inversion of the family of tests yields pointwise confidence intervals for the unknown distribution function. We also study the behavior of the statistic under local and fixed alternatives.

Journal ArticleDOI
TL;DR: In this paper, three methods using nonparametric estimators of the regression function are discussed for testing the equality of k regression curves from independent samples, including linear combination of estimators for the integrated variance function in the individual samples and in the combined sample.
Abstract: In the problem of testing the equality of k regression curves from independent samples, we discuss three methods using nonparametric estimators of the regression function. The first test is based on a linear combination of estimators for the integrated variance function in the individual samples and in the combined sample. The second approach transfers the classical one-way analysis of variance to the situation of comparing non-parametric curves, while the third test compares the differences between the estimates of the individual regression functions by means of an L 2 -distance. We prove asymptotic normality of all considered statistics under the null hypothesis and local and fixed alternatives with different rates corresponding to the various cases. Additionally, consistency of a wild bootstrap version of the tests is established. In contrast to most of the procedures proposed in the literature, the methods introduced in this paper are also applicable in the case of different design points in each sample and heteroscedastic errors. A simulation study is conducted to investigate the finite sample properties of the new tests and a comparison with recently proposed and related procedures is performed.

Journal ArticleDOI
TL;DR: In this paper, an autoregressive moving average model is proposed to generate uncorrelated (white noise) time series, but these series are not independent in the non-Gaussian case, and an approximation to the likelihood of the model in the case of Laplacian (two-sided exponential) noise yields a modified absolute deviations criterion.
Abstract: An autoregressive moving average model in which all of the roots of the autoregressive polynomial are reciprocals of roots of the moving average polynomial and vice versa is called an all-pass time series model. All-pass models generate uncorrelated (white noise) time series, but these series are not independent in the non-Gaussian case. An approximation to the likelihood of the model in the case of Laplacian (two-sided exponential) noise yields a modified absolute deviations criterion, which can be used even if the underlying noise is not Laplacian. Asymptotic normality for least absolute deviation estimators of the model parameters is established under general conditions. Behavior of the estimators in finite samples is studied via simulation. The methodology is applied to exchange rate returns to show that linear all-pass models can mimic “nonlinear” behavior, and is applied to stock market volume data to illustrate a two-step procedure for fitting noncausal autoregressions.
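An order-1 all-pass model makes the abstract's point tangible: pairing the AR root $1/\phi$ with the MA root $\phi$ flattens the spectrum, so the series is white, yet with non-Gaussian noise it remains dependent. A small simulation sketch (parameter values ours):

```python
import numpy as np

def simulate_allpass1(n, phi, seed=None):
    """Order-1 all-pass: x_t - phi*x_{t-1} = e_t - (1/phi)*e_{t-1}, with
    Laplacian noise. AR and MA roots are reciprocal, so x_t is white."""
    rng = np.random.default_rng(seed)
    e = rng.laplace(size=n + 1)
    x = np.zeros(n + 1)
    for t in range(1, n + 1):
        x[t] = phi * x[t - 1] + e[t] - e[t - 1] / phi
    return x[1:]

def lag1_acf(z):
    z = z - z.mean()
    return np.dot(z[1:], z[:-1]) / np.dot(z, z)

x = simulate_allpass1(20000, phi=0.5, seed=0)
print(lag1_acf(x))       # near zero: the series is uncorrelated
print(lag1_acf(x ** 2))  # typically clearly nonzero: squares are dependent
```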

Journal ArticleDOI
TL;DR: In this paper, the behavior of averaged periodograms and cross-periodograms of a broad class of nonstationary processes is studied and the results can be applied in fractional cointegration with unknown integration orders.
Abstract: The behavior of averaged periodograms and cross-periodograms of a broad class of nonstationary processes is studied. The processes include nonstationary ones that are fractional of any order, as well as asymptotically stationary fractional ones. The cross-periodogram can involve two nonstationary processes of possibly different orders, or a nonstationary and an asymptotically stationary one. The averaging takes place either over the whole frequency band, or over one that degenerates slowly to zero frequency as sample size increases. In some cases it is found to make no asymptotic difference, and in particular we indicate how the behavior of the mean and variance changes across the two-dimensional space of integration orders. The results employ only local-to-zero assumptions on the spectra of the underlying weakly stationary sequences. It is shown how the results can be applied in fractional cointegration with unknown integration orders.
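For reference, the statistic at the heart of the paper is the discretely averaged (cross-)periodogram over the first m Fourier frequencies; the full-band and degenerating-band cases correspond to m of order n and to m/n tending to zero. A minimal sketch under one common normalization convention (function name ours):

```python
import numpy as np

def averaged_periodogram(x, y=None, m=None):
    """F(lambda_m) = (2*pi/n) * sum_{j=1}^m I_xy(lambda_j), where I_xy is
    the (cross-)periodogram at the Fourier frequencies lambda_j = 2*pi*j/n."""
    n = len(x)
    y = x if y is None else y
    wx = np.fft.fft(x)[1 : n // 2 + 1] / np.sqrt(2 * np.pi * n)
    wy = np.fft.fft(y)[1 : n // 2 + 1] / np.sqrt(2 * np.pi * n)
    I = wx * np.conj(wy)                 # cross-periodogram ordinates
    m = len(I) if m is None else m
    return (2 * np.pi / n) * I[:m].sum().real
```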

Journal ArticleDOI
TL;DR: In this article, a parametric spectral density with power-law behavior about a fractional pole at the unknown frequency was considered, where the long memory was assumed to be known.
Abstract: We consider a parametric spectral density with power-law behavior about a fractional pole at the unknown frequency $\omega$. The case of known $\omega$, especially $\omega =0$, is standard in the long memory literature. When $omega$ is unknown, asymptotic distribution theory for estimates of parameters, including the (long) memory parameter, is significantly harder. We study a form of Gaussian estimate. We establish $n$-consistency of the estimate of $\omega$, and discuss its (non-standard) limiting distributional behavior. For the remaining parameter estimates,we establish $\sqrt{n}$-consistency and asymptotic normality.

Journal ArticleDOI
TL;DR: In this paper, a truly nonparametric estimator of the spectral measure, based on the ranks of the above data, is presented, which is valid for all values of the extreme value indices.
Abstract: Let $(\mathcal{X}_1, \mathcal{Y}_1),\dots,(\mathcal{X}_n, \mathcal{Y}_n)$ be a random sample from a bivariate distribution function $F$ in the domain of max-attraction of a distribution function $G$. This $G$ is characterised by the two extreme value indices and its spectral or angular measure. The extreme value indices determine both marginal distributions, while the spectral measure determines the dependence structure of $G$. One of the main issues in multivariate extreme value theory is the estimation of this spectral measure. We construct a truly nonparametric estimator of the spectral measure, based on the ranks of the above data. Under natural conditions we prove consistency and asymptotic normality for the estimator. In particular, the result is valid for all values of the extreme value indices. The theory of (local) empirical processes is indispensable here. The results are illustrated by an application to real data and a small simulation study.

Journal ArticleDOI
TL;DR: This article presents a semiparametric methodology yielding almost sure convergence of the estimated number of components to the true but unknown number of components in finite mixture models.
Abstract: The consistent estimation of mixture complexity is of fundamental importance in many applications of finite mixture models. An enormous body of literature exists regarding the application, computational issues and theoretical aspects of mixture models when the number of components is known, but estimating the unknown number of components remains an area of intense research effort. This article presents a semiparametric methodology yielding almost sure convergence of the estimated number of components to the true but unknown number of components. The scope of application is vast, as mixture models are routinely employed across the entire diverse application range of statistics, including nearly all of the social and experimental sciences.

Journal ArticleDOI
TL;DR: In this paper, the authors derived the asymptotic distribution of the sequential empirical process of the squared residuals of an ARCH(p) sequence and showed that in certain applications, including the detection of changes in the distribution of unobservable innovations, their result leads to asymptotically distribution free statistics.
Abstract: We derive the asymptotic distribution of the sequential empirical process of the squared residuals of an ARCH(p) sequence. Unlike the residuals of an ARMA process, these residuals do not behave in this context like asymptotically independent random variables, and the asymptotic distribution involves a term depending on the parameters of the model. We show that in certain applications, including the detection of changes in the distribution of the unobservable innovations, our result leads to asymptotically distribution free statistics.
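To fix ideas, here is how the squared residuals in question arise for an ARCH(1) model; in practice the parameters would be estimated, and it is precisely the effect of that estimation on the sequential empirical process that the paper quantifies. A simulation sketch with illustrative parameter values:

```python
import numpy as np

def simulate_arch1(n, omega, alpha, seed=None):
    """ARCH(1): x_t = sigma_t * z_t, sigma_t^2 = omega + alpha * x_{t-1}^2,
    with i.i.d. standard normal innovations z_t."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n + 1)
    for t in range(1, n + 1):
        x[t] = np.sqrt(omega + alpha * x[t - 1] ** 2) * rng.standard_normal()
    return x[1:]

x = simulate_arch1(5000, omega=0.5, alpha=0.3, seed=0)
# Squared residuals given the (here: true) parameters; the sequential
# empirical process of these quantities is the paper's object of study.
sigma2 = 0.5 + 0.3 * np.concatenate(([0.0], x[:-1] ** 2))
resid2 = x ** 2 / sigma2
```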

Journal ArticleDOI
TL;DR: In this article, a penalized least-squares estimator (PLSE) is built on a data driven selected model among a collection of models which are finite dimensional spaces, and the estimator is adaptive in the minimax sense simultaneously over some family of Besov balls.
Abstract: We study the problem of estimating some unknown regression function in a $\beta$-mixing dependent framework. To this end, we consider some collection of models which are finite dimensional spaces. A penalized least-squares estimator (PLSE) is built on a data driven selected model among this collection. We state nonasymptotic risk bounds for this PLSE and give several examples where the procedure can be applied (autoregression, regression with arithmetically $\beta$-mixing design points, regression with mixing errors, estimation in additive frameworks, estimation of the order of the autoregression). In addition we show that under a weak moment condition on the errors, our estimator is adaptive in the minimax sense simultaneously over some family of Besov balls.

Journal ArticleDOI
TL;DR: In this paper, a process associated with integrated Brownian motion is introduced that characterizes the limit behavior of nonparametric least squares and maximum likelihood estimators of convex functions and convex densities, respectively.
Abstract: A process associated with integrated Brownian motion is introduced that characterizes the limit behavior of nonparametric least squares and maximum likelihood estimators of convex functions and convex densities, respectively. We call this process “the invelope” and show that it is an almost surely uniquely defined function of integrated Brownian motion. Its role is comparable to the role of the greatest convex minorant of Brownian motion plus a parabolic drift in the problem of estimating monotone functions. An iterative cubic spline algorithm is introduced that solves the constrained least squares problem in the limit situation and some results, obtained by applying this algorithm, are shown to illustrate the theory.

Journal ArticleDOI
TL;DR: In this paper, a non-sequential optimal design for a class of nonlinear growth models, which includes the asymptotic regression model, is studied, and the theorem of Perron and Frobenius on primitive matrices is adopted.
Abstract: This paper is concerned with nonsequential optimal designs for a class of nonlinear growth models, which includes the asymptotic regression model. This design problem is intimately related to the problem of finding optimal designs for polynomial regression models with only partially known heteroscedastic structure. In each case, a straightforward application of the usual D-optimality criterion would lead to designs which depend on the unknown underlying parameters. To overcome this undesirable dependence a maximin approach is adopted. The theorem of Perron and Frobenius on primitive matrices plays a crucial role in the analysis.

Journal ArticleDOI
TL;DR: In this paper, the authors study posterior consistency of survival models with neutral to the right process priors, giving sufficient conditions for consistency and, for a class of priors that includes beta processes, a necessary and sufficient condition.
Abstract: Ghosh and Ramamoorthi studied posterior consistency for survival models and showed that the posterior was consistent when the prior on the distribution of survival times was the Dirichlet process prior. In this paper, we study posterior consistency of survival models with neutral to the right process priors which include Dirichlet process priors. A set of sufficient conditions for posterior consistency with neutral to the right process priors are given. Interestingly, not all the neutral to the right process priors have consistent posteriors, but most of the popular priors such as Dirichlet processes, beta processes and gamma processes have consistent posteriors. With a class of priors which includes beta processes, a necessary and sufficient condition for the consistency is also established. An interesting counter-intuitive phenomenon is found. Suppose there are two priors centered at the true parameter value with finite variances. Surprisingly, the posterior with smaller prior variance can be inconsistent, while that with larger prior variance is consistent.