
Showing papers in "Test in 2013"


Journal ArticleDOI
25 Jul 2013-Test
TL;DR: In this article, the authors survey the developments on Goodness-of-Fit for regression models over the last 20 years, from their origins in tests for density and distribution to the most recent advances for complex data and models.
Abstract: This survey intends to collect the developments on Goodness-of-Fit for regression models during the last 20 years, from the very first origins, with the proposals based on the idea of the tests for density and distribution, to the most recent advances for complex data and models. Far from being exhaustive, the contents of this paper focus on two main classes of test statistics: smoothing-based tests (kernel-based) and tests based on empirical regression processes, although other tests based on Maximum Likelihood ideas will also be considered. Starting from the simplest case of testing a parametric family for the regression curves, the contributions in this field also provide testing procedures in semiparametric, nonparametric, and functional models, dealing as well with more complex settings, such as those involving dependent or incomplete data.

161 citations


Journal ArticleDOI
01 Jun 2013-Test
TL;DR: In this paper, a flexible approach to approximate the regression function in the case of a functional predictor and a scalar response is introduced. The terms of the resulting additive decomposition are estimated with a procedure that combines a spline approximation and the one-dimensional Nadaraya–Watson approach.
Abstract: In this paper we introduce a flexible approach to approximate the regression function in the case of a functional predictor and a scalar response. Following the Projection Pursuit Regression principle, we derive an additive decomposition which exploits the most interesting projections of the prediction variable to explain the response. On the one hand, this approach allows us to avoid the well-known curse of dimensionality problem, and, on the other, it can be used as an exploratory tool for the analysis of functional datasets. The terms of such a decomposition are estimated with a procedure that combines a spline approximation and the one-dimensional Nadaraya–Watson approach. The good behavior of our procedure is illustrated from theoretical and practical points of view. Asymptotic results state that the terms in the additive decomposition can be estimated without suffering from the dimensionality problem, while applications to real and simulated data show the high predictive performance of our method.
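
The one-dimensional Nadaraya–Watson smoother used for the terms of the decomposition is standard; below is a minimal sketch with a Gaussian kernel and a fixed bandwidth (both illustrative choices, not prescriptions from the paper):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_eval, h):
    """One-dimensional Nadaraya-Watson estimator with a Gaussian kernel:
    m_hat(x) = sum_i K((x - x_i)/h) y_i / sum_i K((x - x_i)/h)."""
    u = (x_eval[:, None] - x_train[None, :]) / h   # pairwise scaled distances
    w = np.exp(-0.5 * u ** 2)                      # Gaussian kernel weights
    return (w * y_train).sum(axis=1) / w.sum(axis=1)

# Toy usage: recover a smooth curve from noisy observations
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(200)
m_hat = nadaraya_watson(x, y, np.linspace(0.0, 1.0, 50), h=0.05)
```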

88 citations


Journal ArticleDOI
01 Jan 2013-Test
TL;DR: The aim of this paper is to extend the ideas of generalized additive models for multivariate data to functional data covariates by developing a modified version of the local scoring and backfitting algorithms that allows for the nonparametric estimation of the link function.
Abstract: The aim of this paper is to extend the ideas of generalized additive models for multivariate data (with known or unknown link function) to functional data covariates. The proposed algorithm is a modified version of the local scoring and backfitting algorithms that allows for the nonparametric estimation of the link function. The algorithm is illustrated by predicting a binary response in a data example.

56 citations


Journal ArticleDOI
05 Apr 2013-Test
TL;DR: In this paper, the problem of nonparametric regression is revisited with a view towards reaching a fully model-free environment for predictive inference, i.e., point predictors and predictive intervals.
Abstract: The problem of prediction is revisited with a view towards going beyond the typical nonparametric setting and reaching a fully model-free environment for predictive inference, i.e., point predictors and predictive intervals. A basic principle of model-free prediction is laid out based on the notion of transforming a given setup into one that is easier to work with, namely i.i.d. or Gaussian. As an application, the problem of nonparametric regression is addressed in detail; the model-free predictors are worked out, and shown to be applicable under minimal assumptions. Interestingly, model-free prediction in regression is a totally automatic technique that does not necessitate the search for an optimal data transformation before model fitting. The resulting model-free predictive distributions and intervals are compared to their corresponding model-based analogs, and the use of cross-validation is extensively discussed. As an aside, improved prediction intervals in linear regression are also obtained.
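
As a loose illustration of the flavour of distribution-free prediction intervals (not the paper's actual construction, which first transforms the data toward i.i.d.-ness), one can combine a kernel point predictor with empirical quantiles of leave-one-out predictive residuals:

```python
import numpy as np

def predictive_interval(x, y, x0, h, alpha=0.1):
    """Rough sketch of a residual-based predictive interval at x0:
    smooth the data, collect leave-one-out residuals, and place their
    empirical quantiles around the point prediction.  Illustrative
    only; NOT the full model-free construction of the paper."""
    n = len(x)
    resid = np.empty(n)
    for i in range(n):                      # leave-one-out residuals
        mask = np.arange(n) != i
        w = np.exp(-0.5 * ((x[i] - x[mask]) / h) ** 2)
        resid[i] = y[i] - (w * y[mask]).sum() / w.sum()
    w0 = np.exp(-0.5 * ((x0 - x) / h) ** 2)
    point = (w0 * y).sum() / w0.sum()       # kernel point predictor at x0
    lo, hi = np.quantile(resid, [alpha / 2, 1 - alpha / 2])
    return point + lo, point + hi
```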

46 citations


Journal ArticleDOI
01 Jun 2013-Test
TL;DR: Different penalized spline estimations of the functional logit model are proposed in this paper, based on smoothed functional PCA and/or a discrete penalty in the log-likelihood criterion in terms of B-spline expansions of the sample curves and the functional parameter.
Abstract: The problem of multicollinearity associated with the estimation of a functional logit model can be solved by using as predictor variables a set of functional principal components. The functional parameter estimated by functional principal component logit regression is often nonsmooth and then difficult to interpret. To solve this problem, different penalized spline estimations of the functional logit model are proposed in this paper. All of them are based on smoothed functional PCA and/or a discrete penalty in the log-likelihood criterion in terms of B-spline expansions of the sample curves and the functional parameter. The ability of these smoothing approaches to provide an accurate estimation of the functional parameter and their classification performance with respect to unpenalized functional PCA and LDA-PLS are evaluated via simulation and application to real data. Leave-one-out cross-validation and generalized cross-validation are adapted to select the smoothing parameter and the number of principal components or basis functions associated with the considered approaches.
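
A generic building block behind such proposals is a penalized log-likelihood over a B-spline basis. The sketch below fits a P-spline logistic regression for a scalar covariate by penalized IRLS with a second-order difference penalty (Eilers–Marx style); it illustrates the penalty idea only, not the functional-PCA estimators of the paper:

```python
import numpy as np
from scipy.interpolate import BSpline

def pspline_logit(x, y, n_knots=20, degree=3, lam=1.0, n_iter=25):
    """Logistic regression on a B-spline basis with a discrete
    second-order difference penalty, fitted by penalized IRLS."""
    knots = np.r_[[x.min()] * degree,
                  np.linspace(x.min(), x.max(), n_knots),
                  [x.max()] * degree]
    nb = len(knots) - degree - 1
    B = BSpline(knots, np.eye(nb), degree)(x)      # basis evaluated at x
    D = np.diff(np.eye(nb), n=2, axis=0)           # second differences
    P = lam * (D.T @ D)
    beta = np.zeros(nb)
    for _ in range(n_iter):                        # penalized IRLS
        mu = 1.0 / (1.0 + np.exp(-(B @ beta)))
        W = np.maximum(mu * (1.0 - mu), 1e-10)
        z = B @ beta + (y - mu) / W                # working response
        beta = np.linalg.solve(B.T @ (W[:, None] * B) + P, B.T @ (W * z))
    return BSpline(knots, beta, degree)            # fitted linear predictor
```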

28 citations


Journal ArticleDOI
01 Mar 2013-Test
TL;DR: In this paper, a back-fitting algorithm to attain the maximum penalized likelihood estimates (MPLEs) by using natural cubic smoothing splines is presented, and sufficient conditions on the existence of the MPLEs are presented as well as some inferential results and discussions on degrees of freedom and smoothing parameter estimation.
Abstract: In this paper we discuss estimation and diagnostic procedures in semiparametric additive models with symmetric errors in order to permit distributions with heavier and lighter tails than the normal ones, such as Student-t, Pearson VII, power exponential, logistic I and II, and contaminated normal, among others. Such models belong to the general class of statistical models GAMLSS proposed by Rigby and Stasinopoulos (Appl. Stat. 54:507–554, 2005). A back-fitting algorithm to attain the maximum penalized likelihood estimates (MPLEs) by using natural cubic smoothing splines is presented. In particular, the score functions and Fisher information matrices for the parameters of interest are expressed in a notation similar to that used in parametric symmetric models. Sufficient conditions on the existence of the MPLEs are presented, as well as some inferential results and discussions on degrees of freedom and smoothing parameter estimation. Diagnostic quantities such as leverage, standardized residuals, and normal curvatures of local influence under two perturbation schemes are derived. A real data set previously analyzed under normal linear models is reanalyzed under semiparametric additive models with symmetric errors.
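
The core of the back-fitting idea is to cycle through the additive components, smoothing partial residuals until convergence; a bare-bones sketch (any one-dimensional smoother can be plugged in, whereas the paper uses natural cubic smoothing splines within a penalized likelihood):

```python
import numpy as np

def backfit(X, y, smoother, n_iter=20):
    """Generic backfitting for an additive model y = a + sum_j f_j(x_j) + e.
    `smoother(xj, r)` returns fitted values of any 1-D smoother of the
    partial residuals r on xj.  Illustrative only."""
    n, p = X.shape
    alpha = y.mean()
    F = np.zeros((n, p))                  # current estimates of f_j(x_ij)
    for _ in range(n_iter):
        for j in range(p):
            r = y - alpha - F.sum(axis=1) + F[:, j]   # partial residuals
            F[:, j] = smoother(X[:, j], r)
            F[:, j] -= F[:, j].mean()     # center for identifiability
    return alpha, F

# Example plug-in smoother: a Gaussian-kernel smoother at the sample points
def kernel_smoother(xj, r, h=0.2):
    w = np.exp(-0.5 * ((xj[:, None] - xj[None, :]) / h) ** 2)
    return (w * r).sum(axis=1) / w.sum(axis=1)
```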

27 citations


Journal ArticleDOI
17 Jul 2013-Test
TL;DR: A novel two-stage benchmarking methodology using a single weighted squared error loss function that combines the loss at the unit level and the area level without any specific distributional assumptions is developed.
Abstract: There has been recent growth in small area estimation due to the need for more precise estimation of small geographic areas, which has led to groups such as the U.S. Census Bureau, Google, and the RAND Corporation utilizing small area estimation procedures. We develop a novel two-stage benchmarking methodology using a single weighted squared error loss function that combines the loss at the unit level and the area level without any specific distributional assumptions. This loss is considered while benchmarking the weighted means at each level, or both the weighted means and weighted variability at the unit level. Furthermore, we provide multivariate extensions for benchmarking weighted means at both levels. The behavior of our methods is analyzed using a complex study from the National Health Interview Survey (NHIS) from 2000, which estimates the proportion of people without health insurance for many domains of an Asian subpopulation. Finally, the methodology is explored via simulated data under the proposed model. Ultimately, the three proposed benchmarked Bayes estimators do not dominate each other, leaving much room for further exploration of such complex studies, such as the choice of weights, optimal algorithms for efficiency, and extensions to multi-stage benchmarking methods.
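
For orientation, the simplest single-constraint version of benchmarking under weighted squared error loss (a standard result, not the two-stage construction of the paper) adjusts the Bayes estimates by a uniform shift:

```latex
% Minimize the weighted squared error loss subject to the benchmark T,
% with weights w_i > 0 summing to one:
\min_{t_1,\dots,t_m} \sum_{i=1}^m w_i\,(t_i - \hat\theta_i^B)^2
\quad \text{subject to} \quad \sum_{i=1}^m w_i\, t_i = T,
% whose solution shifts every Bayes estimate \hat\theta_i^B by the same amount:
t_i^{\mathrm{BM}} = \hat\theta_i^B + \Bigl(T - \sum_{j=1}^m w_j\,\hat\theta_j^B\Bigr).
```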

21 citations


Journal ArticleDOI
18 Jan 2013-Test
TL;DR: In this paper, a copula-graphic estimator is proposed for censored survival data, where dependent censoring is modeled through an Archimedean copula function, which is supposed to be known.
Abstract: In this paper, a copula-graphic estimator is proposed for censored survival data. It is assumed that there is some dependent censoring acting on the variable of interest that may come from an existing competing risk. Furthermore, the full process is independently censored by some administrative censoring time. The dependent censoring is modeled through an Archimedean copula function, which is supposed to be known. An asymptotic representation of the estimator as a sum of independent and identically distributed random variables is obtained, and, consequently, a central limit theorem is established. We investigate the finite sample performance of the estimator through simulations. A real data illustration is included.
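
For a known Archimedean copula with generator φ, the copula-graphic estimator of Rivest and Wells has a closed form that reduces to Kaplan–Meier when φ(u) = −log u. A sketch under an assumed Clayton copula with known parameter θ (no tied observation times):

```python
import numpy as np

def copula_graphic(times, uncensored, theta):
    """Rivest-Wells copula-graphic survival estimator under a known
    Clayton copula, generator phi(u) = (u**-theta - 1)/theta, theta > 0,
    linking the lifetime and the dependent censoring time.
    Assumes no ties; illustrative sketch only."""
    phi = lambda u: (u ** (-theta) - 1.0) / theta
    phi_inv = lambda s: (1.0 + theta * s) ** (-1.0 / theta)
    order = np.argsort(times)
    t = np.asarray(times, float)[order]
    d = np.asarray(uncensored)[order].astype(bool)
    n = len(t)
    above = (n - np.arange(1, n + 1)) / n    # fraction of sample above t_(i)
    before = (n - np.arange(n)) / n          # fraction above t_(i) just before it
    jump = np.where(d, phi(np.maximum(above, 1e-10)) - phi(before), 0.0)
    return t, phi_inv(np.cumsum(jump))       # S_hat at the ordered times
```

Sanity check of the form: replacing φ by −log u turns the cumulative sum into a product of (n−i)/(n−i+1) over uncensored times, i.e. the Kaplan–Meier estimator.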

20 citations


Journal ArticleDOI
18 Jul 2013-Test
TL;DR: In this article, two different forms of multivariate prior are derived from the elicited univariate beta distributions for the probability of each category, conditional on the probabilities of other categories.
Abstract: This paper addresses the task of eliciting an informative prior distribution for multinomial models. We first introduce a method of eliciting univariate beta distributions for the probability of each category, conditional on the probabilities of other categories. Two different forms of multivariate prior are derived from the elicited beta distributions. First, we determine the hyperparameters of a Dirichlet distribution by reconciling the assessed parameters of the univariate beta conditional distributions. Although the Dirichlet distribution is the standard conjugate prior distribution for multinomial models, it is not flexible enough to represent a broad range of prior information. Second, we use the beta distributions to determine the parameters of a Connor–Mosimann distribution, which is a generalization of a Dirichlet distribution and is also a conjugate prior for multinomial models. It has a larger number of parameters than the standard Dirichlet distribution and hence a more flexible structure. The elicitation methods are designed to be used with the aid of interactive, user-friendly graphical software.
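
One crude way to reconcile elicited marginal beta assessments into a Dirichlet prior is moment matching, exploiting the fact that a Dirichlet(α) has Beta(α_i, α_0 − α_i) marginals with α_0 = Σα_i. This is only a simplified stand-in for the paper's reconciliation procedure:

```python
import numpy as np

def dirichlet_from_betas(a, b):
    """Crude moment-matching of elicited marginal Beta(a_i, b_i)
    assessments to a Dirichlet(alpha): match the elicited means and
    take the common concentration alpha_0 as the average elicited
    'sample size' a_i + b_i.  Not the paper's method."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    means = a / (a + b)
    means /= means.sum()                  # force the means to sum to one
    alpha0 = (a + b).mean()               # common concentration parameter
    return alpha0 * means

alpha = dirichlet_from_betas([2, 3, 1], [8, 7, 9])
```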

19 citations


Journal ArticleDOI
04 Jul 2013-Test
TL;DR: In this paper, a reweighted least trimmed squares (RLTS) estimator is proposed that employs data-dependent weights determined from an initial robust fit, and the RLTS estimator preserves robust properties of the initial robust estimate even if errors exhibit heteroscedasticity, asymmetry or serial correlation.
Abstract: A new class of robust regression estimators is proposed that forms an alternative to traditional robust one-step estimators and that achieves the $\sqrt{n}$ rate of convergence irrespective of the initial estimator under a wide range of distributional assumptions. The proposed reweighted least trimmed squares (RLTS) estimator employs data-dependent weights determined from an initial robust fit. Just like many existing one- and two-step robust methods, the RLTS estimator preserves robust properties of the initial robust estimate. However, contrary to existing methods, the first-order asymptotic behavior of RLTS is independent of the initial estimate even if errors exhibit heteroscedasticity, asymmetry, or serial correlation. Moreover, we derive the asymptotic distribution of RLTS and show that it is asymptotically efficient for normally distributed errors. A simulation study documents the benefits of these theoretical properties in finite samples.
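
The generic reweighting step that such two-step estimators build on can be sketched as hard rejection of observations with large standardized residuals from the initial robust fit (the RLTS paper uses adaptive, data-dependent weights rather than this fixed cutoff):

```python
import numpy as np

def reweighted_ls(X, y, beta_init, cutoff=2.5):
    """Hard-rejection reweighting step of a two-step robust estimator.
    Given ANY initial robust fit `beta_init` (e.g. least trimmed
    squares), observations whose standardized residuals exceed `cutoff`
    get weight zero, and ordinary least squares is run on the rest.
    Sketches the generic idea only, not the RLTS weighting scheme."""
    r = y - X @ beta_init
    scale = 1.4826 * np.median(np.abs(r - np.median(r)))  # MAD scale
    keep = np.abs(r) <= cutoff * scale
    beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    return beta, keep
```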

15 citations


Journal ArticleDOI
01 Sep 2013-Test
TL;DR: The rather new concept of restricted breakdown point will demonstrate that the TCLUST procedure resists a proportion α of contamination as soon as the data set is sufficiently “well clustered”.
Abstract: Clustering procedures allowing for general covariance structures of the obtained clusters need some constraints on the solutions. With this in mind, several proposals have been introduced in the literature. The TCLUST procedure works with a restriction on the “eigenvalues ratio” of the clusters' scatter matrices. In order to achieve robustness with respect to outliers, the procedure allows trimming off a proportion α of the most outlying observations. The resistance of TCLUST to infinitesimal contamination has already been studied. This paper aims to look at its resistance to a higher amount of contamination by means of the study of its breakdown behavior. The rather new concept of restricted breakdown point will demonstrate that the TCLUST procedure resists a proportion α of contamination as soon as the data set is sufficiently “well clustered”.
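
The trimming idea can be illustrated with trimmed k-means, a simpler relative of TCLUST that discards the proportion α of points farthest from their nearest center (TCLUST additionally fits full scatter matrices under the eigenvalue-ratio constraint):

```python
import numpy as np

def trimmed_kmeans(X, k, alpha, n_iter=50, seed=0):
    """Trimmed k-means: at each iteration the fraction alpha of points
    with the largest distance to their nearest center is excluded from
    the center updates.  A simplified cousin of TCLUST."""
    rng = np.random.default_rng(seed)
    n = len(X)
    keep_n = int(np.ceil((1 - alpha) * n))
    centers = X[rng.choice(n, k, replace=False)].astype(float)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
        nearest, dist = d2.argmin(1), d2.min(1)
        keep = np.argsort(dist)[:keep_n]          # trim worst alpha fraction
        for j in range(k):
            pts = X[keep][nearest[keep] == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers, nearest, keep
```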

Journal ArticleDOI
01 Mar 2013-Test
TL;DR: A generalized Pólya urn model is introduced with the feature that the evolution of the urn is governed by a function which may change depending on the stage of the process, and a Strong Law of Large Numbers and a Central Limit Theorem are obtained.
Abstract: We introduce a generalized Pólya urn model with the feature that the evolution of the urn is governed by a function which may change depending on the stage of the process, and we obtain a Strong Law of Large Numbers and a Central Limit Theorem for this model, using stochastic recurrence techniques. This model is used to represent the evolution of a family of acyclic directed graphs, called random circuits, which can be seen as logic circuits. The model provides asymptotic results for the number of outputs, that is, terminal nodes, of this family of random circuits.
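
A small simulation conveys the setting: a two-colour urn whose reinforcement is governed by a stage-dependent function (the specific `add_fn` below is a made-up illustration, not the paper's function):

```python
import numpy as np

def polya_urn(n_draws, add_fn, init=(1, 1), seed=0):
    """Simulate a two-colour generalized Polya urn: after each draw,
    `add_fn(stage)` balls of the drawn colour are added, so the
    reinforcement may change with the stage of the process."""
    rng = np.random.default_rng(seed)
    urn = np.array(init, dtype=float)
    path = []
    for stage in range(n_draws):
        colour = rng.random() < urn[0] / urn.sum()    # draw a ball
        urn[0 if colour else 1] += add_fn(stage)      # stage-dependent reinforcement
        path.append(urn[0] / urn.sum())
    return np.array(path)

# proportion of colour 0, with reinforcement that changes after stage 100
trace = polya_urn(10_000, add_fn=lambda s: 2 if s < 100 else 1)
```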

Journal ArticleDOI
01 Mar 2013-Test
TL;DR: In this paper, the authors deal with the problem of estimating the support S of a probability distribution under shape restrictions, where the shape restriction is an extension of the notion of convexity named α-convexity.
Abstract: In this work we deal with the problem of estimating the support S of a probability distribution under shape restrictions. The shape restriction we deal with is an extension of the notion of convexity named α-convexity. Instead of assuming, as in the convex case, the existence of a separating hyperplane for each exterior point of S, we assume the existence of a separating open ball with radius α. Given an α-convex set S, the α-convex hull of independent random points in S is the natural estimator of the set. If α is unknown, the $r_n$-convex hull of the sample can be considered, where $r_n$ is a sequence of positive numbers. We analyze the asymptotic properties of the $r_n$-convex hull estimator in the bidimensional case and obtain the convergence rate for the expected distance in measure between the set and the estimator. The geometrical complexity of the estimator and its dependence on $r_n$ are also obtained via the analysis of the expected number of vertices of the $r_n$-convex hull.

Journal ArticleDOI
04 Jun 2013-Test
TL;DR: In this article, a multivariate time series model by generalizing the ARMAX process is defined, and conditions on stationarity and analysis of local dependence and domains of attraction are given.
Abstract: We define a new multivariate time series model by generalizing the ARMAX process in a multivariate way. We give conditions on stationarity and analyze local dependence and domains of attraction. As a consequence of the obtained results, we derive new multivariate extreme value distributions. We characterize the extremal dependence by computing the multivariate extremal index and bivariate upper tail dependence coefficients. An estimation procedure for the multivariate extremal index is presented. We also address the marginal estimation and propose a new estimator for the ARMAX autoregressive parameter.
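
The classical univariate ARMAX recursion being generalized is easy to simulate; with unit-Fréchet innovations the stationary margin is unit Fréchet and the extremal index equals 1 − c:

```python
import numpy as np

def armax_path(n, c, seed=0):
    """Simulate the classical (univariate) ARMAX process
        X_t = max(c * X_{t-1}, (1 - c) * Z_t),  0 < c < 1,
    with unit-Frechet innovations Z_t.  The paper generalizes this
    recursion to a multivariate setting."""
    rng = np.random.default_rng(seed)
    z = -1.0 / np.log(rng.random(n))      # unit-Frechet draws via inverse CDF
    x = np.empty(n)
    x[0] = z[0]
    for t in range(1, n):
        x[t] = max(c * x[t - 1], (1 - c) * z[t])
    return x
```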

Journal ArticleDOI
01 Mar 2013-Test
TL;DR: In this paper, the authors consider optimal experimental designs for models with correlated observations through a covariance function depending on the magnitude of the responses and show that there exists a huge class of functions that, composed with the mean of the process in some way, preserves positive definiteness and can be used for the purposes of modeling and computing optimal designs in more realistic situations.
Abstract: This paper considers optimal experimental designs for models with correlated observations through a covariance function depending on the magnitude of the responses. This suggests the use of stochastic processes whose covariance structure is a function of the mean. Covariance functions must be positive definite. This fact is nontrivial in this context and constitutes one of the challenges of the present paper. We show that there exists a huge class of functions that, composed with the mean of the process in some way, preserves positive definiteness and can be used for the purposes of modeling and computing optimal designs in more realistic situations. We offer some examples for an easy construction of such covariances and then study the problem of locally D-optimal designs through an illustrative example as well as a real radiation retention model in the human body.

Journal ArticleDOI
08 Feb 2013-Test
TL;DR: In this paper, a U-statistics-based test for null variance components in linear mixed models was proposed and obtained its asymptotic distribution (for increasing number of units) under mild regularity conditions that include only the existence of the second moment for the random effects and of the fourth moment for conditional errors.
Abstract: We propose a U-statistics-based test for null variance components in linear mixed models and obtain its asymptotic distribution (for increasing number of units) under mild regularity conditions that include only the existence of the second moment for the random effects and of the fourth moment for the conditional errors. We employ contiguity arguments to derive the distribution of the test under local alternatives assuming additionally the existence of the fourth moment of the random effects. Our proposal is easy to implement and may be applied to a wide class of linear mixed models. We also consider a simulation study to evaluate the behaviour of the U-test in small and moderate samples and compare its performance with that of exact F-tests and of generalized likelihood ratio tests obtained under the assumption of normality. A practical example in which the normality assumption is not reasonable is included as illustration.

Journal ArticleDOI
15 Aug 2013-Test
TL;DR: In this article, the basic distribution theory of δ-record values, $R_{n,\delta}$, δ≤0, from a sequence of independent and identically distributed random variables from an absolutely continuous parent is presented.
Abstract: We present the basic distribution theory of δ-record values, $R_{n,\delta}$, δ≤0, from a sequence of independent and identically distributed random variables from an absolutely continuous parent. We obtain recurrent formulas for the density function of $R_{n,\delta}$ and a representation for this random variable that in some sense is similar to Tata's representation for ordinary records. We also give the probability function of inter-δ-record times and some of its properties. We give examples of our results and also some elements of inference based on δ-records.
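
δ-records are straightforward to extract from a sample: X_n is a δ-record when it exceeds the previous maximum plus δ, so δ ≤ 0 weakens the usual record condition. A small sketch:

```python
import numpy as np

def delta_record_times(x, delta):
    """Indices n at which X_n > max(X_1, ..., X_{n-1}) + delta, i.e.
    the delta-records of the sequence.  The first observation is
    counted as a record by convention."""
    records, running_max = [0], x[0]
    for n in range(1, len(x)):
        if x[n] > running_max + delta:
            records.append(n)
        running_max = max(running_max, x[n])
    return records

rng = np.random.default_rng(1)
idx = delta_record_times(rng.standard_normal(1000), delta=-0.1)
```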

Journal ArticleDOI
23 Jul 2013-Test
TL;DR: In this article, a nonparametric copula function estimator for two consecutive survival data which are subject to truncation and right censorship is provided, along with an extension of Spearman's Rho and Kendall's Tau to the present situation.
Abstract: In the analysis of medical data the whole lifetime is often split into pieces characterizing the various stages in the development of a chronic disease. In this paper we provide a nonparametric copula function estimator for two consecutive survival data which are subject to truncation and right censorship. We also discuss an extension of Spearman’s Rho and Kendall’s Tau to the present situation.

Journal ArticleDOI
25 Jul 2013-Test
TL;DR: In this rejoinder, the authors respond to the discussion of their survey and conclude that there is still much to say about Goodness-of-Fit (GoF) tests for regression models.
Abstract: First of all, we would like to thank the discussants for reading our paper and for taking time to prepare such interesting and valuable contributions. The feeling, after revising the discussions and going back to the original version of the paper, is that there is still too much to say about Goodness-of-Fit (GoF) tests for regression models, and the discussants have given a good proof of this. Although they qualify the review as

Journal ArticleDOI
25 Jul 2013-Test
TL;DR: In this paper, Gonzalez-Manteiga and Crujeiras discuss the problem of nonparametric goodness-of-fit testing when the null hypothesis is non- or semiparametric.
Abstract: We discuss the following two particular aspects of the paper of Gonzalez-Manteiga and Crujeiras (doi:10.1007/s11749-013-0327-5): First, what changes if the null hypothesis is non- or semiparametric? For example, Rodriguez-Poo et al. (A practical test for misspecification in regression: functional form, separability, and distribution. Econom. Theory, 2013, under revision) considered optimal rates of adaptive nonparametric tests when the null model is semiparametric. A second, though related, question is: how serious are the bandwidth and calibration problems? Sperlich (On the choice of regularization parameters in specification testing: a critical discussion. Empir. Econ., 2013, forthcoming) has shown that the unsolved bandwidth selection problems, in particular when calibrating, render nonparametric specification tests useless in practice. Two additional questions are only raised briefly and concern (a) the computational aspects, and (b) the problem that, asymptotically, nonparametric omnibus tests might reject almost any null hypothesis, as probably no parametric or semiparametric model is 100 % correct. But it may be a reasonable and useful approximation. To this aim we recall the idea of testing so-called ‘precise hypotheses’ as outlined in Dette (Ann. Stat. 27:1012–1040, 1999) for nonparametric goodness-of-fit tests.

Journal ArticleDOI
01 Mar 2013-Test
TL;DR: In this article, the authors proposed several new goodness-of-fit tests for normality based on the distance between the observed sample and the predictive sample drawn from the posterior predictive distribution.
Abstract: In this paper, we propose several new goodness-of-fit tests for normality based on the distance between the observed sample and a predictive sample drawn from the posterior predictive distribution. Note that the predictive sample is stochastic for a given set of sample observations, the distance consequently being random. To circumvent the randomness, we choose the conditional expectation and the qth quantile as the test statistics. The two statistics are related to the well-known Shapiro–Francia test, and their asymptotic distributions are derived. The simulation study shows that the new tests are better able to discriminate between the normal distribution and heavy-tailed or mixed normal distributions. Against those alternatives, the new tests are more powerful than existing tests, including the Anderson–Darling test and the Shapiro–Wilk test, which are two of the best tests of normality in the literature.
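
The idea of comparing the observed sample with predictive draws can be mimicked with a plug-in normal predictive in place of the exact posterior predictive, and Monte Carlo averaging in place of the conditional expectation (an illustrative simplification of the paper's statistics):

```python
import numpy as np

def predictive_distance_stat(x, n_rep=500, seed=0):
    """Average L2 distance between the sorted observed sample and sorted
    draws from a plug-in normal predictive distribution.  A simplified
    stand-in for the paper's posterior-predictive construction."""
    rng = np.random.default_rng(seed)
    n = len(x)
    xs = np.sort(x)
    m, s = x.mean(), x.std(ddof=1)        # plug-in normal parameters
    dists = np.empty(n_rep)
    for r in range(n_rep):
        pred = np.sort(m + s * rng.standard_normal(n))
        dists[r] = np.mean((xs - pred) ** 2)   # distance of sorted samples
    return dists.mean()                   # Monte Carlo conditional expectation
```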

Journal ArticleDOI
01 Mar 2013-Test
TL;DR: In this article, rank-based tests for the two-sample problem with doubly-truncated data were proposed for both nonparametric and semiparametric approaches, where the truncation distribution is parameterized, while the lifetime distribution is left unspecified.
Abstract: A class of rank-based tests is proposed for the two-sample problem with doubly-truncated data. We consider both nonparametric and semiparametric approaches, where the truncation distribution is parameterized, while the lifetime distribution is left unspecified. The asymptotic distribution theory of the test is presented. The small-sample performance of the test is investigated under a variety of situations by means of Monte Carlo simulations. The proposed tests are illustrated using the CDC AIDS Blood Transfusion Data.

Journal ArticleDOI
05 Apr 2013-Test
TL;DR: Discussing the paper “Model-free model-fitting and predictive distributions” by Politis (2013), the authors propose extending the procedure to semiparametric and parametric mixed effects models (MEM), since in practice these are probably the most popular models for prediction.
Abstract: Discussing the paper “Model-free model-fitting and predictive distributions” by Politis (2013), we propose to extend this procedure to semiparametric and parametric mixed effects models (MEM) as in practice, these are probably the most popular ones for prediction. Specifically, combining Politis’ prediction method with procedures from Lombardia and Sperlich (Comput. Stat. Data Anal. 56:2903–2917, 2012) and Gonzalez-Manteiga et al. (J. Multivar. Anal. 114:288–302, 2013) yields new MEM-based MF/MB point and interval predictors which can be used for example for small area statistics. Combining Politis’ idea with nonparametric matching estimators may also yield improved (point and interval) estimators for treatment effects and policy evaluation.

Journal ArticleDOI
01 Feb 2013-Test
TL;DR: In this article, the authors considered the problem of estimating the upcrossings index η for a class of stationary sequences satisfying a mild oscillation restriction, and proposed an estimator for the proposed estimator.
Abstract: For stationary sequences, under general dependence restrictions, any limiting point process for time normalized upcrossings of high levels is a compound Poisson process, i.e., there is a clustering of high upcrossings, where the underlying Poisson points represent cluster positions and the multiplicities correspond to cluster sizes. For such classes of stationary sequences, there exists the upcrossings index η, 0≤η≤1, which is directly related to the extremal index θ, 0≤θ≤1, for suitable high levels. In this paper, we consider the problem of estimating the upcrossings index η for a class of stationary sequences satisfying a mild oscillation restriction. For the proposed estimator, properties such as consistency and asymptotic normality are studied. Finally, the performance of the estimator is assessed through simulation studies for autoregressive processes and case studies in the fields of environment and finance. Comparisons with other estimators derived from well known estimators of the extremal index are also presented.
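
Upcrossings of a level u are the time points t with X_t ≤ u < X_{t+1}; a naive runs-type declustering of them (a generic analogue of extremal-index runs estimators, not the estimator proposed in the paper) looks like this:

```python
import numpy as np

def upcrossings(x, u):
    """Time points t with X_t <= u < X_{t+1} (upcrossings of level u)."""
    return np.flatnonzero((x[:-1] <= u) & (x[1:] > u))

def runs_type_estimate(x, u, run_length):
    """Fraction of upcrossings that start a new cluster, where an
    upcrossing within `run_length` of the previous one is assigned to
    the same cluster.  Illustrative runs-type declustering only."""
    t = upcrossings(x, u)
    if len(t) == 0:
        return np.nan
    new_cluster = np.r_[True, np.diff(t) > run_length]
    return new_cluster.sum() / len(t)
```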

Journal ArticleDOI
01 Mar 2013-Test
TL;DR: In this paper, a new consistent asymptotically distribution-free test for independence of the components of bivariate random variables is proposed, which combines methods of order-selection tests with nonparametric copula density estimation.
Abstract: We suggest a new consistent asymptotically distribution-free test for independence of the components of bivariate random variables. The approach combines methods of order-selection tests with nonparametric copula density estimation. We deduce the asymptotic distribution of the test statistic and investigate the small sample performance by means of a simulation study and a data application.


Journal ArticleDOI
01 Jan 2013-Test
TL;DR: In this article, a robust bandwidth selection and bias reduction procedure was proposed to reduce the variance and mean squared error of quantile-based estimators in small data sets, which can be used for outlier detection in skewed distributions.
Abstract: Many univariate robust estimators are based on quantiles. As already theoretically pointed out by Fernholz (in J. Stat. Plan. Inference 57(1), 29–38, 1997), smoothing the empirical distribution function with an appropriate kernel and bandwidth can reduce the variance and mean squared error (MSE) of some quantile-based estimators in small data sets. In this paper we apply this idea to several robust estimators of location, scale, and skewness. We propose a robust bandwidth selection and bias reduction procedure. We show that the use of this smoothing method indeed leads to smaller MSEs, even at contaminated data sets. In particular, we obtain better performance for the medcouple, which is a robust measure of skewness that can be used for outlier detection in skewed distributions.
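
The underlying device is the kernel-smoothed empirical distribution $F_h(t) = n^{-1}\sum_i \Phi((t - X_i)/h)$, whose quantiles replace the raw sample quantiles; a minimal sketch with a Gaussian kernel and a hand-picked bandwidth (the paper selects the bandwidth robustly):

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def smoothed_quantile(x, p, h):
    """p-quantile of the kernel-smoothed empirical distribution
    F_h(t) = mean(Phi((t - X_i)/h)), found by root-finding."""
    F = lambda t: norm.cdf((t - x) / h).mean() - p
    lo, hi = x.min() - 4 * h, x.max() + 4 * h   # bracket containing the root
    return brentq(F, lo, hi)

# e.g. smoothed quartiles for a small sample
rng = np.random.default_rng(2)
data = rng.standard_normal(20)
q1, q2, q3 = (smoothed_quantile(data, p, h=0.3) for p in (0.25, 0.5, 0.75))
```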


Journal ArticleDOI
25 Jul 2013-Test
TL;DR: With their expertise in GoF, Professors Gonzalez-Manteiga and Crujeiras (GMC) have succeeded in providing an up-to-date review of methods and corresponding theory of tests dealing mostly with the structure of the regression function, and in doing so manage to cover the basic parametric regression model as well as alternative models which may be generally described as nonparametric or semiparametric.
Abstract: First, I would like to thank the authors for providing a thorough review of Goodness-of-Fit (GoF) tests for regression models. It is truly an achievement to cover such a wide spectrum of GoF methods within a paper of reasonable length. With their expertise in GoF, Professors Gonzalez-Manteiga and Crujeiras (GMC) have succeeded in providing an up-to-date review of methods and corresponding theory of tests dealing mostly with the structure of the regression function, and in doing so manage to cover the basic parametric regression model as well as alternative models which may be generally described as nonparametric or semiparametric. They also consider, apart from the typical scenario of independent observations, GoF tests with dependence, and tests under more complex data structures. In what follows I will try to discuss in more detail specific alternative aspects of the regression model. For these aspects one may formulate corresponding hypotheses that may also be included under the general heading ‘GoF for regression’. In addition, I will try to provide some insight on methods of GoF which utilize the characteristic function (CF).

Journal ArticleDOI
Xin-Bing Kong
30 Jul 2013-Test
TL;DR: A direct approach to estimate the risk for vast portfolios using asynchronous and noisy high-frequency data and it is demonstrated that the mean squared error of the risk estimator can be decreased by choosing an optimal tuning parameter depending on the allocation plan.
Abstract: The traditional estimated risk of Markowitz mean-variance optimization is well known to depart seriously from its theoretical optimal risk due to the accumulation of input estimation errors. Fan et al. (in J. Am. Stat. Assoc. 107:592–606, 2012a) addressed the problem by introducing the gross-exposure constrained mean-variance portfolio selection. In this paper, we present a direct approach to estimating the risk of vast portfolios using asynchronous and noisy high-frequency data. This approach alleviates the accumulation of estimation errors from tens of hundreds of integrated volatilities (or co-volatilities) and, on the other hand, has the advantage of smoothing away the microstructure noise in the spatial direction. Based on this simple approach, together with the “pre-averaging” technique, we obtain a sharper bound on the risk approximation error than that in Fan et al. (in J. Am. Stat. Assoc. 107:412–428, 2012b). This bound is locally dependent on the allocation plan satisfying the gross-exposure constraint. The bound does not require an exponential tail for the distribution of the microstructure noise; a finite fourth moment suffices. Our work also demonstrates that the mean squared error of the risk estimator can be decreased by choosing an optimal tuning parameter depending on the allocation plan. This is more pronounced for moderately high-frequency data. Our theoretical results are further confirmed by simulations.
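
The pre-averaging device can be conveyed in a few lines: weight intra-block returns by the triangular kernel g(s) = min(s, 1 − s) before squaring, which smooths away microstructure noise. The sketch below omits the noise bias correction and the exact constants of the full estimator, so it only conveys the smoothing idea:

```python
import numpy as np

def preaveraged_rv(returns, k):
    """Crude pre-averaging illustration: pre-average returns over
    overlapping blocks of length k-1 with triangular weights, then sum
    the squares and normalize by the squared-weight constant.  The
    microstructure-noise bias correction is deliberately omitted."""
    j = np.arange(1, k)
    g = np.minimum(j / k, 1 - j / k)               # triangular kernel weights
    n = len(returns)
    pre = np.array([g @ returns[i:i + k - 1] for i in range(n - k + 2)])
    psi2 = (g ** 2).sum()                          # normalizing constant
    return (pre ** 2).sum() / psi2

# e.g. preaveraged_rv(np.diff(np.log(prices)), k=30) on high-frequency prices
```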