A comparison of different nonparametric methods for inference on additive models
Abstract: In this article, we highlight the main differences among available methods for the analysis of regression functions that are possibly additively separable. We first discuss the definition and interpretation of the most common estimators used in practice, explaining the different modeling ideas behind each estimator as well as what the procedures do to the data. Computational aspects are mentioned explicitly. The discussion concludes with a simulation study of the mean squared error for different marginal integration approaches. Next, various test statistics for checking additive separability are introduced, together with their asymptotic theory. We perform a detailed simulation study for these statistics combined with different smoothing and bootstrap methods. A main focus in the reported results is the (non)reliability of the methods when the covariates are strongly correlated among themselves. We found that the most striking differences lie in the different pre-smoothers that are used, but less in the differe...
Summary (3 min read)
- In the last ten years additive models have attracted an increasing amount of interest in nonparametric statistics.
- A consequence could be to prefer marginal integration for the construction of additivity tests.
- Further, Dette and Munk (1998) pointed out several drawbacks in the application of Fourier series estimation for checking model assumptions.
- Therefore the present article is mainly concerned with the practical performance of the different procedures and with providing a better understanding of some of the above-mentioned problems in estimating and testing.
- The authors will investigate and explain that the use of the internalized Nadaraya–Watson estimator for the marginal integration can partly ameliorate this problem.
2 Marginal Integration and Additive Models
- Further, m, σ are unknown functions and the regression function m(·) has to be estimated nonparametrically.
- Notice first that in case of additivity, i.e. there exist functions mα, m−α such that m(X) = mα(Xα) + m−α(X−α) (2.2) with X−α being the vector X without the component Xα, the marginal impact of Xα corresponds exactly to the additive component mα.
- The authors estimate the right-hand side of equation (2.3) by replacing the expectation by an average and the unknown multidimensional regression function m by a pre-smoother m̃.
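The recipe in this bullet can be sketched in code: fit a full-dimensional pre-smoother, fix the coordinate of interest at a grid point, and average over the empirical distribution of the remaining covariates. This is a minimal illustration, not the authors' implementation; the Nadaraya–Watson pre-smoother, the product Gaussian kernel, and all function names are assumptions.

```python
import numpy as np

def nw_presmoother(X, Y, x, h):
    """Multidimensional Nadaraya-Watson estimate of m at point x,
    using an (unnormalized) product Gaussian kernel with bandwidth vector h."""
    w = np.exp(-0.5 * np.sum(((x - X) / h) ** 2, axis=1))
    return np.sum(w * Y) / np.sum(w)

def marginal_integration(X, Y, alpha, grid, h):
    """Classic marginal integration estimate (CMIE flavor) of m_alpha:
    the expectation over X_{-alpha} in equation (2.3) is replaced by an
    empirical average of the pre-smoother over the observed X_{k,-alpha}."""
    n, _ = X.shape
    est = np.empty(len(grid))
    for i, x_a in enumerate(grid):
        vals = np.empty(n)
        for k in range(n):
            x = X[k].copy()
            x[alpha] = x_a          # fix the alpha-th coordinate at the grid point
            vals[k] = nw_presmoother(X, Y, x, h)
        est[i] = vals.mean()        # average replaces E over X_{-alpha}
    return est
```

On additive data the resulting curve recovers the component m_alpha up to an additive constant.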
2.1 Formal Definition
- Theory has always been derived for kernel estimators [note that the same happened to the backfitting (Opsomer and Ruppert 1997, Mammen, Linton and Nielsen 1999)].
- Therefore the authors concentrate only on the kernel-based definitions, even though spline implementations are known to be computationally more advantageous.
- The authors first give the definition of the classic marginal integration method (CMIE).
- The modification giving us the internalized marginal integration estimate (IMIE) concerns the definition of m̂, equation (2.7), where f̂(xα, Xk,−α) is substituted by f̂(Xjα, Xj,−α), see Jones, Davies and Park (1994) or Kim, Linton and Hengartner (2000) for details.
- Notice that the fraction before Yj in (2.11) is the inverse of the conditional density fα|−α(Xα|X−α).
2.2 On a Better Understanding of Marginal Integration
- Although Linton and Härdle (1999) already emphasized the differences between backfitting and marginal integration, the two are often still interpreted as competing estimators for the same aim.
- For a better understanding of the difference between orthogonal projection into the additive space and measuring the marginal impact (marginal integration) the authors give two more examples.
- Obvious advantages of the IMIE are the possibility of changing the sums and getting rid of the xα in the density estimates, see (2.11).
- Camlong-Viot (2000) chose this for the simpler theoretical analysis of this estimator, while Hengartner (1996) showed that the bandwidth conditions for the nuisance corrections depend only on the smoothness of the densities but not, as for the CMIE, on the smoothness of the component functions.
- So far not investigated are differences in the finite sample performance.
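The "changing the sums" remark in this subsection can be made concrete: after swapping the average over k with the sum over j, the IMIE collapses to a single weighted sum in which each Y_j is weighted by the inverse of the estimated conditional density, exactly as noted for (2.11). The sketch below is a hypothetical reconstruction under that reading; all names, the Gaussian kernel, and the single-bandwidth treatment of the nuisance directions are assumptions.

```python
import numpy as np

def gauss_kernel(u, h):
    """Normalized product Gaussian kernel; rows of u are evaluation points."""
    u = np.atleast_2d(u)
    return np.prod(np.exp(-0.5 * (u / h) ** 2) / (np.sqrt(2 * np.pi) * h), axis=1)

def imie(X, Y, alpha, grid, h_alpha, h_rest):
    """Internalized marginal integration estimate (IMIE) of m_alpha.

    After swapping the sums, the weight on Y_j is
        K_{h_alpha}(x_alpha - X_{j,alpha}) * fhat_{-alpha}(X_{j,-alpha}) / fhat(X_j),
    i.e. the kernel times the inverse of the estimated conditional density
    f_{alpha|-alpha}(X_{j,alpha} | X_{j,-alpha})."""
    n, d = X.shape
    rest = [c for c in range(d) if c != alpha]
    h_vec = np.full(d, float(h_rest))
    h_vec[alpha] = h_alpha
    Xa, Xr = X[:, [alpha]], X[:, rest]
    # kernel density estimates evaluated at the observations themselves
    f_rest = np.array([gauss_kernel(Xr - Xr[j], h_rest).mean() for j in range(n)])
    f_full = np.array([gauss_kernel(X - X[j], h_vec).mean() for j in range(n)])
    est = np.empty(len(grid))
    for i, x_a in enumerate(grid):
        w = gauss_kernel(x_a - Xa, h_alpha) * f_rest / f_full
        est[i] = np.mean(w * Y)
    return est
```

Note that x_alpha no longer enters the density estimates, which is the computational advantage the bullet above alludes to: the densities are evaluated once at the data points and reused for every grid point.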
2.3 Some Simulation Results
- Since asymptotically all methods are consistent, differences and problems can better be observed for small samples.
- The bandwidths were chosen from h = 0.25std(X) to 1.6std(X) where std(X) is the vector of the empirical standard deviations of the particular design.
- For the calculation of the CV values the authors used the same trimming at 1.96 (tr5) and 1.645 (tr10), respectively, in each direction.
- Due to the sparseness of data the IMIE substantially outperforms the other methods, especially in the presence of high correlation.
3 Testing Additivity
- In this section the authors investigate several tests but will only concentrate on statistics based on residuals from an internal marginal integration fit.
- The authors prove asymptotic normality of the corresponding test statistics under the null hypothesis of additivity and fixed alternatives with different rates of convergence corresponding to both cases.
- In the following section the authors investigate the asymptotic behavior of these statistics under the null hypothesis and fixed alternatives.
3.1 Theoretical Results
- The authors assume that the following assumptions are satisfied.
- The authors will come back to this point in the next section.
- Note further that Gozalo and Linton (2000) and Dette and von Lieres (2000) considered weight functions in the definition of the corresponding test statistics based on residuals from the classical marginal integration fit.
- Obviously it depends on many factors like the density of the covariates, kernel choice, error variance function, and the functional ∆ = m − m0 which test has more power.
- T4n might give more reliable results in such cases.
4 Simulation Comparison of Additivity Tests
- In this section the authors continue the considerations of the last part of Section 2 but extend them to the various tests for checking additivity.
- The authors concentrate especially on the differences caused by the use of different pre-smoothers, i.e. they compare CMIE with IMIE, but certainly also consider differences between T1n to T4n.
- Finally, the authors compare the difference in performance between tests using the bootstrap based on residuals taken from Y − m̂0 (B0), as e.g. Gozalo and Linton (2000) or Härdle and Mammen (1993), versus bootstrap based on residuals taken from Y − m̂ (B1) as e.g. Dette and von Lieres (2000).
- The authors always took the bandwidths minimizing the average of the CV values for trimming tr5 and covariance Σ2.
- Due to computational restrictions the authors did the simulations only for 500 bootstrap replications.
4.1 The case d = 2
- Finally, since results for the test statistic T4 depend strongly on the choice of bandwidth g, the authors tried out various bandwidths and report the results for 0.1std(X) (g1), and 0.2std(X) (g2).
- Note that for the ease of presentation all tables will have the same structure.
- In the left part of each table the results are given under the null hypothesis of additivity, i.e. for scalar a = 0.0; in the right part the authors present results under some alternative (a = 1.0).
- Tables for independent and correlated designs are separated.
- For these reasons all results presented here and in the following are based on bootstrap taking residuals under the null hypothesis.
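The B0 bootstrap scheme favored here (resampling residuals taken under the null, Y − m̂0) can be sketched generically. The test statistic below, the mean squared distance between the full fit and the additive fit, is a placeholder of my own, not one of the paper's T1n–T4n, which differ in weighting and smoothing; the wild-bootstrap multipliers and all function names are likewise assumptions.

```python
import numpy as np

def nw_fit(X, Y, h, Xeval):
    """Full-dimensional Nadaraya-Watson fit evaluated at the rows of Xeval."""
    out = np.empty(len(Xeval))
    for i, x in enumerate(Xeval):
        w = np.exp(-0.5 * np.sum(((x - X) / h) ** 2, axis=1))
        out[i] = np.sum(w * Y) / np.sum(w)
    return out

def additivity_test(X, Y, additive_fit, h, B=500, rng=None):
    """B0-flavor bootstrap additivity test: residuals are taken from the
    fit under the null, Y - mhat0, and resampled with the wild bootstrap.
    additive_fit(X, Y) must return fitted values of an additive model."""
    if rng is None:
        rng = np.random.default_rng()
    m0 = additive_fit(X, Y)                 # fit under H0
    T = np.mean((nw_fit(X, Y, h, X) - m0) ** 2)
    eps0 = Y - m0                           # B0: residuals under the null
    p_two = (np.sqrt(5) + 1) / (2 * np.sqrt(5))
    count = 0
    for _ in range(B):
        # two-point wild-bootstrap multipliers (Mammen's distribution)
        v = np.where(rng.random(len(Y)) < p_two,
                     -(np.sqrt(5) - 1) / 2, (np.sqrt(5) + 1) / 2)
        Ystar = m0 + eps0 * v
        Tstar = np.mean((nw_fit(X, Ystar, h, X) - additive_fit(X, Ystar)) ** 2)
        count += Tstar >= T
    return T, count / B                     # statistic and bootstrap p-value
```

The B1 variant discussed above would differ only in drawing eps from Y − m̂ (the unrestricted fit) while still generating Ystar around m̂0.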
4.2 The case d = 3
- As for estimation, the results for testing also change significantly when the authors increase the dimension of the model.
- Thus, a power statement or comparison would not make much sense.
- The authors restrict themselves to some remarks.
- For sample sizes bigger than n = 150, the simulations with the CMIE took about 10 times longer than with the IMIE (measured in days).
- The authors turn to highly correlated designs, i.e. using Σ3.