
Showing papers on "Cross-validation published in 1988"


Journal ArticleDOI
TL;DR: In this paper, the frequency properties of Wahba's Bayesian confidence intervals for smoothing splines are investigated by a large-sample approximation and by a simulation study, and the authors explain why the average coverage probability (ACP) is accurate for functions that are much smoother than the sample paths prescribed by the prior.
Abstract: The frequency properties of Wahba's Bayesian confidence intervals for smoothing splines are investigated by a large-sample approximation and by a simulation study. When the coverage probabilities for these pointwise confidence intervals are averaged across the observation points, the average coverage probability (ACP) should be close to the nominal level. From a frequency point of view, this agreement occurs because the average posterior variance for the spline is similar to a consistent estimate of the average squared error and because the average squared bias is a modest fraction of the total average squared error. These properties are independent of the Bayesian assumptions used to derive this confidence procedure, and they explain why the ACP is accurate for functions that are much smoother than the sample paths prescribed by the prior. This analysis accounts for the choice of the smoothing parameter (bandwidth) using cross-validation. In the case of natural splines an adaptive method for avo...

274 citations
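
A minimal simulation sketch of the average-coverage idea described above, not the paper's implementation: a discrete second-difference penalized smoother stands in for the smoothing spline, the smoothing parameter is chosen by GCV, and Wahba-style pointwise intervals fhat_i ± 1.96·sqrt(σ̂²·A_ii) are checked against a known smooth test function across replicates. All settings (test function, noise level, grid) are illustrative assumptions.

```python
# Sketch (not the paper's implementation): average coverage probability (ACP)
# of Wahba-style pointwise intervals for a penalized linear smoother.
import numpy as np

rng = np.random.default_rng(0)
n, sigma, n_rep = 100, 0.3, 200
x = np.linspace(0.0, 1.0, n)
f_true = np.sin(2 * np.pi * x) + 0.5 * x            # smooth test function

D = np.diff(np.eye(n), n=2, axis=0)                  # second differences
K = D.T @ D                                          # roughness penalty

def smoother(lam):
    """Hat matrix of the linear smoother y_hat = (I + lam*K)^{-1} y."""
    return np.linalg.solve(np.eye(n) + lam * K, np.eye(n))

def gcv_lambda(y, grid):
    """Pick lambda minimizing GCV(lam) = n * RSS / tr(I - A)^2."""
    best, best_score = None, np.inf
    for lam in grid:
        A = smoother(lam)
        resid = y - A @ y
        score = n * (resid @ resid) / (n - np.trace(A)) ** 2
        if score < best_score:
            best, best_score = lam, score
    return best

grid = np.logspace(-6, 1, 30)
coverage = np.zeros(n)
for _ in range(n_rep):
    y = f_true + sigma * rng.standard_normal(n)
    A = smoother(gcv_lambda(y, grid))
    fhat = A @ y
    # Wahba-style "Bayesian" pointwise variance: sigma^2 * A_ii,
    # with sigma^2 estimated from the residuals.
    sigma2_hat = np.sum((y - fhat) ** 2) / (n - np.trace(A))
    half_width = 1.96 * np.sqrt(sigma2_hat * np.diag(A))
    coverage += (np.abs(fhat - f_true) <= half_width)

acp = coverage.mean() / n_rep                        # average over points and replicates
print(f"average coverage probability over {n_rep} replicates: {acp:.3f}")
```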


Journal ArticleDOI
TL;DR: In this article, the selection of the order of the kernel in a kernel density estimator is considered from two points of view: theoretical properties are investigated by a mean integrated squared error analysis of the problem and cross validation is proposed as a practical method of choice.
Abstract: The selection of the order, i.e., number of vanishing moments, of the kernel in a kernel density estimator is considered from two points of view. First, theoretical properties are investigated by a mean integrated squared error analysis of the problem. Second, and perhaps more importantly, cross validation is proposed as a practical method of choice, and theoretical backing for this is provided through an asymptotic optimality result.

78 citations
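
A sketch of the practical side of this idea: least-squares cross-validation (LSCV) computed jointly over kernel order and bandwidth, comparing the second-order Gaussian kernel with the standard fourth-order Gaussian-based kernel K4(u) = 0.5(3 - u²)φ(u). The data, bandwidth grid, and the numerical integration of the squared estimate are illustrative choices, not the authors' exact construction.

```python
# Sketch: LSCV used to choose jointly the kernel order and the bandwidth
# of a kernel density estimator.
import numpy as np

def phi(u):
    return np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)

KERNELS = {2: phi,                                   # second-order Gaussian
           4: lambda u: 0.5 * (3.0 - u ** 2) * phi(u)}  # fourth-order kernel

def lscv(x, h, kernel):
    """LSCV(h) = int fhat^2 - (2/n) * sum_i fhat_{-i}(x_i), grid-integrated."""
    n = len(x)
    grid = np.linspace(x.min() - 4 * h, x.max() + 4 * h, 2000)
    fhat = kernel((grid[:, None] - x[None, :]) / h).mean(axis=1) / h
    int_f2 = np.sum(fhat ** 2) * (grid[1] - grid[0])
    # leave-one-out estimate evaluated at each observation
    Kij = kernel((x[:, None] - x[None, :]) / h) / h
    loo = (Kij.sum(axis=1) - Kij[np.diag_indices(n)]) / (n - 1)
    return int_f2 - 2.0 * loo.mean()

rng = np.random.default_rng(1)
x = rng.normal(size=300)
hs = np.linspace(0.1, 1.2, 25)
for order, kern in KERNELS.items():
    scores = [lscv(x, h, kern) for h in hs]
    i = int(np.argmin(scores))
    print(f"order {order}: best h = {hs[i]:.2f}, LSCV = {scores[i]:.4f}")
```

Because the LSCV score estimates the integrated squared error up to a constant that does not depend on the estimator, the scores are comparable across kernels, which is what makes order selection by cross-validation possible.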


Journal ArticleDOI
TL;DR: This paper shows how the computations required by generalized cross-validation can be performed as a simple extension of the dynamic programming formulas.
Abstract: Smoothing and differentiation of noisy data using spline functions requires the selection of an unknown smoothing parameter. The method of generalized cross-validation provides an excellent estimate of the smoothing parameter from the data itself even when the amount of noise associated with the data is unknown. In the present model only a single smoothing parameter must be obtained, but in a more general context the number may be larger. In an earlier work, smoothing of the data was accomplished by solving a minimization problem using the technique of dynamic programming. This paper shows how the computations required by generalized cross-validation can be performed as a simple extension of the dynamic programming formulas. The results of numerical experiments are also included.

56 citations
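
For readers who want the end result of this pipeline without the dynamic-programming machinery: recent SciPy versions provide scipy.interpolate.make_smoothing_spline, which selects the smoothing parameter by generalized cross-validation when none is supplied. The sketch below smooths and differentiates noisy data in that spirit; it illustrates the idea, not the paper's dynamic-programming formulation.

```python
# Sketch: GCV-driven spline smoothing and differentiation of noisy data.
import numpy as np
from scipy.interpolate import make_smoothing_spline

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0 * np.pi, 200)
y = np.sin(x) + 0.2 * rng.standard_normal(x.size)   # noisy observations

spl = make_smoothing_spline(x, y)    # smoothing parameter chosen internally by GCV
dspl = spl.derivative()              # differentiate the fitted spline

print("max abs error of fit:        ", np.max(np.abs(spl(x) - np.sin(x))))
print("max abs error of derivative: ", np.max(np.abs(dspl(x) - np.cos(x))))
```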


Book ChapterDOI
01 Jan 1988
TL;DR: In the context of the prediction error method for one step ahead prediction in a single time series, a conventional and two cross-validatory procedures are proposed for prediction of squared prediction errors and also for choosing among several predictor families.
Abstract: In the context of the prediction error method for one step ahead prediction in a single time series, a conventional and two cross-validatory procedures are proposed for prediction of squared prediction errors, and also for choosing among several predictor families. These procedures are compared in a simulation study. The conventional procedure appears to perform at least as well as the cross-validatory procedures.

29 citations
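
A simplified sketch of the comparison being made: a conventional in-sample estimate of the one-step-ahead squared prediction error versus a cross-validatory rolling-origin estimate, each used to choose among AR predictor families. The simulated AR(2) series, the degrees-of-freedom correction, and the rolling-origin scheme are illustrative assumptions, not the exact procedures studied in the paper.

```python
# Sketch: conventional vs. cross-validatory estimates of one-step-ahead
# squared prediction error for AR predictors.
import numpy as np

rng = np.random.default_rng(3)
n = 400
y = np.zeros(n)
for t in range(2, n):                                 # simulate an AR(2) series
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.standard_normal()

def fit_ar(series, p):
    """Least-squares AR(p) coefficients (no intercept)."""
    X = np.column_stack([series[p - k - 1:len(series) - k - 1] for k in range(p)])
    return np.linalg.lstsq(X, series[p:], rcond=None)[0]

def insample_mspe(series, p):
    """Conventional estimate: df-corrected in-sample residual variance."""
    coef = fit_ar(series, p)
    X = np.column_stack([series[p - k - 1:len(series) - k - 1] for k in range(p)])
    resid = series[p:] - X @ coef
    return np.mean(resid ** 2) * (len(resid) / (len(resid) - p))

def rolling_origin_mspe(series, p, start=200):
    """Cross-validatory estimate: refit on the past, predict the next point."""
    errs = []
    for t in range(start, len(series)):
        coef = fit_ar(series[:t], p)
        pred = coef @ series[t - 1:t - p - 1:-1]      # lags 1..p at time t
        errs.append((series[t] - pred) ** 2)
    return np.mean(errs)

for p in (1, 2, 3):
    print(f"AR({p}): in-sample {insample_mspe(y, p):.3f}, "
          f"rolling-origin {rolling_origin_mspe(y, p):.3f}")
```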


Journal ArticleDOI
TL;DR: In this article, two cross-validation procedures are proposed for making a choice between different model structures used for approximate modeling of multivariable systems, and they are shown to be asymptotically equivalent to the generalized Akaike structure selection criteria.
Abstract: Using cross-validation ideas, two procedures are proposed for making a choice between different model structures used for (approximate) modelling of multivariable systems. The procedures are derived under fairly general conditions: the ‘true’ system does not need to be contained in the model set; model structures do not need to be nested and different criteria may be used for model estimation and validation. The proposed structure selection rules are shown to be invariant to parameter scaling. Under certain conditions (essentially requiring that the system belongs to the model set and that the maximum likelihood method is used for parameter estimation) they are shown to be asymptotically equivalent to the (generalized) Akaike structure selection criteria.

27 citations
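
The asymptotic kinship between cross-validatory structure selection and Akaike-type criteria can be seen even in a plain regression setting. The sketch below, a simplified stand-in for the multivariable dynamic-system case treated in the paper, ranks candidate model structures (polynomial degrees) by a leave-one-out prediction score and by AIC; the data-generating model is a hypothetical example.

```python
# Sketch: comparing candidate model structures by a cross-validation score
# and by AIC.
import numpy as np

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 0.8 * x - 0.5 * x ** 2 + 0.3 * rng.standard_normal(n)

def design(x, degree):
    return np.vander(x, degree + 1, increasing=True)  # columns 1, x, x^2, ...

def loo_cv_score(X, y):
    """Leave-one-out squared prediction error via the hat-matrix shortcut."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    resid = y - H @ y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)

def aic(X, y):
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return len(y) * np.log(np.mean(resid ** 2)) + 2 * X.shape[1]

for degree in (1, 2, 3, 4):
    X = design(x, degree)
    print(f"degree {degree}: LOO-CV {loo_cv_score(X, y):.4f}, AIC {aic(X, y):.1f}")
```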



Journal ArticleDOI
TL;DR: The findings of this study demonstrate the danger of model-form misspecification when one mistakenly assumes that the combination of robust and least squares procedures compensates for a lack of knowledge about the processes underlying the generation of the data.
Abstract: The combination of robust and least squares procedures has frequently been recommended as a useful strategy for constructing models. The application of this strategy to a real-world data set resulted in a model with an incorrect functional form. Additional in-depth investigations into the nature of the application, combined with data-error corrections, made possible the construction of a satisfactory model. The results of the modeling activity were evaluated in terms of model face-validity, the predictive performance on a holdout data set, and the ability to meet user requirements. The findings of this study demonstrate the danger of model-form misspecification when one mistakenly assumes that the combination of robust and least squares procedures compensates for a lack of knowledge about the processes underlying the generation of the data.

12 citations
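
The validation step mentioned in the abstract, judging competing fits by their predictive performance on a holdout data set, looks roughly like the sketch below. The data, the model form, and the use of a Huber loss as the robust procedure are synthetic stand-ins, not the paper's application.

```python
# Sketch: evaluating an OLS fit and a robust (Huber-loss) fit on a holdout set.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.standard_normal(n)
y[rng.choice(n, 10, replace=False)] += 8.0            # a few gross outliers

train, hold = np.arange(0, 150), np.arange(150, n)    # simple holdout split
X = np.column_stack([np.ones(n), x])

beta_ols = np.linalg.lstsq(X[train], y[train], rcond=None)[0]
beta_hub = least_squares(lambda b: X[train] @ b - y[train],
                         x0=beta_ols, loss="huber", f_scale=1.0).x

for name, b in [("OLS", beta_ols), ("Huber", beta_hub)]:
    mse = np.mean((y[hold] - X[hold] @ b) ** 2)
    print(f"{name}: holdout MSE = {mse:.3f}, coefficients = {np.round(b, 3)}")
```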


Journal ArticleDOI
TL;DR: A generalized cross validation method for estimating a good value for the smoothing parameter in using complete splines for fitting noisy data is developed and it is shown that in the limit as the number of data points becomes large, the ratio of the expected value of the error using the estimated parameter to that obtained using the optimal parameter approaches one.
Abstract: In this paper we develop a generalized cross validation (GCV) method for estimating a good value for the smoothing parameter in using complete splines for fitting noisy data. By analyzing the eigenvalues of an appropriate energy matrix, we are able to show that, assuming each measurement is subject to independent identically distributed random errors with mean zero, the method is asymptotically optimal. In particular, we show that in the limit as the number of data points becomes large, the ratio of the expected value of the error using our estimated parameter to that obtained using the optimal parameter approaches one. In addition, we discuss the numerical computation of the smoothing parameter. In this connection, we present a new algorithm for efficiently computing the central bands (and thus the trace) of the inverse of a banded matrix. This algorithm may be of interest in its own right.

5 citations
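
To make the connection concrete: the trace required by GCV is the trace of the inverse of a banded matrix such as B = I + λD'D. The sketch below computes it by brute force with a banded Cholesky factorization and one solve per column; it is only a reference check, not the paper's efficient central-band algorithm, and the penalty matrix and λ are illustrative.

```python
# Sketch: reference computation of tr(B^{-1}) for a banded SPD matrix.
import numpy as np
from scipy.linalg import cholesky_banded, cho_solve_banded

n, lam = 200, 1.0
D = np.diff(np.eye(n), n=2, axis=0)
B = np.eye(n) + lam * (D.T @ D)                       # pentadiagonal, SPD

# pack B into upper banded storage with bandwidth u = 2
u = 2
ab = np.zeros((u + 1, n))
for j in range(n):
    for i in range(max(0, j - u), j + 1):
        ab[u + i - j, j] = B[i, j]

cb = cholesky_banded(ab)                              # banded Cholesky factor
diag_inv = np.array([cho_solve_banded((cb, False), e)[k]
                     for k, e in enumerate(np.eye(n))])

print("trace of B^{-1} (banded solves):", diag_inv.sum())
print("trace of B^{-1} (dense check):  ", np.trace(np.linalg.inv(B)))
```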


Book ChapterDOI
01 Jan 1988
TL;DR: A state-of-the-art review of methods for estimating spatial covariance structures from available data is presented, and their capabilities are illustrated with applications to synthetically generated and field hydro-geological, hydrochemical and isotopic data.
Abstract: In this paper we present a state-of-the-art review of methods for estimating spatial covariance structures from available data. Some of these methods were developed for applications in mining engineering and thus do not consider the specific features of groundwater data such as the presence of sampling and measurement errors and the fact that the support of the data changes from point to point. Adjoint state maximum likelihood cross-validation methods are especially suited to deal with such data. These methods also have other desirable properties such as: (1) the use of identification criteria for selecting a covariance model, (2) the computation of parameter estimation errors, (3) the use of a highly efficient numerical algorithm based on adjoint state theory, and (4) the ease of analyzing the issues of parameter identifiability and uniqueness and stability of the solutions. Their capabilities are illustrated with applications to synthetically generated and field hydro-geological, hydrochemical and isotopic data.

5 citations
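
A much simpler relative of the cross-validation idea used here is the leave-one-out kriging check often applied in geostatistics: drop one observation, re-predict it from the rest under each candidate covariance model, and compare the prediction errors. The sketch below does exactly that for an exponential and a Gaussian covariance on synthetic data; it does not reproduce the adjoint-state maximum-likelihood machinery of the paper, and all parameter values are illustrative.

```python
# Sketch: leave-one-out kriging cross-validation for comparing covariance models.
import numpy as np

rng = np.random.default_rng(6)
n = 120
pts = rng.uniform(0, 10, size=(n, 2))
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

def exp_cov(h, sill=1.0, rng_=3.0):
    return sill * np.exp(-h / rng_)

def gauss_cov(h, sill=1.0, rng_=3.0):
    return sill * np.exp(-(h / rng_) ** 2)

# simulate a zero-mean field with exponential covariance plus a small nugget
C_true = exp_cov(dist) + 0.05 * np.eye(n)
z = np.linalg.cholesky(C_true) @ rng.standard_normal(n)

def loo_cv_error(cov_fn):
    C = cov_fn(dist) + 0.05 * np.eye(n)
    errs = []
    for i in range(n):
        keep = np.delete(np.arange(n), i)
        w = np.linalg.solve(C[np.ix_(keep, keep)], C[keep, i])
        errs.append((z[i] - w @ z[keep]) ** 2)        # simple-kriging residual
    return np.mean(errs)

for name, fn in [("exponential", exp_cov), ("Gaussian", gauss_cov)]:
    print(f"{name} model: LOO kriging MSE = {loo_cv_error(fn):.3f}")
```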




Book ChapterDOI
01 Jan 1988
TL;DR: This chapter focuses on assessing the number of axes that should be considered in correspondence analysis and reviews the limitations of available tools.
Abstract: This chapter focuses on assessing the number of axes that should be retained in correspondence analysis and reviews the limitations of the available tools. The problem can be addressed through more exploratory approaches based either on the least squares approximation properties of the solutions or on cross-validation techniques. A simple visualization of the matrix makes it possible to see whether the total error of the factorization shows an identifiable structure; when this error is small and unstructured, the number of axes used is sufficient. Correspondence analysis users are not provided with efficient testing tools for deciding how many axes to retain, and experiments have been carried out with bootstrap and jackknife techniques.
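
The least-squares diagnostic mentioned above can be sketched directly: perform correspondence analysis via the SVD of the standardized residual matrix and inspect how much inertia remains after retaining k axes. The contingency table below is a hypothetical stand-in, and the printout is only the "size" part of the diagnostic; judging whether the residual is structured would require looking at the residual matrix itself.

```python
# Sketch: residual inertia after retaining k correspondence-analysis axes.
import numpy as np

rng = np.random.default_rng(7)
N = rng.poisson(lam=20, size=(8, 6)).astype(float)    # hypothetical contingency table

P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))    # standardized residuals

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
total_inertia = np.sum(sv ** 2)
for k in range(len(sv) + 1):
    resid = S - (U[:, :k] * sv[:k]) @ Vt[:k, :]       # error of the rank-k fit
    print(f"{k} axes: residual inertia {np.sum(resid**2):.4f} "
          f"({100 * np.sum(resid**2) / total_inertia:.1f}% of total)")
```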


Book ChapterDOI
TL;DR: The problem of automatic calibration of histograms by cross-validation is considered, assuming the true underlying density is continuous with continuous first derivative, and it is shown that the classical Sturges’ rule performs poorly and that cross-validation is a relatively difficult task.
Abstract: In this paper the problem of automatic calibration of histograms by cross-validation is considered, assuming the true underlying density is continuous with continuous first derivative. The histogram is one of the simplest semiparametric estimators used by economists, but it is surprisingly difficult to construct histograms with small estimation errors. Cross-validation algorithms attempt to automatically determine histogram bin widths that are nearly optimal with respect to mean integrated squared error. Alternative philosophies and approaches of cross-validation for histograms are presented. It is shown that the classical Sturges’ rule performs poorly and that cross-validation is a relatively difficult task. Understanding the performance of cross-validation algorithms in this simple setting should prove valuable when cross-validating other more complex semiparametric procedures.
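
One of the simplest cross-validation schemes discussed in this setting is the least-squares (leave-one-out) criterion, which for a histogram reduces to a function of the bin counts: CV(h) = 2/((n-1)h) - (n+1)/(n²(n-1)h) Σ_k ν_k². The sketch below minimizes this score over a bandwidth grid and compares the result with the bin width implied by Sturges' rule; the Gaussian sample and the grid are illustrative choices, not the paper's experiments.

```python
# Sketch: leave-one-out cross-validation for the histogram bin width,
# compared with Sturges' rule.
import numpy as np

rng = np.random.default_rng(8)
x = rng.standard_normal(500)
n = x.size

def cv_score(x, h):
    """LSCV(h) expressed through the bin counts nu_k."""
    edges = np.arange(x.min(), x.max() + h, h)
    counts, _ = np.histogram(x, bins=edges)
    return (2.0 / ((n - 1) * h)
            - (n + 1) / (n ** 2 * (n - 1) * h) * np.sum(counts ** 2))

hs = np.linspace(0.05, 1.5, 60)
scores = [cv_score(x, h) for h in hs]
h_cv = hs[int(np.argmin(scores))]

h_sturges = (x.max() - x.min()) / (1 + np.log2(n))    # Sturges' rule bin width
print(f"cross-validated bin width: {h_cv:.3f}")
print(f"Sturges'-rule bin width:   {h_sturges:.3f}")
```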