
Showing papers on "Model selection published in 1993"


Journal ArticleDOI
Jun Shao1
TL;DR: In this article, the authors show that the inconsistency of the leave-one-out cross-validation can be rectified by using a leave-nv-out cross-validation with nv, the number of observations reserved for validation, satisfying nv/n → 1 as n → ∞.
Abstract: We consider the problem of selecting a model having the best predictive ability among a class of linear models. The popular leave-one-out cross-validation method, which is asymptotically equivalent to many other model selection methods such as the Akaike information criterion (AIC), the Cp, and the bootstrap, is asymptotically inconsistent in the sense that the probability of selecting the model with the best predictive ability does not converge to 1 as the total number of observations n → ∞. We show that the inconsistency of the leave-one-out cross-validation can be rectified by using a leave-nv-out cross-validation with nv, the number of observations reserved for validation, satisfying nv/n → 1 as n → ∞. This is a somewhat shocking discovery, because nv/n → 1 is totally opposite to the popular leave-one-out recipe in cross-validation. Motivations, justifications, and discussions of some practical aspects of the use of the leave-nv-out cross-validation method are provided, and results ...
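The sketch below is a minimal Python illustration of the idea, using Monte Carlo draws of the construction/validation split rather than Shao's balanced incomplete designs; the candidate list, the number of splits, and the construction-set size n_c ≈ n^(3/4) (chosen only so that nv/n → 1) are illustrative assumptions, not the paper's prescriptions.

```python
# Illustrative leave-n_v-out (Monte Carlo) cross-validation for comparing
# candidate linear models; n_c = round(n**0.75) is an assumed construction-set
# size chosen so that n_v/n -> 1 as n grows.
import numpy as np

def leave_nv_out_cv(X, y, candidate_columns, n_splits=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    n_c = int(round(n ** 0.75))                 # observations used for fitting
    best, best_err, errors = None, np.inf, {}
    for cols in candidate_columns:
        err = 0.0
        for _ in range(n_splits):
            perm = rng.permutation(n)
            fit_idx, val_idx = perm[:n_c], perm[n_c:]       # n_v = n - n_c held out
            beta, *_ = np.linalg.lstsq(X[np.ix_(fit_idx, cols)], y[fit_idx], rcond=None)
            err += np.mean((y[val_idx] - X[np.ix_(val_idx, cols)] @ beta) ** 2)
        errors[tuple(cols)] = err / n_splits
        if errors[tuple(cols)] < best_err:
            best, best_err = cols, errors[tuple(cols)]
    return best, errors

# e.g. leave_nv_out_cv(X, y, candidate_columns=[[0], [0, 1], [0, 1, 2]])
```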

1,700 citations


Journal ArticleDOI
TL;DR: In this article, a wide range of multiple time series models and methods are considered, including vector autoregressive, vector autoregressive moving average, cointegrated and periodic processes as well as state space and dynamic simultaneous equations models.
Abstract: This graduate-level textbook deals with analyzing and forecasting multiple time series. It considers a wide range of multiple time series models and methods. The models include vector autoregressive, vector autoregressive moving average, cointegrated and periodic processes as well as state space and dynamic simultaneous equations models. Least squares, maximum likelihood and Bayesian methods are considered for estimating these models. Different procedures for model selection or specification are treated and a range of tests and criteria for evaluating the adequacy of a chosen model are introduced. The choice of point and interval forecasts as well as innovation accounting are presented as tools for structural analysis within the multiple time series context.

623 citations


Journal ArticleDOI
TL;DR: Two notions of multifold cross-validation (MCV and MCV*) criteria are considered, and it turns out that MCV indeed reduces the chance of overfitting.
Abstract: A natural extension of the simple leave-one-out cross validation (CV) method is to allow the deletion of more than one observation. In this article, several notions of the multifold cross validation (MCV) method are discussed. In the context of variable selection under a linear regression model, we show that the delete-d MCV criterion is asymptotically equivalent to the well-known FPE criterion. Two computationally more feasible methods, the r-fold cross validation and the repeated learning-testing criterion, are also studied. The performance of these criteria is compared with that of the simple leave-one-out cross validation method. Simulation results are obtained to gain some understanding of the small sample properties of these methods.
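As a companion to the abstract, here is a small Python sketch of one of the computationally feasible variants it mentions, r-fold cross-validation for variable selection in a linear model; the fold count r = 5 and the subset interface are assumptions for illustration only.

```python
# Hypothetical r-fold cross-validation score for a subset `cols` of regressors.
import numpy as np

def r_fold_cv_score(X, y, cols, r=5, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    folds = np.array_split(rng.permutation(n), r)
    sse = 0.0
    for k in range(r):
        val = folds[k]
        fit = np.concatenate([folds[j] for j in range(r) if j != k])
        beta, *_ = np.linalg.lstsq(X[np.ix_(fit, cols)], y[fit], rcond=None)
        sse += np.sum((y[val] - X[np.ix_(val, cols)] @ beta) ** 2)
    return sse / n

# Pick the candidate subset with the smallest estimated prediction error, e.g.
# best = min(candidate_subsets, key=lambda cols: r_fold_cv_score(X, y, cols))
```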

531 citations


Proceedings Article
29 Nov 1993
TL;DR: This paper focuses on the special case of leave-one-out cross validation applied to memory-based learning algorithms, but it is argued that it is applicable to any class of model selection problems.
Abstract: Selecting a good model of a set of input points by cross validation is a computationally intensive process, especially if the number of possible models or the number of training points is high. Techniques such as gradient descent are helpful in searching through the space of models, but problems such as local minima, and more importantly, lack of a distance metric between various models reduce the applicability of these search methods. Hoeffding Races is a technique for finding a good model for the data by quickly discarding bad models, and concentrating the computational effort at differentiating between the better ones. This paper focuses on the special case of leave-one-out cross validation applied to memory-based learning algorithms, but we also argue that it is applicable to any class of model selection problems.
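The loop below is a simplified, generic racing scheme in the spirit of Hoeffding Races, not the authors' exact procedure; the error callback, the error bound B, and the confidence level delta are assumptions.

```python
# Simplified racing: a model is dropped once its Hoeffding confidence interval
# lies entirely above the best surviving model's interval. `models` is a list
# of hashable identifiers; `point_error(model, i)` is an assumed callback
# returning that model's leave-one-out error on held-out point i, in [0, B].
import numpy as np

def hoeffding_race(models, point_error, n_points, B=1.0, delta=0.05):
    alive = list(models)
    totals = {m: 0.0 for m in models}
    for t in range(1, n_points + 1):
        for m in alive:
            totals[m] += point_error(m, t - 1)
        # confidence half-width after t points (crude union bound over all tests)
        eps = B * np.sqrt(np.log(2 * len(models) * n_points / delta) / (2 * t))
        means = {m: totals[m] / t for m in alive}
        best_upper = min(means.values()) + eps
        alive = [m for m in alive if means[m] - eps <= best_upper]
        if len(alive) == 1:
            break
    return alive          # models still in contention
```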

349 citations


Book
01 Nov 1993
TL;DR: A book on computer-intensive statistical methods, covering cross validation, validation for time series problems, the bootstrap, and further bootstrap results and applications.
Abstract: Preface. Prelude. Computer Intensive Philosophy. Cross Validation. Validation of Time Series Problems. Statistical Bootstrap. Further Bootstrap Results. Computer Intensive Applications. References. Index.

262 citations


Journal ArticleDOI
TL;DR: This paper shows the general superiority of the "extended" nonconvergent methods compared to classical penalty term methods, simple stopped training, and methods which only vary the number of hidden units.

246 citations


Journal ArticleDOI
TL;DR: In this article, average derivative functionals of regression are proposed for nonparametric model selection and diagnostics, which can be used to reduce the dimensionality of the model, assess the relative importance of predictors, measure the extent of nonlinearity and nonadditivity.
Abstract: Average derivative functionals of regression are proposed for nonparametric model selection and diagnostics. The functionals are of the integral type, which under certain conditions allows their estimation at the usual parametric rate of n^(-1/2). We analyze asymptotic properties of the estimators of these functionals, based on kernel regression. These estimators can then be used for assessing the validity of various restrictions imposed on the form of regression. In particular, we show how they could be used to reduce the dimensionality of the model, assess the relative importance of predictors, measure the extent of nonlinearity and nonadditivity, and, under certain conditions, help identify projection directions in projection pursuit models and decide on the number of these directions.
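As a rough, one-dimensional illustration of an average derivative functional (not the integral-type estimators analyzed in the paper), one can average a finite-difference derivative of a kernel regression estimate over the sample; the Gaussian kernel, rule-of-thumb bandwidth, and step size below are assumptions.

```python
# Crude average-derivative estimate for scalar x: average the numerical
# derivative of a Nadaraya-Watson smoother over the observed design points.
import numpy as np

def nw_estimate(x0, x, y, h):
    w = np.exp(-0.5 * ((x0 - x) / h) ** 2)      # Gaussian kernel weights
    return np.sum(w * y) / np.sum(w)

def average_derivative(x, y, h=None, step=1e-3):
    h = h if h is not None else 1.06 * np.std(x) * len(x) ** (-0.2)  # rule of thumb
    d = [(nw_estimate(xi + step, x, y, h) - nw_estimate(xi - step, x, y, h)) / (2 * step)
         for xi in x]
    return np.mean(d)     # a value near zero suggests the predictor contributes little
```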

146 citations


Journal ArticleDOI
TL;DR: In this article, the qualitative robustness properties of the Schwarz information criterion (SIC) based on objective functions defining M-estimators are studied, and it is shown that the crucial restriction needed to achieve robustness in model selection is the uniform boundedness of the objective function.
Abstract: This paper studies the qualitative robustness properties of the Schwarz information criterion (SIC) based on objective functions defining M-estimators. A definition of qualitative robustness appropriate for model selection is provided and it is shown that the crucial restriction needed to achieve robustness in model selection is the uniform boundedness of the objective function. In the process, the asymptotic performance of the SIC for general M-estimators is also studied. The paper concludes with a Monte Carlo study of the finite sample behavior of the SIC for different specifications of the sample objective function.
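For reference, the generic Schwarz criterion has the familiar closed form sketched below; the paper's robust variants replace the Gaussian log-likelihood with a bounded M-estimation objective, which is not reproduced here.

```python
# Generic Schwarz information criterion: SIC = -2 log(likelihood) + k log(n),
# where k is the number of fitted parameters and n the sample size.
import numpy as np

def sic(loglik, k, n):
    return -2.0 * loglik + k * np.log(n)
```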

132 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a new approach for selecting appropriate growth curve models for a given set of data, prior to fitting the models, based on the characteristics of the data sets.

124 citations


Journal ArticleDOI
TL;DR: Grewal et al. as mentioned in this paper compared five models, using data for 71 texturally diverse New Zealand soils: a one-parameter (p = 1) Jaky model borrowed from geotechnics; the standard lognormal model (p = 2); two modified lognormal models (each with p = 3); and the bimodal lognormal model (p = 3).
Abstract: We investigated a relatively unexplored area of soil science: the fitting of parameterized models to particle-size distribution (a subject more thoroughly explored in sedimentology). Comparative fitting of different models requires the use of statistical indices enabling rational selection of an optimum model, i.e., a model that balances the improvement in fit often achieved by increasing the number of parameters, p, against model simplicity retained by minimizing p. Five models were tested on cumulative mass-size data for 71 texturally diverse New Zealand soils: a one-parameter (p = 1) Jaky model borrowed from geotechnics; the standard lognormal model (p = 2); two modified lognormal models (each with p = 3); and the bimodal lognormal model (p = 3). The Jaky and modified lognormal models have not previously been introduced into the soil science literature. Three statistical comparators were used: the coefficient of determination, R²; the F statistic; and the Cp statistic of Mallows. The bimodal model and one modified lognormal model (denoted ORL) best fit the data. The bimodal model gave a marginally better fit, but incorporates a sub-clay mode (untestable with the present data), so we adopted the ORL model as the physically best benchmark for comparison of other models. The simple Jaky one-parameter model gave a good fit to data for many of the soils, better than the standard lognormal model for 23 soils. The model comparison methods described have potential utility in other areas of soil science. The Cp statistic is advocated as the best statistic for model selection. A frequent need in soil science is to fit parameterized models to data. Examples include the fitting of adjustable, analytic functions to data for the soil moisture characteristic, hydraulic conductivity function, or PSD. Often several candidate models exist, posing the problem of choice. In general, algorithms for fitting such models minimize an aggregated discrepancy between observed and model-estimated data. A lower bound to this discrepancy is set by experimental errors in the observed data. Often (though not always), increasing p in a model will improve the fit; however, increasing p may sacrifice simplicity and utility of the model, and may simply be an empirical expedient for conforming the model to fit the data. The first test for admitting an additional parameter is to check for its statistical significance. This can be done via a Student's t-test or Wald test (Gallant, 1987). Failure in this test means the additional parameter overparameterizes the model. Also, if the aggregate error produced by the model is less than random experimental error, the model is again overparameterized, though in a different sense. Selection of an optimum model from a group thus requires use of a sensitive discriminating statistic. Here, an optimum model is defined as one selected by balancing the minimization of some objective function (measuring aggregate discrepancy) against minimization of p. We explored the application of new parametric models for soil PSD. We compared five models, using data for 71 New Zealand soils. Three of these models are, as far as we are aware, new to the soil science literature.
Three model comparison techniques were compared: the coefficient of determination (R²), the F statistic, and the Cp statistic of Mallows (1973). Modeling of PSD is a poorly researched area in soil science, in strong contrast to sedimentology, geology, and geotechnics, where diverse model forms have been explored, ranging from the Jaky one-parameter model (Jaky, 1944) to the more recent log-hyperbolic (Bagnold and Barndorff-Nielsen, 1980) and log-skew Laplace models (Fieller et al., 1984; Flenley et al., 1987). Recently, Shiozawa and Campbell (1991) proposed a bimodal lognormal model, comparing it with a unimodal lognormal model, using R² as a model-selection criterion. The two methods of model comparison proposed (i.e., F and Cp) enable rational selection of an optimum model for PSD, and serve as better discriminators than R².
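For concreteness, a minimal sketch of the Cp statistic advocated above, computed from a model's residual sum of squares and an error-variance estimate taken from the benchmark model; the argument names are illustrative.

```python
# Mallows' Cp for a fitted model with p parameters, n observations, residual
# sum of squares sse_p, and sigma2 estimated from the benchmark (best) model.
# Values of Cp close to p indicate an adequately parameterized model.
def mallows_cp(sse_p, p, n, sigma2):
    return sse_p / sigma2 - (n - 2 * p)
```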

108 citations


Journal ArticleDOI
TL;DR: In this article, the authors present a model selection procedure for Bayesian vector autoregression (BVAR) models used to forecast macroeconomic and regional economic variables in a business forecasting application.

Book ChapterDOI
01 Jan 1993
TL;DR: In this paper, the authors describe a method of nonlinear time series analysis suitable for nonlinear, stationary, multivariate processes whose one-step-ahead conditional density depends on a finite number of lags.
Abstract: We describe a method of nonlinear time series analysis suitable for nonlinear, stationary, multivariate processes whose one-step-ahead conditional density depends on a finite number of lags. Such a density can be represented as a Hermite expansion. Certain parameters of the expansion can be set to imply sharp restrictions on the process such as a pure VAR, a pure ARCH, a nonlinear process with homogeneous innovations, etc. The model is fitted using maximum likelihood procedures on a truncated expansion together with a model selection strategy that determines the truncation point. The estimator is consistent for the true density with respect to a strong norm. The norm is strong enough to imply consistency of evaluation functionals and moments of the conditional density. We describe a method of simulating from the density. Simulation can be used for a great variety of applications. In this paper, we give special attention to using simulations to set sup-norm confidence bands. Fortran code is available via ftp anonymous at ccvr1.cc.ncsu.edu (128.109.212.20) in directory pub/arg/snp; alternatively, it is available from the authors in the form of a DOS formatted diskette. The code is provided at no charge for research purposes without warranty.

Journal ArticleDOI
TL;DR: A weighted Euclidean distance model for analyzing three-way proximity data is proposed that incorporates a latent class approach and removes the rotational invariance of the classical multidimensional scaling model retaining psychologically meaningful dimensions, and drastically reduces the number of parameters in the traditional INDSCAL model.
Abstract: A weighted Euclidean distance model for analyzing three-way proximity data is proposed that incorporates a latent class approach. In this latent class weighted Euclidean model, the contribution to the distance function between two stimuli is per dimension weighted identically by all subjects in the same latent class. This model removes the rotational invariance of the classical multidimensional scaling model retaining psychologically meaningful dimensions, and drastically reduces the number of parameters in the traditional INDSCAL model. The probability density function for the data of a subject is posited to be a finite mixture of spherical multivariate normal densities. The maximum likelihood function is optimized by means of an EM algorithm; a modified Fisher scoring method is used to update the parameters in the M-step. A model selection strategy is proposed and illustrated on both real and artificial data.

Journal ArticleDOI
TL;DR: A new lower bound is provided for prediction without refitting, while a lower bound for prediction with refitting was given by Rissanen, and a set of sufficient conditions for a model selection criterion to achieve these bounds is specified.
Abstract: This paper discusses the topic of model selection for finite-dimensional normal regression models. We compare model selection criteria according to prediction errors based upon prediction with refitting, and prediction without refitting. We provide a new lower bound for prediction without refitting, while a lower bound for prediction with refitting was given by Rissanen. Moreover, we specify a set of sufficient conditions for a model selection criterion to achieve these bounds. Then the achievability of the two bounds by the following selection rules are addressed: Rissanen's accumulated prediction error criterion (APE), his stochastic complexity criterion, AIC, BIC and the FPE criteria. In particular, we provide upper bounds on overfitting and underfitting probabilities needed for the achievability. Finally, we offer a brief discussion on the issue of finite-dimensional vs. infinite-dimensional model assumptions.
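A small sketch, under assumed interfaces, of how an accumulated prediction error (prediction with refitting) of the kind discussed above can be computed for one candidate set of regressors; the warm-up length is an arbitrary choice.

```python
# Accumulated prediction error for regressor columns `cols`: refit on the first
# t observations and score the one-step-ahead prediction of observation t+1.
import numpy as np

def accumulated_prediction_error(X, y, cols, warmup=None):
    n, p = len(y), len(cols)
    start = warmup if warmup is not None else p + 2    # need more than p points to fit
    ape = 0.0
    for t in range(start, n):
        beta, *_ = np.linalg.lstsq(X[np.ix_(np.arange(t), cols)], y[:t], rcond=None)
        ape += (y[t] - X[t, cols] @ beta) ** 2
    return ape        # compare across candidate models; smaller is better
```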

Journal ArticleDOI
TL;DR: In this paper, a bivariate mixed lognormal (Δ2) distribution is proposed as a probability model for rainfalls, containing zeros, measured at two monitoring sites, and the maximum-likelihood (ML) estimates or ML estimating equations of the parameters for the Δ2 distribution are provided.
Abstract: The paper proposes a bivariate mixed lognormal (Δ2) distribution as a probability model for representing rainfalls, containing zeros, measured at two monitoring sites and provides the maximum-likelihood (ML) estimates or ML estimating equations of the parameters for the Δ2 distribution. The distribution extends the univariate mixed lognormal distribution to the bivariate case. Two procedures for model selection are proposed: the first is the use of a statistical test and the second minimizes Akaike's information criterion (AIC). An illustration for an analysis of the AMeDAS (Automated Meteorological Data Acquisition System) daily rainfall dataset observed in the summer half-year (May–October) of 1988 is given. A Δ2 distribution is fitted to the bivariate data obtained from Tokyo and each of the other 1048 monitoring sites. Among the 1048 cases investigated, the agreement between the models selected by the test (5% significance level) and the minimum AIC procedures is 846 cases (80.7...
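In the generic, fully ML-fitted case, the second procedure (minimum AIC) reduces to the sketch below; the candidate distributions and the scipy-based fitting are illustrative assumptions, not the paper's bivariate Δ2 machinery.

```python
# Generic minimum-AIC selection among ML-fitted candidate distributions,
# with AIC = -2 log L + 2k and k the number of estimated parameters.
import numpy as np
from scipy import stats

def select_by_min_aic(data, candidates):
    best_name, best_aic = None, np.inf
    for name, dist in candidates.items():
        params = dist.fit(data)                       # maximum-likelihood fit
        loglik = np.sum(dist.logpdf(data, *params))
        aic = -2.0 * loglik + 2.0 * len(params)
        if aic < best_aic:
            best_name, best_aic = name, aic
    return best_name, best_aic

# e.g. select_by_min_aic(positive_rain, {"lognorm": stats.lognorm, "gamma": stats.gamma})
```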

Journal ArticleDOI
TL;DR: Following the information-theoretic approach to model selection, the authors develop criteria for detection of the number of damped/undamped sinusoids that are matched to the singular-value-decomposition (SVD)-based methods, such as modified forward/backward and forward-backward linear prediction.
Abstract: Following the information-theoretic approach to model selection, the authors develop criteria for detection of the number of damped/undamped sinusoids. These criteria are matched to the singular-value-decomposition (SVD)-based methods, such as modified forward/backward and forward-backward linear prediction, so well that the extra computations needed over and above those required for computing the SVD are marginal. Next, an analytical framework for analyzing the performance of these criteria is developed. In the development of the analysis, some approximations which become better for large signal-to-noise ratio are made. Simulations are used to verify the usefulness of the analysis, and to compare the performance of the method with that of J.J. Fuchs (1988).
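The criteria derived in the paper are matched to the SVD-based linear prediction methods and are not reproduced here; as a related, classical example of the same information-theoretic approach, the Wax-Kailath MDL rule below selects the model order from the eigenvalues of a sample covariance (or the squared singular values of the data matrix).

```python
# Classical MDL order selection (Wax & Kailath style), shown only as a generic
# example of information-theoretic detection of the number of signals.
import numpy as np

def mdl_order(eigvals, n_snapshots):
    lam = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    p = len(lam)
    scores = []
    for k in range(p):
        tail = lam[k:]                                  # candidate noise eigenvalues
        geo = np.exp(np.mean(np.log(tail)))             # geometric mean
        arith = np.mean(tail)                           # arithmetic mean
        scores.append(-n_snapshots * (p - k) * np.log(geo / arith)
                      + 0.5 * k * (2 * p - k) * np.log(n_snapshots))
    return int(np.argmin(scores))                       # estimated number of signals
```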

Journal ArticleDOI
TL;DR: In this article, a modified model selection procedure based on a new appreciation function was proposed, which was shown to perform better than the original one in terms of model selection performance in chemical data sets.
Abstract: Regularized discriminant analysis has proven to be a most effective classifier for problems where traditional classifiers fail because of a lack of sufficient training samples, as is often the case in high-dimensional settings. However, it has been shown that the model selection procedure of regularized discriminant analysis, determining the degree of regularization, has some deficiencies associated with it. We propose a modified model selection procedure based on a new appreciation function. By means of an extensive simulation it was shown that the new model selection procedure performs better than the original one. We also propose that one of the control parameters of regularized discriminant analysis be allowed to take on negative values. This extension leads to an improved performance in certain situations. The results are confirmed using two chemical data sets.

Book ChapterDOI
09 Jul 1993
TL;DR: In this paper, the authors examine the tradeoffs involved in using temporal influence diagrams (TIDs) which adequately capture the temporal evolution of a dynamic system without prohibitive data and computational requirements.
Abstract: This paper addresses the tradeoffs which need to be considered in reasoning using probabilistic network representations, such as Influence Diagrams (IDs). In particular, we examine the tradeoffs entailed in using Temporal Influence Diagrams (TIDs) which adequately capture the temporal evolution of a dynamic system without prohibitive data and computational requirements. Three approaches for TID construction which make different tradeoffs are examined: (1) tailoring the network at each time interval to the data available (rather than just copying the original Bayes Network for all time intervals); (2) modeling the evolution of a parsimonious subset of variables (rather than all variables); and (3) model selection approaches, which seek to minimize some measure of the predictive accuracy of the model without introducing too many parameters, which might cause "overfitting" of the model. Methods of evaluating the accuracy/efficiency of the tradeoffs are proposed.

Journal ArticleDOI
01 Jan 1993
TL;DR: The model (type) selection process is described, why support for this should be integral to MMS design is argued, and an approach to the design of the model selection subsystem in an integrated DSS is overviewed.
Abstract: Effective computer based support for the use of analytic models in management decision making requires model management systems (MMS) that facilitate all phases of the modeling process. Existing approaches to the design of MMS commonly assume that the type of model needed to solve each problem is predetermined by the decision maker. This is a limited view, since determination of the appropriate model type is a difficult task, and is hampered by the subjective preferences of individuals. In this paper, we describe the model (type) selection process, argue why support for this should be integral to MMS design, and overview an approach to the design of the model selection subsystem in an integrated DSS.

Journal ArticleDOI
TL;DR: In this paper, a model selection method utilizing neural networks has been developed to perform automated spectral predictions using a library of previously generated regression equations (models), which contains models capable of simulating the 13 C NMR spectra for various classes of organic compounds.
Abstract: A model selection method utilizing neural networks has been developed to perform automated spectral predictions using a library of previously generated regression equations (models). The library contains models capable of simulating the 13C NMR spectra for various classes of organic compounds. The 4018 carbon atoms used to develop the 75 models were utilized to train the network to relate the chemical environment surrounding each of the atoms to the models which they were used to develop.

Journal ArticleDOI
TL;DR: The main issues addressed are: (a) the comparison of different nonparametric regression methods in this context, and (b) how to do model selection, i.e., given a (finite) set of candidate spline functions, select the (possibly unique) best one using some (statistically based) selection criteria.
Abstract: We analyze in detail the estimation problem associated with the following problem: given n noisy measurements (y_i, i = 1, ..., n) of the response of a system to an input (A(t), where t indicates time), obtain an estimate of A(t) given a known K(t) (the unit impulse response function of the system) under the model: $$y_i = \int_0^{t_i} A(s) K(t_i - s)\,ds + \varepsilon_i$$ where ε1, ..., εn are independent identically distributed random variables with mean zero and common finite variance. In the solution to the problem, the unknown function is represented by a spline function, and the problem is recast in terms of (inequality constrained) linear regression. The main issues addressed are: (a) the comparison of different nonparametric regression methods in this context, and (b) how to do model selection, i.e., given a (finite) set of candidate spline functions, select the (possibly unique) best one using some (statistically based) selection criteria. Different spline candidate sets, and different asymptotic and resampling-based statistical selection criteria are compared by means of simulations. Due to the particular nature of the estimation problem, modifications to the criteria are suggested. Applications to simulated and real pharmacokinetics data are reported.
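A schematic Python sketch of the recasting step described above: represent A(t) on a B-spline basis, build a design matrix by numerically evaluating the convolution of each basis function with the known kernel K, and fit by non-negative least squares. The basis, grid resolution, and the non-negativity constraint are illustrative assumptions, not the paper's exact setup.

```python
# Deconvolution recast as (constrained) linear regression on a B-spline basis.
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import nnls

def convolution_design_matrix(t_obs, K, knots, degree=3, n_grid=400):
    s = np.linspace(0.0, float(np.max(t_obs)), n_grid)
    ds = s[1] - s[0]
    n_basis = len(knots) - degree - 1
    D = np.zeros((len(t_obs), n_basis))
    for j in range(n_basis):
        coef = np.zeros(n_basis)
        coef[j] = 1.0
        basis_j = BSpline(knots, coef, degree, extrapolate=False)(s)
        basis_j = np.nan_to_num(basis_j)                # zero outside the knot span
        for i, t in enumerate(t_obs):
            mask = s <= t
            D[i, j] = np.sum(basis_j[mask] * K(t - s[mask])) * ds  # ~ integral B_j(u) K(t-u) du
    return D

# spline_coefs, _ = nnls(convolution_design_matrix(t_obs, K, knots), y)
# A(t) is then approximated by sum_j spline_coefs[j] * B_j(t).
```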

Journal ArticleDOI
TL;DR: In this paper, the authors show how probability theory can be used to address such questions in a simple and straightforward manner, applying it to the model selection problem that arises in many data analysis problems in science.

Proceedings Article
29 Nov 1993
TL;DR: Experimental and theoretical work indicates that the performance of neural networks can be improved by considering methods for combining neural networks.
Abstract: The past several years have seen a tremendous growth in the complexity of the recognition, estimation and control tasks expected of neural networks. In solving these tasks, one is faced with a large variety of learning algorithms and a vast selection of possible network architectures. After all the training, how does one know which is the best network? This decision is further complicated by the fact that standard techniques can be severely limited by problems such as over-fitting, data sparsity and local optima. The usual solution to these problems is a winner-take-all cross-validatory model selection. However, recent experimental and theoretical work indicates that we can improve performance by considering methods for combining neural networks.
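A minimal sketch of the contrast drawn above, assuming each trained network exposes a `predict` method; the uniform weighting is an illustrative default.

```python
# Winner-take-all selection vs. simple ensemble averaging of trained networks.
import numpy as np

def winner_take_all(models, cv_errors, X):
    best = models[int(np.argmin(cv_errors))]     # keep only the cross-validation winner
    return best.predict(X)

def ensemble_average(models, X, weights=None):
    preds = np.stack([m.predict(X) for m in models])
    return np.average(preds, axis=0, weights=weights)   # combine all networks
```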

Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of model selection based on Pearson chi-square type statistics and propose some convenient asymptotically standard normal tests for model selection, which have the desirable feature that the competing models need be neither correctly specified nor nested within each other.


Journal ArticleDOI
Claus Weihs1
TL;DR: Purely graphical multivariate tools such as 3D rotation and scatterplot matrices are discussed after having introduced the univariate and bivariate tools on which they are based.
Abstract: Exploratory data analysis (EDA) is a toolbox of data manipulation methods for looking at data to see what they seem to say, i.e., one tries to let the data speak for themselves. In this way there is hope that the data will lead to indications about ‘models’ of relationships not expected a priori. In this respect EDA is a pre-step to confirmatory data analysis which delivers measures of how adequate a model is. In this tutorial the focus is on multivariate exploratory data analysis for quantitative data using linear methods for dimension reduction and prediction. Purely graphical multivariate tools such as 3D rotation and scatterplot matrices are discussed after having introduced the univariate and bivariate tools on which they are based. The main tasks of multivariate exploratory data analysis are identified as ‘search for structure’ by dimension reduction and ‘model selection’ by comparing predictive power. Resampling is used to support validity, and variable selection to improve interpretability.

Journal ArticleDOI
TL;DR: Computer-intensive statistical methods, especially the bootstrap, are suitable for studying the variability of predictions and present new opportunities for modelling spatial distribution and change to distribution at a regional scale.

Proceedings Article
01 Jan 1993
TL;DR: This paper demonstrates the utility of a preference-based mechanism for model selection that can apply previously identified model selection criteria and can be added, modified, or removed without modifying the interpreter or other planning knowledge.
Abstract: The range of possible domain models on which an explanation can be based is often large, yet human explainers are able to choose models that address a questioner's informative needs without undue obscurity. However, few existing explanation systems use knowledge bases providing multiple models of their topics of explanation, let alone account for the selection of a model for a given explanation. This paper demonstrates the utility of a preference-based mechanism for model selection. Selection heuristics are made explicit as preferences and can be added, modified, or removed without modifying the interpreter or other planning knowledge. The mechanism is more general than previous mechanisms for model selection or “perspective,” and can apply previously identified model selection criteria.

Journal ArticleDOI
TL;DR: This paper considers the analysis of record breaking data sets, where only observations that exceed, or only those that fall below, the current extreme value are recorded, and presents a numerical example involving records in Olympic high jump competition, where besides estimation, related issues in model selection and prediction are addressed.
Abstract: In this paper we consider the analysis of record breaking data sets, where only observations that exceed, or only those that fall below, the current extreme value are recorded. Example application areas include industrial stress testing, meteorological analysis, sporting and athletic events, and oil and mining surveys. A closely related area is that of threshold modelling, where the observations are those that cross a certain threshold value. The inherent missing data structure present in these problems leads to likelihood functions that contain possibly high-dimensional integrals, rendering traditional maximum likelihood methods difficult or not feasible. Fortunately, we may obtain arbitrarily accurate approximations to the likelihood function by iteratively applying Monte Carlo integration methods (Geyer & Thompson, 1992). Subiteration using the Gibbs sampler may help to evaluate any multivariate integrals encountered during this process. This approach can handle far more sophisticated parametric models than have been used previously in record breaking and threshold data contexts. In particular, the methodology allows for observations that are dependent and subject to mean shifts over time. We present a numerical example involving records in Olympic high jump competition, where besides estimation we also address related issues in model selection and prediction.

Proceedings Article
11 Jul 1993
TL;DR: The approach is based on the idea that artifact performance models for computer-aided design should be chosen in light of the design decisions they are required to support, and has developed a technique called "Gradient Magnitude Model Selection" (GMMS), which embodies this principle.
Abstract: Models of physical systems can differ according to computational cost, accuracy and precision, among other things. Depending on the problem solving task at hand, different models will be appropriate. Several investigators have recently developed methods of automatically selecting among multiple models of physical systems. Our research is novel in that we are developing model selection techniques specifically suited to computer-aided design. Our approach is based on the idea that artifact performance models for computer-aided design should be chosen in light of the design decisions they are required to support. We have developed a technique called "Gradient Magnitude Model Selection" (GMMS), which embodies this principle. GMMS operates in the context of a hillclimbing search process. It selects the simplest model that meets the needs of the hillclimbing algorithm in which it operates. We are using the domain of sailing yacht design as a testbed for this research. We have implemented GMMS and used it in hillclimbing search to decide between a computationally expensive potential-flow program and an algebraic approximation to analyze the performance of sailing yachts. Experimental tests show that GMMS makes the design process faster than it would be if the most expensive model were used for all design evaluations. GMMS achieves this performance improvement with little or no sacrifice in the quality of the resulting design.