
Showing papers on "Model selection published in 2005"


Book
01 Jan 2005
TL;DR: In this book, the authors lay out the basic group-based trajectory model, treat groups as an approximation, cover model selection and posterior group-membership probabilities, statistically link group membership to covariates, and add covariates to the trajectories themselves.
Abstract: Acknowledgments 1. Introduction and Rationale PART I. LAYING OUT THE BASIC MODEL 2. The Basic Model 3. Groups as an Approximation 4. Model Selection 5. Posterior Group-Membership Probabilities PART II. GENERALIZING THE BASIC MODEL 6. Statistically Linking Group Membership to Covariates 7. Adding Covariates to the Trajectories Themselves 8. Dual Trajectory Analysis 9. Concluding Observations References Index

2,771 citations


Journal ArticleDOI
TL;DR: In this paper, a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration is proposed, which is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy centre in the US Pacific Northwest.
Abstract: Summary. Probabilistic forecasts of continuous variables take the form of predictive densities or predictive cumulative distribution functions. We propose a diagnostic approach to the evaluation of predictive performance that is based on the paradigm of maximizing the sharpness of the predictive distributions subject to calibration. Calibration refers to the statistical consistency between the distributional forecasts and the observations and is a joint property of the predictions and the events that materialize. Sharpness refers to the concentration of the predictive distributions and is a property of the forecasts only. A simple theoretical framework allows us to distinguish between probabilistic calibration, exceedance calibration and marginal calibration. We propose and study tools for checking calibration and sharpness, among them the probability integral transform histogram, marginal calibration plots, the sharpness diagram and proper scoring rules. The diagnostic approach is illustrated by an assessment and ranking of probabilistic forecasts of wind speed at the Stateline wind energy centre in the US Pacific Northwest. In combination with cross-validation or in the time series context, our proposal provides very general, nonparametric alternatives to the use of information criteria for model diagnostics and model selection.

1,537 citations
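
As a concrete illustration of two of the tools named above, the sketch below (not taken from the paper; the toy forecasters and all numbers are invented) computes probability integral transform values for checking calibration and the continuous ranked probability score, a proper scoring rule with a closed form for Gaussian predictive distributions.

```python
import numpy as np
from scipy.stats import norm

def pit_values(y, mu, sigma):
    """Probability integral transform F_t(y_t) for Gaussian predictive CDFs;
    a roughly uniform histogram of these values indicates probabilistic calibration."""
    return norm.cdf(y, loc=mu, scale=sigma)

def crps_gaussian(y, mu, sigma):
    """Closed-form continuous ranked probability score (a proper scoring rule)
    for a N(mu, sigma^2) predictive distribution."""
    z = (y - mu) / sigma
    return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

# toy comparison: a sharp, calibrated forecaster vs an overdispersed one
rng = np.random.default_rng(0)
obs = rng.normal(0.0, 1.0, size=500)
print("mean CRPS, sharp forecaster        :", crps_gaussian(obs, 0.0, 1.0).mean())
print("mean CRPS, overdispersed forecaster:", crps_gaussian(obs, 0.0, 2.0).mean())
```

The overdispersed forecaster receives the worse (larger) mean score, and its PIT histogram would be hump-shaped rather than uniform, which is the "sharpness subject to calibration" idea in miniature.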


Journal ArticleDOI
TL;DR: This work compares several methods for estimating the 'true' prediction error of a prediction model in the presence of feature selection, and finds that LOOCV and 10-fold CV have the smallest bias for linear discriminant analysis and the .632+ bootstrap has the lowest mean square error.
Abstract: Motivation: In genomic studies, thousands of features are collected on relatively few samples. One of the goals of these studies is to build classifiers to predict the outcome of future observations. There are three inherent steps to this process: feature selection, model selection and prediction assessment. With a focus on prediction assessment, we compare several methods for estimating the 'true' prediction error of a prediction model in the presence of feature selection. Results: For small studies where features are selected from thousands of candidates, the resubstitution and simple split-sample estimates are seriously biased. In these small samples, leave-one-out cross-validation (LOOCV), 10-fold cross-validation (CV) and the .632+ bootstrap have the smallest bias for diagonal discriminant analysis, nearest neighbor and classification trees. LOOCV and 10-fold CV have the smallest bias for linear discriminant analysis. Additionally, LOOCV, 5- and 10-fold CV, and the .632+ bootstrap have the lowest mean square error. The .632+ bootstrap is quite biased in small sample sizes with strong signal-to-noise ratios. Differences in performance among resampling methods are reduced as the number of specimens available increases. Contact: annette.molinaro@yale.edu Supplementary Information: A complete compilation of results and R code for simulations and analyses are available in Molinaro et al. (2005) (http://linus.nci.nih.gov/brb/TechReport.htm).

1,128 citations
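
The bias the paper warns about arises when features are selected on the full dataset before resampling. A minimal sketch, using synthetic data and scikit-learn as stand-ins (not the authors' code), keeps feature selection inside the cross-validation loop:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline

# small-n, large-p data, as in genomic studies (synthetic stand-in)
X, y = make_classification(n_samples=40, n_features=2000, n_informative=10,
                           random_state=0)

# Feature selection lives INSIDE the pipeline, so it is re-run on every
# training fold; selecting features on the full data first would give the
# optimistically biased estimates the paper warns about.
clf = make_pipeline(SelectKBest(f_classif, k=20), LinearDiscriminantAnalysis())

loocv_err = 1 - cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
cv10_err = 1 - cross_val_score(clf, X, y, cv=10).mean()
print(f"LOOCV error: {loocv_err:.2f}, 10-fold CV error: {cv10_err:.2f}")
```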


Journal ArticleDOI
TL;DR: This paper introduces a variable selection method referred to as a rescaled spike and slab model, and studies the usefulness of continuous bimodal priors to model hypervariance parameters, and the effect scaling has on the posterior mean through its relationship to penalization.
Abstract: Variable selection in the linear regression model takes many apparent faces from both frequentist and Bayesian standpoints. In this paper we introduce a variable selection method referred to as a rescaled spike and slab model. We study the importance of prior hierarchical specifications and draw connections to frequentist generalized ridge regression estimation. Specifically, we study the usefulness of continuous bimodal priors to model hypervariance parameters, and the effect scaling has on the posterior mean through its relationship to penalization. Several model selection strategies, some frequentist and some Bayesian in nature, are developed and studied theoretically. We demonstrate the importance of selective shrinkage for effective variable selection in terms of risk misclassification, and show this is achieved using the posterior from a rescaled spike and slab model. We also show how to verify a procedure’s ability to reduce model uncertainty in finite samples using a specialized forward selection strategy. Using this tool, we illustrate the effectiveness of rescaled spike and slab models in reducing model uncertainty.

1,006 citations


Journal ArticleDOI
TL;DR: A statistical framework based on the point process likelihood function to relate a neuron's spiking probability to three typical covariates: the neuron's own spiking history, concurrent ensemble activity, and extrinsic covariates such as stimuli or behavior.
Abstract: Multiple factors simultaneously affect the spiking activity of individual neurons. Determining the effects and relative importance of these factors is a challenging problem in neurophysiology. We propose a statistical framework based on the point process likelihood function to relate a neuron's spiking probability to three typical covariates: the neuron's own spiking history, concurrent ensemble activity, and extrinsic covariates such as stimuli or behavior. The framework uses parametric models of the conditional intensity function to define a neuron's spiking probability in terms of the covariates. The discrete time likelihood function for point processes is used to carry out model fitting and model analysis. We show that, by modeling the logarithm of the conditional intensity function as a linear combination of functions of the covariates, the discrete time point process likelihood function is readily analyzed in the generalized linear model (GLM) framework. We illustrate our approach for both GLM and non-GLM likelihood functions using simulated data and multivariate single-unit activity data simultaneously recorded from the motor cortex of a monkey performing a visuomotor pursuit-tracking task. The point process framework provides a flexible, computationally efficient approach for maximum likelihood estimation, goodness-of-fit assessment, residual analysis, model selection, and neural decoding. The framework thus allows for the formulation and analysis of point process models of neural spiking activity that readily capture the simultaneous effects of multiple covariates and enables the assessment of their relative importance.

982 citations
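
A minimal simulation sketch of the core idea, assuming 1 ms bins and a log-linear conditional intensity (the data, covariates and parameter values below are invented, not the authors'): the discrete-time point process likelihood is fit as a Poisson GLM whose covariates are a stimulus and the spike train's own recent history.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 5000                                           # number of 1 ms bins
stim = np.sin(2 * np.pi * np.arange(T) / 500.0)    # an extrinsic covariate

# simulate a spike train whose log conditional intensity is linear in the
# stimulus and in its own recent spiking history (refractory-like effect)
spikes = np.zeros(T)
hist_w = np.array([-3.0, -1.0, -0.3])              # weights on lags 1..3
for t in range(3, T):
    log_lam = -3.5 + 1.0 * stim[t] + hist_w @ spikes[t - 3:t][::-1]
    spikes[t] = rng.poisson(np.exp(log_lam)) > 0

# design matrix: constant, stimulus, and three history lags
X = np.column_stack([np.ones(T - 3), stim[3:]] +
                    [spikes[3 - k:T - k] for k in (1, 2, 3)])
y = spikes[3:]

# discrete-time point process likelihood handled as a Poisson GLM (log link)
fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(fit.params)   # recovers baseline, stimulus and history effects
print(fit.aic)      # AIC can then compare competing covariate sets
```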


Journal ArticleDOI
TL;DR: Some myths about model selection are debunked, in particular the myth that consistent model selection has no effect on subsequent inference asymptotically; an “impossibility” result regarding the estimation of the finite-sample distribution of post-model-selection estimators is also discussed.
Abstract: Model selection has an important impact on subsequent inference. Ignoring the model selection step leads to invalid inference. We discuss some intricate aspects of data-driven model selection that do not seem to have been widely appreciated in the literature. We debunk some myths about model selection, in particular the myth that consistent model selection has no effect on subsequent inference asymptotically. We also discuss an “impossibility” result regarding the estimation of the finite-sample distribution of post-model-selection estimators.

680 citations


Journal ArticleDOI
TL;DR: The conditional Akaike information and its corresponding criterion (CAIC) are proposed for linear mixed-effects models in the analysis of clustered data, defined for both maximum likelihood and residual maximum likelihood estimation. The penalty term in CAIC is related to the effective degrees of freedom p for a linear mixed model proposed by Hodges & Sargent (2001); p reflects an intermediate level of complexity between a fixed-effects model with no cluster effect and a corresponding model with fixed cluster effects.
Abstract: This paper focuses on the Akaike information criterion, AIC, for linear mixed-effects models in the analysis of clustered data. We make the distinction between questions regarding the population and questions regarding the particular clusters in the data. We show that the AIC in current use is not appropriate for the focus on clusters, and we propose instead the conditional Akaike information and its corresponding criterion, the conditional AIC, CAIC. The penalty term in CAIC is related to the effective degrees of freedom p for a linear mixed model proposed by Hodges & Sargent (2001); p reflects an intermediate level of complexity between a fixed-effects model with no cluster effect and a corresponding model with fixed cluster effects. The CAIC is defined for both maximum likelihood and residual maximum likelihood estimation. A pharmacokinetics data application is used to illuminate the distinction between the two inference settings, and to illustrate the use of the conditional AIC in model selection.

559 citations
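
Schematically, the two criteria differ in which likelihood they evaluate and how complexity is counted; the forms below are a hedged paraphrase (they omit the paper's exact penalty derivations for estimated variance components):

```latex
% Marginal AIC vs. conditional AIC (schematic): K = number of parameters in the
% marginal likelihood g; \rho = Hodges-Sargent effective degrees of freedom;
% f is the conditional density given the predicted random effects \hat{b}.
\mathrm{AIC}  = -2\log g\!\left(y \mid \hat{\theta}\right) + 2K,
\qquad
\mathrm{cAIC} = -2\log f\!\left(y \mid \hat{b}, \hat{\theta}\right) + 2(\rho + 1).
```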


Journal ArticleDOI
TL;DR: A methodology for model selection based on a penalized contrast is developed, and an adaptive choice of the penalty function for automatically estimating the dimension of the model, i.e., the number of change points, is proposed.

554 citations


Journal ArticleDOI
01 Oct 2005-Ecology
TL;DR: The theory behind AIC is reviewed, and its use for testing ecological theory is demonstrated through two example studies of foraging motivated by simple foraging theory, for which plausible truths and candidate models are presented.
Abstract: Ecologists are increasingly applying model selection to their data analyses, primarily to compare regression models. Model selection can also be used to compare mechanistic models derived from ecological theory, thereby providing a formal framework for testing the theory. The Akaike Information Criterion (AIC) is the most commonly adopted criterion used to compare models; however, its performance in general is not very well known. The best model according to AIC has the smallest expected Kullback-Leibler (K-L) distance, which is an information-theoretic measure of the difference between a model and the truth. I review the theory behind AIC and demonstrate how it can be used to test ecological theory by considering two example studies of foraging, motivated by simple foraging theory. I present plausible truths for the two studies, and models that can be fit to the foraging data. K-L distances are calculated for simulated studies, which provide an appropriate test of AIC. Results support the use of a commonly adopted rule of thumb for selecting models based on AIC differences. However, AICc, a corrected version of AIC commonly used to reduce model selection bias, showed no clear improvement, and model averaging, a technique to reduce model prediction bias, gave mixed results.

506 citations
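
For reference, the quantities behind the discussion above, in their standard forms (K estimable parameters, n observations); the rule of thumb examined in the paper treats models whose AIC difference is within roughly 2 of the best model as having substantial support:

```latex
% AIC, its small-sample correction, and the AIC difference used in the rule of thumb
\mathrm{AIC} = -2\ln L(\hat{\theta}) + 2K,
\qquad
\mathrm{AIC}_c = \mathrm{AIC} + \frac{2K(K+1)}{n - K - 1},
\qquad
\Delta_i = \mathrm{AIC}_i - \mathrm{AIC}_{\min}.
```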


Journal ArticleDOI
TL;DR: This article proposes a freeway travel time prediction framework that exploits a recurrent neural network topology, the so-called state-space neural network (SSNN), with preprocessing strategies based on imputation; the SSNN appears to be robust to the “damage” done by these imputation schemes.
Abstract: Accuracy and robustness with respect to missing or corrupt input data are two key characteristics for any travel time prediction model that is to be applied in a real-time environment (e.g. for display on variable message signs on freeways). This article proposes a freeway travel time prediction framework that exhibits both qualities. The framework exploits a recurrent neural network topology, the so-called state-space neural network (SSNN), with preprocessing strategies based on imputation. Although the SSNN model is a neural network, its design (in terms of input- and model selection) is neither “black box” nor location-specific. Instead, it is based on the lay-out of the freeway stretch of interest. In this sense, the SSNN model combines the generality of neural network approaches with traffic-related (“white-box”) design. Robustness to missing data is tackled by means of simple imputation (data replacement) schemes, such as exponential forecasts and spatial interpolation. Although there are clear theoretical shortcomings to “simple” imputation schemes to remedy input failure, our results indicate that their use is justified in this particular application. The SSNN model appears to be robust to the “damage” done by these imputation schemes. This is true for both incidental (random) and structural input failure. We demonstrate that the SSNN travel time prediction framework yields accurate and robust travel time predictions on both synthetic and real data.

461 citations
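
A rough sketch of the kind of simple imputation described above (an exponential forecast in time with a spatial fallback); the function name, array layout and smoothing constant are illustrative assumptions, not the SSNN framework's actual preprocessing code.

```python
import numpy as np

def impute_speeds(v, alpha=0.7):
    """Fill missing detector readings (NaN) in a (time x detector) array with
    an exponential-smoothing forecast per detector, falling back to spatial
    interpolation over neighbouring detectors. Illustrative sketch only."""
    v = v.copy()
    T, D = v.shape
    smooth = np.full(D, np.nan)        # exponentially smoothed state per detector
    for t in range(T):
        for d in range(D):
            if np.isnan(v[t, d]):
                if not np.isnan(smooth[d]):
                    v[t, d] = smooth[d]                            # exponential forecast
                else:
                    neigh = [v[t, k] for k in (d - 1, d + 1)
                             if 0 <= k < D and not np.isnan(v[t, k])]
                    v[t, d] = np.mean(neigh) if neigh else 0.0     # spatial interpolation
            # update the smoothed state with whatever value is now in place
            smooth[d] = (v[t, d] if np.isnan(smooth[d])
                         else alpha * v[t, d] + (1 - alpha) * smooth[d])
    return v
```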


Book
01 Jan 2005
TL;DR: In this book, the authors present procedures for model estimation, selection, and validation for order statistics and extreme events, although a model selection procedure specific to multivariate extremes is not discussed.
Abstract: Preface. I: DATA, INTRODUCTION AND MOTIVATION. 1. Introduction and Motivation. II: PROBABILISTIC MODELS USEFUL IN EXTREMES. 2. Discrete Probabilistic Models. 3. Continuous Probabilistic Models. III: MODEL ESTIMATION, SELECTION, AND VALIDATION. 4. Model Estimation. 5. Model Selection and Validation. IV: EXACT MODELS FOR ORDER STATISTICS AND EXTREMES. 6. Order Statistics. 7. Point Processes and Exact Models. V: ASYMPTOTIC MODELS FOR EXTREMES AND EXCEEDANCES. 8. Limit Distributions of Order Statistics. 9. Limit Distributions of Exceedances. 10. Multivariate Extremes. Appendix: Statistical Tables. Bibliography. Index.

Journal ArticleDOI
TL;DR: In this paper, the authors show that for any model selection criterion to be consistent, it must behave suboptimally for estimating the regression function in terms of minimax rate of covergence; and Bayesian model averaging cannot be minimax-rate optimal for regression estimation.
Abstract: A traditional approach to statistical inference is to identify the true or best model first with little or no consideration of the specific goal of inference in the model identification stage. Can the pursuit of the true model also lead to optimal regression estimation? In model selection, it is well known that BIC is consistent in selecting the true model, and AIC is minimax-rate optimal for estimating the regression function. A recent promising direction is adaptive model selection, in which, in contrast to AIC and BIC, the penalty term is data-dependent. Some theoretical and empirical results have been obtained in support of adaptive model selection, but it is still not clear if it can really share the strengths of AIC and BIC. Model combining or averaging has attracted increasing attention as a means to overcome the model selection uncertainty. Can Bayesian model averaging be optimal for estimating the regression function in a minimax sense? We show that the answers to these questions are basically in the negative: for any model selection criterion to be consistent, it must behave suboptimally for estimating the regression function in terms of minimax rate of convergence; and Bayesian model averaging cannot be minimax-rate optimal for regression estimation.

Journal ArticleDOI
TL;DR: It is demonstrated that existing methods for estimating the number of segments are not well adapted in the case of array CGH data, and an adaptive criterion is proposed that detects previously mapped chromosomal aberrations.
Abstract: Microarray-CGH experiments are used to detect and map chromosomal imbalances, by hybridizing targets of genomic DNA from a test and a reference sample to sequences immobilized on a slide. These probes are genomic DNA sequences (BACs) that are mapped on the genome. The signal has a spatial coherence that can be handled by specific statistical tools. Segmentation methods seem to be a natural framework for this purpose. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose BACs share the same relative copy number on average. We model a CGH profile by a random Gaussian process whose distribution parameters are affected by abrupt changes at unknown coordinates. Two major problems arise: determining which parameters are affected by the abrupt changes (the mean and the variance, or the mean only), and selecting the number of segments in the profile. We demonstrate that existing methods for estimating the number of segments are not well adapted in the case of array CGH data, and we propose an adaptive criterion that detects previously mapped chromosomal aberrations. The performance of this method is discussed based on simulations and publicly available data sets. Then we discuss the choice of modeling for array CGH data and show that the model with a homogeneous variance is adapted to this context. Array CGH data analysis is an emerging field that needs appropriate statistical tools. Process segmentation and model selection provide a theoretical framework that allows precise biological interpretations. Adaptive methods for model selection give promising results concerning the estimation of the number of altered regions on the genome.

Journal ArticleDOI
TL;DR: Issues that render model-based approaches necessary are reviewed, nucleotide-based models that attempt to capture relevant features of evolutionary processes are briefly reviewed, and methods that have been applied to model selection in phylogenetics are reviewed: likelihood-ratio tests, AIC, BIC, and performance- based approaches.
Abstract: Investigation into model selection has a long history in the statistical literature. As model-based approaches begin dominating systematic biology, increased attention has focused on how models should be selected for distance-based, likelihood, and Bayesian phylogenetics. Here, we review issues that render model-based approaches necessary, briefly review nucleotide-based models that attempt to capture relevant features of evolutionary processes, and review methods that have been applied to model selection in phylogenetics: likelihood-ratio tests, AIC, BIC, and performance-based approaches.
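
For reference, the non-performance-based criteria reviewed above in their usual forms (the chi-squared reference for the likelihood-ratio test assumes nested models and can break down when parameters lie on a boundary of the parameter space):

```latex
% Likelihood-ratio test between nested substitution models M_0 \subset M_1
% (k_1 - k_0 = difference in the number of free parameters), plus AIC and BIC.
\delta = 2\left(\ln L_1 - \ln L_0\right) \;\sim\; \chi^2_{\,k_1 - k_0},
\qquad
\mathrm{AIC} = -2\ln L + 2K,
\qquad
\mathrm{BIC} = -2\ln L + K\ln n .
```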

Journal ArticleDOI
TL;DR: New accelerated life test models are presented in which both observed failures and degradation measures can be considered for parametric inference of system lifetime and it is shown that in most cases the models for failure can be approximated closely by accelerated test versions of Birnbaum–Saunders and inverse Gaussian distributions.
Abstract: Based on a generalized cumulative damage approach with a stochastic process describing degradation, new accelerated life test models are presented in which both observed failures and degradation measures can be considered for parametric inference of system lifetime. Incorporating an accelerated test variable, we provide several new accelerated degradation models for failure based on the geometric Brownian motion or gamma process. It is shown that in most cases, our models for failure can be approximated closely by accelerated test versions of Birnbaum–Saunders and inverse Gaussian distributions. Estimation of model parameters and a model selection procedure are discussed, and two illustrative examples using real data for carbon-film resistors and fatigue crack size are presented.

Proceedings ArticleDOI
07 Aug 2005
TL;DR: A probabilistic kernel approach to preference learning based on Gaussian processes and a new likelihood function is proposed to capture the preference relations in the Bayesian framework.
Abstract: In this paper, we propose a probabilistic kernel approach to preference learning based on Gaussian processes. A new likelihood function is proposed to capture the preference relations in the Bayesian framework. The generalized formulation is also applicable to tackle many multiclass problems. The overall approach has the advantages of Bayesian methods for model selection and probabilistic prediction. Experimental results compared against the constraint classification approach on several benchmark datasets verify the usefulness of this algorithm.

Journal ArticleDOI
TL;DR: This paper introduces an information criterion for model selection based on composite likelihood, and describes applications to the modelling of time series of counts through dynamic generalised linear models and to the analysis of the well-known Old Faithful dataset.
Abstract: A composite likelihood consists of a combination of valid likelihood objects, usually related to small subsets of data. The merit of composite likelihood is to reduce the computational complexity so that it is possible to deal with large datasets and very complex models, even when the use of standard likelihood or Bayesian methods is not feasible. In this paper, we aim to suggest an integrated, general approach to inference and model selection using composite likelihood methods. In particular, we introduce an information criterion for model selection based on composite likelihood. We also describe applications to the modelling of time series of counts through dynamic generalised linear models and to the analysis of the well-known Old Faithful geyser dataset.
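
A commonly used form of such a criterion, given here as a hedged sketch rather than the paper's exact definition, replaces the usual AIC penalty with the trace of a sandwich of the sensitivity matrix H and the variability matrix J of the composite score:

```latex
% c\ell is the composite log-likelihood; H and J are the sensitivity and
% variability matrices of the composite score (notation follows common usage).
\mathrm{CLIC} = -2\, c\ell(\hat{\theta};\, y)
              + 2\,\mathrm{tr}\!\left\{\hat{J}(\hat{\theta})\,\hat{H}(\hat{\theta})^{-1}\right\}.
```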

Journal ArticleDOI
TL;DR: In this paper, the authors show how model selection can be non-distortionary: approximately unbiased "selection estimates" are derived, with reported standard errors close to the sampling standard deviations of the estimated DGP parameters, and a near-unbiased goodness-of-fit measure.
Abstract: After reviewing the simulation performance of general-to-specific automatic regression-model selection, as embodied in PcGets, we show how model selection can be non-distortionary: approximately unbiased ‘selection estimates’ are derived, with reported standard errors close to the sampling standard deviations of the estimated DGP parameters, and a near-unbiased goodness-of-fit measure. The handling of theory-based restrictions, non-stationarity and problems posed by collinear data are considered. Finally, we consider how PcGets can handle three ‘intractable’ problems: more variables than observations in regression analysis; perfectly collinear regressors; and modelling simultaneous equations without a priori restrictions.

Journal ArticleDOI
TL;DR: In this article, the authors compare the six lag-order selection criteria most commonly used in applied work and conclude that the Akaike Information Criterion (AIC) tends to produce the most accurate structural and semi-structural impulse response estimates for realistic sample sizes.
Abstract: It is common in empirical macroeconomics to fit vector autoregressive (VAR) models to construct estimates of impulse responses. An important preliminary step in impulse response analysis is the selection of the VAR lag order. In this paper, we compare the six lag-order selection criteria most commonly used in applied work. Our metric is the mean-squared error (MSE) of the implied pointwise impulse response estimates normalized relative to their MSE based on knowing the true lag order. Based on our simulation design we conclude that for monthly VAR models, the Akaike Information Criterion (AIC) tends to produce the most accurate structural and semi-structural impulse response estimates for realistic sample sizes. For quarterly VAR models, the Hannan-Quinn Criterion (HQC) appears to be the most accurate criterion with the exception of sample sizes smaller than 120, for which the Schwarz Information Criterion (SIC) is more accurate. For persistence profiles based on quarterly vector error correction models with known cointegrating vector, our results suggest that the SIC is the most accurate criterion for all realistic sample sizes.
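
The three criteria singled out in the conclusions take the following standard VAR forms (K variables, T observations, Sigma-hat_u(p) the residual covariance matrix at lag order p, deterministic terms omitted); these are textbook expressions rather than the paper's notation:

```latex
\mathrm{AIC}(p) = \ln\bigl|\hat{\Sigma}_u(p)\bigr| + \frac{2}{T}\,pK^2,
\qquad
\mathrm{HQC}(p) = \ln\bigl|\hat{\Sigma}_u(p)\bigr| + \frac{2\ln\ln T}{T}\,pK^2,
\qquad
\mathrm{SIC}(p) = \ln\bigl|\hat{\Sigma}_u(p)\bigr| + \frac{\ln T}{T}\,pK^2 .
```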

Journal ArticleDOI
TL;DR: Variational approximations are used to perform the analogous model selection task in the Bayesian context; the resulting models place JunB and JunD at the centre of the mechanisms that control apoptosis and proliferation.
Abstract: Motivation: We have used state-space models (SSMs) to reverse engineer transcriptional networks from highly replicated gene expression profiling time series data obtained from a well-established model of T cell activation. SSMs are a class of dynamic Bayesian networks in which the observed measurements depend on some hidden state variables that evolve according to Markovian dynamics. These hidden variables can capture effects that cannot be directly measured in a gene expression profiling experiment, for example: genes that have not been included in the microarray, levels of regulatory proteins, the effects of mRNA and protein degradation, etc. Results: We have approached the problem of inferring the model structure of these state-space models using both classical and Bayesian methods. In our previous work, a bootstrap procedure was used to derive classical confidence intervals for parameters representing 'gene--gene' interactions over time. In this article, variational approximations are used to perform the analogous model selection task in the Bayesian context. Certain interactions are present in both the classical and the Bayesian analyses of these regulatory networks. The resulting models place JunB and JunD at the centre of the mechanisms that control apoptosis and proliferation. These mechanisms are key for clonal expansion and for controlling the long term behavior (e.g. programmed cell death) of these cells. Availability: Supplementary data is available at http://public.kgi.edu/wild/index.htm and Matlab source code for variational Bayesian learning of SSMs is available at http://www.cse.ebuffalo.edu/faculty/mbeal/software.html Contact: David_Wild@kgi.edu

Journal ArticleDOI
TL;DR: This paper proposes to estimate change-points in the mean of a signal corrupted by an additive Gaussian noise with a method based on a penalized least-squares criterion, and chooses the penalty function such that the resulting estimator minimizes the quadratic risk.
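
A compact sketch of the penalized least-squares idea, with invented data and an illustrative fixed penalty; the paper's contribution is an adaptive, risk-minimizing choice of the penalty, which is not reproduced here.

```python
import numpy as np

def segment_mean_changes(y, max_cp=10, beta=None):
    """Least-squares change-point detection in the mean of a noisy signal.
    Dynamic programming finds the best segmentation for each number of
    segments; a penalty beta * (number of change-points) picks the dimension."""
    n = len(y)
    beta = beta if beta is not None else 2.0 * np.var(y) * np.log(n)  # illustrative default
    csum, csum2 = np.cumsum(np.r_[0.0, y]), np.cumsum(np.r_[0.0, y ** 2])

    def sse(i, j):  # within-segment sum of squares for y[i:j]
        s, s2, m = csum[j] - csum[i], csum2[j] - csum2[i], j - i
        return s2 - s * s / m

    # cost[k, j] = best least-squares contrast for y[:j] split into k+1 segments
    cost = np.full((max_cp + 1, n + 1), np.inf)
    cost[0, 1:] = [sse(0, j) for j in range(1, n + 1)]
    for k in range(1, max_cp + 1):
        for j in range(k + 1, n + 1):
            cost[k, j] = min(cost[k - 1, i] + sse(i, j) for i in range(k, j))
    k_best = int(np.argmin(cost[:, n] + beta * np.arange(max_cp + 1)))
    return k_best, cost[k_best, n]

rng = np.random.default_rng(2)
y = np.r_[rng.normal(0, 1, 100), rng.normal(3, 1, 100), rng.normal(1, 1, 100)]
print(segment_mean_changes(y))   # expected: 2 change-points for this toy signal
```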

Journal ArticleDOI
TL;DR: Simulations and real examples show that the proposed method is very competitive in terms of variable selection, estimation accuracy, and computation speed compared with other variable selection and estimation methods.
Abstract: We propose an empirical Bayes method for variable selection and coefficient estimation in linear regression models. The method is based on a particular hierarchical Bayes formulation, and the empirical Bayes estimator is shown to be closely related to the LASSO estimator. Such a connection allows us to take advantage of the recently developed quick LASSO algorithm to compute the empirical Bayes estimate, and provides a new way to select the tuning parameter in the LASSO method. Unlike previous empirical Bayes variable selection methods, which in most practical situations can be implemented only through a greedy stepwise algorithm, our method gives a global solution efficiently. Simulations and real examples show that the proposed method is very competitive in terms of variable selection, estimation accuracy, and computation speed compared with other variable selection and estimation methods.
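
A hedged sketch of the selection-plus-estimation output such a procedure delivers; here the LASSO tuning parameter is chosen by cross-validation via scikit-learn's LassoCV, a stand-in for the paper's empirical Bayes choice, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# synthetic sparse regression problem (illustrative only)
rng = np.random.default_rng(3)
n, p = 100, 50
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = [3.0, -2.0, 1.5, 0.0, 2.0]
y = X @ beta + rng.normal(scale=1.0, size=n)

# The paper selects the tuning parameter through an empirical Bayes hierarchy;
# cross-validation is used here only as a readily available substitute.
fit = LassoCV(cv=5).fit(X, y)
print("chosen alpha:", round(fit.alpha_, 3))
print("selected variables:", np.flatnonzero(fit.coef_))   # ideally indices 0, 1, 2, 4
```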

Journal ArticleDOI
TL;DR: The theory of reduction is reviewed, the approach of general-to-specific modeling is summarized, and the econometrics of model selection are discussed, noting that general- to- specific modeling is the practical embodiment of reduction.
Abstract: This paper discusses the econometric methodology of general-to-specific modeling, in which the modeler simplifies an initially general model that adequately characterizes the empirical evidence within his or her theoretical framework. Central aspects of this approach include the theory of reduction, dynamic specification, model selection procedures, model selection criteria, model comparison, encompassing, computer automation, and empirical implementation. This paper thus reviews the theory of reduction, summarizes the approach of general-to-specific modeling, and discusses the econometrics of model selection, noting that general-to-specific modeling is the practical embodiment of reduction. This paper then summarizes fifty-seven articles key to the development of general-to-specific modeling.

Journal ArticleDOI
TL;DR: In this paper, the authors provide a simple, computer-generated example to illustrate the procedure for multimodel inference based on K-L information and present arguments, based on statistical underpinnings that have been overlooked with time, that its theoretical basis renders it preferable to other approaches.
Abstract: Uncertainty of hydrogeologic conditions makes it important to consider alternative plausible models in an effort to evaluate the character of a ground water system, maintain parsimony, and make predictions with reasonable definition of their uncertainty. When multiple models are considered, data collection and analysis focus on evaluation of which model(s) is(are) most supported by the data. Generally, more than one model provides a similar acceptable fit to the observations; thus, inference should be made from multiple models. Kullback-Leibler (K-L) information provides a rigorous foundation for model inference that is simple to compute, is easy to interpret, selects parsimonious models, and provides a more realistic measure of precision than evaluation of any one model or evaluation based on other commonly referenced model selection criteria. These alternative criteria strive to identify the true (or quasi-true) model, assume it is represented by one of the models in the set, and, given their preference for parsimony regardless of the available number of observations, may select an underfit model. This is in sharp contrast to the K-L information approach, where models are considered to be approximations to reality, and it is expected that more details of the system will be revealed when more data are available. We provide a simple, computer-generated example to illustrate the procedure for multimodel inference based on K-L information and present arguments, based on statistical underpinnings that have been overlooked with time, that its theoretical basis renders it preferable to other approaches.
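
The K-L based multimodel inference referred to above is commonly operationalized through AIC differences, Akaike weights, and model-averaged estimates; the standard expressions for R candidate models (not formulas taken from this paper) are:

```latex
% AIC differences, Akaike weights, and a model-averaged estimate
\Delta_i = \mathrm{AIC}_i - \mathrm{AIC}_{\min},
\qquad
w_i = \frac{\exp(-\Delta_i/2)}{\sum_{r=1}^{R}\exp(-\Delta_r/2)},
\qquad
\hat{\bar{\theta}} = \sum_{i=1}^{R} w_i\,\hat{\theta}_i .
```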

Journal ArticleDOI
TL;DR: It is demonstrated that a structured approach based on fractional polynomials can give a broadly satisfactory practical solution to the problem of simultaneously identifying a subset of 'important' predictors and determining the functional relationship for continuous predictors.
Abstract: Objectives: In fitting regression models, data analysts must often choose a model based on several candidate predictor variables which may influence the outcome. Most analysts either assume a linear relationship for continuous predictors, or categorize them and postulate step functions. By contrast, we propose to model possible non-linearity in the relationship between the outcome and several continuous predictors by estimating smooth functions of the predictors. We aim to demonstrate that a structured approach based on fractional polynomials can give a broadly satisfactory practical solution to the problem of simultaneously identifying a subset of 'important' predictors and determining the functional relationship for continuous predictors. Methods: We discuss the background, and motivate and describe the multivariable fractional polynomial (MFP) approach to model selection from data which include continuous and categorical predictors. We compare our results with those from other approaches in examples. We present a small simulation study to compare the functional form of the relationship obtained by fitting fractional polynomials and splines to a single predictor variable. Results: We illustrate the advantages of the MFP approach over standard techniques of model construction in two real example datasets analyzed with logistic and Cox regression models, respectively. In the simulation study, fractional polynomial models had lower mean square error and more realistic behaviour than comparable spline models. Conclusions: In many practical situations, the MFP approach can satisfy the aim of finding models that fit the data well and also are simple, interpretable and potentially transportable to other settings.
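
A minimal sketch of the fractional-polynomial building block for a single positive predictor, using the conventional power set and a Gaussian model fitted with statsmodels; the data are simulated, and the full MFP algorithm (backfitting over several predictors, structured selection, other model families) is not reproduced.

```python
import itertools
import numpy as np
import statsmodels.api as sm

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)   # conventional fractional-polynomial power set

def fp_term(x, p):
    """Single fractional-polynomial term; power 0 denotes log(x) by convention."""
    return np.log(x) if p == 0 else x ** p

def best_fp2(x, y):
    """Fit all degree-2 fractional polynomials of a positive predictor x and
    return the power pair with the smallest deviance (Gaussian model here)."""
    best = None
    for p1, p2 in itertools.combinations_with_replacement(POWERS, 2):
        if p1 == p2:   # repeated power uses x^p and x^p * log(x)
            Xd = np.column_stack([fp_term(x, p1), fp_term(x, p1) * np.log(x)])
        else:
            Xd = np.column_stack([fp_term(x, p1), fp_term(x, p2)])
        fit = sm.OLS(y, sm.add_constant(Xd)).fit()
        dev = -2 * fit.llf
        if best is None or dev < best[0]:
            best = (dev, (p1, p2))
    return best

rng = np.random.default_rng(4)
x = rng.uniform(0.1, 5.0, 200)
y = 1.0 + 2.0 * np.log(x) + rng.normal(scale=0.5, size=200)
print(best_fp2(x, y))   # should favour a power pair involving the log transform
```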

Journal ArticleDOI
TL;DR: In this article, the authors proposed an instability measure to capture the uncertainty of model selection in estimation, called perturbation instability in estimation (PIE), based on perturbations of the sample.
Abstract: Model-combining (i.e., mixing) methods have been proposed in recent years to deal with uncertainty in model selection. Even though advantages of model combining over model selection have been demonstrated in simulations and data examples, it is still unclear to a large extent when model combining should be preferred. In this work, first we propose an instability measure to capture the uncertainty of model selection in estimation, called perturbation instability in estimation (PIE), based on perturbation of the sample. We demonstrate that estimators from model selection can have large PIE values and that model combining substantially reduces the instability for such cases. Second, we propose a model combining method, adaptive regression by mixing with model screening (ARMS), and derive a theoretical property. In ARMS, a screening step is taken to narrow down the list of candidate models before combining, which not only saves computing time, but also can improve estimation accuracy. Third, we compare ARMS w...

Journal ArticleDOI
TL;DR: In this article, a Bayesian approach to evaluate analysis of variance or analysis of covariance models with inequality constraints on the (adjusted) means is presented and contains two issues: estimation of the parameters given the restrictions using the Gibbs sampler and model selection using Bayes factors in the case of competing theories.
Abstract: Researchers often have one or more theories or expectations with respect to the outcome of their empirical research. When researchers talk about the expected relations between variables if a certain theory is correct, their statements are often in terms of one or more parameters expected to be larger or smaller than one or more other parameters. Stated otherwise, their statements are often formulated using inequality constraints. In this article, a Bayesian approach to evaluate analysis of variance or analysis of covariance models with inequality constraints on the (adjusted) means is presented. This evaluation contains two issues: estimation of the parameters given the restrictions using the Gibbs sampler and model selection using Bayes factors in the case of competing theories. The article concludes with two illustrations: a one-way analysis of covariance and an analysis of a three-way table of ordered means.
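
A simplified Monte Carlo sketch of the encompassing-prior idea behind such Bayes factors, assuming three group means with known unit variance and an invented vague prior; the Gibbs sampler for ANOVA/ANCOVA with estimated variance described in the article is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(5)

# toy one-way ANOVA data: three groups, theory says mu1 < mu2 < mu3
groups = [rng.normal(m, 1.0, 20) for m in (0.0, 0.4, 0.9)]

# Posterior draws for the group means under a vague prior and known unit
# variance: approximately N(group mean, 1/n_j). Simplified stand-in only.
post = np.column_stack([rng.normal(np.mean(g), 1.0 / np.sqrt(len(g)), 100_000)
                        for g in groups])

# Encompassing (unconstrained) prior draws for the means: wide and identical
prior = rng.normal(0.0, 10.0, size=(100_000, 3))

def prop_ordered(m):
    """Proportion of draws satisfying the inequality constraint mu1 < mu2 < mu3."""
    return np.mean((m[:, 0] < m[:, 1]) & (m[:, 1] < m[:, 2]))

# Bayes factor of the constrained model against the unconstrained one:
# posterior proportion in agreement over prior proportion in agreement (about 1/6 here).
bf = prop_ordered(post) / prop_ordered(prior)
print(f"BF(constrained vs unconstrained) ~ {bf:.1f}")
```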

Journal ArticleDOI
TL;DR: The experiments conducted on a two-class problem show that the proposed methodology can adequately choose the SVM hyper-parameters using the empirical error criterion, and the criterion produces a less complex model with fewer support vectors.
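
As a stand-in for the empirical error criterion (the sketch below swaps in a plain cross-validated error over a small grid, which is not the paper's procedure), SVM hyper-parameter selection might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# synthetic two-class problem; grid values are illustrative
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
                    cv=5)
grid.fit(X, y)
print("selected hyper-parameters:", grid.best_params_)
print("cross-validated error:", round(1 - grid.best_score_, 3))
```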

Journal ArticleDOI
TL;DR: In this article, the impact of nonlinear distortions on linear system identification was studied and a theoretical framework was proposed that extends the linear system description to include nonlinear distortion: the nonlinear system is replaced by a linear model plus a nonlinear noise source.

Journal ArticleDOI
TL;DR: A general framework for cross-validation is introduced and distributional properties of cross-validated risk estimators in the context of estimator selection and performance assessment are derived.