
Showing papers on "Model selection published in 2001"


Journal ArticleDOI
Wei Pan1
TL;DR: This work proposes a modification to AIC, where the likelihood is replaced by the quasi-likelihood and a proper adjustment is made for the penalty term.
Abstract: Correlated response data are common in biomedical studies. Regression analysis based on the generalized estimating equations (GEE) is an increasingly important method for such data. However, there seem to be few model-selection criteria available in GEE. The well-known Akaike Information Criterion (AIC) cannot be directly applied since AIC is based on maximum likelihood estimation while GEE is nonlikelihood based. We propose a modification to AIC, where the likelihood is replaced by the quasi-likelihood and a proper adjustment is made for the penalty term. Its performance is investigated through simulation studies. For illustration, the method is applied to a real data set.

2,233 citations
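
For orientation, the criterion proposed here (often written QIC) replaces the log-likelihood in AIC with the quasi-likelihood evaluated under an independence working correlation, and replaces the penalty 2p with 2·trace(Ω̂_I V̂_r), where Ω̂_I comes from the independence working model and V̂_r is the robust sandwich covariance of the GEE estimates. The sketch below only illustrates that arithmetic on hypothetical fitted quantities; it is not the GEE fit itself, and all variable names are ours.

```python
import numpy as np

def qic(quasi_loglik_indep, naive_cov_indep, robust_cov):
    """QIC-style score: -2 * quasi-likelihood + trace penalty.

    quasi_loglik_indep : quasi-likelihood of the fitted mean model evaluated
                         under an independence working correlation
    naive_cov_indep    : model-based covariance of beta-hat under independence
    robust_cov         : robust (sandwich) covariance of beta-hat from the GEE fit
    """
    omega_inv = np.linalg.inv(naive_cov_indep)        # stands in for Omega_I
    penalty = 2.0 * np.trace(omega_inv @ robust_cov)  # replaces AIC's 2p
    return -2.0 * quasi_loglik_indep + penalty

# toy usage with made-up fitted quantities
beta_cov_naive = np.array([[0.04, 0.01], [0.01, 0.09]])
beta_cov_robust = np.array([[0.05, 0.00], [0.00, 0.12]])
print(qic(quasi_loglik_indep=-123.4,
          naive_cov_indep=beta_cov_naive,
          robust_cov=beta_cov_robust))
```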


Journal ArticleDOI
TL;DR: In this paper, a new SAS procedure, TRAJ, is proposed to fit semiparametric mixtures of censored normal, Poisson, zero-inflated Poisson and Bernoulli distributions to longitudinal data.
Abstract: This article introduces a new SAS procedure written by the authors that analyzes longitudinal data (developmental trajectories) by fitting a mixture model. The TRAJ procedure fits semiparametric (discrete) mixtures of censored normal, Poisson, zero-inflated Poisson, and Bernoulli distributions to longitudinal data. Applications to psychometric scale data, offense counts, and a dichotomous prevalence measure in violence research are illustrated. In addition, the use of the Bayesian information criterion to address the problem of model selection, including the estimation of the number of components in the mixture, is demonstrated.

2,085 citations
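
The entry above uses the BIC to choose the number of trajectory groups. Here is a minimal sketch of the same idea, using scikit-learn's GaussianMixture as a stand-in for the censored-normal, Poisson, and Bernoulli mixtures that the TRAJ procedure actually fits; the simulated "trajectories" are ours.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# toy "trajectories": two latent groups with different intercepts and slopes
t = np.arange(5)
g1 = 1.0 + 0.5 * t + rng.normal(0, 0.3, size=(100, 5))
g2 = 4.0 - 0.2 * t + rng.normal(0, 0.3, size=(100, 5))
X = np.vstack([g1, g2])

# fit mixtures with 1..5 components and keep the BIC-minimizing one
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 6)}
best_k = min(bics, key=bics.get)
print(bics, "-> chosen number of groups:", best_k)
```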



Book ChapterDOI
03 Jan 2001
TL;DR: The notion of kernel alignment, a measure of similarity between two kernel functions or between a kernel and a target function, is introduced; experimental results show that adapting the kernel to improve alignment on the labelled data significantly increases the alignment on a test set, giving improved classification accuracy.
Abstract: We introduce the notion of kernel-alignment, a measure of similarity between two kernel functions or between a kernel and a target function. This quantity captures the degree of agreement between a kernel and a given learning task, and has very natural interpretations in machine learning, leading also to simple algorithms for model selection and learning. We analyse its theoretical properties, proving that it is sharply concentrated around its expected value, and we discuss its relation with other standard measures of performance. Finally we describe some of the algorithms that can be obtained within this framework, giving experimental results showing that adapting the kernel to improve alignment on the labelled data significantly increases the alignment on the test set, giving improved classification accuracy. Hence, the approach provides a principled method of performing transduction.

1,083 citations
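
The empirical alignment between two kernel matrices is their normalized Frobenius inner product, and the natural target kernel for labels y in {-1, +1} is yyᵀ. A small numpy sketch on toy data of our own choosing (the RBF kernel and bandwidth are arbitrary):

```python
import numpy as np

def alignment(K1, K2):
    """Empirical alignment <K1, K2>_F / sqrt(<K1, K1>_F * <K2, K2>_F)."""
    num = np.sum(K1 * K2)
    return num / np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 3))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=60))   # +/-1 labels

# Gaussian (RBF) kernel as an example candidate kernel
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)
target = np.outer(y, y)                            # ideal kernel y y^T

print("alignment with target:", alignment(K, target))
```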


Journal ArticleDOI
TL;DR: It is shown here that a best-fit model can be readily identified and that model selection should be routine in any phylogenetic analysis that uses models of evolution.
Abstract: Despite the relevant role of models of nucleotide substitution in phylogenetics, choosing among different models remains a problem. Several statistical methods for selecting the model that best fits the data at hand have been proposed, but their absolute and relative performance has not yet been characterized. In this study, we compare under various conditions the performance of different hierarchical and dynamic likelihood ratio tests, and of Akaike and Bayesian information methods, for selecting best-fit models of nucleotide substitution. We specifically examine the role of the topology used to estimate the likelihood of the different models and the importance of the order in which hypotheses are tested. We do this by simulating DNA sequences under a known model of nucleotide substitution and recording how often this true model is recovered by the different methods. Our results suggest that model selection is reasonably accurate and indicate that some likelihood ratio test methods perform overall better than the Akaike or Bayesian information criteria. The tree used to estimate the likelihood scores does not influence model selection unless it is a randomly chosen tree. The order in which hypotheses are tested, and the complexity of the initial model in the sequence of tests, influence model selection in some cases. Model fitting in phylogenetics has been suggested for many years, yet many authors still arbitrarily choose their models, often using the default models implemented in standard computer programs for phylogenetic estimation. We show here that a best-fit model can be readily identified. Consequently, given the relevance of models, model fitting should be routine in any phylogenetic analysis that uses models of evolution.

924 citations
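
As a concrete illustration of the likelihood ratio tests compared in this study: for nested substitution models, the statistic 2(lnL1 - lnL0) is referred to a chi-square distribution with degrees of freedom equal to the difference in free parameters. The log-likelihoods below are hypothetical; in practice they would come from a phylogenetics package.

```python
from scipy.stats import chi2

def lrt(lnL_simple, lnL_complex, df):
    """Likelihood ratio test between nested substitution models."""
    delta = 2.0 * (lnL_complex - lnL_simple)
    return delta, chi2.sf(delta, df)

# hypothetical log-likelihoods on the same tree (e.g. JC69 vs HKY85, df = 4)
stat, p = lrt(lnL_simple=-2456.3, lnL_complex=-2441.8, df=4)
print(f"LRT = {stat:.2f}, p = {p:.4f}")  # small p -> prefer the richer model
```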


Journal ArticleDOI
TL;DR: This article reviews the principle of minimum description length (MDL) for problems of model selection, and illustrates the MDL principle by considering problems in regression, nonparametric curve estimation, cluster analysis, and time series analysis.
Abstract: This article reviews the principle of minimum description length (MDL) for problems of model selection. By viewing statistical modeling as a means of generating descriptions of observed data, the MDL framework discriminates between competing models based on the complexity of each description. This approach began with Kolmogorov's theory of algorithmic complexity, matured in the literature on information theory, and has recently received renewed attention within the statistics community. Here we review both the practical and the theoretical aspects of MDL as a tool for model selection, emphasizing the rich connections between information theory and statistics. At the boundary between these two disciplines we find many interesting interpretations of popular frequentist and Bayesian procedures. As we show, MDL provides an objective umbrella under which rather disparate approaches to statistical modeling can coexist and be compared. We illustrate the MDL principle by considering problems in regression, nonparametric curve estimation, cluster analysis, and time series analysis.

788 citations
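
One crude two-part form of the MDL idea for Gaussian regression codes the parameters in roughly (k/2) log n nats and the data given the model in (n/2) log(RSS/n) nats, and selects the model minimizing the total description length. The sketch below applies that approximation to polynomial regression on simulated data; it is only a caricature of the refined criteria the article reviews.

```python
import numpy as np

def two_part_mdl(y, X):
    """Crude two-part description length for a Gaussian linear model:
    (k/2) log n for the parameters plus (n/2) log(RSS/n) for the data
    given the model (additive constants dropped)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return 0.5 * k * np.log(n) + 0.5 * n * np.log(rss / n)

rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 200)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 200)   # true model is degree 1

# compare polynomial orders 0..5 by description length
scores = {d: two_part_mdl(y, np.vander(x, d + 1)) for d in range(6)}
print("selected degree:", min(scores, key=scores.get), scores)
```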


Journal ArticleDOI
TL;DR: In this paper, consistent model and moment selection criteria for GMM estimation are proposed, based on the J statistic for testing over-identifying restrictions; they are analogous to the likelihood-based selection criteria BIC, HQIC, and AIC.

736 citations
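
A BIC-like member of this family of criteria takes the over-identification J statistic of each candidate specification and subtracts (|c| - |b|) ln n, rewarding specifications that retain more valid over-identifying restrictions; smaller scores are preferred. The sketch below merely evaluates that score on made-up candidates and should be read as an illustration of the general form, not as the paper's exact criterion.

```python
import numpy as np

def mmsc_bic(j_stat, n_moments, n_params, n_obs):
    """BIC-like model-and-moment selection score: the J statistic minus a
    bonus that grows with the number of over-identifying restrictions
    (|c| - |b|). Smaller is better."""
    return j_stat - (n_moments - n_params) * np.log(n_obs)

# hypothetical candidate specifications: (J statistic, #moments, #parameters)
candidates = {
    "small instrument set": (3.1, 4, 2),
    "large instrument set": (9.8, 8, 2),
}
n_obs = 500
scores = {name: mmsc_bic(j, c, b, n_obs) for name, (j, c, b) in candidates.items()}
print(min(scores, key=scores.get), scores)
```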


Journal ArticleDOI
TL;DR: The purpose of this paper is to provide a general approach to model selection via penalization for Gaussian regression and to develop the authors' point of view on the subject.
Abstract: Our purpose in this paper is to provide a general approach to model selection via penalization for Gaussian regression and to develop our point of view about this subject. The advantage and importance of model selection come from the fact that it provides a suitable approach to many different types of problems, starting from model selection per se (among a family of parametric models, which one is more suitable for the data at hand), which includes for instance variable selection in regression models, to nonparametric estimation, for which it provides a very powerful tool that allows adaptation under quite general circumstances. Our approach to model selection also provides a natural connection between the parametric and nonparametric points of view and copes naturally with the fact that a model is not necessarily true. The method is based on the penalization of a least squares criterion which can be viewed as a generalization of Mallows’ Cp. A large part of our efforts will be put on choosing properly the list of models and the penalty function for various estimation problems like classical variable selection or adaptive estimation for various types of ℓp-bodies.

560 citations
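
The penalized least-squares criterion described here selects the model m minimizing ||y - y_hat_m||^2 + pen(m); with pen(m) = 2 * sigma^2 * D_m one recovers Mallows' Cp. A toy subset-selection sketch, with sigma^2 treated as known only to keep the example short:

```python
import numpy as np
from itertools import combinations

def cp_like_score(y, X_m, sigma2):
    """Penalized least squares with pen(m) = 2 * sigma^2 * D_m (Mallows' Cp form)."""
    beta, *_ = np.linalg.lstsq(X_m, y, rcond=None)
    rss = np.sum((y - X_m @ beta) ** 2)
    return rss + 2.0 * sigma2 * X_m.shape[1]

rng = np.random.default_rng(3)
n, p = 150, 6
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(0, 1.0, n)   # only x0 and x2 matter
sigma2 = 1.0                                                # assumed known here

best = min(((s, cp_like_score(y, X[:, list(s)], sigma2))
            for r in range(1, p + 1) for s in combinations(range(p), r)),
           key=lambda t: t[1])
print("selected variables:", best[0])
```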


Book ChapterDOI
01 Jan 2001
TL;DR: This article illustrates some of the fundamental practical issues that arise for two different model selection problems: the variable selection problem for the linear model and the CART model selection problem.
Abstract: In principle, the Bayesian approach to model selection is straightforward. Prior probability distributions are used to describe the uncertainty surrounding all unknowns. After observing the data, the posterior distribution provides a coherent post data summary of the remaining uncertainty which is relevant for model selection. However, the practical implementation of this approach often requires carefully tailored priors and novel posterior calculation methods. In this article, we illustrate some of the fundamental practical issues that arise for two different model selection problems: the variable selection problem for the linear model and the CART model selection problem.

471 citations
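
For a flavor of Bayesian variable selection, the sketch below enumerates subsets of a small linear model and converts BIC values into approximate posterior model probabilities via weights proportional to exp(-BIC/2). This is a crude stand-in for the carefully tailored priors and posterior simulation methods the chapter actually discusses.

```python
import numpy as np
from itertools import combinations

def bic(y, X_m):
    n, k = X_m.shape
    beta, *_ = np.linalg.lstsq(X_m, y, rcond=None)
    rss = np.sum((y - X_m @ beta) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

rng = np.random.default_rng(4)
n, p = 200, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = 1.0 + 2.0 * X[:, 1] + rng.normal(0, 1.0, n)   # only predictor 1 matters

# enumerate subsets (intercept always in), score each, convert to weights
models = [(0,) + s for r in range(p + 1) for s in combinations(range(1, p + 1), r)]
scores = np.array([bic(y, X[:, list(m)]) for m in models])
weights = np.exp(-0.5 * (scores - scores.min()))
weights /= weights.sum()
for m, w in sorted(zip(models, weights), key=lambda t: -t[1])[:3]:
    print(m, round(float(w), 3))   # top 3 models by approximate posterior probability
```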


Journal ArticleDOI
TL;DR: By combining sequential model selection procedures, the online VB method provides a fully online learning method with a model selection mechanism and was able to adapt the model structure to dynamic environments.
Abstract: The Bayesian framework provides a principled way of model selection. This framework estimates a probability distribution over an ensemble of models, and the prediction is done by averaging over the ensemble of models. Accordingly, the uncertainty of the models is taken into account, and complex models with more degrees of freedom are penalized. However, integration over model parameters is often intractable, and some approximation scheme is needed. Recently, a powerful approximation scheme, called the variational Bayes (VB) method, has been proposed. This approach defines the free energy for a trial probability distribution, which approximates a joint posterior probability distribution over model parameters and hidden variables. The exact maximization of the free energy gives the true posterior distribution. The VB method uses factorized trial distributions. The integration over model parameters can be done analytically, and an iterative expectation-maximization-like algorithm, whose convergence is guaranteed, is derived. In this article, we derive an online version of the VB algorithm and prove its convergence by showing that it is a stochastic approximation for finding the maximum of the free energy. By combining sequential model selection procedures, the online VB method provides a fully online learning method with a model selection mechanism. In preliminary experiments using synthetic data, the online VB method was able to adapt the model structure to dynamic environments.

415 citations
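
The online VB idea can be caricatured as a stochastic approximation: expected sufficient statistics are updated with a decaying learning rate as each observation arrives, and the variational posterior is recomputed from them. The toy below does this for a conjugate normal-mean model, where the "variational" posterior is exact, so it illustrates only the update pattern and not the authors' algorithm for models with hidden variables; all parameter values are ours.

```python
import numpy as np

rng = np.random.default_rng(5)
data_stream = rng.normal(loc=3.0, scale=1.0, size=2000)   # noise variance known = 1

m0, s0_sq, sigma_sq = 0.0, 10.0, 1.0    # prior N(m0, s0^2), known noise variance
xbar_hat = 0.0                          # running estimate of E[x] (sufficient statistic)

for t, x in enumerate(data_stream, start=1):
    eta = 1.0 / t                                   # decaying learning rate
    xbar_hat = (1.0 - eta) * xbar_hat + eta * x     # stochastic-approximation update
    n_eff = t                                       # effective sample size
    post_prec = 1.0 / s0_sq + n_eff / sigma_sq      # recompute the (here exact) posterior
    post_mean = (m0 / s0_sq + n_eff * xbar_hat / sigma_sq) / post_prec

print("posterior mean ~", round(post_mean, 3), "(true mean 3.0)")
```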


Proceedings ArticleDOI
07 Jul 2001
TL;DR: A robust segmentation algorithm is presented that incorporates such techniques as dimension correction, model selection using the geometric AIC, and least-median fitting; numerical simulations demonstrate that it dramatically outperforms existing methods.
Abstract: Reformulating the Costeira-Kanade algorithm as a pure mathematical theorem independent of the Tomasi-Kanade factorization, we present a robust segmentation algorithm by incorporating such techniques as dimension correction, model selection using the geometric AIC, and least-median fitting. Doing numerical simulations, we demonstrate that our algorithm dramatically outperforms existing methods. It does not involve any parameters which need to be adjusted empirically.

Book
01 Jan 2001
TL;DR: Carol Alexander as mentioned in this paper provides an authoritative and up-to-date treatment of the use of market data to develop models for financial analysis and provides real world illustrations to motivate theoretical developments.
Abstract: Market Models provides an authoritative and up-to-date treatment of the use of market data to develop models for financial analysis. Written by a leading figure in the field of financial data analysis, this book is the first of its kind to address the vital techniques required for model selection and development. Model developers are faced with many decisions about the pricing, the data, the statistical methodology and the calibration and testing of the model prior to implementation. It is important to make the right choices and Carol Alexander's clear exposition provides valuable insights at every stage. In each of the 13 chapters, Market Models presents real world illustrations to motivate theoretical developments. The accompanying CD contains spreadsheets with data and programs; this enables you to implement and adapt many of the examples. The pricing of options using normal mixture density functions to model returns; the use of Monte Carlo simulation to calculate the VaR of an options portfolio; modifying the covariance VaR to allow for fat-tailed P&L; the calculation of implied, EWMA and 'historic' volatilities; GARCH volatility term structure forecasting; principal components analysis; and many more are all included. Carol Alexander brings many new insights to the pricing and hedging of options with her understanding of volatility and correlation, and the uncertainty which surrounds these key determinants of option portfolio risk. Modelling the market risk of portfolios is covered where the main focus is on a linear algebraic approach; the covariance matrix and principal component analysis are developed as key tools for the analysis of financial systems. The traditional time series econometric approach is also explained with coverage ranging from the application of cointegration to long-short equity hedge funds, to high-frequency data prediction using neural networks and nearest neighbour algorithms. Throughout this text the emphasis is on understanding concepts and implementing solutions. It has been designed to be accessible to a very wide audience: the coverage is comprehensive and complete and the technical appendix makes the book largely self-contained. Market Models: A Guide to Financial Data Analysis is the ideal reference for all those involved in market risk measurement, quantitative trading and investment analysis.
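
Among the volatility tools the book covers is the EWMA estimator; the standard RiskMetrics-style recursion is sigma^2_t = lambda * sigma^2_{t-1} + (1 - lambda) * r^2_{t-1}, commonly with lambda around 0.94 for daily data. A short sketch using that common convention, which is not necessarily the book's exact parameterization:

```python
import numpy as np

def ewma_volatility(returns, lam=0.94):
    """RiskMetrics-style EWMA variance recursion
    sigma2_t = lam * sigma2_{t-1} + (1 - lam) * r_{t-1}^2 (annualization omitted)."""
    sigma2 = np.empty_like(returns)
    sigma2[0] = np.var(returns)          # simple initialization
    for t in range(1, len(returns)):
        sigma2[t] = lam * sigma2[t - 1] + (1.0 - lam) * returns[t - 1] ** 2
    return np.sqrt(sigma2)

rng = np.random.default_rng(6)
r = rng.normal(0, 0.01, 500)             # toy daily returns
print(ewma_volatility(r)[-5:])           # most recent volatility estimates
```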

Journal ArticleDOI
TL;DR: A method of analyzing collections of related curves is proposed in which the individual curves are modeled as spline functions with random coefficients; this produces a low-rank, low-frequency approximation to the covariance structure that can be estimated naturally by the EM algorithm.
Abstract: We propose a method of analyzing collections of related curves in which the individual curves are modeled as spline functions with random coefficients. The method is applicable when the individual curves are sampled at variable and irregularly spaced points. This produces a low-rank, low-frequency approximation to the covariance structure, which can be estimated naturally by the EM algorithm. Smooth curves for individual trajectories are constructed as best linear unbiased predictor (BLUP) estimates, combining data from that individual and the entire collection. This framework leads naturally to methods for examining the effects of covariates on the shapes of the curves. We use model selection techniques--Akaike information criterion (AIC), Bayesian information criterion (BIC), and cross-validation--to select the number of breakpoints for the spline approximation. We believe that the methodology we propose provides a simple, flexible, and computationally efficient means of functional data analysis.


Journal ArticleDOI
TL;DR: In this paper, a natural class of robust estimators for generalized linear models based on the notion of quasi-likelihood is defined, which can be used for stepwise model selection as in the classical framework.
Abstract: By starting from a natural class of robust estimators for generalized linear models based on the notion of quasi-likelihood, we define robust deviances that can be used for stepwise model selection as in the classical framework. We derive the asymptotic distribution of tests based on robust deviances, and we investigate the stability of their asymptotic level under contamination. The binomial and Poisson models are treated in detail. Two applications to real data and a sensitivity analysis show that the inference obtained by means of the new techniques is more reliable than that obtained by classical estimation and testing procedures.

Journal ArticleDOI
TL;DR: In this article, the authors consider econometric model selection from a computer-automation perspective, focusing on general-to-specific reductions, embodied in PcGets.

Journal ArticleDOI
TL;DR: The Stata implementation of a class of flexible parametric survival models recently proposed by Royston and Parmar (2001) will be described and examples based on a study of prognostic factors in breast cancer are given.
Abstract: Since its introduction to a wondering public in 1972, the Cox proportional hazards regression model has become an overwhelmingly popular tool in the analysis of censored survival data. However, some features of the Cox model may cause problems for the analyst or an interpreter of the data. They include the restrictive assumption of proportional hazards for covariate effects, and "loss" (non-estimation) of the baseline hazard function induced by conditioning on event times. In medicine, the hazard function is often of fundamental interest since it represents an important aspect of the time course of the disease in question. In the present article, the Stata implementation of a class of flexible parametric survival models recently proposed by Royston and Parmar (2001) will be described. The models start by assuming either proportional hazards or proportional odds (user-selected option). The baseline distribution function is modeled by a restricted cubic regression spline in log time, and parameter estimation is by maximum likelihood. Model selection and choice of knots for the spline function are discussed. Interval-censored data and models in which one or more covariates have non-proportional effects are also supported by the software. Examples based on a study of prognostic factors in breast cancer are given.

Journal ArticleDOI
TL;DR: Results indicate that the in-sample model selection criteria investigated are not able to provide a reliable guide to out-of-sample performance and there is no apparent connection between in-sample model fit and out-of-sample forecasting performance.

Proceedings ArticleDOI
07 Jul 2001
TL;DR: This paper presents a new framework for HMM topology and parameter estimation in an online, dynamic fashion; the estimation is posed as a model selection problem with an MDL prior.
Abstract: Hidden Markov models (HMMs) are increasingly being used in computer vision for applications such as: gesture analysis, action recognition from video, and illumination modeling. Their use involves an off-line learning step that is used as a basis for on-line decision making (i.e. a stationarity assumption on the model parameters). But, real-world applications are often non-stationary in nature. This leads to the need for a dynamic mechanism to learn and update the model topology as well as its parameters. This paper presents a new framework for HMM topology and parameter estimation in an online, dynamic fashion. The topology and parameter estimation is posed as a model selection problem with an MDL prior. Online modifications to the topology are made possible by incorporating a state splitting criterion. To demonstrate the potential of the algorithm, the background modeling problem is considered. Theoretical validation and real experiments are presented.

Journal ArticleDOI
TL;DR: The proposed filtering method is translation invariant, has the ability to decompose an arbitrary length series without boundary adjustments, is associated with a zero-phase filter and is circular, which helps to preserve the entire sample unlike other two-sided filters.
Abstract: It is well documented that strong intraday seasonalities may induce distortions in the estimation of volatility models. These seasonalities are also the dominant source for the underlying misspecifications of the various volatility models. Therefore, an obvious route is to filter out the underlying intraday seasonalities from the data. In this paper, we propose a simple method for intraday seasonality extraction that is free of model selection parameters which may affect other intraday seasonality filtering methods. Our methodology is based on a wavelet multi-scaling approach which decomposes the data into its low- and high-frequency components through the application of a non-decimated discrete wavelet transform. It is simple to calculate, does not depend on a particular model selection criterion or model-specific parameter choices. The proposed filtering method is translation invariant, has the ability to decompose an arbitrary length series without boundary adjustments, is associated with a zero-phase filter and is circular. Being circular helps to preserve the entire sample unlike other two-sided filters where data loss occurs from the beginning and the end of the studied sample.
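
To illustrate the "non-decimated, circular, translation-invariant" properties the authors emphasize, here is a hand-rolled single level of a Haar MODWT-style filter. The paper's actual method cascades such filters over several scales with a chosen wavelet, which this sketch omits; the toy series and its construction are ours.

```python
import numpy as np

def haar_modwt_level1(x):
    """One level of a non-decimated (stationary/MODWT-style) Haar transform
    with circular boundary handling: every output has the same length as x,
    so no observations are lost at either end of the sample."""
    x_next = np.roll(x, -1)                 # circular shift
    smooth = 0.5 * (x + x_next)             # low-frequency (scaling) component
    detail = 0.5 * (x - x_next)             # high-frequency (wavelet) component
    return smooth, detail

# toy intraday series: slow level + fast "seasonal" wiggle + noise
t = np.arange(288)                          # e.g. 5-minute bins over one day
x = (0.01 * t + 0.5 * np.sin(2 * np.pi * t / 12)
     + np.random.default_rng(7).normal(0, 0.1, 288))

smooth, detail = haar_modwt_level1(x)
print(len(x), len(smooth), len(detail))     # all equal: the full sample is preserved
```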

Journal ArticleDOI
TL;DR: In this article, the authors use Bayesian model averaging to analyze the sample evidence on return predictability in the presence of uncertainty about the return forecasting model, and show that the out-of-sample performance of the Bayesian approach is superior to that of model selection criteria.
Abstract: We use Bayesian model averaging to analyze the sample evidence on return predictability in the presence of uncertainty about the return forecasting model. The analysis reveals in-sample and out-of-sample predictability, and shows that the out-of-sample performance of the Bayesian approach is superior to that of model selection criteria. Our exercises find that term premium and market risk premium are relatively robust predictors. Moreover, small-cap value stocks appear more predictable than large-cap growth stocks. We also investigate the implications of model uncertainty from investment management perspectives. The analysis shows that model uncertainty is more important than estimation risk. Finally, asset allocations in the presence of estimation risk exhibit sensitivity to whether model uncertainty is incorporated or ignored.
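
A rough sketch of the model-averaging idea applied to forecasts: fit each candidate predictor subset, weight its forecast by exp(-BIC/2) normalized across models, and average. This BIC-weight shortcut stands in for the paper's full Bayesian treatment of model and parameter uncertainty; the predictors and data here are simulated.

```python
import numpy as np
from itertools import combinations

def fit_and_forecast(y, X, x_new):
    """OLS fit of one candidate model, its point forecast at x_new, and its BIC."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    n, k = X.shape
    rss = np.sum((y - X @ beta) ** 2)
    bic = n * np.log(rss / n) + k * np.log(n)
    return float(x_new @ beta), bic

rng = np.random.default_rng(8)
n = 240                                         # e.g. 20 years of monthly returns
Z = rng.normal(size=(n, 3))                     # candidate (lagged) predictors
y = 0.4 * Z[:, 0] + rng.normal(0, 1.0, n)       # returns driven by predictor 0
X_full = np.column_stack([np.ones(n), Z])
x_new = np.array([1.0, 0.5, -0.2, 0.1])         # current predictor values

models = [(0,) + s for r in range(4) for s in combinations(range(1, 4), r)]
preds, bics = zip(*(fit_and_forecast(y, X_full[:, list(m)], x_new[list(m)])
                    for m in models))
w = np.exp(-0.5 * (np.array(bics) - min(bics)))
w /= w.sum()
print("BMA forecast:", float(np.dot(w, preds)))
```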

Journal ArticleDOI
TL;DR: A genetic algorithm is constructed which can search for the global optimum of an arbitrary function as the output of a feedforward network model and is allowed to evolve the type of inputs, the number of hidden units and the connection structure between the inputs and the output layers.
Abstract: This paper proposes a model selection methodology for feedforward network models based on genetic algorithms and makes a number of distinct but inter-related contributions to the model selection literature for feedforward networks. First, we construct a genetic algorithm which can search for the global optimum of an arbitrary function as the output of a feedforward network model. Second, we allow the genetic algorithm to evolve the type of inputs, the number of hidden units and the connection structure between the inputs and the output layers. Third, we study how the introduction of a local elitist procedure which we call the election operator affects the algorithm's performance. We conduct a Monte Carlo simulation to study the sensitivity of the global approximation properties of the studied genetic algorithm. Finally, we apply the proposed methodology to daily foreign exchange returns.

Journal ArticleDOI
TL;DR: This work describes a hierarchy of exponential families which is useful for distinguishing types of graphical models and shows how to compute the dimension of a stratified exponential family.
Abstract: We describe a hierarchy of exponential families which is useful for distinguishing types of graphical models. Undirected graphical models with no hidden variables are linear exponential families (LEFs). Directed acyclic graphical (DAG) models and chain graphs with no hidden variables, including DAG models with several families of local distributions, are curved exponential families (CEFs). Graphical models with hidden variables are what we term stratified exponential families (SEFs). A SEF is a finite union of CEFs of various dimensions satisfying some regularity conditions. We also show that this hierarchy of exponential families is noncollapsing with respect to graphical models by providing a graphical model which is a CEF but not a LEF and a graphical model that is a SEF but not a CEF. Finally, we show how to compute the dimension of a stratified exponential family. These results are discussed in the context of model selection of graphical models.

Journal ArticleDOI
TL;DR: In this paper, Logic Regression is used to deal with single‐nucleotide polymorphism (SNP) sequence data and a number of mutations are identified that are associated with the affected status, without selecting any false positives.
Abstract: Logic Regression is a new adaptive regression methodology that attempts to construct predictors as Boolean combinations of (binary) covariates. In this paper we modify this algorithm to deal with single-nucleotide polymorphism (SNP) data. The predictors that are found are interpretable as risk factors of the disease. Significance of these risk factors is assessed using techniques like cross-validation, permutation tests, and independent test sets. These model selection techniques remain valid when data is dependent, as is the case for the family data used here. In our analysis of the Genetic Analysis Workshop 12 data we identify the exact locations of mutations on gene 1 and gene 6 and a number of mutations on gene 2 that are associated with the affected status, without selecting any false positives.

Journal ArticleDOI
TL;DR: This work treats fMRI data analysis as a spatiotemporal system identification problem and addresses issues of model formulation, estimation, and model comparison, presenting a new model that includes a physiologically based hemodynamic response and an empirically derived low-frequency noise model.

Journal ArticleDOI
TL;DR: A novel knot selection algorithm for regression spline estimation in nonparametric regression that achieves very competitive performance relative to alternative methods and has a substantial advantage for nonsmooth functions.
Abstract: Spline procedures have proven effective in estimating smooth functions. However, spline procedures based on stepwise addition and/or deletion have some drawbacks. They suffer from the knot compounding problem, making their performance suboptimal. Furthermore, due to computational complexity, spline procedures may not achieve their full potential. In this article, we propose a novel knot selection algorithm for regression spline estimation in nonparametric regression. The algorithm includes three new components: knot relocation, guided search, and local fitting. The local properties of the spline functions are used to efficiently implement the algorithm. Extensive simulation studies are performed to demonstrate the improvement of the new knot selection algorithm over the stepwise addition and deletion scheme, and the advantages of the spline procedure with the new knot selection scheme over alternative adaptive methods.


Journal ArticleDOI
TL;DR: In this paper, a simple index of utility is proposed for model selection, which evaluates model sensitivity (response to changes in input) and model error (closeness of simulation to measurement).

Journal ArticleDOI
TL;DR: In this paper, a new class of models for data showing trend and multiplicative seasonality is presented; these models allow the forecast error variance to depend on the trend and/or the seasonality.

Book ChapterDOI
01 Jan 2001
TL;DR: This paper reviews the study designs, statistical issues, and analytical techniques used to study resource selection and provides practical guidance for biologists, resource managers, and others conducting studies of resource selection via radiotelemetry.
Abstract: Publisher Summary Radiotelemetry studies of animals are designed to provide insights into resource selection so that managers can obtain, protect, and restore resources used by animals. A common approach to study resource selection using radiotelemetry data involves a comparison of resource use to resource availability. Resource selection occurs when resources are used disproportionately to availability. This chapter provides a review of the study designs, statistical issues, and analytical techniques used to study resource selection and provide practical guidance for biologists, resource managers, and others conducting studies of resource selection via radiotelemetry. It also focuses on statistical issues of scale, techniques for defining resource use and availability, pooling observations, independence of relocations, and variable and model selection and how these factors affect inference in resource selection studies. In most cases, the goal of a resource selection study is to make statistical inferences to a population of animals the radio-marked sample is assumed to represent. This is achieved by considering the radio-marked animal as the experimental unit to avoid pseudoreplication, thus reducing dependency problems when individual relocations are treated as experimental units.