
Showing papers on "Model selection published in 2004"


Journal ArticleDOI
TL;DR: Various facets of such multimodel inference are presented here, particularly methods of model averaging, which can be derived as a non-Bayesian result.
Abstract: The model selection literature has been generally poor at reflecting the deep foundations of the Akaike information criterion (AIC) and at making appropriate comparisons to the Bayesian information...
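To make the model-averaging idea concrete, here is a minimal sketch of Akaike weights and a model-averaged estimate. The AIC values and per-model estimates are invented purely for illustration; only the weight and averaging formulas follow the standard multimodel-inference framework.

```python
import numpy as np

# Hypothetical AIC values for four candidate models (illustrative numbers only)
aic = np.array([102.3, 100.1, 104.8, 101.0])
# Hypothetical estimates of the same parameter under each model
theta_hat = np.array([0.42, 0.51, 0.38, 0.47])

delta = aic - aic.min()                 # AIC differences relative to the best model
weights = np.exp(-0.5 * delta)
weights /= weights.sum()                # Akaike weights, summing to 1

theta_avg = np.sum(weights * theta_hat)  # model-averaged estimate

print("Akaike weights:", np.round(weights, 3))
print("Model-averaged estimate:", round(theta_avg, 3))
```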

8,933 citations


Journal ArticleDOI
TL;DR: A publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates is described.
Abstract: The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.
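A brief sketch of what the LARS/Lasso path computation looks like in practice, using scikit-learn's lars_path on synthetic data; the dataset and settings are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path

# Synthetic data: 100 observations, 10 candidate covariates, 4 truly informative
X, y = make_regression(n_samples=100, n_features=10, n_informative=4,
                       noise=5.0, random_state=0)

# method="lasso" gives the LARS modification that traces all Lasso solutions
alphas, active, coefs = lars_path(X, y, method="lasso")

print("Active covariates at the end of the path:", active)
print("Number of breakpoints along the path:", len(alphas))
print("Coefficient path shape (n_features, n_breakpoints):", coefs.shape)
```

The entire path is obtained from a single call at roughly the cost of one least-squares fit, which is the computational point emphasized in the abstract.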

7,828 citations


Journal ArticleDOI
TL;DR: It is argued that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages.
Abstract: Model selection is a topic of special relevance in molecular phylogenetics that affects many, if not all, stages of phylogenetic inference. Here we discuss some fundamental concepts and techniques of model selection in the context of phylogenetics. We start by reviewing different aspects of the selection of substitution models in phylogenetics from a theoretical, philosophical and practical point of view, and summarize this comparison in table format. We argue that the most commonly implemented model selection approach, the hierarchical likelihood ratio test, is not the optimal strategy for model selection in phylogenetics, and that approaches like the Akaike Information Criterion (AIC) and Bayesian methods offer important advantages. In particular, the latter two methods are able to simultaneously compare multiple nested or nonnested models, assess model selection uncertainty, and allow for the estimation of phylogenies and model parameters using all available models (model-averaged inference or multimodel inference). We also describe how the relative importance of the different parameters included in substitution models can be depicted. To illustrate some of these points, we have applied AIC-based model averaging to 37 mitochondrial DNA sequences from the subgenus Ohomopterus (genus Carabus) ground beetles described by Sota and Vogler (2001). (AIC; Bayes factors; BIC; likelihood ratio tests; model averaging; model uncertainty; model selection; multimodel inference.) It is clear that models of nucleotide substitution (henceforth models of evolution) play a significant role in molecular phylogenetics, particularly in the context of distance, maximum likelihood (ML), and Bayesian estimation. We know that the use of one or other model affects many, if not all, stages of phylogenetic inference. For example, estimates of phylogeny, substitution rates, bootstrap values, posterior probabilities, or tests of the molecular clock are clearly influenced by the model of evolution used in the analysis (Buckley, 2002; Buckley
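The "relative importance" of a substitution-model parameter mentioned above is commonly computed as the sum of Akaike weights of the candidate models that include it. A toy sketch with invented AIC values for four common substitution models:

```python
import numpy as np

# Hypothetical AIC values for four substitution models (illustrative only)
models = {"JC": 5210.0, "HKY": 5144.5, "HKY+G": 5120.2, "GTR+G": 5118.7}
contains_gamma = {"JC": False, "HKY": False, "HKY+G": True, "GTR+G": True}

aic = np.array(list(models.values()))
w = np.exp(-0.5 * (aic - aic.min()))
w /= w.sum()                                   # Akaike weights

# Relative importance of among-site rate variation (+G):
# sum of the weights of the models that include it
importance = sum(wi for name, wi in zip(models, w) if contains_gamma[name])

print("Akaike weights:", dict(zip(models, np.round(w, 3))))
print("Relative importance of +G:", round(importance, 3))
```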

3,712 citations


Journal ArticleDOI
TL;DR: The steps of model selection are outlined and several ways that it is now being implemented are highlighted, so that researchers in ecology and evolution will find a valuable alternative to traditional null hypothesis testing, especially when more than one hypothesis is plausible.
Abstract: Recently, researchers in several areas of ecology and evolution have begun to change the way in which they analyze data and make biological inferences. Rather than the traditional null hypothesis testing approach, they have adopted an approach called model selection, in which several competing hypotheses are simultaneously confronted with data. Model selection can be used to identify a single best model, thus lending support to one particular hypothesis, or it can be used to make inferences based on weighted support from a complete set of competing models. Model selection is widely accepted and well developed in certain fields, most notably in molecular systematics and mark-recapture analysis. However, it is now gaining support in several other areas, from molecular evolution to landscape ecology. Here, we outline the steps of model selection and highlight several ways that it is now being implemented. By adopting this approach, researchers in ecology and evolution will find a valuable alternative to traditional null hypothesis testing, especially when more than one hypothesis is plausible.

3,489 citations


Journal ArticleDOI
TL;DR: A range of Bayesian hierarchical models using the Markov chain Monte Carlo software WinBUGS are presented that allow for variation in true treatment effects across trials, and models where the between-trials variance is homogeneous across treatment comparisons are considered.
Abstract: Mixed treatment comparison (MTC) meta-analysis is a generalization of standard pairwise meta-analysis for A vs B trials, to data structures that include, for example, A vs B, B vs C, and A vs C trials. There are two roles for MTC: one is to strengthen inference concerning the relative efficacy of two treatments, by including both 'direct' and 'indirect' comparisons. The other is to facilitate simultaneous inference regarding all treatments, in order for example to select the best treatment. In this paper, we present a range of Bayesian hierarchical models using the Markov chain Monte Carlo software WinBUGS. These are multivariate random effects models that allow for variation in true treatment effects across trials. We consider models where the between-trials variance is homogeneous across treatment comparisons as well as heterogeneous variance models. We also compare models with fixed (unconstrained) baseline study effects with models with random baselines drawn from a common distribution. These models are applied to an illustrative data set and posterior parameter distributions are compared. We discuss model critique and model selection, illustrating the role of Bayesian deviance analysis, and node-based model criticism. The assumptions underlying the MTC models and their parameterization are also discussed.
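As a toy illustration of how direct and indirect evidence about a treatment contrast are combined, here is a simple inverse-variance sketch; the paper itself uses Bayesian hierarchical models fitted in WinBUGS, and all effect sizes and variances below are invented.

```python
import numpy as np

# Hypothetical pooled log-odds-ratio estimates (and variances) from pairwise meta-analyses
d_AB, var_AB = -0.30, 0.02   # A vs B trials
d_BC, var_BC = -0.20, 0.03   # B vs C trials
d_AC, var_AC = -0.45, 0.04   # A vs C trials (direct evidence)

# Indirect estimate of A vs C via the common comparator B
d_AC_ind = d_AB + d_BC
var_AC_ind = var_AB + var_BC

# Combine direct and indirect evidence by inverse-variance weighting
w_dir, w_ind = 1 / var_AC, 1 / var_AC_ind
d_AC_mtc = (w_dir * d_AC + w_ind * d_AC_ind) / (w_dir + w_ind)
var_AC_mtc = 1 / (w_dir + w_ind)

print("Indirect A vs C estimate:", round(d_AC_ind, 3))
print("Combined A vs C estimate:", round(d_AC_mtc, 3), "variance:", round(var_AC_mtc, 4))
```

The hierarchical models in the paper generalize this by letting the trial-level effects vary around the treatment contrasts and by propagating all uncertainties jointly.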

1,861 citations


Journal ArticleDOI
TL;DR: A Bayesian MCMC approach to the analysis of combined data sets was developed and its utility in inferring relationships among gall wasps based on data from morphology and four genes was explored, supporting the utility of morphological data in multigene analyses.
Abstract: The recent development of Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) techniques has facilitated the exploration of parameter-rich evolutionary models. At the same time, stochastic models have become more realistic (and complex) and have been extended to new types of data, such as morphology. Based on this foundation, we developed a Bayesian MCMC approach to the analysis of combined data sets and explored its utility in inferring relationships among gall wasps based on data from morphology and four genes (nuclear and mitochondrial, ribosomal and protein coding). Examined models range in complexity from those recognizing only a morphological and a molecular partition to those having complex substitution models with independent parameters for each gene. Bayesian MCMC analysis deals efficiently with complex models: convergence occurs faster and more predictably for complex models, mixing is adequate for all parameters even under very complex models, and the parameter update cycle is virtually unaffected by model partitioning across sites. Morphology contributed only 5% of the characters in the data set but nevertheless influenced the combined-data tree, supporting the utility of morphological data in multigene analyses. We used Bayesian criteria (Bayes factors) to show that process heterogeneity across data partitions is a significant model component, although not as important as among-site rate variation. More complex evolutionary models are associated with more topological uncertainty and less conflict between morphology and molecules. Bayes factors sometimes favor simpler models over considerably more parameter-rich models, but the best model overall is also the most complex and Bayes factors do not support exclusion of apparently weak parameters from this model. Thus, Bayes factors appear to be useful for selecting among complex models, but it is still unclear whether their use strikes a reasonable balance between model complexity and error in parameter estimates.

1,758 citations



Journal ArticleDOI
TL;DR: It is proposed that GAM's with a ridge penalty provide a practical solution in such circumstances, and a multiple smoothing parameter selection method suitable for use in the presence of such a penalty is developed.
Abstract: Representation of generalized additive models (GAM's) using penalized regression splines allows GAM's to be employed in a straightforward manner using penalized regression methods. Not only is inference facilitated by this approach, but it is also possible to integrate model selection in the form of smoothing parameter selection into model fitting in a computationally efficient manner using well founded criteria such as generalized cross-validation. The current fitting and smoothing parameter selection methods for such models are usually effective, but do not provide the level of numerical stability to which users of linear regression packages, for example, are accustomed. In particular the existing methods cannot deal adequately with numerical rank deficiency of the GAM fitting problem, and it is not straightforward to produce methods that can do so, given that the degree of rank deficiency can be smoothing parameter dependent. In addition, models with the potential flexibility of GAM's can also present ...
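A minimal sketch of choosing a smoothing parameter by generalized cross-validation for a ridge-type penalized spline fit; it uses a simple truncated-power basis and a dense influence matrix for clarity, not the numerically stable formulation developed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)

# Truncated-power spline basis with a modest number of knots
knots = np.linspace(0.05, 0.95, 20)
X = np.column_stack([np.ones(n), x] + [np.clip(x - k, 0, None) for k in knots])

# Penalize only the truncated-power terms (a simple ridge-type penalty)
S = np.diag([0.0, 0.0] + [1.0] * len(knots))

def gcv(lam):
    A = X @ np.linalg.solve(X.T @ X + lam * S, X.T)   # influence (hat) matrix
    resid = y - A @ y
    edf = np.trace(A)                                  # effective degrees of freedom
    return n * np.sum(resid**2) / (n - edf) ** 2

lams = 10.0 ** np.linspace(-6, 2, 40)
scores = [gcv(l) for l in lams]
best = lams[int(np.argmin(scores))]
print("GCV-selected smoothing parameter:", best)
```

The GCV score n·RSS/(n − tr(A))² minimized here is the kind of well-founded criterion the abstract refers to; the paper's contribution is computing such fits stably even when the basis is rank deficient.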

1,657 citations


Journal ArticleDOI
TL;DR: This article explains and develops a method based on the Ornstein‐Uhlenbeck (OU) process, first proposed by Hansen, that allows one to translate hypotheses regarding adaptation in different selective regimes into explicit models, to test the models against data using maximum‐likelihood‐based model selection techniques, and to infer details of the evolutionary process.
Abstract: Biologists employ phylogenetic comparative methods to study adaptive evolution. However, none of the popular methods model selection directly. We explain and develop a method based on the Ornstein-Uhlenbeck (OU) process, first proposed by Hansen. Ornstein-Uhlenbeck models incorporate both selection and drift and are thus qualitatively different from, and more general than, pure drift models based on Brownian motion. Most importantly, OU models possess selective optima that formalize the notion of adaptive zone. In this article, we develop the method for one quantitative character, discuss interpretations of its parameters, and provide code implementing the method. Our approach allows us to translate hypotheses regarding adaptation in different selective regimes into explicit models, to test the models against data using maximum-likelihood-based model selection techniques, and to infer details of the evolutionary process. We illustrate the method using two worked examples. Relative to existing approaches, the direct modeling approach we demonstrate allows one to explore more detailed hypotheses and to utilize more of the information content of comparative data sets than existing methods. Moreover, the use of a model selection framework to simultaneously compare a variety of hypotheses advances our ability to assess alternative evolutionary explanations.
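To illustrate the kind of model comparison involved, here is a hedged sketch that fits a one-optimum OU model and a Brownian-motion model to a single trait series and compares them by AIC. A real comparative analysis (as in the paper) works on a phylogeny with mapped selective regimes; this simplified example treats the data as one lineage observed through time, with made-up values.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Hypothetical trait values observed at equally spaced times along one lineage
t = np.arange(10.0)
x = np.array([0.0, 0.4, 0.7, 0.9, 1.1, 1.0, 1.2, 1.1, 1.3, 1.2])

def ou_negloglik(params):
    alpha, sigma = np.exp(params[0]), np.exp(params[2])  # keep rate and diffusion positive
    theta = params[1]                                     # selective optimum, unconstrained
    ll = 0.0
    for i in range(1, len(x)):
        dt = t[i] - t[i - 1]
        mean = theta + (x[i - 1] - theta) * np.exp(-alpha * dt)
        var = sigma**2 / (2 * alpha) * (1 - np.exp(-2 * alpha * dt))
        ll += norm.logpdf(x[i], mean, np.sqrt(var))
    return -ll

def bm_negloglik(params):
    sigma = np.exp(params[0])
    ll = 0.0
    for i in range(1, len(x)):
        dt = t[i] - t[i - 1]
        ll += norm.logpdf(x[i], x[i - 1], sigma * np.sqrt(dt))
    return -ll

ou_fit = minimize(ou_negloglik, np.array([0.0, 1.0, np.log(0.5)]), method="Nelder-Mead")
bm_fit = minimize(bm_negloglik, np.array([np.log(0.5)]), method="Nelder-Mead")

aic_ou = 2 * 3 + 2 * ou_fit.fun   # OU has three parameters here
aic_bm = 2 * 1 + 2 * bm_fit.fun   # BM has one
print("AIC (OU):", round(aic_ou, 2), " AIC (BM):", round(aic_bm, 2))
```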

1,250 citations


Journal ArticleDOI
TL;DR: In this article, oracle properties and asymptotic normality of nonconcave penalized likelihood estimators are established for settings in which the number of parameters grows with the sample size, and the consistency of the sandwich formula for the covariance matrix is demonstrated.
Abstract: A class of variable selection procedures for parametric models via nonconcave penalized likelihood was proposed by Fan and Li to simultaneously estimate parameters and select important variables. They demonstrated that this class of procedures has an oracle property when the number of parameters is finite. However, in most model selection problems the number of parameters should be large and grow with the sample size. In this paper some asymptotic properties of the nonconcave penalized likelihood are established for situations in which the number of parameters tends to ∞ as the sample size increases. Under regularity conditions we have established an oracle property and the asymptotic normality of the penalized likelihood estimators. Furthermore, the consistency of the sandwich formula of the covariance matrix is demonstrated. Nonconcave penalized likelihood ratio statistics are discussed, and their asymptotic distributions under the null hypothesis are obtained by imposing some mild conditions on the penalty functions. The asymptotic results are augmented by a simulation study, and the newly developed methodology is illustrated by an analysis of a court case on the sexual discrimination of salary.
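For concreteness, a small sketch of the SCAD penalty, the canonical example of the nonconcave penalties studied here; the formula follows Fan and Li's definition, while the lambda, a, and coefficient values are arbitrary.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty p_lambda(|beta|) as defined by Fan and Li."""
    b = np.abs(beta)
    small = b <= lam
    mid = (b > lam) & (b <= a * lam)
    large = b > a * lam
    pen = np.empty_like(b, dtype=float)
    pen[small] = lam * b[small]                                         # L1-like near zero
    pen[mid] = -(b[mid] ** 2 - 2 * a * lam * b[mid] + lam**2) / (2 * (a - 1))
    pen[large] = (a + 1) * lam**2 / 2                                   # constant for large coefficients
    return pen

betas = np.array([-2.0, -0.5, 0.0, 0.3, 1.0, 3.0])
print(scad_penalty(betas, lam=0.5))
```

Because the penalty levels off for large coefficients, big effects are not shrunk, which is what makes the oracle property possible.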

978 citations


Journal ArticleDOI
TL;DR: In this article, the authors show that the median probability model is often the optimal predictive model, which is defined as the model consisting of those variables which have overall posterior probability greater than or equal to 1/2 of being in a model.
Abstract: Often the goal of model selection is to choose a model for future prediction, and it is natural to measure the accuracy of a future prediction by squared error loss. Under the Bayesian approach, it is commonly perceived that the optimal predictive model is the model with highest posterior probability, but this is not necessarily the case. In this paper we show that, for selection among normal linear models, the optimal predictive model is often the median probability model, which is defined as the model consisting of those variables which have overall posterior probability greater than or equal to 1/2 of being in a model. The median probability model often differs from the highest probability model.
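A toy numerical sketch of the distinction between the highest probability model and the median probability model, using invented posterior probabilities over subsets of three candidate variables:

```python
# Hypothetical posterior probabilities over models; each model is the set of included variables
models = {
    (): 0.02,
    ("x1",): 0.28,
    ("x2",): 0.02,
    ("x1", "x2"): 0.29,
    ("x1", "x3"): 0.08,
    ("x1", "x2", "x3"): 0.31,
}

variables = ["x1", "x2", "x3"]
# Marginal posterior inclusion probability of each variable
incl = {v: sum(p for m, p in models.items() if v in m) for v in variables}

highest_prob_model = max(models, key=models.get)
median_prob_model = tuple(v for v in variables if incl[v] >= 0.5)

print("Inclusion probabilities:", incl)
print("Highest probability model:", highest_prob_model)
print("Median probability model:", median_prob_model)
```

Here the highest probability model is the full model, while the median probability model keeps only x1 and x2, illustrating that the two selections need not coincide.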

Journal ArticleDOI
TL;DR: Evidence is found that the most global model considered provides a poor fit to the data, hence an overdispersion factor is estimated to adjust model selection procedures and inflate standard errors.
Abstract: Few species are likely to be so evident that they will always be detected at a site when present. Recently a model has been developed that enables estimation of the proportion of area occupied, when the target species is not detected with certainty. Here we apply this modeling approach to data collected on terrestrial salamanders in the Plethodon glutinosus complex in the Great Smoky Mountains National Park, USA, and wish to address the question “how accurately does the fitted model represent the data?” The goodness-of-fit of the model needs to be assessed in order to make accurate inferences. This article presents a method where a simple Pearson chi-square statistic is calculated and a parametric bootstrap procedure is used to determine whether the observed statistic is unusually large. We found evidence that the most global model considered provides a poor fit to the data, hence estimated an overdispersion factor to adjust model selection procedures and inflate standard errors. Two hypothetical datasets with known assumption violations are also analyzed, illustrating that the method may be used to guide researchers to making appropriate inferences. The results of a simulation study are presented to provide a broader view of the method's properties.
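A hedged sketch of the general recipe (Pearson chi-square statistic, parametric bootstrap reference distribution, and an overdispersion factor), applied here to a plain binomial detection model with simulated data rather than the paper's occupancy model:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(1)

# Hypothetical data: detections out of K visits at each of 40 sites (illustrative stand-in
# for real detection histories; the paper's model also involves an occupancy component)
K, n_sites = 5, 40
y = rng.binomial(K, 0.3, size=n_sites)

def chisq(data, p):
    """Pearson chi-square comparing counts of 0..K detections to binomial expectations."""
    obs = np.bincount(data, minlength=K + 1)
    exp = len(data) * binom.pmf(np.arange(K + 1), K, p)
    return np.sum((obs - exp) ** 2 / exp)

p_hat = y.mean() / K                      # fitted detection probability
t_obs = chisq(y, p_hat)

# Parametric bootstrap: simulate from the fitted model, refit, recompute the statistic
t_boot = []
for _ in range(1000):
    y_sim = rng.binomial(K, p_hat, size=n_sites)
    t_boot.append(chisq(y_sim, y_sim.mean() / K))
t_boot = np.array(t_boot)

p_value = np.mean(t_boot >= t_obs)        # is the observed statistic unusually large?
c_hat = t_obs / t_boot.mean()             # overdispersion factor for adjusting AIC and SEs
print("bootstrap p-value:", p_value, " c-hat:", round(c_hat, 2))
```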

Journal ArticleDOI
TL;DR: It is argued that useful information for model selection can be obtained from using AIC and BIC together, particularly from trying as far as possible to find models favored by both criteria.
Abstract: The two most commonly used penalized model selection criteria, the Bayesian information criterion (BIC) and Akaike’s information criterion (AIC), are examined and compared. Their motivations as approximations of two different target quantities are discussed, and their performance in estimating those quantities is assessed. Despite their different foundations, some similarities between the two statistics can be observed, for example, in analogous interpretations of their penalty terms. The behavior of the criteria in selecting good models for observed data is examined with simulated data and also illustrated with the analysis of two well-known data sets on social mobility. It is argued that useful information for model selection can be obtained from using AIC and BIC together, particularly from trying as far as possible to find models favored by both criteria.
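A small sketch of using AIC and BIC side by side to look for models favored by both criteria; the candidate models' log-likelihoods and parameter counts are invented for illustration.

```python
import numpy as np

n = 500  # sample size

# Hypothetical candidate models: (name, number of parameters, maximized log-likelihood)
candidates = [("M1", 3, -712.4), ("M2", 5, -705.8), ("M3", 8, -703.9), ("M4", 12, -702.5)]

rows = []
for name, k, loglik in candidates:
    aic = -2 * loglik + 2 * k            # AIC penalty: 2 per parameter
    bic = -2 * loglik + k * np.log(n)    # BIC penalty: log(n) per parameter
    rows.append((name, aic, bic))

best_aic = min(rows, key=lambda r: r[1])[0]
best_bic = min(rows, key=lambda r: r[2])[0]
for name, aic, bic in rows:
    print(f"{name}: AIC={aic:.1f}  BIC={bic:.1f}")
print("Favoured by AIC:", best_aic, " favoured by BIC:", best_bic)
```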

Journal ArticleDOI
TL;DR: In this article, the authors examine the roles played by the propensity score (the probability of selection into treatment) in matching, instrumental variable, and control function methods and contrast the roles of exclusion restrictions in matching and selection models.
Abstract: This paper investigates four topics. (1) It examines the different roles played by the propensity score (the probability of selection into treatment) in matching, instrumental variable, and control function methods. (2) It contrasts the roles of exclusion restrictions in matching and selection models. (3) It characterizes the sensitivity of matching to the choice of conditioning variables and demonstrates the greater robustness of control function methods to misspecification of the conditioning variables. (4) It demonstrates the problem of choosing the conditioning variables in matching and the failure of conventional model selection criteria when candidate conditioning variables are not exogenous in a sense defined in this paper.
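As a rough illustration of the role the propensity score plays in matching, here is a sketch that estimates the score by logistic regression and performs one-to-one nearest-neighbour matching on it. The data are synthetic and the estimator is a textbook matching estimator, not the paper's control function methods.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic observational data: two conditioning variables, a treatment, and an outcome
n = 1000
X = rng.normal(size=(n, 2))
p_treat = 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1])))
D = rng.binomial(1, p_treat)                                   # treatment indicator
Y = 1.0 * D + X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n)    # true effect = 1.0

# Step 1: estimate the propensity score P(D = 1 | X)
ps = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]

# Step 2: match each treated unit to the control with the nearest propensity score
treated = np.where(D == 1)[0]
controls = np.where(D == 0)[0]
matches = controls[np.argmin(np.abs(ps[treated][:, None] - ps[controls][None, :]), axis=1)]

att = np.mean(Y[treated] - Y[matches])   # matching estimate of the effect on the treated
print("Estimated treatment effect on the treated:", round(att, 2))
```

The paper's point about conditioning variables can be seen by deliberately omitting a relevant column of X from the score model and watching the estimate drift.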

Journal ArticleDOI
TL;DR: In this paper, a Bayesian probabilistic approach is presented for selecting the most plausible class of models for a structural or mechanical system within some specified set of model classes, based on system response data.
Abstract: A Bayesian probabilistic approach is presented for selecting the most plausible class of models for a structural or mechanical system within some specified set of model classes, based on system response data. The crux of the approach is to rank the classes of models based on their probabilities conditional on the response data which can be calculated based on Bayes’ theorem and an asymptotic expansion for the evidence for each model class. The approach provides a quantitative expression of a principle of model parsimony or of Ockham’s razor which in this context can be stated as "simpler models are to be preferred over unnecessarily complicated ones." Examples are presented to illustrate the method using a single-degree-of-freedom bilinear hysteretic system, a linear two-story frame, and a ten-story shear building, all of which are subjected to seismic excitation.
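A minimal sketch of the final step of such an approach: converting (hypothetical) log-evidence values for competing model classes into posterior probabilities via Bayes' theorem. Computing the evidence itself, e.g. via the asymptotic expansion used in the paper, is the hard part and is not shown.

```python
import numpy as np

# Hypothetical log-evidence (marginal log-likelihood) for three candidate model classes,
# given the system response data (numbers are invented)
log_evidence = np.array([-1250.3, -1247.1, -1248.9])
log_prior = np.log(np.array([1 / 3, 1 / 3, 1 / 3]))   # equal prior plausibility

log_post = log_evidence + log_prior
post = np.exp(log_post - log_post.max())               # subtract max for numerical stability
post /= post.sum()                                     # posterior probability of each model class

for i, p in enumerate(post, 1):
    print(f"Model class {i}: posterior probability {p:.3f}")
```

Because the evidence automatically penalizes unnecessary parameters, ranking classes this way embodies the Ockham's razor trade-off described in the abstract.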

Journal ArticleDOI
TL;DR: The authors used bootstrap resampling in conjunction with automated variable selection methods to develop parsimonious prediction models using data on patients admitted to hospital with a heart attack, and demonstrated that selecting those variables that were identified as independent predictors of mortality in at least 60% of the bootstrap samples resulted in a parsimonious model with excellent predictive ability.
Abstract: Researchers frequently use automated model selection methods such as backwards elimination to identify variables that are independent predictors of an outcome under consideration. We propose using bootstrap resampling in conjunction with automated variable selection methods to develop parsimonious prediction models. Using data on patients admitted to hospital with a heart attack, we demonstrate that selecting those variables that were identified as independent predictors of mortality in at least 60% of the bootstrap samples resulted in a parsimonious model with excellent predictive ability.
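A hedged sketch of the same idea using ordinary least squares and a simple p-value-based backward elimination on synthetic data (the paper works with logistic regression on clinical data; the 60% retention threshold is kept):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Synthetic data: 8 candidate predictors, only the first 3 truly related to the outcome
n, p = 300, 8
X = rng.normal(size=(n, p))
y = X[:, 0] + 0.7 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(0, 1, n)

def backward_elimination(X, y, alpha=0.05):
    """Drop the least significant predictor until all remaining p-values fall below alpha."""
    cols = list(range(X.shape[1]))
    while cols:
        fit = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = fit.pvalues[1:]                 # skip the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] < alpha:
            break
        cols.pop(worst)
    return cols

counts = np.zeros(p)
n_boot = 200
for _ in range(n_boot):
    idx = rng.integers(0, n, n)                 # bootstrap resample of the rows
    counts[backward_elimination(X[idx], y[idx])] += 1

freq = counts / n_boot
selected = np.where(freq >= 0.6)[0]             # keep variables chosen in >= 60% of samples
print("Selection frequencies:", np.round(freq, 2))
print("Variables retained in the final model:", selected)
```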

Journal ArticleDOI
TL;DR: The white-box models most frequently used nowadays to describe biological nitrogen and phosphorus removal in activated sludge processes are introduced; they are mainly applicable to municipal wastewater systems but can easily be adapted to specific situations such as the presence of industrial wastewater.
Abstract: This review paper focuses on modelling of wastewater treatment plants (WWTP). White-box modelling is widely applied in this field, with learning, design and process optimisation as the main applications. The introduction of the ASM model family by the IWA task group was of great importance, providing researchers and practitioners with a standardised set of basis models. This paper introduces the nowadays most frequently used white-box models for description of biological nitrogen and phosphorus removal activated sludge processes. These models are mainly applicable to municipal wastewater systems, but can be adapted easily to specific situations such as the presence of industrial wastewater. Some of the main model assumptions are highlighted, and their implications for practical model application are discussed. A step-wise procedure leads from the model purpose definition to a calibrated WWTP model. Important steps in the procedure are: model purpose definition, model selection, data collection, data reconciliation, calibration of the model parameters and model unfalsification. The model purpose, defined at the beginning of the procedure, influences the model selection, the data collection and the model calibration. In the model calibration a process engineering approach, i.e. based on understanding of the process and the model structure, is needed. A calibrated WWTP model, the result of an iterative procedure, can usually be obtained by only modifying few model parameters, using the default parameter sets as a starting point. Black-box, stochastic grey-box and hybrid models are useful in WWTP applications for prediction of the influent load, for estimation of biomass activities and effluent quality parameters. These modelling methodologies thus complement the process knowledge included in white-box models with predictions based on data in areas where the white-box model assumptions are not valid or where white-box models do not provide accurate predictions. Artificial intelligence (AI) covers a large spectrum of methods, and many of them have been applied in applications related to WWTPs. AI methodologies and white-box models can interact in many ways; supervisory control systems for WWTPs are one evident application. Modular agent-based systems combining several AI and modelling methods provide a great potential. In these systems, AI methods on one hand can maximise the knowledge extracted from data and operator experience, and subsequently apply this knowledge to improve WWTP control. White-box models on the other hand allow evaluating scenarios based on the available process knowledge about the WWTP. A white-box model calibration tool, an AI based WWTP design tool and a knowledge representation tool in the WWTP domain are other potential applications where fruitful interactions between AI methods and white-box models could be developed.

01 Jan 2004
TL;DR: This work explores reversible jump MCMC methods that build on sets of parallel Gibbs sampling-based analyses to generate suitable empirical proposal distributions and that address the challenging problem of finding efficient proposals in high-dimensional models.
Abstract: Factor analysis has been one of the most powerful and flexible tools for assessment of multivariate dependence and codependence. Loosely speaking, it could be argued that the origin of its success rests in its very exploratory nature, where various kinds of data-relationships amongst the variables at study can be iteratively verified and/or refuted. Bayesian inference in factor analytic models has received renewed attention in recent years, partly due to computational advances but also partly to applied focuses generating factor structures as exemplified by recent work in financial time series modeling. The focus of our current work is on exploring questions of uncertainty about the number of latent factors in a multivariate factor model, combined with methodological and computational issues of model specification and model fitting. We explore reversible jump MCMC methods that build on sets of parallel Gibbs sampling-based analyses to generate suitable empirical proposal distributions and that address the challenging problem of finding efficient proposals in high-dimensional models. Alternative MCMC methods based on bridge sampling are discussed, and these fully Bayesian MCMC approaches are compared with a collection of popular model selection methods in empirical studies. Various additional computational issues are discussed, including situations where prior information is scarce, and the methods are explored in studies of some simulated data sets and an econometric time series example.

Journal ArticleDOI
TL;DR: The reversible jump Markov chain Monte Carlo algorithm described here allows estimation of phylogeny (and other phylogenetic model parameters) to be performed while accounting for uncertainty in the model of DNA substitution.
Abstract: A common problem in molecular phylogenetics is choosing a model of DNA substitution that does a good job of explaining the DNA sequence alignment without introducing superfluous parameters. A number of methods have been used to choose among a small set of candidate substitution models, such as the likelihood ratio test, the Akaike Information Criterion (AIC), the Bayesian Information Criterion (BIC), and Bayes factors. Current implementations of any of these criteria suffer from the limitation that only a small set of models are examined, or that the test does not allow easy comparison of non-nested models. In this article, we expand the pool of candidate substitution models to include all possible time-reversible models. This set includes seven models that have already been described. We show how Bayes factors can be calculated for these models using reversible jump Markov chain Monte Carlo, and apply the method to 16 DNA sequence alignments. For each data set, we compare the model with the best Bayes factor to the best models chosen using AIC and BIC. We find that the best model under any of these criteria is not necessarily the most complicated one; models with an intermediate number of substitution types typically do best. Moreover, almost all of the models that are chosen as best do not constrain a transition rate to be the same as a transversion rate, suggesting that it is the transition/transversion rate bias that plays the largest role in determining which models are selected. Importantly, the reversible jump Markov chain Monte Carlo algorithm described here allows estimation of phylogeny (and other phylogenetic model parameters) to be performed while accounting for uncertainty in the model of DNA substitution.
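One detail worth making concrete is the size of the expanded candidate set: under this scheme a time-reversible model is defined by how the six exchangeability rates are grouped, so the candidate models correspond to the set partitions of six items (the Bell number B6 = 203). A short sketch that enumerates them, with illustrative rate labels:

```python
def set_partitions(items):
    """Generate all partitions of a list of items into non-empty groups."""
    if len(items) == 1:
        yield [items]
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # place `first` into each existing group, or into a new group of its own
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition

rates = ["AC", "AG", "AT", "CG", "CT", "GT"]      # the six exchangeability parameters
models = list(set_partitions(rates))
print("Number of candidate time-reversible models:", len(models))   # 203
# e.g. an HKY-like grouping lets the transitions (AG, CT) share one rate
# and the four transversions share another
```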

Journal ArticleDOI
TL;DR: In this paper, two new approaches are proposed for estimating the regression coefficients in a semiparametric model and the asymptotic normality of the resulting estimators is established.
Abstract: Semiparametric regression models are very useful for longitudinal data analysis. The complexity of semiparametric models and the structure of longitudinal data pose new challenges to parametric inferences and model selection that frequently arise from longitudinal data analysis. In this article, two new approaches are proposed for estimating the regression coefficients in a semiparametric model. The asymptotic normality of the resulting estimators is established. An innovative class of variable selection procedures is proposed to select significant variables in the semiparametric models. The proposed procedures are distinguished from others in that they simultaneously select significant variables and estimate unknown parameters. Rates of convergence of the resulting estimators are established. With a proper choice of regularization parameters and penalty functions, the proposed variable selection procedures are shown to perform as well as an oracle estimator. A robust standard error formula is derived usi...

Journal ArticleDOI
TL;DR: It is shown using likelihood ratio tests that adequate modeling of compositional heterogeneity can be achieved with few composition parameters, that the data may not need to be modelled with separate composition parameters for each branch in the tree.
Abstract: Compositional heterogeneity among lineages can compromise phylogenetic analyses, because models in common use assume compositionally homogeneous data. Models that can accommodate compositional heterogeneity with few extra parameters are described here, and used in two examples where the true tree is known with confidence. It is shown using likelihood ratio tests that adequate modeling of compositional heterogeneity can be achieved with few composition parameters, that the data may not need to be modelled with separate composition parameters for each branch in the tree. Tree searching and placement of composition vectors on the tree are done in a Bayesian framework using Markov chain Monte Carlo (MCMC) methods. Assessment of fit of the model to the data is made in both maximum likelihood (ML) and Bayesian frameworks. In an ML framework, overall model fit is assessed using the Goldman-Cox test, and the fit of the composition implied by a (possibly heterogeneous) model to the composition of the data is assessed using a novel tree- and model-based composition fit test. In a Bayesian framework, overall model fit and composition fit are assessed using posterior predictive simulation. It is shown that when composition is not accommodated, then the model does not fit, and incorrect trees are found; but when composition is accommodated, the model then fits, and the known correct phylogenies are obtained. (Compositional heterogeneity; Markov chain Monte Carlo; maximum likelihood; model assessment; model selection; phylogenetics.) Markov process models used for phylogenetic analysis of DNA sequences have become more realistic. The simple Jukes-Cantor model has been extended to take into account unequal nucleotide composition, different rates of change from one nucleotide to another, and among-site rate variation in the form of a proportion of invariant

Journal ArticleDOI
TL;DR: The various model selection proposals put forward in the literature from both frequentist and Bayesian perspectives are examined here from a Bayesian decision-theoretic point of view.
Abstract: Model selection is an important part of any statistical analysis and, indeed, is central to the pursuit of science in general. Many authors have examined the question of model selection from both frequentist and Bayesian perspectives, and many tools for selecting the “best model” have been suggested in the literature. This paper considers the various proposals from a Bayesian decision–theoretic perspective.


Journal ArticleDOI
TL;DR: The LARS method as discussed by the authors is based on a recursive procedure selecting, at each step, the covariates having largest absolute correlation with the response variable, which enables recovering the estimates given by the Lasso and Stagewise.
Abstract: DISCUSSION OF “LEAST ANGLE REGRESSION” BY EFRON ET AL. By Jean-Michel Loubes and Pascal Massart, Université Paris-Sud. The issue of model selection has drawn the attention of both applied and theoretical statisticians for a long time. Indeed, there has been an enormous range of contribution in model selection proposals, including work by Akaike (1973), Mallows (1973), Foster and George (1994), Birgé and Massart (2001a) and Abramovich, Benjamini, Donoho and Johnstone (2000). Over the last decade, modern computer-driven methods have been developed such as All Subsets, Forward Selection, Forward Stagewise or Lasso. Such methods are useful in the setting of the standard linear model, where we observe noisy data and wish to predict the response variable using only a few covariates, since they provide automatically linear models that fit the data. The procedure described in this paper is, on the one hand, numerically very efficient and, on the other hand, very general, since, with slight modifications, it enables us to recover the estimates given by the Lasso and Stagewise. 1. Estimation procedure. The “LARS” method is based on a recursive procedure selecting, at each step, the covariates having largest absolute correlation with the response y. In the case of an orthogonal design, the estimates can then be viewed as an l...

Book
30 Aug 2004
TL;DR: In this paper, the CAPM model is used to calculate the risk on a portfolio and a trading strategy is proposed for pairs selection in equity markets, based on the Arrow-Debreu theory.
Abstract (table of contents): Preface. Acknowledgments.
PART ONE: BACKGROUND MATERIAL.
Chapter 1. Introduction: The CAPM Model; Market Neutral Strategy; Pairs Trading; Outline; Audience.
Chapter 2. Time Series: Overview; Autocorrelation; Time Series Models; Forecasting; Goodness of Fit versus Bias; Model Choice; Modeling Stock Prices.
Chapter 3. Factor Models: Introduction; Arbitrage Pricing Theory; The Covariance Matrix; Application: Calculating the Risk on a Portfolio; Application: Calculation of Portfolio Beta; Application: Tracking Basket Design; Sensitivity Analysis.
Chapter 4. Kalman Filtering: Introduction; The Kalman Filter; The Scalar Kalman Filter; Filtering the Random Walk; Application: Example with the Standard & Poor Index.
PART TWO: STATISTICAL ARBITRAGE.
Chapter 5. Overview: History; Motivation; Cointegration; Applying the Model; A Trading Strategy; Road Map for Strategy Design.
Chapter 6. Pairs Selection in Equity Markets: Introduction; Common Trends Cointegration Model; Common Trends Model and APT; The Distance Measure; Interpreting the Distance Measure; Reconciling Theory and Practice.
Chapter 7. Testing for Tradability: Introduction; The Linear Relationship; Estimating the Linear Relationship: The Multifactor Approach; Estimating the Linear Relationship: The Regression Approach; Testing Residual for Tradability.
Chapter 8. Trading Design: Introduction; Band Design for White Noise; Spread Dynamics; Nonparametric Approach; Regularization; Tying Up Loose Ends.
PART THREE: RISK ARBITRAGE PAIRS.
Chapter 9. Risk Arbitrage Mechanics: Introduction; History; The Deal Process; Transaction Terms; The Deal Spread; Trading Strategy; Quantitative Aspects.
Chapter 10. Trade Execution: Introduction; Specifying the Order; Verifying the Execution; Execution During the Pricing Period; Short Selling.
Chapter 11. The Market Implied Merger Probability: Introduction; Implied Probabilities and Arrow-Debreu Theory; The Single-Step Model; The Multistep Model; Reconciling Theory and Practice; Risk Management.
Chapter 12. Spread Inversion: Introduction; The Prediction Equation; The Observation Equation; Applying the Kalman Filter; Model Selection; Applications to Trading.
Index.

Journal ArticleDOI
TL;DR: It is shown that exact leave-one-out cross-validation of sparse Least-Squares Support Vector Machines (LS-SVMs) can be implemented with a computational complexity of only O(ℓn²) floating point operations, rather than the O(ℓ²n²) operations of a naïve implementation.
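The flavour of such shortcuts can be seen in the classical leave-one-out identity for linear smoothers, sketched below for ordinary ridge regression: the exact LOO residuals come from a single fit via e_i / (1 - h_ii), with no refitting. This is an analogy under simpler assumptions, not the paper's LS-SVM derivation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 5
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + rng.normal(0, 0.5, n)

lam = 1.0
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)   # hat matrix of the ridge fit
resid = y - H @ y

# Exact leave-one-out residuals from the single full fit: e_i / (1 - h_ii)
loo_exact = resid / (1 - np.diag(H))

# Brute-force check by actually refitting n times
loo_brute = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    beta = np.linalg.solve(X[mask].T @ X[mask] + lam * np.eye(p), X[mask].T @ y[mask])
    loo_brute[i] = y[i] - X[i] @ beta

print("max difference:", np.max(np.abs(loo_exact - loo_brute)))   # agrees to numerical precision
```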

Journal ArticleDOI
TL;DR: Logistic regression models developed using automated variable selection methods are shown to be unstable and poorly reproducible, because the variables selected as independent predictors are sensitive to random fluctuations in the data.

Journal ArticleDOI
TL;DR: A fully Bayesian approach to modeling in functional magnetic resonance imaging (FMRI), incorporating spatio-temporal noise modeling and haemodynamic response function (HRF) modeling is presented, and a novel HRF model made up of half-cosines is proposed, which allows distinct combinations of parameters to represent characteristics of interest.
Abstract: We present a fully Bayesian approach to modeling in functional magnetic resonance imaging (FMRI), incorporating spatio-temporal noise modeling and haemodynamic response function (HRF) modeling. A fully Bayesian approach allows for the uncertainties in the noise and signal modeling to be incorporated together to provide full posterior distributions of the HRF parameters. The noise modeling is achieved via a nonseparable space-time vector autoregressive process. Previous FMRI noise models have either been purely temporal, separable or modeling deterministic trends. The specific form of the noise process is determined using model selection techniques. Notably, this results in the need for a spatially nonstationary and temporally stationary spatial component. Within the same full model, we also investigate the variation of the HRF in different areas of the activation, and for different experimental stimuli. We propose a novel HRF model made up of half-cosines, which allows distinct combinations of parameters to represent characteristics of interest. In addition, to adaptively avoid over-fitting we propose the use of automatic relevance determination priors to force certain parameters in the model to zero with high precision if there is no evidence to support them in the data. We apply the model to three datasets and observe matter-type dependence of the spatial and temporal noise, and a negative correlation between activation height and HRF time to main peak (although we suggest that this apparent correlation may be due to a number of different effects).

Journal ArticleDOI
TL;DR: An online (recursive) algorithm is proposed that estimates the parameters of the mixture and that simultaneously selects the number of components to search for the maximum a posteriori (MAP) solution and to discard the irrelevant components.
Abstract: There are two open problems when finite mixture densities are used to model multivariate data: the selection of the number of components and the initialization. In this paper, we propose an online (recursive) algorithm that estimates the parameters of the mixture and that simultaneously selects the number of components. The new algorithm starts with a large number of randomly initialized components. A prior is used as a bias for maximally structured models. A stochastic approximation recursive learning algorithm is proposed to search for the maximum a posteriori (MAP) solution and to discard the irrelevant components.
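The paper's algorithm is recursive (online) and prunes components during a single pass over the data; as a much simpler, hedged illustration of the underlying component-number selection problem it addresses, here is a batch sketch that fits mixtures of different sizes with scikit-learn and picks the number of components by BIC. This is a stand-in comparison point, not the paper's method.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic one-dimensional data drawn from three Gaussian components
X = np.concatenate([rng.normal(-4, 1, 300), rng.normal(0, 0.5, 300),
                    rng.normal(5, 1.5, 300)]).reshape(-1, 1)

# Fit mixtures with 1..8 components and score each with BIC
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 9)}
best_k = min(bics, key=bics.get)
print("BIC-selected number of components:", best_k)
```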