
Showing papers on "Model selection" published in 2010


Journal ArticleDOI
TL;DR: This survey intends to relate the model selection performances of cross-validation procedures to the most recent advances of model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results.
Abstract: Used to estimate the risk of an estimator or to perform model selection, cross-validation is a widespread strategy because of its simplicity and its apparent universality. Many results exist on the model selection performances of cross-validation procedures. This survey intends to relate these results to the most recent advances of model selection theory, with a particular emphasis on distinguishing empirical statements from rigorous theoretical results. As a conclusion, guidelines are provided for choosing the best cross-validation procedure according to the particular features of the problem at hand.
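
A minimal sketch of using K-fold cross-validation to choose between candidate estimators, in the spirit of the survey (scikit-learn on synthetic data; the candidate models and scoring choice are illustrative, not the paper's):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=5.0, random_state=0)

candidates = {"ridge": Ridge(alpha=1.0), "lasso": Lasso(alpha=0.1)}
for name, model in candidates.items():
    # 10-fold CV estimate of predictive risk (sklearn reports negative MSE)
    scores = cross_val_score(model, X, y, cv=10, scoring="neg_mean_squared_error")
    print(f"{name}: estimated risk = {-scores.mean():.2f}")
```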

2,980 citations


Journal ArticleDOI
TL;DR: Models with uninformative parameters are frequently presented as being competitive in the Journal of Wildlife Management, including 72% of all AIC-based papers in 2008, and authors and readers need to be more aware of this problem and take appropriate steps to eliminate misinterpretation.
Abstract: As use of Akaike's Information Criterion (AIC) for model selection has become increasingly common, so has a mistake involving interpretation of models that are within 2 AIC units (ΔAIC ≤ 2) of the top-supported model. Such models lie within 2 ΔAIC units because the penalty for one additional parameter is +2 AIC units, while the model deviance is not reduced by an amount sufficient to overcome the 2-unit penalty; hence, the additional parameter provides no net reduction in AIC. Simply put, the uninformative parameter does not explain enough variation to justify its inclusion in the model and it should not be interpreted as having any ecological effect. Models with uninformative parameters are frequently presented as being competitive in the Journal of Wildlife Management, including 72% of all AIC-based papers in 2008, and authors and readers need to be more aware of this problem and take appropriate steps to eliminate misinterpretation. I reviewed 5 potential solutions to this problem: 1) report all model...
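
The ΔAIC ≤ 2 arithmetic is easy to verify numerically. A toy check (all numbers hypothetical), assuming the usual definition AIC = deviance + 2k:

```python
def aic(deviance, k):
    # AIC = deviance + 2 * (number of estimated parameters)
    return deviance + 2 * k

dev_top, k_top = 100.0, 3    # hypothetical top-supported model
dev_big, k_big = 99.5, 4     # same model plus one uninformative parameter

delta_aic = aic(dev_big, k_big) - aic(dev_top, k_top)
print(delta_aic)  # 1.5: "competitive" by the <2 rule, yet the extra
                  # parameter reduced the deviance by only 0.5
```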

2,700 citations


Journal ArticleDOI
TL;DR: In this paper, the authors proposed a penalized linear unbiased selection (PLUS) algorithm, which computes multiple exact local minimizers of a possibly nonconvex penalized loss function in a certain main branch of the graph of critical points of the loss.
Abstract: We propose MC+, a fast, continuous, nearly unbiased and accurate method of penalized variable selection in high-dimensional linear regression. The LASSO is fast and continuous, but biased. The bias of the LASSO may prevent consistent variable selection. Subset selection is unbiased but computationally costly. The MC+ has two elements: a minimax concave penalty (MCP) and a penalized linear unbiased selection (PLUS) algorithm. The MCP provides the convexity of the penalized loss in sparse regions to the greatest extent given certain thresholds for variable selection and unbiasedness. The PLUS computes multiple exact local minimizers of a possibly nonconvex penalized loss function in a certain main branch of the graph of critical points of the penalized loss. Its output is a continuous piecewise linear path running from the origin (for infinite penalty) to a least squares solution (for zero penalty). We prove that at a universal penalty level, the MC+ has high probability of matching the signs of the unknowns, and thus correct selection, without assuming the strong irrepresentable condition required by the LASSO. This selection consistency applies to the case of p ≫ n, and is proved to hold for exactly the MC+ solution among possibly many local minimizers. We prove that the MC+ attains certain minimax convergence rates in probability for the estimation of regression coefficients in ℓr balls. We use the SURE method to derive degrees of freedom and Cp-type risk estimates for general penalized LSE, including the LASSO and MC+ estimators, and prove their unbiasedness. Based on the estimated degrees of freedom, we propose an estimator of the noise level for proper choice of the penalty level. For full rank designs and general sub-quadratic penalties, we provide necessary and sufficient conditions for the continuity of the penalized LSE. Simulation results overwhelmingly support our claim of superior variable selection properties and demonstrate the computational efficiency of the proposed method.
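
For reference, the MCP itself has a simple closed form; below is a sketch of the penalty function (my transcription of the standard MCP formula, not code from the paper):

```python
import numpy as np

def mcp_penalty(t, lam, gamma):
    """MCP: lam*|t| - t^2/(2*gamma) for |t| <= gamma*lam, then the constant
    gamma*lam^2/2, so the penalty flattens out and large coefficients are
    left nearly unbiased (unlike the LASSO's linear growth)."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a**2 / (2 * gamma),
                    0.5 * gamma * lam**2)

# Grows like the LASSO near zero, levels off for large |t|:
print(mcp_penalty(np.array([0.1, 1.0, 10.0]), lam=1.0, gamma=3.0))
```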

2,382 citations


Journal ArticleDOI
TL;DR: It is demonstrated that a low variance is at least as important, as a non-negligible variance introduces the potential for over-fitting in model selection as well as in training the model, and some common performance evaluation practices are susceptible to a form of selection bias as a result of this form of over-fitting and hence are unreliable.
Abstract: Model selection strategies for machine learning algorithms typically involve the numerical optimisation of an appropriate model selection criterion, often based on an estimator of generalisation performance, such as k-fold cross-validation. The error of such an estimator can be broken down into bias and variance components. While unbiasedness is often cited as a beneficial quality of a model selection criterion, we demonstrate that a low variance is at least as important, as a non-negligible variance introduces the potential for over-fitting in model selection as well as in training the model. While this observation is in hindsight perhaps rather obvious, the degradation in performance due to over-fitting the model selection criterion can be surprisingly large, an observation that appears to have received little attention in the machine learning literature to date. In this paper, we show that the effects of this form of over-fitting are often of comparable magnitude to differences in performance between learning algorithms, and thus cannot be ignored in empirical evaluation. Furthermore, we show that some common performance evaluation practices are susceptible to a form of selection bias as a result of this form of over-fitting and hence are unreliable. We discuss methods to avoid over-fitting in model selection and subsequent selection bias in performance evaluation, which we hope will be incorporated into best practice. While this study concentrates on cross-validation based model selection, the findings are quite general and apply to any model selection practice involving the optimisation of a model selection criterion evaluated over a finite sample of data, including maximisation of the Bayesian evidence and optimisation of performance bounds.
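
One standard remedy the paper's findings motivate is nested cross-validation: the inner loop optimises the model selection criterion, while the outer loop estimates performance on data the selection never saw. A minimal sketch with scikit-learn (synthetic data, illustrative grid):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Inner loop: tune hyperparameters by optimising 5-fold CV accuracy.
inner = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.01, 0.1]}, cv=5)
# Outer loop: performance estimate untouched by the tuning above.
outer_scores = cross_val_score(inner, X, y, cv=5)
print(outer_scores.mean())
```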

1,532 citations


Journal ArticleDOI
TL;DR: In this paper, an R package for automated model selection and multi-model inference with glm and related functions is presented. The package is optimized for large candidate sets, avoiding memory limitations, facilitating parallelization, and providing, in addition to exhaustive screening, a compiled genetic algorithm method.
Abstract: We introduce glmulti, an R package for automated model selection and multi-model inference with glm and related functions. From a list of explanatory variables, the provided function glmulti builds all possible unique models involving these variables and, optionally, their pairwise interactions. Restrictions can be specified for candidate models, by excluding specific terms, enforcing marginality, or controlling model complexity. Models are fitted with standard R functions like glm. The n best models and their support (e.g., (Q)AIC, (Q)AICc, or BIC) are returned, allowing model selection and multi-model inference through standard R functions. The package is optimized for large candidate sets by avoiding memory limitation, facilitating parallelization and providing, in addition to exhaustive screening, a compiled genetic algorithm method. This article briefly presents the statistical framework and introduces the package, with applications to simulated and real data.
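
glmulti itself is an R package; as a loose Python analogue of its exhaustive-screening mode, the sketch below enumerates all subsets of a few candidate predictors, fits each model with statsmodels, and ranks models by AIC (synthetic data):

```python
import itertools
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = 2 * X[:, 0] - X[:, 2] + rng.normal(size=100)

results = []
for k in range(1, 5):
    for subset in itertools.combinations(range(4), k):
        fit = sm.OLS(y, sm.add_constant(X[:, subset])).fit()
        results.append((fit.aic, subset))

for aic_val, subset in sorted(results)[:3]:  # the 3 best models by AIC
    print(f"AIC={aic_val:.1f}, predictors={subset}")
```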

962 citations


Journal Article
TL;DR: In this paper, a brief account of the recent developments of theory, methods, and implementations for high-dimensional variable selection is presented, with emphasis on independence screening and two-scale methods.
Abstract: High dimensional statistical problems arise from diverse fields of scientific research and technological development. Variable selection plays a pivotal role in contemporary statistical learning and scientific discoveries. The traditional idea of best subset selection methods, which can be regarded as a specific form of penalized likelihood, is computationally too expensive for many modern statistical applications. Other forms of penalized likelihood methods have been successfully developed over the last decade to cope with high dimensionality. They have been widely applied for simultaneously selecting important variables and estimating their effects in high dimensional statistical inference. In this article, we present a brief account of the recent developments of theory, methods, and implementations for high dimensional variable selection. Questions of what limits of dimensionality such methods can handle, what role penalty functions play, and what statistical properties they possess are rapidly driving advances in the field. The properties of non-concave penalized likelihood and its roles in high dimensional statistical modeling are emphasized. We also review some recent advances in ultra-high dimensional variable selection, with emphasis on independence screening and two-scale methods.
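
The independence-screening idea reviewed here is simple enough to sketch: rank predictors by marginal correlation, keep a moderate number, then run a penalized method on the survivors. A minimal two-scale sketch (thresholds and data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(1)
n, p = 200, 5000
X = rng.normal(size=(n, p))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(size=n)

# Screening scale: keep the d = n/log(n) features most correlated with y.
corr = np.abs(X.T @ (y - y.mean())) / n
d = int(n / np.log(n))
keep = np.argsort(corr)[-d:]

# Second scale: penalized regression on the screened set only.
fit = LassoCV(cv=5).fit(X[:, keep], y)
print("selected:", keep[fit.coef_ != 0])
```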

892 citations


Journal ArticleDOI
TL;DR: In this paper, the problem of estimating the graph associated with a binary Ising Markov random field is considered, where the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1-constraint.
Abstract: We consider the problem of estimating the graph associated with a binary Ising Markov random field. We describe a method based on ℓ1-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an ℓ1-constraint. The method is analyzed under high-dimensional scaling in which both the number of nodes p and maximum neighborhood size d are allowed to grow as a function of the number of observations n. Our main results provide sufficient conditions on the triple (n, p, d) and the model parameters for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. With coherence conditions imposed on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes n = Ω(d³ log p) with exponentially decaying error. When these same conditions are imposed directly on the sample matrices, we show that a reduced sample size of n = Ω(d² log p) suffices for the method to estimate neighborhoods consistently. Although this paper focuses on the binary graphical models, we indicate how a generalization of the method of the paper would apply to general discrete Markov random fields.
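
A minimal sketch of the node-wise recipe, with scikit-learn's ℓ1 logistic regression standing in for the paper's estimator (the data below are independent bits, so typically no edges should survive):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.choice([0, 1], size=(500, 10))   # placeholder for Ising samples

edges = set()
for j in range(X.shape[1]):
    others = np.delete(np.arange(X.shape[1]), j)
    # Estimate node j's neighborhood by l1-penalized logistic regression.
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    clf.fit(X[:, others], X[:, j])
    for k in others[np.abs(clf.coef_[0]) > 1e-6]:
        edges.add(tuple(sorted((j, int(k)))))  # symmetrize by union
print(edges)
```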

776 citations


Journal ArticleDOI
TL;DR: This paper proposes a combination of two further approaches, family-level inference and Bayesian model averaging within families, to provide inferences about parameters that are independent of further assumptions about model structure.
Abstract: Mathematical models of scientific data can be formally compared using Bayesian model evidence. Previous applications in the biological sciences have mainly focussed on model selection in which one first selects the model with the highest evidence and then makes inferences based on the parameters of that model. This "best model" approach is very useful but can become brittle if there are a large number of models to compare, and if different subjects use different models. To overcome this shortcoming we propose the combination of two further approaches: (i) family level inference and (ii) Bayesian model averaging within families. Family level inference removes uncertainty about aspects of model structure other than the characteristic of interest. For example: What are the inputs to the system? Is processing serial or parallel? Is it linear or nonlinear? Is it mediated by a single, crucial connection? We apply Bayesian model averaging within families to provide inferences about parameters that are independent of further assumptions about model structure. We illustrate the methods using Dynamic Causal Models of brain imaging data.
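
A small numeric sketch of family-level inference: sum posterior model probabilities within each family, then form BMA weights within a family (the log-evidences below are made up):

```python
import numpy as np

log_evidence = np.array([-100.0, -101.5, -99.0, -103.0])  # four models
families = [0, 0, 1, 1]                                   # two families

post = np.exp(log_evidence - log_evidence.max())
post /= post.sum()                       # P(model | data) under a flat prior

for f in sorted(set(families)):
    members = [i for i, fam in enumerate(families) if fam == f]
    p_family = post[members].sum()       # family-level posterior probability
    bma_w = post[members] / p_family     # within-family BMA weights
    print(f"family {f}: P = {p_family:.3f}, BMA weights = {np.round(bma_w, 3)}")
```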

680 citations


Journal ArticleDOI
TL;DR: The issues that need consideration when analysing spatial data are described and illustrated using simulation studies; the simulation-based approach provides much stronger evidence for comparative analysis than assessments based on single or small numbers of data sets.
Abstract: Many of the most interesting questions ecologists ask lead to analyses of spatial data. Yet, perhaps confused by the large number of statistical models and fitting methods available, many ecologists seem to believe this is best left to specialists. Here, we describe the issues that need consideration when analysing spatial data and illustrate these using simulation studies. Our comparative analysis involves using methods including generalized least squares, spatial filters, wavelet revised models, conditional autoregressive models and generalized additive mixed models to estimate regression coefficients from synthetic but realistic data sets, including some which violate standard regression assumptions. We assess the performance of each method using two measures and using statistical error rates for model selection. Methods that performed well included the generalized least squares family of models and a Bayesian implementation of the conditional auto-regressive model. Ordinary least squares also performed adequately in the absence of model selection, but had poorly controlled Type I error rates and so did not show the improvements in performance under model selection seen with the above methods. Removing large-scale spatial trends in the response led to poor performance. These are empirical results; hence extrapolation of these findings to other situations should be performed cautiously. Nevertheless, our simulation-based approach provides much stronger evidence for comparative analysis than assessments based on single or small numbers of data sets, and should be considered a necessary foundation for statements of this type in future.

560 citations


Posted Content
TL;DR: A fully data-driven method for choosing the user-specified penalty that must be provided in obtaining LASSO and Post-LASSO estimates is provided and its asymptotic validity under non-Gaussian, heteroscedastic disturbances is established.
Abstract: We develop results for the use of Lasso and Post-Lasso methods to form first-stage predictions and estimate optimal instruments in linear instrumental variables (IV) models with many instruments, $p$. Our results apply even when $p$ is much larger than the sample size, $n$. We show that the IV estimator based on using Lasso or Post-Lasso in the first stage is root-n consistent and asymptotically normal when the first-stage is approximately sparse; i.e. when the conditional expectation of the endogenous variables given the instruments can be well-approximated by a relatively small set of variables whose identities may be unknown. We also show the estimator is semi-parametrically efficient when the structural error is homoscedastic. Notably our results allow for imperfect model selection, and do not rely upon the unrealistic "beta-min" conditions that are widely used to establish validity of inference following model selection. In simulation experiments, the Lasso-based IV estimator with a data-driven penalty performs well compared to recently advocated many-instrument-robust procedures. In an empirical example dealing with the effect of judicial eminent domain decisions on economic outcomes, the Lasso-based IV estimator outperforms an intuitive benchmark. In developing the IV results, we establish a series of new results for Lasso and Post-Lasso estimators of nonparametric conditional expectation functions which are of independent theoretical and practical interest. We construct a modification of Lasso designed to deal with non-Gaussian, heteroscedastic disturbances which uses a data-weighted $\ell_1$-penalty function. Using moderate deviation theory for self-normalized sums, we provide convergence rates for the resulting Lasso and Post-Lasso estimators that are as sharp as the corresponding rates in the homoscedastic Gaussian case under the condition that $\log p = o(n^{1/3})$.
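
A minimal sketch of the Lasso-first-stage idea on synthetic data (cross-validated rather than the paper's data-driven penalty, and plain Post-Lasso OLS in the first stage):

```python
import numpy as np
from sklearn.linear_model import LassoCV, LinearRegression

rng = np.random.default_rng(0)
n, p = 500, 100
Z = rng.normal(size=(n, p))                           # many instruments
u = rng.normal(size=n)                                # structural error
d = Z[:, 0] + 0.5 * Z[:, 1] + u + rng.normal(size=n)  # endogenous regressor
y = 2.0 * d + u                                       # true effect = 2

stage1 = LassoCV(cv=5).fit(Z, d)                      # Lasso first stage
support = np.flatnonzero(stage1.coef_)
d_hat = LinearRegression().fit(Z[:, support], d).predict(Z[:, support])  # Post-Lasso

beta_iv = (d_hat @ y) / (d_hat @ d)                   # IV with the fitted instrument
print(beta_iv)                                        # close to 2 despite endogeneity
```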

495 citations


Journal Article
TL;DR: This paper presents a mature, flexible, and adaptive machine learning toolkit for regression modeling and active learning to tackle issues of computational cost and model accuracy.
Abstract: An exceedingly large number of scientific and engineering fields are confronted with the need for computer simulations to study complex, real world phenomena or solve challenging design problems. However, due to the computational cost of these high fidelity simulations, the use of neural networks, kernel methods, and other surrogate modeling techniques have become indispensable. Surrogate models are compact and cheap to evaluate, and have proven very useful for tasks such as optimization, design space exploration, prototyping, and sensitivity analysis. Consequently, in many fields there is great interest in tools and techniques that facilitate the construction of such regression models, while minimizing the computational cost and maximizing model accuracy. This paper presents a mature, flexible, and adaptive machine learning toolkit for regression modeling and active learning to tackle these issues. The toolkit brings together algorithms for data fitting, model selection, sample selection (active learning), hyperparameter optimization, and distributed computing in order to empower a domain expert to efficiently generate an accurate model for the problem or data at hand.

Book ChapterDOI
01 Jan 2010
TL;DR: A statistical model can be called a latent class (LC) or mixture model if it assumes that some of its parameters differ across unobserved subgroups, LCs, or mixture components as mentioned in this paper.
Abstract: A statistical model can be called a latent class (LC) or mixture model if it assumes that some of its parameters differ across unobserved subgroups, LCs, or mixture components. This rather general idea has several seemingly unrelated applications, the most important of which are clustering, scaling, density estimation, and random-effects modeling. This article describes simple LC models for clustering, restricted LC models for scaling, and mixture regression models for nonparametric random-effects modeling, as well as gives an overview of recent developments in the field of LC analysis. Moreover, attention is paid to topics such as maximum likelihood estimation, identification issues, model selection, and software.

Journal ArticleDOI
TL;DR: In this paper, the adaptive group Lasso was used to select nonzero components in a nonparametric additive model of a conditional mean function, where the additive components are approximated by truncated series expansions with B-spline bases, and the problem of component selection becomes that of selecting the groups of coefficients in the expansion.
Abstract: We consider a nonparametric additive model of a conditional mean function in which the number of variables and additive components may be larger than the sample size but the number of nonzero additive components is "small" relative to the sample size. The statistical problem is to determine which additive components are nonzero. The additive components are approximated by truncated series expansions with B-spline bases. With this approximation, the problem of component selection becomes that of selecting the groups of coefficients in the expansion. We apply the adaptive group Lasso to select nonzero components, using the group Lasso to obtain an initial estimator and reduce the dimension of the problem. We give conditions under which the group Lasso selects a model whose number of components is comparable with the underlying model, and the adaptive group Lasso selects the nonzero components correctly with probability approaching one as the sample size increases and achieves the optimal rate of convergence. The results of Monte Carlo experiments show that the adaptive group Lasso procedure works well with samples of moderate size. A data example is used to illustrate the application of the proposed method.
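
A self-contained sketch of the expansion-plus-group-penalty idea: expand each variable in a B-spline basis, then penalize each variable's block of coefficients jointly. The solver below is a bare-bones proximal gradient loop with block soft-thresholding (plain group Lasso, not the paper's adaptive two-step; the penalty level is hand-tuned for this toy data):

```python
import numpy as np
from sklearn.preprocessing import SplineTransformer

rng = np.random.default_rng(0)
n, p = 300, 6
X = rng.uniform(-1, 1, size=(n, p))
y = np.sin(np.pi * X[:, 0]) + 2 * X[:, 1] ** 2 + 0.1 * rng.normal(size=n)
y -= y.mean()

B = SplineTransformer(n_knots=6, degree=3).fit_transform(X)
gsize = B.shape[1] // p                  # columns come grouped by variable
groups = [np.arange(g * gsize, (g + 1) * gsize) for g in range(p)]

beta, lam = np.zeros(B.shape[1]), 25.0
step = 1.0 / np.linalg.norm(B, 2) ** 2   # 1 / Lipschitz constant of the gradient
for _ in range(2000):                    # proximal gradient descent
    z = beta - step * (B.T @ (B @ beta - y))
    for g in groups:                     # block soft-thresholding
        nrm = np.linalg.norm(z[g])
        if nrm > 0:
            z[g] *= max(0.0, 1.0 - step * lam / nrm)
    beta = z

print([j for j, g in enumerate(groups) if np.linalg.norm(beta[g]) > 1e-6])
# should typically recover the nonzero components {0, 1}
```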

Journal ArticleDOI
TL;DR: Numerical simulations are used to compare how different spatial predictors and model selection procedures perform in assessing the importance of the spatial component and in controlling for type I error while testing environmental predictors.
Abstract: Aim Variation partitioning based on canonical analysis is the most commonly used analysis to investigate community patterns according to environmental and spatial predictors. Ecologists use this method in order to understand the pure contribution of the environment independent of space, and vice versa, as well as to control for inflated type I error in assessing the environmental component under spatial autocorrelation. Our goal is to use numerical simulations to compare how different spatial predictors and model selection procedures perform in assessing the importance of the spatial component and in controlling for type I error while testing environmental predictors. Innovation We determine for the first time how commonly used methods (polynomial regressors) and novel methods based on eigenvector maps compare in the realm of spatial variation partitioning. We introduce a novel forward selection procedure to select spatial regressors for community analysis. Finally, we point out a number of issues that have not been previously considered about the joint explained variation between environment and space, which should be taken into account when reporting and testing the unique contributions of environment and space in patterning ecological communities. Main conclusions In tests of species-environment relationships, spatial autocorrelation is known to inflate the level of type I error and make the tests of significance invalid. First, one must determine if the spatial component is significant using all spatial predictors (Moran's eigenvector maps). If it is, consider a model selection for the set of spatial predictors (an individual-species forward selection procedure is to be preferred) and use the environmental and selected spatial predictors in a partial regression or partial canonical analysis scheme. This is an effective way of controlling for type I error in such tests. Polynomial regressors do not provide tests with a correct level of type I error.
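
A minimal sketch of constructing Moran's eigenvector maps, the spatial predictors recommended above: double-centre a spatial connectivity matrix and take its eigenvectors (the 0.3 neighbourhood radius is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
coords = rng.uniform(size=(50, 2))                  # site coordinates
dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
W = np.where(dist < 0.3, 1.0, 0.0)                  # binary connectivity
np.fill_diagonal(W, 0.0)

n = W.shape[0]
H = np.eye(n) - np.ones((n, n)) / n                 # centring matrix
vals, vecs = np.linalg.eigh(H @ W @ H)              # symmetric, so eigh

order = np.argsort(vals)[::-1]                      # descending eigenvalues
MEM = vecs[:, order[vals[order] > 1e-8]]            # positive-eigenvalue maps
print(MEM.shape)  # columns are candidate spatial predictors for forward selection
```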

Journal ArticleDOI
TL;DR: The present investigation mapped the firing rate of frontal eye field (FEF) visual neurons onto perceptual evidence and the firing rates of FEF movement neurons onto evidence accumulation to test alternative models of how evidence is combined in the accumulation process.
Abstract: Stochastic accumulator models account for response time in perceptual decision-making tasks by assuming that perceptual evidence accumulates to a threshold. The present investigation mapped the firing rate of frontal eye field (FEF) visual neurons onto perceptual evidence and the firing rate of FEF movement neurons onto evidence accumulation to test alternative models of how evidence is combined in the accumulation process. The models were evaluated on their ability to predict both response time distributions and movement neuron activity observed in monkeys performing a visual search task. Models that assume gating of perceptual evidence to the accumulating units provide the best account of both behavioral and neural data. These results identify discrete stages of processing with anatomically distinct neural populations and rule out several alternative architectures. The results also illustrate the use of neurophysiological data as a model selection tool and establish a novel framework to bridge computational and neural levels of explanation.

Journal ArticleDOI
TL;DR: In this article, the authors consider estimation of panel data models with sample selection when the equation of interest contains endogenous explanatory variables as well as unobserved heterogeneity and propose several tests for selection bias and two estimation procedures that correct for selection in the presence of endogenous regressors.

Journal ArticleDOI
TL;DR: This work shows how one can use a dynamic recursive estimator, known as the extended Kalman filter, to arrive at estimates of the model parameters, and shows how the same tools can be used to discriminate among alternate models of the same biological process.
Abstract: A central challenge in computational modeling of biological systems is the determination of the model parameters. Typically, only a fraction of the parameters (such as kinetic rate constants) are experimentally measured, while the rest are often fitted. The fitting process is usually based on experimental time course measurements of observables, which are used to assign parameter values that minimize some measure of the error between these measurements and the corresponding model prediction. The measurements, which can come from immunoblotting assays, fluorescent markers, etc., tend to be very noisy and taken at a limited number of time points. In this work we present a new approach to the problem of parameter selection of biological models. We show how one can use a dynamic recursive estimator, known as the extended Kalman filter, to arrive at estimates of the model parameters. The proposed method proceeds as follows. First, we use a variation of the Kalman filter that is particularly well suited to biological applications to obtain a first guess for the unknown parameters. Second, we employ an a posteriori identifiability test to check the reliability of the estimates. Finally, we solve an optimization problem to refine the first guess in case it should not be accurate enough. The final estimates are guaranteed to be statistically consistent with the measurements. Furthermore, we show how the same tools can be used to discriminate among alternate models of the same biological process. We demonstrate these ideas by applying our methods to two examples, namely a model of the heat shock response in E. coli, and a model of a synthetic gene regulation system. The methods presented are quite general and may be applied to a wide class of biological systems where noisy measurements are used for parameter estimation or model selection.
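
The core trick, treating unknown parameters as extra (constant) states, is easy to sketch. Below, a scalar logistic-growth model with unknown rate r is filtered jointly by a hand-rolled extended Kalman filter; the model and all tuning constants are illustrative, not the paper's applications:

```python
import numpy as np

dt, r_true, K = 0.1, 0.8, 10.0
f = lambda x, r: x + dt * r * x * (1 - x / K)   # Euler-discretized dynamics

rng = np.random.default_rng(0)
x, ys = 0.5, []
for _ in range(100):                            # simulate noisy observations
    x = f(x, r_true)
    ys.append(x + 0.1 * rng.normal())

z = np.array([0.5, 0.3])                        # augmented state [x, r]
P = np.eye(2)
Q, R = np.diag([1e-4, 1e-4]), 0.01              # process / measurement noise

for y in ys:
    # Predict: propagate the augmented state; r is modelled as constant.
    z_pred = np.array([f(z[0], z[1]), z[1]])
    F = np.array([[1 + dt * z[1] * (1 - 2 * z[0] / K), dt * z[0] * (1 - z[0] / K)],
                  [0.0, 1.0]])                  # Jacobian of the dynamics
    P = F @ P @ F.T + Q
    # Update with the scalar observation y = x + noise.
    H = np.array([1.0, 0.0])
    S = H @ P @ H + R
    Kg = P @ H / S
    z = z_pred + Kg * (y - z_pred[0])
    P = (np.eye(2) - np.outer(Kg, H)) @ P

print(z[1])   # estimate of r; should have moved toward the true 0.8
```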

Posted Content
TL;DR: The method has a clear interpretation: the authors use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling, which requires essentially no conditions.
Abstract: A challenging problem in estimating high-dimensional graphical models is to choose the regularization parameter in a data-dependent way. The standard techniques include $K$-fold cross-validation ($K$-CV), Akaike information criterion (AIC), and Bayesian information criterion (BIC). Though these methods work well for low-dimensional problems, they are not suitable in high dimensional settings. In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high dimensional inference for undirected graphs. The method has a clear interpretation: we use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling. This interpretation requires essentially no conditions. Under mild conditions, we show that StARS is partially sparsistent in terms of graph estimation: i.e. with high probability, all the true edges will be included in the selected model even when the graph size diverges with the sample size. Empirically, the performance of StARS is compared with the state-of-the-art model selection procedures, including $K$-CV, AIC, and BIC, on both synthetic data and a real microarray dataset. StARS outperforms all these competing procedures.
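
A minimal sketch of the StARS recipe, with scikit-learn's graphical lasso as the graph estimator (the subsample count, alpha grid, and the b = 10·sqrt(n) subsample size are illustrative):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(5), np.eye(5), size=200)
n, p = X.shape
b = int(10 * np.sqrt(n))                   # subsample size

for alpha in [0.5, 0.3, 0.1, 0.05]:        # from more to less regularization
    freq = np.zeros((p, p))
    for _ in range(20):                    # edge frequencies over subsamples
        idx = rng.choice(n, size=b, replace=False)
        prec = GraphicalLasso(alpha=alpha).fit(X[idx]).precision_
        freq += (np.abs(prec) > 1e-6) & ~np.eye(p, dtype=bool)
    theta = freq / 20
    instability = (2 * theta * (1 - theta)).sum() / (p * (p - 1))
    print(alpha, instability)  # StARS keeps the smallest alpha whose
                               # instability stays below a cutoff such as 0.05
```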

Proceedings Article
06 Dec 2010
TL;DR: This work presents a framework that simultaneously clusters the data and trains a discriminative classifier, instantiates the framework as unsupervised, multi-class kernelized logistic regression, and demonstrates that RIM is an effective model selection method.
Abstract: Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive information-theoretic objective function which balances class separation, class balance and classifier complexity. The approach can flexibly incorporate different likelihood functions, express prior assumptions about the relative size of different classes and incorporate partial labels for semi-supervised learning. In particular, we instantiate the framework to unsupervised, multi-class kernelized logistic regression. Our empirical evaluation indicates that RIM outperforms existing methods on several real data sets, and demonstrates that RIM is an effective model selection method.
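
The information-theoretic core of the objective is compact enough to sketch numerically: reward confident per-point predictions (low conditional entropy) and balanced class usage (high marginal entropy); RIM additionally regularizes classifier complexity, which is omitted here:

```python
import numpy as np

def rim_core(P, eps=1e-12):
    """P: (n_points, n_classes) matrix of class posteriors on unlabeled data."""
    marginal = P.mean(axis=0)
    h_marginal = -(marginal * np.log(marginal + eps)).sum()    # class balance
    h_conditional = -(P * np.log(P + eps)).sum(axis=1).mean()  # confidence
    return h_marginal - h_conditional  # estimated mutual information, to maximize

P = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
print(rim_core(P))
```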

Journal ArticleDOI
TL;DR: In this paper, a new model averaging estimator based on model selection with Akaike's AIC is used with linear regression to investigate the problems of likely inclusion of spurious effects and model selection bias, the bias introduced while using the data to select a single seemingly "best" model from an (often large) set of models employing many predictor variables.
Abstract: In situations where limited knowledge of a system exists and the ratio of data points to variables is small, variable selection methods can often be misleading. Freedman (Am Stat 37:152–155, 1983) demonstrated how common it is to select completely unrelated variables as highly "significant" when the number of data points is similar in magnitude to the number of variables. A new type of model averaging estimator based on model selection with Akaike's AIC is used with linear regression to investigate the problems of likely inclusion of spurious effects and model selection bias, the bias introduced while using the data to select a single seemingly "best" model from an (often large) set of models employing many predictor variables. The new model averaging estimator helps reduce these problems and provides confidence interval coverage at the nominal level while traditional stepwise selection has poor inferential properties.
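
For reference, the basic AIC-weight recipe underlying such estimators (the paper's estimator adds refinements beyond this; the numbers below are made up):

```python
import numpy as np

aics = np.array([210.0, 211.3, 214.8])   # candidate models' AIC values
betas = np.array([1.9, 2.2, 0.0])        # one coefficient's estimate per model

delta = aics - aics.min()
w = np.exp(-0.5 * delta)
w /= w.sum()                             # Akaike weights

print(w, (w * betas).sum())              # model-averaged estimate of the coefficient
```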

Journal ArticleDOI
TL;DR: Toni et al. developed a model selection framework based on approximate Bayesian computation and employing sequential Monte Carlo sampling, which can be applied across a wide range of biological scenarios, and illustrated its use on real data describing influenza dynamics and the JAK-STAT signalling pathway.
Abstract: Motivation: Computer simulations have become an important tool across the biomedical sciences and beyond. For many important problems several different models or hypotheses exist and choosing which one best describes reality or observed data is not straightforward. We therefore require suitable statistical tools that allow us to choose rationally between different mechanistic models of, e.g. signal transduction or gene regulation networks. This is particularly challenging in systems biology where only a small number of molecular species can be assayed at any given time and all measurements are subject to measurement uncertainty. Results: Here, we develop such a model selection framework based on approximate Bayesian computation and employing sequential Monte Carlo sampling. We show that our approach can be applied across a wide range of biological scenarios, and we illustrate its use on real data describing influenza dynamics and the JAK-STAT signalling pathway. Bayesian model selection strikes a balance between the complexity of the simulation models and their ability to describe observed data. The present approach enables us to employ the whole formal apparatus to any system that can be (efficiently) simulated, even when exact likelihoods are computationally intractable.
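
A toy sketch of ABC model selection in its simplest rejection form (the paper uses the far more efficient sequential Monte Carlo version): sample a model index and parameters from the prior, simulate, and keep draws close to the observation; accepted model frequencies approximate posterior model probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
y_obs = 3.0                                 # observed summary statistic

def simulate(m, theta):
    # model 0: linear response; model 1: quadratic response (both illustrative)
    return theta if m == 0 else theta ** 2

accepted = []
for _ in range(100_000):
    m = int(rng.integers(2))                # uniform prior over models
    theta = rng.uniform(0, 5)               # prior over parameters
    if abs(simulate(m, theta) + rng.normal(0, 0.1) - y_obs) < 0.2:
        accepted.append(m)

accepted = np.array(accepted)
print("P(model 0 | data) ~", (accepted == 0).mean())
```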

Journal ArticleDOI
TL;DR: The beta regression model proposed by Ferrari and Cribari-Neto (2004), which is useful in situations where the response is restricted to the standard unit interval, is extended in two different ways: the regression structure is allowed to be nonlinear, and a regression structure is allowed for the precision parameter.
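
A minimal sketch of what a variable-dispersion beta regression likelihood looks like, with a logit link for the mean and a log link for the precision, fitted by direct maximization (scipy only; the single-covariate setup is illustrative, not the paper's formulation):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit, gammaln

rng = np.random.default_rng(0)
x = rng.uniform(size=200)
mu, phi = expit(-1 + 2 * x), np.exp(2 + 0.5 * x)     # true mean / precision
y = rng.beta(mu * phi, (1 - mu) * phi)

def negloglik(params):
    b0, b1, g0, g1 = params
    m, p = expit(b0 + b1 * x), np.exp(g0 + g1 * x)   # both submodels depend on x
    a, b = m * p, (1 - m) * p                        # beta(a, b) parameters
    return -np.sum(gammaln(a + b) - gammaln(a) - gammaln(b)
                   + (a - 1) * np.log(y) + (b - 1) * np.log(1 - y))

fit = minimize(negloglik, x0=[0.0, 0.0, 1.0, 0.0], method="BFGS")
print(fit.x)   # should land near the true (-1, 2, 2, 0.5)
```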

Journal ArticleDOI
TL;DR: A detailed review of mixture models and model-based clustering is provided; finite mixture models offer a convenient yet formal framework for clustering and classification.
Abstract: Finite mixture models have a long history in statistics, having been used to model population heterogeneity, generalize distributional assumptions, and lately, to provide a convenient yet formal framework for clustering and classification. This paper provides a detailed review of mixture models and model-based clustering. Recent trends in the area, as well as open problems, are also discussed.
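
A minimal sketch of model-based clustering with information-criterion model selection, scikit-learn's Gaussian mixtures standing in for the usual mclust-style workflow:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (150, 2))])

fits = [GaussianMixture(n_components=k, random_state=0).fit(X) for k in range(1, 6)]
bics = [f.bic(X) for f in fits]          # BIC trades fit against complexity
best = fits[int(np.argmin(bics))]
print("chosen number of components:", best.n_components)   # should be 2
```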

Posted Content
TL;DR: Model Confidence Set (MCS) as mentioned in this paper is a set of models that is constructed such that it will contain the best model with a given level of confidence, analogous to a confidence interval for a parameter.
Abstract: The paper introduces the model confidence set (MCS) and applies it to the selection of models. A MCS is a set of models that is constructed such that it will contain the best model with a given level of confidence. The MCS is in this sense analogous to a confidence interval for a parameter. The MCS acknowledges the limitations of the data, such that uninformative data yields a MCS with many models, whereas informative data yields a MCS with only a few models. The MCS procedure does not assume that a particular model is the true model; in fact, the MCS procedure can be used to compare more general objects, beyond the comparison of models. We apply the MCS procedure to two empirical problems. First, we revisit the inflation forecasting problem posed by Stock and Watson (1999), and compute the MCS for their set of inflation forecasts. Second, we compare a number of Taylor rule regressions and determine the MCS of the best in terms of in-sample likelihood criteria.
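
A heavily simplified caricature of the MCS elimination loop, to convey the shape of the procedure only: while a test rejects "all surviving models are equally good", drop the worst survivor. The real MCS uses bootstrap-based equivalence tests and elimination rules; a paired t-test between the best and worst survivors stands in here:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Per-period losses of four forecasting models (models 2 and 3 truly worse).
losses = rng.normal(loc=[1.0, 1.0, 1.3, 1.8], scale=0.5, size=(200, 4))

surviving = list(range(losses.shape[1]))
while len(surviving) > 1:
    means = losses[:, surviving].mean(axis=0)
    best = surviving[int(np.argmin(means))]
    worst = surviving[int(np.argmax(means))]
    _, pval = stats.ttest_rel(losses[:, best], losses[:, worst])
    if pval > 0.05:            # cannot reject equivalence: stop eliminating
        break
    surviving.remove(worst)

print("model confidence set:", surviving)   # likely {0, 1}
```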

Book
29 Nov 2010
TL;DR: This book discusses quantitative modeling in cognition, covering topics from parameter estimation and model comparison to modeling in a broader context, including Bayesian theories of cognition and neuroscientific modeling.
Abstract: Preface
1. Introduction: 1.1 Models and Theories in Science; 1.2 Why Quantitative Modeling?; 1.3 Quantitative Modeling in Cognition; 1.4 The Ideas Underlying Modeling and Its Distinct Applications; 1.5 What Can We Expect From Models?; 1.6 Potential Problems
2. From Words to Models: Building a Toolkit: 2.1 Working Memory; 2.2 The Phonological Loop: 144 Models of Working Memory; 2.3 Building a Simulation; 2.4 What Can We Learn From These Simulations?; 2.5 The Basic Toolkit; 2.6 Models and Data: Sufficiency and Explanation
3. Basic Parameter Estimation Techniques: 3.1 Fitting Models to Data: Parameter Estimation; 3.2 Considering the Data: What Level of Analysis?
4. Maximum Likelihood Estimation: 4.1 Basics of Probabilities; 4.2 What Is a Likelihood?; 4.3 Defining a Probability Function; 4.4 Finding the Maximum Likelihood; 4.5 Maximum Likelihood Estimation for Multiple Participants; 4.6 Properties of Maximum Likelihood Estimators
5. Parameter Uncertainty and Model Comparison: 5.1 Error on Maximum Likelihood Estimates; 5.2 Introduction to Model Selection; 5.3 The Likelihood Ratio Test; 5.4 Information Criteria and Model Comparison; 5.5 Conclusion
6. Not Everything That Fits Is Gold: Interpreting the Modeling: 6.1 Psychological Data and The Very Bad Good Fit; 6.2 Parameter Identifiability and Model Testability; 6.3 Drawing Lessons and Conclusions From Modeling
7. Drawing It All Together: Two Examples: 7.1 WITNESS: Simulating Eyewitness Identification; 7.2 Exemplar Versus Boundary Models: Choosing Between Candidates; 7.3 Conclusion
8. Modeling in a Broader Context: 8.1 Bayesian Theories of Cognition; 8.2 Neural Networks; 8.3 Neuroscientific Modeling; 8.4 Cognitive Architectures; 8.5 Conclusion
References; Author Index; Subject Index; About the Authors

Journal ArticleDOI
TL;DR: This work proposes sparse reduced rank regression (sRRR), a strategy for multivariate modelling of high-dimensional imaging responses and genetic covariates, and shows that sRRR offers a promising alternative for detecting brain-wide, genome-wide associations.

Journal ArticleDOI
TL;DR: This work identifies an extensive feature set describing both the time series and the pool of individual forecasting methods and investigates the applicability of different meta-learning approaches, showing the superiority of a ranking-based combination of methods over simple model selection approaches.

Journal ArticleDOI
TL;DR: A new model space MCMC method is developed by extending the Bayesian variable selection approach, usually applied to variable selection in regression models, to state space models, focusing on structural time series models including seasonal components, trend, or intervention.

Journal ArticleDOI
TL;DR: This method is based on a penalized joint log likelihood with an adaptive penalty for the selection and estimation of both the fixed and random effects, and enjoys the oracle property, in that, asymptotically, it performs as well as if the true model were known beforehand.
Abstract: It is of great practical interest to simultaneously identify the important predictors that correspond to both the fixed and random effects components in a linear mixed-effects (LME) model. Typical approaches perform selection separately on each of the fixed and random effect components. However, changing the structure of one set of effects can lead to different choices of variables for the other set of effects. We propose simultaneous selection of the fixed and random factors in an LME model using a modified Cholesky decomposition. Our method is based on a penalized joint log likelihood with an adaptive penalty for the selection and estimation of both the fixed and random effects. It performs model selection by allowing fixed effects or standard deviations of random effects to be exactly zero. A constrained expectation-maximization algorithm is then used to obtain the final estimates. It is further shown that the proposed penalized estimator enjoys the oracle property, in that, asymptotically, it performs as well as if the true model were known beforehand. We demonstrate the performance of our method based on a simulation study and a real data example.
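
The adaptive-penalty ingredient can be sketched in its simplest fixed-effects-only form: weight each coefficient's ℓ1 penalty by the inverse of an initial estimate, so strong effects are penalized less. This is plain adaptive lasso via column rescaling, not the paper's joint fixed/random-effects procedure:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)

beta0 = LinearRegression().fit(X, y).coef_   # initial (unpenalized) estimate
w = 1.0 / np.abs(beta0)                      # adaptive weights
fit = Lasso(alpha=0.1).fit(X / w, y)         # column rescaling implements the weights
beta = fit.coef_ / w                         # map back to the original scale
print(np.flatnonzero(np.abs(beta) > 1e-6))   # should typically be [0, 1]
```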