
Showing papers on "Model selection published in 2017"


Journal ArticleDOI
TL;DR: ModelFinder is presented, a fast model-selection method that greatly improves the accuracy of phylogenetic estimates by incorporating a model of rate heterogeneity across sites not previously considered in this context and by allowing concurrent searches of model space and tree space.
Abstract: Model-based molecular phylogenetics plays an important role in comparisons of genomic data, and model selection is a key step in all such analyses. We present ModelFinder, a fast model-selection method that greatly improves the accuracy of phylogenetic estimates by incorporating a model of rate heterogeneity across sites not previously considered in this context and by allowing concurrent searches of model space and tree space.

7,425 citations


Journal ArticleDOI
TL;DR: The software, “Smart Model Selection” (SMS), is implemented in the PhyML environment and available using two interfaces: command-line (to be integrated in pipelines) and a web server (http://www.atgc-montpellier.fr/phyml-sms/).
Abstract: Model selection using likelihood-based criteria (e.g., AIC) is one of the first steps in phylogenetic analysis. One must select both a substitution matrix and a model for rates across sites. A simple method is to test all combinations and select the best one. We describe heuristics to avoid these extensive calculations. Runtime is divided by 2 with results remaining nearly the same, and the method performs well compared with ProtTest and jModelTest2. Our software, "Smart Model Selection" (SMS), is implemented in the PhyML environment and available using two interfaces: command-line (to be integrated in pipelines) and a web server (http://www.atgc-montpellier.fr/phyml-sms/).
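A minimal sketch of the exhaustive information-criterion comparison the abstract describes as the baseline approach. The model names, parameter counts, and log-likelihood values below are hypothetical placeholders; in practice the likelihoods come from fitting each combination with a tool such as SMS, PhyML, or jModelTest2.

```python
# Hedged sketch: exhaustive AIC/BIC comparison of candidate substitution-model /
# rate-model combinations. All numbers are hypothetical placeholders.
import math

n_sites = 1200  # alignment length (hypothetical)

# (model name, maximized log-likelihood, free parameters excluding branch lengths) -- placeholders
candidates = [
    ("JC",       -10500.0,  0),
    ("HKY",      -10120.0,  4),
    ("HKY+G",     -9980.0,  5),
    ("GTR",      -10050.0,  8),
    ("GTR+G",     -9905.0,  9),
    ("GTR+G+I",   -9899.0, 10),
]

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

def bic(loglik, k, n):
    return -2.0 * loglik + k * math.log(n)

scored = [(name, aic(ll, k), bic(ll, k, n_sites)) for name, ll, k in candidates]
print("best by AIC:", min(scored, key=lambda t: t[1])[0])
print("best by BIC:", min(scored, key=lambda t: t[2])[0])
```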

1,323 citations


Journal ArticleDOI
TL;DR: This paper showed that for typical psychological and psycholinguistic data, higher power is achieved without inflating Type I error rate if a model selection criterion is used to select a random effect structure that is supported by the data.

928 citations


Journal ArticleDOI
TL;DR: bModelTest allows for a Bayesian approach to inferring and marginalizing site models in a phylogenetic analysis and does not need to be pre-determined, as is now often the case in practice, by likelihood-based methods.
Abstract: Reconstructing phylogenies through Bayesian methods has many benefits, which include providing a mathematically sound framework, providing realistic estimates of uncertainty and being able to incorporate different sources of information based on formal principles. Bayesian phylogenetic analyses are popular for interpreting nucleotide sequence data; however, for such studies one needs to specify a site model and associated substitution model. Often, the parameters of the site model are of no interest, and an ad hoc or additional likelihood-based analysis is used to select a single site model. bModelTest allows for a Bayesian approach to inferring and marginalizing site models in a phylogenetic analysis. It is based on trans-dimensional Markov chain Monte Carlo (MCMC) proposals that allow switching between substitution models as well as estimating the posterior probability for gamma-distributed rate heterogeneity, a proportion of invariable sites and unequal base frequencies. The model can be used with the full set of time-reversible models on nucleotides, but we also introduce and demonstrate the use of two subsets of time-reversible substitution models. With the new method the site model can be inferred (and marginalized) during the MCMC analysis and does not need to be pre-determined, as is now often the case in practice, by likelihood-based methods. The method is implemented in the bModelTest package of the popular BEAST 2 software, which is open source, licensed under the GNU Lesser General Public License and allows joint site model and tree inference under a wide range of models.

528 citations


Posted Content
TL;DR: A practical method for L_0 norm regularization for neural networks: pruning the network during training by encouraging weights to become exactly zero, which allows for straightforward and efficient learning of model structures with stochastic gradient descent and allows for conditional computation in a principled way.
Abstract: We propose a practical method for $L_0$ norm regularization for neural networks: pruning the network during training by encouraging weights to become exactly zero. Such regularization is interesting since (1) it can greatly speed up training and inference, and (2) it can improve generalization. AIC and BIC, well-known model selection criteria, are special cases of $L_0$ regularization. However, since the $L_0$ norm of weights is non-differentiable, we cannot incorporate it directly as a regularization term in the objective function. We propose a solution through the inclusion of a collection of non-negative stochastic gates, which collectively determine which weights to set to zero. We show that, somewhat surprisingly, for certain distributions over the gates, the expected $L_0$ norm of the resulting gated weights is differentiable with respect to the distribution parameters. We further propose the \emph{hard concrete} distribution for the gates, which is obtained by "stretching" a binary concrete distribution and then transforming its samples with a hard-sigmoid. The parameters of the distribution over the gates can then be jointly optimized with the original network parameters. As a result our method allows for straightforward and efficient learning of model structures with stochastic gradient descent and allows for conditional computation in a principled way. We perform various experiments to demonstrate the effectiveness of the resulting approach and regularizer.
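A minimal sketch (not the authors' reference implementation) of the hard concrete gate described above, written in PyTorch; the stretching constants and the closed-form expected-L0 expression follow the construction in the abstract and are to be treated as illustrative defaults.

```python
# Minimal sketch of a hard concrete gate for L0 regularization: a binary concrete
# sample is "stretched" and passed through a hard sigmoid, and the expected L0
# norm of the gates is differentiable in the gate parameters.
import torch
import torch.nn as nn

class HardConcreteGate(nn.Module):
    def __init__(self, n_gates, beta=2.0 / 3.0, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(n_gates))  # location parameters
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self):
        if self.training:
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + self.log_alpha) / self.beta)
        else:
            s = torch.sigmoid(self.log_alpha)
        s_bar = s * (self.zeta - self.gamma) + self.gamma   # stretch
        return s_bar.clamp(0.0, 1.0)                        # hard sigmoid: exact zeros possible

    def expected_l0(self):
        # Differentiable expected number of non-zero gates (the L0 penalty term).
        return torch.sigmoid(
            self.log_alpha - self.beta * torch.log(torch.tensor(-self.gamma / self.zeta))
        ).sum()

# Usage sketch: multiply weights by the gates and add lambda * gate.expected_l0() to the loss.
gate = HardConcreteGate(n_gates=128)
z = gate()                   # gate values in [0, 1]
penalty = gate.expected_l0()
```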

400 citations


Journal ArticleDOI
TL;DR: A taxonomy has been defined in order to help researchers make an informed decision and choose the right model for their problem (long or short term, low or high resolution, building to country level).

281 citations


Journal ArticleDOI
TL;DR: This work explores statistical properties and frequentist inference in a model that combines a stochastic block model for its static part with independent Markov chains for the evolution of the node groups through time, and proposes an inference procedure based on a variational expectation–maximization algorithm.
Abstract: Statistical node clustering in discrete time dynamic networks is an emerging field that raises many challenges. Here, we explore statistical properties and frequentist inference in a model that combines a stochastic block model (SBM) for its static part with independent Markov chains for the evolution of the node groups through time. We model binary data as well as weighted dynamic random graphs (with discrete or continuous edge values). Our approach, motivated by the importance of controlling for label switching issues across the different time steps, focuses on detecting groups characterized by a stable within-group connectivity behavior. We study identifiability of the model parameters, propose an inference procedure based on a variational expectation-maximization algorithm, as well as a model selection criterion to select the number of groups. We carefully discuss our initialization strategy, which plays an important role in the method, and compare our procedure with existing ones on synthetic datasets. We also illustrate our approach on dynamic contact networks, one of encounters among high school students and two others on animal interactions. An implementation of the method is available as an R package called dynsbm.

235 citations


Journal ArticleDOI
TL;DR: An algorithm for model selection is developed which allows for the consideration of a combinatorially large number of candidate models governing a dynamical system, and it is shown that AIC scores place each candidate model in the strong support, weak support or no support category.
Abstract: We develop an algorithm for model selection which allows for the consideration of a combinatorially large number of candidate models governing a dynamical system. The innovation circumvents a disadvantage of standard model selection, which typically limits the number of candidate models considered due to the intractability of computing information criteria.

226 citations


Journal ArticleDOI
TL;DR: The study demonstrates that the model selection can greatly benefit from using cross-validation outside the searching process both for guiding the model size selection and assessing the predictive performance of the finally selected model.
Abstract: The goal of this paper is to compare several widely used Bayesian model selection methods in practical model selection problems, highlight their differences and give recommendations about the preferred approaches. We focus on the variable subset selection for regression and classification and perform several numerical experiments using both simulated and real world data. The results show that the optimization of a utility estimate such as the cross-validation (CV) score is liable to finding overfitted models due to relatively high variance in the utility estimates when the data is scarce. This can also lead to substantial selection induced bias and optimism in the performance evaluation for the selected model. From a predictive viewpoint, best results are obtained by accounting for model uncertainty by forming the full encompassing model, such as the Bayesian model averaging solution over the candidate models. If the encompassing model is too complex, it can be robustly simplified by the projection method, in which the information of the full model is projected onto the submodels. This approach is substantially less prone to overfitting than selection based on CV-score. Overall, the projection method appears to outperform also the maximum a posteriori model and the selection of the most probable variables. The study also demonstrates that the model selection can greatly benefit from using cross-validation outside the searching process both for guiding the model size selection and assessing the predictive performance of the finally selected model.
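A hedged sketch of the abstract's closing recommendation: keep cross-validation outside the variable-search loop, using it only to pick the model size and assess the final model. It uses plain forward selection with least squares on synthetic data, not the paper's Bayesian projection method, so treat it as an illustration of the workflow only.

```python
# Hedged sketch: forward selection is re-run inside every training fold, and the
# outer cross-validation curve is used only to choose the model size.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p, p_relevant = 120, 30, 5
X = rng.normal(size=(n, p))
y = X[:, :p_relevant] @ rng.normal(size=p_relevant) + rng.normal(scale=1.0, size=n)

def forward_selection(X, y, max_size):
    """Greedy forward selection; returns the ordered list of selected columns."""
    selected, remaining = [], list(range(X.shape[1]))
    for _ in range(max_size):
        best_j, best_err = None, np.inf
        for j in remaining:
            cols = selected + [j]
            pred = LinearRegression().fit(X[:, cols], y).predict(X[:, cols])
            err = np.mean((y - pred) ** 2)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

max_size = 15
cv_err = np.zeros(max_size)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    path = forward_selection(X[train], y[train], max_size)   # search inside the fold
    for size in range(1, max_size + 1):
        cols = path[:size]
        model = LinearRegression().fit(X[train][:, cols], y[train])
        cv_err[size - 1] += np.mean((y[test] - model.predict(X[test][:, cols])) ** 2) / 5

print("CV-selected model size:", int(np.argmin(cv_err)) + 1)
```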

207 citations


Journal ArticleDOI
TL;DR: As discussed by the authors, a graphical model is a statistical model associated with a graph whose nodes correspond to variables of interest, and the edges of the graph reflect allowed conditional dependencies among the variables.
Abstract: A graphical model is a statistical model that is associated with a graph whose nodes correspond to variables of interest. The edges of the graph reflect allowed conditional dependencies among the variables. Graphical models have computationally convenient factorization properties and have long been a valuable tool for tractable modeling of multivariate distributions. More recently, applications such as reconstructing gene regulatory networks from gene expression data have driven major advances in structure learning, that is, estimating the graph underlying a model. We review some of these advances and discuss methods such as the graphical lasso and neighborhood selection for undirected graphical models (or Markov random fields) and the PC algorithm and score-based search methods for directed graphical models (or Bayesian networks). We further review extensions that account for effects of latent variables and heterogeneous data sources.
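A short illustration of one of the reviewed methods, the graphical lasso, using scikit-learn's GraphicalLassoCV on toy Gaussian data; the chain-structured precision matrix is a made-up example, not data from the review.

```python
# Hedged sketch: structure learning for an undirected Gaussian graphical model
# with the graphical lasso, using scikit-learn's off-the-shelf implementation.
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
# Toy sparse precision matrix: a chain graph over 5 variables.
prec = np.eye(5)
for i in range(4):
    prec[i, i + 1] = prec[i + 1, i] = 0.4
cov = np.linalg.inv(prec)
X = rng.multivariate_normal(np.zeros(5), cov, size=500)

model = GraphicalLassoCV().fit(X)        # regularization strength chosen by cross-validation
est_prec = model.precision_
edges = np.abs(est_prec) > 1e-3           # recovered conditional-dependence structure
np.fill_diagonal(edges, False)
print(edges.astype(int))
```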

201 citations


Journal ArticleDOI
TL;DR: A new package in R is described that may be used to generate the half-normal plot with a simulated envelope for residuals from different types of models, illustrating its use on a range of examples, including continuous and discrete responses, and how it can be used to inform model selection and diagnose overdispersion.
Abstract: Count and proportion data may present overdispersion, i.e., greater variability than expected by the Poisson and binomial models, respectively. Different extended generalized linear models that allow for overdispersion may be used to analyze this type of data, such as models that use a generalized variance function, random-effects models, zero-inflated models and compound distribution models. Assessing goodness-of-fit and verifying assumptions of these models is not an easy task, and the use of half-normal plots with a simulated envelope is a possible solution for this problem. These plots are a useful indicator of goodness-of-fit that may be used with any generalized linear model and extensions. For GLIM users, functions that generated these plots were widely used; however, in the open-source software R, these functions were not yet available on the Comprehensive R Archive Network (CRAN). We describe a new package in R, hnp, that may be used to generate the half-normal plot with a simulated envelope for residuals from different types of models. The function hnp() can be used together with a range of different model fitting packages in R that extend the basic generalized linear model fitting in glm() and is written so that it is relatively easy to extend it to new model classes and different diagnostics. We illustrate its use on a range of examples, including continuous and discrete responses, and show how it can be used to inform model selection and diagnose overdispersion.
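hnp itself is an R package; the following is a hedged Python re-creation of the underlying idea, a half-normal plot of absolute deviance residuals with a simulated envelope, here for a Poisson GLM fitted with statsmodels on synthetic data.

```python
# Hedged sketch of the half-normal plot with simulated envelope: refit the model to
# data simulated from the fitted model and band the sorted absolute residuals.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(size=n)
X = sm.add_constant(x)
y = rng.poisson(np.exp(0.5 + 1.5 * x))

fit = sm.GLM(y, X, family=sm.families.Poisson()).fit()
obs = np.sort(np.abs(fit.resid_deviance))

# Simulated envelope: refit the same model to data drawn from the fitted model.
n_sim = 99
sims = np.empty((n_sim, n))
for s in range(n_sim):
    y_sim = rng.poisson(fit.fittedvalues)
    sims[s] = np.sort(np.abs(sm.GLM(y_sim, X, family=sm.families.Poisson()).fit().resid_deviance))
lower, upper = np.percentile(sims, [2.5, 97.5], axis=0)

# Half-normal plotting positions (a standard approximation).
i = np.arange(1, n + 1)
q = norm.ppf((i + n - 1 / 8) / (2 * n + 1 / 2))

plt.fill_between(q, lower, upper, alpha=0.3, label="simulated envelope")
plt.plot(q, obs, "k.", label="observed |deviance residuals|")
plt.xlabel("half-normal quantiles")
plt.ylabel("sorted |residuals|")
plt.legend()
plt.show()
```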

Journal ArticleDOI
TL;DR: It is demonstrated why well-established formal procedures for model selection, such as those based on standard information criteria, tend to favor models with numbers of states that are undesirably large in situations where states shall be meaningful entities.
Abstract: We discuss the notorious problem of order selection in hidden Markov models, that is of selecting an adequate number of states, highlighting typical pitfalls and practical challenges arising when analyzing real data. Extensive simulations are used to demonstrate the reasons that render order selection particularly challenging in practice despite the conceptual simplicity of the task. In particular, we demonstrate why well-established formal procedures for model selection, such as those based on standard information criteria, tend to favor models with numbers of states that are undesirably large in situations where states shall be meaningful entities. We also offer a pragmatic step-by-step approach together with comprehensive advice for how practitioners can implement order selection. Our proposed strategy is illustrated with a real-data case study on muskox movement. Supplementary materials accompanying this paper appear online.
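A hedged sketch of the standard information-criterion approach whose pitfalls the paper examines: fit hidden Markov models with increasing numbers of states and compare AIC/BIC. It assumes the hmmlearn library and synthetic one-dimensional data; the parameter count is for a diagonal-covariance Gaussian HMM.

```python
# Hedged sketch of standard order selection for an HMM via AIC/BIC -- the procedure
# the paper shows can favor undesirably many states on real data.
import numpy as np
from hmmlearn.hmm import GaussianHMM

rng = np.random.default_rng(0)
# Two well-separated states in truth.
states = (rng.uniform(size=500) > 0.5).astype(int)
X = rng.normal(loc=np.where(states == 0, -2.0, 2.0), scale=1.0).reshape(-1, 1)

n, d = X.shape
for k in range(1, 6):
    model = GaussianHMM(n_components=k, covariance_type="diag",
                        n_iter=200, random_state=0).fit(X)
    loglik = model.score(X)
    # Free parameters: initial probs (k-1) + transitions k(k-1) + means k*d + variances k*d.
    n_params = (k - 1) + k * (k - 1) + k * d + k * d
    aic = -2 * loglik + 2 * n_params
    bic = -2 * loglik + n_params * np.log(n)
    print(f"states={k}  logL={loglik:8.1f}  AIC={aic:8.1f}  BIC={bic:8.1f}")
```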

Journal ArticleDOI
TL;DR: This work presents a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization, and exposes a direct equivalence between this microcanonical approach and alternative derivations based on the canonical SBM.
Abstract: A principled approach to characterize the hidden structure of networks is to formulate generative models and then infer their parameters from data. When the desired structure is composed of modules or "communities," a suitable choice for this task is the stochastic block model (SBM), where nodes are divided into groups, and the placement of edges is conditioned on the group memberships. Here, we present a nonparametric Bayesian method to infer the modular structure of empirical networks, including the number of modules and their hierarchical organization. We focus on a microcanonical variant of the SBM, where the structure is imposed via hard constraints, i.e., the generated networks are not allowed to violate the patterns imposed by the model. We show how this simple model variation allows simultaneously for two important improvements over more traditional inference approaches: (1) deeper Bayesian hierarchies, with noninformative priors replaced by sequences of priors and hyperpriors, which not only remove limitations that seriously degrade the inference on large networks but also reveal structures at multiple scales; (2) a very efficient inference algorithm that scales well not only for networks with a large number of nodes and edges but also with an unlimited number of modules. We show also how this approach can be used to sample modular hierarchies from the posterior distribution, as well as to perform model selection. We discuss and analyze the differences between sampling from the posterior and simply finding the single parameter estimate that maximizes it. Furthermore, we expose a direct equivalence between our microcanonical approach and alternative derivations based on the canonical SBM.
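A hedged usage sketch with the graph-tool library, which implements this nested microcanonical SBM approach; the function names follow graph-tool's documented inference API, but treat the snippet as a sketch to check against the installed version.

```python
# Hedged sketch (assumes graph-tool is installed): fit the nested SBM by minimizing
# the description length and inspect the inferred hierarchy of modules.
import graph_tool.all as gt

g = gt.collection.data["football"]              # small example network shipped with graph-tool
state = gt.minimize_nested_blockmodel_dl(g)     # minimize description length of the nested SBM
state.print_summary()                           # number of groups at each hierarchy level
print("description length:", state.entropy())   # lower is better; usable for model selection
```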

Journal ArticleDOI
TL;DR: This work develops a learning approach for the selection and identification of a dynamical system directly from noisy data by extracting a small subset of important features from an overdetermined set of possible features using a nonconvex sparse regression model.
Abstract: Model selection and parameter estimation are important for the effective integration of experimental data, scientific theory, and precise simulations. In this work, we develop a learning approach for the selection and identification of a dynamical system directly from noisy data. The learning is performed by extracting a small subset of important features from an overdetermined set of possible features using a nonconvex sparse regression model. The sparse regression model is constructed to fit the noisy data to the trajectory of the dynamical system while using the smallest number of active terms. Computational experiments detail the model's stability, robustness to noise, and recovery accuracy. Examples include nonlinear equations, population dynamics, chaotic systems, and fast-slow systems.
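A hedged sketch in the same spirit as the paper: regress noisy derivative observations onto a library of candidate terms and repeatedly zero out small coefficients. This is plain sequentially thresholded least squares on made-up data, not the paper's nonconvex sparse regression.

```python
# Hedged sketch: recover a sparse dynamical model x' = -2x + 0.5x^3 from noisy
# derivative data by thresholded least squares over a library of candidate terms.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=400)          # sampled states (stand-in for trajectory samples)
dx = -2.0 * x + 0.5 * x**3 + rng.normal(scale=0.05, size=x.size)

# Library of candidate terms: [1, x, x^2, x^3, x^4]
library = np.column_stack([np.ones_like(x), x, x**2, x**3, x**4])

coef = np.linalg.lstsq(library, dx, rcond=None)[0]
for _ in range(10):                        # sequentially thresholded least squares
    small = np.abs(coef) < 0.1
    coef[small] = 0.0
    active = ~small
    if active.any():
        coef[active] = np.linalg.lstsq(library[:, active], dx, rcond=None)[0]

print("recovered coefficients:", np.round(coef, 3))   # expect roughly [0, -2, 0, 0.5, 0]
```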

Journal ArticleDOI
TL;DR: The R package tscount provides likelihood-based estimation methods for analysis and modeling of count time series following generalized linear models, a flexible class of models which can describe serial correlation in a parsimonious way.
Abstract: The R package tscount provides likelihood-based estimation methods for analysis and modeling of count time series following generalized linear models. This is a flexible class of models which can describe serial correlation in a parsimonious way. The conditional mean of the process is linked to its past values, to past observations and to potential covariate effects. The package allows for models with the identity and with the logarithmic link function. The conditional distribution can be Poisson or negative binomial. An important special case of this class is the so-called INGARCH model and its log-linear extension. The package includes methods for model fitting and assessment, prediction and intervention analysis. This paper summarizes the theoretical background of these methods. It gives details on the implementation of the package and provides simulation results for models which have not been studied theoretically before. The usage of the package is illustrated by two data examples. Additionally, we provide a review of R packages which can be used for count time series analysis. This includes a detailed comparison of tscount to those packages.
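tscount is an R package; the following hedged Python sketch fits the INGARCH(1,1) special case mentioned above by directly maximizing the conditional Poisson likelihood with scipy, purely as an illustration of the model class rather than the package's estimator.

```python
# Hedged sketch: simulate and fit an INGARCH(1,1) count time series,
# lambda_t = beta0 + alpha * y_{t-1} + beta * lambda_{t-1}, by maximum likelihood.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(0)

def simulate(n, beta0, alpha, beta):
    y, lam = np.zeros(n), np.zeros(n)
    lam[0] = beta0 / (1 - alpha - beta)       # start near the stationary mean
    y[0] = rng.poisson(lam[0])
    for t in range(1, n):
        lam[t] = beta0 + alpha * y[t - 1] + beta * lam[t - 1]
        y[t] = rng.poisson(lam[t])
    return y

def neg_loglik(params, y):
    beta0, alpha, beta = params
    if beta0 <= 0 or alpha < 0 or beta < 0 or alpha + beta >= 1:
        return np.inf
    lam = np.empty_like(y, dtype=float)
    lam[0] = np.mean(y)
    for t in range(1, len(y)):
        lam[t] = beta0 + alpha * y[t - 1] + beta * lam[t - 1]
    return -np.sum(y * np.log(lam) - lam - gammaln(y + 1))   # Poisson log-likelihood

y = simulate(1000, beta0=1.0, alpha=0.3, beta=0.4)
res = minimize(neg_loglik, x0=[1.0, 0.2, 0.2], args=(y,), method="Nelder-Mead")
print("estimated (beta0, alpha, beta):", np.round(res.x, 3))
```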

Journal ArticleDOI
TL;DR: In this paper, an approach based on the log likelihood ratio statistic and analysis of its asymptotic properties under model misspecification is presented, addressing the choice of the number of blocks, an issue that has received less attention than estimating the latent node labels and the model parameters.
Abstract: The stochastic block model (SBM) provides a popular framework for modeling community structures in networks. However, more attention has been devoted to problems concerning estimating the latent node labels and the model parameters than the issue of choosing the number of blocks. We consider an approach based on the log likelihood ratio statistic and analyze its asymptotic properties under model misspecification. We show the limiting distribution of the statistic in the case of underfitting is normal and obtain its convergence rate in the case of overfitting. These conclusions remain valid when the average degree grows at a polylog rate. The results enable us to derive the correct order of the penalty term for model complexity and arrive at a likelihood-based model selection criterion that is asymptotically consistent. Our analysis can also be extended to a degree-corrected block model (DCSBM). In practice, the likelihood function can be estimated using more computationally efficient variational methods or consistent label estimation algorithms, allowing the criterion to be applied to large networks.

Journal ArticleDOI
TL;DR: In this article, the authors developed an algorithm for model selection which allows for the consideration of a combinatorially large number of candidate models governing a dynamical system, and the algorithm hierarchically ranks the most informative models, enabling the automatic and principled selection of the model with the strongest support in relation to the time series data.
Abstract: We develop an algorithm for model selection which allows for the consideration of a combinatorially large number of candidate models governing a dynamical system. The innovation circumvents a disadvantage of standard model selection, which typically limits the number of candidate models considered due to the intractability of computing information criteria. Using a recently developed sparse identification of nonlinear dynamics algorithm, the sub-selection of candidate models near the Pareto frontier allows for a tractable computation of AIC (Akaike information criterion) or BIC (Bayesian information criterion) scores for the remaining candidate models. The information criteria hierarchically rank the most informative models, enabling the automatic and principled selection of the model with the strongest support in relation to the time series data. Specifically, we show that AIC scores place each candidate model in the strong support, weak support or no support category. The method correctly identifies several canonical dynamical systems, including an SEIR (susceptible-exposed-infectious-recovered) disease model and the Lorenz equations, giving the correct dynamical system as the only candidate model with strong support.
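A minimal sketch of the ranking step: compute AIC for each candidate model and bin models by their AIC difference from the best. The log-likelihoods and parameter counts are hypothetical placeholders, and the support cutoffs are commonly used thresholds that may differ from the paper's exact choices.

```python
# Hedged sketch: rank candidate dynamical models by Delta-AIC and assign
# strong / weak / no support categories. All numbers are placeholders.
candidates = [
    ("linear",        -812.4,  3),
    ("quadratic",     -790.1,  5),
    ("SEIR-like",     -771.8,  6),
    ("full library",  -769.9, 12),
]  # (label, maximized log-likelihood, number of free parameters)

aic = {name: -2 * ll + 2 * k for name, ll, k in candidates}
best = min(aic.values())
for name in sorted(aic, key=aic.get):
    delta = aic[name] - best
    support = "strong" if delta < 2 else "weak" if delta < 10 else "none"
    print(f"{name:14s}  AIC={aic[name]:8.1f}  dAIC={delta:6.1f}  support: {support}")
```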

Journal ArticleDOI
TL;DR: In this paper, a sequential Monte Carlo scheme for the dual purpose of Bayesian inference and model selection is proposed for the joint problem of online tracking and detection of the current modality.

Journal ArticleDOI
TL;DR: In this article, generalized additive mixed models are introduced as an extension of the generalized linear mixed model which makes it possible to deal with temporal autocorrelational structure in experimental data.

Journal ArticleDOI
TL;DR: In this article, an extension of the DLNM framework through the use of penalized splines within generalized additive models (GAMs) is presented, which offers built-in model selection procedures and the possibility of accommodating assumptions on the shape of the lag structure through specific penalties.
Abstract: Distributed lag non-linear models (DLNMs) are a modelling tool for describing potentially non-linear and delayed dependencies. Here, we illustrate an extension of the DLNM framework through the use of penalized splines within generalized additive models (GAM). This extension offers built-in model selection procedures and the possibility of accommodating assumptions on the shape of the lag structure through specific penalties. In addition, this framework includes, as special cases, simpler models previously proposed for linear relationships (DLMs). Alternative versions of penalized DLNMs are compared with each other and with the standard unpenalized version in a simulation study. Results show that this penalized extension to the DLNM class provides greater flexibility and improved inferential properties. The framework exploits recent theoretical developments of GAMs and is implemented using efficient routines within freely available software. Real-data applications are illustrated through two reproducible examples in time series and survival analysis.

Book ChapterDOI
01 Nov 2017
TL;DR: In this article, the authors present a class of generalized linear models that is as tractable as classical linear models but does not force the data into unnatural scales by using reparametrization to induce linearity and allowing a nonconstant variance to be directly incorporated into the analysis.
Abstract: Linear models and analysis of variance are popular. Many phenomena behave linearly and have errors that are Gaussian. The chapter presents a class of models that is as tractable as classical linear models but does not force the data into unnatural scales. Generalized linear models deal with these issues in a natural way by using reparametrization to induce linearity and by allowing a nonconstant variance to be directly incorporated into the analysis. The chapter describes the functions in more detail, and examines more advanced functions for model selection, diagnostics, and creating private families. It focuses on the statistical concepts associated with maximum-likelihood inference, as well as algorithmic details. Apart from simplifying the algorithms for fitting these models, this linearization allows us to use many of the tools intended for linear models and models for designed experiments. The chapter explores the estimation of generalized linear models by maximum likelihood and the associated iteratively-reweighted least-squares algorithm.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a composite likelihood BIC (CL-BIC) to select the number of communities, and showed it is robust against possible misspecifications in the underlying stochastic blockmodel assumptions.
Abstract: Stochastic blockmodels and variants thereof are among the most widely used approaches to community detection for social networks and relational data. A stochastic blockmodel partitions the nodes of a network into disjoint sets, called communities. The approach is inherently related to clustering with mixture models and raises a similar model selection problem for the number of communities. The Bayesian information criterion (BIC) is a popular solution; however, for stochastic blockmodels, the conditional independence assumption given the communities of the endpoints among different edges is usually violated in practice. In this regard, we propose composite likelihood BIC (CL-BIC) to select the number of communities, and we show it is robust against possible misspecifications in the underlying stochastic blockmodel assumptions. We derive the requisite methodology and illustrate the approach using both simulated and real data. Supplementary materials containing the relevant computer code are available online.

Journal ArticleDOI
TL;DR: A new penalized likelihood method for model selection of finite multivariate Gaussian mixture models is proposed and is shown to be statistically consistent in determining the number of components.
Abstract: This paper is concerned with an important issue in finite mixture modelling, the selection of the number of mixing components. We propose a new penalized likelihood method for model selection of finite multivariate Gaussian mixture models. The proposed method is shown to be statistically consistent in determining the number of components. A modified EM algorithm is developed to simultaneously select the number of components and to estimate the mixing weights, i.e. the mixing probabilities, and unknown parameters of Gaussian distributions. Simulations and a real data analysis are presented to illustrate the performance of the proposed method.
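For contrast with the proposed penalized-likelihood criterion, here is a short sketch of the standard BIC-based choice of the number of Gaussian mixture components, using scikit-learn on synthetic three-component data; it illustrates the usual selection workflow, not the paper's method.

```python
# Hedged sketch: choose the number of Gaussian mixture components by minimizing BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=[-3, 0], scale=0.8, size=(200, 2)),
    rng.normal(loc=[3, 0],  scale=0.8, size=(200, 2)),
    rng.normal(loc=[0, 4],  scale=0.8, size=(200, 2)),
])  # three true components

bics = []
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bics.append(gm.bic(X))
print("selected number of components:", int(np.argmin(bics)) + 1)
```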

Journal ArticleDOI
TL;DR: This work presents a new method for the statistical estimation of mutational signatures based on an empirical Bayesian treatment of the NMF model that is robust to initial conditions and more accurate than competing alternatives.
Abstract: Motivation: Mutational signatures can be used to understand cancer origins and provide a unique opportunity to group tumor types that share the same origins and result from similar processes. These signatures have been identified from high throughput sequencing data generated from cancer genomes by using non-negative matrix factorisation (NMF) techniques. Current methods based on optimization techniques are strongly sensitive to initial conditions due to high dimensionality and nonconvexity of the NMF paradigm. In this context, an important question consists in the determination of the actual number of signatures that best represent the data. The extraction of mutational signatures from high-throughput data still remains a daunting task. Results: Here we present a new method for the statistical estimation of mutational signatures based on an empirical Bayesian treatment of the NMF model. While requiring minimal intervention from the user, our method addresses the determination of the number of signatures directly as a model selection problem. In addition, we introduce two new concepts of significant clinical relevance for evaluating the mutational profile. The advantages brought by our approach are shown by the analysis of real and synthetic data. The latter is used to compare our approach against two alternative methods mostly used in the literature and with the same NMF parametrization as the one considered here. Our approach is robust to initial conditions and more accurate than competing alternatives. It also estimates the correct number of signatures even when other methods fail. Results on real data agree well with current knowledge. Availability and implementation: signeR is implemented in R and C++, and is available as an R package at http://bioconductor.org/packages/signeR. Contact: itojal@cipe.accamargo.org.br. Supplementary information: Supplementary data are available at Bioinformatics online.

Journal ArticleDOI
TL;DR: This paper uses three forecasting machines, chief among them data assimilation, a technique borrowed from atmospheric physics and engineering that uses Bayesian modeling to infuse data with human knowledge represented in a mechanistic model, to generate real-time, personalized, adaptable glucose forecasts.
Abstract: Type 2 diabetes leads to premature death and reduced quality of life for 8% of Americans. Nutrition management is critical to maintaining glycemic control, yet it is difficult to achieve due to the high individual differences in glycemic response to nutrition. Anticipating glycemic impact of different meals can be challenging not only for individuals with diabetes, but also for expert diabetes educators. Personalized computational models that can accurately forecast an impact of a given meal on an individual's blood glucose levels can serve as the engine for a new generation of decision support tools for individuals with diabetes. However, to be useful in practice, these computational engines need to generate accurate forecasts based on limited datasets consistent with typical self-monitoring practices of individuals with type 2 diabetes. This paper uses three forecasting machines: (i) data assimilation, a technique borrowed from atmospheric physics and engineering that uses Bayesian modeling to infuse data with human knowledge represented in a mechanistic model, to generate real-time, personalized, adaptable glucose forecasts; (ii) model averaging of data assimilation output; and (iii) dynamical Gaussian process model regression. The proposed data assimilation machine, the primary focus of the paper, uses a modified dual unscented Kalman filter to estimate states and parameters, personalizing the mechanistic models. Model selection is used to make a personalized model selection for the individual and their measurement characteristics. The data assimilation forecasts are empirically evaluated against actual postprandial glucose measurements captured by individuals with type 2 diabetes, and against predictions generated by experienced diabetes educators after reviewing a set of historical nutritional records and glucose measurements for the same individual. The evaluation suggests that the data assimilation forecasts compare well with specific glucose measurements and match or exceed in accuracy expert forecasts. We conclude by examining ways to present predictions as forecast-derived range quantities and evaluate the comparative advantages of these ranges.

Proceedings ArticleDOI
Ingo Scholtes
13 Aug 2017
TL;DR: This work develops a model selection technique to infer the optimal number of layers of such a model and shows that it outperforms baseline Markov order detection techniques and allows the inference of graphical models that capture both topological and temporal characteristics of such data.
Abstract: We introduce a framework for the modeling of sequential data capturing pathways of varying lengths observed in a network. Such data are important, e.g., when studying click streams in the Web, travel patterns in transportation systems, information cascades in social networks, biological pathways, or time-stamped social interactions. While it is common to apply graph analytics and network analysis to such data, recent works have shown that temporal correlations can invalidate the results of such methods. This raises a fundamental question: When is a network abstraction of sequential data justified? Addressing this open question, we propose a framework that combines Markov chains of multiple, higher orders into a multi-layer graphical model that captures temporal correlations in pathways at multiple length scales simultaneously. We develop a model selection technique to infer the optimal number of layers of such a model and show that it outperforms baseline Markov order detection techniques. An application to eight real-world data sets on pathways and temporal networks shows that it allows the inference of graphical models that capture both topological and temporal characteristics of such data. Our work highlights fallacies of network abstractions and provides a principled answer to the open question of when they are justified. Generalizing network representations to multi-order graphical models, it opens perspectives for new data mining and knowledge discovery algorithms.
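A hedged sketch of the single-order baseline that multi-order models are compared against: estimate Markov chains of increasing order on a symbol sequence and compare them by AIC. The data-generating rule below is a made-up second-order process.

```python
# Hedged sketch of plain Markov order detection via AIC: count k-gram transitions,
# compute the log-likelihood, and penalize by the number of free transition probabilities.
import numpy as np
from collections import Counter, defaultdict

rng = np.random.default_rng(0)
alphabet = ["a", "b", "c"]
# Synthetic second-order sequence: the next symbol depends on the previous two.
seq = ["a", "b"]
for _ in range(5000):
    probs = [0.8, 0.1, 0.1] if seq[-2] == seq[-1] else [0.1, 0.1, 0.8]
    seq.append(rng.choice(alphabet, p=probs))

def aic_markov(seq, order):
    counts = defaultdict(Counter)
    for i in range(order, len(seq)):
        counts[tuple(seq[i - order:i])][seq[i]] += 1
    loglik = 0.0
    for ctx, nxt in counts.items():
        total = sum(nxt.values())
        for sym, c in nxt.items():
            loglik += c * np.log(c / total)
    n_params = len(counts) * (len(alphabet) - 1)   # free probabilities per observed context
    return -2 * loglik + 2 * n_params

for order in range(1, 5):
    print(f"order {order}: AIC = {aic_markov(seq, order):.1f}")
```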

Journal ArticleDOI
TL;DR: A novel use of a new general explanation methodology inside an intelligent system in a real-world case of business-to-business sales forecasting, showing how to solve a decision support problem, namely that the best performing black-box models are inaccessible to human interaction and analysis.
Abstract: Uniform and comprehensive explanations for an arbitrary black-box prediction model. Interactive what-if analysis for the evaluation of decision options. Support for validation and updates of decision makers' mental models. Real-world application on a difficult business problem - sales forecasting. The real-world business-to-business sales data set used is publicly available. The complexity of business dynamics often forces decision-makers to make decisions based on subjective mental models, reflecting their experience. However, research has shown that companies perform better when they apply data-driven decision-making. This creates an incentive to introduce intelligent, data-based decision models, which are comprehensive and support the interactive evaluation of decision options necessary for the business environment. Recently, a new general explanation methodology has been proposed, which supports the explanation of state-of-the-art black-box prediction models. Uniform explanations are generated on the level of model/individual instance and support what-if analysis. We present a novel use of this methodology inside an intelligent system in a real-world case of business-to-business (B2B) sales forecasting, a complex task frequently done judgmentally. Users can validate their assumptions with the presented explanations and test their hypotheses using the presented what-if parallel graph representation. The results demonstrate effectiveness and usability of the methodology. A significant advantage of the presented method is the possibility to evaluate seller's actions and to outline general recommendations in sales strategy. This flexibility of the approach and easy-to-follow explanations are suitable for many different applications. Our well-documented real-world case shows how to solve a decision support problem, namely that the best performing black-box models are inaccessible to human interaction and analysis. This could extend the use of the intelligent systems to areas where they were so far neglected due to their insistence on comprehensible models. A separation of the machine learning model selection from model explanation is another significant benefit for expert and intelligent systems. Explanations unconnected to a particular prediction model positively influence acceptance of new and complex models in the business environment through their easy assessment and switching.

Journal ArticleDOI
TL;DR: This paper relaxes some of the distributional assumptions made in HCW and shows that the HCW method works for a much wider range of data generating processes, and derives the asymptotic distribution of HCW's average treatment effect estimator, which facilitates inference.

Journal ArticleDOI
TL;DR: Sparse polynomial chaos expansions are investigated for global sensitivity analysis of computer model responses and a new Bayesian approach is proposed to perform this task, based on the Kashyap information criterion for model selection.

Journal ArticleDOI
TL;DR: The constructive representation of NLPs as mixtures of truncated distributions, which enables simple posterior sampling and extends NLPs beyond previous proposals, is outlined, showing that selection priors may actually be desirable for high-dimensional estimation.
Abstract: Jointly achieving parsimony and good predictive power in high dimensions is a main challenge in statistics. Nonlocal priors (NLPs) possess appealing properties for model choice, but their use for estimation has not been studied in detail. We show that for regular models NLP-based Bayesian model averaging (BMA) shrinks spurious parameters either at fast polynomial or quasi-exponential rates as the sample size n increases, while nonspurious parameter estimates are not shrunk. We extend some results to linear models with dimension p growing with n. Coupled with our theoretical investigations, we outline the constructive representation of NLPs as mixtures of truncated distributions that enables simple posterior sampling and extends NLPs beyond previous proposals. Our results show notable high-dimensional estimation for linear models with p >> n at low computational cost. NLPs provided lower estimation error than benchmark and hyper-g priors, SCAD and LASSO in simulations, and in gene expression data ...