Showing papers in "arXiv: Methodology in 2018"

Posted Content•

All Models are Wrong, but Many are Useful: Learning a Variable's Importance by Studying an Entire Class of Prediction Models Simultaneously

Aaron Fisher¹, Cynthia Rudin², Francesca Dominici³•Institutions (3)

Takeda Pharmaceutical Company¹, Duke University², Harvard University³

04 Jan 2018-arXiv: Methodology

TL;DR: This article proposes model class reliance (MCR) as the range of variable importance (VI) values across all well-performing models in a prespecified class, which gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well.

Abstract: Variable importance (VI) tools describe how much covariates contribute to a prediction model's accuracy. However, important variables for one well-performing model (for example, a linear model $f(\mathbf{x})=\mathbf{x}^{T}\beta$ with a fixed coefficient vector $\beta$) may be unimportant for another model. In this paper, we propose model class reliance (MCR) as the range of VI values across all well-performing models in a prespecified class. Thus, MCR gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well. In the process of deriving MCR, we show several informative results for permutation-based VI estimates, based on the VI measures used in Random Forests. Specifically, we derive connections between permutation importance estimates for a single prediction model, U-statistics, conditional variable importance, conditional causal effects, and linear model coefficients. We then give probabilistic bounds for MCR, using a novel, generalizable technique. We apply MCR to a public data set of Broward County criminal records to study the reliance of recidivism prediction models on sex and race. In this application, MCR can be used to help inform VI for unknown, proprietary models.
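
The permutation-based VI idea for a single fixed model can be illustrated with a minimal sketch. It reports the ratio of the loss after shuffling one covariate to the original loss; this is an assumed, simplified form, not the paper's exact estimator, and it does not compute the MCR bounds. The function names, toy data, and toy model are all hypothetical.

```python
import random

def mse(model, X, y):
    """Mean squared error of `model` on rows X with targets y."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)

def permutation_reliance(model, X, y, j, n_repeats=50, seed=0):
    """Ratio of the loss with column j shuffled to the original loss.

    Values near 1 suggest the model barely relies on column j;
    larger values suggest heavier reliance.
    """
    rng = random.Random(seed)
    base = mse(model, X, y)
    total = 0.0
    for _ in range(n_repeats):
        col = [row[j] for row in X]
        rng.shuffle(col)  # break the link between column j and the target
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
        total += mse(model, X_perm, y)
    return (total / n_repeats) / base

# Toy data (hypothetical): y depends on x0 only; x1 is pure noise.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [2.0 * x[0] + rng.gauss(0, 0.1) for x in X]
model = lambda x: 2.0 * x[0]  # one fixed, well-performing model

r0 = permutation_reliance(model, X, y, j=0)  # much larger than 1
r1 = permutation_reliance(model, X, y, j=1)  # 1: the model ignores x1
```

A different well-performing model could rely on the covariates differently, which is the gap that a range over a whole model class is meant to describe.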

377 citations

Journal Article•DOI•

Performance Metrics (Error Measures) in Machine Learning Regression, Forecasting and Prognostics: Properties and Typology

Alexei Botchkarev

09 Sep 2018-arXiv: Methodology

TL;DR: A new typology of performance metrics is developed to advance knowledge of metrics and facilitate their use in machine learning regression algorithms; it is shown to cover over 40 commonly used primary metrics.

Abstract: Performance metrics (error measures) are vital components of the evaluation frameworks in various fields. The intention of this study was to provide an overview of a variety of performance metrics and approaches to their classification. The main goal of the study was to develop a typology that will help to improve our knowledge and understanding of metrics and facilitate their selection in machine learning regression, forecasting and prognostics. Based on the analysis of the structure of numerous performance metrics, we propose a framework of metrics which includes four (4) categories: primary metrics, extended metrics, composite metrics, and hybrid sets of metrics. The paper identified three (3) key components (dimensions) that determine the structure and properties of primary metrics: method of determining point distance, method of normalization, method of aggregation of point distances over a data set.
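
The three components named in the abstract can be read as interchangeable building blocks. The sketch below is one assumed way to compose them into familiar primary metrics (MAE, RMSE, MAPE); the dictionary keys and the `metric` helper are hypothetical and are not taken from the paper.

```python
import math

# Point distance: how a single prediction error is measured.
point_distance = {
    "abs": lambda a, p: abs(a - p),
    "sq": lambda a, p: (a - p) ** 2,
}
# Normalization: what each point distance is scaled by.
normalization = {
    "none": lambda d, a: d,
    "actual": lambda d, a: d / abs(a),  # assumes no actual value is zero
}
# Aggregation: how point distances are combined over the data set.
aggregation = {
    "mean": lambda ds: sum(ds) / len(ds),
    "root_mean": lambda ds: math.sqrt(sum(ds) / len(ds)),
}

def metric(distance, norm, agg):
    """Build a metric from one choice per component."""
    def compute(actual, predicted):
        ds = [normalization[norm](point_distance[distance](a, p), a)
              for a, p in zip(actual, predicted)]
        return aggregation[agg](ds)
    return compute

mae = metric("abs", "none", "mean")
rmse = metric("sq", "none", "root_mean")
mape = metric("abs", "actual", "mean")  # returned as a fraction, not a percentage

actual = [2.0, 4.0, 8.0]
predicted = [1.0, 5.0, 6.0]
```

On this toy data the absolute errors are 1, 1 and 2, so `mae` gives 4/3, `rmse` gives the square root of 2, and `mape` gives 1/3.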

298 citations

Journal Article•DOI•

Statistical Aspects of Wasserstein Distances

Victor M. Panaretos¹, Yoav Zemel²•Institutions (2)

École Polytechnique Fédérale de Lausanne¹, University of Göttingen²

14 Jun 2018-arXiv: Methodology

TL;DR: Wasserstein distances, as discussed by the authors, measure the minimal effort required to reconfigure the probability mass of one distribution in order to recover the other distribution, and have a long history that has seen them catalyse core developments in analysis, optimization, and probability.

Abstract: Wasserstein distances are metrics on probability distributions inspired by the problem of optimal mass transportation. Roughly speaking, they measure the minimal effort required to reconfigure the probability mass of one distribution in order to recover the other distribution. They are ubiquitous in mathematics, with a long history that has seen them catalyse core developments in analysis, optimization, and probability. Beyond their intrinsic mathematical richness, they possess attractive features that make them a versatile tool for the statistician: they can be used to derive weak convergence and convergence of moments, and can be easily bounded; they are well-adapted to quantify a natural notion of perturbation of a probability distribution; and they seamlessly incorporate the geometry of the domain of the distributions in question, thus being useful for contrasting complex objects. Consequently, they frequently appear in the development of statistical theory and inferential methodology, and have recently become an object of inference in themselves. In this review, we provide a snapshot of the main concepts involved in Wasserstein distances and optimal transportation, and a succinct overview of some of their many statistical aspects.
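
As a small illustration of the "minimal effort" idea, the special case of two empirical distributions on the real line with equally many points has a simple closed form: the optimal transport plan matches sorted values. The sketch below covers only that case; the function name is hypothetical, and general (multivariate or unequal-size) problems need an optimal transport solver.

```python
def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two equal-size empirical samples on the line.

    In one dimension the optimal plan pairs the i-th smallest value of xs
    with the i-th smallest value of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    cost = sum(abs(a - b) ** p for a, b in pairs) / len(xs)
    return cost ** (1.0 / p)

shifted = wasserstein_1d([0.0, 1.0, 2.0], [3.0, 4.0, 5.0])  # every point moves by 3
same = wasserstein_1d([0.0, 2.0], [2.0, 0.0])               # same distribution
```

A pure shift by 3 costs exactly 3 for any `p`, and reordering the sample points does not change the distribution, so the second distance is 0.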

186 citations

Posted Content•

Mendelian randomization with a binary exposure variable: interpretation and presentation of causal estimates

Stephen Burgess¹, Jeremy A. Labrecque²•Institutions (2)

University of Cambridge¹, Erasmus University Rotterdam²

16 Apr 2018-arXiv: Methodology

TL;DR: In this paper, the authors provide methods for causal estimation with a binary exposure, subject to the caveats set out in the abstract and under the assumption that the causal effect is a stepwise function at the point of dichotomization.

Abstract: Mendelian randomization uses genetic variants to make causal inferences about a modifiable exposure. Subject to a genetic variant satisfying the instrumental variable assumptions, an association between the variant and outcome implies a causal effect of the exposure on the outcome. Complications arise with a binary exposure that is a dichotomization of a continuous risk factor (for example, hypertension is a dichotomization of blood pressure). This can lead to violation of the exclusion restriction assumption: the genetic variant can influence the outcome via the continuous risk factor even if the binary exposure does not change. Provided the instrumental variable assumptions are satisfied for the underlying continuous risk factor, causal inferences for the binary exposure are valid for the continuous risk factor. Causal estimates for the binary exposure assume the causal effect is a stepwise function at the point of dichotomization. Even then, estimation requires further parametric assumptions. Under monotonicity, the causal estimate represents the average causal effect in 'compliers', individuals for whom the binary exposure would be present if they have the genetic variant and absent otherwise. Unlike in randomized trials, genetic compliers are unlikely to be a large or representative subgroup of the population. Under homogeneity, the causal effect of the exposure on the outcome is assumed constant in all individuals; often an unrealistic assumption. We here provide methods for causal estimation with a binary exposure (although subject to all the above caveats). Mendelian randomization investigations with a dichotomized binary exposure should be conceptualized in terms of an underlying continuous variable.

167 citations

Posted Content•

Validating Bayesian Inference Algorithms with Simulation-Based Calibration

Sean Talts, Michael Betancourt, Daniel Simpson¹, Aki Vehtari², Andrew Gelman³•Institutions (3)

University of Toronto¹, Helsinki University of Technology², Columbia University³

18 Apr 2018-arXiv: Methodology

TL;DR: It is argued that SBC is a critical part of a robust Bayesian workflow, as well as being a useful tool for those developing computational algorithms and statistical software.

Abstract: Verifying the correctness of Bayesian computation is challenging. This is especially true for complex models that are common in practice, as these require sophisticated model implementations and algorithms. In this paper we introduce simulation-based calibration (SBC), a general procedure for validating inferences from Bayesian algorithms capable of generating posterior samples. This procedure not only identifies inaccurate computation and inconsistencies in model implementations but also provides graphical summaries that can indicate the nature of the problems that arise. We argue that SBC is a critical part of a robust Bayesian workflow, as well as being a useful tool for those developing computational algorithms and statistical software.
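
A minimal sketch of the SBC idea, under assumptions chosen for illustration: a conjugate normal model (prior θ ~ N(0, 1), likelihood y | θ ~ N(θ, 1), so the exact posterior is N(y/2, 1/2)) stands in for a real model, and a chi-square statistic stands in for the graphical summaries the abstract mentions. For each simulation, the rank of the prior draw among the posterior draws is recorded; a correct sampler should give ranks that are uniform. All names and settings are hypothetical.

```python
import random

def sbc_ranks(posterior_sd, n_sims=2000, n_draws=9, seed=0):
    """Rank of each prior draw among posterior draws for the toy normal model."""
    rng = random.Random(seed)
    ranks = []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, 1.0)   # draw a parameter from the prior
        y = rng.gauss(theta, 1.0)     # simulate data given that parameter
        post_mean = y / 2.0           # exact posterior mean for this model
        draws = [rng.gauss(post_mean, posterior_sd) for _ in range(n_draws)]
        ranks.append(sum(d < theta for d in draws))  # value in 0..n_draws
    return ranks

def chi_square_uniform(ranks, n_bins):
    """Chi-square discrepancy between rank counts and a uniform histogram."""
    expected = len(ranks) / n_bins
    counts = [0] * n_bins
    for r in ranks:
        counts[r] += 1
    return sum((c - expected) ** 2 / expected for c in counts)

# Correct sampler: posterior sd is sqrt(1/2). Broken sampler: sd is too small.
exact = chi_square_uniform(sbc_ranks(0.5 ** 0.5), n_bins=10)
too_narrow = chi_square_uniform(sbc_ranks(0.3), n_bins=10)
```

With an overconfident (too narrow) posterior, the prior draw tends to land outside the posterior draws, so ranks pile up at the two extremes and the discrepancy is far larger than for the exact sampler.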

161 citations

Posted Content•

How to capitalize on a priori contrasts in linear (mixed) models: A tutorial

Daniel J. Schad¹, Shravan Vasishth¹, Sven Hohenstein¹, Reinhold Kliegl¹•Institutions (1)

University of Potsdam¹

27 Jul 2018-arXiv: Methodology

TL;DR: This tutorial explains the generalized inverse which is needed to compute the coefficients for contrasts that test hypotheses that are not covered by the default set of contrasts and demonstrates how they are applied in the R System for Statistical Computing.

Abstract: Factorial experiments in research on memory, language, and in other areas are often analyzed using analysis of variance (ANOVA). However, for effects with more than one numerator degree of freedom, e.g., for experimental factors with more than two levels, the ANOVA omnibus F-test is not informative about the source of a main effect or interaction. Because researchers typically have specific hypotheses about which condition means differ from each other, a priori contrasts (i.e., comparisons planned before the sample means are known) between specific conditions or combinations of conditions are the appropriate way to represent such hypotheses in the statistical model. Many researchers have pointed out that contrasts should be "tested instead of, rather than as a supplement to, the ordinary 'omnibus' F test" (Hays, 1973, p. 601). In this tutorial, we explain the mathematics underlying different kinds of contrasts (i.e., treatment, sum, repeated, polynomial, custom, nested, interaction contrasts), discuss their properties, and demonstrate how they are applied in the R System for Statistical Computing (R Core Team, 2018). In this context, we explain the generalized inverse which is needed to compute the coefficients for contrasts that test hypotheses that are not covered by the default set of contrasts. A detailed understanding of contrast coding is crucial for successful and correct specification in linear models (including linear mixed models). Contrasts defined a priori yield far more useful confirmatory tests of experimental hypotheses than the standard omnibus F-test. Reproducible code is available from this https URL.
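
The role of the generalized inverse can be sketched numerically. The tutorial works in R; the example below uses Python with `numpy.linalg.pinv` as a stand-in, and the three-level factor, the hypotheses (grand mean plus successive differences), and the condition means are all hypothetical. Each row of the hypothesis matrix states one hypothesis as weights on the condition means; the generalized inverse of that matrix gives the contrast coding.

```python
import numpy as np

# Hypothesis matrix: one row per hypothesis, one column per condition mean.
H = np.array([
    [1 / 3, 1 / 3, 1 / 3],  # intercept: grand mean of the three conditions
    [-1.0, 1.0, 0.0],       # mu2 - mu1
    [0.0, -1.0, 1.0],       # mu3 - mu2
])

# Contrast matrix: generalized inverse of the hypothesis matrix.
# Its columns are the codes assigned to the three factor levels.
C = np.linalg.pinv(H)

mu = np.array([10.0, 12.0, 15.0])  # hypothetical condition means
beta = H @ mu                      # what each regression coefficient estimates
```

Here `beta` is the grand mean followed by the two successive differences (2 and 3), and `C @ beta` reproduces the condition means, which is why coding the factor with the columns of `C` makes the coefficients test the stated hypotheses.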

133 citations

Posted Content•

Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures

Yaowu Liu¹, Jun Xie²•Institutions (2)

Harvard University¹, Purdue University²

27 Aug 2018-arXiv: Methodology

TL;DR: A non-asymptotic result is proved showing that the tail of the null distribution of the proposed test statistic can be well approximated by a Cauchy distribution under arbitrary dependency structures, which keeps the p-value calculation simple and makes the test well suited for analyzing massive data.

Abstract: Combining individual p-values to aggregate multiple small effects has been of long-standing interest in statistics, dating back to the classic Fisher's combination test. In modern large-scale data analysis, correlation and sparsity are common features and efficient computation is a necessary requirement for dealing with massive data. To overcome these challenges, we propose a new test that takes advantage of the Cauchy distribution. Our test statistic has a very simple form and is defined as a weighted sum of Cauchy transformations of individual p-values. We prove a non-asymptotic result that the tail of the null distribution of our proposed test statistic can be well approximated by a Cauchy distribution under arbitrary dependency structures. Based on this theoretical result, the p-value calculation of our proposed test is not only accurate, but also as simple as the classic z-test or t-test, making our test well suited for analyzing massive data. We further show that the power of the proposed test is asymptotically optimal in a strong sparsity setting. Extensive simulations demonstrate that the proposed test has both strong power against sparse alternatives and good accuracy with respect to p-value calculations, especially for very small p-values. The proposed test has also been applied to a genome-wide association study of Crohn's disease and compared with several existing tests.
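
The "very simple form" of the statistic can be sketched in a few lines. Each p-value is mapped to a standard Cauchy variate with tan((0.5 - p)π), the variates are averaged with non-negative weights that sum to one, and the combined p-value is read off the standard Cauchy tail. The function name and the equal-weight default are assumptions made for this illustration.

```python
import math

def cauchy_combination(pvalues, weights=None):
    """Combine p-values with a weighted sum of Cauchy transformations.

    Weights are assumed to be non-negative and to sum to one;
    equal weights are used when none are given.
    """
    n = len(pvalues)
    if weights is None:
        weights = [1.0 / n] * n
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, pvalues))
    # Upper tail probability of a standard Cauchy distribution at t.
    return 0.5 - math.atan(t) / math.pi

single = cauchy_combination([0.03])           # a single p-value is returned unchanged
mixed = cauchy_combination([0.01, 0.5, 0.5])  # one small p-value among null-like ones
```

In the second call one small p-value dominates the sum, which gives an intuition for the power against sparse alternatives that the abstract describes.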

128 citations

Journal Article•DOI•

Deep Knockoffs.

Yaniv Romano, Matteo Sesia, Emmanuel J. Candès

16 Nov 2018-arXiv: Methodology

TL;DR: This paper introduces a machine for sampling approximate model-X knockoffs for arbitrary and unspecified data distributions using deep generative models, and applies this new method to a real study of mutations linked to changes in drug resistance in the human immunodeficiency virus.

Abstract: This paper introduces a machine for sampling approximate model-X knockoffs for arbitrary and unspecified data distributions using deep generative models. The main idea is to iteratively refine a knockoff sampling mechanism until a criterion measuring the validity of the produced knockoffs is optimized; this criterion is inspired by the popular maximum mean discrepancy in machine learning and can be thought of as measuring the distance to pairwise exchangeability between original and knockoff features. By building upon the existing model-X framework, we thus obtain a flexible and model-free statistical tool to perform controlled variable selection. Extensive numerical experiments and quantitative tests confirm the generality, effectiveness, and power of our deep knockoff machines. Finally, we apply this new method to a real study of mutations linked to changes in drug resistance in the human immunodeficiency virus.

99 citations

Posted Content•

The phase transition for the existence of the maximum likelihood estimate in high-dimensional logistic regression

Emmanuel J. Candès, Pragya Sur

25 Apr 2018-arXiv: Methodology

TL;DR: In this paper, it is shown that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp phase transition, and an explicit boundary curve is introduced, parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients.

Abstract: This paper rigorously establishes that the existence of the maximum likelihood estimate (MLE) in high-dimensional logistic regression models with Gaussian covariates undergoes a sharp 'phase transition'. We introduce an explicit boundary curve $h_{\text{MLE}}$, parameterized by two scalars measuring the overall magnitude of the unknown sequence of regression coefficients, with the following property: in the limit of large sample sizes $n$ and number of features $p$ proportioned in such a way that $p/n \rightarrow \kappa$, we show that if the problem is sufficiently high dimensional in the sense that $\kappa > h_{\text{MLE}}$, then the MLE does not exist with probability one. Conversely, if $\kappa < h_{\text{MLE}}$, the MLE asymptotically exists with probability one.

94 citations

Posted Content•

Anchor regression: heterogeneous data meets causality

Dominik Rothenhäusler, Peter Bühlmann, Nicolai Meinshausen, Jonas Peters

18 Jan 2018-arXiv: Methodology

TL;DR: In this paper, anchor regression is used to predict a response variable from a set of covariates on a data set that differs in distribution from the training data by considering a modification of the least-squares loss.

Abstract: We consider the problem of predicting a response variable from a set of covariates on a data set that differs in distribution from the training data. Causal parameters are optimal in terms of predictive accuracy if in the new distribution either many variables are affected by interventions or only some variables are affected, but the perturbations are strong. If the training and test distributions differ by a shift, causal parameters might be too conservative to perform well on the above task. This motivates anchor regression, a method that makes use of exogenous variables to solve a relaxation of the causal minimax problem by considering a modification of the least-squares loss. The procedure naturally provides an interpolation between the solutions of ordinary least squares and two-stage least squares. We prove that the estimator satisfies predictive guarantees in terms of distributional robustness against shifts in a linear class; these guarantees are valid even if the instrumental variables assumptions are violated. If anchor regression and least squares provide the same answer (anchor stability), we establish that OLS parameters are invariant under certain distributional changes. Anchor regression is shown empirically to improve replicability and protect against distributional shifts.
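
The "modification of the least-squares loss" can be sketched as follows, assuming the commonly stated form of the anchor loss: the residual is split into the part explained by the anchors and the remainder, and the explained part is weighted by a parameter γ. With γ = 1 this is ordinary least squares, and as γ grows the solution approaches two-stage least squares. The simulated data, variable names, and parameter values are hypothetical.

```python
import numpy as np

def project(A, M):
    """Project M (a vector or the columns of a matrix) onto the column space of A."""
    coef, *_ = np.linalg.lstsq(A, M, rcond=None)
    return A @ coef

def anchor_regression(X, y, A, gamma):
    """Minimize ||(I - P_A)(y - Xb)||^2 + gamma * ||P_A (y - Xb)||^2 over b.

    Equivalent to least squares after replacing X and y by
    (I - (1 - sqrt(gamma)) P_A) X and (I - (1 - sqrt(gamma)) P_A) y.
    """
    s = 1.0 - np.sqrt(gamma)
    X_t = X - s * project(A, X)
    y_t = y - s * project(A, y)
    coef, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
    return coef

# Simulated example: a hidden confounder biases least squares.
rng = np.random.default_rng(0)
n = 5000
A = rng.normal(size=(n, 1))                    # exogenous anchor
hidden = rng.normal(size=n)                    # unobserved confounder
X = (A[:, 0] + hidden + rng.normal(size=n)).reshape(n, 1)
y = 1.0 * X[:, 0] + 2.0 * hidden + rng.normal(size=n)  # causal coefficient is 1

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_one = anchor_regression(X, y, A, gamma=1.0)    # coincides with least squares
b_large = anchor_regression(X, y, A, gamma=1e6)  # close to two-stage least squares
```

In this simulation least squares is pulled away from the causal coefficient by the confounder, while a very large γ recovers a value near 1; intermediate values of γ trade off between the two.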

89 citations

Posted Content•

The Augmented Synthetic Control Method

Eli Ben-Michael¹, Avi Feller¹, Jesse Rothstein¹•Institutions (1)

University of California, Berkeley¹

10 Nov 2018-arXiv: Methodology

TL;DR: Augmented SCM, as proposed in this paper, uses an outcome model to estimate the bias due to imperfect pre-treatment fit and then de-biases the original SCM estimate; with ridge regression as the outcome model, the estimator can also be expressed as a solution to a modified synthetic control problem that allows negative weights on some donor units.

Abstract: The synthetic control method (SCM) is a popular approach for estimating the impact of a treatment on a single unit in panel data settings. The "synthetic control" is a weighted average of control units that balances the treated unit's pre-treatment outcomes as closely as possible. A critical feature of the original proposal is to use SCM only when the fit on pre-treatment outcomes is excellent. We propose Augmented SCM as an extension of SCM to settings where such pre-treatment fit is infeasible. Analogous to bias correction for inexact matching, Augmented SCM uses an outcome model to estimate the bias due to imperfect pre-treatment fit and then de-biases the original SCM estimate. Our main proposal, which uses ridge regression as the outcome model, directly controls pre-treatment fit while minimizing extrapolation from the convex hull. This estimator can also be expressed as a solution to a modified synthetic controls problem that allows negative weights on some donor units. We bound the estimation error of this approach under different data generating processes, including a linear factor model, and show how regularization helps to avoid over-fitting to noise. We demonstrate gains from Augmented SCM with extensive simulation studies and apply this framework to estimate the impact of the 2012 Kansas tax cuts on economic growth. We implement the proposed method in the new augsynth R package.

Posted Content•

All Models are Wrong but many are Useful: Variable Importance for Black-Box, Proprietary, or Misspecified Prediction Models, using Model Class Reliance

Aaron Fisher, Cynthia Rudin, Francesca Dominici

04 Jan 2018-arXiv: Methodology

TL;DR: Model class reliance (MCR) is proposed as the range of variable importance (VI) values across all well-performing models in a prespecified class, which gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well.

Abstract: Variable importance (VI) tools describe how much covariates contribute to a prediction model's accuracy. However, important variables for one well-performing model (for example, a linear model $f(\mathbf{x})=\mathbf{x}^{T}\beta$ with a fixed coefficient vector $\beta$) may be unimportant for another model. In this paper, we propose model class reliance (MCR) as the range of VI values across all well-performing models in a prespecified class. Thus, MCR gives a more comprehensive description of importance by accounting for the fact that many prediction models, possibly of different parametric forms, may fit the data well. In the process of deriving MCR, we show several informative results for permutation-based VI estimates, similar to the VI measures used in Random Forests. Specifically, we derive connections between permutation importance estimates for a single prediction model, U-statistics, conditional causal effects, and linear model coefficients. We then give probabilistic bounds for MCR, using a novel, generalizable technique. We apply MCR to a public dataset of Broward County criminal records to study the reliance of recidivism prediction models on sex and race. In this application, MCR can be used to help inform VI for unknown, proprietary models.

Posted Content•

A Sherman-Morrison-Woodbury Identity for Rank Augmenting Matrices with Application to Centering

Kurt S. Riedel

28 Mar 2018-arXiv: Methodology

TL;DR: An explicit expression for the inverse is given, provided that $\mathbf{W}_i^* \mathbf{W}_i$ has rank $k$.

Abstract: Matrices of the form $\mathbf{A} + (\mathbf{V}_1 + \mathbf{W}_1)\mathbf{G}(\mathbf{V}_2 + \mathbf{W}_2)^*$ are considered, where $\mathbf{A}$ is a singular $\ell \times \ell$ matrix and $\mathbf{G}$ is a nonsingular $k \times k$ matrix, $k \le \ell$. Let the columns of $\mathbf{V}_1$ be in the column space of $\mathbf{A}$ and the columns of $\mathbf{W}_1$ be orthogonal to $\mathbf{A}$. Similarly, let the columns of $\mathbf{V}_2$ be in the column space of $\mathbf{A}^*$ and the columns of $\mathbf{W}_2$ be orthogonal to $\mathbf{A}^*$. An explicit expression for the inverse is given, provided that $\mathbf{W}_i^* \mathbf{W}_i$ has rank $k$. An application to centering covariance matrices about the mean is given.
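
For context, the classical Sherman-Morrison-Woodbury identity, which the title refers to, is written below in notation matching the abstract. It assumes that $\mathbf{A}$ is nonsingular and that the inner $k \times k$ matrix is invertible, so it does not apply directly to the singular case the paper treats; the paper's own expression is not reproduced here.

```latex
% Classical identity, valid when A and G^{-1} + V_2^* A^{-1} V_1 are nonsingular.
\left(\mathbf{A} + \mathbf{V}_1 \mathbf{G} \mathbf{V}_2^*\right)^{-1}
  = \mathbf{A}^{-1}
  - \mathbf{A}^{-1} \mathbf{V}_1
    \left(\mathbf{G}^{-1} + \mathbf{V}_2^* \mathbf{A}^{-1} \mathbf{V}_1\right)^{-1}
    \mathbf{V}_2^* \mathbf{A}^{-1}
```

When $\mathbf{A}$ is singular, $\mathbf{A}^{-1}$ does not exist and the right-hand side is undefined, which is the situation where the components $\mathbf{W}_1$ and $\mathbf{W}_2$ outside the column spaces of $\mathbf{A}$ and $\mathbf{A}^*$ are needed to augment the rank.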

Posted Content•

Distributionally Robust Mean-Variance Portfolio Selection with Wasserstein Distances

Jose Blanchet, Lin Chen, Xun Yu Zhou

13 Feb 2018-arXiv: Methodology

TL;DR: The authors revisited Markowitz's mean-variance portfolio selection model by considering a distributionally robust version, where the region of distributional uncertainty is around the empirical measure and the discrepancy between probability measures is dictated by the so-called Wasserstein distance.

Abstract: We revisit Markowitz's mean-variance portfolio selection model by considering a distributionally robust version, where the region of distributional uncertainty is around the empirical measure and the discrepancy between probability measures is dictated by the so-called Wasserstein distance. We reduce this problem to an empirical variance minimization problem with an additional regularization term. Moreover, we extend recent inference methodology in order to select the size of the distributional uncertainty as well as the associated robust target return rate in a data-driven way.

Posted Content•

Synthetic Difference in Differences

Dmitry Arkhangelsky¹, Susan Athey², David A. Hirshberg², Guido W. Imbens², Stefan Wager²•Institutions (2)

CEMFI¹, Stanford University²

24 Dec 2018-arXiv: Methodology

TL;DR: In this paper, a new estimator for causal effects with panel data is presented, which builds on insights behind the widely used difference in differences and synthetic control methods, and it performs well in settings where the conventional estimators are commonly used in practice.

Abstract: We present a new estimator for causal effects with panel data that builds on insights behind the widely used difference in differences and synthetic control methods. Relative to these methods we find, both theoretically and empirically, that this "synthetic difference in differences" estimator has desirable robustness properties, and that it performs well in settings where the conventional estimators are commonly used in practice. We study the asymptotic behavior of the estimator when the systematic part of the outcome model includes latent unit factors interacted with latent time factors, and we present conditions for consistency and asymptotic normality.

Posted Content•

Parameter estimation for fractional Poisson processes.

Dexter O. Cahoy¹, Vladimir V. Uchaikin, Wojbor A. Woyczyński²•Institutions (2)

Louisiana Tech University¹, Case Western Reserve University²

07 Jun 2018-arXiv: Methodology

TL;DR: In this paper, a formal estimation procedure is proposed for the parameters of the fractional Poisson process (fPp), a model that makes the standard Poisson model more flexible by permitting non-exponential, heavy-tailed distributions of interarrival times and different scaling properties.

Abstract: The paper proposes a formal estimation procedure for parameters of the fractional Poisson process (fPp). Such procedures are needed to make the fPp model usable in applied situations. The basic idea of fPp, motivated by experimental data with long memory, is to make the standard Poisson model more flexible by permitting non-exponential, heavy-tailed distributions of interarrival times and different scaling properties. We establish the asymptotic normality of our estimators for the two parameters appearing in our fPp model. This fact permits construction of the corresponding confidence intervals. The properties of the estimators are then tested using simulated data.

Journal Article•DOI•

Likelihood-based meta-analysis with few studies: Empirical and simulation studies

Svenja Seide, Christian Röver, Tim Friede

24 Jul 2018-arXiv: Methodology

TL;DR: In the presence of between-study heterogeneity, especially with unbalanced study sizes, caution is needed in applying meta-analytical methods to few studies, as either coverage probabilities might be compromised, or intervals are inconclusively wide.

Abstract: Standard random-effects meta-analysis methods perform poorly when applied to few studies only. Such settings, however, are commonly encountered in practice. It is unclear whether or to what extent small-sample-size behaviour can be improved by more sophisticated modeling. We consider several likelihood-based inference methods. Confidence intervals are based on normal or Student-t approximations. We extract an empirical data set of 40 meta-analyses from recent reviews published by the German Institute for Quality and Efficiency in Health Care (IQWiG). Methods are then compared empirically as well as in a simulation study, considering odds-ratio and risk ratio effect sizes. Empirically, a majority of the identified meta-analyses include only 2 studies. In the simulation study, coverage probability is, in the presence of heterogeneity and few studies, below the nominal level for all frequentist methods based on normal approximation, in particular when sizes in meta-analyses are not balanced, but improves when confidence intervals are adjusted. Bayesian methods result in better coverage than the frequentist methods with normal approximation in all scenarios. Credible intervals are empirically and in the simulation study wider than unadjusted confidence intervals, but considerably narrower than adjusted ones. Confidence intervals based on the generalized linear mixed models are, in general, slightly narrower than those from other frequentist methods. Certain methods turned out to be impractical due to frequent numerical problems. In the presence of between-study heterogeneity, especially with unbalanced study sizes, caution is needed in applying meta-analytical methods to few studies, as either coverage probabilities might be compromised, or intervals are inconclusively wide. Bayesian estimation with a sensibly chosen prior for between-trial heterogeneity may offer a promising compromise.

Posted Content•

A Swiss Army Infinitesimal Jackknife

Ryan Giordano¹, William T. Stephenson², Runjing Liu¹, Michael I. Jordan¹, Tamara Broderick²•Institutions (2)

University of California, Berkeley¹, Massachusetts Institute of Technology²

01 Jun 2018-arXiv: Methodology

TL;DR: A linear approximation to the dependence of the fitting procedure on the weights is used, producing results that can be faster than repeated re-fitting by an order of magnitude and support the application of the infinitesimal jackknife to a wide variety of practical problems in machine learning.

Abstract: The error or variability of machine learning algorithms is often assessed by repeatedly re-fitting a model with different weighted versions of the observed data. The ubiquitous tools of cross-validation (CV) and the bootstrap are examples of this technique. These methods are powerful in large part due to their model agnosticism but can be slow to run on modern, large data sets due to the need to repeatedly re-fit the model. In this work, we use a linear approximation to the dependence of the fitting procedure on the weights, producing results that can be faster than repeated re-fitting by an order of magnitude. This linear approximation is sometimes known as the "infinitesimal jackknife" in the statistics literature, where it is mostly used as a theoretical tool to prove asymptotic results. We provide explicit finite-sample error bounds for the infinitesimal jackknife in terms of a small number of simple, verifiable assumptions. Our results apply whether the weights and data are stochastic or deterministic, and so can be used as a tool for proving the accuracy of the infinitesimal jackknife on a wide variety of problems. As a corollary, we state mild regularity conditions under which our approximation consistently estimates true leave-$k$-out cross-validation for any fixed $k$. These theoretical results, together with modern automatic differentiation software, support the application of the infinitesimal jackknife to a wide variety of practical problems in machine learning, providing a "Swiss Army infinitesimal jackknife". We demonstrate the accuracy of our methods on a range of simulated and real datasets.
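
The linear approximation in the weights can be sketched for leave-one-out in ordinary least squares, a case chosen here because the exact refit is cheap to compare against. Writing the fit as the minimizer of a weighted sum of per-observation losses, the derivative of the estimate with respect to weight i is minus the inverse Hessian times the gradient of loss i, and leaving observation i out corresponds to changing its weight from 1 to 0. The data and function names are hypothetical, and the derivatives are written by hand rather than obtained by automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# Full-data fit with per-observation loss 0.5 * (y_i - x_i' theta)^2.
H = X.T @ X                              # Hessian of the summed loss
theta_hat = np.linalg.solve(H, X.T @ y)
resid = y - X @ theta_hat

def loo_exact(i):
    """Refit from scratch without observation i."""
    mask = np.arange(n) != i
    return np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]

def loo_ij(i):
    """Infinitesimal-jackknife approximation: one linear solve, no refit."""
    grad_i = -X[i] * resid[i]            # gradient of loss i at theta_hat
    # d theta / d w_i = -H^{-1} grad_i, and dropping i means delta w_i = -1.
    return theta_hat + np.linalg.solve(H, grad_i)

max_err = max(np.abs(loo_exact(i) - loo_ij(i)).max() for i in range(20))
max_shift = max(np.abs(loo_exact(i) - theta_hat).max() for i in range(20))
```

For least squares the approximation differs from the exact leave-one-out estimate only by a factor involving the leverage of the left-out point, so `max_err` is a small fraction of `max_shift` when no single observation dominates the fit.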

Posted Content•

Bayesian model reduction

Karl J. Friston, Thomas Parr, Peter Zeidman¹•Institutions (1)

University College London¹

18 May 2018-arXiv: Methodology

TL;DR: Bayesian model reduction is reviewed, along with its application to structure learning and hierarchical or empirical Bayes, which can be regarded as a metaphor for neurobiological processes like abductive reasoning.

Abstract: This paper reviews recent developments in statistical structure learning; namely, Bayesian model reduction. Bayesian model reduction is a method for rapidly computing the evidence and parameters of probabilistic models that differ only in their priors. In the setting of variational Bayes this has an analytical solution, which finesses the problem of scoring large model spaces in model comparison or structure learning. In this technical note, we review Bayesian model reduction and provide the relevant equations for several discrete and continuous probability distributions. We provide worked examples in the context of multivariate linear regression, Gaussian mixture models and dynamical systems (dynamic causal modelling). These examples are accompanied by the Matlab scripts necessary to reproduce the results. Finally, we briefly review recent applications in the fields of neuroimaging and neuroscience. Specifically, we consider structure learning and hierarchical or empirical Bayes that can be regarded as a metaphor for neurobiological processes like abductive reasoning.

Posted Content•

Extending inferences from a randomized trial to a new target population

[...]

Issa J Dahabreh¹, Sarah E. Robertson¹, Jon A. Steingrimsson¹, Elizabeth A. Stuart², Miguel A. Hernán³,⁴•Institutions (4)

Brown University¹, Johns Hopkins University², Massachusetts Institute of Technology³, Harvard University⁴

01 May 2018-arXiv: Methodology

TL;DR: This tutorial considers methods for extending causal inferences about time‐fixed treatments from a trial to a new target population of nonparticipants, using data from a completed randomized trial and baseline covariates from a sample from the target population.

Abstract: When treatment effect modifiers influence the decision to participate in a randomized trial, the average treatment effect in the population represented by the randomized individuals will differ from the effect in other populations. In this tutorial, we consider methods for extending causal inferences about time-fixed treatments from a trial to a new target population of non-participants, using data from a completed randomized trial and baseline covariate data from a sample from the target population. We examine methods based on modeling the expectation of the outcome, the probability of participation, or both (doubly robust). We compare the methods in a simulation study and show how they can be implemented in software. We apply the methods to a randomized trial nested within a cohort of trial-eligible patients to compare coronary artery surgery plus medical therapy versus medical therapy alone for patients with chronic coronary artery disease. We conclude by discussing issues that arise when using the methods in applied analyses.
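
The outcome-model approach mentioned in this abstract can be sketched on simulated data. The example below is an illustration under strong simplifying assumptions, not the tutorial's own code: there is a single binary effect modifier, the outcome model is a set of stratum means, and the data-generating process and every name are hypothetical. It contrasts the unadjusted trial estimate with an estimate standardized to the covariate distribution of the target sample.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data-generating process: a binary effect modifier X is more
# common among trial participants (S = 1) than in the target sample (S = 0).
n_trial, n_target = 20_000, 20_000
x_trial = rng.binomial(1, 0.7, size=n_trial)
x_target = rng.binomial(1, 0.3, size=n_target)

a = rng.binomial(1, 0.5, size=n_trial)                 # randomized treatment
y = 1.0 + a * (1.0 + 2.0 * x_trial) + rng.normal(size=n_trial)

# Unadjusted trial estimate: difference in mean outcome between arms.
trial_ate = y[a == 1].mean() - y[a == 0].mean()

# Outcome model: with one binary covariate, stratum means are a saturated model.
def stratum_mean(arm, x_value):
    return y[(a == arm) & (x_trial == x_value)].mean()

effect_by_x = np.array([stratum_mean(1, x) - stratum_mean(0, x) for x in (0, 1)])

# Standardize the stratum-specific effects to the target covariate distribution.
target_ate = effect_by_x[x_target].mean()

print(f"trial ATE {trial_ate:.2f}, transported ATE {target_ate:.2f}")
```

Under this simulation the true effect is 2.4 among trial participants and 1.6 in the target population, so the unadjusted trial estimate overstates the effect for non-participants. Weighting by the probability of participation, or combining both models in a doubly robust estimator, would replace the standardization step.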

Posted Content•

Doubly Robust Inference with Non-probability Survey Samples

[...]

Yilin Chen, Pengfei Li, Changbao Wu

16 May 2018-arXiv: Methodology

TL;DR: This article established a general framework for statistical inferences with non-probability survey samples when relevant auxiliary information is available from a probability survey sample, and constructed doubly robust estimators for the finite population mean.

Abstract: We establish a general framework for statistical inferences with non-probability survey samples when relevant auxiliary information is available from a probability survey sample. We develop a rigorous procedure for estimating the propensity scores for units in the non-probability sample, and construct doubly robust estimators for the finite population mean. Variance estimation is discussed under the proposed framework. Results from simulation studies show the robustness and the efficiency of our proposed estimators as compared to existing methods. The proposed method is used to analyze a non-probability survey sample collected by the Pew Research Center with auxiliary information from the Behavioral Risk Factor Surveillance System and the Current Population Survey. Our results illustrate a general approach to inference with non-probability samples and highlight the importance and usefulness of auxiliary information from probability survey samples.

Posted Content•

Comparing Spike and Slab Priors for Bayesian Variable Selection

[...]

Gertraud Malsiner-Walli¹, Helga Wagner¹•Institutions (1)

Johannes Kepler University of Linz¹

18 Dec 2018-arXiv: Methodology

TL;DR: In this article, the authors compare the MCMC implementations for several spike and slab priors with regard to posterior inclusion probabilities and their sampling efficiency for simulated data and investigate posterior inclusion probability analytically for different slabs in two simple settings.

Abstract: An important task in building regression models is to decide which regressors should be included in the final model. In a Bayesian approach, variable selection can be performed using mixture priors with a spike and a slab component for the effects subject to selection. As the spike is concentrated at zero, variable selection is based on the probability of assigning the corresponding regression effect to the slab component. These posterior inclusion probabilities can be determined by MCMC sampling. In this paper we compare the MCMC implementations for several spike and slab priors with regard to posterior inclusion probabilities and their sampling efficiency for simulated data. Further, we investigate posterior inclusion probabilities analytically for different slabs in two simple settings. Application of variable selection with spike and slab priors is illustrated on a data set of psychiatric patients where the goal is to identify covariates affecting metabolism.
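
To make the idea of a posterior inclusion probability concrete, here is a minimal sketch for the simplest possible setting. It assumes a single effect with a normally distributed estimate, a point-mass spike at zero and a normal slab, which is only one of many spike and slab specifications and is not claimed to be one of those compared in the paper. The function names and parameter values are hypothetical.

```python
import math

def normal_pdf(x, variance):
    """Density of a zero-mean normal distribution with the given variance."""
    return math.exp(-0.5 * x * x / variance) / math.sqrt(2 * math.pi * variance)

def inclusion_probability(estimate, se2, slab_var, prior_incl=0.5):
    """Posterior probability that an effect belongs to the slab.

    Assumes estimate ~ N(beta, se2), with beta = 0 under a point-mass spike
    and beta ~ N(0, slab_var) under the slab.
    """
    slab = prior_incl * normal_pdf(estimate, se2 + slab_var)
    spike = (1 - prior_incl) * normal_pdf(estimate, se2)
    return slab / (slab + spike)

for estimate in (0.0, 1.0, 2.0, 4.0):
    pip = inclusion_probability(estimate, se2=1.0, slab_var=9.0)
    print(f"estimate {estimate:.1f}: inclusion probability {pip:.3f}")
```

In this toy setting the inclusion probability has a closed form, so no MCMC sampling is needed. It rises with the size of the estimate, and a wider slab lowers it for estimates near zero.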

Journal Article•DOI•

Quantile Regression Under Memory Constraint.

[...]

Xi Chen, Weidong Liu, Yichen Zhang

18 Oct 2018-arXiv: Methodology

TL;DR: This paper proposes a computationally efficient method for quantile regression (QR) under a memory constraint, which requires only an initial QR estimator on a small batch of data and then successively refines it via multiple rounds of aggregations; the authors also establish asymptotic normality of the resulting estimator.

Abstract: This paper studies the inference problem in quantile regression (QR) for a large sample size $n$ but under a limited memory constraint, where the memory can only store a small batch of data of size $m$. A natural method is the naïve divide-and-conquer approach, which splits data into batches of size $m$, computes the local QR estimator for each batch, and then aggregates the estimators via averaging. However, this method only works when $n=o(m^2)$ and is computationally expensive. This paper proposes a computationally efficient method, which only requires an initial QR estimator on a small batch of data and then successively refines the estimator via multiple rounds of aggregations. Theoretically, as long as $n$ grows polynomially in $m$, we establish the asymptotic normality for the obtained estimator and show that our estimator with only a few rounds of aggregations achieves the same efficiency as the QR estimator computed on all the data. Moreover, our result allows the case that the dimensionality $p$ goes to infinity. The proposed method can also be applied to address the QR problem in a distributed computing environment (e.g., in a large-scale sensor network) or for real-time streaming data.
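
The divide-and-conquer baseline described in this abstract can be sketched in a few lines. The example below covers only that baseline, not the paper's proposed refinement, and it makes a deliberate simplification: the model is intercept-only, so the local QR estimator in each batch reduces to a sample quantile. The data and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m, tau = 100_000, 1_000, 0.9
y = rng.normal(size=n)

# Split the sample into n / m batches that each fit in memory.
batches = y.reshape(n // m, m)

# Local estimator per batch, then aggregate by simple averaging.
local_estimates = np.quantile(batches, tau, axis=1)
dc_estimate = local_estimates.mean()

# Full-sample estimator, available here only because the toy data fit in memory.
full_estimate = np.quantile(y, tau)

print(f"divide-and-conquer {dc_estimate:.4f}, full sample {full_estimate:.4f}")
```

With covariates, each batch would instead require solving a QR optimization problem, which is where the computational cost noted in the abstract arises.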

Posted Content•

Birnbaum-Saunders Distribution: A Review of Models, Analysis and Applications

[...]

Narayanaswamy Balakrishnan¹, Debasis Kundu²•Institutions (2)

McMaster University¹, Indian Institute of Technology Kanpur²

17 May 2018-arXiv: Methodology

TL;DR: In this article, the authors provide a detailed review of developments on the Birnbaum-Saunders lifetime distribution, including interpretations, constructions, generalizations, inferential methods, and multivariate extensions, and indicate several open problems that could be considered for further research.

Abstract: Birnbaum and Saunders introduced a two-parameter lifetime distribution to model fatigue life of a metal, subject to cyclic stress. Since then, extensive work has been done on this model providing different interpretations, constructions, generalizations, inferential methods, and extensions to bivariate, multivariate and matrix-variate cases. More than two hundred papers and one research monograph have already appeared describing all these aspects and developments. In this paper, we provide a detailed review of all these developments and at the same time indicate several open problems that could be considered for further research.

Posted Content•

Multiple Imputation: A Review of Practical and Theoretical Findings

[...]

Jared S. Murray

12 Jan 2018-arXiv: Methodology

TL;DR: This paper presents an overview of multiple imputation, reviewing strategies for generating imputations, including recent developments in flexible joint modeling and sequential regression/chained equations/fully conditional specification approaches, and identifying promising avenues for future research.

Abstract: Multiple imputation is a straightforward method for handling missing data in a principled fashion. This paper presents an overview of multiple imputation, including important theoretical results and their practical implications for generating and using multiple imputations. A review of strategies for generating imputations follows, including recent developments in flexible joint modeling and sequential regression/chained equations/fully conditional specification approaches. Finally, we compare and contrast different methods for generating imputations on a range of criteria before identifying promising avenues for future research.

Posted Content•

Lecture Notes: Temporal Point Processes and the Conditional Intensity Function

[...]

Jakob Gulddahl Rasmussen

01 Jun 2018-arXiv: Methodology

TL;DR: These lecture notes give a not too technical introduction to point processes on the time line, with a focus on defining these processes using the conditional intensity function.

Abstract: These short lecture notes contain a not too technical introduction to point processes on the time line. The focus lies on defining these processes using the conditional intensity function. Furthermore, likelihood inference, methods of simulation and residual analysis for temporal point processes specified by a conditional intensity function are considered.
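
One way to see how a conditional intensity function defines a process is to simulate from it. The sketch below assumes a Hawkes process with an exponential kernel, chosen here purely for illustration, and simulates it by thinning. The function names and parameter values are hypothetical, and this is not claimed to be the algorithm as presented in the notes.

```python
import numpy as np

def intensity(t, events, mu, alpha, beta):
    """Conditional intensity of a Hawkes process with an exponential kernel."""
    past = np.asarray(events)
    return mu + alpha * np.sum(np.exp(-beta * (t - past)))

def simulate_hawkes(mu, alpha, beta, horizon, rng):
    """Simulate event times on (0, horizon] by thinning."""
    events, t = [], 0.0
    while True:
        # Between events the intensity only decays, so its current value
        # (including the jump from an event at t) bounds it until the next event.
        bound = intensity(t, events, mu, alpha, beta)
        t += rng.exponential(1.0 / bound)
        if t > horizon:
            return np.array(events)
        # Accept the candidate with probability intensity(t) / bound.
        if rng.uniform() * bound <= intensity(t, events, mu, alpha, beta):
            events.append(t)

rng = np.random.default_rng(4)
times = simulate_hawkes(mu=1.0, alpha=0.5, beta=1.0, horizon=50.0, rng=rng)
print(f"simulated {len(times)} events on (0, 50]")
```

The bound is valid only because this particular intensity is non-increasing between events. An intensity that can grow between events would need a different upper bound over each candidate interval.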

Posted Content•

Bounds on the conditional and average treatment effect with unobserved confounding factors

[...]

Steve Yadlowsky¹, Hongseok Namkoong², Sanjay Basu³, John C. Duchi¹, Lu Tian¹•Institutions (3)

Stanford University¹, Columbia University², Imperial College London³

28 Aug 2018-arXiv: Methodology

TL;DR: This paper develops a loss minimization approach that quantifies bounds on the conditional average treatment effect (CATE) when unobserved confounders have a bounded effect on the odds of treatment selection, and a semi-parametric framework that extends/bounds the augmented inverse propensity weighted (AIPW) estimator for the average treatment effect (ATE) beyond the assumption that all confounders are observed.

Abstract: For observational studies, we study the sensitivity of causal inference when treatment assignments may depend on unobserved confounding factors. We develop a loss minimization approach that quantifies bounds on the conditional average treatment effect (CATE) when unobserved confounders have a bounded effect on the odds of treatment selection. Our approach is scalable and allows flexible use of model classes, including nonparametric and black-box machine learning methods. Using these bounds, we propose a related sensitivity analysis for the average treatment effect (ATE), and develop a semi-parametric framework that extends/bounds the augmented inverse propensity weighted (AIPW) estimator for the ATE beyond the assumption that all confounders are observed. By constructing a Neyman orthogonal score, our estimator is a regular root-n estimator so long as the nuisance parameters can be estimated at the $o_p(n^{-1/4})$ rate. We complement our methodological development with optimality results showing that our proposed bounds are tight in certain cases. We demonstrate our method on simulated and real data examples, and show accurate coverage of our confidence intervals in practical finite sample regimes.

Journal Article•DOI•

Uncertainty quantification for computer models with spatial output using calibration-optimal bases

[...]

James M. Salter¹, Daniel Williamson¹, John Scinocca, Viatcheslav Kharin•Institutions (1)

University of Exeter¹

24 Jan 2018-arXiv: Methodology

TL;DR: In this article, the authors introduce the "terminal case", in which the model cannot reproduce observations to within model discrepancy, and for which standard calibration methods in uncertainty quantification (UQ) fail to give sensible results.

Abstract: The calibration of complex computer codes using uncertainty quantification (UQ) methods is a rich area of statistical methodological development. When applying these techniques to simulators with spatial output, it is now standard to use principal component decomposition to reduce the dimensions of the outputs in order to allow Gaussian process emulators to predict the output for calibration. We introduce the "terminal case", in which the model cannot reproduce observations to within model discrepancy, and for which standard calibration methods in UQ fail to give sensible results. We show that even when there is no such issue with the model, the standard decomposition on the outputs can and usually does lead to a terminal case analysis. We present a simple test to allow a practitioner to establish whether their experiment will result in a terminal case analysis, and a methodology for defining calibration-optimal bases that avoid this whenever it is not inevitable. We present the optimal rotation algorithm for doing this, and demonstrate its efficacy for an idealised example for which the usual principal component methods fail. We apply these ideas to the CanAM4 model to demonstrate the terminal case issue arising for climate models. We discuss climate model tuning and the estimation of model discrepancy within this context, and show how the optimal rotation algorithm can be used in developing practical climate model tuning tools.

Posted Content•

A Confounding Bridge Approach for Double Negative Control Inference on Causal Effects (Supplement and Sample Codes are included)

[...]

Wang Miao, Eric J. Tchetgen Tchetgen

15 Aug 2018-arXiv: Methodology

TL;DR: In this paper, the authors introduce a confounding bridge function that links the potential outcome mean and the negative control outcome distribution, and incorporate a negative control exposure to identify the bridge function and the average causal effect.

Abstract: Unmeasured confounding is a key challenge for causal inference. Negative control variables are widely available in observational studies. A negative control outcome is associated with the confounder but not causally affected by the exposure in view, and a negative control exposure is correlated with the primary exposure or the confounder but does not causally affect the outcome of interest. In this paper, we establish a framework to use them for unmeasured confounding adjustment. We introduce a confounding bridge function that links the potential outcome mean and the negative control outcome distribution, and we incorporate a negative control exposure to identify the bridge function and the average causal effect. Our approach can be used to repair an invalid instrumental variable in case it is correlated with the unmeasured confounder. We also extend our approach by allowing for a causal association between the primary exposure and the control outcome. We illustrate our approach with simulations and apply it to a study about the short-term effect of air pollution. Although a standard analysis shows a significant acute effect of PM2.5 on mortality, our analysis indicates that this effect may be confounded, and after double negative control adjustment, the effect is attenuated toward zero.

Posted Content•

Limitations of "Limitations of Bayesian leave-one-out cross-validation for model selection"

[...]

Aki Vehtari¹, Daniel Simpson², Yuling Yao³, Andrew Gelman³•Institutions (3)

Aalto University¹, University of Toronto², Columbia University³

12 Oct 2018-arXiv: Methodology

TL;DR: This invited discussion considers the use of leave-one-out cross-validation (LOO) in practical data analysis, from the perspective that one should abandon the idea of a device that produces a single-number decision rule.

Abstract: This article is an invited discussion of the article by Gronau and Wagenmakers (2018) that can be found at this https URL.
