
Showing papers in "arXiv: Methodology in 2016"


Posted Content
TL;DR: Accumulated local effects (ALE) plots are presented, which avoid the unreliable extrapolation that partial dependence plots require when predictors are correlated, and which are far less computationally expensive than partial dependence plots.
Abstract: When fitting black box supervised learning models (e.g., complex trees, neural networks, boosted trees, random forests, nearest neighbors, local kernel-weighted methods, etc.), visualizing the main effects of the individual predictor variables and their low-order interaction effects is often important, and partial dependence (PD) plots are the most popular approach for accomplishing this. However, PD plots involve a serious pitfall if the predictor variables are far from independent, which is quite common with large observational data sets. Namely, PD plots require extrapolation of the response at predictor values that are far outside the multivariate envelope of the training data, which can render the PD plots unreliable. Although marginal plots (M plots) do not require such extrapolation, they produce substantially biased and misleading results when the predictors are dependent, analogous to the omitted variable bias in regression. We present a new visualization approach that we term accumulated local effects (ALE) plots, which inherits the desirable characteristics of PD and M plots, without inheriting their preceding shortcomings. Like M plots, ALE plots do not require extrapolation; and like PD plots, they are not biased by the omitted variable phenomenon. Moreover, ALE plots are far less computationally expensive than PD plots.

439 citations
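
The accumulated local effects idea in the abstract above can be sketched for a single numeric predictor as follows. This is a minimal illustration, not the authors' implementation: the function name `ale_1d`, the quantile binning, and the centering convention are assumptions made for the example. Within each bin, only the observations that actually fall in that bin are moved to the bin's two edges, so the model is never evaluated far outside the data envelope.

```python
import numpy as np

def ale_1d(predict, X, feature, n_bins=10):
    """Sketch of a first-order ALE curve for one numeric feature.

    predict: callable mapping an (n, d) array to n predictions (assumed).
    Assumes the feature takes at least two distinct values.
    """
    x = X[:, feature]
    # Quantile-based edges so each bin holds a similar number of observations.
    edges = np.unique(np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)))
    n_cells = len(edges) - 1
    # Bin index of every observation, in 0 .. n_cells - 1.
    idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_cells - 1)
    local = np.zeros(n_cells)
    counts = np.zeros(n_cells)
    for k in range(n_cells):
        rows = X[idx == k]
        if len(rows) == 0:
            continue
        lo, hi = rows.copy(), rows.copy()
        lo[:, feature] = edges[k]        # same rows, feature at lower edge
        hi[:, feature] = edges[k + 1]    # same rows, feature at upper edge
        local[k] = np.mean(predict(hi) - predict(lo))
        counts[k] = len(rows)
    # Accumulate the local effects, starting from zero at the first edge.
    ale = np.concatenate([[0.0], np.cumsum(local)])
    # Center so the count-weighted average over bins is zero (one convention).
    mid = 0.5 * (ale[:-1] + ale[1:])
    ale -= np.sum(mid * counts) / np.sum(counts)
    return edges, ale
```

For a model that is linear in the chosen feature, the increments of the returned curve equal the coefficient times the bin widths, even when the predictors are strongly correlated.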


Posted Content
TL;DR: Generalized random forests (GRF) are a non-parametric statistical estimation method based on random forests that can be used to fit any quantity of interest identified as the solution to a set of local moment equations.
Abstract: We propose generalized random forests, a method for non-parametric statistical estimation based on random forests (Breiman, 2001) that can be used to fit any quantity of interest identified as the solution to a set of local moment equations. Following the literature on local maximum likelihood estimation, our method considers a weighted set of nearby training examples; however, instead of using classical kernel weighting functions that are prone to a strong curse of dimensionality, we use an adaptive weighting function derived from a forest designed to express heterogeneity in the specified quantity of interest. We propose a flexible, computationally efficient algorithm for growing generalized random forests, develop a large sample theory for our method showing that our estimates are consistent and asymptotically Gaussian, and provide an estimator for their asymptotic variance that enables valid confidence intervals. We use our approach to develop new methods for three statistical tasks: non-parametric quantile regression, conditional average partial effect estimation, and heterogeneous treatment effect estimation via instrumental variables. A software implementation, grf for R and C++, is available from CRAN.

369 citations


Posted Content
TL;DR: In this paper, the Gaussian graphical model (GGM) is used to model the residual structure of a vector-autoregression analysis (VAR), and two network models can then be obtained: a temporal network and a contemporaneous network.
Abstract: We discuss the Gaussian graphical model (GGM; an undirected network of partial correlation coefficients) and detail its utility as an exploratory data analysis tool. The GGM shows which variables predict one-another, allows for sparse modeling of covariance structures, and may highlight potential causal relationships between observed variables. We describe the utility in 3 kinds of psychological datasets: datasets in which consecutive cases are assumed independent (e.g., cross-sectional data), temporally ordered datasets (e.g., n = 1 time series), and a mixture of the 2 (e.g., n > 1 time series). In time-series analysis, the GGM can be used to model the residual structure of a vector-autoregression analysis (VAR), also termed graphical VAR. Two network models can then be obtained: a temporal network and a contemporaneous network. When analyzing data from multiple subjects, a GGM can also be formed on the covariance structure of stationary means---the between-subjects network. We discuss the interpretation of these models and propose estimation methods to obtain these networks, which we implement in the R packages graphicalVAR and mlVAR. The methods are showcased in two empirical examples, and simulation studies on these methods are included in the supplementary materials.

306 citations


Journal ArticleDOI
TL;DR: The Sparsified Binary Segmentation (SBS) algorithm aggregates the CUSUM statistics obtained from local periodograms and cross-periodograms of the components of the input time series by adding only those that pass a certain threshold, which reduces the impact of irrelevant, noisy contributions.
Abstract: Time series segmentation, a.k.a. multiple change-point detection, is a well-established problem. However, few solutions are designed specifically for high-dimensional situations. In this paper, our interest is in segmenting the second-order structure of a high-dimensional time series. In a generic step of a binary segmentation algorithm for multivariate time series, one natural solution is to combine CUSUM statistics obtained from local periodograms and cross-periodograms of the components of the input time series. However, the standard "maximum" and "average" methods for doing so often fail in high dimensions when, for example, the change-points are sparse across the panel or the CUSUM statistics are spuriously large. In this paper, we propose the Sparsified Binary Segmentation (SBS) algorithm which aggregates the CUSUM statistics by adding only those that pass a certain threshold. This "sparsifying" step reduces the impact of irrelevant, noisy contributions, which is particularly beneficial in high dimensions. In order to show the consistency of SBS, we introduce the multivariate Locally Stationary Wavelet model for time series, which is a separate contribution of this work.

235 citations
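
The "sparsifying" aggregation step described in the abstract above can be illustrated with a short sketch. It is a simplification under stated assumptions: the CUSUM statistic here is the standard one applied to raw series values (the paper applies it to local periodograms and cross-periodograms), and the names `cusum` and `sparsified_aggregate`, as well as the way the threshold is supplied, are hypothetical.

```python
import numpy as np

def cusum(y):
    """Absolute CUSUM statistic of a 1-D series at every split b = 1 .. n-1."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    b = np.arange(1, n)
    left = np.cumsum(y)[:-1]       # sum of y[0:b]
    right = y.sum() - left         # sum of y[b:n]
    return np.abs(np.sqrt((n - b) / (n * b)) * left
                  - np.sqrt(b / (n * (n - b))) * right)

def sparsified_aggregate(panel, threshold):
    """Sum CUSUM statistics across series, keeping only values above threshold."""
    stats = np.array([cusum(row) for row in panel])   # shape (d, n - 1)
    kept = np.where(stats > threshold, stats, 0.0)    # drop small contributions
    return kept.sum(axis=0)
```

The maximizer of the aggregated curve is the candidate change-point in one step of binary segmentation; series whose CUSUM values never pass the threshold contribute nothing to it.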


Journal ArticleDOI
TL;DR: It is shown that using the Akaike information criterion or Bayes information criterion to select the tuning parameter may not be adequate for consistently identifying the true model, and a uniform choice of the model complexity penalty is proposed.
Abstract: Determining how to appropriately select the tuning parameter is essential in penalized likelihood methods for high-dimensional data analysis. We examine this problem in the setting of penalized likelihood methods for generalized linear models, where the dimensionality of covariates p is allowed to increase exponentially with the sample size n. We propose to select the tuning parameter by optimizing the generalized information criterion (GIC) with an appropriate model complexity penalty. To ensure that we consistently identify the true model, a range for the model complexity penalty is identified in GIC. We find that this model complexity penalty should diverge at the rate of some power of $\log p$ depending on the tail probability behavior of the response variables. This reveals that using the AIC or BIC to select the tuning parameter may not be adequate for consistently identifying the true model. Based on our theoretical study, we propose a uniform choice of the model complexity penalty and show that the proposed approach consistently identifies the true model among candidate models with asymptotic probability one. We justify the performance of the proposed procedure by numerical simulations and a gene expression data analysis.

229 citations


Journal ArticleDOI
TL;DR: It is shown that 3 of the 4 approaches yield valid inference, but that the performance of the methods varies with respect to the number of imputed data sets and the extent of missingness, and simulation studies reveal the behavior of the approaches in finite samples.
Abstract: Many modern estimators require bootstrapping to calculate confidence intervals because either no analytic standard error is available or the distribution of the parameter of interest is non-symmetric. It remains however unclear how to obtain valid bootstrap inference when dealing with multiple imputation to address missing data. We present four methods which are intuitively appealing, easy to implement, and combine bootstrap estimation with multiple imputation. We show that three of the four approaches yield valid inference, but that the performance of the methods varies with respect to the number of imputed data sets and the extent of missingness. Simulation studies reveal the behavior of our approaches in finite samples. A topical analysis from HIV treatment research, which determines the optimal timing of antiretroviral treatment initiation in young children, demonstrates the practical implications of the four methods in a sophisticated and realistic setting. This analysis suffers from missing data and uses the $g$-formula for inference, a method for which no standard errors are available.

200 citations


Posted Content
TL;DR: This paper discusses new research on identification strategies in program evaluation, with particular focus on synthetic control methods, regression discontinuity, external validity, and the causal interpretation of regression methods.
Abstract: In this paper we discuss recent developments in econometrics that we view as important for empirical researchers working on policy evaluation questions. We focus on three main areas, where in each case we highlight recommendations for applied work. First, we discuss new research on identification strategies in program evaluation, with particular focus on synthetic control methods, regression discontinuity, external validity, and the causal interpretation of regression methods. Second, we discuss various forms of supplementary analyses to make the identification strategies more credible. These include placebo analyses as well as sensitivity and robustness analyses. Third, we discuss recent advances in machine learning methods for causal effects. These advances include methods to adjust for differences between treated and control units in high-dimensional settings, and methods for identifying and estimating heterogeneous treatment effects.

199 citations


Posted Content
TL;DR: This comprehensive manuscript on higher order influence functions for inference in semi- and non-parametric models shows how to derive the higher order influence function of a product of functionals, and applies the result to the mean of a response subject to monotone missingness under missing at random and to the causal effect of a time dependent treatment under time-varying confounding.
Abstract: Robins et al. (2008) published a theory of higher order influence functions for inference in semi- and non-parametric models. This paper is a comprehensive manuscript from which Robins et al. was drawn. The current paper includes many results and proofs that were not included in Robins et al. due to space limitations. Particular results contained in the present paper that were not reported in Robins et al. include the following. Given a set of functionals and their corresponding higher order influence functions, we show how to derive the higher order influence function of their product. We apply this result to obtain higher order influence functions and associated estimators for the mean of a response Y subject to monotone missingness under missing at random. These results also apply to estimating the causal effect of a time dependent treatment on an outcome Y in the presence of time-varying confounding. Finally, we include an appendix that contains proofs for all theorems that were stated without proof in Robins et al. (2008). The initial part of the paper is closely related to Robins et al.; the latter parts differ.

152 citations


Posted Content
TL;DR: In this paper, a general introduction to network modeling in psychometrics is provided, with an introduction to the statistical model formulation of pairwise Markov random fields (PMRF), followed by an introduction of the Ising model suitable for binary data.
Abstract: This chapter provides a general introduction of network modeling in psychometrics. The chapter starts with an introduction to the statistical model formulation of pairwise Markov random fields (PMRF), followed by an introduction of the PMRF suitable for binary data: the Ising model. The Ising model is a model used in ferromagnetism to explain phase transitions in a field of particles. Following the description of the Ising model in statistical physics, the chapter continues to show that the Ising model is closely related to models used in psychometrics. The Ising model can be shown to be equivalent to certain kinds of logistic regression models, loglinear models and multi-dimensional item response theory (MIRT) models. The equivalence between the Ising model and the MIRT model puts standard psychometrics in a new light and leads to a strikingly different interpretation of well-known latent variable models. The chapter gives an overview of methods that can be used to estimate the Ising model, and concludes with a discussion on the interpretation of latent variables given the equivalence between the Ising model and MIRT.

150 citations


Posted Content
TL;DR: The reasons for the success of the INLA approach, the R-INLA package, why it is so accurate, why the approximations are very quick to compute, and why LGMs make such a useful concept for Bayesian computing are discussed.
Abstract: The key operation in Bayesian inference is to compute high-dimensional integrals. An old approximate technique is the Laplace method or approximation, which dates back to Pierre-Simon Laplace (1774). This simple idea approximates the integrand with a second order Taylor expansion around the mode and computes the integral analytically. By developing a nested version of this classical idea, combined with modern numerical techniques for sparse matrices, we obtain the approach of Integrated Nested Laplace Approximations (INLA) to do approximate Bayesian inference for latent Gaussian models (LGMs). LGMs represent an important model-abstraction for Bayesian inference and include a large proportion of the statistical models used today. In this review, we will discuss the reasons for the success of the INLA-approach, the R-INLA package, why it is so accurate, why the approximations are very quick to compute and why LGMs make such a useful concept for Bayesian computing.

131 citations
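
The classical Laplace approximation mentioned in the abstract above can be sketched in one dimension. This shows only the basic idea (a second order Taylor expansion of the log-integrand around its mode), not the nested scheme used by INLA or the R-INLA package. The function name `laplace_integral` and the use of Newton's method with user-supplied derivatives are assumptions made for the example.

```python
import math

def laplace_integral(h, dh, d2h, x0, n_iter=50):
    """Approximate the integral of exp(h(x)) over the real line.

    h, dh, d2h: log-integrand and its first and second derivatives.
    x0: starting point for Newton's method, assumed close enough to the mode.
    """
    x = x0
    for _ in range(n_iter):
        x = x - dh(x) / d2h(x)       # Newton step toward the mode, dh(x) = 0
    curvature = -d2h(x)              # positive at a maximum of h
    # Integral of the Gaussian obtained from the second order expansion.
    return math.exp(h(x)) * math.sqrt(2.0 * math.pi / curvature)

# Example: the Gamma integral with a = 11, log-integrand (a - 1) log x - x.
a = 11.0
approx = laplace_integral(
    h=lambda x: (a - 1.0) * math.log(x) - x,
    dh=lambda x: (a - 1.0) / x - 1.0,
    d2h=lambda x: -(a - 1.0) / x ** 2,
    x0=5.0,
)
```

For this integrand the approximation reproduces Stirling's formula, so `approx` is within about one percent of `math.gamma(11)`. For a Gaussian log-integrand the approximation is exact.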


Posted Content
TL;DR: This work studies the subtle but important decisions underlying the specification of a configuration model, and investigates the role these choices play in graph sampling procedures and a suite of applications, placing particular emphasis on the importance of specifying the appropriate graph labeling under which to consider a null model.
Abstract: Random graph null models have found widespread application in diverse research communities analyzing network datasets, including social, information, and economic networks, as well as food webs, protein-protein interactions, and neuronal networks. The most popular family of random graph null models, called configuration models, are defined as uniform distributions over a space of graphs with a fixed degree sequence. Commonly, properties of an empirical network are compared to properties of an ensemble of graphs from a configuration model in order to quantify whether empirical network properties are meaningful or whether they are instead a common consequence of the particular degree sequence. In this work we study the subtle but important decisions underlying the specification of a configuration model, and investigate the role these choices play in graph sampling procedures and a suite of applications. We place particular emphasis on the importance of specifying the appropriate graph labeling (stub-labeled or vertex-labeled) under which to consider a null model, a choice that closely connects the study of random graphs to the study of random contingency tables. We show that the choice of graph labeling is inconsequential for studies of simple graphs, but can have a significant impact on analyses of multigraphs or graphs with self-loops. The importance of these choices is demonstrated through a series of three vignettes, analyzing network datasets under many different configuration models and observing substantial differences in study conclusions under different models. We argue that in each case, only one of the possible configuration models is appropriate. While our work focuses on undirected static networks, it aims to guide the study of directed networks, dynamic networks, and all other network contexts that are suitably studied through the lens of random graph null models.
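
One member of the family of configuration models discussed in the abstract above, the stub-labeled model that allows multi-edges and self-loops, can be sampled by uniformly random stub matching. The sketch below is illustrative only; the function name `stub_matching` is hypothetical, and sampling the other graph spaces the abstract mentions (simple graphs, vertex-labeled multigraphs) needs different procedures that are not shown.

```python
import random

def stub_matching(degrees, seed=None):
    """Pair up stubs uniformly at random for a given degree sequence.

    Returns a list of edges (u, v). Self-loops and repeated edges can occur,
    so the result is a loopy multigraph with exactly the requested degrees.
    """
    if sum(degrees) % 2 != 0:
        raise ValueError("the degree sum must be even")
    rng = random.Random(seed)
    # One stub (half-edge) per unit of degree, labeled by its vertex.
    stubs = [v for v, d in enumerate(degrees) for _ in range(d)]
    rng.shuffle(stubs)
    # Consecutive stubs in the shuffled list are joined into edges.
    return list(zip(stubs[0::2], stubs[1::2]))
```

Comparing a statistic of the observed network with its distribution over many such samples is the usual null-model workflow; the abstract's point is that the appropriate graph space and labeling must be chosen before sampling.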

Journal ArticleDOI
TL;DR: In this article, a modified MI strategy called multiple imputation, then deletion (MID) is proposed to handle missing values of the dependent variable Y in a generalized linear model, where all cases are used for imputation but following imputation cases with imputed Y values are excluded from the analysis.
Abstract: When fitting a generalized linear model -- such as a linear regression, a logistic regression, or a hierarchical linear model -- analysts often wonder how to handle missing values of the dependent variable Y. If missing values have been filled in using multiple imputation, the usual advice is to use the imputed Y values in analysis. We show, however, that using imputed Ys can add needless noise to the estimates. Better estimates can usually be obtained using a modified strategy that we call multiple imputation, then deletion (MID). Under MID, all cases are used for imputation, but following imputation cases with imputed Y values are excluded from the analysis. When there is something wrong with the imputed Y values, MID protects the estimates from the problematic imputations. And when the imputed Y values are acceptable, MID usually offers somewhat more efficient estimates than an ordinary MI strategy.

Posted Content
TL;DR: A two-stage procedure is recommended in which you conduct a pilot analysis using a small-to-moderate number of imputations, then use the results to calculate the number of imputations that are needed for a final analysis whose standard error estimates will have the desired level of replicability.
Abstract: When using multiple imputation, users often want to know how many imputations they need. An old answer is that 2 to 10 imputations usually suffice, but this recommendation only addresses the efficiency of point estimates. You may need more imputations if, in addition to efficient point estimates, you also want standard error (SE) estimates that would not change (much) if you imputed the data again. For replicable SE estimates, the required number of imputations increases quadratically with the fraction of missing information (not linearly, as previous studies have suggested). I recommend a two-stage procedure in which you conduct a pilot analysis using a small-to-moderate number of imputations, then use the results to calculate the number of imputations that are needed for a final analysis whose SE estimates will have the desired level of replicability. I implement the two-stage procedure using a new Stata command called how_many_imputations (available from SSC) and a new SAS macro called %mi_combine (available from the website this http URL).

Posted Content
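TL;DR: The authors propose a generalization of bias-reduced linearization (BRL), which corrects cluster-robust variance estimation (CRVE) so that it is unbiased under a working model, that can be applied in models with arbitrary sets of fixed effects, together with a small-sample test for multiple-parameter hypotheses that generalizes t-tests with Satterthwaite degrees of freedom.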
Abstract: In longitudinal panels and other regression models with unobserved effects, fixed effects estimation is often paired with cluster-robust variance estimation (CRVE) in order to account for heteroskedasticity and un-modeled dependence among the errors. CRVE is asymptotically consistent as the number of independent clusters increases, but can be biased downward for sample sizes often found in applied work, leading to hypothesis tests with overly liberal rejection rates. One solution is to use bias-reduced linearization (BRL), which corrects the CRVE so that it is unbiased under a working model, and t-tests with Satterthwaite degrees of freedom. We propose a generalization of BRL that can be applied in models with arbitrary sets of fixed effects, where the original BRL method is undefined, and describe how to apply the method when the regression is estimated after absorbing the fixed effects. We also propose a small-sample test for multiple-parameter hypotheses, which generalizes the Satterthwaite approximation for t-tests. In simulations covering a variety of study designs, we find that conventional cluster-robust Wald tests can severely under-reject while the proposed small-sample test maintains Type I error very close to nominal levels.

Posted Content
TL;DR: A recently proposed parameterisation of the BYM model is discussed that leads to improved parameter control as the hyperparameters can be seen independently from each other, and the need for a scaled spatial component is addressed, which facilitates assignment of interpretable hyperpriors.
Abstract: In recent years, disease mapping studies have become a routine application within geographical epidemiology and are typically analysed within a Bayesian hierarchical model formulation. A variety of model formulations for the latent level have been proposed but all come with inherent issues. In the classical BYM model, the spatially structured component cannot be seen independently from the unstructured component. This makes prior definitions for the hyperparameters of the two random effects challenging. There are alternative model formulations that address this confounding; however, the issue of how to choose interpretable hyperpriors is still unsolved. Here, we discuss a recently proposed parameterisation of the BYM model that leads to improved parameter control as the hyperparameters can be seen independently from each other. Furthermore, the need for a scaled spatial component is addressed, which facilitates assignment of interpretable hyperpriors and makes these transferable between spatial applications with different graph structures. We provide implementation details for the new model formulation which preserve sparsity properties, and we investigate systematically the model performance and compare it to existing parameterisations. Through a simulation study, we show that the new model performs well, both showing good learning abilities and good shrinkage behaviour. In terms of model choice criteria, the proposed model performs at least equally well as existing parameterisations, but only the new formulation offers parameters that are interpretable and hyperpriors that have a clear meaning.

Journal ArticleDOI
TL;DR: In this article, the authors investigated the properties of a range of commonly used frequentist and Bayesian procedures in extensive simulation studies, and the consequences for interval estimation of the common treatment effect in random effects meta-analysis are assessed.
Abstract: Meta-analyses in orphan diseases and small populations generally face particular problems including small numbers of studies, small study sizes, and heterogeneity of results. However, the heterogeneity is difficult to estimate if only very few studies are included. Motivated by a systematic review in immunosuppression following liver transplantation in children we investigate the properties of a range of commonly used frequentist and Bayesian procedures in extensive simulation studies. Furthermore, the consequences for interval estimation of the common treatment effect in random effects meta-analysis are assessed. The Bayesian credibility intervals using weakly informative priors for the between-trial heterogeneity exhibited coverage probabilities in excess of the nominal level for a range of scenarios considered. However, they tended to be shorter than those obtained by the Knapp-Hartung method, which were also conservative. In contrast, methods based on normal quantiles exhibited coverages well below the nominal levels in many scenarios. With very few studies, the performance of the Bayesian credibility intervals is of course sensitive to the specification of the prior for the between trial heterogeneity. In conclusion, the use of weakly informative priors as exemplified by half-normal priors (with scale 0.5 or 1.0) for log odds ratios is recommended for applications in rare diseases.

Posted Content
TL;DR: An extended unconfoundedness assumption that accounts for interference is proposed, and new covariate-adjustment methods are developed that lead to valid estimates of treatment and interference effects in observational studies on networks.
Abstract: Causal inference on a population of units connected through a network often presents technical challenges, including how to account for interference. In the presence of local interference, for instance, potential outcomes of a unit depend on its treatment as well as on the treatments of other local units, such as its neighbors according to the network. In observational studies, a further complication is that the typical unconfoundedness assumption must be extended - say, to include the treatment of neighbors, and individual and neighborhood covariates - to guarantee identification and valid inference. Here, we propose new estimands that define treatment and interference effects. We then derive analytical expressions for the bias of a naive estimator that wrongly assumes away interference. The bias depends on the level of interference but also on the degree of association between individual and neighborhood treatments. We propose an extended unconfoundedness assumption that accounts for interference, and we develop new covariate-adjustment methods that lead to valid estimates of treatment and interference effects in observational studies on networks. Estimation is based on a generalized propensity score that balances individual and neighborhood covariates across units under different levels of individual treatment and of exposure to neighbors' treatment. We carry out simulations, calibrated using friendship networks and covariates in a nationally representative longitudinal study of adolescents in grades 7-12, in the United States, to explore finite-sample performance in different realistic settings.

Posted Content
TL;DR: The main theoretical result proves that the SABHA method controls the FDR at a level that is at most slightly higher than the target FDR level, as long as the adaptive weights are constrained sufficiently so as not to overfit too much to the data.
Abstract: In multiple testing problems, where a large number of hypotheses are tested simultaneously, false discovery rate (FDR) control can be achieved with the well-known Benjamini-Hochberg procedure, which adapts to the amount of signal present in the data. Many modifications of this procedure have been proposed to improve power in scenarios where the hypotheses are organized into groups or into a hierarchy, as well as other structured settings. Here we introduce SABHA, the "structure-adaptive Benjamini-Hochberg algorithm", as a generalization of these adaptive testing methods. SABHA incorporates prior information about any pre-determined type of structure in the pattern of locations of the signals and nulls within the list of hypotheses, to reweight the p-values in a data-adaptive way. This raises the power by making more discoveries in regions where signals appear to be more common. Our main theoretical result proves that SABHA controls FDR at a level that is at most slightly higher than the target FDR level, as long as the adaptive weights are constrained sufficiently so as not to overfit too much to the data; interestingly, the excess FDR can be related to the Rademacher complexity or Gaussian width of the class from which we choose our data-adaptive weights. We apply this general framework to various structured settings, including ordered, grouped, and low total variation structures, and get the bounds on FDR for each specific setting. We also examine the empirical performance of SABHA on fMRI activity data and on gene/drug response data, as well as on simulated data.
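
The Benjamini-Hochberg step-up procedure that the abstract above builds on can be sketched as follows. The optional `weights` argument only illustrates the general idea of reweighting p-values before the step-up rule. How SABHA estimates its weights from the data is not reproduced here, and the function name and signature are assumptions made for the example.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.1, weights=None):
    """Return a boolean rejection mask from the BH step-up rule.

    weights: optional multipliers applied to the p-values first (hypothetical
    stand-in for data-adaptive reweighting; weights below 1 favor rejection).
    """
    p = np.asarray(pvals, dtype=float)
    if weights is not None:
        p = p * np.asarray(weights, dtype=float)
    n = len(p)
    order = np.argsort(p)
    # Compare the k-th smallest (reweighted) p-value with alpha * k / n.
    passed = p[order] <= alpha * np.arange(1, n + 1) / n
    reject = np.zeros(n, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()   # largest index that passes
        reject[order[: k + 1]] = True     # reject it and everything smaller
    return reject
```

With p-values `[0.001, 0.02, 0.04, 0.5, 0.9]` and `alpha=0.1`, the unweighted rule rejects the first three hypotheses; down-weighting the fourth p-value by a factor of 0.1 adds a fourth rejection.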

Journal ArticleDOI
TL;DR: In this paper, the Hartung-Knapp-Sidik-Jonkman method (HKSJ), the modified Knapp-Hartung method (mKH, a variation of the HKSJ method) and Bayesian random-effects meta-analyses with priors covering plausible heterogeneity values are compared in the special setting of only two studies.
Abstract: Random-effects meta-analyses are used to combine evidence of treatment effects from multiple studies. Since treatment effects may vary across trials due to differences in study characteristics, heterogeneity in treatment effects between studies must be accounted for to achieve valid inference. The standard model for random-effects meta-analysis assumes approximately normal effect estimates and a normal random-effects model. However, standard methods based on this model ignore the uncertainty in estimating the between-trial heterogeneity. In the special setting of only two studies and in the presence of heterogeneity we investigate here alternatives such as the Hartung-Knapp-Sidik-Jonkman method (HKSJ), the modified Knapp-Hartung method (mKH, a variation of the HKSJ method) and Bayesian random-effects meta-analyses with priors covering plausible heterogeneity values. The properties of these methods are assessed by applying them to five examples from various rare diseases and by a simulation study. Whereas the standard method based on normal quantiles has poor coverage, the HKSJ and mKH generally lead to very long, and therefore inconclusive, confidence intervals. The Bayesian intervals on the whole show satisfying properties and offer a reasonable compromise between these two extremes.

Posted Content
TL;DR: In this article, the authors developed a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the total number of observational units.
Abstract: This paper develops a framework for testing for associations in a possibly high-dimensional linear model where the number of features/variables may far exceed the number of observational units. In this framework, the observations are split into two groups, where the first group is used to screen for a set of potentially relevant variables, whereas the second is used for inference over this reduced set of variables; we also develop strategies for leveraging information from the first part of the data at the inference step for greater power. In our work, the inferential step is carried out by applying the recently introduced knockoff filter, which creates a knockoff copy (a fake variable serving as a control) for each screened variable. We prove that this procedure controls the directional false discovery rate (FDR) in the reduced model controlling for all screened variables; this says that our high-dimensional knockoff procedure 'discovers' important variables as well as the directions (signs) of their effects, in such a way that the expected proportion of wrongly chosen signs is below the user-specified level (thereby controlling a notion of Type S error averaged over the selected set). This result is non-asymptotic, and holds for any distribution of the original features and any values of the unknown regression coefficients, so that inference is not calibrated under hypothesized values of the effect sizes. We demonstrate the performance of our general and flexible approach through numerical studies, showing more power than existing alternatives. Finally, we apply our method to a genome-wide association study to find locations on the genome that are possibly associated with a continuous phenotype.

Posted Content
TL;DR: In this article, the authors introduce a formal definition for moderated effects in terms of potential outcomes, a definition that is particularly suited to mobile interventions, where treatment occasions are numerous, individuals are not always available for treatment, and potential moderators might be influenced by past treatment.
Abstract: In mobile health interventions aimed at behavior change and maintenance, treatments are provided in real time to manage current or impending high risk situations or promote healthy behaviors in near real time. Currently there is great scientific interest in developing data analysis approaches to guide the development of mobile interventions. In particular, data from mobile health studies might be used to examine effect moderators, i.e., individual characteristics, time-varying context or past treatment response that moderate the effect of current treatment on a subsequent response. This paper introduces a formal definition for moderated effects in terms of potential outcomes, a definition that is particularly suited to mobile interventions, where treatment occasions are numerous, individuals are not always available for treatment, and potential moderators might be influenced by past treatment. Methods for estimating moderated effects are developed and compared. The proposed approach is illustrated using BASICS-Mobile, a smartphone-based intervention designed to curb heavy drinking and smoking among college students.

Posted Content
TL;DR: This paper shows that results under the horseshoe prior can be sensitive to the prior chosen for the global shrinkage hyperparameter, argues that previous default choices favor too many unshrunk coefficients, and proposes setting that prior from beliefs about the number of nonzero coefficients.
Abstract: The horseshoe prior has proven to be a noteworthy alternative for sparse Bayesian estimation, but as shown in this paper, the results can be sensitive to the prior choice for the global shrinkage hyperparameter. We argue that the previous default choices are dubious due to their tendency to favor solutions with more unshrunk coefficients than we typically expect a priori. This can lead to bad results if this parameter is not strongly identified by data. We derive the relationship between the global parameter and the effective number of nonzeros in the coefficient vector, and show an easy and intuitive way of setting up the prior for the global parameter based on our prior beliefs about the number of nonzero coefficients in the model. The results on real world data show that one can benefit greatly (in terms of improved parameter estimates, prediction accuracy, and reduced computation time) from transforming even a crude guess for the number of nonzero coefficients into the prior for the global parameter using our framework.
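As a rough illustration of turning a guess about sparsity into a global scale, the sketch below uses formulas that are an assumption here: they correspond to the Gaussian linear-regression case as treated in the full paper, are not stated in the abstract, and should be checked against the paper before use. With D coefficients, n observations and noise standard deviation sigma, the prior expected effective number of nonzeros for a given global scale tau is taken to be D * a / (1 + a), where a = tau * sqrt(n) / sigma. The function names are hypothetical.

```python
import math


def expected_nonzeros(tau, n_coef, n_obs, sigma=1.0):
    """Assumed prior mean of the effective number of nonzero coefficients."""
    a = tau * math.sqrt(n_obs) / sigma
    return n_coef * a / (1.0 + a)


def global_scale_guess(p0, n_coef, n_obs, sigma=1.0):
    """Global scale tau0 whose implied expected number of nonzeros equals p0.

    Obtained by solving expected_nonzeros(tau0, ...) == p0 for tau0.
    """
    if not 0 < p0 < n_coef:
        raise ValueError("p0 must lie strictly between 0 and n_coef")
    return p0 / (n_coef - p0) * sigma / math.sqrt(n_obs)


# Hypothetical setting: 100 coefficients, 100 observations, about 5 nonzeros.
tau0 = global_scale_guess(p0=5, n_coef=100, n_obs=100)
print(f"tau0 = {tau0:.5f}")
print(f"implied nonzeros at tau0: {expected_nonzeros(tau0, 100, 100):.2f}")
print(f"implied nonzeros at tau = 1: {expected_nonzeros(1.0, 100, 100):.2f}")
```

Under these assumed formulas, a global scale near 1 would imply roughly 91 of 100 coefficients being effectively nonzero, which illustrates the abstract's concern that earlier defaults favor more unshrunk coefficients than is usually expected.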

Posted Content
TL;DR: The most frequently used methods for distinguishing common and distinct components are explained in this framework, and some practical examples are given of these methods in the areas of medical biology and food science.
Abstract: In many areas of science multiple sets of data are collected pertaining to the same system. Examples are food products which are characterized by different sets of variables, bio-processes which are on-line sampled with different instruments, or biological systems of which different genomics measurements are obtained. Data fusion is concerned with analyzing such sets of data simultaneously to arrive at a global view of the system under study. One of the upcoming areas of data fusion is exploring whether the data sets have something in common or not. This gives insight into common and distinct variation in each data set, thereby facilitating understanding the relationships between the data sets. Unfortunately, research on methods to distinguish common and distinct components is fragmented, both in terminology and in methods: there is no common ground, which hampers comparing methods and understanding their relative merits. This paper provides a unifying framework for this subfield of data fusion by using rigorous arguments from linear algebra. The most frequently used methods for distinguishing common and distinct components are explained in this framework and some practical examples are given of these methods in the areas of (medical) biology and food science.

Journal ArticleDOI
TL;DR: It is demonstrated that three Ising model representations exist that, although each proposes a distinct theoretical explanation for the observed associations, are mathematically equivalent, allowing the researcher to interpret the results of one model in three different ways.
Abstract: Statistical models that analyse (pairwise) relations between variables encompass assumptions about the underlying mechanism that generated the associations in the observed data. In the present paper we demonstrate that three Ising model representations exist that, although each proposes a distinct theoretical explanation for the observed associations, are mathematically equivalent. This equivalence allows the researcher to interpret the results of one model in three different ways. We illustrate the ramifications of this by discussing concepts that are conceived as problematic in their traditional explanation, yet when interpreted in the context of another explanation make immediate sense.

Posted Content
TL;DR: In this article, the authors consider the asymptotic behavior of the posterior distribution obtained by approximate Bayesian computation and give general results on the rate at which the posterior distribution concentrates on sets containing the true parameter.
Abstract: Approximate Bayesian computation allows for statistical analysis in models with intractable likelihoods. In this paper we consider the asymptotic behaviour of the posterior distribution obtained by this method. We give general results on the rate at which the posterior distribution concentrates on sets containing the true parameter, its limiting shape, and the asymptotic distribution of the posterior mean. These results hold under given rates for the tolerance used within the method, mild regularity conditions on the summary statistics, and a condition linked to identification of the true parameters. Implications for practitioners are discussed.

Posted ContentDOI
TL;DR: The foundations for a general theory of statistical causal modeling with SCMs are provided, allowing for the presence of both latent confounders and cycles, and a class of simple SCMs is introduced that extends the class of acyclic SCMs to the cyclic setting, while preserving many of the convenient properties.
Abstract: Structural causal models (SCMs), also known as (nonparametric) structural equation models (SEMs), are widely used for causal modeling purposes. In particular, acyclic SCMs, also known as recursive SEMs, form a well-studied subclass of SCMs that generalize causal Bayesian networks to allow for latent confounders. In this paper, we investigate SCMs in a more general setting, allowing for the presence of both latent confounders and cycles. We show that in the presence of cycles, many of the convenient properties of acyclic SCMs do not hold in general: they do not always have a solution; they do not always induce unique observational, interventional and counterfactual distributions; a marginalization does not always exist, and if it exists the marginal model does not always respect the latent projection; they do not always satisfy a Markov property; and their graphs are not always consistent with their causal semantics. We prove that for SCMs in general each of these properties does hold under certain solvability conditions. Our work generalizes results for SCMs with cycles that were only known for certain special cases so far. We introduce the class of simple SCMs that extends the class of acyclic SCMs to the cyclic setting, while preserving many of the convenient properties of acyclic SCMs. With this paper we aim to provide the foundations for a general theory of statistical causal modeling with SCMs.

Journal ArticleDOI
TL;DR: In this paper, a unified view of likelihood-based Gaussian process regression for simulation experiments exhibiting input-dependent noise is presented, where multiple applications of a well-known Woodbury identity facilitate inference for all parameters under the likelihood, bypassing the typical full-data sized calculations.
Abstract: We present a unified view of likelihood-based Gaussian process regression for simulation experiments exhibiting input-dependent noise. Replication plays an important role in that context; however, previous methods leveraging replicates have either ignored the computational savings that come from such design, or have short-cut full likelihood-based inference to remain tractable. Starting with homoskedastic processes, we show how multiple applications of a well-known Woodbury identity facilitate inference for all parameters under the likelihood (without approximation), bypassing the typical full-data sized calculations. We then borrow a latent-variable idea from machine learning to address heteroskedasticity, adapting it to work within the same thrifty inferential framework, thereby simultaneously leveraging the computational and statistical efficiency of designs with replication. The result is an inferential scheme that can be characterized as a single objective function, complete with closed form derivatives, for rapid library-based optimization. Illustrations are provided, including real-world simulation experiments from manufacturing and the management of epidemics.

Posted Content
TL;DR: In this article, the authors provide computationally attractive procedures to construct confidence sets (CSs) for identified sets of full parameters and subvectors in models defined through a likelihood or a vector of moment equalities or inequalities.
Abstract: In complicated/nonlinear parametric models, it is generally hard to know whether the model parameters are point identified. We provide computationally attractive procedures to construct confidence sets (CSs) for identified sets of full parameters and of subvectors in models defined through a likelihood or a vector of moment equalities or inequalities. These CSs are based on level sets of optimal sample criterion functions (such as likelihood or optimally-weighted or continuously-updated GMM criterions). The level sets are constructed using cutoffs that are computed via Monte Carlo (MC) simulations directly from the quasi-posterior distributions of the criterions. We establish new Bernstein-von Mises (or Bayesian Wilks) type theorems for the quasi-posterior distributions of the quasi-likelihood ratio (QLR) and profile QLR in partially-identified regular models and some non-regular models. These results imply that our MC CSs have exact asymptotic frequentist coverage for identified sets of full parameters and of subvectors in partially-identified regular models, and have valid but potentially conservative coverage in models with reduced-form parameters on the boundary. Our MC CSs for identified sets of subvectors are shown to have exact asymptotic coverage in models with singularities. We also provide results on uniform validity of our CSs over classes of DGPs that include point and partially identified models. We demonstrate good finite-sample coverage properties of our procedures in two simulation experiments. Finally, our procedures are applied to two non-trivial empirical examples: an airline entry game and a model of trade flows.

Posted Content
TL;DR: In this paper, the authors developed three extensions to instrumental variable methods using robust regression, the penalization of weights from candidate instruments with heterogeneous causal estimates, and L1 penalization.
Abstract: Mendelian randomization is the use of genetic variants to make causal inferences from observational data. The field is currently undergoing a revolution fuelled by increasing numbers of genetic variants demonstrated to be associated with exposures in genome-wide association studies, and the public availability of summarized data on genetic associations with exposures and outcomes from large consortia. A Mendelian randomization analysis with many genetic variants can be performed relatively simply using summarized data. However, a causal interpretation is only assured if each genetic variant satisfies the assumptions of an instrumental variable. To provide some protection against failure of these assumptions, robust methods for instrumental variable analysis have been proposed. Here, we develop three extensions to instrumental variable methods using: i) robust regression, ii) the penalization of weights from candidate instruments with heterogeneous causal estimates, and iii) L1 penalization. Results from a wide variety of robust methods, including the recently-proposed MR-Egger and median-based methods, are compared in an extensive simulation study. We demonstrate that two methods, robust regression in an inverse-variance weighted method and a simple median of the causal estimates from the individual variants, have considerably improved Type 1 error rates compared with conventional methods in a wide variety of scenarios when up to 30% of the genetic variants are invalid instruments. While the MR-Egger method gives unbiased estimates when its assumptions are satisfied, these estimates are less efficient than those from other methods and are highly sensitive to violations of the assumptions. Methods that make different assumptions should be used routinely to assess the robustness of findings from applied Mendelian randomization investigations with multiple genetic variants.
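The contrast between a conventional estimate and a median-based one can be sketched from summarized data as follows. This is a minimal illustration, not the paper's implementation: it computes the conventional inverse-variance weighted (IVW) estimate with first-order weights and the simple (unweighted) median of the per-variant ratio estimates. The function names and the toy numbers are hypothetical, and the robust-regression variant mentioned in the abstract is not implemented here.

```python
from statistics import median


def ratio_estimates(beta_exposure, beta_outcome):
    """Per-variant causal estimates: outcome association / exposure association."""
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Conventional IVW estimate (weighted regression through the origin)."""
    num = sum(bx * by / se**2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den


def simple_median_estimate(beta_exposure, beta_outcome):
    """Simple median of the per-variant ratio estimates."""
    return median(ratio_estimates(beta_exposure, beta_outcome))


# Hypothetical summarized data: the causal effect is 0.5 for four variants;
# the fifth variant is an invalid instrument with a direct effect on the outcome.
beta_x = [0.1, 0.2, 0.3, 0.4, 0.5]
beta_y = [0.05, 0.10, 0.15, 0.20, 0.55]
se_y = [0.05] * 5

print(f"IVW:           {ivw_estimate(beta_x, beta_y, se_y):.3f}")
print(f"simple median: {simple_median_estimate(beta_x, beta_y):.3f}")
```

In this toy data set the single invalid variant pulls the IVW estimate well above 0.5, while the median of the ratio estimates stays at 0.5, because a median is unaffected as long as fewer than half of the variants are invalid.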

Posted Content
TL;DR: The Narrowest-Over-Threshold (NOT) method, introduced in this paper, detects generalised change-points by focusing on the smallest local sections of the data on which the existence of a feature is suspected.
Abstract: We propose a new, generic and flexible methodology for nonparametric function estimation, in which we first estimate the number and locations of any features that may be present in the function, and then estimate the function parametrically between each pair of neighbouring detected features. Examples of features handled by our methodology include change-points in the piecewise-constant signal model, kinks in the piecewise-linear signal model, and other similar irregularities, which we also refer to as generalised change-points. Our methodology works with only minor modifications across a range of generalised change-point scenarios, and we achieve such a high degree of generality by proposing and using a new multiple generalised change-point detection device, termed Narrowest-Over-Threshold (NOT). The key ingredient of NOT is its focus on the smallest local sections of the data on which the existence of a feature is suspected. Crucially, this adaptive localisation technique prevents NOT from considering subsamples containing two or more features, a key factor that ensures the general applicability of NOT. For selected scenarios, we show the consistency and near-optimality of NOT in detecting the number and locations of generalised change-points. Furthermore, we propose to select NOT's threshold (automatically) via the strengthened Schwarz Information Criterion (sSIC) and give theoretical justifications. The NOT estimators are easy to implement and rapid to compute: the entire threshold-indexed solution path can be computed in close-to-linear time. Importantly, the NOT approach is easy to extend by the user to tailor to their own needs. There is no single competitor, but we show that the performance of NOT matches or surpasses the state of the art in the scenarios tested. Our methodology is implemented in the R package not.