scispace - formally typeset
Search or ask a question

Showing papers by "Clark Glymour published in 2019"


Journal ArticleDOI
TL;DR: This paper aims to give a introduction to and a brief review of the computational methods for causal discovery that were developed in the past three decades, including constraint-based and score-based methods and those based on functional causal models, supplemented by some illustrations and applications.
Abstract: A fundamental task in various disciplines of science, including biology, is to find underlying causal relations and make use of them. Causal relations can be seen if interventions are properly applied; however, in many cases they are difficult or even impossible to conduct. It is then necessary to discover causal relations by analyzing statistical properties of purely observational data, which is known as causal discovery or causal structure search. This paper aims to give a introduction to and a brief review of the computational methods for causal discovery that were developed in the past three decades, including constraint-based and score-based methods and those based on functional causal models, supplemented by some illustrations and applications.

489 citations


Journal ArticleDOI
TL;DR: An overview of causal inference frameworks is given, promising applications and methodological challenges are identified, and a causality benchmark platform is initiated to close the gap between method users and developers.
Abstract: The heart of the scientific enterprise is a rational effort to understand the causes behind the phenomena we observe. In large-scale complex dynamical systems such as the Earth system, real experiments are rarely feasible. However, a rapidly increasing amount of observational and simulated data opens up the use of novel data-driven causal methods beyond the commonly adopted correlation techniques. Here, we give an overview of causal inference frameworks and identify promising generic application cases common in Earth system sciences and beyond. We discuss challenges and initiate the benchmark platform causeme.net to close the gap between method users and developers.

460 citations


Journal ArticleDOI
04 Feb 2019
TL;DR: The adequacies of several proposed and two new statistical methods for recovering the causal structure of systems with feedback from synthetic BOLD time series are tested and Fast Adjacency Skewness (FASK) and Two-Step, both of which exploit non-Gaussian features of the BOLD signal are introduced.
Abstract: We test the adequacies of several proposed and two new statistical methods for recovering the causal structure of systems with feedback from synthetic BOLD time series. We compare an adaptation of the first correct method for recovering cyclic linear systems; Granger causal regression; a multivariate autoregressive model with a permutation test; the Group Iterative Multiple Model Estimation (GIMME) algorithm; the Ramsey et al. non-Gaussian methods; two non-Gaussian methods by Hyvarinen and Smith; a method due to Patel et al.; and the GlobalMIT algorithm. We introduce and also compare two new methods, Fast Adjacency Skewness (FASK) and Two-Step, both of which exploit non-Gaussian features of the BOLD signal. We give theoretical justifications for the latter two algorithms. Our test models include feedback structures with and without direct feedback (2-cycles), excitatory and inhibitory feedback, models using experimentally determined structural connectivities of macaques, and empirical human resting-state and task data. We find that averaged over all of our simulations, including those with 2-cycles, several of these methods have a better than 80% orientation precision (i.e., the probability of a directed edge is in the true structure given that a procedure estimates it to be so) and the two new methods also have better than 80% recall (probability of recovering an orientation in the true structure).

66 citations


Journal ArticleDOI
TL;DR: A new algorithm is used, CausalMGM, to identify variables directly linked to disease diagnosis and progression in various multi-modal datasets, including clinical datasets from chronic obstructive pulmonary disease (COPD).
Abstract: Motivation Integration of data from different modalities is a necessary step for multi-scale data analysis in many fields, including biomedical research and systems biology Directed graphical models offer an attractive tool for this problem because they can represent both the complex, multivariate probability distributions and the causal pathways influencing the system Graphical models learned from biomedical data can be used for classification, biomarker selection and functional analysis, while revealing the underlying network structure and thus allowing for arbitrary likelihood queries over the data Results In this paper, we present and test new methods for finding directed graphs over mixed data types (continuous and discrete variables) We used this new algorithm, CausalMGM, to identify variables directly linked to disease diagnosis and progression in various multi-modal datasets, including clinical datasets from chronic obstructive pulmonary disease (COPD) COPD is the third leading cause of death and a major cause of disability and thus determining the factors that cause longitudinal lung function decline is very important Applied on a COPD dataset, mixed graphical models were able to confirm and extend previously described causal effects and provide new insights on the factors that potentially affect the longitudinal lung function decline of COPD patients Availability and implementation The CausalMGM package is available on http://wwwcausalmgmorg Supplementary information Supplementary data are available at Bioinformatics online

46 citations


Posted Content
TL;DR: This paper treats forecasting as a problem in Bayesian inference in the causal model, which exploits the timevarying property of the data and adapts to new observations in a principled manner.
Abstract: In many scientific fields, such as economics and neuroscience, we are often faced with nonstationary time series, and concerned with both finding causal relations and forecasting the values of variables of interest, both of which are particularly challenging in such nonstationary environments. In this paper, we study causal discovery and forecasting for nonstationary time series. By exploiting a particular type of state-space model to represent the processes, we show that nonstationarity helps to identify causal structure and that forecasting naturally benefits from learned causal knowledge. Specifically, we allow changes in both causal strengths and noise variances in the nonlinear state-space models, which, interestingly, renders both the causal structure and model parameters identifiable. Given the causal model, we treat forecasting as a problem in Bayesian inference in the causal model, which exploits the time-varying property of the data and adapts to new observations in a principled manner. Experimental results on synthetic and real-world data sets demonstrate the efficacy of the proposed methods.

31 citations


Proceedings Article
01 Dec 2019
TL;DR: This paper designs a form of "pseudo-residual" from three variables, and shows that when causal relations are linear and noise terms are non-Gaussian, the causal direction between the latent variables for the three observed variables is identifiable by checking a certain kind of independence relationship.
Abstract: Learning causal structure from observational data has attracted much attention, and it is notoriously challenging to find the underlying structure in the presence of confounders (hidden direct common causes of two variables). In this paper, by properly leveraging the non-Gaussianity of the data, we propose to estimate the structure over latent variables with the so-called Triad constraints: we design a form of "pseudo-residual" from three variables, and show that when causal relations are linear and noise terms are non-Gaussian, the causal direction between the latent variables for the three observed variables is identifiable by checking a certain kind of independence relationship. In other words, the Triad constraints help us to locate latent confounders and determine the causal direction between them. This goes far beyond the Tetrad constraints and reveals more information about the underlying structure from non-Gaussian data. Finally, based on the Triad constraints, we develop a two-step algorithm to learn the causal structure corresponding to measurement models. Experimental results on both synthetic and real data demonstrate the effectiveness and reliability of our method.

29 citations


Proceedings Article
01 Dec 2019
TL;DR: This paper proposes a specific and shared causal model (SSCM), which takes into account the variabilities of causal relations across individuals/groups and leverages their commonalities to achieve statistically reliable estimation.
Abstract: State-of-the-art approaches to causal discovery usually assume a fixed underlying causal model. However, it is often the case that causal models vary across domains or subjects, due to possibly omitted factors that affect the quantitative causal effects. As a typical example, causal connectivity in the brain network has been reported to vary across individuals, with significant differences across groups of people, such as autistics and typical controls. In this paper, we develop a unified framework for causal discovery and mechanism-based group identification. In particular, we propose a specific and shared causal model (SSCM), which takes into account the variabilities of causal relations across individuals/groups and leverages their commonalities to achieve statistically reliable estimation. The learned SSCM gives the specific causal knowledge for each individual as well as the general trend over the population. In addition, the estimated model directly provides the group information of each individual. Experimental results on synthetic and real-world data demonstrate the efficacy of the proposed method.

18 citations


Proceedings Article
24 May 2019
TL;DR: In this paper, a causal discovery and forecasting method for non-stationary time series is proposed, where changes in both causal strengths and noise variances in the nonlinear state-space models render both the causal structure and model parameters identifiable.
Abstract: In many scientific fields, such as economics and neuroscience, we are often faced with nonstationary time series, and concerned with both finding causal relations and forecasting the values of variables of interest, both of which are particularly challenging in such nonstationary environments. In this paper, we study causal discovery and forecasting for nonstationary time series. By exploiting a particular type of state-space model to represent the processes, we show that nonstationarity helps to identify causal structure and that forecasting naturally benefits from learned causal knowledge. Specifically, we allow changes in both causal strengths and noise variances in the nonlinear state-space models, which, interestingly, renders both the causal structure and model parameters identifiable. Given the causal model, we treat forecasting as a problem in Bayesian inference in the causal model, which exploits the timevarying property of the data and adapts to new observations in a principled manner. Experimental results on synthetic and real-world data sets demonstrate the efficacy of the proposed methods.

18 citations


Book ChapterDOI
TL;DR: A two-step method for large-scale and cyclic causal discovery from fMRI that can identify brain causal structures without doing interventional experiments is proposed and shows that with causal connectivities, the diagnostic accuracy largely improves.
Abstract: Current clinical diagnoses of autism spectrum disorder (ASD) mainly rely on descriptions and observations of behavior. However, behavior assessment has limitations in the aspect that it is subject to human judgment highly depending on doctors’ experiences, and it is unable to find out the underlying biological bases. Recently, with the hypothesis that ASD is associated with atypical brain connectivities, several studies investigate the differences in brain connectivities between ASD and typical controls from functional magnetic resonance imaging (fMRI). A substantial body of researches use Pearson’s correlation coefficients, mutual information, or partial correlation to estimate statistical dependencies between brain areas from neuroimaging data. However, correlation or partial correlation does not directly reveal causal influences between brain regions. Compared to correlation, causality pinpoints the key connectivity characteristics and removes redundant features for diagnosis. In this chapter, we propose a two-step method for large-scale and cyclic causal discovery from fMRI data. It can identify brain causal structures without doing interventional experiments. The learned causal structure, as well as the causal influence strength, provides us the path and effectiveness of information flow, that is, whether region A directly influences region B, and if there is a direct influence, how strong it is. We apply our methods to three datasets from Autism Brain Imaging Data Exchange. With causal connectivities, the diagnostic accuracy largely improves, compared with that using correlation or partial correlation as measures. A closer examination shows that information flows starting from the superior frontal gyrus to other areas are largely reduced; the target areas are mostly concentrated in the default mode network and the posterior part. Moreover, all enhanced information flows are from the posterior to the anterior or in local areas. Overall, it shows that long-range influences have a larger proportion of reductions than local ones, while local influences have a larger proportion of increases than long-range ones. By examining the graph properties of brain causal structure, the group of ASD shows reduced small-worldness.

16 citations


Posted Content
TL;DR: A framework for causal discovery from heterogeneous/NOnstationary Data to find causal skeleton and directions and estimate the properties of mechanism changes, and finds that data heterogeneity benefits causal structure identification even with particular types of confounders.
Abstract: It is commonplace to encounter heterogeneous or nonstationary data, of which the underlying generating process changes across domains or over time. Such a distribution shift feature presents both challenges and opportunities for causal discovery. In this paper, we develop a framework for causal discovery from such data, called Constraint-based causal Discovery from heterogeneous/NOnstationary Data (CD-NOD), to find causal skeleton and directions and estimate the properties of mechanism changes. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and recover the skeleton of the causal structure over observed variables. Second, we present a method to determine causal orientations by making use of independent changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. After learning the causal structure, next, we investigate how to efficiently estimate the "driving force" of the nonstationarity of a causal mechanism. That is, we aim to extract from data a low-dimensional representation of changes. The proposed methods are nonparametric, with no hard restrictions on data distributions and causal mechanisms, and do not rely on window segmentation. Furthermore, we find that data heterogeneity benefits causal structure identification even with particular types of confounders. Finally, we show the connection between heterogeneity/nonstationarity and soft intervention in causal discovery. Experimental results on various synthetic and real-world data sets (task-fMRI and stock market data) are presented to demonstrate the efficacy of the proposed methods.

11 citations


Journal ArticleDOI
18 Jan 2019
TL;DR: A central theme in western philosophy was to find formal methods that can reliably discover empirical relationships and their explanations from data assembled from experience, but to show that the model/simulation strategy is not confined to causal inference, two classification problems from astrophysics are considered: identifying exoplanets and identifying dark matter concentrations.
Abstract: Abstract A central theme in western philosophy was to find formal methods that can reliably discover empirical relationships and their explanations from data assembled from experience. As a philosophical project, that ambition was abandoned in the 20th century and generally dismissed as impossible. It was replaced in philosophy by neo-Kantian efforts at reconstruction and justification, and in professional statistics by the more limited ambition to estimate a small number of parameters in pre-specified hypotheses. The influx of “big data” from climate science, neuropsychology, biology, astronomy and elsewhere implicitly called for a revival of the grander philosophical ambition. Search algorithms are meeting that call, but they pose a problem: how are their accuracies to be assessed in domains where experimentation is limited or impossible? Increasingly, the answer is through simulation of data from models of the kind of process in the domain. In some cases, these innovations require rethinking how the accuracy and informativeness of inference methods can be assessed. Focusing on causal inference, we give an example from neuroscience, but to show that the model/simulation strategy is not confined to causal inference, we also consider two classification problems from astrophysics: identifying exoplanets and identifying dark matter concentrations.

Posted Content
05 Mar 2019
TL;DR: A principled framework for causal discovery from NOnstationary/heterogeneous Data (CD-NOD) is developed, which addresses two important questions: an enhanced constraint-based procedure to detect variables whose local mechanisms change and recover the skeleton of the causal structure over observed variables and the tight connection between nonstationarity/heterogeneity and soft intervention in causal discovery.
Abstract: It is commonplace to encounter nonstationary or heterogeneous data. Such a distribution shift feature presents both challenges and opportunities for causal discovery, of which the underlying generating process changes over time or across domains. In this paper, we develop a principled framework for causal discovery from such data, called Constraint-based causal Discovery from NOnstationary/heterogeneous Data (CD-NOD), which addresses two important questions. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and recover the skeleton of the causal structure over observed variables. Second, we present a way to determine causal orientations by making use of independent changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. After learning the causal structure, next, we investigate how to efficiently estimate the `driving force' of the nonstationarity of a causal mechanism. That is, we aim to extract from data a low-dimensional and interpretable representation of changes. The proposed methods are totally nonparametric, with no restrictions on data distributions and causal mechanisms, and do not rely on window segmentation. Furthermore, we find that nonstationarity benefits causal structure identification with particular types of confounders. Finally, we show the tight connection between nonstationarity/heterogeneity and soft intervention in causal discovery. Experimental results on various synthetic and real-world data sets (task-fMRI and stock data) are presented to demonstrate the efficacy of the proposed methods.

Posted Content
TL;DR: Two recently published graphical search methods are used for identification of subregions of voxels driving the connectivity between regions of interest, recovering valuable anatomical and functional information that is lost when ROIs are aggregated.
Abstract: Standard fMRI connectivity analyses depend on aggregating the time series of individual voxels within regions of interest (ROIs). In certain cases, this spatial aggregation implies a loss of valuable functional and anatomical information about smaller subsets of voxels that drive the ROI level connectivity. We use two recently published graphical search methods to identify subsets of voxels that are highly responsible for the connectivity between larger ROIs. To illustrate the procedure, we apply both methods to longitudinal high-resolution resting state fMRI data from regions in the medial temporal lobe from a single individual. Both methods recovered similar subsets of voxels within larger ROIs of entorhinal cortex and hippocampus subfields that also show spatial consistency across different scanning sessions and across hemispheres. In contrast to standard functional connectivity methods, both algorithms applied here are robust against false positive connections produced by common causes and indirect paths (in contrast to Pearson's correlation) and common effect conditioning (in contrast to partial correlation based approaches). These algorithms allow for identification of subregions of voxels driving the connectivity between regions of interest, recovering valuable anatomical and functional information that is lost when ROIs are aggregated. Both methods are specially suited for voxelwise connectivity research, given their running times and scalability to big data problems.

Posted Content
TL;DR: In this paper, a constraint-based causal discovery from heterogeneous/NOnstationary data (CD-NOD) is proposed to find causal skeleton and directions and estimate the properties of mechanism changes.
Abstract: It is commonplace to encounter heterogeneous or nonstationary data, of which the underlying generating process changes across domains or over time. Such a distribution shift feature presents both challenges and opportunities for causal discovery. In this paper, we develop a framework for causal discovery from such data, called Constraint-based causal Discovery from heterogeneous/NOnstationary Data (CD-NOD), to find causal skeleton and directions and estimate the properties of mechanism changes. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and recover the skeleton of the causal structure over observed variables. Second, we present a method to determine causal orientations by making use of independent changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. After learning the causal structure, next, we investigate how to efficiently estimate the "driving force" of the nonstationarity of a causal mechanism. That is, we aim to extract from data a low-dimensional representation of changes. The proposed methods are nonparametric, with no hard restrictions on data distributions and causal mechanisms, and do not rely on window segmentation. Furthermore, we find that data heterogeneity benefits causal structure identification even with particular types of confounders. Finally, we show the connection between heterogeneity/nonstationarity and soft intervention in causal discovery. Experimental results on various synthetic and real-world data sets (task-fMRI and stock market data) are presented to demonstrate the efficacy of the proposed methods.