
Showing papers by "Clark Glymour published in 2020"


Journal Article
TL;DR: A framework for constraint-based causal discovery from heterogeneous/nonstationary data (CD-NOD) that finds the causal skeleton and directions and estimates the properties of mechanism changes; data heterogeneity is found to benefit causal structure identification even with particular types of confounders.
Abstract: It is commonplace to encounter heterogeneous or nonstationary data, in which the underlying generating process changes across domains or over time. Such a distribution shift presents both challenges and opportunities for causal discovery. In this paper, we develop a framework for causal discovery from such data, called Constraint-based causal Discovery from heterogeneous/NOnstationary Data (CD-NOD), to find the causal skeleton and directions and to estimate the properties of mechanism changes. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and to recover the skeleton of the causal structure over the observed variables. Second, we present a method to determine causal orientations by making use of independent changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. After learning the causal structure, we investigate how to efficiently estimate the "driving force" of the nonstationarity of a causal mechanism; that is, we aim to extract from data a low-dimensional representation of the changes. The proposed methods are nonparametric, with no hard restrictions on data distributions and causal mechanisms, and do not rely on window segmentation. Furthermore, we find that data heterogeneity benefits causal structure identification even in the presence of particular types of confounders. Finally, we show the connection between heterogeneity/nonstationarity and soft intervention in causal discovery. Experimental results on various synthetic and real-world data sets (task-fMRI and stock market data) demonstrate the efficacy of the proposed methods.

109 citations
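The first step of CD-NOD, detecting variables whose local mechanisms change, can be pictured as testing each variable against a surrogate domain index. The sketch below is a deliberately minimal stand-in: it uses a two-sample Kolmogorov-Smirnov test in place of the kernel-based conditional independence tests the paper relies on, omits conditioning on parents, and the data-generating model is invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000

# Two domains; X -> Y, and only Y's mechanism (the coefficient) changes.
c = np.repeat([0.0, 1.0], n)            # surrogate domain index
x = rng.normal(size=2 * n)
y = np.where(c == 0, 0.5, 2.0) * x + rng.normal(size=2 * n)

def changes_with_domain(v, c, alpha=0.01):
    """Flag a changing local mechanism by comparing v's distribution
    across the two domains (two-sample KS test)."""
    _, p = stats.ks_2samp(v[c == 0], v[c == 1])
    return p < alpha

print(changes_with_domain(x, c))   # X's mechanism is fixed across domains
print(changes_with_domain(y, c))   # prints True: Y's mechanism changes
```

In the full method, the surrogate index enters the constraint-based search as an additional variable, so mechanism changes and skeleton recovery are handled in one procedure.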


Proceedings Article
01 Jan 2020
TL;DR: This paper considers Linear, Non-Gaussian Latent variable Models (LiNGLaMs), in which latent confounders are also causally related, and proposes a Generalized Independent Noise (GIN) condition, showing that GIN helps locate latent variables and identify their causal structure, including causal directions.
Abstract: Causal discovery aims to recover causal structures or models underlying the observed data. Despite its success in certain domains, most existing methods focus on causal relations between observed variables, while in many scenarios the observed ones may not be the underlying causal variables (e.g., image pixels), but are generated by latent causal variables or confounders that are causally related. To this end, in this paper, we consider Linear, Non-Gaussian Latent variable Models (LiNGLaMs), in which latent confounders are also causally related, and propose a Generalized Independent Noise (GIN) condition to estimate such latent variable graphs. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are statistically independent, where $\omega$ is a parameter vector characterized from the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. From the graphical view, roughly speaking, GIN implies that causally earlier latent common causes of variables in $\mathbf{Y}$ d-separate $\mathbf{Y}$ from $\mathbf{Z}$. Interestingly, we find that the independent noise condition, i.e., if there is no confounder, causes are independent from the error of regressing the effect on the causes, can be seen as a special case of GIN. Moreover, we show that GIN helps locate latent variables and identify their causal structure, including causal directions. We further develop a recursive learning algorithm to achieve these goals. Experimental results on synthetic and real-world data demonstrate the effectiveness of our method.

31 citations
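The GIN condition itself is easy to sketch numerically: for two-dimensional $\mathbf{Y}$, $\omega$ can be taken as a null vector of the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$, and independence of $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ then checked. In the sketch below the correlation-of-squares check is a crude stand-in for a proper kernel independence test, and the one-latent-confounder model is invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 5000

# Toy linear, non-Gaussian model: one latent L confounds Y1, Y2 and Z.
L = rng.uniform(-1, 1, n)
e1, e2, e3 = rng.uniform(-1, 1, (3, n))
Y1, Y2 = 2 * L + e1, 3 * L + e2
Z = L + e3                      # L d-separates (Y1, Y2) from Z
Y = np.vstack([Y1, Y2])

def gin_violated(Y, Z, alpha=0.01):
    """Take omega as a null vector of the cross-covariance cov(Y, Z),
    then test whether omega^T Y depends on Z.  Correlation of squares
    is a crude stand-in for a kernel independence test."""
    cov_yz = np.cov(np.vstack([Y, Z[None, :]]))[:-1, -1]  # (cov(Y1,Z), cov(Y2,Z))
    omega = np.array([cov_yz[1], -cov_yz[0]])             # omega^T cov(Y,Z) = 0
    proj = omega @ Y
    _, p = stats.pearsonr(proj ** 2, Z ** 2)
    return p < alpha

print(gin_violated(Y, Z))       # GIN should hold here (no dependence)

# If Z is instead a direct child of Y1, the shared noise e1 breaks GIN.
Z_bad = Y1 + rng.uniform(-1, 1, n)
print(gin_violated(Y, Z_bad))   # prints True: dependence detected
```

In the first case $\omega^{\intercal}\mathbf{Y}$ cancels the latent $L$ and leaves only noise terms independent of $Z$; in the second, the shared noise $e_1$ survives in both sides, which is exactly what the condition is designed to expose.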


Proceedings Article
01 Jan 2020
TL;DR: This paper uses a graphical model as a compact way to encode the change property of the joint distribution, which can be learned from data, and views domain adaptation as a problem of Bayesian inference on the graphical models.
Abstract: This paper is concerned with data-driven unsupervised domain adaptation, where it is unknown in advance how the joint distribution changes across domains, i.e., what factors or modules of the data distribution remain invariant or change across domains. To develop an automated way of domain adaptation with multiple source domains, we propose to use a graphical model as a compact way to encode the change property of the joint distribution, which can be learned from data, and then view domain adaptation as a problem of Bayesian inference on the graphical models. Such a graphical model distinguishes between constant and varied modules of the distribution and specifies the properties of the changes across domains, which serves as prior knowledge of the changing modules for the purpose of deriving the posterior of the target variable $Y$ in the target domain. This provides an end-to-end framework of domain adaptation, in which additional knowledge about how the joint distribution changes, if available, can be directly incorporated to improve the graphical representation. We discuss how causality-based domain adaptation can be put under this umbrella. Experimental results on both synthetic and real data demonstrate the efficacy of the proposed framework for domain adaptation. The code is available at this https URL .

25 citations
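The paper learns the graphical model of changing versus invariant modules from data; a drastically simplified, hand-specified instance still shows the inference pattern. In the toy below (all distributions and numbers are illustrative assumptions), $p(x \mid y)$ is the invariant module, the class prior is the single changing module, and the target posterior of $Y$ is obtained by EM over that one changing parameter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Invariant module: p(x | y) is the same in every domain.
# Changing module: the class prior p(Y = 1) varies across domains.
def sample(n, prior):
    y = rng.random(n) < prior
    x = rng.normal(np.where(y, 2.0, 0.0), 1.0)
    return x, y.astype(int)

x_src, y_src = sample(5000, 0.5)      # labelled source domain
x_tgt, _     = sample(5000, 0.8)      # unlabelled target domain

# Fit the invariant conditionals p(x | y) on the source.
mu = [x_src[y_src == k].mean() for k in (0, 1)]
sd = [x_src[y_src == k].std() for k in (0, 1)]

# EM over the single changing parameter: the target prior p(Y = 1).
pi = 0.5
for _ in range(100):
    lik1 = pi * stats.norm.pdf(x_tgt, mu[1], sd[1])
    lik0 = (1 - pi) * stats.norm.pdf(x_tgt, mu[0], sd[0])
    post = lik1 / (lik0 + lik1)       # posterior of Y for each target point
    pi = post.mean()                  # M-step

print(round(pi, 2))                   # should be close to the true 0.8
```

The framework in the paper generalizes this pattern: which modules change, and how, is itself inferred from the multiple source domains rather than fixed by hand.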


Journal ArticleDOI
03 Apr 2020
TL;DR: This paper proposes a principled method to uniquely identify causal relationships over the integrated set of variables from multiple data sets, in linear, non-Gaussian cases, and presents two types of approaches to parameter estimation.
Abstract: A number of approaches to causal discovery assume that there are no hidden confounders and are designed to learn a fixed causal model from a single data set. Over the last decade, with closer cooperation across laboratories, we are able to accumulate more variables and data for analysis, while each lab may only measure a subset of them, due to technical constraints or to save time and cost. This raises a question of how to handle causal discovery from multiple data sets with non-identical variable sets, and at the same time, it would be interesting to see how more recorded variables can help to mitigate the confounding problem. In this paper, we propose a principled method to uniquely identify causal relationships over the integrated set of variables from multiple data sets, in linear, non-Gaussian cases. The proposed method also allows distribution shifts across data sets. Theoretically, we show that the causal structure over the integrated set of variables is identifiable under testable conditions. Furthermore, we present two types of approaches to parameter estimation: one is based on maximum likelihood, and the other is likelihood free and leverages generative adversarial nets to improve scalability of the estimation procedure. Experimental results on various synthetic and real-world data sets are presented to demonstrate the efficacy of our methods.

19 citations
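The linear, non-Gaussian assumption is what makes causal directions identifiable here. A minimal single-data-set illustration of that building block (not the paper's multi-set algorithm, and with an invented data-generating model): in the true causal direction the regression residual is independent of the regressor, while in the reverse direction it is not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 5000

# Linear pair X -> Y with uniform (non-Gaussian) noise.
x = rng.uniform(-1, 1, n)
y = 0.8 * x + rng.uniform(-1, 1, n)

def residual_dependence(cause, effect):
    """Regress effect on cause and measure dependence between the cause
    and the residual; correlation of squares is a crude proxy for a
    proper independence test."""
    beta = np.cov(cause, effect)[0, 1] / np.var(cause)
    resid = effect - beta * cause
    r, _ = stats.pearsonr(cause ** 2, resid ** 2)
    return abs(r)

# The direction with (near-)independent residuals is the causal one.
d_xy = residual_dependence(x, y)   # residual independent of X
d_yx = residual_dependence(y, x)   # residual depends on Y
print(d_xy < d_yx)                 # prints True: X -> Y is recovered
```

With Gaussian noise both directions would produce independent-looking residuals, which is why non-Gaussianity is essential to the identifiability results.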


Posted Content
TL;DR: In this article, a graphical model is proposed as a compact way to encode the change property of the joint distribution, which can be learned from data, and domain adaptation is then viewed as a problem of Bayesian inference on the graphical models.
Abstract: This paper is concerned with data-driven unsupervised domain adaptation, where it is unknown in advance how the joint distribution changes across domains, i.e., what factors or modules of the data distribution remain invariant or change across domains. To develop an automated way of domain adaptation with multiple source domains, we propose to use a graphical model as a compact way to encode the change property of the joint distribution, which can be learned from data, and then view domain adaptation as a problem of Bayesian inference on the graphical models. Such a graphical model distinguishes between constant and varied modules of the distribution and specifies the properties of the changes across domains, which serves as prior knowledge of the changing modules for the purpose of deriving the posterior of the target variable $Y$ in the target domain. This provides an end-to-end framework of domain adaptation, in which additional knowledge about how the joint distribution changes, if available, can be directly incorporated to improve the graphical representation. We discuss how causality-based domain adaptation can be put under this umbrella. Experimental results on both synthetic and real data demonstrate the efficacy of the proposed framework for domain adaptation. The code is available at this https URL .

14 citations


Posted Content
TL;DR: This paper considers Linear, Non-Gaussian Latent variable Models (LiNGLaMs), in which latent confounders are also causally related, and proposes a Generalized Independent Noise (GIN) condition, showing that GIN helps locate latent variables and identify their causal structure, including causal directions.
Abstract: Causal discovery aims to recover causal structures or models underlying the observed data. Despite its success in certain domains, most existing methods focus on causal relations between observed variables, while in many scenarios the observed ones may not be the underlying causal variables (e.g., image pixels), but are generated by latent causal variables or confounders that are causally related. To this end, in this paper, we consider Linear, Non-Gaussian Latent variable Models (LiNGLaMs), in which latent confounders are also causally related, and propose a Generalized Independent Noise (GIN) condition to estimate such latent variable graphs. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are statistically independent, where $\omega$ is a parameter vector characterized from the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. From the graphical view, roughly speaking, GIN implies that causally earlier latent common causes of variables in $\mathbf{Y}$ d-separate $\mathbf{Y}$ from $\mathbf{Z}$. Interestingly, we find that the independent noise condition, i.e., if there is no confounder, causes are independent from the error of regressing the effect on the causes, can be seen as a special case of GIN. Moreover, we show that GIN helps locate latent variables and identify their causal structure, including causal directions. We further develop a recursive learning algorithm to achieve these goals. Experimental results on synthetic and real-world data demonstrate the effectiveness of our method.

7 citations


Posted Content
TL;DR: A Generalized Independent Noise (GIN) condition is proposed to estimate latent variable graphs in Linear, Non-Gaussian Latent variable Models (LiNGLaMs); GIN helps locate latent variables and identify their causal structure, including causal directions.
Abstract: Causal discovery aims to recover causal structures or models underlying the observed data. Despite its success in certain domains, most existing methods focus on causal relations between observed variables, while in many scenarios the observed ones may not be the underlying causal variables (e.g., image pixels), but are generated by latent causal variables or confounders that are causally related. To this end, in this paper, we consider Linear, Non-Gaussian Latent variable Models (LiNGLaMs), in which latent confounders are also causally related, and propose a Generalized Independent Noise (GIN) condition to estimate such latent variable graphs. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are statistically independent, where $\omega$ is a parameter vector characterized from the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. From the graphical view, roughly speaking, GIN implies that causally earlier latent common causes of variables in $\mathbf{Y}$ d-separate $\mathbf{Y}$ from $\mathbf{Z}$. Interestingly, we find that the independent noise condition, i.e., if there is no confounder, causes are independent from the error of regressing the effect on the causes, can be seen as a special case of GIN. Moreover, we show that GIN helps locate latent variables and identify their causal structure, including causal directions. We further develop a recursive learning algorithm to achieve these goals. Experimental results on synthetic and real-world data demonstrate the effectiveness of our method.

1 citation


19 May 2020
TL;DR: The constraint-based causal discovery method PC is extended to handle binary data sets with missing values for neuropathic pain diagnosis, and the potential errors of simply applying PC to data sets with missing values are identified.
Abstract: The missing data issue is a common phenomenon in many applications such as healthcare. When causal discovery algorithms such as PC are applied to a data set with missing values, improperly handling the missing data may introduce bias and lead to wrong causal relations. In this work, we identify the potential errors of simply applying PC to data sets with missing values. Further, we extend the constraint-based causal discovery method PC to handle binary data sets with missing values for neuropathic pain diagnosis.
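The naive approach the paper warns about typically runs each conditional independence test on the rows that are complete for just the variables in that test ("test-wise deletion"). Below is a minimal sketch of such a test on an invented binary chain X -> Y -> Z with values of Y missing completely at random; this is the benign case, and the paper's point is that other missingness mechanisms can bias exactly these tests.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 3000

# Binary chain X -> Y -> Z; Y has values missing completely at random.
x = (rng.random(n) < 0.5).astype(float)
y = np.where(rng.random(n) < 0.8, x, 1 - x)    # noisy copy of X
z = np.where(rng.random(n) < 0.8, y, 1 - y)    # noisy copy of Y
y[rng.random(n) < 0.3] = np.nan                # MCAR missingness on Y

def ci_test_testwise(a, b, cond=None, alpha=0.01):
    """Chi-square (conditional) independence test with test-wise deletion:
    only rows incomplete for the variables in *this* test are dropped.
    Returns True when independence is not rejected."""
    cols = [a, b] + ([cond] if cond is not None else [])
    mask = ~np.any(np.isnan(np.column_stack(cols)), axis=1)
    a, b = a[mask], b[mask]
    if cond is None:
        table = np.histogram2d(a, b, bins=2)[0]
        _, p, _, _ = stats.chi2_contingency(table)
        return p >= alpha
    c = cond[mask]
    # Test a _||_ b | c by summing chi-square statistics over the strata of c.
    stat = df = 0
    for v in (0, 1):
        table = np.histogram2d(a[c == v], b[c == v], bins=2)[0]
        s, _, d, _ = stats.chi2_contingency(table)
        stat, df = stat + s, df + d
    return stats.chi2.sf(stat, df) >= alpha

print(ci_test_testwise(x, z))          # prints False: X and Z are dependent
print(ci_test_testwise(x, z, cond=y))  # X _||_ Z | Y should hold in the chain
```

Plugging a test like this into PC gives the "simply applying PC" baseline; the work above characterizes when its deletions distort the tests and how to correct for that.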