
Showing papers by "Clark Glymour published in 2020"


Journal Article
TL;DR: A framework for constraint-based causal discovery from heterogeneous/nonstationary data (CD-NOD) that finds the causal skeleton and directions and estimates the properties of mechanism changes; data heterogeneity is found to benefit causal structure identification even with particular types of confounders.
Abstract: It is commonplace to encounter heterogeneous or nonstationary data, in which the underlying generating process changes across domains or over time. Such a distribution shift presents both challenges and opportunities for causal discovery. In this paper, we develop a framework for causal discovery from such data, called Constraint-based causal Discovery from heterogeneous/NOnstationary Data (CD-NOD), to find the causal skeleton and directions and to estimate the properties of mechanism changes. First, we propose an enhanced constraint-based procedure to detect variables whose local mechanisms change and to recover the skeleton of the causal structure over the observed variables. Second, we present a method to determine causal orientations by making use of independent changes in the data distribution implied by the underlying causal model, benefiting from information carried by changing distributions. After learning the causal structure, we investigate how to efficiently estimate the "driving force" of the nonstationarity of a causal mechanism; that is, we aim to extract from data a low-dimensional representation of the changes. The proposed methods are nonparametric, with no hard restrictions on data distributions and causal mechanisms, and do not rely on window segmentation. Furthermore, we find that data heterogeneity benefits causal structure identification even in the presence of particular types of confounders. Finally, we show the connection between heterogeneity/nonstationarity and soft intervention in causal discovery. Experimental results on various synthetic and real-world data sets (task-fMRI and stock market data) demonstrate the efficacy of the proposed methods.

109 citations
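The first step of CD-NOD, detecting variables whose local mechanisms change, can be pictured as testing each variable against a surrogate domain index. The sketch below is a deliberately minimal stand-in: it uses a two-sample Kolmogorov-Smirnov test in place of the kernel-based conditional independence tests the paper relies on, omits conditioning on parents, and the data-generating model is invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000

# Two domains; X -> Y, and only Y's mechanism (the coefficient) changes.
c = np.repeat([0.0, 1.0], n)            # surrogate domain index
x = rng.normal(size=2 * n)
y = np.where(c == 0, 0.5, 2.0) * x + rng.normal(size=2 * n)

def changes_with_domain(v, c, alpha=0.01):
    """Flag a changing local mechanism by comparing v's distribution
    across the two domains (two-sample KS test)."""
    _, p = stats.ks_2samp(v[c == 0], v[c == 1])
    return p < alpha

print(changes_with_domain(x, c))   # X's mechanism is fixed across domains
print(changes_with_domain(y, c))   # prints True: Y's mechanism changes
```

In the full method, the surrogate index enters the constraint-based search as an additional variable, so mechanism changes and skeleton recovery are handled in one procedure.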


Proceedings Article
01 Jan 2020
TL;DR: This paper considers Linear, Non-Gaussian Latent variable Models (LiNGLaMs), in which latent confounders are also causally related, and proposes a Generalized Independent Noise (GIN) condition, showing that GIN helps locate latent variables and identify their causal structure, including causal directions.
Abstract: Causal discovery aims to recover causal structures or models underlying the observed data. Despite its success in certain domains, most existing methods focus on causal relations between observed variables, while in many scenarios the observed ones may not be the underlying causal variables (e.g., image pixels), but are generated by latent causal variables or confounders that are causally related. To this end, in this paper, we consider Linear, Non-Gaussian Latent variable Models (LiNGLaMs), in which latent confounders are also causally related, and propose a Generalized Independent Noise (GIN) condition to estimate such latent variable graphs. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are statistically independent, where $\omega$ is a parameter vector characterized from the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. From the graphical view, roughly speaking, GIN implies that causally earlier latent common causes of variables in $\mathbf{Y}$ d-separate $\mathbf{Y}$ from $\mathbf{Z}$. Interestingly, we find that the independent noise condition, i.e., if there is no confounder, causes are independent from the error of regressing the effect on the causes, can be seen as a special case of GIN. Moreover, we show that GIN helps locate latent variables and identify their causal structure, including causal directions. We further develop a recursive learning algorithm to achieve these goals. Experimental results on synthetic and real-world data demonstrate the effectiveness of our method.

31 citations
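The GIN condition itself is easy to sketch numerically: for two-dimensional $\mathbf{Y}$, $\omega$ can be taken as a null vector of the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$, and independence of $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ then checked. In the sketch below the correlation-of-squares check is a crude stand-in for a proper kernel independence test, and the one-latent-confounder model is invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 5000

# Toy linear, non-Gaussian model: one latent L confounds Y1, Y2 and Z.
L = rng.uniform(-1, 1, n)
e1, e2, e3 = rng.uniform(-1, 1, (3, n))
Y1, Y2 = 2 * L + e1, 3 * L + e2
Z = L + e3                      # L d-separates (Y1, Y2) from Z
Y = np.vstack([Y1, Y2])

def gin_violated(Y, Z, alpha=0.01):
    """Take omega as a null vector of the cross-covariance cov(Y, Z),
    then test whether omega^T Y depends on Z.  Correlation of squares
    is a crude stand-in for a kernel independence test."""
    cov_yz = np.cov(np.vstack([Y, Z[None, :]]))[:-1, -1]  # (cov(Y1,Z), cov(Y2,Z))
    omega = np.array([cov_yz[1], -cov_yz[0]])             # omega^T cov(Y,Z) = 0
    proj = omega @ Y
    _, p = stats.pearsonr(proj ** 2, Z ** 2)
    return p < alpha

print(gin_violated(Y, Z))       # GIN should hold here (no dependence)

# If Z is instead a direct child of Y1, the shared noise e1 breaks GIN.
Z_bad = Y1 + rng.uniform(-1, 1, n)
print(gin_violated(Y, Z_bad))   # prints True: dependence detected
```

In the first case $\omega^{\intercal}\mathbf{Y}$ cancels the latent $L$ and leaves only noise terms independent of $Z$; in the second, the shared noise $e_1$ survives in both sides, which is exactly what the condition is designed to expose.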


Proceedings Article
01 Jan 2020
TL;DR: This paper uses a graphical model as a compact way to encode the change property of the joint distribution, which can be learned from data, and views domain adaptation as a problem of Bayesian inference on the graphical models.
Abstract: This paper is concerned with data-driven unsupervised domain adaptation, where it is unknown in advance how the joint distribution changes across domains, i.e., what factors or modules of the data distribution remain invariant or change across domains. To develop an automated way of domain adaptation with multiple source domains, we propose to use a graphical model as a compact way to encode the change property of the joint distribution, which can be learned from data, and then view domain adaptation as a problem of Bayesian inference on the graphical models. Such a graphical model distinguishes between constant and varied modules of the distribution and specifies the properties of the changes across domains, which serves as prior knowledge of the changing modules for the purpose of deriving the posterior of the target variable $Y$ in the target domain. This provides an end-to-end framework of domain adaptation, in which additional knowledge about how the joint distribution changes, if available, can be directly incorporated to improve the graphical representation. We discuss how causality-based domain adaptation can be put under this umbrella. Experimental results on both synthetic and real data demonstrate the efficacy of the proposed framework for domain adaptation. The code is available at this https URL .

25 citations
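The paper learns the graphical model of changing versus invariant modules from data; a drastically simplified, hand-specified instance still shows the inference pattern. In the toy below (all distributions and numbers are illustrative assumptions), $p(x \mid y)$ is the invariant module, the class prior is the single changing module, and the target posterior of $Y$ is obtained by EM over that one changing parameter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Invariant module: p(x | y) is the same in every domain.
# Changing module: the class prior p(Y = 1) varies across domains.
def sample(n, prior):
    y = rng.random(n) < prior
    x = rng.normal(np.where(y, 2.0, 0.0), 1.0)
    return x, y.astype(int)

x_src, y_src = sample(5000, 0.5)      # labelled source domain
x_tgt, _     = sample(5000, 0.8)      # unlabelled target domain

# Fit the invariant conditionals p(x | y) on the source.
mu = [x_src[y_src == k].mean() for k in (0, 1)]
sd = [x_src[y_src == k].std() for k in (0, 1)]

# EM over the single changing parameter: the target prior p(Y = 1).
pi = 0.5
for _ in range(100):
    lik1 = pi * stats.norm.pdf(x_tgt, mu[1], sd[1])
    lik0 = (1 - pi) * stats.norm.pdf(x_tgt, mu[0], sd[0])
    post = lik1 / (lik0 + lik1)       # posterior of Y for each target point
    pi = post.mean()                  # M-step

print(round(pi, 2))                   # should be close to the true 0.8
```

The framework in the paper generalizes this pattern: which modules change, and how, is itself inferred from the multiple source domains rather than fixed by hand.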


Journal ArticleDOI
03 Apr 2020
TL;DR: This paper proposes a principled method to uniquely identify causal relationships over the integrated set of variables from multiple data sets, in linear, non-Gaussian cases, and presents two types of approaches to parameter estimation.
Abstract: A number of approaches to causal discovery assume that there are no hidden confounders and are designed to learn a fixed causal model from a single data set. Over the last decade, with closer cooperation across laboratories, we are able to accumulate more variables and data for analysis, while each lab may only measure a subset of them, due to technical constraints or to save time and cost. This raises a question of how to handle causal discovery from multiple data sets with non-identical variable sets, and at the same time, it would be interesting to see how more recorded variables can help to mitigate the confounding problem. In this paper, we propose a principled method to uniquely identify causal relationships over the integrated set of variables from multiple data sets, in linear, non-Gaussian cases. The proposed method also allows distribution shifts across data sets. Theoretically, we show that the causal structure over the integrated set of variables is identifiable under testable conditions. Furthermore, we present two types of approaches to parameter estimation: one is based on maximum likelihood, and the other is likelihood free and leverages generative adversarial nets to improve scalability of the estimation procedure. Experimental results on various synthetic and real-world data sets are presented to demonstrate the efficacy of our methods.

19 citations
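The linear, non-Gaussian assumption is what makes causal directions identifiable here. A minimal single-data-set illustration of that building block (not the paper's multi-set algorithm, and with an invented data-generating model): in the true causal direction the regression residual is independent of the regressor, while in the reverse direction it is not.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 5000

# Linear pair X -> Y with uniform (non-Gaussian) noise.
x = rng.uniform(-1, 1, n)
y = 0.8 * x + rng.uniform(-1, 1, n)

def residual_dependence(cause, effect):
    """Regress effect on cause and measure dependence between the cause
    and the residual; correlation of squares is a crude proxy for a
    proper independence test."""
    beta = np.cov(cause, effect)[0, 1] / np.var(cause)
    resid = effect - beta * cause
    r, _ = stats.pearsonr(cause ** 2, resid ** 2)
    return abs(r)

# The direction with (near-)independent residuals is the causal one.
d_xy = residual_dependence(x, y)   # residual independent of X
d_yx = residual_dependence(y, x)   # residual depends on Y
print(d_xy < d_yx)                 # prints True: X -> Y is recovered
```

With Gaussian noise both directions would produce independent-looking residuals, which is why non-Gaussianity is essential to the identifiability results.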


Posted Content
TL;DR: In this article, a graphical model is proposed as a compact way to encode the change property of the joint distribution, which can be learned from data, and domain adaptation is then viewed as a problem of Bayesian inference on the graphical models.
Abstract: This paper is concerned with data-driven unsupervised domain adaptation, where it is unknown in advance how the joint distribution changes across domains, i.e., what factors or modules of the data distribution remain invariant or change across domains. To develop an automated way of domain adaptation with multiple source domains, we propose to use a graphical model as a compact way to encode the change property of the joint distribution, which can be learned from data, and then view domain adaptation as a problem of Bayesian inference on the graphical models. Such a graphical model distinguishes between constant and varied modules of the distribution and specifies the properties of the changes across domains, which serves as prior knowledge of the changing modules for the purpose of deriving the posterior of the target variable $Y$ in the target domain. This provides an end-to-end framework of domain adaptation, in which additional knowledge about how the joint distribution changes, if available, can be directly incorporated to improve the graphical representation. We discuss how causality-based domain adaptation can be put under this umbrella. Experimental results on both synthetic and real data demonstrate the efficacy of the proposed framework for domain adaptation. The code is available at this https URL .

14 citations


Posted Content
TL;DR: This paper considers Linear, Non-Gaussian Latent variable Models (LiNGLaMs), in which latent confounders are also causally related, and proposes a Generalized Independent Noise (GIN) condition, showing that GIN helps locate latent variables and identify their causal structure, including causal directions.
Abstract: Causal discovery aims to recover causal structures or models underlying the observed data. Despite its success in certain domains, most existing methods focus on causal relations between observed variables, while in many scenarios the observed ones may not be the underlying causal variables (e.g., image pixels), but are generated by latent causal variables or confounders that are causally related. To this end, in this paper, we consider Linear, Non-Gaussian Latent variable Models (LiNGLaMs), in which latent confounders are also causally related, and propose a Generalized Independent Noise (GIN) condition to estimate such latent variable graphs. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are statistically independent, where $\omega$ is a parameter vector characterized from the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. From the graphical view, roughly speaking, GIN implies that causally earlier latent common causes of variables in $\mathbf{Y}$ d-separate $\mathbf{Y}$ from $\mathbf{Z}$. Interestingly, we find that the independent noise condition, i.e., if there is no confounder, causes are independent from the error of regressing the effect on the causes, can be seen as a special case of GIN. Moreover, we show that GIN helps locate latent variables and identify their causal structure, including causal directions. We further develop a recursive learning algorithm to achieve these goals. Experimental results on synthetic and real-world data demonstrate the effectiveness of our method.

7 citations


Posted Content
TL;DR: A Generalized Independent Noise (GIN) condition is proposed to estimate latent variable graphs in Linear, Non-Gaussian Latent variable Models (LiNGLaMs); GIN helps locate latent variables and identify their causal structure, including causal directions.
Abstract: Causal discovery aims to recover causal structures or models underlying the observed data. Despite its success in certain domains, most existing methods focus on causal relations between observed variables, while in many scenarios the observed ones may not be the underlying causal variables (e.g., image pixels), but are generated by latent causal variables or confounders that are causally related. To this end, in this paper, we consider Linear, Non-Gaussian Latent variable Models (LiNGLaMs), in which latent confounders are also causally related, and propose a Generalized Independent Noise (GIN) condition to estimate such latent variable graphs. Specifically, for two observed random vectors $\mathbf{Y}$ and $\mathbf{Z}$, GIN holds if and only if $\omega^{\intercal}\mathbf{Y}$ and $\mathbf{Z}$ are statistically independent, where $\omega$ is a parameter vector characterized from the cross-covariance between $\mathbf{Y}$ and $\mathbf{Z}$. From the graphical view, roughly speaking, GIN implies that causally earlier latent common causes of variables in $\mathbf{Y}$ d-separate $\mathbf{Y}$ from $\mathbf{Z}$. Interestingly, we find that the independent noise condition, i.e., if there is no confounder, causes are independent from the error of regressing the effect on the causes, can be seen as a special case of GIN. Moreover, we show that GIN helps locate latent variables and identify their causal structure, including causal directions. We further develop a recursive learning algorithm to achieve these goals. Experimental results on synthetic and real-world data demonstrate the effectiveness of our method.

1 citation


19 May 2020
TL;DR: The constraint-based causal discovery method PC is extended to handle binary data sets with missing values for neuropathic pain diagnosis, and the potential errors of simply applying PC to data sets with missing values are identified.
Abstract: The missing data issue is a common phenomenon in many applications such as healthcare. When causal discovery algorithms such as PC are applied to a data set with missing values, improperly handling the missing data may introduce bias and lead to wrong causal relations. In this work, we identify the potential errors of simply applying PC to data sets with missing values. Further, we extend the constraint-based causal discovery method PC to handle binary data sets with missing values for neuropathic pain diagnosis.
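The naive approach the paper warns about typically runs each conditional independence test on the rows that are complete for just the variables in that test ("test-wise deletion"). Below is a minimal sketch of such a test on an invented binary chain X -> Y -> Z with values of Y missing completely at random; this is the benign case, and the paper's point is that other missingness mechanisms can bias exactly these tests.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 3000

# Binary chain X -> Y -> Z; Y has values missing completely at random.
x = (rng.random(n) < 0.5).astype(float)
y = np.where(rng.random(n) < 0.8, x, 1 - x)    # noisy copy of X
z = np.where(rng.random(n) < 0.8, y, 1 - y)    # noisy copy of Y
y[rng.random(n) < 0.3] = np.nan                # MCAR missingness on Y

def ci_test_testwise(a, b, cond=None, alpha=0.01):
    """Chi-square (conditional) independence test with test-wise deletion:
    only rows incomplete for the variables in *this* test are dropped.
    Returns True when independence is not rejected."""
    cols = [a, b] + ([cond] if cond is not None else [])
    mask = ~np.any(np.isnan(np.column_stack(cols)), axis=1)
    a, b = a[mask], b[mask]
    if cond is None:
        table = np.histogram2d(a, b, bins=2)[0]
        _, p, _, _ = stats.chi2_contingency(table)
        return p >= alpha
    c = cond[mask]
    # Test a _||_ b | c by summing chi-square statistics over the strata of c.
    stat = df = 0
    for v in (0, 1):
        table = np.histogram2d(a[c == v], b[c == v], bins=2)[0]
        s, _, d, _ = stats.chi2_contingency(table)
        stat, df = stat + s, df + d
    return stats.chi2.sf(stat, df) >= alpha

print(ci_test_testwise(x, z))          # prints False: X and Z are dependent
print(ci_test_testwise(x, z, cond=y))  # X _||_ Z | Y should hold in the chain
```

Plugging a test like this into PC gives the "simply applying PC" baseline; the work above characterizes when its deletions distort the tests and how to correct for that.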