Author

Diego Colombo

Bio: Diego Colombo is an academic researcher from ETH Zurich. The author has contributed to research on the topics of causal inference and directed acyclic graphs. The author has an h-index of 11 and has co-authored 13 publications receiving 1539 citations.

Papers

Journal Article•DOI•

Causal Inference using Graphical Models with the R Package pcalg

Markus Kalisch, Martin Mächler, Diego Colombo, Marloes H. Maathuis, Peter Bühlmann

17 May 2012-Journal of Statistical Software

TL;DR: The pcalg package for R can be used for the following two purposes: Causal structure learning and estimation of causal effects from observational data.

Abstract: The pcalg package for R can be used for the following two purposes: Causal structure learning and estimation of causal effects from observational data. In this document, we give a brief overview of the methodology, and demonstrate the package’s functionality in both toy examples and applications.

576 citations

Journal Article•DOI•

Order-independent constraint-based causal structure learning

Diego Colombo¹, Marloes H. Maathuis¹•Institutions (1)

ETH Zurich¹

01 Jan 2014-Journal of Machine Learning Research

TL;DR: This paper proposes several modifications of the PC-algorithm, whose output can depend on the order in which the variables are given, that remove part or all of this order-dependence.

Abstract: We consider constraint-based methods for causal structure learning, such as the PC-, FCI-, RFCI- and CCD- algorithms (Spirtes et al., 1993, 2000; Richardson, 1996; Colombo et al., 2012; Claassen et al., 2013). The first step of all these algorithms consists of the adjacency search of the PC-algorithm. The PC-algorithm is known to be order-dependent, in the sense that the output can depend on the order in which the variables are given. This order-dependence is a minor issue in low-dimensional settings. We show, however, that it can be very pronounced in high-dimensional settings, where it can lead to highly variable results. We propose several modifications of the PC-algorithm (and hence also of the other algorithms) that remove part or all of this order-dependence. All proposed modifications are consistent in high-dimensional settings under the same conditions as their original counterparts. We compare the PC-, FCI-, and RFCI-algorithms and their modifications in simulation studies and on a yeast gene expression data set. We show that our modifications yield similar performance in low-dimensional settings and improved performance in high-dimensional settings. All software is implemented in the R-package pcalg.

322 citations

Journal Article•DOI•

Learning high-dimensional directed acyclic graphs with latent and selection variables

Diego Colombo¹, Marloes H. Maathuis¹, Markus Kalisch¹, Thomas S. Richardson²•Institutions (2)

ETH Zurich¹, University of Washington²

29 Apr 2011-arXiv: Methodology

TL;DR: This work proposes the new RFCI algorithm, which is much faster than FCI, and proves consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrates in simulations that the estimation performances of the algorithms are very similar.

Abstract: We consider the problem of learning causal information between random variables in directed acyclic graphs (DAGs) when allowing arbitrarily many latent and selection variables. The FCI (Fast Causal Inference) algorithm has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose the new RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct in the asymptotic limit. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg.

277 citations

Journal Article•DOI•

Learning high-dimensional directed acyclic graphs with latent and selection variables

Diego Colombo¹, Marloes H. Maathuis¹, Markus Kalisch¹, Thomas S. Richardson²•Institutions (2)

ETH Zurich¹, University of Washington²

01 Feb 2012-Annals of Statistics

TL;DR: In this article, the authors consider the problem of learning causal information between random variables in directed acyclic graphs (DAGs) when allowing arbitrarily many latent and selection variables.

259 citations

Journal Article•DOI•

Predicting causal effects in large-scale systems from observational data

Marloes H. Maathuis¹, Diego Colombo¹, Markus Kalisch¹, Peter Bühlmann¹•Institutions (1)

ETH Zurich¹

01 Apr 2010-Nature Methods

TL;DR: IDA, Lasso and Elastic-net are compared with each other and with random guessing on the Hughes et al. data and on the five DREAM4 networks of size 10, using the multifactorial data as observational data.

Abstract: Supplementary Figure 1: Comparing IDA, Lasso and Elastic-net on the five DREAM4 networks of size 10 with multifactorial data. Supplementary Table 1: Comparing IDA, Lasso and Elastic-net to random guessing on the Hughes et al. data. Supplementary Table 2: Comparing IDA, Lasso and Elastic-net to random guessing on the five DREAM4 networks of size 10, using the multifactorial data as observational data. Supplementary Methods.

248 citations

Cited by

Book•

Machine Learning : A Probabilistic Perspective

Kevin P. Murphy

24 Aug 2012

TL;DR: This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach, and is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

Abstract: Today's Web-enabled deluge of electronic data calls for automated methods of data analysis. Machine learning provides these, developing methods that can automatically detect patterns in data and then use the uncovered patterns to predict future data. This textbook offers a comprehensive and self-contained introduction to the field of machine learning, based on a unified, probabilistic approach. The coverage combines breadth and depth, offering necessary background material on such topics as probability, optimization, and linear algebra as well as discussion of recent developments in the field, including conditional random fields, L1 regularization, and deep learning. The book is written in an informal, accessible style, complete with pseudo-code for the most important algorithms. All topics are copiously illustrated with color images and worked examples drawn from such application domains as biology, text processing, computer vision, and robotics. Rather than providing a cookbook of different heuristic methods, the book stresses a principled model-based approach, often using the language of graphical models to specify models in a concise and intuitive way. Almost all the models described have been implemented in a MATLAB software package--PMTK (probabilistic modeling toolkit)--that is freely available online. The book is suitable for upper-level undergraduates with an introductory-level college math background and beginning graduate students.

8,059 citations

Journal Article•DOI•

Qgraph: Network visualizations of relationships in psychometric data

Sacha Epskamp, Angélique O. J. Cramer¹, Lourens J. Waldorp, Verena D. Schmittmann, Denny Borsboom•Institutions (1)

VU University Amsterdam¹

24 May 2012-Journal of Statistical Software

TL;DR: The qgraph package for R is presented, which provides an interface to visualize data through network modeling techniques, and is introduced by applying the package functions to data from the NEO-PI-R, a widely used personality questionnaire.

Abstract: We present the qgraph package for R, which provides an interface to visualize data through network modeling techniques. For instance, a correlation matrix can be represented as a network in which each variable is a node and each correlation an edge; by varying the width of the edges according to the magnitude of the correlation, the structure of the correlation matrix can be visualized. A wide variety of matrices that are used in statistics can be represented in this fashion, for example matrices that contain (implied) covariances, factor loadings, regression parameters and p values. qgraph can also be used as a psychometric tool, as it performs exploratory and confirmatory factor analysis, using sem and lavaan; the output of these packages is automatically visualized in qgraph, which may aid the interpretation of results. In this article, we introduce qgraph by applying the package functions to data from the NEO-PI-R, a widely used personality questionnaire.

2,338 citations

Journal Article•DOI•

Network Analysis: An Integrative Approach to the Structure of Psychopathology

Denny Borsboom¹, Angélique O. J. Cramer¹•Institutions (1)

University of Amsterdam¹

28 Mar 2013-Annual Review of Clinical Psychology

TL;DR: This review examines methodologies suited to identify symptom networks in psychopathology and discusses network analysis techniques that may be used to extract clinically and scientifically useful information from such networks (e.g., which symptom is most central in a person's network).

Abstract: In network approaches to psychopathology, disorders result from the causal interplay between symptoms (e.g., worry → insomnia → fatigue), possibly involving feedback loops (e.g., a person may engage in substance abuse to forget the problems that arose due to substance abuse). The present review examines methodologies suited to identify such symptom networks and discusses network analysis techniques that may be used to extract clinically and scientifically useful information from such networks (e.g., which symptom is most central in a person's network). The authors also show how network analysis techniques may be used to construct simulation models that mimic symptom dynamics. Network approaches naturally explain the limited success of traditional research strategies, which are typically based on the idea that symptoms are manifestations of some common underlying factor, while offering promising methodological alternatives. In addition, these techniques may offer possibilities to guide and evaluate therape...

1,824 citations

Posted Content•

Estimating Psychological Networks and their Accuracy: A Tutorial Paper

Sacha Epskamp¹, Denny Borsboom¹, Eiko I. Fried¹•Institutions (1)

University of Amsterdam¹

28 Apr 2016-arXiv: Applications

TL;DR: The current state-of-the-art of network estimation is introduced and a rationale why researchers should investigate the accuracy of psychological networks is provided, and the free R-package bootnet is developed that allows for estimating psychological networks in a generalized framework in addition to the proposed bootstrap methods.

Abstract: The usage of psychological networks that conceptualize psychological behavior as a complex interplay of psychological and other components has gained increasing popularity in various fields of psychology. While prior publications have tackled the topics of estimating and interpreting such networks, little work has been conducted to check how accurate (i.e., prone to sampling variation) networks are estimated, and how stable (i.e., interpretation remains similar with less observations) inferences from the network structure (such as centrality indices) are. In this tutorial paper, we aim to introduce the reader to this field and tackle the problem of accuracy under sampling variation. We first introduce the current state-of-the-art of network estimation. Second, we provide a rationale why researchers should investigate the accuracy of psychological networks. Third, we describe how bootstrap routines can be used to (A) assess the accuracy of estimated network connections, (B) investigate the stability of centrality indices, and (C) test whether network connections and centrality estimates for different variables differ from each other. We introduce two novel statistical methods: for (B) the correlation stability coefficient, and for (C) the bootstrapped difference test for edge-weights and centrality indices. We conducted and present simulation studies to assess the performance of both methods. Finally, we developed the free R-package bootnet that allows for estimating psychological networks in a generalized framework in addition to the proposed bootstrap methods. We showcase bootnet in a tutorial, accompanied by R syntax, in which we analyze a dataset of 359 women with posttraumatic stress disorder available online.

606 citations

Journal Article•DOI•

Causal Inference using Graphical Models with the R Package pcalg

Markus Kalisch, Martin Mächler, Diego Colombo, Marloes H. Maathuis, Peter Bühlmann

17 May 2012-Journal of Statistical Software

TL;DR: The pcalg package for R can be used for the following two purposes: Causal structure learning and estimation of causal effects from observational data.

576 citations
