Alex Rodriguez

International Centre for Theoretical Physics

Other affiliations: University of Barcelona, International School for Advanced Studies, Polytechnic University of Catalonia

Bio: Alex Rodriguez is an academic researcher at the International Centre for Theoretical Physics. The author has contributed to research on intrinsic dimension and cluster analysis. The author has an h-index of 14 and has co-authored 32 publications, which have received 3,018 citations. Previous affiliations include the University of Barcelona and the International School for Advanced Studies.

Topics: Intrinsic dimension, Cluster analysis, Energy landscape, Estimator, Medicine

Papers

Journal Article

Clustering by fast search and find of density peaks

Alex Rodriguez¹, Alessandro Laio¹•Institutions (1)

International School for Advanced Studies¹

27 Jun 2014-Science

TL;DR: A method in which the cluster centers are recognized as local density maxima that are far away from any points of higher density, and the algorithm depends only on the relative densities rather than their absolute values.

Abstract: Cluster analysis is aimed at classifying elements into categories on the basis of their similarity. Its applications range from astronomy to bioinformatics, bibliometrics, and pattern recognition. We propose an approach based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from points with higher densities. This idea forms the basis of a clustering procedure in which the number of clusters arises intuitively, outliers are automatically spotted and excluded from the analysis, and clusters are recognized regardless of their shape and of the dimensionality of the space in which they are embedded. We demonstrate the power of the algorithm on several test cases.
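
The abstract's two ingredients, a local density ρ and the distance δ to the nearest point of higher density, can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' code: it uses a hard cutoff kernel (ρᵢ counts the other points closer than d_c), picks as centers the `n_clusters` points with the largest γ = ρ·δ, breaks density ties by index, and does not implement outlier detection. The function name `density_peaks` and its parameters are hypothetical.

```python
import math
import random


def density_peaks(points, d_c, n_clusters):
    """Sketch of density-peaks clustering with a hard cutoff kernel (hypothetical helper)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # rho_i: number of other points closer than the cutoff distance d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]

    # Visit points by decreasing density; ties are broken by index.
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    rank = {i: r for r, i in enumerate(order)}

    # delta_i: distance to the nearest point earlier in that order (higher density).
    # The densest point gets its largest distance to any other point instead.
    delta = [0.0] * n
    parent = [-1] * n
    for r, i in enumerate(order):
        if r == 0:
            delta[i] = max(dist[i])
            continue
        j = min(order[:r], key=lambda k: dist[i][k])
        delta[i], parent[i] = dist[i][j], j

    # Centers: the n_clusters points with the largest gamma = rho * delta.
    # The densest point has the largest rho and delta and wins ties, so it is always a center.
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank[i]))[:n_clusters]
    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c

    # Every other point takes the label of its nearest higher-density neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers


# Usage: two well-separated Gaussian blobs in 2-D.
random.seed(0)
blob_a = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(50)]
blob_b = [(random.gauss(10, 0.5), random.gauss(0, 0.5)) for _ in range(50)]
labels, centers = density_peaks(blob_a + blob_b, d_c=1.0, n_clusters=2)
```

Processing points in order of decreasing density is what lets the assignment run in a single pass: by the time a point is visited, its nearest higher-density neighbour already carries a label.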

3,441 citations

Journal Article

Unsupervised Learning Methods for Molecular Simulation Data.

Aldo Glielmo¹,³, Brooke E. Husic², Alex Rodriguez³, Cecilia Clementi², Frank Noé²,⁴, Alessandro Laio¹,³•Institutions (4)

International School for Advanced Studies¹, Free University of Berlin², International Centre for Theoretical Physics³, Rice University⁴

04 May 2021-Chemical Reviews

TL;DR: This Review provides a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicates likely directions for further developments in the field.

Abstract: Unsupervised learning is becoming an essential tool to analyze the increasingly large amounts of data produced by atomistic and molecular simulations, in material science, solid state physics, biophysics, and biochemistry. In this Review, we provide a comprehensive overview of the methods of unsupervised learning that have been most commonly used to investigate simulation data and indicate likely directions for further developments in the field. In particular, we discuss feature representation of molecular systems and present state-of-the-art algorithms for dimensionality reduction, density estimation, and clustering, as well as kinetic models. We divide our discussion into self-contained sections, each discussing a specific method. In each section, we briefly touch upon the mathematical and algorithmic foundations of the method, highlight its strengths and limitations, and describe the specific ways in which it has been used, or can be used, to analyze molecular simulation data.

144 citations

Journal Article

Estimating the intrinsic dimension of datasets by a minimal neighborhood information.

Elena Facco¹, Maria d'Errico¹, Alex Rodriguez¹, Alessandro Laio¹•Institutions (1)

International School for Advanced Studies¹

22 Sep 2017-Scientific Reports

TL;DR: In this article, the authors propose a new ID estimator using only the distance of the first and the second nearest neighbor of each point in the sample, which is theoretically exact in uniformly distributed datasets, and provides consistent measures in general.

Abstract: Analyzing large volumes of high-dimensional data is an issue of fundamental importance in data science, molecular simulations and beyond. Several approaches work on the assumption that the important content of a dataset belongs to a manifold whose Intrinsic Dimension (ID) is much lower than the crude large number of coordinates. Such a manifold is generally twisted and curved; in addition, points on it will be non-uniformly distributed: two factors that make the identification of the ID and its exploitation really hard. Here we propose a new ID estimator using only the distance of the first and the second nearest neighbor of each point in the sample. This extreme minimality enables us to reduce the effects of curvature, of density variation, and the resulting computational cost. The ID estimator is theoretically exact in uniformly distributed datasets, and provides consistent measures in general. When used in combination with block analysis, it allows discriminating the relevant dimensions as a function of the block size. This allows estimating the ID even when the data lie on a manifold perturbed by a high-dimensional noise, a situation often encountered in real-world data sets. We demonstrate the usefulness of the approach on molecular simulations and image analysis.
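
The estimator described above needs only each point's first- and second-neighbour distances, r₁ and r₂. A minimal pure-Python sketch follows. As an assumption of this sketch, it uses the maximum-likelihood form for the ratio μ = r₂/r₁ (if μ follows the Pareto law f(μ) = d·μ^-(d+1), the estimate is d = N / Σ log μᵢ) rather than the fitting procedure described in the paper, and it omits block analysis. The function name `two_nn_intrinsic_dimension` is hypothetical.

```python
import math
import random


def two_nn_intrinsic_dimension(points):
    """ID estimate from first- and second-neighbour distances only (hypothetical helper)."""
    log_mu = []
    for i, p in enumerate(points):
        r1 = r2 = math.inf
        for j, q in enumerate(points):
            if j == i:
                continue
            d = math.dist(p, q)
            if d < r1:
                r1, r2 = d, r1
            elif d < r2:
                r2 = d
        # mu_i = r2 / r1; assumes there are no duplicate points (r1 > 0).
        log_mu.append(math.log(r2 / r1))
    # If mu follows the Pareto law f(mu) = d * mu**-(d + 1) on [1, inf),
    # the maximum-likelihood estimate of d is N / sum(log mu_i).
    return len(log_mu) / sum(log_mu)


# Usage: a flat 2-D square embedded in 5-D space (three constant coordinates).
random.seed(1)
data = [(random.random(), random.random(), 0.0, 0.0, 0.0) for _ in range(800)]
estimate = two_nn_intrinsic_dimension(data)
```

The embedding dimension (5 here) never enters the formula, which is why the estimate tracks the dimension of the square rather than that of the ambient space.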

131 citations

Journal Article

Estimating the intrinsic dimension of datasets by a minimal neighborhood information

Elena Facco¹, Maria d'Errico¹, Alex Rodriguez¹, Alessandro Laio¹•Institutions (1)

International School for Advanced Studies¹

19 Mar 2018-arXiv: Machine Learning

TL;DR: A new ID estimator using only the distance of the first and the second nearest neighbor of each point in the sample is proposed, which enables us to reduce the effects of curvature, of density variation, and the resulting computational cost.

Abstract: Analyzing large volumes of high-dimensional data is an issue of fundamental importance in data science, molecular simulations and beyond. Several approaches work on the assumption that the important content of a dataset belongs to a manifold whose Intrinsic Dimension (ID) is much lower than the crude large number of coordinates. Such a manifold is generally twisted and curved; in addition, points on it will be non-uniformly distributed: two factors that make the identification of the ID and its exploitation really hard. Here we propose a new ID estimator using only the distance of the first and the second nearest neighbor of each point in the sample. This extreme minimality enables us to reduce the effects of curvature, of density variation, and the resulting computational cost. The ID estimator is theoretically exact in uniformly distributed datasets, and provides consistent measures in general. When used in combination with block analysis, it allows discriminating the relevant dimensions as a function of the block size. This allows estimating the ID even when the data lie on a manifold perturbed by a high-dimensional noise, a situation often encountered in real-world data sets. We demonstrate the usefulness of the approach on molecular simulations and image analysis.

99 citations

Journal Article

Computing the Free Energy without Collective Variables

Alex Rodriguez¹, Maria d'Errico¹, Elena Facco¹, Alessandro Laio¹,²•Institutions (2)

International School for Advanced Studies¹, International Centre for Theoretical Physics²

05 Feb 2018-Journal of Chemical Theory and Computation

TL;DR: This work introduces an approach for computing the free energy and the probability density in high-dimensional spaces, such as those explored in molecular dynamics simulations of biomolecules, that exploits the presence of correlations between the coordinates induced by the chemical nature of molecules.

Abstract: We introduce an approach for computing the free energy and the probability density in high-dimensional spaces, such as those explored in molecular dynamics simulations of biomolecules. The approach exploits the presence of correlations between the coordinates, induced, in molecular dynamics, by the chemical nature of the molecules. Due to these correlations, the data points lie on a manifold that can be highly curved and twisted, but whose dimension is normally small. We estimate the free energies by finding, with a statistical test, the largest neighborhood in which the free energy in the embedding manifold can be considered constant. Importantly, this procedure does not require defining the manifold explicitly and provides an estimate of the error that is approximately unbiased up to large dimensions. We test this approach on artificial and real data sets, demonstrating that the free energy estimates are reliable for data sets on manifolds of dimension up to ∼10, embedded in an arbitrarily large space.
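
The central idea in the abstract, that neighbourhood volumes should be measured in the low dimension of the data manifold rather than in the embedding space, can be illustrated with a plain k-nearest-neighbour density estimate, F = -log ρ. This is a simplified sketch under stated assumptions, not the method itself: it uses one fixed k for every point, whereas the approach described above chooses the neighbourhood size point by point with a statistical test; it assumes the intrinsic dimension is already known (for example, from a nearest-neighbour ID estimator); and it works in units where k_B T = 1. The function name `knn_free_energy` is hypothetical.

```python
import math
import random
import statistics


def knn_free_energy(points, k, intrinsic_dim):
    """Fixed-k nearest-neighbour estimate of F_i = -log(rho_i), k_B T = 1 (hypothetical helper)."""
    n = len(points)
    d = intrinsic_dim
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)  # volume of the unit d-ball
    free_energy = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r_k = dists[k - 1]  # distance to the k-th nearest neighbour
        # The neighbourhood volume uses the intrinsic dimension d,
        # not the (possibly much larger) embedding dimension.
        rho = k / (n * unit_ball * r_k ** d)
        free_energy.append(-math.log(rho))
    return free_energy


# Usage: uniform points on a unit square embedded in 10-D. The true density is 1,
# so interior points (away from the edges) should have free energies close to 0.
random.seed(2)
data = [(random.random(), random.random()) + (0.0,) * 8 for _ in range(800)]
F = knn_free_energy(data, k=16, intrinsic_dim=2)
interior = [f for f, p in zip(F, data) if 0.2 < p[0] < 0.8 and 0.2 < p[1] < 0.8]
median_interior = statistics.median(interior)
```

Passing the embedding dimension (10) instead of the intrinsic one (2) would make r_k ** d collapse toward zero and the density estimates explode, which is the failure the abstract's manifold argument avoids.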

46 citations

Cited by

Journal Article

Reversed graph embedding resolves complex single-cell trajectories.

Xiaojie Qiu¹, Qi Mao, Ying Tang², Li Wang³, Raghav Chawla¹, Hannah A. Pliner¹, Cole Trapnell¹•Institutions (3)

University of Washington¹, Shanghai Jiao Tong University², University of Illinois at Chicago³

21 Aug 2017-Nature Methods

TL;DR: Monocle 2, an algorithm that uses reversed graph embedding to describe multiple fate decisions in a fully unsupervised manner, is applied to two studies of blood development and found that mutations in the genes encoding key lineage transcription factors divert cells to alternative fates.

Abstract: Single-cell trajectories can unveil how gene regulation governs cell fate decisions. However, learning the structure of complex trajectories with multiple branches remains a challenging computational problem. We present Monocle 2, an algorithm that uses reversed graph embedding to describe multiple fate decisions in a fully unsupervised manner. We applied Monocle 2 to two studies of blood development and found that mutations in the genes encoding key lineage transcription factors divert cells to alternative fates.

2,257 citations

On the monthly publication of Nihon Butsuri Gakkaishi (the journal of the Physical Society of Japan) and the Journal of the Physical Society of Japan

Masao Kotani

01 Jan 1955

2,246 citations

Reference Entry

IEEE Transactions on Pattern Analysis and Machine Intelligence

King-Sun Fu

15 Oct 2004

2,118 citations

Journal Article

When is nearest neighbor meaningful?

Kevin S. Beyer, Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft

01 Jan 1999-Lecture Notes in Computer Science

TL;DR: In this article, the authors explore the effect of dimensionality on the nearest neighbor problem and show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point.

Abstract: We explore the effect of dimensionality on the nearest neighbor problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets that demonstrate that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed, and should be modified. In particular, most such techniques proposed in the literature are not evaluated versus simple linear scan, and are evaluated over workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that linear scan would outperform the techniques being proposed on the workloads studied in high (10-15) dimensionality!
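
The effect reported in the abstract can be reproduced in the simplest setting, i.i.d. uniform data, which is only one case covered by the paper's much broader conditions. The sketch below measures the relative contrast (D_max - D_min) / D_min from a random query point to 1,000 random points in the unit hypercube; the helper name `relative_contrast` and the chosen sizes are hypothetical. As the dimension grows, the contrast is expected to shrink toward 0, meaning the farthest and nearest points become almost equally far away.

```python
import math
import random


def relative_contrast(n_points, dim, rng):
    """(D_max - D_min) / D_min from one random query to n_points uniform points in [0, 1]^dim."""
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)]) for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


rng = random.Random(0)
contrast = {dim: relative_contrast(1000, dim, rng) for dim in (2, 10, 100, 1000)}
for dim, value in contrast.items():
    print(f"dim={dim:5d}  (D_max - D_min) / D_min = {value:.3f}")
```

A shrinking contrast is exactly the regime in which an index that prunes by distance bounds gains little over a linear scan, which is the paper's methodological point.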

1,992 citations

Journal Article

SARS-CoV-2 Receptor ACE2 Is an Interferon-Stimulated Gene in Human Airway Epithelial Cells and Is Detected in Specific Cell Subsets across Tissues.

Carly G. K. Ziegler, Samuel J. Allon, Sarah K. Nyquist, Ian M. Mbano¹, Vincent N. Miao, Constantine N. Tzouanas, Yuming Cao², Ashraf S. Yousif³, Julia Bals³, Blake M. Hauser³,⁴, Jared Feldman³,⁴, Christoph Muus⁴,⁵, Marc H. Wadsworth, Samuel W. Kazer, Travis K. Hughes, Benjamin Doran, G. James Gatter³,⁵,⁶, Marko Vukovic, Faith Taliaferro⁵,⁷, Benjamin E. Mead, Zhiru Guo², Jennifer P. Wang², Delphine Gras⁸, Magali Plaisant⁹, Meshal Ansari, Ilias Angelidis, Heiko Adler, Jennifer M.S. Sucre¹⁰, Chase J. Taylor¹⁰, Brian M. Lin⁴, Avinash Waghray⁴, Vanessa Mitsialis⁷,¹¹, Daniel F. Dwyer¹¹, Kathleen M. Buchheit¹¹, Joshua A. Boyce¹¹, Nora A. Barrett¹¹, Tanya M. Laidlaw¹¹, Shaina L. Carroll¹², Lucrezia Colonna¹³, Victor Tkachev⁴,⁷, Christopher W. Peterson¹³,¹⁴, Alison Yu⁷,¹⁵, Hengqi Betty Zheng¹³,¹⁵, Hannah P. Gideon¹⁶, Caylin G. Winchell¹⁶, Philana Ling Lin⁷,¹⁶, Colin D. Bingle¹⁷, Scott B. Snapper⁷,¹¹, Jonathan A. Kropski¹⁰,¹⁸, Fabian J. Theis, Herbert B. Schiller, Laure-Emmanuelle Zaragosi⁹, Pascal Barbry⁹, Alasdair Leslie¹,¹⁹, Hans-Peter Kiem¹³,¹⁴, JoAnne L. Flynn¹⁶, Sarah M. Fortune³,⁴,⁵, Bonnie Berger⁶, Robert W. Finberg², Leslie S. Kean⁴,⁷, Manuel Garber², Aaron G. Schmidt³,⁴, Daniel Lingwood³, Alex K. Shalek, Jose Ordovas-Montanes, Nicholas E. Banovich, Alvis Brazma, Tushar J. Desai, Thu Elizabeth Duong, Oliver Eickelberg, Christine S. Falk, Michael Farzan²⁰, Ian A. Glass, Muzlifah Haniffa, Peter Horvath, Deborah T. Hung, Naftali Kaminski, Mark A. Krasnow, Malte Kühnemund, Robert Lafyatis, Haeock Lee, Sylvie Leroy, Sten Linnarson, Joakim Lundeberg, Kerstin B. Meyer, Alexander V. Misharin, Martijn C. Nawijn, Marko Nikolic, Dana Pe'er, Joseph E. Powell, Stephen R. Quake, Jay Rajagopal, Purushothama Rao Tata, Emma L. Rawlins, Aviv Regev, Paul A. Reyfman, Mauricio Rojas, Orit Rosen, Kourosh Saeb-Parsy, Christos Samakovlis, Herbert B. Schiller, Joachim L. Schultze, Max A. Seibold, Douglas P. Shepherd, Jason R. Spence, Avrum Spira, Xin Sun, Sarah A. Teichmann, Fabian J. Theis, Alexander M. Tsankov, Maarten van den Berge, Michael von Papen, Jeffrey A. Whitsett, Ramnik J. Xavier, Yan Xu, Kun Zhang•Institutions (20)

University of KwaZulu-Natal¹, University of Massachusetts Medical School², Ragon Institute of MGH, MIT and Harvard³, Harvard University⁴, Broad Institute⁵, Massachusetts Institute of Technology⁶, Boston Children's Hospital⁷, Aix-Marseille University⁸, Centre national de la recherche scientifique⁹, Vanderbilt University Medical Center¹⁰, Brigham and Women's Hospital¹¹, University of California, Berkeley¹², University of Washington¹³, Fred Hutchinson Cancer Research Center¹⁴, Seattle Children's¹⁵, University of Pittsburgh¹⁶, University of Sheffield¹⁷, United States Department of Veterans Affairs¹⁸, University College London¹⁹, Scripps Research Institute²⁰

28 May 2020-Cell

TL;DR: The data suggest that SARS-CoV-2 could exploit species-specific interferon-driven upregulation of ACE2, a tissue-protective mediator during lung injury, to enhance infection.

1,911 citations
