Author

Xuechen Li

Other affiliations: University of Toronto
Bio: Xuechen Li is an academic researcher from Stanford University. The author has contributed to research in topics including computer science and stochastic differential equations, has an h-index of 11, and has co-authored 25 publications receiving 1,479 citations. Previous affiliations of Xuechen Li include the University of Toronto.

Papers
Proceedings Article
14 Feb 2018
TL;DR: In this paper, the authors decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables, and use this to motivate the beta-TCVAE (Total Correlation Variational Autoencoder) algorithm.
Abstract: We decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables. We use this to motivate the beta-TCVAE (Total Correlation Variational Autoencoder) algorithm, a refinement and plug-in replacement of the beta-VAE for learning disentangled representations, requiring no additional hyperparameters during training. We further propose a principled classifier-free measure of disentanglement called the mutual information gap (MIG). We perform extensive quantitative and qualitative experiments, in both restricted and non-restricted settings, and show a strong relation between total correlation and disentanglement, when the model is trained using our framework.
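For reference, the decomposition the abstract refers to is usually written as follows (a sketch of the total-correlation decomposition of the averaged KL term, assuming the aggregate posterior $q(z) = \mathbb{E}_{p(x)}[q(z \mid x)]$; $\beta$-TCVAE then up-weights only the middle term):

$$
\mathbb{E}_{p(x)}\big[\mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big)\big]
= I_q(x; z)
+ \mathrm{KL}\Big(q(z)\,\Big\|\,\prod\nolimits_j q(z_j)\Big)
+ \sum\nolimits_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big),
$$

where the first term is the index-code mutual information, the second is the total correlation highlighted in the abstract, and the third is the dimension-wise KL to the prior.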

541 citations

Posted Content
TL;DR: In this article, the authors decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables and use this to motivate the Total Correlation Variational Autoencoder (TCVAE), a refinement of the state-of-the-art VAE objective for learning disentangled representations.
Abstract: We decompose the evidence lower bound to show the existence of a term measuring the total correlation between latent variables. We use this to motivate our $\beta$-TCVAE (Total Correlation Variational Autoencoder), a refinement of the state-of-the-art $\beta$-VAE objective for learning disentangled representations, requiring no additional hyperparameters during training. We further propose a principled classifier-free measure of disentanglement called the mutual information gap (MIG). We perform extensive quantitative and qualitative experiments, in both restricted and non-restricted settings, and show a strong relation between total correlation and disentanglement when the latent-variable model is trained using our framework.

409 citations

Journal ArticleDOI
TL;DR: The Holistic Evaluation of Language Models (HELM), presented in this paper, is a benchmark for language models in which 30 models are evaluated on 16 core scenarios and 7 metrics, exposing important trade-offs.
Abstract: Language models (LMs) like GPT-3, PaLM, and ChatGPT are the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of LMs. LMs can serve many purposes and their behavior should satisfy many desiderata. To navigate the vast space of potential scenarios and metrics, we taxonomize the space and select representative subsets. We evaluate models on 16 core scenarios and 7 metrics, exposing important trade-offs. We supplement our core evaluation with seven targeted evaluations to deeply analyze specific aspects (including world knowledge, reasoning, regurgitation of copyrighted content, and generation of disinformation). We benchmark 30 LMs, from OpenAI, Microsoft, Google, Meta, Cohere, AI21 Labs, and others. Prior to HELM, models were evaluated on just 17.9% of the core HELM scenarios, with some prominent models not sharing a single scenario in common. We improve this to 96.0%: all 30 models are now benchmarked under the same standardized conditions. Our evaluation surfaces 25 top-level findings. For full transparency, we release all raw model prompts and completions publicly. HELM is a living benchmark for the community, continuously updated with new scenarios, metrics, and models: https://crfm.stanford.edu/helm/latest/.

168 citations

Posted Content
TL;DR: In this article, the authors examine approximate inference in variational autoencoders and find that divergence from the true posterior is often due to imperfect recognition networks, rather than the limited complexity of the approximating distribution.
Abstract: Amortized inference allows latent-variable models trained via variational learning to scale to large datasets. The quality of approximate inference is determined by two factors: a) the capacity of the variational distribution to match the true posterior and b) the ability of the recognition network to produce good variational parameters for each datapoint. We examine approximate inference in variational autoencoders in terms of these factors. We find that divergence from the true posterior is often due to imperfect recognition networks, rather than the limited complexity of the approximating distribution. We show that this is due partly to the generator learning to accommodate the choice of approximation. Furthermore, we show that the parameters used to increase the expressiveness of the approximation play a role in generalizing inference rather than simply improving the complexity of the approximation.
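One standard way to make the two factors above precise is to split the total inference gap at the best distribution $q^{*}$ within the variational family (a sketch using notation common in this line of work; $\mathcal{L}[q]$ denotes the evidence lower bound under $q$):

$$
\underbrace{\log p(x) - \mathcal{L}[q]}_{\text{inference gap}}
= \underbrace{\log p(x) - \mathcal{L}[q^{*}]}_{\text{approximation gap}}
+ \underbrace{\mathcal{L}[q^{*}] - \mathcal{L}[q]}_{\text{amortization gap}},
$$

so a large second term indicates that the recognition network, rather than the variational family itself, is the bottleneck.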

149 citations

Posted Content
TL;DR: The adjoint sensitivity method scalably computes gradients of solutions to ordinary differential equations and is generalized to stochastic differential equations, allowing time-efficient and constant-memory computation of gradients with high-order adaptive solvers.
Abstract: The adjoint sensitivity method scalably computes gradients of solutions to ordinary differential equations. We generalize this method to stochastic differential equations, allowing time-efficient and constant-memory computation of gradients with high-order adaptive solvers. Specifically, we derive a stochastic differential equation whose solution is the gradient, a memory-efficient algorithm for caching noise, and conditions under which numerical solutions converge. In addition, we combine our method with gradient-based stochastic variational inference for latent stochastic differential equations. We use our method to fit stochastic dynamics defined by neural networks, achieving competitive performance on a 50-dimensional motion capture dataset.
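A minimal sketch of what using this method can look like in practice, assuming the torchsde package (the companion library released with this line of work); the model sizes, solver choice, and objective below are illustrative only:

```python
import torch
import torchsde


class NeuralSDE(torch.nn.Module):
    noise_type = "diagonal"  # one independent Brownian motion per state dimension
    sde_type = "ito"

    def __init__(self, dim=3, hidden=64):
        super().__init__()
        self.drift_net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Tanh(), torch.nn.Linear(hidden, dim))
        self.diffusion_net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.Tanh(), torch.nn.Linear(hidden, dim))

    def f(self, t, y):  # drift term of the SDE
        return self.drift_net(y)

    def g(self, t, y):  # diffusion term of the SDE
        return self.diffusion_net(y)


sde = NeuralSDE()
y0 = torch.zeros(16, 3)               # batch of 16 initial states
ts = torch.linspace(0.0, 1.0, 20)     # observation times

# sdeint_adjoint solves the SDE forward and backpropagates through the solve via
# the stochastic adjoint, keeping memory constant in the number of solver steps.
ys = torchsde.sdeint_adjoint(sde, y0, ts, method="milstein")
loss = ys.pow(2).mean()               # placeholder training objective
loss.backward()
```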

142 citations


Cited by
Proceedings ArticleDOI
22 Jan 2006
TL;DR: Some of the major results in random graphs and some of the more challenging open problems are reviewed, covering algorithmic and structural questions and touching on newer models, including those related to the WWW.
Abstract: We will review some of the major results in random graphs and some of the more challenging open problems. We will cover algorithmic and structural questions. We will touch on newer models, including those related to the WWW.

7,116 citations

Book ChapterDOI
01 Jan 1998
TL;DR: In this paper, the authors explore questions of existence and uniqueness for solutions to stochastic differential equations and offer a study of their properties, using diffusion processes as a model of a Markov process with continuous sample paths.
Abstract: We explore in this chapter questions of existence and uniqueness for solutions to stochastic differential equations and offer a study of their properties. This endeavor is really a study of diffusion processes. Loosely speaking, the term diffusion is attributed to a Markov process which has continuous sample paths and can be characterized in terms of its infinitesimal generator.
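In the notation such a chapter typically fixes, the object of study is the Itô equation

$$
dX_t = b(t, X_t)\,dt + \sigma(t, X_t)\,dW_t, \qquad X_0 = \xi,
$$

and the classical existence-and-uniqueness result (a standard statement, not a quotation of this chapter) asks that the coefficients be Lipschitz and of linear growth,

$$
\|b(t,x) - b(t,y)\| + \|\sigma(t,x) - \sigma(t,y)\| \le K \|x - y\|, \qquad
\|b(t,x)\|^2 + \|\sigma(t,x)\|^2 \le K^2\,(1 + \|x\|^2),
$$

which guarantees a unique strong solution with continuous sample paths, i.e. a diffusion in the sense described above.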

2,446 citations

Posted Content
Tero Karras1, Samuli Laine1, Timo Aila1
TL;DR: This article proposes an alternative generator architecture for GANs, borrowing from the style transfer literature, which leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images.
Abstract: We propose an alternative generator architecture for generative adversarial networks, borrowing from style transfer literature. The new architecture leads to an automatically learned, unsupervised separation of high-level attributes (e.g., pose and identity when trained on human faces) and stochastic variation in the generated images (e.g., freckles, hair), and it enables intuitive, scale-specific control of the synthesis. The new generator improves the state-of-the-art in terms of traditional distribution quality metrics, leads to demonstrably better interpolation properties, and also better disentangles the latent factors of variation. To quantify interpolation quality and disentanglement, we propose two new, automated methods that are applicable to any generator architecture. Finally, we introduce a new, highly varied and high-quality dataset of human faces.
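A minimal sketch of the style-injection mechanism the published style-based generator uses, adaptive instance normalization (AdaIN) driven by a learned mapping network; all layer sizes and tensor shapes below are illustrative assumptions, not the paper's configuration:

```python
import torch

def adain(x, style_scale, style_bias, eps=1e-8):
    """x: feature maps (N, C, H, W); style_scale, style_bias: (N, C)."""
    mean = x.mean(dim=(2, 3), keepdim=True)
    std = x.std(dim=(2, 3), keepdim=True) + eps
    x_norm = (x - mean) / std                        # per-sample, per-channel normalization
    return style_scale[:, :, None, None] * x_norm + style_bias[:, :, None, None]

# A mapping network turns a latent z into an intermediate code w; per-layer
# affine transforms of w then produce the scale/bias pairs that AdaIN consumes.
mapping = torch.nn.Sequential(
    torch.nn.Linear(512, 512), torch.nn.LeakyReLU(0.2),
    torch.nn.Linear(512, 512), torch.nn.LeakyReLU(0.2))
to_style = torch.nn.Linear(512, 2 * 64)              # 64 feature channels -> scale and bias

z = torch.randn(4, 512)
w = mapping(z)
scale, bias = to_style(w).chunk(2, dim=1)
features = torch.randn(4, 64, 32, 32)                # output of some convolution block
styled = adain(features, 1.0 + scale, bias)          # center the learned scales around 1
```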

1,612 citations

Journal Article
TL;DR: An independence criterion based on the eigen-spectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator, or HSIC, is proposed.
Abstract: We propose an independence criterion based on the eigen-spectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator (we term this a Hilbert-Schmidt Independence Criterion, or HSIC). This approach has several advantages, compared with previous kernel-based independence criteria. First, the empirical estimate is simpler than any other kernel dependence test, and requires no user-defined regularisation. Second, there is a clearly defined population quantity which the empirical estimate approaches in the large sample limit, with exponential convergence guaranteed between the two: this ensures that independence tests based on HSIC do not suffer from slow learning rates. Finally, we show in the context of independent component analysis (ICA) that the performance of HSIC is competitive with that of previously published kernel-based criteria, and of other recently published ICA methods.
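A minimal sketch of the empirical estimate described above, i.e. the trace form of HSIC with a centering matrix; the Gaussian kernel and the (n − 1)⁻² normalization are common choices assumed here for illustration:

```python
import numpy as np

def gaussian_gram(x, sigma=1.0):
    """Gram matrix of a Gaussian RBF kernel over the rows of x."""
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: near zero (for characteristic kernels) iff x and y are independent."""
    n = x.shape[0]
    K, L = gaussian_gram(x, sigma), gaussian_gram(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
print(hsic(x, x ** 2))                             # dependent pair: noticeably above zero
print(hsic(x, rng.normal(size=(200, 2))))          # independent pair: close to zero
```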

1,134 citations

01 Jan 2015
TL;DR: This compact, informal introduction for graduate students and advanced undergraduates presents state-of-the-art filtering and smoothing methods in a unified Bayesian framework; readers learn what non-linear Kalman filters and particle filters are, how they are related, and their relative advantages and disadvantages.
Abstract: Filtering and smoothing methods are used to produce an accurate estimate of the state of a time-varying system based on multiple observational inputs (data). Interest in these methods has exploded in recent years, with numerous applications emerging in fields such as navigation, aerospace engineering, telecommunications, and medicine. This compact, informal introduction for graduate students and advanced undergraduates presents the current state-of-the-art filtering and smoothing methods in a unified Bayesian framework. Readers learn what non-linear Kalman filters and particle filters are, how they are related, and their relative advantages and disadvantages. They also discover how state-of-the-art Bayesian parameter estimation methods can be combined with state-of-the-art filtering and smoothing algorithms. The book’s practical and algorithmic approach assumes only modest mathematical prerequisites. Examples include MATLAB computations, and the numerous end-of-chapter exercises include computational assignments. MATLAB/GNU Octave source code is available for download at www.cambridge.org/sarkka, promoting hands-on work with the methods.
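As a concrete anchor for the linear-Gaussian case such books start from, here is a minimal Kalman filter predict/update step; the constant-velocity model and noise levels below are illustrative assumptions rather than one of the book's examples:

```python
import numpy as np

def kalman_step(m, P, y, A, H, Q, R):
    # Predict: push the state mean and covariance through the linear dynamics.
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update: fold in the new measurement y via the Kalman gain.
    S = H @ P_pred @ H.T + R                  # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
    m_new = m_pred + K @ (y - H @ m_pred)
    P_new = P_pred - K @ S @ K.T
    return m_new, P_new

dt = 0.1                                       # constant-velocity model in 1D
A = np.array([[1.0, dt], [0.0, 1.0]])          # state = (position, velocity)
H = np.array([[1.0, 0.0]])                     # only position is observed
Q = 1e-3 * np.eye(2)
R = np.array([[0.05]])

m, P = np.zeros(2), np.eye(2)
for y in [0.10, 0.22, 0.29, 0.41]:
    m, P = kalman_step(m, P, np.array([y]), A, H, Q, R)
print(m)                                       # filtered estimate of position and velocity
```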

1,102 citations