Home
/
Authors
/
Weijie J. Su

Author

Weijie J. Su

Other affiliations: Peking University, Stanford University

Bio: Weijie J. Su is an academic researcher from University of Pennsylvania. The author has contributed to research in topics: Artificial neural network & Differential privacy. The author has an hindex of 20, co-authored 83 publications receiving 2434 citations. Previous affiliations of Weijie J. Su include Peking University & Stanford University.

Papers published on a yearly basis

2023
2022
2021
2020
2019
2018
2017
2016
2015
2014
2013
2011

Papers

PDF

Open Access

More filters

Journal Article•

A differential equation for modeling Nesterov's accelerated gradient method: theory and insights

[...]

Weijie J. Su¹, Stephen Boyd², Emmanuel J. Candès²•Institutions (2)

University of Pennsylvania¹, Stanford University²

01 Jan 2016-Journal of Machine Learning Research

TL;DR: A second-order ordinary differential equation is derived, which is the limit of Nesterov's accelerated gradient method, and it is shown that the continuous time ODE allows for a better understanding of Nestersov's scheme.

...read moreread less

Abstract: We derive a second-order ordinary differential equation (ODE) which is the limit of Nesterov's accelerated gradient method. This ODE exhibits approximate equivalence to Nesterov's scheme and thus can serve as a tool for analysis. We show that the continuous time ODE allows for a better understanding of Nesterov's scheme. As a byproduct, we obtain a family of schemes with similar convergence rates. The ODE interpretation also suggests restarting Nesterov's scheme leading to an algorithm, which can be rigorously proven to converge at a linear rate whenever the objective is strongly convex.

...read moreread less

949 citations

Journal Article•DOI•

SLOPE { Adaptive Variable Selection via Convex Optimization

[...]

Małgorzata Bogdan¹, Ewout van den Berg², Chiara Sabatti³, Weijie J. Su³, Emmanuel J. Candès³ - Show less +1 more•Institutions (3)

Wrocław University of Technology¹, IBM², Stanford University³

01 Jan 2015-The Annals of Applied Statistics

TL;DR: SLOPE as mentioned in this paper is the solution to the sorted L-one penalized estimator, where the regularizer is a sorted l1 norm, which penalizes the regression coefficients according to their rank: the higher the rank, stronger the signal, the larger the penalty.

...read moreread less

Abstract: We introduce a new estimator for the vector of coefficients β in the linear model y = Xβ + z, where X has dimensions n × p with p possibly larger than n. SLOPE, short for Sorted L-One Penalized Estimation, is the solution to [Formula: see text]where λ1 ≥ λ2 ≥ … ≥ λ p ≥ 0 and [Formula: see text] are the decreasing absolute values of the entries of b. This is a convex program and we demonstrate a solution algorithm whose computational complexity is roughly comparable to that of classical l1 procedures such as the Lasso. Here, the regularizer is a sorted l1 norm, which penalizes the regression coefficients according to their rank: the higher the rank-that is, stronger the signal-the larger the penalty. This is similar to the Benjamini and Hochberg [J. Roy. Statist. Soc. Ser. B57 (1995) 289-300] procedure (BH) which compares more significant p-values with more stringent thresholds. One notable choice of the sequence {λ i } is given by the BH critical values [Formula: see text], where q ∈ (0, 1) and z(α) is the quantile of a standard normal distribution. SLOPE aims to provide finite sample guarantees on the selected model; of special interest is the false discovery rate (FDR), defined as the expected proportion of irrelevant regressors among all selected predictors. Under orthogonal designs, SLOPE with λBH provably controls FDR at level q. Moreover, it also appears to have appreciable inferential properties under more general designs X while having substantial power, as demonstrated in a series of experiments running on both simulated and real data.

...read moreread less

303 citations

Proceedings Article•

A Differential Equation for Modeling Nesterovâ€™s Accelerated Gradient Method: Theory and Insights

[...]

Weijie J. Su¹, Stephen Boyd¹, Emmanuel J. Candès¹•Institutions (1)

Stanford University¹

08 Dec 2014

TL;DR: In this paper, the authors derive a second-order ordinary differential equation (ODE), which is the limit of Nesterov's accelerated gradient method, which can serve as a tool for analysis.

...read moreread less

Abstract: We derive a second-order ordinary differential equation (ODE), which is the limit of Nesterov's accelerated gradient method. This ODE exhibits approximate equivalence to Nesterov's scheme and thus can serve as a tool for analysis. We show that the continuous time ODE allows for a better understanding of Nesterov's scheme. As a byproduct, we obtain a family of schemes with similar convergence rates. The ODE interpretation also suggests restarting Nesterov's scheme leading to an algorithm, which can be rigorously proven to converge at a linear rate whenever the objective is strongly convex.

...read moreread less

210 citations

Posted Content•

Gaussian Differential Privacy

[...]

Jinshuo Dong¹, Aaron Roth¹, Weijie J. Su¹•Institutions (1)

University of Pennsylvania¹

07 May 2019-arXiv: Learning

TL;DR: The privacy guarantees of any hypothesis testing based definition of privacy (including the original differential privacy definition) converges to GDP in the limit under composition and a Berry–Esseen style version of the central limit theorem is proved, which gives a computationally inexpensive tool for tractably analysing the exact composition of private algorithms.

...read moreread less

Abstract: Differential privacy has seen remarkable success as a rigorous and practical formalization of data privacy in the past decade. This privacy definition and its divergence based relaxations, however, have several acknowledged weaknesses, either in handling composition of private algorithms or in analyzing important primitives like privacy amplification by subsampling. Inspired by the hypothesis testing formulation of privacy, this paper proposes a new relaxation, which we term `$f$-differential privacy' ($f$-DP). This notion of privacy has a number of appealing properties and, in particular, avoids difficulties associated with divergence based relaxations. First, $f$-DP preserves the hypothesis testing interpretation. In addition, $f$-DP allows for lossless reasoning about composition in an algebraic fashion. Moreover, we provide a powerful technique to import existing results proven for original DP to $f$-DP and, as an application, obtain a simple subsampling theorem for $f$-DP. In addition to the above findings, we introduce a canonical single-parameter family of privacy notions within the $f$-DP class that is referred to as `Gaussian differential privacy' (GDP), defined based on testing two shifted Gaussians. GDP is focal among the $f$-DP class because of a central limit theorem we prove. More precisely, the privacy guarantees of \emph{any} hypothesis testing based definition of privacy (including original DP) converges to GDP in the limit under composition. The CLT also yields a computationally inexpensive tool for analyzing the exact composition of private algorithms. Taken together, this collection of attractive properties render $f$-DP a mathematically coherent, analytically tractable, and versatile framework for private data analysis. Finally, we demonstrate the use of the tools we develop by giving an improved privacy analysis of noisy stochastic gradient descent.

...read moreread less

192 citations

Journal Article•DOI•

False Discoveries Occur Early on the Lasso Path

[...]

Weijie J. Su, Małgorzata Bogdan, Emmanuel J. Candès

01 Oct 2017-Annals of Statistics

TL;DR: It is demonstrated that true features and null features are always interspersed on the Lasso path, and that this phenomenon occurs no matter how strong the effect sizes are.

...read moreread less

Abstract: In regression settings where explanatory variables have very low correlations and there are relatively few effects, each of large magnitude, we expect the Lasso to find the important variables with few errors, if any. This paper shows that in a regime of linear sparsity—meaning that the fraction of variables with a nonvanishing effect tends to a constant, however small—this cannot really be the case, even when the design variables are stochastically independent. We demonstrate that true features and null features are always interspersed on the Lasso path, and that this phenomenon occurs no matter how strong the effect sizes are. We derive a sharp asymptotic trade-off between false and true positive rates or, equivalently, between measures of type I and type II errors along the Lasso path. This trade-off states that if we ever want to achieve a type II error (false negative rate) under a critical value, then anywhere on the Lasso path the type I error (false positive rate) will need to exceed a given threshold so that we can never have both errors at a low level at the same time. Our analysis uses tools from approximate message passing (AMP) theory as well as novel elements to deal with a possibly adaptive selection of the Lasso regularizing parameter.

...read moreread less

181 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

Collapse

Cited by

PDF

Open Access

More filters

Convex Analysisの二,三の進展について

[...]

徹丸山

01 Feb 1977

5,933 citations

Book•

Proximal Algorithms

[...]

Neal Parikh¹, Stephen Boyd¹•Institutions (1)

Stanford University¹

27 Nov 2013

TL;DR: The many different interpretations of proximal operators and algorithms are discussed, their connections to many other topics in optimization and applied mathematics are described, some popular algorithms are surveyed, and a large number of examples of proxiesimal operators that commonly arise in practice are provided.

...read moreread less

Abstract: This monograph is about a class of optimization algorithms called proximal algorithms. Much like Newton's method is a standard tool for solving unconstrained smooth optimization problems of modest size, proximal algorithms can be viewed as an analogous tool for nonsmooth, constrained, large-scale, or distributed versions of these problems. They are very generally applicable, but are especially well-suited to problems of substantial recent interest involving large or high-dimensional datasets. Proximal methods sit at a higher level of abstraction than classical algorithms like Newton's method: the base operation is evaluating the proximal operator of a function, which itself involves solving a small convex optimization problem. These subproblems, which generalize the problem of projecting a point onto a convex set, often admit closed-form solutions or can be solved very quickly with standard or simple specialized methods. Here, we discuss the many different interpretations of proximal operators and algorithms, describe their connections to many other topics in optimization and applied mathematics, survey some popular algorithms, and provide a large number of examples of proximal operators that commonly arise in practice.

...read moreread less

3,627 citations

Journal Article•DOI•

Differential-Difference Equations. By Richard Bellman and Kenneth L. Cooke. Pp. xvi, 465. 1963. (Academic Press)

[...]

G. Temple

01 Oct 1967-The Mathematical Gazette

2,084 citations

Journal Article•DOI•

Proceedings of the National Academy of Sciences

[...]

Omar Bagasra

18 Aug 1998-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: It is shown that the full set of hydromagnetic equations admit five more integrals, besides the energy integral, if dissipative processes are absent, which made it possible to formulate a variational principle for the force-free magnetic fields.

...read moreread less

Abstract: where A represents the magnetic vector potential, is an integral of the hydromagnetic equations. This -integral made it possible to formulate a variational principle for the force-free magnetic fields. The integral expresses the fact that motions cannot transform a given field in an entirely arbitrary different field, if the conductivity of the medium isconsidered infinite. In this paper we shall show that the full set of hydromagnetic equations admit five more integrals, besides the energy integral, if dissipative processes are absent. These integrals, as we shall presently verify, are I2 =fbHvdV, (2)

...read moreread less

1,858 citations

Journal Article•DOI•

Introduction to the Modern Theory of Dynamical Systems

[...]

J.A. Zukas

01 Jan 1998-Shock and Vibration

1,620 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse