Author

Wenlong Mou

Bio: Wenlong Mou is an academic researcher from the University of California, Berkeley. The author has contributed to research in topics including Mathematics and Convex functions. The author has an h-index of 13 and has co-authored 25 publications receiving 413 citations.

Papers
Proceedings ArticleDOI
01 Aug 2017
TL;DR: The RRPSGD (Random Round Private Stochastic Gradient Descent) algorithm is proposed for non-convex but smooth objectives; it provably converges to a stationary point with a privacy guarantee.
Abstract: In this paper, we consider efficient differentially private empirical risk minimization from the viewpoint of optimization algorithms. For strongly convex and smooth objectives, we prove that gradient descent with output perturbation not only achieves nearly optimal utility, but also significantly improves the running time of previous state-of-the-art private optimization algorithms, for both $\epsilon$-DP and $(\epsilon, \delta)$-DP. For non-convex but smooth objectives, we propose an RRPSGD (Random Round Private Stochastic Gradient Descent) algorithm, which provably converges to a stationary point with a privacy guarantee. Besides the expected utility bounds, we also provide guarantees in high-probability form. Experiments demonstrate that our algorithm consistently outperforms existing methods in both utility and running time.

93 citations
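The output-perturbation scheme described above is simple to sketch: run plain gradient descent on the strongly convex, smooth empirical risk, then add Gaussian noise to the final iterate. The sketch below is a minimal illustration, not the paper's exact algorithm: it assumes unit-norm features, an $\ell_2$-sensitivity bound of $2/(n\lambda)$ for $\lambda$-strongly-convex ERM, and the standard Gaussian-mechanism calibration for $(\epsilon, \delta)$-DP; the paper's constants and running-time analysis differ.

```python
import numpy as np

def dp_erm_output_perturbation(X, y, lam=0.1, eps=1.0, delta=1e-5,
                               steps=200, lr=0.1, rng=None):
    """Gradient descent on L2-regularized logistic loss, then output
    perturbation: Gaussian noise added to the final iterate.

    Sketch only: the sensitivity bound 2/(n*lam) assumes unit-norm rows
    of X and lam-strong convexity; constants differ from the paper's.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        # gradient of mean logistic loss (labels in {0,1}) + L2 penalty
        grad = X.T @ (1.0 / (1.0 + np.exp(-(X @ w))) - y) / n + lam * w
        w -= lr * grad
    # l2-sensitivity of the ERM minimizer under a one-record change
    sensitivity = 2.0 / (n * lam)
    # Gaussian mechanism calibration for (eps, delta)-DP
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    return w + rng.normal(0.0, sigma, size=d)

# usage on synthetic unit-norm data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)); X /= np.linalg.norm(X, axis=1, keepdims=True)
y = (X @ rng.normal(size=5) > 0).astype(float)
w_priv = dp_erm_output_perturbation(X, y)
```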

Posted Content
TL;DR: In this article, the authors studied the generalization error of Stochastic Gradient Langevin Dynamics with non-convex objectives and proposed two theories with non-asymptotic discrete-time analysis, using stability and PAC-Bayesian results respectively.
Abstract: Algorithm-dependent generalization error bounds are central to statistical learning theory. A learning algorithm may use a large hypothesis space, but the limited number of iterations controls its model capacity and generalization error. The impact of stochastic gradient methods on generalization error for non-convex learning problems not only has important theoretical consequences, but is also critical to the generalization errors of deep learning. In this paper, we study the generalization errors of Stochastic Gradient Langevin Dynamics (SGLD) with non-convex objectives. Two theories are proposed with non-asymptotic discrete-time analysis, using stability and PAC-Bayesian results respectively. The stability-based theory obtains a bound of $O\left(\frac{1}{n}L\sqrt{\beta T_k}\right)$, where $L$ is the uniform Lipschitz parameter, $\beta$ is the inverse temperature, and $T_k$ is the aggregated step size. For the PAC-Bayesian theory, though the bound has a slower $O(1/\sqrt{n})$ rate, the contribution of each step carries an exponentially decaying factor once $\ell^2$ regularization is imposed, and the uniform Lipschitz constant is replaced by the actual norms of gradients along the trajectory. Our bounds have no implicit dependence on dimensions, norms, or other capacity measures of the parameters, which elegantly characterizes the phenomenon of "Fast Training Guarantees Generalization" in non-convex settings. This is the first algorithm-dependent result with reasonable dependence on aggregated step sizes for non-convex learning, and it has important implications for the statistical learning aspects of stochastic gradient methods in complicated models such as deep learning.

60 citations
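For reference, the SGLD iteration these bounds concern is ordinary SGD plus Gaussian noise scaled by the step size and the inverse temperature $\beta$; the bounds above are then driven by $\beta$ and the aggregated step sizes $T_k = \sum_i \eta_i$. A minimal sketch, with a toy quadratic gradient standing in for a non-convex mini-batch gradient:

```python
import numpy as np

def sgld(grad_fn, theta0, step_sizes, beta=100.0, rng=None):
    """Stochastic Gradient Langevin Dynamics:

        theta_{k+1} = theta_k - eta_k * g_k + sqrt(2 * eta_k / beta) * N(0, I),

    where g_k is a (mini-batch) gradient estimate. The generalization
    bounds above depend on beta and the aggregated step sizes sum(eta_i).
    """
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta0, dtype=float).copy()
    for eta in step_sizes:
        g = grad_fn(theta)
        theta += -eta * g + np.sqrt(2.0 * eta / beta) * rng.normal(size=theta.shape)
    return theta

# usage: noisy descent on f(x) = ||x||^2 / 2, whose gradient is x
theta = sgld(lambda x: x, theta0=np.ones(3), step_sizes=[0.01] * 1000)
```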

Proceedings Article
17 Jul 2017
TL;DR: This work gives differentially private and efficient algorithms achieving strong guarantees for k-means and k-median clustering when $d = \Omega(\mathrm{polylog}(n))$, advancing the state-of-the-art result of $\sqrt{d}\cdot\mathrm{OPT} + \mathrm{poly}(\log n, d, k)$.
Abstract: We study the problem of clustering sensitive data while preserving the privacy of individuals represented in the dataset, which has broad applications in practical machine learning and data analysis tasks. Although the problem has been widely studied in the context of low-dimensional, discrete spaces, much remains unknown concerning private clustering in high-dimensional Euclidean spaces $\mathbb{R}^d$. In this work, we give differentially private and efficient algorithms achieving strong guarantees for k-means and k-median clustering when $d = \Omega(\mathrm{polylog}(n))$. Our algorithm achieves clustering loss at most $\log(n)\cdot\mathrm{OPT} + \mathrm{poly}(\log n, d, k)$, advancing the state-of-the-art result of $\sqrt{d}\cdot\mathrm{OPT} + \mathrm{poly}(\log n, d, k)$. We also study the case where the data points are $s$-sparse and show that the clustering loss can scale logarithmically with $d$, i.e., $\log(n)\cdot\mathrm{OPT} + \mathrm{poly}(\log n, \log d, k, s)$. Experiments on both synthetic and real datasets verify the effectiveness of the proposed method.

55 citations
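The paper's algorithm relies on dimension reduction to polylog-dimensional instances and is more involved than can be shown here. As a baseline illustration of what privatizing a clustering step means mechanically, the sketch below runs one Lloyd (k-means) iteration with the Gaussian mechanism applied to each cluster's point-sum and count. This is a standard textbook construction under an assumed norm bound $\|x\| \le 1$, not the paper's method, and the noise scale `sigma` is left as a free parameter rather than calibrated to a specific $(\epsilon, \delta)$.

```python
import numpy as np

def dp_lloyd_step(X, centers, sigma, rng=None):
    """One Lloyd iteration for k-means, privatized with the Gaussian
    mechanism: noise added to each cluster's point-sum and point-count.

    Baseline construction only (assumes ||x|| <= 1 so each sum has
    bounded sensitivity); NOT the paper's dimension-reduction algorithm.
    """
    rng = np.random.default_rng(rng)
    k, d = centers.shape
    # assign each point to its nearest center
    labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
    new_centers = centers.copy()
    for j in range(k):
        pts = X[labels == j]
        noisy_sum = pts.sum(axis=0) + rng.normal(0.0, sigma, size=d)
        noisy_cnt = max(len(pts) + rng.normal(0.0, sigma), 1.0)
        new_centers[j] = noisy_sum / noisy_cnt
    return new_centers
```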

Posted Content
TL;DR: An RRPSGD (Random Round Private Stochastic Gradient Descent) algorithm is proposed that provably converges to a stationary point with a privacy guarantee and consistently outperforms existing methods in both utility and running time.
Abstract: In this paper, we consider efficient differentially private empirical risk minimization from the viewpoint of optimization algorithms. For strongly convex and smooth objectives, we prove that gradient descent with output perturbation not only achieves nearly optimal utility, but also significantly improves the running time of previous state-of-the-art private optimization algorithms, for both $\epsilon$-DP and $(\epsilon, \delta)$-DP. For non-convex but smooth objectives, we propose an RRPSGD (Random Round Private Stochastic Gradient Descent) algorithm, which provably converges to a stationary point with a privacy guarantee. Besides the expected utility bounds, we also provide guarantees in high-probability form. Experiments demonstrate that our algorithm consistently outperforms existing methods in both utility and running time.

48 citations
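The "random round" in RRPSGD suggests the usual device for extracting stationary-point guarantees from non-convex SGD: perturb each gradient with Gaussian noise (the source of the privacy guarantee) and output the iterate from a uniformly random round. The sketch below is a plausible reading along those lines, with the paper's privacy accounting and noise calibration omitted; treat every constant here as an assumption.

```python
import numpy as np

def rrpsgd_sketch(grad_fn, theta0, T=1000, lr=0.05, noise_std=0.1, rng=None):
    """Sketch of one plausible reading of Random Round Private SGD for
    non-convex objectives: SGD with Gaussian gradient perturbation,
    returning the iterate at a uniformly random round (the standard
    device for stationary-point guarantees). The paper's exact noise
    calibration and privacy accounting are omitted.
    """
    rng = np.random.default_rng(rng)
    theta = np.asarray(theta0, dtype=float).copy()
    stop = rng.integers(1, T + 1)   # uniformly random output round in 1..T
    for t in range(1, T + 1):
        g = grad_fn(theta) + rng.normal(0.0, noise_std, size=theta.shape)
        theta -= lr * g
        if t == stop:
            out = theta.copy()
    return out
```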

Posted Content
TL;DR: An improved analysis of the Euler-Maruyama discretization of the Langevin diffusion is presented; it does not require global contractivity, yields polynomial dependence on the time horizon, and simultaneously improves all methods based on Dalalyan's approach.
Abstract: We present an improved analysis of the Euler-Maruyama discretization of the Langevin diffusion. Our analysis does not require global contractivity, and yields polynomial dependence on the time horizon. Compared to existing approaches, we make an additional smoothness assumption, and improve the existing rate from $O(\eta)$ to $O(\eta^2)$ in terms of the KL divergence. This result matches the correct order for numerical SDEs, without suffering from exponential time dependence. When applied to algorithms for sampling and learning, this result simultaneously improves all those methods based on Dalalyan's approach.

40 citations
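The scheme being analyzed is the standard Euler-Maruyama discretization of the Langevin diffusion $dX_t = -\nabla U(X_t)\,dt + \sqrt{2}\,dB_t$, which iterates $x_{k+1} = x_k - \eta \nabla U(x_k) + \sqrt{2\eta}\,\xi_k$ with $\xi_k \sim N(0, I)$. A minimal sketch:

```python
import numpy as np

def euler_maruyama_langevin(grad_U, x0, eta, n_steps, rng=None):
    """Euler-Maruyama discretization of dX = -grad U(X) dt + sqrt(2) dB:

        x_{k+1} = x_k - eta * grad_U(x_k) + sqrt(2 * eta) * N(0, I).

    The paper's improved analysis bounds the KL divergence between the
    law of the iterates and the diffusion at O(eta^2) under an extra
    smoothness assumption.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_steps):
        x += -eta * grad_U(x) + np.sqrt(2.0 * eta) * rng.normal(size=x.shape)
    return x

# usage: sample from exp(-||x||^2 / 2), i.e. U(x) = ||x||^2 / 2
sample = euler_maruyama_langevin(lambda x: x, np.zeros(2), eta=0.01, n_steps=5000)
```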


Cited by
Book
01 Jan 1991
TL;DR: The third edition of this text presents the theory of linear systems together with the local, global, and bifurcation theory of nonlinear systems.
Abstract: Series Preface * Preface to the Third Edition * 1 Linear Systems * 2 Nonlinear Systems: Local Theory * 3 Nonlinear Systems: Global Theory * 4 Nonlinear Systems: Bifurcation Theory * References * Index

1,977 citations

Proceedings Article
24 May 2019
TL;DR: This paper shows that gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet), and extends the analysis to deep residual convolutional neural networks, obtaining a similar convergence result.
Abstract: Gradient descent finds a global minimum in training deep neural networks despite the objective function being non-convex. The current paper proves gradient descent achieves zero training loss in polynomial time for a deep over-parameterized neural network with residual connections (ResNet). Our analysis relies on the particular structure of the Gram matrix induced by the neural network architecture. This structure allows us to show the Gram matrix is stable throughout the training process and this stability implies the global optimality of the gradient descent algorithm. We further extend our analysis to deep residual convolutional neural networks and obtain a similar convergence result.

626 citations
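A toy numerical illustration of the phenomenon the paper proves: full-batch gradient descent on a sufficiently wide network drives the training loss toward zero despite non-convexity. The sketch below uses a two-layer ReLU network with a fixed random output layer, far simpler than the paper's deep-ResNet setting; every hyperparameter is an arbitrary choice for demonstration, not taken from the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def train_overparam_net(X, y, width=2048, steps=500, lr=0.5, rng=None):
    """Full-batch gradient descent on a wide two-layer ReLU network with
    the output layer fixed at random signs (as in these analyses); only
    the first-layer weights W are trained. Returns the final training
    loss, which shrinks toward zero in the over-parameterized regime.
    """
    rng = np.random.default_rng(rng)
    n, d = X.shape
    W = rng.normal(size=(d, width)) / np.sqrt(d)               # trained layer
    a = rng.choice([-1.0, 1.0], size=width) / np.sqrt(width)   # fixed outputs
    for _ in range(steps):
        H = relu(X @ W)                 # hidden activations, shape (n, width)
        resid = H @ a - y               # prediction residuals
        # gradient of 0.5 * ||H a - y||^2 / n with respect to W
        G = X.T @ (np.outer(resid, a) * (X @ W > 0)) / n
        W -= lr * G
    return 0.5 * np.mean((relu(X @ W) @ a - y) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10)); X /= np.linalg.norm(X, axis=1, keepdims=True)
y = rng.normal(size=50)
print(train_overparam_net(X, y))  # training loss shrinks toward 0
```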

Posted Content
TL;DR: This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: a tighter characterization of training speed, an explanation for why training a neural net with random labels leads to slower training, and a data-dependent complexity measure.
Abstract: Recent works have cast some light on the mystery of why deep nets fit any data and generalize despite being very overparametrized. This paper analyzes training and generalization for a simple 2-layer ReLU net with random initialization, and provides the following improvements over recent works: (i) Using a tighter characterization of training speed than recent papers, an explanation for why training a neural net with random labels leads to slower training, as originally observed in [Zhang et al. ICLR'17]. (ii) Generalization bound independent of network size, using a data-dependent complexity measure. Our measure distinguishes clearly between random labels and true labels on MNIST and CIFAR, as shown by experiments. Moreover, recent papers require sample complexity to increase (slowly) with the size, while our sample complexity is completely independent of the network size. (iii) Learnability of a broad class of smooth functions by 2-layer ReLU nets trained via gradient descent. The key idea is to track dynamics of training and generalization via properties of a related kernel.

476 citations
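The "related kernel" referenced above has a closed form for a two-layer ReLU net with Gaussian first-layer weights: the Gram matrix has entries $H^\infty_{ij} = \langle x_i, x_j\rangle (\pi - \theta_{ij})/(2\pi)$, where $\theta_{ij}$ is the angle between $x_i$ and $x_j$. A minimal sketch of computing it, together with the associated data-dependent quantity $y^\top (H^\infty)^{-1} y$ (treated here, up to constants, as the complexity measure; the exact normalization is an assumption of this sketch):

```python
import numpy as np

def ntk_gram_relu(X):
    """Gram matrix H_inf for a two-layer ReLU net with random Gaussian
    first-layer weights (closed form used in this line of work):

        H_ij = <x_i, x_j> * (pi - theta_ij) / (2 * pi),

    where theta_ij is the angle between x_i and x_j. Rows of X are
    assumed unit-norm.
    """
    G = X @ X.T
    theta = np.arccos(np.clip(G, -1.0, 1.0))
    return G * (np.pi - theta) / (2.0 * np.pi)

# usage: compute the kernel and the data-dependent complexity quantity
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5)); X /= np.linalg.norm(X, axis=1, keepdims=True)
H = ntk_gram_relu(X)
y = rng.choice([-1.0, 1.0], size=20)
complexity = float(y @ np.linalg.solve(H, y))  # y^T H^{-1} y, up to constants
```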