Xiyu Zhai

Researcher at Massachusetts Institute of Technology

Publications: 9
Citations: 2060

Xiyu Zhai is an academic researcher from Massachusetts Institute of Technology. The author has contributed to research in topics including Artificial neural network and Convolutional neural network, has an h-index of 9, and has co-authored 9 publications receiving 1766 citations.

Papers
Posted Content

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

TL;DR: This article showed that gradient descent converges at a global linear rate to the global optimum for two-layer fully connected ReLU-activated neural networks, where over-parameterization and random initialization jointly restrict every weight vector to stay close to its initialization for all iterations.
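
As a rough illustration of the regime the paper studies, the sketch below trains a two-layer, heavily over-parameterized ReLU network with full-batch gradient descent on tiny random data; the width, step size, and data here are illustrative assumptions, not the constants required by the paper's theorem.

```python
# Minimal sketch of the over-parameterized two-layer setting (illustrative only:
# width m, step size eta, and the random data are assumptions, not the paper's constants).
import numpy as np

rng = np.random.default_rng(0)

n, d, m = 20, 5, 2000                            # n samples, input dim d, width m >> n
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)    # unit-norm inputs
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))                  # random Gaussian first-layer weights (trained)
a = rng.choice([-1.0, 1.0], size=m)              # second-layer signs (kept fixed)

def predict(W):
    # f(x_i) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x_i)
    return (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(m)

eta = 0.1
for step in range(1000):
    resid = predict(W) - y                       # residual u(k) - y
    act = (X @ W.T > 0).astype(float)            # ReLU activation pattern, n x m
    # gradient of 0.5 * ||u - y||^2 with respect to W
    grad = ((act * (resid[:, None] * a[None, :])).T @ X) / np.sqrt(m)
    W -= eta * grad
    if step % 200 == 0:
        print(step, 0.5 * np.sum(resid ** 2))    # training loss shrinks over iterations
```
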
Proceedings Article

Gradient descent finds global minima of deep neural networks

TL;DR: This paper showed that gradient descent achieves zero training loss in polynomial time for deep over-parameterized neural networks with residual connections (ResNet), and further extended the analysis to deep residual convolutional neural networks, obtaining a similar convergence result.
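
Schematically, the convergence guarantees in this line of work take the following form (the exact constants, width requirements, and the definition of the Gram matrix differ between the two-layer and deep/ResNet cases):

$$\|u(k) - y\|_2^2 \;\le\; \Big(1 - \tfrac{\eta \lambda_0}{2}\Big)^{k}\, \|u(0) - y\|_2^2,$$

where $u(k)$ is the vector of network predictions after $k$ gradient-descent steps, $y$ is the label vector, $\eta$ is the step size, and $\lambda_0 > 0$ is the least eigenvalue of a data-dependent Gram (kernel) matrix, provided the network width is polynomially large in the number of samples and $1/\lambda_0$.
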
Proceedings Article

Gradient Descent Provably Optimizes Over-parameterized Neural Networks

TL;DR: The authors showed that gradient descent converges at a global linear rate to the global optimum for two-layer fully connected ReLU-activated neural networks, where over-parameterization and random initialization jointly restrict every weight vector to stay close to its initialization for all iterations.
Posted Content

Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints

TL;DR: In this article, the authors studied the generalization error of stochastic gradient Langevin dynamics (SGLD) with non-convex objectives and proposed two theories with non-asymptotic, discrete-time analyses, using stability and PAC-Bayesian results respectively.
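
For reference, the update rule the analysis concerns is plain stochastic gradient Langevin dynamics; the sketch below is a generic implementation on a toy non-convex objective (the objective, step size, inverse temperature, and minibatch size are all illustrative assumptions, not from the paper).

```python
# Generic SGLD sketch: a stochastic gradient step plus isotropic Gaussian noise
# scaled by sqrt(2*eta/beta). The toy objective and hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

d, n = 10, 200
X = rng.standard_normal((n, d))
y = (X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n) > 0).astype(float)

def stochastic_grad(w, idx):
    # minibatch gradient of the non-convex loss mean_i (sigmoid(<x_i, w>) - y_i)^2
    p = 1.0 / (1.0 + np.exp(-(X[idx] @ w)))
    return ((2.0 * (p - y[idx]) * p * (1.0 - p))[:, None] * X[idx]).mean(axis=0)

w = np.zeros(d)
eta, beta = 0.05, 100.0                          # step size and inverse temperature
for t in range(2000):
    idx = rng.choice(n, size=32, replace=False)  # sample a random minibatch
    noise = rng.standard_normal(d)
    # SGLD update: gradient step plus injected Gaussian noise
    w = w - eta * stochastic_grad(w, idx) + np.sqrt(2.0 * eta / beta) * noise
```
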
Posted Content

How Many Samples are Needed to Learn a Convolutional Neural Network

TL;DR: It is shown that for learning an $m$-dimensional convolutional filter with linear activation acting on a $d$-dimensional input, the sample complexity of achieving population prediction error of $\epsilon$ is $\widetilde{O}(m/\epsilon^2)$, whereas its FNN counterpart needs at least $\Omega(d/\epsilon)$ samples.
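
To make the gap concrete, plugging illustrative numbers into the two bounds (a $3 \times 3$ filter, so $m = 9$, on a $d = 1024$-dimensional input, with target error $\epsilon = 0.1$; these numbers are assumptions for illustration, not from the paper) gives

$$\widetilde{O}\!\left(\frac{m}{\epsilon^2}\right) = \widetilde{O}\!\left(\frac{9}{0.01}\right) = \widetilde{O}(900) \;\text{ for the CNN,} \qquad \Omega\!\left(\frac{d}{\epsilon}\right) = \Omega\!\left(\frac{1024}{0.1}\right) = \Omega(10240) \;\text{ for the FNN,}$$

so the convolutional parameterization's sample requirement scales with the filter size $m$ rather than the input dimension $d$.
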