Haochuan Li

Researcher at Massachusetts Institute of Technology

Publications - 16
Citations - 794

Haochuan Li is an academic researcher from the Massachusetts Institute of Technology. The author has contributed to research topics including computer science and artificial neural networks, has an h-index of 4, and has co-authored 7 publications receiving 742 citations. Previous affiliations of Haochuan Li include Peking University.

Papers
Proceedings Article

Gradient descent finds global minima of deep neural networks

TL;DR: This paper shows that gradient descent achieves zero training loss in polynomial time for deep over-parameterized neural networks with residual connections (ResNets), and extends the analysis to deep residual convolutional neural networks, obtaining a similar convergence result.
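
To make the regime concrete, here is a minimal sketch of full-batch gradient descent on a wide two-layer ReLU network, where the width far exceeds the sample count; it illustrates the over-parameterized setting the result describes, not the paper's construction. The architecture, sizes, and learning rate are all hypothetical choices for the demo.

```python
import numpy as np

# Minimal sketch (assumed setting, not the paper's construction): full-batch
# gradient descent on a wide two-layer ReLU net, width m >> n samples.
rng = np.random.default_rng(0)
n, d, m = 20, 10, 4096                 # hypothetical sizes for the demo

X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))        # hidden weights (trained)
a = rng.choice([-1.0, 1.0], size=m)    # output weights (fixed)

lr = 0.1                               # hypothetical step size
for step in range(1000):
    pre = X @ W.T                      # (n, m) pre-activations
    out = np.maximum(pre, 0.0) @ a / np.sqrt(m)
    err = out - y                      # per-sample residual
    # Gradient of 0.5 * sum(err**2) w.r.t. W through the ReLU mask.
    mask = (pre > 0.0).astype(float)
    grad = ((mask * err[:, None]) * a).T @ X / np.sqrt(m)
    W -= lr * grad
    if step % 200 == 0:
        print(step, 0.5 * float(err @ err))
```

Run as written, the printed training loss shrinks toward zero, the qualitative behavior the theorem makes precise for deep networks.
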
Posted Content

Convergence of Adversarial Training in Overparametrized Neural Networks

TL;DR: This paper provides a partial answer to the success of adversarial training, by showing that it converges to a network where the surrogate loss with respect to the attack algorithm is within $\epsilon$ of the optimal robust loss.
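
For context, adversarial training alternates an inner maximization (an attack perturbing the inputs) with an outer minimization of the surrogate loss at those perturbed inputs. Below is a minimal sketch under assumed choices: a logistic-regression model, an l_inf PGD attack, and hypothetical hyperparameters; it is the generic template, not the paper's exact algorithm or setting.

```python
import numpy as np

# Adversarial-training sketch (assumed setup, not the paper's algorithm):
# outer gradient descent on the surrogate loss at adversarial inputs found
# by an inner l_inf PGD attack. Logistic regression keeps every gradient
# in closed form.
rng = np.random.default_rng(1)
n, d = 200, 20
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))       # labels in {-1, +1}

def loss_and_grads(w, X, y):
    margins = y * (X @ w)
    p = np.exp(-np.logaddexp(0.0, margins))   # sigmoid(-margin), stable
    loss = np.logaddexp(0.0, -margins).mean() # logistic surrogate loss
    grad_w = -(p * y) @ X / len(y)            # gradient w.r.t. weights
    grad_X = -(p * y)[:, None] * w            # per-sample input gradients
    return loss, grad_w, grad_X

eps, alpha, pgd_steps, lr = 0.1, 0.05, 5, 0.5 # hypothetical hyperparameters
w = np.zeros(d)
for epoch in range(200):
    # Inner maximization: projected gradient ascent in the l_inf ball.
    X_adv = X.copy()
    for _ in range(pgd_steps):
        _, _, gX = loss_and_grads(w, X_adv, y)
        X_adv = np.clip(X_adv + alpha * np.sign(gX), X - eps, X + eps)
    # Outer minimization: descend the adversarial surrogate loss.
    loss, gw, _ = loss_and_grads(w, X_adv, y)
    w -= lr * gw
```

The paper's question is about the outer loop: whether this min-max procedure, with the loss defined by the attack, converges near the optimal robust loss in the over-parameterized regime.
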
Journal Article

Variance-reduced Clipping for Non-convex Optimization

TL;DR: In this article, the authors employ a variance-reduction technique, namely SPIDER, and demonstrate that for a carefully designed learning rate, the stochastic gradient complexity is improved to $O(\epsilon^{-3})$, which is order-optimal.
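
To make the estimator concrete, here is a sketch of a SPIDER-style recursive variance-reduced gradient estimator combined with a clipped update, on a toy nonconvex finite-sum problem. The objective, hyperparameters, and the particular clipping rule are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Sketch of a SPIDER-style recursive variance-reduced estimator with a
# clipped update, on a toy nonconvex finite sum. Objective, constants, and
# the exact clipping rule are illustrative assumptions, not the paper's.
rng = np.random.default_rng(2)
N, d = 1000, 30
A = rng.standard_normal((N, d))
b = rng.standard_normal(N)

def grad_batch(x, idx):
    # Mean gradient over idx of f_i(x) = log(1 + (a_i.x - b_i)^2), nonconvex.
    r = A[idx] @ x - b[idx]
    return ((2.0 * r / (1.0 + r ** 2))[:, None] * A[idx]).mean(axis=0)

eta, gamma, q, batch = 0.05, 1.0, 50, 32      # hypothetical hyperparameters
x = rng.standard_normal(d)
v = grad_batch(x, np.arange(N))               # full gradient at the anchor
for t in range(1, 2000):
    # Clipped step: scale the estimator down when its norm exceeds gamma.
    x_new = x - eta * v * min(1.0, gamma / (np.linalg.norm(v) + 1e-12))
    if t % q == 0:
        v = grad_batch(x_new, np.arange(N))   # periodic full-gradient refresh
    else:
        idx = rng.integers(0, N, size=batch)  # shared minibatch, both terms
        v += grad_batch(x_new, idx) - grad_batch(x, idx)
    x = x_new
print(np.linalg.norm(grad_batch(x, np.arange(N))))  # final gradient norm
```

The recursive correction keeps the estimator's variance small between full-gradient refreshes, which is the mechanism behind the improved $O(\epsilon^{-3})$ complexity the TL;DR cites.
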