Zhao Song
Researcher at Princeton University
Publications - 144
Citations - 4661
Zhao Song is an academic researcher at Princeton University whose work spans computer science and matrix theory. He has an h-index of 27 and has co-authored 124 publications receiving 3,106 citations. His previous affiliations include the University of Texas at Austin and Harvard University.
Papers
Posted Content
A Convergence Theory for Deep Learning via Over-Parameterization
TL;DR: This work proves that stochastic gradient descent can find global minima of the training objective of DNNs in polynomial time, and implies an equivalence between over-parameterized neural networks and the neural tangent kernel (NTK) in the finite (and polynomial) width setting.
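The phenomenon the TL;DR describes is easy to observe numerically. Below is a minimal NumPy sketch (not the paper's proof technique): a heavily over-parameterized two-layer ReLU network whose training loss is driven toward zero by plain gradient descent, as the convergence theory predicts for sufficiently wide networks on non-degenerate inputs. All sizes, the fixed-output-layer setup, and the 1/√m scaling are illustrative choices.

```python
import numpy as np

# Over-parameterized two-layer ReLU network trained by gradient descent.
rng = np.random.default_rng(0)
n, d, m = 10, 5, 2000                    # n samples, input dim d, width m >> n
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm (non-degenerate) inputs
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))          # hidden weights (trained)
a = rng.choice([-1.0, 1.0], size=m)      # output weights (fixed, NTK-style)

def loss(W):
    pred = np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)
    return 0.5 * np.mean((pred - y) ** 2)

init_loss, lr = loss(W), 1.0
for _ in range(1000):
    H = X @ W.T
    err = (np.maximum(H, 0.0) @ a / np.sqrt(m) - y) / n   # dL/dpred
    G = (H > 0) * np.outer(err, a / np.sqrt(m))           # dL/d(pre-activation)
    W -= lr * (G.T @ X)                                   # chain rule into W

final_loss = loss(W)
print(f"training loss: {init_loss:.4f} -> {final_loss:.6f}")
```

At this width the hidden weights barely move during training, which is exactly the regime in which the network's dynamics are governed by the (finite-width) neural tangent kernel.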
Proceedings Article
Towards Fast Computation of Certified Robustness for ReLU Networks
Tsui-Wei Weng, Huan Zhang, Hongge Chen, Zhao Song, Cho-Jui Hsieh, Duane S. Boning, Inderjit S. Dhillon, Luca Daniel
TL;DR: In this paper, the authors exploit the special structure of ReLU networks to provide two computationally efficient algorithms, Fast-Lin and Fast-Lip, that certify non-trivial lower bounds on the minimum adversarial distortion by bounding the ReLU units with appropriate linear functions.
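The core primitive is replacing each ReLU with linear upper and lower bounds that are valid on its pre-activation interval, so bounds can be propagated through the network in closed form. A minimal sketch of that relaxation (it mirrors the Fast-Lin style of bound; the paper's exact parameter choices may differ):

```python
def relu_linear_bounds(l, u):
    """Linear bounds on ReLU(z) over a pre-activation interval [l, u].

    Returns (slope_lo, intercept_lo, slope_hi, intercept_hi) such that
    slope_lo*z + intercept_lo <= max(z, 0) <= slope_hi*z + intercept_hi
    for all z in [l, u].
    """
    if u <= 0:                       # ReLU is identically zero on [l, u]
        return 0.0, 0.0, 0.0, 0.0
    if l >= 0:                       # ReLU is the identity on [l, u]
        return 1.0, 0.0, 1.0, 0.0
    s = u / (u - l)                  # chord slope through (l, 0) and (u, u)
    # Upper bound: the chord s*(z - l).  Lower bound: the parallel line
    # through the origin (slope s, intercept 0), as in Fast-Lin.
    return s, 0.0, s, -s * l
```

Only the "unstable" case l < 0 < u incurs any looseness; the other two cases are exact, which is what makes the certificate cheap to compute layer by layer.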
Proceedings Article
A Convergence Theory for Deep Learning via Over-Parameterization
TL;DR: In this paper, the authors present a new theory to understand the convergence of training DNNs, where they make two assumptions: the inputs do not degenerate and the network is over-parameterized.
Posted Content
Solving Linear Programs in the Current Matrix Multiplication Time
TL;DR: This paper shows how to solve linear programs of the form min_{Ax=b, x≥0} c⊤x with n variables in time O*((n^ω + n^{2.5−α/2} + n^{2+1/6}) log(n/δ)), where ω is the exponent of matrix multiplication, α is the dual exponent of matrix multiplication, and δ is the relative accuracy.
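The standard form min_{Ax=b, x≥0} c⊤x maps directly onto off-the-shelf LP solvers. A small hedged illustration of that problem family using SciPy's `linprog` (a generic solver, not the paper's interior-point method):

```python
import numpy as np
from scipy.optimize import linprog

# A tiny standard-form instance:  min c^T x  s.t.  Ax = b, x >= 0.
# linprog's default variable bounds (0, None) already enforce x >= 0.
c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])   # single equality constraint: x1+x2+x3 = 1
b = np.array([1.0])

res = linprog(c, A_eq=A, b_eq=b, method="highs")
print(res.x, res.fun)             # all mass goes to the zero-cost coordinate
```

The optimum here is x = (0, 0, 1) with objective 0, since the third coordinate is cost-free and the simplex constraint must be met.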
Posted Content
Recovery Guarantees for One-hidden-layer Neural Networks
TL;DR: This work distills properties of activation functions that lead to local strong convexity in the neighborhood of the ground-truth parameters for the one-hidden-layer-network squared-loss objective, and provides recovery guarantees for such networks with both sample complexity and computational complexity linear in the input dimension and logarithmic in the precision.
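Local strong convexity around the planted parameters implies that gradient descent warm-started near the ground truth recovers it. A toy NumPy illustration of that behavior (not the paper's algorithm; sizes, step size, and the noiseless planted model are arbitrary choices):

```python
import numpy as np

# Planted one-hidden-layer ReLU network:  y = sum_j relu(w_j* . x).
rng = np.random.default_rng(1)
d, k, n = 4, 2, 500
W_star = rng.standard_normal((k, d))               # ground-truth parameters
X = rng.standard_normal((n, d))
y = np.maximum(X @ W_star.T, 0.0).sum(axis=1)      # noiseless labels

W = W_star + 0.1 * rng.standard_normal((k, d))     # warm start near W*
err0 = np.linalg.norm(W - W_star)
lr = 0.1
for _ in range(500):
    H = X @ W.T
    resid = np.maximum(H, 0.0).sum(axis=1) - y
    grad = ((H > 0) * resid[:, None]).T @ X / n    # squared-loss gradient
    W -= lr * grad

err = np.linalg.norm(W - W_star)
print(f"parameter error: {err0:.3f} -> {err:.6f}")
```

Because the objective is locally strongly convex around W*, the iterates contract toward the planted weights instead of drifting to a spurious stationary point.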