Pratik Chaudhari
Researcher at University of Pennsylvania
Publications - 84
Citations - 2435
Pratik Chaudhari is an academic researcher at the University of Pennsylvania. He has contributed to research in topics including computer science and stochastic gradient descent. He has an h-index of 18 and has co-authored 65 publications receiving 1,730 citations. Previous affiliations of Pratik Chaudhari include the Indian Institute of Technology Bombay and the Massachusetts Institute of Technology.
Papers
Posted Content
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina
TL;DR: This paper proposes Entropy-SGD, a new optimization algorithm for training deep neural networks that is motivated by the local geometry of the energy landscape and compares favorably to state-of-the-art techniques in generalization error and training time.
Proceedings Article
A Baseline for Few-Shot Image Classification
TL;DR: This work performs extensive studies on benchmark datasets to propose a metric that quantifies the "hardness" of a few-shot episode, and finds that using a large number of meta-training classes results in high few-shot accuracies even for a large number of few-shot classes.
Proceedings Article
Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks
Pratik Chaudhari, Stefano Soatto +1 more
TL;DR: The authors showed that SGD does not converge in the classical sense: the most likely trajectories of SGD for deep networks do not behave like Brownian motion around critical points, instead, they resemble closed loops with deterministic components.
Proceedings Article
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Pratik Chaudhari, Anna Choromanska, Stefano Soatto, Yann LeCun, Carlo Baldassi, Christian Borgs, Jennifer Chayes, Levent Sagun, Riccardo Zecchina
TL;DR: In this article, a local-entropy-based objective function is proposed for training deep neural networks that is motivated by the local geometry of the energy landscape, where the gradient of the local entropy is computed before each update of the weights.
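The mechanism the TL;DR describes can be illustrated with a minimal NumPy sketch on a toy quadratic loss. This is not the authors' implementation: the function and parameter names (`entropy_sgd_step`, `sgld_lr`, `noise`) are hypothetical, and the inner stochastic-gradient-Langevin loop that estimates the local-entropy gradient is simplified to full-batch gradients.

```python
import numpy as np

def loss_grad(w):
    # Toy quadratic loss f(w) = 0.5 * ||w||^2 with gradient w,
    # standing in for a minibatch loss gradient.
    return w

def entropy_sgd_step(w, eta=0.5, gamma=0.1, inner_steps=20,
                     sgld_lr=0.1, noise=1e-4, rng=None):
    """One sketch of an Entropy-SGD step (hypothetical parameter names).

    An inner Langevin loop samples from the local Gibbs measure
    proportional to exp(-f(x') - gamma/2 * ||w - x'||^2) and tracks its
    mean mu; the outer update w <- w - eta * gamma * (w - mu) is then a
    gradient step on the negative local entropy, computed before each
    update of the weights.
    """
    rng = rng or np.random.default_rng(0)
    x, mu = w.copy(), w.copy()
    alpha = 0.75  # exponential-moving-average weight for the mean estimate
    for _ in range(inner_steps):
        g = loss_grad(x) + gamma * (x - w)   # gradient of the gamma-modified loss
        x = x - sgld_lr * g + noise * np.sqrt(sgld_lr) * rng.standard_normal(w.shape)
        mu = (1 - alpha) * mu + alpha * x    # running estimate of the Gibbs mean
    return w - eta * gamma * (w - mu)        # local-entropy gradient step

w = np.ones(5)
for _ in range(100):
    w = entropy_sgd_step(w)
print(np.linalg.norm(w))  # the iterate is drawn toward the minimum at 0
```

On this quadratic the local-entropy gradient simply points toward the minimizer; the interesting behavior in the paper arises on non-convex deep-network landscapes, where the smoothing induced by `gamma` biases the iterates toward wide valleys.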
Posted Content
Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
Pratik Chaudhari, Stefano Soatto +1 more
TL;DR: This paper showed that the most likely trajectories of SGD for deep networks do not behave like Brownian motion around critical points; instead, they resemble closed loops with deterministic components. It showed that such "out-of-equilibrium" behavior is a consequence of highly non-isotropic gradient noise in SGD.