Pratik Chaudhari

Researcher at University of Pennsylvania

Publications - 84
Citations - 2435

Pratik Chaudhari is an academic researcher at the University of Pennsylvania. His research spans topics including computer science and stochastic gradient descent. He has an h-index of 18 and has co-authored 65 publications that have received 1730 citations. His previous affiliations include the Indian Institute of Technology Bombay and the Massachusetts Institute of Technology.

Papers
Posted Content

Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

TL;DR: This paper proposes Entropy-SGD, a new optimization algorithm for training deep neural networks that is motivated by the local geometry of the energy landscape; it compares favorably to state-of-the-art techniques in terms of generalization error and training time.
Proceedings Article

A Baseline for Few-Shot Image Classification

TL;DR: This work performs extensive studies on benchmark datasets to propose a metric that quantifies the "hardness" of a few-shot episode, and finds that using a large number of meta-training classes results in high few-shot accuracies even for a large number of few-shot classes.
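As a concrete illustration of the episodic few-shot setting this paper evaluates, here is a minimal sketch of N-way K-shot evaluation using a nearest-centroid classifier over precomputed embeddings. The function names, the nearest-centroid baseline, and all sizes are illustrative assumptions, not the paper's method or hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_episode(features, labels, n_way=5, k_shot=1, n_query=15):
    # Draw an N-way K-shot episode from precomputed embeddings.
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for new_label, c in enumerate(classes):
        idx = rng.permutation(np.flatnonzero(labels == c))
        support += [(features[i], new_label) for i in idx[:k_shot]]
        query += [(features[i], new_label) for i in idx[k_shot:k_shot + n_query]]
    return support, query

def nearest_centroid_accuracy(support, query):
    # Classify each query embedding by its nearest class centroid.
    xs = np.stack([x for x, _ in support])
    ys = np.array([y for _, y in support])
    centroids = np.stack([xs[ys == c].mean(axis=0) for c in np.unique(ys)])
    hits = sum(int(np.argmin(np.linalg.norm(centroids - x, axis=1)) == y)
               for x, y in query)
    return hits / len(query)

# Toy usage: random "embeddings" for 20 classes, 30 examples each.
feats = rng.standard_normal((600, 64))
labs = np.repeat(np.arange(20), 30)
s, q = sample_episode(feats, labs)
print(nearest_centroid_accuracy(s, q))
```

With informative embeddings from a pre-trained backbone, the same loop measures few-shot accuracy; with the random features above it sits near chance.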
Proceedings Article

Stochastic Gradient Descent Performs Variational Inference, Converges to Limit Cycles for Deep Networks

TL;DR: The authors showed that SGD does not converge in the classical sense: the most likely trajectories of SGD for deep networks do not behave like Brownian motion around critical points; instead, they resemble closed loops with deterministic components.
Proceedings Article

Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

TL;DR: In this article, a local-entropy-based objective function, motivated by the local geometry of the energy landscape, is proposed for training deep neural networks; the gradient of the local entropy is computed before each update of the weights.
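To make that update concrete, the following is a minimal sketch of one Entropy-SGD step: a short SGLD inner loop estimates the mean <x'> of the local Gibbs measure, and the weights then descend along gamma * (x - <x'>), the gradient of the negative local entropy. The hyperparameter names and defaults are illustrative assumptions rather than the paper's settings.

```python
import numpy as np

def entropy_sgd_step(x, grad_f, gamma=1e-3, eta=0.1, eta_inner=0.1,
                     n_inner=20, eps=1e-4, rng=None):
    # One Entropy-SGD step (sketch). Local entropy:
    #   F(x) = log \int exp(-f(x') - gamma/2 ||x - x'||^2) dx'
    # Its gradient is gamma * (<x'> - x), where <x'> is the mean of
    # the Gibbs measure above, estimated here by a short SGLD loop.
    if rng is None:
        rng = np.random.default_rng()
    x_prime = x.copy()
    mu = x.copy()  # running average approximating <x'>
    for _ in range(n_inner):
        # SGLD on the modified loss f(x') + gamma/2 ||x - x'||^2
        g = grad_f(x_prime) + gamma * (x_prime - x)
        noise = np.sqrt(eta_inner) * eps * rng.standard_normal(x.shape)
        x_prime = x_prime - eta_inner * g + noise
        mu = 0.75 * mu + 0.25 * x_prime  # exponential moving average
    # Maximize local entropy: step along -gamma * (x - <x'>)
    return x - eta * gamma * (x - mu)
```

In the paper's language, the inner loop "scopes" the energy landscape; here a generic grad_f callable stands in for the minibatch gradients an actual implementation would use.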
Posted Content

Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks

TL;DR: This paper showed that the most likely trajectories of SGD for deep networks do not behave like Brownian motion around critical points; instead, they resemble closed loops with deterministic components. It showed that such "out-of-equilibrium" behavior is a consequence of the highly non-isotropic gradient noise in SGD.
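The non-isotropy of the gradient noise is easy to probe empirically, even on a toy problem. Below is a small sketch (the least-squares model, sizes, and seed are assumptions for illustration, not the paper's experiments) that estimates the minibatch gradient-noise covariance and reports its eigenvalue spread.

```python
import numpy as np

# Estimate the minibatch gradient-noise covariance for a toy
# least-squares problem and inspect how anisotropic it is.
rng = np.random.default_rng(0)
n, d, b = 2000, 10, 32          # samples, dimensions, minibatch size
A = rng.standard_normal((n, d)) * np.linspace(0.1, 3.0, d)  # skewed features
y = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
w = np.zeros(d)

def minibatch_grad(w):
    # Gradient of the mean squared error on a random minibatch.
    idx = rng.choice(n, size=b, replace=False)
    Ab, yb = A[idx], y[idx]
    return Ab.T @ (Ab @ w - yb) / b

grads = np.stack([minibatch_grad(w) for _ in range(5000)])
C = np.cov(grads.T)             # empirical gradient-noise covariance
eig = np.sort(np.linalg.eigvalsh(C))[::-1]
print("eigenvalue spread (max/min):", eig[0] / eig[-1])
```

A large max/min eigenvalue ratio means the noise covariance is far from a multiple of the identity, which is the ingredient behind the "out-of-equilibrium" behavior described above.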