Ilya Loshchilov

Researcher at University of Freiburg

Publications: 47
Citations: 17,277

Ilya Loshchilov is an academic researcher from the University of Freiburg. He has contributed to research on topics including CMA-ES and evolution strategies, has an h-index of 27, and has co-authored 47 publications receiving 8257 citations. Previous affiliations of Ilya Loshchilov include the French Institute for Research in Computer Science and Automation and École Polytechnique Fédérale de Lausanne.

Papers
Posted Content

Decoupled Weight Decay Regularization

TL;DR: This work proposes a simple modification to recover the original formulation of weight decay regularization by decoupling the weight decay from the optimization steps taken w.r.t. the loss function, and provides empirical evidence that this modification substantially improves Adam's generalization performance.
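For concreteness, here is a minimal sketch of the decoupled update in plain Python/NumPy, assuming an Adam-style step: the decay term is applied directly to the weights, scaled by the learning rate, instead of being folded into the loss gradient and passed through the adaptive moment estimates. The function name and hyperparameter defaults are illustrative, not the paper's reference implementation.

```python
import numpy as np

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One Adam step with decoupled weight decay (sketch; t is the step count, starting at 1)."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate, from the loss gradient only
    v = beta2 * v + (1 - beta2) * grad ** 2      # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # Adam update computed from the loss alone
    w = w - lr * weight_decay * w                # weight decay applied separately ("decoupled")
    return w, m, v
```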
Posted Content

SGDR: Stochastic Gradient Descent with Warm Restarts

TL;DR: This paper proposes a simple warm restart technique for stochastic gradient descent that improves its anytime performance when training deep neural networks and achieves state-of-the-art results on both the CIFAR-10 and CIFAR-100 datasets.
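The schedule can be sketched as follows, assuming the cosine annealing form with restarts described in the SGDR paper, eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_i)), where each restart cycle is T_mult times longer than the previous one. Function and parameter names here are illustrative assumptions.

```python
import math

def cosine_warm_restart_lr(epoch, lr_min=0.0, lr_max=0.1, T_0=10, T_mult=2):
    """Learning rate at a given epoch under cosine annealing with warm restarts (sketch)."""
    T_i, start = T_0, 0
    while epoch >= start + T_i:      # find the restart cycle containing this epoch
        start += T_i
        T_i *= T_mult                # each new cycle is T_mult times longer
    t_cur = epoch - start            # position within the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t_cur / T_i))
```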
Proceedings Article

Decoupled Weight Decay Regularization

TL;DR: This paper proposes decoupled weight decay regularization, which decouples the optimal weight decay factor from the setting of the learning rate for both standard SGD and Adam and substantially improves Adam's generalization performance.
Proceedings Article

SGDR: Stochastic Gradient Descent with Warm Restarts

TL;DR: This paper proposes a simple warm restart technique for stochastic gradient descent to improve its anytime performance when training deep neural networks and empirically studies its performance on the CIFAR-10 and CIFAR-100 datasets.
Posted Content

Fixing Weight Decay Regularization in Adam

TL;DR: This work decouples the optimal choice of weight decay factor from the setting of the learning rate for both standard SGD and Adam and substantially improves Adam's generalization performance, allowing it to compete with SGD with momentum on image classification datasets.
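The decoupled variant for Adam is implemented in PyTorch as torch.optim.AdamW, in contrast to torch.optim.Adam with its weight_decay argument, which folds the decay into the gradient as L2 regularization. A minimal usage sketch follows; the toy model, data, and hyperparameter values are purely illustrative.

```python
import torch

model = torch.nn.Linear(10, 1)   # stand-in for a real network

# AdamW applies weight decay directly to the parameters, decoupled from the
# adaptive gradient update; values below are illustrative.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-2)

for x, y in [(torch.randn(4, 10), torch.randn(4, 1))]:   # stand-in for a data loader
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```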