Dan Alistarh
Researcher at Institute of Science and Technology Austria
Publications - 213
Citations - 4887
Dan Alistarh is an academic researcher at the Institute of Science and Technology Austria. His research topics include computer science and stochastic gradient descent. He has an h-index of 27 and has co-authored 175 publications receiving 3,761 citations. Previous affiliations of Dan Alistarh include ETH Zurich and Microsoft.
Papers
Posted Content
The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory
TL;DR: This article shows that lock-free concurrent stochastic gradient descent (SGD) converges faster, and for a wider range of parameters, than previously known, despite asynchronous updates to shared memory.
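The lock-free setting above can be illustrated with a minimal "Hogwild-style" sketch (a simplified illustration, not the paper's algorithm or analysis): several threads update a shared parameter vector with SGD steps and no locking, and the model still converges. The objective, learning rate, and step counts below are made up for the example.

```python
import random
import threading

# Toy objective: minimize f(w) = sum_i (w[i] - target[i])^2.
target = [3.0, -2.0, 5.0]
w = [0.0, 0.0, 0.0]   # shared model, updated by all threads without any lock
lr = 0.01

def worker(num_steps):
    for _ in range(num_steps):
        i = random.randrange(len(w))       # sample one coordinate ("data point")
        grad = 2.0 * (w[i] - target[i])    # gradient of that component
        w[i] -= lr * grad                  # racy in-place update, no synchronization

threads = [threading.Thread(target=worker, args=(2000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

loss = sum((wi - ti) ** 2 for wi, ti in zip(w, target))
print(round(loss, 6))   # loss ends up near zero despite the unsynchronized updates
```

Despite the data races on `w`, occasional lost updates only slow progress slightly; this is the intuition the paper makes rigorous for asynchronous shared memory.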
Posted Content
WoodFisher: Efficient Second-order Approximations for Model Compression
Sidak Pal Singh, Dan Alistarh +1 more
TL;DR: It is demonstrated that WoodFisher significantly outperforms magnitude pruning (isotropic Hessian), as well as methods that maintain other diagonal estimates, and yields a gain in test accuracy over state-of-the-art approaches on standard image classification datasets such as CIFAR-10 and ImageNet.
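For context, here is a minimal sketch of the magnitude-pruning baseline that WoodFisher is compared against: zero out the smallest-magnitude weights, implicitly assuming all weights are equally sensitive (an isotropic Hessian). The function name and example weights are illustrative only.

```python
# Magnitude pruning: remove the `sparsity` fraction of weights with smallest |w|.
# WoodFisher instead uses a second-order (Hessian-based) saliency, not shown here.
def magnitude_prune(weights, sparsity):
    k = int(len(weights) * sparsity)   # number of weights to zero out
    # indices of the k smallest-magnitude entries
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

pruned = magnitude_prune([0.9, -0.1, 0.05, -1.2, 0.3, 0.02], sparsity=0.5)
print(pruned)   # the three smallest-|w| entries are zeroed
```

The baseline's weakness, which motivates second-order methods, is that a small weight can still matter a lot if the loss is highly curved in its direction.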
Book ChapterDOI
How to Solve Consensus in the Smallest Window of Synchrony
TL;DR: Presents ASAP, the first optimally resilient algorithm that solves consensus as soon as possible in an eventually synchronous system, i.e., a system that, from some time GST onwards, delivers messages in a timely fashion.
Journal Article
NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization
TL;DR: A new gradient quantization scheme is proposed that has stronger theoretical guarantees and empirical performance that matches and exceeds that of the QSGDinf heuristic and of other compression methods.
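To illustrate what gradient quantization means here, the sketch below implements unbiased stochastic quantization onto a *uniform* grid of levels (the QSGD-style baseline); NUQSGD's actual contribution is a *nonuniform* placement of the levels, which is not reproduced. Function and parameter names are assumptions for the example.

```python
import random

def quantize(v, num_levels):
    """Stochastically round each |v_i| / norm onto a uniform grid of levels.

    Rounding up vs. down is randomized so the quantized vector is an
    unbiased estimate of the input (E[quantize(v)] = v).
    """
    norm = max(abs(x) for x in v) or 1.0
    out = []
    for x in v:
        r = abs(x) / norm * num_levels                 # position on the level grid
        low = int(r)
        level = low + (1 if random.random() < r - low else 0)  # unbiased rounding
        out.append((1 if x >= 0 else -1) * norm * level / num_levels)
    return out

random.seed(0)
print(quantize([0.3, -0.7, 1.0], num_levels=4))  # each entry snapped to a grid level
```

Each coordinate is then transmitted as a sign, a level index, and a shared norm, which costs far fewer bits than a full-precision float while keeping SGD convergent in expectation.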
Proceedings Article
Distributed Learning over Unreliable Networks
TL;DR: A novel theoretical analysis proving that distributed learning over unreliable networks can achieve a convergence rate comparable to centralized learning or distributed learning over reliable networks, and that the influence of the packet-drop rate diminishes as the number of distributed parameter servers grows.
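The effect of an unreliable network can be illustrated with a toy simulation (a simplified model, not the paper's setup): workers send scalar gradients to a server over a link that drops each message independently with probability `drop_prob`, and the server averages whatever arrives. The gradient values and drop rate are made up for the example.

```python
import random

def aggregate(gradients, drop_prob, rng):
    """Average the gradients that survive independent packet drops."""
    received = [g for g in gradients if rng.random() >= drop_prob]
    if not received:          # every message was dropped this round
        return 0.0
    return sum(received) / len(received)

rng = random.Random(42)
grads = [1.0, 1.2, 0.8, 1.1]   # per-worker gradients; true mean is 1.025
est = sum(aggregate(grads, 0.2, rng) for _ in range(5000)) / 5000
print(round(est, 3))           # close to the true mean despite 20% packet loss
```

Averaging over a symmetric random subset is nearly unbiased for the true mean, which gives some intuition for why convergence degrades gracefully with the drop rate.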