Dan Alistarh

Researcher at Institute of Science and Technology Austria

Publications -  213
Citations -  4887

Dan Alistarh is an academic researcher at the Institute of Science and Technology Austria. His research focuses on computer science and stochastic gradient descent. He has an h-index of 27 and has co-authored 175 publications receiving 3761 citations. His previous affiliations include ETH Zurich and Microsoft.

Papers
Posted Content

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

TL;DR: Shows that lock-free concurrent stochastic gradient descent (SGD) in asynchronous shared memory converges faster, and under a wider range of parameters, than previously known.
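For intuition, here is a minimal Hogwild-style sketch of lock-free SGD: several threads update a shared parameter vector with no locking, so reads and writes interleave and may use stale values. This only illustrates the setting, not the paper's shared-memory model or analysis; the toy least-squares problem, step size, and thread count are arbitrary choices.

```python
import threading
import numpy as np

# Toy least-squares problem: minimize ||A w - b||^2 over w.
rng = np.random.default_rng(0)
A = rng.normal(size=(1024, 16))
w_true = rng.normal(size=16)
b = A @ w_true

w = np.zeros(16)   # shared parameter vector, updated by all threads without locks
lr = 1e-3

def worker(seed, steps=2000):
    local_rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = local_rng.integers(len(A))
        grad = 2.0 * (A[i] @ w - b[i]) * A[i]   # stochastic gradient at a possibly stale w
        w[:] = w - lr * grad                    # unsynchronized write to shared memory

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("distance to w_true:", np.linalg.norm(w - w_true))
```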
Posted Content

WoodFisher: Efficient second-order approximations for model compression.

TL;DR: Demonstrates that WoodFisher significantly outperforms magnitude pruning (isotropic Hessian), as well as methods that maintain other diagonal estimates, and yields a gain in test accuracy over state-of-the-art approaches on standard image classification datasets such as CIFAR-10 and ImageNet.
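To make the contrast with magnitude pruning concrete, the sketch below scores weights with an Optimal-Brain-Surgeon-style criterion built from an empirical Fisher estimate, which is the general idea behind second-order pruning. It is not the paper's block-wise Woodbury implementation; the toy weights, per-sample gradients, and damping constant are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 8, 64
w = rng.normal(size=d)              # trained weights (toy)
grads = rng.normal(size=(n, d))     # per-sample gradients (toy stand-in)

# Empirical Fisher: average outer product of per-sample gradients, plus damping.
fisher = grads.T @ grads / n + 1e-4 * np.eye(d)
fisher_inv = np.linalg.inv(fisher)

# OBS-style score: estimated loss increase from zeroing out weight i.
obs_scores = w**2 / (2.0 * np.diag(fisher_inv))
magnitude_scores = np.abs(w)

print("prune first (second-order):", np.argmin(obs_scores))
print("prune first (magnitude):   ", np.argmin(magnitude_scores))
```

The two criteria can disagree: a large weight may still be cheap to remove if the curvature estimate says the loss is insensitive to it, which is the effect second-order pruning exploits.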
Book Chapter

How to Solve Consensus in the Smallest Window of Synchrony

TL;DR: Presents ASAP, the first optimally-resilient algorithm that solves consensus as soon as possible in an eventually synchronous system, i.e., a system that delivers messages in a timely fashion from some time GST (global stabilization time) onwards.
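As background on the problem setting only (this is not the paper's ASAP algorithm), the sketch below simulates the classic FloodSet routine for a fully synchronous crash-fault system: after f + 1 rounds of flooding, all surviving processes hold the same value set and decide its minimum. The eventually synchronous model studied in the paper is harder, since such a round structure is only guaranteed after GST.

```python
def floodset(inputs, f, crashed=()):
    """Simulate FloodSet: n processes, at most f crash, f + 1 synchronous rounds.
    Surviving processes all decide the minimum value they have seen."""
    n = len(inputs)
    crashed = set(crashed)
    views = [{inputs[i]} for i in range(n)]
    for _ in range(f + 1):
        # Every non-crashed process broadcasts its current view.
        messages = [views[i] if i not in crashed else set() for i in range(n)]
        for i in range(n):
            if i not in crashed:
                for msg in messages:
                    views[i] |= msg
    return [min(views[i]) if i not in crashed else None for i in range(n)]

# Process 0 crashes; the remaining processes all decide the same value.
print(floodset([3, 1, 2, 2], f=1, crashed=[0]))
```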
Journal Article

NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

TL;DR: Proposes a new gradient quantization scheme with stronger theoretical guarantees, whose empirical performance matches or exceeds that of the QSGDinf heuristic and of other compression methods.
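The sketch below shows the general idea of nonuniform stochastic gradient quantization: magnitudes are normalized by the gradient norm and stochastically rounded to exponentially spaced levels, so small entries get finer resolution. The exact level placement, encoding, and guarantees in the paper differ; the number of levels here is an arbitrary assumption.

```python
import numpy as np

def nonuniform_quantize(g, s=3, rng=np.random.default_rng()):
    """Quantize gradient g to levels {0, 2^-s, ..., 2^-1, 1} of |g|/||g||, unbiasedly."""
    norm = np.linalg.norm(g)
    if norm == 0:
        return g
    levels = np.array([0.0] + [2.0 ** -k for k in range(s, -1, -1)])
    r = np.abs(g) / norm
    # Bracket each normalized magnitude between two adjacent levels.
    hi = np.searchsorted(levels, r, side="left").clip(1, len(levels) - 1)
    lo = hi - 1
    # Stochastic rounding between the bracketing levels keeps the estimate unbiased.
    p = (r - levels[lo]) / (levels[hi] - levels[lo])
    q = np.where(rng.random(r.shape) < p, levels[hi], levels[lo])
    return norm * np.sign(g) * q

g = np.random.default_rng(2).normal(size=10)
print(nonuniform_quantize(g))
```

In a real data-parallel implementation only the norm, signs, and level indices would be encoded and transmitted; the reconstruction above stands in for the decoding step.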
Proceedings Article

Distributed Learning over Unreliable Networks

TL;DR: A novel theoretical analysis proving that distributed learning over an unreliable network can achieve a convergence rate comparable to centralized or distributed learning over reliable networks, and that the influence of the packet drop rate diminishes as the number of distributed parameter servers grows.
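The toy simulation below conveys the setting: each worker's gradient message is dropped independently with some probability, and the aggregator averages whatever arrives, yet SGD still makes progress on a simple convex problem. This is only an illustration under assumed toy data, drop probability, and step size, not the paper's model or analysis.

```python
import numpy as np

rng = np.random.default_rng(3)
d, workers, drop_p, lr = 16, 8, 0.2, 0.05
A = rng.normal(size=(workers, 128, d))            # each worker's local data
w_true = rng.normal(size=d)
b = np.einsum("wnd,d->wn", A, w_true)             # each worker's local targets

w = np.zeros(d)
for step in range(500):
    received = []
    for k in range(workers):
        grad = 2.0 * A[k].T @ (A[k] @ w - b[k]) / len(b[k])
        if rng.random() >= drop_p:                # message survives the unreliable network
            received.append(grad)
    if received:                                  # average whatever made it through
        w -= lr * np.mean(received, axis=0)

print("distance to optimum:", np.linalg.norm(w - w_true))
```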