
Dan Alistarh

Researcher at Institute of Science and Technology Austria

Publications: 213
Citations: 4,887

Dan Alistarh is an academic researcher at the Institute of Science and Technology Austria. His research focuses on computer science and stochastic gradient descent. He has an h-index of 27 and has co-authored 175 publications receiving 3,761 citations. Previous affiliations of Dan Alistarh include ETH Zurich and Microsoft.

Papers
Proceedings Article

QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding

TL;DR: Quantized SGD (QSGD) is a family of compression schemes for gradient updates that provides convergence guarantees for both convex and nonconvex objectives, even under asynchrony, and can be extended to stochastic variance-reduced techniques.
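The core of QSGD is unbiased stochastic quantization of each gradient coordinate to a small number of levels. The sketch below is a minimal illustration of that idea, not the paper's full scheme (which also covers the encoding of the quantized values); the function name and default of `s = 4` levels are assumptions for the example.

```python
import numpy as np

def qsgd_quantize(v, s=4, rng=None):
    """Stochastically quantize v to s levels per coordinate (illustrative sketch).

    Each coordinate is rounded up or down to an adjacent level with a
    probability chosen so that the quantized vector is unbiased: E[q(v)] = v.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    ratio = np.abs(v) / norm * s          # position of |v_i| in [0, s]
    lower = np.floor(ratio)               # nearest level below
    round_up = rng.random(v.shape) < (ratio - lower)
    level = lower + round_up
    return np.sign(v) * norm * level / s
```

Because only the sign, the shared norm, and a small integer level per coordinate need to be transmitted, the communicated gradient is much cheaper than 32-bit floats while remaining an unbiased estimate.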
Posted Content

Model compression via distillation and quantization

TL;DR: This paper proposes quantized distillation and differentiable quantization, which optimizes the locations of quantization points through stochastic gradient descent to better fit the behavior of the teacher model, and shows that quantized shallow students can reach accuracy levels similar to those of full-precision teacher models.
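Distillation here means training the (quantized) student against the teacher's softened output distribution. The following is a minimal numpy sketch of a temperature-scaled distillation loss, assuming the usual KL formulation; the function names are hypothetical and this is not the paper's exact training objective.

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Mean KL(teacher || student) on temperature-softened outputs.

    The T^2 factor is the standard rescaling so gradient magnitudes stay
    comparable across temperatures.
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = np.sum(p * (np.log(p) - np.log(q)))
    return float(kl * T * T / student_logits.shape[0])
```

A higher temperature T spreads the teacher's probability mass over more classes, exposing the "dark knowledge" in its near-miss predictions that a hard-label loss would discard.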
Proceedings Article

The Convergence of Sparsified Gradient Methods

TL;DR: The authors showed that sparsifying gradients by magnitude with local error correction provides convergence guarantees, for both convex and non-convex smooth objectives, for data-parallel SGD.
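The "local error correction" in the summary refers to each worker accumulating the coordinates it did not transmit and adding them back before the next selection. A minimal sketch of magnitude-based top-k selection with this error feedback, under assumed names, might look like:

```python
import numpy as np

def topk_with_error_feedback(grad, error, k):
    """Transmit only the k largest-magnitude entries of grad + error.

    The untransmitted remainder is returned as the new local error and
    folded into the next step, so no gradient mass is permanently lost.
    """
    acc = grad + error                              # corrected gradient
    idx = np.argpartition(np.abs(acc), -k)[-k:]     # k largest magnitudes
    sparse = np.zeros_like(acc)
    sparse[idx] = acc[idx]                          # transmitted part
    new_error = acc - sparse                        # kept locally
    return sparse, new_error
```

The invariant `sparse + new_error == grad + error` is what makes the convergence analysis go through: the compression error is delayed, not discarded.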
Proceedings Article

ZipML: Training Linear Models with End-to-End Low Precision, and a Little Bit of Deep Learning

TL;DR: The ZipML framework executes training at low precision with no bias, guaranteeing convergence, whereas naive quantization would introduce significant bias; it also enables an FPGA prototype that is up to 6.5× faster than an implementation using full 32-bit precision.
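The reason naive quantization biases training is that deterministic rounding systematically shifts values toward grid points. A standard remedy, and the building block behind unbiased low-precision schemes of this kind, is stochastic rounding; the sketch below (hypothetical function name, not ZipML's full double-sampling construction) rounds to a fixed grid so the rounded value equals the input in expectation.

```python
import numpy as np

def stochastic_round(x, step=0.25, rng=None):
    """Round x to a grid of spacing `step`, up or down at random.

    The round-up probability is the fractional distance to the upper grid
    point, which makes the rounding unbiased: E[stochastic_round(x)] = x.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    scaled = np.asarray(x, dtype=float) / step
    lower = np.floor(scaled)
    round_up = rng.random(np.shape(scaled)) < (scaled - lower)
    return (lower + round_up) * step
```

Deterministic rounding of 0.1 to a 0.25 grid always yields 0.0; stochastic rounding yields 0.25 with probability 0.4, so the average stays at 0.1 and the bias vanishes.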