
Dan Alistarh

Researcher at Institute of Science and Technology Austria

Publications -  213
Citations -  4887

Dan Alistarh is an academic researcher at the Institute of Science and Technology Austria. The author has contributed to research in topics including computer science and stochastic gradient descent. The author has an h-index of 27 and has co-authored 175 publications receiving 3,761 citations. Previous affiliations of Dan Alistarh include ETH Zurich and Microsoft.

Papers
Proceedings Article · DOI

The Complexity of Renaming

TL;DR: An individual lower bound of Ω(k) process steps is proved for deterministic renaming into any namespace of size sub-exponential in k, which draws an exponential separation between deterministic and randomized solutions and implies new tight bounds for deterministic fetch-and-increment registers, queues, and stacks.
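Stated symbolically (a paraphrase of the bound in the TL;DR above, not the paper's exact theorem statement):

```latex
% Paraphrase of the lower bound described above: with k participating
% processes, any deterministic renaming algorithm whose target namespace has
% size sub-exponential in k forces some process to take Omega(k) steps.
\[
  |\mathrm{namespace}| = 2^{o(k)}
  \;\Longrightarrow\;
  \text{worst-case individual step complexity} \in \Omega(k).
\]
```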
Proceedings Article · DOI

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models

TL;DR: The Optimal BERT Surgeon (oBERT), an efficient and accurate pruning method based on approximate second-order information, is introduced and shown to yield state-of-the-art results for compression in both stages of language tasks: pre-training and fine-tuning.
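As a rough illustration of the second-order idea behind this line of work, the sketch below scores weights with the classic Optimal Brain Surgeon saliency using a diagonal empirical-Fisher stand-in for the Hessian; the function name, shapes, and the diagonal approximation are assumptions for the sketch, not oBERT's blockwise method:

```python
# Minimal sketch of second-order (Optimal Brain Surgeon-style) pruning scores,
# using a diagonal empirical Fisher approximation of the Hessian. This is only
# meant to illustrate the saliency formula rho_i = w_i^2 / (2 * [H^-1]_ii);
# the compensating update to the remaining weights is omitted.
import numpy as np

def obs_prune_mask(weights, per_sample_grads, sparsity, damp=1e-4):
    """Return a binary mask keeping the weights with the highest OBS saliency.

    weights:          1-D array of model weights, shape (d,)
    per_sample_grads: per-example gradients, shape (n, d), used to build the
                      empirical Fisher diagonal (mean of squared gradients)
    sparsity:         fraction of weights to remove, in [0, 1)
    """
    fisher_diag = np.mean(per_sample_grads ** 2, axis=0) + damp  # diag(F) + dampening
    inv_hessian_diag = 1.0 / fisher_diag                         # [H^-1]_ii under the diagonal approximation
    saliency = weights ** 2 / (2.0 * inv_hessian_diag)           # estimated loss increase if weight i is removed
    num_pruned = int(sparsity * weights.size)
    prune_idx = np.argsort(saliency)[:num_pruned]                # least salient weights go first
    mask = np.ones_like(weights)
    mask[prune_idx] = 0.0
    return mask

# Toy usage with random data (purely illustrative).
rng = np.random.default_rng(0)
w = rng.normal(size=1000)
g = rng.normal(size=(64, 1000))
mask = obs_prune_mask(w, g, sparsity=0.9)
print("kept weights:", int(mask.sum()))
```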
Posted Content

Distributed Learning over Unreliable Networks

TL;DR: In this paper, the authors consider the problem of designing machine learning systems that are tolerant to network unreliability during training and show that the influence of the packet drop rate diminishes with the number of parameter servers.
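A toy sketch of the sharded setting that intuition refers to: the model is split across several parameter servers, each worker-to-server packet can be dropped independently, and a single dropped packet only affects the fraction of coordinates held by one server. The sharding scheme, drop model, and function name are assumptions made for illustration, not the authors' system:

```python
# Illustrative sketch: partition the model into one shard per parameter server
# and average whatever worker packets arrive for each shard. A dropped packet
# only affects the 1/num_servers fraction of coordinates owned by that server.
import numpy as np

def aggregate_with_drops(worker_grads, num_servers, drop_rate, rng):
    """Average worker gradients shard-by-shard, skipping dropped packets."""
    num_workers, dim = worker_grads.shape
    shards = np.array_split(np.arange(dim), num_servers)  # one index block per server
    averaged = np.zeros(dim)
    for shard in shards:
        received = [worker_grads[w, shard]
                    for w in range(num_workers)
                    if rng.random() >= drop_rate]          # packet from worker w arrived
        if received:                                       # otherwise the shard keeps a zero update
            averaged[shard] = np.mean(received, axis=0)
    return averaged

rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 1024))                         # 8 workers, 1024-dimensional model
update = aggregate_with_drops(grads, num_servers=16, drop_rate=0.1, rng=rng)
print("deviation from the drop-free average:",
      float(np.linalg.norm(update - grads.mean(axis=0))))
```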
Book Chapter · DOI

Sub-logarithmic test-and-set against a weak adversary

TL;DR: A randomized implementation of a test-and-set register is given with O(log log n) individual step complexity and O(n) total step complexity against an oblivious adversary, an exponential improvement over previous solutions designed to work against a strong adversary.
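For context only, the sketch below shows the sequential specification of the object being implemented: a test-and-set register returns "winner" to exactly the first caller and "loser" to everyone else. The lock-based stand-in is an assumption for illustration; the paper's contribution is a randomized construction from plain read/write registers, which this sketch does not attempt to reproduce:

```python
# Sequential specification of a test-and-set register, illustrated with a lock
# standing in for an atomic primitive. This is NOT the randomized construction
# from the paper; it only shows the interface and the "one winner" guarantee.
import threading

class TestAndSet:
    def __init__(self):
        self._set = False
        self._lock = threading.Lock()      # stand-in for an atomic primitive

    def test_and_set(self):
        with self._lock:
            if not self._set:              # first caller flips the bit and wins
                self._set = True
                return "winner"
            return "loser"                 # every later caller loses

tas = TestAndSet()
results = []
threads = [threading.Thread(target=lambda: results.append(tas.test_and_set()))
           for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(results.count("winner"), "winner among", len(results), "callers")
```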
Proceedings Article · DOI

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory

TL;DR: In this paper, the authors show that lock-free concurrent stochastic gradient descent (SGD) converges faster and under a wider range of parameters than previously known when iterations are asynchronous, while exhibiting a fundamental trade-off between the maximum delay in the system and the rate at which SGD can converge.
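A minimal Hogwild-style sketch of the asynchronous shared-memory setting, assuming a simple least-squares objective and Python threads; the objective, thread count, and step size are illustrative choices, and the code shows the lock-free update pattern rather than the paper's algorithm or its convergence bounds:

```python
# Lock-free ("Hogwild-style") SGD sketch: several threads read a shared
# parameter vector, compute a stochastic gradient from a possibly stale
# snapshot, and write back without any synchronization.
import threading
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(512, 20))                    # data matrix
x_true = rng.normal(size=20)
b = A @ x_true                                    # consistent targets, so the optimum is x_true
x = np.zeros(20)                                  # shared iterate, updated without locks

def worker(steps=2000, lr=0.01, seed=0):
    local_rng = np.random.default_rng(seed)
    for _ in range(steps):
        i = local_rng.integers(len(A))            # sample one data point
        snapshot = x.copy()                       # possibly stale read of the shared iterate
        grad = (A[i] @ snapshot - b[i]) * A[i]    # stochastic gradient of 0.5*(A[i]@x - b[i])^2
        x[:] -= lr * grad                         # unsynchronized in-place write

threads = [threading.Thread(target=worker, kwargs={"seed": s}) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print("distance to the optimum:", float(np.linalg.norm(x - x_true)))
```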