Dan Alistarh

Researcher at Institute of Science and Technology Austria

Publications: 213
Citations: 4887

Dan Alistarh is an academic researcher at the Institute of Science and Technology Austria. He has contributed to research topics including computer science and stochastic gradient descent. He has an h-index of 27 and has co-authored 175 publications receiving 3761 citations. His previous affiliations include ETH Zurich and Microsoft.

Papers
Journal Article

Lease/Release: Architectural Support for Scaling Contended Data Structures

TL;DR: This article proposes Lease/Release, a simple addition to standard directory-based MESI cache-coherence protocols that allows participants to lease memory, at the granularity of cache lines, by delaying coherence messages for a short, bounded period of time. Lease/Release can significantly reduce the overheads of contention for both non-blocking and lock-based data structure implementations.
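The paper's mechanism lives in the cache-coherence hardware and has no direct software API; as a rough, purely illustrative analogue (the `Lease` class, its parameters, and the bounded-hold convention below are all hypothetical), a lease can be pictured as exclusive access held for a short, bounded time and then released promptly:

```python
import threading
import time

# Hypothetical software analogue of the lease idea: exclusive access is
# granted for a short, bounded duration, after which the holder should
# stop mutating the leased data and release it, bounding how long
# contending participants can be delayed.
class Lease:
    def __init__(self, lock: threading.Lock, max_hold_s: float = 1e-4):
        self.lock = lock
        self.max_hold_s = max_hold_s  # bounded lease duration

    def __enter__(self):
        self.lock.acquire()
        self.deadline = time.monotonic() + self.max_hold_s
        return self

    def expired(self) -> bool:
        # Holders should cease work on the leased data once this is True.
        return time.monotonic() >= self.deadline

    def __exit__(self, *exc):
        self.lock.release()

counter_lock = threading.Lock()
counter = 0

def increment():
    global counter
    with Lease(counter_lock):  # lease, mutate briefly, release promptly
        counter += 1
```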
Proceedings Article

A Scalable Concurrent Algorithm for Dynamic Connectivity

TL;DR: In this paper, the Euler tour tree (ETT) data structure is used to obtain the first concurrent generalization of dynamic connectivity, which preserves the time complexity of its sequential counterpart while also being scalable in practice.
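For context, here is a minimal sequential sketch of the dynamic-connectivity interface the paper makes concurrent. The naive BFS-based baseline below is an illustration, not the paper's algorithm: Euler-tour-tree based structures answer the same queries in polylogarithmic time, and the paper's contribution is making such a structure concurrent.

```python
from collections import defaultdict, deque

# Naive dynamic connectivity: insert_edge, delete_edge, and connected
# queries. Queries run in O(V + E) via BFS; ETT-based structures support
# the same interface with polylogarithmic updates and queries.
class NaiveDynamicConnectivity:
    def __init__(self):
        self.adj = defaultdict(set)

    def insert_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def delete_edge(self, u, v):
        self.adj[u].discard(v)
        self.adj[v].discard(u)

    def connected(self, u, v):
        seen, frontier = {u}, deque([u])
        while frontier:
            x = frontier.popleft()
            if x == v:
                return True
            for y in self.adj[x]:
                if y not in seen:
                    seen.add(y)
                    frontier.append(y)
        return False

dc = NaiveDynamicConnectivity()
dc.insert_edge(1, 2)
dc.insert_edge(2, 3)
assert dc.connected(1, 3)
dc.delete_edge(2, 3)
assert not dc.connected(1, 3)
```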
Posted Content

Project CGX: Scalable Deep Learning on Commodity GPUs

TL;DR: In this paper, the authors investigate whether the expensive hardware overprovisioning approach can be supplanted via algorithmic and system design, and propose a framework called CGX, which provides efficient software support for communication compression.
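To illustrate the kind of communication compression CGX provides software support for, here is a hedged sketch of uniform 8-bit gradient quantization before communication, with dequantization on the receiving side. This is illustrative only; CGX's actual compression kernels and API are not shown here.

```python
import numpy as np

# Uniform 8-bit quantization: map a float32 gradient onto 256 levels,
# sending one byte per entry plus two floats (offset and scale) instead
# of four bytes per entry.
def quantize_8bit(grad: np.ndarray):
    lo, hi = float(grad.min()), float(grad.max())
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # constant tensor; avoid division by zero
    q = np.round((grad - lo) / scale).astype(np.uint8)
    return q, lo, scale

def dequantize_8bit(q: np.ndarray, lo: float, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale + lo

grad = np.random.randn(1024).astype(np.float32)
q, lo, scale = quantize_8bit(grad)
recovered = dequantize_8bit(q, lo, scale)
assert np.abs(grad - recovered).max() <= scale  # bounded quantization error
```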
Posted Content

The LevelArray: A Fast, Practical Long-Lived Renaming Algorithm

TL;DR: This paper proves that, in long-lived executions where processes may register and deregister polynomially many times, the technique guarantees constant steps on average and O(log log n) steps with high probability for registering, unit cost for deregistering, and O(n) steps for collect queries, where n is an upper bound on the number of processes active at any point in time.
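A simplified sketch of the long-lived renaming interface follows. The linear-scan baseline below is an assumption for illustration only; the actual LevelArray organizes slots into multiple levels to achieve the constant average registration cost stated above.

```python
import threading

# Long-lived renaming: each process claims a unique name (an array slot)
# via test-and-set and releases it when done. One lock per slot; a
# non-blocking acquire acts as the test-and-set.
class Renaming:
    def __init__(self, capacity: int):
        self.slots = [threading.Lock() for _ in range(capacity)]

    def register(self) -> int:
        for name, slot in enumerate(self.slots):
            if slot.acquire(blocking=False):  # test-and-set on the slot
                return name
        raise RuntimeError("no free name: more processes than capacity")

    def deregister(self, name: int) -> None:
        self.slots[name].release()  # unit-cost deregistration

names = Renaming(capacity=8)
a = names.register()
b = names.register()
assert a != b
names.deregister(a)
```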

L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning

TL;DR: L-GreCo is based on an adaptive algorithm that automatically picks optimal compression parameters for each model layer, guaranteeing the best compression ratio while satisfying an error constraint. It achieves up to 2.5 times training speedup and up to 5 times compression improvement over efficient implementations of existing approaches, while recovering full accuracy.
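A toy sketch of the layerwise-adaptive selection idea follows. The candidate compression levels, top-k error model, and greedy strategy below are illustrative assumptions; L-GreCo formulates per-layer parameter selection as an optimization problem and solves it exactly rather than greedily.

```python
import numpy as np

def pick_compression(layers, levels, error_budget):
    """For each layer, pick the strongest top-k keep-ratio whose cumulative
    L2 error stays within a global budget.
    layers: {name: gradient ndarray}; levels: keep-ratios, strongest first."""
    choice, spent = {}, 0.0
    for name, grad in layers.items():
        for keep in levels:  # try the strongest compression first
            k = max(1, int(keep * grad.size))
            kept = np.sort(np.abs(grad.ravel()))[-k:]
            # L2 error of dropping everything outside the top-k entries.
            err = float(np.sum(grad**2) - np.sum(kept**2))
            if spent + err <= error_budget:
                choice[name], spent = keep, spent + err
                break
        else:
            choice[name] = 1.0  # no level fits; keep the layer uncompressed
    return choice

layers = {f"layer{i}": np.random.randn(256) for i in range(4)}
total = sum(float(np.sum(g**2)) for g in layers.values())
print(pick_compression(layers, levels=[0.01, 0.1, 0.5],
                       error_budget=0.1 * total))
```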