
Nikoli Dryden

Researcher at ETH Zurich

Publications - 37
Citations - 810

Nikoli Dryden is an academic researcher from ETH Zurich. The author has contributed to research in the topics of Deep learning and Computer science, has an h-index of 11, and has co-authored 33 publications receiving 498 citations. Previous affiliations of Nikoli Dryden include the University of Texas at Austin and Lawrence Livermore National Laboratory.

Papers
Proceedings ArticleDOI

Communication quantization for data-parallel training of deep neural networks

TL;DR: This work ports two existing quantization approaches, one-bit and threshold quantization, and develops an adaptive quantization algorithm; the techniques are comparable or superior for large layers without sacrificing accuracy and achieve near-linear speedup in data-parallel training.
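
Below is a minimal sketch of one-bit gradient quantization with error feedback, in the spirit of the one-bit approach this work ports; the function name, per-sign scaling, and NumPy layout are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def one_bit_quantize(grad, error_feedback):
    """Quantize a gradient to one bit per element plus two scales, carrying
    the quantization error forward so it is not lost across iterations."""
    corrected = grad + error_feedback          # fold in residual from last step
    signs = corrected >= 0                     # one bit per element
    # Separate means for positive and negative entries reduce quantization bias.
    pos_scale = corrected[signs].mean() if signs.any() else 0.0
    neg_scale = corrected[~signs].mean() if (~signs).any() else 0.0
    dequant = np.where(signs, pos_scale, neg_scale)
    new_error = corrected - dequant            # residual kept locally
    return signs, pos_scale, neg_scale, new_error

# Usage: quantize one layer's gradient before communicating it.
g = np.random.randn(1000)
signs, ps, ns, err = one_bit_quantize(g, np.zeros_like(g))
```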
Proceedings ArticleDOI

Gluon: a communication-optimizing substrate for distributed heterogeneous graph analytics

TL;DR: This paper introduces a new approach to building distributed-memory graph analytics systems that exploits heterogeneity in processor types (CPU and GPU), partitioning policies, and programming models, and presents Gluon, a communication-optimizing substrate that enables these programs to run on heterogeneous clusters and optimizes communication in a novel way.
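
As a rough illustration of the master/mirror-style boundary synchronization such a communication substrate performs after a local compute step, here is a minimal Python sketch; the data structures, the `sync_boundary` helper, and the reduce operator are hypothetical and do not reflect Gluon's actual API.

```python
def sync_boundary(partitions, mirrors, reduce_op=min):
    """partitions: {partition_id: {vertex: value}};
    mirrors: {vertex: (owner_partition, [replica_partitions])}."""
    for vertex, (owner, replicas) in mirrors.items():
        # Gather every copy's value and reduce them into one canonical value...
        values = [partitions[p][vertex] for p in [owner] + replicas]
        reduced = reduce_op(values)
        # ...then broadcast the reduced value back to the owner and replicas.
        for p in [owner] + replicas:
            partitions[p][vertex] = reduced

# Usage: vertex "v1" is owned by partition 0 and mirrored on partition 1.
parts = {0: {"v1": 3}, 1: {"v1": 5}}
sync_boundary(parts, {"v1": (0, [1])})  # both copies of "v1" become 3
```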
Journal ArticleDOI

Deep learning for post-processing ensemble weather forecasts

TL;DR: A mixed model that uses only a subset of the original weather trajectories, combined with a post-processing step using deep neural networks, is able to account for non-linear relationships that are not captured by current numerical models or post-processing methods.
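
A minimal sketch of what a DNN post-processor over a reduced ensemble subset might look like, assuming PyTorch; the `EnsemblePostProcessor` class, layer sizes, and feature layout are illustrative assumptions rather than the paper's model.

```python
import torch
import torch.nn as nn

class EnsemblePostProcessor(nn.Module):
    """Map forecasts from a small subset of ensemble members at one grid
    point to a corrected forecast value."""
    def __init__(self, n_members: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_members, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, members):          # members: (batch, n_members)
        return self.net(members)         # corrected forecast per sample

# Usage: post-process forecasts from 5 of the original ensemble members.
model = EnsemblePostProcessor(n_members=5)
corrected = model(torch.randn(8, 5))
```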
Posted Content

Data Movement Is All You Need: A Case Study on Optimizing Transformers

TL;DR: This work finds that data movement is the key bottleneck when training transformers, and presents a recipe for globally optimizing data movement, achieving a 1.30x performance improvement over state-of-the-art frameworks when training BERT.
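
To illustrate the data-movement argument, the sketch below contrasts an unfused bias+GELU (which materializes an intermediate tensor) with a single-pass version; this is illustrative NumPy under assumed shapes, not the paper's fused kernels.

```python
import numpy as np

GELU_C = 0.7978845608028654  # sqrt(2 / pi)

def bias_gelu_unfused(x, b):
    # Two separate passes: the bias add materializes a full intermediate
    # array that is written to memory and read back for the activation.
    t = x + b
    return 0.5 * t * (1.0 + np.tanh(GELU_C * (t + 0.044715 * t ** 3)))

def bias_gelu_fused(x, b):
    # One conceptual pass: each element is loaded once, transformed, and
    # stored once, with no intermediate array. A real fused GPU kernel keeps
    # t in registers; the Python loop only illustrates the access pattern.
    out = np.empty_like(x)
    bb = np.broadcast_to(b, x.shape)
    for i in range(x.size):
        t = x.flat[i] + bb.flat[i]
        out.flat[i] = 0.5 * t * (1.0 + np.tanh(GELU_C * (t + 0.044715 * t ** 3)))
    return out
```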
Proceedings ArticleDOI

Channel and filter parallelism for large-scale CNN training

TL;DR: This work introduces three algorithms that partition channel or filter data to exploit parallelism beyond the sample dimension; they partition the parameters of convolutional layers, replacing global allreduces with segmented allreduces among disjoint processor sets.
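
A minimal sketch of a segmented allreduce over disjoint processor sets using mpi4py subcommunicators; the segment count, buffer size, and names are illustrative assumptions, not the paper's actual decomposition.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Split the ranks into disjoint segments; each segment shares one shard of
# a convolutional layer's parameters (the segment count is an assumption).
n_segments = 2
seg_comm = comm.Split(color=rank % n_segments, key=rank)

# Each rank contributes its local gradient for the shard it holds; the
# allreduce stays inside the segment instead of spanning all processors.
local_grad = np.random.rand(1024).astype(np.float32)
reduced = np.empty_like(local_grad)
seg_comm.Allreduce(local_grad, reduced, op=MPI.SUM)
```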