
Shaden Smith

Researcher at University of Minnesota

Publications: 35
Citations: 1846

Shaden Smith is an academic researcher from the University of Minnesota. The author has contributed to research on topics including Speedup and Tensor, has an h-index of 17, and has co-authored 32 publications receiving 792 citations. Previous affiliations of Shaden Smith include Microsoft and the University of Kentucky.

Papers
Journal Article

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao, +386 more
09 Nov 2022
TL;DR: BLOOM is a decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
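The checkpoint is open access; as a rough illustration, here is a minimal sketch of querying BLOOM through the Hugging Face transformers API. The smaller bigscience/bloom-560m variant stands in here for the full 176B-parameter model, which needs hundreds of gigabytes of memory to load.

```python
# Minimal sketch: generating text from an open BLOOM checkpoint via the
# Hugging Face transformers API. "bigscience/bloom-560m" is a small
# variant used for illustration; swap in "bigscience/bloom" (the full
# 176B model) only if the hardware allows it.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloom-560m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("BLOOM is a multilingual language model that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```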
Journal Article

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

TL;DR: Presents the infrastructure and the 3D parallelism methodology used to train Megatron-Turing NLG 530B (MT-NLG), the largest monolithic transformer-based language model, with 530 billion parameters.
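For intuition, 3D parallelism composes tensor (intra-layer), pipeline (inter-layer), and data parallelism, and the product of the three degrees must equal the total GPU count. A minimal sketch of that bookkeeping follows; the degrees shown are illustrative, not necessarily the exact MT-NLG configuration.

```python
# Illustrative bookkeeping for 3D parallelism: layers are sharded across
# tensor-parallel groups, layer groups are assigned to pipeline stages,
# and the resulting full-model replicas train with data parallelism.
# These degrees are hypothetical examples for illustration.
tensor_parallel = 8      # ways each layer's weights are sharded
pipeline_parallel = 35   # consecutive layer groups on different nodes
data_parallel = 16       # full-model replicas on different data shards

gpus_per_replica = tensor_parallel * pipeline_parallel
total_gpus = gpus_per_replica * data_parallel
print(f"GPUs per model replica: {gpus_per_replica}")  # 280
print(f"Total GPUs required:    {total_gpus}")        # 4480
```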
Proceedings Article

SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication

TL;DR: SPLATT is a C library with shared-memory parallelism for three-mode tensors, built on a data structure that exploits the sparsity patterns of tensors.
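The kernel SPLATT accelerates is the matricized tensor times Khatri-Rao product (MTTKRP). A minimal NumPy sketch of that operation on a coordinate-format three-mode tensor, purely to illustrate the arithmetic that the C library parallelizes:

```python
# Minimal sketch of mode-0 MTTKRP for a three-mode sparse tensor in
# coordinate (COO) form: M[i, :] += val * (B[j, :] * C[k, :]) for each
# nonzero (i, j, k). SPLATT uses a compressed layout and shared-memory
# parallelism; this version only shows the arithmetic.
import numpy as np

def mttkrp_mode0(coords, vals, B, C):
    M = np.zeros((coords[:, 0].max() + 1, B.shape[1]))
    for (i, j, k), v in zip(coords, vals):
        M[i] += v * B[j] * C[k]
    return M

# Toy 3x3x3 tensor with four nonzeros and rank-2 factor matrices.
coords = np.array([[0, 1, 2], [1, 0, 0], [2, 2, 1], [0, 0, 0]])
vals = np.array([1.0, 2.0, 3.0, 4.0])
B = np.random.rand(3, 2)
C = np.random.rand(3, 2)
print(mttkrp_mode0(coords, vals, B, C))
```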
Proceedings Article

Tensor-matrix products with a compressed sparse tensor

TL;DR: Introduces the compressed sparse fiber (CSF), a data structure for sparse tensors, along with a novel parallel algorithm for tensor-matrix multiplication; CSF offers operation reductions similar to existing compressed methods while using only a single tensor structure.
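A hedged sketch of a CSF-like layout and of the operation reduction it enables: nonzeros are grouped into fibers and fibers into slices, analogous to CSR generalized to three modes, so shared work can be hoisted out of the innermost loop. The field names below are illustrative, not SPLATT's actual C structs.

```python
# CSF-like layout for a three-mode sparse tensor: pointer/index arrays
# per level, so each slice and fiber index is stored once rather than
# once per nonzero. Names are illustrative.
from dataclasses import dataclass
import numpy as np

@dataclass
class CSF3:
    slice_ids: np.ndarray  # mode-0 index of each nonzero slice
    fiber_ptr: np.ndarray  # slice s owns fibers fiber_ptr[s]:fiber_ptr[s+1]
    fiber_ids: np.ndarray  # mode-1 index of each fiber
    nnz_ptr: np.ndarray    # fiber f owns nonzeros nnz_ptr[f]:nnz_ptr[f+1]
    nnz_ids: np.ndarray    # mode-2 index of each nonzero
    vals: np.ndarray       # nonzero values

def mttkrp_csf(t: CSF3, B, C):
    """Mode-0 product over CSF, hoisting B[j] out of the inner loop --
    the kind of operation reduction compressed layouts provide."""
    M = np.zeros((t.slice_ids.max() + 1, B.shape[1]))
    for s, i in enumerate(t.slice_ids):
        for f in range(t.fiber_ptr[s], t.fiber_ptr[s + 1]):
            acc = np.zeros(B.shape[1])
            for n in range(t.nnz_ptr[f], t.nnz_ptr[f + 1]):
                acc += t.vals[n] * C[t.nnz_ids[n]]
            M[i] += acc * B[t.fiber_ids[f]]
    return M
```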
Proceedings Article

ZeRO-infinity: breaking the GPU memory wall for extreme scale deep learning

TL;DR: ZeRO-Infinity leverages GPU, CPU, and NVMe memory to enable unprecedented model scale on limited resources without requiring model code refactoring.
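In DeepSpeed terms, ZeRO-Infinity corresponds to ZeRO stage 3 with parameter and optimizer-state offloading to NVMe. A minimal configuration sketch, using DeepSpeed's documented config keys; the batch size and NVMe path are placeholder values:

```python
# Hedged sketch of a DeepSpeed config enabling ZeRO-Infinity: ZeRO
# stage 3 with parameters and optimizer state offloaded to NVMe.
# Batch size and nvme_path are placeholders.
ds_config = {
    "train_batch_size": 16,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,  # partition params, grads, and optimizer state
        "offload_param": {
            "device": "nvme",      # spill parameters past GPU/CPU memory
            "nvme_path": "/local_nvme",
        },
        "offload_optimizer": {
            "device": "nvme",      # optimizer state also lives on NVMe
            "nvme_path": "/local_nvme",
        },
    },
}

# Passed to deepspeed.initialize(model=..., config=ds_config); the model
# code itself is unchanged, matching the paper's no-refactoring claim.
```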