Mohammad Shoeybi

Researcher at Nvidia

Publications: 53
Citations: 3,377

Mohammad Shoeybi is an academic researcher from Nvidia. The author has contributed to research in the topics of Computer science & Jet (fluid). The author has an h-index of 15 and has co-authored 42 publications receiving 1,589 citations. Previous affiliations of Mohammad Shoeybi include Sharif University of Technology & Stanford University.

Papers
Posted Content

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

TL;DR: Presents a simple, efficient intra-layer model-parallel approach that enables training transformer models with billions of parameters, and shows that careful placement of layer normalization in BERT-like models is critical to achieving increased performance as model size grows.
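The TL;DR above refers to intra-layer (tensor) model parallelism. As a rough illustration of the core idea only, the following minimal, single-process NumPy sketch partitions a Transformer MLP's first weight matrix by columns and its second by rows across two simulated devices, then sums the partial outputs, the step a real implementation would perform with an all-reduce. All names and shapes are illustrative assumptions, not the Megatron-LM API.

```python
# Minimal single-process sketch of intra-layer (tensor) model parallelism
# for a Transformer MLP block: the first weight matrix is split by columns,
# the second by rows, and the partial outputs are summed (the all-reduce
# step in a real multi-GPU setup). Shapes and names are illustrative only.
import numpy as np

def gelu(x):
    # tanh approximation of GeLU (element-wise, so it commutes with the column split)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
batch, d_model, d_ff, n_devices = 4, 8, 32, 2

X = rng.standard_normal((batch, d_model))
A = rng.standard_normal((d_model, d_ff))   # first MLP weight
B = rng.standard_normal((d_ff, d_model))   # second MLP weight

# Reference: unpartitioned forward pass.
Z_ref = gelu(X @ A) @ B

# "Device" i holds a column slice of A and the matching row slice of B.
A_shards = np.split(A, n_devices, axis=1)
B_shards = np.split(B, n_devices, axis=0)

# Each device computes its partial output independently; no communication
# is needed inside the GeLU because the split keeps it element-wise.
partials = [gelu(X @ A_i) @ B_i for A_i, B_i in zip(A_shards, B_shards)]

# Summing the partial outputs (the all-reduce) recovers the full result.
Z_parallel = sum(partials)

assert np.allclose(Z_ref, Z_parallel)
print("tensor-parallel MLP matches the unpartitioned result")
```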
Proceedings Article

Deep Voice: Real-time Neural Text-to-Speech

TL;DR: Deep Voice lays the groundwork for truly end-to-end neural speech synthesis, shows that inference with the system can be performed faster than real time, and describes optimized WaveNet inference kernels on both CPU and GPU that achieve up to 400x speedups over existing implementations.
Journal Article

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao, +386 more · 09 Nov 2022
TL;DR: BLOOM is a 176B-parameter, open-access, decoder-only Transformer language model trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total).
Journal Article

Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

TL;DR: Presents the infrastructure and the 3D parallelism methodology used to train the largest monolithic transformer-based language model, Megatron-Turing NLG 530B (MT-NLG), with 530 billion parameters.
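The 3D parallelism mentioned above combines data, pipeline, and tensor (intra-layer) parallelism. The short Python sketch below only illustrates how a pool of GPU ranks can be factored along those three axes into communication groups; the group sizes are arbitrary examples, not the MT-NLG training configuration.

```python
# Illustrative sketch of factoring a pool of GPUs into the three axes of
# 3D parallelism (data x pipeline x tensor). The group sizes below are
# arbitrary examples, not the MT-NLG production setup.
from itertools import product

tensor_parallel = 8     # GPUs that jointly hold one layer's weights
pipeline_parallel = 4   # sequential pipeline stages
data_parallel = 2       # replicas processing different mini-batches

world_size = tensor_parallel * pipeline_parallel * data_parallel  # 64 ranks

# Assign every rank a (data, pipeline, tensor) coordinate; ranks sharing a
# coordinate on two axes form the communication group for the third axis.
rank_to_coord = {}
for rank, (d, p, t) in enumerate(
    product(range(data_parallel), range(pipeline_parallel), range(tensor_parallel))
):
    rank_to_coord[rank] = {"data": d, "pipeline": p, "tensor": t}

print(f"world size: {world_size}")
print("rank 0  ->", rank_to_coord[0])
print("rank 63 ->", rank_to_coord[63])
```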
Posted Content

Deep Voice: Real-time Neural Text-to-Speech

TL;DR: Deep Voice comprises a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, and an audio synthesis model.
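As a purely structural illustration of the modular pipeline described in this TL;DR, the sketch below chains stub stages for grapheme-to-phoneme conversion, duration prediction, and audio synthesis (the segmentation model aligns training data and is omitted here). Every function is a hypothetical placeholder, not code from Deep Voice.

```python
# Skeletal sketch of the modular text-to-speech pipeline: text -> phonemes ->
# per-phoneme durations -> synthesized waveform. Each stage is a placeholder
# stub standing in for a trained model, not the actual Deep Voice components.
from dataclasses import dataclass

@dataclass
class Phoneme:
    symbol: str
    duration_ms: float = 0.0

def grapheme_to_phoneme(text: str) -> list[Phoneme]:
    # Stub: a trained G2P model would emit real phoneme symbols.
    return [Phoneme(symbol=ch) for ch in text.lower() if ch.isalpha()]

def predict_durations(phonemes: list[Phoneme]) -> list[Phoneme]:
    # Stub: a duration model would predict per-phoneme timing.
    return [Phoneme(p.symbol, duration_ms=80.0) for p in phonemes]

def synthesize_audio(phonemes: list[Phoneme], sample_rate: int = 16_000) -> list[float]:
    # Stub: a neural vocoder would generate real audio samples here.
    n_samples = int(sum(p.duration_ms for p in phonemes) / 1000.0 * sample_rate)
    return [0.0] * n_samples

phonemes = predict_durations(grapheme_to_phoneme("hello world"))
waveform = synthesize_audio(phonemes)
print(f"{len(phonemes)} phonemes -> {len(waveform)} audio samples")
```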