
David Grangier

Researcher at Google

Publications: 108
Citations: 17,040

David Grangier is an academic researcher at Google. He has contributed to research topics including machine translation and language modeling, has an h-index of 41, and has co-authored 103 publications receiving 12,411 citations. His previous affiliations include the Idiap Research Institute and Facebook.

Papers
Posted Content

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

TL;DR: fairseq is an open-source sequence modeling toolkit that lets researchers and developers train custom models for translation, summarization, language modeling, and other text generation tasks, and supports distributed training across multiple GPUs and machines.
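For readers who have not used the toolkit, the sketch below shows a typical entry point: loading a pretrained translation model through torch.hub, roughly as described in the fairseq README. The specific model name and checkpoint file are assumptions based on the public model zoo and may need to be adjusted.

```python
import torch

# Load a pretrained WMT'19 English-German transformer from the fairseq model zoo.
# Model and checkpoint names are illustrative; consult the fairseq README for the
# currently published entries.
en2de = torch.hub.load(
    'pytorch/fairseq',
    'transformer.wmt19.en-de',
    checkpoint_file='model1.pt',
    tokenizer='moses',
    bpe='fastbpe',
)
en2de.eval()

# Translate a sentence with beam search.
print(en2de.translate('Machine translation is fun!', beam=5))
```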
Proceedings Article

Convolutional Sequence to Sequence Learning

TL;DR: The authors introduce an architecture based entirely on convolutional neural networks, in which computations over all elements can be fully parallelized during training, and optimization is easier because the number of nonlinearities is fixed and independent of the input length.
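As a rough illustration of why such an architecture parallelizes over positions, here is a minimal sketch of one convolutional encoder block with a gated linear unit and a scaled residual connection, written in plain PyTorch; it follows the spirit of the paper rather than the authors' actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvEncoderBlock(nn.Module):
    """One encoder block in the spirit of convolutional seq2seq:
    a 1-D convolution over the whole sequence, a gated linear unit (GLU),
    and a residual connection. Because the block is a convolution rather
    than a recurrence, every position is computed in parallel."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # The conv outputs 2 * channels so GLU can split them into value and gate halves.
        self.conv = nn.Conv1d(channels, 2 * channels,
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, seq_len)
        residual = x
        x = F.glu(self.conv(x), dim=1)        # gated linear unit over the channel dim
        return (x + residual) * (0.5 ** 0.5)  # scaled residual, as in the paper

# Example: a batch of 2 sequences, 256 channels, 10 time steps.
block = ConvEncoderBlock(channels=256)
out = block(torch.randn(2, 256, 10))
print(out.shape)  # torch.Size([2, 256, 10])
```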
Proceedings ArticleDOI

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

TL;DR: Fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks and supports distributed training across multiple GPUs and machines.
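The distributed-training claim refers to standard data-parallel training in PyTorch. The sketch below is a generic DistributedDataParallel loop (not fairseq's own trainer), launched with torchrun and using a tiny linear model as a stand-in for a real sequence model, intended only to illustrate the pattern.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Launched with `torchrun --nproc_per_node=N train.py`; torchrun sets the env vars
    # that init_process_group and LOCAL_RANK rely on.
    dist.init_process_group(backend='nccl')
    rank = int(os.environ['LOCAL_RANK'])
    torch.cuda.set_device(rank)

    model = torch.nn.Linear(512, 512).cuda(rank)  # stand-in for a sequence model
    model = DDP(model, device_ids=[rank])

    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(10):
        x = torch.randn(32, 512, device=rank)
        loss = model(x).pow(2).mean()  # dummy loss for illustration
        opt.zero_grad()
        loss.backward()                # gradients are all-reduced across workers here
        opt.step()

    dist.destroy_process_group()

if __name__ == '__main__':
    main()
```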
Proceedings Article

Language modeling with gated convolutional networks

TL;DR: The authors develop a finite-context approach based on stacked convolutions, which can be more efficient because it allows parallelization over sequential tokens; this is the first time a non-recurrent approach has been competitive with strong recurrent models on large-scale language modeling tasks.
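A minimal sketch of the gating mechanism described here, assuming the gated-linear-unit form h(X) = (X*W + b) ⊗ σ(X*V + c) with causal left-padding; this is an illustrative PyTorch layer, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CausalGatedConv(nn.Module):
    """One gated convolutional layer for language modeling:
    h(X) = (X * W + b) * sigmoid(X * V + c).
    Left-padding keeps the convolution causal, so position t only sees
    tokens <= t, while all positions are still computed in parallel."""

    def __init__(self, channels: int, kernel_size: int = 4):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv_value = nn.Conv1d(channels, channels, kernel_size)
        self.conv_gate = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, seq_len); pad only on the left for causality.
        x = nn.functional.pad(x, (self.pad, 0))
        return self.conv_value(x) * torch.sigmoid(self.conv_gate(x))

# Example: embeddings for a batch of 2 sequences of length 16 with 128 channels.
layer = CausalGatedConv(channels=128)
print(layer(torch.randn(2, 128, 16)).shape)  # torch.Size([2, 128, 16])
```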