Guillaume Wenzek

Researcher at Facebook

Publications - 22

Citations - 5252

Guillaume Wenzek is a researcher at Facebook. The author has contributed to research on the topics of computer science and machine translation. The author has an h-index of 10 and has co-authored 18 publications receiving 2192 citations.

Papers

Proceedings Article

Unsupervised Cross-lingual Representation Learning at Scale

Alexis Conneau, +9 more

TL;DR: Pretraining multilingual language models at scale leads to significant performance gains on a wide range of cross-lingual transfer tasks. The paper also shows, for the first time, that multilingual modeling is possible without sacrificing per-language performance.

Posted Content

Unsupervised Cross-lingual Representation Learning at Scale

Alexis Conneau, +9 more

- 05 Nov 2019 -

arXiv: Computation and Language

TL;DR: This paper shows that pretraining multilingual language models at scale leads to significant performance gains on a wide range of cross-lingual transfer tasks. It trains a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data.

Posted Content

Beyond English-Centric Multilingual Machine Translation

Angela Fan, +16 more

- 21 Oct 2020 -

arXiv: Computation and Language

TL;DR: This work creates a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages and explores how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models.

Proceedings Article

CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data

Guillaume Wenzek, +6 more

TL;DR: This paper describes an automatic pipeline that extracts massive high-quality monolingual datasets from Common Crawl for a variety of languages. The pipeline follows the data processing introduced in fastText, which deduplicates documents and identifies their language.

Journal Article

No Language Left Behind: Scaling Human-Centered Machine Translation

NLLB Team, +38 more

- 11 Jul 2022 -

arXiv.org

TL;DR: This work develops a conditional compute model based on Sparsely Gated Mixture of Experts, trained on data obtained with novel and effective data mining techniques tailored for low-resource languages, laying important groundwork towards realizing a universal translation system.
