Vishrav Chaudhary

Researcher at Facebook

Publications - 60

Citations - 6667

Vishrav Chaudhary is an academic researcher at Facebook. The author has contributed to research on machine translation and computer science. The author has an h-index of 19 and has co-authored 43 publications receiving 2956 citations. Previous affiliations of Vishrav Chaudhary include the University of Wolverhampton.

Papers

Proceedings ArticleDOI

Unsupervised Cross-lingual Representation Learning at Scale

Alexis Conneau, +9 more

TL;DR: Pretraining multilingual language models at scale is shown to yield significant performance gains across a wide range of cross-lingual transfer tasks, and multilingual modeling without sacrificing per-language performance is demonstrated for the first time.

Posted Content

Unsupervised Cross-lingual Representation Learning at Scale

Alexis Conneau, +9 more

- 05 Nov 2019 -

arXiv: Computation and Language

TL;DR: This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and trains a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data.

Posted Content

Beyond English-Centric Multilingual Machine Translation

Angela Fan, +16 more

- 21 Oct 2020 -

arXiv: Computation and Language

TL;DR: This work creates a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages and explores how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models.

Proceedings Article

CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data

Guillaume Wenzek, +6 more

TL;DR: An automatic pipeline is described for extracting massive high-quality monolingual datasets from Common Crawl for a variety of languages. The pipeline follows the data processing introduced in fastText, which deduplicates documents and identifies their language.

Posted Content

WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

Holger Schwenk, +4 more

- 10 Jul 2019 -

arXiv: Computation and Language

TL;DR: An approach based on multilingual sentence embeddings is presented for automatically extracting parallel sentences from the content of Wikipedia articles in 96 languages, including several dialects or low-resource languages.
