Vishrav Chaudhary

Researcher at Facebook

Publications -  60
Citations -  6667

Vishrav Chaudhary is an academic researcher at Facebook who has contributed to research on machine translation and computer science. The author has an h-index of 19 and has co-authored 43 publications receiving 2956 citations. Previous affiliations of Vishrav Chaudhary include the University of Wolverhampton.

Papers
Proceedings Article

Unsupervised Cross-lingual Representation Learning at Scale

TL;DR: Pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and the work shows for the first time that multilingual modeling is possible without sacrificing per-language performance.
Posted Content

Unsupervised Cross-lingual Representation Learning at Scale

TL;DR: This paper showed that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks and proposed a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data.
Posted Content

Beyond English-Centric Multilingual Machine Translation

TL;DR: This work creates a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages and explores how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models.
Proceedings Article

CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data

TL;DR: An automatic pipeline extracts massive high-quality monolingual datasets from Common Crawl for a variety of languages. It follows the data processing introduced in fastText, which deduplicates documents and identifies their language.
Posted Content

WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

TL;DR: An approach based on multilingual sentence embeddings is presented that automatically extracts parallel sentences from the content of Wikipedia articles in 96 languages, including several dialects or low-resource languages.