Vishrav Chaudhary
Researcher at Facebook
Publications - 60
Citations - 6667
Vishrav Chaudhary is an academic researcher at Facebook. The author has contributed to research on machine translation and computer science, has an h-index of 19, and has co-authored 43 publications receiving 2956 citations. Previous affiliations of Vishrav Chaudhary include the University of Wolverhampton.
Papers
Proceedings ArticleDOI
Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau,Kartikay Khandelwal,Naman Goyal,Vishrav Chaudhary,Guillaume Wenzek,Francisco Guzmán,Edouard Grave,Myle Ott,Luke Zettlemoyer,Veselin Stoyanov +9 more
TL;DR: This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks, and demonstrates for the first time that multilingual modeling is possible without sacrificing per-language performance.
Posted Content
Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau,Kartikay Khandelwal,Naman Goyal,Vishrav Chaudhary,Guillaume Wenzek,Francisco Guzmán,Edouard Grave,Myle Ott,Luke Zettlemoyer,Veselin Stoyanov +9 more
TL;DR: This paper showed that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks and proposed a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data.
Posted Content
Beyond English-Centric Multilingual Machine Translation
Angela Fan,Shruti Bhosale,Holger Schwenk,Zhiyi Ma,Ahmed El-Kishky,Siddharth Goyal,Mandeep Baines,Onur Celebi,Guillaume Wenzek,Vishrav Chaudhary,Naman Goyal,Tom Birch,Vitaliy Liptchinsky,Sergey Edunov,Edouard Grave,Michael Auli,Armand Joulin +16 more
TL;DR: This work creates a true Many-to-Many multilingual translation model that can translate directly between any pair of 100 languages and explores how to effectively increase model capacity through a combination of dense scaling and language-specific sparse parameters to create high quality models.
Proceedings Article
CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data
Guillaume Wenzek,Marie-Anne Lachaux,Alexis Conneau,Vishrav Chaudhary,Francisco Guzmán,Armand Joulin,Edouard Grave +6 more
TL;DR: This paper describes an automatic pipeline for extracting massive high-quality monolingual datasets from Common Crawl for a variety of languages. The pipeline follows the data processing introduced in fastText, deduplicating documents and identifying their language.
Posted Content
WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia
TL;DR: This paper presents an approach based on multilingual sentence embeddings that automatically extracts parallel sentences from the content of Wikipedia articles in 96 languages, including several dialects or low-resource languages.