Elahe Kalbassi
Publications - 4
Citations - 198
Elahe Kalbassi is an academic researcher. The author has contributed to research in topics: Computer science. The author has an h-index of 1 and has co-authored 4 publications receiving 198 citations.
Papers
Journal Article
No Language Left Behind: Scaling Human-Centered Machine Translation
NLLB Team,Marta R. Costa-jussà,James Cross,Onur Çelebi,Maha Elbayad,Kenneth Heafield,Kevin Heffernan,Elahe Kalbassi,Janice Si-Man Lam,Daniel Licht,Jean Maillard,Anna Sun,Skyler Wang,Guillaume Wenzek,Alison Youngblood,Bapi Akula,Loïc Barrault,Gabriel Mejia Gonzalez,Prangthip Hansanti,John Hoffman,Semarley Jarrett,Kaushik Ram Sadagopan,Dirk Rowe,Shannon Spruit,Chau Tran,Pierre Andrews,Necip Fazil Ayan,Shruti Bhosale,Sergey Edunov,Angela Fan,Cynthia Gao,Vedanuj Goswami,Francisco Guzmán,Philipp Koehn,Alexandre Mourachko,Christophe Ropers,Safiyyah Saleem,Holger Schwenk,Jeff Wang +38 more
TL;DR: A conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages is developed, laying important groundwork towards realizing a universal translation system.
Journal Article
HalOmi: A Manually Annotated Benchmark for Multilingual Hallucination and Omission Detection in Machine Translation
David Dale,Elena Voita,Janice Si-Man Lam,Prangthip Hansanti,Christophe Ropers,Elahe Kalbassi,Cynthia Gao,Loïc Barrault,Marta R. Costa-jussà +8 more
TL;DR: The authors released an annotated dataset for the hallucination and omission phenomena covering 18 translation directions with varying resource levels and scripts. The annotations distinguish different levels of partial and full hallucinations as well as omissions, at both the sentence and the word level.
Proceedings Article
Small Data, Big Impact: Leveraging Minimal Data for Effective Machine Translation
Jean Maillard,Cynthia Gao,Elahe Kalbassi,Kaushik Ram Sadagopan,Vedanuj Goswami,Philipp Koehn,Angela Fan,Francisco Guzmán +7 more
TL;DR: The authors describe a broad data collection effort involving around 6k professionally translated sentence pairs for each of 39 low-resource languages, which they make publicly available. They analyse the gains of models trained on this small but high-quality data, showing that it has a significant impact even when larger but lower-quality pre-existing corpora are used, or when data is augmented with millions of sentences through backtranslation.
Journal Article
Multilingual Holistic Bias: Extending Descriptors and Patterns to Unveil Demographic Biases in Languages at Scale
Marta R. Costa-jussà,Pierre Andrews,Eric A. Smith,Prangthip Hansanti,Christophe Ropers,Elahe Kalbassi,Cynthia Gao,Daniel Licht +7 more
TL;DR: This article introduced a multilingual extension of the HOLISTICBIAS dataset, the largest English template-based taxonomy of textual people references. The extension consists of 20,459 sentences in 50 languages distributed across all 13 demographic axes.