Allahsera Auguste Tapo
Researcher at Rochester Institute of Technology
Publications - 8
Citations - 101
Allahsera Auguste Tapo is an academic researcher at Rochester Institute of Technology. The author has contributed to research on the topics of computer science and machine translation. The author has an h-index of 3 and has co-authored 5 publications receiving 26 citations.
Papers
Proceedings Article
A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation
David Ifeoluwa Adelani,Jesujoba O. Alabi,Angela Fan,Julia Kreutzer,Xiaoyu Shen,Machel Reid,Dana Ruiter,Dietrich Klakow,Peter Nabende,Ernie Chang,Tajuddeen R. Gwadabe,Freshia Sackey,Bonaventure F. P. Dossou,Chris Chinenye Emezue,Colin D. Leong,Michael Beukman,Shamsuddeen Hassan Muhammad,Guyo Dub Jarso,Oreen Yousuf,Rubungo Andre Niyongabo,Gilles Hacheme,Eric Peter Wairagala,Muhammad Umair Nasir,Benjamin Ayoade Ajibade,Tunde Ajayi,Yvonne Wambui Gitau,Jade Abbott,Mohamed Ahmed,Millicent A. Ochieng,Anuoluwapo Aremu,Perez Ogayo,Jonathan Mukiibi,Fatoumata Kabore,Godson Kalipe,Derguene Mbaye,Allahsera Auguste Tapo,V. M. Koagne,Edwin Munkoh-Buabeng,Valencia K. Wagner,Idris Abdulmumin,Ayodele Awokoya,Happy Buzaaba,Blessing Sibanda,Andiswa Bukula,Sam Manthalu +44 more
TL;DR: It is demonstrated that the most effective strategy for transferring to both additional languages and additional domains is to leverage small quantities of high-quality translation data to fine-tune large pre-trained models.
Posted Content
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Isaac Caswell,Julia Kreutzer,Lisa Wang,Ahsan Wahab,Daan van Esch,Nasanbayar Ulzii-Orshikh,Allahsera Auguste Tapo,Nishant Subramani,Artem Sokolov,Claytone Sikasote,Monang Setyawan,Supheakmungkol Sarin,Sokhar Samb,Benoît Sagot,Clara E. Rivera,Annette Rios,Isabel Papadimitriou,Salomey Osei,Pedro Javier Ortiz Suárez,Iroro Orife,Kelechi Ogueji,Rubungo Andre Niyongabo,Toan Q. Nguyen,Mathias Müller,André Müller,Shamsuddeen Hassan Muhammad,Nanda Muhammad,Ayanda Mnyakeni,Jamshidbek Mirzakhalov,Tapiwanashe Matangira,Colin Leong,Nze Lawson,Sneha Kudugunta,Yacine Jernite,Mathias Jenny,Orhan Firat,Bonaventure F. P. Dossou,Sakhile Dlamini,Nisansa de Silva,Sakine Çabuk Ballı,Stella Biderman,Alessia Battisti,Ahmed Baruwa,Ankur Bapna,Pallavi Baljekar,Israel Abebe Azime,Ayodele Awokoya,Duygu Ataman,Orevaoghene Ahia,Oghenefego Ahia,Sweta Agrawal,Mofetoluwa Adeyemi +51 more
TL;DR: In this paper, the authors manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4) and audit the correctness of language codes in a sixth (JW300).
Proceedings Article
MasakhaNER 2.0: Africa-centric Transfer Learning for Named Entity Recognition
David Ifeoluwa Adelani,Graham Neubig,Sebastian Ruder,Shruti Rijhwani,Michael Beukman,Chester Palen-Michel,Constantine Lignos,Jesujoba O. Alabi,Shamsuddeen Hassan Muhammad,Peter Nabende,Cheikh M. Bamba Dione,Andiswa Bukula,Rooweither Mabuya,Bonaventure F. P. Dossou,Blessing Sibanda,Happy Buzaaba,Jonathan Mukiibi,Godson Kalipe,Derguene Mbaye,Amelia V. Taylor,Fatoumata Kabore,Chris Chinenye Emezue,Anuoluwapo Aremu,Perez Ogayo,Catherine W. Gitau,Edwin Munkoh-Buabeng,V. M. Koagne,Allahsera Auguste Tapo,Tebogo Macucwa,Vukosi Marivate,Elvis Mboning,Tajuddeen R. Gwadabe,Tosin P. Adewumi,Orevaoghene Ahia,Joyce Nakatumba-Nabende,Neo L. Mokono,Ignatius Ezeani,C. I. Chukwuneke,Mofe Adeyemi,Gilles Hacheme,Idris Abdulmumin,O. Ogundepo,Oreen Yousuf,Tatiana Moteu Ngoli,Dietrich Klakow +44 more
TL;DR: This paper creates the largest human-annotated NER dataset for 20 African languages, and studies the behavior of state-of-the-art cross-lingual transfer methods in an Africa-centric setting, demonstrating that the choice of source language significantly affects performance.
Posted Content
Neural Machine Translation for Extremely Low-Resource African Languages: A Case Study on Bambara
Allahsera Auguste Tapo,Bakary Coulibaly,Sébastien Diarra,Christopher M. Homan,Julia Kreutzer,Sarah Luger,Arthur Nagashima,Marcos Zampieri,Michael Leventhal +8 more
TL;DR: The first parallel data set for machine translation of Bambara into and from English and French, and the first benchmark results on machine translation to and from Bambara, are presented.
Posted Content
Assessing Human Translations from French to Bambara for Machine Learning: a Pilot Study
TL;DR: Novel methods are presented for assessing the quality of human-translated aligned texts used to train machine translation models for under-resourced languages, and it is suggested that similar quality can be obtained from either written or spoken translations for certain kinds of texts.