Bonaventure F. P. Dossou
Researcher at Jacobs University Bremen
Publications - 30
Citations - 355
Bonaventure F. P. Dossou is an academic researcher from Jacobs University Bremen. The author has contributed to research in topics: Computer science & Languages of Africa. The author has an h-index of 4 and has co-authored 14 publications receiving 90 citations.
Papers
Proceedings Article
Participatory Research for Low-resourced Machine Translation: A Case Study in African Languages
Wilhelmina Nekoto,Vukosi Marivate,Tshinondiwa Matsila,Timi E. Fasubaa,Tajudeen Kolawole,Taiwo Fagbohungbe,Solomon Oluwole Akinola,Shamsuddeen Hassan Muhammad,Salomon Kabongo,Salomey Osei,Sackey Freshia,Rubungo Andre Niyongabo,Ricky Macharm,Perez Ogayo,Orevaoghene Ahia,Musie Meressa,Mofe Adeyemi,Masabata Mokgesi-Selinga,Lawrence Okegbemi,Laura Martinus,Kolawole Tajudeen,Kevin Degila,Kelechi Ogueji,Kathleen Siminyu,Julia Kreutzer,Jason Webster,Jamiil Toure Ali,Jade Abbott,Iroro Orife,Ignatius Ezeani,Idris Abdulkabir Dangana,Herman Kamper,Hady Elsahar,Goodness Duru,Ghollah Kioko,Espoir Murhabazi,Elan van Biljon,Daniel Whitenack,Christopher Onyefuluchi,Chris Chinenye Emezue,Bonaventure F. P. Dossou,Blessing Sibanda,Blessing Itoro Bassey,Ayodele Olabiyi,Arshath Ramkilowan,Alp Öktem,Adewale Akinfaderin,Abdallah Bashir +47 more
TL;DR: The feasibility and scalability of participatory research is demonstrated with a case study on MT for African languages, which leads to a collection of novel translation datasets, MT benchmarks for over 30 languages, with human evaluations for a third of them, and enables participants without formal training to make a unique scientific contribution.
Proceedings Article
Biological Sequence Design with GFlowNets
Moksh Jain,Emmanuel Bengio,Alejandro Hernández-García,Jarrid Rector-Brooks,Bonaventure F. P. Dossou,Chanakya Ajit Ekbote,Jie Fu,Micheal Kilgour,Dinghuai Zhang,Lena Simine,Payel Das,Yoshua Bengio +11 more
TL;DR: This work proposes an active learning algorithm leveraging epistemic uncertainty estimation and the recently proposed GFlowNets as a generator of diverse candidate solutions, with the objective of obtaining a diverse batch of useful, novel, high-scoring candidates after each round.
Posted Content
Masakhane - Machine Translation For Africa.
Iroro Orife,Julia Kreutzer,Blessing Sibanda,Daniel Whitenack,Kathleen Siminyu,Laura Martinus,Jamiil Toure Ali,Jade Abbott,Vukosi Marivate,Salomon Kabongo,Musie Meressa,Espoir Murhabazi,Orevaoghene Ahia,Elan van Biljon,Arshath Ramkilowan,Adewale Akinfaderin,Alp Öktem,Wole Akin,Ghollah Kioko,Kevin Degila,Herman Kamper,Bonaventure F. P. Dossou,Chris Chinenye Emezue,Kelechi Ogueji,Abdallah Bashir +24 more
TL;DR: This paper discusses the methodology for building the community and spurring research from the African continent, as well as the community's success in addressing the identified problems affecting African NLP.
Proceedings Article
A Few Thousand Translations Go a Long Way! Leveraging Pre-trained Models for African News Translation
David Ifeoluwa Adelani,Jesujoba O. Alabi,Angela Fan,Julia Kreutzer,Xiaoyu Shen,Machel Reid,Dana Ruiter,Dietrich Klakow,Peter Nabende,Ernie Chang,Tajuddeen R. Gwadabe,Freshia Sackey,Bonaventure F. P. Dossou,Chris Chinenye Emezue,Colin D. Leong,Michael Beukman,Shamsuddeen Hassan Muhammad,Guyo Dub Jarso,Oreen Yousuf,Rubungo Andre Niyongabo,Gilles Hacheme,Eric Peter Wairagala,Muhammad Umair Nasir,Benjamin Ayoade Ajibade,Tunde Ajayi,Yvonne Wambui Gitau,Jade Abbott,Mohamed Ahmed,Millicent A. Ochieng,Anuoluwapo Aremu,Perez Ogayo,Jonathan Mukiibi,Fatoumata Kabore,Godson Kalipe,Derguene Mbaye,Allahsera Auguste Tapo,V. M. Koagne,Edwin Munkoh-Buabeng,Valencia K. Wagner,Idris Abdulmumin,Ayodele Awokoya,Happy Buzaaba,Blessing Sibanda,Andiswa Bukula,Sam Manthalu +44 more
TL;DR: It is demonstrated that the most effective strategy for transferring both additional languages and additional domains is to leverage small quantities of high-quality translation data to fine-tune large pre-trained models.
Posted Content
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Isaac Caswell,Julia Kreutzer,Lisa Wang,Ahsan Wahab,Daan van Esch,Nasanbayar Ulzii-Orshikh,Allahsera Auguste Tapo,Nishant Subramani,Artem Sokolov,Claytone Sikasote,Monang Setyawan,Supheakmungkol Sarin,Sokhar Samb,Benoît Sagot,Clara E. Rivera,Annette Rios,Isabel Papadimitriou,Salomey Osei,Pedro Javier Ortiz Suárez,Iroro Orife,Kelechi Ogueji,Rubungo Andre Niyongabo,Toan Q. Nguyen,Mathias Müller,André Müller,Shamsuddeen Hassan Muhammad,Nanda Muhammad,Ayanda Mnyakeni,Jamshidbek Mirzakhalov,Tapiwanashe Matangira,Colin Leong,Nze Lawson,Sneha Kudugunta,Yacine Jernite,Mathias Jenny,Orhan Firat,Bonaventure F. P. Dossou,Sakhile Dlamini,Nisansa de Silva,Sakine Çabuk Ballı,Stella Biderman,Alessia Battisti,Ahmed Baruwa,Ankur Bapna,Pallavi Baljekar,Israel Abebe Azime,Ayodele Awokoya,Duygu Ataman,Orevaoghene Ahia,Oghenefego Ahia,Sweta Agrawal,Mofetoluwa Adeyemi +51 more
TL;DR: In this paper, the authors manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4) and audit the correctness of language codes in a sixth (JW300).