Monang Setyawan

Researcher at Google

Publications -  1
Citations -  24

Monang Setyawan is an academic researcher from Google. The author has contributed to research in topics: Audit & Natural language processing. The author has an h-index of 1 and has co-authored 1 publication receiving 17 citations.

Papers
Posted Content

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets

TL;DR: In this paper, the authors manually audit the quality of 205 language-specific corpora released with five major public datasets (CCAligned, ParaCrawl, WikiMatrix, OSCAR, mC4) and audit the correctness of language codes in a sixth (JW300).