Open Access Proceedings Article

Gender and Dialect Bias in YouTube’s Automatic Captions

Abstract
This project evaluates the accuracy of YouTube’s automatically generated captions across two genders and five dialect groups. Speakers’ dialect and gender were controlled for by using videos uploaded as part of the “accent tag challenge”, in which speakers explicitly identify their language background. The results show robust differences in accuracy across both gender and dialect, with lower accuracy for 1) women and 2) speakers from Scotland. This result builds on earlier research showing that a speaker’s sociolinguistic identity may negatively affect their ability to use automatic speech recognition, and it demonstrates the need for sociolinguistically stratified validation of systems.
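
The abstract does not state how caption accuracy was scored. A common choice for speech recognition output is word error rate (WER): the word-level edit distance between a human reference transcript and the automatic caption, divided by the length of the reference. The following is a minimal sketch of a stratified evaluation under that assumption. The function names, the input layout, and the group labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: WER is the accuracy metric; the abstract does not say so.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words
    # processed so far (before the current one) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra caption word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def mean_wer_by_group(samples):
    """Average WER per group.

    `samples` is an iterable of (group, reference, hypothesis) tuples,
    where `group` is any hashable label such as a dialect or gender tag.
    """
    scores = defaultdict(list)
    for group, reference, hypothesis in samples:
        scores[group].append(word_error_rate(reference, hypothesis))
    return {group: sum(vals) / len(vals) for group, vals in scores.items()}
```

Reporting one score per group, rather than a single pooled score, is what makes a gap between groups visible: a system can have a low overall error rate while performing much worse for one dialect or gender.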

