Proceedings ArticleDOI

Active learning for automatic speech recognition

TL;DR
A new method for reducing the transcription effort for training in automatic speech recognition (ASR) that automatically estimates a confidence score for each word of an utterance by exploiting the lattice output of a speech recognizer trained on a small set of transcribed data.
Abstract
State-of-the-art speech recognition systems are trained using transcribed utterances, preparation of which is labor intensive and time-consuming. In this paper, we describe a new method for reducing the transcription effort for training in automatic speech recognition (ASR). Active learning aims at reducing the number of training examples to be labeled by automatically processing the unlabeled examples, and then selecting the most informative ones with respect to a given cost function for a human to label. We automatically estimate a confidence score for each word of the utterance, exploiting the lattice output of a speech recognizer, which was trained on a small set of transcribed data. We compute utterance confidence scores based on these word confidence scores, then selectively sample the utterances to be transcribed using the utterance confidence scores. In our experiments, we show that we reduce the amount of labeled data needed for a given word accuracy by 27%.
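The selection loop the abstract describes is easy to make concrete. Below is a minimal Python sketch of confidence-based selective sampling, assuming per-word posteriors from a lattice-based confidence estimator; the mean-of-word-scores aggregation, the function names, and the numbers are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of confidence-based selective sampling for ASR transcription.
# Word posteriors are assumed to come from a lattice-based confidence
# estimator; averaging them into an utterance score is one simple choice.

def utterance_confidence(word_confidences):
    """Aggregate per-word confidence scores into one utterance score."""
    return sum(word_confidences) / len(word_confidences)

def select_for_transcription(utterances, k):
    """Pick the k least-confident utterances for human transcription.

    `utterances` is a list of (utterance_id, [word_confidence, ...]) pairs
    produced by a recognizer trained on a small seed corpus.
    """
    scored = [(utterance_confidence(scores), uid) for uid, scores in utterances]
    scored.sort()                      # lowest confidence = most informative
    return [uid for _, uid in scored[:k]]

# Example: three recognized utterances with per-word posterior estimates.
pool = [
    ("utt1", [0.95, 0.91, 0.88]),
    ("utt2", [0.40, 0.35, 0.60]),    # low confidence -> likely misrecognized
    ("utt3", [0.80, 0.75, 0.85]),
]
print(select_for_transcription(pool, k=1))   # -> ['utt2']
```

The transcribed selections are then added to the training set and the recognizer is retrained, repeating until the labeling budget is exhausted.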


Citations
Journal ArticleDOI

Machine Learning Paradigms for Speech Recognition: An Overview

TL;DR: This article surveys modern ML techniques as utilized in current ASR research and systems and as relevant to future ASR research, and presents and analyzes recent developments in deep learning and learning with sparse representations.
PatentDOI

Combining active and semi-supervised learning for spoken language understanding

TL;DR: This paper combines active and semi-supervised learning to reduce the amount of manual labeling needed when training a spoken language understanding classifier on human-labeled utterance data.
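As a rough illustration of how such a combination can work, the sketch below routes low-confidence utterances to a human labeler (the active-learning half) and keeps confident machine labels as-is (the semi-supervised half); the classifier interface and the 0.8 threshold are assumptions for illustration, not the cited paper's exact procedure.

```python
# Hypothetical sketch of combining active and semi-supervised learning:
# uncertain examples are queried from a human, confident ones keep their
# machine-assigned labels.

def split_pool(examples, classifier, threshold=0.8):
    """Route each unlabeled example to human or machine labeling."""
    to_human, machine_labeled = [], []
    for x in examples:
        label, confidence = classifier(x)        # assumed (label, prob) interface
        if confidence < threshold:
            to_human.append(x)                   # active learning: ask a human
        else:
            machine_labeled.append((x, label))   # semi-supervised: trust model
    return to_human, machine_labeled
```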
Journal ArticleDOI

Active learning in multimedia annotation and retrieval: A survey

TL;DR: A survey of efforts to leverage active learning in multimedia annotation and retrieval, covering its combination with semi-supervised learning, multi-label learning, and multiple-instance learning; it focuses on two application domains: image/video annotation and content-based image retrieval.
Proceedings ArticleDOI

Libri-Light: A Benchmark for ASR with Limited or No Supervision

TL;DR: In this article, the authors introduce a new collection of spoken English audio, derived from open-source audiobooks from the LibriVox project, suitable for training speech recognition systems under limited or no supervision.
Posted Content

Building a Conversational Agent Overnight with Dialogue Self-Play

TL;DR: A new corpus of 3,000 dialogues spanning two domains, collected with M2M, is proposed, along with comparisons to popular dialogue datasets on the quality and diversity of the surface forms and dialogue flows.
References
Journal ArticleDOI

Improving Generalization with Active Learning

TL;DR: A formalism for active concept learning called selective sampling is described, and it is shown how it may be approximately implemented by a neural network.
Book ChapterDOI

Heterogenous uncertainty sampling for supervised learning

TL;DR: This work tests the use of one classifier (a highly efficient probabilistic one) to select examples for training another (the C4.5 rule induction program) and finds that the uncertainty samples yielded classifiers with lower error rates than random samples ten times larger.
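A rough sketch of that setup, under assumed interfaces: a cheap probabilistic model scores the pool, the examples it is least certain about (probability nearest 0.5 for a binary task) go to an oracle for labels, and a different, more expensive learner trains on the result.

```python
# Sketch of heterogeneous uncertainty sampling: a cheap probabilistic model
# selects examples, a human oracle labels them, and an expensive learner
# (e.g. a C4.5-style rule inducer) trains on them. All interfaces here are
# illustrative assumptions.

def heterogeneous_uncertainty_sampling(pool, cheap_prob_model, oracle,
                                       train_expensive, k):
    # For a binary probabilistic model, uncertainty peaks at p = 0.5.
    by_uncertainty = sorted(pool, key=lambda x: abs(cheap_prob_model(x) - 0.5))
    labeled = [(x, oracle(x)) for x in by_uncertainty[:k]]
    return train_expensive(labeled)
```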
Journal ArticleDOI

Finding consensus in speech recognition: word error minimization and other applications of confusion networks

TL;DR: A new framework for distilling information from word lattices is described to improve the accuracy of the speech recognition output and obtain a more perspicuous representation of a set of alternative hypotheses.
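A confusion network collapses a lattice into a sequence of word "bins" with posterior probabilities, and the consensus hypothesis takes the top word in each bin, minimizing expected word error rather than sentence error. The sketch below assumes the network is already given as a list of word-to-posterior dictionaries; the "<eps>" empty-word convention is an illustrative assumption.

```python
# Illustrative sketch of consensus decoding over a confusion network:
# pick the highest-posterior word in each bin, skipping empty-word arcs.

def consensus_hypothesis(confusion_network):
    """Return (words, per-word posteriors) for the consensus path."""
    words, confidences = [], []
    for bin_posteriors in confusion_network:       # dict: word -> posterior
        word, p = max(bin_posteriors.items(), key=lambda kv: kv[1])
        if word != "<eps>":                        # skip the empty-word arc
            words.append(word)
            confidences.append(p)
    return words, confidences

cn = [{"the": 0.9, "a": 0.1},
      {"cat": 0.5, "hat": 0.4, "<eps>": 0.1}]
print(consensus_hypothesis(cn))   # -> (['the', 'cat'], [0.9, 0.5])
```

The per-bin posteriors double as word confidence scores, which is how lattice-derived confidences of this kind feed the utterance-level selection described in the main paper.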
Book ChapterDOI

Committee-based sampling for training probabilistic classifiers

TL;DR: A general method for efficiently training probabilistic classifiers, by selecting for training only the more informative examples in a stream of unlabeled examples, which is particularly attractive because it evaluates the expected information gain from a training example implicitly.
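A minimal sketch of the committee idea, assuming a committee of trained models with a shared prediction interface: disagreement is measured here with vote entropy, one common choice; the interface and scoring rule are assumptions, not the cited paper's exact procedure.

```python
# Sketch of committee-based sampling: several models vote on each unlabeled
# example, and the examples with the most disagreement (highest vote entropy)
# are queried for human labels.

import math
from collections import Counter

def vote_entropy(votes):
    """Entropy of the committee's label votes; higher = more disagreement."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def most_informative(examples, committee, k):
    """Return the k examples the committee disagrees on most."""
    scored = [(vote_entropy([model(x) for model in committee]), x)
              for x in examples]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [x for _, x in scored[:k]]
```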
Proceedings Article

Active Learning for Natural Language Parsing and Information Extraction

TL;DR: It is shown that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance for these complex tasks: semantic parsing and information extraction.