Proceedings ArticleDOI
Active learning for automatic speech recognition
Dilek Hakkani-Tur, Giuseppe Riccardi, Allen Louis Gorin +2 more
Vol. 4, pp. 3904–3907
TLDR
A new method for reducing the transcription effort for training in automatic speech recognition (ASR), which automatically estimates a confidence score for each word of an utterance by exploiting the lattice output of a speech recognizer trained on a small set of transcribed data.
Abstract
State-of-the-art speech recognition systems are trained using transcribed utterances, preparation of which is labor intensive and time-consuming. In this paper, we describe a new method for reducing the transcription effort for training in automatic speech recognition (ASR). Active learning aims at reducing the number of training examples to be labeled by automatically processing the unlabeled examples, and then selecting the most informative ones with respect to a given cost function for a human to label. We automatically estimate a confidence score for each word of the utterance, exploiting the lattice output of a speech recognizer, which was trained on a small set of transcribed data. We compute utterance confidence scores based on these word confidence scores, then selectively sample the utterances to be transcribed using the utterance confidence scores. In our experiments, we show that we reduce the amount of labeled data needed for a given word accuracy by 27%.
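The selection step described in the abstract can be sketched as follows. The per-word confidence scores are assumed to be precomputed from the recognizer's lattices, the mean is used as the utterance-level score, and all function and variable names are illustrative rather than taken from the paper:

```python
def utterance_confidence(word_confidences):
    """Mean of the per-word confidence scores for one utterance."""
    return sum(word_confidences) / len(word_confidences)

def select_for_transcription(utterances, k):
    """Pick the k utterances the recognizer is least sure about.

    `utterances` maps an utterance id to its list of word confidence
    scores (assumed already extracted from the recognizer's lattices).
    """
    scored = sorted(utterances.items(),
                    key=lambda item: utterance_confidence(item[1]))
    return [utt_id for utt_id, _ in scored[:k]]

batch = {
    "utt1": [0.90, 0.95, 0.85],  # recognizer fairly confident
    "utt2": [0.30, 0.50, 0.40],  # low confidence -> informative
    "utt3": [0.60, 0.70, 0.65],
}
print(select_for_transcription(batch, 2))  # → ['utt2', 'utt3']
```

The selected utterances would go to a human transcriber, while the rest stay unlabeled, which is how the reported 27% labeling reduction is achieved at a fixed word accuracy.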
Citations
Journal ArticleDOI
Machine Learning Paradigms for Speech Recognition: An Overview
TL;DR: This overview article provides readers with a survey of modern ML techniques as utilized in current ASR research and systems and as relevant to future ASR research, and presents and analyzes recent developments in deep learning and learning with sparse representations.
PatentDOI
Combining active and semi-supervised learning for spoken language understanding
TL;DR: This paper combined active and semi-supervised learning to reduce the amount of manual labeling needed when training a spoken language understanding classifier on human-labeled utterance data.
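A minimal sketch of the combination idea, assuming a classifier that reports a confidence with each prediction; the 0.8 threshold and the tuple layout are illustrative assumptions, not the patent's exact procedure:

```python
def split_by_confidence(examples, threshold=0.8):
    """Combine active and semi-supervised learning: route each
    (example, predicted_label, confidence) triple to a human
    annotator when the classifier is unsure (active learning), or
    keep the machine label when it is confident (semi-supervised).
    """
    to_human, machine_labeled = [], []
    for x, label, conf in examples:
        if conf < threshold:
            to_human.append(x)                  # human labels the hard cases
        else:
            machine_labeled.append((x, label))  # reuse the model's own label
    return to_human, machine_labeled

preds = [("call me back", "request", 0.95), ("uh hm", "other", 0.40)]
print(split_by_confidence(preds))
# → (['uh hm'], [('call me back', 'request')])
```

Only the low-confidence bucket costs human effort, while the high-confidence bucket still grows the training set, which is why the combination cuts manual labeling.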
Journal ArticleDOI
Active learning in multimedia annotation and retrieval: A survey
Meng Wang, Xian-Sheng Hua +1 more
TL;DR: A survey on the efforts of leveraging active learning in multimedia annotation and retrieval, including semi-supervised learning, multilabel learning and multiple instance learning, focuses on two application domains: image/video annotation and content-based image retrieval.
Proceedings ArticleDOI
Libri-Light: A Benchmark for ASR with Limited or No Supervision
Jacob Kahn, Morgane Riviere, Weiyi Zheng, Eugene Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdelrahman Mohamed, Emmanuel Dupoux +14 more
TL;DR: In this article, the authors introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision, which is derived from open-source audio books from the LibriVox project.
Posted Content
Building a Conversational Agent Overnight with Dialogue Self-Play
TL;DR: A new corpus of 3,000 dialogues spanning 2 domains collected with M2M is proposed, and comparisons with popular dialogue datasets on the quality and diversity of the surface forms and dialogue flows are presented.
References
Journal ArticleDOI
Improving Generalization with Active Learning
TL;DR: A formalism for active concept learning called selective sampling is described and it is shown how it may be approximately implemented by a neural network.
Book ChapterDOI
Heterogenous uncertainty sampling for supervised learning
David D. Lewis, Jason A. Catlett +1 more
TL;DR: This work tests the use of one classifier (a highly efficient probabilistic one) to select examples for training another (the C4.5 rule induction program) and finds that the uncertainty samples yielded classifiers with lower error rates than random samples ten times larger.
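The uncertainty-sampling step can be sketched as below; `predict_proba` stands in for the cheap probabilistic classifier's positive-class probability and is an assumed interface, not Lewis and Catlett's code:

```python
def uncertainty_sample(pool, predict_proba, k):
    """Return the k unlabeled examples whose predicted positive-class
    probability is closest to 0.5, i.e. the ones the cheap classifier
    is least certain about; these would then be labeled and used to
    train the expensive learner (C4.5 in the paper).
    """
    return sorted(pool, key=lambda x: abs(predict_proba(x) - 0.5))[:k]

# Toy pool where each example *is* its predicted probability.
pool = [0.05, 0.45, 0.92, 0.55, 0.70]
print(uncertainty_sample(pool, lambda p: p, 2))  # → [0.45, 0.55]
```

This is the "heterogeneous" part of the title: a fast model ranks the pool so the slow model only ever trains on the informative slice.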
Journal ArticleDOI
Finding consensus in speech recognition: word error minimization and other applications of confusion networks
TL;DR: A new framework for distilling information from word lattices is described to improve the accuracy of the speech recognition output and obtain a more perspicuous representation of a set of alternative hypotheses.
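A highly simplified sketch of the consensus decoding step, assuming the lattice has already been collapsed into alignment slots of word posteriors; the slot format is an illustration, not the paper's confusion-network construction:

```python
def consensus_hypothesis(confusion_network):
    """In each alignment slot, keep the word with the highest
    posterior probability; choosing slot-wise maxima minimizes the
    expected word error rather than the sentence error.
    """
    return [max(slot, key=slot.get) for slot in confusion_network]

cn = [
    {"i": 0.9, "eye": 0.1},
    {"veto": 0.4, "need": 0.6},
    {"to": 0.7, "two": 0.3},
]
print(consensus_hypothesis(cn))  # → ['i', 'need', 'to']
```

Note how the consensus string can differ from every single path in the original lattice, which is exactly what makes confusion networks a more perspicuous representation of the hypothesis set.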
Book ChapterDOI
Committee-based sampling for training probabilistic classifiers
Ido Dagan, Sean P. Engelson +1 more
TL;DR: A general method for efficiently training probabilistic classifiers, by selecting for training only the more informative examples in a stream of unlabeled examples, which is particularly attractive because it evaluates the expected information gain from a training example implicitly.
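A sketch of the disagreement measure behind committee-based (query-by-committee) sampling; vote entropy is one common instantiation of the idea, and the helper name is illustrative:

```python
import math

def vote_entropy(votes):
    """Entropy of the committee's empirical vote distribution over
    labels: zero when all members agree, maximal when the votes are
    evenly split, so higher values mark more informative examples.
    """
    n = len(votes)
    counts = {}
    for v in votes:
        counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * math.log(c / n) for c in counts.values())

agree = vote_entropy(["noun", "noun", "noun"])
split = vote_entropy(["noun", "verb"])
print(agree == 0.0, split > agree)  # → True True
```

Examples where the committee disagrees most are selected for labeling, which implicitly targets the examples with the highest expected information gain, as the TL;DR notes.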
Proceedings Article
Active Learning for Natural Language Parsing and Information Extraction
TL;DR: It is shown that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance for these complex tasks: semantic parsing and information extraction.