Topic

Word error rate

About: Word error rate is a research topic. Over the lifetime, 11939 publications have been published within this topic receiving 298031 citations.

...read moreread less

Papers published on a yearly basis

1 / 2

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Speaker adaptive training: a maximum likelihood approach to speaker normalization

[...]

Tasos Anastasakos¹, John McDonough, John Makhoul•Institutions (1)

Northeastern University¹

21 Apr 1997

TL;DR: Experimental results in the context of batch supervised adaptation demonstrate the effectiveness of the proposed speaker adaptive training method in large vocabulary speech recognition tasks and show that significant reductions in word error rate can be achieved over the common pooled speaker-independent paradigm.

...read moreread less

Abstract: This paper describes the speaker adaptive training (SAT) approach for speaker independent (SI) speech recognizers as a method for joint speaker normalization and estimation of the parameters of the SI acoustic models. In SAT, speaker characteristics are modeled explicitly as linear transformations of the SI acoustic parameters. The effect of inter-speaker variability in the training data is reduced, leading to parsimonious acoustic models that represent more accurately the phonetically relevant information of the speech signal. The proposed training method is applied to the Wall Street Journal (WSJ) corpus that consists of multiple training speakers. Experimental results in the context of batch supervised adaptation demonstrate the effectiveness of the proposed method in large vocabulary speech recognition tasks and show that significant reductions in word error rate can be achieved over the common pooled speaker-independent paradigm.

...read moreread less

108 citations

Journal Article•DOI•

Probabilistic-trajectory segmental HMMs☆

[...]

Wendy J. Holmes¹, Martin J. Russell¹•Institutions (1)

University of St Andrews¹

01 Jan 1999-Computer Speech & Language

TL;DR: Performance benefits have been demonstrated from incorporating a linear trajectory description and additionally from modelling variability in the mid-point parameter, and theoretical and experimental comparisons between different types of PTSHMMs, simpler SHMMs and conventional HMMs are presented.

...read moreread less

108 citations

Proceedings Article•DOI•

Deep-FSMN for Large Vocabulary Continuous Speech Recognition

[...]

Shiliang Zhang¹, Ming Lei¹, Zhi-Jie Yan¹, Li-Rong Dai•Institutions (1)

Alibaba Group¹

04 Mar 2018

TL;DR: DFSMN as mentioned in this paper introduces skip connections between memory blocks in adjacent layers, which enable the information flow across different layers and thus alleviate the gradient vanishing problem when building very deep structure.

...read moreread less

Abstract: In this paper, we present an improved feedforward sequential memory networks (FSMN) architecture, namely Deep-FSMN (DFSMN), by introducing skip connections between memory blocks in adjacent layers. These skip connections enable the information flow across different layers and thus alleviate the gradient vanishing problem when building very deep structure. As a result, DFSMN significantly benefits from these skip connections and deep structure. We have compared the performance of DFSMN to BLSTM both with and without lower frame rate (LFR) on several large speech recognition tasks, including English and Mandarin. Experimental results shown that DFSMN can consistently outperform BLSTM with dramatic gain, especially trained with LFR using CD-Phone as modeling units. In the 20000 hours Fisher (FSH) task, the proposed DFSMN can achieve a word error rate of 9.4% by purely using the cross-entropy criterion and decoding with a 3-gram language model, which achieves a 1.5% absolute improvement compared to the BLSTM. In a 20000 hours Mandarin recognition task, the LFR trained DFSMN can achieve more than 20% relative improvement compared to the LFR trained BLSTM. Moreover, we can easily design the lookahead filter order of the memory blocks in DFSMN to control the latency for real-time applications.

...read moreread less

108 citations

Proceedings Article•

SAND: semi-supervised adaptive novel class detection and classification over data stream

[...]

Ahsanul Haque¹, Latifur Khan¹, Michael Baron¹•Institutions (1)

University of Texas at Dallas¹

12 Feb 2016

TL;DR: This paper proposes an efficient semi-supervised framework which uses change detection on classifier confidence to detect concept drifts, and to determine chunk boundaries dynamically, and also addresses concept evolution problem by detecting outliers having strong cohesion among themselves.

...read moreread less

Abstract: Most approaches to classifying data streams either divide the stream into fixed-size chunks or use gradual forgetting. Due to evolving nature of data streams, finding a proper size or choosing a forgetting rate without prior knowledge about time-scale of change is not a trivial task. These approaches hence suffer from a trade-off between performance and sensitivity. Existing dynamic sliding window based approaches address this problem by tracking changes in classifier error rate, but are supervised in nature. We propose an efficient semi-supervised framework in this paper which uses change detection on classifier confidence to detect concept drifts, and to determine chunk boundaries dynamically. It also addresses concept evolution problem by detecting outliers having strong cohesion among themselves. Experiment results on benchmark and synthetic data sets show effectiveness of the proposed approach.

...read moreread less

108 citations

Proceedings Article•DOI•

A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition

[...]

Shubham Toshniwal¹, Anjuli Kannan², Chung-Cheng Chiu², Yonghui Wu², Tara N. Sainath², Karen Livescu¹ - Show less +2 more•Institutions (2)

Toyota Technological Institute¹, Google²

01 Dec 2018

TL;DR: The authors compare a suite of past methods and some of their own proposed methods for using unpaired text data to improve encoder-decoder models and find that cold fusion has a lower oracle error rate and outperforms other approaches after second pass rescoring on the Google voice search data set.

...read moreread less

Abstract: Attention-based recurrent neural encoder-decoder models present an elegant solution to the automatic speech recognition problem. This approach folds the acoustic model, pronunciation model, and language model into a single network and requires only a parallel corpus of speech and text for training. However, unlike in conventional approaches that combine separate acoustic and language models, it is not clear how to use additional (unpaired) text. While there has been previous work on methods addressing this problem, a thorough comparison among methods is still lacking. In this paper, we compare a suite of past methods and some of our own proposed methods for using unpaired text data to improve encoder-decoder models. For evaluation, we use the medium-sized Switchboard data set and the large-scale Google voice search and dictation data sets. Our results confirm the benefits of using unpaired text across a range of methods and data sets. Surprisingly, for first-pass decoding, the rather simple approach of shallow fusion performs best across data sets. However, for Google data sets we find that cold fusion has a lower oracle error rate and outperforms other approaches after second-pass rescoring on the Google voice search data set.

...read moreread less

108 citations

Collapse

Network Information

Performance

Metrics

12,777

Papers

335,740

Citations

No. of papers in the topic in previous years
Year	Papers
2023	271
2022	562
2021	640
2020	643
2019	633
2018	528

Word error rate

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics