scispace - formally typeset
Topic

Word error rate

About: Word error rate is a research topic. Over the lifetime, 11939 publications have been published within this topic receiving 298031 citations.
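Word error rate is conventionally computed as the word-level Levenshtein (edit) distance between a reference transcript and a hypothesis, normalized by the reference length: WER = (S + D + I) / N, where S, D, I count substitutions, deletions, and insertions. A minimal Python sketch (the function name is illustrative):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level edit distance with dynamic programming."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 100% when the hypothesis contains many insertions.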


Papers
Proceedings ArticleDOI
06 Jun 2021
TL;DR: In this article, an intermediate CTC loss is proposed to regularize CTC training and improve performance, requiring only a small modification of the code, little extra cost during training, and no overhead at inference.
Abstract: We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective. The proposed objective, an intermediate CTC loss, is attached to an intermediate layer in the CTC encoder network. This intermediate CTC loss regularizes CTC training well and improves performance, requiring only a small modification of the code, with small overhead during training and none during inference. In addition, we propose to combine this intermediate CTC loss with stochastic depth training, and apply this combination to the recently proposed Conformer network. We evaluate the proposed method on various corpora, reaching a word error rate (WER) of 9.9% on the WSJ corpus and a character error rate (CER) of 5.2% on the AISHELL-1 corpus, based on CTC greedy search without a language model. Notably, the AISHELL-1 result is comparable to other state-of-the-art ASR systems based on autoregressive decoders with beam search.

85 citations

Proceedings ArticleDOI
Dong Yu1, Li Deng1, Jasha Droppo1, Jian Wu1, Yifan Gong1, Alejandro Acero1 
12 May 2008
TL;DR: A non-linear feature-domain noise reduction algorithm based on the minimum mean square error (MMSE) criterion on Mel-frequency cepstra (MFCC) is proposed for environment-robust speech recognition; it performs slightly better than the ETSI AFE on the well-matched and mid-mismatched settings.
Abstract: We present a non-linear feature-domain noise reduction algorithm based on the minimum mean square error (MMSE) criterion on Mel-frequency cepstra (MFCC) for environment-robust speech recognition. Unlike the MMSE enhancement of log spectral amplitude proposed by Ephraim and Malah (E&M) (1985), the new algorithm presented in this paper develops a suppression rule that applies to the power spectral magnitude of the filter-bank outputs and to MFCC directly, making it demonstrably more effective in noise-robust speech recognition. The noise variance in the new algorithm contains a significant term resulting from instantaneous phase asynchrony between clean speech and mixing noise, which is missing in the E&M algorithm. Speech recognition experiments on the standard Aurora-3 task demonstrate a reduction of word error rate by 48% against the ICSLP02 baseline, by 26% against the cepstral mean normalization baseline, and by 13% against the conventional E&M log-MMSE noise suppressor. The new algorithm is also much more efficient than the E&M noise suppressor, since the number of channels in the Mel-frequency filter bank is much smaller (23 in our case) than the number of bins in the FFT domain (256). The results also show that our algorithm performs slightly better than the ETSI AFE on the well-matched and mid-mismatched settings.

85 citations

Proceedings ArticleDOI
04 May 2020
TL;DR: A soft and monotonic alignment mechanism for sequence transduction is proposed; it is inspired by the integrate-and-fire model in spiking neural networks and, being built from continuous functions within the encoder-decoder framework, is named Continuous Integrate-and-Fire (CIF).
Abstract: In this paper, we propose a novel soft and monotonic alignment mechanism for sequence transduction. It is inspired by the integrate-and-fire model in spiking neural networks and is employed in an encoder-decoder framework consisting of continuous functions, and is thus named Continuous Integrate-and-Fire (CIF). Applied to the ASR task, CIF not only allows concise calculation, but also supports online recognition and acoustic boundary positioning, making it suitable for various ASR scenarios. Several support strategies are also proposed to alleviate problems unique to the CIF-based model. With the joint action of these methods, the CIF-based model shows competitive performance. Notably, it achieves a word error rate (WER) of 2.86% on the test-clean set of LibriSpeech and sets a new state-of-the-art result on a Mandarin telephone ASR benchmark.

84 citations
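The integrate-and-fire idea behind CIF can be illustrated with a simplified sketch: per-frame weights are accumulated, and an output boundary "fires" each time the accumulator crosses a threshold. This is only the core intuition under stated assumptions; the full CIF mechanism also splits the weight of the boundary frame between adjacent outputs, which is omitted here:

```python
def cif_fire(weights: list[float], threshold: float = 1.0) -> list[int]:
    """Simplified continuous integrate-and-fire: accumulate per-frame
    weights and record the frame index each time the accumulator
    crosses the threshold (one output token per firing)."""
    boundaries, acc = [], 0.0
    for i, alpha in enumerate(weights):
        acc += alpha
        if acc >= threshold:
            boundaries.append(i)  # acoustic boundary located at frame i
            acc -= threshold      # carry the remainder forward
    return boundaries
```

The number of firings determines the output sequence length, which is what lets CIF position acoustic boundaries and run online.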

Journal ArticleDOI
TL;DR: This work introduces a novel big data and machine learning technique for sentiment analysis that improves system efficiency; the proposed CSO-LSTMNN outperforms PSO in terms of higher accuracy and lower error rate.
Abstract: Sentiment analysis is crucial in systems such as opinion mining and prediction. Considerable research has analyzed sentiment using various machine learning techniques; however, the high error rates in these studies can reduce overall system efficiency. We introduce a novel big data and machine learning technique for sentiment analysis to overcome this problem. The data are collected from large datasets, which supports effective analysis. Noise in the data is eliminated using a data mining preprocessing step. From the cleaned sentiment data, effective features are selected using a greedy approach, and the selected optimal features are processed by an optimal classifier called the cat swarm optimization-based long short-term memory neural network (CSO-LSTMNN). The classifier analyzes sentiment-related features according to cat behavior, minimizing the error rate while examining features. This technique improves system efficiency, as shown by experimental results on error rate, precision, recall, and accuracy. The results obtained with the greedy feature selection and CSO-LSTMNN algorithm are compared against the particle swarm optimization (PSO) algorithm; CSO-LSTMNN outperforms PSO in terms of higher accuracy and lower error rate.

84 citations

Journal ArticleDOI
TL;DR: It is found that, for a given error rate, error patterns with zero correlation between successive transmissions generally fare better than those with negative correlation, and that error patterns with positive correlation fare better still.
Abstract: A formula for the go-back-N ARQ (automatic repeat request) scheme applicable to Markov error patterns is derived. It is a generalization of the well-known efficiency formula p/(p + m(1-p)) (where m is the round-trip delay in number of block durations and p is the block transmission success probability), and it has been successfully validated against simulation measurements. It is found that, for a given error rate, error patterns having zero correlation between successive transmissions generally fare better than those with negative correlation, and that error patterns with positive correlation fare better still. It is shown that the present analysis can be extended in a straightforward manner to cope with error patterns of a more complex nature. Simple procedures for numerical evaluation of efficiency under quite general error structures are presented.

84 citations
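The baseline efficiency formula quoted in the abstract, p/(p + m(1-p)), applies to independent (uncorrelated) block errors and can be evaluated directly; the Markov generalization derived in the paper is not reproduced here. A small sketch:

```python
def go_back_n_efficiency(p: float, m: int) -> float:
    """Classical go-back-N ARQ efficiency for independent block errors:
    p = probability a block is transmitted successfully,
    m = round-trip delay measured in block durations
    (every block error forces retransmission of m blocks)."""
    return p / (p + m * (1 - p))
```

For example, with p = 0.9 and a round-trip delay of 9 block durations, half the channel capacity is spent on retransmissions.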


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
88% related
Feature extraction
111.8K papers, 2.1M citations
86% related
Convolutional neural network
74.7K papers, 2M citations
85% related
Artificial neural network
207K papers, 4.5M citations
84% related
Cluster analysis
146.5K papers, 2.9M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    271
2022    562
2021    640
2020    643
2019    633
2018    528