scispace - formally typeset
Topic

Word error rate

About: Word error rate is a research topic. Over the lifetime, 11939 publications have been published within this topic receiving 298031 citations.
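Word error rate is conventionally computed as the word-level Levenshtein (edit) distance between a reference transcript and a hypothesis, normalized by the reference length: WER = (S + D + I) / N, where S, D, I count substitutions, deletions, and insertions. A minimal Python sketch (the function name is illustrative):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level edit distance with dynamic programming."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 100% when the hypothesis contains many insertions.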


Papers
Proceedings ArticleDOI
06 Jun 2021
TL;DR: In this article, an intermediate CTC loss is proposed to regularize CTC training and improve performance, requiring only a small modification of the code, little extra cost during training, and no overhead at inference.
Abstract: We present a simple and efficient auxiliary loss function for automatic speech recognition (ASR) based on the connectionist temporal classification (CTC) objective. The proposed objective, an intermediate CTC loss, is attached to an intermediate layer in the CTC encoder network. This intermediate CTC loss regularizes CTC training well and improves performance, requiring only a small modification of the code, with small overhead during training and none during inference. In addition, we propose to combine this intermediate CTC loss with stochastic depth training, and apply this combination to the recently proposed Conformer network. We evaluate the proposed method on various corpora, reaching a word error rate (WER) of 9.9% on the WSJ corpus and a character error rate (CER) of 5.2% on the AISHELL-1 corpus, based on CTC greedy search without a language model. Notably, the AISHELL-1 result is comparable to other state-of-the-art ASR systems based on autoregressive decoders with beam search.

85 citations

Proceedings ArticleDOI
Dong Yu1, Li Deng1, Jasha Droppo1, Jian Wu1, Yifan Gong1, Alejandro Acero1 
12 May 2008
TL;DR: A non-linear feature-domain noise reduction algorithm based on the minimum mean square error (MMSE) criterion on Mel-frequency cepstra (MFCC) is proposed for environment-robust speech recognition; it performs slightly better than the ETSI AFE on the well-matched and mid-mismatched settings.
Abstract: We present a non-linear feature-domain noise reduction algorithm based on the minimum mean square error (MMSE) criterion on Mel-frequency cepstra (MFCC) for environment-robust speech recognition. Unlike the MMSE enhancement of log spectral amplitude proposed by Ephraim and Malah (E&M) (1985), the new algorithm presented in this paper develops a suppression rule that applies to the power spectral magnitude of the filter-bank outputs and to MFCC directly, making it demonstrably more effective in noise-robust speech recognition. The noise variance in the new algorithm contains a significant term resulting from instantaneous phase asynchrony between clean speech and mixing noise, which is missing in the E&M algorithm. Speech recognition experiments on the standard Aurora-3 task demonstrate a reduction of word error rate by 48% against the ICSLP02 baseline, by 26% against the cepstral mean normalization baseline, and by 13% against the conventional E&M log-MMSE noise suppressor. The new algorithm is also much more efficient than the E&M noise suppressor, since the number of channels in the Mel-frequency filter bank is much smaller (23 in our case) than the number of bins in the FFT domain (256). The results also show that our algorithm performs slightly better than the ETSI AFE on the well-matched and mid-mismatched settings.

85 citations

Proceedings ArticleDOI
04 May 2020
TL;DR: A soft and monotonic alignment mechanism for sequence transduction is proposed; it is inspired by the integrate-and-fire model in spiking neural networks and, being built from continuous functions within the encoder-decoder framework, is named Continuous Integrate-and-Fire (CIF).
Abstract: In this paper, we propose a novel soft and monotonic alignment mechanism for sequence transduction. It is inspired by the integrate-and-fire model in spiking neural networks and is employed in an encoder-decoder framework consisting of continuous functions, and is thus named Continuous Integrate-and-Fire (CIF). Applied to the ASR task, CIF not only allows concise calculation, but also supports online recognition and acoustic boundary positioning, making it suitable for various ASR scenarios. Several support strategies are also proposed to alleviate problems unique to the CIF-based model. With the joint action of these methods, the CIF-based model shows competitive performance. Notably, it achieves a word error rate (WER) of 2.86% on the test-clean set of LibriSpeech and sets a new state-of-the-art result on a Mandarin telephone ASR benchmark.

84 citations
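The integrate-and-fire idea behind CIF can be illustrated with a simplified sketch: per-frame weights are accumulated, and an output boundary "fires" each time the accumulator crosses a threshold. This is only the core intuition under stated assumptions; the full CIF mechanism also splits the weight of the boundary frame between adjacent outputs, which is omitted here:

```python
def cif_fire(weights: list[float], threshold: float = 1.0) -> list[int]:
    """Simplified continuous integrate-and-fire: accumulate per-frame
    weights and record the frame index each time the accumulator
    crosses the threshold (one output token per firing)."""
    boundaries, acc = [], 0.0
    for i, alpha in enumerate(weights):
        acc += alpha
        if acc >= threshold:
            boundaries.append(i)  # acoustic boundary located at frame i
            acc -= threshold      # carry the remainder forward
    return boundaries
```

The number of firings determines the output sequence length, which is what lets CIF position acoustic boundaries and run online.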

Journal ArticleDOI
TL;DR: This work introduces a novel big data and machine learning technique for sentiment analysis that improves system efficiency; the proposed CSO-LSTMNN outperforms PSO in terms of higher accuracy and lower error rate.
Abstract: Sentiment analysis is crucial in systems such as opinion mining and prediction. Considerable research has analyzed sentiment using various machine learning techniques; however, the high error rates in these studies can reduce overall system efficiency. We introduce a novel big data and machine learning technique for sentiment analysis to overcome this problem. The data are collected from large datasets, which supports effective analysis. Noise in the data is eliminated using a data mining preprocessing step. From the cleaned sentiment data, effective features are selected using a greedy approach, and the selected optimal features are processed by an optimal classifier called the cat swarm optimization-based long short-term memory neural network (CSO-LSTMNN). The classifier analyzes sentiment-related features according to cat behavior, minimizing the error rate while examining features. This technique improves system efficiency, as shown by experimental results on error rate, precision, recall, and accuracy. The results obtained with the greedy feature selection and CSO-LSTMNN algorithm are compared against the particle swarm optimization (PSO) algorithm; CSO-LSTMNN outperforms PSO in terms of higher accuracy and lower error rate.

84 citations

Journal ArticleDOI
TL;DR: It is found that, for a given error rate, error patterns with zero correlation between successive transmissions generally fare better than those with negative correlation, and that error patterns with positive correlation fare better still.
Abstract: A formula for the go-back-N ARQ (automatic repeat request) scheme applicable to Markov error patterns is derived. It is a generalization of the well-known efficiency formula p/(p + m(1-p)) (where m is the round-trip delay in number of block durations and p is the block transmission success probability), and it has been successfully validated against simulation measurements. It is found that, for a given error rate, error patterns having zero correlation between successive transmissions generally fare better than those with negative correlation, and that error patterns with positive correlation fare better still. It is shown that the present analysis can be extended in a straightforward manner to cope with error patterns of a more complex nature. Simple procedures for numerical evaluation of efficiency under quite general error structures are presented.

84 citations
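The baseline efficiency formula quoted in the abstract, p/(p + m(1-p)), applies to independent (uncorrelated) block errors and can be evaluated directly; the Markov generalization derived in the paper is not reproduced here. A small sketch:

```python
def go_back_n_efficiency(p: float, m: int) -> float:
    """Classical go-back-N ARQ efficiency for independent block errors:
    p = probability a block is transmitted successfully,
    m = round-trip delay measured in block durations
    (every block error forces retransmission of m blocks)."""
    return p / (p + m * (1 - p))
```

For example, with p = 0.9 and a round-trip delay of 9 block durations, half the channel capacity is spent on retransmissions.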


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
88% related
Feature extraction
111.8K papers, 2.1M citations
86% related
Convolutional neural network
74.7K papers, 2M citations
85% related
Artificial neural network
207K papers, 4.5M citations
84% related
Cluster analysis
146.5K papers, 2.9M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    271
2022    562
2021    640
2020    643
2019    633
2018    528