scispace - formally typeset

Word error rate

About: Word error rate is a research topic. Over the lifetime, 11939 publications have been published within this topic receiving 298031 citations.
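Word error rate (WER) is conventionally computed as the word-level Levenshtein distance between a hypothesis transcript and a reference transcript, divided by the number of reference words. A minimal sketch (the function name `wer` is illustrative, not from any paper listed here):

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Dynamic-programming edit distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # match / substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why relative WER reductions (as quoted in the papers below) are the usual basis of comparison.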


Papers
Proceedings ArticleDOI
06 Sep 2009
TL;DR: Evaluates the gain provided by various changes to the system: new search and training algorithms, new training data, a larger vocabulary, etc.
Abstract: This paper describes the new ASR system developed by the LIUM and analyzes the various origins of the significant drop of the word error rate observed in comparison to the previous LIUM ASR system. This study was made on the test data of the latest evaluation campaign of ASR systems on French broadcast news, called ESTER 2 and organized in December 2008. For the same computation time, the new system yields a word error rate about 38 % lower than what the previous system (which reached the second position during the ESTER 1 evaluation campaign) did. This paper evaluates the gain provided by various changes to the system: implementation of new search and training algorithms, new training data, vocabulary size, etc. The LIUM ASR system was the best open-source ASR system of the ESTER 2 campaign.

73 citations

Journal ArticleDOI
TL;DR: A supervised speech separation system that significantly improves automatic speech recognition (ASR) performance in realistic noise conditions is presented and a framework that unifies separation and acoustic modeling via joint adaptive training is proposed.
Abstract: Although deep neural network (DNN) acoustic models are known to be inherently noise robust, especially with matched training and testing data, the use of speech separation as a frontend and for deriving alternative feature representations has been shown to improve performance in challenging environments. We first present a supervised speech separation system that significantly improves automatic speech recognition (ASR) performance in realistic noise conditions. The system performs separation via ratio time-frequency masking; the ideal ratio mask (IRM) is estimated using DNNs. We then propose a framework that unifies separation and acoustic modeling via joint adaptive training. Since the modules for acoustic modeling and speech separation are implemented using DNNs, unification is done by introducing additional hidden layers with fixed weights and appropriate network architecture. On the CHiME-2 medium-large vocabulary ASR task, and with log mel spectral features as input to the acoustic model, an independently trained ratio masking frontend improves word error rates by 10.9% (relative) compared to the noisy baseline. In comparison, the jointly trained system improves performance by 14.4%. We also experiment with alternative feature representations to augment the standard log mel features, like the noise and speech estimates obtained from the separation module, and the standard feature set used for IRM estimation. Our best system obtains a word error rate of 15.4% (absolute), an improvement of 4.6 percentage points over the next best result on this corpus.
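The ideal ratio mask in the paper above assigns each time-frequency bin the fraction of its energy attributable to speech; at test time a DNN predicts this mask from noisy features. A minimal NumPy sketch of the oracle mask and its application (helper names are illustrative; the DNN estimation step is omitted):

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power, eps=1e-10):
    """IRM: per time-frequency bin, the fraction of energy belonging to speech.

    Values lie in [0, 1]; 1 means the bin is pure speech, 0 pure noise.
    """
    return speech_power / (speech_power + noise_power + eps)

def apply_mask(noisy_magnitude, mask):
    """Scale the noisy magnitude spectrogram; the noisy phase is reused as-is."""
    return noisy_magnitude * mask

# Toy spectrograms (frequency bins x frames) in place of real STFT output.
rng = np.random.default_rng(0)
speech = rng.random((257, 100))
noise = rng.random((257, 100))
irm = ideal_ratio_mask(speech, noise)
enhanced = apply_mask(np.sqrt(speech + noise), irm)
print(irm.min() >= 0.0 and irm.max() <= 1.0)  # mask stays bounded
```

In the jointly trained setup described above, this masking operation is folded into the network itself as extra layers, so separation and acoustic modeling share one training objective.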

73 citations

Journal ArticleDOI
TL;DR: This work proposes a window-based approach that automatically detects and extracts the air-writing event in a continuous stream of motion data, containing stray finger movements unrelated to writing.
Abstract: Air-writing refers to writing of characters or words in the free space by hand or finger movements. We address air-writing recognition problems in two companion papers. Part 2 addresses detecting and recognizing air-writing activities that are embedded in a continuous motion trajectory without delimitation. Detection of intended writing activities among superfluous finger movements unrelated to letters or words presents a challenge that needs to be treated separately from the traditional problem of pattern recognition. We first present a dataset that contains a mixture of writing and nonwriting finger motions in each recording. The LEAP from Leap Motion is used for marker-free and glove-free finger tracking. We propose a window-based approach that automatically detects and extracts the air-writing event in a continuous stream of motion data, containing stray finger movements unrelated to writing. Consecutive writing events are converted into a writing segment. The recognition performance is further evaluated based on the detected writing segment. Our main contribution is to build an air-writing system encompassing both detection and recognition stages and to give insights into how the detected writing segments affect the recognition result. With leave-one-out cross validation, the proposed system achieves an overall segment error rate of 1.15% for word-based recognition and 9.84% for letter-based recognition.
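The window-based detection described above classifies overlapping windows of the motion stream as writing or non-writing, then merges consecutive positive windows into writing segments. A minimal sketch of the merging step, assuming per-window decisions have already been produced by some classifier (function name, window size, and hop are illustrative):

```python
def merge_writing_windows(window_flags, window_size, hop):
    """Merge runs of consecutive positive windows into (start, end) frame segments.

    window_flags: per-window booleans (True = writing detected),
    window i covering frames [i * hop, i * hop + window_size).
    """
    segments = []
    start = None
    for i, flag in enumerate(window_flags):
        if flag and start is None:
            start = i * hop                      # segment opens at this window
        elif not flag and start is not None:
            segments.append((start, (i - 1) * hop + window_size))
            start = None                         # segment closed by a negative window
    if start is not None:                        # stream ended inside a segment
        segments.append((start, (len(window_flags) - 1) * hop + window_size))
    return segments

# Windows 0-1 and 4-5 flagged as writing; hop of 10 frames, windows of 30.
print(merge_writing_windows([True, True, False, False, True, True], 30, 10))
# → [(0, 40), (40, 80)]
```

Each extracted segment is then passed to the recognizer, which is why segment-boundary errors propagate directly into the word- and letter-level error rates the paper reports.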

73 citations

Proceedings ArticleDOI
Wei Yu, Jing Chang, Cheng Yang, Zhang Limin, Han Shen, Yongquan Xia, Jin Sha
01 Oct 2017
TL;DR: The results demonstrate that the proposed system performs well in the leukocytes recognition task with less hardware limitations and higher accuracy compared to the traditional ones.
Abstract: The classification of white blood cells is critical for the diagnosis of anemia, leukemia and many other hematologic diseases. Current approaches are mainly based on traditional machine learning methods, which take quite a noticeable time and the recognition error rate is relatively high especially for the rare kinds of leukocytes. In this paper, we develop an automatic cell recognition system by applying deep learning methods. The results demonstrate that the proposed system performs well in the leukocytes recognition task with less hardware limitations and higher accuracy compared to the traditional ones.

73 citations

Journal ArticleDOI
TL;DR: A novel unsupervised Bayesian model that segments unlabeled speech and clusters the segments into hypothesized word groupings is presented, resulting in a complete unsupervised tokenization of the input speech in terms of discovered word types.
Abstract: In settings where only unlabeled speech data is available, speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modelling text. A similar problem is faced when modeling infant language acquisition. In these cases, categorical linguistic structure needs to be discovered directly from speech audio. We present a novel unsupervised Bayesian model that segments unlabeled speech and clusters the segments into hypothesized word groupings. The result is a complete unsupervised tokenization of the input speech in terms of discovered word types. In our approach, a potential word segment (of arbitrary length) is embedded in a fixed-dimensional acoustic vector space. The model, implemented as a Gibbs sampler, then builds a whole-word acoustic model in this space while jointly performing segmentation. We report word error rates in a small-vocabulary connected digit recognition task by mapping the unsupervised decoded output to ground truth transcriptions. The model achieves around 20% error rate, outperforming a previous HMM-based system by about 10% absolute. Moreover, in contrast to the baseline, our model does not require a pre-specified vocabulary size.
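Scoring an unsupervised system with WER requires mapping each discovered word cluster to a ground-truth label, commonly by majority vote over the segments assigned to that cluster. A minimal sketch of that mapping step (the paper does not publish its exact procedure; names here are illustrative):

```python
from collections import Counter, defaultdict

def map_clusters_to_labels(cluster_ids, true_labels):
    """Map each discovered cluster to the ground-truth word it most often covers.

    cluster_ids: cluster assignment per decoded segment.
    true_labels: ground-truth word aligned with each segment.
    """
    votes = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, true_labels):
        votes[cluster][label] += 1
    # Majority vote within each cluster decides its label.
    return {c: counter.most_common(1)[0][0] for c, counter in votes.items()}

clusters = [0, 0, 1, 1, 1, 2]
truth = ["one", "one", "two", "two", "three", "four"]
print(map_clusters_to_labels(clusters, truth))
# → {0: 'one', 1: 'two', 2: 'four'}
```

With clusters relabeled this way, the decoded output becomes an ordinary word sequence and standard WER scoring applies, which is how the roughly 20% figure above can be compared against the supervised HMM baseline.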

73 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 88% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Convolutional neural network: 74.7K papers, 2M citations, 85% related
Artificial neural network: 207K papers, 4.5M citations, 84% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    271
2022    562
2021    640
2020    643
2019    633
2018    528