Topic
Word error rate
About: Word error rate is a research topic. Over its lifetime, 11939 publications have been published within this topic, receiving 298031 citations.
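The metric that names this topic can be made concrete. Below is a minimal sketch of word error rate as word-level Levenshtein distance, i.e. (substitutions + deletions + insertions) divided by the number of reference words, computed by dynamic programming; the example sentences are invented for illustration.

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length,
    computed as word-level Levenshtein distance via dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j          # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why heavily degraded systems (like some reported in the papers below) can show error rates above 50%.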
Papers published on a yearly basis
Papers
01 Jan 2014
TL;DR: Two novel frontends for robust language identification (LID) using a convolutional neural network trained for automatic speech recognition (ASR) and the CNN is used to obtain the posterior probabilities for i-vector training and extraction instead of a universal background model (UBM).
Abstract: This paper proposes two novel frontends for robust language identification (LID) using a convolutional neural network (CNN) trained for automatic speech recognition (ASR). In the CNN/i-vector frontend, the CNN is used to obtain the posterior probabilities for i-vector training and extraction instead of a universal background model (UBM). The CNN/posterior frontend is somewhat similar to a phonetic system in that the occupation counts of (tied) triphone states (senones) given by the CNN are used for classification. They are compressed to a low-dimensional vector using probabilistic principal component analysis (PPCA). Evaluated on heavily degraded speech data, the proposed frontends provide significant improvements of up to 50% in average equal error rate compared to a UBM/i-vector baseline. Moreover, the proposed frontends are complementary and give significant gains of up to 20% relative to the best single system when combined.
75 citations
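The first step of the CNN/posterior frontend described above can be sketched as accumulating per-frame senone posteriors into an utterance-level occupation-count vector. The posterior values below are invented for illustration (in the paper they come from an ASR-trained CNN), and the subsequent PPCA compression is omitted.

```python
def occupation_counts(frame_posteriors):
    """Sum posterior probabilities over frames, yielding one occupation
    count per senone for the whole utterance."""
    n_senones = len(frame_posteriors[0])
    counts = [0.0] * n_senones
    for frame in frame_posteriors:
        for s, p in enumerate(frame):
            counts[s] += p
    return counts

# Three invented frames over three senones; each frame's posteriors sum to 1.
posteriors = [[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1],
              [0.2, 0.2, 0.6]]
print(occupation_counts(posteriors))  # ≈ [1.0, 1.2, 0.8] up to float rounding
```

In the paper, this high-dimensional count vector (one entry per tied triphone state) is then reduced with PPCA before classification.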
TL;DR: New approaches to improve sparse application-specific language models by combining domain-dependent and out-of-domain data are investigated, including a back-off scheme that effectively leads to context-dependent multiple interpolation weights, and a likelihood-based similarity weighting scheme that uses data discriminatively to train a task-specific language model.
Abstract: Standard statistical language modeling techniques suffer from sparse-data problems when applied to real tasks in speech recognition, where large amounts of domain-dependent text are not available. We investigate new approaches to improve sparse application-specific language models by combining domain-dependent and out-of-domain data, including a back-off scheme that effectively leads to context-dependent multiple interpolation weights, and a likelihood-based similarity weighting scheme that uses data discriminatively to train a task-specific language model. Experiments with both approaches on a spontaneous speech recognition task (Switchboard) lead to a reduced word error rate over a domain-specific n-gram language model, giving a larger gain than that obtained with previous brute-force data combination approaches.
75 citations
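The baseline that the paper above improves on is plain linear interpolation of an in-domain and an out-of-domain model. A toy sketch with unigram models, assuming made-up corpora and a fixed weight value; the paper's back-off scheme instead makes the weight depend on the n-gram context.

```python
from collections import Counter

def unigram_model(corpus):
    """Maximum-likelihood unigram model over a whitespace-tokenized corpus."""
    counts = Counter(corpus.split())
    total = sum(counts.values())
    return lambda w: counts[w] / total if total else 0.0

# Invented corpora: a tiny in-domain sample and a broad out-of-domain sample.
in_domain = unigram_model("call transfer my call please transfer")
out_domain = unigram_model("the weather is nice the forecast is nice")

def interpolated(w, lam=0.7):
    # P(w) = lam * P_in(w) + (1 - lam) * P_out(w); lam is chosen arbitrarily here
    return lam * in_domain(w) + (1 - lam) * out_domain(w)

print(interpolated("call"))  # mixes sparse in-domain mass with broad-coverage mass
```

A single global `lam` wastes the out-of-domain data on contexts the in-domain model already covers well, which is the motivation for the context-dependent interpolation weights the paper proposes.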
01 Jan 2003
TL;DR: A method based on the Minimum Description Length principle is used to split words statistically into subword units, allowing efficient language modeling and an unlimited vocabulary; the resulting model outperforms both word- and syllable-based trigram models.
Abstract: We study continuous speech recognition based on subword units found in an unsupervised fashion. For agglutinative languages like Finnish, traditional word-based n-gram language modeling does not work well due to the huge number of distinct word forms. We use a method based on the Minimum Description Length principle to split words statistically into subword units, allowing efficient language modeling and an unlimited vocabulary. Perplexity and speech recognition experiments on Finnish speech data show that the resulting model outperforms both word- and syllable-based trigram models. Compared to the word trigram model, the out-of-vocabulary rate is reduced from 20% to 0% and the word error rate from 56% to 32%.
75 citations
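Why subword units drive the out-of-vocabulary rate to 0%, as in the paper above, can be illustrated with a toy greedy segmenter: any word spellable from the subword inventory is representable, so no word form is ever rejected. The unit inventory below is invented for illustration; the paper learns its units with a Minimum Description Length criterion rather than fixing them by hand.

```python
def segment(word, units):
    """Greedy longest-match segmentation of a word into subword units."""
    parts, i = [], 0
    while i < len(word):
        for length in range(len(word) - i, 0, -1):  # try the longest unit first
            if word[i:i + length] in units:
                parts.append(word[i:i + length])
                i += length
                break
        else:
            return None  # unreachable here: every single character is a unit
    return parts

# Invented inventory: a few multi-character units plus all single letters,
# so every word has at least the fallback spelling letter by letter.
units = {"talo", "ssa", "kin", "auto"} | set("abcdefghijklmnopqrstuvwxyzäö")
print(segment("talossakin", units))  # ['talo', 'ssa', 'kin']
```

With single characters always available as fallback units, the lexicon covers every possible word form, which is the mechanism behind the 20% → 0% OOV reduction reported for Finnish.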
TL;DR: The Hidden-Articulator Markov model (HAMM) as discussed by the authors is an extension of the articulatory-feature model introduced by Erler in 1996, which integrates articulatory information into speech recognition.
75 citations