Topic

Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.
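For reference, word error rate (WER) is the word-level Levenshtein (edit) distance between a recognised hypothesis and its reference transcript, normalised by the reference length: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions against N reference words. A minimal Python sketch (the function name and example strings are illustrative):

```python
# Word error rate via word-level Levenshtein distance:
# WER = (S + D + I) / N over N reference words.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat mat"))  # 2 deletions / 6 words ≈ 0.33
```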


Papers
Journal ArticleDOI
28 Sep 2017-Sensors
TL;DR: Combining Savitzky-Golay and moving average filters with outlier detection and removal based on normalised cross-correlation and clustering enhances the unusually low quality of steering-wheel electrocardiogram signals, rendering ensemble heartbeats of significantly higher quality.
Abstract: Electrocardiogram signals acquired through a steering wheel could be the key to seamless, highly comfortable, and continuous human recognition in driving settings. This paper focuses on enhancing the unusually low quality of such signals through the combination of Savitzky-Golay and moving average filters, followed by outlier detection and removal based on normalised cross-correlation and clustering, which together render ensemble heartbeats of significantly higher quality. Discrete Cosine Transform (DCT) and Haar transform features were extracted and fed to decision methods based on Support Vector Machines (SVM), k-Nearest Neighbours (kNN), Multilayer Perceptrons (MLP), and Gaussian Mixture Models - Universal Background Models (GMM-UBM) classifiers, for both identification and authentication tasks. Additional techniques of user-tuned authentication and past score weighting were also studied. The method's performance was comparable to some of the best recent state-of-the-art methods (94.9% identification rate (IDR) and 2.66% authentication equal error rate (EER)), despite weaker results with scarce training data (70.9% IDR and 11.8% EER). It was concluded that the method was suitable for biometric recognition with driving electrocardiogram signals, and could, with future developments, be used on a continuous system in seamless and highly noisy settings.

86 citations
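A minimal sketch of the enhancement stage this abstract describes, using SciPy's savgol_filter followed by a moving average and normalised cross-correlation outlier rejection; the window sizes, polynomial order, and correlation threshold are illustrative assumptions, not the paper's parameters:

```python
# Sketch: Savitzky-Golay smoothing + moving average, then rejection of
# heartbeats that correlate poorly with the mean template (assumed threshold).
import numpy as np
from scipy.signal import savgol_filter

def enhance(ecg: np.ndarray, sg_window: int = 21, sg_order: int = 3,
            ma_window: int = 5) -> np.ndarray:
    smoothed = savgol_filter(ecg, window_length=sg_window, polyorder=sg_order)
    kernel = np.ones(ma_window) / ma_window
    return np.convolve(smoothed, kernel, mode="same")

def ensemble_heartbeat(beats: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """Average the segmented heartbeats (rows of `beats`) whose normalised
    cross-correlation with the mean template exceeds the threshold."""
    template = beats.mean(axis=0)
    def ncc(a, b):
        a, b = a - a.mean(), b - b.mean()
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    kept = np.array([b for b in beats if ncc(b, template) >= threshold])
    return kept.mean(axis=0) if len(kept) else template
```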

Journal ArticleDOI
TL;DR: The neural syntactic language model achieves the best published perplexity and WER results for the given data sets; comparisons with standard and neural-net-based N-gram models with arbitrarily long contexts show that syntactic information is in fact very helpful in estimating the word string probability.
Abstract: This paper presents a study of using neural probabilistic models in a syntactic based language model. The neural probabilistic model makes use of a distributed representation of the items in the conditioning history, and is powerful in capturing long dependencies. Employing neural network based models in the syntactic based language model enables it to efficiently use the large amount of information available in a syntactic parse when estimating the next word in a string. Several scenarios of integrating neural networks in the syntactic based language model are presented, accompanied by the derivation of the training procedures involved. Experiments on the UPenn Treebank and the Wall Street Journal corpus show significant improvements in perplexity and word error rate over the baseline SLM. Furthermore, comparisons with the standard and neural net based N-gram models with arbitrarily long contexts show that the syntactic information is in fact very helpful in estimating the word string probability. Overall, our neural syntactic based model achieves the best published results in perplexity and WER for the given data sets.

86 citations
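The distributed-representation idea at the model's core can be sketched as a Bengio-style neural n-gram: each word in the conditioning history gets a learned embedding, and the concatenated embeddings predict the next word. The syntactic integration is omitted here, and all names and hyperparameters are illustrative:

```python
# Minimal neural probabilistic language model sketch (PyTorch): embeddings of
# the history are concatenated and mapped to next-word logits.
import torch
import torch.nn as nn

class NeuralNgramLM(nn.Module):
    def __init__(self, vocab_size: int, context_size: int = 3,
                 embed_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.net = nn.Sequential(
            nn.Linear(context_size * embed_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, vocab_size),
        )

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        # context: (batch, context_size) word ids -> (batch, vocab_size) logits
        e = self.embed(context).flatten(start_dim=1)
        return self.net(e)

model = NeuralNgramLM(vocab_size=10000)
logits = model(torch.randint(0, 10000, (8, 3)))
log_probs = torch.log_softmax(logits, dim=-1)  # next-word log-probabilities
```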

Journal ArticleDOI
TL;DR: This work describes a novel adaptation algorithm for language models with time and dialog-state varying parameters that allows for recognizing and understanding unconstrained speech at each stage of the dialog, enabling context-switching and error recovery.
Abstract: We are interested in adaptive spoken dialog systems for automated services. People's spoken language usage varies over time for a given task, and furthermore varies depending on the state of the dialog. Thus, it is crucial to adapt automatic speech recognition (ASR) language models to these varying conditions. We characterize and quantify these variations based on a database of 30K user transactions with AT&T's experimental How May I Help You? spoken dialog system. We describe a novel adaptation algorithm for language models with time and dialog-state varying parameters. Our language adaptation framework allows for recognizing and understanding unconstrained speech at each stage of the dialog, enabling context-switching and error recovery. These models have been used to train state-dependent ASR language models. We have evaluated their performance with respect to word accuracy and perplexity over time and dialog states. We have achieved a reduction of 40% in perplexity and of 8.4% in word error rate over the baseline system, averaged across all dialog states.

85 citations
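The adaptation idea can be sketched as interpolating a dialog-state-specific model with a state-independent background model and scoring perplexity per state. The unigram simplification, vocabulary, and interpolation weight below are assumptions for illustration, not the paper's algorithm:

```python
# Sketch: state-dependent LM = lam * state model + (1 - lam) * background model,
# with add-alpha smoothed unigrams standing in for the paper's richer models.
import math
from collections import Counter

def make_lm(sentences, vocab, alpha=1.0):
    counts = Counter(w for s in sentences for w in s)
    total = sum(counts.values())
    return {w: (counts[w] + alpha) / (total + alpha * len(vocab)) for w in vocab}

def interpolate(state_lm, background_lm, lam=0.6):
    return {w: lam * state_lm[w] + (1 - lam) * background_lm[w] for w in state_lm}

def perplexity(lm, sentences):
    logp = sum(math.log(lm[w]) for s in sentences for w in s)
    n = sum(len(s) for s in sentences)
    return math.exp(-logp / n)

vocab = {"yes", "no", "operator", "billing"}
background = make_lm([["yes", "no"], ["billing", "operator"]], vocab)
greeting = make_lm([["operator"], ["operator", "yes"]], vocab)  # one dialog state
print(perplexity(interpolate(greeting, background), [["operator", "yes"]]))
```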

Proceedings ArticleDOI
12 May 1998
TL;DR: This paper compares various category-based language models, each combined with a word-based trigram by linear interpolation, and finds the largest improvement with a model using automatically determined categories.
Abstract: This paper compares various category-based language models when used in conjunction with a word-based trigram by means of linear interpolation. Categories corresponding to parts-of-speech as well as automatically clustered groupings are considered. The category-based model employs variable-length n-grams and permits each word to belong to multiple categories. Relative word error rate reductions of between 2 and 7% over the baseline are achieved in N-best rescoring experiments on the Wall Street Journal corpus. The largest improvement is obtained with a model using automatically determined categories. Perplexities continue to decrease as the number of different categories is increased, but improvements in the word error rate reach an optimum.

85 citations
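The interpolation scheme can be written directly: P(w|h) = lam * P_tri(w|h) + (1 - lam) * sum_c P(w|c) * P(c|h), summing over the (possibly multiple) categories of w. A sketch with toy probability tables standing in for trained models:

```python
# Linear interpolation of a word trigram with a category-based model; each
# word may belong to several categories. All tables are toy placeholders.
def interpolated_prob(word, history, p_trigram, p_word_given_cat,
                      p_cat_given_history, categories_of, lam=0.7):
    p_cat = sum(p_word_given_cat[c][word] * p_cat_given_history[history][c]
                for c in categories_of[word])
    return lam * p_trigram[history][word] + (1 - lam) * p_cat

p_tri = {("the", "river"): {"bank": 0.02}}
p_w_c = {"NOUN": {"bank": 0.01}, "FIN": {"bank": 0.03}}
p_c_h = {("the", "river"): {"NOUN": 0.6, "FIN": 0.4}}
cats = {"bank": ["NOUN", "FIN"]}
print(interpolated_prob("bank", ("the", "river"), p_tri, p_w_c, p_c_h, cats))  # 0.0194
```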

Posted Content
TL;DR: Preliminary experiments indicate that wav2vec 2.0 can capture speaker and language information, and a single model achieves unified modeling of the two tasks through multi-task learning.
Abstract: Wav2vec 2.0 is a recently proposed self-supervised framework for speech representation learning. It follows a two-stage training process of pre-training and fine-tuning, and performs well in speech recognition tasks, especially ultra-low-resource cases. In this work, we attempt to extend the self-supervised framework to speaker verification and language identification. First, we use some preliminary experiments to indicate that wav2vec 2.0 can capture information about the speaker and language. Then we demonstrate the effectiveness of wav2vec 2.0 on the two tasks respectively. For speaker verification, we obtain a new state-of-the-art result, an Equal Error Rate (EER) of 3.61% on the VoxCeleb1 dataset. For language identification, we obtain an EER of 12.02% in the 1-second condition and an EER of 3.47% in the full-length condition of the AP17-OLR dataset. Finally, we utilize one model to achieve unified modeling of the two tasks via multi-task learning.

85 citations
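The Equal Error Rate reported above is the operating point where the false acceptance and false rejection rates coincide. A minimal sketch of computing it by sweeping a decision threshold over verification scores (the scores below are random placeholders, not wav2vec 2.0 outputs):

```python
# EER: sort trials by score, sweep the acceptance threshold, and find the
# point where false acceptance rate ~= false rejection rate.
import numpy as np

def equal_error_rate(scores: np.ndarray, labels: np.ndarray) -> float:
    # labels: 1 = target (same speaker) trial, 0 = impostor trial
    order = np.argsort(scores)[::-1]
    labels = labels[order]
    fa = np.cumsum(1 - labels) / (1 - labels).sum()  # false acceptance rate
    fr = 1 - np.cumsum(labels) / labels.sum()        # false rejection rate
    i = np.argmin(np.abs(fa - fr))
    return float((fa[i] + fr[i]) / 2)

rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1, 1, 500), rng.normal(-1, 1, 500)])
labels = np.concatenate([np.ones(500), np.zeros(500)])
print(f"EER ~ {equal_error_rate(scores, labels):.3f}")
```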


Network Information

Related Topics (5)

Topic                          Papers    Citations   Relatedness
Deep learning                  79.8K     2.1M        88%
Feature extraction             111.8K    2.1M        86%
Convolutional neural network   74.7K     2M          85%
Artificial neural network      207K      4.5M        84%
Cluster analysis               146.5K    2.9M        83%
Performance Metrics

No. of papers in the topic in previous years:

Year   Papers
2023   271
2022   562
2021   640
2020   643
2019   633
2018   528