scispace - formally typeset
Topic

Word error rate

About: Word error rate is a research topic. Over the lifetime, 11939 publications have been published within this topic receiving 298031 citations.
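Word error rate (WER) is conventionally computed as the word-level Levenshtein (edit) distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. A minimal illustrative sketch:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein distance with dynamic programming."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# One insertion against a 3-word reference: WER = 1/3
print(word_error_rate("the cat sat", "the cat sat down"))  # → 0.3333333333333333
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why "relative WER improvement" is the usual comparison in the papers below.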


Papers
Proceedings ArticleDOI
07 May 2001
TL;DR: This paper proposes a new approach to training support vector machines, then trains and tests an SVM classifier for confidence measurement in speech recognition and compares the results with other statistical classification methods.
Abstract: Support vector machines represent a new approach to pattern classification developed from the theory of structural risk minimization. In this paper, we present an investigation into the application of support vector machines to the confidence measurement problem in speech recognition. Specifically, based on the results from an initial decoding of an utterance during speech recognition, we derive a feature vector consisting of parameters such as word score density, N-best word score density differences, relative word score and relative word duration as input to the confidence measurement process in which hypothetically correct utterances are accepted and utterances determined to be incorrect are rejected. We propose a new approach to training support vector machines. In this paper, we train and test a support vector machines classifier and compare the results with other statistical classification methods.
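The confidence features named above (word score density, N-best score differences, relative word score, relative word duration) feed a binary classifier that accepts or rejects each hypothesized utterance. The paper's own training method is not reproduced here; the sketch below trains a generic linear SVM with Pegasos-style subgradient descent on synthetic, invented feature vectors purely to illustrate the accept/reject decision:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, epochs=100):
    """Pegasos-style stochastic subgradient descent on the hinge loss.
    Labels y must be in {-1, +1}: +1 = accept hypothesis, -1 = reject."""
    rng = np.random.default_rng(0)
    w, b, t = np.zeros(X.shape[1]), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w + b) < 1:        # margin violated: step toward x_i
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:                                 # margin satisfied: only shrink w
                w = (1 - eta * lam) * w
    return w, b

# Synthetic 4-dim "confidence" vectors; the feature names mirror the abstract,
# but every value here is made up for illustration.
rng = np.random.default_rng(1)
correct = rng.normal([0.8, 0.3, 0.7, 0.6], 0.05, size=(100, 4))
wrong = rng.normal([0.4, 0.1, 0.3, 0.6], 0.05, size=(100, 4))
X = np.vstack([correct, wrong])
y = np.array([1] * 100 + [-1] * 100)

w, b = train_linear_svm(X, y)
score = np.array([0.8, 0.3, 0.7, 0.6]) @ w + b   # positive score → accept
```

A kernel SVM, as used in the paper, would replace the inner product with a kernel evaluation; the linear case keeps the sketch short.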

62 citations

Journal ArticleDOI
TL;DR: The present research is focused on the recognition of five Spanish words corresponding to the English words "up," "down," "left," "right" and "select", with which a computer cursor could be controlled, and shows a dependence relationship between EEG data and imagined words.
Abstract: We searched for the minimal subset of channels for imagined speech. Channel selection was approached as a multi-objective problem to obtain a Pareto front. A fuzzy inference system was applied to find a promising solution from the Pareto front. Channel selection had a statistically similar performance to the use of all channels. A dependence between features and classes of imagined speech was observed. One of the main purposes of brain-computer interfaces (BCI) is to provide persons with an alternative communication channel. This objective was initially focused on handicapped subjects, but nowadays its scope has expanded to healthy persons. Usually, BCIs record brain activity using electroencephalograms (EEG), according to four main neuro-paradigms (slow cortical potentials, motor imagery, the P300 component and visual evoked potentials). These paradigms are not intuitive and are difficult to implement. Accordingly, this work investigates an alternative neuro-paradigm called imagined speech, which refers to the internal pronunciation of words without emitting sounds or making facial movements. Specifically, the present research focuses on the recognition of five Spanish words corresponding to the English words "up," "down," "left," "right" and "select", with which a computer cursor could be controlled. We perform an offline automatic classification procedure on a dataset of EEG signals from 27 subjects. The method implements a channel selection composed of two stages: the first obtains a Pareto front and is approached as a multi-objective optimization problem dealing with the error rate and the number of channels; the second selects a single solution (channel combination) from the front by applying a fuzzy inference system (FIS). We assess the method's performance on the selected channel combination using a test set not used to generate the front.
Several FIS configurations were explored to evaluate whether a FIS is able to select channel combinations that improve, or at least maintain, the accuracies obtained using all channels for each subject's data. We found that one configuration, FIS3×3 (three membership functions for both input variables, the error rate and the number of channels), obtained the best trade-off between the number of fuzzy rules and accuracy (68.18% using around 7 channels). The FIS3×3 also obtained a statistically similar accuracy compared to the use of all channels (70.33%). These results demonstrate the feasibility of using a FIS to automatically select a channel combination from the Pareto front for imagined speech classification. The presented method outperforms previous works in accuracy and showed a dependence relationship between EEG data and imagined words.
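The two-stage selection can be sketched in miniature: first keep the non-dominated (error rate, channel count) pairs, then pick one point from the front. The paper's second stage uses a fuzzy inference system; the weighted score below is a deliberately simplified stand-in, and every candidate number is invented:

```python
def pareto_front(solutions):
    """Keep solutions not dominated in (error, channels); both objectives minimized."""
    front = []
    for s in solutions:
        dominated = any(
            o["error"] <= s["error"] and o["channels"] <= s["channels"]
            and (o["error"] < s["error"] or o["channels"] < s["channels"])
            for o in solutions
        )
        if not dominated:
            front.append(s)
    return front

# Hypothetical channel-selection candidates (error rate, number of channels)
candidates = [
    {"error": 0.30, "channels": 4},
    {"error": 0.32, "channels": 2},
    {"error": 0.29, "channels": 10},
    {"error": 0.35, "channels": 3},   # dominated by (0.32, 2)
    {"error": 0.27, "channels": 22},  # all 22 channels
]
front = pareto_front(candidates)

# Stand-in for the FIS: a fixed trade-off between error and channel count,
# with channels normalized by the full montage size (22).
best = min(front, key=lambda s: 0.8 * s["error"] + 0.2 * s["channels"] / 22)
print(best)  # → {'error': 0.32, 'channels': 2}
```

The interesting property, mirrored in the paper's result, is that a small channel subset on the front can trade a slight accuracy loss for a large reduction in channels.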

62 citations

Proceedings Article
01 Jan 2000
TL;DR: This work develops a method for combining phoneme probabilities generated by different acoustic models trained on distinct feature extraction processes, obtaining relative word error rate improvements larger than 20% on a large-vocabulary speaker-independent continuous speech recognition task.
Abstract: The combination of multiple sources of information has been an attractive approach in different areas. That is the case in speech recognition, where several combination methods have been presented. Our hybrid MLP/HMM systems use acoustic models based on different sets of features and different MLP classifier structures. In this work we developed a method for combining the phoneme probabilities generated by the different acoustic models trained on distinct feature extraction processes. Two algorithms were implemented for combining the acoustic model probabilities: the first performs the combination in the probability domain, and the second in the log-probability domain. We combined two and three alternative baseline systems, where it was possible to obtain relative improvements in word error rate larger than 20% for a large-vocabulary speaker-independent continuous speech recognition task.
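The two combination domains described above reduce to a weighted arithmetic mean of posteriors versus a weighted geometric mean (a sum in log space). A minimal sketch; the phoneme posteriors and equal weights are invented for illustration:

```python
import math

def combine_linear(prob_sets, weights):
    """Probability-domain combination: weighted arithmetic mean per phoneme."""
    return [sum(w * p[k] for w, p in zip(weights, prob_sets))
            for k in range(len(prob_sets[0]))]

def combine_log(prob_sets, weights):
    """Log-probability-domain combination: weighted sum of log posteriors
    (a weighted geometric mean), renormalized to sum to 1."""
    raw = [math.exp(sum(w * math.log(p[k]) for w, p in zip(weights, prob_sets)))
           for k in range(len(prob_sets[0]))]
    z = sum(raw)
    return [r / z for r in raw]

# Posteriors over three phonemes from two hypothetical MLP acoustic models
mlp_a = [0.7, 0.2, 0.1]
mlp_b = [0.5, 0.4, 0.1]
lin = combine_linear([mlp_a, mlp_b], [0.5, 0.5])  # approximately [0.6, 0.3, 0.1]
log = combine_log([mlp_a, mlp_b], [0.5, 0.5])
```

The log-domain rule penalizes phonemes that any single model scores low, which is one common rationale for choosing it over the linear rule.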

62 citations

Proceedings ArticleDOI
15 May 2006
TL;DR: This paper describes an application of SVMs to speaker verification and shows a 9% absolute improvement in equal error rate and a 33% relative improvement in minimum detection cost function when compared to a comparable HMM baseline system.
Abstract: Support vector machines (SVM) have become a very popular pattern recognition algorithm for speech processing. In this paper we describe an application of SVMs to speaker verification. Traditionally, speaker verification systems have used hidden Markov models (HMM) and Gaussian mixture models (GMM). These classifiers are based on generative models, are prone to overfitting, and do not directly optimize discrimination. SVMs, which are based on the principle of structural risk minimization, are binary classifiers that maximize the margin between two classes. The power of SVMs lies in their ability to transform data to a higher-dimensional space and to construct a linear binary classifier in this space. Experiments were conducted on the NIST 2003 speaker recognition evaluation dataset. The SVM training was made computationally feasible by selecting only a small subset of vectors for building the out-of-class data. The results obtained using the SVMs showed a 9% absolute improvement in equal error rate and a 33% relative improvement in minimum detection cost function when compared to a comparable HMM baseline system.
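The equal error rate improved above is the operating point where the false-accept rate (impostors accepted) equals the false-reject rate (targets rejected). A minimal threshold sweep over invented score lists:

```python
def equal_error_rate(target_scores, impostor_scores):
    """Sweep candidate thresholds; return the error rate at the point where
    the false-accept and false-reject rates are closest (their mean)."""
    best_gap, eer = 1.0, None
    for t in sorted(target_scores + impostor_scores):
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Hypothetical verification scores: higher should mean "same speaker"
targets = [0.9, 0.8, 0.7, 0.4]
impostors = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(targets, impostors))  # → 0.25
```

Real evaluations interpolate between thresholds on much larger score sets, but the crossing-point idea is the same.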

62 citations

Posted Content
TL;DR: The extension and optimization of previous work on very deep convolutional neural networks for effective recognition of noisy speech in the Aurora 4 task are described, and it is shown that state-level weighted log-likelihood score combination in a joint acoustic model decoding scheme is very effective.
Abstract: This paper describes the extension and optimization of our previous work on very deep convolutional neural networks (CNNs) for effective recognition of noisy speech in the Aurora 4 task. The appropriate number of convolutional layers, the sizes of the filters, the pooling operations and the input feature maps are all modified: the filter and pooling sizes are reduced and the dimensions of the input feature maps are extended to allow adding more convolutional layers. Furthermore, appropriate input padding and input feature map selection strategies are developed. In addition, an adaptation framework is developed that jointly trains the very deep CNN with auxiliary i-vector and fMLLR features. These modifications give substantial word error rate reductions over the standard CNN used as a baseline. Finally, the very deep CNN is combined with an LSTM-RNN acoustic model, and it is shown that state-level weighted log-likelihood score combination in a joint acoustic model decoding scheme is very effective. On the Aurora 4 task, the very deep CNN achieves a WER of 8.81%, which improves to 7.99% with auxiliary-feature joint training and to 7.09% with LSTM-RNN joint decoding.
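State-level weighted log-likelihood score combination reduces, per frame and HMM state, to a weighted sum of the two models' log scores, which the decoder then uses in place of either single model's score. A sketch with invented per-state numbers:

```python
import math

def combine_state_scores(loglik_a, loglik_b, weight=0.5):
    """Frame-level combination of two acoustic models' per-state log-likelihoods:
    log p = w * log p_a + (1 - w) * log p_b for each HMM state."""
    return [weight * a + (1 - weight) * b
            for a, b in zip(loglik_a, loglik_b)]

# Hypothetical per-state likelihoods for one frame from a CNN and an LSTM-RNN
cnn = [math.log(0.6), math.log(0.3), math.log(0.1)]
lstm = [math.log(0.5), math.log(0.4), math.log(0.1)]

combined = combine_state_scores(cnn, lstm, weight=0.7)
best_state = max(range(len(combined)), key=combined.__getitem__)
print(best_state)  # → 0 (both models favor state 0, so the combination does too)
```

In an actual decoder these combined scores would feed the Viterbi search rather than a per-frame argmax; the weight is typically tuned on held-out data.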

62 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 88% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Convolutional neural network: 74.7K papers, 2M citations, 85% related
Artificial neural network: 207K papers, 4.5M citations, 84% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    271
2022    562
2021    640
2020    643
2019    633
2018    528