scispace - formally typeset
Topic

Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.


Papers
Journal ArticleDOI
TL;DR: It is shown that both the feature level fusion with the CMAES optimization algorithms and decision level fusion using a Bayesian network as a classifier improved system classification performance and can also be applied to other sensor fusion applications.
Abstract: The Cyranose 320 electronic nose (Enose) and zNose™ are two instruments used to detect volatile profiles. In this research, feature level and decision level multisensor data fusion models, combined with covariance matrix adaptation evolutionary strategy (CMAES), were developed to fuse the Enose and zNose data to improve detection and classification performance for damaged apples compared with using the individual instruments alone. Principal component analysis (PCA) was used for feature extraction and probabilistic neural networks (PNN) were developed as the classifier. Three feature-based fusion schemes were compared. Dynamic selective fusion achieved an average 1.8% and a best 0% classification error rate in a total of 30 independent runs. The static selective fusion approach resulted in a 6.1% classification error rate, which was not as good as using individual sensors (4.2% for the Enose and 2.6% for the zNose) if only selected features were applied. Simply adding the Enose and zNose features without selection (non-selective fusion) worsened the classification performance with a 32.5% classification error rate. This indicated that the feature selection using the CMAES is an indispensable process in multisensor data fusion, especially if multiple sources of sensors contain much irrelevant or redundant information. At the decision level, Bayesian network fusion achieved better performance than two individual sensors, with 11% error rate versus 13% error rate for the Enose and 20% error rate for the zNose. It is shown that both the feature level fusion with the CMAES optimization algorithms and decision level fusion using a Bayesian network as a classifier improved system classification performance. This methodology can also be applied to other sensor fusion applications.

105 citations

Journal ArticleDOI
TL;DR: This paper proposes the use of a coupled 3D convolutional neural network (3D CNN) architecture that can map both modalities into a representation space to evaluate the correspondence of audio–visual streams using the learned multimodal features.
Abstract: Audio–visual recognition (AVR) has been considered as a solution for speech recognition tasks when the audio is corrupted, as well as a visual recognition method used for speaker verification in multi-speaker scenarios. The approach of AVR systems is to leverage the extracted information from one modality to improve the recognition ability of the other modality by complementing the missing information. The essential problem is to find the correspondence between the audio and visual streams, which is the goal of this paper. We propose the use of a coupled 3D convolutional neural network (3D CNN) architecture that can map both modalities into a representation space to evaluate the correspondence of audio–visual streams using the learned multimodal features. The proposed architecture will incorporate both spatial and temporal information jointly to effectively find the correlation between temporal information for different modalities. By using a relatively small network architecture and much smaller data set for training, our proposed method surpasses the performance of the existing similar methods for audio–visual matching, which use 3D CNNs for feature representation. We also demonstrate that an effective pair selection method can significantly increase the performance. The proposed method achieves relative improvements over 20% on the equal error rate and over 7% on the average precision in comparison to the state-of-the-art method.

105 citations
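
The equal error rate (EER) cited in the entry above is the operating point at which the false acceptance rate equals the false rejection rate. A minimal sketch of how it might be estimated from two lists of similarity scores follows. The function name `equal_error_rate`, the score lists, and the convention that a higher score means a more likely match are assumptions made for illustration; this is not the evaluation code used by any paper listed here.

```python
def equal_error_rate(genuine, impostor):
    """Estimate the EER from matching (genuine) and non-matching (impostor) scores.

    Assumption: a higher score means the pair is more likely a true match.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")

    best_gap = None
    best_eer = None
    # Sweep every observed score as a candidate decision threshold.
    for threshold in sorted(set(genuine) | set(impostor)):
        # False acceptance: impostor pairs scored at or above the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)
        # False rejection: genuine pairs scored below the threshold.
        frr = sum(s < threshold for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            # Report the midpoint, since the two rates rarely cross exactly.
            best_eer = (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    genuine_scores = [0.9, 0.8, 0.7, 0.4]
    impostor_scores = [0.6, 0.3, 0.2, 0.1]
    print(equal_error_rate(genuine_scores, impostor_scores))
```

With finite score lists the two rates seldom coincide exactly, so this sketch returns the midpoint at the threshold where they are closest; interpolating between thresholds is a common refinement.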

Proceedings ArticleDOI
18 Jun 2012
TL;DR: An input method that enables complex hands-free interaction through 3D handwriting recognition, in which Hidden Markov Models generate the text and a statistical language model is used to enhance recognition performance and restrict the search space.
Abstract: We present an input method which enables complex hands-free interaction through 3D handwriting recognition. Users can write text in the air as if they were using an imaginary blackboard. Motion sensing is done wirelessly by accelerometers and gyroscopes which are attached to the back of the hand. We propose a two-stage approach for spotting and recognition of handwriting gestures. The spotting stage uses a Support Vector Machine to identify data segments which contain handwriting. The recognition stage uses Hidden Markov Models (HMM) to generate the text representation from the motion sensor data. Individual characters are modeled by HMMs and concatenated to word models. Our system can continuously recognize arbitrary sentences, based on a freely definable vocabulary with over 8000 words. A statistical language model is used to enhance recognition performance and restrict the search space. We report the results from a nine-user experiment on sentence recognition for person-dependent and person-independent setups on 3D-space handwriting data. For the person-independent setup, a word error rate of 11% is achieved; for the person-dependent setup, a word error rate of 3% is achieved. We evaluate the spotting algorithm in a second experiment on a realistic dataset including everyday activities and achieve a sample-based recall of 99% and a precision of 25%. We show that additional filtering in the recognition stage can detect up to 99% of the false positive segments.

105 citations
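
Word error rate, the metric reported in the entry above, is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference, divided by the number of reference words. The sketch below computes it with a word-level edit distance. The function name `word_error_rate`, whitespace tokenization, and the example sentences are assumptions for illustration, not details taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()  # assumption: words are separated by whitespace
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over six words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.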

Journal ArticleDOI
TL;DR: The effects of parameter settings in linguistic profiling, a technique in which large numbers of counts of linguistic features are used as a text profile which can then be compared to average profiles for groups of texts, are explored.
Abstract: This article explores the effects of parameter settings in linguistic profiling, a technique in which large numbers of counts of linguistic features are used as a text profile which can then be compared to average profiles for groups of texts. Although the technique proves to be quite effective for authorship verification, with the best overall parameter settings yielding an equal error rate of 3% on a test corpus of student essays, the optimal parameters vary greatly depending on author and evaluation criterion.

105 citations

Journal ArticleDOI
TL;DR: When deep learning SDAE is applied to IoT convergence-based intrusion security detection, the detection load can be reduced, the detection effect can be improved, and the operation is more secure and stable.
Abstract: In order to explore the application value of deep learning denoising autoencoder (DAE) in Internet-of-Things (IoT) fusion security, in this study, a hierarchical intrusion security detection model, stacked DAE support vector machine (SDAE-SVM), is constructed based on the three-layer neural network of self-encoder. The sample data after dimension reduction are obtained by layer-by-layer pretraining and fine-tuning. The traditional deep learning algorithms [stacked noise autoencoder (SNAE), stacked autoencoder (SAE), stacked contractive autoencoder (SCAE), stacked sparse autoencoder (SSAE), deep belief network (DBN)] are introduced to carry out the comparative simulation with the model in this study. The results show that when the encoder in the model is a 4-layer network structure, the accuracy rate (Ac) of the model is the highest (97.83%), the false-negative rate (Fn) (1.27%) and the false-positive rate (Fp) (3.21%) are the lowest. When the number of nodes in the first hidden layer is about 110, the model accuracy is about 98%. When comparing the model designed in this study with the common feature dimension reduction methods, the Ac, Fn, and Fp of this model are the best, which are 98.12%, 3.21%, and 1.27%, respectively. When compared with other deep learning algorithms of the same type, the recognition rate, Ac, error rate, and rejection rate show good results. In multiple data sets, the recognition rate, Ac, error rate, and rejection rate of the model in this study are always better than the traditional deep learning algorithms. In conclusion, when deep learning SDAE is applied to IoT convergence-based intrusion security detection, the detection load can be reduced, the detection effect can be improved, and the operation is more secure and stable.

104 citations
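
The accuracy rate, false-negative rate, and false-positive rate named in the entry above can be derived from the four cells of a binary confusion matrix. The abstract does not spell out its formulas, so the sketch below assumes the standard definitions; the function name `detection_rates` and the example counts are hypothetical and are not results from the paper.

```python
def detection_rates(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute accuracy, false-negative rate, and false-positive rate.

    Assumed standard definitions (not confirmed by the abstract):
      accuracy            = (tp + tn) / all samples
      false-negative rate = fn / (tp + fn)   # missed intrusions
      false-positive rate = fp / (fp + tn)   # false alarms on normal traffic
    """
    positives = tp + fn  # samples that really are intrusions
    negatives = fp + tn  # samples that really are normal traffic
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "false_negative_rate": fn / positives,
        "false_positive_rate": fp / negatives,
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    print(detection_rates(tp=90, fp=5, tn=95, fn=10))
```

Note that the two error rates use different denominators, so a model can have high accuracy on an imbalanced data set while still missing a large share of the rarer class.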


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 88% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Convolutional neural network: 74.7K papers, 2M citations, 85% related
Artificial neural network: 207K papers, 4.5M citations, 84% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    271
2022    562
2021    640
2020    643
2019    633
2018    528