
Showing papers on "Word error rate published in 2006"


Journal ArticleDOI
TL;DR: It is shown that using CV to compute an error estimate for a classifier that has itself been tuned using CV gives a significantly biased estimate of the true error.
Abstract: Cross-validation (CV) is an effective method for estimating the prediction error of a classifier. Some recent articles have proposed methods for optimizing classifiers by choosing classifier parameter values that minimize the CV error estimate. We have evaluated the validity of using the CV error estimate of the optimized classifier as an estimate of the true error expected on independent data. We used CV to optimize the classification parameters for two kinds of classifiers; Shrunken Centroids and Support Vector Machines (SVM). Random training datasets were created, with no difference in the distribution of the features between the two classes. Using these "null" datasets, we selected classifier parameter values that minimized the CV error estimate. 10-fold CV was used for Shrunken Centroids while Leave-One-Out-CV (LOOCV) was used for the SVM. Independent test data was created to estimate the true error. With "null" and "non null" (with differential expression between the classes) data, we also tested a nested CV procedure, where an inner CV loop is used to perform the tuning of the parameters while an outer CV is used to compute an estimate of the error. The CV error estimate for the classifier with the optimal parameters was found to be a substantially biased estimate of the true error that the classifier would incur on independent data. Even though there is no real difference between the two classes for the "null" datasets, the CV error estimate for the Shrunken Centroid with the optimal parameters was less than 30% on 18.5% of simulated training data-sets. For SVM with optimal parameters the estimated error rate was less than 30% on 38% of "null" data-sets. Performance of the optimized classifiers on the independent test set was no better than chance. The nested CV procedure reduces the bias considerably and gives an estimate of the error that is very close to that obtained on the independent testing set for both Shrunken Centroids and SVM classifiers for "null" and "non-null" data distributions. We show that using CV to compute an error estimate for a classifier that has itself been tuned using CV gives a significantly biased estimate of the true error. Proper use of CV for estimating true error of a classifier developed using a well defined algorithm requires that all steps of the algorithm, including classifier parameter tuning, be repeated in each CV loop. A nested CV procedure provides an almost unbiased estimate of the true error.
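The biased-versus-nested contrast described above is easy to reproduce. The following is a minimal sketch (not the authors' code): it uses scikit-learn's SVM on simulated "null" data, and the data sizes, fold counts and parameter grid are arbitrary choices for illustration.

import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1000))       # "null" data: features carry no class signal
y = np.array([0, 1] * 20)

param_grid = {"C": [0.1, 1, 10, 100]}
inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

# Biased estimate: tune the parameters with CV, then report the same CV score.
tuner = GridSearchCV(SVC(kernel="linear"), param_grid, cv=inner_cv)
tuner.fit(X, y)
print("optimistically biased CV accuracy:", tuner.best_score_)

# Nearly unbiased estimate: repeat the tuning inside every outer CV fold.
nested = cross_val_score(GridSearchCV(SVC(kernel="linear"), param_grid, cv=inner_cv),
                         X, y, cv=outer_cv)
print("nested CV accuracy (close to chance on null data):", nested.mean())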

1,314 citations


Journal ArticleDOI
TL;DR: A pairwise error probability (PEP) expression is derived and the transfer function technique is applied in conjunction with the derived PEP to obtain upper bounds on the bit error rate.
Abstract: Error control coding can be used over free-space optical (FSO) links to mitigate turbulence-induced fading. In this paper, we derive error performance bounds for coded FSO communication systems operating over atmospheric turbulence channels, considering the recently introduced gamma-gamma turbulence model. We derive a pairwise error probability (PEP) expression and then apply the transfer function technique in conjunction with the derived PEP to obtain upper bounds on the bit error rate. Simulation results are also presented to confirm the analytical results.
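For reference, the gamma-gamma turbulence model mentioned above describes the received irradiance I through the following standard density (written in the conventional notation, which may differ from the paper's):

f_I(I) = \frac{2\,(\alpha\beta)^{(\alpha+\beta)/2}}{\Gamma(\alpha)\,\Gamma(\beta)}\, I^{\frac{\alpha+\beta}{2}-1}\, K_{\alpha-\beta}\!\left(2\sqrt{\alpha\beta I}\right), \qquad I > 0,

where K_\nu(\cdot) is the modified Bessel function of the second kind and \alpha, \beta are set by the large- and small-scale scintillation strength. The PEP and the transfer-function (union) bound on the bit error rate are then obtained by averaging the conditional error probability over this density.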

444 citations


Journal ArticleDOI
TL;DR: This paper presents a complete framework that starts with the extraction of various local regions of either discontinuity or homogeneity, and uses Boosting to learn a subset of feature vectors (weak hypotheses) and to combine them into one final hypothesis for each visual category.
Abstract: This paper explores the power and the limitations of weakly supervised categorization. We present a complete framework that starts with the extraction of various local regions of either discontinuity or homogeneity. A variety of local descriptors can be applied to form a set of feature vectors for each local region. Boosting is used to learn a subset of such feature vectors (weak hypotheses) and to combine them into one final hypothesis for each visual category. This combination of individual extractors and descriptors leads to recognition rates that are superior to other approaches which use only one specific extractor/descriptor setting. To explore the limitation of our system, we had to set up new, highly complex image databases that show the objects of interest at varying scales and poses, in cluttered background, and under considerable occlusion. We obtain classification results up to 81 percent ROC-equal error rate on the most complex of our databases. Our approach outperforms all comparable solutions on common databases.

422 citations


Proceedings ArticleDOI
04 Jun 2006
TL;DR: It is shown that WASP performs favorably in terms of both accuracy and coverage compared to existing learning methods requiring a similar amount of supervision, and shows better robustness to variations in task complexity and word order.

Abstract: We present a novel statistical approach to semantic parsing, WASP, for constructing a complete, formal meaning representation of a sentence. A semantic parser is learned given a set of sentences annotated with their correct meaning representations. The main innovation of WASP is its use of state-of-the-art statistical machine translation techniques. A word alignment model is used for lexical acquisition, and the parsing model itself can be seen as a syntax-based translation model. We show that WASP performs favorably in terms of both accuracy and coverage compared to existing learning methods requiring a similar amount of supervision, and shows better robustness to variations in task complexity and word order.

306 citations


Proceedings Article
01 May 2006
TL;DR: A framework for classification of the errors of a machine translation system is presented and an error analysis of the system used by the RWTH in the first TC-STAR evaluation is carried out.
Abstract: Evaluation of automatic translation output is a difficult task. Several performance measures like Word Error Rate, Position Independent Word Error Rate and the BLEU and NIST scores are widely used and provide a useful tool for comparing different systems and evaluating improvements within a system. However, the interpretation of all of these measures is not at all clear, and the identification of the most prominent source of errors in a given system using these measures alone is not possible. Therefore some analysis of the generated translations is needed in order to identify the main problems and to focus the research efforts. This area is, however, mostly unexplored, and few works have dealt with it until now. In this paper we present a framework for classification of the errors of a machine translation system and carry out an error analysis of the system used by the RWTH in the first TC-STAR evaluation.

293 citations


Journal ArticleDOI
TL;DR: This article employs conditional random fields (CRFs) for the task of extracting various common fields from the headers and citations of research papers, and presents a novel approach for constraint co-reference information extraction.
Abstract: With the increasing use of research paper search engines, such as CiteSeer, for both literature search and hiring decisions, the accuracy of such systems is of paramount importance. This article employs conditional random fields (CRFs) for the task of extracting various common fields from the headers and citations of research papers. CRFs provide a principled way for incorporating various local features, external lexicon features and global layout features. The basic theory of CRFs is becoming well-understood, but best practices for applying them to real-world data require additional exploration. We make an empirical exploration of several factors, including variations on Gaussian, Laplace and hyperbolic-L1 priors for improved regularization, and several classes of features. Based on CRFs, we further present a novel approach for constraint co-reference information extraction; i.e., improving extraction performance given that we know some citations refer to the same publication. On a standard benchmark dataset, we achieve new state-of-the-art performance, reducing error in average F1 by 36%, and word error rate by 78% in comparison with the previous best SVM results. Accuracy compares even more favorably against HMMs. On four co-reference IE datasets, our system significantly improves extraction performance, with an error rate reduction of 6-14%.
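As a rough illustration of the modeling ingredient involved (not the authors' implementation, which adds lexicon and layout features and the co-reference constraints), a linear-chain CRF for header field extraction can be set up with the sklearn-crfsuite package along these lines; the feature function, labels and toy data are hypothetical:

import sklearn_crfsuite  # pip install sklearn-crfsuite

def token_features(tokens, i):
    # Simple local features only; the paper also uses external lexicons and layout.
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>",
    }

# One toy header, tokenized, with per-token field labels.
sentences = [["Conditional", "Random", "Fields", "John", "Doe", "MIT"]]
labels = [["title", "title", "title", "author", "author", "affiliation"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X)[0])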

249 citations


Journal ArticleDOI
TL;DR: This paper focuses on optimizing vector quantization (VQ) based speaker identification, which reduces the number of test vectors by pre-quantizing the test sequence prior to matching, and the number of speakers by pruning out unlikely speakers during the identification process.
Abstract: In speaker identification, most of the computation originates from the distance or likelihood computations between the feature vectors of the unknown speaker and the models in the database. The identification time depends on the number of feature vectors, their dimensionality, the complexity of the speaker models and the number of speakers. In this paper, we concentrate on optimizing vector quantization (VQ) based speaker identification. We reduce the number of test vectors by pre-quantizing the test sequence prior to matching, and the number of speakers by pruning out unlikely speakers during the identification process. The best variants are then generalized to Gaussian mixture model (GMM) based modeling. We apply the algorithms also to efficient cohort set search for score normalization in speaker verification. We obtain a speed-up factor of 16:1 in the case of VQ-based modeling with minor degradation in the identification accuracy, and 34:1 in the case of GMM-based modeling. An equal error rate of 7% can be reached in 0.84 s on average when the length of test utterance is 30.4 s.
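The two speed-ups described above (pre-quantization of the test sequence and speaker pruning) can be sketched as follows; the codebooks, sizes, chunk length and pruning rule are invented for illustration, whereas the real system works on trained speaker codebooks and spectral features.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
dim, n_speakers = 12, 20
codebooks = {s: rng.normal(loc=0.05 * s, size=(64, dim)) for s in range(n_speakers)}
test_vectors = rng.normal(loc=0.15, size=(3000, dim))   # features of the unknown speaker

# 1) Pre-quantization: replace the 3000 test vectors by a small set of centroids.
prequant = KMeans(n_clusters=32, n_init=3, random_state=0).fit(test_vectors).cluster_centers_

def avg_distortion(vectors, codebook):
    # Average nearest-codeword squared distance (VQ distortion).
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.min(axis=1).mean()

# 2) Speaker pruning: score everyone on a small chunk first, keep the best few,
#    then score only the survivors on the full pre-quantized sequence.
coarse = {s: avg_distortion(prequant[:8], cb) for s, cb in codebooks.items()}
survivors = sorted(coarse, key=coarse.get)[:5]
final = {s: avg_distortion(prequant, codebooks[s]) for s in survivors}
print("identified speaker:", min(final, key=final.get))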

248 citations


Proceedings ArticleDOI
14 May 2006
TL;DR: This paper deals with phoneme recognition based on neural networks (NN), focuses on temporal patterns (TRAPs) and novel split temporal context (STC) phoneme recognizers, and investigates tandem NN architectures.
Abstract: This paper deals with phoneme recognition based on neural networks (NN). First, several approaches to improve the phoneme error rate are suggested and discussed. In the experimental part, we concentrate on TempoRAl Patterns (TRAPs) and novel split temporal context (STC) phoneme recognizers. We also investigate tandem NN architectures. The results of the final system reported on the standard TIMIT database compare favorably to the best published results.

236 citations


Proceedings ArticleDOI
04 Jun 2006
TL;DR: This work investigates prototype-driven learning for primarily unsupervised sequence modeling, where prior knowledge is specified declaratively, by providing a few canonical examples of each target annotation label, then propagated across a corpus using distributional similarity features in a log-linear generative model.
Abstract: We investigate prototype-driven learning for primarily unsupervised sequence modeling. Prior knowledge is specified declaratively, by providing a few canonical examples of each target annotation label. This sparse prototype information is then propagated across a corpus using distributional similarity features in a log-linear generative model. On part-of-speech induction in English and Chinese, as well as an information extraction task, prototype features provide substantial error rate reductions over competitive baselines and outperform previous work. For example, we can achieve an English part-of-speech tagging accuracy of 80.5% using only three examples of each tag and no dictionary constraints. We also compare to semi-supervised learning and discuss the system's error trends.

209 citations


Proceedings ArticleDOI
M. Wenk, Martin Zellweger, Andreas Burg, Norbert Felber, Wolfgang Fichtner
21 May 2006
TL;DR: In this paper, a parallel implementation of the K-best algorithm for MIMO systems is presented, which achieves up to 424 Mbps throughput with an area that is almost on par with current state-of-the-art implementations.
Abstract: From an error rate performance perspective, maximum likelihood (ML) detection is the preferred detection method for multiple-input multiple-output (MIMO) communication systems. However, for high transmission rates a straightforward exhaustive-search implementation suffers from prohibitive complexity. The K-best algorithm provides close-to-ML bit error rate (BER) performance, while its circuit complexity is reduced compared to an exhaustive search. In this paper, a new VLSI architecture for the implementation of the K-best algorithm is presented. Instead of the mostly sequential processing that has been applied in previous VLSI implementations of the algorithm, the presented solution takes a more parallel approach. Furthermore, the application of a simplified norm is discussed. The implementation in an ASIC achieves up to 424 Mbps throughput with an area that is almost on par with current state-of-the-art implementations.
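For readers unfamiliar with the algorithm being mapped to hardware, a toy software version of breadth-first K-best detection for y = Hx + n with BPSK symbols looks roughly like the following (the dimensions, K and noise level are arbitrary, and the paper's simplified norm and architectural choices are not modeled):

import numpy as np

def k_best_detect(y, H, K=4, alphabet=(-1.0, 1.0)):
    nt = H.shape[1]
    Q, R = np.linalg.qr(H)
    z = Q.T @ y
    candidates = [(0.0, [])]   # (partial Euclidean distance, symbols for antennas level..nt-1)
    for level in range(nt - 1, -1, -1):
        expanded = []
        for ped, syms in candidates:
            for s in alphabet:
                trial = [s] + syms
                interference = sum(R[level, level + j] * trial[j] for j in range(len(trial)))
                expanded.append((ped + (z[level] - interference) ** 2, trial))
        candidates = sorted(expanded, key=lambda c: c[0])[:K]   # keep only the K best partial paths
    return np.array(candidates[0][1])

rng = np.random.default_rng(1)
H = rng.normal(size=(4, 4))
x = rng.choice([-1.0, 1.0], size=4)
y = H @ x + 0.05 * rng.normal(size=4)
print("sent:", x, "detected:", k_best_detect(y, H))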

166 citations


Journal ArticleDOI
TL;DR: This paper proposes methods for a tighter integration of ASR and SLU using word confusion networks (WCNs), which provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy.

Journal ArticleDOI
Ciprian Chelba, Alex Acero
TL;DR: A novel technique for maximum “a posteriori” (MAP) adaptation of maximum entropy (MaxEnt) and maximum entropy Markov models (MEMM) is presented, and an automatic capitalization error rate of 1.4% is achieved on BN data.

Patent
Shaoming Liu
22 Feb 2006
TL;DR: In this paper, a word translation device stores a first-language word search TRIE structure (100), a second-language word search TRIE structure (200), a first-language word information record (110), and a second-language word information record (210) in a bilingual dictionary.

Abstract: A word translation device stores a first-language word search TRIE structure (100), a second-language word search TRIE structure (200), a first-language word information record (110), and a second-language word information record (210) in a bilingual dictionary. The first-language word search TRIE structure (100) is provided for searching for words of the first language. The second-language word search TRIE structure (200) is provided for searching for words of the second language. The first-language word information record (110) includes first translation information for identifying a translation of each of the words stored in the first-language word search TRIE structure (100). The second-language word information record (210) includes second translation information for identifying a translation of each of the words stored in the second-language word search TRIE structure (200). By referring to the first and second word information records (110) and (210), word translation between the first and second languages can be performed with high precision at high speeds.
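A minimal sketch of the TRIE-based lookup the patent describes is given below; the dictionary content and the record layout are invented for illustration.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.record = None            # word information record (e.g., translation data)

class WordSearchTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, record):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.record = record

    def lookup(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.record

# First-language trie whose records point at second-language translations.
en_trie = WordSearchTrie()
en_trie.insert("word", {"translation": "mot"})
en_trie.insert("error", {"translation": "erreur"})
print(en_trie.lookup("error"))        # {'translation': 'erreur'}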

Journal ArticleDOI
TL;DR: This article presents a language-independent algorithm for discovering word fragments from text in an unsupervised manner; it uses the Minimum Description Length principle to find an inventory of word fragments that is compact but models the training text effectively.

Journal ArticleDOI
TL;DR: The current work presents an algorithm for the automated analysis of character-level errors in input streams for unconstrained text entry evaluations and presents new character-level metrics that can aid method designers in refining text entry methods.
Abstract: Recent improvements in text entry error rate measurement have enabled the running of text entry experiments in which subjects are free to correct errors (or not) as they transcribe a presented string. In these “unconstrained” experiments, it is no longer necessary to force subjects to unnaturally maintain synchronicity with presented text for the sake of performing overall error rate calculations. However, the calculation of character-level error rates, which can be trivial in artificially constrained evaluations, is far more complicated in unconstrained text entry evaluations because it is difficult to infer a subject's intention at every character. For this reason, prior character-level error analyses for unconstrained experiments have only compared presented and transcribed strings, not input streams. But input streams are rich sources of character-level error information, since they contain all of the text entered (and erased) by a subject. The current work presents an algorithm for the automated analysis of character-level errors in input streams for unconstrained text entry evaluations. It also presents new character-level metrics that can aid method designers in refining text entry methods. To exercise these metrics, we perform two analyses on data from an actual text entry experiment. One analysis, available from the prior work, uses only presented and transcribed strings. The other analysis uses input streams, as described in the current work. The results confirm that input stream error analysis yields richer information for the same empirical data. To facilitate the use of these new analyses, we offer pseudocode and downloadable software for performing unconstrained text entry experiments and analyzing data.
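As a toy approximation of the kind of input-stream bookkeeping involved (this is not the paper's algorithm, which infers the intended character at every position; the backspace convention and function names below are invented), corrected and uncorrected errors can be tallied like this:

def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def stream_stats(presented, input_stream):
    # '<' denotes a backspace keystroke in the recorded input stream.
    transcribed, corrected = [], 0
    for key in input_stream:
        if key == "<":
            if transcribed:
                transcribed.pop()
                corrected += 1        # a character that was entered and later erased
        else:
            transcribed.append(key)
    transcribed = "".join(transcribed)
    uncorrected = levenshtein(presented, transcribed)   # crude stand-in for uncorrected errors
    correct = len(transcribed) - uncorrected
    total_error_rate = (corrected + uncorrected) / max(1, correct + corrected + uncorrected)
    return transcribed, corrected, uncorrected, total_error_rate

print(stream_stats("the cat", "teh<<he cat"))   # ('the cat', 2, 0, 0.222...)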

Proceedings ArticleDOI
17 Jul 2006
TL;DR: A maximum entropy approach for restoring diacritics in a document that can easily integrate and make effective use of diverse types of information and integrates a wide array of lexical, segment-based and part-of-speech tag features.
Abstract: Short vowels and other diacritics are not part of written Arabic scripts. Exceptions are made for important political and religious texts and in scripts for beginning students of Arabic. Scripts without diacritics have considerable ambiguity because many words with different diacritic patterns appear identical in a diacritic-less setting. We propose in this paper a maximum entropy approach for restoring diacritics in a document. The approach can easily integrate and make effective use of diverse types of information; the model we propose integrates a wide array of lexical, segment-based and part-of-speech tag features. The combination of these feature types leads to a state-of-the-art diacritization model. Using a publicly available corpus (LDC's Arabic Treebank Part 3), we achieve a diacritic error rate of 5.1%, a segment error rate of 8.5%, and a word error rate of 17.3%. In a case-ending-less setting, we obtain a diacritic error rate of 2.2%, a segment error rate of 4.0%, and a word error rate of 7.2%.
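Schematically, a maximum-entropy diacritizer is a multinomial logistic-regression classifier over per-character features; the sketch below uses an invented romanized toy example and a feature set far poorer than the lexical, segment-based and POS features the paper combines.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def char_features(word, i):
    return {
        "char": word[i],
        "prev": word[i - 1] if i > 0 else "<B>",
        "next": word[i + 1] if i + 1 < len(word) else "<E>",
        "pos": i,
    }

# (undiacritized word, per-character diacritic labels); '_' means no diacritic.
train = [("ktb", ["a", "a", "a"]), ("ktab", ["i", "_", "_", "_"])]
X, y = [], []
for word, labels in train:
    for i, lab in enumerate(labels):
        X.append(char_features(word, i))
        y.append(lab)

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)
print(clf.predict(vec.transform([char_features("ktb", i) for i in range(3)])))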

Proceedings Article
01 Jan 2006
TL;DR: This paper demonstrates how to train phoneme-based acoustic models with carefully designed electromyographic feature extraction methods that decompose the signal into different feature spaces, keeping the useful information while reducing the noise.
Abstract: We present our research on continuous speech recognition of the surface electromyographic signals that are generated by the human articulatory muscles. Previous research on electromyographic speech recognition was limited to isolated word recognition because it was very difficult to train phoneme-based acoustic models for the electromyographic speech recognizer. In this paper, we demonstrate how to train the phoneme-based acoustic models with carefully designed electromyographic feature extraction methods. By decomposing the signal into different feature spaces, we successfully keep the useful information while reducing the noise. Additionally, we model the anticipatory effect of the electromyographic signals compared to the speech signal. With a 108-word decoding vocabulary, the experimental results show that the word error rate improves from 86.8% to 32.0% by using our novel feature extraction methods. Index Terms: speech recognition, electromyography, articulatory muscles, feature extraction.

Proceedings ArticleDOI
Yaser Al-Onaizan, Kishore Papineni
17 Jul 2006
TL;DR: A new distortion model is proposed that can be used with existing phrase-based SMT decoders to address n-gram language model limitations and a novel metric to measure word order similarity (or difference) between any pair of languages based on word alignments is proposed.
Abstract: In this paper, we argue that n-gram language models are not sufficient to address word reordering required for Machine Translation. We propose a new distortion model that can be used with existing phrase-based SMT decoders to address those n-gram language model limitations. We present empirical results in Arabic to English Machine Translation that show statistically significant improvements when our proposed model is used. We also propose a novel metric to measure word order similarity (or difference) between any pair of languages based on word alignments.

Journal ArticleDOI
TL;DR: A novel method to estimate continuous-density hidden Markov model (CDHMM) for speech recognition according to the principle of maximizing the minimum multiclass separation margin by using a penalized gradient descent algorithm.
Abstract: In this paper, motivated by large margin classifiers in machine learning, we propose a novel method to estimate continuous-density hidden Markov model (CDHMM) for speech recognition according to the principle of maximizing the minimum multiclass separation margin. The approach is named large margin HMM. First, we show this type of large margin HMM estimation problem can be formulated as a constrained minimax optimization problem. Second, we propose to solve this constrained minimax optimization problem by using a penalized gradient descent algorithm, where the original objective function, i.e., minimum margin, is approximated by a differentiable function and the constraints are cast as penalty terms in the objective function. The new training method is evaluated in the speaker-independent isolated E-set recognition and the TIDIGITS connected digit string recognition tasks. Experimental results clearly show that the large margin HMMs consistently outperform the conventional HMM training methods. It has been consistently observed that the large margin training method yields significant recognition error rate reduction even on top of some popular discriminative training methods
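Written out generically (this reconstruction follows the description above rather than the paper's exact notation; the smoothing constant \eta and penalty weights \kappa_k are illustrative), the margin and the differentiable surrogate that is optimized by gradient methods are roughly:

d_i(\Lambda) = \log p(X_i \mid \lambda_{y_i}) - \max_{j \neq y_i} \log p(X_i \mid \lambda_j),

\Lambda^{*} = \arg\max_{\Lambda}\, \min_{i \in \mathcal{S}} d_i(\Lambda)
            \approx \arg\max_{\Lambda} \Big[ -\tfrac{1}{\eta} \log \sum_{i \in \mathcal{S}} e^{-\eta\, d_i(\Lambda)} - \sum_{k} \kappa_k\, g_k(\Lambda) \Big],

where d_i is the multiclass separation margin of training utterance X_i with label y_i, \mathcal{S} is the set of support tokens, the soft-min replaces the non-differentiable minimum, and the g_k terms penalize violations of the original constraints.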

Proceedings ArticleDOI
17 Jul 2006
TL;DR: A semi-supervised approach to training for statistical machine translation that alternates the traditional Expectation Maximization step that is applied on a large training corpus with a discriminative step aimed at increasing word-alignment quality on a small, manually word-aligned sub-corpus is introduced.
Abstract: We introduce a semi-supervised approach to training for statistical machine translation that alternates the traditional Expectation Maximization step that is applied on a large training corpus with a discriminative step aimed at increasing word-alignment quality on a small, manually word-aligned sub-corpus. We show that our algorithm leads not only to improved alignments but also to machine translation outputs of higher quality.

Journal ArticleDOI
TL;DR: This paper will give a detailed review of quantile equalization applied to the Mel scaled filter bank, including considerations about the application in online systems and improvements through a second transformation step that combines neighboring filter channels.
Abstract: The noise robustness of automatic speech recognition systems can be improved by reducing any mismatch between the training and test data distributions during feature extraction. Based on the quantiles of these distributions, the parameters of transformation functions can be reliably estimated with small amounts of data. This paper gives a detailed review of quantile equalization applied to the Mel-scaled filter bank, including considerations about the application in online systems and improvements through a second transformation step that combines neighboring filter channels. The recognition tests have shown that previous experimental observations on small-vocabulary recognition tasks can be confirmed on the larger-vocabulary Aurora 4 noisy Wall Street Journal database. The word error rate could be reduced from 45.7% to 25.5% (clean training) and from 19.5% to 17.0% (multicondition training).
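A minimal, non-parametric variant of the quantile idea for a single filter-bank channel is sketched below (the paper estimates parametric transformation functions from the quantiles and additionally combines neighboring channels; the distributions here are synthetic):

import numpy as np

def quantile_equalize(test_channel, train_channel, n_quantiles=4):
    q = np.linspace(0.0, 1.0, n_quantiles + 1)
    test_q = np.quantile(test_channel, q)
    train_q = np.quantile(train_channel, q)
    # Piecewise-linear mapping that sends each test quantile onto the training quantile.
    return np.interp(test_channel, test_q, train_q)

rng = np.random.default_rng(0)
train = rng.gamma(shape=2.0, scale=1.0, size=5000)               # "clean" training channel
test = 0.5 * rng.gamma(shape=2.0, scale=1.0, size=500) + 1.0     # mismatched test channel
equalized = quantile_equalize(test, train)
print(np.quantile(test, [0.25, 0.5, 0.75]), "->", np.quantile(equalized, [0.25, 0.5, 0.75]))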

Proceedings Article
01 Jan 2006
TL;DR: The gains seen with cross-system adaptation and system combination methods are demonstrated, and it is shown that sequences of adaptation and decoding make it possible to incrementally improve the performance of the recognition system.
Abstract: Cross-system adaptation and system combination methods, such as ROVER and confusion network combination, are known to lower the word error rate of speech recognition systems. They require the training of systems that are reasonably close in performance but at the same time produce output that differs in its errors. This provides complementary information which leads to performance improvements. In this paper we demonstrate the gains we have seen with cross-system adaptation and system combination on the English EPPS and RT-05S lecture meeting task. We obtained the necessary varying systems by using different acoustic front-ends and phoneme sets on which our models are based. In a set of contrastive experiments we show the influence that the exchange of the components has on adaptation and system combination. Index Terms: automatic speech recognition, system combination, cross adaptation, EPPS, RT-05S. 1. Introduction. In state-of-the-art speech recognition systems it is common practice to use multi-pass systems with adaptation of the acoustic model in-between passes. The adaptation aims at better fitting the system to the speakers and/or acoustic environments found in the test data. It is usually performed on a by-speaker basis, obtained either from manual speaker labels or automatic clustering methods. Common adaptation methods try to transform either the models used in a system or the features to which the models are applied. Three adaptation methods that can be found in many state-of-the-art systems are Maximum Likelihood Linear Regression (MLLR) [1], a model transformation, Vocal Tract Length Normalization (VTLN) [2] and feature-space constrained MLLR (fMLLR) [3], two feature-transformation methods. Adaptation is performed in an unsupervised manner, such that the error-prone hypotheses obtained from the previous decoding pass are taken as the necessary reference for adaptation. Generally, the word error rates of the hypotheses obtained from the adapted systems are lower than those for the hypotheses on which the adaptation was performed. These sequences of adaptation and decoding make it possible to incrementally improve the performance of the recognition system. Unfortunately, this loop of adaptation and decoding does not always lead to significant improvements. Often, after two or three stages of adapting a system on its own output, no more gains can be obtained. This problem can be overcome by adapting a system

Journal ArticleDOI
TL;DR: A hidden Markov model (HMM) system to detect sentence boundaries that uses both prosodic and textual information is constructed and Bagging was found to significantly improve system performance for each of the sampling methods.

Posted Content
TL;DR: This paper outlines a three-step procedure for determining the low bit error rate performance curve of a wide class of LDPC codes of moderate length and allows one to efficiently see error performance at bit error rates that were previously out of reach of Monte Carlo methods.
Abstract: This paper outlines a three-step procedure for determining the low bit error rate performance curve of a wide class of LDPC codes of moderate length. The traditional method to estimate code performance in the higher SNR region is to use a sum of the contributions of the most dominant error events to the probability of error. These dominant error events will be both code and decoder dependent, consisting of low-weight codewords as well as non-codeword events if ML decoding is not used. For even moderate length codes, it is not feasible to find all of these dominant error events with a brute force search. The proposed method provides a convenient way to evaluate very low bit error rate performance of an LDPC code without requiring knowledge of the complete error event weight spectrum or resorting to a Monte Carlo simulation. This new method can be applied to various types of decoding such as the full belief propagation version of the message passing algorithm or the commonly used min-sum approximation to belief propagation. The proposed method allows one to efficiently see error performance at bit error rates that were previously out of reach of Monte Carlo methods. This result will provide a solid foundation for the analysis and design of LDPC codes and decoders that are required to provide a guaranteed very low bit error rate performance at certain SNRs.
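Schematically, the estimate has the union-of-dominant-events form (the notation here is generic; the paper's contribution is the three-step procedure for finding the dominant set and evaluating each term at very low error rates):

P_b(\gamma) \approx \sum_{e \in \mathcal{E}_{\mathrm{dom}}} \frac{w_b(e)}{k}\, \Pr\{\text{decoder commits error event } e \mid \gamma\},

where \gamma is the SNR, k the number of information bits per codeword, w_b(e) the number of information-bit errors caused by event e, and \mathcal{E}_{\mathrm{dom}} contains the low-weight codewords plus, for suboptimal iterative decoding, the dominant non-codeword events.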

Patent
31 Oct 2006
TL;DR: In this paper, a speech segment is indexed by identifying at least two alternative word sequences for the speech segment, and information is placed in an entry for the word in the index.
Abstract: A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value.
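A sketch of the indexing scheme in plain data structures might look as follows; the hypothesis format, probabilities and threshold are illustrative only.

from collections import defaultdict

def index_segment(index, segment_id, hypotheses, threshold=0.2):
    # hypotheses: at least two alternative word sequences with per-word probabilities.
    best = {}
    for words, probs in hypotheses:
        for word, p in zip(words, probs):
            best[word] = max(best.get(word, 0.0), p)
    for word, p in best.items():
        if p >= threshold:                 # eliminate unlikely entries by threshold
            index[word].append((segment_id, p))

index = defaultdict(list)
index_segment(index, "seg-001",
              [(["recognize", "speech"], [0.9, 0.8]),
               (["wreck", "a", "nice", "beach"], [0.1, 0.1, 0.15, 0.1])])
print(dict(index))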

Journal ArticleDOI
TL;DR: The results indicate that a small amount of phase error or uncertainty does not affect the recognition rate, but a large amount of phase uncertainty significantly affects the recognition rate.
Abstract: In this paper, we analyze the effects of uncertainty in the phase of speech signals on the word recognition error rate of human listeners. The motivating goal is to get a quantitative measure of the importance of phase in automatic speech recognition by studying the effects of phase uncertainty on human perception. Listening tests were conducted for 18 listeners under different phase uncertainty and signal-to-noise ratio (SNR) conditions. These results indicate that a small amount of phase error or uncertainty does not affect the recognition rate, but a large amount of phase uncertainty significantly affects it. The importance of phase also appears to be SNR-dependent: at lower SNRs the effects of phase uncertainty are more pronounced than at higher SNRs. For example, at an SNR of -10 dB, having random phases at all frequencies results in a word error rate (WER) of 63% compared to 24% if the phase was unaltered. In comparison, at 0 dB, random phase results in a 25% WER as compared to 11% for the unaltered phase case. Listening tests were also conducted for the case of reconstructed phase based on the least square error estimation approach. The results indicate that the recognition rate for the reconstructed phase case is very close to that of the perfect phase case (a WER difference of 4% on average).

Journal ArticleDOI
TL;DR: This paper reviews 3D face registration and recognition algorithms based solely on 3D shape information, analyzes methods based on the fusion of shape features, and shows that fusion schemes such as product rules, improved consensus voting, and a proposed serial fusion scheme improve classification accuracy.

Proceedings ArticleDOI
01 Nov 2006
TL;DR: A parallel-serial architecture is designed to map the decoder of any structured LDPC code in this large family to a hardware emulation platform and this new characterization leads to an improved decoding strategy and higher performance.
Abstract: Several high performance LDPC codes have parity-check matrices composed of permutation submatrices. We design a parallel-serial architecture to map the decoder of any structured LDPC code in this large family to a hardware emulation platform. A peak throughput of 240 Mb/s is achieved in decoding the (2048,1723) Reed-Solomon based LDPC (RS-LDPC) code. Experiments in the low bit error rate (BER) region provide statistics of the error traces, which are used to investigate the causes of the error floor. In a low precision implementation, the error floors are dominated by the fixed-point decoding effects, whereas in a higher precision implementation the errors are attributed to special configurations within the code, whose effect is exacerbated in a fixed-point decoder. This new characterization leads to an improved decoding strategy and higher performance.

Journal ArticleDOI
TL;DR: The design and evaluation of a speaker-independent audio-visual speech recognition (AVSR) system that utilizes a segment-based modeling strategy that allows the visual classifier to process visual frames with a constrained amount of asynchrony relative to proposed acoustic segments is presented.
Abstract: This paper presents the design and evaluation of a speaker-independent audio-visual speech recognition (AVSR) system that utilizes a segment-based modeling strategy. The audio and visual feature streams are integrated using a segment-constrained hidden Markov model, which allows the visual classifier to process visual frames with a constrained amount of asynchrony relative to proposed acoustic segments. The core experiments in this paper investigate several different visual model structures, each of which provides a different means for defining the units of the visual classifier and the synchrony constraints between the audio and visual streams. Word recognition experiments are conducted on the AV-TIMIT corpus under variable additive noise conditions. Over varying acoustic signal-to-noise ratios, word error rate reductions between 14% and 60% are observed when integrating the visual information into the automatic speech recognition process.

Journal ArticleDOI
TL;DR: A thorough and exact analysis of minimum-selection generalized selection combining (MS-GSC) is carried out, based on a new result on order statistics, and the closed-form expressions of important performance measures are derived for the Rayleigh fading scenario.
Abstract: Diversity combining techniques improve the performance of wireless communication systems at the cost of increased power consumption. The minimum-selection generalized selection combining (MS-GSC) scheme has been proposed as a power-saving implementation of the conventional generalized selection combining (GSC) scheme. In this paper, noting that previous analytical results on the error rate of MS-GSC are approximate, we carry out a thorough and exact analysis of MS-GSC. In particular, based on a new result on order statistics, we obtain the statistics of the combined SNR with MS-GSC, and we then apply these results to analyze the performance of MS-GSC over fading channels. We derive closed-form expressions of important performance measures, including outage probability and average error rate, for the Rayleigh fading scenario. In addition, we investigate the average number of active MRC branches with MS-GSC, as a quantification of the power saving.