
Showing papers on "Word error rate published in 2006"


Journal ArticleDOI
TL;DR: It is shown that using CV to compute an error estimate for a classifier that has itself been tuned using CV gives a significantly biased estimate of the true error.
Abstract: Cross-validation (CV) is an effective method for estimating the prediction error of a classifier. Some recent articles have proposed methods for optimizing classifiers by choosing classifier parameter values that minimize the CV error estimate. We have evaluated the validity of using the CV error estimate of the optimized classifier as an estimate of the true error expected on independent data. We used CV to optimize the classification parameters for two kinds of classifiers; Shrunken Centroids and Support Vector Machines (SVM). Random training datasets were created, with no difference in the distribution of the features between the two classes. Using these "null" datasets, we selected classifier parameter values that minimized the CV error estimate. 10-fold CV was used for Shrunken Centroids while Leave-One-Out-CV (LOOCV) was used for the SVM. Independent test data was created to estimate the true error. With "null" and "non null" (with differential expression between the classes) data, we also tested a nested CV procedure, where an inner CV loop is used to perform the tuning of the parameters while an outer CV is used to compute an estimate of the error. The CV error estimate for the classifier with the optimal parameters was found to be a substantially biased estimate of the true error that the classifier would incur on independent data. Even though there is no real difference between the two classes for the "null" datasets, the CV error estimate for the Shrunken Centroid with the optimal parameters was less than 30% on 18.5% of simulated training data-sets. For SVM with optimal parameters the estimated error rate was less than 30% on 38% of "null" data-sets. Performance of the optimized classifiers on the independent test set was no better than chance. The nested CV procedure reduces the bias considerably and gives an estimate of the error that is very close to that obtained on the independent testing set for both Shrunken Centroids and SVM classifiers for "null" and "non-null" data distributions. We show that using CV to compute an error estimate for a classifier that has itself been tuned using CV gives a significantly biased estimate of the true error. Proper use of CV for estimating true error of a classifier developed using a well defined algorithm requires that all steps of the algorithm, including classifier parameter tuning, be repeated in each CV loop. A nested CV procedure provides an almost unbiased estimate of the true error.
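The biased-versus-nested contrast described above is easy to reproduce. The following is a minimal sketch (not the authors' code): it uses scikit-learn's SVM on simulated "null" data, and the data sizes, fold counts and parameter grid are arbitrary choices for illustration.

import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 1000))       # "null" data: features carry no class signal
y = np.array([0, 1] * 20)

param_grid = {"C": [0.1, 1, 10, 100]}
inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)

# Biased estimate: tune the parameters with CV, then report the same CV score.
tuner = GridSearchCV(SVC(kernel="linear"), param_grid, cv=inner_cv)
tuner.fit(X, y)
print("optimistically biased CV accuracy:", tuner.best_score_)

# Nearly unbiased estimate: repeat the tuning inside every outer CV fold.
nested = cross_val_score(GridSearchCV(SVC(kernel="linear"), param_grid, cv=inner_cv),
                         X, y, cv=outer_cv)
print("nested CV accuracy (close to chance on null data):", nested.mean())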

1,314 citations


Journal ArticleDOI
TL;DR: A pairwise error probability (PEP) expression is derived and the transfer function technique is applied in conjunction with the derived PEP to obtain upper bounds on the bit error rate.
Abstract: Error control coding can be used over free-space optical (FSO) links to mitigate turbulence-induced fading. In this paper, we derive error performance bounds for coded FSO communication systems operating over atmospheric turbulence channels, considering the recently introduced gamma-gamma turbulence model. We derive a pairwise error probability (PEP) expression and then apply the transfer function technique in conjunction with the derived PEP to obtain upper bounds on the bit error rate. Simulation results are also presented to confirm the analytical results.
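For reference, the gamma-gamma turbulence model mentioned above describes the received irradiance I through the following standard density (written in the conventional notation, which may differ from the paper's):

f_I(I) = \frac{2\,(\alpha\beta)^{(\alpha+\beta)/2}}{\Gamma(\alpha)\,\Gamma(\beta)}\, I^{\frac{\alpha+\beta}{2}-1}\, K_{\alpha-\beta}\!\left(2\sqrt{\alpha\beta I}\right), \qquad I > 0,

where K_\nu(\cdot) is the modified Bessel function of the second kind and \alpha, \beta are set by the large- and small-scale scintillation strength. The PEP and the transfer-function (union) bound on the bit error rate are then obtained by averaging the conditional error probability over this density.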

444 citations


Journal ArticleDOI
TL;DR: This paper presents a complete framework that starts with the extraction of various local regions of either discontinuity or homogeneity, and uses Boosting to learn a subset of feature vectors (weak hypotheses) and to combine them into one final hypothesis for each visual category.
Abstract: This paper explores the power and the limitations of weakly supervised categorization. We present a complete framework that starts with the extraction of various local regions of either discontinuity or homogeneity. A variety of local descriptors can be applied to form a set of feature vectors for each local region. Boosting is used to learn a subset of such feature vectors (weak hypotheses) and to combine them into one final hypothesis for each visual category. This combination of individual extractors and descriptors leads to recognition rates that are superior to other approaches which use only one specific extractor/descriptor setting. To explore the limitation of our system, we had to set up new, highly complex image databases that show the objects of interest at varying scales and poses, in cluttered background, and under considerable occlusion. We obtain classification results up to 81 percent ROC-equal error rate on the most complex of our databases. Our approach outperforms all comparable solutions on common databases.

422 citations


Proceedings ArticleDOI
04 Jun 2006
TL;DR: It is shown that WASP performs favorably in terms of both accuracy and coverage compared to existing learning methods requiring a similar amount of supervision, and shows better robustness to variations in task complexity and word order.

Abstract: We present a novel statistical approach to semantic parsing, WASP, for constructing a complete, formal meaning representation of a sentence. A semantic parser is learned given a set of sentences annotated with their correct meaning representations. The main innovation of WASP is its use of state-of-the-art statistical machine translation techniques. A word alignment model is used for lexical acquisition, and the parsing model itself can be seen as a syntax-based translation model. We show that WASP performs favorably in terms of both accuracy and coverage compared to existing learning methods requiring a similar amount of supervision, and shows better robustness to variations in task complexity and word order.

306 citations


Proceedings Article
01 May 2006
TL;DR: A framework for classification of the errors of a machine translation system is presented and an error analysis of the system used by the RWTH in the first TC-STAR evaluation is carried out.
Abstract: Evaluation of automatic translation output is a difficult task. Several performance measures like Word Error Rate, Position Independent Word Error Rate and the BLEU and NIST scores are widely used and provide a useful tool for comparing different systems and evaluating improvements within a system. However, the interpretation of all of these measures is not at all clear, and the identification of the most prominent source of errors in a given system using these measures alone is not possible. Therefore some analysis of the generated translations is needed in order to identify the main problems and to focus the research efforts. This area is, however, mostly unexplored, and few works have dealt with it until now. In this paper we present a framework for classification of the errors of a machine translation system and carry out an error analysis of the system used by the RWTH in the first TC-STAR evaluation.

293 citations


Journal ArticleDOI
TL;DR: This article employs conditional random fields (CRFs) for the task of extracting various common fields from the headers and citations of research papers, and presents a novel approach for constraint co-reference information extraction.
Abstract: With the increasing use of research paper search engines, such as CiteSeer, for both literature search and hiring decisions, the accuracy of such systems is of paramount importance. This article employs conditional random fields (CRFs) for the task of extracting various common fields from the headers and citations of research papers. CRFs provide a principled way for incorporating various local features, external lexicon features and global layout features. The basic theory of CRFs is becoming well-understood, but best practices for applying them to real-world data require additional exploration. We make an empirical exploration of several factors, including variations on Gaussian, Laplace and hyperbolic-L1 priors for improved regularization, and several classes of features. Based on CRFs, we further present a novel approach for constraint co-reference information extraction; i.e., improving extraction performance given that we know some citations refer to the same publication. On a standard benchmark dataset, we achieve new state-of-the-art performance, reducing error in average F1 by 36%, and word error rate by 78% in comparison with the previous best SVM results. Accuracy compares even more favorably against HMMs. On four co-reference IE datasets, our system significantly improves extraction performance, with an error rate reduction of 6-14%.
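As a rough illustration of the modeling ingredient involved (not the authors' implementation, which adds lexicon and layout features and the co-reference constraints), a linear-chain CRF for header field extraction can be set up with the sklearn-crfsuite package along these lines; the feature function, labels and toy data are hypothetical:

import sklearn_crfsuite  # pip install sklearn-crfsuite

def token_features(tokens, i):
    # Simple local features only; the paper also uses external lexicons and layout.
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "<EOS>",
    }

# One toy header, tokenized, with per-token field labels.
sentences = [["Conditional", "Random", "Fields", "John", "Doe", "MIT"]]
labels = [["title", "title", "title", "author", "author", "affiliation"]]

X = [[token_features(s, i) for i in range(len(s))] for s in sentences]
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)
print(crf.predict(X)[0])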

249 citations


Journal ArticleDOI
TL;DR: This paper focuses on optimizing vector quantization (VQ) based speaker identification, which reduces the number of test vectors by pre-quantizing the test sequence prior to matching, and the number of speakers by pruning out unlikely speakers during the identification process.
Abstract: In speaker identification, most of the computation originates from the distance or likelihood computations between the feature vectors of the unknown speaker and the models in the database. The identification time depends on the number of feature vectors, their dimensionality, the complexity of the speaker models and the number of speakers. In this paper, we concentrate on optimizing vector quantization (VQ) based speaker identification. We reduce the number of test vectors by pre-quantizing the test sequence prior to matching, and the number of speakers by pruning out unlikely speakers during the identification process. The best variants are then generalized to Gaussian mixture model (GMM) based modeling. We apply the algorithms also to efficient cohort set search for score normalization in speaker verification. We obtain a speed-up factor of 16:1 in the case of VQ-based modeling with minor degradation in the identification accuracy, and 34:1 in the case of GMM-based modeling. An equal error rate of 7% can be reached in 0.84 s on average when the length of test utterance is 30.4 s.
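The two speed-ups described above (pre-quantization of the test sequence and speaker pruning) can be sketched as follows; the codebooks, sizes, chunk length and pruning rule are invented for illustration, whereas the real system works on trained speaker codebooks and spectral features.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
dim, n_speakers = 12, 20
codebooks = {s: rng.normal(loc=0.05 * s, size=(64, dim)) for s in range(n_speakers)}
test_vectors = rng.normal(loc=0.15, size=(3000, dim))   # features of the unknown speaker

# 1) Pre-quantization: replace the 3000 test vectors by a small set of centroids.
prequant = KMeans(n_clusters=32, n_init=3, random_state=0).fit(test_vectors).cluster_centers_

def avg_distortion(vectors, codebook):
    # Average nearest-codeword squared distance (VQ distortion).
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.min(axis=1).mean()

# 2) Speaker pruning: score everyone on a small chunk first, keep the best few,
#    then score only the survivors on the full pre-quantized sequence.
coarse = {s: avg_distortion(prequant[:8], cb) for s, cb in codebooks.items()}
survivors = sorted(coarse, key=coarse.get)[:5]
final = {s: avg_distortion(prequant, codebooks[s]) for s in survivors}
print("identified speaker:", min(final, key=final.get))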

248 citations


Proceedings ArticleDOI
14 May 2006
TL;DR: This paper deals with phoneme recognition based on neural networks (NN), focuses on temporal patterns (TRAPs) and novel split temporal context (STC) phoneme recognizers, and investigates tandem NN architectures.
Abstract: This paper deals with phoneme recognition based on neural networks (NN). First, several approaches to improve the phoneme error rate are suggested and discussed. In the experimental part, we concentrate on TempoRAl Patterns (TRAPs) and novel split temporal context (STC) phoneme recognizers. We also investigate tandem NN architectures. The results of the final system reported on the standard TIMIT database compare favorably to the best published results.

236 citations


Proceedings ArticleDOI
04 Jun 2006
TL;DR: This work investigates prototype-driven learning for primarily unsupervised sequence modeling, where prior knowledge is specified declaratively, by providing a few canonical examples of each target annotation label, then propagated across a corpus using distributional similarity features in a log-linear generative model.
Abstract: We investigate prototype-driven learning for primarily unsupervised sequence modeling. Prior knowledge is specified declaratively, by providing a few canonical examples of each target annotation label. This sparse prototype information is then propagated across a corpus using distributional similarity features in a log-linear generative model. On part-of-speech induction in English and Chinese, as well as an information extraction task, prototype features provide substantial error rate reductions over competitive baselines and outperform previous work. For example, we can achieve an English part-of-speech tagging accuracy of 80.5% using only three examples of each tag and no dictionary constraints. We also compare to semi-supervised learning and discuss the system's error trends.

209 citations


Proceedings ArticleDOI
M. Wenk, Martin Zellweger, Andreas Burg, Norbert Felber, Wolfgang Fichtner
21 May 2006
TL;DR: In this paper, a parallel implementation of the K-best algorithm for MIMO systems is presented, which achieves up to 424 Mbps throughput with an area that is almost on par with current state-of-the-art implementations.
Abstract: From an error rate performance perspective, maximum likelihood (ML) detection is the preferred detection method for multiple-input multiple-output (MIMO) communication systems. However, for high transmission rates a straightforward exhaustive-search implementation suffers from prohibitive complexity. The K-best algorithm provides close-to-ML bit error rate (BER) performance, while its circuit complexity is reduced compared to an exhaustive search. In this paper, a new VLSI architecture for the implementation of the K-best algorithm is presented. Instead of the mostly sequential processing that has been applied in previous VLSI implementations of the algorithm, the presented solution takes a more parallel approach. Furthermore, the application of a simplified norm is discussed. The implementation in an ASIC achieves up to 424 Mbps throughput with an area that is almost on par with current state-of-the-art implementations.
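For readers unfamiliar with the algorithm being mapped to hardware, a toy software version of breadth-first K-best detection for y = Hx + n with BPSK symbols looks roughly like the following (the dimensions, K and noise level are arbitrary, and the paper's simplified norm and architectural choices are not modeled):

import numpy as np

def k_best_detect(y, H, K=4, alphabet=(-1.0, 1.0)):
    nt = H.shape[1]
    Q, R = np.linalg.qr(H)
    z = Q.T @ y
    candidates = [(0.0, [])]   # (partial Euclidean distance, symbols for antennas level..nt-1)
    for level in range(nt - 1, -1, -1):
        expanded = []
        for ped, syms in candidates:
            for s in alphabet:
                trial = [s] + syms
                interference = sum(R[level, level + j] * trial[j] for j in range(len(trial)))
                expanded.append((ped + (z[level] - interference) ** 2, trial))
        candidates = sorted(expanded, key=lambda c: c[0])[:K]   # keep only the K best partial paths
    return np.array(candidates[0][1])

rng = np.random.default_rng(1)
H = rng.normal(size=(4, 4))
x = rng.choice([-1.0, 1.0], size=4)
y = H @ x + 0.05 * rng.normal(size=4)
print("sent:", x, "detected:", k_best_detect(y, H))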

166 citations


Journal ArticleDOI
TL;DR: This paper proposes methods for a tighter integration of ASR and SLU using word confusion networks (WCNs), which provide a compact representation of multiple aligned ASR hypotheses along with word confidence scores, without compromising recognition accuracy.

Journal ArticleDOI
Ciprian Chelba, Alex Acero
TL;DR: A novel technique for maximum “a posteriori” (MAP) adaptation of maximum entropy (MaxEnt) and maximum entropy Markov models (MEMM) is presented, and an automatic capitalization error rate of 1.4% is achieved on BN data.

Patent
Shaoming Liu
22 Feb 2006
TL;DR: In this paper, a word translation device stores a first-language word search TRIE structure (100), a second-language word search TRIE structure (200), a first-language word information record (110), and a second-language word information record (210) in a bilingual dictionary.

Abstract: A word translation device stores a first-language word search TRIE structure (100), a second-language word search TRIE structure (200), a first-language word information record (110), and a second-language word information record (210) in a bilingual dictionary. The first-language word search TRIE structure (100) is provided for searching for words of the first language. The second-language word search TRIE structure (200) is provided for searching for words of the second language. The first-language word information record (110) includes first translation information for identifying a translation of each of the words stored in the first-language word search TRIE structure (100). The second-language word information record (210) includes second translation information for identifying a translation of each of the words stored in the second-language word search TRIE structure (200). By referring to the first and second word information records (110) and (210), word translation between the first and second languages can be performed with high precision at high speeds.
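A minimal sketch of the TRIE-based lookup the patent describes is given below; the dictionary content and the record layout are invented for illustration.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.record = None            # word information record (e.g., translation data)

class WordSearchTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, record):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.record = record

    def lookup(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.record

# First-language trie whose records point at second-language translations.
en_trie = WordSearchTrie()
en_trie.insert("word", {"translation": "mot"})
en_trie.insert("error", {"translation": "erreur"})
print(en_trie.lookup("error"))        # {'translation': 'erreur'}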

Journal ArticleDOI
TL;DR: This article presents a language-independent algorithm for discovering word fragments from text in an unsupervised manner; it uses the Minimum Description Length principle to find an inventory of word fragments that is compact but models the training text effectively.

Journal ArticleDOI
TL;DR: The current work presents an algorithm for the automated analysis of character-level errors in input streams for unconstrained text entry evaluations and presents new character-level metrics that can aid method designers in refining text entry methods.
Abstract: Recent improvements in text entry error rate measurement have enabled the running of text entry experiments in which subjects are free to correct errors (or not) as they transcribe a presented string. In these “unconstrained” experiments, it is no longer necessary to force subjects to unnaturally maintain synchronicity with presented text for the sake of performing overall error rate calculations. However, the calculation of character-level error rates, which can be trivial in artificially constrained evaluations, is far more complicated in unconstrained text entry evaluations because it is difficult to infer a subject's intention at every character. For this reason, prior character-level error analyses for unconstrained experiments have only compared presented and transcribed strings, not input streams. But input streams are rich sources of character-level error information, since they contain all of the text entered (and erased) by a subject. The current work presents an algorithm for the automated analysis of character-level errors in input streams for unconstrained text entry evaluations. It also presents new character-level metrics that can aid method designers in refining text entry methods. To exercise these metrics, we perform two analyses on data from an actual text entry experiment. One analysis, available from the prior work, uses only presented and transcribed strings. The other analysis uses input streams, as described in the current work. The results confirm that input stream error analysis yields richer information for the same empirical data. To facilitate the use of these new analyses, we offer pseudocode and downloadable software for performing unconstrained text entry experiments and analyzing data.
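As a toy approximation of the kind of input-stream bookkeeping involved (this is not the paper's algorithm, which infers the intended character at every position; the backspace convention and function names below are invented), corrected and uncorrected errors can be tallied like this:

def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def stream_stats(presented, input_stream):
    # '<' denotes a backspace keystroke in the recorded input stream.
    transcribed, corrected = [], 0
    for key in input_stream:
        if key == "<":
            if transcribed:
                transcribed.pop()
                corrected += 1        # a character that was entered and later erased
        else:
            transcribed.append(key)
    transcribed = "".join(transcribed)
    uncorrected = levenshtein(presented, transcribed)   # crude stand-in for uncorrected errors
    correct = len(transcribed) - uncorrected
    total_error_rate = (corrected + uncorrected) / max(1, correct + corrected + uncorrected)
    return transcribed, corrected, uncorrected, total_error_rate

print(stream_stats("the cat", "teh<<he cat"))   # ('the cat', 2, 0, 0.222...)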

Proceedings ArticleDOI
17 Jul 2006
TL;DR: A maximum entropy approach for restoring diacritics in a document that can easily integrate and make effective use of diverse types of information and integrates a wide array of lexical, segment-based and part-of-speech tag features.
Abstract: Short vowels and other diacritics are not part of written Arabic scripts. Exceptions are made for important political and religious texts and in scripts for beginning students of Arabic. Scripts without diacritics have considerable ambiguity because many words with different diacritic patterns appear identical in a diacritic-less setting. We propose in this paper a maximum entropy approach for restoring diacritics in a document. The approach can easily integrate and make effective use of diverse types of information; the model we propose integrates a wide array of lexical, segment-based and part-of-speech tag features. The combination of these feature types leads to a state-of-the-art diacritization model. Using a publicly available corpus (LDC's Arabic Treebank Part 3), we achieve a diacritic error rate of 5.1%, a segment error rate of 8.5%, and a word error rate of 17.3%. In a case-ending-less setting, we obtain a diacritic error rate of 2.2%, a segment error rate of 4.0%, and a word error rate of 7.2%.
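Schematically, a maximum-entropy diacritizer is a multinomial logistic-regression classifier over per-character features; the sketch below uses an invented romanized toy example and a feature set far poorer than the lexical, segment-based and POS features the paper combines.

from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def char_features(word, i):
    return {
        "char": word[i],
        "prev": word[i - 1] if i > 0 else "<B>",
        "next": word[i + 1] if i + 1 < len(word) else "<E>",
        "pos": i,
    }

# (undiacritized word, per-character diacritic labels); '_' means no diacritic.
train = [("ktb", ["a", "a", "a"]), ("ktab", ["i", "_", "_", "_"])]
X, y = [], []
for word, labels in train:
    for i, lab in enumerate(labels):
        X.append(char_features(word, i))
        y.append(lab)

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)
print(clf.predict(vec.transform([char_features("ktb", i) for i in range(3)])))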

Proceedings Article
01 Jan 2006
TL;DR: This paper demonstrates how to train phoneme-based acoustic models with carefully designed electromyographic feature extraction methods that decompose the signal into different feature spaces, keeping the useful information while reducing the noise.
Abstract: We present our research on continuous speech recognition of the surface electromyographic signals that are generated by the human articulatory muscles. Previous research on electromyographic speech recognition was limited to isolated word recognition because it was very difficult to train phoneme-based acoustic models for the electromyographic speech recognizer. In this paper, we demonstrate how to train the phoneme-based acoustic models with carefully designed electromyographic feature extraction methods. By decomposing the signal into different feature spaces, we successfully keep the useful information while reducing the noise. Additionally, we model the anticipatory effect of the electromyographic signals compared to the speech signal. With a 108-word decoding vocabulary, the experimental results show that the word error rate improves from 86.8% to 32.0% by using our novel feature extraction methods. Index Terms: speech recognition, electromyography, articulatory muscles, feature extraction.

Proceedings ArticleDOI
Yaser Al-Onaizan, Kishore Papineni
17 Jul 2006
TL;DR: A new distortion model is proposed that can be used with existing phrase-based SMT decoders to address n-gram language model limitations and a novel metric to measure word order similarity (or difference) between any pair of languages based on word alignments is proposed.
Abstract: In this paper, we argue that n-gram language models are not sufficient to address word reordering required for Machine Translation. We propose a new distortion model that can be used with existing phrase-based SMT decoders to address those n-gram language model limitations. We present empirical results in Arabic to English Machine Translation that show statistically significant improvements when our proposed model is used. We also propose a novel metric to measure word order similarity (or difference) between any pair of languages based on word alignments.

Journal ArticleDOI
TL;DR: A novel method to estimate continuous-density hidden Markov model (CDHMM) for speech recognition according to the principle of maximizing the minimum multiclass separation margin by using a penalized gradient descent algorithm.
Abstract: In this paper, motivated by large margin classifiers in machine learning, we propose a novel method to estimate continuous-density hidden Markov model (CDHMM) for speech recognition according to the principle of maximizing the minimum multiclass separation margin. The approach is named large margin HMM. First, we show this type of large margin HMM estimation problem can be formulated as a constrained minimax optimization problem. Second, we propose to solve this constrained minimax optimization problem by using a penalized gradient descent algorithm, where the original objective function, i.e., minimum margin, is approximated by a differentiable function and the constraints are cast as penalty terms in the objective function. The new training method is evaluated in the speaker-independent isolated E-set recognition and the TIDIGITS connected digit string recognition tasks. Experimental results clearly show that the large margin HMMs consistently outperform the conventional HMM training methods. It has been consistently observed that the large margin training method yields significant recognition error rate reduction even on top of some popular discriminative training methods
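Written out generically (this reconstruction follows the description above rather than the paper's exact notation; the smoothing constant \eta and penalty weights \kappa_k are illustrative), the margin and the differentiable surrogate that is optimized by gradient methods are roughly:

d_i(\Lambda) = \log p(X_i \mid \lambda_{y_i}) - \max_{j \neq y_i} \log p(X_i \mid \lambda_j),

\Lambda^{*} = \arg\max_{\Lambda}\, \min_{i \in \mathcal{S}} d_i(\Lambda)
            \approx \arg\max_{\Lambda} \Big[ -\tfrac{1}{\eta} \log \sum_{i \in \mathcal{S}} e^{-\eta\, d_i(\Lambda)} - \sum_{k} \kappa_k\, g_k(\Lambda) \Big],

where d_i is the multiclass separation margin of training utterance X_i with label y_i, \mathcal{S} is the set of support tokens, the soft-min replaces the non-differentiable minimum, and the g_k terms penalize violations of the original constraints.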

Proceedings ArticleDOI
17 Jul 2006
TL;DR: A semi-supervised approach to training for statistical machine translation that alternates the traditional Expectation Maximization step that is applied on a large training corpus with a discriminative step aimed at increasing word-alignment quality on a small, manually word-aligned sub-corpus is introduced.
Abstract: We introduce a semi-supervised approach to training for statistical machine translation that alternates the traditional Expectation Maximization step that is applied on a large training corpus with a discriminative step aimed at increasing word-alignment quality on a small, manually word-aligned sub-corpus. We show that our algorithm leads not only to improved alignments but also to machine translation outputs of higher quality.

Journal ArticleDOI
TL;DR: This paper will give a detailed review of quantile equalization applied to the Mel scaled filter bank, including considerations about the application in online systems and improvements through a second transformation step that combines neighboring filter channels.
Abstract: The noise robustness of automatic speech recognition systems can be improved by reducing any mismatch between the training and test data distributions during feature extraction. Based on the quantiles of these distributions, the parameters of transformation functions can be reliably estimated with small amounts of data. This paper gives a detailed review of quantile equalization applied to the Mel-scaled filter bank, including considerations about the application in online systems and improvements through a second transformation step that combines neighboring filter channels. The recognition tests have shown that previous experimental observations on small-vocabulary recognition tasks can be confirmed on the larger-vocabulary Aurora 4 noisy Wall Street Journal database. The word error rate could be reduced from 45.7% to 25.5% (clean training) and from 19.5% to 17.0% (multicondition training).
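A minimal, non-parametric variant of the quantile idea for a single filter-bank channel is sketched below (the paper estimates parametric transformation functions from the quantiles and additionally combines neighboring channels; the distributions here are synthetic):

import numpy as np

def quantile_equalize(test_channel, train_channel, n_quantiles=4):
    q = np.linspace(0.0, 1.0, n_quantiles + 1)
    test_q = np.quantile(test_channel, q)
    train_q = np.quantile(train_channel, q)
    # Piecewise-linear mapping that sends each test quantile onto the training quantile.
    return np.interp(test_channel, test_q, train_q)

rng = np.random.default_rng(0)
train = rng.gamma(shape=2.0, scale=1.0, size=5000)               # "clean" training channel
test = 0.5 * rng.gamma(shape=2.0, scale=1.0, size=500) + 1.0     # mismatched test channel
equalized = quantile_equalize(test, train)
print(np.quantile(test, [0.25, 0.5, 0.75]), "->", np.quantile(equalized, [0.25, 0.5, 0.75]))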

Proceedings Article
01 Jan 2006
TL;DR: The gains seen with cross-system adaptation and system combination methods are demonstrated, and it is shown that sequences of adaptation and decoding make it possible to incrementally improve the performance of the recognition system.
Abstract: Cross-system adaptation and system combination methods, such as ROVER and confusion network combination, are known to lower the word error rate of speech recognition systems. They require the training of systems that are reasonably close in performance but at the same time produce output that differs in its errors. This provides complementary information which leads to performance improvements. In this paper we demonstrate the gains we have seen with cross-system adaptation and system combination on the English EPPS and RT-05S lecture meeting task. We obtained the necessary varying systems by using different acoustic front-ends and phoneme sets on which our models are based. In a set of contrastive experiments we show the influence that the exchange of the components has on adaptation and system combination. Index Terms: automatic speech recognition, system combination, cross adaptation, EPPS, RT-05S. 1. Introduction. In state-of-the-art speech recognition systems it is common practice to use multi-pass systems with adaptation of the acoustic model in-between passes. The adaptation aims at better fitting the system to the speakers and/or acoustic environments found in the test data. It is usually performed on a by-speaker basis, obtained either from manual speaker labels or automatic clustering methods. Common adaptation methods try to transform either the models used in a system or the features to which the models are applied. Three adaptation methods that can be found in many state-of-the-art systems are Maximum Likelihood Linear Regression (MLLR) [1], a model transformation, Vocal Tract Length Normalization (VTLN) [2] and feature-space constrained MLLR (fMLLR) [3], two feature-transformation methods. Adaptation is performed in an unsupervised manner, such that the error-prone hypotheses obtained from the previous decoding pass are taken as the necessary reference for adaptation. Generally, the word error rates of the hypotheses obtained from the adapted systems are lower than those for the hypotheses on which the adaptation was performed. These sequences of adaptation and decoding make it possible to incrementally improve the performance of the recognition system. Unfortunately, this loop of adaptation and decoding does not always lead to significant improvements. Often, after two or three stages of adapting a system on its own output, no more gains can be obtained. This problem can be overcome by adapting a system

Journal ArticleDOI
TL;DR: A hidden Markov model (HMM) system to detect sentence boundaries that uses both prosodic and textual information is constructed and Bagging was found to significantly improve system performance for each of the sampling methods.

Posted Content
TL;DR: This paper outlines a three-step procedure for determining the low bit error rate performance curve of a wide class of LDPC codes of moderate length and allows one to efficiently see error performance at bit error rates that were previously out of reach of Monte Carlo methods.
Abstract: This paper outlines a three-step procedure for determining the low bit error rate performance curve of a wide class of LDPC codes of moderate length. The traditional method to estimate code performance in the higher SNR region is to use a sum of the contributions of the most dominant error events to the probability of error. These dominant error events will be both code and decoder dependent, consisting of low-weight codewords as well as non-codeword events if ML decoding is not used. For even moderate length codes, it is not feasible to find all of these dominant error events with a brute force search. The proposed method provides a convenient way to evaluate very low bit error rate performance of an LDPC code without requiring knowledge of the complete error event weight spectrum or resorting to a Monte Carlo simulation. This new method can be applied to various types of decoding such as the full belief propagation version of the message passing algorithm or the commonly used min-sum approximation to belief propagation. The proposed method allows one to efficiently see error performance at bit error rates that were previously out of reach of Monte Carlo methods. This result will provide a solid foundation for the analysis and design of LDPC codes and decoders that are required to provide a guaranteed very low bit error rate performance at certain SNRs.
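Schematically, the estimate has the union-of-dominant-events form (the notation here is generic; the paper's contribution is the three-step procedure for finding the dominant set and evaluating each term at very low error rates):

P_b(\gamma) \approx \sum_{e \in \mathcal{E}_{\mathrm{dom}}} \frac{w_b(e)}{k}\, \Pr\{\text{decoder commits error event } e \mid \gamma\},

where \gamma is the SNR, k the number of information bits per codeword, w_b(e) the number of information-bit errors caused by event e, and \mathcal{E}_{\mathrm{dom}} contains the low-weight codewords plus, for suboptimal iterative decoding, the dominant non-codeword events.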

Patent
31 Oct 2006
TL;DR: In this paper, a speech segment is indexed by identifying at least two alternative word sequences for the speech segment, and information is placed in an entry for the word in the index.
Abstract: A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value.
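A sketch of the indexing scheme in plain data structures might look as follows; the hypothesis format, probabilities and threshold are illustrative only.

from collections import defaultdict

def index_segment(index, segment_id, hypotheses, threshold=0.2):
    # hypotheses: at least two alternative word sequences with per-word probabilities.
    best = {}
    for words, probs in hypotheses:
        for word, p in zip(words, probs):
            best[word] = max(best.get(word, 0.0), p)
    for word, p in best.items():
        if p >= threshold:                 # eliminate unlikely entries by threshold
            index[word].append((segment_id, p))

index = defaultdict(list)
index_segment(index, "seg-001",
              [(["recognize", "speech"], [0.9, 0.8]),
               (["wreck", "a", "nice", "beach"], [0.1, 0.1, 0.15, 0.1])])
print(dict(index))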

Journal ArticleDOI
TL;DR: The results indicate that a small amount of phase error or uncertainty does not affect the recognition rate, but a large amount of phase uncertainty significantly affects the recognition rate.
Abstract: In this paper, we analyze the effects of uncertainty in the phase of speech signals on the word recognition error rate of human listeners. The motivating goal is to get a quantitative measure of the importance of phase in automatic speech recognition by studying the effects of phase uncertainty on human perception. Listening tests were conducted for 18 listeners under different phase uncertainty and signal-to-noise ratio (SNR) conditions. These results indicate that a small amount of phase error or uncertainty does not affect the recognition rate, but a large amount of phase uncertainty significantly affects it. The importance of phase also appears to be SNR-dependent: at lower SNRs the effects of phase uncertainty are more pronounced than at higher SNRs. For example, at an SNR of -10 dB, having random phases at all frequencies results in a word error rate (WER) of 63% compared to 24% if the phase was unaltered. In comparison, at 0 dB, random phase results in a 25% WER as compared to 11% for the unaltered phase case. Listening tests were also conducted for the case of reconstructed phase based on the least square error estimation approach. The results indicate that the recognition rate for the reconstructed phase case is very close to that of the perfect phase case (a WER difference of 4% on average).

Journal ArticleDOI
TL;DR: This paper reviews 3D face registration and recognition algorithms based solely on 3D shape information, analyzes methods based on the fusion of shape features, and shows that fusion schemes such as product rules, improved consensus voting, and a proposed serial fusion scheme improve classification accuracy.

Proceedings ArticleDOI
01 Nov 2006
TL;DR: A parallel-serial architecture is designed to map the decoder of any structured LDPC code in this large family to a hardware emulation platform and this new characterization leads to an improved decoding strategy and higher performance.
Abstract: Several high performance LDPC codes have parity-check matrices composed of permutation submatrices. We design a parallel-serial architecture to map the decoder of any structured LDPC code in this large family to a hardware emulation platform. A peak throughput of 240 Mb/s is achieved in decoding the (2048,1723) Reed-Solomon based LDPC (RS-LDPC) code. Experiments in the low bit error rate (BER) region provide statistics of the error traces, which are used to investigate the causes of the error floor. In a low precision implementation, the error floors are dominated by the fixed-point decoding effects, whereas in a higher precision implementation the errors are attributed to special configurations within the code, whose effect is exacerbated in a fixed-point decoder. This new characterization leads to an improved decoding strategy and higher performance.

Journal ArticleDOI
TL;DR: The design and evaluation of a speaker-independent audio-visual speech recognition (AVSR) system that utilizes a segment-based modeling strategy that allows the visual classifier to process visual frames with a constrained amount of asynchrony relative to proposed acoustic segments is presented.
Abstract: This paper presents the design and evaluation of a speaker-independent audio-visual speech recognition (AVSR) system that utilizes a segment-based modeling strategy. The audio and visual feature streams are integrated using a segment-constrained hidden Markov model, which allows the visual classifier to process visual frames with a constrained amount of asynchrony relative to proposed acoustic segments. The core experiments in this paper investigate several different visual model structures, each of which provides a different means for defining the units of the visual classifier and the synchrony constraints between the audio and visual streams. Word recognition experiments are conducted on the AV-TIMIT corpus under variable additive noise conditions. Over varying acoustic signal-to-noise ratios, word error rate reductions between 14% and 60% are observed when integrating the visual information into the automatic speech recognition process.

Journal ArticleDOI
TL;DR: A thorough and exact analysis of minimum-selection generalized selection combining (MS-GSC) is carried out, based on a new result on order statistics, and the closed-form expressions of important performance measures are derived for the Rayleigh fading scenario.
Abstract: Diversity combining techniques improve the performance of wireless communication systems at the cost of increased power consumption. The minimum-selection generalized selection combining (MS-GSC) scheme has been proposed as a power-saving implementation of the conventional generalized selection combining (GSC) scheme. In this paper, noting that previous analytical results on the error rate of MS-GSC are approximate, we carry out a thorough and exact analysis of MS-GSC. In particular, based on a new result on order statistics, we obtain the statistics of the combined SNR with MS-GSC, and we then apply these results to analyze the performance of MS-GSC over fading channels. We derive closed-form expressions of important performance measures, including outage probability and average error rate, for the Rayleigh fading scenario. In addition, we investigate the average number of active MRC branches with MS-GSC, as a quantification of the power saving.