Showing papers on "Word error rate" published in 2004

Book Chapter•DOI•

João Gama¹, Pedro Medas¹, Gladys Castillo¹,², Pedro Pereira Rodrigues¹•Institutions (2)

University of Porto¹, University of Aveiro²

29 Sep 2004

TL;DR: A method for detecting changes in the probability distribution of examples by monitoring the online error rate of the learning algorithm; the method is observed to be independent of the learning algorithm.

Abstract: Most of the work in machine learning assumes that examples are generated at random according to some stationary probability distribution. In this work we study the problem of learning when the distribution that generates the examples changes over time. We present a method for detection of changes in the probability distribution of examples. The idea behind the drift detection method is to control the online error-rate of the algorithm. The training examples are presented in sequence. When a new training example is available, it is classified using the actual model. Statistical theory guarantees that while the distribution is stationary, the error will decrease. When the distribution changes, the error will increase. The method controls the trace of the online error of the algorithm. For the actual context we define a warning level, and a drift level. A new context is declared, if in a sequence of examples, the error increases reaching the warning level at example k_w, and the drift level at example k_d. This is an indication of a change in the distribution of the examples. The algorithm learns a new model using only the examples since k_w. The method was tested with a set of eight artificial datasets and a real world dataset. We used three learning algorithms: a perceptron, a neural network and a decision tree. The experimental results show a good performance detecting drift and with learning the new concept. We also observe that the method is independent of the learning algorithm.
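
The monitoring loop described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the error rate is tracked as a running proportion p with standard deviation s = sqrt(p(1-p)/n), and that the warning and drift levels are reached when p + s exceeds the recorded minimum by 2 and 3 standard deviations. Those factors, the 30-example warm-up, and the name `DriftDetector` are assumptions; the abstract does not state them.

```python
import math


class DriftDetector:
    """Tracks the online error rate and reports warning/drift levels (sketch)."""

    def __init__(self, warning_factor=2.0, drift_factor=3.0, min_examples=30):
        # All three defaults are assumptions, not values from the abstract.
        self.warning_factor = warning_factor
        self.drift_factor = drift_factor
        self.min_examples = min_examples
        self.reset()

    def reset(self):
        """Start a new context: forget all counts and recorded minima."""
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, misclassified):
        """Feed one prediction outcome; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += int(misclassified)
        p = self.errors / self.n
        s = math.sqrt(p * (1.0 - p) / self.n)
        if self.n < self.min_examples:
            return "stable"
        # Remember the lowest error level seen in the current context.
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_factor * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warning_factor * self.s_min:
            return "warning"
        return "stable"
```

A caller would typically start buffering examples at the first `warning` and retrain on that buffer once `drift` is returned, which mirrors the abstract's description of learning a new model from the examples seen since the warning level was reached.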

1,256 citations

Proceedings Article•DOI•

Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources

Bill Dolan¹, Chris Quirk¹, Chris Brockett¹•Institutions (1)

Microsoft¹

23 Aug 2004

TL;DR: Investigation of unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles collected from thousands of web-based news sources shows that edit distance data is cleaner and more easily-aligned than the heuristic data.

Abstract: We investigate unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles collected from thousands of web-based news sources. Two techniques are employed: (1) simple string edit distance, and (2) a heuristic strategy that pairs initial (presumably summary) sentences from different news stories in the same cluster. We evaluate both datasets using a word alignment algorithm and a metric borrowed from machine translation. Results show that edit distance data is cleaner and more easily-aligned than the heuristic data, with an overall alignment error rate (AER) of 11.58% on a similarly-extracted test set. On test data extracted by the heuristic strategy, however, performance of the two training sets is similar, with AERs of 13.2% and 14.7% respectively. Analysis of 100 pairs of sentences from each set reveals that the edit distance data lacks many of the complex lexical and syntactic alternations that characterize monolingual paraphrase. The summary sentences, while less readily alignable, retain more of the non-trivial alternations that are of greatest interest in learning paraphrase relationships.
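
The first technique, string edit distance, can be illustrated with a short sketch. It assumes the distance is counted over whitespace-separated word tokens with unit costs; the abstract does not say whether words or characters are compared, nor which threshold is used to accept a sentence pair, so both are left to the caller. The function name `word_edit_distance` is hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two sentences, counted in word tokens."""
    x, y = a.split(), b.split()
    # prev[j] holds the distance between the words seen so far in x and y[:j].
    prev = list(range(len(y) + 1))
    for i, wx in enumerate(x, start=1):
        curr = [i]
        for j, wy in enumerate(y, start=1):
            cost = 0 if wx == wy else 1
            curr.append(min(prev[j] + 1,          # delete wx
                            curr[j - 1] + 1,      # insert wy
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

Sentence pairs from the same news cluster with a small but non-zero distance would be the candidate paraphrases under this reading.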

895 citations

Journal Article•DOI•

FloatBoost learning and statistical face detection

Stan Z. Li¹, ZhenQiu Zhang²•Institutions (2)

Microsoft¹, University of Illinois at Urbana–Champaign²

01 Sep 2004-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Applied to face detection, the FloatBoost learning method, together with a proposed detector pyramid architecture, leads to the first real-time multiview face detection system reported.

Abstract: A novel learning procedure, called FloatBoost, is proposed for learning a boosted classifier for achieving the minimum error rate. FloatBoost learning uses a backtrack mechanism after each iteration of AdaBoost learning to minimize the error rate directly, rather than minimizing an exponential function of the margin as in the traditional AdaBoost algorithms. A second contribution of the paper is a novel statistical model for learning best weak classifiers using a stagewise approximation of the posterior probability. These novel techniques lead to a classifier which requires fewer weak classifiers than AdaBoost yet achieves lower error rates in both training and testing, as demonstrated by extensive experiments. Applied to face detection, the FloatBoost learning method, together with a proposed detector pyramid architecture, leads to the first real-time multiview face detection system reported.

585 citations

Proceedings Article•

Monolingual Machine Translation for Paraphrase Generation

Chris Quirk, Chris Brockett, William B. Dolan

01 Jul 2004

TL;DR: Human evaluation shows that this SMT system outperforms baseline paraphrase generation techniques and, in a departure from previous work, offers better coverage and scalability than the current best-of-breed paraphrasing approaches.

Abstract: We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Alignment Error Rate (AER) is measured to gauge the quality of the resulting corpus. A monotone phrasal decoder generates contextual replacements. Human evaluation shows that this system outperforms baseline paraphrase generation techniques and, in a departure from previous work, offers better coverage and scalability than the current best-of-breed paraphrasing approaches.

388 citations

Journal Article•DOI•

Offline recognition of unconstrained handwritten texts using HMMs and statistical language models

Horst Bunke, Samy Bengio, Alessandro Vinciarelli¹•Institutions (1)

University of Bern¹

01 Jun 2004-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The use of language models is shown to improve the accuracy of the system and the approach is described in detail and compared with other methods presented in the literature to deal with the same problem.

Abstract: This paper presents a system for the offline recognition of large vocabulary unconstrained handwritten texts. The only assumption made about the data is that it is written in English. This allows the application of statistical language models in order to improve the performance of our system. Several experiments have been performed using both single and multiple writer data. Lexica of variable size (from 10,000 to 50,000 words) have been used. The use of language models is shown to improve the accuracy of the system (when the lexicon contains 50,000 words, the error rate is reduced by ~50 percent for single writer data and by ~25 percent for multiple writer data). Our approach is described in detail and compared with other methods presented in the literature to deal with the same problem. An experimental setup to correctly deal with unconstrained text recognition is proposed.

325 citations

Proceedings Article•

Accurate Information Extraction from Research Papers using Conditional Random Fields

Fuchun Peng¹, Andrew McCallum¹•Institutions (1)

University of Massachusetts Amherst¹

01 Jan 2004

TL;DR: New state-of-the-art performance is achieved on a standard benchmark data set, reducing error in average F1 by 36%, and word error rate by 78% in comparison with the previous best SVM results.

Abstract: With the increasing use of research paper search engines, such as CiteSeer, for both literature search and hiring decisions, the accuracy of such systems is of paramount importance. This paper employs Conditional Random Fields (CRFs) for the task of extracting various common fields from the headers and citation of research papers. The basic theory of CRFs is becoming well-understood, but best-practices for applying them to real-world data requires additional exploration. This paper makes an empirical exploration of several factors, including variations on Gaussian, exponential and hyperbolic-L1 priors for improved regularization, and several classes of features and Markov order. On a standard benchmark data set, we achieve new state-of-the-art performance, reducing error in average F1 by 36%, and word error rate by 78% in comparison with the previous best SVM results. Accuracy compares even more favorably against HMMs.

319 citations

Journal Article•DOI•

Applications of support vector machines to speech recognition

Aravind Ganapathiraju, J.E. Hamaker¹, Joseph Picone²•Institutions (2)

Microsoft¹, Mississippi State University²

01 Aug 2004-IEEE Transactions on Signal Processing

TL;DR: It is shown that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data and an application of SVMs to large vocabulary speech recognition is described.

Abstract: Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.

278 citations

Journal Article•DOI•

Multithreaded context for robust conversational interfaces: Context-sensitive speech recognition and interpretation of corrective fragments

Oliver Lemon¹, Alexander Gruenstein²•Institutions (2)

University of Edinburgh¹, Stanford University²

01 Sep 2004-ACM Transactions on Computer-Human Interaction

TL;DR: It is shown that by using context-sensitive recognition based on the predicted type of the user's next dialogue move, a more flexible dialogue system can also exhibit an improvement in speech recognition performance.

Abstract: We focus on the issue of robustness of conversational interfaces that are flexible enough to allow natural "multithreaded" conversational flow. Our main advance is to use context-sensitive speech recognition in a general way, with a representation of dialogue context that is rich and flexible enough to support conversation about multiple interleaved topics, as well as the interpretation of corrective fragments. We explain, by use of worked examples, the use of our "Conversational Intelligence Architecture" (CIA) to represent conversational threads, and how each thread can be associated with a language model (LM) for more robust speech recognition. The CIA uses fine-grained dynamic representations of dialogue context, which supersede those used in finite-state or form-based dialogue managers. In an evaluation of a dialogue system built using this architecture we found that 87.9% of recognized utterances were recognized using a context-specific language model, resulting in an 11.5% reduction in the overall utterance recognition error rate, and a 13.4% reduction in concept error rate. Thus we show that by using context-sensitive recognition based on the predicted type of the user's next dialogue move, a more flexible dialogue system can also exhibit an improvement in speech recognition performance.

249 citations

Proceedings Article•DOI•

Bootstrap estimates for confidence intervals in ASR performance evaluation

M. Bisani, Hermann Ney

17 May 2004

TL;DR: A bootstrap method for significance analysis is presented which is, at the same time, intuitive, precise and easy to use, with results that are immediately interpretable in terms of word error rate.

Abstract: The field of speech recognition has clearly benefited from precisely defined testing conditions and objective performance measures such as word error rate. In the development and evaluation of new methods, the question arises whether the empirically observed difference in performance is due to a genuine advantage of one system over the other, or just an effect of chance. However, many publications still do not concern themselves with the statistical significance of the results reported. We present a bootstrap method for significance analysis which is, at the same time, intuitive, precise and easy to use. Unlike some methods, we make no (possibly ill-founded) approximations and the results are immediately interpretable in terms of word error rate.
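
The general idea of a bootstrap interval for word error rate can be sketched as below. This is an illustration under stated assumptions rather than the authors' exact procedure: utterances are resampled with replacement, WER is recomputed on each resample, and a percentile interval is read off the sorted estimates. The function name, the percentile rule, and the default of 1000 resamples are all assumptions.

```python
import random


def bootstrap_wer_interval(per_utterance, n_resamples=1000, alpha=0.10, seed=0):
    """Percentile bootstrap interval for WER.

    per_utterance: list of (num_errors, num_reference_words) pairs, one per
    utterance, where every utterance has at least one reference word.
    """
    rng = random.Random(seed)
    n = len(per_utterance)
    estimates = []
    for _ in range(n_resamples):
        # Draw n utterances with replacement and recompute the pooled WER.
        sample = [per_utterance[rng.randrange(n)] for _ in range(n)]
        errors = sum(e for e, _ in sample)
        words = sum(w for _, w in sample)
        estimates.append(errors / words)
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lo, hi
```

To compare two systems on the same test set, the same resampling can be applied to per-utterance differences in error counts, so that both systems are always evaluated on the identical resample.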

236 citations

Proceedings Article•DOI•

Discriminative Language Modeling with Conditional Random Fields and the Perceptron Algorithm

Brian Roark¹, Murat Saraclar¹, Michael Collins², Mark Johnson³•Institutions (3)

AT&T Labs¹, Massachusetts Institute of Technology², Brown University³

21 Jul 2004

TL;DR: This paper contrasts two parameter estimation methods for discriminative language modeling: the perceptron algorithm, which automatically selects a relatively small feature set in just a couple of passes over the training data, and a method based on conditional random fields (CRFs), which provides a further reduction in word error rate.

Abstract: This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. However, using the feature set output from the perceptron algorithm (initialized with their weights), CRF training provides an additional 0.5% reduction in word error rate, for a total 1.8% absolute reduction from the baseline of 39.2%.
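
The figures in this abstract mix absolute and relative readings of "reduction", so a short worked calculation may help. The baseline and the total reduction are taken from the abstract; treating the additional 0.5% from CRF training as percentage points (absolute) is an assumption made here because it is summed into an absolute total.

```python
baseline_wer = 39.2         # percent, stated in the abstract
total_abs_reduction = 1.8   # percentage points, stated in the abstract
crf_extra_reduction = 0.5   # assumed to be percentage points as well

# WER after perceptron feature selection followed by CRF training.
final_wer = baseline_wer - total_abs_reduction
# WER implied for the perceptron alone, under the assumption above.
perceptron_wer = final_wer + crf_extra_reduction
# The same total improvement expressed relative to the baseline.
relative_reduction = 100.0 * total_abs_reduction / baseline_wer

print(f"perceptron only:    {perceptron_wer:.1f}%")
print(f"perceptron + CRF:   {final_wer:.1f}%")
print(f"relative reduction: {relative_reduction:.1f}%")
```

Under this reading the 1.8-point absolute gain corresponds to roughly a 4.6% relative reduction in word error rate.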

187 citations

Journal Article•DOI•

Error rate performance of coded free-space optical links over strong turbulence channels

Murat Uysal¹, S.M. Navidpour¹, Jing Li¹•Institutions (1)

University of Waterloo¹

18 Oct 2004-IEEE Communications Letters

TL;DR: An upper bound on the pairwise error probability (PEP) is derived and the union-bound technique is applied in conjunction with the derived PEP to obtain upper bounds on the bit error rate.

Abstract: Error control coding can be used over free-space optical (FSO) links to mitigate turbulence-induced fading. We present error rate performance bounds for coded FSO communication systems operating over atmospheric turbulence channels, which are modeled as a correlated K distribution under strong turbulence conditions. We derive an upper bound on the pairwise error probability (PEP) and then apply the union-bound technique in conjunction with the derived PEP to obtain upper bounds on the bit error rate. Simulation results are further demonstrated to verify the analytical results.

Patent•

System and method for configuring a solid-state storage device with error correction coding

Sarah M. Brandenberger, Terrel R. Munden, Jonathan Jedwab, James Davis, David Banks

29 Jan 2004

TL;DR: A system for configuring solid-state storage devices comprises a solid-state storage device and an error correction code (ECC) selection system, which is configured to automatically select a set of error correction codes based on an error rate of the storage device.

Abstract: A system for configuring solid-state storage devices comprises a solid-state storage device and an error correction code (ECC) selection system. The ECC selection system is configured to automatically select a set of error correction code based on an error rate of the storage device. The ECC selection system is further configured to install the selected set of error correction code in the solid-state storage device.

Proceedings Article•DOI•

From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition.

Andrew C. Morris, Viktoria Maier, Phil D. Green

04 Oct 2004

TL;DR: Two new absolute CSR performance measures are introduced: MER (match error rate) and WIL (word information lost), the latter a simple approximation to the proportion of word information lost that overcomes the problems associated with the RIL (relative information lost) measure.

Abstract: The word error rate (WER), commonly used in ASR assessment, measures the cost of restoring the output word sequence to the original input sequence. However, for most CSR applications apart from dictation machines a more meaningful performance measure would be given by the proportion of information communicated. In this article we introduce two new absolute CSR performance measures: MER (match error rate) and WIL (word information lost). MER is the proportion of I/O word matches which are errors. WIL is a simple approximation to the proportion of word information lost which overcomes the problems associated with the RIL (relative information lost) measure that was proposed half a century ago. Issues relating to ideal performance measurement are discussed and the commonly used Viterbi input/output alignment procedure, with zero weight for hits and equal weight for substitutions, deletions and insertions, is shown to be optimal.
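
The three measures can be compared side by side from alignment counts. The abstract defines them only in words, so the formulas below are the forms in which these measures are commonly stated and should be read as assumptions: with H hits, S substitutions, D deletions and I insertions, WER = (S+D+I)/(H+S+D), MER = (S+D+I)/(H+S+D+I), and WIL = 1 - H²/((H+S+D)(H+S+I)). The function name `asr_measures` is hypothetical.

```python
def asr_measures(hits, substitutions, deletions, insertions):
    """Return WER, MER and WIL computed from word alignment counts."""
    h, s, d, i = hits, substitutions, deletions, insertions
    n_ref = h + s + d   # number of words in the reference (input)
    n_hyp = h + s + i   # number of words in the recognizer output
    errors = s + d + i
    return {
        "WER": errors / n_ref,                  # can exceed 1.0
        "MER": errors / (h + s + d + i),        # always within [0, 1]
        "WIL": 1.0 - (h * h) / (n_ref * n_hyp)  # always within [0, 1]
    }
```

For example, one hit with three insertions gives a WER of 3.0 but a MER and WIL of 0.75, which illustrates why the bounded measures are described as easier to interpret as proportions.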

Proceedings Article•DOI•

Language recognition using phone lattices.

Jean-Luc Gauvain, Abdelkhalek Messaoudi, Holger Schwenk

04 Oct 2004

TL;DR: The use of phone lattices both in training and testing significantly improves the accuracy of a phonotactically based LID system and is further enhanced by using a neural network to combine the results of multiple phone recognizers.

Abstract: This paper proposes a new phone lattice based method for automatic language recognition from speech data. By using phone lattices some approximations usually made by language identification (LID) systems relying on phonotactic constraints to simplify the training and decoding processes can be avoided. We demonstrate the use of phone lattices both in training and testing significantly improves the accuracy of a phonotactically based LID system. Performance is further enhanced by using a neural network to combine the results of multiple phone recognizers. Using three phone recognizers with context independent phone models, the system achieves an equal error rate of 2.7% on the Eval03 NIST detection test (30s segment, primary condition) with an overall decoding process that runs faster than real-time (0.5xRT).

Proceedings Article•DOI•

Cache scrubbing in microprocessors: myth or necessity?

Shubhendu S. Mukherjee¹, Joel Emer¹, Tryggve Fossum¹, S.K. Reinhardt¹•Institutions (1)

Intel¹

03 Mar 2004

TL;DR: This work shows how to compute the mean time to failure for temporal double-bit errors and shows how fixed-interval scrubbing - in which error checkers periodically access cache blocks and remove single-bit errors - can mitigate such errors in processor caches.

Abstract: Transient faults from neutron and alpha particle strikes in large SRAM caches have become a major problem for microprocessor designers. To protect these caches, designers often use error correcting codes (ECC), which typically provide single-bit error correction and double-bit error detection (SECDED). Unfortunately, two separate strikes could still flip two different bits in the same ECC-protected word. This we call a temporal double-bit error. SECDED ECC can only detect, not correct such errors. We show how to compute the mean time to failure for temporal double-bit errors. Additionally, we show how fixed-interval scrubbing - in which error checkers periodically access cache blocks and remove single-bit errors - can mitigate such errors in processor caches. Our analysis using current soft error rates shows that only very large caches (e.g., hundreds of megabytes to gigabytes) need scrubbing to reduce the temporal double-bit error rate to a tolerable range.

Proceedings Article•DOI•

Link error prediction methods for multicarrier systems

Yufei W. Blankenship¹, Philippe Sartori¹, Brian K. Classon¹, Vipul Desai¹, Kevin L. Baum¹•Institutions (1)

Motorola¹

26 Sep 2004

TL;DR: The accuracy of two FER prediction methods is studied: Packet error rate indicator (PER-indicator) and exponential effective SIR mapping (Exp-ESM) which are shown to have accuracy within a few tenths of a dB under a wide range of modulation schemes, coding rates and channel types.

Abstract: Multicarrier modulations such as OFDM with adaptive modulation and coding (AMC) are well suited for high data rate broadband systems that operate in multipath environments and are considered as promising candidates for future generation cellular systems (e.g., 4G). Cellular system performance is normally investigated with system level simulations that are computationally complex. For broadband multicarrier systems, incorporating a detailed physical layer emulator into the system simulator becomes impractical, so there is a need for simplified link performance predictors. However, due to the large variability of the channel in the frequency domain, two links with the same average SNR can experience drastically different performance, thus making it difficult to accurately predict the instantaneous link performance such as the frame error rate. In this paper, the accuracy of two FER prediction methods is studied: Packet error rate indicator (PER-indicator) and exponential effective SIR mapping (Exp-ESM). Both methods are shown to have accuracy within a few tenths of a dB under a wide range of modulation schemes, coding rates and channel types. These methods are then extended to handle more advanced link enhancements such as hybrid ARQ and Alamouti encoding. The Exp-ESM method has slightly better accuracy than the PER-indicator, and is the preferred link error predictor for a system simulator.
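
The exponential effective SIR mapping named in this abstract compresses the per-subcarrier SIRs of a frame into a single effective value that can be looked up on a reference error-rate curve. The abstract does not give the formula; the sketch below uses the form in which this mapping is usually written, SIR_eff = -β ln((1/N) Σ exp(-γ_k/β)), with β a calibration parameter fitted per modulation and coding scheme. The function name and the use of linear-scale SIR values are assumptions.

```python
import math


def exp_esm(sirs_linear, beta):
    """Effective SIR (linear scale) from per-subcarrier SIRs (linear scale).

    beta is a calibration parameter; its value is scheme-specific and is
    not given in the abstract.
    """
    n = len(sirs_linear)
    mean_exp = sum(math.exp(-g / beta) for g in sirs_linear) / n
    return -beta * math.log(mean_exp)
```

When all subcarriers see the same SIR the mapping returns that value unchanged, and when they differ it returns less than the arithmetic mean, which reflects the abstract's point that two links with the same average SNR can perform very differently.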

Proceedings Article•DOI•

Statistical Machine Translation with Word- and Sentence-Aligned Parallel Corpora

Chris Callison-Burch¹, David Talbot¹, Miles Osborne¹•Institutions (1)

University of Edinburgh¹

21 Jul 2004

TL;DR: It is shown that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training by incorporating word-level alignments into the parameter estimation of the IBM models.

Abstract: The parameters of statistical translation models are typically estimated from sentence-aligned parallel corpora. We show that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training. Incorporating word-level alignments into the parameter estimation of the IBM models reduces alignment error rate and increases the Bleu score when compared to training the same models only on sentence-aligned data. On the Verbmobil data set, we attain a 38% reduction in the alignment error rate and a higher Bleu score with half as many training examples. We discuss how varying the ratio of word-aligned to sentence-aligned data affects the expected performance gain.
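
Alignment error rate is reported here without a definition. A common definition, assumed in this sketch and not confirmed by the abstract, scores a predicted link set A against gold-standard "sure" links S and "possible" links P (with S contained in P) as AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|). Links are represented as (source index, target index) pairs, and the function name is hypothetical.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER of predicted links against sure and possible gold links."""
    a = set(predicted)
    s = set(sure)
    p = set(possible) | s   # every sure link also counts as possible
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))
```

A prediction that reproduces exactly the sure links scores 0.0, and links that fall outside the possible set raise the score.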

Proceedings Article•DOI•

Morphology-Based Language Modeling for Arabic Speech Recognition

Dimitra Vergyri¹, Katrin Kirchhoff¹, Kevin Duh², Andreas Stolcke²•Institutions (2)

SRI International¹, University of Washington²

08 Oct 2004

TL;DR: This paper explores the use of morphology-based language models at different stages in a speech recognition system for conversational Arabic and evaluates the techniques on a large-vocabulary recognition task and demonstrates that they lead to perplexity and word error rate reductions.

Abstract: Language modeling is a difficult problem for languages with rich morphology. In this paper we investigate the use of morphology-based language models at different stages in a speech recognition system for conversational Arabic. Class-based and single-stream factored language models using morphological word representations are applied within an N-best list rescoring framework. In addition, we explore the use of factored language models in first-pass recognition, which is facilitated by two novel procedures: the data-driven optimization of a multi-stream language model structure, and the conversion of a factored language model to a standard word-based model. We evaluate these techniques on a large-vocabulary recognition task and demonstrate that they lead to perplexity and word error rate reductions.

Journal Article•

Towards lower error rates in phoneme recognition

Petr Schwarz, Pavel Matejka, Jan Cernocky

01 Jan 2004-Lecture Notes in Computer Science

TL;DR: This work investigates techniques for acoustic modeling in automatic recognition of context-independent phoneme strings from the TIMIT database and develops a faster system with about 23.6% relative improvement over the baseline in phoneme error rate.

Abstract: We investigate techniques for acoustic modeling in automatic recognition of context-independent phoneme strings from the TIMIT database. The baseline phoneme recognizer is based on TempoRAl Patterns (TRAP). This recognizer is simplified to shorten processing times and reduce computational requirements. More states per phoneme and bi-gram language models are incorporated into the system and evaluated. The question of insufficient amount of training data is discussed and the system is improved. All modifications lead to a faster system with about 23.6% relative improvement over the baseline in phoneme error rate.

Proceedings Article•DOI•

Improving IBM Word Alignment Model 1

Robert C. Moore¹•Institutions (1)

Microsoft¹

21 Jul 2004

TL;DR: Reduction in alignment error rate is demonstrated resulting from giving extra weight to the probability of alignment to the null word, smoothing probability estimates for rare words, and using a simple heuristic estimation method to initialize, or replace, EM training of model parameters.

Abstract: We investigate a number of simple methods for improving the word-alignment accuracy of IBM Model 1. We demonstrate reduction in alignment error rate of approximately 30% resulting from (1) giving extra weight to the probability of alignment to the null word, (2) smoothing probability estimates for rare words, and (3) using a simple heuristic estimation method to initialize, or replace, EM training of model parameters.

On the Use of Information Retrieval Measures for Speech Recognition Evaluation

Iain McCowan, Darren Moore, John Dines, Daniel Gatica-Perez, Michael J. Flynn, Pierre Wellner, Hervé Bourlard

01 Jan 2004

TL;DR: It is suggested that posing speech recognition evaluation as an information retrieval problem, where each word is one unit of information, offers a flexible framework for application-oriented performance analysis based on the concepts of recall and precision.

Abstract: This paper discusses the evaluation of automatic speech recognition (ASR) systems developed for practical applications, suggesting a set of criteria for application-oriented performance measures. The commonly used word error rate (WER), which poses ASR evaluation as a string editing process, is shown to have a number of limitations with respect to these criteria, motivating alternative or additional measures. This paper suggests that posing speech recognition evaluation as an information retrieval problem, where each word is one unit of information, offers a flexible framework for application-oriented performance analysis based on the concepts of recall and precision.
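
One way to read the information-retrieval framing is to treat each correctly recognized word as a retrieved relevant item. The mapping below is an assumption made for illustration, since the abstract does not give formulas: recall is taken as hits over reference words, precision as hits over output words, both derived from the same alignment counts that WER uses. The function name is hypothetical.

```python
def word_precision_recall(hits, substitutions, deletions, insertions):
    """Precision, recall and F1 over words, from alignment counts."""
    recall = hits / (hits + substitutions + deletions)      # share of reference words found
    precision = hits / (hits + substitutions + insertions)  # share of output words correct
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Unlike WER, these values separate the effect of deletions (which lower recall only) from insertions (which lower precision only), which is the kind of application-oriented distinction the abstract argues for.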

Proceedings Article•DOI•

Design and implementation of a Bluetooth signal strength based location sensing system

Udana Bandara, Mikio Hasegawa, Masugi Inoue, Hiroyuki Morikawa, T. Aoyama¹•Institutions (1)

University of Tokyo¹

19 Sep 2004

TL;DR: This work proposes a location sensing system based on the widely available Bluetooth medium, and discusses problems which arise when the Bluetooth RSSI is used as a signal strength indicator, and proposes a novel access point that supports variable attenuators to overcome these problems.

Abstract: In a ubiquitous computing environment, location awareness is a basic necessity. There are various research projects which discuss the problem of indoor location sensing. We have recognized acceptability, low power consumption, and cost as the key design factors for developing widely deployable location sensing systems. As a good candidate technology that could satisfy these needs, we proposed a location sensing system based on the widely available Bluetooth medium. Location evaluation is performed by sensing Bluetooth signal strength with a reference model based approach. We discuss problems which arise when the Bluetooth RSSI (received signal strength indicator) is used as a signal strength indicator, and propose a novel access point that supports variable attenuators to overcome these problems. This access point allows the reading of a wider range of signal strengths using RSSI. We show that our approach to location sensing has reduced the error rate about threefold compared to systems which do not use variable attenuator supported access points.

Proceedings Article•DOI•

Noise robust speech recognition with a switching linear dynamic model

Jasha Droppo¹, A. Acero¹•Institutions (1)

Microsoft¹

03 Sep 2004

TL;DR: This paper explores using a switching linear dynamic model (LDM) for the clean speech and presents preliminary results demonstrating that, even with relatively small model sizes, substantial word error rate improvement can be achieved.

Abstract: Model based feature enhancement techniques are constructed from acoustic models for speech and noise, together with a model of how the speech and noise produce the noisy observations. Most techniques incorporate either Gaussian mixture models (GMM) or hidden Markov models (HMM). This paper explores using a switching linear dynamic model (LDM) for the clean speech. The linear dynamics of the model capture the smooth time evolution of speech. The switching states of the model capture the piecewise stationary characteristics of speech. However, incorporating a switching LDM causes the enhancement problem to become intractable. With a GMM or an HMM, the enhancement running time is proportional to the length of the utterance. The switching LDM causes the running time to become exponential in the length of the utterance. To overcome this drawback, the standard generalized pseudo-Bayesian technique is used to provide an approximate solution of the enhancement problem. We present preliminary results demonstrating that, even with relatively small model sizes, substantial word error rate improvement can be achieved.

Proceedings Article•DOI•

Model selection via the AUC

Saharon Rosset¹•Institutions (1)

IBM¹

04 Jul 2004

TL;DR: It is shown that the AUC may be preferable to empirical error even in this case and the tradeoff between approximation error and estimation error underlying this phenomenon is discussed.

Abstract: We present a statistical analysis of the AUC as an evaluation criterion for classification scoring models. First, we consider significance tests for the difference between AUC scores of two algorithms on the same test set. We derive exact moments under simplifying assumptions and use them to examine approximate practical methods from the literature. We then compare AUC to empirical misclassification error when the prediction goal is to minimize future error rate. We show that the AUC may be preferable to empirical error even in this case and discuss the tradeoff between approximation error and estimation error underlying this phenomenon.
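Illustrative sketch (not from the paper): the two criteria compared in the abstract can be computed side by side on a toy test set. The AUC is taken here in its standard pairwise form, the fraction of (positive, negative) pairs that the scores rank correctly, with ties counted as one half. The function names, the 0.5 decision threshold, and the data are assumptions made for this example.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # pair ranked correctly
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))


def error_rate(scores, labels, threshold=0.5):
    """Empirical misclassification rate when predicting 1 for score >= threshold."""
    wrong = sum((s >= threshold) != (y == 1) for s, y in zip(scores, labels))
    return wrong / len(labels)


scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))         # 8 of 9 pairs are ordered correctly
print(error_rate(scores, labels))  # 1 of 6 examples is misclassified
```

The pairwise form is quadratic in the test-set size; it is used here for clarity rather than speed. Note that the AUC depends only on the ordering of the scores, while the error rate also depends on the chosen threshold.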

Book Chapter•DOI•

Towards Lower Error Rates in Phoneme Recognition

Petr Schwarz¹, Pavel Matějka¹, Jan Cernocký¹•Institutions (1)

Brno University of Technology¹

08 Sep 2004

TL;DR: In this article, the authors investigated techniques for acoustic modeling in automatic recognition of context-independent phoneme strings from the TIMIT database and reported a 23.6% relative improvement over the baseline in phoneme error rate.

Patent•

Method and apparatus handling speech recognition errors in spoken dialogue systems

Jung-Eun Kim¹, Jae-won Lee¹•Institutions (1)

Samsung¹

05 Aug 2004

TL;DR: In this article, a meta-dialogue generation unit generates a question asking the user for additional information based on a content of a portion where the error exists and a type of the error.

Abstract: In order to handle portions of a recognized sentence having an error, a speaker or user is questioned about contents associated with the portions, and according to a user's answer a result is obtained. A speech recognition unit extracts a speech feature of a speech signal inputted from a user and finds a phoneme nearest to the speech feature to recognize a word. A recognition error determination unit finds a sentence confidence based on a confidence of the recognized word, performs examination of a semantic structure of a recognized sentence, and determines whether or not an error exists in the recognized sentence which is subject to speech recognition according to a predetermined criterion based on both the sentence confidence and a result of examining the semantic structure of the recognized sentence. Further, a meta-dialogue generation unit generates a question asking the user for additional information based on a content of a portion where the error exists and a type of the error.

Proceedings Article•DOI•

A hybrid word / phoneme-based approach for improved vocabulary-independent search in spontaneous speech.

Peng Yu¹, Frank Seide¹•Institutions (1)

Microsoft¹

04 Oct 2004

TL;DR: In this paper, the authors presented a system for phonetic indexing and searching of spontaneous speech based on phoneme lattices and combined it with word-based search into a hybrid approach.

Abstract: For efficient organization of speech recordings – meetings, interviews, voice mails, and lectures – being able to search for spoken keywords is essential. Today, most spoken document retrieval systems use large-vocabulary recognition. For the above scenarios, such systems suffer from the unpredictable domain, out-of-vocabulary queries, and generally high word-error rate (WER). In [1], we presented a system for phonetic indexing and searching of spontaneous speech. It is vocabulary-independent and based on phoneme lattices. In the present paper, we propose to combine it with word-based search into a hybrid approach. We explore two methods of combination: posterior combination (merging search results of a word-based and a phoneme-based system) and prior combination (combining word and phoneme language models and vocabularies to form a hybrid recognizer). The search accuracy of our best purely phonetic baseline is 64% (Figure of Merit), and our purely word-based baselines are below 50%. The new hybrid approach achieves 73%, if the recognizer uses a language model that matches the test-set domain. With a mismatched language model, 71% is achieved. Our results show that the proposed hybrid model benefits from the best of two worlds: word-level language context and robustness of phonetic search to unknown words and domain mismatch.
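Illustrative sketch (not from the paper): the word-error rate (WER) mentioned in the abstract is conventionally defined as the minimum number of word substitutions, deletions, and insertions needed to turn the reference transcript into the recognizer output, divided by the number of reference words. The Python below computes it with a standard edit-distance recurrence; the function name, the whitespace tokenization, and the example sentences are assumptions made for this example.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("search the meeting recordings",
                      "search meeting recording now"))  # 0.75
```

Because insertions are counted, the WER can exceed 1.0 when the hypothesis is much longer than the reference.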

Journal Article•DOI•

Exact error-rate analysis of diversity 16-QAM with channel estimation error

Lingzhi Cao¹, Norman C. Beaulieu¹•Institutions (1)

University of Alberta¹

21 Jun 2004-IEEE Transactions on Communications

TL;DR: The bit-error rate (BER) performance of multilevel quadrature amplitude modulation with pilot-symbol-assisted modulation channel estimation in static and Rayleigh fading channels is derived, both for single branch reception and maximal ratio combining diversity receiver systems.

Abstract: The bit-error rate (BER) performance of multilevel quadrature amplitude modulation with pilot-symbol-assisted modulation channel estimation in static and Rayleigh fading channels is derived, both for single branch reception and maximal ratio combining diversity receiver systems. The effects of noise and estimator decorrelation on the received BER are examined. The high sensitivity of diversity systems to channel estimation error is investigated and quantified. The influence of the pilot-symbol interpolation filter windowing is also considered.
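Illustrative sketch (not from the paper): as a point of reference for the quantity being analyzed, the Python below evaluates the common textbook approximation for the BER of Gray-coded square 16-QAM on a static AWGN channel with perfect channel knowledge and a single branch, BER ≈ (3/8)·erfc(√(0.4·Eb/N0)). It does not model fading, diversity combining, or channel estimation error, which are the subject of the paper; the function name is hypothetical.

```python
import math


def ber_16qam_awgn(ebn0_db):
    """Approximate BER of Gray-coded 16-QAM in AWGN with perfect channel knowledge.

    Assumption: nearest-neighbour approximation, accurate at moderate to high
    Eb/N0. This is a baseline only, not the exact expressions derived in the paper.
    """
    ebn0 = 10.0 ** (ebn0_db / 10.0)  # convert dB to a linear ratio
    return 0.375 * math.erfc(math.sqrt(0.4 * ebn0))


for snr_db in (0, 5, 10, 15):
    print(snr_db, ber_16qam_awgn(snr_db))
```

Any degradation caused by imperfect channel estimates would show up as a BER above this idealized curve at the same Eb/N0.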

Journal Article•DOI•

Generating non-native pronunciation variants for lexicon adaptation

Silke Goronzy, Stefan Rapp, Ralf Kompe

01 Jan 2004-Speech Communication

TL;DR: This work uses an English phoneme recogniser to generate English pronunciations for German words and uses these to train decision trees that are able to predict the respective English-accented variant from the German canonical transcription, and combines this approach with online, incremental weighted MLLR speaker adaptation.

Proceedings Article•DOI•

Symmetric word alignments for statistical machine translation

Evgeny Matusov¹, Richard Zens¹, Hermann Ney¹•Institutions (1)

RWTH Aachen University¹

23 Aug 2004

TL;DR: This paper addresses the word alignment problem for statistical machine translation by creating a symmetric word alignment allowing for reliable one-to-many and many-to-one word relationships and shows statistically significant improvements of the alignment quality compared to the best results reported so far.

Abstract: In this paper, we address the word alignment problem for statistical machine translation. We aim at creating a symmetric word alignment allowing for reliable one-to-many and many-to-one word relationships. We perform the iterative alignment training in the source-to-target and the target-to-source direction with the well-known IBM and HMM alignment models. Using these models, we robustly estimate the local costs of aligning a source word and a target word in each sentence pair. Then, we use efficient graph algorithms to determine the symmetric alignment with minimal total costs (i.e., maximal alignment probability). We evaluate the automatic alignments created in this way on the German-English Verbmobil task and the French-English Canadian Hansards task. We show statistically significant improvements of the alignment quality compared to the best results reported so far. On the Verbmobil task, we achieve an improvement of more than 1% absolute over the baseline error rate of 4.7%.
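Illustrative sketch (not from the paper): the final sentence of the abstract reports an absolute improvement, which reads quite differently when expressed relative to the baseline. The Python below converts between the two, using exactly 1.0 percentage point as a lower bound because the abstract says "more than 1%". The function name is hypothetical.

```python
def relative_reduction(baseline_error, absolute_gain):
    """Fraction of the baseline error removed by an absolute improvement.

    Both arguments are in percentage points.
    """
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return absolute_gain / baseline_error


baseline = 4.7   # baseline alignment error rate in percent, from the abstract
gain = 1.0       # lower bound on the absolute improvement, from the abstract

print(baseline - gain)                     # resulting error rate is at most about 3.7%
print(relative_reduction(baseline, gain))  # at least about 0.21, i.e. a 21% relative reduction
```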
