
Showing papers on "Word error rate" published in 2004


Book ChapterDOI
29 Sep 2004
TL;DR: A method for detecting changes in the probability distribution of examples by controlling the online error rate of the learning algorithm; the method is observed to be independent of the learning algorithm.
Abstract: Most of the work in machine learning assumes that examples are generated at random according to some stationary probability distribution. In this work we study the problem of learning when the distribution that generates the examples changes over time. We present a method for detection of changes in the probability distribution of examples. The idea behind the drift detection method is to control the online error-rate of the algorithm. The training examples are presented in sequence. When a new training example is available, it is classified using the current model. Statistical theory guarantees that while the distribution is stationary, the error will decrease. When the distribution changes, the error will increase. The method controls the trace of the online error of the algorithm. For the current context we define a warning level and a drift level. A new context is declared if, in a sequence of examples, the error increases, reaching the warning level at example k_w and the drift level at example k_d. This is an indication of a change in the distribution of the examples. The algorithm learns a new model using only the examples since k_w. The method was tested with a set of eight artificial datasets and a real-world dataset. We used three learning algorithms: a perceptron, a neural network and a decision tree. The experimental results show good performance in detecting drift and in learning the new concept. We also observe that the method is independent of the learning algorithm.

1,256 citations
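The warning/drift scheme described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the abstract gives no thresholds, so the factors 2 and 3 (multiples of the error-rate standard deviation above its recorded minimum), the 30-example warm-up, and the names `DriftDetector` and `run_detector` are all assumptions made for this sketch.

```python
import math


class DriftDetector:
    """Tracks the online error rate and reports 'stable', 'warning' or 'drift'.

    Assumed thresholds (not stated in the abstract): warning at
    p_min + 2 * s_min, drift at p_min + 3 * s_min.
    """

    def __init__(self, warning_factor=2.0, drift_factor=3.0, min_samples=30):
        self.warning_factor = warning_factor
        self.drift_factor = drift_factor
        self.min_samples = min_samples
        self.reset()

    def reset(self):
        # Start a new context: forget all error statistics.
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, is_error):
        """Feed one prediction outcome (True = misclassified)."""
        self.n += 1
        self.errors += int(is_error)
        p = self.errors / self.n                 # online error rate
        s = math.sqrt(p * (1.0 - p) / self.n)    # its standard deviation
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = p, s        # best error level seen so far
        if p + s > self.p_min + self.drift_factor * self.s_min:
            self.reset()
            return "drift"
        if p + s > self.p_min + self.warning_factor * self.s_min:
            return "warning"
        return "stable"


def run_detector(outcomes):
    """Return the detector state after each outcome in the stream."""
    detector = DriftDetector()
    return [detector.update(o) for o in outcomes]
```

In use, a learner would remember the index of the first "warning" (k_w in the abstract) and, once "drift" is reported, retrain on the examples seen since that index.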


Proceedings ArticleDOI
23 Aug 2004
TL;DR: Investigation of unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles collected from thousands of web-based news sources shows that edit distance data is cleaner and more easily-aligned than the heuristic data.
Abstract: We investigate unsupervised techniques for acquiring monolingual sentence-level paraphrases from a corpus of temporally and topically clustered news articles collected from thousands of web-based news sources. Two techniques are employed: (1) simple string edit distance, and (2) a heuristic strategy that pairs initial (presumably summary) sentences from different news stories in the same cluster. We evaluate both datasets using a word alignment algorithm and a metric borrowed from machine translation. Results show that edit distance data is cleaner and more easily aligned than the heuristic data, with an overall alignment error rate (AER) of 11.58% on a similarly extracted test set. On test data extracted by the heuristic strategy, however, performance of the two training sets is similar, with AERs of 13.2% and 14.7% respectively. Analysis of 100 pairs of sentences from each set reveals that the edit distance data lacks many of the complex lexical and syntactic alternations that characterize monolingual paraphrase. The summary sentences, while less readily alignable, retain more of the non-trivial alternations that are of greatest interest for learning paraphrase relationships.

895 citations
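To make the first technique concrete, here is a minimal sketch of a string edit distance between two sentences. The abstract does not say whether the distance is computed over characters or words, nor which cut-off selects a pair; this sketch assumes whitespace-separated words and unit costs, and `word_edit_distance` is a hypothetical name.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two sentences, counted in words."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))  # distance from the empty prefix of x
    for i, wx in enumerate(x, 1):
        cur = [i]
        for j, wy in enumerate(y, 1):
            cur.append(min(
                prev[j] + 1,               # delete wx
                cur[j - 1] + 1,            # insert wy
                prev[j - 1] + (wx != wy),  # substitute, or match at cost 0
            ))
        prev = cur
    return prev[-1]
```

Sentence pairs within a cluster whose distance falls under some small threshold would then be kept as paraphrase candidates.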


Journal ArticleDOI
TL;DR: Applied to face detection, the FloatBoost learning method, together with a proposed detector pyramid architecture, leads to the first real-time multiview face detection system reported.
Abstract: A novel learning procedure, called FloatBoost, is proposed for learning a boosted classifier for achieving the minimum error rate. FloatBoost learning uses a backtrack mechanism after each iteration of AdaBoost learning to minimize the error rate directly, rather than minimizing an exponential function of the margin as in the traditional AdaBoost algorithms. A second contribution of the paper is a novel statistical model for learning best weak classifiers using a stagewise approximation of the posterior probability. These novel techniques lead to a classifier which requires fewer weak classifiers than AdaBoost yet achieves lower error rates in both training and testing, as demonstrated by extensive experiments. Applied to face detection, the FloatBoost learning method, together with a proposed detector pyramid architecture, leads to the first real-time multiview face detection system reported.

585 citations


Proceedings Article
01 Jul 2004
TL;DR: Human evaluation shows that this SMT system outperforms baseline paraphrase generation techniques and, in a departure from previous work, offers better coverage and scalability than the current best-of-breed paraphrasing approaches.
Abstract: We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pairs automatically extracted from clustered news articles available on the World Wide Web. Alignment Error Rate (AER) is measured to gauge the quality of the resulting corpus. A monotone phrasal decoder generates contextual replacements. Human evaluation shows that this system outperforms baseline paraphrase generation techniques and, in a departure from previous work, offers better coverage and scalability than the current best-of-breed paraphrasing approaches.

388 citations


Journal ArticleDOI
TL;DR: The use of language models is shown to improve the accuracy of the system and the approach is described in detail and compared with other methods presented in the literature to deal with the same problem.
Abstract: This paper presents a system for the offline recognition of large vocabulary unconstrained handwritten texts. The only assumption made about the data is that it is written in English. This allows the application of statistical language models in order to improve the performance of our system. Several experiments have been performed using both single and multiple writer data. Lexica of variable size (from 10,000 to 50,000 words) have been used. The use of language models is shown to improve the accuracy of the system (when the lexicon contains 50,000 words, the error rate is reduced by ~50 percent for single writer data and by ~25 percent for multiple writer data). Our approach is described in detail and compared with other methods presented in the literature to deal with the same problem. An experimental setup to correctly deal with unconstrained text recognition is proposed.

325 citations


Proceedings Article
01 Jan 2004
TL;DR: New state-of-the-art performance is achieved on a standard benchmark data set, reducing error in average F1 by 36%, and word error rate by 78% in comparison with the previous best SVM results.
Abstract: With the increasing use of research paper search engines, such as CiteSeer, for both literature search and hiring decisions, the accuracy of such systems is of paramount importance. This paper employs Conditional Random Fields (CRFs) for the task of extracting various common fields from the headers and citation of research papers. The basic theory of CRFs is becoming well-understood, but best-practices for applying them to real-world data requires additional exploration. This paper makes an empirical exploration of several factors, including variations on Gaussian, exponential and hyperbolic-L1 priors for improved regularization, and several classes of features and Markov order. On a standard benchmark data set, we achieve new state-of-the-art performance, reducing error in average F1 by 36%, and word error rate by 78% in comparison with the previous best SVM results. Accuracy compares even more favorably against HMMs.

319 citations


Journal ArticleDOI
TL;DR: It is shown that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data and an application of SVMs to large vocabulary speech recognition is described.
Abstract: Recent work in machine learning has focused on models, such as the support vector machine (SVM), that automatically control generalization and parameterization as part of the overall optimization process. In this paper, we show that SVMs provide a significant improvement in performance on a static pattern classification task based on the Deterding vowel data. We also describe an application of SVMs to large vocabulary speech recognition and demonstrate an improvement in error rate on a continuous alphadigit task (OGI Alphadigits) and a large vocabulary conversational speech task (Switchboard). Issues related to the development and optimization of an SVM/HMM hybrid system are discussed.

278 citations


Journal ArticleDOI
TL;DR: It is shown that by using context-sensitive recognition based on the predicted type of the user's next dialogue move, a more flexible dialogue system can also exhibit an improvement in speech recognition performance.
Abstract: We focus on the issue of robustness of conversational interfaces that are flexible enough to allow natural "multithreaded" conversational flow. Our main advance is to use context-sensitive speech recognition in a general way, with a representation of dialogue context that is rich and flexible enough to support conversation about multiple interleaved topics, as well as the interpretation of corrective fragments. We explain, by use of worked examples, the use of our "Conversational Intelligence Architecture" (CIA) to represent conversational threads, and how each thread can be associated with a language model (LM) for more robust speech recognition. The CIA uses fine-grained dynamic representations of dialogue context, which supersede those used in finite-state or form-based dialogue managers. In an evaluation of a dialogue system built using this architecture we found that 87.9% of recognized utterances were recognized using a context-specific language model, resulting in an 11.5% reduction in the overall utterance recognition error rate, and a 13.4% reduction in concept error rate. Thus we show that by using context-sensitive recognition based on the predicted type of the user's next dialogue move, a more flexible dialogue system can also exhibit an improvement in speech recognition performance.

249 citations


Proceedings ArticleDOI
17 May 2004
TL;DR: A bootstrap method for significance analysis is presented which is, at the same time, intuitive, precise and easy to use, and whose results are immediately interpretable in terms of word error rate.
Abstract: The field of speech recognition has clearly benefited from precisely defined testing conditions and objective performance measures such as word error rate. In the development and evaluation of new methods, the question arises whether the empirically observed difference in performance is due to a genuine advantage of one system over the other, or just an effect of chance. However, many publications still do not concern themselves with the statistical significance of the results reported. We present a bootstrap method for significance analysis which is, at the same time, intuitive, precise and easy to use. Unlike some methods, we make no (possibly ill-founded) approximations and the results are immediately interpretable in terms of word error rate.

236 citations
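A paired bootstrap comparison of two recognizers might look roughly like the sketch below. It is an illustration of the general idea only and not necessarily the exact procedure of the paper: the per-utterance tuple layout, the number of replicates, and the name `bootstrap_prob_improvement` are assumptions.

```python
import random


def bootstrap_prob_improvement(utts, n_boot=1000, seed=0):
    """Estimate how often system B beats system A in WER under resampling.

    utts: list of (errors_a, errors_b, n_ref_words) tuples, one per test
    utterance, with n_ref_words > 0 (hypothetical layout for this sketch).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_boot):
        # Resample utterances with replacement, keeping A and B paired.
        sample = [rng.choice(utts) for _ in utts]
        n_words = sum(u[2] for u in sample)
        wer_a = sum(u[0] for u in sample) / n_words
        wer_b = sum(u[1] for u in sample) / n_words
        wins += wer_b < wer_a
    return wins / n_boot
```

A value close to 1 suggests that the lower WER of system B is unlikely to be an effect of chance on this test set; a value near 0.5 suggests the observed difference is not reliable.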


Proceedings ArticleDOI
21 Jul 2004
TL;DR: This paper compares two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs), which have the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data.
Abstract: This paper describes discriminative language modeling for a large vocabulary speech recognition task. We contrast two parameter estimation methods: the perceptron algorithm, and a method based on conditional random fields (CRFs). The models are encoded as deterministic weighted finite state automata, and are applied by intersecting the automata with word-lattices that are the output from a baseline recognizer. The perceptron algorithm has the benefit of automatically selecting a relatively small feature set in just a couple of passes over the training data. However, using the feature set output from the perceptron algorithm (initialized with their weights), CRF training provides an additional 0.5% reduction in word error rate, for a total 1.8% absolute reduction from the baseline of 39.2%.

187 citations


Journal ArticleDOI
TL;DR: An upper bound on the pairwise error probability (PEP) is derived and the union-bound technique is applied in conjunction with the derived PEP to obtain upper bounds on the bit error rate.
Abstract: Error control coding can be used over free-space optical (FSO) links to mitigate turbulence-induced fading. We present error rate performance bounds for coded FSO communication systems operating over atmospheric turbulence channels, which are modeled as a correlated K distribution under strong turbulence conditions. We derive an upper bound on the pairwise error probability (PEP) and then apply the union-bound technique in conjunction with the derived PEP to obtain upper bounds on the bit error rate. Simulation results are further demonstrated to verify the analytical results.

Patent
29 Jan 2004
TL;DR: A system for configuring solid-state storage devices comprises a solid-state storage device and an error correction code (ECC) selection system, which is configured to automatically select a set of error correction code based on an error rate of the storage device.
Abstract: A system for configuring solid-state storage devices comprises a solid-state storage device and an error correction code (ECC) selection system. The ECC selection system is configured to automatically select a set of error correction code based on an error rate of the storage device. The ECC selection system is further configured to install the selected set of error correction code in the solid-state storage device.

Proceedings ArticleDOI
04 Oct 2004
TL;DR: Two new absolute CSR performance measures are introduced: MER (match error rate) and WIL (word information lost), which are a simple approximation to the proportion of word information lost which overcomes the problems associated with the RIL (relative information lost) measure.
Abstract: The word error rate (WER), commonly used in ASR assessment, measures the cost of restoring the output word sequence to the original input sequence. However, for most CSR applications apart from dictation machines a more meaningful performance measure would be given by the proportion of information communicated. In this article we introduce two new absolute CSR performance measures: MER (match error rate) and WIL (word information lost). MER is the proportion of I/O word matches which are errors. WIL is a simple approximation to the proportion of word information lost which overcomes the problems associated with the RIL (relative information lost) measure that was proposed half a century ago. Issues relating to ideal performance measurement are discussed and the commonly used Viterbi input/output alignment procedure, with zero weight for hits and equal weight for substitutions, deletions and insertions, is shown to be optimal.
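The three measures can be computed from the alignment counts of hits (H), substitutions (S), deletions (D) and insertions (I). The abstract only describes MER and WIL in words, so the formulas below are the commonly quoted definitions of these measures as recalled here, and the function name is hypothetical.

```python
def asr_measures(hits, subs, dels, ins):
    """Return (WER, MER, WIL) from word alignment counts.

    Assumes at least one reference word and one output word.
    """
    n_ref = hits + subs + dels   # words in the reference (input)
    n_hyp = hits + subs + ins    # words in the recognizer output
    errors = subs + dels + ins
    wer = errors / n_ref                          # can exceed 1
    mer = errors / (hits + subs + dels + ins)     # always within [0, 1]
    wil = 1.0 - (hits * hits) / (n_ref * n_hyp)   # always within [0, 1]
    return wer, mer, wil
```

For example, with H=6, S=2, D=1, I=1 this gives WER = 4/9, MER = 0.4 and WIL = 1 - 36/81.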

Proceedings ArticleDOI
04 Oct 2004
TL;DR: The use of phone lattices both in training and testing significantly improves the accuracy of a phonotactically based LID system and is further enhanced by using a neural network to combine the results of multiple phone recognizers.
Abstract: This paper proposes a new phone lattice based method for automatic language recognition from speech data. By using phone lattices some approximations usually made by language identification (LID) systems relying on phonotactic constraints to simplify the training and decoding processes can be avoided. We demonstrate the use of phone lattices both in training and testing significantly improves the accuracy of a phonotactically based LID system. Performance is further enhanced by using a neural network to combine the results of multiple phone recognizers. Using three phone recognizers with context independent phone models, the system achieves an equal error rate of 2.7% on the Eval03 NIST detection test (30s segment, primary condition) with an overall decoding process that runs faster than real-time (0.5xRT).

Proceedings ArticleDOI
03 Mar 2004
TL;DR: This work shows how to compute the mean time to failure for temporal double-bit errors and shows how fixed-interval scrubbing - in which error checkers periodically access cache blocks and remove single-bit errors - can mitigate such errors in processor caches.
Abstract: Transient faults from neutron and alpha particle strikes in large SRAM caches have become a major problem for microprocessor designers. To protect these caches, designers often use error correcting codes (ECC), which typically provide single-bit error correction and double-bit error detection (SECDED). Unfortunately, two separate strikes could still flip two different bits in the same ECC-protected word. This we call a temporal double-bit error. SECDED ECC can only detect, not correct such errors. We show how to compute the mean time to failure for temporal double-bit errors. Additionally, we show how fixed-interval scrubbing - in which error checkers periodically access cache blocks and remove single-bit errors - can mitigate such errors in processor caches. Our analysis using current soft error rates shows that only very large caches (e.g., hundreds of megabytes to gigabytes) need scrubbing to reduce the temporal double-bit error rate to a tolerable range.

Proceedings ArticleDOI
26 Sep 2004
TL;DR: The accuracy of two FER prediction methods is studied: Packet error rate indicator (PER-indicator) and exponential effective SIR mapping (Exp-ESM) which are shown to have accuracy within a few tenths of a dB under a wide range of modulation schemes, coding rates and channel types.
Abstract: Multicarrier modulations such as OFDM with adaptive modulation and coding (AMC) are well suited for high data rate broadband systems that operate in multipath environments and are considered as promising candidates for future generation cellular systems (e.g., 4G). Cellular system performance is normally investigated with system level simulations that are computationally complex. For broadband multicarrier systems, incorporating a detailed physical layer emulator into the system simulator becomes impractical, so there is a need for simplified link performance predictors. However, due to the large variability of the channel in the frequency domain, two links with the same average SNR can experience drastically different performance, thus making it difficult to accurately predict the instantaneous link performance such as the frame error rate. In this paper, the accuracy of two FER prediction methods is studied: Packet error rate indicator (PER-indicator) and exponential effective SIR mapping (Exp-ESM). Both methods are shown to have accuracy within a few tenths of a dB under a wide range of modulation schemes, coding rates and channel types. These methods are then extended to handle more advanced link enhancements such as hybrid ARQ and Alamouti encoding. The Exp-ESM method has slightly better accuracy than the PER-indicator, and is the preferred link error predictor for a system simulator.
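As a rough illustration of the exponential effective SIR mapping, the sketch below compresses per-subcarrier SIRs into one effective value using the usual exponential-averaging form. The abstract does not give the formula or any calibration values, so the expression, the linear-scale inputs, and the parameter `beta` (normally calibrated per modulation and coding scheme) are assumptions of this sketch.

```python
import math


def effective_sir(sirs, beta):
    """Exponential effective SIR of a set of per-subcarrier SIRs (linear scale).

    beta is a calibration parameter; the value used by a caller is
    hypothetical and would have to be fitted to link-level simulations.
    """
    mean_exp = sum(math.exp(-g / beta) for g in sirs) / len(sirs)
    return -beta * math.log(mean_exp)
```

The effective value equals the common SIR on a flat channel and falls below the arithmetic mean on a frequency-selective one; it can then be looked up in a single AWGN frame-error-rate curve.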

Proceedings ArticleDOI
21 Jul 2004
TL;DR: It is shown that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training by incorporating word-level alignments into the parameter estimation of the IBM models.
Abstract: The parameters of statistical translation models are typically estimated from sentence-aligned parallel corpora. We show that significant improvements in the alignment and translation quality of such models can be achieved by additionally including word-aligned data during training. Incorporating word-level alignments into the parameter estimation of the IBM models reduces alignment error rate and increases the Bleu score when compared to training the same models only on sentence-aligned data. On the Verbmobil data set, we attain a 38% reduction in the alignment error rate and a higher Bleu score with half as many training examples. We discuss how varying the ratio of word-aligned to sentence-aligned data affects the expected performance gain.

Proceedings ArticleDOI
08 Oct 2004
TL;DR: This paper explores the use of morphology-based language models at different stages in a speech recognition system for conversational Arabic and evaluates the techniques on a large-vocabulary recognition task and demonstrates that they lead to perplexity and word error rate reductions.
Abstract: Language modeling is a difficult problem for languages with rich morphology. In this paper we investigate the use of morphology-based language models at different stages in a speech recognition system for conversational Arabic. Class-based and single-stream factored language models using morphological word representations are applied within an N-best list rescoring framework. In addition, we explore the use of factored language models in first-pass recognition, which is facilitated by two novel procedures: the data-driven optimization of a multi-stream language model structure, and the conversion of a factored language model to a standard word-based model. We evaluate these techniques on a large-vocabulary recognition task and demonstrate that they lead to perplexity and word error rate reductions.

Journal Article
TL;DR: This work investigates techniques for acoustic modeling in automatic recognition of context-independent phoneme strings from the TIMIT database and develops a faster system with about 23.6% relative improvement over the baseline in phoneme error rate.
Abstract: We investigate techniques for acoustic modeling in automatic recognition of context-independent phoneme strings from the TIMIT database. The baseline phoneme recognizer is based on TempoRAl Patterns (TRAP). This recognizer is simplified to shorten processing times and reduce computational requirements. More states per phoneme and bi-gram language models are incorporated into the system and evaluated. The question of insufficient amount of training data is discussed and the system is improved. All modifications lead to a faster system with about 23.6% relative improvement over the baseline in phoneme error rate.

Proceedings ArticleDOI
Robert C. Moore
21 Jul 2004
TL;DR: Reduction in alignment error rate is demonstrated resulting from giving extra weight to the probability of alignment to the null word, smoothing probability estimates for rare words, and using a simple heuristic estimation method to initialize, or replace, EM training of model parameters.
Abstract: We investigate a number of simple methods for improving the word-alignment accuracy of IBM Model 1. We demonstrate reduction in alignment error rate of approximately 30% resulting from (1) giving extra weight to the probability of alignment to the null word, (2) smoothing probability estimates for rare words, and (3) using a simple heuristic estimation method to initialize, or replace, EM training of model parameters.

01 Jan 2004
TL;DR: It is suggested that posing speech recognition evaluation as an information retrieval problem, where each word is one unit of information, offers a flexible framework for application-oriented performance analysis based on the concepts of recall and precision.
Abstract: This paper discusses the evaluation of automatic speech recognition (ASR) systems developed for practical applications, suggesting a set of criteria for application-oriented performance measures. The commonly used word error rate (WER), which poses ASR evaluation as a string editing process, is shown to have a number of limitations with respect to these criteria, motivating alternative or additional measures. This paper suggests that posing speech recognition evaluation as an information retrieval problem, where each word is one unit of information, offers a flexible framework for application-oriented performance analysis based on the concepts of recall and precision.

Proceedings ArticleDOI
19 Sep 2004
TL;DR: This work proposes a location sensing system based on the widely available Bluetooth medium, and discusses problems which arise when the Bluetooth RSSI is used as a signal strength indicator, and proposes a novel access point that supports variable attenuators to overcome these problems.
Abstract: In a ubiquitous computing environment, location awareness is a basic necessity. There are various research projects which discuss the problem of indoor location sensing. We have recognized acceptability, low power consumption, and cost as the key design factors for developing widely deployable location sensing systems. As a good candidate technology that could satisfy these needs, we proposed a location sensing system based on the widely available Bluetooth medium. Location evaluation is performed by sensing Bluetooth signal strength with a reference-model-based approach. We discuss problems which arise when the Bluetooth RSSI (received signal strength indicator) is used as a signal strength indicator, and propose a novel access point that supports variable attenuators to overcome these problems. This access point allows the reading of a wider range of signal strengths using RSSI. We show that our approach to location sensing has reduced the error rate about threefold compared to systems which do not use variable attenuator supported access points.

Proceedings ArticleDOI
Jasha Droppo, A. Acero
03 Sep 2004
TL;DR: This paper explores using a switching linear dynamic model (LDM) for the clean speech and presents preliminary results demonstrating that, even with relatively small model sizes, substantial word error rate improvement can be achieved.
Abstract: Model based feature enhancement techniques are constructed from acoustic models for speech and noise, together with a model of how the speech and noise produce the noisy observations. Most techniques incorporate either Gaussian mixture models (GMM) or hidden Markov models (HMM). This paper explores using a switching linear dynamic model (LDM) for the clean speech. The linear dynamics of the model capture the smooth time evolution of speech. The switching states of the model capture the piecewise stationary characteristics of speech. However, incorporating a switching LDM causes the enhancement problem to become intractable. With a GMM or an HMM, the enhancement running time is proportional to the length of the utterance. The switching LDM causes the running time to become exponential in the length of the utterance. To overcome this drawback, the standard generalized pseudo-Bayesian technique is used to provide an approximate solution of the enhancement problem. We present preliminary results demonstrating that, even with relatively small model sizes, substantial word error rate improvement can be achieved.

Proceedings ArticleDOI
Saharon Rosset
04 Jul 2004
TL;DR: It is shown that the AUC may be preferable to empirical error even in this case and the tradeoff between approximation error and estimation error underlying this phenomenon is discussed.
Abstract: We present a statistical analysis of the AUC as an evaluation criterion for classification scoring models. First, we consider significance tests for the difference between AUC scores of two algorithms on the same test set. We derive exact moments under simplifying assumptions and use them to examine approximate practical methods from the literature. We then compare AUC to empirical misclassification error when the prediction goal is to minimize future error rate. We show that the AUC may be preferable to empirical error even in this case and discuss the tradeoff between approximation error and estimation error underlying this phenomenon.
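For readers who want the criterion spelled out, the AUC of a scoring model on a labelled test set can be computed as the fraction of positive/negative pairs that the scores rank correctly. The sketch below is a plain quadratic-time illustration with ties counted as one half; it is not taken from the paper, and the function name is hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (labels are 0 or 1)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Count correctly ordered pairs; a tie contributes half a pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Unlike the misclassification error, this value does not depend on a decision threshold, which is one reason the two criteria can rank models differently.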

Book ChapterDOI
08 Sep 2004
TL;DR: In this article, the authors investigated techniques for acoustic modeling in automatic recognition of context-independent phoneme strings from the TIMIT database and reported a 23.6% relative improvement over the baseline in phoneme error rate.
Abstract: We investigate techniques for acoustic modeling in automatic recognition of context-independent phoneme strings from the TIMIT database. The baseline phoneme recognizer is based on TempoRAl Patterns (TRAP). This recognizer is simplified to shorten processing times and reduce computational requirements. More states per phoneme and bi-gram language models are incorporated into the system and evaluated. The question of insufficient amount of training data is discussed and the system is improved. All modifications lead to a faster system with about 23.6% relative improvement over the baseline in phoneme error rate.

Patent
Jung-Eun Kim, Jae-won Lee
05 Aug 2004
TL;DR: In this article, a meta-dialogue generation unit generates a question asking the user for additional information based on a content of a portion where the error exists and a type of the error.
Abstract: In order to handle portions of a recognized sentence having an error, a speaker or user is questioned about contents associated with the portions, and according to a user's answer a result is obtained. A speech recognition unit extracts a speech feature of a speech signal inputted from a user and finds a phoneme nearest to the speech feature to recognize a word. A recognition error determination unit finds a sentence confidence based on a confidence of the recognized word, performs examination of a semantic structure of a recognized sentence, and determines whether or not an error exists in the recognized sentence which is subject to speech recognition according to a predetermined criterion based on both the sentence confidence and a result of examining the semantic structure of the recognized sentence. Further, a meta-dialogue generation unit generates a question asking the user for additional information based on a content of a portion where the error exists and a type of the error.

Proceedings ArticleDOI
Peng Yu, Frank Seide
04 Oct 2004
TL;DR: In this paper, the authors presented a system for phonetic indexing and searching of spontaneous speech based on phoneme lattices and combined it with word-based search into a hybrid approach.
Abstract: For efficient organization of speech recordings – meetings, interviews, voice mails, and lectures – being able to search for spoken keywords is essential. Today, most spoken document retrieval systems use large-vocabulary recognition. For the above scenarios, such systems suffer from the unpredictable domain, out-of-vocabulary queries, and generally high word-error rate (WER). In [1], we presented a system for phonetic indexing and searching of spontaneous speech. It is vocabulary-independent and based on phoneme lattices. In the present paper, we propose to combine it with word-based search into a hybrid approach. We explore two methods of combination: posterior combination (merging search results of a word-based and a phoneme-based system) and prior combination (combining word and phoneme language models and vocabularies to form a hybrid recognizer). The search accuracy of our best purely phonetic baseline is 64% (Figure of Merit), and our purely word-based baselines are below 50%. The new hybrid approach achieves 73% if the recognizer uses a language model that matches the test-set domain. With a mismatched language model, 71% is achieved. Our results show that the proposed hybrid model benefits from the best of two worlds: word-level language context and robustness of phonetic search to unknown words and domain mismatch.

Journal ArticleDOI
TL;DR: The bit-error rate (BER) performance of multilevel quadrature amplitude modulation with pilot-symbol-assisted modulation channel estimation in static and Rayleigh fading channels is derived, both for single branch reception and maximal ratio combining diversity receiver systems.
Abstract: The bit-error rate (BER) performance of multilevel quadrature amplitude modulation with pilot-symbol-assisted modulation channel estimation in static and Rayleigh fading channels is derived, both for single branch reception and maximal ratio combining diversity receiver systems. The effects of noise and estimator decorrelation on the received BER are examined. The high sensitivity of diversity systems to channel estimation error is investigated and quantified. The influence of the pilot-symbol interpolation filter windowing is also considered.

Journal ArticleDOI
TL;DR: This work uses an English phoneme recogniser to generate English pronunciations for German words and uses these to train decision trees that are able to predict the respective English-accented variant from the German canonical transcription, and combines this approach with online, incremental weighted MLLR speaker adaptation.

Proceedings ArticleDOI
23 Aug 2004
TL;DR: This paper addresses the word alignment problem for statistical machine translation by creating a symmetric word alignment allowing for reliable one-to-many and many-to-one word relationships and shows statistically significant improvements of the alignment quality compared to the best results reported so far.
Abstract: In this paper, we address the word alignment problem for statistical machine translation. We aim at creating a symmetric word alignment allowing for reliable one-to-many and many-to-one word relationships. We perform the iterative alignment training in the source-to-target and the target-to-source direction with the well-known IBM and HMM alignment models. Using these models, we robustly estimate the local costs of aligning a source word and a target word in each sentence pair. Then, we use efficient graph algorithms to determine the symmetric alignment with minimal total costs (i.e., maximal alignment probability). We evaluate the automatic alignments created in this way on the German-English Verbmobil task and the French-English Canadian Hansards task. We show statistically significant improvements of the alignment quality compared to the best results reported so far. On the Verbmobil task, we achieve an improvement of more than 1% absolute over the baseline error rate of 4.7%.