
Showing papers on "Word error rate published in 2007"


Proceedings Article
03 Dec 2007
TL;DR: The Probabilistic Matrix Factorization (PMF) model is presented, which scales linearly with the number of observations and performs well on the large, sparse, and very imbalanced Netflix dataset and is extended to include an adaptive prior on the model parameters.
Abstract: Many existing approaches to collaborative filtering can neither handle very large datasets nor easily deal with users who have very few ratings. In this paper we present the Probabilistic Matrix Factorization (PMF) model which scales linearly with the number of observations and, more importantly, performs well on the large, sparse, and very imbalanced Netflix dataset. We further extend the PMF model to include an adaptive prior on the model parameters and show how the model capacity can be controlled automatically. Finally, we introduce a constrained version of the PMF model that is based on the assumption that users who have rated similar sets of movies are likely to have similar preferences. The resulting model is able to generalize considerably better for users with very few ratings. When the predictions of multiple PMF models are linearly combined with the predictions of Restricted Boltzmann Machines models, we achieve an error rate of 0.8861, which is nearly 7% better than the score of Netflix's own system.
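The training the abstract describes can be illustrated with a toy sketch (all data, dimensions, and hyperparameters here are invented, not from the paper): MAP estimation in PMF amounts to minimizing regularized squared error over the observed ratings only, so each gradient step touches only observed entries and cost scales linearly with their number.

```python
import numpy as np

# Toy PMF sketch (invented data): fit user/item factor matrices U, V by
# gradient steps on the log-posterior of the observed ratings.
rng = np.random.default_rng(0)
n_users, n_items, rank = 5, 4, 2
R = rng.integers(1, 6, size=(n_users, n_items)).astype(float)  # ratings 1..5
mask = rng.random((n_users, n_items)) < 0.6                    # observed cells

U = 0.1 * rng.standard_normal((n_users, rank))
V = 0.1 * rng.standard_normal((n_items, rank))
lam, lr = 0.05, 0.05            # Gaussian-prior strength, step size (invented)

def rmse(U, V):
    err = (R - U @ V.T)[mask]   # error on observed ratings only
    return np.sqrt(np.mean(err ** 2))

before = rmse(U, V)
for _ in range(200):
    E = np.where(mask, R - U @ V.T, 0.0)   # residuals, zero where unobserved
    U += lr * (E @ V - lam * U)            # ascend regularized log-posterior
    V += lr * (E.T @ U - lam * V)
after = rmse(U, V)
print(after < before)
```

The adaptive-prior and constrained variants in the paper change the prior term (`lam * U`) but leave this linear-in-observations structure intact.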

4,022 citations


01 Jan 2007
TL;DR: An error measure is defined, the slot error rate, which combines the different types of error directly, without having to resort to precision and recall as preliminary measures.
Abstract: While precision and recall have served the information extraction community well as two separate measures of system performance, we show that the F -measure, the weighted harmonic mean of precision and recall, exhibits certain undesirable behaviors. To overcome these limitations, we define an error measure, the slot error rate, which combines the different types of error directly, without having to resort to precision and recall as preliminary measures. The slot error rate is analogous to the word error rate that is used for measuring speech recognition performance; it is intended to be a measure of the cost to the user for the system to make the different types of errors.
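As a concrete illustration (all counts invented here), the slot error rate charges substitutions, deletions, and insertions directly against the number of reference slots, mirroring word error rate, while the F-measure routes the same counts through precision and recall first:

```python
# Slot error rate vs. F-measure on invented counts.
def slot_error_rate(substitutions, deletions, insertions, reference_slots):
    # SER = (S + D + I) / R, analogous to word error rate
    return (substitutions + deletions + insertions) / reference_slots

def f_measure(correct, system_slots, reference_slots):
    precision = correct / system_slots
    recall = correct / reference_slots
    return 2 * precision * recall / (precision + recall)

# 100 reference slots; the system emits 95 slots: 80 correct,
# 10 substituted, 5 spurious insertions, 10 reference slots deleted.
print(slot_error_rate(10, 10, 5, 100))    # → 0.25
print(round(f_measure(80, 95, 100), 3))   # → 0.821
```

Note how the F-measure rewards the correct slots while the slot error rate counts only the errors, which is what makes it interpretable as a cost to the user.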

609 citations


Journal ArticleDOI
TL;DR: Highly efficient learning algorithms are described that enable the use of training corpora of several hundred million words and it is shown that this approach can be incorporated into a large vocabulary continuous speech recognizer using a lattice rescoring framework at a very low additional processing time.

547 citations


Journal ArticleDOI
TL;DR: A new worst-case metric is proposed for predicting practical system performance in the absence of matching failures, and the worst-case theoretical equal error rate (EER) is predicted to be as low as 2.59 × 10^-1 on the available data sets.
Abstract: This paper presents a novel iris coding method based on differences of discrete cosine transform (DCT) coefficients of overlapped angular patches from normalized iris images. The feature extraction capabilities of the DCT are optimized on the two largest publicly available iris image data sets, 2,156 images of 308 eyes from the CASIA database and 2,955 images of 150 eyes from the Bath database. On this data, we achieve 100 percent correct recognition rate (CRR) and perfect receiver-operating characteristic (ROC) curves with no registered false accepts or rejects. Individual feature bit and patch position parameters are optimized for matching through a product-of-sum approach to Hamming distance calculation. For verification, a variable threshold is applied to the distance metric and the false acceptance rate (FAR) and false rejection rate (FRR) are recorded. A new worst-case metric is proposed for predicting practical system performance in the absence of matching failures, and the worst-case theoretical equal error rate (EER) is predicted to be as low as 2.59 × 10^-1 on the available data sets.
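The FAR/FRR/EER machinery used above can be sketched with invented toy distances (not the paper's Hamming distances): sweep the verification threshold, record both error rates, and take the operating point where they are closest as the equal error rate.

```python
import numpy as np

# Invented toy distances: small for genuine (same-eye) comparisons,
# large for impostor (different-eye) comparisons.
genuine  = np.array([0.18, 0.22, 0.25, 0.28, 0.30, 0.33])
impostor = np.array([0.41, 0.44, 0.46, 0.48, 0.50, 0.55])

best = None
for t in np.linspace(0.0, 0.6, 601):        # threshold sweep
    far = np.mean(impostor <= t)            # impostors wrongly accepted
    frr = np.mean(genuine > t)              # genuine users wrongly rejected
    if best is None or abs(far - frr) < abs(best[1] - best[2]):
        best = (t, far, frr)

t, far, frr = best
# These toy classes are separable, so a threshold with FAR = FRR = 0 exists,
# which is the "perfect ROC" situation the abstract describes.
print(far, frr)
```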

503 citations


Journal ArticleDOI
TL;DR: It is shown in a subset of the George Washington collection that such a word spotting technique can outperform a Hidden Markov Model word-based recognition technique in terms of word error rates.
Abstract: Searching and indexing historical handwritten collections is a very challenging problem. We describe an approach called word spotting which involves grouping word images into clusters of similar words by using image matching to find similarity. By annotating “interesting” clusters, an index that links words to the locations where they occur can be built automatically. Image similarities computed using a number of different techniques including dynamic time warping are compared. The word similarities are then used for clustering using both K-means and agglomerative clustering techniques. It is shown in a subset of the George Washington collection that such a word spotting technique can outperform a Hidden Markov Model word-based recognition technique in terms of word error rates.
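The dynamic time warping used for image matching above can be sketched in a few lines. This toy version compares 1-D sequences (invented numbers standing in for, say, column profiles of word images), not the paper's actual features:

```python
import math

# Minimal DTW: cost of the cheapest monotonic alignment of two sequences.
def dtw(a, b):
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of diagonal match, insertion, deletion
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    return D[n][m]

# The same shape written at a different speed aligns at zero cost...
print(dtw([1, 2, 3, 3, 2], [1, 1, 2, 3, 2]))   # → 0.0
# ...while a different profile is far more expensive.
print(dtw([1, 2, 3, 3, 2], [5, 5, 5, 5, 5]) > 0)
```

This tolerance to local stretching and compression is what makes DTW attractive for handwriting, where letter widths vary between instances of the same word.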

368 citations


Journal ArticleDOI
TL;DR: This paper describes a method that combines multicondition model training and missing-feature theory to model noise with unknown temporal-spectral characteristics, and is found to achieve lower error rates.
Abstract: This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by environmental noise, but knowledge about the noise characteristics is not available. This research is motivated in part by the potential application of speaker recognition technologies on handheld devices or the Internet. While the technologies promise an additional biometric layer of security to protect the user, the practical implementation of such systems faces many challenges. One of these is environmental noise. Due to the mobile nature of such systems, the noise sources can be highly time-varying and potentially unknown. This raises the requirement for noise robustness in the absence of information about the noise. This paper describes a method that combines multicondition model training and missing-feature theory to model noise with unknown temporal-spectral characteristics. Multicondition training is conducted using simulated noisy data with limited noise variation, providing a “coarse” compensation for the noise, and missing-feature theory is applied to refine the compensation by ignoring noise variation outside the given training conditions, thereby reducing the training and testing mismatch. This paper is focused on several issues relating to the implementation of the new model for real-world applications. These include the generation of multicondition training data to model noisy speech, the combination of different training data to optimize the recognition performance, and the reduction of the model's complexity. The new algorithm was tested using two databases with simulated and realistic noisy speech data. The first database is a redevelopment of the TIMIT database by rerecording the data in the presence of various noise types, used to test the model for speaker identification with a focus on the varieties of noise.
The second database is a handheld-device database collected in realistic noisy conditions, used to further validate the model for real-world speaker verification. The new model is compared to baseline systems and is found to achieve lower error rates.

277 citations


Journal ArticleDOI
TL;DR: It is shown experimentally that the proposed nonlinear image deformation models perform very well for four different handwritten digit recognition tasks and for the classification of medical images, thus showing high generalization capacity.
Abstract: We present the application of different nonlinear image deformation models to the task of image recognition. The deformation models are especially suited for local changes as they often occur in the presence of image object variability. We show that, among the discussed models, there is one approach that combines simplicity of implementation, low computational complexity, and highly competitive performance across various real-world image recognition tasks. We show experimentally that the model performs very well for four different handwritten digit recognition tasks and for the classification of medical images, thus showing high generalization capacity. In particular, an error rate of 0.54 percent on the MNIST benchmark is achieved, as well as the lowest reported error rate, specifically 12.6 percent, in the medical image categorization task of the 2005 international ImageCLEF evaluation.

257 citations


Journal ArticleDOI
TL;DR: This paper explains the state of affairs of the alignment task and presents steps towards measuring alignment quality in a way which is predictive of statistical machine translation performance.
Abstract: Automatic word alignment plays a critical role in statistical machine translation. Unfortunately, the relationship between alignment quality and statistical machine translation performance has not been well understood. In the recent literature, the alignment task has frequently been decoupled from the translation task, and assumptions have been made about measuring alignment quality for machine translation which, it turns out, are not justified. In particular, none of the tens of papers published over the last five years has shown that significant decreases in alignment error rate (AER) result in significant increases in translation performance. This paper explains this state of affairs and presents steps towards measuring alignment quality in a way which is predictive of statistical machine translation performance.
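The AER the paper criticizes is standardly defined over a gold alignment split into sure links S and possible links P (with S ⊆ P) and a hypothesized alignment A, as AER = 1 − (|A∩S| + |A∩P|) / (|A| + |S|). A minimal sketch with invented links:

```python
# Alignment error rate over sets of (source_pos, target_pos) links.
def aer(A, S, P):
    A, S, P = set(A), set(S), set(P)
    return 1 - (len(A & S) + len(A & P)) / (len(A) + len(S))

S = {(1, 1), (2, 3)}          # sure gold links (invented)
P = S | {(3, 2)}              # possible links always include the sure ones
A = {(1, 1), (2, 3), (3, 4)}  # hypothesized alignment: one spurious link
print(aer(A, S, P))           # → 0.2
```

Because hypothesized links are only required to fall inside the loose possible set P, systems can trade recall on sure links against precision in ways that lower AER without helping translation, which is one root of the disconnect the paper documents.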

234 citations


Journal ArticleDOI
TL;DR: This paper describes a method based on regularized likelihood that makes use of the feature set given by the perceptron algorithm, and initialization with the perceptron's weights; this method gives an additional 0.5% reduction in word error rate (WER) over training with the perceptron alone.

215 citations


Proceedings ArticleDOI
12 Nov 2007
TL;DR: An approach is developed that automatically recognizes acute pain in adult patients with rotator cuff injury, video-recorded while a physiotherapist manipulated their affected and unaffected shoulders.
Abstract: Pain is typically assessed by patient self-report. Self-reported pain, however, is difficult to interpret and may be impaired or not even possible, as in young children or the severely ill. Behavioral scientists have identified reliable and valid facial indicators of pain. Until now they required manual measurement by highly skilled observers. We developed an approach that automatically recognizes acute pain. Adult patients with rotator cuff injury were video-recorded while a physiotherapist manipulated their affected and unaffected shoulder. Skilled observers rated pain expression from the video on a 5-point Likert-type scale. From these ratings, sequences were categorized as no-pain (rating of 0), pain (rating of 3, 4, or 5), and indeterminate (rating of 1 or 2). We explored machine learning approaches for pain-no pain classification. Active Appearance Models (AAM) were used to decouple shape and appearance parameters from the digitized face images. Support vector machines (SVM) were used with several representations from the AAM. Using a leave-one-out procedure, we achieved an equal error rate of 19% (hit rate = 81%) using canonical appearance and shape features. These findings suggest the feasibility of automatic pain detection from video.

206 citations


Proceedings ArticleDOI
15 Apr 2007
TL;DR: The gammatone features presented here lead to competitive results on the EPPS English task, and considerable improvements were obtained by subsequent combination with a number of standard acoustic features, i.e. MFCC, PLP, MF-PLP, and VTLN plus voicedness.
Abstract: In this work, an acoustic feature set based on a gammatone filterbank is introduced for large vocabulary speech recognition. The gammatone features presented here lead to competitive results on the EPPS English task, and considerable improvements were obtained by subsequent combination with a number of standard acoustic features, i.e. MFCC, PLP, MF-PLP, and VTLN plus voicedness. Best results were obtained when combining gammatone features with all other features using weighted ROVER, resulting in a relative improvement of about 12% in word error rate compared to the best single feature system. We also found that ROVER gives better results for feature combination than both log-linear model combination and LDA.
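For intuition on the ROVER combination step, here is a heavily simplified stand-in (real ROVER first aligns the system outputs into a word transition network with NULL arcs and can weight votes by confidence; this invented example assumes the outputs are already aligned position by position):

```python
from collections import Counter

# Three already-aligned system outputs (invented); vote per position.
aligned = [
    ["the", "features", "work",   "well"],
    ["the", "features", "worked", "well"],
    ["a",   "features", "work",   "well"],
]
combined = [Counter(col).most_common(1)[0][0] for col in zip(*aligned)]
print(combined)   # → ['the', 'features', 'work', 'well']
```

Even this crude majority vote shows why combining systems with complementary errors can beat the best single system: each position only needs a majority of systems to be right.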

Journal ArticleDOI
TL;DR: The template matching system reaches a performance somewhat worse than the best published HMM results for the Resource Management benchmark, but thanks to complementarity of errors between the HMM and DTW systems, the combination of both leads to a decrease in word error rate.
Abstract: Despite their known weaknesses, hidden Markov models (HMMs) have been the dominant technique for acoustic modeling in speech recognition for over two decades. Still, the advances in the HMM framework have not solved its key problems: it discards information about time dependencies and is prone to overgeneralization. In this paper, we attempt to overcome these problems by relying on straightforward template matching. The basis for the recognizer is the well-known DTW algorithm. However, classical DTW continuous speech recognition results in an explosion of the search space. The traditional top-down search is therefore complemented with a data-driven selection of candidates for DTW alignment. We also extend the DTW framework with a flexible subword unit mechanism and a class sensitive distance measure-two components suggested by state-of-the-art HMM systems. The added flexibility of the unit selection in the template-based framework leads to new approaches to speaker and environment adaptation. The template matching system reaches a performance somewhat worse than the best published HMM results for the Resource Management benchmark, but thanks to the complementarity of errors between the HMM and DTW systems, the combination of both leads to a 17% decrease in word error rate compared to the HMM results.

Journal ArticleDOI
TL;DR: This paper deals with eigenchannel adaptation in more detail and includes its theoretical background and implementation issues, undermining a common myth that the more boxes in the scheme, the better the system.
Abstract: In this paper, several feature extraction and channel compensation techniques found in state-of-the-art speaker verification systems are analyzed and discussed. For the NIST SRE 2006 submission, cepstral mean subtraction, feature warping, RelAtive SpecTrAl (RASTA) filtering, heteroscedastic linear discriminant analysis (HLDA), feature mapping, and eigenchannel adaptation were incrementally added to minimize the system's error rate. This paper deals with eigenchannel adaptation in more detail and includes its theoretical background and implementation issues. The key part of the paper is, however, the post-evaluation analysis, undermining a common myth that “the more boxes in the scheme, the better the system.” All results are presented on NIST Speaker Recognition Evaluation (SRE) 2005 and 2006 data.

Proceedings ArticleDOI
15 Apr 2007
TL;DR: The AMI transcription system for speech in meetings developed in collaboration by five research groups includes generic techniques such as discriminative and speaker adaptive training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, maximum likelihood linear regression, and phone posterior based features, as well as techniques specifically designed for meeting data.
Abstract: This paper describes the AMI transcription system for speech in meetings developed in collaboration by five research groups. The system includes generic techniques such as discriminative and speaker adaptive training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, maximum likelihood linear regression, and phone posterior based features, as well as techniques specifically designed for meeting data. These include segmentation and cross-talk suppression, beam-forming, domain adaptation, Web-data collection, and channel adaptive training. The system was improved by more than 20% relative in word error rate compared to our previous system and was used in the NIST RT'06 evaluations where it was found to yield competitive performance.
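The word error rate in which these improvements are reported is the word-level Levenshtein distance between hypothesis and reference, normalized by reference length; a minimal implementation (example sentences invented):

```python
# WER = (substitutions + deletions + insertions) / reference word count,
# computed by dynamic-programming edit distance over words.
def wer(reference, hypothesis):
    r, h = reference.split(), hypothesis.split()
    # d[i][j]: edit distance between first i reference and j hypothesis words
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                       # delete all i reference words
    for j in range(len(h) + 1):
        d[0][j] = j                       # insert all j hypothesis words
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = d[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(r)][len(h)] / len(r)

# One substitution + one deletion against a 5-word reference:
print(wer("the meeting starts at noon", "the meeting starts soon"))  # → 0.4
```

Note that WER can exceed 1.0 when the hypothesis contains many insertions, which is why relative improvements are the usual way to compare systems.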

Journal ArticleDOI
TL;DR: The methodology is illustrated by comparing different pooling algorithms for the detection of individuals recently infected with HIV in North Carolina and Malawi.
Abstract: We derive and compare the operating characteristics of hierarchical and square array-based testing algorithms for case identification in the presence of testing error. The operating characteristics investigated include efficiency (i.e., expected number of tests per specimen) and error rates (i.e., sensitivity, specificity, positive and negative predictive values, per-family error rate, and per-comparison error rate). The methodology is illustrated by comparing different pooling algorithms for the detection of individuals recently infected with HIV in North Carolina and Malawi.
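For intuition on the efficiency measure (simplified relative to the paper: testing error is ignored here, and only two-stage hierarchical, i.e. Dorfman, pooling is shown), each specimen incurs 1/k of a pooled test plus an individual retest whenever its pool of size k contains at least one positive:

```python
# E[tests per specimen] for two-stage Dorfman pooling, perfect tests assumed:
#   1/k  (its share of the pool test)
# + 1 - (1 - p)^k  (probability the pool is positive, forcing a retest)
def expected_tests_per_specimen(p, k):
    return 1.0 / k + (1.0 - (1.0 - p) ** k)

p = 0.01  # invented prevalence of 1%
best_k = min(range(2, 51), key=lambda k: expected_tests_per_specimen(p, k))
print(best_k)                                       # → 11
print(expected_tests_per_specimen(p, best_k) < 1)   # pooling beats testing all
```

At 1% prevalence the optimal pool size is 11 and roughly 0.2 tests per specimen suffice; the paper's contribution is extending this kind of analysis to imperfect tests and array-based designs, where sensitivity and specificity enter the operating characteristics.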

01 Jan 2007
TL;DR: A probabilistic model based on context-dependent phonetic rewrite rules is developed to derive a list of possible pronunciations for all words or sequences of words, and the variants are weighted with observation probabilities to reduce the confusion of this expanded dictionary.
Abstract: To provide rapid access to meetings between human beings, transcription, tracking, retrieval and summarization of on-going human-to-human conversation have to be achieved. In DARPA- and DoD-sponsored work (projects GENOA and CLARITY) we aim to develop strategies to transcribe human discourse and provide rapid access to the structure and content of this human exchange. The system consists of four major components: 1) the speech transcription engine, based on the JANUS recognition toolkit; 2) the summarizer, a statistical tool that attempts to find salient and novel turns in the exchange; 3) the discourse component, which attempts to identify the speech acts; and 4) the non-verbal structure, including speaker types and non-verbal visual cues. The meeting browser also attempts to identify the speech acts found in the turns of the meeting, and to track topics. The browser is implemented in Java and also includes video capture of the individuals in the meeting. It attempts to identify the speakers, and their focus of attention, from acoustic and visual cues.
1. THE MEETING RECOGNITION ENGINE
The speech recognition component of the meeting browser is based on the JANUS Switchboard recognizer trained for the 1997 NIST Hub-5E evaluation [3]. The gender-independent, vocal-tract-length-normalized, large-vocabulary recognizer features dynamic, speaking-mode-adaptive acoustic and pronunciation models [2] which allow for robust recognition of conversational speech as observed in human-to-human dialogs.
1.1 Speaking Mode Dependent Pronunciation Modeling
In spontaneous conversational human-to-human speech as observed in meetings there is a large amount of variability due to accents, speaking styles and speaking rates (also known as the speaking mode [6]). Because current recognition systems usually use only a relatively small number of pronunciation variants for the words in their dictionaries, the amount of variability that can be modeled is limited.
Increasing the number of variants per dictionary entry may seem to be the obvious solution, but doing so actually results in an increase in error rate. This is explained by the greater confusion between the dictionary entries, particularly for short, reduced words. We developed a probabilistic model based on context-dependent phonetic rewrite rules to derive a list of possible pronunciations for all words or sequences of words [2][4]. In order to reduce the confusion of this expanded dictionary, each variant of a word is annotated with an observation probability. To this aim we automatically retranscribe the corpus based on all allowable variants using flexible utterance transcription graphs (Flexible Transcription Alignment (FTA) [5]) and speaker-adapted models. The alignments are then used to train a model of how likely each form of variation (i.e., rule) is, and how likely a variant is to be observed in a given context (acoustic, word, speaking mode or dialogue). For decoding, the probability of encountering pronunciation variants is then defined to be a function of the speaking style (phonetic context, linguistic context, speaking rate and duration). The probability function is learned through decision trees from rule-generated pronunciation variants as observed on the Switchboard corpus [2].
1.2 Experimental Setup
To date, we have experimented with three different meeting environments and tasks to assess the performance in terms of word accuracy and summarization quality: i) Switchboard human-to-human telephone conversations, ii) research group meetings recorded in the Interactive Systems labs, and iii) simulated crisis management meetings (3 participants), which also include video capture of the individuals. We report results from speech recognition experiments in the first two conditions.
1) Human-to-Human Telephone
The test set to evaluate the use of the flexible transcription alignment approach consisted of the Switchboard and CallHome partitions of the 1996 NIST Hub-5e evaluation set. All test runs were carried out using a Switchboard recognizer trained with the JANUS Recognition Toolkit (JRTk) [4]. The preprocessing begins by extracting MFCC-based feature vectors every 10 ms; a truncated LDA transformation is applied to a concatenation of the MFCCs and their first- and second-order derivatives. Vocal tract length normalization and cepstral mean subtraction are computed to reduce speaker and channel differences. The rule-based expanded dictionary used in these tests included 1.78 pronunciation variants per word, compared to 1.13 found in the baseform dictionary (PronLex). The first list of results in Table 1 is based on a recognizer whose polyphonic decision trees were still trained on Viterbi alignments based on the unexpanded dictionary. We compare a baseline system trained on the base dictionary with an FTA-trained system using the expanded dictionary, tested in two different ways: with the base dictionary and with the expanded one. It turns out that FTA training reduces the word error rate significantly, which means that we improved the quality of the transcriptions through FTA and pronunciation modeling. Due to the added confusion of the expanded dictionary, the test with the large dictionary without any weighting of the variants yields slightly worse results than testing with the baseline dictionary.

Condition                              SWB WER   CH WER
Baseline                               32.2%     43.7%
FTA training, test w/ base dict        30.7%     41.9%
FTA training, test w/ expanded dict    31.1%     42.5%

Table 1: Recognition results using flexible transcription alignment training and label boosting.
The test using the expanded dictionary was done without weighting the variants. Adding vowel-stress-related questions to the phonetic clustering procedure and regrowing the polyphonic decision tree based on FTA labels improved the performance by 2.6% absolute on SWB and 2.2% absolute on CallHome. Table 2 shows results for mode-dependent pronunciation weighting. We gain an additional ~2% absolute by weighting the pronunciations based on mode-related features.

Condition            SWB WER   CH WER
Unweighted           28.7%     38.6%
Weighted p(r|w)      27.1%     36.7%
Weighted p(r|w,m)    26.7%     36.1%

Table 2: Results using different pronunciation variant weighting schemes.

2) Research Group Meetings
In a second experiment we used data recorded during internal group meetings at our lab. We placed lapel microphones on three out of ten participants, and recorded the signals on those three channels. Each meeting was approximately one hour in length, for a total of three hours of speech on which to adapt and test. Since we have no additional training data collected in this particular environment, the following unsupervised adaptation techniques were used to adapt a read-speech, clean-environment Wall Street Journal dictation recognizer to the meeting conditions: 1. MLLR-based adaptation: In our system, we employed a regression tree, constructed using an acoustic similarity criterion for the definition of regression classes. The tree is pruned as necessary to ensure sufficient adaptation data on each leaf. For each leaf node we calculate a linear transformation that maximizes the likelihood of the adaptation data. The number of transformations is determined automatically. 2. Iterative batch-mode unsupervised adaptation: The quality of adaptation depends directly on the quality of the hypotheses on which the alignments are based. We iterate the adaptation procedure, improving both the acoustic models and the hypotheses they produce.
Significant gains were observed during the two iterations, after which performance converges. 3. Adaptation with confidence measures: Confidence measures were used to automatically select the best candidates for adaptation. We used the stability of a hypothesis in a lattice as an indicator of confidence. If, in rescoring the lattice with a variety of language model weights and insertion penalties, a word appears in every possible top-1 hypothesis, acoustic stability is indicated. Such acoustic stability often identifies a good candidate for adaptation. Using only these words in the adaptation procedure produces 1-2% gains in word accuracy over blind adaptation [9]. The baseline performance of the JRTk-based WSJ recognizer on the Hub4-Nov94 test set is about 7% WER. These preliminary experiments suggest that due to the effects of spontaneous human-to-human speech, significant differences in recording conditions, significant crosstalk on the recorded channels, significantly different microphone characteristics, and inappropriate language models, the error rate on meetings is in the range of 40-50% WER.

Speaker   Iteration 0   Iteration 1   Iteration 2   Adaptation Gain
maxl      51.7          45.3          45.2          12%
fdmg      48.4          43.8          44.9          9%
flsl      63.8          59.5          59.6          7%
Total     54.8          49.6          49.9

Table 3: Error rates (% WER) for three different speakers in a research group meeting using JRTk trained over WSJ dictation data.

Journal ArticleDOI
TL;DR: This article introduces and evaluates several different word-level confidence measures for machine translation, which provide a method for labeling each word in an automatically generated translation as correct or incorrect.
Abstract: This article introduces and evaluates several different word-level confidence measures for machine translation. These measures provide a method for labeling each word in an automatically generated translation as correct or incorrect. All approaches to confidence estimation presented here are based on word posterior probabilities. Different concepts of word posterior probabilities as well as different ways of calculating them will be introduced and compared. They can be divided into two categories: System-based methods that explore knowledge provided by the translation system that generated the translations, and direct methods that are independent of the translation system. The system-based techniques make use of system output, such as word graphs or N-best lists. The word posterior probability is determined by summing the probabilities of the sentences in the translation hypothesis space that contains the target word. The direct confidence measures take other knowledge sources, such as word or phrase lexica, into account. They can be applied to output from nonstatistical machine translation systems as well. Experimental assessment of the different confidence measures on various translation tasks and in several language pairs will be presented. Moreover, the application of confidence measures for rescoring of translation hypotheses will be investigated.
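A minimal sketch of the system-based measure described above (hypotheses and scores invented; real systems normalize over word graphs and condition on the target position rather than mere occurrence): the posterior of a word is the normalized sum of the probabilities of the N-best hypotheses containing it.

```python
import math

# Invented 3-best list of (translation hypothesis, log-probability).
nbest = [
    ("the house is small", -1.0),
    ("the house is little", -1.5),
    ("a house is small", -2.5),
]

def word_posterior(word, nbest):
    # normalized mass of hypotheses that contain the word
    total = sum(math.exp(lp) for _, lp in nbest)
    hit = sum(math.exp(lp) for hyp, lp in nbest if word in hyp.split())
    return hit / total

print(word_posterior("house", nbest))   # → 1.0 (appears in every hypothesis)
print(word_posterior("small", nbest) > word_posterior("little", nbest))
```

A word shared by many high-probability hypotheses earns a high posterior and would be labeled correct; a word appearing only in low-probability hypotheses would be flagged.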

Patent
30 Nov 2007
TL;DR: In this article, the start time and finish time of each phoneme unit in a phoneme sequence are added to the phoneme sequences to increase the accuracy of speech recognition, and a new word is additionally registered in a speech recognition dictionary by utilizing a correction result.
Abstract: An unknown word is additionally registered in a speech recognition dictionary by utilizing a correction result, and a new pronunciation of the word that has been registered in a speech recognition dictionary is additionally registered in the speech recognition dictionary, thereby increasing the accuracy of speech recognition. The start time and finish time of each phoneme unit in speech data corresponding to each phoneme included in a phoneme sequence acquired by a phoneme sequence converting section 13 are added to the phoneme sequence. A phoneme sequence extracting section 15 extracts from the phoneme sequence a phoneme sequence portion composed of phonemes existing in a segment corresponding to the period from the start time to the finish time of the word segment of the word corrected by a word correcting section 9 and the extracted phoneme sequence portion is determined as the pronunciation of the corrected word. An additional registration section 17 combines the corrected word with the pronunciation determined by a pronunciation determining section 16 and additionally registers the combination as new word pronunciation data in the speech recognition dictionary 5 if it is determined that a word obtained after correction has not been registered in the speech recognition dictionary 5. The additional registration section 17 additionally registers the pronunciation determined by the pronunciation determining section 16 as another pronunciation of the corrected word if it is determined that the corrected word has been registered.

Book
01 Jan 2007
TL;DR: This book introduces and motivates the problem of list decoding, and discusses the central algorithmic results of the subject, culminating with the recent results on achieving "list decoding capacity."
Abstract: Error-correcting codes are used to cope with the corruption of data by noise during communication or storage. A code uses an encoding procedure that judiciously introduces redundancy into the data to produce an associated codeword. The redundancy built into the codewords enables one to decode the original data even from a somewhat distorted version of the codeword. The central trade-off in coding theory is the one between the data rate (amount of non-redundant information per bit of codeword) and the error rate (the fraction of symbols that could be corrupted while still enabling data recovery). The traditional decoding algorithms did as badly at correcting any error pattern as they would do for the worst possible error pattern. This severely limited the maximum fraction of errors those algorithms could tolerate. In turn, this was the source of a big hiatus between the error-correction performance known for probabilistic noise models (pioneered by Shannon) and what was thought to be the limit for the more powerful, worst-case noise models (suggested by Hamming). In the last decade or so, there has been much algorithmic progress in coding theory that has bridged this gap (and in fact nearly eliminated it for codes over large alphabets). These developments rely on an error-recovery model called "list decoding," wherein for the pathological error patterns, the decoder is permitted to output a small list of candidates that will include the original message. This book introduces and motivates the problem of list decoding, and discusses the central algorithmic results of the subject, culminating with the recent results on achieving "list decoding capacity."

Journal ArticleDOI
TL;DR: It has been found that the recognition result achieved by the integrated system is more reliable than that achieved by either method alone.

Patent
21 Mar 2007
TL;DR: In this article, a method for efficient use of resources of a speech recognition system includes determining a recognition rate, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, and determining an accuracy range of the recognition rate.
Abstract: A method for efficient use of resources of a speech recognition system includes determining a recognition rate, corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, and determining an accuracy range of the recognition rate. The method may further include adjusting adaptation of a model for the word or various models for the various words, based on a comparison of at least one value in the accuracy range with a recognition rate threshold. An apparatus for efficient use of resources of a speech recognition system includes a processor adapted to determine a recognition rate corresponding to either recognition of instances of a word or recognition of instances of various words among a set of words, and an accuracy range of the recognition rate. The apparatus may further include a controller adapted to adjust adaptation of a model for the word or various models for the various words, based on a comparison of at least one value in the accuracy range with a recognition rate threshold.
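A rough illustration of the idea (the patent does not specify how the accuracy range is computed; a normal-approximation confidence interval and the threshold value are assumptions here):

```python
import math

def recognition_rate_range(correct, total, z=1.96):
    """Recognition rate plus a normal-approximation confidence interval.
    (One possible choice of 'accuracy range'; the patent leaves it open.)"""
    rate = correct / total
    half = z * math.sqrt(rate * (1 - rate) / total)
    return rate, max(0.0, rate - half), min(1.0, rate + half)

def should_adapt(correct, total, threshold=0.95):
    """Keep adapting the word's model only while even the upper bound of
    the range is below the threshold, i.e. it confidently underperforms."""
    _, _, upper = recognition_rate_range(correct, total)
    return upper < threshold

print(should_adapt(80, 100))  # True: even the optimistic bound is below 0.95
print(should_adapt(99, 100))  # False: the range reaches the threshold
```

Comparing a bound of the range, rather than the point estimate, against the threshold is what lets the system stop spending adaptation resources once the evidence is inconclusive.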

Patent
Tang Xiliu1, Ge Xianping1
10 Oct 2007
TL;DR: In this paper, a word corpus is identified and a word probability value is associated with each word in the word corpus, and candidate segmentations of the sentence are determined based on the word corpus and the associated probability value for each word.
Abstract: A word corpus is identified and a word probability value is associated with each word in the word corpus. A sentence is identified, candidate segmentations of the sentence are determined based on the word corpus, and the associated probability value for each word in the word corpus is iteratively adjusted based on the probability values associated with the words and the candidate segmentations.
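The segmentation side of this scheme can be sketched as a Viterbi-style search over word probabilities (the abstract does not give the exact update rule for the iterative adjustment, so only the candidate-selection step is shown; the corpus and probabilities are made up):

```python
import math

def best_segmentation(sentence, probs, max_len=8):
    """Dynamic programming: best[i] holds the max log-probability
    segmentation of sentence[:i] under the current word probabilities."""
    n = len(sentence)
    best = [(0.0, [])] + [(-math.inf, None)] * n
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = sentence[j:i]
            if word in probs and best[j][0] > -math.inf:
                score = best[j][0] + math.log(probs[word])
                if score > best[i][0]:
                    best[i] = (score, best[j][1] + [word])
    return best[n][1]

probs = {"new": 0.3, "york": 0.2, "newyork": 0.4, "times": 0.1}
print(best_segmentation("newyorktimes", probs))  # ['newyork', 'times']
```

Re-estimating each word's probability from how often it appears in the chosen segmentations, then re-segmenting, gives the iterative adjustment loop the abstract describes.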

Journal ArticleDOI
TL;DR: It is shown that both the feature level fusion with the CMAES optimization algorithms and decision level fusion using a Bayesian network as a classifier improved system classification performance and can also be applied to other sensor fusion applications.
Abstract: The Cyranose 320 electronic nose (Enose) and zNose™ are two instruments used to detect volatile profiles. In this research, feature level and decision level multisensor data fusion models, combined with covariance matrix adaptation evolutionary strategy (CMAES), were developed to fuse the Enose and zNose data to improve detection and classification performance for damaged apples compared with using the individual instruments alone. Principal component analysis (PCA) was used for feature extraction and probabilistic neural networks (PNN) were developed as the classifier. Three feature-based fusion schemes were compared. Dynamic selective fusion achieved an average 1.8% and a best 0% classification error rate in a total of 30 independent runs. The static selective fusion approach resulted in a 6.1% classification error rate, which was not as good as using individual sensors (4.2% for the Enose and 2.6% for the zNose) if only selected features were applied. Simply adding the Enose and zNose features without selection (non-selective fusion) worsened the classification performance with a 32.5% classification error rate. This indicated that the feature selection using the CMAES is an indispensable process in multisensor data fusion, especially if multiple sources of sensors contain much irrelevant or redundant information. At the decision level, Bayesian network fusion achieved better performance than two individual sensors, with 11% error rate versus 13% error rate for the Enose and 20% error rate for the zNose. It is shown that both the feature level fusion with the CMAES optimization algorithms and decision level fusion using a Bayesian network as a classifier improved system classification performance. This methodology can also be applied to other sensor fusion applications.
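As a minimal illustration of decision-level fusion (a naive-Bayes combination of two sensors' class posteriors stands in here for the paper's full Bayesian network; the class names and numbers are invented):

```python
def fuse_decisions(posteriors_a, posteriors_b, priors):
    """Combine per-class posteriors from two sensors assumed conditionally
    independent given the class, then renormalize."""
    scores = {c: posteriors_a[c] * posteriors_b[c] / priors[c] for c in priors}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

enose = {"good": 0.6, "damaged": 0.4}   # Enose classifier output
znose = {"good": 0.3, "damaged": 0.7}   # zNose classifier output
priors = {"good": 0.5, "damaged": 0.5}
fused = fuse_decisions(enose, znose, priors)
print(max(fused, key=fused.get))  # 'damaged': the confident sensor dominates
```

Even this toy combiner shows why decision-level fusion can beat either sensor alone: a weakly confident vote is overridden by a strongly confident one.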

Journal ArticleDOI
TL;DR: The effects of parameter settings in linguistic profiling, a technique in which large numbers of counts of linguistic features are used as a text profile which can then be compared to average profiles for groups of texts, are explored.
Abstract: This article explores the effects of parameter settings in linguistic profiling, a technique in which large numbers of counts of linguistic features are used as a text profile which can then be compared to average profiles for groups of texts. Although the technique proves to be quite effective for authorship verification, with the best overall parameter settings yielding an equal error rate of 3% on a test corpus of student essays, the optimal parameters vary greatly depending on author and evaluation criterion.
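The core comparison can be sketched in a few lines (the feature set and distance function are illustrative choices, not the article's tuned parameters): a text's feature-count vector is measured against the mean profile of a group of texts.

```python
def profile(counts_list):
    """Average profile: per-feature mean over a group of texts."""
    n = len(counts_list)
    keys = {k for c in counts_list for k in c}
    return {k: sum(c.get(k, 0) for c in counts_list) / n for k in keys}

def distance(text_counts, avg):
    """Simple L1 deviation of one text's counts from the average profile."""
    keys = set(text_counts) | set(avg)
    return sum(abs(text_counts.get(k, 0) - avg.get(k, 0)) for k in keys)

group = [{"the": 5, "comma": 2}, {"the": 7, "comma": 4}]
avg = profile(group)  # {'the': 6.0, 'comma': 3.0}
print(distance({"the": 6, "comma": 3}, avg))  # 0.0 -> matches the group
print(distance({"the": 1, "comma": 9}, avg))  # 11.0 -> likely another author
```

Verification then reduces to thresholding this distance; the equal error rate is the point where false accepts and false rejects balance.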

Proceedings ArticleDOI
23 Jun 2007
TL;DR: A novel method for obtaining more details about actual translation errors in the generated output by introducing the decomposition of word error rate (WER) and position-independent word error rate (PER) over different part-of-speech (POS) classes is proposed.
Abstract: Evaluation and error analysis of machine translation output are important but difficult tasks. In this work, we propose a novel method for obtaining more details about actual translation errors in the generated output by introducing the decomposition of word error rate (WER) and position-independent word error rate (PER) over different part-of-speech (POS) classes. Furthermore, we investigate two possible aspects of the use of these decompositions for automatic error analysis: estimation of inflectional errors and distribution of missing words over POS classes. The obtained results are shown to correspond to the results of a human error analysis. The results obtained on the European Parliament Plenary Session corpus in Spanish and English give a better overview of the nature of translation errors as well as ideas of where to put efforts for possible improvements of the translation system.
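A PER-style decomposition by POS can be illustrated compactly (the paper's actual procedure is more involved; this sketch just counts, per POS class, reference words with no matching hypothesis word, ignoring word order as PER does):

```python
from collections import Counter

def per_by_pos(ref, hyp):
    """ref: list of (word, pos) pairs; hyp: list of hypothesis words.
    Position-independent: each hypothesis word cancels at most one
    reference word, regardless of where it occurs."""
    remaining = Counter(hyp)
    errors, totals = Counter(), Counter()
    for word, pos in ref:
        totals[pos] += 1
        if remaining[word] > 0:
            remaining[word] -= 1
        else:
            errors[pos] += 1
    return {pos: errors[pos] / totals[pos] for pos in totals}

ref = [("the", "DET"), ("cats", "NOUN"), ("sleep", "VERB")]
hyp = ["the", "cat", "sleep"]
print(per_by_pos(ref, hyp))  # {'DET': 0.0, 'NOUN': 1.0, 'VERB': 0.0}
```

Here the inflectional mismatch "cats"/"cat" surfaces entirely in the NOUN class, which is exactly the kind of signal the decomposition is meant to expose.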

Proceedings ArticleDOI
Xiaodong He1
23 Jun 2007
TL;DR: A Bayesian Learning based method to train word dependent transition models for HMM based word alignment gives consistent and significant alignment error rate (AER) reduction and machine translation results show that word alignment can be used in a phrase-based machine translation system.
Abstract: In this paper, we present a Bayesian Learning based method to train word dependent transition models for HMM based word alignment. We present word alignment results on the Canadian Hansards corpus as compared to the conventional HMM and IBM model 4. We show that this method gives consistent and significant alignment error rate (AER) reduction. We also conducted machine translation (MT) experiments on the Europarl corpus. MT results show that word alignment based on this method can be used in a phrase-based machine translation system to yield up to 1% absolute improvement in BLEU score, compared to a conventional HMM, and 0.8% compared to an IBM model 4 based word alignment.
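For reference, the AER metric the paper reports is the standard one computed over sure (S) and possible (P) gold links; the example link sets below are invented:

```python
def aer(alignment, sure, possible):
    """AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|), links as (src, tgt)
    pairs. Sure links are assumed to be a subset of possible links."""
    a, s, p = set(alignment), set(sure), set(possible)
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))

sure = {(0, 0), (1, 2)}
possible = sure | {(2, 1)}
print(aer({(0, 0), (1, 2), (2, 1)}, sure, possible))  # 0.0 -> perfect
print(aer({(0, 0), (2, 2)}, sure, possible))          # 0.5
```

Lower is better; an alignment recovering all sure links and proposing only possible ones scores 0.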

Journal ArticleDOI
TL;DR: This paper presents a novel cascade ensemble classifier system for the recognition of handwritten digits with encouraging results: a high reliability of 99.96% with minimal rejection, or a 99.59% correct recognition rate without rejection in the last cascade layer.

Proceedings ArticleDOI
22 Aug 2007
TL;DR: This paper introduces the widely used Hamming distance in channel coding to LBP so as to decrease the error rate caused by these noise disturbances of face images under expression and illumination condition changes.
Abstract: In this paper, we present a new LBP-based face recognition method with a Hamming distance constraint. The traditional LBP operator uses uniform patterns to describe local features and assigns all remaining nonuniform patterns to a single additional class; for images under expression and illumination condition changes, this causes inaccuracy and instability. By assuming that the illumination, pose or expression changes of a face image are a kind of "noise", we introduce the Hamming distance widely used in channel coding into LBP so as to decrease the error rate caused by these noise disturbances. Experimental results on FRGC show that our method noticeably improves recognition performance over traditional LBP-based face recognition methods when face images are captured under uncontrolled circumstances.
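A minimal sketch of the idea (the paper's exact mapping rule is assumed): instead of lumping every nonuniform 8-bit LBP code into one catch-all class, map each code to the nearest uniform pattern under Hamming distance.

```python
def transitions(code):
    """Number of 0/1 transitions in the circular 8-bit pattern."""
    bits = [(code >> i) & 1 for i in range(8)]
    return sum(bits[i] != bits[(i + 1) % 8] for i in range(8))

# Uniform LBP patterns: at most 2 circular transitions (58 of 256 codes).
UNIFORM = [c for c in range(256) if transitions(c) <= 2]

def hamming(a, b):
    """Bit-level Hamming distance between two 8-bit codes."""
    return bin(a ^ b).count("1")

def nearest_uniform(code):
    """Assign a (possibly noisy) code to its closest uniform pattern."""
    return min(UNIFORM, key=lambda u: hamming(code, u))

print(len(UNIFORM))                            # 58 uniform patterns
print(nearest_uniform(0b01010101) in UNIFORM)  # True
```

Treating a nonuniform code as a uniform one corrupted by bit "noise" is what makes the channel-coding analogy, and the Hamming distance, natural here.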

Journal ArticleDOI
TL;DR: This paper describes a new methodology and associated theoretical analysis for rapid and accurate extraction of level sets of a multivariate function from noisy data through a novel error metric sensitive to both the error in the location of the level-set estimate and the deviation of the function from the critical level.
Abstract: This paper describes a new methodology and associated theoretical analysis for rapid and accurate extraction of level sets of a multivariate function from noisy data. The identification of the boundaries of such sets is an important theoretical problem with applications for digital elevation maps, medical imaging, and pattern recognition. This problem is significantly different from classical segmentation because level-set boundaries may not correspond to singularities or edges in the underlying function; as a result, segmentation methods which rely upon detecting boundaries would be potentially ineffective in this regime. This issue is addressed in this paper through a novel error metric sensitive to both the error in the location of the level-set estimate and the deviation of the function from the critical level. Hoeffding's inequality is used to derive a novel regularization term that is distinctly different from regularization methods used in conventional image denoising settings. Building upon this foundation, it is possible to derive error performance bounds for the proposed estimator and demonstrate that it exhibits near minimax optimal error decay rates for large classes of level-set problems. The proposed method automatically adapts to the spatially varying regularity of both the boundary of the level set and the underlying function.

Proceedings Article
01 Jun 2007
TL;DR: It is shown how a BF containing n-grams can enable us to use much larger corpora and higher-order models complementing a conventional n- gram LM within an SMT system.
Abstract: A Bloom filter (BF) is a randomised data structure for set membership queries. Its space requirements are significantly below lossless information-theoretic lower bounds but it produces false positives with some quantifiable probability. Here we explore the use of BFs for language modelling in statistical machine translation. We show how a BF containing n-grams can enable us to use much larger corpora and higher-order models complementing a conventional n-gram LM within an SMT system. We also consider (i) how to include approximate frequency information efficiently within a BF and (ii) how to reduce the error rate of these models by first checking for lower-order sub-sequences in candidate n-grams. Our solutions in both cases retain the one-sided error guarantees of the BF while taking advantage of the Zipf-like distribution of word frequencies to reduce the space requirements.
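A sketch of the two ingredients (the hashing scheme and parameters are illustrative, not the paper's): a Bloom filter for n-gram membership, plus the sub-sequence check that accepts an n-gram only if its lower-order prefixes are also present, which cuts false positives while preserving one-sided error.

```python
import hashlib

class BloomFilter:
    def __init__(self, m=1 << 16, k=4):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _hashes(self, item):
        # k independent hash positions derived from salted SHA-1 digests.
        for i in range(self.k):
            h = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for h in self._hashes(item):
            self.bits[h] = 1

    def __contains__(self, item):
        return all(self.bits[h] for h in self._hashes(item))

bf = BloomFilter()
for ngram in ["the cat", "the cat sat"]:
    bf.add(ngram)

def contains_checked(bf, ngram):
    """Accept an n-gram only if all its lower-order prefixes (from the
    bigram up) also pass the filter."""
    words = ngram.split()
    prefixes = [" ".join(words[:i]) for i in range(2, len(words) + 1)]
    return all(p in bf for p in prefixes)

print(contains_checked(bf, "the cat sat"))  # True: both sub-sequences stored
```

Because a false positive now requires every prefix to be a false positive too, the error rate of the checked query drops roughly multiplicatively.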