
Showing papers on "Word error rate published in 2002"


Journal ArticleDOI
TL;DR: The calculation of the q-value, the pFDR analogue of the p-value, is discussed; it eliminates the need to set the error rate beforehand, as is traditionally done, and the proposed approach can yield an increase of over eight times in power compared with the Benjamini–Hochberg FDR method.
Abstract: Multiple-hypothesis testing involves guarding against much more complicated errors than single-hypothesis testing. Whereas we typically control the type I error rate for a single-hypothesis test, a compound error rate is controlled for multiple-hypothesis tests. For example, controlling the false discovery rate (FDR) traditionally involves intricate sequential p-value rejection methods based on the observed data. Whereas a sequential p-value method fixes the error rate and estimates its corresponding rejection region, we propose the opposite approach: we fix the rejection region and then estimate its corresponding error rate. This new approach offers increased applicability, accuracy and power. We apply the methodology to both the positive false discovery rate (pFDR) and the FDR, and provide evidence for its benefits. It is shown that the pFDR is probably the quantity of interest over the FDR. Also discussed is the calculation of the q-value, the pFDR analogue of the p-value, which eliminates the need to set the error rate beforehand as is traditionally done. Some simple numerical examples are presented that show that this new approach can yield an increase of over eight times in power compared with the Benjamini–Hochberg FDR method.

5,414 citations
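
As a rough illustration of the fixed-rejection-region idea above, the Python sketch below estimates q-values from a vector of p-values in the style of Storey's pFDR approach; the tuning parameter lam and all names are illustrative choices, not the paper's notation.

```python
import numpy as np

def qvalues(p, lam=0.5):
    """Estimate q-values from p-values (a Storey-style sketch).

    pi0 estimates the proportion of true nulls from the roughly
    uniform right tail of the p-value distribution; q[i] approximates
    the minimum pFDR over rejection regions that contain p[i].
    """
    p = np.asarray(p, dtype=float)
    m = p.size
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))
    order = np.argsort(p)
    q = np.empty(m)
    running = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank, i in zip(range(m, 0, -1), order[::-1]):
        running = min(running, pi0 * p[i] * m / rank)
        q[i] = running
    return q

rng = np.random.default_rng(0)
# 900 null p-values plus 100 small "signal" p-values.
p = np.concatenate([rng.uniform(size=900), rng.uniform(0, 0.001, size=100)])
print("discoveries at q < 0.05:", int((qvalues(p) < 0.05).sum()))
```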


Proceedings ArticleDOI
13 May 2002
TL;DR: The Minimum Phone Error (MPE) and Minimum Word Error (MWE) criteria are smoothed approximations to the phone and word error rate, respectively; the paper also introduces I-smoothing, a novel technique for smoothing discriminative training criteria using statistics for maximum likelihood estimation (MLE).
Abstract: In this paper we introduce the Minimum Phone Error (MPE) and Minimum Word Error (MWE) criteria for the discriminative training of HMM systems. The MPE/MWE criteria are smoothed approximations to the phone and word error rate, respectively. We also discuss I-smoothing, a novel technique for smoothing discriminative training criteria using statistics for maximum likelihood estimation (MLE). Experiments have been performed on the Switchboard/CallHome corpora of telephone conversations with up to 265 hours of training data. It is shown that for the maximum mutual information estimation (MMIE) criterion, I-smoothing reduces the word error rate (WER) by 0.4% absolute over the MMIE baseline. The combination of MPE and I-smoothing gives an improvement of 1% over MMIE and a total reduction in WER of 4.8% absolute over the original MLE system.

758 citations
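
Since MPE and MWE are smoothed approximations to the error rate that such systems ultimately report, a minimal reference sketch of the word error rate itself may help; it is the standard Levenshtein alignment over word tokens, with illustrative names.

```python
def word_error_rate(ref, hyp):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed by dynamic-programming edit distance over word tokens."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            d[i][j] = min(d[i - 1][j - 1] + (r[i - 1] != h[j - 1]),  # sub/match
                          d[i - 1][j] + 1,                           # deletion
                          d[i][j - 1] + 1)                           # insertion
    return d[len(r)][len(h)] / max(len(r), 1)

# One insertion against a three-word reference: WER = 1/3.
print(word_error_rate("the cat sat", "the cat sat down"))
```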


Book ChapterDOI
TL;DR: A translation model that is based on bilingual phrases to explicitly model the local context is presented and it is shown that this model performs better than the single-word based model.
Abstract: This paper is based on the work carried out in the framework of the VERBMOBIL project, a limited-domain speech translation task (German-English). In the final evaluation, the statistical approach was found to perform best among five competing approaches. In this paper, we further investigate the statistical translation models used. A shortcoming of the single-word based model is that it does not take contextual information into account for the translation decisions. We present a translation model that is based on bilingual phrases to explicitly model the local context, and we show that this model performs better than the single-word based model. We compare monotone and non-monotone search for this model and investigate the benefit of using the sum criterion instead of the maximum approximation.

408 citations


Journal ArticleDOI
TL;DR: It is shown that HMMs trained with MMIE benefit as much as MLE-trained HMMs from model adaptation using maximum likelihood linear regression (MLLR), which has allowed the straightforward integration of MMIE-trained HMMs into complex multi-pass systems for the transcription of conversational telephone speech.

360 citations


Proceedings ArticleDOI
Robert Baumann1
08 Dec 2002
TL;DR: Memory and logic scaling trends are discussed along with a method for determining logic SER; since the soft error rate (SER) of advanced CMOS devices exceeds all other reliability mechanisms combined, SER in logic may limit future product reliability.
Abstract: The soft error rate (SER) of advanced CMOS devices is higher than all other reliability mechanisms combined. Memories can be protected with error correction circuitry but SER in logic may limit future product reliability. Memory and logic scaling trends are discussed along with a method for determining logic SER.

336 citations


Proceedings Article
01 Jan 2002
TL;DR: A new true error bound for classifiers with a margin which is simpler, functionally tighter, and more data-dependent than all previous bounds is shown.
Abstract: We show two related things: (1) Given a classifier which consists of a weighted sum of features with a large margin, we can construct a stochastic classifier with negligibly larger training error rate. The stochastic classifier has a future error rate bound that depends on the margin distribution and is independent of the size of the base hypothesis class. (2) A new true error bound for classifiers with a margin which is simpler, functionally tighter, and more data-dependent than all previous bounds.

209 citations


Journal ArticleDOI
TL;DR: This work shows how to compute, for any fixed bound n and any input word W, a deterministic Levenshtein automaton of degree n for W in time linear in the length of W, which leads to a very fast method for correcting corrupted input words of unrestricted text using large electronic dictionaries.
Abstract: The Levenshtein distance between two words is the minimal number of insertions, deletions or substitutions that are needed to transform one word into the other. Levenshtein automata of degree n for a word W are defined as finite state automata that recognize the set of all words V where the Levenshtein distance between V and W does not exceed n. We show how to compute, for any fixed bound n and any input word W, a deterministic Levenshtein automaton of degree n for W in time linear in the length of W. Given an electronic dictionary that is implemented in the form of a trie or a finite state automaton, the Levenshtein automaton for W can be used to control search in the lexicon in such a way that exactly the lexical words V are generated where the Levenshtein distance between V and W does not exceed the given bound. This leads to a very fast method for correcting corrupted input words of unrestricted text using large electronic dictionaries. We then introduce a second method that avoids the explicit computation of Levenshtein automata and leads to even greater efficiency. Evaluation results are given that also address variants of both methods based on modified Levenshtein distances, where further primitive edit operations (transpositions, merges and splits) are used.

192 citations
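
To make the accepted language concrete without reproducing the paper's automaton construction, the following sketch answers the same membership question ("is the Levenshtein distance between V and W at most n?") with a banded dynamic program; this is a simplification for illustration, not the authors' method.

```python
def within_distance(w, v, n):
    """True iff the Levenshtein distance between w and v is at most n.

    Only a band of width 2n+1 around the diagonal of the usual DP
    table is filled in, mirroring the bounded lookahead that makes a
    degree-n Levenshtein automaton possible.
    """
    if abs(len(w) - len(v)) > n:
        return False
    INF = n + 1                      # any value > n behaves like "infinity"
    prev = [j if j <= n else INF for j in range(len(v) + 1)]
    for i in range(1, len(w) + 1):
        cur = [INF] * (len(v) + 1)
        if i <= n:
            cur[0] = i
        for j in range(max(1, i - n), min(len(v), i + n) + 1):
            cur[j] = min(prev[j - 1] + (w[i - 1] != v[j - 1]),  # sub/match
                         prev[j] + 1,                            # deletion
                         cur[j - 1] + 1)                         # insertion
        prev = cur
    return prev[len(v)] <= n

print(within_distance("word", "wird", 1))   # True: one substitution
print(within_distance("word", "worlds", 1)) # False: two edits needed
```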


Journal ArticleDOI
TL;DR: It is shown that articulatory feature (AF) systems are capable of achieving a superior performance at high noise levels and that the combination of acoustic and AFs consistently leads to a significant reduction of word error rate across all acoustic conditions.

180 citations


Journal ArticleDOI
Dietrich Klakow1, Jochen Peters1
TL;DR: This paper first presents theoretical arguments for a close relationship between perplexity and word error rate; the notion of the uncertainty of a measurement is then introduced and used to test the hypothesis that word error rate and perplexity are related by a power law.

180 citations
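
The hypothesized power law WER ≈ a · PPL^b is linear in log-log space, so it can be checked with ordinary least squares; the (perplexity, WER) pairs below are invented purely for illustration.

```python
import numpy as np

# Hypothetical (perplexity, WER in %) pairs from a family of language models.
ppl = np.array([80.0, 100.0, 130.0, 170.0, 220.0])
wer = np.array([28.1, 30.0, 32.4, 35.1, 38.0])

# A power law WER = a * PPL^b becomes a line in log-log coordinates:
# log WER = b * log PPL + log a.
b, log_a = np.polyfit(np.log(ppl), np.log(wer), deg=1)
print(f"WER ~ {np.exp(log_a):.2f} * PPL^{b:.3f}")
```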


Journal ArticleDOI
TL;DR: This paper presents an approach to recognition confidence scoring and a set of techniques for integrating confidence scores into the understanding and dialogue components of a speech understanding system and demonstrates a relative reduction in concept error rate.

179 citations


Proceedings ArticleDOI
06 Jul 2002
TL;DR: A method is described for constructing a word graph that represents alternative hypotheses efficiently, so that these hypotheses can be rescored using a refined language or translation model.
Abstract: Statistical machine translation systems usually compute the single sentence that has the highest probability according to the models that are trained on data. We describe a method for constructing a word graph to represent alternative hypotheses in an efficient way. The advantage is that these hypotheses can be rescored using a refined language or translation model. Results are presented on the German-English Verbmobil corpus.

Proceedings ArticleDOI
13 May 2002
TL;DR: The connectionist language model is being evaluated on the DARPA HUB5 conversational telephone speech recognition task and preliminary results show consistent improvements in both perplexity and word error rate.
Abstract: This paper describes ongoing work on a new approach to language modeling for large vocabulary continuous speech recognition. Almost all state-of-the-art systems use statistical n-gram language models estimated on text corpora. One principal problem with such language models is that many of the n-grams are never observed even in very large training corpora, so it is common to back off to a lower-order model. In this paper we propose to address this problem by carrying out the estimation task in a continuous space, enabling a smooth interpolation of the probabilities. A neural network is used to learn the projection of the words onto a continuous space and to estimate the n-gram probabilities. The connectionist language model is being evaluated on the DARPA HUB5 conversational telephone speech recognition task, and preliminary results show consistent improvements in both perplexity and word error rate.
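
A minimal numpy sketch of the continuous-space idea, assuming a two-word history, a toy vocabulary and random untrained weights: each history word is projected into a continuous space and a softmax over the output layer yields a smooth distribution over the vocabulary. All dimensions and names are placeholders, not the paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d, h, context = 1000, 32, 64, 2      # vocabulary, embedding, hidden, n-1

# Untrained parameters: word projection, hidden layer, output layer.
E = rng.normal(scale=0.1, size=(V, d))  # maps word ids into continuous space
W1 = rng.normal(scale=0.1, size=(context * d, h))
W2 = rng.normal(scale=0.1, size=(h, V))

def ngram_probs(history):
    """P(w | history) for every word w, from the continuous-space model."""
    x = np.concatenate([E[w] for w in history])  # project and concatenate
    z = np.tanh(x @ W1) @ W2
    z -= z.max()                                 # numerically stable softmax
    p = np.exp(z)
    return p / p.sum()

p = ngram_probs([12, 407])
print(p.shape, p.sum())  # (1000,) 1.0 -- a smooth distribution over the vocab
```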

Journal ArticleDOI
TL;DR: It is shown that the overall error rate can be expressed by a single integral whose integrand is nonnegative and exponentially decaying, and bit-error rates (BERs) are obtained to any desired accuracy with minimal computational complexity.
Abstract: A binary direct-sequence spread-spectrum multiple-access system with random sequences in flat Rayleigh fading is considered. A new explicit closed-form expression is obtained for the characteristic function of the multiple-access interference signals. It is shown that the overall error rate can be expressed by a single integral whose integrand is nonnegative and exponentially decaying. Bit-error rates (BERs) are obtained with this expression to any desired accuracy with minimal computational complexity. The dependence of the system BER on the number of transitions in the target user signature chip sequence is explicitly derived. The results are used to examine definitively the validity of three Gaussian approximations and to compare the performances of synchronous systems to asynchronous systems.

Proceedings ArticleDOI
13 May 2002
TL;DR: It is shown that one of the recent techniques used for speaker recognition, feature warping can be formulated within the framework of Gaussianization, and around 20% relative improvement in both equal error rate (EER) and minimum detection cost function (DCF) is obtained on NIST 2001 cellular phone data evaluation.
Abstract: In this paper, a novel approach for robust speaker verification, namely short-time Gaussianization, is proposed. Short-time Gaussianization is initiated by a global linear transformation of the features, followed by a short-time windowed cumulative distribution function (CDF) matching. First, the linear transformation in the feature space leads to local independence or decorrelation. Then the CDF matching is applied to segments of speech localized in time and warps each feature so that its CDF matches a normal distribution. It is shown that one of the recent techniques used for speaker recognition, feature warping [1], can be formulated within the framework of Gaussianization. Compared to the baseline system with cepstral mean subtraction (CMS), around 20% relative improvement in both equal error rate (EER) and minimum detection cost function (DCF) is obtained on the NIST 2001 cellular phone data evaluation.
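
A sketch of the short-time CDF-matching step under stated assumptions: within a sliding window, each sample of a one-dimensional feature track is replaced by the standard-normal quantile of its empirical mid-rank, which is essentially feature warping. The window length and names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def short_time_gaussianize(x, win=301):
    """Warp each sample of a 1-D feature track so that, within a local
    window, its empirical CDF matches a standard normal distribution."""
    half = win // 2
    y = np.empty(len(x))
    for t in range(len(x)):
        seg = x[max(0, t - half):t + half + 1]
        rank = np.sum(seg < x[t]) + 0.5      # mid-rank of x[t] in its window
        y[t] = norm.ppf(rank / len(seg))     # map empirical CDF to N(0, 1)
    return y

x = np.random.default_rng(0).gamma(2.0, size=2000)      # skewed "feature" track
print(round(float(short_time_gaussianize(x).std()), 2))  # close to 1.0
```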

01 Sep 2002
TL;DR: This paper proposes the use of the syllable as the acoustic unit for spoken name recognition and shows how pronunciation variation modeling by syllables can help in improving recognition performance and reducing the system perplexity.
Abstract: Recognition of spoken names is a challenging task for speech recognition systems because of the large variations in speaking styles, linguistic origins and pronunciations found in names. The complex linguistic nature of names makes it difficult to automatically generate pronunciation variations. For many applications the list of names tends to be of the order of several hundred thousand entries, making spoken name recognition a high-perplexity task. Using multiple pronunciations to account for the variations in names further increases the perplexity of the recognition system substantially. In this paper we propose the use of the syllable as the acoustic unit for spoken name recognition and show how pronunciation variation modeling with syllables can help in improving recognition performance and reducing the system perplexity. We present results comparing systems that use context-dependent phones with syllable-based systems, and demonstrate that a significant increase in recognition accuracy and speed can be achieved by using the syllable as the acoustic unit for spoken name recognition. With a finite state grammar network for spoken name recognition, the observed recognition error rate for the syllable-based system was 40% less than for the phone-based system. For syllable-bigram based information retrieval schemes the observed recognition error rate was about 60% less than for the corresponding phone system.

Journal ArticleDOI
TL;DR: The proposed algorithm, called structural MAPLR (SMAPLR), has been evaluated on the Spoke3 1993 test set of the WSJ task and it is shown that SMAPLR reduces the risk of overtraining and exploits the adaptation data much more efficiently than MLLR, leading to a significant reduction of the word error rate for any amount of adaptation data.

Proceedings ArticleDOI
23 Oct 2002
TL;DR: This system employs noninvasive, inexpensive and fully automated measures of vocal tract characteristics and excitation information, achieving an 8% detection error rate improvement over the best performing classifier that uses carefully measured features prevalent in the state of the art in pathological speech analysis.
Abstract: This study focuses on a robust, rapid and accurate system for automatic detection of normal and pathological speech. The system employs noninvasive, inexpensive and fully automated measures of vocal tract characteristics and excitation information. Mel-frequency filterbank cepstral coefficients and measures of pitch dynamics were modeled by Gaussian mixtures in a hidden Markov model (HMM) classifier. The method was evaluated using sustained phoneme /a/ data obtained from over 700 subjects, both normal and with various pathologies, from the Massachusetts Eye and Ear Infirmary (MEEI) database. The method attained a 99.44% correct classification rate for discriminating normal from pathological speech on sustained /a/. This represents an 8% detection error rate improvement over the best performing classifier that uses carefully measured features prevalent in the state of the art in pathological speech analysis.

Journal ArticleDOI
TL;DR: A version of the HTK Broadcast News transcription system was developed that ran in less than 10 times real time with only a small increase in error rate; it has been used for the bulk transcription of broadcast news for information retrieval from audio data.

Journal Article
TL;DR: In this article, automatic recognition of German continuous sign language is presented; the statistical approach is based on the Bayes decision rule for minimum error rate, and the use of subunits rather than whole-sign models can reduce the amount of necessary training material.
Abstract: This paper is concerned with the automatic recognition of German continuous sign language. For maximum user-friendliness, only a single color video camera is used for image recording. The statistical approach is based on the Bayes decision rule for minimum error rate. Following speech recognition system designs, which are in general based on subunits, the idea of an automatic sign language recognition system using subunits rather than models for whole signs is outlined here. The advantage of such a system will be a future reduction of the necessary training material. Furthermore, a simplified enlargement of the existing vocabulary is expected. Since it is difficult to define subunits for sign language, this approach employs totally self-organized subunits called fenones. The k-means algorithm is used for the definition of such fenones. The software prototype of the system is currently being evaluated in experiments.

Journal ArticleDOI
TL;DR: This work uses writer-independent writing style models (lexemes) to identify the styles present in a particular writer's training data and updates these models using the writer's data, demonstrating the feasibility of this approach on both isolated handwritten character recognition and unconstrained word recognition tasks.
Abstract: Writer adaptation is the process of converting a writer-independent handwriting recognition system into a writer-dependent system. It can greatly increase recognition accuracy, given adequate writer models. The limited amount of data a writer provides during training constrains the models' complexity. We show how appropriate use of writer-independent models is important for the adaptation. Our approach uses writer-independent writing style models (lexemes) to identify the styles present in a particular writer's training data. These models are then updated using the writer's data. Lexemes in the writer's data for which an inadequate number of training examples is available are replaced with the writer-independent models. We demonstrate the feasibility of this approach on both isolated handwritten character recognition and unconstrained word recognition tasks. Our results show an average reduction in error rate of 16.3 percent for lowercase characters as compared with representing each of the writer's character classes with a single model. In addition, an average error rate reduction of 9.2 percent is shown on handwritten words using only a small amount of data for adaptation.

Proceedings ArticleDOI
13 May 2002
TL;DR: An approach to close the gap between text-dependent and text-independent speaker verification performance is presented; results on the 2001 NIST extended data task show this approach can produce an equal error rate below 1%.
Abstract: In this paper we present an approach to close the gap between text-dependent and text-independent speaker verification performance. Text-constrained GMM-UBM systems are created using word segmentations produced by an LVCSR system on conversational speech, allowing the system to focus on speaker differences over a constrained set of acoustic units. Results on the 2001 NIST extended data task show this approach can be used to produce an equal error rate of less than 1%.
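
For orientation, here is a much-simplified GMM-UBM verification sketch using scikit-learn as a stand-in: a background model is trained on a speaker-independent pool and the score is the average per-frame log-likelihood ratio. Real systems MAP-adapt the UBM to the speaker, and the paper additionally constrains scoring to LVCSR word segments; the retraining shortcut and all data below are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-ins for cepstral features: background pool, enrollment, test segment.
background = rng.normal(size=(5000, 13))
enroll = rng.normal(loc=0.3, size=(500, 13))
test = rng.normal(loc=0.3, size=(200, 13))

# Universal background model trained on the speaker-independent pool.
ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(background)

# Simplified speaker model, retrained starting from the UBM's means.
# (Real GMM-UBM systems MAP-adapt the UBM instead of retraining.)
spk = GaussianMixture(n_components=8, covariance_type="diag",
                      means_init=ubm.means_, random_state=0).fit(enroll)

# Verification score: average per-frame log-likelihood ratio.
llr = spk.score(test) - ubm.score(test)
print(f"LLR = {llr:.3f} (accept if above a threshold tuned on held-out data)")
```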

01 Jan 2002
TL;DR: Bionic Pattern Recognition uses neural networks that act by covering the high-dimensional geometrical distribution of the sample set in the feature space, exploiting the continuity in feature space of samples of any given class.
Abstract: A new model of pattern recognition principles, based on "matter cognition" instead of the "matter classification" of traditional statistical pattern recognition, is proposed. This new model is closer to the way humans recognize objects than traditional statistical pattern recognition, which takes "optimal separation" as its main principle. The new model is therefore called Bionic Pattern Recognition. Its mathematical basis is the topological analysis of the sample set in the high-dimensional feature space, so it is also called Topological Pattern Recognition. The basic idea of this model rests on the fact that samples of any given class are continuous in the feature space. We performed experiments on the recognition of omnidirectionally oriented rigid objects on the same level, using Bionic Pattern Recognition with neural networks that cover the high-dimensional geometrical distribution of the sample set in the feature space. Many animal and vehicle models (even with rather similar shapes) were recognized omnidirectionally thousands of times. Over a total of 8,800 tests, the correct recognition rate was 99.75%; the error rate and the rejection rate were 0% and 0.25%, respectively.

Journal ArticleDOI
TL;DR: The minimum attainable error rate of a device discriminating between three particularly chosen pure qubit states is calculated with the help of the proposed algorithm.
Abstract: We propose a numerical algorithm for finding optimal measurements for quantum-state discrimination. The theory of semidefinite programming provides a simple check of the optimality of the numerically obtained results. With the help of our algorithm we calculate the minimum attainable error rate of a device discriminating between three particularly chosen pure qubit states.
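
The semidefinite program in question can be written down directly; the sketch below uses cvxpy with equal priors and three example "trine" pure qubit states (these specific states and priors are assumptions, not necessarily the paper's choice).

```python
import numpy as np
import cvxpy as cp

def rho(theta):
    """Density matrix of the pure qubit state cos(t)|0> + sin(t)|1>."""
    v = np.array([np.cos(theta), np.sin(theta)])
    return np.outer(v, v)

states = [rho(t) for t in (0.0, np.pi / 3, 2 * np.pi / 3)]  # "trine" states
priors = [1 / 3, 1 / 3, 1 / 3]

# POVM elements: Hermitian, positive semidefinite, summing to the identity.
M = [cp.Variable((2, 2), hermitian=True) for _ in states]
constraints = [m >> 0 for m in M] + [sum(M) == np.eye(2)]

# Maximize the success probability; the minimum error rate is 1 - p_success.
p_success = sum(p * cp.real(cp.trace(m @ r))
                for p, m, r in zip(priors, M, states))
problem = cp.Problem(cp.Maximize(p_success), constraints)
problem.solve()
print(f"minimum error rate = {1 - problem.value:.4f}")  # 1/3 for the trine
```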

Proceedings ArticleDOI
13 May 2002
TL;DR: Improvements to an innovative high-performance speaker recognition system are described, incorporating gender-dependent phone models, pre-processing the speech files to remove cross-talk, and developing more sophisticated fusion techniques for the multi-language likelihood scores.
Abstract: This paper describes improvements to an innovative high-performance speaker recognition system. Recent experiments showed that, with sufficient training data, phone strings from multiple languages are exceptional features for speaker recognition. The prototype phonetic speaker recognition system used phone sequences from six languages to produce an equal error rate of 11.5% on Switchboard-I audio files. The improved system described in this paper reduces the equal error rate to less than 4%. This is accomplished by incorporating gender-dependent phone models, pre-processing the speech files to remove cross-talk, and developing more sophisticated fusion techniques for the multi-language likelihood scores.

Proceedings ArticleDOI
10 Dec 2002
TL;DR: The idea of an automatic sign language recognition system using subunits rather than models for whole signs is outlined, which will be a future reduction of necessary training material and a simplified enlargement of the existing vocabulary.
Abstract: This paper deals with the automatic recognition of German signs. The statistical approach is based on the Bayes decision rule for minimum error rate. Following speech recognition system designs, which are in general based on phonemes, the idea of an automatic sign language recognition system using subunits rather than models for whole signs is outlined here. The advantage of such a system will be a future reduction of the necessary training material. Furthermore, a simplified enlargement of the existing vocabulary is expected, as new signs can be added to the vocabulary database without re-training the existing hidden Markov models (HMMs) for subunits. Since it is difficult to define subunits for sign language, this approach employs totally self-organized subunits. In initial experiments, a recognition accuracy of 92.5% was achieved for 100 previously trained signs. For 50 new signs, an accuracy of 81% was achieved without retraining of the subunit HMMs.

Proceedings ArticleDOI
Luhong Liang1, Xiaoxing Liu1, Yibao Zhao1, Xiaobo Pi1, Ara V. Nefian1 
07 Nov 2002
TL;DR: The speaker-independent audio-visual continuous speech recognition system presented relies on a robust set of visual features obtained from accurate detection and tracking of the mouth region, and integrates the audio and visual observations using a coupled hidden Markov model (CHMM).
Abstract: The increase in the number of multimedia applications that require robust speech recognition systems determined a large interest in the study of audio-visual speech recognition (AVSR) systems. The use of visual features in AVSR is justified by both the audio and visual modality of the speech generation and the need for features that are invariant to acoustic noise perturbation. The speaker independent audio-visual continuous speech recognition system presented relies on a robust set of visual features obtained from the accurate detection and tracking of the mouth region. Further, the visual and acoustic observation sequences are integrated using a coupled hidden Markov (CHMM) model. The statistical properties of the CHMM can model the audio and visual state asynchrony while preserving their natural correlation over time. The experimental results show that the current system tested on the XM2VTS database reduces by over 55% the error rate of the audio only speech recognition system at SNR of 0 dB.

Patent
13 May 2002
TL;DR: A linear transformation of parallel multiple input, multiple output (MIMO) encoded streams is described, together with space-time diversity and asymmetrical symbol mapping of parallel streams.
Abstract: A linear transformation of parallel multiple input, multiple output (MIMO) encoded streams; also, space-time diversity and asymmetrical symbol mapping of parallel streams. Separately or together, these improve error rate performance as well as system throughput. Preferred embodiments include CDMA wireless systems with multiple antennas.

Journal ArticleDOI
TL;DR: This study examines several key issues in system combination for the word sense disambiguation task, ranging from algorithmic structure to parameter estimation, and demonstrates that the combination system obtains a significantly lower error rate than other systems participating in the SENSEVAL2 exercise.
Abstract: Classifier combination is an effective and broadly useful method of improving system performance. This article investigates in depth a large number of both well-established and novel classifier combination approaches for the word sense disambiguation task, studied over a diverse classifier pool which includes feature-enhanced Naive Bayes, Cosine, Decision List, Transformation-based Learning and MMVC classifiers. Each classifier has access to the same rich feature space, comprised of distance weighted bag-of-lemmas, local ngram context and specific syntactic relations, such as Verb-Object and Noun-Modifier. This study examines several key issues in system combination for the word sense disambiguation task, ranging from algorithmic structure to parameter estimation. Experiments using the standard SENSEVAL2 lexical-sample data sets in four languages (English, Spanish, Swedish and Basque) demonstrate that the combination system obtains a significantly lower error rate when compared with other systems participating in the SENSEVAL2 exercise, yielding state-of-the-art performance on these data sets.
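
As a minimal illustration of one strategy from this family, the sketch below performs a weighted majority vote over per-classifier sense predictions; the classifier names, weights and senses are invented, and the article itself studies considerably richer combination schemes.

```python
from collections import Counter

def combine(predictions, weights=None):
    """Weighted majority vote over per-classifier sense predictions.

    predictions: {classifier_name: predicted_sense}
    weights:     {classifier_name: vote weight}, e.g. held-out accuracy.
    """
    weights = weights or {name: 1.0 for name in predictions}
    tally = Counter()
    for name, sense in predictions.items():
        tally[sense] += weights[name]
    return tally.most_common(1)[0][0]

votes = {"naive_bayes": "bank/river", "cosine": "bank/finance",
         "decision_list": "bank/river", "tbl": "bank/river"}
accs = {"naive_bayes": 0.71, "cosine": 0.65, "decision_list": 0.69, "tbl": 0.66}
print(combine(votes, accs))  # -> "bank/river"
```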

Journal ArticleDOI
TL;DR: In this paper, a new technique is presented for searching digital audio at the word/phrase level, which combines high speed and accuracy, supports open vocabulary, imposes low penalty for new words, permits phonetic and inexact spelling, enables user-determined depth of search, and is amenable to parallel execution for highly scalable deployment.
Abstract: A new technique is presented for searching digital audio at the word/phrase level. Unlike previous methods based upon Large Vocabulary Continuous Speech Recognition (LVCSR, with inherent problems of closed vocabulary and high word error rate), phonetic searching combines high speed and accuracy, supports open vocabulary, imposes low penalty for new words, permits phonetic and inexact spelling, enables user-determined depth of search, and is amenable to parallel execution for highly scalable deployment. A detailed comparison of accuracy between phonetic searching and one popular embodiment of LVCSR is presented along with other operating characteristics of the new technique. The current implementation for Digital Media Asset Management (DMAM) is described along with suggested applications in other domains.

Proceedings Article
01 Sep 2002
TL;DR: Stereo-based Piecewise Linear Compensation for Environments (SPLICE) is a general framework for removing distortions from noisy speech cepstra that contains a non-parametric model for cepstral corruption, which is learned from two channels of training data.
Abstract: Stereo-based Piecewise Linear Compensation for Environments (SPLICE) is a general framework for removing distortions from noisy speech cepstra. It contains a non-parametric model for cepstral corruption, which is learned from two channels of training data. We evaluate SPLICE on both the Aurora 2 and 3 tasks. These tasks consist of digit sequences in five European languages. Noise corruption is both synthetic (Aurora 2) and realistic (Aurora 3). For both the Aurora 2 and 3 tasks, we use the same training and testing procedure provided with the corpora. By holding the back-end constant, we ensure that any increase in word accuracy is due to our front-end processing techniques. In the Aurora 2 task, we achieve a 76.86% average decrease in word error rate with clean acoustic models, and an overall improvement of 62.63%. For the Aurora 3 task, we achieve a 75.06% average decrease in word error rate for the high-mismatch experiment, and an overall improvement of 47.19%.