Topic

Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have appeared within this topic, receiving 298,031 citations.

Papers

Proceedings Article

Style & Topic Language Model Adaptation Using HMM-LDA

Bo-June (Paul) Hsu¹, James Glass¹

Massachusetts Institute of Technology¹

22 Jul 2006

TL;DR: This work investigates the use of the Hidden Markov Model with Latent Dirichlet Allocation (HMM-LDA) to obtain syntactic state and semantic topic assignments to word instances in the training corpus and constructs style and topic models that better model the target document.

Abstract: Adapting language models across styles and topics, such as for lecture transcription, involves combining generic style models with topic-specific content relevant to the target document. In this work, we investigate the use of the Hidden Markov Model with Latent Dirichlet Allocation (HMM-LDA) to obtain syntactic state and semantic topic assignments to word instances in the training corpus. From these context-dependent labels, we construct style and topic models that better model the target document, and extend the traditional bag-of-words topic models to n-grams. Experiments with static model interpolation yielded a perplexity and relative word error rate (WER) reduction of 7.1% and 2.1%, respectively, over an adapted trigram baseline. Adaptive interpolation of mixture components further reduced perplexity by 9.5% and WER by a modest 0.3%.
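
The WER figures quoted in the abstract above rest on the standard definition: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. A minimal sketch of that computation follows. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: words are separated by whitespace; real evaluations usually
    normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences, not data from the paper.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.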

78 citations

Dissertation

Novel estimation methods for unsupervised discovery of latent structure in natural language text

Jason Eisner¹, Noah A. Smith¹

Johns Hopkins University¹

01 Jan 2007

TL;DR: The novel estimation methods presented are better suited to adaptation for real engineering tasks than the maximum likelihood baseline, and are shown to achieve significant improvements over maximum likelihood estimation and maximum a posteriori estimation, for a state-of-the-art probabilistic model used in dependency grammar induction.

Abstract: This thesis is about estimating probabilistic models to uncover useful hidden structure in data; specifically, we address the problem of discovering syntactic structure in natural language text. We present three new parameter estimation techniques that generalize the standard approach, maximum likelihood estimation, in different ways. Contrastive estimation maximizes the conditional probability of the observed data given a "neighborhood" of implicit negative examples. Skewed deterministic annealing locally maximizes likelihood using a cautious parameter search strategy that starts with an easier optimization problem than likelihood, and iteratively moves to harder problems, culminating in likelihood. Structural annealing is similar, but starts with a heavy bias toward simple syntactic structures and gradually relaxes the bias. Our estimation methods do not make use of annotated examples. We consider their performance in both an unsupervised model selection setting, where models trained under different initialization and regularization settings are compared by evaluating the training objective on a small set of unseen, unannotated development data, and supervised model selection, where the most accurate model on the development set (now with annotations) is selected. The latter is far superior, but surprisingly few annotated examples are required. The experimentation focuses on a single dependency grammar induction task, in depth. The aim is to give strong support for the usefulness of the new techniques in one scenario. It must be noted, however, that the task (as defined here and in prior work) is somewhat artificial, and improved performance on this particular task is not a direct contribution to the greater field of natural language processing. 
The real problem the task seeks to simulate (the induction of syntactic structure in natural language text) is certainly of interest to the community, but this thesis does not directly approach the problem of exploiting induced syntax in applications. We also do not attempt any realistic simulation of human language learning, as our newspaper text data do not resemble the data encountered by a child during language acquisition. Further, our iterative learning algorithms assume a fixed batch of data that can be repeatedly accessed, not a long stream of data observed over time in tandem with acquisition. (Of course, the cognitive criticisms apply to virtually all existing learning methods in natural language processing, not just the new ones presented here.) Nonetheless, the novel estimation methods presented are, we will argue, better suited to adaptation for real engineering tasks than the maximum likelihood baseline. Our new methods are shown to achieve significant improvements over maximum likelihood estimation and maximum a posteriori estimation, using the EM algorithm, for a state-of-the-art probabilistic model used in dependency grammar induction (Klein and Manning, 2004). The task is to induce dependency trees from part-of-speech tag sequences; we follow standard practice and train and test on sequences of ten tags or fewer. Our results are the best published to date for six languages, with supervised model selection: English (improvement from 41.6% directed attachment accuracy to 66.7%, a 43% relative error rate reduction), German (54.4% → 71.8%, a 38% error reduction), Bulgarian (45.6% → 58.3%, a 23% error reduction), Mandarin (50.0% → 58.0%, a 16% error reduction), Turkish (48.0% → 62.4%, a 28% error reduction, but only 2% error reduction from a left-branching baseline, which gives 61.8%), and Portuguese (42.5% → 71.8%, a 51% error reduction).
We also demonstrate the success of contrastive estimation at learning to disambiguate part-of-speech tags (from unannotated English text): 78.0% to 88.7% tagging accuracy on a known-dictionary task (a 49% relative error rate reduction), and 66.5% to 78.4% on a more difficult task with less dictionary knowledge (a 35% error rate reduction). The experiments presented in this thesis give one of the most thorough explorations to date of unsupervised parameter estimation for models of discrete structures. Two sides of the problem are considered in depth: the choice of objective function to be optimized during training, and the method of optimizing it. We find that both are important in unsupervised learning. Our best results on most of the six languages involve both improved objectives and improved search. The methods presented in this thesis were originally presented in Smith and Eisner (2004, 2005a,b, 2006). The thesis gives a more thorough exposition, relating the methods to other work, presents more experimental results and error analysis, and directly compares the methods to each other.
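
The relative error reductions quoted in the abstract above for the six languages can be recovered from the accuracy pairs by converting each accuracy to an error rate (100% minus accuracy) and dividing the drop in error by the baseline error. The sketch below assumes that reading. The helper name `relative_error_reduction` is hypothetical, and the accuracy pairs are the ones stated in the abstract.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline error removed, given accuracies in percent."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    if baseline_err <= 0:
        raise ValueError("baseline accuracy must be below 100%")
    return (baseline_err - new_err) / baseline_err


# Directed attachment accuracy (baseline, new) in percent, as quoted above.
ATTACHMENT_ACCURACY = {
    "English": (41.6, 66.7),
    "German": (54.4, 71.8),
    "Bulgarian": (45.6, 58.3),
    "Mandarin": (50.0, 58.0),
    "Turkish": (48.0, 62.4),
    "Portuguese": (42.5, 71.8),
}

if __name__ == "__main__":
    for language, (baseline, new) in ATTACHMENT_ACCURACY.items():
        reduction = 100.0 * relative_error_reduction(baseline, new)
        print(f"{language}: {reduction:.0f}% relative error reduction")
```

For English, for example, the error falls from 58.4% to 33.3%, and 25.1 / 58.4 is roughly 0.43, which matches the 43% stated in the abstract.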

78 citations

Proceedings Article

N-Gram Posterior Probabilities for Statistical Machine Translation

Richard Zens¹, Hermann Ney¹

RWTH Aachen University¹

08 Jun 2006

TL;DR: This work introduces n-gram posterior probabilities and a sentence length model based on posterior probabilities, and reports significant improvements on the Chinese-English NIST task.

Abstract: Word posterior probabilities are a common approach for confidence estimation in automatic speech recognition and machine translation. We generalize this idea by introducing n-gram posterior probabilities and show how these can be used to improve translation quality. Additionally, we introduce a sentence length model based on posterior probabilities. We show significant improvements on the Chinese-English NIST task. The absolute improvement in BLEU score is between 1.1% and 1.6%.

78 citations

Journal Article

Likelihood normalization for speaker verification using a phoneme- and speaker-independent model

Tomoko Matsui, Sadaoki Furui

01 Aug 1995-Speech Communication

TL;DR: Two methods for creating a phoneme- and speaker-independent model that greatly reduce the amount of calculation needed for similarity (or likelihood) normalization in speaker verification are proposed.

78 citations

Journal Article

Hierarchical fusion of multi-spectral face images for improved recognition performance

Richa Singh¹, Mayank Vatsa¹, Afzel Noore¹

West Virginia University¹

01 Apr 2008-Information Fusion

TL;DR: Experimental results show that the combination of visible light and short-wave IR spectrum face images yielded the best recognition performance with an equal error rate, and the proposed image-feature fusion algorithm also performed better than existing fusion algorithms.

78 citations

Network Information

Performance Metrics

Papers: 12,777

Citations: 335,740

No. of papers in the topic in previous years
Year	Papers
2023	271
2022	562
2021	640
2020	643
2019	633
2018	528
