SciSpace (formerly Typeset)

Showing papers on "Perplexity published in 2007"


Journal ArticleDOI
TL;DR: Current advances in automatic speech recognition (ASR) and spoken language systems are outlined, together with their deficiencies in dealing with the variation naturally present in speech.

507 citations


Journal ArticleDOI
TL;DR: The Author-Recipient-Topic (ART) model for social network analysis, which learns topic distributions from the direction-sensitive messages sent between entities, is presented; results provide evidence not only that clearly relevant topics are discovered, but also that the ART model better predicts people's roles and gives lower perplexity on previously unseen messages.
Abstract: Previous work in social network analysis (SNA) has modeled the existence of links from one entity to another, but not the attributes such as language content or topics on those links. We present the Author-Recipient-Topic (ART) model for social network analysis, which learns topic distributions based on the direction-sensitive messages sent between entities. The model builds on Latent Dirichlet Allocation (LDA) and the Author-Topic (AT) model, adding the key attribute that the distribution over topics is conditioned distinctly on both the sender and recipient, steering the discovery of topics according to the relationships between people. We give results on both the Enron email corpus and a researcher's email archive, providing evidence not only that clearly relevant topics are discovered, but also that the ART model better predicts people's roles and gives lower perplexity on previously unseen messages. We also present the Role-Author-Recipient-Topic (RART) model, an extension to ART that explicitly represents people's roles.

484 citations


Proceedings Article
03 Dec 2007
TL;DR: Using five real-world text corpora, it is shown that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning.
Abstract: We investigate the problem of learning a widely-used latent-variable model - the Latent Dirichlet Allocation (LDA) or "topic" model - using distributed computation, where each of P processors only sees 1/P of the total data set. We propose two distributed inference schemes that are motivated from different perspectives. The first scheme uses local Gibbs sampling on each processor with periodic updates—it is simple to implement and can be viewed as an approximation to a single processor implementation of Gibbs sampling. The second scheme relies on a hierarchical Bayesian extension of the standard LDA model to directly account for the fact that data are distributed across P processors—it has a theoretical guarantee of convergence but is more complex to implement than the approximate method. Using five real-world text corpora we show that distributed learning works very well for LDA models, i.e., perplexity and precision-recall scores for distributed learning are indistinguishable from those obtained with single-processor learning. Our extensive experimental results include large-scale distributed computation on 1000 virtual processors; and speedup experiments of learning topics in a 100-million word corpus using 16 processors.

264 citations
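
As a rough illustration of the first (approximate) scheme summarized above, the following minimal sketch runs collapsed Gibbs sampling locally on each processor's shard and merges the topic-word counts after every sweep. It is an illustrative simplification, not the authors' implementation: the corpus format (documents as lists of integer word ids, pre-split into shards), the hyperparameters, and the merge rule are all assumptions.

import numpy as np

def local_gibbs_sweep(docs, z, nw, nd, nwsum, alpha, beta, V, K, rng):
    """One sweep of collapsed Gibbs sampling over one processor's documents."""
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            # remove the current assignment from the (local copy of the) counts
            nw[w, k] -= 1; nd[d, k] -= 1; nwsum[k] -= 1
            # full conditional p(z = k | everything else)
            p = (nw[w] + beta) / (nwsum + V * beta) * (nd[d] + alpha)
            k = rng.choice(K, p=p / p.sum())
            z[d][i] = k
            nw[w, k] += 1; nd[d, k] += 1; nwsum[k] += 1

def distributed_lda(shards, V, K=10, alpha=0.1, beta=0.01, sweeps=50, seed=0):
    """shards: list of P shards, each a list of documents (lists of word ids < V)."""
    rng = np.random.default_rng(seed)
    global_nw = np.zeros((V, K))            # global topic-word counts
    states = []                             # per-processor assignments and doc-topic counts
    for docs in shards:
        z = [[int(rng.integers(K)) for _ in doc] for doc in docs]
        nd = np.zeros((len(docs), K))
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                global_nw[w, z[d][i]] += 1
                nd[d, z[d][i]] += 1
        states.append((docs, z, nd))
    for _ in range(sweeps):
        deltas = []
        for docs, z, nd in states:          # in a real system this loop runs in parallel
            nw = global_nw.copy()           # stale local copy of the global counts
            nwsum = nw.sum(axis=0)
            local_gibbs_sweep(docs, z, nw, nd, nwsum, alpha, beta, V, K, rng)
            deltas.append(nw - global_nw)   # what this shard changed
        global_nw += sum(deltas)            # periodic merge of the counts
    phi = (global_nw + beta) / (global_nw.sum(axis=0) + V * beta)
    return phi.T                            # K x V topic-word distributions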


Journal ArticleDOI
TL;DR: A new smoothing technique based on randomly grown decision trees (DTs) is developed and applied to language modeling; the resulting random forest (RF) language models are superior to the best known smoothing technique, interpolated Kneser–Ney smoothing, in reducing both perplexity and word error rate in large-vocabulary state-of-the-art speech recognition systems.

69 citations


Proceedings ArticleDOI
12 Aug 2007
TL;DR: A new probabilistic graphical model is proposed that employs non-homogeneous Poisson processes to model the generation of word counts and models the evolution of topics at various time-scales of resolution, allowing the user to zoom in and out of the time-scales.
Abstract: Modeling the evolution of topics with time is of great value in automatic summarization and analysis of large document collections. In this work, we propose a new probabilistic graphical model to address this issue. The new model, which we call the Multiscale Topic Tomography Model (MTTM), employs non-homogeneous Poisson processes to model the generation of word counts. The evolution of topics is modeled through a multi-scale analysis using Haar wavelets. One of the new features of the model is that it models the evolution of topics at various time-scales of resolution, allowing the user to zoom in and out of the time-scales. Our experiments on Science data using the new model uncover some interesting patterns in topics. The new model is also comparable to LDA in predicting unseen data, as demonstrated by our perplexity experiments.

62 citations


Proceedings ArticleDOI
27 Aug 2007
TL;DR: An unsupervised algorithm is presented for the discovery of words and word-like fragments from the speech signal, without a predefined lexicon or acoustic phone models, based on a combination of acoustic pattern discovery, clustering, and temporal sequence learning.
Abstract: We present an unsupervised algorithm for the discovery of words and word-like fragments from the speech signal, without using a predefined lexicon or acoustic phone models. The algorithm is based on a combination of acoustic pattern discovery, clustering, and temporal sequence learning. It exploits the acoustic similarity between multiple acoustic tokens of the same words or word-like fragments. In its current form, the algorithm is able to discover words in speech with low perplexity (connected digits). Although its performance still falls short of mainstream ASR approaches, the value of the algorithm is its potential to serve as a computational model in two research directions. First, the algorithm may lead to an approach for speech recognition that is fundamentally liberated from the modelling constraints in conventional ASR. Second, the proposed algorithm can be interpreted as a computational model of language acquisition that takes actual speech as input and is able to find words as 'emergent' properties from raw input.

54 citations


Journal ArticleDOI
Imed Zitouni
TL;DR: The results suggest that the largest gains in performance are obtained when the test set contains a large number of unseen events, and that the proposed backoff hierarchical class n-gram language models outperform backoff n-gram language models.

41 citations


Journal ArticleDOI
Angela Brew
TL;DR: In this article, the editorial team has been working to develop a strategic plan for this journal and this has raised some interesting issues. In the journal we aim to present the best available re...
Abstract: In recent months the editorial team has been working to develop a strategic plan for this journal and this has raised some interesting issues. In the journal we aim to present the best available re...

38 citations


Proceedings ArticleDOI
15 Apr 2007
TL;DR: This paper investigates the use of several language model adaptation techniques applied to the task of machine translation from Arabic broadcast speech and finds that unsupervised and discriminative approaches slightly outperform the traditional perplexity-based optimization technique.
Abstract: This paper investigates the use of several language model adaptation techniques applied to the task of machine translation from Arabic broadcast speech. Unsupervised and discriminative approaches slightly outperform the traditional perplexity-based optimization technique. Language model adaptation, when used for n-best rescoring, improves machine translation performance by 0.3-0.4 BLEU and reduces translation edit rate (TER) by 0.2-0.5% compared to an unadapted LM.

34 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: This paper presents People-LDA, a new graphical model that tightly couples images and captions through a modern face recognizer, and shows how topics can be refined to be more closely related to a single person rather than describing groups of people in a related area.
Abstract: Topic models have recently emerged as powerful tools for modeling topical trends in documents. Often the resulting topics are broad and generic, associating large groups of people and issues that are loosely related. In many cases, it may be desirable to influence the direction in which topic models develop. In this paper, we explore the idea of centering topics around people. In particular, given a large corpus of images featuring collections of people and associated captions, it seems natural to extract topics specifically focussed on each person. What words are most associated with George Bush? Which with Condoleezza Rice? Since people play such an important role in life, it is natural to anchor one topic to each person. In this paper, we present People-LDA, which uses the coherence of face images in news captions to guide the development of topics. In particular, we show how topics can be refined to be more closely related to a single person (like George Bush) rather than describing groups of people in a related area (like politics). To do this we introduce a new graphical model that tightly couples images and captions through a modern face recognizer. In addition to producing topics that are people-specific (using images as a guiding force), the model also performs excellent soft clustering of face images, using the language model to boost performance. We present a variety of experiments comparing our method to recent developments in topic modeling and joint image-language modeling, showing that our model has lower perplexity for face identification than competing models and produces more refined topics.

33 citations


Proceedings ArticleDOI
01 Dec 2007
TL;DR: Experimental results on NIST RT06s evaluation meeting data verify that HPYLM is a competitive and promising language modeling technique, which consistently performs better than interpolated Kneser-Ney and modified Kneser-Ney n-gram LMs in terms of both perplexity and word error rate.
Abstract: In this paper we investigate the application of a hierarchical Bayesian language model (LM) based on the Pitman-Yor process for automatic speech recognition (ASR) of multiparty meetings. The hierarchical Pitman-Yor language model (HPYLM) provides a Bayesian interpretation of LM smoothing. An approximation to the HPYLM recovers the exact formulation of the interpolated Kneser-Ney smoothing method in n-gram models. This paper focuses on the application and scalability of HPYLM on a practical large vocabulary ASR system. Experimental results on NIST RT06s evaluation meeting data verify that HPYLM is a competitive and promising language modeling technique, which consistently performs better than interpolated Kneser-Ney and modified Kneser-Ney n-gram LMs in terms of both perplexity and word error rate.
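
For reference, the interpolated Kneser-Ney estimate that the HPYLM approximation recovers has the familiar absolute-discounting form (notation assumed here: c(h,w) is the count of word w after history h, d the discount, N_{1+}(h\,\bullet) the number of distinct word types observed after h, and h' the shortened history):

P_{\mathrm{KN}}(w \mid h) = \frac{\max\bigl(c(h,w) - d,\ 0\bigr)}{c(h)} + \frac{d\, N_{1+}(h\,\bullet)}{c(h)}\, P_{\mathrm{KN}}(w \mid h')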

Proceedings Article
01 Jun 2007
TL;DR: A bLSA model is introduced which enables latent topic distributions to be efficiently transferred across languages by enforcing a one-to-one topic correspondence during training, and the approach consistently improved machine translation quality on both speech- and text-based adaptation.
Abstract: We propose a novel approach to crosslingual language model (LM) adaptation based on bilingual Latent Semantic Analysis (bLSA). A bLSA model is introduced which enables latent topic distributions to be efficiently transferred across languages by enforcing a one-to-one topic correspondence during training. Using the proposed bLSA framework, crosslingual LM adaptation can be performed by first inferring the topic posterior distribution of the source text and then applying the inferred distribution to the target-language N-gram LM via marginal adaptation. The proposed framework also enables rapid bootstrapping of LSA models for new languages based on a source LSA model from another language. On Chinese-to-English speech and text translation the proposed bLSA framework successfully reduced word perplexity of the English LM by over 27% for a unigram LM and up to 13.6% for a 4-gram LM. Furthermore, the proposed approach consistently improved machine translation quality on both speech- and text-based adaptation.
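
The marginal adaptation step mentioned above is commonly realized as unigram rescaling of the background n-gram LM; a plausible form (an assumption for illustration, not necessarily the paper's exact formulation) is

P_{\mathrm{adapt}}(w \mid h) = \frac{\alpha(w)}{Z(h)}\, P_{\mathrm{bg}}(w \mid h), \qquad \alpha(w) = \Bigl(\tfrac{P_{\mathrm{LSA}}(w)}{P_{\mathrm{bg}}(w)}\Bigr)^{\beta}, \qquad Z(h) = \sum_{v} \alpha(v)\, P_{\mathrm{bg}}(v \mid h),

where P_{\mathrm{LSA}} is the unigram distribution inferred from the topic posterior and \beta is a tuning exponent.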

Proceedings ArticleDOI
15 Apr 2007
TL;DR: A word topical mixture model (TMM) is proposed to explore the co-occurrence relationship between words, as well as the long-span latent topical information, for language model adaptation for Mandarin broadcast news recognition.
Abstract: This paper considers dynamic language model adaptation for Mandarin broadcast news recognition. A word topical mixture model (TMM) is proposed to explore the co-occurrence relationship between words, as well as the long-span latent topical information, for language model adaptation. The search history is modeled as a composite word TMM model for predicting the decoded word. The underlying characteristics and different kinds of model structures were extensively investigated, while the performance of word TMM was analyzed and verified by comparison with the conventional probabilistic latent semantic analysis-based language model (PLSALM) and trigger-based language model (TBLM) adaptation approaches. The large vocabulary continuous speech recognition (LVCSR) experiments were conducted on the Mandarin broadcast news collected in Taiwan. Very promising results in perplexity as well as character error rate reductions were initially obtained.

Journal ArticleDOI
TL;DR: A phonetic-feature-based prediction model is presented where phones are represented by a vector of symbolic features that can be on, off, unspecified or unused, and experiments show that feature-based models benefit from prosody cues, but not text, and that phone-based models do not benefit from any of the high-level cues explored here.

Proceedings ArticleDOI
19 Oct 2007
TL;DR: This short paper outlines the design of a recommendation process that is based on an implicit social network where the relevancy and meaning of information can be negotiated not only with the recommender system but also with other users.
Abstract: In this short paper, we describe our RSS recommender system, KeepUP. Too often recommender systems are seen as black box systems, resulting in general perplexity and dissatisfaction from users who are treated as passive, isolated consumers. Recent literature observes that recommendations rarely occur within such isolation and that there may be potential within more socially-orientated approaches. With KeepUP, we outline the design of a recommendation process that is based on an implicit social network where the relevancy and meaning of information can be negotiated not only with the recommender system but also with other users. Our overall goal is to support the formation and development of online communities of interest.

Proceedings ArticleDOI
Ruhi Sarikaya, Mohamed Afify, Yuqing Gao
15 Apr 2007
TL;DR: A new language modeling method is presented that takes advantage of Arabic morphology by combining morphological segments with the underlying lexical items and additional available information sources regarding morphological segments and lexical items within a single joint model.
Abstract: Language modeling for inflected languages such as Arabic poses new challenges for speech recognition due to rich morphology. The rich morphology results in large increases in perplexity and out-of-vocabulary (OOV) rate. In this study, we present a new language modeling method that takes advantage of Arabic morphology by combining morphological segments with the underlying lexical items and additional available information sources with regard to morphological segments and lexical items within a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates. Preliminary experiments detailed in this paper show satisfactory improvements over word- and morpheme-based trigram language models and their interpolations.

01 Jan 2007
TL;DR: In this paper, the dialogue state is defined by the set of parameters contained in the system prompt, so that a separate language model can be constructed for each state; robust models are obtained through linear interpolation of all dialogue-state dependent language models and an automatic text clustering algorithm.
Abstract: Dialogue-state dependent language models in automatic inquiry systems can be employed to improve speech recognition and understanding. In this paper, the dialogue state is defined by the set of parameters contained in the system prompt. Using this knowledge, a separate language model for each state can be constructed. In order to obtain robust language models we study the linear interpolation of all dialogue-state dependent language models and an automatic text clustering algorithm. In particular, we extend the clustering algorithm so as to automatically determine the optimal number of clusters. These clusters are then combined with linear interpolation. We present experimental results on a Dutch corpus which has been recorded in the Netherlands with a train timetable information system in the framework of the ARISE project [1]. The perplexity, the word error rate, and the attribute error rate can be reduced significantly with all of these methods.
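
A minimal sketch of the linear interpolation used for such component language models, with weights estimated by EM on held-out data (equivalently, by minimizing held-out perplexity). The component-LM interface and the data format are assumptions for illustration, not the paper's implementation.

import math

def em_interpolation_weights(component_lms, heldout, iters=20):
    """component_lms: objects exposing .prob(word, history) -> float
       heldout: list of (history, word) pairs."""
    M = len(component_lms)
    lam = [1.0 / M] * M                              # start from uniform weights
    for _ in range(iters):
        expected = [0.0] * M
        for history, word in heldout:
            p = [lam[m] * component_lms[m].prob(word, history) for m in range(M)]
            total = sum(p) or 1e-12
            for m in range(M):                       # E-step: component posteriors
                expected[m] += p[m] / total
        lam = [e / len(heldout) for e in expected]   # M-step: renormalize
    return lam

def interpolated_perplexity(component_lms, lam, data):
    logp = sum(math.log(sum(l * lm.prob(w, h) for l, lm in zip(lam, component_lms)) or 1e-12)
               for h, w in data)
    return math.exp(-logp / len(data))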

Book ChapterDOI
28 Jun 2007
TL;DR: Fourfold cross-validation experiments on the ICSI Meeting Corpus show that exploiting prosody for language modeling can significantly reduce perplexity and also yields marginal reductions in word error rate.
Abstract: Prosody has been actively studied as an important knowledge source for speech recognition and understanding. In this paper, we are concerned with the question of exploiting prosody in language models to aid automatic speech recognition in the context of meetings. Using an automatic syllable detection algorithm, the syllable-based prosodic features are extracted to form the prosodic representation for each word. Two modeling approaches are then investigated. One is based on a factored language model, which directly uses the prosodic representation and treats it as a 'word'. Instead of direct association, the second approach provides a richer probabilistic structure within a hierarchical Bayesian framework by introducing an intermediate latent variable to represent similar prosodic patterns shared by groups of words. Fourfold cross-validation experiments on the ICSI Meeting Corpus show that exploiting prosody for language modeling can significantly reduce perplexity and also yields marginal reductions in word error rate.

Proceedings ArticleDOI
27 Aug 2007
TL;DR: A topic detection approach based on a probabilistic framework is proposed to realize topic adaptation of speech recognition systems for long speech archives such as meetings, demonstrating significant reductions in perplexity and out-of-vocabulary rates as well as robustness against ASR errors.
Abstract: A topic detection approach based on a probabilistic framework is proposed to realize topic adaptation of speech recognition systems for long speech archives such as meetings. Since topics in such speech are not clearly defined, unlike news stories, we adopt a probabilistic representation of topics based on probabilistic latent semantic analysis (PLSA). A topical sub-space is constructed by PLSA, and speech segments are projected to the subspace; each segment is then represented by a vector of topic probabilities obtained by the projection. Topic detection is performed by clustering these vectors, and topic adaptation is done by collecting relevant texts based on similarity in this probabilistic representation. In experimental evaluations, the proposed approach demonstrated significant reductions in perplexity and out-of-vocabulary rates as well as robustness against ASR errors.
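
A rough sketch of the projection-and-clustering step described above. NMF is used here as a convenient stand-in for PLSA (the two are closely related), and the bag-of-words segment representation, topic count, and clustering choices are assumptions rather than the authors' setup.

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import NMF
from sklearn.cluster import KMeans

def detect_topics(segments, n_topics=8, n_clusters=4):
    """segments: list of strings, one per speech segment (e.g., ASR transcripts)."""
    X = CountVectorizer().fit_transform(segments)             # segment-term counts
    W = NMF(n_components=n_topics, init="nndsvda", max_iter=500).fit_transform(X)
    W = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)   # normalize to topic proportions
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(W)
    return W, labels    # per-segment topic vectors and their topic-cluster ids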

Proceedings ArticleDOI
01 Dec 2007
TL;DR: A novel language model is presented that can incorporate various types of linguistic information encoded in the form of a token, a (word, label)-tuple; using tokens as hidden states, the model produces sequences of words with trivial output distributions.
Abstract: We present a novel language model capable of incorporating various types of linguistic information as encoded in the form of a token, a (word, label)-tuple. Using tokens as hidden states, our model is effectively a hidden Markov model (HMM) producing sequences of words with trivial output distributions. The transition probabilities, however, are computed using a maximum entropy model to take advantage of potentially overlapping features. We investigated different types of labels with a wide range of linguistic implications. These models outperform Kneser-Ney smoothed n-gram models both in terms of perplexity on standard datasets and in terms of word error rate for a large vocabulary speech recognition system.

Journal ArticleDOI
01 May 2007
TL;DR: The proposed modeling and estimation methods for the mixture language model (LM) led to a 21% reduction of perplexity on test sets of five doctors, which translated into improvements in captioning accuracy.
Abstract: We are developing an automatic captioning system for teleconsultation video teleconferencing (TC-VTC) in telemedicine, based on large vocabulary conversational speech recognition. In TC-VTC, doctors' speech contains a large number of infrequently used medical terms in spontaneous styles. Due to insufficiency of data, we adopted mixture language modeling, with models trained from several datasets of medical and nonmedical domains. This paper proposes novel modeling and estimation methods for the mixture language model (LM). Component LMs are trained from individual datasets, with class n-gram LMs trained from in-domain datasets and word n-gram LMs trained from out-of-domain datasets, and they are interpolated into a mixture LM. For class LMs, semantic categories are used for class definition on medical terms, names, and digits. The interpolation weights of a mixture LM are estimated by a greedy algorithm of forward weight adjustment (FWA). The proposed mixing of in-domain class LMs and out-of-domain word LMs, the semantic definitions of word classes, as well as the weight-estimation algorithm of FWA are effective on the TC-VTC task. As compared with using mixtures of word LMs with weights estimated by the conventional expectation-maximization algorithm, the proposed methods led to a 21% reduction of perplexity on test sets of five doctors, which translated into improvements in captioning accuracy.

Proceedings ArticleDOI
01 Dec 2007
TL;DR: A new bigram topic model, the bigram PLSA model, is presented, along with a modified training strategy that unevenly assigns latent topics to context words according to an estimate of their latent semantic complexities.
Abstract: As an important component of many speech and language processing applications, the statistical language model has been widely investigated. The bigram topic model, which combines advantages of both the traditional n-gram model and the topic model, has turned out to be a promising language modeling approach. However, the original bigram topic model assigns the same number of topics to each context word, ignoring the fact that the latent semantics of different context words vary in complexity. We therefore present a new bigram topic model, the bigram PLSA model, and propose a modified training strategy that unevenly assigns latent topics to context words according to an estimate of their latent semantic complexities. As a consequence, a refined bigram PLSA model is reached. Experiments on HUB4 Mandarin test transcriptions reveal its superiority over existing models, and further performance improvements in perplexity are achieved through the use of the refined bigram PLSA model.

Proceedings ArticleDOI
15 Apr 2007
TL;DR: This paper investigates several methods that combine a POS-based model or integrate POS information in the ME (maximum entropy) scheme, which achieve significant reductions in perplexity and WER in a meeting transcription task.
Abstract: For language modeling of spontaneous speech, we propose a novel approach, based on the statistical machine translation framework, which transforms a document-style model to the spoken style. For better coverage and more reliable estimation, incorporation of POS (part-of-speech) information is explored in addition to lexical information. In this paper, we investigate several methods that combine a POS-based model or integrate POS information in the ME (maximum entropy) scheme. They achieve significant reductions in perplexity and WER in a meeting transcription task. Moreover, the model is applied to different domains or committee meetings of different topics. As a result, an even larger perplexity reduction is achieved than in the in-domain case. The result demonstrates the generality and portability of the model.

Book ChapterDOI
01 Jan 2007
TL;DR: This chapter overviews techniques for evaluating speech and speaker recognition systems, describes principles of recognition methods, and specifies types of systems as well as their applications.
Abstract: This chapter overviews techniques for evaluating speech and speaker recognition systems. The chapter first describes principles of recognition methods, and specifies types of systems as well as their applications. The evaluation methods can be classified into subjective and objective methods, among which the chapter focuses on the latter methods. In order to compare/normalize performances of different speech recognition systems, test set perplexity is introduced as a measure of the difficulty of each task. Objective evaluation methods of spoken dialogue and transcription systems are respectively described. Speaker recognition can be classified into speaker identification and verification, and most of the application systems fall into the speaker verification category. Since variation of speech features over time is a serious problem in speaker recognition, normalization and adaptation techniques are also described. Speaker verification performance is typically measured by equal error rate, detection error trade-off (DET) curves, and a weighted cost value. The chapter concludes by summarizing various issues for future research.
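
As the chapter notes, test set perplexity quantifies the difficulty of a recognition task; for a test word sequence w_1 ... w_N it is conventionally defined as

\mathrm{PP} = P(w_1, \ldots, w_N)^{-1/N} = \exp\!\Bigl(-\frac{1}{N}\sum_{i=1}^{N} \ln P(w_i \mid w_1, \ldots, w_{i-1})\Bigr),

so a lower perplexity indicates a more predictable, and hence easier, task for the recognizer.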

Proceedings ArticleDOI
22 Apr 2007
TL;DR: By applying a two-step language model adaptation process based on notes and agenda items, this work was able to reduce perplexity by 9% and word error rate by 4% relative on a set of ten meetings recorded in-house.
Abstract: We describe the use of meeting metadata, acquired using a computerized meeting organization and note-taking system, to improve automatic transcription of meetings. By applying a two-step language model adaptation process based on notes and agenda items, we were able to reduce perplexity by 9% and word error rate by 4% relative on a set of ten meetings recorded in-house. This approach can be used to leverage other types of metadata.

Proceedings ArticleDOI
15 Apr 2007
TL;DR: Conditional random fields (CRFs) are applied to train the language model and classify documents, and significant improvement in dialect classification is achieved using the CRF-based classifier, especially on small documents.
Abstract: Studies have shown that dialect variation has a significant impact on speech recognition performance, and therefore it is important to be able to perform effective dialect classification to improve speech systems. Dialects differ at the acoustic, grammar, and vocabulary levels. In this study, topic-specific printed text dialect data are collected from the ten major newspapers in Australia, the United Kingdom, and the United States. An n-gram language model is trained for each topic in each country/dialect. The perplexity measure is applied to classify the dialect-dependent documents. In addition to the n-gram information, further features can be extracted from text structure. A conditional random field (CRF) is such a model: it can extract different levels of features while remaining mathematically tractable. The CRF is applied to train the language model and classify documents. Significant improvement in dialect classification is achieved using the CRF-based classifier, especially on small documents (10% to 22% relative error reduction). Text classification based on variable-size documents is explored, and a document with several hundred words is shown to be sufficient for dialect classification. The vocabulary differences among the text documents from different countries are explored, and the dialect difference is shown to be closely connected with the vocabulary difference. Five document topics are evaluated and performance for cross-topic dialect classification is explored.
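
A minimal sketch of the perplexity-based classification step described above: one bigram LM is trained per dialect, and a document is labeled with the dialect whose LM assigns it the lowest perplexity. Add-one smoothing and the tokenization are stand-in assumptions, not the smoothing actually used in the paper.

import math
from collections import Counter

class BigramLM:
    def __init__(self, sentences):                   # sentences: lists of tokens
        self.uni, self.bi = Counter(), Counter()
        for toks in sentences:
            toks = ["<s>"] + toks + ["</s>"]
            self.uni.update(toks[:-1])
            self.bi.update(zip(toks[:-1], toks[1:]))
        self.V = len(self.uni) + 1                   # +1 for unseen words

    def logprob(self, prev, word):
        # add-one smoothed bigram probability
        return math.log((self.bi[(prev, word)] + 1) / (self.uni[prev] + self.V))

    def perplexity(self, sentences):
        logp, n = 0.0, 0
        for toks in sentences:
            toks = ["<s>"] + toks + ["</s>"]
            for prev, word in zip(toks[:-1], toks[1:]):
                logp += self.logprob(prev, word)
                n += 1
        return math.exp(-logp / n)

def classify(document, dialect_lms):
    """Return the dialect whose LM gives the document (a list of token lists) the lowest perplexity."""
    return min(dialect_lms, key=lambda d: dialect_lms[d].perplexity(document))

# usage: lms = {"AU": BigramLM(au_sents), "UK": BigramLM(uk_sents), "US": BigramLM(us_sents)}
#        print(classify(test_doc_sentences, lms))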

Journal IssueDOI
TL;DR: A generative text model using Dirichlet Mixtures as a distribution for parameters of a multinomial distribution, whose compound distribution is Polya Mixtures, is proposed and it is shown that the model exhibits high performance in application to statistical language models.
Abstract: We propose a generative text model using Dirichlet Mixtures as a distribution for parameters of a multinomial distribution, whose compound distribution is Polya Mixtures, and show that the model exhibits high performance in application to statistical language models. In this paper, we discuss some methods for estimating parameters of Dirichlet Mixtures and for estimating the expectation values of the a posteriori distribution needed for adaptation, and then compare them with two previous text models. The first conventional model is the Mixture of Unigrams, which is often used for incorporating topics into statistical language models. The second one is LDA (Latent Dirichlet Allocation), a typical generative text model. In an experiment using document probabilities and dynamic adaptation of n-gram models for newspaper articles, we show that the proposed model, in comparison with the two previous models, can achieve a lower perplexity at low mixture numbers. © 2007 Wiley Periodicals, Inc. Syst Comp Jpn, 38(12): 76–85, 2007; Published online in Wiley InterScience. DOI 10.1002/scj.20629

Proceedings ArticleDOI
01 Dec 2007
TL;DR: The proposed minimum Bayes risk (MBR) based approach provides a flexible framework for unsupervised LM adaptation and generalizes to a variety of forms of recognition and translation error metrics.
Abstract: This paper investigates unsupervised test-time adaptation of language models (LM) using discriminative methods for a Mandarin broadcast speech transcription and translation task. A standard approach to adapting interpolated language models is to optimize the component weights by minimizing the perplexity on supervision data. This is a widely made approximation for language modeling in automatic speech recognition (ASR) systems. For speech translation tasks, it is unclear whether a strong correlation still exists between perplexity and various forms of error cost functions in the recognition and translation stages. The proposed minimum Bayes risk (MBR) based approach provides a flexible framework for unsupervised LM adaptation. It generalizes to a variety of forms of recognition and translation error metrics. LM adaptation is performed at the audio document level using either the character error rate (CER) or the translation edit rate (TER) as the cost function. An efficient parameter estimation scheme using the extended Baum-Welch (EBW) algorithm is proposed. Experimental results on a state-of-the-art speech recognition and translation system are presented. The MBR-adapted language models gave the best recognition and translation performance and reduced the TER score by up to 0.54% absolute.

Proceedings ArticleDOI
17 Sep 2007
TL;DR: A novel method is presented that adjusts the improperly assigned probabilities of unseen n-grams by taking advantage of the agglutinative characteristics of the Korean language, to prevent grammatically improper n-grams from achieving relatively higher probability and to assign more probability mass to proper n-grams.
Abstract: Smoothing for an n-gram language model is an algorithm that can assign a non-zero probability to an unseen n-gram. Smoothing is an essential technique for an n-gram language model due to the data sparseness problem. However, in some circumstances it assigns an improper amount of probability to unseen n-grams. In this paper, we present a novel method that adjusts the improperly assigned probabilities of unseen n-grams by taking advantage of the agglutinative characteristics of the Korean language. In Korean, the grammatically proper class of a morpheme can be predicted by knowing the previous morpheme. By using this characteristic, we try to prevent grammatically improper n-grams from achieving relatively higher probability and to assign more probability mass to proper n-grams. Experimental results show that the proposed method can achieve 8.6%-12.5% perplexity reductions for the Katz backoff algorithm and 4.9%-7.0% perplexity reductions for Kneser-Ney smoothing.

Proceedings ArticleDOI
01 Feb 2007
TL;DR: It is shown that it is possible to use n-gram models considering histories different from those used during training, called crossing context models, which achieve an improvement in terms of word error rate on the data used for the francophone evaluation campaign ESTER.
Abstract: This study examines an original way to take advantage of distant information in statistical language models. We show that it is possible to use n-gram models considering histories different from those used during training. These models are called crossing context models. Our study deals with classical and distant n-gram models. A mixture of four models is proposed and evaluated. A bigram linear mixture achieves an improvement of 14% in terms of perplexity. Moreover, the trigram mixture outperforms the standard trigram by 5.6%. These improvements have been obtained without increasing the complexity of standard n-gram models. The resulting mixture language model has been integrated into a speech recognition system. Its evaluation shows a slight improvement in terms of word error rate on the data used for the francophone evaluation campaign ESTER [1]. Finally, the impact of the proposed crossing context language models on performance is analyzed for various speakers.
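
One plausible form of the four-component mixture described above combines classical and distant (gapped) histories linearly; the exact component set and notation below are assumptions rather than the authors' formulation:

P_{\mathrm{mix}}(w_i \mid w_{i-3}, w_{i-2}, w_{i-1}) = \lambda_1\, P(w_i \mid w_{i-1}) + \lambda_2\, P_{d}(w_i \mid w_{i-2}) + \lambda_3\, P(w_i \mid w_{i-2}, w_{i-1}) + \lambda_4\, P_{d}(w_i \mid w_{i-3}, w_{i-2}), \qquad \textstyle\sum_{m}\lambda_m = 1,

where P_{d} denotes a distant n-gram model whose history skips the word(s) immediately preceding w_i.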