
Showing papers on "Perplexity published in 1992"


Proceedings ArticleDOI
23 Feb 1992
TL;DR: This paper presents the motivating goals, acoustic data design, text processing steps, lexicons, and testing paradigms incorporated into the multi-faceted WSJ CSR Corpus, a corpus containing significant quantities of both speech data and text data.
Abstract: The DARPA Spoken Language System (SLS) community has long taken a leadership position in designing, implementing, and globally distributing significant speech corpora widely used for advancing speech recognition research. The Wall Street Journal (WSJ) CSR Corpus described here is the newest addition to this valuable set of resources. In contrast to previous corpora, the WSJ corpus will provide DARPA its first general-purpose English, large vocabulary, natural language, high-perplexity corpus containing significant quantities of both speech data (400 hrs.) and text data (47M words), thereby providing a means to integrate speech recognition and natural language processing in application domains with high potential practical value. This paper presents the motivating goals, acoustic data design, text processing steps, lexicons, and testing paradigms incorporated into the multi-faceted WSJ CSR Corpus.

1,100 citations


Proceedings Article
01 Jan 1992
TL;DR: The WSJ CSR Corpus as mentioned in this paper is the first general-purpose English, large vocabulary, natural language, high-perplexity corpus containing significant quantities of both speech data (400 hrs.) and text data (47M words), thereby providing a means to integrate speech recognition and natural language processing in application domains with high potential practical value.
Abstract: The DARPA Spoken Language System (SLS) community has long taken a leadership position in designing, implementing, and globally distributing significant speech corpora widely used for advancing speech recognition research. The Wall Street Journal (WSJ) CSR Corpus described here is the newest addition to this valuable set of resources. In contrast to previous corpora, the WSJ corpus will provide DARPA its first general-purpose English, large vocabulary, natural language, high-perplexity corpus containing significant quantities of both speech data (400 hrs.) and text data (47M words), thereby providing a means to integrate speech recognition and natural language processing in application domains with high potential practical value. This paper presents the motivating goals, acoustic data design, text processing steps, lexicons, and testing paradigms incorporated into the multi-faceted WSJ CSR Corpus.

1,032 citations


Journal ArticleDOI
TL;DR: The SPHINX-II speech recognition system is reviewed and recent efforts on improved speech recognition are summarized.

576 citations



Proceedings ArticleDOI
Ute Essen1, Volker Steinbiss1
23 Mar 1992
TL;DR: Using word-bigram language models, cooccurrence smoothing improved the test-set perplexity by 14% on a German 100,000-word text corpus and by 10% on an English one-million-word corpus.
Abstract: Training corpora for stochastic language models are virtually always too small for maximum-likelihood estimation, so smoothing the models is of great importance. The authors derive the cooccurrence smoothing technique for stochastic language modeling and give experimental evidence for its validity. Using word-bigram language models, cooccurrence smoothing improved the test-set perplexity by 14% on a German 100,000-word text corpus and by 10% on an English one-million-word corpus.

91 citations
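Since test-set perplexity is the figure of merit quoted throughout this listing, a minimal sketch of how it is computed for a bigram model may be useful; the function name, the start symbol, and the probability interface below are illustrative assumptions, not details taken from the paper above.

    import math

    def bigram_perplexity(test_tokens, bigram_prob):
        """Test-set perplexity of a bigram model: exp of the average negative
        log-probability per token. bigram_prob(prev, word) must return a
        smoothed probability > 0 (e.g. after cooccurrence smoothing)."""
        log_sum = 0.0
        prev = "<s>"  # assumed sentence-start symbol
        for word in test_tokens:
            log_sum += math.log(bigram_prob(prev, word))
            prev = word
        return math.exp(-log_sum / len(test_tokens))

A 14% improvement, as reported above, means this value drops by 14% on held-out text.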


Proceedings ArticleDOI
23 Feb 1992
TL;DR: An algorithm to adapt an n-gram language model to a document as it is dictated is presented; the resulting minimum discrimination information model achieves a perplexity of 208 instead of 290 for the static trigram model on a document of 321 words.
Abstract: We present an algorithm to adapt an n-gram language model to a document as it is dictated. The observed partial document is used to estimate a unigram distribution for the words that have already occurred. Then, we find the n-gram distribution closest to the static n-gram distribution (using the discrimination information distance measure) that satisfies the marginal constraints derived from the document. The resulting minimum discrimination information model yields a perplexity of 208, compared with 290 for the static trigram model, on a document of 321 words.

90 citations
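A sketch of the minimum discrimination information formulation described above, in generic notation that is ours rather than the paper's: p_s is the static n-gram model, h a word history, and c(w) the unigram marginals estimated from the dictated portion of the document.

    \min_{p} \; D(p \,\|\, p_s) \;=\; \sum_{h,w} p(h)\, p(w \mid h) \,\log \frac{p(w \mid h)}{p_s(w \mid h)}
    \qquad \text{subject to} \qquad \sum_{h} p(h)\, p(w \mid h) = c(w) \;\; \forall w .

The constrained minimizer takes the standard exponential form

    p(w \mid h) \;=\; \frac{\alpha(w)\, p_s(w \mid h)}{\sum_{w'} \alpha(w')\, p_s(w' \mid h)} ,

i.e. each word receives a multiplicative factor \alpha(w), chosen so that the adapted model reproduces the unigram marginals observed in the document so far.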


Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors present an algorithm to adapt an n-gram language model to a document as it is dictated, resulting in a perplexity of 208 instead of 290 for the static trigram model on a document of 321 words.
Abstract: The authors present an algorithm to adapt an n-gram language model to a document as it is dictated. The observed partial document is used to estimate a unigram distribution for the words that have already occurred. Then, they find the n-gram distribution closest to the static n-gram distribution (using the discrimination information distance measure) that satisfies the marginal constraints derived from the document. The resulting minimum discrimination information model yields a perplexity of 208, compared with 290 for the static trigram model, on a document of 321 words.

88 citations


Proceedings ArticleDOI
23 Feb 1992
TL;DR: While users may adapt to some aspects of an SLS, certain types of user behavior may require technological solutions; hyperarticulation increases recognition errors, and while instructions can reduce this behavior, they do not result in improved recognition performance.
Abstract: We have analyzed three factors affecting user satisfaction and system performance using an SLS implemented in the ATIS domain. We have found that: (1) trade-offs between speed and accuracy have different implications for user satisfaction; (2) recognition performance improves over time, at least in part because of a reduction in sentence perplexity; and (3) hyperarticulation increases recognition errors, and while instructions can reduce this behavior, they do not result in improved recognition performance. We conclude that while users may adapt to some aspects of an SLS, certain types of user behavior may require technological solutions.

81 citations


Proceedings ArticleDOI
23 Feb 1992
TL;DR: Two attempts to improve stochastic language models are described, and a new type of adaptive language model is proposed, using a framework where one word sequence triggers another, causing its estimated probability to be raised.
Abstract: We describe two attempts to improve our stochastic language models. In the first, we identify a systematic overestimation in the traditional backoff model, and use statistical reasoning to correct it. Our modification results in up to a 6% reduction in the perplexity of various tasks. Although the improvement is modest, it is achieved with hardly any increase in the complexity of the model. Both analysis and empirical data suggest that the modification is most suitable when training data is sparse. In the second attempt, we propose a new type of adaptive language model. Existing adaptive models use a dynamic cache, based on the history of the document seen up to that point. But another source of information in the history, within-document word sequence correlations, has not yet been tapped. We describe a model that attempts to capture this information, using a framework where one word sequence triggers another, causing its estimated probability to be raised. We discuss various issues in the design of such a model, and describe our first attempt at building one. Our preliminary results include a perplexity reduction of between 10% and 32%, depending on the test set.

45 citations
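The trigger idea can be illustrated with its simplest special case, a cache in which a word triggers itself. The sketch below interpolates a static model with a document-history cache; it is an illustration of the general mechanism only, not the authors' model, and the class and parameter names are ours.

    from collections import Counter

    class CacheInterpolatedLM:
        """Toy adaptive model: the static probability is interpolated with a
        'cache' distribution built from the document seen so far, so that
        words observed earlier in the document get boosted."""

        def __init__(self, static_prob, lam=0.9):
            self.static_prob = static_prob  # callable: (history, word) -> probability
            self.lam = lam                  # weight of the static model
            self.cache = Counter()
            self.total = 0

        def prob(self, history, word):
            cache_p = self.cache[word] / self.total if self.total else 0.0
            return self.lam * self.static_prob(history, word) + (1.0 - self.lam) * cache_p

        def observe(self, word):
            """Update the cache after each dictated word."""
            self.cache[word] += 1
            self.total += 1

A full trigger model generalizes this by letting any word sequence raise the estimated probability of correlated sequences, not only of itself.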


Proceedings ArticleDOI
Giulio Maltese1, F. Mancini1
23 Mar 1992
TL;DR: A technique for incorporating grammatical and morphological information into a trigram-based statistical language model is presented; it reduces the effect of data sparseness in the trigram model, owing in part to the way the interpolation coefficients are chosen.
Abstract: A technique to take grammatical and morphological information into account in a trigram-based statistical language model is presented. This is achieved automatically by interpolating the trigram model (which uses sequences of words) with statistical models based on sequences of grammatical categories and/or lemmas. Such an approach reduces the effect of data sparseness in the trigram model, owing in part to the way the interpolation coefficients are chosen. With respect to trigrams, the authors obtained a significant reduction in perplexity on various texts, even when combining a well-trained trigram model with a small grammatical/morphological model.

35 citations
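One standard way to combine such models is linear interpolation of the word trigram with a category-based trigram; the factorization and weights below are the generic textbook form, written in our own notation, and not necessarily the exact scheme of the paper.

    p(w_i \mid w_{i-2}, w_{i-1}) \;\approx\;
    \lambda_1\, p_{\mathrm{word}}(w_i \mid w_{i-2}, w_{i-1})
    \;+\; \lambda_2\, p(g_i \mid g_{i-2}, g_{i-1})\, p(w_i \mid g_i),
    \qquad \lambda_1 + \lambda_2 = 1,

where g_i is the grammatical category (or lemma) of w_i and the weights \lambda can be estimated on held-out data.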


Proceedings ArticleDOI
23 Mar 1992
TL;DR: A novel type of hierarchical phoneme model for speaker adaptation, based on both hidden Markov models (HMM) and learning vector quantization (LVQ) networks, is presented, achieving 82% word accuracy for speaker-dependent recognition and 73% in the speaker-adaptive mode.
Abstract: A novel type of hierarchical phoneme model for speaker adaptation, based on both hidden Markov models (HMM) and learning vector quantization (LVQ) networks, is presented. Low-level tied LVQ phoneme models are trained speaker-dependently and speaker-independently, yielding a pool of speaker-biased phoneme models which can be mixed into high-level speaker-adaptive phoneme models. Rapid speaker adaptation is performed by finding an optimal mixture of these models at recognition time, given only a small amount of speech data; subsequently, the models are fine-tuned to the new speaker's voice by further parameter reestimation. In preliminary experiments with a continuous speech task using 40 context-free phoneme models at a task perplexity of 111, the authors achieved 82% word accuracy for speaker-dependent recognition and 73% in the speaker-adaptive mode.
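One generic way to formalize the mixing step, in our own notation (the paper does not necessarily use exactly this criterion): given speaker-biased phoneme models p_k and a small set of adaptation frames x_1, ..., x_T from the new speaker, choose mixture weights that maximize the likelihood of the adaptation data,

    \hat{w} \;=\; \arg\max_{w_k \ge 0,\; \sum_k w_k = 1} \; \sum_{t=1}^{T} \log \sum_{k} w_k\, p_k(x_t),
    \qquad p_{\mathrm{adapted}}(x) \;=\; \sum_{k} \hat{w}_k\, p_k(x) .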

Proceedings ArticleDOI
23 Mar 1992
TL;DR: Two approaches for adapting a specific syllable trigram model to a new task are described: one uses a small amount of text data similar to the target task, and the other applies supervised learning to the most recent input phrases.
Abstract: The authors describe two approaches for adapting a specific syllable trigram model to a new task. One uses a small amount of text data similar to the target task, and the other applies supervised learning to the most recent input phrases. The effect of each adaptation is verified with syllable perplexity and phrase recognition. When the only syntactic knowledge was the syllable trigram model, the perplexity was reduced from 54.5 to 18.1 by the adaptation using 100 phrases of similar text, and to 14.6 by the supervised learning. The recognition rates were also improved from 42.3% to 46.6% and 50.9%, respectively. Text similarity for speech recognition is also studied.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: A continuous speech recognition system, 'niNja' (Natural language INterface in JApanese), is presented, and an LR parsing algorithm with context-dependent phone models is proposed to achieve high accuracy and reduce the required computation.
Abstract: A continuous speech recognition system, 'niNja' (Natural language INterface in JApanese), is presented. Efficient search algorithms are proposed to achieve high accuracy and to reduce the required computation. First, an LR parsing algorithm with context-dependent phone models is proposed. Second, the scores of the same phone models in different hypotheses at the phone level are represented by the single score of the best hypothesis. The system is tested on a task with a 113-word vocabulary and a word perplexity of 4.1. It produces a sentence accuracy of 97.3% on the 10 open speakers' 110 sentences, and the error reduction is as much as 77% compared with using context-independent phone models.

Book ChapterDOI
01 Jan 1992
TL;DR: A large vocabulary continuous speech recognition system developed at AT&T Bell Laboratories is described, and the methods used to provide high word recognition accuracy are discussed, focusing on the techniques adopted to select the set of fundamental speech units and to provide the acoustic models of these sub-word units based on a continuous density HMM (CDHMM) framework.
Abstract: The field of large vocabulary continuous speech recognition has advanced to the point where there are several systems capable of providing greater than 95% word accuracy for speaker-independent recognition of a 1000-word vocabulary, spoken fluently, for a task with a perplexity of about 60. There are several factors which account for the high performance achieved by these systems, including the use of effective feature analysis, the use of hidden Markov model (HMM) methodology, the use of context-dependent sub-word units to capture intra-word and inter-word phonemic variations, and the use of corrective training techniques to emphasize differences between acoustically similar words in the vocabulary. In this paper we describe a large vocabulary continuous speech recognition system developed at AT&T Bell Laboratories, and discuss the methods used to provide high word recognition accuracy. In particular, we focus our discussion on the techniques adopted to select the set of fundamental speech units and to provide the acoustic models of these sub-word units based on a continuous density HMM (CDHMM) framework. Different modeling approaches, such as a discrete HMM and a tied-mixture HMM, are also discussed and compared to the CDHMM approach.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: The authors describe their evaluation method, which is based on a relationship among the word-level perplexity (V_p), the sentence length, the word (or phoneme) recognition rate (R_w), and the sentence recognition rate, from which the sentence recognition rate can be predicted.
Abstract: An evaluation technique is very important for developing a successful continuous speech recognition system. The branching factor and the perplexity have been used to measure the complexity of a speech recognition task. The authors describe an evaluation method based on such a measure. They found a relationship among the word-level (or phoneme-level) perplexity V_p, the sentence length L, the word (or phoneme) recognition rate R_w, and the sentence recognition rate. From this relationship, the sentence recognition rate can be predicted once the word (or phoneme) recognition performance and the task definition are given. The approximate equation is: sentence recognition rate = (f(V_p, R_w))^L, where f(V_p, R_w) denotes the word recognition rate of a recognizer with word accuracy R_w on a vocabulary of size V_p, estimated from the relationship between the number of categories and the recognition rate.
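A worked example with illustrative numbers (not taken from the paper): if the effective per-word recognition rate is f(V_p, R_w) = 0.95 and a sentence is L = 10 words long, the predicted sentence recognition rate is

    0.95^{10} \approx 0.60,

i.e. roughly 60% of such sentences would be recognized entirely correctly under the independence assumption behind the formula.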

Proceedings ArticleDOI
23 Feb 1992
TL;DR: This paper introduces three recent topics in speech recognition research at NTT (Nippon Telegraph and Telephone) Human Interface Laboratories, including a new HMM (hidden Markov model) technique that uses VQ-code bigrams to constrain the output probability distribution of the model according to the VQ-codes of previous frames.
Abstract: This paper introduces three recent topics in speech recognition research at NTT (Nippon Telegraph and Telephone) Human Interface Laboratories. The first topic is a new HMM (hidden Markov model) technique that uses VQ-code bigrams to constrain the output probability distribution of the model according to the VQ-codes of previous frames. The output probability distribution changes depending on the previous frames even in the same state, so this method reduces the overlap between feature distributions of different phonemes. The second topic is approaches for adapting a syllable trigram model to a new task in Japanese continuous speech recognition. An approach which uses the most recent input phrases for adaptation is effective in reducing the perplexity and improving phrase recognition rates. The third topic is stochastic language models for sequences of Japanese characters to be used in a Japanese dictation system with unlimited vocabulary. Japanese characters consist of Kanji (Chinese characters) and Kana (Japanese syllabaries), and each Kanji has several readings depending on the context. Our dictation system uses character-trigram probabilities as a source model obtained from a text database consisting of both Kanji and Kana, and generates Kanji-and-Kana sequences directly from input speech.
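A minimal sketch of the character-trigram source model mentioned as the third topic, treating Kanji and Kana characters uniformly; the add-k smoothing used here is an illustrative choice of ours, not necessarily NTT's.

    from collections import Counter

    def train_char_trigram(text, k=1.0):
        """Estimate P(c3 | c1, c2) over a character stream (Kanji and Kana alike)
        from a training text, with simple add-k smoothing."""
        tri = Counter(zip(text, text[1:], text[2:]))
        bi = Counter(zip(text, text[1:]))
        vocab_size = len(set(text))

        def prob(c1, c2, c3):
            return (tri[(c1, c2, c3)] + k) / (bi[(c1, c2)] + k * vocab_size)

        return prob

Such a model can then score candidate Kanji-and-Kana character sequences during the dictation hypothesis search.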

Proceedings ArticleDOI
23 Feb 1992
TL;DR: A large, multi-component "general-purpose English, large vocabulary, natural language, high perplexity corpus" known as the DARPA Continuous Speech Recognition (CSR) Corpus is being developed.
Abstract: Continuous speech recognition research activities within the DARPA Spoken Language community have, within the past several years, been focussed on the Resource Management (RM) and Air Travel Information System (ATIS) corpora. Within the past year, plans have been developed for a large, multi-component "general-purpose English, large vocabulary, natural language, high perplexity corpus" known as the DARPA [Wall Street Journal-based] Continuous Speech Recognition (CSR) Corpus [1]. Doug Paul, of MIT Lincoln Laboratory (MIT/LL), and Janet Baker, of Dragon Systems, are responsible for many of the details of these plans. This corpus is intended to supplant the RM corpora and to supplement the ATIS corpora as resources for the DARPA speech recognition research community.

Book ChapterDOI
01 Jan 1992
TL;DR: It is shown that the bigram language model component, which can be dynamically changed according to the context of the dialogue, can further reduce the perplexity in the continuous speech recognizer.
Abstract: The German prototype (SUn Germ) of the European project SUndial aims at a telephone-based real-time system for oral dialogues that query a database of intercity train schedules. To enhance system performance, one of our goals is to improve the recognition accuracy by minimizing the perplexity. We will show that our bigram language model component, which can be dynamically changed according to the context of the dialogue, can further reduce this perplexity. This paper describes the integration of linguistic knowledge into the continuous speech recognizer. In particular, our experiments on the interaction between the language model component and the dialogue manager are presented. In the evaluation part we compare our approaches with and without dynamically varied language models, giving an overview of the performance measurement results.
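A sketch of the idea of a dynamically varied language model: the dialogue manager predicts the current dialogue state and the recognizer scores word pairs with the bigram model trained for that state. Class and method names are illustrative, not taken from the SUNDIAL implementation.

    class DialogueDependentBigramLM:
        """Select a state-specific bigram model per utterance, falling back to a
        state-independent model when no specialized model exists."""

        def __init__(self, models_by_state, fallback):
            self.models_by_state = models_by_state  # e.g. {"ask_departure_time": lm1, ...}
            self.fallback = fallback                # state-independent bigram model

        def prob(self, dialogue_state, prev_word, word):
            lm = self.models_by_state.get(dialogue_state, self.fallback)
            return lm.prob(prev_word, word)         # assumes each lm exposes prob(prev, word)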

Book ChapterDOI
01 Jan 1992
TL;DR: An automatic speech recognition system based on syllabic segmentation of the speech signal is described, in which stochastic models (HMMs) are used to represent demisyllable segments.
Abstract: The paper describes an automatic speech recognition system which is based on syllabic segmentation of the speech signal. Stochastic models (HMMs) are used for representing demisyllable segments. The advantages of syllabic processing within the different stages of the system (i.e. segmentation, phonetic classification, word and sentence recognition) are demonstrated and discussed on the basis of experimental results. Word and sentence recognition with a perplexity of 27 reached 74% and 96%, respectively.

Proceedings ArticleDOI
23 Mar 1992
TL;DR: It is shown how the compositional representation (CR) previously used for lexical access from sub-word recognizers for a relatively small word vocabulary can be extended to much larger vocabularies without further training.
Abstract: It is shown how the compositional representation (CR) previously used for lexical access from sub-word recognizers for a relatively small word vocabulary can be extended to much larger vocabularies without further training. This is demonstrated for the DARPA Resource Management database where, using sub-word units as input, words are represented distributively over a fixed number of units and classified using a simple network. Initially, the architecture is trained on 147 words, achieving an accuracy of 91.2%. Then, leaving the recognizer unchanged, it is shown how additional output units can be added to the network to increase the vocabulary to the complete set of 975 phonetically distinct words. On this extended vocabulary the performance dropped to 66%, but this drop is smaller than would be expected from the increase in perplexity. Further improvement could be achieved by improving performance on the original data set.

01 Mar 1992
TL;DR: The use of the 'thaumazo' expression in papyri letters has been investigated in this article, and it is shown how Paul's use of the expression concurs with that found in the papyri.
Abstract: The results of a previous study on the 'thaumazo' expression in a number of papyri letters are summarised, and it is shown how Paul's use of the expression concurs with them. By means of the expression 'thaumazo', Paul expresses his perplexity about the conduct of the Galatians. The use of this expression results in a number of emotive implications regarding writer and recipients that have a direct bearing on the function it performs in the letter. Among other things, it implies a severe rebuke of the recipients, with a view, however, to challenging them to return to the only message that holds good news for them. Since it also implies that returning to the one true gospel of salvation by faith is the only way open to them as people called by God, it is possible that this emotional transition to the arguments of the letter may already have won the day at this stage, at least in the case of some recipients.