Journal ArticleDOI

How may I help you?

01 Oct 1997 · Speech Communication (Elsevier Science Publishers B.V.) · Vol. 23, Iss. 1, pp. 113-127
TL;DR: This paper focuses on the task of automatically routing telephone calls based on a user's fluently spoken response to the open-ended prompt of “How may I help you?”.
About: This article was published in Speech Communication on 1997-10-01 and has received 664 citations to date. It focuses on the topics: Spoken dialog systems & Natural language.
Citations
Book
01 Jan 2000
TL;DR: This book takes an empirical approach to language processing, applying statistical and other machine-learning algorithms to large corpora, and demonstrates how the same algorithms can be used for tasks ranging from speech recognition to word-sense disambiguation.
Abstract: From the Publisher: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora. Methodology boxes are included in each chapter, and each chapter is built around one or more worked examples that demonstrate its main idea. It covers the fundamental algorithms of various fields, whether originally proposed for spoken or written language, to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation. Emphasis is placed on the web and other practical applications, and on scientific evaluation. Useful as a reference for professionals in any of the areas of speech and language processing.

3,794 citations


Cites background from "How may I help you"

  • ...(University of Colorado, Boulder) Upper Saddle River, NJ: Prentice Hall (Prentice Hall series in artificial intelligence, edited by Stuart Russell and Peter Norvig), 2000, xxvi+934 pp; hardbound, ISBN 0-13-095069-6, $64.00 Reviewed by Virginia Teller Hunter College and the Graduate Center, The City University of New York...

    [...]

Journal ArticleDOI
TL;DR: In this article, a new and improved family of boosting algorithms is proposed for text categorization tasks and implemented in a system called BoosTexter, which learns from examples to perform multiclass text and speech categorization.
Abstract: This work focuses on algorithms which learn from examples to perform multiclass text and speech categorization tasks. Our approach is based on a new and improved family of boosting algorithms. We describe in detail an implementation, called BoosTexter, of the new boosting algorithms for text categorization tasks. We present results comparing the performance of BoosTexter and a number of other text-categorization algorithms on a variety of tasks. We conclude by describing the application of our system to automatic call-type identification from unconstrained spoken customer responses.
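
The abstract above describes boosting simple weak rules into a multiclass text and call-type classifier. As a rough illustration of that idea only, here is a minimal sketch using scikit-learn's AdaBoost (whose default weak learner is a depth-1 decision stump) over binary bag-of-words features as a stand-in for BoosTexter; the utterances, call types, and parameters are illustrative assumptions, not data or code from the paper.

```python
# Sketch: boosting-based call-type classification in the spirit of BoosTexter.
# AdaBoost over binary bag-of-words features stands in for the original
# algorithm; the utterances and call types below are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import make_pipeline

utterances = [
    "I want to reverse the charges on this call",
    "I need to make a collect call please",
    "can you give me the area code for morristown",
    "how much does it cost to call london",
]
call_types = ["collect", "collect", "area_code", "rate"]

# Each boosting round adds a simple word-presence rule; the weighted vote of
# all rounds yields the multiclass decision.
model = make_pipeline(
    CountVectorizer(binary=True),
    AdaBoostClassifier(n_estimators=50, random_state=0),
)
model.fit(utterances, call_types)
print(model.predict(["please make this a collect call"]))
```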

2,108 citations

Journal ArticleDOI
TL;DR: This paper casts a spoken dialog system as a partially observable Markov decision process (POMDP) and shows how this formulation unifies and extends existing techniques to form a single principled framework.
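
For readers unfamiliar with the POMDP formulation, the central computation is the Bayesian update of a belief over hidden user goals after each system action and noisy observation. The sketch below illustrates that update with toy transition and observation models; the state names and probabilities are assumptions made for illustration, not from the cited paper.

```python
# Sketch: the belief-state update at the core of the POMDP view of spoken dialog.
# After taking an action and receiving a (noisy) observation o, the belief over
# the hidden user goal is updated as  b'(s') ∝ O(o | s') * sum_s T(s' | s) * b(s).
# States, transition and observation models below are toy values.
import numpy as np

states = ["wants_weather", "wants_traffic"]        # hidden user goals
b = np.array([0.5, 0.5])                           # uniform prior belief

# T[s, s']: goal persistence (users rarely switch goals mid-dialogue here)
T = np.array([[0.9, 0.1],
              [0.1, 0.9]])

# O[s', o]: probability of each ASR hypothesis given the true goal
#           observations: 0 = "weather"-like hypothesis, 1 = "traffic"-like
O = np.array([[0.8, 0.2],
              [0.3, 0.7]])

def belief_update(b, T, O, o):
    """One Bayesian belief update for a fixed system action."""
    predicted = T.T @ b                  # sum_s T(s'|s) b(s)
    unnormalized = O[:, o] * predicted   # weight by observation likelihood
    return unnormalized / unnormalized.sum()

b = belief_update(b, T, O, o=0)  # ASR suggests the user said something weather-like
print(dict(zip(states, np.round(b, 3))))
```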

972 citations

Posted Content
TL;DR: A neural network-based generative architecture with latent stochastic variables that span a variable number of time steps is proposed; experiments demonstrate that it improves upon recently proposed models and that the latent variables facilitate the generation of long outputs and maintain the context.
Abstract: Sequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as found between the utterances in a dialogue. In an effort to model this kind of generative process, we propose a neural network-based generative architecture, with latent stochastic variables that span a variable number of time steps. We apply the proposed model to the task of dialogue response generation and compare it with recent neural network architectures. We evaluate the model performance through automatic evaluation metrics and by carrying out a human evaluation. The experiments demonstrate that our model improves upon recently proposed models and that the latent variables facilitate the generation of long outputs and maintain the context.
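
To make the "latent stochastic variables that span a variable number of time steps" concrete, here is a minimal sketch of a per-utterance latent variable with reparameterized sampling, conditioned on a dialogue-context vector. The module names, dimensions, and use of simple linear layers are illustrative assumptions; the full hierarchical encoder-decoder and KL training term are omitted, so this is not the paper's exact model.

```python
# Sketch: a per-utterance latent stochastic variable with reparameterized
# sampling, conditioned on a dialogue-context vector. Encoders, decoder and the
# KL objective are omitted; names and dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class UtteranceLatent(nn.Module):
    def __init__(self, ctx_dim=128, utt_dim=128, z_dim=32):
        super().__init__()
        self.prior_net = nn.Linear(ctx_dim, 2 * z_dim)                 # p(z | context)
        self.posterior_net = nn.Linear(ctx_dim + utt_dim, 2 * z_dim)   # q(z | context, next utterance)

    @staticmethod
    def _sample(params):
        mu, logvar = params.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp()

    def forward(self, context, next_utt_enc=None):
        # Training: the posterior also sees an encoding of the utterance to be
        # generated. Generation: only the prior is available, and the sampled z
        # then conditions the decoder for every time step of the next utterance.
        if next_utt_enc is not None:
            return self._sample(self.posterior_net(torch.cat([context, next_utt_enc], dim=-1)))
        return self._sample(self.prior_net(context))

latent = UtteranceLatent()
context = torch.randn(4, 128)   # batch of dialogue-context vectors
z = latent(context)             # one latent draw per next utterance
print(z.shape)                  # torch.Size([4, 32])
```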

853 citations

Journal ArticleDOI
TL;DR: The purpose of this paper is to describe the development effort of JUPITER in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting.
Abstract: In early 1997, our group initiated a project to develop JUPITER, a conversational interface that allows users to obtain worldwide weather forecast information over the telephone using spoken dialogue. It has served as the primary research platform for our group on many issues related to human language technology, including telephone-based speech recognition, robust language understanding, language generation, dialogue modeling, and multilingual interfaces. Over a two-year period since coming online in May 1997, JUPITER has received, via a toll-free number in North America, over 30,000 calls (totaling over 180,000 utterances), mostly from naive users. The purpose of this paper is to describe our development effort in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting. We also present some evaluation results on the system and its components.

697 citations

References
Book
01 Jan 1991
TL;DR: The authors examine the role of entropy, information inequalities, and randomness in the design and construction of codes.
Abstract: Preface to the Second Edition. Preface to the First Edition. Acknowledgments for the Second Edition. Acknowledgments for the First Edition. 1. Introduction and Preview. 1.1 Preview of the Book. 2. Entropy, Relative Entropy, and Mutual Information. 2.1 Entropy. 2.2 Joint Entropy and Conditional Entropy. 2.3 Relative Entropy and Mutual Information. 2.4 Relationship Between Entropy and Mutual Information. 2.5 Chain Rules for Entropy, Relative Entropy, and Mutual Information. 2.6 Jensen's Inequality and Its Consequences. 2.7 Log Sum Inequality and Its Applications. 2.8 Data-Processing Inequality. 2.9 Sufficient Statistics. 2.10 Fano's Inequality. Summary. Problems. Historical Notes. 3. Asymptotic Equipartition Property. 3.1 Asymptotic Equipartition Property Theorem. 3.2 Consequences of the AEP: Data Compression. 3.3 High-Probability Sets and the Typical Set. Summary. Problems. Historical Notes. 4. Entropy Rates of a Stochastic Process. 4.1 Markov Chains. 4.2 Entropy Rate. 4.3 Example: Entropy Rate of a Random Walk on a Weighted Graph. 4.4 Second Law of Thermodynamics. 4.5 Functions of Markov Chains. Summary. Problems. Historical Notes. 5. Data Compression. 5.1 Examples of Codes. 5.2 Kraft Inequality. 5.3 Optimal Codes. 5.4 Bounds on the Optimal Code Length. 5.5 Kraft Inequality for Uniquely Decodable Codes. 5.6 Huffman Codes. 5.7 Some Comments on Huffman Codes. 5.8 Optimality of Huffman Codes. 5.9 Shannon-Fano-Elias Coding. 5.10 Competitive Optimality of the Shannon Code. 5.11 Generation of Discrete Distributions from Fair Coins. Summary. Problems. Historical Notes. 6. Gambling and Data Compression. 6.1 The Horse Race. 6.2 Gambling and Side Information. 6.3 Dependent Horse Races and Entropy Rate. 6.4 The Entropy of English. 6.5 Data Compression and Gambling. 6.6 Gambling Estimate of the Entropy of English. Summary. Problems. Historical Notes. 7. Channel Capacity. 7.1 Examples of Channel Capacity. 7.2 Symmetric Channels. 7.3 Properties of Channel Capacity. 7.4 Preview of the Channel Coding Theorem. 7.5 Definitions. 7.6 Jointly Typical Sequences. 7.7 Channel Coding Theorem. 7.8 Zero-Error Codes. 7.9 Fano's Inequality and the Converse to the Coding Theorem. 7.10 Equality in the Converse to the Channel Coding Theorem. 7.11 Hamming Codes. 7.12 Feedback Capacity. 7.13 Source-Channel Separation Theorem. Summary. Problems. Historical Notes. 8. Differential Entropy. 8.1 Definitions. 8.2 AEP for Continuous Random Variables. 8.3 Relation of Differential Entropy to Discrete Entropy. 8.4 Joint and Conditional Differential Entropy. 8.5 Relative Entropy and Mutual Information. 8.6 Properties of Differential Entropy, Relative Entropy, and Mutual Information. Summary. Problems. Historical Notes. 9. Gaussian Channel. 9.1 Gaussian Channel: Definitions. 9.2 Converse to the Coding Theorem for Gaussian Channels. 9.3 Bandlimited Channels. 9.4 Parallel Gaussian Channels. 9.5 Channels with Colored Gaussian Noise. 9.6 Gaussian Channels with Feedback. Summary. Problems. Historical Notes. 10. Rate Distortion Theory. 10.1 Quantization. 10.2 Definitions. 10.3 Calculation of the Rate Distortion Function. 10.4 Converse to the Rate Distortion Theorem. 10.5 Achievability of the Rate Distortion Function. 10.6 Strongly Typical Sequences and Rate Distortion. 10.7 Characterization of the Rate Distortion Function. 10.8 Computation of Channel Capacity and the Rate Distortion Function. Summary. Problems. Historical Notes. 11. Information Theory and Statistics. 11.1 Method of Types. 11.2 Law of Large Numbers. 
11.3 Universal Source Coding. 11.4 Large Deviation Theory. 11.5 Examples of Sanov's Theorem. 11.6 Conditional Limit Theorem. 11.7 Hypothesis Testing. 11.8 Chernoff-Stein Lemma. 11.9 Chernoff Information. 11.10 Fisher Information and the Cramér-Rao Inequality. Summary. Problems. Historical Notes. 12. Maximum Entropy. 12.1 Maximum Entropy Distributions. 12.2 Examples. 12.3 Anomalous Maximum Entropy Problem. 12.4 Spectrum Estimation. 12.5 Entropy Rates of a Gaussian Process. 12.6 Burg's Maximum Entropy Theorem. Summary. Problems. Historical Notes. 13. Universal Source Coding. 13.1 Universal Codes and Channel Capacity. 13.2 Universal Coding for Binary Sequences. 13.3 Arithmetic Coding. 13.4 Lempel-Ziv Coding. 13.5 Optimality of Lempel-Ziv Algorithms. Summary. Problems. Historical Notes. 14. Kolmogorov Complexity. 14.1 Models of Computation. 14.2 Kolmogorov Complexity: Definitions and Examples. 14.3 Kolmogorov Complexity and Entropy. 14.4 Kolmogorov Complexity of Integers. 14.5 Algorithmically Random and Incompressible Sequences. 14.6 Universal Probability. 14.7 Kolmogorov Complexity. 14.9 Universal Gambling. 14.10 Occam's Razor. 14.11 Kolmogorov Complexity and Universal Probability. 14.12 Kolmogorov Sufficient Statistic. 14.13 Minimum Description Length Principle. Summary. Problems. Historical Notes. 15. Network Information Theory. 15.1 Gaussian Multiple-User Channels. 15.2 Jointly Typical Sequences. 15.3 Multiple-Access Channel. 15.4 Encoding of Correlated Sources. 15.5 Duality Between Slepian-Wolf Encoding and Multiple-Access Channels. 15.6 Broadcast Channel. 15.7 Relay Channel. 15.8 Source Coding with Side Information. 15.9 Rate Distortion with Side Information. 15.10 General Multiterminal Networks. Summary. Problems. Historical Notes. 16. Information Theory and Portfolio Theory. 16.1 The Stock Market: Some Definitions. 16.2 Kuhn-Tucker Characterization of the Log-Optimal Portfolio. 16.3 Asymptotic Optimality of the Log-Optimal Portfolio. 16.4 Side Information and the Growth Rate. 16.5 Investment in Stationary Markets. 16.6 Competitive Optimality of the Log-Optimal Portfolio. 16.7 Universal Portfolios. 16.8 Shannon-McMillan-Breiman Theorem (General AEP). Summary. Problems. Historical Notes. 17. Inequalities in Information Theory. 17.1 Basic Inequalities of Information Theory. 17.2 Differential Entropy. 17.3 Bounds on Entropy and Relative Entropy. 17.4 Inequalities for Types. 17.5 Combinatorial Bounds on Entropy. 17.6 Entropy Rates of Subsets. 17.7 Entropy and Fisher Information. 17.8 Entropy Power Inequality and Brunn-Minkowski Inequality. 17.9 Inequalities for Determinants. 17.10 Inequalities for Ratios of Determinants. Summary. Problems. Historical Notes. Bibliography. List of Symbols. Index.

45,034 citations


"How may I help you" refers methods in this paper

  • ...We remark that this is an approximation to the mutual information of the full n-tuple (Cover and Thomas, 1991)....

    [...]
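
The excerpt above notes that the paper's measure approximates the mutual information between a phrase fragment and the requested action (Cover and Thomas, 1991). As a concrete illustration, the sketch below estimates that mutual information from fragment/call-type co-occurrence counts; the counts are made up for the example and are not the paper's data.

```python
# Sketch: estimating the mutual information I(F; C) between the presence of a
# phrase fragment F and the requested call type C from co-occurrence counts,
# in the sense of Cover and Thomas (1991). Counts below are illustrative.
import math
from collections import Counter

# (fragment_present, call_type) observations for the fragment "collect call"
observations = [(1, "collect")] * 40 + [(0, "collect")] * 10 + \
               [(1, "rate")] * 2 + [(0, "rate")] * 48

def mutual_information(pairs):
    n = len(pairs)
    joint = Counter(pairs)
    pf = Counter(f for f, _ in pairs)   # marginal over fragment presence
    pc = Counter(c for _, c in pairs)   # marginal over call types
    mi = 0.0
    for (f, c), count in joint.items():
        p_fc = count / n
        mi += p_fc * math.log2(p_fc / ((pf[f] / n) * (pc[c] / n)))
    return mi

print(f"I(F; C) = {mutual_information(observations):.3f} bits")
```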

Journal ArticleDOI
TL;DR: The modifications made to a connected word speech recognition algorithm based on hidden Markov models which allow it to recognize words from a predefined vocabulary list spoken in an unconstrained fashion are described.
Abstract: The modifications made to a connected word speech recognition algorithm based on hidden Markov models (HMMs), which allow it to recognize words from a predefined vocabulary list spoken in an unconstrained fashion, are described. The novelty of this approach is that statistical models of both the actual vocabulary word and the extraneous speech and background are created. An HMM-based connected word recognition system is then used to find the best sequence of background, extraneous speech, and vocabulary word models for matching the actual input. Word recognition accuracies of 99.3% on purely isolated speech (i.e., only vocabulary items and background noise present) and 95.1% when the vocabulary word was embedded in unconstrained extraneous speech were obtained for the five-word vocabulary using the proposed recognition algorithm.
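
The abstract describes finding the best sequence of background, extraneous-speech, and vocabulary-word models for the input. The toy sketch below illustrates that decoding step with a two-state HMM (filler vs. keyword) and discrete observations; the topology, probabilities, and symbols are illustrative assumptions, not the cited system's acoustic models.

```python
# Sketch: word spotting as decoding the best path through a composite HMM of
# filler/background and keyword states, so a vocabulary word can be found
# inside unconstrained extraneous speech. Observations are toy discrete symbols.
import numpy as np

# States: 0 = background/filler, 1 = keyword.  Toy 2-state topology.
logA = np.log(np.array([[0.8, 0.2],     # filler -> filler / keyword
                        [0.3, 0.7]]))   # keyword -> filler / keyword
# Emission log-probs over a toy 3-symbol observation alphabet.
logB = np.log(np.array([[0.6, 0.3, 0.1],    # filler favours symbol 0
                        [0.1, 0.2, 0.7]]))  # keyword favours symbol 2
log_pi = np.log(np.array([0.9, 0.1]))

def viterbi(obs):
    delta = log_pi + logB[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = delta[:, None] + logA          # scores[i, j]: best path ending i -> j
        back.append(scores.argmax(axis=0))
        delta = scores.max(axis=0) + logB[:, o]
    path = [int(delta.argmax())]
    for bp in reversed(back):
        path.append(int(bp[path[-1]]))
    return path[::-1]

obs = [0, 0, 2, 2, 2, 0, 1]   # keyword-like symbols embedded in filler-like ones
print(viterbi(obs))           # e.g. [0, 0, 1, 1, 1, 0, 0]
```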

472 citations


Additional excerpts

  • ...Wilpon et al., 1990....

    [...]

Book ChapterDOI
Frederick Jelinek
01 May 1990

422 citations

01 Jan 1996
TL;DR: In this article, a general framework based on weighted finite automata and weighted finite-state transducers for describing and implementing speech recognizers is presented, which allows the information sources and data structures used in recognition, including context-dependent units, pronunciation dictionaries, language models and lattices, to be represented uniformly.
Abstract: We present a general framework based on weighted finite automata and weighted finite-state transducers for describing and implementing speech recognizers. The framework allows us to represent uniformly the information sources and data structures used in recognition, including context-dependent units, pronunciation dictionaries, language models and lattices. Furthermore, general but efficient algorithms can be used for combining information sources in actual recognizers and for optimizing their application. In particular, a single composition algorithm is used both to combine in advance information sources such as language models and dictionaries, and to combine acoustic observations and information sources dynamically during recognition.
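
The composition algorithm mentioned in the abstract is the operation that welds a pronunciation dictionary, language model, and other sources into one transducer. Below is a minimal, epsilon-free sketch of weighted composition over the tropical semiring (weights behave like negative log-probabilities and add along a path); the toy transducers and symbols are assumptions for illustration, not the framework's actual implementation.

```python
# Sketch: epsilon-free composition of two weighted transducers over the
# tropical semiring. A toy stand-in for the single composition algorithm
# described above; the transducers below are illustrative only.
from collections import deque

def compose(t1, t2):
    """Each transducer: (start, finals, arcs) with arcs[state] = [(in, out, w, dst)].
    Returns the composed transducer whose paths map t1-inputs to t2-outputs."""
    start = (t1[0], t2[0])
    arcs, finals = {}, {}
    queue, seen = deque([start]), {start}
    while queue:
        q1, q2 = queue.popleft()
        out_arcs = []
        for i1, o1, w1, d1 in t1[2].get(q1, []):
            for i2, o2, w2, d2 in t2[2].get(q2, []):
                if o1 == i2:                       # match t1 output with t2 input
                    dst = (d1, d2)
                    out_arcs.append((i1, o2, w1 + w2, dst))
                    if dst not in seen:
                        seen.add(dst)
                        queue.append(dst)
        arcs[(q1, q2)] = out_arcs
        if q1 in t1[1] and q2 in t2[1]:
            finals[(q1, q2)] = t1[1][q1] + t2[1][q2]
    return (start, finals, arcs)

# T1: maps phone-like symbols to a word; T2: assigns the word a "language model" cost.
T1 = (0, {2: 0.0}, {0: [("k", "call", 0.2, 1)], 1: [("ao", "<sil>", 0.1, 2)]})
T2 = (0, {2: 0.0}, {0: [("call", "call", 0.7, 1)], 1: [("<sil>", "<sil>", 0.0, 2)]})

start, finals, arcs = compose(T1, T2)
print(arcs[start])   # composed arc(s) leaving the start state
print(finals)        # accepting states and their accumulated weights
```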

224 citations

Book
01 Jan 1994
TL;DR: Neural networks for speech processing, neural networks for identification and control of nonlinear systems, hybrid neural networks and image restoration, and dynamic systems and perception.
Abstract: Neural networks for speech processing. Neural networks in the acquisition of speech by machine. The nervous system - fantasy and reality. Processing of complex stimuli in the mammalian cochlear nucleus. On the possible role of auditory peripheral feedback in the representation of speech sounds. Is there a role for neural networks in speech recognition? The neurophysiology of word reading - a connectionist approach. Status versus stacks - representing grammatical structure in a recurrent neural network. Connections and associations in language acquisition. Some relationships between artificial neural nets and hidden Markov models. A learning neural tree for phoneme classification. Decision feedback learning of neural networks. Visual focus of attention in language acquisition. Integrated segmentation and recognition of handprinted characters. Neural net image analysis for postal applications - from locating address blocks to determining zip codes. Space invariant active vision. Engineering document processing with neural networks. Goal-oriented training of neural networks. Hybrid neural networks and image restoration. Dynamic systems and perception. Deterministic annealing for optimization. Neural networks in vision. A neural chip set for supervised learning and CAM. A discrete Radon transform method for invariant image analysis using artificial neural networks. Recurrent neural networks and sequential machines. Non-literal transfer of information among inductive learners. Neural networks for identification and control of nonlinear systems. Using neural networks to identify DNA sequences.

174 citations


"How may I help you" refers background in this paper

  • ...Following this paradigm, we have constructed several devices which acquire the capability of understanding language via building statistical associations between input stimuli and appropriate machine responses (Gertner and Gorin, 1993; Gorin et al., 1994a; Miller and Gorin, 1993; Sankar and Gorin, 1993; Henis et al., 1994)....

    [...]

  • ...We have explored this paradigm in some detail (Gorin, 1995a), in particular contrasting it with traditional communication theory, where the goal is to reproduce a signal at some distant point....

    [...]

  • ...In previous experiments, salient words have been exploited to learn the mapping from unconstrained input to machine action for a variety of tasks (Gertner and Gorin, 1993; Gorin et al., 1994a,b; Henis et al., 1994; Miller and Gorin, 1993; Sankar and Gorin, 1993)....

    [...]

  • ...We then introduce an additional between-channel criterion, which is that a fragment should have high information content for the call-action channel....

    [...]
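
The excerpts above describe selecting fragments that carry high information content for the call-action channel. One simple way to illustrate that criterion is to rank fragments by how far the posterior over call types given the fragment diverges from the prior; the sketch below does this with made-up counts and is not necessarily the paper's exact salience definition.

```python
# Sketch: ranking candidate fragments by their information content for the
# call-action "channel", measured here as KL( P(call type | fragment) || P(call type) ).
# This is one simple illustrative measure; the training pairs are made up.
import math
from collections import Counter, defaultdict

# (fragments observed in an utterance, call type) training pairs -- toy data.
data = [
    ({"collect", "call"}, "collect"),
    ({"collect", "please"}, "collect"),
    ({"area", "code"}, "area_code"),
    ({"area", "code", "please"}, "area_code"),
    ({"how", "much", "call"}, "rate"),
]

prior = Counter(c for _, c in data)
frag_counts = defaultdict(Counter)
for frags, c in data:
    for f in frags:
        frag_counts[f][c] += 1

def info_content(fragment):
    """KL divergence between the posterior given the fragment and the prior, in bits."""
    post = frag_counts[fragment]
    n_post, n = sum(post.values()), len(data)
    return sum((k / n_post) * math.log2((k / n_post) / (prior[c] / n))
               for c, k in post.items())

# Fragments that strongly indicate one call type ("collect", "code") rank highest.
for f in sorted(frag_counts, key=info_content, reverse=True):
    print(f"{f:8s} {info_content(f):.3f} bits")
```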