Book Chapter

Historical Perspective of the Field of ASR/NLU

Abstract
The quest for a machine that can recognize and understand speech, from any speaker, and in any environment has been the holy grail of speech recognition research for more than 70 years. Although we have made great progress in understanding how speech is produced and analyzed, and although we have made enough advances to build and deploy in the field a number of viable speech recognition systems, we still remain far from the ultimate goal of a machine that communicates naturally with any human being. It is the goal of this section to document the history of research in speech recognition and natural language understanding, and to point out areas where great progress has been made, along with the challenges that remain to be solved in the future.

Citations
Journal Article

Far-Field Automatic Speech Recognition

TL;DR: This tutorial article gives an account of the algorithms used to enable accurate speech recognition from a distance, and shows that a clever combination of deep learning with traditional signal processing can lead to surprisingly effective solutions.
Posted Content

Far-Field Automatic Speech Recognition

TL;DR: This article covers far-field automatic speech recognition (ASR), the recognition of speech recorded at a distance from the microphones, including an end-to-end approach. The field has received significantly more attention in science and industry, which caused or was caused by an equally significant improvement in recognition accuracy.
Journal Article

Word Play: A History of Voice Interaction in Digital Games

TL;DR: Voice interaction in digital games has a long and varied history of experimentation but has never achieved sustained, widespread success; this article reviews that history.
Journal Article

Kernel power flow orientation coefficients for noise-robust speech recognition

TL;DR: KPOCs are a novel feature set based on spectro-temporal analysis that uses a bank of 2D kernels to extract the dominant orientation of the power flow at each point in the auditory spectrogram of the speech signal, which is innately resistant to the spectral masking introduced by the presence of noise and reverberation.
References
Journal Article

A tutorial on hidden Markov models and selected applications in speech recognition

TL;DR: In this paper, the authors provide an overview of the basic theory of hidden Markov models (HMMs) as originated by L.E. Baum and T. Petrie (1966) and give practical details on methods of implementation of the theory along with a description of selected applications of HMMs to distinct problems in speech recognition.
Journal Article

A logical calculus of the ideas immanent in nervous activity

TL;DR: In this article, it is shown that many particular choices among possible neurophysiological assumptions are equivalent, in the sense that for every net behaving under one assumption, there exists another net which behaves under another and gives the same results, although perhaps not in the same time.
Journal Article

An Algorithm for Vector Quantizer Design

TL;DR: An efficient and intuitive algorithm is presented for the design of vector quantizers based either on a known probabilistic model or on a long training sequence of data.
Book

Vector Quantization and Signal Compression

TL;DR: This book covers the design and implementation of quantizers for signal compression, including the Levinson-Durbin algorithm.
Journal Article

Dynamic programming algorithm optimization for spoken word recognition

TL;DR: This paper reports on an optimum dynamic programming (DP) based time-normalization algorithm for spoken word recognition, in which the warping function slope is restricted so as to improve discrimination between words in different categories.