scispace - formally typeset
Search or ask a question
Author

Steve Young

Bio: Steve Young is an academic researcher from University of Cambridge. The author has contributed to research in topics: Hidden Markov model & Reinforcement learning. The author has an hindex of 83, co-authored 317 publications receiving 27056 citations. Previous affiliations of Steve Young include University of Manchester & Carnegie Mellon University.


Papers
More filters
16 Sep 1995
TL;DR: The Fundamentals of HTK: General Principles of HMMs, Recognition and Viterbi Decoding, and Continuous Speech Recognition.
Abstract: 1 The Fundamentals of HTK 2 1.1 General Principles of HMMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2 Isolated Word Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Output Probability Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4 Baum-Welch Re-Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.5 Recognition and Viterbi Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.6 Continuous Speech Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.7 Speaker Adaptation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2,095 citations

Journal ArticleDOI
TL;DR: This paper cast a spoken dialog system as a partially observable Markov decision process (POMDP) and shows how this formulation unifies and extends existing techniques to form a single principled framework.
Abstract: In a spoken dialog system, determining which action a machine should take in a given situation is a difficult problem because automatic speech recognition is unreliable and hence the state of the conversation can never be known with certainty. Much of the research in spoken dialog systems centres on mitigating this uncertainty and recent work has focussed on three largely disparate techniques: parallel dialog state hypotheses, local use of confidence scores, and automated planning. While in isolation each of these approaches can improve action selection, taken together they currently lack a unified statistical framework that admits global optimization. In this paper we cast a spoken dialog system as a partially observable Markov decision process (POMDP). We show how this formulation unifies and extends existing techniques to form a single principled framework. A number of illustrations are used to show qualitatively the potential benefits of POMDPs compared to existing techniques, and empirical results from dialog simulations are presented which demonstrate significant quantitative gains. Finally, some of the key challenges to advancing this method - in particular scalability - are briefly outlined.

972 citations

Journal ArticleDOI
09 Jan 2013
TL;DR: This review article provides an overview of the current state of the art in the development of POMDP-based spoken dialog systems.
Abstract: Statistical dialog systems (SDSs) are motivated by the need for a data-driven framework that reduces the cost of laboriously handcrafting complex dialog managers and that provides robustness against the errors created by speech recognizers operating in noisy environments. By including an explicit Bayesian model of uncertainty and by optimizing the policy via a reward-driven process, partially observable Markov decision processes (POMDPs) provide such a framework. However, exact model representation and optimization is computationally intractable. Hence, the practical application of POMDP-based systems requires efficient algorithms and carefully constructed approximations. This review article provides an overview of the current state of the art in the development of POMDP-based spoken dialog systems.

930 citations

Proceedings ArticleDOI
01 Jan 2017
TL;DR: The authors introduced a neural network-based text-in, text-out end-to-end trainable goal-oriented dialogue system along with a new way of collecting dialogue data based on a novel pipe-lined Wizard-of-Oz framework.
Abstract: Teaching machines to accomplish tasks by conversing naturally with humans is challenging. Currently, developing task-oriented dialogue systems requires creating multiple components and typically this involves either a large amount of handcrafting, or acquiring costly labelled datasets to solve a statistical learning problem for each component. In this work we introduce a neural network-based text-in, text-out end-to-end trainable goal-oriented dialogue system along with a new way of collecting dialogue data based on a novel pipe-lined Wizard-of-Oz framework. This approach allows us to develop dialogue systems easily and without making too many assumptions about the task at hand. The results show that the model can converse with human subjects naturally whilst helping them to accomplish tasks in a restaurant search domain.

796 citations


Cited by
More filters
Christopher M. Bishop1
01 Jan 2006
TL;DR: Probability distributions of linear models for regression and classification are given in this article, along with a discussion of combining models and combining models in the context of machine learning and classification.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations

Journal ArticleDOI
TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.
Abstract: Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

9,091 citations

Journal ArticleDOI
TL;DR: It is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution.
Abstract: In the first part of this paper, a regular recurrent neural network (RNN) is extended to a bidirectional recurrent neural network (BRNN). The BRNN can be trained without the limitation of using input information just up to a preset future frame. This is accomplished by training it simultaneously in positive and negative time direction. Structure and training procedure of the proposed network are explained. In regression and classification experiments on artificial data, the proposed structure gives better results than other approaches. For real data, classification experiments for phonemes from the TIMIT database show the same tendency. In the second part of this paper, it is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution. For this part, experiments on real data are reported.

7,290 citations

01 Jan 2009

7,241 citations

Journal ArticleDOI
TL;DR: In this paper, the authors provide an up-to-date critical survey of still-and video-based face recognition research, and provide some insights into the studies of machine recognition of faces.
Abstract: As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. For example, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains a largely unsolved problem. In other words, current systems are still far away from the capability of the human perception system.This paper provides an up-to-date critical survey of still- and video-based face recognition research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of machine recognition of faces. To provide a comprehensive survey, we not only categorize existing recognition techniques but also present detailed descriptions of representative methods within each category. In addition, relevant topics such as psychophysical studies, system evaluation, and issues of illumination and pose variation are covered.

6,384 citations