Author

Biing-Hwang Juang

Bio: Biing-Hwang Juang is an academic researcher from Georgia Institute of Technology. The author has contributed to research in topics including the Bayes error rate and hidden Markov models. The author has an h-index of 3 and has co-authored 8 publications receiving 8,384 citations.

Papers
Book
01 Jan 1993
TL;DR: This book presents the fundamentals of speech recognition, from speech-signal analysis and pattern comparison techniques through hidden Markov models to the design of connected-word and large-vocabulary continuous recognition systems.
Abstract: 1. Fundamentals of Speech Recognition. 2. The Speech Signal: Production, Perception, and Acoustic-Phonetic Characterization. 3. Signal Processing and Analysis Methods for Speech Recognition. 4. Pattern Comparison Techniques. 5. Speech Recognition System Design and Implementation Issues. 6. Theory and Implementation of Hidden Markov Models. 7. Speech Recognition Based on Connected Word Models. 8. Large Vocabulary Continuous Speech Recognition. 9. Task-Oriented Applications of Automatic Speech Recognition.

8,442 citations
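
Chapter 6 of the book above covers the theory and implementation of hidden Markov models, the framework at the heart of the recognizers it describes. As a minimal, illustrative sketch (not code from the book), the snippet below implements the standard forward algorithm for scoring an observation sequence under a discrete HMM; the names pi, A, and B follow common textbook notation.

```python
import numpy as np

def forward_likelihood(pi, A, B, obs):
    """Forward algorithm for a discrete HMM.

    pi  : (N,)   initial state probabilities
    A   : (N, N) transition probabilities, A[i, j] = P(state j | state i)
    B   : (N, M) emission probabilities, B[i, k] = P(symbol k | state i)
    obs : sequence of observed symbol indices
    Returns P(obs | model).
    """
    alpha = pi * B[:, obs[0]]          # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # induction
    return alpha.sum()                 # termination

# Toy example: 2-state HMM over 3 output symbols.
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])
print(forward_likelihood(pi, A, B, [0, 1, 2]))
```

Replacing the sum with a max over the same recursion yields the Viterbi decoder used for recognition.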

Book Chapter
01 Jan 2008
TL;DR: The goal of this section is to document the history of research in speech recognition and natural language understanding, and to point out areas where great progress has been made, along with the challenges that remain to be solved in the future.
Abstract: The quest for a machine that can recognize and understand speech, from any speaker, and in any environment has been the holy grail of speech recognition research for more than 70 years. Although we have made great progress in understanding how speech is produced and analyzed, and although we have made enough advances to build and deploy in the field a number of viable speech recognition systems, we still remain far from the ultimate goal of a machine that communicates naturally with any human being. It is the goal of this section to document the history of research in speech recognition and natural language understanding, and to point out areas where great progress has been made, along with the challenges that remain to be solved in the future.

15 citations

Proceedings Article
12 Dec 2010
TL;DR: A new loss function is introduced for Minimum Classification Error training that approaches the optimal Bayes risk and also improves performance over standard MCE systems when evaluated on the Aurora connected digits database.
Abstract: A new loss function has been introduced for Minimum Classification Error training; it approaches the optimal Bayes risk and also gives an improvement in performance over standard MCE systems when evaluated on the Aurora connected digits database.

12 citations
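
The paper's new loss function is not spelled out in the abstract, so the sketch below shows only the conventional MCE loss it is measured against: a sigmoid-smoothed misclassification measure over class discriminant scores. The parameter names (eta, gamma, theta) follow common MCE usage and are not taken from this paper.

```python
import numpy as np

def mce_loss(scores, label, eta=2.0, gamma=1.0, theta=0.0):
    """Conventional MCE smoothed error count for one training sample.

    scores : (K,) discriminant scores g_j(x), one per class
    label  : index of the correct class
    """
    g_true = scores[label]
    others = np.delete(scores, label)
    # Soft maximum of the competing scores (large eta -> best competitor).
    competitor = np.log(np.mean(np.exp(eta * others))) / eta
    d = -g_true + competitor                          # misclassification measure
    return 1.0 / (1.0 + np.exp(-gamma * d + theta))   # sigmoid smoothing
```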

Proceedings Article
11 Dec 2008
TL;DR: Viable techniques are proposed for designing the non-uniform error cost function in automatic speech recognition (ASR) according to different training scenarios.
Abstract: The classical Bayes decision theory [3] is the foundation of statistical pattern recognition. In [4], we have addressed the issue of non-uniform error criteria in statistical pattern recognition, and generalized the Bayes decision theory for pattern recognition tasks where errors over different classes have varying degrees of significance. We further introduced the weighted minimum classification error (MCE) method for a practical design of a statistical pattern recognition system to achieve empirical optimality when non-uniform error criteria are prescribed. However, one key issue in the weighted MCE method, the methodology of building a suitable non-uniform error cost function given the user's requirements, has not been addressed yet. In this paper, we propose some viable techniques for the design of the non-uniform error cost function in the context of automatic speech recognition (ASR) according to different training scenarios. The experimental results on the TIDIGITS database [8] are presented to demonstrate the effectiveness of our methodologies.

3 citations
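
To make the idea of a non-uniform error cost concrete, here is a hypothetical sketch of a cost-weighted MCE-style loss in which a user-prescribed cost matrix scales the smoothed error count; the design techniques actually proposed in the paper are not reproduced here, and the cost-matrix layout is an assumption.

```python
import numpy as np

def weighted_mce_loss(scores, label, cost, gamma=1.0):
    """Cost-weighted MCE-style loss (illustrative only).

    scores : (K,) discriminant scores, one per class
    label  : index of the correct class
    cost   : (K, K) costs; cost[i, j] is the cost of deciding class j
             when the truth is class i (hypothetical layout)
    """
    g_true = scores[label]
    reduced = np.argmax(np.delete(scores, label))     # best competitor
    rival = reduced + 1 if reduced >= label else reduced
    d = -g_true + scores[rival]                       # misclassification measure
    smooth_error = 1.0 / (1.0 + np.exp(-gamma * d))   # sigmoid smoothing
    return cost[label, rival] * smooth_error          # non-uniform weighting
```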

Proceedings Article
12 Dec 2010
TL;DR: A Minimum Classification Error (MCE) based recognition system that also estimates a global feature transformation matrix has been implemented; it makes the explicit assumption that the covariance matrices of the Gaussian mixtures are diagonal when estimating the transformation matrix.
Abstract: A Minimum Classification Error (MCE) based recognition system that also estimates a global feature transformation matrix has been implemented. Unlike earlier studies, we make the explicit assumption that the covariance matrix of the Gaussian mixtures is diagonal when estimating the transformation matrix. This is necessary for mathematical consistency between the model and the transformation matrix estimates. Experimental results show a reduction of up to 50% in the word error rate as compared to Maximum Likelihood estimation.

2 citations
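
The diagonal-covariance assumption above can be illustrated with a small sketch: scoring a feature vector under a diagonal-covariance Gaussian after a global linear transform y = Wx, including the Jacobian term that keeps likelihoods comparable across candidate transforms. This is a generic formulation, not the paper's implementation.

```python
import numpy as np

def transformed_diag_gauss_loglik(x, W, mean, var):
    """Log-likelihood of x under a diagonal-covariance Gaussian in the
    transformed space y = W x (names are illustrative).

    The diagonal assumption keeps the per-dimension terms separable,
    which ties the model estimates to the transform estimate consistently.
    """
    y = W @ x
    ll = -0.5 * np.sum(np.log(2 * np.pi * var) + (y - mean) ** 2 / var)
    # Jacobian term, assuming W is square and invertible.
    return ll + np.log(abs(np.linalg.det(W)))
```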


Cited by
Christopher M. Bishop
01 Jan 2006
TL;DR: This textbook covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, mixture models and EM, approximate inference, sampling methods, and sequential data, concluding with a discussion of combining models.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

10,141 citations
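
As one concrete example from the book's "Mixture Models and EM" chapter, the sketch below runs EM for a one-dimensional Gaussian mixture. It is a minimal illustration, not code from the book.

```python
import numpy as np

def gmm_em_1d(x, k, iters=100, seed=0):
    """EM for a 1-D Gaussian mixture with k components."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, k)                 # initialize means from the data
    var = np.full(k, x.var())
    w = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibilities r[n, j] = P(component j | x_n)
        dens = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
                 / np.sqrt(2 * np.pi * var)
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return w, mu, var
```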

Journal Article
TL;DR: The Condensation algorithm uses “factored sampling”, previously applied to the interpretation of static images, in which the probability distribution of possible interpretations is represented by a randomly generated set.
Abstract: The problem of tracking curves in dense visual clutter is challenging. Kalman filtering is inadequate because it is based on Gaussian densities which, being unimodal, cannot represent simultaneous alternative hypotheses. The Condensation algorithm uses “factored sampling”, previously applied to the interpretation of static images, in which the probability distribution of possible interpretations is represented by a randomly generated set. Condensation uses learned dynamical models, together with visual observations, to propagate the random set over time. The result is highly robust tracking of agile motion. Notwithstanding the use of stochastic methods, the algorithm runs in near real-time.

5,804 citations
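
The algorithm's core loop is compact: resample in proportion to the weights (factored sampling), propagate the samples through the dynamical model, and reweight by the observation likelihood. In the sketch below, `dynamics` and `likelihood` are placeholders for the learned models the paper describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def condensation_step(particles, weights, dynamics, likelihood):
    """One Condensation (particle filter) iteration.

    particles : (N, d) sample set representing the current posterior
    weights   : (N,)   normalized importance weights
    dynamics  : callable propagating a particle set one time step
    likelihood: callable scoring particles against the new observation
    """
    n = len(particles)
    idx = rng.choice(n, size=n, p=weights)   # factored sampling (resample)
    particles = dynamics(particles[idx])     # predict via learned dynamics
    weights = likelihood(particles)          # reweight by the observation
    return particles, weights / weights.sum()
```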

Book
16 Dec 2008
TL;DR: The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.
Abstract: The formalism of probabilistic graphical models provides a unifying framework for capturing complex dependencies among random variables, and building large-scale multivariate statistical models. Graphical models have become a focus of research in many statistical, computational and mathematical fields, including bioinformatics, communication theory, statistical physics, combinatorial optimization, signal and image processing, information retrieval and statistical machine learning. Many problems that arise in specific instances — including the key problems of computing marginals and modes of probability distributions — are best studied in the general setting. Working with exponential family representations, and exploiting the conjugate duality between the cumulant function and the entropy for exponential families, we develop general variational representations of the problems of computing likelihoods, marginal probabilities and most probable configurations. We describe how a wide variety of algorithms — among them sum-product, cluster variational methods, expectation-propagation, mean field methods, max-product and linear programming relaxation, as well as conic programming relaxations — can all be understood in terms of exact or approximate forms of these variational representations. The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.

4,335 citations
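
Among the algorithms the book unifies, naive mean field is the simplest to sketch. The snippet below runs coordinate-ascent mean-field updates for an Ising model with a symmetric, zero-diagonal coupling matrix J and external field h; it is a generic textbook instance, not an excerpt from the book.

```python
import numpy as np

def mean_field_ising(J, h, iters=50):
    """Naive mean-field approximation for an Ising model.

    Returns approximate means m_i = E[x_i] for spins x_i in {-1, +1},
    assuming J is symmetric with a zero diagonal.
    """
    m = np.zeros(len(h))
    for _ in range(iters):
        for i in range(len(h)):
            # Coordinate ascent on the mean-field variational objective.
            m[i] = np.tanh(h[i] + J[i] @ m)
    return m
```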

Posted Content
TL;DR: This paper proposes WaveNet, a deep neural network for generating raw audio waveforms; the model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones.
Abstract: This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-of-the-art performance, with human listeners rating it as significantly more natural sounding than the best parametric and concatenative systems for both English and Mandarin. A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on the speaker identity. When trained to model music, we find that it generates novel and often highly realistic musical fragments. We also show that it can be employed as a discriminative model, returning promising results for phoneme recognition.

4,002 citations
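
WaveNet's autoregressive structure rests on stacks of dilated causal convolutions, so each output depends only on past samples while the receptive field grows exponentially with depth. The sketch below shows a single such convolution in plain NumPy; it is illustrative only and omits the gated activations and residual connections of the full model.

```python
import numpy as np

def causal_dilated_conv(x, w, dilation):
    """1-D causal convolution with dilation.

    x : (T,) input waveform
    w : (K,) filter taps; w[j] weights the sample j * dilation steps back
    """
    k = len(w)
    pad = (k - 1) * dilation                 # left-pad: no future samples leak in
    xp = np.concatenate([np.zeros(pad), x])
    out = np.zeros(len(x))
    for t in range(len(x)):
        taps = xp[pad + t - dilation * np.arange(k)]
        out[t] = taps @ w
    return out
```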

Journal Article
TL;DR: In this article, the authors categorize and evaluate face detection algorithms and discuss relevant issues such as data collection, evaluation metrics and benchmarking, and conclude with several promising directions for future research.
Abstract: Images containing faces are essential to intelligent vision-based human-computer interaction, and research efforts in face processing include face recognition, face tracking, pose estimation and expression recognition. However, many reported methods assume that the faces in an image or an image sequence have been identified and localized. To build fully automated systems that analyze the information contained in face images, robust and efficient face detection algorithms are required. Given a single image, the goal of face detection is to identify all image regions which contain a face, regardless of its 3D position, orientation and lighting conditions. Such a problem is challenging because faces are non-rigid and have a high degree of variability in size, shape, color and texture. Numerous techniques have been developed to detect faces in a single image, and the purpose of this paper is to categorize and evaluate these algorithms. We also discuss relevant issues such as data collection, evaluation metrics and benchmarking. After analyzing these algorithms and identifying their limitations, we conclude with several promising directions for future research.

3,894 citations