Showing papers on "Hidden Markov model published in 1996"

PDF

Open Access

Proceedings Article•DOI•

Unit selection in a concatenative speech synthesis system using a large speech database

[...]

Andrew Hunt, Alan W. Black¹•Institutions (1)

07 May 1996

TL;DR: In this paper, a state transition network is proposed to select and concatenate phonemes from a large speech database to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information.

...read moreread less

Abstract: One approach to the generation of natural-sounding synthesized speech waveforms is to select and concatenate units from a large speech database. Units (in the current work, phonemes) are selected to produce a natural realisation of a target phoneme sequence predicted from text which is annotated with prosodic and phonetic context information. We propose that the units in a synthesis database can be considered as a state transition network in which the state occupancy cost is the distance between a database unit and a target, and the transition cost is an estimate of the quality of concatenation of two consecutive units. This framework has many similarities to HMM-based speech recognition. A pruned Viterbi search is used to select the best units for synthesis from the database. This approach to waveform synthesis permits training from natural speech: two methods for training from speech are presented which provide weights which produce more natural speech than can be obtained by hand-tuning.

...read moreread less

1,207 citations

Journal Article•DOI•

Hidden Markov models

[...]

Sean R. Eddy¹•Institutions (1)

Washington University in St. Louis¹

01 Jun 1996-Current Opinion in Structural Biology

TL;DR: HMM-based profiles have been used in the fields of protein-structure prediction and large-scale genome-sequence analysis as discussed by the authors, and have been applied in protein-seq analysis.

...read moreread less

1,183 citations

Proceedings Article•DOI•

HMM-based word alignment in statistical translation

[...]

Stephan Vogel¹, Hermann Ney¹, Christoph Tillmann¹•Institutions (1)

RWTH Aachen University¹

05 Aug 1996

TL;DR: A new model for word alignment in statistical translation using a first-order Hidden Markov model for the word alignment problem as they are used successfully in speech recognition for the time alignment problem.

...read moreread less

Abstract: In this paper, we describe a new model for word alignment in statistical translation and present experimental results. The idea of the model is to make the alignment probabilities dependent on the differences in the alignment positions rather than on the absolute positions. To achieve this goal, the approach uses a first-order Hidden Markov model (HMM) for the word alignment problem as they are used successfully in speech recognition for the time alignment problem. The difference to the time alignment HMM is that there is no monotony constraint for the possible word orderings. We describe the details of the model and test the model on several bilingual corpora.

...read moreread less

976 citations

Journal Article•DOI•

From HMM's to segment models: a unified view of stochastic modeling for speech recognition

[...]

Mari Ostendorf¹, Vassilios Digalakis², Owen Kimball•Institutions (2)

Boston University¹, Technical University of Crete²

01 Sep 1996-IEEE Transactions on Speech and Audio Processing

TL;DR: A general stochastic model is described that encompasses most of the models proposed in the literature for speech recognition, pointing out similarities in terms of correlation and parameter tying assumptions, and drawing analogies between segment models and HMMs.

...read moreread less

Abstract: Many alternative models have been proposed to address some of the shortcomings of the hidden Markov model (HMM), which is currently the most popular approach to speech recognition. In particular, a variety of models that could be broadly classified as segment models have been described for representing a variable-length sequence of observation vectors in speech recognition applications. Since there are many aspects in common between these approaches, including the general recognition and training problems, it is useful to consider them in a unified framework. The paper describes a general stochastic model that encompasses most of the models proposed in the literature, pointing out similarities of the models in terms of correlation and parameter tying assumptions, and drawing analogies between segment models and HMMs. In addition, we summarize experimental results assessing different modeling assumptions and point out remaining open questions.

...read moreread less

680 citations

Proceedings Article•DOI•

A compact model for speaker-adaptive training

[...]

Tasos Anastasakos¹, J. McDonough, Richard Schwartz, John Makhoul•Institutions (1)

Northeastern University¹

03 Oct 1996

TL;DR: A novel approach to estimating the parameters of continuous density HMMs for speaker-independent (SI) continuous speech recognition that jointly annihilates the inter-speaker variation and estimates the HMM parameters of the SI acoustic models.

...read moreread less

Abstract: We formulate a novel approach to estimating the parameters of continuous density HMMs for speaker-independent (SI) continuous speech recognition. It is motivated by the fact that variability in SI acoustic models is attributed to both phonetic variation and variation among the speakers of the training population, that is independent of the information content of the speech signal. These two variation sources are decoupled and the proposed method jointly annihilates the inter-speaker variation and estimates the HMM parameters of the SI acoustic models. We compare the proposed training algorithm to the common SI training paradigm within the context of supervised adaptation. We show that the proposed acoustic models are more efficiently adapted to the test speakers, thus achieving significant overall word error rate reductions of 19% and 25% for 20K and 5K vocabulary tasks respectively.

...read moreread less

586 citations

Journal Article•DOI•

Hidden Markov models for sequence analysis: extension and analysis of the basic method

[...]

Richard Hughey¹, Anders Krogh•Institutions (1)

University of California, Santa Cruz¹

01 Apr 1996-Bioinformatics

TL;DR: The mathematical extensions and heuristics that move the method from the theoretical to the practical are reviewed and the effectiveness of model regularization, dynamic model modification and optimization strategies are experimentally analyzed.

...read moreread less

Abstract: Hidden Markov models (HMMs) are a highly effective means of modeling a family of unaligned sequences or a common motif within a set of unaligned sequences. The trained HMM can then be used for discrimination or multiple alignment. The basic mathematical description of an HMM and its expectation-maximization training procedure is relatively straightforward. In this paper, we review the mathematical extensions and heuristics that move the method from the theoretical to the practical. We then experimentally analyze the effectiveness of model regularization, dynamic model modification and optimization strategies. Finally it is demonstrated on the SH2 domain how a domain can be found from unaligned sequences using a special model type. The experimental work was completed with the aid of the Sequence Alignment and Modeling software suite.

...read moreread less

488 citations

Proceedings Article•DOI•

A vector Taylor series approach for environment-independent speech recognition

[...]

Pedro J. Moreno¹, Bhiksha Raj¹, Richard M. Stern•Institutions (1)

Carnegie Mellon University¹

07 May 1996

TL;DR: This work introduces the use of a vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel.

...read moreread less

Abstract: In this paper we introduce a new analytical approach to environment compensation for speech recognition. Previous attempts at solving analytically the problem of noisy speech recognition have either used an overly-simplified mathematical description of the effects of noise on the statistics of speech or they have relied on the availability of large environment-specific adaptation sets. Some of the previous methods required the use of adaptation data that consists of simultaneously-recorded or "stereo" recordings of clean and degraded speech. In this work we introduce the use of a vector Taylor series (VTS) expansion to characterize efficiently and accurately the effects on speech statistics of unknown additive noise and unknown linear filtering in a transmission channel. The VTS approach is computationally efficient. It can be applied either to the incoming speech feature vectors, or to the statistics representing these vectors. In the first case the speech is compensated and then recognized; in the second case HMM statistics are modified using the VTS formulation. Both approaches use only the actual speech segment being recognized to compute the parameters required for environmental compensation. We evaluate the performance of two implementations of VTS algorithms using the CMU SPHINX-II system on the 100-word alphanumeric CENSUS database and on the 1993 5000-word ARPA Wall Street Journal database. Artificial white Gaussian noise is added to both databases. The VTS approaches provide significant improvements in recognition accuracy compared to previous algorithms.

...read moreread less

480 citations

Proceedings Article•

Clustering Sequences with Hidden Markov Models

[...]

Padhraic Smyth¹•Institutions (1)

University of California, Irvine¹

03 Dec 1996

TL;DR: Experimental results indicate that the proposed techniques are useful for revealing hidden cluster structure in data sets of sequences.

...read moreread less

Abstract: This paper discusses a probabilistic model-based approach to clustering sequences, using hidden Markov models (HMMs). The problem can be framed as a generalization of the standard mixture model approach to clustering in feature space. Two primary issues are addressed. First, a novel parameter initialization procedure is proposed, and second, the more difficult problem of determining the number of clusters K, from the data, is investigated. Experimental results indicate that the proposed techniques are useful for revealing hidden cluster structure in data sets of sequences.

...read moreread less

437 citations

Proceedings Article•

A Generalized Hidden Markov Model for the Recognition of Human Genes in DNA

[...]

David Kulp¹, David Haussler, Martin G. Reese, Frank H. Eeckman•Institutions (1)

University of California, Santa Cruz¹

12 Jun 1996

TL;DR: A Generalized Hidden Markov Model (GHMM) provides the framework for describing the grammar of a legal parse of a DNA sequence and provides simple solutions for integrating cardinality constraints, reading frame constraints, "indels", and homology searching.

...read moreread less

Abstract: We present a statistical model of genes in DNA A Generalized Hidden Markov Model (GHMM) provides the framework for describing the grammar of a legal parse of a DNA sequence (Stormo & Haussler 1994) Probabilities are assigned to transitions between states in the GHMM and to the generation of each nucleotide base given a particular state Machine learning techniques are applied to optimize these probabilities using a standardized training set Given a new candidate sequence, the best parse is deduced from the model using a dynamic programming algorithm to identify the path through the model with maximum probability The GHMM is flexible and modular, so new sensors and additional states can be inserted easily In addition, it provides simple solutions for integrating cardinality constraints, reading frame constraints, "indels", and homology searching The description and results of an implementation of such a gene-finding model, called Genie, is presented The exon sensor is a codon frequency model conditioned on windowed nucleotide frequency and the preceding codon Two neural networks are used, as in (Brunak, Engelbrecht, & Knudsen 1991), for splice site prediction We show that this simple model performs quite well For a cross-validated standard test set of 304 genes [ftp:@www-hgclblgov/pub/genesets] in human DNA, our gene-finding system identified up to 85% of protein-coding bases correctly with a specificity of 80% 58% of exons were exactly identified with a specificity of 51% Genie is shown to perform favorably compared with several other gene-finding systems

...read moreread less

434 citations

Journal Article•DOI•

Dirichlet mixtures: a method for improving detection of weak but significant protein sequence homology

[...]

Kimmen Sjölander¹, Kevin Karplus¹, Michael F. Brown¹, Richard Hughey¹, Anders Krogh, I. S Mian¹, David Haussler¹ - Show less +3 more•Institutions (1)

University of California, Santa Cruz¹

01 Mar 1996-Bioinformatics

TL;DR: This paper corrects the previously published formula for estimating expected amino acid probabilities, and contains complete derivations of the Dirichlet mixture formulas, methods for optimizing the mixtures to match particular databases, and suggestions for efficient implementation.

...read moreread less

Abstract: This paper presents the mathematical foundations of Dirichlet mixtures, which have been used to improve database search results for homologous sequences, when a variable number of sequences from a protein family or domain are known. We present a method for condensing the information in a protein database into a mixture of Dirichlet densities. These mixtures are designed to be combined with observed amino acid frequencies, to form estimates of expected amino acid probabilities at each position in a profile, hidden Markov model, or other statistical model. These estimates give a statistical model greater generalization capacity, such that remotely related family members can be more reliably recognized by the model. Dirichlet mixtures have been shown to outperform substitution matrices and other methods for computing these expected amino acid distributions in database search, resulting in fewer false positives and false negatives for the families tested. This paper corrects a previously published formula for estimating these expected probabilities, and contains complete derivations of the Dirichlet mixture formulas, methods for optimizing the mixtures to match particular databases, and suggestions for efficient implementation.

...read moreread less

362 citations

Journal Article•DOI•

Input-output HMMs for sequence processing

[...]

Yoshua Bengio¹, Paolo Frasconi•Institutions (1)

Université de Montréal¹

01 Sep 1996-IEEE Transactions on Neural Networks

TL;DR: It is demonstrated that IOHMMs are well suited for solving grammatical inference problems on a benchmark problem and able to map input sequences to output sequences, using the same processing style as recurrent neural networks.

...read moreread less

Abstract: We consider problems of sequence processing and propose a solution based on a discrete-state model in order to represent past context. We introduce a recurrent connectionist architecture having a modular structure that associates a subnetwork to each state. The model has a statistical interpretation we call input-output hidden Markov model (IOHMM). It can be trained by the estimation-maximization (EM) or generalized EM (GEM) algorithms, considering state trajectories as missing data, which decouples temporal credit assignment and actual parameter estimation. The model presents similarities to hidden Markov models (HMMs), but allows us to map input sequences to output sequences, using the same processing style as recurrent neural networks. IOHMMs are trained using a more discriminant learning paradigm than HMMs, while potentially taking advantage of the EM algorithm. We demonstrate that IOHMMs are well suited for solving grammatical inference problems on a benchmark problem. Experimental results are presented for the seven Tomita grammars, showing that these adaptive models can attain excellent generalization.

...read moreread less

Proceedings Article•DOI•

A mew ASR approach based on independent processing and recombination of partial frequency bands

[...]

Hervé Bourlard, Stéphane Dupont¹•Institutions (1)

Faculté polytechnique de Mons¹

03 Oct 1996

TL;DR: The preliminary results presented in this paper show that such an approach, even using quite simple recombination strategies, can yield at least comparable performance on clean speech while providing better robustness in the case of noisy speech.

...read moreread less

Abstract: In the framework of hidden Markov models (HMM) or hybrid HMM/artificial neural network (ANN) systems, we present a new approach towards automatic speech recognition (ASR). The general idea is to split the whole frequency band (represented in terms of critical bands) into a few sub-bands on which different recognizers are independently applied and then recombined at a certain speech unit level to yield global scores and a global recognition decision. The preliminary results presented in this paper show that such an approach, even using quite simple recombination strategies, can yield at least comparable performance on clean speech while providing better robustness in the case of noisy speech.

...read moreread less

Journal Article•DOI•

HMM based online handwriting recognition

[...]

Jianying Hu¹, M.K. Brown¹, William Turin²•Institutions (2)

Bell Labs¹, AT&T²

01 Oct 1996-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A more sophisticated handwriting recognition system that achieves a writer independent recognition rate of 94.5% on 3,823 unconstrained handwritten word samples from 18 writers covering a 32 word vocabulary is built.

...read moreread less

Abstract: Hidden Markov model (HMM) based recognition of handwriting is now quite common, but the incorporation of HMM's into a complex stochastic language model for handwriting recognition is still in its infancy. We have taken advantage of developments in the speech processing field to build a more sophisticated handwriting recognition system. The pattern elements of the handwriting model are subcharacter stroke types modeled by HMMs. These HMMs are concatenated to form letter models, which are further embedded in a stochastic language model. In addition to better language modeling, we introduce new handwriting recognition features of various kinds. Some of these features have invariance properties, and some are segmental, covering a larger region of the input pattern. We have achieved a writer independent recognition rate of 94.5% on 3,823 unconstrained handwritten word samples from 18 writers covering a 32 word vocabulary.

...read moreread less

Proceedings Article•

MBT: A Memory-Based Part of Speech Tagger-Generator

[...]

Walter Daelemans, Jakub Zavrel, Peter Berck, Steven Gillis

01 Jul 1996

TL;DR: A large-scale application of the memory-based approach to part of speech tagging is shown to be feasible, obtaining a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases.

...read moreread less

Abstract: We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text the same way, diminishing development time for the construction of a tagger considerably. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases. The use of IGTree has as additional advantage that optimal context size for disambiguation is dynamically computed. 1 I n t r o d u c t i o n Part of Speech (POS) tagging is a process in which syntactic categories are assigned to words. It can be seen as a mapping from sentences to strings of tags. Automatic tagging is useful for a number of applications: as a preprocessing stage to parsing, in information retrieval, in text to speech systems, in corpus linguistics, etc. The two factors determining the syntactic category of a word are its lexical probability (e.g. without context, man is more probably a noun than a verb), and its contextual probability (e.g. after a pronoun, man is more probably a verb than a noun, as in they man the boats). Several approaches have been proposed to construct automatic taggers. Most work on statistical methods has used n-gram models or Hidden Markov Model-based taggers (e.g. Church, 1988; DeRose, 1988; Cutting et al. 1992; Merialdo, 1994, etc.). In

...read moreread less

Proceedings Article•DOI•

Online, interactive learning of gestures for human/robot interfaces

[...]

Christopher Lee¹, Yangsheng Xu¹•Institutions (1)

Carnegie Mellon University¹

22 Apr 1996

TL;DR: A gesture recognition system, based on hidden Markov models, which can interactively recognize gestures and perform online learning of new gestures and is able to update its model of a gesture iteratively with each example it recognizes.

...read moreread less

Abstract: We have developed a gesture recognition system, based on hidden Markov models, which can interactively recognize gestures and perform online learning of new gestures. In addition, it is able to update its model of a gesture iteratively with each example it recognizes. This system has demonstrated reliable recognition of 14 different gestures after only one or two examples of each. The system is currently interfaced to a Cyberglove for use in recognition of gestures from the sign language alphabet. The system is being implemented as part of an interactive interface for robot teleoperation and programming by example.

...read moreread less

Efficient Algorithms for Speech Recognition.

[...]

Mosur Ravishankar

15 May 1996

TL;DR: The main contributions of this thesis are an 8-fold speedup and 4-fold memory size reduction over the baseline Sphinx-II system, and the improvement in speed is obtained from the following techniques: lexical tree search, phonetic fast match heuristic, and global best path search of the word lattice.

...read moreread less

Abstract: : Advances in speech technology and computing power have created a surge of interest in the practical application of speech recognition. However, the most accurate speech recognition systems in the research world are still far too slow and expensive to be used in practical, large vocabulary continuous speech applications. Their main goal has been recognition accuracy, with emphasis on acoustic and language modelling. But practical speech recognition also requires the computation to be carried out in real time within the limited resources CPU power and memory size of commonly available computers. There has been relatively little work in this direction while preserving the accuracy of research systems. In this thesis, we focus on efficient and accurate speech recognition. It is easy to improve recognition speed and reduce memory requirements by trading away accuracy, for example by greater pruning, and using simpler acoustic and language models. It is much harder to improve both the recognition speed and reduce main memory size while preserving the accuracy. This thesis presents several techniques for improving the overall performance of the CMU Sphinx-II system. Sphinx-II employs semi-continuous hidden Markov models for acoustics and trigram language models, and is one of the premier research systems of its kind. The techniques in this thesis are validated on several widely used benchmark test sets using two vocabulary sizes of about 20K and 58K words. The main contributions of this thesis are an 8-fold speedup and 4-fold memory size reduction over the baseline Sphinx-II system. The improvement in speed is obtained from the following techniques: lexical tree search, phonetic fast match heuristic, and global best path search of the word lattice.

...read moreread less

Patent•

Class-based word clustering for speech recognition using a three-level balanced hierarchical similarity

[...]

Akira Ushioda

18 Apr 1996

TL;DR: In a speech recognition system, a microphone converts an input utterance speech composed of a plurality of words into a speech signal, and a feature extractor extracts predetermined acoustic feature parameters from the converted speech signal as discussed by the authors.

...read moreread less

Abstract: In a word clustering apparatus for clustering words, a plurality of words is clustered to obtain a total tree diagram of a word dictionary representing a word clustering result, where the total tree diagram includes tree diagrams of an upper layer, a middle layer and a lower layer. In a speech recognition apparatus, a microphone converts an input utterance speech composed of a plurality of words into a speech signal, and a feature extractor extracts predetermined acoustic feature parameters from the converted speech signal. Then, a speech recognition controller executes a speech recognition process on the extracted acoustic feature parameters with reference to a predetermined Hidden Markov Model and the obtained total tree diagram of the word dictionary, and outputs a result of the speech recognition.

...read moreread less

Journal Article•DOI•

Signal bias removal by maximum likelihood estimation for robust telephone speech recognition

[...]

Biing-Hwang Juang¹, M. Rahim¹•Institutions (1)

AT&T¹

01 Jan 1996-IEEE Transactions on Speech and Audio Processing

TL;DR: The SBR method, integrated into a discrete density HMM, is applied to telephone speech recognition where the contamination due to extraneous signal components is assumed to be unknown and to enable real-time implementation, a sequential method for the estimation of the bias is presented.

...read moreread less

Abstract: An acoustical mismatch between the training and the testing conditions of hidden Markov model (HMM)-based speech recognition systems often causes a severte degradation in the recognition performance. In telephone speech recognition, for example, undesirable signal components due to ambient noise and channel distortion, as well as due to different variations of telephone handsets render the recognizer unusable for real- world applications. This paper presents a signal bias removal (SBR) method based on maximum likelihood1 estimation for the minimization of these undesirable effects. The proposed method is readily applicable in various architectures, i.e., dis- crete (vector-quantization based), semicontinuous and continuous density HMM. In this paper, the SBR method, integrated into a discrete density HMM, is applied to telephone speech recognition where the contamination due to extraneous signal components is assumed to be unknown. To enable real-time implementation, a sequential method for the estimation of the bias is presented. Experimental results for speaker-independent connected digit recognition show a reduction in the per digit error rate by up to 41% and 14% during mismatched and matclhed training and testing conditions, respectively.

...read moreread less

Journal Article•DOI•

Handwritten word recognition using segmentation-free hidden Markov modeling and segmentation-based dynamic programming techniques

[...]

Magdi A. Mohamed¹, Paul D. Gader¹•Institutions (1)

University of Missouri¹

01 May 1996-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A lexicon-based, handwritten word recognition system combining segmentation-free and segmentations-based techniques is described that uses dynamic programming to match word images and strings.

...read moreread less

Abstract: A lexicon-based, handwritten word recognition system combining segmentation-free and segmentation-based techniques is described. The segmentation-free technique constructs a continuous density hidden Markov model for each lexicon string. The segmentation-based technique uses dynamic programming to match word images and strings. The combination module uses differences in classifier capabilities to achieve significantly better performance.

...read moreread less

Proceedings Article•DOI•

Skill acquisition from human demonstration using a hidden Markov model

[...]

Geir Hovland¹, Pavan Sikka¹, B.J. McCarragher¹•Institutions (1)

Australian National University¹

22 Apr 1996

TL;DR: In this paper, a new approach to skill acquisition in assembly is proposed, where an assembly skill is represented by a hybrid dynamic system where a discrete event controller models the skill at the task level.

...read moreread less

Abstract: A new approach to skill acquisition in assembly is proposed. An assembly skill is represented by a hybrid dynamic system where a discrete event controller models the skill at the task level. The output of the discrete event controller provides the reference commands for the underlying robot controller. This structure is naturally encoded by a hidden Markov model (HMM). The HMM parameters are obtained by training on sensory data from human demonstrations of the skill. Currently, assembly tasks have to be performed by human operators or by robots using expensive fixtures. Our approach transfers the assembly skill from an expert human operator to the robot, thus making it possible for a robot to perform assembly tasks without the use of expensive fixtures.

...read moreread less

Speech recognition using neural networks

[...]

J. Tebelskis¹•Institutions (1)

Carnegie Mellon University¹

03 Oct 1996

TL;DR: It is argued that a NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic modeling accuracy, better context sensitivity, more natural discrimination, and a more economical use of parameters.

...read moreread less

Abstract: This thesis examines how artificial neural networks can benefit a large vocabulary, speaker independent, continuous speech recognition system. Currently, most speech recognition systems are based on hidden Markov models (HMMs), a statistical framework that supports both acoustic and temporal modeling. Despite their state-of-the-art performance, HMMs make a number of suboptimal modeling assumptions that limit their potential effectiveness. Neural networks avoid many of these assumptions, while they can also learn complex functions, generalize effectively, tolerate noise, and support parallelism. While neural networks can readily be applied to acoustic modeling, it is not yet clear how they can be used for temporal modeling. Therefore, we explore a class of systems called NN-HMM hybrids, in which neural networks perform acoustic modeling, and HMMs perform temporal modeling. We argue that a NN-HMM hybrid has several theoretical advantages over a pure HMM system, including better acoustic modeling accuracy, better context sensitivity, more natural discrimination, and a more economical use of parameters. These advantages are confirmed experimentally by a NN-HMM hybrid that we developed, based on context-independent phoneme models, that achieved 90.5% word accuracy on the Resource Management database, in contrast to only 86.0% accuracy achieved by a pure HMM under similar conditions. In the course of developing this system, we explored two different ways to use neural networks for acoustic modeling: prediction and classification. We found that predictive networks yield poor results because of a lack of discrimination, but classification networks gave excellent results. We verified that, in accordance with theory, the output activations of a classification network form highly accurate estimates of the posterior probabilities P(class/input), and we showed how these can easily be converted to likelihoods P(input/class) for standard HMM recognition algorithms. Finally, this thesis reports how we optimized the accuracy of our system with many natural techniques, such as expanding the input window size, normalizing the inputs, increasing the number of hidden units, converting the network's output activations to log likelihoods, optimizing the learning rate schedule by automatic search, backpropagating error from word level outputs, and using gender dependent networks.

...read moreread less

Journal Article•DOI•

Genones: generalized mixture tying in continuous hidden Markov model-based speech recognizers

[...]

Vassilios Digalakis, P. Monaco¹, H. Murveit¹•Institutions (1)

Nuance Communications¹

01 Jul 1996-IEEE Transactions on Speech and Audio Processing

TL;DR: An algorithm is proposed that achieves a good tradeoff between modeling resolution and robustness by using a new, general scheme for tying of mixture components in continuous mixture-density hidden Markov model (HMM)-based speech recognizers.

...read moreread less

Abstract: An algorithm is proposed that achieves a good tradeoff between modeling resolution and robustness by using a new, general scheme for tying of mixture components in continuous mixture-density hidden Markov model (HMM)-based speech recognizers. The sets of HMM states that share the same mixture components are determined automatically using agglomerative clustering techniques. Experimental results on ARPA's Wall Street Journal corpus show that this scheme reduces errors by 25% over typical tied-mixture systems. New fast algorithms for computing Gaussian likelihoods-the-most time-consuming aspect of continuous-density HMM systems-are also presented. These new algorithms-significantly reduce the number of Gaussian densities that are evaluated with little or no impact on speech recognition accuracy.

...read moreread less

Proceedings Article•DOI•

Speech synthesis using HMMs with dynamic features

[...]

Takashi Masuko¹, Keiichi Tokuda², Takao Kobayashi³, Satoshi Imai•Institutions (3)

Tokyo Institute of Technology¹, University of Liverpool², IBM³

07 May 1996

TL;DR: A new text-to-speech synthesis system based on HMM which includes dynamic features, i.e., delta and delta-delta parameters of speech, which becomes quite smooth and natural even if the number of clustered states is small.

...read moreread less

Abstract: This paper presents a new text-to-speech synthesis system based on HMM which includes dynamic features, i.e., delta and delta-delta parameters of speech. The system uses triphone HMMs as the synthesis units. The triphone HMMs share less than 2,000 clustered states, each of which is modelled by a single Gaussian distribution. For a given text to be synthesized, a sentence HMM is constructed by concatenating the triphone HMMs. Speech parameters are generated from the sentence HMM in such a way that the output probability is maximized. The speech signal is synthesized directly from the obtained parameters using the mel log spectral approximation (MLSA) filter. Without dynamic features, the discontinuity of the generated speech spectra causes glitches in the synthesized speech. On the other hand, with dynamic features, the synthesized speech becomes quite smooth and natural even if the number of clustered states is small.

...read moreread less

Proceedings Article•DOI•

Recognition of space-time hand-gestures using hidden Markov model

[...]

Yanghee Nam, Kwangyun Wohn

01 Jul 1996

TL;DR: A hidden Markov(HMM)-based method for recognizing the space-time hand movement pattern and the three major attributes, which are regarded as the primary attribute of the hand gesture, are considered.

...read moreread less

Abstract: The rapidly growing interest in interactive three-dimensional(3D) computer environments highly recommend the hand gesture as one of their interaction modalities. Among several factors constituting a hand gesture, hand movement pattern is spatiotemporally variable and informative, but its automatic recognition is not trivial.In this paper, we describe a hidden Markov(HMM)-based method for recognizing the space-time hand movement pattern. HMM models the spatial variance and the time-scale variance in the hand movement. As for the recognition of the continuous, connected hand movement patterns, HMM-based segmentation method is introduced. To deal with the dimensional complexity caused by the 3D problem space, the plane fitting method is employed and the 3D data is reduced into 2D. These 2D data are then encoded as the input to HMMs.In addition to the hand movement, which is regarded as the primary attribute of the hand gesture, we also consider the hand configuration(posture) and the palm orientation. These three major attributes are processed in parallel and rather independently, followed by the inter-attribute communication for finding the proper interpretation.

...read moreread less

Journal Article•DOI•

Markov Chains with Measurement Error: Estimating the ‘True’ Course of a Marker of the Progression of Human Immunodeficiency Virus Disease

[...]

Glen A. Satten¹, Ira M. Longini²•Institutions (2)

Centers for Disease Control and Prevention¹, Emory University²

01 Sep 1996-Applied statistics

TL;DR: It is found that including measurement error both produces a significantly better fit and provides a model for CD4 progression that is more biologically reasonable.

...read moreread less

Abstract: SUMMARY A Markov chain is a useful way of describing cohort data. Longitudinal observations of a marker of the progression of the human immunodeficiency virus (HIV), such as CD4 cell count, measured on members of a cohort study, can be analysed as a continuous time Markov chain by categorizing the CD4 cell counts into stages. Unfortunately, CD4 cell counts are subject to substantial measurement error and short timescale variability. Thus, fitting a Markov chain to raw CD4 cell count measurements does not determine the transition probabilities for the true or underlying CD4 cell counts; the measurement error results in a process that is too rough. Assuming independent measurement errors, we propose a likelihood-based method for estimating the 'true'or underlying transition probabilities. The Markov structure allows efficient calculation of the likelihood by using hidden Markov model methodology. As an example, we consider CD4 cell count data from 430 HIV-infected participants in the San Francisco Men's Health Study by categofizing the marker data into seven stages; up to 17 observations are available for each individual.We find that including measurement error both produces a significantly better fit and provides a model for CD4 progression that is more biologically reasonable.

...read moreread less

Book•

Neural networks for speech and sequence recognition

[...]

Yoshua Bengio

01 Jan 1996

TL;DR: This paper presents post-processors based on dynamic programming ANN/DP hybrids ANN/HMM Hybrids, and experiments on phoneme recognition with RBFs and online handwriting recognition experiments.

...read moreread less

Abstract: Connectionist models Learning theory The back-propagation algorithm Introduction to back-propagation Formal description Heuristics to improve convergence and generalization Extensions Integrating domain knowledge and learning from examples Automatic speech recognition Importance of pre-processing input data Input coding Input invariances Importance of architecture constraints on the network Modularization Output coding Sequence analysis Introduction Time delay neural networks Recurrent networks BPS Supervision of a recurrent network does not need to be everywhere Problems with training of recurrent networks Dynamic programming post-processors Hidden Markov models Integrating ANNs with other systems Advantages and disadvantages of current algorithms for ANNs Modularization and joint optimization Radial basis functions and local representation Radial basis funtions networks Neurobiological plausibility Relation to vector quantization, clustering and semi-continuous HMMs Methodology Experiments on phoneme recognition with RBFs Density estimation with a neural network Relation between input PDF and output PDF Density estimation Conclusion Post-processors based on dynamic programming ANN/DP hybrids ANN/HMM Hybrids ANN/HMM Hybrid: Phoneme recognition experiments ANN/HMM hybrid: online handwriting recognition experiments

...read moreread less

Book Chapter•DOI•

The use of recurrent neural networks in continuous speech recognition

[...]

Tony Robinson¹, Mike Hochberg¹, Steve Renals¹•Institutions (1)

University of Cambridge¹

01 Jan 1996

TL;DR: This chapter describes a use of recurrent neural networks (i.e., feedback is incorporated in the computation) as an acoustic model for continuous speech recognition as well as an appropriate parameter estimation procedure.

...read moreread less

Abstract: This chapter describes a use of recurrent neural networks (i.e., feedback is incorporated in the computation) as an acoustic model for continuous speech recognition. The form of the recurrent neural network is described along with an appropriate parameter estimation procedure. For each frame of acoustic data, the recurrent network generates an estimate of the posterior probability of of the possible phones given the observed acoustic signal. The posteriors are then converted into scaled likelihoods and used as the observation probabilities within a conventional decoding paradigm (e.g., Viterbi decoding). The advantages of using recurrent networks are that they require a small number of parameters and provide a fast decoding capability (relative to conventional, large-vocabulary, HMM systems)3.

...read moreread less

Journal Article•DOI•

Distribution System Reliability Assessment Using Hierarchical Markov Modeling

[...]

R.E. Brown¹, S. Gupta¹, R.D. Christie¹, S.S. Venkata¹, R. Fletcher - Show less +1 more•Institutions (1)

University of Washington¹

01 Oct 1996-IEEE Power & Energy Magazine

TL;DR: In this article, a hierarchical Markov model (HMM) is proposed to decompose the reliability model based on power system topology, integrated protection systems and individual protection devices, which easily accommodates the effects of backup protection, fault isolation and load restoration.

...read moreread less

Abstract: Distribution system reliability assessment is concerned with power availability and power quality at each customer's service entrance. This paper presents a new method, termed hierarchical Markov modeling (HMM), which can perform predictive distribution system reliability assessment. HMM is unique in that it decomposes the reliability model based on power system topology, integrated protection systems and individual protection devices. This structure, which easily accommodates the effects of backup protection, fault isolation and load restoration, is compared to simpler reliability models. HMM is then used to assess the reliability of an existing utility distribution system and to explore the reliability impact of several design improvement options.

...read moreread less

Book Chapter•DOI•

Visionary Speech: Looking Ahead to Practical Speechreading Systems

[...]

Marcus E. Hennecke¹, David G. Stork¹, K. Venkatesh Prasad²•Institutions (2)

Stanford University¹, Ricoh²

01 Jan 1996

TL;DR: An overview of speechreading systems is presented, paying particular attention to the various approaches to key design decisions and the benefits and drawbacks of each.

...read moreread less

Abstract: We present an overview of speechreading systems, paying particular attention to the various approaches to key design decisions and the benefits and drawbacks of each. Our own speechreading efforts have centered on the use of Hidden Markov Models, and, for feature extraction, de-formable templates. We have explored several sensory integration schemes — early, late, and a novel intermediate method made possible by a new learning method, called Boltzmann zippers. We describe several possible practical applications of speechreading systems, and conclude with a list of important outstanding problems.

...read moreread less

Journal Article•DOI•

Speaker adaptation using combined transformation and Bayesian methods

[...]

Vassilios Digalakis¹, Leonardo Neumeyer²•Institutions (2)

University of Crete¹, SRI International²

01 Jul 1996-IEEE Transactions on Speech and Audio Processing

TL;DR: The authors' algorithms on the large-vocabulary Wall Street Journal corpus for nonnative speakers of American English approach the speaker-independent accuracy achieved for native speakers with only a small amount of adaptation data.

...read moreread less

Abstract: Adapting the parameters of a statistical speaker independent continuous-speech recognizer to the speaker and the channel can significantly improve the recognition performance and robustness of the system. In continuous mixture-density hidden Markov models the number of component densities is typically very large, and it may not be feasible to acquire a sufficient amount of adaptation data for robust maximum-likelihood estimates. To solve this problem, we have recently proposed a constrained estimation technique for Gaussian mixture densities. To improve the behavior of our adaptation scheme for large amounts of adaptation data, we combine it here with Bayesian techniques. We evaluate our algorithms on the large-vocabulary Wall Street Journal corpus for nonnative speakers of American English. The recognition error rate is approximately halved with only a small amount of adaptation data, and it approaches the speaker-independent accuracy achieved for native speakers.

...read moreread less

Collapse