
Showing papers on "Hidden Markov model published in 1994"


Proceedings ArticleDOI
05 Dec 1994
TL;DR: This paper presents a set of experimental results in which various HMM parameterisations are analysed, confirming that stochastic modelling can be used successfully to encode feature information.
Abstract: Recent work on face identification using continuous density Hidden Markov Models (HMMs) has shown that stochastic modelling can be used successfully to encode feature information. When frontal images of faces are sampled using top-bottom scanning, there is a natural order in which the features appear and this can be conveniently modelled using a top-bottom HMM. However, a top-bottom HMM is characterised by different parameters, the choice of which has so far been based on subjective intuition. This paper presents a set of experimental results in which various HMM parameterisations are analysed.

2,677 citations


Journal ArticleDOI
TL;DR: A framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMM) is presented, and Bayesian learning is shown to serve as a unified approach for a wide range of speech recognition applications.
Abstract: In this paper, a framework for maximum a posteriori (MAP) estimation of hidden Markov models (HMM) is presented. Three key issues of MAP estimation, namely, the choice of prior distribution family, the specification of the parameters of prior densities, and the evaluation of the MAP estimates, are addressed. Using HMM's with Gaussian mixture state observation densities as an example, it is assumed that the prior densities for the HMM parameters can be adequately represented as a product of Dirichlet and normal-Wishart densities. The classical maximum likelihood estimation algorithms, namely, the forward-backward algorithm and the segmental k-means algorithm, are expanded, and MAP estimation formulas are developed. Prior density estimation issues are discussed for two classes of applications - parameter smoothing and model adaptation - and some experimental results are given illustrating the practical interest of this approach. Because of its adaptive nature, Bayesian learning is shown to serve as a unified approach for a wide range of speech recognition applications.

2,430 citations
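
To make the MAP formulas concrete: with a normal prior on a Gaussian state mean (the normal component of the normal-Wishart prior), the MAP estimate interpolates between the prior mean and the forward-backward sufficient statistics. A minimal sketch assuming a known variance; the variable names and scalar prior weight are mine, not the paper's notation:

```python
import numpy as np

def map_gaussian_mean(gamma, x, mu0, tau):
    """MAP estimate of a Gaussian state mean under a normal prior.

    gamma : (T,) state occupancy probabilities from forward-backward
    x     : (T, d) observation vectors
    mu0   : (d,) prior mean
    tau   : prior weight (pseudo-count); tau -> 0 recovers the ML estimate
    """
    occ = gamma.sum()                       # expected occupancy of the state
    weighted = (gamma[:, None] * x).sum(0)  # first-order sufficient statistic
    return (tau * mu0 + weighted) / (tau + occ)

# Example: 100 frames of 13-dim features with uniform occupancy 0.5
rng = np.random.default_rng(0)
mu = map_gaussian_mean(np.full(100, 0.5), rng.normal(size=(100, 13)),
                       mu0=np.zeros(13), tau=10.0)
```

With little adaptation data the estimate stays near the prior mean (parameter smoothing); with ample data it approaches the ML estimate (model adaptation).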


Book
16 Dec 1994
TL;DR: This book develops hidden Markov model processing, covering discrete- and continuous-time HMM estimation, recursive filtering, two-dimensional estimation with hidden Markov random fields, and HMM optimal control.
Abstract: Contents: Hidden Markov Model Processing; Discrete-Time HMM Estimation; Discrete States and Discrete Observations; Continuous-Range Observations; Continuous-Range States and Observations; A General Recursive Filter; Practical Recursive Filters; Continuous-Time HMM Estimation; Discrete-Range States and Observations; Markov Chains in Brownian Motion; Two-Dimensional HMM Estimation; Hidden Markov Random Fields; HMM Optimal Control; Discrete-Time HMM Control; Risk-Sensitive Control of HMM; Continuous-Time HMM Control.

1,415 citations


Proceedings ArticleDOI
08 Mar 1994
TL;DR: This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree, which is shown to lead to similar recognition performance to that obtained using an earlier data-driven approach but to have the additional advantage of providing a mapping for unseen triphones.
Abstract: The key problem to be faced when building a HMM-based continuous speech recogniser is maintaining the balance between model complexity and available training data. For large vocabulary systems requiring cross-word context dependent modelling, this is particularly acute since many such contexts will never occur in the training data. This paper describes a method of creating a tied-state continuous speech recognition system using a phonetic decision tree. This tree-based clustering is shown to lead to similar recognition performance to that obtained using an earlier data-driven approach but to have the additional advantage of providing a mapping for unseen triphones. State-tying is also compared with traditional model-based tying and shown to be clearly superior. Experimental results are presented for both the Resource Management and Wall Street Journal tasks.

781 citations
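
The heart of such tree-based clustering is a greedy split: at each tree node, choose the phonetic question that maximizes the log-likelihood gain from dividing the pooled states into "yes" and "no" subsets. A sketch under a diagonal-covariance, single-Gaussian approximation; the per-state statistics layout (occupancy plus first- and second-order moments) is my choice, not necessarily the paper's:

```python
import numpy as np

def pool_loglik(states):
    """Approximate log-likelihood of modelling a set of triphone states with
    one shared diagonal Gaussian, from occupancy and moment statistics."""
    occ = sum(s["occ"] for s in states)
    m1 = sum(s["m1"] for s in states)   # per-state sum of gamma * x
    m2 = sum(s["m2"] for s in states)   # per-state sum of gamma * x**2
    mean = m1 / occ
    var = np.maximum(m2 / occ - mean ** 2, 1e-8)  # floor for stability
    return -0.5 * occ * np.sum(np.log(2 * np.pi * var) + 1.0)

def best_split(states, questions):
    """Pick the question (name, predicate-on-context) with the largest
    likelihood gain over leaving the states pooled at this node."""
    base = pool_loglik(states)
    best_name, best_gain = None, 0.0
    for name, pred in questions:
        yes = [s for s in states if pred(s["context"])]
        no = [s for s in states if not pred(s["context"])]
        if yes and no:
            gain = pool_loglik(yes) + pool_loglik(no) - base
            if gain > best_gain:
                best_name, best_gain = name, gain
    return best_name, best_gain
```

Splitting stops when the best gain falls below a threshold, and unseen triphones are then mapped by answering the same questions down the tree.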


Journal Article
TL;DR: Experiments show that the best training is obtained by using as much tagged text as possible, and that Maximum Likelihood training, the procedure routinely used to estimate hidden Markov model parameters from training data, will not necessarily improve the tagging accuracy.

Abstract: In this paper we present some experiments on the use of a probabilistic model to tag English text, i.e. to assign to each word the correct tag (part of speech) in the context of the sentence. The main novelty of these experiments is the use of untagged text in the training of the model. We have used a simple triclass Markov model and are looking for the best way to estimate the parameters of this model, depending on the kind and amount of training data provided. Two approaches in particular are compared and combined: using text that has been tagged by hand and computing relative frequency counts, and using text without tags and training the model as a hidden Markov process, according to a Maximum Likelihood principle. Experiments show that the best training is obtained by using as much tagged text as possible. They also show that Maximum Likelihood training, the procedure that is routinely used to estimate hidden Markov model parameters from training data, will not necessarily improve the tagging accuracy. In fact, it will generally degrade this accuracy, except when only a limited amount of hand-tagged text is available.

586 citations
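
The hand-tagged branch of that comparison, relative-frequency estimation of a triclass (tag-trigram) model, is straightforward to write down. A sketch with my own names; smoothing and the unsupervised Baum-Welch branch are omitted:

```python
from collections import Counter

def train_relative_frequency(tagged_sentences):
    """Relative-frequency estimates for a triclass tagger: transition
    p(t3 | t1, t2) and emission p(w | t) from hand-tagged sentences,
    where each sentence is a list of (word, tag) pairs."""
    trans, trans_ctx = Counter(), Counter()
    emit, tag_count = Counter(), Counter()
    for sent in tagged_sentences:
        tags = ["<s>", "<s>"] + [t for _, t in sent]  # pad with start tags
        for t1, t2, t3 in zip(tags, tags[1:], tags[2:]):
            trans[(t1, t2, t3)] += 1
            trans_ctx[(t1, t2)] += 1
        for w, t in sent:
            emit[(w, t)] += 1
            tag_count[t] += 1
    p_trans = {k: v / trans_ctx[k[:2]] for k, v in trans.items()}
    p_emit = {k: v / tag_count[k[1]] for k, v in emit.items()}
    return p_trans, p_emit
```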


Journal ArticleDOI
TL;DR: Recognition results are presented for the DARPA TIMIT and Resource Management tasks, and it is concluded that recurrent nets are competitive with traditional means for performing phone probability estimation.
Abstract: This paper presents an application of recurrent networks for phone probability estimation in large vocabulary speech recognition. The need for efficient exploitation of context information is discussed; a role for which the recurrent net appears suitable. An overview of early developments of recurrent nets for phone recognition is given along with the more recent improvements that include their integration with Markov models. Recognition results are presented for the DARPA TIMIT and Resource Management tasks, and it is concluded that recurrent nets are competitive with traditional means for performing phone probability estimation.

497 citations


Journal ArticleDOI
TL;DR: A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family, yielding an effective multiple-alignment algorithm which requires O(KN²) operations, linear in the number of sequences.
Abstract: Hidden Markov model (HMM) techniques are used to model families of biological sequences. A smooth and convergent algorithm is introduced to iteratively adapt the transition and emission parameters of the models from the examples in a given family. The HMM approach is applied to three protein families: globins, immunoglobulins, and kinases. In all cases, the models derived capture the important statistical characteristics of the family and can be used for a number of tasks, including multiple alignments, motif detection, and classification. For K sequences of average length N, this approach yields an effective multiple-alignment algorithm which requires O(KN²) operations, linear in the number of sequences.

475 citations
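
Both training and classification here rest on the standard forward recursion for the sequence likelihood. A generic log-space sketch (not the authors' protein-specific architecture); the dense recursion below costs O(TS²) per sequence, while the sparse left-to-right transitions of a linear model reduce it to the O(N²) per sequence implied by the O(KN²) figure above:

```python
import numpy as np
from scipy.special import logsumexp

def forward_loglik(log_pi, log_A, log_B):
    """Log-likelihood of one observation sequence under an HMM.

    log_pi : (S,)   log initial-state distribution
    log_A  : (S, S) log transition matrix
    log_B  : (T, S) per-position log emission scores
    """
    alpha = log_pi + log_B[0]
    for t in range(1, log_B.shape[0]):
        # marginalize over the previous state in log space
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + log_B[t]
    return float(logsumexp(alpha))
```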


Proceedings Article
01 Jan 1994
TL;DR: A recurrent architecture with a modular structure is introduced; it has similarities to hidden Markov models but supports a recurrent-network processing style and allows the supervised learning paradigm to be exploited while using maximum likelihood estimation.
Abstract: We introduce a recurrent architecture having a modular structure and we formulate a training procedure based on the EM algorithm. The resulting model has similarities to hidden Markov models, but supports a recurrent-network processing style and makes it possible to exploit the supervised learning paradigm while using maximum likelihood estimation.

344 citations


Journal ArticleDOI
TL;DR: It is described how two-dimensional face images can be converted into one-dimensional sequences to allow similar techniques to be applied, and how an HMM can be used to automatically segment face images and extract features that can be used for identification.

343 citations


Book ChapterDOI
07 May 1994
TL;DR: A unified approach to Markov random field (MRF) modeling in low- and high-level computer vision is presented, made possible by a recent advance in MRF modeling for high-level object recognition.
Abstract: A variety of computer vision problems can be optimally posed as Bayesian labeling in which the solution of a problem is defined as the maximum a posteriori (MAP) probability estimate of the true labeling. The posterior probability is usually derived from a prior model and a likelihood model. The latter relates to how data is observed and is problem domain dependent. The former depends on how various prior constraints are expressed. Markov Random Field Models (MRF) theory is a tool to encode contextual constraints into the prior probability. This paper presents a unified approach for MRF modeling in low and high level computer vision. The unification is made possible due to a recent advance in MRF modeling for high level object recognition. Such unification provides a systematic approach for vision modeling based on sound mathematical principles.

284 citations


Journal ArticleDOI
TL;DR: It is shown that a connectionist component improves a state-of-the-art HMM system through a statistical interpretation of connectionist networks as probability estimators.
Abstract: The authors are concerned with integrating connectionist networks into a hidden Markov model (HMM) speech recognition system. This is achieved through a statistical interpretation of connectionist networks as probability estimators. They review the basis of HMM speech recognition and point out the possible benefits of incorporating connectionist networks. Issues necessary to the construction of a connectionist HMM recognition system are discussed, including choice of connectionist probability estimator. They describe the performance of such a system using a multilayer perceptron probability estimator evaluated on the speaker-independent DARPA Resource Management database. In conclusion, they show that a connectionist component improves a state-of-the-art HMM system.
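
A key detail in such hybrid systems is converting network outputs into HMM emission scores: by Bayes' rule, dividing each state posterior by the state prior gives the likelihood scaled by the frame probability, which is constant across states during decoding. A one-function sketch of this standard conversion (names are mine):

```python
import numpy as np

def scaled_log_likelihoods(log_posteriors, log_priors):
    """p(x | q) / p(x) = p(q | x) / p(q): subtract log state priors from
    the network's log posteriors to obtain HMM emission scores.

    log_posteriors : (T, S) frame-wise log p(state | frame) from the net
    log_priors     : (S,)   log relative frequencies of the states
    """
    return log_posteriors - log_priors[None, :]
```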

PatentDOI
TL;DR: The invention provides a method of large vocabulary speech recognition that employs a single tree-structured phonetic hidden Markov model (HMM) at each frame of a time-synchronous process, and phonetic context information is exploited, even before the complete context of a phoneme is known.
Abstract: The invention provides a method of large vocabulary speech recognition that employs a single tree-structured phonetic hidden Markov model (HMM) at each frame of a time-synchronous process. A grammar probability is utilized upon recognition of each phoneme of a word, before recognition of the entire word is complete. Thus, grammar probabilities are exploited as early as possible during recognition of a word. At each frame of the recognition process, a grammar probability is determined for the transition from the most likely preceding grammar state to a set of words that share at least one common phoneme. The grammar probability is combined with accumulating phonetic evidence to provide a measure of the likelihood that a state in the HMM will lead to the word most likely to have been spoken. In a preferred embodiment, phonetic context information is exploited, even before the complete context of a phoneme is known. Instead of an exact triphone model, wherein the phonemes previous and subsequent to a phoneme are considered, a composite triphone model is used that exploits partial phonetic context information to provide a phonetic model that is more accurate than a phonetic model that ignores context. In another preferred embodiment, the single phonetic tree method is used as the forward pass of a forward/backward recognition process, wherein the backward pass employs a recognition process other than the single phonetic tree method.

Journal ArticleDOI
Shyh-Shiaw Kuo, O. E. Agazzi
TL;DR: An algorithm for robust machine recognition of keywords embedded in a poorly printed document is presented, where two statistical models, called pseudo 2-D hidden Markov models, are created for representing the actual keyword and all the other extraneous words, respectively.
Abstract: An algorithm for robust machine recognition of keywords embedded in a poorly printed document is presented. For each keyword, two statistical models, called pseudo 2-D hidden Markov models, are created for representing the actual keyword and all the other extraneous words, respectively. Dynamic programming is then used for matching an unknown input word with the two models and for making a maximum likelihood decision. Although the models are pseudo 2-D in the sense that they are not fully connected 2-D networks, they are shown to be general enough in characterizing printed words efficiently. These models facilitate a nice "elastic matching" property in both horizontal and vertical directions, which makes the recognizer not only independent of size and slant but also tolerant of highly deformed and noisy words. The system is evaluated on a synthetically created database that contains about 26,000 words. Currently, the authors achieve a recognition accuracy of 99% when words in testing and training sets are of the same font size, and 96% when they are in different sizes. In the latter case, the conventional 1-D HMM achieves only a 70% accuracy rate.

Journal ArticleDOI
TL;DR: A complete scheme for totally unconstrained handwritten word recognition based on a single contextual hidden Markov model type stochastic network is presented, which includes a morphology and heuristics based segmentation algorithm, a training algorithm that can adapt itself with the changing dictionary.
Abstract: Because of large variations involved in handwritten words, the recognition problem is very difficult. Hidden Markov models (HMM) have been widely and successfully used in speech processing and recognition. Recently HMM has also been used with some success in recognizing handwritten words with presegmented letters. In this paper, a complete scheme for totally unconstrained handwritten word recognition based on a single contextual hidden Markov model type stochastic network is presented. Our scheme includes a morphology and heuristics based segmentation algorithm, a training algorithm that can adapt itself with the changing dictionary, and a modified Viterbi algorithm which searches for the (l+1)th globally best path based on the previous l best paths. Detailed experiments are carried out and successful recognition results are reported.

Posted Content
TL;DR: In this paper, the authors describe a framework for inducing probabilistic grammars from corpora of positive samples, where samples are incorporated by adding ad-hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are merged to achieve generalization and a more compact representation.
Abstract: We describe a framework for inducing probabilistic grammars from corpora of positive samples. First, samples are incorporated by adding ad-hoc rules to a working grammar; subsequently, elements of the model (such as states or nonterminals) are merged to achieve generalization and a more compact representation. The choice of what to merge and when to stop is governed by the Bayesian posterior probability of the grammar given the data, which formalizes a trade-off between a close fit to the data and a default preference for simpler models ('Occam's Razor'). The general scheme is illustrated using three types of probabilistic grammars: Hidden Markov models, class-based n-grams, and stochastic context-free grammars.
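
The induction loop itself is a greedy search over candidate merges scored by the Bayesian posterior. A structural sketch with the grammar representation left abstract; the callback names are mine, not the authors':

```python
def bayesian_model_merging(model, data, merge_candidates, log_posterior):
    """Greedily apply the merge that most improves log P(model | data),
    i.e. log prior + log likelihood; stop when no candidate improves it.

    merge_candidates : fn(model) -> iterable of merged variants
    log_posterior    : fn(model, data) -> float
    """
    score = log_posterior(model, data)
    while True:
        scored = [(log_posterior(m, data), m) for m in merge_candidates(model)]
        if not scored:
            return model
        best_score, best_model = max(scored, key=lambda sm: sm[0])
        if best_score <= score:      # the simplicity prior says: stop
            return model
        model, score = best_model, best_score
```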

Journal ArticleDOI
Gary E. Kopec, Philip A. Chou
TL;DR: The proposed approach is illustrated on the problem of decoding scanned telephone yellow pages to extract names and numbers from the listings, constructing a finite-state model for yellow page columns and decoding with a Viterbi-like dynamic programming algorithm.
Abstract: Document image decoding (DID) is a communication theory approach to document image recognition. In DID, a document recognition problem is viewed as consisting of three elements: an image generator, a noisy channel and an image decoder. A document image generator is a Markov source (stochastic finite-state automaton) that combines a message source with an imager. The message source produces a string of symbols, or text, that contains the information to be transmitted. The imager is modeled as a finite-state transducer that converts the 1D message string into an ideal 2D bitmap. The channel transforms the ideal image into a noisy observed image. The decoder estimates the message, given the observed image, by finding the a posteriori most probable path through the combined source and channel models using a Viterbi-like dynamic programming algorithm. The proposed approach is illustrated on the problem of decoding scanned telephone yellow pages to extract names and numbers from the listings. A finite-state model for yellow page columns was constructed and used to decode a database of scanned column images containing about 1100 individual listings.
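
The decoder's dynamic program is essentially the Viterbi algorithm run over the combined source and channel models. A generic log-space sketch of that recursion, with the image-specific scoring abstracted into per-step emission scores:

```python
import numpy as np

def viterbi(log_pi, log_A, log_B):
    """Most probable state path through an HMM / Markov source.

    log_pi : (S,) log initial scores, log_A : (S, S) log transitions,
    log_B  : (T, S) per-step log emission scores."""
    T, S = log_B.shape
    back = np.zeros((T, S), dtype=int)
    delta = log_pi + log_B[0]
    for t in range(1, T):
        scores = delta[:, None] + log_A      # (from_state, to_state)
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # trace the best path backwards
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```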

Journal ArticleDOI
Yunxin Zhao
TL;DR: Experiments of speaker adaptation on the TIMIT database using short calibration speech have shown significant performance improvement over the baseline speaker-independent continuous speech recognition, where the recognition system uses Gaussian mixture density based hidden Markov models of phone units.
Abstract: A new speaker adaptation technique is proposed for improving speaker-independent continuous speech recognition based on a decomposition of spectral variation sources. In this technique, the spectral variations are separated into two categories, one acoustic and the other phone-specific, where each variation source is modeled by a linear transformation system. The technique consists of two sequential steps: first, acoustic normalization is performed, and second, phone model parameters are adapted. Experiments of speaker adaptation on the TIMIT database using short calibration speech (5 s per speaker) have shown significant performance improvement over the baseline speaker-independent continuous speech recognition, where the recognition system uses Gaussian mixture density based hidden Markov models of phone units. For a vocabulary size of 853 and test set perplexity of 104, the recognition word accuracy has been improved from 86.9% for the baseline system to 90.5% after adaptation, corresponding to an error reduction of 27.5%. On a more difficult test set that contains an additional variation source due to recording channel mismatch, a more significant performance improvement has been obtained: for the same vocabulary and a test set perplexity of 101, the recognition word accuracy has been improved from 65.4% for the baseline to 86.0% after adaptation, corresponding to an error reduction of 59.5%.
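
Each variation source above is modeled by a linear transformation system; the simplest concrete instance is an affine map fitted by least squares between paired feature frames. A sketch that assumes paired adaptation data and illustrates the idea rather than the author's exact estimator:

```python
import numpy as np

def estimate_affine_transform(X, Y):
    """Least-squares affine map Y ~ X @ W + b between paired frames,
    e.g. new-speaker frames X and reference-space frames Y, both (T, d)."""
    Xa = np.hstack([X, np.ones((len(X), 1))])   # append a bias column
    sol, *_ = np.linalg.lstsq(Xa, Y, rcond=None)
    return sol[:-1], sol[-1]                    # weight matrix W, bias b

# Normalization then maps each incoming frame x to x @ W + b.
```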

Journal ArticleDOI
TL;DR: A rotation and gray-scale-transform invariant texture recognition scheme is presented, combining a quadrature mirror filter (QMF) bank with hidden Markov models (HMMs) to capture the trend of changes caused by rotation.
Abstract: In this correspondence, we have presented a rotation and gray scale transform invariant texture recognition scheme using the combination of quadrature mirror filter (QMF) bank and hidden Markov model (HMM). In the first stage, the QMF bank is used as the wavelet transform to decompose the texture image into subbands. The gray scale transform invariant features derived from the statistics based on first-order distribution of gray levels are then extracted from each subband image. In the second stage, the sequence of subbands is modeled as a hidden Markov model (HMM), and one HMM is designed for each class of textures. The HMM is used to exploit the dependence among these subbands, and is able to capture the trend of changes caused by rotation. During recognition, the unknown texture is matched against all the models. The best matched model identifies the texture class. Up to 93.33% classification accuracy is reported.
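
Recognition is maximum-likelihood selection: the unknown texture's subband feature sequence is scored against one HMM per class and the best-scoring class wins. A sketch assuming each model object exposes a log-likelihood scorer such as the forward recursion:

```python
def classify_texture(features, class_models):
    """Return the texture class whose HMM scores the subband feature
    sequence highest; class_models maps class name -> model with .loglik()."""
    return max(class_models, key=lambda name: class_models[name].loglik(features))
```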

Journal ArticleDOI
TL;DR: This new algorithm identifies islands of reliability (essentially the portion of speech contained between the first and the last vowel) using time and frequency-based features and then applies a noise adaptive procedure to refine the boundaries.
Abstract: The authors address the problem of automatic word boundary detection in quiet and in the presence of noise. Attention has been given to automatic word boundary detection for both additive noise and noise-induced changes in the talker's speech production (Lombard reflex). After a comparison of several automatic word boundary detection algorithms in different noisy-Lombard conditions, they propose a new algorithm that is robust in the presence of noise. This new algorithm identifies islands of reliability (essentially the portion of speech contained between the first and the last vowel) using time- and frequency-based features and then, after a noise classification, applies a noise adaptive procedure to refine the boundaries. It is shown that this new algorithm outperforms the commonly used algorithm developed by Lamel et al. (1981) and several other recently developed methods. They evaluated the average recognition error rate due to word boundary detection in an HMM-based recognition system across several signal-to-noise ratios and noise conditions. The recognition error rate decreased to about 20% compared to an average of approximately 50% obtained with a modified version of the Lamel et al. algorithm.

Journal ArticleDOI
TL;DR: The authors develop an efficient dynamic programming technique which includes the state sojourn time as an optimization variable, in conjunction with a state-dependent orthogonal polynomial regression method, for estimating the model parameters.
Abstract: Proposes, implements, and evaluates a class of nonstationary-state hidden Markov models (HMMs) having each state associated with a distinct polynomial regression function of time plus white Gaussian noise. The model represents the transitional acoustic trajectories of speech in a parametric manner, and includes the standard stationary-state HMM as a special, degenerated case. The authors develop an efficient dynamic programming technique which includes the state sojourn time as an optimization variable, in conjunction with a state-dependent orthogonal polynomial regression method, for estimating the model parameters. Experiments on fitting models to speech data and on limited-vocabulary speech recognition demonstrate consistent superiority of these nonstationary-state HMMs over the traditional stationary-state HMMs.
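
In such a nonstationary-state HMM the emission mean follows a polynomial in the time spent in the state, rather than staying constant. A sketch of the state-conditional log-density with diagonal covariance; the coefficient layout is my assumption:

```python
import numpy as np

def trended_state_logpdf(x, t, coeffs, var):
    """Log-density of frame x at sojourn time t in a nonstationary state:
    x ~ N(sum_r coeffs[r] * t**r, var), a polynomial trajectory plus
    white Gaussian noise.

    x : (d,) frame, coeffs : (R+1, d) regression coefficients, var : (d,)
    """
    mean = sum(c * t ** r for r, c in enumerate(coeffs))
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
```

Setting R = 0 recovers the standard stationary-state HMM as the degenerate case mentioned above.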

Proceedings Article
01 Jan 1994
TL;DR: The results indicate that simple hidden Markov models may be used to successfully recognize relatively unprocessed image sequences, and the system achieved performance levels equivalent to untrained humans when asked to recognize the first four English digits.
Abstract: This paper presents ongoing work on a speaker independent visual speech recognition system. The work presented here builds on previous research efforts in this area and explores the potential use of simple hidden Markov models for limited vocabulary, speaker independent visual speech recognition. The task at hand is recognition of the first four English digits, a task with possible applications in car-phone dialing. The images were modeled as mixtures of independent Gaussian distributions, and the temporal dependencies were captured with standard left-to-right hidden Markov models. The results indicate that simple hidden Markov models may be used to successfully recognize relatively unprocessed image sequences. The system achieved performance levels equivalent to untrained humans when asked to recognize the first four English digits.

Proceedings ArticleDOI
13 Oct 1994
TL;DR: Two experiments designed to determine how much manual training information is needed for part-of-speech tagging by hidden Markov model suggest that initial biasing of either lexical or transition probabilities is essential to achieve good accuracy, and reveal three distinct patterns of Baum-Welch re-estimation.
Abstract: In part-of-speech tagging by hidden Markov model, a statistical model is used to assign grammatical categories to words in a text. Early work in the field relied on a corpus which had been tagged by a human annotator to train the model. More recently, Cutting et al. (1992) suggest that training can be achieved with a minimal lexicon and a limited amount of a priori information about probabilities, by using Baum-Welch re-estimation to automatically refine the model. In this paper, I report two experiments designed to determine how much manual training information is needed. The first experiment suggests that initial biasing of either lexical or transition probabilities is essential to achieve a good accuracy. The second experiment reveals that there are three distinct patterns of Baum-Welch re-estimation. In two of the patterns, the re-estimation ultimately reduces the accuracy of the tagging rather than improving it. The pattern which is applicable can be predicted from the quality of the initial model and the similarity between the tagged training corpus (if any) and the corpus to be tagged. Heuristics for deciding how to use re-estimation in an effective manner are given. The conclusions are broadly in agreement with those of Merialdo (1994), but give greater detail about the contributions of different parts of the model.
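
For reference, one Baum-Welch (EM) re-estimation pass over the transition probabilities, the step whose repeated application the paper finds can degrade tagging accuracy, looks as follows. A bare-bones log-space sketch for a single observation sequence; lexical (emission) updates are analogous:

```python
import numpy as np
from scipy.special import logsumexp

def baum_welch_transitions(log_pi, log_A, log_B):
    """One EM update of the transition matrix from per-frame log emission
    scores log_B (T, S); returns the re-estimated log transition matrix."""
    T, S = log_B.shape
    alpha = np.empty((T, S)); beta = np.empty((T, S))
    alpha[0] = log_pi + log_B[0]
    for t in range(1, T):
        alpha[t] = logsumexp(alpha[t-1][:, None] + log_A, axis=0) + log_B[t]
    beta[-1] = 0.0
    for t in range(T - 2, -1, -1):
        beta[t] = logsumexp(log_A + (log_B[t+1] + beta[t+1])[None, :], axis=1)
    # expected transition counts xi_t(i, j), normalized per time slice
    xi = alpha[:-1, :, None] + log_A[None] + (log_B[1:] + beta[1:])[:, None, :]
    counts = np.exp(xi - logsumexp(xi, axis=(1, 2), keepdims=True)).sum(axis=0)
    return np.log(counts / counts.sum(axis=1, keepdims=True))
```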

Proceedings ArticleDOI
19 Apr 1994
TL;DR: A new method of creating speaker-specific phoneme models is proposed; it uses speaker-independent phoneme models consisting of tied-mixture HMMs and adapts the feature space of the tied mixtures to that of the speaker through phoneme-dependent/independent iterative training.
Abstract: Speaker adaptation methods for tied-mixture-based phoneme models are investigated for text-prompted speaker recognition. For this type of speaker recognition, speaker-specific phoneme models are essential for verifying both the key text and the speaker. This paper proposes a new method of creating speaker-specific phoneme models. This uses speaker-independent (universal) phoneme models consisting of tied-mixture HMMs and adapts the feature space of the tied-mixtures to that of the speaker through phoneme-dependent/independent iterative training. Therefore, it can adapt models of phonemes that have a small amount of training data to the speaker. The proposed method was tested using 15 speakers' voices recorded over 10 months and achieved a speaker and text verification rate of 99.4% even when both the voices of different speakers and different texts uttered by the true speaker were to be rejected.

Posted Content
TL;DR: A new technique for inducing the structure of Hidden Markov Models from data, based on the general 'model merging' strategy, is described, along with how the algorithm was incorporated in an operational speech understanding system, where it was combined with neural network acoustic likelihood estimators to improve performance over single-pronunciation word models.
Abstract: This report describes a new technique for inducing the structure of Hidden Markov Models from data which is based on the general 'model merging' strategy (Omohundro 1992). The process begins with a maximum likelihood HMM that directly encodes the training data. Successively more general models are produced by merging HMM states. A Bayesian posterior probability criterion is used to determine which states to merge and when to stop generalizing. The procedure may be considered a heuristic search for the HMM structure with the highest posterior probability. We discuss a variety of possible priors for HMMs, as well as a number of approximations which improve the computational efficiency of the algorithm. We studied three applications to evaluate the procedure. The first compares the merging algorithm with the standard Baum-Welch approach in inducing simple finite-state languages from small, positive-only training samples. We found that the merging procedure is more robust and accurate, particularly with a small amount of training data. The second application uses labelled speech data from the TIMIT database to build compact, multiple-pronunciation word models that can be used in speech recognition. Finally, we describe how the algorithm was incorporated in an operational speech understanding system, where it is combined with neural network acoustic likelihood estimators to improve performance over single-pronunciation word models.

PatentDOI
TL;DR: A finite state grammar set corresponding to the range of word sequence patterns in the lesson is employed as a constraint on a hidden Markov model (HMM) search apparatus in an HMM speech recognizer.
Abstract: Spoken-language instruction method and apparatus employ context-based speech recognition for instruction and evaluation. A finite state grammar set (113) corresponding to the range of word sequence patterns in the lesson is employed as a constraint on a hidden Markov model (HMM) search apparatus in an HMM speech recognizer (112). The invention includes a system with an interactive decision mechanism which employs at least three levels of error tolerance to simulate a natural level of patience in human-based interactive instruction. A linguistically-sensitive utterance endpoint detector is provided for judging termination of a spoken utterance to simulate human turn-taking in conversational speech.


Journal ArticleDOI
TL;DR: A simple learning algorithm for Hidden Markov Models (HMMs) is presented together with a number of variations, proved to be exact or approximate gradient optimization algorithms with respect to likelihood, log-likelihood, or cross-entropy functions, and as such are usually convergent.
Abstract: A simple learning algorithm for Hidden Markov Models (HMMs) is presented together with a number of variations. Unlike other classical algorithms such as the Baum-Welch algorithm, the algorithms described are smooth and can be used on-line (after each example presentation) or in batch mode, with or without the usual Viterbi most likely path approximation. The algorithms have simple expressions that result from using a normalized-exponential representation for the HMM parameters. All the algorithms presented are proved to be exact or approximate gradient optimization algorithms with respect to likelihood, log-likelihood, or cross-entropy functions, and as such are usually convergent. These algorithms can also be cast in the more general EM (Expectation-Maximization) framework where they can be viewed as exact or approximate GEM (Generalized Expectation-Maximization) algorithms. The mathematical properties of the algorithms are derived in the appendix.
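
The normalized-exponential representation writes each transition row as a softmax over unconstrained weights, so a plain gradient step keeps the parameters valid probabilities; that is what makes the updates smooth. A minimal batch-form sketch of one such update (the paper's on-line variants and emission updates follow the same pattern):

```python
import numpy as np

def softmax(w):
    e = np.exp(w - w.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def transition_gradient_step(W, grad_A, lr=0.1):
    """One smooth update of unconstrained weights W with A = softmax(W)
    row-wise; grad_A is dL/dA from forward-backward statistics.
    Chain rule: dL/dW_ij = A_ij * (grad_A_ij - sum_k A_ik * grad_A_ik)."""
    A = softmax(W)
    inner = (A * grad_A).sum(axis=1, keepdims=True)
    return W + lr * A * (grad_A - inner)
```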

Journal ArticleDOI
TL;DR: A scheme using a two-layer network is proposed to cope with the difficulty resulting from the declination effect on the F0 contour of declarative sentential utterances, and the coarticulation effect from neighboring syllables is considered in a scheme using a context-dependent model.
Abstract: In this paper, several tone recognition schemes for continuous Mandarin speech are discussed. First, an SCHMM is used to model the acoustic features of a syllable for tone discrimination. Parameters extracted from the F0 and energy contours of the syllable by discrete Legendre orthonormal transform are used as the recognition features. Then, a scheme using a two-layer network is proposed to cope with the difficulty resulting from the declination effect on the F0 contour of the declarative sentential utterance. The declination effect is modeled by a sentence-level HMM on the upper layer and the acoustic features of each tone are modeled by a state-dependent SCHMM on the lower layer. Lastly, the coarticulation effect coming from neighboring syllables is considered in a scheme using a context-dependent model. Performance of these recognition schemes was examined by simulations. A recognition rate of 86.34% was achieved.

Journal ArticleDOI
19 Apr 1994
TL;DR: In this article, a time delay neural network with local connections and shared weights is used to estimate a posteriori probabilities for characters in a word and a hidden Markov model segments the word into characters, which optimizes the global word score, taking a dictionary into account.
Abstract: Presents a writer independent system for on-line handwriting recognition which can handle both cursive script and hand-print. The pen trajectory is recorded by a touch sensitive pad, such as those used by note-pad computers. The input to the system contains the pen trajectory information, encoded as a time-ordered sequence of feature vectors. Features include X and Y coordinates, pen-lifts, speed, direction and curvature of the pen trajectory. A time delay neural network with local connections and shared weights is used to estimate a posteriori probabilities for characters in a word. A hidden Markov model segments the word into characters in a way which optimizes the global word score, taking a dictionary into account. A geometrical normalization scheme and a fast but efficient dictionary search are also presented. Trained on 20,000 unconstrained cursive words from 59 writers and using a 25,000-word dictionary, the authors reached an 89% character and 80% word recognition rate on test data from a disjoint set of writers.