Topic

Triphone

About: Triphone is a research topic. Over its lifetime, 466 publications have been published on this topic, receiving 11,569 citations.
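A triphone is a phone model conditioned on its left and right neighbors. As a minimal illustration of the unit itself, the sketch below (plain Python; the phone sequence and the 'sil' boundary padding are illustrative assumptions) expands a phoneme string into triphones in the common left-center+right notation:

```python
def to_triphones(phones):
    """Expand a phone sequence into context-dependent triphone labels.

    Uses the common left-center+right notation; word/utterance
    boundaries are padded with 'sil' here for simplicity.
    """
    padded = ["sil"] + list(phones) + ["sil"]
    return [f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
            for i in range(1, len(padded) - 1)]

# e.g. the word "six", /s ih k s/:
print(to_triphones(["s", "ih", "k", "s"]))
# ['sil-s+ih', 's-ih+k', 'ih-k+s', 'k-s+sil']
```

Because every distinct left/right context yields a distinct unit, the inventory grows roughly cubically in the number of phones, which is why the papers below tie or cluster triphone states.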


Papers
Journal ArticleDOI
TL;DR: A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output, and that significantly outperforms conventional context-dependent Gaussian mixture model (GMM) HMMs.
Abstract: We propose a novel context-dependent (CD) model for large-vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output. The deep belief network pre-training algorithm is a robust and often helpful way to initialize deep neural networks generatively that can aid in optimization and reduce generalization error. We illustrate the key components of our model, describe the procedure for applying CD-DNN-HMMs to LVSR, and analyze the effects of various modeling choices on performance. Experiments on a challenging business search dataset demonstrate that CD-DNN-HMMs can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs, with an absolute sentence accuracy improvement of 5.8% and 9.2% (or relative error reduction of 16.0% and 23.2%) over the CD-GMM-HMMs trained using the minimum phone error rate (MPE) and maximum-likelihood (ML) criteria, respectively.
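In such hybrid systems the DNN's senone posteriors stand in for GMM likelihoods during decoding. The abstract does not spell out the conversion, but a standard recipe (shown here as a hedged sketch, not necessarily the paper's exact procedure) divides each posterior p(s|x) by the senone prior p(s), since by Bayes' rule p(x|s) is proportional to p(s|x)/p(s):

```python
import numpy as np

def scaled_log_likelihoods(log_posteriors, log_priors):
    """Turn DNN senone posteriors into HMM emission scores.

    By Bayes' rule p(x|s) = p(s|x) * p(x) / p(s); the frame term
    p(x) is shared by all senones and cancels in Viterbi search,
    so subtracting log priors from log posteriors suffices.

    log_posteriors: (frames, senones) log-softmax DNN outputs.
    log_priors:     (senones,) log senone frequencies estimated
                    from a forced alignment of the training data.
    """
    return log_posteriors - log_priors  # broadcasts over frames
```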

3,120 citations

Proceedings ArticleDOI
Frank Seide, Gang Li, Xie Chen, Dong Yu
01 Dec 2011
TL;DR: This work investigates the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective to reduce the word error rate for speaker-independent transcription of phone calls.
Abstract: We investigate the potential of Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, from a feature-engineering perspective. Recently, we had shown that for speaker-independent transcription of phone calls (NIST RT03S Fisher data), CD-DNN-HMMs reduced the word error rate by as much as one third—from 27.4%, obtained by discriminatively trained Gaussian-mixture HMMs with HLDA features, to 18.5%—using 300+ hours of training data (Switchboard), 9000+ tied triphone states, and up to 9 hidden network layers.
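The quoted improvement is consistent with the "one third" claim: (27.4 - 18.5) / 27.4 is roughly 32.5% relative. The feature-engineering view treats the DNN input as a spliced window of neighboring acoustic frames rather than a single frame; a hedged sketch of that splicing follows, where the window size and edge padding are assumptions for illustration, not the paper's exact configuration:

```python
import numpy as np

def splice_frames(feats, context=5):
    """Stack each frame with its +-context neighbors as DNN input.

    feats: (T, D) array of per-frame acoustic features.
    Returns a (T, (2 * context + 1) * D) array; the edges are
    padded by repeating the first and last frames.
    """
    pad_left = np.repeat(feats[:1], context, axis=0)
    pad_right = np.repeat(feats[-1:], context, axis=0)
    padded = np.concatenate([pad_left, feats, pad_right])
    window = 2 * context + 1
    return np.stack([padded[t:t + window].reshape(-1)
                     for t in range(len(feats))])
```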

702 citations

Journal ArticleDOI
TL;DR: SPHINX is a system that demonstrates the feasibility of accurate, large-vocabulary, speaker-independent, continuous speech recognition, based on discrete hidden Markov models with LPC- (linear-predictive-coding) derived parameters.
Abstract: A description is given of SPHINX, a system that demonstrates the feasibility of accurate, large-vocabulary, speaker-independent, continuous speech recognition. SPHINX is based on discrete hidden Markov models (HMMs) with LPC- (linear-predictive-coding) derived parameters. To provide speaker independence, knowledge was added to these HMMs in several ways: multiple codebooks of fixed-width parameters, and an enhanced recognizer with carefully designed models and word-duration modeling. To deal with coarticulation in continuous speech, yet still adequately represent a large vocabulary, two new subword speech units are introduced: function-word-dependent phone models and generalized triphone models. With grammars of perplexity 997, 60, and 20, SPHINX attained word accuracies of 71, 94, and 96%, respectively, on a 997-word task.
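Generalized triphones address data sparsity by merging context-dependent models whose output distributions are similar, so rare contexts share parameters. The sketch below shows the greedy bottom-up merging idea over discrete (codebook) distributions; the entropy-based merge cost and stopping threshold are illustrative assumptions rather than SPHINX's exact recipe:

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def merge_cost(pa, ca, pb, cb):
    """Increase in count-weighted entropy caused by merging two
    discrete output distributions observed ca and cb times."""
    merged = (ca * pa + cb * pb) / (ca + cb)
    return (ca + cb) * entropy(merged) - ca * entropy(pa) - cb * entropy(pb)

def generalize_triphones(dists, counts, max_cost):
    """Greedily merge the cheapest pair of triphone models until
    no remaining merge costs less than max_cost."""
    dists, counts = list(dists), list(counts)
    while len(dists) > 1:
        cost, i, j = min(
            (merge_cost(dists[i], counts[i], dists[j], counts[j]), i, j)
            for i in range(len(dists)) for j in range(i + 1, len(dists)))
        if cost > max_cost:
            break
        dists[i] = (counts[i] * dists[i] + counts[j] * dists[j]) / (counts[i] + counts[j])
        counts[i] += counts[j]
        del dists[j], counts[j]
    return dists, counts
```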

487 citations

Journal ArticleDOI
TL;DR: It is shown that HMMs trained with MMIE benefit as much as MLE-trained HMMs from model adaptation using maximum likelihood linear regression (MLLR), which has allowed the straightforward integration of MMIE-trained HMMs into complex multi-pass systems for transcription of conversational telephone speech.
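MLLR adapts a speaker-independent model by estimating an affine transform of the Gaussian means, mu_hat = A mu + b, shared across many Gaussians so that limited adaptation data suffices. The sketch below fits such a transform by occupancy-weighted least squares; assuming identity covariances, this is a deliberate simplification of the full MLLR solution:

```python
import numpy as np

def estimate_mllr_transform(means, adapt_means, gammas):
    """Fit a single affine transform mu_hat = A @ mu + b mapping
    model means toward per-Gaussian adaptation sample means.

    Simplification: identity covariances, so the MLLR objective
    reduces to occupancy-weighted least squares.

    means:       (G, D) speaker-independent Gaussian means
    adapt_means: (G, D) sample means of adaptation frames
                 aligned to each Gaussian
    gammas:      (G,)   occupancy counts from that alignment
    """
    G, D = means.shape
    X = np.hstack([means, np.ones((G, 1))])  # extended means [mu; 1]
    w = np.sqrt(gammas)[:, None]
    W = np.linalg.lstsq(X * w, adapt_means * w, rcond=None)[0]
    A, b = W[:D].T, W[D]
    return A, b

def apply_mllr(means, A, b):
    return means @ A.T + b
```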

360 citations

PatentDOI
TL;DR: The invention provides a method of large vocabulary speech recognition that employs a single tree-structured phonetic hidden Markov model (HMM) at each frame of a time-synchronous process, and phonetic context information is exploited, even before the complete context of a phoneme is known.
Abstract: The invention provides a method of large vocabulary speech recognition that employs a single tree-structured phonetic hidden Markov model (HMM) at each frame of a time-synchronous process. A grammar probability is utilized upon recognition of each phoneme of a word, before recognition of the entire word is complete. Thus, grammar probabilities are exploited as early as possible during recognition of a word. At each frame of the recognition process, a grammar probability is determined for the transition from the most likely preceding grammar state to a set of words that share at least one common phoneme. The grammar probability is combined with accumulating phonetic evidence to provide a measure of the likelihood that a state in the HMM will lead to the word most likely to have been spoken. In a preferred embodiment, phonetic context information is exploited, even before the complete context of a phoneme is known. Instead of an exact triphone model, wherein the phonemes previous and subsequent to a phoneme are considered, a composite triphone model is used that exploits partial phonetic context information to provide a phonetic model that is more accurate than a phonetic model that ignores context. In another preferred embodiment, the single phonetic tree method is used as the forward pass of a forward/backward recognition process, wherein the backward pass employs a recognition process other than the single phonetic tree method.
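The tree-structured lexicon shares acoustic evaluation across words with a common phoneme prefix, which is what lets grammar probabilities be applied before a word is complete. Below is a minimal sketch of building such a phonetic prefix tree from a pronunciation dictionary; the two entries are hypothetical:

```python
def build_phone_tree(lexicon):
    """Build a phonetic prefix tree from word pronunciations.

    Words sharing an initial phoneme sequence share one path, so
    their HMM state scores are computed once per frame instead of
    once per word. Each node maps a phone to a child node; the
    '#' key collects the words that end at that node.

    lexicon: dict mapping word -> list of phonemes.
    """
    root = {}
    for word, phones in lexicon.items():
        node = root
        for p in phones:
            node = node.setdefault(p, {})
        node.setdefault("#", []).append(word)
    return root

# Hypothetical entries: "six" and "sit" share the prefix /s ih/.
tree = build_phone_tree({"six": ["s", "ih", "k", "s"],
                         "sit": ["s", "ih", "t"]})
```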

278 citations


Network Information

Related Topics (5)

Noise: 110.4K papers, 1.3M citations, 73% related
Feature vector: 48.8K papers, 954.4K citations, 73% related
Feature extraction: 111.8K papers, 2.1M citations, 72% related
Recurrent neural network: 29.2K papers, 890K citations, 72% related
Convolutional neural network: 74.7K papers, 2M citations, 71% related
Performance Metrics

Number of papers on this topic in previous years:

Year    Papers
2021    10
2020    7
2019    22
2018    14
2017    9
2016    18