
Showing papers on "TIMIT published in 1992"


Journal ArticleDOI
TL;DR: In the approach described, the ANN outputs constitute the sequence of observation vectors for the HMM, and an algorithm is proposed for global optimization of all the parameters.
Abstract: The integration of multilayered and recurrent artificial neural networks (ANNs) with hidden Markov models (HMMs) is addressed. ANNs are suitable for approximating functions that compute new acoustic parameters, whereas HMMs have been proven successful at modeling the temporal structure of the speech signal. In the approach described, the ANN outputs constitute the sequence of observation vectors for the HMM. An algorithm is proposed for global optimization of all the parameters. Results on speaker-independent recognition experiments using this integrated ANN-HMM system on the TIMIT continuous speech database are reported.

234 citations
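The coupling described above, in which ANN outputs serve as the HMM's observation vectors, can be sketched as follows. This is a minimal illustration and not the authors' system: the one-layer network, the diagonal-Gaussian emissions, and all dimensions are invented placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def ann_features(frames, W, b):
    """Toy one-layer ANN: maps raw acoustic frames to observation vectors."""
    return np.tanh(frames @ W + b)

def hmm_log_forward(obs, log_pi, log_A, means, var):
    """Forward algorithm with diagonal-Gaussian emissions; returns log P(obs)."""
    T, d = obs.shape
    # log N(o_t | mean_s, var*I) for every frame/state pair
    diff = obs[:, None, :] - means[None, :, :]              # (T, S, d)
    log_b = -0.5 * (diff**2 / var).sum(-1) - 0.5 * d * np.log(2 * np.pi * var)
    alpha = log_pi + log_b[0]
    for t in range(1, T):
        alpha = log_b[t] + np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)
    return np.logaddexp.reduce(alpha)

# Toy setup: 5 raw dims -> 3-dim observations, scored by a 2-state HMM
frames = rng.normal(size=(20, 5))
W, b = rng.normal(size=(5, 3)), np.zeros(3)
obs = ann_features(frames, W, b)          # ANN outputs = HMM observations
log_pi = np.log([0.6, 0.4])
log_A = np.log([[0.9, 0.1], [0.2, 0.8]])
means = rng.normal(size=(2, 3))
ll = hmm_log_forward(obs, log_pi, log_A, means, var=1.0)
print(ll)  # finite log-likelihood
```

Global optimization in the paper jointly trains the ANN and HMM parameters; the sketch only shows the forward-pass coupling between the two.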


Journal ArticleDOI
TL;DR: A set of phonetic studies based on analysis of the TIMIT speech database detail new results in speaker-dependent variation due to sex and dialect region of the talker including effects on stop release frequency, speaking rate, vowel reduction, flapping, and the use of glottal stop.
Abstract: A set of phonetic studies based on analysis of the TIMIT speech database is presented. Using a database methodological approach, these studies detail new results in speaker‐dependent variation due to sex and dialect region of the talker including effects on stop release frequency, speaking rate, vowel reduction, flapping, and the use of glottal stop. TIMIT was found to be fertile ground for gathering acoustic–phonetic knowledge having relevance to the phonetic classification and recognition goals for which TIMIT was designed, as well as to the linguist attempting to describe regularity and variability in the pronunciation of read English speech.

94 citations


Journal ArticleDOI
TL;DR: A novel method for using the state sequence output of a large hidden Markov model as input to a phonemic recognition system demonstrates that a significant amount of speech information is preserved in the most likely state sequences produced by such a model.
Abstract: The authors present a novel method for using the state sequence output of a large hidden Markov model as input to a phonemic recognition system, thereby demonstrating that a significant amount of speech information is preserved in the most likely state sequences produced by such a model. Two different system formulations are presented, both achieving recognition results equivalent to those achieved by other researchers using systems with similar levels of complexity. The best system formulation achieved a 56.1% recognition rate with 10.8% insertions on a closed-set experiment and a 53.3% recognition rate with 11.8% insertions on a speaker-independent experiment using the TIMIT acoustic-phonetic database. This experiment used 80 male speakers for model training and a separate set of 24 male speakers for model testing.

15 citations
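Extracting the most likely state sequence from an HMM, as the system above does before feeding it to the phonemic recognizer, is the standard Viterbi algorithm. A minimal sketch over a made-up two-state, two-symbol discrete HMM (all probabilities invented):

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Most-likely state sequence for a discrete-emission HMM.

    log_pi: (S,) initial log-probs; log_A: (S, S) transition log-probs;
    log_B: (S, K) emission log-probs over K symbols; obs: symbol ids.
    """
    S, T = len(log_pi), len(obs)
    delta = log_pi + log_B[:, obs[0]]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A          # (from-state, to-state)
        back[t] = scores.argmax(axis=0)          # best predecessor per state
        delta = scores.max(axis=0) + log_B[:, obs[t]]
    states = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                # backtrack
        states.append(int(back[t, states[-1]]))
    return states[::-1]

log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.8, 0.2], [0.3, 0.7]])
log_B = np.log([[0.9, 0.1], [0.2, 0.8]])
path = viterbi(log_pi, log_A, log_B, [0, 0, 1, 1, 1])
print(path)  # -> [0, 0, 1, 1, 1]
```

In the paper's setting the decoded state ids themselves, rather than the acoustics, become the input stream for the downstream phonemic recognizer.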


Proceedings ArticleDOI
23 Mar 1992
TL;DR: The results show that the general ergodic background model is as effective as a vocabulary-specific model, however, the MC technique is not effective.
Abstract: Hidden Markov model (HMM) decomposition is used for recognizing speech in the presence of an interfering background speaker. The foreground speech is modeled by a set of left-to-right isolated word HMMs trained on a small isolated word database, and the background speech is modeled by a parallel ergodic HMM trained on a subset of TIMIT. The standard output approximation (OA) method of estimating the output probability distributions is used, and compared with a simple model combination (MC) technique. Recent work in this area has shown the effectiveness of vocabulary-specific background speech models, and hence this is used as a baseline. The results show that the general ergodic background model is as effective as a vocabulary-specific model. However, the MC technique is not effective.

14 citations
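HMM decomposition searches the product state space (i, j) of the foreground and background models. One common form of the output approximation scores a combined state by letting the louder source dominate the frame, i.e. taking the max of the two per-state log-likelihoods; a toy sketch with invented numbers, not necessarily the paper's exact formulation:

```python
import numpy as np

# Hypothetical per-state frame log-likelihoods for a 3-state foreground
# word model and a 2-state ergodic background model.
log_fg = np.log(np.array([0.6, 0.3, 0.1]))
log_bg = np.log(np.array([0.7, 0.3]))

# Output approximation (OA): combined emission score for product state (i, j)
# is dominated by whichever source is more likely to have produced the frame.
oa = np.maximum(log_fg[:, None], log_bg[None, :])        # (3, 2) grid

# A simple additive model-combination (MC) alternative sums the likelihoods.
mc = np.logaddexp(log_fg[:, None], log_bg[None, :])      # (3, 2) grid
print(oa.shape, mc.shape)
```

Decoding would then run Viterbi over this (3 x 2) product state space, with transitions factored into foreground and background components.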


Proceedings ArticleDOI
23 Feb 1992
TL;DR: Phonetic classification algorithms have been developed for wide-band and telephone quality speech, and were tested on subsets of the TIMIT and N-TIMIT databases, and the telephone network seems to increase the error rate.
Abstract: Benchmarking the performance of telephone-network-based speech recognition systems is hampered by two factors: lack of standardized databases for telephone network speech, and insufficient understanding of the impact of the telephone network on recognition systems. The N-TIMIT database was used in the experiments described in this paper in order to "calibrate" the effect of the telephone network on phonetic classification algorithms. Phonetic classification algorithms have been developed for wide-band and telephone quality speech, and were tested on subsets of the TIMIT and N-TIMIT databases. The classifier described in this paper provides accuracy of 75% on wide-band TIMIT data and 66.5% on telephone quality N-TIMIT data. Overall, the telephone network seems to increase the error rate by a factor of 1.3.

14 citations



01 Dec 1992
TL;DR: The resulting Vector Quantized (VQ) distortion based classification indicates the auditory model provides slightly reduced recognition in clean studio quality recordings yet achieves similar performance to the LPC cepstral representation in both degraded environments and in test data recorded over multiple sessions.
Abstract: The TIMIT and KING databases, as well as a ten-day AFIT speaker corpus, are used to compare proven spectral processing techniques to an auditory neural representation for speaker identification. The feature sets compared were Linear Predictive Coding (LPC) cepstral coefficients and auditory nerve firing rates using the Payton model. This auditory model accounts for the mechanisms found in the human middle and inner auditory periphery as well as neural transduction. Clustering algorithms were used to generate speaker-specific codebooks: one statistically based, the other neural. These algorithms are the Linde-Buzo-Gray (LBG) algorithm and a Kohonen self-organizing feature map (SOFM). The LBG algorithm consistently provided optimal codebook designs with correspondingly better classification rates. The resulting vector quantized (VQ) distortion-based classification indicates the auditory model provides slightly reduced recognition in clean studio-quality recordings (LPC 100%, Payton 90%), yet achieves similar performance to the LPC cepstral representation in both degraded environments (both 95%) and in test data recorded over multiple sessions (both over 98%). A variety of normalization techniques, preprocessing procedures, and classifier fusion methods were examined on this biologically motivated feature set. Keywords: speaker identification, auditory models, vector quantization, neural networks, user verification.

5 citations
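The LBG codebook training and VQ-distortion classification described above can be sketched roughly as follows. The features, codebook size, and two synthetic "speakers" are stand-ins, not the paper's setup:

```python
import numpy as np

def lbg_codebook(X, size, eps=0.01, iters=20):
    """Linde-Buzo-Gray: grow a codebook by splitting, refine with k-means."""
    codebook = X.mean(axis=0, keepdims=True)
    while len(codebook) < size:
        # split every codeword into a perturbed pair, then re-estimate
        codebook = np.vstack([codebook * (1 + eps), codebook * (1 - eps)])
        for _ in range(iters):
            d = ((X[:, None] - codebook[None]) ** 2).sum(-1)
            labels = d.argmin(axis=1)
            for k in range(len(codebook)):
                if (labels == k).any():
                    codebook[k] = X[labels == k].mean(axis=0)
    return codebook

def vq_distortion(X, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    d = ((X[:, None] - codebook[None]) ** 2).sum(-1)
    return d.min(axis=1).mean()

rng = np.random.default_rng(1)
# Two fake "speakers" with well-separated feature distributions
spk_a = rng.normal(1.0, 1.0, size=(200, 4))
spk_b = rng.normal(4.0, 1.0, size=(200, 4))
books = {"A": lbg_codebook(spk_a, 8), "B": lbg_codebook(spk_b, 8)}

test = rng.normal(1.0, 1.0, size=(50, 4))  # unseen frames from speaker A
best = min(books, key=lambda s: vq_distortion(test, books[s]))
print(best)  # -> A
```

Identification picks the speaker whose codebook yields the lowest average quantization distortion on the test frames, regardless of whether the frames are LPC cepstra or auditory-model firing rates.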


Proceedings Article
01 Jan 1992
TL;DR: This work uses a very detailed biologically motivated input representation of the speech tokens, Lyon's cochlear model as implemented by Slaney [20], to produce results comparable to those obtained by others without the addition of time normalization.
Abstract: We report results on vowel and stop consonant recognition with tokens extracted from the TIMIT database. Our current system differs from others doing similar tasks in that we do not use any specific time normalization techniques. We use a very detailed biologically motivated input representation of the speech tokens: Lyon's cochlear model as implemented by Slaney [20]. This detailed, high-dimensional representation, known as a cochleagram, is classified by either a back-propagation or a hybrid supervised/unsupervised neural network classifier. The hybrid network is composed of a biologically motivated unsupervised network and a supervised back-propagation network. This approach produces results comparable to those obtained by others without the addition of time normalization.

3 citations


Proceedings ArticleDOI
23 Mar 1992
TL;DR: Experimental results indicate that, apart from a rather mild limitation of SM in handling a certain type of vocabulary, SM actually performs better than baseline continuous hidden Markov models (CHMM) in terms of recognition rate as far as isolated word recognition is concerned, and it takes only 60% of the time needed by CHMM in recognition.
Abstract: A static model (SM) in the form of a single vector is proposed to represent the temporal properties of a sequence of speech feature vectors. In contrast to a hidden Markov model, which captures the conditional probabilities of state transitions between consecutive observations x_t and x_{t+1} over time, an SM captures their average joint probabilities of belonging to a pair of phonetic classes ω_i and ω_j without any Markovian assumption. SM is tested with isolated words derived from the TIMIT database as well as artificially created words. The vocabulary is a subset of TIMIT consisting of 21 words derived from the two 'sa' sentences spoken by 420 speakers. The artificial vocabulary of 10 words is designed to study the limitations of SM. Experimental results indicate that, apart from a rather mild limitation of SM in handling a certain type of vocabulary, SM actually performs better than baseline continuous hidden Markov models (CHMM) in terms of recognition rate as far as isolated word recognition is concerned, and it takes only 60% of the time needed by CHMM in recognition.

1 citation
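One way to realize the SM idea, collapsing a whole utterance into a single fixed-size vector of average joint class-membership probabilities over consecutive frames, is sketched below. The per-frame class posteriors are hypothetical random data, not output of the paper's front end:

```python
import numpy as np

def static_model(posteriors):
    """Collapse a (T, C) sequence of per-frame class posteriors into one
    fixed-size C*C vector: the average joint probability that consecutive
    frames belong to class pair (i, j), with no Markovian assumption."""
    p_t, p_next = posteriors[:-1], posteriors[1:]
    # average over time of the outer products p(ω_i | x_t) p(ω_j | x_{t+1})
    joint = np.einsum('ti,tj->ij', p_t, p_next) / (len(posteriors) - 1)
    return joint.ravel()

rng = np.random.default_rng(2)
post = rng.random((30, 4))
post /= post.sum(axis=1, keepdims=True)   # normalize each frame over 4 classes
v = static_model(post)
print(v.shape, v.sum())  # 16-dim vector whose entries sum to ~1
```

Because the representation has fixed size regardless of utterance length, a word can be classified with a single vector comparison instead of a sequence alignment, which is consistent with the reported speed advantage over CHMM decoding.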