Topic

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
26 Feb 1997
TL;DR: In this paper, an adaptation technique is presented which exploits the inter/intra-speaker vowel phoneme variations with respect to the tongue-hump position within the oral cavity; adaptation is concentrated primarily on speaker characteristics, since speaker information is comparable within each tongue-hump-position area.
Abstract: In this paper we present an adaptation technique which exploits the inter/intra-speaker vowel phoneme variations with respect to the tongue-hump position within the oral cavity. The 13 vowels of American English can be classified into three areas according to the tongue-hump position. The vowels, taken from the DARPA TIMIT phonetic database, in each of these areas are classified using one-class-in-one-network (OCON) feed-forward subnets, similar to those proposed by Kung et al. (1995) and Jou et al. (1991), joined by a common front-end adaptation layer. This allows adaptation to be concentrated primarily on speaker characteristics, since speaker information is comparable within these areas, allowing adaptation towards a single phoneme to improve recognition of other vowel phonemes within the same network. This reduces the need for total vowel recital for complete vowel phoneme adaptation towards a new speaker. Results show increases of over 12% in the recognition rates of vowel phonemes after adaptation towards other phonemes in the same tongue-hump-position area. However, vowels that are well separated in the same group have little, even negative, effect on recognition after adaptation.
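The OCON layout the abstract describes can be sketched as per-vowel subnets sharing a common front-end adaptation layer, so re-training that single layer for one vowel can help the others. A minimal illustrative sketch follows; the classes, templates, and scoring rule are stand-ins, not the paper's actual networks.

```python
# Hedged sketch of a shared front-end adaptation layer feeding
# one-class-in-one-network (OCON) style per-vowel subnets.
# All names and values below are illustrative assumptions.

class SharedFrontEnd:
    """The common adaptation layer: a per-dimension scaling of the input."""
    def __init__(self, weights):
        self.weights = weights            # adapted towards a new speaker
    def __call__(self, frame):
        return [w * f for w, f in zip(self.weights, frame)]

class VowelSubnet:
    """Stand-in for one trained feed-forward subnet (one per vowel)."""
    def __init__(self, template):
        self.template = template
    def score(self, features):
        # negative squared distance as a toy similarity score
        return -sum((t - f) ** 2 for t, f in zip(self.template, features))

front_end = SharedFrontEnd([1.0, 1.0])
subnets = {"iy": VowelSubnet([0.9, 0.1]),
           "aa": VowelSubnet([0.1, 0.9])}

def classify(frame):
    features = front_end(frame)           # shared layer runs first
    return max(subnets, key=lambda v: subnets[v].score(features))

print(classify([0.8, 0.2]))  # -> iy
```

Because every subnet consumes the output of the same front-end layer, updating that layer's weights for one vowel changes the features seen by all of them, which is the sharing effect the paper exploits.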

2 citations

01 Jan 1997
TL;DR: This paper presents methods and experimental results for phonetic classification using 39 phone classes and the NIST-recommended training and test sets for NTIMIT and TIMIT; the best results obtained compare favorably to the best reported in the literature for this task.
Abstract: This paper presents methods and experimental results for phonetic classification using 39 phone classes and the NIST-recommended training and test sets for NTIMIT and TIMIT. Spectro-temporal features which represent the smoothed trajectory of FFT-derived speech spectra over 300 ms intervals are used for the analysis. Classification tests are made with both a binary-pair partitioned (BPP) neural network system (one neural network for each of the 741 pairs of phones) and a single large neural network. Classification accuracy is very similar for the two types of networks, but the BPP method has the advantage of much less training time. The best results obtained (77% for TIMIT and 67.4% for NTIMIT) compare favorably to the best results reported in the literature for this task.
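The binary-pair partitioned scheme above trains one small binary decider per pair of classes (39 classes give 39·38/2 = 741 pairs) and combines them by majority vote. A minimal sketch, with a toy stand-in for the pairwise networks:

```python
from itertools import combinations

# Hedged sketch of binary-pair partitioned (BPP) classification:
# one binary decider per pair of classes, final label by majority vote.
# The real system uses 741 pairwise neural networks; here the pairwise
# "network" is an illustrative stand-in function.

def bpp_classify(x, classes, pairwise_decide):
    votes = {c: 0 for c in classes}
    for a, b in combinations(classes, 2):
        winner = pairwise_decide(x, a, b)   # returns a or b
        votes[winner] += 1
    return max(votes, key=votes.get)

# Toy pairwise decider: pick the class whose index is nearer to x.
classes = list(range(5))
decide = lambda x, a, b: a if abs(x - a) <= abs(x - b) else b
print(bpp_classify(2.2, classes, decide))  # -> 2
```

The training-time advantage the abstract mentions follows from each pairwise network only needing to separate two phone classes, so each one can be small and trained on a small subset of the data.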

2 citations

Proceedings ArticleDOI
17 Nov 2012
TL;DR: Experimental results show that the proposed Fisher discrimination dictionary learning method outperforms the Sparse Representation Classifier for text-independent speaker recognition in both clean and noisy conditions.
Abstract: In recent decades, text-independent speaker recognition has been a hot research topic that has attracted many researchers. In this paper, we propose to apply the Fisher discrimination dictionary learning method to text-independent speaker recognition. The feature used in classification is the Gaussian Mixture Model supervector. The proposed method is evaluated on the publicly available TIMIT dataset. Experimental results show that the proposed method outperforms the Sparse Representation Classifier for text-independent speaker recognition in both clean and noisy conditions.

2 citations

Proceedings ArticleDOI
14 May 2006
TL;DR: An adaptation technique for ANNs is presented that, similar to the framework of MAP estimation, exploits prior information in the adaptation process, which is particularly useful for dealing with the problem of sparse training data.
Abstract: Many techniques for speaker or channel adaptation have been successfully applied to automatic speech recognition. Most of these techniques have been proposed for the adaptation of Hidden Markov Models (HMMs). Far fewer proposals have been made for the adaptation of the Artificial Neural Networks (ANNs) used in the hybrid HMM-ANN approach. This paper presents an adaptation technique for ANNs that, similar to the framework of MAP estimation, exploits prior information in the adaptation process, which is particularly useful for dealing with the problem of sparse training data. We show that the integration of a priori information can be achieved simply by linear interpolation of the weights of an "a priori" network and of a speaker-specific network. Good improvements over the baseline results are reported when evaluating this technique on the Wall Street Journal WSJ0 and WSJ1 databases and on the TIMIT corpus using different amounts of adaptation data.
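The weight interpolation the abstract describes reduces to a convex blend of the two networks' parameters. A minimal sketch, assuming flattened weight vectors and an interpolation coefficient `lam` (the symbol and function name are illustrative, not the paper's notation):

```python
# Hedged sketch: linear interpolation between the weights of an
# "a priori" (speaker-independent) network and a speaker-specific
# network, as the abstract describes. Values are illustrative.

def interpolate_weights(prior_w, speaker_w, lam=0.5):
    """Blend weights: w = lam * prior + (1 - lam) * speaker."""
    assert len(prior_w) == len(speaker_w)
    return [lam * p + (1.0 - lam) * s for p, s in zip(prior_w, speaker_w)]

prior = [0.2, -0.4, 0.8]      # weights of the speaker-independent net
speaker = [0.6, 0.0, 0.4]     # weights after adapting to one speaker
print(interpolate_weights(prior, speaker, lam=0.25))
```

With little adaptation data one would lean towards the prior network (larger `lam`); with more data, towards the speaker-specific one, which mirrors how MAP estimation balances prior and observed evidence.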

2 citations

Posted Content
TL;DR: An objective critical distance (OCD) is defined as the spacing between adjacent formants at which the level of the valley between them reaches the mean spectral level; the valley level serves a purpose similar to formant spacing, with the added advantage that it can be measured from the spectral envelope without explicit knowledge of formant frequencies.
Abstract: An objective critical distance (OCD) has been defined as that spacing between adjacent formants at which the level of the valley between them reaches the mean spectral level. The measured OCD lies in the same range (viz., 3-3.5 bark) as the critical distance determined by subjective experiments under similar experimental conditions. The level of a spectral valley serves a purpose similar to that of the spacing between the formants, with the added advantage that it can be measured from the spectral envelope without explicit knowledge of formant frequencies. Based on the relative spacing of formant frequencies, the level of the spectral valley VI (between F1 and F2) is much higher than the level of VII (the spectral valley between F2 and F3) for back vowels, and vice versa for front vowels. Classification of vowels into the front/back distinction with the difference (VI-VII) as an acoustic feature, tested using the TIMIT, NTIMIT, Tamil, and Kannada language databases, gives, on average, an accuracy of about 95%, which is comparable to the accuracy (90.6%) obtained using a neural network classifier trained and tested on MFCC feature vectors for the TIMIT database. The acoustic feature (VI-VII) has also been tested for robustness on the TIMIT database with additive white and babble noise, and an accuracy of about 95% has been obtained for SNRs down to 25 dB for both types of noise.
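Given the two valley levels, the front/back decision described above reduces to thresholding their difference. A minimal sketch, assuming valley levels in dB and a zero threshold (the threshold value and function name are illustrative choices, not taken from the paper):

```python
# Hedged sketch of the front/back decision with the (VI - VII) feature:
# VI is the spectral-valley level between F1 and F2, VII between F2 and F3.
# Per the abstract, VI >> VII for back vowels and the reverse for front
# vowels; thresholding the difference at 0 dB is an illustrative choice.

def front_or_back(valley_f1_f2_db, valley_f2_f3_db):
    return "back" if valley_f1_f2_db - valley_f2_f3_db > 0 else "front"

print(front_or_back(-10.0, -30.0))  # higher VI  -> "back"
print(front_or_back(-35.0, -12.0))  # higher VII -> "front"
```

The appeal of the feature is that both valley levels are read directly off the spectral envelope, so no formant tracker is needed.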

2 citations


Network Information
Related Topics (5)
Recurrent neural network
29.2K papers, 890K citations
76% related
Feature (machine learning)
33.9K papers, 798.7K citations
75% related
Feature vector
48.8K papers, 954.4K citations
74% related
Natural language
31.1K papers, 806.8K citations
73% related
Deep learning
79.8K papers, 2.1M citations
72% related
Performance
Metrics
No. of papers in the topic in previous years

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95