Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published on this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings Article
23 Mar 1992
TL;DR: The results show that the general ergodic background model is as effective as a vocabulary-specific model; however, the MC technique is not effective.
Abstract: Hidden Markov model (HMM) decomposition is used for recognizing speech in the presence of an interfering background speaker. The foreground speech is modeled by a set of left-to-right isolated word HMMs trained on a small isolated word database, and the background speech is modeled by a parallel ergodic HMM trained on a subset of TIMIT. The standard output approximation (OA) method of estimating the output probability distributions is used and compared with a simple model combination (MC) technique. Recent work in this area has shown the effectiveness of vocabulary-specific background speech models, and hence this is used as a baseline. The results show that the general ergodic background model is as effective as a vocabulary-specific model. However, the MC technique is not effective.

14 citations

Journal Article
TL;DR: A statistical approach to height estimation that incorporates acoustic models within a non-uniform height bin width Gaussian mixture model structure as well as a formant analysis approach that employs linear regression on selected phones are presented.
Abstract: Estimating speaker height can assist in voice forensic analysis and provide additional side knowledge to benefit automatic speaker identification or acoustic model selection for automatic speech recognition. In this study, a statistical approach to height estimation that incorporates acoustic models within a non-uniform height bin width Gaussian mixture model structure as well as a formant analysis approach that employs linear regression on selected phones are presented. The accuracy and trade-offs of these systems are explored by examining the consistency of the results, location, and causes of error as well as a combined fusion of the two systems using data from the TIMIT corpus. Open set testing is also presented using the Multi-session Audio Research Project corpus and publicly available YouTube audio to examine the effect of channel mismatch between training and testing data and provide a realistic open domain testing scenario. The proposed algorithms achieve a highly competitive performance to previously published literature. Although the different data partitioning in the literature and this study may prevent performance comparisons in absolute terms, the mean average error of 4.89 cm for males and 4.55 cm for females provided by the proposed algorithm on TIMIT utterances containing selected phones suggests a considerable estimation error decrease compared to past efforts.

14 citations

Journal Article
TL;DR: A semi-fragile and blind digital speech watermarking technique for online speaker recognition systems based on the discrete wavelet packet transform and quantization index modulation has been proposed that enables embedding of the watermark within an angle of the wavelet’s sub-bands.
Abstract: In this paper, a semi-fragile and blind digital speech watermarking technique for online speaker recognition systems based on the discrete wavelet packet transform (DWPT) and quantization index modulation (QIM) is proposed that enables embedding of the watermark within an angle of the wavelet’s sub-bands. To minimize the degradation effects of the watermark, these sub-bands were selected from frequency ranges where little speaker-specific information was available (500–3500 Hz and 6000–7000 Hz). Experimental results on the TIMIT, MIT, and MOBIO speech databases show that the degradation results for speaker verification and identification are 0.39% and 0.97%, respectively, which are negligible. In addition, the proposed watermark technique can provide the appropriate fragility required for different signal processing operations.

14 citations
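
The quantization index modulation (QIM) step named in the watermarking abstract above can be illustrated with a minimal scalar sketch. This is not the paper's method: the paper embeds the watermark in an angle of DWPT sub-bands, which is not reproduced here. The function names `qim_embed` and `qim_extract` and the step size `delta` are hypothetical choices made only for illustration.

```python
def qim_embed(value, bit, delta):
    """Quantize `value` onto the lattice that encodes `bit` (0 or 1).

    Bit 0 uses multiples of delta; bit 1 uses the same lattice shifted by delta/2.
    """
    offset = bit * delta / 2.0
    return round((value - offset) / delta) * delta + offset


def qim_extract(value, delta):
    """Return the bit whose lattice lies closest to `value` (blind extraction)."""
    dist0 = abs(value - qim_embed(value, 0, delta))
    dist1 = abs(value - qim_embed(value, 1, delta))
    return 0 if dist0 <= dist1 else 1


if __name__ == "__main__":
    delta = 0.1  # illustrative step size, not taken from the paper
    marked = qim_embed(0.337, 1, delta)
    print(qim_extract(marked, delta))         # bit read back with no distortion
    print(qim_extract(marked + 0.02, delta))  # small perturbation (below delta/4)
    print(qim_extract(marked + 0.05, delta))  # larger perturbation (delta/2) flips the bit
```

In this scalar form the step size sets the trade-off: embedding changes a value by at most `delta/2`, and the bit survives perturbations smaller than `delta/4`. Extraction needs only `delta`, not the original signal, which is what "blind" refers to.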

Proceedings Article
01 Dec 2015
TL;DR: A bounded generalized Gaussian mixture model (BGGMM) using independent component analysis (ICA) is proposed and applied to an existing unsupervised keyword spotting setting for the generation of posteriorgrams. Keyword detection results demonstrate the viability and effectiveness of the proposed algorithm in an unsupervised keyword spotting framework.
Abstract: In this paper, a bounded generalized Gaussian mixture model (BGGMM) using independent component analysis (ICA) is proposed and applied to an existing unsupervised keyword spotting setting for the generation of posteriorgrams. The ICA mixture model is trained without any transcription information to generate the posteriorgrams which further label the speech frames of the keyword example(s) and test data. To detect occurrences of a specific keyword in the test data, the posteriorgrams of one or more keyword examples are compared with the posteriorgrams of test utterances using segmental dynamic time warping (DTW). A score fusion method is used to obtain the result of the keyword detection by ranking the distortion scores of all the test utterances. The TIMIT speech corpus is used for the evaluation of this unsupervised keyword spotting setting. The keyword detection results demonstrate the viability and effectiveness of the proposed algorithm in an unsupervised keyword spotting framework.

14 citations
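
The posteriorgram comparison described in the keyword spotting abstract above can be sketched as follows. This is a simplified illustration under stated assumptions: it uses plain whole-sequence DTW rather than the segmental DTW the paper names, the frame distance (negative log of the inner product) is an assumed choice that the abstract does not specify, and `frame_distance` and `dtw_cost` are hypothetical names. A posteriorgram is represented here as a list of per-frame probability vectors.

```python
import math


def frame_distance(p, q, eps=1e-12):
    """Negative log of the inner product between two posterior vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))


def dtw_cost(query, test):
    """Length-normalized DTW alignment cost between two posteriorgrams."""
    n, m = len(query), len(test)
    inf = float("inf")
    # acc[i][j] = best accumulated cost aligning query[:i] with test[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], test[j - 1])
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


if __name__ == "__main__":
    # Toy two-class posteriorgrams, invented for illustration only.
    keyword = [[0.9, 0.1], [0.1, 0.9]]
    utterances = {
        "similar": [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]],
        "different": [[0.1, 0.9], [0.1, 0.9], [0.9, 0.1]],
    }
    # Rank test utterances by distortion score, lowest (best match) first.
    ranking = sorted(utterances, key=lambda name: dtw_cost(keyword, utterances[name]))
    print(ranking)
```

Ranking utterances by ascending cost mirrors the abstract's description of ranking distortion scores; fusing scores from several keyword examples is not shown.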

Proceedings Article
22 May 2011
TL;DR: This work creates a new dictionary which is a function of the phonetic labels of the original dictionary, and refers to these new features as Spif, which allow for a 2.9% relative reduction in Phonetic Error Rate on the TIMIT phonetic recognition task and a 4.8% relative improvement on a large vocabulary 50 hour Broadcast News task.
Abstract: Exemplar-based techniques, such as k-nearest neighbors (kNNs) and Sparse Representations (SRs), can be used to model a test sample from a few training points in a dictionary set. In past work, we have shown that using a SR approach for phonetic classification allows for a higher accuracy than other classification techniques. These phones are the basic units of speech to be recognized. Motivated by this result, we create a new dictionary which is a function of the phonetic labels of the original dictionary. The SR method now selects relevant samples from this new dictionary to create a new feature representation of the test sample, where the new feature is better linked to the actual units to be recognized. We will refer to these new features as Spif. We present results using these new Spif features in a Hidden Markov Model (HMM) framework for speech recognition. We find that the Spif features allow for a 2.9% relative reduction in Phonetic Error Rate (PER) on the TIMIT phonetic recognition task. Furthermore, we find that the Spif features allow for a 4.8% relative improvement in Word Error Rate (WER) on a large vocabulary 50 hour Broadcast News task.

14 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95