scispace - formally typeset

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
TL;DR: In this article, a feature parameter is obtained by applying the Teager energy to the WPD (wavelet packet decomposition) coefficients, and a threshold value is derived from the means and standard deviations of nonspeech frames.
Abstract: In this paper, a feature parameter is obtained by applying the Teager energy to the WPD (wavelet packet decomposition) coefficients. The threshold value is derived from the means and standard deviations of nonspeech frames. Experimental results using the TIMIT speech and NOISEX-92 noise databases show that the proposed algorithm is superior to a typical VAD algorithm. ROC (receiver operating characteristic) curves are used to compare the performance of the VADs for SNR values ranging from 10 to -10 dB.

1 citations

Journal ArticleDOI
TL;DR: This paper investigates two novel lattice-constrained Viterbi training strategies for improving sub-word unit (SWU) inventories discovered using an unsupervised sparse coding approach, finding that this lightly supervised approach substantially increases correspondence with the reference phonemes and, in this case, also improves pronunciation consistency.

1 citations

Dissertation
01 Jan 2009
TL;DR: In this paper, the authors use Fisher's F-ratio to identify the frequency regions carrying the most speaker-discriminative information and to suppress the phonetic information in the speech.
Abstract: This Master's thesis presents an investigation of the features and models used when constructing a robust speaker identification system on the TIMIT speaker database. Investigations of the k-means clustering algorithm and Gaussian mixture models (GMMs) for speaker modelling show an improvement in the identification rate when using the GMM speaker models. Features for speaker identification should emphasize the individual differences in the speech while suppressing the phonetic information; the exact opposite holds for features used in speech recognition. However, the same features, the MFCCs, have been used for both tasks. Using Fisher's F-ratio to measure which frequency regions contain the most discriminative speaker information, we present a new set of features, the FRFCCs. They emphasize the regions with speaker-discriminative information and suppress the phonetic information in the speech. Fisher's F-ratio shows that the regions around the fundamental frequency (100 Hz) and the third (2500 Hz) and fourth (3500 Hz) formants carry substantial speaker information, while the region around the first formant (500 Hz) contains only phonetic information. By adding noise to the TIMIT database, we show that the FRFCC features yield a better and more robust automatic speaker identification system. Finally, testing on speech from Danish TV, we show that using the FRFCCs instead of the MFCCs gives an improvement of 91%.
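Fisher's F-ratio as used in this thesis ranks a frequency band by the ratio of between-speaker variance to average within-speaker variance of the band's energy. A minimal sketch, with an assumed data layout and illustrative names not taken from the thesis:

```python
# Hypothetical sketch of Fisher's F-ratio for one frequency band:
# variance of per-speaker mean energies divided by the average
# within-speaker variance. A high ratio means the band separates
# speakers well; a low ratio means it mostly carries phonetic variation.
import statistics

def f_ratio(band_energies_by_speaker):
    # band_energies_by_speaker: {speaker_id: [band energy per utterance, ...]}
    speaker_means = {s: statistics.mean(v)
                     for s, v in band_energies_by_speaker.items()}
    between = statistics.pvariance(list(speaker_means.values()))
    within = statistics.mean(
        statistics.pvariance(v) for v in band_energies_by_speaker.values()
    )
    return between / within if within > 0 else float("inf")
```

Computing this ratio per subband and weighting the filterbank accordingly is one way to arrive at features that, like the FRFCCs described above, emphasize speaker-discriminative regions.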

1 citations

Posted Content
TL;DR: In this article, the authors studied the problem of acoustic feature learning in the setting where they have access to an external, domain-mismatched dataset of paired speech and articulatory measurements, either with or without labels.
Abstract: Previous work has shown that it is possible to improve speech recognition by learning acoustic features from paired acoustic-articulatory data, for example by using canonical correlation analysis (CCA) or its deep extensions. One limitation of this prior work is that the learned feature models are difficult to port to new datasets or domains, and articulatory data is not available for most speech corpora. In this work we study the problem of acoustic feature learning in the setting where we have access to an external, domain-mismatched dataset of paired speech and articulatory measurements, either with or without labels. We develop methods for acoustic feature learning in these settings, based on deep variational CCA and extensions that use both source and target domain data and labels. Using this approach, we improve phonetic recognition accuracies on both TIMIT and Wall Street Journal and analyze a number of design choices.

1 citations

Journal ArticleDOI
TL;DR: It is suggested that cepstral coefficients are able to model speech in a given environment in finer detail, whereas acoustic phonetic-based features are more robust to changes in environment, so that combining both types of measurements leads to the best performance.
Abstract: This work classifies voiceless stop consonant place in CV tokens of English using burst release cues for clean (TIMIT) and telephone speech (NTIMIT). We compared the performance of cepstral coefficients to acoustic phonetics-motivated features such as center of gravity, burst amplitude, and the relative difference of formant amplitudes. In clean speech, cepstral coefficients resulted in better classification. However, for test data from NTIMIT, acoustic phonetic-based features outperformed cepstral coefficients, particularly if models were trained on clean speech. In addition, augmenting cepstral coefficients with acoustic phonetic-based measurements resulted in the best performance. These findings suggest that cepstral coefficients are able to model speech in a given environment in finer detail, whereas acoustic phonetic-based features are more robust to changes in environment, so that combining both types of measurements leads to the best performance.

1 citations


Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations, 76% related
- Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
- Feature vector: 48.8K papers, 954.4K citations, 74% related
- Natural language: 31.1K papers, 806.8K citations, 73% related
- Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  24
2022  62
2021  67
2020  86
2019  77
2018  95