Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Posted Content
TL;DR: A deep time delay neural network (TDNN) for speech enhancement with full data learning; its modular and incremental design gives it excellent potential for capturing long-range temporal contexts while preserving a feed-forward structure.
Abstract: Recurrent neural networks (RNNs) have shown significant improvements for speech enhancement in recent years. However, the model complexity and inference time of RNNs are much higher than those of deep feed-forward neural networks (DNNs), which limits their use in speech enhancement applications. This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning. The TDNN, which uses a modular and incremental design, has excellent potential for capturing long-range temporal contexts. It also preserves the feed-forward structure, so its inference cost is comparable to that of a standard DNN. To make full use of the training data, we propose a full data learning method for speech enhancement: we use not only noisy-to-clean (input-to-target) data to train the enhancement model, but also clean-to-clean and noise-to-silence data, so that all of the training data can be used. Our experiments are conducted on the TIMIT dataset. Experimental results show that the proposed method achieves better performance than a DNN and comparable or even better performance than a BLSTM, while drastically reducing inference time compared with the BLSTM.
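As a rough illustration of the ideas above, the sketch below builds a feed-forward TDNN from stacked dilated 1-D convolutions and forms the three kinds of training pairs used in full data learning (noisy-to-clean, clean-to-clean, noise-to-silence). Layer sizes, feature dimensions, and names are assumptions for illustration, not the paper's actual configuration.

import torch
import torch.nn as nn

class TDNNEnhancer(nn.Module):
    """Feed-forward TDNN: stacked dilated 1-D convolutions over feature frames."""
    def __init__(self, feat_dim=257, hidden=512, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers, in_ch = [], feat_dim
        for d in dilations:
            # 3-frame context per layer, widened by the dilation factor
            layers += [nn.Conv1d(in_ch, hidden, kernel_size=3, dilation=d, padding=d),
                       nn.ReLU()]
            in_ch = hidden
        self.backbone = nn.Sequential(*layers)
        self.head = nn.Conv1d(hidden, feat_dim, kernel_size=1)  # per-frame spectral target

    def forward(self, noisy_feats):                 # (batch, feat_dim, frames)
        return self.head(self.backbone(noisy_feats))

def full_data_batches(noisy, clean, noise):
    # "Full data learning": besides noisy->clean pairs, also train on
    # clean->clean and noise->silence pairs with the same loss.
    silence = torch.zeros_like(noise)
    return [(noisy, clean), (clean, clean), (noise, silence)]

model, loss_fn = TDNNEnhancer(), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for inputs, targets in full_data_batches(torch.randn(8, 257, 100),
                                         torch.randn(8, 257, 100),
                                         torch.randn(8, 257, 100)):
    opt.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    opt.step()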

2 citations

Posted Content
TL;DR: The baseline multilayer-TDNN architecture is replaced with QuartzNet, a convolutional architecture that has found success in speech recognition; a two-stage transfer learning scheme is proposed, utilizing the large-scale speech datasets VoxCeleb and Common Voice; and multitask learning allows joint age estimation and gender classification with a single system.
Abstract: In this paper we extend the x-vector framework to the tasks of speaker age estimation and gender classification. In particular, we replace the baseline multilayer-TDNN architecture with QuartzNet, a convolutional architecture that has found success in the field of speech recognition. We further propose a two-stage transfer learning scheme utilizing large-scale speech datasets, VoxCeleb and Common Voice, and use multitask learning to allow joint age estimation and gender classification with a single system. We train and evaluate on the TIMIT dataset. The proposed transfer learning scheme yields consecutive performance improvements in terms of both age estimation error and gender classification accuracy. The best-performing system achieves new state-of-the-art results for age estimation on the TIMIT TEST set, with an MAE of 5.12 and 5.29 years and an RMSE of 7.24 and 8.12 years for male and female speakers respectively, while maintaining a gender classification accuracy of 99.6%.
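The multitask part of this setup can be sketched as a shared speaker embedding feeding two heads, one regressing age and one classifying gender, trained with a combined loss. The QuartzNet/x-vector encoder is not reproduced here; the embedding size, head widths, and loss weighting below are illustrative assumptions, not the paper's settings.

import torch
import torch.nn as nn

class AgeGenderHead(nn.Module):
    """Two task heads on top of a shared speaker embedding."""
    def __init__(self, emb_dim=512):
        super().__init__()
        self.age = nn.Sequential(nn.Linear(emb_dim, 128), nn.ReLU(), nn.Linear(128, 1))
        self.gender = nn.Linear(emb_dim, 2)

    def forward(self, emb):
        return self.age(emb).squeeze(-1), self.gender(emb)

head = AgeGenderHead()
emb = torch.randn(16, 512)                  # embeddings from a pretrained encoder (assumed)
age_true = torch.rand(16) * 60 + 18         # dummy ages in years
gender_true = torch.randint(0, 2, (16,))    # dummy gender labels

age_pred, gender_logits = head(emb)
# Joint objective: age regression (MAE) plus weighted gender cross-entropy.
loss = nn.L1Loss()(age_pred, age_true) + 0.5 * nn.CrossEntropyLoss()(gender_logits, gender_true)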

2 citations

Journal Article
TL;DR: In this paper, the authors analyzed the spectral content of popular speech corpora used for speech perception research to highlight the potential shortcomings of using bandlimited speech materials. They provide an overview of the phenomena potentially missed when using bandlimited speech signals and the factors to consider when selecting stimuli that are sensitive to these effects.
Abstract: The use of spectrally degraded speech signals deprives listeners of acoustic information that is useful for speech perception. Several popular speech corpora, recorded decades ago, have spectral degradations, including limited extended high-frequency (EHF) (>8 kHz) content. Although frequency content above 8 kHz is often assumed to play little or no role in speech perception, recent research suggests that EHF content in speech can have a significant beneficial impact on speech perception under a wide range of natural listening conditions. This paper provides an analysis of the spectral content of popular speech corpora used for speech perception research to highlight the potential shortcomings of using bandlimited speech materials. Two corpora analyzed here, the TIMIT and NU-6, have substantial low-frequency spectral degradation (<500 Hz) in addition to EHF degradation. We provide an overview of the phenomena potentially missed by using bandlimited speech signals, and the factors to consider when selecting stimuli that are sensitive to these effects.
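A simple way to run the kind of spectral check described here is to estimate a long-term average spectrum and compare band levels below 500 Hz, between 0.5 and 8 kHz, and above 8 kHz. The sketch below uses Welch's method; the file path is hypothetical and the band edges simply mirror those discussed in the abstract.

import numpy as np
import soundfile as sf
from scipy.signal import welch

# Placeholder path; substitute any utterance from the corpus under inspection.
audio, fs = sf.read("corpus/speaker1/utt1.wav")
freqs, psd = welch(audio, fs=fs, nperseg=4096)

def band_level_db(lo, hi):
    # Average power in [lo, hi) Hz, in dB; -inf if the band is empty
    # (e.g., >8 kHz for a 16 kHz recording).
    band = psd[(freqs >= lo) & (freqs < hi)]
    return 10 * np.log10(band.mean() + 1e-12) if band.size else float("-inf")

print("level < 500 Hz  :", band_level_db(0, 500), "dB")
print("level 0.5-8 kHz :", band_level_db(500, 8000), "dB")
print("level > 8 kHz   :", band_level_db(8000, fs / 2), "dB")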

2 citations

Proceedings Article
01 Jan 2006
TL;DR: It is shown that a considerable improvement in recognition performance can be achieved if the baseforms are selected properly, and the preliminary experiments carried out on the TIMIT speech corpus show a considerable improvement in recognition performance over pure monophone/triphone-based systems when the larger-sized units are combined using proper selection of baseforms.
Abstract: A longer-sized sub-word unit is known to be a better candidate in the development of a continuous speech recognition system. However, the basic problem with such units is data sparsity. To overcome this problem, researchers have tried to combine longer-sized sub-word unit models with phoneme models. In this paper, we consider only frequently occurring syllables and VC (Vowel + Consonant) units, together with phone-sized units (monophones and triphones), for the development of a continuous speech recognition system. In such a case, even for a single pronunciation of a word, there can be multiple representational baseforms in the lexicon, each with different-sized units. We show that a considerable improvement in recognition performance can be achieved if the baseforms are selected properly. Out of all possible baseforms for a given word in the lexicon, only the baseform that maximizes the acoustic likelihood, over the possible sub-word unit concatenations making up the word, is retained. Since, in the word-lexicon of a baseline system such as a pure monophone- or triphone-based system, only the acoustically weaker baseforms are replaced by baseforms with longer-sized units, the resulting performance is guaranteed to be better than that of the baseline systems. The preliminary experiments carried out on the TIMIT speech corpus show a considerable improvement in recognition performance over pure monophone/triphone-based systems when the larger-sized units are combined using proper selection of baseforms.
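Once every candidate baseform has been scored (e.g., by forced alignment, which is not shown here), the selection step described above reduces to an argmax over acoustic likelihoods per word. A minimal sketch, with illustrative scores and unit names:

def select_baseforms(lexicon_scores):
    """lexicon_scores: {word: {baseform: acoustic log-likelihood}}.
    Keep, for each word, the baseform with the highest likelihood."""
    return {word: max(scores, key=scores.get)
            for word, scores in lexicon_scores.items()}

# Hypothetical example: a pure monophone baseform versus one that
# uses a syllable-sized unit for the same pronunciation.
example = {
    "station": {"s t ey sh ax n": -410.2,
                "s t ey SHAXN": -395.7},
}
print(select_baseforms(example))   # {'station': 's t ey SHAXN'}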

2 citations

Journal Article
TL;DR: Tests on the TIMIT corpus indicate that a fast DNN training algorithm implemented on multiple graphics processing units (GPUs) significantly improves DNN training speed while keeping recognition accuracy almost unchanged.
Abstract: In recent years, deep neural networks (DNNs) have been successfully used for speech recognition as a popular recognition model with great potential. However, the computational complexity of the training algorithm means that the training time for a DNN model increases dramatically with larger amounts of training data and more neural network nodes. This paper describes a fast DNN training algorithm implemented on multiple graphics processing units (GPUs) to improve training efficiency. Tests of phone recognition on the TIMIT corpus show that the training speed of the improved DNN training algorithm on 4 GPUs is 3.3 times faster than the general algorithm on a single GPU, while the recognition accuracy is almost the same. These tests indicate that the fast DNN training algorithm significantly improves DNN training speed.
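The paper's specific multi-GPU algorithm is not detailed in this abstract; as a generic point of reference, the sketch below shows standard data-parallel training across GPUs with PyTorch's DistributedDataParallel, where each process handles a shard of the minibatch and gradients are all-reduced during the backward pass. The layer sizes and the 440-dimensional input features are assumptions, not the paper's setup.

# Launch with, e.g.: torchrun --nproc_per_node=4 train.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    # Simple feed-forward acoustic model; 61 output classes for TIMIT phones.
    model = nn.Sequential(nn.Linear(440, 2048), nn.ReLU(),
                          nn.Linear(2048, 61)).cuda(rank)
    model = DDP(model, device_ids=[rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Each process sees its own shard of the minibatch; DDP all-reduces
    # gradients automatically during backward().
    feats = torch.randn(256, 440).cuda(rank)
    labels = torch.randint(0, 61, (256,)).cuda(rank)
    opt.zero_grad()
    loss = loss_fn(model(feats), labels)
    loss.backward()
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()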

2 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95