SciSpace (formerly Typeset)
Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications on this topic have received 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
TL;DR: This article provides a comparative analysis of three different approaches to speech activity detection: classification of still images (CNN model), classification based on previous images (CRNN model), and classification of sequences of images (Seq2Seq model).
Abstract: Existing literature on speech activity detection (SAD) highlights different neural-network approaches but does not provide a comprehensive comparison of these methods. Such a comparison matters because neural approaches often require hardware-intensive resources. In this article, we provide a comparative analysis of three different approaches: classification of still images (CNN model), classification based on previous images (CRNN model), and classification of sequences of images (Seq2Seq model). Our experimental results on the Vid-TIMIT dataset show that the CNN model achieves an accuracy of 97%, whereas the CRNN and Seq2Seq models increase classification accuracy to 99%. Further experiments show that the CRNN model is almost as accurate as the Seq2Seq model (99.1% vs. 99.6% classification accuracy, respectively) but 57% faster to train (326 vs. 761 seconds per epoch).
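The abstract does not spell out the input pipeline, but the three input arrangements it compares can be illustrated abstractly. A minimal NumPy sketch (frame count, context-window size, and bin count are all hypothetical, not taken from the paper) of how the same spectrogram frames might be grouped for each model:

```python
import numpy as np

# Hypothetical spectrogram: 100 time frames x 40 frequency bins.
frames = np.random.rand(100, 40)

# CNN-style input: each frame is classified independently, as a still image.
cnn_inputs = frames[:, None, :]             # shape (100, 1, 40)

# CRNN-style input: each frame carries a window of preceding context frames.
context = 5
crnn_inputs = np.stack(
    [frames[t - context:t] for t in range(context, len(frames))]
)                                           # shape (95, 5, 40)

# Seq2Seq-style input: the whole utterance as one sequence,
# producing one label per frame.
seq_input = frames[None, :, :]              # shape (1, 100, 40)

print(cnn_inputs.shape, crnn_inputs.shape, seq_input.shape)
```

The trade-off the paper measures follows directly from these shapes: the CRNN sees only a short context window per decision, so its training step is cheaper than the Seq2Seq model's full-utterance pass, at a small cost in accuracy.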
Proceedings ArticleDOI
01 Jan 2022
TL;DR: This article explored three different types of methods for DNN-based speaker meta information estimation and compared the estimation results between the original speech and the anonymized speech using the TIMIT dataset.
Abstract: There have been concerns about how speech data is collected and shared in the real world, since human speech carries personally identifiable information about the speaker and can be used to reliably estimate speaker meta-information. In this paper, we explore three different types of methods for DNN-based speaker meta-information estimation and compare the estimation results between the original speech and the anonymized speech. We used a McAdams-coefficient-based signal-processing technique to produce the anonymized, privacy-preserving speech data. Experiments on the TIMIT dataset show a slight degradation in performance on anonymized speech relative to the original. They also reveal that a model employing both DNN-based embeddings and voice anonymization can achieve performance comparable to a model using the original speech.
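The McAdams-coefficient technique mentioned above is commonly implemented by warping the phase angles of the LPC poles of each speech frame, φ → φ^α, which shifts the formants. A minimal NumPy sketch of this pole warping, with hypothetical LPC coefficients and α (the paper's LPC order and coefficient value are not stated):

```python
import numpy as np

def mcadams_warp(lpc_coeffs, alpha=0.8):
    """Warp the phase angles of the LPC poles by phi -> phi**alpha
    (the McAdams transformation used for speaker anonymization)."""
    poles = np.roots(lpc_coeffs)
    warped = []
    for p in poles:
        # Warp only complex poles; this keeps conjugate pairs symmetric,
        # so the reconstructed coefficients stay real.
        if np.abs(p.imag) > 1e-9:
            phi = np.sign(np.angle(p)) * (np.abs(np.angle(p)) ** alpha)
            p = np.abs(p) * np.exp(1j * phi)
        warped.append(p)
    return np.real(np.poly(warped))

# Example monic LPC polynomial (hypothetical coefficients).
a = np.array([1.0, -1.2, 0.8, -0.3])
print(mcadams_warp(a, alpha=0.8))
```

With alpha = 1 the transformation is the identity, which gives a quick sanity check; in a full anonymization pipeline this warp would be applied per frame to the LPC analysis of the signal before resynthesis.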
Posted ContentDOI
05 Apr 2022
TL;DR: In this article, the authors propose a complex recurrent VAE framework in which a complex-valued recurrent neural network and an L1 reconstruction loss are used to account for the temporal properties of speech signals.
Abstract: As an extension of the variational autoencoder (VAE), the complex VAE uses complex Gaussian distributions to model latent variables and data. This work proposes a complex recurrent VAE framework in which a complex-valued recurrent neural network and an L1 reconstruction loss are used. First, to account for the temporal properties of speech signals, this work introduces a complex-valued recurrent neural network into the complex VAE framework. In addition, an L1 loss is used as the reconstruction loss. To exemplify the use of the complex generative model in speech processing, we choose speech enhancement as the specific application in this paper. Experiments are based on the TIMIT dataset. The results show that the proposed method improves objective metrics of speech intelligibility and signal quality.
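Two ingredients of the abstract can be sketched in isolation: reparameterized sampling from a circular complex Gaussian latent, and the L1 loss on complex reconstructions. A minimal NumPy illustration (shapes and values are hypothetical; the paper's network architecture is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_complex_gaussian(mu, sigma, rng):
    """Reparameterized sample from a circular complex Gaussian:
    z = mu + sigma * (eps_r + 1j * eps_i) / sqrt(2), so that
    E[|z - mu|^2] = sigma**2."""
    eps = (rng.standard_normal(mu.shape)
           + 1j * rng.standard_normal(mu.shape)) / np.sqrt(2)
    return mu + sigma * eps

def l1_reconstruction_loss(x, x_hat):
    """L1 loss on complex spectra: mean magnitude of the
    complex difference."""
    return np.mean(np.abs(x - x_hat))

mu = np.zeros(4, dtype=complex)      # hypothetical latent mean
sigma = np.ones(4)                   # hypothetical latent scale
z = sample_complex_gaussian(mu, sigma, rng)

x = np.array([1 + 1j, 0.5 - 0.2j, -1j, 0.3])   # toy complex spectrum
print(l1_reconstruction_loss(x, x))  # 0.0 for a perfect reconstruction
```

The L1 form penalizes the magnitude of the complex error linearly, which is often argued to be less sensitive to spectral outliers than the squared (L2) error.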
Book ChapterDOI
01 Jan 1994
TL;DR: At the time of this research, a telephone speech database suitable for talker identification (Godfrey, 1992) was not generally available, though clean-speech databases such as TIMIT (Garofolo et al., 1988) were.
Abstract: It is difficult to implement talker recognition on the telephone network because of normal variation in the channel characteristics. The primary component of variation is due to the different telephone handsets or microphone frequency characteristics (Rosenberg and Soong, 1992). Lack of availability of telephone speech databases has also contributed to slow progress in the solution of these problems, though clean speech databases such as TIMIT (Garofolo et al., 1988) have been available. A telephone speech database suitable for talker identification research (Godfrey, 1992) was not generally available at the time of this research.
Proceedings ArticleDOI
01 Dec 2017
TL;DR: A DNN trained on the system combination of Mel-filterbank energies and SBAE features captures complementary information present in the speech signal, aiding representation learning.
Abstract: Recently, unsupervised representation learning of features from speech signals has seen a tremendous upsurge for speech processing applications. In this paper, we investigate a modified autoencoder architecture, the subband autoencoder (SBAE), for representation learning. Features for the speech recognition task were learned by the SBAE from spectrogram input. SBAE features and Mel-filterbank energies (as spectral features) were trained separately on DNNs, and system combination was then applied. This technique was evaluated on speech recognition tasks on the TIMIT and WSJ0 databases. On TIMIT, we achieved an absolute improvement of 3% in PER on the test set over Mel-filterbank energies alone. On WSJ0, we achieved a relative improvement of 9.89% in WER on the test sets compared to filterbank energies. Hence, a DNN trained on the system combination of Mel-filterbank energies and SBAE features captures complementary information present in the speech signal.
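Mel-filterbank energies, the spectral baseline combined with SBAE features above, follow a standard recipe: triangular filters spaced uniformly on the Mel scale, applied to the power spectrum, then log-compressed. A minimal NumPy sketch assuming 16 kHz audio, a 512-point FFT, and 26 filters (all assumptions; the paper's analysis parameters are not stated):

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular Mel filterbank matrix of shape (n_filters, n_fft//2 + 1)."""
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # Filter edges: uniform on the Mel scale, mapped back to FFT bins.
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)

    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):            # rising slope
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling slope
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

# Log filterbank energies from a hypothetical power spectrogram.
power_spec = np.random.rand(100, 257) ** 2        # (frames, FFT bins)
fbank = mel_filterbank()
energies = np.log(power_spec @ fbank.T + 1e-10)   # (frames, 26)
print(energies.shape)
```

In a combination system like the one described, these energies would feed one DNN while SBAE features feed another, with the two systems' outputs fused afterwards.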

Network Information
Related Topics (5)
Recurrent neural network
29.2K papers, 890K citations
76% related
Feature (machine learning)
33.9K papers, 798.7K citations
75% related
Feature vector
48.8K papers, 954.4K citations
74% related
Natural language
31.1K papers, 806.8K citations
73% related
Deep learning
79.8K papers, 2.1M citations
72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95