Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as the TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
TL;DR: A monaural speech separation/enhancement technique based on non-negative Tucker decomposition (NTD) is introduced; the effect of a sparsity regularization factor on the separation of the mixed signal is included in the generalized cost function of NTD.
Abstract: A monaural speech separation/enhancement technique based on non-negative Tucker decomposition (NTD) is introduced in this paper. In the proposed work, the effect of a sparsity regularization factor on the separation of the mixed signal is included in the generalized cost function of NTD. Using the proposed algorithm, the vector components of both the target and the mixed signal can be exploited and used for the separation of any monaural mixture. Experiments were performed on monaural data generated by mixing speech signals from two speakers, and by mixing noise and speech signals, using the TIMIT and NOISEX-92 datasets. The separation results are compared with other existing algorithms in terms of the correlation of the separated signal with the original signal, signal-to-distortion ratio, perceptual evaluation of speech quality, and short-time objective intelligibility. Further, to obtain more conclusive information about separation ability, speech recognition using the Kaldi toolkit was also performed. The recognition results are compared in terms of word error rate (WER) using MFCC-based features. Results show that the average WER improvement of the proposed algorithm over the nearest-performing algorithm is up to 2.7% for mixed speech of two speakers and 1.52% for noisy speech input.
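As a rough illustration of the NTD building block only (not the paper's sparsity-regularized separation algorithm), the sketch below decomposes a non-negative, spectrogram-like tensor with the tensorly library and reconstructs it from the learned core and factors; the tensor shape, per-mode ranks, and random data are assumptions made for the example.

```python
# Minimal sketch of non-negative Tucker decomposition (NTD) with tensorly.
# This is NOT the paper's sparsity-regularized cost function; it only shows
# decomposing a non-negative spectrogram-like tensor and reconstructing it.
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_tucker

# Toy 3-way non-negative tensor: (frequency bins x time frames x segments),
# standing in for stacked magnitude spectrograms of training utterances.
rng = np.random.default_rng(0)
X = tl.tensor(rng.random((257, 100, 8)))

# NTD with assumed per-mode ranks and iteration settings.
core, factors = non_negative_tucker(X, rank=[40, 30, 8], n_iter_max=200, tol=1e-6)

# Reconstruct the tensor from the learned core and factor matrices.
X_hat = tl.tucker_to_tensor((core, factors))
rel_err = tl.norm(X - X_hat) / tl.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In a separation setting, factor matrices learned from clean target speech would be reused when decomposing the mixture; that step, and the sparsity term in the cost function, are not shown here.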
Book ChapterDOI
27 Aug 2017
TL;DR: Experimental results show significant improvements in phonetic segmentation, especially for the lower tolerances of 5, 10, and 15 ms, with an absolute improvement of 8.29% observed on the TIMIT database at a 10 ms tolerance.
Abstract: Accurate and automatic phonetic segmentation is crucial for several speech-based applications such as phone-level articulation analysis and error detection, speech synthesis, annotation, speech recognition, and emotion recognition. In this paper, we examine the effectiveness of using visual features, obtained by processing the image spectrogram of a speech utterance, for phonetic segmentation. Further, we propose a mechanism to combine knowledge from the visual and perceptual domains for automatic phonetic segmentation; this process can be considered analogous to manual phonetic segmentation. The technique was evaluated on the TIMIT American English corpus. Experimental results show significant improvements in phonetic segmentation, especially for the lower tolerances of 5, 10, and 15 ms, with an absolute improvement of 8.29% observed on the TIMIT database at a 10 ms tolerance.
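The 5/10/15 ms figures refer to the standard tolerance-based scoring of predicted phone boundaries against manual ones. The sketch below illustrates that evaluation protocol only; it is not the authors' evaluation code, and the greedy one-to-one matching and example boundary times are assumptions.

```python
# Minimal sketch of tolerance-based phonetic segmentation scoring: a reference
# boundary counts as detected if an unused predicted boundary lies within
# +/- tol seconds of it.
def boundary_hit_rate(ref_boundaries, pred_boundaries, tol):
    """Fraction of reference boundaries matched within `tol` seconds."""
    pred = sorted(pred_boundaries)
    used = [False] * len(pred)
    hits = 0
    for r in sorted(ref_boundaries):
        # find the closest not-yet-matched predicted boundary
        best, best_d = None, None
        for i, p in enumerate(pred):
            if used[i]:
                continue
            d = abs(p - r)
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= tol:
            used[best] = True
            hits += 1
    return hits / len(ref_boundaries) if ref_boundaries else 0.0

# Example: score one hypothesis at the 5, 10 and 15 ms tolerances mentioned
# in the abstract (boundary times in seconds, made up for illustration).
ref = [0.120, 0.310, 0.545, 0.800]
hyp = [0.118, 0.322, 0.560, 0.790]
for tol_ms in (5, 10, 15):
    acc = boundary_hit_rate(ref, hyp, tol_ms / 1000.0)
    print(f"{tol_ms:>2} ms tolerance: {100 * acc:.1f}% boundaries detected")
```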
Proceedings ArticleDOI
04 Jun 2023
TL;DR: In this article, the authors enhance the vanilla adversarial training method for unsupervised Automatic Speech Recognition (ASR) with a diffusion-GAN, which injects instance noise of various intensities into the generator's output and into unlabeled reference text sampled from pretrained phoneme language models under a length constraint.
Abstract: We enhance the vanilla adversarial training method for unsupervised Automatic Speech Recognition (ASR) with a diffusion-GAN. Our model (1) injects instance noise of various intensities into the generator's output and into unlabeled reference text sampled from pretrained phoneme language models under a length constraint, (2) asks diffusion timestep-dependent discriminators to separate them, and (3) back-propagates the gradients to update the generator. Word/phoneme error rate comparisons with wav2vec-U on the Librispeech (3.1% for test-clean and 5.6% for test-other), TIMIT, and MLS datasets show that our enhancement strategies work effectively.
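To make the timestep-dependent instance-noise idea concrete, the PyTorch sketch below corrupts both the generator's phoneme posteriors and sampled reference text with Gaussian noise whose intensity is governed by a sampled diffusion timestep, before a timestep-conditioned discriminator would score them. This is not the authors' implementation; the noise schedule, tensor shapes, and toy data are assumptions for illustration only.

```python
# Minimal sketch: inject instance noise of timestep-dependent intensity into
# both the generator output and the reference text distribution.
import torch

T = 50                                        # number of diffusion timesteps (assumed)
betas = torch.linspace(1e-4, 0.2, T)          # assumed linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def add_instance_noise(x, t):
    """Corrupt a (batch, seq, vocab) distribution with noise for timestep t."""
    a = alphas_bar[t].view(-1, 1, 1)
    return a.sqrt() * x + (1.0 - a).sqrt() * torch.randn_like(x)

batch, seq_len, vocab = 4, 120, 40            # toy sizes (assumed)
gen_out = torch.softmax(torch.randn(batch, seq_len, vocab), dim=-1)   # generator phoneme posteriors
ref_txt = torch.nn.functional.one_hot(
    torch.randint(vocab, (batch, seq_len)), vocab).float()            # sampled reference phoneme text

t = torch.randint(T, (batch,))                # one diffusion timestep per example
fake_noisy = add_instance_noise(gen_out, t)
real_noisy = add_instance_noise(ref_txt, t)

# A timestep-conditioned discriminator would now score both branches; here we
# only show that both receive noise of the same, timestep-dependent intensity.
print(fake_noisy.shape, real_noisy.shape, t.tolist())
```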
Proceedings Article
01 May 2006
TL;DR: Describes LDC's recent involvement in the creation of a low-cost yet highly customized speech corpus for a commercial organization under a novel data creation and licensing model, one that benefits both the particular data requester and the general linguistic data user community.
Abstract: Speech technology applications, such as speech recognition, speech synthesis, and speech dialog systems, often require corpora based on highly customized specifications. Existing corpora available to the community, such as TIMIT and other corpora distributed by LDC and ELDA, do not always meet the requirements of such applications. In such cases, the developers need to create their own corpora. The creation of a highly customized speech corpus, however, could be a very expensive and time-consuming task, especially for small organizations. It requires multidisciplinary expertise in linguistics, management and engineering as it involves subtasks such as the corpus design, human subject recruitment, recording, quality assurance, and in some cases, segmentation, transcription and annotation. This paper describes LDC's recent involvement in the creation of a low-cost yet highly-customized speech corpus for a commercial organization under a novel data creation and licensing model, which benefits both the particular data requester and the general linguistic data user community.
Proceedings ArticleDOI
28 Mar 2013
TL;DR: A framework that addresses the limited resolution of direction-of-arrival (DOA) estimation and permutation errors in ICA, using the MUSIC-Group delay method of DOA estimation, is described.
Abstract: The performance of an ICA-beamforming framework in multi-source environments is often limited by the resolution of direction-of-arrival (DOA) estimation and by permutation errors. In this paper, a framework that addresses these issues using the MUSIC-Group delay method of DOA estimation is described. A new cost function defined for this purpose iteratively computes the correlation between the signals recovered using ICA and beamforming methods and the signals recovered from the MUSIC-Group delay method, which serve as a reference. This cost function is then used to select the demixing matrix at each iteration until a convergence criterion is met. Source separation is then carried out using the final demixing matrix. Since the MUSIC-Group delay method exhibits high resolution, the DOA estimates obtained can be sorted more effectively to solve the permutation problem in ICA. TIMIT speech data are spatialized in a reverberant environment at various direct-to-reverberant energy ratios (DRR) to obtain the S-TIMIT data. Experiments on speaker-dependent large-vocabulary speech recognition are conducted for mixtures of two speakers from the S-TIMIT data. The word error rates for the target and the non-target speaker using the proposed method indicate reasonable improvements over conventional methods such as ICA and ICA-beamforming.
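As a rough sketch of the correlation-based matching idea (not the paper's full iterative demixing-matrix selection), the snippet below aligns separated outputs against reference signals, here standing in for signals recovered via the MUSIC-Group delay method, by maximizing pairwise correlation; the use of np.corrcoef, the Hungarian assignment, and the toy signals are assumptions for illustration.

```python
# Minimal sketch: score and permute separated estimates against reference
# signals using pairwise correlation and a best-match assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_by_correlation(estimates, references):
    """Permute `estimates` (n_src x n_samples) to best match `references`."""
    n = estimates.shape[0]
    corr = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            corr[i, j] = abs(np.corrcoef(estimates[i], references[j])[0, 1])
    rows, cols = linear_sum_assignment(-corr)      # maximise total correlation
    order = np.empty(n, dtype=int)
    order[cols] = rows                             # estimate matching ref j goes to slot j
    return estimates[order], corr[rows, cols].mean()

# Toy demo: two "sources" and permuted, noisy estimates of them.
rng = np.random.default_rng(1)
refs = rng.standard_normal((2, 16000))
ests = refs[::-1] + 0.1 * rng.standard_normal((2, 16000))   # swapped order
aligned, mean_corr = align_by_correlation(ests, refs)
print("mean matched correlation:", round(float(mean_corr), 3))
```

In the paper's framework, a score of this kind is recomputed at each iteration to choose among candidate demixing matrices; only the matching step is illustrated here.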

Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95