Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as the TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
TL;DR: A monaural speech separation/enhancement technique based on non-negative Tucker decomposition (NTD) is introduced; the effect of a sparsity regularization factor on the separation of the mixed signal is included in the generalized cost function of NTD.
Abstract: A monaural speech separation/enhancement technique based on non-negative Tucker decomposition (NTD) is introduced in this paper. In the proposed work, the effect of a sparsity regularization factor on the separation of the mixed signal is included in the generalized cost function of NTD. Using the proposed algorithm, the vector components of both the target and the mixed signal can be exploited and used for the separation of any monaural mixture. Experiments were performed on monaural data generated by mixing speech signals from two speakers, and by mixing noise and speech signals, using the TIMIT and NOISEX-92 datasets. The separation results are compared with other existing algorithms in terms of the correlation of the separated signal with the original signal, signal-to-distortion ratio, perceptual evaluation of speech quality, and short-time objective intelligibility. Further, to obtain more conclusive information about separation ability, speech recognition using the Kaldi toolkit was also performed. The recognition results are compared in terms of word error rate (WER) using MFCC-based features. Results show that the average WER improvement of the proposed algorithm over the nearest-performing algorithm is up to 2.7% for mixed speech of two speakers and 1.52% for noisy speech input.
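As a rough illustration of the NTD building block only (not the paper's sparsity-regularized separation algorithm), the sketch below decomposes a non-negative, spectrogram-like tensor with the tensorly library and reconstructs it from the learned core and factors; the tensor shape, per-mode ranks, and random data are assumptions made for the example.

```python
# Minimal sketch of non-negative Tucker decomposition (NTD) with tensorly.
# This is NOT the paper's sparsity-regularized cost function; it only shows
# decomposing a non-negative spectrogram-like tensor and reconstructing it.
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_tucker

# Toy 3-way non-negative tensor: (frequency bins x time frames x segments),
# standing in for stacked magnitude spectrograms of training utterances.
rng = np.random.default_rng(0)
X = tl.tensor(rng.random((257, 100, 8)))

# NTD with assumed per-mode ranks and iteration settings.
core, factors = non_negative_tucker(X, rank=[40, 30, 8], n_iter_max=200, tol=1e-6)

# Reconstruct the tensor from the learned core and factor matrices.
X_hat = tl.tucker_to_tensor((core, factors))
rel_err = tl.norm(X - X_hat) / tl.norm(X)
print(f"relative reconstruction error: {rel_err:.3f}")
```

In a separation setting, factor matrices learned from clean target speech would be reused when decomposing the mixture; that step, and the sparsity term in the cost function, are not shown here.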
Book ChapterDOI
27 Aug 2017
TL;DR: Experimental results show significant improvements in phonetic segmentation, especially for the lower tolerances of 5, 10, and 15 ms, with an absolute improvement of 8.29% observed on the TIMIT database at a 10 ms tolerance.
Abstract: Accurate and automatic phonetic segmentation is crucial for several speech-based applications such as phone-level articulation analysis and error detection, speech synthesis, annotation, speech recognition, and emotion recognition. In this paper, we examine the effectiveness of using visual features, obtained by processing the image spectrogram of a speech utterance, for phonetic segmentation. Further, we propose a mechanism to combine knowledge from the visual and perceptual domains for automatic phonetic segmentation; this process can be considered analogous to manual phonetic segmentation. The technique was evaluated on the TIMIT American English corpus. Experimental results show significant improvements in phonetic segmentation, especially for the lower tolerances of 5, 10, and 15 ms, with an absolute improvement of 8.29% observed on the TIMIT database at a 10 ms tolerance.
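The 5/10/15 ms figures refer to the standard tolerance-based scoring of predicted phone boundaries against manual ones. The sketch below illustrates that evaluation protocol only; it is not the authors' evaluation code, and the greedy one-to-one matching and example boundary times are assumptions.

```python
# Minimal sketch of tolerance-based phonetic segmentation scoring: a reference
# boundary counts as detected if an unused predicted boundary lies within
# +/- tol seconds of it.
def boundary_hit_rate(ref_boundaries, pred_boundaries, tol):
    """Fraction of reference boundaries matched within `tol` seconds."""
    pred = sorted(pred_boundaries)
    used = [False] * len(pred)
    hits = 0
    for r in sorted(ref_boundaries):
        # find the closest not-yet-matched predicted boundary
        best, best_d = None, None
        for i, p in enumerate(pred):
            if used[i]:
                continue
            d = abs(p - r)
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= tol:
            used[best] = True
            hits += 1
    return hits / len(ref_boundaries) if ref_boundaries else 0.0

# Example: score one hypothesis at the 5, 10 and 15 ms tolerances mentioned
# in the abstract (boundary times in seconds, made up for illustration).
ref = [0.120, 0.310, 0.545, 0.800]
hyp = [0.118, 0.322, 0.560, 0.790]
for tol_ms in (5, 10, 15):
    acc = boundary_hit_rate(ref, hyp, tol_ms / 1000.0)
    print(f"{tol_ms:>2} ms tolerance: {100 * acc:.1f}% boundaries detected")
```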
Proceedings ArticleDOI
04 Jun 2023
TL;DR: In this article, the authors enhance the vanilla adversarial training method for unsupervised Automatic Speech Recognition (ASR) with a diffusion-GAN, which injects instance noise of various intensities into the generator's output and into unlabeled reference text sampled from pretrained phoneme language models under a length constraint.
Abstract: We enhance the vanilla adversarial training method for unsupervised Automatic Speech Recognition (ASR) with a diffusion-GAN. Our model (1) injects instance noise of various intensities into the generator's output and into unlabeled reference text sampled from pretrained phoneme language models under a length constraint, (2) asks diffusion timestep-dependent discriminators to separate them, and (3) back-propagates the gradients to update the generator. Word/phoneme error rate comparisons with wav2vec-U on the Librispeech (3.1% for test-clean and 5.6% for test-other), TIMIT, and MLS datasets show that our enhancement strategies work effectively.
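To make the timestep-dependent instance-noise idea concrete, the PyTorch sketch below corrupts both the generator's phoneme posteriors and sampled reference text with Gaussian noise whose intensity is governed by a sampled diffusion timestep, before a timestep-conditioned discriminator would score them. This is not the authors' implementation; the noise schedule, tensor shapes, and toy data are assumptions for illustration only.

```python
# Minimal sketch: inject instance noise of timestep-dependent intensity into
# both the generator output and the reference text distribution.
import torch

T = 50                                        # number of diffusion timesteps (assumed)
betas = torch.linspace(1e-4, 0.2, T)          # assumed linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def add_instance_noise(x, t):
    """Corrupt a (batch, seq, vocab) distribution with noise for timestep t."""
    a = alphas_bar[t].view(-1, 1, 1)
    return a.sqrt() * x + (1.0 - a).sqrt() * torch.randn_like(x)

batch, seq_len, vocab = 4, 120, 40            # toy sizes (assumed)
gen_out = torch.softmax(torch.randn(batch, seq_len, vocab), dim=-1)   # generator phoneme posteriors
ref_txt = torch.nn.functional.one_hot(
    torch.randint(vocab, (batch, seq_len)), vocab).float()            # sampled reference phoneme text

t = torch.randint(T, (batch,))                # one diffusion timestep per example
fake_noisy = add_instance_noise(gen_out, t)
real_noisy = add_instance_noise(ref_txt, t)

# A timestep-conditioned discriminator would now score both branches; here we
# only show that both receive noise of the same, timestep-dependent intensity.
print(fake_noisy.shape, real_noisy.shape, t.tolist())
```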
Proceedings Article
01 May 2006
TL;DR: Describes LDC's recent involvement in the creation of a low-cost yet highly customized speech corpus for a commercial organization under a novel data creation and licensing model, one that benefits both the particular data requester and the general linguistic data user community.
Abstract: Speech technology applications, such as speech recognition, speech synthesis, and speech dialog systems, often require corpora based on highly customized specifications. Existing corpora available to the community, such as TIMIT and other corpora distributed by LDC and ELDA, do not always meet the requirements of such applications. In such cases, the developers need to create their own corpora. The creation of a highly customized speech corpus, however, could be a very expensive and time-consuming task, especially for small organizations. It requires multidisciplinary expertise in linguistics, management and engineering as it involves subtasks such as the corpus design, human subject recruitment, recording, quality assurance, and in some cases, segmentation, transcription and annotation. This paper describes LDC's recent involvement in the creation of a low-cost yet highly-customized speech corpus for a commercial organization under a novel data creation and licensing model, which benefits both the particular data requester and the general linguistic data user community.
Proceedings ArticleDOI
28 Mar 2013
TL;DR: A framework that addresses the limited resolution of direction-of-arrival (DOA) estimation and permutation errors in ICA, using the MUSIC-Group delay method of DOA estimation, is described.
Abstract: The performance of an ICA-beamforming framework in multi-source environments is often limited by the resolution of direction-of-arrival (DOA) estimation and by permutation errors. In this paper, a framework that addresses these issues using the MUSIC-Group delay method of DOA estimation is described. A new cost function defined for this purpose iteratively computes the correlation between the signals recovered using ICA and beamforming methods and the signals recovered from the MUSIC-Group delay method, which serve as a reference. This cost function is then used to select the demixing matrix at each iteration until a convergence criterion is met. Source separation is then carried out using the final demixing matrix. Since the MUSIC-Group delay method exhibits high resolution, the DOA estimates obtained can be sorted more effectively to solve the permutation problem in ICA. TIMIT speech data are spatialized in a reverberant environment at various direct-to-reverberant energy ratios (DRR) to obtain the S-TIMIT data. Experiments on speaker-dependent large-vocabulary speech recognition are conducted for mixtures of two speakers from the S-TIMIT data. The word error rates for the target and the non-target speaker using the proposed method indicate reasonable improvements over conventional methods such as ICA and ICA-beamforming.
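As a rough sketch of the correlation-based matching idea (not the paper's full iterative demixing-matrix selection), the snippet below aligns separated outputs against reference signals, here standing in for signals recovered via the MUSIC-Group delay method, by maximizing pairwise correlation; the use of np.corrcoef, the Hungarian assignment, and the toy signals are assumptions for illustration.

```python
# Minimal sketch: score and permute separated estimates against reference
# signals using pairwise correlation and a best-match assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_by_correlation(estimates, references):
    """Permute `estimates` (n_src x n_samples) to best match `references`."""
    n = estimates.shape[0]
    corr = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            corr[i, j] = abs(np.corrcoef(estimates[i], references[j])[0, 1])
    rows, cols = linear_sum_assignment(-corr)      # maximise total correlation
    order = np.empty(n, dtype=int)
    order[cols] = rows                             # estimate matching ref j goes to slot j
    return estimates[order], corr[rows, cols].mean()

# Toy demo: two "sources" and permuted, noisy estimates of them.
rng = np.random.default_rng(1)
refs = rng.standard_normal((2, 16000))
ests = refs[::-1] + 0.1 * rng.standard_normal((2, 16000))   # swapped order
aligned, mean_corr = align_by_correlation(ests, refs)
print("mean matched correlation:", round(float(mean_corr), 3))
```

In the paper's framework, a score of this kind is recomputed at each iteration to choose among candidate demixing matrices; only the matching step is illustrated here.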

Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95