Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Posted Content
TL;DR: A deep time delay neural network (TDNN) for speech enhancement with full data learning; its modular and incremental design gives it excellent potential for capturing long-range temporal contexts while preserving a feed-forward structure.
Abstract: Recurrent neural networks (RNNs) have shown significant improvements for speech enhancement in recent years. However, the model complexity and inference time of RNNs are much higher than those of deep feed-forward neural networks (DNNs), which limits their use in speech enhancement applications. This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning. The TDNN, which uses a modular and incremental design, has excellent potential for capturing long-range temporal contexts. It also preserves the feed-forward structure, so its inference cost is comparable to that of a standard DNN. To make full use of the training data, we propose a full data learning method for speech enhancement: we use not only noisy-to-clean (input-to-target) data to train the enhancement model, but also clean-to-clean and noise-to-silence data, so that all of the training data can be used. Our experiments are conducted on the TIMIT dataset. Experimental results show that the proposed method achieves better performance than a DNN and comparable or even better performance than a BLSTM, while drastically reducing inference time compared with the BLSTM.
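As a rough illustration of the ideas above, the sketch below builds a feed-forward TDNN from stacked dilated 1-D convolutions and forms the three kinds of training pairs used in full data learning (noisy-to-clean, clean-to-clean, noise-to-silence). Layer sizes, feature dimensions, and names are assumptions for illustration, not the paper's actual configuration.

import torch
import torch.nn as nn

class TDNNEnhancer(nn.Module):
    """Feed-forward TDNN: stacked dilated 1-D convolutions over feature frames."""
    def __init__(self, feat_dim=257, hidden=512, dilations=(1, 2, 4, 8)):
        super().__init__()
        layers, in_ch = [], feat_dim
        for d in dilations:
            # 3-frame context per layer, widened by the dilation factor
            layers += [nn.Conv1d(in_ch, hidden, kernel_size=3, dilation=d, padding=d),
                       nn.ReLU()]
            in_ch = hidden
        self.backbone = nn.Sequential(*layers)
        self.head = nn.Conv1d(hidden, feat_dim, kernel_size=1)  # per-frame spectral target

    def forward(self, noisy_feats):                 # (batch, feat_dim, frames)
        return self.head(self.backbone(noisy_feats))

def full_data_batches(noisy, clean, noise):
    # "Full data learning": besides noisy->clean pairs, also train on
    # clean->clean and noise->silence pairs with the same loss.
    silence = torch.zeros_like(noise)
    return [(noisy, clean), (clean, clean), (noise, silence)]

model, loss_fn = TDNNEnhancer(), nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for inputs, targets in full_data_batches(torch.randn(8, 257, 100),
                                         torch.randn(8, 257, 100),
                                         torch.randn(8, 257, 100)):
    opt.zero_grad()
    loss = loss_fn(model(inputs), targets)
    loss.backward()
    opt.step()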

2 citations

Posted Content
TL;DR: The baseline multilayer-TDNN architecture is replaced with QuartzNet, a convolutional architecture that has found success in speech recognition; a two-stage transfer learning scheme is proposed, utilizing the large-scale speech datasets VoxCeleb and Common Voice; and multitask learning allows joint age estimation and gender classification with a single system.
Abstract: In this paper we extend the x-vector framework to the tasks of speaker age estimation and gender classification. In particular, we replace the baseline multilayer-TDNN architecture with QuartzNet, a convolutional architecture that has found success in the field of speech recognition. We further propose a two-stage transfer learning scheme utilizing large-scale speech datasets, VoxCeleb and Common Voice, and use multitask learning to allow joint age estimation and gender classification with a single system. We train and evaluate on the TIMIT dataset. The proposed transfer learning scheme yields consecutive performance improvements in terms of both age estimation error and gender classification accuracy. The best-performing system achieves new state-of-the-art results for age estimation on the TIMIT TEST set, with an MAE of 5.12 and 5.29 years and an RMSE of 7.24 and 8.12 years for male and female speakers respectively, while maintaining a gender classification accuracy of 99.6%.
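The multitask part of this setup can be sketched as a shared speaker embedding feeding two heads, one regressing age and one classifying gender, trained with a combined loss. The QuartzNet/x-vector encoder is not reproduced here; the embedding size, head widths, and loss weighting below are illustrative assumptions, not the paper's settings.

import torch
import torch.nn as nn

class AgeGenderHead(nn.Module):
    """Two task heads on top of a shared speaker embedding."""
    def __init__(self, emb_dim=512):
        super().__init__()
        self.age = nn.Sequential(nn.Linear(emb_dim, 128), nn.ReLU(), nn.Linear(128, 1))
        self.gender = nn.Linear(emb_dim, 2)

    def forward(self, emb):
        return self.age(emb).squeeze(-1), self.gender(emb)

head = AgeGenderHead()
emb = torch.randn(16, 512)                  # embeddings from a pretrained encoder (assumed)
age_true = torch.rand(16) * 60 + 18         # dummy ages in years
gender_true = torch.randint(0, 2, (16,))    # dummy gender labels

age_pred, gender_logits = head(emb)
# Joint objective: age regression (MAE) plus weighted gender cross-entropy.
loss = nn.L1Loss()(age_pred, age_true) + 0.5 * nn.CrossEntropyLoss()(gender_logits, gender_true)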

2 citations

Journal Article
TL;DR: In this paper, the authors analyzed the spectral content of popular speech corpora used for speech perception research to highlight the potential shortcomings of using bandlimited speech materials. They provide an overview of the phenomena potentially missed when using bandlimited speech signals and the factors to consider when selecting stimuli that are sensitive to these effects.
Abstract: The use of spectrally degraded speech signals deprives listeners of acoustic information that is useful for speech perception. Several popular speech corpora, recorded decades ago, have spectral degradations, including limited extended high-frequency (EHF) (>8 kHz) content. Although frequency content above 8 kHz is often assumed to play little or no role in speech perception, recent research suggests that EHF content in speech can have a significant beneficial impact on speech perception under a wide range of natural listening conditions. This paper provides an analysis of the spectral content of popular speech corpora used for speech perception research to highlight the potential shortcomings of using bandlimited speech materials. Two corpora analyzed here, the TIMIT and NU-6, have substantial low-frequency spectral degradation (<500 Hz) in addition to EHF degradation. We provide an overview of the phenomena potentially missed by using bandlimited speech signals, and the factors to consider when selecting stimuli that are sensitive to these effects.
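A simple way to run the kind of spectral check described here is to estimate a long-term average spectrum and compare band levels below 500 Hz, between 0.5 and 8 kHz, and above 8 kHz. The sketch below uses Welch's method; the file path is hypothetical and the band edges simply mirror those discussed in the abstract.

import numpy as np
import soundfile as sf
from scipy.signal import welch

# Placeholder path; substitute any utterance from the corpus under inspection.
audio, fs = sf.read("corpus/speaker1/utt1.wav")
freqs, psd = welch(audio, fs=fs, nperseg=4096)

def band_level_db(lo, hi):
    # Average power in [lo, hi) Hz, in dB; -inf if the band is empty
    # (e.g., >8 kHz for a 16 kHz recording).
    band = psd[(freqs >= lo) & (freqs < hi)]
    return 10 * np.log10(band.mean() + 1e-12) if band.size else float("-inf")

print("level < 500 Hz  :", band_level_db(0, 500), "dB")
print("level 0.5-8 kHz :", band_level_db(500, 8000), "dB")
print("level > 8 kHz   :", band_level_db(8000, fs / 2), "dB")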

2 citations

Proceedings Article
01 Jan 2006
TL;DR: It is shown that a considerable improvement in recognition performance can be achieved if the baseforms are selected properly, and the preliminary experiments carried out on the TIMIT speech corpus show a considerable improvement in recognition performance over pure monophone/triphone-based systems when the larger-sized units are combined using proper selection of baseforms.
Abstract: A longer-sized sub-word unit is known to be a better candidate in the development of a continuous speech recognition system. However, the basic problem with such units is data sparsity. To overcome this problem, researchers have tried to combine longer-sized sub-word unit models with phoneme models. In this paper, we consider only frequently occurring syllables and VC (Vowel + Consonant) units, together with phone-sized units (monophones and triphones), for the development of a continuous speech recognition system. In such a case, even for a single pronunciation of a word, there can be multiple representational baseforms in the lexicon, each with different-sized units. We show that a considerable improvement in recognition performance can be achieved if the baseforms are selected properly. Out of all possible baseforms for a given word in the lexicon, only the baseform that maximizes the acoustic likelihood, over the possible sub-word unit concatenations making up the word, is retained. Since, in the word-lexicon of a baseline system such as a pure monophone- or triphone-based system, only the acoustically weaker baseforms are replaced by baseforms with longer-sized units, the resulting performance is guaranteed to be better than that of the baseline systems. The preliminary experiments carried out on the TIMIT speech corpus show a considerable improvement in recognition performance over pure monophone/triphone-based systems when the larger-sized units are combined using proper selection of baseforms.
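Once every candidate baseform has been scored (e.g., by forced alignment, which is not shown here), the selection step described above reduces to an argmax over acoustic likelihoods per word. A minimal sketch, with illustrative scores and unit names:

def select_baseforms(lexicon_scores):
    """lexicon_scores: {word: {baseform: acoustic log-likelihood}}.
    Keep, for each word, the baseform with the highest likelihood."""
    return {word: max(scores, key=scores.get)
            for word, scores in lexicon_scores.items()}

# Hypothetical example: a pure monophone baseform versus one that
# uses a syllable-sized unit for the same pronunciation.
example = {
    "station": {"s t ey sh ax n": -410.2,
                "s t ey SHAXN": -395.7},
}
print(select_baseforms(example))   # {'station': 's t ey SHAXN'}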

2 citations

Journal Article
TL;DR: Tests on the TIMIT corpus indicate that a fast DNN training algorithm implemented on multiple graphics processing units (GPUs) significantly improves DNN training speed while keeping recognition accuracy almost unchanged.
Abstract: In recent years, deep neural networks (DNNs) have been successfully used for speech recognition as a popular recognition model with great potential. However, the computational complexity of the training algorithm means that the training time for a DNN model increases dramatically with larger amounts of training data and more neural network nodes. This paper describes a fast DNN training algorithm implemented on multiple graphics processing units (GPUs) to improve training efficiency. Tests of phone recognition on the TIMIT corpus show that the training speed of the improved DNN training algorithm on 4 GPUs is 3.3 times faster than the general algorithm on a single GPU, while the recognition accuracy is almost the same. These tests indicate that the fast DNN training algorithm significantly improves DNN training speed.
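The paper's specific multi-GPU algorithm is not detailed in this abstract; as a generic point of reference, the sketch below shows standard data-parallel training across GPUs with PyTorch's DistributedDataParallel, where each process handles a shard of the minibatch and gradients are all-reduced during the backward pass. The layer sizes and the 440-dimensional input features are assumptions, not the paper's setup.

# Launch with, e.g.: torchrun --nproc_per_node=4 train.py
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")
    rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(rank)

    # Simple feed-forward acoustic model; 61 output classes for TIMIT phones.
    model = nn.Sequential(nn.Linear(440, 2048), nn.ReLU(),
                          nn.Linear(2048, 61)).cuda(rank)
    model = DDP(model, device_ids=[rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Each process sees its own shard of the minibatch; DDP all-reduces
    # gradients automatically during backward().
    feats = torch.randn(256, 440).cuda(rank)
    labels = torch.randint(0, 61, (256,)).cuda(rank)
    opt.zero_grad()
    loss = loss_fn(model(feats), labels)
    loss.backward()
    opt.step()
    dist.destroy_process_group()

if __name__ == "__main__":
    main()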

2 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years:
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95