
TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
01 Jan 2005
TL;DR: Results from both sets of experiments suggest that GTCC and ZCPA perform better in noisy conditions than the conventional methods, including PLP, which is the best of the three conventional algorithms.
Abstract: Conventional speech feature extraction front-end algorithms suffer severe performance degradation in noisy environments, especially when there is a noise level mismatch between the training and testing environments. It is therefore necessary to search for new feature extraction algorithms which perform better than the conventional methods in adverse conditions. In our literature survey, two recently developed algorithms were found to perform better than the conventional algorithms: Gammatone Cepstral Coefficients (GTCC) and Zero-Crossings with Peak Amplitude (ZCPA). Two sets of experiments are conducted to test their performance against the conventional methods, which include Linear Prediction Cepstral Coefficients (LPCC), Perceptual Linear Prediction Coefficients (PLP) and Mel Frequency Cepstral Coefficients (MFCC). The first set of experiments is a pilot study, which involves recognising speaker-dependent isolated numeric digits. The second set of experiments is a more formal study, which includes HMM-based speaker-independent continuous speech recognition experiments using an accredited speech database called TIMIT. In these two sets of experiments, training data is kept in clean condition while various levels of white Gaussian noise are added to the testing data. Results from both sets of experiments suggest that GTCC and ZCPA perform better than the conventional methods in noisy conditions. For example, in the TIMIT experiment, GTCC outperforms PLP, the best of the three conventional algorithms, by 16% in 0 dB SNR to 44% in 20 dB SNR. In addition, GTCC performs equally well in a clean environment. ZCPA does not perform well in clean conditions; however, it performs better than PLP by 41% in 0 dB SNR to 15% in 20 dB SNR.

30 citations
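The abstract above does not spell out the GTCC front end, so the following is a minimal Python sketch under common assumptions: a 4th-order gammatone filterbank with ERB-spaced centre frequencies, followed by log compression and a DCT, in analogy with MFCC. All parameter values (64 channels, 25 ms frames, 13 cepstra) are illustrative, not taken from the paper.

# Minimal GTCC sketch; parameters are illustrative assumptions.
import numpy as np
from scipy.signal import fftconvolve
from scipy.fft import dct

def erb_center_freqs(f_lo, f_hi, n_ch):
    # Centre frequencies equally spaced on the ERB-number scale
    # (Glasberg & Moore: E(f) = 21.4 * log10(1 + 0.00437 f)).
    e = np.linspace(21.4 * np.log10(1 + 0.00437 * f_lo),
                    21.4 * np.log10(1 + 0.00437 * f_hi), n_ch)
    return (10 ** (e / 21.4) - 1) / 0.00437

def gtcc(x, fs, n_ch=64, n_cep=13, frame=0.025, hop=0.010):
    t = np.arange(int(0.040 * fs)) / fs            # 40 ms impulse responses
    frame_len, hop_len = int(frame * fs), int(hop * fs)
    n_frames = 1 + (len(x) - frame_len) // hop_len
    energies = np.empty((n_frames, n_ch))
    for ch, fc in enumerate(erb_center_freqs(50.0, 0.9 * fs / 2, n_ch)):
        b = 1.019 * 24.7 * (0.00437 * fc + 1)      # bandwidth ~ 1.019 ERB(fc)
        g = t ** 3 * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
        y = fftconvolve(x, g / np.sum(np.abs(g)), mode="full")[:len(x)]
        for m in range(n_frames):
            seg = y[m * hop_len : m * hop_len + frame_len]
            energies[m, ch] = np.sum(seg ** 2) + 1e-12
    # Log compression and DCT across channels, MFCC-style.
    return dct(np.log(energies), axis=1, norm="ortho")[:, :n_cep]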

Posted Content
TL;DR: In this article, the authors propose a noise adaptive speech enhancement (SE) system that employs a domain adversarial training (DAT) approach to tackle the issue of a noise type mismatch between the training and testing conditions.
Abstract: In this study, we propose a novel noise adaptive speech enhancement (SE) system, which employs a domain adversarial training (DAT) approach to tackle the issue of a noise type mismatch between the training and testing conditions. Such a mismatch is a critical problem in deep-learning-based SE systems; a large mismatch may cause serious degradation of SE performance. Because we generally use a well-trained SE system to handle various unseen noise types, a noise type mismatch commonly occurs in real-world scenarios. The proposed noise adaptive SE system contains an encoder-decoder-based enhancement model and a domain discriminator model. During adaptation, the DAT approach encourages the encoder to produce noise-invariant features based on the information from the discriminator model and consequently increases the robustness of the enhancement model to unseen noise types. Herein, we regard stationary noises as the source domain (with the ground truth of clean speech) and non-stationary noises as the target domain (without the ground truth). We evaluated the proposed system on TIMIT sentences. The experimental results show that the proposed noise adaptive SE system successfully provides significant improvements in PESQ (19.0%), SSNR (39.3%), and STOI (27.0%) over the SE system without an adaptation.

30 citations
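The DAT mechanism described above is commonly implemented with a gradient reversal layer (in the style of Ganin and Lempitsky). Below is a hedged PyTorch sketch of that idea; the layer sizes and module layout are illustrative placeholders, not the paper's actual architecture.

# Gradient-reversal sketch of domain adversarial training for SE.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lam * grad_out, None        # flip the gradient sign

encoder = nn.Sequential(nn.Linear(257, 256), nn.ReLU())   # hypothetical sizes
decoder = nn.Sequential(nn.Linear(256, 257))              # enhancement head
discrim = nn.Sequential(nn.Linear(256, 64), nn.ReLU(),
                        nn.Linear(64, 2))                 # source vs. target noise

def losses(noisy, clean, domain_label, lam=1.0):
    z = encoder(noisy)
    # Enhancement objective; applies only where clean ground truth exists
    # (the source domain in the paper's setup).
    se_loss = nn.functional.mse_loss(decoder(z), clean)
    d_logits = discrim(GradReverse.apply(z, lam))
    d_loss = nn.functional.cross_entropy(d_logits, domain_label)
    # Minimising d_loss trains the discriminator; the reversed gradient
    # pushes the encoder toward noise-invariant features.
    return se_loss + d_loss

For target-domain batches (non-stationary noise, no clean reference), only the discriminator term would be used, which is what lets the system adapt without ground truth.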

Journal ArticleDOI
TL;DR: The development of an RC-HMM hybrid that provides good recognition on the Wall Street Journal benchmark is described, and given that RC-based acoustic modeling is a fairly new approach, these results open up promising perspectives.
Abstract: Thanks to research in neural network based acoustic modeling, progress in Large Vocabulary Continuous Speech Recognition (LVCSR) seems to have gained momentum recently. In search for further progress, the present letter investigates Reservoir Computing (RC) as an alternative new paradigm for acoustic modeling. RC unifies the appealing dynamical modeling capacity of a Recurrent Neural Network (RNN) with the simplicity and robustness of linear regression as a model for training the weights of that network. In previous work, an RC-HMM hybrid yielding very good phone recognition accuracy on TIMIT could be designed, but no proof was offered yet that this success would also transfer to LVCSR. This letter describes the development of an RC-HMM hybrid that provides good recognition on the Wall Street Journal benchmark. For the WSJ0 5k word task, word error rates of 6.2% (bigram language model) and 3.9% (trigram) are obtained on the Nov-92 evaluation set. Given that RC-based acoustic modeling is a fairly new approach, these results open up promising perspectives.

30 citations
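As a rough illustration of the RC paradigm this letter builds on, here is a minimal echo state network in NumPy: a fixed random recurrent reservoir whose only trained component is a linear readout fitted by ridge regression. Sizes, the spectral radius, and the regulariser are illustrative assumptions; this is not the paper's RC-HMM hybrid.

# Minimal echo-state-network sketch of Reservoir Computing.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res, n_out = 39, 500, 40        # e.g. MFCC frames -> phone targets

W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
W = rng.standard_normal((n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius < 1

def run_reservoir(U):
    # U: (T, n_in) feature sequence; returns (T, n_res) reservoir states.
    X, x = np.zeros((len(U), n_res)), np.zeros(n_res)
    for t, u in enumerate(U):
        x = np.tanh(W_in @ u + W @ x)   # fixed, untrained dynamics
        X[t] = x
    return X

def fit_readout(X, Y, ridge=1e-6):
    # Ridge regression: only these weights are trained, which is what
    # makes RC training simple and robust compared to full RNN training.
    return np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)

# Usage: W_out = fit_readout(run_reservoir(U), Y); preds = run_reservoir(U) @ W_out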

Proceedings ArticleDOI
05 Jun 2000
TL;DR: An algorithm to hide data in speech signals by inverting the polarity of the signal at every syllable according to the assigned bit is presented, and it was able to successfully hide data and restore it automatically.
Abstract: In this paper we investigate how polarity inversion of speech signals affects human perception, and we apply this technique to data hiding. In most languages, glottal airflow during phonation is uni-directional, causing constant polarity of the speech waveform. On the other hand, the human auditory system cannot discriminate between speech signals with positive and negative polarity. Based on these facts, we developed an algorithm to hide data in speech signals. We assigned one bit to each syllable of speech, and inverted the polarity of the signal at every syllable according to the assigned bit. We performed a test using 20 sentences from the TIMIT corpus to determine both whether a human could distinguish between the original and polarity-inverted signal and whether we could automatically restore the embedded binary data. We found that we were able to successfully hide data and restore it automatically.

30 citations
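The embedding step described above reduces to flipping the waveform sign per syllable. The sketch below assumes syllable boundaries are given (the paper's segmentation method is not reproduced here), and the extraction side uses a deliberately naive skewness-based polarity estimate; the paper's actual detection method may differ.

# Polarity-inversion data hiding sketch; boundaries assumed given.
import numpy as np

def embed_bits(signal, syllable_bounds, bits):
    # syllable_bounds: list of (start, end) sample indices, one per bit.
    out = signal.astype(np.float64).copy()
    for (s, e), bit in zip(syllable_bounds, bits):
        if bit:                          # bit 1 -> inverted polarity
            out[s:e] = -out[s:e]
    return out

def extract_bits(signal, syllable_bounds):
    # Naive detector (an assumption, not the paper's method): natural
    # speech tends to have a polarity-skewed waveform, so the sign of the
    # third moment within a syllable serves as a crude polarity estimate.
    return [int(np.mean(signal[s:e] ** 3) < 0) for s, e in syllable_bounds]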

Journal ArticleDOI
TL;DR: The results of the experiments indicate that with the optimized feature set, the performance of the ASV system is improved and the speed of verification is significantly increased, since the use of ACO reduces the number of features by over 80%, which consequently decreases the complexity of the ASV system.
Abstract: With the growing trend toward remote security verification procedures for telephone banking, biometric security measures and similar applications, automatic speaker verification (ASV) has received a lot of attention in recent years. The complexity of an ASV system and its verification time depend on the number of feature vectors, their dimensionality, the complexity of the speaker models and the number of speakers. In this paper, we concentrate on optimizing the dimensionality of the feature space by selecting relevant features. At present there are several methods for feature selection in ASV systems. To improve the performance of the ASV system, we present another method based on the ant colony optimization (ACO) algorithm. After the feature reduction phase, the feature vectors are applied to a Gaussian mixture model universal background model (GMM-UBM), which is a text-independent speaker verification model. The performance of the proposed algorithm is compared to the performance of a genetic algorithm on the task of feature selection on the TIMIT corpus. The results of the experiments indicate that with the optimized feature set, the performance of the ASV system is improved. Moreover, the speed of verification is significantly increased, since by using ACO the number of features is reduced by over 80%, which consequently decreases the complexity of our ASV system.

30 citations
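As a rough sketch of ACO-based feature selection in the spirit of this paper: ants sample feature subsets with pheromone-weighted probabilities, each subset is scored, and pheromone is reinforced on the best subset found. The GMM-UBM back end is replaced by a generic scikit-learn scorer purely for illustration, and all hyperparameters are assumptions.

# Compact ant-colony feature selection sketch (illustrative scorer).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def aco_select(X, y, n_keep=10, n_ants=20, n_iters=30,
               evaporation=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pheromone = np.ones(n_feat)
    best_subset, best_score = None, -np.inf
    for _ in range(n_iters):
        for _ in range(n_ants):
            # Each ant picks a subset with pheromone-weighted probability.
            p = pheromone / pheromone.sum()
            subset = rng.choice(n_feat, size=n_keep, replace=False, p=p)
            score = cross_val_score(GaussianNB(), X[:, subset], y, cv=3).mean()
            if score > best_score:
                best_subset, best_score = subset, score
        pheromone *= (1 - evaporation)           # evaporation step
        pheromone[best_subset] += best_score     # reinforce the best subset
    return best_subset, best_score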


Network Information
Related Topics (5)

Recurrent neural network: 29.2K papers, 890K citations (76% related)
Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
Feature vector: 48.8K papers, 954.4K citations (74% related)
Natural language: 31.1K papers, 806.8K citations (73% related)
Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95