Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
TL;DR: A novel PCA/LDA-based approach to text-independent speaker recognition is presented that is faster and more efficient than traditional statistical model-based methods while achieving competitive results. The performance of PCA alone and of LDA alone is measured first; a mixed model combining the two is then introduced.
Abstract: Various algorithms for text-independent speaker recognition have been developed through the decades, aiming to improve both accuracy and efficiency. This paper presents a novel PCA/LDA-based approach that is faster than traditional statistical model-based methods and achieves competitive results. First, the performance based on PCA alone and on LDA alone is measured; then a mixed model, taking advantage of both methods, is introduced. A subset of the TIMIT corpus composed of 200 male speakers is used for enrollment, validation, and testing. The best results achieve 100%, 96%, and 95% classification rates at population sizes of 50, 100, and 200, using 39-dimensional MFCC features with delta and double-delta coefficients. These results are based on 12 seconds of text-independent speech for training and 4 seconds of data for testing. They are comparable to conventional MFCC-GMM methods but require significantly less time to train and operate.
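At its core, the described pipeline projects MFCC vectors with PCA and classifies in an LDA space. A minimal sketch of that idea follows, assuming per-utterance 39-dimensional MFCC feature vectors are already extracted; the synthetic data, component counts, and variable names are illustrative assumptions, not the paper's code.

```python
# Hypothetical PCA -> LDA speaker-ID sketch; not the paper's implementation.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n_speakers, dim = 50, 39                     # 39-dim MFCC + delta + double-delta

# Stand-in data: each speaker is a Gaussian cloud around a random mean.
means = 3.0 * rng.normal(size=(n_speakers, dim))
X_train = means.repeat(10, axis=0) + rng.normal(size=(n_speakers * 10, dim))
y_train = np.arange(n_speakers).repeat(10)
X_test = means.repeat(3, axis=0) + rng.normal(size=(n_speakers * 3, dim))
y_test = np.arange(n_speakers).repeat(3)

# PCA compresses the MFCC space; LDA then finds speaker-discriminative
# directions and doubles as the classifier (the "mixed" PCA+LDA model).
model = make_pipeline(PCA(n_components=30), LinearDiscriminantAnalysis())
model.fit(X_train, y_train)
print("identification rate:", model.score(X_test, y_test))
```

Training here is a single eigendecomposition plus a linear fit, which is where the claimed speed advantage over iterative GMM training would come from.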

9 citations

Journal ArticleDOI
TL;DR: This work proposes a deep neural network (DNN)-based method for reconstructing the speech magnitude spectrum from Mel-frequency cepstral coefficients (MFCCs) and demonstrates that it achieves significantly better performance than traditional methods.
Abstract: This work proposes a deep neural network (DNN)-based method for reconstructing the speech magnitude spectrum from Mel-frequency cepstral coefficients (MFCCs). We train a DNN using MFCC vectors as input and the corresponding speech magnitude spectra as desired output. Exploiting the strong inference power of DNNs, the proposed method can accurately estimate the speech magnitude spectrum even from truncated MFCC vectors. Experiments on the TIMIT corpus demonstrate that the proposed method achieves significantly better performance than traditional methods.
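Conceptually the method is frame-wise regression from MFCCs to spectral magnitudes. Below is a hedged sketch of that training setup with a generic feed-forward regressor; the array shapes, layer sizes, and random stand-in data are assumptions, not the paper's configuration.

```python
# Sketch of MFCC -> magnitude-spectrum regression; illustrative only.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
n_frames, n_mfcc, n_bins = 2000, 13, 257      # truncated MFCCs, 512-point FFT

# Stand-ins for aligned (MFCC, |STFT|) frame pairs from TIMIT utterances.
mfcc = rng.normal(size=(n_frames, n_mfcc))
mag = np.abs(rng.normal(size=(n_frames, n_bins)))

# The DNN learns to undo the lossy Mel filterbank + log + DCT + truncation;
# regressing log-magnitude keeps the target range well behaved.
dnn = MLPRegressor(hidden_layer_sizes=(512, 512), max_iter=30)
dnn.fit(mfcc, np.log(mag + 1e-6))
est_log_mag = dnn.predict(mfcc[:10])
print(est_log_mag.shape)                      # (10, 257) reconstructed frames
```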

9 citations

Journal Article
TL;DR: This paper introduces and motivates the use of Gaussian Mixture Models (GMMs) and Support Vector Machines (SVMs) for robust text-independent speaker identification, and shows that the hybrid GMM-SVM system significantly outperforms the SVM-only system.
Abstract: This paper introduces and motivates the use of the statistical method of Gaussian Mixture Models (GMMs) together with Support Vector Machines (SVMs) for robust text-independent speaker identification. Features are extracted from dialect region DR1 of the TIMIT corpus and are represented by MFCC, energy, delta, and delta-delta coefficients. The GMM models the features extracted from the input speech signal, and the SVM handles the decision making: the SVM is trained on feature vectors presented by the GMM. Our results show that the hybrid GMM-SVM system significantly outperforms the SVM-only system; we report an 85.37% improvement in identification rate over the SVM baseline.
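One common concrete form of such a GMM/SVM hybrid, sketched below, fits a small GMM per utterance and feeds its stacked mean vectors (a "supervector") to a linear SVM. This is one plausible reading of "an SVM trained on GMM-presented features"; the data, GMM size, and train/test split are assumptions, not the paper's exact recipe.

```python
# Hedged GMM-supervector + SVM speaker-ID sketch; illustrative only.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def supervector(frames, n_components=4):
    """Fit a small GMM to one utterance's MFCC frames, return stacked means."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=0).fit(frames)
    return gmm.means_.ravel()

# Stand-in data: 5 speakers x 6 utterances x 100 frames of 13-dim MFCCs.
X, y = [], []
for spk in range(5):
    offset = 2.0 * rng.normal(size=13)        # crude per-speaker shift
    for _ in range(6):
        frames = rng.normal(size=(100, 13)) + offset
        X.append(supervector(frames))
        y.append(spk)
X, y = np.array(X), np.array(y)

svm = SVC(kernel="linear").fit(X[::2], y[::2])               # even utterances: train
print("identification rate:", svm.score(X[1::2], y[1::2]))   # odd utterances: test
```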

9 citations

Proceedings ArticleDOI
03 Apr 2014
TL;DR: The proposed method reduces the background noise present in a speech signal using spectral subtraction techniques, yielding enhanced speech; non-linear and multiband spectral subtraction are compared.
Abstract: Speech enhancement is a technique used to reduce the background noise present in a speech signal; it aims to improve the intelligibility and quality of degraded speech. The disturbances present in a speech signal include additive noise, echo, reverberation, and speaker interference. The aim of the proposed method is to reduce the background noise in the speech signal using spectral subtraction techniques: the magnitude spectrum of the estimated noise is subtracted from the spectrum of the noisy speech signal. Five clean speech samples are used, and sample noises such as pink noise, white noise, and Volvo (car) noise are taken from the TIMIT and NOIZEUS corpora. Enhanced speech is obtained using non-linear spectral subtraction and multiband spectral subtraction, and the performance of the two methods is compared on two measures, Signal-to-Noise Ratio and Log Spectral Distance.
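The core operation both variants build on is plain magnitude spectral subtraction: estimate the noise spectrum from noise-only frames, subtract it, floor the result, and resynthesize with the noisy phase. A minimal sketch under those assumptions follows (synthetic input and illustrative parameters, not the paper's setup).

```python
# Minimal magnitude spectral subtraction; parameters are illustrative.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtract(noisy, fs, n_noise_frames=8, floor=0.01):
    f, t, X = stft(noisy, fs=fs, nperseg=512)
    mag, phase = np.abs(X), np.angle(X)
    # Noise spectrum estimated from the leading noise-only frames.
    noise_mag = mag[:, :n_noise_frames].mean(axis=1, keepdims=True)
    # Subtract and apply a spectral floor to avoid negative magnitudes.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512)
    return enhanced

fs = 16000
rng = np.random.default_rng(0)
tone = np.sin(2 * np.pi * 440 * np.arange(fs) / fs)            # 1 s "speech"
noisy = np.concatenate([np.zeros(4000), tone]) + 0.3 * rng.normal(size=fs + 4000)
print(spectral_subtract(noisy, fs).shape)
```

The multiband variant applies this per frequency band with band-specific over-subtraction factors, while the non-linear variant makes the subtraction factor depend on the local SNR.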

9 citations

Proceedings ArticleDOI
01 Aug 2016
TL;DR: DNNs trained on ConvRBM features with rectified units provide significant complementary information in the form of temporal modulation features, aiding unsupervised representation learning for the speech recognition task.
Abstract: There has been significant research attention on unsupervised representation learning of features for speech processing applications. In this paper, we investigate unsupervised representation learning using a Convolutional Restricted Boltzmann Machine (ConvRBM) with rectified units for the speech recognition task. A temporal modulation representation is learned by feeding log Mel-spectrograms to the ConvRBM. Separate DNNs were trained on the ConvRBM modulation features and on filterbank spectral features, and system combination was then applied. With our proposed setup, ConvRBM features were applied to speech recognition on the TIMIT and WSJ0 databases. On TIMIT, we achieved a relative improvement of 5.93% in PER on the test set compared to filterbank features alone. On WSJ0, we achieved relative improvements of 3.63-4.3% in WER on the test sets compared to filterbank features. Hence, DNNs trained on ConvRBM features with rectified units provide significant complementary information in terms of temporal modulation features.
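The combination step itself is simple to picture: each stream's DNN emits per-frame state posteriors, and the streams are merged at the score level before decoding. A hedged sketch of that merge follows; the ConvRBM and the DNNs are omitted, and the state count, weights, and random posteriors are placeholders, not the paper's setup.

```python
# Score-level system combination of two DNN streams; placeholders throughout.
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_states = 300, 40          # e.g., monophone states (assumed)

# Stand-ins for per-frame posteriors from the two separately trained DNNs
# (ConvRBM modulation features vs. filterbank spectral features).
post_convrbm = rng.dirichlet(np.ones(n_states), size=n_frames)
post_fbank = rng.dirichlet(np.ones(n_states), size=n_frames)

# Simple log-linear combination with equal weights (assumed); the combined
# scores would then feed the usual HMM decoder.
log_combined = 0.5 * np.log(post_convrbm) + 0.5 * np.log(post_fbank)
states = log_combined.argmax(axis=1)
print(states[:10])
```

Equal weights are assumed here for simplicity; in practice the combination weights would be tuned on a development set.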

9 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95