TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published on this topic, receiving 59,888 citations. The topic is also known as the TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
TL;DR: This paper introduces a speaker recognition strategy for unlabeled data that generates clusterable embedding vectors from small fixed-size speech frames. A pairwise constraint is constructed with noise augmentation policies and used to train an AutoEmbedder architecture that generates speaker embeddings.
Abstract: Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built upon two stages: the first stage extracts low-dimensional correlation embeddings from speech, and the second performs the classification task. The robustness of a speaker recognition system mainly depends on the extraction process of speech embeddings, which are primarily pre-trained on a large-scale dataset. As the embedding systems are pre-trained, the performance of speaker recognition models greatly depends on the domain adaptation policy, and may degrade if trained on inadequate data. This paper introduces a speaker recognition strategy dealing with unlabeled data, which generates clusterable embedding vectors from small fixed-size speech frames. The unsupervised training strategy assumes that a small speech segment includes a single speaker. Based on this assumption, a pairwise constraint is constructed with noise augmentation policies and used to train an AutoEmbedder architecture that generates speaker embeddings. Without relying on a domain adaptation policy, the process produces clusterable speaker embeddings without supervision, termed unsupervised vectors (u-vectors). The evaluation is conducted on two popular English speaker recognition datasets, TIMIT and LibriSpeech. A Bengali dataset is also included to illustrate the diversity of the domain shifts for speaker recognition systems. Finally, we conclude that the proposed approach achieves satisfactory performance using pairwise architectures.

5 citations

Proceedings ArticleDOI
28 Mar 1993
TL;DR: The TIMIT and KING databases are used to compare proven spectral processing techniques to an auditory neural representation for speaker identification. The resulting vector-quantized distortion-based classification indicates the auditory model performs statistically equal to the LPC cepstral representation in clean environments and outperforms the LPC cepstral representation in noisy environments and in test data recorded over multiple sessions.
Abstract: The TIMIT and KING databases are used to compare proven spectral processing techniques to an auditory neural representation for speaker identification. The feature sets compared are linear prediction coding (LPC) cepstral coefficients and auditory nerve firing rates using the Payton model (1988). Two clustering algorithms, one statistically based and the other a neural approach, are used to generate speaker-specific codebook vectors. These algorithms are the Linde-Buzo-Gray algorithm and a Kohonen self-organizing feature map. The resulting vector-quantized distortion-based classification indicates the auditory model performs statistically equal to the LPC cepstral representation in clean environments and outperforms the LPC cepstral representation in noisy environments and in test data recorded over multiple sessions (greater intra-speaker distortions).

5 citations

Journal ArticleDOI
TL;DR: This study presents a new contribution to the Adaptive Temporal Radial Basis Function (ATRBF) applied to continuous speech recognition, in particular the recognition of phonemes from the TIMIT corpus.
Abstract: This study presents a new contribution to the Adaptive Temporal Radial Basis Function (ATRBF) applied to continuous speech recognition, in particular the recognition of phonemes from the TIMIT corpus. ATRBF combines features of the Time Delay Neural Network (TDNN) with the advantages of the Radial Basis Function (RBF). The capacity to detect acoustic features independently of their temporal localisation is inspired by the TDNN model. The main advantages of RBF are its processing speed and the few parameters to adjust in the training phase, which encourages applying this model to new tasks in the most delicate cases.

5 citations

Proceedings ArticleDOI
23 Nov 2014
TL;DR: Experimental results on the TIMIT corpus, with mismatched environments and low signal-to-noise ratio (SNR) levels, show that the proposed Multitaper Gamma tone Cepstral Coefficient (MGCC) features largely outperform the conventional Mel Frequency Cepstral Coefficient (MFCC) features.
Abstract: In this paper we present a novel feature extraction algorithm based on Multitaper windows and Gamma tone filters for robust speaker verification systems in the mismatched noisy conditions encountered in forensic applications. The idea is to couple the advantage of the low-variance multitaper short-term spectral estimators with the acoustic robustness of the auditory Gamma tone filter banks. Experimental results on the TIMIT corpus, with mismatched environments and low signal-to-noise ratio (SNR) levels, show that the proposed Multitaper Gamma tone Cepstral Coefficient (MGCC) features largely outperform the conventional Mel Frequency Cepstral Coefficient (MFCC) features. Furthermore, and interestingly, the proposed features outperform, at almost all operating signal-to-noise ratios, the recently proposed auditory-inspired Gamma tone Frequency Cepstral Coefficient (GFCC) features for white, babble and factory noises, using both the de facto standard GMM-UBM and the state-of-the-art I-vector speaker verification systems.

5 citations

Posted Content
24 Nov 2013
TL;DR: This article proposes a primal-dual training method that formulates the learning of the RNN as a formal optimization problem with an inequality constraint that provides a sufficient condition for the stability of the network dynamics.
Abstract: We present an architecture of a recurrent neural network (RNN) with a fully-connected deep neural network (DNN) as its feature extractor. The RNN is equipped with both causal temporal prediction and non-causal look-ahead, via auto-regression (AR) and moving-average (MA), respectively. The focus of this paper is a primal-dual training method that formulates the learning of the RNN as a formal optimization problem with an inequality constraint that provides a sufficient condition for the stability of the network dynamics. Experimental results demonstrate the effectiveness of this new method, which achieves 18.86% phone recognition error on the TIMIT benchmark for the core test set. The result approaches the best result of 17.7%, which was obtained by using RNN with long short-term memory (LSTM). The results also show that the proposed primal-dual training method produces lower recognition errors than the popular RNN methods developed earlier based on the carefully tuned threshold parameter that heuristically prevents the gradient from exploding.

5 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95