Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have appeared within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.

Papers

Journal Article•DOI•

U-Vectors: Generating Clusterable Speaker Embedding from Unlabeled Data

Muhammad F. Mridha, Abu Quwsar Ohi, Muhammad Mostafa Monowar, Md. Abdul Hamid, Md. Rashedul Islam, Yutaka Watanobe

07 Feb 2021-Applied Sciences

TL;DR: This paper introduces a speaker recognition strategy for unlabeled data that generates clusterable embedding vectors from small fixed-size speech frames. A pairwise constraint, constructed with noise augmentation policies, is used to train the AutoEmbedder architecture that generates the speaker embeddings.

Abstract: Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built upon two stages: the first extracts low-dimensional correlation embeddings from speech, and the second performs the classification task. The robustness of a speaker recognition system mainly depends on the extraction process of speech embeddings, which are primarily pre-trained on a large-scale dataset. As the embedding systems are pre-trained, the performance of speaker recognition models depends greatly on the domain adaptation policy and may degrade when trained on inadequate data. This paper introduces a speaker recognition strategy dealing with unlabeled data, which generates clusterable embedding vectors from small fixed-size speech frames. The unsupervised training strategy assumes that a small speech segment includes a single speaker. Based on this assumption, a pairwise constraint is constructed with noise augmentation policies and used to train the AutoEmbedder architecture that generates speaker embeddings. Without relying on a domain adaptation policy, the process produces clusterable speaker embeddings in an unsupervised manner, termed unsupervised vectors (u-vectors). The evaluation is conducted on two popular English-language speaker recognition datasets, TIMIT and LibriSpeech. A Bengali dataset is also included to illustrate the diversity of the domain shifts for speaker recognition systems. Finally, we conclude that the proposed approach achieves satisfactory performance using pairwise architectures.

5 citations
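
To make the pairwise-constraint idea in the abstract above concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the additive Gaussian noise used as augmentation, and the rule that frames from different segments are labeled as different speakers are all assumptions made for illustration. The only element taken from the abstract is the premise that a small speech segment contains a single speaker.

```python
import random

def split_frames(samples, frame_len):
    """Cut a segment into non-overlapping fixed-size frames, dropping the remainder."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def add_noise(frame, scale, rng):
    """Hypothetical augmentation policy: additive Gaussian noise."""
    return [x + rng.gauss(0.0, scale) for x in frame]

def build_pairs(segments, frame_len, noise_scale=0.01, seed=0):
    """Return (frame_a, frame_b, label) triples; label 1 = assumed same speaker.

    Assumption from the abstract: frames cut from one short segment share a
    speaker (label 1). Labeling frames from different segments as 0 is a
    heuristic added here; two segments may in fact share a speaker.
    """
    rng = random.Random(seed)
    framed = [split_frames(seg, frame_len) for seg in segments]
    pairs = []
    for i, frames in enumerate(framed):
        # Positive pairs: every two frames of the same segment, one augmented.
        for a in range(len(frames)):
            for b in range(a + 1, len(frames)):
                pairs.append((frames[a], add_noise(frames[b], noise_scale, rng), 1))
        # Negative pairs: first frame of this segment vs. first frame of later ones.
        for j in range(i + 1, len(framed)):
            if frames and framed[j]:
                pairs.append((frames[0], add_noise(framed[j][0], noise_scale, rng), 0))
    return pairs
```

The resulting triples could feed any pairwise (siamese-style) embedding model; the cross-segment labels are noisy by construction, which is why they are flagged as an assumption rather than ground truth.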

Proceedings Article•DOI•

Auditory model representation and comparison for speaker recognition

John M. Colombi, T.R. Anderson, Steven K. Rogers, Dennis W. Ruck, G.T. Warhola

28 Mar 1993

TL;DR: The TIMIT and KING databases are used to compare proven spectral processing techniques to an auditory neural representation for speaker identification. The resulting vector-quantized distortion-based classification indicates that the auditory model performs statistically equal to the LPC cepstral representation in clean environments and outperforms it in noisy environments and in test data recorded over multiple sessions.

Abstract: The TIMIT and KING databases are used to compare proven spectral processing techniques to an auditory neural representation for speaker identification. The feature sets compared are linear prediction coding (LPC) cepstral coefficients and auditory nerve firing rates using the Payton model (1988). Two clustering algorithms, one statistically based and the other a neural approach, are used to generate speaker-specific codebook vectors. These algorithms are the Linde-Buzo-Gray algorithm and a Kohonen self-organizing feature map. The resulting vector-quantized distortion-based classification indicates that the auditory model performs statistically equal to the LPC cepstral representation in clean environments and outperforms the LPC cepstral representation in noisy environments and in test data recorded over multiple sessions (greater intra-speaker distortions).

5 citations
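
The distortion-based classification described in the abstract above can be sketched as follows: each speaker gets a codebook, and a test utterance is assigned to the speaker whose codebook quantizes it with the lowest average distortion. This is an illustrative sketch only. Plain k-means stands in for the Linde-Buzo-Gray algorithm (LBG grows the codebook by splitting centroids, which is not done here), squared Euclidean distance is assumed as the distortion measure, and all function names are hypothetical.

```python
import random

def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def train_codebook(vectors, size, iters=20, seed=0):
    """Plain k-means, used here as a simple stand-in for LBG."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in vectors:
            nearest = min(range(size), key=lambda k: sq_dist(v, codebook[k]))
            cells[nearest].append(v)
        for k, cell in enumerate(cells):
            if cell:  # keep the old centroid if a cell is empty
                codebook[k] = [sum(col) / len(cell) for col in zip(*cell)]
    return codebook

def avg_distortion(vectors, codebook):
    """Mean distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)

def identify(vectors, codebooks):
    """codebooks maps speaker -> codebook; returns the lowest-distortion speaker."""
    return min(codebooks, key=lambda s: avg_distortion(vectors, codebooks[s]))
```

In use, `vectors` would hold per-frame features (LPC cepstra or auditory firing rates in the paper); the classifier itself does not depend on which feature set is chosen, which is what makes the comparison between representations possible.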

Journal Article•DOI•

A New Look to Adaptive Temporal Radial Basis Function Applied in Speech Recognition

Mesbahi Larbi, Benyettou Abdelkader

31 Dec 2005-Journal of Computer Science

TL;DR: This study presents a new contribution towards the Adaptive Temporal Radial Basis Function (ATRBF) applied to continuous speech recognition, in particular the recognition of phonemes such as those of the TIMIT corpus.

Abstract: This study presents a new contribution towards the Adaptive Temporal Radial Basis Function (ATRBF) applied to continuous speech recognition, in particular the recognition of phonemes such as those of the TIMIT corpus. ATRBF combines features of the Time Delay Neural Network (TDNN) with the advantages of the Radial Basis Function (RBF). The capacity to detect acoustic features independently of their temporal localisation is inspired by the TDNN model. The main appeal of RBFs is their processing speed and the small number of parameters to adjust during training, which encourages applying this model to new tasks, even in the most delicate cases.

5 citations

Proceedings Article•DOI•

Robust Speaker Verification Using a New Front End Based on Multitaper and Gammatone Filters

Fedila Meriem, Harizi Farid, Bengherabi Messaoud, Amrouche Abderrahmene

23 Nov 2014

TL;DR: Experimental results on the TIMIT corpus, with mismatched environments and low signal-to-noise ratio (SNR) levels, show that the proposed Multitaper Gammatone Cepstral Coefficient (MGCC) features largely outperform the conventional Mel Frequency Cepstral Coefficient (MFCC) features.

Abstract: In this paper we present a novel feature extraction algorithm based on multitaper windows and Gammatone filters for robust speaker verification systems in the mismatched noisy conditions encountered in forensic applications. The idea is to couple the advantage of the low-variance multitaper short-term spectral estimators with the acoustic robustness of the auditory Gammatone filter banks. Experimental results on the TIMIT corpus, with mismatched environments and low signal-to-noise ratio (SNR) levels, show that the proposed Multitaper Gammatone Cepstral Coefficient (MGCC) features largely outperform the conventional Mel Frequency Cepstral Coefficient (MFCC) features. Furthermore, at almost all operating signal-to-noise ratios, the proposed features outperform the recently proposed hearing-inspired Gammatone Frequency Cepstral Coefficient (GFCC) features for white, babble and factory noises, using both the de facto standard GMM-UBM and the state-of-the-art i-vector speaker verification systems.

5 citations

Posted Content•

A New Method for Learning Deep Recurrent Neural Networks

Jianshu Chen, Li Deng

24 Nov 2013

TL;DR: In this article, a primal-dual training method is proposed that formulates the learning of the RNN as a formal optimization problem with an inequality constraint that provides a sufficient condition for the stability of the network dynamics.

Abstract: We present an architecture of a recurrent neural network (RNN) with a fully-connected deep neural network (DNN) as its feature extractor. The RNN is equipped with both causal temporal prediction and non-causal look-ahead, via auto-regression (AR) and moving-average (MA), respectively. The focus of this paper is a primal-dual training method that formulates the learning of the RNN as a formal optimization problem with an inequality constraint that provides a sufficient condition for the stability of the network dynamics. Experimental results demonstrate the effectiveness of this new method, which achieves 18.86% phone recognition error on the TIMIT benchmark for the core test set. The result approaches the best result of 17.7%, which was obtained by using RNN with long short-term memory (LSTM). The results also show that the proposed primal-dual training method produces lower recognition errors than the popular RNN methods developed earlier based on the carefully tuned threshold parameter that heuristically prevents the gradient from exploding.

5 citations
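
For readers unfamiliar with the metric behind figures such as 18.86%, phone recognition error is conventionally computed as the edit distance (substitutions, deletions and insertions) between recognized and reference phone sequences, divided by the number of reference phones. The abstract does not spell out its scoring procedure, so the sketch below shows only this conventional definition; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (lists of labels)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]

def phone_error_rate(refs, hyps):
    """Total edit errors over all utterances divided by total reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total
```

For example, scoring the hypothesis `["sil", "k", "ah", "t"]` against the reference `["sil", "k", "ae", "t", "sil"]` counts one substitution and one deletion over five reference phones.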

Network Information

Performance

Metrics

Papers: 1,488

Citations: 68,688

No. of papers in the topic in previous years
Year	Papers
2023	24
2022	62
2021	67
2020	86
2019	77
2018	95
