Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as the TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Book Chapter DOI
TL;DR: The article presents an example of using open-source speech processing software to perform speaker verification experiments designed to test various speaker recognition models under different scenarios.
Abstract: Creating a speaker recognition application requires advanced speech processing techniques realized by specialized speech processing software. Speaker recognition research can be improved considerably by building on a speech processing platform based on open-source software. The article presents an example of using open-source speech processing software to perform speaker verification experiments designed to test various speaker recognition models under different scenarios. Speaker verification efficiency was evaluated for each scenario using the TIMIT speech corpus distributed by the Linguistic Data Consortium. The experimental results allowed us to compare the scenarios and select the best one for building the speaker model of a speaker verification application.
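As a rough illustration of the kind of verification scenario such experiments compare, here is a minimal GMM-based sketch: enrol a speaker by fitting a Gaussian mixture to MFCC features, then accept or reject a test utterance by thresholding its log-likelihood. The article relies on dedicated open-source speech tools, so the librosa/scikit-learn pipeline, file names, mixture size, and threshold below are illustrative assumptions, not the authors' setup.

```python
# Hedged sketch of GMM-based speaker verification (illustrative, not the
# article's actual toolchain or parameters).
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_feats(wav_path):
    """Frame-level MFCC features; TIMIT audio is sampled at 16 kHz."""
    y, sr = librosa.load(wav_path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # (frames, 13)

# Enrolment: fit one GMM per speaker on that speaker's training speech.
enrol = mfcc_feats("speaker1_train.wav")  # hypothetical file name
model = GaussianMixture(n_components=32, covariance_type="diag").fit(enrol)

# Verification: accept if the mean per-frame log-likelihood of the test
# utterance under the claimed speaker's model exceeds a tuned threshold.
test = mfcc_feats("claimed_speaker_test.wav")  # hypothetical file name
print("accept" if model.score(test) > -45.0 else "reject")  # threshold illustrative
```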

12 citations

Posted Content
TL;DR: This work investigates modern quaternion-valued models such as convolutional and recurrent quaternion neural networks for speech recognition on the TIMIT dataset, and shows that QNNs consistently outperform equivalent real-valued models with far fewer free parameters, leading to a more efficient, compact, and expressive representation of the relevant information.
Abstract: Neural network architectures are at the core of powerful automatic speech recognition (ASR) systems. However, while recent research focuses on novel model architectures, the acoustic input features remain almost unchanged. Traditional ASR systems rely on multidimensional acoustic features such as the Mel filter bank energies, along with the first- and second-order derivatives, to characterize the time-frames that compose the signal sequence. Considering that these components describe three different views of the same element, neural networks have to learn both the internal relations that exist within these features and the external or global dependencies that exist between the time-frames. Quaternion-valued neural networks (QNNs) have recently received considerable interest from researchers aiming to process and learn such relations in multidimensional spaces. Indeed, quaternion numbers and QNNs have shown their efficiency at processing multidimensional inputs as entities, encoding internal dependencies, and solving many tasks with up to four times fewer learning parameters than real-valued models. We propose to investigate modern quaternion-valued models such as convolutional and recurrent quaternion neural networks in the context of speech recognition with the TIMIT dataset. The experiments show that QNNs consistently outperform equivalent real-valued models with far fewer free parameters, leading to a more efficient, compact, and expressive representation of the relevant information.
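The parameter saving the abstract mentions comes from weight sharing in the Hamilton product: a single quaternion weight set (W_r, W_i, W_j, W_k) is reused across all four input components. Below is a minimal NumPy sketch of such a quaternion dense layer; the shapes, and the idea of packing the static, first-derivative, and second-derivative feature views into quaternion components, are illustrative assumptions rather than the paper's exact architecture.

```python
# Hedged sketch of a quaternion dense layer via the Hamilton product.
# One (out, in) weight matrix per quaternion component is shared across all
# four input components, giving roughly 4x fewer parameters than a
# real-valued layer of the same width.
import numpy as np

def quaternion_dense(x, W):
    """x = (x_r, x_i, x_j, x_k), each (in,); W = (W_r, W_i, W_j, W_k), each (out, in)."""
    x_r, x_i, x_j, x_k = x
    W_r, W_i, W_j, W_k = W
    r = W_r @ x_r - W_i @ x_i - W_j @ x_j - W_k @ x_k
    i = W_r @ x_i + W_i @ x_r + W_j @ x_k - W_k @ x_j
    j = W_r @ x_j - W_i @ x_k + W_j @ x_r + W_k @ x_i
    k = W_r @ x_k + W_i @ x_j - W_j @ x_i + W_k @ x_r
    return r, i, j, k  # output quaternion components, each (out,)
```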

12 citations

Journal Article DOI
TL;DR: A new model is proposed based on components called matched filters (MFs): instead of using a fixed filter bank for the entire speech signal, the proposed TOC is generated by adopting a pair of vowel and consonant MFs for each voiced speech frame.

12 citations

Journal Article DOI
TL;DR: Simulation results show the proposed method produces much higher speaker identification rates in all signal-to-noise ratio (SNR) conditions than the baseline system using mel-frequency cepstral coefficients.
Abstract: Spectro-temporal modulations of speech encode speech structures and speaker characteristics. An algorithm which distinguishes speech from non-speech based on spectro-temporal modulation energies is proposed and evaluated in robust text-independent closed-set speaker identification simulations using the TIMIT and GRID corpora. Simulation results show the proposed method produces much higher speaker identification rates in all signal-to-noise ratio (SNR) conditions than the baseline system using mel-frequency cepstral coefficients. In addition, the proposed method also outperforms the system, which uses auditory-based nonnegative tensor cepstral coefficients [Q. Wu and L. Zhang, “Auditory sparse representation for robust speaker recognition based on tensor structure,” EURASIP J. Audio, Speech, Music Process. 2008, 578612 (2008)], in low SNR (≤ 10 dB) conditions.
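As a rough sketch of how spectro-temporal modulation energies can be obtained, one common approximation is a 2-D Fourier transform of a log spectrogram, whose axes then index spectral and temporal modulation rates. The function below is a generic approximation under that assumption, not the paper's auditory front-end, and the STFT settings are illustrative.

```python
# Hedged sketch: spectro-temporal modulation energies as the 2-D FFT
# magnitude of a log-magnitude spectrogram (generic approximation).
import numpy as np
from scipy.signal import stft

def modulation_energies(x, fs, nperseg=400, noverlap=240):
    """Return 2-D modulation-energy magnitudes; axis 0 indexes spectral
    modulations, axis 1 temporal modulations (both fftshift-centred)."""
    _, _, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=noverlap)
    log_spec = np.log(np.abs(Z) + 1e-10)  # (freq bins, time frames)
    return np.abs(np.fft.fftshift(np.fft.fft2(log_spec)))
```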

12 citations

Proceedings Article DOI
22 May 2011
TL;DR: This paper combines three simple refinements proposed recently to improve HMM/ANN hybrid models, the first being to apply a hierarchy of two nets, where the second net models the contextual relations of the state posteriors produced by the first network.
Abstract: In this paper we combine three simple refinements proposed recently to improve HMM/ANN hybrid models. The first refinement is to apply a hierarchy of two nets, where the second net models the contextual relations of the state posteriors produced by the first network. The second idea is to train the network on context-dependent units (HMM states) instead of context-independent phones or phone states. As the latter refinement results in a lot of output neurons, combining the two methods directly would be problematic. Hence the third trick is to shrink the output layer of the first net using the bottleneck technique before applying the second net on top of it. The phone recognition results obtained on the TIMIT database demonstrate that both the context-dependent and the 2-stage modeling methods can bring about marked improvements. Using them in combination, however, results in a further significant gain in accuracy. With the bottleneck technique a further improvement can be obtained, especially when the number of context-dependent units is large.
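A minimal PyTorch sketch of the hierarchy described above is given below, assuming illustrative layer sizes, a 9-frame posterior context, and sigmoid hidden units; it is not the authors' exact configuration.

```python
# Hedged sketch of a two-stage hierarchical net with a bottleneck:
# net 1 maps acoustic features to a narrow bottleneck code; net 2 reads a
# context window of those codes and predicts context-dependent (CD) state
# posteriors. All sizes below are illustrative assumptions.
import torch
import torch.nn as nn

N_FEATS, BOTTLENECK, N_CD_STATES, CONTEXT = 39, 40, 600, 9

first_net = nn.Sequential(               # features -> bottleneck code
    nn.Linear(N_FEATS, 1000), nn.Sigmoid(),
    nn.Linear(1000, BOTTLENECK),         # narrow layer replaces the large CD softmax
)
second_net = nn.Sequential(              # stacked codes -> CD state posteriors
    nn.Linear(BOTTLENECK * CONTEXT, 1000), nn.Sigmoid(),
    nn.Linear(1000, N_CD_STATES),
)

frames = torch.randn(CONTEXT, N_FEATS)        # a window of consecutive frames
codes = first_net(frames).flatten()           # one code per frame, concatenated
log_post = second_net(codes).log_softmax(-1)  # CD log-posteriors for the centre frame
```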

12 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  24
2022  62
2021  67
2020  86
2019  77
2018  95