Topic

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
TL;DR: A theoretical framework and an experimental evaluation are presented showing that reducing the dimension of features by applying the discrete Karhunen–Loève transform (DKLT) to the log-spectrum of the speech signal guarantees better performance compared to conventional MFCC features.
Abstract: Speaker identification plays a crucial role in biometric person identification, as systems based on human speech are increasingly used for the recognition of people. Mel frequency cepstral coefficients (MFCCs) have been widely adopted for decades in speech processing to capture speech-specific characteristics with a reduced dimensionality. However, although their ability to decorrelate the vocal source and the vocal tract filter makes them suitable for speech recognition, they greatly mitigate speaker variability, the specific characteristic that distinguishes different speakers. This paper presents a theoretical framework and an experimental evaluation showing that reducing the dimension of features by applying the discrete Karhunen–Loève transform (DKLT) to the log-spectrum of the speech signal guarantees better performance compared to conventional MFCC features. In particular, with short sequences of speech frames, with typical durations of less than 2 s, the performance of the truncated DKLT representation for the identification of five speakers was always better than that achieved with the MFCCs in the experiments we performed. Additionally, the framework was tested on up to 100 TIMIT speakers with sequences of less than 3.5 s, showing very good recognition capabilities.
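The truncated DKLT the abstract describes amounts, in practice, to a principal component analysis of the frame-wise log-spectrum. A minimal numpy sketch of that idea follows; the frame length, FFT size, and `eps` floor are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

def truncated_dklt_features(frames, k, eps=1e-10):
    """Truncated DKLT (Karhunen-Loeve / PCA) features from log-spectra.

    frames : (n_frames, frame_len) array of windowed speech samples.
    k      : number of retained components (k < frame_len // 2 + 1).
    """
    # Log-magnitude spectrum of each frame (eps avoids log(0)).
    spec = np.abs(np.fft.rfft(frames, axis=1))
    log_spec = np.log(spec + eps)
    # Estimate the covariance of the log-spectrum and keep its top-k
    # eigenvectors; projecting onto them is the truncated DKLT.
    mean = log_spec.mean(axis=0)
    centered = log_spec - mean
    cov = centered.T @ centered / len(centered)
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    basis = eigvecs[:, ::-1][:, :k]             # descending order, top k
    return centered @ basis                     # (n_frames, k) features
```

The resulting low-dimensional vectors would then feed a speaker-identification back end in place of MFCCs.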

18 citations

Journal ArticleDOI
31 Mar 2012
TL;DR: In this paper, the performance of five supervised learning classifiers (SVM, KNN, Naive Bayes, Quadratic Bayes Normal (QDC), and Nearest Mean) and two combined classifiers was compared for vowel recognition.
Abstract: In this article, we conduct a study on the performance of some supervised learning algorithms for vowel recognition. This study aims to compare the accuracy of each algorithm. Thus, we present an empirical comparison between five supervised learning classifiers and two combined classifiers: SVM, KNN, Naive Bayes, Quadratic Bayes Normal (QDC), and Nearest Mean. Those algorithms were tested for vowel recognition using the TIMIT corpus and Mel-frequency cepstral coefficients (MFCCs).
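As an illustration of the simplest classifier in the comparison, here is a self-contained numpy sketch of the Nearest Mean rule applied to MFCC-like vectors; the data, dimensions, and class count below are placeholders, not TIMIT material:

```python
import numpy as np

def nearest_mean_classify(X_train, y_train, X_test):
    """Nearest Mean classifier: assign each test vector to the class
    whose training mean is closest in Euclidean distance."""
    classes = np.unique(y_train)
    # Per-class mean feature vector (e.g. mean MFCC vector per vowel).
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Distance of every test vector to every class mean.
    d = np.linalg.norm(X_test[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]
```

The other classifiers in the study (SVM, KNN, Naive Bayes, QDC) would be dropped into the same train/predict harness for an accuracy comparison.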

18 citations

Proceedings ArticleDOI
01 Dec 2017
TL;DR: Experimental results show that the method of this paper can detect end-points of voice signal more accurately and outperforms the conventional VAD algorithms.
Abstract: In this paper, an efficient algorithm for classifying voiced segments from silence and unvoiced segments is proposed; compared to some previous algorithms, it is both more accurate and easier to implement. The proposed algorithm uses spectral entropy together with short-time features, namely zero-crossing rate, short-time energy, and linear prediction error, for voice activity detection (VAD). A compound parameter, D, is calculated from all four parameters, and Dmax is taken over all frames of the signal. The value of D/Dmax is then used to classify each frame as speech, non-speech, or silence; the threshold values have to be obtained empirically. Experimental results show that the method of this paper detects end-points of the voice signal more accurately and outperforms conventional VAD algorithms. The method was evaluated on the TIMIT Acoustic-Phonetic Continuous Speech Corpus, which is widely used for speech recognition applications and contains clean speech data, and was compared with some of the most recently proposed algorithms.
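The abstract names the four ingredients of the compound parameter D but not the exact formula, so the combination below is a plausible assumption for illustration only: it favors energetic, spectrally peaked (low-entropy) frames, and thresholds the normalized D/Dmax value as described:

```python
import numpy as np

def vad_frames(frames, threshold=0.1, eps=1e-10):
    """Frame-level VAD via a compound parameter D built from short-time
    energy, zero-crossing rate, spectral entropy, and a first-order
    linear prediction error. The exact combination is a guess; the
    paper only lists the four parameters."""
    # Short-time energy per frame.
    energy = np.sum(frames ** 2, axis=1)
    # Zero-crossing rate: fraction of sample pairs with a sign change.
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    # Spectral entropy of the normalized power spectrum.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    p = power / (power.sum(axis=1, keepdims=True) + eps)
    entropy = -np.sum(p * np.log(p + eps), axis=1)
    # First-order linear prediction error as a crude LP residual.
    lp_err = np.mean((frames[:, 1:] - frames[:, :-1]) ** 2, axis=1)
    # Compound parameter D, normalized by its maximum over all frames.
    D = energy * lp_err / ((entropy + eps) * (zcr + eps))
    return D / (D.max() + eps) > threshold     # True = speech frame
```

As in the paper, the threshold on D/Dmax would be tuned empirically per corpus.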

18 citations

Proceedings ArticleDOI
22 May 2011
TL;DR: A novel framework to integrate articulatory features (AFs) into an HMM-based ASR system by using posterior probabilities of different AFs directly as observation features in a Kullback-Leibler divergence based HMM (KL-HMM) system, yielding the best performance on the TIMIT phoneme recognition task.
Abstract: In this paper, we propose a novel framework to integrate articulatory features (AFs) into an HMM-based ASR system. This is achieved by using posterior probabilities of different AFs (estimated by multilayer perceptrons) directly as observation features in a Kullback-Leibler divergence based HMM (KL-HMM) system. On the TIMIT phoneme recognition task, the proposed framework yields a phoneme recognition accuracy of 72.4%, which is comparable to a KL-HMM system using posterior probabilities of phonemes as features (72.7%). Furthermore, the best performance of 73.5% phoneme recognition accuracy is achieved by jointly modeling AF probabilities and phoneme probabilities as features. This shows the efficacy and flexibility of the proposed approach.
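In a KL-HMM, each state holds a categorical distribution over the posterior-feature dimensions, and the local score of a frame is the KL divergence between the state distribution and the observed posterior vector. A minimal sketch of that local score and the resulting state choice follows; the divergence direction and epsilon floor are assumptions for illustration:

```python
import numpy as np

def kl_local_score(state_dist, posterior, eps=1e-10):
    """KL(state || posterior): local emission score of one frame
    in a KL-HMM (lower = better match)."""
    y = state_dist + eps
    z = posterior + eps
    return float(np.sum(y * np.log(y / z)))

def best_state(state_dists, posterior):
    """Pick the state whose categorical distribution is KL-closest
    to the observed posterior-feature vector."""
    scores = [kl_local_score(d, posterior) for d in state_dists]
    return int(np.argmin(scores))
```

In a full system these local scores would replace Gaussian likelihoods inside Viterbi decoding; here the posterior vectors would be the MLP-estimated AF (or phoneme) posteriors.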

18 citations

Journal ArticleDOI
TL;DR: Based on the findings of this study, speech processing tasks may be treated as object detection tasks and the effectiveness of using object detection techniques in phoneme recognition tasks is shown.
Abstract: The use of cutting-edge object detection techniques to build an accurate phoneme sequence recognition system for the English and Arabic languages is investigated in this study. Recently, numerous techniques have been proposed for object detection in daily life applications using deep learning. In this paper, we propose the use of object detection techniques in speech processing tasks. We selected two state-of-the-art object detectors, namely YOLO and CenterNet, based on a trade-off between detection accuracy and speed. We tackled the problem of phoneme sequence recognition using three systems: the domain transfer learning system (DTS) from image to speech, the intra-language transfer learning system (IaTS) between speech corpora within the same language (English to English), and the inter-language transfer learning system (IeTS) between speech corpora from dissimilar languages (English to Arabic). For English phoneme recognition, the Texas Instruments/Massachusetts Institute of Technology (TIMIT) corpus is used to evaluate the performance of the proposed systems. Our IaTS based on the CenterNet detector achieves the best results on the TIMIT core test set, with a 15.89% phone error rate (PER). For Arabic phoneme recognition, the best performance, with 7.58% PER, was also achieved using CenterNet. These results show the effectiveness of using object detection techniques in phoneme recognition tasks. Furthermore, based on the findings of this study, speech processing tasks may be treated as object detection tasks.

18 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  24
2022  62
2021  67
2020  86
2019  77
2018  95