Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as the TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal Article
TL;DR: Improvement in the accuracy and robustness of phoneme recognition is investigated by refining posterior features extracted from single-stream cepstral features using a multilayer perceptron (MLP) in a cascaded structure.

2 citations

Proceedings Article
01 Nov 2013
TL;DR: Experimental results for the proposed method on speech dereverberation and distant speech recognition indicate reasonable improvement over conventional methods.
Abstract: In this work, a method for multichannel speech enhancement using the linear prediction (LP) residual cepstrum is proposed. The method performs deconvolution at each microphone output in the cepstral domain. Deconvolving the acoustic impulse response from the reverberated signal in each individual channel removes early reverberation. The dereverberated output from each channel is then spatially filtered using a delay-and-sum beamformer (DSB). The late reverberation components are then removed by temporal averaging of the glottal closure instants (GCIs) computed using the dynamic programming projected phase-slope algorithm (DYPSA). The GCIs obtained correspond to the LP residual peaks. These residual peaks are excluded from the averaging process, since they have a significant impact on speech quality and should remain unmodified. Subjective and objective evaluations are conducted on the TIMIT and MONC databases, and the proposed method is compared with other methods. The experimental results for the proposed method on speech dereverberation and distant speech recognition indicate reasonable improvement over conventional methods.

2 citations
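
The delay-and-sum beamformer (DSB) step named in the abstract above can be illustrated with a minimal sketch. It assumes integer sample delays that are already known for each microphone; the function name `delay_and_sum` and the toy signals are hypothetical and are not taken from the paper, which also applies cepstral deconvolution before this step.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay, then average.

    channels: list of equal-length lists of samples, one per microphone.
    delays:   non-negative integer arrival delays in samples, one per
              channel (assumed known here; a real system must estimate them).
    """
    if len(channels) != len(delays):
        raise ValueError("need one delay per channel")
    n = len(channels[0])
    out_len = n - max(delays)  # samples available in every aligned channel
    output = []
    for t in range(out_len):
        # Advance each channel by its own delay so the source lines up.
        total = sum(ch[t + d] for ch, d in zip(channels, delays))
        output.append(total / len(channels))
    return output


# Toy example: the same pulse reaches three microphones 0, 2 and 1 samples late.
source = [0.0, 1.0, 0.5, -0.5, 0.0, 0.0]
mics = [
    source + [0.0, 0.0],
    [0.0, 0.0] + source,
    [0.0] + source + [0.0],
]
aligned = delay_and_sum(mics, delays=[0, 2, 1])
```

In this toy case the aligned average reproduces the source pulse. With real recordings, the coherent source adds up while uncorrelated noise and reverberation are attenuated by the averaging.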

Posted Content
TL;DR: A two-stage approach for accurate detection of vowel onset points (VOPs) is proposed: VOPs are first detected using continuous wavelet transform coefficients, and the positions of the detected VOPs are then corrected using phone boundaries in the second stage.
Abstract: In this paper, we propose a novel approach for accurate detection of vowel onset points (VOPs). A VOP is the instant at which a vowel begins in the speech signal. Precise identification of VOPs is important for various speech applications such as speech segmentation and speech rate modification. Existing methods detect the majority of VOPs within a 40 ms deviation, which may not be adequate for these applications. To address this issue, we propose a two-stage approach for accurate detection of VOPs. In the first stage, VOPs are detected using continuous wavelet transform coefficients, and in the second stage the positions of the detected VOPs are corrected using phone boundaries. The phone boundaries are detected by the spectral transition measure method. Experiments are carried out on the TIMIT and Bengali speech corpora. The performance of the proposed approach is compared with two standard signal-processing-based methods. The evaluation results show that the proposed method performs better than the existing methods.

2 citations
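
The second-stage correction described in the abstract above can be sketched as follows. The abstract does not say how phone boundaries are used to correct a VOP, so this sketch assumes the simplest rule: snap each detected VOP to the nearest boundary, unless that boundary is farther away than a hypothetical limit `max_shift_ms`. The function name and the example times are illustrative only.

```python
import bisect


def snap_vops_to_boundaries(vops_ms, boundaries_ms, max_shift_ms=40.0):
    """Move each detected VOP to the nearest phone boundary.

    Assumption (not from the paper): a VOP is only moved if the nearest
    boundary lies within max_shift_ms; otherwise it is left unchanged.
    """
    boundaries = sorted(boundaries_ms)
    corrected = []
    for vop in vops_ms:
        i = bisect.bisect_left(boundaries, vop)
        # Only the boundaries just before and just after the VOP can be nearest.
        candidates = boundaries[max(i - 1, 0):i + 1]
        if not candidates:
            corrected.append(vop)
            continue
        nearest = min(candidates, key=lambda b: abs(b - vop))
        corrected.append(nearest if abs(nearest - vop) <= max_shift_ms else vop)
    return corrected


# First-stage VOP estimates and detected phone boundaries, in milliseconds.
vops = [112.0, 305.0, 720.0]
boundaries = [100.0, 290.0, 500.0, 900.0]
corrected = snap_vops_to_boundaries(vops, boundaries)
# corrected == [100.0, 290.0, 720.0]; the last VOP has no boundary within 40 ms.
```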

Book Chapter
01 Jan 2014
TL;DR: The proposed paradigm builds a binary tree for multiclass SVM, partitioning the classes by two criteria of natural classification, separation and homogeneity, with the aim of obtaining an optimal tree; the approach is more accurate in the construction of the tree.
Abstract: In this paper we propose a framework for solving multiclass problems with Support Vector Machines (SVM) and examine its performance. Our method is based on the binary tree principle, leading to much faster convergence, and we compare it with very popular methods proposed in the literature, both in terms of computational needs for the feedforward phase and of classification accuracy. The proposed paradigm builds a binary tree for multiclass SVM, partitioning the classes by criteria of natural classification, separation and homogeneity, with the aim of obtaining an optimal tree. The main result, however, is the mapping of the multiclass problem to several two-class sub-problems, in order to ease the resolution of real and complex problems. Our approach is more accurate in the construction of the tree. Further, in the test phase, OVA Tree Multiclass, owing to its logarithmic complexity, is much faster than other methods on problems with a large number of classes. In this context, two corpora are used to evaluate our framework: the TIMIT dataset for vowel classification and MNIST for handwritten digit recognition. A recognition rate of 57% on the 20 vowels of the TIMIT corpus and 97.73% on the 10 digits of MNIST was achieved. These results are comparable with the state of the art. In addition, training time and the number of support vectors, which determine the duration of the tests, are also reduced compared to other methods.

2 citations
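
The logarithmic test-time cost mentioned in the abstract above comes from routing a sample down a binary tree of two-class decisions. The sketch below shows only that routing. It makes two loud assumptions: the tree is built by halving an ordered class list (the paper instead partitions by separation and homogeneity criteria), and a nearest-centroid rule on 1-D toy data stands in for each trained binary SVM. All names are hypothetical.

```python
def build_tree(classes):
    """Recursively split a list of class labels into two halves.

    Assumption: plain halving, not the paper's partitioning criteria.
    Leaves are labels (strings); internal nodes are (left, right) tuples.
    """
    if len(classes) == 1:
        return classes[0]
    mid = len(classes) // 2
    return (build_tree(classes[:mid]), build_tree(classes[mid:]))


def leaves(node):
    """Return all class labels below a node."""
    if not isinstance(node, tuple):
        return [node]
    return leaves(node[0]) + leaves(node[1])


def classify(node, x, centroids):
    """Route x down the tree; return (label, number of binary decisions)."""
    decisions = 0
    while isinstance(node, tuple):
        left, right = node
        # Stand-in for a binary SVM: choose the side with the closer centroid.
        d_left = min(abs(x - centroids[c]) for c in leaves(left))
        d_right = min(abs(x - centroids[c]) for c in leaves(right))
        node = left if d_left <= d_right else right
        decisions += 1
    return node, decisions


# Eight toy classes with 1-D centroids 0.0 .. 7.0.
labels = ["c%d" % i for i in range(8)]
centroids = {label: float(i) for i, label in enumerate(labels)}
tree = build_tree(labels)
label, n_decisions = classify(tree, 5.2, centroids)
# label == "c5" after 3 binary decisions, versus 8 for a one-vs-all scan.
```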

Proceedings Article
01 Dec 2017
TL;DR: A Joint Enhancement-Decoding (JED) algorithm is proposed to handle erroneous class-label estimates in dictionary-based speech enhancement by jointly optimizing for the labels of all frames and the decoding path; it outputs the maximum likelihood path of state sequences as well as the best choice of the enhanced observation sequence.
Abstract: We consider dictionary-based speech enhancement in the context of automatic recognition of noisy speech. Speech in each analysis frame is denoised as front-end processing using a class-specific (e.g. phoneme) dictionary selected based on the estimated class label. However, when the estimated label is erroneous, a wrong class model is chosen for many frames. We propose a Joint Enhancement-Decoding (JED) algorithm to overcome this issue by jointly optimizing for the labels of all frames and the decoding path. The algorithm optimizes over multiple enhanced versions of each frame using different phoneme-specific dictionaries and gives the maximum likelihood path of state sequences as well as the best (in the maximum likelihood sense) choice of the enhanced observation sequence as its output. The number of phoneme-specific dictionaries used for enhancement in an analysis frame is varied from 1 to 5 based on the phoneme confusion matrix, and the recognition results are reported for each case. Experiments with the TIMIT corpus and five different noises at 0, 5 and 10 dB SNRs show that the recognition performance varies with the number of dictionaries and, in most cases, is best when two or three dictionaries are employed.

2 citations
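
The idea of choosing a state path and an enhanced version of each frame together can be illustrated with a Viterbi search that also maximizes over candidate versions per frame. This is a sketch under a strong assumption, namely that the likelihood factorizes per frame so the best version can be picked independently for each state and frame; it is not the JED algorithm from the paper. The function name and the toy log-scores are hypothetical.

```python
def joint_viterbi(loglik, log_trans, log_init):
    """Viterbi over states that also picks one enhanced version per frame.

    loglik[t][k][s]: log-likelihood of enhanced version k of frame t
                     under state s (toy scores, assumed given).
    log_trans[p][s]: log transition score from state p to state s.
    log_init[s]:     log initial score of state s.
    Returns (state path, chosen version per frame).
    """
    n_states = len(log_init)
    best = []  # best[t][s] = (score, previous state, chosen version)
    for t, frame in enumerate(loglik):
        row = []
        for s in range(n_states):
            # Best enhanced version of this frame if the state is s.
            k_best = max(range(len(frame)), key=lambda k: frame[k][s])
            emit = frame[k_best][s]
            if t == 0:
                row.append((log_init[s] + emit, None, k_best))
            else:
                prev = max(range(n_states),
                           key=lambda p: best[t - 1][p][0] + log_trans[p][s])
                score = best[t - 1][prev][0] + log_trans[prev][s] + emit
                row.append((score, prev, k_best))
        best.append(row)

    # Backtrack from the best final state.
    s = max(range(n_states), key=lambda q: best[-1][q][0])
    states, versions = [], []
    for t in range(len(loglik) - 1, -1, -1):
        states.append(s)
        versions.append(best[t][s][2])
        s = best[t][s][1]
    return states[::-1], versions[::-1]


# Three frames, two enhanced versions per frame, two states.
loglik = [
    [[-1.0, -3.0], [-2.0, -2.5]],
    [[-4.0, -3.0], [-1.0, -5.0]],
    [[-3.0, -0.5], [-2.0, -4.0]],
]
log_trans = [[-0.1, -2.0], [-2.0, -0.1]]
log_init = [-0.7, -0.7]
states, versions = joint_viterbi(loglik, log_trans, log_init)
# states == [0, 0, 0], versions == [0, 1, 1]
```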


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95