
TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
05 Jun 2000
TL;DR: A Bayesian method is proposed in which model combination and model decomposition are employed to estimate the parameters required to implement subband LP Wiener filters, which offer improved parameter estimates and better restore the temporal-spectral composition of speech.
Abstract: The performance of Wiener filters in restoring the quality and intelligibility of noisy speech depends on: (i) the accuracy of the estimates of the power spectra or the correlation values of the noise and the speech processes, and (ii) the Wiener filter structure. In this paper a Bayesian method is proposed where model combination and model decomposition are employed for the estimation of parameters required to implement subband LP Wiener filters. The use of subband LP Wiener filters provides advantages in terms of improved parameter estimates and also in restoring the temporal-spectral composition of speech. The method is evaluated, and compared with parallel model combination, using the TIMIT continuous speech database with BMW and VOLVO car noise databases.

9 citations
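For orientation, the sketch below shows the basic single-band Wiener gain H(f) = P_s(f) / (P_s(f) + P_n(f)) that subband LP Wiener filters build on. The paper's Bayesian model combination/decomposition for estimating these spectra, and its subband LP structure, are not reproduced here; the spectral-subtraction noise estimate and frame length are illustrative assumptions.

```python
# Minimal sketch of a single-band spectral Wiener filter, NOT the paper's
# Bayesian subband LP method: it only illustrates the Wiener gain
# H(f) = P_s(f) / (P_s(f) + P_n(f)) that such filters build on.
import numpy as np

def wiener_gain(noisy_psd, noise_psd, floor=1e-10):
    """Wiener gain from noisy-speech and noise power spectra.

    Assumes the clean-speech PSD is estimated by spectral subtraction,
    P_s = max(P_y - P_n, 0); the paper instead estimates these
    quantities with Bayesian model combination/decomposition.
    """
    speech_psd = np.maximum(noisy_psd - noise_psd, 0.0)
    return speech_psd / np.maximum(speech_psd + noise_psd, floor)

# Toy usage: one 512-sample frame with a crude flat noise estimate.
rng = np.random.default_rng(0)
noisy_frame = rng.normal(size=512)
noisy_spectrum = np.fft.rfft(noisy_frame)
noisy_psd = np.abs(noisy_spectrum) ** 2
noise_psd = np.full_like(noisy_psd, noisy_psd.mean())  # assumed noise floor
enhanced_spectrum = wiener_gain(noisy_psd, noise_psd) * noisy_spectrum
```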

Proceedings ArticleDOI
14 Mar 2010
TL;DR: This work presents some initial studies toward improving ASR performance by adopting hidden activation functions that can be automatically learned from the data and change shape during training, through the use of orthonormal Hermite polynomials.
Abstract: The choice of hidden non-linearity in a feed-forward multi-layer perceptron (MLP) architecture is crucial to obtain good generalization capability and better performance. Nonetheless, little attention has been paid to this aspect in the ASR field. In this work, we present some initial, yet promising, studies toward improving ASR performance by adopting hidden activation functions that can be automatically learned from the data and change shape during training. This adaptive capability is achieved through the use of orthonormal Hermite polynomials. The “adaptive” MLP is used in two neural architectures that generate phone posterior estimates, namely, a standalone configuration and a hierarchical structure. The posteriors are input to a hybrid phone recognition system with good results on the TIMIT corpus. A scheme for optimizing the contributions of high-accuracy neural architectures is also investigated, resulting in a relative improvement of ∼9.0% over a non-optimized combination. Finally, initial experiments on the WSJ Nov92 task show that the proposed technique scales well up to large vocabulary continuous speech recognition (LVCSR) tasks.

9 citations
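A minimal sketch of the adaptive activation idea, assuming a learnable linear combination of orthonormal (physicists') Hermite polynomials; the polynomial order, initialization, input size (39 MFCC-style features) and output size (48 TIMIT phone classes) are assumptions, not the paper's exact configuration.

```python
# Hedged sketch of an adaptive hidden activation built from orthonormal
# Hermite polynomials with learnable coefficients, in the spirit of the
# paper; order, normalization, and init are assumptions.
import math
import torch
import torch.nn as nn

class HermiteActivation(nn.Module):
    """f(x) = sum_n c_n * h_n(x) with learnable c_n, where h_n are
    orthonormalized physicists' Hermite polynomials."""

    def __init__(self, num_terms=5):
        super().__init__()
        self.num_terms = num_terms
        # Start roughly linear: only the degree-1 coefficient is non-zero.
        init = torch.zeros(num_terms)
        init[1] = 1.0
        self.coeffs = nn.Parameter(init)

    def forward(self, x):
        # Recursion: H_0 = 1, H_1 = 2x, H_{n+1} = 2x H_n - 2n H_{n-1};
        # orthonormalize each H_n by sqrt(2^n * n! * sqrt(pi)).
        h_prev, h_curr = torch.ones_like(x), 2.0 * x
        out = self.coeffs[0] * h_prev / math.sqrt(math.sqrt(math.pi))
        if self.num_terms > 1:
            out = out + self.coeffs[1] * h_curr / math.sqrt(2.0 * math.sqrt(math.pi))
        for n in range(1, self.num_terms - 1):
            h_prev, h_curr = h_curr, 2.0 * x * h_curr - 2.0 * n * h_prev
            norm = math.sqrt(2.0 ** (n + 1) * math.factorial(n + 1) * math.sqrt(math.pi))
            out = out + self.coeffs[n + 1] * h_curr / norm
        return out

# The activation drops into an MLP phone-posterior estimator as usual
# (39 inputs / 48 phone classes are assumed, typical TIMIT figures):
mlp = nn.Sequential(nn.Linear(39, 512), HermiteActivation(), nn.Linear(512, 48))
```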

Book ChapterDOI
22 May 2007
TL;DR: The manifold learning techniques locally linear embedding and Isomap are considered, and it is shown that the resulting features can yield higher phone classification accuracy than baseline MFCC and PCA-transformed features.
Abstract: This study aims to investigate approaches for low dimensional speech feature transformation using manifold learning. It has recently been shown that speech sounds may exist on a low dimensional manifold nonlinearly embedded in high dimensional space. A number of manifold learning techniques have been developed in recent years that attempt to discover this type of underlying geometric structure. The manifold learning techniques locally linear embedding and Isomap are considered in this study. The low dimensional representations produced by applying these techniques to MFCC feature vectors are evaluated in several phone classification tasks on the TIMIT corpus. Classification accuracy is analysed and compared to conventional MFCC features and those transformed with PCA, a linear dimensionality reduction method. It is shown that features resulting from manifold learning are capable of yielding higher classification accuracy than these baseline features. The best phone classification accuracy in general is demonstrated by feature transformation with Isomap.

9 citations
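The pipeline lends itself to a short scikit-learn sketch: reduce MFCC vectors with Isomap, locally linear embedding, and a PCA baseline, then score a simple classifier on each. The dimensions, neighbour counts, kNN classifier, and random stand-in data are assumptions, not the paper's setup.

```python
# Hedged sketch of the study's pipeline with scikit-learn: compare PCA,
# Isomap, and LLE transforms of MFCC-style vectors on a toy phone task.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in for (frames x 39) MFCC vectors with phone labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 39))
y = rng.integers(0, 48, size=2000)  # e.g. 48 TIMIT phone classes

reducers = {
    "PCA": PCA(n_components=10),
    "Isomap": Isomap(n_neighbors=12, n_components=10),
    "LLE": LocallyLinearEmbedding(n_neighbors=12, n_components=10),
}
for name, reducer in reducers.items():
    Z = reducer.fit_transform(X)
    Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y, random_state=0)
    acc = KNeighborsClassifier().fit(Z_tr, y_tr).score(Z_te, y_te)
    print(f"{name}: phone classification accuracy = {acc:.3f}")
```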

Journal ArticleDOI
TL;DR: With a view to using an articulatory representation in automatic recognition of conversational speech, two nonlinear methods for mapping from formants to short-term spectra were investigated: multilayered perceptrons (MLPs), and radial basis function (RBF) networks.
Abstract: With a view to using an articulatory representation in automatic recognition of conversational speech, two nonlinear methods for mapping from formants to short-term spectra were investigated: multilayered perceptrons (MLPs) and radial basis function (RBF) networks. Five schemes for dividing the TIMIT data according to their phone class were tested. The r.m.s. error of the RBF networks was 10% less than that of the MLP, and the scheme based on discrete articulatory regions gave the greatest improvements over a single network.

9 citations
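A hedged sketch of the RBF side of this comparison: Gaussian basis functions with k-means centres and a least-squares linear readout. The dimensions (3 formants in, 64 spectral bins out), the shared-width heuristic, and the toy data are assumptions; the paper's training details may differ.

```python
# Hedged sketch of an RBF network for a formant-to-spectrum mapping:
# Gaussian bases with k-means centres and a least-squares linear readout.
import numpy as np
from sklearn.cluster import KMeans

def fit_rbf(X, Y, num_centers=50):
    centers = KMeans(n_clusters=num_centers, n_init=10,
                     random_state=0).fit(X).cluster_centers_
    # Shared width: mean distance between centres (a common heuristic).
    width = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1).mean()
    Phi = np.exp(-np.linalg.norm(X[:, None] - centers[None, :], axis=-1) ** 2
                 / (2 * width ** 2))
    W, *_ = np.linalg.lstsq(Phi, Y, rcond=None)  # linear readout weights
    return centers, width, W

def rbf_predict(X, centers, width, W):
    Phi = np.exp(-np.linalg.norm(X[:, None] - centers[None, :], axis=-1) ** 2
                 / (2 * width ** 2))
    return Phi @ W

# Toy usage: 3 formant frequencies in, 64-bin spectrum out (assumed sizes).
rng = np.random.default_rng(0)
formants = rng.uniform(200, 3500, size=(500, 3))
spectra = rng.normal(size=(500, 64))
params = fit_rbf(formants, spectra)
rms_error = np.sqrt(np.mean((rbf_predict(formants, *params) - spectra) ** 2))
```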

Posted Content
TL;DR: This work proposes two novel techniques for improving kernel acoustic models: a simple but effective feature selection method, and frame-level metrics monitored during training to decide when to stop learning, which noticeably improve the recognition performance of both DNN and kernel models while narrowing the gap between them.
Abstract: We study large-scale kernel methods for acoustic modeling in speech recognition and compare their performance to deep neural networks (DNNs). We perform experiments on four speech recognition datasets, including the TIMIT and Broadcast News benchmark tasks, and compare these two types of models on frame-level performance metrics (accuracy, cross-entropy), as well as on recognition metrics (word/character error rate). In order to scale kernel methods to these large datasets, we use the random Fourier feature method of Rahimi and Recht (2007). We propose two novel techniques for improving the performance of kernel acoustic models. First, in order to reduce the number of random features required by kernel models, we propose a simple but effective method for feature selection. The method is able to explore a large number of non-linear features while maintaining a compact model more efficiently than existing approaches. Second, we present a number of frame-level metrics which correlate very strongly with recognition performance when computed on the heldout set; we take advantage of these correlations by monitoring these metrics during training in order to decide when to stop learning. This technique can noticeably improve the recognition performance of both DNN and kernel models, while narrowing the gap between them. Additionally, we show that the linear bottleneck method of Sainath et al. (2013) improves the performance of our kernel models significantly, in addition to speeding up training and making the models more compact. Together, these three methods dramatically improve the performance of kernel acoustic models, making their performance comparable to DNNs on the tasks we explored.

9 citations
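The random Fourier feature method of Rahimi and Recht (2007) that the paper uses for scaling is compact enough to sketch: a Gaussian kernel k(x, z) = exp(-||x - z||^2 / (2 s^2)) is approximated by the inner product of features z(x) = sqrt(2/D) cos(Wx + b), with W drawn from N(0, s^-2 I). The paper's feature selection and early-stopping techniques are not reproduced; the dimensionality and bandwidth below are assumptions.

```python
# Hedged sketch of random Fourier features (Rahimi and Recht, 2007), the
# scaling device behind the paper's kernel acoustic models; a plain linear
# classifier on these features then stands in for the kernel model.
import numpy as np

def random_fourier_features(X, num_features=1024, bandwidth=1.0, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / bandwidth, size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# Sanity check: the feature dot product approximates the Gaussian kernel.
rng = np.random.default_rng(1)
x = rng.normal(size=40)                 # e.g. a 40-dim acoustic frame
z = x + 0.1 * rng.normal(size=40)       # a nearby frame, so k(x, z) is non-trivial
# Same seed => the same random projection W, b is used for both inputs.
zx = random_fourier_features(x[None], num_features=10000)
zz = random_fourier_features(z[None], num_features=10000)
approx = (zx @ zz.T).item()
exact = float(np.exp(-np.sum((x - z) ** 2) / 2.0))
print(f"RFF approximation: {approx:.3f} vs exact kernel: {exact:.3f}")
```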


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (76% related)
Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
Feature vector: 48.8K papers, 954.4K citations (74% related)
Natural language: 31.1K papers, 806.8K citations (73% related)
Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95