
TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
22 May 2011
TL;DR: This paper investigates the use of arccosine kernels for speech recognition, using these kernels in a hybrid support vector machine/hidden Markov model recognition system.
Abstract: Neural networks are a useful alternative to Gaussian mixture models for acoustic modeling; however, training multilayer networks involves a difficult, nonconvex optimization that requires some “art” to make work well in practice. In this paper we investigate the use of arccosine kernels for speech recognition, using these kernels in a hybrid support vector machine/hidden Markov model recognition system. Arccosine kernels approximate the computation in a certain class of infinite neural networks using a single kernel function, but can be used in learners that require only a convex optimization for training. Phone recognition experiments on the TIMIT corpus show that arccosine kernels can outperform radial basis function kernels.
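The arccosine kernel referred to above has a simple closed form (Cho & Saul, 2009): k_n(x, y) = (1/π)·||x||ⁿ||y||ⁿ·J_n(θ), where θ is the angle between x and y. A minimal NumPy sketch of orders 0 and 1 (the function name is illustrative; this is not the paper's recognition system):

```python
import numpy as np

def arccosine_kernel(x, y, order=1):
    """Arccosine kernel k_n(x, y) = (1/pi) ||x||^n ||y||^n J_n(theta),
    which mimics computation in an infinite one-hidden-layer network."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_t = np.clip(np.dot(x, y) / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if order == 0:   # step-function hidden units: J_0 = pi - theta
        return (np.pi - theta) / np.pi
    if order == 1:   # ReLU-like hidden units: J_1 = sin + (pi - theta)*cos
        j = np.sin(theta) + (np.pi - theta) * np.cos(theta)
        return nx * ny * j / np.pi
    raise ValueError("only orders 0 and 1 are sketched here")

x = np.array([1.0, 2.0])
print(arccosine_kernel(x, x))  # ~5.0 = ||x||^2, since theta = 0 gives J_1 = pi
```

Unlike an actual neural network, this kernel can be plugged directly into an SVM, so training stays a convex problem.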

17 citations

Proceedings ArticleDOI
01 Mar 2017
TL;DR: This work studies the influence of various activation functions on speech recognition systems, and observes that ReLU networks are superior to the other networks on the smaller dataset (i.e., the TIMIT dataset).

Abstract: Significant developments in deep learning methods have been achieved with the capability to train deeper networks. The performance of speech recognition systems has been greatly improved by the use of deep learning techniques. Most of the developments in deep learning are associated with new activation functions and the corresponding initializations. The development of rectified linear units (ReLU) has revolutionized the use of supervised deep learning methods for speech recognition. Recently there has been a great deal of research interest in activation functions such as Leaky-ReLU (LReLU), Parametric-ReLU (PReLU), Exponential Linear Units (ELU) and Parametric-ELU (PELU). This work studies the influence of these activation functions on speech recognition systems. A hidden Markov model-deep neural network (HMM-DNN) based speech recognition system is used, in which deep neural networks with different activation functions are employed to obtain the emission probabilities of the hidden Markov model. Two datasets, TIMIT and WSJ, are employed to study the behavior of the recognition systems at different dataset sizes. It is observed that ReLU networks are superior to the other networks for the smaller dataset (TIMIT), while for a dataset of sufficiently larger size (WSJ), ELU networks are superior to the others.
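For reference, the fixed-parameter activation functions compared in the abstract can be sketched in a few lines of NumPy (PReLU and PELU are the same shapes with the alpha parameters learned during training; these are standalone illustrations, not the paper's training code):

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    """LReLU: small fixed slope alpha for x < 0.
    PReLU is identical except alpha is learned per channel."""
    return np.where(x > 0, x, alpha * x)

def elu(x, alpha=1.0):
    """ELU: smooth saturation alpha*(exp(x) - 1) for x < 0.
    PELU makes the shape parameters learnable."""
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))
```

The key contrast is in the negative region: ReLU zeroes it out, LReLU/PReLU keep a small linear slope, and ELU/PELU saturate smoothly toward -alpha, which affects gradient flow during training.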

17 citations

Book ChapterDOI
TL;DR: A heuristic weighted distance is introduced to linearly lift higher-order MFCC feature vector components, and two new algorithms are proposed that combine it, along with the partition normalized distance measure, with group vector quantization to take full advantage of both directions.

Abstract: Weighted distance measures and discriminative training are two different directions for enhancing VQ-based solutions for speaker identification. In the first direction, the partition normalized distance measure successfully used normalized feature components to account for the varying importance of the LPC coefficients. In the second direction, group vector quantization sped up discriminative training by randomly selecting a group of vectors as the training unit in each learning step. This paper introduces an alternative, called the heuristic weighted distance, which linearly lifts higher-order MFCC feature vector components. Two new algorithms are then proposed that combine the heuristic weighted distance and the partition normalized distance measure with group vector quantization to take full advantage of both directions. Testing on the TIMIT and NTIMIT corpora showed that the proposed methods are superior to current VQ-based solutions, and are in a comparable range to the Gaussian mixture model using wavelet or MFCC features.
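The core idea of a weighted distance inside a VQ speaker-identification system can be sketched as follows. This is a minimal illustration assuming linearly increasing weights over the MFCC components; the slope value and helper names are illustrative, not taken from the paper:

```python
import numpy as np

def heuristic_weights(dim, slope=0.1):
    """Linearly increasing weights that lift higher-order feature
    components (slope chosen for illustration only)."""
    return 1.0 + slope * np.arange(dim)

def weighted_distance(x, y, w):
    """Weighted squared Euclidean distance between feature vectors."""
    d = x - y
    return float(np.sum(w * d * d))

def quantize(x, codebook, w):
    """Index of the nearest codeword under the weighted distance."""
    return int(np.argmin([weighted_distance(x, c, w) for c in codebook]))
```

In a VQ speaker-ID system, each speaker has a codebook; a test utterance is assigned to the speaker whose codebook gives the smallest total quantization distortion over all frames.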

17 citations

01 Jan 2004
TL;DR: A new multi-scale voice morphing algorithm enables a user to transform one person's speech pattern into another person's pattern with distinct characteristics, giving it a new identity while preserving the original content.
Abstract: This paper presents a new multi-scale voice morphing algorithm. This algorithm enables a user to transform one person's speech pattern into another person's pattern with distinct characteristics, giving it a new identity, while preserving the original content. The voice morphing algorithm performs the morphing at different subbands by using the theory of wavelets and models the spectral conversion using the theory of Radial Basis Function Neural Networks. The results obtained on the TIMIT speech database demonstrate effective transformation of the speaker identity.
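The spectral conversion step can be illustrated with a radial basis function network in its exact-interpolation form: one Gaussian center per training vector, with output weights obtained by solving a linear system. This is a generic RBF mapping sketch (gamma and function names are assumptions), not the paper's subband implementation:

```python
import numpy as np

def rbf_fit(X, Y, gamma=1.0):
    """Fit an exact-interpolation Gaussian RBF network mapping source
    spectral vectors X (n, d) to target vectors Y (n, k): solve Phi @ W = Y."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.linalg.solve(np.exp(-gamma * d2), Y)

def rbf_predict(Xq, centers, W, gamma=1.0):
    """Map query vectors through the fitted RBF network."""
    d2 = np.sum((Xq[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * d2) @ W
```

In the multi-scale setting described above, one such mapping would be trained per wavelet subband, and the converted subbands recombined to synthesize the morphed speech.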

17 citations

Proceedings ArticleDOI
09 May 1995
TL;DR: The integrated model developed here generalizes the conventional, widely used delta-parameter technique, which has been confined strictly to the preprocessing domain, by adding state-dependent weighting functions that transform static speech features into dynamic ones in a slowly time-varying manner.

Abstract: In this study we implemented a speech recognizer based on the integrated view, proposed first by Deng (see IEEE Signal Processing Letters, vol.1, no.4, p.66-69, 1994), of the speech preprocessing and speech modeling problems in recognizer design. The integrated model we developed generalizes the conventional, currently widely used delta-parameter technique, which has been confined strictly to the preprocessing domain, in two significant ways. First, the new model contains state-dependent weighting functions responsible for transforming static speech features into dynamic ones in a slowly time-varying manner. Second, novel maximum-likelihood and minimum-classification-error based learning algorithms are developed for the model that allow joint optimization of the state-dependent weighting functions and the remaining conventional HMM parameters. The experimental results obtained from a standard TIMIT phonetic classification task provide preliminary evidence for the effectiveness of our new, general approach to using the dynamic characteristics of speech spectra.
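The conventional delta-parameter technique that the model above generalizes computes dynamic features by linear regression over a window of static features: Δc_t = Σₙ n·(c_{t+n} − c_{t−n}) / (2·Σₙ n²). A minimal NumPy sketch of that standard formula (edge padding at the utterance boundaries is one common convention):

```python
import numpy as np

def delta(features, N=2):
    """Regression-based delta (dynamic) coefficients over a window of
    +/- N frames; features is a (T, D) array of static features."""
    features = np.asarray(features, dtype=float)
    padded = np.pad(features, ((N, N), (0, 0)), mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, N + 1))
    out = np.zeros_like(features)
    for t in range(len(features)):
        for n in range(1, N + 1):
            out[t] += n * (padded[t + N + n] - padded[t + N - n])
    return out / denom
```

On a linear ramp of static features the interior delta values recover the slope, and on constant features they are zero, which is a quick sanity check on the regression form.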

17 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95