
TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings Article
01 Jan 2001
TL;DR: The algorithm presented here is applied to plosive detection but can easily be adapted to any class of phonemes; it uses loss-based multi-class decisions.
Abstract: This paper presents a novel algorithm for precise spotting of plosives. The algorithm is based on a pattern-matching technique implemented with margin classifiers, such as support vector machines (SVMs). A special hierarchical treatment to overcome the problem of fricative and false silence detection is presented; it uses loss-based multi-class decisions. Furthermore, a method for smoothing the overall decisions by sequential linear programming is described. The proposed algorithm was tested on the TIMIT corpus and achieved very high spotting accuracy. The algorithm presented here is applied to plosive detection but can easily be adapted to any class of phonemes.

22 citations
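
As a rough illustration of the margin-classifier spotting idea in the paper above, the sketch below trains an SVM on frame-level features and flags frames whose decision margin clears a threshold. The features, data, and threshold are synthetic placeholders, not the paper's setup; the hierarchical fricative/silence treatment and the sequential-linear-programming smoothing are omitted.

```python
# Minimal sketch of margin-classifier phoneme spotting (illustrative only).
# Assumes frame-level feature vectors (e.g. MFCCs); the data here is synthetic.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-ins: 200 "plosive" frames and 200 "non-plosive" frames, 13-dim.
X = np.vstack([rng.normal(1.0, 1.0, (200, 13)),
               rng.normal(-1.0, 1.0, (200, 13))])
y = np.array([1] * 200 + [0] * 200)

svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

# Spotting: slide over an utterance's frames and mark frames whose SVM margin
# exceeds a threshold as plosive candidates.
utterance = rng.normal(0.0, 1.5, (50, 13))   # 50 frames of a test utterance
margins = svm.decision_function(utterance)   # signed distance to the hyperplane
candidates = np.where(margins > 0.5)[0]      # threshold would be tuned on dev data
print("candidate plosive frames:", candidates)
```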

Proceedings ArticleDOI
14 Mar 2010
TL;DR: Two nonlinear feature dimensionality reduction methods based on neural networks are presented for an HMM-based phone recognition system; recognition accuracies with the transformed features are slightly higher than those obtained with the original features and considerably higher than those obtained with linear dimensionality reduction methods.
Abstract: This paper presents two nonlinear feature dimensionality reduction methods based on neural networks for an HMM-based phone recognition system. The neural networks are trained as feature classifiers to reduce feature dimensionality as well as to maximize discrimination among speech features. The outputs of different network layers are used to obtain transformed features. Moreover, the training of the neural networks uses category information that corresponds to a state in the HMMs, so that the trained networks can better accommodate the temporal variability of features and obtain more discriminative features in a low-dimensional space. Experimental evaluation using the TIMIT database shows that recognition accuracies with the transformed features are slightly higher than those obtained with the original features and considerably higher than those obtained with linear dimensionality reduction methods. The highest phone accuracy obtained with 39 phone classes on TIMIT was 74.9%, using a large number of training iterations based on the state-specific targets.

22 citations
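
A minimal sketch of the layer-output feature transform the paper above describes, using scikit-learn's MLPClassifier as a stand-in for the paper's network: a classifier is trained against state-level targets (randomly generated here), and the activations of its narrow second hidden layer are read out as low-dimensional features. The dimensions, targets, and data are all assumptions for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Synthetic stand-ins for 39-dim acoustic frames labelled with HMM-state targets.
X = rng.normal(size=(1000, 39))
y = rng.integers(0, 10, size=1000)   # 10 hypothetical state classes

# The narrow second hidden layer acts as the low-dimensional feature transform.
net = MLPClassifier(hidden_layer_sizes=(64, 8), activation="tanh",
                    max_iter=300, random_state=0).fit(X, y)

def transformed_features(net, X):
    """Forward pass up to the second hidden layer; its activations are the features."""
    h = X
    for W, b in zip(net.coefs_[:2], net.intercepts_[:2]):  # stop before output layer
        h = np.tanh(h @ W + b)
    return h

feats = transformed_features(net, X)   # 8-dim discriminative features for the HMM
print(feats.shape)                     # (1000, 8)
```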

Book ChapterDOI
19 Apr 2005
TL;DR: The output from the second hidden layer (compression layer) of an MLP with three hidden layers, trained to identify a subset of 100 speakers selected at random from a set of 300 training speakers in TIMIT, can provide a 77% relative error reduction for common Gaussian mixture model (GMM) based speaker identification.
Abstract: Feature projection by non-linear discriminant analysis (NLDA) can substantially increase classification performance. In automatic speech recognition (ASR), the projection provided by the pre-squashed outputs from a one-hidden-layer multi-layer perceptron (MLP) trained to recognise speech sub-units (phonemes) has previously been shown to significantly increase ASR performance. An analogous approach cannot be applied directly to speaker recognition because there is no recognised set of "speaker sub-units" to provide a finite set of MLP target classes, and for many applications it is not practical to train an MLP with one output for each target speaker. In this paper we show that the output from the second hidden layer (compression layer) of an MLP with three hidden layers, trained to identify a subset of 100 speakers selected at random from a set of 300 training speakers in TIMIT, can provide a 77% relative error reduction for common Gaussian mixture model (GMM) based speaker identification.

22 citations
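
To show where such compression-layer features would be consumed, here is a minimal GMM speaker-identification sketch of the kind the paper above evaluates against: one GaussianMixture per enrolled speaker, scored by average log-likelihood on test frames. The 8-dimensional "bottleneck" features are simulated with Gaussians; nothing here reproduces the paper's MLP.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Pretend these are compression-layer features, one array per enrolled speaker.
speakers = {s: rng.normal(loc=s, scale=1.0, size=(300, 8)) for s in range(3)}

# One GMM per enrolled speaker, trained on that speaker's features.
models = {s: GaussianMixture(n_components=4, random_state=0).fit(f)
          for s, f in speakers.items()}

# Identification: pick the speaker whose GMM gives the test frames the
# highest average log-likelihood.
test = rng.normal(loc=1, scale=1.0, size=(50, 8))   # frames from speaker 1
scores = {s: m.score(test) for s, m in models.items()}
print("identified speaker:", max(scores, key=scores.get))
```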

Journal ArticleDOI
TL;DR: Single-talk and double-talk scenarios using speech signals from the TIMIT database show that the proposed algorithm achieves better performance, with more than 3 dB of additional attenuation in the misalignment evaluation compared to the GSVSS-NLMS, non-parametric VSS-NLMS, and standard NLMS algorithms for a non-stationary input in noisy environments.

22 citations
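
The TL;DR above gives only the evaluation, not the update rule, so the sketch below implements the standard NLMS baseline the paper compares against, together with the misalignment metric it reports; the paper's variable-step-size rule would replace the fixed mu. The filter order, signal lengths, and toy echo path are assumptions.

```python
import numpy as np

def nlms(x, d, L=32, mu=0.5, eps=1e-6):
    """Standard NLMS: estimate the echo path from far-end signal x and
    microphone signal d. A variable-step-size variant would adapt mu."""
    w = np.zeros(L)            # filter weights (echo-path estimate)
    e = np.zeros(len(x))       # error signal (echo-cancelled output)
    for n in range(L - 1, len(x)):
        u = x[n - L + 1:n + 1][::-1]          # [x[n], x[n-1], ..., x[n-L+1]]
        e[n] = d[n] - w @ u                   # mic signal minus echo estimate
        w += mu * e[n] * u / (u @ u + eps)    # normalised weight update
    return w, e

rng = np.random.default_rng(0)
x = rng.normal(size=5000)                               # far-end signal
h = rng.normal(size=32) * np.exp(-np.arange(32) / 8.0)  # toy echo path
d = np.convolve(x, h)[:len(x)] + 0.01 * rng.normal(size=len(x))  # microphone
w, e = nlms(x, d)
print("misalignment (dB):",
      20 * np.log10(np.linalg.norm(h - w) / np.linalg.norm(h)))
```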

Proceedings ArticleDOI
15 Mar 2010
TL;DR: The focus of this paper is to develop a knowledge-based robust syllable segmentation algorithm and to establish the importance of accurate segmentation in both the training and testing phases of a speech recognition system.
Abstract: The focus of this paper is two-fold: (a) to develop a knowledge-based robust syllable segmentation algorithm and (b) to establish the importance of accurate segmentation in both the training and testing phases of a speech recognition system. A robust algorithm for segmenting the speech signal into syllables is first developed. This uses a non-statistical technique based on group delay (GD) segmentation and vowel onset point (VOP) detection. The transcription corresponding to the utterance is syllabified using rules, which produces an annotation for the training data. The annotated training data is then used to train a syllable-based speech recognition system. The test signal is also segmented using the proposed algorithm, and this segmentation information is incorporated into the linguistic search space to reduce both computational complexity and word error rate (WER). WERs of 4.4% and 21.2% are reported on the TIMIT and NTIMIT databases, respectively.

22 citations
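
As a simplified stand-in for the segmentation step in the paper above, the sketch below picks low-energy stretches of a smoothed short-term energy contour as candidate syllable boundaries. The paper's actual algorithm processes this kind of contour with a group-delay function and combines it with vowel onset point detection, which is considerably more robust; the frame sizes and threshold here are assumptions.

```python
import numpy as np

def candidate_boundaries(signal, sr, frame_ms=20, hop_ms=10, smooth=15):
    # Short-term energy contour, then a moving-average smoothing.
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    energy = np.array([np.sum(signal[i:i + frame] ** 2)
                       for i in range(0, len(signal) - frame, hop)])
    energy = np.convolve(energy, np.ones(smooth) / smooth, mode="same")
    # Mark low-energy runs; the centre of each run is a candidate boundary.
    low = energy < 0.4 * np.median(energy)   # assumed threshold
    bounds, i = [], 0
    while i < len(low):
        if low[i]:
            j = i
            while j < len(low) and low[j]:
                j += 1
            bounds.append(((i + j) // 2) * hop / sr)   # boundary time in seconds
            i = j
        else:
            i += 1
    return bounds

# Toy run: two synthetic "syllables" separated by a 100 ms energy dip.
sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 200 * t)
sig[int(0.45 * sr):int(0.55 * sr)] = 0.0
print(candidate_boundaries(sig, sr))   # roughly [0.5]
```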


Network Information
Related Topics (5)
Recurrent neural network
29.2K papers, 890K citations
76% related
Feature (machine learning)
33.9K papers, 798.7K citations
75% related
Feature vector
48.8K papers, 954.4K citations
74% related
Natural language
31.1K papers, 806.8K citations
73% related
Deep learning
79.8K papers, 2.1M citations
72% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95