TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have appeared within this topic, receiving 59,888 citations. The topic is also known as the TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings Article
06 Sep 2015
TL;DR: It is demonstrated that the bottleneck features preserve well the trajectory continuity over time and can provide a suitable representation for the continuous-state hidden Markov model (CS-HMM), which considers speech as a sequence of dwell and transition regions.
Abstract: This paper presents an analysis of a low-dimensional representation of speech for modelling speech dynamics, extracted using bottleneck neural networks. The input to the neural network is a set of spectral feature vectors. We explore the effect of various designs and training of the network, such as varying the size of context in the input layer, size of the bottleneck and other hidden layers, and using input reconstruction or phone posteriors as targets. Experiments are performed on TIMIT. The bottleneck features are employed in a conventional HMM-based phoneme recognition system, with recognition accuracy of 70.6% on the core test achieved using only 9-dimensional features. We also analyse how the bottleneck features fit the assumptions of dynamic models of speech. Specifically, we employ the continuous-state hidden Markov model (CS-HMM), which considers speech as a sequence of dwell and transition regions. We demonstrate that the bottleneck features preserve well the trajectory continuity over time and can provide a suitable representation for CS-HMM.

16 citations
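
The abstract above mentions varying the size of context in the network's input layer. A minimal sketch of what such context stacking (frame splicing) typically looks like follows; the function name `splice_frames`, the edge-repeat padding, and the toy frame values are assumptions for illustration, not details taken from the paper.

```python
def splice_frames(frames, context):
    """Stack each frame with `context` neighbours on either side.

    Assumption: frames at the sequence edges are repeated as padding.
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for offset in range(-context, context + 1):
            # Clamp the index so the first/last frame is repeated at the edges.
            idx = min(max(t + offset, 0), n - 1)
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


# Three toy 2-dimensional "spectral" frames (made-up values).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
spliced = splice_frames(frames, context=1)
```

With `context=1`, each input vector has (2 * 1 + 1) * 2 = 6 dimensions, so widening the context grows the input layer linearly.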

Journal Article
TL;DR: An image analysis-based algorithm is proposed to enhance the binary T–F mask obtained in the initial segmentation stage of CASA-based monaural speech separation systems to improve the speech quality and reduce the noise residue.
Abstract: Monaural speech separation is the process of separating the target speech from a noisy speech mixture recorded with a single microphone. It is a challenging problem in speech signal processing, and computational auditory scene analysis (CASA) has recently offered a reasonable solution. This work proposes an image analysis-based algorithm to enhance the binary T–F mask obtained in the initial segmentation stage of CASA-based monaural speech separation systems and thereby improve speech quality. The proposed algorithm consists of labeling the initial segmentation mask, boundary extraction, active pixel detection and, finally, elimination of the noisy non-active pixels. In labeling, the T–F mask obtained from the initial segmentation is labeled as a periodicity pixel matrix and a non-periodicity pixel matrix. Next, speech boundaries are created by connecting all possible nearby periodicity and non-periodicity pixel matrices. Some speech boundaries may enclose noisy T–F units as holes; in the active pixel detection step, the proposed algorithm classifies these holes as speech-dominant or noise-dominant T–F units. Finally, the noisy T–F units are eliminated. The performance of the proposed algorithm is evaluated on the TIMIT speech database. The experimental results show that, compared with the noisy speech mixture, the proposed algorithm improves the quality of the separated speech by increasing the signal-to-noise ratio by an average of 9.64 dB and reduces the noise residue by 25.55%.

16 citations
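
The abstract above describes holes enclosed by speech boundaries in a binary T–F mask. As a rough illustration only, the sketch below finds such holes with a generic border flood fill; this is a standard image-processing technique swapped in for illustration, not the paper's algorithm, and `find_holes` and the toy mask are hypothetical.

```python
from collections import deque


def find_holes(mask):
    """Return (row, col) of 0-cells that are not 4-connected to the border."""
    rows, cols = len(mask), len(mask[0])
    reachable = set()
    queue = deque()
    # Seed the search with every 0-cell lying on the mask border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and mask[r][c] == 0:
                reachable.add((r, c))
                queue.append((r, c))
    # Breadth-first flood fill through neighbouring 0-cells.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and mask[nr][nc] == 0 and (nr, nc) not in reachable:
                reachable.add((nr, nc))
                queue.append((nr, nc))
    # Any 0-cell the fill never reached is enclosed by 1-cells.
    return sorted(
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if mask[r][c] == 0 and (r, c) not in reachable
    )


# Toy mask: 1 = speech-labeled unit, 0 = noise-labeled unit (made-up data).
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
holes = find_holes(mask)
```

The sketch only locates candidate holes; deciding whether each one is speech-dominant or noise-dominant is the step the abstract attributes to the proposed algorithm.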

Proceedings Article
26 Jun 1995
TL;DR: The algorithm is based on using a speech recognition system to discover the surface pronunciations of words in speech corpora and shows the probabilities the system has learned for ten common phonological rules which model reductions and coarticulation effects.
Abstract: This paper presents an algorithm for learning the probabilities of optional phonological rules from corpora. The algorithm is based on using a speech recognition system to discover the surface pronunciations of words in speech corpora; using an automatic system obviates expensive phonetic labeling by hand. We describe the details of our algorithm and show the probabilities the system has learned for ten common phonological rules which model reductions and coarticulation effects. These probabilities were derived from a corpus of 7203 sentences of read speech from the Wall Street Journal, and are shown to be a reasonably close match to probabilities from phonetically hand-transcribed data (TIMIT). Finally, we analyze the probability differences between rule use in male versus female speech, and suggest that the differences are caused by differing average rates of speech.

16 citations

Proceedings Article
01 Jan 2002
TL;DR: This paper attempts to overcome the above difficulty by using the alternative Lagrangian formulation which only requires the inversion of a matrix whose dimension is proportional to the size of the MFCC sequence of vectors.
Abstract: We study the performance of binary and multi-category SVMs for phoneme classification. The training process of the standard formulation involves the solution of a quadratic programming problem whose complexity depends on the size of the training set. The large size of speech corpora such as TIMIT seriously limits their practical use in continuous speech recognition tasks on off-the-shelf personal computers in a reasonable time. In this paper, we attempt to overcome this difficulty by using the alternative Lagrangian formulation, which only requires the inversion of a matrix whose dimension is proportional to the size of the MFCC sequence of vectors. We provide computational results for all possible binary classifiers (1830) on the TIMIT database, which are shown to be competitive in terms of recognition rates (96.8%) with those found in the literature (95.6%). The binary classifiers are introduced in the DAGSVM and voting algorithms to perform multi-category classification on some hand-picked subsets of the TIMIT corpus.

16 citations
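
The figure of 1830 binary classifiers in the abstract above is consistent with one classifier per unordered pair of the 61 phone labels in TIMIT. The sketch below checks that count and shows a minimal one-vs-one voting scheme; the toy `prototypes`, the `nearest` decider, and the `vote` helper are hypothetical stand-ins for trained SVMs, not the paper's implementation.

```python
from collections import Counter
from itertools import combinations
from math import comb

NUM_PHONES = 61  # size of the TIMIT phone label set

# One binary classifier per unordered pair of classes.
num_pairs = comb(NUM_PHONES, 2)


def vote(sample, classes, pairwise_decide):
    """One-vs-one voting: every class pair casts one vote; most votes wins."""
    votes = Counter()
    for a, b in combinations(classes, 2):
        votes[pairwise_decide(sample, a, b)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical 1-D class prototypes standing in for trained classifiers.
prototypes = {"aa": 0.0, "iy": 5.0, "s": 10.0}


def nearest(sample, a, b):
    """Toy pairwise decider: pick the class whose prototype is closer."""
    if abs(sample - prototypes[a]) <= abs(sample - prototypes[b]):
        return a
    return b


winner = vote(4.2, list(prototypes), nearest)
```

Here `num_pairs` evaluates to 1830, and the toy sample 4.2 is assigned to "iy", which wins two of the three pairwise contests.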

Journal Article
TL;DR: A two-stage speech activity detection system is presented which at first takes advantage of a voice activity detector to discard pause segments out of the audio signals; this is done even in presence of stationary background noises.

16 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95