scispace - formally typeset
Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings Article
17 May 2004
TL;DR: A combined fixed/adaptive beamforming algorithm (CFA-BF) for speech enhancement is integrated with two single-channel methods: one based on speech spectral constrained iterative processing (Auto-LSP), and an auditory masked threshold based method using equivalent rectangular bandwidth filtering (GMMSE-AMTERB).
Abstract: While a number of studies have investigated various speech enhancement and noise suppression schemes, most consider either a single-channel or an array processing framework. There are clear potential advantages in combining the strength of array processing in suppressing noise arriving from directions other than the speaker's with the strengths of single-channel methods that include speech spectral constraints or psychoacoustically motivated processing. In this paper, we propose to integrate a combined fixed/adaptive beamforming algorithm (CFA-BF) for speech enhancement with two single-channel methods: one based on speech spectral constrained iterative processing (Auto-LSP), and an auditory masked threshold based method using equivalent rectangular bandwidth filtering (GMMSE-AMTERB). After formulating the method, we evaluate performance on a subset of the TIMIT corpus with four real noise sources. We demonstrate a consistent level of noise suppression and voice communication quality improvement using the proposed method, as reflected by an overall average increase of 26 dB in SegSNR over the original degraded audio corpus.
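The headline result above is stated in segmental SNR (SegSNR). As a rough illustration of what that metric measures, here is a minimal sketch of a common SegSNR definition: per-frame SNR in dB between a clean reference and a processed signal, clamped and then averaged. The non-overlapping 160-sample frames (10 ms at TIMIT's 16 kHz sampling rate) and the [-10, 35] dB clamp are assumptions for illustration, not details taken from the paper.

```python
import math


def segmental_snr(clean, processed, frame_len=160, min_db=-10.0, max_db=35.0):
    """Mean per-frame SNR (dB) of `processed` against the `clean` reference.

    Assumed conventions (not from the paper): non-overlapping frames of
    `frame_len` samples and per-frame clamping to [min_db, max_db].
    """
    if len(clean) != len(processed):
        raise ValueError("signals must have the same length")
    snrs = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        ref = clean[start:start + frame_len]
        out = processed[start:start + frame_len]
        signal_energy = sum(x * x for x in ref)
        error_energy = sum((a - b) ** 2 for a, b in zip(ref, out))
        if error_energy == 0.0:
            snr = max_db  # perfect reconstruction: use the upper clamp
        elif signal_energy == 0.0:
            snr = min_db  # silent reference frame: use the lower clamp
        else:
            snr = 10.0 * math.log10(signal_energy / error_energy)
        snrs.append(min(max(snr, min_db), max_db))
    return sum(snrs) / len(snrs) if snrs else float("nan")
```

An improvement figure like the one reported would then be computed as `segmental_snr(clean, enhanced) - segmental_snr(clean, degraded)`, averaged over the test utterances.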

4 citations

Book Chapter
TL;DR: In this article, the recurrent ladder network, a novel modification of the ladder network for semi-supervised learning of recurrent neural networks, is proposed and evaluated on a phoneme recognition task on the TIMIT corpus.
Abstract: Ladder networks are a notable new concept in the field of semi-supervised learning, having shown state-of-the-art results in image recognition tasks while being compatible with many existing neural architectures. We present the recurrent ladder network, a novel modification of the ladder network, for semi-supervised learning of recurrent neural networks, which we evaluate with a phoneme recognition task on the TIMIT corpus. Our results show that the model consistently outperforms the baseline and achieves fully supervised baseline performance with only 75% of all labels, which demonstrates that the model is capable of using unsupervised data as an effective regulariser.
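To make "using unsupervised data as an effective regulariser" concrete, the sketch below shows the general shape of a ladder-network training cost: a supervised cross-entropy term that exists only for labelled examples, plus weighted per-layer denoising reconstruction terms that every example contributes. This is a simplified illustration of the generic ladder-network idea, not the recurrent variant from the chapter; the function name, argument layout, and layer weights are hypothetical.

```python
import math


def ladder_cost(class_probs, label, clean_acts, denoised_acts, lambdas):
    """Simplified ladder-network-style cost for one example (hypothetical sketch).

    class_probs:   predicted class distribution (from the noisy encoder path).
    label:         integer class index, or None for an unlabelled example.
    clean_acts:    per-layer activation vectors from the clean encoder path.
    denoised_acts: per-layer reconstructions produced by the decoder.
    lambdas:       per-layer weights for the reconstruction terms (assumed).
    """
    # Supervised term: only labelled examples contribute cross-entropy.
    supervised = -math.log(class_probs[label]) if label is not None else 0.0
    # Unsupervised term: weighted mean squared reconstruction error per layer,
    # available for labelled and unlabelled examples alike.
    unsupervised = 0.0
    for lam, z, z_hat in zip(lambdas, clean_acts, denoised_acts):
        unsupervised += lam * sum((a - b) ** 2 for a, b in zip(z, z_hat)) / len(z)
    return supervised + unsupervised
```

Because the reconstruction terms do not need labels, withholding 25% of the labels removes only part of the supervised signal, while the unlabelled examples still shape the hidden representations.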

4 citations

Proceedings Article
01 May 2005
TL;DR: A method is reported for detecting syllables in a continuous stream of speech using the PARCOR parameters associated with LPC, which represent a vocal tract model based on a lattice filter structure.
Abstract: Linear predictive coding (LPC) has been used to compress and encode speech signals for digital transmission at a low bit rate. The PARCOR parameters associated with LPC, which represent a vocal tract model based on a lattice filter structure, are considered here for speech recognition. The use of FIR coefficients and the frequency response of the AR model was investigated previously. This paper reports a method to detect syllables from a continuous stream of speech. The system under development slides a 20 ms time window over the signal, calculates the PARCOR parameters continuously, and feeds them to a syllable classifier. The syllable classifier is a supervised classifier that requires training. Training uses the TIMIT speech database, which contains recordings of 630 speakers of 8 major dialects of American English. The voiced/unvoiced switch built into the LPC vocoder was modified to segment the words in the speech recordings. Preliminary classification results are presented in the paper.
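PARCOR (partial correlation) parameters are the reflection coefficients that fall out of the Levinson-Durbin recursion on a frame's autocorrelation. The sketch below computes them for a sliding 20 ms window, as the abstract describes. The 16 kHz sampling rate matches TIMIT, but the 10 ms hop, the prediction order of 10, and the absence of a tapering window (e.g. Hamming) are assumptions; some texts also define PARCOR coefficients with the opposite sign.

```python
def autocorrelation(frame, order):
    """Biased autocorrelation r[0..order] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]


def parcor(frame, order=10):
    """Reflection (PARCOR) coefficients k_1..k_order via Levinson-Durbin.

    Predictor convention: x[n] is approximated by sum_j a[j] * x[n - j].
    """
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        return [0.0] * order  # silent frame: no spectral information
    a = [0.0] * (order + 1)  # predictor coefficients; a[0] is unused
    err = r[0]  # prediction error energy of the current order
    ks = []
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        ks.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:  # perfectly predictable frame: pad remaining orders
            ks.extend([0.0] * (order - i))
            break
    return ks


def sliding_parcor(signal, sample_rate=16000, win_ms=20, hop_ms=10, order=10):
    """PARCOR vectors for each 20 ms window; hop_ms and order are assumptions."""
    win = int(sample_rate * win_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    return [parcor(signal[s:s + win], order) for s in range(0, len(signal) - win + 1, hop)]
```

Each vector returned by `sliding_parcor` would be one input to the syllable classifier. For a first-order autoregressive signal with coefficient 0.9, the first PARCOR coefficient comes out close to 0.9 and the higher ones close to zero.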

4 citations

Proceedings Article
Xin Zheng, Zhiyong Wu, Binbin Shen, Helen Meng, Lianhong Cai
26 May 2013
TL;DR: A tandem DBN approach, a hierarchical architecture that consists of two or more deep belief networks (DBNs) in tandem, is proposed for the phoneme recognition task on TIMIT, and a series of experiments explores the best configuration of the second-level DBN.
Abstract: This paper proposes a tandem DBN approach, a hierarchical architecture that consists of two or more deep belief networks (DBNs) connected in tandem, for the phoneme recognition task on TIMIT. First, we describe the standard DBN approach applied to phoneme recognition and discuss the motivation for combining it with the tandem classifier approach. We then perform a series of experiments to find the best configuration for the DBN in the second level and to discover the full potential of this method. The experiments show that for the DBN in the second level, (a) 2048 units in each hidden layer is better than 1024 or 512 units, (b) given a sufficient length of temporal context, two hidden layers are better, and (c) the configuration that performs best on the development set gives a 4% relative improvement on the core test set.
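One common way to wire a tandem system is to run the first-level network frame by frame, then splice each frame's output with its neighbours so the second-level network sees temporal context. The sketch below shows only that feature-construction step, under stated assumptions: `first_level` is a hypothetical stand-in for the trained first DBN, the context width of 5 frames per side is an arbitrary choice, and edge frames are padded by repeating the boundary frame. The abstract does not specify how the paper builds the second-level input.

```python
def splice(frames, context):
    """Concatenate each frame with +/- `context` neighbours.

    Frames beyond the sequence edges are replaced by the nearest boundary frame.
    """
    n = len(frames)
    spliced = []
    for t in range(n):
        window = []
        for d in range(-context, context + 1):
            idx = min(max(t + d, 0), n - 1)  # clamp to a valid frame index
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


def tandem_features(acoustic_frames, first_level, context=5):
    """Second-level input: spliced first-level outputs (hypothetical wiring).

    first_level: callable mapping one acoustic feature vector to a vector of
    first-level outputs (e.g. phone posteriors); stands in for the first DBN.
    """
    outputs = [first_level(frame) for frame in acoustic_frames]
    return splice(outputs, context)
```

Under this wiring, the "length of temporal context" examined in finding (b) corresponds to the `context` parameter: a larger value gives the second-level network a wider view of the first-level outputs at the cost of a larger input layer.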

4 citations

Proceedings Article
01 Dec 2011
TL;DR: A recurrent neural network based articulatory-phonetic inversion model is described for improved speech recognition and a specialized optimization algorithm is introduced to enable human-like heuristic learning in an efficient data-driven manner to capture the dynamic nature of English speech pronunciations.
Abstract: This paper describes a recurrent neural network (RNN) based articulatory-phonetic inversion (API) model for improved speech recognition. A specialized optimization algorithm is also introduced to enable human-like heuristic learning in an efficient, data-driven manner that captures the dynamic nature of English speech pronunciation. The API model demonstrates superior pronunciation modeling ability and robustness against noise contamination in large-vocabulary speech recognition experiments. Using a simple rescoring formula, it improves on the hidden Markov model (HMM) baseline speech recognizer, with consistent error rate reductions of 5.30% and 10.14% for phoneme recognition tasks on clean and noisy speech, respectively, on the selected TIMIT datasets. An error rate reduction of 3.35% is also obtained for the SCRIBE-TIMIT word recognition tasks. The proposed system qualifies as a competitive candidate for profound pronunciation modeling, with intrinsic salient features such as generality and portability.
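The abstract does not give its "simple rescoring formula". For orientation only, the sketch below shows a generic N-best rescoring scheme in which each baseline HMM hypothesis score is linearly interpolated with a score from an auxiliary model (here standing in for the API model), and the best combined hypothesis is kept. The tuple layout, the linear combination, and the weight of 0.5 are all hypothetical choices, not the paper's method.

```python
def rescore(nbest, weight=0.5):
    """Return the hypothesis that maximises hmm_score + weight * aux_score.

    nbest: list of (hypothesis, hmm_log_score, aux_log_score) tuples, where
    aux_log_score is a hypothetical score from an auxiliary model such as an
    articulatory-phonetic inversion model. The weight is an assumed value,
    normally tuned on held-out data.
    """
    if not nbest:
        raise ValueError("empty N-best list")
    best = max(nbest, key=lambda h: h[1] + weight * h[2])
    return best[0]
```

With `weight=0` this reduces to the baseline recognizer's own top choice, so the weight controls how far the auxiliary model is allowed to reorder the baseline N-best list.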

4 citations


Network Information
Related Topics (5)
Recurrent neural network
29.2K papers, 890K citations
76% related
Feature (machine learning)
33.9K papers, 798.7K citations
75% related
Feature vector
48.8K papers, 954.4K citations
74% related
Natural language
31.1K papers, 806.8K citations
73% related
Deep learning
79.8K papers, 2.1M citations
72% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  24
2022  62
2021  67
2020  86
2019  77
2018  95