Topic
TIMIT
About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers published on a yearly basis (chart omitted)
Papers
01 Feb 2014
TL;DR: A method is described that factorizes the spectral magnitude matrix obtained from the group delay function (GDF), showing reasonable improvements over conventional methods in subjective evaluation, objective evaluation, and multi-speaker speech recognition on the TIMIT and GRID corpora.
Abstract: Non-negative matrix factorization (NMF) methods have been widely used in single-channel speaker separation. NMF methods use the magnitude of the Fourier transform for training the basis vectors. In this paper, a method that factorizes the spectral magnitude matrix obtained from the group delay function (GDF) is described. During training, pre-learning is applied on a training set of original sources, and the bases are trained iteratively to minimize the approximation error. Separation of the mixed speech signal involves factorizing the non-negative group delay spectral matrix using the fixed stacked bases computed during training: the matrix is decomposed into a linear combination of trained bases for each individual speaker contained in the mixture. The estimated spectral magnitude for each speaker is then combined with the phase of the mixed signal to reconstruct each speaker's signal, and the separated signals are further refined using a min-max masking method. Experiments on subjective evaluation, objective evaluation, and multi-speaker speech recognition on the TIMIT and GRID corpora indicate reasonable improvements over other conventional methods.
6 citations
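The train-then-separate scheme the abstract describes can be sketched with plain NMF multiplicative updates: learn per-speaker bases from each source, then fix the stacked bases and fit only the activations of the mixture. This is a toy numpy sketch on random non-negative matrices, not the paper's actual GDF-based system; all dimensions and data here are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(V, r, iters=200):
    """Factorize non-negative V ~= W @ H with multiplicative updates."""
    n, m = V.shape
    W = rng.random((n, r)) + 1e-3
    H = rng.random((r, m)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update bases
    return W, H

# Toy "magnitude spectrograms" for two sources (stand-ins for the
# group-delay spectral matrices used in the paper).
V1 = rng.random((64, 40))
V2 = rng.random((64, 40))
W1, _ = nmf(V1, 8)   # per-speaker bases learned during training
W2, _ = nmf(V2, 8)

# Separation: fix the stacked bases, learn only activations for the mix.
Vmix = V1[:, :20] + V2[:, :20]
W = np.hstack([W1, W2])
H = rng.random((16, 20)) + 1e-3
for _ in range(200):
    H *= (W.T @ Vmix) / (W.T @ W @ H + 1e-9)

S1 = W1 @ H[:8]      # estimated magnitude for speaker 1
S2 = W2 @ H[8:]      # estimated magnitude for speaker 2
```

In the paper each estimated magnitude would then be paired with the mixture phase to resynthesize a waveform; here the sketch stops at the magnitude estimates.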
TL;DR: This work proposes an effective combination of features, PFG (Pitch Feature for Gender), for gender identification from speech using classical machine learning algorithms.
6 citations
01 Jan 1999
TL;DR: A recurrent neural network is trained to estimate 'velum height' during continuous speech, and the best-performing network is evaluated by analysing its output for each phonetic segment in 50 hand-labelled utterances set aside for testing.
Abstract: This paper reports on present work, in which a recurrent neural network is trained to estimate 'velum height' during continuous speech. Parallel acoustic-articulatory data comprising more than 400 read TIMIT sentences is obtained using electromagnetic articulography (EMA). This data is processed and used as training data for a range of neural network sizes. The network demonstrating the highest accuracy is identified, and its performance is then evaluated in detail by analysing the network's output for each phonetic segment contained in 50 hand-labelled utterances set aside for testing purposes.
6 citations
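The acoustic-to-articulatory mapping described above (frame-by-frame acoustic features in, one articulatory value out) fits naturally into a simple recurrent network. Below is a minimal Elman-style forward pass in numpy; the 13-dimensional frames, hidden size, and sigmoid output range are assumptions for illustration, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

class ElmanRNN:
    """Minimal Elman recurrent net: maps a sequence of acoustic
    feature frames to one scalar per frame (e.g. estimated velum height)."""
    def __init__(self, n_in, n_hid):
        s = 0.1
        self.Wxh = rng.normal(0, s, (n_hid, n_in))
        self.Whh = rng.normal(0, s, (n_hid, n_hid))
        self.Why = rng.normal(0, s, (1, n_hid))
        self.bh = np.zeros(n_hid)
        self.by = np.zeros(1)

    def forward(self, X):
        h = np.zeros(self.Whh.shape[0])
        ys = []
        for x in X:  # one acoustic frame at a time; h carries context
            h = np.tanh(self.Wxh @ x + self.Whh @ h + self.bh)
            ys.append(1.0 / (1.0 + np.exp(-(self.Why @ h + self.by))))
        return np.concatenate(ys)  # one height estimate per frame, in (0, 1)

net = ElmanRNN(n_in=13, n_hid=20)    # 13 = MFCC-like frame size (assumed)
frames = rng.normal(size=(100, 13))  # 100 frames of a synthetic utterance
velum = net.forward(frames)
```

Training against the EMA-derived targets (omitted here) would fit the weights by backpropagation through time; the sketch only shows the recurrent inference step.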
01 Sep 2016
TL;DR: A novel DNN architecture for monaural speech enhancement is presented that takes into account the masking properties of the human auditory system, using a piecewise gain function to reduce noise and render the residual noise perceptually inaudible.
Abstract: Monaural speech enhancement is a key yet challenging problem for many important real-world applications. Recently, deep neural network (DNN)-based speech enhancement methods, which extract useful features from complex inputs, have demonstrated remarkable performance improvements. In this paper, we present a novel DNN architecture for monaural speech enhancement. Taking into account the masking properties of the human auditory system, a piecewise gain function is applied in the proposed DNN architecture to reduce the noise and make the residual noise perceptually inaudible. The proposed architecture jointly optimizes the piecewise gain function and the DNN. Systematic experiments on the TIMIT corpus with 20 noise types at various signal-to-noise ratio (SNR) conditions demonstrate the superiority of the proposed DNN over the reference speech enhancement methods, in both matched and unmatched noise conditions.
6 citations
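The core idea of a piecewise gain function can be illustrated with a hand-built stand-in: a gain that floors at a small value (so residual noise is attenuated but not distorted into musical noise), is unity at high SNR, and ramps linearly in between. The breakpoints and floor below are assumptions for illustration; the paper learns the gain jointly with the DNN rather than fixing it.

```python
import numpy as np

def piecewise_gain(snr_db, floor=0.1, lo=-5.0, hi=15.0):
    """Toy piecewise gain: `floor` below `lo` dB SNR, unity above `hi` dB,
    linear ramp in between (all breakpoints assumed, not from the paper)."""
    g = np.clip((snr_db - lo) / (hi - lo), 0.0, 1.0)
    return floor + (1.0 - floor) * g

# Apply the gain per time-frequency bin of a noisy magnitude spectrum.
snr_db = np.array([-10.0, 0.0, 5.0, 20.0])      # estimated SNR per bin
noisy_mag = np.array([0.8, 1.2, 0.5, 2.0])      # noisy magnitudes (toy)
enhanced_mag = piecewise_gain(snr_db) * noisy_mag
```

Low-SNR bins are suppressed toward the floor while high-SNR (speech-dominated) bins pass through unchanged, which is the masking-motivated behaviour the abstract describes.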
01 Jan 2001
TL;DR: It is shown that some non-parametric classifiers have considerable advantages over traditional hidden Markov models; among them, support vector machines were found the most suitable and the easiest to tune.
Abstract: This paper addresses the problem of classifying speech transition sounds. A number of non-parametric classifiers are compared, and it is shown that some non-parametric classifiers have considerable advantages over traditional hidden Markov models. Among the non-parametric classifiers, support vector machines were found the most suitable and the easiest to tune. Some of the reasons for the superiority of non-parametric classifiers are discussed. The algorithm was tested on the voiced stop consonant phones extracted from the TIMIT corpus and resulted in very low error rates.
6 citations
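The SVM classification step can be sketched with the Pegasos sub-gradient algorithm for a linear SVM on two toy "phone classes"; this is an illustrative stand-in in numpy, not the toolkit or features the paper actually used, and the Gaussian-cluster data is invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

def pegasos(X, y, lam=0.01, epochs=50):
    """Train a linear SVM (hinge loss + L2) with Pegasos sub-gradient steps.
    y must be in {-1, +1}. Returns weight vector w and bias b."""
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)                 # decaying step size
            if y[i] * (X[i] @ w + b) < 1:         # margin violated
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
                b += eta * y[i]
            else:                                 # only shrink w
                w = (1 - eta * lam) * w
    return w, b

# Two separable Gaussian clusters as stand-ins for fixed-length feature
# vectors extracted from voiced stop-consonant segments.
X = np.vstack([rng.normal(-2, 0.5, (40, 2)), rng.normal(2, 0.5, (40, 2))])
y = np.array([-1] * 40 + [1] * 40)
w, b = pegasos(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
```

A multi-class phone classifier would combine several such binary machines (one-vs-one or one-vs-rest); the sketch shows only the binary building block.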