Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Posted Content
TL;DR: Prob-PIT is proposed, which treats the output-label permutation as a discrete latent random variable with a uniform prior. It defines a log-likelihood function based on the prior distribution and the separation errors of all permutations, and trains the speech separation networks by maximizing this log-likelihood.
Abstract: Single-microphone, speaker-independent speech separation is normally performed through two steps: (i) separating the specific speech sources, and (ii) determining the best output-label assignment to find the separation error. The second step is the main obstacle in training neural networks for speech separation. Recently proposed Permutation Invariant Training (PIT) addresses this problem by determining the output-label assignment which minimizes the separation error. In this study, we show that a major drawback of this technique is the overconfident choice of the output-label assignment, especially in the initial steps of training when the network generates unreliable outputs. To solve this problem, we propose Probabilistic PIT (Prob-PIT) which considers the output-label permutation as a discrete latent random variable with a uniform prior distribution. Prob-PIT defines a log-likelihood function based on the prior distributions and the separation errors of all permutations; it trains the speech separation networks by maximizing the log-likelihood function. Prob-PIT can be easily implemented by replacing the minimum function of PIT with a soft-minimum function. We evaluate our approach for speech separation on both TIMIT and CHiME datasets. The results show that the proposed method significantly outperforms PIT in terms of Signal to Distortion Ratio and Signal to Interference Ratio.
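
The soft-minimum substitution described in the abstract can be sketched briefly. The snippet below is an illustrative PyTorch sketch, not the authors' code: it contrasts standard PIT, which backpropagates only through the best permutation, with a Prob-PIT-style soft-minimum implemented as a negative log-sum-exp over all permutation errors. The per-permutation MSE loss, the tensor shapes, and the `gamma` temperature are assumptions.

```python
import itertools
import torch

def _perm_errors(est, ref):
    """Per-permutation mean-squared separation error.

    est, ref: (batch, n_src, time) estimated and reference sources.
    Returns a (batch, n_perms) tensor, one column per output-label assignment.
    """
    n_src = est.shape[1]
    errs = [((est[:, list(p), :] - ref) ** 2).mean(dim=(1, 2))
            for p in itertools.permutations(range(n_src))]
    return torch.stack(errs, dim=1)

def pit_loss(est, ref):
    """Standard PIT: hard minimum over permutations (only the best assignment matters)."""
    return _perm_errors(est, ref).min(dim=1).values.mean()

def prob_pit_loss(est, ref, gamma=1.0):
    """Prob-PIT-style loss: soft-minimum over permutations via negative log-sum-exp,
    so every permutation contributes, weighted by how small its error is."""
    errs = _perm_errors(est, ref)
    soft_min = -gamma * torch.logsumexp(-errs / gamma, dim=1)
    return soft_min.mean()
```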

7 citations

Journal ArticleDOI
TL;DR: Three new algorithms based on solutions for the maximum feasible subsystem problem (MAX FS) are proposed that improve upon the state of the art in recovery of compressed speech signals: more highly compressed signals can be successfully recovered with greater quality.
Abstract: The goal in signal compression is to reduce the size of the input signal without a significant loss in the quality of the recovered signal. One way to achieve this goal is to apply the principles of compressive sensing, but this has not been particularly successful for real-world signals that are insufficiently sparse, such as speech. We present three new algorithms based on solutions for the maximum feasible subsystem problem (MAX FS) that improve on the state of the art in recovery of compressed speech signals: more highly compressed signals can be successfully recovered with greater quality. The new recovery algorithms deliver sparser solutions when compared with those obtained using traditional compressive sensing recovery algorithms. When tested by recovering compressively sensed speech signals in the TIMIT speech database, the recovered speech has better perceptual quality than speech recovered using traditional compressive sensing recovery algorithms.
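
For context, the traditional compressive-sensing pipeline that these MAX FS-based algorithms are compared against can be sketched as follows. This is an illustrative baseline only (random Gaussian measurements of one frame, sparse recovery via orthogonal matching pursuit in a DCT dictionary); the frame length, compression ratio, sparsity level, and synthetic test signal are assumptions, not the paper's setup, and the MAX FS recovery itself is not reproduced here.

```python
import numpy as np
from scipy.fft import idct
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(0)

n, m, k = 256, 64, 24   # frame length, measurements (4x compression), sparsity (assumed)

# Synthetic "speech-like" test frame: a few sinusoids, approximately sparse under the DCT.
t = np.arange(n)
x = np.sin(2 * np.pi * 0.031 * t) + 0.5 * np.sin(2 * np.pi * 0.112 * t)

Phi = rng.standard_normal((m, n)) / np.sqrt(m)   # random Gaussian sensing matrix
Psi = idct(np.eye(n), norm="ortho", axis=0)      # columns = DCT synthesis basis vectors

y = Phi @ x                                      # compressive measurements
A = Phi @ Psi                                    # dictionary seen by the recovery algorithm

# Traditional sparse recovery: greedy search for a k-sparse coefficient vector.
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=k, fit_intercept=False).fit(A, y)
x_hat = Psi @ omp.coef_                          # reconstructed frame

print("relative reconstruction error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```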

7 citations

Proceedings ArticleDOI
26 May 2013
TL;DR: Two distinct approaches for height estimation are explored: the first is statistics-based and incorporates acoustic models within a GMM structure, while the second is a direct speech-analysis approach that employs linear regression to obtain the height directly.
Abstract: There are both scientific and technology-based motivations for establishing effective speech processing algorithms that estimate speaker traits. Estimating speaker height can assist in voice forensic analysis [1], provide additional side knowledge to improve speaker ID systems, or inform acoustic model selection for improved speech recognition. In this study, two distinct approaches for height estimation are explored. The first approach is statistics-based and incorporates acoustic models within a GMM structure, while the second is a direct speech analysis approach that employs linear regression to obtain the height directly. The accuracy and trade-offs of these systems are explored, as well as a fusion of the two systems, using data from the TIMIT corpus (which includes ground truth on speaker height).
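
The second, regression-based approach can be illustrated with a minimal sketch: fit a linear regression from per-speaker acoustic features to ground-truth height. The feature set and the synthetic placeholder data below are assumptions; only the count of 630 TIMIT speakers comes from the corpus itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# X: one row of acoustic features per TIMIT speaker (e.g. formant/MFCC statistics);
# y: ground-truth height in cm. Both are synthetic placeholders in this sketch.
n_speakers, n_features = 630, 20          # TIMIT has 630 speakers; 20 features is assumed
X = rng.standard_normal((n_speakers, n_features))
y = 170 + 10 * rng.standard_normal(n_speakers)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

reg = LinearRegression().fit(X_tr, y_tr)
mae = np.mean(np.abs(reg.predict(X_te) - y_te))
print(f"mean absolute height error: {mae:.1f} cm")
```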

7 citations

Proceedings ArticleDOI
03 Aug 1997
TL;DR: A method for choosing between different wavelets is proposed and applied to TIMIT phoneme classification; a high correlation is found between the results of the proposed method and the classification accuracy obtained by the network.
Abstract: In this paper, a method is proposed for choosing between different wavelets and their corresponding parameters, and it is applied to TIMIT phoneme classification. The method uses a Kohonen network to extract prototypes for each class; distance measures between these prototypes serve as criteria for choosing the best wavelet. For phoneme classification, a time-delay neural network is used as the classifier. A high correlation is found between the results of the proposed method and the classification accuracy obtained by the network.
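
A rough sketch of the selection criterion, under stated assumptions, is given below: for each candidate wavelet, per-class prototypes are extracted with a small Kohonen (self-organizing) map, and the wavelet is scored by the distance between prototypes of different classes. The PyWavelets feature extraction, the tiny 1-D SOM, and the scoring rule are illustrative choices, not the paper's exact implementation.

```python
import numpy as np
import pywt

def wavelet_features(frame, wavelet, level=3):
    """Log-energy of each subband of a wavelet decomposition of one speech frame."""
    coeffs = pywt.wavedec(frame, wavelet, level=level)
    return np.array([np.log(np.sum(c ** 2) + 1e-8) for c in coeffs])

def train_som(X, n_units=4, epochs=50, lr=0.5, seed=0):
    """Tiny 1-D Kohonen map; returns `n_units` prototype vectors for one class."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=n_units, replace=False)].astype(float)
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        sigma = max(decay, 0.3)                                  # shrinking neighbourhood
        for x in X[rng.permutation(len(X))]:
            bmu = np.argmin(np.linalg.norm(W - x, axis=1))       # best-matching unit
            h = np.exp(-((np.arange(n_units) - bmu) ** 2) / (2 * sigma ** 2))
            W += lr * decay * h[:, None] * (x - W)
    return W

def wavelet_score(frames_by_class, wavelet):
    """Score a candidate wavelet by the separation between class prototypes."""
    protos = [train_som(np.array([wavelet_features(f, wavelet) for f in frames]))
              for frames in frames_by_class.values()]
    dists = [np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1).min()
             for i, p in enumerate(protos) for q in protos[i + 1:]]
    return float(np.mean(dists))   # larger = better expected class separability

# Usage (frames_by_class maps phoneme labels to lists of 1-D sample arrays):
# best = max(['db2', 'db4', 'sym5', 'coif3'],
#            key=lambda w: wavelet_score(frames_by_class, w))
```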

7 citations

Proceedings ArticleDOI
01 Dec 2009
TL;DR: An efficient and effective nonlinear feature-domain noise suppression algorithm is proposed, motivated by the minimum mean square error (MMSE) optimization criterion and employed to minimize the difference between noisy and clean speech.
Abstract: In this paper, we propose an efficient and effective nonlinear feature-domain noise suppression algorithm motivated by the minimum mean square error (MMSE) optimization criterion. A multilayer perceptron (MLP) neural network operating in the log-spectral domain is employed to minimize the difference between noisy and clean speech. Used as a pre-processing stage of a speech recognition system, this method improves the recognition rate in noisy environments. We extend the application of the system to different environments with different noises without retraining the HMM models. We train the feature extraction stage with a small portion of noisy data, created by artificially adding different types of noise from the NOISEX-92 database to the TIMIT speech database. In real environments, where our speech recognition systems must operate, different types of noise with various SNRs exist. Our proposed method suggests four strategies based on the system's capability to identify the noise type and SNR. Experimental results show that the proposed method achieves significant improvements in recognition rates.
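
The core enhancement stage can be sketched as follows, assuming a simple PyTorch MLP that maps noisy log-spectral frames to the corresponding clean frames with an MSE (MMSE-motivated) training objective. The layer sizes, feature dimension, and training loop are illustrative assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn

FEAT_DIM = 257   # e.g. one-sided log-magnitude spectrum bins per frame (assumed)

class LogSpectralDenoiser(nn.Module):
    """MLP mapping a noisy log-spectral frame to an estimate of the clean frame."""
    def __init__(self, feat_dim=FEAT_DIM, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, noisy_logspec):
        return self.net(noisy_logspec)

model = LogSpectralDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
mse = nn.MSELoss()   # the MMSE-motivated training criterion

def train_step(noisy, clean):
    """One update on parallel data: (batch, FEAT_DIM) log-spectral frames of
    noisy speech (TIMIT + NOISEX-92 noise) and the corresponding clean frames."""
    opt.zero_grad()
    loss = mse(model(noisy), clean)
    loss.backward()
    opt.step()
    return loss.item()
```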

7 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95