
TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
TL;DR: Comparative assessment against baseline enhancement algorithms such as Auto-LSP, log-minimum mean squared error (log-MMSE), and log-MMSE with speech presence uncertainty (log-MMSE-SPU) demonstrates that the proposed solution exhibits greater consistency in improving speech quality over most phoneme classes and noise types considered in this study.
Abstract: The degree of influence of noise over phonemes is not uniform, since it depends on their distinct acoustic properties. In this study, the problem of selectively enhancing speech based on broad phoneme classes is addressed using Auto-LSP, a constrained iterative speech enhancement algorithm. Multiple enhanced utterances are generated for every noisy utterance by varying the Auto-LSP parameters. The noisy utterance is then partitioned into segments based on broad-level phoneme classes, and constraints are applied to each segment using a hard-decision solution. To alleviate the effect of hard-decision errors, a Gaussian mixture model (GMM)-based maximum-likelihood (ML) soft-decision solution is also presented. The resulting utterances are evaluated over the TIMIT speech corpus using the Itakura-Saito, segmental signal-to-noise ratio (SNR), and perceptual evaluation of speech quality (PESQ) metrics over four noise types at three SNR levels. Comparative assessment against baseline enhancement algorithms such as Auto-LSP, log-minimum mean squared error (log-MMSE), and log-MMSE with speech presence uncertainty (log-MMSE-SPU) demonstrates that the proposed solution exhibits greater consistency in improving speech quality over most phoneme classes and noise types considered in this study.

6 citations
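The GMM-based ML soft-decision step described in the abstract above can be sketched as follows. This is a minimal illustration only: single-Gaussian class models with diagonal covariance stand in for full GMMs, equal class priors are assumed, and all function names are hypothetical, not the paper's implementation.

```python
import numpy as np

def gaussian_loglik(x, mean, var):
    """Log-likelihood of feature vector x under a diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def soft_combine(x, enhanced_by_class, class_models):
    """Weight each class-specific enhanced segment by the posterior
    probability of its broad phoneme class given the features x.

    enhanced_by_class: one enhanced signal segment per broad class.
    class_models: list of (mean, var) pairs, one per broad class.
    """
    logliks = np.array([gaussian_loglik(x, m, v) for m, v in class_models])
    # Softmax over log-likelihoods gives posteriors (equal priors assumed).
    w = np.exp(logliks - logliks.max())
    w /= w.sum()
    # Soft decision: posterior-weighted mix of class-specific enhancements.
    return sum(wi * e for wi, e in zip(w, enhanced_by_class))
```

When the features clearly match one class, the soft combination approaches the corresponding hard decision, but ambiguous frames blend the candidate enhancements instead of committing to a possibly wrong class.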

Proceedings ArticleDOI
01 Mar 2017
TL;DR: Results indicate that the graph-based semi-supervised learning method for acoustic data significantly improves classification accuracy compared to the fully-supervised case when the fraction of labeled data is low, and it is competitive with other methods in the fully labeled case.
Abstract: We describe a graph-based semi-supervised learning method for acoustic data that uses a Deep Neural Network (DNN) combined with a stochastic graph-based entropic regularizer to favor smooth solutions over a graph induced by the data. We consider graph embeddings constructed from the input features and also from dimensionality-reduced encodings obtained from the bottleneck layer of a separate deep auto-encoder. We use a computationally efficient, stochastic graph-regularization technique that uses mini-batches that are consistent with the graph structure but that also provide enough data diversity for the convergence of stochastic gradient descent methods to good solutions. For this work, we focus on results of frame-level phone classification accuracy on the TIMIT speech corpus but our method is general and scalable to much larger data sets. Results indicate that our method significantly improves classification accuracy compared to the fully-supervised case when the fraction of labeled data is low, and it is competitive with other methods in the fully labeled case.

6 citations
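The graph-regularization idea in the abstract above, favoring smooth predictions over a data-induced graph within a mini-batch, can be sketched as follows. Note the hedge: a squared-difference smoothness penalty stands in for the paper's entropic regularizer, and the edge list, weights, and names are illustrative assumptions.

```python
import numpy as np

def graph_smoothness_penalty(probs, edges, weights):
    """Penalize disagreement of predicted class distributions across
    graph edges inside a mini-batch (sketch of a graph regularizer).

    probs: (batch, classes) array of predicted class distributions.
    edges: list of (i, j) index pairs within the batch.
    weights: one nonnegative similarity weight per edge.
    """
    loss = 0.0
    for (i, j), w in zip(edges, weights):
        # Squared L2 distance between the two predicted distributions;
        # strongly connected (similar) points are pushed to agree.
        loss += w * np.sum((probs[i] - probs[j]) ** 2)
    return loss
```

In training, a term like this would be added to the supervised cross-entropy on the labeled subset, so unlabeled frames still shape the decision function through the graph.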

01 Jan 2009
TL;DR: It is observed that the triphone-based system, after necessary phoneme grouping based on place of articulation, correlates well with the FDA scores, and that apart from the articulatory problems, some of the speakers are also affected by velopharyngeal incompetence.
Abstract: Dysarthria is a neuromotor impairment of speech that affects one or more of the speech sub-systems. It is reflected in the acoustic characteristics of the phonemes as deviations from their healthy counterparts. To capture these deviations, continuous, isolated-style monophone-based, and triphone-based speech recognition systems are developed in this work. These speech recognition systems are trained with the TIMIT speech corpus and tested with the Nemours database of dysarthric speech. The correlation coefficient between the performance of the speech recognition systems and the Frenchay dysarthria assessment (FDA) scores is computed for the assessment of articulatory sub-systems. It is observed that the triphone-based system, after necessary phoneme grouping based on place of articulation, correlates well with the FDA scores. It is further observed that apart from the articulatory problems, some of the speakers are also affected by velopharyngeal incompetence. Hypernasality in dysarthric speech is analyzed with a group delay function-based acoustic measure, and 4 out of 10 dysarthric speakers in the Nemours database are found to be hypernasal.

6 citations
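The assessment step above hinges on the correlation coefficient between recognizer performance and FDA scores. A minimal sketch of that computation (the data and function name are hypothetical, for illustration only):

```python
import numpy as np

def pearson_r(accuracy, fda_scores):
    """Pearson correlation between per-speaker recognition accuracy
    and the corresponding FDA assessment scores."""
    x = np.asarray(accuracy, dtype=float)
    y = np.asarray(fda_scores, dtype=float)
    # np.corrcoef returns the 2x2 correlation matrix; take the
    # off-diagonal entry.
    return float(np.corrcoef(x, y)[0, 1])
```

A value near 1 would indicate that the recognizer's scores track the clinical assessment, which is the property the paper reports for the triphone-based system after place-of-articulation grouping.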

Proceedings ArticleDOI
20 Mar 2016
TL;DR: This work reports progress towards phoneme recognition using a model of speech which employs very few parameters and which is more faithful to the dynamics and model of human speech production.
Abstract: Recent advances in automatic speech recognition have used large corpora and powerful computational resources to train complex statistical models from high-dimensional features, to attempt to capture all the variability found in natural speech. Such models are difficult to interpret and may be fragile, and contradict or ignore knowledge of human speech production and perception. We report progress towards phoneme recognition using a model of speech which employs very few parameters and which is more faithful to the dynamics and model of human speech production. Using features generated from a neural network bottleneck layer, we obtain recognition accuracy on TIMIT which compares favourably with traditional models of similar power. We discuss the implications of these results for recognition using natural features such as vocal tract resonances and spectral energies.

6 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  24
2022  62
2021  67
2020  86
2019  77
2018  95