Topic

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
06 Sep 2009
TL;DR: A feature extraction technique based on static and dynamic modulation spectrum derived from long-term envelopes in sub-bands that provides relative improvements in phoneme recognition accuracies for TIMIT and conversation telephone speech (CTS).
Abstract: We present a feature extraction technique based on static and dynamic modulation spectrum derived from long-term envelopes in sub-bands. Estimation of the sub-band temporal envelopes is done using Frequency Domain Linear Prediction (FDLP). These sub-band envelopes are compressed with a static (logarithmic) and a dynamic (adaptive loops) compression. The compressed sub-band envelopes are transformed into modulation spectral components, which are used as features for speech recognition. Experiments are performed on a phoneme recognition task using a hybrid HMM-ANN phoneme recognition system and on an ASR task using the TANDEM speech recognition system. The proposed features provide relative improvements of 3.8% and 11.5% in phoneme recognition accuracies for TIMIT and conversational telephone speech (CTS), respectively. Further, these improvements are found to be consistent for ASR tasks on the OGI-Digits database (relative improvement of 13.5%).
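The core FDLP step above can be illustrated with a short sketch: linear prediction applied to the DCT of a signal models its temporal envelope, the dual of ordinary LP, which models the spectral envelope. The function name, model order, and grid size below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def fdlp_envelope(x, order=20, n_points=256):
    """Sketch of FDLP: linear prediction on the DCT of a signal
    approximates its temporal (Hilbert-like) envelope."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    # direct O(N^2) DCT-II, written out for clarity
    C = np.cos(np.pi * np.outer(np.arange(N), np.arange(N) + 0.5) / N)
    c = C @ x
    # autocorrelation of the DCT sequence, lags 0..order
    r = np.correlate(c, c, mode='full')[N - 1:N + order]
    r[0] *= 1.001  # slight diagonal loading for numerical stability
    # solve the LP normal equations (symmetric Toeplitz system)
    T = r[np.abs(np.subtract.outer(np.arange(order), np.arange(order)))]
    a = np.linalg.solve(T, r[1:])
    # squared-magnitude LP envelope evaluated on a uniform grid
    A = np.fft.rfft(np.concatenate(([1.0], -a)), n=2 * n_points)
    return 1.0 / (np.abs(A[:n_points]) ** 2 + 1e-12)
```

In the paper this envelope is computed per sub-band and then compressed before the modulation-spectral transform; the sketch shows only the envelope-estimation idea.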

19 citations

Proceedings ArticleDOI
21 Oct 2019
TL;DR: This work proposes an utterance-level classification-aided non-intrusive (UCAN) assessment approach that combines the task of quality score classification with the regression task of quality score estimation, and uses a categorical quality ranking as an auxiliary constraint to assist with quality score estimation.
Abstract: Objective metrics, such as the perceptual evaluation of speech quality (PESQ), have become standard measures for evaluating speech. These metrics enable efficient, low-cost evaluations, where ratings are often computed by comparing a degraded speech signal to its underlying clean reference signal. Reference-based metrics, however, cannot be used to evaluate real-world signals that have inaccessible references. This project develops a non-intrusive framework for evaluating the perceptual quality of noisy and enhanced speech. We propose an utterance-level classification-aided non-intrusive (UCAN) assessment approach that combines the task of quality score classification with the regression task of quality score estimation. Our approach uses a categorical quality ranking as an auxiliary constraint to assist with quality score estimation, where we jointly train a multi-layered convolutional neural network in a multi-task manner. This approach is evaluated using the TIMIT speech corpus and several noises under a wide range of signal-to-noise ratios. The results show that the proposed system significantly improves quality score estimation as compared to several state-of-the-art approaches.
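The joint objective described above, classification of a categorical quality rank plus regression of the continuous score, can be sketched as a weighted sum of a cross-entropy term and an MSE term. The weighting `alpha` and the function name are assumptions for illustration; the paper's exact loss formulation may differ.

```python
import numpy as np

def ucan_joint_loss(score_pred, score_true, class_logits, class_true, alpha=0.5):
    """Multi-task loss sketch: MSE on the quality score plus
    cross-entropy on the categorical quality rank (alpha is an
    assumed trade-off weight, not from the paper)."""
    # regression term: MSE on the continuous quality score
    mse = np.mean((score_pred - score_true) ** 2)
    # classification term: softmax cross-entropy on the quality rank
    z = class_logits - class_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -np.mean(log_probs[np.arange(len(class_true)), class_true])
    return alpha * mse + (1 - alpha) * ce
```

In the paper both heads share a convolutional trunk and are trained jointly; only the combined objective is sketched here.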

19 citations

Proceedings ArticleDOI
26 Sep 2010
TL;DR: This paper explores the use of two proposed glottal signatures, derived from the residual signal, for speaker identification; promising results are shown to outperform other approaches based on glottal features.
Abstract: Most current speaker recognition systems are based on features extracted from the magnitude spectrum of speech. However, the excitation signal produced by the glottis is expected to convey complementary relevant information about the speaker's identity. This paper explores the use of two proposed glottal signatures, derived from the residual signal, for speaker identification. Experiments using these signatures are performed on both the TIMIT and YOHO databases. Promising results are shown to outperform other approaches based on glottal features. Besides, it is highlighted that the signatures can be used for text-independent speaker recognition and that only a few seconds of voiced speech are sufficient for estimating them reliably.
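The residual signal mentioned above is conventionally obtained by inverse-filtering the speech with its linear-prediction coefficients. A minimal sketch, using the autocorrelation method and a hypothetical model order (the paper's exact residual-extraction settings are not given here):

```python
import numpy as np

def lp_residual(x, order=12):
    """LP residual (excitation estimate) via inverse filtering.
    Autocorrelation method gives coefficients a; the residual is
    e[n] = x[n] - sum_k a[k] * x[n-k-1]."""
    x = np.asarray(x, dtype=float)
    # autocorrelation, lags 0..order
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    r[0] *= 1.001  # diagonal loading for numerical stability
    # solve the LP normal equations (symmetric Toeplitz system)
    T = r[np.abs(np.subtract.outer(np.arange(order), np.arange(order)))]
    a = np.linalg.solve(T, r[1:])
    # inverse filter A(z) = 1 - sum_k a[k] z^{-(k+1)}, applied as an FIR
    return np.convolve(x, np.concatenate(([1.0], -a)))[:len(x)]
```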

19 citations

Proceedings ArticleDOI
14 Sep 2014
TL;DR: A simple but effective way of using multi-frame targets to improve the accuracy of Artificial Neural Network-Hidden Markov Model (ANN-HMM) hybrid systems.
Abstract: We describe a simple but effective way of using multi-frame targets to improve the accuracy of Artificial Neural Network-Hidden Markov Model (ANN-HMM) hybrid systems. In this approach a Deep Neural Network (DNN) is trained to predict the forced-alignment state of multiple frames using a separate softmax unit for each of the frames. This is in contrast to the usual method of training a DNN to predict only the state of the central frame. By itself this is not sufficient to improve the accuracy of the system significantly. However, if we average the predictions for each frame from the different contexts it is associated with, we achieve state-of-the-art results on TIMIT using a fully connected Deep Neural Network without convolutional architectures or dropout training. On a 14-hour subset of the Wall Street Journal (WSJ) corpus, using a context-dependent DNN-HMM system, it leads to a relative improvement of 6.4% on the dev set (test-dev93) and 9.3% on the test set (test-eval92).
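The averaging step described in the abstract can be sketched as follows: each context window predicts posteriors for 2k+1 frames, and every frame's final posterior is the mean of all window predictions that cover it. The function name and data layout below are illustrative assumptions.

```python
import numpy as np

def average_multiframe_predictions(window_posteriors, num_frames, k):
    """Combine per-frame posteriors from overlapping context windows.

    window_posteriors[t] is a (2k+1, n_states) array: the network's
    softmax outputs for frames t-k .. t+k, predicted from the window
    centred at frame t.  Each frame is covered by up to 2k+1 windows;
    its final posterior averages all the predictions it receives.
    """
    n_states = window_posteriors[0].shape[1]
    acc = np.zeros((num_frames, n_states))
    cnt = np.zeros(num_frames)
    for t, post in enumerate(window_posteriors):
        for j, f in enumerate(range(t - k, t + k + 1)):
            if 0 <= f < num_frames:   # drop predictions past the edges
                acc[f] += post[j]
                cnt[f] += 1
    return acc / cnt[:, None]
```

Since each combined row is a mean of softmax vectors, it remains a valid distribution over HMM states and can be fed to the decoder as usual.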

19 citations

Proceedings ArticleDOI
11 Feb 2020
TL;DR: This work proposes a neural architecture coupled with a parameterized structured loss function to learn segmental representations for the task of phoneme boundary detection, and evaluates the model on a Hebrew corpus to demonstrate that such phonetic supervision can be beneficial in a multi-lingual setting.
Abstract: Phoneme boundary detection is an essential first step for a variety of speech processing applications such as speaker diarization, speech science, keyword spotting, etc. In this work, we propose a neural architecture coupled with a parameterized structured loss function to learn segmental representations for the task of phoneme boundary detection. First, we evaluated our model when the spoken phonemes were not given as input. Results on the TIMIT and Buckeye corpora suggest that the proposed model is superior to the baseline models and reaches state-of-the-art performance in terms of F1 and R-value. We further explore the use of phonetic transcription as additional supervision and show this yields minor improvements in performance but substantially better convergence rates. We additionally evaluate the model on a Hebrew corpus and demonstrate that such phonetic supervision can be beneficial in a multi-lingual setting.
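For reference, the R-value cited above is a standard boundary-detection metric that combines hit rate with an over-segmentation penalty; a minimal sketch from counts of reference, hypothesised, and matched boundaries (the formula is recalled from the boundary-detection literature, not taken from this paper):

```python
import math

def r_value(n_ref, n_hyp, n_hit):
    """R-value from boundary counts: n_ref reference boundaries,
    n_hyp hypothesised boundaries, n_hit hypothesised boundaries
    matching a reference within the evaluation tolerance."""
    hr = 100.0 * n_hit / n_ref             # hit rate, percent
    os_ = 100.0 * (n_hyp / n_ref - 1.0)    # over-segmentation, percent
    r1 = math.sqrt((100.0 - hr) ** 2 + os_ ** 2)
    r2 = (-os_ + hr - 100.0) / math.sqrt(2.0)
    return 1.0 - (abs(r1) + abs(r2)) / 200.0
```

Perfect detection (every reference boundary hit, no spurious boundaries) gives R = 1; missed or spurious boundaries pull the value down.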

19 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (76% related)
Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
Feature vector: 48.8K papers, 954.4K citations (74% related)
Natural language: 31.1K papers, 806.8K citations (73% related)
Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance
Metrics
No. of papers in the topic in previous years:
Year  Papers
2023  24
2022  62
2021  67
2020  86
2019  77
2018  95