Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings Article
20 Mar 2016
TL;DR: The proposed template-based approaches offer comparable or better spectral distortion, validating their ability to provide accurate high-resolution segmentation of the unit database.
Abstract: We address the problem of automatic segmentation of the unit database in unit-selection based TTS and propose template-based forced-alignment segmentation in the one-pass dynamic programming (DP) framework, with several variants: i) a multi-template representation derived by the modified K-means (MKM) algorithm, ii) context-independent and context-dependent templates for a reduced multi-template representation, and iii) a segmental K-means algorithm with MKM modeling of phone classes, as a template-based equivalent of the conventional embedded re-estimation procedure for HMM-based modeling and segmentation that is typically used to derive unit databases for TTS (e.g., EHMM in Festival). We first benchmark the proposed segmentation framework on the TIMIT database for phonetic segmentation, since TIMIT provides ground-truth phonetic labels. We then apply the proposed template-based segmentation algorithms to syllabic Indian-language TTS, benchmark them using objective measures based on spectral distortion (SD) computed on time-aligned speech utterances, and compare them with other recent segmentation approaches, namely the group-delay (GD) based semi-automatic method, the Hybrid method, EHMM, HMM, and SKM-HMM. The proposed template-based approaches offer comparable or better spectral distortion, validating their ability to provide accurate high-resolution segmentation of the unit database.

1 citation
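The core idea of template-based forced alignment can be sketched as a small dynamic program: given the known phone sequence of an utterance, split its frames into ordered, contiguous segments so that the total distance between frames and their phone's templates is minimal. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes fixed-length feature vectors per frame, Euclidean distance, templates already trained (the MKM training step is not shown), one segment per phone with at least one frame each, and no duration constraints. The function names `frame_costs` and `forced_align` are hypothetical.

```python
import numpy as np


def frame_costs(features, templates):
    """Distance from every frame to every phone in the known phone sequence.

    features:  (T, D) array of per-frame feature vectors (e.g. MFCCs).
    templates: list of (K_j, D) arrays, one per phone in transcription order.
               With several templates per phone (multi-template case), a
               frame's cost is its Euclidean distance to the nearest one.
    Returns a (T, N) cost matrix.
    """
    costs = np.empty((features.shape[0], len(templates)))
    for j, tmpl in enumerate(templates):
        # (T, K_j) pairwise distances, then the minimum over the K_j templates
        d = np.linalg.norm(features[:, None, :] - tmpl[None, :, :], axis=2)
        costs[:, j] = d.min(axis=1)
    return costs


def forced_align(features, templates):
    """Split T frames into N contiguous, ordered phone segments (at least one
    frame each) with minimum total frame-to-template distance.
    Returns the start frame of each phone segment."""
    cost = frame_costs(features, templates)
    T, N = cost.shape
    if T < N:
        raise ValueError("need at least one frame per phone")
    acc = np.full((T, N), np.inf)            # acc[t, j]: best cost with frame t in phone j
    entered = np.zeros((T, N), dtype=bool)   # True if phone j starts at frame t
    acc[0, 0] = cost[0, 0]                   # the first frame must belong to the first phone
    for t in range(1, T):
        for j in range(N):
            stay = acc[t - 1, j]
            enter = acc[t - 1, j - 1] if j > 0 else np.inf
            entered[t, j] = enter < stay
            acc[t, j] = min(stay, enter) + cost[t, j]
    # Backtrack from (T-1, N-1): the last frame must belong to the last phone.
    starts = [0] * N
    j = N - 1
    for t in range(T - 1, 0, -1):
        if entered[t, j]:
            starts[j] = t
            j -= 1
    return starts
```

For example, with one-dimensional frames `[0, 0, 5, 5, 5, 10]` and phone templates centred on 0, 5 and 10, the alignment returns segment starts `[0, 2, 5]`. The paper's segmental K-means variant would alternate between an alignment step like this one and re-estimating the templates from the resulting segments.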

23 Sep 2008
TL;DR: This work attempted to improve recognition accuracy, avoiding extensive retraining when the vocabulary is changed or extended, by applying a hidden Markov model and neural associative memory based hybrid approach to continuous speech recognition.
Abstract: We attempted to improve recognition accuracy, while avoiding extensive retraining when the vocabulary is changed or extended, by applying a hybrid approach based on hidden Markov models and neural associative memories to continuous speech recognition. In this approach, hidden Markov models are used to recognize subword units such as syllables. For a given subword-unit sequence, a network of neural associative memories first generates the spoken single words and then the whole sentence. The fault tolerance of neural associative memory enables the system to recognize words correctly even when they are not perfectly pronounced or run into each other. The approach is evaluated on TIMIT and on the WSJ1 5k and 20k test sets. The obtained results are encouraging.

1 citation

Book Chapter
16 Jan 2018
TL;DR: This convolutional neural network approach is shown to be particularly well suited to handling network noise and distortion in speech data, as demonstrated by state-of-the-art benchmark results on NTIMIT.
Abstract: A novel application of convolutional neural networks to phone recognition is presented in this paper. Both the TIMIT and NTIMIT speech corpora have been employed. The phonetic transcriptions of these corpora have been used to label spectrogram segments for training the convolutional neural network. A sliding window extracted fixed-size images from the spectrograms produced for the TIMIT and NTIMIT utterances. These images were assigned to the appropriate phone class by parsing the TIMIT and NTIMIT phone transcriptions. The GoogLeNet convolutional neural network was implemented and trained using stochastic gradient descent with mini-batches. After training, phonetic rescoring was performed to map each phone set to the smaller standard set, i.e. the 61-phone set was mapped to the 39-phone set. Benchmark results for both datasets are presented for comparison with other state-of-the-art approaches. It will be shown that this convolutional neural network approach is particularly well suited to handling network noise and distortion in speech data, as demonstrated by the state-of-the-art benchmark results for NTIMIT.

1 citation
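The data preparation described above (sliding-window images labeled from the phone transcriptions, plus folding the 61 TIMIT labels into 39 classes) might look roughly like the sketch below. Several details are assumptions not stated in the abstract: each window is labeled with the phone covering its centre frame, segments are already in frame units (TIMIT .PHN files give start and end sample offsets at 16 kHz, so they must be converted using whatever frame hop the spectrogram used), and the folding table is the one commonly used in TIMIT phone-recognition work, which may differ from the authors' choice. Spectrogram computation and GoogLeNet training are not shown; `extract_patches`, `fold_phone` and `FOLD_61_TO_39` are hypothetical names.

```python
import numpy as np

# A folding commonly used to map TIMIT's 61 phone labels onto 39 classes.
# (Assumption: the abstract does not say which mapping the authors used.)
FOLD_61_TO_39 = {
    "ao": "aa", "ax": "ah", "ax-h": "ah", "axr": "er", "hv": "hh", "ix": "ih",
    "el": "l", "em": "m", "en": "n", "nx": "n", "eng": "ng", "zh": "sh", "ux": "uw",
    # closures, pauses and utterance-edge silence all become "sil"
    **{p: "sil" for p in ("pcl", "tcl", "kcl", "bcl", "dcl", "gcl", "h#", "pau", "epi")},
}


def fold_phone(phone):
    """Map a 61-set label to the 39-set; the glottal stop "q" is dropped (None)."""
    if phone == "q":
        return None
    return FOLD_61_TO_39.get(phone, phone)


def extract_patches(spectrogram, segments, width=11, hop=1):
    """Cut fixed-size windows from a (freq_bins, frames) spectrogram and label
    each window with the phone that covers its centre frame.

    segments: list of (start_frame, end_frame, phone), end exclusive.
    Windows whose centre frame is not covered by any segment are skipped.
    Returns (patches, labels), patches with shape (n, freq_bins, width).
    """
    n_bins, n_frames = spectrogram.shape
    patches, labels = [], []
    for start in range(0, n_frames - width + 1, hop):
        centre = start + width // 2
        label = next((p for s, e, p in segments if s <= centre < e), None)
        if label is None:
            continue
        patches.append(spectrogram[:, start:start + width])
        labels.append(label)
    if not patches:
        return np.empty((0, n_bins, width)), labels
    return np.stack(patches), labels
```

Labels can be kept in the 61-phone set for training and passed through `fold_phone` only when scoring, which matches the abstract's description of mapping the 61-phone set to the 39-phone set after training.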

Journal Article
TL;DR: This paper investigated speaker discrimination in utterances varying in syllable length and speaker gender taken from the TIMIT corpus of American English and found that male speakers were discriminated better than female speakers.
Abstract: This study investigated speaker discrimination in utterances varying in syllable length and speaker gender taken from the TIMIT corpus of American English. Twenty native English speakers presented one‐, two‐, and three‐syllable utterances (within speaker gender) in a two‐alternative forced‐choice task. Perception results were analyzed in light of both source level (F0 and long‐term average spectrum of LPC residuals) and formant level measurements (F1–F4). Results showed that male speakers were discriminated better than female speakers. Source features (F0 and LTAS of LPC residuals) significantly predicted listener response, while higher spectral information (F1–F4) had little effect. The varying importance of vocal source and vocal tract characteristics in speaker discrimination is discussed.

1 citation

Proceedings Article
19 Sep 2006
TL;DR: A novel time-synchronous decoder, designed specifically for a Hidden Trajectory Model (HTM) whose likelihood score computation depends on long-span phonetic contexts, is presented.
Abstract: A novel time-synchronous decoder, designed specifically for a Hidden Trajectory Model (HTM) whose likelihood score computation depends on long-span phonetic contexts, is presented. HTM is a recently developed acoustic model aimed at capturing the underlying dynamic structure of speech coarticulation and reduction using a compact set of parameters. The long-span nature of the HTM had posed a great technical challenge for developing efficient search algorithms for full evaluation of the model. Taking on this challenge, the decoding algorithm is developed to deal effectively with the exponentially increased search space through HTM-specific techniques for hypothesis representation, word-ending recombination, and hypothesis pruning. Experimental results obtained on the TIMIT phonetic recognition task are reported, extending our earlier HTM evaluation paradigms based on N-best and A* lattice rescoring. Index Terms: Hidden Trajectory Model, time-synchronous decoding, trace-based hypothesis, TIMIT

1 citation
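To illustrate what "time-synchronous" search with recombination and pruning means in general, the sketch below runs Viterbi search over a plain first-order HMM: all hypotheses advance one frame at a time, hypotheses reaching the same state are recombined (only the best survives), and a beam discards hypotheses scoring too far below the frame's best. This is a swapped-in, generic technique, not the HTM decoder: it does not model long-span contexts or trace-based hypotheses. The function name `beam_viterbi` and the dictionary-based model format are hypothetical.

```python
import math


def beam_viterbi(log_emis, log_trans, log_init, beam=10.0):
    """Time-synchronous Viterbi search with beam pruning on a first-order HMM.

    log_emis:  list over frames of {state: log p(obs_t | state)}
    log_trans: {state: {next_state: log p(next | state)}}
    log_init:  {state: log p(state at t=0)}
    beam:      hypotheses scoring more than `beam` below the frame's best are dropped.
    Returns (best_path, best_log_score).
    """
    # Active hypotheses: state -> (score, path); all advance frame by frame together.
    active = {s: (lp + log_emis[0].get(s, -math.inf), [s]) for s, lp in log_init.items()}
    for t in range(1, len(log_emis)):
        # Beam pruning relative to the best hypothesis at the current frame
        best = max(score for score, _ in active.values())
        active = {s: h for s, h in active.items() if h[0] >= best - beam}
        new = {}
        for s, (score, path) in active.items():
            for nxt, lt in log_trans.get(s, {}).items():
                cand = score + lt + log_emis[t].get(nxt, -math.inf)
                # Recombination: keep only the best-scoring hypothesis per state
                if nxt not in new or cand > new[nxt][0]:
                    new[nxt] = (cand, path + [nxt])
        active = new
    score, path = max(active.values(), key=lambda h: h[0])
    return path, score
```

A narrower `beam` trades accuracy for speed; with `beam=0.0` only the best hypotheses at each frame survive, which makes the search greedy.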


Network Information
Related Topics (5)
Recurrent neural network
29.2K papers, 890K citations
76% related
Feature (machine learning)
33.9K papers, 798.7K citations
75% related
Feature vector
48.8K papers, 954.4K citations
74% related
Natural language
31.1K papers, 806.8K citations
73% related
Deep learning
79.8K papers, 2.1M citations
72% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95