scispace - formally typeset
Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
01 Dec 2018
TL;DR: The spectral flatness measure (SFM), computed on the magnitude linear prediction (LP) spectrum, is exploited to detect two broad manners of articulation, namely sonorants and obstruents, and the modified posteriors are given to the conventional decoding graph to minimize false substitutions and insertions.
Abstract: Variants of recurrent neural networks (RNN), such as long short-term memory (LSTM), are successful in sequence modelling tasks such as the automatic speech recognition (ASR) framework. However, the decoded sequence is prone to false substitutions, insertions and deletions. We exploit the spectral flatness measure (SFM), computed on the magnitude linear prediction (LP) spectrum, to detect two broad manners of articulation, namely sonorants and obstruents. In this paper, we modify the posteriors generated at the output layer of the LSTM according to the manner-of-articulation detection. The modified posteriors are given to the conventional decoding graph to minimize false substitutions and insertions. The proposed method decreased the phone error rate (PER) by nearly 0.7% and 0.3% when evaluated on the core TIMIT test corpus, as compared to the conventional decoding used with deep neural networks (DNN) and the state-of-the-art LSTM, respectively.
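The spectral flatness measure the abstract relies on is the ratio of the geometric to the arithmetic mean of a magnitude spectrum: it approaches 1 for flat, noise-like (obstruent-leaning) frames and falls toward 0 for harmonically peaky (sonorant-leaning) frames. A minimal sketch of that computation follows; the threshold, function names, and the direct frame-level decision are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

def spectral_flatness(magnitude_spectrum):
    """Spectral flatness measure: geometric mean / arithmetic mean.
    Near 1 for a flat (noise-like) spectrum, near 0 for a peaky one."""
    mag = np.asarray(magnitude_spectrum, dtype=float)
    mag = np.maximum(mag, 1e-12)              # floor to avoid log(0)
    geo_mean = np.exp(np.mean(np.log(mag)))
    arith_mean = np.mean(mag)
    return geo_mean / arith_mean

def classify_manner(magnitude_spectrum, threshold=0.5):
    """Crude manner-of-articulation decision (threshold is an assumption)."""
    sfm = spectral_flatness(magnitude_spectrum)
    return "obstruent" if sfm > threshold else "sonorant"
```

In the paper the SFM is computed on the LP spectrum of each frame; here any magnitude spectrum can be passed in.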

1 citation

Proceedings ArticleDOI
19 Jul 2020
TL;DR: The encoder-decoder architecture of U-Net is extended and it is shown it is capable of good performance in the acoustic modelling of a speech recognition system and the importance of the concatenation step is investigated.
Abstract: We train fully convolutional neural networks with no recurrent layers for the end-to-end phoneme recognition task, using the Connectionist Temporal Classification (CTC) loss function. The adopted network, U-Net, was introduced initially for semantic image segmentation tasks, and is often applied to segmenting features in medical imaging and remote sensing. The similarities between CTC-based automatic speech recognition and semantic segmentation problems are discussed. We extend the encoder-decoder architecture of U-Net and show it is capable of good performance in the acoustic modelling of a speech recognition system. We investigate the importance of the concatenation step in the design of U-Net, and report results using the core test set of the TIMIT corpus.
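The analogy the abstract draws between semantic segmentation and CTC-based ASR rests on CTC's collapsing rule: a frame-wise label sequence (one label per time step, like per-pixel labels) is reduced to a phoneme sequence by merging consecutive repeats and dropping blanks. A minimal sketch of that rule, where the blank symbol and function name are assumptions for illustration:

```python
BLANK = "_"  # assumed blank symbol for this sketch

def ctc_collapse(frame_labels):
    """Collapse a frame-wise CTC label sequence:
    merge consecutive repeats, then drop blank symbols."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return out
```

Note that a blank between two identical labels keeps them distinct, which is how CTC represents genuinely repeated phonemes.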

1 citation

Proceedings ArticleDOI
23 Mar 1992
TL;DR: Experimental results indicate that apart from a rather mild limitation of SM in handling a certain type of vocabulary, SM actually performs better than baselined continuous hidden Markov models (CHMM) in terms of recognition rate as far as isolated word recognition is concerned, and it takes only 60% of the time needed by CHMM in recognition.
Abstract: A static model (SM) in the form of a single vector is proposed to represent the temporal properties of a sequence of speech feature vectors. In contrast to a hidden Markov model, which captures the conditional probabilities of state transitions of consecutive observations x_t and x_{t+1} over time, an SM captures their average joint probabilities of belonging to a pair of phonetic classes ω_i and ω_j without any Markovian assumption. SM is tested with isolated words derived from the TIMIT database as well as artificially created words. The vocabulary is a subset of TIMIT consisting of 21 words derived from the two 'sa' sentences spoken by 420 speakers. The artificial vocabulary of 10 words is designed to study the limitations of SM. Experimental results indicate that, apart from a rather mild limitation of SM in handling a certain type of vocabulary, SM actually performs better than baseline continuous hidden Markov models (CHMM) in terms of recognition rate as far as isolated word recognition is concerned, and it takes only 60% of the time needed by CHMM in recognition.
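The static-model idea as described can be sketched directly: for each pair of consecutive frames, form the joint class-membership probabilities for every class pair (i, j), then average over time to get one fixed-length vector per utterance, with no Markov assumption. The function name and the use of outer products over frame-wise class posteriors are illustrative assumptions:

```python
import numpy as np

def static_model_vector(posteriors):
    """posteriors: (T, K) array of frame-wise phonetic-class probabilities.
    Returns a single K*K vector of average joint probabilities that
    consecutive frames (t, t+1) belong to class pair (i, j)."""
    P = np.asarray(posteriors, dtype=float)
    # Outer product per consecutive frame pair: shape (T-1, K, K).
    pairs = P[:-1, :, None] * P[1:, None, :]
    # Average over time, flatten into one static vector.
    return pairs.mean(axis=0).ravel()
```

A word is then represented by one such vector regardless of its duration, which is what makes the model "static".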

1 citation

Proceedings ArticleDOI
01 Dec 2016
TL;DR: In this paper, a modified Gaussian posteriorgram, based on a proposed Gaussian component selection algorithm, is used as the template representation for Query-by-Example Spoken Term Detection (QbE-STD), emphasizing the discrimination among queries.
Abstract: Query-by-Example Spoken Term Detection (QbE-STD) has been a hot research topic in the speech recognition field. Since template representation is the key component of QbE-STD, many researchers have been committed to developing effective template representations to obtain better performance. The Gaussian posteriorgram has been widely used because the GMM that generates it is convenient and easy to train. However, the corresponding performance is not that satisfactory. In this paper, we use a modified Gaussian posteriorgram based on the proposed Gaussian component selection algorithm as the template representation, which emphasizes the discrimination among queries. The selection algorithm is inspired by the TF-IDF concept well known in the information retrieval and text indexing fields. We carried out comparisons on the TIMIT corpus, and the results showed that, with our approach, the P@N was increased by 12% and the EER was reduced by 10%.
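The TF-IDF analogy in the abstract can be illustrated as follows: treat each query as a "document", measure how strongly each Gaussian component fires within it (a term-frequency analogue) and in how many queries it is active at all (a document-frequency analogue), then keep components that are strong in some query but rare across queries. All thresholds, names, and scoring details here are illustrative assumptions, not the paper's actual algorithm:

```python
import math

def select_components(posteriorgrams, active_thresh=0.1, top_k=16):
    """posteriorgrams: one (T, K) list of per-frame Gaussian posteriors
    per query. Scores components TF-IDF style and returns the indices of
    the top_k most query-discriminative ones."""
    n_queries = len(posteriorgrams)
    K = len(posteriorgrams[0][0])
    # TF analogue: average posterior of component k within query q.
    tf = [[sum(f[k] for f in frames) / len(frames) for k in range(K)]
          for frames in posteriorgrams]
    # DF analogue: number of queries in which component k is "active".
    df = [sum(1 for q in range(n_queries) if tf[q][k] > active_thresh)
          for k in range(K)]
    scores = []
    for k in range(K):
        idf = math.log((1 + n_queries) / (1 + df[k]))
        scores.append(max(tf[q][k] for q in range(n_queries)) * idf)
    ranked = sorted(range(K), key=lambda k: scores[k], reverse=True)
    return ranked[:top_k]
```

Components active in every query get an IDF of zero and are dropped first, which is the "emphasize discrimination among queries" effect.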

1 citation

Proceedings ArticleDOI
13 May 2002
TL;DR: A 3-state AF model with multiple observation distributions is introduced, giving a better modeling of the articulatory features within a phone; the modified model results in an improvement of about 1% in phone recognition on the TIMIT task.
Abstract: In this paper, we propose two improvements to the articulatory feature (AF) models. We introduce the use of a 3-state AF model with multiple observation distributions, which gives a better modeling of the articulatory features within a phone. This results in an improvement of about 1% in phone recognition on the TIMIT task. Combining the AF model with the acoustic-based HMM achieves an improvement of 1.6% compared to using acoustic features only. We then introduce the asynchronous state combination of the 3-state AF models with the acoustic-based HMM and obtain an additional improvement of 1.7%.
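A 3-state left-to-right model with a separate observation distribution per state is scored with the standard forward algorithm. A minimal log-domain sketch follows; the fixed 3-state topology comes from the abstract, while the function name and the assumption of starting in state 0 are illustrative choices:

```python
import numpy as np

def forward_loglik(log_trans, log_obs):
    """Forward algorithm for a 3-state left-to-right phone model.
    log_trans: (3, 3) log transition matrix.
    log_obs:   (T, 3) per-frame, per-state observation log-likelihoods
               (e.g. from the per-state AF distributions)."""
    T = log_obs.shape[0]
    alpha = np.full(3, -np.inf)
    alpha[0] = log_obs[0, 0]                  # entry is forced into state 0
    for t in range(1, T):
        alpha = log_obs[t] + np.array([
            np.logaddexp.reduce(alpha + log_trans[:, s]) for s in range(3)
        ])
    return np.logaddexp.reduce(alpha)
```

With one observation distribution per state instead of one per phone, each third of the phone can model a different articulatory configuration, which is the improvement the abstract describes.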

1 citation


Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations (76% related)
- Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
- Feature vector: 48.8K papers, 954.4K citations (74% related)
- Natural language: 31.1K papers, 806.8K citations (73% related)
- Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance Metrics
No. of papers in the topic in previous years:

Year | Papers
2023 | 24
2022 | 62
2021 | 67
2020 | 86
2019 | 77
2018 | 95