scispace - formally typeset
Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
01 Dec 2018
TL;DR: The spectral flatness measure (SFM), computed on the magnitude linear prediction (LP) spectrum, is exploited to detect two broad manners of articulation, namely sonorants and obstruents, and the modified posteriors are given to the conventional decoding graph to minimize false substitutions and insertions.
Abstract: Variants of recurrent neural networks (RNN), such as long short-term memory (LSTM), are successful in sequence modelling tasks such as the automatic speech recognition (ASR) framework. However, the decoded sequence is prone to false substitutions, insertions and deletions. We exploit the spectral flatness measure (SFM), computed on the magnitude linear prediction (LP) spectrum, to detect two broad manners of articulation, namely sonorants and obstruents. In this paper, we modify the posteriors generated at the output layer of the LSTM according to the manner-of-articulation detection. The modified posteriors are given to the conventional decoding graph to minimize false substitutions and insertions. The proposed method decreased the phone error rate (PER) by nearly 0.7% and 0.3% when evaluated on the core TIMIT test corpus, as compared to the conventional decoding used with deep neural networks (DNN) and the state-of-the-art LSTM, respectively.
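The spectral flatness measure the abstract relies on is the ratio of the geometric to the arithmetic mean of a magnitude spectrum: it approaches 1 for flat, noise-like (obstruent-leaning) frames and falls toward 0 for harmonically peaky (sonorant-leaning) frames. A minimal sketch of that computation follows; the threshold, function names, and the direct frame-level decision are illustrative assumptions, not the paper's actual pipeline:

```python
import numpy as np

def spectral_flatness(magnitude_spectrum):
    """Spectral flatness measure: geometric mean / arithmetic mean.
    Near 1 for a flat (noise-like) spectrum, near 0 for a peaky one."""
    mag = np.asarray(magnitude_spectrum, dtype=float)
    mag = np.maximum(mag, 1e-12)              # floor to avoid log(0)
    geo_mean = np.exp(np.mean(np.log(mag)))
    arith_mean = np.mean(mag)
    return geo_mean / arith_mean

def classify_manner(magnitude_spectrum, threshold=0.5):
    """Crude manner-of-articulation decision (threshold is an assumption)."""
    sfm = spectral_flatness(magnitude_spectrum)
    return "obstruent" if sfm > threshold else "sonorant"
```

In the paper the SFM is computed on the LP spectrum of each frame; here any magnitude spectrum can be passed in.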

1 citation

Proceedings ArticleDOI
19 Jul 2020
TL;DR: The encoder-decoder architecture of U-Net is extended and it is shown it is capable of good performance in the acoustic modelling of a speech recognition system and the importance of the concatenation step is investigated.
Abstract: We train fully convolutional neural networks with no recurrent layers for the end-to-end phoneme recognition task, using the Connectionist Temporal Classification (CTC) loss function. The adopted network, U-Net, was introduced initially for semantic image segmentation tasks, and is often applied to segmenting features in medical imaging and remote sensing. The similarities between CTC-based automatic speech recognition and semantic segmentation problems are discussed. We extend the encoder-decoder architecture of U-Net and show it is capable of good performance in the acoustic modelling of a speech recognition system. We investigate the importance of the concatenation step in the design of U-Net, and report results using the core test set of the TIMIT corpus.
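The analogy the abstract draws between semantic segmentation and CTC-based ASR rests on CTC's collapsing rule: a frame-wise label sequence (one label per time step, like per-pixel labels) is reduced to a phoneme sequence by merging consecutive repeats and dropping blanks. A minimal sketch of that rule, where the blank symbol and function name are assumptions for illustration:

```python
BLANK = "_"  # assumed blank symbol for this sketch

def ctc_collapse(frame_labels):
    """Collapse a frame-wise CTC label sequence:
    merge consecutive repeats, then drop blank symbols."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return out
```

Note that a blank between two identical labels keeps them distinct, which is how CTC represents genuinely repeated phonemes.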

1 citation

Proceedings ArticleDOI
23 Mar 1992
TL;DR: Experimental results indicate that apart from a rather mild limitation of SM in handling a certain type of vocabulary, SM actually performs better than baselined continuous hidden Markov models (CHMM) in terms of recognition rate as far as isolated word recognition is concerned, and it takes only 60% of the time needed by CHMM in recognition.
Abstract: A static model (SM) in the form of a single vector is proposed to represent the temporal properties of a sequence of speech feature vectors. In contrast to a hidden Markov model, which captures the conditional probabilities of state transitions of consecutive observations x_t and x_{t+1} over time, an SM captures their average joint probabilities of belonging to a pair of phonetic classes ω_i and ω_j without any Markovian assumption. SM is tested with isolated words derived from the TIMIT database as well as artificially created words. The vocabulary is a subset of TIMIT consisting of 21 words derived from the two 'sa' sentences spoken by 420 speakers. The artificial vocabulary of 10 words is designed to study the limitations of SM. Experimental results indicate that, apart from a rather mild limitation of SM in handling a certain type of vocabulary, SM actually performs better than baseline continuous hidden Markov models (CHMM) in terms of recognition rate as far as isolated word recognition is concerned, and it takes only 60% of the time needed by CHMM in recognition.
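The static-model idea as described can be sketched directly: for each pair of consecutive frames, form the joint class-membership probabilities for every class pair (i, j), then average over time to get one fixed-length vector per utterance, with no Markov assumption. The function name and the use of outer products over frame-wise class posteriors are illustrative assumptions:

```python
import numpy as np

def static_model_vector(posteriors):
    """posteriors: (T, K) array of frame-wise phonetic-class probabilities.
    Returns a single K*K vector of average joint probabilities that
    consecutive frames (t, t+1) belong to class pair (i, j)."""
    P = np.asarray(posteriors, dtype=float)
    # Outer product per consecutive frame pair: shape (T-1, K, K).
    pairs = P[:-1, :, None] * P[1:, None, :]
    # Average over time, flatten into one static vector.
    return pairs.mean(axis=0).ravel()
```

A word is then represented by one such vector regardless of its duration, which is what makes the model "static".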

1 citation

Proceedings ArticleDOI
01 Dec 2016
TL;DR: In this paper, a modified Gaussian posteriorgram, based on a proposed Gaussian component selection algorithm, is used as the template representation for Query-by-Example Spoken Term Detection (QbE-STD), emphasizing the discrimination among queries.
Abstract: Query-by-Example Spoken Term Detection (QbE-STD) has been a hot research topic in the speech recognition field. Since template representation is the key component of QbE-STD, many researchers have been committed to developing effective template representations to obtain better performance. The Gaussian posteriorgram has been widely used because the GMM that generates it is convenient and easy to train. However, the corresponding performance is not that satisfactory. In this paper, we use a modified Gaussian posteriorgram based on the proposed Gaussian component selection algorithm as the template representation, which emphasizes the discrimination among queries. The selection algorithm is inspired by the TF-IDF concept well known in the information retrieval and text indexing fields. We carried out comparisons on the TIMIT corpus, and the results showed that, with our approach, the P@N was increased by 12% and the EER was reduced by 10%.
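The TF-IDF analogy in the abstract can be illustrated as follows: treat each query as a "document", measure how strongly each Gaussian component fires within it (a term-frequency analogue) and in how many queries it is active at all (a document-frequency analogue), then keep components that are strong in some query but rare across queries. All thresholds, names, and scoring details here are illustrative assumptions, not the paper's actual algorithm:

```python
import math

def select_components(posteriorgrams, active_thresh=0.1, top_k=16):
    """posteriorgrams: one (T, K) list of per-frame Gaussian posteriors
    per query. Scores components TF-IDF style and returns the indices of
    the top_k most query-discriminative ones."""
    n_queries = len(posteriorgrams)
    K = len(posteriorgrams[0][0])
    # TF analogue: average posterior of component k within query q.
    tf = [[sum(f[k] for f in frames) / len(frames) for k in range(K)]
          for frames in posteriorgrams]
    # DF analogue: number of queries in which component k is "active".
    df = [sum(1 for q in range(n_queries) if tf[q][k] > active_thresh)
          for k in range(K)]
    scores = []
    for k in range(K):
        idf = math.log((1 + n_queries) / (1 + df[k]))
        scores.append(max(tf[q][k] for q in range(n_queries)) * idf)
    ranked = sorted(range(K), key=lambda k: scores[k], reverse=True)
    return ranked[:top_k]
```

Components active in every query get an IDF of zero and are dropped first, which is the "emphasize discrimination among queries" effect.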

1 citation

Proceedings ArticleDOI
13 May 2002
TL;DR: A 3-state AF model with multiple observation distributions is introduced, giving a better modeling of the articulatory features within a phone; the modified model results in an improvement of about 1% in phone recognition on the TIMIT task.
Abstract: In this paper, we propose two improvements to the articulatory feature (AF) models. We introduce the use of a 3-state AF model with multiple observation distributions, which gives a better modeling of the articulatory features within a phone. This results in an improvement of about 1% in phone recognition on the TIMIT task. Combining the AF model with the acoustic-based HMM achieves an improvement of 1.6% compared to using acoustic features only. We then introduce the asynchronous state combination of the 3-state AF models with the acoustic-based HMM and obtain an additional improvement of 1.7%.
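A 3-state left-to-right model with a separate observation distribution per state is scored with the standard forward algorithm. A minimal log-domain sketch follows; the fixed 3-state topology comes from the abstract, while the function name and the assumption of starting in state 0 are illustrative choices:

```python
import numpy as np

def forward_loglik(log_trans, log_obs):
    """Forward algorithm for a 3-state left-to-right phone model.
    log_trans: (3, 3) log transition matrix.
    log_obs:   (T, 3) per-frame, per-state observation log-likelihoods
               (e.g. from the per-state AF distributions)."""
    T = log_obs.shape[0]
    alpha = np.full(3, -np.inf)
    alpha[0] = log_obs[0, 0]                  # entry is forced into state 0
    for t in range(1, T):
        alpha = log_obs[t] + np.array([
            np.logaddexp.reduce(alpha + log_trans[:, s]) for s in range(3)
        ])
    return np.logaddexp.reduce(alpha)
```

With one observation distribution per state instead of one per phone, each third of the phone can model a different articulatory configuration, which is the improvement the abstract describes.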

1 citation


Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations (76% related)
- Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
- Feature vector: 48.8K papers, 954.4K citations (74% related)
- Natural language: 31.1K papers, 806.8K citations (73% related)
- Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance Metrics
No. of papers in the topic in previous years:

Year | Papers
2023 | 24
2022 | 62
2021 | 67
2020 | 86
2019 | 77
2018 | 95