Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1,401 publications have been published within this topic, receiving 59,888 citations. The topic is also known as the TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Journal ArticleDOI
TL;DR: It is shown that the addition of a hidden dynamic state leads to increases in accuracy over otherwise equivalent static models, and a time-asynchronous decoding strategy suited to recognition with segment models is proposed.
Abstract: The majority of automatic speech recognition systems rely on hidden Markov models, in which Gaussian mixtures model the output distributions associated with sub-phone states. This approach, whilst successful, models consecutive feature vectors (augmented to include derivative information) as statistically independent. Furthermore, spatial correlations present in speech parameters are frequently ignored through the use of diagonal covariance matrices. This paper continues the work of Digalakis and others, who instead proposed a first-order linear state-space model which has the capacity to model underlying dynamics and furthermore gives a model of spatial correlations. This paper examines the assumptions made in applying such a model and shows that the addition of a hidden dynamic state leads to increases in accuracy over otherwise equivalent static models. We also propose a time-asynchronous decoding strategy suited to recognition with segment models. We describe an implementation of decoding for linear dynamic models and present TIMIT phone recognition results.

34 citations
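
To make the modelling idea concrete, here is a minimal Python/NumPy sketch of a first-order linear dynamic (state-space) model of the kind the paper builds on. All dimensions, matrices, and noise levels are illustrative assumptions, not the paper's actual parameters.

import numpy as np

# Sketch: a hidden state x_t evolves linearly over time and emits an
# observed feature vector y_t. Everything here is a stand-in for the
# paper's trained parameters.
rng = np.random.default_rng(0)

state_dim, obs_dim, T = 4, 13, 50              # e.g. 13 MFCC-like features
A = 0.9 * np.eye(state_dim)                    # state transition matrix
C = rng.standard_normal((obs_dim, state_dim))  # observation matrix
Q = 0.1 * np.eye(state_dim)                    # process noise covariance
R = 0.5 * np.eye(obs_dim)                      # observation noise covariance

x = np.zeros(state_dim)
observations = []
for t in range(T):
    x = A @ x + rng.multivariate_normal(np.zeros(state_dim), Q)  # dynamics
    y = C @ x + rng.multivariate_normal(np.zeros(obs_dim), R)    # emission
    observations.append(y)

The hidden state carries information across frames, which is exactly the temporal dependence that a static GMM output model (with its independence assumption) discards.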

Proceedings ArticleDOI
04 May 2014
TL;DR: This work investigates techniques for boosting the accuracy of automatic phonetic segmentation based on HMM acoustic-phonetic models and finds that combining multiple acoustic front-ends gives additional gains in accuracy, and that conditioning the combiner on phonetic context and side information helps.
Abstract: Accurate phone-level segmentation of speech remains an important task for many subfields of speech research. We investigate techniques for boosting the accuracy of automatic phonetic segmentation based on HMM acoustic-phonetic models. In prior work [25] we were able to improve on state-of-the-art alignment accuracy by employing special phone-boundary HMMs, trained on phonetically segmented training data, in conjunction with a simple boundary-time correction model. Here we present further improved results by using more powerful statistical models for boundary correction that are conditioned on phonetic context and duration features. Furthermore, we find that combining multiple acoustic front-ends gives additional gains in accuracy, and that conditioning the combiner on phonetic context and side information helps. Overall, we reduce segmentation errors on the TIMIT corpus by almost one half, from 93.9% to 96.8% boundary accuracy with a 20-ms tolerance.

34 citations
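
As a concrete illustration of the metric quoted above, the following Python sketch computes boundary accuracy under a 20-ms tolerance. The one-to-one pairing of reference and hypothesis boundaries is a simplifying assumption; real alignment evaluation matches boundaries per phone.

# Fraction of hypothesis boundaries within `tol` seconds of the reference.
def boundary_accuracy(ref_boundaries, hyp_boundaries, tol=0.020):
    hits = sum(1 for r, h in zip(ref_boundaries, hyp_boundaries)
               if abs(r - h) <= tol)
    return hits / len(ref_boundaries)

ref = [0.10, 0.25, 0.42, 0.60]   # reference boundary times (seconds)
hyp = [0.11, 0.28, 0.43, 0.59]   # hypothesized boundary times
print(boundary_accuracy(ref, hyp))  # 0.75: the 0.25 vs 0.28 pair misses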

Journal ArticleDOI
Huy Phan, Lars Hertel, Marco Maass, Radoslaw Mazur, Alfred Mertins
TL;DR: This work considers speech patterns as basic acoustic concepts, which embody and represent the target nonspeech signal, and proposes an algorithm to select a sufficient subset, which provides an approximate representation capability of the entire set of available speech patterns.
Abstract: The human auditory system is very well matched to both human speech and environmental sounds. The question therefore arises whether human speech material may provide useful information for training systems to analyze nonspeech audio signals, e.g., in a classification task. To answer this question, we consider speech patterns as basic acoustic concepts which embody and represent the target nonspeech signal. To find out how similar the nonspeech signal is to speech, we classify it with a classifier trained on the speech patterns and use the classification posteriors to represent its closeness to the speech bases. The speech similarities are finally employed as a descriptor to represent the target signal. We further show that a better descriptor can be obtained by learning to organize the speech categories hierarchically with a tree structure. Moreover, these descriptors are generic: once the speech classifier has been learned, it can be employed as a feature extractor for different datasets without retraining. Lastly, we propose an algorithm to select a subset of speech patterns that approximately preserves the representation capability of the entire set. We conduct experiments on audio event analysis. Phone triplets from the TIMIT dataset were used as speech patterns to learn descriptors for audio events of three datasets of different complexity: UPC-TALP, Freiburg-106, and NAR. The experimental results on the event classification task show that good performance can be obtained even with a simple linear classifier. Furthermore, fusing the learned descriptors as an additional source leads to state-of-the-art performance on all three target datasets.

33 citations
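
A minimal Python sketch of the central idea, assuming scikit-learn and random stand-in features: a classifier trained on speech patterns is reused as a feature extractor, and its class posteriors become the descriptor of a nonspeech sound. The classifier choice and all data below are illustrative, not the paper's setup.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_speech, n_classes, dim = 300, 10, 20
X_speech = rng.standard_normal((n_speech, dim))  # stand-in phone-triplet features
y_speech = rng.integers(0, n_classes, n_speech)  # speech-pattern labels

speech_clf = LogisticRegression(max_iter=1000).fit(X_speech, y_speech)

X_event = rng.standard_normal((5, dim))          # stand-in nonspeech audio events
descriptors = speech_clf.predict_proba(X_event)  # posterior per speech class
print(descriptors.shape)                         # (5, n_classes)

Each row is a posterior vector describing the event's closeness to each speech-pattern class; per the abstract, such descriptors can then be fed to a simple linear classifier for event classification.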

Proceedings ArticleDOI
25 Aug 2013
TL;DR: This paper applies graph-based learning to variable-length segments rather than to the fixed-length vector representations that have been used previously, and finds that the best learning algorithms are those that can incorporate prior knowledge.
Abstract: This paper presents several novel contributions to the emerging framework of graph-based semi-supervised learning for speech processing. First, we apply graph-based learning to variable-length segments rather than to the fixed-length vector representations that have been used previously. As part of this work we compare various graph-based learners, and we utilize an efficient feature selection technique for high-dimensional feature spaces that alleviates computational costs and improves the performance of graph-based learners. Finally, we present a method to improve regularization during the learning process. Experimental evaluation on the TIMIT frame and segment classification tasks demonstrates that the graph-based classifiers outperform standard baseline classifiers; furthermore, we find that the best learning algorithms are those that can incorporate prior knowledge.

32 citations
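
As a rough illustration of graph-based semi-supervised learning, the Python sketch below uses scikit-learn's LabelSpreading over a k-NN similarity graph: a few labelled points propagate their labels to many unlabelled ones. This is one standard graph-based learner, not necessarily the paper's, and the data are random stand-ins for frame or segment features.

import numpy as np
from sklearn.semi_supervised import LabelSpreading

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 12))   # stand-in frame/segment feature vectors
y = rng.integers(0, 3, 200)          # true phone-class labels (3 classes)
y_train = y.copy()
y_train[20:] = -1                    # -1 marks unlabelled points

model = LabelSpreading(kernel='knn', n_neighbors=7)  # k-NN similarity graph
model.fit(X, y_train)                # labels spread over the graph
# Fraction of unlabelled points recovered (near chance here, since the
# features are random noise).
print((model.transduction_[20:] == y[20:]).mean())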

Journal ArticleDOI
Sam Kwong, Qianhua He, Kim F. Man, Ke Tang, C. W. Chau
TL;DR: A Parallel Genetic Time Warping (PGTW) algorithm is proposed to address the normalization-factor and K-best-path problems of DTW; it performed better than Tree-Trellis Search (TTS) and matched Sequential Genetic Time Warping (SGTW) while saving about 30% CPU time on a single-processor system.
Abstract: Dynamic Time Warping (DTW) is a technique widely used for nonlinear time normalization of different utterances in many speech recognition systems. Two major problems are usually encountered when DTW is applied to recognizing speech utterances: (i) the normalization factors used in a warping path; and (ii) finding the K-best warping paths. Although DTW can be modified to compute multiple warping paths by using the Tree-Trellis Search (TTS) algorithm, the choice of the actual normalization factor still remains a major problem for DTW. In this paper, a Parallel Genetic Time Warping (PGTW) algorithm is proposed to solve these problems. A database of 95 isolated words extracted from the TIMIT speech database was set up to evaluate the performance of PGTW. In the database, each of the first 15 words had 70 different utterances, and the remaining 80 words had only one utterance each. For each of the 15 words, one utterance was arbitrarily selected as the test template for recognition. Distance measures from each test template to the utterances of the same word and to those of the remaining 80 words were calculated with three different time-warping algorithms: TTS, PGTW, and Sequential Genetic Time Warping (SGTW). A normal distribution model based on Rabiner [23] was used to evaluate the performance of the three algorithms analytically. The results showed that PGTW performed better than TTS and produced results very similar to those of SGTW while saving about 30% of the CPU time on a single-processor system.

32 citations
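
For reference, here is the textbook DTW recurrence that the paper's genetic variants build on, as a self-contained Python sketch. The division by (n + m) at the end is one common normalization choice; the abstract notes that choosing the proper normalization factor, which really depends on the warping path, is one of DTW's open problems.

import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming time warping between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])       # local distance
            D[i, j] = cost + min(D[i - 1, j],     # insertion
                                 D[i, j - 1],     # deletion
                                 D[i - 1, j - 1]) # match
    return D[n, m] / (n + m)  # one common, fixed normalization factor

print(dtw_distance([1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 4.0]))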


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95