Topic
TIMIT
About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.
Papers
14 Oct 2008
TL;DR: Two methods to add CM into the SVM outputs using trainable intelligent systems are described, and the results show that the second method, a linear combination of Platt sigmoid functions using a multi-layer perceptron, performs better than the first.
Abstract: In this paper, a trainable confidence measuring system is proposed and tested on speech recognition systems based on SVM classifiers. Classically, most speech recognition methods have been built on probability models and statistical density estimation of each language unit, and the confidence measure (CM) is extracted implicitly as a byproduct of the classification process. Although support vector machines have shown their potential in optimizing the recognition rate, an appropriate CM has not been proposed for this purpose. This paper describes two methods to add CM into the SVM outputs using trainable intelligent systems. The first method is the simulation of the Platt method using a neural network, and the second method is a linear combination of Platt sigmoid functions using a multi-layer perceptron. The experiments were conducted on the dialects of the TIMIT corpus. The results show that the second method performs better than the first one. For example, after rejecting 20% of classifications by CM, the achieved error rates for the "/b/,/d/", "/b/,/g/" and "/d/,/g/" phoneme pairs are 6%, 3.5% and 2% respectively, while the error rate is much higher without employing neural networks. However, as the number of phonemes increases, the performance of the second method matches that of the first method.
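As a rough illustration of the idea behind this abstract, the sketch below applies the standard Platt sigmoid to raw SVM decision values and rejects the least confident fraction of decisions. It is not the paper's neural-network method: the parameter values `A` and `B`, the toy scores, and the helper names are all hypothetical, and in practice the sigmoid parameters would be fitted on held-out data.

```python
import math

def platt_probability(score, a, b):
    """Platt sigmoid: map a raw SVM decision value to P(class = +1)."""
    return 1.0 / (1.0 + math.exp(a * score + b))

def confidence(score, a, b):
    """Confidence in the predicted class: the larger of p and 1 - p."""
    p = platt_probability(score, a, b)
    return max(p, 1.0 - p)

def reject_least_confident(scores, a, b, reject_fraction):
    """Return indices of the decisions kept after dropping the least confident ones."""
    n_reject = int(len(scores) * reject_fraction)
    ranked = sorted(range(len(scores)), key=lambda i: confidence(scores[i], a, b))
    return sorted(ranked[n_reject:])

# Hypothetical values: A and B would normally be fitted, not chosen by hand.
A, B = -2.0, 0.0
svm_scores = [2.1, -0.1, 0.05, -1.7, 0.9]  # toy decision values
kept = reject_least_confident(svm_scores, A, B, reject_fraction=0.2)
```

With these toy scores, the decision closest to the margin (index 2) is the one rejected; the error rate would then be computed only over the kept decisions.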
01 Nov 2016
TL;DR: A blind method for phone segmentation without using prior knowledge of speech content is proposed and a two-step algorithm for detecting phone boundaries is derived that is effective for long speech.
Abstract: Phone segmentation divides a continuous speech signal into discrete, non-overlapping phone units. In this paper, a blind method for phone segmentation without using prior knowledge of speech content is proposed. A two-step algorithm for detecting phone boundaries is derived. The first step selects peaks of the Euclidean curve as phone boundary candidates. The second step verifies these candidates using a Gaussian function. The Gaussian function is computed locally, so it suits the speech features in each local region of the utterance. Experiments show that our method works well for both short and long speech. Experiment 1 is conducted on a short speech corpus, TIMIT. Our results are comparable to or more accurate than those of previous methods. Experiment 2 is conducted on a long speech corpus, TCC300. Our results are more accurate than those of the previous method. The relative improvement of F-value is 1.05%. This method is effective for long speech.
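The first step described above can be sketched as follows, under the assumption that the "Euclidean curve" is the Euclidean distance between consecutive feature frames. The function names and the toy two-dimensional frames are hypothetical. The Gaussian verification step is not sketched, because the abstract does not specify it.

```python
import math

def euclidean_curve(frames):
    """Euclidean distance between each pair of consecutive feature frames."""
    return [math.dist(a, b) for a, b in zip(frames, frames[1:])]

def local_peaks(curve):
    """Indices where the curve is strictly higher than both neighbours."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i - 1] < curve[i] > curve[i + 1]]

# Toy 2-D feature frames forming three stable regions (stand-ins for real
# acoustic features such as MFCC vectors).
frames = [
    (0.0, 0.0), (0.1, 0.0), (0.1, 0.1),
    (2.0, 2.0), (2.1, 2.0), (2.0, 2.1),
    (5.0, 5.0), (5.1, 5.0),
]
curve = euclidean_curve(frames)
candidates = local_peaks(curve)  # curve index i lies between frame i and i + 1
```

Here the two candidates fall at the jumps between the three regions; a verification step would then decide which candidates to keep as boundaries.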
27 Apr 2010
TL;DR: A clustering method is investigated, based on phone confusion matrix, for the data-driven generation of phonetic broad classes (PBC) of the Portuguese language based on a statistical similarity measurement rather than acoustical/phonetic knowledge.
Abstract: Phone recognition experiments give information about the confusions between phones. Grouping the most confusable phones and making a multilevel hierarchical classification should improve phone recognition. In this paper a clustering method is investigated, based on phone confusion matrix, for the data-driven generation of phonetic broad classes (PBC) of the Portuguese language. The method is based on a statistical similarity measurement rather than acoustical/phonetic knowledge. Results are presented for two phone recognisers (TIMIT corpus and Portuguese TECNOVOZ database).
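One way to picture the data-driven grouping described here is a greedy agglomerative clustering driven by confusion rates. The sketch below is an assumption-laden illustration, not the paper's method: the similarity measure (mean of the two row-normalised confusion rates), the average-linkage merging rule, the function names, and the toy confusion counts are all hypothetical.

```python
def confusion_similarity(counts):
    """Symmetric similarity: mean of the two row-normalised confusion rates."""
    n = len(counts)
    rates = [[c / sum(row) for c in row] for row in counts]
    return [[(rates[i][j] + rates[j][i]) / 2 for j in range(n)] for i in range(n)]

def cluster_phones(phones, counts, n_classes):
    """Greedy average-linkage merging until n_classes clusters remain."""
    sim = confusion_similarity(counts)
    clusters = [[i] for i in range(len(phones))]
    while len(clusters) > n_classes:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
                avg = sum(sim[i][j] for i, j in pairs) / len(pairs)
                if best is None or avg > best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # b > a, so index a stays valid
    return [sorted(phones[i] for i in c) for c in clusters]

# Toy confusion counts (rows: reference phone, columns: recognised phone).
phones = ["b", "d", "m", "n"]
counts = [
    [80, 15, 3, 2],
    [12, 82, 2, 4],
    [2, 3, 75, 20],
    [1, 4, 18, 77],
]
classes = cluster_phones(phones, counts, n_classes=2)
```

On this toy matrix the most confusable pairs are merged first, which yields a plosive-like class and a nasal-like class without using any phonetic knowledge.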
20 Jun 2023
TL;DR: A method is described for training an embedding-matching word-level connectionist temporal classification (CTC) automatic speech recognizer (ASR) such that it directly produces word start times and durations, required by many real-world applications, in addition to the transcription.
Abstract: In this work, we describe a novel method of training an embedding-matching word-level connectionist temporal classification (CTC) automatic speech recognizer (ASR) such that it directly produces word start times and durations, required by many real-world applications, in addition to the transcription. The word timestamps enable the ASR to output word segmentations and word confusion networks without relying on a secondary model or forced alignment process when testing. Our proposed system has word segmentation accuracy similar to that of a hybrid DNN-HMM (Deep Neural Network-Hidden Markov Model) system, with less than 3 ms difference in mean absolute error in word start times on TIMIT data. At the same time, we observed less than 5% relative increase in the word error rate compared to the non-timestamped system when using the same audio training data and nearly identical model size. We also contribute a more rigorous analysis of multiple-hypothesis embedding-matching ASR in general.
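The two metrics quoted in this abstract, mean absolute error in word start times and relative increase in word error rate, can be computed as in the short sketch below. The function names and every number are made up for illustration and are not taken from the paper.

```python
def mean_absolute_error(predicted, reference):
    """Average absolute difference between paired values."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)

def relative_increase(new, baseline):
    """Relative change of `new` with respect to `baseline`, as a fraction."""
    return (new - baseline) / baseline

# Hypothetical word start times in milliseconds.
predicted_starts_ms = [0, 310, 740]
reference_starts_ms = [10, 300, 720]
mae_ms = mean_absolute_error(predicted_starts_ms, reference_starts_ms)

# Hypothetical word error rates in percent.
wer_increase = relative_increase(new=10.4, baseline=10.0)
```

Note that the abstract's "less than 3 ms" refers to the difference between two systems' mean absolute errors, so the first function would be evaluated once per system and the results subtracted.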
01 Jan 1995
TL;DR: A general formalism for training neural predictive systems, and an approach for performing discrimination in predictive systems at the sequence level, which makes use of N-Best sequence selection.
Abstract: We describe a general formalism for training neural predictive systems. We then introduce discrimination at the frame level and show how it relates to maximum mutual information training. Last, we propose an approach for performing discrimination in predictive systems at the sequence level, which makes use of N-Best sequence selection. Performance for acoustic-phonetic decoding reaches 77.4% phone accuracy on the 1988 version of TIMIT.