Topic

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
14 Oct 2008
TL;DR: Two methods for adding a confidence measure (CM) to SVM outputs using trainable intelligent systems are described; the second, a linear combination of Platt sigmoid functions realised by a multi-layer perceptron, outperforms the first.
Abstract: In this paper, a trainable confidence-measuring system is proposed and tested on speech recognition systems based on SVM classifiers. Classically, most speech recognition methods have been built on probability models and statistical density estimation for each language unit, so the confidence measure (CM) emerges implicitly as a byproduct of classification. Although support vector machines have shown their potential for optimizing the recognition rate, no appropriate CM has been proposed for them. This paper describes two methods for adding a CM to the SVM outputs using trainable intelligent systems. The first simulates Platt's method with a neural network; the second is a linear combination of Platt sigmoid functions using a multi-layer perceptron. Both methods were evaluated on the dialects of the TIMIT corpus. The results show that the second method performs better than the first: for example, after rejecting 20% of classifications by CM, the error rates for the "/b/,/d/", "/b/,/g/" and "/d/,/g/" phoneme pairs are 6%, 3.5% and 2% respectively, while the error rate is much higher without the neural networks. As the number of phonemes increases, however, the performance of the second method converges to that of the first.
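Both methods build on Platt's calibration, which maps an SVM decision value to a class posterior via a fitted sigmoid. A minimal NumPy sketch of that sigmoid fit and the resulting confidence measure (the gradient-descent fit, synthetic scores, and function names are illustrative assumptions, not the paper's neural-network implementation):

```python
import numpy as np

def fit_platt(scores, labels, n_iter=3000, lr=0.05):
    """Fit Platt's sigmoid P(y=1|f) = 1 / (1 + exp(A*f + B)) to SVM
    decision values f by minimising cross-entropy with gradient descent."""
    A, B = 0.0, 0.0
    y = labels.astype(float)  # 1 for the positive class, 0 otherwise
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(A * scores + B))
        # dLoss/dA = mean((y - p) * f), dLoss/dB = mean(y - p)
        A -= lr * np.mean((y - p) * scores)
        B -= lr * np.mean(y - p)
    return A, B

def confidence(score, A, B):
    """Confidence measure: posterior probability of the winning class."""
    p = 1.0 / (1.0 + np.exp(A * score + B))
    return max(p, 1.0 - p)

# Synthetic two-class decision values standing in for SVM outputs.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.normal(1.0, 0.5, 200),    # positive class
                         rng.normal(-1.0, 0.5, 200)])  # negative class
labels = np.concatenate([np.ones(200), np.zeros(200)])
A, B = fit_platt(scores, labels)
```

Rejecting classifications whose `confidence` falls below a threshold reproduces the rejection scheme the abstract evaluates.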
Proceedings ArticleDOI
01 Nov 2016
TL;DR: A blind method for phone segmentation that uses no prior knowledge of the speech content is proposed, and a two-step algorithm for detecting phone boundaries is derived that is effective for both short and long speech.
Abstract: Phone segmentation divides a continuous speech signal into discrete, non-overlapping phone units. In this paper, a blind method for phone segmentation that uses no prior knowledge of the speech content is proposed. A two-step algorithm for detecting phone boundaries is derived. The first step selects peaks of the Euclidean distance curve as phone-boundary candidates. The second step verifies these candidates using a Gaussian function computed locally, so it adapts to the speech features in each local region of the utterance. Experiments show that the method works well for both short and long speech. Experiment 1 is conducted on a short-speech corpus, TIMIT; the results are comparable to or more accurate than those of previous methods. Experiment 2 is conducted on a long-speech corpus, TCC300; the results are more accurate than those of the previous method, with a relative F-value improvement of 1.05%. The method is therefore effective for long speech.
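The first step above can be sketched in a few lines, assuming the "Euclidean distance curve" is the frame-to-frame Euclidean distance of acoustic features (e.g. MFCCs); the peak-picking details and the `min_gap` spacing rule are illustrative assumptions, and the local-Gaussian verification step is omitted:

```python
import numpy as np

def boundary_candidates(feats, min_gap=3):
    """First step of the two-step scheme: peaks of the frame-to-frame
    Euclidean distance curve become phone-boundary candidates.
    feats: (T, D) array of per-frame acoustic features."""
    # d[t] = distance between feature frames t and t+1, so len(d) == T-1
    d = np.linalg.norm(np.diff(feats, axis=0), axis=1)
    peaks = [t for t in range(1, len(d) - 1)
             if d[t] > d[t - 1] and d[t] >= d[t + 1] and d[t] > d.mean()]
    # enforce a minimum spacing between candidate boundaries
    out = []
    for t in peaks:
        if not out or t - out[-1] >= min_gap:
            out.append(t + 1)  # boundary falls between frames t and t+1
    return out
```

On a feature sequence with an abrupt spectral change, the candidate list contains the frame index of the change point.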
Book ChapterDOI
27 Apr 2010
TL;DR: A clustering method based on the phone confusion matrix is investigated for the data-driven generation of phonetic broad classes (PBC) of the Portuguese language, using a statistical similarity measure rather than acoustic/phonetic knowledge.
Abstract: Phone recognition experiments give information about the confusions between phones. Grouping the most confusable phones and performing a multilevel hierarchical classification should improve phone recognition. In this paper a clustering method based on the phone confusion matrix is investigated for the data-driven generation of phonetic broad classes (PBC) of the Portuguese language. The method relies on a statistical similarity measure rather than acoustic/phonetic knowledge. Results are presented for two phone recognisers (the TIMIT corpus and the Portuguese TECNOVOZ database).
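The idea of growing broad classes from a confusion matrix can be sketched as a greedy agglomerative merge of the most-confused pair. This is a simple stand-in under assumed details (symmetrised counts as the similarity, greedy pairwise merging), not the paper's exact clustering procedure:

```python
import numpy as np

def broad_classes(conf, phones, n_classes):
    """Greedily merge phones into n_classes broad classes, always joining
    the pair of clusters with the highest total confusion.
    conf: square matrix, conf[i, j] = times phone i was recognised as j."""
    # symmetric similarity: confusions in either direction, self-hits ignored
    sim = conf + conf.T
    np.fill_diagonal(sim, 0)
    clusters = [{i} for i in range(len(phones))]
    while len(clusters) > n_classes:
        best, pair = -1, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = sum(sim[i, j] for i in clusters[a] for j in clusters[b])
                if s > best:
                    best, pair = s, (a, b)
        a, b = pair
        clusters[a] |= clusters.pop(b)  # merge the most-confused pair
    return [sorted(phones[i] for i in c) for c in clusters]
```

Repeating the merge at several cluster counts yields the multilevel hierarchy the abstract mentions.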
Posted ContentDOI
20 Jun 2023
TL;DR: In this paper, a method is proposed for training an embedding-matching word-level connectionist temporal classification (CTC) automatic speech recognizer (ASR) such that it directly produces the word start times and durations required by many real-world applications, in addition to the transcription.
Abstract: In this work, we describe a novel method of training an embedding-matching word-level connectionist temporal classification (CTC) automatic speech recognizer (ASR) such that it directly produces word start times and durations, required by many real-world applications, in addition to the transcription. The word timestamps enable the ASR to output word segmentations and word confusion networks without relying on a secondary model or forced-alignment process at test time. Our proposed system achieves word segmentation accuracy similar to that of a hybrid DNN-HMM (Deep Neural Network-Hidden Markov Model) system, with less than 3 ms difference in mean absolute error in word start times on TIMIT data. At the same time, we observed a less than 5% relative increase in the word error rate compared to the non-timestamped system when using the same audio training data and a nearly identical model size. We also contribute a more rigorous analysis of multiple-hypothesis embedding-matching ASR in general.
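The bookkeeping that turns frame-level CTC output into word timestamps can be sketched generically: read the per-frame argmax labels, treat each run of identical non-blank frames as a word span, and convert frame indices to seconds via the frame shift. This is standard CTC post-processing under assumed conventions (blank id 0, 10 ms frame shift), not the paper's embedding-matching model:

```python
BLANK = 0  # assumed id of the CTC blank token

def ctc_word_times(frame_ids, vocab, frame_shift_s=0.01):
    """frame_ids: per-frame argmax token ids from a word-level CTC model.
    Returns (word, start_seconds, duration_seconds) triples, taking each
    word's span as its run of identical non-blank frames."""
    words, prev = [], BLANK
    for t, k in enumerate(frame_ids):
        if k != BLANK and k != prev:
            words.append([vocab[k], t, t])   # new word starts at frame t
        elif k != BLANK and k == prev:
            words[-1][2] = t                 # extend the current word's span
        prev = k
    return [(w, s * frame_shift_s, (e - s + 1) * frame_shift_s)
            for w, s, e in words]
```

For example, `ctc_word_times([0, 1, 1, 0, 0, 2, 2, 2, 0], {1: "hello", 2: "world"})` yields "hello" starting at 0.01 s and "world" at 0.05 s.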
01 Jan 1995
TL;DR: A general formalism for training neural predictive systems is described, along with an approach for performing discrimination in predictive systems at the sequence level that makes use of N-best sequence selection.
Abstract: We describe a general formalism for training neural predictive systems. We then introduce discrimination at the frame level and show how it relates to maximum mutual information training. Last, we propose an approach for performing discrimination in predictive systems at the sequence level; it makes use of N-best sequence selection. Performance for acoustic-phonetic decoding reaches 77.4% phone accuracy on the 1988 version of TIMIT.

Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (76% related)
Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
Feature vector: 48.8K papers, 954.4K citations (74% related)
Natural language: 31.1K papers, 806.8K citations (73% related)
Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95