Topic

TIMIT

About: TIMIT is a research topic. Over its lifetime, 1401 publications have been published within this topic, receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.

Papers

Proceedings Article

Automatic confidence measure extraction for SVM outputs using neural network

S. Amini¹, Farbod Razzazi¹, K. Nayebi

Islamic Azad University¹

14 Oct 2008

TL;DR: Two methods to add a confidence measure (CM) to SVM outputs using trainable intelligent systems are described. The results show that the second method, a linear combination of Platt sigmoid functions using a multi-layer perceptron, performs better than the first, a simulation of the Platt method using a neural network.

Abstract: In this paper, a trainable confidence measuring system is proposed and tested on speech recognition systems based on SVM classifiers. Classically, most speech recognition methods have been built on probability models and statistical density estimation of each language unit, and the confidence measure (CM) is extracted implicitly as a byproduct of the classification process. Although support vector machines have shown their potential in optimizing the recognition rate, an appropriate CM has not been proposed for them. This paper describes two methods to add a CM to the SVM outputs using trainable intelligent systems. The first method is a simulation of the Platt method using a neural network, and the second method is a linear combination of Platt sigmoid functions using a multi-layer perceptron. The experiments were conducted on the dialects of the TIMIT corpus. The results show that the second method performs better than the first. For example, after rejecting 20% of classifications by CM, the achieved error rates for the "/b/,/d/", "/b/,/g/" and "/d/,/g/" phoneme pairs are 6%, 3.5% and 2% respectively, while the error rate is much higher without employing neural networks. However, as the number of phonemes increases, the performance of the second method matches that of the first.
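The idea of rejecting low-confidence classifications can be illustrated with a minimal sketch. It assumes the standard Platt sigmoid, P(y = +1 | f) = 1 / (1 + exp(A·f + B)), with the parameters `a` and `b` supplied by hand rather than fitted, and it does not reproduce the neural-network variants from the paper. The function names and sample scores are hypothetical.

```python
import math


def platt_probability(score, a, b):
    """Platt's sigmoid: map a raw SVM decision value to P(class = +1)."""
    return 1.0 / (1.0 + math.exp(a * score + b))


def reject_least_confident(scores, a, b, reject_fraction):
    """Return the indices kept after dropping the least confident samples.

    Confidence is max(p, 1 - p), i.e. the probability of the predicted class.
    """
    ranked = []
    for index, score in enumerate(scores):
        p = platt_probability(score, a, b)
        ranked.append((max(p, 1.0 - p), index))
    ranked.sort()  # ascending: least confident first
    n_reject = int(len(scores) * reject_fraction)
    return sorted(index for _, index in ranked[n_reject:])


# Hypothetical decision values; a and b are assumed, not fitted.
scores = [-2.0, -0.1, 0.05, 1.5, 3.0]
kept = reject_least_confident(scores, a=-1.0, b=0.0, reject_fraction=0.2)
print(kept)
```

In practice `a` and `b` would be estimated on held-out data, and the error rate would then be measured only on the kept samples.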

Proceedings Article

Blind method for phone segmentation using Gaussian function locally

Dac-Thang Hoang¹, Tat-Thang Vu¹, Tung-Lam Phi¹

Vietnam Academy of Science and Technology¹

01 Nov 2016

TL;DR: A blind method for phone segmentation without using prior knowledge of speech content is proposed and a two-step algorithm for detecting phone boundaries is derived that is effective for long speech.

Abstract: Phone segmentation divides a continuous speech signal into discrete, non-overlapping phone units. In this paper, a blind method for phone segmentation that uses no prior knowledge of the speech content is proposed. A two-step algorithm for detecting phone boundaries is derived. The first step selects peaks of the Euclidean curve as phone boundary candidates. The second step verifies these candidates using a Gaussian function. Because the Gaussian function is computed locally, it suits the speech features in each local region of the utterance. Experiments show that our method works for both short and long speech. Experiment 1 is conducted on a short speech corpus, TIMIT. Our results are comparable to or more accurate than those of previous methods. Experiment 2 is conducted on a long speech corpus, TCC300. Our results are more accurate than those of the previous method, with a relative F-value improvement of 1.05%. This method is effective for long speech.
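The first step of the algorithm can be sketched as follows, assuming that the "Euclidean curve" is the distance between consecutive feature frames. That reading, the function names, and the toy frames are assumptions. The abstract does not specify the Gaussian verification step, so it is omitted here.

```python
import math


def euclidean_curve(frames):
    """Euclidean distance between each pair of consecutive feature frames."""
    return [math.dist(a, b) for a, b in zip(frames, frames[1:])]


def local_peaks(curve):
    """Indices i where curve[i] is strictly greater than both neighbours."""
    return [
        i
        for i in range(1, len(curve) - 1)
        if curve[i - 1] < curve[i] > curve[i + 1]
    ]


# Hypothetical 2-D feature frames with one abrupt spectral change.
frames = [(0, 0), (0, 1), (0, 2), (3, 6), (3, 7), (3, 8)]
curve = euclidean_curve(frames)
candidates = local_peaks(curve)
print(curve, candidates)
```

A peak at index i marks a candidate boundary between frame i and frame i + 1. A real system would compute the curve over spectral features such as MFCCs and then filter the candidates.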

Book Chapter

Automatic phone clustering based on confusion matrices

Carla Teixeira Lopes, Arlindo Veiga, Fernando Perdigão

27 Apr 2010

TL;DR: A clustering method is investigated, based on phone confusion matrix, for the data-driven generation of phonetic broad classes (PBC) of the Portuguese language based on a statistical similarity measurement rather than acoustical/phonetic knowledge.

Abstract: Phone recognition experiments give information about the confusions between phones. Grouping the most confusable phones and making a multilevel hierarchical classification should improve phone recognition. In this paper a clustering method is investigated, based on phone confusion matrix, for the data-driven generation of phonetic broad classes (PBC) of the Portuguese language. The method is based on a statistical similarity measurement rather than acoustical/phonetic knowledge. Results are presented for two phone recognisers (TIMIT corpus and Portuguese TECNOVOZ database).
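Grouping confusable phones from a confusion matrix can be sketched as below. The abstract does not define its similarity measure, so the symmetric confusion rate and the average-linkage merging used here are assumptions, as are the function names and the toy matrix.

```python
def confusion_similarity(conf, i, j):
    """Assumed measure: mean rate at which phones i and j are mistaken for each other."""
    return 0.5 * (conf[i][j] / sum(conf[i]) + conf[j][i] / sum(conf[j]))


def cluster_phones(phones, conf, n_clusters):
    """Greedy agglomerative clustering with average linkage on confusability."""
    clusters = [[k] for k in range(len(phones))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(
                    confusion_similarity(conf, i, j)
                    for i in clusters[a]
                    for j in clusters[b]
                )
                sim = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the most confusable pair
        del clusters[b]
    return [[phones[k] for k in cluster] for cluster in clusters]


# Hypothetical confusion counts: rows are reference phones, columns are recognised phones.
phones = ["b", "d", "s", "z"]
conf = [
    [80, 18, 1, 1],
    [15, 83, 1, 1],
    [1, 1, 78, 20],
    [1, 1, 22, 76],
]
print(cluster_phones(phones, conf, n_clusters=2))
```

Stopping at different values of `n_clusters` yields coarser or finer broad classes, which is one way to obtain a multilevel hierarchy.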

Posted Content

Timestamped Embedding-Matching Acoustic-to-Word CTC ASR

Henry Fielding

20 Jun 2023

TL;DR: A method is described for training an embedding-matching word-level connectionist temporal classification (CTC) automatic speech recognizer (ASR) such that it directly produces word start times and durations, required by many real-world applications, in addition to the transcription.

Abstract: In this work, we describe a novel method of training an embedding-matching word-level connectionist temporal classification (CTC) automatic speech recognizer (ASR) such that it directly produces word start times and durations, required by many real-world applications, in addition to the transcription. The word timestamps enable the ASR to output word segmentations and word confusion networks without relying on a secondary model or forced alignment process when testing. Our proposed system has word segmentation accuracy similar to that of a hybrid DNN-HMM (Deep Neural Network-Hidden Markov Model) system, with less than 3 ms difference in mean absolute error in word start times on TIMIT data. At the same time, we observed less than a 5% relative increase in the word error rate compared to the non-timestamped system when using the same audio training data and a nearly identical model size. We also contribute a more rigorous analysis of multiple-hypothesis embedding-matching ASR in general.

Global discrimination algorithm.

Orsay Cedex

01 Jan 1995

TL;DR: A general formalism for training neural predictive systems, and an approach for performing discrimination in predictive systems at the sequence level, which makes use of N-Best sequence selection.

Abstract: We describe a general formalism for training neural predictive systems. We then introduce discrimination at the frame level and show how it relates to maximum mutual information training. Last, we propose an approach for performing discrimination in predictive systems at the sequence level; it makes use of N-Best sequence selection. Performance for acoustic-phonetic decoding reaches 77.4% phone accuracy on the 1988 version of TIMIT.

Metrics

Papers: 1,488
Citations: 68,688

No. of papers in the topic in previous years
Year	Papers
2023	24
2022	62
2021	67
2020	86
2019	77
2018	95
