scispace - formally typeset

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
01 Jan 1998
TL;DR: A frame selection procedure for text-independent speaker identification: instead of averaging frame likelihoods over the whole test utterance, some frames are rejected (pruning) and the final score is computed from a limited number of frames.
Abstract: In this paper, we propose a frame selection procedure for text-independent speaker identification. Instead of averaging the frame likelihoods along the whole test utterance, some of these are rejected (pruning) and the final score is computed with a limited number of frames. This pruning stage requires a prior frame-level likelihood normalization in order to make comparison between frames meaningful. This normalization procedure alone leads to a significant performance enhancement. As far as pruning is concerned, the optimal number of frames pruned is learned on a tuning data set for normal and telephone speech. Validation of the pruning procedure on 567 speakers leads to a 27% identification rate improvement on TIMIT, and to 17% on NTIMIT.
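The normalize-then-prune idea described above can be sketched as follows. This is a minimal illustration assuming per-frame log-likelihoods are available for each enrolled speaker; the function name and the keep fraction are illustrative, not from the paper:

```python
import numpy as np

def prune_and_score(frame_loglik, keep_fraction=0.7):
    """Frame-pruning score for speaker identification.

    frame_loglik: array of shape (n_frames, n_speakers) holding
    per-frame log-likelihoods for each enrolled speaker model.
    """
    # Frame-level normalization: subtract the log-sum-exp over speakers
    # so that scores from different frames become comparable.
    norm = frame_loglik - np.logaddexp.reduce(frame_loglik, axis=1, keepdims=True)
    # Treat a frame's best normalized score as its reliability.
    reliability = norm.max(axis=1)
    # Keep only the most reliable frames; prune the rest.
    n_keep = max(1, int(keep_fraction * norm.shape[0]))
    kept = np.argsort(reliability)[-n_keep:]
    # Final score per speaker: average over the kept frames only.
    scores = norm[kept].mean(axis=0)
    return int(np.argmax(scores)), scores
```

With `keep_fraction=1.0` this reduces to plain likelihood averaging, so the pruning effect can be isolated by varying that single parameter on a tuning set, as the abstract describes.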
Proceedings ArticleDOI
01 Dec 2020
TL;DR: In this paper, a Wasserstein Generative Adversarial Network (WGAN) based spoken keyword detection method was proposed, where the generator in WGAN fits the observation data to generate new data, and the discriminator classifies the generated data and the labels.
Abstract: With the rapid development of artificial neural networks, they are being applied across all areas of computer technology. This paper combines deep neural networks and keyword detection technology to propose a Wasserstein Generative Adversarial Network (WGAN) based spoken keyword detection method that differs substantially from existing approaches. Exploiting the WGAN's ability to generate data autonomously, new sequences are generated and analyzed to determine whether keywords are present and where they appear. In this method, the generator in the WGAN fits the observation data to generate new data, and the discriminator classifies the generated data and the labels. The generator and discriminator are trained through adversarial learning. The proposed method is simple, does not require complex acoustic models, and does not need transcription into text; it is therefore also applicable to languages without a written form. The TIMIT corpus and a self-recorded Chinese corpus have been used for conducting experiments. Our method is compared with a Convolutional Neural Network (CNN) and a Deep Convolutional Generative Adversarial Network (DCGAN) and shows significant improvement over these techniques.
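The adversarial training signal underlying WGANs can be illustrated with a toy sketch. This is not the paper's model (which uses neural networks on speech features); it shows only the original WGAN critic update, i.e. maximizing the score gap between real and generated samples and clipping weights to keep the critic approximately 1-Lipschitz, using a hypothetical linear critic:

```python
import numpy as np

def critic(x, w):
    """A linear 'critic' scoring feature vectors (toy stand-in for
    the WGAN discriminator network)."""
    return x @ w

def wgan_critic_step(real, fake, w, lr=0.01, clip=0.1):
    """One critic update: ascend the gradient of
    E[critic(real)] - E[critic(fake)], then clip the weights
    (the weight-clipping trick from the original WGAN recipe)."""
    # For a linear critic, the gradient of the objective w.r.t. w
    # is simply mean(real) - mean(fake).
    grad = real.mean(axis=0) - fake.mean(axis=0)
    w = w + lr * grad
    return np.clip(w, -clip, clip)
```

Repeating this step drives the critic to separate real from generated data, which in turn gives the generator a useful training gradient.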
Proceedings Article
01 Jan 1999
TL;DR: An advanced multi-level vowel spotting method is used to achieve minimum vowel loss and accurate detection of vowel location and duration; the resulting system showed significant performance improvement compared to similar systems.
Abstract: This paper presents a hybrid ANN/HMM syllable recognition module based on vowel spotting. An advanced multi-level vowel spotting method is used to achieve minimum vowel loss and accurate detection of vowel location and duration. Discrete Hidden Markov Models (DSHMM), Multi Layer Perceptrons (MLP) and Heuristics (HR) are used for this purpose. A hybrid ANN/HMM technique is then used to recognize the syllables between the detected vowels. We replace the usual DSHMM probability parameters with combined neural network outputs. For this purpose both context dependent (CD) and context independent (CI) neural networks are used. Global normalization is employed on the parameters, as opposed to the local normalization used in standard HMMs. Also, all parameters are estimated simultaneously according to the discriminative conditional maximum likelihood (CML) criterion. The tests were performed on the TIMIT and NTIMIT databases and showed significant performance improvement compared to similar systems.
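Replacing HMM emission probabilities with neural-network outputs is usually done via the standard hybrid recipe of dividing network posteriors by state priors; the sketch below shows that conversion. This is the common hybrid ANN/HMM technique, not necessarily the exact combination and normalization scheme used in this paper:

```python
import numpy as np

def scaled_likelihoods(nn_posteriors, state_priors, eps=1e-12):
    """Hybrid ANN/HMM emission scores.

    Divide the network's posterior P(state | frame) by the state
    prior P(state): by Bayes' rule the result is proportional to
    the likelihood P(frame | state), which is what the HMM's
    emission model expects in place of its usual densities.

    nn_posteriors: (n_frames, n_states) softmax outputs.
    state_priors:  (n_states,) prior probabilities of each state.
    """
    return nn_posteriors / (state_priors + eps)
```

States with small priors get boosted, so a frequent state cannot dominate decoding merely because the network saw it often in training.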
Proceedings ArticleDOI
01 Jul 2002
TL;DR: A novel speech beamformer for moving speakers in noisy environments that identifies the speech signal DOA in the direction where the signal's spectrum entropy is minimized and shows significant improvement in the recognition rate of moving speakers especially in very low SNR.
Abstract: In hands-free speech recognition of moving speakers, the time interval where the source position can be assumed stationary varies. It is very common for the speaker to move rapidly within the data window exploited. In such cases the conventional fixed-window direction of arrival (DOA) estimation may lead to poor tracking performance. In this paper we present a novel speech beamformer for moving speakers in noisy environments. The localization algorithm extracts a set of candidate DOAs of the signal sources using array signal processing methods in the frequency domain. A minimum variance (MV) beamformer identifies the speech signal DOA as the direction in which the signal's spectral entropy is minimized. The same localization algorithm is used to detect the closest direction to the initial estimation using a smaller window. The proposed method is evaluated using a phoneme recognition system and noise recordings from an air-conditioning fan and the TIMIT speech corpus. Extensive experiments, carried out in the range of 25 to 0 dB SNR, show significant improvement in the recognition rate of moving speakers, especially at very low SNR.
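The entropy-based DOA selection described above can be sketched as follows: speech has a more structured (peakier) spectrum than diffuse noise, so among candidate steering directions the beamformer output with the lowest spectral entropy is taken as the speech direction. The function names and the dict-based interface are illustrative, not from the paper:

```python
import numpy as np

def spectral_entropy(spectrum, eps=1e-12):
    """Entropy of a magnitude spectrum normalized to a distribution.
    Peaky (speech-like) spectra give low entropy; flat (noise-like)
    spectra give high entropy."""
    p = spectrum / (spectrum.sum() + eps)
    return -np.sum(p * np.log(p + eps))

def select_speech_doa(candidate_spectra):
    """Pick the candidate direction whose beamformed output has the
    lowest spectral entropy.

    candidate_spectra: dict mapping DOA (degrees) -> magnitude
    spectrum of the beamformer output steered at that DOA.
    """
    return min(candidate_spectra,
               key=lambda doa: spectral_entropy(candidate_spectra[doa]))
```

A flat spectrum of length N has entropy log(N), the maximum, so any direction capturing harmonic speech structure will score below the diffuse-noise directions.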
Proceedings ArticleDOI
07 May 2001
TL;DR: Statistical methods for reconstructing speech at the phoneme level are used to find missing phonemes that are removed from sentences in the TIMIT corpus and the most likely candidate is selected to reconstruct the sentence.
Abstract: Statistical methods for reconstructing speech at the phoneme level are used to find missing phonemes that are removed from sentences in the TIMIT corpus. Probabilities for the occurrence of the missing phoneme(s) are generated and the most likely candidate(s) selected to reconstruct the sentence. The method includes symmetric and asymmetric 'confidence windowing' around the missing phoneme(s) for determination of the most likely candidates. The reconstruction rates for one or more phonemes missing in a sequence can exceed 85%.
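A minimal version of ranking candidates for a missing phoneme from its surrounding context can be sketched with bigram statistics and a symmetric confidence window of one phoneme on each side. This is an illustrative toy (add-one smoothing and the function names are assumptions, not the paper's exact statistical model):

```python
from collections import defaultdict

def train_bigrams(sequences):
    """Count phoneme bigrams over a corpus of phoneme sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def fill_gap(left, right, counts, inventory):
    """Rank candidates for one missing phoneme using its left and
    right neighbors (a symmetric window), with add-one smoothing."""
    v = len(inventory)

    def score(p):
        # P(p | left) * P(right | p)
        pl = (counts[left][p] + 1) / (sum(counts[left].values()) + v)
        pr = (counts[p][right] + 1) / (sum(counts[p].values()) + v)
        return pl * pr

    return max(inventory, key=score)
```

The asymmetric windowing mentioned in the abstract would correspond to weighting (or extending) the left and right context terms unequally.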

Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations (76% related)
- Feature (machine learning): 33.9K papers, 798.7K citations (75% related)
- Feature vector: 48.8K papers, 954.4K citations (74% related)
- Natural language: 31.1K papers, 806.8K citations (73% related)
- Deep learning: 79.8K papers, 2.1M citations (72% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95