
TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Proceedings ArticleDOI
01 Sep 2016
TL;DR: The proposed feature transformation improves phone recognition accuracy compared with classical methods using conventional cepstral feature vectors, for HMMs with fewer than 16 Gaussians per state.
Abstract: In this paper, we propose a novel vector transformation that projects feature vectors into a new space with good discriminant properties while drastically reducing the number of parameters used in ASR systems. We call this method the "N-to-1 Gaussian MFCC transformation". It uses the HMM acoustic parameters obtained with N Gaussians and with 1 Gaussian during training to compute the transformed vectors in the new projection space. Our transformation technique permits a substantial reduction in the number of Gaussians (in the GMM modeling of each state's emission probability) while improving the performance of ASR systems. Our experimental results on both the TIMIT and FPSD corpora demonstrate that the proposed feature transformation improves phone recognition accuracy compared with classical methods using conventional cepstral feature vectors when the HMMs use fewer than 16 Gaussians per state.
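The abstract does not spell out the transformation itself, but the quantity it aims to shrink is the per-state GMM emission probability. A minimal numpy sketch of that likelihood, assuming diagonal covariances (all names here are illustrative, not the paper's code):

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Log emission probability of one feature vector x under a
    diagonal-covariance GMM -- the per-state model whose component
    count N the paper's transformation tries to reduce toward 1.

    weights:   (N,)    mixture weights, summing to 1
    means:     (N, D)  component means
    variances: (N, D)  diagonal covariances
    """
    d = x.shape[0]
    # log N(x; mu_k, diag(var_k)) for every component k
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_expo = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    # log sum_k w_k N_k, computed with the log-sum-exp trick for stability
    log_comp = np.log(weights) + log_norm + log_expo
    m = log_comp.max()
    return m + np.log(np.sum(np.exp(log_comp - m)))
```

With N = 1 this collapses to a single Gaussian, which is the cheap end of the trade-off the paper's "N-to-1" projection is designed to approach without losing accuracy.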
Proceedings ArticleDOI
04 Aug 2020
TL;DR: A novel method combining autoencoders and deep neural networks (DNNs) in a hierarchical structure is proposed for speech enhancement, showing significant improvement on both seen and unseen noises compared to baselines.
Abstract: In this paper, we propose a novel method that combines autoencoders and deep neural networks (DNNs) in a hierarchical structure for speech enhancement. First, two parallel autoencoders are employed to obtain the nonnegative matrix factorization (NMF) parameters of speech and noise through a nonlinear mapping. Then, using the spectrum of noisy speech as input to the encoder portion of the autoencoders, the encoder outputs are computed and fed to the DNNs in the subsequent hierarchies to further enhance the speech spectrum. Finally, the last three hierarchies, comprising the decoder portions of the autoencoders and the DNN, are trained jointly to improve the results. The proposed method is evaluated on the TIMIT corpus with the perceptual evaluation of speech quality (PESQ) and frequency-weighted segmental signal-to-noise ratio (fwSNRseg) criteria. The obtained results show a significant improvement on both seen and unseen noises compared to baselines.
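The paper replaces the linear NMF step with learned autoencoders; for context, the classical NMF baseline it generalizes factorizes a nonnegative magnitude spectrogram with multiplicative updates. A hedged numpy sketch of that baseline (Lee–Seung Euclidean updates; not the paper's network):

```python
import numpy as np

def nmf(V, rank, n_iter=300, eps=1e-10):
    """Factorize a nonnegative magnitude spectrogram V (freq x time)
    as V ~= W @ H using Lee-Seung multiplicative updates.
    W holds spectral basis vectors, H their time-varying activations;
    in the paper a nonlinear encoder/decoder plays these roles."""
    rng = np.random.default_rng(0)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        # multiplicative updates keep W and H nonnegative by construction
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Training one such factorization on clean speech and another on noise, then concatenating the bases, is the standard NMF enhancement recipe the hierarchical model builds on.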
Book ChapterDOI
11 Sep 2006
TL;DR: This paper presents an alternative method for generating phonetic questions automatically from misrecognition items; the questions are tested on the standard TIMIT phone recognition task.
Abstract: Most automatic speech recognition systems are currently based on tied-state triphones. These tied states are usually determined by a decision tree. Decision trees can automatically cluster triphone states into many classes according to the available data, allowing each class to be trained efficiently. To achieve higher accuracy, this clustering is constrained by manually generated phonetic questions. Moreover, the tree generated from these phonetic questions can be used to synthesize unseen triphones. The quality of the decision trees therefore depends on the quality of the phonetic questions. Unfortunately, manual creation of phonetic questions requires a lot of time and resources. To overcome this problem, this paper presents an alternative method for generating these phonetic questions automatically from misrecognition items. These questions are tested using the standard TIMIT phone recognition task.
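To make the role of phonetic questions concrete, here is a simplified sketch of choosing the best question to split a pool of context-dependent training items. Real toolkits maximize Gaussian log-likelihood gain over state occupancy statistics; this sketch substitutes label entropy for readability, and all names are illustrative:

```python
import math
from collections import Counter

def best_question(states, questions):
    """Pick the phonetic question with the largest entropy reduction
    when splitting a set of (context_phone, label) training items.

    states:    list of (context_phone, observed_label) pairs
    questions: dict mapping question name -> set of phones answering "yes"
    """
    def entropy(items):
        if not items:
            return 0.0
        counts = Counter(lbl for _, lbl in items)
        n = len(items)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    base = entropy(states)
    best, best_gain = None, -1.0
    for name, phone_set in questions.items():
        yes = [s for s in states if s[0] in phone_set]
        no = [s for s in states if s[0] not in phone_set]
        n = len(states)
        # weighted entropy of the two children after the split
        split_h = (len(yes) * entropy(yes) + len(no) * entropy(no)) / n
        gain = base - split_h
        if gain > best_gain:
            best, best_gain = name, gain
    return best, best_gain
```

The paper's contribution is where the candidate question sets come from (derived from misrecognition items instead of hand-written phonetics), not the split criterion itself.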
Journal ArticleDOI
TL;DR: In this paper, a new feature extraction approach for robust speaker recognition named Power Normalized Gammachirp Cepstral (PNGC) is introduced, which uses a biologically motivated auditory perceptual model.
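The TL;DR gives no implementation detail, so the following is only a hypothetical sketch of the "power normalization" ingredient as it appears in related features such as PNCC: the log compression of auditory filterbank energies is replaced by a power-law nonlinearity before the DCT that yields cepstral coefficients. The gammachirp filterbank itself and the paper's actual normalization are not shown here, and the exponent value is an assumption:

```python
import numpy as np
from scipy.fft import dct

def power_law_cepstra(filterbank_energies, power=1 / 15, n_coeffs=13):
    """Hypothetical power-normalization step (PNCC-style, assumed):
    compress nonnegative auditory filterbank energies with a power law
    instead of a log, then take a DCT to obtain cepstral coefficients.

    filterbank_energies: (n_frames, n_bands) nonnegative energies
    """
    # power-law compression is less sensitive to low-energy noise
    # than log compression, which diverges as energy -> 0
    compressed = np.power(np.maximum(filterbank_energies, 1e-12), power)
    return dct(compressed, type=2, norm="ortho", axis=1)[:, :n_coeffs]
```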
Proceedings ArticleDOI
11 Dec 2022
TL;DR: In this article, a multi-task joint learning scheme is proposed to improve embedding-aware audio-visual speech enhancement by adopting the phone and the articulation place together as classification targets during the training of the embedding extractor and the enhancement network.
Abstract: In this paper, we propose a multi-task joint learning scheme to improve embedding-aware audio-visual speech enhancement by adopting the phone and the articulation place together as classification targets during the training of the embedding extractor and the enhancement network. First, a multimodal embedding is extracted from noisy speech and lip frames, supervised jointly by articulation-place and phone labels. Next, we train the embedding extractor and the enhancement network jointly, with the ideal ratio mask, the phone posterior, and the place posterior as learning objectives. Experiments on the TCD-TIMIT corpus corrupted by simulated additive noises show that the proposed multimodal embedding at the multi-scale class level is more effective than the previous embedding at the place/phone level, and that the multi-task joint learning framework further improves speech quality and intelligibility.
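One of the joint-training targets named above, the ideal ratio mask (IRM), has a standard closed form: per time-frequency bin, the fraction of energy belonging to speech. A minimal numpy sketch (function names are illustrative):

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag, eps=1e-12):
    """Ideal ratio mask: sqrt(S^2 / (S^2 + N^2)) per time-frequency bin,
    computed from the magnitude spectrograms of clean speech and noise.
    Values near 1 mark speech-dominated bins, near 0 noise-dominated ones."""
    s2, n2 = speech_mag ** 2, noise_mag ** 2
    return np.sqrt(s2 / (s2 + n2 + eps))

def apply_mask(noisy_mag, mask):
    # element-wise masking of the noisy magnitude spectrogram
    return noisy_mag * mask
```

At training time the network regresses this mask (alongside the phone and place posteriors); at test time the predicted mask is applied to the noisy spectrogram to suppress noise-dominated bins.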

Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations, 76% related
- Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
- Feature vector: 48.8K papers, 954.4K citations, 74% related
- Natural language: 31.1K papers, 806.8K citations, 73% related
- Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023      24
2022      62
2021      67
2020      86
2019      77
2018      95