Topic

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Posted Content
TL;DR: Experimental results demonstrate that the proposed SE framework with BPC information achieves notable performance improvements over the baseline system and over an SE system using monophone information, in terms of both speech quality and intelligibility, on the TIMIT dataset.
Abstract: In noisy conditions, knowing the speech content helps listeners suppress background noise components more effectively and retrieve the pure speech signal. Previous studies have also confirmed the benefit of incorporating phonetic information into a speech enhancement (SE) system to achieve better denoising performance. To obtain the phonetic information, one usually prepares a phoneme-based acoustic model trained on speech waveforms and phoneme labels. Although this works well in moderately noisy conditions, in very noisy conditions the recognized phonemes may be erroneous and thus misguide the SE process. To overcome this limitation, this study proposes to incorporate broad phonetic class (BPC) information into the SE process. Three criteria for building the BPCs are investigated: two knowledge-based criteria, place and manner of articulation, and one data-driven criterion. Moreover, the recognition accuracy of BPCs is much higher than that of phonemes, providing more accurate phonetic information to guide the SE process under very noisy conditions. Experimental results demonstrate that the proposed SE framework with BPC information achieves notable performance improvements over the baseline system and over an SE system using monophone information, in terms of both speech quality and intelligibility, on the TIMIT dataset.
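The central idea, collapsing fine-grained phonemes into broad phonetic classes so that within-class confusions no longer mislead the enhancement stage, can be sketched as follows. The grouping below is illustrative only (a manner-of-articulation grouping assumed for this sketch, not the paper's exact class definitions):

```python
# Sketch: mapping phonemes to broad phonetic classes (BPCs) by manner of
# articulation. The grouping is illustrative, not the paper's exact one.
MANNER_BPC = {
    "vowel":     ["aa", "ae", "ah", "eh", "ih", "iy", "uw"],
    "stop":      ["b", "d", "g", "p", "t", "k"],
    "fricative": ["f", "v", "s", "z", "sh", "zh", "th", "dh"],
    "nasal":     ["m", "n", "ng"],
    "glide":     ["l", "r", "w", "y"],
    "silence":   ["sil", "pau"],
}

# Invert to a phoneme -> BPC lookup table.
PHONEME_TO_BPC = {p: bpc for bpc, phones in MANNER_BPC.items() for p in phones}

def to_bpc_sequence(phonemes):
    """Collapse a recognized phoneme sequence to its BPC sequence.

    Confusions inside a class (e.g. 's' vs 'sh') no longer count as
    errors, which is why BPC recognition accuracy exceeds phoneme
    accuracy in very noisy conditions.
    """
    return [PHONEME_TO_BPC.get(p, "other") for p in phonemes]

print(to_bpc_sequence(["s", "sh", "aa", "m"]))
# ['fricative', 'fricative', 'vowel', 'nasal']
```

Even if a noisy recognizer confuses "s" with "sh", both collapse to "fricative", so the SE stage still receives correct class information.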

4 citations

Proceedings ArticleDOI
01 Dec 2015
TL;DR: A novel speaker adaptation algorithm for classifying speech based on deep neural networks (DNNs) and a k-nearest neighbor (k-NN) classifier, which reduces phoneme classification errors on the TIMIT dataset by 23%.
Abstract: This paper proposes a novel speaker adaptation algorithm for classifying speech based on deep neural networks (DNNs). The adaptation algorithm consists of two steps. In the first step, a deep neural network is trained on raw Mel-frequency cepstral coefficient (MFCC) features to discover hidden structures in the data, and the activations of the last hidden layers of the DNN are employed as acoustic features. In the second step, an adaptation algorithm learns speaker similarity scores from a small amount of adaptation data for each target speaker, using the DNN-based acoustic features. Based on the speaker similarity scores, classification is done with a k-nearest neighbor (k-NN) classifier. The novelty of this work is that instead of modifying and re-training the DNN for speaker adaptation, which comprises a large number of parameters and is computationally expensive, the activations of the learned DNN are used to project features from MFCC space into a sparse DNN space, and speaker adaptation is then performed based on similarity (i.e. nearest neighbors) using the k-NN algorithm. With only a small amount of adaptation data, it reduces the number of phoneme classification errors on the TIMIT dataset by 23%. This work also analyzes the impact of the deep neural network architecture on speaker adaptation performance.
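The second step, classification by majority vote over nearest labelled embeddings, can be sketched in a few lines. This is a minimal stand-in, assuming the embeddings are already extracted from the DNN's last hidden layer (here just plain lists of floats):

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """Classify `query` by majority vote among its k nearest labelled
    embeddings. `examples` is a list of (embedding, label) pairs; in the
    paper the embeddings would be last-hidden-layer DNN activations."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(examples, key=lambda ex: dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Tiny toy adaptation set: embeddings for two phoneme classes.
adaptation = [
    ([0.0, 0.1], "aa"), ([0.1, 0.0], "aa"),
    ([1.0, 0.9], "iy"), ([0.9, 1.0], "iy"),
]
print(knn_classify([0.05, 0.05], adaptation))  # 'aa'
```

Because only the lookup set changes per speaker, adaptation requires no gradient updates to the DNN's parameters, which is the computational saving the paper highlights.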

4 citations

Posted Content
TL;DR: A neural network with feedback learning in the time domain, called FTNet, for monaural speech enhancement; the network consists of three principal components, including a stage recurrent neural network introduced to effectively aggregate deep feature dependencies across different stages with a memory mechanism.
Abstract: In this paper, we propose a type of neural network with feedback learning in the time domain, called FTNet, for monaural speech enhancement; the proposed network consists of three principal components. The first part, called the stage recurrent neural network, is introduced to effectively aggregate deep feature dependencies across different stages with a memory mechanism and to remove the interference stage by stage. The second part is a convolutional auto-encoder. The third part consists of a series of concatenated gated linear units, which facilitate the information flow and gradually increase the receptive field. Feedback learning is adopted to improve parameter efficiency; therefore, the number of trainable parameters is effectively reduced without sacrificing performance. Numerous experiments are conducted on the TIMIT corpus, and the experimental results demonstrate that the proposed network achieves consistently better performance in terms of both PESQ and STOI scores than two state-of-the-art time-domain baselines in different conditions.
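The gated linear unit in the third component multiplies a linear path by a sigmoid gate so the network can modulate information flow. A minimal elementwise sketch (scalar per-channel weights are an assumption for brevity; FTNet uses 1-D convolutions):

```python
import math

def glu(x, w_lin, w_gate):
    """Gated linear unit: elementwise product of a linear path and a
    sigmoid gate. A gate near 1 passes the channel through; a gate
    near 0 suppresses it."""
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))
    return [a * wl * sigmoid(a * wg) for a, wl, wg in zip(x, w_lin, w_gate)]

# Strongly positive gate input -> channel passes; strongly negative -> blocked.
out = glu([1.0, -2.0], [0.5, 0.5], [10.0, 10.0])
print(out)  # roughly [0.5, 0.0]
```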

4 citations

Proceedings ArticleDOI
01 May 2019
TL;DR: This paper proposes to estimate the speaking rate by segmenting the speech into syllable-like units using end-point detection algorithms that require no training or fine-tuning.
Abstract: Speaking rate is an important attribute of the speech signal which plays a crucial role in the performance of automatic speech processing systems. In this paper, we propose to estimate the speaking rate by segmenting the speech into syllable-like units using end-point detection algorithms that require no training or fine-tuning. There are also no predefined constraints on the expected number of syllabic segments. The acoustic subword units are obtained from the speech signal alone, so the speaking rate is estimated without transcriptions or phonetic knowledge of the speech data. A recent theta-rate-oscillator-based syllabification algorithm is also employed for speaking rate estimation. The performance is evaluated on the TIMIT corpus and on spontaneous speech from the Switchboard corpus. The correlation results are comparable to recent algorithms that are trained on a specific training set and/or make use of available transcriptions.
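The core measurement, count syllable-like units and divide by duration, can be sketched with a crude energy-peak detector. This is an illustrative stand-in for the paper's end-point detection, not its actual algorithm; `threshold` and the peak rule are assumptions:

```python
def speaking_rate(energy, frame_rate_hz, threshold=0.5):
    """Estimate speaking rate as syllable-like energy peaks per second.

    A frame counts as a syllable nucleus if its energy exceeds
    `threshold` and both neighbouring frames. No training or
    transcriptions are needed, matching the paper's premise.
    """
    peaks = sum(
        1
        for i in range(1, len(energy) - 1)
        if energy[i] > threshold
        and energy[i] > energy[i - 1]
        and energy[i] > energy[i + 1]
    )
    duration_s = len(energy) / frame_rate_hz
    return peaks / duration_s

# 10 frames at 10 frames/s -> 1 s of speech; two energy humps -> 2 syll/s.
env = [0.0, 0.2, 0.9, 0.3, 0.1, 0.2, 0.8, 0.2, 0.1, 0.0]
print(speaking_rate(env, 10.0))  # 2.0
```

Real implementations smooth the energy envelope and add minimum-distance constraints between peaks; the point here is only that the rate falls out of signal-level evidence alone.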

4 citations

Proceedings ArticleDOI
01 Jan 2017
TL;DR: For a recognition system based on Hidden Markov Models (HMMs), parameterization techniques that model the mechanisms of the human ear, Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), are applied to the TIMIT database.
Abstract: This paper describes the implementation of a speech recognition system based on Hidden Markov Models (HMMs). For the recognition front end, we use parameterization techniques that model the mechanisms of the human ear: Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), computed on the TIMIT database. We also use two measures, jitter and shimmer, since they carry very precise information about a person's voice.
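Jitter and shimmer are standard perturbation measures: jitter is the cycle-to-cycle variation of pitch periods, shimmer the same variation applied to peak amplitudes. A minimal sketch of the local ("percent") variants, assuming pitch periods and amplitudes have already been extracted from the waveform:

```python
def jitter(periods):
    """Local jitter (%): mean absolute difference between consecutive
    pitch periods, divided by the mean period."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer(amplitudes):
    """Local shimmer (%): the same measure applied to the peak
    amplitudes of consecutive cycles."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))

# A perfectly periodic voice has 0% jitter; slight cycle variation
# yields a small positive value (typical healthy voices stay below ~1%).
print(jitter([5.0, 5.0, 5.0]))        # 0.0
print(jitter([5.0, 5.1, 5.0, 5.1]))   # about 1.98
```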

4 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95