Topic

TIMIT

About: TIMIT is a research topic. Over the lifetime, 1401 publications have been published within this topic receiving 59888 citations. The topic is also known as: TIMIT Acoustic-Phonetic Continuous Speech Corpus.


Papers
Posted Content
TL;DR: Experimental results demonstrate that the proposed SE framework with BPC information achieves notable performance improvements over the baseline system and over an SE system using monophone information, in terms of both speech quality and intelligibility, on the TIMIT dataset.
Abstract: In noisy conditions, knowing the speech content helps listeners suppress background noise components more effectively and retrieve the pure speech signal. Previous studies have also confirmed the benefit of incorporating phonetic information into a speech enhancement (SE) system to achieve better denoising performance. To obtain the phonetic information, one usually prepares a phoneme-based acoustic model trained on speech waveforms and phoneme labels. Although this works well in moderately noisy conditions, in very noisy conditions the recognized phonemes may be erroneous and thus misguide the SE process. To overcome this limitation, this study proposes to incorporate broad phonetic class (BPC) information into the SE process. Three criteria for building the BPCs are investigated: two knowledge-based criteria, place and manner of articulation, and one data-driven criterion. Moreover, the recognition accuracy of BPCs is much higher than that of phonemes, providing more accurate phonetic information to guide the SE process under very noisy conditions. Experimental results demonstrate that the proposed SE framework with BPC information achieves notable performance improvements over the baseline system and over an SE system using monophone information, in terms of both speech quality and intelligibility, on the TIMIT dataset.
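The central idea, collapsing fine-grained phonemes into broad phonetic classes so that within-class confusions no longer mislead the enhancement stage, can be sketched as follows. The grouping below is illustrative only (a manner-of-articulation grouping assumed for this sketch, not the paper's exact class definitions):

```python
# Sketch: mapping phonemes to broad phonetic classes (BPCs) by manner of
# articulation. The grouping is illustrative, not the paper's exact one.
MANNER_BPC = {
    "vowel":     ["aa", "ae", "ah", "eh", "ih", "iy", "uw"],
    "stop":      ["b", "d", "g", "p", "t", "k"],
    "fricative": ["f", "v", "s", "z", "sh", "zh", "th", "dh"],
    "nasal":     ["m", "n", "ng"],
    "glide":     ["l", "r", "w", "y"],
    "silence":   ["sil", "pau"],
}

# Invert to a phoneme -> BPC lookup table.
PHONEME_TO_BPC = {p: bpc for bpc, phones in MANNER_BPC.items() for p in phones}

def to_bpc_sequence(phonemes):
    """Collapse a recognized phoneme sequence to its BPC sequence.

    Confusions inside a class (e.g. 's' vs 'sh') no longer count as
    errors, which is why BPC recognition accuracy exceeds phoneme
    accuracy in very noisy conditions.
    """
    return [PHONEME_TO_BPC.get(p, "other") for p in phonemes]

print(to_bpc_sequence(["s", "sh", "aa", "m"]))
# ['fricative', 'fricative', 'vowel', 'nasal']
```

Even if a noisy recognizer confuses "s" with "sh", both collapse to "fricative", so the SE stage still receives correct class information.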

4 citations

Proceedings ArticleDOI
01 Dec 2015
TL;DR: A novel speaker adaptation algorithm for classifying speech based on deep neural networks (DNNs) and a k-nearest neighbor (k-NN) classifier, which reduces phoneme classification errors on the TIMIT dataset by 23%.
Abstract: This paper proposes a novel speaker adaptation algorithm for classifying speech based on deep neural networks (DNNs). The adaptation algorithm consists of two steps. In the first step, a deep neural network is trained on raw Mel-frequency cepstral coefficient (MFCC) features to discover hidden structures in the data, and the activations of the last hidden layers of the DNN are employed as acoustic features. In the second step, an adaptation algorithm learns speaker similarity scores from a small amount of adaptation data for each target speaker, using the DNN-based acoustic features. Based on the speaker similarity scores, classification is done with a k-nearest neighbor (k-NN) classifier. The novelty of this work is that instead of modifying and re-training the DNN for speaker adaptation, which comprises a large number of parameters and is computationally expensive, the activations of the learned DNN are used to project features from MFCC space into a sparse DNN space, and speaker adaptation is then performed based on similarity (i.e. nearest neighbors) using the k-NN algorithm. With only a small amount of adaptation data, it reduces the number of phoneme classification errors on the TIMIT dataset by 23%. This work also analyzes the impact of the deep neural network architecture on speaker adaptation performance.
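The second step, classification by majority vote over nearest labelled embeddings, can be sketched in a few lines. This is a minimal stand-in, assuming the embeddings are already extracted from the DNN's last hidden layer (here just plain lists of floats):

```python
import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """Classify `query` by majority vote among its k nearest labelled
    embeddings. `examples` is a list of (embedding, label) pairs; in the
    paper the embeddings would be last-hidden-layer DNN activations."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    nearest = sorted(examples, key=lambda ex: dist(query, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Tiny toy adaptation set: embeddings for two phoneme classes.
adaptation = [
    ([0.0, 0.1], "aa"), ([0.1, 0.0], "aa"),
    ([1.0, 0.9], "iy"), ([0.9, 1.0], "iy"),
]
print(knn_classify([0.05, 0.05], adaptation))  # 'aa'
```

Because only the lookup set changes per speaker, adaptation requires no gradient updates to the DNN's parameters, which is the computational saving the paper highlights.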

4 citations

Posted Content
TL;DR: A neural network with feedback learning in the time domain, called FTNet, for monaural speech enhancement; the network consists of three principal components, including a stage recurrent neural network introduced to effectively aggregate deep feature dependencies across different stages with a memory mechanism.
Abstract: In this paper, we propose a type of neural network with feedback learning in the time domain, called FTNet, for monaural speech enhancement; the proposed network consists of three principal components. The first part, called the stage recurrent neural network, is introduced to effectively aggregate deep feature dependencies across different stages with a memory mechanism and to remove the interference stage by stage. The second part is a convolutional auto-encoder. The third part consists of a series of concatenated gated linear units, which facilitate the information flow and gradually increase the receptive field. Feedback learning is adopted to improve parameter efficiency; therefore, the number of trainable parameters is effectively reduced without sacrificing performance. Numerous experiments are conducted on the TIMIT corpus, and the experimental results demonstrate that the proposed network achieves consistently better performance in terms of both PESQ and STOI scores than two state-of-the-art time-domain baselines in different conditions.
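The gated linear unit in the third component multiplies a linear path by a sigmoid gate so the network can modulate information flow. A minimal elementwise sketch (scalar per-channel weights are an assumption for brevity; FTNet uses 1-D convolutions):

```python
import math

def glu(x, w_lin, w_gate):
    """Gated linear unit: elementwise product of a linear path and a
    sigmoid gate. A gate near 1 passes the channel through; a gate
    near 0 suppresses it."""
    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))
    return [a * wl * sigmoid(a * wg) for a, wl, wg in zip(x, w_lin, w_gate)]

# Strongly positive gate input -> channel passes; strongly negative -> blocked.
out = glu([1.0, -2.0], [0.5, 0.5], [10.0, 10.0])
print(out)  # roughly [0.5, 0.0]
```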

4 citations

Proceedings ArticleDOI
01 May 2019
TL;DR: This paper proposes to estimate the speaking rate by segmenting the speech into syllable-like units using end-point detection algorithms that require no training or fine-tuning.
Abstract: Speaking rate is an important attribute of the speech signal which plays a crucial role in the performance of automatic speech processing systems. In this paper, we propose to estimate the speaking rate by segmenting the speech into syllable-like units using end-point detection algorithms that require no training or fine-tuning. There are also no predefined constraints on the expected number of syllabic segments. The acoustic subword units are obtained from the speech signal alone, so the speaking rate is estimated without transcriptions or phonetic knowledge of the speech data. A recent theta-rate-oscillator-based syllabification algorithm is also employed for speaking rate estimation. The performance is evaluated on the TIMIT corpus and on spontaneous speech from the Switchboard corpus. The correlation results are comparable to recent algorithms that are trained on a specific training set and/or make use of available transcriptions.
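The core measurement, count syllable-like units and divide by duration, can be sketched with a crude energy-peak detector. This is an illustrative stand-in for the paper's end-point detection, not its actual algorithm; `threshold` and the peak rule are assumptions:

```python
def speaking_rate(energy, frame_rate_hz, threshold=0.5):
    """Estimate speaking rate as syllable-like energy peaks per second.

    A frame counts as a syllable nucleus if its energy exceeds
    `threshold` and both neighbouring frames. No training or
    transcriptions are needed, matching the paper's premise.
    """
    peaks = sum(
        1
        for i in range(1, len(energy) - 1)
        if energy[i] > threshold
        and energy[i] > energy[i - 1]
        and energy[i] > energy[i + 1]
    )
    duration_s = len(energy) / frame_rate_hz
    return peaks / duration_s

# 10 frames at 10 frames/s -> 1 s of speech; two energy humps -> 2 syll/s.
env = [0.0, 0.2, 0.9, 0.3, 0.1, 0.2, 0.8, 0.2, 0.1, 0.0]
print(speaking_rate(env, 10.0))  # 2.0
```

Real implementations smooth the energy envelope and add minimum-distance constraints between peaks; the point here is only that the rate falls out of signal-level evidence alone.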

4 citations

Proceedings ArticleDOI
01 Jan 2017
TL;DR: For a recognition system based on Hidden Markov Models (HMMs), parameterization techniques that model the mechanisms of the human ear, Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), are applied to the TIMIT database.
Abstract: This paper describes the implementation of a speech recognition system based on Hidden Markov Models (HMMs). For the recognition front end, we use parameterization techniques that model the mechanisms of the human ear: Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), computed on the TIMIT database. We also use two measures, jitter and shimmer, since they carry very precise information about a person's voice.
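Jitter and shimmer are standard perturbation measures: jitter is the cycle-to-cycle variation of pitch periods, shimmer the same variation applied to peak amplitudes. A minimal sketch of the local ("percent") variants, assuming pitch periods and amplitudes have already been extracted from the waveform:

```python
def jitter(periods):
    """Local jitter (%): mean absolute difference between consecutive
    pitch periods, divided by the mean period."""
    diffs = [abs(a - b) for a, b in zip(periods, periods[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def shimmer(amplitudes):
    """Local shimmer (%): the same measure applied to the peak
    amplitudes of consecutive cycles."""
    diffs = [abs(a - b) for a, b in zip(amplitudes, amplitudes[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))

# A perfectly periodic voice has 0% jitter; slight cycle variation
# yields a small positive value (typical healthy voices stay below ~1%).
print(jitter([5.0, 5.0, 5.0]))        # 0.0
print(jitter([5.0, 5.1, 5.0, 5.1]))   # about 1.98
```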

4 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 76% related
Feature (machine learning): 33.9K papers, 798.7K citations, 75% related
Feature vector: 48.8K papers, 954.4K citations, 74% related
Natural language: 31.1K papers, 806.8K citations, 73% related
Deep learning: 79.8K papers, 2.1M citations, 72% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023    24
2022    62
2021    67
2020    86
2019    77
2018    95